Course Summary
Credit Type:
Length: 52 weeks (156 hours total)
Dates Offered:
Credit Recommendation & Competencies
Level                          Credits (SH)   Subject
Lower-Division Baccalaureate   3              Introduction to Python
Lower-Division Baccalaureate   3              Introduction to Database Systems
Upper-Division Baccalaureate   3              Advanced SQL Programming
Upper-Division Baccalaureate   3              Data Mining


The course objective is for students to immerse themselves in the role of a data engineer and acquire the essential skills needed to work with various tools and databases to design, deploy, and manage structured and unstructured data.

By the end of this Professional Certificate, students will be able to explain and perform the critical tasks required in a data engineering role. Learners will use the Python programming language and Linux/UNIX shell scripts to extract, transform, and load (ETL) data; work with relational database management systems (RDBMS); query data using SQL statements; and use NoSQL databases and unstructured data. Learners will be introduced to Big Data, work with Big Data engines like Hadoop and Spark, and gain experience creating Data Warehouses and utilizing Business Intelligence tools to analyze and extract insights.
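The ETL workflow described above can be sketched in a few lines of Python. This is an illustrative example only, not course material: the data, column names, and table are hypothetical, and SQLite stands in for the relational databases used in the program.

```python
# Minimal ETL sketch: extract rows from CSV text, transform them,
# and load them into a relational table. All names are hypothetical.
import csv
import io
import sqlite3

raw = "name,price\nwidget,19.99\ngadget,4.50\n"  # stand-in for an extracted file

# Extract: parse the CSV records into dictionaries
rows = list(csv.DictReader(io.StringIO(raw)))

# Transform: convert the price to integer cents so arithmetic stays exact
records = [(r["name"], round(float(r["price"]) * 100)) for r in rows]

# Load: insert the transformed records into a relational table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price_cents INTEGER)")
conn.executemany("INSERT INTO products VALUES (?, ?)", records)

# Query the loaded data with SQL
total = conn.execute("SELECT SUM(price_cents) FROM products").fetchone()[0]
print(total)  # 2449
```

In a production pipeline the same three stages would typically be orchestrated by a tool such as Airflow, with each stage as a separate task.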

Each module includes numerous hands-on labs and projects to apply the concepts and skills learned. The program culminates in a Capstone Project that brings together all of these skills to develop and implement an entire data platform, with various data repositories and pipelines, to address a real-world-inspired data analytics problem.

This program does not require any prior data engineering or programming experience.

Learning Outcomes:

  • Describe data engineering and its function(s)
  • Describe and differentiate between the role and responsibilities of Data Engineers, Data Scientists, Data Analysts, Business Analysts, and Business Intelligence Analysts
  • Describe the different entities that form a modern data ecosystem
  • Explain what big data is, how it impacts the collection, monitoring, storage, analysis, and reporting of data, and identify some common big data processing tools
  • Describe the elements of a data engineering ecosystem, including data, data repositories, data integration platforms, data pipelines, languages, and BI and reporting tools
  • Explain the characteristics and uses of common programming, querying, and scripting languages
  • Explain the use of Data Integration Platforms and how they relate to data pipelines and the ETL and ELT processes
  • Describe what RDBMSes and NoSQL databases are and give examples of their use
  • List and describe the most common use cases for MongoDB
  • Describe Apache Cassandra and explain how it fits in the NoSQL space
  • Demonstrate skill in retrieving SQL query results and analyzing data
  • Describe Apache Spark application submission, including use of Spark’s unified interface, ‘spark-submit’; describe and apply options for submitting applications; identify techniques for managing external application dependencies; and list the benefits of the Spark shell
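The ‘spark-submit’ outcome above can be illustrated with two typical invocations. This is a hedged sketch, not course material: the file names, class name, cluster URL, and package version are hypothetical, and a Spark installation is required to run either command.

```shell
# Illustrative spark-submit invocations (all paths and names are hypothetical).

# Submit a Python application locally, shipping extra modules with --py-files
spark-submit \
  --master local[4] \
  --py-files deps.zip \
  etl_job.py

# Submit a packaged Scala/Java application to a standalone cluster in
# cluster deploy mode, pulling an external dependency from Maven Central
# with --packages
spark-submit \
  --master spark://host:7077 \
  --deploy-mode cluster \
  --class com.example.ETLJob \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.4.0 \
  etl-job.jar
```

The ‘--py-files’ and ‘--packages’ options are the two dependency-management techniques the outcome refers to: the first ships local code alongside the job, the second resolves published artifacts at submission time.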

General Topics:

  • Introduction to Data Engineering
  • Python for Data Science, AI & Development
  • Python Project for Data Engineering
  • Introduction to Relational Databases (RDBMS)
  • Databases and SQL for Data Science with Python
  • Introduction to NoSQL Databases
  • Introduction to Big Data with Spark and Hadoop
  • Data Engineering and Machine Learning using Spark
  • Hands-on Introduction to Linux Commands and Shell Scripting
  • ETL and Data Pipelines with Shell, Airflow and Kafka
  • Getting Started with Data Warehousing and BI Analytics
Instruction & Assessment

Instructional Strategies:

  • Audio Visual Materials
  • Case Studies
  • Lectures
  • Practical Exercises

Methods of Assessment:

  • Other
  • Quizzes
  • Peer-reviewed projects graded with rubrics

Minimum Passing Score:

Supplemental Materials