Course Summary
Credit Type:
Course
ACE ID:
SKIL-0205
Organization:
Location:
Classroom-based
Length:
Self-paced, 30 hours
Dates Offered:
Credit Recommendation & Competencies
Level: Lower-Division Baccalaureate
Credits (SH): 2
Subject: Data Science or Data Engineering
Description

Objective:

The course objective is to help prepare the learner for a job role as a Data Wrangler.

Learning Outcomes:

  • Perform advanced grouping, aggregations, and filtering operations on DataFrames (see the Pandas sketch after this list)
  • Apply data wrangling functions using R
  • Apply data wrangling techniques using Trifacta
  • Create, load, and query Hive tables
  • Perform queries and utilize views on complex data types available in Hive
  • Use MapReduce to speed up the extraction of meaningful information from a dataset
  • Create a view from a Spark DataFrame and run SQL queries against it (see the PySpark sketch after this list)
  • Define and explore data in Windows
  • Create data pipelines using Apache Airflow
  • Recognize data modeling techniques and describe data modeling processes
  • Explore data architecture and implementation strategies using NoSQL, CAP theorem, and partitioning to improve performance
  • Perform data transformations, data cleaning, and statistical aggregations using Pandas DataFrames
  • Explore data in Pandas using popular chart types like the bar graph, histogram, pie chart, and box plot
  • Work with time series and string data in datasets
  • Work with masks and indexes, clean duplicated data, and assign columns as categorical to perform operations
  • Use machine learning in data analytics
  • Set up Keras, implement a deep learning algorithm, and build data pipelines using KNIME
  • Perform MongoDB actions related to data wrangling using Python with the PyMongo library
  • Gather, filter, modify, and query data using MongoDB
  • Use partitioning to boost query performance in HDFS
  • Apply bucketing of Hive tables to boost query performance and to use window functions
  • Use Combiners to make MapReduce applications more efficient by minimizing data transfers
  • Perform various analyses on datasets using Spark DataFrame API methods
  • Design and implement data lakes in the cloud and on-premises
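
As a brief illustration of the grouping, aggregation, and filtering outcome above, the following Python sketch uses Pandas; the sample data and column names (region, product, revenue) are assumptions made for the example, not course material.

    # Minimal sketch: grouped aggregation and group-level filtering in Pandas.
    # The sample data and column names are illustrative assumptions.
    import pandas as pd

    df = pd.DataFrame({
        "region": ["East", "East", "West", "West", "West"],
        "product": ["A", "B", "A", "B", "B"],
        "revenue": [100, 250, 80, 300, 120],
    })

    # Several aggregations per group
    summary = df.groupby("region")["revenue"].agg(["sum", "mean", "count"])

    # Keep only the groups whose total revenue exceeds a threshold
    big_regions = df.groupby("region").filter(lambda g: g["revenue"].sum() > 400)

    print(summary)
    print(big_regions)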
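
Similarly, a minimal PySpark sketch of creating a view from a Spark DataFrame and querying it with SQL might look as follows; the view name, columns, and query are illustrative assumptions.

    # Minimal sketch: register a Spark DataFrame as a temporary view and query it with SQL.
    # The view name, columns, and query are illustrative assumptions.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("wrangling-sketch").getOrCreate()

    sales = spark.createDataFrame(
        [("East", 100), ("East", 250), ("West", 300)],
        ["region", "revenue"],
    )

    # Expose the DataFrame to Spark SQL under a view name
    sales.createOrReplaceTempView("sales")

    # Query the view with standard SQL
    totals = spark.sql(
        "SELECT region, SUM(revenue) AS total_revenue FROM sales GROUP BY region"
    )
    totals.show()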

General Topics:

  • Python - Using Pandas to Work with Series and DataFrames
  • Python - Using Pandas for Visualizations and Time-Series Data
  • Python - Pandas Advanced Features
  • Cleaning Data in R
  • Technology Landscape and Tools for Data Management
  • Machine Learning and Deep Learning Tools in the Cloud
  • Data Wrangling with Trifacta
  • MongoDB Querying
  • MongoDB Aggregation
  • Getting Started with Hive
  • Loading and Querying Data with Hive
  • Viewing and Querying Complex Data with Hive
  • Optimizing Query Executions with Hive
  • Using Hive to Optimize Query Executions with Partitioning
  • Bucketing and Window Functions with Hive
  • Filtering Data Using Hadoop MapReduce
  • Hadoop MapReduce Applications With Combiners
  • Advanced Operations Using Hadoop MapReduce
  • Data Analysis Using the Spark DataFrame API
  • Data Analysis using Spark SQL
  • Data Lake Framework and Design Implementation
  • Data Lake Architectures and Data Management Principles
  • Data Architecture Deep Dive - Design and Implementation
  • Data Architecture Deep Dive - Microservices and Serverless Computing
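
The MongoDB Querying and MongoDB Aggregation topics above can be sketched in Python with the PyMongo library; the connection string, database, collection, and field names are assumptions made for the example.

    # Minimal sketch: filtering and aggregating documents with PyMongo.
    # The connection string, database, collection, and field names are assumptions.
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017/")
    orders = client["wrangling_demo"]["orders"]

    # Filter: orders over 100, returning only selected fields
    expensive = orders.find({"total": {"$gt": 100}}, {"_id": 0, "region": 1, "total": 1})

    # Aggregation pipeline: total revenue per region, highest first
    pipeline = [
        {"$group": {"_id": "$region", "revenue": {"$sum": "$total"}}},
        {"$sort": {"revenue": -1}},
    ]
    for doc in orders.aggregate(pipeline):
        print(doc)
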
Instruction & Assessment

Instructional Strategies:

  • Computer Based Training
  • Practical Exercises

Methods of Assessment:

  • Examinations
  • Quizzes

Minimum Passing Score:

70%
Supplemental Materials