# Course

Course Summary

Credit Type: Course
ACE ID: SKIL-0204
Organization:
Location: Classroom-based
Length: Self-paced, 30 hours
Dates Offered:

Credit Recommendation & Competencies

| Level | Credits (SH) | Subject |
| --- | --- | --- |
| Lower-Division Baccalaureate | 2 | Data Science |

Description

## Objective:

The course objective is to help prepare the learner for a job role as a Data Analyst.

## Learning Outcomes:

• Implement effective data management solutions
• Use Talend Open Studio to showcase the ETL concept
• Create a data model
• Work with functions which apply to each element of an array
• Perform reshape operations on arrays to visualize their contents in different ways
• Perform operations between arrays of mismatched shapes by applying broadcasting rules
• Utilize NumPy to perform multi-dimensional array operations
• Work with Pandas Series by accessing elements using the default and a custom index
• Apply a join operation on two related but dissimilar DataFrames using the merge function
• Use Pandas for advanced tabular data manipulation
• Apply classification and clustering methods to data science problems using R
• Compare and contrast SQL and NoSQL database solutions
• Use Visual Paradigm to create a relational database ERD
• Describe distributed systems from a data perspective
• Install NumPy and learn how to create basic NumPy arrays
• Apply boolean masks to access array elements which fulfil a specific condition
• Install Pandas and create a Pandas Series
• Apply a multi-index to a DataFrame and reshape it using the stack and melt operations
• Implement a hierarchical index and access the DataFrame's contents based on that index
• Create, manipulate, and sort vectors in R
• Create factors and data frames in R
• Perform matrix operations in R
• Export tabular data from R to a CSV file, an Excel spreadsheet, and an HTML table
• Use the dplyr library including working with tabular data, piping data, mutating data, summarizing data, combining datasets, and grouping data
• Apply regression methods to data science problems using R
• Differentiate between inferential and descriptive statistics, enumerate the two most important types of descriptive statistics, and define the formula for standard deviation
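
Several of the NumPy and Pandas outcomes above (broadcasting, boolean masks, and the merge function) can be illustrated with a minimal sketch. This is not taken from the course materials; the array shapes, table names, and column names are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Broadcasting: a (3, 1) column combines with a (4,) row to give a (3, 4) grid.
col = np.arange(3).reshape(3, 1)
row = np.arange(4)
grid = col * 10 + row

# Boolean mask: keep only the elements that satisfy a condition.
evens = grid[grid % 2 == 0]

# Merge: join two related but dissimilar DataFrames on a shared key column.
# The tables and column names below are hypothetical.
students = pd.DataFrame({"student_id": [1, 2, 3], "name": ["Ana", "Ben", "Cy"]})
scores = pd.DataFrame({"student_id": [2, 3, 4], "score": [88, 71, 95]})
joined = students.merge(scores, on="student_id", how="inner")
```

An inner join keeps only the keys present in both tables; passing `how="left"` or `how="outer"` to `merge` would retain unmatched rows instead.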

## General Topics:

• Data architecture getting started
• Data engineering getting started
• Python - introduction to NumPy for multi-dimensional data
• Python - advanced operations with NumPy Arrays
• Python - introduction to Pandas and DataFrames
• Python - manipulating and analyzing data in Pandas DataFrames
• R data structures
• Importing and exporting data using R
• Data exploration using R
• R regression methods
• R classification and clustering
• Simple descriptive statistics
• Common approaches to sampling data
• Inferential statistics
• Apache Spark getting started
• Hadoop and MapReduce getting started
• Developing a basic MapReduce Hadoop application
• Introduction to the Shell for Hadoop HDFS
• Working with files in Hadoop HDFS
• Data silos, lakes, and streams introduction
• Data lakes on AWS
• Data lake sources, visualizations, and ETL operations
• Applied data analysis

Instruction & Assessment

## Instructional Strategies:

• Computer Based Training
• Practical Exercises

## Methods of Assessment:

• Examinations
• Quizzes

## Minimum Passing Score:

70%