Course

Credit Type:

Course

ACE ID:

SKIL-0204

Version:

Organization:

SkillSoft Corporation

Location:

Classroom-based

Length:

Self-paced, 30 hours

Minimum Passing Score:

ACE Credit Recommendation Period:

Credit Recommendation & Competencies

Level	Credits (SH)	Subject
Lower-Division Baccalaureate	2	Data Science

Description

Objective:

The course objective is to help prepare the learner for a job role as a Data Analyst.

Learning Outcomes:

Implement effective data management solutions
Use Talend Open Studio to showcase the ETL concept
Create a data model
Work with functions which apply to each element of an array
Perform reshape operations on arrays to visualize their contents in different ways
Perform operations between arrays of mismatched shapes by applying broadcasting rules
Utilize NumPy to perform multi-dimensional array operations
Work with Pandas Series by accessing elements using the default and a custom index
Apply a join operation on two related but dissimilar DataFrames using the merge function
Use Pandas for advanced tabular data manipulation
Apply classification and clustering methods to data science problems using R
Compare and contrast SQL and NoSQL database solutions
Use Visual Paradigm to create a relational database ERD
Describe distributed systems from a data perspective
Install NumPy and learn how to create basic NumPy arrays
Apply boolean masks to access array elements which fulfil a specific condition
Install Pandas and create a Pandas Series
Apply a multi-index to a DataFrame and reshape it using the stack and melt operations
Implement a hierarchical index and access the DataFrame's contents based on that index
Create, manipulate, and sort vectors in R
Create factors and data frames in R
Perform matrix operations in R
Export tabular data from R to a CSV file, an Excel spreadsheet, and an HTML table
Use the dplyr library to work with tabular data, including piping, mutating, summarizing, and grouping data and combining datasets
Apply regression methods to data science problems using R
Differentiate between inferential and descriptive statistics, enumerate the two most important types of descriptive statistics, and define the formula for standard deviation

General Topics:

Data architecture getting started
Data engineering getting started
Python - introduction to NumPy for multi-dimensional data
Python - advanced operations with NumPy Arrays
Python - introduction to Pandas and DataFrames
Python - manipulating and analyzing data in Pandas DataFrames
R data structures
Importing and exporting data using R
Data exploration using R
R regression methods
R classification and clustering
Simple descriptive statistics
Common approaches to sampling data
Inferential statistics
Apache Spark getting started
Hadoop and MapReduce getting started
Developing a basic MapReduce Hadoop application
Hadoop HDFS getting started
Introduction to the Shell for Hadoop HDFS
Working with files in Hadoop HDFS
Hadoop HDFS file permissions
Data silos, lakes, and streams introduction
Data lakes on AWS
Data lake sources, visualizations, and ETL operations
Applied data analysis

Instruction & Assessment

Instructional Strategies:

Computer Based Training
Practical Exercises

Methods of Assessment:

Examinations
Quizzes
