Course Summary
Credit Type:
Self-paced, 30 hours
Dates Offered:
Credit Recommendation & Competencies
Level                          Credits (SH)   Subject
Lower-Division Baccalaureate   2              Data Science


The course objective is to prepare the learner for a role as a Data Analyst.

Learning Outcomes:

  • Implement effective data management solutions
  • Use Talend Open Studio to showcase the ETL concept
  • Create a data model
  • Work with functions which apply to each element of an array
  • Perform reshape operations on arrays to visualize their contents in different ways
  • Perform operations between arrays of mismatched shapes by applying broadcasting rules
  • Utilize NumPy to perform multi-dimensional array operations
  • Work with Pandas Series by accessing elements using the default and a custom index
  • Apply a join operation on two related but dissimilar DataFrames using the merge function
  • Use Pandas for advanced tabular data manipulation
  • Apply classification and clustering methods to data science problems using R
  • Compare and contrast SQL and NoSQL database solutions
  • Use Visual Paradigm to create a relational database ERD
  • Describe distributed systems from a data perspective
  • Install NumPy and learn how to create basic NumPy arrays
  • Apply boolean masks to access array elements which fulfil a specific condition
  • Install Pandas and create a Pandas Series
  • Apply a multi-index to a DataFrame and reshape it using the stack and melt operations
  • Implement a hierarchical index and access the DataFrame's contents based on that index
  • Create, manipulate, and sort vectors in R
  • Create factors and data frames in R
  • Perform matrix operations in R
  • Export tabular data from R to a CSV file, an Excel spreadsheet, and an HTML table
  • Use the dplyr library to work with tabular data, pipe data, mutate data, summarize data, combine datasets, and group data
  • Apply regression methods to data science problems using R
  • Differentiate between inferential and descriptive statistics, enumerate the two most important types of descriptive statistics, and define the formula for standard deviation
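
For orientation, several of the Python outcomes above (reshaping, broadcasting, boolean masks, Series indexing, merging, and the standard deviation formula) can be sketched in a few lines. This is only an illustrative sketch, not course material: the variable names and the small fruit dataset are invented for the example, and it assumes NumPy and Pandas are already installed.

```python
import numpy as np
import pandas as pd

# Reshape: view the same six values as a 2x3 grid.
grid = np.arange(6).reshape(2, 3)            # [[0 1 2], [3 4 5]]

# Broadcasting: a (2, 3) array plus a (3,) array; the 1-D array is
# applied to every row of the 2-D array.
offsets = np.array([10, 20, 30])
shifted = grid + offsets                     # [[10 21 32], [13 24 35]]

# Boolean mask: keep only the elements that satisfy a condition.
evens = shifted[shifted % 2 == 0]            # [10 32 24]

# Pandas Series with a custom index (hypothetical data).
prices = pd.Series([1.5, 2.0, 0.75], index=["apple", "pear", "plum"])
by_label = prices["pear"]                    # access by custom index label
by_position = prices.iloc[0]                 # access by default integer position

# Join two related but dissimilar DataFrames with merge.
orders = pd.DataFrame({"order_id": [1, 2, 3],
                       "fruit": ["pear", "plum", "kiwi"]})
catalog = pd.DataFrame({"fruit": ["apple", "pear", "plum"],
                        "price": [1.5, 2.0, 0.75]})
# A left join keeps every order; fruits missing from the catalog get NaN.
joined = orders.merge(catalog, on="fruit", how="left")

# Population standard deviation: sqrt(mean((x - mean(x))^2)).
values = np.array([2, 4, 4, 4, 5, 5, 7, 9])
std = np.sqrt(np.mean((values - values.mean()) ** 2))
```

The left join is used here so that unmatched rows remain visible; an inner join would silently drop the order whose fruit is not in the catalog.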

General Topics:

  • Data architecture getting started
  • Data engineering getting started
  • Python - introduction to NumPy for multi-dimensional data
  • Python - advanced operations with NumPy Arrays
  • Python - introduction to Pandas and DataFrames
  • Python - manipulating and analyzing data in Pandas DataFrames
  • R data structures
  • Importing and exporting data using R
  • Data exploration using R
  • R regression methods
  • R classification and clustering
  • Simple descriptive statistics
  • Common approaches to sampling data
  • Inferential statistics
  • Apache Spark getting started
  • Hadoop and MapReduce getting started
  • Developing a basic MapReduce Hadoop application
  • Hadoop HDFS getting started
  • Introduction to the Shell for Hadoop HDFS
  • Working with files in Hadoop HDFS
  • Hadoop HDFS file permissions
  • Data silos, lakes, and streams introduction
  • Data lakes on AWS
  • Data lake sources, visualizations, and ETL operations
  • Applied data analysis
Instruction & Assessment

Instructional Strategies:

  • Computer Based Training
  • Practical Exercises

Methods of Assessment:

  • Examinations
  • Quizzes

Minimum Passing Score:

Supplemental Materials