Course

Course Summary
Credit Type:
Course
ACE ID:
STAT-0046
Organization's ID:
613
Organization:
Length:
4 weeks (60 hours)
Dates Offered:
Credit Recommendation & Competencies
Level      Credits (SH)   Subject
Graduate   3              statistics
Description

Objective:

This course teaches students to use machine learning and statistical methods to identify clusters in multivariate data, i.e., groups of cases that have relatively high within-group similarity. Using those same methods, and additional ones, students also learn to identify anomalies (also called outliers): cases that are relatively unusual compared with the rest of the data. Students first cover the building blocks: measuring distance between records and distance between clusters. They then learn to use hierarchical clustering and k-means clustering algorithms, as well as normal mixture models, to identify clusters (and, by extension, anomalies). Finally, students learn additional statistical methods for identifying anomalies.
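
The building blocks named in the objective (distance between records and distance between clusters) can be illustrated with a minimal sketch. It is not taken from the course materials: the function names, the toy records, and the choice of z-score normalization with Euclidean distance are all assumptions made for illustration, and it uses only the Python standard library.

```python
import math
from statistics import mean, pstdev

def zscore_columns(rows):
    """Rescale each column to mean 0 and population standard deviation 1."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # fall back to 1.0 for a constant column
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]

def euclidean(a, b):
    """Euclidean distance between two records."""
    return math.dist(a, b)

def linkage_distance(cluster_1, cluster_2, method="single"):
    """Distance between two clusters, built from all pairwise record distances."""
    pairwise = [euclidean(a, b) for a in cluster_1 for b in cluster_2]
    if method == "single":      # closest pair of records
        return min(pairwise)
    if method == "complete":    # farthest pair of records
        return max(pairwise)
    if method == "average":     # mean over all pairs
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown linkage method: {method}")

# Hypothetical toy data: two variables measured on very different scales.
records = [[1.0, 100.0], [2.0, 110.0], [9.0, 300.0], [10.0, 310.0]]
scaled = zscore_columns(records)

cluster_a, cluster_b = scaled[:2], scaled[2:]
single = linkage_distance(cluster_a, cluster_b, "single")
average = linkage_distance(cluster_a, cluster_b, "average")
complete = linkage_distance(cluster_a, cluster_b, "complete")
```

Without the normalization step, the second variable would dominate every distance simply because its values are larger. For any pair of clusters, single linkage is never larger than average linkage, which is never larger than complete linkage.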

Learning Outcomes:

  • normalize data appropriately and calculate distances between records
  • use different metrics to calculate distances between clusters
  • conduct hierarchical cluster analysis and k-means clustering to identify clusters in multivariate data
  • fit normal mixture models for continuous variables
  • interpret and diagnose the output of different clustering procedures
  • identify the assignment of individual cases to clusters
  • use the clustering output to identify potential anomalies
  • use exploratory and model-based statistical methods to identify anomalies (termed "outliers" in statistics)
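
As a rough illustration of the outcomes above that concern assigning cases to clusters and using clustering output to find potential anomalies, the following sketch runs a plain k-means (Lloyd's algorithm) and flags cases that lie unusually far from their assigned centroid. It is an assumption-laden example, not the course's method: the data, the fixed starting centroids, and the "more than 3 times the median distance" rule are all hypothetical choices.

```python
import math
from statistics import median

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]

def kmeans(points, start_centroids, max_iter=50):
    """Lloyd's algorithm from given starting centroids; returns (centroids, labels)."""
    centroids = [list(c) for c in start_centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centroid if a cluster ends up empty.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else old)
        if updated == centroids:  # no centroid moved, so the assignment is stable
            break
        centroids = updated
    return centroids, assign(points, centroids)

# Hypothetical toy data: two tight groups plus one case far from both.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.0),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (5.0, 5.1),
          (3.0, 9.0)]

centroids, labels = kmeans(points, start_centroids=[points[0], points[4]])

# Distance from each case to the centroid of its own cluster.
dist_to_centroid = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]

# Assumed screening rule: flag cases farther than 3x the median distance.
cutoff = 3 * median(dist_to_centroid)
flagged = [i for i, d in enumerate(dist_to_centroid) if d > cutoff]
```

Here `labels` gives the cluster assignment of each case and `flagged` lists the indices of potential anomalies. A flagged case is only a candidate: an extreme point also pulls its centroid toward itself, so the cutoff and the clustering should be examined together before treating it as an outlier.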

General Topics:

  • Defining a cluster
  • Defining an anomaly
  • Machine learning clustering methods
  • Statistical methods
  • Applications
Instruction & Assessment

Instructional Strategies:

  • Computer Based Training
  • Discussion
  • Practical Exercises

Methods of Assessment:

  • Quizzes
  • Project

Minimum Passing Score:

80%
Supplemental Materials