Course

Course Summary
Credit Type:
Course
ACE ID:
SKIL-0226
Organization:
Location:
Online
Length:
59.5 hours and 32 lab hours
Dates Offered:
Credit Recommendation & Competencies
Level | Credits (SH) | Subject
Upper-Division Baccalaureate | 2 | software engineering
Description

Objective:

The course objective is to provide you with the knowledge and skills required to transition from a Network Admin to a Site Reliability Engineer (SRE). The course starts by giving network administrators a better understanding of the role an SRE plays and its importance within the enterprise. From there, the Network Admin can move into the DevOps role. Here, learners explore release engineering concepts, roles and best practices for software development, and automation. Next, the course covers the role of Chaos Engineer, where the focus is on troubleshooting techniques and best practices, software reliability testing, and, ultimately, deploying products at scale. During the final stages of the course, learners acquire the skills to become Site Reliability Engineers who are involved in scaling the SRE team and making the right decisions for the greatest business impact.

Learning Outcomes:

  • explore backup and recovery best practices
  • describe SRE scenario planning
  • explore best practices for build and release engineering, automation, and simplicity
  • discover best practices for SRE postmortem culture and cloud and container architectures
  • explore SRE emergency response and incident handling
  • discuss SRE testing for reliability, load balancing, and overload and cascade failures
  • deploy products at scale
  • manage software reliability metrics
  • define the SRE engagement model
  • define and explore Site Reliability
  • examine modern operating system deployment strategies
  • monitor distributed systems
  • troubleshoot effectively for the SRE
  • define SRE distributed reliability, data pipelines, and integrity
  • explore scaling the SRE team, dealing with interrupts and overload, and communication and collaboration

General Topics:

  • Site reliability: engineering
  • Site reliability: tools and automation
  • OS deployment strategies: upgrading and maintaining systems
  • OS deployment strategies: deploying modern systems
  • OS deployment strategies: maintaining and managing modern systems
  • Backup and recovery: business continuity and disaster recovery
  • Backup and recovery: enterprise backup strategies
  • Backup and recovery: Windows client backup and recovery tools
  • Describing distributed systems
  • Monitoring distributed systems
  • Site reliability engineering: scenario planning
  • Build and release engineering best practices: release engineering
  • Build and release engineering best practices: release management
  • Best practices for the SRE: automation
  • Best practices for the SRE: use cases for automation
  • SRE simplicity: software system complexity
  • SRE simplicity: simple software systems
  • SRE postmortems: blameless postmortem culture creation
  • Cloud and containers for the SRE: cloud architectures and solutions
  • Cloud and containers for the SRE: containers
  • Cloud and containers for the SRE: implementing container solutions
  • SRE troubleshooting processes
  • SRE troubleshooting: tools
  • SRE emergency and incident response: responding to emergencies
  • SRE emergency and incident response: incident response
  • SRE testing tasks: software reliability and testing
  • SRE testing tasks: testing considerations
  • SRE load balancing techniques: front-end load balancing
  • SRE load balancing techniques: data center load balancing
  • Site reliability engineer: managing overloads
  • Site reliability engineer: managing cascading failures
  • Distributed reliability: SRE critical state management
  • Distributed reliability: SRE distributed periodic scheduling
  • SRE data pipelines and integrity: data pipelines
  • SRE data pipelines and integrity: pipeline design
  • SRE data pipelines and integrity: data integrity
  • SRE products at scale: product launches
  • SRE team management
  • Core skills for site reliability engineers
  • SRE metric management
  • SRE engagement
Instruction & Assessment

Instructional Strategies:

  • Computer Based Training
  • Practical Exercises

Methods of Assessment:

  • Examinations
  • Quizzes

Minimum Passing Score:

70%
Supplemental Materials