Course

Credit Type: Course
ACE ID: SKIL-0226
Version: 2
Organization:
Location: Online
Length: 48.75 hours (52 weeks)
Minimum Passing Score: 70
ACE Credit Recommendation Period:
Credit Recommendation & Competencies
Level                          Credits (SH)   Subject
Lower-Division Baccalaureate   2              Software Engineering
Description

Objective:

The course objective is to provide learners with the knowledge and skills required to transition from Network Admin to Site Reliability Engineer (SRE). The course starts by giving network administrators a clearer understanding of the SRE role and its importance within the enterprise. From there, the Network Admin moves into the DevOps role, where learners explore release engineering concepts, roles, and best practices for software development and automation. Next, the course covers the Chaos Engineer role, focusing on troubleshooting techniques and best practices, software reliability testing, and, ultimately, deploying products at scale. In the final stage of the course, learners acquire the skills to work as Site Reliability Engineers, including scaling the SRE team and making the decisions that deliver the greatest business impact.

Learning Outcomes:

  • Understand and apply Site Reliability Engineering (SRE) principles, including defining SRE, exploring scenario planning, and managing software reliability metrics.
  • Master modern deployment strategies, backup and recovery best practices, and effective troubleshooting techniques for maintaining system reliability.
  • Implement best practices for build and release engineering, automation, simplicity, and postmortem culture within cloud and container architectures.
  • Enhance SRE team dynamics by exploring strategies for scaling teams, managing interrupts and overloads, and fostering effective communication and collaboration.
  • Develop skills in monitoring and managing distributed systems, including SRE emergency response, incident handling, and testing for reliability.

General Topics:

  • Introduction to Site Reliability Engineering (SRE) Principles
  • Tools and Automation for Site Reliability Engineering
  • Modern Operating System Deployment Strategies
  • Upgrading and Maintaining Operating Systems
  • Business Continuity and Disaster Recovery Strategies
  • Enterprise Backup and Recovery Solutions
  • Monitoring and Managing Distributed Systems
  • Scenario Planning and Incident Response for SRE
  • Best Practices in Build and Release Engineering
  • Simplifying Software Systems for Reliability
  • Creating a Blameless Postmortem Culture
  • Cloud Architectures and Container Solutions for SRE
  • Effective Troubleshooting Processes and Tools for SRE
  • Testing for Software Reliability and Load Balancing Techniques
  • Managing Overloads and Cascading Failures in SRE
  • Distributed System Reliability and Data Pipeline Integrity
  • Launching and Scaling Products
  • Team Management and Collaboration in SRE
  • Core Skills and Competencies for Site Reliability Engineers
  • Managing Software Reliability Metrics and SRE Engagement Models
Instruction & Assessment

Instructional Strategies:

  • Computer-Based Training
  • Practical Exercises

Methods of Assessment:

  • Examinations
  • Quizzes
