Executive Site Reliability Manager (40446)
We are looking for an Executive Site Reliability Manager with a proven executive background in SRE or IT operations. You will serve as the chief authority on reliability and lead the design and execution of a comprehensive reliability strategy. The role involves mentoring a distributed team of Site Reliability Managers and overseeing domain-wide stability programs. Drawing on deep technical expertise in SRE principles and incident management, you will act as the executive escalation point during major incidents.
🚀 Project
- serve as the chief stability and reliability authority
- lead the design and execution of a reliability strategy
- define and champion the company’s reliability vision
- direct and mentor a distributed team of Site Reliability Managers (SRMs)
- oversee domain-wide stability programs and coordinate reliability initiatives
- act as the executive escalation point during major incidents
- ensure comprehensive observability and monitoring
- oversee the reliability of CI/CD pipelines and deployment strategies
- lead chaos engineering and capacity planning initiatives
- guide risk assessment for major releases and configuration changes
- partner with engineering and product leaders
- promote a blameless postmortem culture and facilitate learning
🎯 Skills
- proven executive experience in SRE or IT operations
- deep technical expertise in SRE principles, incident management, and cloud/hybrid architectures
- demonstrated success leading cross-functional teams
- strong familiarity with observability tools (Prometheus, Grafana, etc.)
- experience with deployment and infrastructure automation tools (Kubernetes, Terraform, Ansible)
- exceptional communication skills
- experience with ITIL, DevOps, and structured management frameworks
- English at C1 level