Sr Manager, Site Reliability Engineering
United Airlines (Chicago, IL)
Achieving our goals starts with supporting yours. Grow your career, access top-tier health and wellness benefits, build lasting connections with your team and our customers, and travel the world using our extensive route network.
Come join us to create what’s next. Let’s define tomorrow, together.
Description
Job overview and responsibilities
As the Senior Manager of Site Reliability Engineering, you are responsible for guiding a team dedicated to the instrumentation and analysis of vital business applications, ensuring their availability, and contributing to major incident resolution and root cause analysis. You are accountable for setting the strategy for IT operations tools and methodologies, as well as for their assessment, deployment, and management. You lead technical experts who specialize in evaluating enterprise reliability and improving system efficiency. You are also expected to build and maintain strong relationships with digital technology and business executives at all levels. You will use your deep technical knowledge and outstanding leadership and analytical abilities to lead your team in creating highly available applications, adhering to best practices, and driving evidence-based system optimization in partnership with development teams through modern DevOps practices.
Design, Develop & Drive Outcomes:
+ Understand the potential impact of system requirements and design choices across multiple cloud and on-premises technologies
+ Embrace the role of developing and mentoring the Site Reliability Engineering team, fostering expertise in this critical area
+ Guide the team to devise solutions that not only meet long-term objectives but also effectively address urgent technical debt
+ Position yourself as a prominent thought leader in Site Reliability Engineering principles, influencing others through your knowledge and experience
+ Regularly disseminate best practices and champion process improvements, both within your team and in collaboration with other teams, to drive collective success
Program Management & Delivery:
+ Track the team’s progress on projects and key performance indicators, while also offering concrete, actionable suggestions for further enhancing or influencing product or project delivery
+ Encourage cross-functional collaboration and gather input from technology teams to promote ongoing program enhancement
+ Regularly provide insights on critical Site Reliability Engineering metrics to showcase the program’s achievements and identify potential areas for improvement
+ Keep an updated collection of materials to communicate the current status, including progress, obstacles, opportunities, and the program’s strategic direction to Digital Technology leaders
+ Effectively manage both internal and external relationships to foster and sustain beneficial strategic partnerships, thereby advancing the success of the Site Reliability Engineering Program
+ Develop and roll out training initiatives to ensure that partners are well-equipped to fully utilize Observability programs
+ Oversee the 24/7 command center teams, ensuring they are adept at early detection, triage, and recovery for all applications and services, which contributes to a reduced mean time to recovery
Talent Management and People Development:
+ Initiate and facilitate the performance assessment process for your team, fostering an environment that encourages individuals at all performance tiers to excel
+ Establish and nurture relationships with team members to create a foundation of trust, recognizing areas where technical or analytical skills are lacking and devising strategies for improvement
+ Regularly encourage team members to exchange expertise about Site Reliability Engineering practices and embrace new technologies
+ Lead and inspire teams to tackle intricate challenges and champion the use of open-source technologies and solutions
Organizational Effectiveness / People:
+ Possess robust technical expertise and leadership qualities, leading by example with a proven track record in Site Reliability Engineering
+ Your proficiency in driving the creation of multi-cloud infrastructure serves as a benchmark and motivates the team of developers and infrastructure engineers
+ Collaborate with your engineers to manage project dependencies, adeptly negotiate and plan for incremental delivery milestones with stakeholders, and achieve on-time project completion
+ Work closely with product teams to understand and address their performance and resilience concerns, and formulate sustainable strategies to resolve persistent challenges
Engineering Excellence and Practices:
+ Continuously work on enhancing the reliability, stability, and performance of our digital platforms, being at the forefront of promoting engineering excellence, implementing best practices, and overseeing the integration of fully automated telemetry within modern DevOps frameworks
+ Your work in advancing problem detection and service restoration processes is pivotal
+ Utilizing cutting-edge Site Reliability Engineering methods, coupled with automated alerting and self-healing mechanisms, you are instrumental in improving both cloud-based and on-premises systems, thereby fortifying our digital infrastructure’s robustness and efficiency
Qualifications
What’s needed to succeed (Minimum Qualifications):
+ Bachelor's degree in Information Technology, Business Administration, Computer Science or relevant field
+ 7+ years of IT and business/industry work experience
+ 5+ years of Site Reliability Engineering experience working with telemetry, observability, self-healing solutions, and platform automation
+ 5+ years of experience leading projects and managing people
+ 2 - 3 years of leadership experience in managing cross-functional teams or projects, and influencing senior level management and key stakeholders
+ 2+ years of experience with leading DevOps practices and tools (CI/CD pipelines, Jenkins, GitHub)
+ Recognized expertise in the field, within the industry and/or within United
+ Proven expertise in leading and influencing technical staff or coordinating work across multiple technology teams
+ Proven experience with monitoring, logging and telemetry tools like Dynatrace, Splunk, Prometheus, AWS CloudWatch, etc.
+ Proficiency with DevOps practices and tools (CI/CD pipelines, Jenkins, GitHub)
+ Ability to diagnose and troubleshoot issues effectively
+ Strong and effective communication skills and status reporting
+ Experience with AWS networking services like VPC, Route 53, and CloudFront, with understanding of cloud concepts like IaaS, PaaS, and SaaS
+ Experience with AWS compute, storage, and infrastructure services such as EC2 (Elastic Compute Cloud), S3 (Simple Storage Service), RDS (Relational Database Service), VPC (Virtual Private Cloud), Lambda, and CloudFormation
+ Experience in developing monitoring tools and log analysis tools to manage operations
+ Experience in one or more general-purpose programming languages: Python, JavaScript, shell scripting (Unix/Linux)
+ Dynatrace Associate Certification or AWS Certified DevOps Engineer is a plus
+ Must be legally authorized to work in the United States for any employer without sponsorship
+ Successful completion of interview required to meet job qualification
+ Reliable, punctual attendance is an essential function of the position
The base pay range for this role is $137,275.00 to $187,000.00.
The base salary range/hourly rate listed is dependent on job-related, non-discriminatory factors such as experience, education, and skills. This position is also eligible for bonus and/or long-term incentive compensation awards.
You may be eligible for the following competitive benefits: medical, dental, vision, life, accident & disability, parental leave, employee assistance program, commuter, paid holidays, paid time off, 401(k) and flight privileges.
United Airlines is an equal opportunity employer. United Airlines recruits, employs, trains, compensates and promotes regardless of race, religion, color, national origin, gender identity, sexual orientation, physical ability, age, veteran status and other protected status as required by applicable law. Equal Opportunity Employer - Minorities/Women/Veterans/Disabled/LGBT.
We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process and to perform essential job functions. Please contact [email protected] to request accommodation.