Senior Site Reliability Engineer, Crew IT
Delta Air Lines, Inc. (Atlanta, GA)
How you'll help us Keep Climbing (overview & key responsibilities)
At Delta Air Lines, connection is at the heart of everything we do and guides our every action. We strive to welcome and care for all of our customers during their travels with us and aim to deliver an elevated experience.
Delta is focused on sustaining a strong IT operation, growing our capabilities, and maximizing optimization across each of our tech hubs to elevate the travel experience for our customers and empower our 90,000 Delta people.
We’re committed to fostering innovation, and we’re excited to invite you to be part of our journey as we shape the future of technology at the world’s best airline!
The Senior Site Reliability Engineer works to improve the reliability and resiliency of Delta software solutions to meet business requirements by implementing SRE tools, processes, and standard methodologies. SRE is what happens when you ask a software engineer to design an operations function. The Senior Site Reliability Engineer designs, develops, tests, debugs, and automates tasks for applications. They troubleshoot incidents to address failure patterns, automate remediation through runbooks, and document application optimization.
Responsibilities
+ Supporting a reliable application suite for the environment in order to meet the development and maintenance requirements of systems/platforms.
+ Working as part of the development team to evaluate the health, stability, and reliability of applications.
+ Utilizing monitoring, alerts, dashboards, and management tools to ensure the availability, reliability and performance of applications and services.
+ Constantly working to improve and implement automation of application tasks.
+ Providing technical support for systems/platforms according to application SLAs.
+ Developing resiliency in the application code, troubleshooting incidents, engaging with squads to address failure patterns, and participating in incident management.
+ Leading and mentoring junior team members and software engineers to enhance our SRE practice.
What you need to succeed (minimum qualifications)
+ 5 or more years of hands-on experience as a Site Reliability Engineer or related technical engineering capacity.
+ Experience handling large numbers of diverse systems with configuration management systems like Puppet, Chef, or Ansible.
+ Experience with developing and maintaining tools, dashboards and scripts to monitor application functions across a wide array of systems to detect and resolve issues with an aim towards maintaining optimal conditions for system applications.
+ Knowledge of software engineering; ability to deliver new or enhanced fee-based software products.
+ Proficient in one or more scripting languages or tools such as JavaScript, Node.js, Python, Ansible, or Bash.
+ Strong documentation skills, with the ability to create and maintain clear, concise, and actionable technical documentation, including runbooks, incident reports, architectural diagrams, and operational procedures. Commitment to documentation as a first-class engineering practice.
+ Strong experience with monitoring and alerting systems like Prometheus, Grafana, Datadog and PagerDuty.
+ Knowledge of agile methodologies and the agile development lifecycle; ability to use formal agile methodologies, disciplines, practices and techniques for the delivery of new and enhanced applications.
+ Knowledge of concepts, values and tools applied in building Continuous Integration (CI), Continuous Delivery and Continuous Deployment (CD) pipelines; ability to design, build, implement and maintain CI/CD pipelines to automate the software delivery process.
+ Experience engineering software within an Amazon Web Services (AWS) cloud infrastructure or other prominent enterprise cloud provider.
+ Experience in containerized workloads and management platforms such as Docker or Kubernetes.
+ Understanding of standard networking protocols and components such as HTTP, DNS, ECMP, TCP/IP, ICMP, the OSI model, subnetting, and load balancing strategies.
+ Knowledge of the theories and methodologies of reliability engineering; ability to design, develop and support various tools, services and applications to maintain a reliable site environment.
+ Embraces a diverse set of people, thinking and styles.
+ Consistently makes safety and security, of self and others, the priority.
+ High School diploma, GED or High School Equivalency.
What will give you a competitive edge (preferred qualifications)
+ Experience using tools and services such as AWS CloudWatch and Dynatrace.
+ Experience with Change, Incident, Problem and Configuration Management tools such as PagerDuty and ServiceNow.
+ Experience with IaC tools such as CFT, CDK or Terraform.
+ Experience with reliability engineering practices, including incident response practices, capacity planning and SLA tracking.
+ Bachelor's degree in Computer Science, Information Systems or related technical field.
+ Experience working in an airline technology environment.
Benefits and Perks to Help You Keep Climbing
Our culture is rooted in a shared dedication to living our values – Care, Integrity, Resilience and Servant Leadership – every day, in everything we do. At Delta, our people are our success. At the heart of what we offer is our focus on Sharing Success with Delta employees. Exploring a career at Delta gives you a chance to see the world while earning great compensation and benefits to help you keep climbing along the way:
+ Competitive salary, industry-leading profit sharing program, and performance incentives
+ 401(k) with generous company contributions up to 9%
+ New hires are eligible for up to 2 weeks of vacation. This is earned for use in the following vacation year (April 1 – March 31)
+ In addition to vacation, new hires are eligible for up to 56 hours of paid personal time within a 12-month period
+ 10 paid holidays per calendar year
+ Birthing parents are eligible for 12 weeks of paid maternity/parental leave
+ Non-birthing parents are eligible for 2 weeks of paid parental leave
+ Comprehensive health benefits including medical, dental, vision, short/long term disability and life insurance benefits
+ Family care assistance through fertility support, surrogacy and adoption assistance, lactation support, subsidized back-up care, and programs that help with loved ones in all stages
+ Holistic Wellbeing programs to support physical, emotional, social, and financial health, including access to an employee assistance program offering support for you and anyone in your household, free financial coaching, and extensive resources supporting mental health
+ Domestic and International space-available flight privileges for employees and eligible family members
+ Career development programs to achieve your long-term career goals
+ World-wide partnerships to engage in community service and innovative goals created to focus on sustainability and reducing our carbon footprint
+ Business Resource Groups created to connect employees with common interests to promote inclusion, provide perspective and help implement strategies
+ Recognition rewards and awards through the platform Unstoppable Together
+ Access to over 500 discounts, specialty savings and voluntary benefits through Deltaperks such as car and hotel rentals and auto, home, and pet insurance, legal services, and childcare
Delta Air Lines, Inc. is an Equal Employment Opportunity / Affirmative Action employer and provides reasonable accommodation in its application process for qualified individuals with disabilities and disabled veterans. If you are a qualified individual, you may request a reasonable accommodation if you are unable or limited in your ability to access job openings through this site, apply for jobs through Delta’s online system, or at any point in the selection process. To request a reasonable accommodation, please click here