Senior SRE Engineer - M&T Bank (Buffalo, NY)
Job Overview:
We are looking for a highly motivated **Senior SRE Engineer** with a strong background in **Observability** to join our growing team. This role requires a seasoned professional to guide our team in building, scaling, and maintaining observability solutions that help ensure our systems and services are highly available, performant, and secure.
Responsibilities:
+ Lead the development and implementation of observability tools and practices across multiple platforms, including monitoring, logging, tracing, and alerting.
+ Work closely with product and engineering teams to define observability standards, goals, and best practices.
+ Design and optimize the architecture of observability infrastructure to provide clear insights into the health, performance, and scalability of services.
+ Troubleshoot and diagnose complex issues related to performance and availability, offering actionable insights and solutions.
+ Mentor and guide junior SREs on observability tools and practices, fostering a culture of reliability and proactive monitoring.
+ Manage incidents and post-incident reviews to continuously improve monitoring systems and practices.
+ Partner with DevOps, Software Engineers, and other stakeholders to ensure seamless integration of observability tools with CI/CD pipelines.
+ Implement and maintain high-availability monitoring and alerting systems.
+ Ensure automation of observability tooling to scale with the growth of systems and services.
Education and Experience Required:
Combined minimum of 6 years' higher education and/or work experience in systems design, management, and/or architecture.
5+ years of experience in Site Reliability Engineering, DevOps, system design and/or architecture, or similar roles.
3+ years of experience leading or managing observability initiatives.
Strong hands-on experience with monitoring tools like Kibana, Dynatrace, Datadog, or similar.
Solid understanding of observability concepts (metrics, logging, tracing, alerting) and frameworks (e.g., OpenTelemetry).
Experience with cloud environments such as AWS, Google Cloud, or Azure.
Familiarity with containerization (Docker, Kubernetes) and orchestration platforms.
Excellent problem-solving skills and ability to troubleshoot complex distributed systems.
Mid-level programming skills in Python, JSON, PowerShell, or other relevant languages.
Experience with incident response and post-mortem analysis.
Excellent communication and collaboration skills.
Advanced analytical, troubleshooting, and problem-solving skills.
Education and Experience Preferred:
Familiarity with infrastructure as code (Terraform, CloudFormation).
Login and enrollment instrumentation using SLO/SLI and measuring FCI and FSI.
Experience in building and maintaining distributed systems at scale.
Knowledge of security best practices in observability.
Certifications in Cloud (AWS, GCP, Azure), SRE or DevOps are a plus.
Process-oriented, logical thinker.
Strong knowledge of client/server and virtualization technologies.
Adaptable and able to learn quickly in a fast-paced environment.
M&T Bank is committed to fair, competitive, and market-informed pay for our employees. The pay range for this position is $93,581.10 - $155,968.51 Annual (USD). The successful candidate’s particular combination of knowledge, skills, and experience will inform their specific compensation.
Location
Buffalo, New York, United States of America
M&T Bank Corporation is an Equal Opportunity/Affirmative Action Employer, including disabilities and veterans.