"Alerted.org


  • Lead Site Reliability Engineer (SRE)

    EPAM Systems (Seattle, WA)




    At EPAM, we’re not just building software — we’re engineering excellence.

     

    We’re looking for a **Lead Site Reliability Engineer (SRE)** with a passion for performance, precision, and proactive problem-solving to join a high-impact team supporting a leading sell-side trading environment.

     

    This role is ideal for someone who thrives in fast-paced financial systems, has a passion for working with data and monitoring tools, and wants to shape the reliability and efficiency of next-generation trading platforms.

     

    The Site Reliability Engineer will focus on ensuring stable connectivity to external partners within a SaaS environment. The ideal candidate will have expertise in financial systems, especially within trading ecosystems, and the ability to proactively drive performance enhancements and improve data usage and analysis. By identifying areas of opportunity, they will help deliver improved service and systems for end users.

     

    Additionally, the candidate will help proactively identify system issues, implement changes and resolutions, and ensure the stability of business-critical applications. They will collaborate to build actionable plans, execute strategies, and lead initiatives to enhance system reliability.

    Responsibilities

    + Provide a strategic vision for trading portfolio performance, covering network connectivity, traffic throughput, and applications

    + Define, configure, and set up alerting and monitoring frameworks for critical applications

    + Monitor application and platform performance using APM and monitoring tools to diagnose and resolve performance issues

    + Work within Azure Cloud environments and contribute to a 24x7x365 support team to diagnose and address system challenges

    + Assess environmental and incident priorities, investigate issues swiftly, and execute efficient resolutions

    + Troubleshoot mission-critical systems and implement preventative problem management solutions

    + Lead the adoption of observability, scalability, and resiliency best practices across development and operations teams

    + Analyze, design, and implement solutions to meet application performance and reliability goals

    + Collaborate with cross-functional teams to ensure smooth and unified troubleshooting and resolution processes across departments

    + Craft and maintain SLA/SLO dashboards to monitor system health and performance

    + Define and maintain SLIs, SLOs, and error budgets for applications and infrastructure to drive service improvement

    + Automate operational processes to enhance service offerings and system reliability

    Requirements

    + 5+ years of experience in site reliability engineering, production support, or related roles in fast-paced environments

    + Demonstrated leadership or mentoring experience (minimum of 1 year) guiding cross-functional teams on system reliability

    + Knowledge of monitoring and observability tools such as AppDynamics, New Relic, Prometheus, or Grafana

    + Background in Azure Cloud services, CI/CD pipelines, and container orchestration (Kubernetes or Docker)

    + Proficiency in scripting with Python, Bash, or PowerShell for automation and efficiency gains

    + Understanding of network protocols (TCP/IP, DNS, HTTP) and troubleshooting tools such as Wireshark or tcpdump

    + Capability to analyze complex system issues and performance bottlenecks using APM and log analysis

    + Familiarity with implementing SLA/SLO metrics and monitoring for production systems

    + Skills in both high-availability systems and database performance optimization

    Nice to have

    + Expertise in SaaS solutions and APIs with a focus on handling external trading partners

    + Knowledge of disaster recovery strategies and business continuity planning

    + Background in trading platforms or buy-side/sell-side financial environments

     

    EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our clients, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.

    Engineer the Future with a Career at EPAM (https://www.youtube.com/embed/NU_mnNITn2o?si=IiCxyQ4sr1YJWxDG)

     

    This remote position cannot be performed in New York City.

     

    Applications will be accepted on a rolling basis.

     

    In accordance with the LA County Fair Chance Ordinance, you may find a copy of the Notice containing a summary of the Ordinance’s key provisions here: Concept FCO Posting 8 27 24 (lacounty.gov)

     

    H1B visa sponsorship is not available for this position.

     

    It is unlawful in Massachusetts to require or administer a lie detector test as a condition of employment or continued employment. An employer who violates this law shall be subject to criminal penalties and civil liability.

     

    EPAM Systems, Inc. is an equal opportunity employer. We recognize the value of diversity and inclusion in creating success for our customers, business partners, shareholders, employees and communities. We are committed to recruiting, hiring, developing and promoting employees without discrimination. As a global employer, this commitment includes complying with all laws in the countries in which we operate. Nevertheless, we believe equal employment practices should not be limited to what the law requires. Equal opportunity and inclusion are essential to motivate, empower and recognize the best in everyone.

     

    At EPAM, employment actions are based on individual qualifications, without regard to race, color, religion, creed, gender, pregnancy status, sexual orientation, gender identity, gender expression, marital or familial status, national origin, ancestry, genetics, age, disability status, veteran status, citizenship status when otherwise legally able to work, or any other characteristic protected by law.

     






© 2025 Alerted.org