  • Senior Software Engineer, Infrastructure…

    Coinbase (Charlotte, NC)
    …wide system's reliability and less customer impact. As a *Senior Software Engineer* you will help to promote reliability culture across Coinbase. You would ... on a daily basis. *What you'll be doing (i.e., job duties):* * Improve observability, reliability and availability by defining and measuring key metrics * Build…
    Coinbase (08/19/25)
  • Principal Staff Site Reliability

    NVIDIA (Santa Clara, CA)
    …NTP/PTP, DHCP, and LDAP. This includes building for performance and reliability at global scale, covering automation, monitoring, high availability, capacity ... optimizations (SR-IOV/DPU) + Experience with technologies like eBPF and XDP for Observability & DDoS mitigation + Collect and review system data for capacity and…
    NVIDIA (08/21/25)
  • Site Reliability Engineer 3

    MongoDB (New York, NY)
    …Samsung and Toyota, trust MongoDB to build next-generation, AI-powered applications. The Site Reliability Engineering team designs and builds the global ... clusters or some other container orchestration infrastructure + Experience with observability of large scale distributed systems To drive the personal growth…
    MongoDB (08/19/25)
  • FedNow Principal Site Reliability

    Federal Reserve Bank (Boston, MA)
    …the payments landscape in the United States. The position will be primarily on-site with residency commutable to one of our offices required. **Responsibilities** + ... As a Principal Engineer of the SRE / Production Operations team for ... Experience working with Docker, Containers, ECR and EKS. + Observability - CloudWatch, OpenSearch, Dynatrace, Grafana, Prometheus + Familiarity…
    Federal Reserve Bank (09/25/25)
  • Principal Site Reliability

    Palo Alto Networks (Santa Clara, CA)
    …are robust and performant. This includes automation, architecture, performance, observability, troubleshooting, security, and reliability. Our Infrastructure ... Platform stack includes Terraform, Kubernetes, GitLab CI/CD, GitOps, Prometheus, Grafana, Loki, Docker, GCP, Backstage, MySQL, PagerDuty, FireHydrant, Python, Bash, Java, NodeJS and Go. **Your Impact** + **Design, build, and operate** reliable, secure Cloud…
    Palo Alto Networks (10/07/25)
  • Principal Site Reliability

    Palo Alto Networks (Santa Clara, CA)
    …are robust and performant. This includes automation, architecture, performance, observability, troubleshooting, security, and reliability. Our Infrastructure ... Platform stack includes Terraform, Kubernetes, GitLab CI/CD, GitOps, Prometheus, Grafana, Loki, Docker, GCP, Backstage, MySQL, PagerDuty, FireHydrant, Python, Bash, Java, NodeJS and Go. **Your Impact** + Design, build, and operate reliable, secure Cloud…
    Palo Alto Networks (09/06/25)
  • Sr. Software Reliability Engineer

    Abbott (Pleasanton, CA)
    …mothers, female executives, and scientists. **The Opportunity** We're looking for a strong **Senior Site Reliability Engineer (SRE)** who's ready to roll up ..., helping monitor systems, respond to incidents, and drive continuous improvements in reliability and observability **What You'll Work On** + **System …
    Abbott (09/20/25)
  • Director Database Reliability

    MetLife (Tampa, FL)
    …technical execution, and cultural change to enable adoption of site reliability principles, automation-first approaches, Infrastructure-as-Code (IaC), and ... to have you! The Opportunity: As Director of Database Reliability Engineering (DBRE), you will play a pivotal role ... services. * Champions the adoption of AIOps and modern observability tools to enable intelligent, self-healing systems. * Drive…
    MetLife (08/15/25)
  • Staff Site Reliability

    MongoDB (Austin, TX)
    …to build next-generation, AI-powered applications. We are looking for an experienced Staff Engineer for our SRE, InfraSec team, to guide the security of our ... in leading projects within security-focused areas, such as runtime scanning, security observability, CSPM, and more Cloud Expertise: + Strong experience with at…
    MongoDB (10/07/25)
  • Principal Systems Reliability

    Alaska Airlines (Seatac, WA)
    …people love, we want to hear from you. **Role Summary** The Principal Systems Reliability Engineer (SRE) is the sole subject matter expert in software ... is required. **Preferred** + Demonstrate experience in coaching and mentoring system and site reliability engineers. + Experience applying ITIL and IT process…
    Alaska Airlines (10/14/25)