- Coinbase (Charlotte, NC)
- …wide system's reliability and less customer impact. As a *Senior Software Engineer* you will help to promote reliability culture across Coinbase. You would ... on a daily basis. *What you'll be doing (i.e., job duties):* * Improve observability, reliability and availability by defining and measuring key metrics * Build…
- NVIDIA (Santa Clara, CA)
- …NTP/PTP, DHCP, and LDAP. This includes building for performance and reliability at global scale, covering automation, monitoring, high availability, capacity ... optimizations (SR-IOV/DPU) + Experience with technologies like eBPF and XDP for observability & DDoS mitigation + Collect and review system data for capacity and…
- MongoDB (New York, NY)
- …Samsung and Toyota, trust MongoDB to build next-generation, AI-powered applications. The Site Reliability Engineering team designs and builds the global ... clusters or some other container orchestration infrastructure + Experience with observability of large-scale distributed systems. To drive the personal growth…
- Federal Reserve Bank (Boston, MA)
- …the payments landscape in the United States. The position will be primarily on-site, and residency within commuting distance of one of our offices is required. **Responsibilities** + ... As a Principal Engineer of the SRE / Production Operations team for ... Experience working with Docker, containers, ECR and EKS. + Observability: CloudWatch, OpenSearch, Dynatrace, Grafana, Prometheus + Familiarity…
- Palo Alto Networks (Santa Clara, CA)
- …are robust and performant. This includes automation, architecture, performance, observability, troubleshooting, security, and reliability. Our Infrastructure ... Platform stack includes Terraform, Kubernetes, GitLab CI/CD, GitOps, Prometheus, Grafana, Loki, Docker, GCP, Backstage, MySQL, PagerDuty, FireHydrant, Python, Bash, Java, NodeJS and Go. **Your Impact** + **Design, build, and operate** reliable, secure Cloud…
- Abbott (Pleasanton, CA)
- …mothers, female executives, and scientists. **The Opportunity** We're looking for a strong **Senior Site Reliability Engineer (SRE)** who's ready to roll up ..., helping monitor systems, respond to incidents, and drive continuous improvements in reliability and observability. **What You'll Work On** + **System…**
- MetLife (Tampa, FL)
- …technical execution, and cultural change to enable adoption of site reliability principles, automation-first approaches, Infrastructure-as-Code (IaC), and ... to have you! The Opportunity: As Director of Database Reliability Engineering (DBRE), you will play a pivotal role ... services. * Champion the adoption of AIOps and modern observability tools to enable intelligent, self-healing systems. * Drive…
- MongoDB (Austin, TX)
- …to build next-generation, AI-powered applications. We are looking for an experienced Staff Engineer for our SRE, InfraSec team, to guide the security of our ... in leading projects within security-focused areas, such as runtime scanning, security observability, CSPM, and more. Cloud Expertise: + Strong experience with at…
- Alaska Airlines (SeaTac, WA)
- …people love, we want to hear from you. **Role Summary** The Principal Systems Reliability Engineer (SRE) is the sole subject-matter expert in software ... is required. **Preferred** + Demonstrated experience in coaching and mentoring system and site reliability engineers. + Experience applying ITIL and IT process…