- Google (Sunnyvale, CA)
- Senior Software Engineer, Site Reliability Engineering (Google; Seattle, WA, USA; Sunnyvale, CA, USA) **Mid** Experience driving ... SRE ensures that Google's services, both our internally critical and our externally visible systems, have reliability and uptime appropriate to users' needs and a…
- LinkedIn (Mountain View, CA)
- …We do this with a focus on performance, security, and reliability. As a Sr. Staff Software Engineer, you will fill the mission-critical role of ensuring that ... our complex, web-scale systems are healthy, automated, redundant and designed to scale. ... with a focus on improving developer productivity and system sustenance. + You will effectively communicate with the…
- Amazon (Culver City, CA)
- …and studio executives at all levels. Our Infrastructure Engineering team is looking for Sr Site Reliability Engineers to build, deploy, operate, and sustain our ... systems in AWS. The team will operationalize the stability and reliability of these systems and discover innovative ways to scale and operate them reliably as…
- Palo Alto Networks (Santa Clara, CA)
- …delivering and deploying applications to production + Build observation (logging, metrics, alerting) systems to make sure the system works well. + Design and ... Citizen or Green Card holder. **Your Career** We are seeking development-heavy Site Reliability Engineers (SREs) who are passionate about bringing new ideas to all…
- NBC Universal (Universal City, CA)
- …systems, responding to alerts, and resolving issues promptly. The engineer also oversees and improves complex telecommunications systems that support ... is expected to be completed during 2025. The Unified Communication Engineer at NBC Universal holds extensive responsibility across various Unified Communications…
- Palo Alto Networks (Santa Clara, CA)
- …champion SRE best practices, and work collaboratively to ensure our systems are robust and performant. This includes automation, architecture, performance, ... observability, troubleshooting, security, and reliability. Our Infrastructure Platform stack includes Terraform, Kubernetes, GitLab CI/CD, GitOps, Prometheus,…
- ServiceNow, Inc. (San Diego, CA)
- …improve the reliability and performance of the infrastructure through improved system design. + Drive a culture of intolerance to manual activity which results ... It all started in sunny San Diego, California in 2004 when a visionary engineer, Fred Luddy, saw the potential to transform how we work. Fast forward to today…
- NVIDIA (Santa Clara, CA)
- …once they are live by measuring and monitoring availability, latency and overall system health + Scale systems sustainably through mechanisms like automation, ... time enabling developers to make changes to the existing system through careful preparation and planning while keeping an ... systems by pushing for changes that improve reliability and velocity + Practice sustainable incident response and…
- NVIDIA (Santa Clara, CA)
- …once they are live by measuring and monitoring availability, latency and overall system health. + Scale systems sustainably through mechanisms like automation, ... time enabling developers to make changes to the existing system through careful preparation and planning while keeping an ... systems by pushing for changes that improve reliability and velocity + Practice sustainable incident response and…
- NVIDIA (Santa Clara, CA)
- …NTP/PTP, DHCP, and LDAP. This includes building for performance and reliability at global scale, covering automation, monitoring, high availability, capacity ... efficiency of services and drive efficiency with software and hardware optimizations (SR-IOV/DPU) + Experience with technologies like eBPF and XDP for observability…