- NVIDIA (Santa Clara, CA)
- …once they are live by measuring and monitoring availability, latency and overall system health. + Scale systems sustainably through mechanisms like automation, ... time enabling developers to make changes to the existing system through careful preparation and planning while keeping an... systems by pushing for changes that improve reliability and velocity + Practice sustainable incident response and…
- Coinbase (Sacramento, CA)
- …improvements. * Educate, mentor and hold accountable the engineering team to improve the reliability of our systems and make reliability a core value ... platform - and with it, the future global financial system. To achieve our mission, we're seeking a very...you'll be doing (i.e. job duties):* * Improve observability, reliability and availability by defining and measuring key metrics…
- Rubrik (Sacramento, CA)
- … and services with the objective of achieving and exceeding availability and reliability goals * Manage and streamline monitoring systems to enhance ... enable teams at Rubrik to develop secure software and protect data and systems with appropriate security controls. Information Security also develops systems to…
- NVIDIA (Santa Clara, CA)
- …NTP/PTP, DHCP, and LDAP. This includes building for performance and reliability at global scale, covering automation, monitoring, high availability, capacity ... efficiency of services and drive efficiency with software and hardware optimizations (SR-IOV/DPU) + Experience with technologies like eBPF and XDP for Observability…
- LiveRamp (San Francisco, CA)
- …issues with Engineering teams + Set up and maintain Infrastructure & Product Reliability monitoring and alerting + Maintain and enhance CI/CD Tooling and ... DynamoDB + Optimize the performance and cost of the systems and rightsize Kubernetes containers. + Work in close...code, and automate routine tasks + Experience with securing systems in a public cloud environment + Understands how…
- Insight Global (Santa Clara, CA)
- …fast-paced Infrastructure, Planning and Processes organization where you will be working as a Senior SRE Engineer. The position will be part of a fast-paced crew ... and Driverless Cars to cater to their infrastructure & systems needs. As an SRE, you'll also be working...Science, Information Technology, or related field, or equivalent experience. - System admin and Windows admin experience in an on…
- Celestica (San Jose, CA)
- …with a background in the medical, telecommunication, or defense sectors. + Certified Reliability Engineer (CRE) certification is preferred. + Expertise in Design ... the company's strategic direction. **Overview:** We are seeking a Sr. Reliability Engineering Consultant to join our...and driving continuous improvements. A strong background in the reliability of complex electronic systems and their…
- Teledyne (El Segundo, CA)
- …issues to senior leadership. **Supervisory Responsibilities** Directly manage the Reliability Department Staff: Reliability Engineer(s) and ... data. + Manage the Failure Review and Corrective Action System (FRACAS) and ensure timely resolution of reliability...related problems. + Must communicate concise program status to senior management. + Must be able to communicate and…
- NVIDIA (Santa Clara, CA)
- GeForce Now is looking for a Manager, Network Site Reliability Engineer (SRE) to enhance our network infrastructure and operations. We are looking for a leader ... be doing: + Cultivate a top-performing team of Network Site Reliability Engineers through encouraging a culture of collaboration, accountability, and technical…
- LinkedIn (Mountain View, CA)
- …architectural transformations at internet-scale companies + Deep knowledge of systems reliability, observability frameworks, and fault-tolerant architecture ... in Sunnyvale, CA or San Francisco, CA. **Responsibilities** + Serve as a senior technical leader driving the long-term reliability and observability strategy…