- ServiceNow, Inc. (Santa Clara, CA)
- …work experiences in the future. **As a Senior Staff Machine Learning Engineer - Site Reliability Engineer you will:** + Contribute to the design, development and ... implementation of infrastructure, platform, deployment and observability features that power AI workloads. + Collaborate with researchers, AI engineers, and infrastructure teams to ensure our GPU clusters perform efficiently, scale well, and remain reliable. +… more
- MongoDB (San Francisco, CA)
- …remotely in the United States region. **Role Overview** We are seeking a talented Site Reliability Engineer (SRE) with a strong networking background to join the ... Fabric team. This role is pivotal in building and maintaining the robust infrastructure necessary for secure and efficient communication between our services. As an SRE on the Fabric team, you will leverage your expertise in networking, distributed systems,… more
- Motion Recruitment Partners (Sacramento, CA)
- Site Reliability Engineer **Remote Only** Contract $50/hr - $100/hr You'll closely collaborate with fellow cloud architects and engineers specializing in AWS to ... design, define, develop, test, and debug cloud solution components. You'll have the chance to work within a GitOps-based framework to create and manage container apps and use products like Kubernetes to further the mission. Use Python to automate across our… more
- MongoDB (San Francisco, CA)
- …we provide hybrid work accommodation. **Role Overview** We are seeking a talented Site Reliability Engineer (SRE) Lead with a strong networking background to ... join the Fabric team. This role is pivotal in building and maintaining the robust infrastructure necessary for secure and efficient communication between our services. As the lead SRE on the Fabric team, you will leverage your expertise in networking,… more
- General Motors (Mountain View, CA)
- …be a mentor, guide, and a partner, helping engineers grow, and ensuring the reliability and efficiency of the systems they are working on. We believe in setting ... + Develop tools and software to automate operational processes, improve system reliability, and reduce manual intervention. + Lead, implement, and improve monitoring… more
- Coinbase (Sacramento, CA)
- …Q3 2023. *What you'll be doing (i.e., job duties):* * Improve observability, reliability and availability by defining and measuring key metrics * Build automation and ... service disruptions and automate incident response * Proactively find and analyze reliability problems across our business units and stack, then design and implement… more
- Palo Alto Networks (Santa Clara, CA)
- …of our SRE group. The InfoSec SRE group is fundamental to ensuring the reliability and availability of the production environment that hosts our InfoSec services. We ... that is managed through Infrastructure as Code, ensuring its reliability, scalability, and security. This includes: + **Securing production environments**… more
- ServiceNow, Inc. (San Diego, CA)
- …technical engineers who are tasked with maintaining and developing the reliability, scalability and performance of the ServiceNow cloud infrastructure. Our SRE's ... repeatable issues. + Drive initiatives with partner teams to improve the reliability and performance of the infrastructure through improved system design. + Drive… more
- NVIDIA (Santa Clara, CA)
- …NTP/PTP, DHCP, and LDAP. This includes building for performance and reliability at global scale, covering automation, monitoring, high availability, capacity ... architectures and identify opportunities for containerization to improve scalability, reliability, and efficiency. + Strong analytical skills with the ability… more
- Rubrik (Sacramento, CA)
- …and services with the objective of achieving and exceeding availability and reliability goals * Manage and streamline monitoring systems to enhance observability and ... visibility * Perform Production Readiness Assessments of new services to identify reliability needs and surface potential gaps * Develop and maintain documentation… more