- NVIDIA (Santa Clara, CA)
- NVIDIA's AI Infrastructure organization is seeking a Senior AI Observability Engineer to help architect and implement distributed observability systems for ... productivity of AI and HPC workloads. You will develop, deploy, and operate observability solutions for multiple compute clusters around the world. What You'll Be…
- DoorDash (San Francisco, CA)
- …for DoorDash developers to ship great products quickly and reliably. The Observability team within Core Infrastructure builds and operates systems to provide ... and performance of all DoorDash systems and services. These observability systems are used by all DoorDash developers! About ... at least 5 years of experience as a backend software engineer + You have experience building and operating infrastructure…
- NVIDIA (Santa Clara, CA)
- …Design, implement, and support operational and reliability aspects of a large-scale Observability & Telemetry collection platform with a focus on performance at scale, ... system in Production + 8+ years of experience delivering foundational infrastructure and observability platforms. + Experience in one or more of the following: Python,…
- Palo Alto Networks (Santa Clara, CA)
- …including the design, implementation, and continuous enhancement of our comprehensive observability systems. To meet the opportunities that such a role provides, ... you will have a deep knowledge of modern observability and monitoring tools and practices, having managed high ... DevOps/SRE Expertise: 5+ years of experience as a DevOps/SRE engineer with a passion for technology and a strong…
- Cardinal Health (Sacramento, CA)
- …to add an SRE to the team. We are looking for a hands-on engineer with experience running platforms leveraging industry observability and automation platforms. ... scalability, and incident resilience across Sonexus platforms. You'll develop observability systems, engineer intelligent automation, and champion collaboration…
- MongoDB (Palo Alto, CA)
- …trust MongoDB to build next-generation, AI-powered applications. The Networking & Observability Team builds infrastructure for low-overhead observability and ... or building core components for data processing systems + Familiarity with the observability ecosystem and best practices + Excellent verbal and written technical…
- General Motors (Mountain View, CA)
- …live and deliver a better future for generations to come. In this SRE SW Engineer role, you will develop and maintain key elements of the infrastructure health and ... let's innovate! What You'll Do: + Implement a scalable, reliable, secure SRE and Observability platform to monitor the health of our production system and provide a…
- General Motors (Mountain View, CA)
- …three times per week. The Role: The Software Engineering Site Reliability Engineer (SRE) is a Software Engineer responsible for ensuring the ... Job Description. What You'll Do: + Implement a scalable, reliable, secure SRE and Observability platform to monitor the health of our production system and provide a…
- NVIDIA (Santa Clara, CA)
- …These platforms bring together the full power of NVIDIA GPUs, NVIDIA NVLink, NVIDIA InfiniBand networking, NVIDIA Grace CPUs, and a fully optimized NVIDIA AI and HPC ... software stack. We're looking for a strong technical architect to own the end-to-end architecture of these products at the system software level, including firmware, kernel drivers, operating systems, and user mode drivers. You will work with component leads…
- Northrop Grumman (Manhattan Beach, CA)
- …Specialty, Azure Data Engineer Associate, or Google Professional Data Engineer. + MLOps, Observability Tools, Data Versioning, and Containerization for ... deploying data engineering workflows. + Cloud security best practices, including IAM, encryption, and compliance with frameworks like NIST or FedRAMP. + Knowledge of advanced networking concepts such as VPC peering, VPNs, and load balancing for data-heavy…