- NVIDIA (Santa Clara, CA)
- NVIDIA's AI Infrastructure organization is seeking a Senior AI Observability Engineer to help architect and implement distributed observability systems ... productivity of AI and HPC workloads. You will develop, deploy, and operate observability solutions for multiple compute clusters around the world. What You'll Be… more
- NVIDIA (Santa Clara, CA)
- …Design, implement and support operational and reliability aspects of a large-scale Observability & Telemetry collection platform with a focus on performance at scale, ... system in Production + 8+ years experience delivering foundational infrastructure and observability platforms. + Experience in one or more of the following: Python,… more
- MongoDB (Palo Alto, CA)
- …trust MongoDB to build next-generation, AI-powered applications. The Networking & Observability Team builds infrastructure for low-overhead observability and ... or building core components for data processing systems + Familiarity with observability ecosystem and best practice + Excellent verbal and written technical… more
- Palo Alto Networks (Santa Clara, CA)
- …and actionable insights into our systems' performance and health. **Your Impact** As a Senior Staff SRE with the Cortex Observability team, you will: + Cloud ... including the design, implementation, and continuous enhancement of our comprehensive observability systems. To meet the opportunities that such a role provides,… more
- General Motors (Mountain View, CA)
- …live and deliver a better future for generations to come. In this SRE SW Engineer role, you will develop and maintain key elements of the infrastructure health and ... let's innovate! **What You'll Do** + Implement scalable, reliable, secure SRE and Observability platform to monitor health of our production system and provide a… more
- LinkedIn (Mountain View, CA)
- …to optimize their models and deliver the best performance possible. As a Senior Software Engineer, you will have first-hand opportunities to advance one ... performance optimizations across billions of user queries Model Training Infrastructure: As an engineer on the AI Training Infra team, you will play a crucial role… more
- Palo Alto Networks (Santa Clara, CA)
- …including the design, implementation, and continuous enhancement of our comprehensive observability systems. To meet the opportunities that such a role provides, ... you will have a deep knowledge of modern observability and monitoring tools and practices, having managed high...our systems' performance and health. **Your Impact** As a Senior SRE with the Cortex Cloud Security Posture Management… more
- Cisco (San Jose, CA)
- Senior Distributed Golang Software Engineer, Isovalent Tetragon Team (US) Apply (https://jobs.cisco.com/jobs/Login?projectId=1444334) + Location: Offsite, San ... open-source software and enterprise solutions solving networking, security, and observability needs for modern cloud native infrastructure. The flagship technology,… more
- Coinbase (Sacramento, CA)
- …to ensure company-wide system reliability and less customer impact. As a *Senior Software Engineer* you will help to promote reliability culture across ... a daily basis. *What you'll be doing (i.e., job duties):* * Improve observability, reliability and availability by defining and measuring key metrics * Build… more
- Capital One (San Francisco, CA)
- Senior AI Engineer (LLM Core) **Overview:** At Capital One, we are creating responsible and reliable AI systems, changing banking for good. For years, Capital ... agreed upon number of hours to be regularly worked. Cambridge, MA: $158,600 - $181,000 for Senior AI Engineer McLean, VA: $158,600 - $181,000 for Senior AI … more