- NVIDIA (Santa Clara, CA)
- …make a lasting impact on the world. We are looking for a Principal Software Engineer to join our Software Infrastructure team in Santa Clara, CA. This role blends ... large-scale training and inference pipelines. + Build developer-focused tooling for monitoring, profiling, and debugging database performance in real time. +…
- NVIDIA (Santa Clara, CA)
- …data systems like Ray, Spark Rapids + Familiarity with metrics collection, health monitoring, and observability tools + Building, operating and maintaining full ... ML platform for data scientists to use. As a data processing platform engineer, you will design, implement and operate Kubernetes-based, GPU-accelerated data…
- Coinbase (Sacramento, CA)
- …root cause analysis, and blameless retrospectives * Define metrics and bolster monitoring/observability across corporate IAM systems * Participate in regular ... and fully supported. Coinbase is hiring! We are looking for an experienced system engineer (SE) to join the IT Operations Corporate Engineering team to build and…
- Walmart (Sunnyvale, CA)
- …**What you'll do** We're looking for a seasoned **Senior Software Engineer** with expertise in **Node.js, GraphQL, and React**-based architectures. In this ... teams + Own the full software lifecycle from design to deployment and monitoring + Collaborate closely with product and frontend teams to define data models…
- TP-Link North America, Inc. (Irvine, CA)
- …including coding standards, code reviews, testing, and CI/CD pipelines. + Ensure strong observability, monitoring, and reliability of backend systems. + Act as a ... lifestyle. Overview: At TP-Link Systems Inc., we are looking for a Cloud Software Engineer Backend Manager to lead our backend development team. In this role, you…
- Walmart (Sunnyvale, CA)
- …adaptive security frameworks across the enterprise. **What you'll do:** As a **Principal Engineer** at Walmart, you will serve as a key technical thought leader ... architectures**. + Design and implement agent-based systems for proactive monitoring, triaging, and decision-making. + Influence roadmap decisions and technical…
- Palo Alto Networks (Santa Clara, CA)
- …robust and performant. This includes automation, architecture, performance, observability, troubleshooting, security, and reliability. Our Infrastructure Platform ... tools and automation frameworks, championing Infrastructure as Code (IaC) and Monitoring as Code (MaC) principles + Automate robust deployments and orchestrate…
- NVIDIA (Santa Clara, CA)
- …building for performance and reliability at global scale, covering automation, monitoring, high availability, capacity planning, and lifecycle management. + Define ... optimizations (SR-IOV/DPU) + Experience with technologies like eBPF and XDP for observability and DDoS mitigation + Collect and review system data for capacity and…
- LiveRamp (San Francisco, CA)
- …with Engineering teams** + **Set up and maintain Infrastructure & Product Reliability monitoring and alerting** + **Maintain and enhance CI/CD Tooling and Terraform ... clouds (GCP or AWS)** + **Experience with deployment and monitoring of highly scalable products.** + **Hands-on experience ... + **Experience with SRE best practices; working knowledge of observability principles is a big plus** + **Ability to…
- Cadence Design Systems, Inc. (San Jose, CA)
- …platform and processes to improve operations. Key Responsibilities: + Implement monitoring framework to improve infrastructure reliability, observability, and ... alerts. + Identify and implement automation opportunities to reduce manual work and accelerate delivery. + Drive technical decisions on architecture, automation, and tooling. + Develop processes to track and scale key metrics for reliability,…