  • Principal Hardware Engineer - Hardware…

    Cadence Design Systems, Inc. (San Jose, CA)
    …platform and processes to improve operations. Key Responsibilities: + Implement monitoring framework to improve infrastructure reliability, observability, and ... alerts. + Identify and implement automation opportunities to reduce manual work and accelerate delivery. + Drive technical decisions on architecture, automation, and tooling. + Develop processes to track and scale key metrics for reliability,…
    Cadence Design Systems, Inc. (01/07/26)
  • Staff, Machine Learning Engineer

    Walmart (Sunnyvale, CA)
    …of the ML lifecycle: data sourcing, feature engineering, model training, deployment, monitoring, and continuous improvement. * Apply MLOps best practices such as CI/CD ... for ML, automated training pipelines, model versioning, and telemetry-based monitoring. * Implement robust evaluation frameworks for model performance, data quality,…
    Walmart (12/19/25)
  • Sr Site Reliability Engineer (Prisma…

    Palo Alto Networks (Santa Clara, CA)
    …robust and performant. This includes automation, architecture, performance, observability, troubleshooting, security, and reliability. Our Infrastructure Platform ... and automation frameworks**, championing **Infrastructure as Code (IaC)** and **Monitoring as Code (MaC)** principles. + **Automate robust deployments** and…
    Palo Alto Networks (12/12/25)
  • Senior Principal AI Software Engineer

    Oracle (Sacramento, CA)
    …networking protocols, data center designs, infrastructure as a service, network monitoring and network automation. **Responsibilities** As a Senior Principal AI ... agents, and inference systems into the software stack for designing, monitoring, troubleshooting and deploying networks. + Evaluate, integrate, and optimize…
    Oracle (12/09/25)
  • Principal Staff Site Reliability Engineer

    NVIDIA (Santa Clara, CA)
    …building for performance and reliability at global scale, covering automation, monitoring, high availability, capacity planning, and lifecycle management. + Define ... optimizations (SR-IOV/DPU) + Experience with technologies like eBPF and XDP for observability & DDoS mitigation + Collect and review system data for capacity and…
    NVIDIA (11/20/25)
  • Senior Software Systems Engineer, Release…

    General Motors (Sunnyvale, CA)
    …reliability or stability regressions. + **Integrate data pipelines** for continuous monitoring of release health, including automated collection of test, simulation, ... or equivalent). + Prior experience implementing **ELT/ETL pipelines** for quality monitoring, reliability, or release metrics. + Solid understanding of **system…
    General Motors (10/28/25)
  • Senior Software Engineer, Backend…

    Coinbase (Sacramento, CA)
    …* Lead end-to-end delivery of projects through implementation, deployment, and monitoring * Improve and maintain operational excellence standards across the team, ... proactively addressing technical debt and driving improvements in reliability and observability * Participate in code reviews and on-call rotation, lead incident…
    Coinbase (01/08/26)
  • Senior Software Product Engineer (Remote…

    VetsEZ (CA)
    …while fostering a culture of experimentation and delivery excellence. + Observability and Reliability: Implement monitoring, logging, and automated alerting ... (e.g., CloudWatch, Datadog, Prometheus) to ensure system reliability and traceability of AI workflows. + Governance and Compliance: Ensure all AI-enabled components meet HIPAA, VA, and NIST security requirements, aligning with enterprise healthcare standards. +…
    VetsEZ (12/31/25)
  • Senior Principal Software Engineer

    Oracle (Sacramento, CA)
    …media tools). + Ensure services are built for scale, availability, observability, performance, and security, optimized for graphics and rendering pipelines. + ... workflows. + Drive operational excellence for GPU-powered services, including performance monitoring, failure analysis, and workload optimization. + Stay ahead of…
    Oracle (11/25/25)