- CareFirst (Reston, VA)
- …+ Service Mesh: Istio, AWS App Mesh, OpenShift Service Mesh, etc. + Platform Monitoring, Observability, & Performance Tools: Datadog, CloudWatch, etc. + DevOps ... of total system. Defines system support requirements to include monitoring, capacity, staffing, and patching/updating. Analyzes and resolves program support…
- Walmart (Bentonville, AR)
- …and Orchestration using Airflow and Kubernetes. + Hands-on experience with application monitoring and observability using Grafana, Prometheus, and Splunk to ... ML/GenAI applications incorporating Data Pipelines, Model Training, Inferencing, Versioning, and Monitoring. + Experience in cloud platforms such as GCP and Azure.…
- NVIDIA (Santa Clara, CA)
- …building for performance and reliability at global scale, covering automation, monitoring, high availability, capacity planning, and lifecycle management. + Define ... optimizations (SR-IOV/DPU) + Experience with technologies like eBPF and XDP for Observability & DDoS mitigation + Collect and review system data for capacity and…
- Walmart (Bentonville, AR)
- …Engineers, Product Managers, and MLOps/DevOps teams to streamline model deployment, monitoring, and lifecycle management. **What You Will Bring** + **Expert-level ... , including data ingestion, feature engineering, model training, deployment, and monitoring. + A solid understanding of **core machine learning principles**,…
- Cisco (Austin, TX)
- …Cloud. ThousandEyes is also a foundational component of Cisco's growing Full-Stack Observability ("FSO") business. About The Role We're all familiar with the ... is tasked with empowering our customers with ThousandEyes to ease their performance monitoring pains. If you enjoy variety in job responsibilities, this is the job…
- Wayfair (Boston, MA)
- …databases. + Proven ability to architect for performance, scalability, and observability in complex, distributed service environments. + Hands-on experience with ... modern software development practices, including CI/CD, test automation, and monitoring. + Experience collaborating cross-functionally with product management, operations,…
- General Motors (Sunnyvale, CA)
- …reliability or stability regressions. + **Integrate data pipelines** for continuous monitoring of release health, including automated collection of test, simulation, ... or equivalent). + Prior experience implementing **ELT/ETL pipelines** for quality monitoring, reliability, or release metrics. + Solid understanding of **system…
- Publicis Groupe (Chicago, IL)
- …ingestion, transformation, cleaning, and preprocessing. + Working understanding of observability, monitoring, and performance tuning for production systems; ... experience building CI/CD pipelines and contributing to DevOps practices. + Knowledge of modern architectural styles (DDD, event-driven, ports and adapters, microservices) and standard processes for scalable, maintainable systems. + Strong foundation in…
- NVIDIA (Santa Clara, CA)
- …lead the development of DGX Cloud strategy for GPU fleet lifecycle, health, observability and utilization monitoring, and remediation. You will define and drive ... the technical strategy across multiple environments (bare metal, cloud service provider, and neoclouds), including defining and developing the auto-remediation strategies to detect, fix, validate, and restore-to-service critical systems. You will work with…
- Microsoft Corporation (Redmond, WA)
- …tech - Leverage SAP BTP, Azure, Kyma/Kubernetes, AI/ML, DevOps pipelines, and modern observability tools at massive scale. + Drive impact fast - Operate in an ... for production-like testing **Reliability & Supportability** + Act as DRI for monitoring system degradation and downtime on simple problems + Follow playbook to…