  • Senior DGX Cloud Software Engineer…

    NVIDIA (TX)
    …ROI of building and maintaining automation is worth it. + Practice sustainable blameless incident prevention and incident response while being a member of ... programming languages: Python or Go + BS degree in Computer Science or a related technical field involving coding ... Communication Library (NCCL). + Experience working with a centralized security organization to prioritize and mitigate security…
    NVIDIA (01/14/26)
  • Site Facilities Engineering Manager

    BAE Systems (Wayne, NJ)
    …act as Incident Commander for scenarios highlighted in the Emergency Response / Incident Management Plan. 3. Develop relations and collaborate with other ... and organizations include and are not limited to Safety, Health, and Environmental, Security and IT, and Purchasing helping to deliver services supporting the core…
    BAE Systems (01/10/26)
  • AVP of Observability Engineering

    The Hartford (Penn, PA)
    …AI-powered anomaly detection and predictive analytics to reduce alert noise and improve incident response. + Embed AI-driven automation for: + Intelligent log ... a cutting-edge, AI-powered observability ecosystem. This leader will ensure security, efficiency, and resiliency across The Hartford's technology platforms by…
    The Hartford (01/10/26)
  • HSE Manager Field Services

    Honeywell (Des Plaines, IL)
    …+ Lock out/Tag Out + Electrical Safety + Machinery and Equipment Safeguarding + Emergency Response and Travel Security + Hand and Portable Tools + Compressed Air ... and improve decision-making in the field. + Lead and support injury and incident investigations with Root Cause Analysis methodology. **YOU MUST HAVE** + 5 or…
    Honeywell (01/06/26)
  • Senior Manager, Software Engineering - AI…

    Choice Hotels (Scottsdale, AZ)
    …champion design patterns, quality metrics, and automation. + Production leadership: own incident readiness and response, post‑incident learning, and ... Transform SDLC → AI‑DLC: embed AI across planning, design, coding, test, security, and operations; deliver measurable gains in speed, quality, resilience, developer…
    Choice Hotels (12/31/25)
  • IT Support Engineer - SNOC, Amazon Leo…

    Amazon (Arlington, VA)
    …* Monitor ServiceNow queues and broadband performance dashboards to manage workload and response times. * Serve as the first point of technical and procedural ... ServiceNow, Amazon Connect) to support mission-specific workflows. * Support and track incident resolution lifecycles, ensuring compliance with SLAs and incident…
    Amazon (12/30/25)
  • Lead Engineer, Systems Engineering

    Intercontinental Exchange (ICE) (New York, NY)
    …reviews, design reviews, and architecture discussions + Participate in on-call rotations and incident response in production operations in a 24/7 environment + ... tooling using Python, Shell, and Ansible to streamline operations and enforce security controls. + Collaborate with cross-functional teams to develop and maintain…
    Intercontinental Exchange (ICE) (12/23/25)
  • (Senior) Software Engineer, Infrastructure…

    pony.ai (Fremont, CA)
    …Experience with observability and SRE practices (Prometheus, Grafana, ELK, Datadog; SLOs, incident response, postmortems). + Familiarity with workloads common to ... and governance. + Define and enforce best practices for service deployments, security policies, and operational guidelines. + Contribute to observability and SRE…
    pony.ai (12/16/25)
  • Site Reliability Developer 3

    Oracle (Reston, VA)
    …and platform automation to drive efficiency and repeatability. + Observability and Incident Response: + Develop observability stacks using tools like Prometheus, ... Site Reliability Developer Oracle Cloud Infrastructure (OCI) - OCI National Security Regions Reston, VA/ Seattle, WA/ Austin, TX https://www.oracle.com/cloud/ We are…
    Oracle (12/12/25)
  • Lead Platform Engineer (Audio Video / Unified…

    Capital One (McLean, VA)
    …planning, testing, and executing upgrades and maintenance tasks. + Lead critical incident response efforts, ensuring timely resolution and conducting post- ... its components, and their interdependencies, ensuring high availability, scalability, and security. + Collaborate closely with product managers, security, and…
    Capital One (11/27/25)