• Senior Systems Software Engineer, AV…

    NVIDIA (Santa Clara, CA)
    …you will work with internal teams and external partners to integrate distributed systems, manage large-scale data pipelines, and operationalize next-generation ... pipelines using Go, Python, Bash, and Bazel to ensure reproducibility, efficiency, and reliable distributed execution. + Integrate simulation and drive logs (eg…
    NVIDIA (09/19/25)
  • Senior Technical Systems AI

    NVIDIA (Santa Clara, CA)
    …design, or enterprise platform engineering. + Deep expertise in architecting large-scale distributed systems with a focus on reliability, performance, and ... record of publishing technical papers, architecture patterns, or thought leadership in AI systems. + Knowledge of observability tools, telemetry dashboards, and…
    NVIDIA (10/16/25)
  • Software Engineer Data/AI/Intelligent…

    Cisco (San Jose, CA)
    …platforms, such as AWS, Azure, or Google Cloud. + Understanding of distributed systems concepts, including scalability, reliability, fault tolerance, and data ... Team: Our dedicated team members are building the future of Cisco's AI-driven platforms and data infrastructure, supporting innovation across the globe. You will…
    Cisco (12/01/25)
  • Senior Software Engineer, Distributed

    NVIDIA (Austin, TX)
    …from the crowd: + Technical competency in managing and automating large-scale distributed systems independent of cloud providers. Advanced hands-on experience ... part of a DGX Cloud team responsible for production systems that enable large scalable GPU clusters to be ... Bright Cluster Manager) + Proven operational excellence in maintaining reliable and performant AI infrastructure. NVIDIA is…
    NVIDIA (10/04/25)
  • Solutions Architect - Rack Scale AI

    NVIDIA (Santa Clara, CA)
    …hosts a heterogeneous mix of machines and devices with various operating systems (Windows/Linux/Android), a multitude of hardware platforms both NVIDIA GPUs and ... Tegra Processors. Are you passionate about distributed infrastructure and looking for sophisticated, critical issues, ready to build the next generation of cloud…
    NVIDIA (12/10/25)
  • Software Development Engineer, Distributed

    Amazon (Seattle, WA)
    …base. You'll bring a passion for innovation, data, search, analytics, and distributed systems. You'll also: - Solve challenging technical problems, often ... one of several AWS tools used for building Generative AI on AWS. The Neuron Compiler Engineering team is ... for identifying and designing solutions that enable efficient and reliable build, test, and release mechanisms for the Neuron…
    Amazon (12/01/25)
  • Principal Software Developer - OCI AI

    Oracle (Nashville, TN)
    …Work closely with a collaborative and experienced global team. - Expand your knowledge in AI, cloud computing, and distributed systems. - Contribute to one ... tools to operationalize Large Language Models (LLMs) and agentic AI systems. Our goal is to empower ... will contribute to the design and implementation of scalable, distributed systems that serve LLMs and support…
    Oracle (11/25/25)
  • Principal Software Engineer- OCI AI

    Oracle (San Juan, PR)
    …Work closely with a collaborative and experienced global team. - Expand your knowledge in AI, cloud computing, and distributed systems. - Contribute to one ... tools to operationalize Large Language Models (LLMs) and agentic AI systems. Our goal is to empower ... will contribute to the design and implementation of scalable, distributed systems that serve LLMs and support…
    Oracle (11/25/25)
  • Senior Software Engineer, AI Resiliency

    NVIDIA (Santa Clara, CA)
    …and inference more reliable, scalable, and efficient. If you're passionate about AI, distributed systems, and high-performance computing, we want to hear ... driving down cluster downtime towards zero, ensuring that our AI systems remain robust and reliable ... detection. + Hands-On Coding & Optimization: Contribute to large-scale distributed systems with high-quality, production-level C++ and…
    NVIDIA (10/15/25)
  • Software Developer 3 - OCI AI Platform

    Oracle (Columbus, OH)
    …. This is a highly technical, hands-on role where you'll build large-scale distributed systems, optimize AI/ML workflows, and collaborate with ... observability, CI/CD pipelines, and operational excellence. Troubleshoot complex issues in distributed systems and participate in on-call rotations as needed.…
    Oracle (11/25/25)