  • Senior Systems Software Engineer, AV…

    NVIDIA (Santa Clara, CA)
    …you will work with internal teams and external partners to integrate distributed systems, manage large-scale data pipelines, and operationalize next-generation ... pipelines using Go, Python, Bash, and Bazel to ensure reproducibility, efficiency, and reliable distributed execution. + Integrate simulation and drive logs (eg…
    NVIDIA (09/19/25)
  • Software Engineer - Distributed

    Rubrik (Palo Alto, CA)
    …/Kernel or Networking domain + Strong fundamentals in data structures, algorithms, and distributed systems design + Strong background in Systems Programming ... and CTO, our mission is to build a highly reliable, secure, and scalable software-defined platform. We are the ... Go, and either C++, Java, or Scala + Large distributed systems design and development experience is…
    Rubrik (08/07/25)
  • Senior Distributed Software Engineer,…

    NVIDIA (Santa Clara, CA)
    …achieve this goal, we are looking for an engineer with a deep understanding of distributed systems, outstanding design skills, and a track record in building and ... the broader NVIDIA team to design and build a reliable, scalable, and efficient storage-as-a-service tailored to AI ... years of industry experience + Strong background in developing distributed systems involving Golang, Kubernetes, and Cloud…
    NVIDIA (08/08/25)
  • Engineering Manager - Rack Scale AI

    NVIDIA (Santa Clara, CA)
    …Lead IPP's (Infrastructure, Planning and Process) Cloud Platform Team focused on Rack Scale AI Systems. IPP is a global organization within NVIDIA. This group ... of cloud design in the areas of virtualization and global infrastructure, distributed systems, load balancing and security + Excellent thought process…
    NVIDIA (07/29/25)
  • Software Engineer, SystemML - AI Networking

    Meta (Menlo Park, CA)
    …leverage our large-scale GPU training and inference fleet through an observable, reliable and high-performance distributed AI/GPU communication stack. ... learning domains: Distributed ML Training, GPU architecture, ML systems, AI infrastructure, high performance computing, performance optimizations, or…
    Meta (08/01/25)
  • Senior Software Engineer, AI Resiliency

    NVIDIA (Santa Clara, CA)
    …and inference more reliable, scalable, and efficient. If you're passionate about AI, distributed systems, and high-performance computing, we want to hear ... driving down cluster downtime towards zero, ensuring that our AI systems remain robust and reliable ... detection. + Hands-On Coding & Optimization: Contribute to large-scale distributed systems with high-quality, production-level C++ and…
    NVIDIA (07/22/25)
  • Software Manager, AI Infrastructure System

    NVIDIA (Santa Clara, CA)
    …to encouraging an inclusive and diverse workplace. + Hands-on experience developing large-scale distributed systems Ways to stand out from the crowd: + Strong ... orgs to build products that use LLMs and agent systems to serve the needs of NVIDIA engineering teams. ... the product/team. + Develop and execute strategies for scalable, reliable, and secure AI infrastructure supporting both…
    NVIDIA (09/30/25)
  • (USA) Principal, Software Engineer - AI

    Walmart (Sunnyvale, CA)
    …build dynamic, context-aware systems. 2. **Architecture & Scalability:** + Architect scalable, distributed AI systems with a focus on performance, fault ... to lead the design, development, and deployment of advanced AI systems. This role involves architecting scalable ... Walmart GTP, you will be building highly scalable and reliable APIs, services and applications which will drive the…
    Walmart (08/28/25)
  • Principal, Software Engineer - Gen AI

    Walmart (Sunnyvale, CA)
    …build dynamic, context-aware systems. 2. **Architecture & Scalability:** + Architect scalable, distributed AI systems with a focus on performance, fault ... to lead the design, development, and deployment of advanced AI systems. This role involves architecting scalable ... Backend team, you will be building highly scalable and reliable APIs, services and applications which will drive the…
    Walmart (09/20/25)
  • Sr Machine Learning Engineer - GenAI, LLM, Agentic…

    eightfold.ai (Santa Clara, CA)
    …with opportunities. Responsibilities: + Research, design, development, and deployment of advanced AI agents and agentic systems. + Architect and implement ... About Eightfold.ai: Eightfold.ai is revolutionizing HR technology ... is at the forefront of developing intelligent, autonomous systems that will redefine talent management. We are building…
    eightfold.ai (08/08/25)