• Senior Datacenter System Software Architect - DGX…

    NVIDIA (Santa Clara, CA)
    NVIDIA is hiring engineers to scale up its AI Infrastructure. We expect you to have a strong programming background, a deep understanding of distributed systems, ... capacity to build and deploy leading infrastructure solutions for a broad range of AI-based applications that affect core data science. What are you waiting for if…
    NVIDIA (11/15/25)
  • Software Engineering Manager, MTIA

    Meta (Menlo Park, CA)
    …10. Experience in leading teams working on high performance computing (HPC) and AI/ML systems, including: GPU/ASIC-based kernel development and ... ROCm), distributed systems for large-scale training and serving, and systems architecture and performance 11. Accelerator (GPU/ASIC) kernel development and…
    Meta (09/06/25)
  • Principal Product Manager, HBM

    Micron Technology, Inc. (San Jose, CA)
    …in growing the Artificial Intelligence (AI), Machine Learning (ML) and High-Performance Computing (HPC) business segments. You will be working on innovative ... of Work (SOWs), business term sheets, and other customer-facing documents for high-performance memory products. + Represent the Product Management team in Product…
    Micron Technology, Inc. (11/14/25)
  • Senior Research Engineer, Foundation Model…

    NVIDIA (Santa Clara, CA)
    …works on multimodal foundation models, large-scale robot learning, embodied AI, and physics simulation. Our past projects include Eureka ... What you will be doing: + Design and maintain large-scale distributed training systems to support multi-modal foundation models for robotics. + Optimize GPU and…
    NVIDIA (09/05/25)
  • Senior Datacenter Resiliency Architect

    NVIDIA (Santa Clara, CA)
    …GPUs and SOCs powering product lines for the growing field of artificial intelligence (AI) and high-performance computing (HPC). What you'll be doing: + ... features to improve system Reliability, Availability, Serviceability (RAS), and performance in the Datacenter. + Model and analyze RAS ... parallel computing. More recently, GPU deep learning ignited modern AI, the next era of computing, with…
    NVIDIA (11/15/25)
  • Software Engineer - Host Networking

    Meta (Menlo Park, CA)
    …networks, powering our global data centers and supporting cutting-edge technologies like AI, Generative AI, Recommendation engines, and Metaverse. Our network ... to join our teams and help build scalable distributed systems, develop innovative solutions to our challenges, and ship ... firmware, and software for network devices, transport stacks, and AI workloads 2. Debug complex system-level issues and lead…
    Meta (11/08/25)
  • Software Engineer, Cuda-Q Libraries

    NVIDIA (CA)
    …developing the CUDA-Q platform for programming powerful hybrid quantum-classical multi-processor systems. We are looking for a dedicated engineer with expertise in ... real-time systems, GPU programming, and is proficient in parallel and distributed ... If you love the craft of software engineering and high-performance algorithm implementation, and relish the chance to impact…
    NVIDIA (09/07/25)
  • Senior Platform Telemetry Engineer

    NVIDIA (Santa Clara, CA)
    …GH200 superchip provides performance and productivity required for strong scaling for HPC and generative AI workloads. Scale-out is inherent to the design ... the world. Today, we are increasingly known as "the AI computing company." We are looking to grow our ... & analysis engines. Experience with Redfish. Experience with notification systems like PagerDuty. + Active Open Compute (OCP) and…
    NVIDIA (11/14/25)
  • Senior Deep Learning Engineer - Autonomous…

    NVIDIA (Santa Clara, CA)
    …12+ years of professional experience building and scaling high-performance distributed systems, ideally in ML, HPC, or large-scale data infrastructure. + ... is preferred), large-scale training (DDP/FSDP, NCCL, tensor/pipeline parallelism), and performance profiling. + Strong systems background: datacenter networking…
    NVIDIA (10/03/25)