• Software Engineer, SystemML - AI…

    Meta (Menlo Park, CA)
    …**Preferred Qualifications:** 7. Experience with NCCL and distributed GPU performance analysis on RoCE/InfiniBand 8. PhD in Computer ... GPU training and inference fleet through an observable, reliable and high-performance distributed AI/GPU communication stack. Currently, one of the team's…
    Meta (12/20/25)
  • Senior High-Performance LLM Training…

    NVIDIA (Santa Clara, CA)
    We are now looking for a Senior High-Performance LLM Training Engineer! NVIDIA is seeking experienced engineers specializing in performance analysis and ... to work across the full hardware & software stack, from GPU architecture to application code, to achieve optimal performance, we want to hear from you! #LI-Hybrid…
    NVIDIA (10/08/25)
  • Senior System Software Engineer - AI…

    NVIDIA (Santa Clara, CA)
    …role involves developing tools for AI researchers and SW/HW teams running AI workloads in GPU clusters. As a member of the software development team, we will work with ... debugging tricky failures and issues to help improve the performance and efficiency of the system. What you'll be ... Create benchmarking and simulation technologies for AI systems or GPU clusters + Partner with HW architects to propose…
    NVIDIA (12/19/25)
  • Senior DGX AI Cloud Performance Analysis…

    NVIDIA (Santa Clara, CA)
    …+ Work with various teams at NVIDIA to incorporate and influence the latest technologies for GPU performance analysis What we need to see: + Minimum of 8+ years ... to convert profiling data into actionable optimizations + Support deep learning software engineers and GPU architects in their performance analysis efforts…
    NVIDIA (12/07/25)
  • Sr. ML Kernel Performance Engineer

    Amazon (Cupertino, CA)
    …OpenCL, SYCL, or ROCm - Demonstrated experience with NVIDIA PTX and/or AMD GPU ISA - Experience developing high-performance libraries for HPC applications ... Kernel Library team is at the forefront of maximizing performance for AWS's custom ML accelerators. Working at the ... GPUs, CPUs, FPGAs, or custom architectures - Experience with GPU kernel optimization and GPGPU computing such as CUDA,…
    Amazon (11/14/25)
  • Performance Engineer, Embedded…

    NVIDIA (Santa Clara, CA)
    …potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the ... applications. NVIDIA is driven to deliver the best possible performance which allows researchers and scientists to do more ... and passionate about joining NVIDIA. + Familiarity with NVIDIA GPU products and technologies. + Experience using GPU…
    NVIDIA (11/27/25)
  • AI Senior Staff Systems Engineer

    Cadence Design Systems, Inc. (San Jose, CA)
    …for the entire lifecycle of our AI systems, from architecting and building high-performance GPU clusters to deploying and optimizing our most advanced AI ... and manage monitoring solutions for system health, job statuses, GPU utilization, and container performance to proactively ... Proven track record as a Principal or Senior Staff Engineer. + Expert-level knowledge of NVIDIA GPU…
    Cadence Design Systems, Inc. (12/29/25)
  • Software Engineer II and Senior Software…

    Microsoft Corporation (Mountain View, CA)
    **Overview** The Artificial Intelligence Performance team at Microsoft develops AI software that enables running AI models everywhere, from the world's fastest AI ... on a collaborative and inclusive culture. We own inference performance of OpenAI and other state-of-the-art ... Bing, SQL Server, and Dynamics. As a Senior Software Engineer on the team, you will have the opportunity…
    Microsoft Corporation (01/01/26)
  • Principal Software Engineer - Dynamo

    NVIDIA (Santa Clara, CA)
    …AI inference frameworks (e.g., vLLM, TensorRT-LLM, SGLang). + Experience with GPU resource scheduling, cache management, or high-performance networking. + ... scalable inference for large language and reasoning models in distributed GPU environments. By bringing to bear sophisticated techniques in serving architecture,…
    NVIDIA (01/01/26)
  • Principal Software Engineer - Large-Scale…

    NVIDIA (Santa Clara, CA)
    …and reasoning models across multi-node distributed environments. Built in Rust for performance and Python for extensibility, Dynamo orchestrates GPU shards, ... outgrow the memory and compute budget of any single GPU, this platform enables efficient, resilient deployment of cutting-edge LLM workloads. We are seeking a Principal Systems Engineer to define the vision and roadmap for memory…
    NVIDIA (12/22/25)