• Software Engineer, Systems ML - PyTorch…

    Meta (Bellevue, WA)
    …industry experience in developing compilers, ML systems, ML accelerators, GPU performance, and similar. **Preferred Qualifications:** ... open source, cutting-edge, and industry leading. **Required Skills:** Software Engineer, Systems ML - PyTorch Compiler, PyTorch Framework, PyTorch ... 10. Expert knowledge of GPU or ML accelerator performance and developing…
    Meta (05/03/25)
  • Software Engineer - XR Codec Interactions…

    Meta (Redmond, WA)
    …research GPU super clusters. You are a hybrid software/systems/infrastructure engineer who ensures that Meta's Research Super Clusters run smoothly and have ... with working on the frontiers of research. In this software engineer role, you will serve as the point of ... every day at Meta's large scale ML model training GPU clusters, and we are always learning. **Required Skills:**…
    Meta (05/01/25)
  • Machine Learning Engineer II, AWS…

    Amazon (Seattle, WA)
    …- Develop robust monitoring and debugging tools to ensure the reliability and performance of training workflows on large GPU clusters. Design and maintain ... to process massive data, scale machine learning models while optimizing GPU utilization, memory management, and the training workflows (like kernel fusion,…
    Amazon (05/01/25)
  • Senior ASIC Design Engineer

    NVIDIA (WA)
    We are now looking for a Senior ASIC Design Engineer. NVIDIA is seeking ASIC Design Engineers to implement the world's leading SoCs and GPUs. This position ... be doing: + As a key member of the GPU Design team, you will implement, document and deliver high-performance, area- and power-efficient RTL to achieve design…
    NVIDIA (04/30/25)
  • Principal, Software Engineer

    Walmart (Bellevue, WA)
    **Position Summary** We are seeking a highly experienced Principal Software Engineer specializing in AI Systems to join our team and play a pivotal role in ... robust infrastructure for our Generative AI-powered applications, deploying LLMs on GPU instances, supporting advanced AI research and development within our public…
    Walmart (03/28/25)
  • Deep Learning Engineer

    GE HealthCare (Bellevue, WA)
    **Job Description Summary** As a Deep Learning Engineer, you will play a crucial role in bridging the gap between AI science and production, helping to train AI ... and resolve challenges relating to large-scale model training involving multi-GPU and/or distributed training regimes. + Demonstrating algorithms to meet…
    GE HealthCare (04/08/25)
  • Hardware Systems Engineer, AI NPI

    Meta (Bellevue, WA)
    …the new product introduction (NPI) phase. **Required Skills:** Hardware Systems Engineer, AI NPI Responsibilities: 1. Drive and execute end-to-end system validation ... or more of the following modules/domains: PCIe, NVLink, Networking, Flash, Memory, CPU, GPU, TPU, DRAM (DDR4/5 or HBM), AI silicon/AI accelerators 15. 3+ years of…
    Meta (02/05/25)
  • Senior Compiler Engineer - Deep Learning

    NVIDIA (Redmond, WA)
    …inference and training (e.g., FlashInfer, Flash Attention) + Strong experience in GPU performance optimizations + Strong experience in machine learning systems ... next era of computing. An era in which our GPU acts as the brains of computers, robots, and ... are rapidly growing. If you're a creative and autonomous engineer with a real passion for technology, we want…
    NVIDIA (04/23/25)
  • Software Engineer, Systems ML - GenAI…

    Meta (Bellevue, WA)
    …GPUs/CPUs - Seamless eval reporting, analysis and debugging **Required Skills:** Software Engineer, Systems ML - GenAI Evals Platform Responsibilities: 1. Design ... facilitating development of novel evals and judges 3. Optimize GPU/CPU utilization to maximize efficiency and speed ... CPU, or AI hardware accelerators 15. Experience in system performance optimizations such as runtime analysis of latency, memory…
    Meta (04/17/25)
  • Software Development Engineer II, Ml_ai

    Amazon (Bellevue, WA)
    …for large scale deep learning model training (100+ billion parameter GPT, 1000s of GPU devices). You have a proven track record of bringing innovative research to ... prior experience in one of: resource orchestrators like Slurm/Kubernetes, high-performance computing, building scalable systems, experience in large language model…
    Amazon (04/03/25)