• AI/HPC Network Engineer

    Meta (Menlo Park, CA)
    …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI/HPC Network Engineer Responsibilities: 1. Design, develop, test and ... a lossless fabric interconnect. To improve the performance of these systems, we constantly look for opportunities across our infrastructure ... operate networking systems to support large-scale AI training jobs. 2. …
    Meta (05/08/25)
  • Software Development Engineer, Nitro High…

    Amazon (Sunnyvale, CA)
    …assign projects based on what will help each team member develop into a better-rounded engineer and enable them to take on more complex tasks in the future 10017 ... line. The Nitro team is looking for engineers with systems knowledge and experience in areas such as Linux ... high-performance computing workloads. The Nitro High Memory and HPC team owns the purpose-built platform development for…
    Amazon (04/29/25)
  • Network Development Engineer I, HPC

    Amazon (Cupertino, CA)
    Description: Do you like to use network and Unix systems engineering to deliver simple, sustainable, and repeatable solutions? Would you like to play a key role in ... Core Networking team is looking for a Network Development Engineer to join our Network Fabric Engineering (NFE) team. As a Network Development Engineer, you will be responsible for building, deploying and…
    Amazon (04/08/25)
  • HPC Middleware Developer

    NVIDIA (Santa Clara, CA)
    We are now looking for a senior HPC software engineer. As a member of the High Performance Computing Software development team, you will be responsible for ... InfiniBand, Ethernet + Deep knowledge of computer architecture and operating systems + Experience in performance optimization + MSc or equivalent experience…
    NVIDIA (05/17/25)
  • Software Engineer, Accelerator…

    Meta (Menlo Park, CA)
    …Qualifications:** Preferred Qualifications: 11. Full-stack experience and understanding of AI/HPC systems, from HW/infrastructure through the application layer, ... in some of the world's largest-scale clusters. **Required Skills:** Software Engineer, Accelerator Systems & Technologies Responsibilities: 1. Understand and…
    Meta (05/01/25)
  • Hardware Systems Engineer, NPI AI…

    Meta (Menlo Park, CA)
    **Summary:** Meta is seeking a Systems Engineer to join our Release to Production (RTP) team working on AI/ML initiatives supporting large-scale AI training and ... to hyperscaler bring-up and validation. **Required Skills:** Hardware Systems Engineer, NPI AI Lead Responsibilities: 1. ... rack level and at scale, as well as debugging AI/HPC systems, performance optimizations, including familiarity with…
    Meta (05/14/25)
  • Hardware Systems Engineer, NPI AI

    Meta (Menlo Park, CA)
    **Summary:** Meta is seeking a Systems Engineer to join our Release to Production (RTP) team working on AI/ML initiatives supporting large-scale AI training and ... to Meta Silicon hyperscaler bring-up and validation. **Required Skills:** Hardware Systems Engineer, NPI AI Responsibilities: 1. Lead the bring-up, validation,…
    Meta (04/24/25)
  • Senior High Performance Computing Engineer

    SLAC National Accelerator Laboratory (Menlo Park, CA)
    Senior High Performance Computing Engineer. Job ID: 6383. Location: SLAC, Menlo Park, CA. Full-Time Regular. **About SLAC:** The SLAC National ... hybrid work options. **Position Overview:** As a Senior High Performance Computing Engineer in the Scientific Computing Services Division of the Technology and…
    SLAC National Accelerator Laboratory (04/26/25)
  • Senior System Software Engineer, NCCL…

    NVIDIA (Santa Clara, CA)
    …We deliver communication runtimes like NCCL and NVSHMEM for deep learning and HPC applications. We are looking for a motivated Partner Enablement Engineer ... guide our key partners and customers with NCCL. Most DL/HPC applications run on large clusters with high-speed networking ... Develop tools and automation to isolate issues on new systems and platforms, including cloud platforms (Azure, AWS, GCP,…
    NVIDIA (04/22/25)
  • Senior Site Reliability Engineer - AI…

    NVIDIA (Santa Clara, CA)
    …Experience with cloud deployment, BCM, Terraform. + Understanding of fast, distributed storage systems like Lustre and GPFS for AI/HPC workloads. + Familiarity ... diverse team today! As a member of the GPU AI/HPC Infrastructure team, you will provide leadership in the ... automation to improve researchers' productivity. As a Site Reliability Engineer, you are responsible for the big picture of…
    NVIDIA (03/26/25)