• Senior HPC Engineer, Infrastructure…

    NVIDIA (Santa Clara, CA)
    …and to power data centers. Join the team building many of the largest and fastest AI/HPC systems in the world! NVIDIA is looking for someone with the ... and internal teams to analyze, define, and implement large-scale AI/HPC projects. These efforts include a combination ... they begin rolling out some of the most sophisticated systems in the world! + Provide feedback to internal…
    NVIDIA (06/12/25)
  • Sr. Software Development Engineer, HPC/ML…

    Amazon (Cupertino, CA)
    Description: We are seeking an experienced engineer to work on distributed AI/ML systems. This role involves working on collective operations - the fundamental ... operations that enable AI to scale across multiple accelerators & servers. Most ... building networking solutions for Machine Learning (ML) and High-Performance Computing (HPC) workloads on AWS. We…
    Amazon (05/14/25)
  • Senior Software Engineer - HPC

    NVIDIA (Santa Clara, CA)
    …long-term maintenance strategy. What you'll be doing: + Design highly available and scalable systems to meet the demands of our HPC clusters + Evaluate new and ... graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI and enabled the next era of computing. NVIDIA is a "learning…
    NVIDIA (05/28/25)
  • Senior HPC Architect, Networking

    NVIDIA (Santa Clara, CA)
    …improved workflows and develop new, leading differentiated solutions. You will interact with HPC, OS, GPU compute, and systems specialists to architect, develop ... parallel computing. More recently, GPU deep learning ignited modern AI - the next era of computing. NVIDIA is ... looking for an outstanding hands-on architect/engineer for a Senior HPC architect role to support deployment and bring-up of…
    NVIDIA (07/09/25)
  • Senior Site Reliability Engineer, HPC

    NVIDIA (Santa Clara, CA)
    …experience. Ways to stand out from the crowd: + Experience analyzing and tuning performance for a variety of HPC or EDA workloads. + Solid understanding ... NVIDIA is the leader in AI, machine learning and datacenter acceleration. NVIDIA is ... and operate these clusters at high reliability, efficiency, and performance and drive foundational improvements and automation to improve…
    NVIDIA (07/03/25)
  • HPC Middleware Developer

    NVIDIA (Santa Clara, CA)
    …Networking Protocols InfiniBand, Ethernet + Knowledge in computer architecture and operating systems + Experience in performance optimizations + MSc or ... We are now looking for a senior HPC software engineer. As a member of our High Performance Computing Software development team, you will be responsible for…
    NVIDIA (06/30/25)
  • Research Scientist, AI & Systems

    Meta (Menlo Park, CA)
    …on existing accelerator systems and guiding the future of models and AI HW at Meta. This drives improved performance, new model architectures and ... the following areas: Accelerators/GPU architectures, High Performance Computing (HPC), Machine Learning Compilers, Training/Inference ML Systems, Model…
    Meta (07/13/25)
  • Senior Site Reliability Engineer - AI

    NVIDIA (Santa Clara, CA)
    …infrastructure. + Passion for solving complex technical challenges and optimizing system performance. + Experience with AI/HPC advanced job schedulers, and ... support operational and reliability aspects of large-scale distributed systems with focus on performance at scale, ... storage systems like Lustre and GPFS for AI/HPC workloads. + Familiarity with deep learning…
    NVIDIA (06/25/25)
  • Software Manager, AI Infrastructure System

    NVIDIA (Santa Clara, CA)
    …maintain infrastructure and large-scale applications for LLM-based solutions. Optimize these systems for performance, scalability, reliability, and secure data ... Strong technical background in cloud/distributed infrastructure + Experience debugging functional and performance issues in HPC GPU clusters + Background in…
    NVIDIA (07/01/25)
  • Software Engineering Manager - AI

    Meta (Menlo Park, CA)
    …Qualifications: 7. Experience in leading teams working on high performance computing (HPC) and AI/ML systems, including: 8. Communication libraries (e.g., ... of Meta AI infrastructure! Required Skills: Software Engineering Manager - AI Systems Co-Design Responsibilities: 1. Lead and support the communications…
    Meta (07/02/25)