  • Senior System Software Engineer, NCCL…

    NVIDIA (Santa Clara, CA)
    …We deliver communication runtimes like NCCL and NVSHMEM for Deep Learning and HPC applications. We are looking for a motivated Partner Enablement Engineer ... guide our key partners and customers with NCCL. Most DL/HPC applications run on large clusters with high-speed networking ... Develop tools and automation to isolate issues on new systems and platforms, including cloud platforms (Azure, AWS, GCP,…
    NVIDIA (07/07/25)
  • Modeling & Simulation Systems

    Northrop Grumman (Los Angeles, CA)
    …is currently looking for an experienced Principal or Senior Principal level engineer in Modeling and Simulations, Systems engineering or software ... to modeling and simulation. Basic Qualifications for a Principal Engineer, Modeling and Simulation Systems/Software - (Level ... running Monte Carlo simulations + Experience working with an HPC system + Experience with hardware in the loop…
    Northrop Grumman (08/08/25)
  • Production Systems Engineer, Fleet…

    Meta (Menlo Park, CA)
    Summary: Meta is seeking an experienced Production Systems Engineer to join our Release to Production (RTP) team. Our servers and data centers are the ... and lifecycle of servers in production. Required Skills: Production Systems Engineer, Fleet AI Systems ... contributor 18. 3+ years of experience supporting AI or HPC systems and/or related systems,…
    Meta (08/01/25)
  • Senior Software Engineer - Parallel…

    NVIDIA (Santa Clara, CA)
    Do you have expertise in CUDA kernel optimization, C++ systems programming, or compiler infrastructure? Join NVIDIA's nvFuser (https://github.com/NVIDIA/Fuser) team ... of GPUs! We're looking for engineers who excel at parallel programming and systems-level performance work and want to directly impact the future of AI compilation.…
    NVIDIA (06/07/25)
  • Senior Software Development Engineer

    Amazon (Cupertino, CA)
    Description: We are seeking an experienced engineer to work on distributed AI/ML systems. This role involves working on collective operations - the fundamental ... kernels, and performant code is important. Experience with embedded systems is valued, and experience with high-speed networking or HPC interconnects is valued highly. If you like solving…
    Amazon (07/11/25)
  • Site Reliability Engineer, GNC (Falcon)

    SpaceX (Hawthorne, CA)
    SpaceX was founded under the belief that a future where humanity is out exploring the stars is ... goal of enabling human life on Mars. SpaceX is looking for a Site ... simulations on a high-performance computing cluster, automated data analysis systems, continuous integration systems for rocket and…
    SpaceX (05/14/25)
  • Senior AI Observability Engineer

    NVIDIA (Santa Clara, CA)
    …Senior AI Observability Engineer to help architect and implement distributed observability systems for AI and HPC clusters. We serve and collaborate directly ... You will be working with a team of dedicated engineers on systems for data collection, aggregation, enrichment, storage, retrieval, and visualization to…
    NVIDIA (07/22/25)
  • Distinguished Engineer, AI Resiliency

    NVIDIA (Santa Clara, CA)
    …AI, ideally across the entire lifecycle, from design to deployment, of large-scale High-Performance Computing (HPC) systems. Ways to Stand Out from the Crowd: + ... We are seeking a Distinguished Engineer for AI Resiliency at NVIDIA! Join NVIDIA ... GPU, memory, storage, and networking. + Experience in implementing HPC software development best practices in large-scale systems…
    NVIDIA (07/12/25)
  • Research Data Center Facility Engineer

    Stanford University (Stanford, CA)
    …researchers from a variety of Stanford and SLAC organizations. The majority of the HPC systems are hosted in the Stanford Research Computing Facility (SRCF), ... Research Data Center Facility Engineer. Business Affairs: University IT (UIT), Stanford, California, ... Stanford Research Computing. Research Computing offers High Performance Computing (HPC) hosting services, computational and data systems,…
    Stanford University (08/07/25)
  • Analytics DevOps and Platform Engineer

    UCLA Health (Los Angeles, CA)
    …UCLA Health IT is looking for an outstanding Analytics DevOps and Platform Engineer (IT Architect) to join the Solutions Architecture and Engineering (SAE) group. ... will possess a well-rounded skillset encompassing software development, knowledge of HPC and Citrix environments, and relevant cloud certifications. We are looking…
    UCLA Health (05/22/25)