• Senior HPC Cluster

    NVIDIA (Santa Clara, CA)
    …make a lasting impact on the world. We are seeking a highly skilled and experienced HPC Cluster Engineer to design, deploy, and operate GPU Compute Clusters ... + Provide leadership and strategic mentorship on the management of large-scale HPC systems including the deployment of compute, networking, and storage. + Develop… more
    NVIDIA (09/17/25)
  • Senior AI-HPC Cluster

    NVIDIA (Santa Clara, CA)
    …+ Provide leadership and strategic mentorship on the management of large-scale HPC systems including the deployment of compute, networking, and storage. + Develop ... and operating large scale compute infrastructure. + Experience with AI/HPC job schedulers and orchestrators, such as Slurm, K8s or LSF. Applied experience with AI/HPC workflows that use MPI and NCCL. + Proficient… more
    NVIDIA (07/31/25)
  • Senior Platform Engineer

    Travelers Insurance Company (Hartford, CT)
    Senior Platform Engineer to support and manage our High-Performance Computing (HPC) Bright cluster environment, which is essential for our Large Language ... will provide backup support, management, and modernization of the High-Performance Compute Cluster (Nvidia Bright Cluster) and GPU workloads, enabling Travelers'… more
    Travelers Insurance Company (08/15/25)
  • Senior HPC Engineer

    Texas A&M University System (College Station, TX)
    Job Title: Senior HPC Engineer. Agency: Texas A&M University. Department: Technology Services - IT Enterprise Operations. Proposed Minimum Salary: Commensurate. Job ... members' faculty and staff providing cutting-edge research and super computing needs. As a Senior High Performance Computing (HPC) Engineer, you will provide… more
    Texas A&M University System (10/03/25)
  • Senior GPU and HPC Infrastructure…

    NVIDIA (Santa Clara, CA)
    …and planning abilities. Experience working with High Performance Computing (HPC), GPUs, and high-performance networking (RDMA, InfiniBand, RoCE) are strongly ... will be harnessing multiple data streams, ranging from GPU hardware diagnostics to cluster and network telemetry. + Work on software that manages NVLINK topography… more
    NVIDIA (07/10/25)
  • Senior HPC Support Engineer

    NVIDIA (Seattle, WA)
    We are seeking a motivated Senior HPC Technical Support Engineer - AI Infrastructure focusing on InfiniBand, NVLink and AI GPU Cluster technology, ... + InfiniBand, RDMA, NVLink and NVIDIA GPU Technology + Clustering or HPC Data-Center technologies including Upper Layer Protocols (i.e., MPI, NCCL) + Additional… more
    NVIDIA (09/09/25)
  • Senior HPC Linux System…

    Leidos (Atlanta, GA)
    Description: The Public Health and Human Services Operation of Leidos is seeking a Senior HPC Linux System Administrator to lead a team of system ... planning, coordinating infrastructure support activities, leading and mentoring system administrators + HPC and cluster management: Proven experience with HPC… more
    Leidos (09/30/25)
  • Senior Solutions Architect, HPC

    NVIDIA (TX)
    NVIDIA is looking for an experienced GPU and network systems Solutions Architect & Engineer. Do you want to be part of a team that brings new Artificial Intelligence ... center GPU server and networking system deployments as Solution Architect Engineer. Guide customer discussions on network design, compute/storage and support bring… more
    NVIDIA (09/03/25)
  • Senior ML Platform Engineer, AI…

    NVIDIA (Santa Clara, CA)
    …Make the choice to join us today! As a member of the GPU AI/HPC Infrastructure team, you will provide leadership in the design and implementation of ground ... + Provide leadership and strategic guidance on the management of large-scale HPC systems including the deployment of compute, networking, and storage. + Develop… more
    NVIDIA (08/21/25)
  • Senior Site Reliability Engineer

    NVIDIA (Santa Clara, CA)
    …artificial intelligence. Join our team at NVIDIA as a Senior Site Reliability Engineer focused on HPC storage and play a crucial role in designing, ... software + Experience with RDMA (InfiniBand or RoCE) fabrics + Background with HPC cluster management tools such as Slurm, PBS, LSF, etc. + Passionate and… more
    NVIDIA (08/21/25)