  • Senior AI-HPC Cluster Engineer

    NVIDIA (Santa Clara, CA)
    …doing: + Provide leadership and strategic guidance on the management of large-scale HPC systems including the deployment of compute, networking, and storage. + ... or LSF + Proficient in administering CentOS/RHEL and/or Ubuntu Linux distributions + Solid understanding of cluster configuration managements...IBOP and RDMA + Understanding of fast, distributed storage systems like Lustre and GPFS for AI/HPC
    NVIDIA (04/02/25)
  • Senior AI-HPC Storage Engineer

    NVIDIA (Santa Clara, CA)
    …Make the choice to join us today! As a member of the GPU AI/HPC Infrastructure team, you will provide leadership in the design and implementation of ground ... implementation of distributed storage services. + Design, implement an on-prem AI/HPC infrastructure supplemented with cloud computing to support the growing needs…
    NVIDIA (05/07/25)
  • Senior Software Architect - Deep Learning…

    NVIDIA (Santa Clara, CA)
    …like NCCL, NVSHMEM, and UCX that are crucial for scaling Deep Learning and HPC. We're seeking a Senior Software Architect to help co-design next-gen data ... to grow with the increasing scale of next generation systems. This is an outstanding opportunity to advance the...topology, algorithms, and communication scaling relevant to AI and HPC workloads. + Strong experience with Linux.…
    NVIDIA (05/05/25)
  • Senior Site Reliability Engineer,…

    NVIDIA (Santa Clara, CA)
    …As a Site Reliability Engineer, you are responsible for the big picture of how our systems relate to each other, we use a breadth of tools and approaches to tackle a ... + Manage and support workload and resource schedulers in a large-scale HPC environment. + Automate Everything: Develop automation scripts to automate deployment,…
    NVIDIA (04/04/25)
  • Sr. Software Development Engineer, HPC/ML…

    Amazon (Cupertino, CA)
    …We are seeking an experienced engineer to work on distributed AI/ML systems. This role involves working on collective operations - the fundamental operations ... Most of our stack is C/C++ and relatively low level, so solid knowledge of Linux, kernels, and performant code is important. Experience with embedded systems is…
    Amazon (03/21/25)
  • Software Development Engineer, Nitro High Memory…

    Amazon (Sunnyvale, CA)
    …engineers with systems knowledge and experience in area such as Linux OS boot sequencing, Kernel, Hypervisor (Xen or KVM), peripheral device development (PCIe ... we're building an environment that celebrates knowledge sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews.…
    Amazon (04/29/25)
  • Senior High Performance Computing Engineer

    SLAC National Accelerator Laboratory (Menlo Park, CA)
    …parallel applications (e.g., gdb, Valgrind, NVIDIA Nsight). + In-depth knowledge of Linux operating systems and advanced shell scripting. + Proven expertise ... Senior High Performance Computing Engineer Job ID 6383...role in managing and optimizing our High Performance Computing (HPC) environment in support of these groundbreaking scientific projects.…
    SLAC National Accelerator Laboratory (04/26/25)
  • Senior System Software Engineer, NCCL…

    NVIDIA (Santa Clara, CA)
    …We deliver communication runtimes like NCCL and NVSHMEM for Deep Learning and HPC applications. We are looking for a motivated Partner Enablement Engineer to guide ... our key partners and customers with NCCL. Most DL/HPC applications run on large clusters with high-speed networking...Develop tools and automation to isolate issues on new systems and platforms, including cloud platforms (Azure, AWS, GCP,…
    NVIDIA (04/22/25)
  • Senior Solutions Architect, Networking…

    NVIDIA (Santa Clara, CA)
    …on networking and help develop accelerated computing networking solutions for AI/ML and HPC on hyperscalers. As part of the NVIDIA Solutions Architecture team, you ... develop solutions for customer performance issues for both AI workload and systems performance. What we need to see: + BS/MS/PhD in Electrical/Computer Engineering,…
    NVIDIA (04/09/25)
  • Senior Storage and Data Production Engineer

    NVIDIA (Santa Clara, CA)
    …in algorithms, data structures, complexity analysis, software design, and maintaining large-scale Linux-based storage systems. + Experience in one or more of ... a team that involves designing, building, and maintaining large-scale production systems with high efficiency and availability. It encompasses various areas,…
    NVIDIA (04/05/25)