  • Senior AI-HPC Cluster Engineer

    NVIDIA (Santa Clara, CA)
    …doing: + Provide leadership and strategic guidance on the management of large-scale HPC systems including the deployment of compute, networking, and storage. + ... or LSF + Proficient in administering CentOS/RHEL and/or Ubuntu Linux distributions + Solid understanding of cluster configuration managements...IBOP and RDMA + Understanding of fast, distributed storage systems like Lustre and GPFS for AI/HPC
    NVIDIA (04/02/25)
  • Senior AI-HPC Storage Engineer

    NVIDIA (Santa Clara, CA)
    …Make the choice to join us today! As a member of the GPU AI/HPC Infrastructure team, you will provide leadership in the design and implementation of ground ... implementation of distributed storage services. + Design, implement an on-prem AI/HPC infrastructure supplemented with cloud computing to support the growing needs…
    NVIDIA (05/07/25)
  • Senior Software Architect - Deep Learning…

    NVIDIA (Santa Clara, CA)
    …like NCCL, NVSHMEM, and UCX that are crucial for scaling Deep Learning and HPC. We're seeking a Senior Software Architect to help co-design next-gen data ... to grow with the increasing scale of next generation systems. This is an outstanding opportunity to advance the...topology, algorithms, and communication scaling relevant to AI and HPC workloads. + Strong experience with Linux.…
    NVIDIA (05/05/25)
  • Senior Site Reliability Engineer,…

    NVIDIA (Santa Clara, CA)
    …As a Site Reliability Engineer, you are responsible for the big picture of how our systems relate to each other, we use a breadth of tools and approaches to tackle a ... + Manage and support workload and resource schedulers in a large-scale HPC environment. + Automate Everything: Develop automation scripts to automate deployment,…
    NVIDIA (04/04/25)
  • Sr. Software Development Engineer, HPC/ML…

    Amazon (Cupertino, CA)
    …We are seeking an experienced engineer to work on distributed AI/ML systems. This role involves working on collective operations - the fundamental operations ... Most of our stack is C/C++ and relatively low level, so solid knowledge of Linux, kernels, and performant code is important. Experience with embedded systems is…
    Amazon (03/21/25)
  • Software Development Engineer, Nitro High Memory…

    Amazon (Sunnyvale, CA)
    …engineers with systems knowledge and experience in area such as Linux OS boot sequencing, Kernel, Hypervisor (Xen or KVM), peripheral device development (PCIe ... we're building an environment that celebrates knowledge sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews.…
    Amazon (05/09/25)
  • Software Development Engineer, Nitro High Memory…

    Amazon (Sunnyvale, CA)
    …engineers with systems knowledge and experience in area such as Linux OS boot sequencing, Kernel, Hypervisor (Xen or KVM), peripheral device development (PCIe ... we're building an environment that celebrates knowledge sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews.…
    Amazon (04/29/25)
  • Senior High Performance Computing Engineer

    SLAC National Accelerator Laboratory (Menlo Park, CA)
    …parallel applications (eg, gdb, Valgrind, Nvidia Nsight). + In-depth knowledge of Linux operating systems and advanced shell scripting. + Proven expertise ... Senior High Performance Computing Engineer Job ID 6383...role in managing and optimizing our High Performance Computing (HPC) environment in support of these groundbreaking scientific projects.…
    SLAC National Accelerator Laboratory (04/26/25)
  • Senior Systems Administrator

    RTX Corporation (El Segundo, CA)
    …and workstation support. + Ability to assist with troubleshooting, patching, and support of Linux Systems. + Provide IT systems administration in a ... Will Do:** + Experience provisioning, installation/configuration, operation, and maintenance of systems hardware and software related to a Windows/Linux
    RTX Corporation (05/08/25)
  • Senior System Software Engineer, NCCL…

    NVIDIA (Santa Clara, CA)
    …We deliver communication runtimes like NCCL and NVSHMEM for Deep Learning and HPC applications. We are looking for a motivated Partner Enablement Engineer to guide ... our key partners and customers with NCCL. Most DL/HPC applications run on large clusters with high-speed networking...Develop tools and automation to isolate issues on new systems and platforms, including cloud platforms (Azure, AWS, GCP,…
    NVIDIA (04/22/25)