- NVIDIA (Santa Clara, CA)
- …doing: + Provide leadership and strategic guidance on the management of large-scale HPC systems including the deployment of compute, networking, and storage. + ... or LSF + Proficient in administering CentOS/RHEL and/or Ubuntu Linux distributions + Solid understanding of cluster configuration managements...IBOP and RDMA + Understanding of fast, distributed storage systems like Lustre and GPFS for AI/HPC …
- NVIDIA (Santa Clara, CA)
- …Make the choice to join us today! As a member of the GPU AI/HPC Infrastructure team, you will provide leadership in the design and implementation of ground ... implementation of distributed storage services. + Design, implement an on-prem AI/HPC infrastructure supplemented with cloud computing to support the growing needs…
- NVIDIA (Santa Clara, CA)
- …like NCCL, NVSHMEM, and UCX that are crucial for scaling Deep Learning and HPC. We're seeking a Senior Software Architect to help co-design next-gen data ... to grow with the increasing scale of next generation systems. This is an outstanding opportunity to advance the...topology, algorithms, and communication scaling relevant to AI and HPC workloads. + Strong experience with Linux.…
- NVIDIA (Santa Clara, CA)
- …As a Site Reliability Engineer, you are responsible for the big picture of how our systems relate to each other, we use a breadth of tools and approaches to tackle a ... + Manage and support workload and resource schedulers in a large-scale HPC environment. + Automate Everything: Develop automation scripts to automate deployment,…
- Amazon (Cupertino, CA)
- …We are seeking an experienced engineer to work on distributed AI/ML systems. This role involves working on collective operations - the fundamental operations ... Most of our stack is C/C++ and relatively low level, so solid knowledge of Linux, kernels, and performant code is important. Experience with embedded systems is…
- Amazon (Sunnyvale, CA)
- …engineers with systems knowledge and experience in areas such as Linux OS boot sequencing, Kernel, Hypervisor (Xen or KVM), peripheral device development (PCIe ... we're building an environment that celebrates knowledge sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews.…
- Amazon (Sunnyvale, CA)
- …engineers with systems knowledge and experience in areas such as Linux OS boot sequencing, Kernel, Hypervisor (Xen or KVM), peripheral device development (PCIe ... we're building an environment that celebrates knowledge sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews.…
- SLAC National Accelerator Laboratory (Menlo Park, CA)
- …parallel applications (e.g., gdb, Valgrind, Nvidia Nsight). + In-depth knowledge of Linux operating systems and advanced shell scripting. + Proven expertise ... Senior High Performance Computing Engineer Job ID 6383...role in managing and optimizing our High Performance Computing (HPC) environment in support of these groundbreaking scientific projects.…
- RTX Corporation (El Segundo, CA)
- …and workstation support. + Ability to assist with troubleshooting, patching, and support of Linux Systems. + Provide IT systems administration in a ... Will Do:** + Experience provisioning, installation/configuration, operation, and maintenance of systems hardware and software related to a Windows/Linux …
- NVIDIA (Santa Clara, CA)
- …We deliver communication runtimes like NCCL and NVSHMEM for Deep Learning and HPC applications. We are looking for a motivated Partner Enablement Engineer to guide ... our key partners and customers with NCCL. Most DL/HPC applications run on large clusters with high-speed networking...Develop tools and automation to isolate issues on new systems and platforms, including cloud platforms (Azure, AWS, GCP,…