Alerted.org | Alerted.org - Powering better job alerts

Senior HPC Dev Ops Engineer

NVIDIA (Westford, MA)

As the HPC Operations Engineer for the new Accelerated Quantum Center (https://www.nvidia.com/en-us/solutions/quantum-computing/accelerated-quantum-center/) located in the Boston area, you’ll be the linchpin in transforming next-generation HPC + Quantum infrastructure into robust, scalable architectures, balancing technical feasibility with customer needs. Engage with a team recognized for its industry leadership, advanced GPU supercomputers, and dedication to supporting our engineers, developers, and customers. You will help develop an innovative hybrid HPC + Quantum computing infrastructure featuring a substantial NVIDIA GB200 NVL72 GPU cluster (572 GPUs) that will be integrated with diverse quantum computing platforms. The lead HPC Engineer will provide technical mentorship and system administration, optimize performance, and coordinate GPU compute and quantum system workloads.

What you'll be doing:

+ Build and operate a brand new hybrid compute environment spanning HPC and quantum systems.

+ Lead Linux provisioning, configuration management, and system tuning across hundreds of GPU nodes and supporting infrastructure.

+ Coordinate and optimize Slurm job scheduling — define policies, handle QoS, tune workloads, and help users translate research requirements into efficient batch workflows.

+ Coordinate data center tasks, partner with data center operations teams, and liaise with the quantum lab.

+ Integrate and maintain container orchestration (e.g., Singularity, Docker, or Kubernetes for HPC) to support simulation workloads and quantum job processing.

+ Run the storage environment, consisting of Lustre, NFS, and cloud storage.

+ Work closely with quantum engineering teams to integrate quantum control nodes and orchestration gateways, and to facilitate data exchange between HPC and quantum systems.

+ Diagnose and improve performance for complex hybrid workloads, covering CUDA, MPI, and CUDA-Q applications.

+ Develop and automate operational workflows with Ansible, GitHub Actions, and CI/CD pipelines.

+ Support researchers and developers with environment setup, debugging, and performance profiling on NVIDIA hardware and quantum simulators.

+ Serve as the primary systems administrator and reliability owner for the GB200 GPU infrastructure.

What We Need to See:

+ Proven experience (12+ years) in HPC systems engineering or administration within large-scale Linux-based GPU environments.

+ Extensive expertise in Slurm, Linux systems administration, and proficiency in configuration management tools like Ansible or Base Command (previously known as Bright Computing).

+ Practical familiarity with NVIDIA GPU technologies, InfiniBand, RDMA, and high-speed networking configurations.

+ Proficiency in containerization and orchestration tools like Singularity, Docker, or Kubernetes.

+ Knowledge of data center operations, including rack power, cooling methods (such as liquid-cooled systems), and network management.

+ Ability to automate, script, install/compile applications, and optimize performance.

+ Bachelor’s degree in Computer Science, Electrical/Computer Engineering, Physics, or equivalent experience.

+ Outstanding problem-solving and diagnostic skills, and the ability to operate in a multidisciplinary, high-performance environment.

Ways to Stand Out from the Crowd:

+ Exposure to or familiarity with quantum computing systems (neutral-atom, trapped-ion, or superconducting).

+ Programming experience in Python, C/C++, or Shell Scripting for automation or performance tuning.

+ Knowledge of CUDA, MPI, and parallel programming paradigms.

+ Familiarity with NVIDIA HPC SDK, cuQuantum, or CUDA-Q toolchains.

+ Background supporting scientific or R&D computing environments.

This is a career-defining opportunity to work at the frontier of quantum-classical hybrid computing. You’ll help architect and operate a flagship facility that fuses GPU-accelerated HPC with quantum processors — advancing discovery in materials, life sciences, and innovative research. This is your chance to create a significant impact and push the boundaries of what's possible!

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 208,000 USD - 333,500 USD for Level 5, and 248,000 USD - 396,750 USD for Level 6.

You will also be eligible for equity and benefits (https://www.nvidia.com/en-us/benefits/).

Applications for this job will be accepted at least until November 4, 2025.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
