- Meta (Menlo Park, CA)
- …and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI/HPC Systems Performance Engineer Responsibilities: 1. Active member of ... a multi-disciplinary team to develop solutions for large scale training systems. 2. Responsible for the overall performance of the communication system, including performance benchmarking, monitoring and troubleshooting production issues. 3. Identify potential… more
- Meta (New York, NY)
- …host networking, communications lib and scheduling infrastructure. **Required Skills:** AI/HPC System Performance Engineer Responsibilities: 1. Lead ... multi-disciplinary teams to develop solutions for large scale training systems. Assess trade-offs of various solutions and make pragmatic decisions. 2. Ensure timely milestone delivery with teamwork and close collaboration. 3. Responsible for the overall… more
- Micron Technology, Inc. (Richardson, TX)
- …intelligence, inspiring the world to learn, communicate and advance faster than ever. As an HPC Staff Engineer at Micron, you will join a diverse team of ... administrators directly responsible for managing and supporting production server and storage environments, including enterprise SAN, NAS and cloud storage systems across the company's global infrastructure! Your role will involve implementing new storage… more
- NVIDIA (Santa Clara, CA)
- …the heart of this transformation. We are looking for a Senior AI & HPC Observability Engineer to design and build the next-generation observability platform for ... covering metrics, logs, traces, and events for GPU-powered AI and HPC workloads. + Build large-scale telemetry data pipelines leveraging OpenTelemetry, Kafka,… more
- Johns Hopkins University (Baltimore, MD)
- …on performance optimization, workflow design, and reproducible computing. Classified Title: HPC Sr. Scientific Software Engineer Job Posting Title (Working ... Title): HPC Sr. Scientific Software Engineer (IT@JH Research Computing) Role/Level/Range: ATP/04/PG Starting Salary Range: $99,800 - $175,000 Annually… more
- BAE Systems (Fort Meade, MD)
- …Other incentives may be available based on position level and/or job specifics. **Engineer - Future HPC** **115255BR** EEO Career Site Equal Opportunity ... You will work with some of the most advanced HPC systems on the planet. As the subject matter...Education, Experience, & Skills** 12 years' experience as systems engineer / program management planning and executing SIGINT operations.… more
- Federal Reserve Bank (Kansas City, MO)
- …Community Affairs. We are seeking an experienced High Performance Computing Engineer who can plan, implement, and maintain advanced cyberinfrastructure solutions. ... The ideal candidate will have deep expertise in HPC architectures, parallel computing frameworks, and scientific computing applications. You will work independently… more
- Mayo Clinic (Rochester, MN)
- …Research & Speciality Services area is seeking a highly skilled and motivated Tech Spec I HPC Engineer to join the HPC Team. The ideal candidate will have ... Python, Bash, PowerShell, and capturing and reporting on usage metrics across HPC platforms. This role requires a deep understanding of high-performance computing (… more
- NVIDIA (Santa Clara, CA)
- …+ Provide leadership and strategic mentorship on the management of large-scale HPC systems including the deployment of compute, networking, and storage. + Develop ... and operating large scale compute infrastructure. + Experience with AI/HPC job schedulers and orchestrators, such as Slurm, K8s or LSF. Applied experience with AI/HPC workflows that use MPI and NCCL. + Proficient… more
- NVIDIA (Santa Clara, CA)
- …Make the choice to join us today! As a member of the GPU AI/HPC Infrastructure team, you will provide leadership in the design and implementation of ground ... + Provide leadership and strategic guidance on the management of large-scale HPC systems including the deployment of compute, networking, and storage. + Develop… more