  • AI / HPC Systems

    Meta (Menlo Park, CA)
    …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI / HPC Systems Performance Engineer Responsibilities: 1. ... **Summary:** Meta's AI Training and Inference Infrastructure is growing exponentially...a loss-less fabric interconnect with minimal latency. To improve performance of these systems we constantly look…
    Meta (04/20/25)
  • AI / HPC Systems

    Meta (Washington, DC)
    …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI / HPC Systems Performance Engineer Responsibilities: 1. ... **Summary:** Meta's AI Training and Inference Infrastructure is growing exponentially...workloads that expects a loss-less fabric interconnect. To improve performance of these systems we constantly look…
    Meta (03/22/25)
  • Hardware Systems Engineer, NPI AI

    Meta (Austin, TX)
    … testing with focus on automation. 22. Experience in developing or debugging AI / HPC systems, performance optimizations, including familiarity with ... and/or similar languages. **Preferred Qualifications:** Preferred Qualifications: 16. Proficiency in High-Performance Computing (HPC) or AI system…
    Meta (04/24/25)
  • Hardware Systems Engineer, NPI AI

    Meta (Menlo Park, CA)
    … system architecture at rack level and at scale, as well as debugging AI / HPC systems, performance optimizations, including familiarity with relevant ... of issues. RTP team also helps in exploring, developing and productizing high-performance software and hardware technologies for AI at datacenter scale. RTP…
    Meta (04/26/25)
  • Production Systems Engineer, Sustaining

    Meta (Menlo Park, CA)
    …hardware and software components, co-design 15. Experience in developing or debugging AI / HPC systems, performance optimizations, including familiarity ... or supporting production hardware at scale 9. Experience in deploying and productionizing AI / HPC systems and/or related components at scale 10. Experience in…
    Meta (04/23/25)
  • HPC / AI - Kubernetes Engineer

    Deloitte (Hermitage, TN)
    …day-to-day operations of the High-Performance Computing (HPC) and AI infrastructure, ensuring all systems meet or exceed requirements for scalability, ... Responsibilities: + System support and management of infrastructure for HPC and AI systems, this...system performance, ensuring the efficient execution of AI models and HPC applications. Implement techniques…
    Deloitte (04/25/25)
  • High Performance Computing ( HPC

    Amazon (Annapolis Junction, MD)
    …and storage - 5+ years building or optimizing computational applications for large scale HPC systems (e.g., physics-based simulations) to take advantage of high ... Description Amazon Web Services is seeking a High Performance Computing (HPC) Solutions Architect to...the world's technology? Come join us! Key job responsibilities HPC is growing in importance as these systems…
    Amazon (03/12/25)
  • Senior AI - HPC Cluster Engineer

    NVIDIA (Santa Clara, CA)
    …to work effectively with diverse teams and individuals. + Experience analyzing and tuning performance for a variety of AI / HPC workloads. + Passion for ... GPU compute clusters that run demanding deep learning, high performance computing, and computationally intensive workloads. We seek a...storage systems like Lustre and GPFS for AI / HPC workloads + Familiarity with deep learning…
    NVIDIA (04/02/25)
  • Senior AI - HPC Storage Engineer

    NVIDIA (Santa Clara, CA)
    …designing and operating large scale storage infrastructure. + Experience analyzing and tuning performance for a variety of AI / HPC workloads. + Experience ... join us today! As a member of the GPU AI / HPC Infrastructure team, you will provide leadership...solutions to enable runs of demanding deep learning, high performance computing, and computationally intensive workloads. We seek an…
    NVIDIA (02/05/25)
  • Senior Observability Architect, AI

    NVIDIA (Santa Clara, CA)
    …looking for a technical leader to define a vision and roadmap for distributed observability systems for large-scale AI and HPC clusters and workloads and ... and visualization to spectacularly improve efficiency, performance, and productivity of AI and HPC workloads. You will lead technical teams to develop,…
    NVIDIA (02/13/25)