- Meta (Menlo Park, CA)
- …fabric and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI/HPC Network Engineer Responsibilities: 1. Design, develop, test and ... a lossless fabric interconnect. To improve performance of these systems we constantly look for opportunities across our infrastructure...operate networking systems to support large-scale AI training jobs. 2.…
- Amazon (Sunnyvale, CA)
- …assign projects based on what will help each team member develop into a better-rounded engineer and enable them to take on more complex tasks in the future 10017 ... line. The Nitro Team is looking for engineers with systems knowledge and experience in areas such as Linux...High performance computing workloads. The Nitro High Memory and HPC team owns the purpose-built platform development for…
- Amazon (Cupertino, CA)
- Description Do you like to use network and Unix systems engineering to deliver simple, sustainable, and repeatable solutions? Would you like to play a key role in ... Core Networking team is looking for a Network Development Engineer to join our Network Fabric Engineering (NFE) team. As a Network Development Engineer, you will be responsible for building, deploying and…
- NVIDIA (Santa Clara, CA)
- We are now looking for a senior HPC software engineer. As a member of our High Performance Computing Software development team, you will be responsible for ... InfiniBand, Ethernet + Deep knowledge in computer architecture and operating systems + Experience in performance optimizations + MSc or equivalent experience…
- Meta (Menlo Park, CA)
- …Qualifications:** Preferred Qualifications: 11. Full-stack experience and understanding of AI/HPC systems, from HW/infrastructure through the application layer, ... in some of the world's largest-scale clusters. **Required Skills:** Software Engineer, Accelerator Systems & Technologies Responsibilities: 1. Understand and…
- Meta (Menlo Park, CA)
- **Summary:** Meta is seeking a Systems Engineer to join our Release to Production (RTP) team working on AI/ML initiatives supporting large-scale AI Training and ... to hyperscalar bring up and validation. **Required Skills:** Hardware Systems Engineer, NPI AI Lead Responsibilities: 1....rack level and at scale, as well as debugging AI/HPC systems, performance optimizations, including familiarity with…
- Meta (Menlo Park, CA)
- **Summary:** Meta is seeking a Systems Engineer to join our Release to Production (RTP) team working on AI/ML initiatives supporting large-scale AI Training and ... to Meta Silicon hyperscalar bring up and validation. **Required Skills:** Hardware Systems Engineer, NPI AI Responsibilities: 1. Lead the bring-up, validation,…
- SLAC National Accelerator Laboratory (Menlo Park, CA)
- Senior High Performance Computing Engineer Job ID 6383 Location SLAC - Menlo Park, CA Full-Time Regular **SLAC Job Postings** **About SLAC:** The SLAC National ... hybrid work options.** **Position Overview:** As a Senior High Performance Computing Engineer in the Scientific Computing Services Division of the Technology and…
- NVIDIA (Santa Clara, CA)
- …We deliver communication runtimes like NCCL and NVSHMEM for Deep Learning and HPC applications. We are looking for a motivated Partner Enablement Engineer ... guide our key partners and customers with NCCL. Most DL/HPC applications run on large clusters with high-speed networking...Develop tools and automation to isolate issues on new systems and platforms, including cloud platforms (Azure, AWS, GCP,…
- NVIDIA (Santa Clara, CA)
- …Experience with Cloud Deployment, BCM, Terraform. + Understanding of fast, distributed storage systems like Lustre and GPFS for AI/HPC workloads. + Familiarity ... diverse team today! As a member of the GPU AI/HPC Infrastructure team, you will provide leadership in the...automation to improve researchers' productivity. As a Site Reliability Engineer, you are responsible for the big picture of…