System Engineer - Interconnect
Meta (Austin, TX)
Summary:
The Accelerator Reference Design (ARD) Team is looking for a System Engineer to design, implement, and maintain hardware designs for custom AI hardware interconnects. The ARD team designs and implements hardware modules for Meta’s custom AI accelerators. Meta is developing one of the world’s highest-performing AI/HPC clusters using custom-designed AI accelerators. In this role, you will have a unique opportunity to shape the future of AI/HPC at Meta by specifying technical requirements for custom interconnects, driving the specifications and designs of those interconnects, and steering industry and ecosystem partners.
The ideal candidate will operate in a fast-paced, highly cross-functional engineering environment with many concurrent tasks. They will solve complex problems in high-performance AI that span silicon, hardware, and software, and will work directly on hardware design and protocol development for high-performance interconnects. A successful candidate will be a hardware system and platform builder who works on rack-scale solutions, individual servers, and custom ASICs, and who is fluent in state-of-the-art high-performance interconnects. The position is for a lead developer and architect who debugs complex issues, works on broad and ambiguous problems, and delivers hardware systems at scale. The candidate will work closely with internal customers and partners who are on the front line of developing Meta’s custom ASICs for AI.
The Accelerator Reference Design Team designs, builds, brings up, tests, and integrates hardware systems that power Meta’s custom AI silicon platforms, deployed in data centers worldwide. This is a rare opportunity to join our team and help us build some of the world’s most open and efficient AI platforms.
Required Skills:
System Engineer - Interconnect Responsibilities:
1. Work as part of the Accelerator Reference Design Team to design, develop, test and integrate high-performance interconnects for Meta’s custom AI hardware
2. Collect requirements and develop specifications for rack-scale AI/HPC systems
3. Develop and maintain code to collect, analyze, and interpret interconnect performance data between accelerators
4. Collaborate with cross-functional teams to develop detailed hardware specifications for our silicon and platforms, focusing on high-performance interconnect
5. Develop concepts and proof-of-concept experiments to improve performance and reliability of new and existing interconnects
6. Specify and design the module and interconnect functionality of Meta’s custom AI silicon so that it integrates well with the data centers and the fleet
Minimum Qualifications:
1. Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
2. 8+ years of hands-on experience in implementing high-performance computer server systems
3. 4+ years of experience in developing and deploying customized high-performance interconnects
4. 4+ years of experience designing or specifying custom silicon functionality for high-performance interconnects
5. Basic proficiency in C++ and/or Python
6. General understanding of architectural trade-offs in hardware and software across cost, performance, and reliability
7. Experience working effectively as an individual and in a multidisciplinary team
8. Troubleshooting skills and experience diving into software, firmware, hardware, and network problems
Preferred Qualifications:
1. 10+ years of hands-on experience in implementing high-performance computer server systems
2. 6+ years of experience in developing and deploying custom, high-performance interconnects
3. Network/fabric bus design and operation experience: Ethernet, InfiniBand, RoCE networks, popular HPC fabrics
4. Core domain knowledge in servers and networking, and in at least one of high-performance computing or silicon (ASIC or FPGA) development
5. Experience in developing both hardware and software for interconnect protocol stacks
6. Experience with system performance analysis, debug, and optimization practices
7. Knowledge of chip architecture, microarchitecture, design, and interconnects
8. Familiarity with GPU/accelerator low-level programming and performance validation
Public Compensation:
$170,000/year to $240,000/year + bonus + equity + benefits
Industry:
Internet
Equal Opportunity:
Meta is proud to be an Equal Employment Opportunity and Affirmative Action employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender, gender identity, gender expression, transgender status, sexual stereotypes, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics. We also consider qualified applicants with criminal histories, consistent with applicable federal, state and local law. Meta participates in the E-Verify program in certain locations, as required by law. Please note that Meta may leverage artificial intelligence and machine learning technologies in connection with applications for employment.
Meta is committed to providing reasonable accommodations for candidates with disabilities in our recruiting process. If you need any assistance or accommodations due to a disability, please let us know at [email protected].