SDE I - Systems, Runtime, and ML Infrastructure…
Amazon (Cupertino, CA)

Description

At AWS, we're pioneering the future of cloud computing and AI acceleration through innovative hardware-software co-design. Our teams within Annapurna Labs and AWS AI are creating the foundation for next-generation cloud infrastructure that powers thousands of customers worldwide, from cutting-edge startups to global enterprises. We operate at an unprecedented scale, designing custom silicon chips, advanced networking solutions, and ML accelerators that were unimaginable just a few years ago. Our work spans from the lowest levels of hardware abstraction to high-performance distributed training systems, creating unique opportunities for early-career engineers to make significant impact across multiple domains.

Key job responsibilities
- Develop and optimize software for custom hardware and ML infrastructure
- Collaborate with hardware teams to understand and leverage chip architecture
- Implement and improve networking, runtime, and system-level software
- Assist in building and maintaining tools for profiling, monitoring, and debugging ML workloads
- Contribute to the development of open-source ML frameworks and infrastructure projects
- Participate in code reviews and implement best practices for software development
- Learn and apply new technologies to solve complex engineering challenges

About the team

Candidates will be routed to specific teams based on their interests and our current needs during the application process:
- The Elastic Network Adapter (ENA) team revolutionizes EC2 core networking, enabling enhanced networking capabilities across AWS's most critical compute instances. Here, you'll work with networking protocols and high-performance drivers that power millions of cloud workloads.
- Our AWS Neuron SDK team develops the complete software stack for custom ML accelerators (Inferentia and Trainium), democratizing access to AI infrastructure. This team bridges the gap between popular ML frameworks and custom hardware.
- The Machine Learning Server Software team maintains and optimizes the world's most advanced ML servers, focusing on system-level software that ensures peak performance of AI workloads. While we don't work directly on ML algorithms, we build the critical infrastructure that makes ML possible at scale.
- The SoC Hardware Abstraction Layer (HAL) team works at the intersection of hardware and software, developing the crucial middleware that manages our custom silicon chips. This team ensures our innovative hardware designs translate into reliable, high-performance solutions.

Basic Qualifications
- To qualify, applicants should have earned (or expect to earn) a Bachelor’s or Master’s degree between December 2022 and September 2025.
- Strong programming skills in C/C++ or Python, with solid understanding of data structures and algorithms
- Understanding of computer architecture, operating systems, and Linux environments
- Internship or project experience related to systems programming, networking, or ML

Preferred Qualifications
- Familiarity with version control systems (e.g., Git) and software development methodologies
- Knowledge of ML concepts or frameworks (e.g., PyTorch, TensorFlow)
- Interest in open-source development or contributions to technical communities

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.

Los Angeles County applicants: Job duties for this position include: work safely and cooperatively with other employees, supervisors, and staff; adhere to standards of excellence despite stressful conditions; communicate effectively and respectfully with employees, supervisors, and staff to ensure exceptional customer service; and follow all federal, state, and local laws and Company policies. Criminal history may have a direct, adverse, and negative relationship with some of the material job duties of this position. These include the duties and responsibilities listed above, as well as the abilities to adhere to company policies, exercise sound judgment, effectively manage stress and work safely and respectfully with others, exhibit trustworthiness and professionalism, and safeguard business operations and the Company’s reputation. Pursuant to the Los Angeles County Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.

Our compensation reflects the cost of labor across several US geographic markets. The base pay for this position ranges from $99,500/year in our lowest geographic market up to $200,000/year in our highest geographic market. Pay is based on a number of factors including market location and may vary depending on job-related knowledge, skills, and experience. Amazon is a total compensation company. Dependent on the position offered, equity, sign-on payments, and other forms of compensation may be provided as part of a total compensation package, in addition to a full range of medical, financial, and/or other benefits. For more information, please visit https://www.aboutamazon.com/workplace/employee-benefits. This position will remain posted until filled. Applicants should apply via our internal or external career site.
 
 