  • AI/HPC Systems Performance Engineer

    Meta (Olympia, WA)
    …and host networking, comms lib and scheduling infrastructure. **Required Skills:** AI/HPC Systems Performance Engineer Responsibilities: 1. Active member of ... a multi-disciplinary team to develop solutions for large-scale training systems. 2. Responsible for the overall performance of the communication system, including performance benchmarking, monitoring and troubleshooting production issues. 3. Identify potential…
    Meta (03/22/25)
  • Software Engineer, Systems ML - HPC

    Meta (Bellevue, WA)
    …AI workload needs. We are hiring in multiple locations. **Required Skills:** Software Engineer, Systems ML - HPC Specialist Responsibilities: 1. Apply relevant ... **Summary:** Meta is seeking an AI Software Engineer to join our Research & Development teams. ... on the web. Some aspects of this role as an HPC specialist may include authoring components such as cuBLAS,…
    Meta (02/14/25)
  • Software Engineer - AWS PCS, High…

    Amazon (Seattle, WA)
    Description: The AWS High Performance Computing (HPC) team is looking for an experienced SDE to work on a new HPC service. The HPC team is building a core set of ... that allow our customers to plan, schedule, and execute HPC workloads across the full range of AWS compute ... different locations. This is an opportunity to operate and engineer systems on a global scale, while touching and…
    Amazon (04/30/25)
  • Senior Software Development Engineer

    Amazon (Seattle, WA)
    Description: We are seeking an experienced engineer to work on distributed AI/ML systems. This role involves working on collective operations - the fundamental ... systems is valued, and experience with high-speed networking or HPC interconnects is valued highly. If you like solving ... you like solving hard problems, want to work with HPC and ML customers, iterate fast and deliver meaningful…
    Amazon (03/14/25)
  • Hardware Systems Engineer, AI NPI

    Meta (Bellevue, WA)
    …the new product introduction (NPI) phase. **Required Skills:** Hardware Systems Engineer, AI NPI Responsibilities: 1. Drive and execute end-to-end system validation ... strategy (hardware and software), with a focus on various AI/HPC hardware systems in datacenter applications. 2. Lead the bring-up, validation, and deployment of…
    Meta (05/07/25)
  • Production Systems Engineer, Fleet AI…

    Meta (Bellevue, WA)
    **Summary:** Meta is seeking an experienced Production Systems Engineer to join our Release to Production (RTP) team. Our servers and data centers are the foundation ... and lifecycle of servers in production. **Required Skills:** Production Systems Engineer, Fleet AI Systems Lead Responsibilities: 1. Lead interfacing with external…
    Meta (03/29/25)
  • Production Systems Engineer, Fleet AI…

    Meta (Bellevue, WA)
    **Summary:** Meta is seeking a Production Systems Engineer to join our Release to Production (RTP) team. Our servers and data centers are the foundation upon which ... and lifecycle of servers in production. **Required Skills:** Production Systems Engineer, Fleet AI Systems Responsibilities: 1. Interface with external vendors and…
    Meta (03/15/25)
  • Systems Development Engineer - SRE, Kuiper

    Amazon (Redmond, WA)
    …Qualifications - Experience working with ASIC teams and High-Performance Computing (HPC) environments - AWS certifications (e.g., AWS Certified Solutions Architect, ... AWS Certified DevOps Engineer) - Experience with container orchestration, monitoring tools, and database administration - Familiarity with incident management and…
    Amazon (05/01/25)
  • Sr. Software Engineer, EC2 VPC

    Amazon (Seattle, WA)
    …To achieve our ambitious goals, we're expanding our team and looking for a Senior Engineer to lead the development of a new EC2 Service critical to scale our current ... and next-generation Machine Learning (ML) and HPC Platforms. You will also be a technical leader for a team that owns the software-defined networking (SDN) dataplane…
    Amazon (04/27/25)
  • Software Engineer - Datacenter networking

    Meta (Bellevue, WA)
    …of our network engineering teams is for you! **Required Skills:** Software Engineer - Datacenter networking Responsibilities: 1. Design and implement drivers (and/or ... PHY, FPGAs, sensors, fan control, power etc). 3. Develop and enhance HPC collective communication and parallel computing libraries such as NCCL, RCCL, OneCCL,…
    Meta (04/18/25)