• Senior Software Engineer, ML Training Platform

    DoorDash (San Francisco, CA)
    …ML Training Platform - creating reliable, extensible solutions for data transformations, distributed model training, and rapid experimentation in production. You'll ... + Architect & Implement Scalable Solutions - Design resilient pipelines for distributed model training (e.g., PyTorch, LightGBM) on Kubernetes, optimizing for both… more
    DoorDash (11/17/25)
  • Senior Technical Services Engineer - Weekends

    MongoDB (Palo Alto, CA)
    …a team that works at the frontier of SaaS services and distributed database systems, this is the role for you. Click here ... **You may also have** + Experience with scalable and highly available distributed systems + Advanced experience with Operating System and Networking Core concepts… more
    MongoDB (11/08/25)
  • Distinguished, Architect - AI/ML

    Walmart (Sunnyvale, CA)
    …user experience. + **Perform complex troubleshooting and analysis** of large-scale distributed systems across Walmart's entire technology stack, using expertise in ... coding, algorithms, and distributed system design. **Strategic Technical Innovation:** + **Partner closely with all engineering organizations** including E-commerce,… more
    Walmart (11/01/25)
  • Senior Manager, Ads Infrastructure

    Unity Technologies (San Francisco, CA)
    …opportunity** The Ads Infrastructure team at Unity builds and operates the core distributed systems that power one of the largest real time advertising platforms in ... + Lead and mentor infrastructure engineers building and operating large-scale distributed systems + Contribute directly to code, design, and incident resolution… more
    Unity Technologies (10/18/25)
  • Lead Engineer, Online Archive Storage Systems

    MongoDB (San Francisco, CA)
    …role that combines people management with deep technical expertise to solve complex distributed systems problems at scale. As the Lead Engineer for the ADFA Storage ... production systems + 5+ years software engineering experience, primarily in backend/distributed systems + Comfort with 24/7 on-call rotation responsibilities **The… more
    MongoDB (10/18/25)
  • Staff Cloud Engineer - Architecture

    TP-Link North America, Inc. (Irvine, CA)
    …performance, scalability, and budget constraints. + Possess experience in distributed systems and architectural design, building highly available, resilient, and ... distributed architectures that ensure system stability under large-scale, high-concurrency...cloud solutions. + Deep understanding of architectural theories and distributed system principles, capable of designing highly available, resilient,… more
    TP-Link North America, Inc. (10/16/25)
  • Senior GPU and HPC Infrastructure Engineer - DGX…

    NVIDIA (Santa Clara, CA)
    …networking, familiarity with software testing and deployment, familiarity with distributed systems, and excellent communication and planning abilities. Experience ... GPU clusters. + Build automated test infrastructure that we use to qualify distributed systems for operation. + Work with engineering teams across NVIDIA to ensure… more
    NVIDIA (10/09/25)
  • Senior Software Development Engineer, AI/ML, AWS…

    Amazon (Cupertino, CA)
    …to work at the intersection of machine learning, high-performance computing, and distributed architectures, where you'll help shape the future of AI acceleration ... compiler engineers and runtime engineers to create, build and tune distributed inference solutions with Trainium and Inferentia. Experience optimizing inference… more
    Amazon (10/08/25)
  • Principal Staff Software Engineer, Service…

    LinkedIn (Mountain View, CA)
    …practical experience. + 7+ years of industry experience in software design, distributed systems, or infrastructure engineering. + 7+ years of experience in ... in an architect or technical leadership role. + Experience with distributed systems, networking, or inter-service communication protocols (e.g., gRPC, HTTP/2, RPC… more
    LinkedIn (10/08/25)
  • Site Reliability Engineer (Senior or Staff),…

    MongoDB (San Francisco, CA)
    …SRE on the Fabric team, you will leverage your expertise in networking, distributed systems, and automation to ensure our systems are resilient, scalable, and ... should** + Have 6+ years of experience working on software and operating distributed systems, with deep expertise in networking fundamentals and a good understanding… more
    MongoDB (10/07/25)