  • Principal Engineer, VCF Cluster Management Team

    Broadcom (Palo Alto, CA)
    …solutions. We are dedicated to building robust, scalable, and high-performance distributed systems that empower enterprises to achieve their digital transformation ... for defining the technical vision, architecting, and leading the implementation of complex distributed systems that are central to our VCF offerings. You will work… more
    Broadcom (07/29/25)
  • Senior Storage Engineer - DGX Cloud

    NVIDIA (Santa Clara, CA)
    …We are looking for an engineer who has a deep understanding of distributed systems development, object storage, network file transfer protocols, and file systems. ... + Solve technical problems spanning the areas of orchestration, distributed systems, service modeling, API modeling, monitoring, deployment, and automation… more
    NVIDIA (08/08/25)
  • Sr. Technical Product Manager AI/ML Training,…

    Amazon (Cupertino, CA)
    …ML training workloads on AWS Trainium through deep understanding of distributed training, compilation systems, and hardware acceleration. The ideal candidate will ... have a solid understanding of AI/ML models training, distributed training architectures, and performance optimization techniques. They should be able to assess… more
    Amazon (08/18/25)
  • Staff Software Engineer, AI Platform

    LinkedIn (Mountain View, CA)
    …resolve issues in popular libraries like Huggingface, Horovod and PyTorch, enable distributed training over 100s of billions of parameter models, debug and optimize ... problems. -Designing, implementing, and optimizing the performance of large-scale distributed serving or training for personalized recommendation as well as… more
    LinkedIn (08/08/25)
  • Software Engineer - Callisto

    Rubrik (Palo Alto, CA)
    …and stack. At the heart of Rubrik's architecture is an open-source, scalable, distributed SQL database. This is a fundamental building block for all infrastructure ... components (e.g. distributed file system) and applications (e.g. Oracle db backup) ... early career software engineer with a strong interest in distributed database technologies and cloud computing platforms and a… more
    Rubrik (08/07/25)
  • Software Engineer, SystemML - AI Networking

    Meta (Menlo Park, CA)
    …has been integrated into PyTorch and is on the critical path of multi-GPU distributed training. In other words, nearly every distributed GPU-based ML workload in ... GPU training and inference fleet through an observable, reliable and high-performance distributed AI/GPU communication stack. Currently, one of the team's focus is… more
    Meta (08/01/25)
  • Sr. Software Architect - Virtualization

    Panasonic Avionics Corporation (Irvine, CA)
    …IOT, Cloud or similar industry. + 10+ years of experience architecting distributed systems using Java, C++ or GoLang. + Experience implementing virtualization ... both bare metal and Cloud environments. + 10+ years of experience architecting distributed systems using Java, C++, GoLang or similar languages. + Deep technical… more
    Panasonic Avionics Corporation (08/11/25)
  • Sr Software Dev Engineer, Edge AI ML Platform…

    Amazon (Sunnyvale, CA)
    …Edge AI team at Amazon Devices (Lab126) where you'll architect and implement distributed training systems that scale to hundreds of billions of parameters. Your work ... versions that run on constrained edge devices. Lead the development of our distributed training platform for large language models up to 400B parameters Design… more
    Amazon (08/02/25)
  • Data Solutions Engineer, Storage

    DoorDash (San Francisco, CA)
    …We're hiring a Data Solutions Engineer with deep expertise in distributed databases, particularly Apache Cassandra, Redis, Kafka, and database agnostic abstractions. ... In this role, you will design, optimize, and scale distributed data access layers that power DoorDash's most critical systems, ensuring high availability, low… more
    DoorDash (07/27/25)
  • Staff Machine Learning Engineer, AI Platform

    General Motors (Sunnyvale, CA)
    …model training performance analysis and optimization solutions to scale distributed training workflows and maximize resource utilization across heterogeneous ... experience + 3+ years specialized experience in AI/ML infrastructure, e.g., enabling distributed training for scaling large ML models + Strong programming skills in… more
    General Motors (07/23/25)