- NVIDIA (Santa Clara, CA)
- …in defining and implementing critical resiliency features for AI supercomputers at a scale of 100,000+ GPUs. Your expertise will be crucial in driving down cluster ... features that improve AI system reliability at a massive scale, such as fast checkpoint-recovery, error detection, error isolation,...be determined based on your location, experience, and the pay of employees in similar positions. The base salary…
- NVIDIA (Santa Clara, CA)
- …AI and HPC software stack. NVIDIA NVLink Fusion will enable industry-leading AI scale-up and scale-out performance with NVIDIA technology plus semi-custom ASICs ... build an ASIC hybrid AI infrastructure with NVIDIA NVLink, rack-scale architecture. We're searching for a highly motivated, technical...be determined based on your location, experience, and the pay of employees in similar positions. The base salary…
- NVIDIA (Santa Clara, CA)
- …architect/engineer for a Senior HPC architect role to support deployment and bringup of large-scale GPU compute clusters. Be a key player to enable the most exciting ... in artificial intelligence and GPU computing. Provide insights on and implement at-scale system administration and tuning mechanisms for large-scale compute…
- Microsoft Corporation (San Francisco, CA)
- …overall, Azure Inventory of cloud resources and is a core service enabling at-scale experiences like ARG, Azure Portal, Azure Search, Azure Catalog. It provides a ... these values to all of Azure customers. So naturally, scale and performance are our team's DNA. Anything we...online-software solutions. Software Engineering IC4 - The typical base pay range for this role across the US is…
- Cardinal Health (Sacramento, CA)
- …this critical role, primary focus is to architect, implement and operationalize large-scale enterprise data platforms, and solutions leveraging one or more of Google ... technologies to build Data Platforms and Pipelines at enterprise scale to support Analytics and ML/AI solutions. All this...+ 401k savings plan + Access to wages before pay day with myFlexPay + Flexible spending accounts (FSAs)…
- DoorDash (San Francisco, CA)
- …pivotal moment for DelEx. We're laying down the foundations for the next decade of scale while pushing the boundaries of what AI and LLMs can do for customers, ... direction for the organization. You'll design, influence, and deliver large-scale platforms that combine cutting-edge AI/ML with rock-solid engineering foundations.…
- Red Hat (Sacramento, CA)
- **Job Summary:** The Red Hat Performance and Scale Engineering team (PSAP) is hiring a hands-on performance and resilience engineer to lead the "AI workloads fault ... injection and resilience at scale" efforts for vLLM and llm-d (distributed LLM inference...$211,180.00. Actual offer will be based on your qualifications. **Pay Transparency** Red Hat determines compensation based on several…
- Microsoft Corporation (Mountain View, CA)
- Are you passionate about cloud computing, large-scale distributed systems engineering problems and working on bleeding edge technology at massive scale? The ... to improve availability, reliability, efficiency, observability, and performance at scale, including advances in acceleration and security. + Supports sustaining…
- DoorDash (San Francisco, CA)
- …We build and operate two critical platforms that move data at massive scale to the lakehouse: Realtime Streaming Platform: Moves analytical events from DoorDash's ... on the horizon, the team is rearchitecting the ingestion stack to meet the scale and agility required for the next decade. We're also investing in developer…
- Walmart (Sunnyvale, CA)
- …shipping decisions using ML. This is a unique opportunity to build for scale, innovate relentlessly, and impact billions of dollars in commerce, all while ... at the forefront of redefining global commerce on a scale. We are the backbone of one of the...the future of retail. **Benefits and Perks:** Beyond competitive pay, you can receive incentive awards for your performance.…