- NVIDIA (Santa Clara, CA)
- …NVIDIA NVLink, NVIDIA InfiniBand networking, NVIDIA Grace CPUs, and a fully optimized NVIDIA AI and HPC software stack. We are searching for a highly motivated ... our data center platforms and products + Characterize real-world AI training, inference, and HPC workloads at ... to stand out from the crowd: + Experience with AI/ML frameworks (PyTorch, TensorFlow, JAX). Knowledge of…
- Oracle (Seattle, WA)
- …the forefront of building a cutting-edge, ultra-high-performance GPU platform designed to support AI/ML/HPC workloads. This is your chance to be part of ... automation, and diagnostic services. These are essential for running distributed AI/ML/HPC workloads across thousands of GPUs, leveraging technologies like…
- NVIDIA (Santa Clara, CA)
- …and tools that enable researchers and engineers to develop the next generation of AI/ML systems. By joining us, you'll help design solutions that power some ... of GPUs and petabytes of storage in multi-region clusters. + Collaborate with AI/ML research teams to understand their requirements and translate them into…
- Microsoft Corporation (Redmond, WA)
- …to improve defenses and enablement. + Align with central Microsoft security and AI roadmaps, landing platform capabilities in Copilot and MAI consumer scenarios. ... Slurm, HPC), containerization and orchestration technologies (Docker, Kubernetes) for ML model deployment, and ML lifecycle management in production…
- Microsoft Corporation (Redmond, WA)
- …improve defenses and enablement. + Align with central Microsoft security and AI roadmaps, influencing platform capabilities and landing them in Copilot ... Slurm, HPC), containerization and orchestration technologies (Docker, Kubernetes) for ML model deployment, and ML lifecycle management in production…
- NVIDIA (Santa Clara, CA)
- …by collaborating with teams with varied strengths including GPU Compute, Distributed Systems, Networking, ML Infra, AI Platform, and Cloud Services to ensure ... reliability and cost efficiency of telemetry pipelines while supporting high-volume workloads (AI/ML, HPC clusters, GPU infrastructure) + Embedding security…
- Oracle (Nashville, TN)
- …background in distributed cloud systems **with direct experience in GPU computing, AI/ML workloads, and high-performance infrastructure.** They will be an ... driver installation, firmware management, and performance troubleshooting Familiarity with AI/ML frameworks (e.g., PyTorch, TensorFlow, JAX) and distributed…
- NVIDIA (Santa Clara, CA)
- …professional experience building and scaling high-performance distributed systems, ideally in ML, HPC, or large-scale data infrastructure. + Extensive knowledge ... people. Today, we're tapping into the unlimited potential of AI to define the next era of computing. An ... speed and improve safety, working closely with research and platform teams across NVIDIA. What you'll be doing: +…
- NVIDIA (Santa Clara, CA)
- …of GPUs. Join our team of experts and help us build a supercharged AI platform that improves efficiency, resilience, and Model FLOPs Utilization (MFU). In ... This team focuses on optimizing efficiency and resiliency of ML workloads, as well as developing scalable AI ... in building a highly scalable, fault tolerant and optimized AI platform. What you will be doing:…
- NVIDIA (Santa Clara, CA)
- …etc.) and integration into large‑scale telemetry systems. + Deep knowledge of AI/ML infrastructure, high‑performance computing (HPC), networking, and cloud ... NVIDIA has become the platform upon which every new AI-powered ... with enterprise platforms; deployments at modern data‑center scale; delivered ML/AI observability solutions for LLMOps, predictive incident…