- Oracle (Santa Clara, CA)
- …automation, and diagnostic services. These are essential for running distributed AI/ML/HPC workloads across thousands of GPUs, leveraging technologies like RoCE and ... InfiniBand. **Why Join Us?** + Innovative Projects: Build groundbreaking solutions for our customers from the ground up. + Exciting Times: Be part of a young, fast-growing team working on ambitious new initiatives. + Dynamic Environment: Collaborate in a…
- Super Micro Computer (San Jose, CA)
- …for Data Center, Cloud Computing, Enterprise IT, Hadoop/Big Data, Hyperscale, HPC and IoT/Embedded customers worldwide. We are the #5 fastest-growing company ... among the Silicon Valley Top 50 technology firms. Our unprecedented global expansion has provided us with the opportunity to offer a large number of new positions to the technology community. We seek talented, passionate, and committed engineers,…
- Oracle (Nashville, TN)
- …automation, and diagnostic services. These are essential for running distributed AI/ML/HPC workloads across thousands of GPUs, leveraging technologies like RoCE and ... InfiniBand. We are looking for a highly skilled and motivated distributed systems engineer who can architect solutions to scale and optimize monitoring and repair solutions for AI infrastructure components, like the GPU control plane and GPU data plane, that provide…
- Oracle (Nashville, TN)
- …(e.g., NCCL, Horovod, DeepSpeed). Experience supporting or operating large-scale HPC, AI, or GPU-accelerated clusters in production environments. Excellent ... problem-solving skills, with the ability to troubleshoot complex issues and drive resolution in a fast-paced environment. Written and verbal communication skills, with the ability to present complex information clearly to all audiences. Strong documentation…
- Oracle (Santa Clara, CA)
- …building cutting-edge, ultra-high-performance GPU-cluster-based data centers designed to support AI/ML/HPC workloads. This is your chance to be part of the AI ... revolution, creating systems that allow customers to scale from tens to thousands of GPUs without compromising performance. We are the AI Infrastructure Delivery Engineering org at OCI. The OCI Infrastructure Delivery Engineering team will be responsible for…
- quadric.io, Inc (Burlingame, CA)
- …Candidates must demonstrate deep technical mastery of Quadric's product ecosystem, including HPC hardware (IP, chips, boards), SDK, and various algorithms (NN, DSP, ... Vision, Path Planning, etc.). Responsibilities: + Customize AI/LLM frameworks and integrate Quadric products and SDK into the AI/LLM ecosystem + Develop AI applications and demos using Quadric's SDK and AI frameworks to showcase product capabilities and performance…
- Oracle (Phoenix, AZ)
- …of hands-on experience in data center operations, ideally in high-performance computing (HPC), AI infrastructure, or related technical fields. + Prior experience in ... a **hyperscale** data center or similar large-scale IT environment is preferred. + Direct experience working with AI hardware, such as **NVIDIA H100**, **AMD MI300**, **NVIDIA A100**, or similar GPU-based accelerator systems, is a plus. + **Technical…
- Oracle (Seattle, WA)
- …in a customer-facing role in a tech company. Experience with AI and HPC end customers is a big plus. Disclaimer: **Certain US customer or client-facing roles ... may be required to comply with applicable requirements, such as immunization and occupational health mandates.** **Range and benefit information provided in this posting are specific to the stated locations only** US: Hiring Range in USD from: $109,200 to…
- NVIDIA (Santa Clara, CA)
- …experience with performance modeling, profiling, debugging, and code optimization of a DL/HPC/high-performance application + Architectural knowledge of CPU and GPU + GPU ... programming experience (CUDA or OpenCL) GPU deep learning has provided the foundation for machines to learn, perceive, reason and solve problems posed using human language. The GPU started out as the engine for simulating human imagination, conjuring up the…
- RTX Corporation (Marlborough, MA)
- …+ Experience administering Linux operating systems. + Experience with datacenter/HPC computer hardware. + Experience with scripting (e.g., shell, Bash, Python). ... **Qualifications We Prefer:** + CompTIA Linux+ certification. + Experience with automation tools/frameworks (e.g., Terraform, Ansible, Chef, GitLab, GitLab Runner). + Experience with observability tools such as Grafana, Telegraf, and Prometheus. + Experience…