- Oracle (Albany, NY)
- …layer to support exponential scale and deliver advanced capabilities for OCI GPU Servers. We are seeking a talented Principal Software Engineer with deep ... experience in GPU and Networking technologies to help build the infrastructure ... You'll develop secure provisioning and management solutions for OCI GPU-accelerated servers, ensuring robust performance, reliability, and security at…
- Oracle (Albany, NY)
- …Innovation team is pioneering the creation of next-generation AI/HPC networking for GPU superclusters at massive scale. Our mission is to design and deliver ... changes required across (Kernel, NIC, switch, transport, protocol, storage, GPU comms) + Develop production-grade, high-performance software features with rigorous…
- Oracle (Albany, NY)
- …job requires a solid understanding of PCIe subsystems and ARM, x86 and GPU architectures. The ideal candidate should have proven experience in debugging system level ... Good working knowledge of x86 and ARM assembler a plus. + Experience with GPU architectures including configuring systems for cluster testing, using GPU system…
- Oracle (Albany, NY)
- …Windows OS background to take on the challenge of engineering Compute GPU/HPC Infrastructure solutions and build an imaging service for Large Scale Compute/HPC/AI/ML ... issues (e.g., QEMU), and building solutions to address Platform and Custom GPU/HPC images to meet customer workload requirements. OCI has taken the lead…
- Red Hat (Albany, NY)
- …for latency and throughput. You care about Time Per Output Token (TPOT), GPU utilization, GPU networking optimizations, and Kubernetes scheduler efficiency. + ... networking, including the ability to tune scheduler logic (affinity/tolerations) for **GPU workloads** and troubleshoot complex **CNI** failures. + **AI Inference…
- Oracle (Albany, NY)
- …AI Infrastructure is at the forefront of building a cutting-edge, ultra-high-performance GPU platform designed to support AI/ML/HPC workloads. This is your chance to ... is responsible for designing and developing fundamental architectural changes for GPU delivery, health monitoring, triage automation, and diagnostic services. These…
- Meta (New York, NY)
- …working on high performance computing (HPC) and AI/ML systems, including: GPU/ASIC-based kernel development and optimization (e.g., CUDA, ROCm), distributed systems for ... and serving, and systems architecture and performance 11. Accelerator (GPU/ASIC) kernel development and optimization 12. Experience in accelerating libraries…
- Oracle (Albany, NY)
- …support the end-to-end lifecycle of AI and machine learning workloads. From GPU infrastructure and training pipelines to model serving and deployment tools, we ... will work on critical components of OCI's AI platform, including **high-scale GPU cluster management**, **self-service ML infrastructure**, and **model training…
- Oracle (Albany, NY)
- …The Compute Scaled Manufacturing organization's mission is to meet surging GPU demand for Oracle's AI infrastructure by scaling the server qualification ... program manager with a background in supply chain, server (CPU and GPU) hardware, data center capacity delivery, cloud services, server technology, and software…
- Oracle (Albany, NY)
- …heart of OCI is the large-scale distributed infrastructure to provide compute CPU and GPU bare metal and virtual machine capacity to our customers. We are the group ... that ingests CPU/GPU servers as they land in the data centers, ... provide technical guidance and mentorship across the lifecycle of CPU/GPU server systems, from manufacturing and validation including firmware…