- TEKsystems (Beverly Hills, CA)
- …providers. Required Competencies: A minimum of 6 years of hands-on experience as a Network Engineer, or 12 years of progressive experience leading up to the ... an Infrastructure Administrator to lead the support, operation, and maintenance of IT Network infrastructure for the Academy of Motion Picture Arts and Sciences. This… more
- LiveRamp (San Francisco, CA)
- …use cases within organizations, between brands, and across its premier global network of top-quality partners. Hundreds of global innovators, from iconic ... with Engineering teams + Set up and maintain Infrastructure & Product Reliability monitoring and alerting + Maintain and enhance CI/CD Tooling and Terraform… more
- NVIDIA (Santa Clara, CA)
- …with Kubernetes including cluster operations, operator development, node health monitoring and working with GPU resource scheduling. We welcome out-of-the-box ... software related to scheduling GPU resources on Kubernetes. + Implementing monitoring and health management capabilities that enable industry leading reliability,… more
- Amazon (Cupertino, CA)
- …to help. You'll join a diverse team of software, hardware, and network engineers, supply chain specialists, security experts, operations managers, and other vital ... data center. After launch you will oversee the fleet of servers you develop, monitoring their quality and how they are meeting the customer requirements. This is a… more
- LiveRamp (San Francisco, CA)
- …use cases within organizations, between brands, and across its premier global network of top-quality partners. Hundreds of global innovators, from iconic ... of software products, including planning, design, coding, deployment, rollout, monitoring, and maintenance of services supporting these products + Comfortable… more
- NVIDIA (Santa Clara, CA)
- …and influence hardware design and architecture review. Develop performance-optimized active monitoring BMC solutions using DMTF Standards such as MCTP, Redfish, ... experience in developing BMC and/or microcontroller firmware for managing CPU, GPU, Network and Storage Devices. + Experience with the following embedded interfaces… more
- NVIDIA (Santa Clara, CA)
- …lifecycle management for large-scale Machine Learning systems. + Implement monitoring and health management capabilities that enable industry-leading reliability, ... multiple data streams, ranging from GPU hardware diagnostics to cluster and network telemetry. + Work on software that manages NVLINK topography across GPU… more
- Amazon (Cupertino, CA)
- …to help. You'll join a diverse team of software, hardware, and network engineers, supply chain specialists, security experts, operations managers, and other vital ... data center. After launch you will oversee the fleet of servers you develop, monitoring their quality and how they are meeting the customer requirements. A day in… more
- NVIDIA (Santa Clara, CA)
- …software related to managing fleets of GPU nodes. + Implementing monitoring and health management capabilities that enable industry leading reliability, ... harnessing multiple data streams, ranging from GPU hardware diagnostics to cluster and network telemetry. + Working with teams across NVIDIA to ensure production AI… more
- Amazon (Cupertino, CA)
- …or CS - 5+ years of experience with complex server, storage or network server designs - 5+ years of experience developing functional specifications, design ... with deploying and operating hardware and applications at scale - Developed monitoring and alerting systems to quickly identify and categorize failures in production… more