- NVIDIA (Santa Clara, CA)
- …can make a lasting impact on the world. NVIDIA is hiring an AI operations engineer within the Finance AI and Data Science team. You will work alongside data ... priorities, and knowledge bases. + Monitor & optimize AI systems using observability stacks to track model performance, system health, and lifecycle metrics. Build… more
- NVIDIA (Santa Clara, CA)
- …maintaining vital systems efficiently and reliably. As a Senior Storage Product Engineer, you will take ownership of NVIDIA's Product Team's internal and ... Fabrics. + Expertise in algorithms, data structures, complexity analysis, software development, and automating maintenance of large-scale Linux-based storage… more
- NVIDIA (Santa Clara, CA)
- …on the world. We are seeking a highly skilled and experienced HPC Cluster Engineer to design, deploy, and operate GPU Compute Clusters for EDA and high-performance ... scalable automation solutions. + Continuously improve infrastructure provisioning, management, observability, and day-to-day operation through automation. + Build… more
- NVIDIA (Santa Clara, CA)
- …the world. We are seeking a dedicated Base Command Manager (BCM) Engineer to support product deployments/escalations and collaborate with Engineering and our Field ... around large-scale, BCM-managed clusters. + Evaluate changes in BCM and underlying OS/software stacks, communicating the impact to the field organization to maintain… more
- Cisco (San Jose, CA)
- …integrating AI into our solutions to transform collaboration, security, networking, observability, and more. We are innovating ethical AI products and infrastructure ... in high-dimensional data spaces using advanced techniques. + Implement robust software systems for integrating and maintaining machine learning models. + Collaborate… more
- NVIDIA (Santa Clara, CA)
- …with high efficiency and availability. It encompasses various areas, including software and systems engineering practices, storage, data management, and services. ... data access efficiency, and optimizing storage performance. Much of our software development focuses on optimizing operations through automation, performance tuning,… more
- Cisco (San Jose, CA)
- …on edge, and DevOps automation. This role requires strong expertise in AI/ML, software development, and a passion for applying AI to real-world challenges. Join us ... science, Machine Learning, Mathematics, Statistics, or related field with 6+ years of software engineering experience, or Master's degree in a related field with 3+… more
- ServiceNow, Inc. (Pleasanton, CA)
- …Selenium, TestNG) and integrating tests into CI/CD pipelines + Understanding software quality principles including reliability, observability, and production ... It all started in sunny San Diego, California in 2004 when a visionary engineer, Fred Luddy, saw the potential to transform how we work. Fast forward to today -… more
- NVIDIA (Santa Clara, CA)
- …metrics to measure the efficiency of services and drive efficiency with software and hardware optimizations (SR-IOV/DPU) + Experience with technologies like eBPF ... and XDP for observability and DDoS mitigation + Collect and review system data for capacity and planning purposes, analyze capacity data and develop plans for… more
- Amazon (San Francisco, CA)
- …on EKS and ensure we are following industry best practices - Develop software to manage our cloud infrastructure and write tooling to automate manual tasks ... for teams across the company - Improve the logging, monitoring, and observability story for our cloud services - Drive operational excellence through readiness… more