- pony.ai (Fremont, CA)
- …public at NASDAQ in November 2024. Responsibilities: As a (Senior) Kubernetes Engineer, you will: + Design, operate, and optimize Kubernetes clusters across hybrid ... cloud environments (public cloud and on-prem datacenter). ... policies, and operational guidelines. + Contribute to observability and SRE practices to ensure reliability at scale (SLOs, incident…
- EPAM Systems (San Jose, CA)
- In your role as **Azure Platform Engineer**, you will help and guide the build of (network) solutions to enhance the availability, performance, and stability of the ... Azure platform. As a Platform Engineer, you will need to balance service reliability, platform ... speed. You will help guide a team of skilled Cloud Engineers in building software delivery pipelines, deploying and…
- Google (Sunnyvale, CA)
- …as well as users across Google (e.g., Borg team, ML teams, Hardware platform teams, SRE teams, Google Cloud, etc.). Information collected and processed as part of ... Staff Software Engineer, Borglet ML, Offloads Google Sunnyvale, ... to be reliable, and efficient. **About the job** Google Cloud's software engineers develop the next-generation technologies that change…
- NVIDIA (Santa Clara, CA)
- …and scalability across global public and private clouds. + Implement SRE fundamentals, including incident management, monitoring, and performance optimization, while ... or related field, or equivalent experience with 8+ years in Software Development, SRE, or Production Engineering. + Proficiency in Python and at least one other…
- ServiceNow, Inc. (Santa Clara, CA)
- It all started in sunny San Diego, California in 2004 when a visionary engineer, Fred Luddy, saw the potential to transform how we work. Fast forward to today - ... 8,100 customers, including 85% of the Fortune 500®. Our intelligent cloud-based platform seamlessly connects people, systems, and processes to empower…
- Leidos (Vista, CA)
- …This position will require up to 75% travel. Come put your Site Reliability Engineer (SRE) skills into action! Leidos has openings for talented SREs to ... in the field. You will automate the buildout of infrastructure in cloud and on-premises environments to operate Kubernetes clusters and microservices deployments. In…
- Zoom (San Jose, CA)
- …and Tasks, own services end to end, partner with product, frontend and SRE, improve performance, security and observability, use data to resolve issues. About the ... on Java and the Spring ecosystem, modern CI and containers, event messaging, and cloud services. We partner tightly with frontend, design, product, and SRE to…
- NVIDIA (Santa Clara, CA)
- The NVIDIA DGX Cloud organization is looking for software engineering talent to build NVIDIA's accelerated compute infrastructure. This includes software to assist ... and trouble-shooting of compute hardware and networking equipment. As a software engineer, you will work with other software engineers, product architects, and…
- NVIDIA (Santa Clara, CA)
- …some of the most impactful fields of our generation: Cloud Engineering and Cloud Functions. If you're a creative engineer who enjoys autonomy and shares our ... NVIDIA DGX Cloud is a fully managed, cloud-based ... infrastructure resiliency. We are looking for a Senior Software Engineer with experience in building highly agile and reliable…
- PennyMac (Westlake Village, CA)
- …homeownership through the complete mortgage journey. A Typical Day: The Sr DevOps Engineer - AI platform will: + Design, implement, and manage scalable and resilient ... and maintain Windows/Linux-based environments, ensuring seamless integration with cloud platforms. + Develop and maintain infrastructure-as-code (IaC) using both AWS…