- General Motors (Sacramento, CA)
- …perform code reviews, and create technical designs that improve performance and reliability of observability systems. + Proactively identify and address ... most in their domain. **The Role** We are looking for a Principal Engineer with an extensive engineering background, experience using a variety of developer tools…
- NVIDIA (Santa Clara, CA)
- …This candidate must have enterprise server integration, strong Linux experience, reliability testing with various telemetries, scale out cluster, test plan ... stack from design doc. + Installing and testing various systems OS, server firmware and SW stack. + Drive support for root cause analysis on reliability and validation test failures to identify root cause(s)…
- Meta (Menlo Park, CA)
- …we are seeking engineers to work in the space of GenAI/LLM scaling reliability and performance. **Required Skills:** Software Engineer, SystemML - Scaling / ... SW stacks around NCCL and PyTorch to improve the full-stack distributed ML reliability and performance (e.g., Large-Scale GenAI/LLM training) from the trainer down to…
- DoorDash (San Francisco, CA)
- …like Ads, Groceries, Logistics, Fraud, and Search. About the Role As a Senior Software Engineer in the team, you will take ownership of major projects within our ML ... - Own and deliver significant sub-projects that enhance our platform's performance, reliability, and ease of use. + Architect & Implement Scalable Solutions -…
- Palo Alto Networks (Santa Clara, CA)
- …and GCP Production environments including databases, Infrastructure as Code and observability systems. To meet the opportunities that such a role provides, you will ... influence the operability of the product and ensure the reliability and availability of our services **Your Experience** +...Experience - 8+ years of experience as a DevOps/SRE engineer with a passion for technology and a strong…
- Marathon Petroleum Corporation (Martinez, CA)
- …and troubleshooting fixed equipment (refinery pressure vessels, heat exchangers, piping systems, etc.). Reliability Engineers will also be continuously ... our people, and fosters a collaborative team environment. Responsibilities: As a Mechanical Engineer at a Marathon refinery, you can expect to become familiar with…
- Zoom (San Jose, CA)
- Machine Learning Engineer What you can expect As a Machine Learning Engineer on our team, you will play a critical role in shaping the future of Zoom AI through ... design discussions, and technical presentations to ensure the quality and reliability of our engineering solutions. + Identifying opportunities for improvement in…
- US Bank (Cupertino, CA)
- …creates optimal design adhering to architectural best practices; considers scalability, reliability and performance of systems/contexts affected when defining ... job - they want to make a difference! US Bank is seeking a Software Engineer who will contribute toward the success of our technology initiatives in our digital…
- Google (Sunnyvale, CA)
- Software Engineer, Stack Management, Vertex GenAI, Cloud AI. Google (Sunnyvale, CA, USA; Kirkland, WA, USA). **Mid** Experience driving ... evaluation, optimization, data processing, debugging). + Experience in C++, infrastructure systems, and Cloud. **Preferred qualifications:** + Master's degree or PhD…
- Google (Sunnyvale, CA)
- Staff AC/DC Rack Power Engineer, Platforms Infrastructure. Google (Sunnyvale, CA, USA). **Advanced** Experience owning outcomes and decision ... of vendors working on projects and evaluate new technologies. As a Power Engineer in Technical Infrastructure, you will play a key role in the development…