- EPAM Systems (San Jose, CA)
- …balancer, DNS, etc. + Understand the concepts of Site Reliability Engineering (SRE) to maximize automation, reduce waste, increase scale, and apply systemic thinking ... + Ability to express ideas effectively in individual and group situations (including non-verbal communication), adjusting language or terminology to the characteristics and needs of the audience + Ability to listen effectively to others and give constructive…
- Microsoft Corporation (Mountain View, CA)
- …quality. + 1+ year(s) of experience applying site-reliability engineering (SRE) practices, including monitoring, incident response, and improving system resilience. ... Software Engineering IC4 - The typical base pay range for this role across the US is USD $119,800 - $234,700 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and…
- ServiceNow, Inc. (Santa Clara, CA)
- …(e.g., Azure, AWS, GCP). + Partner with the Site Reliability Engineering (SRE) team to improve operational processes and reliability. + Review, consult, and ... prepare for planned changes and releases to the production environment. + Create and maintain detailed documentation of infrastructure, automation, and standard operating procedures. + Provide feedback to infrastructure architects and contribute to design…
- NVIDIA (Santa Clara, CA)
- …is built. From healthcare research applications to autonomous vehicles, or voice-recognition systems, there is a need to simplify and deliver predictability ... propose novel approaches and shape new proof‑of‑concepts. + Bridge development, SRE, and partner teams. Facilitate clear communication, triage emergent issues…
- Amazon (Mountain View, CA)
- …serverless) - Experience with DevOps practices and tools - Knowledge of SRE principles and practices. About the team. Diverse Experiences: AWS values diverse ... experiences. Even if you do not meet all of the preferred qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn't followed a traditional path, or includes alternative experiences, don't…
- NVIDIA (Santa Clara, CA)
- …+ BS (or equivalent experience) with 5+ years of professional experience in DevOps, SRE, or Build/Release engineering roles at similar scale. + Fluent in Python for ... scripting, tooling, and automation + Deep hands‑on experience in CI/CD, virtualization, and container orchestration. Usage of tools like GitLab CI/CD, Jenkins, CircleCI, Docker, Kubernetes is required. Ways to stand out from the crowd: + Proven understanding…
- ServiceNow, Inc. (Santa Clara, CA)
- …perspective. Our engineers are responsible for restoring database/application services, guiding SRE and CS operations on any database-related issues, working with ... development on database defects and migrations, and strategizing the scaling of the ServiceNow platform. The ideal candidate for this position is a software engineer with a strong background in database technologies, performance analysis of databases and RHEL,…
- Coinbase (Sacramento, CA)
- …Coinbase is hiring! We are looking for an experienced Site Reliability Engineer (SRE) to join the IT Operations Corporate Engineering team to build and scale ... our identity and access management tooling. A successful candidate will have demonstrated previous success in similar role(s) in rapidly growing, security-first environments. The right person is passionate about infrastructure as code, open source tooling,…
- NVIDIA (Santa Clara, CA)
- …of secure communication protocols (mutual-TLS, IPsec, or similar). + Knowledge of SRE principles (observability, SLOs, logging, etc.) Ways to stand out from the ... crowd: + Experience in a Hyperscale Cloud Service Provider (public facing or not). + Understanding of networking protocols such as IP, IPv6, BGP, HTTP, ICMP, tunneling protocols (VXLAN, Geneve, FoU, GRE), etc. + Familiarity with InfiniBand networking. +…
- NVIDIA (Santa Clara, CA)
- …developing multi-cloud infrastructure services. Experience teaching reliability engineering (e.g., SRE) and/or other scale-oriented cloud systems practices to peers ... and/or other companies (e.g., CRE). Experience in running private or public cloud systems based on one or more of Kubernetes, OpenStack, Docker or Slurm. + Experience with accelerated compute and communications technologies such as BlueField Networking, InfiniBand…