- ServiceNow, Inc. (Santa Clara, CA)
- …(e.g., Azure, AWS, GCP). + Partner with the Site Reliability Engineering (SRE) team to improve operational processes and reliability. + Review, consult, and ... prepare for planned changes and releases to the production environment. + Create and maintain detailed documentation of infrastructure, automation, and standard operating procedures. + Provide feedback to infrastructure architects and contribute to design…
- EPAM Systems (San Jose, CA)
- …balancer, DNS, etc. + Understand the concepts of Site Reliability Engineering (SRE) to maximize automation, reduce waste, increase scale, and apply systemic thinking ... + Ability to express ideas effectively in individual and group situations (including non-verbal communication), adjusting language or terminology to the characteristics and needs of the audience + Ability to listen effectively to others and give constructive…
- NVIDIA (Santa Clara, CA)
- …propose novel approaches and shape new proof‑of‑concepts. + Bridge development, SRE, and partner teams. Facilitate clear communication, triage emergent issues ... rapidly, and ensure feedback loops between engineering and customer operations remain tight. + Coordinate execution across different functions. Work with engineering, design, operations, sales, and marketing to embed resiliency and observability requirements…
- Amazon (Mountain View, CA)
- …serverless) - Experience with DevOps practices and tools - Knowledge of SRE principles and practices About the team Diverse Experiences AWS values diverse ... experiences. Even if you do not meet all of the preferred qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn't followed a traditional path, or includes alternative experiences, don't…
- ServiceNow, Inc. (Santa Clara, CA)
- …perspective. Our engineers are responsible for restoring database/application services, guiding SRE and CS operations on any database-related issues, working with ... development on database defects and migrations, and strategizing the scaling of the ServiceNow platform. The ideal candidate for this position is a software engineer with a strong background in database technologies, performance analysis of databases and RHEL,…
- Coinbase (Sacramento, CA)
- …Coinbase is hiring! We are looking for an experienced Site Reliability Engineer (SRE) to join the IT Operations Corporate Engineering team to build and scale ... our identity and access management tooling. A successful candidate will have demonstrated previous success in similar role(s) in rapidly growing, security-first environments. The right person is passionate about infrastructure as code, open source tooling,…
- NVIDIA (Santa Clara, CA)
- …of secure communication protocols (mutual-TLS, IPsec, or similar). + Knowledge of SRE principles (observability, SLOs, logging, etc.) Ways to stand out from the ... crowd: + Experience in a Hyperscale Cloud Service Provider (public facing or not). + Understanding of networking protocols such as IP, IPv6, BGP, HTTP, ICMP, tunneling protocols (VXLAN, Geneve, FoU, GRE), etc. + Familiarity with Infiniband networking. +…
- NVIDIA (Santa Clara, CA)
- …developing multi-cloud infrastructure services. Experience teaching reliability engineering (e.g., SRE) and/or other scale-oriented cloud systems practices to peers ... and/or other companies (e.g., CRE). Experience in running private or public cloud systems based on one or more of Kubernetes, OpenStack, Docker or Slurm. + Experience with accelerated compute and communications technologies such as BlueField Networking, Infiniband…
- Aeris Communications (San Jose, CA)
- …Collaborate actively with other developers and other cross-functional teams like QA, SRE, and Operations. Assist in support of the existing code in production ... environments. Key Responsibilities + Investigate and evaluate advanced technologies, protocols, and architectures to identify scalable and efficient solutions that address system-level challenges and support secure, high-performance product development in the…
- MongoDB (San Francisco, CA)
- …AI-powered applications. We are looking for an experienced Staff Engineer for our SRE, InfraSec team, to guide the security of our cloud-based infrastructure. As a ... Staff SRE, you will be very hands-on technically while also...on security work, with ideally 2+ years in a senior or staff engineering role Security Mindset: + A…