- Cisco (San Jose, CA)
- …with our customers. Our CRE team adapts the best practices of Site Reliability Engineering (SRE) and applies them to our customers. As part of the role, you will ... gain a deep understanding of our customers, their architecture down into their various configurations. The main mission of this role is to ensure that our customers can continue running Isovalent networking for Kubernetes, reliably, at scale. You will work… more
- IBM (San Francisco, CA)
- …5+ years of proven experience in Support Engineering, Professional Services, or SRE/DevOps * 3+ years of experience leading complex technical escalations or ... incidents. * Demonstrated ability to remain calm, composed, and effective during high-pressure incidents. * Exceptional communication and stakeholder engagement skills across technical and executive levels. * Strong influencing and negotiation skills,… more
- Insight Global (Newark, CA)
- …services. . Familiarity with cloud-native resiliency patterns and site reliability engineering (SRE) methods. . Proven ability to assess and design effective Major ... Incident Response Plans (MIRPs) that align with operational SLAs and business risk tolerances. . Experience in business continuity planning, incident response coordination, and process maturity assessments. . Excellent communication and documentation skills… more
- Robert Half Technology (Oakland, CA)
- …environment. + Participate in on-call rotations, addressing security incidents and SRE responsibilities. + Collaborate with engineering and DevOps teams to identify ... and close security gaps. + Implement infrastructure-as-code and automate security checks in CI/CD pipelines using tools like GitHub Enterprise, Jenkins, Artifactory, and CircleCI. Requirements What We're Looking For: + Bachelor's degree or equivalent work… more
- Insight Global (Irvine, CA)
- …in Computer Science or related field. 47+ years in Tech Ops, DevOps, SRE, or MLOps roles. Experience with cloud platforms (especially GCP/Vertex AI), CI/CD tools, ... scripting (Python, Bash), and containerization (Docker, Kubernetes). Strong troubleshooting skills and familiarity with monitoring tools (e.g., Prometheus, Grafana). Master's degree and relevant cloud certifications. Experience with MLOps/LLMOps, AI/ML frameworks… more
- Zoom (San Jose, CA)
- …(CS/CE/EE/IS or related majors) + Have 2 years experience on Cloud Based DevOps or SRE + Demonstrate knowledge of AWS services, Linux batch command, ELK stack and ... container orchestration (i.e., K8s, Docker) + Possess hands-on experience on scripting, such as Ansible, Terraform, Python, Go + Have the ability to monitor, debug and automate routine tasks + Be willing to learn, be proactive, and think creatively Ways of… more
- Palo Alto Networks (Santa Clara, CA)
- …rotations, postmortems, and runbooks to continue supporting the infrastructure owned by the SRE team while finding ways to reduce the time to resolution and improve ... the reliability of services. + Support, optimize, and deploy mission-critical, front-end, and back-end production. + Improving site performance, monitoring, and overall stability of our infrastructure **Your Experience** + Bachelor's/Master's degree in Computer… more
- Microsoft Corporation (Mountain View, CA)
- …quality. + 1+ year(s) of experience applying site-reliability engineering (SRE) practices, including monitoring, incident response, and improving system resilience. ... Software Engineering IC4 - The typical base pay range for this role across the US is USD $119,800 - $234,700 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and… more
- EPAM Systems (San Jose, CA)
- …balancer, DNS, etc. + Understand the concepts of Site Reliability Engineering (SRE) to maximize automation, reduce waste, increase scale, and apply systemic thinking ... + Ability to express ideas effectively in individual and group situations (including non-verbal communication), adjusting language or terminology to the characteristics and needs of the audience + Ability to listen effectively to others and give constructive… more
- ServiceNow, Inc. (Santa Clara, CA)
- …(e.g., Azure, AWS, GCP). + Partner with the Site Reliability Engineering (SRE) team to improve operational processes and reliability. + Review, consult, and ... prepare for planned changes and releases to the production environment. + Create and maintain detailed documentation of infrastructure, automation, and standard operating procedures. + Provide feedback to infrastructure architects and contribute to design… more