Site Reliability Engineer Consultant
Cognizant (Santa Clara, CA)
About the role
As a Site Reliability Engineer Consultant, you’ll make an impact by ensuring the stability, scalability, and performance of on-prem engineering cloud infrastructure. You’ll safeguard uptime across multiple data centers, strengthen observability, and drive automation that enhances efficiency and service reliability for global engineering teams. You will be a valued member of the Cloud Infrastructure and Security team and work collaboratively with the Engagement Delivery Lead.
In this role, you will:
• Maintain and optimize on-prem infrastructure across multiple data centers.
• Ensure high availability, reliability, and operational readiness of the engineering environment.
• Lead capacity planning and utilization optimization efforts.
• Protect service-level agreements through proactive monitoring, alerting, and incident response.
• Conduct root cause analysis and post-mortems to prevent incidents from recurring.
• Participate in critical-issue war rooms to provide timely resolutions.
• Build and maintain monitoring, logging, and observability systems using Prometheus, Grafana, and the ELK Stack.
• Enhance monitoring systems with custom alerts aligned to business needs.
• Maintain KPI pipelines and automation using Jenkins and Python.
• Develop and maintain scripts and tools that automate operational processes using Python, Go, Bash, and Jenkins.
• Collaborate with development and infrastructure teams to resolve performance and reliability challenges.
• Create and maintain documentation for configurations, operational procedures, and troubleshooting guides.
Work model:
At Cognizant, we strive to provide flexibility wherever possible, and we support a healthy work-life balance through our various wellbeing programs. Based on this role's business requirements, this is an onsite position requiring 5 days a week in a client or Cognizant office in Santa Clara, California.
The working arrangements for this role are accurate as of the date of posting. They may change based on the project you're engaged in, as well as on business and client requirements. Rest assured: we will always be clear about role expectations.
What you need to have to be considered
• Proven experience managing large-scale on-prem infrastructure or data center environments.
• Strong Kubernetes expertise, including deployments, debugging, and administration.
• Hands-on experience with Prometheus, Grafana, and observability best practices.
• Solid Linux/Unix fundamentals and system-level troubleshooting skills.
• Proficiency in automation and scripting using Python, Go, Bash, or Jenkins.
• Excellent analytical and problem-solving abilities, with experience handling escalations (L1–L4).
These will help you stand out
• Familiarity with hardware such as GPUs or Tegra systems.
• Experience managing bare-metal infrastructure with tools such as IPMI, Redfish, and KVM.
• Exposure to MySQL or other relational databases.
• Previous experience in large-scale, performance-critical engineering environments.
We're excited to meet people who share our mission and can make an impact in a variety of ways. Don't hesitate to apply, even if you only meet the minimum requirements listed.
Think about your transferable experiences and unique skills that make you stand out as someone who can bring new and exciting things to this role.
Salary and Other Compensation:
Applications will be accepted until October 28, 2025.
The annual salary for this position is between $79,000 and $92,500, depending on the experience and other qualifications of the successful candidate.
This position is also eligible for Cognizant’s discretionary annual incentive program, based on performance and subject to the terms of Cognizant’s applicable plans.
Benefits: Cognizant offers the following benefits for this position, subject to applicable eligibility requirements:
• Medical/Dental/Vision/Life Insurance
• Paid holidays plus Paid Time Off
• 401(k) plan and contributions
• Long-term/Short-term Disability
• Paid Parental Leave
• Employee Stock Purchase Plan
Cognizant is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to sex, gender identity, sexual orientation, race, color, religion, national origin, disability, protected Veteran status, age, or any other characteristic protected by law.