Senior Site Reliability Engineer - Distributed Systems
- Cognizant (Arizona City, AZ)
About the role
As a Site Reliability Engineer, you will make an impact by designing and implementing observability solutions tailored for distributed edge computing environments. You will be a valued member of the Technology & Engineering team and work collaboratively with cross-functional teams to ensure system reliability, performance, and visibility across remote facilities.
In this role, you will:
Design and implement observability frameworks for edge computing environments, including monitoring, logging, tracing, and metrics collection.
Define and maintain SLIs, SLOs, and business KPIs to measure and enhance system reliability across edge and centralized infrastructure.
Build dashboards, visualizations, and alerting systems for real-time insights and incident response.
Implement distributed tracing and log aggregation systems to troubleshoot complex edge issues.
Collaborate with engineering teams to embed observability best practices into edge applications and infrastructure.
Proactively identify issues using advanced observability tools, reducing MTTD and MTTR.
Lead incident postmortems and implement observability-driven improvements.
Develop automation scripts and tools to optimize observability pipelines for bandwidth-constrained environments.
Optimize data storage and querying strategies for performance, cost, and scalability.
Stay current with emerging observability trends and advocate for adoption of edge-specific solutions.
Work model
At Cognizant, we strive to provide flexibility wherever possible, and we are here to support a healthy work-life balance through our various wellbeing programs. Based on this role’s business requirements, this is an **onsite** position requiring 5 days a week in a client or Cognizant office.
Please note: This role will require an in-person meet and greet at our Cognizant office or client location.
The working arrangements for this role are accurate as of the date of posting. This may change based on the project you’re engaged in, as well as business and client requirements. Rest assured, we will always be clear about role expectations.
What you need to have to be considered
10+ years of IT experience.
3–5 years of experience in service reliability/operations for large-scale hybrid environments.
3–5 years of experience writing automation scripts and building dashboards for application performance management.
2–4 years of experience with programming languages such as Go, Python, Java, or Rust.
Working knowledge of databases such as Oracle, SQL Server, Redis, ClickHouse, PostgreSQL, MongoDB, or time-series databases.
At least 2 years of experience with cloud platforms and containerization (GCP, AWS, Rancher, Azure, OpenShift).
Experience maintaining containerized apps in GKE/RKE/AKS environments.
Experience implementing cloud observability using OpenTelemetry (OTEL).
Experience with GraphQL frameworks (Apollo, Prisma, Hasura).
Strong understanding of networking protocols (TCP/IP, HTTP, DNS, load balancing, service mesh).
These will help you stand out
Proven experience managing application availability and building automation for high-availability platforms.
Hands-on experience with monitoring tools like Splunk, AppDynamics, Grafana/Prometheus, and Dynatrace.
Experience with CI/CD tools and extenders such as Rally and Confluence.
Experience with in-memory caching solutions (Redis preferred).
Strong debugging skills across integrated technical platforms and API gateways.
Hands-on experience with GCS, Cloud SQL, Spanner, and Firestore.
Experience in enterprise-level infrastructure and operations.
Expertise in high-availability and distributed systems, Linux/Windows administration, and support.
Experience monitoring and troubleshooting HashiCorp Vault environments.
Working knowledge of Vertex AI, Gen AI, and BigQuery.
Bachelor’s degree in computer science, IT, or equivalent.
Salary and Other Compensation:
The annual salary for this position depends on the experience and other qualifications of the successful candidate.
This position is also eligible for Cognizant’s discretionary annual incentive program, based on performance and subject to the terms of Cognizant’s applicable plans.
Benefits: Cognizant offers the following benefits for this position, subject to applicable eligibility requirements:
• Medical/Dental/Vision/Life Insurance
• Paid holidays plus Paid Time Off
• 401(k) plan and contributions
• Long-term/Short-term Disability
• Paid Parental Leave
• Employee Stock Purchase Plan
Disclaimer: The salary, other compensation, and benefits information is accurate as of the date of this posting. Cognizant reserves the right to modify this information at any time, subject to applicable law.
Cognizant is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to sex, gender identity, sexual orientation, race, color, religion, national origin, disability, protected Veteran status, age, or any other characteristic protected by law.