Alerted.org | Alerted.org - Powering better job alerts

Junior Site Reliability Engineer

TEKsystems (San Diego, CA)

Description

Likely on site 4-5 days per week in Rancho Bernardo

Site Reliability Engineer, Production Operations Engineering

PlayStation isn’t just the Best Place to Play; it’s also the Best Place to Work! We’ve thrilled gamers since 1994, when we launched the original PlayStation. Today, we’re recognized as a global leader in interactive and digital entertainment. The PlayStation brand falls under Sony Interactive Entertainment, a wholly owned subsidiary of Sony Corporation.

As a member of the operations SRE team within the platform technology group, you will carry the responsibility of keeping key user experiences on the platform available, resilient and high performing, while continually enabling our service teams to deliver new and exciting products and technical features. Our team strives to iteratively learn, improve and automate our processes every single day, which continually sets the standard for operational excellence within our organization. You will be empowered to drive and lead technical initiatives, helping identify and proactively drive improvements in both process and technology supporting millions of users.

Responsibilities:

• Application operations and production support of internal and public-facing services within an AWS cloud environment, ensuring availability, resiliency, scalability and performance.

• Provision, automate and ensure the production readiness of all new services and features introduced.

• Identify areas for operational process improvement and automation. Drive the hands-on development of scripts and tools to automate these processes within our environment.

• Increase observability on our platform by implementing robust monitoring and alerting patterns across our services. Develop rich, informative dashboards / reports on our services that provide valuable insight and meaningful alerting to drive down the MTTD and MTTR on platform incidents.

• Collaborate and partner with other SRE teams that specialize in areas such as data services, CI/CD, and platform hosting to inspire changes and ensure optimal end-to-end system performance and resiliency across all back-end services within PlayStation.

• Iteratively drive performance and capacity validation analysis for our services. Apply AWS patterns and technologies such as spot instances, dynamic auto-scaling and EKS to optimize resource usage and AWS spend.

• Conduct root cause analyses, then document and present the findings to share incident insights with our broader engineering organization.

• Provide rotational on-call support where you’ll detect, respond to, triage and resolve production incidents.

Key Qualifications:

• Equally adept at software development and systems engineering/operations

• Build, deploy, operate and support services at scale

• PASSIONATE(!) desire to automate and improve everything including process improvements, standardizing tools and technologies

• Excellent troubleshooting skills that span user experience, system, infrastructure, and network (TCP/IP). Ability to zoom in from a user error to a JVM garbage collection problem to packet loss on the network.

• Drive operational and infrastructural requirements that promote availability, reliability, performance and security at all phases of SDLC on a global scale

• Customer and peer relationship focused with strong interpersonal and communication skills

Required Skills:

• Fluency with running distributed services at scale

• In-depth understanding of Unix/Linux systems internals and networking

• Source code (GitHub) and configuration management tools (Ansible, Chef, etc.)

• Software development experience in one or more of the following: Python, Go or Java

• Building and deploying Infrastructure as Code: CloudFormation/Terraform

• Building continuous integration and continuous delivery (CI/CD) pipelines in Jenkins, Spinnaker, or similar

• Operating and running Java services/APIs in AWS cloud infrastructure

• AWS systems and network protocols (e.g., ALB, R53, API Gateway, TCP/IP, HTTP/HTTPS, DNS)

• Configuring, tuning, and automating AWS services including Lambda, RDS, DynamoDB and ElastiCache

• Container technologies and orchestration (e.g., Docker, Kubernetes, EKS, Fargate)

• Application monitoring tools: Datadog, CloudWatch, Splunk, Grafana

• Data Reporting & Analytics: SQL, MySQL, Oracle, or Big Data

• Operating and supporting large scale and/or critical customer-facing production services or applications

Experience:

• BS degree in Computer Science, Software Engineering, or related technical area

• 3+ years operating and supporting services in a production environment at scale

Skills

sre, devops, site reliability, triage, root cause analysis, software engineering

Experience Level

Entry Level

Pay and Benefits

The pay rate for this position is $43.94/hr.

Eligibility requirements apply to some benefits and may depend on your job classification and length of employment. Benefits are subject to change and may be subject to specific elections, plan, or program terms. If eligible, the benefits available for this temporary role may include the following:

• Medical, dental & vision

• Critical Illness, Accident, and Hospital

• 401(k) Retirement Plan – Pre-tax and Roth post-tax contributions available

• Life Insurance (Voluntary Life & AD&D for the employee and dependents)

• Short and long-term disability

• Health Spending Account (HSA)

• Transportation benefits

• Employee Assistance Program

• Time Off/Leave (PTO, Vacation or Sick Leave)

Workplace Type

This is a hybrid position in San Diego, CA.

Application Deadline

This position is anticipated to close on Nov 7, 2025.

About TEKsystems:

We're partners in transformation. We help clients activate ideas and solutions to take advantage of a new world of opportunity. We are a team of 80,000 strong, working with over 6,000 clients, including 80% of the Fortune 500, across North America, Europe and Asia. As an industry leader in Full-Stack Technology Services, Talent Services, and real-world application, we work with progressive leaders to drive change. That's the power of true partnership. TEKsystems is an Allegis Group company.

The company is an equal opportunity employer and will consider all applications without regard to race, sex, age, color, religion, national origin, veteran status, disability, sexual orientation, gender identity, genetic information or any characteristic protected by law.

About TEKsystems and TEKsystems Global Services

We’re a leading provider of business and technology services. We accelerate business transformation for our customers. Our expertise in strategy, design, execution and operations unlocks business value through a range of solutions. We’re a team of 80,000 strong, working with over 6,000 customers, including 80% of the Fortune 500 across North America, Europe and Asia, who partner with us for our scale, full-stack capabilities and speed. We’re strategic thinkers, hands-on collaborators, helping customers capitalize on change and master the momentum of technology. We’re building tomorrow by delivering business outcomes and making positive impacts in our global communities. TEKsystems and TEKsystems Global Services are Allegis Group companies. Learn more at TEKsystems.com.
