Alerted.org | Alerted.org - Powering better job alerts

Site Reliability Engineer

IBM (Austin, TX)

Introduction

A career in IBM Software means you’ll be part of a team that transforms our customers’ challenges into solutions.

Seeking new possibilities and always staying curious, we are a team dedicated to creating the world’s leading AI-powered, cloud-native software solutions for our customers. Our renowned legacy creates endless global opportunities for our IBMers, so the door is always open for those who want to grow their career.

IBM’s product and technology landscape includes Research, Software, and Infrastructure. Entering this domain positions you at the heart of IBM, where growth and innovation thrive.

Your role and responsibilities

As an Entry-Level Site Reliability Engineer (SRE) on our Austin Development Lab Engineering Team, you will join a team dedicated to ensuring the reliability, scalability, and performance of IBM systems and infrastructure. This role is critical to advancing IBM Power System development initiatives, and you will gain hands-on experience with both physical hardware and software environments.

Your responsibilities will include, but are not limited to:

* Assisting in the setup, configuration, and maintenance of IBM Power servers and related infrastructure.

* Performing hands-on tasks in the lab, including racking, cabling, hardware troubleshooting, and physical system configuration.

* Supporting software-related reliability initiatives such as automation, monitoring, performance tuning, and system optimization.

* Participating in incident response, diagnostics, and root-cause analysis for both hardware and software issues.

* Collaborating with cross-functional teams to ensure smooth integration between physical systems and application environments.

* Supporting projects related to lab analytics—gathering, analyzing, and interpreting data to help guide better business and operational decisions.

* Contributing to the deployment, scaling, and ongoing maintenance of production and test systems.

* Writing clear, concise documentation for processes, configurations, and troubleshooting steps.

* Learning and applying best practices in systems reliability, observability, and infrastructure operations.

* You will be expected to grow into a well-rounded SRE capable of tackling challenges in both the physical, data-center-like environment and the software layer that powers our services.

* Mentorship and hands-on training will be provided to help you develop the skills to excel in both domains.

Required technical and professional expertise

• To be successful in this role, the candidate must be hands-on, proactive, and talented at problem solving, with the attitude to challenge the norm and a strong desire to learn and work toward perfection.

• Passion for eliminating repetitive manual processes using automation.

• Strong attention to detail and excellent analytical capabilities.

• Excellent troubleshooting, problem solving, and debugging skills.

• Proficiency in programming concepts and frameworks.

• Proficiency in scripting/coding for automation using Python, shell scripting (Bash, etc.), Ansible, and related tools and languages.

• Familiarity with server operations, virtualization, and related infrastructure concepts.

• Fundamental understanding of computer networks.

• Fundamental understanding of data science/analytics frameworks.

• An automation mindset: wherever possible, you should use scripting and automation.

• Ability to work independently and as part of a team to achieve the SRE agenda.

• Ability to complete project work, both supervised and unsupervised.

• Ability to effectively prioritize and execute tasks in a high-pressure environment.

• Good written, oral, and interpersonal communication skills.

Preferred technical and professional experience

* Fundamental understanding of Linux/Unix systems is a plus.

* Fundamental knowledge of Red Hat OpenShift and Kubernetes is a plus.

* Automation/Scripting: In-depth experience with Ansible, Python, Terraform, and CI/CD tools is a plus, but a fundamental understanding is a must.

* Hands-on experience crafting alerts and dashboards using Python or any other language.

IBM is committed to creating a diverse environment and is proud to be an equal-opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, gender, gender identity or expression, sexual orientation, national origin, caste, genetics, pregnancy, disability, neurodivergence, age, veteran status, or other characteristics. IBM is also committed to compliance with all fair employment practices regarding citizenship and immigration status.
