Remote Site Reliability Engineer
Insight Global (Dallas, TX)

Job Description
As a Site Reliability Engineer specializing in AWS Platforms, you’ll ensure infrastructure and applications across AWS Organizations are built for reliability, scalability, and performance. You’ll integrate reliability and security practices throughout the software development lifecycle, emphasizing shift-left security, early vulnerability detection, and automated rollbacks. Your role includes developing and maintaining automation tools using Terraform and Ansible, ensuring Infrastructure as Code (IaC) scripts meet best practices for availability, compliance, and security. You’ll provision new AWS accounts and automate service, network, and security configurations to maintain consistent and secure environments.

Monitoring and incident response are key responsibilities, including deploying tools like Prometheus, Grafana, and ELK, conducting root cause analysis, and leading post-mortems. You’ll optimize system performance and resource utilization to meet user expectations and compliance standards. Disaster recovery planning and testing will be part of your duties to minimize downtime and data loss. Collaboration with development, operations, security, and product teams is essential to address reliability and security issues. Staying current with AWS trends, tools, and methodologies will help you continuously improve infrastructure resilience and reliability across the organization.

We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters.
Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to [email protected]. To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/.

Skills and Requirements
- Minimum of 5 years of experience as a Site Reliability Engineer, DevOps Engineer, or similar role focused on infrastructure reliability.
- Proven experience with Linux/Unix system administration, cloud platforms (AWS, Azure, or similar), and container orchestration (Kubernetes, Docker).
- Proficiency in one or more programming/scripting languages (e.g., Python, Go, Ruby, Bash).
- Strong familiarity with CI/CD pipelines, configuration management tools (e.g., Ansible, Puppet, Chef), and Infrastructure as Code (Terraform, CloudFormation) in AWS Organizations contexts.
- Experience implementing shift-left security practices, integrating security scanning tools into CI/CD pipelines.
- Familiarity with security scanning and compliance tools such as Prisma Cloud, Wiz, or similar platforms.
- Experience with automated testing, monitoring, logging, alerting, and incident response tools.
- Solid understanding of networking concepts, distributed systems, high availability architectures, and compliance standards.

- Experience implementing SRE principles and practices in a government or highly regulated environment.
- Familiarity with chaos engineering, resilience testing, and fault injection methodologies.
- Knowledge of database management and optimization for high availability (SQL/NoSQL) in regulated settings.
- Certifications such as AWS Certified Solutions Architect – Specialty, Google Cloud Professional DevOps Engineer, CISSP, or similar are a plus.
 
 