Alerted.org

  • Cloud Infrastructure Site Reliability Engineer

    Insight Global (Alpharetta, GA)



    Job Description

    As a Cloud Infrastructure Site Reliability Engineer (SRE) with expertise in multiple public cloud service provider platforms, you will be responsible for operating infrastructure solutions, following the principles and practices pioneered by Google’s SRE model. Your work will ensure our cloud services meet uptime, reliability, and performance targets, and you will drive automation and continuous improvement across our production environments. This role will involve collaborating with cross-functional teams to enhance our cloud reliability posture and streamline processes through automation.

    Key Responsibilities:

    • Design, build, and maintain highly available, scalable, and secure cloud infrastructure on platforms such as AWS, GCP, or Azure.

    • Develop and implement automation for provisioning, monitoring, scaling, and incident response using Infrastructure-as-Code tools (e.g., Terraform, CloudFormation, Ansible).

    • Monitor system reliability, capacity, and performance; proactively detect and address issues before they impact users.

    • Respond to production incidents, participate in on-call rotations, and lead post-incident reviews to drive root cause analysis and reliability improvements.

    • Collaborate with software engineering and security teams to ensure new services and features are production-ready and meet reliability standards.

    • Build and maintain tools for deployment, monitoring, and operations; automate manual processes to reduce toil.

    • Document operational processes and system architectures to ensure knowledge sharing and repeatability.

    • Continuously evaluate and implement new technologies to improve system reliability, security, and efficiency.

     

    This role requires you to be onsite 5 days a week in Alpharetta, GA or Berkeley Heights, NJ.

     

    Pay Rate: $92-95/hr, with a $165k conversion salary.

     

    We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to [email protected]. To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/.

    Skills and Requirements

    • Bachelor’s degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.

    • 3+ years of experience in software development with proficiency in at least one programming language (e.g., Python, Go, Java, C++).

    • Experience administering cloud platforms (AWS, GCP, Azure), including networking, security, containerization, storage, data management, and serverless technologies.

    • Solid understanding of Linux systems, networking fundamentals, virtualized and distributed systems, file systems, and system processes and configurations.

    • Deep understanding of observability (monitoring, alerting, and logging) tools in cloud environments. Ability to set up and maintain monitoring dashboards, alerts, and logs.

    • Familiarity with Continuous Integration/Continuous Deployment (CI/CD) tools for automated testing, deployments, provisioning, and observability.

    • Ability to manage and respond to incidents, perform root cause analysis, and implement post-mortem reviews.

     

    • Understanding of setting, monitoring, and maintaining Service-Level Objectives (SLOs) and Service-Level Agreements (SLAs) for system reliability.

    • Experience working with enterprise-scale financial services or other regulated industries.

     

    • 5+ years of experience in SRE, DevOps, infrastructure, or cloud engineering roles, preferably supporting large-scale, distributed systems.

    • Excellent problem-solving, troubleshooting, and communication skills.

    • Experience leading technical projects or mentoring junior engineers.

     

    Certifications: Certified Engineer, DevOps, SRE, CSREF

     




© 2025 Alerted.org