Alerted.org


  • HPC SRE Systems Engineer

    Ford Motor Company (Dearborn, MI)




    We are seeking a highly skilled and motivated HPC SRE Systems Engineer to join our growing team. You will be responsible for designing, building, and maintaining the HPC and SRE infrastructure that our platform depends on for daily operation, ensuring optimal performance and reliability for our critical applications. This role also focuses on automating deployments of our infrastructure and monitoring stack using CI/CD and IaC. If you are interested in working with a dynamic HPC stack and being a driving force behind the resiliency of our platform, this position could be a good fit for you.

     

    What you'll do...

     

    + Design, implement, and maintain a robust and scalable HPC infrastructure to support containerized AI/ML workloads across traditional HPC and Kubernetes environments.

    + Implement monitoring solutions to ensure health and availability of critical infrastructure and applications.

    + Develop automation for repeatable and resilient infrastructure deployments.

    + Troubleshoot and resolve complex technical issues related to Linux systems, networking, storage, and HPC applications.

    + Develop and maintain documentation for software and procedures.

    + Collaborate with software engineers and researchers to ensure seamless integration of HPC resources and scaling of applications.

    + Stay up-to-date on the latest advancements in HPC and AI/ML technologies and best practices.

     

    You'll have...

     

    + Associate's degree in Computer Science or Engineering, or equivalent work experience.

    + 5+ years of experience in systems or software engineering.

    + Strong understanding of Linux operating systems, preferably in an HPC environment.

    + Proficiency in programming in one or more languages, preferably Go, Python, or Bash.

    + Familiarity with how to scale applications and with the metrics collection, analysis, and visualization tools, such as Prometheus and Grafana, used to identify bottlenecks.

    + Excellent problem-solving and troubleshooting skills, including the ability to define which problems need to be solved.

    + Strong communication and collaboration skills.

     

    Even better, you may have...

     

    + Experience with containerization technologies like Docker or Kubernetes.

    + Experience with automation tools like Ansible, Puppet, or Chef.

    + Experience with monitoring tools like Prometheus, Icinga, Nagios, or Elasticsearch.

    **Requisition ID** : 45304

     






© 2025 Alerted.org