Alerted.org

  • High-Performance Computing Systems Engineer

    Texas A&M University System (Kingsville, TX)




    Job Title: High-Performance Computing Systems Engineer

    Agency: Texas A&M University - Kingsville

    Department: I Tech

    Proposed Minimum Salary: Commensurate

    Job Location: Kingsville, Texas

    Job Type: Staff

    Job Description

    Job Summary

    The High-Performance Computing (HPC) Engineer is a unique role that combines the design, development, and operational management of the institution's high-performance computing resources. This position offers the opportunity to work closely with faculty, researchers, and students, supporting their computational research projects and ensuring the HPC infrastructure meets their needs. The engineer will play a crucial role in optimizing computational methods and facilitating groundbreaking research across disciplines. The HPC Engineer is responsible for HPC cluster administration, unit coordination, maintenance of HPC systems, strategic planning for the University’s HPC infrastructure, and advanced technical support for users of HPC systems.

     

    Essential Duties and Responsibilities

    System Architecture and Design

    + Design and implement HPC infrastructure, including compute clusters, storage, and interconnects, to meet the computational needs of our research community.

    + Evaluate and integrate HPC, cloud, and storage technology advancements to enhance performance.

    System Administration and Maintenance

    + Manage and optimize HPC clusters, addressing hardware, software, and networking.

    + Perform system administration tasks on HPC clusters, including configuration, maintenance, and troubleshooting of hardware, software and networking components.

    + Monitor performance, troubleshoot, and implement security measures.

    User Support and Collaboration

    + Provide technical support and training for researchers on HPC tools and best practices.

    + Organize training sessions and workshops on HPC best practices, programming, and optimization techniques.

    + Collaborate with researchers on computational strategies and code optimization.

    Strategic Planning

    + Represent the department in strategic planning and advisory roles.

    + Guide IT strategies to support teaching, research, and service goals.

    + Collaborate with and advise the CIO and other executive staff on issues concerning the information technology needs of Texas A&M – Kingsville.

    + Establish information technology strategy, direction, and strategic plans to achieve the university’s teaching, learning, research, and service goals.

    Software and Application Management

    + Deploy and maintain scientific software and development tools.

    + Develop scripts and tools to automate tasks and enhance workflows.

    + Must be fluent in multiple programming languages to meet our campus needs.

    Disaster Recovery and Continuity

    + Regularly review and document disaster recovery and business continuity procedures.

    + Assess HPC utilization, lifecycle, and performance for improvement opportunities.

    + Ensure alignment with campus and system policies and rules.

    + Design, test, and verify the disaster recovery plan to ensure continuity.

    Research and Development Computing

    + Serve as lead administrator for campus HPC systems and document performance analyses.

    + Identify and implement solutions to advance computational research.

    Data Management and Storage

    + Develop policies for data integrity, backup, and availability.

    + Design scalable storage solutions for efficient data access and integration.

    + Optimize scalable storage solutions for efficiency.

    Networking and Collaboration

    + Build partnerships with industry, academic institutions, and HPC networks.

    Training and Education

    + Create training programs and documentation to support organizational needs.

    + Communicate effectively across all organizational levels.

     

    The above represents the major duties, responsibilities, and authorities of this job, and is not intended to be a complete list of all tasks and functions. Other duties may be assigned.

    Additional Responsibilities

    Other: 5%

    + May require availability to work some nights, weekends, and holidays.

    + Perform other duties as assigned.

    Minimum Requirements

    Education – Bachelor’s degree or an equivalent combination of education and experience

     

    Experience – Six years of related experience

    Preferred Requirements

    Education – Master's degree in Computer Science, Computational Science, Statistics, or Engineering.

    Experience:

    + Ten or more years of hands-on HPC experience, including system administration and management of large-scale supercomputing clusters at all levels; parallelization techniques; programming languages, tools, and techniques such as Fortran, C/C++, Java, and POSIX threads; and mass storage architecture and planning.

    + Five years of management and leadership experience in HPC or research computing centers.

    + Experience with computing clusters in Windows and Linux and virtualized environments.

    + Experience in enhancing and maintaining the security of HPC resources.

    + Ability to evaluate and benchmark cluster architectures and their key subsystems (e.g., mass storage, interconnect, processor technology).

    + Knowledge of scripting languages such as Bash, Python, and Perl for maintaining HPC systems and supporting scientific computing.

    + Knowledge of C/C++, Fortran, CUDA, OpenCL, OpenMP, and MPI for scientific computing.

    + Knowledge of configuration management tools such as Puppet, Chef, Ansible, and Salt.

    + Knowledge of container technologies such as Docker, Singularity, and Kubernetes.

    + Excellent troubleshooting skills, including quickly recognizing failure modes and their corresponding symptoms.

    + Excellent interpersonal communication skills.

    + Experience in higher education.

    Licensing / Professional Certifications:

    + Linux/UNIX certifications related to systems administration.

    + Certifications related to managing high-performance storage systems.

     

    The target base annual salary is $110,000 and may be negotiable based on funding availability and candidate experience/skillset in relation to the minimum requirements of this position.

     

    Supervision of Others

     

    This position generally does not supervise employees.

     

    All positions are security-sensitive. Applicants are subject to a criminal history investigation, and employment is contingent upon the institution’s verification of credentials and/or other information required by the institution’s procedures, including the completion of the criminal history check.

     

    Equal Opportunity/Veterans/Disability Employer.

     


© 2025 Alerted.org