Sr. HPC Architect - Hybrid
- Caris Life Sciences (Irving, TX)
At Caris, we understand that cancer is an ugly word—a word no one wants to hear, but one that connects us all. That’s why we’re not just transforming cancer care—we’re changing lives.
We introduced precision medicine to the world and built an industry around the idea that every patient deserves answers as unique as their DNA. Backed by cutting-edge molecular science and AI, we ask ourselves every day: _“What would I do if this patient were my mom?”_ That question drives everything we do.
But our mission doesn’t stop with cancer. We're pushing the frontiers of medicine and leading a revolution in healthcare—driven by innovation, compassion, and purpose.
Join us in our mission to improve the human condition across multiple diseases. If you're passionate about meaningful work and want to be part of something bigger than yourself, Caris is where your impact begins.
Position Summary
The Senior HPC Architect designs and optimizes high-performance computing (HPC) systems, applying expertise in parallel programming, performance analysis, and hardware architecture to build scalable, efficient solutions for demanding computational workloads. The role often collaborates with software developers and hardware engineers to achieve optimal performance across complex scientific and data-intensive applications.
Job Responsibilities
+ System Design and Implementation:
  + Architecting and designing high-performance computing clusters, selecting appropriate hardware components like CPUs, GPUs, storage systems, and networking infrastructure.
  + Installing and configuring operating systems (typically Linux) on cluster nodes.
  + Setting up and managing distributed file systems (like Lustre, Ceph, GPFS) for large data storage and access.
  + Implementing job scheduling systems (e.g., LSF, Slurm, PBS) to manage workload distribution across the cluster.
+ Performance Optimization:
  + Monitoring system performance metrics (CPU utilization, memory usage, network bandwidth) to identify bottlenecks and optimize resource allocation.
  + Benchmarking applications and performing performance analysis to identify areas for improvement.
  + Tuning application code for parallel processing to leverage the power of the HPC cluster.
+ User Support:
  + Providing technical support to researchers and users on how to access and utilize the HPC system.
  + Training users on best practices for submitting jobs and optimizing their applications for the HPC environment.
  + Troubleshooting user issues related to application execution, data management, and system access.
+ System Administration:
  + Managing system updates, patching, and security configurations to maintain a stable and secure HPC environment.
  + Implementing backup and disaster recovery procedures for critical data and system configurations.
  + Monitoring system health and proactively addressing potential issues through alerts and logging systems.
Required Qualifications
+ Minimum of five years’ experience in Linux systems administration.
+ Bachelor's degree in computer science, engineering, math, or a scientific discipline with 2+ years of systems engineering experience, or 6 years’ experience in HPC architecture.
+ Hands-on HPC architecture design experience, including storage, file systems, InfiniBand, security, authentication, and compute architecture.
+ Experience using Git to manage shared software configuration code bases.
+ Hands-on experience with cloud-based services (e.g., Azure, AWS, GCP).
+ Good understanding of storage administration and optimization, such as performing upgrades and defining RAID configurations.
+ Deep understanding of parallel computing concepts and programming paradigms (MPI, OpenMP, CUDA).
+ Expertise in performance analysis tools and techniques to identify and address performance bottlenecks.
+ Knowledge of HPC hardware architectures, including processors, memory subsystems, network fabrics, and interconnects.
+ Familiarity with HPC software stack components like compilers, runtime systems, job schedulers, and scientific libraries.
+ Strong programming skills in languages commonly used in HPC (C, C++, Fortran).
+ Strong scripting skills for automation (e.g., Bash, ksh, Perl, Python).
+ Experience with system administration and cluster management tools (e.g., LSF, Slurm, PBS).
+ Experience with distributed file systems (Lustre, Ceph, GPFS).
+ Excellent communication and problem-solving abilities to effectively collaborate with cross-functional teams.
Preferred Qualifications
+ Experience in life sciences, healthcare, and/or research institutions is highly preferred.
+ Experience building and installing scientific software and other third-party software applications on HPC systems.
+ Experience with HPC schedulers and resource managers.
+ Experience executing scientific software on HPC systems.
+ Experience writing user documentation.
+ Strong technical and analytical skills.
+ Strong verbal and written communication skills.
+ Always maintains the highest level of professionalism when interacting with internal and external customers.
+ Demonstrates a high-energy, positive attitude and commitment to quality customer service.
+ Contributes to a positive team environment within the center by demonstrating a strong work ethic, effectively communicating with others, and proactively anticipating center and user needs.
+ Experience coordinating and running support teams.
+ Related industry certifications preferred.
Physical Demands
+ Ability to lift, move and install HPC data center hardware and supplies.
+ Standing for extended periods while performing data center related tasks.
Training
+ All job-specific, safety, and compliance training is assigned based on the job functions associated with this employee.
Other
+ This position requires periodic travel and some evenings, weekends, and/or holidays.
+ Job may require after-hours response to emergency issues.
+ Periodically scheduled on-call duty may require after-hours response to technical emergencies not explicitly related to assigned job responsibilities.
**Conditions of Employment:** Individual must successfully complete the pre-employment process, which includes a criminal background check, drug screening, credit check (applicable for certain positions), and reference verification.
This job description reflects management’s assignment of essential functions. Nothing in this job description restricts management’s right to assign or reassign duties and responsibilities to this job at any time.
Caris Life Sciences is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, religion, color, national origin, gender, gender identity, sexual orientation, age, status as a protected veteran, among other things, or status as a qualified individual with disability.
Caris Life Sciences is a leading innovator in molecular science and artificial intelligence focused on fulfilling the promise of precision medicine through quality and innovation.
Caris is committed to quality and excellence at our state-of-the-art laboratories. Learn more about our tissue lab and the advanced technologies that are helping improve the lives of cancer patients.