- Celestica (Merrimack, NH)
- …Hardware experience: Familiarity with main elements of CPU, DPU, memory, NICs, board monitoring elements is a must + Debugging and testing skills: Ability to ... identify and resolve software and hardware issues at the rack level. + Problem-solving skills: Strong analytical and problem-solving abilities + Experience with data center deployments: Prior experience in data center architectures, developing and maintaining… more
- Banc of California (Santa Ana, CA)
- …as assigned. **WHAT YOU'LL BRING** + Demonstrates knowledge of, adherence to, monitoring and responsibility for compliance with state and federal regulations and ... laws as they pertain to this position including but not limited to the following: Regulation Z (Truth in Lending Act), Regulation B (Equal Credit Opportunity Act), Fair Housing Act (FHA), Home Mortgage Disclosure Act (HMDA), Real Estate Settlement Procedures… more
- Comcast (Chicago, IL)
- …as Aerospike, Snowflake, Databricks, Spark, Presto, and EMR. + Experience with monitoring tools like Datadog, Prometheus, Grafana, and ELK stack. + Demonstrated ... ability to troubleshoot and resolve complex technical issues. + Excellent communication and collaboration skills with the ability to work effectively across teams and regions. + Ability to thrive in a fast-paced, dynamic environment with strong adaptability to… more
- Amazon (Cupertino, CA)
- …data center. After launch you will oversee the fleet of servers you develop, monitoring their quality and how they are meeting the customer requirements. This is a ... fast-paced, intellectually challenging position, and you'll work with thought leaders in multiple technology areas. You'll have high standards for yourself and everyone you work with, and you'll be constantly looking for ways to improve your products'… more
- MongoDB (New York, NY)
- …engineers debug tasks more effectively + Ensure best practices in testing (Cypress), monitoring, and reliability + Mentor team members and help shape the technical ... and cultural direction of our team **What You Bring to the Table** + 5+ years of strong experience in React and TypeScript + Experience working with backend systems (experience in a statically typed compiled language like Go is a plus) + Interest in developing… more
- Coinbase (Phoenix, AZ)
- …* Lead end-to-end delivery of projects through implementation, deployment, and monitoring * Improve and maintain operational excellence standards across the team, ... proactively addressing technical debt and driving improvements in reliability and observability * Participate in code reviews and on-call rotation, lead incident response, and foster a team-wide environment that welcomes constructive feedback to maintain high… more
- Coinbase (Jefferson City, MO)
- …* Lead end-to-end delivery of projects through implementation, deployment, and monitoring * Improve and maintain operational excellence standards across the team, ... proactively addressing technical debt and driving improvements in reliability and observability * Participate in code reviews and on-call rotation, lead incident response, and foster a team-wide environment that welcomes constructive feedback to maintain high… more
- NVIDIA (Santa Clara, CA)
- …lifecycle management for large-scale Machine Learning systems. + Implement monitoring and health management capabilities that enable industry-leading reliability, ... availability, and scalability of GPU assets. You will be harnessing multiple data streams, ranging from GPU hardware diagnostics to cluster and network telemetry. + Work on software that manages NVLINK topography across GPU clusters. + Build automated test… more
- NVIDIA (Santa Clara, CA)
- …+ Apply reinforcement learning to finetune multimodal LLMs. + Develop robust monitoring and debugging tools to ensure the reliability and performance of training ... workflows on large GPU clusters. What we need to see: + Bachelor's degree in Computer Science, Robotics, Engineering, or a related field or equivalent experience. + 10+ years of full-time industry experience in large-scale MLOps and AI infrastructure. + Proven… more
- Amazon (Seattle, WA)
- …with deploying and operating hardware and applications at scale - Developed monitoring and alerting systems to quickly identify and categorize failures in production ... environments - Proven results-oriented person with a bias for action. Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status. Los Angeles County applicants: Job… more
Recent Jobs
- Electrical, Instrumentation & Control Technician (EI&C)
- City Utilities of Springfield (Springfield, MO)
- Summer Internship - Customs Operations & Compliance
- Satair USA, Inc. (Dulles, VA)
- Senior Director, Enterprise Architecture
- Keurig Dr Pepper (Frisco, TX)
- Director of Multi-Carrier Contact Center Operations (Charlotte)
- USAA (Charlotte, NC)