- Amazon (Cupertino, CA)
- …accelerators. You will work closely with teams across AWS Neuron including compiler, training, and inference optimization to optimize frameworks for AWS's accelerator ... architectures, and engage closely with the PyTorch, JAX, and other ML framework communities to take advantage of their latest capabilities and improve performance and usability for ML model developers. A successful candidate will have experience developing… more
- NVIDIA (Santa Clara, CA)
- …Comfortable engaging developers on topics like performance optimization, profiling, debugging, and compiler toolchains. NVIDIA is commonly regarded as one of the top ... employers in the technology industry. We have a team of highly innovative and dedicated individuals. If you possess creativity and independence, we want to work with you! Your base salary will be determined based on your location, experience, and the pay of… more
- Meta (Menlo Park, CA)
- …hardware, models and runtime, giving crucial feedback to the architecture, compiler, kernel, modeling and runtime teams. 4. Explore, co-design and productionize ... model compression techniques such as Quantization, Pruning, Distillation and Sparsity to improve training and inference efficiency. 5. Explore, prototype and productionize highly optimized ML kernels to unlock the full potential of current and future accelerators… more
- Capital One (San Francisco, CA)
- …Deep knowledge of deep learning algorithmic and/or optimizer design + Experience with compiler design + **Finetuning** + PhD focused on topics related to guiding ... LLMs with further tasks (Supervised Finetuning, Instruction-Tuning, Dialogue-Finetuning, Parameter Tuning) + Demonstrated knowledge of principles of transfer learning, model adaptation and model guidance + Experience deploying a fine-tuned large language model… more
- NVIDIA (Santa Clara, CA)
- …internal and external partners, including teams within NVIDIA such as the Compiler, Driver, and GPU Architecture teams. + Drive technology discussions and provide ... feedback on system architecture as well as demonstrate ongoing growth in technical and leadership abilities. + Accurately estimate and prioritize tasks in order to create realistic delivery schedules. + Write fast, effective, maintainable, reliable and… more
- Meta (Menlo Park, CA)
- …hardware. As part of the AI acceleration software stack, we develop the PyTorch compiler frontend for MTIA, the PyTorch runtime for inference & training, high-performance ... runtime and kernel libraries exploiting various hardware architectural features, and tooling. We are looking for an engineering manager to support MTIA software stack development for the training and inference platform. **Required Skills:** Software Engineering… more
- NVIDIA (Santa Clara, CA)
- …the crowd: + Prior experience with an LLM framework or a DL compiler in inference, deployment, algorithms, or implementation + Prior experience with performance ... modeling, profiling, debugging, and code optimization of a DL/HPC/high-performance application + Architectural knowledge of CPU and GPU + GPU programming experience (CUDA or OpenCL) NVIDIA is widely considered to be one of technology's most desirable employers. We… more
- NVIDIA (Santa Clara, CA)
- …inference frameworks engineering, focusing on SGLang. + Partner with internal compiler, libraries, and research teams to deliver end-to-end optimized inference ... pipelines across NVIDIA accelerators. + Oversee performance tuning, profiling, and optimization of large-scale models for LLM, multimodal, and generative AI applications. + Guide engineers in adopting best practices for CUDA, Triton, CUTLASS, and multi-GPU… more
- NVIDIA (Santa Clara, CA)
- …Ways to stand out from the crowd: + Background in domain-specific compiler and library solutions for LLM inference and training (e.g. FlashInfer, Flash Attention) ... + Expertise in inference engines like vLLM and SGLang + Expertise in machine learning compilers (e.g. Apache TVM, MLIR) + Strong experience in GPU kernel development and performance optimizations (especially using CUDA C/C++, cuTile, Triton, or similar) + Open… more