- NVIDIA (Santa Clara, CA)
- …focuses on optimizing generative AI models such as large language models (LLMs) and diffusion models for maximal inference efficiency using techniques ranging from ... NVIDIA's ecosystem (TensorRT Model Optimizer, Megatron-LM, Megatron-Bridge, Nvidia-NeMo, NeMo-AutoModel, TensorRT-LLM) and open-source frameworks (PyTorch, Hugging Face, vLLM, SGLang)…
- Oracle (Sacramento, CA)
- …of developing cutting-edge AI solutions that push the boundaries of machine learning, LLM applications, and agentic AI. Our team builds real-world AI systems and ... to the design and deployment of advanced AI systems, including LLM-powered agents, Retrieval-Augmented Generation (RAG) pipelines, and structured AI workflows. As…
- NVIDIA (Santa Clara, CA)
- …for NVIDIA software and hardware engineers. + Work with HW chip designers and LLM research teams to grasp GPU design needs and align LLM infrastructure ... continuously looking for opportunities to apply these advancements to improve LLM infrastructure. + Lead with purpose and maintain high-quality engineering practices…
- Oracle (Sacramento, CA)
- …developing cutting-edge AI solutions that push the boundaries of machine learning, LLM applications, and AI agents. We work on real-world AI applications, deploying ... solutions across Oracle's enterprise customers. If you are passionate about Data, LLMs, AI Agents, Retrieval-Augmented Generation (RAG), and enterprise-scale AI, we…
- NVIDIA (Santa Clara, CA)
- …design and ship methodologies, code, and reference architectures that bring RAG, LLM inference, and Multi-Agent workflows to life using NVIDIA libraries (NeMo, NIMs, ... GenAI lifecycle with depth in select areas such as Data Curation, LLM Pre-training, Finetuning such as PEFT, SFT, post-training, Reasoning, RAG, Multi-agent…
- NVIDIA (Santa Clara, CA)
- …that scale from a handful to thousands of GPUs, supporting a variety of LLM frameworks (e.g., TensorRT-LLM, vLLM, SGLang). + Disaggregated Serving: Architect and ... of disaggregated serving for Dynamo-supported inference engines (vLLM, SGLang, TRT-LLM, llama.cpp, mistral.rs). + Improve intelligent routing and KV-cache management…
- CVS Health (Sacramento, CA)
- …retrieval pipelines, Snowflake compute (Snowpark), and integration with LLM-driven applications. This role will leverage their expertise ... for compute, storage, and API usage related to document analytics and LLM integration. + Produce technical documentation, runbooks, and clear explanations of model/…
- Oracle (Sacramento, CA)
- …Technologist who brings both depth and versatility across AI/ML solutions & LLM agents. A **people leader** who can inspire, mentor, and grow high-performing ... a high-performing engineering team dedicated to developing and scaling AI/ML solutions and LLM agents for the healthcare sector. + Oversee the design and development…
- Meta (Menlo Park, CA)
- …content and user understanding team, with a focus on Large Language Models (LLMs). We conduct focused research and engineering to build state-of-the-art LLMs. As a ... key driver of Meta's app growth, we're dedicated to using LLM-powered world knowledge to deliver user experiences across Facebook, Instagram, Threads, and more. We are…
- NVIDIA (Santa Clara, CA)
- …in agentic and reasoning use cases. As the scale and complexity of these LLM systems continue to increase, we are seeking outstanding engineers to join our team ... and help shape the future of LLM inference. Our team is dedicated to pushing the...generative AI, agents, and inference systems into the NVIDIA LLM software stack. + Workload Analysis and Optimization: Conduct…