vLLM

Software Development

An open-source, high-throughput, memory-efficient inference and serving engine for LLMs.

About us

vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs
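To make "inference and serving engine" concrete, here is a minimal quickstart sketch in the spirit of vLLM's docs. The model name is an illustrative choice, not a project default; any supported Hugging Face model works.

```shell
# Install vLLM (assumes a CUDA-capable GPU and a recent Python environment).
pip install vllm

# Start an OpenAI-compatible HTTP server on localhost:8000.
vllm serve Qwen/Qwen2.5-1.5B-Instruct

# Query it with the standard OpenAI chat completions API.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen/Qwen2.5-1.5B-Instruct",
        "messages": [{"role": "user", "content": "Say hello."}]
      }'
```

Because the server speaks the OpenAI API, existing OpenAI client libraries can point at it by changing only the base URL.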

Website
https://github.com/vllm-project/vllm
Industry
Software Development
Company size
51-200 employees
Type
Nonprofit


Updates

  • vLLM

    Welcome Ray from Anyscale to the PyTorch Foundation family! 🎉

    Anyscale:

    Today we are excited to announce that Ray is joining The Linux Foundation, where it will be part of the PyTorch Foundation alongside PyTorch and vLLM. This ensures the project’s long-term neutrality, open governance, and deep alignment with the AI ecosystem.

    Ray began at UC Berkeley, created by Robert Nishihara, Philipp Moritz, and Ion Stoica to scale their own AI research in reinforcement learning. What started as a research project has become a foundational technology that powers data processing, training, and serving for teams running AI at production scale.

    By joining the broader Linux Foundation ecosystem, Ray now sits not only at the heart of the open-source AI stack, but also within the scalable AI compute stack that powers real-world production. This connects Ray with projects like Kubernetes under the Cloud Native Computing Foundation (CNCF), forming a unified foundation for modern AI infrastructure. As AI evolves from research to production, this alignment matters more than ever: teams need an open, cohesive stack that reduces complexity and waste. Hosting Ray within the overarching Linux Foundation strengthens this open compute layer, from PyTorch (framework) to vLLM (inference), Ray (distributed compute), and Kubernetes (orchestration).

    “Ray joining the PyTorch Foundation is a significant milestone for open source AI. With PyTorch, vLLM, and Ray working side by side under one neutral home, developers can have a single source for a unified, community-driven compute stack that can scale from strategy to production, without the friction of proprietary systems. We are enthusiastic about what’s ahead for the foundation and are excited to bring our community a hub of flexible, fast, and open AI infrastructure.” – Joe Spisak, Board Member of the PyTorch Foundation and Director of Product Management, Meta Super Intelligence Labs

    Check out our announcement blog to learn more about Ray and how you can join the Ray community as a contributor or developer: https://lnkd.in/eRREMG2s Together, we will continue building the compute fabric for AI.

  • vLLM

    A new SOTA deep research model from Pokee AI, with vLLM's day-0 support! 🚀

    Zheqing (Bill) Zhu – Founder and CEO at Pokee AI | ex-Head of Applied RL Group at Meta AI (Senior Staff TLM) | Stanford PhD in RL

    🚀 Introducing PokeeResearch-7B, a SOTA open-source deep research agent that outperforms all other 7B deep research agents. We are open-sourcing both the weights and the inference code on Hugging Face! We are excited to have partnered with vLLM, sgl-project, and verl on the training and inference pipelines.
    • GitHub Repo: https://lnkd.in/gYShw3Nc
    • Hugging Face Model: https://lnkd.in/g6VXjFZz
    • Webpage: https://lnkd.in/gEHDBj6q
    • ArXiv: https://lnkd.in/geeAMnv8

    PokeeResearch-7B makes three key improvements to LLM RL agent training pipelines:
    • Fully adopts reinforcement learning from AI feedback (RLAIF) with grounding;
    • Self-verification and chain-of-thought (CoT) on tool-call and answer reasoning;
    • Inference-time scaling that finds the best possible output via self-evaluation.

    PokeeResearch-7B greatly outperforms all other 7B deep research models across BrowseComp, HLE, GAIA, and 7 popular QA benchmarks.

    Want the best results, fast and easy? We also host our proprietary PokeeResearch-Preview model via an API at https://lnkd.in/gEHDBj6q. You get deep research results of similar quality to those of OpenAI and Perplexity, at 4x lower cost than their deep research APIs. The thoughtful, well-researched, and well-cited results you need, at a dramatically better price!

    This only marks the beginning of Pokee’s open-source initiative. We will keep sharing the best models, tools, and software with the community. The most comprehensive benchmark and dataset for tool calling, the most grounded model for RL agent evaluation, and better RL approaches for LLMs are all coming soon! We’re also beginning a weekly launch series 👀 – join us on Discord (https://lnkd.in/g93q8BCW) to get early access to our models, share ideas & feedback, and be part of the team as we launch exciting new products to the world!

  • vLLM

    💡 vLLM @ Open Source AI Week!

    1️⃣ Wednesday, Oct 22 & Thursday, Oct 23: vLLM @ PyTorch Conference 2025 🚀 Sessions to catch:
    1. Easy, Fast, Cheap LLM Serving for Everyone – Simon Mo, Room 2004/2006
    2. Open Source Post-Training Stack: Kubernetes + Ray + PyTorch + vLLM – Robert Nishihara, Room 2004/2006
    3. No GPU Left Behind: Scaling Online LLM Training With Co-located vLLM in TRL – Mert Toslali & Yu Chin Fabian Lim, Room 2000/2002
    4. Enabling vLLM V1 on AMD GPUs With Triton – TBA, Room 2001/2003
    5. Lightning Talk: vllm-triton-backend: State-of-the-art Performance on NVIDIA & AMD – Burkhard Ringlein, Room 2001/2003
    Don’t miss insights on scaling, GPU efficiency, and cutting-edge LLM serving! https://lnkd.in/g-shmYxh

    2️⃣ Wednesday, Oct 22: Trivia & Community: NVIDIA × DeepInfra × vLLM. Join us during PyTorch Open Source AI Week for a fun, interactive evening: AI, tech & pop culture trivia with prizes; networking with AI infrastructure & open-source enthusiasts; food & drinks included. Learn, connect, and have fun outside the conference sessions! https://luma.com/cpgzpcwt

  • vLLM reposted this

    NVIDIA AI:

    NVIDIA + Open Source AI Week 2025 – powered by partnerships, events, and community 👏 We’re excited to bring together technology, community, and collaboration at Open Source AI Week 2025, hosted by The Linux Foundation. From informal meetups to hackathons, panels to poster sessions, here’s the scoop on where you can join us to advance open-source AI 👇
    🥤 AI Dev Night with Unsloth AI & Mistral AI – Join us for boba and talks on training & deployment with RTX AI PCs.
    🧩 Trivia & Community with Deep Infra Inc. & vLLM – A fun, interactive quiz night to connect engineers, practitioners, and open-source devs.
    🧑💻 GPU MODE IRL Hackathon – With over 215 networked NVIDIA B200 GPUs, courtesy of Nebius, we are joining mentors from Thinking Machines, Unsloth AI, PyTorch, Periodic Labs, Mobius Labs, Google DeepMind, and more.
    🙌 PyTorch Conference 2025 – NVIDIA-led sessions, posters, meetups, and panels aligned with the flagship event.
    🤖 Open Agent Summit – NVIDIA Developer Advocate Mitesh Patel will join a panel on The Future of Agents & Human-Agent Collaboration.
    🧠 Measuring Intelligence Summit – Vivienne Zhang (Senior PM, Generative AI Software) will speak on reasoning models, benchmarks, and superintelligence.
    🤓 Technical Sessions & Posters – Covering topics like Lightweight, High-Performance FSDP on NVIDIA GPU, Scaling KV Caches for LLMs, and more.
    ⚡ Dynamo & Dine with Baseten – Hands-on LLM inference & scaling.
    💬 Model Builders Meetup with NVIDIA Nemotron & Prime Intellect – Open frontier models + RL.
    🔗 Stay in the loop – bookmark our event page for updates as we add more → https://nvda.ws/48ExDSb

  • vLLM reposted this

    Come join us for trivia night in SF on Oct 22 with NVIDIA and vLLM. All things open-source and inference. RSVP: https://luma.com/cpgzpcwt

  • vLLM

    🇸🇬 vLLM Singapore Meetup – Highlights. Thanks to everyone who joined! Check out the slides by vLLM’s DarkLight1337 with tjtanaa / Embedded LLM:
    * V1 is here: faster startup, stronger CI & perf checks.
    * Scaling MoE: clear Expert Parallelism (EP) setup for single/multi-node, plus elastic EP to match traffic.
    * Disaggregated serving: split prefill vs. decode to tune TTFT (time-to-first-token) vs. throughput.
    * MLLM speedups: reuse embeddings with a processor cache, optional GPU-side processors, and encoder DP-across-TP (replicate small encoders per TP rank; shard the decoder) to cut comms overhead.
    Also: WEKA – vLLM + LMCache Lab + SSD for high-perf KV cache. @ASTARsg MERaLiON – deploying AudioLLM with vLLM + Ray for autoscaling & load balancing. Slides folder: https://lnkd.in/gwVdv6-k
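    As a rough illustration of the MoE/EP setup discussed above, a single-node serving sketch using vLLM's documented parallelism flags. The model name is an example, not something from the talk; multi-node and elastic EP need additional configuration.

```shell
# Sketch only: serve a Mixture-of-Experts model on one 8-GPU node,
# sharding attention layers with tensor parallelism and routing
# expert layers with expert parallelism.
vllm serve mistralai/Mixtral-8x7B-Instruct-v0.1 \
  --tensor-parallel-size 8 \
  --enable-expert-parallel
```

    With expert parallelism enabled, experts are distributed across the parallel ranks instead of each rank holding a shard of every expert, which is the trade-off the EP talk covered.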

  • vLLM reposted this

    Stephen Watt – Vice President, Distinguished Engineer, Office of the CTO

    Hi folks – if you're in the Austin area on Wednesday, September 17th, we (PyTorch ATX) are hosting a joint meetup with the vLLM community at the Capital Factory, and we'd love to have you join us. The sessions are listed below. You'll get a solid grounding in vLLM and also learn about two really cool new groundbreaking projects: the semantic router and llm-d. We have 200 people already signed up but still have a few spots open; please help us share the event. It's going to be awesome! https://lnkd.in/gPwt-ZQn
    - Getting started with inference using vLLM – Steve Watt, PyTorch ambassador
    - An intermediate guide to inference using vLLM: PagedAttention, Quantization, Speculative Decoding, Continuous Batching, and more – Luka Govedič, vLLM core committer
    - vLLM Semantic Router: Intelligent Auto Reasoning Router for Efficient LLM Inference on Mixture-of-Models – Huamin Chen, vLLM Semantic Router project creator
    - Combining Kubernetes and vLLM to deliver scalable, distributed inference with llm-d – Greg Pereira, llm-d maintainer

  • vLLM

    🚀 Join us for the Boston vLLM Meetup on September 18! Our first Boston meetup back in March was fully packed, so register early! Hosted by Red Hat and Venture Guides, this event brings together vLLM users, developers, maintainers, and engineers to explore the latest in vLLM and optimized inference. Expect deep technical talks, live demos, and plenty of time to connect with the community.
    📍 Location: Venture Guides office by TD Garden/North Station
    🕔 Time: 5:00 PM – 8:30 PM
    Agenda highlights:
    * Intro to vLLM & project update
    * Model optimization with LLM Compressor and Speculators
    * Demo: vLLM + LLM Compressor in action
    * Distributed inference with llm-d
    * Q&A, discussion, and networking (with pizza 🍕 & refreshments)
    👉 Register here: https://luma.com/vjfelimw
    Come meet the vLLM team, learn from experts, and connect with others building the future of inference.
