biondizzle/vllm - vllm

Author	SHA1	Message	Date
Christian Munley	48e376a007	qwen3coder tool parser fix anyOf double encoded parameters (#36032) Signed-off-by: Christian Munley <cmunley@nvidia.com>	2026-03-05 09:06:57 +00:00
Isotr0py	21eb2c3372	[Chore] Correct MTP models test registry ordering (#36115) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-05 08:55:04 +00:00
Seiji Eicher	e2b31243c0	[Docs] Update `CacheConfig` block_size docstring to remove inaccurate limit when using CUDA (#35632) Signed-off-by: Seiji Eicher <seiji@anyscale.com>	2026-03-05 06:24:08 +00:00
Martin Hickey	c3598d02fa	[Misc] Remove deprecated items that are due for removal (#36006) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>	2026-03-05 06:14:50 +00:00
Benjamin Chislett	57c629e9c1	[Bugfix] Fix block_size for hybrid model MTP (#36036) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2026-03-05 06:10:54 +00:00
zihaoanllm	d106bf39f5	[Doc] Add Parallel Draft Models (#35973) Signed-off-by: <zihaoan2@amd.com> Signed-off-by: zihaoanllm <zihaoan2@amd.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-05 05:44:07 +00:00
Yanan Cao	b0651021e5	[Kernel] [Helion] [11/N] Retune configs for silu_mul_fp8 (#36062)	2026-03-04 21:25:59 -08:00
Hanjun Cho	f600d5192e	[Bugfix] Fix score layer quantization for sequence classification models - Qwen3 (VL) Reranker (#35849) Signed-off-by: Hanjun Cho <gkswns0531@gmail.com> Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-03-04 20:57:20 -08:00
Tianmu Li	8e7820131e	[Perf] Use dummy M for weight prepacking on x86 (#35890) Signed-off-by: Li, Tianmu <tianmu.li@intel.com>	2026-03-05 04:56:49 +00:00
Andrii Skliar	0a12cea25f	Order `config.py` in Lexicographical order (#35866) Signed-off-by: Andrii Skliar <askliar@nvidia.com> Co-authored-by: Andrii Skliar <askliar@nvidia.com>	2026-03-04 20:56:47 -08:00
Zhengxu Chen	dd6dbd93f8	[compile] Fix extra cache save on warm start. (#35921) Signed-off-by: zhxchen17 <zhxchen17@fb.com>	2026-03-05 12:56:30 +08:00
Harry Mellor	26366009c5	[CI] Don't leave docs preview comment on closed PRs (#36087) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-05 04:51:46 +00:00
Nick Hill	16c472abe7	[Core] Move ray-specific WorkerWrapperBase methods to RayWorkerWrapper (#35328) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-03-05 12:11:59 +08:00
daje0601	3b23d57c96	[Model] Add LoRA support for Whisper models (#29856) Signed-off-by: daje0601 <englishmt4118@gmail.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-05 10:38:25 +08:00
Wentao Ye	2f4226fe52	[CI] Fix pre-commit mypy issue in main (#36049)	2026-03-04 18:13:12 -08:00
nkm-meta	792cbd64ca	Add platform method to enable custom collective ops registration (#34760) Signed-off-by: Naina Kuruballi Mahesh <nainakm@meta.com>	2026-03-05 00:50:32 +00:00
Zhengxu Chen	2ed4722e26	[compile] Reduce log spam from compile. (#36044) Signed-off-by: zhxchen17 <zhxchen17@fb.com>	2026-03-05 00:48:36 +00:00
Nick Hill	a3299c3d1d	[Model Runner V2] Misc code simplification (#35941) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-03-04 15:26:35 -08:00
Andreas Karatzas	6c21a0c2d7	[ROCm][CI] Added MI325 mirrors (stage C) (#35239) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-04 14:48:46 -08:00
Shanshan Shen	562339abc3	[Misc] Support OOT linear method registering (#35981) Signed-off-by: shen-shanshan <467638484@qq.com>	2026-03-04 22:25:56 +00:00
amitz-nv	d7adcadb9b	[Bugfix] Fix passing of activation_type to trtllm fused MoE NVFP4 and FP8 (#36017) Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>	2026-03-04 22:23:51 +00:00
Simon Mo	f678c3f61a	[RL] [Weight Sync] Guard IPC update-info pickle deserialization behind insecure serialization flag (#35928) Co-authored-by: Cursor Agent <cursoragent@cursor.com>	2026-03-04 17:05:32 -05:00
Thomas Parnell	be0a3f7570	[Bugfix] Fix race in non-blocking num_accepted_tokens GPU->CPU copy (#36013) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 13:52:44 -08:00
Harry Mellor	17dc9c7fc9	[CI] Bump `mypy` version (#34950) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-04 20:55:11 +00:00
fenypatel99	7eca859110	Add PyTorch profiler schedule support with warmup/active iterations (#35240)	2026-03-04 12:53:38 -08:00
Russell Bryant	636ee223ac	[Docs] Document security risks of GPT-OSS Python tool (#35139) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2026-03-04 20:27:31 +00:00
Robert Shaw	b7d59ffce2	[UX] Remove NoOpOffloader log (#35678) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-03-04 12:13:40 -08:00
Richard Zou	5569f5218d	[torch.compile] Stop lazily compiling (#35472) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-03-04 12:13:17 -08:00
Davina Zaman	138d891d7f	[Docs] Clarify structured outputs configuration for Qwen3 reasoning mode (#32441) Signed-off-by: Davina Zaman <davzaman@users.noreply.github.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-04 11:44:39 -08:00
Stefano Castagnetta	d7166e74c1	[CI] Add Blackwell AsyncTP correctness test (#35871) Signed-off-by: Stefano Castagnetta <scastagnetta@nvidia.com>	2026-03-04 19:41:21 +00:00
Nick Hill	417fd28fb1	[Model Runner V2] Fix pooling (#36019) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-03-04 10:53:17 -08:00
tomeras91	7faba503c4	[Kernel][Mamba] Optimize Mamba2 SSD prefill Triton kernels (#35397) Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>	2026-03-04 19:47:17 +01:00
Hyunkyun Moon	bc6be89d16	[Frontend] Add vllm launch command for GPU-less preprocessing serving (#34551) Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>	2026-03-04 18:41:52 +00:00
Maxime Grenu	32224f568a	docs: update CPU Docker images to reference Docker Hub instead of AWS ECR (#34882) Signed-off-by: Maxime Grenu <69890511+cluster2600@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-04 10:31:35 -08:00
Abhishek Mathukiya	f3dc292e9f	docs: add version requirement note for --profiler-config flag (#32454) Signed-off-by: abhishkh <mathukiya.a@northeastern.edu>	2026-03-04 18:13:54 +00:00
Chen	138c5fa186	[Docs] Add RunPod GPU deployment guide for vLLM (#34531) Signed-off-by: lisperz <zhuchen200245@163.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-04 10:11:34 -08:00
Russell Bryant	2f2c1d73a7	[Docs] Upgrade dynamic LoRA warning to admonition block (#35218) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2026-03-04 10:01:42 -08:00
Bhuminjay Soni	fb3e78ab09	[Feature][CI]: compare `func` & `no_func` outputs in test_functionalization.py (#35481) Signed-off-by: Bhuminjay <bhuminjaysoni@gmail.com> Signed-off-by: Bhuminjay Soni <Soni5Happy@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-03-04 18:01:16 +00:00
Michael Yao	fd3bfe74c9	[Docs] Update design/multiprocessing.md (#30677) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2026-03-04 17:58:59 +00:00
tc-mb	bfdb512f11	fix minicpmo4.5: fix attn_mask in vit attn && fix resampler pos_emb i… (#34127) Signed-off-by: tc-mb <caitianchi@modelbest.cn> Co-authored-by: hezhihui <hezhihui@modelbest.cn>	2026-03-04 17:46:17 +00:00
Sage	d25c1ec3c9	docs(cpu): Clarify pre-built wheels requirement for CPU Python-only build (#35090) Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>	2026-03-04 17:45:35 +00:00
Xing Liu	7cc6058ac6	[Doc] Add MTP docs and update speculative decoding guidance (#35197) Signed-off-by: liuxing <945764858@qq.com>	2026-03-04 17:23:34 +00:00
Manrique Vargas	28028dff2f	fix(docs): use static rdzv backend in multi-node troubleshooting script (#34784) Signed-off-by: machov <mv1742@nyu.edu>	2026-03-04 17:15:35 +00:00
Dr Alex Mitre	3417ba5648	docs: add README for logits_processor examples (#35933)	2026-03-04 17:09:19 +00:00
Yan Ma	58cfe0dc44	Fix phi4-mm and remove cuda binding (#35964) Signed-off-by: Yan Ma <yan.ma@intel.com>	2026-03-05 01:08:05 +08:00
simone-dotolo	e86221deb6	[Doc] Fix GPU Worker count in Process Count Summary (#36000) Signed-off-by: simone-dotolo <simonedotolo@libero.it> Signed-off-by: simone-dotolo <84937474+simone-dotolo@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-04 17:03:14 +00:00
Netanel Haber	289fc48ab7	Use MMEncoderAttention (=use FlashAttention) instead of torch.sdpa in radio.py (#35653)	2026-03-04 08:43:13 -08:00
Christian Pinto	2f2212e6cc	Split generic IO Processor plugins tests from Terratorch specific ones (#35756) Signed-off-by: Christian Pinto <christian.pinto@ibm.com>	2026-03-05 00:01:03 +08:00
Nicolò Lucchesi	18e01a0a10	[Misc] Add `--attention-backend auto` option (#35738) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-03-04 15:12:27 +00:00
sungsoo ha	6cb901093f	[Core] Add All-to-All communication backend for DCP (#34883) Signed-off-by: Sungsoo Ha <sungsooh@nvidia.com> Signed-off-by: sungsoo ha <hasungsoo@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-04 10:01:57 -05:00

14506 Commits