biondizzle/vllm - vllm

Author	SHA1	Message	Date
zhrrr	16786da735	[Model Runner V2] support apply penalty for spec decode (#33251 ) Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>	2026-02-06 10:56:48 -08:00
vllmellm	aaa2efbe98	[DOC] [ROCm] Update docker deployment doc (#33971 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-06 10:05:35 -08:00
Seiji Eicher	aca5967416	[KV Connector] Add missing method overrides to MultiConnector (#33292 ) Signed-off-by: Seiji Eicher <seiji@anyscale.com>	2026-02-06 12:58:21 -05:00
Wentao Ye	67a746e87f	[Log] Optimize duplicate startup log (#33944 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-02-06 17:49:56 +00:00
Chauncey	7bec435130	[Bugfix] Fix the issue where tool calling does not work when using fast detokenization with dsv32 (#33964 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-02-06 09:23:44 -08:00
Eldar Kurtić	5c52644b10	[Docs] Update link to Benchmark CLI documentation (#33254 ) Signed-off-by: Eldar Kurtić <8884008+eldarkurtic@users.noreply.github.com>	2026-02-06 16:00:59 +00:00
zofia	2ce9fe4ad0	[XPU][5/N] add wna16 xpu kernel (#33973 ) Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>	2026-02-06 15:59:53 +00:00
Cyrus Leung	cd8b405bd0	[Refactor] Consolidate sequence normalization and enc-dec parsing (#33928 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-06 15:43:47 +00:00
tc-mb	4707f7ebb4	[Model] Support MiniCPM-o 4.5 (#33431 ) Signed-off-by: caitianchi <caitianchi@modelbest.cn> Signed-off-by: tc-mb <caitianchi@modelbest.cn> Co-authored-by: mslv <mslv@baai.ac.cn>	2026-02-06 15:29:10 +00:00
Michael Goin	c39ee9ee2b	[Docs] Add sections on process architecture and minimum CPU resources (#33940 ) It seems users can be confused about vLLM's performance when running with very small amounts of CPU cores available. We are missing a clear overview of what vLLM's process architecture is, so I added this along with some diagrams in arch_overview.md, and included a section on CPU resource recommendations in optimization.md Signed-off-by: mgoin <mgoin64@gmail.com>	2026-02-06 15:26:43 +00:00
Andreas Karatzas	350ca72c04	[ROCm][AITER] Fix AITER import regression for explicit backend selection (#33749 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-06 15:08:16 +00:00
FredericOdermatt	1fb0495a72	[FIX] guidance: use max(vocab_size, len(tokenizer)) for n_vocab (#33509 ) Signed-off-by: Frederic Odermatt <frederic.odermatt@44ai.ch>	2026-02-06 14:23:03 +00:00
Raushan Turganbay	85ee1d962b	[Bugfix] Fix models and tests for transformers v5 (#33977 ) Signed-off-by: raushan <raushan@huggingface.co> Signed-off-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-06 21:47:41 +08:00
Harry Mellor	51a7bda625	Update `WeightTransferConfig` to be more standard like the others (#33989 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-06 13:15:00 +00:00
SorenDreano	6e7b1c4b59	[Docs] Improve documentation (#33799 ) Co-authored-by: Soren Dreano <soren@numind.ai> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2026-02-06 12:57:09 +00:00
Kurt Shuster	2991dd3d22	[Bugfix][Model] Support LoRA on Qwen3 Output Embedding (#29816 ) Signed-off-by: kurt <kurt@thinkingmachines.ai>	2026-02-06 20:25:31 +08:00
Luka Govedič	ac32e66cf9	[torch.compile] Reorganize vllm/compilation and tests/compile (0/N for vLLM IR) (#33731 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: ProExpertProg <luka.govedic@gmail.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-02-06 04:19:49 -08:00
Fadi Arafeh	f79d9dce16	[CPU][BugFix] Fix loading of w8a8int models with bias (#33582 ) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>	2026-02-06 11:59:20 +00:00
Harry Mellor	ba5cbbf107	Bump HF Hub client to get bug fix (#33984 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-06 11:25:33 +00:00
zhang-prog	233b26ab35	[PaddleOCR-VL] Add BC for transformers 5.0 config (#33976 ) Signed-off-by: zhangyue66 <zhangyue66@baidu.com>	2026-02-06 10:33:49 +00:00
Harry Mellor	791a94bed0	Consolidate and fix forbidden import `pre-commit` checks (#33982 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-06 01:47:41 -08:00
Xinyu Chen	e969a169ef	support view_from_cpu_tensor on XPU (#33868 ) Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>	2026-02-06 08:34:20 +00:00
Harry Mellor	6d8d34be6d	Fix `main` pre-commit (#33975 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-06 00:08:05 -08:00
Gassan Salama	1363e3d6d5	[cpu][performance] CPU Paged Attention NEON BFMMLA BF16 Implementation (#32263 ) Signed-off-by: Gassan <gassan.salama@arm.com>	2026-02-06 15:01:48 +08:00
chengchengpei	965525667b	Onboard voyage-4-nano (#33720 ) Signed-off-by: Chengcheng Pei <chengchengpei@outlook.com> Signed-off-by: chengchengpei <5881383+chengchengpei@users.noreply.github.com> Co-authored-by: chengchengpei <5881383+chengchengpei@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-06 06:23:34 +00:00
sihao_li	6550815c3a	[XPU]Replace pip in docker.xpu with uv pip (#31112 ) Signed-off-by: sihao.li <sihao.li@intel.com>	2026-02-06 14:02:33 +08:00
Kunshang Ji	7439e4f41b	[XPU][4/N] add mxfp4 moe model support (#33679 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-02-06 13:03:59 +08:00
R3hankhan	ac04dd374f	[CPU] Add BF16 Kernel type for s390x (#33788 ) Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>	2026-02-06 04:57:02 +00:00
Cyrus Leung	035a6cb09a	[Misc] Update code for encoder-decoder models (#33900 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-06 11:38:39 +08:00
Mingliang Li	a32cb49b60	feat(frontend): early-fail tokenization guard for user requests (#31366 ) Signed-off-by: limingliang <limingliang@stepfun.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: limingliang <limingliang@stepfun.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-05 19:38:02 -08:00
Rabi Mishra	20d7454c9b	fix(ROCm): Make flash_attn import optional in MLA attention (#33511 ) Signed-off-by: rabi <ramishra@redhat.com>	2026-02-06 02:22:53 +00:00
Simon Mo	5819ca8944	[Docs] Add reo analytics (#33957 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2026-02-05 17:42:22 -08:00
Xin Yang	79028d4388	[Perf] Disable clean_logits in deepgemm fp8_mqa_logits kernel (#33568 )	2026-02-05 20:34:00 -05:00
emricksini-h	325ab6b0a8	[Feature] OTEL tracing during loading (#31162 )	2026-02-05 16:59:28 -08:00
Wei Zhao	91a07ff618	[Bugfix] Fix DeepSeek v3.2 tokenizer outputting None issue (#33832 ) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>	2026-02-05 23:50:49 +00:00
Hashem Hashemi	d5c4800112	Adds padding and perf improvements to wvSplitK_fp8 (#33527 ) Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>	2026-02-05 22:16:02 +00:00
Lumosis	42d5d705f9	[Minor] Sort safetensors files to ensure deterministic loading order (#33491 ) Signed-off-by: Lihao Ran <imlihao.ran@gmail.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2026-02-05 17:05:09 -05:00
Cyrus Leung	116880a5a0	[Bugfix] Make MM batching more robust (#33817 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-05 20:40:58 +00:00
Matthew Bonanni	4145e50d85	[Bugfix] Fix DSV3.2 NVFP4 (#33932 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-02-05 19:22:19 +00:00
Nicolò Lucchesi	20f5d185a6	[Misc] Rename `translations` to `speech_to_text` for OAI serving component (#33904 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-02-05 19:16:52 +00:00
Harry Mellor	1887acca9e	Fix tokenizer test for renamed attr on Transformers v5 (#33902 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-05 19:16:20 +00:00
Tsukasa OI	92e7562a99	[Bugfix] Suppress non-TTY color output on the process name part of the log (#29714 ) Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>	2026-02-05 18:47:09 +00:00
Isotr0py	87d0d17ab5	[Models] Consolidate Deepseek-OCR2 processor (#33909 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-02-05 18:29:20 +00:00
bnellnm	a57c8228ff	[Moe Refactor] Make Inplace Flag for FusedMoEModularKernel part of the constructor (#33375 ) Signed-off-by: Bill Nell <bnell@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-02-05 18:07:18 +00:00
zackyoray	1ee95841bd	[Bugfix] Fix swapped engine_ids in NIXL Llama 4 local attention path (#33795 ) Signed-off-by: Yoray Zack <yorayz@nvidia.com>	2026-02-05 17:51:58 +00:00
Nicolò Lucchesi	7d8c6804e2	[Misc] Add debug logs (#33931 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-02-05 09:42:40 -08:00
Benjamin Chislett	af3162d3aa	[Spec Decode] Unified Parallel Drafting (#32887 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2026-02-05 12:37:18 -05:00
danisereb	5b2a9422f0	[BugFix] Fix LoRA Fp8 (#33879 ) Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>	2026-02-05 17:25:55 +00:00
Aaron Hao	c1858b7ec8	[Feat][RL][1/2] Native Weight Syncing API: NCCL (#31943 ) Signed-off-by: ahao-anyscale <ahao@anyscale.com> Signed-off-by: Aaron Hao <ahao@anyscale.com> Co-authored-by: SumanthRH <sumanthrh99@gmail.com>	2026-02-05 12:13:23 -05:00
Mario Hong	82914d2ae8	[Bugfix] Fix step3p5 parser when using mtp (#33690 ) Signed-off-by: mariohong <mariohong128@gmail.com>	2026-02-05 16:04:04 +00:00

14386 Commits