biondizzle/vllm - vllm

Author	SHA1	Message	Date
zifeitong	52181baaea	Update DeepGEMM version pin in Dockerfile to match #32479 (#33935 ) Signed-off-by: Zifei Tong <zifeitong@gmail.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2026-02-07 05:30:22 -08:00
Rohan Potdar	de3869bb4d	move checks out of `unified_kv_cache_update` custom op (#33943 ) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>	2026-02-07 05:30:09 -08:00
whx	ce9b3cd3e9	[PluggableLayer][3/N] Apply PluggableLayer to mamba layers. (#33660 ) Signed-off-by: whx-sjtu <2952154980@qq.com>	2026-02-07 05:26:05 -08:00
Jee Jee Li	db4ede9743	[Model] Enable Step3p5ForCausalLM testing (#33755 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2026-02-07 05:25:24 -08:00
Pooya Davoodi	2cb2340f7a	[Frontend]Add support for transcriptions and translations to run_batch (#33934 ) Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io> Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2026-02-07 05:24:57 -08:00
TundeAtSN	4df44c16ba	Enable Eagle3 speculative decoding for Mistral3ForConditionalGeneration to support eagle3 (#33939 ) Signed-off-by: Akintunde Oladipo <akintunde.oladipo@servicenow.com> Signed-off-by: TundeAtSN <akintunde.oladipo@servicenow.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-07 05:24:52 -08:00
Richard Zou	81fe69cae5	[torch.compile] Stop compiling identical artifacts (#34003 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-02-07 05:24:48 -08:00
Mohammad Miadh Angkad	dd6a6e1190	[Kernel] Add KernelConfig flag to enable/disable FlashInfer autotune (#34006 ) Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-02-07 05:24:44 -08:00
Cyrus Leung	edb359cce4	[Renderer] Define `render_cmpl` and `render_chat` (#34039 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-07 05:24:40 -08:00
wang.yuqi	6ed5eda300	[CI][Build] Pin grpcio-tools==1.78.0 (#34048 ) Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2026-02-07 05:24:35 -08:00
Cyrus Leung	11a4c9d30d	[Misc] Simplify `get_max_tokens` (#34036 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-07 00:59:49 -08:00
lukec	15a0b9e570	Fix spelling errors (#33978 )	2026-02-06 23:58:50 -08:00
Andreas Karatzas	c490d8cc73	[ROCm][CI] Pinning lm-eval version to resolve multi-modal small eval bug (#34038 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-06 22:21:08 -08:00
Cyrus Leung	48312e579a	[Misc] Make `PlaceholderRange.get_num_embeds` a method (#34035 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-07 05:30:17 +00:00
Vel	bc32444b23	[Kernel] Add enable_sm120_or_later for SM121 (DGX Spark) CUTLASS support (#33517 ) Signed-off-by: code4me2 <velvetmoon222999@gmail.com>	2026-02-06 20:28:01 -08:00
Wentao Ye	18e8545297	[Revert] Add util `handle_deprecated` back (#33998 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-02-07 04:14:45 +00:00
果冻虾仁	6f7adc533a	fix description in plugin_system.md (#33999 )	2026-02-06 19:37:02 -08:00
Nick Hill	40218a82ba	[ModelRunner V2] Revert token rank comparison difference for now (#34017 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-02-07 11:11:05 +08:00
kourosh hakhamaneshi	1c3b22058f	[Misc] Add backward-compatible import aliases for renamed translations module (#34015 ) Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-02-07 11:01:41 +08:00
Xin Yang	3920cafdd6	[Bugfix] Fix _fused_moe_lora_expand signature mismatch (#33821 ) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-02-07 10:45:59 +08:00
rasmith	ec28784fdc	[CI][AMD]Bugfix] Check that model_config is not None in enable_norm_pad_fusion (#34007 ) Signed-off-by: Randall Smith <Randall.Smith@amd.com>	2026-02-07 02:43:25 +00:00
Nicolò Lucchesi	55aeec04f5	[Bugfix] Fix Whisper tokenization (#34011 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-02-07 10:42:52 +08:00
Ikenna	906077181b	[Bugfix] Fix QK Norm+RoPE fusion pattern matching on B200+FP8 (#33967 ) Signed-off-by: Ikenna <ikennachifo@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-02-07 02:27:33 +00:00
Aaron Hao	89a385d79f	[Feat][RL] Pause and Resume with keep requests for single engine (#32351 ) Signed-off-by: ahao-anyscale <ahao@anyscale.com> Signed-off-by: Aaron Hao <ahao@anyscale.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-02-07 00:08:58 +00:00
kourosh hakhamaneshi	4a2d00eafd	[bugfix] [ROCm] Fix premature CUDA initialization in platform detection (#33941 ) Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>	2026-02-06 16:17:55 -06:00
Dimitrios Bariamis	207c3a0c20	Fix RoutingMethodType logic (#33919 ) Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2026-02-06 14:03:34 -08:00
Sumanth R Hegde	ae2e93f89b	[Fix] Fix `logprobs=0` handling for `/inference/v1/generate` endpoint (#34010 ) Signed-off-by: SumanthRH <sumanthrh99@gmail.com>	2026-02-06 20:33:40 +00:00
xuebwang-amd	9e9acce577	[Bugfix] Fix no attribute error of SharedFusedMoE (DeepSeek-V3.1 as test model) (#33993 ) Signed-off-by: xuebwang-amd <xuebwang@amd.com>	2026-02-06 19:11:32 +00:00
Charlie Fu	fe5438200b	[Rocm][Bugfix] Fix dtype not same for gemm_a4w4 op (#33734 ) Signed-off-by: charlifu <charlifu@amd.com>	2026-02-06 19:09:59 +00:00
Wentao Ye	77c09e1130	[Refactor] Remove align block size logic in `moe_permute` (#33449 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-02-06 10:57:06 -08:00
zhrrr	16786da735	[Model Runner V2] support apply penalty for spec decode (#33251 ) Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>	2026-02-06 10:56:48 -08:00
vllmellm	aaa2efbe98	[DOC] [ROCm] Update docker deployment doc (#33971 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-06 10:05:35 -08:00
Seiji Eicher	aca5967416	[KV Connector] Add missing method overrides to MultiConnector (#33292 ) Signed-off-by: Seiji Eicher <seiji@anyscale.com>	2026-02-06 12:58:21 -05:00
Wentao Ye	67a746e87f	[Log] Optimize duplicate startup log (#33944 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-02-06 17:49:56 +00:00
Chauncey	7bec435130	[Bugfix] Fix the issue where tool calling does not work when using fast detokenization with dsv32 (#33964 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-02-06 09:23:44 -08:00
Eldar Kurtić	5c52644b10	[Docs] Update link to Benchmark CLI documentation (#33254 ) Signed-off-by: Eldar Kurtić <8884008+eldarkurtic@users.noreply.github.com>	2026-02-06 16:00:59 +00:00
zofia	2ce9fe4ad0	[XPU][5/N] add wna16 xpu kernel (#33973 ) Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>	2026-02-06 15:59:53 +00:00
Cyrus Leung	cd8b405bd0	[Refactor] Consolidate sequence normalization and enc-dec parsing (#33928 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-06 15:43:47 +00:00
tc-mb	4707f7ebb4	[Model] Support MiniCPM-o 4.5 (#33431 ) Signed-off-by: caitianchi <caitianchi@modelbest.cn> Signed-off-by: tc-mb <caitianchi@modelbest.cn> Co-authored-by: mslv <mslv@baai.ac.cn>	2026-02-06 15:29:10 +00:00
Michael Goin	c39ee9ee2b	[Docs] Add sections on process architecture and minimum CPU resources (#33940 ) It seems users can be confused about vLLM's performance when running with very small amounts of CPU cores available. We are missing a clear overview of what vLLM's process architecture is, so I added this along with some diagrams in arch_overview.md, and included a section on CPU resource recommendations in optimization.md Signed-off-by: mgoin <mgoin64@gmail.com>	2026-02-06 15:26:43 +00:00
Andreas Karatzas	350ca72c04	[ROCm][AITER] Fix AITER import regression for explicit backend selection (#33749 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-06 15:08:16 +00:00
FredericOdermatt	1fb0495a72	[FIX] guidance: use max(vocab_size, len(tokenizer)) for n_vocab (#33509 ) Signed-off-by: Frederic Odermatt <frederic.odermatt@44ai.ch>	2026-02-06 14:23:03 +00:00
Raushan Turganbay	85ee1d962b	[Bugfix] Fix models and tests for transformers v5 (#33977 ) Signed-off-by: raushan <raushan@huggingface.co> Signed-off-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-06 21:47:41 +08:00
Harry Mellor	51a7bda625	Update `WeightTransferConfig` to be more standard like the others (#33989 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-06 13:15:00 +00:00
SorenDreano	6e7b1c4b59	[Docs] Improve documentation (#33799 ) Co-authored-by: Soren Dreano <soren@numind.ai> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2026-02-06 12:57:09 +00:00
Kurt Shuster	2991dd3d22	[Bugfix][Model] Support LoRA on Qwen3 Output Embedding (#29816 ) Signed-off-by: kurt <kurt@thinkingmachines.ai>	2026-02-06 20:25:31 +08:00
Luka Govedič	ac32e66cf9	[torch.compile] Reorganize vllm/compilation and tests/compile (0/N for vLLM IR) (#33731 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: ProExpertProg <luka.govedic@gmail.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-02-06 04:19:49 -08:00
Fadi Arafeh	f79d9dce16	[CPU][BugFix] Fix loading of w8a8int models with bias (#33582 ) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>	2026-02-06 11:59:20 +00:00
Harry Mellor	ba5cbbf107	Bump HF Hub client to get bug fix (#33984 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-06 11:25:33 +00:00
zhang-prog	233b26ab35	[PaddleOCR-VL] Add BC for transformers 5.0 config (#33976 ) Signed-off-by: zhangyue66 <zhangyue66@baidu.com>	2026-02-06 10:33:49 +00:00


13716 Commits