zifeitong
52181baaea
Update DeepGEMM version pin in Dockerfile to match #32479 (#33935)
...
Signed-off-by: Zifei Tong <zifeitong@gmail.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2026-02-07 05:30:22 -08:00
Rohan Potdar
de3869bb4d
move checks out of unified_kv_cache_update custom op (#33943)
...
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
2026-02-07 05:30:09 -08:00
whx
ce9b3cd3e9
[PluggableLayer][3/N] Apply PluggableLayer to mamba layers. (#33660)
...
Signed-off-by: whx-sjtu <2952154980@qq.com>
2026-02-07 05:26:05 -08:00
Jee Jee Li
db4ede9743
[Model] Enable Step3p5ForCausalLM testing (#33755)
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2026-02-07 05:25:24 -08:00
Pooya Davoodi
2cb2340f7a
[Frontend] Add support for transcriptions and translations to run_batch (#33934)
...
Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-02-07 05:24:57 -08:00
TundeAtSN
4df44c16ba
Enable Eagle3 speculative decoding for Mistral3ForConditionalGeneration to support eagle3 (#33939)
...
Signed-off-by: Akintunde Oladipo <akintunde.oladipo@servicenow.com>
Signed-off-by: TundeAtSN <akintunde.oladipo@servicenow.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-07 05:24:52 -08:00
Richard Zou
81fe69cae5
[torch.compile] Stop compiling identical artifacts (#34003)
...
Signed-off-by: Richard Zou <zou3519@gmail.com>
2026-02-07 05:24:48 -08:00
Mohammad Miadh Angkad
dd6a6e1190
[Kernel] Add KernelConfig flag to enable/disable FlashInfer autotune (#34006)
...
Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2026-02-07 05:24:44 -08:00
Cyrus Leung
edb359cce4
[Renderer] Define render_cmpl and render_chat (#34039)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-07 05:24:40 -08:00
wang.yuqi
6ed5eda300
[CI][Build] Pin grpcio-tools==1.78.0 (#34048)
...
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-02-07 05:24:35 -08:00
Cyrus Leung
11a4c9d30d
[Misc] Simplify get_max_tokens (#34036)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-07 00:59:49 -08:00
lukec
15a0b9e570
Fix spelling errors (#33978)
2026-02-06 23:58:50 -08:00
Andreas Karatzas
c490d8cc73
[ROCm][CI] Pinning lm-eval version to resolve multi-modal small eval bug (#34038)
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-06 22:21:08 -08:00
Cyrus Leung
48312e579a
[Misc] Make PlaceholderRange.get_num_embeds a method (#34035)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-07 05:30:17 +00:00
Vel
bc32444b23
[Kernel] Add enable_sm120_or_later for SM121 (DGX Spark) CUTLASS support (#33517)
...
Signed-off-by: code4me2 <velvetmoon222999@gmail.com>
2026-02-06 20:28:01 -08:00
Wentao Ye
18e8545297
[Revert] Add util handle_deprecated back (#33998)
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-02-07 04:14:45 +00:00
果冻虾仁
6f7adc533a
fix description in plugin_system.md (#33999)
2026-02-06 19:37:02 -08:00
Nick Hill
40218a82ba
[ModelRunner V2] Revert token rank comparison difference for now (#34017)
...
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-02-07 11:11:05 +08:00
kourosh hakhamaneshi
1c3b22058f
[Misc] Add backward-compatible import aliases for renamed translations module (#34015)
...
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-07 11:01:41 +08:00
Xin Yang
3920cafdd6
[Bugfix] Fix _fused_moe_lora_expand signature mismatch (#33821)
...
Signed-off-by: Xin Yang <xyangx@amazon.com>
2026-02-07 10:45:59 +08:00
rasmith
ec28784fdc
[CI][AMD][Bugfix] Check that model_config is not None in enable_norm_pad_fusion (#34007)
...
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
2026-02-07 02:43:25 +00:00
Nicolò Lucchesi
55aeec04f5
[Bugfix] Fix Whisper tokenization (#34011)
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-02-07 10:42:52 +08:00
Ikenna
906077181b
[Bugfix] Fix QK Norm+RoPE fusion pattern matching on B200+FP8 (#33967)
...
Signed-off-by: Ikenna <ikennachifo@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2026-02-07 02:27:33 +00:00
Aaron Hao
89a385d79f
[Feat][RL] Pause and Resume with keep requests for single engine (#32351)
...
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: Aaron Hao <ahao@anyscale.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2026-02-07 00:08:58 +00:00
kourosh hakhamaneshi
4a2d00eafd
[bugfix] [ROCm] Fix premature CUDA initialization in platform detection (#33941)
...
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
2026-02-06 16:17:55 -06:00
Dimitrios Bariamis
207c3a0c20
Fix RoutingMethodType logic (#33919)
...
Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2026-02-06 14:03:34 -08:00
Sumanth R Hegde
ae2e93f89b
[Fix] Fix logprobs=0 handling for /inference/v1/generate endpoint (#34010)
...
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
2026-02-06 20:33:40 +00:00
xuebwang-amd
9e9acce577
[Bugfix] Fix no attribute error of SharedFusedMoE (DeepSeek-V3.1 as test model) (#33993)
...
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
2026-02-06 19:11:32 +00:00
Charlie Fu
fe5438200b
[Rocm][Bugfix] Fix dtype not same for gemm_a4w4 op (#33734)
...
Signed-off-by: charlifu <charlifu@amd.com>
2026-02-06 19:09:59 +00:00
Wentao Ye
77c09e1130
[Refactor] Remove align block size logic in moe_permute (#33449)
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-02-06 10:57:06 -08:00
zhrrr
16786da735
[Model Runner V2] support apply penalty for spec decode (#33251)
...
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
2026-02-06 10:56:48 -08:00
vllmellm
aaa2efbe98
[DOC] [ROCm] Update docker deployment doc (#33971)
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-02-06 10:05:35 -08:00
Seiji Eicher
aca5967416
[KV Connector] Add missing method overrides to MultiConnector (#33292)
...
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
2026-02-06 12:58:21 -05:00
Wentao Ye
67a746e87f
[Log] Optimize duplicate startup log (#33944)
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-02-06 17:49:56 +00:00
Chauncey
7bec435130
[Bugfix] Fix the issue where tool calling does not work when using fast detokenization with dsv32 (#33964)
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2026-02-06 09:23:44 -08:00
Eldar Kurtić
5c52644b10
[Docs] Update link to Benchmark CLI documentation (#33254)
...
Signed-off-by: Eldar Kurtić <8884008+eldarkurtic@users.noreply.github.com>
2026-02-06 16:00:59 +00:00
zofia
2ce9fe4ad0
[XPU][5/N] add wna16 xpu kernel (#33973)
...
Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>
2026-02-06 15:59:53 +00:00
Cyrus Leung
cd8b405bd0
[Refactor] Consolidate sequence normalization and enc-dec parsing (#33928)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-06 15:43:47 +00:00
tc-mb
4707f7ebb4
[Model] Support MiniCPM-o 4.5 (#33431)
...
Signed-off-by: caitianchi <caitianchi@modelbest.cn>
Signed-off-by: tc-mb <caitianchi@modelbest.cn>
Co-authored-by: mslv <mslv@baai.ac.cn>
2026-02-06 15:29:10 +00:00
Michael Goin
c39ee9ee2b
[Docs] Add sections on process architecture and minimum CPU resources (#33940)
...
It seems users can be confused about vLLM's performance when running
with very few CPU cores available. We are missing a clear overview of
vLLM's process architecture, so I added one, along with some diagrams,
in arch_overview.md, and included a section on CPU resource
recommendations in optimization.md.
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-02-06 15:26:43 +00:00
Andreas Karatzas
350ca72c04
[ROCm][AITER] Fix AITER import regression for explicit backend selection (#33749)
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-06 15:08:16 +00:00
FredericOdermatt
1fb0495a72
[FIX] guidance: use max(vocab_size, len(tokenizer)) for n_vocab (#33509)
...
Signed-off-by: Frederic Odermatt <frederic.odermatt@44ai.ch>
2026-02-06 14:23:03 +00:00
Raushan Turganbay
85ee1d962b
[Bugfix] Fix models and tests for transformers v5 (#33977)
...
Signed-off-by: raushan <raushan@huggingface.co>
Signed-off-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-02-06 21:47:41 +08:00
Harry Mellor
51a7bda625
Update WeightTransferConfig to be more standard like the others (#33989)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-02-06 13:15:00 +00:00
SorenDreano
6e7b1c4b59
[Docs] Improve documentation (#33799)
...
Co-authored-by: Soren Dreano <soren@numind.ai>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2026-02-06 12:57:09 +00:00
Kurt Shuster
2991dd3d22
[Bugfix][Model] Support LoRA on Qwen3 Output Embedding (#29816)
...
Signed-off-by: kurt <kurt@thinkingmachines.ai>
2026-02-06 20:25:31 +08:00
Luka Govedič
ac32e66cf9
[torch.compile] Reorganize vllm/compilation and tests/compile (0/N for vLLM IR) (#33731)
...
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: ProExpertProg <luka.govedic@gmail.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2026-02-06 04:19:49 -08:00
Fadi Arafeh
f79d9dce16
[CPU][BugFix] Fix loading of w8a8int models with bias (#33582)
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2026-02-06 11:59:20 +00:00
Harry Mellor
ba5cbbf107
Bump HF Hub client to get bug fix (#33984)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-02-06 11:25:33 +00:00
zhang-prog
233b26ab35
[PaddleOCR-VL] Add BC for transformers 5.0 config (#33976)
...
Signed-off-by: zhangyue66 <zhangyue66@baidu.com>
2026-02-06 10:33:49 +00:00