Jee Jee Li
db4ede9743
[Model] Enable Step3p5ForCausalLM testing ( #33755 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2026-02-07 05:25:24 -08:00
Pooya Davoodi
2cb2340f7a
[Frontend]Add support for transcriptions and translations to run_batch ( #33934 )
...
Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io >
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2026-02-07 05:24:57 -08:00
Richard Zou
81fe69cae5
[torch.compile] Stop compiling identical artifacts ( #34003 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2026-02-07 05:24:48 -08:00
Cyrus Leung
edb359cce4
[Renderer] Define render_cmpl and render_chat ( #34039 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-07 05:24:40 -08:00
lukec
15a0b9e570
Fix spelling errors ( #33978 )
2026-02-06 23:58:50 -08:00
Cyrus Leung
48312e579a
[Misc] Make PlaceholderRange.get_num_embeds a method ( #34035 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-07 05:30:17 +00:00
Ikenna
906077181b
[Bugfix] Fix QK Norm+RoPE fusion pattern matching on B200+FP8 ( #33967 )
...
Signed-off-by: Ikenna <ikennachifo@gmail.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2026-02-07 02:27:33 +00:00
Aaron Hao
89a385d79f
[Feat][RL] Pause and Resume with keep requests for single engine ( #32351 )
...
Signed-off-by: ahao-anyscale <ahao@anyscale.com >
Signed-off-by: Aaron Hao <ahao@anyscale.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-02-07 00:08:58 +00:00
kourosh hakhamaneshi
4a2d00eafd
[bugfix] [ROCm] Fix premature CUDA initialization in platform detection ( #33941 )
...
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com >
2026-02-06 16:17:55 -06:00
Sumanth R Hegde
ae2e93f89b
[Fix] Fix logprobs=0 handling for /inference/v1/generate endpoint ( #34010 )
...
Signed-off-by: SumanthRH <sumanthrh99@gmail.com >
2026-02-06 20:33:40 +00:00
Wentao Ye
77c09e1130
[Refactor] Remove align block size logic in moe_permute ( #33449 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-02-06 10:57:06 -08:00
Seiji Eicher
aca5967416
[KV Connector] Add missing method overrides to MultiConnector ( #33292 )
...
Signed-off-by: Seiji Eicher <seiji@anyscale.com >
2026-02-06 12:58:21 -05:00
Cyrus Leung
cd8b405bd0
[Refactor] Consolidate sequence normalization and enc-dec parsing ( #33928 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-06 15:43:47 +00:00
Andreas Karatzas
350ca72c04
[ROCm][AITER] Fix AITER import regression for explicit backend selection ( #33749 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-06 15:08:16 +00:00
Raushan Turganbay
85ee1d962b
[Bugfix] Fix models and tests for transformers v5 ( #33977 )
...
Signed-off-by: raushan <raushan@huggingface.co >
Signed-off-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-06 21:47:41 +08:00
Kurt Shuster
2991dd3d22
[Bugfix][Model] Support LoRA on Qwen3 Output Embedding ( #29816 )
...
Signed-off-by: kurt <kurt@thinkingmachines.ai >
2026-02-06 20:25:31 +08:00
Luka Govedič
ac32e66cf9
[torch.compile] Reorganize vllm/compilation and tests/compile (0/N for vLLM IR) ( #33731 )
...
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
Signed-off-by: ProExpertProg <luka.govedic@gmail.com >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2026-02-06 04:19:49 -08:00
Xinyu Chen
e969a169ef
support view_from_cpu_tensor on XPU ( #33868 )
...
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com >
2026-02-06 08:34:20 +00:00
chengchengpei
965525667b
Onboard voyage-4-nano ( #33720 )
...
Signed-off-by: Chengcheng Pei <chengchengpei@outlook.com >
Signed-off-by: chengchengpei <5881383+chengchengpei@users.noreply.github.com >
Co-authored-by: chengchengpei <5881383+chengchengpei@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-06 06:23:34 +00:00
Mingliang Li
a32cb49b60
feat(frontend): early-fail tokenization guard for user requests ( #31366 )
...
Signed-off-by: limingliang <limingliang@stepfun.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: limingliang <limingliang@stepfun.com >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-05 19:38:02 -08:00
Xin Yang
79028d4388
[Perf] Disable clean_logits in deepgemm fp8_mqa_logits kernel ( #33568 )
2026-02-05 20:34:00 -05:00
emricksini-h
325ab6b0a8
[Feature] OTEL tracing during loading ( #31162 )
2026-02-05 16:59:28 -08:00
Hashem Hashemi
d5c4800112
Adds padding and perf improvements to wvSplitK_fp8 ( #33527 )
...
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com >
2026-02-05 22:16:02 +00:00
Cyrus Leung
116880a5a0
[Bugfix] Make MM batching more robust ( #33817 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-05 20:40:58 +00:00
Harry Mellor
1887acca9e
Fix tokenizer test for renamed attr on Transformers v5 ( #33902 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-05 19:16:20 +00:00
bnellnm
a57c8228ff
[Moe Refactor] Make Inplace Flag for FusedMoEModularKernel part of the constructor ( #33375 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-02-05 18:07:18 +00:00
Benjamin Chislett
af3162d3aa
[Spec Decode] Unified Parallel Drafting ( #32887 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
2026-02-05 12:37:18 -05:00
Aaron Hao
c1858b7ec8
[Feat][RL][1/2] Native Weight Syncing API: NCCL ( #31943 )
...
Signed-off-by: ahao-anyscale <ahao@anyscale.com >
Signed-off-by: Aaron Hao <ahao@anyscale.com >
Co-authored-by: SumanthRH <sumanthrh99@gmail.com >
2026-02-05 12:13:23 -05:00
Mario Hong
82914d2ae8
[Bugfix] Fix step3p5 parser when using mtp ( #33690 )
...
Signed-off-by: mariohong <mariohong128@gmail.com >
2026-02-05 16:04:04 +00:00
wang.yuqi
1c3a221d3b
[Bugfix] Fix corner case of sparse embedding ( #33886 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-02-05 02:51:22 -08:00
liranschour
8322d4e47f
Enable Cross layers KV cache layout at NIXL Connector V2 ( #33339 )
...
Signed-off-by: Liran Schour <lirans@il.ibm.com >
Signed-off-by: liranschour <liranschour@users.noreply.github.com >
Co-authored-by: Or Ozeri <or@ozery.com >
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2026-02-05 02:17:02 -08:00
Andreas Karatzas
3e472e81f9
[ROCm][Bugfix][CI] Fix hybrid models and their tests (Mamba/Jamba/Bamba) ( #32710 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com >
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com >
2026-02-05 10:01:23 +00:00
Cyrus Leung
038914b7c8
[Refactor] Move task outside of PoolingParams.verify ( #33796 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-02-05 09:33:11 +00:00
Mark McLoughlin
2abd97592f
[KV Connector][Metrics] Do not count local prefix cache hits in connector queries ( #30522 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2026-02-05 09:57:27 +02:00
Andreas Karatzas
1f70313e59
[Bugfix] Fix ScoreMultiModalParam multi-document scoring returning single result ( #33837 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-02-05 06:17:00 +00:00
rasmith
c1395f72cd
[CI][AMD][BugFix] Ensure VLLM_ROCM_USE_AITER is set so test_rocm_aiter_topk.py can run correctly ( #33840 )
...
Signed-off-by: Randall Smith <Randall.Smith@amd.com >
2026-02-05 05:05:48 +00:00
Nick Hill
add9f1fbd9
[Minor] Include StreamingInput in inputs package ( #33856 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-02-05 04:38:20 +00:00
Andreas Karatzas
fb1270f1f8
[CI][Bugfix]: return McpCall for built-in MCP tools in non-streaming mode ( #32762 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-05 11:14:06 +08:00
Luka Govedič
4d9513537d
[CI][torch.compile] Reduce e2e fusion test time ( #33293 )
...
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
Signed-off-by: ProExpertProg <luka.govedic@gmail.com >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2026-02-04 19:09:03 -05:00
Ilya Boytsov
439afa4eea
feat: Add ColBERT late interaction model support ( #33686 )
...
Signed-off-by: Ilya Boytsov <ilyaboytsov1805@gmail.com >
Signed-off-by: Ilya Boytsov <boytsovpanamera@mail.ru >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-02-05 08:05:13 +08:00
Nick Hill
fa4e0fb028
[Core] Don't schedule spec tokens with prefill chunks ( #33652 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-02-04 23:40:22 +00:00
Richard Zou
9f14c9224d
Revert "[torch.compile] Significantly speed up cold start times" ( #33820 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2026-02-04 21:59:59 +00:00
Simon Danielsson
4292c90a2a
[Bugfix] Support RotaryEmbedding CustomOp for gpt-oss ( #33800 )
...
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com >
2026-02-04 20:17:41 +00:00
kourosh hakhamaneshi
2f6d17cb2f
[rocm][ray] Fix: Unify Ray device visibility handling across CUDA and ROCm ( #33308 )
...
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com >
2026-02-04 10:09:14 -08:00
Isotr0py
192ad4648b
[Bugfix] Fix interns1-pro initialization and PP ( #33793 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-04 17:54:45 +00:00
Cyrus Leung
80f921ba4b
[Bugfix] Fix normalize still being passed to PoolerConfig ( #33794 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-04 23:56:02 +08:00
Or Ozeri
8e32690869
[KV Connector][BugFix] scheduler: Delay freeing blocks of aborted async loads ( #32255 )
...
Fixes a not-yet-reported case where it was possible for blocks to be
freed by an abort before an async transfer completed, resulting
in corrupted KV data.
Signed-off-by: Or Ozeri <oro@il.ibm.com >
2026-02-04 11:16:34 +00:00
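The race described in the commit message above (blocks freed by an abort while an async transfer is still writing into them) can be illustrated with a minimal sketch. This is not the vLLM scheduler code: `BlockPool`, `allocate_for_async_load`, `abort`, and `on_transfer_done` are hypothetical names chosen for illustration, and the sketch assumes the fix works by parking aborted blocks until the transfer reports completion.

```python
class BlockPool:
    """Hypothetical block pool that defers frees for in-flight async loads."""

    def __init__(self, num_blocks):
        self.free_blocks = set(range(num_blocks))
        self.in_flight = set()      # blocks an async load may still write into
        self.pending_free = set()   # blocks aborted while still in flight

    def allocate_for_async_load(self, n):
        if n > len(self.free_blocks):
            raise RuntimeError("not enough free blocks")
        blocks = {self.free_blocks.pop() for _ in range(n)}
        self.in_flight |= blocks
        return blocks

    def abort(self, blocks):
        for b in blocks:
            if b in self.in_flight:
                # Freeing now would let another request reuse a block that
                # the transfer is still writing to, so only mark it.
                self.pending_free.add(b)
            else:
                self.free_blocks.add(b)

    def on_transfer_done(self, blocks):
        for b in blocks:
            self.in_flight.discard(b)
            if b in self.pending_free:
                # The transfer has finished, so the deferred free is safe.
                self.pending_free.discard(b)
                self.free_blocks.add(b)


pool = BlockPool(4)
blocks = pool.allocate_for_async_load(2)
pool.abort(blocks)                 # request aborted mid-transfer
free_after_abort = len(pool.free_blocks)
pool.on_transfer_done(blocks)      # transfer completes later
free_after_done = len(pool.free_blocks)
```

In this sketch the aborted blocks return to the free set only after `on_transfer_done` runs, so no other request can be handed a block that is still being written.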
zhanqiuhu
4403e3ed4c
[Metrics] Add labeled prompt token metrics for P/D disaggregation ( #33290 )
...
Add labeled Prometheus metrics to distinguish where prompt tokens come
from in P/D disaggregated deployments.
In P/D disaggregation, decode instances receive KV cache from prefill instances.
Currently, decode reports inflated prompt throughput because it counts all
prompt tokens as "computed", even though most were transferred.
This PR adds labeled metrics so users can understand actual compute work vs
transferred work:
vllm:prompt_tokens_by_source_total{source="local_compute"} # Tokens prefilled locally
vllm:prompt_tokens_by_source_total{source="external_kv_transfer"} # Tokens received via KV transfer
vllm:prompt_tokens_by_source_total{source="local_cache_hit"} # Tokens from local prefix cache
vllm:prompt_tokens_cached_total # Total cached (local + external, -1 when all
Signed-off-by: Zhanqiu Hu <zh338@cornell.edu >
2026-02-04 07:46:48 +00:00
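The per-source accounting described in the commit message above can be sketched with the standard library alone. The three `source` label values are taken from the commit message; the class `PromptTokenCounters` and its methods are hypothetical stand-ins for a labeled Prometheus counter, not the vLLM metrics code.

```python
from collections import Counter

# Label values as listed in the commit message.
SOURCES = ("local_compute", "external_kv_transfer", "local_cache_hit")


class PromptTokenCounters:
    """Hypothetical stand-in for a counter labeled by prompt token source."""

    def __init__(self):
        self.by_source = Counter({source: 0 for source in SOURCES})

    def record(self, source, num_tokens):
        if source not in SOURCES:
            raise ValueError(f"unknown source: {source}")
        if num_tokens < 0:
            raise ValueError("counters only increase")
        self.by_source[source] += num_tokens

    def compute_fraction(self):
        """Share of prompt tokens that were actually prefilled locally."""
        total = sum(self.by_source.values())
        return self.by_source["local_compute"] / total if total else 0.0


# A decode instance that received most of a 1000-token prompt via KV transfer.
counters = PromptTokenCounters()
counters.record("external_kv_transfer", 900)
counters.record("local_compute", 100)
fraction = counters.compute_fraction()
```

Splitting the count by source is what lets a dashboard separate real prefill work from transferred tokens, instead of reporting all 1000 tokens as computed on the decode instance.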
Frank Wang
45f8fd6f97
[Feature] Enable TRITON_ATTN for Batch Invariance ( #33688 )
...
Signed-off-by: frankwang28 <frank.wbb@hotmail.com >
2026-02-04 13:27:34 +08:00
R3hankhan
4dffc5e044
[CPU] Split attention dispatch by head_dim alignment ( #32161 )
...
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com >
2026-02-03 19:37:15 -08:00