Matthew Bonanni
|
4145e50d85
|
[Bugfix] Fix DSV3.2 NVFP4 (#33932)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-02-05 19:22:19 +00:00 |
|
Nicolò Lucchesi
|
20f5d185a6
|
[Misc] Rename translations to speech_to_text for OAI serving component (#33904)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-02-05 19:16:52 +00:00 |
|
Harry Mellor
|
1887acca9e
|
Fix tokenizer test for renamed attr on Transformers v5 (#33902)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-02-05 19:16:20 +00:00 |
|
Tsukasa OI
|
92e7562a99
|
[Bugfix] Suppress non-TTY color output on the process name part of the log (#29714)
Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>
|
2026-02-05 18:47:09 +00:00 |
|
Isotr0py
|
87d0d17ab5
|
[Models] Consolidate Deepseek-OCR2 processor (#33909)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-02-05 18:29:20 +00:00 |
|
bnellnm
|
a57c8228ff
|
[Moe Refactor] Make Inplace Flag for FusedMoEModularKernel part of the constructor (#33375)
Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-02-05 18:07:18 +00:00 |
|
zackyoray
|
1ee95841bd
|
[Bugfix] Fix swapped engine_ids in NIXL Llama 4 local attention path (#33795)
Signed-off-by: Yoray Zack <yorayz@nvidia.com>
|
2026-02-05 17:51:58 +00:00 |
|
Nicolò Lucchesi
|
7d8c6804e2
|
[Misc] Add debug logs (#33931)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-02-05 09:42:40 -08:00 |
|
Benjamin Chislett
|
af3162d3aa
|
[Spec Decode] Unified Parallel Drafting (#32887)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
|
2026-02-05 12:37:18 -05:00 |
|
danisereb
|
5b2a9422f0
|
[BugFix] Fix LoRA Fp8 (#33879)
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
|
2026-02-05 17:25:55 +00:00 |
|
Aaron Hao
|
c1858b7ec8
|
[Feat][RL][1/2] Native Weight Syncing API: NCCL (#31943)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: Aaron Hao <ahao@anyscale.com>
Co-authored-by: SumanthRH <sumanthrh99@gmail.com>
|
2026-02-05 12:13:23 -05:00 |
|
Mario Hong
|
82914d2ae8
|
[Bugfix] Fix step3p5 parser when using mtp (#33690)
Signed-off-by: mariohong <mariohong128@gmail.com>
|
2026-02-05 16:04:04 +00:00 |
|
Nicolò Lucchesi
|
81a90e5277
|
[Docs] Add bart-plugin to docs (#33905)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-02-05 12:20:25 +00:00 |
|
wang.yuqi
|
1c3a221d3b
|
[Bugfix] Fix corner case of sparse embedding (#33886)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-02-05 02:51:22 -08:00 |
|
Cyrus Leung
|
7bd42e609d
|
[Refactor] Clean up input preprocessing (#33687)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-05 18:43:42 +08:00 |
|
Isotr0py
|
a2522839d8
|
[Bugfix] Fix Kimi-K2.5 NVFP4 checkpoints weight loading (#33876)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-02-05 10:29:54 +00:00 |
|
jiahanc
|
59a5cb387a
|
[perf] Integrate flashinfer concat_mla_k (#31171)
|
2026-02-05 05:23:11 -05:00 |
|
liranschour
|
8322d4e47f
|
Enable Cross layers KV cache layout at NIXL Connector V2 (#33339)
Signed-off-by: Liran Schour <lirans@il.ibm.com>
Signed-off-by: liranschour <liranschour@users.noreply.github.com>
Co-authored-by: Or Ozeri <or@ozery.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2026-02-05 02:17:02 -08:00 |
|
Andreas Karatzas
|
3e472e81f9
|
[ROCm][Bugfix][CI] Fix hybrid models and their tests (Mamba/Jamba/Bamba) (#32710)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
|
2026-02-05 10:01:23 +00:00 |
|
Cyrus Leung
|
038914b7c8
|
[Refactor] Move task outside of PoolingParams.verify (#33796)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-02-05 09:33:11 +00:00 |
|
Pavani Majety
|
d2f4a71cd5
|
[Bugfix] Kimi-K2 grouped_topk usage for Flashinfer monolithic kernels. (#33858)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
|
2026-02-05 09:32:10 +00:00 |
|
Mark McLoughlin
|
2abd97592f
|
[KV Connector][Metrics] Do not count local prefix cache hits in connector queries (#30522)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2026-02-05 09:57:27 +02:00 |
|
Chauncey
|
6abb0454ad
|
[Perf] Optimize the performance of structured output + reasoning (#33557)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-02-05 15:45:29 +08:00 |
|
Li, Jiang
|
db6f71d4c9
|
[CI/Build] Fix CPU CI test case title (#33870)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2026-02-05 15:07:14 +08:00 |
|
Fadi Arafeh
|
fd03538bf9
|
[CPU][BugFix] Allow w8a8 oneDNN quantized matmul to support 3D inputs (#33727)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
|
2026-02-05 06:26:09 +00:00 |
|
Andreas Karatzas
|
1f70313e59
|
[Bugfix] Fix ScoreMultiModalParam multi-document scoring returning single result (#33837)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-02-05 06:17:00 +00:00 |
|
Li, Jiang
|
07daee132b
|
[CI/Build] Parallelize CPU CI tests (#33778)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2026-02-05 13:53:48 +08:00 |
|
Andrew Xia
|
9595afda18
|
[2/N] move responses/serving _make_response_output_items logic to parser (#33281)
Signed-off-by: Andrew Xia <axia@fb.com>
Signed-off-by: Andrew Xia <axia@meta.com>
Co-authored-by: Andrew Xia <axia@fb.com>
|
2026-02-05 13:46:15 +08:00 |
|
rasmith
|
c1395f72cd
|
[CI][AMD][BugFix] Ensure VLLM_ROCM_USE_AITER is set so test_rocm_aiter_topk.py can run correctly (#33840)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
|
2026-02-05 05:05:48 +00:00 |
|
rinbaro
|
007b183d74
|
[docs] fix unintentional misspellings (#33863)
Signed-off-by: rinbaro <ilgomishra@gmail.com>
|
2026-02-04 20:50:59 -08:00 |
|
Nick Hill
|
add9f1fbd9
|
[Minor] Include StreamingInput in inputs package (#33856)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-02-05 04:38:20 +00:00 |
|
Luka Govedič
|
e3bf79ffa0
|
Revert "[Attention][FA3] Update FA3 to include new swizzle optimization" (#33841)
|
2026-02-04 19:54:27 -08:00 |
|
Andreas Karatzas
|
fb1270f1f8
|
[CI][Bugfix]: return McpCall for built-in MCP tools in non-streaming mode (#32762)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-02-05 11:14:06 +08:00 |
|
Kevin H. Luu
|
72bb24e2db
|
[release] Minor fixes to release annotation (#33849)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
|
2026-02-05 02:07:35 +00:00 |
|
Chauncey
|
a7be77beef
|
[Bugfix] fix DeepSeek R1 with CUTLASS MLA Broken on B200 (#33637)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-02-05 01:28:36 +00:00 |
|
zhanqiuhu
|
bbe0574d8e
|
[Bugfix] Disable TRTLLM attention when KV transfer is enabled (#33192)
Signed-off-by: Zhanqiu Hu <zh338@cornell.edu>
Tag: v0.15.2rc0
|
2026-02-05 00:49:18 +00:00 |
|
Luka Govedič
|
4d9513537d
|
[CI][torch.compile] Reduce e2e fusion test time (#33293)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: ProExpertProg <luka.govedic@gmail.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-02-04 19:09:03 -05:00 |
|
Ilya Boytsov
|
439afa4eea
|
feat: Add ColBERT late interaction model support (#33686)
Signed-off-by: Ilya Boytsov <ilyaboytsov1805@gmail.com>
Signed-off-by: Ilya Boytsov <boytsovpanamera@mail.ru>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-02-05 08:05:13 +08:00 |
|
Nick Hill
|
fa4e0fb028
|
[Core] Don't schedule spec tokens with prefill chunks (#33652)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-02-04 23:40:22 +00:00 |
|
Sage Moore
|
ce498a6d61
|
Change the type signature of MixtureOfExperts.expert_weights to MutableSequence[Sequence[Tensor]] (#33573)
Signed-off-by: Sage Moore <sagmoore@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-02-04 17:02:46 -05:00 |
|
Richard Zou
|
9f14c9224d
|
Revert "[torch.compile] Significantly speed up cold start times" (#33820)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2026-02-04 21:59:59 +00:00 |
|
Muhammad Hashmi
|
535de06cb1
|
[Model] Add transcription support for Qwen3-Omni (#29828)
Signed-off-by: Muhammad Hashmi <mhashmi@berkeley.edu>
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: NickLucche <nlucches@redhat.com>
|
2026-02-04 21:17:47 +00:00 |
|
Simon Danielsson
|
4292c90a2a
|
[Bugfix] Support RotaryEmbedding CustomOp for gpt-oss (#33800)
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
|
2026-02-04 20:17:41 +00:00 |
|
Taeksang Kim
|
6e98f6d8b6
|
Implement zero-copy GQA for multimodal and CPU (#33732)
Signed-off-by: Taeksang Kim <ts.kim@hyperaccel.ai>
|
2026-02-04 20:11:39 +00:00 |
|
kourosh hakhamaneshi
|
2f6d17cb2f
|
[rocm][ray] Fix: Unify Ray device visibility handling across CUDA and ROCm (#33308)
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
|
2026-02-04 10:09:14 -08:00 |
|
Isotr0py
|
192ad4648b
|
[Bugfix] Fix interns1-pro initialization and PP (#33793)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-02-04 17:54:45 +00:00 |
|
Lucas Wilkinson
|
0e92298622
|
[Misc] Delay deprecation of CommonAttentionMetadata properties (#33801)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-02-04 08:41:57 -08:00 |
|
jiangkuaixue123
|
87d9a26166
|
[Bugfix] Fix ubatch wrapper num_tokens calculate (#33694)
Signed-off-by: jiangkuaixue123 <jiangxiaozhou111@163.com>
|
2026-02-04 16:41:45 +00:00 |
|
Cyrus Leung
|
80f921ba4b
|
[Bugfix] Fix normalize still being passed to PoolerConfig (#33794)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-04 23:56:02 +08:00 |
|
Wentao Ye
|
711edaf0d0
|
[Perf] Optimize spec decoding + async scheduling, 1.5% Throughput improvement (#33612)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2026-02-04 09:34:32 -05:00 |
|