Harry Mellor
|
51a7bda625
|
Update WeightTransferConfig to be more standard like the others (#33989)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-02-06 13:15:00 +00:00 |
|
SorenDreano
|
6e7b1c4b59
|
[Docs] Improve documentation (#33799)
Co-authored-by: Soren Dreano <soren@numind.ai>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2026-02-06 12:57:09 +00:00 |
|
Kurt Shuster
|
2991dd3d22
|
[Bugfix][Model] Support LoRA on Qwen3 Output Embedding (#29816)
Signed-off-by: kurt <kurt@thinkingmachines.ai>
|
2026-02-06 20:25:31 +08:00 |
|
Luka Govedič
|
ac32e66cf9
|
[torch.compile] Reorganize vllm/compilation and tests/compile (0/N for vLLM IR) (#33731)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: ProExpertProg <luka.govedic@gmail.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-02-06 04:19:49 -08:00 |
|
Fadi Arafeh
|
f79d9dce16
|
[CPU][BugFix] Fix loading of w8a8int models with bias (#33582)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
|
2026-02-06 11:59:20 +00:00 |
|
Harry Mellor
|
ba5cbbf107
|
Bump HF Hub client to get bug fix (#33984)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-02-06 11:25:33 +00:00 |
|
zhang-prog
|
233b26ab35
|
[PaddleOCR-VL] Add BC for transformers 5.0 config (#33976)
Signed-off-by: zhangyue66 <zhangyue66@baidu.com>
|
2026-02-06 10:33:49 +00:00 |
|
Harry Mellor
|
791a94bed0
|
Consolidate and fix forbidden import pre-commit checks (#33982)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-02-06 01:47:41 -08:00 |
|
Xinyu Chen
|
e969a169ef
|
support view_from_cpu_tensor on XPU (#33868)
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>
|
2026-02-06 08:34:20 +00:00 |
|
Harry Mellor
|
6d8d34be6d
|
Fix main pre-commit (#33975)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-02-06 00:08:05 -08:00 |
|
Gassan Salama
|
1363e3d6d5
|
[cpu][performance] CPU Paged Attention NEON BFMMLA BF16 Implementation (#32263)
Signed-off-by: Gassan <gassan.salama@arm.com>
|
2026-02-06 15:01:48 +08:00 |
|
chengchengpei
|
965525667b
|
Onboard voyage-4-nano (#33720)
Signed-off-by: Chengcheng Pei <chengchengpei@outlook.com>
Signed-off-by: chengchengpei <5881383+chengchengpei@users.noreply.github.com>
Co-authored-by: chengchengpei <5881383+chengchengpei@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-02-06 06:23:34 +00:00 |
|
sihao_li
|
6550815c3a
|
[XPU]Replace pip in docker.xpu with uv pip (#31112)
Signed-off-by: sihao.li <sihao.li@intel.com>
|
2026-02-06 14:02:33 +08:00 |
|
Kunshang Ji
|
7439e4f41b
|
[XPU][4/N] add mxfp4 moe model support (#33679)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-02-06 13:03:59 +08:00 |
|
R3hankhan
|
ac04dd374f
|
[CPU] Add BF16 Kernel type for s390x (#33788)
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>
|
2026-02-06 04:57:02 +00:00 |
|
Cyrus Leung
|
035a6cb09a
|
[Misc] Update code for encoder-decoder models (#33900)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-06 11:38:39 +08:00 |
|
Mingliang Li
|
a32cb49b60
|
feat(frontend): early-fail tokenization guard for user requests (#31366)
Signed-off-by: limingliang <limingliang@stepfun.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: limingliang <limingliang@stepfun.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-05 19:38:02 -08:00 |
|
Rabi Mishra
|
20d7454c9b
|
fix(ROCm): Make flash_attn import optional in MLA attention (#33511)
Signed-off-by: rabi <ramishra@redhat.com>
|
2026-02-06 02:22:53 +00:00 |
|
Simon Mo
|
5819ca8944
|
[Docs] Add reo analytics (#33957)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2026-02-05 17:42:22 -08:00 |
|
Xin Yang
|
79028d4388
|
[Perf] Disable clean_logits in deepgemm fp8_mqa_logits kernel (#33568)
|
2026-02-05 20:34:00 -05:00 |
|
emricksini-h
|
325ab6b0a8
|
[Feature] OTEL tracing during loading (#31162)
|
2026-02-05 16:59:28 -08:00 |
|
Wei Zhao
|
91a07ff618
|
[Bugfix] Fix DeepSeek v3.2 tokenizer outputting None issue (#33832)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
|
2026-02-05 23:50:49 +00:00 |
|
Hashem Hashemi
|
d5c4800112
|
Adds padding and perf improvements to wvSplitK_fp8 (#33527)
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>
|
2026-02-05 22:16:02 +00:00 |
|
Lumosis
|
42d5d705f9
|
[Minor] Sort safetensors files to ensure deterministic loading order (#33491)
Signed-off-by: Lihao Ran <imlihao.ran@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2026-02-05 17:05:09 -05:00 |
|
Cyrus Leung
|
116880a5a0
|
[Bugfix] Make MM batching more robust (#33817)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-05 20:40:58 +00:00 |
|
Matthew Bonanni
|
4145e50d85
|
[Bugfix] Fix DSV3.2 NVFP4 (#33932)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-02-05 19:22:19 +00:00 |
|
Nicolò Lucchesi
|
20f5d185a6
|
[Misc] Rename translations to speech_to_text for OAI serving component (#33904)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-02-05 19:16:52 +00:00 |
|
Harry Mellor
|
1887acca9e
|
Fix tokenizer test for renamed attr on Transformers v5 (#33902)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-02-05 19:16:20 +00:00 |
|
Tsukasa OI
|
92e7562a99
|
[Bugfix] Suppress non-TTY color output on the process name part of the log (#29714)
Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>
|
2026-02-05 18:47:09 +00:00 |
|
Isotr0py
|
87d0d17ab5
|
[Models] Consolidate Deepseek-OCR2 processor (#33909)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-02-05 18:29:20 +00:00 |
|
bnellnm
|
a57c8228ff
|
[Moe Refactor] Make Inplace Flag for FusedMoEModularKernel part of the constructor (#33375)
Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-02-05 18:07:18 +00:00 |
|
zackyoray
|
1ee95841bd
|
[Bugfix] Fix swapped engine_ids in NIXL Llama 4 local attention path (#33795)
Signed-off-by: Yoray Zack <yorayz@nvidia.com>
|
2026-02-05 17:51:58 +00:00 |
|
Nicolò Lucchesi
|
7d8c6804e2
|
[Misc] Add debug logs (#33931)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-02-05 09:42:40 -08:00 |
|
Benjamin Chislett
|
af3162d3aa
|
[Spec Decode] Unified Parallel Drafting (#32887)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
|
2026-02-05 12:37:18 -05:00 |
|
danisereb
|
5b2a9422f0
|
[BugFix] Fix LoRA Fp8 (#33879)
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
|
2026-02-05 17:25:55 +00:00 |
|
Aaron Hao
|
c1858b7ec8
|
[Feat][RL][1/2] Native Weight Syncing API: NCCL (#31943)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: Aaron Hao <ahao@anyscale.com>
Co-authored-by: SumanthRH <sumanthrh99@gmail.com>
|
2026-02-05 12:13:23 -05:00 |
|
Mario Hong
|
82914d2ae8
|
[Bugfix] Fix step3p5 parser when using mtp (#33690)
Signed-off-by: mariohong <mariohong128@gmail.com>
|
2026-02-05 16:04:04 +00:00 |
|
Nicolò Lucchesi
|
81a90e5277
|
[Docs] Add bart-plugin to docs (#33905)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-02-05 12:20:25 +00:00 |
|
wang.yuqi
|
1c3a221d3b
|
[Bugfix] Fix corner case of sparse embedding (#33886)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-02-05 02:51:22 -08:00 |
|
Cyrus Leung
|
7bd42e609d
|
[Refactor] Clean up input preprocessing (#33687)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-05 18:43:42 +08:00 |
|
Isotr0py
|
a2522839d8
|
[Bugfix] Fix Kimi-K2.5 NVFP4 checkpoints weight loading (#33876)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-02-05 10:29:54 +00:00 |
|
jiahanc
|
59a5cb387a
|
[perf] Integrate flashinfer concat_mla_k (#31171)
|
2026-02-05 05:23:11 -05:00 |
|
liranschour
|
8322d4e47f
|
Enable Cross layers KV cache layout at NIXL Connector V2 (#33339)
Signed-off-by: Liran Schour <lirans@il.ibm.com>
Signed-off-by: liranschour <liranschour@users.noreply.github.com>
Co-authored-by: Or Ozeri <or@ozery.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2026-02-05 02:17:02 -08:00 |
|
Andreas Karatzas
|
3e472e81f9
|
[ROCm][Bugfix][CI] Fix hybrid models and their tests (Mamba/Jamba/Bamba) (#32710)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
|
2026-02-05 10:01:23 +00:00 |
|
Cyrus Leung
|
038914b7c8
|
[Refactor] Move task outside of PoolingParams.verify (#33796)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-02-05 09:33:11 +00:00 |
|
Pavani Majety
|
d2f4a71cd5
|
[Bugfix] Kimi-K2 grouped_topk usage for Flashinfer monolithic kernels. (#33858)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
|
2026-02-05 09:32:10 +00:00 |
|
Mark McLoughlin
|
2abd97592f
|
[KV Connector][Metrics] Do not count local prefix cache hits in connector queries (#30522)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2026-02-05 09:57:27 +02:00 |
|
Chauncey
|
6abb0454ad
|
[Perf] Optimize the performance of structured output + reasoning (#33557)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-02-05 15:45:29 +08:00 |
|
Li, Jiang
|
db6f71d4c9
|
[CI/Build] Fix CPU CI test case title (#33870)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2026-02-05 15:07:14 +08:00 |
|
Fadi Arafeh
|
fd03538bf9
|
[CPU][BugFix] Allow w8a8 oneDNN quantized matmul to support 3D inputs (#33727)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
|
2026-02-05 06:26:09 +00:00 |
|