Nicolò Lucchesi
|
7337ff7f03
|
[Docs] PD with Nixl compat matrix (#38628)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-03-31 15:01:21 +00:00 |
|
Kyle Sayers
|
5869f69c5f
|
[Online Quant] [QeRL] Minor code cleanup (#38574)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2026-03-31 14:56:43 +00:00 |
|
wliao2
|
4dfad17ed1
|
replace cuda_device_count_stateless() to current_platform.device_count() (#37841)
Signed-off-by: Liao, Wei <wei.liao@intel.com>
Signed-off-by: wliao2 <wei.liao@intel.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-31 22:32:54 +08:00 |
|
wenjun liu
|
e8057c00bc
|
[CI] Avoid concurrent docker pull in intel XPU CI runners to prevent rate limit issues (#38594)
Signed-off-by: wendyliu235 <wenjun.liu@intel.com>
|
2026-03-31 22:23:18 +08:00 |
|
Nicolò Lucchesi
|
7430389669
|
[Bugfix][CI] Skip flaky test_eagle test (#38566)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-03-31 09:42:37 -04:00 |
|
ElizaWszola
|
202f147cf2
|
Fix MLA runs when use_inductor_graph_partition=True (#38631)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
|
2026-03-31 13:37:43 +00:00 |
|
Jiangyun Zhu
|
ea7bfde6e4
|
[CI] fix LM Eval Qwen3.5 Models (B200) (#38632)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2026-03-31 13:20:08 +00:00 |
|
sihao_li
|
d71a15041f
|
[XPU]move testing dependencies from Dockerfile to xpu-test.in (#38596)
Signed-off-by: sihao.li <sihao.li@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-31 12:49:43 +00:00 |
|
Ilya Markov
|
abdbb68386
|
[EPLB] Add alternative communication for EPLB weight exchange (#33176)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Markov Ilya <markovilya19@gmail.com>
Co-authored-by: Markov Ilya <markovilya19@gmail.com>
|
2026-03-31 08:17:12 -04:00 |
|
liuzhenwei
|
0c63739135
|
[EPD] update EPD script arguments (#36742)
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
|
2026-03-31 12:02:09 +00:00 |
|
wang.yuqi
|
719735d6c5
|
[CI Failure] pin colmodernvbert revision (#38612)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-03-31 10:54:54 +00:00 |
|
Maosheng Liao
|
aae3e688f8
|
Fix document of torchrun_example.py (#31113)
|
2026-03-31 10:54:23 +00:00 |
|
Matthew Bonanni
|
7d65463528
|
[WIP][CI][Bugfix] Fix test_run_eagle_dp (#38584)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-03-31 12:30:25 +02:00 |
|
Mateusz Sokół
|
8278825b57
|
DOC: TPU mention fix (#38129)
Signed-off-by: Mateusz Sokół <mat646@gmail.com>
|
2026-03-31 03:27:56 -07:00 |
|
Chang Su
|
acf7292bf2
|
[Misc] Move --grpc CLI argument into make_arg_parser (#38570)
Signed-off-by: Chang Su <chang.s.su@oracle.com>
|
2026-03-31 03:24:05 -07:00 |
|
Chauncey
|
ce884756f0
|
[Feature]: add presence_penalty and frequency_penalty fields to Responses API (#38613)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-03-31 08:45:57 +00:00 |
|
wang.yuqi
|
d9d21eb8e3
|
[Frontend][3/n] Improve pooling entrypoints | scoring. (#28631)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-03-31 07:52:00 +00:00 |
|
Yintong Lu
|
f09daea261
|
[CPU] Support int8 compute mode in CPU AWQ (#35697)
Signed-off-by: Yintong Lu <yintong.lu@intel.com>
|
2026-03-31 15:27:37 +08:00 |
|
Kevin H. Luu
|
42318c840b
|
[ci] Remove benchmarks job (#38611)
|
2026-03-31 06:46:21 +00:00 |
|
zhangyiming
|
1ac6694297
|
[OOT] Add OOT support for linear kernel. (#37989)
Signed-off-by: menogrey <1299267905@qq.com>
|
2026-03-31 14:33:21 +08:00 |
|
Kfir Toledo
|
6cc7abdc66
|
[kv_offload+HMA] Fix num_blocks with different per-layer page sizes and improve assert message (#38554)
Signed-off-by: Kfir Toledo <kfir.toledo@ibm.com>
Co-authored-by: Or Ozeri <oro@il.ibm.com>
|
2026-03-31 06:00:40 +00:00 |
|
Flora Feng
|
d53cb9cb8e
|
[Tool Parser][2/3] Use self.tools instead of request.tools in tool parsers (#38189)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-03-31 13:41:36 +08:00 |
|
Louie Tsai
|
44eef0ca1e
|
vLLM Benchmark Suite perf regression after PR#32723 (#38576)
Signed-off-by: louie-tsai <louie.tsai@intel.com>
|
2026-03-31 05:23:17 +00:00 |
|
Andreas Karatzas
|
b9cdc85207
|
[ROCm][CI] Fix Whisper translation test attention backend selection (#38508)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-31 13:21:49 +08:00 |
|
Flora Feng
|
3e802e8786
|
[Mypy] Fix adjust_request typing (#38264)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-03-31 04:21:18 +00:00 |
|
Martin Hickey
|
350af48e14
|
[KVConnector] Remove redundant method KVConnectorOutput::merge() (#38546)
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
|
2026-03-31 07:11:02 +03:00 |
|
Lucas Kabela
|
e31915063d
|
[Bugfix] Fix for builtins (forward fix of pytorch/177558) (#37234)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
|
2026-03-31 01:08:11 +00:00 |
|
Flora Feng
|
29e48707e8
|
[Refactor] Consolidate Tool type alias in tool_parsers/utils.py (#38265)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-03-31 00:55:51 +00:00 |
|
sungsoo ha
|
4ac227222f
|
[Bugfix][DCP] Fix CUDA graph capture for Decode Context Parallelism (#36070)
Signed-off-by: Sungsoo Ha <sungsooh@nvidia.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-30 20:20:43 -04:00 |
|
Vadim Gimpelson
|
bb51d5b40d
|
Add @vadiklyutiy as committer (#38589)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2026-03-31 07:50:04 +08:00 |
|
Prathmesh Bhatt
|
93b3ec1585
|
feat(attention): extract KV-cache update from FlashAttentionDiffKV ba… (#36466)
Signed-off-by: Prathmesh Bhatt <71340361+Prathmesh234@users.noreply.github.com>
|
2026-03-30 23:16:09 +00:00 |
|
Netanel Haber
|
e812bf70bd
|
Restore non-hf processor path for Nano-Nemotron-VL (bypass call_hf_processor_mm_only) - fixes #38018 (#38567)
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
|
2026-03-30 21:56:52 +00:00 |
|
SandishKumarHN
|
bcc6f67447
|
[Bugfix] Use null block (0) for padded block table entries (#35431)
Signed-off-by: SandishKumarHN <sandish@fb.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-03-30 14:02:51 -07:00 |
|
Asaf Gardin
|
1fc69f59bb
|
[Bug fix][Quantization] Fix dummy weight loading (#38478)
Signed-off-by: Josephasafg <ajgard7@gmail.com>
|
2026-03-30 16:38:02 -04:00 |
|
Micah Williamson
|
d9c7db18da
|
[ROCm][CI] Pin test_hybrid test to TRITON_ATTN on ROCm (#38381)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2026-03-30 20:26:46 +00:00 |
|
Ilya Markov
|
12701e8af2
|
[EPLB] Optimize eplb mapping and record in router for prefill (#36261)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
|
2026-03-30 19:48:33 +00:00 |
|
Benjamin Chislett
|
494636b29d
|
[Feat][Spec Decode] DFlash (#36847)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
|
2026-03-30 15:03:15 -04:00 |
|
mikaylagawarecki
|
ab1a6a43fa
|
[3/n] Migrate cutlass/scaled_mm_entry.cu torch stable ABI (#37221)
Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
|
2026-03-30 11:20:13 -07:00 |
|
fangyuchu
|
b5e608258e
|
[Refactor] Unify engine process monitoring in engine manager and add Ray backend support (#35862)
Signed-off-by: fangyuchu <fangyuchu@qq.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
|
2026-03-30 10:16:09 -07:00 |
|
Matthew Bonanni
|
2c734ed0e0
|
[Bugfix][MLA] Change default SM100 MLA prefill backend back to TRT-LLM (#38562)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-03-30 09:51:24 -07:00 |
|
Chendi.Xue
|
3b1dbaad4e
|
[HMA]Fix corner case when hybrid page_size can not be evenly divided issue (blk_size=64,tp=4) (#37467)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2026-03-30 16:47:30 +00:00 |
|
Johnny
|
b4a2f3ac36
|
[NVIDIA] Bugfix NVFP4 DGX Spark and RTX50 (#38423)
Signed-off-by: johnnynunez <johnnynuca14@gmail.com>
Signed-off-by: Johnny <johnnynuca14@gmail.com>
|
2026-03-30 09:36:18 -07:00 |
|
roikoren755
|
8e6293e838
|
[Mamba] Add stochastic rounding support (#35753)
Signed-off-by: Roi Koren <roik@nvidia.com>
|
2026-03-30 12:33:49 -04:00 |
|
Hongxia Yang
|
dbdd9ae067
|
[ROCm][Bugfix] fix exception related to trust_remote_code for MiniMax-M2.1-MXFP4 (#37698)
Signed-off-by: Hongxia Yang <hongxiay.yang@amd.com>
Co-authored-by: Hongxia Yang <hongxiay.yang@amd.com>
|
2026-03-30 15:49:23 +00:00 |
|
Matthias Gehre
|
e8b055a5ac
|
[Bugfix] Handle ParallelLMHead in compressed-tensors get_quant_method (#37291)
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2026-03-30 07:30:52 -07:00 |
|
tomeras91
|
246dc7d864
|
[Misc] Add @tomeras91 as a maintainer of Nemotron related code + mamba block (#38547)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
|
2026-03-30 21:12:17 +08:00 |
|
Thomas Parnell
|
7c3f88b2a8
|
[Bugfix] Remove false-positive format mismatch warnings in FLA ops (#38255)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2026-03-30 12:32:26 +00:00 |
|
Li, Jiang
|
6557f4937f
|
[Bugfix][CPU] Skip set_num_threads after thread binding (#38535)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2026-03-30 20:13:00 +08:00 |
|
Andreas Karatzas
|
677424c7ac
|
[Core][CI] Add opt-in media URL caching via VLLM_MEDIA_CACHE (#37123)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-30 04:58:53 -07:00 |
|
Collin McCarthy
|
1031c84c36
|
Fix ambiguous num_blocks for hybrid attn mamba (#37236)
Signed-off-by: Collin McCarthy <cmccarthy@nvidia.com>
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
|
2026-03-30 11:09:45 +00:00 |
|