Jaden Mathias
|
0e60c925cf
|
[Bugfix] Remove assert causing hipErrorStreamCaptureUnsupported (#34455)
Signed-off-by: Jaden Mathias <jaden.mathias@amd.com>
|
2026-02-18 18:54:54 +00:00 |
|
Teng Ma
|
d7ff22204a
|
[Misc] Add mooncake-transfer-engine to kv_connectors requirements (#34826)
Signed-off-by: Teng Ma <teng-ma@linux.alibaba.com>
|
2026-02-18 18:26:24 +00:00 |
|
Isotr0py
|
c0bd8b13da
|
[Bugfix] Redo Qwen3.5/Qwen3-Next GDN projector fusion (#34697)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com>
|
2026-02-18 09:46:53 -08:00 |
|
Michael Goin
|
caeb887bf6
|
[Bugfix] Fix NVFP4 TRTLLM MoE non-gated support; add gsm8k for Nemotron-3-Nano FP8+NVFP4 (#34725)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-02-18 09:39:22 -08:00 |
|
Ilya Markov
|
6b3166a7c7
|
[CI][Bugfix] Fix multinode test script (#34820)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
|
2026-02-18 11:45:10 -05:00 |
|
Robert Shaw
|
25e2e136ef
|
[CI] temporarily disable multi-node tests (#34825)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2026-02-18 11:32:44 -05:00 |
|
Robert Shaw
|
6874638bc4
|
[Model Bash] DeepSeek R1 BF16 Min Latency QKV A GEMM (0.5% E2E Speedup) (#34758)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2026-02-18 07:42:36 -08:00 |
|
Burkhard Ringlein
|
e24663c5a9
|
Add unit tests for fp8 output fusion of triton_attn (#34228)
Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-02-18 06:22:49 -05:00 |
|
Nick Hill
|
c50e105a88
|
[Model Runner V2] Avoid prepare prefill kernel launch overhead (#34780)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-02-18 00:49:21 -08:00 |
|
Cyrus Leung
|
a766b30349
|
[Renderer] Deprecate code paths for old input processing (#34775)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-18 00:35:04 -08:00 |
|
Asaf Joseph Gardin
|
1faa8cb73c
|
[Quantization] - Added uses_meta_device_weights to quant config (#34645)
Signed-off-by: Josephasafg <ajgard7@gmail.com>
|
2026-02-17 23:43:44 -08:00 |
|
Marek Michalowski
|
e89a91d927
|
[Bugfix] fix activation in cpu_fused_moe_torch call (#34696)
Signed-off-by: Marek Michalowski <marek.michalowski@arm.com>
|
2026-02-17 23:39:46 -08:00 |
|
Michael Goin
|
909b147197
|
[Bugfix] Fix prefix creation for Qwen3.5 (#34723)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-02-17 23:39:15 -08:00 |
|
ElizaWszola
|
a88b3be7c4
|
[Bugfix] Fix quant RMS norm fusion for quantization with TMA-aligned scales (#33255)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-02-17 23:35:04 -08:00 |
|
Nick Hill
|
a49ea5a58f
|
[Model Runner V2] A bit more PP simplification (#34766)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-02-17 21:39:07 -08:00 |
|
Cyrus Leung
|
30ebe0dc3c
|
[CI/Build] Remove use of skip_v1 (#34699)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-18 12:19:11 +08:00 |
|
Andreas Karatzas
|
cef65f0715
|
[ROCm][CI] Removed hard-coded attn backend requirement for Qwen VL (#34753)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-02-18 03:59:53 +00:00 |
|
Russell Bryant
|
6f3b2047ab
|
[Core] Fix SSRF bypass via backslash-@ URL parsing inconsistency (#34743)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: isotr0py <2037008807@qq.com>
|
2026-02-18 03:53:35 +00:00 |
|
Luka Govedič
|
02e8f26cea
|
[torch.compile] Turn on silu+fp4 quant fusion by default for O1+ (#34718)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
|
2026-02-18 03:29:15 +00:00 |
|
Hongxia Yang
|
4a00a511bb
|
[BugFix] [Build] fix string literals comparison in indexer_k_quant_and_cache calling site (#34653)
Signed-off-by: Hongxia Yang <hongxiay.yang@amd.com>
Co-authored-by: Hongxia Yang <hongxiay.yang@amd.com>
|
2026-02-17 19:19:41 -08:00 |
|
Cyrus Leung
|
a0d8d944e2
|
[Renderer] Move MM Hash parsing into Renderer (#34711)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-17 19:18:55 -08:00 |
|
Amr Mahdi
|
df3f537a66
|
[CI] Remove unused precompiled wheel args from image build (#34767)
Signed-off-by: Amr Mahdi <amrmahdi@meta.com>
|
2026-02-17 18:58:18 -08:00 |
|
Matthew Bonanni
|
7743152957
|
[Attention] Refactor check_and_update_config (#33600)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-02-17 17:06:54 -08:00 |
|
Wentao Ye
|
ab33d2a629
|
[Feature] Decode Context Parallel support for GPU model runner v2 (#34179)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-02-17 16:27:15 -08:00 |
|
Woosuk Kwon
|
be3af2d29e
|
[Model Runner V2] Further simplification for PP (#34724)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-02-17 15:18:18 -08:00 |
|
Jongseok Park
|
c656ba3b4d
|
[Kernel] Triton-based Top-k and Top-p sampler kernels (#33538)
Signed-off-by: js_park <cakeng@naver.com>
Signed-off-by: Jongseok Park <37990712+cakeng@users.noreply.github.com>
Signed-off-by: Sunga Kim <sunga.kim@berkeley.edu>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Sunga Kim <sunga.kim@berkeley.edu>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
|
2026-02-17 23:14:30 +00:00 |
|
Matthew Bonanni
|
dc5fa77a4e
|
[Bugfix][MTP][Sparse MLA] Allow sparse MLA with MTP to run with FULL cudagraphs (#34457)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-02-17 14:01:27 -05:00 |
|
Flora Feng
|
1e4a084c8e
|
[CI] Fix flaky test_parsable_context (#34717)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-02-17 18:42:52 +00:00 |
|
Richard Zou
|
7967e854da
|
[BugFix] Fix sp tests (#34716)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2026-02-17 17:07:56 +00:00 |
|
almayne
|
6bd6d0c3c1
|
Fixed whisper CPU test that does not spawn properly. (#34324)
Signed-off-by: Anna Mayne <anna.mayne@arm.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-02-17 06:46:23 -08:00 |
|
Nicolò Lucchesi
|
8e962fef5f
|
[CI][Nixl] Add CrossLayer KV layout tests (#34615)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-02-17 21:35:40 +08:00 |
|
Cyrus Leung
|
574fe75245
|
[Renderer] Move InputPreprocessor into Renderer (2/2) (#34560)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-17 05:29:01 -08:00 |
|
junuxyz
|
c61a98f529
|
[CI][BugFix] ShellCheck cleanup to remove baseline and preserve runtime behavior (#34514)
Signed-off-by: junuxyz <216036880+junuxyz@users.noreply.github.com>
|
2026-02-17 12:22:56 +00:00 |
|
Harry Mellor
|
28bffe9466
|
Fix docs build warning (#34686)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-02-17 02:31:40 -08:00 |
|
ChenqianCao
|
ad65177a19
|
[Bugfix] Fix 'remove_instance_endpoint' method logic in disagg_proxy_demo (#32922)
Signed-off-by: ChenqianCao <39755070+ChenqianCao@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-02-17 10:06:53 +00:00 |
|
Tim Dettmers
|
d44a5b6c47
|
Remove dead bitsandbytes CxB code from 8-bit inference path (#34633)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-02-17 01:49:14 -08:00 |
|
Jiangyun Zhu
|
1d65283e95
|
Revert "[Models] Fuse Qwen3.5 GDN's qkvz_proj and ba_proj" (#34683)
|
2026-02-17 01:29:27 -08:00 |
|
kourosh hakhamaneshi
|
c464b57374
|
[Ray] Propagate third-party env vars to Ray workers via prefix matching (#34383)
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
|
2026-02-17 01:08:42 -08:00 |
|
Amr Mahdi
|
c5c38e152a
|
[CI] Fix bake config artifact path for AMI rebuild pipeline (#34656)
Signed-off-by: Amr Mahdi <amrmahdi@meta.com>
|
2026-02-17 06:39:44 +00:00 |
|
Woosuk Kwon
|
d00df624f3
|
[Model Runner V2] Minor refactoring for penalties (#34662)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-02-16 21:43:00 -08:00 |
|
Woosuk Kwon
|
9752da9d9c
|
[Model Runner V2] Minor simplification for BadWordsState (#34669)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-02-16 21:27:24 -08:00 |
|
Woosuk Kwon
|
04925b2202
|
[Model Runner V2] Minor cleanup for PP (#34666)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-02-16 19:15:31 -08:00 |
|
Woosuk Kwon
|
d74278fb67
|
[Model Runner V2] Fix unintended CPU-GPU sync in make_dummy (#34667)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-02-16 19:00:29 -08:00 |
|
haosdent
|
b68fd899d1
|
[Bugfix] Fix fused MoE int32 overflow in stride*offset without perf regression (#34507)
Signed-off-by: haosdent <haosdent@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2026-02-16 17:58:49 -08:00 |
|
Aneesh Puttur
|
0b5f9b7204
|
[CI] Enable mypy import following for vllm/v1/kv_offload (#34639)
Signed-off-by: Aneesh Puttur <aneeshputtur@gmail.com>
|
2026-02-17 09:58:15 +08:00 |
|
zhanqiuhu
|
9a8853f781
|
[Core] Pipeline Parallel support for Model Runner V2 (#33960)
Signed-off-by: Zhanqiu Hu <zh338@cornell.edu>
|
2026-02-16 17:48:16 -08:00 |
|
zhrrr
|
387a1898d9
|
[Model Runner V2] support bad_words sampling param (#33433)
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Co-authored-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-02-16 16:36:06 -08:00 |
|
roikoren755
|
3b30e61507
|
[NemotronH] Do not force router to run in fp32 (#34582)
Signed-off-by: Roi Koren <roik@nvidia.com>
|
2026-02-16 10:15:32 -08:00 |
|
Alexei-V-Ivanov-AMD
|
824f9e8f3c
|
Targeting the MI355 agent pool with all existing tests (#34629)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
|
2026-02-16 17:02:27 +00:00 |
|
Nicolò Lucchesi
|
6cc403e67d
|
[Bugfix][CI] Fix flaky entrypoints/openai/test_response_api_with_harmony.py::test_function_calling[openai/gpt-oss-20b] (#34624)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-02-16 16:11:07 +00:00 |
|