Harry Mellor
|
f83b933b84
|
[CI] Bump mypy version to 1.19.1 (#36104)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
v0.17.1rc0
|
2026-03-10 09:18:28 -07:00 |
|
Pleaplusone
|
82f3f30e26
|
[ROCm][Perf] Enable sparse_mla's cudagraph on ROCm platform (#35719)
Signed-off-by: ganyi <ygan@amd.com>
|
2026-03-10 09:14:35 -07:00 |
|
Matthew Bonanni
|
9095cbbfb6
|
[Bugfix][Sparse MLA] report indexer CG support properly (#36519)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-03-10 09:14:31 -07:00 |
|
Hashem Hashemi
|
721ae79f50
|
Improvements to wvSplitKrc skinny GEMM solution (#34304)
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>
|
2026-03-10 09:14:27 -07:00 |
|
AllenDou
|
aefc59f088
|
FunASR model bugfix (#36633)
Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com>
Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com>
|
2026-03-10 08:14:21 -07:00 |
|
Harry Mellor
|
d88f28da05
|
Fix hf_override_fn when it modifies model_type (#35200)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-10 15:03:18 +00:00 |
|
Srinivasoo7
|
106ff69c4e
|
feat(kv-offload): Strategy A — StoreReusedOffloadingManager gates CPU stores on reuse frequency (#35342)
Signed-off-by: srinivas_oo7 <Sriusa4414@gmail.com>
Signed-off-by: Sriusa4414@gmail.com
Signed-off-by: Srinivasoo7 <158864704+Srinivasoo7@users.noreply.github.com>
Co-authored-by: srinivas_oo7 <sklinkedin0120@gmail.com>
Co-authored-by: Srinivasoo7 <158864704+Srinivasoo7@users.noreply.github.com>
Co-authored-by: Or Ozeri <oro@il.ibm.com>
|
2026-03-10 14:43:40 +00:00 |
|
Jiangyun Zhu
|
ca5fb4bbd8
|
[Bugfix] Avoid merging empty-only partitions into splitting-op subgraphs (#36595)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2026-03-10 07:39:01 -07:00 |
|
Alvin Tang
|
cf88b23749
|
fix: check HTTP status in batch read_file to prevent silent failures (#36397)
Signed-off-by: gambletan <ethanchang32@gmail.com>
Co-authored-by: gambletan <ethanchang32@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-10 07:22:40 -07:00 |
|
wang.yuqi
|
a3189a08b0
|
[Model] Consolidate score logic by introduce score_type (#36479)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-03-10 13:32:25 +00:00 |
|
SoluMilken
|
409c4e632d
|
[Misc] fix typo: homogenous-> homogeneous (2 lines change) (#36508)
Signed-off-by: SoluMilken <ypiheyn.imm02g@g2.nctu.edu.tw>
|
2026-03-10 06:25:37 -07:00 |
|
Raushan Turganbay
|
8850738b70
|
[Bugfix] Fix processor signature (#36630)
Signed-off-by: raushan <raushan@huggingface.co>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-10 06:20:47 -07:00 |
|
Mark McLoughlin
|
234860399b
|
[Frontend][Core] Revert "Add shutdown timeout" (#34730 and #36270) (#36628)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2026-03-10 06:20:41 -07:00 |
|
Harry Mellor
|
c88510083b
|
Fix Qwen2.5-VL test for Transformers v5 (#36532)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-10 12:05:34 +00:00 |
|
Vadim Gimpelson
|
4ff8c3c8f9
|
[BUGFIX][Mamba][Qwen3.5] Zero freed SSM cache blocks on GPU (#35219)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2026-03-10 03:32:20 -07:00 |
|
Chang Su
|
507ddbe992
|
feat(grpc): extract gRPC servicer into smg-grpc-servicer package, add --grpc flag to vllm serve (#36169)
Signed-off-by: Chang Su <chang.s.su@oracle.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2026-03-10 03:29:59 -07:00 |
|
Nick Hill
|
ddbb0d230a
|
[Model Runner V2] Fix mm input embeddings lookup (#36588)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-03-10 00:24:58 -07:00 |
|
Nick Hill
|
9efc3bdcd6
|
[Model Runner V2] Fix _compute_slot_mappings_kernel for chunked prefill (#36580)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-03-10 00:23:42 -07:00 |
|
amirkl94
|
156e33553c
|
Fix: Re-Enable EP for trtllm MoE FP8 backend (#36494)
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>
|
2026-03-09 23:11:27 -07:00 |
|
hallerite
|
d0cd736caa
|
[Bugfix] Fix RuntimeError: Already borrowed that degrades VLM serving throughput under concurrent load. (#36557)
Signed-off-by: hallerite <hallerite@users.noreply.github.com>
Signed-off-by: hallerite <git@hallerite.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-03-09 22:30:51 -07:00 |
|
Harry Mellor
|
195c997203
|
Fix LFM2 MoE test for Transformers v5 (#36534)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-09 22:29:17 -07:00 |
|
Zhuohan Li
|
04b67d8f62
|
Remove unused disable_fallback field (#36546)
|
2026-03-09 20:56:54 -07:00 |
|
Wentao Ye
|
7279374f91
|
[Perf] Compute maxsim in worker side, reducing redundant copies, 2.7% E2E throughput improvement (#36159)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-09 20:55:58 -07:00 |
|
Woosuk Kwon
|
006aea17d7
|
[BugFix] Remove incorrect assert in split_decodes_and_prefills (#36553)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-03-09 20:02:02 -07:00 |
|
Hojin Yang
|
0836be3b03
|
[Model] Add HyperCLOVAX-SEED-Think-32B vision-language model support (#31471)
Signed-off-by: effortprogrammer <yhjhoward7@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-03-10 10:59:19 +08:00 |
|
Ajay Anubolu
|
4e95ec111c
|
[Bugfix] Fix Qwen3-Next in_proj_ba weight sharding with TP > 1 (#36242)
Signed-off-by: AjAnubolu <anuboluajay@gmail.com>
|
2026-03-09 19:16:26 -07:00 |
|
Andreas Karatzas
|
179547d62c
|
[ROCm][CI] Fix ROCm GPT-OSS Eval test group (#36179)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-09 17:55:20 -07:00 |
|
youkaichao
|
f85b4eda3a
|
[bugfix] fix nvlink for nixl/ucx (#36475)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2026-03-10 07:49:47 +08:00 |
|
Woosuk Kwon
|
2a194ddd72
|
[Model Runner V2] Add model_state inputs to CUDA graph capture (#36544)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-03-09 15:14:51 -07:00 |
|
Shaun Kotek
|
203a7f27da
|
add nemotron v3 reasoning parser (#36393)
Signed-off-by: Shaun Kotek - Nvidia <skotek@nvidia.com>
Co-authored-by: root <root@gpu-259.slurm-workers-slurm.slurm.svc.cluster.local>
|
2026-03-09 15:11:41 -07:00 |
|
Lucas Wilkinson
|
483463f735
|
[MRV2] Extensible CG dispatch rework (#35959)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-03-09 13:58:45 -07:00 |
|
Matthew Bonanni
|
4e571ce643
|
[MTP][Misc] Clean up dead code (#36507)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-03-09 14:43:06 -04:00 |
|
Micah Williamson
|
4ff9b045fe
|
[ROCm][CI] Prep Tests For Change To ROCM_ATTN As New Default Backend On ROCm (#36025)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2026-03-09 13:27:55 -05:00 |
|
Lucas Kabela
|
3fd03f1ec2
|
[BE] Rename should_torch_compile_mm_vit to should_torch_compile_mm_encoder (#36281)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
|
2026-03-09 18:22:05 +00:00 |
|
Woosuk Kwon
|
10a5f4d53d
|
[Model Runner V2] Use NamedTuple for execute_model_state (#35930)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-03-09 11:17:34 -07:00 |
|
Simon Mo
|
fe0c085c28
|
[Docs] Remove the reo beacon (#36528)
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
|
2026-03-09 11:16:50 -07:00 |
|
Taneem Ibrahim
|
8d6b3d5dda
|
[Misc] Refactored 5 duplicate helper functions that were copied-pasted across multiple parsers (#36436)
Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>
|
2026-03-09 14:14:11 -04:00 |
|
Copilot
|
4b87ffbefb
|
[torch.compile] Rename compile_ranges_split_points to compile_ranges_endpoints (#36027)
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-03-09 18:04:40 +00:00 |
|
Shaun Kotek
|
fa028207aa
|
Fix/resupport nongated fused moe triton (#36412)
Signed-off-by: Shaun Kotek - Nvidia <skotek@nvidia.com>
Signed-off-by: Natan Bagrov <nbagrov@nvidia.com>
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Signed-off-by: liweiguang <codingpunk@gmail.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: Alex Brooks <albrooks@redhat.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: cong-or <conchubhar.gannon@gmail.com>
Signed-off-by: Tushar Shetty <tushar.shetty@abbyy.com>
Signed-off-by: Tushar Shetty <54362365+tusharshetty61@users.noreply.github.com>
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
Signed-off-by: Xin Yang <xyangx@amazon.com>
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: nvnbagrov <nbagrov@nvidia.com>
Co-authored-by: Sage <80211083+sagearc@users.noreply.github.com>
Co-authored-by: danisereb <daserebrenik@nvidia.com>
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Weiguang Li <codingpunk@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by: Alex Brooks <albrooks@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: cong-or <conchubhar.gannon@gmail.com>
Co-authored-by: Tushar Shetty <54362365+tusharshetty61@users.noreply.github.com>
Co-authored-by: liuzhenwei <zhenwei.liu@intel.com>
Co-authored-by: Xin Yang <105740670+xyang16@users.noreply.github.com>
Co-authored-by: Kevin H. Luu <khluu000@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-09 11:01:18 -07:00 |
|
Russell Bryant
|
d460a18fc6
|
[Docs] Expand --allowed-media-domains security guidance with threat details (#36506)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2026-03-09 17:43:42 +00:00 |
|
Woosuk Kwon
|
6e956d9eca
|
[Model Runner V2] Add dummy profile_cudagraph_memory API (#36520)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-03-09 10:20:13 -07:00 |
|
Andreas Karatzas
|
1e0f917b34
|
[ROCm][CI] Fix logprob divergence for TitanML/tiny-mixtral under AITER rms_norm (#36101)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-09 12:07:44 -05:00 |
|
Andreas Karatzas
|
c174d54f86
|
[ROCm][CI] Fix ROCm attention backend validation for head sizes, block sizes, and compute capability checks (#36292)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-09 12:02:41 -05:00 |
|
SoluMilken
|
55d27cca55
|
[Misc] fix typo: dependant -> dependent (2 lines change) (#36511)
Signed-off-by: SoluMilken <ypiheyn.imm02g@g2.nctu.edu.tw>
|
2026-03-09 10:00:12 -07:00 |
|
Roberto L. Castro
|
580864d81e
|
[Attention][Perf][Kernel] Replace torch.cat with vectorized CUDA kernel MLA query concat - DeepSeek-V3.2 (#34917)
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
|
2026-03-09 09:50:36 -07:00 |
|
Roberto L. Castro
|
2b28b9b269
|
[Attention][Perf] Optimize cp_gather_and_upconvert_fp8_kv_cache - DeepSeek-v3.2 (#35290)
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
|
2026-03-09 09:46:57 -07:00 |
|
Taoyu Zhu
|
70485a11bd
|
[ROCM] Optimize the fused_topk_bias to use aiter instead of fallback torch ops. (#36253)
Signed-off-by: zhutaoyu <zhutaoyu97@gmail.com>
|
2026-03-09 11:30:35 -05:00 |
|
Harry Mellor
|
74a9f54cdb
|
[CI] Fix edge case that could lead to broken docs builds on main (#36515)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-09 09:06:19 -07:00 |
|
Matthew Bonanni
|
00c4cb5606
|
[Bugfix] Clear stale CG keys after memory profiling (#36416)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-03-09 11:56:00 -04:00 |
|
Wentao Ye
|
941e52c298
|
[Refactor] Simplify chat_completion_full_generator for tool parsers (#35634)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-09 23:33:46 +08:00 |
|