Harry Mellor
f83b933b84
[CI] Bump mypy version to 1.19.1 ( #36104 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-10 09:18:28 -07:00
Pleaplusone
82f3f30e26
[ROCm][Perf] Enable sparse_mla's cudagraph on ROCm platform ( #35719 )
...
Signed-off-by: ganyi <ygan@amd.com >
2026-03-10 09:14:35 -07:00
Matthew Bonanni
9095cbbfb6
[Bugfix][Sparse MLA] report indexer CG support properly ( #36519 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-03-10 09:14:31 -07:00
Hashem Hashemi
721ae79f50
Improvements to wvSplitKrc skinny GEMM solution ( #34304 )
...
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com >
2026-03-10 09:14:27 -07:00
AllenDou
aefc59f088
FunASR model bugfix ( #36633 )
...
Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com >
Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com >
2026-03-10 08:14:21 -07:00
Harry Mellor
d88f28da05
Fix hf_override_fn when it modifies model_type ( #35200 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-10 15:03:18 +00:00
Srinivasoo7
106ff69c4e
feat(kv-offload): Strategy A — StoreReusedOffloadingManager gates CPU stores on reuse frequency ( #35342 )
...
Signed-off-by: srinivas_oo7 <Sriusa4414@gmail.com >
Signed-off-by: Sriusa4414@gmail.com
Signed-off-by: Srinivasoo7 <158864704+Srinivasoo7@users.noreply.github.com >
Co-authored-by: srinivas_oo7 <sklinkedin0120@gmail.com >
Co-authored-by: Srinivasoo7 <158864704+Srinivasoo7@users.noreply.github.com >
Co-authored-by: Or Ozeri <oro@il.ibm.com >
2026-03-10 14:43:40 +00:00
Jiangyun Zhu
ca5fb4bbd8
[Bugfix] Avoid merging empty-only partitions into splitting-op subgraphs ( #36595 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2026-03-10 07:39:01 -07:00
Alvin Tang
cf88b23749
fix: check HTTP status in batch read_file to prevent silent failures ( #36397 )
...
Signed-off-by: gambletan <ethanchang32@gmail.com >
Co-authored-by: gambletan <ethanchang32@gmail.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
2026-03-10 07:22:40 -07:00
wang.yuqi
a3189a08b0
[Model] Consolidate score logic by introduce score_type ( #36479 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-03-10 13:32:25 +00:00
SoluMilken
409c4e632d
[Misc] fix typo: homogenous-> homogeneous (2 lines change) ( #36508 )
...
Signed-off-by: SoluMilken <ypiheyn.imm02g@g2.nctu.edu.tw >
2026-03-10 06:25:37 -07:00
Raushan Turganbay
8850738b70
[Bugfix] Fix processor signature ( #36630 )
...
Signed-off-by: raushan <raushan@huggingface.co >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-10 06:20:47 -07:00
Mark McLoughlin
234860399b
[Frontend][Core] Revert "Add shutdown timeout" ( #34730 and #36270 ) ( #36628 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2026-03-10 06:20:41 -07:00
Harry Mellor
c88510083b
Fix Qwen2.5-VL test for Transformers v5 ( #36532 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-10 12:05:34 +00:00
Vadim Gimpelson
4ff8c3c8f9
[BUGFIX][Mamba][Qwen3.5] Zero freed SSM cache blocks on GPU ( #35219 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2026-03-10 03:32:20 -07:00
Chang Su
507ddbe992
feat(grpc): extract gRPC servicer into smg-grpc-servicer package, add --grpc flag to vllm serve ( #36169 )
...
Signed-off-by: Chang Su <chang.s.su@oracle.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2026-03-10 03:29:59 -07:00
Nick Hill
ddbb0d230a
[Model Runner V2] Fix mm input embeddings lookup ( #36588 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-03-10 00:24:58 -07:00
Nick Hill
9efc3bdcd6
[Model Runner V2] Fix _compute_slot_mappings_kernel for chunked prefill ( #36580 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-03-10 00:23:42 -07:00
amirkl94
156e33553c
Fix: Re-Enable EP for trtllm MoE FP8 backend ( #36494 )
...
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com >
2026-03-09 23:11:27 -07:00
hallerite
d0cd736caa
[Bugfix] Fix RuntimeError: Already borrowed that degrades VLM serving throughput under concurrent load. ( #36557 )
...
Signed-off-by: hallerite <hallerite@users.noreply.github.com >
Signed-off-by: hallerite <git@hallerite.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-03-09 22:30:51 -07:00
Harry Mellor
195c997203
Fix LFM2 MoE test for Transformers v5 ( #36534 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-09 22:29:17 -07:00
Zhuohan Li
04b67d8f62
Remove unused disable_fallback field ( #36546 )
2026-03-09 20:56:54 -07:00
Wentao Ye
7279374f91
[Perf] Compute maxsim in worker side, reducing redundant copies, 2.7% E2E throughput improvement ( #36159 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-03-09 20:55:58 -07:00
Woosuk Kwon
006aea17d7
[BugFix] Remove incorrect assert in split_decodes_and_prefills ( #36553 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-03-09 20:02:02 -07:00
Hojin Yang
0836be3b03
[Model] Add HyperCLOVAX-SEED-Think-32B vision-language model support ( #31471 )
...
Signed-off-by: effortprogrammer <yhjhoward7@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-03-10 10:59:19 +08:00
Ajay Anubolu
4e95ec111c
[Bugfix] Fix Qwen3-Next in_proj_ba weight sharding with TP > 1 ( #36242 )
...
Signed-off-by: AjAnubolu <anuboluajay@gmail.com >
2026-03-09 19:16:26 -07:00
Andreas Karatzas
179547d62c
[ROCm][CI] Fix ROCm GPT-OSS Eval test group ( #36179 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-09 17:55:20 -07:00
youkaichao
f85b4eda3a
[bugfix] fix nvlink for nixl/ucx ( #36475 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2026-03-10 07:49:47 +08:00
Woosuk Kwon
2a194ddd72
[Model Runner V2] Add model_state inputs to CUDA graph capture ( #36544 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-03-09 15:14:51 -07:00
Shaun Kotek
203a7f27da
add nemotron v3 reasoning parser ( #36393 )
...
Signed-off-by: Shaun Kotek - Nvidia <skotek@nvidia.com >
Co-authored-by: root <root@gpu-259.slurm-workers-slurm.slurm.svc.cluster.local >
2026-03-09 15:11:41 -07:00
Lucas Wilkinson
483463f735
[MRV2] Extensible CG dispatch rework ( #35959 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2026-03-09 13:58:45 -07:00
Matthew Bonanni
4e571ce643
[MTP][Misc] Clean up dead code ( #36507 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-03-09 14:43:06 -04:00
Micah Williamson
4ff9b045fe
[ROCm][CI] Prep Tests For Change To ROCM_ATTN As New Default Backend On ROCm ( #36025 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2026-03-09 13:27:55 -05:00
Lucas Kabela
3fd03f1ec2
[BE] Rename should_torch_compile_mm_vit to should_torch_compile_mm_encoder ( #36281 )
...
Signed-off-by: Lucas Kabela <lucaskabela@meta.com >
2026-03-09 18:22:05 +00:00
Woosuk Kwon
10a5f4d53d
[Model Runner V2] Use NamedTuple for execute_model_state ( #35930 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-03-09 11:17:34 -07:00
Simon Mo
fe0c085c28
[Docs] Remove the reo beacon ( #36528 )
...
Co-authored-by: Cursor Agent <cursoragent@cursor.com >
2026-03-09 11:16:50 -07:00
Taneem Ibrahim
8d6b3d5dda
[Misc] Refactored 5 duplicate helper functions that were copied-pasted across multiple parsers ( #36436 )
...
Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com >
2026-03-09 14:14:11 -04:00
Copilot
4b87ffbefb
[torch.compile] Rename compile_ranges_split_points to compile_ranges_endpoints ( #36027 )
...
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com >
Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2026-03-09 18:04:40 +00:00
Shaun Kotek
fa028207aa
Fix/resupport nongated fused moe triton ( #36412 )
...
Signed-off-by: Shaun Kotek - Nvidia <skotek@nvidia.com >
Signed-off-by: Natan Bagrov <nbagrov@nvidia.com >
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com >
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Signed-off-by: liweiguang <codingpunk@gmail.com >
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Signed-off-by: wang.yuqi <noooop@126.com >
Signed-off-by: Alex Brooks <albrooks@redhat.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: cong-or <conchubhar.gannon@gmail.com >
Signed-off-by: Tushar Shetty <tushar.shetty@abbyy.com >
Signed-off-by: Tushar Shetty <54362365+tusharshetty61@users.noreply.github.com >
Signed-off-by: jiang1.li <jiang1.li@intel.com >
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com >
Signed-off-by: Xin Yang <xyangx@amazon.com >
Signed-off-by: Kevin H. Luu <khluu000@gmail.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: nvnbagrov <nbagrov@nvidia.com >
Co-authored-by: Sage <80211083+sagearc@users.noreply.github.com >
Co-authored-by: danisereb <daserebrenik@nvidia.com >
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com >
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com >
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Weiguang Li <codingpunk@gmail.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
Co-authored-by: Li, Jiang <jiang1.li@intel.com >
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io >
Co-authored-by: Alex Brooks <albrooks@redhat.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
Co-authored-by: cong-or <conchubhar.gannon@gmail.com >
Co-authored-by: Tushar Shetty <54362365+tusharshetty61@users.noreply.github.com >
Co-authored-by: liuzhenwei <zhenwei.liu@intel.com >
Co-authored-by: Xin Yang <105740670+xyang16@users.noreply.github.com >
Co-authored-by: Kevin H. Luu <khluu000@gmail.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-03-09 11:01:18 -07:00
Russell Bryant
d460a18fc6
[Docs] Expand --allowed-media-domains security guidance with threat details ( #36506 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2026-03-09 17:43:42 +00:00
Woosuk Kwon
6e956d9eca
[Model Runner V2] Add dummy profile_cudagraph_memory API ( #36520 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-03-09 10:20:13 -07:00
Andreas Karatzas
1e0f917b34
[ROCm][CI] Fix logprob divergence for TitanML/tiny-mixtral under AITER rms_norm ( #36101 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-09 12:07:44 -05:00
Andreas Karatzas
c174d54f86
[ROCm][CI] Fix ROCm attention backend validation for head sizes, block sizes, and compute capability checks ( #36292 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-09 12:02:41 -05:00
SoluMilken
55d27cca55
[Misc] fix typo: dependant -> dependent (2 lines change) ( #36511 )
...
Signed-off-by: SoluMilken <ypiheyn.imm02g@g2.nctu.edu.tw >
2026-03-09 10:00:12 -07:00
Roberto L. Castro
580864d81e
[Attention][Perf][Kernel] Replace torch.cat with vectorized CUDA kernel MLA query concat - DeepSeek-V3.2 ( #34917 )
...
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com >
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com >
2026-03-09 09:50:36 -07:00
Roberto L. Castro
2b28b9b269
[Attention][Perf] Optimize cp_gather_and_upconvert_fp8_kv_cache - DeepSeek-v3.2 ( #35290 )
...
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com >
Co-authored-by: Claude <noreply@anthropic.com >
2026-03-09 09:46:57 -07:00
Taoyu Zhu
70485a11bd
[ROCM] Optimize the fused_topk_bias to use aiter instead of fallback torch ops. ( #36253 )
...
Signed-off-by: zhutaoyu <zhutaoyu97@gmail.com >
2026-03-09 11:30:35 -05:00
Harry Mellor
74a9f54cdb
[CI] Fix edge case that could lead to broken docs builds on main ( #36515 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-09 09:06:19 -07:00
Matthew Bonanni
00c4cb5606
[Bugfix] Clear stale CG keys after memory profiling ( #36416 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-03-09 11:56:00 -04:00
Wentao Ye
941e52c298
[Refactor] Simplify chat_completion_full_generator for tool parsers ( #35634 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-03-09 23:33:46 +08:00
Wentao Ye
be292b7c14
[Bug] Fix pooling model benchmark script ( #36300 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-03-09 11:17:45 -04:00
Matthew Bonanni
77a73458e3
Reapply [Attention] Refactor check_and_update_config ( #35122 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-03-09 07:17:14 -07:00
Tianyu Guo
5578f2a4d3
Support online use_audio_in_video ( #36319 )
...
Signed-off-by: Tianyu Guo <guoty9@mail2.sysu.edu.cn >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-03-09 07:16:44 -07:00
Cyrus Leung
3ec2115015
[Frontend] Move warmup into Renderer ( #36482 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-03-09 06:03:21 -07:00
Isotr0py
b0906d8b02
[MM Encoder] Default to use TORCH_SDPA backend for ViT on Volta/Turing GPU ( #36472 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-03-09 03:43:44 -07:00
Kevin H. Luu
aaf5fa9abf
[ci] Bound openai dependency to 2.24.0 ( #36471 )
...
Signed-off-by: Kevin H. Luu <khluu000@gmail.com >
2026-03-09 03:43:26 -07:00
Cyrus Leung
f96c3ab08c
[Deprecation][1/2] Remove items deprecated in v0.18 ( #36470 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-03-09 03:43:23 -07:00
Xin Yang
dc6b578466
[Kernel] Add fused_sigmoid_gating_delta_rule_update kernel for Qwen3 Next ( #35777 )
...
Signed-off-by: Xin Yang <xyangx@amazon.com >
2026-03-08 23:41:01 -07:00
liuzhenwei
1bc9c77f6d
[XPU] Add test script of PD disaggregation ( #36434 )
...
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com >
2026-03-09 05:50:27 +00:00
Alex Brooks
65a4da1504
[Frontend] Add Support for MM Encoder/Decoder Beam Search (Online Transcriptions) ( #36160 )
...
Signed-off-by: Alex Brooks <albrooks@redhat.com >
2026-03-09 05:46:23 +00:00
Li, Jiang
217f27598d
[Bugfix] Avoid to replace non-tensor members in cpu model runner ( #36430 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2026-03-09 13:06:28 +08:00
wang.yuqi
fff3711a24
[Frontend][2/n] Improve pooling entrypoints | embed. ( #36110 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Signed-off-by: wang.yuqi <noooop@126.com >
2026-03-09 11:42:19 +08:00
Tushar Shetty
c4d859c274
[Bugfix] Skip out-of-stage layers in get_layers_from_vllm_config for pipeline parallel ( #36243 )
...
Signed-off-by: Tushar Shetty <tushar.shetty@abbyy.com >
Signed-off-by: Tushar Shetty <54362365+tusharshetty61@users.noreply.github.com >
2026-03-08 20:40:16 -07:00
cong-or
747431044d
feat(attention): extract KV-cache update from FlexAttention backend ( #36263 )
...
Signed-off-by: cong-or <conchubhar.gannon@gmail.com >
2026-03-08 20:40:12 -07:00
Cyrus Leung
d62856b928
[Misc] Move processors to transformers_utils ( #35953 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-03-09 11:31:39 +08:00
Alex Brooks
bd2659a566
Increase Flexibility for OOV Multimodal Token Handling ( #34858 )
...
Signed-off-by: Alex Brooks <albrooks@redhat.com >
2026-03-08 20:30:49 -07:00
Shaun Kotek
90512b2e8b
fix: Use iterator as not to store all the file loads in memory at once ( #36149 )
...
Signed-off-by: Shaun Kotek - Nvidia <skotek@nvidia.com >
2026-03-08 20:25:21 -07:00
wang.yuqi
dcf8862fd4
[Examples][1/n] Resettle basic examples. ( #35579 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Signed-off-by: wang.yuqi <noooop@126.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-08 20:22:53 -07:00
Weiguang Li
43aa389231
[Bugfix] Fix CPU OMP autobind assertion to use local_world_size ( #35815 )
...
Signed-off-by: liweiguang <codingpunk@gmail.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
Co-authored-by: Li, Jiang <jiang1.li@intel.com >
2026-03-08 20:07:29 -07:00
Wentao Ye
384425f84e
[Dependency] Remove default ray dependency ( #36170 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-08 20:06:22 -07:00
Harry Mellor
a0f44bb616
Allow markdownlint to run locally ( #36398 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-08 20:05:24 -07:00
Kunshang Ji
fde4771bbd
[XPU][Doc] update xpu document about triton dependency/conflict issue. ( #36301 )
...
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com >
2026-03-09 02:09:22 +00:00
Jiangyun Zhu
e5ff140216
[cudagraph] fix cudagraph warning in deepseekv32 ( #28044 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2026-03-08 20:27:41 -04:00
danisereb
0a6a3a1290
Add support for ModelOpt MXFP8 MoE models ( #35986 )
...
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com >
2026-03-08 13:00:05 -07:00
Sage
4497431df6
[Frontend] Add GPU-less render serving path (vllm launch render) ( #36166 )
2026-03-08 16:35:09 +01:00
nvnbagrov
b7332b058c
[Model] Nano Nemotron VL - fast media preprocessing ( #35657 )
...
Signed-off-by: Natan Bagrov <nbagrov@nvidia.com >
2026-03-08 03:04:05 -07:00
Andreas Karatzas
40077ea3de
[CI] fix flaky empty responses and add diagnostic assertions in vision chat tests ( #36341 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-08 14:42:24 +08:00
Samuel Shen
5d6aae4577
[LMCache MP Patch]: Race Condition + Duplicated Block Ids ( #35831 )
2026-03-07 13:52:48 -08:00
Roy Huang
63298ee173
[Bugfix][LMCache][KVConnector] fix potential memory leak in LMCache multiprocess mode ( #35931 )
2026-03-07 13:52:35 -08:00
Richard Zou
2dde535df1
[compile] Split compile/warmup monitoring ( #36098 )
2026-03-07 13:52:11 -08:00
Wei Zhao
379689d533
[Perf] Support FP8 KV cache for Flashinfer MLA Sparse ( #35891 )
2026-03-07 13:51:54 -08:00
PatchyTIS
a6be75dbd2
[Core] NGram GPU Implementation compatible with Async Scheduler ( #29184 )
2026-03-07 13:51:37 -08:00
Micah Williamson
ee54f9cdb9
[ROCm][CI] Accept Different But Valid Output for test_olmoe_tp ( #35224 )
2026-03-07 13:50:52 -08:00
Micah Williamson
fc4657756f
[ROCm][CI] Enable AITER for failing test_gpt_oss test case on MI355 ( #36174 )
2026-03-07 13:50:17 -08:00
qli88
eebd14651f
[CI] Enable Crosslayer KV layout tests for ROCm platforms ( #35416 )
2026-03-07 13:49:56 -08:00
Matthew Bonanni
ebb9cc5f2b
[UX][Startup] Account for CUDA graphs during memory profiling ( #30515 )
2026-03-07 13:49:23 -08:00
rahul-sarvam
85f50eb41f
Adding support to Sarvam's MoE models ( #33942 )
...
Signed-off-by: rahul-sarvam <140298821+rahul-sarvam@users.noreply.github.com >
2026-03-08 01:16:24 +08:00
Taneem Ibrahim
5261223c2d
[Misc] Remove duplicate parser registration ( #36303 )
...
Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com >
2026-03-07 09:37:01 -05:00
lif
00b814ba5a
[V0 Deprecation] Remove unused swap_space parameter ( #36216 )
...
Signed-off-by: majiayu000 <1835304752@qq.com >
Co-authored-by: mcelrath
2026-03-07 22:09:55 +08:00
vllmellm
ee8a29511f
[Bugfix] Fix compressed-tensors quantization failure for DeepSeek-R1 on MI300x ( #36247 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2026-03-07 09:26:59 +00:00
milesial
755356b3d1
feat: expose media_io_kwargs at runtime ( #34778 )
...
Signed-off-by: Alexandre Milesi <milesial@users.noreply.github.com >
2026-03-07 04:27:04 +00:00
Andreas Karatzas
58928475e4
[ROCm][CI] Making entrypoints more deterministic on ROCm ( #36293 )
2026-03-06 19:04:40 -08:00
Mengtao (Martin) Yuan
1a9718085c
Fix CUDA graph decode capture crash in AITER FlashAttention ( #36042 )
...
Signed-off-by: Martin Yuan <myuan@meta.com >
Co-authored-by: Martin Yuan <myuan@meta.com >
2026-03-06 18:12:07 -08:00
Kunshang Ji
7eb524e64c
refine vllm bench throughput --backend hf ( #35971 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2026-03-07 02:10:33 +00:00
Nick Hill
c7f32e08c2
[BugFix] Avoid ignored trust_remote_code warnings ( #36290 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-03-07 01:24:18 +00:00
Nick Hill
b354686524
[Model Runner V2] Fix warmup for pipeline parallel ( #36280 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-03-06 16:58:51 -08:00
Nick Hill
6a18d8789b
[Core] Fix benign error log during normal shutdown ( #36270 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
Co-authored-by: Mark McLoughlin <markmc@redhat.com >
2026-03-07 00:39:21 +00:00
Itay Alroy
24a03915f5
mla: don't update kv cache on dummy forwards ( #36282 )
...
Signed-off-by: Itay Alroy <ialroy@nvidia.com >
2026-03-07 00:36:00 +00:00
Andreas Karatzas
b5e34e1fca
[ROCm][CI] Fixing yaml file for external amd-ci signal ( #36284 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-06 18:30:39 -06:00
Copilot
ce8546a12b
[docs][torch.compile] Add fusions.md — kernel/operator fusion reference page ( #35538 )
...
Signed-off-by: ProExpertProg <luka.govedic@gmail.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com >
Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com >
Co-authored-by: ProExpertProg <luka.govedic@gmail.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2026-03-06 23:55:06 +00:00
Chuan (Richard) Li
c188749bcd
[ROCm] Support MLA with nhead<16 and FP8 KV cache for TP=8 (Kimi K2.5/Linear) ( #35850 )
...
Signed-off-by: Li <chuali@amd.com >
2026-03-06 20:24:03 +00:00
Alexei-V-Ivanov-AMD
225d1090a0
Enabling some B200-specific tests on MI355 ( #35253 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
Signed-off-by: Alexei-V-Ivanov-AMD <156011006+Alexei-V-Ivanov-AMD@users.noreply.github.com >
2026-03-06 19:27:20 +00:00
eellison
f3c6c9c9d7
[CustomOp] CustomOp FusedRMSNormGated ( #35877 )
...
Signed-off-by: Elias Ellison <elias.ellison@gmail.com >
Signed-off-by: eellison <elias.ellison@gmail.com >
2026-03-06 10:53:37 -08:00
Nick Hill
26bd43b52d
Revert "[BugFix] Fix engine hanging after KV cache initialization fai… ( #36262 )
2026-03-06 08:28:09 -08:00
Travis Johnson
6b625a8807
[Bugfix] Quickfix followups to busy loop removal in #28053 ( #36068 )
...
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com >
Signed-off-by: Nick Hill <nickhill123@gmail.com >
Co-authored-by: Nick Hill <nickhill123@gmail.com >
2026-03-06 08:13:05 -08:00
Richard Zou
54756b6109
[compile] Stop unconditionally patching constrain_to_fx_strides ( #36152 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2026-03-06 10:17:27 -05:00
Raphaël Rialland
39f9ea0da4
[Bugfix] Fix cudagraph_mode:FULL dispatch (This does not impact FULL_AND_PIECEWISE (default)) ( #36165 )
2026-03-06 09:15:31 -05:00
Isotr0py
e4ae148a78
[Refactor] Modular video loader backend refactoring ( #35202 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-03-06 06:06:59 -08:00
Isotr0py
1d0c0d209c
[Misc] Lazy import registered processors ( #36024 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Roger Wang <hey@rogerw.io >
2026-03-06 06:06:45 -08:00
Chenguang Zheng
fcb73f306c
[bugfix] add api process rank in default multimodal request ( #36150 )
...
Signed-off-by: fake0fan <645327136@qq.com >
Signed-off-by: Chenguang ZHENG <645327136@qq.com >
2026-03-06 12:00:09 +00:00
Harry Mellor
e2090bf3af
[CI] Fix startup error test ( #36230 )
...
A change in engine startup error messages in #35478 caused this test failure.
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-06 11:50:28 +00:00
Andreas Karatzas
2a00d3241f
[CI][MM] Gate vision encoder attention mask to MiniCPM only, fixing Aria regression ( #36206 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-06 01:17:08 -08:00
Alex Brooks
10f4db4dbe
[Frontend] Add Support for MM Encoder/Decoder Beam Search (Offline) ( #36153 )
...
Signed-off-by: Alex Brooks <albrooks@redhat.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-06 01:16:56 -08:00
Nicolò Lucchesi
5b3ba94ab4
[Core][KVConnector] Support HMA+NixlConnector ( #35758 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-03-06 08:51:21 +01:00
zhanqiuhu
90f3c01fa4
[Spec Decode][KV Connector] Fix KV transfer in PD + speculative decoding ( #35158 )
...
Signed-off-by: Claude <noreply@anthropic.com >
Signed-off-by: Zhanqiu Hu <zh338@cornell.edu >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2026-03-06 08:50:44 +01:00
Andreas Karatzas
807d680337
[ROCm][CI] Fix tool use test stability - disable skinny GEMM, prefix caching, eliminate batch variance ( #35553 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-06 15:15:12 +08:00
Tyler Michael Smith
5afb387bd4
Change "following fields were present in the request but ignored" log from warn to debug ( #36173 )
...
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
2026-03-05 22:15:46 -08:00
Walter Beller-Morales
43e77e59ab
[BugFix] avoid infinite loop with VLLM_PORT and get_open_ports_list ( #36191 )
...
Signed-off-by: walterbm <walter.beller.morales@gmail.com >
2026-03-05 22:15:29 -08:00
Russell Bryant
00bd08edee
[Security] Respect user trust_remote_code setting in NemotronVL and KimiK25 ( #36192 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2026-03-05 22:15:19 -08:00
Ajay Anubolu
43f10573c9
[Bugfix] Fix misleading context length error messages ( #36197 )
...
Signed-off-by: AjAnubolu <anuboluajay@gmail.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
2026-03-05 22:15:12 -08:00
Yongye Zhu
86e1060b17
[Bugfix] Fix inner_dp_world initialization order for multi-node TP ( #35892 )
...
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com >
Signed-off-by: Nick Hill <nickhill123@gmail.com >
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com >
Co-authored-by: Nick Hill <nickhill123@gmail.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2026-03-05 22:04:44 -08:00
Mark McLoughlin
27066d1b2b
[Frontend][Core] Add shutdown timeout - allowing in-flight requests to finish ( #34730 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com >
2026-03-05 22:04:31 -08:00
cong-or
57c84ff129
perf: add __slots__ to KVCacheBlock ( #36164 )
...
Signed-off-by: cong-or <conchubhar.gannon@gmail.com >
2026-03-05 22:04:09 -08:00
Xiang Shi
e68de8adc0
docs: fix wrong cc in int8.md ( #36209 )
...
Signed-off-by: Xiang Shi <realkevin@tutanota.com >
2026-03-06 06:01:02 +00:00
Andreas Karatzas
a1ffa56a1e
[CI] Fix bge-m3 similarity reference values after *Defination* typo fix ( #36208 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-06 05:07:29 +00:00
Shiyan Deng
0a208d1f54
[BugFix] Fix engine hanging after KV cache initialization failure ( #35478 )
...
Signed-off-by: Shiyan Deng <dsy842974287@meta.com >
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com >
2026-03-05 20:58:09 -08:00
Shiyan Deng
03a49bb8f0
[Feature] Add --distributed-timeout-seconds CLI option ( #36047 )
...
Signed-off-by: Shiyan Deng <dsy842974287@meta.com >
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com >
2026-03-05 20:57:51 -08:00
Shiyan Deng
8e87cc57f1
[Bug] Fix a corner case in _process_simple_streaming_events ( #34754 )
...
Signed-off-by: Shiyan Deng <dsy842974287@meta.com >
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com >
2026-03-05 20:57:32 -08:00
Cyrus Leung
6dd302653f
[Misc] Rename group_mm_kwargs_by_modality -> group_and_batch_mm_kwargs ( #36158 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-03-06 12:32:48 +08:00
Cyrus Leung
de00ebeac4
[Bugfix] Fix simple Mistral-Small example ( #36156 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-03-05 20:25:11 -08:00
Andreas Karatzas
639680d220
[ROCm][CI] Adding missing dependencies for Multi-modal models tests ( #36177 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-06 12:23:10 +08:00
Rohan Potdar
c5362c739f
Reenable features for ROCm attention backends ( #36185 )
...
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com >
2026-03-05 20:21:06 -08:00
Nikhil Gupta
0a49676fb0
cpu: aarch64: Upgrade OneDNN for aarch64 to add support for int8 matmul ( #36147 )
...
Signed-off-by: Nikhil Gupta <nikhil.gupta2@arm.com >
2026-03-06 03:48:59 +00:00
Jeffrey Wang
c012a8c477
Don't fire ray compatibility webhook when PR or branch is not provided ( #36088 )
...
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com >
2026-03-06 00:42:21 +00:00
Dor Huri
ebed80a7c8
[Performance] Extract KV-cache update from TreeAttention backend ( #35384 )
...
Signed-off-by: dorhuri123 <dor.huri1@live.biu.ac.il >
2026-03-06 00:22:43 +00:00
Nick Hill
a73af584fe
[Model Runner V2] Fix warmup for very small kvcache and/or blocksizes ( #36176 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-03-05 14:48:10 -08:00
Zhengxu Chen
a97954b6a8
[compile] Consistent compiler config for saved/loaded vllm backends. ( #35810 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com >
2026-03-05 15:08:12 -05:00
Yanhong Li
a911f4dd20
[Model] Add support for OLMo Hybrid ( #32550 )
2026-03-05 14:51:06 -05:00
Russell Bryant
5395471d29
[CI] Add explicit permissions to macOS smoke test workflow ( #35775 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2026-03-05 19:08:48 +00:00
Frank Wang
a57c877f18
[BugFix] Fallback from FA4->FA2 for Batch Invariance ( #36059 )
...
Signed-off-by: frankwang28 <frank.wbb@hotmail.com >
2026-03-05 14:05:56 -05:00
Xin Yang
f917020983
[Perf] Optimize FusedMoEModularKernel output tensor using torch.empty ( #35794 )
...
Signed-off-by: Xin Yang <xyangx@amazon.com >
2026-03-05 13:47:53 -05:00
tomeras91
86483ca774
[Bugfix] Disable FlashInfer TRTLLM BF16 path for non-gated MoE ( #36146 )
...
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com >
2026-03-05 09:49:05 -08:00
Netanel Haber
b93a9e6f6d
ParakeetProjection.norm = RMSNorm instead of nn.LayerNorm ( #36133 )
...
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com >
2026-03-05 17:29:30 +00:00
Xinyu Chen
d8839ef7d9
[XPU] Enable ModelRunnerV2 on XPU ( #36078 )
...
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com >
2026-03-05 17:19:18 +00:00
Avery Miao
e998fa76b9
[BUGFIX]Fix Qwen-Omni models audio max_token_per_item estimation error leading to encoder_cache_size is 0 ( #35994 )
...
Signed-off-by: Miao, Avery <avery.miao@intel.com >
2026-03-05 09:16:29 -08:00
Jiayi Yan
6a895197fa
[Bugfix][CI] fix typos ( #34934 )
...
Signed-off-by: 1195343015 <1195343015@qq.com >
Signed-off-by: Jiayi Yan <66017932+1195343015@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-05 17:05:46 +00:00
Sage Moore
8c760b6ab6
[ROCm] Refactor ROCm attention backend selection logic ( #35246 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
2026-03-05 10:51:26 -06:00
AllenDou
3ee68590c7
refactor funasr model. ( #36108 )
...
Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com >
Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-03-05 08:07:37 -08:00
Cyrus Leung
7196348157
[Bugfix] Fix Qwen-VL tokenizer implementation ( #36140 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-03-05 08:07:19 -08:00
Ning Xie
176c799f4c
[openai api] log exception in exception handler (1/N) ( #31164 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2026-03-05 16:00:12 +00:00
Or Ozeri
612e7729c2
[KVConnector] Scheduler: Fix num_computed_tokens after async KV load ( #34616 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com >
2026-03-05 14:25:15 +00:00
Harry Mellor
ecde7af9c4
Fix import that was moved in Transformers 5.2.0 ( #36120 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-05 13:59:44 +00:00
Harry Mellor
8df523351f
[Docs] Only build docs if documentation or ready labels are present ( #36135 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-05 13:58:16 +00:00
Andreas Karatzas
b03ff6a96b
[CI] Stabilize test_no_args_tool_call and add ROCm-specific server args ( #36107 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-05 21:52:49 +08:00
Ajay Anubolu
ed81d5edd1
[Bugfix] Fix RunAI streamer crash with S3-hosted model paths ( #35976 )
...
Signed-off-by: AjAnubolu <anuboluajay@gmail.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
2026-03-05 12:14:20 +00:00
Shiyan Deng
3c23ac840e
[Bugfix] Fix mypy errors in hermes_tool_parser.py ( #36114 )
...
Signed-off-by: Shiyan Deng <dsy842974287@meta.com >
2026-03-05 11:37:47 +00:00
cjackal
a708ef5944
[Misc] Fix SyntaxWarning - invalid escape sequence '\e' ( #36020 )
...
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com >
2026-03-05 10:55:31 +00:00
Kunshang Ji
66a2209645
[Hardware] Replace torch.cuda.synchronize() api with torch.accelerator.synchronize ( #36085 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2026-03-05 10:36:39 +00:00
Doug Smith
0bfa229bf1
[Release] Include source distribution (sdist) in PyPI uploads ( #35136 )
...
Signed-off-by: dougbtv <dosmith@redhat.com >
Co-authored-by: Daniele Trifirò <dtrifiro@redhat.com >
2026-03-05 01:43:50 -08:00
Paco Xu
7493c51c55
[Docs] add Dynamo/aibrix integration and kubeai/aks link ( #32767 )
...
Signed-off-by: Paco Xu <paco.xu@daocloud.io >
2026-03-05 17:39:50 +08:00
Reagan Lee
ac773bbe80
[Docs] Update docs to include mm processor + encoder benchmarks ( #34083 )
...
Signed-off-by: Reagan <reaganjlee@gmail.com >
2026-03-05 01:38:25 -08:00
Christian Munley
48e376a007
qwen3coder tool parser fix anyOf double encoded parameters ( #36032 )
...
Signed-off-by: Christian Munley <cmunley@nvidia.com >
2026-03-05 09:06:57 +00:00
Isotr0py
21eb2c3372
[Chore] Correct MTP models test registry ordering ( #36115 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-03-05 08:55:04 +00:00
Seiji Eicher
e2b31243c0
[Docs] Update CacheConfig block_size docstring to remove inaccurate limit when using CUDA ( #35632 )
...
Signed-off-by: Seiji Eicher <seiji@anyscale.com >
2026-03-05 06:24:08 +00:00
Martin Hickey
c3598d02fa
[Misc] Remove deprecated items that are due for removal ( #36006 )
...
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com >
2026-03-05 06:14:50 +00:00
Benjamin Chislett
57c629e9c1
[Bugfix] Fix block_size for hybrid model MTP ( #36036 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
2026-03-05 06:10:54 +00:00
zihaoanllm
d106bf39f5
[Doc] Add Parallel Draft Models ( #35973 )
...
Signed-off-by: <zihaoan2@amd.com >
Signed-off-by: zihaoanllm <zihaoan2@amd.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-05 05:44:07 +00:00
Yanan Cao
b0651021e5
[Kernel] [Helion] [11/N] Retune configs for silu_mul_fp8 ( #36062 )
2026-03-04 21:25:59 -08:00
Hanjun Cho
f600d5192e
[Bugfix] Fix score layer quantization for sequence classification models - Qwen3 (VL) Reranker ( #35849 )
...
Signed-off-by: Hanjun Cho <gkswns0531@gmail.com >
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-03-04 20:57:20 -08:00
Tianmu Li
8e7820131e
[Perf] Use dummy M for weight prepacking on x86 ( #35890 )
...
Signed-off-by: Li, Tianmu <tianmu.li@intel.com >
2026-03-05 04:56:49 +00:00
Andrii Skliar
0a12cea25f
Order config.py in Lexicographical order ( #35866 )
...
Signed-off-by: Andrii Skliar <askliar@nvidia.com >
Co-authored-by: Andrii Skliar <askliar@nvidia.com >
2026-03-04 20:56:47 -08:00
Zhengxu Chen
dd6dbd93f8
[compile] Fix extra cache save on warm start. ( #35921 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com >
2026-03-05 12:56:30 +08:00
Harry Mellor
26366009c5
[CI] Don't leave docs preview comment on closed PRs ( #36087 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-05 04:51:46 +00:00
Nick Hill
16c472abe7
[Core] Move ray-specific WorkerWrapperBase methods to RayWorkerWrapper ( #35328 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-03-05 12:11:59 +08:00
daje0601
3b23d57c96
[Model] Add LoRA support for Whisper models ( #29856 )
...
Signed-off-by: daje0601 <englishmt4118@gmail.com >
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com >
2026-03-05 10:38:25 +08:00
Wentao Ye
2f4226fe52
[CI] Fix pre-commit mypy issue in main ( #36049 )
2026-03-04 18:13:12 -08:00
nkm-meta
792cbd64ca
Add platform method to enable custom collective ops registration ( #34760 )
...
Signed-off-by: Naina Kuruballi Mahesh <nainakm@meta.com >
2026-03-05 00:50:32 +00:00
Zhengxu Chen
2ed4722e26
[compile] Reduce log spam from compile. ( #36044 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com >
2026-03-05 00:48:36 +00:00
Nick Hill
a3299c3d1d
[Model Runner V2] Misc code simplification ( #35941 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-03-04 15:26:35 -08:00
Andreas Karatzas
6c21a0c2d7
[ROCm][CI] Added MI325 mirrors (stage C) ( #35239 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-04 14:48:46 -08:00
Shanshan Shen
562339abc3
[Misc] Support OOT linear method registering ( #35981 )
...
Signed-off-by: shen-shanshan <467638484@qq.com >
2026-03-04 22:25:56 +00:00
amitz-nv
d7adcadb9b
[Bugfix] Fix passing of activation_type to trtllm fused MoE NVFP4 and FP8 ( #36017 )
...
Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com >
2026-03-04 22:23:51 +00:00
Simon Mo
f678c3f61a
[RL] [Weight Sync] Guard IPC update-info pickle deserialization behind insecure serialization flag ( #35928 )
...
Co-authored-by: Cursor Agent <cursoragent@cursor.com >
2026-03-04 17:05:32 -05:00
Thomas Parnell
be0a3f7570
[Bugfix] Fix race in non-blocking num_accepted_tokens GPU->CPU copy ( #36013 )
...
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
2026-03-04 13:52:44 -08:00
Harry Mellor
17dc9c7fc9
[CI] Bump mypy version ( #34950 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-04 20:55:11 +00:00
fenypatel99
7eca859110
Add PyTorch profiler schedule support with warmup/active iterations ( #35240 )
2026-03-04 12:53:38 -08:00
Russell Bryant
636ee223ac
[Docs] Document security risks of GPT-OSS Python tool ( #35139 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2026-03-04 20:27:31 +00:00
Robert Shaw
b7d59ffce2
[UX] Remove NoOpOffloader log ( #35678 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2026-03-04 12:13:40 -08:00
Richard Zou
5569f5218d
[torch.compile] Stop lazily compiling ( #35472 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2026-03-04 12:13:17 -08:00
Davina Zaman
138d891d7f
[Docs] Clarify structured outputs configuration for Qwen3 reasoning mode ( #32441 )
...
Signed-off-by: Davina Zaman <davzaman@users.noreply.github.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-04 11:44:39 -08:00
Stefano Castagnetta
d7166e74c1
[CI] Add Blackwell AsyncTP correctness test ( #35871 )
...
Signed-off-by: Stefano Castagnetta <scastagnetta@nvidia.com >
2026-03-04 19:41:21 +00:00
Nick Hill
417fd28fb1
[Model Runner V2] Fix pooling ( #36019 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-03-04 10:53:17 -08:00
tomeras91
7faba503c4
[Kernel][Mamba] Optimize Mamba2 SSD prefill Triton kernels ( #35397 )
...
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com >
2026-03-04 19:47:17 +01:00
Hyunkyun Moon
bc6be89d16
[Frontend] Add vllm launch command for GPU-less preprocessing serving ( #34551 )
...
Signed-off-by: HyunKyun Moon <mhg5303@gmail.com >
2026-03-04 18:41:52 +00:00
Maxime Grenu
32224f568a
docs: update CPU Docker images to reference Docker Hub instead of AWS ECR ( #34882 )
...
Signed-off-by: Maxime Grenu <69890511+cluster2600@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-04 10:31:35 -08:00
Abhishek Mathukiya
f3dc292e9f
docs: add version requirement note for --profiler-config flag ( #32454 )
...
Signed-off-by: abhishkh <mathukiya.a@northeastern.edu >
2026-03-04 18:13:54 +00:00
Chen
138c5fa186
[Docs] Add RunPod GPU deployment guide for vLLM ( #34531 )
...
Signed-off-by: lisperz <zhuchen200245@163.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-04 10:11:34 -08:00
Russell Bryant
2f2c1d73a7
[Docs] Upgrade dynamic LoRA warning to admonition block ( #35218 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2026-03-04 10:01:42 -08:00
Bhuminjay Soni
fb3e78ab09
[Feature][CI]: compare func & no_func outputs in test_functionalization.py ( #35481 )
...
Signed-off-by: Bhuminjay <bhuminjaysoni@gmail.com >
Signed-off-by: Bhuminjay Soni <Soni5Happy@gmail.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2026-03-04 18:01:16 +00:00
Michael Yao
fd3bfe74c9
[Docs] Update design/multiprocessing.md ( #30677 )
...
Signed-off-by: windsonsea <haifeng.yao@daocloud.io >
2026-03-04 17:58:59 +00:00
tc-mb
bfdb512f11
fix minicpmo4.5: fix attn_mask in vit attn && fix resampler pos_emb i… ( #34127 )
...
Signed-off-by: tc-mb <caitianchi@modelbest.cn >
Co-authored-by: hezhihui <hezhihui@modelbest.cn >
2026-03-04 17:46:17 +00:00
Sage
d25c1ec3c9
docs(cpu): Clarify pre-built wheels requirement for CPU Python-only build ( #35090 )
...
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com >
2026-03-04 17:45:35 +00:00
Xing Liu
7cc6058ac6
[Doc] Add MTP docs and update speculative decoding guidance ( #35197 )
...
Signed-off-by: liuxing <945764858@qq.com >
2026-03-04 17:23:34 +00:00
Manrique Vargas
28028dff2f
fix(docs): use static rdzv backend in multi-node troubleshooting script ( #34784 )
...
Signed-off-by: machov <mv1742@nyu.edu >
2026-03-04 17:15:35 +00:00
Dr Alex Mitre
3417ba5648
docs: add README for logits_processor examples ( #35933 )
2026-03-04 17:09:19 +00:00
Yan Ma
58cfe0dc44
Fix phi4-mm and remove cuda binding ( #35964 )
...
Signed-off-by: Yan Ma <yan.ma@intel.com >
2026-03-05 01:08:05 +08:00
simone-dotolo
e86221deb6
[Doc] Fix GPU Worker count in Process Count Summary ( #36000 )
...
Signed-off-by: simone-dotolo <simonedotolo@libero.it >
Signed-off-by: simone-dotolo <84937474+simone-dotolo@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-04 17:03:14 +00:00
Netanel Haber
289fc48ab7
Use MMEncoderAttention (=use FlashAttention) instead of torch.sdpa in radio.py ( #35653 )
2026-03-04 08:43:13 -08:00
Christian Pinto
2f2212e6cc
Split generic IO Processor plugins tests from Terratorch specific ones ( #35756 )
...
Signed-off-by: Christian Pinto <christian.pinto@ibm.com >
2026-03-05 00:01:03 +08:00
Nicolò Lucchesi
18e01a0a10
[Misc] Add --attention-backend auto option ( #35738 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-03-04 15:12:27 +00:00
sungsoo ha
6cb901093f
[Core] Add All-to-All communication backend for DCP ( #34883 )
...
Signed-off-by: Sungsoo Ha <sungsooh@nvidia.com >
Signed-off-by: sungsoo ha <hasungsoo@gmail.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-04 10:01:57 -05:00
Cyrus Leung
ead7bde1ab
[Bugfix] Make kaldi_native_fbank optional ( #35996 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-03-04 06:47:32 -08:00
Qi Wang
6aa6ad8992
[BugFix] Fix implicit and incorrect assumption on ECConnector is_producer ( #34783 )
...
Signed-off-by: Qi Wang <qiwa@nvidia.com >
2026-03-04 15:01:30 +01:00
Raghavan
c8c3935b70
[Bugfix][Model] Fix FP8 k_scale/v_scale not loaded for Qwen3-MoE ( #35656 )
...
Signed-off-by: raghavan <oneraghavan@gmail.com >
2026-03-04 13:15:38 +00:00
Ronen Schaffer
bb6888b8b1
[Bugfix][CPUOffloadingManager] Prevent eviction of already-stored blocks in LRU/ARC prepare_store() ( #35846 )
...
Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com >
2026-03-04 14:25:33 +02:00
Taneem Ibrahim
1aaec59d79
[MISC] fixed tool_parser mypy errors ( #35640 )
...
Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-04 12:23:12 +00:00
pougetat
1659b2e058
[Feature] Add basic metrics for /realtime endpoint ( #35500 )
...
Signed-off-by: Thomas Pouget-Abadie <thomaspou@microsoft.com >
Signed-off-by: pougetat <thomas.pougetabadie@gmail.com >
Co-authored-by: Thomas Pouget-Abadie <thomaspou@microsoft.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-04 19:56:32 +08:00
haosdent
d6e04f4c43
[Bugfix] Cap FULL decode cudagraph sizes for Mamba/hybrid models ( #34094 ) ( #34571 )
...
Signed-off-by: haosdent <haosdent@gmail.com >
Co-authored-by: zjy0516 <riverclouds.zhu@qq.com >
2026-03-04 11:56:22 +01:00
Kunshang Ji
a8f66cbde8
[XPU] bump vllm-xpu-kernels to v0.1.3 ( #35984 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2026-03-04 18:23:31 +08:00
Kunshang Ji
16d2ad1d38
[Hardware] Replace torch.cuda.empty_cache with torch.accelerator.empty_cache ( #30681 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-04 09:49:47 +00:00
Chuan (Richard) Li
5dc3538736
[ROCm][Bugfix] Fall back from CK MXFP4 MoE when GEMM dimensions are unsupported ( #35893 )
...
Signed-off-by: Li <chuali@amd.com >
2026-03-04 08:30:54 +00:00
Nathan Price
36bf213181
[Bugfix] Add missing dynamic_arg_dims for Qwen3-ASR torch.compile ( #35869 )
...
Signed-off-by: Nathan Price <nathan@abridge.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-03-04 08:29:01 +00:00
Joe Runde
6f0dd93801
[Core] Remove busy loop from idle buffer readers ( #28053 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com >
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com >
Signed-off-by: Nick Hill <nickhill123@gmail.com >
Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com >
Co-authored-by: Nick Hill <nickhill123@gmail.com >
2026-03-04 07:44:20 +00:00
Andrii Skliar
5d199ac8f2
Support Audio Extraction from MP4 Video for Nemotron Nano VL ( #35539 )
...
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com >
Signed-off-by: Andrii Skliar <askliar@nvidia.com >
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
Signed-off-by: Andrii <askliar@nvidia.com >
Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com >
Co-authored-by: Andrii Skliar <askliar@oci-nrt-cs-001-vscode-01.cm.cluster >
Co-authored-by: Andrii <askliar@nvidia.com >
Co-authored-by: root <root@pool0-03748.cm.cluster >
Co-authored-by: Roger Wang <hey@rogerw.io >
Co-authored-by: root <root@pool0-02416.cm.cluster >
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com >
Co-authored-by: root <root@pool0-04880.cm.cluster >
2026-03-03 23:20:33 -08:00
Komal Kumar Teru
9e0f44bec4
[cohere][fix][spec-decode]: fix crash when allowed_token_ids is set without penalties ( #35654 )
...
Signed-off-by: kkt-cohere <komal@cohere.com >
2026-03-03 23:20:15 -08:00
lailoo
097eb544e9
[Bugfix] Improve engine ready timeout error message ( #35616 )
...
Signed-off-by: damaozi <1811866786@qq.com >
2026-03-04 05:54:32 +00:00
ShiJie Zhong
7cdba98edf
[BugFix] Support tool_choice=none in the Anthropic API ( #35835 )
...
Signed-off-by: ZhongsJie <zhongsjie@gmail.com >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
2026-03-04 05:24:46 +00:00
Charlie Fu
3c85cd9d74
[Rocm][CI] Fix ROCm LM Eval Large Models (8 Card) ( #35913 )
...
Signed-off-by: charlifu <charlifu@amd.com >
2026-03-04 04:50:13 +00:00
Andreas Karatzas
edba15045a
[Bugfix] Guard mm_token_type_ids kwarg in get_mrope_input_positions ( #35711 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-04 04:12:51 +00:00
Cyrus Leung
e379396167
[Refactor] Clean up processor kwargs extraction ( #35872 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-03-03 19:53:53 -08:00
Isotr0py
6e9f21e8a2
[Chore] Remove debug code in model implementation ( #35883 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-03-03 19:50:58 -08:00
AllenDou
c1d963403c
[model] support FireRedASR2 ( #35727 )
...
Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-03-03 19:41:30 -08:00
Shanshan Shen
77e6dcbbfa
[PluggableLayer][MM] Add PluggableLayer for RelPosAttention ( #33753 )
...
Signed-off-by: shen-shanshan <467638484@qq.com >
2026-03-03 19:41:27 -08:00
William Zhang
70c73df69e
[Bugfix] Fix EVS implementation for Qwen3 VL ( #33607 )
...
Signed-off-by: 2ez4bz <133824995+2ez4bz@users.noreply.github.com >
2026-03-04 02:18:11 +00:00
xjx
9a9d442464
Enable bnb for multiple indices weight ( #35838 )
...
Signed-off-by: xjx <493337577@qq.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-03-04 01:46:47 +00:00
Andreas Karatzas
f7da9cdffc
[ROCm][CI] Support async weight transfer example with platform-aware determinism ( #35710 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-04 09:44:14 +08:00
Jaewon
f22ff2958c
[Bugfix] Fix coord_socket assertion in DPEngineCoreProc for offline DP mode ( #35916 )
...
Signed-off-by: Jaewon Lee <jaewon@meta.com >
2026-03-04 00:10:11 +00:00
Nick Hill
d15c3b90fc
[Core] Move save_tensorized_model logic to Worker ( #35825 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-03-03 15:31:59 -08:00
zhrrr
97286a20ed
[Model Runner V2] support dp & ep for spec decoding ( #35294 )
...
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai >
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com >
Co-authored-by: Giancarlo Delfin <gdelfin@inferact.ai >
2026-03-03 15:19:45 -08:00
Amr Mahdi
12b38c0f45
[CI/Build] Allow mounting AWS credentials for sccache S3 auth ( #35912 )
...
Signed-off-by: Amr Mahdi <amrmahdi@meta.com >
2026-03-03 14:30:47 -08:00
Woosuk Kwon
467886a0c4
[Model Runner V2] Fix inputs_embeds=None bug for MM models ( #35917 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-03-03 13:47:45 -08:00
bnellnm
a9b8b13e5c
[Bugfix] Fix misnamed parameter in compressed_tensors_moe.py ( #35813 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-03-03 16:29:57 -05:00
Micah Williamson
e7213003cb
[ROCm][CI] Fix TP size issue for test_gpt_oss ( #35887 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2026-03-03 20:57:34 +00:00
Rohan Potdar
3a8eef5869
[ROCm][Bugfix]: Disable AITER Triton ROPE by default ( #35601 )
...
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com >
2026-03-03 13:43:56 -06:00
Robert Shaw
97995f6376
[MoE Refactor] Create MK for TRTLLM Kernels ( #32564 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com >
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com >
2026-03-03 10:39:50 -08:00
Robert Shaw
881a6b011b
[CI] Temporarily Disable Llama4 MoE Refactor Test ( #35870 )
...
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-03-03 10:36:15 -08:00
Matthew Bonanni
8e1fd5baf0
[CI] Bump num_speculative_tokens to 3 in nightly DeepSeek tests ( #35882 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-03-03 09:26:44 -08:00
JasonCohere
ae88468bcc
fix: Ensure invalid audio files return 400 error ( #34715 )
...
Signed-off-by: Jason Ozuzu <jasonozuzu@cohere.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2026-03-03 08:47:39 -08:00
ojhaanshika
e05cb3b93e
TRTLLM gen-full attn Test Coverage ( #34986 )
...
Signed-off-by: Anshika Ojha <anshikao@nvidia.com >
Co-authored-by: Anshika Ojha <anshikao@gb-nvl-059-compute09.nvidia.com >
2026-03-03 11:35:34 -05:00
Lucas Wilkinson
28ef9ba399
[BugFix] Add support for MTP num_speculative_tokens > 1 with sparse MLA ( #34552 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com >
2026-03-03 07:21:57 -08:00
TJian
fb7fdc49c4
[ROCm] [CI] Add new fusion test cases that are relevant to vLLM IR Ops ( #34307 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com >
2026-03-03 06:24:21 -08:00
wang.yuqi
ea463978bb
[Frontend][1/n] Improve pooling entrypoints | classify. ( #35604 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Signed-off-by: wang.yuqi <noooop@126.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2026-03-03 06:05:36 -08:00
Li, Jiang
440f0e7dc6
[Bugfix] Avoid src/dst as None in irecv/isend_tensor_dict ( #35754 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2026-03-03 05:56:08 -08:00
wang.yuqi
fd4a90f337
[CI] And PPL test for Qwen3.5. ( #35853 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Signed-off-by: wang.yuqi <noooop@126.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-03 13:15:51 +00:00
Thomas Parnell
ad9d09e2b8
[Perf] [Hybrid] Copy num_accepted_tokens in non-blocking way when not using prefix caching ( #35442 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
2026-03-03 04:15:43 -08:00
Szymon Reginis
4beebfd146
[CI/Build][Intel] Add new performance benchmarks for Intel Gaudi 3 ( #31025 )
...
Signed-off-by: Szymon Reginis <sreginis@habana.ai >
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com >
2026-03-03 19:48:24 +08:00
hallerite
b8401cde0e
add regression test ( #35834 )
...
Signed-off-by: hallerite <git@hallerite.com >
2026-03-03 07:32:15 +00:00
TJian
5dfc5abe94
[ROCm] [Release] Change the package from aiter to amd-aiter ( #35198 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2026-03-02 23:13:39 -08:00
lin-shh
8fa68a8ce4
Fix TYPE_CHECKING stub defaults in envs.py to match actual runtime defaults ( #35645 )
2026-03-02 21:59:43 -08:00
lin-shh
35a6f0bfe2
[Misc] Fix typos in comments: explict→explicit, paramaters→parameters ( #35648 )
2026-03-02 21:59:14 -08:00
Taneem Ibrahim
3a6cbf16e2
[MISC] Removed unused function find_all_indices() from tool_parsers/utils.py ( #35683 )
...
Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com >
2026-03-03 13:58:42 +08:00
Lucas Wilkinson
f44d1ddc8c
[BugFix] Fix cmake based incremental install (wrong vllm install dir) ( #35773 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2026-03-02 21:58:16 -08:00
Cyrus Leung
48a54c1e0d
[CI/Build] Trigger processor tests on registry update ( #35824 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-03-03 13:55:57 +08:00
Micah Williamson
8b9e8b7454
[ROCm][CI] Fix Assertion Logic For test_gpt_oss ( #35806 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2026-03-03 05:08:04 +00:00
Wentao Ye
c21d0039ec
[Refactor] Fix maxsim cuda platform and add cli to control it ( #35427 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-03-03 12:48:31 +08:00
Isotr0py
7d8bbe6f42
[CI/Build] Automatically patch video metadata for multimodal processor test ( #35822 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-03-03 04:27:45 +00:00
aykoppol
25e02647c2
[Core] Add optional flags to check for repetitive token patterns in engine output ( #35451 )
...
Signed-off-by: aykoppol <aykoppol+git@gmail.com >
2026-03-03 12:23:25 +08:00
Woosuk Kwon
a0a5178ab4
[Model Runner V2] Use ModelState.prepare_attn() for cuda graph capture [5/N] ( #35774 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-03-02 20:06:27 -08:00
Isotr0py
8ea8ba275e
[V0 deprecation] Remove Swin model ( #35821 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-03-02 20:03:41 -08:00
Woosuk Kwon
4f85bae9d6
[Docs][Model Runner V2] Add Design Docs ( #35819 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-03-02 19:58:14 -08:00
Andy Lo
0a7165fd71
[ModelRunnerV2] Rename sampler functions and variables for clarity ( #35459 )
...
Signed-off-by: Andy Lo <andy@mistral.ai >
2026-03-02 19:48:56 -08:00
Robert Shaw
6521ccf286
[CI] Temporarily Disable Nightly Failures ( #35770 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2026-03-03 01:49:13 +00:00
Martin Vit
8ebd872f50
[Tool Parser] Fix Qwen3Coder streaming parameter loss with speculative decode ( #35615 )
...
Signed-off-by: Martin Vit <martin@voipmonitor.org >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
2026-03-03 09:40:37 +08:00
zhrrr
168ee03e1c
[Model Runner V2][Perf] align dummy_run tokens to uniform decode for dp cudagraph ( #35376 )
...
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com >
2026-03-02 17:10:47 -08:00
liuzhenwei
9dd656f0ea
[XPU][NIXL] Add GPUDirect RDMA support for XPU ( #35270 )
...
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com >
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com >
2026-03-03 08:42:49 +08:00
Jakub Zakrzewski
c8b678e53e
[Model] Add support for nvidia/llama-nemotron-rerank-vl-1b-v2 ( #35735 )
...
Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com >
2026-03-03 08:32:14 +08:00
Andreas Karatzas
18c29c746b
[ROCm][CI] Fix backslash-continuation in pytest marker re-quoting and treat exit code 5 as success ( #35798 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-02 16:07:51 -08:00
Hanjie Qiu
96fc09503a
[All Reduce] Change default backend of Flashinfer All Reduce to trtllm ( #35793 )
...
Signed-off-by: hjjq <hanjieq@nvidia.com >
2026-03-02 18:57:38 -05:00
Roger Wang
1b82b433fc
[Bugfix] Fix MM processor test for Qwen3.5 ( #35797 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2026-03-02 23:05:08 +00:00
Robert Shaw
9319044ee9
[MoE][Perf] Wrap DSV3 QKVAProj GEMM in custom op for torch.compile ( #35751 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2026-03-02 23:03:49 +00:00
Boyuan Feng
c42dc402c1
clean unused cudagraph_batch_sizes ( #35552 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2026-03-02 22:00:16 +00:00
Ye (Charlotte) Qi
fa6a6be519
[Bugfix] Fix missing sequence_lengths in qwen3_omni_moe_thinker ( #35741 )
...
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com >
2026-03-02 21:11:56 +00:00
Aaron Hao
cad21918e3
[BUG] Fix rlhf_async example ( #35788 )
...
Signed-off-by: ahao-anyscale <ahao@anyscale.com >
2026-03-02 20:36:40 +00:00
Jeffrey Wang
53700bf49b
[ci] Add Ray compatibility check informational CI job ( #34672 )
...
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com >
2026-03-02 12:06:16 -08:00
Yashwant Bezawada
a13d8c03c9
[KVConnector] Auto-downgrade to PIECEWISE cudagraph mode for layerwise async ops ( #31057 )
...
Signed-off-by: Yashwant Bezawada <yashwant_b@me.com >
2026-03-02 15:04:47 -05:00
Fynn Schmitt-Ulms
9433acb8df
[Spec Decode] Add hidden states extraction system ( #33736 )
...
Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com >
2026-03-02 14:29:09 -05:00
Richard Zou
d1a6e96d9e
[torch.compile] Improve cold and warm start compile tests ( #35709 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2026-03-02 19:27:06 +00:00
CSWYF3634076
2a9e3347e9
[BugFix][Model]Fix the garbled code in Ernie4.5-VL caused by fast_moe_cold_start ( #35587 )
...
Signed-off-by: wangyafeng <wangyafeng@baidu.com >
2026-03-02 18:56:33 +00:00
Isotr0py
cc0d565f40
[CI/Build] Enable Qwen3.5 tests on CI ( #35763 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-03-02 17:43:53 +00:00
Patryk Wolsza
358e4d5ba7
[CI][HPU] Pin vllm commit compatible with vllm-gaudi - HPU tests ( #35307 )
...
Signed-off-by: PatrykWo <patryk.wolsza@intel.com >
2026-03-02 17:02:26 +00:00
Cyrus Leung
792a74b973
[Doc] Improve UX of --enable-log-requests ( #35723 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-03-02 08:24:09 -08:00
Turner Jabbour
4034c3d32e
[Core] Move test utility to test file ( #35672 )
...
Signed-off-by: Turner Jabbour <doubleujabbour@gmail.com >
2026-03-02 10:56:03 -05:00
Martin Hickey
7560d674c9
[CI] Fix mypy for vllm/device allocator ( #35518 )
...
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-02 15:53:18 +00:00
ElizaWszola
d9c7730877
[Performance] Extract kv update ops from MLA attention backends ( #34627 )
...
Signed-off-by: ElizaWszola <ewszola@redhat.com >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Di Wu <dw2761@nyu.edu >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2026-03-02 10:43:19 -05:00
Runkai Tao
ada4f4fadd
[Fix Bug]num_active_loras always equals to zero ( #34119 )
...
Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2026-03-02 23:17:46 +08:00
Harry Mellor
7e9149d9a9
[Docs] Add breadcrumbs for better UX ( #35749 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-02 14:31:54 +00:00
Martin Hickey
87c98b0236
[MyPy][BugFix] Check profiler is assigned before calling start() on it ( #35505 )
...
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-02 13:23:42 +00:00
Tyler Michael Smith
de7dd634b9
Fix unresolved-import errors when using Astral's ty by removing src.root ( #35681 )
...
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
2026-03-02 10:26:47 +00:00
Chauncey
9a87b0578f
[Feat] Supports Anthropic Messages count_tokens API ( #35588 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-03-02 09:48:54 +00:00
wangxiyuan
510bc9e1df
[Misc] Cleanup useless current_platform import ( #35715 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2026-03-02 09:36:54 +00:00
Charles Ashby
cbd361fd46
[CPU][Distributed] Fix Enable _CPUSHMDistributed only when TP/PP ranks share the same SHM group name ( #34169 )
...
Signed-off-by: Charles Ashby <charlesa.l@hotmail.com >
2026-03-02 09:34:35 +00:00
Nicolò Lucchesi
c212202d93
[Misc] Bound NIXL upper bound version ( #35495 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-03-02 16:57:07 +08:00
Andreas Karatzas
ec27b36b4b
[CI] Defining extended V1 e2e + engine tests ( #35580 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-02 08:10:54 +00:00
Charlie Fu
3fd1d4ec2c
[Rocm][CI] Fix LM Eval Large Models (H100) test group ( #34750 )
...
Signed-off-by: charlifu <charlifu@amd.com >
2026-03-02 07:43:38 +00:00
EdalatiAli
cb21972a97
[Kernel] Integrate SM100 MXFP8 blockscaled grouped MM and quant kernels ( #34448 )
...
Signed-off-by: EdalatiAli <aliedalati@cohere.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-03-01 23:31:19 -08:00
Andreas Karatzas
c34963f138
[ROCm][CI] Disable skinny GEMMs in language model standard tests to fix non-determinism ( #35152 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-02 15:04:18 +08:00
Hongxia Yang
f26650d649
[ROCm] add amd-quark package in requirements for rocm to use quantized models ( #35658 )
...
Signed-off-by: Hongxia Yang <hongxiay.yang@amd.com >
Co-authored-by: Hongxia Yang <hongxiay.yang@amd.com >
2026-03-02 06:02:43 +00:00
Kunshang Ji
92f5d0f070
[XPU] fix mxfp4 activation type ( #35691 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2026-03-02 11:48:39 +08:00
Jesse Cai
a60985b07e
Fix deprecated v1 config tests ( #35327 )
...
Signed-off-by: Jesse Cai <jessecai@fb.com >
2026-03-01 20:32:03 -05:00
Lucas Wilkinson
8b5014d3dd
[Attention] FA4 integration ( #32974 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
2026-03-01 23:44:57 +00:00
zhanqiuhu
57a96e26c9
Revert "[Bugfix] Disable TRTLLM attention with KV transfer enabled ( #33192 )" ( #34832 )
...
Signed-off-by: Zhanqiu Hu <zh338@cornell.edu >
2026-03-01 22:32:37 +00:00
Richard Zou
e82fbeec7b
[torch.compile] Undo the fast_moe_cold_start hack in torch>=2.11 ( #35475 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2026-03-01 21:44:22 +00:00
haosdent
6290470843
[Bugfix] Fix dtype mismatch in RMSNormGated.forward_native() during torch.compile ( #35256 )
...
Signed-off-by: haosdent <haosdent@gmail.com >
2026-03-01 15:14:46 -05:00
Woosuk Kwon
72f4d16262
[Model Runner V2] Use block table apis for capture inputs ( #35671 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-03-01 10:31:13 -08:00
Seungho Yoon
5a435507d8
fix(mxfp4): return is_monolithic=False when LoRA is enabled for Triton backend ( #35382 )
...
Signed-off-by: Seungho Yoon <yoonsnowdev@gmail.com >
2026-03-01 09:59:30 -05:00
Taneem Ibrahim
59d7af9c6c
[MISC] Fixing a null reference by removing parallel_utils from mypy EXCLUDE ( #35630 )
...
Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com >
2026-03-01 09:26:44 -05:00
Asaf Gardin
bbf81f9a92
[Mamba1] - Kernel Level Chunk Alignment for Prefix Caching ( #34798 )
...
Signed-off-by: Josephasafg <ajgard7@gmail.com >
2026-03-01 20:40:23 +08:00
Woosuk Kwon
da543d1abe
[Model Runner V2] Minor refactoring for EncoderRunner ( #35628 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-03-01 00:15:39 -08:00
Ryan Rock
87d319c52f
[AMD][CI] Support Triton attention with ExampleConnector ( #34931 )
...
Signed-off-by: Ryan Rock <ryan.rock@amd.com >
2026-03-01 09:58:07 +02:00
lin-shh
a9ec392c86
Fix typo: implictly -> implicitly in isaac.py docstring ( #35646 )
2026-02-28 23:34:37 -08:00
lailoo
afd089f231
[Bugfix][Model] Fix Qwen3.5/Qwen3Next ignoring --dtype flag on older GPUs ( #35617 )
2026-03-01 03:27:37 +00:00
gnovack
3ecd0bf9fc
Add TMA support to fused_moe_lora kernel ( #32195 )
...
Signed-off-by: gnovack <gnovack@amazon.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2026-03-01 10:55:25 +08:00
Woosuk Kwon
e3eb146f7a
[Model Runner V2] Add ModelStateInterface [4/N] ( #35621 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-28 13:19:45 -08:00
Martin Vit
95a395dbec
[Bugfix] Fix Anthropic API base64 image handling in Messages endpoint ( #35557 )
...
Signed-off-by: Martin Vit <martin@voipmonitor.org >
2026-02-28 20:57:08 +00:00
Isotr0py
e94b263bd6
[Chore] Cleanup BNB utilization dead code ( #35620 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-28 19:22:41 +00:00
Wentao Ye
e113a30113
[Deprecation] Deprecate code in 0.17 as scheduled ( #35441 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-28 17:32:37 +00:00
Cyrus Leung
1dafb29f91
[Benchmark] Avoid unnecessary video download in MMVU ( #35618 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-28 09:07:02 -08:00
emricksini-h
49b9ae32e9
[Fix] Avoid sending image input to other PP ranks ( #35405 )
...
Signed-off-by: emricksini-h <emrick.birivoutin@hcompany.ai >
Co-authored-by: Roger Wang <hey@rogerw.io >
2026-03-01 00:14:29 +08:00
cwazai
63d7972f13
Fix Qwen3_5MTP packed_modules_mapping for gate_up_proj ( #35581 )
2026-02-28 14:50:55 +00:00
flutist
c68e69f144
custom dataset img support base64 ( #35280 )
...
Signed-off-by: xjx <493337577@qq.com >
2026-02-28 11:49:52 +00:00
Chauncey
7e08c22b8c
[Feat] Add CUDA torch fallbacks for fp8_mqa_logits/fp8_paged_mqa_logits_torch function ( #35271 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-02-28 10:12:00 +00:00
Augusto Yao
8e75d88554
add io_process_plugin for sparse embedding ( #34214 )
...
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com >
Signed-off-by: Augusto Yao <augusto.yjh@antgroup.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2026-02-28 09:16:37 +00:00
Mario Hong
0892d1ab1f
[Feature]Supports Anthropic Thinking Block ( #33671 )
...
Signed-off-by: mariohong <mariohong128@gmail.com >
Co-authored-by: zetaohong <i-hongzetao@stepfun.com >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
2026-02-28 09:02:33 +00:00
Hashem Hashemi
7600642eae
Add padding support to wvSplitK solution for skinny GEMMs ( #33762 )
...
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com >
2026-02-28 09:02:05 +00:00
Andreas Karatzas
1e69c04887
[ROCm][CI] Parametrize vision score tests across attention backends with per-backend tolerances ( #35571 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-28 08:59:26 +00:00
Cyrus Leung
4292e3b807
[Benchmark] Improve UX of sweep scripts ( #35600 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-28 00:36:02 -08:00
Cyrus Leung
24d6ea8afd
[Benchmark] Rename SLA Finder to Workload Explorer ( #35586 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-27 23:31:55 -08:00
Chauncey
57c86c0741
[Misc] Change logging level from info to debug for tool parser import ( #35575 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-02-28 14:51:35 +08:00
Chauncey
06254d4cbb
[CI] add trainer_send_weights for MockWeightTransferEngine ( #35589 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-02-28 06:47:43 +00:00
Andreas Karatzas
f5d1281c9d
[ROCm][CI] Expose tests to AMD production CI and fix amdsmi heap corruption ( #35071 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-28 13:57:31 +08:00
Andreas Karatzas
94029ffaf0
[ROCm] Derive device capability from GCN arch string without CUDA init ( #35069 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-28 13:55:28 +08:00
Andreas Karatzas
88e8525f2e
[ROCm][CI] Adding infiniband mappings for moriio tests ( #35170 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-28 13:53:28 +08:00
Ilya Markov
b2d8b422b2
[EPLB] Enforce sync eplb for NCCL-based all2all backend ( #35212 )
...
Signed-off-by: ilmarkov <markovilya197@gmail.com >
2026-02-28 05:47:12 +00:00
Umut Polat
1d5ab5d603
[Bugfix] Move chat completion response_format validation to Pydantic model_validator ( #35510 )
...
Signed-off-by: umut-polat <52835619+umut-polat@users.noreply.github.com >
2026-02-27 21:26:19 -08:00
Huy Do
7b346ba8ed
[Bugfix] Propagate compilation_time from workers to main process for TP>1 ( #35503 )
...
Signed-off-by: Huy Do <huydhn@gmail.com >
2026-02-28 05:03:22 +00:00
Itay Alroy
dea268336f
[1/N] Elastic EP Milestone 2 ( #34861 )
...
Signed-off-by: Yongji Wu <wuyongji317@gmail.com >
Signed-off-by: Itay Alroy <ialroy@nvidia.com >
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Signed-off-by: Ron Tourgeman <rtourgeman@nvidia.com >
Co-authored-by: Yongji Wu <wuyongji317@gmail.com >
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Co-authored-by: Ron Tourgeman <rtourgeman@nvidia.com >
2026-02-28 04:46:42 +00:00
Ma Jian
90805ff464
[CI/Build] CPU release supports both of AVX2 and AVX512 ( #35466 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
Co-authored-by: jiang1.li <jiang1.li@intel.com >
2026-02-28 04:35:21 +00:00
Matthew Bonanni
2562e0271e
[MTP] Validate that MTP weights are actually loaded ( #35548 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-02-28 12:27:40 +08:00
Cyrus Leung
fd68cd132b
[Bugfix] Fixes for SLA finder ( #35537 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-27 20:20:55 -08:00
Micah Williamson
0edf101d2b
[ROCm] Add stablelm Head Size 80 To Supported Head Sizes For ROCM_ATTN ( #35527 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2026-02-28 12:16:34 +08:00
Douglas Lehr
d5b6f3ba36
[ROCm][Quantization] Add Composable Kernel (CK) backend support for M… ( #34301 )
...
Signed-off-by: Doug Lehr <douglehr@amd.com >
Signed-off-by: Douglas Lehr <91553416+dllehr-amd@users.noreply.github.com >
Signed-off-by: Douglas Lehr <Doug.Lehr@amd.com >
Co-authored-by: Doug Lehr <douglehr@amd.com >
Co-authored-by: Cursor <cursoragent@cursor.com >
Co-authored-by: Rohan Potdar <66227218+Rohan138@users.noreply.github.com >
2026-02-28 03:37:01 +00:00
Woosuk Kwon
1a014a0a93
[Model Runner V2] Move MM encoder to Model States [3/N] ( #35564 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-27 18:32:38 -08:00
Woosuk Kwon
86ac7bcf84
[Model Runner V2] Support pooling models ( #35120 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-27 18:03:01 -08:00
Umut Polat
405f28d38d
[Misc] Clean up ResponsesRequest model validators ( #35531 )
...
Signed-off-by: umut-polat <52835619+umut-polat@users.noreply.github.com >
2026-02-28 01:19:21 +00:00
youkaichao
5323672bc2
[misc] cleanup one level of error stack when nixl fails to initialize ( #35517 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2026-02-28 08:42:37 +08:00
Roberto L. Castro
a201ad72d8
[Refactor][Kernel] Add global helper to deduplicate vectorized memory ops ( #35105 )
...
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com >
Signed-off-by: LopezCastroRoberto <roberto.lopez.castro@udc.es >
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com >
2026-02-27 16:28:17 -08:00
Rohan Potdar
e3691988d0
[ROCm]: fix aiter rope functionalization ( #35533 )
...
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com >
2026-02-27 22:42:30 +00:00
Gregory Shtrasberg
9fa6c68fa6
[ROCm] Enabling encoder and encoder-decoder on ROCm and AITER unified backends ( #35334 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2026-02-27 21:32:55 +00:00
Aaron Hao
2ce6f3cf67
[Feat][RL][2/2] Native Weight Syncing API: IPC ( #34171 )
...
Signed-off-by: hao-aaron <ahao@anyscale.com >
Signed-off-by: Aaron Hao <ahao@anyscale.com >
Signed-off-by: ahao-anyscale <ahao@anyscale.com >
2026-02-27 13:45:21 -07:00
Jakub Zakrzewski
1f3dbd95fd
[Bugfix][Model] Fix gpt-oss batch invariance ( #35404 )
...
Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com >
2026-02-27 20:41:24 +00:00
Lucas Wilkinson
1d532f9d8f
[DP] Only use DP padding when cudagraphs are actually used ( #34102 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2026-02-27 15:14:31 -05:00
Lucas Kabela
234a65b781
[Bugfix] Add monkeypatch to prevent race condition from writing ( #35420 )
...
Signed-off-by: Lucas Kabela <lucaskabela@meta.com >
2026-02-27 14:51:36 -05:00
SteadfastAsArt
2decec9856
[Transformers backend] Ignore MTP weights when num_nextn_predict_layers=0 ( #34888 )
...
Signed-off-by: SteadfastAsArt <695488173@qq.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-27 19:39:23 +00:00
Zhengxu Chen
29b35477b0
[compile] Fix caching error over pytree slice node. ( #35308 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com >
2026-02-27 19:34:16 +00:00
Nick Hill
b1d9f5372d
[Model Runner V2] Warmup kernels ( #35172 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-02-27 10:43:30 -08:00
Raushan Turganbay
fd6de37fca
[BugFix] Fix 3D rope in transformers backend ( #35097 )
...
Signed-off-by: raushan <raushan@huggingface.co >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-27 18:34:49 +00:00
Netanel Haber
c8aca0c9e1
Support parakeet as audio encoder for nemotron-nano-vl ( #35100 )
...
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2026-02-27 11:07:38 -07:00
Martin Hickey
b602e4f299
[Doc] Fix link to Llama chat template for usability ( #35525 )
...
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2026-02-27 17:51:09 +00:00
Huamin Li
157722da75
[perf] Use pinned memory for async H2D transfer in do_mamba_copy_block ( #35480 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com >
2026-02-28 01:50:37 +08:00
Nick Hill
1d897ff04f
[Misc] Fill in some v1 CODEOWNERS gaps ( #35524 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-02-27 09:34:37 -08:00
fort726
905d76b51d
[Model] Add huggingface skt/A.X-K1 model ( #32407 )
...
Signed-off-by: Sungwan(Alex) Kim <sw0726.kim@sktelecom.com >
Signed-off-by: fort726 <38447663+fort726@users.noreply.github.com >
Co-authored-by: Sungwan(Alex) Kim <sw0726.kim@sktelecom.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com >
2026-02-27 09:26:02 -08:00
Yanan Cao
9098ce690c
[Kernel] [Helion] [7/N] Use HOP to represent Helion Kernel call to enable fx tracing and pattern matching ( #34390 )
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com >
2026-02-27 09:21:35 -08:00
Nick Hill
876312f0b5
[Core] Fix gpu_worker.py pre-commit errors ( #35312 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-02-27 07:54:24 -08:00
Boyuan Feng
5de98abc12
Add @BoyuanFeng to CODEOWNERS ( #35317 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2026-02-27 15:53:47 +00:00
Koushik Dutta
9251ed5c4f
[Bugfix] Handle case when kimi ends reasoning with a tool call ( #33646 )
...
Signed-off-by: Koushik Dutta <koushd@gmail.com >
Co-authored-by: mondaylord <20212010046@fudan.edu.cn >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2026-02-27 14:58:28 +00:00
Yueqian Lin
e8249378e4
[Bugfix] Fix check_interleaved_audio_video false positive for batched non-interleaved requests ( #35487 )
...
Signed-off-by: linyueqian <linyueqian@outlook.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2026-02-27 06:48:25 -08:00
haosdent
6d4f9d3ad5
[Bugfix] Fix DCP + FA3 crash due to missing num_splits in _forward_with_dcp ( #35082 )
...
Signed-off-by: haosdent <haosdent@gmail.com >
2026-02-27 22:27:06 +08:00
Harry Mellor
fbe3f0120a
Revert "Add GlmOcrConfig for GLM-OCR model type recognition" ( #35512 )
2026-02-27 06:13:27 -08:00
Jason Li
66c1751d13
[compile] Cleanup: Remove unnecessary +rms_norm forcing for sequence parallelism ( #35410 )
...
Signed-off-by: jasonlizhengjian <jasonlizhengjian@gmail.com >
2026-02-27 08:36:37 -05:00
Tib
6467b635b6
[Bugfix] Add missing activation attr to RMSNormGated ( #35423 )
...
Signed-off-by: tibG <naps@qubes.milou >
Co-authored-by: tibG <naps@qubes.milou >
2026-02-27 12:53:35 +00:00
Max Hu
9c3fe9936b
Flashinfer cuDNN backend for Qwen3 VL ViT attention ( #34580 )
...
Signed-off-by: Max Hu <maxhu@nvidia.com >
Signed-off-by: Max Hu <hyoung2991@gmail.com >
Co-authored-by: Max Hu <maxhu@nvidia.com >
Co-authored-by: Shang Wang <shangw@nvidia.com >
2026-02-27 20:20:23 +08:00
Umut Polat
b66a74649e
[Bugfix] Replace assert with ValueError for response_format validation in completions endpoint ( #35456 )
...
Signed-off-by: umut-polat <52835619+umut-polat@users.noreply.github.com >
2026-02-27 08:01:06 +00:00
Wang Xingran
07bdabef03
[Bugfix] Use 'sum' reduction instead of 'avg' in Async TP reduce-scatter ( #33088 )
...
Signed-off-by: Xingran Wang <wangxingran123456@outlook.com >
Signed-off-by: Hongjian Zhang <hirokenovo@gmail.com >
Co-authored-by: Hongjian Zhang <hirokenovo@gmail.com >
2026-02-27 07:06:08 +00:00
Chengyi Nie
a572baff5e
[Model Performance] Add Qwen3MoE tuned MoE configs for H200 ( #35457 )
...
Signed-off-by: Chengyi Nie <cnie@roblox.com >
Co-authored-by: Chengyi Nie <cnie@roblox.com >
2026-02-27 13:51:14 +08:00
zofia
516cf26698
[Bug] correct out dtype of rms_norm_gated native path ( #35369 )
...
Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com >
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com >
2026-02-27 05:19:51 +00:00
Jiangyun Zhu
487e5c51f7
[Bugfix] disable allreduce_rms_fusion by default when pp size > 1 ( #35424 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2026-02-27 04:18:52 +00:00
Daniel Huang
1a8c71674e
[BugFix] Repo utils debug print patch ( #35434 )
...
Signed-off-by: Daniel Huang <daniel1.huang@intel.com >
2026-02-27 03:50:56 +00:00
Wentao Ye
062b789632
[Bug] Fix outdated links in source code ( #35314 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-02-27 03:50:46 +00:00
gnovack
a532c83849
use 'max_active_experts' for moe lora input size ( #33197 )
...
Signed-off-by: gnovack <gnovack@amazon.com >
2026-02-27 03:50:43 +00:00
Jee Jee Li
1e5ad9b74f
[Bugfix] Fix Qwen3NextForCausalLM packed_modules_mapping ( #35413 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2026-02-26 19:46:30 -08:00
Nicolò Lucchesi
cabdaa7619
[Misc] Move GPUModelRunner.prepare_kernel_block_sizes to utils ( #35400 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-02-27 11:42:51 +08:00
Chenyaaang
06be53563b
[Core]Extract is_last_rank in Ray for tpu to override ( #33012 )
...
Signed-off-by: Chenyaaang <chenyangli@google.com >
2026-02-27 03:18:52 +00:00
Angela Yi
c29ee9c326
[compile] Invalidate cache for cpu flags ( #35119 )
...
Signed-off-by: angelayi <yiangela7@gmail.com >
2026-02-27 02:54:11 +00:00
daniel-salib
d43048ce05
[Bugfix] Emit reasoning_part events in simple streaming path for Resp… ( #35184 )
...
Signed-off-by: Daniel Salib <danielsalib@meta.com >
2026-02-27 09:49:06 +08:00
Michael Goin
4fec53cfcb
[CI] Actually run tests/kernels/quantization/test_block_fp8.py in CI ( #34274 )
2026-02-26 17:58:03 -07:00
roikoren755
38c498b8e3
[Performance] Cublas Bf16 Gate with Fp32 Output ( #35121 )
...
Signed-off-by: Roi Koren <roik@nvidia.com >
2026-02-26 16:51:28 -08:00
Andrii Skliar
56a6371706
[Update] Use FlashInfer fast_decode_plan directly instead of replication ( #34687 )
...
Signed-off-by: Andrii <askliar@nvidia.com >
Co-authored-by: Andrii <askliar@nvidia.com >
2026-02-26 16:31:43 -08:00
Pavani Majety
6283021142
[Bugfix] Fix KV Scale loading for MLA Models ( #35430 )
...
Signed-off-by: Pavani Majety <pmajety@nvidia.com >
2026-02-26 23:38:19 +00:00
Aleksandr Malyshev
01923eec70
[ROCm][Quantization] GPT OSS Upstream MoE wmxfp4_afp8 with static scales ( #30357 )
...
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com >
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com >
2026-02-26 16:50:16 -06:00
pkousha
31fb6f43da
[Kernel][perf] optimize NCCL symm_mem vs custom_AR selection thresholds ( #33839 )
...
Signed-off-by: <>
Signed-off-by: pkousha <43781676+pkousha@users.noreply.github.com >
Co-authored-by: Pouya Kousha <pkousha@login-eos01.eos.clusters.nvidia.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-02-26 14:35:58 -08:00
Tyler Michael Smith
eb19955c37
[WideEP] Remove pplx all2all backend ( #33724 )
...
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
2026-02-26 14:30:10 -08:00
Lucia Fang
0f2f24c8b2
[Bugfix] Fix MessageQueue connect_ip for cross-node data parallelism ( #35429 )
...
Signed-off-by: Lu Fang <fanglu@fb.com >
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com >
2026-02-26 22:08:16 +00:00
sychen52
d0105b84f0
add mixed precision support for modelopt ( #35047 )
...
Signed-off-by: Shiyang Chen <shiychen@nvidia.com >
2026-02-26 21:56:24 +00:00
danielafrimi
832a780f3a
Nemotron: use per-layer config in NemotronHMLPDecoderLayer for heterogeneous models ( #35396 )
...
Signed-off-by: dafrimi <dafrimi@nvidia.com >
2026-02-26 16:55:19 -05:00
ElizaWszola
98217b09f9
[Performance] Extract KV cache update op from flashinfer forward ( #35422 )
...
Signed-off-by: ElizaWszola <ewszola@redhat.com >
2026-02-26 21:29:01 +00:00
不做了睡大觉
967572dd5f
fix(reasoning): Qwen3ReasoningParser returns truncated output as reasoning ( #35230 )
...
Signed-off-by: stakeswky <stakeswky@users.noreply.github.com >
Co-authored-by: stakeswky <stakeswky@users.noreply.github.com >
2026-02-26 20:30:45 +00:00
Woosuk Kwon
3d66502e1b
[Model Runner V2] Prepare attn metadata in ModelState [2/N] ( #35383 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-26 11:47:02 -08:00
Woosuk Kwon
c66aa48e99
[Model Runner V2] Add model states [1/N] ( #35350 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-26 11:20:35 -08:00
Nick Hill
b6d5a17298
[Model Runner V2] Fix error-handling ( #35063 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-02-26 11:00:19 -08:00
Lucas Wilkinson
5e58bdc711
[Bugfix] Remove erroneous lower bound on LoRA vocab size constraint ( #35354 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2026-02-26 18:44:50 +00:00
Runkai Tao
a1f53addb1
[BugFix] Align fused MoE-LoRA kernel config with actual weight shapes ( #34396 )
...
Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu >
2026-02-26 18:03:10 +00:00
Wentao Ye
05970c772c
[Refactor] Remove dead code for attention benchmark script ( #35418 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-02-26 09:53:46 -08:00
Yiliu Dong
d940607629
[Core] Support min_tokens with speculative decoding ( #32642 )
...
Signed-off-by: qianlihuang <yiliu.dong@qq.com >
Co-authored-by: qianlihuang <yiliu.dong@qq.com >
2026-02-26 12:31:28 -05:00
Wentao Ye
99c7892c5b
[Perf] Optimize maxsim scores computation for pooling models, 13.9% E2E throughput improvement ( #35330 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-02-26 17:14:54 +00:00
hujia177
ec8f943db1
Add GlmOcrConfig for GLM-OCR model type recognition ( #34982 )
2026-02-26 17:04:42 +00:00
Or Ozeri
f2ad952f40
[BugFix][kv_offload]: Fix kernel block size detection ( #35125 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com >
2026-02-26 16:29:34 +00:00
Sage Moore
9e2cabdf9c
[ROCm] Update the torch version in rocm_build.txt to use the official 2.10 release ( #34387 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
2026-02-26 16:28:45 +00:00
Douglas Lehr
ec8ab9d254
[ROCm] Add dynamic mxfp4 quantization for DeepSeek V2 projection layers ( #34157 )
...
Signed-off-by: Doug Lehr <douglehr@amd.com >
Signed-off-by: Douglas Lehr <91553416+dllehr-amd@users.noreply.github.com >
Co-authored-by: Doug Lehr <douglehr@amd.com >
Co-authored-by: Rohan Potdar <66227218+Rohan138@users.noreply.github.com >
Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com >
2026-02-26 10:00:49 -06:00
Wentao Ye
05972ea7e5
[Refactor] Remove dead or duplicate func utils or variables ( #35318 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-02-26 10:57:56 -05:00
Jakub Zakrzewski
111d869069
[Model] Add nvidia/llama-nemotron-embed-vl-1b-v2 multimodal embedding model ( #35297 )
...
Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com >
2026-02-26 14:17:17 +00:00
stingoChen
7fea7250a4
[Bug] Fix missing <think> tag after tool call in MiniMax 2.1 ( #35352 )
...
Signed-off-by: 冬马 <chenxinke@cai-inc.com >
Co-authored-by: 冬马 <chenxinke@cai-inc.com >
2026-02-26 22:11:07 +08:00
Cyrus Leung
845ee348ef
[Misc] Standardize handling of mm_processor_kwargs.size ( #35284 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-26 13:05:46 +00:00
Asaf Gardin
ec13e549d3
[Bugfix] Fix uint32 overflow in Mamba selective scan state pointer arithmetic ( #35275 )
...
Signed-off-by: Josephasafg <ajgard7@gmail.com >
2026-02-26 12:22:06 +00:00
Li-Yongwen
c6ca51598a
[Bugfix] fix device_name for routing replay ( #34336 )
...
Signed-off-by: liyongwen <1310439159@qq.com >
2026-02-26 12:18:38 +00:00
Yueqian Lin
c0615a296d
[Bugfix] Fix Qwen2.5-Omni and Qwen3-Omni mixed-modality embed regression ( #35368 )
...
Signed-off-by: linyueqian <linyueqian@outlook.com >
2026-02-26 11:58:23 +00:00
Harry Mellor
01914445b0
Remove bc-lint ( #35274 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-26 03:01:01 -08:00
Kunshang Ji
5281713e11
[XPU] use fixed UMD version in dockerfile.xpu ( #35392 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2026-02-26 18:54:55 +08:00
HZY
32693db8ce
[Bugfix] [Qwen3.5]Fix Qwen3.5 FP8 quantization: tuple shard_id weight loading ( #35289 )
...
Signed-off-by: daowu.hzy <daowu.hzy@alibaba-inc.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-26 18:26:15 +08:00
Akash kaothalkar
e03ddcfbd4
[Hardware][Powerpc]Enable prefix caching and chunked prefill for ppc64le ( #35081 )
...
Signed-off-by: Akash kaothalkar <akash.kaothalkar@ibm.com >
Co-authored-by: Akash kaothalkar <akash.kaothalkar@ibm.com >
2026-02-26 10:21:24 +00:00
Sophie du Couédic
02acd16861
[Benchmarks] Plot benchmark timeline and requests statistics ( #35220 )
...
Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2026-02-26 02:17:43 -08:00
Jiangyun Zhu
ab87f85231
[Model] Ring 2.5 ( #35102 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2026-02-26 02:17:11 -08:00
Krish Gupta
3827c8c55a
[Test] Add tests for n parameter in chat completions API ( #35283 )
...
Signed-off-by: KrxGu <krishom70@gmail.com >
2026-02-26 09:14:07 +00:00
Kevin McKay
ade81f17fe
[Bugfix][Hardware][AMD] Gate FP4 ops on gfx950 to prevent MI300X crash ( #35250 )
...
Signed-off-by: c0de128 <kevin.mckay@outlook.com >
2026-02-26 16:11:07 +08:00
Gregory Shtrasberg
6042e66cd5
[ROCm] Add extra step in config initialization to populate custom ops before compilation config init ( #34848 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2026-02-26 16:05:40 +08:00
Chaojun Zhang
9f9a675b23
[XPU][8/N] Fix kernel bugs in XPU LoRA and MOE LORA ( #34115 )
...
Signed-off-by: chzhang <chaojun.zhang@intel.com >
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com >
2026-02-26 15:46:44 +08:00
Ofir Zafrir
a07c4c5939
[BugFix][XPU] Fix speculative decoding on Intel XPU due to bug with IGC_ForceOCLSIMDWidth=16 ( #35298 )
...
Signed-off-by: Ofir Zafrir <ofir.zafrir@intel.com >
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com >
2026-02-26 07:15:16 +00:00
Cyrus Leung
d3a51da92a
[Benchmark] Simplify SLA scan ( #35306 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-25 22:35:41 -08:00
Flora Feng
186ea22efe
[Misc][Harmony] Move Responses API only harmony utils to responses/harmony.py ( #35339 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com >
2026-02-26 14:35:16 +08:00
Daniele
4a9c07a0a2
[BugFix] anthropic/serving_messages: fix tool call arguments streaming ( #34887 )
...
Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2026-02-26 05:39:48 +00:00
Jason Li
9d37941017
[torch.compile] Sequence Parallelism threshold compile ranges ( #28672 )
...
Signed-off-by: jasonlizhengjian <jasonlizhengjian@gmail.com >
Signed-off-by: Jason Li <jasonlizhengjian@gmail.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2026-02-26 05:00:12 +00:00
Fadi Arafeh
4171ff6dd9
[CPU][Feat] Enable KleidiAI INT8_W4A8 for all input dtypes ( #34890 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
2026-02-26 05:00:10 +00:00
Woosuk Kwon
13025e71e8
[Model Runner V2] Add coding style guide ( #35325 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-25 20:42:40 -08:00
Hanjie Qiu
71dfce6aa6
[Kernel] Refactor FlashInfer allreduce for mnnvl backend ( #34109 )
...
Signed-off-by: hjjq <50634613+hjjq@users.noreply.github.com >
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com >
Co-authored-by: wzhao18 <wzhao18.sz@gmail.com >
Co-authored-by: Wei Zhao <51183510+wzhao18@users.noreply.github.com >
2026-02-26 03:17:20 +00:00
hujiaxin0
2aa4140402
openpangu-vl support video input ( #34134 )
...
Signed-off-by: hujiaxin <524446785@qq.com >
Signed-off-by: Emilie1001 <79921183+Emilie1001@users.noreply.github.com >
Co-authored-by: Emilie1001 <79921183+Emilie1001@users.noreply.github.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-26 03:08:09 +00:00
Roberto L. Castro
86c3b5a808
[BugFix] Fix fp4 quant kernel on CUDA 12.8 ( #35210 )
...
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com >
2026-02-25 18:32:50 -08:00
Seungmin Kim
160424a937
[Bugfix] Fix CUDA compatibility path setting for both datacenter and consumer NVIDIA GPUs ( #33992 )
...
Signed-off-by: Seungmin Kim <8457324+ehfd@users.noreply.github.com >
Signed-off-by: Andrew Mello <19512127+88plug@users.noreply.github.com >
Co-authored-by: 88plug <19512127+88plug@users.noreply.github.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-02-25 18:15:51 -08:00
Lucas Wilkinson
9511a3f8ee
[Bugfix] Fix AttributeError in SMControlContextManager ( #35338 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2026-02-25 18:01:10 -08:00
Michael Goin
de527e1cec
[UX] Add --moe-backend arg for explicit kernel selection ( #33807 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-02-25 17:44:44 -08:00
Yongye Zhu
1976356ee6
[MoE Refactor] MXFP4 Cutlass Experts to MK ( #34542 )
...
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com >
2026-02-25 17:32:39 -08:00
Michael Goin
cbf8f7028c
[UX] Add --performance-mode {balanced,interactivity,throughput} ( #34936 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-02-25 17:28:31 -08:00
Ming Yang
6831650c40
[offloader] v2: Hide weight onloading latency via prefetching ( #29941 )
...
Signed-off-by: Ming Yang <minos.future@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-02-25 17:20:59 -08:00
Andreas Karatzas
ed42507f6d
[ROCm][CI] Amending deletion of AMD mirror ( #35322 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-25 14:17:56 -08:00
Andreas Karatzas
9571e99945
[ROCm][CI] Extending attention backend coverage for Eagle spec decode tests ( #35265 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-25 14:16:18 -08:00
Elizabeth Thomas
c97234c08b
fix(mxfp4): Disable monolithic path for TRITON backend with EP ( #34270 )
...
Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-02-25 13:33:42 -08:00
rasmith
b188bab441
[CI][AMD][BugFix] Add torch.cuda.set_device to test_punica_ops so punica kernels execute on same device as tensor ( #34985 )
...
Signed-off-by: Randall Smith <Randall.Smith@amd.com >
2026-02-25 19:18:00 +00:00
Lucas Wilkinson
15d76f74e2
Revert "[Misc] Enable weights loading tracking for quantized models" ( #35309 )
2026-02-25 09:20:15 -08:00
Andreas Karatzas
8fd6975479
[ROCm][CI] Disable skinny GEMMs in multimodal tests to fix non-deterministic results ( #35049 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-25 16:48:37 +00:00
pushkar
5d18bf8b32
[Bugfix] Fix Harmony preamble visibility in Responses API ( #32114 )
...
Signed-off-by: Pushkar Patel <git@thepushkarp.com >
Signed-off-by: pupa <pupa@users.noreply.github.com >
2026-02-25 08:08:16 -08:00
haosdent
0788ff0a15
[Bugfix] Gracefully disable AllReduceFusionPass on GPUs without multicast support ( #35085 )
...
Signed-off-by: haosdent <haosdent@gmail.com >
2026-02-25 07:31:45 -08:00
Chendi.Xue
d72b0be33c
[XPU]Fix for Qwen-OMNI crash ( #35249 )
...
Signed-off-by: Chendi Xue <chendi.xue@intel.com >
2026-02-25 07:31:07 -08:00
Bhoomit
42489e43c2
[Misc][LoRA] Increase max vocab size limit to 258048 in logits processor ( #34773 )
...
Signed-off-by: Bhoomit Vasani <vbhoomit@amazon.com >
2026-02-25 23:30:55 +08:00
Mario Hong
af5e6afa0a
[Bugfix] Fix step3p5 reasoning with interleaved thinking ( #34211 )
...
Signed-off-by: mariohong <mariohong128@gmail.com >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
2026-02-25 15:13:01 +00:00
Benjamin Chislett
ee59a7c615
[Tests] Add GSM8k check to SpecDec E2E tests ( #34772 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
2026-02-25 07:51:14 -05:00
Joao Gante
709eadbb0b
Doc link typo ( #35281 )
...
Signed-off-by: Joao Gante <joaofranciscocardosogante@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-25 03:00:31 -08:00
Harry Mellor
90fc7f9109
Fix custom processors that use deleted behaviour for Transformers v5 ( #35107 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-25 02:36:21 -08:00
Yanwen Lin
675ec59aa9
[Bugfix][CPU] Fix basic unit tests failing in CPU platforms ( #34677 )
...
Signed-off-by: Yanwen Lin <lyw1124278064@gmail.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-25 08:36:15 +00:00
Yanwen Lin
80e60a6133
[Doc] Suggest "--managed-python" flag when installing python using uv ( #33069 )
...
Signed-off-by: Yanwen Lin <lyw1124278064@gmail.com >
2026-02-25 08:19:43 +00:00
jonoillar
26e722f906
[DOC][BugFix] Specfiy build dependency installation ( #34513 )
...
Signed-off-by: Jon OILLARBURU <jon.oillarburu@multiversecomputing.com >
Co-authored-by: Jon OILLARBURU <jon.oillarburu@multiversecomputing.com >
2026-02-25 08:04:06 +00:00
lichuang
2c619e5e3f
[Docs]Fix documentation formatting in architecture overview ( #34679 )
...
Signed-off-by: codedump <lichuang1982@gmail.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-25 08:00:15 +00:00
Simon Mo
8a685be8d9
docs: document committer proposal process in governance ( #35225 )
...
Signed-off-by: Simon Mo <simon.mo@hey.com >
Co-authored-by: Cursor <cursoragent@cursor.com >
2026-02-25 07:58:48 +00:00
Laura Wang
2465071510
[Perf] Add opt-in SM100 Oink RMSNorm custom-op path ( #31828 )
...
Signed-off-by: Laura Wang <3700467+Laurawly@users.noreply.github.com >
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com >
2026-02-24 23:01:53 -08:00
wenshuai
cd43673668
[Perf] Optimize FP8 gemm of sm120. ( #34424 )
...
Signed-off-by: wenshuai <wenshuai@xiaomi.com >
2026-02-24 22:25:24 -08:00
Xinyu Chen
35d44b4557
[XPU]Support CUDAGraph on XPU Platform ( #34482 )
...
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com >
Co-authored-by: chzhang <chaojun.zhang@intel.com >
Co-authored-by: zhenwei-intel <zhenwei.liu@intel.com >
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com >
2026-02-24 22:22:52 -08:00
Kunshang Ji
8ad54a991b
[Platform] Add current_platform.num_compute_units interface ( #35042 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com >
2026-02-24 22:22:49 -08:00
Kunshang Ji
92510edc32
remove cuda check in top_k_top_p_triton kernel ( #35011 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2026-02-24 22:22:31 -08:00
Isotr0py
a6c137521c
[Misc] Add shard_id validation for MergedColumnLinear ( #35055 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-24 22:12:28 -08:00
Isotr0py
4572a06afe
[Misc] Enable weights loading tracking for quantized models ( #35074 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-24 22:11:03 -08:00
Zhengxu Chen
5cc29cfb8b
[compile] Improve error message during artifacts load failure. ( #35115 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com >
2026-02-24 22:01:09 -08:00
Chen Zhang
8fae54faff
[Linear Attention] fix bug for linear attention + prefix caching + reset_prefix_cache ( #35157 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2026-02-24 22:00:19 -08:00
Harry Mellor
f7967577f5
Remove requirement to use --hf-overrides for DeepseekVLV2ForCausalLM ( #35203 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-24 22:00:06 -08:00
pks
af770b8e7b
[Bugfix] Fix AttributeError when passing StructuredOutputsParams to CompletionRequest ( #35237 )
...
Signed-off-by: Patrick Simianer <patrick@lilt.com >
2026-02-24 22:00:03 -08:00
Andreas Karatzas
2ff3e436ad
[Responses][CI] Filter negative token IDs in schema fuzz test to avoid 500 errors ( #35231 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-25 05:52:44 +00:00
Jhao-Ting Chen
c2c4c4611a
[FIX] fused moe with lora shared expert dual stream (1.07x otps) ( #34933 )
...
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-02-25 04:40:45 +00:00
Rohan Potdar
f38f8c9742
[ROCm]: Enable customop and rope+kvcache fusion for AITER RoPE ( #35180 )
...
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com >
2026-02-25 04:36:40 +00:00
Flora Feng
ec1d30c0f6
[Responses] Decouple SSE event helpers from Harmony context ( #35148 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com >
2026-02-24 20:05:25 -08:00
Pooya Davoodi
e3b2324ec4
[Frontend] Use init_app_state and FrontendArgs in run_batch ( #32967 )
...
Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-24 19:40:39 -08:00
Nick Hill
dbf0da817a
[Core] Cleanup engine pause/sleep logic ( #34528 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-02-24 19:33:34 -08:00
Xin Yang
3bbb2046ff
[Bugfix] Fix expert_ids padding values in moe_align_block_size kernel ( #35161 )
...
Signed-off-by: Xin Yang <xyangx@amazon.com >
2026-02-24 17:14:24 -08:00
yugong333
576fe50333
Adding Nemotron fp8 Triton MoE Config ( #34674 )
...
Signed-off-by: Yu Gong <yu3.gong@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-02-24 15:56:38 -08:00
Hashem Hashemi
a0e50a4260
Convert wvSplitKQ to 16x16 MFMA in prep for mi4xx. ( #34100 )
...
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com >
2026-02-24 23:35:21 +00:00
Benjamin Chislett
9fa5b25a23
[Bug][DSV3.2] Always prepare metadata for DeepGEMM Sparse Attention ( #35075 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
2026-02-24 14:55:22 -08:00
Robert Shaw
ea97750414
[CI] Fix Distributed Tests ( #35236 )
...
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com >
2026-02-24 22:31:56 +00:00
Andreas Karatzas
067c5d9ad1
[ROCm][CI] Added MI325 mirrors ( #34923 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-24 13:37:15 -08:00
Benjamin Chislett
f5972a872f
[Model][Spec Decode] Nemotron-H MTP and Mamba Speculative Decoding Support ( #33726 )
...
Signed-off-by: Shahar Mor <smor@nvidia.com >
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Co-authored-by: Shahar Mor <smor@nvidia.com >
Co-authored-by: Roi Koren <roik@nvidia.com >
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com >
2026-02-24 09:49:56 -08:00
Matthew Bonanni
a9e15e040d
Add @MatthewBonanni to CODEOWNERS ( #35207 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-02-24 10:45:10 -07:00
Lucas Wilkinson
542ca66357
Revert "[CI/Build] Remove redundant OpenTelemetry pip install from CI configs" ( #35211 )
2026-02-24 09:26:42 -08:00
Cyrus Leung
fc8456c336
[CI/Build] Fix kernels test location ( #35205 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-24 09:20:34 -08:00
Wentao Ye
9ce8fad2a9
[Perf] Optimize Python Slice for Structured Output using islice instead of [:] ( #33593 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-24 09:02:36 -08:00
Harry Mellor
c38b8d5a31
Remove padding_index from models that don't use it for better Transformers v5 compatibility ( #35189 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-24 08:04:46 -08:00
Robert Shaw
60da0e1544
[CI] Remove Duplicated Tests ( #35199 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2026-02-24 23:53:30 +08:00
danisereb
9609b1f18d
Integrate flashinfer mm_mxfp8 in ModelOpt MXFP8 ( #35053 )
...
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com >
2026-02-24 08:45:13 -07:00
danisereb
a0c7081695
Fix fallback to default tactic (flashinfer autotuner) with trtllm_fp4_block_scale_moe ( #35088 )
...
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com >
2026-02-24 07:25:44 -08:00
R3hankhan
34ce0ffd1f
[CPU][Perf] Accelerate Attention head for s390x using vector intrinsics ( #34434 )
...
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com >
Co-authored-by: Li, Jiang <jiang1.li@intel.com >
2026-02-24 07:25:39 -08:00
Robin Nabel
0de5333989
Fix GLM4 parser tests ( #34905 )
...
Signed-off-by: Robin Nabel <opensource@nabel.co >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
2026-02-24 22:27:42 +08:00
Eldar Kurtić
a87cc50859
[Attn,KV-cache] Use per-head scales in the attention selector ( #34281 )
...
Signed-off-by: Your Name <you@example.com >
Signed-off-by: Eldar Kurtic <research@neuralmagic.com >
Co-authored-by: Eldar Kurtic <research@neuralmagic.com >
Co-authored-by: Your Name <you@example.com >
2026-02-24 09:02:43 -05:00
Cyrus Leung
761e63e541
[Frontend] Always pass supported_tasks to validation ( #35186 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-24 04:16:33 -08:00
Isotr0py
d12d201409
[Bugfix] Fix failing FunASR processor test ( #35111 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-24 04:13:45 -08:00
eustlb
b3ad37c5db
[glm-asr] change defaults dummy audio size ( #35108 )
...
Signed-off-by: Eustache Le Bihan <eulebihan@gmail.com >
2026-02-24 04:13:33 -08:00
Wentao Ye
14561fabfd
[Perf] Optimize pooling model redundant copy, 1.8% throughput improvement ( #35127 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-02-24 04:13:11 -08:00
Zhengxu Chen
c77f3e1207
[compile] Save aot compile artifacts atomically. ( #35117 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com >
2026-02-24 04:11:01 -08:00
Dor Huri
012dee9233
[Feature] Add LoRA tower/connector support for Llama 4 Vision (mllama4) ( #35147 )
...
Signed-off-by: dorhuri123 <dor.huri1@live.biu.ac.il >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2026-02-24 04:10:32 -08:00
Tugsbayasgalan Manlaibaatar
f1c664545b
Make voxtral compile friendly ( #33959 )
...
Signed-off-by: Tugsbayasgalan Manlaibaatar <tmanlaibaatar@fb.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2026-02-24 09:33:35 +01:00
Xin Yang
c870eb9e0f
[LoRA] Update LoRA expand kernel block_n calculation ( #32621 )
...
Signed-off-by: Xin Yang <xyangx@amazon.com >
2026-02-23 23:17:53 -08:00
BadrBasowid
6af03f2394
[Refactor] [1/N] Reorganize kernel abstraction directory ( #34055 )
...
Signed-off-by: BadrBasowid <badr.basowid@gmail.com >
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com >
Co-authored-by: TJian <tunjian.tan@embeddedllm.com >
2026-02-24 06:47:22 +00:00
Vlad Tiberiu Mihailescu
1a6cf39dec
[CI/Build] Remove redundant OpenTelemetry pip install from CI configs ( #35032 )
...
Signed-off-by: Vlad Mihailescu <vtmihailescu@gmail.com >
2026-02-23 22:24:11 -08:00
Nicolò Lucchesi
f91808ae0d
[MM] Allow audio chunking for offline LLM ( #34628 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-02-23 21:04:28 -08:00
Vadim Gimpelson
33a0d43c71
[BUGFIX][Qwen3.5] Hardcode mlp.gate as not quantizable ( #35156 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2026-02-23 19:42:24 -08:00
pschlan-amd
80d93fd6da
gpu_model_runner: Cache is_encoder_decoder from model config ( #35099 )
...
Signed-off-by: Patrick Schlangen <pschlan@amd.com >
2026-02-23 19:08:34 -08:00
Jia Guo
ec85340531
[Quantization] Support FP8 MoE bias for models like GPT-OSS ( #34906 )
...
Signed-off-by: jasperjiaguo <jasperg662@gmail.com >
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com >
2026-02-23 19:07:47 -08:00
Rohan Potdar
2ff4e51152
[ROCm] AITER fused RoPE+KVCache ( #33443 )
...
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com >
Signed-off-by: charlifu <charlifu@amd.com >
Signed-off-by: Rohan Potdar <66227218+Rohan138@users.noreply.github.com >
Co-authored-by: charlifu <charlifu@amd.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Douglas Lehr <91553416+dllehr-amd@users.noreply.github.com >
2026-02-23 19:06:00 -08:00
Asaf Gardin
95642441d0
[Mamba1] - Change supports_update_block_table to True ( #35054 )
...
Signed-off-by: Josephasafg <ajgard7@gmail.com >
2026-02-23 19:05:57 -08:00
Xin Yang
a7c9f7b7ec
[Bugfix] Fix lora_ids in FusedMoE LoRA test ( #35135 )
...
Signed-off-by: Xin Yang <xyangx@amazon.com >
2026-02-23 21:49:25 -05:00
Michael Goin
a4bd661fb3
[Perf] Enable FlashInfer DeepGEMM swapAB on SM90 by default ( #34924 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-02-23 17:34:41 -08:00
Michael Goin
3ef9fd0f98
[Bugfix] Fix DSV3 kernels breaking _C and _moe_C on unsupported arches ( #35123 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-02-23 17:11:27 -08:00
Michael Goin
22a97e6613
[Perf] Improve default triton fused moe configs ( #34846 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-02-23 16:01:28 -08:00
Aaron Hao
596ed1f02e
[RL] Validation for pause_mode='keep' ( #34992 )
...
Signed-off-by: ahao-anyscale <ahao@anyscale.com >
2026-02-23 16:30:56 -05:00
Nicolò Lucchesi
b8d8b7e934
[Misc] Monitor interface changes ( #35113 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-02-23 17:14:51 +00:00
Harry Mellor
28c5e69ba0
Enforce that model is the first positional arg when --served-model-name is used ( #34973 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-23 08:38:05 -08:00
Harry Mellor
864167d376
Fix custom processors that use deleted import for Transformers v5 ( #35101 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-23 08:38:00 -08:00
haosdent
a2ba6a5244
[Bugfix] Fix prefix caching for Mamba 'all' mode (Nemotron models) ( #34874 )
...
Signed-off-by: haosdent <haosdent@gmail.com >
2026-02-23 17:31:51 +01:00
Harry Mellor
c4f38696f7
Use Xet high performance mode for Transformers v5 ( #35098 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-23 08:19:30 -08:00
haosdent
a7f341c323
[Bugfix] Fix MRotaryEmbedding missing truncate attr with YaRN scaling ( #35080 )
...
Signed-off-by: haosdent <haosdent@gmail.com >
2026-02-23 16:05:52 +00:00
Robert Shaw
d13ece38d7
[CI] Skip Responses API ( #34990 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2026-02-23 07:46:45 -08:00
Mark McLoughlin
5cc7c4452e
[Metrics] Add Prometheus counters for Model FLOPs Utilization (MFU) ( #30950 )
...
Export the existing Model FLOPs Utilization (MFU) metrics via Prometheus.
`--enable-mfu-metrics` is required for these to be exposed.
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2026-02-23 15:01:07 +00:00
Eldar Kurtić
b95bb6927f
[kv-cache, ct] Use compressed-tensors as a source of ground-truth for quant strategies ( #34254 )
...
Signed-off-by: Your Name <you@example.com >
Co-authored-by: Your Name <you@example.com >
2026-02-23 07:37:55 -07:00
Cyrus Leung
392645454b
[Refactor] Decouple TimingContext from InputProcessingContext ( #35083 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-23 14:15:50 +00:00
Eldar Kurtić
1e8438a89a
[Llama4,CI] Bring back Llama-4 bug fixes, and also fix Maverick tests ( #35033 )
...
Signed-off-by: Eldar Kurtic <you@example.com >
Co-authored-by: Eldar Kurtic <you@example.com >
2026-02-23 09:04:34 -05:00
Robert Shaw
8435b2e049
[ModelBash][DSV3] Add TRTLLM DSV3 Router GEMM kernel (6% B1 Speedup) ( #34302 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2026-02-23 14:02:26 +00:00
Yan Ma
b1b5e045df
[XPU] allow TORCH_SDPA/TRITON_ATTN as XPU vit Backend ( #35010 )
...
Signed-off-by: Yan Ma <yan.ma@intel.com >
2026-02-23 05:06:44 -08:00
Andreas Karatzas
5f68464f92
[ROCm][CI] Fix spec decode profile assertion and logprob test determinism ( #35043 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-23 05:05:54 -08:00
Vincent Gimenes
aa08a30fc9
[CLEANING] Remove unused disable_by_batch_size from SpeculativeConfig ( #35060 )
...
Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com >
2026-02-23 05:05:36 -08:00
Wentao Ye
7f40e9e516
[Refactor] Remove dead private func _fp8_perm and _extract_mask_for_item ( #35068 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-02-23 05:05:20 -08:00
Harry Mellor
103e614b14
Fix pipeline parallel with embed scaling in the Transformers modelling backend ( #35094 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-23 05:04:47 -08:00
Neil Schemenauer
54e2f83d0a
[Feature] Lazy import for the "mistral" tokenizer module. ( #34651 )
...
Signed-off-by: Neil Schemenauer <nas@arctrix.com >
2026-02-23 00:43:01 -08:00
Gabe Goodhart
e631f8e78e
fix: Apply embedding_multiplier to inputs_embeds ( #34813 )
...
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-23 00:42:46 -08:00
Martin Hickey
e97c46a92d
[BugFix]: Fix local mypy issues ( #34739 )
...
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-23 00:40:29 -08:00
Jee Jee Li
7291d1b288
[Bugfix] Fix kernel benchmark ( #33752 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2026-02-22 21:18:08 -08:00
Cyrus Leung
987506bca6
[Refactor] Simplify dummy data generation ( #35025 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-22 20:55:27 -08:00
Woosuk Kwon
c645e9a214
[Model Runner V2] Remove propose_draft method ( #35070 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-22 18:27:12 -08:00
Nick Hill
944ffb5968
[Model Runner V2][Minor] Remove redundant do_spec_decode field ( #35039 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
Co-authored-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-22 16:18:04 -08:00
qizixi
2bcf71b9c0
[Spec Decode] Reduce TP communication for speculative decoding draft token generation ( #34049 )
...
Signed-off-by: qizixi <qizixi@meta.com >
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com >
2026-02-22 14:59:16 -08:00
tacos8me
b7892a3bef
[Model] Add NVFP4 quantization support for Step3.5-Flash ( #34478 )
...
Signed-off-by: tacos8me <ian@cloudhabit.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-02-22 12:30:46 -07:00
Benjamin Chislett
682566b18e
[Bug] Refactor max_num_batched_tokens to account for drafting ( #34898 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
2026-02-22 11:18:46 -05:00
qizixi
b9c2a565cc
[Spec Decode] Defer clearing KV connector metadata for EAGLE3 speculative decode + prefill / decode disagg setup ( #34529 )
...
Signed-off-by: qizixi <qizixi@meta.com >
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com >
2026-02-22 08:08:32 -08:00
Andreas Karatzas
dd8c3a7fb2
[ROCm][CI] Fix realtime test timeouts caused by aiter JIT compilation delays ( #35052 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-22 10:07:18 +00:00
Andreas Karatzas
a8a47c17b6
[ROCm][CI] Fix flaky embedding chat test by using tolerance-based comparison ( #35050 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-22 09:03:44 +00:00
Roger Wang
40f88d8318
[Bugfix] Fix Qwen3/Qwen3.5 Reasoning Parser ( #34779 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2026-02-21 23:15:35 -08:00
Woosuk Kwon
2cbf9656ce
[Model Runner V2] Enable CUDA graph for Eagle3 ( #35040 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-21 21:42:50 -08:00
Xiao Li
30132cd144
Fix apply_top_k_top_p_triton called by non-cuda logits Tensor ( #35030 )
...
Signed-off-by: Xiao Li <ilx@meta.com >
2026-02-21 21:11:54 -08:00
Cyrus Leung
cbd95a2dd1
[Benchmark] Use sns.relplot for plotting ( #35027 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-21 20:26:48 -08:00
Athrael Soju
970861ac0c
[New Model] Add ColModernVBERT ( #34558 )
...
Signed-off-by: Athrael Soju <athrael.soju@gmail.com >
Signed-off-by: athrael-soju <athrael-soju@users.noreply.github.com >
2026-02-22 12:23:41 +08:00
Wentao Ye
d24bdd7c4b
[CI] Bump mteb version to mteb[bm25s]>=2, <3 for pooling model unit tests ( #34961 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-02-21 20:23:24 -08:00
Andreas Karatzas
d403c1da1c
[CI] Stabilizing ROCm amd-ci signal and minor name fix in upstream ( #35008 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-22 04:01:10 +00:00
Woosuk Kwon
b71fbd06e2
[Model Runner V2] Support attention group ( #35036 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-21 16:42:53 -08:00
Vadim Gimpelson
74d90b1ce4
[Model Bash][DSR1] Add selective dynamic shape marking for CustomOp ( #34900 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2026-02-21 19:28:01 -05:00
Woosuk Kwon
a4047d4ea9
[Model Runner V2] Support Eagle3 (no CUDA graph) ( #35029 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-21 12:55:24 -08:00
Cyrus Leung
965fe45935
[CI/Build] Fix gRPC version mismatch ( #35013 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-21 12:14:41 -07:00
Roman
98b0205c3c
[Frontend] Add automatic language detection for Whisper transcription ( #34342 )
...
Signed-off-by: space_check <roman.vuskov@rwth-aachen.de >
Signed-off-by: Roman <45857014+spacecheck@users.noreply.github.com >
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2026-02-21 04:49:41 -08:00
Huy Do
272b535ab3
[Bugfix] Gate 256-bit instructions to CUDA 12.9+ ( #34791 )
...
Signed-off-by: Huy Do <huydhn@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-21 04:48:14 -08:00
Cyrus Leung
f74f1572ca
[Benchmark] Improve benchmarks ( #35012 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-21 10:31:58 +00:00
petrpechman
bebfe55b1c
[Doc] Fix example of eagle3 ( #34960 )
...
Signed-off-by: Petr Pechman <petr.pechman@firma.seznam.cz >
Co-authored-by: Petr Pechman <petr.pechman@firma.seznam.cz >
2026-02-21 09:57:53 +00:00
Nick Hill
820d7815eb
[Core] Minor structured-output related scheduler optimization ( #34765 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-21 01:38:28 -08:00
Nicolò Lucchesi
ab6f3487a6
[PD] Change kv_load_failure_policy Default from "recompute" to "fail" ( #34896 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-21 01:34:57 -08:00
BADAOUI Abdennacer
8dc8a99b56
[ROCm] Enable bitsandbytes quantization support on ROCm ( #34688 )
...
Signed-off-by: badaoui <abdennacerbadaoui0@gmail.com >
2026-02-21 00:34:55 -08:00
jennyyyyzhen
2aab2bb543
[ROCM] Optimize ROCM_AITER_FA spec decode eagle performance ( #34541 )
...
Signed-off-by: jennyyyyzhen <yzhen@hmc.edu >
2026-02-20 20:32:05 -08:00
Andreas Karatzas
54254f7a61
[ROCm][CI] Fix spec decode logprobs flakiness and parametrize tree attention backends ( #34599 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-20 20:25:23 -08:00
Andreas Karatzas
cf93c1a128
[ROCm][AITER] Fix aiter paged_attention_v1 decode for sliding window and head_size < 64 ( #34570 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-20 20:25:07 -08:00
Andreas Karatzas
89358f0d35
[CI] Fix ColBERT HF comparison tests on AMD CI + refactor ( #34567 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-20 20:12:05 -08:00
zhongdaor-nv
a0fe7ea2f0
[feat] Add per-block extra_keys to KV events ( #33304 )
...
Signed-off-by: zhongdaor-nv <zhongdaor@nvidia.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-20 20:11:40 -08:00
Andreas Karatzas
991d6bff38
[CI][MCP][Harmony] Heavy refactoring Harmony & MCP response tests and stabilizing with deterministic test infrastructure ( #33949 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-20 20:03:32 -08:00
Kata Coder
5719a4e4e6
[Frontend] Support multimodal inputs for late-interaction scoring (ColQwen3) + NewModel: nvidia/nemotron-colembed ( #34574 )
...
Signed-off-by: craftsangjae <craftsangjae@gmail.com >
2026-02-20 20:01:40 -08:00
pougetat
11be2c74dc
[Realtime] Add Qwen3-ASR realtime streaming support ( #34613 )
...
Signed-off-by: Thomas Pouget-Abadie <thomaspou@microsoft.com >
Co-authored-by: Thomas Pouget-Abadie <thomaspou@microsoft.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2026-02-20 19:59:42 -08:00
Xin Yang
7a5adad480
[Kernel] Optimize sample_recovered_tokens_kernel ( #34974 )
...
Signed-off-by: Xin Yang <xyangx@amazon.com >
2026-02-20 19:59:06 -08:00
Li
59c6233297
Support prompt_embeds for pooling requests in output processor ( #34904 )
...
Signed-off-by: Li Zhang <lzhanga@amazon.com >
Co-authored-by: Li Zhang <lzhanga@amazon.com >
2026-02-20 19:57:38 -08:00
Taneem Ibrahim
d38cd3dde5
[Misc] Fix mypy errors in vllm/profiler and remove from exclude list ( #34959 )
...
Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com >
2026-02-20 19:56:33 -08:00
Rohan Potdar
ded333fb9b
[ROCm][Bugfix]: Only save unpadded sizes for shared_experts in MoERunner to fix rmsnorm pad fusion ( #34636 )
...
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com >
2026-02-20 19:56:16 -08:00
Yanan Cao
9d7577b2bd
[Kernel] [Helion] [9/N] Canonicalize GPU variant names to base model names ( #34928 )
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
2026-02-20 19:55:51 -08:00
Vlad Tiberiu Mihailescu
e739c29ea4
[CI/Build] Add opentelemetry libs in default vllm build (requirements/common.txt) ( #34466 )
...
Signed-off-by: Vlad Mihailescu <vtmihailescu@gmail.com >
2026-02-20 19:54:55 -08:00
yugong333
a55caf6ae9
[LoRA] Support Quantized Adapters ( #30286 )
...
Signed-off-by: Yu Gong <yu3.gong@gmail.com >
Signed-off-by: wz1qqx <ziqi.wang@novita.ai >
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: wz1qqx <55830058+wz1qqx@users.noreply.github.com >
Co-authored-by: wz1qqx <ziqi.wang@novita.ai >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-20 19:54:35 -08:00
Lucas Wilkinson
0e22cd618b
Revert "[Llama4,Quantization] Simplify and generalize logic for Q/K permutations in quantized self-attn layers " ( #34997 )
2026-02-20 17:19:19 -08:00
Wei Zhao
ea5f903f80
Bump Flashinfer Version and Re-enable DeepSeek NVFP4 AR+Norm Fusion ( #34899 )
...
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-20 13:37:31 -08:00
Ryan Rock
0632ed8778
[AMD][CI] Fix test_custom_allreduce for A100 testgroup ( #34735 )
...
Signed-off-by: Ryan Rock <ryan.rock@amd.com >
2026-02-20 21:33:04 +00:00
Lucas Wilkinson
aaefc58ee0
[CI] Revert PRs 34818 and 33600 ( #34979 )
2026-02-20 13:25:50 -08:00
Wei Zhao
f24b2de3d3
[Test] Add FP8 KV Cache Testing for MLA Backends ( #34473 )
...
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com >
2026-02-20 18:51:58 +00:00
Michael Goin
fac1507f03
[CI] Remove failing prime-rl integration test ( #34843 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
2026-02-20 10:17:42 -08:00
Zhengxu Chen
f863994084
[compile] Fix torch.compile time discrepancy in logging. ( #34912 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-20 08:47:14 -08:00
Zhengxu Chen
e4a5d8c653
[compile] Move torch_aot_compile directory under torch_compile_cache ( #34831 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com >
2026-02-20 08:46:45 -08:00
Yanan Cao
a6d0299c75
[Kernel] [Helion] [6/N] Add num_tokens dimension to silu_mul autotuning and dispatching ( #34185 )
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com >
2026-02-20 08:36:51 -08:00
Harry Mellor
6ce80f7071
Ensure that MkDocs v2 does not get installed ( #34958 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-20 15:38:11 +00:00
Huamin Li
1fe462168c
[perf] Avoid dtype promotion sync in mamba_get_block_table_tensor ( #34870 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-20 06:21:56 -08:00
Flora Feng
ed31a020ee
[Refactor] Extract Harmony streaming SSE event builders into streaming_events.py ( #34909 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-20 06:20:46 -08:00
Cyrus Leung
f9ac19204f
[V0 Deprecation] Remove unused MM placeholders in request output ( #34944 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-20 06:19:23 -08:00
Vadim Gimpelson
59965affbd
[BUGFIX] Fix _dummy_run missing prepare_inputs_event synchronization ( #34866 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2026-02-20 05:54:27 -08:00
Xin Yang
b1c4f0b265
[Kernel] Optimize grouped topk kernel ( #34206 )
...
Signed-off-by: Xin Yang <xyangx@amazon.com >
2026-02-20 01:34:45 -08:00
Kevin McKay
8de7c636cc
[Bugfix][Hardware][AMD] Fix ROCM_AITER_FA speculative decoding support ( #32877 )
...
Signed-off-by: c0de128 <kevin.mckay@outlook.com >
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com >
2026-02-19 22:25:46 -08:00
Frank Wang
059779231f
[Minor] Add logging when using MXFP4 MXFP8 TRTLLM backend ( #34916 )
...
Signed-off-by: frankwang28 <frank.wbb@hotmail.com >
Signed-off-by: Frank Wang <41319051+frankwang28@users.noreply.github.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2026-02-19 22:07:57 -08:00
tianshu-Michael-yu
ea37530b47
[Models] LFM2: Support LoRA ( #34921 )
...
Co-authored-by: Piotr Mazurek <piotr635@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-19 22:07:23 -08:00
Micah Williamson
f5432e35a3
[ROCm][CI] Loosen RemoteOpenAIServer Startup Timeout ( #34922 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2026-02-20 05:37:49 +00:00
杨朱 · Kiki
07cab212f0
[Misc] Add deprecated environment variable utilities ( #33677 )
...
Signed-off-by: carlory <baofa.fan@daocloud.io >
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-19 21:33:25 -08:00
rasmith
0c1dc42748
[CI][AMD][BugFix][P/D] Add default_vllm_config to test_moriio_connector.py so tests pass ( #33739 )
...
Signed-off-by: Randall Smith <Randall.Smith@amd.com >
2026-02-19 21:32:40 -08:00
Varun Chawla
676f82ae81
Add validation to reject non-text content in system messages ( #34072 )
...
Signed-off-by: Varun Chawla <varun_6april@hotmail.com >
2026-02-19 21:30:33 -08:00
Elizabeth Thomas
81bfc21a6a
[Model Bash]: Improve FP8 Oracle for Config Specific Kernel Selection ( #34260 )
...
Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com >
Signed-off-by: Robert Shaw <robertgshaw2-redhat@h100-02.nemg-001.lab.rdu2.dc.redhat.com >
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com >
Co-authored-by: Robert Shaw <robertgshaw2-redhat@h100-02.nemg-001.lab.rdu2.dc.redhat.com >
Co-authored-by: Robert Shaw <robertgshaw2@gmail.com >
2026-02-19 21:29:08 -08:00
Matthias Gehre
4e2c7caf2d
[Bugfix] Add regression test for MoE quant_config under torch.compile ( #34335 )
...
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com >
2026-02-20 13:27:26 +08:00
Bowen Bao
d9e62c03eb
[Quark] Fix MoE fp8 activation scale handling on mi300 ( #34386 )
...
Signed-off-by: Bowen Bao <bowenbao@amd.com >
2026-02-19 21:27:14 -08:00
Kevin H. Luu
a1a2d79442
[ci] Use the right tag for CPU arm64 image ( #34915 )
...
Signed-off-by: Kevin H. Luu <khluu000@gmail.com >
2026-02-19 19:59:15 -08:00
Cyrus Leung
ac900c89bb
[Refactor] Implement output type check in LLM ( #34794 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-19 19:57:55 -08:00
Mark McLoughlin
76df6072ff
[Core] Fix state names in pause_scheduler() ( #34840 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2026-02-19 17:21:46 -08:00
Michael Goin
16f24e8797
[CI] Add GPT-OSS Eval job for H100 ( #34359 )
...
Signed-off-by: Michael Goin <mgoin64@gmail.com >
2026-02-19 17:14:54 -08:00
Nick Hill
40b2f1c3d9
[Model Runner V2] Minor CPU optimizations ( #34856 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-02-19 16:05:37 -08:00
Mayank Ketkar
648951a9c3
[Bugfix] Fix benchmark_fused_collective crash on CustomOp init ( #34665 )
...
Signed-off-by: Mayank Ketkar <mketkar@zoox.com >
Signed-off-by: Mayank Ketkar <mayket04@gmail.com >
Co-authored-by: Mayank Ketkar <mketkar@zoox.com >
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com >
2026-02-19 19:01:00 -05:00
Michael Goin
f72061a19a
[UX] More descriptive reasons in is_supported_config for MoE ( #34908 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-02-19 15:20:52 -08:00
Matthew Bonanni
662205d34e
[Bugfix] Fix Basic Models Test ( #34818 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-02-19 14:49:07 -08:00
Roger Wang
4fb8beefaa
[Bugfix] Fix cutlass fp8 kernel on hopper for Qwen3.5 ( #34914 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2026-02-19 13:34:55 -08:00
Alexei-V-Ivanov-AMD
304319c4ed
Change targets for AMD build in the "CI" pipeline ( #34918 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
2026-02-19 21:26:53 +00:00
Wentao Ye
c683d11c94
[Refactor] Deprecate head_first for chunk_gated_delta_rule ( #34263 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-02-19 13:23:49 -05:00
roikoren755
3eff45d793
Revert "[NemotronH] Do not force router to run in fp32 ( #34582 )" ( #34808 )
...
Signed-off-by: Roi Koren <roik@nvidia.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-02-19 09:47:05 -08:00
Robert Shaw
4685a630a2
[Model Bash][DeepSeekR1] Remove Shared Expert Clone ( #34344 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2026-02-19 07:56:14 -08:00
Eldar Kurtić
ee1d25f199
[Llama4,Quantization] Simplify and generalize logic for Q/K permutations in quantized self-attn layers ( #34471 )
...
Signed-off-by: Your Name <you@example.com >
Co-authored-by: Your Name <you@example.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-02-19 07:55:41 -08:00
Linda
6fff24f30f
[Bugfix] Qwen3.5 kv-scale weight remapping ( #34719 )
...
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com >
2026-02-19 04:13:37 -08:00
Cyrus Leung
23210a911e
[CI/Build] Try to make beam search test less flaky ( #34885 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-19 19:16:58 +08:00
Cyrus Leung
1391378861
[Bugfix] Fix edge case in UUID data parsing ( #34884 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-19 02:24:30 -08:00
Andreas Karatzas
f6220f9877
[ROCm][Test] Fix beam search determinism failures from batch-size-dependent FP divergence and removed wrong marker ( #34878 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-19 08:25:26 +00:00
Andreas Karatzas
2df2bb27b0
[ROCm][CI] Removing all blocking labels from MI355 until stable infra ( #34879 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-19 07:53:08 +00:00
Tal Nir
f75b61a9e9
[Voxtral Realtime] Fix engine crash on empty multimodal embeddings ( #34862 )
...
Signed-off-by: Tal Nir <tal@nervexneurotech.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
2026-02-18 23:21:47 -08:00
Wei Zhao
7f51e93864
[Bug] Fix DeepSeek V3 weight loading caused by incorrect prefix ( #34876 )
...
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com >
2026-02-18 23:20:30 -08:00
Alex Brooks
4611af1663
[Bugfix] Add Quant Config to Llava Next Projector ( #34847 )
...
Signed-off-by: Alex Brooks <albrooks@redhat.com >
2026-02-18 23:18:23 -08:00
Manrique Vargas
ad5aa6bd9f
fix(docs): fix typos in comments and docstrings ( #34836 )
...
Signed-off-by: machov <mv1742@nyu.edu >
2026-02-18 23:17:41 -08:00
Jaeyeon Kim(김재연)
9681068cf9
[Frontend] Fix reasoning_tokens for text-based parsers in Responses API ( #33513 )
...
Signed-off-by: Jaeyeon Kim <anencore94@gmail.com >
2026-02-18 23:16:41 -08:00
Kevin H. Luu
b6101d384d
Deprecate test-pipeline.yaml ( #34864 )
...
Signed-off-by: Kevin H. Luu <khluu000@gmail.com >
2026-02-19 02:15:27 +00:00
Woosuk Kwon
5fcb0cdd68
[Model Runner V2] Use FP32 for Gumbel Noise ( #34854 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-18 17:07:37 -08:00
Woosuk Kwon
c878b43b64
[Model Runner V2] Remove unnecessary copies in PW CUDA graph capture ( #34849 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-18 15:52:50 -08:00
rasmith
2b84ac669c
[CI][AMD][BugFix] Use torch.testing.assert_close instead of assert torch.allclose in test_rocm_skinny_gemms.py ( #34181 )
...
Signed-off-by: Randall Smith <Randall.Smith@amd.com >
2026-02-18 23:10:19 +00:00
zhrrr
11d3976b88
[Model Runner V2] support piecewise & mixed cudagraph ( #32771 )
...
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com >
2026-02-18 15:03:17 -08:00
Yongye Zhu
40da9625a1
[MoE Refactor] Convert mxfp4 marlin into modular kernel format ( #34588 )
...
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-02-18 14:37:14 -08:00
Flora Feng
8d9babd4de
Fix empty tool_call_id in Anthropic messages API tool result conversion ( #34745 )
...
Signed-off-by: <>
Signed-off-by: sfeng33 <4florafeng@gmail.com >
Co-authored-by: Flora Feng <sfeng33@h100-01.nemg-001.lab.rdu2.dc.redhat.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2026-02-18 14:31:59 -08:00
Aaron Hao
e99ba957ec
[BUG] Fixing Weight Sync unit test ( #34841 )
...
Signed-off-by: ahao-anyscale <ahao@anyscale.com >
2026-02-18 17:20:10 -05:00
Kyle Sayers
64ac1395e8
[Docs] Clean up speculators docs ( #34065 )
...
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com >
2026-02-18 13:48:11 -08:00
Cyrus Leung
61cf087680
[Bugfix] Fix lora tests ( #34834 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-02-18 13:22:31 -08:00
Wenlong Wang
847a57cd12
[Bugfix][MoE Kernel] Fix incorrect routing selection for models without expert groups (e.g., MiniMax-M2.1) ( #34673 )
...
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com >
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-02-18 13:03:24 -08:00
rasmith
fcd6ac97ed
[CI][AMD][BugFix] Skip tests in test_unquantized_backend_selection that should not run on ROCm ( #34655 )
...
Signed-off-by: Randall Smith <Randall.Smith@amd.com >
2026-02-18 15:00:40 -05:00
Woosuk Kwon
95be2a7f22
[Model Runner V2] Minor simplification for DCP ( #34786 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-18 11:04:53 -08:00
Jaden Mathias
0e60c925cf
[Bugfix] Remove assert causing hipErrorStreamCaptureUnsupported ( #34455 )
...
Signed-off-by: Jaden Mathias <jaden.mathias@amd.com >
2026-02-18 18:54:54 +00:00
Teng Ma
d7ff22204a
[Misc] Add mooncake-transfer-engine to kv_connectors requirements ( #34826 )
...
Signed-off-by: Teng Ma <teng-ma@linux.alibaba.com >
2026-02-18 18:26:24 +00:00
Isotr0py
c0bd8b13da
[Bugfix] Redo Qwen3.5/Qwen3-Next GDN projector fusion ( #34697 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com >
2026-02-18 09:46:53 -08:00
Michael Goin
caeb887bf6
[Bugfix] Fix NVFP4 TRTLLM MoE non-gated support; add gsm8k for Nemotron-3-Nano FP8+NVFP4 ( #34725 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-02-18 09:39:22 -08:00
Ilya Markov
6b3166a7c7
[CI][Bugfix] Fix multinode test script ( #34820 )
...
Signed-off-by: ilmarkov <markovilya197@gmail.com >
2026-02-18 11:45:10 -05:00
Robert Shaw
25e2e136ef
[CI] temporarily disable multi-node tests ( #34825 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2026-02-18 11:32:44 -05:00
Robert Shaw
6874638bc4
[Model Bash] DeepSeek R1 BF16 Min Latency QKV A GEMM (0.5% E2E Speedup) ( #34758 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2026-02-18 07:42:36 -08:00
Burkhard Ringlein
e24663c5a9
Add unit tests for fp8 output fusion of triton_attn ( #34228 )
...
Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2026-02-18 06:22:49 -05:00
Nick Hill
c50e105a88
[Model Runner V2] Avoid prepare prefill kernel launch overhead ( #34780 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-02-18 00:49:21 -08:00
Cyrus Leung
a766b30349
[Renderer] Deprecate code paths for old input processing ( #34775 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-18 00:35:04 -08:00
Asaf Joseph Gardin
1faa8cb73c
[Quantization] - Added uses_meta_device_weights to quant config ( #34645 )
...
Signed-off-by: Josephasafg <ajgard7@gmail.com >
2026-02-17 23:43:44 -08:00
Marek Michalowski
e89a91d927
[Bugfix] fix activation in cpu_fused_moe_torch call ( #34696 )
...
Signed-off-by: Marek Michalowski <marek.michalowski@arm.com >
2026-02-17 23:39:46 -08:00
Michael Goin
909b147197
[Bugfix] Fix prefix creation for Qwen3.5 ( #34723 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-02-17 23:39:15 -08:00
ElizaWszola
a88b3be7c4
[Bugfix] Fix quant RMS norm fusion for quantization with TMA-aligned scales ( #33255 )
...
Signed-off-by: ElizaWszola <ewszola@redhat.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2026-02-17 23:35:04 -08:00
Nick Hill
a49ea5a58f
[Model Runner V2] A bit more PP simplification ( #34766 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-02-17 21:39:07 -08:00
Cyrus Leung
30ebe0dc3c
[CI/Build] Remove use of skip_v1 ( #34699 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-18 12:19:11 +08:00
Andreas Karatzas
cef65f0715
[ROCm][CI] Removed hard-coded attn backend requirement for Qwen VL ( #34753 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-18 03:59:53 +00:00
Russell Bryant
6f3b2047ab
[Core] Fix SSRF bypass via backslash-@ URL parsing inconsistency ( #34743 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: isotr0py <2037008807@qq.com >
2026-02-18 03:53:35 +00:00
Luka Govedič
02e8f26cea
[torch.compile] Turn on silu+fp4 quant fusion by default for O1+ ( #34718 )
...
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
2026-02-18 03:29:15 +00:00
Hongxia Yang
4a00a511bb
[BugFix] [Build] fix string literals comparison in indexer_k_quant_and_cache calling site ( #34653 )
...
Signed-off-by: Hongxia Yang <hongxiay.yang@amd.com >
Co-authored-by: Hongxia Yang <hongxiay.yang@amd.com >
2026-02-17 19:19:41 -08:00
Cyrus Leung
a0d8d944e2
[Renderer] Move MM Hash parsing into Renderer ( #34711 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-17 19:18:55 -08:00
Amr Mahdi
df3f537a66
[CI] Remove unused precompiled wheel args from image build ( #34767 )
...
Signed-off-by: Amr Mahdi <amrmahdi@meta.com >
2026-02-17 18:58:18 -08:00
Matthew Bonanni
7743152957
[Attention] Refactor check_and_update_config ( #33600 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-02-17 17:06:54 -08:00
Wentao Ye
ab33d2a629
[Feature] Decode Context Parallel support for GPU model runner v2 ( #34179 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-02-17 16:27:15 -08:00
Woosuk Kwon
be3af2d29e
[Model Runner V2] Further simplification for PP ( #34724 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-17 15:18:18 -08:00
Jongseok Park
c656ba3b4d
[Kernel] Triton-based Top-k and Top-p sampler kernels ( #33538 )
...
Signed-off-by: js_park <cakeng@naver.com >
Signed-off-by: Jongseok Park <37990712+cakeng@users.noreply.github.com >
Signed-off-by: Sunga Kim <sunga.kim@berkeley.edu >
Signed-off-by: Nick Hill <nickhill123@gmail.com >
Co-authored-by: Sunga Kim <sunga.kim@berkeley.edu >
Co-authored-by: Nick Hill <nickhill123@gmail.com >
2026-02-17 23:14:30 +00:00
Matthew Bonanni
dc5fa77a4e
[Bugfix][MTP][Sparse MLA] Allow sparse MLA with MTP to run with FULL cudagraphs ( #34457 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2026-02-17 14:01:27 -05:00
Flora Feng
1e4a084c8e
[CI] Fix flaky test_parsable_context ( #34717 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com >
2026-02-17 18:42:52 +00:00
Richard Zou
7967e854da
[BugFix] Fix sp tests ( #34716 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2026-02-17 17:07:56 +00:00
almayne
6bd6d0c3c1
Fixed whisper CPU test that does not spawn properly. ( #34324 )
...
Signed-off-by: Anna Mayne <anna.mayne@arm.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-17 06:46:23 -08:00
Nicolò Lucchesi
8e962fef5f
[CI][Nixl] Add CrossLayer KV layout tests ( #34615 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-02-17 21:35:40 +08:00
Cyrus Leung
574fe75245
[Renderer] Move InputPreprocessor into Renderer (2/2) ( #34560 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-17 05:29:01 -08:00
junuxyz
c61a98f529
[CI][BugFix] ShellCheck cleanup to remove baseline and preserve runtime behavior ( #34514 )
...
Signed-off-by: junuxyz <216036880+junuxyz@users.noreply.github.com >
2026-02-17 12:22:56 +00:00
Harry Mellor
28bffe9466
Fix docs build warning ( #34686 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-17 02:31:40 -08:00
ChenqianCao
ad65177a19
[Bugfix] Fix 'remove_instance_endpoint' method logic in disagg_proxy_demo ( #32922 )
...
Signed-off-by: ChenqianCao <39755070+ChenqianCao@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-17 10:06:53 +00:00
Tim Dettmers
d44a5b6c47
Remove dead bitsandbytes CxB code from 8-bit inference path ( #34633 )
...
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
2026-02-17 01:49:14 -08:00
Jiangyun Zhu
1d65283e95
Revert "[Models] Fuse Qwen3.5 GDN's qkvz_proj and ba_proj" ( #34683 )
2026-02-17 01:29:27 -08:00
kourosh hakhamaneshi
c464b57374
[Ray] Propagate third-party env vars to Ray workers via prefix matching ( #34383 )
...
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com >
Co-authored-by: Cursor <cursoragent@cursor.com >
2026-02-17 01:08:42 -08:00
Amr Mahdi
c5c38e152a
[CI] Fix bake config artifact path for AMI rebuild pipeline ( #34656 )
...
Signed-off-by: Amr Mahdi <amrmahdi@meta.com >
2026-02-17 06:39:44 +00:00
Woosuk Kwon
d00df624f3
[Model Runner V2] Minor refactoring for penalties ( #34662 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-16 21:43:00 -08:00
Woosuk Kwon
9752da9d9c
[Model Runner V2] Minor simplification for BadWordsState ( #34669 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-16 21:27:24 -08:00
Woosuk Kwon
04925b2202
[Model Runner V2] Minor cleanup for PP ( #34666 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-16 19:15:31 -08:00
Woosuk Kwon
d74278fb67
[Model Runner V2] Fix unintended CPU-GPU sync in make_dummy ( #34667 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-16 19:00:29 -08:00
haosdent
b68fd899d1
[Bugfix] Fix fused MoE int32 overflow in stride*offset without perf regression ( #34507 )
...
Signed-off-by: haosdent <haosdent@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-02-16 17:58:49 -08:00
Aneesh Puttur
0b5f9b7204
[CI] Enable mypy import following for vllm/v1/kv_offload ( #34639 )
...
Signed-off-by: Aneesh Puttur <aneeshputtur@gmail.com >
2026-02-17 09:58:15 +08:00
zhanqiuhu
9a8853f781
[Core] Pipeline Parallel support for Model Runner V2 ( #33960 )
...
Signed-off-by: Zhanqiu Hu <zh338@cornell.edu >
2026-02-16 17:48:16 -08:00
zhrrr
387a1898d9
[Model Runner V2] support bad_words sampling param ( #33433 )
...
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com >
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
Co-authored-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-16 16:36:06 -08:00
roikoren755
3b30e61507
[NemotronH] Do not force router to run in fp32 ( #34582 )
...
Signed-off-by: Roi Koren <roik@nvidia.com >
2026-02-16 10:15:32 -08:00
Alexei-V-Ivanov-AMD
824f9e8f3c
Targeting the MI355 agent pool with all existing tests ( #34629 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
2026-02-16 17:02:27 +00:00
Nicolò Lucchesi
6cc403e67d
[Bugfix][CI] Fix flaky entrypoints/openai/test_response_api_with_harmony.py::test_function_calling[openai/gpt-oss-20b] ( #34624 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-02-16 16:11:07 +00:00
Almog Tavor
72d5951d02
[Bugfix] Treat generation_config max_tokens as default not ceiling ( #34063 )
...
Signed-off-by: almogtavor <almogtavor@gmail.com >
2026-02-16 07:58:24 -08:00
Lucas Kabela
a3205beffb
[CI] Enable mypy coverage for individual excluded files ( #34292 )
...
Signed-off-by: Lucas Kabela <lucaskabela@meta.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-16 07:34:29 -08:00
Christian Pinto
6930becd45
(bugfix): Fixed encode in LLM entrypoint for IOProcessr plugin prompts ( #34618 )
...
Signed-off-by: Christian Pinto <christian.pinto@ibm.com >
2026-02-16 07:33:55 -08:00
Andreas Karatzas
03a8770a6d
[ROCm][CI] Fix plugins test group; updating terratorch and dependencies ( #34589 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-16 07:33:42 -08:00
Yiqi Xue
bc56a1d56e
[Bugfix] Fix ARC touch KeyError for non-ready T1 blocks in kv offload ( #34576 )
...
Signed-off-by: Yiqi Xue <xuey666@gmail.com >
2026-02-16 07:33:19 -08:00
danisereb
ec7d9e6745
Fix call to moe_mk in modelopt MoE modules (required for LoRA) ( #34575 )
...
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com >
2026-02-16 07:33:09 -08:00
Isotr0py
3bb4e4311c
[Models] Fuse Qwen3.5 GDN's qkvz_proj and ba_proj ( #34492 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-16 07:32:51 -08:00
Amr Mahdi
08f8c198ae
[CI] Disable precompiled wheel path in CI image builds ( #34606 )
...
Signed-off-by: Amr Mahdi <amrmahdi@meta.com >
2026-02-16 15:14:43 +00:00
Harry Mellor
a21cedf4ff
Bump lm-eval version for Transformers v5 compatibility ( #33994 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-16 05:24:35 -08:00
emricksini-h
3ef74cde5d
[CI][Tracing] Fix race condition by adding server readiness check ( #34364 )
...
Attempt to resolve #34284 : "Metrics Tracing (2GPU)" fails with a
segmentation fault.
Signed-off-by: emricksini-h <emrick.birivoutin@hcompany.ai >
2026-02-16 12:57:39 +00:00
Ekagra Ranjan
cd81cdb399
[Scheduler][ASR] Fix CrossAttn blocks per-request for Variable length encoder inputs ( #31058 )
...
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2026-02-16 11:08:44 +00:00
Andreas Karatzas
1e828573b4
[CI][Metrics] Stabilize tests with polling and subprocess guards ( #34566 )
...
test_abort_metrics_reset is flaky due to hardware-dependent
fixed sleeps: replace fixed sleeps with polling.
test_metrics_exist_run_batch passes even when the engine crashes
on startup (false positive): add subprocess lifecycle guards.
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-16 10:52:02 +00:00
Samu Tamminen
a5ccc85c8c
[Bugfix] Fix Dynamo unexpected keyword argument ( #34320 )
...
Signed-off-by: Samu Tamminen <stammine@amd.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2026-02-16 01:32:30 -08:00
Roger Wang
b5475d0534
Revert "[Misc] fix qwen3.5 config" ( #34610 )
2026-02-16 01:06:05 -08:00
JJJYmmm
9521002f0a
[Misc] fix qwen3.5 config ( #34604 )
2026-02-16 00:25:38 -08:00
Cyrus Leung
ec17bdd894
[Renderer] Move InputPreprocessor into Renderer (1.5/2) ( #34598 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-15 23:46:33 -08:00
Amr Mahdi
bb59c90248
[CI] Write bake config to temp directory instead of repo root ( #34569 )
...
Signed-off-by: Amr Mahdi <amrmahdi@meta.com >
2026-02-15 22:15:47 -08:00
bnellnm
5bff999d12
[Bugfix] Add method to swap quant_method on FusedMoE to fix LoRA issues ( #34453 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
2026-02-15 20:10:50 -08:00
Lucas Wilkinson
bb85929aa6
[BugFix] Fix Python 3.13 FlashMLA import error ( #34548 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2026-02-15 20:09:18 -08:00
Parth Bansal
5653021094
[Doc] Add Mistral-7b-v0.3 model to the batch invariance validated model ( #34584 )
...
Signed-off-by: Parth Bansal <parthbansal127@gmail.com >
2026-02-16 12:09:00 +08:00
Andreas Karatzas
974d829b05
[CI][Frontend] Return 422 instead of 500 for invalid Anthropic tool_choice ( #34590 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-15 20:06:48 -08:00
Isotr0py
91ac5d9bfd
[CI/Build] Enable tests for recent day-0 new models ( #34585 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-15 18:17:04 -08:00
Luka Govedič
23d825aba1
[torch.compile] Disable ar-rms fusion for ds3-fp4 & DP, fix CI test ( #34392 )
...
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-15 06:33:57 -08:00
Maryam Tahhan
f07a128413
[CPU][ARM] Add ARM BF16 cross-compilation support and improve documen… ( #33079 )
...
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com >
Co-authored-by: Li, Jiang <jiang1.li@intel.com >
2026-02-15 06:33:08 -08:00
Isotr0py
71cd89264f
[MM Encoder] Add Triton ViT attention backend ( #32183 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-15 06:32:47 -08:00
Isotr0py
19fab44152
[Doc] Update Encoder-Decoder models support doc with Florence-2 ( #34581 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-15 04:18:57 -08:00
Seiji Eicher
79c7e09235
[KV Connector] Add temporary, off-by-default VLLM_DISABLE_REQUEST_ID_RANDOMIZATION workaround ( #34415 )
...
Signed-off-by: Seiji Eicher <seiji@anyscale.com >
2026-02-14 23:26:10 -08:00
haosdent
79f3fab05a
[Bugfix] Handle num_expert_group=None in flashinfer block-scale FP8 MoE ( #34494 )
...
Signed-off-by: haosdent <haosdent@gmail.com >
2026-02-14 23:25:46 -08:00
Vadim Gimpelson
604b9eaec5
[BUGFIX] Fix accuracy regression for NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4 with TP>1 ( #34476 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2026-02-14 23:25:17 -08:00
Stanislav Kirillov
50dbd6c9e6
[bugfix] Fix critical bug when reporting for all paths where handler.create_error_response is used ( #34516 )
...
Signed-off-by: Stanislav Kirillov <stas@nebius.com >
Co-authored-by: Stanislav Kirillov <stas@nebius.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-14 23:24:25 -08:00
Andreas Karatzas
98bcc6ca59
[CI][Entrypoints] Validate detokenize token IDs to prevent int64 overflow causing 500 ( #34468 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-14 23:08:38 -08:00
Andreas Karatzas
f13e86d8dd
[Kernels] Fix Helion GPU utils to use platform-agnostic device name API ( #34537 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-14 20:29:23 -08:00
Woosuk Kwon
9ca768c740
[Model Runner V2] Minor cleanup for Sampler ( #34563 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-14 18:29:03 -08:00
Thomas Parnell
d5fe3f702c
[Hybrid] Enable mamba prefix cache "align" mode with async scheduling ( #33997 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
2026-02-14 13:15:56 -08:00
Cyrus Leung
73391a1baa
[Renderer] Move InputPreprocessor into Renderer (1/2) ( #34510 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2026-02-14 10:14:21 -08:00
Andreas Karatzas
b3c14229b0
[ROCm][CI] Guard sparse MLA backend imports for ROCm compatibility in tests ( #34538 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-14 07:32:09 -08:00
Roger Wang
2f186635cb
[Bugfix] Fix Qwen3.5 config loading ( #34554 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2026-02-14 03:56:11 -08:00
Christian Pinto
342a7cda2d
[Misc] Update tests and examples for Prithvi/Terratorch models ( #34416 )
...
Signed-off-by: Christian Pinto <christian.pinto@ibm.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-13 23:03:51 -08:00
Kata Coder
d1ea65d0a1
[new model] add COLQwen3 code & Inference ( #34398 )
...
Signed-off-by: craftsangjae <craftsangjae@gmail.com >
Signed-off-by: katacoder <craftsangjae@gmail.com >
2026-02-14 12:15:19 +08:00
Andreas Karatzas
de42abb366
[CI] Heavy refactoring of Voxtral multimodal audio model tests ( #34294 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-13 20:04:29 -08:00
Julien Denize
60ca7981bc
Add explicit validation error for tool calls. ( #34438 )
...
Signed-off-by: juliendenize <julien.denize@mistral.ai >
2026-02-13 20:04:01 -08:00
Christian S. Perone
0ef5b9147b
fix: use __annotations__ instead of get_type_hints() for dynamic kwargs detection ( #34527 )
...
Signed-off-by: Christian S. Perone <christian.perone@gmail.com >
Signed-off-by: Christian S. Perone <perone@users.noreply.github.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2026-02-13 20:03:37 -08:00
Shiyan Deng
ed242652d7
[bug] Make sure get_modality_with_max_tokens is deterministic ( #34533 )
...
Signed-off-by: Shiyan Deng <dsy842974287@meta.com >
2026-02-13 20:02:59 -08:00
Wei Zhao
b37b679770
[Feature][Perf] Support Selective CPU Weight Offloading ( #34535 )
...
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com >
2026-02-13 20:02:24 -08:00
Andreas Karatzas
a0638d052d
[Bugfix] Fix ROCm UVA CPU weight offloading broken by #32993 ( #34543 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-13 20:01:42 -08:00
Harry Huang
c027541eaf
[Hybrid] Enable spec decoding in mamba cache align mode ( #33705 )
...
Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com >
2026-02-13 13:02:28 -08:00
Ben Browning
fd267bc7b7
[Bugfix]: Fix structured output in multi-turn gpt-oss ( #34454 )
...
Signed-off-by: Ben Browning <bbrownin@redhat.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-13 11:12:48 -08:00
Michael Goin
bfaa559305
Revert "[Bugfix] Fix fused MoE IMA (sans chunking) by using int64 for strides" ( #34530 )
2026-02-13 10:35:29 -08:00
Richard Zou
87789c8364
[Misc] vLLM's --enforce-eager should turn off compile and cudagraphs only ( #34523 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2026-02-13 09:52:20 -08:00
Pushpinder Singh
bcd65c1f6a
[Bugfix] Replace c10::optional with std::optional in topk kernel ( #34467 )
...
Signed-off-by: Pushpinder Singh <pushpindersingh135@gmail.com >
2026-02-13 08:30:23 -08:00
Wei Zhao
59d53066d8
[Feature] Support CPU Offloading without Pytorch Pinned Memory that leads to doubled allocation ( #32993 )
...
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-02-13 08:11:26 -08:00
LoganJane
4a9952ec1b
[Bugfix] Add quant_config in ViT of Kimi-K2.5 ( #34501 )
...
Signed-off-by: LoganJane <LoganJane73@hotmail.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-13 16:05:34 +00:00
Roger Wang
1dae7b7843
[Bugfix] Exclude language_model_only key from MM AOT compile hash but include in model one ( #34508 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2026-02-13 13:59:00 +00:00
Roger Wang
5885e330ef
[Misc] Port Qwen3.5 Configs ( #34512 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2026-02-13 05:24:25 -08:00
Ilya Boytsov
071d863e20
Extend ColBERT support to non-standard BERT backbones ( #34170 )
...
Signed-off-by: Ilya Boytsov <ilya.boytsov@aleph-alpha.com >
2026-02-13 09:53:09 +00:00
Woosuk Kwon
0916e7960b
[GDN] Use CPU tensors to build GDN metadata ( #34498 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-13 01:24:45 -08:00
Wentao Ye
3d2a026fd0
[Feature] Pipeline Parallel Async send/recv, 2.9% E2E throughput improvement ( #33368 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2026-02-13 16:38:16 +08:00
Aaron Hao
dddbff4624
[Core] Move pause and resume functions into engine ( #34125 )
...
Signed-off-by: ahao-anyscale <ahao@anyscale.com >
Signed-off-by: Aaron Hao <ahao@anyscale.com >
Signed-off-by: hao-aaron <ahao@anyscale.com >
Signed-off-by: Nick Hill <nickhill123@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com >
2026-02-13 00:15:10 -08:00
Martin Hickey
47e9b63e1a
[KVConnector] Clean up redundant code in KV connectors ( #34147 )
...
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com >
2026-02-13 00:14:30 -08:00
Matthias Gehre
934acddef9
[Perf] fused_moe: add int4_w4a16 benchmark support and tuning config ( #34130 )
...
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com >
Co-authored-by: TJian <tunjian.tan@embeddedllm.com >
2026-02-13 00:14:27 -08:00
Marek Michalowski
742d214d6e
[Bugfix] fix the import path in moe test utils.py ( #34245 )
...
Signed-off-by: Marek Michalowski <marek.michalowski@arm.com >
2026-02-13 00:13:45 -08:00
haosdent
4137c5dfa7
[Bug Fix] Fix MambaManager.cache_blocks() crash on null blocks in align mode ( #34418 )
...
Signed-off-by: haosdent <haosdent@gmail.com >
2026-02-13 00:13:22 -08:00
Harry Huang
7a8a46ddcb
[BugFix] Fix and optimize max_num_blocks_per_req calculation for MambaSpec ( #34440 )
...
Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com >
2026-02-13 00:13:14 -08:00
myselvess
bcf0731aa0
[New Model] support new model ovis2.6 ( #34426 )
...
Signed-off-by: myselvess <23743269+myselvess@users.noreply.github.com >
2026-02-13 00:12:45 -08:00
Cyrus Leung
ec090c2429
[Refactor] Call renderer for online IO processor request ( #34490 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-12 22:48:45 -08:00
Roger Wang
eea3024f43
[Bugfix] Fix mamba state dtype setting for Qwen3-Next and Qwen3.5 ( #34489 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2026-02-12 22:48:42 -08:00
Cyrus Leung
2f308214c0
[Refactor] Pass full VllmConfig to Renderer ( #34485 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-12 22:48:38 -08:00
Cyrus Leung
1b4e8e53f8
[CI/Build] Fix CUDA re-initialization error in distributed model tests ( #34491 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-13 06:43:53 +00:00
haosdent
dcf6ee8592
[Bugfix] Fix encoder cache underestimation for GLM-4V/GLM-OCR single image ( #34483 )
...
Signed-off-by: haosdent <haosdent@gmail.com >
2026-02-12 21:04:06 -08:00
Cyrus Leung
372b2e762a
[Bugfix] Standardize getting number of image patches/tokens ( #34358 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-12 20:47:01 -08:00
Andreas Karatzas
6afa587d31
[ROCm][CI] Fix serving tokens test failures ( #34047 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-13 11:27:53 +08:00
Cyrus Leung
94ed6cf6ea
Add new sections to CODEOWNERS ( #34309 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-12 18:39:28 -08:00
Harry Huang
bf37812ca7
[Hybrid] Fix and optimize block-aligned splitting in mamba cache align mode ( #33706 )
...
Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com >
2026-02-12 18:21:52 -08:00
Frank Wang
b86bf4417e
[Bugfix] Fix Random Dataset Prefix Length Inaccuracy ( #33907 )
...
Signed-off-by: frankwang28 <frank.wbb@hotmail.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2026-02-12 18:21:19 -08:00
Yanan Cao
de13dd781f
[Kernel] [Helion] [5/N] Add Helion Autotuning infrastructure ( #34025 )
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com >
2026-02-12 18:21:05 -08:00
LoganJane
62788f99a4
[Bugfix] Delete unused redundant code in Kimi-K2.5 ( #34427 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-12 18:18:42 -08:00
Cyrus Leung
ea5ff3a1f6
[Refactor] Simplify BOS/EOS token handling ( #34435 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-12 18:18:24 -08:00
bnellnm
04ea31baab
[Bugfix] Remove assert that's no longer valid ( #34443 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
2026-02-12 18:18:15 -08:00
Harry Huang
6f019e6e0a
[BugFix] Add block_size validation for mamba cache align mode ( #34445 )
...
Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com >
2026-02-12 18:18:07 -08:00
Zhuohan Li
d707678dfb
Fix num_logprobs parameter description in sampler.py ( #34451 )
...
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com >
2026-02-12 18:18:03 -08:00
Cyrus Leung
fc22cae4ac
[CI/Build] Update video URLs for testing ( #34446 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-12 18:15:36 -08:00
Yanan Cao
96161fe978
[Kernel] [Helion] [4/N] Add silu_mul_fp8 Helion kernel ( #33373 )
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com >
2026-02-12 18:13:12 -08:00
Jaewon
4453ba8d9e
[Core] Profiler improvements and lazy initialization ( #33198 )
...
Signed-off-by: Jaewon Lee <jaewon@meta.com >
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com >
2026-02-12 16:16:38 -08:00
Jaewon
aa181c923b
[Core] Add sleep level 0 mode with enqueue/wait pattern ( #33195 )
...
Signed-off-by: Jaewon Lee <jaewon@meta.com >
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com >
2026-02-12 16:16:25 -08:00
Alec S
be7370daf3
[Frontend] Enable generic structured_outputs for responses API ( #33709 )
...
Signed-off-by: Alec Solder <alecs@fb.com >
Co-authored-by: Alec Solder <alecs@fb.com >
2026-02-12 16:15:48 -08:00
Mengtao (Martin) Yuan
9ea1f598ce
Use paged_attention_v1 for sliding window decode in rocm_aiter_fa ( #34378 )
...
Signed-off-by: Martin Yuan <myuan@meta.com >
Co-authored-by: Martin Yuan <myuan@meta.com >
2026-02-12 16:14:43 -08:00
amitz-nv
f120bd42d3
[Kernel] Support Flashinfer trtllm fused MoE non gated FP8 & NVFP4 ( #33506 )
...
Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com >
2026-02-12 13:06:58 -08:00
Hashem Hashemi
fac4e96940
small adjustment to wvSplitKrc ( #34410 )
...
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com >
2026-02-12 20:26:36 +00:00
Michael Goin
6d4e27ce29
[Bugfix] Enforce DeepGEMM when using sparse_attn_indexer on CUDA ( #34374 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-02-12 12:08:06 -08:00
Andreas Karatzas
4c078fa546
[ROCm][CI] Pin TorchCodec to v0.10.0 for ROCm compatibility ( #34447 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-12 18:47:34 +00:00
Patrick von Platen
6c0baee610
[Voxtral Realtime] Refactor & Improve buffering logic ( #34428 )
...
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-12 09:46:43 -08:00
Patrick von Platen
1100a97621
[Voxstral Realtime] Enable tests ( #33803 )
...
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com >
2026-02-12 09:43:24 -08:00
xuebwang-amd
766e167821
[ROCm][quantization] improve OCP weight quant parser robust ( #34431 )
...
Signed-off-by: xuebwang-amd <xuebwang@amd.com >
Co-authored-by: TJian <tunjian.tan@embeddedllm.com >
2026-02-12 09:40:19 -08:00
Isotr0py
becbe24808
[Bugfix] Remove broken raw url GGUF model loading support ( #34433 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-12 09:40:01 -08:00
Harry Mellor
679ca5d8d3
Fix MoE for the Transformers modelling backend ( #34436 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-12 09:29:42 -08:00
Matthew Bonanni
f2c47886fd
[Attention] Add FlashInfer Sparse MLA backend ( #33451 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com >
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
2026-02-12 17:21:54 +00:00
Nicolò Lucchesi
334c715e0f
[Docs] Spec decoding docs warning removal ( #34439 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-02-12 09:01:51 -08:00
Aaron Hao
7b5a8b4a9d
[BUG] Reset running requests when clearing cache for pause/resume ( #34382 )
...
Signed-off-by: hao-aaron <ahao@anyscale.com >
2026-02-12 16:19:13 +00:00
danisereb
dea63512bb
Add config file for fused MoE for Nemotron (TP4, B200) ( #34411 )
...
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com >
2026-02-12 06:09:55 -08:00
Douglas Lehr
8a798be929
[ROCm] Enable MXFP4 MoE weight pre-shuffling on gfx950 and update aiter ( #34192 )
...
Signed-off-by: Doug Lehr <douglehr@amd.com >
Co-authored-by: Doug Lehr <douglehr@amd.com >
Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com >
Co-authored-by: tjtanaavllm <tunjian.tan@amd.com >
2026-02-12 05:06:33 -08:00
Cyrus Leung
fb455ed547
[V0 Deprecation] Remove code related to per-request logits processors ( #34400 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-12 20:44:28 +08:00
baonudesifeizhai
f5897613fb
Fix Mistral config remap to accept compressed-tensors quantization #34028 ( #34104 )
...
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com >
2026-02-12 08:22:06 +00:00
Louie Tsai
55a1a9563a
Vllm CPU benchmark suite improvement ( #34128 )
...
Signed-off-by: louie-tsai <louie.tsai@intel.com >
2026-02-12 16:04:44 +08:00
AllenDou
386bfe5d08
[bugfix] refactor FunASR's _get_data_parser ( #34397 )
...
Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com >
Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com >
2026-02-12 07:26:49 +00:00
Kyle Sayers
e9cd691132
[Bugfix] Fix Sparse24 Compressed Tensors models ( #33446 )
...
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-02-11 23:15:16 -08:00
Yichuan Wang
80f2ba6ea6
Fix DeepSeek-OCR tensor validation for all size variants ( #34085 )
...
Co-authored-by: Cursor <cursoragent@cursor.com >
2026-02-11 22:50:23 -08:00
Lucas Wilkinson
136b0bfa59
[BugFix] Fix DP chunking ( #34379 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: Bill Nell <bnell@redhat.com >
Co-authored-by: Bill Nell <bnell@redhat.com >
2026-02-12 06:44:03 +00:00
Cyrus Leung
b96f7314b4
[Refactor] Pass Renderer to Input Processor ( #34329 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-11 19:38:11 -08:00
Cyrus Leung
ced2a92f40
[Refactor] Move validation to params definitions ( #34362 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-11 19:33:15 -08:00
Runkai Tao
e1d97c38f8
[Bug Fix] Fix naive_block_assignment always defaulting to False due to arg misalignment ( #33848 )
...
Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu >
2026-02-12 11:30:57 +08:00
Michael Goin
ec12d39d44
[Bugfix] Fix MTP accuracy for GLM-5 ( #34385 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-02-12 11:08:19 +08:00
Michael Goin
ff1f83b056
[Refactor] Replace activation: str with MoEActivation enum ( #33843 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
2026-02-11 17:29:32 -08:00
Kevin H. Luu
83b47f67b1
[ci] Integrate AMD tests into CI ( #33626 )
...
Signed-off-by: Kevin H. Luu <khluu000@gmail.com >
Signed-off-by: khluu <khluu000@gmail.com >
Co-authored-by: TJian <tunjian.tan@embeddedllm.com >
2026-02-12 08:54:17 +08:00
Micah Williamson
fb7b30c716
[ROCm][CI] Revert Test Groups From mi325_8 to mi325_1 Agent Pool In AMD CI ( #34384 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2026-02-11 15:52:34 -08:00
bnellnm
31d992d215
[Bugfix] Fix some issues with MoERunner PR #32344 ( #34371 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
2026-02-11 14:33:14 -08:00
Wei Zhao
5aff2699bd
Fix CI failure - Flashinfer Kernel tests ( #34316 )
...
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com >
2026-02-11 14:17:16 -08:00
Raushan Turganbay
527ca32197
[Bugfix] Fix more multimodal tests for transformers V5 ( #34334 )
...
Signed-off-by: raushan <raushan@huggingface.co >
2026-02-11 22:02:05 +01:00
Junseo Park
5458eb835d
[Bugfix] send None sentinel on final commit so server properly sends transcription.done ( #33963 )
...
Signed-off-by: pjs102793 <pjs102793@naver.com >
Co-authored-by: Nick Hill <nickhill123@gmail.com >
2026-02-11 21:01:53 +00:00
Tomas Ruiz
144d9b7cc8
[Benchmarks] Reduce ready checker log verbosity ( #34349 )
...
Signed-off-by: Tomas Ruiz <tomas.ruiz.te@gmail.com >
2026-02-11 20:57:57 +00:00
elvischenv
83e26c834e
[GPT-OSS] Remove unnecessary contiguous ( #34337 )
...
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com >
2026-02-11 15:29:29 -05:00
TJian
5001211369
[ROCm] [CI] fix test_unrecognized_env ( #34350 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2026-02-11 18:50:44 +00:00
Eldar Kurtić
11c7ace340
[Bugfix] Enable attn quantization of Llama-4 by correctly permuting scales for rope (int8, fp8) ( #34243 )
...
Signed-off-by: Your Name <you@example.com >
Co-authored-by: Your Name <you@example.com >
2026-02-11 13:24:22 -05:00
Xinyu Dong
be7f3d5d20
[Bugfix] fix default is_neox_style is True for deepseek ( #34353 )
...
Signed-off-by: dongxinyu03 <dongxinyu03@baidu.com >
2026-02-11 18:20:45 +00:00
Isotr0py
0ab06100f4
[Multimodal] Expose mm_processor_kwargs for DummyInputsBuilder ( #34330 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-11 09:37:40 -08:00
Xinyu Chen
ffb3d553cc
[Model Runner V2] Init cuda graph pool when necessary ( #33217 )
...
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com >
2026-02-11 09:12:13 -08:00
junuxyz
fa7e0bfacf
[CI][BugFix] Fix silent failure in shellcheck hook and baseline exist… ( #32458 )
...
Signed-off-by: junuxyz <216036880+junuxyz@users.noreply.github.com >
2026-02-11 17:03:48 +00:00
SorenDreano
48134a2c22
[Docs] Fix typo ("defult") and double spacing ( #34348 )
...
Signed-off-by: SorenDreano <71752785+SorenDreano@users.noreply.github.com >
Co-authored-by: Soren Dreano <soren@numind.ai >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-11 09:02:27 -08:00
kliuae
64f570ab56
[ROCm] [aiter] Split KV cache update for AiterFlashAttention ( #33681 )
...
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com >
2026-02-11 16:26:44 +00:00
Rohan Potdar
fd618871b4
[Bugfix]: Fix ROCm fusion attn test; use AttentionBackend utils to create kv cache ( #33948 )
...
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com >
2026-02-11 11:12:05 -05:00
Harry Mellor
67a42b5a44
Don't try and run GLM-ASR with remote code ( #34352 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-11 08:09:40 -08:00
Lucas Wilkinson
c7914d30f9
Reapply [Attention][FA3] Update FA3 to include new swizzle optimization ( #34043 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2026-02-11 07:07:56 -08:00
Adam Binford
1b8756562e
Responses harmony system message structured ( #34268 )
...
Signed-off-by: Adam Binford <adamq43@gmail.com >
2026-02-11 05:14:28 -08:00
Linda
275e0d2a99
[NVIDIA][test] Tests for flashinfer TRTLLM BF16 MoE ( #33715 )
...
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com >
Co-authored-by: Pavani Majety <pmajety@nvidia.com >
2026-02-11 12:38:11 +00:00
Harry Mellor
0f5e55e7a8
Make JAIS compatible with Transformers v5 ( #34264 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-11 12:30:37 +00:00
Harry Mellor
1e9204bff3
Make Qwen3VL compatible with Transformers v5 ( #34262 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.io >
2026-02-11 04:13:23 -08:00
Li, Jiang
05339a7b20
[Bugfix][CPU] Fix llama4 inference on CPU ( #34321 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2026-02-11 19:07:23 +08:00
Harry Mellor
40b8f55358
[Docs] Reduce time spent generating API docs ( #34255 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-11 02:56:02 -08:00
Seiji Eicher
5045d5c983
Patch protobuf for CVE-2026-0994 ( #34253 )
...
Signed-off-by: Seiji Eicher <seiji@anyscale.com >
Co-authored-by: Kevin H. Luu <khluu000@gmail.com >
2026-02-11 02:25:04 -08:00
Nick Hill
e09546cf05
[Frontend] Exploit tokenizers "new stream" in FastIncrementalDetokenizer ( #34217 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-02-11 11:03:24 +01:00
Tianqi Ren
786806dd44
[Doc] Update Marlin support matrix for Turing ( #34319 )
...
Signed-off-by: Tianqi Ren <tianqi.r@outlook.com >
2026-02-11 09:03:41 +00:00
Nick Hill
79504027ef
[Misc] Bump fastsafetensors version for latest fixes ( #34273 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-02-11 00:30:09 -08:00
Luka Govedič
addac0e653
[torch.compile] Enable AR+rms fusion by default available for -O2 ( #34299 )
...
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
2026-02-11 00:30:00 -08:00
Cyrus Leung
675a22ed66
[Chore] Move BaseRenderer to base.py ( #34308 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-11 00:29:51 -08:00
Kunshang Ji
cb9574eb85
[XPU][9/N] clean up existing ipex code/doc ( #34111 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2026-02-11 00:27:15 -08:00
AllenDou
21dfb842d7
[model] support FunASR model ( #33247 )
...
Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com >
Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com >
2026-02-11 07:37:09 +00:00
R3hankhan
d1b837f0ae
[CPU] Enable FP16 (Half dtype) support for s390x ( #34116 )
...
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com >
2026-02-11 14:41:42 +08:00
Roger Wang
0b20469c62
[Bugfix] Fix weight naming in Qwen3.5 ( #34313 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2026-02-10 21:37:14 -08:00
Tyler Michael Smith
d7982daff5
[Bugfix] Fix fused MoE IMA (sans chunking) by using int64 for strides ( #34279 )
...
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
2026-02-11 05:15:52 +00:00
Robert Shaw
9b17c57460
[ModelBash][DSR1 NVFp4] Removed Bf16 Bias Cast ( #34298 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2026-02-11 05:00:00 +00:00
Hashem Hashemi
1b3540e6c6
Threshold fix wvSplitk for occasional CI fails ( #34013 )
...
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com >
2026-02-11 03:59:14 +00:00
Matthias Gehre
7a048ee65f
[Bugfix] Fix benchmark_moe.py inplace assertion with torch >= 2.9 ( #34149 )
...
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com >
2026-02-11 03:58:56 +00:00
Cyrus Leung
c9a1923bb4
[Plugin] Simplify IO Processor Plugin interface ( #34236 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-10 19:47:39 -08:00
zofia
b482f71e9f
[XPU][7/N] enable xpu fp8 moe ( #34202 )
...
Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com >
2026-02-11 03:33:59 +00:00
Дзержи́нский
1485396abb
[Kernel] Apply 256bit LDG/STG To Activation Kernels ( #33022 )
...
Signed-off-by: Dzerzhinsky <256908701+AstroVoyager7@users.noreply.github.com >
Signed-off-by: Дзержи́нский <256908701+AstroVoyager7@users.noreply.github.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2026-02-10 19:31:51 -08:00
Kebe
5ee5c86eeb
[Bugfix][DeepSeek-V3.2] fix fp8 kvcache type cast ( #33884 )
...
Signed-off-by: Kebe <mail@kebe7jun.com >
2026-02-10 19:31:36 -08:00
Cyrus Leung
b5dcb372e4
[Misc] Clean up validation logic in input processor ( #34144 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-10 19:29:29 -08:00
Tyler Michael Smith
066c6da6a0
[WideEP] Fix nvfp4 DeepEP High Throughput All2All backend ( #33738 )
...
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-02-10 19:15:43 -08:00
Richard Zou
e30cedd44b
[torch.compile] Stop doing unnecessary FakeTensorProp in PiecewiseCompileInterpreter ( #34093 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2026-02-10 19:15:40 -08:00
Cyrus Leung
3bcd494ef4
[Redo] Add --trust-remote-code to dataset bench args ( #34251 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-11 11:10:12 +08:00
tianshu-Michael-yu
0e725a7d22
[Bugfix] Fix Worker.load_model context-manager composition for sleep mode ( #34021 )
...
Signed-off-by: tianshu.yu <tianshuyu.formal@gmail.com >
2026-02-11 11:07:51 +08:00
Lucas Wilkinson
ba0511fd80
[Misc] Add run one batch script that supports profiling ( #32968 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2026-02-10 18:29:49 -08:00
Micah Williamson
4a1550d22d
[ROCm][CI] Fix test_sequence_parallel.py location in AMD CI pipeline ( #34280 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2026-02-11 01:08:11 +00:00
bnellnm
d1481ba783
[MoE Refactor] Introduce MoERunner abstraction and move execution logic from FusedMoE to DefaultMoERunner ( #32344 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
2026-02-10 19:51:07 -05:00
7. Sun
dc6de33c3d
[CI] Add pip caching to cleanup_pr_body workflow ( #32979 )
...
Signed-off-by: 7. Sun <jhao.sun@gmail.com >
2026-02-11 00:45:28 +00:00
Tyler Michael Smith
c4b9e6778f
[Misc] Add pre-commit hook to catch boolean ops in with-statements ( #34271 )
...
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
2026-02-10 15:13:20 -08:00
Richard Zou
341eed3d30
[torch.compile] Disable recursive pre_grad_passes ( #34092 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2026-02-10 18:02:31 -05:00
Zhengkai Zhang
6f2f59f2b3
[Misc][Spec Decode] support different load config for draft model ( #34022 )
...
Signed-off-by: zzhengkai <zzhengkai@devgpu049.ldc1.facebook.com >
Co-authored-by: zzhengkai <zzhengkai@devgpu049.ldc1.facebook.com >
2026-02-10 14:52:43 -08:00
Ilya Markov
bb2fc8b5e7
[BugFix] Fix async EPLB hang with DeepEP LL all2all backend ( #32860 )
...
Signed-off-by: ilmarkov <markovilya197@gmail.com >
2026-02-10 22:34:47 +00:00
Ilya Markov
67132945bb
[Perf] Move eplb rebalance algo to async thread ( #30888 )
...
Signed-off-by: ilmarkov <markovilya197@gmail.com >
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
2026-02-10 22:19:10 +00:00
Gregory Shtrasberg
f0ca0671c7
[Feature] Warn about unrecognized environment variables ( #33581 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2026-02-10 15:45:38 -06:00
Pavani Majety
578977bb5e
[SM100] Resubmit FMHA FP8 prefill for MLA ( #31195 )
...
Signed-off-by: Pavani Majety <pmajety@nvidia.com >
2026-02-10 16:18:43 -05:00
Roger Wang
9615575afc
[Bugfix] Fix mamba cache dtype for Qwen3.5 ( #34200 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2026-02-10 13:12:31 -08:00
Matthew Bonanni
4293c00b84
[Benchmarks] Fix attention benchmark smoke test ( #34269 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-02-10 16:04:07 -05:00
J Seppänen
506ad7d7c1
[Bugfix] Fix weights offloading for sleep mode ( #32947 )
...
Signed-off-by: Jarno Seppänen <jseppanen@nvidia.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
2026-02-10 20:38:17 +00:00
Reagan Lee
fdd6f2ad58
Convert online APIs to use Renderer ( #34084 )
...
Signed-off-by: Reagan Lee <“reaganjlee@gmail.com ”>
Co-authored-by: Reagan Lee <“reaganjlee@gmail.com ”>
2026-02-10 19:44:31 +00:00
Qi Wang
33bcd3dc3b
[Misc] Introduce ec_both role EC (encoder cache) connector ( #34182 )
...
Signed-off-by: Qi Wang <qiwa@nvidia.com >
2026-02-10 18:55:35 +00:00
Michael Goin
1f5febb4b8
[UX nit] Fix non-default api_server_count message ( #34152 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-02-10 10:35:58 -08:00
Andy Lo
ae871ca923
Minor cleanup for Voxtral ( #34247 )
...
Signed-off-by: Andy Lo <andy@mistral.ai >
2026-02-10 18:18:30 +00:00
Woosuk Kwon
a2443de5fa
[Model Runner V2] Use pinned memory for write_contents ( #34222 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-10 08:55:22 -08:00
Harry Mellor
f84a2a8f31
[Docs] Speed up build environment set-up ( #34240 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-10 16:34:43 +00:00
Vadim Gimpelson
000214c4bb
[BUGFIX] Fix accuracy bugs in Qwen3-Next MTP ( #34077 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2026-02-10 10:57:11 -05:00
junuxyz
c5a66d1697
[Core][BugFix] Fix PP KV cache sharding memory validation ( #33698 )
...
Signed-off-by: junuxyz <216036880+junuxyz@users.noreply.github.com >
2026-02-10 10:46:24 -05:00
Roberto L. Castro
afdce12c89
[Perf][Kernel] Add faster topKperRow decode kernel for DeepSeek-V3.2 sparse attention ( #33680 )
...
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com >
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com >
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com >
2026-02-10 10:29:52 -05:00
Zhengxu Chen
82e11973cc
[compile] Enable AOT compile with 2.10 in trunk. ( #34155 )
...
Signed-off-by: Zhengxu Chen <zhxchen17@meta.com >
2026-02-10 23:24:42 +08:00
xuebwang-amd
b129136c7a
[ROCm][Quantization] GPT_OSS in amd-quark format model loading and emulations ( #29008 )
...
Signed-off-by: xuebwang-amd <xuebwang@amd.com >
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-02-10 10:08:05 -05:00
mgazz
599e4335a4
Support benchmarking of Geospatial models ( #33922 )
...
Signed-off-by: Michele Gazzetti <michele.gazzetti1@ibm.com >
2026-02-10 07:04:16 -08:00
Fan Yang
a1946570d8
add --insecure arg to the vllm bench to skip TLS ( #34026 )
...
Signed-off-by: Fan Yang <yan9fan@meta.com >
Co-authored-by: Fan Yang <yan9fan@meta.com >
2026-02-10 22:23:52 +08:00
Harry Mellor
d0bc520569
Bump mamba-ssm version in CI for Transformers v5 compatibility ( #34233 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-10 14:46:01 +01:00
Krish Gupta
748625cdaf
[V1][BugFix] Fix EAGLE3 encoder cache miss with disable_chunked_mm_input ( #34220 )
...
Signed-off-by: KrxGu <krishom70@gmail.com >
2026-02-10 13:05:32 +00:00
Harry Mellor
61413973e8
Stop testing for slow tokenizers as they will not exist soon ( #34235 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-10 12:08:20 +00:00
Phúc H. Lê Khắc
94de871546
[Misc] allow specify is_mm_prefix_lm in hf_config ( #34215 )
2026-02-10 11:16:21 +00:00
tc-mb
e042d7e685
Add flagos in MiniCPM-o ( #34126 )
...
Signed-off-by: tc-mb <caitianchi@modelbest.cn >
Signed-off-by: Vincent-Xiao <vincent.xiao.me@gmail.com >
Co-authored-by: Vincent-Xiao <vincent.xiao.me@gmail.com >
2026-02-10 02:51:48 -08:00
Roger Wang
ae4e280602
[Bugfix] Fix FI kernelchunk_gated_delta_rule output shape for Qwen3.5 ( #34219 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2026-02-10 10:41:24 +00:00
zzaebok
cbea11c9f0
[Docs] Fix format error in KV load failure recovery doc ( #34137 )
...
Signed-off-by: Jaebok Lee <jaebok9541@naver.com >
2026-02-10 02:16:26 -08:00
Cyrus Leung
2c32558a3c
[Bugfix] Fix --trust-remote-code conflict ( #34218 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-10 00:29:10 -08:00
Zetong Li
5f970120f0
[Bugfix] Fix memory inconsistency in cross-process shared memory ( #32022 )
...
Signed-off-by: Zetong Li <slippersss@126.com >
2026-02-10 08:22:03 +00:00
Cyrus Leung
998e2d91f8
Revert #34208 ( #34216 )
2026-02-09 23:59:04 -08:00
Wentao Ye
e1060a71a1
[Perf] Optimize detokenizer python logic ( #32975 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2026-02-09 23:54:41 -08:00
Chen Zhang
97fa8f6590
[BugFix] Avoid prefix cache hit in the same schedule step for mamba layers ( #29387 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2026-02-10 07:41:16 +00:00
wang.yuqi
dab1de9f38
[Frontend][CI] Consolidate instrumentator entrypoints ( #34123 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-02-10 07:30:19 +00:00
Balaxxe
8d48d0a9d9
[Bugfix] Sort hf_weights_files in fastsafetensors_weights_iterator to match #33491 ( #34190 )
...
Signed-off-by: Balaxxe <136368465+jaim12005@users.noreply.github.com >
2026-02-09 23:06:30 -08:00
Andrew Xia
9608844f96
[responsesAPI] fix simpleContext streaming output_messages ( #34188 )
...
Signed-off-by: Andrew Xia <axia@meta.com >
Signed-off-by: Andrew Xia <axia@fb.com >
Co-authored-by: Andrew Xia <axia@fb.com >
2026-02-09 22:53:07 -08:00
Cyrus Leung
f69b903b4c
[Bugfix] Add --trust-remote-code to dataset bench args ( #34208 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-09 22:37:50 -08:00
Lucas Wilkinson
81e217fe6b
[Bugfix] Fix DP Attention Padding in Dummy Run ( #34187 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Co-authored-by: Benjamin Chislett <bchislett@nvidia.com >
2026-02-10 05:29:39 +00:00
Cyrus Leung
ab97bcf662
[CI/Build] Relax test_mcp_tool_call ( #34204 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-10 05:18:57 +00:00
Cyrus Leung
25e48a3aae
[Doc] Update usage of --limit-mm-per-prompt ( #34148 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-09 21:12:13 -08:00
Roger Wang
8a5e0e2b2b
[Bugfix][Core] Fix CPU memory leak from Request reference cycle in prefix caching ( #34183 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2026-02-10 13:03:32 +08:00
Andreas Karatzas
4cde2e0159
[ROCm][Bugfix] Resolve Dynamo tracing crash from amdsmi calls in on_gfx* arch detection ( #34108 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-09 20:50:20 -08:00
Roger Wang
047a457fa4
[Bugfix] Adopt ChunkGatedDeltaRule for Qwen3.5 ( #34198 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2026-02-10 03:47:54 +00:00
Yuwei An
e94ec59733
[LMCache] Token Base IPC API ( #34175 )
...
Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com >
2026-02-10 01:18:42 +00:00
Ning Xie
13397841ab
[structured output] validate unsupported json features first ( #33233 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
Co-authored-by: Russell Bryant <rbryant@redhat.com >
2026-02-09 23:49:09 +00:00
Gregory Shtrasberg
c60f8e3b49
[Bugfix][ROCm][GPT-OSS] Use old triton_kernels implementation on ROCm if the new API is not available ( #34153 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2026-02-09 17:38:54 -06:00
Michael Goin
5e75a14a66
[Doc] Add DCP support to attention backend doc ( #33936 )
2026-02-09 18:33:43 -05:00
Nick Hill
e7e52781ff
[ModelRunner V2][BugFix] Fix max_query_len calculation ( #34167 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-02-09 21:47:17 +00:00
Charlie Fu
bb9f97308d
[torch.compile][Fusion] Fix attention fusion pass removing kv_udpate op. ( #33945 )
...
Signed-off-by: charlifu <charlifu@amd.com >
2026-02-09 16:15:43 -05:00
Hongxia Yang
4d39650961
[ROCm] update triton branch to support gpt-oss models for gfx11xx devices ( #34032 )
...
Signed-off-by: Hongxia Yang <hongxia.yang@amd.com >
2026-02-09 19:36:30 +00:00
Artus Krohn-Grimberghe
8fd31f6245
[Bugfix] Voxtral prompt/audio placeholder alignment ( #34140 )
...
Signed-off-by: Artus KG <artuskg@gmail.com >
2026-02-09 19:30:38 +00:00
Artus Krohn-Grimberghe
eadb4e868b
[Bugfix] Avoid duplicate k-proj weight emission in helper ( #34142 )
...
Signed-off-by: Artus KG <artuskg@gmail.com >
2026-02-09 19:17:44 +00:00
Jiangyun Zhu
285bab4752
[Kernel] use flashinfer for gdn prefill ( #32846 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2026-02-09 12:17:25 -05:00
TomerBN-Nvidia
995bbf38f1
[Bugfix] Fix shared expert input for latent MoE in EP+DP (Nemotron-H) ( #34087 )
...
Signed-off-by: Tomer Natan <tbarnatan@nvidia.com >
Co-authored-by: Cursor <cursoragent@cursor.com >
2026-02-09 16:44:18 +00:00
Mohammad Miadh Angkad
d4f123cc48
[Kernel] FlashInfer: switch allreduce fusion to unified API ( #33985 )
...
Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com >
2026-02-09 15:43:24 +00:00
ZhengHongming888
cb62e86f83
Add NUMA Core binding in nixl_connector for CPU xPyD ( #32365 )
...
Signed-off-by: Hongming Zheng <hongming.zheng@intel.com >
Signed-off-by: ZhengHongming888 <hongming.zheng@intel.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-09 15:39:12 +00:00
Luka Govedič
781ddf7868
[CI][torch.compile] Fix incorrect filtering for E2E fusion tests on B200 ( #34031 )
...
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
2026-02-09 10:05:14 -05:00
Roger Wang
64a9c2528b
[UX] Add --language-model-only for hybrid models ( #34120 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2026-02-09 14:57:33 +00:00
Lucas Wilkinson
d0d97e2974
[Misc] Fix up attention benchmarks ( #33810 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com >
2026-02-09 09:42:03 -05:00
JJJYmmm
9562912cea
[MODEL] Adding Support for Qwen3.5 Models ( #34110 )
...
Signed-off-by: JJJYmmm <1650675829@qq.com >
Signed-off-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: wulipc <wulipc@users.noreply.github.com >
Co-authored-by: ywang96 <ywang96@users.noreply.github.com >
Co-authored-by: Isotr0py <Isotr0py@users.noreply.github.com >
Co-authored-by: Isotr0py <2037008807@qq.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2026-02-09 21:12:58 +08:00
zofia
9bdb06b436
[XPU][6/N] add xpu scaled_mm kernel ( #34117 )
...
Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com >
2026-02-09 20:17:35 +08:00
Nikhil Gupta
caad9f1e01
[Fix] [CPU Backend] : Prepack weights for w8a8 oneDNN matmul ( #33901 )
...
Signed-off-by: nikhil-arm <nikhil.gupta2@arm.com >
2026-02-09 18:04:41 +08:00
Ekagra Ranjan
1d5922fade
[ASR] Fix audio benchmark and add RTFx metric ( #32300 )
...
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com >
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com >
2026-02-09 10:02:37 +00:00
Andreas Karatzas
3025b3cebb
[CI] Remove empty image_size_factors for fuyu, glm4_1v, glm_ocr ( #34107 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-09 17:37:04 +08:00
Jee Jee Li
978a37c823
[Model] GLM adaptation ( #34124 )
2026-02-09 17:32:52 +08:00
ihb2032
5a5c43511a
fix(cpu): fix mla_decode compilation on x86 without AVX512 ( #34052 )
...
Signed-off-by: ihb2032 <hebome@foxmail.com >
Co-authored-by: root <root@LAPTOP-FKNHV411.localdomain >
2026-02-09 08:55:41 +00:00
Nick Hill
d9bede0314
[BugFix] Fix fastsafetensors TP all procs using all GPUs ( #34070 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-09 15:15:46 +08:00
wang.yuqi
22b64948f6
[Frontend][last/5] Make pooling entrypoints request schema consensus. ( #31127 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-02-09 06:42:38 +00:00
Reagan Lee
7c233dbb36
[Tiny] Rename encoder budget file to more specific name ( #34103 )
...
Signed-off-by: Reagan Lee <“reaganjlee@gmail.com ”>
Co-authored-by: Reagan Lee <“reaganjlee@gmail.com ”>
2026-02-09 03:48:19 +00:00
kourosh hakhamaneshi
a75a5b54c7
[bug-fix] supported_tasks is breaking backward compatibility at init_app_state ( #34027 )
...
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com >
Signed-off-by: kourosh hakhamaneshi <31483498+kouroshHakha@users.noreply.github.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2026-02-09 09:46:46 +08:00
Andrey Talman
f97ca67176
[Release 2.10] Update to Torch 2.10 - final release ( #30525 )
2026-02-08 13:51:09 -08:00
danisereb
084aa19f02
Add support for ModelOpt MXFP8 dense models ( #33786 )
...
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com >
2026-02-08 11:16:48 -08:00
navmarri14
1ecfabe525
glm 4.6 fused tuned inference config for B200 ( #32958 )
2026-02-08 18:55:47 +00:00
Richard Zou
4df841fe75
[torch.compile] Add an option to force-enable the MOE cold start optimization ( #33735 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2026-02-08 18:42:56 +00:00
TomerBN-Nvidia
a263aa6140
[BugFix] Change support no act and mul for marlin ( #34088 )
...
Signed-off-by: Tomer Natan <tbarnatan@computelab-frontend-8.nvidia.com >
Co-authored-by: Tomer Natan <tbarnatan@computelab-frontend-8.nvidia.com >
2026-02-08 17:18:22 +00:00
aabbccddwasd
179ae7da8f
[Revert] Fix performance regression for GLM-4.7-GPTQ decode and MTP acceptance rate ( #33771 )
...
Signed-off-by: aabbccddwasd <aabbccddwasd@qq.com >
2026-02-08 08:13:24 -08:00
Reagan Lee
c4df59ad43
Add embedding input functionality for disabled modalities [remake] ( #32493 )
...
Signed-off-by: Reagan Lee <“reaganjlee@gmail.com ”>
Signed-off-by: Reagan Lee <reaganjlee@gmail.com >
Signed-off-by: Reagan Lee <96998476+reaganjlee@users.noreply.github.com >
Co-authored-by: Reagan Lee <“reaganjlee@gmail.com ”>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-08 04:57:16 -08:00
TJian
785cf28fff
[ROCm] [CI] Reduce Resource of two test groups ( #34059 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2026-02-08 15:17:26 +08:00
Nick Hill
a96197f564
[Perf] Simplify DeepseekV32 tokenizer, ensure fast detokenization used ( #33855 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-02-08 07:16:34 +00:00
Andreas Karatzas
ab10d79855
[ROCm][Bugfix] fix act_quant_fusion module import error ( #34069 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-07 19:21:12 -08:00
Cyrus Leung
7fcb705b80
[CI/Build] Skip GCS test ( #34057 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-07 08:52:38 -08:00
Cyrus Leung
b956cdf818
[Doc] Fix run_batch docs ( #34056 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-07 06:18:16 -08:00
Hashem Hashemi
ed17f54c8b
Perf tuning and expansion of cases covered for wvSplitKrc ( #33493 )
...
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com >
2026-02-07 05:33:11 -08:00
Jiang Wu
860981d8d8
Make directory exist ok for ray spinning up multiple replicas on a single instance ( #33604 )
...
Signed-off-by: Jiang Wu <jwu@cclgroup.com >
2026-02-07 05:30:49 -08:00
zifeitong
52181baaea
Update DeepGEMM version pin in Dockerfile to match #32479 ( #33935 )
...
Signed-off-by: Zifei Tong <zifeitong@gmail.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2026-02-07 05:30:22 -08:00
Rohan Potdar
de3869bb4d
move checks out of unified_kv_cache_update custom op ( #33943 )
...
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com >
2026-02-07 05:30:09 -08:00
whx
ce9b3cd3e9
[PluggableLayer][3/N] Apply PluggableLayer to mamba layers. ( #33660 )
...
Signed-off-by: whx-sjtu <2952154980@qq.com >
2026-02-07 05:26:05 -08:00
Jee Jee Li
db4ede9743
[Model] Enable Step3p5ForCausalLM testing ( #33755 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2026-02-07 05:25:24 -08:00
Pooya Davoodi
2cb2340f7a
[Frontend]Add support for transcriptions and translations to run_batch ( #33934 )
...
Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io >
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2026-02-07 05:24:57 -08:00
TundeAtSN
4df44c16ba
Enable Eagle3 speculative decoding for Mistral3ForConditionalGeneration to support eagle3 ( #33939 )
...
Signed-off-by: Akintunde Oladipo <akintunde.oladipo@servicenow.com >
Signed-off-by: TundeAtSN <akintunde.oladipo@servicenow.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-07 05:24:52 -08:00
Richard Zou
81fe69cae5
[torch.compile] Stop compiling identical artifacts ( #34003 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2026-02-07 05:24:48 -08:00
Mohammad Miadh Angkad
dd6a6e1190
[Kernel] Add KernelConfig flag to enable/disable FlashInfer autotune ( #34006 )
...
Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2026-02-07 05:24:44 -08:00
Cyrus Leung
edb359cce4
[Renderer] Define render_cmpl and render_chat ( #34039 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-07 05:24:40 -08:00
wang.yuqi
6ed5eda300
[CI][Build] Pin grpcio-tools==1.78.0 ( #34048 )
...
Signed-off-by: wang.yuqi <noooop@126.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2026-02-07 05:24:35 -08:00
Cyrus Leung
11a4c9d30d
[Misc] Simplify get_max_tokens ( #34036 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-07 00:59:49 -08:00
lukec
15a0b9e570
Fix spelling errors ( #33978 )
2026-02-06 23:58:50 -08:00
Andreas Karatzas
c490d8cc73
[ROCm][CI] Pinning lm-eval version to resolve multi-modal small eval bug ( #34038 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-06 22:21:08 -08:00
Cyrus Leung
48312e579a
[Misc] Make PlaceholderRange.get_num_embeds a method ( #34035 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-07 05:30:17 +00:00
Vel
bc32444b23
[Kernel] Add enable_sm120_or_later for SM121 (DGX Spark) CUTLASS support ( #33517 )
...
Signed-off-by: code4me2 <velvetmoon222999@gmail.com >
2026-02-06 20:28:01 -08:00
Wentao Ye
18e8545297
[Revert] Add util handle_deprecated back ( #33998 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-02-07 04:14:45 +00:00
果冻虾仁
6f7adc533a
fix description in plugin_system.md ( #33999 )
2026-02-06 19:37:02 -08:00
Nick Hill
40218a82ba
[ModelRunner V2] Revert token rank comparison difference for now ( #34017 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-02-07 11:11:05 +08:00
kourosh hakhamaneshi
1c3b22058f
[Misc] Add backward-compatible import aliases for renamed translations module ( #34015 )
...
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com >
Co-authored-by: Cursor <cursoragent@cursor.com >
2026-02-07 11:01:41 +08:00
Xin Yang
3920cafdd6
[Bugfix] Fix _fused_moe_lora_expand signature mismatch ( #33821 )
...
Signed-off-by: Xin Yang <xyangx@amazon.com >
2026-02-07 10:45:59 +08:00
rasmith
ec28784fdc
[CI][AMD]Bugfix] Check that model_config is not None in enable_norm_pad_fusion ( #34007 )
...
Signed-off-by: Randall Smith <Randall.Smith@amd.com >
2026-02-07 02:43:25 +00:00
Nicolò Lucchesi
55aeec04f5
[Bugfix] Fix Whisper tokenization ( #34011 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-02-07 10:42:52 +08:00
Ikenna
906077181b
[Bugfix] Fix QK Norm+RoPE fusion pattern matching on B200+FP8 ( #33967 )
...
Signed-off-by: Ikenna <ikennachifo@gmail.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2026-02-07 02:27:33 +00:00
Aaron Hao
89a385d79f
[Feat][RL] Pause and Resume with keep requests for single engine ( #32351 )
...
Signed-off-by: ahao-anyscale <ahao@anyscale.com >
Signed-off-by: Aaron Hao <ahao@anyscale.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-02-07 00:08:58 +00:00
kourosh hakhamaneshi
4a2d00eafd
[bugfix] [ROCm] Fix premature CUDA initialization in platform detection ( #33941 )
...
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com >
2026-02-06 16:17:55 -06:00
Dimitrios Bariamis
207c3a0c20
Fix RoutingMethodType logic ( #33919 )
...
Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
2026-02-06 14:03:34 -08:00
Sumanth R Hegde
ae2e93f89b
[Fix] Fix logprobs=0 handling for /inference/v1/generate endpoint ( #34010 )
...
Signed-off-by: SumanthRH <sumanthrh99@gmail.com >
2026-02-06 20:33:40 +00:00
xuebwang-amd
9e9acce577
[Bugfix] Fix no attribute error of SharedFusedMoE (DeepSeek-V3.1 as test model) ( #33993 )
...
Signed-off-by: xuebwang-amd <xuebwang@amd.com >
2026-02-06 19:11:32 +00:00
Charlie Fu
fe5438200b
[Rocm][Bugfix] Fix dtype not same for gemm_a4w4 op ( #33734 )
...
Signed-off-by: charlifu <charlifu@amd.com >
2026-02-06 19:09:59 +00:00
Wentao Ye
77c09e1130
[Refactor] Remove align block size logic in moe_permute ( #33449 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-02-06 10:57:06 -08:00
zhrrr
16786da735
[Model Runner V2] support apply penalty for spec decode ( #33251 )
...
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com >
2026-02-06 10:56:48 -08:00
vllmellm
aaa2efbe98
[DOC] [ROCm] Update docker deployment doc ( #33971 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: TJian <tunjian.tan@embeddedllm.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-06 10:05:35 -08:00
Seiji Eicher
aca5967416
[KV Connector] Add missing method overrides to MultiConnector ( #33292 )
...
Signed-off-by: Seiji Eicher <seiji@anyscale.com >
2026-02-06 12:58:21 -05:00
Wentao Ye
67a746e87f
[Log] Optimize duplicate startup log ( #33944 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-02-06 17:49:56 +00:00
Chauncey
7bec435130
[Bugfix] Fix the issue where tool calling does not work when using fast detokenization with dsv32 ( #33964 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-02-06 09:23:44 -08:00
Eldar Kurtić
5c52644b10
[Docs] Update link to Benchmark CLI documentation ( #33254 )
...
Signed-off-by: Eldar Kurtić <8884008+eldarkurtic@users.noreply.github.com >
2026-02-06 16:00:59 +00:00
zofia
2ce9fe4ad0
[XPU][5/N] add wna16 xpu kernel ( #33973 )
...
Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com >
2026-02-06 15:59:53 +00:00
Cyrus Leung
cd8b405bd0
[Refactor] Consolidate sequence normalization and enc-dec parsing ( #33928 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-06 15:43:47 +00:00
tc-mb
4707f7ebb4
[Model] Support MiniCPM-o 4.5 ( #33431 )
...
Signed-off-by: caitianchi <caitianchi@modelbest.cn >
Signed-off-by: tc-mb <caitianchi@modelbest.cn >
Co-authored-by: mslv <mslv@baai.ac.cn >
2026-02-06 15:29:10 +00:00
Michael Goin
c39ee9ee2b
[Docs] Add sections on process architecture and minimum CPU resources ( #33940 )
...
It seems users can be confused about vLLM's performance when running
with very small amounts of CPU cores available. We are missing a clear
overview of what vLLM's process architecture is, so I added this along with
some diagrams in arch_overview.md, and included a section on CPU resource
recommendations in optimization.md
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-02-06 15:26:43 +00:00
Andreas Karatzas
350ca72c04
[ROCm][AITER] Fix AITER import regression for explicit backend selection ( #33749 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-06 15:08:16 +00:00
FredericOdermatt
1fb0495a72
[FIX] guidance: use max(vocab_size, len(tokenizer)) for n_vocab ( #33509 )
...
Signed-off-by: Frederic Odermatt <frederic.odermatt@44ai.ch >
2026-02-06 14:23:03 +00:00
Raushan Turganbay
85ee1d962b
[Bugfix] Fix models and tests for transformers v5 ( #33977 )
...
Signed-off-by: raushan <raushan@huggingface.co >
Signed-off-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-06 21:47:41 +08:00
Harry Mellor
51a7bda625
Update WeightTransferConfig to be more standard like the others ( #33989 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-06 13:15:00 +00:00
SorenDreano
6e7b1c4b59
[Docs] Improve documentation ( #33799 )
...
Co-authored-by: Soren Dreano <soren@numind.ai >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2026-02-06 12:57:09 +00:00
Kurt Shuster
2991dd3d22
[Bugfix][Model] Support LoRA on Qwen3 Output Embedding ( #29816 )
...
Signed-off-by: kurt <kurt@thinkingmachines.ai >
2026-02-06 20:25:31 +08:00
Luka Govedič
ac32e66cf9
[torch.compile] Reorganize vllm/compilation and tests/compile (0/N for vLLM IR) ( #33731 )
...
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
Signed-off-by: ProExpertProg <luka.govedic@gmail.com >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2026-02-06 04:19:49 -08:00
Fadi Arafeh
f79d9dce16
[CPU][BugFix] Fix loading of w8a8int models with bias ( #33582 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
2026-02-06 11:59:20 +00:00
Harry Mellor
ba5cbbf107
Bump HF Hub client to get bug fix ( #33984 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-06 11:25:33 +00:00
zhang-prog
233b26ab35
[PaddleOCR-VL] Add BC for transformers 5.0 config ( #33976 )
...
Signed-off-by: zhangyue66 <zhangyue66@baidu.com >
2026-02-06 10:33:49 +00:00
Harry Mellor
791a94bed0
Consolidate and fix forbidden import pre-commit checks ( #33982 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-06 01:47:41 -08:00
Xinyu Chen
e969a169ef
support view_from_cpu_tensor on XPU ( #33868 )
...
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com >
2026-02-06 08:34:20 +00:00
Harry Mellor
6d8d34be6d
Fix main pre-commit ( #33975 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-06 00:08:05 -08:00
Gassan Salama
1363e3d6d5
[cpu][performance] CPU Paged Attention NEON BFMMLA BF16 Implementation ( #32263 )
...
Signed-off-by: Gassan <gassan.salama@arm.com >
2026-02-06 15:01:48 +08:00
chengchengpei
965525667b
Onboard voyage-4-nano ( #33720 )
...
Signed-off-by: Chengcheng Pei <chengchengpei@outlook.com >
Signed-off-by: chengchengpei <5881383+chengchengpei@users.noreply.github.com >
Co-authored-by: chengchengpei <5881383+chengchengpei@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-06 06:23:34 +00:00
sihao_li
6550815c3a
[XPU]Replace pip in docker.xpu with uv pip ( #31112 )
...
Signed-off-by: sihao.li <sihao.li@intel.com >
2026-02-06 14:02:33 +08:00
Kunshang Ji
7439e4f41b
[XPU][4/N] add mxfp4 moe model support ( #33679 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2026-02-06 13:03:59 +08:00
R3hankhan
ac04dd374f
[CPU] Add BF16 Kernel type for s390x ( #33788 )
...
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com >
2026-02-06 04:57:02 +00:00
Cyrus Leung
035a6cb09a
[Misc] Update code for encoder-decoder models ( #33900 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-06 11:38:39 +08:00
Mingliang Li
a32cb49b60
feat(frontend): early-fail tokenization guard for user requests ( #31366 )
...
Signed-off-by: limingliang <limingliang@stepfun.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: limingliang <limingliang@stepfun.com >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-05 19:38:02 -08:00
Rabi Mishra
20d7454c9b
fix(ROCm): Make flash_attn import optional in MLA attention ( #33511 )
...
Signed-off-by: rabi <ramishra@redhat.com >
2026-02-06 02:22:53 +00:00
Simon Mo
5819ca8944
[Docs] Add reo analytics ( #33957 )
...
Signed-off-by: simon-mo <simon.mo@hey.com >
2026-02-05 17:42:22 -08:00
Xin Yang
79028d4388
[Perf] Disable clean_logits in deepgemm fp8_mqa_logits kernel ( #33568 )
2026-02-05 20:34:00 -05:00
emricksini-h
325ab6b0a8
[Feature] OTEL tracing during loading ( #31162 )
2026-02-05 16:59:28 -08:00
Wei Zhao
91a07ff618
[Bugfix] Fix DeepSeek v3.2 tokenizer outputting None issue ( #33832 )
...
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com >
2026-02-05 23:50:49 +00:00
Hashem Hashemi
d5c4800112
Adds padding and perf improvements to wvSplitK_fp8 ( #33527 )
...
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com >
2026-02-05 22:16:02 +00:00
Lumosis
42d5d705f9
[Minor] Sort safetensors files to ensure deterministic loading order ( #33491 )
...
Signed-off-by: Lihao Ran <imlihao.ran@gmail.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
2026-02-05 17:05:09 -05:00
Cyrus Leung
116880a5a0
[Bugfix] Make MM batching more robust ( #33817 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-05 20:40:58 +00:00
Matthew Bonanni
4145e50d85
[Bugfix] Fix DSV3.2 NVFP4 ( #33932 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-02-05 19:22:19 +00:00
Nicolò Lucchesi
20f5d185a6
[Misc] Rename translations to speech_to_text for OAI serving component ( #33904 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-02-05 19:16:52 +00:00
Harry Mellor
1887acca9e
Fix tokenizer test for renamed attr on Transformers v5 ( #33902 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-05 19:16:20 +00:00
Tsukasa OI
92e7562a99
[Bugfix] Suppress non-TTY color output on the process name part of the log ( #29714 )
...
Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com >
2026-02-05 18:47:09 +00:00
Isotr0py
87d0d17ab5
[Models] Consolidate Deepseek-OCR2 processor ( #33909 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-05 18:29:20 +00:00
bnellnm
a57c8228ff
[Moe Refactor] Make Inplace Flag for FusedMoEModularKernel part of the constructor ( #33375 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-02-05 18:07:18 +00:00
zackyoray
1ee95841bd
[Bugfix] Fix swapped engine_ids in NIXL Llama 4 local attention path ( #33795 )
...
Signed-off-by: Yoray Zack <yorayz@nvidia.com >
2026-02-05 17:51:58 +00:00
Nicolò Lucchesi
7d8c6804e2
[Misc] Add debug logs ( #33931 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-02-05 09:42:40 -08:00
Benjamin Chislett
af3162d3aa
[Spec Decode] Unified Parallel Drafting ( #32887 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
2026-02-05 12:37:18 -05:00
danisereb
5b2a9422f0
[BugFix] Fix LoRA Fp8 ( #33879 )
...
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com >
2026-02-05 17:25:55 +00:00
Aaron Hao
c1858b7ec8
[Feat][RL][1/2] Native Weight Syncing API: NCCL ( #31943 )
...
Signed-off-by: ahao-anyscale <ahao@anyscale.com >
Signed-off-by: Aaron Hao <ahao@anyscale.com >
Co-authored-by: SumanthRH <sumanthrh99@gmail.com >
2026-02-05 12:13:23 -05:00
Mario Hong
82914d2ae8
[Bugfix] Fix step3p5 parser when using mtp ( #33690 )
...
Signed-off-by: mariohong <mariohong128@gmail.com >
2026-02-05 16:04:04 +00:00
Nicolò Lucchesi
81a90e5277
[Docs] Add bart-plugin to docs ( #33905 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-02-05 12:20:25 +00:00
wang.yuqi
1c3a221d3b
[Bugfix] Fix corner case of sparse embedding ( #33886 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-02-05 02:51:22 -08:00
Cyrus Leung
7bd42e609d
[Refactor] Clean up input preprocessing ( #33687 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-05 18:43:42 +08:00
Isotr0py
a2522839d8
[Bugfix] Fix Kimi-K2.5 NVFP4 checkpoints weight loading ( #33876 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-05 10:29:54 +00:00
jiahanc
59a5cb387a
[perf] Integrate flashinfer concat_mla_k ( #31171 )
2026-02-05 05:23:11 -05:00
liranschour
8322d4e47f
Enable Cross layers KV cache layout at NIXL Connector V2 ( #33339 )
...
Signed-off-by: Liran Schour <lirans@il.ibm.com >
Signed-off-by: liranschour <liranschour@users.noreply.github.com >
Co-authored-by: Or Ozeri <or@ozery.com >
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2026-02-05 02:17:02 -08:00
Andreas Karatzas
3e472e81f9
[ROCm][Bugfix][CI] Fix hybrid models and their tests (Mamba/Jamba/Bamba) ( #32710 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com >
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com >
2026-02-05 10:01:23 +00:00
Cyrus Leung
038914b7c8
[Refactor] Move task outside of PoolingParams.verify ( #33796 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-02-05 09:33:11 +00:00
Pavani Majety
d2f4a71cd5
[Bugfix] Kimi-K2 grouped_topk usage for Flashinfer monolithic kernels. ( #33858 )
...
Signed-off-by: Pavani Majety <pmajety@nvidia.com >
2026-02-05 09:32:10 +00:00
Mark McLoughlin
2abd97592f
[KV Connector][Metrics] Do not count local prefix cache hits in connector queries ( #30522 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2026-02-05 09:57:27 +02:00
Chauncey
6abb0454ad
[Perf] Optimize the performance of structured output + reasoning ( #33557 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-02-05 15:45:29 +08:00
Li, Jiang
db6f71d4c9
[CI/Build] Fix CPU CI test case title ( #33870 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2026-02-05 15:07:14 +08:00
Fadi Arafeh
fd03538bf9
[CPU][BugFix] Allow w8a8 oneDNN quantized matmul to support 3D inputs ( #33727 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
2026-02-05 06:26:09 +00:00
Andreas Karatzas
1f70313e59
[Bugfix] Fix ScoreMultiModalParam multi-document scoring returning single result ( #33837 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-02-05 06:17:00 +00:00
Li, Jiang
07daee132b
[CI/Build] Parallelize CPU CI tests ( #33778 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2026-02-05 13:53:48 +08:00
Andrew Xia
9595afda18
[2/N] move responses/serving _make_response_output_items logic to parser ( #33281 )
...
Signed-off-by: Andrew Xia <axia@fb.com >
Signed-off-by: Andrew Xia <axia@meta.com >
Co-authored-by: Andrew Xia <axia@fb.com >
2026-02-05 13:46:15 +08:00
rasmith
c1395f72cd
[CI][AMD][BugFix] Ensure VLLM_ROCM_USE_AITER is set so test_rocm_aiter_topk.py can run correctly ( #33840 )
...
Signed-off-by: Randall Smith <Randall.Smith@amd.com >
2026-02-05 05:05:48 +00:00
rinbaro
007b183d74
[docs] fix unintentional misspellings ( #33863 )
...
Signed-off-by: rinbaro <ilgomishra@gmail.com >
2026-02-04 20:50:59 -08:00
Nick Hill
add9f1fbd9
[Minor] Include StreamingInput in inputs package ( #33856 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-02-05 04:38:20 +00:00
Luka Govedič
e3bf79ffa0
Revert "[Attention][FA3] Update FA3 to include new swizzle optimization" ( #33841 )
2026-02-04 19:54:27 -08:00
Andreas Karatzas
fb1270f1f8
[CI][Bugfix]: return McpCall for built-in MCP tools in non-streaming mode ( #32762 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-05 11:14:06 +08:00
Kevin H. Luu
72bb24e2db
[release] Minor fixes to release annotation ( #33849 )
...
Signed-off-by: Kevin H. Luu <khluu000@gmail.com >
2026-02-05 02:07:35 +00:00
Chauncey
a7be77beef
[Bugfix] fix DeepSeek R1 with CUTLASS MLA Broken on B200 ( #33637 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-02-05 01:28:36 +00:00