Qiu | 9b24cf6f47 | 2025-11-15 21:54:19 -08:00
[bugfix] correct local_chunk_len for DCP in reorg_kvcache with long context (#28526)
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
(cherry picked from commit 968060c15a)

Nick Hill | facbc2c21e | 2025-11-15 21:54:15 -08:00
[BugFix] Ensure EngineArgs.create_engine_config is idempotent (#28515)
Signed-off-by: Nick Hill <nhill@redhat.com>
(cherry picked from commit 327c0a9a23)

Roger Wang | e2fd9a2edf | 2025-11-15 21:54:05 -08:00
[Misc] Turn off encoder torch compile by default (#28634)
Signed-off-by: Roger Wang <hey@rogerw.io>
(cherry picked from commit d3387750f1)

Huy Do | 1326f17492 | 2025-11-15 21:53:04 -08:00
Use official xformers-0.0.33 built for PT 2.9 (#28600)
Signed-off-by: Huy Do <huydhn@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
(cherry picked from commit c33b87e777)

Harry Mellor | caf412e593 | 2025-11-15 21:52:58 -08:00
Skip models that cannot currently init on Transformers v5 (#28471)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
(cherry picked from commit 51c599f0ec)

Harry Mellor | a035b5cffb | 2025-11-15 21:52:46 -08:00
[CI] Skip "Multi-Modal Models Test (Extended) 3" test that's broken in current Transformers (#28559)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
(cherry picked from commit a39dd7bb06)

Harry Mellor | 5b4dcecdd7 | 2025-11-15 21:48:13 -08:00
Remove deprecated fields from CompilationConfig (#27593)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
(cherry picked from commit a742134cc5)

Isotr0py | 609bb244bd | 2025-11-15 21:44:19 -08:00
[Performance] Cache loaded custom logitsprocs to avoid overheads (#28462)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
(cherry picked from commit 3f770f4427)

Roger Wang | 3a9ea77c35 | 2025-11-15 21:44:19 -08:00
[Bugfix] Fix max image size for PaddleOCR-VL (#28442)
Signed-off-by: Roger Wang <hey@rogerw.io>
(cherry picked from commit 4fd4b743a2)

Robert Shaw | 28a82bb5e6 | 2025-11-15 21:44:19 -08:00
[Bugfix] Fix Stream Sync for Shared Expert Overlap (#28430)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
Co-authored-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
(cherry picked from commit e605e8e323)

Michael Goin | 2a21f3e7c2 | 2025-11-15 21:36:19 -08:00
Only register rocm_aiter_ops if aiter is found (#28428)
Signed-off-by: mgoin <mgoin64@gmail.com>
(cherry picked from commit f2d9ad0620)

Lucas Wilkinson | ab625ba2fc | 2025-11-15 21:36:19 -08:00
[CI/Test Fix] Fix CP tests on Blackwell (#28404)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
(cherry picked from commit 39029d5192)

Wentao Ye | 324c8cbd79 | 2025-11-15 21:35:58 -08:00
[Feature] Refactor batch invariant fp8 DeepGEMM (#27606)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
(cherry picked from commit 35d801f13f)

Adrian Abeyta | 75ecaf48fe | 2025-11-15 21:33:58 -08:00
[Bugfix] Ensure calculated KV scales are applied in attention. (#27232)
Signed-off-by: adabeyta <aabeyta@redhat.com>
(cherry picked from commit a5a790eea6)

Robert Shaw | 30700b1cd7 | 2025-11-10 22:36:11 +00:00
[CI] Fix Plugin Tests Tests (#28413)
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
Tag: v0.11.1rc6

Andrew Xia | 4b94ed8f92 | 2025-11-10 14:07:49 -08:00
[Frontend][2/n] remove empty content from _parse_tool_calls_from_content (#28331)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>

Lucas Wilkinson | 6dec9f6109 | 2025-11-10 17:01:17 -05:00
[BugFix] Fix DeepGEMM over-allocating workspace (#28254)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

Wei Wei | bf6a3d0ff5 | 2025-11-10 21:03:21 +00:00
[Misc] Add more scoping for improved trace (#28329)
Signed-off-by: Wei Wei <wwei6@meta.com>

Sage Moore | 40d33264c6 | 2025-11-10 20:39:19 +00:00
[Bugfix][EPLB] Disabled shared expert overlap when EPLB is enabled (#28377)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: Sage Moore <sagemoore@utexas.edu>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>

Jonas M. Kübler | 9c84ca8293 | 2025-11-10 12:06:04 -08:00
[FA/Chore] Bump FA version for FP8 two-level accumulation (#27889)
Signed-off-by: Jonas Kuebler <kuebj@amazon.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>

Rémi Delacourt | 6d54336ae5 | 2025-11-10 14:53:32 -05:00
[Bugfix] Fix llguidance backend, rollback when EOS was encountered (#25905)
Signed-off-by: Rémi Delacourt <remi@mistral.ai>
Signed-off-by: remi <remi@mistral.ai>
Co-authored-by: Russell Bryant <rbryant@redhat.com>

jiahanc | 34553b9d27 | 2025-11-10 12:34:57 -05:00
[Performance] Support FP8 flashinfer TRTLLM MOE on Qwen3 and Qwen-3next (#27492)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>

Varun Sundar Rabindranath | b039bfda8f | 2025-11-10 09:21:52 -08:00
[Bugfix] Fix persistent_masked_m_silu_mul_quant tests (#28366)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>

Cyrus Leung | d0e186c16f | 2025-11-11 00:30:06 +08:00
[V0 Deprecation] Remove unused context_len and seq_len from M-RoPE (#28395)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

vllmellm | f080a83511 | 2025-11-10 08:20:53 -08:00
[RFC][ROCm][AITER] Keep all AITER kernels in _aiter_ops class like _custom_ops and _ipex_ops (#24490)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>

caozuoba | 40e2eeeb92 | 2025-11-10 16:03:46 +00:00
[Kernel] Optimization of the mm_k operator. (#28280)
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>

zejunchen-zejun | b06b9470ca | 2025-11-10 10:38:56 -05:00
[Rocm][fused_moe][fp4] view weight to torch.float4_e2m1fn_x2 when running aiter fused moe for fp4 model (#27474)
Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>

TJian | 4673e465ff | 2025-11-10 21:39:17 +08:00
Add @tjtanaa to codeowner for ROCm and multi-modal (#28360)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>

Ferrebo | 912744d066 | 2025-11-10 13:23:49 +00:00
[Fix] optimize visual token mask with caching and multi-token support (#28374)
Signed-off-by: Ferrebo <itachi971009@gmail.com>
Signed-off-by: kebo01 <kebo01@baidu.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Yu Jiaqi | 15be507c86 | 2025-11-10 21:21:15 +08:00
[bugfix] fix siglip batch text output error (#28365)
Signed-off-by: piood <2477084691@qq.com>

Mark McLoughlin | 6f7de33bed | 2025-11-10 16:34:36 +08:00
[Metrics] Refactor LoRA state tracking (#26801)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>

Shinichi Hemmi | a98cc35c34 | 2025-11-10 06:50:02 +00:00
Restore PlaMo2 unit test as pfnet/plamo-2-1b now supports transformers >=4.56 (#28019)
Signed-off-by: Shinichi Hemmi <50256998+Alnusjaponica@users.noreply.github.com>

Lucas Wilkinson | e8697faf03 | 2025-11-10 14:32:09 +08:00
[V0 deprecation] Remove no longer used get_metadata_cls (#28370)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

Xiake Sun | 03fa4d3fb3 | 2025-11-10 04:53:40 +00:00
[Hardware][AMD][Model] Add Triton MoE tuning support and optimized configs for Qwen3 omni for MI308X (#28373)
Signed-off-by: Xiake Sun <xiake.sun@amd.com>
Signed-off-by: Xiake Sun <xisun@amd.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Varun Sundar Rabindranath | 6b2b9fd934 | 2025-11-10 10:45:29 +08:00
[CI] lora/test_mixtral.py : Add additional expected outputs due to flakiness (#28322)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>

JartX | c5f685b3ae | 2025-11-09 23:09:36 +00:00
[ROCm][Platform] Add RX7900XTX device id in _ROCM_DEVICE_ID_NAME_MAP (#28279)
Signed-off-by: JartX <sagformas@epdcenter.es>

Jiangyun Zhu | c4768dcf47 | 2025-11-09 14:26:35 -07:00
[Kernel] Fix fused_gdn_gating (#28343)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>

Zhewen Li | a65a934ebe | 2025-11-09 21:08:38 +00:00
[CI/Build] Temporary fix to LM Eval Small Models (#28324)
Signed-off-by: zhewenli <zhewenli@meta.com>

usberkeley | 4a8d6bd168 | 2025-11-09 19:11:46 +00:00
Fix cu_num_generated_tokens slicing logic in LogprobsLists.slice() method (#28214)
Signed-off-by: Bradley <bradley.b.pitt@gmail.com>

Lucas Wilkinson | 636efd10a5 | 2025-11-09 13:51:43 -05:00
[Core] Separate out attention metadata building logic from prepare inputs (#26764)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

Nick Hill | 289eb6c537 | 2025-11-09 09:44:13 -08:00
[Core] Simplify async KV output aggregation (#28327)
Signed-off-by: Nick Hill <nhill@redhat.com>

Nicolò Lucchesi | 19d91ece4b | 2025-11-09 16:04:59 +00:00
[CI] Fix flaky test_eagle_correctness test (#28364)
Signed-off-by: NickLucche <nlucches@redhat.com>

Jiangyun Zhu | 7ae5a5fb11 | 2025-11-08 23:59:24 -08:00
[Misc] Add some comments in qwen3-next (#28267)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>

Yong Hoon Shin | de2b78305f | 2025-11-08 22:27:00 -08:00
[ROCm] Add env to enable/disable aiter triton gemm (#28321)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>

Ning Xie | e5e9067e61 | 2025-11-09 05:33:46 +00:00
[Misc] fix typo and add detailed log (#28178)
Signed-off-by: Andy Xie <andy.xning@gmail.com>

yihong | 3a7d580343 | 2025-11-09 05:07:26 +00:00
fix: close issue 28338 by fixed python version (#28339)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>

Kevin H. Luu | 05f8d69077 | 2025-11-09 01:58:26 +00:00
[chore] Move some wikimedia images to S3 (#28351)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>

Mohammad Miadh Angkad | 404d7a9d14 | 2025-11-08 15:50:10 -07:00
[Performance][gpt-oss] Revert gpt-oss max cudagraph size to 1024 (#28345)
Signed-off-by: Mohammad Miadh Angkad <MAngkad.BSDSBA2027@aim.edu>

ElizaWszola | 171133f929 | 2025-11-08 14:31:33 -08:00
[Bugfix] Fix test fused quant layernorm tests (#27865)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>

Cole Murray | 32787d0644 | 2025-11-08 22:30:18 +00:00
Remove setuptools upper bound constraint (<80) (#28337)
Signed-off-by: Cole Murray <colemurray.cs@gmail.com>