Cyrus Leung
|
646b85544b
|
[Refactor] Remove Molmo2 processor wrapper (#36667)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-11 03:07:20 -07:00 |
|
tc-mb
|
4286cc5ec2
|
fix(minicpmv): fix audio inference by handling meta device in init_re… (#36751)
Signed-off-by: caitianchi <caitianchi@modelbest.cn>
|
2026-03-11 03:06:28 -07:00 |
|
LoganJane
|
545d18d81b
|
[Bugfix] Support other quantization methods in glm41v (#36321)
Signed-off-by: g00887675/loganJane <g00887675/loganJane73@hotmail.com>
Co-authored-by: g00887675/loganJane <g00887675/loganJane73@hotmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-11 09:48:05 +00:00 |
|
roikoren755
|
e661b9ee83
|
[NemotronH] Small fix reasoning parser (#36635)
Signed-off-by: Roi Koren <roik@nvidia.com>
|
2026-03-11 02:44:41 -07:00 |
|
YiSheng5
|
c910eeb125
|
[XPU] Bug fix for some unexpected error when using AgRs backend on XPU device. (#36593)
Signed-off-by: yisheng <yi.sheng@intel.com>
|
2026-03-11 09:17:46 +00:00 |
|
Harry Mellor
|
f4ae58b38b
|
Remove unused config field from Gemma2 (#36672)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-11 01:51:19 -07:00 |
|
Isotr0py
|
e568cf88bc
|
[UX] Infer dtype for local checkpoint (#36218)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-11 08:50:04 +00:00 |
|
Nicolò Lucchesi
|
098d844731
|
[NIXL][1/N] Refactor kernel_block_size detection (#35752)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-03-11 01:11:23 -07:00 |
|
JartX
|
a40ee486f2
|
[Bugfix] Add Multiple of 16 block_size to triton fallback on rocm Attention to support qwen3_5 (#35923)
Signed-off-by: JartX <sagformas@epdcenter.es>
Co-authored-by: akaratza <akaratza@amd.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
|
2026-03-11 07:45:57 +00:00 |
|
pschlan-amd
|
eac2dc2b41
|
AITER MLA backend: Avoid CPU sync in _build_decode (#35765)
Signed-off-by: Patrick Schlangen <pschlan@amd.com>
|
2026-03-11 07:25:00 +00:00 |
|
Flora Feng
|
d5080aeaa4
|
[Refactor] Remove deadcode in Responses API serving (#36726)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-11 07:11:41 +00:00 |
|
liuzhenwei
|
f22d6e0267
|
[Hardware][NIXL] set default kv buffer type for different platform (#36438)
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-11 05:19:28 +00:00 |
|
Kunshang Ji
|
76c6e6da08
|
[XPU] Support block fp8 moe by fallback to TritonExpert on XPU (#36458)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-10 21:54:09 -07:00 |
|
typer-J
|
4184653775
|
feat: add RISC-V support for CPU backend (v2) (#36578)
Signed-off-by: typer-J <2236066784@qq.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
|
2026-03-10 21:51:39 -07:00 |
|
Sladyn
|
4aaaf8c8ce
|
feat(spec_decode): fuse EAGLE step slot mapping and metadata updates (#33503)
Signed-off-by: sladynnunes <snunes@usc.edu>
|
2026-03-11 04:35:33 +00:00 |
|
Hongbin Guo
|
4bf533623b
|
[Doc] Fix duplicate words in comments (#36713)
Signed-off-by: Hongbin10 <jdmjdm1998@163.com>
|
2026-03-10 21:28:31 -07:00 |
|
Matthew Bonanni
|
5f77ef15ae
|
[Misc][Attention] Clean up unused method in CPU_ATTN (#36673)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-03-10 21:27:22 -07:00 |
|
elvischenv
|
7d6abdd022
|
[Fix] Use torch.empty for output in attention+quant fusion (#31785)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
|
2026-03-10 21:26:14 -07:00 |
|
Wentao Ye
|
a8ff2cca92
|
[Perf] Optimize scheduler overhead for PD disaggregation, around 5% E2E perf improvement (#35781)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Or Ozeri <oro@il.ibm.com>
|
2026-03-10 21:25:30 -07:00 |
|
tunglinwood
|
42fadebecb
|
[Model] Add support for moonshotai/Kimi-Audio-7B-Instruct (#36127)
Signed-off-by: tunglinwood <tunglinwood@gmail.com>
Signed-off-by: tunglinwood <tomwu.tunglin@gmail.com>
Signed-off-by: tunglinwood <113751333+tunglinwood@users.noreply.github.com>
|
2026-03-10 21:24:48 -07:00 |
|
tianshu-Michael-yu
|
a197eda9c3
|
Add tuned H100 MoE configs for LFM2 8B and 24B (#36699)
|
2026-03-10 21:22:02 -07:00 |
|
Kevin H. Luu
|
82b110d50e
|
[ci] Bound nvidia-cudnn-frontend version (#36719)
Signed-off-by: khluu <khluu000@gmail.com>
|
2026-03-11 12:17:35 +08:00 |
|
Benjamin Chislett
|
9040cd40af
|
[DSV3.2][MTP] Optimize Indexer MTP handling (#36723)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
|
2026-03-11 12:16:56 +08:00 |
|
fangyuchu
|
fa0d353acf
|
[Bugfix] Surface exceptions from non-blocking execute_model in UniProcExecutor to avoid DP deadlocks (#35194)
Signed-off-by: fangyuchu <fangyuchu@qq.com>
|
2026-03-11 03:22:21 +00:00 |
|
Augusto Yao
|
b386bb3d7c
|
fix bugs when token_classify & classify run concurrently (#36614)
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>
|
2026-03-10 20:16:34 -07:00 |
|
Ning Xie
|
fe714dd507
|
[openapi server] log exception in exception handler(2/N) (#36201)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2026-03-10 20:16:30 -07:00 |
|
Matthew Bonanni
|
8ab3d7427c
|
[Bugfix] Fix DeepSeek V3.2 OOM during CG memory profiling (#36691)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-03-11 03:01:07 +00:00 |
|
Wei Zhao
|
84e436ed1c
|
[Bug] Fix TRTLLM Block FP8 MoE Monolithic (#36296)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-03-10 22:04:47 -04:00 |
|
Andreas Karatzas
|
81939e7733
|
[ROCm][CI] Making some tests optional to reduce workload (#36090)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-10 16:45:27 -07:00 |
|
Woosuk Kwon
|
195d1ca3e8
|
[Minor] Enhance error message for TRTLLM decode uniformity check (#36609)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-03-10 15:38:45 -07:00 |
|
Nick Hill
|
8d983d7cd6
|
[Model Runner V2] Add initial CI tests (#36041)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-03-10 14:55:21 -07:00 |
|
Nick Hill
|
65b2f405dc
|
[Core] Simplify core kv-cache blocks initialization logic (#36521)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-03-10 20:20:02 +00:00 |
|
Nick Hill
|
2a68464c5b
|
[Test] test_async_scheduling.py improvements (#36340)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-03-10 11:17:26 -07:00 |
|
Zhengxu Chen
|
bdd8981dab
|
[compile] Apply stored functorch config while finalizing loaded artifacts. (#36582)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
|
2026-03-10 09:34:35 -07:00 |
|
Woosuk Kwon
|
f088a831dd
|
[Model Runner V2] Use unpadded num_tokens for PW CUDA graph attn metadata (#36626)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-03-10 09:30:56 -07:00 |
|
Harry Mellor
|
f83b933b84
|
[CI] Bump mypy version to 1.19.1 (#36104)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Tag: v0.17.1rc0
|
2026-03-10 09:18:28 -07:00 |
|
Pleaplusone
|
82f3f30e26
|
[ROCm][Perf] Enable sparse_mla's cudagraph on ROCm platform (#35719)
Signed-off-by: ganyi <ygan@amd.com>
|
2026-03-10 09:14:35 -07:00 |
|
Matthew Bonanni
|
9095cbbfb6
|
[Bugfix][Sparse MLA] report indexer CG support properly (#36519)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-03-10 09:14:31 -07:00 |
|
Hashem Hashemi
|
721ae79f50
|
Improvements to wvSplitKrc skinny GEMM solution (#34304)
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>
|
2026-03-10 09:14:27 -07:00 |
|
AllenDou
|
aefc59f088
|
FunASR model bugfix (#36633)
Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com>
Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com>
|
2026-03-10 08:14:21 -07:00 |
|
Harry Mellor
|
d88f28da05
|
Fix hf_override_fn when it modifies model_type (#35200)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-10 15:03:18 +00:00 |
|
Srinivasoo7
|
106ff69c4e
|
feat(kv-offload): Strategy A — StoreReusedOffloadingManager gates CPU stores on reuse frequency (#35342)
Signed-off-by: srinivas_oo7 <Sriusa4414@gmail.com>
Signed-off-by: Sriusa4414@gmail.com
Signed-off-by: Srinivasoo7 <158864704+Srinivasoo7@users.noreply.github.com>
Co-authored-by: srinivas_oo7 <sklinkedin0120@gmail.com>
Co-authored-by: Srinivasoo7 <158864704+Srinivasoo7@users.noreply.github.com>
Co-authored-by: Or Ozeri <oro@il.ibm.com>
|
2026-03-10 14:43:40 +00:00 |
|
Jiangyun Zhu
|
ca5fb4bbd8
|
[Bugfix] Avoid merging empty-only partitions into splitting-op subgraphs (#36595)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2026-03-10 07:39:01 -07:00 |
|
Alvin Tang
|
cf88b23749
|
fix: check HTTP status in batch read_file to prevent silent failures (#36397)
Signed-off-by: gambletan <ethanchang32@gmail.com>
Co-authored-by: gambletan <ethanchang32@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-10 07:22:40 -07:00 |
|
wang.yuqi
|
a3189a08b0
|
[Model] Consolidate score logic by introduce score_type (#36479)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-03-10 13:32:25 +00:00 |
|
SoluMilken
|
409c4e632d
|
[Misc] fix typo: homogenous-> homogeneous (2 lines change) (#36508)
Signed-off-by: SoluMilken <ypiheyn.imm02g@g2.nctu.edu.tw>
|
2026-03-10 06:25:37 -07:00 |
|
Raushan Turganbay
|
8850738b70
|
[Bugfix] Fix processor signature (#36630)
Signed-off-by: raushan <raushan@huggingface.co>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-10 06:20:47 -07:00 |
|
Mark McLoughlin
|
234860399b
|
[Frontend][Core] Revert "Add shutdown timeout" (#34730 and #36270) (#36628)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2026-03-10 06:20:41 -07:00 |
|
Harry Mellor
|
c88510083b
|
Fix Qwen2.5-VL test for Transformers v5 (#36532)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-10 12:05:34 +00:00 |
|
Vadim Gimpelson
|
4ff8c3c8f9
|
[BUGFIX][Mamba][Qwen3.5] Zero freed SSM cache blocks on GPU (#35219)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2026-03-10 03:32:20 -07:00 |
|