maang
|
a34abc49b7
|
[FixBug] Improve exception string in tensorizer.py (#31680)
Signed-off-by: maang <maang_h@163.com>
Signed-off-by: maang-h <55082429+maang-h@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-01-11 05:01:53 -08:00 |
|
rongfu.leng
|
d70249e2e9
|
[Misc] fix this log format not space (#32112)
Signed-off-by: lengrongfu <lenronfu@gmail.com>
|
2026-01-11 05:01:16 -08:00 |
|
Cyrus Leung
|
a374532111
|
[CI/Build] Separate out flaky responses API tests (#32110)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-11 05:01:12 -08:00 |
|
Isotr0py
|
cee7436a26
|
[Misc] Make scipy as optional audio/benchmark dependency (#32096)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-01-11 00:18:57 -08:00 |
|
Or Ozeri
|
4c16ba617f
|
[KVConnector] OffloadingConnector: Fix bug in handling of preemptions (#29870)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2026-01-11 08:05:36 +00:00 |
|
Matt
|
bde57ab2ed
|
[Hardware][AMD][CI][Bugfix] Fix AMD Quantization test group (#31713)
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
|
2026-01-10 23:19:46 -08:00 |
|
Fadi Arafeh
|
9103ed1696
|
[CPU][BugFix] Disable AOT Compile for CPU (#32037)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
|
2026-01-10 23:15:49 -08:00 |
|
Laith Sakka
|
46eb30f519
|
make assume_32_bit_indexing configurable (#32044)
Signed-off-by: Laith Sakka <lsakka@meta.com>
|
2026-01-10 23:15:46 -08:00 |
|
Andy Liu
|
0dd63639be
|
[MTP][GLM][Bugfix] Fixed .weight_scale loading logic that dropped MTP prediction accuracy with fp8+mtp (#32101)
Signed-off-by: Andy Liu <andyliu@roblox.com>
|
2026-01-10 23:14:54 -08:00 |
|
Cyrus Leung
|
ef96fa3f1f
|
[Benchmark][2/2] Use spline interpolation to tune SLA variables (#32095)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-10 20:27:27 -08:00 |
|
Or Ozeri
|
2a4dbe24ea
|
[BugFix] Wait for compute before offloading KV to CPU (#31341)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2026-01-10 22:25:08 +00:00 |
|
RickyChen / 陳昭儒
|
8020a60402
|
[Bugfix] Fix Qwen3-VL-Reranker model loading for sequence classification (#32089)
Signed-off-by: rickychen-infinirc <ricky.chen@infinirc.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-01-10 12:40:09 -08:00 |
|
Vadim Gimpelson
|
e15a5ff07b
|
[MISC] Add strict contiguity check for FlashInfer attention tensors (#32008)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Signed-off-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com>
|
2026-01-10 12:40:05 -08:00 |
|
Vensen
|
6ea001cfb7
|
[Bugfix][Quantization] Ensure input contiguity in per_token_quant_int8 (#31637)
Signed-off-by: vensen <vensenmu@gmail.com>
|
2026-01-10 12:40:02 -08:00 |
|
shyeh25
|
1c46dea001
|
Revert "[Kernels][FI] Skip trtllm attention when num_kv_heads=1 (#308… (#31617)
Signed-off-by: shyeh25 <206795756+shyeh25@users.noreply.github.com>
|
2026-01-10 12:39:59 -08:00 |
|
Or Ozeri
|
028599739d
|
[BugFix] scheduler: Fix resuming of preempted requests after async load (#31583)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2026-01-10 12:39:25 -08:00 |
|
gnovack
|
d1fd802fa3
|
fused_moe_kernel - cast accumulator after applying router weights (#32002)
Signed-off-by: gnovack <gnovack@amazon.com>
|
2026-01-11 04:36:45 +08:00 |
|
Xin Yang
|
543c23be78
|
[LoRA][Perf] Improve FusedMoE LoRA performance for small rank (#32019)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2026-01-10 11:04:18 -08:00 |
|
jvlunteren
|
b8bf5c45bb
|
[Kernel] Optimize Sliding Window Attention in 3D Triton Kernel (#31984)
Signed-off-by: Jan van Lunteren <jvl@zurich.ibm.com>
|
2026-01-10 18:13:44 +00:00 |
|
Michael Goin
|
e6c6f2c79d
|
[Quant] Support MXFP4 W4A16 for compressed-tensors dense models (#31926)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
|
2026-01-10 06:44:35 -08:00 |
|
Jeremy Teboul
|
07286ec5a6
|
[Bugfix] Fix integer overflow in Gemma3n audio processing (#31657)
Signed-off-by: Jeremy Teboul <jeremyte@meta.com>
|
2026-01-10 17:52:53 +08:00 |
|
Ning Xie
|
14fc7a68c7
|
[Bugfix] fix offline chat output prompt (#32076)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2026-01-10 07:50:57 +00:00 |
|
Cyrus Leung
|
5f2385a4c8
|
[Benchmark][1/2] Generalize SLA criterion validation from binary flags to margins (#32075)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-10 07:11:03 +00:00 |
|
Frelam
|
a01a1c0d69
|
[Bugfix] fix encoder cache leak of waiting requests in scheduler to solve stuck in CPU scheduling (#31857)
Signed-off-by: frelam <frelam112233@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2026-01-10 06:27:58 +00:00 |
|
Lucas Wilkinson
|
da6709c9fe
|
[Misc] Delay deprecation of CommonAttentionMetadata properties (#32074)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-01-09 21:06:44 -08:00 |
|
Andreas Karatzas
|
d83becd503
|
[ROCm][CI] Fix flaky test_function_calling_with_stream and reduce schema test examples (#32063)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-01-10 05:02:35 +00:00 |
|
roikoren755
|
0c9614876e
|
Update modelopt KV cache quantization resolution to new scheme (#31895)
Signed-off-by: Roi Koren <roik@nvidia.com>
|
2026-01-10 04:54:13 +00:00 |
|
Cyrus Leung
|
583a90e005
|
[Refactor] Separate sequence and token pooling types (#32026)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-10 04:53:24 +00:00 |
|
maang
|
52d428295d
|
[Core] Refactor ColumnParallelLinear: remove unused parameter and optimize forward (#31939)
Signed-off-by: maang <maang_h@163.com>
|
2026-01-10 04:19:49 +00:00 |
|
Kevin McKay
|
c60578de0a
|
[Bugfix][Hardware][AMD] Use dynamic WARP_SIZE in sampler vectorized_process (#31295)
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
|
2026-01-10 03:57:38 +00:00 |
|
PatrykSaffer
|
80fead8bf6
|
Fuse RoPE and MLA KV-cache write (#25774)
Signed-off-by: Patryk Saffer <patryk.saffer99@gmail.com>
Signed-off-by: PatrykSaffer <patryk.saffer@mistral.ai>
Co-authored-by: Patryk Saffer <patryk.saffer99@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2026-01-09 19:18:37 -08:00 |
|
Akshat Shrivastava
|
e45946bd91
|
feature/issac 0.2 (#31550)
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2026-01-10 03:18:05 +00:00 |
|
Lucas Kabela
|
ea6d067a2a
|
[Misc][LLaMa4] Compile LLaMa Vision Encoder (#30709)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
|
2026-01-09 22:01:38 -05:00 |
|
Ning Xie
|
abd9224280
|
resolve pydantic error in startup benchmark (#31348)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2026-01-10 02:41:27 +00:00 |
|
Kevin McKay
|
4dc0d606b7
|
[Bugfix] Narrow broad exceptions in compilation backends (#31616)
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-01-09 21:39:22 -05:00 |
|
Micah Williamson
|
ac0675ff6b
|
[CI] Allow Deprecated Quantization For LM Eval Tests (#32065)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2026-01-09 19:10:47 -07:00 |
|
Wentao Ye
|
e18464a57d
|
[Perf] Optimize async scheduling placeholder using empty (#32056)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-01-10 00:46:11 +00:00 |
|
Russell Bryant
|
1963245ed1
|
[Core] Use weights_only=True with torch.load (#32045)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2026-01-10 00:28:57 +00:00 |
|
Matthew Bonanni
|
0308901975
|
[2/N][Attention] Fix pre-commit errors (#32052)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-01-10 00:27:15 +00:00 |
|
Lucas Kabela
|
aaf4b70aae
|
[Misc][BE] Type coverage for vllm/compilation [2/3] (#31744)
|
2026-01-09 18:30:38 -05:00 |
|
Nick Hill
|
3adffd5b90
|
[Misc] Enable async scheduling by default with spec decoding (#31998)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-01-09 23:09:34 +00:00 |
|
zhrrr
|
97ba96fbe9
|
[perf][async] support non cpu sync get logprob tensors for spec (#31336)
Signed-off-by: izhuhaoran <izhuhaoran@qq.com>
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
|
2026-01-09 21:24:51 +00:00 |
|
Chendi.Xue
|
94578127a4
|
[NIXL] refine decoder side post process for heterogeneous BlockSize and kv_layout (#30275)
|
2026-01-09 21:22:19 +00:00 |
|
Matthew Bonanni
|
2612ba9285
|
[1/N][Attention] Restructure attention: move files (#31916)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-01-09 13:10:24 -08:00 |
|
Andrew Xia
|
1f8b7c536b
|
[responsesAPI] fix incomplete_messages for simple/parsable context (#31836)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
|
2026-01-09 21:00:57 +00:00 |
|
Lucas Wilkinson
|
0a0aa07747
|
[Quant] Make static quant support all group shapes (#30833)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-01-09 12:49:27 -08:00 |
|
jiahanc
|
f9e2a75a1e
|
[fix] add cutedsl to global sf (#32001)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
|
2026-01-09 12:03:02 -08:00 |
|
Runkai Tao
|
a4d5d663e2
|
Add unpermute-aware fused MoE path and small-batch fallback (#29354)
Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-01-09 12:58:39 -07:00 |
|
Jeremy Teboul
|
657e9c0e18
|
[Fix] Introduce audio channels spec (#31595)
Signed-off-by: Jeremy Teboul <jeremyte@meta.com>
|
2026-01-09 19:34:51 +00:00 |
|
Wentao Ye
|
308feab33f
|
[Perf] Optimize cutlass moe problem size calculation, 5.3% E2E Throughput improvement, 2.2% TTFT improvement (#31830)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-01-09 11:13:43 -08:00 |
|