Mads Kildegård
|
ea38474ac5
|
[Frontend][Responses API] Multi-turn (with type: "output_text") support for non-harmony requests (#29175)
Signed-off-by: Mads Kildegård <mkildegaard99@gmail.com>
|
2025-11-22 09:58:22 +00:00 |
|
Andrew Xia
|
742e9ff6b3
|
[responsesAPI] parse reasoning item input (#28248)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-11-22 15:42:11 +08:00 |
|
Woosuk Kwon
|
e9056056fb
|
[Model Runner V2] Limit cudagraph size to max decode batch size (#29221)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-11-21 20:21:35 -08:00 |
|
Jee Jee Li
|
1489902b53
|
[LoRA] Cleanup FusedMoEWithLoRA (#29187)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-11-22 04:01:30 +00:00 |
|
Yanan Cao
|
933f67ecd8
|
[Bugfix] Fix a conditional to not check zero value (#28754)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
|
2025-11-21 19:59:07 -08:00 |
|
Yihua Cheng
|
77e1c035d0
|
[chore][LMCache connector] Remove useless logs from lmcache connector (#29069)
Signed-off-by: ApostaC <yihua98@uchicago.edu>
|
2025-11-22 03:18:00 +00:00 |
|
FlintyLemming
|
052950e5b3
|
Add fused MoE config for H200 E160 N192 fp8 (#29182)
Signed-off-by: FlintyLemming <admin@flinty.moe>
|
2025-11-21 17:37:51 -08:00 |
|
Jie Luo
|
5c8f2adf50
|
[Bugfix] Fix block size in block_table with PCP (#29094)
Signed-off-by: Livinfly <luojie3m@gmail.com>
|
2025-11-22 01:34:28 +00:00 |
|
Lukas Geiger
|
d045e22dfe
|
[Model][Qwen3VL] Tune Triton w8a8 block fp8 kernel for L40s (#29217)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2025-11-21 17:30:55 -08:00 |
|
Wentao Ye
|
1d34eb11e0
|
[CI] Bug: Fix triton import issue (#29202)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-11-21 17:14:49 -08:00 |
|
Lucas Wilkinson
|
30d6466238
|
[BugFix] Fix Eagle IndexError: list index out of range for even num_speculative_tokens (#29102)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-11-22 00:47:05 +00:00 |
|
Woosuk Kwon
|
e9af6ba62a
|
[Model Runner V2] Optimize Gumbel Sampling Kernel (#29210)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-11-21 15:52:28 -08:00 |
|
Mark McLoughlin
|
c6fa3895e9
|
[KV Connector] Fix async connector prefix cache metrics (#28585)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2025-11-21 17:45:00 -05:00 |
|
Varun Sundar Rabindranath
|
3137991f55
|
[BugFix] EPLB + B200 + DeepGEMM : Handle column-major scales tensor (#29162)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-11-21 14:28:17 -08:00 |
|
Julien Denize
|
57430fc95c
|
Default model load/config/tokenizer to mistral format if relevant files exist (#28659)
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-11-21 13:58:59 -08:00 |
|
Ning Xie
|
53a1ba6ec5
|
[log] add weights loading time log to sharded_state loader (#28628)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-11-21 21:06:09 +00:00 |
|
Lucas Wilkinson
|
1840c5cb18
|
[BugFix] Make sure to allocate worst case MoE workspace during profile run in the DP + EP case (#27426)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-11-21 11:41:52 -08:00 |
|
Woosuk Kwon
|
1bed891f72
|
[Chore] Fix pre-commit error after #25266 (#29190)
|
2025-11-21 10:21:40 -08:00 |
|
Cyrus Leung
|
ceca060501
|
[Deprecation] Deprecate seed=None (#29185)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-21 18:19:25 +00:00 |
|
Chendi.Xue
|
460d02a417
|
[NIXL] Fix after virtual block_size for host_buffer with heter kv_layout (#29122)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
|
2025-11-21 08:55:27 -08:00 |
|
Mingyuan Ma
|
b4c8fbaae2
|
Add TRTLLM MoE NVFP4 kernel to CompressedTensorsW4A4MoeMethod (#28892)
Signed-off-by: mingyuanm <mingyuanm@nvidia.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-11-21 09:54:11 -07:00 |
|
rasmith
|
e99e467384
|
[CI/Build][Kernel][AMD] Move extra dim to after load in _fwd_kv_parallel in lighting_attn.py (#29132)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
|
2025-11-21 11:53:09 -05:00 |
|
Wentao Ye
|
a42ab317ac
|
[Log] Optimize startup log (#28948)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-11-21 08:46:20 -08:00 |
|
Aleksandr Malyshev
|
b7f1f490a6
|
Upstream triton fp4 weight preshuffle (#28888)
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
|
2025-11-21 11:34:46 -05:00 |
|
Woosuk Kwon
|
30b44a1598
|
GPU Model Runner V2 (#25266)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-11-21 08:20:55 -08:00 |
|
Cyrus Leung
|
d7219bcda3
|
[Misc] Move dynamic seed initialization to EngineArgs (#29165)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-21 15:27:44 +00:00 |
|
wangxiyuan
|
4050bae417
|
[Doc] Update plugin doc (#28532)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2025-11-21 14:57:26 +00:00 |
|
Julien Denize
|
434f3d3eb8
|
Fix mistral config (#29172)
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
|
2025-11-21 14:01:20 +00:00 |
|
sfbemerk
|
2092ce8c39
|
Tool Call Parser logs should not contain user input / model output except on DEBUG (#29160)
Signed-off-by: Benjamin Merkel <benjamin.merkel@tngtech.com>
Co-authored-by: Benjamin Merkel <benjamin.merkel@tngtech.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2025-11-21 20:57:19 +08:00 |
|
who who who
|
fc9f821d20
|
fix cross attention (#28346)
Signed-off-by: fsx950223 <fsx950223@outlook.com>
|
2025-11-21 04:55:43 -08:00 |
|
Russell Bryant
|
cca2d2cdbe
|
[Core] Align whisper closer to other multimodal models (#27292)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-11-21 12:01:54 +00:00 |
|
Cyrus Leung
|
aab0102a26
|
[V0 deprecation] Remove more V0 references (#29088)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-21 11:56:59 +00:00 |
|
Huamin Li
|
8ac3a41487
|
[CI Failure] Fix Gemma3 RoPE configuration for sliding attention layers (#29111)
Signed-off-by: Huamin Li <3ericli@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-11-20 23:53:30 -08:00 |
|
Canlin Guo
|
7d6da483b0
|
[Minor][Clean] Remove the legacy assertion in video (#29150)
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
|
2025-11-20 23:52:34 -08:00 |
|
Chenheli Hua
|
e4c3182c68
|
[Small] Capture AttributeError when checking ray dependency. (#29024)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
|
2025-11-20 22:54:10 -08:00 |
|
Alex Brooks
|
b4734b9550
|
[Bugfix] Fix default MM LoRA alignment for single str prompts (#29140)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
|
2025-11-21 13:32:30 +08:00 |
|
Jialin Ouyang
|
30b9c67743
|
Revert "[Redo] #26368 (#28771)" (#29121)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-11-20 21:27:45 -08:00 |
|
Matthew Bonanni
|
11857a00b0
|
[Attention] Add ROCM_AITER_MLA_SPARSE to attention backend registry (#29103)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-11-20 20:24:43 -08:00 |
|
Boyuan Feng
|
8c25f9cfb6
|
[BugFix] skip combo kernel on cpu (#29129)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
|
2025-11-21 11:50:59 +08:00 |
|
Cyrus Leung
|
56e96b37e4
|
[V0 Deprecation] Remove best_of (#29090)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-21 11:40:40 +08:00 |
|
jeremyteboul
|
0730414999
|
[Core] Add audio_embeds support to chat completions (#29059)
Signed-off-by: Jeremy Teboul <jeremyteboul@fb.com>
Co-authored-by: Jeremy Teboul <jeremyteboul@fb.com>
|
2025-11-21 11:39:47 +08:00 |
|
zhrrr
|
a982f5b5ea
|
[kernel][perf] support uncontiguous input for rms_norm kernel (#28103)
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
Signed-off-by: izhuhaoran <izhuhaoran@qq.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-11-20 19:39:09 -08:00 |
|
Cyrus Leung
|
0e741c12e3
|
[Bugfix] Fix Plamo3 rope handling (#29092)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-21 11:38:35 +08:00 |
|
Wentao Ye
|
56669c1f29
|
[CI] Fix mypy for vllm/v1/worker (#29037)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-11-21 11:36:07 +08:00 |
|
Hongxia Yang
|
3f5f36da3f
|
[ROCm] Fix for import when building with upstream triton for gfx1100 for gpt-oss serving (#29127)
Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>
|
2025-11-21 03:30:07 +00:00 |
|
Wentao Ye
|
e1eefa4c40
|
[Bug] Fix torch warning of tf32 usage (#29112)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-11-21 01:54:59 +00:00 |
|
Xiao Li
|
ed6ae1e36a
|
[AITER] [ROCm] Fix crash when loading llama4 model with old aiter version installed, fallback to forward_native implementation (#29124)
Signed-off-by: Xiao Li <ilx@meta.com>
|
2025-11-20 17:54:35 -08:00 |
|
Jee Jee Li
|
9875be6431
|
[LoRA][2/2] Remove LoRA extra vocab (#28545)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-11-21 09:46:43 +08:00 |
|
Wentao Ye
|
df44df0143
|
[Feature] Shared Experts Overlap with FI deepgemm swap kernel, 2.2% throughput improvement and 3.6% TTFT improvement (#28879)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-11-20 18:41:49 -07:00 |
|
Driss Guessous
|
3fd74189db
|
Fixes bench (#29058)
Signed-off-by: drisspg <drisspguessous@gmail.com>
|
2025-11-20 21:21:54 +00:00 |
|