Seokhyun An
|
b337647aa0
|
[Bugfix] Drop empty tool_calls lists to keep assistant replies in chat template (#30648)
Signed-off-by: Seokhyun An <iamseokhyun@gmail.com>
|
2025-12-15 04:21:12 +00:00 |
|
Jee Jee Li
|
a524d1ba0a
|
[Bugfix] Fix deepseek_v32 tokenizer_mode (#30658)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-12-15 04:20:31 +00:00 |
|
Shanshan Shen
|
87b4d1557d
|
[CustomOp][MM] Extract MMEncoderAttention as CustomOp and replace the backend of QwenVisionAttention with it. (#30125)
Signed-off-by: shen-shanshan <467638484@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-12-15 11:13:32 +08:00 |
|
Wenqi Glantz
|
84e23d103d
|
additional protection for CVE-2025-62164 (#30649)
Signed-off-by: Wenqi Glantz <wglantz@nvidia.com>
|
2025-12-15 03:07:10 +00:00 |
|
Shanshan Shen
|
738648fb81
|
[CustomOp] Support object-level enable for CustomOp (#30547)
Signed-off-by: shen-shanshan <467638484@qq.com>
|
2025-12-15 11:02:09 +08:00 |
|
Boyuan Feng
|
917fdae5b2
|
[Log] Skip piecewise cudagraph warn when using full cudagraph (#30657)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
|
2025-12-15 02:49:45 +00:00 |
|
Robert Shaw
|
e2ed238885
|
Revert "[Fix]Load kv-cache dtype from hf_quant_config.json automatically" (#30653)
|
2025-12-14 19:33:41 -05:00 |
|
Or Ozeri
|
174e39ead7
|
CPU KV Offloading: Use more CUDA streams (#29013)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2025-12-14 23:50:45 +00:00 |
|
RioS
|
9ccbf6b692
|
[responsesAPI]add extra body parameters (#30532)
Signed-off-by: Ri0S <aa248424@gmail.com>
|
2025-12-14 19:25:45 +00:00 |
|
Chendi.Xue
|
ae2e503dda
|
[NIXL][BUG FIX] Fix a bug for PD with host_buffer after merging 29665 (#30420)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
|
2025-12-14 15:38:28 +00:00 |
|
Tsukasa OI
|
9e33a1a75b
|
[Model][Quantization] Override HF defaults to GGUF ones (incl. Qwen3 MoE) (#30118)
Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>
|
2025-12-14 15:01:42 +00:00 |
|
Vensen
|
add4b0ca44
|
[Bugfix][benchmarks] Fix input token calculation for rerank benchmark metrics (#30596)
Signed-off-by: vensen <vensenmu@gmail.com>
|
2025-12-14 14:57:15 +00:00 |
|
ZiTian Zhao
|
ae88aada38
|
[Feature]Add EVS (Efficient Video Sampling) Support for Qwen3-VL (#29752)
Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com>
Co-authored-by: deitxfge <huhaibo1990@126.com>
|
2025-12-14 05:24:56 -08:00 |
|
yifant-code
|
5ccf0efa84
|
[Bugfix] Improve error messages in ModelConfig validation (#30213)
Signed-off-by: ytian218 <ytian218@bloomberg.net>
Co-authored-by: ytian218 <ytian218@bloomberg.net>
|
2025-12-14 21:23:37 +08:00 |
|
ElizaWszola
|
994acec0cc
|
[Bugfix] Fix fusion for VL models (#30244)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
|
2025-12-14 21:22:37 +08:00 |
|
zifeitong
|
48b8456ff9
|
[Bugfix] Revert Qwen2-VL part of change in #28271 (#30542)
Signed-off-by: Zifei Tong <zifeitong@gmail.com>
|
2025-12-14 05:20:08 -08:00 |
|
Drew Botwinick
|
5b64ac21f9
|
[Bugfix] Update get_processor_data to use get_all method (#30583)
Signed-off-by: Drew Botwinick <6953152+dbotwinick@users.noreply.github.com>
|
2025-12-14 21:19:20 +08:00 |
|
Bin Bao
|
a8ec486592
|
[Misc] Add a script to benchmark compilation time (#29919)
Signed-off-by: Bin Bao <binbao@meta.com>
|
2025-12-14 13:02:39 +00:00 |
|
tjp_zju
|
6ecc1e411b
|
[Bugfix] fix _get_quant_method of FusedMoE for deepseekV3.2 on non-NV… (#30057)
Signed-off-by: tjp_zju <tanjianpingzju1990@gmail.com>
|
2025-12-14 02:20:51 -08:00 |
|
Shengliang Xu
|
0bb0bae436
|
Nvidia ModelOpt workaround for issue 28072 (#30164)
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Co-authored-by: Pavani Majety <pmajety@nvidia.com>
|
2025-12-14 18:18:31 +08:00 |
|
Johannes F
|
060893654d
|
fix: Update json features supported by xGrammar (#30390)
Signed-off-by: Johannes Flommersfeld <johannes.flommersfeld@tngtech.com>
Signed-off-by: Johannes F <johannesflommersfeld@users.noreply.github.com>
Co-authored-by: Johannes Flommersfeld <johannes.flommersfeld@tngtech.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-12-14 02:16:06 -08:00 |
|
Matthias Gehre
|
e9add129ad
|
[Bugfix] awq_gemm: fix argument order swap (#30364)
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-12-14 18:15:37 +08:00 |
|
Ilya Markov
|
3224ea9915
|
[torch.compile] Add encoder tag for compilation (#30489)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
|
2025-12-14 18:15:11 +08:00 |
|
Lasha Koroshinadze
|
3a20450d31
|
Add AudioFlamingo3 model support (#30539)
Signed-off-by: Lasha <26011196+lashahub@users.noreply.github.com>
Signed-off-by: Lasha Koroshinadze <26011196+lashahub@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-12-14 02:14:55 -08:00 |
|
Didier Durand
|
1a55cfafcb
|
[Doc]: fixing typos in various files (#30540)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Signed-off-by: Didier Durand <2927957+didier-durand@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-12-14 02:14:37 -08:00 |
|
drslark
|
add1b9d3de
|
[main][BugFix] Fixed an accuracy bug of Qwen3-next-MTP when batched inferring (#30632)
Signed-off-by: drslark <slarksblood@qq.com>
|
2025-12-14 01:32:16 -08:00 |
|
Cyrus Leung
|
dcb31196da
|
[Chore] Remove redundant RequestPrompt (#30612)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-14 09:22:37 +00:00 |
|
Laith Sakka
|
f569c654e1
|
enable unbacked with aot_compile (#30462)
Signed-off-by: Laith Sakka <lsakka@meta.com>
|
2025-12-14 08:14:06 +00:00 |
|
Micah Williamson
|
97f2f160fd
|
[ROCm][CI] Add "Qwen3-Next-80B-A3B-Instruct MTP Async EPLB Accuracy Test" Back Into AMD CI (#30590)
Signed-off-by: David Chen <530634352@qq.com>
Signed-off-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com>
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
Co-authored-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-12-14 06:56:26 +00:00 |
|
Kayvan Mivehnejad
|
29f7d97715
|
Improve parse_raw_prompt test cases for invalid input .v2 (#30512)
Signed-off-by: Kayvan Mivehnejad <K.Mivehnejad@gmail.com>
|
2025-12-14 11:18:41 +08:00 |
|
Qier Li
|
dc7fb5bebe
|
[Bug][KVConnector][Metrics] Remove a vacuous assertion breaking external-launcher (#30577)
Co-authored-by: Qier Li <qier@fb.com>
|
2025-12-14 01:23:08 +00:00 |
|
Qidong Su
|
24429d5924
|
[Doc] Add instructions for building docker image on GB300 with CUDA13 (#30414)
Signed-off-by: Qidong Su <soodoshll@gmail.com>
|
2025-12-13 21:56:53 +00:00 |
|
Wentao Ye
|
6e78ed6ba7
|
[Logs] Optimize startup logs 4 (#29903)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-12-13 16:12:53 -05:00 |
|
Isotr0py
|
7c16f3fbcc
|
[Doc] Add documents for multi-node distributed serving with MP backend (#30509)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-12-13 18:02:29 +00:00 |
|
lif
|
ddbfbe5278
|
[Docs] Clarify Expert Parallel behavior for attention and MoE layers (#30615)
Signed-off-by: majiayu000 <1835304752@qq.com>
|
2025-12-13 08:37:59 -09:00 |
|
Laith Sakka
|
763963aa73
|
set assume_32bit_indexing and pass unbacked hints (#30459)
Signed-off-by: Laith Sakka <lsakka@meta.com>
|
2025-12-13 15:36:53 +00:00 |
|
Cyrus Leung
|
39cefbdf17
|
[Refactor] TokenizerRegistry only uses lazy imports (#30609)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-13 23:16:22 +08:00 |
|
Chen Zhang
|
ace34e3783
|
[Bugfix] Qwen3-next with --hf-overrides {"num_hidden_layers":8} (#30433)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-12-13 22:12:45 +08:00 |
|
Isotr0py
|
e5db3e2774
|
[CI/Build] Fix broken mm processor test Mistral-3-large (#30597)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-12-13 04:43:01 -08:00 |
|
Cyrus Leung
|
64251f48df
|
[Chore] Adjust tokenizer import to avoid circular imports (#30601)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-13 04:42:39 -08:00 |
|
Nick Hill
|
1cec5b7ea9
|
[Scheduler] Simplify stop checking for pooling models (#30591)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-12-13 09:45:26 +00:00 |
|
Cyrus Leung
|
b09806e28f
|
[Bugfix] Dictionary MM embeddings for online chat (#30507)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-13 15:48:56 +08:00 |
|
Tsukasa OI
|
fdc135d768
|
[Misc][Quantization] Clarify the intent of GGUF FusedMoE weight materialization (#30310)
Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>
|
2025-12-13 13:55:14 +08:00 |
|
Roberto L. Castro
|
4fa7ce46f3
|
[Feature] Add SM103 (Blackwell Ultra) Support to vLLM (#30484)
Signed-off-by: LopezCastroRoberto <robertol.c510@gmail.com>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2025-12-12 19:34:23 -08:00 |
|
Nicolò Lucchesi
|
57e9bf1864
|
[CI] Whisper logprobs tests (#30504)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-12-13 10:49:11 +08:00 |
|
Michael Goin
|
2f32a68d75
|
[CI] Update several models in registry that are available online now (#30514)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2025-12-12 18:28:13 -08:00 |
|
Matthew Bonanni
|
f5dfbbd8e9
|
[Docs] Remove references to VLLM_ATTENTION_BACKEND (#30564)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-12-13 10:20:15 +08:00 |
|
Michael Goin
|
fc0119425c
|
Add IBM and Red Hat to compute resources sponsors (#30581)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
|
2025-12-13 01:34:23 +00:00 |
|
Matthew Bonanni
|
86a3261525
|
[Bugfix] Pass FA version in MultiHeadAttention (#30575)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-12-13 00:02:11 +00:00 |
|
rasmith
|
08f8a5627e
|
[CI/Build][Kernel][BugFix][AMD] Fix per_token_group_quant_fp8 to use correct fp8 min/max values and update atol/rtol in test_quantfp8_group_functionality (#30292)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
|
2025-12-12 18:41:56 -05:00 |
|