Didier Durand
|
1a55cfafcb
|
[Doc]: fixing typos in various files (#30540)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Signed-off-by: Didier Durand <2927957+didier-durand@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-12-14 02:14:37 -08:00 |
|
drslark
|
add1b9d3de
|
[main][BugFix] Fixed an accuracy bug of Qwen3-next-MTP when batched inferring (#30632)
Signed-off-by: drslark <slarksblood@qq.com>
|
2025-12-14 01:32:16 -08:00 |
|
Cyrus Leung
|
dcb31196da
|
[Chore] Remove redundant RequestPrompt (#30612)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-14 09:22:37 +00:00 |
|
Laith Sakka
|
f569c654e1
|
enable unbacked with aot_compile (#30462)
Signed-off-by: Laith Sakka <lsakka@meta.com>
|
2025-12-14 08:14:06 +00:00 |
|
Micah Williamson
|
97f2f160fd
|
[ROCm][CI] Add "Qwen3-Next-80B-A3B-Instruct MTP Async EPLB Accuracy Test" Back Into AMD CI (#30590)
Signed-off-by: David Chen <530634352@qq.com>
Signed-off-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com>
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
Co-authored-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-12-14 06:56:26 +00:00 |
|
Kayvan Mivehnejad
|
29f7d97715
|
Improve parse_raw_prompt test cases for invalid input .v2 (#30512)
Signed-off-by: Kayvan Mivehnejad <K.Mivehnejad@gmail.com>
|
2025-12-14 11:18:41 +08:00 |
|
Qier Li
|
dc7fb5bebe
|
[Bug][KVConnector][Metrics] Remove a vacuous assertion breaking external-launcher (#30577)
Co-authored-by: Qier Li <qier@fb.com>
|
2025-12-14 01:23:08 +00:00 |
|
Qidong Su
|
24429d5924
|
[Doc] Add instructions for building docker image on GB300 with CUDA13 (#30414)
Signed-off-by: Qidong Su <soodoshll@gmail.com>
|
2025-12-13 21:56:53 +00:00 |
|
Wentao Ye
|
6e78ed6ba7
|
[Logs] Optimize startup logs 4 (#29903)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-12-13 16:12:53 -05:00 |
|
Isotr0py
|
7c16f3fbcc
|
[Doc] Add documents for multi-node distributed serving with MP backend (#30509)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-12-13 18:02:29 +00:00 |
|
lif
|
ddbfbe5278
|
[Docs] Clarify Expert Parallel behavior for attention and MoE layers (#30615)
Signed-off-by: majiayu000 <1835304752@qq.com>
|
2025-12-13 08:37:59 -09:00 |
|
Laith Sakka
|
763963aa73
|
set assume_32bit_indexing and pass unbacked hints (#30459)
Signed-off-by: Laith Sakka <lsakka@meta.com>
|
2025-12-13 15:36:53 +00:00 |
|
Cyrus Leung
|
39cefbdf17
|
[Refactor] TokenizerRegistry only uses lazy imports (#30609)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-13 23:16:22 +08:00 |
|
Chen Zhang
|
ace34e3783
|
[Bugfix] Qwen3-next with --hf-overrides {"num_hidden_layers":8} (#30433)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-12-13 22:12:45 +08:00 |
|
Isotr0py
|
e5db3e2774
|
[CI/Build] Fix broken mm processor test Mistral-3-large (#30597)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-12-13 04:43:01 -08:00 |
|
Cyrus Leung
|
64251f48df
|
[Chore] Adjust tokenizer import to avoid circular imports (#30601)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-13 04:42:39 -08:00 |
|
Nick Hill
|
1cec5b7ea9
|
[Scheduler] Simplify stop checking for pooling models (#30591)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-12-13 09:45:26 +00:00 |
|
Cyrus Leung
|
b09806e28f
|
[Bugfix] Dictionary MM embeddings for online chat (#30507)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-13 15:48:56 +08:00 |
|
Tsukasa OI
|
fdc135d768
|
[Misc][Quantization] Clarify the intent of GGUF FusedMoE weight materialization (#30310)
Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>
|
2025-12-13 13:55:14 +08:00 |
|
Roberto L. Castro
|
4fa7ce46f3
|
[Feature] Add SM103 (Blackwell Ultra) Support to vLLM (#30484)
Signed-off-by: LopezCastroRoberto <robertol.c510@gmail.com>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2025-12-12 19:34:23 -08:00 |
|
Nicolò Lucchesi
|
57e9bf1864
|
[CI] Whisper logprobs tests (#30504)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-12-13 10:49:11 +08:00 |
|
Michael Goin
|
2f32a68d75
|
[CI] Update several models in registry that are available online now (#30514)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2025-12-12 18:28:13 -08:00 |
|
Matthew Bonanni
|
f5dfbbd8e9
|
[Docs] Remove references to VLLM_ATTENTION_BACKEND (#30564)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-12-13 10:20:15 +08:00 |
|
Michael Goin
|
fc0119425c
|
Add IBM and Red Hat to compute resources sponsors (#30581)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
|
2025-12-13 01:34:23 +00:00 |
|
Matthew Bonanni
|
86a3261525
|
[Bugfix] Pass FA version in MultiHeadAttention (#30575)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-12-13 00:02:11 +00:00 |
|
rasmith
|
08f8a5627e
|
[CI/Build][Kernel][BugFix][AMD] Fix per_token_group_quant_fp8 to use correct fp8 min/max values and update atol/rtol in test_quantfp8_group_functionality (#30292)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
|
2025-12-12 18:41:56 -05:00 |
|
Kevin H. Luu
|
b4039c08b5
|
[ci] Mark PrimeRL integration test as soft fail (#30578)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
|
2025-12-12 14:13:09 -08:00 |
|
Wentao Ye
|
1e6b115300
|
[Refactor] Reduce duplicate code in per_token_group_quant cuda kernels (#30496)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-12-12 16:45:23 -05:00 |
|
danielafrimi
|
13618626df
|
[MoE-FP8-modelopt] Add FlashInfer alignment padding for intermediate dimensions (#29748)
Signed-off-by: Daniel Afrimi <dafrimi@pool0-00589.cm.cluster>
Signed-off-by: dafrimi <dafrimi@nvidia.com>
Co-authored-by: Daniel Afrimi <dafrimi@pool0-00589.cm.cluster>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-12-12 20:42:32 +00:00 |
|
danielafrimi
|
6ec0d8dbe4
|
[Fix]Load kv-cache dtype from hf_quant_config.json automatically (#29980)
Signed-off-by: Daniel Afrimi <dafrimi@nvidia.com>
|
2025-12-12 11:27:47 -08:00 |
|
Li, Jiang
|
9693dd0fe3
|
[CI/Build] Add x86 CPU wheel release pipeline (#28848)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-12-12 19:21:35 +00:00 |
|
Xin Yang
|
1f19d8f899
|
[Perf] Set split_k to 1 for triton_kernels (#30528)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2025-12-12 14:07:57 -05:00 |
|
shivampr
|
cd7740ac5c
|
[ROCm] Enable Triton ScaledMM fallback + kernel selection fix (#26668)
Signed-off-by: Shivam <shivampr.dev@gmail.com>
Signed-off-by: Shivam <shivamprasad91@gmail.com>
|
2025-12-12 13:28:20 -05:00 |
|
Wentao Ye
|
02a5880394
|
[CI] Fix mypy for vllm/v1/executor (#30517)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-12-12 18:05:34 +00:00 |
|
realliujiaxu
|
d2c919dcc2
|
[bugfix] fix bug when top_logprobs=0 with spec decoding (#30059)
Signed-off-by: realliujiaxu <realliujiaxu@163.com>
|
2025-12-12 09:03:35 -08:00 |
|
Benjamin Bartels
|
f3237f3f6b
|
[Frontend] Fixes anthropic streaming message_start usage nesting (#30266)
Signed-off-by: bbartels <benjamin@bartels.dev>
|
2025-12-12 16:28:54 +00:00 |
|
jvlunteren
|
9c0ee995a8
|
[Kernel] Support CUDA Graphs in 3D Triton Attention Kernel (#28306)
Signed-off-by: Jan van Lunteren <jvl@zurich.ibm.com>
Signed-off-by: jvlunteren <161835099+jvlunteren@users.noreply.github.com>
Co-authored-by: Thomas Parnell <tom.parnell@gmail.com>
Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-12-12 16:55:40 +01:00 |
|
Michael Goin
|
09ad3b76b3
|
[Bug] Fix attention_backend arg string parsing (#30534)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-12-12 08:40:50 -07:00 |
|
Christina Norman
|
dc13c99eed
|
fix(gguf): Disable bfloat16 for GGUF on blackwell device (#30408)
Signed-off-by: Christina <truffle@gmail.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Christina Norman <christina@example.com>
Co-authored-by: Isotr0py <isotr0py@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-12-12 23:10:12 +08:00 |
|
Vladislav Nosivskoy
|
3e34adcdfb
|
[DeepSeek V3.2] Proper drop_thinking logic (#30490)
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
|
2025-12-12 15:01:06 +00:00 |
|
Lucas Wilkinson
|
3e41992fec
|
[Attention] Use sparse prefill kernel for fp8 kv-cache in DeepSeek-v3.2 (#27532)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-12-12 05:57:47 -08:00 |
|
吴坎
|
91401c7a26
|
[Bugfix] Fix CMakeLists Environment Variable (#21804)
Signed-off-by: wu-kan <github@wu-kan.com>
Signed-off-by: 吴坎 <github@wu-kan.cn>
Signed-off-by: wu-kan <github@wu-kan.cn>
|
2025-12-12 10:54:52 +00:00 |
|
Jaehwang Jung
|
f90319d5d1
|
[Bugfix] Schedule failure due to wrong get_image_size_with_most_features (#29692)
|
2025-12-12 02:27:20 -08:00 |
|
rasmith
|
302b2c1eb9
|
[CI/Build][AMD] Fix ref_dynamic_per_token_quant reference implementation on ROCm. (#30291)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
|
2025-12-12 09:30:23 +00:00 |
|
Ben Browning
|
8f8fda261a
|
[Bugfix] Multiple fixes for gpt-oss Chat Completion prompting (#28729)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2025-12-12 12:59:53 +08:00 |
|
Zhengxu Chen
|
fe1787107e
|
[compile] Parse compile range cache keys as Range during cache loading. (#30516)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
|
2025-12-12 04:30:51 +00:00 |
|
Andreas Karatzas
|
783644e4ac
|
[ROCm][CI] Skip multi-GPU speculative decoding tests when insufficient GPUs available (#30527)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2025-12-12 03:54:56 +00:00 |
|
Ryan Rock
|
197473c4e7
|
[CI/Build] Use spawn subprocess for ROCm (#30272)
Signed-off-by: Ryan Rock <ryan.rock@amd.com>
|
2025-12-12 03:33:17 +00:00 |
|
Nick Hill
|
947dfda9c2
|
[LMCache] Relax lmcache version requirement (#30425)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-12-11 18:18:47 -09:00 |
|
Michael Goin
|
9f2fc16a69
|
[Bugfix][Model] Fix Afmoe rope_parameters issue (#30505)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-12-12 02:53:57 +00:00 |
|