Chauncey
|
4c1c501a7e
|
[Refactor] [10/N] to simplify the vLLM openai completion serving architecture (#32369)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-01-15 07:41:34 +00:00 |
|
Andreas Karatzas
|
ae1eba6a9a
|
[ROCm][CI] Pin transformers 4.57.3 to fix jina test failures (#32350)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-01-15 15:19:34 +08:00 |
|
Ofir Zafrir
|
e9ec2a72d8
|
[Bugfix] Fix stale common_attn_metadata.max_seq_len in speculative decoding with Eagle (#32312)
Signed-off-by: Ofir Zafrir <ofir.zafrir@intel.com>
|
2026-01-15 06:39:37 +00:00 |
|
Lucas Wilkinson
|
2c9b4cf5bf
|
[BugFix] Fix DeepSeek-V3.1 + DeepGEMM incompatible scale shapes (#32361)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Eldar Kurtić <8884008+eldarkurtic@users.noreply.github.com>
|
2026-01-15 06:32:22 +00:00 |
|
Ning Xie
|
9d7ae3fcdb
|
[code clean] remove duplicate check (#32376)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2026-01-15 05:29:34 +00:00 |
|
rasmith
|
3c2685645e
|
[CI][AMD][Quantization][BugFix] Fix fp8 max in quant_utils.py and update test_fp8_quant.::test_static_fp8_quant_group_2d to use correct fp8 dtype and adjust atol/rtol (#32201)
Signed-off-by: Randall Smith <ransmith@amd.com>
|
2026-01-15 05:04:34 +00:00 |
|
Micah Williamson
|
773d7073ae
|
[ROCm][CI] Disable async scheduling on ROCm for test_structured_output[meta-llama/Meta-Llama-3.1-8B-Instruct-xgrammar-auto-speculative_config9] (#32355)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2026-01-15 04:53:43 +00:00 |
|
kzwrime
|
edadca109c
|
[Bugfix] Add CpuCommunicator.dispatch and combine to fix DP+MoE inference (#31867)
Signed-off-by: kunzh <zhikun.wu@outlook.com>
|
2026-01-15 04:50:48 +00:00 |
|
Li Wang
|
d86fc23bdd
|
[Misc] Remove redundant line (#32366)
Signed-off-by: wangli <wangli858794774@gmail.com>
|
2026-01-15 04:29:56 +00:00 |
|
Shiyan Deng
|
375e5984fe
|
Support configure skip_special_tokens in openai response api (#32345)
Signed-off-by: Shiyan Deng <dsy842974287@meta.com>
|
2026-01-15 04:07:26 +00:00 |
|
baonudesifeizhai
|
19b251fe3d
|
Fix optional parameter parsing in MiniMax M2 tool parser #32278 (#32342)
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
|
2026-01-15 04:05:48 +00:00 |
|
Ryan Rock
|
15422ed3f7
|
[CI/Build][Hardware][AMD] Fix v1/shutdown (#31997)
Signed-off-by: Ryan Rock <ryan.rock@amd.com>
|
2026-01-15 04:01:42 +00:00 |
|
dolpm
|
8471b27df9
|
[compile] raise on compile_size implicit padding (#32343)
Signed-off-by: dolpm <34420038+dolpm@users.noreply.github.com>
|
2026-01-14 20:46:56 +00:00 |
|
Lumosis
|
66652e8082
|
[BugFix] Assign page_size_padded when unifying kv cache spec. (#32283)
Signed-off-by: Lihao Ran <imlihao.ran@gmail.com>
|
2026-01-14 20:10:01 +00:00 |
|
vllmellm
|
e27078ea80
|
[Bugfix][ROCm][performance] Resolve the performance regression issue of the Qwen3-Next-80B-A3B-Thinking under rocm_atten (#32336)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2026-01-14 19:32:48 +00:00 |
|
Aleksandr Samarin
|
d084e9fca7
|
[MODEL] Fix handling of multiple channels for gpt-oss with speculative decoding (#26291)
Signed-off-by: Aleksandr Samarin <astrlrd@nebius.com>
Signed-off-by: southfreebird <yvorott@gmail.com>
Co-authored-by: southfreebird <yvorott@gmail.com>
|
2026-01-14 13:20:52 -05:00 |
|
qli88
|
3a612322eb
|
[CI] Move rixl/ucx from Dockerfile.rocm_base to Dockerfile.rocm (#32295)
Signed-off-by: Qiang Li <qiang.li2@amd.com>
|
2026-01-14 16:53:36 +00:00 |
|
Cyrus Leung
|
9ea07b41da
|
[1/N] Reorganize multimodal processing code (#32327)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-14 15:25:31 +00:00 |
|
Ning Xie
|
552b262936
|
rename tokenize serving api request id prefix to tokenize (#32328)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2026-01-14 14:52:20 +00:00 |
|
Chauncey
|
00e6402d56
|
[Frontend] track responsesAPI server_load (#32323)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-01-14 12:00:37 +00:00 |
|
Shanshan Shen
|
ce0946249d
|
[Misc] Make mem utils can be reused by other platforms (#32322)
Signed-off-by: shen-shanshan <467638484@qq.com>
|
2026-01-14 03:46:01 -08:00 |
|
Cyrus Leung
|
3f28174c6a
|
[Frontend] Standardize use of create_error_response (#32319)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-14 11:22:26 +00:00 |
|
Chauncey
|
769d0629e1
|
[Refactor] [9/N] to simplify the vLLM openai translations serving architecture (#32313)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-01-14 10:20:58 +00:00 |
|
Cyrus Leung
|
90db5b31e4
|
[Refactor] Move top-level dummy data generation to registry (#32310)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-14 02:17:46 -08:00 |
|
Roger Wang
|
b8199f6049
|
[Model] Re-implement Qwen3Omni Audio Encoder (#32167)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2026-01-14 15:40:30 +08:00 |
|
sangho.lee
|
7e6f123810
|
Add Molmo2 multimodal model support (#30997)
Signed-off-by: sanghol <sanghol@allenai.org>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-01-14 15:33:09 +08:00 |
|
Chauncey
|
9312a6c03a
|
[Refactor] [8/N] to simplify the vLLM openai responsesapi_serving architecture (#32260)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-01-14 07:26:24 +00:00 |
|
Michael Goin
|
6388b50058
|
[Docs] Add docs about OOT Quantization Plugins (#32035)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-01-14 15:25:45 +08:00 |
|
Hongxia Yang
|
048bb59728
|
AMD CI Test - unskip moe_sum test and moe_align_block_size tests (#32039)
Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>
|
2026-01-13 23:25:10 -08:00 |
|
Angela Yi
|
7933638051
|
[misc] Remove is_torch_equal_or_newer(2.4) cases (#32296)
Signed-off-by: angelayi <yiangela7@gmail.com>
|
2026-01-13 23:22:07 -08:00 |
|
David
|
6b176095e3
|
[Build] Relax anthropic version pin from ==0.71.0 to >=0.71.0 (#32289)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2026-01-13 23:21:39 -08:00 |
|
Andreas Karatzas
|
9d0d7f48d5
|
[ROCm][CI] Handle missing vision_config in Isaac model attention patch (#32281)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-01-14 07:21:26 +00:00 |
|
Yi Liu
|
50632adc58
|
Consolidate Intel Quantization Toolkit Integration in vLLM (#31716)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
|
2026-01-14 07:11:30 +00:00 |
|
Micah Williamson
|
6fa6e7ef0c
|
[ROCm][CI] Disable Async Scheduling For Qwen3-Next-80B-A3B-Instruct MTP Async EPLB Accuracy Test (#32275)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2026-01-14 13:29:42 +08:00 |
|
Woosuk Kwon
|
90c0836902
|
[Model Runner V2] Refactor Sampler (#32245)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2026-01-13 17:58:12 -08:00 |
|
Roberto L. Castro
|
8ef50d9a6b
|
[Kernel][Performance] Enable smaller Scaling Factor tiling for NVFP4 small-batch decoding (#30885)
Signed-off-by: LopezCastroRoberto <roberto.lopez.castro@udc.es>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>
|
2026-01-13 15:22:53 -08:00 |
|
emricksini-h
|
2a60ac91d0
|
[Improvement] Persist CUDA compat libraries paths to prevent reset on apt-get (#30784)
Signed-off-by: emricksini-h <emrick.birivoutin@hcompany.ai>
|
2026-01-13 14:35:05 -08:00 |
|
Michael Goin
|
9e65bb4ef4
|
Add mergify label job for "bug" in PR titles (#31980)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-01-13 14:28:19 -08:00 |
|
Simon Mo
|
0db574b185
|
[Build] Add scripts for cherry-picking and trigger build (#32282)
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
|
2026-01-13 13:21:05 -08:00 |
|
HappyAmazonian
|
2f4a71daf2
|
[Misc] Add In-Container restart capability through supervisord for sagemaker entrypoint (#28502)
Signed-off-by: Shen Teng <sheteng@amazon.com>
Signed-off-by: HappyAmazonian <91216626+HappyAmazonian@users.noreply.github.com>
|
2026-01-13 13:06:10 -08:00 |
|
Rabi Mishra
|
69f8a0ea37
|
fix(rocm): Use refresh_env_variables() for rocm_aiter_ops in test_moe (#31711)
Signed-off-by: rabi <ramishra@redhat.com>
|
2026-01-13 19:11:54 +00:00 |
|
Wentao Ye
|
f28125d87b
|
[Perf] Optimize grouped topk kernel, 1.2%~2% E2E Throughput improvement (#32058)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-01-13 10:58:18 -08:00 |
|
Dmitry Tokarev
|
46f8c6b725
|
Fix CUDA 13 wheel installation doc (#32276)
Signed-off-by: Dmitry Tokarev <dtokarev@nvidia.com>
|
2026-01-13 10:48:37 -08:00 |
|
Andrew Xia
|
af54d2e2d0
|
[responseAPI] support partial message generation (#32100)
Signed-off-by: Andrew Xia <axia@fb.com>
Signed-off-by: Andrew Xia <mitandrewxia@gmail.com>
Signed-off-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
Co-authored-by: Andrew Xia <axia@fb.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
|
2026-01-13 10:41:26 -08:00 |
|
Sage Moore
|
6beef12b9b
|
[EPLB][Cleanup] Remove is_async_enabled from EplbModelState (#32050)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2026-01-13 18:19:03 +00:00 |
|
Mark McLoughlin
|
ab74b2a27a
|
[Trivial] Remove duplicate enable_mfu_metrics (#32246)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2026-01-14 01:09:23 +08:00 |
|
Matthew Bonanni
|
2263d44b68
|
[4/N][Attention] Move MLA common to model_executor (#32060)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-01-13 09:08:45 -08:00 |
|
Mathis Felardos
|
4f3676e726
|
nixl_connector: export UCX_MEM_MMAP_HOOK_MODE=none to avoid a UCX memory leak (#32181)
Signed-off-by: Mathis Felardos <mathis@mistral.ai>
|
2026-01-13 16:21:10 +00:00 |
|
Martin Hickey
|
510265472c
|
[BugFix] [KVConnector] Fix KV events for LMCache connector (#32169)
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-01-13 15:50:34 +00:00 |
|
Chauncey
|
4f02cb2eac
|
[Refactor] [7/N] to simplify the vLLM lora serving architecture (#32251)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-01-13 15:37:34 +00:00 |
|