biondizzle/vllm - vllm

Author	SHA1	Message	Date
Cyrus Leung	dcb31196da	[Chore] Remove redundant `RequestPrompt` (#30612 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-14 09:22:37 +00:00
Laith Sakka	f569c654e1	enable unbacked with aot_compile (#30462 ) Signed-off-by: Laith Sakka <lsakka@meta.com>	2025-12-14 08:14:06 +00:00
Kayvan Mivehnejad	29f7d97715	Improve parse_raw_prompt test cases for invalid input .v2 (#30512 ) Signed-off-by: Kayvan Mivehnejad <K.Mivehnejad@gmail.com>	2025-12-14 11:18:41 +08:00
Cyrus Leung	39cefbdf17	[Refactor] `TokenizerRegistry` only uses lazy imports (#30609 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-13 23:16:22 +08:00
Isotr0py	e5db3e2774	[CI/Build] Fix broken mm processor test Mistral-3-large (#30597 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-12-13 04:43:01 -08:00
Cyrus Leung	64251f48df	[Chore] Adjust tokenizer import to avoid circular imports (#30601 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-13 04:42:39 -08:00
Cyrus Leung	b09806e28f	[Bugfix] Dictionary MM embeddings for online chat (#30507 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-13 15:48:56 +08:00
Roberto L. Castro	4fa7ce46f3	[Feature] Add SM103 (Blackwell Ultra) Support to vLLM (#30484 ) Signed-off-by: LopezCastroRoberto <robertol.c510@gmail.com> Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-12-12 19:34:23 -08:00
Nicolò Lucchesi	57e9bf1864	[CI] Whisper logprobs tests (#30504 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-12-13 10:49:11 +08:00
Michael Goin	2f32a68d75	[CI] Update several models in registry that are available online now (#30514 ) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Isotr0py <2037008807@qq.com>	2025-12-12 18:28:13 -08:00
rasmith	08f8a5627e	[CI/Build][Kernel][BugFix][AMD] Fix per_token_group_quant_fp8 to use correct fp8 min/max values and update atol/rtol in test_quantfp8_group_functionality (#30292 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-12-12 18:41:56 -05:00
shivampr	cd7740ac5c	[ROCm] Enable Triton ScaledMM fallback + kernel selection fix (#26668 ) Signed-off-by: Shivam <shivampr.dev@gmail.com> Signed-off-by: Shivam <shivamprasad91@gmail.com>	2025-12-12 13:28:20 -05:00
realliujiaxu	d2c919dcc2	[bugfix] fix bug when top_logprobs=0 with spec decoding (#30059 ) Signed-off-by: realliujiaxu <realliujiaxu@163.com>	2025-12-12 09:03:35 -08:00
Benjamin Bartels	f3237f3f6b	[Frontend] Fixes anthropic streaming message_start usage nesting (#30266 ) Signed-off-by: bbartels <benjamin@bartels.dev>	2025-12-12 16:28:54 +00:00
jvlunteren	9c0ee995a8	[Kernel] Support CUDA Graphs in 3D Triton Attention Kernel (#28306 ) Signed-off-by: Jan van Lunteren <jvl@zurich.ibm.com> Signed-off-by: jvlunteren <161835099+jvlunteren@users.noreply.github.com> Co-authored-by: Thomas Parnell <tom.parnell@gmail.com> Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-12-12 16:55:40 +01:00
Lucas Wilkinson	3e41992fec	[Attention] Use sparse prefill kernel for fp8 kv-cache in DeepSeek-v3.2 (#27532 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-12-12 05:57:47 -08:00
Jaehwang Jung	f90319d5d1	[Bugfix] Schedule failure due to wrong get_image_size_with_most_features (#29692 )	2025-12-12 02:27:20 -08:00
rasmith	302b2c1eb9	[CI/Build][AMD] Fix ref_dynamic_per_token_quant reference implementation on ROCm. (#30291 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-12-12 09:30:23 +00:00
Ben Browning	8f8fda261a	[Bugfix] Multiple fixes for gpt-oss Chat Completion prompting (#28729 ) Signed-off-by: Ben Browning <bbrownin@redhat.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>	2025-12-12 12:59:53 +08:00
Andreas Karatzas	783644e4ac	[ROCm][CI] Skip multi-GPU speculative decoding tests when insufficient GPUs available (#30527 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2025-12-12 03:54:56 +00:00
Michael Goin	9f2fc16a69	[Bugfix][Model] Fix Afmoe rope_parameters issue (#30505 ) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-12 02:53:57 +00:00
rasmith	ba80926681	[CI/Build][AMD] Skip test_cutlass_w4a8_moe tests on ROCm sine they require cutlass_pack_scale_fp8 (#30508 ) Signed-off-by: Randall Smith <ransmith@amd.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Randall Smith <ransmith@amd.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-12 01:02:19 +00:00
rasmith	48661d275f	[CI/Build][AMD] Skip tests in test_fusions_e2e and test_dbo_dp_ep_gsm8k that require non-existing imports for ROCm (#30417 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-12-12 00:24:20 +00:00
Andrew Briand	a00d88973d	[EPLB] Support EPLB w/ NVFP4 (#29804 ) Signed-off-by: Andrew Briand <abriand@nvidia.com> Co-authored-by: Andrew Briand <abriand@nvidia.com>	2025-12-11 22:59:40 +00:00
Harry Mellor	cf3eacfe58	Standardise `get_rope` to use `rope_parameters["partial_rotary_factor"]`, not `rotary_dim` (#30389 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-11 20:45:23 +00:00
Harry Mellor	8781cd6b88	Add Eagle and Eagle3 support to Transformers modeling backend (#30340 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-11 17:02:10 +00:00
Julien Denize	aa3c250c48	[IMPROVEMENT] Change MistralReasoningParser behavior (#30391 ) Signed-off-by: juliendenize <julien.denize@mistral.ai> Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>	2025-12-11 17:53:26 +01:00
Shengqi Chen	305b168a9f	[CI] refine more logic when generating and using nightly wheels & indices, add cuda130 build for aarch64, specify correct manylinux version (#30341 ) Signed-off-by: Shengqi Chen <harry-chen@outlook.com>	2025-12-12 00:42:30 +08:00
Martin Hickey	f4417f8449	[KVConnector] Add KV events to KV Connectors (#28309 ) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>	2025-12-11 15:30:29 +01:00
Cyrus Leung	d917747c95	[Bugfix] Fix `task` still being passed in tests/benchmarks (#30476 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-11 10:33:55 +00:00
jeremyteboul	4515eb1a0b	[Fix] Update lazing loading of video loader backend (#30444 ) Signed-off-by: Jeremy Teboul <jeremyteboul@fb.com> Co-authored-by: Jeremy Teboul <jeremyteboul@fb.com>	2025-12-11 10:14:57 +00:00
Rei.	6299628d32	[bugfix] fix MiniMaxM2ReasoningParser streaming output not separating reasoning_content. (#29882 ) Signed-off-by: Rei <1477174254@qq.com>	2025-12-11 09:05:08 +00:00
Ning Xie	d02d1043de	fix: enhance human_readable_int function (#30337 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-12-10 23:30:33 -08:00
Wentao Ye	d6464f2679	[Chore] Fix torch precision warning (#30428 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-12-11 04:05:56 +00:00
Cyrus Leung	7e24e5d4d6	[Deprecation] Remove deprecated task, seed and MM settings (#30397 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-10 19:59:39 -08:00
Cyrus Leung	5a87d8b9b1	[Deprecation] Remove deprecated plugin and compilation fields for v0.13 release (#30396 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-10 19:59:35 -08:00
shivampr	8580919ac3	[Bugfix] fix confusing OOM errors during v1 init (#28051 ) Signed-off-by: Shivam <shivamprasad91@gmail.com> Signed-off-by: shivampr <shivampr.dev@gmail.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com>	2025-12-10 23:17:41 +00:00
Jialin Ouyang	9f042ba26b	[Perf] Enable environment cache in EngineCore to enable the feature for UniProcExecutor as well (#29289 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-12-10 14:13:01 -05:00
Will Eaton	a9e4106f28	[P/D] KV Load Failure Recovery/Abort Configuration (#26813 ) Signed-off-by: Will Eaton <weaton@redhat.com> Signed-off-by: Will Eaton <me@wseaton.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com> Co-authored-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-12-10 11:00:52 -08:00
Nicolò Lucchesi	c756fb6781	[Core] Whisper enable `FULL_DECODE_ONLY` CudaGraph (#30072 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-12-10 06:14:24 -08:00
Aditya Tewari	cebda2a4af	[CPU] Support for Whisper (#30062 ) Signed-off-by: Aditya Tewari <aditya.tewari@arm.com>	2025-12-10 04:58:42 -08:00
Fadi Arafeh	434ac76a7c	[cpu][ci] Add CPU Attention Tests for Neon Backend (#30347 ) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>	2025-12-10 05:37:35 +00:00
Andreas Karatzas	ed7af3178a	[ROCm][CI] Attempt to fix the failures under a subgroup of the e2e the test group (#29358 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com> Signed-off-by: Micah Williamson <micah.williamson@amd.com> Co-authored-by: Micah Williamson <micah.williamson@amd.com>	2025-12-10 05:33:13 +00:00
Micah Williamson	7d80c73d42	[CI] Reduce Flakiness For test_spec_decode.py::test_suffix_decoding_acceptance (#30367 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2025-12-10 02:35:49 +00:00
rasmith	b75f826fca	[CI/Build][AMD] Skip quantization kernels tests that require CUTLASS or e4m3fn when not supported by platform (#30020 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-12-10 02:28:37 +00:00
Andrew Xia	c3487aca34	[responsesAPI][6] Fix multi turn MCP tokenization (#30230 ) Signed-off-by: Andrew Xia <axia@fb.com> Co-authored-by: Andrew Xia <axia@fb.com>	2025-12-10 10:13:13 +08:00
Lucas Wilkinson	abe93bce59	[Attention] Make seq_lens_cpu optional in CommonAttentionMetadata to enable true async spec-decode (#29624 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>	2025-12-09 17:18:10 -08:00
Charlie Fu	3c680f4a17	[Rocm][torch.compile] Adding layernorm + fp8 block quant and silu + fp8 block quant for Aiter (#25693 ) Signed-off-by: charlifu <charlifu@amd.com> Signed-off-by: Micah Williamson <micah.williamson@amd.com> Signed-off-by: Charlie Fu <Charlie.Fu@amd.com> Co-authored-by: Micah Williamson <micah.williamson@amd.com> Co-authored-by: wuhuikx <hattie.wu@amd.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>	2025-12-09 22:39:26 +00:00
Kyle Sayers	fccd532587	[Quantization] FP8 Weight Reloading for Quantized RL Rollout (#28480 ) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>	2025-12-09 13:54:32 -08:00
rasmith	7618dc973d	[CI/Build] Make test_mha_attn.py run on correct platform only and check for flash_attn_varlen_func in layer.py (#29145 )	2025-12-09 20:18:17 +00:00

3870 Commits