Cyrus Leung
|
dcb31196da
|
[Chore] Remove redundant RequestPrompt (#30612)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-14 09:22:37 +00:00 |
|
Laith Sakka
|
f569c654e1
|
enable unbacked with aot_compile (#30462)
Signed-off-by: Laith Sakka <lsakka@meta.com>
|
2025-12-14 08:14:06 +00:00 |
|
Kayvan Mivehnejad
|
29f7d97715
|
Improve parse_raw_prompt test cases for invalid input .v2 (#30512)
Signed-off-by: Kayvan Mivehnejad <K.Mivehnejad@gmail.com>
|
2025-12-14 11:18:41 +08:00 |
|
Cyrus Leung
|
39cefbdf17
|
[Refactor] TokenizerRegistry only uses lazy imports (#30609)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-13 23:16:22 +08:00 |
|
Isotr0py
|
e5db3e2774
|
[CI/Build] Fix broken mm processor test Mistral-3-large (#30597)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-12-13 04:43:01 -08:00 |
|
Cyrus Leung
|
64251f48df
|
[Chore] Adjust tokenizer import to avoid circular imports (#30601)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-13 04:42:39 -08:00 |
|
Cyrus Leung
|
b09806e28f
|
[Bugfix] Dictionary MM embeddings for online chat (#30507)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-13 15:48:56 +08:00 |
|
Roberto L. Castro
|
4fa7ce46f3
|
[Feature] Add SM103 (Blackwell Ultra) Support to vLLM (#30484)
Signed-off-by: LopezCastroRoberto <robertol.c510@gmail.com>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2025-12-12 19:34:23 -08:00 |
|
Nicolò Lucchesi
|
57e9bf1864
|
[CI] Whisper logprobs tests (#30504)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-12-13 10:49:11 +08:00 |
|
Michael Goin
|
2f32a68d75
|
[CI] Update several models in registry that are available online now (#30514)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2025-12-12 18:28:13 -08:00 |
|
rasmith
|
08f8a5627e
|
[CI/Build][Kernel][BugFix][AMD] Fix per_token_group_quant_fp8 to use correct fp8 min/max values and update atol/rtol in test_quantfp8_group_functionality (#30292)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
|
2025-12-12 18:41:56 -05:00 |
|
shivampr
|
cd7740ac5c
|
[ROCm] Enable Triton ScaledMM fallback + kernel selection fix (#26668)
Signed-off-by: Shivam <shivampr.dev@gmail.com>
Signed-off-by: Shivam <shivamprasad91@gmail.com>
|
2025-12-12 13:28:20 -05:00 |
|
realliujiaxu
|
d2c919dcc2
|
[bugfix] fix bug when top_logprobs=0 with spec decoding (#30059)
Signed-off-by: realliujiaxu <realliujiaxu@163.com>
|
2025-12-12 09:03:35 -08:00 |
|
Benjamin Bartels
|
f3237f3f6b
|
[Frontend] Fixes anthropic streaming message_start usage nesting (#30266)
Signed-off-by: bbartels <benjamin@bartels.dev>
|
2025-12-12 16:28:54 +00:00 |
|
jvlunteren
|
9c0ee995a8
|
[Kernel] Support CUDA Graphs in 3D Triton Attention Kernel (#28306)
Signed-off-by: Jan van Lunteren <jvl@zurich.ibm.com>
Signed-off-by: jvlunteren <161835099+jvlunteren@users.noreply.github.com>
Co-authored-by: Thomas Parnell <tom.parnell@gmail.com>
Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-12-12 16:55:40 +01:00 |
|
Lucas Wilkinson
|
3e41992fec
|
[Attention] Use sparse prefill kernel for fp8 kv-cache in DeepSeek-v3.2 (#27532)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-12-12 05:57:47 -08:00 |
|
Jaehwang Jung
|
f90319d5d1
|
[Bugfix] Schedule failure due to wrong get_image_size_with_most_features (#29692)
|
2025-12-12 02:27:20 -08:00 |
|
rasmith
|
302b2c1eb9
|
[CI/Build][AMD] Fix ref_dynamic_per_token_quant reference implementation on ROCm. (#30291)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
|
2025-12-12 09:30:23 +00:00 |
|
Ben Browning
|
8f8fda261a
|
[Bugfix] Multiple fixes for gpt-oss Chat Completion prompting (#28729)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2025-12-12 12:59:53 +08:00 |
|
Andreas Karatzas
|
783644e4ac
|
[ROCm][CI] Skip multi-GPU speculative decoding tests when insufficient GPUs available (#30527)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2025-12-12 03:54:56 +00:00 |
|
Michael Goin
|
9f2fc16a69
|
[Bugfix][Model] Fix Afmoe rope_parameters issue (#30505)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-12-12 02:53:57 +00:00 |
|
rasmith
|
ba80926681
|
[CI/Build][AMD] Skip test_cutlass_w4a8_moe tests on ROCm since they require cutlass_pack_scale_fp8 (#30508)
Signed-off-by: Randall Smith <ransmith@amd.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-12-12 01:02:19 +00:00 |
|
rasmith
|
48661d275f
|
[CI/Build][AMD] Skip tests in test_fusions_e2e and test_dbo_dp_ep_gsm8k that require non-existing imports for ROCm (#30417)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
|
2025-12-12 00:24:20 +00:00 |
|
Andrew Briand
|
a00d88973d
|
[EPLB] Support EPLB w/ NVFP4 (#29804)
Signed-off-by: Andrew Briand <abriand@nvidia.com>
Co-authored-by: Andrew Briand <abriand@nvidia.com>
|
2025-12-11 22:59:40 +00:00 |
|
Harry Mellor
|
cf3eacfe58
|
Standardise get_rope to use rope_parameters["partial_rotary_factor"], not rotary_dim (#30389)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-12-11 20:45:23 +00:00 |
|
Harry Mellor
|
8781cd6b88
|
Add Eagle and Eagle3 support to Transformers modeling backend (#30340)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-12-11 17:02:10 +00:00 |
|
Julien Denize
|
aa3c250c48
|
[IMPROVEMENT] Change MistralReasoningParser behavior (#30391)
Signed-off-by: juliendenize <julien.denize@mistral.ai>
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
|
2025-12-11 17:53:26 +01:00 |
|
Shengqi Chen
|
305b168a9f
|
[CI] refine more logic when generating and using nightly wheels & indices, add cuda130 build for aarch64, specify correct manylinux version (#30341)
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
|
2025-12-12 00:42:30 +08:00 |
|
Martin Hickey
|
f4417f8449
|
[KVConnector] Add KV events to KV Connectors (#28309)
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
|
2025-12-11 15:30:29 +01:00 |
|
Cyrus Leung
|
d917747c95
|
[Bugfix] Fix task still being passed in tests/benchmarks (#30476)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-11 10:33:55 +00:00 |
|
jeremyteboul
|
4515eb1a0b
|
[Fix] Update lazy loading of video loader backend (#30444)
Signed-off-by: Jeremy Teboul <jeremyteboul@fb.com>
Co-authored-by: Jeremy Teboul <jeremyteboul@fb.com>
|
2025-12-11 10:14:57 +00:00 |
|
Rei.
|
6299628d32
|
[bugfix] fix MiniMaxM2ReasoningParser streaming output not separating reasoning_content. (#29882)
Signed-off-by: Rei <1477174254@qq.com>
|
2025-12-11 09:05:08 +00:00 |
|
Ning Xie
|
d02d1043de
|
fix: enhance human_readable_int function (#30337)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-12-10 23:30:33 -08:00 |
|
Wentao Ye
|
d6464f2679
|
[Chore] Fix torch precision warning (#30428)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-12-11 04:05:56 +00:00 |
|
Cyrus Leung
|
7e24e5d4d6
|
[Deprecation] Remove deprecated task, seed and MM settings (#30397)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-10 19:59:39 -08:00 |
|
Cyrus Leung
|
5a87d8b9b1
|
[Deprecation] Remove deprecated plugin and compilation fields for v0.13 release (#30396)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-10 19:59:35 -08:00 |
|
shivampr
|
8580919ac3
|
[Bugfix] fix confusing OOM errors during v1 init (#28051)
Signed-off-by: Shivam <shivamprasad91@gmail.com>
Signed-off-by: shivampr <shivampr.dev@gmail.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
|
2025-12-10 23:17:41 +00:00 |
|
Jialin Ouyang
|
9f042ba26b
|
[Perf] Enable environment cache in EngineCore to enable the feature for UniProcExecutor as well (#29289)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-12-10 14:13:01 -05:00 |
|
Will Eaton
|
a9e4106f28
|
[P/D] KV Load Failure Recovery/Abort Configuration (#26813)
Signed-off-by: Will Eaton <weaton@redhat.com>
Signed-off-by: Will Eaton <me@wseaton.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-12-10 11:00:52 -08:00 |
|
Nicolò Lucchesi
|
c756fb6781
|
[Core] Whisper enable FULL_DECODE_ONLY CudaGraph (#30072)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-12-10 06:14:24 -08:00 |
|
Aditya Tewari
|
cebda2a4af
|
[CPU] Support for Whisper (#30062)
Signed-off-by: Aditya Tewari <aditya.tewari@arm.com>
|
2025-12-10 04:58:42 -08:00 |
|
Fadi Arafeh
|
434ac76a7c
|
[cpu][ci] Add CPU Attention Tests for Neon Backend (#30347)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
|
2025-12-10 05:37:35 +00:00 |
|
Andreas Karatzas
|
ed7af3178a
|
[ROCm][CI] Attempt to fix the failures under a subgroup of the e2e test group (#29358)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>
|
2025-12-10 05:33:13 +00:00 |
|
Micah Williamson
|
7d80c73d42
|
[CI] Reduce Flakiness For test_spec_decode.py::test_suffix_decoding_acceptance (#30367)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2025-12-10 02:35:49 +00:00 |
|
rasmith
|
b75f826fca
|
[CI/Build][AMD] Skip quantization kernels tests that require CUTLASS or e4m3fn when not supported by platform (#30020)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
|
2025-12-10 02:28:37 +00:00 |
|
Andrew Xia
|
c3487aca34
|
[responsesAPI][6] Fix multi turn MCP tokenization (#30230)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
|
2025-12-10 10:13:13 +08:00 |
|
Lucas Wilkinson
|
abe93bce59
|
[Attention] Make seq_lens_cpu optional in CommonAttentionMetadata to enable true async spec-decode (#29624)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>
|
2025-12-09 17:18:10 -08:00 |
|
Charlie Fu
|
3c680f4a17
|
[Rocm][torch.compile] Adding layernorm + fp8 block quant and silu + fp8 block quant for Aiter (#25693)
Signed-off-by: charlifu <charlifu@amd.com>
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
Signed-off-by: Charlie Fu <Charlie.Fu@amd.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>
Co-authored-by: wuhuikx <hattie.wu@amd.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>
|
2025-12-09 22:39:26 +00:00 |
|
Kyle Sayers
|
fccd532587
|
[Quantization] FP8 Weight Reloading for Quantized RL Rollout (#28480)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2025-12-09 13:54:32 -08:00 |
|
rasmith
|
7618dc973d
|
[CI/Build] Make test_mha_attn.py run on correct platform only and check for flash_attn_varlen_func in layer.py (#29145)
|
2025-12-09 20:18:17 +00:00 |
|