Cyrus Leung
|
8c017b3490
|
[Model] Always use Transformers backend for PaliGemma and Gemma3-MM (#26715)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-17 05:03:35 +00:00 |
|
Jee Jee Li
|
fec2b341ad
|
[Kernel] Lazy import FlashInfer (#26977)
|
2025-10-17 04:48:18 +00:00 |
|
Boyuan Feng
|
17c540a993
|
[torch.compile] fix simple inductor graph partition test (#27050)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
|
2025-10-16 21:09:36 -04:00 |
|
Cyrus Leung
|
4d4d6bad19
|
[Chore] Separate out vllm.utils.importlib (#27022)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-17 00:48:59 +00:00 |
|
Lucia Fang
|
11ae016bd7
|
[torch.compile] Passing only necessary compilation config to inductor pass config (#27041)
Signed-off-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>
|
2025-10-17 00:01:52 +00:00 |
|
jiahanc
|
41d3071918
|
[NVIDIA] [Perf] Update to leverage flashinfer trtllm FP4 MOE throughput kernel (#26714)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-10-16 16:20:25 -07:00 |
|
Harry Mellor
|
fb5e10d3fb
|
Refactor Transformers backend to use mixins (#26906)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-16 21:50:39 +00:00 |
|
Bram Wasti
|
b2f78cbad4
|
[small][batch invariance] Rename the env and internal flags to simplify usage (#26855)
Signed-off-by: Bram Wasti <bwasti@meta.com>
|
2025-10-16 21:40:25 +00:00 |
|
Michael Goin
|
01c977e96d
|
[CI] Prune Quantization Tests and skip compilation (#27038)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-10-16 17:26:35 -04:00 |
|
Wentao Ye
|
b3dda72c23
|
[Feature] Migrate DeepGEMM API from get_m_alignment_for_contiguous_layout to get_mk_alignment_for_contiguous_layout (#26935)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-10-16 16:46:48 -04:00 |
|
Varun Sundar Rabindranath
|
fb0571b077
|
[GPTOSS][DP/EP][Marlin] Enable GPTOSS Batched DP/EP using Marlin kernels (#25997)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-10-16 12:53:11 -07:00 |
|
Wentao Ye
|
2ed8b6b3d0
|
[Bug] Fix batch invariant test `has` to `is` (#27032)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-16 19:45:14 +00:00 |
|
Harry Mellor
|
aa255ff55a
|
Support `set` in the CLI generation (#27031)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-16 18:07:18 +00:00 |
|
Tahsin Tunan
|
43721bc67f
|
[CI] Replace large models with tiny alternatives in tests (#24057)
Signed-off-by: Tahsin Tunan <tahsintunan@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-16 15:51:27 +01:00 |
|
Cyrus Leung
|
d2740fafbf
|
[Chore] Separate out vllm.utils.collections (#26990)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-16 08:35:35 +00:00 |
|
Cyrus Leung
|
76f0d05bc6
|
[CI/Build] Update expected beam search output for Phi3V (#26978)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-16 05:12:44 +00:00 |
|
Bram Wasti
|
7d8975de84
|
Deepseek-v3 Batch Invariant on 8xH100 (#26609)
Signed-off-by: Bram Wasti <bwasti@meta.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-10-15 22:06:02 -07:00 |
|
Cyrus Leung
|
f6cdc9a02f
|
[Chore] Rename utils submodules (#26920)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-16 03:58:13 +00:00 |
|
Richard Zou
|
9b6504c307
|
[BugFix] Work around graph partition x torch.compile cache issue (#26956)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2025-10-15 20:06:11 -07:00 |
|
Angela Yi
|
e19b16dde6
|
[bugfix] Fix SP + PP without specifying compile size (#26955)
Signed-off-by: angelayi <yiangela7@gmail.com>
|
2025-10-15 20:05:33 -07:00 |
|
Michael Goin
|
f8a0acbdbe
|
[CI] Enable Blackwell Llama4 MoE tests (#26731)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-10-15 21:02:57 -06:00 |
|
kliuae
|
1317034379
|
[ROCm][FEAT] Fuse DeepSeek shared experts into AITER fused_moe ops (#24097)
Signed-off-by: chenjun <junchen2@amd.com>
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>
Co-authored-by: valarLip <103567126+valarLip@users.noreply.github.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
|
2025-10-16 10:41:34 +08:00 |
|
InChang Jeong
|
0ecc553ee6
|
[Bugfix] reasoning_parser parameter handling in run_batch.py (#26225)
Signed-off-by: inc-jeong <inc.jeong@navercorp.com>
Signed-off-by: InChang Jeong <inc.jeong@navercorp.com>
Co-authored-by: USER <user@AL02367916.local>
|
2025-10-16 10:24:05 +08:00 |
|
Adrian Abeyta
|
0a9ef0cfce
|
Move query quantization to attention layer for Flashinfer & Triton. (#26534)
Signed-off-by: adabeyta <aabeyta@redhat.com>
Signed-off-by: Adrian Abeyta <aabeyta@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-10-15 19:01:38 -04:00 |
|
Pradeep Dasigi
|
4794c2bd92
|
Olmo 3 tool parser and tests (#26143)
Signed-off-by: Pradeep Dasigi <pradeepd@allenai.org>
|
2025-10-15 16:36:12 +00:00 |
|
Cyrus Leung
|
828523ad8e
|
[Chore] Separate out vllm.utils.async_utils (#26913)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-15 15:33:00 +00:00 |
|
Cyrus Leung
|
136a17fe6e
|
[Chore] Separate out vllm.utils.func (#26904)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-15 13:03:58 +00:00 |
|
Boyuan Feng
|
f57438338d
|
[BugFix] Patch inductor memory plan logic (#26878)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-15 12:51:45 +00:00 |
|
wangxiyuan
|
8f4b313c37
|
[Misc] rename torch_dtype to dtype (#26695)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2025-10-15 12:11:48 +00:00 |
|
Cyrus Leung
|
f93e348010
|
[Misc] Remove isort and yapf ignores (#26888)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-15 12:09:03 +00:00 |
|
wang.yuqi
|
f54f85129e
|
[Model][2/N] Improve all pooling task | Support multi-vector retrieval (#25370)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-10-15 11:14:41 +00:00 |
|
Cyrus Leung
|
b8a4572157
|
[Misc] Use helper function to generate dummy messages in OpenAI MM tests (#26875)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-15 07:17:37 +00:00 |
|
Boyuan Feng
|
f0862eae43
|
[Graph Partition] pass tests for decorator (#26831)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
|
2025-10-15 06:39:48 +00:00 |
|
Tao Hui
|
85a65e7f51
|
[Model] Add DeepSeek-V3.1 reasoning parser (split from PR #24972) (#25589)
Signed-off-by: taohui <taohui3@gmail.com>
Signed-off-by: Tao Hui <taohui3@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2025-10-15 11:09:52 +08:00 |
|
Morrison Turnansky
|
96b9aa5aa0
|
[Frontend][torch.compile] CompilationConfig Overhaul (#20283): name change compilation level to compilation mode, deprecation compilation level (#26355)
Signed-off-by: morrison-turnansky <mturnans@redhat.com>
Signed-off-by: Morrison Turnansky <mturnans@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-10-15 02:51:16 +00:00 |
|
Zhikaiiii
|
9354660036
|
[Bugfix]fix Qwen3 xml tool parser (#26345)
Signed-off-by: Zhikaiiii <1658973216@qq.com>
|
2025-10-15 09:50:30 +08:00 |
|
Luka Govedič
|
2dcd12d357
|
[torch.compile] Fix tests for torch==2.9 inductor partition (#26116)
Signed-off-by: ProExpertProg <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
|
2025-10-14 19:55:02 -04:00 |
|
Ye Hu
|
0512c04aee
|
[frontend][gptoss] Add per turn stats into Harmony Context (#25061)
Signed-off-by: lacora <hyelacora@gmail.com>
Co-authored-by: Ye Hu <yehu@fb.com>
|
2025-10-14 16:48:13 -07:00 |
|
Michael Goin
|
7e0ef4084a
|
[CI Failure] Fix torchao dep failure for Quantization Test (#26824)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-10-14 16:41:43 -07:00 |
|
Nick Hill
|
4aed506b65
|
[Core] Streamline some structured output related code (#26737)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-10-14 23:27:44 +00:00 |
|
Jialin Ouyang
|
380f17527c
|
[Perf] Cache vllm.env.__getattr__ result to avoid recomputation (#26146)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-10-14 17:03:21 -04:00 |
|
Matthew Bonanni
|
82af928c41
|
[Attention][Spec Decode] FlashMLA spec decode support (#26541)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-10-14 19:38:20 +00:00 |
|
Michael Goin
|
c3a722fcb2
|
[CI Failure] Fix tests with missing TinyLlama-1.1B-Chat-v1.0-FP8-e2e (#26816)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-10-14 18:38:59 +00:00 |
|
Chauncey
|
df850c4912
|
[Feature][Responses API] Stream Function Call - harmony (#24317)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-10-14 08:31:43 -07:00 |
|
Qier Li
|
720394de43
|
[KVConnector][Metrics] Aggregate scheduler-side KVConnectorStats (#26046)
Signed-off-by: Qier Li <kevin44036@gmail.com>
|
2025-10-14 14:38:07 +00:00 |
|
Jaya Yuan
|
ea97940d6c
|
[DCP] Support Decode Context Parallel (DCP) for GQA with FlashAttention (#24864)
Signed-off-by: yuanyongjie.yyj <yuanyongjie.yyj@antgroup.com>
Signed-off-by: FENP <32334296+FENP@users.noreply.github.com>
Signed-off-by: Jaya Yuan <yuanyongjie.yyj@antgroup.com>
|
2025-10-14 13:07:50 +00:00 |
|
Jee Jee Li
|
fdd32750f0
|
[CI/Build] Cleanup LoRA test (#26752)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-10-14 12:06:35 +00:00 |
|
Cyrus Leung
|
9c4cb68339
|
[Chore] Remove SupportsV0Only interface and update supported models docs (#26783)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-14 04:55:10 -07:00 |
|
Chauncey
|
780eb03d9b
|
[CI] Fix test_tool_id_kimi_k2 (#26787)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-10-14 10:27:07 +00:00 |
|
Cyrus Leung
|
d1d063a588
|
[Chore] Use max_transformers_version for Qwen-VL test (#26792)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-14 03:03:46 -07:00 |
|