Robert Shaw
727c41f3fd
[MoE Refactor][10/N] Cleanup Fp8 Process Weights After Loading (#31169)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2025-12-27 20:22:48 +00:00

Boyuan Feng
2f12cd32c0
[BugFix] Fix cache issue in compilation_config (#31376)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
2025-12-27 09:30:39 -05:00

Isotr0py
40a8756224
[Chore]: Remove HF format Phi4-MM examples (#31405)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-27 13:42:02 +00:00

Isotr0py
3d024985ab
[CI/Build] Ignore max transformers version for more common tests (#31401)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-27 13:06:26 +00:00

baonudesifeizhai
8711b21676
Fix/get raw stream patch #30905 (#30912)
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
Signed-off-by: baonudesifeizhai <85092850+baonudesifeizhai@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-12-26 20:08:47 -08:00

Yifan Qiao
52bf066516
[Core][Hybrid allocator + connector] Support hybrid allocator + kv cache connector (#30166)
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>
Co-authored-by: KuntaiDu <kuntai@uchicago.edu>
2025-12-26 18:25:46 -08:00

Kunshang Ji
5326c89803
[XPU][CI]skip test_preprocess_error_handling due to fork/spawn issue (#31381)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-12-26 21:40:44 +00:00

Xinyu Chen
87f1b8ca2c
CustomOp: Unify aiter impl into GroupedTopk (#31221)
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>
2025-12-26 12:44:29 -05:00

rongfu.leng
887e900b77
[Docs] Add profiler user docs for http request (#31370)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2025-12-26 23:48:15 +08:00

Patrick von Platen
48e744976c
[Mistral common] Ensure all functions are imported from the top & only use public methods (#31138)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-26 04:48:24 -08:00

Jee Jee Li
ce1eafd1a5
[Core] Initialize LoRA support for tower and connector in multi-modal models (#26674)
Signed-off-by: bk-201 <joy25810@foxmail.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: prashanth058 <prashanth.dannamaneni@uipath.com>
Co-authored-by: bk-201 <joy25810@foxmail.com>
Co-authored-by: prashanth058 <prashanth.dannamaneni@uipath.com>
Co-authored-by: Anexdeus <5142168@mail.ru>
2025-12-26 04:48:20 -08:00

Harry Mellor
0b544e6476
[Docs] Fix some snippets (#31378)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-26 12:47:41 +00:00

Jee Jee Li
c3666f56fd
[Misc] Fix Qwen2-MoE shared_expert_gate (#31339)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-26 05:10:39 +00:00

Andreas Karatzas
c79dbfa9ad
[CI] Fix flaky vision beam search test with flexible semantic validation (#31324)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2025-12-26 04:39:32 +00:00

Shinichi Hemmi
9ee05cbe7f
Support LoRA and GPTQModel for PLaMo 2/3 (#31322)
Signed-off-by: Shinichi Hemmi <50256998+Alnusjaponica@users.noreply.github.com>
2025-12-26 11:41:33 +08:00

Ning Xie
3b8f31b362
[benchmark] use model card root instead of id (#31329)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-12-26 10:55:56 +08:00

Isotr0py
2cd94259c8
[CI/Build] Ignore max transformers version skipping for initialization tests (#30619)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-26 10:50:32 +08:00

oscardev256
b7165d53c6
Feature/isaac 0.1 (#28367)
Signed-off-by: oscardev256 <42308241+oscardev256@users.noreply.github.com>
Signed-off-by: Oscar Gonzalez <ogonzal6@alumni.jh.edu>
Signed-off-by: Yang <lymailforjob@gmail.com>
Co-authored-by: Yang <lymailforjob@gmail.com>
2025-12-25 18:49:11 -08:00

Nick Hill
81786c8774
[BugFix] Fix async scheduling + reasoning with struct output (#31332)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2025-12-25 23:01:02 +00:00

Stan Wozniak
f1531d9f2a
[Hybrid] Mamba2 prefix cache blocks freeing for running requests (#28047)
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
2025-12-25 20:54:06 +00:00

SongHe
2d6001f491
[Model][Ernie4.5-VL] Support video metadata for timestamp rendering (#31274)
Signed-off-by: dengsonghe <dengsonghe@baidu.com>
Co-authored-by: dengsonghe <dengsonghe@baidu.com>
2025-12-25 14:07:15 +00:00

Amir Samani
030fc44914
use the same stream for cuda graph catpure and replay for NCCL (#29207)
Signed-off-by: Amir Samani <asamani@nvidia.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-12-25 19:10:03 +08:00

Isotr0py
2532f437ee
[Doc] Add troubleshooting for Triton PTX error about undefined gpu-name (#31338)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-12-25 02:26:34 -08:00

Louie Tsai
f15185fbdb
[Benchmark Suite] improve cpu Benchmark Suite tests and comparison report for 0.12.0 (#30994)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
2025-12-25 08:51:45 +00:00

Mark Gatere
ba25a65992
[Frontend] add FunctionGemma tool parser support (#31218)
Signed-off-by: gateremark <gateremg@gmail.com>
2025-12-25 15:29:25 +08:00

Amith KK
42826bbccd
[Doc] Add tool call parser documentation for GPT-OSS models (#31212)
Signed-off-by: Amith KK <amithkumaran@gmail.com>
2025-12-25 05:29:10 +00:00

Richard Zou
254f6b9867
[Bugfix] Fix eagle dp tests on A100 (#31241)
Signed-off-by: Richard Zou <zou3519@gmail.com>
2025-12-25 00:05:04 +00:00

Michael Goin
bc5ef333e0
[Perf] Add skip_clone to SamplingParams for internal request handling (#31041)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-12-24 14:35:57 -08:00

Cyrus Leung
09dc7c690c
[Chore][1/2] Drop v0.14 deprecations (#31285)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-24 09:54:01 -08:00

ゆり
506eb0f454
[Bugfix] Remove dead block_quant_to_tensor_quant function (#31294)
Co-authored-by: yurekami <yurekami@users.noreply.github.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-24 17:22:48 +00:00

Ning Xie
5d93089686
[cli] complete vllm cli help message (#31226)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-12-24 15:45:47 +00:00

Kevin McKay
66c9887440
[Bugfix][Hardware][AMD] Fix FP8 dtype in silu_mul quantization (#31179)
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
2025-12-24 10:37:11 -05:00

wang.yuqi
1ff67df182
[CI] Reorganization pooling_mteb_test (#31265)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-12-24 23:36:20 +08:00

skaraban3807
7cd288a4b3
[PERF] Add interleaved memory allocation to NUMA module (#30800)
2025-12-24 13:47:49 +00:00

Cyrus Leung
d201807339
[Chore] Bump lm-eval version (#31264)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-24 05:39:13 -08:00

Cyrus Leung
aa3868ecfe
[Chore] Remove unused noqas (#31263)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-24 05:38:46 -08:00

Cyrus Leung
7adeb4bfa8
[Bugfix] Fix max_model_len="auto" handling (#31260)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-24 19:15:27 +08:00

wang.yuqi
bd89ce16d2
[Model] Introduce verify_and_update_model_config for VerifyAndUpdateConfig. (#31131)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
2025-12-24 09:54:57 +00:00

Pleaplusone
b41aeb3468
[Bugfix][ROCm] Fix load issue on deepseek quark quantization when shared expert enabled (#31261)
Signed-off-by: ganyi <ygan@amd.com>
2025-12-24 16:47:44 +08:00

Ryan Rock
ddfac7034e
[CI/Build] Ignore data_parallel_size_local (#30281)
Signed-off-by: Ryan Rock <ryan.rock@amd.com>
2025-12-24 07:40:54 +00:00

Micah Williamson
6559d96796
[ROCm][CI] Set TORCH_NCCL_BLOCKING_WAIT Distributed Tests On ROCm (#31259)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2025-12-24 07:19:07 +00:00

kliuae
1c74150bca
[ROCm][CI] Fix "Distributed Tests (H200)" Test (#31227)
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>
2025-12-24 06:56:30 +00:00

Andreas Karatzas
0247a91e00
[ROCm][CI] Fix entrypoints tests and Python-only installation test on ROCm (#28979)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2025-12-23 22:42:30 -08:00

Michael Goin
8ee90c83f8
Add --max-model-len auto to auto-fit context to available memory (#29431)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-12-23 21:37:14 -08:00

Nick Cao
d7e05ac743
[docker] Fix downloading sccache on aarch64 platform (#30070)
Signed-off-by: Nick Cao <nickcao@nichi.co>
2025-12-23 21:36:33 -08:00

sihao_li
471ddb99a0
[XPU] Remove distributed_executor_backend check (#30760)
Signed-off-by: sihao.li <sihao.li@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
2025-12-23 21:34:33 -08:00

Xiong Wang
bb24592d13
[Qwen3-Omni] fixed _get_feat_extract_output_lengths function (#31007)
Signed-off-by: Xiong Wang <wangxiongts@163.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-12-23 21:33:54 -08:00

Matthew Bonanni
369f47aa0f
[DeepSeek v3.2] Remove unnecessary syncwarps (#31047)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-12-23 21:33:30 -08:00

zejunchen-zejun
dabff12ed3
[Bugfix][ROCm][Dynamo][DS 3.1][FP8] fix unsupported hasattr call when Dynamo tracing for ROCm device (#31149)
Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>
2025-12-23 21:32:19 -08:00

Ming Yang
3bb9561928
Revert "[bench] Support common prefix len config (for decode-only bench)" (#31240)
Signed-off-by: Ming Yang <minos.future@gmail.com>
2025-12-23 21:17:23 -08:00