Cyrus Leung
61cf087680
[Bugfix] Fix lora tests (#34834)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2026-02-18 13:22:31 -08:00

Kurt Shuster
2991dd3d22
[Bugfix][Model] Support LoRA on Qwen3 Output Embedding (#29816)
Signed-off-by: kurt <kurt@thinkingmachines.ai>
2026-02-06 20:25:31 +08:00

yugong333
ffe1fc7a28
Reduce the kernel overhead when num of active loras is smaller than max loras. Multiple cuda graphs are captured for each num of active-loras. (#32005)
Signed-off-by: Yu Gong <yu3.gong@gmail.com>
2026-02-02 12:30:06 -05:00

Runkai Tao
7320ca3942
Add unpermute-aware fused MoE LoRA path (#32655)
Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu>
2026-02-02 09:46:09 +08:00

Jackmin801
12dab78f49
[Feat] allow inplace loading lora (#31326)
Signed-off-by: Jackmin801 <ongjackm@gmail.com>
Signed-off-by: Jackmin801 <56836461+Jackmin801@users.noreply.github.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2026-01-20 10:15:20 +08:00

danisereb
aa7f37ccfa
Add support for LoRA adapters in Nemotron-H models (#30802)
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
2026-01-19 22:30:44 +08:00

Xin Yang
e7b68f4d6c
[Bugfix] Fix Triton FusedMoE LoRA (#30585)
Signed-off-by: Xin Yang <xyangx@amazon.com>
2026-01-09 11:46:59 +00:00

gnovack
bde38c11df
fix lora moe sharding when rank < max_lora_rank (#31994)
Signed-off-by: gnovack <gnovack@amazon.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2026-01-09 14:43:25 +08:00

Lucas Wilkinson
6cdf015c3c
[Misc] Fix Current vLLM config is not set. warnings, assert to avoid issues in the future (#31747)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2026-01-08 15:20:49 -08:00

wangxiyuan
bb4337b34c
[Platform] Deprecate seed_everything (#31659)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2026-01-04 18:34:04 -08:00

B-201
ecd49ce7e6
[Fix] Align fused moe lora_b shape with peft (#31534)
Signed-off-by: bk-201 <joy25810@foxmail.com>
2025-12-31 09:44:59 +08:00

ZT-AIA
f84bf7d79b
Add Loraconfig parameter to get_punica_wrapper function (#31408)
Signed-off-by: ZT-AIA <1028681969@qq.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-29 22:27:31 -08:00

Jee Jee Li
ce1eafd1a5
[Core] Initialize LoRA support for tower and connector in multi-modal models (#26674)
Signed-off-by: bk-201 <joy25810@foxmail.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: prashanth058 <prashanth.dannamaneni@uipath.com>
Co-authored-by: bk-201 <joy25810@foxmail.com>
Co-authored-by: prashanth058 <prashanth.dannamaneni@uipath.com>
Co-authored-by: Anexdeus <5142168@mail.ru>
2025-12-26 04:48:20 -08:00

Harry Mellor
af506fd76a
Fix instantiation of HfHubHTTPError in LoRA test (#30768)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-16 08:02:24 -08:00

Jee Jee Li
0e391e7570
[Bugfix] Fix RequestOutput miss lora_request (#30636)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-16 01:36:35 -08:00

gnovack
ea657f2078
Lora MoE Align Improvements (#29257)
Signed-off-by: gnovack <gnovack@amazon.com>
2025-12-09 10:35:16 +08:00

Jee Jee Li
67312cad11
[Misc] Split the LoRA code (#30253)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-09 00:59:31 +08:00

Jee Jee Li
b0f4866a77
[CI/Build]Temporary workaround for test_default_mm_loras timeout (#30202)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-07 20:27:11 +08:00

Cyrus Leung
e83b7e379c
Revert "[Renderer] Separate out RendererConfig from ModelConfig (#30145)" (#30199)
2025-12-07 00:00:22 -08:00

Cyrus Leung
27f4c2fd46
[Renderer] Separate out RendererConfig from ModelConfig (#30145)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-06 23:15:42 -08:00

Harry Mellor
951445a52d
Remove default values from InitVars so that they're not stored (#29859)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-02 12:16:37 +00:00

Jee Jee Li
39e63dec7c
[LoRA] Cleanup LoRA unused code (#29611)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-28 22:52:58 -08:00

Jee Jee Li
2f5f9acd55
[LoRA] Continue optimizing MoE LoRA weight loading (#29322)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-27 05:56:28 -08:00

Roger Wang
0ff70821c9
[Core] Deprecate xformers (#29262)
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-11-24 04:18:55 +00:00

Jee Jee Li
1073ba68b0
[LoRA] Optimize 3D MoE logic (#29222)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-24 10:27:23 +08:00

Alex Brooks
b4734b9550
[Bugfix] Fix default MM LoRA alignment for single str prompts (#29140)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-11-21 13:32:30 +08:00

Jee Jee Li
9875be6431
[LoRA][2/2]Remove LoRA extra vocab (#28545)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-21 09:46:43 +08:00

gnovack
d69062c67a
add support for --fully-sharded-loras in fused_moe (#28761)
Signed-off-by: gnovack <gnovack@amazon.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-19 16:32:00 +08:00

Varun Sundar Rabindranath
6b2b9fd934
[CI] lora/test_mixtral.py : Add additional expected outputs due to flakiness (#28322)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-11-10 10:45:29 +08:00

yugong333
2ec401bc39
Load tuned fused_moe_lora shrink and expand kernel configs separately (#27435)
Signed-off-by: Yu Gong <yu3.gong@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-04 18:27:35 +08:00

gnovack
294c805f1d
Early exit for MoE LoRA kernels (#27131)
Signed-off-by: gnovack <gnovack@amazon.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-03 20:22:17 +08:00

Jee Jee Li
32257297dd
[CI/Build] Remove the flaky gpt-oss lora test (#27966)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-03 16:50:06 +08:00

Jee Jee Li
0384aa7150
[CI/Build] Add gpt-oss LoRA test (#27870)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-10-31 22:17:21 +08:00

Huamin Li
c7d2a554ba
[CI Failure] fix test_default_mm_loras (#27795)
Signed-off-by: Huamin Li <3ericli@gmail.com>
2025-10-30 18:13:03 +08:00

Jee Jee Li
f4e8154076
[Kernel] Enable moe LoRA kernel support FP16 (#27468)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-10-27 19:48:37 +08:00

Danielle Robinson
9932ed6a83
[Kernel] Adding split_K implementation for fused_moe_lora (#27291)
Signed-off-by: Danielle Robinson <dmmaddix@amazon.com>
Signed-off-by: Danielle Robinson <dcmaddix@gmail.com>
Co-authored-by: Danielle Robinson <dmmaddix@amazon.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-10-27 02:05:24 -07:00

gnovack
8e4ca4d14e
Bugfix - pass 'max_num_tokens_padded' into 'moe_lora_align_block_size' (#27311)
Signed-off-by: gnovack <gnovack@amazon.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-10-22 12:23:57 +00:00

Huy Do
becb7de40b
Update PyTorch to 2.9.0+cu129 (#24994)
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-10-21 17:20:18 -04:00

Chen Wu
5f6cbf60d6
[Feature][Kernel]FusedMoE LoRA (#21229)
Signed-off-by: wuchen <cntryroa@gmail.com>
Signed-off-by: banjuede <lmklhc@163.com>
Signed-off-by: Chen Wu <cntryroa@gmail.com>
Signed-off-by: Danielle Robinson <dmmaddix@amazon.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: bk-201 <joy25810@foxmail.com>
Co-authored-by: wuchen <wuchen@zetyun.com>
Co-authored-by: Nathan Van Gheem <vangheem@gmail.com>
Co-authored-by: banjuede <lmklhc@163.com>
Co-authored-by: Danielle Robinson <dmmaddix@amazon.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: bk-201 <joy25810@foxmail.com>
2025-10-21 03:01:37 +00:00

Andy Lo
b63f2143f8
[LoRA] LoRA cuda graph specialization (#25914)
Signed-off-by: Andy Lo <andy@mistral.ai>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-10-20 04:21:09 +00:00

Cyrus Leung
d31f7844f8
[Misc] Move utils to avoid conflicts with stdlib, and move tests (#27169)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-19 05:20:55 -07:00

Cyrus Leung
f6cdc9a02f
[Chore] Rename utils submodules (#26920)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-16 03:58:13 +00:00

Cyrus Leung
828523ad8e
[Chore] Separate out vllm.utils.async_utils (#26913)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-15 15:33:00 +00:00

Jee Jee Li
fdd32750f0
[CI/Build] Cleanup LoRA test (#26752)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-10-14 12:06:35 +00:00

Harry Mellor
8fcaaf6a16
Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-12 09:51:31 -07:00

Ashwin Phadke
ab196edefb
Remove LoRA bias support (#25807)
Signed-off-by: Ashwin Phadke <ashwinphadke12@rediffmail.com>
Signed-off-by: Ashwin Phadke <23502062+ashwin-phadke@users.noreply.github.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-10-10 09:50:33 +00:00

Harry Mellor
6c04638214
Fix per file ruff ignores related to line length (#26262)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-06 05:12:40 +00:00

Harry Mellor
4e256cadc2
Remove all references to yapf as it's no longer used (#26251)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 09:18:11 -07:00

Harry Mellor
d6953beb91
Convert formatting to use ruff instead of yapf + isort (#26247)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 07:06:22 -07:00

Jee Jee Li
273690a50a
[Core] Optimize LoRA weight loading (#25403)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-23 18:19:45 +08:00