wliao2 | 32e0c0bfa2 | 2026-04-03 11:21:47 +08:00
Refactor hard-coded device strings in test files under tests/v1 and tests/lora (#37566)
Signed-off-by: Liao, Wei <wei.liao@intel.com>

Andreas Karatzas | 679c6a3ecc | 2026-03-25 08:17:33 +08:00
[Bugfix][ROCm][MoE] Fix mxfp4 oracle regressions from #37128 (#37787)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>

Jee Jee Li | 1f0d210641 | 2026-03-23 12:55:49 +08:00
[CI/Build][LoRA] Update Qwen35 LoRA testing (#37816)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

Jee Jee Li | 8fbe3f303f | 2026-03-20 11:09:32 +08:00
[Bugfix][LoRA] Fix Qwen35 LoRA (#36976)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

Bhoomit | 3717a4dd47 | 2026-03-17 14:36:41 +00:00
[Misc][LoRA] Add --lora-target-modules to restrict LoRA to specific modules (#34984)
Signed-off-by: Bhoomit Vasani <bhoomit.2010@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

yugong333 | b3ce711b93 | 2026-03-13 19:05:08 +00:00
Fp8 lora dense kernel (#35242)
Signed-off-by: Yu Gong <yu3.gong@gmail.com>

Kunshang Ji | 53ec16a705 | 2026-03-12 07:57:47 -07:00
[Hardware] Replace torch.cuda.device_count/current_device/set_device API (#36145)
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>

Micah Williamson | ee54f9cdb9 | 2026-03-07 13:50:52 -08:00
[ROCm][CI] Accept Different But Valid Output for test_olmoe_tp (#35224)

lif | 00b814ba5a | 2026-03-07 22:09:55 +08:00
[V0 Deprecation] Remove unused swap_space parameter (#36216)
Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: mcelrath

daje0601 | 3b23d57c96 | 2026-03-05 10:38:25 +08:00
[Model] Add LoRA support for Whisper models (#29856)
Signed-off-by: daje0601 <englishmt4118@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

Runkai Tao | ada4f4fadd | 2026-03-02 23:17:46 +08:00
[Fix Bug] num_active_loras always equals zero (#34119)
Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>

gnovack | 3ecd0bf9fc | 2026-03-01 10:55:25 +08:00
Add TMA support to fused_moe_lora kernel (#32195)
Signed-off-by: gnovack <gnovack@amazon.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>

Itay Alroy | dea268336f | 2026-02-28 04:46:42 +00:00
[1/N] Elastic EP Milestone 2 (#34861)
Signed-off-by: Yongji Wu <wuyongji317@gmail.com>
Signed-off-by: Itay Alroy <ialroy@nvidia.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Signed-off-by: Ron Tourgeman <rtourgeman@nvidia.com>
Co-authored-by: Yongji Wu <wuyongji317@gmail.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Ron Tourgeman <rtourgeman@nvidia.com>

gnovack | a532c83849 | 2026-02-27 03:50:43 +00:00
use 'max_active_experts' for moe lora input size (#33197)
Signed-off-by: gnovack <gnovack@amazon.com>

Lucas Wilkinson | 5e58bdc711 | 2026-02-26 18:44:50 +00:00
[Bugfix] Remove erroneous lower bound on LoRA vocab size constraint (#35354)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

Cyrus Leung | 845ee348ef | 2026-02-26 13:05:46 +00:00
[Misc] Standardize handling of mm_processor_kwargs.size (#35284)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Chaojun Zhang | 9f9a675b23 | 2026-02-26 15:46:44 +08:00
[XPU][8/N] Fix kernel bugs in XPU LoRA and MoE LoRA (#34115)
Signed-off-by: chzhang <chaojun.zhang@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>

rasmith | b188bab441 | 2026-02-25 19:18:00 +00:00
[CI][AMD][BugFix] Add torch.cuda.set_device to test_punica_ops so punica kernels execute on same device as tensor (#34985)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>

Bhoomit | 42489e43c2 | 2026-02-25 23:30:55 +08:00
[Misc][LoRA] Increase max vocab size limit to 258048 in logits processor (#34773)
Signed-off-by: Bhoomit Vasani <vbhoomit@amazon.com>

Xin Yang | a7c9f7b7ec | 2026-02-23 21:49:25 -05:00
[Bugfix] Fix lora_ids in FusedMoE LoRA test (#35135)
Signed-off-by: Xin Yang <xyangx@amazon.com>

Cyrus Leung | 61cf087680 | 2026-02-18 13:22:31 -08:00
[Bugfix] Fix lora tests (#34834)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>

Kurt Shuster | 2991dd3d22 | 2026-02-06 20:25:31 +08:00
[Bugfix][Model] Support LoRA on Qwen3 Output Embedding (#29816)
Signed-off-by: kurt <kurt@thinkingmachines.ai>

yugong333 | ffe1fc7a28 | 2026-02-02 12:30:06 -05:00
Reduce kernel overhead when the number of active LoRAs is smaller than max LoRAs by capturing a separate CUDA graph for each active-LoRA count (#32005)
Signed-off-by: Yu Gong <yu3.gong@gmail.com>

Runkai Tao | 7320ca3942 | 2026-02-02 09:46:09 +08:00
Add unpermute-aware fused MoE LoRA path (#32655)
Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu>

Jackmin801 | 12dab78f49 | 2026-01-20 10:15:20 +08:00
[Feat] allow inplace loading lora (#31326)
Signed-off-by: Jackmin801 <ongjackm@gmail.com>
Signed-off-by: Jackmin801 <56836461+Jackmin801@users.noreply.github.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>

danisereb | aa7f37ccfa | 2026-01-19 22:30:44 +08:00
Add support for LoRA adapters in Nemotron-H models (#30802)
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>

Xin Yang | e7b68f4d6c | 2026-01-09 11:46:59 +00:00
[Bugfix] Fix Triton FusedMoE LoRA (#30585)
Signed-off-by: Xin Yang <xyangx@amazon.com>

gnovack | bde38c11df | 2026-01-09 14:43:25 +08:00
fix lora moe sharding when rank < max_lora_rank (#31994)
Signed-off-by: gnovack <gnovack@amazon.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>

Lucas Wilkinson | 6cdf015c3c | 2026-01-08 15:20:49 -08:00
[Misc] Fix "Current vLLM config is not set." warnings; assert to avoid issues in the future (#31747)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>

wangxiyuan | bb4337b34c | 2026-01-04 18:34:04 -08:00
[Platform] Deprecate seed_everything (#31659)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

B-201 | ecd49ce7e6 | 2025-12-31 09:44:59 +08:00
[Fix] Align fused moe lora_b shape with peft (#31534)
Signed-off-by: bk-201 <joy25810@foxmail.com>

ZT-AIA | f84bf7d79b | 2025-12-29 22:27:31 -08:00
Add LoRAConfig parameter to get_punica_wrapper function (#31408)
Signed-off-by: ZT-AIA <1028681969@qq.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>

Jee Jee Li | ce1eafd1a5 | 2025-12-26 04:48:20 -08:00
[Core] Initialize LoRA support for tower and connector in multi-modal models (#26674)
Signed-off-by: bk-201 <joy25810@foxmail.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: prashanth058 <prashanth.dannamaneni@uipath.com>
Co-authored-by: bk-201 <joy25810@foxmail.com>
Co-authored-by: prashanth058 <prashanth.dannamaneni@uipath.com>
Co-authored-by: Anexdeus <5142168@mail.ru>

Harry Mellor | af506fd76a | 2025-12-16 08:02:24 -08:00
Fix instantiation of HfHubHTTPError in LoRA test (#30768)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

Jee Jee Li | 0e391e7570 | 2025-12-16 01:36:35 -08:00
[Bugfix] Fix RequestOutput missing lora_request (#30636)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

gnovack | ea657f2078 | 2025-12-09 10:35:16 +08:00
Lora MoE Align Improvements (#29257)
Signed-off-by: gnovack <gnovack@amazon.com>

Jee Jee Li | 67312cad11 | 2025-12-09 00:59:31 +08:00
[Misc] Split the LoRA code (#30253)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

Jee Jee Li | b0f4866a77 | 2025-12-07 20:27:11 +08:00
[CI/Build] Temporary workaround for test_default_mm_loras timeout (#30202)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

Cyrus Leung | e83b7e379c | 2025-12-07 00:00:22 -08:00
Revert "[Renderer] Separate out RendererConfig from ModelConfig (#30145)" (#30199)

Cyrus Leung | 27f4c2fd46 | 2025-12-06 23:15:42 -08:00
[Renderer] Separate out RendererConfig from ModelConfig (#30145)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Harry Mellor | 951445a52d | 2025-12-02 12:16:37 +00:00
Remove default values from InitVars so that they're not stored (#29859)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

Jee Jee Li | 39e63dec7c | 2025-11-28 22:52:58 -08:00
[LoRA] Cleanup LoRA unused code (#29611)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>

Jee Jee Li | 2f5f9acd55 | 2025-11-27 05:56:28 -08:00
[LoRA] Continue optimizing MoE LoRA weight loading (#29322)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

Roger Wang | 0ff70821c9 | 2025-11-24 04:18:55 +00:00
[Core] Deprecate xformers (#29262)
Signed-off-by: Roger Wang <hey@rogerw.io>

Jee Jee Li | 1073ba68b0 | 2025-11-24 10:27:23 +08:00
[LoRA] Optimize 3D MoE logic (#29222)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

Alex Brooks | b4734b9550 | 2025-11-21 13:32:30 +08:00
[Bugfix] Fix default MM LoRA alignment for single str prompts (#29140)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

Jee Jee Li | 9875be6431 | 2025-11-21 09:46:43 +08:00
[LoRA][2/2] Remove LoRA extra vocab (#28545)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

gnovack | d69062c67a | 2025-11-19 16:32:00 +08:00
add support for --fully-sharded-loras in fused_moe (#28761)
Signed-off-by: gnovack <gnovack@amazon.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>

Varun Sundar Rabindranath | 6b2b9fd934 | 2025-11-10 10:45:29 +08:00
[CI] lora/test_mixtral.py: Add additional expected outputs due to flakiness (#28322)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>

yugong333 | 2ec401bc39 | 2025-11-04 18:27:35 +08:00
Load tuned fused_moe_lora shrink and expand kernel configs separately (#27435)
Signed-off-by: Yu Gong <yu3.gong@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>