Commit Graph

303 Commits

Author SHA1 Message Date
Xin Yang
9bd7231106 Revert "[Kernel] Add gpt-oss Router GEMM kernel (#37205)" (#38778)
Signed-off-by: Xin Yang <xyangx@amazon.com>
2026-04-01 22:02:32 -07:00
bnellnm
7cf56a59a2 [MoE Refactor] Make SharedExperts class for use with DefaultMoERunner (#35153)
Signed-off-by: Bill Nell <bnell@redhat.com>
2026-04-01 09:44:08 -04:00
yzong-rh
d9b90a07ac [MoE Refactor] Migrate Unquantized to Full Oracle Flow (#36286)
Signed-off-by: Yifan Zong <yzong@redhat.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: yzong-rh <yzong@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-03-31 15:43:33 -04:00
Andreas Karatzas
679c6a3ecc [Bugfix][ROCm][MoE] Fix mxfp4 oracle regressions from #37128 (#37787)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-25 08:17:33 +08:00
Ranran
dc6908ac6a [Bugfix] Register VLLM_BATCH_INVARIANT in envs.py to fix spurious unknown env var warning (#35007)
Signed-off-by: Ranran <1012869439@qq.com>
Signed-off-by: Ranran <hzz5361@psu.edu>
Signed-off-by: ran <hzz5361@psu.edu>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2026-03-23 18:31:14 -04:00
Jee Jee Li
aec2dc6c0d [Bugfix][LoRA] Fix incorrect LoRA Log (#37877)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2026-03-23 11:42:52 +00:00
Jee Jee Li
96266f119b [LoRA] Minor improvements to LoRA log (#37557)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-03-19 15:18:06 +00:00
Xin Yang
b1169d7be8 [Kernel] Add gpt-oss Router GEMM kernel (#37205)
Signed-off-by: Xin Yang <xyangx@amazon.com>
2026-03-18 08:15:56 -07:00
Jee Jee Li
8c31f47c63 [LoRA] Make LoRA respect language_model_only (#37375)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2026-03-18 07:53:34 +00:00
Bhoomit
3717a4dd47 [Misc][LoRA] Add --lora-target-modules to restrict LoRA to specific modules (#34984)
Signed-off-by: Bhoomit Vasani <bhoomit.2010@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-17 14:36:41 +00:00
Kyuyeun Kim
0a0a1a198b Add ability to replace oot ops when using lora (#37181)
Signed-off-by: Kyuyeun Kim <kyuyeunk@google.com>
2026-03-16 18:04:15 -07:00
yugong333
b3ce711b93 Fp8 lora dense kernel (#35242)
Signed-off-by: Yu Gong <yu3.gong@gmail.com>
2026-03-13 19:05:08 +00:00
Xinan Miao
2cdf92228c [Feature]: Remove Chunking From FusedMoE (#34086)
Signed-off-by: SouthWest7 <am1ao@qq.com>
Signed-off-by: Southwest <1403572259@qq.com>
Signed-off-by: southwest <am1ao@qq.com>
Signed-off-by: Xinan Miao <1403572259@qq.com>
Co-authored-by: SouthWest7 <am1ao@qq.com>
2026-03-12 14:24:38 -04:00
Ethan T.
c87fb515ed fix(lora): use replaced_module_name in pooling model name check (#36402)
Signed-off-by: gambletan <ethanchang32@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 03:11:27 -07:00
Ning Xie
fe714dd507 [openapi server] log exception in exception handler(2/N) (#36201)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2026-03-10 20:16:30 -07:00
Harry Mellor
f83b933b84 [CI] Bump mypy version to 1.19.1 (#36104)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-10 09:18:28 -07:00
wang.yuqi
a3189a08b0 [Model] Consolidate score logic by introduce score_type (#36479)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-03-10 13:32:25 +00:00
Harry Mellor
a0f44bb616 Allow markdownlint to run locally (#36398)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-08 20:05:24 -07:00
Jiayi Yan
6a895197fa [Bugfix][CI] fix typos (#34934)
Signed-off-by: 1195343015 <1195343015@qq.com>
Signed-off-by: Jiayi Yan <66017932+1195343015@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-05 17:05:46 +00:00
daje0601
3b23d57c96 [Model] Add LoRA support for Whisper models (#29856)
Signed-off-by: daje0601 <englishmt4118@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-05 10:38:25 +08:00
Robert Shaw
97995f6376 [MoE Refactor] Create MK for TRTLLM Kernels (#32564)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
2026-03-03 10:39:50 -08:00
Runkai Tao
ada4f4fadd [Fix Bug]num_active_loras always equals to zero (#34119)
Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2026-03-02 23:17:46 +08:00
gnovack
3ecd0bf9fc Add TMA support to fused_moe_lora kernel (#32195)
Signed-off-by: gnovack <gnovack@amazon.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2026-03-01 10:55:25 +08:00
gnovack
a532c83849 use 'max_active_experts' for moe lora input size (#33197)
Signed-off-by: gnovack <gnovack@amazon.com>
2026-02-27 03:50:43 +00:00
Lucas Wilkinson
5e58bdc711 [Bugfix] Remove erroneous lower bound on LoRA vocab size constraint (#35354)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2026-02-26 18:44:50 +00:00
Runkai Tao
a1f53addb1 [BugFix] Align fused MoE-LoRA kernel config with actual weight shapes (#34396)
Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu>
2026-02-26 18:03:10 +00:00
Chaojun Zhang
9f9a675b23 [XPU][8/N] Fix kernel bugs in XPU LoRA and MOE LORA (#34115)
Signed-off-by: chzhang <chaojun.zhang@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
2026-02-26 15:46:44 +08:00
Bhoomit
42489e43c2 [Misc][LoRA] Increase max vocab size limit to 258048 in logits processor (#34773)
Signed-off-by: Bhoomit Vasani <vbhoomit@amazon.com>
2026-02-25 23:30:55 +08:00
Jhao-Ting Chen
c2c4c4611a [FIX] fused moe with lora shared expert dual stream (1.07x otps) (#34933)
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2026-02-25 04:40:45 +00:00
Xin Yang
c870eb9e0f [LoRA] Update LoRA expand kernel block_n calculation (#32621)
Signed-off-by: Xin Yang <xyangx@amazon.com>
2026-02-23 23:17:53 -08:00
yugong333
a55caf6ae9 [LoRA] Support Quantized Adapters (#30286)
Signed-off-by: Yu Gong <yu3.gong@gmail.com>
Signed-off-by: wz1qqx <ziqi.wang@novita.ai>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: wz1qqx <55830058+wz1qqx@users.noreply.github.com>
Co-authored-by: wz1qqx <ziqi.wang@novita.ai>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-02-20 19:54:35 -08:00
bnellnm
5bff999d12 [Bugfix] Add method to swap quant_method on FusedMoE to fix LoRA issues (#34453)
Signed-off-by: Bill Nell <bnell@redhat.com>
2026-02-15 20:10:50 -08:00
Runkai Tao
e1d97c38f8 [Bug Fix] Fix naive_block_assignment always defaulting to False due to arg misalignment (#33848)
Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu>
2026-02-12 11:30:57 +08:00
Reagan Lee
7c233dbb36 [Tiny] Rename encoder budget file to more specific name (#34103)
Signed-off-by: Reagan Lee <“reaganjlee@gmail.com”>
Co-authored-by: Reagan Lee <“reaganjlee@gmail.com”>
2026-02-09 03:48:19 +00:00
Xin Yang
3920cafdd6 [Bugfix] Fix _fused_moe_lora_expand signature mismatch (#33821)
Signed-off-by: Xin Yang <xyangx@amazon.com>
2026-02-07 10:45:59 +08:00
Kurt Shuster
2991dd3d22 [Bugfix][Model] Support LoRA on Qwen3 Output Embedding (#29816)
Signed-off-by: kurt <kurt@thinkingmachines.ai>
2026-02-06 20:25:31 +08:00
danisereb
5b2a9422f0 [BugFix] Fix LoRA Fp8 (#33879)
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
2026-02-05 17:25:55 +00:00
yugong333
ffe1fc7a28 Reduce the kernel overhead when num of active loras is smaller than max loras. Multiple cuda graphs are captured for each num of active-loras. (#32005)
Signed-off-by: Yu Gong <yu3.gong@gmail.com>
2026-02-02 12:30:06 -05:00
Cyrus Leung
d7e17aaacd [Refactor] Move profiling methods to MM budget (#33559)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-02 23:27:00 +08:00
Runkai Tao
7320ca3942 Add unpermute-aware fused MoE LoRA path (#32655)
Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu>
2026-02-02 09:46:09 +08:00
Danielle Robinson
74898a7015 [BugFix][LoRA] TritonExperts is ModularMoEPath for FP8 models (#33393)
Signed-off-by: Danielle Robinson <dmmaddix@amazon.com>
Co-authored-by: Danielle Robinson <dmmaddix@amazon.com>
2026-01-30 15:27:42 +00:00
杨朱 · Kiki
1a7894dbdf [Misc] Replace Optional[X] with X | None syntax (#33332)
Signed-off-by: carlory <baofa.fan@daocloud.io>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-30 01:56:59 -08:00
Cyrus Leung
c87eac18f7 [Refactor] Move MM item count validation outside of processor (#33396)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-30 09:27:31 +00:00
cwazai
f210f0b7b1 [lora/moe] Avoid extra intermediate buffer & Python slicing in expand phase when split_k == 1 (#32774)
Signed-off-by: 陈建华 <1647430658@qq.com>
2026-01-29 00:22:45 +08:00
danisereb
f3a5ee705f [LoRA][Spec Decode] Support LoRA for Nemotron-H MTP models (#32265)
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2026-01-27 07:53:26 -08:00
cwazai
e33192b269 [lora/moe] Improve fused MoE‑LoRA kernel indexing and memory access (#32770)
Signed-off-by: 陈建华 <1647430658@qq.com>
Signed-off-by: Yanwen Lin <lyw1124278064@gmail.com>
Signed-off-by: kimheesu <wlskaka4@gmail.com>
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: ganyi <ygan@amd.com>
Signed-off-by: whx-sjtu <2952154980@qq.com>
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Signed-off-by: Xin Yang <xyangx@amazon.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Signed-off-by: Ifta khairul Alam Adil <25082512+ikaadil@users.noreply.github.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Signed-off-by: Huy Do <huydhn@gmail.com>
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Kebe <mail@kebe7jun.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Alex Sun <alex.s@amd.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Liran Schour <lirans@il.ibm.com>
Signed-off-by: liranschour <liranschour@users.noreply.github.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
Signed-off-by: Richard Zou <zou3519@gmail.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: AuYang <459461160@qq.com>
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Signed-off-by: rickychen-infinirc <ricky.chen@infinirc.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com>
Signed-off-by: eldarkurtic <8884008+eldarkurtic@users.noreply.github.com>
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: Karan Bansal <karanb192@gmail.com>
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: Li, Jiang <bigpyj64@gmail.com>
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: Tianshu Yu <tianshuyu.formal@gmail.com>
Signed-off-by: raushan <raushan@huggingface.co>
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: sangbumlikeagod <oironese@naver.com>
Signed-off-by: sangbumlikeagod <98077576+sangbumlikeagod@users.noreply.github.com>
Signed-off-by: Matteo Fari <matteofari06@gmail.com>
Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Orion Reblitz-Richardson <orionr@meta.com>
Signed-off-by: Orion Reblitz-Richardson <orionr@gmail.com>
Signed-off-by: marksverdhei <marksverdhei@hotmail.com>
Signed-off-by: Markus / Mark <46672778+marksverdhei@users.noreply.github.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Randall Smith <ransmith@amd.com>
Signed-off-by: jon <joninco@bullpoint.org>
Signed-off-by: dolpm <34420038+dolpm@users.noreply.github.com>
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: Luka Govedič <luka.govedic@gmail.com>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Signed-off-by: mohammad najafi <mohammad.najafi@amd.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: 7. Sun <jhao.sun@gmail.com>
Signed-off-by: esmeetu <jasonailu87@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Reagan <reaganjlee@gmail.com>
Signed-off-by: Reagan Lee <96998476+reaganjlee@users.noreply.github.com>
Signed-off-by: Hongjian Zhang <zhanghongjian@xiaohongshu.com>
Signed-off-by: Xingran Wang <wangxingran123456@outlook.com>
Signed-off-by: Hiroken. <105287758+HirokenOvo@users.noreply.github.com>
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
Signed-off-by: Louie Tsai <louie.tsai@intel.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Signed-off-by: Joshua Deng <joshuakdeng@gmail.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>
Signed-off-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: cwazai <38356712+cwazai@users.noreply.github.com>
Co-authored-by: Yanwen Lin <lyw1124278064@gmail.com>
Co-authored-by: Kim Hee Su <wlskaka4@gmail.com>
Co-authored-by: Divakar Verma <137818590+divakar-amd@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Pleaplusone <ygan@amd.com>
Co-authored-by: whx <56632993+whx-sjtu@users.noreply.github.com>
Co-authored-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: danisereb <daserebrenik@nvidia.com>
Co-authored-by: Yanan Cao <gmagogsfm@users.noreply.github.com>
Co-authored-by: Xin Yang <105740670+xyang16@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Matt <156021403+mawong-amd@users.noreply.github.com>
Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Co-authored-by: Lucain <lucainp@gmail.com>
Co-authored-by: Ifta khairul Alam Adil <25082512+ikaadil@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Huy Do <huydhn@gmail.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>
Co-authored-by: Andreas Karatzas <akaratza@amd.com>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: Kebe <mail@kebe7jun.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Alex Sun <minchsun@amd.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: liranschour <liranschour@users.noreply.github.com>
Co-authored-by: Or Ozeri <or@ozery.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: Shengqi Chen <harry-chen@outlook.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Lucas Kabela <lucaskabela@meta.com>
Co-authored-by: Richard Zou <zou3519@users.noreply.github.com>
Co-authored-by: Maximilien de Bayser <maxdebayser@gmail.com>
Co-authored-by: Xu Jinyang <72930776+AuYang261@users.noreply.github.com>
Co-authored-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: David Ramon Prados <davidramon3@hotmail.es>
Co-authored-by: RickyChen / 陳昭儒 <ricky.chen@infinirc.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Fadi Arafeh <115173828+fadara01@users.noreply.github.com>
Co-authored-by: Eldar Kurtić <8884008+eldarkurtic@users.noreply.github.com>
Co-authored-by: bnellnm <49004751+bnellnm@users.noreply.github.com>
Co-authored-by: Rishabh Saini <rishabhsaini01@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Karan Bansal <karanb192@users.noreply.github.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: tianshu-Michael-yu <101950379+tianshu-Michael-yu@users.noreply.github.com>
Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
Co-authored-by: baonudesifeizhai <85092850+baonudesifeizhai@users.noreply.github.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: sangbumlikeagod <98077576+sangbumlikeagod@users.noreply.github.com>
Co-authored-by: Matteo Fari <matteofari06@gmail.com>
Co-authored-by: Harry Huang <vastrockhuang162@gmail.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Orion Reblitz-Richardson <orionr@gmail.com>
Co-authored-by: Kevin H. Luu <khluu000@gmail.com>
Co-authored-by: Markus / Mark <46672778+marksverdhei@users.noreply.github.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: rasmith <Randall.Smith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
Co-authored-by: joninco <joninco@bullpoint.org>
Co-authored-by: dolpm <34420038+dolpm@users.noreply.github.com>
Co-authored-by: ElizaWszola <ewszola@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com>
Co-authored-by: Luka Govedič <luka.govedic@gmail.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: monajafi-amd <mohammad.najafi@amd.com>
Co-authored-by: ruizcrp <ruiz.crp@gmail.com>
Co-authored-by: Shengqi Chen <i@harrychen.xyz>
Co-authored-by: 7. Sun <jhao.sun@gmail.com>
Co-authored-by: Roy Wang <jasonailu87@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Reagan Lee <96998476+reaganjlee@users.noreply.github.com>
Co-authored-by: Hiroken. <105287758+HirokenOvo@users.noreply.github.com>
Co-authored-by: Xingran Wang <wangxingran123456@outlook.com>
Co-authored-by: david guan <102001211+Chenhao-Guan@users.noreply.github.com>
Co-authored-by: Lukas Geiger <lukas.geiger94@gmail.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: Maryam Tahhan <mtahhan@redhat.com>
Co-authored-by: Joshua Deng <91448271+joshuadeng@users.noreply.github.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
Co-authored-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
Co-authored-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2026-01-26 04:56:34 -08:00
Danielle Robinson
ee484b3f4b Set splitk=1 for fused-moe-lora expand kernel (#32882)
Signed-off-by: Danielle Robinson <dmmaddix@amazon.com>
Co-authored-by: Danielle Robinson <dmmaddix@amazon.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2026-01-25 22:52:34 -08:00
Jee Jee Li
73b243463b [BugFix] Add env variable to control PDL in LoRA (#32836)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2026-01-25 16:32:30 +08:00
yugong333
d4dbb7af63 Using max_loras + 1 to construct grid in fused_moe_lora (#32277)
Signed-off-by: Yu Gong <yu3.gong@gmail.com>
2026-01-24 12:39:30 -05:00
Xin Yang
ecc3dd66cc [Bugfix] Fix FusedMoE LoRA kernel offs_token out of bound value (#32279)
Signed-off-by: Xin Yang <xyangx@amazon.com>
2026-01-24 01:41:35 +00:00