109 Commits

Author SHA1 Message Date
Luka Govedič
40bb175027 [vLLM IR] 1/N Implement IR skeleton and rms_norm op (#33825)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>
Signed-off-by: chzhang <chaojun.zhang@intel.com>
Signed-off-by: Luka Govedic <luka.govedic@gmail.com>
Co-authored-by: Xinyu Chen <xinyu1.chen@intel.com>
Co-authored-by: Chaojun Zhang <chaojun.zhang@intel.com>
Co-authored-by: Luka Govedič <ProExpertProg@h100-01.nemg-001.lab.rdu2.dc.redhat.com>
2026-03-31 22:15:05 -04:00
wliao2
4dfad17ed1 replace cuda_device_count_stateless() to current_platform.device_count() (#37841)
Signed-off-by: Liao, Wei <wei.liao@intel.com>
Signed-off-by: wliao2 <wei.liao@intel.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
2026-03-31 22:32:54 +08:00
Ilya Markov
12701e8af2 [EPLB] Optmize eplb mapping and record in router for prefill (#36261)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
2026-03-30 19:48:33 +00:00
Kyle Sayers
d28d86e8a3 [QeRL] Fix online quantized reloading (#38442)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2026-03-29 14:56:41 -06:00
Kyle Sayers
648edcf729 [QeRL] Compose online quantization with quantized reloading (#38032)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2026-03-27 13:22:33 -07:00
Andreas Karatzas
37a83007fe [ROCm][CI] Fix wvSplitKrc mock argument order in test_rocm_unquantized_gemm (#38167)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-26 19:54:59 +08:00
Matthias Gehre
a889b7f584 [Bugfix] Pass drafter quant_config to ParallelLMHead in Eagle3 (#37280)
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
2026-03-25 11:42:58 +00:00
Andreas Karatzas
6ecba840d7 [ROCm][CI] get_cu_count was renamed to num_compute_units in #35042 (#37764)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-22 16:02:21 +08:00
L.B.R.
1779c09898 [ROCm] Enable wvSplitK skinny GEMM kernel for RDNA4/gfx1x decode (#34709)
Signed-off-by: L.B.R. <lbr@mmonad.com>
Co-authored-by: L.B.R. <lbr@mmonad.com>
2026-03-20 10:11:23 -05:00
Roy Wang
821eb80c0d [Performance][Model Loader] Skip non-local expert weights during EP model loading (#37136)
Signed-off-by: esmeetu <jasonailu87@gmail.com>
2026-03-16 01:33:36 -07:00
Lalithnarayan C
7acaea634c In-Tree AMD Zen CPU Backend via zentorch [1/N] (#35970)
Signed-off-by: Lalithnarayan C <Lalithnarayan.C@amd.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Chinmay-Kulkarni-AMD <Chinmay.Kulkarni@amd.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 23:35:35 +00:00
Hari
a3e2e250f0 [Feature] Add Azure Blob Storage support for RunAI Model Streamer (#34614)
Signed-off-by: hasethuraman <hsethuraman@microsoft.com>
2026-03-15 19:38:21 +08:00
arlo
8c29042bb9 [Feature] Add InstantTensor weight loader (#36139)
2026-03-14 18:05:23 +01:00
Kunshang Ji
53ec16a705 [Hardware] Replace torch.cuda.device_count/current_device/set_device API (#36145)
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2026-03-12 07:57:47 -07:00
Raghavan
c8c3935b70 [Bugfix][Model] Fix FP8 k_scale/v_scale not loaded for Qwen3-MoE (#35656)
Signed-off-by: raghavan <oneraghavan@gmail.com>
2026-03-04 13:15:38 +00:00
Kunshang Ji
16d2ad1d38 [Hardware] Replace torch.cuda.empty_cache with torch.accelerator.empty_cache (#30681)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-04 09:49:47 +00:00
William Zhang
70c73df69e [Bugfix] Fix EVS implementation for Qwen3 VL (#33607)
Signed-off-by: 2ez4bz <133824995+2ez4bz@users.noreply.github.com>
2026-03-04 02:18:11 +00:00
Laura Wang
2465071510 [Perf] Add opt-in SM100 Oink RMSNorm custom-op path (#31828)
Signed-off-by: Laura Wang <3700467+Laurawly@users.noreply.github.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
2026-02-24 23:01:53 -08:00
Cyrus Leung
7fcb705b80 [CI/Build] Skip GCS test (#34057)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-07 08:52:38 -08:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟
13b842f271 [BugFix][Router Replay] Capture Logical Experts with EPLB (#33013)
Signed-off-by: Hollow Man <hollowman@opensuse.org>
2026-01-31 10:12:17 -05:00
Micah Williamson
6c64c41b4a [ROCm][CI] Force max_num_seqs=1 on ROCm In test_sharded_state_loader to reduce flakiness (#33277)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2026-01-31 12:28:29 +08:00
Kyle Sayers
f857a03f6b [QeRL] Layerwise Reloading (#32133)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2026-01-30 08:50:05 -07:00
Cyrus Leung
51550179fc [Refactor] Define MM data parser in processing info instead of processor itself (#33260)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-29 13:55:17 +08:00
Xin Yang
63227accf5 [Kernel] Add topk_sigmoid kernel (#31246)
Signed-off-by: Xin Yang <xyangx@amazon.com>
2026-01-21 22:49:51 +00:00
whx
1861ae8aae [PluggableLayer][1/N] Define PluggableLayer (Fix ci) (#32744)
Signed-off-by: whx-sjtu <2952154980@qq.com>
2026-01-21 11:38:04 -05:00
Robert Shaw
c78ee240b3 Revert "[PluggableLayer][1/N] Define PluggableLayer" (#32725)
2026-01-21 00:21:06 +00:00
whx
4ca62a0dbd [PluggableLayer][1/N] Define PluggableLayer (#32331)
Signed-off-by: whx-sjtu <2952154980@qq.com>
2026-01-20 16:19:21 +00:00
bnellnm
327a02d8db [MoE Refactor] Separate Router into OO Classes (#30623)
Signed-off-by: Bill Nell <bnell@redhat.com>
2026-01-18 11:40:49 -05:00
Cyrus Leung
232214b2ae [Bugfix] Replace PoolingParams.normalize with use_activation (#32243)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-13 10:45:42 +00:00
Cyrus Leung
583a90e005 [Refactor] Separate sequence and token pooling types (#32026)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-10 04:53:24 +00:00
Cyrus Leung
c8ed39b9dd [Model] Reorganize pooling layers (#31973)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-09 11:02:14 +00:00
Lucas Wilkinson
6cdf015c3c [Misc] Fix Current vLLM config is not set. warnings, assert to avoid issues in the future (#31747)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2026-01-08 15:20:49 -08:00
Cyrus Leung
d1b6fe007f [Chore] Further cleanup pooler (#31951)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-08 02:16:21 -08:00
Cyrus Leung
aafd4d2354 [Chore] Try remove init_cached_hf_modules (#31786)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-07 12:34:04 +08:00
Cyrus Leung
e83b7e379c Revert "[Renderer] Separate out RendererConfig from ModelConfig (#30145)" (#30199)
2025-12-07 00:00:22 -08:00
Cyrus Leung
27f4c2fd46 [Renderer] Separate out RendererConfig from ModelConfig (#30145)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-06 23:15:42 -08:00
Noa Neria
6366c098d7 Validating Runai Model Streamer Integration with S3 Object Storage (#29320)
Signed-off-by: Noa Neria <noa@run.ai>
2025-12-04 18:04:43 +08:00
Morrison Turnansky
0838b52e2e [Frontend][torch.compile] CompilationConfig Overhaul (#20283): Set up -O infrastructure (#26847)
Signed-off-by: morrison-turnansky <mturnans@redhat.com>
Signed-off-by: adabeyta <aabeyta@redhat.com>
Signed-off-by: Morrison Turnansky <mturnans@redhat.com>
Co-authored-by: adabeyta <aabeyta@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-27 01:55:58 -08:00
Chenheli Hua
839c6b7b72 [Multimodal][Qwen3 Omni] Make Qwen3 Omni work with audio-in-video inputs in V1 engine. (#27721)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-11-24 19:24:37 +00:00
Cyrus Leung
aab0102a26 [V0 deprecation] Remove more V0 references (#29088)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-21 11:56:59 +00:00
TJian
82b05b15e6 [BugFix] [FEAT] Enable fastsafetensors for ROCm platform (#28225)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-11-20 16:34:11 +00:00
Shreyas Kulkarni
95ae50b7d1 [Quantization] [Eagle] Add complete quantization support to the draft model in Eagle (#28435)
Signed-off-by: Shreyas Kulkarni <shreyas.gp269@gmail.com>
2025-11-17 15:01:34 -08:00
vllmellm
f080a83511 [RFC][ROCm][AITER] Keep all AITER kernels in _aiter_ops class like _custom_ops and _ipex_ops (#24490)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-11-10 08:20:53 -08:00
Alexis MacAskill
a47d94f18c Add runai model streamer e2e test for GCS (#28079)
Signed-off-by: Alexis MacAskill <amacaskill@google.com>
2025-11-07 03:07:54 +00:00
Zhewen Li
0291fbf65c [CI/Build] Fix amd model executor test (#27612)
Signed-off-by: zhewenli <zhewenli@meta.com>
2025-10-28 08:58:11 +00:00
Harry Mellor
8f18feb191 Remove last level references not removed in #26355 (#27260)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-22 09:18:17 +00:00
Nick Hill
647214f3d5 [V0 Deprecation] Remove V0 executors (#27142)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-10-21 11:09:37 -07:00
iAmir97
7a6c8c3fa1 [Chore] Separate out vllm.utils.network_utils (#27164)
Signed-off-by: iAmir97 <Amir.balwel@embeddedllm.com>
Co-authored-by: iAmir97 <Amir.balwel@embeddedllm.com>
2025-10-19 03:06:32 -07:00
Cyrus Leung
4d4d6bad19 [Chore] Separate out vllm.utils.importlib (#27022)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-17 00:48:59 +00:00
Morrison Turnansky
e3fdb627d9 [FrontEnd] UNREVERT CompilationConfig overhaul (#20283): deprecate use_inductor in favor of backend, simplify custom_ops (#26502)
Signed-off-by: morrison-turnansky <mturnans@redhat.com>
Signed-off-by: Morrison Turnansky <mturnans@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
2025-10-13 22:47:16 +00:00