Fadi Arafeh
|
17ab54de81
|
[CPU Backend][BugFix] Fix failing Darwin pipelines (#33002)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
|
2026-01-24 17:02:22 +00:00 |
|
7. Sun
|
cd775bdbe0
|
[Tests] Replace flaky sleep with polling in test_background_cancel (#32986)
Signed-off-by: 7. Sun <jhao.sun@gmail.com>
|
2026-01-24 16:39:07 +00:00 |
|
Lucas Wilkinson
|
da5e7b12be
|
[MLA] Fuse cat and quant for fp8 kv-cache (#32950)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-01-24 16:03:02 +00:00 |
|
Louie Tsai
|
719ac592ed
|
Update CPU doc according to feedback (#32963)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
Signed-off-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-01-24 16:02:44 +00:00 |
|
Hiroken.
|
1209b784f2
|
[Bugfix]: resolve torch.compile cache conflict between mm_encoder_tp_modes (#32842)
Signed-off-by: Hongjian Zhang <zhanghongjian@xiaohongshu.com>
Signed-off-by: Xingran Wang <wangxingran123456@outlook.com>
Co-authored-by: Xingran Wang <wangxingran123456@outlook.com>
|
2026-01-24 14:45:14 +00:00 |
|
Lukas Geiger
|
5fa0f6efa9
|
[EncoderCacheManager] Remove unnecessary copy (#32800)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2026-01-24 14:28:57 +00:00 |
|
david guan
|
bc0d291bfe
|
feat: Complete LoRA support for MiniMaxM2 Fixes #32736 (#32763)
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
|
2026-01-24 20:48:46 +08:00 |
|
Isotr0py
|
9ad7f89f55
|
[Models]: Make Multimodal config implicit in ViT implementation (#31972)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-01-24 20:34:26 +08:00 |
|
Hiroken.
|
6450b536a6
|
[Bugfix] Fix E2E latency calculation and add warmup support in mm_processor benchmark (#32646)
Signed-off-by: Hongjian Zhang <zhanghongjian@xiaohongshu.com>
Signed-off-by: Xingran Wang <wangxingran123456@outlook.com>
Signed-off-by: Hiroken. <105287758+HirokenOvo@users.noreply.github.com>
Co-authored-by: Xingran Wang <wangxingran123456@outlook.com>
|
2026-01-24 10:31:41 +00:00 |
|
7. Sun
|
0f19427db5
|
[Perf] Cache exc.errors() result in validation exception handler (#32984)
Signed-off-by: 7. Sun <jhao.sun@gmail.com>
|
2026-01-24 02:01:35 -08:00 |
|
Cyrus Leung
|
51931c5c9a
|
[UX] Deduplicate sampling parameter startup logs (#32953)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-24 17:37:28 +08:00 |
|
Reagan Lee
|
06b557ecd9
|
feat(benchmark): add encoder forward pass benchmarking to mm-processor (#31655)
Signed-off-by: Reagan <reaganjlee@gmail.com>
Signed-off-by: Reagan Lee <96998476+reaganjlee@users.noreply.github.com>
Co-authored-by: Hiroken. <105287758+HirokenOvo@users.noreply.github.com>
|
2026-01-24 08:24:44 +00:00 |
|
Roger Wang
|
81c2a889ce
|
[Doc] Ignore typo check on doc (#32999)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2026-01-23 23:52:22 -08:00 |
|
Isotr0py
|
8edaf38570
|
[Models] Add SharedFusedMoE support to Qwen3MoE (#32082)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-01-23 23:36:31 -08:00 |
|
Roy Wang
|
5c86a89805
|
[docs] Update governance process links (#32995)
Signed-off-by: esmeetu <jasonailu87@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2026-01-23 23:32:44 -08:00 |
|
7. Sun
|
0ccecf8833
|
[Tests] Standardize RNG seed utility across test files (#32982)
Signed-off-by: 7. Sun <jhao.sun@gmail.com>
|
2026-01-24 06:47:14 +00:00 |
|
7. Sun
|
0b9a735e11
|
[Tests] Clarify pytest skip reasons with actionable context (#32981)
Signed-off-by: 7. Sun <jhao.sun@gmail.com>
|
2026-01-24 06:38:50 +00:00 |
|
7. Sun
|
14d03b8ddb
|
[Perf] Cache xpu_get_mem_info() result to avoid duplicate calls (#32983)
Signed-off-by: 7. Sun <jhao.sun@gmail.com>
|
2026-01-23 20:56:23 -08:00 |
|
Michael Goin
|
d0cbac5827
|
[Dev UX] Add auto-detection for VLLM_PRECOMPILED_WHEEL_VARIANT during install (#32948)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Shengqi Chen <i@harrychen.xyz>
|
2026-01-23 19:15:17 -08:00 |
|
ruizcrp
|
c0d820457a
|
Auth_token added in documentation as it is required (#32988)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2026-01-24 03:03:05 +00:00 |
|
monajafi-amd
|
97ef11dd34
|
[ROCm][ViT] Enable Flash Attention Triton backend on RDNA3/RDNA4 (#32944)
Signed-off-by: mohammad najafi <mohammad.najafi@amd.com>
|
2026-01-24 10:03:07 +08:00 |
|
Xin Yang
|
ecc3dd66cc
|
[Bugfix] Fix FusedMoE LoRA kernel offs_token out of bound value (#32279)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2026-01-24 01:41:35 +00:00 |
|
Joe Runde
|
7e1f10d562
|
[Core][Bugfix] allow graceful worker termination (#32965)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
|
2026-01-23 17:28:45 -08:00 |
|
ElizaWszola
|
a28b94e6ef
|
[Performance] Split FlashAttn attention and cache update (#25954)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Luka Govedič <luka.govedic@gmail.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <luka.govedic@gmail.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
|
2026-01-23 17:28:06 -08:00 |
|
dolpm
|
0118cdcc02
|
[fix] add VLLM_OBJECT_STORAGE_SHM_BUFFER_NAME to compile factors (#32912)
Signed-off-by: dolpm <34420038+dolpm@users.noreply.github.com>
|
2026-01-23 22:53:10 +00:00 |
|
Shengqi Chen
|
136c499f6e
|
[CI] fix version comparison and exclusion patterns in upload-release-wheels.sh (#32971)
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
|
2026-01-23 22:21:49 +00:00 |
|
joninco
|
ebd0a17e0e
|
[Bugfix] Fix missing is_layer_skipped check for FusedMoE in AWQConfig (#32935)
Signed-off-by: jon <joninco@bullpoint.org>
|
2026-01-23 17:19:56 -05:00 |
|
Wentao Ye
|
37c9859fab
|
[Refactor] Clean up unused variables & func (#32692)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-01-23 17:04:25 -05:00 |
|
Michael Goin
|
4561f13985
|
[Refactor] Rename gptq_marlin to marlin to match MoE (#32952)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-01-23 16:48:12 -05:00 |
|
rasmith
|
6cc6d92be5
|
[CI][AMD][BugFix] Update wvSplitK (and other skinny_gemm wrappers) to ensure tensors passed will be made contiguous for the kernel (#32831)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
|
2026-01-23 13:35:48 -08:00 |
|
Wentao Ye
|
dfab5f3764
|
[Bug] Fix benchmark script moe_permute_unpermute (#32949)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-01-23 16:18:56 -05:00 |
|
Markus / Mark
|
586a57ad7e
|
fix: Add glm4_moe_lite to MLA detection (#32614)
Signed-off-by: marksverdhei <marksverdhei@hotmail.com>
Signed-off-by: Markus / Mark <46672778+marksverdhei@users.noreply.github.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2026-01-23 12:38:57 -08:00 |
|
Lucas Wilkinson
|
3a41459501
|
[cudagraphs] Refactor cudagraph capture loop (#32946)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-01-23 13:22:20 -07:00 |
|
Nick Hill
|
8518b30447
|
[Model Runner V2] Add KV Connector support (#32742)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-01-23 10:49:17 -08:00 |
|
Matthew Bonanni
|
2d6b537157
|
[Bugfix][CI] Fix pre-commit (#32956)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-01-23 10:26:56 -08:00 |
|
Orion Reblitz-Richardson
|
68b0a6c1ba
|
[CI][torch nightlies] Use main Dockerfile with flags for nightly torch tests (#30443)
Signed-off-by: Orion Reblitz-Richardson <orionr@meta.com>
Signed-off-by: Orion Reblitz-Richardson <orionr@gmail.com>
Co-authored-by: Kevin H. Luu <khluu000@gmail.com>
|
2026-01-23 10:22:56 -08:00 |
|
Harry Huang
|
5206e5e28c
|
[V1][Hybrid] Mamba Prefix Caching with align mode (#30877)
Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
|
2026-01-23 09:56:48 -08:00 |
|
Matteo Fari
|
fec9da0af4
|
[Model] Enable LoRA support for internvl2 (#32397)
Signed-off-by: Matteo Fari <matteofari06@gmail.com>
|
2026-01-24 01:39:01 +08:00 |
|
Luka Govedič
|
bbbd696af9
|
[torch.compile][CI] Add back attn fusion on hopper/ada (#32940)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
|
2026-01-23 16:49:20 +00:00 |
|
sangbumlikeagod
|
9b77bb790d
|
[Frontend] add logprob, compression_rate to 'verbose_json' features (#31059)
Signed-off-by: sangbumlikeagod <oironese@naver.com>
Signed-off-by: sangbumlikeagod <98077576+sangbumlikeagod@users.noreply.github.com>
|
2026-01-23 16:35:13 +00:00 |
|
Matt
|
305e53ade8
|
[Hardware][AMD][CI][Bugfix] Fix Kernels Attention Cache test (#32904)
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
|
2026-01-23 16:24:26 +00:00 |
|
Mark McLoughlin
|
1cb4341fbc
|
[ROCm][PD] Remove unused moriio connector proxy code (#32939)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2026-01-23 15:59:04 +00:00 |
|
baonudesifeizhai
|
1fb648bf10
|
[Bugfix] Fix FP8 MoE EP Weight Loading for ModelOpt Llama4 (#32886)
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
|
2026-01-23 10:31:48 -05:00 |
|
Nicolò Lucchesi
|
7e22309755
|
[Misc] Postpone torch_profiler deprecation (#32867)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-01-23 14:39:48 +00:00 |
|
Xin Yang
|
90c2007932
|
[Bugfix] Disable tma_aligned_scales in test_fusions_e2e (#32916)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2026-01-23 14:34:30 +00:00 |
|
Raushan Turganbay
|
d95d650762
|
[Bugfix] Fix getting vision features in Transformer Multimodal backend (#32933)
Signed-off-by: raushan <raushan@huggingface.co>
|
2026-01-23 13:34:48 +00:00 |
|
tianshu-Michael-yu
|
13d8746c54
|
[Feature]: Remove DtoH Copy for lfm2_vl On Default Stream (#32815)
Signed-off-by: Tianshu Yu <tianshuyu.formal@gmail.com>
|
2026-01-23 13:20:30 +00:00 |
|
Fadi Arafeh
|
10e94c84f6
|
[CPU][Feat] Update PyTorch to v2.10 for CPU Backend (#32869)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
|
2026-01-23 21:13:06 +08:00 |
|
Isotr0py
|
243e78c20f
|
[Benchmark][Bugfix] Fix race condition when starting server for sweep benchmark (#32927)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-01-23 12:11:18 +00:00 |
|
Fadi Arafeh
|
aac0b817fa
|
[CPU Backend][BugFix] Fix failing CPU MoE test (#32876)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
|
2026-01-23 12:06:51 +00:00 |
|