Vadim Gimpelson
|
05d96d7991
|
merge
Signed-off-by: khluu <khluu000@gmail.com>
|
2026-03-26 01:25:41 -07:00 |
|
Roy Wang
|
faa80947f5
|
[Performance] Add --enable-ep-weight-filter CLI option (#37351)
Signed-off-by: esmeetu <jasonailu87@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
(cherry picked from commit 761e0aa7a0)
|
2026-03-18 01:41:25 -07:00 |
|
Matthew Bonanni
|
93f3c8e531
|
[Misc] Add float16 to CacheDType (#37199)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-03-16 13:24:48 -07:00 |
|
Yuanheng Zhao
|
8d8855fdae
|
[Bugfix] Add safety check and fallback for null scaling factor (#36106)
Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-16 14:27:29 +00:00 |
|
Artem Perevedentsev
|
f5e59ee7a6
|
[Performance] Add prefetch for checkpoints to OS page cache (#36012)
Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>
|
2026-03-16 11:32:02 +00:00 |
|
leo-cf-tian
|
2754231ba3
|
[Kernel] Add FlashInfer MoE A2A Kernel (#36022)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Signed-off-by: Leo Tian <lctian@nvidia.com>
Co-authored-by: wzhao18 <wzhao18.sz@gmail.com>
Co-authored-by: Stefano Castagnetta <scastagnetta@nvidia.com>
Co-authored-by: root <root@lyris0267.lyris.clusters.nvidia.com>
|
2026-03-15 23:45:32 -07:00 |
|
Hari
|
a3e2e250f0
|
[Feature] Add Azure Blob Storage support for RunAI Model Streamer (#34614)
Signed-off-by: hasethuraman <hsethuraman@microsoft.com>
|
2026-03-15 19:38:21 +08:00 |
|
arlo
|
8c29042bb9
|
[Feature] Add InstantTensor weight loader (#36139)
|
2026-03-14 18:05:23 +01:00 |
|
Matthew Bonanni
|
9efc4db965
|
[Bugfix] Fix DeepSeek-V3.2 tokenizer stripping spaces (#37004)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-03-13 22:55:36 +00:00 |
|
Mark McLoughlin
|
7afe0faab1
|
[Frontend][Core] Re-add shutdown timeout - allowing in-flight requests to finish (#36666)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
|
2026-03-13 12:10:06 -07:00 |
|
Harry Mellor
|
5a3f1eb62f
|
[Misc] Set default kv_buffer_device in a better way (#36862)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-13 19:07:33 +00:00 |
|
Itay Alroy
|
d5af196c18
|
[2/N] Elastic EP Milestone 2: Integrating NIXL-EP (#35627)
Signed-off-by: Itay Alroy <ialroy@nvidia.com>
Co-authored-by: Yongji Wu <wuyongji317@gmail.com>
Co-authored-by: Ron Tourgeman <rtourgeman@nvidia.com>
|
2026-03-13 09:25:33 -04:00 |
|
Nick Hill
|
cd32d6f586
|
[Model Runner V2] Some code simplification (#36929)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-03-13 00:59:23 +00:00 |
|
Matthew Bonanni
|
f444c05c32
|
[Attention] Use FA4 for MLA prefill (#34732)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-03-12 12:10:17 -04:00 |
|
Giancarlo Delfin
|
c77181e534
|
[Model Runner V2] Add probabilistic rejection sampling for spec decoding (#35461)
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
|
2026-03-11 14:04:32 -07:00 |
|
汪志鹏
|
ff1e3d9c63
|
[BugFix]: add bagel to MM_PREFIX_LM_MODELS (#36316)
Signed-off-by: princepride <wangzhipeng628@gmail.com>
|
2026-03-11 19:55:59 +00:00 |
|
Cyrus Leung
|
196802dfa6
|
[Misc] Clean up renderers (#36770)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-11 16:39:29 +00:00 |
|
Jhao-Ting Chen
|
5573894737
|
Kimi k2.5 MLA based eagle3 (#36361)
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
Co-authored-by: Izzy Putterman <iputterman@nvidia.com>
|
2026-03-11 11:36:11 -04:00 |
|
Michael Goin
|
9c34e9d24f
|
Disable cascade attention by default (#36318)
|
2026-03-11 03:12:23 -07:00 |
|
liuzhenwei
|
f22d6e0267
|
[Hardware][NIXL] set default kv buffer type for different platform (#36438)
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-11 05:19:28 +00:00 |
|
wang.yuqi
|
a3189a08b0
|
[Model] Consolidate score logic by introduce score_type (#36479)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-03-10 13:32:25 +00:00 |
|
Mark McLoughlin
|
234860399b
|
[Frontend][Core] Revert "Add shutdown timeout" (#34730 and #36270) (#36628)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2026-03-10 06:20:41 -07:00 |
|
Zhuohan Li
|
04b67d8f62
|
Remove unused disable_fallback field (#36546)
|
2026-03-09 20:56:54 -07:00 |
|
Lucas Wilkinson
|
483463f735
|
[MRV2] Extensible CG dispatch rework (#35959)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-03-09 13:58:45 -07:00 |
|
Copilot
|
4b87ffbefb
|
[torch.compile] Rename compile_ranges_split_points to compile_ranges_endpoints (#36027)
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-03-09 18:04:40 +00:00 |
|
Matthew Bonanni
|
77a73458e3
|
Reapply [Attention] Refactor check_and_update_config (#35122)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-03-09 07:17:14 -07:00 |
|
Tushar Shetty
|
c4d859c274
|
[Bugfix] Skip out-of-stage layers in get_layers_from_vllm_config for pipeline parallel (#36243)
Signed-off-by: Tushar Shetty <tushar.shetty@abbyy.com>
Signed-off-by: Tushar Shetty <54362365+tusharshetty61@users.noreply.github.com>
|
2026-03-08 20:40:16 -07:00 |
|
PatchyTIS
|
a6be75dbd2
|
[Core] NGram GPU Implementation compatible with Async Scheduler (#29184)
|
2026-03-07 13:51:37 -08:00 |
|
lif
|
00b814ba5a
|
[V0 Deprecation] Remove unused swap_space parameter (#36216)
Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: mcelrath
|
2026-03-07 22:09:55 +08:00 |
|
Copilot
|
ce8546a12b
|
[docs][torch.compile] Add fusions.md — kernel/operator fusion reference page (#35538)
Signed-off-by: ProExpertProg <luka.govedic@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com>
Co-authored-by: ProExpertProg <luka.govedic@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-03-06 23:55:06 +00:00 |
|
Mark McLoughlin
|
27066d1b2b
|
[Frontend][Core] Add shutdown timeout - allowing in-flight requests to finish (#34730)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
|
2026-03-05 22:04:31 -08:00 |
|
Shiyan Deng
|
03a49bb8f0
|
[Feature] Add --distributed-timeout-seconds CLI option (#36047)
Signed-off-by: Shiyan Deng <dsy842974287@meta.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
|
2026-03-05 20:57:51 -08:00 |
|
Yanhong Li
|
a911f4dd20
|
[Model] Add support for OLMo Hybrid (#32550)
|
2026-03-05 14:51:06 -05:00 |
|
Jiayi Yan
|
6a895197fa
|
[Bugfix][CI] fix typos (#34934)
Signed-off-by: 1195343015 <1195343015@qq.com>
Signed-off-by: Jiayi Yan <66017932+1195343015@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-05 17:05:46 +00:00 |
|
Cyrus Leung
|
7196348157
|
[Bugfix] Fix Qwen-VL tokenizer implementation (#36140)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-05 08:07:19 -08:00 |
|
Seiji Eicher
|
e2b31243c0
|
[Docs] Update CacheConfig block_size docstring to remove inaccurate limit when using CUDA (#35632)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
|
2026-03-05 06:24:08 +00:00 |
|
Martin Hickey
|
c3598d02fa
|
[Misc] Remove deprecated items that are due for removal (#36006)
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
|
2026-03-05 06:14:50 +00:00 |
|
Harry Mellor
|
17dc9c7fc9
|
[CI] Bump mypy version (#34950)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-04 20:55:11 +00:00 |
|
fenypatel99
|
7eca859110
|
Add PyTorch profiler schedule support with warmup/active iterations (#35240)
|
2026-03-04 12:53:38 -08:00 |
|
Nicolò Lucchesi
|
18e01a0a10
|
[Misc] Add --attention-backend auto option (#35738)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-03-04 15:12:27 +00:00 |
|
sungsoo ha
|
6cb901093f
|
[Core] Add All-to-All communication backend for DCP (#34883)
Signed-off-by: Sungsoo Ha <sungsooh@nvidia.com>
Signed-off-by: sungsoo ha <hasungsoo@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-04 10:01:57 -05:00 |
|
haosdent
|
d6e04f4c43
|
[Bugfix] Cap FULL decode cudagraph sizes for Mamba/hybrid models (#34094) (#34571)
Signed-off-by: haosdent <haosdent@gmail.com>
Co-authored-by: zjy0516 <riverclouds.zhu@qq.com>
|
2026-03-04 11:56:22 +01:00 |
|
Yashwant Bezawada
|
a13d8c03c9
|
[KVConnector] Auto-downgrade to PIECEWISE cudagraph mode for layerwise async ops (#31057)
Signed-off-by: Yashwant Bezawada <yashwant_b@me.com>
|
2026-03-02 15:04:47 -05:00 |
|
Fynn Schmitt-Ulms
|
9433acb8df
|
[Spec Decode] Add hidden states extraction system (#33736)
Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
|
2026-03-02 14:29:09 -05:00 |
|
ElizaWszola
|
d9c7730877
|
[Performance] Extract kv update ops from MLA attention backends (#34627)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Di Wu <dw2761@nyu.edu>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-03-02 10:43:19 -05:00 |
|
wangxiyuan
|
510bc9e1df
|
[Misc] Cleanup useless current_platform import (#35715)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2026-03-02 09:36:54 +00:00 |
|
Lucas Wilkinson
|
8b5014d3dd
|
[Attention] FA4 integration (#32974)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2026-03-01 23:44:57 +00:00 |
|
Richard Zou
|
e82fbeec7b
|
[torch.compile] Undo the fast_moe_cold_start hack in torch>=2.11 (#35475)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2026-03-01 21:44:22 +00:00 |
|
Ilya Markov
|
b2d8b422b2
|
[EPLB] Enforce sync eplb for NCCL-based all2all backend (#35212)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
|
2026-02-28 05:47:12 +00:00 |
|
Itay Alroy
|
dea268336f
|
[1/N] Elastic EP Milestone 2 (#34861)
Signed-off-by: Yongji Wu <wuyongji317@gmail.com>
Signed-off-by: Itay Alroy <ialroy@nvidia.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Signed-off-by: Ron Tourgeman <rtourgeman@nvidia.com>
Co-authored-by: Yongji Wu <wuyongji317@gmail.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Ron Tourgeman <rtourgeman@nvidia.com>
|
2026-02-28 04:46:42 +00:00 |
|