biondizzle/vllm - vllm

Author	SHA1	Message	Date
Vadim Gimpelson	05d96d7991	merge Signed-off-by: khluu <khluu000@gmail.com>	2026-03-26 01:25:41 -07:00
Roy Wang	faa80947f5	[Performance] Add --enable-ep-weight-filter CLI option (#37351 ) Signed-off-by: esmeetu <jasonailu87@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> (cherry picked from commit `761e0aa7a0`)	2026-03-18 01:41:25 -07:00
Matthew Bonanni	93f3c8e531	[Misc] Add `float16` to `CacheDType` (#37199 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-03-16 13:24:48 -07:00
Yuanheng Zhao	8d8855fdae	[Bugfix] Add safety check and fallback for null scaling factor (#36106 ) Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-16 14:27:29 +00:00
Artem Perevedentsev	f5e59ee7a6	[Performance] Add prefetch for checkpoints to OS page cache (#36012 ) Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>	2026-03-16 11:32:02 +00:00
leo-cf-tian	2754231ba3	[Kernel] Add FlashInfer MoE A2A Kernel (#36022 ) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com> Signed-off-by: Leo Tian <lctian@nvidia.com> Co-authored-by: wzhao18 <wzhao18.sz@gmail.com> Co-authored-by: Stefano Castagnetta <scastagnetta@nvidia.com> Co-authored-by: root <root@lyris0267.lyris.clusters.nvidia.com>	2026-03-15 23:45:32 -07:00
Hari	a3e2e250f0	[Feature] Add Azure Blob Storage support for RunAI Model Streamer (#34614 ) Signed-off-by: hasethuraman <hsethuraman@microsoft.com>	2026-03-15 19:38:21 +08:00
arlo	8c29042bb9	[Feature] Add InstantTensor weight loader (#36139 )	2026-03-14 18:05:23 +01:00
Matthew Bonanni	9efc4db965	[Bugfix] Fix DeepSeek-V3.2 tokenizer stripping spaces (#37004 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-03-13 22:55:36 +00:00
Mark McLoughlin	7afe0faab1	[Frontend][Core] Re-add shutdown timeout - allowing in-flight requests to finish (#36666 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> Co-authored-by: Nick Hill <nickhill123@gmail.com>	2026-03-13 12:10:06 -07:00
Harry Mellor	5a3f1eb62f	[Misc] Set default `kv_buffer_device` in a better way (#36862 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-13 19:07:33 +00:00
Itay Alroy	d5af196c18	[2/N] Elastic EP Milestone 2: Integrating NIXL-EP (#35627 ) Signed-off-by: Itay Alroy <ialroy@nvidia.com> Co-authored-by: Yongji Wu <wuyongji317@gmail.com> Co-authored-by: Ron Tourgeman <rtourgeman@nvidia.com>	2026-03-13 09:25:33 -04:00
Nick Hill	cd32d6f586	[Model Runner V2] Some code simplification (#36929 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-03-13 00:59:23 +00:00
Matthew Bonanni	f444c05c32	[Attention] Use FA4 for MLA prefill (#34732 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-03-12 12:10:17 -04:00
Giancarlo Delfin	c77181e534	[Model Runner V2] Add probabilistic rejection sampling for spec decoding (#35461 ) Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>	2026-03-11 14:04:32 -07:00
汪志鹏	ff1e3d9c63	[BugFix]: add bagel to MM_PREFIX_LM_MODELS (#36316 ) Signed-off-by: princepride <wangzhipeng628@gmail.com>	2026-03-11 19:55:59 +00:00
Cyrus Leung	196802dfa6	[Misc] Clean up renderers (#36770 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-11 16:39:29 +00:00
Jhao-Ting Chen	5573894737	Kimi k2.5 MLA based eagle3 (#36361 ) Signed-off-by: Izzy Putterman <iputterman@nvidia.com> Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com> Co-authored-by: Izzy Putterman <iputterman@nvidia.com>	2026-03-11 11:36:11 -04:00
Michael Goin	9c34e9d24f	Disable cascade attention by default (#36318 )	2026-03-11 03:12:23 -07:00
liuzhenwei	f22d6e0267	[Hardware][NIXL] set default kv buffer type for different platform (#36438 ) Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-11 05:19:28 +00:00
wang.yuqi	a3189a08b0	[Model] Consolidate score logic by introduce score_type (#36479 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-03-10 13:32:25 +00:00
Mark McLoughlin	234860399b	[Frontend][Core] Revert "Add shutdown timeout" (#34730 and #36270 ) (#36628 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2026-03-10 06:20:41 -07:00
Zhuohan Li	04b67d8f62	Remove unused disable_fallback field (#36546 )	2026-03-09 20:56:54 -07:00
Lucas Wilkinson	483463f735	[MRV2] Extensible CG dispatch rework (#35959 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-03-09 13:58:45 -07:00
Copilot	4b87ffbefb	[torch.compile] Rename `compile_ranges_split_points` to `compile_ranges_endpoints` (#36027 ) Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-03-09 18:04:40 +00:00
Matthew Bonanni	77a73458e3	Reapply [Attention] Refactor `check_and_update_config` (#35122 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-03-09 07:17:14 -07:00
Tushar Shetty	c4d859c274	[Bugfix] Skip out-of-stage layers in get_layers_from_vllm_config for pipeline parallel (#36243 ) Signed-off-by: Tushar Shetty <tushar.shetty@abbyy.com> Signed-off-by: Tushar Shetty <54362365+tusharshetty61@users.noreply.github.com>	2026-03-08 20:40:16 -07:00
PatchyTIS	a6be75dbd2	[Core] NGram GPU Implementation compatible with Async Scheduler (#29184 )	2026-03-07 13:51:37 -08:00
lif	00b814ba5a	[V0 Deprecation] Remove unused swap_space parameter (#36216 ) Signed-off-by: majiayu000 <1835304752@qq.com> Co-authored-by: mcelrath	2026-03-07 22:09:55 +08:00
Copilot	ce8546a12b	[docs][torch.compile] Add fusions.md — kernel/operator fusion reference page (#35538 ) Signed-off-by: ProExpertProg <luka.govedic@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com> Co-authored-by: ProExpertProg <luka.govedic@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-03-06 23:55:06 +00:00
Mark McLoughlin	27066d1b2b	[Frontend][Core] Add shutdown timeout - allowing in-flight requests to finish (#34730 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-03-05 22:04:31 -08:00
Shiyan Deng	03a49bb8f0	[Feature] Add --distributed-timeout-seconds CLI option (#36047 ) Signed-off-by: Shiyan Deng <dsy842974287@meta.com> Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>	2026-03-05 20:57:51 -08:00
Yanhong Li	a911f4dd20	[Model] Add support for OLMo Hybrid (#32550 )	2026-03-05 14:51:06 -05:00
Jiayi Yan	6a895197fa	[Bugfix][CI] fix typos (#34934 ) Signed-off-by: 1195343015 <1195343015@qq.com> Signed-off-by: Jiayi Yan <66017932+1195343015@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-05 17:05:46 +00:00
Cyrus Leung	7196348157	[Bugfix] Fix Qwen-VL tokenizer implementation (#36140 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-05 08:07:19 -08:00
Seiji Eicher	e2b31243c0	[Docs] Update `CacheConfig` block_size docstring to remove inaccurate limit when using CUDA (#35632 ) Signed-off-by: Seiji Eicher <seiji@anyscale.com>	2026-03-05 06:24:08 +00:00
Martin Hickey	c3598d02fa	[Misc] Remove deprecated items that are due for removal (#36006 ) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>	2026-03-05 06:14:50 +00:00
Harry Mellor	17dc9c7fc9	[CI] Bump `mypy` version (#34950 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-04 20:55:11 +00:00
fenypatel99	7eca859110	Add PyTorch profiler schedule support with warmup/active iterations (#35240 )	2026-03-04 12:53:38 -08:00
Nicolò Lucchesi	18e01a0a10	[Misc] Add `--attention-backend auto` option (#35738 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-03-04 15:12:27 +00:00
sungsoo ha	6cb901093f	[Core] Add All-to-All communication backend for DCP (#34883 ) Signed-off-by: Sungsoo Ha <sungsooh@nvidia.com> Signed-off-by: sungsoo ha <hasungsoo@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-04 10:01:57 -05:00
haosdent	d6e04f4c43	[Bugfix] Cap FULL decode cudagraph sizes for Mamba/hybrid models (#34094 ) (#34571 ) Signed-off-by: haosdent <haosdent@gmail.com> Co-authored-by: zjy0516 <riverclouds.zhu@qq.com>	2026-03-04 11:56:22 +01:00
Yashwant Bezawada	a13d8c03c9	[KVConnector] Auto-downgrade to PIECEWISE cudagraph mode for layerwise async ops (#31057 ) Signed-off-by: Yashwant Bezawada <yashwant_b@me.com>	2026-03-02 15:04:47 -05:00
Fynn Schmitt-Ulms	9433acb8df	[Spec Decode] Add hidden states extraction system (#33736 ) Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>	2026-03-02 14:29:09 -05:00
ElizaWszola	d9c7730877	[Performance] Extract kv update ops from MLA attention backends (#34627 ) Signed-off-by: ElizaWszola <ewszola@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Di Wu <dw2761@nyu.edu> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-03-02 10:43:19 -05:00
wangxiyuan	510bc9e1df	[Misc] Cleanup useless `current_platform` import (#35715 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2026-03-02 09:36:54 +00:00
Lucas Wilkinson	8b5014d3dd	[Attention] FA4 integration (#32974 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2026-03-01 23:44:57 +00:00
Richard Zou	e82fbeec7b	[torch.compile] Undo the fast_moe_cold_start hack in torch>=2.11 (#35475 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-03-01 21:44:22 +00:00
Ilya Markov	b2d8b422b2	[EPLB] Enforce sync eplb for NCCL-based all2all backend (#35212 ) Signed-off-by: ilmarkov <markovilya197@gmail.com>	2026-02-28 05:47:12 +00:00
Itay Alroy	dea268336f	[1/N] Elastic EP Milestone 2 (#34861 ) Signed-off-by: Yongji Wu <wuyongji317@gmail.com> Signed-off-by: Itay Alroy <ialroy@nvidia.com> Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Signed-off-by: Ron Tourgeman <rtourgeman@nvidia.com> Co-authored-by: Yongji Wu <wuyongji317@gmail.com> Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Ron Tourgeman <rtourgeman@nvidia.com>	2026-02-28 04:46:42 +00:00


580 Commits