biondizzle/vllm - vllm

Author	SHA1	Message	Date
Harry Mellor	a8b70304d6	Update `rope_scaling` to `rope_parameters` in preparation for Transformers v5 (#28542 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-19 09:06:36 -08:00
vnadathur	1ffe934c8a	[torch.compile] caching of config fields should be opt-out by default (#26468 ) Signed-off-by: vnadathur <glvikramn@gmail.com> Signed-off-by: WorldExplored <srreyansh.sethi@gmail.com> Signed-off-by: Srreyansh Sethi <srreyansh.sethi@gmail.com> Signed-off-by: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com> Co-authored-by: WorldExplored <srreyansh.sethi@gmail.com> Co-authored-by: Srreyansh Sethi <107075589+worldexplored@users.noreply.github.com> Co-authored-by: vnadathur <236933696+vnadathur@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-11-19 06:13:54 -08:00
Tova Movshovitz	ba558c029a	[config] Expose `get_total_num_hidden_layers()` in ModelConfig (#28961 ) Signed-off-by: tovam <tovam@pliops.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-19 11:37:11 +00:00
Li, Jiang	20852c8f4c	[CPU] Refactor CPU WNA16 (#28826 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-11-19 10:32:00 +08:00
Luciano Martins	c2612371ad	[Model] Add Gemma3 GGUF multimodal support (#27772 ) Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-11-18 08:56:29 -08:00
Ronald	d8874c61a5	[Core] Async Scheduling X Spec Decoding Compatibility (#24799 ) Signed-off-by: Ronald1995 <ronaldautomobile@163.com> Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com> Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>	2025-11-17 12:16:20 -08:00
Lucas Wilkinson	64e39d667c	[BugFix] Temporary fix for IMA with MTP = 2 and full-cg (#28315 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-11-17 09:41:22 -05:00
Lucia Fang	b316ac6589	[V1] Support MP Executor for multi node distributed inference (#23691 ) Signed-off-by: Lu Fang <fanglu@fb.com> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Signed-off-by: Lucia Fang <fanglu@fb.com> Signed-off-by: Lucia Fang <116399278+luccafong@users.noreply.github.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-11-16 09:01:21 +00:00
Angela Yi	f36292dbee	[compile] Enable sequence parallelism matching w/o custom ops enabled (#27126 ) Signed-off-by: angelayi <yiangela7@gmail.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Signed-off-by: ProExpertProg <lgovedic@redhat.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <luka.govedic@gmail.com>	2025-11-15 11:46:12 +00:00
Vadim Gimpelson	173b356abf	[PERF] Remove TRTLLM Gen attn kernel limitation `max_seq_len <=131072` (#28755 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2025-11-15 15:43:41 +05:30
Cyrus Leung	638e4196d1	[Misc] Make `SchedulerConfig.max_model_len` init-only (#28733 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-15 01:59:31 -08:00
Cyrus Leung	e2741f6cbc	[Chore] Rename `SchedulerConfig.chunked_prefill_enabled` (#28735 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-14 18:39:57 +00:00
Harry Mellor	5f3cd7f7f2	[Docs] Update the name of `Transformers backend` -> `Transformers modeling backend` (#28725 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-14 16:34:14 +00:00
Cyrus Leung	511a6b611d	[Config] Clean up SchedulerConfig initialization (#28665 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-14 22:41:02 +08:00
Jingchun Gao	4516d44b7f	[DCP] Support Decode Context Parallel (DCP) for GQA with Flashinfer (#25438 ) Signed-off-by: gaojc <1055866782@qq.com> Signed-off-by: Jingchun Gao <gaojingchun1@huawei.com> Signed-off-by: Jingchun Gao <63247409+gjc0824@users.noreply.github.com> Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com> Co-authored-by: gaojingchun (A) <g00955623@china.huawei.com> Co-authored-by: Jingchun Gao <gaojingchun1@huawei.com> Co-authored-by: QiuChunshuo <qiuchunshuo@huawei.com>	2025-11-14 11:24:10 +00:00
elvischenv	5d6ce2b960	[Perf] Support stream interval for reducing host overhead (#27869 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-11-13 13:21:25 -05:00
Roger Wang	d3387750f1	[Misc] Turn off encoder torch compile by default (#28634 ) Signed-off-by: Roger Wang <hey@rogerw.io>	2025-11-13 08:38:08 -08:00
Harry Mellor	b230286fbc	Fix `get_num_experts` when config sets it explicitly to `None` (#28652 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: bruceszchen <bruceszchen@tencent.com>	2025-11-13 16:02:42 +00:00
Nick Hill	8832fff972	[BugFix] Fix `mm_encoder_attn_backend` arg type checking (#28599 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-13 03:06:03 +00:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	4ca5cd5740	[Core][AMD] Migrate fully transparent sleep mode to ROCm platform (#12695 ) Signed-off-by: Hollow Man <hollowman@opensuse.org> Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: kliuae <kuanfu.liu@embeddedllm.com>	2025-11-12 15:24:12 -08:00
Thomas Parnell	64d57c3be7	[Model] [Config] Correctly identify granite-4.0-micro as non-hybrid model (#28563 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-11-12 18:17:55 +00:00
PerryZhang01	a1e7fa362a	[EPLB][ROCm]: support EPBL for ROCm backend (#27731 ) Signed-off-by: Perry Zhang <perzhang@amd.com> Co-authored-by: Perry Zhang <perzhang@amd.com>	2025-11-12 18:16:35 +00:00
Harry Mellor	a742134cc5	Remove deprecated fields from `CompilationConfig` (#27593 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-12 16:10:28 +00:00
TJian	edb59a9470	[ROCm] [Bugfix] Fix `fused_qknorm_rope_kernel` rocm compatibility (#28500 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-11-12 05:01:14 -08:00
Chenguang Zheng	4ccffe561f	[Core] Encoder separation for Encode-Prefill-Decode Disaggregation (#25233 ) Signed-off-by: n00909098 <nguyen.kha.long@huawei.com> Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com> Signed-off-by: herotai214 <herotai214@gmail.com> Signed-off-by: Khuong Le <khuong.le.manh@huawei.com> Signed-off-by: Khuong Le <lemanhkhuong2611@gmail.com> Co-authored-by: n00909098 <nguyen.kha.long@huawei.com> Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com> Co-authored-by: herotai214 <herotai214@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Khuong Le <khuong.le.manh@huawei.com> Co-authored-by: Khuong Le <lemanhkhuong2611@gmail.com>	2025-11-11 18:58:33 -08:00
Yanan Cao	48c879369f	[Frontend] Change CompilationMode to a proper Enum (#28165 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>	2025-11-11 19:46:18 -05:00
zhrrr	68c09efc37	[Kernel][Perf] fuse QK Norm and RoPE into one cuda kernel for Qwen Model (#27165 ) Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>	2025-11-11 12:00:31 -05:00
Matthew Bonanni	b30dfa03c5	[Attention] Refactor CUDA attention backend selection logic (#24794 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-11-11 07:40:44 -05:00
Ilya Markov	d17ecc6b19	[PERF] Allreduce fusion. Support torch native matching. Tuning of the thresholds (#24248 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Signed-off-by: ilmarkov <markovilya197@gmail.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2025-11-10 18:33:11 -05:00
Robert Shaw	30700b1cd7	[CI] Fix Plugin Tests Tests (#28413 ) Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>	2025-11-10 22:36:11 +00:00
zhangsicheng5	2108a571d7	[DCP] Support dcp kv_cache interleave size > 1 (#26696 ) Signed-off-by: zhangsicheng5 <zhangsicheng5@huawei.com> Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com> Signed-off-by: Qiu <qiuchunshuo@huawei.com> Co-authored-by: QiuChunshuo <qiuchunshuo@huawei.com>	2025-11-09 04:45:27 +09:00
Aurick Qiao	781f5ebf52	Bump arctic-inference requirement (#28174 ) Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-11-07 18:31:18 -08:00
Nick Hill	da786e339e	[Core] Rework handling of async scheduling config (#28250 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-07 20:01:23 +00:00
Harry Mellor	c0a4b95d64	Fix issues from #28242 (#28257 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-07 04:23:17 +00:00
Lucas Kabela	4bf56c79cc	[Multimodal][torch.compile] Add compilation config field for turning off ViT/MM compile (#28242 ) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>	2025-11-07 00:16:03 +00:00
Vadim Gimpelson	b6a248bdd7	[PERF] Decouple projections from GDN custom op. Attempt 2 (#28083 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2025-11-05 17:01:12 -08:00
Walter Beller-Morales	752ddeacaa	[Core] add support for reasoning parser plugins (#28075 ) Signed-off-by: walter beller-morales <walter.beller.morales@gmail.com>	2025-11-06 01:15:06 +08:00
Vadim Gimpelson	d4e547bb7e	Revert "[PERF] Decouple projections from GDN custom op" (#28080 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2025-11-04 15:58:23 -08:00
Harry Mellor	2f1cc8cef1	Remove deprecated `--rope-scaling` and `--rope-theta` (#28006 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-04 18:01:56 +00:00
yt0428	05cae69f0f	[model] Add support for openPangu_Ultra_MoE (#27521 ) Signed-off-by: yuantao <2422264527@qq.com> Signed-off-by: yt0428 <51468697+yt0428@users.noreply.github.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-04 08:17:20 -08:00
Vadim Gimpelson	5fd8f02ea9	[PERF] Decouple projections from GDN custom op (#27512 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2025-11-04 08:11:41 -08:00
Ning Xie	786030721e	[Docs] add runai_streamer_sharded to LoadConfig (#27937 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-11-03 20:35:16 +00:00
Aurick Qiao	2c19d96777	[Spec Decode] Integrate Suffix Decoding from Arctic Inference (#25784 ) Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com>	2025-11-03 09:23:31 -08:00
ahao-anyscale	cac4c10ef0	[BUG] Make 'binary' default option for saving torch compile artifacts when using standalone_compile (#27616 ) Signed-off-by: ahao-anyscale <ahao@anyscale.com>	2025-11-03 11:13:51 -05:00
Asaf Joseph Gardin	00b31a36a2	[V1] [Hybrid] Mamba1 Automatic Prefix Caching (#26377 ) Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com>	2025-11-02 04:16:23 -08:00
Cyrus Leung	99d69af9ec	[Bugfix] Python 3.10 compatibility for `Self` (#27918 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-01 15:28:54 +00:00
Yihua Cheng	e675118849	[Add] cmdline argument parsing for KV cache offloading modules (#27621 ) Signed-off-by: ApostaC <yihua98@uchicago.edu> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-01 07:17:07 +00:00
Vinay R Damodaran	5e8862e9e0	[Feature] Pydantic validation for scheduler.py and structured_outputs.py (#26519 ) Signed-off-by: Vinay Damodaran <vrdn@hey.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-31 18:05:50 +00:00
Nick Hill	9e5bd3076e	[Cleanup] Remove no-longer-used `SpeculativeConfig.enable_chunked_prefill` (#27826 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-10-31 10:57:45 -07:00
Paul Zhang	e7acb20076	[Feature] Batch invariant torch.compile (#27660 ) Signed-off-by: PaulZhang12 <paulzhan@fb.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-10-30 13:11:29 -07:00

276 Commits