biondizzle/vllm - vllm - Gitea

Author	SHA1	Message	Date
Jee Jee Li	1eaff27815	[V0 deprecation] Remove long context LoRA (#21169) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-07-19 02:15:41 -07:00
Woosuk Kwon	dd572c0ab3	[V0 Deprecation] Remove V0 Spec Decode workers (#21152) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-07-18 21:47:50 -07:00
Lucia Fang	9a9fda1423	[Core] Support Local Chunked Attention for Hybrid KV Cache (#19351) Signed-off-by: Lucia Fang <fanglu@fb.com> Signed-off-by: Lu Fang <fanglu@meta.com> Signed-off-by: Lu Fang <fanglu@fb.com> Co-authored-by: Lu Fang <fanglu@meta.com>	2025-07-18 20:48:38 -07:00
Rui Qiao	217937221b	Elastic Expert Parallel Initial Support (#20775) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2025-07-18 17:46:09 -07:00
wang.yuqi	ca4eb82bcb	[Model] Re-add the implicit conversion feature for as_seq_cls_model (#21103) Signed-off-by: wang.yuqi <noooop@126.com>	2025-07-18 07:15:07 +00:00
Woosuk Kwon	4de7146351	[V0 deprecation] Remove V0 HPU backend (#21131) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-07-17 16:37:36 -07:00
Nir David	01513a334a	Support FP8 Quantization and Inference Run on Intel Gaudi (HPU) using INC (Intel Neural Compressor) (#12010) Signed-off-by: Nir David <ndavid@habana.ai> Signed-off-by: Uri Livne <ulivne@habana.ai> Co-authored-by: Uri Livne <ulivne@habana.ai>	2025-07-16 15:33:41 -04:00
Harry Mellor	313ae8c16a	[Deprecation] Remove everything scheduled for removal in v0.10.0 (#20979) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-15 15:57:53 +00:00
Harry Mellor	56fe4bedd6	[Deprecation] Remove `TokenizerPoolConfig` (#20968) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-15 14:00:50 +00:00
Thomas Parnell	3534c39a20	[V1] [Hybrid] Refactor mamba state shape calculation; enable V1 via cli (#20840) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-07-15 04:04:35 -07:00
Woosuk Kwon	d4d309409f	Implement Async Scheduling (#19970) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-07-14 23:01:46 -07:00
Tyler Michael Smith	559756214b	Change default model to Qwen3-0.6B (#20335) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-07-14 16:54:52 +00:00
Maroon Ayoub	66f6fbd393	[Prefix Cache] Add reproducible prefix-cache block hashing using SHA-256 + CBOR (64bit) (#20511) Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>	2025-07-14 02:45:31 +00:00
22quinn	8632e831ba	[Core] Add `update_config` RPC method (#20095) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-07-14 00:49:18 +00:00
Wang Siyuan	247102f07f	[Bugfix] Fix: add patch_rope_scaling after hf override (#20857) Signed-off-by: Wang Siyuan <wsy0227@sjtu.edu.cn> Signed-off-by: Wang Siyuan <sywang0227@gmail.com>	2025-07-13 00:13:25 -07:00
Nicolò Lucchesi	020f58abcd	[Core] Support multiple tasks per model (#20771) Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-12 19:40:11 -07:00
Nicolò Lucchesi	3c7d942da8	[Frontend] Abstract prompt and SpeechToTextConfig for transcriptions models (#20637) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-07-11 21:33:26 -07:00
Ilya Markov	fc0f41d10a	Integration SM100 FlashInfer fused allreduce RMSNorm (#20691) Signed-off-by: ilmarkov <imarkov@redhat.com> Co-authored-by: ilmarkov <imarkov@redhat.com>	2025-07-11 18:58:15 -07:00
Luka Govedič	762be26a8e	[Bugfix] Upgrade depyf to 0.19 and streamline custom pass logging (#20777) Signed-off-by: Luka Govedic <lgovedic@redhat.com> Signed-off-by: luka <lgovedic@redhat.com>	2025-07-11 00:15:22 -07:00
nopperl	5d09152ff1	[V1] Enable Mamba2 layers other than MambaMixer2 in the v1 engine (#20660) Signed-off-by: nopperl <54780682+nopperl@users.noreply.github.com>	2025-07-11 05:53:31 +00:00
bigmoyan	0cf893cae1	Add kimi-k2 tool parser (#20789) Signed-off-by: wangzhengtao <wangzhengtao@moonshot.cn> Co-authored-by: wangzhengtao <wangzhengtao@moonshot.cn> Co-authored-by: wangzhengtao <wangzhengtao@msh.team>	2025-07-11 10:36:23 +08:00
Alex Brooks	41060c6e08	[Core] Add Support for Default Modality Specific LoRAs [generate / chat completions] (#19126) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2025-07-10 21:09:37 +01:00
Nathan Hoos	d6902ce79f	[V0][V1][Core] Add outlines integration for V1, and update V0 integration. (#15975) Signed-off-by: Nathan Hoos <thwackyy.y@gmail.com>	2025-07-10 15:30:26 -04:00
Nick Hill	ffbcc9e757	[BugFix] Fix `VllmConfig()` construction on all platforms (#20695) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-07-10 07:00:20 +00:00
zhrrr	34dad19e7b	[Bugfix] set default set cuda_graph_sizes to min(self.max_num_seqs * 2, 512) (#20628) Signed-off-by: izhuhaoran <izhuhaoran@qq.com>	2025-07-09 11:02:51 +08:00
Sanger Steel	72d14d0eed	[Frontend] [Core] Integrate Tensorizer in to S3 loading machinery, allow passing arbitrary arguments during save/load (#19619) Signed-off-by: Sanger Steel <sangersteel@gmail.com> Co-authored-by: Eta <esyra@coreweave.com>	2025-07-07 22:47:43 -07:00
Anton	e601efcb10	[Misc] Add fully interleaved support for multimodal 'string' content format (#14047) Signed-off-by: drobyshev.anton <drobyshev.anton@wb.ru> Co-authored-by: drobyshev.anton <drobyshev.anton@wb.ru>	2025-07-07 19:43:08 +00:00
wang.yuqi	110df74332	[Model][Last/4] Automatic conversion of CrossEncoding model (#19675) Signed-off-by: wang.yuqi <noooop@126.com>	2025-07-07 14:46:04 +00:00
Cyrus Leung	9fb52e523a	[V1] Support any head size for FlexAttention backend (#20467) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-06 09:54:36 -07:00
wang.yuqi	2e26f9156a	[Model][3/N] Automatic conversion of CrossEncoding model (#20168) Signed-off-by: wang.yuqi <noooop@126.com>	2025-07-04 05:47:39 -07:00
wang.yuqi	6f1229f91d	[Model][2/N] Automatic conversion of CrossEncoding model (#19978) Signed-off-by: wang.yuqi <noooop@126.com>	2025-07-03 13:59:23 +00:00
Jee Jee Li	1819fbda63	[Quantization] Bump to use latest bitsandbytes (#20424) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-07-03 21:58:46 +08:00
Cyrus Leung	b024a42e93	[Core] Move multimodal placeholder from chat utils to model definition (#20355) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-03 08:18:30 +00:00
Nick Hill	657f2f301a	[DP] Support external DP Load Balancer mode (#19790) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-07-02 10:21:52 -07:00
WangHuaqiang	ccbfb1d1c9	[Bugfix] Fix the max_seq_len limit of 16384 for DeepSeek models (#20322) Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>	2025-07-02 12:53:36 +00:00
Chenheli Hua	2e7cbf2d7d	[Frontend] Support configurable mm placeholder strings & flexible video sampling policies via CLI flags. (#20105) Signed-off-by: Chenheli Hua <huachenheli@outlook.com>	2025-07-01 23:34:03 -07:00
Lionel Villard	c05596f1a3	[Perf] Validate @config in pre-commit instead of dynamically (#20200) Signed-off-by: Lionel Villard <villard@us.ibm.com>	2025-07-01 05:10:28 -04:00
Luka Govedič	6d42ce8315	[CLI] Improve CLI arg parsing for `-O`/`--compilation-config` (#20156) Signed-off-by: luka <luka@neuralmagic.com>	2025-07-01 01:03:13 +00:00
Jiayi Yan	7b460c25f9	[BugFix] Fix the incorrect func name in the comments. (config.py) (#20185)	2025-06-27 22:51:16 -07:00
Luka Govedič	aafabaa0d5	[Fix][torch.compile] Enable custom ops by default when Inductor off (#20102) Signed-off-by: luka <luka@neuralmagic.com>	2025-06-27 09:00:42 -06:00
Michael Goin	4ab3ac285e	[Bugfix] Fix flaky failure when getting DP ports (#20151) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-06-27 15:30:53 +08:00
wang.yuqi	cd4cfee689	[Model][1/N] Automatic conversion of CrossEncoding model (#20012) Signed-off-by: wang.yuqi <noooop@126.com>	2025-06-26 21:10:04 -07:00
Bowen Wang	e9fd658a73	[Feature] Expert Parallelism Load Balancer (EPLB) (#18343) Signed-off-by: Bowen Wang <abmfy@icloud.com>	2025-06-26 15:30:21 -07:00
Michael Goin	1f5d178e9c	Revert "[Bugfix] default set cuda_graph_sizes to max_num_seqs for v1 engine" (#20128)	2025-06-26 07:32:22 -07:00
zhrrr	9f0608fc16	[Bugfix] default set cuda_graph_sizes to max_num_seqs for v1 engine (#20062) Signed-off-by: izhuhaoran <izhuhaoran@qq.com>	2025-06-25 21:03:17 +00:00
Aaron Pham	ba7ba35cda	[Chore] debloat some initial logs (#19438) Signed-off-by: Aaron Pham <contact@aarnphm.xyz>	2025-06-25 06:36:22 +00:00
David Xia	7108934142	[Frontend] speed up import time of vllm.config (#18036) Signed-off-by: David Xia <david@davidxia.com>	2025-06-25 00:41:11 -04:00
cascade	e6327c9b3e	[Feature] Support sequence parallelism for static fp8 quantization (#19181) Signed-off-by: cascade812 <cascade812@outlook.com>	2025-06-23 16:09:02 -04:00
Aaron Pham	c4cf260677	[Perf][CLI] Improve overall startup time (#19941)	2025-06-22 23:11:22 +00:00
Aaron Pham	e91386cde1	[Chore] dedup logs (#19955)	2025-06-22 19:43:07 +00:00

671 Commits