biondizzle/vllm

Author	SHA1	Message	Date
Woosuk Kwon	3b5567a209	[V1][Minor] Do not print attn backend twice (#13985) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-03-01 07:09:14 +00:00
Isotr0py	fdcc405346	[Doc] Consolidate `whisper` and `florence2` examples (#14050)	2025-02-28 22:49:15 -08:00
Kuntai Du	8994dabc22	[Documentation] Add more deployment guide for Kubernetes deployment (#13841) Signed-off-by: KuntaiDu <kuntai@uchicago.edu> Signed-off-by: Kuntai Du <kuntai@uchicago.edu>	2025-03-01 06:44:24 +00:00
Li, Jiang	02296f420d	[Bugfix][V1][Minor] Fix shutting_down flag checking in V1 MultiprocExecutor (#14053)	2025-02-28 22:31:01 -08:00
YajieWang	6a92ff93e1	[Misc][Kernel]: Add GPTQAllSpark Quantization (#12931)	2025-02-28 22:30:59 -08:00
Jee Jee Li	6a84164add	[Bugfix] Add file lock for ModelScope download (#14060) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-01 06:10:28 +00:00
Brayden Zhong	f64ffa8c25	[Docs] Add `pipeline_parallel_size` to optimization docs (#14059) Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-03-01 05:43:54 +00:00
Luka Govedič	bd56c983d6	[torch.compile] Fix RMSNorm + quant fusion in the non-cutlass-fp8 case, rename RedundantReshapesPass to NoopEliminationPass (#10902) Signed-off-by: luka <luka@neuralmagic.com>	2025-02-28 16:20:11 -07:00
Rui Qiao	084bbac8cc	[core] Bump ray to 2.43 (#13994) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2025-02-28 21:47:44 +00:00
Chen Zhang	28943d36ce	[v1] Move block pool operations to a separate class (#13973) Signed-off-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>	2025-02-28 20:53:31 +00:00
Andrey Talman	b526ca6726	Add RELEASE.md (#13926) Signed-off-by: atalman <atalman@fb.com>	2025-02-28 12:25:50 -08:00
Chen Zhang	e7bd944e08	[v1] Cleanup the BlockTable in InputBatch (#13977) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-02-28 19:03:16 +00:00
iefgnoix	c3b6559a10	[V1][TPU] Integrate the new ragged paged attention kernel with vLLM v1 on TPU (#13379) Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-02-28 11:01:36 -07:00
Harry Mellor	4be4b26cb7	Fix entrypoint tests for embedding models (#14052)	2025-02-28 08:56:44 -08:00
Brayden Zhong	2aed2c9fa7	[Doc] Fix ROCm documentation (#14041) Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-02-28 16:42:07 +00:00
Yang Liu	9b61dd41e7	[Bugfix] Initialize attention bias on the same device as Query/Key/Value for QwenVL Series (#14031)	2025-02-28 07:36:08 -08:00
Cyrus Leung	f7bee5c815	[VLM][Bugfix] Enable specifying prompt target via index (#14038)	2025-02-28 07:35:55 -08:00
Jee Jee Li	e0734387fb	[Bugfix] Fix MoeWNA16Method activation (#14024)	2025-02-28 15:22:42 +00:00
Harry Mellor	f58f8b5c96	Update AutoAWQ docs (#14042) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-02-28 15:20:29 +00:00
Thibault Schueller	b3f7aaccd0	[V1][Minor] Restore V1 compatibility with LLMEngine class (#13090)	2025-02-28 00:52:25 -08:00
Kacper Pietkun	b91660ddb8	[Hardware][Intel-Gaudi] Regional compilation support (#13213)	2025-02-28 00:51:49 -08:00
Harry Mellor	76c89fcadd	Use smaller embedding model when not testing model specifically (#13891)	2025-02-28 00:50:43 -08:00
Mathis Felardos	b9e41734c5	[Bugfix][Disaggregated] patch the inflight batching on the decode node in SimpleConnector to avoid hangs in SimpleBuffer (nccl based) (#13987) Signed-off-by: Mathis Felardos <mathis@mistral.ai>	2025-02-28 07:53:45 +00:00
Cyrus Leung	1088f06242	[Doc] Move multimodal Embedding API example to Online Serving page (#14017) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-02-28 07:12:04 +00:00
Travis Johnson	73e0225ee9	[Bugfix] Check that number of images matches number of <|image|> tokens with mllama (#13911) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>	2025-02-28 04:00:45 +00:00
Roger Wang	6c85da3a18	[V1]`SupportsV0Only` protocol for model definitions (#13959) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-02-27 20:02:15 -05:00
Jee Jee Li	67fc426845	[Misc] Print FusedMoE detail info (#13974)	2025-02-27 18:53:13 -05:00
Benjamin Chislett	9804145cac	[Model][Speculative Decoding] Expand DeepSeek MTP code to support k > n_predict (#13626) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>	2025-02-27 15:28:08 -08:00
Lucas Wilkinson	2e94b9cfbb	[Attention] Flash MLA for V1 (#13867) Signed-off-by: Yang Chen <yangche@fb.com> Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Yang Chen <yangche@fb.com>	2025-02-27 23:03:41 +00:00
qli88	8294773e48	[core] Perf improvement for DSv3 on AMD GPUs (#13718) Signed-off-by: qli88 <qiang.li2@amd.com>	2025-02-27 22:14:30 +00:00
Woosuk Kwon	cd813c6d4d	[V1][Minor] Minor cleanup for GPU Model Runner (#13983) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-02-27 13:11:40 -08:00
Sage Moore	38acae6e97	[ROCm] Fix the Kernels, Core, and Prefix Caching AMD CI groups (#13970) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-02-27 20:31:47 +00:00
Cyrus Leung	a2dd48c386	[VLM] Deprecate legacy input mapper for OOT multimodal models (#13979) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-02-27 19:14:55 +00:00
dependabot[bot]	126f6beeb4	Bump azure/setup-helm from 4.2.0 to 4.3.0 (#13742) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-02-27 19:04:10 +00:00
Yang Chen	58d1b2aa77	[Attention] MLA support for V1 (#13789) Signed-off-by: Yang Chen <yangche@fb.com>	2025-02-27 13:14:17 -05:00
Cyrus Leung	f1579b229d	[VLM] Generalized prompt updates for multi-modal processor (#13964) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-02-27 17:44:25 +00:00
Isotr0py	7864875879	[Bugfix] Fix qwen2.5-vl overflow issue (#13968) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-02-27 17:30:39 +00:00
Noam Gat	1dd422b64a	Update LMFE version to v0.10.11 to support new versions of transforme… (#13930)	2025-02-27 17:16:12 +00:00
Rui Qiao	06c8f8d885	[bugfix] Fix profiling for RayDistributedExecutor (#13945) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2025-02-28 01:01:21 +08:00
Harry Mellor	5677c9bb3e	Deduplicate `.pre-commit-config.yaml`'s `exclude` (#13967) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-02-27 16:27:47 +00:00
王博伟	512d77d582	Update quickstart.md (#13958)	2025-02-27 16:05:11 +00:00
Szymon Ożóg	7f0be2aa24	[Model] Deepseek GGUF support (#13167)	2025-02-27 02:08:35 -08:00
Isotr0py	edf309ebbe	[VLM] Support multimodal inputs for Florence-2 models (#13320)	2025-02-27 02:06:41 -08:00
Michael Goin	788f284b53	Fix test_block_fp8.py test for MoE (#13915) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-02-27 18:00:00 +08:00
Yang Zheng	4b1d141f49	[PP] Correct cache size check (#13873) Signed-off-by: Yang Zheng <zhengy.gator@gmail.com>	2025-02-27 17:47:29 +08:00
Chauncey	10c3b8c1cf	[Misc] fixed 'required' is an invalid argument for positionals (#13948) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-02-27 09:06:49 +00:00
Brayden Zhong	a7f37314b7	[CI/Build] Add examples/ directory to be labelled by `mergify` (#13944) Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-02-27 08:24:11 +00:00
Mark McLoughlin	cd711c48b2	[V1][Metrics] Handle preemptions (#13169)	2025-02-26 20:04:59 -08:00
Sage Moore	378b3ef6f8	[ROCm][V1] Update reshape_and_cache to properly work with CUDA graph padding (#13922)	2025-02-26 20:04:12 -08:00
Rui Qiao	c9944acbf9	[misc] Rename Ray ADAG to Compiled Graph (#13928)	2025-02-26 20:03:28 -08:00

8877 Commits