biondizzle/vllm - vllm

Author	SHA1	Message	Date
Ning Xie	2bb246b8f7	[MISC] add cpu_kvcache_space_bytes to CacheConfig (#19812 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-06-22 13:39:09 +08:00
wangxiyuan	e773a9e1c2	[Misc] Clean up useless code (#19889 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-06-20 21:09:09 +00:00
Maximilien de Bayser	799397ee4f	Support embedding models in V1 (#16188 ) Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Signed-off-by: Max de Bayser <maxdebayser@gmail.com> Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com> Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-06-18 21:36:33 -07:00
Ning Xie	6e9cc73f67	[MISC] correct DeviceConfig device field static type analysis (#19699 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-06-17 17:21:50 -07:00
Ning Xie	26bc46ef89	[MISC] typo fix (#19672 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-06-16 07:18:49 +00:00
Ye (Charlotte) Qi	b692e9cd07	[Misc] Fix skipped max-model-len validation when deriving max model length from tokenizer config (#19660 ) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-06-16 06:30:29 +00:00
Woosuk Kwon	055915e6ce	Enable prefix caching with full cuda graphs (#19617 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-06-15 01:05:05 -07:00
Woosuk Kwon	aafbbd981f	[torch.compile] Use custom ops when use_inductor=False (#19618 )	2025-06-13 15:05:54 -07:00
youkaichao	d70bc7c029	[torch.compile] reorganize the cache directory to support compiling multiple models (#19064 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-06-13 15:23:25 +08:00
Luka Govedič	f98548b9da	[torch.compile][ROCm] Fuse quantization onto attention using a torch.compile pass (#16756 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Sage Moore <sage@neuralmagic.com>	2025-06-12 08:31:04 -07:00
rasmith	c7ea0b56cd	[AMD] [Quantization] Add override flag for attention dtype instead of using kv_cache_dtype trigger (#17331 ) Signed-off-by: Randall Smith <Randall.Smith@amd.com>	2025-06-11 15:53:28 -04:00
Richard Zou	77f0d465d0	[BugFix] Allow use_cudagraph to work with dynamic VLLM_USE_V1 (#19390 ) Signed-off-by: rzou <zou3519@gmail.com>	2025-06-11 07:54:41 +08:00
Siyuan Liu	3a7cd627a8	[Misc] Fix a config typo in disable_hybrid_kv_cache_manager configuration (#19383 ) Signed-off-by: Siyuan Liu <lsiyuan@google.com>	2025-06-09 16:41:51 -07:00
wang.yuqi	2ffb9b6e07	[Bugfix] model_max_length should consider max_model_len in tokenizer_config (#19201 )	2025-06-08 07:17:53 -07:00
Richard Zou	eaa2e51088	[Bugfix] Re-enable use_cudagraph in vLLM v1 (#19299 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2025-06-08 08:56:12 +08:00
Richard Zou	da511d54d8	Fix CompilationConfig repr (#19091 ) Signed-off-by: rzou <zou3519@gmail.com>	2025-06-06 16:23:35 +08:00
Chen Zhang	f8a1a2d108	[v1] Hybrid Memory Allocator (#17996 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-06-05 20:47:09 -07:00
Cyrus Leung	01dc9a76db	[CI/Build][Bugfix] Ensure compatibility with transformers 4.52 (#18678 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-06-04 04:49:20 -07:00
Varun Sundar Rabindranath	fa98d77773	[Kernel] DeepEP dispatch-combine kernel integration (#18434 ) Signed-off-by: Varun <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-06-03 12:30:02 -07:00
Simon Mo	02f0c7b220	[Misc] Add SPDX-FileCopyrightText (#19100 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-06-03 11:20:17 -07:00
Rui Qiao	bdce64f236	[V1] Support DP with Ray (#18779 )	2025-06-02 21:15:13 -07:00
Siyuan Liu	9112b443a0	[Hardware][TPU] Initial support of model parallelism with single worker using SPMD (#18011 ) Signed-off-by: Siyuan Liu <lsiyuan@google.com> Co-authored-by: Hossein Sarshar <hossein.sarshar@gmail.com> Co-authored-by: Chengji Yao <chengjiyao@google.com>	2025-06-03 00:06:20 +00:00
Gregory Shtrasberg	ca2f6b9c30	[Bugfix][Model] Attempt to fix eagle in V0. (#18978 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-06-02 08:15:53 -07:00
jennyyyyzhen	ebb1ec9318	[Model] enable data parallel for Llama4 vision encoder (#18368 ) Signed-off-by: yzhen <yzhen@devgpu093.cco2.facebook.com> Co-authored-by: yZhen <yZhen@fb.com> Co-authored-by: yzhen <yzhen@devgpu093.cco2.facebook.com>	2025-06-02 19:22:54 +08:00
Cyrus Leung	6aa8f9a4e7	[Core] Rework dtype resolution (#18751 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-06-01 11:04:23 +08:00
Satyajith Chilappagari	2a50ef5760	[Neuron] Add Multi-Modal model support for Neuron (#18921 ) Signed-off-by: Satyajith Chilappagari <satchill@amazon.com> Co-authored-by: Ashraf Mahgoub <ashymahg@amazon.com> Co-authored-by: Rohith Nallamaddi <nalrohit@amazon.com> Co-authored-by: FeliciaLuo <luof@amazon.com> Co-authored-by: Elaine Zhao <elaineyz@amazon.com>	2025-05-31 10:39:11 +00:00
Yikun Jiang	3c49dbdd03	Skip device and quant Pydantic validation to make plugin device work (#18843 ) Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-05-28 20:12:30 -07:00
aws-elaineyz	1661a9c28f	[Doc][Neuron] Update documentation for Neuron (#18868 ) Signed-off-by: Elaine Zhao <elaineyz@amazon.com>	2025-05-28 19:44:01 -07:00
Richard Zou	26b4fa45be	Add ability to use CUDAGraphs with use_inductor=False (#17345 ) Signed-off-by: rzou <zou3519@gmail.com>	2025-05-29 10:16:52 +08:00
Harry Mellor	6dbe5b5c93	Remove checks for `None` for fields which should never be `None` (#17985 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-28 21:32:19 +00:00
Harry Mellor	4c2b38ce9e	Enable Pydantic mypy checks and convert configs to Pydantic dataclasses (#17599 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-28 12:46:04 +00:00
wang.yuqi	de65fc8e1e	[CI] improve embed testing (#18747 )	2025-05-28 00:16:35 -07:00
Cyrus Leung	0c492b7824	[Deprecation] Remove fallbacks for Embeddings API (#18795 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-28 15:09:04 +08:00
wang.yuqi	3e9ce609bd	[Bugfix] Fix nomic max_model_len (#18755 )	2025-05-27 20:29:53 -07:00
Hyogeun Oh (오효근)	a68e293cb9	[Doc] Convert Sphinx directives ( `{class}`, `{meth}`, `{attr}`, ...) to MkDocs format for better documentation linking (#18663 ) Signed-off-by: Zerohertz <ohg3417@gmail.com>	2025-05-27 01:44:20 -07:00
Cyrus Leung	61a45e7a72	[Bugfix] Fix Mistral-format models with sliding window (#18693 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-26 01:44:04 -07:00
Feng XiaoLong	4fc1bf813a	[Bugfix] Migrate to REGEX Library to prevent catastrophic backtracking (#18454 ) Signed-off-by: Crucifixion-Fxl <xmufxl@gmail.com> Co-authored-by: Crucifixion-Fxl <xmufxl@gmail.com>	2025-05-23 16:16:26 -07:00
Cyrus Leung	7d9216495c	[Doc] Update references to doc files (#18637 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-23 15:49:21 -07:00
Jiayi Yao	2628a69e35	[V1] Support Deepseek MTP (#18435 ) Signed-off-by: Rui Qiao <ruisearch42@gmail.com> Signed-off-by: YaoJiayi <120040070@link.cuhk.edu.cn> Co-authored-by: Rui Qiao <ruisearch42@gmail.com>	2025-05-23 10:26:28 -07:00
Cyrus Leung	273cb3b4d9	[Doc] Fix top-level API links/docs (#18621 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-23 09:46:56 -07:00
cascade	71ea614d4a	[Feature]Add async tensor parallelism using compilation pass (#17882 ) Signed-off-by: cascade812 <cascade812@outlook.com>	2025-05-23 01:03:34 -07:00
aws-elaineyz	ed5d408255	[Neuron] Remove bypass on EAGLEConfig and add a test (#18514 ) Signed-off-by: Elaine Zhao <elaineyz@amazon.com>	2025-05-22 21:26:32 -07:00
lkchen	e44d8ce8c7	[Bugfix] Set `KVTransferConfig.engine_id` in post_init (#18576 ) Signed-off-by: Linkun Chen <github@lkchen.net>	2025-05-23 02:54:42 +00:00
Harry Mellor	4b0da7b60e	Enable hybrid attention models for Transformers backend (#18494 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-23 10:12:08 +08:00
wangxiyuan	721fb9b181	[Platform] Move platform check to right place (#18470 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-05-22 12:11:28 -07:00
Kebe	5d7f545204	[Frontend] deprecate `--device` arg (#18399 ) Signed-off-by: Kebe <mail@kebe7jun.com>	2025-05-21 01:21:17 -07:00
Michael Goin	f4a8a37465	[Minor] Rename quantization nvfp4 to modelopt_fp4 (#18356 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-20 09:08:37 -07:00
cascade	9ab2c02ff8	Support sequence parallelism combined with pipeline parallelism (#18243 ) Signed-off-by: cascade812 <cascade812@outlook.com>	2025-05-17 22:47:25 +00:00
David Ben-David	3e0d435027	[P/D][V1] Support dynamic loading of external KV connector implementations (#18142 ) Signed-off-by: David Ben-David <davidb@pliops.com> Co-authored-by: David Ben-David <davidb@pliops.com>	2025-05-17 06:40:39 +00:00
Lucia Fang	3d2779c29a	[Feature] Support Pipeline Parallism in torchrun SPMD offline inference for V1 (#17827 ) Signed-off-by: Lucia Fang <fanglu@fb.com>	2025-05-15 22:28:27 -07:00

671 Commits