biondizzle/vllm - vllm

Author	SHA1	Message	Date
Chen Zhang	f8a1a2d108	[v1] Hybrid Memory Allocator (#17996 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-06-05 20:47:09 -07:00
Kebe	ef3f98b59f	[Bugfix] fix v1 cpu worker fails on macOS (#19121 )	2025-06-04 20:17:38 +00:00
Kaixi Hou	41aa578428	[NVIDIA] Add Cutlass MLA backend (#17625 )	2025-06-03 21:40:26 -07:00
Li, Jiang	4555143ea7	[CPU] V1 support for the CPU backend (#16441 )	2025-06-03 18:43:01 -07:00
Harry Mellor	6865fe0074	Fix interaction between `Optional` and `Annotated` in CLI typing (#19093 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Yikun Jiang <yikun@apache.org>	2025-06-03 21:07:19 +00:00
Simon Mo	02f0c7b220	[Misc] Add SPDX-FileCopyrightText (#19100 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-06-03 11:20:17 -07:00
Reid	17430e3653	[bugfix] small fix logic issue (#18999 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-06-03 05:35:12 +00:00
Rui Qiao	bdce64f236	[V1] Support DP with Ray (#18779 )	2025-06-02 21:15:13 -07:00
jennyyyyzhen	ebb1ec9318	[Model] enable data parallel for Llama4 vision encoder (#18368 ) Signed-off-by: yzhen <yzhen@devgpu093.cco2.facebook.com> Co-authored-by: yZhen <yZhen@fb.com> Co-authored-by: yzhen <yzhen@devgpu093.cco2.facebook.com>	2025-06-02 19:22:54 +08:00
Michael Goin	2ad6194a02	Let max_num_batched_tokens use human_readable_int for large numbers (#18968 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-06-01 11:41:29 +08:00
Michael Goin	d54af615d5	[Bugfix] Fix PP default fallback behavior for V1 (#18915 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-30 10:13:17 +08:00
Harry Mellor	4c2b38ce9e	Enable Pydantic mypy checks and convert configs to Pydantic dataclasses (#17599 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-28 12:46:04 +00:00
Isotr0py	1f1b1bc03b	[V1][Quantization] Add CUDA graph compatible v1 GGUF support (#18646 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Isotr0py <2037008807@qq.com>	2025-05-27 04:40:28 +00:00
Feng XiaoLong	4fc1bf813a	[Bugfix] Migrate to REGEX Library to prevent catastrophic backtracking (#18454 ) Signed-off-by: Crucifixion-Fxl <xmufxl@gmail.com> Co-authored-by: Crucifixion-Fxl <xmufxl@gmail.com>	2025-05-23 16:16:26 -07:00
Cyrus Leung	7d9216495c	[Doc] Update references to doc files (#18637 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-23 15:49:21 -07:00
Jiayi Yao	2628a69e35	[V1] Support Deepseek MTP (#18435 ) Signed-off-by: Rui Qiao <ruisearch42@gmail.com> Signed-off-by: YaoJiayi <120040070@link.cuhk.edu.cn> Co-authored-by: Rui Qiao <ruisearch42@gmail.com>	2025-05-23 10:26:28 -07:00
Sanger Steel	c32e249a23	[Frontend] [Core] Add Tensorizer support for V1, LoRA adapter serialization and deserialization (#17926 ) Signed-off-by: Sanger Steel <sangersteel@gmail.com>	2025-05-22 18:44:18 -07:00
Hyogeun Oh (오효근)	2b16104557	[Misc] Update deprecation message for `--enable-reasoning` (#18404 )	2025-05-21 07:33:11 -07:00
Kebe	5d7f545204	[Frontend] deprecate `--device` arg (#18399 ) Signed-off-by: Kebe <mail@kebe7jun.com>	2025-05-21 01:21:17 -07:00
Woosuk Kwon	fabe89bbc4	[Spec Decode] Don't fall back to V0 when spec decoding is enabled (#18265 )	2025-05-16 16:10:27 -07:00
Lucia Fang	3d2779c29a	[Feature] Support Pipeline Parallism in torchrun SPMD offline inference for V1 (#17827 ) Signed-off-by: Lucia Fang <fanglu@fb.com>	2025-05-15 22:28:27 -07:00
Harry Mellor	b18201fe06	Allow users to pass arbitrary JSON keys from CLI (#18208 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-15 21:05:34 -07:00
Sky Lee	f4937a51c1	[Model] vLLM v1 supports Medusa (#17956 ) Signed-off-by: lisiqi23 <lisiqi23@xiaomi.com> Signed-off-by: skylee-01 <497627264@qq.com> Co-authored-by: lisiqi23 <lisiqi23@xiaomi.com>	2025-05-15 21:05:31 -07:00
Nick Hill	55aa7af994	[V1] DP scale-out (2/N): Decouple engine process management and comms (#15977 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-05-13 10:48:21 -07:00
Tao He	60f7624334	Implements dual-chunk-flash-attn backend for dual chunk attention with sparse attention support (#11844 )	2025-05-12 19:52:47 -07:00
Reid	009b3d5382	[Misc] not show --model in vllm serve --help (#16691 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-05-11 08:47:58 +00:00
Gregory Shtrasberg	06c0922a69	[FP8][ROCm][Attention] Enable FP8 KV cache on ROCm for V1 (#17870 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-05-11 15:58:45 +08:00
Kuntai Du	9112155283	[Perf] Use small max_num_batched_tokens for A100 (#17885 ) Signed-off-by: KuntaiDu <kuntai@uchicago.edu>	2025-05-11 07:53:23 +00:00
Harry Mellor	4b2ed7926a	Improve configs - the rest! (#17562 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-09 15:18:44 -07:00
vllmellm	3c9396a64f	[FEAT][ROCm]: Support AITER MLA on V1 Engine (#17523 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: qli88 <qiang.li2@amd.com> Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>	2025-05-09 10:42:05 +08:00
Harry Mellor	646a31e51e	Fix and simplify `deprecated=True` CLI `kwarg` (#17781 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-07 16:51:06 +01:00
Jee Jee Li	ba7703e659	[Misc] Remove qlora_adapter_name_or_path (#17699 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-05-06 23:10:37 -07:00
Aaron Pham	175bda67a1	[Feat] Add deprecated=True to CLI args (#17426 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz>	2025-05-06 08:11:27 -07:00
Li, Jiang	a6fed02068	[V1][PP] Support PP for MultiprocExecutor (#14219 ) Signed-off-by: jiang1.li <jiang1.li@intel.com> Signed-off-by: jiang.li <jiang1.li@intel.com>	2025-05-06 07:58:05 -07:00
Michael Goin	d419aa5dc4	[V1] Enable TPU V1 backend by default (#17673 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-06 06:49:49 -07:00
Cyrus Leung	46fae69cf0	[Misc] V0 fallback for `--enable-prompt-embeds` (#17615 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-03 22:59:24 +00:00
Cyrus Leung	887d7af882	[Core] Gate `prompt_embeds` behind a feature flag (#17607 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-04 00:19:20 +08:00
Chenyaaang	87baebebd8	[Frontend][TPU] Add TPU default max-num-batched-tokens based on device name (#17508 ) Signed-off-by: Chenyaaang <chenyangli@google.com>	2025-05-02 21:42:44 -07:00
Harry Mellor	785d75a03b	Automatically tell users that dict args must be valid JSON in CLI (#17577 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-02 05:24:55 -07:00
Jerry Zhang	109e15a335	Add `pt_load_map_location` to allow loading to cuda (#16869 ) Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>	2025-05-01 23:23:42 -07:00
Chen Xia	61c299f81f	[Misc]add configurable cuda graph size (#17201 ) Signed-off-by: CXIAAAAA <cxia0209@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-01 11:04:50 -07:00
Harry Mellor	6768ff4a22	Move the last arguments in `arg_utils.py` to be in their final groups (#17531 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-01 10:31:44 -07:00
Chauncey	98060b001d	[Feature][Frontend]: Deprecate --enable-reasoning (#17452 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-05-01 06:46:16 -07:00
Harry Mellor	a257d9bccc	Improve configs - `ObservabilityConfig` (#17453 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-01 03:52:05 -07:00
Alec	0be6d05b5e	[V1][Metrics] add support for kv event publishing (#16750 ) Signed-off-by: alec-flowers <aflowers@nvidia.com> Signed-off-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com>	2025-04-30 07:44:45 -07:00
Harry Mellor	13698db634	Improve configs - `ModelConfig` (#17130 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-30 10:38:22 +08:00
Harry Mellor	a6977dbd15	Simplify (and fix) passing of guided decoding backend options (#17008 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-29 19:02:23 +00:00
Harry Mellor	2ef5d106bb	Improve literal dataclass field conversion to argparse argument (#17391 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-29 16:25:08 +00:00
Hyogeun Oh (오효근)	193e78e35d	[Fix] Documentation spacing in compilation config help text (#17342 ) Signed-off-by: Zerohertz <ohg3417@gmail.com>	2025-04-29 00:16:17 -07:00
Cyrus Leung	ebb3930d28	[Misc] Move config fields to MultiModalConfig (#17343 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-29 06:37:21 +00:00

417 Commits