biondizzle/vllm

Author	SHA1	Message	Date
Harry Mellor	4c2b38ce9e	Enable Pydantic mypy checks and convert configs to Pydantic dataclasses (#17599 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-28 12:46:04 +00:00
wang.yuqi	de65fc8e1e	[CI] improve embed testing (#18747 )	2025-05-28 00:16:35 -07:00
Cyrus Leung	0c492b7824	[Deprecation] Remove fallbacks for Embeddings API (#18795 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-28 15:09:04 +08:00
wang.yuqi	3e9ce609bd	[Bugfix] Fix nomic max_model_len (#18755 )	2025-05-27 20:29:53 -07:00
Hyogeun Oh (오효근)	a68e293cb9	[Doc] Convert Sphinx directives ( `{class}`, `{meth}`, `{attr}`, ...) to MkDocs format for better documentation linking (#18663 ) Signed-off-by: Zerohertz <ohg3417@gmail.com>	2025-05-27 01:44:20 -07:00
Cyrus Leung	61a45e7a72	[Bugfix] Fix Mistral-format models with sliding window (#18693 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-26 01:44:04 -07:00
Feng XiaoLong	4fc1bf813a	[Bugfix] Migrate to REGEX Library to prevent catastrophic backtracking (#18454 ) Signed-off-by: Crucifixion-Fxl <xmufxl@gmail.com> Co-authored-by: Crucifixion-Fxl <xmufxl@gmail.com>	2025-05-23 16:16:26 -07:00
Cyrus Leung	7d9216495c	[Doc] Update references to doc files (#18637 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-23 15:49:21 -07:00
Jiayi Yao	2628a69e35	[V1] Support Deepseek MTP (#18435 ) Signed-off-by: Rui Qiao <ruisearch42@gmail.com> Signed-off-by: YaoJiayi <120040070@link.cuhk.edu.cn> Co-authored-by: Rui Qiao <ruisearch42@gmail.com>	2025-05-23 10:26:28 -07:00
Cyrus Leung	273cb3b4d9	[Doc] Fix top-level API links/docs (#18621 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-23 09:46:56 -07:00
cascade	71ea614d4a	[Feature]Add async tensor parallelism using compilation pass (#17882 ) Signed-off-by: cascade812 <cascade812@outlook.com>	2025-05-23 01:03:34 -07:00
aws-elaineyz	ed5d408255	[Neuron] Remove bypass on EAGLEConfig and add a test (#18514 ) Signed-off-by: Elaine Zhao <elaineyz@amazon.com>	2025-05-22 21:26:32 -07:00
lkchen	e44d8ce8c7	[Bugfix] Set `KVTransferConfig.engine_id` in post_init (#18576 ) Signed-off-by: Linkun Chen <github@lkchen.net>	2025-05-23 02:54:42 +00:00
Harry Mellor	4b0da7b60e	Enable hybrid attention models for Transformers backend (#18494 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-23 10:12:08 +08:00
wangxiyuan	721fb9b181	[Platform] Move platform check to right place (#18470 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-05-22 12:11:28 -07:00
Kebe	5d7f545204	[Frontend] deprecate `--device` arg (#18399 ) Signed-off-by: Kebe <mail@kebe7jun.com>	2025-05-21 01:21:17 -07:00
Michael Goin	f4a8a37465	[Minor] Rename quantization nvfp4 to modelopt_fp4 (#18356 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-20 09:08:37 -07:00
cascade	9ab2c02ff8	Support sequence parallelism combined with pipeline parallelism (#18243 ) Signed-off-by: cascade812 <cascade812@outlook.com>	2025-05-17 22:47:25 +00:00
David Ben-David	3e0d435027	[P/D][V1] Support dynamic loading of external KV connector implementations (#18142 ) Signed-off-by: David Ben-David <davidb@pliops.com> Co-authored-by: David Ben-David <davidb@pliops.com>	2025-05-17 06:40:39 +00:00
Lucia Fang	3d2779c29a	[Feature] Support Pipeline Parallism in torchrun SPMD offline inference for V1 (#17827 ) Signed-off-by: Lucia Fang <fanglu@fb.com>	2025-05-15 22:28:27 -07:00
Nicolò Lucchesi	e3f3aee6f4	[Misc] Avoid cuda graph log when sizes still match (#18202 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-05-15 09:59:38 -07:00
omahs	a9944aabfa	fix: typos (#18151 ) Signed-off-by: omahs <73983677+omahs@users.noreply.github.com>	2025-05-15 02:16:15 -07:00
Luka Govedič	83f74c698f	[Fix][ROCm] Enforce eager for all encoder-decoder models on ROCm (#18154 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com>	2025-05-14 22:04:43 -07:00
Aaron Pham	2fc9075b82	[V1] Structured Outputs + Thinking compatibility (#16577 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-05-14 15:45:24 -07:00
Jon Gill	754b699cbe	[Bug]: Fix S3 model/tokenizer path resolution (#18083 ) Signed-off-by: Jon Gill <jon@yurts.ai>	2025-05-13 19:34:17 -07:00
Nick Hill	55aa7af994	[V1] DP scale-out (2/N): Decouple engine process management and comms (#15977 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-05-13 10:48:21 -07:00
Woosuk Kwon	2ff297dce9	[BugFix] Set default random seed to 0 for V1 (#17929 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-05-13 07:52:19 +00:00
Tao He	60f7624334	Implements dual-chunk-flash-attn backend for dual chunk attention with sparse attention support (#11844 )	2025-05-12 19:52:47 -07:00
Harry Mellor	d67085c2c8	Remove noisy warnings from `SchedulerConfig` (#17995 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-13 00:33:45 +00:00
bwshen-mi	acee8f48aa	[Model] Support MiMo-7B inference with MTP (#17433 ) Signed-off-by: wp-alpha <wangpeng66@xiaomi.com> Co-authored-by: wangpeng66 <wangpeng66@xiaomi.com>	2025-05-12 23:25:33 +00:00
Robert Shaw	d19110204c	[P/D] NIXL Integration (#17751 ) Signed-off-by: ApostaC <yihua98@uchicago.edu> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: Brent Salisbury <bsalisbu@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: ApostaC <yihua98@uchicago.edu> Co-authored-by: Robert Shaw <rshaw@neuralmagic.com> Co-authored-by: mgoin <mgoin64@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com> Co-authored-by: Brent Salisbury <bsalisbu@redhat.com>	2025-05-12 09:46:16 -07:00
Harry Mellor	68311891f5	Don't default construct `ModelConfig` when default constructing `VllmConfig` (#17943 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-10 13:23:00 +00:00
Harry Mellor	4b2ed7926a	Improve configs - the rest! (#17562 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-09 15:18:44 -07:00
Isotr0py	5c4c08f6f1	[Misc] Auto fallback to float16 for pre-Ampere GPUs when detected bfloat16 config (#17265 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-05-09 17:16:12 +00:00
inkcherry	5b2dcbf0b8	Fix Whisper crash caused by invalid `max_num_batched_tokens` config (#17853 ) Signed-off-by: inkcherry <mingzhi.liu@intel.com>	2025-05-09 09:16:26 +00:00
Chanh Nguyen	7ea2adb802	[Core] Support full cuda graph in v1 (#16072 ) Signed-off-by: Chanh Nguyen <cnguyen@linkedin.com> Co-authored-by: Chanh Nguyen <cnguyen@linkedin.com>	2025-05-07 22:30:15 -07:00
Akshat Tripathi	c20ef40fd0	[Hardware][TPU][V1] Multi-LoRA implementation for the V1 TPU backend (#14238 ) Signed-off-by: Akshat Tripathi <akshat@krai.ai> Signed-off-by: Chengji Yao <chengjiyao@google.com> Co-authored-by: Chengji Yao <chengjiyao@google.com>	2025-05-07 16:28:47 -04:00
Satyajith Chilappagari	043e4c4955	Add NeuronxDistributedInference support, Speculative Decoding, Dynamic on-device sampling (#16357 ) Signed-off-by: Satyajith Chilappagari <satchill@amazon.com> Co-authored-by: Aaron Dou <yzdou@amazon.com> Co-authored-by: Shashwat Srijan <sssrijan@amazon.com> Co-authored-by: Chongming Ni <chongmni@amazon.com> Co-authored-by: Amulya Ballakur <amulyaab@amazon.com> Co-authored-by: Patrick Lange <patlange@amazon.com> Co-authored-by: Elaine Zhao <elaineyz@amazon.com> Co-authored-by: Lin Lin Pan <tailinpa@amazon.com> Co-authored-by: Navyadhara Gogineni <navyadha@amazon.com> Co-authored-by: Yishan McNabb <yishanm@amazon.com> Co-authored-by: Mrinal Shukla <181322398+mrinalks@users.noreply.github.com>	2025-05-07 00:07:30 -07:00
Jee Jee Li	822de7fb94	[Misc] Split model loader (#17712 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-05-07 12:42:26 +08:00
Cyrus Leung	2858830c39	[Bugfix] Prioritize dtype in root config before checking text config (#17629 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-04 12:43:05 +00:00
Harry Mellor	d6484ef3c3	Add full API docs and improve the UX of navigating them (#17485 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-03 19:42:43 -07:00
Cyrus Leung	887d7af882	[Core] Gate `prompt_embeds` behind a feature flag (#17607 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-04 00:19:20 +08:00
Chenyaaang	87baebebd8	[Frontend][TPU] Add TPU default max-num-batched-tokens based on device name (#17508 ) Signed-off-by: Chenyaaang <chenyangli@google.com>	2025-05-02 21:42:44 -07:00
Harry Mellor	785d75a03b	Automatically tell users that dict args must be valid JSON in CLI (#17577 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-02 05:24:55 -07:00
Jerry Zhang	109e15a335	Add `pt_load_map_location` to allow loading to cuda (#16869 ) Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>	2025-05-01 23:23:42 -07:00
Chen Xia	173daac19d	[Bug]change the position of cuda_graph_sizes in dataclasses (#17548 ) Signed-off-by: CXIAAAAA <cxia0209@gmail.com>	2025-05-01 11:52:37 -07:00
Cyrus Leung	9b1769dd9a	[Bugfix] Fix lint error (#17547 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-01 11:12:19 -07:00
Chen Xia	61c299f81f	[Misc]add configurable cuda graph size (#17201 ) Signed-off-by: CXIAAAAA <cxia0209@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-01 11:04:50 -07:00
Harry Mellor	6768ff4a22	Move the last arguments in `arg_utils.py` to be in their final groups (#17531 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-01 10:31:44 -07:00
Chauncey	98060b001d	[Feature][Frontend]: Deprecate --enable-reasoning (#17452 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-05-01 06:46:16 -07:00

591 Commits