biondizzle/vllm - vllm

Author	SHA1	Message	Date
youkaichao	555aa21905	[V1] Fully Transparent Implementation of CPU Offloading (#15354) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-03-31 20:22:34 +08:00
shangmingc	d03308be0c	[Misc] Remove stale func in KVTransferConfig (#14746) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-03-28 17:33:32 +00:00
Jee Jee Li	726efc6a32	[Quantization][V1] BitsAndBytes support V1 (#15611) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-28 10:12:47 +08:00
Nick Hill	15dac210f0	[V1] AsyncLLM data parallel (#13923) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-03-27 16:14:41 -07:00
wang.yuqi	3f532cb6a6	[Misc] Use model_redirect to redirect the model name to a local folder. (#14116)	2025-03-27 02:21:23 -07:00
Rui Qiao	df8d3d1287	[Misc] Restrict ray version dependency and update PP feature warning in V1 (#15556)	2025-03-27 06:21:07 +00:00
Matthew Vine	7a6d45bc8a	Support FIPS enabled machines with MD5 hashing (#15299) Signed-off-by: Matthew Vine <32849887+MattTheCuber@users.noreply.github.com>	2025-03-26 20:19:46 -04:00
marko	27df5199d9	Support SHA256 as hash function in prefix caching (#15297) Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com>	2025-03-26 11:11:28 -07:00
Bryan Lu	781d056280	[Feature] Enhance EAGLE Architecture with Proper RMS Norms (#14990) Signed-off-by: Bryan Lu <yuzhelu@amazon.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-03-26 08:24:07 +00:00
youkaichao	d0cfec7ab9	[bugfix] fix inductor cache on max_position_embeddings (#15436) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-03-25 07:05:39 -07:00
Russell Bryant	a09ad90a72	[V1] guidance backend for structured output + `auto` fallback mode (#14779) Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Loc Huynh <jc1da.3011@gmail.com> Co-authored-by: Michal Moskal <michal@moskal.me>	2025-03-24 21:02:33 -07:00
Jee Jee Li	6db94571d7	[Misc] Remove LoRA log (#15388) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-24 20:43:48 -07:00
Manish Sethi	761702fd19	[Core] Integrate `fastsafetensors` loader for loading model weights (#10647) Signed-off-by: Manish Sethi <Manish.sethi1@ibm.com>	2025-03-24 08:08:02 -07:00
Luka Govedič	f622dbcf39	[Fix] [torch.compile] Improve UUID system for custom passes (#15249) Signed-off-by: luka <luka@neuralmagic.com>	2025-03-24 01:54:07 +00:00
Lucas Wilkinson	dccf535f8e	[V1] Enable V1 Fp8 cache for FA3 in the oracle (#15191) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-03-23 15:07:04 -07:00
Roger Wang	9c5c81b0da	[Misc][Doc] Add note regarding loading `generation_config` by default (#15281) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-03-23 14:00:55 -07:00
Woosuk Kwon	bc8ed3c4ba	[V1][Spec Decode] Use better defaults for N-gram (#15358) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-03-23 10:52:30 -07:00
shangmingc	50c9636d87	[V1][Usage] Refactor speculative decoding configuration and tests (#14434) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-03-22 19:28:10 -10:00
Russell Bryant	b877031d80	Remove openvino support in favor of external plugin (#15339) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-03-22 14:06:39 -07:00
wwl2755	1c2bec0f82	[Doc] add load_format items in docs (#14804) Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>	2025-03-21 22:36:43 -07:00
Woosuk Kwon	2b22290ce0	[V1] Add flag to disable cascade attention (#15243) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-03-20 15:24:16 -07:00
Wang Ran (汪然)	bfe2fe0af4	typo: Update config.py (#15189)	2025-03-19 23:31:21 -07:00
Matt Ritter	a8652f4f0f	Enable CUDA graph support for llama 3.2 vision (#14917) Signed-off-by: Matt Ritter <100659061+mritterfigma@users.noreply.github.com>	2025-03-19 23:29:16 -07:00
Russell Bryant	1f16b7fe74	[Core][V0] Add guidance backend for structured output (#14589) Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Loc Huynh <lohuynh@microsoft.com> Co-authored-by: Michal Moskal <michal@moskal.me> Co-authored-by: Aaron Pham <contact@aarnphm.xyz>	2025-03-19 21:33:51 -07:00
Alexander Matveev	cfbca8a2f2	[V1] TPU - Tensor parallel MP support (#15059)	2025-03-20 00:55:18 +00:00
Cyrus Leung	f690372b68	[Core] Update dtype detection and defaults (#14858) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-19 13:49:33 +08:00
Jee Jee Li	46c759c165	[Bugfix] Fix LoRA extra vocab size (#15047) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-18 09:40:29 -07:00
yury-tokpanov	452e8fd968	[MODEL] Add support for Zamba2 models (#13185) Signed-off-by: Yury Tokpanov <yury@zyphra.com> Signed-off-by: Quentin Anthony <qganthony@yahoo.com> Co-authored-by: Quentin Anthony <qganthony@yahoo.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-03-18 08:56:21 -07:00
hoshi-hiyouga	414919138b	[Bugfix] torchrun compatibility (#14899) Signed-off-by: hiyouga <hiyouga@buaa.edu.cn> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-03-18 05:49:27 -07:00
Robert Shaw	d4d93db2c5	[V1] V1 Enablement Oracle (#13726) Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Michael Goin <michael@neuralmagic.com>	2025-03-14 22:02:20 -07:00
Michael Goin	14f301b541	Update to torch==2.6.0 (#12721) Signed-off-by: mgoin <michael@neuralmagic.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: luka <luka@neuralmagic.com> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-03-14 16:58:30 -04:00
Varun Sundar Rabindranath	0b1cfa6180	[Kernel] LoRA - Enable CUDAGraphs for V1 (#14626) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-03-13 20:42:04 -07:00
Cyrus Leung	f53a0586b9	[Bugfix] Fix prompt format of GLM4V (#14539) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-13 11:37:17 +00:00
Mathis Felardos	1bd32bc8dd	[Config][Disaggregated] Add timeout configuration for the torch.store and add KVTransferConfig.kv_connector_extra_config (#14367) Signed-off-by: Mathis Felardos <mathis@mistral.ai>	2025-03-12 20:15:20 -07:00
Woosuk Kwon	53be4a8634	[V1] Allow sliding window + prefix caching (#13069) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-03-12 11:21:19 -07:00
Sage Moore	d9f83d6206	[ROCm] Enable chunked prefill/paged attention in MLA on ROCm (#14316) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-03-12 15:51:20 +00:00
Woosuk Kwon	c0c25e25fa	[Model] Add support for Gemma 3 (#14660) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Signed-off-by: Roger Wang <ywang@roblox.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Roger Wang <ywang@roblox.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-12 08:36:33 -07:00
Pavani Majety	debd6bbf09	[Kernel] Add ModelOpt FP4 Checkpoint Support (#12520) Signed-off-by: Pavani Majety <pmajety@nvidia.com>	2025-03-12 05:13:11 +00:00
Roger Wang	1fc973c0b5	[V1][Core] Fix memory issue with logits & sampling (#14508) Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Varun Sundar Rabindranath <3337719+varun-sundar-rabindranath@users.noreply.github.com>	2025-03-11 04:03:41 +00:00
Harry Mellor	3b352a2f92	Correct capitalisation: `VLLM` -> `vLLM` (#14562) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-03-10 16:36:21 +00:00
Aaron Pham	0b7f06b447	[Misc] add `use_tqdm_on_load` to reduce logs (#14407) Signed-off-by: Aaron Pham <contact@aarnphm.xyz>	2025-03-08 05:57:46 -08:00
Harry Mellor	47512b3200	Default to `generation_config` from model (#12622) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-03-08 14:46:15 +08:00
Cyrus Leung	05fb6718f0	[Bugfix] Clean up multi-modal processors (#14417) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-07 10:33:38 +00:00
Tyler Michael Smith	cc2f9b32c8	[Distributed] Add enable_expert_parallel arg (#14305) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-03-06 18:54:45 +00:00
youkaichao	151b08e0fe	[RLHF] use worker_extension_cls for compatibility with V0 and V1 (#14185) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-03-07 00:32:46 +08:00
Congcong Chen	0a995d5434	[Model] New model support for Phi-4-multimodal-instruct (#14119)	2025-03-04 20:57:01 -08:00
Harry Mellor	cf069aa8aa	Update deprecated Python 3.8 typing (#13971)	2025-03-02 17:34:51 -08:00
Ce Gao	bf33700ecd	[v0][structured output] Support reasoning output (#12955) Signed-off-by: Ce Gao <cegao@tensorchord.ai>	2025-03-02 14:49:42 -05:00
Luka Govedič	bd56c983d6	[torch.compile] Fix RMSNorm + quant fusion in the non-cutlass-fp8 case, rename RedundantReshapesPass to NoopEliminationPass (#10902) Signed-off-by: luka <luka@neuralmagic.com>	2025-02-28 16:20:11 -07:00
Roger Wang	6c85da3a18	[V1]`SupportsV0Only` protocol for model definitions (#13959) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-02-27 20:02:15 -05:00

671 Commits