biondizzle/vllm - vllm

Author	SHA1	Message	Date
Shinichi Hemmi	3badb0213b	[Model] Add PLaMo2 (#14323 ) Signed-off-by: Shinichi Hemmi <50256998+Alnusjaponica@users.noreply.github.com> Signed-off-by: shemmi <shemmi@preferred.jp> Co-authored-by: Kento Nozawa <nzw0301@preferred.jp> Co-authored-by: Hiroaki Mikami <mhiroaki@preferred.jp> Co-authored-by: Calvin Metzger <metzger@preferred.jp>	2025-04-15 19:31:30 -07:00
Angky William	fdcb850f14	[Misc] Enable vLLM to Dynamically Load LoRA from a Remote Server (#10546 ) Signed-off-by: Angky William <angkywilliam@Angkys-MacBook-Pro.local> Co-authored-by: Angky William <angkywilliam@Angkys-MacBook-Pro.local>	2025-04-15 22:31:38 +00:00
Dipika Sikka	54a66e5fee	[Misc] Update `compressed-tensors` WNA16 to support zero-points (#14211 )	2025-04-15 07:33:51 -06:00
DefTruth	280d62b8a2	[Kernel] Remove redundant Exp calculations (#16123 ) Signed-off-by: DefTruth <qiustudent_r@163.com>	2025-04-15 12:58:37 +00:00
Xihui Cang	1666e66443	Add "/server_info" endpoint in api_server to retrieve the vllm_config. (#16572 ) Signed-off-by: Xihui Cang <xihuicang@gmail.com>	2025-04-15 11:50:38 +00:00
Richard Zou	b590adfdc1	Fix vLLM x torch.compile config caching (#16491 ) Signed-off-by: rzou <zou3519@gmail.com>	2025-04-14 23:11:11 -07:00
Michael Goin	b4fe16c75b	Add `vllm bench [latency, throughput]` CLI commands (#16508 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-14 23:10:35 -07:00
Pooya Davoodi	bc5dd4f669	[Bugfix] Fix broken GritLM model and tests (missing pooling_metadata) (#16631 ) Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>	2025-04-14 23:09:58 -07:00
Jinzhen Lin	d06ba4ed3f	[Kernel] moe wna16 marlin kernel (#14447 ) Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com> Co-authored-by: Michael Goin <michael@neuralmagic.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-04-14 20:05:22 -07:00
Alex Brooks	6b40996ae8	[Core][Bugfix] Fix Offline MM Beam Search (#16390 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-04-15 10:33:02 +08:00
Shuqiao Li	d2020acac7	config check sleep mode support oot platforms (#16562 )	2025-04-14 16:31:50 -07:00
Chengji Yao	1eb3c2ed48	[DOC][TPU] Add core idea about avoiding recompilation after warmup (#16614 ) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-04-14 21:56:06 +00:00
courage17340	b1308b84a3	[Model][VLM] Add Kimi-VL model support (#16387 ) Signed-off-by: courage17340 <courage17340@163.com>	2025-04-14 21:41:48 +00:00
Nicolò Lucchesi	b3f2fddd17	[TPU][V1] Fix exponential padding when `max-num-batched-tokens` is not a power of 2 (#16596 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-04-14 17:01:05 +00:00
Cyrus Leung	aa29841ede	[Bugfix] Multi-modal caches not acting like LRU caches (#16593 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-14 09:24:16 -07:00
shangmingc	1dd23386ec	[Misc] Update usage with mooncake lib for kv transfer (#16523 ) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-04-14 11:31:37 +00:00
DefTruth	ce4ddd2d1a	[Misc] remove warning if triton>=3.2.0 (#16553 ) Signed-off-by: DefTruth <qiustudent_r@163.com>	2025-04-14 02:39:47 -07:00
Harry Mellor	e51929ebca	Improve configs - `SchedulerConfig` (#16533 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-14 17:24:16 +08:00
Russell Bryant	dc1b4a6f13	[Core][V0] Enable regex support with xgrammar (#13228 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-04-14 10:13:38 +08:00
Michael Goin	d085a44082	Enable PTPC FP8 for CompressedTensorsW8A8Fp8MoEMethod (triton fused_moe) (#16537 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-13 14:55:18 +00:00
Lily Liu	f49e5aff11	[V1][Spec Decode] KV cache slots for eagle heads (#16370 ) Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>	2025-04-12 19:42:51 -07:00
Ryan McConville	6c11ecf8d3	[Bugfix] Validate logit biases to prevent out of vocab ids crashing engine (#16529 ) Signed-off-by: Ryan McConville <ryan@ryanmcconville.com>	2025-04-12 20:19:19 +00:00
SnowCharm	93e5f3c5fb	[Perf] Optimize Preparing Inputs for GPU Model Runner (#16484 ) Signed-off-by: snowcharm <snowcharmqq@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-04-12 22:54:37 +08:00
Jie Fu (傅杰)	70363bccfa	Fix syntaxWarning: invalid escape sequence '\s' (#16532 ) Signed-off-by: Jie Fu <jiefu@tencent.com>	2025-04-12 14:39:42 +00:00
Huazhong Ji	68bb122eb4	[MISC] Make GroupCoordinator compatible with out-of-tree devices (#16464 ) Signed-off-by: hzji210@gmail.com <hzji210@gmail.com>	2025-04-12 09:20:25 +00:00
Cyrus Leung	d9fc8cd9da	[V1] Enable multi-input by default (#15799 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-12 08:52:39 +00:00
wang.yuqi	fbf722c6e6	[Frontend] support matryoshka representation / support embedding API dimensions (#16331 )	2025-04-11 23:23:10 -07:00
leon-seidel	e92d7085bf	[Feature][V1] Add xgrammar to support minLength, maxLength with test (#16516 ) Signed-off-by: Leon Seidel <leon.seidel@fau.de>	2025-04-11 23:22:07 -07:00
Michael Goin	bd6028d6b0	Optimized topk for topk=1 (Llama-4) (#16512 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-12 14:21:08 +08:00
Nick Hill	41cc883c29	[BugFix] Handle non-contiguous tensors properly when serializing (#16492 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-11 17:54:06 -07:00
Michael Goin	87b836ba77	Bugfix for PixtralHF models without spatial_merge_size (#16513 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-11 23:32:22 +00:00
rongfu.leng	56c76c2e0e	[Bugfix] clean up duplicated code (#16485 ) Signed-off-by: Gogs <gogs@fake.local> Co-authored-by: Gogs <gogs@fake.local>	2025-04-11 23:19:40 +00:00
Yong Hoon Shin	a3bf8d4a2b	[Kernel] Add tuned FusedMoE kernel config for Llama4 Scout, TP=8 on H100 (#16488 )	2025-04-12 06:26:55 +08:00
Ye (Charlotte) Qi	16eda8c43a	[Frontend] Added chat templates for LLaMa4 pythonic tool calling (#16463 ) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com> Co-authored-by: Kai Wu <kaiwu@meta.com>	2025-04-12 06:26:17 +08:00
Harry Mellor	cd77382ac1	Improve configs - `LoadConfig` (#16422 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-11 20:27:27 +00:00
Travis Johnson	71b9cde010	[Bugfix] handle alignment of encoder_seq_lens in mllama.py (#14784 ) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>	2025-04-11 19:59:50 +00:00
Michael Goin	f41647ee6b	[Kernel] Support W8A8 channel-wise weights and per-token activations in triton fused_moe_kernel (#16366 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-11 17:54:08 +00:00
Nicolò Lucchesi	4d022cbc75	[TPU][V1] Make `--disable_chunked_mm_input` mandatory for serving MM models (#16483 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-04-11 17:06:14 +00:00
Richard Zou	70de35a881	Fix erroneous "model doesn't support compile" warning (#16486 ) Signed-off-by: rzou <zou3519@gmail.com>	2025-04-11 16:24:36 +00:00
Tomasz Zielinski	34b2cf3b33	[Hardware][Intel-Gaudi] Multi-step scheduling implementation for HPU (#12779 ) Signed-off-by: Tomasz Zielinski <tomasz.zielinski@intel.com>	2025-04-11 07:38:36 -07:00
chaow-amd	9e90c9f73f	[Bugfix] Fix bugs of running Quark quantized models (#16236 ) Signed-off-by: chaow <chaow@amd.com>	2025-04-11 10:18:32 -04:00
DefTruth	e9528f6dc6	[Kernel] support merge_attn_states CUDA kernel, 3x speedup (#16173 ) Signed-off-by: DefTruth <qiustudent_r@163.com>	2025-04-11 06:50:50 -06:00
Jee Jee Li	a26f59ccbc	[Misc] Raise error for V1 not supporting Long LoRA. (#16415 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-04-11 01:51:20 -07:00
Michael Goin	aa3b3d76e0	Enforce valid max_num_batched_tokens when disable_chunked_mm_input=True (#16447 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-11 08:09:52 +00:00
Jee Jee Li	f7030df3be	[Core][LoRA][1/N] Add LoRA for EncoderDecoderModelRunner (#15990 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-04-11 15:32:37 +08:00
DefTruth	905e91e9ac	Revert "[Model] use AutoWeightsLoader for deepseek_v2, internlm2" (#16453 )	2025-04-11 06:44:22 +00:00
Alex Brooks	f8f9c0ba62	[Bugfix] Don't set an upper bound on repetition penalty (#16403 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-04-11 14:19:40 +08:00
Yong Hoon Shin	99ef59cf7f	[Llama4] Enable attention temperature tuning by default for long context (>32k) (#16439 ) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com> Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-04-10 21:26:07 -07:00
Nicolò Lucchesi	3cc9af88ff	[TPU][V1] Disable per-request seed/Generator (#16172 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-04-10 17:05:44 -04:00
Cyrus Leung	56d4aefa33	[VLM] Avoid unnecessary dummy multimodal data during processing (#16416 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-10 19:32:14 +00:00

3957 Commits