biondizzle/vllm

Author	SHA1	Message	Date
Richard Liaw	8cac35ba43	[Ray] Improve documentation on batch inference (#16609 ) Signed-off-by: Richard Liaw <rliaw@berkeley.edu>	2025-04-16 22:19:26 -07:00
Russell Bryant	9dbf7a2dc1	[V1] Remove log noise when idle (#16735 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-04-16 21:34:08 -07:00
David Heineman	607029e515	[Bugfix] Revert max_prompt_len validation for decoder-only models. (#16741 ) Signed-off-by: David Heineman <david@davidheineman.com>	2025-04-16 21:33:15 -07:00
Isotr0py	cb072ce93b	[Bugfix] Update Florence-2 tokenizer to make grounding tasks work (#16734 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-04-17 04:17:39 +00:00
Divakar Verma	95aca283b4	[rocm][V0] fix selection logic for custom PA in V0 (#16426 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-04-16 19:52:11 -07:00
Robert Shaw	2b05b8ce69	[V1][Frontend] Improve Shutdown And Logs (#11737 ) Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Andrew Feldman <afeldman@neuralmagic.com> Co-authored-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-04-16 19:48:34 -07:00
Aaruni Aggarwal	3c776dcefb	Adding vllm buildkite job for IBM Power (#16679 ) Signed-off-by: Aaruni Aggarwal <aaruniagg@gmail.com>	2025-04-17 10:47:47 +08:00
Bryan Lu	2cbd4d2999	[V1][Spec Dec Bug Fix] Respect Spec Dec Method Specification (#16636 ) Signed-off-by: Bryan Lu <yuzhelu@amazon.com>	2025-04-16 19:47:26 -07:00
Staszek Paśko	3092375e27	[V1][Performance] Implement custom serializaton for MultiModalKwargs [Rebased] (#16432 ) Signed-off-by: Staszek Pasko <staszek@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-04-16 19:28:32 -07:00
Harry Mellor	3cd91dc955	Help user create custom model for Transformers backend remote code models (#16719 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-17 01:05:59 +00:00
Jade Zheng	8a7368e069	[Misc] Remove redundant comment (#16703 ) Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>	2025-04-17 00:44:52 +00:00
Harry Mellor	93e561ec4d	Improve error for structured output backend selection (#16717 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-17 00:35:35 +00:00
Joe Runde	e1b004839a	[Hardware] Add processor inputs to platform validation (#16680 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2025-04-16 09:28:42 -07:00
xsank	ee378f3d49	[Model] support modernbert (#16648 ) Signed-off-by: 唯勤 <xsank.mz@alibaba-inc.com> Co-authored-by: 唯勤 <xsank.mz@alibaba-inc.com>	2025-04-16 05:30:15 -07:00
DefTruth	e82ee40de3	[Bugfix][Kernel] fix potential cuda graph broken for merge_attn_states kernel (#16693 ) Signed-off-by: DefTruth <qiustudent_r@163.com>	2025-04-16 03:31:39 -07:00
Cyrus Leung	facbe2a114	[Doc] Improve OOM troubleshooting (#16704 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-16 18:29:48 +08:00
Reid	7168920491	[Misc] refactor examples series (#16708 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-04-16 10:16:36 +00:00
Kay Yan	21378a2323	[CI] Cleanup `additional_dependencies: [toml]` for pre-commit yapf hook (#16405 ) Signed-off-by: Kay Yan <kay.yan@daocloud.io>	2025-04-16 10:05:31 +00:00
Shanshan Shen	976711d9db	[V1][Structured Output] Move xgrammar related utils to `backend_xgrammar.py` (#16578 ) Signed-off-by: shen-shanshan <467638484@qq.com>	2025-04-16 17:01:36 +08:00
Sage Moore	44fa4d556c	[ROCM] Bind triton version to 3.2 in requirements-built.txt (#16664 ) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-04-16 14:05:28 +08:00
billishyahao	3ac98edcb1	[Feature] add model aware kv ops helper (#16020 ) Signed-off-by: billishyahao <bill.he@amd.com>	2025-04-15 23:00:43 -07:00
Richard Zou	966c742ed2	Disable remote caching when calling compile_fx (#16611 ) Signed-off-by: rzou <zou3519@gmail.com>	2025-04-15 22:18:28 -07:00
Jee Jee Li	0d7d05f4b6	[Misc] Modify LRUCache touch (#16689 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-04-16 04:51:38 +00:00
rongfu.leng	96bb8aa68b	[Bugfix] fix gpu docker image mis benchmarks dir (#16628 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-04-15 21:21:14 -07:00
Shinichi Hemmi	3badb0213b	[Model] Add PLaMo2 (#14323 ) Signed-off-by: Shinichi Hemmi <50256998+Alnusjaponica@users.noreply.github.com> Signed-off-by: shemmi <shemmi@preferred.jp> Co-authored-by: Kento Nozawa <nzw0301@preferred.jp> Co-authored-by: Hiroaki Mikami <mhiroaki@preferred.jp> Co-authored-by: Calvin Metzger <metzger@preferred.jp>	2025-04-15 19:31:30 -07:00
Angky William	fdcb850f14	[Misc] Enable vLLM to Dynamically Load LoRA from a Remote Server (#10546 ) Signed-off-by: Angky William <angkywilliam@Angkys-MacBook-Pro.local> Co-authored-by: Angky William <angkywilliam@Angkys-MacBook-Pro.local>	2025-04-15 22:31:38 +00:00
Dipika Sikka	54a66e5fee	[Misc] Update `compressed-tensors` WNA16 to support zero-points (#14211 )	2025-04-15 07:33:51 -06:00
DefTruth	280d62b8a2	[Kernel] Remove redundant Exp calculations (#16123 ) Signed-off-by: DefTruth <qiustudent_r@163.com>	2025-04-15 12:58:37 +00:00
Xihui Cang	1666e66443	Add "/server_info" endpoint in api_server to retrieve the vllm_config. (#16572 ) Signed-off-by: Xihui Cang <xihuicang@gmail.com>	2025-04-15 11:50:38 +00:00
Jee Jee Li	1575c1701a	[CI/Build] Fix LoRA OOM (#16624 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-04-15 16:38:19 +08:00
Reid	6ae996a873	[Misc] refactor argument parsing in examples (#16635 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-04-15 08:05:30 +00:00
Richard Zou	b590adfdc1	Fix vLLM x torch.compile config caching (#16491 ) Signed-off-by: rzou <zou3519@gmail.com>	2025-04-14 23:11:11 -07:00
Michael Goin	b4fe16c75b	Add `vllm bench [latency, throughput]` CLI commands (#16508 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-14 23:10:35 -07:00
Pooya Davoodi	bc5dd4f669	[Bugfix] Fix broken GritLM model and tests (missing pooling_metadata) (#16631 ) Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>	2025-04-14 23:09:58 -07:00
Tyler Michael Smith	dbb036cf61	[Bugfix] Fix tests/kernels/test_mamba_ssm_ssd.py (#16623 ) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-04-15 05:35:38 +00:00
Taneem Ibrahim	70e7ed841d	[BugFix]: Update minimum `pyzmq` version (#16549 ) Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com> Co-authored-by: mgoin <michael@neuralmagic.com>	2025-04-14 20:06:03 -07:00
Jinzhen Lin	d06ba4ed3f	[Kernel] moe wna16 marlin kernel (#14447 ) Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com> Co-authored-by: Michael Goin <michael@neuralmagic.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-04-14 20:05:22 -07:00
Alex Brooks	6b40996ae8	[Core][Bugfix] Fix Offline MM Beam Search (#16390 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-04-15 10:33:02 +08:00
Shuqiao Li	d2020acac7	config check sleep mode support oot platforms (#16562 )	2025-04-14 16:31:50 -07:00
Chengji Yao	1eb3c2ed48	[DOC][TPU] Add core idea about avoiding recompilation after warmup (#16614 ) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-04-14 21:56:06 +00:00
Siyuan Liu	c64ee87267	[Hardware][TPU] Add torchvision to tpu dependency file (#16616 ) Signed-off-by: Siyuan Liu <lsiyuan@google.com>	2025-04-14 17:50:46 -04:00
courage17340	b1308b84a3	[Model][VLM] Add Kimi-VL model support (#16387 ) Signed-off-by: courage17340 <courage17340@163.com>	2025-04-14 21:41:48 +00:00
Nishan Acharya	7b5ecf79bd	s390x: Fix PyArrow build and add CPU test script for Buildkite CI (#16036 ) Signed-off-by: Nishan Acharya <Nishan.Acharya@ibm.com>	2025-04-14 10:55:32 -07:00
Harry Mellor	9883a18859	Fix triton install condition on CPU (#16600 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-14 17:06:01 +00:00
Nicolò Lucchesi	b3f2fddd17	[TPU][V1] Fix exponential padding when `max-num-batched-tokens` is not a power of 2 (#16596 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-04-14 17:01:05 +00:00
Cyrus Leung	aa29841ede	[Bugfix] Multi-modal caches not acting like LRU caches (#16593 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-14 09:24:16 -07:00
Md. Shafi Hussain	6bf27affb6	[fix]: Dockerfile.ppc64le fixes for opencv-python and hf-xet (#16048 ) Signed-off-by: Md. Shafi Hussain <Md.Shafi.Hussain@ibm.com>	2025-04-14 17:08:39 +01:00
shangmingc	1dd23386ec	[Misc] Update usage with mooncake lib for kv transfer (#16523 ) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-04-14 11:31:37 +00:00
Reid	7cbfc10943	[Misc] refactor examples (#16563 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-04-14 09:59:15 +00:00
DefTruth	ce4ddd2d1a	[Misc] remove warning if triton>=3.2.0 (#16553 ) Signed-off-by: DefTruth <qiustudent_r@163.com>	2025-04-14 02:39:47 -07:00

5868 Commits