biondizzle/vllm - vllm

Author	SHA1	Message	Date
Sijia(Jackson) Chen	92edf35826	[ROCM] enable aiter fused moe kernel for llama4 bf16 checkpoints (#16674 )	2025-04-17 11:44:34 -07:00
Nicolò Lucchesi	eb5819b2d9	[V1][TPU] Enable Top K (#15489 ) Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Hyesoo Yang <hyeygit@gmail.com> Co-authored-by: Hyesoo Yang <hyeygit@gmail.com>	2025-04-17 18:18:11 +00:00
Nicolò Lucchesi	5989f4684d	[TPU][V1] Fix padding recompilation when `max-num-batched-tokens` is not even (#16726 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-04-17 18:09:57 +00:00
rongfu.leng	5125d72f02	[Model] use AutoWeightsLoader for olmoe,opt,orion,persimmon,phi3_small (#16548 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-04-17 17:48:31 +00:00
Ximingwang-09	a018e555fd	[Kernel] Add fp8_w8a8 fused MoE kernel tuning configs for DeepSeek V3/R1 on NVIDIA H20 (#16753 ) Signed-off-by: ximing.wxm <ximing.wxm@antgroup.com> Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com>	2025-04-18 00:01:30 +08:00
Robin	6211b92273	[Bugfix]Fix index out of range error in api server log (#16787 ) Signed-off-by: WangErXiao <863579016@qq.com>	2025-04-17 09:01:07 -07:00
Nick Hill	05fcd1b430	[V1][Perf] Faster incremental detokenization (#15137 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-17 07:45:24 -07:00
Insu Kim	7c02d6a137	[Doc] Changed explanation of generation_tokens_total and prompt_tokens_total counter type metrics to avoid confusion (#16784 ) Signed-off-by: insukim1994 <insu.kim@moreh.io>	2025-04-17 14:10:08 +00:00
wang.yuqi	11c3b98491	[Doc] Document Matryoshka Representation Learning support (#16770 )	2025-04-17 13:37:37 +00:00
Cyrus Leung	dbe7f07001	[Doc] Make sure to update vLLM when installing latest code (#16781 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-17 06:53:31 -06:00
Reid	c69bf4ee06	fix: hyperlink (#16778 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-04-17 11:34:20 +00:00
Harry Mellor	d27ea94034	Improve configs - `TokenizerPoolConfig` + `DeviceConfig` (#16603 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-17 11:19:42 +00:00
Reid	99ed526101	[Misc] refactor examples series - lmcache (#16758 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-04-17 11:02:35 +00:00
Michael Yao	207da28186	[Doc] Fix a 404 link in installation/cpu.md (#16773 ) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-04-17 10:46:21 +00:00
intervitens	5b1aca2ae3	[Bugfix] Fix GLM4 model (#16618 ) Signed-off-by: intervitens <intervitens@tutanota.com>	2025-04-17 03:35:07 -07:00
Reid	d8e557b5e5	[doc] add open-webui example (#16747 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-04-17 18:27:32 +08:00
Cyrus Leung	61a44a0b22	[Doc] Add more tips to avoid OOM (#16765 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-17 09:54:34 +00:00
DefTruth	a6481525b8	[misc] ignore marlin_moe_wna16 local gen codes (#16760 ) Signed-off-by: DefTruth <qiustudent_r@163.com>	2025-04-17 17:15:14 +08:00
Richard Liaw	8cac35ba43	[Ray] Improve documentation on batch inference (#16609 ) Signed-off-by: Richard Liaw <rliaw@berkeley.edu>	2025-04-16 22:19:26 -07:00
Russell Bryant	9dbf7a2dc1	[V1] Remove log noise when idle (#16735 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-04-16 21:34:08 -07:00
David Heineman	607029e515	[Bugfix] Revert max_prompt_len validation for decoder-only models. (#16741 ) Signed-off-by: David Heineman <david@davidheineman.com>	2025-04-16 21:33:15 -07:00
Isotr0py	cb072ce93b	[Bugfix] Update Florence-2 tokenizer to make grounding tasks work (#16734 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-04-17 04:17:39 +00:00
Divakar Verma	95aca283b4	[rocm][V0] fix selection logic for custom PA in V0 (#16426 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-04-16 19:52:11 -07:00
Robert Shaw	2b05b8ce69	[V1][Frontend] Improve Shutdown And Logs (#11737 ) Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Andrew Feldman <afeldman@neuralmagic.com> Co-authored-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-04-16 19:48:34 -07:00
Aaruni Aggarwal	3c776dcefb	Adding vllm buildkite job for IBM Power (#16679 ) Signed-off-by: Aaruni Aggarwal <aaruniagg@gmail.com>	2025-04-17 10:47:47 +08:00
Bryan Lu	2cbd4d2999	[V1][Spec Dec Bug Fix] Respect Spec Dec Method Specification (#16636 ) Signed-off-by: Bryan Lu <yuzhelu@amazon.com>	2025-04-16 19:47:26 -07:00
Staszek Paśko	3092375e27	[V1][Performance] Implement custom serializaton for MultiModalKwargs [Rebased] (#16432 ) Signed-off-by: Staszek Pasko <staszek@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-04-16 19:28:32 -07:00
Harry Mellor	3cd91dc955	Help user create custom model for Transformers backend remote code models (#16719 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-17 01:05:59 +00:00
Jade Zheng	8a7368e069	[Misc] Remove redundant comment (#16703 ) Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>	2025-04-17 00:44:52 +00:00
Harry Mellor	93e561ec4d	Improve error for structured output backend selection (#16717 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-17 00:35:35 +00:00
Joe Runde	e1b004839a	[Hardware] Add processor inputs to platform validation (#16680 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2025-04-16 09:28:42 -07:00
xsank	ee378f3d49	[Model] support modernbert (#16648 ) Signed-off-by: 唯勤 <xsank.mz@alibaba-inc.com> Co-authored-by: 唯勤 <xsank.mz@alibaba-inc.com>	2025-04-16 05:30:15 -07:00
DefTruth	e82ee40de3	[Bugfix][Kernel] fix potential cuda graph broken for merge_attn_states kernel (#16693 ) Signed-off-by: DefTruth <qiustudent_r@163.com>	2025-04-16 03:31:39 -07:00
Cyrus Leung	facbe2a114	[Doc] Improve OOM troubleshooting (#16704 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-16 18:29:48 +08:00
Reid	7168920491	[Misc] refactor examples series (#16708 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-04-16 10:16:36 +00:00
Kay Yan	21378a2323	[CI] Cleanup `additional_dependencies: [toml]` for pre-commit yapf hook (#16405 ) Signed-off-by: Kay Yan <kay.yan@daocloud.io>	2025-04-16 10:05:31 +00:00
Shanshan Shen	976711d9db	[V1][Structured Output] Move xgrammar related utils to `backend_xgrammar.py` (#16578 ) Signed-off-by: shen-shanshan <467638484@qq.com>	2025-04-16 17:01:36 +08:00
Sage Moore	44fa4d556c	[ROCM] Bind triton version to 3.2 in requirements-built.txt (#16664 ) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-04-16 14:05:28 +08:00
billishyahao	3ac98edcb1	[Feature] add model aware kv ops helper (#16020 ) Signed-off-by: billishyahao <bill.he@amd.com>	2025-04-15 23:00:43 -07:00
Richard Zou	966c742ed2	Disable remote caching when calling compile_fx (#16611 ) Signed-off-by: rzou <zou3519@gmail.com>	2025-04-15 22:18:28 -07:00
Jee Jee Li	0d7d05f4b6	[Misc] Modify LRUCache touch (#16689 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-04-16 04:51:38 +00:00
rongfu.leng	96bb8aa68b	[Bugfix] fix gpu docker image mis benchmarks dir (#16628 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-04-15 21:21:14 -07:00
Shinichi Hemmi	3badb0213b	[Model] Add PLaMo2 (#14323 ) Signed-off-by: Shinichi Hemmi <50256998+Alnusjaponica@users.noreply.github.com> Signed-off-by: shemmi <shemmi@preferred.jp> Co-authored-by: Kento Nozawa <nzw0301@preferred.jp> Co-authored-by: Hiroaki Mikami <mhiroaki@preferred.jp> Co-authored-by: Calvin Metzger <metzger@preferred.jp>	2025-04-15 19:31:30 -07:00
Angky William	fdcb850f14	[Misc] Enable vLLM to Dynamically Load LoRA from a Remote Server (#10546 ) Signed-off-by: Angky William <angkywilliam@Angkys-MacBook-Pro.local> Co-authored-by: Angky William <angkywilliam@Angkys-MacBook-Pro.local>	2025-04-15 22:31:38 +00:00
Dipika Sikka	54a66e5fee	[Misc] Update `compressed-tensors` WNA16 to support zero-points (#14211 )	2025-04-15 07:33:51 -06:00
DefTruth	280d62b8a2	[Kernel] Remove redundant Exp calculations (#16123 ) Signed-off-by: DefTruth <qiustudent_r@163.com>	2025-04-15 12:58:37 +00:00
Xihui Cang	1666e66443	Add "/server_info" endpoint in api_server to retrieve the vllm_config. (#16572 ) Signed-off-by: Xihui Cang <xihuicang@gmail.com>	2025-04-15 11:50:38 +00:00
Jee Jee Li	1575c1701a	[CI/Build] Fix LoRA OOM (#16624 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-04-15 16:38:19 +08:00
Reid	6ae996a873	[Misc] refactor argument parsing in examples (#16635 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-04-15 08:05:30 +00:00
Richard Zou	b590adfdc1	Fix vLLM x torch.compile config caching (#16491 ) Signed-off-by: rzou <zou3519@gmail.com>	2025-04-14 23:11:11 -07:00

14386 Commits