biondizzle/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Cyrus Leung	3f3e92e1f2	[Model] Automatic conversion of classification and reward models (#11469 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-24 18:22:22 +00:00
Jee Jee Li	196c34b0ac	[Misc] Move weights mapper (#11443 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2024-12-24 13:05:25 +00:00
Jee Jee Li	b1b1038fbd	[Bugfix] Fix Qwen2-VL LoRA weight loading (#11430 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2024-12-24 09:56:10 +00:00
Cyrus Leung	9edca6bf8f	[Frontend] Online Pooling API (#11457 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-24 17:54:30 +08:00
Rui Qiao	a491d6f535	[V1] TP Ray executor (#11107 ) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2024-12-23 23:00:12 +00:00
Michael Goin	63afbe9215	[CI] Expand OpenAI test_chat.py guided decoding tests (#11048 ) Signed-off-by: mgoin <michael@neuralmagic.com>	2024-12-23 18:35:38 +00:00
Dipika Sikka	8cef6e02dc	[Misc] add w8a8 asym models (#11075 )	2024-12-23 13:33:20 -05:00
Michael Goin	5bfb30a529	[Bugfix] Fix CFGGuide and use outlines for grammars that can't convert to GBNF (#11389 ) Signed-off-by: mgoin <michael@neuralmagic.com>	2024-12-23 23:06:20 +08:00
Jason T. Greene	f1d1bf6288	[Bugfix] Fix fully sharded LoRAs with Mixtral (#11390 ) Signed-off-by: Jason Greene <jason.greene@redhat.com>	2024-12-22 23:25:10 +08:00
Roger Wang	29c748930e	[CI] Fix flaky entrypoint tests (#11403 ) Signed-off-by: Roger Wang <ywang@roblox.com>	2024-12-21 21:08:44 -08:00
omer-dayan	995f56236b	[Core] Loading model from S3 using RunAI Model Streamer as optional loader (#10192 ) Signed-off-by: OmerD <omer@run.ai>	2024-12-20 16:46:24 +00:00
Wallas Henrique	86c2d8fd1c	[Bugfix] Fix spec decoding when seed is none in a batch (#10863 ) Signed-off-by: Wallas Santos <wallashss@ibm.com>	2024-12-20 05:15:31 +00:00
Isotr0py	e24113a8fe	[Model] Refactor Qwen2-VL to use merged multimodal processor (#11258 ) Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-19 16:28:00 +00:00
Yehoshua Cohen	6c7f881541	[Model] Add JambaForSequenceClassification model (#10860 ) Signed-off-by: Yehoshua Cohen <yehoshuaco@ai21.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Yehoshua Cohen <yehoshuaco@ai21.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-19 22:48:06 +08:00
Yanyi Liu	5aef49806d	[Feature] Add load generation config from model (#11164 ) Signed-off-by: liuyanyi <wolfsonliu@163.com> Signed-off-by: Yanyi Liu <wolfsonliu@163.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2024-12-19 10:50:38 +00:00
Cyrus Leung	6142ef0ada	[VLM] Merged multimodal processor for Qwen2-Audio (#11303 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-19 06:14:17 +00:00
Michael Goin	a30482f054	[CI] Expand test_guided_generate to test all backends (#11313 ) Signed-off-by: mgoin <michael@neuralmagic.com>	2024-12-19 04:00:38 +00:00
Travis Johnson	17ca964273	[Model] IBM Granite 3.1 (#11307 ) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>	2024-12-19 11:27:24 +08:00
Tyler Michael Smith	5a9da2e6e9	[Bugfix][Build/CI] Fix sparse CUTLASS compilation on CUDA [12.0, 12.2) (#11311 ) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2024-12-19 02:43:30 +00:00
Joe Runde	ca5f54a9b9	[Bugfix] fix minicpmv test (#11304 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-12-18 10:34:26 -08:00
Isotr0py	996aa70f00	[Bugfix] Fix broken phi3-v mm_processor_kwargs tests (#11263 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2024-12-18 10:16:40 -08:00
Dipika Sikka	60508ffda9	[Kernel]: Cutlass 2:4 Sparsity + FP8/Int8 Quant Support (#10995 ) Co-authored-by: Faraz Shahsavan <faraz.shahsavan@gmail.com> Co-authored-by: ilmarkov <markovilya197@gmail.com> Co-authored-by: Rahul Tuli <rahul@neuralmagic.com> Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>	2024-12-18 09:57:16 -05:00
Wallas Henrique	8b79f9e107	[Bugfix] Fix guided decoding with tokenizer mode mistral (#11046 )	2024-12-17 22:34:08 -08:00
Cody Yu	bf8717ebae	[V1] Prefix caching for vision language models (#11187 ) Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>	2024-12-17 16:37:59 -08:00
Michael Goin	c77eb8a33c	[Bugfix] Set temperature=0.7 in test_guided_choice_chat (#11264 )	2024-12-17 16:34:06 -08:00
Joe Runde	2d1b9baa8f	[Bugfix] Fix request cancellation without polling (#11190 ) Some checks failed Create Release / Create Release (push) Has been cancelled Details Create Release / Build Wheel (11.8, ubuntu-20.04, 3.10, 2.4.0) (push) Has been cancelled Details Create Release / Build Wheel (11.8, ubuntu-20.04, 3.11, 2.4.0) (push) Has been cancelled Details Create Release / Build Wheel (11.8, ubuntu-20.04, 3.12, 2.4.0) (push) Has been cancelled Details Create Release / Build Wheel (11.8, ubuntu-20.04, 3.9, 2.4.0) (push) Has been cancelled Details Create Release / Build Wheel (12.1, ubuntu-20.04, 3.10, 2.4.0) (push) Has been cancelled Details Create Release / Build Wheel (12.1, ubuntu-20.04, 3.11, 2.4.0) (push) Has been cancelled Details Create Release / Build Wheel (12.1, ubuntu-20.04, 3.12, 2.4.0) (push) Has been cancelled Details Create Release / Build Wheel (12.1, ubuntu-20.04, 3.9, 2.4.0) (push) Has been cancelled Details	2024-12-17 12:26:32 -08:00
kYLe	66d4b16724	[Frontend] Add OpenAI API support for input_audio (#11027 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-16 22:09:58 -08:00
Michael Goin	0064f697d3	[CI] Add test case with JSON schema using references + use xgrammar by default with OpenAI parse (#10935 ) Signed-off-by: mgoin <michael@neuralmagic.com>	2024-12-17 11:39:58 +08:00
youkaichao	551603feff	[core] overhaul memory profiling and fix backward compatibility (#10511 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-16 13:32:25 -08:00
Isotr0py	d927dbcd88	[Model] Refactor Ultravox to use merged input processor (#11198 ) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-12-16 10:09:53 +00:00
Jani Monoses	bddbbcb132	[Model] Support Cohere2ForCausalLM (Cohere R7B) (#11203 )	2024-12-16 09:56:19 +00:00
Cyrus Leung	b10609e6a1	[Misc] Clean up multi-modal processor (#11207 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-15 06:30:28 +00:00
Cyrus Leung	93abf23a64	[VLM] Fully dynamic prompt replacement in merged input processor (#11199 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-14 17:52:18 +00:00
Brad Hilton	9c3dadd1c9	[Frontend] Add `logits_processors` as an extra completion argument (#11150 ) Signed-off-by: Brad Hilton <brad.hilton.nw@gmail.com>	2024-12-14 16:46:42 +00:00
Cyrus Leung	0920ab9131	[Doc] Reorganize online pooling APIs (#11172 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-14 00:22:22 +08:00
Sungjae Lee	c31d4a57a6	[Core] support LoRA and prompt adapter in content-based hashing for Block Manager v2 prefix caching (#8240 )	2024-12-13 07:51:25 -08:00
Cyrus Leung	eeec9e3390	[Frontend] Separate pooling APIs in offline inference (#11129 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-13 10:40:07 +00:00
youkaichao	be39e3cd18	[core] clean up cudagraph batchsize padding logic (#10996 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-13 06:57:50 +00:00
Pooya Davoodi	1efce68605	[Bugfix] Use runner_type instead of task in GritLM (#11144 ) Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>	2024-12-13 04:09:53 +00:00
Luka Govedič	30870b4f66	[torch.compile] Dynamic fp8 + rms_norm fusion (#10906 ) Signed-off-by: luka <luka@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2024-12-13 03:19:23 +00:00
Cody Yu	78ed8f57d8	[Misc][V1] Fix type in v1 prefix caching (#11151 )	2024-12-13 00:57:40 +00:00
Jiaxin Shan	85362f028c	[Misc][LoRA] Ensure Lora Adapter requests return adapter name (#11094 ) Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com> Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2024-12-12 09:25:16 +00:00
youkaichao	62de37a38e	[core][distributed] initialization from StatelessProcessGroup (#10986 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-12 09:04:19 +00:00
Pooya Davoodi	1da8f0e1dd	[Model] Add support for embedding model GritLM (#10816 ) Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>	2024-12-12 06:39:16 +00:00
Alexander Matveev	4e11683368	[V1] VLM preprocessor hashing (#11020 ) Signed-off-by: Roger Wang <ywang@roblox.com> Signed-off-by: Alexander Matveev <alexm@neuralmagic.com> Co-authored-by: Michael Goin <michael@neuralmagic.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2024-12-12 00:55:30 +00:00
Cyrus Leung	d1e21a979b	[CI/Build] Split up VLM tests (#11083 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-12 06:18:16 +08:00
Cyrus Leung	8f10d5e393	[Misc] Split up pooling tasks (#10820 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-11 01:28:00 -08:00
Cyrus Leung	2e33fe4191	[CI/Build] Check transformers v4.47 (#10991 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-11 05:02:02 +00:00
Mor Zusman	ffa48c9146	[Model] PP support for Mamba-like models (#10992 ) Signed-off-by: mzusman <mor.zusmann@gmail.com>	2024-12-10 21:53:37 -05:00
Aurick Qiao	d5c5154fcf	[Misc] LoRA + Chunked Prefill (#9057 )	2024-12-11 10:09:20 +08:00

... 18 19 20 21 22 ...

2109 Commits