biondizzle/vllm - vllm

Author	SHA1	Message	Date
zifeitong	e3dd0692fa	[BugFix] Propagate 'trust_remote_code' setting in internvl and minicpmv (#8250 )	2024-09-25 05:53:43 +00:00
Alex Brooks	8ff7ced996	[Model] Expose Phi3v num_crops as a mm_processor_kwarg (#8658 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-09-24 07:36:46 +00:00
Peter Salas	3f06bae907	[Core][Model] Support loading weights by ID within models (#7931 )	2024-09-24 07:14:15 +00:00
Jani Monoses	f2bd246c17	[VLM] Fix paligemma, fuyu and persimmon with transformers 4.45 : use config.text_config.vocab_size (#8707 )	2024-09-23 14:43:09 +00:00
Yanyi Liu	a79e522984	[Model] Support pp for qwen2-vl (#8696 )	2024-09-23 13:46:59 +00:00
litianjian	5b59532760	[Model][VLM] Add LLaVA-Onevision model support (#8486 ) Co-authored-by: litianjian <litianjian@bytedance.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Roger Wang <ywang@roblox.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-09-22 10:51:44 -07:00
Cyrus Leung	06ed2815e2	[Model] Refactor BLIP/BLIP-2 to support composite model loading (#8407 )	2024-09-22 12:24:21 +00:00
Isotr0py	13d88d4137	[Bugfix] Refactor composite weight loading logic (#8656 )	2024-09-22 04:33:27 +00:00
Divakar Verma	9dc7c6c7f3	[dbrx] refactor dbrx experts to extend FusedMoe class (#8518 )	2024-09-21 15:09:39 -06:00
Cyrus Leung	5e85f4f82a	[VLM] Use `SequenceData.from_token_counts` to create dummy data (#8687 )	2024-09-20 23:28:56 -07:00
zyddnys	0f961b3ce9	[Bugfix] Fix incorrect llava next feature size calculation (#8496 )	2024-09-20 22:48:32 +00:00
Niklas Muennighoff	3b63de9353	[Model] Add OLMoE (#7922 )	2024-09-20 09:31:41 -07:00
Amit Garg	18ae428a0d	[Bugfix] Fix Phi3.5 mini and MoE LoRA inference (#8571 )	2024-09-20 08:54:02 +08:00
Geun, Lim	e18749ff09	[Model] Support Solar Model (#8386 ) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-09-18 11:04:00 -06:00
Aaron Pham	9d104b5beb	[CI/Build] Update Ruff version (#8469 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-09-18 11:00:56 +00:00
Cyrus Leung	6ffa3f314c	[CI/Build] Avoid CUDA initialization (#8534 )	2024-09-18 10:38:11 +00:00
Joe Runde	98f9713399	[Bugfix] Fix TP > 1 for new granite (#8544 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-09-17 23:17:08 +00:00
sroy745	1009e93c5d	[Encoder decoder] Add cuda graph support during decoding for encoder-decoder models (#7631 )	2024-09-17 07:35:01 -07:00
Chris	3724d5f6b5	[Bugfix][Model] Fix Python 3.8 compatibility in Pixtral model by updating type annotations (#8490 )	2024-09-15 04:20:05 +00:00
ywfang	8a0cf1ddc3	[Model] support minicpm3 (#8297 ) Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-09-14 14:50:26 +00:00
Jee Jee Li	06311e2956	[Misc] Skip loading extra bias for Qwen2-VL GPTQ-Int8 (#8442 )	2024-09-13 07:58:28 +00:00
Wenxiang	a480939e8e	[Bugfix] Fix weight loading issue by rename variable. (#8293 )	2024-09-12 19:25:00 -04:00
Patrick von Platen	d31174a4e1	[Hotfix][Pixtral] Fix multiple images bugs (#8415 )	2024-09-12 15:21:51 -07:00
Roger Wang	c16369455f	[Hotfix][Core][VLM] Disable chunked prefill by default and prefix caching for multimodal models (#8425 )	2024-09-12 14:06:51 -07:00
Alex Brooks	c6202daeed	[Model] Support multiple images for qwen-vl (#8247 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-09-12 10:10:54 -07:00
Isotr0py	e56bf27741	[Bugfix] Fix InternVL2 inference with various num_patches (#8375 ) Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-09-12 10:10:35 -07:00
Blueyo0	1bf2dd9df0	[Gemma2] add bitsandbytes support for Gemma2 (#8338 )	2024-09-11 21:53:12 -07:00
Patrick von Platen	d394787e52	Pixtral (#8377 ) Co-authored-by: Roger Wang <ywang@roblox.com>	2024-09-11 14:41:55 -07:00
bnellnm	73202dbe77	[Kernel][Misc] register ops to prevent graph breaks (#6917 ) Co-authored-by: Sage Moore <sage@neuralmagic.com>	2024-09-11 12:52:19 -07:00
Yang Fan	3b7fea770f	[Model][VLM] Add Qwen2-VL model support (#7905 ) Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-09-11 09:31:19 -07:00
Yangshen⚡Deng	6a512a00df	[model] Support for Llava-Next-Video model (#7559 ) Co-authored-by: Roger Wang <ywang@roblox.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-09-10 22:21:36 -07:00
Isotr0py	1230263e16	[Bugfix] Fix InternVL2 vision embeddings process with pipeline parallel (#8299 )	2024-09-11 10:11:01 +08:00
Jee Jee Li	e497b8aeff	[Misc] Skip loading extra bias for Qwen2-MOE GPTQ models (#8329 )	2024-09-10 20:59:19 -04:00
Cyrus Leung	da1a844e61	[Bugfix] Fix missing `post_layernorm` in CLIP (#8155 )	2024-09-10 08:22:50 +00:00
Dipika Sikka	6cd5e5b07e	[Misc] Fused MoE Marlin support for GPTQ (#8217 )	2024-09-09 23:02:52 -04:00
Vladislav Kruglikov	f9b4a2d415	[Bugfix] Correct adapter usage for cohere and jamba (#8292 )	2024-09-09 11:20:46 -07:00
Isotr0py	36bf8150cc	[Model][VLM] Decouple weight loading logic for `Paligemma` (#8269 )	2024-09-07 17:45:44 +00:00
Isotr0py	e807125936	[Model][VLM] Support multi-images inputs for InternVL2 models (#8201 )	2024-09-07 16:38:23 +08:00
Cyrus Leung	2f707fcb35	[Model] Multi-input support for LLaVA (#8238 )	2024-09-07 02:57:24 +00:00
Patrick von Platen	29f49cd6e3	[Model] Allow loading from original Mistral format (#8168 ) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-09-06 17:02:05 -06:00
Alex Brooks	9da25a88aa	[MODEL] Qwen Multimodal Support (Qwen-VL / Qwen-VL-Chat) (#8029 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-09-05 12:48:10 +00:00
manikandan.tm@zucisystems.com	8685ba1a1e	Inclusion of InternVLChatModel In PP_SUPPORTED_MODELS(Pipeline Parallelism) (#7860 )	2024-09-05 11:33:37 +00:00
wnma	d3311562fb	[Bugfix] remove post_layernorm in siglip (#8106 )	2024-09-04 18:55:37 +08:00
Peter Salas	2be8ec6e71	[Model] Add Ultravox support for multiple audio chunks (#7963 )	2024-09-04 04:38:21 +00:00
Isotr0py	ec266536b7	[Bugfix][VLM] Add fallback to SDPA for ViT model running on CPU backend (#8061 )	2024-09-03 21:37:52 +08:00
Isotr0py	dd2a6a82e3	[Bugfix] Fix internlm2 tensor parallel inference (#8055 )	2024-09-02 23:48:56 +08:00
Shawn Tan	f8d60145b4	[Model] Add Granite model (#7436 ) Co-authored-by: Nick Hill <nickhill@us.ibm.com>	2024-09-01 18:37:18 -07:00
Roger Wang	5b86b19954	[Misc] Optional installation of audio related packages (#8063 )	2024-09-01 14:46:57 -07:00
Cyrus Leung	d05f0a9db2	[Bugfix] Fix import error in Phi-3.5-MoE (#8052 )	2024-08-30 22:26:55 -07:00
Wenxiang	1248e8506a	[Model] Adding support for MSFT Phi-3.5-MoE (#7729 ) Co-authored-by: Your Name <you@example.com> Co-authored-by: Zeqi Lin <zelin@microsoft.com> Co-authored-by: Zeqi Lin <Zeqi.Lin@microsoft.com>	2024-08-30 13:42:57 -06:00

372 Commits