biondizzle/vllm - vllm

Author	SHA1	Message	Date
Woosuk Kwon	371d04d39b	[V1] Use FlashInfer Sampling Kernel for Top-P & Top-K Sampling (#11394 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-12-27 09:32:38 +09:00
Simon Mo	f49777ba62	Deepseek v3 (#11502 ) Signed-off-by: mgoin <michael@neuralmagic.com> Co-authored-by: mgoin <michael@neuralmagic.com> Co-authored-by: robertgshaw2-neuralmagic <rshaw@neuralmagic.com>	2024-12-26 16:09:44 -08:00
Robert Shaw	55fb97f7bd	[2/N] API Server: Avoid ulimit footgun (#11530 )	2024-12-26 23:43:05 +00:00
Michael Goin	2072924d14	[Model] [Quantization] Support deepseek_v3 w8a8 fp8 block-wise quantization (#11523 ) Signed-off-by: mgoin <michael@neuralmagic.com> Signed-off-by: simon-mo <simon.mo@hey.com> Signed-off-by: simon-mo <xmo@berkeley.edu> Co-authored-by: simon-mo <simon.mo@hey.com> Co-authored-by: simon-mo <xmo@berkeley.edu> Co-authored-by: HandH1998 <1335248067@qq.com>	2024-12-26 15:33:30 -08:00
Robert Shaw	720b10fdc6	[1/N] API Server (Remove Proxy) (#11529 )	2024-12-26 23:03:43 +00:00
Cyrus Leung	eec906d811	[Misc] Add placeholder module (#11501 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-26 13:12:51 +00:00
Jee Jee Li	f57ee5650d	[Model] Modify MolmoForCausalLM MLP (#11510 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2024-12-26 13:12:05 +00:00
sroy745	dcb1a944d4	[V1] Adding min tokens/repetition/presence/frequence penalties to V1 sampler (#10681 ) Signed-off-by: Sourashis Roy <sroy@roblox.com> Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-12-26 19:02:58 +09:00
Jee Jee Li	aa25985bd1	[Misc][LoRA] Fix LoRA weight mapper (#11495 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2024-12-26 15:52:48 +08:00
Lucas Tucker	dbeac95dbb	Mypy checking for vllm/compilation (#11496 ) Signed-off-by: lucast2021 <lucast2021@headroyce.org> Co-authored-by: lucast2021 <lucast2021@headroyce.org>	2024-12-26 05:04:07 +00:00
Cyrus Leung	51a624bf02	[Misc] Move some multimodal utils to modality-specific modules (#11494 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-26 04:23:20 +00:00
Cyrus Leung	b689ada91e	[Frontend] Enable decord to load video from base64 (#11492 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-25 16:33:55 +00:00
Rui Qiao	9832e5572a	[V1] Unify VLLM_ENABLE_V1_MULTIPROCESSING handling in RayExecutor (#11472 )	2024-12-24 19:49:46 -08:00
Cyrus Leung	3f3e92e1f2	[Model] Automatic conversion of classification and reward models (#11469 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-24 18:22:22 +00:00
Jee Jee Li	196c34b0ac	[Misc] Move weights mapper (#11443 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2024-12-24 13:05:25 +00:00
Mengqing Cao	5c7963249d	[attn][tiny fix] fix attn backend in MultiHeadAttention (#11463 ) Signed-off-by: Mengqing Cao <cmq0113@163.com>	2024-12-24 12:39:36 +00:00
Isotr0py	7a5286cc04	[Bugfix][Hardware][CPU] Fix CPU `input_positions` creation for text-only inputs with mrope (#11434 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2024-12-24 17:59:51 +08:00
Jee Jee Li	b1b1038fbd	[Bugfix] Fix Qwen2-VL LoRA weight loading (#11430 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2024-12-24 09:56:10 +00:00
Cyrus Leung	9edca6bf8f	[Frontend] Online Pooling API (#11457 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-24 17:54:30 +08:00
dpxa	4f074fbf53	[Misc]Suppress irrelevant exception stack trace information when CUDA… (#11438 ) Co-authored-by: shiquan <shiquan>	2024-12-24 08:43:39 +00:00
Rui Qiao	a491d6f535	[V1] TP Ray executor (#11107 ) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2024-12-23 23:00:12 +00:00
Rafael Vasquez	32aa2059ad	[Docs] Convert rST to MyST (Markdown) (#11145 ) Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>	2024-12-23 22:35:38 +00:00
yansh97	94d545a1a1	[Doc] Fix typo in the help message of '--guided-decoding-backend' (#11440 )	2024-12-23 20:20:44 +00:00
Michael Goin	60fb4f3bcf	[Bugfix] Add kv cache scales to gemma2.py (#11269 )	2024-12-23 19:30:45 +00:00
Dipika Sikka	b866cdbd05	[Misc] Add assertion and helpful message for marlin24 compressed models (#11388 )	2024-12-24 02:23:38 +08:00
Michael Goin	5bfb30a529	[Bugfix] Fix CFGGuide and use outlines for grammars that can't convert to GBNF (#11389 ) Signed-off-by: mgoin <michael@neuralmagic.com>	2024-12-23 23:06:20 +08:00
Lucas Tucker	e51719ae72	mypy type checking for vllm/worker (#11418 ) Signed-off-by: lucast2021 <lucast2021@headroyce.org> Co-authored-by: lucast2021 <lucast2021@headroyce.org>	2024-12-23 13:55:49 +00:00
youkaichao	f30581c518	[misc][perf] remove old code (#11425 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-23 08:01:08 +00:00
Jason T. Greene	f1d1bf6288	[Bugfix] Fix fully sharded LoRAs with Mixtral (#11390 ) Signed-off-by: Jason Greene <jason.greene@redhat.com>	2024-12-22 23:25:10 +08:00
Roger Wang	c2d1b075ba	[Bugfix] Fix issues for `Pixtral-Large-Instruct-2411` (#11393 ) Signed-off-by: ywang96 <ywang@example.com> Co-authored-by: ywang96 <ywang@example.com>	2024-12-21 10:15:03 +00:00
Ricky Xu	584f0ae40d	[V1] Make AsyncLLMEngine v1-v0 opaque (#11383 ) Signed-off-by: Ricky Xu <xuchen727@hotmail.com>	2024-12-21 15:14:08 +08:00
George	51ff216d85	[Bugfix] update should_ignore_layer (#11354 ) Signed-off-by: George Ohashi <george@neuralmagic.com>	2024-12-21 06:36:23 +00:00
Woosuk Kwon	dd2b5633dd	[V1][Bugfix] Skip hashing empty or None mm_data (#11386 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-12-21 14:22:21 +09:00
Michael Goin	d573aeadcc	[Bugfix] Don't log OpenAI field aliases as ignored (#11378 ) Signed-off-by: mgoin <michael@neuralmagic.com>	2024-12-20 19:03:50 +00:00
omer-dayan	995f56236b	[Core] Loading model from S3 using RunAI Model Streamer as optional loader (#10192 ) Signed-off-by: OmerD <omer@run.ai>	2024-12-20 16:46:24 +00:00
Roger Wang	04139ade59	[V1] Fix profiling for models with merged input processor (#11370 ) Signed-off-by: ywang96 <ywang@roblox.com>	2024-12-20 12:04:21 +00:00
youkaichao	c954f21ac0	[misc] add early error message for custom ops (#11355 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-19 21:18:25 -08:00
Wallas Henrique	86c2d8fd1c	[Bugfix] Fix spec decoding when seed is none in a batch (#10863 ) Signed-off-by: Wallas Santos <wallashss@ibm.com>	2024-12-20 05:15:31 +00:00
Michael Goin	b880ffb87e	[Misc] Add tqdm progress bar during graph capture (#11349 ) Signed-off-by: mgoin <michael@neuralmagic.com>	2024-12-20 04:35:18 +00:00
Akash kaothalkar	48edab8041	[Bugfix][Hardware][POWERPC] Fix auto dtype failure in case of POWER10 (#11331 ) Signed-off-by: Akash Kaothalkar <0052v2@linux.vnet.ibm.com>	2024-12-20 01:32:07 +00:00
yangzhibin	e461c262f0	[Misc] Remove unused vllm/block.py (#11336 )	2024-12-19 17:54:24 +00:00
Isotr0py	276738ce0f	[Bugfix] Fix broken CPU compressed-tensors test (#11338 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2024-12-19 17:37:31 +00:00
Cyrus Leung	cdf22afdda	[Misc] Clean up and consolidate LRUCache (#11339 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-20 00:59:32 +08:00
Isotr0py	e24113a8fe	[Model] Refactor Qwen2-VL to use merged multimodal processor (#11258 ) Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-19 16:28:00 +00:00
Roger Wang	7379b3d4b2	[V1] Fix multimodal profiling for `Molmo` (#11325 ) Signed-off-by: ywang96 <ywang@example.com> Co-authored-by: ywang96 <ywang@example.com>	2024-12-19 16:27:22 +00:00
Yehoshua Cohen	6c7f881541	[Model] Add JambaForSequenceClassification model (#10860 ) Signed-off-by: Yehoshua Cohen <yehoshuaco@ai21.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Yehoshua Cohen <yehoshuaco@ai21.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-19 22:48:06 +08:00
Cyrus Leung	a0f7d53beb	[Bugfix] Cleanup Pixtral HF code (#11333 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-19 13:22:00 +00:00
Yanyi Liu	5aef49806d	[Feature] Add load generation config from model (#11164 ) Signed-off-by: liuyanyi <wolfsonliu@163.com> Signed-off-by: Yanyi Liu <wolfsonliu@163.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2024-12-19 10:50:38 +00:00
Rui Qiao	f26c4aeecb	[Misc] Optimize ray worker initialization time (#11275 ) Signed-off-by: Rui Qiao <ruisearch42@gmail.com> Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>	2024-12-18 23:38:02 -08:00
Cyrus Leung	6142ef0ada	[VLM] Merged multimodal processor for Qwen2-Audio (#11303 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-19 06:14:17 +00:00

2621 Commits