biondizzle/vllm

Author	SHA1	Message	Date
youkaichao	92e793d91a	[core] LLM.collective_rpc interface and RLHF example (#12084 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-16 20:19:52 +08:00
youkaichao	bf53e0c70b	Support torchrun and SPMD-style offline inference (#12071 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-16 19:58:53 +08:00
Isotr0py	dd7c9ad870	[Bugfix] Remove hardcoded `head_size=256` for Deepseek v2 and v3 (#12067 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-01-16 10:11:54 +00:00
Michael Goin	9aa1519f08	Various cosmetic/comment fixes (#12089 ) Signed-off-by: mgoin <michael@neuralmagic.com>	2025-01-16 09:59:06 +00:00
Cyrus Leung	f8ef146f03	[Doc] Add documentation for specifying model architecture (#12105 )	2025-01-16 15:53:43 +08:00
Elfie Guo	fa0050db08	[Core] Default to using per_token quantization for fp8 when cutlass is supported. (#8651 ) Signed-off-by: mgoin <michael@neuralmagic.com> Co-authored-by: Michael Goin <mgoin@redhat.com> Co-authored-by: mgoin <michael@neuralmagic.com>	2025-01-16 04:31:27 +00:00
tvirolai-amd	cd9d06fb8d	Allow hip sources to be directly included when compiling for rocm. (#12087 )	2025-01-15 16:46:03 -05:00
Varun Sundar Rabindranath	ebd8c669ef	[Bugfix] Fix _get_lora_device for HQQ marlin (#12090 ) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-01-15 19:59:42 +00:00
Roger Wang	70755e819e	[V1][Core] Autotune encoder cache budget (#11895 ) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-01-15 11:29:00 -08:00
Joe Runde	edce722eaa	[Bugfix] use right truncation for non-generative tasks (#12050 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2025-01-16 00:31:01 +08:00
maang-h	57e729e874	[Doc]: Update `OpenAI-Compatible Server` documents (#12082 )	2025-01-15 16:07:45 +00:00
kewang-xlnx	de0526f668	[Misc][Quark] Upstream Quark format to VLLM (#10765 ) Signed-off-by: kewang-xlnx <kewang@xilinx.com> Signed-off-by: kewang2 <kewang2@amd.com> Co-authored-by: kewang2 <kewang2@amd.com> Co-authored-by: Michael Goin <michael@neuralmagic.com>	2025-01-15 11:05:15 -05:00
Yuan	5ecf3e0aaf	Misc: allow to use proxy in `HTTPConnection` (#12042 ) Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>	2025-01-15 13:16:40 +00:00
RunningLeon	97eb97b5a4	[Model]: Support internlm3 (#12037 )	2025-01-15 11:35:17 +00:00
wangxiyuan	3adf0ffda8	[Platform] Do not raise error if _Backend is not found (#12023 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Signed-off-by: Mengqing Cao <cmq0113@163.com> Co-authored-by: Mengqing Cao <cmq0113@163.com>	2025-01-15 10:14:15 +00:00
Keyun Tong	ad388d25a8	Type-fix: make execute_model output type optional (#12020 )	2025-01-15 09:44:56 +00:00
Rahul Tuli	cbe94391eb	Fix: cases with empty sparsity config (#12057 ) Signed-off-by: Rahul Tuli <rahul@neuralmagic.com>	2025-01-15 17:41:24 +08:00
Chen Zhang	994fc655b7	[V1][Prefix Cache] Move the logic of num_computed_tokens into KVCacheManager (#12003 )	2025-01-15 07:55:30 +00:00
Kyle Sayers	3f9b7ab9f5	[Doc] Update examples to remove SparseAutoModelForCausalLM (#12062 ) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>	2025-01-15 06:36:01 +00:00
youkaichao	ad34c0df0f	[core] platform agnostic executor via collective_rpc (#11256 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-15 13:45:21 +08:00
Rui Qiao	f218f9c24d	[core] Turn off GPU communication overlap for Ray executor (#12051 ) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2025-01-15 05:19:55 +00:00
Elfie Guo	0794e7446e	[Misc] Add multipstep chunked-prefill support for FlashInfer (#10467 )	2025-01-15 12:47:49 +08:00
Woosuk Kwon	b7ee940a82	[V1][BugFix] Fix edge case in VLM scheduling (#12065 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-01-14 20:21:28 -08:00
Shanshan Shen	9ddac56311	[Platform] move current_memory_usage() into platform (#11369 ) Signed-off-by: Shanshan Shen <467638484@qq.com>	2025-01-15 03:38:25 +00:00
Konrad Zawora	1a51b9f872	[HPU][Bugfix] Don't use /dev/accel/accel0 for HPU autodetection in setup.py (#12046 ) Signed-off-by: Konrad Zawora <kzawora@habana.ai>	2025-01-15 02:59:18 +00:00
Jee Jee Li	42f5e7c52a	[Kernel] Support MulAndSilu (#11624 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-01-15 02:29:53 +00:00
Jee Jee Li	a3a3ee4e6f	[Misc] Merge bitsandbytes_stacked_params_mapping and packed_modules_mapping (#11924 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-01-15 07:49:49 +08:00
maang-h	87054a57ab	[Doc]: Update the Json Example of the `Engine Arguments` document (#12045 )	2025-01-14 17:03:04 +00:00
Harry Mellor	c9d6ff530b	Explain where the engine args go when using Docker (#12041 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-14 16:05:50 +00:00
Chen Zhang	a2d2acb4c8	[Bugfix][Kernel] Give unique name to BlockSparseFlashAttention (#12040 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-01-14 15:45:05 +00:00
wangxiyuan	2e0e017610	[Platform] Add output for Attention Backend (#11981 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-01-14 13:27:04 +00:00
Chen Zhang	1f18adb245	[Kernel] Revert the API change of Attention.forward (#12038 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-01-14 20:59:32 +08:00
Cyrus Leung	bb354e6b2d	[Bugfix] Fix various bugs in multi-modal processor (#12031 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-14 12:16:11 +00:00
youkaichao	ff39141a49	[HPU][misc] add comments for explanation (#12034 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-14 19:24:06 +08:00
TJian	8a1f938e6f	[Doc] Update Quantization Hardware Support Documentation (#12025 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-01-14 04:37:52 +00:00
Konrad Zawora	078da31903	[HPU][Bugfix] set_forward_context and CI test execution (#12014 ) Signed-off-by: Konrad Zawora <kzawora@habana.ai>	2025-01-14 11:04:18 +08:00
Woosuk Kwon	1a401252b5	[Docs] Add Sky Computing Lab to project intro (#12019 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-01-13 17:24:36 -08:00
Steve Luo	f35ec461fc	[Bugfix] Fix deepseekv3 gate bias error (#12002 ) Signed-off-by: mgoin <michael@neuralmagic.com> Co-authored-by: mgoin <michael@neuralmagic.com>	2025-01-13 13:43:51 -07:00
Yikun Jiang	289b5191d5	[Doc] Fix build from source and installation link in README.md (#12013 ) Signed-off-by: Yikun <yikunkero@gmail.com>	2025-01-13 17:23:59 +00:00
elijah	c6db21313c	bugfix: Fix signature mismatch in benchmark's `get_tokenizer` function (#11982 ) Signed-off-by: elijah <f1renze.142857@gmail.com>	2025-01-13 15:22:07 +00:00
Shanshan Shen	a7d59688fb	[Platform] Move get_punica_wrapper() function to Platform (#11516 ) Signed-off-by: Shanshan Shen <467638484@qq.com>	2025-01-13 13:12:10 +00:00
youkaichao	458e63a2c6	[platform] add device_control env var (#12009 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-13 20:59:09 +08:00
Harry Mellor	e8c23ff989	[Doc] Organise installation documentation into categories and tabs (#11935 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-13 12:27:36 +00:00
Roger Wang	cd8249903f	[Doc][V1] Update model implementation guide for V1 support (#11998 ) Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-01-13 11:58:54 +00:00
Chen Zhang	0f8cafe2d1	[Kernel] unified_attention for Attention.forward (#11967 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-01-13 19:28:53 +08:00
Alex Brooks	5340a30d01	Fix Max Token ID for Qwen-VL-Chat (#11980 ) Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>	2025-01-13 08:37:48 +00:00
youkaichao	89ce62a316	[platform] add ray_device_key (#11948 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-13 16:20:52 +08:00
Chenguang Li	c3f05b09a0	[Misc]Minor Changes about Worker (#11555 ) Signed-off-by: Chenguang Li <757486878@qq.com>	2025-01-13 15:47:05 +08:00
Concurrensee	cf6bbcb493	[Misc] Fix Deepseek V2 fp8 kv-scale remapping (#11947 ) Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu>	2025-01-12 23:05:06 -08:00
Sungjae Lee	80ea3af1a0	[CI][Spec Decode] fix: broken test for EAGLE model (#11972 ) Signed-off-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com>	2025-01-13 06:50:35 +00:00

4170 Commits