biondizzle/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Tyler Michael Smith	09545c0a94	[Bugfix/CI] Turn test_compressed_tensors_2of4_sparse back on (#13250 )	2025-02-13 20:19:25 -08:00
Roger Wang	dd5ede4440	[V1] Consolidate MM cache size to vllm.envs (#13239 )	2025-02-13 20:19:03 -08:00
Jinzhen Lin	8c32b08a86	[Kernel] Fix awq error when n is not divisible by 128 (#13227 )	2025-02-13 20:07:05 -08:00
Gregory Shtrasberg	410886950a	[ROCm] Avoid using the default stream on ROCm (#13238 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-02-14 09:29:26 +08:00
Harry Mellor	e38be640e6	Revert "Add label if pre-commit passes" (#13242 )	2025-02-13 16:12:32 -08:00
Tyler Michael Smith	c1e37bf71b	[Kernel][Bugfix] Refactor and Fix CUTLASS 2:4 Sparse Kernels (#13198 ) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-02-14 00:01:14 +00:00
Michael Goin	2344192a55	Optimize moe_align_block_size for deepseek_v3 (#12850 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-02-13 18:43:37 -05:00
Harry Mellor	bffddd9a05	Add label if pre-commit passes (#12527 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-02-13 20:51:30 +00:00
Nicolò Lucchesi	d84cef76eb	[Frontend] Add `/v1/audio/transcriptions` OpenAI API endpoint (#12909 )	2025-02-13 07:23:45 -08:00
Vaibhav Jain	37dfa60037	[Bugfix] Missing Content Type returns 500 Internal Server Error (#13193 )	2025-02-13 06:52:22 -08:00
Cyrus Leung	1bc3b5e71b	[VLM] Separate text-only and vision variants of the same model architecture (#13157 )	2025-02-13 06:19:15 -08:00
燃	02ed8a1fbe	[Misc] Qwen2.5-VL Optimization (#13155 )	2025-02-13 06:17:57 -08:00
Aoyu	2092a6fa7d	[V1][Core] Add worker_base for v1 worker (#12816 ) Signed-off-by: Aoyu <aoyuzhan@amazon.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Aoyu <aoyuzhan@amazon.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-02-13 20:35:18 +08:00
Cyrus Leung	c9d3ecf016	[VLM] Merged multi-modal processor for Molmo (#12966 )	2025-02-13 04:34:00 -08:00
Roger Wang	fdcf64d3c6	[V1] Clarify input processing and multimodal feature caching logic (#13211 )	2025-02-13 03:43:24 -08:00
Russell Bryant	578087e56c	[Frontend] Pass pre-created socket to uvicorn (#13113 )	2025-02-13 00:51:46 -08:00
Isotr0py	fa253f1a70	[VLM] Remove input processor from clip and siglip (#13165 )	2025-02-13 00:31:37 -08:00
Rui Qiao	9605c1256e	[V1][core] Implement pipeline parallel on Ray (#12996 )	2025-02-13 08:02:46 +00:00
Russell Bryant	0ccd8769fb	[CI/Build] Allow ruff to auto-fix some issues (#13180 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-02-13 07:45:38 +00:00
Daniel Han	cb944d5818	Allow Unsloth Dynamic 4bit BnB quants to work (#12974 )	2025-02-12 23:13:08 -08:00
Russell Bryant	d46d490c27	[Frontend] Move CLI code into vllm.cmd package (#12971 )	2025-02-12 23:12:21 -08:00
LikeSundayLikeRain	04f50ad9d1	[Bugfix] deepseek_r1_reasoning_parser put reason content in wrong field in certain edge case (#13097 )	2025-02-12 23:11:26 -08:00
Cody Yu	60c68df6d1	[Build] Automatically use the wheel of the base commit with Python-only build (#13178 )	2025-02-12 23:10:28 -08:00
Lu Fang	009439caeb	Simplify logic of locating CUDART so file path (#13203 ) Signed-off-by: Lu Fang <lufang@fb.com>	2025-02-13 13:52:41 +08:00
Isotr0py	bc55d13070	[VLM] Implement merged multimodal processor for Mllama (#11427 )	2025-02-12 20:26:21 -08:00
Michael Goin	d88c8666a1	[Bugfix][Example] Fix GCed profiling server for TPU (#12792 ) Signed-off-by: mgoin <michael@neuralmagic.com>	2025-02-13 11:52:11 +08:00
Kaixi Hou	4fc5c23bb6	[NVIDIA] Support nvfp4 quantization (#12784 )	2025-02-12 19:51:51 -08:00
Kevin H. Luu	9f9704dca6	[perf-benchmark] cleanup unused Docker images and volumes in H100 benchmark instance (#12706 )	2025-02-12 19:51:33 -08:00
Russell Bryant	8eafe5eaea	[CI/Build] Ignore ruff warning up007 (#13182 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-02-13 11:48:31 +08:00
Murali Andoorveedu	4c0d93f4b2	[V1][Bugfix] Copy encoder input ids to fix set iteration issue during VLM abort (#13173 ) Signed-off-by: andoorve <37849411+andoorve@users.noreply.github.com>	2025-02-12 12:58:11 -08:00
Michael Goin	14b7899d10	[CI] Fix failing FP8 cpu offload test (#13170 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-02-12 19:16:06 +00:00
Michael Goin	09972e716c	[Bugfix] Allow fallback to AWQ from AWQMarlin at per-layer granularity (#13119 )	2025-02-12 09:19:53 -08:00
Qubitium-ModelCloud	36a08630e8	[CORE] [QUANT] Support for GPTQModel's `dynamic` quantization per module override/control (#7086 )	2025-02-12 09:19:43 -08:00
Russell Bryant	2c2b560f48	[CI/Build] Use mypy matcher for pre-commit CI job (#13162 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-02-12 17:12:22 +00:00
Lu Fang	042c3419fa	Introduce VLLM_CUDART_SO_PATH to allow users to specify the .so path (#12998 ) Signed-off-by: Lu Fang <lufang@fb.com>	2025-02-12 09:06:13 -08:00
Jee Jee Li	82cabf53a3	[Misc] Delete unused LoRA modules (#13151 )	2025-02-12 08:58:24 -08:00
Rafael Vasquez	314cfade02	[Frontend] Generate valid tool call IDs when using `tokenizer-mode=mistral` (#12332 )	2025-02-12 08:29:56 -08:00
Cyrus Leung	985b4a2b19	[Bugfix] Fix num video tokens calculation for Qwen2-VL (#13148 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-02-12 11:55:23 +00:00
bnellnm	f4d97e4fc2	[Bug] [V1] Try fetching stop_reason from EngineOutput before checking the request (#13108 )	2025-02-12 02:39:16 -08:00
Shiyan Deng	f1042e86f0	[Misc] AMD Build Improvements (#12923 )	2025-02-12 02:36:10 -08:00
Maximilien de Bayser	7c4033acd4	Further reduce the HTTP calls to huggingface.co (#13107 )	2025-02-12 02:34:09 -08:00
dependabot[bot]	d59def4730	Bump actions/setup-python from 5.3.0 to 5.4.0 (#12672 )	2025-02-12 16:41:22 +08:00
dependabot[bot]	0c7d9effce	Bump helm/chart-testing-action from 2.6.1 to 2.7.0 (#12463 )	2025-02-12 16:41:06 +08:00
dependabot[bot]	dd3b4a01f8	Bump actions/stale from 9.0.0 to 9.1.0 (#12462 )	2025-02-12 00:40:25 -08:00
dependabot[bot]	a0597c6b75	Bump helm/kind-action from 1.10.0 to 1.12.0 (#11612 )	2025-02-12 00:40:19 -08:00
Lingfan Yu	e92694b6fe	[Neuron][Kernel] Support Longer Sequences in NKI-based Flash PagedAttention and Improve Efficiency (#12921 ) Signed-off-by: Lingfan Yu <lingfany@amazon.com>	2025-02-11 21:12:37 -08:00
Kevin H. Luu	842b0fd402	[ci] Add more source file dependencies for some tests (#13123 ) Signed-off-by: <> Co-authored-by: EC2 Default User <ec2-user@ip-172-31-20-117.us-west-2.compute.internal>	2025-02-11 20:38:10 -08:00
Christian Pinto	974dfd4971	[Model] IBM/NASA Prithvi Geospatial model (#12830 )	2025-02-11 20:34:30 -08:00
Keyun Tong	3ee696a63d	[RFC][vllm-API] Support tokenizer registry for customized tokenizer in vLLM (#12518 ) Signed-off-by: Keyun Tong <tongkeyun@gmail.com>	2025-02-12 12:25:58 +08:00
Russell Bryant	72c2b68dc9	[Misc] Move pre-commit suggestion back to the end (#13114 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-02-11 22:34:16 +00:00


14386 Commits