Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

ce20124671 [release] Add force remove for TPU logs (#14697) Kevin H. Luu 2025-03-12 15:35:18 -07:00
53be4a8634 [V1] Allow sliding window + prefix caching (#13069) Woosuk Kwon 2025-03-12 11:21:19 -07:00
f5d3acd474 [BugFix][V1] Fix parallel sampling finishing/aborts (#14512) Nick Hill 2025-03-12 13:29:48 -04:00
916836bbfb [FEAT] [ROCm] [Embedding] Add encoder-only model support into ROCm Flash Attention to enable embedding models. (#14664) TJian 2025-03-13 00:31:19 +08:00
d9f83d6206 [ROCm] Enable chunked prefill/paged attention in MLA on ROCm (#14316) Sage Moore 2025-03-12 08:51:20 -07:00
4a754fcf15 [Bugfix] Missing thumbnail from NVLM-D processor (#14633) ameyanjarlekar 2025-03-12 08:50:49 -07:00
c0c25e25fa [Model] Add support for Gemma 3 (#14660) Woosuk Kwon 2025-03-12 08:36:33 -07:00
45f3f3f59e [ROCm][Bugfix] Ensure that the moe_wna16_gemm kernel is not built on ROCm platforms. (#14629) Sage Moore 2025-03-12 05:00:28 -07:00
ff47aab056 [CPU] Upgrade CPU backend to torch-2.6 (#13381) Li, Jiang 2025-03-12 18:41:13 +08:00
debd6bbf09 [Kernel] Add ModelOpt FP4 Checkpoint Support (#12520) Pavani Majety 2025-03-11 22:13:11 -07:00
5c538c37b2 [V1][Bugfix][Spec Decode] Fix incorrect outputs in V1 speculative decoding due to batch indexing (#14645) Benjamin Chislett 2025-03-12 01:12:41 -04:00
e22ee1e7a2 [Kernel] GGUF MoE kernel (#14613) Szymon Ożóg 2025-03-12 04:33:27 +01:00
e392d85831 [Core] Refactor QKVCrossParallelLinear implementation to support BNB 4-bit quantization (#14545) Isotr0py 2025-03-12 11:12:52 +08:00
77a318bd01 [V1][Core] Support MistralTokenizer for Structured Output (#14625) Aaron Pham 2025-03-11 22:40:09 -04:00
80e78d02ac [Model] Extend Ultravox to accept audio longer than 30s (#13631) Farzad Abdolhosseini 2025-03-11 19:27:10 -07:00
4a42b9f5d6 [Doc] Update benchmarks README (#14646) Jennifer Zhao 2025-03-11 19:23:04 -07:00
47532cd9f4 [core][V1] pluggable scheduler (#14466) Joe Runde 2025-03-11 19:15:15 -06:00
36e0c8f7da [Feature] Add vllm bench CLI (#13993) Randy Chen 2025-03-11 17:31:48 -07:00
9f583e360c [release] Add commands to clean up logs on TPU release node (#14642) Kevin H. Luu 2025-03-11 17:14:50 -07:00
b706d898af [Bugfix][V1][PP] Only warmup sampler at last PP rank (#14643) Cody Yu 2025-03-11 16:40:07 -07:00
863d315c86 [V1][TPU] Pad the block_table.shape[1] so the ragged paged attention can handle correctly (#14597) iefgnoix 2025-03-11 16:12:26 -07:00
d374f04a33 Fix run_tpu_test (#14641) Richard Liu 2025-03-11 14:14:33 -07:00
61a01b27a7 [V1] Delay all xgrammar usage until needed (#14616) Russell Bryant 2025-03-11 16:21:33 -04:00
53056731fd fix some typos : supported_head_sizes (#14627) Yang.Tao 2025-03-12 01:38:24 +08:00
4cbf286794 [V1] Remove cache from StructuredOutputManager (#14622) Russell Bryant 2025-03-11 13:36:07 -04:00
c6e14a61ab [Hardware][Intel GPU] upgrade IPEX dependency to 2.6.10. (#14564) Kunshang Ji 2025-03-11 10:11:47 -07:00
07b4b7a37f [BugFix/Build] Fix sparse kernels not getting built on hopper (#14572) Lucas Wilkinson 2025-03-11 13:09:03 -04:00
07964e2f30 docs: Add documentation for s390x cpu implementation (#14198) Dilip Gowda Bhagavan 2025-03-11 22:32:17 +05:30
4bf82d4b90 [V1] Add regex structured output support with xgrammar (#14590) Russell Bryant 2025-03-11 11:03:44 -04:00
9ab326713f Uninstall dependencies before installing requirements/tpu.txt (#14586) Richard Liu 2025-03-11 08:01:35 -07:00
af295e9b01 [Bugfix] Update --hf-overrides for Alibaba-NLP/gte-Qwen2 (#14609) Cyrus Leung 2025-03-11 22:59:43 +08:00
a1c8f3796c dynamic distpatch of fp8 kernels (#14245) Jeff Daily 2025-03-11 07:54:56 -07:00
08a1a1121d benchmarks: simplify test jsonschema (#14567) Russell Bryant 2025-03-11 09:39:30 -04:00
1477ffc381 [VLM] Cleanup siglip legacy code and fix broken paligemma multimodal processor (#14602) Isotr0py 2025-03-11 19:27:36 +08:00
70b808fe1a [Perf]:Optimize qwen2-vl to reduce cudaMemcpyAsync (#14377) yexin(叶鑫) 2025-03-11 15:39:56 +08:00
63d635d179 [Misc] Correct deepseek-vl2 chat template (#14558) Isotr0py 2025-03-11 12:37:11 +08:00
1fc973c0b5 [V1][Core] Fix memory issue with logits & sampling (#14508) Roger Wang 2025-03-10 21:03:41 -07:00
c982ac5722 [Bugfix] Fix FP16 overflow for DeepSeek V2 (#13232) Concurrensee 2025-03-10 22:46:59 -05:00
4290b704ff [V1][PP] Do not block engine core when no requests to schedule (#14585) Cody Yu 2025-03-10 19:48:24 -07:00
c91b64f749 [neuron] add reshape_and_cache (#14391) Liangfu Chen 2025-03-10 18:37:29 -07:00
d6123170d5 [Neuron] Add Neuron device communicator for vLLM v1 (#14085) gnovack 2025-03-10 18:37:04 -07:00
485afdd3cb [MISC][V1] Handle exception of current_platform.get_device_name() in arg_utils (#14379) Cody Yu 2025-03-10 17:42:11 -07:00
90e88ab756 [Kernel] moe wna16 cuda kernel (#13321) Jinzhen Lin 2025-03-11 08:12:40 +08:00
04421dff8a [V1] Prevent xgrammar from breaking TPU support (#14575) Russell Bryant 2025-03-10 19:06:19 -04:00
432d6dad15 Fix typo in benchmark_serving_structured_output.py (#14566) Russell Bryant 2025-03-10 17:58:58 -04:00
5ff0d32580 [V1] LoRA - Add triton kernels for V1 (#13096) Varun Sundar Rabindranath 2025-03-10 17:27:53 -04:00
0967110e42 [Minor] Update the tqdm bar for parallel sampling (#14571) Woosuk Kwon 2025-03-10 14:23:48 -07:00
fb0acb6c72 [Perf] Improve MLA on V1 (#14540) Simon Mo 2025-03-10 12:06:58 -07:00
92b0ce2ac7 [Bugfix][v1] fixed llava-hf/llava-1.5-7b-hf is broken on V1 (#14554) Chauncey 2025-03-11 02:24:51 +08:00
bc2d4473bf [Docs] Make installation URLs nicer (#14556) Harry Mellor 2025-03-10 18:43:08 +01:00
3b352a2f92 Correct capitalisation: VLLM -> vLLM (#14562) Harry Mellor 2025-03-10 17:36:21 +01:00
dea985aef0 [V1][Bugfix] Fix handing of second_per_grid_ts for Qwen2-VL & Qwen2.5-VL (#14548) Roger Wang 2025-03-10 09:03:11 -07:00
39be30351f Correct capitalisation: Github -> GitHub (#14561) Harry Mellor 2025-03-10 16:53:33 +01:00
001a9c7b0d [Doc] Update PaliGemma note to a warning (#14565) Cyrus Leung 2025-03-10 23:02:28 +08:00
89cdaa83e7 [Kernel] Add more dtype support for GGUF kernels (#14043) Szymon Ożóg 2025-03-10 15:30:04 +01:00
b0746fae3d [Frontend] support image embeds (#13955) Chauncey 2025-03-10 20:36:03 +08:00
60a98b2de5 [Docs] Mention model_impl arg when explaining Transformers fallback (#14552) Harry Mellor 2025-03-10 13:13:10 +01:00
460f553a6d [Misc] Add log information for handle_process_request. (#14130) Chauncey 2025-03-10 16:40:50 +08:00
1253b15774 [Feature] Consolidate performance benchmark datasets (#14036) Jennifer Zhao 2025-03-10 00:23:11 -07:00
dc74613fa2 [Bugfix] Wrong requirements path - rocm (#14527) Martin Hoyer 2025-03-10 03:49:46 +01:00
a21076ed3a [Misc] Ensure out-of-tree quantization method recognize by cli args (#14328) Yanyi Liu 2025-03-09 20:13:31 +08:00
212007b168 [Hardware][TPU] Fix the recompiling issue in logits processor after warmup (#14510) Chengji Yao 2025-03-09 01:44:39 -08:00
fb16eea48b [Bugfix] Revert QKVCrossParallelLinear usage in Mllama to keep BNB quantization work (#14498) Isotr0py 2025-03-09 12:47:45 +08:00
73ae0b44e9 [Bugfix] Fix tqdm progress bar when SamplingParams.n > 1 (#12428) Yuchen Yan 2025-03-09 12:14:53 +08:00
6d7f037748 [Feat] Support chunked prefill for LMCache connector (#14505) Jiayi Yao 2025-03-08 21:30:06 -06:00
10f7552789 [V1][TPU] Remove unnecessary padding for running on TPU. (#14467) iefgnoix 2025-03-08 18:56:04 -08:00
b0d541947a [Attention] Default to FlashMLA backend for MLA (#14451) Lucas Wilkinson 2025-03-08 21:18:39 -05:00
5f0b53c6ea Revert "[V1][Core] Fix memory issue with logits & sampling" (#14504) Robert Shaw 2025-03-08 20:43:37 -05:00
eb8b5eb183 [V1] Support bad_words in sampler (#13376) 22quinn 2025-03-08 14:50:26 -08:00
9513290032 [Misc] Upgrade to Python 3.9 typing for additional directories (#14492) Cyrus Leung 2025-03-09 01:35:50 +08:00
0d5e73d30e Update CODEOWNERS for structured output (#14496) Russell Bryant 2025-03-08 12:19:51 -05:00
609ef61fea [Bugfix] Fix profiling OOM and decouple encoder multimodal profiling (#14361) Isotr0py 2025-03-09 00:52:34 +08:00
db84f5eb3b [Bugfix] DeepSeek Accuracy (#14476) Lucas Wilkinson 2025-03-08 11:47:03 -05:00
206e2577fa Move requirements into their own directory (#12547) Harry Mellor 2025-03-08 17:44:35 +01:00
e02883c400 [Misc] Don't run ruff at all on 3rd party libs (#14493) Cyrus Leung 2025-03-08 23:16:40 +08:00
9085aabd62 [benchmarks] Add option to use unique jsonschema for each request (#14457) Russell Bryant 2025-03-08 09:36:39 -05:00
8d5aa466fb [V1][Core] Fix memory issue with logits & sampling (#13776) Roger Wang 2025-03-08 06:11:04 -08:00
0b7f06b447 [Misc] add use_tqdm_on_load to reduce logs (#14407) Aaron Pham 2025-03-08 08:57:46 -05:00
03fe18ae0f [VLM] Add TP support for Phi-4-MM (#14453) Isotr0py 2025-03-08 21:57:14 +08:00
cb8bdfade2 [V1] TPU - Add tensor parallel support via Ray (#13618) Alexander Matveev 2025-03-08 08:19:38 -05:00
33f227e16b [CI/Build] Use a fixed seed to avoid flaky tests (#14480) Cyrus Leung 2025-03-08 19:30:09 +08:00
cfd0ae8234 Add RLHF document (#14482) Harry Mellor 2025-03-08 10:51:39 +01:00
7caff01a7b [Build/BugFix] Fix hopper 12.8 build (#14354) Lucas Wilkinson 2025-03-08 03:11:56 -05:00
be0b399d74 Add training doc signposting to TRL (#14439) Harry Mellor 2025-03-08 08:35:07 +01:00
b8b0ccbd2d [Bugfix] Make the deviceprofiler include LoRA memory. (#14469) Jee Jee Li 2025-03-08 15:12:22 +08:00
c908a07f57 [Doc] Added QwQ-32B to the supported models list in the reasoning out… (#14479) Robin 2025-03-08 15:07:32 +08:00
7b6fd6e486 [Doc]add doc for Qwen models tool calling (#14478) Robin 2025-03-08 14:58:46 +08:00
47512b3200 Default to generation_config from model (#12622) Harry Mellor 2025-03-08 07:46:15 +01:00
3b9c6c6947 [CI/Build] refactor: set timezone of container to UTC (#12888) Roger Meier 2025-03-08 07:42:01 +01:00
4aae667668 [core] add extra_args to SamplingParams (#13300) Aviv Keshet 2025-03-07 22:41:18 -08:00
9f3bc0f58c [MISC][V1] Register process killing handler only in the main thread (#14380) Cody Yu 2025-03-07 22:40:06 -08:00
980385f8c1 [Bugfix][Disaggregated] Add a check in send_kv_caches_and_hidden_states and fix the reshape of the KVCache (#14369) Mathis Felardos 2025-03-08 07:39:31 +01:00
ca7a2d5f28 Revert "[Perf] Reduce MLA CPU overheads in V1 (#14384)" (#14471) Tyler Michael Smith 2025-03-08 01:18:53 -05:00
333681408f [Bugfix][V1] Handle MLA in kv_cache_interface (#14462) Tyler Michael Smith 2025-03-08 01:18:25 -05:00
ef64044079 [V1] Prompt logprobs + APC compatibility; prompt logprobs reqs cannot fill APC (#13949) afeldman-nm 2025-03-07 20:48:12 -05:00
66e16a038e [Bugfix] Fix torch_xla which can't handle None seed introduced in #14274 (#14459) yarongmu-google 2025-03-07 15:17:04 -08:00
e1f0835ae0 [V1][Metrics] Fix traceback with preemptions+LoRA (#14220) Mark McLoughlin 2025-03-07 20:36:16 +00:00
8ed5421aaa [V1] Eagerly remove finished requests from the batch (#14388) Nick Hill 2025-03-07 10:56:00 -08:00
c6359e8ca6 [v1] torch.compile integration explanation (#14437) youkaichao 2025-03-08 01:55:50 +08:00
952a074980 [Misc] Add Phi4-MM example (#14343) Jee Jee Li 2025-03-08 01:28:52 +08:00