Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

e7eea5a520 [V1][CI] Fix failed v1-test because of min_p (#13316) Woosuk Kwon 2025-02-14 17:29:51 -08:00
a12934d3ec [V1][Core] min_p sampling support (#13191) Aoyu 2025-02-15 07:50:05 +08:00
3bcb8c75da [Core] Reduce TTFT with concurrent partial prefills (#10235) Joe Runde 2025-02-14 16:36:07 -07:00
5e5c8e091e [Quant][Perf] Use moe_wna16 kernel by default for MoEs with many experts (#13236) Michael Goin 2025-02-14 15:53:42 -05:00
c9e2d644e7 [Hardware][Gaudi][Bugfix] Fix error for guided decoding (#12317) Yu-Zhou 2025-02-14 20:36:49 +08:00
7734e9a291 [Core] choice-based structured output with xgrammar (#12632) Russell Bryant 2025-02-14 07:36:05 -05:00
6224a9f620 Support logit_bias in v1 Sampler (#13079) Lu Fang 2025-02-14 04:34:59 -08:00
085b7b2d6c [V1] Simplify GPUModelRunner._update_states check (#13265) Nick Hill 2025-02-14 04:33:43 -08:00
4da1f667e9 [VLM] Keep track of whether prompt replacements have been applied (#13215) Cyrus Leung 2025-02-14 20:20:46 +08:00
556ef7f714 [Misc] Log time consumption of sleep and wake-up (#13115) Jun Duan 2025-02-14 07:10:21 -05:00
83481ceb49 [Bugfix] Fix missing parentheses (#13263) Xu Song 2025-02-14 17:07:10 +08:00
185cc19f92 [Frontend] Optionally remove memory buffer used for uploading to URLs in run_batch (#12927) Pooya Davoodi 2025-02-14 00:22:42 -08:00
45f90bcbba [WIP] TPU V1 Support Refactored (#13049) Alexander Matveev 2025-02-14 03:21:53 -05:00
b0ccfc565a [Bugfix][V1] GPUModelRunner._update_states should return True when there is a finished request in batch (#13126) Kero Liang 2025-02-14 14:39:20 +08:00
ba59b78a9c [ROCm][V1] Add intial ROCm support to V1 (#12790) Sage Moore 2025-02-13 22:21:50 -08:00
cbc40128eb [V1] LoRA - Enable Serving Usecase (#12883) Varun Sundar Rabindranath 2025-02-14 11:51:12 +05:30
f0b2da72a8 Expand MLA to support most types of quantization (#13181) Michael Goin 2025-02-14 01:19:22 -05:00
f2b20fe491 Consolidate Llama model usage in tests (#13094) Harry Mellor 2025-02-14 06:18:03 +00:00
40932d7a05 [Misc] Remove redundant statements in scheduler.py (#13229) Wang Ran (汪然) 2025-02-14 14:07:25 +08:00
84683fa271 [Bugfix] Offline example of disaggregated prefill (#13214) XiaobingZhang 2025-02-14 12:20:47 +08:00
067678262a [Bugfix][CI] Inherit codespell settings from pyproject.toml in the pre-commit-config (#13237) Tyler Michael Smith 2025-02-13 23:19:43 -05:00
09545c0a94 [Bugfix/CI] Turn test_compressed_tensors_2of4_sparse back on (#13250) Tyler Michael Smith 2025-02-13 23:19:25 -05:00
dd5ede4440 [V1] Consolidate MM cache size to vllm.envs (#13239) Roger Wang 2025-02-13 20:19:03 -08:00
8c32b08a86 [Kernel] Fix awq error when n is not divisable by 128 (#13227) Jinzhen Lin 2025-02-14 12:07:05 +08:00
410886950a [ROCm] Avoid using the default stream on ROCm (#13238) Gregory Shtrasberg 2025-02-13 20:29:26 -05:00
e38be640e6 Revert "Add label if pre-commit passes" (#13242) Harry Mellor 2025-02-14 00:12:32 +00:00
c1e37bf71b [Kernel][Bugfix] Refactor and Fix CUTLASS 2:4 Sparse Kernels (#13198) Tyler Michael Smith 2025-02-13 19:01:14 -05:00
2344192a55 Optimize moe_align_block_size for deepseek_v3 (#12850) Michael Goin 2025-02-13 18:43:37 -05:00
bffddd9a05 Add label if pre-commit passes (#12527) Harry Mellor 2025-02-13 20:51:30 +00:00
d84cef76eb [Frontend] Add /v1/audio/transcriptions OpenAI API endpoint (#12909) Nicolò Lucchesi 2025-02-13 16:23:45 +01:00
37dfa60037 [Bugfix] Missing Content Type returns 500 Internal Server Error (#13193) Vaibhav Jain 2025-02-13 20:22:22 +05:30
1bc3b5e71b [VLM] Separate text-only and vision variants of the same model architecture (#13157) Cyrus Leung 2025-02-13 22:19:15 +08:00
02ed8a1fbe [Misc] Qwen2.5-VL Optimization (#13155) 燃 2025-02-13 22:17:57 +08:00
2092a6fa7d [V1][Core] Add worker_base for v1 worker (#12816) Aoyu 2025-02-13 20:35:18 +08:00
c9d3ecf016 [VLM] Merged multi-modal processor for Molmo (#12966) Cyrus Leung 2025-02-13 20:34:00 +08:00
fdcf64d3c6 [V1] Clarify input processing and multimodal feature caching logic (#13211) Roger Wang 2025-02-13 03:43:24 -08:00
578087e56c [Frontend] Pass pre-created socket to uvicorn (#13113) Russell Bryant 2025-02-13 03:51:46 -05:00
fa253f1a70 [VLM] Remove input processor from clip and siglip (#13165) Isotr0py 2025-02-13 16:31:37 +08:00
9605c1256e [V1][core] Implement pipeline parallel on Ray (#12996) Rui Qiao 2025-02-13 00:02:46 -08:00
0ccd8769fb [CI/Build] Allow ruff to auto-fix some issues (#13180) Russell Bryant 2025-02-13 02:45:38 -05:00
cb944d5818 Allow Unsloth Dynamic 4bit BnB quants to work (#12974) Daniel Han 2025-02-12 23:13:08 -08:00
d46d490c27 [Frontend] Move CLI code into vllm.cmd package (#12971) Russell Bryant 2025-02-13 02:12:21 -05:00
04f50ad9d1 [Bugfix] deepseek_r1_reasoning_parser put reason content in wrong field in certain edge case (#13097) LikeSundayLikeRain 2025-02-13 02:11:26 -05:00
60c68df6d1 [Build] Automatically use the wheel of the base commit with Python-only build (#13178) Cody Yu 2025-02-12 23:10:28 -08:00
009439caeb Simplify logic of locating CUDART so file path (#13203) Lu Fang 2025-02-12 21:52:41 -08:00
bc55d13070 [VLM] Implement merged multimodal processor for Mllama (#11427) Isotr0py 2025-02-13 12:26:21 +08:00
d88c8666a1 [Bugfix][Example] Fix GCed profiling server for TPU (#12792) Michael Goin 2025-02-12 22:52:11 -05:00
4fc5c23bb6 [NVIDIA] Support nvfp4 quantization (#12784) Kaixi Hou 2025-02-12 19:51:51 -08:00
9f9704dca6 [perf-benchmark] cleanup unused Docker images and volumes in H100 benchmark instance (#12706) Kevin H. Luu 2025-02-12 19:51:33 -08:00
8eafe5eaea [CI/Build] Ignore ruff warning up007 (#13182) Russell Bryant 2025-02-12 22:48:31 -05:00
4c0d93f4b2 [V1][Bugfix] Copy encoder input ids to fix set iteration issue during VLM abort (#13173) Murali Andoorveedu 2025-02-12 12:58:11 -08:00
14b7899d10 [CI] Fix failing FP8 cpu offload test (#13170) Michael Goin 2025-02-12 14:16:06 -05:00
09972e716c [Bugfix] Allow fallback to AWQ from AWQMarlin at per-layer granularity (#13119) Michael Goin 2025-02-12 12:19:53 -05:00
36a08630e8 [CORE] [QUANT] Support for GPTQModel's dynamic quantization per module override/control (#7086) Qubitium-ModelCloud 2025-02-13 01:19:43 +08:00
2c2b560f48 [CI/Build] Use mypy matcher for pre-commit CI job (#13162) Russell Bryant 2025-02-12 12:12:22 -05:00
042c3419fa Introduce VLLM_CUDART_SO_PATH to allow users specify the .so path (#12998) Lu Fang 2025-02-12 09:06:13 -08:00
82cabf53a3 [Misc] Delete unused LoRA modules (#13151) Jee Jee Li 2025-02-13 00:58:24 +08:00
314cfade02 [Frontend] Generate valid tool call IDs when using tokenizer-mode=mistral (#12332) Rafael Vasquez 2025-02-12 11:29:56 -05:00
985b4a2b19 [Bugfix] Fix num video tokens calculation for Qwen2-VL (#13148) Cyrus Leung 2025-02-12 19:55:23 +08:00
f4d97e4fc2 [Bug] [V1] Try fetching stop_reason from EngineOutput before checking the request (#13108) bnellnm 2025-02-12 05:39:16 -05:00
f1042e86f0 [Misc] AMD Build Improvements (#12923) Shiyan Deng 2025-02-12 02:36:10 -08:00
7c4033acd4 Further reduce the HTTP calls to huggingface.co (#13107) Maximilien de Bayser 2025-02-12 07:34:09 -03:00
d59def4730 Bump actions/setup-python from 5.3.0 to 5.4.0 (#12672) dependabot[bot] 2025-02-12 16:41:22 +08:00
0c7d9effce Bump helm/chart-testing-action from 2.6.1 to 2.7.0 (#12463) dependabot[bot] 2025-02-12 16:41:06 +08:00
dd3b4a01f8 Bump actions/stale from 9.0.0 to 9.1.0 (#12462) dependabot[bot] 2025-02-12 00:40:25 -08:00
a0597c6b75 Bump helm/kind-action from 1.10.0 to 1.12.0 (#11612) dependabot[bot] 2025-02-12 00:40:19 -08:00
e92694b6fe [Neuron][Kernel] Support Longer Sequences in NKI-based Flash PagedAttention and Improve Efficiency (#12921) Lingfan Yu 2025-02-11 21:12:37 -08:00
842b0fd402 [ci] Add more source file dependencies for some tests (#13123) Kevin H. Luu 2025-02-11 20:38:10 -08:00
974dfd4971 [Model] IBM/NASA Prithvi Geospatial model (#12830) Christian Pinto 2025-02-12 04:34:30 +00:00
3ee696a63d [RFC][vllm-API] Support tokenizer registry for customized tokenizer in vLLM (#12518) Keyun Tong 2025-02-11 20:25:58 -08:00
72c2b68dc9 [Misc] Move pre-commit suggestion back to the end (#13114) Russell Bryant 2025-02-11 17:34:16 -05:00
14ecab5be2 [Bugfix] Guided decoding falls back to outlines when fails to import xgrammar (#12976) Yuan Tang 2025-02-11 13:17:44 -05:00
deb6c1c6b4 [Doc] Improve OpenVINO installation doc (#13102) Harry Mellor 2025-02-11 18:02:46 +00:00
565c1efa65 [CI/Build][Bugfix] Fix CPU backend default threads num (#13077) Li, Jiang 2025-02-12 00:55:56 +08:00
2b25b7d2e1 Fix initializing GGUF weights for ColumnParallelLinear when using tensor parallel > 1 (#13023) Szymon Ożóg 2025-02-11 17:38:48 +01:00
6c4dbe23eb [BugFix] Pop instead of del CUDA_VISIBLE_DEVICES (#12962) ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 2025-02-11 18:21:50 +02:00
21f5d50fa5 [Bugfix] Do not use resource module on Windows (#12858) (#13029) MoonRide303 2025-02-11 17:21:18 +01:00
bf3e05215c [Misc] Fix typo at comments at metrics.py (#13024) Jewon Lee 2025-02-12 01:20:37 +09:00
ad9776353e Set torch_dtype in TransformersModel (#13088) Harry Mellor 2025-02-11 15:51:19 +00:00
75e6e14516 [V1][Metrics] Add several request timing histograms (#12644) Mark McLoughlin 2025-02-11 15:14:00 +00:00
110f59a33e [Bugfix] fix flaky test (#13089) மனோஜ்குமார் பழனிச்சாமி 2025-02-11 20:11:20 +05:30
2e3b969ec0 [Platform] add pre_register_and_update function (#12432) wangxiyuan 2025-02-11 22:06:46 +08:00
da317197dd [Build] Fix cuda link target of cumem_allocator in CPU env (#12863) Yuhong Guo 2025-02-11 21:55:57 +08:00
7539bbc6a6 [ROCm] Using a more precise memory profiling (#12624) Gregory Shtrasberg 2025-02-11 08:47:10 -05:00
9cf4759493 [executor] init local_rank as device index (#13027) Mengqing Cao 2025-02-11 21:20:53 +08:00
41c5dd45b9 [V1][Metrics] Add GPU prefix cache hit rate % gauge (#12592) Cody Yu 2025-02-11 00:27:25 -08:00
fc6485d277 [Bugfix]: Reasoning output bug according to the chat template change (#13025) Ce Gao 2025-02-11 15:49:03 +08:00
78a141d768 [Misc] LoRA - Refactor Punica ops tests (#12970) Varun Sundar Rabindranath 2025-02-11 12:56:03 +05:30
c320ca8edd [Core] Don't do platform detection at import time (#12933) Russell Bryant 2025-02-11 02:25:25 -05:00
58047c6f04 [Benchmark] Add BurstGPT to benchmark_serving (#13063) Woosuk Kwon 2025-02-10 21:25:30 -08:00
cb080f32e3 [Bugfix] Support missing tool parameters in mistral tokenizer (#12884) Florian Greinacher 2025-02-11 04:33:33 +01:00
2c0f58203c [Docs] Annouce Meta Meetup (#13065) Simon Mo 2025-02-10 18:24:29 -08:00
2ff4857678 [V1][Minor] Move scheduler outputs to a separate file (#13062) Woosuk Kwon 2025-02-10 18:10:06 -08:00
91e876750e [misc] Fix setup.py condition to avoid AMD from being mistaken with CPU (#13022) Kevin H. Luu 2025-02-10 18:06:16 -08:00
08b2d845d6 [Model] Ultravox Model: Support v0.5 Release (#12912) Farzad Abdolhosseini 2025-02-10 14:02:48 -08:00
2ae889052c Fix seed parameter behavior in vLLM (#13007) மனோஜ்குமார் பழனிச்சாமி 2025-02-10 20:56:50 +05:30
51f0b5f7f6 [Bugfix] Clean up and fix multi-modal processors (#13012) Cyrus Leung 2025-02-10 18:45:21 +08:00
fde71262e0 [misc] Add retries with exponential backoff for HF file existence check (#13008) Kevin H. Luu 2025-02-10 01:15:02 -08:00
243137143c [Doc] Add link to tool_choice tracking issue in tool_calling.md (#13003) Yuan Tang 2025-02-10 01:09:33 -05:00
b2496bb07f [core] fix sleep mode and pytorch checkpoint compatibility (#13001) youkaichao 2025-02-10 13:03:43 +08:00
