Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

950751a987 [v1] Pass BlockTable and KVCacheSpec to AttentionMetadataBuilders (#17483) Chen Zhang 2025-05-11 07:12:04 +08:00
4c31218f80 [Misc] remove --model from vllm serve usage (#17944) Reid 2025-05-10 21:23:31 +08:00
68311891f5 Don't default construct ModelConfig when default constructing VllmConfig (#17943) Harry Mellor 2025-05-10 14:23:00 +01:00
fc4441a4ee Add missing content type headers to /ping and /health (#17036) (#17786) Ximo Guanter 2025-05-10 08:13:32 +02:00
246e3e0a36 fix broken test vllm:test_kernels - test_attention_selector.py::test_flash_attn (#17873) tracelogfb 2025-05-09 19:46:54 -07:00
7042cc96b0 [V1][Spec Decoding] Log accumulated metrics after system goes idle (#17913) Mark McLoughlin 2025-05-10 02:23:07 +01:00
0c0fdae84f [Hardware/NVIDIA/Kernel] Enable nvidia/DeepSeek-R1-FP4 Model (#16362) Pavani Majety 2025-05-09 16:24:41 -07:00
3b602cdea7 AMD conditional all test execution // new test groups (#17556) Alexei-V-Ivanov-AMD 2025-05-09 17:35:58 -05:00
4b2ed7926a Improve configs - the rest! (#17562) Harry Mellor 2025-05-09 23:18:44 +01:00
7e3571134f [V1][Spec Decoding] Include bonus tokens in mean acceptance length (#17908) Mark McLoughlin 2025-05-09 21:32:36 +01:00
ea2236bf95 Add option to use torch._inductor.standalone_compile (#17057) Richard Zou 2025-05-09 15:59:04 -04:00
7d4aedae7c Handle error when str passed to /v1/audio/transcriptions (#17909) Harry Mellor 2025-05-09 20:23:59 +01:00
22481fbfa3 Update CT WNA16MarlinMoE integration (#16666) Michael Goin 2025-05-09 11:19:45 -06:00
5c4c08f6f1 [Misc] Auto fallback to float16 for pre-Ampere GPUs when detected bfloat16 config (#17265) Isotr0py 2025-05-10 01:16:12 +08:00
c44c384b1c [Misc] Add references in ray_serve_deepseek example (#17907) Rui Qiao 2025-05-09 09:59:36 -07:00
85b72cb7b1 Revert "[BugFix][AMD] Compatible patch for latest AITER(05/07/2025)" (#17910) Michael Goin 2025-05-09 09:58:18 -06:00
6e5595ca39 [CI/Build] Automatically retry flaky tests (#17856) Cyrus Leung 2025-05-09 23:55:17 +08:00
200da9a517 [v1] Move block management logic from KVCacheManager to SpecializedManager (#17474) Chen Zhang 2025-05-09 23:25:34 +08:00
9f64e93415 [BugFix][AMD] Compatible patch for latest AITER(05/07/2025) (#17864) qli88 2025-05-09 09:59:36 -05:00
ec61ea20a8 [Misc] add dify integration (#17895) Reid 2025-05-09 18:42:39 +08:00
c6798baa9c Change top_k to be disabled with 0 (still accept -1 for now) (#17773) Harry Mellor 2025-05-09 11:01:49 +01:00
5b2dcbf0b8 Fix Whisper crash caused by invalid max_num_batched_tokens config (#17853) inkcherry 2025-05-09 17:16:26 +08:00
6e4a93e3f7 [Bugfix][CPU] Fix broken AVX2 CPU TP support (#17252) Isotr0py 2025-05-09 16:55:14 +08:00
217db4baa6 [Bugfix][ROCm] Fix AITER MLA V1 (#17880) vllmellm 2025-05-09 16:38:21 +08:00
ff8c400502 [Doc] remove visible token in doc (#17884) Yan Ma 2025-05-09 16:21:31 +08:00
89a0315f4c [Doc] Update several links in reasoning_outputs.md (#17846) Michael Yao 2025-05-09 16:20:55 +08:00
3d1e387652 [Docs] Add Slides from NYC Meetup (#17879) Simon Mo 2025-05-08 21:46:54 -07:00
d310e6de98 [BUGFIX]: return fast when request requires prompt logprobs (#17251) Ning Xie 2025-05-09 12:25:41 +08:00
5e6f939484 [Attention] MLA move rotary embedding to cuda-graph region (#17668) Lucas Wilkinson 2025-05-08 23:14:42 -04:00
760e3ecc8f [V1][Structured Output] Update llguidance (>= 0.7.11) to avoid AttributeError (no StructTag) (#17839) Shanshan Shen 2025-05-09 11:14:18 +08:00
3c9396a64f [FEAT][ROCm]: Support AITER MLA on V1 Engine (#17523) vllmellm 2025-05-09 10:42:05 +08:00
376786fac1 Add cutlass support for blackwell fp8 blockwise gemm (#14383) Shu Wang 2025-05-08 17:09:55 -05:00
4f605a6de5 Fix noisy warning for uncalibrated q_scale/p_scale (#17414) Michael Goin 2025-05-08 15:56:59 -04:00
8342e3abd1 [CI] Prune down lm-eval small tests (#17012) Michael Goin 2025-05-08 15:00:26 -04:00
a83a0f92b5 [Test] Attempt all TPU V1 tests, even if some of them fail. (#17334) yarongmu-google 2025-05-08 10:20:54 -07:00
226a4272cf [V1] Improve VLLM_ALLOW_INSECURE_SERIALIZATION logging (#17860) Russell Bryant 2025-05-08 12:57:35 -04:00
ec54d73c31 [CI] Fix test_collective_rpc (#17858) Russell Bryant 2025-05-08 12:47:12 -04:00
a944f8ede7 [Misc] Delete LoRA-related redundancy code (#17841) Jee Jee Li 2025-05-08 21:02:21 +08:00
015815fe01 [Bugfix] use_fast failing to be propagated to Qwen2-VL image processor (#17838) Cyrus Leung 2025-05-08 20:39:21 +08:00
e4ca6e3a99 Fix transient dependency error in docs build (#17848) Harry Mellor 2025-05-08 11:42:03 +01:00
53d0cb7423 [Misc] add chatbox integration (#17828) Reid 2025-05-08 18:05:26 +08:00
f50dcb7c21 [Easy] Eliminate c10::optional usage in vllm/csrc (#17819) Lu Fang 2025-05-08 03:05:10 -07:00
a1e19b635d [Doc] Fix a typo in the file name (#17836) Cyrus Leung 2025-05-08 18:04:18 +08:00
bb239a730f [Bugfix] Fix quark fp8 format loading on AMD GPUs (#12612) fxmarty-amd 2025-05-08 11:53:53 +02:00
a463555dee [TPU] Fix the test_sampler (#17820) Jevin Jiang 2025-05-08 02:51:33 -07:00
ca04b97c93 [Bugfix] Fix tool call template validation for Mistral models (#17644) Rick Yuan 2025-05-08 17:47:19 +08:00
0a9bbaa104 [Misc] support model prefix & add deepseek vl2 tiny fused moe config (#17763) xsank 2025-05-08 15:50:22 +08:00
39956efb3f [Bugfix] Fix bad words for Mistral models (#17753) Qiong Zhou Huang 2025-05-07 23:32:10 -07:00
597051e56f [Qwen3]add qwen3-235b-bf16 fused moe config on A100 (#17715) Ximingwang-09 2025-05-08 14:09:32 +08:00
96722aa81d [Frontend] Chat template fallbacks for multimodal models (#17805) Cyrus Leung 2025-05-08 14:05:54 +08:00
843b222723 [Hardware][Intel-Gaudi] Support Automatic Prefix Caching on HPU (#17648) Agata Dobrzyniewicz 2025-05-08 07:37:03 +02:00
e515668edf [Hardware][Power] Enable compressed tensor W8A8 INT8 quantization for POWER (#17153) Akash kaothalkar 2025-05-08 11:05:03 +05:30
5a499e70d5 [Kernel][Hardware][AMD] Bf16 mfma opt for ROCm skinny GEMMs (#17071) Hashem Hashemi 2025-05-07 22:34:49 -07:00
6930a41116 [V1] Add VLLM_ALLOW_INSECURE_SERIALIZATION env var (#17490) Russell Bryant 2025-05-08 01:34:02 -04:00
998eea4a0e Only log non-default CLI args for online serving (#17803) Harry Mellor 2025-05-08 06:33:29 +01:00
c747d84576 [Installation] OpenTelemetry version update (#17771) Mikhail Podvitskii 2025-05-08 07:32:49 +02:00
b2da14a05a Improve exception reporting in MP engine (#17800) Vadim Markovtsev 2025-05-08 07:32:39 +02:00
7ea2adb802 [Core] Support full cuda graph in v1 (#16072) Chanh Nguyen 2025-05-07 22:30:15 -07:00
3d13ca0e24 [BugFix] Fix --disable-log-stats in V1 server mode (#17600) Nick Hill 2025-05-07 21:08:15 -07:00
66ab3b13c9 Don't call the venv vllm (#17810) Harry Mellor 2025-05-08 05:06:39 +01:00
a8238bbdb0 [Chore][Doc] uses model id determined from OpenAI client (#17815) Aaron Pham 2025-05-07 21:48:57 -04:00
d43f914d42 [Core][Feature] Input metadata dump on crash (#13407) Wallas Henrique 2025-05-07 19:15:09 -03:00
ed5272cf21 [BugFix] Avoid secondary missing MultiprocExecutor.workers error (#17811) Nick Hill 2025-05-07 14:55:04 -07:00
c20ef40fd0 [Hardware][TPU][V1] Multi-LoRA implementation for the V1 TPU backend (#14238) Akshat Tripathi 2025-05-07 21:28:47 +01:00
db593aa67f [Quantization] Quark MXFP4 format loading (#16943) Bowen Bao 2025-05-07 12:05:05 -07:00
f98e307588 [Bugfix] Fix missing lora name mapping for lora without prefix (#17793) Isotr0py 2025-05-08 00:17:12 +08:00
646a31e51e Fix and simplify deprecated=True CLI kwarg (#17781) Harry Mellor 2025-05-07 16:51:06 +01:00
be8ff88e66 [Bugfix] Fix Video IO error for short video (#17791) Isotr0py 2025-05-07 23:36:06 +08:00
1a6af1453d Only depend on importlib-metadata for Python < 3.10 (#17776) Christian Heimes 2025-05-07 16:51:06 +02:00
32aa74c09c [ROCm][FP8][Kernel] FP8 quantization fused into Custom Paged Attention (#17139) Gregory Shtrasberg 2025-05-07 10:12:35 -04:00
7377dd0307 [doc] update the issue link (#17782) Reid 2025-05-07 20:29:05 +08:00
98c89e16ff Make key optional for rotary embedding (#17566) Yong Hoon Shin 2025-05-07 00:11:46 -07:00
324a3119b0 Fix test_memory_usage_no_spec (#17754) Yong Hoon Shin 2025-05-07 00:10:33 -07:00
8a15c2603a [Frontend] Add missing chat templates for various MLLMs (#17758) Cyrus Leung 2025-05-07 15:10:01 +08:00
043e4c4955 Add NeuronxDistributedInference support, Speculative Decoding, Dynamic on-device sampling (#16357) Satyajith Chilappagari 2025-05-07 00:07:30 -07:00
ba7703e659 [Misc] Remove qlora_adapter_name_or_path (#17699) Jee Jee Li 2025-05-07 14:10:37 +08:00
f80ae5bdcf [Kernel] Use fused rmsnorm for some models like qwen3 series (#17735) Wanrui Dai 2025-05-07 14:10:02 +08:00
1a45a61387 [Kernel] GGUF MoeVec kernel (#16780) Szymon Ożóg 2025-05-07 14:07:23 +08:00
c3e9d5060e [Misc] Use apply_rotary_emb from vllm_flash_attn for Qwen2-VL vision RoPE (#17726) Isotr0py 2025-05-07 12:51:33 +08:00
822de7fb94 [Misc] Split model loader (#17712) Jee Jee Li 2025-05-07 12:42:26 +08:00
8d84d836d1 [BugFix][Spec Decode] Fix hidden size mismatch between target and eagle head (#17740) Woosuk Kwon 2025-05-06 19:51:26 -07:00
950b71186f Replace lm-eval bash script with pytest and use enforce_eager for faster CI (#17717) Michael Goin 2025-05-06 21:00:10 -04:00
e50a1f1a9c [TPU] Add kernel test for moe_pallas (#17496) Michael Goin 2025-05-06 20:59:57 -04:00
a17cef70ea Removed unused marlin cuda code (#17684) Michael Goin 2025-05-06 20:59:47 -04:00
18dd5e01f2 [Model] Mamba2 causal conv1d Refactor to Split Prefill and Decode Requests for Corresponding Kernels (#17146) Chih-Chieh Yang 2025-05-06 20:59:30 -04:00
6de3e13413 Add logging for torch nightly version (#17669) Yang Wang 2025-05-06 17:45:51 -07:00
ed3a1d2106 [ROCm] fix num_stages for default moe config to avoid triton OutOfResource error (#17744) Hongxia Yang 2025-05-06 20:39:48 -04:00
022afbeb4e Fix doc build performance (#17748) Harry Mellor 2025-05-07 01:36:41 +01:00
2f925e5777 [Kernel] Unified Triton kernel that doesn't distinguish between prefill + decode (#16828) Thomas Parnell 2025-05-06 18:21:48 -04:00
de906b95f9 [Bugfix] Fix for the condition to accept empty encoder inputs for mllama (#17732) Gregory Shtrasberg 2025-05-06 15:59:06 -04:00
d456aea71f [Misc] Add Next Edit Prediction (NEP) datasets support in benchmark_serving.py (#16839) d.transposed 2025-05-06 21:38:45 +02:00
621ca2c0ab [TPU] Increase block size and reset block shapes (#16458) Jevin Jiang 2025-05-06 10:55:04 -07:00
6115b11582 Make right sidebar more readable in "Supported Models" (#17723) Harry Mellor 2025-05-06 17:48:26 +01:00
5b8c390747 [Bugfix] Fix modality limits in vision language example (#17721) Cyrus Leung 2025-05-07 00:12:28 +08:00
7525d5f3d5 [doc] Add RAG Integration example (#17692) Reid 2025-05-07 00:10:23 +08:00
aabcd2cae3 [v1] Introduce KVCacheBlocks as interface between Scheduler and KVCacheManager (#17479) Chen Zhang 2025-05-06 23:50:34 +08:00
0d115460a7 [Docs] Use gh-file to add links to tool_calling.md (#17709) Michael Yao 2025-05-06 23:27:19 +08:00
175bda67a1 [Feat] Add deprecated=True to CLI args (#17426) Aaron Pham 2025-05-06 11:11:27 -04:00
cba31c47c4 [v1] AttentionMetadata for each layer (#17394) Chen Zhang 2025-05-06 22:58:37 +08:00
a6fed02068 [V1][PP] Support PP for MultiprocExecutor (#14219) Li, Jiang 2025-05-06 22:58:05 +08:00
