Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

883131544f [Bugfix] Update import path for bc_linter_include (#24766) Mohammad Miadh Angkad 2025-09-18 04:33:11 +08:00
ee5fd49150 [Misc] Update owners for KV connector and V1 offloading (#25041) Yihua Cheng 2025-09-17 12:37:29 -07:00
7ae9887542 [V1] Logits processor docs (#22919) afeldman-nm 2025-09-17 14:53:12 -04:00
e3db5ebb66 [CI Bugfix] Fix failing test_model_load_with_params tests due to tokenizer refactor (#25086) Michael Goin 2025-09-17 14:15:05 -04:00
9d442b7c48 [V0 Deprecation] Remove V0 tests in test_sequence.py (#25088) Woosuk Kwon 2025-09-17 11:08:45 -07:00
eb68c2dcd9 [CI] Revert back prepare_prompts and check_answers (#25087) Woosuk Kwon 2025-09-17 11:03:16 -07:00
8b32464ac1 Change log level from info to debug for IOProcessor (#24999) Michael Goin 2025-09-17 13:21:28 -04:00
99cc41ad50 [V0 Deprecation] Remove unused output processor util (#25023) Woosuk Kwon 2025-09-17 09:50:07 -07:00
d6a518fdde Remove unused find_cuda_init helper script (#25044) Simon Mo 2025-09-17 09:47:40 -07:00
4aa8c7b047 cleanup: remove adapter commons (#25045) Simon Mo 2025-09-17 09:46:29 -07:00
4b946d693e [V0 Deprecation] Remove V0 Core tests (#25082) Woosuk Kwon 2025-09-17 09:32:42 -07:00
087c6ffc92 [CI Bugfix] Fix failing test_invalid_env (#25078) Michael Goin 2025-09-17 11:28:58 -04:00
4a2d33e371 [Docs] vllm/benchmarks/datasets.py fix docstring param format. (#24970) samzong 2025-09-17 23:11:51 +08:00
8f3616f422 Remove old cutlass mla (#23961) Matthew Bonanni 2025-09-17 10:31:43 -04:00
47f670b03b [Docs] improve code formatting and comments for eliminate griffe build warning. (#25010) samzong 2025-09-17 22:31:20 +08:00
dd6a910aac [Bugfix][Qwen3-Next] fixes the varlen issue in qwen3-next's MTP implementation. (#24957) Tao He 2025-09-17 21:59:09 +08:00
1b962e2457 [fix] lora benchmarks pass no_lora_flag_cpu (#23774) dolpm 2025-09-17 06:22:25 -07:00
bfe9380161 Apply fixes for CUDA 13 (#24599) Aidyn-A 2025-09-17 17:15:42 +04:00
9fccd04e30 [Bugfix] Fix Stream usage in CPU model runner and OneDNN kernel check (#25046) Li, Jiang 2025-09-17 20:54:02 +08:00
252ada5559 Add RADIO Vision Encoder Support to vLLM (#24595) danielafrimi 2025-09-17 15:53:30 +03:00
e120533d7a [Misc] Avoid use of deprecated AutoModelForVision2Seq (#25065) Cyrus Leung 2025-09-17 20:19:15 +08:00
2b85697031 [BugFix] enable DOTALL to match multi-line tool_call parameters in extract_tool_call_required_streaming (#24668) Shijun Yin 2025-09-17 17:21:18 +08:00
544fe76b95 [Frontend] Support returning all prompt logprobs (#24956) Chauncey 2025-09-17 17:03:52 +08:00
bb58dc8c20 [DP] Create placement groups by ray_device_key (#25026) Xinyu Chen 2025-09-17 16:57:25 +08:00
0fb2551c23 [Docs] Fix griffe warning in base_static_graph.py (#25018) Michael Yao 2025-09-17 16:49:19 +08:00
6c47f6bfa4 [Core] Remove tokenizer group in vLLM (#24078) Zhuohan Li 2025-09-17 01:42:59 -07:00
c15309a730 [Model] Apply SharedFusedMoE to glm4_moe. (#24849) whx 2025-09-17 16:02:31 +08:00
4a9375fe9d [Model] Pass param prefix to LLMHead (#24862) whx 2025-09-17 16:01:27 +08:00
03191cd8f0 [Core][MultiModalHasher] Hash images without converting image mode (#24969) Lukas Geiger 2025-09-17 08:57:34 +01:00
b77bf34e53 [EPLB] Support EPLB for Mixtral Model (#22842) rouchenzi 2025-09-17 00:27:34 -07:00
dd39baf717 [XPU] Fix xpu model runner call torch.cuda APIs (#25011) Kunshang Ji 2025-09-17 14:45:25 +08:00
43a62c51be Add more documentation and improve usability of lognormal dist (benchmark_serving_multi_turn) (#23255) Daniel Serebrenik 2025-09-17 08:53:17 +03:00
ca2d1925ef [Rocm] [quantization] Fix quark ptpc moe and add test case (#24649) haoyangli-amd 2025-09-17 13:15:13 +08:00
0f7acdd73c [Model] Support Qwen3-VL Model Series (#24727) Roger Wang 2025-09-16 22:01:04 -07:00
5801e49776 [V0 Deprecation] Remove MQLLMEngine (#25019) Woosuk Kwon 2025-09-16 21:29:27 -07:00
58d4c705a8 [Core] Get num_encoder_tokens from scheduler config (#24989) Russell Bryant 2025-09-16 23:59:07 -04:00
ea3de5ef0d [misc] fix typo in value error (#24995) Prashant Gupta 2025-09-16 20:58:38 -07:00
67532a1a68 [UX] Remove "quantization is not fully optimized yet" log (#25012) Michael Goin 2025-09-16 23:57:51 -04:00
5672ba90bd [Docs] fix invalid doc link (#25017) yyzxw 2025-09-17 11:53:23 +08:00
dd83a157f1 [UX] Enforce valid choices for envs like VLLM_ATTENTION_BACKEND, etc (#24761) Michael Goin 2025-09-16 23:42:23 -04:00
5a411ef6c4 [Benchmarks] Add MMVU video dataset support and clean up deprecated datasets (#24719) Isotr0py 2025-09-17 11:29:43 +08:00
eeb135eb87 [Core] Use CpuGpuBuffer for block table tensors (#24795) Nick Hill 2025-09-16 19:18:06 -07:00
3059b9cc6b [Doc] Add --force-overwrite option to generate_cmake_presets.py (#24375) elvischenv 2025-09-17 09:45:29 +08:00
64ad551878 Removes source compilation of nixl dependency (#24874) Benjamin Bartels 2025-09-17 02:33:18 +01:00
cef32104b4 [FP8] Extend per-token-group quantization support to QuantFP8 (#24342) Tahsin Tunan 2025-09-17 07:31:06 +06:00
493b10f8bf [CI] GPT-OSS GPQA eval test for Blackwell (#24920) Michael Goin 2025-09-16 21:13:21 -04:00
d119fc8614 [CI][Bugfix] Fix failing Blackwell test (#24993) Matthew Bonanni 2025-09-16 18:55:02 -04:00
dbebb7f812 [Perf] Reuse workspace for FP8+FP4 Marlin MoE (#20500) Michael Goin 2025-09-16 17:45:10 -04:00
3053a22b33 fp8 kv cache support fix for torch.compile (#22758) Aleksandr Malyshev 2025-09-16 14:27:11 -07:00
02d4b85454 Use kwargs for long lists of EngineCoreRequest arguments in tests and fix extra kwargs (#24987) Andrew Sansom 2025-09-16 16:06:56 -05:00
86daa875fe [gpt-oss][1][bugfix] fix streaming final output (#24466) Andrew Xia 2025-09-16 12:56:16 -07:00
dcf2f3ec06 [ROCm] Add dependencies for ROCm (#24900) Concurrensee 2025-09-16 14:49:06 -05:00
218454b9b2 [MISC] Add code owners of vllm/v1 to vllm/v1/core (#24928) Chen Zhang 2025-09-16 12:07:34 -07:00
f4d6eb95cf [gpt-oss][1b] streaming add item id, content id (#24788) Andrew Xia 2025-09-16 11:41:12 -07:00
cd1f885bcf Directly get max encoder len from VLLM config in V1 (#24866) Sugar 2025-09-17 01:52:31 +08:00
d593cf28fa [Misc] Add removed encoder-decoder models to previously supported models list (#24961) Isotr0py 2025-09-17 01:46:46 +08:00
faa7a5daac [Bugfix] Fix unable to run encoder model when disable_hybrid_kv_cache_manager is true (#24571) lianyibo 2025-09-17 01:36:58 +08:00
567939953b [Core/DBO][1/N] Add Dual-Batch Overlap mechanism to VLLM (#23693) Sage Moore 2025-09-16 09:21:48 -07:00
08369289af [Core][MultiModalHasher] Don't convert memoryviews to bytes during hashing (#24925) Lukas Geiger 2025-09-16 16:32:47 +01:00
73cfb3c5ee [Model] Clean up and simplify Mamba2 Metadata Usage in both V0 and V1 (#24331) Chih-Chieh Yang 2025-09-16 10:53:43 -04:00
4e5affeaa1 [CI] Add Decode Context Parallelism (DCP) test to CI (#24487) Ming Yang 2025-09-16 06:21:28 -07:00
e4f0b4cd96 (doc): set cmake c++ compatible standard when building on MacOS CPU. (#23483) TeeKen Lau 2025-09-16 23:08:46 +10:00
de3e53a75b feat: Add Grafana and Perces monitoring dashboards for vLLM (#23498) liangwen12year 2025-09-16 08:53:40 -04:00
85e0df1392 [Docs] move benchmarks README to contributing guides (#24820) Ye (Charlotte) Qi 2025-09-16 05:52:57 -07:00
0faf3cc3e8 Move SpeculativeConfig from config/__init__.py to config/speculative.py (#24904) Harry Mellor 2025-09-16 12:51:35 +01:00
7ea5c73ad7 [Feat][EPLB] A novel static EPLB placement strategy for MoE models. (#23745) Chen Bruce 2025-09-16 18:55:16 +08:00
27fcfe7bcf [Mamba] Support TP>1 with quantization for mamba2 mixer in case n_groups % tp_size == 0 (#24593) tomeras91 2025-09-16 13:51:01 +03:00
68dbde5dbb [Bugfix] remove duplicate tokens streamed in required tool choice streaming (#23312) Cheng Kuan Yong Jason 2025-09-16 15:16:32 +08:00
04ad0dc275 [benchmark] Add triton version in the moe tuned config (#24769) Jee Jee Li 2025-09-16 14:10:54 +08:00
238c4c1705 [QWEN NEXT] Fused MoE kernels Optimization configs (#24924) Saman A. Pour 2025-09-15 22:06:03 -07:00
8c54610265 [Bug] [Spec Dec]: Fix kv_cache dtype mismatch for Eagle3 drafter on FP8 target (#24505) vllmellm 2025-09-16 12:45:38 +08:00
17871983a2 [Bugfix] Fix sequence parallelism bug when enable pipeline parallelism (#24021) cascade 2025-09-15 21:32:32 -07:00
759ef49b15 Remove V0 Encoder-Decoder Support (#24907) Woosuk Kwon 2025-09-15 21:17:14 -07:00
5206ab20ba [XPU] Fix circular import error. (#24927) Kunshang Ji 2025-09-16 11:35:36 +08:00
0af3ce1355 Upgrade flashinfer to 0.3.1 (#24470) Lu Fang 2025-09-15 19:36:09 -07:00
e1279ef00f [Docs] Update instructions for how to using existing torch binary (#24892) Richard Zou 2025-09-15 22:25:50 -04:00
2942970d44 [Metrics] Hide deprecated metrics with gpu_ prefix (#24245) Mark McLoughlin 2025-09-16 03:15:57 +01:00
3c96e7b8a1 [CI] Small Accuracy Eval Test for Deepseek Model (#24259) Wentao Ye 2025-09-15 22:14:50 -04:00
b42566f440 [Bug] Fix is_flashmla_supported Check Error (#24774) Wentao Ye 2025-09-15 22:10:55 -04:00
d96e11167d Add pytest-cov and .coveragerc (#24778) Reza Barazesh 2025-09-15 22:08:46 -04:00
2891603efd [ROCm][Bugfix] Fix the case where there's bias (#24895) Gregory Shtrasberg 2025-09-15 22:05:12 -04:00
de2cc3d867 [Deprecation] Remove DeepGEMM Old Symbol Wrapper (#24902) Wentao Ye 2025-09-15 22:03:29 -04:00
e95084308b Updated CODEOWNERS for flashinfer, mla, fused_moe (#24906) Michael Goin 2025-09-15 22:01:28 -04:00
7f6f2c1182 HuggingFace -> Hugging Face in Integration with Hugging Face docs (#24889) Sergio Paniego Blanco 2025-09-16 02:28:35 +02:00
5bcc153d7b [Compile] Fix noop_elimination pass and add tests for noop_elimination (#24880) Jiangyun Zhu 2025-09-16 07:33:18 +08:00
45bfa49cb8 [Tests] fix initialization of kv hash in tests (#24273) Mickaël Seznec 2025-09-15 23:48:27 +02:00
fd2f10546c [ci] fix wheel names for arm wheels (#24898) Simon Mo 2025-09-15 14:39:08 -07:00
e757a629e7 [Bug] Fix Cutlass Scaled MM Compilation Error (#24887) Wentao Ye 2025-09-15 17:21:17 -04:00
aae725af7c [Performance] Remove redundant clone() calls in cutlass_mla (#24891) Alexander Matveev 2025-09-15 16:21:53 -04:00
73df49ef3a [gpt-oss][1a] create_responses stream outputs BaseModel type, api server is SSE still (#24759) Andrew Xia 2025-09-15 13:08:08 -07:00
25aba2b6a3 [gpt-oss] Add IncompleteDetails to ResponsesRepsonse (#24561) Andrew Xia 2025-09-15 13:07:55 -07:00
94b03f88dd Bump Flashinfer to 0.3.1 (#24868) Benjamin Bartels 2025-09-15 20:45:55 +01:00
49bfc538e4 Update num_tokens_across_dp to use nccl instead of gloo (#24105) Sage Moore 2025-09-15 12:05:48 -07:00
a0b26701c9 [Transform] Deterministic Hadacore Transforms (#24106) Kyle Sayers 2025-09-15 19:59:31 +01:00
c4afdb69cc Move MultiModalConfig from config/__init__.py to config/multimodal.py (#24659) Harry Mellor 2025-09-15 18:43:16 +01:00
b834b4cbf1 [USAGE] Improve error handling for weight initialization in Unquantized… (#20321) Rafael Marcelino Koike 2025-09-15 12:45:49 -04:00
740f0647b1 Reinstate existing torch script (#24729) Harry Mellor 2025-09-15 17:43:40 +01:00
01413e0cf5 Fp8 paged attention update (#22222) xiao-llm 2025-09-15 10:43:26 -04:00
0e219cd50b [Bugfix] Fix GLM4.1V multimodal processor with compatability for Transformers v4.56 (#24822) Isotr0py 2025-09-15 20:45:06 +08:00
72c99f2a75 [Model]: support Ling2.0 (#24627) ant-yy 2025-09-15 20:09:30 +08:00
