Commit Graph - vllm

biondizzle/vllm

Commit Graph

9b5b39b650 Update deprecated type hinting in vllm/lora (#18128) Harry Mellor 2025-05-14 11:57:59 +01:00
9ccc6ded42 [doc] add missing import (#18133) Reid 2025-05-14 18:57:34 +08:00
d62a076e84 [Model] GritLM supports other attention backends (#18109) Cyrus Leung 2025-05-14 18:33:19 +08:00
259127f8b8 [Bugfix] Fix LoRA test (#18123) Jee Jee Li 2025-05-14 18:25:47 +08:00
612c2edb4f [FEAT] [ROCm]: Add AITER CK 2 Stages MoE support (#17110) TJian 2025-05-14 18:03:11 +08:00
38fe728d60 [Bugfix] Fix QKVCrossParallelLinear::sync_weight_attrs for PyTorch compile (#17844) Andrzej Kotłowski 2025-05-14 11:39:51 +02:00
82e7f9bb03 [Misc] replace does not exist model (#18119) rongfu.leng 2025-05-14 17:13:47 +08:00
63dc3426e0 [Model] Add packed_modules_mapping for Qwen3-MOE (#18118) Jee Jee Li 2025-05-14 17:13:19 +08:00
8f5dc41481 [Bugfix] Fix entrypoints audio test failure (#18111) Cyrus Leung 2025-05-14 17:08:07 +08:00
63ad622233 [New Model]: support GTE NewModel (#17986) wang.yuqi 2025-05-14 16:31:31 +08:00
e7ef61c1f0 [Bugfix][Example] make lmcache v0 work. (#18051) majianpeng 2025-05-14 14:43:44 +08:00
d4154c35a2 [Bugfix] fix moe marlin topk_weight loading (#18080) Jinzhen Lin 2025-05-14 14:31:57 +08:00
6685890d11 [Fix] Move "model_config" as keyword args in chat_utils.py (#18098) lkchen 2025-05-13 23:27:26 -07:00
33011318c2 Fix broken example: examples/offline_inference/profiling at scheduler_config (#18117) Ecthlion_zyy 2025-05-14 14:19:14 +08:00
4f8b373225 [BugFix][AMD] Compatible patch for AITER lib after 04/20 (#17912) qli88 2025-05-14 01:05:20 -05:00
7b2f28deba [AMD][torch.compile] Enable silu+fp8_quant fusion for rocm (#18082) Charlie Fu 2025-05-14 00:13:56 -05:00
2d912fb66f [FEAT] [ROCm] [V1]: Add AITER biased group topk for DeepSeekV3 (#17955) vllmellm 2025-05-14 13:03:47 +08:00
12e6c0b41c [Bugfix][V1] Fix FlashInfer V1 backend using the wrong VllmConfig (#18086) Michael Goin 2025-05-13 23:36:17 -04:00
9a2a6357de [Bugfix] Fix FP8 Marlin MoE and enable for compressed-tensors models (#18026) Michael Goin 2025-05-13 22:48:33 -04:00
6266c57bae [core][distributed] add ep group and all2all interface (#18077) youkaichao 2025-05-14 10:46:49 +08:00
754b699cbe [Bug]: Fix S3 model/tokenizer path resolution (#18083) Jon Gill 2025-05-13 19:34:17 -07:00
6e27c6d86b [Misc] Remove unused numpy tensor (#18084) Roger Wang 2025-05-13 19:33:40 -07:00
d5af47a149 [P/D] Add some more debug logs to NixlConnector (#18102) Nick Hill 2025-05-13 19:33:03 -07:00
65f0f74b66 [Hardware/NVIDIA/Modelopt] Fix modelopt forward method for v1 torch.compile (#18101) Pavani Majety 2025-05-13 19:33:00 -07:00
176a95c670 [Fix] Support CUDAGraph capture for encoder-decoder on ROCm (#18104) Luka Govedič 2025-05-13 22:31:42 -04:00
f2ae883b67 [v1][KVCacheManager] pass num_new_computed_tokens to kv cache manager (#18001) Chen Zhang 2025-05-14 10:09:39 +08:00
40de1ef455 [FEAT] [ROCm]: Add AITER Block-Scaled GEMM Feature (#14968) vllmellm 2025-05-14 10:08:20 +08:00
0189a65a2e [Docs] Expand security doc with firewall info (#18081) Russell Bryant 2025-05-13 15:36:00 -04:00
55aa7af994 [V1] DP scale-out (2/N): Decouple engine process management and comms (#15977) Nick Hill 2025-05-13 10:48:21 -07:00
0b217da646 Update deprecated type hinting in vllm/adapter_commons (#18073) Harry Mellor 2025-05-13 16:32:51 +01:00
19324d660c Update deprecated type hinting in vllm/compilation (#18072) Harry Mellor 2025-05-13 16:32:48 +01:00
fc407a1425 Give auto-merge label workflow permission to add labels to issues (#18078) Harry Mellor 2025-05-13 15:53:13 +01:00
009d9e7590 Convert benchmarks to ruff format (#18068) Harry Mellor 2025-05-13 14:43:29 +01:00
b922c2ebd2 [Bugfix] Fix entrypoints metrics tests (#18063) Cyrus Leung 2025-05-13 21:42:43 +08:00
00b14e0f16 [CI] set token permissions for pre-commit CI job (#17729) Russell Bryant 2025-05-13 09:38:30 -04:00
54e467e6f8 [CI] Add token permissions for add-ready-label CI job (#17730) Russell Bryant 2025-05-13 09:38:13 -04:00
79a1d25bbd [CI] Add workflow permissions for helm CI job (#17727) Russell Bryant 2025-05-13 08:49:07 -04:00
9944011b30 [CI] Set token permissions for reminder comment CI job (#17728) Russell Bryant 2025-05-13 08:46:58 -04:00
8c946cecca Update deprecated type hinting in vllm/transformers_utils (#18058) Harry Mellor 2025-05-13 12:34:37 +01:00
ff334ca1cd Update deprecated type hinting in vllm/profiler (#18057) Harry Mellor 2025-05-13 12:34:34 +01:00
6223dd8114 Update deprecated type hinting in model_executor/layers (#18056) Harry Mellor 2025-05-13 12:17:23 +01:00
906f0598fc [doc] add download/list/delete HF model CLI usage (#17940) Reid 2025-05-13 19:15:51 +08:00
cb528d0585 [Fix] check to make sure processor has chat templates (#18047) Aaron Pham 2025-05-13 06:04:10 -04:00
98fcba1575 Convert .buildkite to ruff format (#17656) Harry Mellor 2025-05-13 10:28:31 +01:00
23b3134eb5 [Benchmarks] Refactor run_structured_output_benchmarks.sh (#17722) Russell Bryant 2025-05-13 04:47:29 -04:00
ea6ae8cb45 [Bugfix] Fix marlin moe fallback logic for llama4 (#18042) Michael Goin 2025-05-13 03:53:28 -04:00
2ff297dce9 [BugFix] Set default random seed to 0 for V1 (#17929) Woosuk Kwon 2025-05-13 00:52:19 -07:00
8dd0671bac [Bugfix][V1] Only get input embeddings w/ multi-modal models if first PP (#17916) Jin Huang 2025-05-13 03:10:07 -04:00
f0d610a8ae [v1][KVCacheManager] Avoid full cache hit by controlling max_length (#17999) Chen Zhang 2025-05-13 14:50:38 +08:00
e57e4d6e9e Fix Broken macro for cutlass moe (#18049) Driss Guessous 2025-05-12 23:31:06 -07:00
ee5be834e7 [BugFix] Fix 4-GPU RLHF tests (#18007) Nick Hill 2025-05-12 23:03:55 -07:00
48545728d8 cleanup invalid prints (#18050) Calvin Chen 2025-05-13 14:01:57 +08:00
dc1a821768 [Feature][V1] Support tool_choice: required when using Xgrammar as the StructuredOutputBackend. (#17845) Chauncey 2025-05-13 14:01:31 +08:00
61e0a506a3 [Bugfix] Avoid repeatedly creating dummy data during engine startup (#17935) Cyrus Leung 2025-05-13 13:40:19 +08:00
1df491c522 [Bugfix] Fixes for new marlin moe usage (#18017) Michael Goin 2025-05-12 23:50:04 -04:00
d8487ef557 [ROCm]: Fix build from source failure with gcc14 and ROCm 6.3 (#13779) Arjun Kathuria 2025-05-13 09:06:33 +05:30
c06af9a959 [Misc] Slight spelling modification (#18039) Jee Jee Li 2025-05-13 11:36:27 +08:00
60f7624334 Implements dual-chunk-flash-attn backend for dual chunk attention with sparse attention support (#11844) Tao He 2025-05-13 10:52:47 +08:00
f6518b2b48 [ROCm] Skip tests for quantizations incompatible with ROCm (#17905) hissu-hyvarinen 2025-05-13 03:39:28 +03:00
d67085c2c8 Remove noisy warnings from SchedulerConfig (#17995) Harry Mellor 2025-05-13 01:33:45 +01:00
307939f299 Use NVFP4 Marlin for CompressedTensorsW4A16Fp4 (#18000) Michael Goin 2025-05-12 20:07:34 -04:00
9d7ea9dbbf Update some more deprecated type hinting (#17998) Harry Mellor 2025-05-13 00:49:33 +01:00
acee8f48aa [Model] Support MiMo-7B inference with MTP (#17433) bwshen-mi 2025-05-13 07:25:33 +08:00
f065de4e88 Fix FBGEMM integration (#18002) Michael Goin 2025-05-12 19:02:07 -04:00
dc9905368d [V1][Spec Decode] Eagle unit tests (#17350) wwl2755 2025-05-12 16:01:17 -07:00
ebab1ac37c [CI] Make JSON output tests less likely to fail (#17859) Russell Bryant 2025-05-12 18:31:54 -04:00
2b0db9b0e2 Enable standard language model for torhc nightly (#18004) Yang Wang 2025-05-12 14:00:04 -07:00
195adb47c0 [Chore] Remove unused method (#18024) Robert Shaw 2025-05-12 16:59:47 -04:00
302f3aca7e [v1][KVCacheManager] Change prefix caching metric from counting blocks to counting tokens (#18003) Chen Zhang 2025-05-13 04:46:12 +08:00
e9c730c9bd Enabling "Weight Loading Multiple GPU Test - Large Models" (#18020) Alexei-V-Ivanov-AMD 2025-05-12 15:05:33 -05:00
289199feb6 [Core] Use platform-agnostic device control for DP engine core (#17245) Jade Zheng 2025-05-13 03:09:16 +08:00
b9fd0d7a69 [CI/Build] Fix TPU V1 Test mixed use of & and && across tests (#17968) Carol Zheng 2025-05-12 12:06:59 -07:00
72a3f6b898 Construct KVTransferConfig properly from Python instead of using JSON blobs without CLI (#17994) Harry Mellor 2025-05-12 19:25:33 +01:00
98ea35601c [Lora][Frontend]Add default local directory LoRA resolver plugin. (#16855) Jonathan Berkhahn 2025-05-12 10:39:10 -07:00
d19110204c [P/D] NIXL Integration (#17751) Robert Shaw 2025-05-12 12:46:16 -04:00
05a4324f8e Initialize the delta tool call fields explicitly (#17340) Maximilien de Bayser 2025-05-12 10:28:58 -03:00
7ea6cb28b2 [Misc] Improve modelscope import error (#17983) Jee Jee Li 2025-05-12 18:46:45 +08:00
9fbf2bfbd5 Correcting testcases in builkite job for IBM Power (#17675) Aaruni Aggarwal 2025-05-12 13:41:55 +05:30
3a5ea75129 [Feature] Support DeepSeekV3 Function Call (#17784) Xu Wenqing 2025-05-12 15:45:21 +08:00
891b9d33de [Fix] Benchmark "EngineClient" has no attribute "model_config" (#17976) Brayden Zhong 2025-05-12 01:55:53 -04:00
430783018c [Bugfix][TPU] Use np array when updating cache slot_mapping (#17971) Siyuan Liu 2025-05-11 21:58:33 -07:00
19a3c78d1f [Bugfix] Fix pydantic.errors.PydanticUserError (#17962) Li Wang 2025-05-12 12:58:23 +08:00
ada50aa295 [bugfix] fix the wrong parser (#17958) Reid 2025-05-12 12:58:02 +08:00
08bf784078 [Bugfix] validate grammar and throw 400 error instead of crashing the engine when xgrammar validation fails (#17623) Cheng Kuan Yong Jason 2025-05-12 09:06:10 +08:00
d45fe333fb [misc] add instructions on how to install nvshmem/pplx/deepep (#17964) youkaichao 2025-05-12 09:02:39 +08:00
021c16c7ca [Model] Broadcast Ovis2 implementation to fit Ovis1.6 (#17861) Isotr0py 2025-05-12 08:56:30 +08:00
7de18d541b [BUG] [ROCm] [MLA] Fix variable name bug due to change in variable name in PR #17483 (#17961) TJian 2025-05-12 00:14:30 +08:00
a810b5b088 [BugFix] [ROCm]: Bugfix and handle addition case of input for rocm_aiter_rms_norm (#17857) TJian 2025-05-11 19:17:11 +08:00
009b3d5382 [Misc] not show --model in vllm serve --help (#16691) Reid 2025-05-11 16:47:58 +08:00
e4b8713380 [New Model]: nomic-embed-text-v2-moe (#17785) wang.yuqi 2025-05-11 15:59:43 +08:00
06c0922a69 [FP8][ROCm][Attention] Enable FP8 KV cache on ROCm for V1 (#17870) Gregory Shtrasberg 2025-05-11 03:58:45 -04:00
cd3edfc908 [Misc] Add compressed-tensors NVFP4A16 emulation support (#17914) Dipika Sikka 2025-05-11 03:58:38 -04:00
9cea90eab4 [Frontend] Add /classify endpoint (#17032) Frieda Huang 2025-05-11 03:57:07 -04:00
d1110f5b5a [doc] update lora doc (#17936) Reid 2025-05-11 15:56:21 +08:00
8132365b74 [Bugfix]: v1 engine - consider lora adapters in allowed_token_ids (#17855) Ben Browning 2025-05-11 03:53:58 -04:00
eea22a56ab fix amd triton mla path (#17871) Shiyan Deng 2025-05-11 00:53:31 -07:00
9112155283 [Perf] Use small max_num_batched_tokens for A100 (#17885) Kuntai Du 2025-05-11 00:53:23 -07:00
90d0a74b60 [Bugfix] Add revision to transformers.Auto*.from_pretrained processors (#17948) xinli-centml 2025-05-11 03:52:44 -04:00
d74e5f37bc [Kernel] fp4 marlin kernel (#17687) Jinzhen Lin 2025-05-11 10:58:49 +08:00
ca66a1674c [v1] Rename specialized_manager.py to single_type_kv_cache_manager.py (#17946) Chen Zhang 2025-05-11 07:14:12 +08:00
