Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

7b5575fa7d [Bug] Fix vLLM config is not set error (#29999) Wentao Ye 2025-12-05 16:42:12 -05:00
77e4472809 let draft model follow target model's config_format (#30152) Bangsheng Tang 2025-12-05 13:33:42 -08:00
962d703818 [Bugfix][llama4_eagle] Fix missing 'lm_head' attribute (#29926) Divakar Verma 2025-12-05 13:57:26 -06:00
e23ca3a0e8 [CI] Re-use whisper_client for all tests (#30148) Nicolò Lucchesi 2025-12-05 20:47:37 +01:00
3633035a3f [Misc] Rename CohereForAI references to CohereLabs (#30147) Russell Bryant 2025-12-05 14:41:40 -05:00
bff78310d9 [Enc-Dec] Fix OOT tokenizer issue (#30144) Nicolò Lucchesi 2025-12-05 20:23:33 +01:00
adb315060c [KVConnector][Feature] Support KV connector cache reset via /reset_prefix_cache (#27170) Tova Movshovitz 2025-12-05 20:33:26 +02:00
4e26d3b09e [Compile] Conditional compilation. Introduce compile_ranges (#24252) Ilya Markov 2025-12-05 19:17:32 +01:00
66e674cdd5 [Attention][UX][1/N] Add AttentionConfig and change attention env vars to CLI arguments (#26315) Matthew Bonanni 2025-12-05 12:48:43 -05:00
dff0a2b394 [NIXL] Add remote_request_id to kv_transfer_params (#29665) Mark McLoughlin 2025-12-05 17:43:48 +00:00
dc264bcea1 [BugFix] Eagerly abort cancelled final-step requests (#29987) Nick Hill 2025-12-05 09:28:32 -08:00
78c44fd722 [NIXL] Small cleanup of unused variables (#29618) Nicolò Lucchesi 2025-12-05 18:17:36 +01:00
e7296b08da [bugfix] Pass globals to aot_compiled function (#29428) Angela Yi 2025-12-05 08:54:26 -08:00
da7bc54ea8 [responsesAPI][5] ResponsesParser with tools for full MCP python loop (#29798) Andrew Xia 2025-12-05 08:11:50 -08:00
949a6a19d2 [NIXL] Add compatibility checking to NIXL KV connector handshake (#29503) Mark McLoughlin 2025-12-05 14:52:45 +00:00
2c174420f5 Reduce validation to a warning (#28749) Alec S 2025-12-05 09:02:49 -05:00
0d8a7d8a26 [Compressed Tensors] Add XPU wNa16 support (#29484) Yi Liu 2025-12-05 22:02:09 +08:00
9843e332da [CPU][Perf] Add fast vectorized exp impl from Arm Optimized Routines (#30068) Elham 2025-12-05 08:09:20 -05:00
b7d85cf25c [CI] Have pre-commit comment on a PR if pre-commit was not used (#30077) Harry Mellor 2025-12-05 13:03:45 +00:00
c2894d3883 [Feature] Add Layer-wise NVTX Support (#29990) Max Hu 2025-12-05 06:20:07 -05:00
3628bcaaf2 [ROCm][MXFP4] Infer w4a4 quant method in rocm aiter fused moe (#29775) Zhiwei 2025-12-05 19:01:16 +08:00
b73b158ab0 [Bugfix] Fix parse_output_message crash on commentary with no recipient (#29972) strinczer 2025-12-05 10:51:12 +00:00
7ae13c66ba [typing] fix type (#29964) Ning Xie 2025-12-05 18:46:08 +08:00
f16356fe36 [bench] Support common prefix len config (for decode-only bench) (#29934) Ming Yang 2025-12-05 02:26:52 -08:00
65ee97288a [BugFix] Adding env variable to disable async grammar compilation (#29996) Alec S 2025-12-05 03:49:37 -05:00
62b3333448 [Frontend] Remove deprecated -O.xx flag (#29991) Yanan Cao 2025-12-05 00:47:22 -08:00
feecba09af [CI/Build][AMD] Use float16 in test_reset_prefix_cache_e2e to avoid accuracy issues (#29997) rasmith 2025-12-05 02:42:25 -06:00
6038b1b04b [Frontend][Model] Add 'float16' to possible mamba cache dtype values, override mamba SSM cache dtype value for NemotronH (#29978) amitz-nv 2025-12-05 10:34:33 +02:00
60a66ea2dc [DOC]: Add kthena to integrations (#29931) Tiger Xu / Zhonghu Xu 2025-12-05 16:11:03 +08:00
06579f9a82 [AMD][CI] Add ray[default] Dependency On ROCm To Pass v1/metrics/test_engine_logger_apis.py (#30110) Micah Williamson 2025-12-05 00:48:23 -06:00
6e865b6a83 Refactor example prompts fixture (#29854) Chukwuma Nwaugha 2025-12-05 06:44:32 +00:00
d698bb382d [Bugfix] Correct num_q_heads on DCP for Flashinfer backends (#29487) Jingchun Gao 2025-12-05 13:54:31 +08:00
2c22c4ca2d [ROCm][CI] Increase the memory threshold for test_deep_sleep_fp8_kvcache (#30104) Charlie Fu 2025-12-04 22:51:44 -06:00
5867819eaf Do not guard during noop elimination pass (#30095) Laith Sakka 2025-12-04 20:10:12 -08:00
7c9b2c8f81 [ROCm][CI] Add jiwer dependency for testing (#30081) Charlie Fu 2025-12-04 21:34:51 -06:00
0098a6e3da [PCP&DCP] move CUDAGraph check for PCP&DCP to the check func of platforms (#29952) Qiu 2025-12-05 10:40:51 +08:00
befb59e5b1 [Model] Add Holo2 reasoning parser (#30048) Hubert de La Jonquiere 2025-12-05 03:38:45 +01:00
aaddc9c82a [CI] fix silent error in nightly wheel index generation script, add generation time to HTML index (#30060) Shengqi Chen 2025-12-05 08:48:59 +08:00
263c38d74d [CI/Build] Update batch invariant test trigger (#30080) Zhewen Li 2025-12-04 16:42:37 -08:00
bcf43ab1f3 [CI/Build][AMD] Add Llama4 Maverick FP8 to AMD CI (#28695) Zhewen Li 2025-12-04 16:07:20 -08:00
4470ee2f90 [Perf] Enable separate shared_experts stream only for CUDA (#30085) Alexander Matveev 2025-12-04 19:03:17 -05:00
690cc3ef20 docs: update metrics design doc to use new vllm:kv_cache_usage_perc (#30041) TimWang 2025-12-05 07:37:14 +08:00
1f0d184590 [aot_compile]change VLLM backend to read fake args from example_value (#29104) Laith Sakka 2025-12-04 14:33:45 -08:00
c8ab988b15 [BugFix] Fix DBO assert assert B_block_table == B_q (#29933) Lucas Wilkinson 2025-12-04 14:48:54 -05:00
48a5fff66e [Bugfix] Missing tokens in return_token_ids when tool parsers is enabled in streaming mode (#29074) Peng-YM 2025-12-05 03:09:39 +08:00
1119f6e47a Abstract eplb algo (#26471) Mercykid-bash 2025-12-05 03:09:09 +08:00
e10c84e06a Access partial_rotary_factor from rope_parameters (#29966) Harry Mellor 2025-12-04 18:42:49 +00:00
ece2825a29 [KVConnector] Remove v0-related kv connector components such as kv pipe and kv lookup buffer (#29705) Kuntai Du 2025-12-05 02:20:48 +08:00
652ba93da3 [Bugfix] Fix FP8 MoE LoRA (#29890) Jee Jee Li 2025-12-05 02:17:49 +08:00
6dcb07f676 support qwen3-vl handle requests with embeddings (#30037) Tao Yun 2025-12-05 01:34:06 +08:00
46cbbca05c [CI][DCP][Perf] reduce DCP CI execution time (#29858) Qiu 2025-12-05 01:28:21 +08:00
b286a311c2 [Chore] Deprecate merge_by_field_config arg (#30035) Cyrus Leung 2025-12-05 01:21:24 +08:00
990f806473 [Doc] clarify nightly builds in developer docs (#30019) Shengqi Chen 2025-12-05 00:28:37 +08:00
5b4b42c0b6 Mark DBO test as flaky on b200 for Distributed B200 test (#29913) Doug Smith 2025-12-04 10:38:03 -05:00
cc050558f4 [Model Runner V2] Implement get_num_sampled_and_rejected kernel (#30029) Woosuk Kwon 2025-12-04 07:19:42 -08:00
5c32a06a04 Use Transformers v5 RoPE standardisation and validation (#30046) Harry Mellor 2025-12-04 14:54:28 +00:00
dd97e047e0 Fix broken multiline assert in LoRAModelManager.register_module (#30032) Yongtao Huang 2025-12-04 22:04:42 +08:00
9998ea5b57 Delete HF version of Phi 4 MM (#30049) Harry Mellor 2025-12-04 13:44:50 +00:00
74c4d80c6c [Model][6/N] Improve all pooling task | Support chunked prefill with ALL pooling (#27145) wang.yuqi 2025-12-04 21:44:15 +08:00
1b7c7f5159 [release] install regex (#30008) Kevin H. Luu 2025-12-04 03:18:29 -08:00
6796ce8bdb [Bugfix] Fix the issue with interleaved thinking when using streaming (#30033) Chauncey 2025-12-04 19:11:59 +08:00
e96a6a6dca [ROCm][CI][Bugfix] Fixing the Multi-Modal Models Test (Extended) 1 group (#30013) Andreas Karatzas 2025-12-04 05:00:16 -06:00
6366c098d7 Validating Runai Model Streamer Integration with S3 Object Storage (#29320) Noa Neria 2025-12-04 12:04:43 +02:00
842aba501d [P/D] Introduce Mooncake Transfer Engine as kv_connector (#24718) dtc 2025-12-04 17:51:36 +08:00
f2f4cea6cc [CI/Build][AMD] Skip test on test_hybrid_attention_mamba_tensor_shapes on ROCm, requires FLASHINFER (#29995) rasmith 2025-12-04 03:30:22 -06:00
dfdda96747 [Core] Remove forced None assignment for deprecated PassConfig flags (#29994) Arpit Khandelwal 2025-12-04 04:15:04 -05:00
ffdd18111b Add DeepSeek-V3.2 tool parser. (#29848) Xu Wenqing 2025-12-04 16:46:34 +08:00
b8a6ae4158 [ROCm] add fallback for aiter fp8 decode mla (#30005) Ye (Charlotte) Qi 2025-12-04 00:45:57 -08:00
899e2ef558 [Core] Fix standalone runs of test_reset_prefix_cache_e2e (#29899) Mark McLoughlin 2025-12-04 08:22:03 +00:00
68eb5c8d97 [Misc] Move functions into PoolingMetadata (#30027) Cyrus Leung 2025-12-04 16:21:19 +08:00
5430e110c0 [CI][AMD] Match Main CI Behavior By Skipping test_eplb_spec_decode In AMD CI (#30006) Micah Williamson 2025-12-04 02:20:54 -06:00
3f1b03739a [ROCm] [Bugfix] compute_attn_mask_seqlen for qwen3 omni (#29974) TJian 2025-12-04 16:20:24 +08:00
9aa33a74b0 [Rocm][CI] Fix test_speculator_eagle3 by skipping the CompressedTensorw4a16 Model (#30001) Charlie Fu 2025-12-04 01:52:28 -06:00
fd68e909db [docs] Remove _total from counter metrics names (#30028) CYJiang 2025-12-04 15:46:15 +08:00
404fc4bfc0 [Frontend] refactor harmony utils output message parsing (#29820) daniel-salib 2025-12-03 23:36:57 -08:00
82a64b3d8f [Bugfix] fixed deepseekv32 tool calling error (#30025) Chauncey 2025-12-04 15:12:12 +08:00
9ae2f60374 [Misc] Various cleanups for MM input processing (#29970) Cyrus Leung 2025-12-04 14:22:20 +08:00
80f8af4b2f Fix error while downloading dependencies for CPU backend (#29797) Jianwei Mao 2025-12-04 14:04:44 +08:00
8aaa81b35f [KVConnector] remove unused code (the model aware kv ops class) (#29709) Kuntai Du 2025-12-04 14:00:52 +08:00
fca3f46658 [Frontend] Fixes anthropic /v1/messages streaming not containing input_tokens on first chunk (#29971) Benjamin Bartels 2025-12-04 05:50:27 +00:00
28097d5638 [Bugfix][CPU] Fix CPU KV cache fallback memory allocation (#29604) gausah01 2025-12-04 05:01:15 +00:00
dd38ba3a26 [Bugfix] Fix adapter_enabled IMA (#29977) Jee Jee Li 2025-12-04 12:51:15 +08:00
5f91cdda75 [Misc] Add docker build env for Ascend NPU (#30015) Li Wang 2025-12-04 11:53:00 +08:00
33a3d6c798 fix LoRA-related examples (#29956) Iceber Gu 2025-12-04 11:48:30 +08:00
c493b9d092 [CI/Build] Add MM code path to Examples Test (#29986) Zhewen Li 2025-12-03 19:21:45 -08:00
ad32e3e19c enable multi-node in external launcher mode (#29833) Xieyang Xu 2025-12-03 17:02:02 -08:00
1109f98288 [CI] fix docker image build by specifying merge-base commit id when downloading pre-compiled wheels (#29930) Shengqi Chen 2025-12-04 06:08:19 +08:00
b5407869c8 [Bugfix] Respect VLLM_CONFIGURE_LOGGING value (#28671) Elizabeth Thomas 2025-12-03 16:00:52 -06:00
2902c34826 [Kernels] Remove BatchedTritonOrDeepGemmExperts and default fallback to Triton (#29929) bnellnm 2025-12-03 15:49:00 -05:00
ac1886588f [CI] Fix re import error (#29973) Wentao Ye 2025-12-03 15:16:54 -05:00
2fc5d6e0d7 Fix LLMEngine.del dp_group cleanup condition (#29954) Yongtao Huang 2025-12-04 04:14:44 +08:00
afe9eb408e [Bugfix] Fix flashinfer ar+norm kernel not available issue (#29960) elvischenv 2025-12-04 02:50:53 +08:00
19bee6d12d [Performance][DP/EP] Add silu_mul_per_token_group_quant_fp8_colmajor kernel (#29470) Varun Sundar Rabindranath 2025-12-03 13:04:59 -05:00
dd5d1ef780 [Bugfix] Mistral tool parser streaming update (#19425) avigny 2025-12-03 18:45:31 +01:00
d1f7392c5f [ROCm][CI] Fix v1/logits_processors failure on ROCm (#29927) Micah Williamson 2025-12-03 11:17:07 -06:00
9ae3c55b10 SigLIP example add chat_template (#29902) Yu Jiaqi 2025-12-04 00:12:58 +08:00
9bcf92295a [Core] Add xxHash as a high-performance hash option for accelerating prefix caching (#29163) Lumis Chen 2025-12-04 00:06:57 +08:00
5aa9b09040 [CI/Build][AMD] Skip test_shared_storage_connector_hashes in test_shared_storage_connector.py due to hipErrorLaunchFailure when calling .cpu() (#29839) rasmith 2025-12-03 08:56:35 -06:00
1bb17ecb39 [CPU Backend] [Doc]: Update Installation Docs for CPUs (#29868) ioana ghiban 2025-12-03 14:33:50 +01:00
15b1511a15 [GPU Backend] [Doc]: Remove duplicate statements on missing GPU wheels. (#29962) ioana ghiban 2025-12-03 13:56:47 +01:00
