Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

432870829d [Bugfix] Fix missing per_act_token parameter in compressed_tensors_moe (#20509) Lucia Fang 2025-07-06 12:08:30 +08:00
f73d02aadc [BUG] Fix #20484. Support empty sequence in cuda penalty kernel (#20491) Vadim Gimpelson 2025-07-06 06:38:02 +04:00
c5ebe040ac test_attention compat with coming xformers change (#20487) Jeremy Reizenstein 2025-07-06 03:37:59 +01:00
8d763cb891 [Misc] remove unused import (#20517) Reid 2025-07-06 10:17:06 +08:00
cf4cd53982 [Misc] Add logger.exception for TPU information collection failures (#20510) Reid 2025-07-05 22:24:32 +08:00
32c9be2200 [v1] Re-add fp32 support to v1 engine through FlexAttention (#19754) Isotr0py 2025-07-05 17:41:10 +08:00
8aeaa910a2 Fix unknown attribute of topk_indices_dtype in CompressedTensorsW8A8Fp8MoECutlassMethod (#20507) Lucia Fang 2025-07-05 14:03:20 +08:00
906e05d840 [Misc] Remove the unused LoRA test code (#20494) Jee Jee Li 2025-07-05 13:48:16 +08:00
ef9a2990ae [doc] small fix (#20506) Reid 2025-07-05 11:56:39 +08:00
7e90870491 [Misc] Add security warning for development mode endpoints (#20508) Reid 2025-07-05 11:52:13 +08:00
d3f05c9248 [Doc] fix mutltimodal_inputs.md gh examples link (#20497) Guy Stone 2025-07-05 00:41:35 +01:00
c108781c85 [CI Bugfix] Fix pre-commit failures on main (#20502) Michael Goin 2025-07-05 06:17:30 +09:00
3d184b95b8 [feat]: CUTLASS block scaled group gemm for SM100 (#19757) Duncan Moss 2025-07-04 11:58:04 -07:00
2f35a022e6 Enable V1 for Hybrid SSM/Attention Models (#20016) Thomas Parnell 2025-07-04 19:46:53 +02:00
ffe00ef77a [Misc] Small: Remove global media connector. Each test should have its own test connector object. (#20395) Chenheli Hua 2025-07-04 08:15:03 -07:00
5561681d04 [CI] add kvcache-connector dependency definition and add into CI build (#18193) Peter Pan 2025-07-04 21:49:18 +08:00
fbd62d8750 [Doc] Fix classification table in list of supported models (#20489) Cyrus Leung 2025-07-04 21:08:02 +08:00
2e26f9156a [Model][3/N] Automatic conversion of CrossEncoding model (#20168) wang.yuqi 2025-07-04 20:47:39 +08:00
9e5452ee34 [Bug][Frontend] Fix structure of transcription's decoder_prompt (#18809) sangbumlikeagod 2025-07-04 20:28:07 +09:00
0e3fe896e2 Support Llama 4 for fused_marlin_moe (#20457) Michael Goin 2025-07-04 16:55:10 +09:00
1caca5a589 [Misc] Add SPDX-FileCopyrightText (#20428) Jee Jee Li 2025-07-04 15:40:42 +08:00
783921d889 [Perf] Optimize Vectorization Utils for Int 8 Quantization Kernels (#20331) Wentao Ye 2025-07-04 03:06:24 -04:00
4a98edff1f [Structured Outputs][V1] Skipping with models doesn't contain tokenizers (#20365) Aaron Pham 2025-07-04 03:05:49 -04:00
a7bab0c9e5 [Misc] small update (#20462) Reid 2025-07-04 11:33:44 +08:00
25950dca9b Add ignore consolidated file in mistral example code (#20420) 汪志鹏 2025-07-04 10:55:07 +08:00
a4113b035c [Platform] Add custom default max tokens (#18557) Gabriel Marinho 2025-07-03 23:50:17 -03:00
7e1665b089 [Misc] Change warn_for_unimplemented_methods to debug (#20455) Michael Goin 2025-07-04 11:35:08 +09:00
8d1096e7db [Bugfix] Register reducer even if transformers_modules not available (#19510) Seiji Eicher 2025-07-03 15:08:12 -07:00
8d775dd30a [Misc] Fix Unable to detect current VLLM config. Defaulting to NHD kv cache layout warning (#20400) Nicolò Lucchesi 2025-07-03 23:56:09 +02:00
78fe77534b [Kernel] Enable fp8 support for pplx and BatchedTritonExperts. (#18864) bnellnm 2025-07-03 17:55:40 -04:00
2f2fcb31b8 [Misc] Remove _maybe_ignore_quant_config from GLM4.1v (#20432) (tag: v0.9.2rc1) Yuxuan Zhang 2025-07-04 05:41:13 +08:00
1dba2c4ebe [Misc] adjust for ipv6 for mookcacke url parse (#20107) Ning Xie 2025-07-04 04:27:17 +08:00
71d6de3a26 [Misc] Clean up InternVL family config registration (#19992) Isotr0py 2025-07-04 04:01:47 +08:00
536fd33003 [CI] Trimming some failing test groups from AMDPRODUCTION. (#20390) Alexei-V-Ivanov-AMD 2025-07-03 10:21:31 -05:00
619b9f5c7e [Frontend] fix duplicate output for bench subcmd (#20446) Reid 2025-07-03 23:02:06 +08:00
d1b689c445 [Bugfix] Fix flaky test_streaming_response test (#20363) Nicolò Lucchesi 2025-07-03 16:46:24 +02:00
9854dc9040 [Frontend] improve vllm bench <bench_type> --help display (#20430) Reid 2025-07-03 22:22:16 +08:00
ff5c60fad8 [Misc] Automatically tag PRs to add new models (#20222) Isotr0py 2025-07-03 22:11:03 +08:00
6f1229f91d [Model][2/N] Automatic conversion of CrossEncoding model (#19978) wang.yuqi 2025-07-03 21:59:23 +08:00
1819fbda63 [Quantization] Bump to use latest bitsandbytes (#20424) Jee Jee Li 2025-07-03 21:58:46 +08:00
7f0367109e [CI/Build][CPU] Enable cross compilation in CPU release pipeline (#20423) Li, Jiang 2025-07-03 20:26:12 +08:00
fb14d53cf6 [Kernel] refactor cpu worker v0 cache dtype (#20080) Ning Xie 2025-07-03 16:39:14 +08:00
b024a42e93 [Core] Move multimodal placeholder from chat utils to model definition (#20355) Cyrus Leung 2025-07-03 16:18:30 +08:00
cb97f2bfc5 [Docs] Replace two list with tables in intel_gaudi.md (#20414) Michael Yao 2025-07-03 15:48:25 +08:00
359200f6ac [doc] fix link (#20417) Reid 2025-07-03 15:21:57 +08:00
220aee902a [Misc] Add rules to label Speculative Decoding Related PRs (#20406) Lifans 2025-07-02 23:56:49 -07:00
67d25eca05 [Tests] Update online DP tests to verify that requests are balanced (#20157) Nick Hill 2025-07-03 07:49:13 +01:00
363528de27 [Feature] Support MiniMax-M1 function calls features (#20297) qscqesze 2025-07-03 14:48:27 +08:00
4ff61ababa [TPU] Add a case to cover RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w8a8 (#20385) QiliangCui 2025-07-02 23:46:41 -07:00
0ec3779df7 [Bugfix][CI/CD][CPU] Fix CPU CI tests (#20383) Li, Jiang 2025-07-03 11:11:36 +08:00
b616f6a53d [Misc] Small: Fix video loader return type annotations. (#20389) Chenheli Hua 2025-07-02 20:10:39 -07:00
2e25bb12a8 [Bugfix] Fix import of CutlassExpertsFp8 in compressed_tensors_moe.py (#20381) bnellnm 2025-07-02 22:07:43 -04:00
9965c47d0d Enable CPU nightly performance benchmark and its Markdown report (#18444) Louie Tsai 2025-07-02 18:50:25 -06:00
059d4cdb49 [BugFix] Fix DP headless mode arg validation (#20398) Nick Hill 2025-07-03 01:15:32 +01:00
bdb84e26b0 [Bugfix] Fixes for FlashInfer's TORCH_CUDA_ARCH_LIST (#20136) Tyler Michael Smith 2025-07-02 20:15:11 -04:00
3dd359147d [Docs] Update EAGLE example (#20375) Nicolò Lucchesi 2025-07-03 02:13:51 +02:00
657f2f301a [DP] Support external DP Load Balancer mode (#19790) Nick Hill 2025-07-02 18:21:52 +01:00
a1aafc827a [ROCm][FEAT] Enable Full Graph Mode in AITER MLA V1 Attn Backend (Decode Phase only) (#20254) vllmellm 2025-07-03 00:25:46 +08:00
139508a418 [Misc] add handler HF_TOKEN is emptry string (#20369) rongfu.leng 2025-07-03 00:14:31 +08:00
d265414dbc [Minor] Clean up incorrect comment in test (#20382) Nick Hill 2025-07-02 17:13:37 +01:00
48fb076cbc [V1] LogitsProcessor programming model (#16728) afeldman-nm 2025-07-02 12:10:42 -04:00
c1909e7e8c [Kernels] MoE refactor (#19636) bnellnm 2025-07-02 09:08:27 -04:00
b95877509b Documentation update tool_calling: mapping back to function from response (#20373) cronoik-inceptionai 2025-07-02 16:55:49 +04:00
706ff13224 [Model] Adds support for SlimMoE models Phi-tiny-MoE-instruct (#20286) zichongli5 2025-07-02 05:54:12 -07:00
ccbfb1d1c9 [Bugfix] Fix the max_seq_len limit of 16384 for DeepSeek models (#20322) WangHuaqiang 2025-07-02 20:53:36 +08:00
9e5552aa13 [NVIDIA] Support Cutlass w8a8 FP8 for Blackwell Geforce GPUs (sm120) (#17280) Joonchen Liau 2025-07-02 20:47:19 +08:00
0c600b9ab6 [Build/CI] Automatically tag DeepSeek related PRs (#20370) Lu Fang 2025-07-02 20:02:43 +09:00
e303dcf523 [Model] Add Ernie4.5 and Ernie4.5MoE Model Support (#20220) CSWYF3634076 2025-07-02 18:37:01 +08:00
ae9c4d416f [Docs] Make TPU ref prettier in google_tpu.md (#20356) Michael Yao 2025-07-02 17:04:08 +08:00
d853520b3e [Docs] Fix indentations for 2-level items in deprecation_policy.md (#20352) Michael Yao 2025-07-02 14:50:31 +08:00
ba51aea65e [Bugfix] Keye-VL compatibility with tok_kwargs (#20058) (#20353) Cyrus Leung 2025-07-02 14:46:59 +08:00
8452946c06 [Model][VLM] Support Keye-VL-8B-Preview (#20126) Kwai-Keye 2025-07-02 14:35:04 +08:00
2e7cbf2d7d [Frontend] Support configurable mm placeholder strings & flexible video sampling policies via CLI flags. (#20105) Chenheli Hua 2025-07-01 23:34:03 -07:00
7da296be04 [TPU] kv cache update kernel supports dynamic grid (#20235) Chengji Yao 2025-07-01 23:33:37 -07:00
b205e8467d [Doc][TPU] Add models and features supporting matrix. (#20230) QiliangCui 2025-07-01 23:33:20 -07:00
be0cfb2b68 fix[Docs]: link anchor is incorrect #20309 (#20315) yyzxw 2025-07-02 14:32:34 +08:00
1a03dd496b [Bugfix] Fix dynamic rotary embedding (#20343) Cyrus Leung 2025-07-02 14:31:26 +08:00
27b8017636 [FIX][Intel GPU]fix ipex flash_attn_varlen_func api missing parameter (#20348) Kunshang Ji 2025-07-02 13:26:40 +08:00
9ec1e3065a [Misc][Doc] Add missing comment for LLM (#20285) Lifans 2025-07-01 19:04:24 -07:00
9dae7d46bf [Refactor] Remove Unused Env VLLM_ENABLE_MOE_ALIGN_BLOCK_SIZE_TRITON (#20334) Wentao Ye 2025-07-01 22:03:43 -04:00
7058d7dd5d [Refactor] Remove duplicate find_free_port (#20333) Wentao Ye 2025-07-01 22:03:07 -04:00
a0389e0554 [UT][intel GPU] use current_platform instead of device hardcode in v1 tests (#20169) Liangliang Ma 2025-07-02 09:06:04 +08:00
3be8d312a2 [Kernel][Bugfix] Fixup some warnings in nvfp4_blockwise_moe when CUDA < 12.8 (#20324) Tyler Michael Smith 2025-07-01 21:05:47 -04:00
3abfe22154 Enable group size 64 for Machete (#20290) czhu-cohere 2025-07-01 18:05:44 -07:00
e81fbefe8a [Refactor] Refactor import utils (#20269) Wentao Ye 2025-07-01 21:05:42 -04:00
9290de5667 remove unused variables in marlin_template.h (#20236) 周周周 2025-07-02 08:51:52 +08:00
7f280d69c9 [Optimization] Cache sampled token ids in model runner (#20291) Woosuk Kwon 2025-07-01 11:01:31 -07:00
02cabff207 [V1] [ROCm] Enable EP with AITER Fused MoE (#20270) TJian 2025-07-01 09:48:30 -07:00
3d19d47d91 [Frontend] Expand tools even if tool_choice="none" (#17177) Shintarou Okada 2025-07-02 01:47:38 +09:00
8acb4badee [CUDA graphs] Enable full cuda graphs with FA3 AoT scheduling (#20301) Woosuk Kwon 2025-07-01 09:07:36 -07:00
314af8617c [Docs] Update transcriptions API to use openai client with stream=True (#20271) Nicolò Lucchesi 2025-07-01 17:47:13 +02:00
0e96cc9b7e [Misc] Minor refactoring for scheduler (#20299) Woosuk Kwon 2025-07-01 07:55:32 -07:00
ecad851cbd [Model]Add Tencent HunYuanMoEV1 Model Support (#20114) aiyiwang2025 2025-07-01 22:28:13 +08:00
ed70f3c64f Add GLM4.1V model (Draft) (#19331) Yuxuan Zhang 2025-07-01 20:48:26 +08:00
650d5dbd04 [Misc] Minor refactor of NIXL background handshake (#20068) Nicolò Lucchesi 2025-07-01 13:40:14 +02:00
9025a9a705 [Quant] [Bugfix] Fix quantization config matching with hf_to_vllm_mapper (#20046) Kyle Sayers 2025-07-01 06:20:34 -04:00
c05596f1a3 [Perf] Validate @config in pre-commit instead of dynamically (#20200) Lionel Villard 2025-07-01 05:10:28 -04:00
787b13389e [doc] fix the incorrect logo in dark mode (#20289) Reid 2025-07-01 16:18:09 +08:00
96453cfa83 [BugFix][V1][ROCm] Triton MLA uses V0 backend on V1 engine (#19067) TY-AMD 2025-07-01 16:12:19 +08:00
b1c1fe35a5 [Misc] remove redundant char (#20287) Kebe 2025-07-01 15:33:22 +08:00
