Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

ab714131e4 [Doc] Update compatibility matrix for pooling and multimodal models (#21831) Cyrus Leung 2025-07-29 21:29:51 +08:00
755fa8b657 [KVCache] Make KVCacheSpec hashable (#21791) Chen Zhang 2025-07-29 04:58:29 -07:00
2470419119 [Docs] Fix the outdated URL for installing from vLLM binaries (#21523) Kay Yan 2025-07-29 19:56:27 +08:00
61a6905ab0 [Model] Refactor JambaForCausalLM (#21394) Jee Jee Li 2025-07-29 18:25:07 +08:00
37efc63b64 [V0 deprecation] Guided decoding (#21347) Reza Barazesh 2025-07-29 03:15:30 -07:00
a4528f0cac [Model]: Fused MoE for nomic-embed-text-v2-moe (#18321) Isotr0py 2025-07-29 18:13:27 +08:00
a2480251ec [Doc] Link to RFC for pooling optimizations (#21806) Cyrus Leung 2025-07-29 14:53:18 +08:00
7234fe2685 [Misc] Rework process titles (#21780) Nick Hill 2025-07-29 06:14:47 +01:00
f1e2c095ec Migrate InternVLImageInputs and InternVLVideoInputs to TensorSchema (#21684) Benji Beck 2025-07-28 22:09:45 -07:00
12a223ef9b [AMD][CI/Build][Bugfix] Guarding CUDA specific functions by ifndef ROCM (#21766) Gregory Shtrasberg 2025-07-28 23:35:37 -04:00
e18f085103 skip fusedmoe layer for start_load_kv (#21378) Calvin Chen 2025-07-29 09:59:44 +08:00
afa2607596 [CI] Parallelize Kernels MoE Test (#21764) Michael Goin 2025-07-28 21:56:24 -04:00
48b763d6b5 [Refactor] Merge Compressed Tensor FP8 CompressedTensorsW8A8Fp8MoEMethod and CompressedTensorsW8A8Fp8MoECutlassMethod (#21775) Wentao Ye 2025-07-28 21:47:21 -04:00
947e982ede [Docs] Minimize spacing for supported_hardware.md table (#21779) Michael Goin 2025-07-28 21:46:39 -04:00
c6c9122d50 [Kernel] SM90 CUTLASS FP8 GEMM: add support for swap AB + kernel tuning (#20396) lyrisz 2025-07-28 16:13:58 -07:00
8aa1485fcf [Perf] Disable chunked local attention by default with llama4 (#21761) Lucas Wilkinson 2025-07-28 18:49:04 -04:00
89ac266b26 [Feat]: Add support for Dynamic Quant 4 bit CPU kleidiai kernels (#17112) Nikhil Gupta 2025-07-28 21:55:15 +01:00
c6f36cfa26 [Bugfix] DeepGEMM is not enabled on B200 due to _lazy_init() (#21472) Clayton Coleman 2025-07-28 16:51:22 -04:00
b18b417fbf Revert "[V1] Exception Handling when Loading KV Cache from Remote Store" (#21778) Kuntai Du 2025-07-28 13:15:18 -07:00
9ba1c88a93 [AMD][CI/Build] Fix the AMD issue caused by inappropriate of symbol exposure (#21647) Lu Fang 2025-07-28 13:11:16 -07:00
e0e58f9729 [Bug] Enforce contiguous input for dynamic_scaled_fp8_quant and static_scaled_fp8_quant (#21773) Wentao Ye 2025-07-28 15:55:48 -04:00
b361f14e39 [AMD][BugFix] Fix omission of wvSplitK kernel for small batch sizes (1-4) due to torch.compile (#21350) rasmith 2025-07-28 14:38:20 -05:00
01c753ed98 update flashinfer to v0.2.9rc2 (#21701) weiliang 2025-07-29 03:31:47 +08:00
94b71ae106 Use metavar to list the choices for a CLI arg when custom values are also accepted (#21760) Harry Mellor 2025-07-28 20:31:10 +01:00
7d44c691b0 [P/D] Log warnings related to prefill KV expiry (#21753) Nick Hill 2025-07-28 19:40:53 +01:00
e17a4d3bf9 [Bugfix] Fix granite speech shape validation (#21762) Cyrus Leung 2025-07-29 02:19:21 +08:00
ec261b0291 [XPU] IPEX-optimized Punica Wrapper on XPU (#21703) Chaojun Zhang 2025-07-29 00:43:37 +08:00
04fe61aa3d [CI/Build] Fix plugin tests (#21758) Cyrus Leung 2025-07-28 23:08:05 +08:00
25708d317a [Bugfix] Mistral crashes on tool with no description (#21167) Michard Hugo 2025-07-28 17:03:35 +02:00
0e18a5d058 [Misc] Reduce logs for model resolution (#21765) Cyrus Leung 2025-07-28 22:59:56 +08:00
34a20c49b3 [Logs] Change flashinfer sampler logs to once (#21759) Michael Goin 2025-07-28 09:59:51 -04:00
31084b3b1f [Bugfix][CI/Build] Update peft version in test requirement (#21729) Isotr0py 2025-07-28 21:17:43 +08:00
bccc43c033 [Bugfix]check health for engine core process exiting unexpectedly (#21728) wuhang 2025-07-28 21:17:31 +08:00
1395dd9c28 [Docs] Add revision date to rendered docs (#21752) Harry Mellor 2025-07-28 14:12:46 +01:00
9ace2eaf35 [Bugfix] Improve JSON extraction in LlamaToolParser (#19024) Keyang Ru 2025-07-28 05:36:58 -07:00
656c24f1b5 [Ernie 4.5] Name Change for Base 0.3B Model (#21735) Anton Vlasjuk 2025-07-28 14:22:32 +02:00
63fe3a700f [PD] let p2p nccl toy proxy handle /chat/completions (#21734) Chauncey 2025-07-28 19:45:50 +08:00
0ae970ed15 [Bugfix] Fix glm4.1v video_grid_thw tensor shape scheme (#21744) Isotr0py 2025-07-28 19:26:49 +08:00
65e8466c37 [Bugfix] Fix environment variable setting in CPU Dockerfile (#21730) Li, Jiang 2025-07-28 19:02:39 +08:00
1b769dccf3 [Bugfix] Fix Ernie4_5_MoeForCausalLM shared experts (#21717) Jee Jee Li 2025-07-28 19:02:25 +08:00
2cc571199b [feature] add log non default args in LLM (#21680) rongfu.leng 2025-07-28 17:21:22 +08:00
a4ed731546 [Model] Prioritize Transformers fallback over suffix matching (#21719) Cyrus Leung 2025-07-28 17:15:31 +08:00
d128d0d554 Migrate KeyeImageInputs and KeyeVideoInputs to TensorSchema (#21686) Benji Beck 2025-07-28 01:16:35 -07:00
a6c050286a [v1][mamba] Added mamba_type into MambaSpec (#21715) Asaf Joseph Gardin 2025-07-28 11:15:55 +03:00
139a7f07bd [BugFix] Fix ChunkedLocalAttention when the hybrid kv-cache is disabled (#21707) Lucas Wilkinson 2025-07-28 03:18:47 -04:00
150d9e6337 [Bugfix] fix max-file-size type from str to int (#21675) Ning Xie 2025-07-28 15:06:52 +08:00
139a97ec56 [Bugfix] Fix shape checking for Fuyu (#21709) Cyrus Leung 2025-07-28 15:05:56 +08:00
18cc33dd60 [bugfix] fix profile impact benchmark results (#21507) rongfu.leng 2025-07-28 13:44:24 +08:00
7656cf4cf3 [Bugfix] [issue-21565] Fix the incompatibility issue with stream and named function calling when Thinking is disabled (#21573) Hongsheng Liu 2025-07-28 13:43:50 +08:00
3ea57a56d9 Migrate Idefics3ImagePixelInputs and Idefics3ImageEmbeddingInputs to … (#21683) Benji Beck 2025-07-27 22:37:23 -07:00
75856bc2cb Migrate GraniteSpeechAudioInputs to TensorSchema (#21682) Benji Beck 2025-07-27 22:37:20 -07:00
304dcdf575 Migrate GLMVImagePixelInputs to TensorSchema (#21679) Benji Beck 2025-07-27 22:36:11 -07:00
88e46c7c8d Migrate Glm4vImageInputs, Glm4vVideoInputs to TensorSchema (#21678) Benji Beck 2025-07-27 22:36:08 -07:00
d8937de4c8 Migrate Gemma3ImagePixelInputs to TensorSchema (#21676) Benji Beck 2025-07-27 22:36:05 -07:00
e626d286f5 [FEAT] [ROCm] [AITER]: Add AITER HIP block quant kernel (#21242) TJian 2025-07-27 22:07:06 -07:00
c7ffe93d9c [Model] Support TP/PP/mamba2 kernel for PLaMo2 (#19674) Shinichi Hemmi 2025-07-28 14:00:47 +09:00
15a72ac478 [V1] Exception Handling when Loading KV Cache from Remote Store (#21534) Adeline 2025-07-28 11:34:17 +08:00
04ff4be310 [Misc] Add fused_moe configs for Qwen3-Coder-480B-A35B-Instruct-FP8 (#21700) Jee Jee Li 2025-07-28 11:12:18 +08:00
93269bb43e Fix GLM tool parser (#21668) Yuxuan Zhang 2025-07-28 10:46:38 +08:00
82acf2184d Fix typo for limit-mm-per-prompt in docs (#21697) Joachim Studnia 2025-07-27 19:45:37 -07:00
86ae693f20 [Deprecation][2/N] Replace --task with --runner and --convert (#21470) Cyrus Leung 2025-07-28 10:42:40 +08:00
8f605ee309 [Attention] Make CutlassMLA the default backend for SM100 (blackwell) (#21626) Alexander Matveev 2025-07-27 16:13:00 -04:00
a9b2a1d704 [Misc] Refactor vllm config str (#21666) Ning Xie 2025-07-28 00:51:44 +08:00
57c22e57f9 Fix CUDA permute/unpermute for use with DeepGemm Moe (#17934) Caleb_Du 2025-07-27 22:08:00 +08:00
bda9d0535f [Refactor] Refactor MOE NVFP4 Code Base: ModelOpt + Compressed Tensor (#21631) Wentao Ye 2025-07-27 08:25:21 -04:00
3d847a3125 [VLM] Add video support for Intern-S1 (#21671) Isotr0py 2025-07-27 19:49:43 +08:00
5f8c9a425e Migrate Florence2ImagePixelInputs to TensorSchema (#21663) Benji Beck 2025-07-27 02:43:02 -07:00
1cbf951ba2 [Misc] add default value for file pattern arg (#21659) Ning Xie 2025-07-27 13:14:51 +08:00
a8936e5193 Refactor: Remove numpy dependency from LoggingStatLogger (#20529) ZiTian.Zhao 2025-07-27 12:06:21 +08:00
01a395e9e7 [CI/Build][Doc] Clean up more docs that point to old bench scripts (#21667) Ye (Charlotte) Qi 2025-07-26 21:02:12 -07:00
971948b846 Handle non-serializable objects in vllm bench (#21665) Huy Do 2025-07-26 20:35:22 -07:00
eed2f463b2 [VLM] Support HF format Phi-4-MM model (#17121) Isotr0py 2025-07-27 11:07:57 +08:00
20950b29fb Migrate ChameleonImagePixelInputs to TensorSchema (#21657) Benji Beck 2025-07-26 19:34:25 -07:00
3339cba3ff Migrate FuyuImagePatchInputs to TensorSchema (#21662) Benji Beck 2025-07-26 19:34:14 -07:00
0b8caf9095 Migrate DeepseekVL2ImageInputs to TensorSchema (#21658) Benji Beck 2025-07-26 19:34:11 -07:00
ccf27cc4d4 Migrate Blip2ImagePixelInputs and Blip2ImageEmbeddingInputs to TensorSchema (#21656) Benji Beck 2025-07-26 19:33:52 -07:00
c657369841 support torch.compile for bailing moe (#21664) Jinzhen Lin 2025-07-27 07:54:32 +08:00
6c66f28fa5 Remove xformers requirement for Mistral-format Pixtral and Mistral3 (#21154) Wenchen Lo 2025-07-26 16:20:29 -07:00
de509ae8eb [NVIDIA] Explicitly disable shuffled weights for flashinfer blockscale moe fp8 kernels (#21411) Kaixi Hou 2025-07-26 07:10:36 -07:00
e7c4f9ee86 [CI/Build][Doc] Move existing benchmark scripts in CI/document/example to vllm bench CLI (#21355) Ye (Charlotte) Qi 2025-07-26 07:10:14 -07:00
9094d11c5d [Bugfix][Apple Silicon] fix missing symbols when build from source on Mac with Apple Silicon (#21380) Yeju Zhou 2025-07-26 22:09:57 +08:00
56e544f24b [Refactor] Remove moe_align_block_size_triton (#21335) Wentao Ye 2025-07-26 10:08:29 -04:00
97d6c30cc9 [BugFix] Fix shared storage connector load kv only load attention layer (#21428) WeiQing Chen 2025-07-26 22:07:40 +08:00
a40a8506df [Misc] Improve memory profiling debug message (#21429) Ye (Charlotte) Qi 2025-07-26 07:07:21 -07:00
c215f5c877 [Bug] Fix has_flashinfer_moe Import Error when it is not installed (#21634) Wentao Ye 2025-07-26 10:06:14 -04:00
1cd6eaba54 Support encoder-only models without KV-Cache (#21270) Maximilien de Bayser 2025-07-26 10:09:52 -03:00
f27fdfc3ed [Bugfix] Investigate Qwen2-VL failing test (#21527) Isotr0py 2025-07-26 21:09:29 +08:00
de10ff0b7c Migrate AyaVisionImagePixelInputs to TensorSchema for shape validation (#21622) Benji Beck 2025-07-26 06:08:18 -07:00
9d197280fa Migrate AriaImagePixelInputs to TensorSchema for shape validation (#21620) Benji Beck 2025-07-26 06:08:15 -07:00
e98def439c [Take 2] Correctly kill vLLM processes after benchmarks (#21646) Huy Do 2025-07-26 06:06:05 -07:00
05c1126f29 [Misc] remove unused try-except in pooling config check (#21618) Reid 2025-07-26 20:20:03 +08:00
875af38e01 Support Intern-S1 (#21628) Lyu Han 2025-07-26 19:14:04 +08:00
7728dd77bb [TPU][Test] Divide TPU v1 Test into 2 parts. (#21431) QiliangCui 2025-07-25 23:20:30 -07:00
2f6e6b33fb [Bugfix] Fix isinstance check for tensor types in _load_prompt_embeds to use dtype comparison (#21612) Alexandre JUAN 2025-07-26 05:11:10 +02:00
a55c95096b Correctly kill vLLM processes after finishing serving benchmarks (#21641) Huy Do 2025-07-25 19:06:21 -07:00
97349fe2bc [Docs] add offline serving multi-modal video input expamle Qwen2.5-VL (#21530) WeiQing Chen 2025-07-26 09:37:32 +08:00
62965de5fe [Model] Ultravox: Support Llama 4 and Gemma 3 backends (#17818) Farzad Abdolhosseini 2025-07-26 04:12:31 +03:00
7ae75fa6d0 [Feature] Add support for MoE models in the calibration-free RTN-based quantization (#20766) Alex Kogan 2025-07-25 21:09:34 -04:00
f1b286b2fb [TPU] Update ptxla nightly version to 20250724 (#21555) Chengji Yao 2025-07-25 17:09:00 -07:00
c7742d6113 [Bugfix] Always set RAY_ADDRESS for Ray actor before spawn (#21540) Rui Qiao 2025-07-25 17:08:30 -07:00