Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

4a9952ec1b [Bugfix] Add quant_config in ViT of Kimi-K2.5 (#34501) LoganJane 2026-02-14 00:05:34 +08:00
1dae7b7843 [Bugfix] Exclude language_model_only key from MM AOT compile hash but include in model one (#34508) Roger Wang 2026-02-13 05:59:00 -08:00
5885e330ef [Misc] Port Qwen3.5 Configs (#34512) Roger Wang 2026-02-13 05:24:25 -08:00
071d863e20 Extend ColBERT support to non-standard BERT backbones (#34170) Ilya Boytsov 2026-02-13 10:53:09 +01:00
0916e7960b [GDN] Use CPU tensors to build GDN metadata (#34498) Woosuk Kwon 2026-02-13 01:24:45 -08:00
3d2a026fd0 [Feature] Pipeline Parallel Async send/recv, 2.9% E2E throughput improvement (#33368) Wentao Ye 2026-02-13 03:38:16 -05:00
dddbff4624 [Core] Move pause and resume functions into engine (#34125) Aaron Hao 2026-02-13 00:15:10 -08:00
47e9b63e1a [KVConnector] Clean up redundant code in KV connectors (#34147) Martin Hickey 2026-02-13 08:14:30 +00:00
934acddef9 [Perf] fused_moe: add int4_w4a16 benchmark support and tuning config (#34130) Matthias Gehre 2026-02-13 09:14:27 +01:00
742d214d6e [Bugfix] fix the import path in moe test utils.py (#34245) Marek Michalowski 2026-02-13 08:13:45 +00:00
4137c5dfa7 [Bug Fix] Fix MambaManager.cache_blocks() crash on null blocks in align mode (#34418) haosdent 2026-02-13 16:13:22 +08:00
7a8a46ddcb [BugFix] Fix and optimize max_num_blocks_per_req calculation for MambaSpec (#34440) Harry Huang 2026-02-13 16:13:14 +08:00
bcf0731aa0 [New Model] support new model ovis2.6 (#34426) myselvess 2026-02-13 16:12:45 +08:00
ec090c2429 [Refactor] Call renderer for online IO processor request (#34490) Cyrus Leung 2026-02-13 14:48:45 +08:00
eea3024f43 [Bugfix] Fix mamba state dtype setting for Qwen3-Next and Qwen3.5 (#34489) Roger Wang 2026-02-12 22:48:42 -08:00
2f308214c0 [Refactor] Pass full VllmConfig to Renderer (#34485) Cyrus Leung 2026-02-13 14:48:38 +08:00
1b4e8e53f8 [CI/Build] Fix CUDA re-initialization error in distributed model tests (#34491) Cyrus Leung 2026-02-13 14:43:53 +08:00
dcf6ee8592 [Bugfix] Fix encoder cache underestimation for GLM-4V/GLM-OCR single image (#34483) haosdent 2026-02-13 13:04:06 +08:00
372b2e762a [Bugfix] Standardize getting number of image patches/tokens (#34358) Cyrus Leung 2026-02-13 12:47:01 +08:00
6afa587d31 [ROCm][CI] Fix serving tokens test failures (#34047) Andreas Karatzas 2026-02-12 21:27:53 -06:00
94ed6cf6ea Add new sections to CODEOWNERS (#34309) Cyrus Leung 2026-02-13 10:39:28 +08:00
bf37812ca7 [Hybrid] Fix and optimize block-aligned splitting in mamba cache align mode (#33706) Harry Huang 2026-02-13 10:21:52 +08:00
b86bf4417e [Bugfix] Fix Random Dataset Prefix Length Inaccuracy (#33907) Frank Wang 2026-02-12 18:21:19 -08:00
de13dd781f [Kernel] [Helion] [5/N] Add Helion Autotuning infrastructure (#34025) Yanan Cao 2026-02-12 18:21:05 -08:00
62788f99a4 [Bugfix] Delete unused redundant code in Kimi-K2.5 (#34427) LoganJane 2026-02-13 10:18:42 +08:00
ea5ff3a1f6 [Refactor] Simplify BOS/EOS token handling (#34435) Cyrus Leung 2026-02-13 10:18:24 +08:00
04ea31baab [Bugfix] Remove assert that's no longer valid (#34443) bnellnm 2026-02-12 21:18:15 -05:00
6f019e6e0a [BugFix] Add block_size validation for mamba cache align mode (#34445) Harry Huang 2026-02-13 10:18:07 +08:00
d707678dfb Fix num_logprobs parameter description in sampler.py (#34451) Zhuohan Li 2026-02-12 18:18:03 -08:00
fc22cae4ac [CI/Build] Update video URLs for testing (#34446) Cyrus Leung 2026-02-13 10:15:36 +08:00
96161fe978 [Kernel] [Helion] [4/N] Add silu_mul_fp8 Helion kernel (#33373) Yanan Cao 2026-02-12 18:13:12 -08:00
4453ba8d9e [Core] Profiler improvements and lazy initialization (#33198) Jaewon 2026-02-12 16:16:38 -08:00
aa181c923b [Core] Add sleep level 0 mode with enqueue/wait pattern (#33195) Jaewon 2026-02-12 16:16:25 -08:00
be7370daf3 [Frontend] Enable generic structured_outputs for responses API (#33709) Alec S 2026-02-12 19:15:48 -05:00
9ea1f598ce Use paged_attention_v1 for sliding window decode in rocm_aiter_fa (#34378) Mengtao (Martin) Yuan 2026-02-12 16:14:43 -08:00
f120bd42d3 [Kernel] Support Flashinfer trtllm fused MoE non gated FP8 & NVFP4 (#33506) amitz-nv 2026-02-12 23:06:58 +02:00
fac4e96940 small adjustment to wvSplitKrc (#34410) Hashem Hashemi 2026-02-12 12:26:36 -08:00
6d4e27ce29 [Bugfix] Enforce DeepGEMM when using sparse_attn_indexer on CUDA (#34374) Michael Goin 2026-02-12 15:08:06 -05:00
4c078fa546 [ROCm][CI] Pin TorchCodec to v0.10.0 for ROCm compatibility (#34447) Andreas Karatzas 2026-02-12 12:47:34 -06:00
6c0baee610 [Voxtral Realtime] Refactor & Improve buffering logic (#34428) Patrick von Platen 2026-02-12 18:46:43 +01:00
1100a97621 [Voxstral Realtime] Enable tests (#33803) Patrick von Platen 2026-02-12 18:43:24 +01:00
766e167821 [ROCm][quantization] improve OCP weight quant parser robust (#34431) xuebwang-amd 2026-02-13 01:40:19 +08:00
becbe24808 [Bugfix] Remove broken raw url GGUF model loading support (#34433) Isotr0py 2026-02-13 01:40:01 +08:00
679ca5d8d3 Fix MoE for the Transformers modelling backend (#34436) Harry Mellor 2026-02-12 18:29:42 +01:00
f2c47886fd [Attention] Add FlashInfer Sparse MLA backend (#33451) Matthew Bonanni 2026-02-12 12:21:54 -05:00
334c715e0f [Docs] Spec decoding docs warning removal (#34439) Nicolò Lucchesi 2026-02-12 18:01:51 +01:00
7b5a8b4a9d [BUG] Reset running requests when clearing cache for pause/resume (#34382) Aaron Hao 2026-02-12 08:19:13 -08:00
dea63512bb Add config file for fused MoE for Nemotron (TP4, B200) (#34411) danisereb 2026-02-12 16:09:55 +02:00
8a798be929 [ROCm] Enable MXFP4 MoE weight pre-shuffling on gfx950 and update aiter (#34192) Douglas Lehr 2026-02-12 07:06:33 -06:00
fb455ed547 [V0 Deprecation] Remove code related to per-request logits processors (#34400) Cyrus Leung 2026-02-12 20:44:28 +08:00
2d5be1dd5c release script khluu 2026-02-12 02:37:52 -08:00
f5897613fb Fix Mistral config remap to accept compressed-tensors quantization #34028 (#34104) baonudesifeizhai 2026-02-12 03:22:06 -05:00
55a1a9563a Vllm CPU benchmark suite improvement (#34128) Louie Tsai 2026-02-12 00:04:44 -08:00
386bfe5d08 [bugfix] refactor FunASR's _get_data_parser (#34397) AllenDou 2026-02-12 15:26:49 +08:00
e9cd691132 [Bugfix] Fix Sparse24 Compressed Tensors models (#33446) Kyle Sayers 2026-02-12 02:15:16 -05:00
80f2ba6ea6 Fix DeepSeek-OCR tensor validation for all size variants (#34085) Yichuan Wang 2026-02-11 22:50:23 -08:00
136b0bfa59 [BugFix] Fix DP chunking (#34379) Lucas Wilkinson 2026-02-11 23:44:03 -07:00
7a06e5b05b [Bugfix] Fix MTP accuracy for GLM-5 (#34385) v0.16.0rc3 Michael Goin 2026-02-11 22:08:19 -05:00
946b2f106c [Bugfix] send None sentinel on final commit so server properly sends transcription.done (#33963) Junseo Park 2026-02-12 06:01:53 +09:00
5e8adb0c49 [Misc] Bump fastsafetensors version for latest fixes (#34273) Nick Hill 2026-02-11 00:30:09 -08:00
9be1ff2d3a [Bugfix] fix default is_neox_style is True for deepseek (#34353) Xinyu Dong 2026-02-12 02:20:45 +08:00
b3ee90f961 [Model] GLM adaptation (#34124) Jee Jee Li 2026-02-09 17:32:52 +08:00
b96f7314b4 [Refactor] Pass Renderer to Input Processor (#34329) Cyrus Leung 2026-02-12 11:38:11 +08:00
ced2a92f40 [Refactor] Move validation to params definitions (#34362) Cyrus Leung 2026-02-12 11:33:15 +08:00
e1d97c38f8 [Bug Fix] Fix naive_block_assignment always defaulting to False due to arg misalignment (#33848) Runkai Tao 2026-02-11 22:30:57 -05:00
ec12d39d44 [Bugfix] Fix MTP accuracy for GLM-5 (#34385) Michael Goin 2026-02-11 22:08:19 -05:00
ff1f83b056 [Refactor] Replace activation: str with MoEActivation enum (#33843) Michael Goin 2026-02-11 20:29:32 -05:00
83b47f67b1 [ci] Integrate AMD tests into CI (#33626) Kevin H. Luu 2026-02-11 16:54:17 -08:00
fb7b30c716 [ROCm][CI] Revert Test Groups From mi325_8 to mi325_1 Agent Pool In AMD CI (#34384) Micah Williamson 2026-02-11 17:52:34 -06:00
31d992d215 [Bugfix] Fix some issues with MoERunner PR #32344 (#34371) bnellnm 2026-02-11 17:33:14 -05:00
5aff2699bd Fix CI failure - Flashinfer Kernel tests (#34316) Wei Zhao 2026-02-11 17:17:16 -05:00
527ca32197 [Bugfix] Fix more multimodal tests for transformers V5 (#34334) Raushan Turganbay 2026-02-11 22:02:05 +01:00
5458eb835d [Bugfix] send None sentinel on final commit so server properly sends transcription.done (#33963) Junseo Park 2026-02-12 06:01:53 +09:00
144d9b7cc8 [Benchmarks] Reduce ready checker log verbosity (#34349) Tomas Ruiz 2026-02-11 21:57:57 +01:00
83e26c834e [GPT-OSS] Remove unnecessary contiguous (#34337) elvischenv 2026-02-12 04:29:29 +08:00
5001211369 [ROCm] [CI] fix test_unrecognized_env (#34350) TJian 2026-02-12 02:50:44 +08:00
11c7ace340 [Bugfix] Enable attn quantization of Llama-4 by correctly permuting scales for rope (int8, fp8) (#34243) Eldar Kurtić 2026-02-11 19:24:22 +01:00
be7f3d5d20 [Bugfix] fix default is_neox_style is True for deepseek (#34353) Xinyu Dong 2026-02-12 02:20:45 +08:00
0ab06100f4 [Multimodal] Expose mm_processor_kwargs for DummyInputsBuilder (#34330) Isotr0py 2026-02-12 01:37:40 +08:00
ffb3d553cc [Model Runner V2] Init cuda graph pool when necessary (#33217) Xinyu Chen 2026-02-12 01:12:13 +08:00
fa7e0bfacf [CI][BugFix] Fix silent failure in shellcheck hook and baseline exist… (#32458) junuxyz 2026-02-12 02:03:48 +09:00
48134a2c22 [Docs] Fix typo ("defult") and double spacing (#34348) SorenDreano 2026-02-11 18:02:27 +01:00
64f570ab56 [ROCm] [aiter] Split KV cache update for AiterFlashAttention (#33681) kliuae 2026-02-12 00:26:44 +08:00
fd618871b4 [Bugfix]: Fix ROCm fusion attn test; use AttentionBackend utils to create kv cache (#33948) Rohan Potdar 2026-02-11 10:12:05 -06:00
67a42b5a44 Don't try and run GLM-ASR with remote code (#34352) Harry Mellor 2026-02-11 17:09:40 +01:00
c7914d30f9 Reapply [Attention][FA3] Update FA3 to include new swizzle optimization (#34043) Lucas Wilkinson 2026-02-11 08:07:56 -07:00
1b8756562e Responses harmony system message structured (#34268) Adam Binford 2026-02-11 08:14:28 -05:00
275e0d2a99 [NVIDIA][test] Tests for flashinfer TRTLLM BF16 MoE (#33715) Linda 2026-02-11 13:38:11 +01:00
0f5e55e7a8 Make JAIS compatible with Transformers v5 (#34264) Harry Mellor 2026-02-11 13:30:37 +01:00
1e9204bff3 Make Qwen3VL compatible with Transformers v5 (#34262) Harry Mellor 2026-02-11 13:13:23 +01:00
05339a7b20 [Bugfix][CPU] Fix llama4 inference on CPU (#34321) Li, Jiang 2026-02-11 19:07:23 +08:00
40b8f55358 [Docs] Reduce time spent generating API docs (#34255) Harry Mellor 2026-02-11 11:56:02 +01:00
c44d0c6d66 Patch protobuf for CVE-2026-0994 (#34253) v0.16.0rc2 Seiji Eicher 2026-02-11 02:25:04 -08:00
83db96d8cd [XPU][9/N] clean up existing ipex code/doc (#34111) Kunshang Ji 2026-02-11 16:27:15 +08:00
dbfb79fe45 [XPU][7/N] enable xpu fp8 moe (#34202) zofia 2026-02-11 11:33:59 +08:00
b2e1fc3589 [Bugfix][Core] Fix CPU memory leak from Request reference cycle in prefix caching (#34183) Roger Wang 2026-02-09 21:03:32 -08:00
55a1baebc5 [Bugfix][ROCm][GPT-OSS] Use old triton_kernels implementation on ROCm if the new API is not available (#34153) Gregory Shtrasberg 2026-02-09 17:38:54 -06:00
e1e9841631 [torch.compile][Fusion] Fix attention fusion pass removing kv_udpate op. (#33945) Charlie Fu 2026-02-09 15:15:43 -06:00
5bd63387c3 [XPU][6/N] add xpu scaled_mm kernel (#34117) zofia 2026-02-09 20:17:35 +08:00
5045d5c983 Patch protobuf for CVE-2026-0994 (#34253) Seiji Eicher 2026-02-11 02:25:04 -08:00
