Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

73f48ce559 [Kernel] [Helion] Use warning_once in get_gpu_name to prevent log spam (#38743) Yanan Cao 2026-04-01 21:30:31 -07:00
3aab680e3e [ROCm][Bugfix] Fix ROCm runtime failure due to missing symbol (#38750) Gregory Shtrasberg 2026-04-01 23:30:11 -05:00
5a2d420c17 [Bugfix] Use dedicated MM processor cache in /tokenize to prevent sender-cache pollution (#38545) Sergey Zinchenko 2026-04-02 07:14:49 +03:00
5f96f9aff1 [Perf] DSV3.2 Indexer Fused Weights Projection (#38684) Benjamin Chislett 2026-04-01 23:34:49 -04:00
694449050f Fix multiline-format string for python 3.10 (#38739) Luka Govedič 2026-04-01 23:19:35 -04:00
6241521dd2 [BugFix] Fix precommit breakage due to conflicting in-flight merges (#38759) Nick Hill 2026-04-01 15:35:06 -07:00
1785dc5501 Revert "[Bugfix] Fix Qwen3CoderToolParser anyOf/oneOf type resolution for nullable params (#37831)" (#38751) Kevin H. Luu 2026-04-01 15:34:28 -07:00
54500546ac [Bugfix] Preserve original ImportError in gRPC server entrypoint (#38673) Chang Su 2026-04-01 15:16:44 -07:00
cfad6a509c Revert "[Bugfix] Restrict TRTLLM attention to SM100, fixing GB300 (SM103) hang (#38730)" khluu 2026-04-01 15:14:58 -07:00
de5e6c44c6 [Feat][Executor] Introduce RayExecutorV2 (#36836) Jeffrey Wang 2026-04-01 14:34:29 -07:00
cb268e4e55 [Refactor] Simplify FutureWrapper in MultiprocExecutor (#38644) yzong-rh 2026-04-01 17:28:26 -04:00
c284a6671c [Bugfix] Restrict TRTLLM attention to SM100, fixing GB300 (SM103) hang (#38730) v0.19.0rc1 Stefano Castagnetta 2026-04-01 21:08:40 +02:00
3a30a1a6a8 [Misc] Rename think_start_str/think_end_str to reasoning_start_str/reasoning_end_str (#38242) Chauncey 2026-04-02 00:56:45 +08:00
29982d48b3 (security) Enforce frame limit in VideoMediaIO (#38636) Juan Pérez de Algaba 2026-04-01 12:23:45 +02:00
6183cae1bd [Bugfix] Restrict TRTLLM attention to SM100, fixing GB300 (SM103) hang (#38730) Stefano Castagnetta 2026-04-01 21:08:40 +02:00
c09ad767cd Feature/silu block quant fusion v1 (#32996) Monishver 2026-04-01 11:50:43 -07:00
c9a9db0e02 [Compile] Fix nvfp4 compile warning (#38573) Wentao Ye 2026-04-01 14:28:57 -04:00
cbe7d18096 [Misc] Rename think_start_str/think_end_str to reasoning_start_str/reasoning_end_str (#38242) Chauncey 2026-04-02 00:56:45 +08:00
db5d0719e1 [Kernel] Add MXFP8 to Marlin GEMM/MoE and refactor Mxfp8LinearOp (#34664) Michael Goin 2026-04-01 18:41:42 +02:00
dc0428ebb8 [NIXL][BUG] Fix Triton heterogeneous TP (#37940) yzong-rh 2026-04-01 11:23:15 -04:00
148c2072ec Add ibm-granite/granite-vision-3.3-2b to supported models documentation (#38714) Jesus Talavera 2026-04-01 17:22:25 +02:00
2f5c3c1ec0 [Misc] Fix docstring typo: buildin -> builtin (#38722) majianhan 2026-04-01 22:39:46 +08:00
fa246d5231 Fix shape comment in extract_hidden_states example (#38723) Fynn Schmitt-Ulms 2026-04-01 10:29:33 -04:00
7cf56a59a2 [MoE Refactor] Make SharedExperts class for use with DefaultMoERunner (#35153) bnellnm 2026-04-01 09:44:08 -04:00
5e30e9b9a9 [Bugfix] Revert "Zero-init MLA attention output buffers to prevent NaN from CUDA graph padding" (#38359) Elvir Crnčević 2026-04-01 15:11:10 +02:00
582340f273 [Bugfix] Fix Qwen3CoderToolParser anyOf/oneOf type resolution for nullable params (#37831) 손세정 2026-04-01 21:22:29 +09:00
992368522f [KVTransfer] Fix TpKVTopology.is_kv_replicated equality case (#38179) yjz 2026-04-01 18:41:49 +08:00
58ee614221 (security) Enforce frame limit in VideoMediaIO (#38636) Juan Pérez de Algaba 2026-04-01 12:23:45 +02:00
f9f6a9097a Add verified label to trigger pre-commit (#38708) Harry Mellor 2026-04-01 10:31:02 +01:00
c75a313824 [Perf] triton bilinear_pos_embed kernel for ViT (#37948) Zhanda Zhu 2026-04-01 04:52:02 -04:00
4f6eed3bd4 [Core] Simplify multimodal masking (#34246) Lukas Geiger 2026-04-01 09:18:22 +01:00
1dbbafd3f3 [Feat][v1] Simple yet General CPU KV Cache Offloading (#37160) v0.19.0rc0 Yifan Qiao 2026-03-31 17:58:37 -07:00
0ee3b7fc3d [Bugfix][MLA] Add logits size budget to sparse indexer prefill chunking (#36178) Lucas Wilkinson 2026-04-01 00:15:53 -04:00
268bed9cf3 [Bugfix][Async] Fix async spec decoding with hybrid models (#38556) Matthew Bonanni 2026-03-31 11:08:54 -04:00
bcc0fdd0f3 [CI] fix LM Eval Qwen3.5 Models (B200) (#38632) Jiangyun Zhu 2026-03-31 21:20:08 +08:00
69b8bd4b33 [CI Failure] pin colmodernvbert revision (#38612) wang.yuqi 2026-03-31 18:54:54 +08:00
36d7f19897 [CPU] Support head_size 512 in cpu_attn (#38676) Li, Jiang 2026-04-01 13:42:27 +08:00
2d725b89c5 [Bugfix] Lazy import diskcache to avoid sqlite3/libstdc++ ImportError at startup (#38649) Jeffrey Wang 2026-03-31 22:31:20 -07:00
ef53395e2c [bugfix] do not add extra linebreak for score/rerank with chat template (#38617) Augusto Yao 2026-04-01 12:50:07 +08:00
eb47454987 [Bugfix][MLA] Add logits size budget to sparse indexer prefill chunking (#36178) Lucas Wilkinson 2026-04-01 00:15:53 -04:00
116f4be405 [1/N][Cleanup] Standardize on use of is_quantized_kv_cache (#38659) Matthew Bonanni 2026-04-01 00:08:01 -04:00
7b01d97a22 [Perf] Optimize mean pooling using chunks and index_add, 5.9% E2E throughput improvement (#38559) Wentao Ye 2026-03-31 23:54:58 -04:00
17b72fd1c8 Fix priority preemption regression test in scheduler (#37051) HarshRathva 2026-04-01 09:06:12 +05:30
c49497726b [ROCm][perf] Shuffle KV cache to use paged_attention_common (#32914) Samu Tamminen 2026-04-01 06:30:19 +03:00
cb0b443274 [Misc] Add 20 regression tests for 11 tool parser bug fixes (#38172) Ben Browning 2026-03-31 23:00:31 -04:00
40bb175027 [vLLM IR] 1/N Implement IR skeleton and rms_norm op (#33825) Luka Govedič 2026-03-31 22:15:05 -04:00
0fab52f0aa Fix NaN from stale FP4 scale padding in create_fp4_scale_tensor (#38148) Elvir Crnčević 2026-04-01 04:14:59 +02:00
91e4521f9f [Feat][v1] Simple yet General CPU KV Cache Offloading (#37160) Yifan Qiao 2026-03-31 17:58:37 -07:00
31a719bcd3 [ROCm][perf] fix Aiter sparse MLA with MTP>1 (#37887) Stig-Arne Grönroos 2026-04-01 02:22:23 +03:00
2e56975657 Generative Scoring (#34539) Vedant V Jhaveri 2026-03-31 16:02:11 -07:00
36f1dc19ae feat(grpc): add periodic stats logging and servicer log forwarding (#38333) Chang Su 2026-03-31 15:50:07 -07:00
3dc01ef352 [Quantization] Consolidate dummy format logic into DummyModelLoader (#38637) Asaf Gardin 2026-04-01 01:20:45 +03:00
cc671cb110 [Kernel] [Helion] [17/N] Add Helion kernel torch.compile support (#38592) Yanan Cao 2026-03-31 14:06:42 -07:00
856589ed9a [Refactor] Remove dead code in kv connector and model runner (#38383) Wentao Ye 2026-03-31 17:05:23 -04:00
517b769b58 [Perf] Fix DBO overlap: capture DeepEP event before yield (#38451) czhu-cohere 2026-03-31 13:38:59 -07:00
d9b90a07ac [MoE Refactor] Migrate Unquantized to Full Oracle Flow (#36286) yzong-rh 2026-03-31 15:43:33 -04:00
598190aac3 [fix] Remove trtllm ragged mla prefills (#36540) Olya Kozlova 2026-03-31 21:30:27 +02:00
b779eb3363 [Model] Sync upstream BT=chunk_size fix for GDN chunk_fwd_kernel_o, simplify warmup to single pass (#38343) Xu Jinyang 2026-04-01 03:03:24 +08:00
077a9a8e37 [torch.compile] Refactor Attention Quant Fusion Pass and Remove Boilerplate (#37373) BadrBasowid 2026-04-01 02:15:50 +08:00
07edd551cc [CI/Build] Resolve a dependency deadlock when installing the test dependencies used in CI (#37766) Run Yu 2026-03-31 11:05:14 -07:00
7c080dd3c5 [4/n] Migrate FP4/W4A8 CUTLASS kernels to torch stable ABI (#37503) mikaylagawarecki 2026-03-31 13:21:13 -04:00
0dd25a44ea [Quantization][Autoround][XPU] Add W4A16 Support (#37986) Yi Liu 2026-04-01 00:48:24 +08:00
3896e021a0 [Bugfix] Fix FusedMoE weight loading with padded hidden dimensions (#37010) SandishKumarHN 2026-03-31 09:22:26 -07:00
b6e636c12c [Fix] handle PaddleOCR-VL image processor max_pixels across Transformers v4/v5 (#38629) v0.18.2rc0 zhang-prog 2026-03-31 23:50:41 +08:00
f1ff50c86c [Bugfix] clamp dA_cumsum differences to prevent Inf in Mamba2 SSD kernels (#37501) Jingu Kang 2026-04-01 00:35:51 +09:00
757068dc65 [Bugfix][Async] Fix async spec decoding with hybrid models (#38556) Matthew Bonanni 2026-03-31 11:08:54 -04:00
7337ff7f03 [Docs] PD with Nixl compat matrix (#38628) Nicolò Lucchesi 2026-03-31 17:01:21 +02:00
5869f69c5f [Online Quant] [QeRL] Minor code cleanup (#38574) Kyle Sayers 2026-03-31 10:56:43 -04:00
4dfad17ed1 replace cuda_device_count_stateless() to current_platform.device_count() (#37841) wliao2 2026-03-31 07:32:54 -07:00
e8057c00bc [CI] Avoid concurrent docker pull in intel XPU CI runners to prevent rate limit issues (#38594) wenjun liu 2026-03-31 22:23:18 +08:00
7430389669 [Bugfix][CI] Skip flaky test_eagle test (#38566) Nicolò Lucchesi 2026-03-31 15:42:37 +02:00
202f147cf2 Fix MLA runs when use_inductor_graph_partition=True (#38631) ElizaWszola 2026-03-31 15:37:43 +02:00
ea7bfde6e4 [CI] fix LM Eval Qwen3.5 Models (B200) (#38632) Jiangyun Zhu 2026-03-31 21:20:08 +08:00
d71a15041f [XPU]move testing dependencies from Dockerfile to xpu-test.in (#38596) sihao_li 2026-03-31 20:49:43 +08:00
abdbb68386 [EPLB] Add alternative communication for EPLB weight exchange (#33176) Ilya Markov 2026-03-31 14:17:12 +02:00
0c63739135 [EPD] update EPD script arguments (#36742) liuzhenwei 2026-03-31 20:02:09 +08:00
719735d6c5 [CI Failure] pin colmodernvbert revision (#38612) wang.yuqi 2026-03-31 18:54:54 +08:00
aae3e688f8 Fix document of torchrun_example.py (#31113) Maosheng Liao 2026-03-31 18:54:23 +08:00
7d65463528 [WIP][CI][Bugfix] Fix test_run_eagle_dp (#38584) Matthew Bonanni 2026-03-31 06:30:25 -04:00
8278825b57 DOC: TPU mention fix (#38129) Mateusz Sokół 2026-03-31 12:27:56 +02:00
acf7292bf2 [Misc] Move --grpc CLI argument into make_arg_parser (#38570) Chang Su 2026-03-31 03:24:05 -07:00
ce884756f0 [Feature]: add presence_penalty and frequency_penalty fields to Responses API (#38613) Chauncey 2026-03-31 16:45:57 +08:00
d9d21eb8e3 [Frontend][3/n] Improve pooling entrypoints | scoring. (#28631) wang.yuqi 2026-03-31 15:52:00 +08:00
f09daea261 [CPU] Support int8 compute mode in CPU AWQ (#35697) Yintong Lu 2026-03-31 15:27:37 +08:00
42318c840b [ci] Remove benchmarks job (#38611) Kevin H. Luu 2026-03-30 23:46:21 -07:00
1ac6694297 [OOT] Add OOT support for linear kernel. (#37989) zhangyiming 2026-03-31 14:33:21 +08:00
12449f9492 [Bugfix][CPU] Skip set_num_threads after thread binding (#38535) Li, Jiang 2026-03-30 20:13:00 +08:00
6cc7abdc66 [kv_offload+HMA] Fix num_blocks with different per-layer page sizes and improve assert message (#38554) Kfir Toledo 2026-03-31 09:00:40 +03:00
d53cb9cb8e [Tool Parser][2/3] Use self.tools instead of request.tools in tool parsers (#38189) Flora Feng 2026-03-31 01:41:36 -04:00
44eef0ca1e vLLM Benchmark Suite perf regression after PR#32723 (#38576) Louie Tsai 2026-03-30 22:23:17 -07:00
b9cdc85207 [ROCm][CI] Fix Whisper translation test attention backend selection (#38508) Andreas Karatzas 2026-03-31 00:21:49 -05:00
b92312dfd7 [CI] Fix SPLADE pooler test broken by #38139 (#38495) haosdent 2026-03-30 15:48:33 +08:00
3e802e8786 [Mypy] Fix adjust_request typing (#38264) Flora Feng 2026-03-31 00:21:18 -04:00
350af48e14 [KVConnector] Remove redundant method KVConnectorOutput::merge() (#38546) Martin Hickey 2026-03-31 05:11:02 +01:00
e31915063d [Bugfix] Fix for builtins (forward fix of pytorch/177558) (#37234) Lucas Kabela 2026-03-30 18:08:11 -07:00
29e48707e8 [Refactor] Consolidate Tool type alias in tool_parsers/utils.py (#38265) Flora Feng 2026-03-30 20:55:51 -04:00
4ac227222f [Bugfix][DCP] Fix CUDA graph capture for Decode Context Parallelism (#36070) sungsoo ha 2026-03-30 17:20:43 -07:00
bb51d5b40d Add @vadiklyutiy as committer (#38589) Vadim Gimpelson 2026-03-31 03:50:04 +04:00
93b3ec1585 feat(attention): extract KV-cache update from FlashAttentionDiffKV ba… (#36466) Prathmesh Bhatt 2026-03-30 16:16:09 -07:00
e812bf70bd Restore non-hf processor path for Nano-Nemotron-VL (bypass call_hf_processor_mm_only) - fixes #38018 (#38567) Netanel Haber 2026-03-31 00:56:52 +03:00
