Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

9df152bbf6 [Misc] Algin Qwen3-VL-embedding image example outputs with HF repo example (#33419) Isotr0py 2026-01-31 11:36:56 +08:00
876a16f4fb [ModelRunner V2] Fix spec decoding + logprobs (#33391) Nick Hill 2026-01-30 19:33:26 -08:00
aaa901ad55 [Attention] Move MLA forward from backend to layer (#33284) Matthew Bonanni 2026-01-30 22:30:00 -05:00
010ec0c30e [Deprecation] Deprecate seed_everything and scatter_mm_placeholders in v0.15 (#33362) Wentao Ye 2026-01-30 21:54:16 -05:00
64a40a7ab4 [Bugfix] Fix typo in read_offset variable name (#33426) Alberto Ferrer 2026-01-30 19:26:15 -06:00
31aedfe7d6 [Bugfix][ROCm] Fixing the skinny gemm dispatch logic from #32831 (#33366) Gregory Shtrasberg 2026-01-30 19:05:23 -06:00
67ebaff528 Refactor NVFP4 Linear utils for ModelOpt and CT (#33201) Michael Goin 2026-01-30 19:37:42 -05:00
2b465570e6 [CI][HPU]accelerate hpu test by skip python re-install and clean container name (#33286) Chendi.Xue 2026-01-30 15:36:29 -06:00
9ca66ecc10 Indicate compile mode in the benchmark results (#32990) Huy Do 2026-01-30 12:34:36 -08:00
c3a9752b0c [Hardware][SM100] Add TRTLLM Kernel for INT4 W4A16 Kernel. (#32437) Pavani Majety 2026-01-30 10:30:46 -08:00
f451b4558b [Quantization][ROCm] Fix MoE weight loading to be robust (Qwen3_MoE/Qwen3_next as example models) (#33173) xuebwang-amd 2026-01-31 01:50:23 +08:00
3f96fcf646 fix QERL attention import path (#33432) Vasiliy Kuznetsov 2026-01-30 12:29:09 -05:00
6c1f9e4c18 [Kernel] [Helion] [1/N] Add Helion ConfigManager (#32740) Yanan Cao 2026-01-30 09:19:19 -08:00
67239c4c42 Fix encoder-decoder model disabling mm processor cache (#33236) Harry Mellor 2026-01-30 16:30:10 +00:00
8ece60768f [CI] Qwen3-ASR transcriptios tests (#33414) Nicolò Lucchesi 2026-01-30 17:17:56 +01:00
fd0e377244 Support FP8 block quant for CompressedTensorsW8A16Fp8 (#33280) Michael Goin 2026-01-30 11:15:20 -05:00
f857a03f6b [QeRL] Layerwise Reloading (#32133) Kyle Sayers 2026-01-30 10:50:05 -05:00
74898a7015 [BugFix][LoRA] TritonExperts is ModularMoEPath for FP8 models (#33393) Danielle Robinson 2026-01-30 07:27:42 -08:00
8f5d51203b Disable Cascade Attention for Batch Invariance (#32561) Frank Wang 2026-01-30 07:00:46 -08:00
ae5b7aff2b Improve Mistral format checks. (#33253) Julien Denize 2026-01-30 15:23:33 +01:00
a11bc12d53 Fix test_moe.py for Transformers v5 (#33413) Harry Mellor 2026-01-30 14:03:25 +00:00
58cb55e4de [Doc] Enhance documentation around CPU container images (#32286) Nathan Weinberg 2026-01-30 08:36:20 -05:00
cf896ae0e3 [Misc] Clean up HIDDEN_DEPRECATED_METRICS after metric removal (#33323) 杨朱 · Kiki 2026-01-30 21:31:17 +08:00
c5113f60f2 Remove deprecated reasoning_content message field (#33402) Harry Mellor 2026-01-30 11:48:15 +00:00
174f16700b [Doc] [ROCm] Update Documentation to reflect v0.15.0 release (#33388) vllmellm 2026-01-30 19:06:08 +08:00
8e2ad97ad0 [BUGFIX] Pixtral cannot be loaded with --limit-mm-per-prompt 0 (#33406) Julien Denize 2026-01-30 11:52:02 +01:00
10152d2194 [Realtime API] Adds minimal realtime API based on websockets (#33187) Patrick von Platen 2026-01-30 11:41:29 +01:00
1a7894dbdf [Misc] Replace Optional[X] with X | None syntax (#33332) 杨朱 · Kiki 2026-01-30 17:56:59 +08:00
c87eac18f7 [Refactor] Move MM item count validation outside of processor (#33396) Cyrus Leung 2026-01-30 17:27:31 +08:00
f45870b53f fix: allow LFM2 MoE prefix caching (align) (#33376) tianshu-Michael-yu 2026-01-30 00:23:14 -08:00
ba45bedfd1 [model] Add support for openPangu7B-VL (#32449) hujiaxin0 2026-01-30 15:54:27 +08:00
9432ed8c7e Explicitly set return_dict for apply_chat_template (#33372) Harry Mellor 2026-01-30 07:27:04 +00:00
726d89720c [CI] Enable mypy import following for vllm/spec_decode (#33282) Lucas Kabela 2026-01-29 22:43:32 -08:00
d334dd26c4 Move decode context parallel validationn to ParallelConfig (#33239) Harry Mellor 2026-01-30 06:18:41 +00:00
070c811d6f [CI][AMD] Skip 4 GPUs testgroup ray tests (#33305) Ryan Rock 2026-01-29 23:39:53 -06:00
8bfc8d5600 [Models] Refactor Kimi-K2.5 weight loading (#33346) Isotr0py 2026-01-30 13:31:20 +08:00
ec51831a22 [BugFix] Disable async scheduling for Mamba prefix caching (#33352) Harry Huang 2026-01-30 12:40:19 +08:00
80b918f2bd Fix tie_word_embeddings for multimodal models in Transformers v5 (#33359) Harry Mellor 2026-01-30 03:37:39 +00:00
c46b0cd0af [Model][Multimodal] Add explicit MusicFlamingo adapter (#32696) Wang Haoyu 2026-01-30 11:01:29 +08:00
133765760b [Docs] Adding links and intro to Speculators and LLM Compressor (#32849) (tag: v0.16.0rc0) Aidan Reilly 2026-01-29 22:12:35 +00:00
bfb9bdaf3f [Bugfix] Enable Triton MoE for FP8 per-tensor dynamic (#33300) Michael Goin 2026-01-29 15:15:17 -05:00
2284461d02 [release] Minor fixes to release annotation and wheel upload (#33129) Kevin H. Luu 2026-01-29 12:09:35 -08:00
8e2a469b3b Add Triton fused MoE config for B200 (Nemotron Nano) (#32804) danisereb 2026-01-29 21:21:33 +02:00
23591e631e [Bugfix][Kernel] Fix negative memory offset in GDN Triton kernel (#33326) CarstyYou 2026-01-30 02:40:11 +08:00
0493d897c4 [NVIDIA] [feat] Integrate flashinfer Trtllmgen bf16 moe (#32954) Linda 2026-01-29 19:00:13 +01:00
8c8ebeb941 [BUGFIX][XPU] fix memory check after XPU reuse GPU_worker (#33358) Chendi.Xue 2026-01-29 11:56:30 -06:00
831453fcef [Chore] Move MediaConnector to vllm.multimodal.media (#33324) Cyrus Leung 2026-01-30 00:54:31 +08:00
5a66c9cc76 [ez] Delete torch25_custom_graph_pass (#33287) Angela Yi 2026-01-29 08:47:05 -08:00
5e73e4900c [Bugfix] Fix broken GLM-OCR initialization (#33350) Isotr0py 2026-01-29 23:56:05 +08:00
c6e7404cc5 [Multimodal] Simplify MM input definitions (#33331) Cyrus Leung 2026-01-29 21:32:04 +08:00
17b17c0684 [Backport] [Kimi-K2.5] Replace torch.cuda with current_platform for d… (#33320) sthWrong 2026-01-29 20:29:17 +08:00
8bb6271c77 [Intel GPU] refine xpu worker (#32894) Kunshang Ji 2026-01-29 20:26:52 +08:00
8b3f0a99dd [Models] Qwen3-ASR (#33312) Roger Wang 2026-01-29 03:27:15 -08:00
8311f083bd [Bugfix][CPU] Fix thread num for shared memory communication (#33317) Li, Jiang 2026-01-29 19:26:58 +08:00
40c35038d2 [Voxtral] Streaming example (#33042) Patrick von Platen 2026-01-29 12:22:49 +01:00
a5aa4d5c0f [Quantization][Refactor] use platform dict to choose kernel (#33130) zofia 2026-01-29 18:44:58 +08:00
615e8033e5 [Bug Fix] Handle variable-length tensors in MultiModalFlatField batching (#31751) andrii.pasternak 2026-01-29 10:42:59 +00:00
d09135fbd0 [BugFix] Async Eplb fix potential race condition (#32881) Ilya Markov 2026-01-29 11:31:40 +01:00
8688c3d460 [fix] tesdt mcp_tool_calling_streaming with a more complex math question (#32769) daniel-salib 2026-01-29 02:25:58 -08:00
5400014d55 [Chore] Remove use_data_parallel kwargs from ViT implementation (#33310) Isotr0py 2026-01-29 18:20:52 +08:00
3a92c6f3b5 [Misc] Cleanup Kimi-K2.5's vision chunk modality entrypoints (#33157) Isotr0py 2026-01-29 17:46:02 +08:00
e01ff5c070 Bugfix: Pass router logits dtype in nemotron shared experts (#32669) amirkl94 2026-01-29 01:36:34 -08:00
fb946a7f89 Make mypy opt-out instead of opt-in (#33205) Harry Mellor 2026-01-29 09:12:26 +00:00
a650ad1588 [Misc] Remove missed pad_for_cudagraph (#33283) Lucas Wilkinson 2026-01-29 02:12:05 -07:00
d697581a7c [Doc] Update outdated link to Ray documentation (#32660) graftim 2026-01-29 09:56:06 +01:00
5eeba80c74 Adding optional speculator tests for larger models (#32943) shanjiaz 2026-01-29 03:54:02 -05:00
08b1195e62 [PluggableLayer][2/N] Apply PluggableLayer to linear layers (#33152) whx 2026-01-29 16:53:15 +08:00
3bba2edb0f support returning tokenids in responses api (#33212) cmunley1 2026-01-29 00:52:39 -08:00
53fc166402 [BugFix] Fix EPLB fail for MoeFP4 model with Marlin backend (#33262) Ilya Markov 2026-01-29 09:52:11 +01:00
31b25f6516 [Doc]: fixing multiple typos in diverse files (#33256) Didier Durand 2026-01-29 09:52:03 +01:00
abb34ac43a [Bugfix] Fix Qwen3-VL-Reranker load. (#33298) wang.yuqi 2026-01-29 16:42:53 +08:00
2515bbd027 [CI/Build][BugFix] fix cuda/compat loading order issue in docker build (#33116) Pengchao Wang 2026-01-29 00:19:05 -08:00
c487a8eef4 [Release] [ROCm] Remove old build step (#33316) TJian 2026-01-29 15:35:51 +08:00
9e138cb01d [Misc][Build] Lazy load cv2 in nemotron_parse.py (#33189) Kiersten Stokes 2026-01-29 00:55:50 -06:00
f176443446 [Release] [CI] Optim release pipeline (#33156) (tag: v0.15.0) TJian 2026-01-29 14:45:42 +08:00
f9d03599ef [Release] [CI] Optim release pipeline (#33156) TJian 2026-01-29 14:45:42 +08:00
39037d258e Fix tool call indexing double-counting (#33141) wangln19 2026-01-29 13:57:09 +08:00
51550179fc [Refactor] Define MM data parser in processing info instead of processor itself (#33260) Cyrus Leung 2026-01-29 13:55:17 +08:00
07ea184f00 [ez] Delete more torch version checks <= 2.8 (#33288) Angela Yi 2026-01-28 21:28:46 -08:00
a663b218ae [Misc] Add orozery to CODEOWNERS (core, kv_transfer, kv_offload) (#33227) Or Ozeri 2026-01-29 06:24:20 +02:00
1bd47d6e5a [Bugfix] Register fp8 cutlass_group_gemm as supported for only SM90+SM100 (#33285) Michael Goin 2026-01-28 21:40:59 -05:00
141cd43967 [UX] Remove noisy CT UnquantizedLinearMethod warn (#33273) Michael Goin 2026-01-28 19:09:30 -05:00
6bf3b46d78 [ModelRunner V2] Misc code simplification and cleanup (#33266) Nick Hill 2026-01-28 14:41:23 -08:00
77c4f45c6c [7/N][Attention][Docs] Add documentation for attention backends (#32477) Matthew Bonanni 2026-01-28 17:20:22 -05:00
ca1969186d [UX] Enable nested configs in config yaml files (#33193) Michael Goin 2026-01-28 16:54:25 -05:00
ab597c869a [Bugfix] Add missing encoder only guard for do_kv_cache_update (#33269) Gregory Shtrasberg 2026-01-28 15:25:07 -06:00
4197168ea5 [ez] Remove checks for torch version <= 2.8 (#33209) Angela Yi 2026-01-28 13:03:56 -08:00
59bcc5b6f2 Use aiter triton fused_add_rmsnorm_pad for gpt-oss (#30976) Rohan Potdar 2026-01-28 14:47:47 -06:00
3e440786af [Feature] Fully support for async scheduling + PP, 30.8% E2E throughput improvement, 31.8% TPOT improvement (#32618) Wentao Ye 2026-01-28 15:30:32 -05:00
fe18ce4d3f Revert "Enable Cross layers KV cache layout at NIXL Connector (#30207)" (#33241) (tag: v0.15.0rc3) Or Ozeri 2026-01-28 14:36:00 +02:00
8bdd3979d8 [CI] Change GPU key to device key for B200 test (#33275) Kevin H. Luu 2026-01-28 11:14:29 -08:00
c4e744dbd4 [Perf] Optimize moe_permute for CUTLASS FP8 (#32892) Wentao Ye 2026-01-28 13:15:24 -05:00
8ebf372e9d [CI] Whisper tests enforce_eager=False (#33098) Nicolò Lucchesi 2026-01-28 18:36:56 +01:00
f210f0b7b1 [lora/moe] Avoid extra intermediate buffer & Python slicing in expand phase when split_k == 1 (#32774) cwazai 2026-01-29 00:22:45 +08:00
392c5af4fe [Benchmark] Add startup benchmarking to buildkite run (#33183) Bin Bao 2026-01-28 11:03:07 -05:00
af9b69f977 [Quantization][Deprecation] Remove Marlin 24 (#32688) Robert Shaw 2026-01-28 07:54:59 -08:00
8e5e40daf4 [Misc] Provide a DeepSeek ReasoningParser with thinking enabled by default (#33221) Chauncey 2026-01-28 21:16:53 +08:00
2e8de86777 Revert "Enable Cross layers KV cache layout at NIXL Connector (#30207)" (#33241) Or Ozeri 2026-01-28 14:36:00 +02:00
247d1a32ea [Quantization][Deprecation] Remove BitBlas (#32683) Robert Shaw 2026-01-28 03:06:22 -08:00
5f7f9ea884 Relax protobuf library version constraints (#33202) (tag: v0.15.0rc2) Jeffrey Wang 2026-01-27 20:15:53 -08:00
