Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

f296a1966d [Bugfix] Fix FlashInfer GDN warmup ValueError on SM90 GPUs (#36876) Thomas Parnell 2026-03-13 07:09:39 +01:00
bc2c0c86ef [Frontend] Fix usage incorrectly returned with empty stream_options (#36379) Csrayz 2026-03-13 11:33:04 +08:00
891c60dcd5 fix(kv-cache): increase hybrid attention grouping threshold from 1.25 to 1.5 (#36684) jaime campos salas 2026-03-12 23:28:27 -04:00
1ce13cf992 [Model] Add support for BERT-like Chinese ERNIE pooling models (#36385) whyiug 2026-03-13 11:23:53 +08:00
10f08dedfa [Model] Add ColPali late interaction model for multi-modal retrieval (#36818) Nikita 2026-03-13 03:18:57 +01:00
5e1a373d2e [BUG] Fix rank calculation in NCCLWeightTransferEngine (#36940) Aaron Hao 2026-03-12 18:56:51 -07:00
572c776bfb build: update smg-grpc-servicer to use vllm extra (#36938) Simo Lin 2026-03-12 18:31:36 -07:00
55d8073d06 [Bugfix] ep_scatter kernel store-load race condition (#34991) Yifan Qiao 2026-03-12 18:07:59 -07:00
cd32d6f586 [Model Runner V2] Some code simplification (#36929) Nick Hill 2026-03-12 17:59:23 -07:00
aaa3092f51 [MoE] Add routing simulation override for MXFP4 quantized MoE (#33595) Jaewon 2026-03-12 17:30:44 -07:00
87985077a4 [Speculative Decoding] Add norm_before_fc for gpt-oss draft models (#36545) Shubhra Pandit 2026-03-12 19:03:32 -04:00
a79c1c2c80 [AMD][Build] Add DeepEP to ROCm Dockerfile (#36086) Ryan Rock 2026-03-12 16:33:32 -05:00
cc8f1f4764 [ROCm][CI] Preparing gfx90a mirroring (#36210) Andreas Karatzas 2026-03-12 15:42:25 -05:00
05b9e8ab5b Revise environment setup in AGENTS.md (#36909) Michael Goin 2026-03-12 20:21:11 +01:00
2cdf92228c [Feature]: Remove Chunking From FusedMoE (#34086) Xinan Miao 2026-03-13 02:24:38 +08:00
c973ecdead [bnb] Skip moe + bnb test (#36896) Marc Sun 2026-03-12 19:03:25 +01:00
e39257a552 Add AGENTS.md (#36877) Harry Mellor 2026-03-12 17:20:50 +00:00
cc16b24b17 Update Flashinfer to 0.6.6 (#36768) Dimitrios Bariamis 2026-03-12 18:19:19 +01:00
bdc2343454 [Bugfix] Fix KeyError in parse_response_input for reasoning items with optional content (#34499) Eunkwang Jeon 2026-03-13 01:13:36 +09:00
f444c05c32 [Attention] Use FA4 for MLA prefill (#34732) Matthew Bonanni 2026-03-12 12:10:17 -04:00
85199f9681 [Bugfix] fix main branch pre-commit error (1 line change) (#36897) SoluMilken 2026-03-13 00:08:37 +08:00
a1257fd1ea [Kernel] Add FP8 KV cache support to Triton MLA decode attention (#34597) grimulkan 2026-03-12 10:32:34 -05:00
abcffbba8c [CI] Fix mypy pre-commit errors on main (#36882) Thomas Parnell 2026-03-12 16:22:29 +01:00
53ec16a705 [Hardware] Replace torch.cuda.device_count/current_device/set_device API (#36145) Kunshang Ji 2026-03-12 22:57:47 +08:00
2e693f48e7 [Perf] Add TRTLLM FP8 MoE Modular Kernel (#36307) Wei Zhao 2026-03-12 10:32:31 -04:00
7f1f36bf91 [CI] Fix mypy for vllm/reasoning (#35742) Martin Hickey 2026-03-12 12:21:33 +00:00
5282c7d4d0 [docs] Add lightweight AI assisted contribution policy (#30947) Mark McLoughlin 2026-03-12 11:46:13 +00:00
9e19f8338b [Perf] add packed recurrent fast path for decode (#36596) caozuoba 2026-03-12 19:01:57 +08:00
06e0bc21d2 [Frontend] Split OpenAIServingModels into OpenAIModelRegistry + OpenAIServingModels (#36536) Sage 2026-03-12 12:29:37 +02:00
5a71cdd76e [Bugfix] Fix crash when tool_choice=required exceeds max_tokens (#36841) Chauncey 2026-03-12 18:28:45 +08:00
f0d3658c0f [MM][OOT] Support CPU seq_lens for OOT MMEncoderAttention kernels (#36605) Shanshan Shen 2026-03-12 18:28:23 +08:00
57431d8231 [UX] Only show FP4 Marlin fallback warning for w4a4 models (#36806) Michael Goin 2026-03-12 10:19:35 +01:00
3e64fe4a18 [Bugfix] Warm up Triton autotuner for GDN layers during V1 profiling (#36599) Xu Jinyang 2026-03-12 15:51:09 +08:00
8cb24d3aed [KV Connector] Support using FlexKV as KV Cache Offloading option. (#34328) sfeiqiang 2026-03-12 15:46:20 +08:00
00726c74c9 [Bugfix][Model] Fix DeepSeek-OCR TensorSchema crash on empty images_crop (#36670) István Ketykó 2026-03-12 08:35:54 +01:00
9fe404ed04 [Frontend] OpenAI Responses API supports Tool/Function calling with streaming (#29947) Chauncey 2026-03-12 15:03:50 +08:00
802f306cd1 [Tests] Skip model weight download for render-only test server (#36813) Sage 2026-03-12 08:24:42 +02:00
894843eb25 replace 'with torch.cuda.device' with 'with torch.accelerator.device_index' (#36144) Yan Ma 2026-03-12 14:12:57 +08:00
584a3f56de [Kernel][Helion][13/N] Force static_shapes=False in helion register (#36677) Yanan Cao 2026-03-11 22:35:29 -07:00
36735fd772 [BugFix] Fix multiple/duplicate stdout prefixes (#36822) Nick Hill 2026-03-11 21:23:21 -07:00
6ecabe4936 [CI Failure] Fix Language Models Test (Extended Pooling) daily CI Failure (#36761) wang.yuqi 2026-03-12 12:22:05 +08:00
2f8b4ce0c0 [Model Runner V2] Do not initialize sampler for non-last PP ranks (#36824) Woosuk Kwon 2026-03-11 20:55:28 -07:00
2ef69456f5 [LMCache] Fault Tolerance Mechanism (#36586) Yuwei An 2026-03-11 20:54:39 -07:00
17852aa503 more models for vLLM Benchmark Suite (#35086) Louie Tsai 2026-03-11 20:36:51 -07:00
8647c6cf51 [Bugfix] Fix minimax_m2 tool parser when stream interval > 1 (#35895) Flora Feng 2026-03-11 22:25:14 -04:00
513949f95f [XPU][Doc] Remove manual OneAPI install step, now handled by torch-xpu (#36831) Kunshang Ji 2026-03-12 09:46:02 +08:00
262b76a09f [Frontend] Exclude anthropic billing header to avoid prefix cache miss (#36829) Nick Hill 2026-03-11 18:20:34 -07:00
c34ba6b961 [Perf] Optimize compute maxsim using batched version, 3.2% E2E throughput improvement (#36710) Wentao Ye 2026-03-11 20:37:01 -04:00
24062b704f [ROCm][CI/Build] Add gfx1152/gfx1153 (Krackan) to HIP supported architectures (#36499) Matthias Gehre 2026-03-12 00:14:40 +01:00
d6b61e5166 [BUG] Fix async rlhf tests (#35811) Aaron Hao 2026-03-11 15:06:10 -07:00
cf632499ee [Kernel] [Helion] [15/N] Split config files into per-platform files (#36698) Yanan Cao 2026-03-11 14:25:29 -07:00
a3774a8198 [Kernel] [Helion] [12/N] Use FakeTensorMode to avoid GPU allocation during config key computation (#36563) Yanan Cao 2026-03-11 14:25:16 -07:00
0ce21c46a0 [Kernel] [Helion] [14/N] Set autotune_ignore_errors=True during autotuning (#36683) Yanan Cao 2026-03-11 14:25:04 -07:00
55eed6b7a5 [Model Runner V2] Add WhisperModelState [6/N] (#35790) Woosuk Kwon 2026-03-11 14:20:38 -07:00
c77181e534 [Model Runner V2] Add probabilistic rejection sampling for spec decoding (#35461) Giancarlo Delfin 2026-03-11 14:04:32 -07:00
12001f2ebc [LMCache] Pass TP size in lookup for MLA multi-reader locking (#36129) maobaolong 2026-03-12 04:45:20 +08:00
7ee5d5093b [BugFix][kv_offload] Fix offloading decodes with async scheduling (#33881) Or Ozeri 2026-03-11 22:43:40 +02:00
428bc718bd [Bugfix][ROCm] Strip block_size before attention backend validation (#36274) jennyyyyzhen 2026-03-11 13:37:31 -07:00
ff1e3d9c63 [BugFix]: add bagel to MM_PREFIX_LM_MODELS (#36316) 汪志鹏 2026-03-12 03:55:59 +08:00
35bdca5431 [Refactor] Remove dead code in KV connector (#36424) Wentao Ye 2026-03-11 15:40:17 -04:00
8a24842765 [ROCm] add tuned moe_wna16_triton kernel configs for CDNA4 (#35093) Amanzhol Salykov 2026-03-11 20:00:08 +01:00
65986db6ba Make Gemma and Gemma 2 accept inputs_embeds like Gemma 3 (#36787) Harry Mellor 2026-03-11 18:12:43 +00:00
9556af87d5 [torch.compile] Add support for non-contiguous fused RMSNorm + group quant (#36551) Luka Govedič 2026-03-11 13:56:55 -04:00
a1a3523a56 [KVConnector] Support worker -> scheduler metadata (#31964) Or Ozeri 2026-03-11 19:36:37 +02:00
741f4e046b fix: align lfm2 thumbnail token counting with HF (#36707) tianshu-Michael-yu 2026-03-11 10:28:38 -07:00
a5d06dc557 Add 320 dimension size support to MLA (#36161) Julien Denize 2026-03-11 18:21:22 +01:00
5efa206a8c Fix ExaoneMoeMTP test that never ran in Transformers v4 (#36792) Harry Mellor 2026-03-11 17:10:23 +00:00
196802dfa6 [Misc] Clean up renderers (#36770) Cyrus Leung 2026-03-12 00:39:29 +08:00
c84b519cf3 [Bugfix] Fix negative max_tokens when input prompt is too long (#36789) Isotr0py 2026-03-12 00:30:51 +08:00
741ecf0630 [CI] Add bfcl tool call correctness eval (#36560) Flora Feng 2026-03-11 12:27:36 -04:00
b7e5a588d8 [Bugfix] Fix DP/EP Shared Expert With Monolithic Kernels (#36061) Robert Shaw 2026-03-11 12:07:14 -04:00
822e250ab7 [torch.compile] Use FakeTensors instead of real GPU tensors for single-size compilation (#36093) Richard Zou 2026-03-11 12:07:09 -04:00
bea02cdf93 Fix routed experts capture for hybrid models (Mamba + Attention) (#35744) Hongxin Xu 2026-03-11 23:53:10 +08:00
a3ea760ea5 Add 'none' reasoning effort to ChatCompletionRequest (#36238) Julien Denize 2026-03-11 16:45:34 +01:00
35db669f1d Correct link to supported hardware on vllm.ai (#36798) Harry Mellor 2026-03-11 15:43:28 +00:00
afebeffbfb Add support to Mistral large 3 eagle with dense layers (#36163) Julien Denize 2026-03-11 16:42:56 +01:00
5573894737 Kimi k2.5 MLA based eagle3 (#36361) Jhao-Ting Chen 2026-03-11 08:36:11 -07:00
d5816c8c2f Fix tied weights in weight mapping test for Transformers v5 (#36788) Harry Mellor 2026-03-11 15:10:26 +00:00
8ccbcda5c0 [Model Runner V2] Remove unused warmup_for_prefill method (#36762) Woosuk Kwon 2026-03-11 08:02:44 -07:00
a9e532afe2 [ROCm][Perf] Allow MTP lens > 1 in Sparse MLA (#36681) tvirolai-amd 2026-03-11 16:43:03 +02:00
f3163bba67 Disable docs build skipping until a better solution is found (#36790) Harry Mellor 2026-03-11 13:53:23 +00:00
700a1ddc65 [Misc] Use envs module to get VLLM_DISABLED_KERNELS (#35776) Martin Hickey 2026-03-11 13:37:46 +00:00
f33251ffc8 [Bugfix] Fix Mistral-small --format (#36782) Silvia Colabrese 2026-03-11 12:47:52 +01:00
e584dce52b Add XPU MLA Sparse backend for DeepSeek v3.2 (#33230) Wuxun Zhang 2026-03-11 19:19:15 +08:00
40c0461f24 [openapi] refactor render related openapi [3/N] (#36749) Ning Xie 2026-03-11 18:14:34 +08:00
724759684c [Bugfix] Fix Qwen3-VL timestamp mismatch when using num_frames without fps (#36136) Weiguang Li 2026-03-11 18:13:06 +08:00
9c34e9d24f Disable cascade attention by default (#36318) Michael Goin 2026-03-11 11:12:23 +01:00
09b6f99852 [compile] aot_compile should respect VLLM_DISABLE_COMPILE_CACHE (#36358) Richard Zou 2026-03-11 06:12:03 -04:00
c87fb515ed fix(lora): use replaced_module_name in pooling model name check (#36402) Ethan T. 2026-03-11 18:11:27 +08:00
5353c9b016 platforms: Fix Ray DP startup crash (#36665) Itay Alroy 2026-03-11 12:08:55 +02:00
13e79fc811 [ci] Update rtol for test_classification (#36556) Angela Yi 2026-03-11 03:08:16 -07:00
9d07a3d6e4 Add: Eagle3 support for Qwen3.5 (#36658) Rahul Tuli 2026-03-11 15:37:42 +05:30
646b85544b [Refactor] Remove Molmo2 processor wrapper (#36667) Cyrus Leung 2026-03-11 18:07:20 +08:00
4286cc5ec2 fix(minicpmv): fix audio inference by handling meta device in init_re… (#36751) tc-mb 2026-03-11 18:06:28 +08:00
95c0f928cd [NemotronH] Small fix reasoning parser (#36635) (tag: v0.17.1) roikoren755 2026-03-11 11:44:41 +02:00
c9b1e977dc add nemotron v3 reasoning parser (#36393) Shaun Kotek 2026-03-10 00:11:41 +02:00
545d18d81b [Bugfix] Support other quantization methods in glm41v (#36321) LoganJane 2026-03-11 17:48:05 +08:00
e661b9ee83 [NemotronH] Small fix reasoning parser (#36635) roikoren755 2026-03-11 11:44:41 +02:00
c910eeb125 [XPU]Bug fix for some unexpected error when use AgRs backend on XPU device. (#36593) YiSheng5 2026-03-11 17:17:46 +08:00
f4ae58b38b Remove unused config field from Gemma2 (#36672) Harry Mellor 2026-03-11 08:51:19 +00:00
