Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

fe36bf5e80 [Model] Remove the unnecessary dtype conversion in MiniCPM (#32523) Canlin Guo 2026-01-18 16:07:28 +08:00
963dc0b865 [Model Runner V2] Minor optimization for eagle input processing (#32535) Woosuk Kwon 2026-01-17 21:55:17 -08:00
8cc26acd8b [Performance] Improve Triton prefill attention kernel's performance (#32403) Isotr0py 2026-01-18 12:19:59 +08:00
4a6af8813f [MoE Refactor] Move Test Impl into Test Dirs (#32129) Robert Shaw 2026-01-17 23:16:59 -05:00
4147910f1e [Model Runner V2] Move mrope_positions buffer to MRopeState (#32532) Woosuk Kwon 2026-01-17 20:09:48 -08:00
3055232ba0 [Feature] Add FIPS 140-3 compliant hash algorithm option for multimodal hashing (#32386) Karan Bansal 2026-01-18 08:32:01 +05:30
d68209402d [build] fix cu130 related release pipeline steps and publish as nightly image (#32522) Shengqi Chen 2026-01-18 10:36:11 +08:00
965765aef9 [build] fix cu130 related release pipeline steps and publish as nightly image (#32522) Shengqi Chen 2026-01-18 10:36:11 +08:00
9e078d0582 [CI/Build][Docker] Add centralized version manifest for Docker builds (#31492) Mritunjay Kumar Sharma 2026-01-17 19:15:30 +05:30
2b99f210f5 [Misc] Fix typo: seperator -> separator in flashmla_sparse.py (#32411) Guofang.Tang 2026-01-17 20:18:30 +08:00
1646fea672 [Model] Molmo2: Enable quantized weight mapping for vision backbone (#32385) Kim Hee Su 2026-01-17 18:33:05 +09:00
d3317bbba4 [Models] Lfm2Moe: minor name changes for resolving lora conflicts (#29063) Paul Pak 2026-01-16 23:12:55 -07:00
b17039bccc [CI] Implement uploading to PyPI and GitHub in the release pipeline, enable release image building for CUDA 13.0 (#31032) v0.14.0 Shengqi Chen 2026-01-17 12:52:33 +08:00
8e61425ee6 [CI] Implement uploading to PyPI and GitHub in the release pipeline, enable release image building for CUDA 13.0 (#31032) Shengqi Chen 2026-01-17 12:52:33 +08:00
2e7c89e708 Revert "[Attention][MLA] Make FLASHINFER_MLA the default MLA backen… (#32484) Matthew Bonanni 2026-01-16 23:42:39 -05:00
037a6487af apply _validate_input to MistralTokenizer token-id chat prompts (#32448) vanshil shah 2026-01-16 19:23:45 -08:00
5a3050a089 [Docs][Governance] Add @robertshaw2-redhat to lead maintainers group (#32498) Simon Mo 2026-01-16 18:35:49 -08:00
484e22bc18 [TPU][Core] Enable Pipeline Parallelism on TPU backend (#28506) Chenyaaang 2026-01-16 15:29:20 -08:00
ca21288080 [CI] Fix OOM in Hopper Fusion E2E Tests (H100) (#32489) Lucas Wilkinson 2026-01-16 14:27:16 -07:00
4c82b6fac7 [responsesAPI] allow tuning include_stop_str_in_output (#32383) Andrew Xia 2026-01-16 16:14:40 -05:00
a884bc62d6 [LoRA] Update LoRA expand kernel heuristic (#32425) Xin Yang 2026-01-16 10:38:07 -08:00
7a1030431a Atomics Reduce Counting Optimization for SplitK Skinny GEMMs. (#29843) Hashem Hashemi 2026-01-16 09:45:04 -08:00
9fd918e510 [CI] Update deepgemm to newer version (#32479) Wentao Ye 2026-01-16 12:18:05 -05:00
c9a533079c [EPLB][BugFix]Possible deadlock fix (#32418) Ilya Markov 2026-01-16 15:11:01 +01:00
48b67ba75f [Frontend] Standardize use of create_error_response (#32319) Cyrus Leung 2026-01-14 19:22:26 +08:00
6ca4f400d8 [CI][AMD] Skip test_permute_cols since the kernel is not used and not built for ROCm (#32444) rasmith 2026-01-16 02:22:53 -06:00
180e981d56 [Chore] Replace swish with silu (#32459) Cyrus Leung 2026-01-16 16:22:45 +08:00
b84c426a8c [ROCm][CI] Skip Qwen3-30B-A3B-MXFP4A16 Eval Test On Non-CUDA Platforms (#32460) Micah Williamson 2026-01-16 02:17:44 -06:00
b66b0d6abb fix(rocm): Enable non-gated MoE (is_act_and_mul=False) support on ROCm (#32244) Rabi Mishra 2026-01-16 13:01:10 +05:30
03da3b52ef [Bugfix] Refactor to support DP parallel in R3 (#32306) Hongxin Xu 2026-01-16 15:13:58 +08:00
14ce524249 [CI] Breakup h200 tests (#30499) Lucas Wilkinson 2026-01-15 23:23:22 -07:00
4ae77dfd42 [Frontend][1/n] Make pooling entrypoints request schema consensus | CompletionRequest (#32395) wang.yuqi 2026-01-16 14:17:04 +08:00
73f635a75f [Bug] Add TPU backend option (#32438) XiongfeiWei 2026-01-15 21:17:12 -08:00
35bf5d08e8 [bugfix] Fix online serving crash when text type response_format is received (#26822) cjackal 2026-01-16 13:23:54 +09:00
5de6dd0662 [Bugfix] [DeepSeek-V3.2] fix sparse_attn_indexer padding (#32175) Kebe 2026-01-16 12:21:55 +09:00
709502558c [Model] Add Step3vl 10b (#32329) ltd0924 2026-01-16 11:04:16 +08:00
09f4264a55 [Bugfix] Fix ROCm dockerfiles (#32447) TJian 2026-01-16 10:50:00 +08:00
7f42dc20bb [CI] Fix LM Eval Large Models (H100) (#32423) v0.14.0rc2 Matthew Bonanni 2026-01-15 19:52:49 -05:00
c2a37a3cf8 Cherry pick [ROCm] [CI] [Release] Rocm wheel pipeline with sccache #32264 TJian 2026-01-16 02:56:18 +08:00
0e31fc7996 [UX] Use kv_offloading_backend=native by default (#32421) Michael Goin 2026-01-15 13:55:11 -05:00
6ac0fcf416 [ROCm][Bugfix] Disable hip sampler to fix deepseek's accuracy issue on ROCm (#32413) Pleaplusone 2026-01-16 00:35:47 +08:00
b62249725c [ROCM] Add ROCm image build to release pipeline (#31995) Douglas Lehr 2026-01-15 05:01:40 -06:00
1b57275207 [Bugfix][ROCm][performance] Resolve the performance regression issue of the Qwen3-Next-80B-A3B-Thinking under rocm_atten (#32336) vllmellm 2026-01-15 03:32:48 +08:00
46f8a982b1 [ROCm][CI] Enable AITER Unified Attention On ROCm For gpt-oss Test (#32431) Micah Williamson 2026-01-15 18:55:57 -06:00
bcf2333cd6 [CI] Fix LM Eval Large Models (H100) (#32423) Matthew Bonanni 2026-01-15 19:52:49 -05:00
83239ff19a Add thread_n=64 support to Marlin MoE (#32360) Michael Goin 2026-01-15 19:45:44 -05:00
c277fbdf31 [Feat] Support non-gated MoE with Marlin, NVFP4 CUTLASS, FP8, INT8, compressed-tensors (#32257) TomerBN-Nvidia 2026-01-16 02:15:05 +02:00
aca5c51487 [Refactor] Remove unused file (#32422) Wentao Ye 2026-01-15 17:59:38 -05:00
31c29257c8 [MoE Refactor][17/N] Apply Refactor to Bf16 (#31827) Yongye Zhu 2026-01-15 12:53:40 -08:00
8c11001ba2 [ROCM] DSfp4 mla projection gemms weight dynamic quantization (#32238) Aleksandr Malyshev 2026-01-15 12:13:08 -08:00
bd292be0c0 [BugFix] Python file source reading can fail on UnicodeDecodeError (#32416) Richard Zou 2026-01-15 15:01:41 -05:00
41c544f78a [ROCm] [CI] [Release] Rocm wheel pipeline with sccache (#32264) TJian 2026-01-16 02:56:18 +08:00
1be5a73571 [UX] Use kv_offloading_backend=native by default (#32421) Michael Goin 2026-01-15 13:55:11 -05:00
c36ba69bda [BugFix] Fix assert x_s.shape[-1] == x_q.shape[-1] // group_shape[1] in Blackwell Quantized MoE Test (#32362) Lucas Wilkinson 2026-01-15 11:19:12 -07:00
047413375c [Attention][AMD] Make flash-attn optional (#30361) Matthias Gehre 2026-01-15 18:18:24 +01:00
74e4bb1c5a fixing podman build issue (#32131) smit kadvani 2026-01-15 09:07:08 -08:00
b34474bf2c [Feature] Support async scheduling + PP (#32359) Wentao Ye 2026-01-15 12:06:23 -05:00
6218034dd7 [Model Runner V2] Support FlashInfer backend & Fix CUDA Graph bug [1/2] (#32348) Woosuk Kwon 2026-01-15 08:59:23 -08:00
77c16df31d [ROCm][Bugfix] Disable hip sampler to fix deepseek's accuracy issue on ROCm (#32413) Pleaplusone 2026-01-16 00:35:47 +08:00
130d6c9514 [ROCm][Perf] Enable shuffle kv cache layout and assembly paged attention kernel for AiterFlashAttentionBackend (#29887) Pleaplusone 2026-01-15 23:29:53 +08:00
361dfdc9d8 [Quant] Support MXFP4 W4A16 for compressed-tensors MoE models (#32285) Dipika Sikka 2026-01-15 10:25:55 -05:00
8ebfacaa75 [Attention][MLA] Make FLASHINFER_MLA the default MLA backend on Blackwell, and TRTLLM the default prefill (#32339) Matthew Bonanni 2026-01-15 09:49:57 -05:00
b89275d018 [ROCm] Improve error handling while loading quantized model on gfx120… (#31715) brian033 2026-01-15 20:16:00 +08:00
28459785ff [3/N] Group together media-related code (#32406) Cyrus Leung 2026-01-15 19:52:12 +08:00
8853a50af2 [CI][BugFix][AMD][FP8] Fix test_rms_norm so it runs correctly on ROCm (#32372) rasmith 2026-01-15 05:05:54 -06:00
c5891b5430 [ROCM] Add ROCm image build to release pipeline (#31995) Douglas Lehr 2026-01-15 05:01:40 -06:00
707b44cc28 [Refactor] [11/N] to simplify the mcp architecture (#32396) Chauncey 2026-01-15 18:49:31 +08:00
3a4e10c847 [Benchmark] [Feature] add vllm bench sweep startup command (#32337) rongfu.leng 2026-01-15 17:25:46 +08:00
cbbae38f93 [2/N] Move cache factories to MM registry (#32382) Cyrus Leung 2026-01-15 17:02:30 +08:00
cdba4c74b3 [Model] Avoid token selection in SigLIP pooling head (#32389) Cyrus Leung 2026-01-15 17:01:59 +08:00
a52d1396a7 fix: avoid crash on zero-arg tool calls in glm4 parser (#32321) seeksky 2026-01-15 16:45:59 +08:00
1e584823f8 [Bugfix] Strengthen the check of X-data-parallel-rank in Hybrid LB mode (#32314) dtc 2026-01-15 16:31:16 +08:00
4c1c501a7e [Refactor] [10/N] to simplify the vLLM openai completion serving architecture (#32369) Chauncey 2026-01-15 15:41:34 +08:00
ae1eba6a9a [ROCm][CI] Pin transformers 4.57.3 to fix jina test failures (#32350) Andreas Karatzas 2026-01-15 01:19:34 -06:00
e9ec2a72d8 [Bugfix] Fix stale common_attn_metadata.max_seq_len in speculative decoding with Eagle (#32312) Ofir Zafrir 2026-01-15 08:39:37 +02:00
2c9b4cf5bf [BugFix] Fix DeepSeek-V3.1 + DeepGEMM incompatible scale shapes (#32361) Lucas Wilkinson 2026-01-14 23:32:22 -07:00
9d7ae3fcdb [code clean] remove duplicate check (#32376) Ning Xie 2026-01-15 13:29:34 +08:00
3c2685645e [CI][AMD][Quantization][BugFix] Fix fp8 max in quant_utils.py and update test_fp8_quant.::test_static_fp8_quant_group_2d to use correct fp8 dtype and adjust atol/rtol (#32201) rasmith 2026-01-14 23:04:34 -06:00
773d7073ae [ROCm][CI] Disable async scheduling on ROCm for test_structured_output[meta-llama/Meta-Llama-3.1-8B-Instruct-xgrammar-auto-speculative_config9] (#32355) Micah Williamson 2026-01-14 22:53:43 -06:00
edadca109c [Bugfix] Add CpuCommunicator.dispatch and combine to fix DP+MoE inference (#31867) kzwrime 2026-01-15 12:50:48 +08:00
d86fc23bdd [Misc] Remove redundant line (#32366) Li Wang 2026-01-15 12:29:56 +08:00
375e5984fe Support configure skip_special_tokens in openai response api (#32345) Shiyan Deng 2026-01-14 20:07:26 -08:00
19b251fe3d Fix optional parameter parsing in MiniMax M2 tool parser #32278 (#32342) baonudesifeizhai 2026-01-14 23:05:48 -05:00
15422ed3f7 [CI/Build][Hardware][AMD] Fix v1/shutdown (#31997) Ryan Rock 2026-01-14 22:01:42 -06:00
8471b27df9 [compile] raise on compile_size implicit padding (#32343) dolpm 2026-01-14 12:46:56 -08:00
66652e8082 [BugFix] Assign page_size_padded when unifying kv cache spec. (#32283) Lumosis 2026-01-14 12:10:01 -08:00
e27078ea80 [Bugfix][ROCm][performance] Resolve the performance regression issue of the Qwen3-Next-80B-A3B-Thinking under rocm_atten (#32336) vllmellm 2026-01-15 03:32:48 +08:00
d084e9fca7 [MODEL] Fix handling of multiple channels for gpt-oss with speculative decoding (#26291) Aleksandr Samarin 2026-01-14 21:20:52 +03:00
3a612322eb [CI] Move rixl/ucx from Dockerfile.rocm_base to Dockerfile.rocm (#32295) qli88 2026-01-14 10:53:36 -06:00
9ea07b41da [1/N] Reorganize multimodal processing code (#32327) Cyrus Leung 2026-01-14 23:25:31 +08:00
552b262936 rename tokenize serving api request id prefix to tokenize (#32328) Ning Xie 2026-01-14 22:52:20 +08:00
00e6402d56 [Frontend] track responsesAPI server_load (#32323) Chauncey 2026-01-14 20:00:37 +08:00
ce0946249d [Misc] Make mem utils can be reused by other platforms (#32322) Shanshan Shen 2026-01-14 19:46:01 +08:00
3f28174c6a [Frontend] Standardize use of create_error_response (#32319) Cyrus Leung 2026-01-14 19:22:26 +08:00
769d0629e1 [Refactor] [9/N] to simplify the vLLM openai translations serving architecture (#32313) Chauncey 2026-01-14 18:20:58 +08:00
90db5b31e4 [Refactor] Move top-level dummy data generation to registry (#32310) Cyrus Leung 2026-01-14 18:17:46 +08:00
b8199f6049 [Model] Re-implement Qwen3Omni Audio Encoder (#32167) Roger Wang 2026-01-13 23:40:30 -08:00
7e6f123810 Add Molmo2 multimodal model support (#30997) sangho.lee 2026-01-13 23:33:09 -08:00
9312a6c03a [Refactor] [8/N] to simplify the vLLM openai responsesapi_serving architecture (#32260) Chauncey 2026-01-14 15:26:24 +08:00
6388b50058 [Docs] Add docs about OOT Quantization Plugins (#32035) Michael Goin 2026-01-14 02:25:45 -05:00
