Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm


Commit Graph

Branches and tags


cmm

main

ci/build/22474

submission

v0.1.0

v0.1.1

v0.1.2

v0.1.3

v0.1.4

v0.1.5

v0.1.6

v0.1.7

v0.10.0

v0.10.0rc1

v0.10.0rc2

v0.10.1

v0.10.1.1

v0.10.1rc1

v0.10.2

v0.10.2rc1

v0.10.2rc2

v0.10.2rc3

v0.11.0

v0.11.0rc1

v0.11.0rc2

v0.11.0rc3

v0.11.0rc4

v0.11.0rc5

v0.11.0rc6

v0.11.1

v0.11.1rc0

v0.11.1rc1

v0.11.1rc2

v0.11.1rc3

v0.11.1rc4

v0.11.1rc5

v0.11.1rc6

v0.11.1rc7

v0.11.2

v0.12.0

v0.13.0

v0.13.0rc1

v0.13.0rc2

v0.13.0rc3

v0.13.0rc4

v0.14.0

v0.14.0rc0

v0.14.0rc1

v0.14.0rc2

v0.14.1

v0.15.0

v0.15.0rc0

v0.15.0rc1

v0.15.0rc2

v0.15.0rc3

v0.15.1

v0.15.1rc0

v0.15.1rc1

v0.15.2rc0

v0.16.0

v0.16.0rc0

v0.16.0rc1

v0.16.0rc2

v0.16.0rc3

v0.16.1rc0

v0.17.0

v0.17.0rc0

v0.17.0rc1

v0.17.1

v0.17.1rc0

v0.17.2rc0

v0.18.0

v0.18.0rc0

v0.18.0rc1

v0.18.0rc2

v0.18.1

v0.18.1rc0

v0.18.2rc0

v0.19.0

v0.19.0rc0

v0.19.0rc1

v0.19.1rc0

v0.2.0

v0.2.1

v0.2.1.post1

v0.2.2

v0.2.3

v0.2.4

v0.2.5

v0.2.6

v0.2.7

v0.3.0

v0.3.1

v0.3.2

v0.3.3

v0.4.0

v0.4.0.post1

v0.4.1

v0.4.2

v0.4.3

v0.5.0

v0.5.0.post1

v0.5.1

v0.5.2

v0.5.3

v0.5.3.post1

v0.5.4

v0.5.5

v0.6.0

v0.6.1

v0.6.1.post1

v0.6.1.post2

v0.6.2

v0.6.3

v0.6.3.post1

v0.6.4

v0.6.4.post1

v0.6.5

v0.6.6

v0.6.6.post1

v0.7.0

v0.7.1

v0.7.2

v0.7.3

v0.8.0

v0.8.0rc1

v0.8.0rc2

v0.8.1

v0.8.2

v0.8.3

v0.8.3rc1

v0.8.4

v0.8.5

v0.8.5.post1

v0.9.0

v0.9.0.1

v0.9.1

v0.9.1rc1

v0.9.1rc2

v0.9.2

v0.9.2rc1

v0.9.2rc2

d7ff22204a [Misc] Add mooncake-transfer-engine to kv_connectors requirements (#34826) Teng Ma 2026-02-19 02:26:24 +08:00
c0bd8b13da [Bugfix] Redo Qwen3.5/Qwen3-Next GDN projector fusion (#34697) Isotr0py 2026-02-19 01:46:53 +08:00
caeb887bf6 [Bugfix] Fix NVFP4 TRTLLM MoE non-gated support; add gsm8k for Nemotron-3-Nano FP8+NVFP4 (#34725) Michael Goin 2026-02-18 12:39:22 -05:00
6b3166a7c7 [CI][Bugfix] Fix multinode test script (#34820) Ilya Markov 2026-02-18 17:45:10 +01:00
25e2e136ef [CI] temporarily disable multi-node tests (#34825) Robert Shaw 2026-02-18 11:32:44 -05:00
6874638bc4 [Model Bash] DeepSeek R1 BF16 Min Latency QKV A GEMM (0.5% E2E Speedup) (#34758) Robert Shaw 2026-02-18 10:42:36 -05:00
e24663c5a9 Add unit tests for fp8 output fusion of triton_attn (#34228) Burkhard Ringlein 2026-02-18 12:22:49 +01:00
c50e105a88 [Model Runner V2] Avoid prepare prefill kernel launch overhead (#34780) Nick Hill 2026-02-18 00:49:21 -08:00
a766b30349 [Renderer] Deprecate code paths for old input processing (#34775) Cyrus Leung 2026-02-18 16:35:04 +08:00
1faa8cb73c [Quantization] - Added uses_meta_device_weights to quant config (#34645) Asaf Joseph Gardin 2026-02-18 09:43:44 +02:00
e89a91d927 [Bugfix] fix activation in cpu_fused_moe_torch call (#34696) Marek Michalowski 2026-02-18 07:39:46 +00:00
909b147197 [Bugfix] Fix prefix creation for Qwen3.5 (#34723) Michael Goin 2026-02-18 02:39:15 -05:00
a88b3be7c4 [Bugfix] Fix quant RMS norm fusion for quantization with TMA-aligned scales (#33255) ElizaWszola 2026-02-18 08:35:04 +01:00
a49ea5a58f [Model Runner V2] A bit more PP simplification (#34766) Nick Hill 2026-02-17 21:39:07 -08:00
30ebe0dc3c [CI/Build] Remove use of skip_v1 (#34699) Cyrus Leung 2026-02-18 12:19:11 +08:00
cef65f0715 [ROCm][CI] Removed hard-coded attn backend requirement for Qwen VL (#34753) Andreas Karatzas 2026-02-17 21:59:53 -06:00
6f3b2047ab [Core] Fix SSRF bypass via backslash-@ URL parsing inconsistency (#34743) Russell Bryant 2026-02-17 22:53:35 -05:00
02e8f26cea [torch.compile] Turn on silu+fp4 quant fusion by default for O1+ (#34718) Luka Govedič 2026-02-17 22:29:15 -05:00
4a00a511bb [BugFix] [Build] fix string literals comparison in indexer_k_quant_and_cache calling site (#34653) Hongxia Yang 2026-02-17 22:19:41 -05:00
a0d8d944e2 [Renderer] Move MM Hash parsing into Renderer (#34711) Cyrus Leung 2026-02-18 11:18:55 +08:00
df3f537a66 [CI] Remove unused precompiled wheel args from image build (#34767) Amr Mahdi 2026-02-17 18:58:18 -08:00
7743152957 [Attention] Refactor check_and_update_config (#33600) Matthew Bonanni 2026-02-17 20:06:54 -05:00
ab33d2a629 [Feature] Decode Context Parallel support for GPU model runner v2 (#34179) Wentao Ye 2026-02-17 19:27:15 -05:00
be3af2d29e [Model Runner V2] Further simplification for PP (#34724) Woosuk Kwon 2026-02-17 15:18:18 -08:00
c656ba3b4d [Kernel] Triton-based Top-k and Top-p sampler kernels (#33538) Jongseok Park 2026-02-17 15:14:30 -08:00
dc5fa77a4e [Bugfix][MTP][Sparse MLA] Allow sparse MLA with MTP to run with FULL cudagraphs (#34457) Matthew Bonanni 2026-02-17 14:01:27 -05:00
1e4a084c8e [CI] Fix flaky test_parsable_context (#34717) Flora Feng 2026-02-17 13:42:52 -05:00
7967e854da [BugFix] Fix sp tests (#34716) Richard Zou 2026-02-17 12:07:56 -05:00
6bd6d0c3c1 Fixed whisper CPU test that does not spawn properly. (#34324) almayne 2026-02-17 14:46:23 +00:00
8e962fef5f [CI][Nixl] Add CrossLayer KV layout tests (#34615) Nicolò Lucchesi 2026-02-17 14:35:40 +01:00
574fe75245 [Renderer] Move InputPreprocessor into Renderer (2/2) (#34560) Cyrus Leung 2026-02-17 21:29:01 +08:00
c61a98f529 [CI][BugFix] ShellCheck cleanup to remove baseline and preserve runtime behavior (#34514) junuxyz 2026-02-17 21:22:56 +09:00
28bffe9466 Fix docs build warning (#34686) Harry Mellor 2026-02-17 10:31:40 +00:00
ad65177a19 [Bugfix] Fix 'remove_instance_endpoint' method logic in disagg_proxy_demo (#32922) ChenqianCao 2026-02-17 18:06:53 +08:00
d44a5b6c47 Remove dead bitsandbytes CxB code from 8-bit inference path (#34633) Tim Dettmers 2026-02-17 04:49:14 -05:00
1d65283e95 Revert "[Models] Fuse Qwen3.5 GDN's qkvz_proj and ba_proj" (#34683) Jiangyun Zhu 2026-02-17 17:29:27 +08:00
c464b57374 [Ray] Propagate third-party env vars to Ray workers via prefix matching (#34383) kourosh hakhamaneshi 2026-02-17 01:08:42 -08:00
c5c38e152a [CI] Fix bake config artifact path for AMI rebuild pipeline (#34656) Amr Mahdi 2026-02-16 22:39:44 -08:00
d00df624f3 [Model Runner V2] Minor refactoring for penalties (#34662) Woosuk Kwon 2026-02-16 21:43:00 -08:00
9752da9d9c [Model Runner V2] Minor simplification for BadWordsState (#34669) Woosuk Kwon 2026-02-16 21:27:24 -08:00
04925b2202 [Model Runner V2] Minor cleanup for PP (#34666) Woosuk Kwon 2026-02-16 19:15:31 -08:00
d74278fb67 [Model Runner V2] Fix unintended CPU-GPU sync in make_dummy (#34667) Woosuk Kwon 2026-02-16 19:00:29 -08:00
b68fd899d1 [Bugfix] Fix fused MoE int32 overflow in stride*offset without perf regression (#34507) haosdent 2026-02-17 09:58:49 +08:00
0b5f9b7204 [CI] Enable mypy import following for vllm/v1/kv_offload (#34639) Aneesh Puttur 2026-02-16 20:58:15 -05:00
9a8853f781 [Core] Pipeline Parallel support for Model Runner V2 (#33960) zhanqiuhu 2026-02-16 20:48:16 -05:00
387a1898d9 [Model Runner V2] support bad_words sampling param (#33433) zhrrr 2026-02-17 08:36:06 +08:00
3b30e61507 [NemotronH] Do not force router to run in fp32 (#34582) roikoren755 2026-02-16 20:15:32 +02:00
824f9e8f3c Targeting the MI355 agent pool with all existing tests (#34629) Alexei-V-Ivanov-AMD 2026-02-16 11:02:27 -06:00
6cc403e67d [Bugfix][CI] Fix flaky entrypoints/openai/test_response_api_with_harmony.py::test_function_calling[openai/gpt-oss-20b] (#34624) Nicolò Lucchesi 2026-02-16 17:11:07 +01:00
72d5951d02 [Bugfix] Treat generation_config max_tokens as default not ceiling (#34063) Almog Tavor 2026-02-16 17:58:24 +02:00
a3205beffb [CI] Enable mypy coverage for individual excluded files (#34292) Lucas Kabela 2026-02-16 07:34:29 -08:00
6930becd45 (bugfix): Fixed encode in LLM entrypoint for IOProcessr plugin prompts (#34618) Christian Pinto 2026-02-16 15:33:55 +00:00
03a8770a6d [ROCm][CI] Fix plugins test group; updating terratorch and dependencies (#34589) Andreas Karatzas 2026-02-16 09:33:42 -06:00
bc56a1d56e [Bugfix] Fix ARC touch KeyError for non-ready T1 blocks in kv offload (#34576) Yiqi Xue 2026-02-16 07:33:19 -08:00
ec7d9e6745 Fix call to moe_mk in modelopt MoE modules (required for LoRA) (#34575) danisereb 2026-02-16 17:33:09 +02:00
3bb4e4311c [Models] Fuse Qwen3.5 GDN's qkvz_proj and ba_proj (#34492) Isotr0py 2026-02-16 23:32:51 +08:00
08f8c198ae [CI] Disable precompiled wheel path in CI image builds (#34606) Amr Mahdi 2026-02-16 07:14:43 -08:00
a21cedf4ff Bump lm-eval version for Transformers v5 compatibility (#33994) Harry Mellor 2026-02-16 14:24:35 +01:00
3ef74cde5d [CI][Tracing] Fix race condition by adding server readiness check (#34364) emricksini-h 2026-02-16 13:57:39 +01:00
cd81cdb399 [Scheduler][ASR] Fix CrossAttn blocks per-request for Variable length encoder inputs (#31058) Ekagra Ranjan 2026-02-16 06:08:44 -05:00
1e828573b4 [CI][Metrics] Stabilize tests with polling and subprocess guards (#34566) Andreas Karatzas 2026-02-16 04:52:02 -06:00
a5ccc85c8c [Bugfix] Fix Dynamo unexpected keyword argument (#34320) Samu Tamminen 2026-02-16 11:32:30 +02:00
b5475d0534 Revert "[Misc] fix qwen3.5 config" (#34610) Roger Wang 2026-02-16 01:06:05 -08:00
9521002f0a [Misc] fix qwen3.5 config (#34604) JJJYmmm 2026-02-16 16:25:38 +08:00
ec17bdd894 [Renderer] Move InputPreprocessor into Renderer (1.5/2) (#34598) Cyrus Leung 2026-02-16 15:46:33 +08:00
bb59c90248 [CI] Write bake config to temp directory instead of repo root (#34569) Amr Mahdi 2026-02-15 22:15:47 -08:00
5bff999d12 [Bugfix] Add method to swap quant_method on FusedMoE to fix LoRA issues (#34453) bnellnm 2026-02-15 23:10:50 -05:00
bb85929aa6 [BugFix] Fix Python 3.13 FlashMLA import error (#34548) Lucas Wilkinson 2026-02-15 20:09:18 -08:00
5653021094 [Doc] Add Mistral-7b-v0.3 model to the batch invariance validated model (#34584) Parth Bansal 2026-02-16 05:09:00 +01:00
974d829b05 [CI][Frontend] Return 422 instead of 500 for invalid Anthropic tool_choice (#34590) Andreas Karatzas 2026-02-15 22:06:48 -06:00
91ac5d9bfd [CI/Build] Enable tests for recent day-0 new models (#34585) Isotr0py 2026-02-16 10:17:04 +08:00
23d825aba1 [torch.compile] Disable ar-rms fusion for ds3-fp4 & DP, fix CI test (#34392) Luka Govedič 2026-02-15 09:33:57 -05:00
f07a128413 [CPU][ARM] Add ARM BF16 cross-compilation support and improve documen… (#33079) Maryam Tahhan 2026-02-15 14:33:08 +00:00
71cd89264f [MM Encoder] Add Triton ViT attention backend (#32183) Isotr0py 2026-02-15 22:32:47 +08:00
19fab44152 [Doc] Update Encoder-Decoder models support doc with Florence-2 (#34581) Isotr0py 2026-02-15 20:18:57 +08:00
79c7e09235 [KV Connector] Add temporary, off-by-default VLLM_DISABLE_REQUEST_ID_RANDOMIZATION workaround (#34415) Seiji Eicher 2026-02-14 23:26:10 -08:00
79f3fab05a [Bugfix] Handle num_expert_group=None in flashinfer block-scale FP8 MoE (#34494) haosdent 2026-02-15 15:25:46 +08:00
604b9eaec5 [BUGFIX] Fix accuracy regression for NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4 with TP>1 (#34476) Vadim Gimpelson 2026-02-15 11:25:17 +04:00
50dbd6c9e6 [bugfix] Fix critical bug when reporting for all paths where handler.create_error_response is used (#34516) Stanislav Kirillov 2026-02-15 12:24:25 +05:00
98bcc6ca59 [CI][Entrypoints] Validate detokenize token IDs to prevent int64 overflow causing 500 (#34468) Andreas Karatzas 2026-02-15 01:08:38 -06:00
f13e86d8dd [Kernels] Fix Helion GPU utils to use platform-agnostic device name API (#34537) Andreas Karatzas 2026-02-14 22:29:23 -06:00
9ca768c740 [Model Runner V2] Minor cleanup for Sampler (#34563) Woosuk Kwon 2026-02-14 18:29:03 -08:00
d5fe3f702c [Hybrid] Enable mamba prefix cache "align" mode with async scheduling (#33997) Thomas Parnell 2026-02-14 22:15:56 +01:00
73391a1baa [Renderer] Move InputPreprocessor into Renderer (1/2) (#34510) Cyrus Leung 2026-02-15 02:14:21 +08:00
b3c14229b0 [ROCm][CI] Guard sparse MLA backend imports for ROCm compatibility in tests (#34538) Andreas Karatzas 2026-02-14 09:32:09 -06:00
2f186635cb [Bugfix] Fix Qwen3.5 config loading (#34554) Roger Wang 2026-02-14 03:56:11 -08:00
342a7cda2d [Misc] Update tests and examples for Prithvi/Terratorch models (#34416) Christian Pinto 2026-02-14 07:03:51 +00:00
d1ea65d0a1 [new model] add COLQwen3 code & Inference (#34398) Kata Coder 2026-02-14 13:15:19 +09:00
de42abb366 [CI] Heavy refactoring of Voxtral multimodal audio model tests (#34294) Andreas Karatzas 2026-02-13 22:04:29 -06:00
60ca7981bc Add explicit validation error for tool calls. (#34438) Julien Denize 2026-02-14 05:04:01 +01:00
0ef5b9147b fix: use __annotations__ instead of get_type_hints() for dynamic kwargs detection (#34527) Christian S. Perone 2026-02-14 04:03:37 +00:00
ed242652d7 [bug] Make sure get_modality_with_max_tokens is deterministic (#34533) Shiyan Deng 2026-02-13 20:02:59 -08:00
b37b679770 [Feature][Perf] Support Selective CPU Weight Offloading (#34535) Wei Zhao 2026-02-13 23:02:24 -05:00
a0638d052d [Bugfix] Fix ROCm UVA CPU weight offloading broken by #32993 (#34543) Andreas Karatzas 2026-02-13 22:01:42 -06:00
c027541eaf [Hybrid] Enable spec decoding in mamba cache align mode (#33705) Harry Huang 2026-02-14 05:02:28 +08:00
fd267bc7b7 [Bugfix]: Fix structured output in multi-turn gpt-oss (#34454) Ben Browning 2026-02-13 14:12:48 -05:00
bfaa559305 Revert "[Bugfix] Fix fused MoE IMA (sans chunking) by using int64 for strides" (#34530) Michael Goin 2026-02-13 13:35:29 -05:00
87789c8364 [Misc] vLLM's --enforce-eager should turn off compile and cudagraphs only (#34523) Richard Zou 2026-02-13 12:52:20 -05:00
bcd65c1f6a [Bugfix] Replace c10::optional with std::optional in topk kernel (#34467) Pushpinder Singh 2026-02-13 08:30:23 -08:00
59d53066d8 [Feature] Support CPU Offloading without Pytorch Pinned Memory that leads to doubled allocation (#32993) Wei Zhao 2026-02-13 11:11:26 -05:00
