Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

048bb59728 AMD CI Test - unskip moe_sum test and moe_align_block_size tests (#32039) Hongxia Yang 2026-01-14 02:25:10 -05:00
7933638051 [misc] Remove is_torch_equal_or_newer(2.4) cases (#32296) Angela Yi 2026-01-13 23:22:07 -08:00
6b176095e3 [Build] Relax anthropic version pin from ==0.71.0 to >=0.71.0 (#32289) David 2026-01-14 02:21:39 -05:00
9d0d7f48d5 [ROCm][CI] Handle missing vision_config in Isaac model attention patch (#32281) Andreas Karatzas 2026-01-14 01:21:26 -06:00
50632adc58 Consolidate Intel Quantization Toolkit Integration in vLLM (#31716) Yi Liu 2026-01-14 15:11:30 +08:00
6fa6e7ef0c [ROCm][CI] Disable Async Scheduling For Qwen3-Next-80B-A3B-Instruct MTP Async EPLB Accuracy Test (#32275) Micah Williamson 2026-01-13 23:29:42 -06:00
90c0836902 [Model Runner V2] Refactor Sampler (#32245) Woosuk Kwon 2026-01-13 17:58:12 -08:00
8ef50d9a6b [Kernel][Performance] Enable smaller Scaling Factor tiling for NVFP4 small-batch decoding (#30885) Roberto L. Castro 2026-01-14 00:22:53 +01:00
2a60ac91d0 [Improvement] Persist CUDA compat libraries paths to prevent reset on apt-get (#30784) emricksini-h 2026-01-13 23:35:05 +01:00
9e65bb4ef4 Add mergify label job for "bug" in PR titles (#31980) Michael Goin 2026-01-13 17:28:19 -05:00
0db574b185 [Build] Add scripts for cherry-picking and trigger build (#32282) Simon Mo 2026-01-13 13:21:05 -08:00
2f4a71daf2 [Misc] Add In-Container restart capability through supervisord for sagemaker entrypoint (#28502) HappyAmazonian 2026-01-13 13:06:10 -08:00
69f8a0ea37 fix(rocm): Use refresh_env_variables() for rocm_aiter_ops in test_moe (#31711) Rabi Mishra 2026-01-14 00:41:54 +05:30
f28125d87b [Perf] Optimize grouped topk kernel, 1.2%~2% E2E Throughput improvement (#32058) Wentao Ye 2026-01-13 13:58:18 -05:00
2c24bc6996 [BugFix] [KVConnector] Fix KV events for LMCache connector (#32169) Martin Hickey 2026-01-13 15:50:34 +00:00
0aa8c40552 [Bugfix] Replace PoolingParams.normalize with use_activation (#32243) Cyrus Leung 2026-01-13 18:45:42 +08:00
46f8c6b725 Fix CUDA 13 wheel installation doc (#32276) Dmitry Tokarev 2026-01-13 13:48:37 -05:00
af54d2e2d0 [responseAPI] support partial message generation (#32100) Andrew Xia 2026-01-13 13:41:26 -05:00
6beef12b9b [EPLB][Cleanup] Remove is_async_enabled from EplbModelState (#32050) Sage Moore 2026-01-13 10:19:03 -08:00
ab74b2a27a [Trivial] Remove duplicate enable_mfu_metrics (#32246) Mark McLoughlin 2026-01-13 17:09:23 +00:00
2263d44b68 [4/N][Attention] Move MLA common to model_executor (#32060) Matthew Bonanni 2026-01-13 12:08:45 -05:00
4f3676e726 nixl_connector: export UCX_MEM_MMAP_HOOK_MODE=none to avoid a UCX memory leak (#32181) Mathis Felardos 2026-01-13 17:21:10 +01:00
510265472c [BugFix] [KVConnector] Fix KV events for LMCache connector (#32169) Martin Hickey 2026-01-13 15:50:34 +00:00
4f02cb2eac [Refactor] [7/N] to simplify the vLLM lora serving architecture (#32251) Chauncey 2026-01-13 23:37:34 +08:00
252c011012 [Refactor] Remove MultiModalProfiler (#32254) Cyrus Leung 2026-01-13 23:10:20 +08:00
98f60e5acb [6/N][Attention] Move utils to more appropriate locations (#32215) Matthew Bonanni 2026-01-13 08:38:52 -05:00
fefce49807 [Refactor] [6/N] to simplify the vLLM openai chat_completion serving architecture (#32240) Chauncey 2026-01-13 21:01:39 +08:00
a5bbbd2f24 [Quantization] fix: overflow with static per-tensor scaling (#29867) Mickaël Seznec 2026-01-13 13:56:01 +01:00
8c8653b672 [Docs] Nixl Usage recommend fail kv_load_failure_policy (#32198) Nicolò Lucchesi 2026-01-13 13:51:57 +01:00
232214b2ae [Bugfix] Replace PoolingParams.normalize with use_activation (#32243) Cyrus Leung 2026-01-13 18:45:42 +08:00
eb28e8068d [Refactor] Remove get_encoder_dummy_data (#32241) Cyrus Leung 2026-01-13 17:21:23 +08:00
542a4059b2 [Model] Use mm_position to compute mrope positions for Qwen2-VL/2.5-VL (#32126) YunzhuLu 2026-01-13 17:04:29 +08:00
df7e12715f [ROCm][CI] Fix engine core client tests for ROCm spawn multiprocessing (#32061) Andreas Karatzas 2026-01-13 01:14:30 -06:00
44c34f22d9 [Doc] Update installation from source command (#32239) Roy Wang 2026-01-13 15:10:27 +08:00
80221e1884 [BugFix]Fix eagle draft_model_config and add tests (#31753) Xingyu Liu 2026-01-12 23:09:36 -08:00
5e714f7ff4 [ROCm][CI] Fix HuggingFace flash_attention_2 accuracy issue in Isaac vision encoder (#32233) Andreas Karatzas 2026-01-13 00:33:59 -06:00
11b6af5280 [ROCm][Bugfix] Fix Mamba batched decode producing incorrect output (#32099) (tag: v0.14.0rc1) Andreas Karatzas 2026-01-12 23:46:53 -06:00
2a719e0865 [Perf] Optimize requests abort (#32211) Wentao Ye 2026-01-12 23:11:37 -05:00
f243abc92d Fix various typos found in docs (#32212) Andrew Bennett 2026-01-12 21:41:47 -06:00
60b77e1463 [Frontend] Add reasoning_effort to OpenAIServing._preprocess_chat() (#31956) Sanghoon Yoon 2026-01-13 12:21:49 +09:00
15b33ff064 [Misc] improve warning/assert messages (#32226) cjackal 2026-01-13 12:11:23 +09:00
c6bb5b5603 [BugFix] Fix engine crash caused by chat tools + response_format (#32127) Nick Hill 2026-01-12 18:33:14 -08:00
9273a427b5 [Misc] Allow enabling NCCL for DP sync when async scheduling (#32197) Nick Hill 2026-01-12 18:03:08 -08:00
78d13ea9de [Model] Handle trust_remote_code for transformers backend (#32194) Cyrus Leung 2026-01-13 09:30:12 +08:00
a307ac0734 [responsesAPI] add unit test for optional function tool call id (#32036) Andrew Xia 2026-01-12 19:14:54 -05:00
a28d9f4470 [ROCm][CI] Handle pytest status code 5 when a shard isn't allocated any tests (#32040) Divakar Verma 2026-01-12 16:35:49 -06:00
629584bfc9 [Kernel][MoE] fix computation order of MoE weight multiplication and improve flow (#31962) xuebwang-amd 2026-01-13 06:17:30 +08:00
0a7dd23754 [Model Runner V2] Add support for M-RoPE (#32143) Woosuk Kwon 2026-01-12 13:37:43 -08:00
dec28688c5 [Model Runner V2] Minor refactor for logit_bias (#32209) Woosuk Kwon 2026-01-12 13:08:30 -08:00
9f430c94bd [BUGFIX] Add missed remaping of the names of fp8 kv-scale (#32199) Vadim Gimpelson 2026-01-13 00:42:06 +04:00
f8bd8394e3 [NIXL][Bugfix] Failure logging overhaul + early metadata free on failure (#32031) Nicolò Lucchesi 2026-01-12 21:38:49 +01:00
ca81811bfe [Model Runner V2] Support logit_bias, allowed_token_ids, min_tokens (#32163) Woosuk Kwon 2026-01-12 11:31:10 -08:00
ad8818bb5e [Misc][BE] Type coverage for vllm/compilation [3/3] (#31748) Lucas Kabela 2026-01-12 11:24:38 -08:00
08e8e99ce7 [Misc] Change log level for batch queue log (#32192) Nicolò Lucchesi 2026-01-12 19:59:31 +01:00
2be765b68a [BugFix] scheduler: Fix ordering preserving of skipped requests (#32173) Or Ozeri 2026-01-12 20:39:38 +02:00
16abe6b85a [Misc] Set default torch num threads for input processing (#31879) Roger Wang 2026-01-12 10:28:16 -08:00
1eb61ab34b [Refactor] EPLB rebalance algo to NumPy (#30697) Ilya Markov 2026-01-12 19:13:23 +01:00
3d962d72ab [BugFix] fix FusedMoE.make_expert_params_mapping in EXAONE-MoE (#32196) Kyungmin Lee 2026-01-13 03:00:45 +09:00
20228cb851 [3/N][Attention] Move AttentionMetadata-related code from utils.py to backend.py (#32054) Matthew Bonanni 2026-01-12 12:13:56 -05:00
7c0d3c5152 [Benchmark] Share data between SLA runs (#32184) Cyrus Leung 2026-01-13 01:12:22 +08:00
5b68107411 [Misc][PD] Fix get_attn_backend usage in transfer connectors (#31988) Nicolò Lucchesi 2026-01-12 18:10:05 +01:00
8fb2c135be [Bugfix] Fix stale SSM state for new Mamba requests scheduled as decode (#32118) Asaf Joseph Gardin 2026-01-12 19:02:38 +02:00
8863c2b25c [Model] Standardize pooling heads (#32148) Cyrus Leung 2026-01-13 01:01:49 +08:00
3f72639d36 [FIX] Add NO_MUL activation support for modular kernel path (#31528) danielafrimi 2026-01-12 18:55:49 +02:00
6bc9c8473e [MODEL] New model support for kakaocorp/kanana-1.5-v-3b-instruct (#29384) Jaehyun An 2026-01-13 01:39:02 +09:00
63ed2409e8 Add K-EXAONE-236B-A23B (#31621) Kyungmin Lee 2026-01-13 01:30:50 +09:00
95e53d907c doc: Update model references in supported_models.md (#32188) Andy Zhang 2026-01-13 00:15:28 +08:00
0346396e94 [ROCm] [Bugfix] Fix order of mori build in Dockerfile.rocm_base (#32179) TJian 2026-01-12 23:33:21 +08:00
e68b0dad8b doc: Update model name for Qwen3-Coder in documentation (#32185) Andy Zhang 2026-01-12 23:10:50 +08:00
9cddbdba6d OffloadingConnector: Add cpu_bytes_to_use configuration (#24498) Or Ozeri 2026-01-12 17:00:43 +02:00
49e6b86c91 [Feature] Support recording expert indices for rollout router replay (#28284) Hongxin Xu 2026-01-12 22:23:04 +08:00
0565f1fdec [P/D] Refactor mooncake connector sender thread using async coroutines (#31573) dtc 2026-01-12 20:35:35 +08:00
9dbe1fe960 [Bugfix] Fix missing scale passing for encoder Triton Attention implementation (#32149) Isotr0py 2026-01-12 19:13:41 +08:00
a5f89ae296 [Doc] Add documentation for offline API docs feature (#32134) RickyChen / 陳昭儒 2026-01-12 18:33:48 +08:00
05e8981234 [Doc] Improve LoRA docs (#32159) Jee Jee Li 2026-01-12 18:19:17 +08:00
899541bdb1 [doc] fix broken links (#32158) XlKsyt 2026-01-12 18:18:38 +08:00
d7b2e57097 [Frontend] Fix Flaky MCP Streaming Test (#32153) daniel-salib 2026-01-12 02:03:32 -08:00
5e034f2e3d [cpu][bench] Add Fused MoE Micro Benchmark for CPU Backend (#32092) Andika Rachman 2026-01-12 17:03:28 +07:00
22970c1626 [Misc] Disable default --ready-check-timeout-sec extra call in vllm bench (#30975) Nicolò Lucchesi 2026-01-12 10:58:21 +01:00
600aaab8d6 [Model] Remove incorrect SupportsPP from MTP models (#32150) Cyrus Leung 2026-01-12 17:19:30 +08:00
60446cd684 [Model] Improve multimodal pooling examples (#32085) wang.yuqi 2026-01-12 15:54:09 +08:00
9101dc756c [Model] Avoid hardcoding pooling type (#32119) Cyrus Leung 2026-01-12 13:28:12 +08:00
025a32f9ed [Model Runner V2] Remove async barrier (#32083) Woosuk Kwon 2026-01-11 20:24:30 -08:00
19504ac07f [Model Runner V2] Skip building deprecated fields in attn metadata (#32132) Woosuk Kwon 2026-01-11 14:31:04 -08:00
3df619ac94 [CI] fix test_concat_and_cache_mla_rope_fused (#32117) Jiangyun Zhu 2026-01-11 23:11:11 +08:00
d74132ca3b fix offline inference chat response prompt (#32088) Ning Xie 2026-01-11 22:01:18 +08:00
a34abc49b7 [FixBug] Improve exception string in tensorizer.py (#31680) maang 2026-01-11 21:01:53 +08:00
d70249e2e9 [Misc] fix this log format not space (#32112) rongfu.leng 2026-01-11 21:01:16 +08:00
a374532111 [CI/Build] Separate out flaky responses API tests (#32110) Cyrus Leung 2026-01-11 21:01:12 +08:00
cee7436a26 [Misc] Make scipy as optional audio/benchmark dependency (#32096) Isotr0py 2026-01-11 16:18:57 +08:00
4c16ba617f [KVConnector] OffloadingConnector: Fix bug in handling of preemptions (#29870) Or Ozeri 2026-01-11 10:05:36 +02:00
bde57ab2ed [Hardware][AMD][CI][Bugfix] Fix AMD Quantization test group (#31713) Matt 2026-01-11 01:19:46 -06:00
9103ed1696 [CPU][BugFix] Disable AOT Compile for CPU (#32037) Fadi Arafeh 2026-01-11 07:15:49 +00:00
46eb30f519 make assume_32_bit_indexing configurable (#32044) Laith Sakka 2026-01-10 23:15:46 -08:00
0dd63639be [MTP][GLM][Bugfix] Fixed .weight_scale loading logic that dropped MTP prediction accuracy with fp8+mtp (#32101) Andy Liu 2026-01-10 23:14:54 -08:00
ef96fa3f1f [Benchmark][2/2] Use spline interpolation to tune SLA variables (#32095) Cyrus Leung 2026-01-11 12:27:27 +08:00
2a4dbe24ea [BugFix] Wait for compute before offloading KV to CPU (#31341) Or Ozeri 2026-01-11 00:25:08 +02:00
8020a60402 [Bugfix] Fix Qwen3-VL-Reranker model loading for sequence classification (#32089) RickyChen / 陳昭儒 2026-01-11 04:40:09 +08:00
e15a5ff07b [MISC] Add strict contiguity check for FlashInfer attention tensors (#32008) Vadim Gimpelson 2026-01-11 00:40:05 +04:00
6ea001cfb7 [Bugfix][Quantization] Ensure input contiguity in per_token_quant_int8 (#31637) Vensen 2026-01-11 04:40:02 +08:00
