Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

d6704dd099 Fix MiniMax-M2 rmsnorm precision and remove useless code (#27627) Roger Young 2025-10-29 21:01:05 +08:00
ecca3fee76 [Frontend] Add vllm bench sweep to CLI (#27639) Cyrus Leung 2025-10-29 20:59:48 +08:00
9a0d2f0d92 [CI/Build] Skip cpu offloading test on AMD (#27690) Zhewen Li 2025-10-29 05:55:51 -07:00
ad3ec89532 [VLM] Add Qwen3-VL generation test (#25185) Isotr0py 2025-10-29 20:19:37 +08:00
3481e40743 [chore] Remove models weight on S3 logic (#27725) Kevin H. Luu 2025-10-29 03:29:49 -07:00
5e72216d17 Feature/video support in random mm dataset (#25963) Eugene Khvedchenya 2025-10-29 12:24:52 +02:00
1a33aacf82 [Misc] Raise error for missing video metadata in MultiModalDataParser (#27664) Isotr0py 2025-10-29 18:06:42 +08:00
7ba6aa8f56 [Fix] import get_kv_cache_torch_dtype error in LMCacheConnector integration (#27670) Yue Zhang 2025-10-29 18:03:54 +08:00
ab2eb27b74 [Frontend] [gpt-oss] Mcp type bug (#27689) Alec S 2025-10-29 06:01:32 -04:00
3c7fefdeba [Frontend] [gpt-oss] Tool json call parsing error retry (#27675) Alec S 2025-10-29 05:42:44 -04:00
1891cf605a [Bugfix] Fix modular kernel tests (#27707) bnellnm 2025-10-29 04:14:33 -04:00
8df98c2161 [perf] Enable concurrent execution of "shared_experts" and "selected_experts" in qwen3-next (#27578) Jiangyun Zhu 2025-10-29 16:12:54 +08:00
4fb8771cc0 [CI/Build] Move pre-commit only scripts to tools/pre_commit (#27657) Cyrus Leung 2025-10-29 16:04:33 +08:00
413ef7a3b4 [Speculators] Move tests + fix integration (#27308) Dipika Sikka 2025-10-29 03:54:21 -04:00
8b62495076 [Bugfix] Fix non-contiguous tensor error in rocm_unquantized_gemm_impl (#27605) Zhewen Li 2025-10-29 00:00:15 -07:00
83fd49b1fc [CI/Build][Bugfix]Fix Quantized Models Test on AMD (#27712) Zhewen Li 2025-10-28 23:27:30 -07:00
a4a4f0f617 [KV Connector] Update lmcache connector with latest compatibility (#27681) Shaoting 2025-10-28 22:38:37 -07:00
0d8161b075 [Model] Fix Qwen3VL and Qwen3Omni after torch.compile changes (#27705) Lukas Geiger 2025-10-29 05:28:20 +00:00
d2c33c397a [NIXL][XPU] update name of nixl wheel (#27631) liuzhenwei 2025-10-29 12:43:29 +08:00
f6d5f5888c [Build] Revert triton_kernels requirements (#27659) Varun Sundar Rabindranath 2025-10-29 00:07:09 -04:00
9007bf57e6 Revert "Install pre-built xformers-0.0.32.post2 built with pt-2.9.0" (#27714) Simon Mo 2025-10-28 20:58:01 -07:00
f257544709 Install pre-built xformers-0.0.32.post2 built with pt-2.9.0 (#27598) (tag: v0.11.1rc4) Huy Do 2025-10-28 19:39:15 -07:00
0b51c9bd8b [Core] Early return in SlidingWindowManager.remove_skipped_blocks (#27673) Jialin Ouyang 2025-10-28 18:32:33 -07:00
d3ab240f39 [Bug] Fix deepep low latency use nvlink by default (#27677) Wentao Ye 2025-10-28 19:53:12 -04:00
94666612a9 [Misc][qwen2_5_vl][torch.compile] Enable supports_torch_compile on generic nn.Module and demonstrate speedup on Qwen Vision model (#23207) Lucas Kabela 2025-10-28 15:36:43 -07:00
4fe5895361 [AsyncScheduling] Make async overlap work with logprobs (#27615) Nick Hill 2025-10-28 15:35:54 -07:00
111faf1118 [Core] Scheduler: Publish connector events after output (#25875) Or Ozeri 2025-10-28 23:01:33 +02:00
6afc28a9ba [Test] Batch Invariant: Unit test using parameterized backend (#27478) Wentao Ye 2025-10-28 16:51:35 -04:00
141e6a0505 [Misc] Make reorder batch also separate extends (#27367) Lucas Wilkinson 2025-10-29 01:55:10 +08:00
130aa8cbcf Add load pattern configuration guide to benchmarks (#26886) Matvei Pashkovskii 2025-10-28 19:49:15 +02:00
e3d8186666 [compile] Add fallback path to AOT compile when serialization fails. (#27350) Zhengxu Chen 2025-10-28 12:54:26 -04:00
f5710ef02a [Misc] Make LayerBlockType a Literal instead of Enum (#27658) Cyrus Leung 2025-10-29 00:23:35 +08:00
a8c02fb5bf [Bugfix][CI] Fix v1 attention backend tests and add CI coverage (#26597) Mohammad Miadh Angkad 2025-10-28 23:42:05 +08:00
02af36df36 [Bugfix] Fix allocation & free logic of SingleWriterShmRingBuffer (#27117) Kero Liang 2025-10-28 23:01:24 +08:00
e88bdd60d9 [FLA] Introduce Kimi Delta Attention(KDA) to VLLM (#27654) Zhiyuan Li 2025-10-28 22:56:28 +08:00
05e034f085 [nit]: Fix import for the lmcache integration (#27600) Samuel Shen 2025-10-28 07:40:55 -07:00
936643a868 [BugFix] Also consider RAY_EXPERIMENTAL_NOSET_* when storing compilation cache (#27294) ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 2025-10-28 16:22:28 +02:00
b186149e8e [Bugfix][Frontend] validate arg priority in frontend LLM class before add request (#27596) Junpu Fan 2025-10-28 07:02:43 -07:00
2abbd351ef [Core] Enable async scheduling for external_launcher mode (#27394) 22quinn 2025-10-28 06:52:47 -07:00
446912d1cb fix: allow HuggingFace standard chat template params via **kwargs (#27622) wangln19 2025-10-28 21:12:34 +08:00
a00d6254e9 [compile] Disable dynamo guards check for AOT compilation. (#27288) Zhengxu Chen 2025-10-28 08:58:12 -04:00
05181cc57f [Hybrid] Add mamba_block_size to Engine Args (#27289) Asaf Joseph Gardin 2025-10-28 14:54:24 +02:00
259504e147 [compile] Add enable_prompt_embeds to compile hash. (#27285) Zhengxu Chen 2025-10-28 08:46:03 -04:00
0484b64248 [Bug] Fix shape issue for eplb expert weights (#27589) Wentao Ye 2025-10-28 08:44:05 -04:00
f58d9b6404 [Misc] Separate out utils.counter and move utils.Device to engine (#27588) Cyrus Leung 2025-10-28 20:20:46 +08:00
44b5ce956d [Bugfix] In LongRoPE, decide short vs long based on max_model_len (#27431) Matthew Bonanni 2025-10-28 08:00:56 -04:00
7a865f2325 [V0 Deprecation] Remove vestigial V0 logits_processors.py file (#27601) Nick Hill 2025-10-28 04:17:45 -07:00
2fa90bda27 Fix a robust parsing issue in KimiK2ToolParser that causes IndexError (#27565) wangln19 2025-10-28 19:11:50 +08:00
0291fbf65c [CI/Build] Fix amd model executor test (#27612) Zhewen Li 2025-10-28 01:58:11 -07:00
b46e4a06f1 [Core][Bookkeeping Optimization] Update against numpy view of is_token_ids tensor (#27618) Jialin Ouyang 2025-10-28 01:13:10 -07:00
d34f5fe939 [Bugfix][CPU] Fallback oneDNN linear to torch linear to fix half gemm support on legecy platforms (#27526) Li, Jiang 2025-10-28 14:25:44 +08:00
bdb01a38fe [Hardware][AMD][Model] Triton MoE tuning configs for GLM-4.6 for MI300X (#27323) Eric Yue 2025-10-28 13:58:06 +08:00
5b3c35a68e [ROCm] [Doc] Update ROCm installation docs (#27327) vllmellm 2025-10-28 13:00:50 +08:00
61fbfe5274 [Bugfix] fixed inconsistent finish_reason handling between V0 and V1 engines (#27555) Chauncey 2025-10-28 10:18:08 +08:00
255e34ca50 [Stability fix] turn off HMA allocator when connector is set (#27592) Kuntai Du 2025-10-27 18:32:23 -07:00
a8d2e326ec [Bugfix][CI] Fix config resolving logic with remote models (#27610) Roger Wang 2025-10-27 17:48:32 -07:00
53a56e658b [gpt-oss][2/N] Support input_messages in responsesRequest (#26962) Andrew Xia 2025-10-27 16:15:49 -07:00
69f064062b Code quality improvements: version update, type annotation enhancement, and enum usage simplification (#27581) usberkeley 2025-10-28 01:50:22 +08:00
921e78f4bb [ROCm] Update AITER branch for ROCm base docker (#27586) Micah Williamson 2025-10-27 12:22:33 -05:00
6ebffafbb6 [Misc] Clean up more utils (#27567) Cyrus Leung 2025-10-27 23:30:38 +08:00
3b96f85c36 [Chore]: Stream tokens vs characters in tool call parser tests (#26513) Ben Browning 2025-10-27 11:06:25 -04:00
23ad820553 fixing mm placeholder replacement issue with gemma3 (#27538) tingtinggithub 2025-10-27 07:34:01 -07:00
5d3be3ba4c [Bugfix][LoRA][FusedMoE] Select MxFP4 Backend based on LoRA Enablement (#27487) Varun Sundar Rabindranath 2025-10-27 10:32:50 -04:00
4f882be4a0 [Model] Siglip2 Model Support (#27566) Yu Jiaqi 2025-10-27 21:57:37 +08:00
9273754222 [Hybrid] Added supports_mamba_prefix_caching Protocol (#27339) Asaf Joseph Gardin 2025-10-27 15:05:20 +02:00
f4e8154076 [Kernel] Enable moe LoRA kernel support FP16 (#27468) Jee Jee Li 2025-10-27 19:48:37 +08:00
a663f6ae64 [cpu][perf] Fix low CPU utilization with VLLM_CPU_OMP_THREADS_BIND on AArch64 (#27415) Fadi Arafeh 2025-10-27 11:14:55 +00:00
a4fc21895e [Bugfix] Fixed when return_token_ids=False, the first event still contains prompt_token_ids. (#27561) Chauncey 2025-10-27 19:06:43 +08:00
a3e8611da5 [Bugfix] Limit the default value of max_model_len when it is not specified by users (#27556) Shanshan Shen 2025-10-27 18:16:20 +08:00
7c2bdb83dc [Misc] Clean up utils (#27552) Cyrus Leung 2025-10-27 17:05:40 +08:00
9932ed6a83 [Kernel] Adding split_K implementation for fused_moe_lora (#27291) Danielle Robinson 2025-10-27 02:05:24 -07:00
2d631d28c6 [Doc] Slight improvement to M2 and beyond (#27554) Jee Jee Li 2025-10-27 17:02:10 +08:00
b368382964 [Model] Deprecate merge_by_field_config=False (#27551) Cyrus Leung 2025-10-27 16:43:00 +08:00
a806c14cc7 [Performance][LoRA] add context varying params to 'do_not_specialize' in fused moe lora (#27445) gnovack 2025-10-26 23:31:55 -07:00
181bf5bbde [Docs] reemove the incorrect enable_reasoning parameter (#27550) yyzxw 2025-10-27 14:17:19 +08:00
cbd5e07a51 [Model] Use merge_by_field_config for MM models (Qwen series) (#27546) Cyrus Leung 2025-10-27 13:38:05 +08:00
63b22e0dbb [Model][Bugfix] fix ernie45 moe 300B SharedFusedMoE output tuple (#27316) CSWYF3634076 2025-10-27 11:53:31 +08:00
5980604c44 Fix MiniMax-M2 copyright (#27537) Roger Young 2025-10-27 11:29:51 +08:00
361a7463d3 fix m2 test (#27536) youkaichao 2025-10-27 01:04:36 +08:00
720af6ab79 [Model][MiniMax-M2] Support MiniMax-M2 Model (#27535) Roger Young 2025-10-27 00:59:11 +08:00
55cba4a05c [CI/Build] Update causal-conv1d installation (#27529) Cyrus Leung 2025-10-26 22:14:22 +08:00
c7abff2990 Revert "[CI/Build] Use CPU for mm processing test on CI (#27522)" (#27531) Cyrus Leung 2025-10-26 19:44:27 +08:00
71b1c8b667 [Chore]:Extract math and argparse utilities to separate modules (#27188) Yeshwanth N 2025-10-26 16:33:32 +05:30
8fb7b2fab9 [Doc] Fix links to GH projects (#27530) Cyrus Leung 2025-10-26 17:55:51 +08:00
be7b55a83d [Doc] Remove Molmo warning (#27527) Cyrus Leung 2025-10-26 16:22:52 +08:00
315b860abe [bugfix]fix empty prompts for async-engine mode in benchmark throughput (#27494) Lucia Fang 2025-10-26 01:16:35 -07:00
87c41c26ad [Bugfix] Fix processor initialization for model from modelscope instead of HF (#27461) rongfu.leng 2025-10-26 15:44:31 +08:00
65d2cf9511 [BUGFIX][ROCM] ViT FlashAttention on ROCm (no GFX9) and contiguous on qwen3vl ROCm TORCH_SDPA (#27190) JartX 2025-10-26 08:08:52 +01:00
d63cd9ff10 [CI/Build] Use CPU for mm processing test on CI (#27522) Isotr0py 2025-10-26 13:09:18 +08:00
66a168a197 [CI/Build] Refactor processing tests (#27470) Cyrus Leung 2025-10-26 00:14:30 +08:00
a99564ac5b [Attention] Add missing kv cache scale setup (#27490) Matthew Bonanni 2025-10-25 03:12:49 -04:00
4c5f632165 [Misc] Simplify max tokens in multimodal registry (#27500) Cyrus Leung 2025-10-25 14:56:01 +08:00
b853540388 [Core][Hybrid allocator + kv connector 1/n] Enable hybrid allocator + KV cache connector (#25712) Kuntai Du 2025-10-24 23:34:18 -07:00
56ed7609a9 Revert "[Misc] Remove use of CUDA_VISIBLE_DEVICES for device selectio… (#27502) Zhuohan Li 2025-10-24 22:31:43 -07:00
29c9cb8007 [CI] Add tests for cudagraph (#27391) Jiangyun Zhu 2025-10-25 10:37:33 +08:00
83f478bb19 [KVConnector] Migrate the LMCache integration code to be vLLM native (#25542) (tag: v0.11.1rc3) Yihua Cheng 2025-10-24 17:23:53 -07:00
269c4db0a4 [Misc][DP] Guard mxfp4 implementation selection (#27484) Varun Sundar Rabindranath 2025-10-24 19:29:24 -04:00
52efc34ebf [Log] Optimize Startup Log (#26740) Wentao Ye 2025-10-24 19:27:04 -04:00
d95d0f4b98 [Distributed] Basic set of configuration for large EP deployment on GB200 (#27328) Pengchao Wang 2025-10-24 14:16:44 -07:00
0402428200 [Perf][Async Scheduling] Remove CPU->GPU sync in dummy_run (#27455) Lehua Ding 2025-10-25 04:45:36 +08:00
