Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

Select branches

cmm

main

ci/build/22474

submission

v0.1.0

v0.1.1

v0.1.2

v0.1.3

v0.1.4

v0.1.5

v0.1.6

v0.1.7

v0.10.0

v0.10.0rc1

v0.10.0rc2

v0.10.1

v0.10.1.1

v0.10.1rc1

v0.10.2

v0.10.2rc1

v0.10.2rc2

v0.10.2rc3

v0.11.0

v0.11.0rc1

v0.11.0rc2

v0.11.0rc3

v0.11.0rc4

v0.11.0rc5

v0.11.0rc6

v0.11.1

v0.11.1rc0

v0.11.1rc1

v0.11.1rc2

v0.11.1rc3

v0.11.1rc4

v0.11.1rc5

v0.11.1rc6

v0.11.1rc7

v0.11.2

v0.12.0

v0.13.0

v0.13.0rc1

v0.13.0rc2

v0.13.0rc3

v0.13.0rc4

v0.14.0

v0.14.0rc0

v0.14.0rc1

v0.14.0rc2

v0.14.1

v0.15.0

v0.15.0rc0

v0.15.0rc1

v0.15.0rc2

v0.15.0rc3

v0.15.1

v0.15.1rc0

v0.15.1rc1

v0.15.2rc0

v0.16.0

v0.16.0rc0

v0.16.0rc1

v0.16.0rc2

v0.16.0rc3

v0.16.1rc0

v0.17.0

v0.17.0rc0

v0.17.0rc1

v0.17.1

v0.17.1rc0

v0.17.2rc0

v0.18.0

v0.18.0rc0

v0.18.0rc1

v0.18.0rc2

v0.18.1

v0.18.1rc0

v0.18.2rc0

v0.19.0

v0.19.0rc0

v0.19.0rc1

v0.19.1rc0

v0.2.0

v0.2.1

v0.2.1.post1

v0.2.2

v0.2.3

v0.2.4

v0.2.5

v0.2.6

v0.2.7

v0.3.0

v0.3.1

v0.3.2

v0.3.3

v0.4.0

v0.4.0.post1

v0.4.1

v0.4.2

v0.4.3

v0.5.0

v0.5.0.post1

v0.5.1

v0.5.2

v0.5.3

v0.5.3.post1

v0.5.4

v0.5.5

v0.6.0

v0.6.1

v0.6.1.post1

v0.6.1.post2

v0.6.2

v0.6.3

v0.6.3.post1

v0.6.4

v0.6.4.post1

v0.6.5

v0.6.6

v0.6.6.post1

v0.7.0

v0.7.1

v0.7.2

v0.7.3

v0.8.0

v0.8.0rc1

v0.8.0rc2

v0.8.1

v0.8.2

v0.8.3

v0.8.3rc1

v0.8.4

v0.8.5

v0.8.5.post1

v0.9.0

v0.9.0.1

v0.9.1

v0.9.1rc1

v0.9.1rc2

v0.9.2

v0.9.2rc1

v0.9.2rc2

09ad3b76b3 [Bug] Fix attention_backend arg string parsing (#30534) Michael Goin 2025-12-12 10:40:50 -05:00
dc13c99eed fix(gguf): Disable bfloat16 for GGUF on blackwell device (#30408) Christina Norman 2025-12-12 09:10:12 -06:00
3e34adcdfb [DeepSeek V3.2] Proper drop_thinking logic (#30490) Vladislav Nosivskoy 2025-12-12 18:01:06 +03:00
3e41992fec [Attention] Use sparse prefill kernel for fp8 kv-cache in DeepSeek-v3.2 (#27532) Lucas Wilkinson 2025-12-12 08:57:47 -05:00
91401c7a26 [Bugfix] Fix CMakeLists Environment Variable (#21804) 吴坎 2025-12-12 18:54:52 +08:00
f90319d5d1 [Bugfix] Schedule failure due to wrong get_image_size_with_most_features (#29692) Jaehwang Jung 2025-12-12 19:27:20 +09:00
302b2c1eb9 [CI/Build][AMD] Fix ref_dynamic_per_token_quant reference implementation on ROCm. (#30291) rasmith 2025-12-12 03:30:23 -06:00
8f8fda261a [Bugfix] Multiple fixes for gpt-oss Chat Completion prompting (#28729) Ben Browning 2025-12-11 23:59:53 -05:00
fe1787107e [compile] Parse compile range cache keys as Range during cache loading. (#30516) Zhengxu Chen 2025-12-11 23:30:51 -05:00
783644e4ac [ROCm][CI] Skip multi-GPU speculative decoding tests when insufficient GPUs available (#30527) Andreas Karatzas 2025-12-11 21:54:56 -06:00
197473c4e7 [CI/Build] Use spawn subprocess for ROCm (#30272) Ryan Rock 2025-12-11 21:33:17 -06:00
947dfda9c2 [LMCache] Relax lmcache version requirement (#30425) Nick Hill 2025-12-11 19:18:47 -08:00
9f2fc16a69 [Bugfix][Model] Fix Afmoe rope_parameters issue (#30505) Michael Goin 2025-12-11 21:53:57 -05:00
6a6fc41c79 gptq marlin quantization support for fused moe with lora (#30254) Bhanu Prakash Voutharoja 2025-12-12 13:27:22 +11:00
f355ad5412 [CPU][FIX] Fix build failures on Arm CPUs with torch nightly (#30481) Fadi Arafeh 2025-12-12 02:09:25 +00:00
042da73244 [Core] Refactor _build_attention_metadata (#29628) Lucas Wilkinson 2025-12-11 20:54:12 -05:00
b5945d49c0 [ROCm][CI] Use mi325_4 agent pool for V1 e2e tests (#30526) Andreas Karatzas 2025-12-11 19:37:24 -06:00
ba80926681 [CI/Build][AMD] Skip test_cutlass_w4a8_moe tests on ROCm since they require cutlass_pack_scale_fp8 (#30508) rasmith 2025-12-11 19:02:19 -06:00
0ab23c2b2b [fix] fix SM check for Flashinfer TRTLLM MOE (#30314) jiahanc 2025-12-11 17:00:58 -08:00
48661d275f [CI/Build][AMD] Skip tests in test_fusions_e2e and test_dbo_dp_ep_gsm8k that require non-existing imports for ROCm (#30417) rasmith 2025-12-11 18:24:20 -06:00
d527cf0b3d [FIX] Patch run-cluster.sh (fix for #28328) (#30002) Ev Lacey 2025-12-11 15:36:31 -08:00
2cc5affc38 [ROCM][CI] Fix AMD Examples Test Group (#30276) Concurrensee 2025-12-11 17:03:54 -06:00
a00d88973d [EPLB] Support EPLB w/ NVFP4 (#29804) Andrew Briand 2025-12-11 16:59:40 -06:00
61249b177d [Refactor] Remove useless syncwarp (#30510) Wentao Ye 2025-12-11 17:43:41 -05:00
c817b14151 [Perf] Optimize deepgemm experts initialization, 3.9% TTFT improvement (#30494) Wentao Ye 2025-12-11 17:28:34 -05:00
3efdc3feae [Docs][CPU backend] Add pre-built Arm CPU Docker images (#30491) ioana ghiban 2025-12-11 23:03:29 +01:00
0efd9f867c [Core] Whisper Enable Encoder Batching (#29421) Nicolò Lucchesi 2025-12-11 22:06:51 +01:00
90d6cf921f [BugFix][MM]support VLLM_RANDOMIZE_DP_DUMMY_INPUTS (#30472) Xingyu Liu 2025-12-11 13:00:15 -08:00
cf3eacfe58 Standardise get_rope to use rope_parameters["partial_rotary_factor"], not rotary_dim (#30389) Harry Mellor 2025-12-11 20:45:23 +00:00
92fea56fd1 [compile] Stop one-off setting enable_aot_compile and use context manager instead. (#30503) Zhengxu Chen 2025-12-11 15:28:03 -05:00
e458270a95 [Misc] Add mcp to requirements (#30474) Ye (Charlotte) Qi 2025-12-11 12:06:09 -08:00
72aaac5b66 [ROCm][Bugfix] Add MLACommonMetadata to allowed attention types for speculative decoding (#30430) Andreas Karatzas 2025-12-11 13:25:01 -06:00
0e71eaa644 [Feature] AWQ marlin quantization support for fused moe with lora (#30442) 汪志鹏 2025-12-12 02:03:32 +08:00
8781cd6b88 Add Eagle and Eagle3 support to Transformers modeling backend (#30340) Harry Mellor 2025-12-11 17:02:10 +00:00
aa3c250c48 [IMPROVEMENT] Change MistralReasoningParser behavior (#30391) Julien Denize 2025-12-11 17:53:26 +01:00
305b168a9f [CI] refine more logic when generating and using nightly wheels & indices, add cuda130 build for aarch64, specify correct manylinux version (#30341) Shengqi Chen 2025-12-12 00:42:30 +08:00
93db3256a4 Give pooling examples better names (#30488) Harry Mellor 2025-12-11 16:22:58 +00:00
17cb540248 [Docs][CPU Backend] Add nightly and per revision pre-built Arm CPU wheels (#30402) ioana ghiban 2025-12-11 16:57:10 +01:00
97a042f3bc Make the httpx logger less annoying when Transformers v5 is installed (#30480) Harry Mellor 2025-12-11 15:44:56 +00:00
3a3b06ee70 [Misc] Improve error message for is_multimodal (#30483) Cyrus Leung 2025-12-11 22:39:51 +08:00
f4417f8449 [KVConnector] Add KV events to KV Connectors (#28309) Martin Hickey 2025-12-11 14:30:29 +00:00
a11f4a81e0 [Misc][PCP&DCP] relocate PCP feature check (#30050) Qiu 2025-12-11 19:36:18 +08:00
853611bb18 Fix typo of endpoint name in CLI args docs (#30473) Kenichi Maehashi 2025-12-11 20:07:56 +09:00
d917747c95 [Bugfix] Fix task still being passed in tests/benchmarks (#30476) Cyrus Leung 2025-12-11 18:33:55 +08:00
a5f9fb5960 [Deprecation] Deprecate --convert reward, use --convert embed instead. (#30463) wang.yuqi 2025-12-11 18:18:25 +08:00
4515eb1a0b [Fix] Update lazy loading of video loader backend (#30444) jeremyteboul 2025-12-11 02:14:57 -08:00
13d63b65e0 [Deprecation] Remove missed fallback for embed_input_ids (#30469) Cyrus Leung 2025-12-11 18:06:36 +08:00
b4e8b91278 [Fix]fix import error from lmcache (#30376) wz1qqx 2025-12-11 17:23:52 +08:00
6299628d32 [bugfix] fix MiniMaxM2ReasoningParser streaming output not separating reasoning_content. (#29882) Rei. 2025-12-11 17:05:08 +08:00
fba8906930 [perf] Use direct copy (broadcast) instead of cat for k_nope/k_pe in MLA prefill (#29710) Ming Yang 2025-12-11 00:20:45 -08:00
d02d1043de fix: enhance human_readable_int function (#30337) Ning Xie 2025-12-11 15:30:33 +08:00
979f50efd0 [Deprecation] Remove fallbacks for embed_input_ids and embed_multimodal (#30458) Cyrus Leung 2025-12-11 14:58:23 +08:00
36c9ce2554 Ensure minimum frames for GLM 4.6V compatibility (#30285) gh-wf 2025-12-11 00:26:49 -05:00
1a516557e1 [Doc] Add Baidu Kunlun XPU support (#30455) xyDong0223 2025-12-11 12:52:17 +08:00
d6464f2679 [Chore] Fix torch precision warning (#30428) Wentao Ye 2025-12-10 23:05:56 -05:00
7e24e5d4d6 [Deprecation] Remove deprecated task, seed and MM settings (#30397) Cyrus Leung 2025-12-11 11:59:39 +08:00
5a87d8b9b1 [Deprecation] Remove deprecated plugin and compilation fields for v0.13 release (#30396) Cyrus Leung 2025-12-11 11:59:35 +08:00
d1e1fb4363 [Bugfix] Fix grouped_topk pytorch impl when num_experts can't be grouped properly (#29439) Divakar Verma 2025-12-10 21:47:18 -06:00
b51255f369 [ROCm] Fix broken import in platform attention backend dispatching (#30432) Andreas Karatzas 2025-12-10 19:12:58 -06:00
b4054c8ab4 Revert "[CI] Add Async Eplb nightly CI tests (#29385)" (#30431) Sage Moore 2025-12-10 16:48:35 -08:00
25221b44bb Add more docs for regex (#30106) Xu Song 2025-12-11 08:12:21 +08:00
8580919ac3 [Bugfix] fix confusing OOM errors during v1 init (#28051) shivampr 2025-12-10 15:17:41 -08:00
166ac3c94d fix(shm): Add memory barriers for cross-process shared memory visibility (#30407) Christina Norman 2025-12-10 17:01:19 -06:00
b9e0951f96 [docs] Improve wide-EP performance + benchmarking documentation (#27933) Seiji Eicher 2025-12-10 17:15:54 -05:00
fcb894222f [Docs] Update EPLB docs (#30426) Michael Goin 2025-12-10 15:56:51 -05:00
6ccb7baeb1 [LMCache] Fix breakage due to new LMCache version (#30216) Nick Hill 2025-12-10 11:52:01 -08:00
eea41804a4 [bug] Fix "Current vLLM config is not set." warnings when FlashInfer attention is used (#30241) Po-Han Huang (NVIDIA) 2025-12-11 03:18:51 +08:00
9f042ba26b [Perf] Enable environment cache in EngineCore to enable the feature for UniProcExecutor as well (#29289) Jialin Ouyang 2025-12-10 11:13:01 -08:00
e72d65b959 [Deprecation] Remove tokenizer setter (#30400) Cyrus Leung 2025-12-11 03:10:58 +08:00
a9e4106f28 [P/D] KV Load Failure Recovery/Abort Configuration (#26813) Will Eaton 2025-12-10 14:00:52 -05:00
e8e8cd73e5 [Bugfix] Fix HunyuanOCR cross-image contamination in batch processing (#30344) Anker 2025-12-10 19:09:31 +01:00
253305d5b2 [Chore] Delay recent deprecations (#30398) Cyrus Leung 2025-12-11 01:48:38 +08:00
794a7875ee [Misc] Consistent case for vllm bench serve results (#30403) Matthew Bonanni 2025-12-10 12:44:02 -05:00
2dcbac9077 [Docs] Generate full list of metrics in user docs (#30388) Mark McLoughlin 2025-12-10 16:09:34 +00:00
aacf0abf8b [BugFix] Fix AttributeError: 'MergedColumnParallelLinear' object has no attribute 'weight_scale' (#30399) Lucas Wilkinson 2025-12-10 10:59:23 -05:00
c756fb6781 [Core] Whisper enable FULL_DECODE_ONLY CudaGraph (#30072) Nicolò Lucchesi 2025-12-10 15:14:24 +01:00
d017bceb08 [BugFix] Fix minimax m2 model rotary_dim (#30384) Roger Young 2025-12-10 20:58:50 +08:00
cebda2a4af [CPU] Support for Whisper (#30062) Aditya Tewari 2025-12-10 12:58:42 +00:00
53d2420b44 [Bugfix] tpu_model_runner: set vllm config context when calling reset_dynamo_cache() (#30331) Daniele 2025-12-10 13:58:35 +01:00
9db78f34dc [Bugfix] Fix the issue where DeepSeek v3.2 cannot use structured_output (#30371) Chauncey 2025-12-10 16:30:16 +08:00
434ac76a7c [cpu][ci] Add CPU Attention Tests for Neon Backend (#30347) Fadi Arafeh 2025-12-10 05:37:35 +00:00
ed7af3178a [ROCm][CI] Attempt to fix the failures under a subgroup of the e2e test group (#29358) Andreas Karatzas 2025-12-09 23:33:13 -06:00
180345807f [CMake][Build]: Remove unused ACL CMake env variables (#30339) Radu Salavat 2025-12-09 20:27:19 -08:00
d007387aa7 [Bugfix] Cache added_vocab to avoid per-token overhead (#30351) Mingliang Li 2025-12-10 12:05:51 +08:00
3bdd426636 Fix typos in comments across multiple files (#30345) Wilson Wu 2025-12-10 12:05:28 +08:00
06462392e4 [bugfix][quantization] fix quark qwen3 kv_cache quantization (#30308) haoyangli-amd 2025-12-10 11:24:12 +08:00
7d80c73d42 [CI] Reduce Flakiness For test_spec_decode.py::test_suffix_decoding_acceptance (#30367) (tag: v0.13.0rc1) Micah Williamson 2025-12-09 20:35:49 -06:00
b75f826fca [CI/Build][AMD] Skip quantization kernels tests that require CUTLASS or e4m3fn when not supported by platform (#30020) rasmith 2025-12-09 20:28:37 -06:00
c3487aca34 [responsesAPI][6] Fix multi turn MCP tokenization (#30230) Andrew Xia 2025-12-09 18:13:13 -08:00
abe93bce59 [Attention] Make seq_lens_cpu optional in CommonAttentionMetadata to enable true async spec-decode (#29624) Lucas Wilkinson 2025-12-09 20:18:10 -05:00
2e7035dd8c [Bugfix] Fix fp8 DeepGemm compilation issues (#30336) ElizaWszola 2025-12-10 02:17:25 +01:00
4c2e10ea19 [Bugfix] Fix cuda graph sizes when running with speculative decoding (#30330) PatrykSaffer 2025-12-10 01:47:07 +01:00
03b5f940fd [V1][Spec Decode] Optimize Medusa proposer to avoid GPU-CPU sync (#29723) dongbo910220 2025-12-10 08:15:01 +08:00
2e7054da06 Improve wvsplitK tile and balance heuristics. (#29937) Hashem Hashemi 2025-12-09 15:51:32 -08:00
3c680f4a17 [Rocm][torch.compile] Adding layernorm + fp8 block quant and silu + fp8 block quant for Aiter (#25693) Charlie Fu 2025-12-09 16:39:26 -06:00
fccd532587 [Quantization] FP8 Weight Reloading for Quantized RL Rollout (#28480) Kyle Sayers 2025-12-09 16:54:32 -05:00
00e5cbb967 [MoE][Refactor] Remove most arguments to FusedMoEMethodBase.apply (#29066) bnellnm 2025-12-09 16:48:25 -05:00
7618dc973d [CI/Build] Make test_mha_attn.py run on correct platform only and check for flash_attn_varlen_func in layer.py (#29145) rasmith 2025-12-09 14:18:17 -06:00
f8dacc66b6 Bump actions/stale from 10.1.0 to 10.1.1 (#30234) dependabot[bot] 2025-12-09 20:12:14 +00:00
7cab92fd45 Bump actions/checkout from 6.0.0 to 6.0.1 (#30233) dependabot[bot] 2025-12-09 20:03:16 +00:00