Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

Branches and tags: cmm, main, ci/build/22474, submission, v0.1.0, v0.1.1, v0.1.2, v0.1.3, v0.1.4, v0.1.5, v0.1.6, v0.1.7, v0.10.0, v0.10.0rc1, v0.10.0rc2, v0.10.1, v0.10.1.1, v0.10.1rc1, v0.10.2, v0.10.2rc1, v0.10.2rc2, v0.10.2rc3, v0.11.0, v0.11.0rc1, v0.11.0rc2, v0.11.0rc3, v0.11.0rc4, v0.11.0rc5, v0.11.0rc6, v0.11.1, v0.11.1rc0, v0.11.1rc1, v0.11.1rc2, v0.11.1rc3, v0.11.1rc4, v0.11.1rc5, v0.11.1rc6, v0.11.1rc7, v0.11.2, v0.12.0, v0.13.0, v0.13.0rc1, v0.13.0rc2, v0.13.0rc3, v0.13.0rc4, v0.14.0, v0.14.0rc0, v0.14.0rc1, v0.14.0rc2, v0.14.1, v0.15.0, v0.15.0rc0, v0.15.0rc1, v0.15.0rc2, v0.15.0rc3, v0.15.1, v0.15.1rc0, v0.15.1rc1, v0.15.2rc0, v0.16.0, v0.16.0rc0, v0.16.0rc1, v0.16.0rc2, v0.16.0rc3, v0.16.1rc0, v0.17.0, v0.17.0rc0, v0.17.0rc1, v0.17.1, v0.17.1rc0, v0.17.2rc0, v0.18.0, v0.18.0rc0, v0.18.0rc1, v0.18.0rc2, v0.18.1, v0.18.1rc0, v0.18.2rc0, v0.19.0, v0.19.0rc0, v0.19.0rc1, v0.19.1rc0, v0.2.0, v0.2.1, v0.2.1.post1, v0.2.2, v0.2.3, v0.2.4, v0.2.5, v0.2.6, v0.2.7, v0.3.0, v0.3.1, v0.3.2, v0.3.3, v0.4.0, v0.4.0.post1, v0.4.1, v0.4.2, v0.4.3, v0.5.0, v0.5.0.post1, v0.5.1, v0.5.2, v0.5.3, v0.5.3.post1, v0.5.4, v0.5.5, v0.6.0, v0.6.1, v0.6.1.post1, v0.6.1.post2, v0.6.2, v0.6.3, v0.6.3.post1, v0.6.4, v0.6.4.post1, v0.6.5, v0.6.6, v0.6.6.post1, v0.7.0, v0.7.1, v0.7.2, v0.7.3, v0.8.0, v0.8.0rc1, v0.8.0rc2, v0.8.1, v0.8.2, v0.8.3, v0.8.3rc1, v0.8.4, v0.8.5, v0.8.5.post1, v0.9.0, v0.9.0.1, v0.9.1, v0.9.1rc1, v0.9.1rc2, v0.9.2, v0.9.2rc1, v0.9.2rc2

f716a15372 Update KServe guide link in documentation (#29258) Yuan Tang 2025-11-24 09:40:05 -05:00
2601f18a82 [EPLB] Optimize EPLB for Async Rearrange Experts (#22179) WeiQing Chen 2025-11-24 22:08:29 +08:00
4de87866a8 [CPU][IBM Z] Fix BF16 support and vectorize math operations for s390x (#28926) R3hankhan 2025-11-24 17:38:09 +05:30
eca7a8fb59 [Doc]: fix typos in various files (#29230) Didier Durand 2025-11-24 12:10:48 +01:00
8005e606bf [Bugfix][Rocm] Fix shared expert weight loading failure in DeepSeek-MTP (#27563) 杰兮 2025-11-24 18:16:52 +08:00
68dfe28eae [Feature][Benchmark] add --link-vars can filter when serve_param equal bench_param (#28909) rongfu.leng 2025-11-24 18:02:28 +08:00
ed40d85929 [BugFix] Fix R-VL model loading error (#29299) Fanli Lin 2025-11-24 14:48:45 +08:00
0ff70821c9 [Core] Deprecate xformers (#29262) Roger Wang 2025-11-23 20:18:55 -08:00
5253f4276f [ROCm] Support for Whisper v1 with Aiter Unified Attention and Aiter Flash Attention (#28376) tongqiu 2025-11-24 11:26:00 +08:00
30854783ad [Model] Add OpenCUA-7B support (#29068) Zero 2025-11-24 11:27:55 +09:00
1073ba68b0 [LoRA] Optimize 3D MoE logic (#29222) Jee Jee Li 2025-11-24 10:27:23 +08:00
c309bb5245 [Bugfix] Update Gradio OpenAI Chatbot Webserver example to new Gradio message history format (#29249) Josh Moore 2025-11-23 19:47:54 -05:00
3e1ad40655 [Model Runner V2] Add apply_temperature option to gumbel_sample (#29276) Woosuk Kwon 2025-11-23 14:13:00 -08:00
62d54ba46d [Model Runner V2] Optimize CUDA graph capture time (#29275) Woosuk Kwon 2025-11-23 11:15:32 -08:00
b004c00418 [Model Runner V2] Support spec decoding [1/N] (#29274) Woosuk Kwon 2025-11-23 10:09:06 -08:00
7f12c82fa6 [Model Runner V2] Change bookkeeping logic in preparation for spec decoding (#29194) Woosuk Kwon 2025-11-23 09:42:52 -08:00
6fb0215eee [Bugfix] Use lazy string reference for DeepseekV3Config in config registry (#28958) Luke 2025-11-23 06:43:21 -05:00
55c21c8836 [ROCm][CI] Fix "Cannot re-initialize CUDA in forked subprocess" in test_pynccl.py (#29119) Micah Williamson 2025-11-22 23:05:00 -06:00
3999442f1c [CI/Build][AMD] Add check for flash_att_varlen_func to test_tree_attention.py (#29252) rasmith 2025-11-22 22:45:08 -06:00
71362ffab4 [CI/Build][AMD] Skip test_multi_shared_storage_connector_consistency in test_multi_connector.py due to hipErrorLaunchFailure when calling .cpu() (#29253) rasmith 2025-11-22 22:42:49 -06:00
20ee418adc [Model Runner V2] Minor fix for cudagraph_utils (#29256) Woosuk Kwon 2025-11-22 20:12:50 -08:00
389aa1b2eb [Doc] Update more docs with respect to V1 (#29188) Cyrus Leung 2025-11-23 10:58:48 +08:00
3ed767ec06 docs: fixes distributed executor backend config for multi-node vllm (#29173) Michael Act 2025-11-23 09:58:28 +07:00
5f96c00c55 [Fix] Add SM check to flashinfer MOE backend (#29144) jiahanc 2025-11-22 16:39:30 -08:00
4587063267 Patch DeepEP when building docker image with CUDA 13 (#29154) Qidong Su 2025-11-22 18:25:13 -05:00
472fdee974 [Chore] Update batch invariant code owner (#29246) Wentao Ye 2025-11-22 16:50:02 -05:00
df78aeef08 Refactor: Move CUDA graph dispatch logic earlier (#27382) Yizhou 2025-11-23 05:10:31 +08:00
7df331c66b [BugFix] Fix chunked prompt logprobs + preemption (#29071) Nick Hill 2025-11-22 13:07:18 -08:00
eb5352a770 [CI/build] Removes source compilation from runtime image (#26966) Benjamin Bartels 2025-11-22 18:23:09 +00:00
d1cf8214e5 [Bugfix] Use HF config fields as fallback when loading Mistral config (#29239) Cyrus Leung 2025-11-23 02:22:48 +08:00
730bd35378 [perf][cpu] Accelerate paged attention GEMMs (QK, PV) on Arm CPUs with NEON (#29193) Fadi Arafeh 2025-11-22 17:04:36 +00:00
f55c76c2b3 chore: add RTX_PRO_6000 GLM4.6-FP8 kernel tuning (#29240) Federico 2025-11-22 17:42:48 +01:00
d84d8f4429 Fix EVS crash when using video_embeds inputs in Qwen2.5-VL (#29232) ZiTian Zhao 2025-11-22 22:48:59 +08:00
ae66818379 [Misc] Fix pre-commit (#29238) Cyrus Leung 2025-11-22 22:48:01 +08:00
d44a63c6d6 [BugFix] Fix returned logprobs with spec decode + prefill chunking (#29216) Nick Hill 2025-11-22 06:41:25 -08:00
066209a045 [Attention] Refactor FA block_size limitations to hybrid models only (#29084) Nicolò Lucchesi 2025-11-22 15:38:44 +01:00
5f7209a793 [tiny] Remove unsupported TRITON_MLA backend from batch invariance (#28832) Bram Wasti 2025-11-22 08:00:50 -05:00
2d4978a57e fix: clean up function never use in setup.py (#29061) yihong 2025-11-22 21:00:04 +08:00
6965a392a4 Fix: Resolve circular import in model_loader/utils.py (#29189) Nandan Vallamdasu 2025-11-22 18:28:22 +05:30
5a4802588e [Misc] Further clean up chunked prefill and prefix caching init (#29186) Cyrus Leung 2025-11-22 19:34:15 +08:00
8e22da1d7f [CI/Build Don't add FLASHINFER backend in test_cpu_offloading.py (#29229) rasmith 2025-11-22 05:00:54 -06:00
a4fdf2405c [CI/Build] Skip tests that require libcudart in test_lmcache_integration.py (#29228) rasmith 2025-11-22 04:59:39 -06:00
e6309acdba Simplify from_blob usage in get_cuda_view_from_cpu_tensor (#29027) Jane (Yuan) Xu 2025-11-22 05:35:32 -05:00
988ee66b0d Handle triton kernel import exception (#29062) jinghanhu 2025-11-22 18:07:50 +08:00
ea38474ac5 [Frontend][Responses API] Multi-turn (with type: "output_text") support for non-harmony requests (#29175) Mads Kildegård 2025-11-22 10:58:22 +01:00
742e9ff6b3 [responsesAPI] parse reasoning item input (#28248) Andrew Xia 2025-11-21 23:42:11 -08:00
e9056056fb [Model Runner V2] Limit cudagraph size to max decode batch size (#29221) Woosuk Kwon 2025-11-21 20:21:35 -08:00
1489902b53 [LoRA] Cleanup FusedMoEWithLoRA (#29187) Jee Jee Li 2025-11-22 12:01:30 +08:00
933f67ecd8 [Bugfix]Fix a conditional to not check zero value (#28754) Yanan Cao 2025-11-21 19:59:07 -08:00
fd65015a14 [CI/Build] Only use supported types and features on ROCm in MoE kernel tests (#29149) rasmith 2025-11-21 21:34:33 -06:00
77e1c035d0 [chore][LMCache connector] Remove useless logs from lmcache connector (#29069) Yihua Cheng 2025-11-21 19:18:00 -08:00
6f403501a0 [CI/Build][AMD] Enable Entrypoints Integration Test (Pooling) to run without error on ROCm (#29212) rasmith 2025-11-21 20:13:18 -06:00
052950e5b3 Add fused MoE config for H200 E160 N192 fp8 (#29182) FlintyLemming 2025-11-22 09:37:51 +08:00
1ef9c9e294 [CI/Build] Disable test_gptoss_tp.py in 'LoRA TP Test' group for ROCm platform (#29204) qli88 2025-11-21 19:36:19 -06:00
5c8f2adf50 [Bugfix] Fix block size in block_table with PCP (#29094) Jie Luo 2025-11-22 09:34:28 +08:00
ed8e6843cc [CI/Build] Add terratorch for AMD (#29205) Ryan Rock 2025-11-21 19:31:22 -06:00
d045e22dfe [Model][Qwen3VL] Tune Triton w8a8 block fp8 kernel for L40s (#29217) Lukas Geiger 2025-11-22 01:30:55 +00:00
1d34eb11e0 [CI] Bug: Fix triton import issue (#29202) Wentao Ye 2025-11-21 20:14:49 -05:00
9a3101b2ba [Rocm][CI] Fix DeekSeek V2-Lite Accuracy CI (#29135) Charlie Fu 2025-11-21 19:11:02 -06:00
d5dbdbfcb2 [docs] Fix cudagraph mode config (#29170) Angela Yi 2025-11-21 17:10:27 -08:00
30d6466238 [BugFix] Fix Eagle IndexError: list index out of range for even num_speculative_tokens (#29102) Lucas Wilkinson 2025-11-21 19:47:05 -05:00
e9af6ba62a [Model Runner V2] Optimize Gumbel Sampling Kernel (#29210) Woosuk Kwon 2025-11-21 15:52:28 -08:00
c6fa3895e9 [KV Connector] Fix async connector prefix cache metrics (#28585) Mark McLoughlin 2025-11-21 22:45:00 +00:00
3137991f55 [BugFix] EPLB + B200 + DeepGEMM : Handle column-major scales tensor (#29162) Varun Sundar Rabindranath 2025-11-21 17:28:17 -05:00
57430fc95c Default model load/config/tokenizer to mistral format if relevant files exist (#28659) Julien Denize 2025-11-21 22:58:59 +01:00
c68c7b403d [BugFix] Fix missing symbol triggering FA2 fallback on Hopper (#29107) Lucas Wilkinson 2025-11-21 16:58:32 -05:00
53a1ba6ec5 [log] add weights loading time log to sharded_state loader (#28628) Ning Xie 2025-11-22 05:06:09 +08:00
1840c5cb18 [BugFix] Make sure to allocate worst case MoE workspace during profile run in the DP + EP case (#27426) Lucas Wilkinson 2025-11-21 14:41:52 -05:00
1bed891f72 [Chore] Fix pre-commit error after #25266 (#29190) Woosuk Kwon 2025-11-21 10:21:40 -08:00
ceca060501 [Deprecation] Deprecate seed=None (#29185) Cyrus Leung 2025-11-22 02:19:25 +08:00
75648b16dd [ROCm][CI] Fix config/test_config_generation.py (#29142) Charlie Fu 2025-11-21 11:12:16 -06:00
460d02a417 [NIXL] Fix after virtual block_size for host_buffer with heter kv_layout (#29122) Chendi.Xue 2025-11-21 10:55:27 -06:00
b4c8fbaae2 Add TRTLLM MoE NVFP4 kernel to CompressedTensorsW4A4MoeMethod (#28892) Mingyuan Ma 2025-11-21 08:54:11 -08:00
e99e467384 [CI/Build][Kernel][AMD] Move extra dim to after load in _fwd_kv_parallel in lighting_attn.py (#29132) rasmith 2025-11-21 10:53:09 -06:00
a42ab317ac [Log] Optimize startup log (#28948) Wentao Ye 2025-11-21 11:46:20 -05:00
b7f1f490a6 Upstream triton fp4 weight preshuffle (#28888) Aleksandr Malyshev 2025-11-21 08:34:46 -08:00
30b44a1598 GPU Model Runner V2 (#25266) Woosuk Kwon 2025-11-21 08:20:55 -08:00
1f400c58b8 [CI] Add batch invariant test to ci (#27842) Wentao Ye 2025-11-21 11:20:33 -05:00
711241c13c [CI/Build] Fix illegal memory access and unsupported test in kernels/attention/test_cache.py (#29118) rasmith 2025-11-21 09:58:38 -06:00
d7219bcda3 [Misc] Move dynamic seed initialization to EngineArgs (#29165) Cyrus Leung 2025-11-21 23:27:44 +08:00
4050bae417 [Doc] Update plugin doc (#28532) wangxiyuan 2025-11-21 22:57:26 +08:00
f1805db1a6 [Perf] These changes enhance the NUMA functionality of vllm for systems with more than one NUMA nodes per socket (#25559) skaraban3807 2025-11-21 19:43:52 +05:30
434f3d3eb8 Fix mistral config (#29172) Julien Denize 2025-11-21 15:01:20 +01:00
2092ce8c39 Tool Call Parser logs should not contain user input / model output except on DEBUG (#29160) sfbemerk 2025-11-21 13:57:19 +01:00
fc9f821d20 fix cross attention (#28346) who who who 2025-11-21 20:55:43 +08:00
9452863088 Revert "Revert #28875 (#29159)" (#29179) Cyrus Leung 2025-11-21 20:27:43 +08:00
2b1b3dfa4b Update Dockerfile to use gcc-toolset-14 and fix test case failures on power (ppc64le) (#28957) Bhagyashri 2025-11-21 17:54:09 +05:30
cca2d2cdbe [Core] Align whisper closer to other multimodal models (#27292) Russell Bryant 2025-11-21 07:01:54 -05:00
aab0102a26 [V0 deprecation] Remove more V0 references (#29088) Cyrus Leung 2025-11-21 19:56:59 +08:00
b34129bf8e [Misc] remove useless v1 env (#29164) WeiQing Chen 2025-11-21 17:41:20 +08:00
4d7231e774 Revert #28875 (#29159) Cyrus Leung 2025-11-21 17:40:17 +08:00
8ac3a41487 [CI Failure] Fix Gemma3 RoPE configuration for sliding attention layers (#29111) Huamin Li 2025-11-20 23:53:30 -08:00
7d6da483b0 [Minor][Clean] Remove the legacy assertion in video (#29150) Canlin Guo 2025-11-21 15:52:34 +08:00
e4c3182c68 [Small] Capture AttributeError when checking ray dependency. (#29024) Chenheli Hua 2025-11-20 22:54:10 -08:00
b4734b9550 [Bugfix] Fix default MM LoRA alignment for single str prompts (#29140) Alex Brooks 2025-11-20 22:32:30 -07:00
30b9c67743 Revert "[Redo] #26368 (#28771)" (#29121) Jialin Ouyang 2025-11-20 21:27:45 -08:00
11857a00b0 [Attention] Add ROCM_AITER_MLA_SPARSE to attention backend registry (#29103) Matthew Bonanni 2025-11-20 23:24:43 -05:00
8c25f9cfb6 [BugFix] skip combo kernel on cpu (#29129) Boyuan Feng 2025-11-20 19:50:59 -08:00
56e96b37e4 [V0 Deprecation] Remove best_of (#29090) Cyrus Leung 2025-11-21 11:40:40 +08:00
698024ecce [Doc] update installation guide regarding aarch64+cuda pytorch build (#28875) Qidong Su 2025-11-20 22:40:25 -05:00
