Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

b2f78cbad4 [small][batch invariance] Rename the env and internal flags to simplify usage (#26855) Bram Wasti 2025-10-16 14:40:25 -07:00
23583ee28c [Bug] Add Assertion for random-input-len / random-output-len (#26834) Wentao Ye 2025-10-16 17:36:39 -04:00
01c977e96d [CI] Prune Quantization Tests and skip compilation (#27038) Michael Goin 2025-10-16 17:26:35 -04:00
b3dda72c23 [Feature] Migrate DeepGEMM API from get_m_alignment_for_contiguous_layout to get_mk_alignment_for_contiguous_layout (#26935) Wentao Ye 2025-10-16 16:46:48 -04:00
fb0571b077 [GPTOSS][DP/EP][Marlin] Enable GPTOSS Batched DP/EP using Marlin kernels (#25997) Varun Sundar Rabindranath 2025-10-16 15:53:11 -04:00
2ed8b6b3d0 [Bug] Fix batch invariant test has to is (#27032) Wentao Ye 2025-10-16 15:45:14 -04:00
013abde6ef Adding Warmup to Benchmark Serving (#26943) kimbochen 2025-10-16 15:44:32 -04:00
a5464dcf92 [Compressed Tensors] Always clone output for compile robustness (#26849) Kyle Sayers 2025-10-16 15:29:59 -04:00
ac3ed5a815 Support block size of 256 used by Intel HPU (#26883) Mandy Li 2025-10-16 12:10:57 -07:00
e6ba2000ae [gpt-oss][1/N] EZ: refactor serving_responses for modularity (#26948) Andrew Xia 2025-10-16 11:44:06 -07:00
aa255ff55a Support set in the CLI generation (#27031) Harry Mellor 2025-10-16 19:07:18 +01:00
7bb736d00e Fix Qwen2.5 VL image grid docstring (#27033) ZiTian Zhao 2025-10-17 00:57:36 +08:00
9f4e30904b [Model] Fix Qwen3VL mm mapping (#27027) Jee Jee Li 2025-10-17 00:45:59 +08:00
5afd3276df [Feature] Add process_weights_after_loading to AttentionImpl (#26870) rongfu.leng 2025-10-16 23:02:30 +08:00
43721bc67f [CI] Replace large models with tiny alternatives in tests (#24057) Tahsin Tunan 2025-10-16 20:51:27 +06:00
02d709a6f1 [docs] standardize Hugging Face env var to HF_TOKEN (deprecates HUGGING_FACE_HUB_TOKEN) (#27020) Kay Yan 2025-10-16 22:31:02 +08:00
4a510ab487 [NIXL] Improve request_finished() debug logs (#25665) Mark McLoughlin 2025-10-16 14:55:17 +01:00
314fa8abbf [Attention] Tune CUTLASS MLA num_splits (#26846) Matthew Bonanni 2025-10-16 09:36:09 -04:00
334535b6fb [Benchmark] Show E2EL by default for pooling models (#27014) Cyrus Leung 2025-10-16 20:47:09 +08:00
dcbb3f1871 [Bugfix] Correct LayerNorm epsilon parameter in modernbert.py (#27008) bogdanm 2025-10-16 17:27:44 +05:00
00417f4e44 [MISC] fix import violations for re and triton modules (#26654) Sungjae Lee 2025-10-16 19:38:27 +09:00
ed344f4116 Cleanup code after Python 3.10 upgrade (#26520) Lukas Geiger 2025-10-16 11:38:23 +01:00
e51928793e [Model][Bugfix] fix ernie45 vl run failed from shared experts optimization (#26885) CSWYF3634076 2025-10-16 18:37:35 +08:00
d2740fafbf [Chore] Separate out vllm.utils.collections (#26990) Cyrus Leung 2025-10-16 16:35:35 +08:00
17838e50ef [Benchmark] Use truncation by default for pooling benchmarks (#26992) Cyrus Leung 2025-10-16 16:02:39 +08:00
44c8555621 [CI/Build] Fix AMD import failures in CI (#26841) Zhewen Li 2025-10-16 00:28:20 -07:00
f7d318de2b [Hardware][CPU][PowerPC]Disable torch.compile() in toptopk sampling (#26987) Akash kaothalkar 2025-10-16 11:06:59 +05:30
76f0d05bc6 [CI/Build] Update expected beam search output for Phi3V (#26978) Cyrus Leung 2025-10-16 13:12:44 +08:00
7d8975de84 Deepseek-v3 Batch Invariant on 8xH100 (#26609) Bram Wasti 2025-10-15 22:06:02 -07:00
785d8b6410 [PERF] Qwen3-next MTP speedup (change bool mask indexing to index_select / index_copy to reduce d2h) (#26437) Vadim Gimpelson 2025-10-16 08:18:31 +04:00
f6cdc9a02f [Chore] Rename utils submodules (#26920) Cyrus Leung 2025-10-16 11:58:13 +08:00
509cdc0370 [DOC][XPU]update feature parity with Intel GPU (#26954) Chendi.Xue 2025-10-15 22:07:10 -05:00
9b6504c307 [BugFix] Work around graph partition x torch.compile cache issue (#26956) Richard Zou 2025-10-15 23:06:11 -04:00
e19b16dde6 [bugfix] Fix SP + PP without specifying compile size (#26955) Angela Yi 2025-10-15 20:05:33 -07:00
582f2c6be7 [BUG] Allow runai_streamer_sharded in config check (#26958) ahao-anyscale 2025-10-15 20:05:14 -07:00
f8a0acbdbe [CI] Enable Blackwell Llama4 MoE tests (#26731) Michael Goin 2025-10-15 23:02:57 -04:00
1317034379 [ROCm][FEAT] Fuse DeepSeek shared experts into AITER fused_moe ops (#24097) kliuae 2025-10-16 10:41:34 +08:00
0ecc553ee6 [Bugfix] reasoning_parser parameter handling in run_batch.py (#26225) InChang Jeong 2025-10-16 11:24:05 +09:00
f96bc3649c [Qwen3-Next] Add tuned MoE config for Qwen3-Next FP8 on H100 tp2 (#26887) felixzhu555 2025-10-15 18:55:05 -07:00
938c43ea7f [ci] Adjusting AMD test composition 2025-10-14 (#26852) Alexei-V-Ivanov-AMD 2025-10-15 18:52:13 -05:00
0a9ef0cfce Move query quantization to attention layer for Flashinfer & Triton. (#26534) Adrian Abeyta 2025-10-15 18:01:38 -05:00
e5b438a247 [Bug] Temporally Disable VLLM_ALLREDUCE_USE_SYMM_MEM by Default (#26925) Wentao Ye 2025-10-15 16:18:50 -04:00
0b99f5d302 support flashinfer_fp4 moe for 5090 gpu (#26669) XiaobingZhang 2025-10-16 03:06:47 +08:00
1f491aa0c8 Vectorize RMS norm variance using vectorize_read_with_alignment (#26234) Benji Beck 2025-10-15 11:54:41 -07:00
de92d916fe [NVIDIA] Add support for cudnn fp4 gemm via flashinfer (#26107) Kaixi Hou 2025-10-15 10:53:00 -07:00
a1063628a4 [Chore] Clean up CODEOWNERS (#26923) Woosuk Kwon 2025-10-15 10:52:54 -07:00
d796375258 [ModelOpt] Remove NVFP4 MoE K%16==0 constraint (#26891) XiaobingZhang 2025-10-16 01:06:17 +08:00
14f8456344 [Feature]: Use pydantic validation in observability.py config (#26637) Sam/Samuel 2025-10-16 01:44:03 +09:00
4794c2bd92 Olmo 3 tool parser and tests (#26143) Pradeep Dasigi 2025-10-15 09:36:12 -07:00
d3cbaa08dc Lower sevarity of log when model info cache misses due to exception (#26917) Harry Mellor 2025-10-15 17:01:09 +01:00
828523ad8e [Chore] Separate out vllm.utils.async_utils (#26913) Cyrus Leung 2025-10-15 23:33:00 +08:00
136a17fe6e [Chore] Separate out vllm.utils.func (#26904) Cyrus Leung 2025-10-15 21:03:58 +08:00
f57438338d [BugFix] Patch inductor memory plan logic (#26878) Boyuan Feng 2025-10-15 05:51:45 -07:00
5d598680e3 chore: remove unused marker (#26890) Max Wittig 2025-10-15 14:40:33 +02:00
8f4b313c37 [Misc] rename torch_dtype to dtype (#26695) wangxiyuan 2025-10-15 20:11:48 +08:00
f93e348010 [Misc] Remove isort and yapf ignores (#26888) Cyrus Leung 2025-10-15 20:09:03 +08:00
f54f85129e [Model][2/N] Improve all pooling task | Support multi-vector retrieval (#25370) wang.yuqi 2025-10-15 19:14:41 +08:00
d4d1a6024f [Lora]Load tuned multi-lora kernel configs from json files (#26319) li2haipeng 2025-10-15 02:45:14 -07:00
db1764e4e0 [Platform] allow platform to init dp group (#22243) wangxiyuan 2025-10-15 17:32:17 +08:00
7f83b4ee8e [Easy] Get rid of unnecessary paraenthesis in kv_cache_manager (#26842) Jialin Ouyang 2025-10-15 02:17:43 -07:00
5c3bae1a6a [Fix] Remove divisibility requirement between num_kv_heads and tp_size in bailing_moe (#26876) ant-yy 2025-10-15 16:44:04 +08:00
5210dc3940 [Misc] Update TritonLanguagePlaceholder to have attributes that are used by Flash Linear Attention ops. (#26853) Xudong Ma 2025-10-15 01:37:49 -07:00
650b51f9f9 [doc] add Context Parallel Deployment doc (#26877) youkaichao 2025-10-15 16:33:52 +08:00
6256697997 [Doc] ruff format remaining Python examples (#26795) Cyrus Leung 2025-10-15 16:25:49 +08:00
71557a5f7c [CI] Fix mypy for vllm/executor (#26845) Wentao Ye 2025-10-15 04:23:33 -04:00
f3c378ffa7 [CI/Build] Add Qwen2.5-VL-7B-Instruct ChartQA Accuracy Tests in CI (#21810) Zhewen Li 2025-10-15 01:09:56 -07:00
f5ed68ef63 [Deepseek-V3.2][Kernel] Integrate cuda indexer k cache gather (#26456) Yongye Zhu 2025-10-15 04:05:01 -04:00
efdef57b1f [bugfix] Lazy import cv2 (#26869) Angela Yi 2025-10-15 00:47:50 -07:00
b8a4572157 [Misc] Use helper function to generate dummy messages in OpenAI MM tests (#26875) Cyrus Leung 2025-10-15 15:17:37 +08:00
302ef403a2 [DSA][MLA] Tiny refactor on DeepSeek to make it reusable for different backends (#26656) Mengqing Cao 2025-10-15 15:16:44 +08:00
8865da157b [Bugfix][Multi Modal] Fix incorrect Molmo token processing (#26873) sangho.lee 2025-10-15 02:13:59 -05:00
f0862eae43 [Graph Partition] pass tests for decorator (#26831) Boyuan Feng 2025-10-14 23:39:48 -07:00
8c851f6d04 [Bugfix] Fix qwen3-omni audio truncation issue (#26815) Isotr0py 2025-10-15 13:38:36 +08:00
7cfa420f49 [BugFix] Patch inductor partitioning logic (#26735) Angela Yi 2025-10-14 22:04:32 -07:00
a27b288e4a [Feature] default --extra-body param to disable thinking in vllm bench serve (#26784) rongfu.leng 2025-10-15 12:23:44 +08:00
e471d7ca7e [CI/Build][Bugfix] fix qutlass cmake error when set QUTLASS_SRC_DIR (#26773) zhrrr 2025-10-15 12:09:44 +08:00
c43ca8259e [Docs] Move build.inc into arm.inc (#26862) Michael Yao 2025-10-15 11:35:08 +08:00
85a65e7f51 [Model] Add DeepSeek-V3.1 reasoning parser (split from PR #24972) (#25589) Tao Hui 2025-10-15 11:09:52 +08:00
a2986b3e33 [Bugfix] Fixes prefix-repetition benchmark script (#26828) kourosh hakhamaneshi 2025-10-14 19:54:43 -07:00
96b9aa5aa0 [Frontend][torch.compile] CompilationConfig Overhaul (#20283): name change compilation level to compilation mode, deprecation compilation level (#26355) Morrison Turnansky 2025-10-14 22:51:16 -04:00
e66d787bce Disable FlashInfer sampler by default (#26859) Michael Goin 2025-10-14 22:35:18 -04:00
bfad142e25 [BUGFIX][NIXL] quick fix for 'assert self.connector_worker is not None' in get_kv_connector_stats (#26851) Chendi.Xue 2025-10-14 21:33:25 -05:00
9354660036 [Bugfix]fix Qwen3 xml tool parser (#26345) Zhikaiiii 2025-10-15 09:50:30 +08:00
07ca70af8d [Core][Easy] Use envs.__getattr__ for all Unify to environment variable access (#26810) Jialin Ouyang 2025-10-14 18:41:18 -07:00
2dcd12d357 [torch.compile] Fix tests for torch==2.9 inductor partition (#26116) Luka Govedič 2025-10-14 19:55:02 -04:00
579d2e5458 [WideEP][P/D] Add usage stats for DP+EP and KV Connector (#26836) Tyler Michael Smith 2025-10-14 19:51:54 -04:00
0512c04aee [frontend][gptoss] Add per turn stats into Harmony Context (#25061) Ye Hu 2025-10-14 16:48:13 -07:00
7e0ef4084a [CI Failure] Fix torchao dep failure for Quantization Test (#26824) Michael Goin 2025-10-14 19:41:43 -04:00
4aed506b65 [Core] Streamline some structured output related code (#26737) Nick Hill 2025-10-14 16:27:44 -07:00
a86b4c58e8 remove attn output view kernel (#26680) Boyuan Feng 2025-10-14 15:53:10 -07:00
ff4810ba73 [Minor] Group async_scheduling related fields in model runner init (#26736) Nick Hill 2025-10-14 14:46:37 -07:00
9d6964926e fix: response_format for completion (#23212) Nan Qin 2025-10-14 16:23:22 -05:00
0e65818910 Added MoE configs for llama 4, H200 device with tp=4/8 tuning (#26837) Dhruvil Bhatt 2025-10-14 14:21:03 -07:00
380f17527c [Perf] Cache vllm.env.__getattr__ result to avoid recomputation (#26146) Jialin Ouyang 2025-10-14 14:03:21 -07:00
b92ab3deda Notice for deprecation of AutoAWQ (#26820) HDCharles 2025-10-14 16:39:59 -04:00
acaa2c0a4a [Core] Reuse empty block lists whenever possible in KVCacheBlocks to mitigate GC costs (#24964) Jialin Ouyang 2025-10-14 12:58:43 -07:00
82af928c41 [Attention][Spec Decode] FlashMLA spec decode support (#26541) Matthew Bonanni 2025-10-14 15:38:20 -04:00
87efc681db llama4_vision_rope: add HIP override to accept (q, k) and avoid (positions, q, k) mismatch (#26790) Huamin Li 2025-10-14 11:54:12 -07:00
c3a722fcb2 [CI Failure] Fix tests with missing TinyLlama-1.1B-Chat-v1.0-FP8-e2e (#26816) (tag: v0.11.1rc1) Michael Goin 2025-10-14 14:38:59 -04:00
aba48f7db1 [Kernel][MoE] Add MoE tunings for GLM 4.6-FP8 and GLM 4.5 Air on NVidia B200 (#26818) Ze'ev Klapow 2025-10-14 14:20:39 -04:00
