Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

c98be0a232 [Model] Enable DP for ViT in Qwen2-VL (#25445) Cyrus Leung 2025-09-23 13:17:10 +08:00
5774b0a1da [NIXL][OOT platform] support nixl_connector with oot platform and other nixl_backend (#25121) Chendi.Xue 2025-09-22 23:17:42 -05:00
e8db44f883 [DP/EP][GPTOSS] Use triton matmul-ogs kernels for GPTOSS DP/EP (#24588) Varun Sundar Rabindranath 2025-09-23 00:01:09 -04:00
fafbe11af4 [Docs] Fix griffe warnings in vllm/lora/ops (#25369) Michael Yao 2025-09-23 11:42:58 +08:00
78237e43bf [Bugfix] Remove contiguous output req for context parallel MLA (#25414) Michael Goin 2025-09-22 23:26:32 -04:00
eea1783989 [benchmarks]allow skip ready check for bench serve (#25420) Lucia Fang 2025-09-22 20:21:48 -07:00
f225ea7dd9 [XPU] Fix compile_size is None case. (#25433) Kunshang Ji 2025-09-23 11:09:00 +08:00
fc97733da8 [feat] Support MRoPE + YaRN (#25384) JJJYmmm 2025-09-23 11:04:47 +08:00
4741239db7 [Bug] Fix Long Context OOM Issue (#25290) Wentao Ye 2025-09-22 22:04:15 -04:00
c625f9043c [V0 deprecation] Remove _set_default_args_v0 function (#25409) Isotr0py 2025-09-23 09:52:09 +08:00
6fa78d8f23 [V0 deprecation] Remove platform v1 controling interface (#25410) Isotr0py 2025-09-23 09:48:12 +08:00
9949aa2ef1 [Perf] Apply torch.compile for per_block_cast_to_fp8 (#24611) Wentao Ye 2025-09-22 21:42:45 -04:00
0b7bed9c38 [Performance] Remove input pads in cutlass_mla and optimize v_proj output handling (#25184) Alexander Matveev 2025-09-22 21:20:53 -04:00
ac0048c0ae [BugFix] [DP/EP] Fix slow execution when BS <= DP (#25407) Matthew Bonanni 2025-09-22 20:26:17 -04:00
090197034f [Bugfix] Fix missing clear_connector_metadata (#25397) Nicolò Lucchesi 2025-09-23 02:10:59 +02:00
f31ff87460 [Core] Drop overly aggressive whisper assertion (#25408) Russell Bryant 2025-09-22 20:09:52 -04:00
d588cd2406 [Bugfix] fix custom op test (#25429) Luka Govedič 2025-09-22 20:07:43 -04:00
45d7d852d3 [Frontend] Responses API MCP tools for built in tools and to pass through headers (#24628) Alec S 2025-09-22 19:38:19 -04:00
8bed179109 [TPU] update torch_xla dependency for PyPI compatibility (#25278) Johnny Yang 2025-09-22 16:14:44 -07:00
f552d5e578 [CI/Build] Skip Qwen3-VL initialization tests until models are actually released (#25394) Cyrus Leung 2025-09-23 04:18:24 +08:00
8db2939289 [KV offload][5/N] Add CPUOffloadingSpec (#24251) Or Ozeri 2025-09-22 22:30:36 +03:00
d5e0fca264 [torch.compile] Cleanup compilation tests and custom passes, add debug utils, fix DCE bug (#23091), fix test (#24376), and prep for custom op matching (#24604) (#24542) Luka Govedič 2025-09-22 15:30:05 -04:00
8d0ee5a564 [misc] Remove RFC review hours reference (#25416) Simon Mo 2025-09-22 12:16:59 -07:00
922979bfcc [DP] support torchrun external launcher with Data Parallelism (#24899) Lucia Fang 2025-09-22 12:06:05 -07:00
239ef0c1ac [CI Failure] Fix fp8 kv cache on <SM90 (#25396) Michael Goin 2025-09-22 14:27:51 -04:00
1d7f95b85c [Compiler] Disable Inductor standalone compile by default (#25391) ElizaWszola 2025-09-22 19:37:46 +02:00
cfbee3d0e7 [CLI env var] Add VLLM_FLASH_ATTN_MAX_NUM_SPLITS_FOR_CUDA_GRAPH in env variables (#25274) Daisy-Ma-coder 2025-09-22 10:37:43 -07:00
06a41334c7 [EPLB] Reduce EPLB Inference Overhead (#24573) Bowen Wang 2025-09-22 09:31:05 -07:00
175811e3b5 [V1][Attention] Split triton_attn in triton-only and rocm specific backends (#24648) Burkhard Ringlein 2025-09-22 17:20:28 +02:00
c10101a3eb [Bugfix] Fix several issues with p2p xPyD in GET type (#23993) Csrayz 2025-09-22 22:53:13 +08:00
ac243886b0 [Kernel] MI-300X triton moe configs (#23445) Sara-KS 2025-09-22 09:29:54 -05:00
3d2c56b7a9 Make mypy behave like a proper pre-commit hook (#25313) Harry Mellor 2025-09-22 13:23:45 +01:00
64c824cd78 Make pickle import check fast (#25379) Harry Mellor 2025-09-22 12:08:25 +01:00
417a164af6 [Misc] Remove unused encoder-decoder error strings (#25374) Cyrus Leung 2025-09-22 19:04:32 +08:00
b6f01bd9a7 refactor: abstract graph mode support into platform interface (#25161) Yizhou 2025-09-22 18:22:29 +08:00
4cf71cc88a [TPU] Deprecate xm.mark_step in favor of `torch_xla.sync (#25254) Nicolò Lucchesi 2025-09-22 12:12:57 +02:00
a66d131381 [TPU][Bugfix][CI] Fix broken tests/build dependency (#25255) Nicolò Lucchesi 2025-09-22 11:55:04 +02:00
21467f9a1c Enable Eagle3 speculative decoding for GPT-OSS model (#25246) Eldar Kurtić 2025-09-22 10:50:39 +02:00
f92d952632 [V0 Deprecation] Remove MultiModalPlaceholderMap (#25366) Cyrus Leung 2025-09-22 16:49:19 +08:00
6d0b827cbd [V0 Deprecation] Remove V0-only methods in multi-modal registry (#25362) Cyrus Leung 2025-09-22 13:58:26 +08:00
0eecb31663 [Bugfix] Fix hermes tool parser handling of non-string argument types (#22002) WeiQing Chen 2025-09-22 11:35:39 +08:00
793be8d057 [Docs] GSM8K Accuracy Evaluation doc update (#25360) WeiQing Chen 2025-09-22 10:49:13 +08:00
7b57a433da [Model] Support Dots OCR (#24645) Roger Wang 2025-09-21 19:24:40 -07:00
5aeb925452 Multimodal - audio tests (#25285) Deboleina 2025-09-21 19:07:11 -04:00
04d3752329 [Bugfix][V0 Deprecation][CI] use async mock and await for async method (#25325) Yang Liu 2025-09-21 16:06:16 -07:00
bc6e542d9f Remove V0 attention backends (#25351) Woosuk Kwon 2025-09-21 16:03:28 -07:00
af7dfb0d1a [Perf] Further optimization for Qwen3-VL fast_pos_embed_interpolate (#25347) Isotr0py 2025-09-22 04:12:45 +08:00
1c3ffdbecc [V0 Deprecation] Remove V0 sampling metadata (#25345) Woosuk Kwon 2025-09-21 10:37:11 -07:00
c438b2951c feat: Enable engine-level arguments with speculators models (#25250) Rahul Tuli 2025-09-21 22:34:45 +05:30
0ff8ebb2d7 [V0 Deprecation] Remove async_output_proc, preemption mode, delay factor (#25334) Woosuk Kwon 2025-09-21 08:52:32 -07:00
26e673fe93 [V0 Deprecation] Remove V0 Sequence class & Sampler (#25332) Woosuk Kwon 2025-09-21 08:52:15 -07:00
65a5910ce3 [Optimization] Cache chat template result when processor fails to be loaded (#25341) Cyrus Leung 2025-09-21 19:41:02 +08:00
9aea7373ff [Bugfix] Typos in error message for missing model config file (#25339) Simon Danielsson 2025-09-21 13:36:47 +02:00
30d08911f7 [MM][Perf] Minor Optimization on Qwen3-VL fast_pos_embed_interpolate (#25337) Roger Wang 2025-09-21 04:05:20 -07:00
cf56cf78b4 [V1] Add sliding window support to Flex Attention backend (#24089) Isotr0py 2025-09-21 13:08:07 +08:00
7ed82d1974 [V0 Deprecation] Remove V0 MP executor (#25329) Woosuk Kwon 2025-09-20 21:26:35 -07:00
12dbd834cf [V0 Deprecation] Remove from_seq_group methods (#25330) Woosuk Kwon 2025-09-20 21:10:48 -07:00
035fd2bd2c [Multi Modal][Performance] Fused Q,K's apply_rope in more models (#25005) Wenlong Wang 2025-09-20 20:55:10 -07:00
1cd885bd54 [V0 Deprecation] Remove V0 model runner base & simplify worker base (#25328) Woosuk Kwon 2025-09-20 20:49:09 -07:00
62b38dc832 [Doc] improve test-pipeline.yaml documentation (#25305) Huamin Li 2025-09-20 20:29:12 -07:00
c99db8c8dd [V0 Deprecation] Remove V0 core (#25321) Woosuk Kwon 2025-09-20 19:58:26 -07:00
72dd1595b4 [CI] Skip tests failing on main (#25326) Woosuk Kwon 2025-09-20 19:57:46 -07:00
572ddf83ce [Chore] Remove unused sampler in models (#25324) Woosuk Kwon 2025-09-20 19:53:20 -07:00
86647d1cd0 [V0 Deprecation] Remove V0 Output Processor (#25320) Woosuk Kwon 2025-09-20 17:57:20 -07:00
52c2a8d4ad [V0 Deprecation] Remove LLMEngine (#25033) Woosuk Kwon 2025-09-20 17:56:30 -07:00
367a480bd3 [Docs] Fix warnings in vllm/profiler and vllm/transformers_utils (#25220) Michael Yao 2025-09-21 07:39:47 +08:00
bef180f009 [V0 Deprecation] Enable the remaining multimodal tests in V1 (#25307) Cyrus Leung 2025-09-21 01:50:58 +08:00
d88918e4c2 [Core] Enable sharded state loader for V1 engine and enhance test coverage (#25308) lirong 2025-09-20 21:15:22 +08:00
3c713a9711 [Model] Cleanup InternViT's data parallel implementation (#25306) Isotr0py 2025-09-20 20:46:24 +08:00
bf8b26cad1 Generate _ModelInfo properties file when loading to improve loading speed (#23558) Manoel Marques 2025-09-20 07:51:13 -04:00
032d661d27 [Docs] Fix warnings in mkdocs build (continued) (#25042) Wenlong Wang 2025-09-20 04:45:18 -07:00
e08a3a3fdb [CI Failure] Disable FlashInfer RoPE to unblock CI (#25299) Michael Goin 2025-09-20 04:16:56 -04:00
3d9a1d2de5 [V1] Support LLM.apply_model (#18465) Cyrus Leung 2025-09-20 15:14:35 +08:00
be874c0201 [Bugfix] Fix Qwen3-VL-MoE weight loading for EP (#25300) Roger Wang 2025-09-20 00:04:05 -07:00
9607d5eb44 [Hybrid Allocator] Support full attention with different hidden size (#25101) Chen Zhang 2025-09-19 23:43:59 -07:00
c60e6137f0 [Optimization] Avoid repeated model architecture conversion for pooling models (#25261) Cyrus Leung 2025-09-20 13:30:22 +08:00
f91480b2d4 [Bugfix] fix tool call arguments is empty (#25223) Chauncey 2025-09-20 13:29:54 +08:00
6c5f82e5aa [BUG FIX][NON-CUDA]quick fix to avoid call cudagraph_unsafe in attention (#25298) Chendi.Xue 2025-09-19 23:41:23 -05:00
b7f186bbb3 [BugFix] Exclude self when checking for port collision (#25286) Nick Hill 2025-09-19 21:28:31 -07:00
3642909617 [BUGFIX] GPTQ quantization compatibility for Qwen3 Next MOE models (AutoGPTQ and AutoRound-GPTQ) (#25268) JartX 2025-09-20 05:18:13 +02:00
c308501cb6 Improve weight loading for encoder models in Transformers backend (#25289) Harry Mellor 2025-09-20 04:11:03 +01:00
535d80056b [Misc] Support more collective_rpc return types (#25294) Nick Hill 2025-09-19 19:02:38 -07:00
a25ade5d47 [BugFix] Ensure appropriate guards in destructors (#25284) Nick Hill 2025-09-19 18:06:34 -07:00
8945b001db [torch.compile] CUDAGraph Inductor partition integration (#24281) Boyuan Feng 2025-09-19 18:02:15 -07:00
b8a287a0a8 [docs] Prompt Embedding feature support (#25288) Andrew Sansom 2025-09-19 19:46:23 -05:00
c7e713616a test: Remove vestigial skip for prompt embeds tests after landing v1 Prompt Embeds support (#25291) Andrew Sansom 2025-09-19 19:33:40 -05:00
a36c675817 Don't skip special tokens with hermes-style tool calling (#25281) Maximilien de Bayser 2025-09-19 21:33:25 -03:00
3da17c2cc2 [Bugfix] Remove VLLM_TEST_DYNAMO_FULLGRAPH_CAPTURE #2969 (#25090) Lucas Kabela 2025-09-19 17:27:21 -07:00
14c1432789 [BugFix] Fix async scheduling CPU tensor race take 2 (#25279) Nick Hill 2025-09-19 16:34:07 -07:00
ee7a66dd9a allow disable flashinfer prefill (#25276) Lucia Fang 2025-09-19 15:59:41 -07:00
431535b522 Enable modelopt gemma3 nvfp4/fp8, make workflow more robust (#22771) Zhiyu 2025-09-19 15:40:33 -07:00
711e912946 [Compile] Fix Compile Warning for Ignoring MIN_BLOCK_PER_SM (#25193) Wentao Ye 2025-09-19 18:23:19 -04:00
e69e0b8b5f [Frontend] Responses API messages out, just harmony for now (#24985) Alec S 2025-09-19 17:40:16 -04:00
ddc9048394 Fix: Correct FusedMoE layer reference in auto_round quantization (#24818) David-Wen 2025-09-20 04:44:24 +08:00
b1a63d1b3b [BugFix] Make FlashInferMetadataBuilder non-blocking (#25040) nvjullin 2025-09-20 04:36:34 +08:00
48ecb4438b [Perf] Use FlashInfer RoPE for RotaryEmbedding.forward_cuda when available (#21126) Michael Goin 2025-09-19 16:06:49 -04:00
e57fc15971 Specify platform in pip-compile pre-commit hook so it runs on MacOS (#25273) Harry Mellor 2025-09-19 20:43:33 +01:00
4bdf400218 [Bugfix] Fix chunked a2_scales in modular kernels (#25264) bnellnm 2025-09-19 15:42:01 -04:00
7852b82b93 [Bugfix] GPT OSS Attritbute error on H100 (#25228) Varun Sundar Rabindranath 2025-09-19 15:14:09 -04:00
a2a5f79e09 Optimize triton unified attention performance for sliding window attention (#24390) qizixi 2025-09-19 12:07:26 -07:00
