Commit Graph

  • c98be0a232 [Model] Enable DP for ViT in Qwen2-VL (#25445) Cyrus Leung 2025-09-23 13:17:10 +08:00
  • 5774b0a1da [NIXL][OOT platform] support nixl_connector with oot platform and other nixl_backend (#25121) Chendi.Xue 2025-09-22 23:17:42 -05:00
  • e8db44f883 [DP/EP][GPTOSS] Use triton matmul-ogs kernels for GPTOSS DP/EP (#24588) Varun Sundar Rabindranath 2025-09-23 00:01:09 -04:00
  • fafbe11af4 [Docs] Fix griffe warnings in vllm/lora/ops (#25369) Michael Yao 2025-09-23 11:42:58 +08:00
  • 78237e43bf [Bugfix] Remove contiguous output req for context parallel MLA (#25414) Michael Goin 2025-09-22 23:26:32 -04:00
  • eea1783989 [benchmarks]allow skip ready check for bench serve (#25420) Lucia Fang 2025-09-22 20:21:48 -07:00
  • f225ea7dd9 [XPU] Fix compile_size is None case. (#25433) Kunshang Ji 2025-09-23 11:09:00 +08:00
  • fc97733da8 [feat] Support MRoPE + YaRN (#25384) JJJYmmm 2025-09-23 11:04:47 +08:00
  • 4741239db7 [Bug] Fix Long Context OOM Issue (#25290) Wentao Ye 2025-09-22 22:04:15 -04:00
  • c625f9043c [V0 deprecation] Remove _set_default_args_v0 function (#25409) Isotr0py 2025-09-23 09:52:09 +08:00
  • 6fa78d8f23 [V0 deprecation] Remove platform v1 controling interface (#25410) Isotr0py 2025-09-23 09:48:12 +08:00
  • 9949aa2ef1 [Perf] Apply torch.compile for per_block_cast_to_fp8 (#24611) Wentao Ye 2025-09-22 21:42:45 -04:00
  • 0b7bed9c38 [Performance] Remove input pads in cutlass_mla and optimize v_proj output handling (#25184) Alexander Matveev 2025-09-22 21:20:53 -04:00
  • ac0048c0ae [BugFix] [DP/EP] Fix slow execution when BS <= DP (#25407) Matthew Bonanni 2025-09-22 20:26:17 -04:00
  • 090197034f [Bugfix] Fix missing clear_connector_metadata (#25397) Nicolò Lucchesi 2025-09-23 02:10:59 +02:00
  • f31ff87460 [Core] Drop overly aggressive whisper assertion (#25408) Russell Bryant 2025-09-22 20:09:52 -04:00
  • d588cd2406 [Bugfix] fix custom op test (#25429) Luka Govedič 2025-09-22 20:07:43 -04:00
  • 45d7d852d3 [Frontend] Responses API MCP tools for built in tools and to pass through headers (#24628) Alec S 2025-09-22 19:38:19 -04:00
  • 8bed179109 [TPU] update torch_xla dependency for PyPI compatibility (#25278) Johnny Yang 2025-09-22 16:14:44 -07:00
  • f552d5e578 [CI/Build] Skip Qwen3-VL initialization tests until models are actually released (#25394) Cyrus Leung 2025-09-23 04:18:24 +08:00
  • 8db2939289 [KV offload][5/N] Add CPUOffloadingSpec (#24251) Or Ozeri 2025-09-22 22:30:36 +03:00
  • d5e0fca264 [torch.compile] Cleanup compilation tests and custom passes, add debug utils, fix DCE bug (#23091), fix test (#24376), and prep for custom op matching (#24604) (#24542) Luka Govedič 2025-09-22 15:30:05 -04:00
  • 8d0ee5a564 [misc] Remove RFC review hours reference (#25416) Simon Mo 2025-09-22 12:16:59 -07:00
  • 922979bfcc [DP] support torchrun external launcher with Data Parallelism (#24899) Lucia Fang 2025-09-22 12:06:05 -07:00
  • 239ef0c1ac [CI Failure] Fix fp8 kv cache on <SM90 (#25396) Michael Goin 2025-09-22 14:27:51 -04:00
  • 1d7f95b85c [Compiler] Disable Inductor standalone compile by default (#25391) ElizaWszola 2025-09-22 19:37:46 +02:00
  • cfbee3d0e7 [CLI env var] Add VLLM_FLASH_ATTN_MAX_NUM_SPLITS_FOR_CUDA_GRAPH in env variables (#25274) Daisy-Ma-coder 2025-09-22 10:37:43 -07:00
  • 06a41334c7 [EPLB] Reduce EPLB Inference Overhead (#24573) Bowen Wang 2025-09-22 09:31:05 -07:00
  • 175811e3b5 [V1][Attention] Split triton_attn in triton-only and rocm specific backends (#24648) Burkhard Ringlein 2025-09-22 17:20:28 +02:00
  • c10101a3eb [Bugfix] Fix several issues with p2p xPyD in GET type (#23993) Csrayz 2025-09-22 22:53:13 +08:00
  • ac243886b0 [Kernel] MI-300X triton moe configs (#23445) Sara-KS 2025-09-22 09:29:54 -05:00
  • 3d2c56b7a9 Make mypy behave like a proper pre-commit hook (#25313) Harry Mellor 2025-09-22 13:23:45 +01:00
  • 64c824cd78 Make pickle import check fast (#25379) Harry Mellor 2025-09-22 12:08:25 +01:00
  • 417a164af6 [Misc] Remove unused encoder-decoder error strings (#25374) Cyrus Leung 2025-09-22 19:04:32 +08:00
  • b6f01bd9a7 refactor: abstract graph mode support into platform interface (#25161) Yizhou 2025-09-22 18:22:29 +08:00
  • 4cf71cc88a [TPU] Deprecate xm.mark_step in favor of torch_xla.sync (#25254) Nicolò Lucchesi 2025-09-22 12:12:57 +02:00
  • a66d131381 [TPU][Bugfix][CI] Fix broken tests/build dependency (#25255) Nicolò Lucchesi 2025-09-22 11:55:04 +02:00
  • 21467f9a1c Enable Eagle3 speculative decoding for GPT-OSS model (#25246) Eldar Kurtić 2025-09-22 10:50:39 +02:00
  • f92d952632 [V0 Deprecation] Remove MultiModalPlaceholderMap (#25366) Cyrus Leung 2025-09-22 16:49:19 +08:00
  • 6d0b827cbd [V0 Deprecation] Remove V0-only methods in multi-modal registry (#25362) Cyrus Leung 2025-09-22 13:58:26 +08:00
  • 0eecb31663 [Bugfix] Fix hermes tool parser handling of non-string argument types (#22002) WeiQing Chen 2025-09-22 11:35:39 +08:00
  • 793be8d057 [Docs] GSM8K Accuracy Evaluation doc update (#25360) WeiQing Chen 2025-09-22 10:49:13 +08:00
  • 7b57a433da [Model] Support Dots OCR (#24645) Roger Wang 2025-09-21 19:24:40 -07:00
  • 5aeb925452 Multimodal - audio tests (#25285) Deboleina 2025-09-21 19:07:11 -04:00
  • 04d3752329 [Bugfix][V0 Deprecation][CI] use async mock and await for async method (#25325) Yang Liu 2025-09-21 16:06:16 -07:00
  • bc6e542d9f Remove V0 attention backends (#25351) Woosuk Kwon 2025-09-21 16:03:28 -07:00
  • af7dfb0d1a [Perf] Further optimization for Qwen3-VL fast_pos_embed_interpolate (#25347) Isotr0py 2025-09-22 04:12:45 +08:00
  • 1c3ffdbecc [V0 Deprecation] Remove V0 sampling metadata (#25345) Woosuk Kwon 2025-09-21 10:37:11 -07:00
  • c438b2951c feat: Enable engine-level arguments with speculators models (#25250) Rahul Tuli 2025-09-21 22:34:45 +05:30
  • 0ff8ebb2d7 [V0 Deprecation] Remove async_output_proc, preemption mode, delay factor (#25334) Woosuk Kwon 2025-09-21 08:52:32 -07:00
  • 26e673fe93 [V0 Deprecation] Remove V0 Sequence class & Sampler (#25332) Woosuk Kwon 2025-09-21 08:52:15 -07:00
  • 65a5910ce3 [Optimization] Cache chat template result when processor fails to be loaded (#25341) Cyrus Leung 2025-09-21 19:41:02 +08:00
  • 9aea7373ff [Bugfix] Typos in error message for missing model config file (#25339) Simon Danielsson 2025-09-21 13:36:47 +02:00
  • 30d08911f7 [MM][Perf] Minor Optimization on Qwen3-VL fast_pos_embed_interpolate (#25337) Roger Wang 2025-09-21 04:05:20 -07:00
  • cf56cf78b4 [V1] Add sliding window support to Flex Attention backend (#24089) Isotr0py 2025-09-21 13:08:07 +08:00
  • 7ed82d1974 [V0 Deprecation] Remove V0 MP executor (#25329) Woosuk Kwon 2025-09-20 21:26:35 -07:00
  • 12dbd834cf [V0 Deprecation] Remove from_seq_group methods (#25330) Woosuk Kwon 2025-09-20 21:10:48 -07:00
  • 035fd2bd2c [Multi Modal][Performance] Fused Q,K's apply_rope in more models (#25005) Wenlong Wang 2025-09-20 20:55:10 -07:00
  • 1cd885bd54 [V0 Deprecation] Remove V0 model runner base & simplify worker base (#25328) Woosuk Kwon 2025-09-20 20:49:09 -07:00
  • 62b38dc832 [Doc] improve test-pipeline.yaml documentation (#25305) Huamin Li 2025-09-20 20:29:12 -07:00
  • c99db8c8dd [V0 Deprecation] Remove V0 core (#25321) Woosuk Kwon 2025-09-20 19:58:26 -07:00
  • 72dd1595b4 [CI] Skip tests failing on main (#25326) Woosuk Kwon 2025-09-20 19:57:46 -07:00
  • 572ddf83ce [Chore] Remove unused sampler in models (#25324) Woosuk Kwon 2025-09-20 19:53:20 -07:00
  • 86647d1cd0 [V0 Deprecation] Remove V0 Output Processor (#25320) Woosuk Kwon 2025-09-20 17:57:20 -07:00
  • 52c2a8d4ad [V0 Deprecation] Remove LLMEngine (#25033) Woosuk Kwon 2025-09-20 17:56:30 -07:00
  • 367a480bd3 [Docs] Fix warnings in vllm/profiler and vllm/transformers_utils (#25220) Michael Yao 2025-09-21 07:39:47 +08:00
  • bef180f009 [V0 Deprecation] Enable the remaining multimodal tests in V1 (#25307) Cyrus Leung 2025-09-21 01:50:58 +08:00
  • d88918e4c2 [Core] Enable sharded state loader for V1 engine and enhance test coverage (#25308) lirong 2025-09-20 21:15:22 +08:00
  • 3c713a9711 [Model] Cleanup InternViT's data parallel implementation (#25306) Isotr0py 2025-09-20 20:46:24 +08:00
  • bf8b26cad1 Generate _ModelInfo properties file when loading to improve loading speed (#23558) Manoel Marques 2025-09-20 07:51:13 -04:00
  • 032d661d27 [Docs] Fix warnings in mkdocs build (continued) (#25042) Wenlong Wang 2025-09-20 04:45:18 -07:00
  • e08a3a3fdb [CI Failure] Disable FlashInfer RoPE to unblock CI (#25299) Michael Goin 2025-09-20 04:16:56 -04:00
  • 3d9a1d2de5 [V1] Support LLM.apply_model (#18465) Cyrus Leung 2025-09-20 15:14:35 +08:00
  • be874c0201 [Bugfix] Fix Qwen3-VL-MoE weight loading for EP (#25300) Roger Wang 2025-09-20 00:04:05 -07:00
  • 9607d5eb44 [Hybrid Allocator] Support full attention with different hidden size (#25101) Chen Zhang 2025-09-19 23:43:59 -07:00
  • c60e6137f0 [Optimization] Avoid repeated model architecture conversion for pooling models (#25261) Cyrus Leung 2025-09-20 13:30:22 +08:00
  • f91480b2d4 [Bugfix] fix tool call arguments is empty (#25223) Chauncey 2025-09-20 13:29:54 +08:00
  • 6c5f82e5aa [BUG FIX][NON-CUDA]quick fix to avoid call cudagraph_unsafe in attention (#25298) Chendi.Xue 2025-09-19 23:41:23 -05:00
  • b7f186bbb3 [BugFix] Exclude self when checking for port collision (#25286) Nick Hill 2025-09-19 21:28:31 -07:00
  • 3642909617 [BUGFIX] GPTQ quantization compatibility for Qwen3 Next MOE models (AutoGPTQ and AutoRound-GPTQ) (#25268) JartX 2025-09-20 05:18:13 +02:00
  • c308501cb6 Improve weight loading for encoder models in Transformers backend (#25289) Harry Mellor 2025-09-20 04:11:03 +01:00
  • 535d80056b [Misc] Support more collective_rpc return types (#25294) Nick Hill 2025-09-19 19:02:38 -07:00
  • a25ade5d47 [BugFix] Ensure appropriate guards in destructors (#25284) Nick Hill 2025-09-19 18:06:34 -07:00
  • 8945b001db [torch.compile] CUDAGraph Inductor partition integration (#24281) Boyuan Feng 2025-09-19 18:02:15 -07:00
  • b8a287a0a8 [docs] Prompt Embedding feature support (#25288) Andrew Sansom 2025-09-19 19:46:23 -05:00
  • c7e713616a test: Remove vestigial skip for prompt embeds tests after landing v1 Prompt Embeds support (#25291) Andrew Sansom 2025-09-19 19:33:40 -05:00
  • a36c675817 Don't skip special tokens with hermes-style tool calling (#25281) Maximilien de Bayser 2025-09-19 21:33:25 -03:00
  • 3da17c2cc2 [Bugfix] Remove VLLM_TEST_DYNAMO_FULLGRAPH_CAPTURE #2969 (#25090) Lucas Kabela 2025-09-19 17:27:21 -07:00
  • 14c1432789 [BugFix] Fix async scheduling CPU tensor race take 2 (#25279) Nick Hill 2025-09-19 16:34:07 -07:00
  • ee7a66dd9a allow disable flashinfer prefill (#25276) Lucia Fang 2025-09-19 15:59:41 -07:00
  • 431535b522 Enable modelopt gemma3 nvfp4/fp8, make workflow more robust (#22771) Zhiyu 2025-09-19 15:40:33 -07:00
  • 711e912946 [Compile] Fix Compile Warning for Ignoring MIN_BLOCK_PER_SM (#25193) Wentao Ye 2025-09-19 18:23:19 -04:00
  • e69e0b8b5f [Frontend] Responses API messages out, just harmony for now (#24985) Alec S 2025-09-19 17:40:16 -04:00
  • ddc9048394 Fix: Correct FusedMoE layer reference in auto_round quantization (#24818) David-Wen 2025-09-20 04:44:24 +08:00
  • b1a63d1b3b [BugFix] Make FlashInferMetadataBuilder non-blocking (#25040) nvjullin 2025-09-20 04:36:34 +08:00
  • 48ecb4438b [Perf] Use FlashInfer RoPE for RotaryEmbedding.forward_cuda when available (#21126) Michael Goin 2025-09-19 16:06:49 -04:00
  • e57fc15971 Specify platform in pip-compile pre-commit hook so it runs on MacOS (#25273) Harry Mellor 2025-09-19 20:43:33 +01:00
  • 4bdf400218 [Bugfix] Fix chunked a2_scales in modular kernels (#25264) bnellnm 2025-09-19 15:42:01 -04:00
  • 7852b82b93 [Bugfix] GPT OSS Attritbute error on H100 (#25228) Varun Sundar Rabindranath 2025-09-19 15:14:09 -04:00
  • a2a5f79e09 Optimize triton unified attention performance for sliding window attention (#24390) qizixi 2025-09-19 12:07:26 -07:00