biondizzle/vllm

Commit Graph

127ded0a9e [Ultravox] Use wrapped_model_config to instantiate inner model (#24679) Peter Salas 2025-09-11 11:52:24 -07:00
bb2b5126da [VLM] Migrate remain DP-supported ViT models to use disable_tp (#24363) Isotr0py 2025-09-12 02:30:41 +08:00
361ae27f8a [Docs] Fix formatting of transcription doc (#24676) Harry Mellor 2025-09-11 19:18:06 +01:00
e26fef8397 fix some typos (#24616) co63oc 2025-09-12 01:48:46 +08:00
c1eda615ba Fix model name included in responses (#24663) Harry Mellor 2025-09-11 18:47:51 +01:00
4aa23892d6 [Bugfix] Fix platform-specific routing in CustomOp implementations (#24444) Konrad Zawora 2025-09-11 19:15:01 +02:00
1fdd5c42d7 [Kernels] Enable Torch Symmetric Memory All-Reduce By Default (#24111) Ilya Markov 2025-09-11 18:45:31 +02:00
bcbe2a4d9e [VLM] Optimize GLM4.5-V-style video processing to only decode necessary frames (#24161) Isotr0py 2025-09-12 00:44:34 +08:00
51d41265ad [Docs] Fix typos in EP deployment doc (#24669) Harry Mellor 2025-09-11 17:07:23 +01:00
4984a291d5 [Doc] Fix Markdown Pre-commit Error (#24670) Wentao Ye 2025-09-11 12:05:59 -04:00
404c85ca72 [Docs] Add transcription support to model (#24664) Nicolò Lucchesi 2025-09-11 16:39:01 +02:00
817beef7f3 [Bugifx] Fix qwen-next packed_modules_mapping (#24656) Jee Jee Li 2025-09-11 22:26:17 +08:00
4f6593b058 [HybridKVCache][Platform] Add support_hybrid_kv_cache for platform (#24646) Mengqing Cao 2025-09-11 21:47:58 +08:00
94e6b2d55f Allow users to specify kv cache memory size (#21489) Boyuan Feng 2025-09-11 06:41:07 -07:00
fd1ce98cdd [CI] Split mteb test from Language Models Test (#24634) wang.yuqi 2025-09-11 21:37:51 +08:00
d11ec124a0 [Bench] Add qwen-next in benchmark_moe.py (#24661) Jee Jee Li 2025-09-11 21:29:43 +08:00
f510715882 [build] add torch to tool.uv no-build-isolation-package (#24303) youkaichao 2025-09-11 21:19:44 +08:00
f946197473 [Docs] Fixes a typo in the qwen3next model name. (#24654) Tao He 2025-09-11 19:35:14 +08:00
0cd72a7b72 [XPU] add missing dependency tblib for XPU CI (#24639) Fanli Lin 2025-09-11 19:22:33 +08:00
5f5271f1ee Move LoRAConfig from config/__init__.py to config/lora.py (#24644) Harry Mellor 2025-09-11 12:01:38 +01:00
d6249d0699 Fix typing for safetensors_load_strategy (#24641) Harry Mellor 2025-09-11 11:41:39 +01:00
25bb9e8c65 [CI Failure] fix models/language/pooling/test_auto_prefix_cache_support.py (#24636) wang.yuqi 2025-09-11 18:31:23 +08:00
a1213fae5f [Misc] Add @NickLucche to codeowners (#24647) Nicolò Lucchesi 2025-09-11 11:18:09 +02:00
a8b0361c92 [CI] Split pooling from entrypoints Test (#24632) wang.yuqi 2025-09-11 16:53:09 +08:00
ed5ae4aace [Bugfix] Fix _synced_weight_loader (#24565) Kyuyeun Kim 2025-09-11 01:52:33 -07:00
0fc36463e0 [CI]Add transformers_utils to Async Engine, Inputs, Utils, Worker Test (#24615) Xingyu Liu 2025-09-11 01:52:10 -07:00
d14c4ebf08 [Docs] Use 1-2-3 list for deploy steps in deployment/frameworks/ (#24633) Michael Yao 2025-09-11 16:50:12 +08:00
ba6011027d [Docs] Update V1 doc to reflect whisper support (#24606) Russell Bryant 2025-09-11 04:50:08 -04:00
85df8afdae [Docs] Revise frameworks/anything-llm.md (#24489) Michael Yao 2025-09-11 16:50:05 +08:00
6aeb1dab4a [Bugfix] Fix incorrect import of CacheConfig (#24631) Cyrus Leung 2025-09-11 16:48:25 +08:00
e93f4cc9e3 Add the support for the qwen3 next model (a hybrid attention model). (#24526) Tao He 2025-09-11 15:32:09 +08:00
2048c4e379 [torchao] Support quantization configs using module swap (#21982) Jerry Zhang 2025-09-10 23:53:24 -07:00
d13360183a Remove redundant all gather + split (#23441) Chenxi Yang 2025-09-10 23:45:07 -07:00
9bd831f501 [Model] New model support for Motif-1-Tiny (#23414) TaehyunKim 2025-09-11 15:29:40 +09:00
e2b1f863aa [Doc]: fixing doc typos (#24635) Didier Durand 2025-09-11 08:19:28 +02:00
41329a0ff9 [Core] feat: Add --safetensors-load-strategy flag for faster safetensors loading from Lustre (#24469) shengshiqi-google 2025-09-11 06:10:01 +00:00
ee0bc5e1b4 Enable --profile in 'vllm bench throughput' (#24575) Tomas Ruiz 2025-09-11 08:06:19 +02:00
3d1393f6fc Kimi K2 Fused MoE kernels Optimization configs (#24597) Saman A. Pour 2025-09-10 23:06:16 -07:00
8a894084d2 [Engine][Chore] use local variable and remove output var assignment (#24554) Guy Stone 2025-09-11 02:05:42 -04:00
e2d8c27f68 [BugFix] Fix pipeline parallel (#24621) Nick Hill 2025-09-10 23:05:30 -07:00
29799ddacc [Bugfix] Add missing VIT backend dispatch on CPU (#24623) Li, Jiang 2025-09-11 13:28:41 +08:00
f17a6aa4ec [Ultravox] Fix Gemma instantiation, support quantization via --hf-overrides (#24131) Peter Salas 2025-09-10 22:25:34 -07:00
6c8deacd72 [Bug] [Spec Decode] Fix model_initialization test and mismatch in aux_hidden_layers (#24613) Wenlong Wang 2025-09-10 21:23:18 -07:00
55b823ba0f Add @chaunceyjiang to codeowner for reasoning Reasoning and Tool parser (#24406) Chauncey 2025-09-11 12:23:04 +08:00
8c5a747246 [distributed] update known issues (#24624) youkaichao 2025-09-11 11:09:38 +08:00
5931b7e5d9 [Models][Quantization] Add quantization configuration update in Voxtral model (#24122) Alexandre Marques 2025-09-10 22:13:56 -04:00
cc99baf14d [Misc] Make timeout passable in init_distributed_environment (#24522) Jonathan Berkhahn 2025-09-10 15:41:12 -07:00
dcb28a332b [Kernel] Flashinfer MLA (trtllm-gen) decode kernel integration (#21078) Hanjie Qiu 2025-09-10 18:31:10 -04:00
fba7856581 [Perf] Warmup FlashInfer attention during startup (#23439) Michael Goin 2025-09-10 18:03:17 -04:00
b5e383cd8b [gpt-oss] raise error for flashinfer backend without trtllm (#24482) Chen Zhang 2025-09-10 14:33:13 -07:00
9a161307f5 [torch.compile][ROCm][V1] Enable attention output FP8 fusion for V1 attention backends (#19767) Gregory Shtrasberg 2025-09-10 16:59:55 -04:00
37e8182bfe [v1] Add Whisper model support (encoder-decoder) (#21088) Russell Bryant 2025-09-10 16:53:35 -04:00
4db4426404 [CI] Fail subprocess tests with root-cause error (#23795) Nick Hill 2025-09-10 13:53:21 -07:00
a0933c3bd6 [Bugfix] Enable FP8 KV cache for FlashInfer and Triton backend on non-sm100 GPUs (#24577) Thien Tran 2025-09-11 03:33:41 +08:00
09e68bce34 [Misc] update log level debug to warning when process port is used by (#24226) rongfu.leng 2025-09-11 02:32:57 +08:00
9fb74c27a7 [Core] Support configuration parsing plugin (#24277) Xingyu Liu 2025-09-10 11:32:43 -07:00
4032949630 [Bugfix] Fix DeepEP config for DP4TP4 (#23619) Ming Yang 2025-09-10 10:37:56 -07:00
08abfa78ec [Bugfix] fix modelopt exclude_modules name mapping (#24178) tomeras91 2025-09-10 20:20:46 +03:00
2bef2d1405 [Logging] allow config logging stream (#24336) Shiyan Deng 2025-09-10 08:02:01 -07:00
36cacd0958 [Doc] Add documentation for GLM-4.5 series models: tool-calling and reasoning parser (#24589) Robin 2025-09-10 22:50:55 +08:00
bb3eb80d92 [Core] Split LoRA layers (#24574) Jee Jee Li 2025-09-10 22:47:51 +08:00
fcc0a3130a [CI] Fix tensorizer test assertion (#24545) pwschuurman 2025-09-10 06:57:36 -07:00
736569da8d [Platform] Custom ops support for LMhead and LogitsProcessor (#23564) zzhxxx 2025-09-10 21:26:31 +08:00
2eb9986a2d [BugFix] python collect_env.py and vllm collect-env compatibility with uv venv (#24066) Kay Yan 2025-09-10 21:25:33 +08:00
ccee371e86 [Docs] Fix warnings in mkdocs build (continued) (#24092) Hyogeun Oh (오효근) 2025-09-10 22:23:28 +09:00
c0bd6a684a Fix Auto_Round Quatization Loading on SM75 and Lower GPUs (#24217) RoadToNowhereX 2025-09-10 23:22:31 +10:00
3144d90217 fix some typos (#24167) co63oc 2025-09-10 21:21:23 +08:00
2f5e5c18de [CI/Build] bump timm dependency (#24189) Daniele 2025-09-10 15:20:59 +02:00
bd98842c8a [CI] Add PPL test for generation models (#24485) wang.yuqi 2025-09-10 21:16:39 +08:00
d6069887c6 [rocm] enable torchao quantization for rocm (#24400) Lifans 2025-09-10 06:16:21 -07:00
492196ed0e [CI/Build] split true unit tests to Entrypoints Unit Tests (#24418) Ye (Charlotte) Qi 2025-09-10 06:16:07 -07:00
f4f1a8df22 [BugFix] Ensure integrity of reused CPU tensors during async scheduling (#24527) Nick Hill 2025-09-10 06:15:14 -07:00
0b9a612fa3 [BugFix][easy] Fix flaky test test_gpt_oss_multi_turn_chat (#24549) lacora 2025-09-10 06:14:55 -07:00
4c04eef706 [BugFix][Multi Modal] Fix TensorSchema shape mismatch in Molmo (#24559) Wenlong Wang 2025-09-10 06:14:27 -07:00
f36355abfd Move LoadConfig from config/__init__.py to config/load.py (#24566) Harry Mellor 2025-09-10 14:14:18 +01:00
9e3c3a7df2 [LoRA]: Add LoRA support to Mistral's Voxtral models (#24517) Yash Pratap Singh 2025-09-10 18:42:03 +05:30
6cbd41909e Feature/vit attention unification# 23880 (#23978) baonudesifeizhai 2025-09-10 09:10:14 -04:00
72d30108a0 Support for NemotronH Nano VLM (#23644) danielafrimi 2025-09-10 16:10:06 +03:00
8b83b93739 [Docs] Document the extra memory footprint overhead when using EPLB (#24537) Tyler Michael Smith 2025-09-10 09:09:49 -04:00
9dbefd88e9 [Docs] Improve organisation of API Reference nav (#24569) Harry Mellor 2025-09-10 14:08:21 +01:00
7c195d43da [ROCm][Bugfix] Fix Aiter RMSNorm (#23412) vllmellm 2025-09-10 21:08:03 +08:00
0ae43dbf8c [Attention] add DCP support for FLASH_ATTN_MLA backend (#24453) Lucas Wilkinson 2025-09-10 05:19:26 -04:00
267c80d31f [Model] Limit CPU threads for image transformations in InternVL to reduce cpu contention. (#24519) li-jinpeng 2025-09-10 16:45:44 +08:00
77f62613f9 Consolidate rendering parameters into RenderConfig dataclass (#24543) Flora Feng 2025-09-10 01:44:47 -07:00
feaf202e93 [Bugfix] Guard _may_reorder_batch for encoder-only models on CPU (#24319) (#24348) Remy 2025-09-10 15:24:42 +09:00
91130ae376 [docs] promo pytorch conf and ray summit (#24562) Simon Mo 2025-09-09 23:24:20 -07:00
e40827280b [Docs] Enable relative links in examples to function when rendered in the docs (#24041) Harry Mellor 2025-09-10 05:40:45 +01:00
4377b1ae3b [Bugfix] Update Run:AI Model Streamer Loading Integration (#23845) pwschuurman 2025-09-09 21:37:17 -07:00
009d689b0c [Core] Simplify and unify mm uuid handling & auto-generated mm hash overrides processing. (#24271) Chenheli Hua 2025-09-09 21:36:09 -07:00
0efdb5c3ba [gpt-oss] Cache permute indices for faster MXFP4 MoE layer loading (#24154) Wei 2025-09-09 21:27:53 -07:00
53b42f4102 [BugFix][Spec Decode] Fix out-of-range index triggered by eagle3; re-enable test for LlamaForCausalLMEagle3 (#24392) Wenlong Wang 2025-09-09 21:24:23 -07:00
309d7aa401 [P/D] MultiConnector supports shutdown (#24425) Chauncey 2025-09-10 12:24:11 +08:00
b4a01aaf95 [KV Connector] More async support for get_num_new_matched_tokens (#23620) Yihua Cheng 2025-09-09 21:23:37 -07:00
83dd28aae4 [CI] Adjust threshold for flaky ngram spec decoding test (#24528) Nick Hill 2025-09-09 21:07:33 -07:00
f88e84016f [BugFix] Fix async core engine client finalizer (#24540) Nick Hill 2025-09-09 21:07:13 -07:00
3c2156b3af [Hardware][Apple-CPU] Enable native bfloat16 on Apple Silicon (M2 and later) (#24129) Ignacio Sica 2025-09-10 00:50:21 -03:00
7e7db04310 [CI] Retry flaky fp8 cutlass mla tests (#24536) Nick Hill 2025-09-09 20:33:10 -07:00
41f160b974 Add @heheda12345 to CODEOWNERS of KVCacheManager related code (#24546) Chen Zhang 2025-09-09 20:30:32 -07:00
dc625ea6b8 [Perf] Convert np array to torch tensor to index into block table for attn chunking (#24474) Yong Hoon Shin 2025-09-09 20:01:06 -07:00
b23fb78623 [Bugfix] Fix for 24530. Fix naive all2all shared expert overlap. (#24538) bnellnm 2025-09-09 20:53:53 -04:00
