Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

84d57342b6 [BugFix][MM] Fix Nonetype error when video is cache in qwen2.5-omni-thinker (#26004) Wenlong Wang 2025-10-01 01:03:25 -07:00
57b46d769e [Doc] updating torch.compile doc link (#25989) nadathurv 2025-10-01 00:04:56 -07:00
f48b6a03ba [Misc]allow disable pynccl (#25421) Lucia Fang 2025-09-30 23:04:13 -07:00
e4beabd2c8 [BugFix] Fix default kv-cache-dtype default for DeepseekV3.2 (#25988) v0.11.0rc4 Lucas Wilkinson 2025-10-01 00:58:31 -04:00
febb688356 [Bugfix] Fix __syncwarp on ROCM (#25996) Zhewen Li 2025-09-30 21:15:11 -07:00
a1825fe645 [MM] Add text-only mode for Qwen3-VL (#26000) Roger Wang 2025-09-30 21:13:42 -07:00
bab9231bf1 [Model] MTP fallback to eager for DeepSeek v32 (#25982) Lucia Fang 2025-09-30 18:53:22 -07:00
c214d699fd [spec decode] Consolidate speculative decode method name for MTP (#25232) qizixi 2025-09-26 15:27:05 -07:00
c3dfb0f6dd [Bench] Add DeepSeekV32 to MoE benchmark (#25962) Jee Jee Li 2025-10-01 05:13:48 +08:00
83f3c9beae [bugfix][deepseek] fix flashmla kernel selection (#25956) youkaichao 2025-10-01 00:30:36 +08:00
d0b178cef1 [NIXL] Add support for MLA caches with different latent dim (#25902) Nicolò Lucchesi 2025-09-30 14:18:29 +02:00
b3230e1ac0 [New Model] DeepSeek-V3.2 (Rebased to Main) (#25896) Yongye Zhu 2025-09-30 05:14:41 -04:00
03df0fb5d2 [BugFix] Fix DP/EP hang (#25906) Lucas Wilkinson 2025-09-30 00:18:59 -04:00
9471879bd4 [Bug] Fix Weight Loading for Block FP8 Cutlass SM90 (#25909) Wentao Ye 2025-09-29 21:15:19 -04:00
ab5b6459df [Bugfix] Fallback ViT attn backend to SDPA for blackwell (#25851) Roger Wang 2025-09-28 23:03:51 -07:00
2a69ab4899 Update to Transformers v4.56.2 (#24638) Harry Mellor 2025-10-01 06:07:07 +01:00
8d7da92fd7 [BugFix] Fix default kv-cache-dtype default for DeepseekV3.2 (#25988) Lucas Wilkinson 2025-10-01 00:58:31 -04:00
e952eee698 [Bugfix] Fix __syncwarp on ROCM (#25996) Zhewen Li 2025-09-30 21:15:11 -07:00
66bca9b8bd [MM] Add text-only mode for Qwen3-VL (#26000) Roger Wang 2025-09-30 21:13:42 -07:00
99028fda44 Fix INT8 quantization error on Blackwell GPUs (SM100+) (#25935) Param 2025-09-30 22:19:53 -04:00
1244948885 [Log] Optimize Log for FP8MOE (#25709) Wentao Ye 2025-09-30 22:18:43 -04:00
a73f6491c8 Update launch_bounds_utils.h for correct compile on Multiple Cuda Arch - PTXAS out of range Warning (#25843) Salvatore Cena 2025-10-01 04:18:19 +02:00
001e50c92c [Model] MTP fallback to eager for DeepSeek v32 (#25982) Lucia Fang 2025-09-30 18:53:22 -07:00
96ebcaa3ad [Misc] Make EP kernels install script support uv (#25785) Lucas Wilkinson 2025-09-30 19:38:34 -04:00
5db1870bb9 [gpt-oss] use vLLM instead of openai types for streaming (#25186) Andrew Xia 2025-09-30 15:47:07 -07:00
2ce26b9b5d [Docs] Remove API Reference from search index (#25949) Harry Mellor 2025-09-30 23:10:02 +01:00
a388252ac4 Add explicit pooling classes for the Transformers backend (#25322) Harry Mellor 2025-09-30 23:07:06 +01:00
9a9f48dff7 [V1] [P/D] Add Support for KV Load Failure Recovery (#19330) David Ben-David 2025-10-01 00:57:08 +03:00
67f3fb0844 [Bench] Add DeepSeekV32 to MoE benchmark (#25962) Jee Jee Li 2025-10-01 05:13:48 +08:00
43b752c325 [Llama4] [multimodal] Fix misplaced dtype cast of cos_sin_cache in Llama4VisionRotaryEmbedding (#25889) cjackal 2025-10-01 05:35:15 +09:00
cfd302db9b OffloadingConnector: Fix GPU block tracking bug (#25856) Or Ozeri 2025-09-30 22:53:04 +03:00
fb610ae684 [Docs] Add moe kernel features doc (#25297) bnellnm 2025-09-30 15:03:15 -04:00
2f652e6cdf [Doc] Improve MM Pooling model documentation (#25966) Cyrus Leung 2025-10-01 02:58:29 +08:00
e6a226efba [Bug] Fix AttributeError: 'QKVParallelLinear' object has no attribute 'orig_dtype' (#25958) Wentao Ye 2025-09-30 14:13:03 -04:00
a2e6fa7e03 [bugfix][deepseek] fix flashmla kernel selection (#25956) youkaichao 2025-10-01 00:30:36 +08:00
9f1c4ecaf2 [Bugfix] Token type and position embeddings fail to be applied to inputs_embeds (#25922) Cyrus Leung 2025-10-01 00:23:12 +08:00
ef283548f7 [Bugfix] Fix accuracy issue of TRTLLM FP8 MOE and improve logging (#25895) Pavani Majety 2025-09-30 07:51:31 -07:00
f4db5e6de1 [Bugfix][Model] Fix inference for Hunyuan dense models (#25354) Anion 2025-09-30 22:38:07 +08:00
099aaee536 Add Hugging Face Inference Endpoints guide to Deployment docs (#25886) Sergio Paniego Blanco 2025-09-30 16:35:06 +02:00
35fe398c7c [Kernel][Moe Configs] Add more tuned triton configs for ExpertsInt8 and FP8 (#25858) Asaf Joseph Gardin 2025-09-30 17:30:44 +03:00
bb6d43047e [Fix] Improve CPU backend compatibility for RISC-V (#25816) ihb2032 2025-09-30 21:48:07 +08:00
bc546f76a1 [CI] Move applicable tests to CPU (#24080) Reza Barazesh 2025-09-30 09:45:20 -04:00
80608ba5af [NIXL] Add support for MLA caches with different latent dim (#25902) Nicolò Lucchesi 2025-09-30 14:18:29 +02:00
e184c9c510 [perf] Use CPU tensor to reduce GPU->CPU sync (#25884) Lehua Ding 2025-09-30 19:51:16 +08:00
d7e34b4210 [Model] Move vision_feature_select_strategy into resolve_visual_encoder_outputs (#25938) Cyrus Leung 2025-09-30 19:24:57 +08:00
ef6e0e7132 [Bugfix][Model]fix ernie45 moe gate&bias dtype to float32 (#25936) CSWYF3634076 2025-09-30 19:11:21 +08:00
1ad3aca682 Updated TRL integration docs (#25684) Sergio Paniego Blanco 2025-09-30 12:10:55 +02:00
8d0afa9b42 [Doc] Add Cambricon MLU support (#25942) a120092009 2025-09-30 17:59:47 +08:00
fa7e254a7f [New Model] DeepSeek-V3.2 (Rebased to Main) (#25896) Yongye Zhu 2025-09-30 05:14:41 -04:00
e23cacda35 [Bugfix]: Clean up chunked prefill logging when using whisper (#25075) Simon Danielsson 2025-09-30 10:17:49 +02:00
2e1b8bc2b6 [Model][Bugfix] Fix MiDashengLM audio encoder mask by removing incorrect logical_not (#25925) Zhou Jiahao 2025-09-30 16:15:23 +08:00
e47433b3c1 [BugFix] Pass config_format via try_get_generation_config (#25912) acisseJZhong 2025-09-29 22:09:50 -07:00
23194d83e8 [BugFix] Fix DP/EP hang (#25906) Lucas Wilkinson 2025-09-30 00:18:59 -04:00
61aedb5ffe Move VllmConfig from config/__init__.py to config/vllm.py (#25271) Harry Mellor 2025-09-30 03:49:49 +01:00
d3bd171123 [Benchmark] Support benchmark throughput for external launcher DP (#25913) Zhuohan Li 2025-09-29 18:43:57 -07:00
89e4050af4 [Bug] Fix Weight Loading for Block FP8 Cutlass SM90 (#25909) Wentao Ye 2025-09-29 21:15:19 -04:00
78a47f87ce Test Prompt Embeds/LoRA compatibility and Enable LoRA Support for OPT Models (#25717) Andrew Sansom 2025-09-29 19:10:58 -05:00
6a113d9aed [V0 Deprecation] Remove vllm.worker and update according imports (#25901) Aaron Pham 2025-09-29 19:26:11 -04:00
2e4fe48c37 [NIXL] Increase default KV block eviction timeout on P (#25897) Nicolò Lucchesi 2025-09-29 23:35:14 +02:00
8eb0a1d906 [Doc] Polish example for torchrun dp (#25899) Zhuohan Li 2025-09-29 14:31:34 -07:00
fea3e476aa [Kernel] Chunk-aligned mamba2 (#24683) Thomas Parnell 2025-09-29 23:18:25 +02:00
61a3431613 [Bugfix][ROCm] Fixing trying to import non-existent symbols from libnccl.so (#25605) Gregory Shtrasberg 2025-09-29 17:01:50 -04:00
9bedac9623 [Doc] Add documentation for vLLM continuous benchmarking and profiling (#25819) Naman Lalit 2025-09-29 13:49:49 -07:00
c42ff4f4fd [BugFix][torch.compile] KV scale calculation issues with FP8 quantization (#25513) Adrian Abeyta 2025-09-29 14:52:04 -05:00
d5ab28511c [Bugfix] Use correct key "ignore" for config.json non-quantized layers (#25706) Lee Nau 2025-09-29 12:07:29 -07:00
e61eb5e09d [Model] Remove MotifForCausalLM (#25866) Jee Jee Li 2025-09-30 00:36:30 +08:00
0899ba5b42 [CI/Build] Include Transformers backend test in nightly transformers test (#25885) Isotr0py 2025-09-30 00:33:39 +08:00
145ac73317 [Bugfix][Speculative Decoding] Fix Eagle3 quantization config issue (#25883) Rahul Tuli 2025-09-29 21:07:20 +05:30
d0d138bc55 [Nixl][P/D] Add cuda2cpu support (HD->DH transfer) (#24690) Chenxi Yang 2025-09-29 07:31:51 -07:00
43227236ec [torch.compile] serialize cudagraph_mode as its enum name instead of value (#25868) Jiangyun Zhu 2025-09-29 21:54:52 +08:00
8616300ae2 [Model][Bugfix] Fix issues in MiDashengLM implementation for quantized models (#25854) Zhou Jiahao 2025-09-29 18:59:04 +08:00
edbaadd91f [Bugfix] Fix requirements paths in install instructions (#25827) Yingjun Mou 2025-09-29 03:49:35 -07:00
9360d34fa1 update to latest deepgemm for dsv3.2 (#25871) youkaichao 2025-09-29 17:51:43 +08:00
1b67b04656 [Misc] Remove more get_input_embeddings_v0 (#25857) Cyrus Leung 2025-09-29 16:03:37 +08:00
bd51f78e39 [V0 Deprecation][Models] Remove all V0 condition for mm embeddings merge (#25331) Isotr0py 2025-09-29 14:09:18 +08:00
65ecb4f134 [Bugfix] Fallback ViT attn backend to SDPA for blackwell (#25851) Roger Wang 2025-09-28 23:03:51 -07:00
8ce5d3198d [P/D] NIXL Updates (#25844) v0.11.0rc3 Robert Shaw 2025-09-29 00:46:30 -04:00
09c2cbc04a [Bugfix] fix Qwen3VLMoe load when pp > 1 (#25838) JJJYmmm 2025-09-29 01:56:12 +08:00
143844fa43 [XPU]Fix xpu spec decoding UTs, avoid using cuda graph (#25847) Kunshang Ji 2025-09-29 13:15:10 +08:00
219cfbe7f6 Add Phi4FlashForCausalLM to _PREVIOUSLY_SUPPORTED_MODELS (#25832) Thomas Parnell 2025-09-29 07:08:17 +02:00
9b44a7d926 [P/D] NIXL Updates (#25844) Robert Shaw 2025-09-29 00:46:30 -04:00
a3ae45a38c [Misc] fix tests failure by using current_platform (#25825) Juechen Liu 2025-09-28 21:18:57 -07:00
0307428d65 Remove redundant cudagraph dispatcher warning (#25841) Michael Goin 2025-09-28 17:12:42 -04:00
471997adf6 [Bugfix] fix Qwen3VLMoe load when pp > 1 (#25838) JJJYmmm 2025-09-29 01:56:12 +08:00
b1ded114b9 Update GLM-4.5 Doc transformers version (#25830) Yuxuan Zhang 2025-09-28 20:05:51 +08:00
f4e4088c99 Fix random dataset mismatched token length with config. (#24937) weiliang 2025-09-28 16:23:44 +08:00
4c347044c9 [VLM] Update Qwen3-VL max_num_video_tokens calculation for configurable video profiling (#25557) v0.11.0rc2 Isotr0py 2025-09-28 12:21:01 +08:00
19e7ab7315 [Bugfix] Fix Qwen3-VL regression from #24982 (#25814) Roger Wang 2025-09-27 20:21:09 -07:00
6de3d431d9 [MM] Optimize memory profiling for scattered multimodal embeddings (#25810) Roger Wang 2025-09-27 19:17:58 -07:00
b14773bd64 [Bugfix][NIXL] Fix Async Scheduler timeout issue (#25808) Nicolò Lucchesi 2025-09-27 21:17:35 +02:00
26a7a33b88 [Bugfix][WideEP] Apply TP Attn + EP MoE fix to other models (#24982) Tyler Michael Smith 2025-09-27 10:22:28 -04:00
5aa5811a16 [CI] Fix FlashInfer AOT in release docker image (#25730) Michael Goin 2025-09-26 17:11:40 -04:00
c2fa2d4dc9 [Bugfix] Allow Only SDPA Backend for ViT on B200 for Qwen3-VL (#25788) Wentao Ye 2025-09-26 23:44:52 -04:00
32335c8b34 Add option to restrict media domains (#25783) Russell Bryant 2025-09-26 21:23:52 -04:00
04c2b26972 Add filtering for chat template kwargs (#25794) Russell Bryant 2025-09-27 06:46:49 -04:00
ee10d7e6ff Validate API tokens in constant time (#25781) Russell Bryant 2025-09-27 06:09:26 -04:00
bb79c4da2f Reduce the Cuda Graph memory footprint when running with DBO (#25779) Sage Moore 2025-09-26 15:29:56 -07:00
0efd540dbc [VLM] Update Qwen3-VL max_num_video_tokens calculation for configurable video profiling (#25557) Isotr0py 2025-09-28 12:21:01 +08:00
6144754014 [Bugfix] Fix Qwen3-VL regression from #24982 (#25814) Roger Wang 2025-09-27 20:21:09 -07:00
69311446ba [MM] Optimize memory profiling for scattered multimodal embeddings (#25810) Roger Wang 2025-09-27 19:17:58 -07:00
