Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

4c23690f43 [Attention] FlashAttention ViT support, make default backend (#28763) Matthew Bonanni 2025-11-18 23:06:21 -05:00
814843e021 Enable bitsandbytes quantization on AMD GPUs that use warp size 32 (#27307) Strahinja Stamenkovic 2025-11-19 04:12:31 +01:00
20852c8f4c [CPU] Refactor CPU WNA16 (#28826) Li, Jiang 2025-11-19 10:32:00 +08:00
40b6b38f2c [Core] Switch Flat logprob control from environment variable to SamplingParams (#28914) Jialin Ouyang 2025-11-18 18:10:02 -08:00
da94c7c0eb Move online quantization to model.load_weights (#26327) Jerry Zhang 2025-11-18 16:52:41 -08:00
1395461f5f [Hybrid][torch.compile] Refactor mamba2 forward to avoid obscuring linear projections under custom op (#28587) tomeras91 2025-11-19 02:49:36 +02:00
9912b8ccb8 [Build] Add OpenAI triton_kernels (#28788) Varun Sundar Rabindranath 2025-11-18 19:45:20 -05:00
49ef847aa8 [NVIDIA] Guard SM100 CUTLASS MoE macro to SM100 builds v2 (#28938) Johnny 2025-11-19 01:44:27 +01:00
67745d189f Supress verbose logs from model_hosting_container_standards (#28949) Michael Goin 2025-11-18 15:29:06 -05:00
2a2d5d2780 Replace torch.cuda.Event with torch.Event for better hardware compatibility (#26985) Kunshang Ji 2025-11-19 03:34:36 +08:00
c3e2978620 [NIXL] fix cpu PD after physical <> logical block_size PR (#28904) Chendi.Xue 2025-11-18 13:03:23 -06:00
e4bb2684bc [Models] Replace all nn.Conv2d with vLLM's Conv2dLayer (#28842) Isotr0py 2025-11-19 02:56:04 +08:00
c64c0b78de [chore] Move the rest of wikimedia url to S3 (#28921) Kevin H. Luu 2025-11-18 09:44:18 -08:00
0af3d4f0df [FEAT] [AITER] [ROCm] integrate aiter sampling ops (#26084) vllmellm 2025-11-19 01:28:34 +08:00
da8dadf68b [Minor] Rename ec_producer field to is_ec_producer (#28884) Nick Hill 2025-11-18 09:26:07 -08:00
f226a3f0c1 [CI][NIXL] Change default block_size for tests (#28927) Nicolò Lucchesi 2025-11-18 18:22:30 +01:00
c2612371ad [Model] Add Gemma3 GGUF multimodal support (#27772) Luciano Martins 2025-11-18 13:56:29 -03:00
49a986ecd4 [Benchmark] multi_turn: Report warmup-inclusive runtime (#28937) Ido Segev 2025-11-18 18:38:22 +02:00
f6aa122698 [CI Sprint] Quantization CI Cleanup (#24130) Alex 2025-11-18 08:21:48 -06:00
184b12fdc6 [Bugfix][NIXL] Fix block_size_ratio when logical !=physical blocks (#28925) Nicolò Lucchesi 2025-11-18 15:07:50 +01:00
b9489f51e1 [Model][Perf] Use cos and sin cache in QwenVL (#28798) Canlin Guo 2025-11-18 19:51:54 +08:00
285eaa4285 [Bugfix] Safeguard against missing backend in AttentionBackendEnum (#28846) Song Zhixin 2025-11-18 18:53:44 +08:00
439368496d [BugFix] Fix PP/async scheduling with pooling models (#28899) (tag: v0.11.1) Nick Hill 2025-11-18 00:20:45 -08:00
896e41ae04 [CI/Build] Replace wikipedia url with local server ones (#28908) Isotr0py 2025-11-18 16:10:55 +08:00
5bb1da5190 [MISC] Remove format.sh (#28906) Kuntai Du 2025-11-18 13:28:31 +08:00
5bdd155277 [CI] Fix async scheduling + spec decoding test flake (#28902) Nick Hill 2025-11-17 21:26:32 -08:00
0168f69e50 [Misc] Remove unnecessary parentheses from log statements (#28897) Ning Xie 2025-11-18 12:33:46 +08:00
083cf326dc [Doc]: fix typos in various files (#28863) Didier Durand 2025-11-18 05:32:14 +01:00
bf9e1e8767 [Bugfix] Fix wrong CLI defaults for dynamic SchedulerConfig fields (#28872) Cyrus Leung 2025-11-18 12:30:29 +08:00
3ddcf46011 [Refactor] Remove Unused Func in Batch Invariant (#28881) Wentao Ye 2025-11-17 23:29:29 -05:00
d0a73620cc [ROCm][Quantization] add apply_vllm_mapper in quark config for models like gpt-oss (#28638) xuebwang-amd 2025-11-18 11:16:45 +08:00
88ab591f0b Run macos smoke test workflow on main commit (#28752) Michael Goin 2025-11-17 22:16:03 -05:00
b6e04390d3 [Bugfix] Fix Kimi-K2 tool parser concatenated tool calls parsing (#28831) Benjamin Bartels 2025-11-18 03:13:25 +00:00
552cac95b5 [Misc] Fix wrong comment in scheduler (#28880) Zhuohan Li 2025-11-17 15:32:22 -08:00
61485844fc [BugFix] Corner case that could cause out-of-sync with external launcher mode and dp >1 (#28774) Bangsheng Tang 2025-11-17 15:22:11 -08:00
f77bce001a [Model] Add Afmoe architecture implementation (#28332) Pranav 2025-11-17 15:11:20 -08:00
a289cc1dde [Test] Batch Invariant: Rename and organize tests (#27421) Wentao Ye 2025-11-17 18:09:47 -05:00
95ae50b7d1 [Quantization] [Eagle] Add complete quantization support to the draft model in Eagle (#28435) Shreyas Kulkarni 2025-11-17 18:01:34 -05:00
7765e5ba75 [BugFix] Fix PP performance and PP kv connector output regression (#28768) Nick Hill 2025-11-17 14:08:50 -08:00
d8874c61a5 [Core] Async Scheduling X Spec Decoding Compatibility (#24799) Ronald 2025-11-18 04:16:20 +08:00
f8b19c0ffd [Bugfix] Fix GPT-OSS on AMD after #28603 (#28816) Zhewen Li 2025-11-17 10:15:26 -08:00
e42bd8c2e3 Cast return value to int64_t for cache size (#28814) tiehexue 2025-11-18 00:02:32 +08:00
7f064491f8 [Bugfix][Perf] Revert applying HF processor on text-only inputs for multimodal models (#28858) Roger Wang 2025-11-17 06:49:25 -08:00
64e39d667c [BugFix] Temporary fix for IMA with MTP = 2 and full-cg (#28315) Lucas Wilkinson 2025-11-17 09:41:22 -05:00
1b82fb0ad3 [XPU] work around for sp, avoid custom op import error (#28822) Kunshang Ji 2025-11-17 21:16:44 +08:00
d4acf518d0 [Metrics] Fix KV cache usage percent metric multiproc (#28792) Jae-Won Chung 2025-11-17 04:54:15 -05:00
ab01cd14e5 [BugFix] Fix glm4_moe_mtp load weights bug (#28805) wuyaoxuehun 2025-11-17 16:13:11 +07:00
577bb34fff [CPU][Bugfix] Fix _to_list in CPU model runner (#28824) Li, Jiang 2025-11-17 15:47:24 +08:00
3380ed5e11 [Doc] Add llama4 LoRA tag (#28825) Jee Jee Li 2025-11-17 14:08:48 +08:00
6f37419244 [Bugfix][Model] Prevent special token leakage in KimiK2ToolParser streaming mode (#28543) Jay Caldwell 2025-11-16 23:54:46 -06:00
60e089f0b9 [ROCm][Qwen3-32B] Fix AITER MHA accuracy issue cause by #25763 (#28670) Xiake Sun 2025-11-17 12:52:11 +08:00
d64429bb36 [NIXL][XPU] update install script of NIXL (#28778) liuzhenwei 2025-11-17 11:01:33 +08:00
561253b37f [Performance][Fix] update nvfp4 code to support renorm routing (#28569) jiahanc 2025-11-16 18:02:42 -08:00
80b6080ddc [BugFix] Fix async scheduling + chunked prefill + preemption (#28787) Nick Hill 2025-11-16 14:46:46 -08:00
03ee48111d Feature: Support Relu2 in FusedMoE fp8 cutlass path (#27261) amirkl94 2025-11-16 20:39:44 +02:00
5a87076d6e [Model][QwenVL] Optimize Qwen2_5_VisionAttention q,k preparation (#28769) Lukas Geiger 2025-11-16 17:37:15 +00:00
ac1daf3233 fix comment typo (#28802) Ning Xie 2025-11-17 01:03:21 +08:00
63fed55506 [Doc]: fix typos in various files (#28811) Didier Durand 2025-11-16 15:30:06 +01:00
8d259fad6c Fix gpt oss weight loading with EP + bf16 (#28765) Anna Shors 2025-11-16 05:12:45 -08:00
3bc1175798 [Bugfix] Fix host and port join for ipv6 in bench serve (#28679) scottzh8 2025-11-16 02:20:57 -08:00
af02c40970 Fixed gpt-oss _load_weights_other() parameter position bug (#28715) Dezhan 2025-11-16 01:46:29 -08:00
b316ac6589 [V1] Support MP Executor for multi node distributed inference (#23691) Lucia Fang 2025-11-16 01:01:21 -08:00
a55b64635c [Model] Allow users to control skip reading cache per request. (#28194) wang.yuqi 2025-11-16 16:04:50 +08:00
d231876ce3 [Benchmark] Fix client seed synchronization in multi-turn benchmark (#28512) ai-jz 2025-11-15 23:04:32 -08:00
f67299f66d [compile] Enable sequence parallelism matching w/o custom ops enabled (#27126) (tag: v0.11.1rc7) Angela Yi 2025-11-15 03:46:12 -08:00
5f6666fb5a LLaMA4 LoRA Adapter Enablement (#28602) Fardin Hoque 2025-11-14 10:27:56 -08:00
66a62d73da [Bugfix][Nixl] Fix kernel physical<>logical block_size issue (#28677) Nicolò Lucchesi 2025-11-14 15:40:05 +01:00
c505dd6b61 [BugFix] Fix FA3 IMA with FULL_AND_PIECEWISE and cascade attention (default) (#28702) Lucas Wilkinson 2025-11-14 07:19:22 -05:00
f7adf64aac [BugFix] Fix multi-modal async scheduling race condition (#28706) Nick Hill 2025-11-14 01:11:13 -08:00
240d6b1758 [Bugfix] fix dots.ocr pp support (#28705) Jiangyun Zhu 2025-11-14 17:01:26 +08:00
b315ba9052 [Misc] Update xformers to 0.33.0.post1 (#28678) Roger Wang 2025-11-13 21:52:53 -08:00
9b24cf6f47 [bugfix] correct local_chunk_len for DCP in reorg_kvcache with long context (#28526) Qiu 2025-11-14 03:29:22 +08:00
facbc2c21e [BugFix] Ensure EngineArgs.create_engine_config is idempotent (#28515) Nick Hill 2025-11-13 09:14:08 -08:00
e2fd9a2edf [Misc] Turn off encoder torch compile by default (#28634) Roger Wang 2025-11-13 08:38:08 -08:00
1326f17492 Use official xformers-0.0.33 built for PT 2.9 (#28600) Huy Do 2025-11-12 22:48:53 -08:00
caf412e593 Skip models that cannot currently init on Transformers v5 (#28471) Harry Mellor 2025-11-12 23:43:57 +00:00
a035b5cffb [CI] Skip "Multi-Modal Models Test (Extended) 3" test that's broken in current Transformers (#28559) Harry Mellor 2025-11-12 19:38:13 +00:00
5b4dcecdd7 Remove deprecated fields from CompilationConfig (#27593) Harry Mellor 2025-11-12 16:10:28 +00:00
609bb244bd [Performance] Cache loaded custom logitsprocs to avoid overheads (#28462) Isotr0py 2025-11-12 08:49:29 +08:00
3a9ea77c35 [Bugfix] Fix max image size for PaddleOCR-VL (#28442) Roger Wang 2025-11-11 00:07:24 -08:00
28a82bb5e6 [Bugfix] Fix Stream Sync for Shared Expert Overlap (#28430) Robert Shaw 2025-11-11 00:59:08 -05:00
2a21f3e7c2 Only register rocm_aiter_ops if aiter is found (#28428) Michael Goin 2025-11-10 19:53:24 -07:00
ab625ba2fc [CI/Test Fix] Fix CP tests on Blackwell (#28404) Lucas Wilkinson 2025-11-10 20:36:29 -05:00
324c8cbd79 [Feature] Refactor batch invariant fp8 DeepGEMM (#27606) Wentao Ye 2025-11-10 19:08:40 -05:00
75ecaf48fe [Bugfix] Ensure calculated KV scales are applied in attention. (#27232) Adrian Abeyta 2025-11-10 17:42:37 -06:00
f849ee739c Adding a benchmark for batch invariance (#28161) Bram Wasti 2025-11-16 00:22:17 -05:00
be263f7645 [BugFix] Fix AssertionError: DCP not support reorder_batch_threshold > 1 now. (#28751) Lucas Wilkinson 2025-11-15 17:35:06 -05:00
2bb4435cb7 [Doc]: fix typos in various files (#28567) Didier Durand 2025-11-15 20:27:50 +01:00
07cadab27a [Model][Qwen3VL] Cache positional embedding indices (#28475) Lukas Geiger 2025-11-15 19:03:09 +00:00
637f292196 [CI] Fix broken pipeline (#28781) Nick Hill 2025-11-15 08:44:14 -08:00
e439c784fa Add support for Eagle with separate lm-head and embed_tokens layers (#28549) Eldar Kurtić 2025-11-15 15:12:02 +01:00
085a525332 [Model] Fix lmhead init bug of bailing_moe (#28777) hwhaokun 2025-11-15 21:44:12 +08:00
89d3679221 [Doc] Fix failing doc build (#28772) Cyrus Leung 2025-11-15 21:33:27 +08:00
cb15ee28db Allow Gemma3 to take image embeddings (#28483) tingtinggithub 2025-11-15 04:18:08 -08:00
f36292dbee [compile] Enable sequence parallelism matching w/o custom ops enabled (#27126) Angela Yi 2025-11-15 03:46:12 -08:00
173b356abf [PERF] Remove TRTLLM Gen attn kernel limitation max_seq_len <=131072 (#28755) Vadim Gimpelson 2025-11-15 14:13:41 +04:00
638e4196d1 [Misc] Make SchedulerConfig.max_model_len init-only (#28733) Cyrus Leung 2025-11-15 17:59:31 +08:00
1ec978c209 [Kernel][Moe Configs] llama4 maverick fp8 moe config tp8 on mi325 (#28709) Zhewen Li 2025-11-15 01:10:48 -08:00
74b5267d3a Use narrow over indexing in hadacore_transform to prep for ABI stable (#28756) Jane (Yuan) Xu 2025-11-15 04:10:15 -05:00
dd6ac1c2bb [RL] [V1] Remove unused device argument from reset_kv_cache (#28766) Zhuohan Li 2025-11-14 23:59:42 -08:00
