Commit Graph - vllm (Gitea)

biondizzle/vllm


Branches and tags:

cmm

main

ci/build/22474

submission

v0.1.0

v0.1.1

v0.1.2

v0.1.3

v0.1.4

v0.1.5

v0.1.6

v0.1.7

v0.10.0

v0.10.0rc1

v0.10.0rc2

v0.10.1

v0.10.1.1

v0.10.1rc1

v0.10.2

v0.10.2rc1

v0.10.2rc2

v0.10.2rc3

v0.11.0

v0.11.0rc1

v0.11.0rc2

v0.11.0rc3

v0.11.0rc4

v0.11.0rc5

v0.11.0rc6

v0.11.1

v0.11.1rc0

v0.11.1rc1

v0.11.1rc2

v0.11.1rc3

v0.11.1rc4

v0.11.1rc5

v0.11.1rc6

v0.11.1rc7

v0.11.2

v0.12.0

v0.13.0

v0.13.0rc1

v0.13.0rc2

v0.13.0rc3

v0.13.0rc4

v0.14.0

v0.14.0rc0

v0.14.0rc1

v0.14.0rc2

v0.14.1

v0.15.0

v0.15.0rc0

v0.15.0rc1

v0.15.0rc2

v0.15.0rc3

v0.15.1

v0.15.1rc0

v0.15.1rc1

v0.15.2rc0

v0.16.0

v0.16.0rc0

v0.16.0rc1

v0.16.0rc2

v0.16.0rc3

v0.16.1rc0

v0.17.0

v0.17.0rc0

v0.17.0rc1

v0.17.1

v0.17.1rc0

v0.17.2rc0

v0.18.0

v0.18.0rc0

v0.18.0rc1

v0.18.0rc2

v0.18.1

v0.18.1rc0

v0.18.2rc0

v0.19.0

v0.19.0rc0

v0.19.0rc1

v0.19.1rc0

v0.2.0

v0.2.1

v0.2.1.post1

v0.2.2

v0.2.3

v0.2.4

v0.2.5

v0.2.6

v0.2.7

v0.3.0

v0.3.1

v0.3.2

v0.3.3

v0.4.0

v0.4.0.post1

v0.4.1

v0.4.2

v0.4.3

v0.5.0

v0.5.0.post1

v0.5.1

v0.5.2

v0.5.3

v0.5.3.post1

v0.5.4

v0.5.5

v0.6.0

v0.6.1

v0.6.1.post1

v0.6.1.post2

v0.6.2

v0.6.3

v0.6.3.post1

v0.6.4

v0.6.4.post1

v0.6.5

v0.6.6

v0.6.6.post1

v0.7.0

v0.7.1

v0.7.2

v0.7.3

v0.8.0

v0.8.0rc1

v0.8.0rc2

v0.8.1

v0.8.2

v0.8.3

v0.8.3rc1

v0.8.4

v0.8.5

v0.8.5.post1

v0.9.0

v0.9.0.1

v0.9.1

v0.9.1rc1

v0.9.1rc2

v0.9.2

v0.9.2rc1

v0.9.2rc2

7779de34da [BugFix] Fix P/D with non-MoE DP (#33037) Nick Hill 2026-01-27 08:03:47 -08:00
0d8ce320a2 [Bugfix] Fix DeepseekV32 AssertionError: num_kv_heads == 1 (#33090) Nicolò Lucchesi 2026-01-27 16:03:20 +01:00
d51e1f8b62 [Bugfix] Disable CG for Whisper+FA2 (#33164) Nicolò Lucchesi 2026-01-27 14:46:51 +01:00
5042815ab6 [Models] Kimi-K2.5 (#33131) Roger Wang 2026-01-26 22:50:31 -08:00
afb390ab02 [CI] Fix AssertionError: MCP tool call not found in output_messages (#33093) Chauncey 2026-01-26 23:19:57 +08:00
ecb4f82209 [CI] Update job dependency syntax for Intel and AMD jobs (#33240) Kevin H. Luu 2026-01-28 01:33:59 -08:00
5914090765 [CI] Update job dependency for hardware and CPU jobs (#33237) Kevin H. Luu 2026-01-28 01:10:05 -08:00
f1acbd68c5 [CI] Enable mypy import following for vllm/compilation (#33199) Harry Mellor 2026-01-28 08:59:54 +00:00
9581185d51 [XPU]disable test_acceptance_length UT (#33226) Yan Ma 2026-01-28 15:24:13 +08:00
2dd359f953 [Docs] Simplify CPU x86 Docker build documentation (#33071) Maryam Tahhan 2026-01-28 06:37:09 +00:00
22ad649501 [ROCm] Enabling forward_includes_kv_cache on ROCm MHA backends (#33106) Gregory Shtrasberg 2026-01-28 00:36:14 -06:00
36d450e3b8 Adds FunAudioChat multimodal audio model support (#2) (#33058) ramos 2026-01-28 13:18:09 +08:00
a2b877df6c [Bugfix] Lazy import NgramProposer in GPU model runner (#32821) 22quinn 2026-01-27 21:07:16 -08:00
35fb0b8613 Don't use min_pixels/max_pixels from Qwen2VL's processor (#33208) Harry Mellor 2026-01-28 05:02:08 +00:00
2eb673a088 Add flake8-implicit-str-concat rules to Ruff (#33191) Harry Mellor 2026-01-28 04:56:10 +00:00
a97b5e206d Relax protobuf library version constraints (#33202) Jeffrey Wang 2026-01-27 20:15:53 -08:00
911b51b69f [ROCm][CI] Add TORCH_NCCL_BLOCKING_WAIT For Distributed Tests (A100) (#32891) Micah Williamson 2026-01-27 21:32:31 -06:00
604e3b87e8 [Feature]: Container image WORKDIR consistency (#33159) Xinan Miao 2026-01-28 11:06:48 +08:00
706f123b23 [Docs] Use definition lists for CLI reference docs (#33186) Harry Mellor 2026-01-28 02:22:48 +00:00
fb7abfc1d0 [docs] Improve tlparse section (#33211) Angela Yi 2026-01-27 18:07:37 -08:00
5d3d6e44e8 [CI] minor fixes to pipeline generator and tests (#33151) Kevin H. Luu 2026-01-27 17:04:02 -08:00
46ec6d71c7 [Model Runner V2] Use a different stream for grammar bitmask h2d copy (#33059) Woosuk Kwon 2026-01-27 16:37:43 -08:00
e82fa448c4 Add attention benchmarking tools (#26835) Matthew Bonanni 2026-01-27 19:09:20 -05:00
d9aa39a3bb [torch.compile] Speed up MOE handling in forward_context (#33184) Richard Zou 2026-01-27 18:17:54 -05:00
3a6d5cbefd [Perf] Optimize dcp allocate tensor (#33102) Wentao Ye 2026-01-27 17:24:41 -05:00
f5d7049cc1 [Bugfix] Fix display error (inconsistent with context) (#33020) linhaifeng 2026-01-28 04:33:29 +08:00
3c3c547ce0 Enabling "2 node" distributed tests in the AMD CI pipeline. (#32719) Alexei-V-Ivanov-AMD 2026-01-27 13:13:21 -06:00
1cbccb6dba [Attention] Use has_flashinfer helper (#33177) Matthew Bonanni 2026-01-27 13:33:17 -05:00
bd92089d33 feature: support eagle3 for HunyuanVL & Hunyuan (#33035) Iris 2026-01-28 01:55:48 +08:00
a6760f1525 [Doc] Improve serve parameter documentation with meaningful defaults (#33082) Karan Bansal 2026-01-27 22:49:37 +05:30
66e601ef79 Support compress-tensors with nvfp4 or fp8 weights and modelopt with nvfp4 weights on Turing (#33076) IriKa 2026-01-28 00:04:05 +08:00
0cd259b2d8 [BugFix] Fix P/D with non-MoE DP (#33037) Nick Hill 2026-01-27 08:03:47 -08:00
83fb2d09e8 Support heterogeneous NemotronHPuzzle model (#32549) danielafrimi 2026-01-27 17:55:54 +02:00
f3a5ee705f [LoRA][Spec Decode] Support LoRA for Nemotron-H MTP models (#32265) danisereb 2026-01-27 17:53:26 +02:00
7cbbca9aaa [Frontend] Cleanup api server (#33158) wang.yuqi 2026-01-27 23:18:10 +08:00
5ec44056f7 [Metrics][MFU] Fix UnembedMetrics FLOP overcounting for prefill (#33045) (#33045) omkhalil 2026-01-27 10:16:49 -05:00
492a7983dd [Bugfix] Fix DeepseekV32 AssertionError: num_kv_heads == 1 (#33090) Nicolò Lucchesi 2026-01-27 16:03:20 +01:00
a608b4c6c2 [5/N][Attention] Finish eliminating vllm/attention folder (#32064) Matthew Bonanni 2026-01-27 10:02:51 -05:00
1f3a2c2944 [Bugfix] Disable CG for Whisper+FA2 (#33164) Nicolò Lucchesi 2026-01-27 14:46:51 +01:00
7227d06156 [Metrics] [KVConnector] Add Offloading Connector metrics (#27942) omerpaz95 2026-01-27 15:34:49 +02:00
14385c80fc Fix weight mapping test for Transfomers v5 (#33162) Harry Mellor 2026-01-27 12:30:14 +00:00
76139d0801 [Frontend] Frontend will only attach supported tasks corresponding entrypoints. (#33139) wang.yuqi 2026-01-27 20:15:43 +08:00
da8d0c441a [AMD][QWEN3-NEXT] FP8 Tunings (#32042) Lifan Shen 2026-01-27 01:34:13 -08:00
58996f3589 [AMD][Kernel][BugFix] Use correct scale in concat_and_cache_ds_mla_kernel when on gfx942 (#32976) (tag: v0.15.0rc1) rasmith 2026-01-27 01:16:43 -06:00
b539f988e1 [Models] Kimi-K2.5 (#33131) Roger Wang 2026-01-26 22:50:31 -08:00
6c00645712 [CI][Pooling] Stabilize ModernBERT test (#32909) Andreas Karatzas 2026-01-26 23:26:48 -06:00
b781eeaa15 [code clean] remove duplicate code (#33135) Ning Xie 2026-01-27 12:57:16 +08:00
e0b005d9cf [Frontend] Cleanup serving engine (#33103) Cyrus Leung 2026-01-27 12:47:26 +08:00
3b8f0fe59e [torch.compile] Stop assuming 32 bit indexing (#33113) Richard Zou 2026-01-26 23:25:02 -05:00
c831911be2 [Frontend] Reduce mixin usage in serving pooling (#33101) Cyrus Leung 2026-01-27 11:50:37 +08:00
157caf511b [Perf] avoid duplicate mem_get_info() call in get_current_memory_usage (#33064) Paco Xu 2026-01-27 11:45:45 +08:00
0b53bec60b [DOC]: Add warning about max_num_batched_tokens and max_model_len when chunked prefill is disabled (#33109) Vincent Gimenes 2026-01-27 04:05:02 +01:00
c568581ff3 Fix IndexError with encoder-decoder models when using Custom Paged Attention (#33112) Strahinja Stamenkovic 2026-01-27 03:33:37 +01:00
2d7053438a fix: preserve native tool call ID in multi-turn tool calling (#32768) wangln19 2026-01-27 10:22:35 +08:00
5a93b9162b [MoE Refactor] Integrate Naive Prepare Finalize into MK (#32567) Robert Shaw 2026-01-26 20:28:02 -05:00
6d86fde09c [Model Runner V2] Remove UvaBufferPool for cpu->gpu copy (#33055) Woosuk Kwon 2026-01-26 16:47:35 -08:00
510ed1e8d3 [Bugfix][TPU] Return a Default fp8 MoE Backend (#32908) XiongfeiWei 2026-01-26 15:46:11 -08:00
8caffd92df [Bugfix][MXFP4] Call trtllm_fp4_block_scale_moe with kwargs (#33104) Pengchao Wang 2026-01-26 15:13:18 -08:00
58a05b0ca1 [fix] CPUDNNLGEMMHandler pointer baked into inductor artifact (#32913) dolpm 2026-01-26 13:59:44 -08:00
6ee7f18f33 [Logging] add --disable-access-log-for-endpoints CLI option (#30011) Jared Wen 2026-01-27 05:49:03 +08:00
8f987883cb [Refactor] Remove unused _moe_permute function (#33108) Wentao Ye 2026-01-26 16:06:45 -05:00
cf1167e50b [Bugfix] Fix Dtypes for Pynccl Wrapper (#33030) (tag: v0.15.0rc0) Robert Shaw 2026-01-26 15:09:32 -05:00
ebe0ba91db [ci] Sync test areas with test-pipeline.yaml and enable new pipeline generator (#33080) Kevin H. Luu 2026-01-26 12:28:20 -08:00
43a013c3a2 [Bugfix] Fix Dtypes for Pynccl Wrapper (#33030) Robert Shaw 2026-01-26 15:09:32 -05:00
c25dbee40d [Model] Bump transformers version for test registry (#33100) Cyrus Leung 2026-01-27 02:53:22 +08:00
19ab0f7ce5 [Bugfix] Fix Voxtral streaming slot_mapping (#33073) Nicolò Lucchesi 2026-01-26 19:40:40 +01:00
67fe677c53 [FIX] Always support TP > 4 for FP4 Gemm (#31099) danielafrimi 2026-01-26 20:04:20 +02:00
d56afd45fd Remove unused logic in models/mistral.py (#33095) Andy Lo 2026-01-26 17:01:52 +00:00
a2393ed496 [CI] Fix AssertionError: MCP tool call not found in output_messages (#33093) Chauncey 2026-01-26 23:19:57 +08:00
be6931ee27 [ROCm][Bugfix] Fix ptpc scale load issue for fused shared expert path in deepseek mtp (#33018) Pleaplusone 2026-01-26 23:19:04 +08:00
9ef3b718d9 [Bugfix] Fix Can't instantiate abstract class DeepseekV32IndexerBackend (#33052) Chauncey 2026-01-26 22:44:02 +08:00
bb17e8f11c [GLM-OCR] GLM-OCR with MTP Support (#33005) Yuxuan Zhang 2026-01-26 22:24:43 +08:00
dcd80206b7 [Chore] Update type annotation of input_ids in model forward (#33063) Cyrus Leung 2026-01-26 22:02:10 +08:00
f4a0921c9c [Performance] Tune Mamba selective scan kernel for B200 (#32873) danisereb 2026-01-26 15:56:54 +02:00
208c56256f [Feature] Add LoRA support for Gemma3 vision components (#32764) VihaanThat 2026-01-26 19:26:40 +05:30
9ac818a551 [Misc] HF Hub LoRA Resolver (#20320) Alex Brooks 2026-01-26 06:56:32 -07:00
6ca2c91b96 [Model] Use mm_position to compute mrope positions for Qwen3-Omni (#33010) Itay Etelis 2026-01-26 15:48:07 +02:00
e33192b269 [lora/moe] Improve fused MoE‑LoRA kernel indexing and memory access (#32770) cwazai 2026-01-26 20:56:34 +08:00
61274bdef5 [Doc] Further update multi-modal impl doc (#33065) Cyrus Leung 2026-01-26 18:54:20 +08:00
b40db4dfec [StepVL] add step vl offline example (#33054) ltd0924 2026-01-26 17:00:32 +08:00
11b556878b [Refactor] Use data parser for matching data items to multi-modal UUIDs (#32955) Cyrus Leung 2026-01-26 15:00:28 +08:00
ee484b3f4b Set splitk=1 for fused-moe-lora expand kernel (#32882) Danielle Robinson 2026-01-25 22:52:34 -08:00
a9b53dd435 [Model Runner V2] Add LoRAState to consolidate lora logic (#33062) Woosuk Kwon 2026-01-25 22:21:12 -08:00
254db42ede [Tests] Remove Duplicates (#33032) Robert Shaw 2026-01-26 00:23:54 -05:00
105d104576 [StepVL] support close img patch (#32923) ltd0924 2026-01-26 12:56:39 +08:00
566cdb6cfb [CI] Fix MHA attention test failure (AttributeError when model_config is None in ViT attention backend) (#33033) Lucas Wilkinson 2026-01-25 20:49:53 -07:00
2f0d3ba745 [Model Runner V2] Minor simplification for finish_requests (#33048) Woosuk Kwon 2026-01-25 18:35:02 -08:00
edf927bc9f [Model Runner V2] Fix slot_mapping after #25954 (#33046) Woosuk Kwon 2026-01-25 18:29:49 -08:00
22aeb43007 [Bugfix][VLM] Fix transformers backend embed_multimodal for Qwen2.5-VL profiling (#32969) Andreas Karatzas 2026-01-25 18:34:05 -06:00
a698e8e7ad [Model] Use mm_position to compute mrope positions for Qwen2.5-Omni (#32772) Itay Etelis 2026-01-25 14:15:53 +02:00
151e5451c2 [Doc] Add Qwen2.5 models to batch invariance tested models (#33016) zhanqiuhu 2026-01-25 04:20:46 -05:00
73b243463b [BugFix] Add env variable to control PDL in LoRA (#32836) Jee Jee Li 2026-01-25 16:32:30 +08:00
7e67df5570 [Bugfix] fix encoder cache hang in Qwen3VL (#32684) JJJYmmm 2026-01-25 13:17:31 +08:00
ff6c1da4e6 [Docs] Fix Apple silicon include path in CPU installation docs (#32977) 7. Sun 2026-01-25 01:51:49 +00:00
fcb9df99bd [Perf][Kernel] Optimize FP4 quantization kernels (SM100F) (#32520) Roberto L. Castro 2026-01-25 02:45:27 +01:00
1ebdff412a [DOC] [ROCm] Update doc for v0.14.1 (#32998) TJian 2026-01-25 09:13:21 +08:00
91601ff478 [Feature] add session based streaming input support to v1 (#28973) Joshua Deng 2026-01-24 13:06:28 -07:00
d4dbb7af63 Using max_loras + 1 to construct grid in fused_moe_lora (#32277) yugong333 2026-01-24 09:39:30 -08:00
203d0bc0c2 [CPU] Improve CPU Docker build (#30953) Maryam Tahhan 2026-01-24 17:08:24 +00:00
17ab54de81 [CPU Backend][BugFix] Fix failing Darwin pipelines (#33002) Fadi Arafeh 2026-01-24 17:02:22 +00:00
