Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

73a484caa1 [Model][Quantization] Fix / Add GGUF support for Qwen2 MoE models (#30307) Tsukasa OI 2025-12-10 04:13:10 +09:00
b37bf51e75 [CI/Test] Fix FP8 per-tensor quant test reference scale shape (#30352) Lucas Wilkinson 2025-12-09 13:52:20 -05:00
95501a70ec [BugFix] Fix DeepSeek-R1 hang with DP and MTP (#30119) Lucas Wilkinson 2025-12-09 13:51:19 -05:00
e858bfe051 [Cleanup] Refactor profiling env vars into a CLI config (#29912) Benjamin Chislett 2025-12-09 13:29:33 -05:00
d471b2aff0 [Model Runner V2] Support num NaNs in logits (#30187) Woosuk Kwon 2025-12-09 10:00:49 -08:00
9e6562a3f6 [Model Runner V2] Fix Triton warning on tl.where (#30355) Woosuk Kwon 2025-12-09 09:59:54 -08:00
0b6a8a304c [BugFix] Fix non detected failing tests (#30277) Ilya Markov 2025-12-09 18:57:55 +01:00
804e3468c0 Update AMD test definitions (2025-12-08) (#30298) Alexei-V-Ivanov-AMD 2025-12-09 11:31:30 -06:00
83319b44c2 [Compile] Fix torch warning TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled (#29897) Wentao Ye 2025-12-09 10:40:37 -05:00
56037dfa2f [BugFix] Fix assert batch_descriptor.num_tokens == num_tokens_padded (#30173) Lucas Wilkinson 2025-12-09 10:36:12 -05:00
5dcd593baf [Feature] Batch-Invariant Support for FA2 and LoRA (#30018) quanliu 2025-12-09 23:01:38 +08:00
5c213d2899 [BUGFIX] Mistral tool call parser v11+ (#30332) Julien Denize 2025-12-09 15:55:38 +01:00
ee14644ba9 [ROCm] Aiter Quant Kernels (#25552) vllmellm 2025-12-09 22:27:37 +08:00
1166c31cc7 [Bugfix]: Fix glm46 awq marlin moe wna16 compatibility (#30210) Dongjie Zou 2025-12-09 07:20:21 -05:00
03416eada6 [bugfix][quantization] Fix fp8 per_tensor scale shape (#30257) haoyangli-amd 2025-12-09 19:28:50 +08:00
c72ea10723 [Structured Output][Reasoning] Improves decoding throughput for models using single-token reasoning endings. (#30056) Hubert de La Jonquiere 2025-12-09 11:54:08 +01:00
67475a6e81 [DCP][Bugfix][CI] Fix accuracy issue of DCP when using FLASH_ATTN_MLA (#30309) Jaya Yuan 2025-12-09 16:22:14 +08:00
9c32df6101 [Bugfix] Qwen 3 VL Embedding loading (#30303) wang.yuqi 2025-12-09 16:04:02 +08:00
aeb82b1930 [CI] Fix Flaky test_eagle_max_len Test (#30306) Micah Williamson 2025-12-09 01:33:34 -06:00
aed846917f [Attention] Make split_decodes_and_prefills(..., require_uniform=True) support padding (#29644) Lucas Wilkinson 2025-12-09 02:24:01 -05:00
e4605d225e [Misc] Fix safetensors import for safe_open (#30300) Yongtao Huang 2025-12-09 14:50:06 +08:00
58d5b3f514 [Model][Quantization] Restore MoE + GGUF models support (incl. Qwen3 MoE) by allowing Sideload Parameters (#30116) Tsukasa OI 2025-12-09 14:30:05 +09:00
c2e1987a6e [Doc] update Intel GPU MM status in Feature x Hardware matrix (#30294) Fanli Lin 2025-12-09 13:16:44 +08:00
e130845984 [CPU][CI] Enable fused MoE tests in Arm CI (#30132) Fadi Arafeh 2025-12-09 04:55:39 +00:00
4b03b50211 update torchao safetensors impl (#30155) liangel-02 2025-12-08 23:46:35 -05:00
4c6fd25880 kv_transfer: Rename the shared storage connectors (#30201) Or Ozeri 2025-12-09 06:46:09 +02:00
03b91f7262 [Bugfix] Fix compressed-tensors models failing to load with transformers backend (#30287) Michael Goin 2025-12-08 23:44:28 -05:00
f6227c22ab [Kernel]Support W4A8 Grouped GEMM on Hopper (#29691) czhu-cohere 2025-12-08 22:29:06 -05:00
ea657f2078 Lora MoE Align Improvements (#29257) gnovack 2025-12-08 18:35:16 -08:00
db14f61f2d [ci] Refactor CI file structure (#29343) Kevin H. Luu 2025-12-08 18:25:43 -08:00
78c7503364 [ROCm][CI] Skip NVIDIA-Only Prime-RL Test in AMD CI (#29420) Micah Williamson 2025-12-08 20:14:02 -06:00
e41312a2f5 [Bugfix] Skip generation config fallback for GGUF to prevent multi-process hang (#30209) Christina Norman 2025-12-08 19:52:43 -06:00
7b35011ad1 Mark qwen2_5_vl as xfail (#30283) Yanan Cao 2025-12-08 17:14:10 -08:00
ae339b1a67 [Bugfix] Fix DeepGEMM after #29546 (#30267) Zhewen Li 2025-12-08 17:05:27 -08:00
0ee6416f67 [Perf] Optimize group_topk kernel, 1.9% Throughput improvement, 2.1% TPOT improvement (#30159) Wentao Ye 2025-12-08 19:44:01 -05:00
d9417096d1 [Feature] Batch invariant: Enable TRITON_MLA without prefix-caching (#29125) Wentao Ye 2025-12-08 19:31:57 -05:00
9d6235ca9a [moe] Allow disabling DP chunking (#29936) Ming Yang 2025-12-08 16:29:36 -08:00
f1599ca55d feat(metrics): Add prefill KV compute metric excluding cached tokens (#30189) Victor Ziliang Peng 2025-12-08 16:08:48 -08:00
60d17251c9 [Disagg] Support large batch size in proxy server and update NixlConnector doc for DP (#28782) Ming Yang 2025-12-08 16:01:08 -08:00
1fb632fdb6 [Perf] Improve fp8 quant in mla; replace ReduceSum with ReduceScatterSum (#29795) Lain 2025-12-08 15:02:34 -08:00
6af70e11a0 [ROCm][CI] Fix test_max_len.py for Rocm (#29916) Charlie Fu 2025-12-08 15:58:30 -06:00
ae0f69b16a Add SpecDec support to selective_state_update (#29488) roikoren755 2025-12-08 23:45:18 +02:00
799804d140 Bump nvshmem to 3.3.24 and fix CUDA 13 installation (#30149) Dmitry Tokarev 2025-12-08 15:24:34 -05:00
0d402d2600 online fp8 quant with streaming weight post-processing (#29196) Vasiliy Kuznetsov 2025-12-08 15:15:10 -05:00
d1b5e7afbf [TPU] Bump tpu-inference to 0.12.0 (#30221) Johnny Yang 2025-12-08 12:10:10 -08:00
fcd5306f65 Add latent MoE support (#30203) shaharmor98 2025-12-08 19:35:01 +02:00
398a596ed2 [MP executor] fix get device count for multi node of mp executor feature (#30042) weiguihua2 2025-12-09 01:33:48 +08:00
67312cad11 [Misc] Split the LoRA code (#30253) Jee Jee Li 2025-12-09 00:59:31 +08:00
87aee9ed2b Add evaluate_guards option to DynamicShapesConfig (#27432) Laith Sakka 2025-12-08 07:46:15 -08:00
184076c3fe [DeepSeek v3.2] Make top-k work for any logit values. (#27568) Daniel Cámpora 2025-12-08 15:55:58 +01:00
eb1051fb95 [ROCm] Guard group quant RMS norm fusion patterns (#30239) Ye (Charlotte) Qi 2025-12-08 06:44:48 -08:00
80433e225e [LoRA] Reduce the loading time of MoE LoRA (#30243) Jee Jee Li 2025-12-08 21:29:47 +08:00
5c2433a6f3 Add tip for mypy and markdownlint to the pre-commit comment (#30259) Harry Mellor 2025-12-08 13:11:51 +00:00
77072e93b3 [docs] governance documents (#24801) Simon Mo 2025-12-08 03:06:20 -09:00
2e660c2434 [Frontend] Binary embedding response does not return metadata by setting encoding_format to bytes_only. (#30249) wang.yuqi 2025-12-08 20:01:21 +08:00
408cf42f67 [CI] Prevents triggering of an inactive issue/PR check for forked repository. (#29654) Shiming Zhang 2025-12-08 18:29:14 +08:00
9e77ffca3f [Model][7/N] Improve all pooling task | Deprecation as_reward_model. Extract hidden states prefer using new multi-vector retrieval API (#26686) wang.yuqi 2025-12-08 16:10:09 +08:00
bcb6f5947f [Perf] Remove sync point in vit torch sdpa attn backend (#30232) Dazhi Jiang 2025-12-08 15:12:42 +08:00
cd00c443d2 [Misc] Rename TensorRT Model Optimizer to Model Optimizer (#30091) Zhiyu 2025-12-07 23:05:27 -08:00
d143271234 [Bugfix] fix fuse_allreduce_rms when tp =1 (#30178) Jiangyun Zhu 2025-12-08 14:43:47 +08:00
c6df05ebb4 [ROCm] [Fused Moe EP] Use binary expert mask for aiter fused moe kernel (#29773) Zhiwei 2025-12-08 13:23:46 +08:00
d726a7b0ed [BugFix] Unblock use of LoRA with data parallel mode (#30220) Nick Hill 2025-12-07 20:21:05 -08:00
344b50d525 Address comment to mergify.yml in #30117 (#30219) Zhijian Jiang 2025-12-07 19:26:25 -08:00
735284ed86 [responsesAPI][7] Browser, Container MCP tools for non harmony models (#29989) Andrew Xia 2025-12-07 18:04:03 -08:00
444f0e3f33 [Frontend] Add MCP type support infrastructure to Responses API (#30054) daniel-salib 2025-12-07 18:02:52 -08:00
af0444bf40 [Performance] Fused blockwise quant RMS norm (#27883) ElizaWszola 2025-12-07 17:38:04 +01:00
0044c4038c [BugFix][DeepSeek-V3.2] Fix backend selection logic for Blackwell (#30195) Lucas Wilkinson 2025-12-07 10:53:51 -05:00
b952f4d3c3 [v1] Add PrefixLM support to FlexAttention backend (#27938) Isotr0py 2025-12-07 23:51:36 +08:00
541a2ef892 [Perf] Deepgemm fused layout kernel for activations, 4.3% throughput improvement, 10.7% TTFT improvement. (#29546) Wentao Ye 2025-12-07 07:31:14 -05:00
b0f4866a77 [CI/Build]Temporary workaround for test_default_mm_loras timeout (#30202) Jee Jee Li 2025-12-07 20:27:11 +08:00
879ddb09c3 [Kernel][MoE] optimize moe_align_block_size (#29642) Jinzhen Lin 2025-12-07 17:58:47 +08:00
1b0482b9d1 [Misc][Core] Remove unused req_index increment in scheduler (#30176) Yifan Qiao 2025-12-07 00:39:21 -08:00
e83b7e379c Revert "[Renderer] Separate out RendererConfig from ModelConfig (#30145)" (#30199) Cyrus Leung 2025-12-07 16:00:22 +08:00
27f4c2fd46 [Renderer] Separate out RendererConfig from ModelConfig (#30145) Cyrus Leung 2025-12-07 15:15:42 +08:00
a49d813fa8 Lazy loading to avoid importing all files (#29716) Luke 2025-12-06 23:13:14 -08:00
17eb25e327 [Perf] Enable cuda graph for deepepHT, 5.3% throughput improvement, 4.4% TTFT improvement (#29558) Wentao Ye 2025-12-06 23:44:50 -05:00
dce6d229f7 Support multiple image/audio embeddings per requests (#29988) jeremyteboul 2025-12-06 20:34:24 -08:00
cbedb703cc [Frontend] Remove confusing -O.xx flag error (#30169) Yanan Cao 2025-12-06 18:53:42 -08:00
8d3da4c79d [MISC]: change NIXL compatibility hash logging level to debug (#30182) AuruTus 2025-12-07 08:21:03 +08:00
421125d03a [ez] move harmony utils to parser folder (#30117) Andrew Xia 2025-12-06 14:34:34 -08:00
671427efbf [Model] Move multimodal_cpu_fields definition to field config (#30181) Cyrus Leung 2025-12-06 21:40:02 +08:00
21bb323542 Gigachat 3 tool parser and tests (#29905) Viacheslav 2025-12-06 15:04:14 +03:00
17a9abec2b simplify requires_files list creation (#29656) Chukwuma Nwaugha 2025-12-06 09:42:41 +00:00
92c35abb24 [Misc] Fix circular import in vllm.transformers_utils.config (#30179) Ye (Charlotte) Qi 2025-12-06 01:24:03 -08:00
43e7593031 Support tokenization_kwargs override (#29794) Yu Jiaqi 2025-12-06 17:12:53 +08:00
c46b932df2 [Chore] Deprecate SupportsMultiModal.merge_by_field_config (#30170) Cyrus Leung 2025-12-06 15:57:28 +08:00
6476382384 prefix caching design doc sha256 now default (#29261) redwrasse 2025-12-05 23:39:56 -08:00
d6aeaddf4a [bugfix] fix type[AttentionBackend] bug in kv_connector_base_v1 (#30051) kx 2025-12-06 15:11:31 +08:00
a238cbd89d [Model Runner V2] Support min-p sampling (#30171) Woosuk Kwon 2025-12-05 21:42:47 -08:00
4026ae31e9 [Misc] Move disable_nccl_for_dp_synchronization init logic into VllmConfig (#30161) Nick Hill 2025-12-05 20:59:04 -08:00
b12f4a9830 [CI/Build][AMD] Use ROCM_ATTN instead of FLASH_ATTN test for test_register_kv_caches for ROCm and update test for TRITON_ATTN (#29985) rasmith 2025-12-05 22:57:38 -06:00
40a046cd82 [Bugfix]: Fix TokenizerLike interface (#30009) Rohan Potdar 2025-12-05 22:56:40 -06:00
e858bc4d14 [Model] Add support for transformer-based Ultravox v0.7 projector (#30089) Peter Salas 2025-12-05 20:55:43 -08:00
e3fbb6f152 fix#30092 Kimi-Linear model loading failure with missing indexer_rotary_emb (#30093) Dongjie Zou 2025-12-05 23:55:09 -05:00
c4d62618ca Fix AWQ MoE marlin check issue in marlin_utils.py for AMD backend (#30102) yuttian1 2025-12-06 12:54:38 +08:00
62079d8600 [CI/Build][AMD] Skip marlin, machete, and hadacore tests since these require _C functions not defined for ROCm (#30109) rasmith 2025-12-05 22:54:17 -06:00
bf4a901af9 Better error when world size is larger than node and distributed_executor_backend is not set (#30140) Harry Mellor 2025-12-06 04:53:52 +00:00
7e31c3a3f6 [CI]: Remove unnecessary imports from test_lmache_integration (#30157) Samuel Shen 2025-12-05 20:53:34 -08:00
dc839ad03d [CI/Build][AMD][Quantization] Fix test_int8_kernel.py by updating int8_utils to use hip.libdevice.round (#30151) rasmith 2025-12-05 22:52:11 -06:00
02a4169193 [Tests] Tool call tests for openai/gpt-oss-20b (#26237) Deboleina 2025-12-05 22:03:29 -05:00
