Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

1fb0209bbc [Bugfix][Hardware][AMD] Fix exception types in AITER MLA FP8 check (#31177) Kevin McKay 2026-01-06 00:10:59 -06:00
81323ea221 [CI] Fix CPU MM PRocessor Test (#31764) Robert Shaw 2026-01-05 23:22:18 -05:00
e1cd7a5faf [Bugfix] Add init_workspace_manager to moe kernel benchmarks (#31042) Michael Goin 2026-01-05 22:14:33 -05:00
a68e703c32 [UX] Add -ep shorthand for --enable-expert-parallel (#30890) Michael Goin 2026-01-05 22:13:36 -05:00
cd1245a184 [Cleanup] Remove redundant decoder_layer_type assignment in Qwen2 (#31760) maang 2026-01-06 10:09:18 +08:00
ffec815422 [Perf] Optimize additional fill(0) in cutlass moe, 2.9% E2E throughput improvement, 10.8% TTFT improvement (#31754) Wentao Ye 2026-01-05 21:01:13 -05:00
d386ab1412 [Docs] Improve malformed exception caused by backslash line continuations (#31694) maang 2026-01-06 09:51:54 +08:00
ccb309a964 Revert "[CI Failure] Disable B200 tests while runner is broken" (#31750) Michael Goin 2026-01-05 20:26:33 -05:00
2f4e6548ef [Bugfix] vLLM produces invalid UTF-8 tokens and “�” (#28874) John Calderon 2026-01-05 19:23:00 -05:00
3c98c2d21b [CI/Build] Allow user to configure NVSHMEM version via ENV or command line (#30732) Seiji Eicher 2026-01-05 15:56:08 -08:00
9513029898 [Bugfix] Properly apply v_scale for mimo_v2_flash (#31175) Michael Goin 2026-01-05 18:20:46 -05:00
f6c0009afa [Bugfix] Fix Broken ModelOpt NVFP4 MoE (#31742) Robert Shaw 2026-01-05 18:18:38 -05:00
776ca1e187 [MoE Refactor] Aiter Experts for BF16 MoE (#31542) Yongye Zhu 2026-01-05 14:52:59 -08:00
af9a7ec255 [Bug] Revert torch warning fix (#31585) Wentao Ye 2026-01-05 17:31:21 -05:00
276e03b92c [CI][DeepSeek] Add nightly DeepSeek R1 lm_eval tests on H200 (#30356) Matthew Bonanni 2026-01-05 17:17:59 -05:00
32f4e4db00 [Cleanup] Remove deprecated fields from CachedRequestData class (#31734) Nick Hill 2026-01-05 13:07:14 -08:00
ee21291825 [Model] Nemotron Parse 1.1 Support (#30864) amitz-nv 2026-01-05 23:00:14 +02:00
af1b07b0c5 [docker] install cuda13 version of lmcache and nixl (#30913) Qidong Su 2026-01-05 15:50:39 -05:00
c77a993cc2 pin lora_b moe weights on cpu (#31317) gnovack 2026-01-05 12:15:40 -08:00
fdcc5176be [BugFix] Fix architecture flags to prevent issues on SM103 (#31150) Roberto L. Castro 2026-01-05 21:11:35 +01:00
5708297e4e [Misc][Model][Refactor] Pass the prefix into Linear layers (#31669) Wang Kunpeng 2026-01-06 04:03:18 +08:00
02dbb933cb Fix GLM-4.6v flash tool calling in transformers 5.x (#31622) baonudesifeizhai 2026-01-05 14:32:43 -05:00
51e38a8e30 [Misc] Enable Paligemma's PrefixLM attention mask computation (#31725) Isotr0py 2026-01-06 03:31:49 +08:00
d8e38d4939 Triton Attention: Support cross-layers blocks (#30687) Or Ozeri 2026-01-05 21:29:16 +02:00
21156ff199 [Bugfix] Add missing extra_tensors arg to DeviceCommunicatorBase.disp… (#31644) kzwrime 2026-01-06 01:26:09 +08:00
c455b771fd [Bugfix][CPU] Fix RotaryEmbedding fallback causing gibberish with --enforce-eager (#31643) RickyChen / 陳昭儒 2026-01-06 01:25:38 +08:00
eefa713a66 [CI Failure] Disable B200 tests while runner is broken (#31732) Michael Goin 2026-01-05 11:50:51 -05:00
79ed460dd5 [Frontend] [Doc] Exclude log deltas feature (#30322) Kevin Šuc 2026-01-05 17:34:35 +01:00
6aa5b18e1d [v1] Add encoder-only/cross attention support to Triton Attention backend (#31406) Isotr0py 2026-01-06 00:00:23 +08:00
911d38ed99 [Model] Let more models to support the score template. (#31335) wang.yuqi 2026-01-05 19:54:26 +08:00
caaa482aca [platform] Support additional forward context for OOT (#31674) zzzzwwjj 2026-01-05 18:25:13 +08:00
b471aad41f [KVconnector][LMCache] remove the import of legacy LMCache code (#31704) Yihua Cheng 2026-01-05 02:11:01 -08:00
d5503ca7f9 [LoRA] LoRA PDL improvement (#31660) Jee Jee Li 2026-01-05 16:28:46 +08:00
a2ad15c070 [Model] Enable LoRA support for BLIP2 (#31620) Qiping Pan 2026-01-05 00:02:24 -08:00
3133c192a3 [ROCM] Reorder arguments and rename parameters for rope_cached_thd_positions_2c_fwd_inplace (#29993) Tres 2026-01-05 08:37:57 +01:00
76fd458aa7 [CI] Bump sentence-transformer from 3.2.1 to 5.2.0 (#31664) wang.yuqi 2026-01-05 13:45:01 +08:00
e2701cc525 [Frontend] [Bugfix] respect server-level default chat template kwargs in reasoning parser (#31581) cjackal 2026-01-05 14:42:47 +09:00
fe8a9fbd2e [Bugfix] Fix EPLB state logging error (#31455) Tyler Michael Smith 2026-01-04 23:06:28 -05:00
98b8b3abaa [log] enable max_log_len trim only when needed (#31482) Ning Xie 2026-01-05 11:55:43 +08:00
346e56455a Add chat prefix completion feature to DeepSeek v3.2 (#31147) CHENYUE 2026-01-05 11:20:25 +08:00
8be6432bda [CI Failure] Fix NomicBert max_model_len validation (#31662) wang.yuqi 2026-01-05 11:06:52 +08:00
43e3f8e4a9 [Misc] Various code simplifications (#31666) Nick Hill 2026-01-04 18:35:56 -08:00
bb4337b34c [Platform] Deprecate seed_everything (#31659) wangxiyuan 2026-01-05 10:34:04 +08:00
367856de14 [CI/Build] Revive skipped reward models e2e test (#31665) Isotr0py 2026-01-05 10:33:46 +08:00
da436f868a [Minor] Small pooler output processing optimization (#31667) Nick Hill 2026-01-04 18:33:12 -08:00
f099cd557a [Bugfix] Fix AttributeError: 'Stream' object has no attribute 'dp_size' (#31663) Jee Jee Li 2026-01-05 10:31:31 +08:00
f2b6dfd237 [ROCm][CI] Fix language generation test accuracy by disabling HF flash_sdp and mem_efficient_sdp (#31597) Andreas Karatzas 2026-01-04 20:17:05 -06:00
89f1f25310 [CI] Skip Phi-MoE test due to old API util (#31632) Andreas Karatzas 2026-01-04 18:52:07 -06:00
b53b89fdb3 [BugFix] Async scheduling: handle model forward errors more cleanly (#31611) Nick Hill 2026-01-04 11:04:37 -08:00
6522721d17 [misc] Sort uvicorn log level description according to verbosity (#31137) Ning Xie 2026-01-05 02:45:37 +08:00
0d4044edd8 fix no think of GLM-4.5 / GLM-4.7 (#31449) Yuxuan Zhang 2026-01-04 11:43:00 +08:00
41ab179738 [Docs] Fix argparse include path for mm-processor benchmark (#31654) Reagan Lee 2026-01-03 19:31:29 -08:00
268b1c55ad [MoE Refactor][13/N] Convert FI to Use PFNoEP (#31533) Robert Shaw 2026-01-03 15:26:36 -05:00
4f9ce35afe [CI][Bugfix] Fix token counting in chunked prefill compl test (#31630) Andreas Karatzas 2026-01-03 00:28:49 -06:00
97a01308e9 Improve HF qwen3_omni: preserve audio_sample_rate in kwargs restructuring (#29255) jeremyteboul 2026-01-02 20:31:09 -08:00
0eee877f67 [Core] Parse vLLM engine required fields from hf_config to model_arch_config (#28454) Xingyu Liu 2026-01-02 16:13:15 -07:00
a0e9ee83c7 [Benchmark] Fix OOM during MoE kernel tuning for large models (#31604) Alfred 2026-01-03 06:24:51 +08:00
a3f2f40947 [MoE Refactor] Explicit construct mk for flashinfer bf16 kernel (#31504) Yongye Zhu 2026-01-02 13:54:50 -08:00
5a468ff7c7 [MoE Refactor] Split invoke_fused_moe_kernel (#31050) Yongye Zhu 2026-01-02 13:47:15 -08:00
6ef770df7c [MoE] Fix output_shape calculation in Attention layer to handle 3D query inputs (#31596) Andreas Karatzas 2026-01-02 09:46:23 -06:00
bd877162eb [BugFix] Support online dense model DP without overhead (#30739) Nick Hill 2026-01-02 07:36:38 -08:00
08f425bad1 CustomOp: test forward dispatch for grouped_topk (#31530) Xinyu Chen 2026-01-02 23:04:01 +08:00
a01f2faedf Add multimodal input method in the documentation (#31601) labAxiaoming 2026-01-02 20:43:30 +08:00
cc410e8644 [Bugfix] Fix weight_loader v1 block scale (#31103) Kyuyeun Kim 2026-01-01 21:14:10 -08:00
825c2dc133 [Bugfix][Hardware][AMD] Fix last_page_len calculation in AITER MLA decode (#31282) Kevin McKay 2026-01-01 23:14:00 -06:00
1f43c121d5 Remove unused use_marlin variable in Mxfp4MoEMethod (#31549) Vaibhav Sourirajan 2026-01-02 00:13:36 -05:00
ca179d0f64 [Bugfix] Fix activation quantization for compressed-tensors W4A16 (#31572) Tmn07 2026-01-02 13:13:22 +08:00
013b54088c [ROCm][CI] Fix ModernBERT token classification test (#31612) Andreas Karatzas 2026-01-01 22:19:08 -06:00
5ac55eb30f [Model] Enable LoRA support for tower and connector in LLaVA (#31513) Jay Hemnani 2026-01-01 19:32:39 -08:00
ea53ca5e85 [Bugfix] Fix block size used in EAGLE slot mapping (#31540) Benjamin Chislett 2026-01-01 22:32:30 -05:00
27864a851c feat: support LoRA for DeepSeek-OCR(Language Model part) (#31569) zhima771 2026-01-02 11:32:11 +08:00
5cc4876630 [ROCm][CI] Fix failure in Language Models Tests (Extra Standard) by reducing agent pool size (#31553) Andreas Karatzas 2026-01-01 21:29:42 -06:00
5fff44064b [Bugfix] Replace BaseException with specific exceptions in FLA utils (#31590) Kevin McKay 2026-01-01 21:27:54 -06:00
1f5b7c41c3 Add Multimodal Processor Benchmark (#29105) Reagan Lee 2026-01-01 19:26:53 -08:00
adcf682fc7 [Audio] Improve Audio Inference Scripts (offline/online) (#29279) Ekagra Ranjan 2025-12-31 18:34:18 -05:00
21de6d4b02 [CI][Bugfix] Fix token counting in chunked prefill streaming test (#31565) Andreas Karatzas 2025-12-31 17:05:14 -06:00
6c2cfb62ff [BugFix] Fix async scheduling for pooling models (#31584) Nick Hill 2025-12-31 14:48:51 -08:00
d8da76f3b7 [Bugfix] Fix BAGEL online serving for text and image understanding (#31546) Fanjiang Ye 2025-12-31 16:46:10 -06:00
d722e9e614 Add GLM-ASR multimodal support (#31436) baonudesifeizhai 2025-12-31 10:12:24 -05:00
cf16342d43 [ROCm][CI] Update MiniCPM model test: MiniCPM3-4B to MiniCPM4.1-8B and simplify attention backend testing (#31551) Andreas Karatzas 2025-12-31 02:12:01 -06:00
357d435c54 [Bug] Fix log issue with \n (#31390) Wentao Ye 2025-12-31 00:16:55 -05:00
108a2728f7 Add get_expert_mapping to NemotronHModel (for LoRA support) (#31539) danisereb 2025-12-31 07:09:03 +02:00
578c8f51f6 [CI] [Critical] [CUDA] Fix duplicated test name (#31562) TJian 2025-12-31 14:01:09 +09:00
b4bb5f312f [Core] Remove unused num_tokens parameter from _init_model_kwargs (#31517) maang-h 2025-12-31 12:47:23 +08:00
70e1acefcd [BugFix] Fix NUMA node validation in CPU platform (#31520) SameerAsal 2025-12-30 20:06:49 -08:00
84f6cd741b [Mics] add pcp basic support to MoE model (#31003) Qiu 2025-12-31 12:01:29 +08:00
ecd49ce7e6 [Fix] Align fused moe lora_b shape with peft (#31534) B-201 2025-12-31 09:44:59 +08:00
e1ee11b2a5 Add docker buildx bake configuration (#31477) Amr Mahdi 2025-12-30 17:08:54 -08:00
04147dcfa7 [Bugfix]Fix pooling model always disabled due to incorrect PP rank check (#31505) vintipandey 2025-12-30 11:27:10 -08:00
07728bf5cd [BugFix] add select_gemm_impl on CompressedTensorsWNA16MoEMethod to support LoRA (#31453) JartX 2025-12-30 20:20:15 +01:00
3f52fa5aa2 [Model] Add support for openPangu moe model (#28775) yt0428 2025-12-31 00:11:38 +08:00
7157596103 [CPU] Disable async schedule on CPU (#31525) Li, Jiang 2025-12-30 20:34:08 +08:00
ab1af6aa3e [CI][NIXL] Split DPEP tests (#31491) Nicolò Lucchesi 2025-12-30 13:26:12 +01:00
1a834df2d4 [ROCm][Bugfix] Fix accuracy issue on fmoe when VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS enabled (#31523) Pleaplusone 2025-12-30 17:21:49 +08:00
51085c2aeb [Frontend] add continue_final_message parameter to /embeddings endpoint (#31497) Kevin 2025-12-29 23:21:13 -08:00
3d973764ce [xpu] [bugfix] upgrade to latest oneccl in dockerfile (#31522) Roger Feng 2025-12-30 14:52:28 +08:00
3b312fb792 [Minor] Various small code cleanups/simplifications (#31508) Nick Hill 2025-12-29 22:42:06 -08:00
f84bf7d79b Add Loraconfig parameter to get_punica_wrapper function (#31408) ZT-AIA 2025-12-30 14:27:31 +08:00
99dcf5dcc5 Migrate meetups & sponsors [2/N] (#31500) Roy Wang 2025-12-30 12:26:15 +08:00
dc837bc23e feat(frontend): add --default-chat-template-kwargs CLI argument (#31343) Hojin Yang 2025-12-30 12:38:47 +09:00
