Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

eac3b96ec0 [Models] Allow converting Qwen3-VL into Reranker model (#31890) Isotr0py 2026-01-08 16:10:15 +08:00
573a1d1119 [ROCm]Skip test_torchao.py::test_pre_quantized_model on CDNA3 arch (#31905) Zhiwei 2026-01-08 15:47:44 +08:00
33156f56e0 [docker] A follow-up patch to fix #30913: [docker] install cuda13 version of lmcache and nixl (#31775) Shang Wang 2026-01-08 02:47:02 -05:00
107cf8e92f fix(rocm): Add get_supported_kernel_block_sizes() to ROCM_ATTN (#31712) Rabi Mishra 2026-01-08 13:16:07 +05:30
63baa28cf5 [Model] Enable LoRA support for tower and connector in GLM4-V (#31652) Zyyeric 2026-01-08 01:45:53 -06:00
e5173d3bac [Bugfix] Remove the num_hidden_layers override for glm4_moe (#31745) Andy Liu 2026-01-07 23:45:10 -08:00
d3235cb503 [Fix] Enable mm_processor_cache with vision LoRA (#31927) prashanth058 2026-01-08 01:31:51 -06:00
287b37cda4 [BugFix] Fix spec decoding edge case bugs (#31944) Nick Hill 2026-01-07 23:31:03 -08:00
791b2fc30a [grpc] Support gRPC server entrypoint (#30190) Chang Su 2026-01-07 23:24:46 -08:00
be6a81f31b [chore] Update FA commit (#30460) Lucas Wilkinson 2026-01-08 02:24:18 -05:00
2ab441befe [platform] add dp_metadata arg to set_additional_forward_context (#31942) Ronald 2026-01-08 14:56:44 +08:00
9572f74f15 [Model] Enable LoRA support for tower and connector in DotsOCR (#31825) ShaanveerS 2026-01-08 07:50:16 +01:00
5f2a473ff3 [ROCm][CI] v1 cpu offloading attention backend fix (#31833) Andreas Karatzas 2026-01-08 00:37:50 -06:00
6b2a672e47 [Doc] Add Claude code usage example (#31188) Michael Goin 2026-01-08 00:50:23 -05:00
f1b1bea5c3 [CI][BugFix][AMD] Actually skip tests marked @pytest.mark.skip_v1 (#31873) rasmith 2026-01-07 23:06:09 -06:00
cddbc2b4b2 [ROCm][CI] Add rocm support for run-multi-node-test.sh (#31922) Charlie Fu 2026-01-07 22:36:39 -06:00
087a138963 [ROCm][CI] Fix attention backend test flakiness from uninitialized KV cache memory (#31928) Andreas Karatzas 2026-01-07 22:35:25 -06:00
c4041f37a4 [ROCm][LoRA] Fix MoE accuracy regression by preserving float32 router weight scaling (#31931) Andreas Karatzas 2026-01-07 22:17:56 -06:00
a79079feef [BugFix] Fix flakiness in test_eagle_dp for PyTorch 2.10 (#31915) Richard Zou 2026-01-07 23:04:58 -05:00
9f6dcb71ae [MoE Refactor][16/N] Apply Refactor to NVFP4 (#31692) Robert Shaw 2026-01-07 22:46:27 -05:00
8dd2419fa9 [CI] Skip Qwen-VL in multimodal processing tests due to flaky external dependency (#31932) Andreas Karatzas 2026-01-07 20:58:01 -06:00
39d82005f7 fix(rocm): add early return in get_flash_attn_version for ROCm (#31286) Rabi Mishra 2026-01-08 07:58:07 +05:30
25eef3dc2e feat(moe): Add is_act_and_mul=False support for Triton MoE kernels (#31645) Rabi Mishra 2026-01-08 07:57:09 +05:30
0d7667419f [0/N][Attention] Fix miscellaneous pre-commit issues (#31924) Matthew Bonanni 2026-01-07 20:15:17 -05:00
5dcd7ef1f2 [MoE Refactor][15/N] Apply Refactor to Fp8 (#31415) Robert Shaw 2026-01-07 19:42:33 -05:00
ffc0a2798b Add back missing DeepEP LL params (#31911) Elvir Crnčević 2026-01-07 23:47:54 +01:00
10ef65eded [BugFix] Fix bad words with speculative decoding (#31908) Nick Hill 2026-01-07 12:46:42 -08:00
6170d47d22 [EPLB] Optimize EPLB with numpy (#29499) Ilya Markov 2026-01-07 21:21:35 +01:00
0ada960a20 [Kernel] Support bias type in grouped_topk kernel (#31781) Xin Yang 2026-01-07 12:16:32 -08:00
c907d22158 [refactor] refactor memory constants usage (#31865) Ning Xie 2026-01-08 02:37:31 +08:00
f347ac6c34 [Perf] Fuse stride preparation for NVFP4 cutlass_moe (#31837) Michael Goin 2026-01-07 13:31:26 -05:00
05f47bd8d2 [Doc] Fix: Correct vLLM announcing blog post link in docs (#31868) Festus Ayobami Owumi 2026-01-07 18:06:42 +00:00
bf184a6621 Enable quantized attention in NemotronH models (#31898) roikoren755 2026-01-07 19:37:19 +02:00
30399cc725 UX: add vLLM env info in '/server_info' (#31899) Jee Jee Li 2026-01-08 01:13:02 +08:00
b89443b8d9 [KVConnector]: Enable Cross-layers KV cache layout for MultiConnector (#30761) Kfir Toledo 2026-01-07 18:59:43 +02:00
1d9e9ae8a4 [Bugfix]: prevent leaking tokens in crash log (#30751) Marko Rosenmueller 2026-01-07 17:15:19 +01:00
b7036c87a1 [Refactor] Clean up pooler modules (#31897) Cyrus Leung 2026-01-08 00:07:43 +08:00
cc6dafaef2 [Perf][Kernels] Enable FlashInfer DeepGEMM swapAB on SM90 (for W8A8 Linear Op) (#29213) Kate Cheng 2026-01-07 07:53:54 -08:00
1ab055efe6 [OpenAI] Extend VLLMValidationError to additional validation parameters (#31870) R3hankhan 2026-01-07 20:15:49 +05:30
b665bbc2d4 [Chore] Migrate V0 attention utils (#31891) Cyrus Leung 2026-01-07 21:44:36 +08:00
974138751b [Refactor] GLM-ASR Modeling (#31779) Jared Wen 2026-01-07 21:08:29 +08:00
41cfa50632 [ROCm][AITER] fix wrong argument passed to AITER flash_attn_varlen_func (#31880) vllmellm 2026-01-07 12:25:03 +01:00
d111bc53ad [Bugfix][MTP] Fix GLM4 MoE fp8 loading with MTP on (#31757) Andy Liu 2026-01-07 01:18:52 -08:00
0790f07695 [Misc] Improve error messages for unsupported types and parameters (#30593) BlankR 2026-01-07 01:00:16 -08:00
1f33e38e81 [Model] Cleanup: Remove redundant manual definition of make_empty_intermediate_tensors in GLM-4-MoE (#31869) maang 2026-01-07 16:18:28 +08:00
59fe6f298e [XPU]fallback to TRITON_ATTN on xpu when use float32 dtype (#31762) sihao_li 2026-01-07 16:10:29 +08:00
e7596371a4 [Refactor][TPU] Remove torch_xla path and use tpu-inference (#30808) weiyu 2026-01-07 00:07:16 -08:00
0dd5dee9b9 [Bugfix][Kernel] fix bias adding in triton kernel implemented fused moe (#31676) xuebwang-amd 2026-01-07 15:36:13 +08:00
4614c5a539 [Bugfix][Hardware][AMD] Consolidate FP8 min/max values helper function (#31106) Kevin McKay 2026-01-07 00:55:03 -06:00
482914849c [BugFix] LoRA: Support loading base_layer of experts (#31104) ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 2026-01-07 08:49:39 +02:00
efeaac92f2 [Bugfix] Fix race condition in async-scheduling for vlm model (#31841) tianshu-Michael-yu 2026-01-06 22:45:10 -08:00
55caa6051d refactor: find_loaded_library (#31866) tjp_zju 2026-01-07 14:42:20 +08:00
c7a79d41a0 [Attention][3/n] Remove usage of deprecated seq_lens_cpu and num_computed_tokens_cpu CommonAttentionMetadata properties (#31850) Lucas Wilkinson 2026-01-07 00:31:34 -05:00
6409004b26 [ROCm][AITER] bugfix accuracy regression in ROCM_AITER_TRITON_MLA backend (#31816) vllmellm 2026-01-07 06:04:53 +01:00
aafd4d2354 [Chore] Try remove init_cached_hf_modules (#31786) Cyrus Leung 2026-01-07 12:34:04 +08:00
0a2c2dc3f1 fixed mypy warnings for files vllm/v1/attention with TEMPORARY workaround (#31465) Jack Yang 2026-01-06 23:08:47 -05:00
f09c5feb7c Change warning in get_current_vllm_config to report caller's line number (#31855) Tyler Michael Smith 2026-01-06 22:48:13 -05:00
1b8af957f6 [Doc] Update release docs (#31799) Cyrus Leung 2026-01-07 11:27:40 +08:00
a051525e07 [Model] Enable LoRA support for PaliGemma (#31656) Ce Zhao 2026-01-06 21:09:32 -05:00
5b833be49e [1/2][lmcache connector] clean up lmcache multi-process adapter (#31838) Yihua Cheng 2026-01-06 18:02:42 -08:00
873480d133 [Misc][BE] Type coverage for vllm/compilation [1/3] (#31554) Lucas Kabela 2026-01-06 17:37:51 -08:00
6f351548b2 [Frontend] Implement robust video frame recovery for corrupted videos (#29197) vSeamar 2026-01-06 17:13:24 -08:00
364a8bc6dc [ROCm][CI] Fix plugin tests (2 GPUs) failures on ROCm and removing VLLM_FLOAT32_MATMUL_PRECISION from all ROCm tests (#31829) Andreas Karatzas 2026-01-06 19:12:23 -06:00
9a1d20a89c [CI] Add warmup run in test_fusion_attn (#31183) Angela Yi 2026-01-06 16:31:52 -08:00
309a8f66ee [Bugfix] Handle mistral tokenizer in get_hf_processor (#31817) Cyrus Leung 2026-01-07 07:46:56 +08:00
e5d427e93a [ROCm][CI] Pinning timm lib version to fix ImportError in Multi-Modal Tests (Nemotron) (#31835) Andreas Karatzas 2026-01-06 17:23:11 -06:00
2a42ae790d [ROCm][CI] Fix ModernBERT token classification test numerical accuracy on ROCm (#31820) Andreas Karatzas 2026-01-06 17:21:15 -06:00
d49899732e [Spec Decode][UX] Add acceptance stats to vllm bench serve report (#31739) Matthew Bonanni 2026-01-06 16:21:42 -05:00
dba95378a6 Report error log after vllm bench serve (#31808) Elvir Crnčević 2026-01-06 21:24:19 +01:00
ada6f91d56 Fix RecursionError in MediaWithBytes unpickling (#31191) Nikhil G 2026-01-06 12:11:26 -08:00
8becf146bd [Quantization][Refactor] Move CPU GPTQ kernel into MP linear (#31801) Li, Jiang 2026-01-07 03:10:18 +08:00
c07163663d [ROCm][CI] Fix tests/compile unit tests (#28895) Charlie Fu 2026-01-06 12:50:43 -06:00
f7008ce1c4 [Perf] Async Scheduling + Speculative Decoding + Structured Outputs (#29821) Benjamin Chislett 2026-01-06 13:50:37 -05:00
4e67a8f616 [Bugfix] Fix GLM-4 MoE router logits dtype for data parallel chunking (#31055) Yakine Tahtah 2026-01-06 18:57:56 +01:00
142c4d1738 make 500: InternalServerError more informative (#20610) Masataro Asai 2026-01-06 12:36:24 -05:00
6f5e653383 [Log] add log about gpu worker init snapshot and requested memory (#29493) Ning Xie 2026-01-07 01:32:55 +08:00
22dffca982 [PERF] Speed-up of GDN attention decode part (Qwen3-Next) (#31722) Vadim Gimpelson 2026-01-06 21:32:46 +04:00
4c73be14e0 [Attention][2/n] Remove usage of deprecated seq_lens_cpu and num_computed_tokens_cpu CommonAttentionMetadata properties (#31774) Lucas Wilkinson 2026-01-06 12:32:14 -05:00
2f4bdee61e [Quantization][MoE] remove unused ep logic from moe marlin (#31571) Jinzhen Lin 2026-01-07 01:07:19 +08:00
28c94770ad [NemotronH] Use ReplicatedLinear for fc1_latent_proj (#31807) roikoren755 2026-01-06 18:00:40 +02:00
af8fd73051 [MoE Refactor][14/N] Clean Up FI Quant Config Smuggling (#31593) Robert Shaw 2026-01-06 10:47:04 -05:00
d3e477c013 [MoE Refactor] Add Temporary Integration Tests - H100/B200 (#31759) Robert Shaw 2026-01-06 10:34:17 -05:00
02809af1e7 [Bugfix]: Fix cross attention backend selection for Turing GPU (#31806) Isotr0py 2026-01-06 23:15:56 +08:00
cbd4690a03 [LoRA]Disable linear LoRA kernel PDL (#31777) Jee Jee Li 2026-01-06 23:12:25 +08:00
96860af655 [Model] rename use_pad_token to use_sep_token (#31784) wang.yuqi 2026-01-06 22:16:04 +08:00
0202971a48 [Frontend] Support GLM-4.5 / GLM-4.7 with enable_thinking: false (#31788) Chauncey 2026-01-06 21:53:21 +08:00
2c1a4f2488 [Bugfix]: avoid overriding audio/text kwargs (Qwen3-Omni) (#31790) Jzz1943 2026-01-06 20:59:17 +08:00
6444824873 [Misc] Implement TokenizerLike.convert_tokens_to_ids (#31796) Cyrus Leung 2026-01-06 20:08:22 +08:00
bf0f3a4638 [Bugfix] Fix torch.compile error for DP + MoE on CPU Backend (#31650) kzwrime 2026-01-06 20:06:20 +08:00
e0327c9db2 [Attention][1/n] Remove usage of deprecated seq_lens_cpu and num_computed_tokens_cpu CommonAttentionMetadata properties (#31773) Lucas Wilkinson 2026-01-06 07:05:17 -05:00
14df02b4e1 [Chore] Cleanup mem_utils.py (#31793) Cyrus Leung 2026-01-06 19:55:59 +08:00
6ebb66ccea [Doc] Fix format of multimodal_inputs.md (#31800) BlankR 2026-01-06 03:30:24 -08:00
43d384bab4 [CI] Increase the MTEB_EMBED_TOL threshold to 5e-4. (#31797) wang.yuqi 2026-01-06 19:30:05 +08:00
db318326a5 [Misc] Use deprecated for seed_everything (#31780) Cyrus Leung 2026-01-06 19:29:55 +08:00
799b5721f6 [cpu][bench] Add CPU paged attention benchmarks (#31720) Fadi Arafeh 2026-01-06 10:57:57 +00:00
97ca4c3b60 [Chore] Remove more V0 dead code from sequence.py (#31783) Cyrus Leung 2026-01-06 18:25:14 +08:00
ee2e69d6cd [Bugfix][CI/Build] Fix failing pooling models test due to Triton kernel accuracy diff (#31776) Isotr0py 2026-01-06 16:44:22 +08:00
7101e0851f [Models]: Use MMEncoderAttention for MoonViT (#31738) Isotr0py 2026-01-06 16:00:25 +08:00
e9717801bd [Bugfix][ROCm] Fix Unsupported attention metadata type for speculative decoding in eagle.py (#31714) vllmellm 2026-01-06 08:53:22 +01:00
da71d44410 [Doc] Show that use_audio_in_video is supported in docs (#30837) Cyrus Leung 2026-01-06 15:27:19 +08:00
