Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

326e7c3105 [Doc] Add Sophgo TPU Support (#30949) wzyrrr 2025-12-19 00:29:33 +08:00
0db5439ded [Bugfix][torch2.10] Fix test_qwen2_5_vl_compilation with 2.10 RC (#30822) Lucas Kabela 2025-12-18 08:23:31 -08:00
28d15ab56b adds jais 2 support (#30188) sarathc-cerebras 2025-12-18 21:16:58 +05:30
6628758233 [Bug] Fix batch invariant in torch 2.10 (#30907) Wentao Ye 2025-12-18 10:27:51 -05:00
eee600c34f [Misc] support nsys profile for bench latency (#29776) zhrrr 2025-12-18 22:52:20 +08:00
100f93d2be Filter safetensors files to download if .safetensors.index.json exists (#30537) Michael Goin 2025-12-18 09:51:17 -05:00
96bf50a2c0 [ROCm] Serving Fails on Radeon Due to AITER Dtype Import (#30952) vllmellm 2025-12-18 19:47:46 +08:00
f90d3636e2 [Bugfix][CPU] Fix Mac CPU build (#30955) Li, Jiang 2025-12-18 17:38:22 +08:00
8372be2828 [moe] Use enable_chunking func (to support disabling chunking) (#29935) Ming Yang 2025-12-18 01:02:38 -08:00
8da6ae49c3 [ROCm][Bugfix] Fix fa_version argument error in flash_attn_maxseqlen_wrapper for ROCm without aiter (#30909) Andreas Karatzas 2025-12-18 02:45:51 -06:00
2c0ee0fde8 [BugFix] Partial revert of #29558 (DeepEP HT + PIECEWISE CG support) (#30910) Lucas Wilkinson 2025-12-18 02:50:15 -05:00
30bb19a760 [BugFix] Partial revert of #29558 (DeepEP HT + PIECEWISE CG support) (#30910) Lucas Wilkinson 2025-12-18 02:50:15 -05:00
aa7e836055 [Bugfix] Fix Unicode issues in GLM-4 tool calling (#30920) Chauncey 2025-12-18 15:12:17 +08:00
be2ad5f920 [ROCm][Bugfix] fix(structured_output): Skip guidance backend for schemas with patternProperties (#30730) Andreas Karatzas 2025-12-18 01:04:57 -06:00
a85724bd6e [Platform] Let EPD work with non-cuda platform (#30225) wangxiyuan 2025-12-18 14:45:29 +08:00
11a89cf95c [Fix][FlexAttention] return max logical block index to handle reused blocks (#30915) Yifan Qiao 2025-12-17 22:42:21 -08:00
e3ab93c896 [CPU] Refactor CPU fused MOE (#30531) Li, Jiang 2025-12-18 14:36:49 +08:00
fc2ae6d617 fix: add warmup for audio preprocessing (#30706) Nathan Price 2025-12-18 00:12:29 -06:00
ec965569d9 [KV connector][LMCache] Only record the cuda event when there are request to store/load (#30814) Yihua Cheng 2025-12-17 21:31:34 -08:00
82dc338ad6 [AMD][CI] fix lm eval ci arg (#30911) Divakar Verma 2025-12-17 23:18:26 -06:00
717ac33d9c [PERF] Qwen3-next. Add fp8 cutlass MoE tuned configs. chmod -x *MI308X.json (#29553) Vadim Gimpelson 2025-12-18 09:16:04 +04:00
cfb7e55515 [Doc][CPU] Update CPU doc (#30765) Li, Jiang 2025-12-18 12:59:09 +08:00
b166ef20e1 [refactor] Add prefix support to embed_tokens in DeepSeek MTP (#30788) zzhxxx 2025-12-18 12:45:56 +08:00
5f2f3fba1d [compile] Fix CI for test_gpt2_cache_hit (#30902) Zhengxu Chen 2025-12-17 23:22:23 -05:00
4a8412f773 [UX] Reduce DeepGEMM warmup log output to single progress bar (#30903) Matthew Bonanni 2025-12-17 23:21:51 -05:00
0c738b58bc [Quantization] Support Quark int4-fp8 w4a8 for MoE (#30071) Bowen Bao 2025-12-17 20:20:42 -08:00
55f1fc1b1b [v1] Add PrefixLM support to TritonAttention backend (#30386) (tag: v0.13.0rc4) Isotr0py 2025-12-18 08:05:24 +08:00
17f3988094 [BugFix] Workspace allocation during profile run : DeepEPHighThroughput + DeepGEMM (#30899) Varun Sundar Rabindranath 2025-12-17 18:00:59 -05:00
682c38583c [CI][Bugfix] Fix flaky tests/entrypoints/openai/test_audio.py::test_chat_streaming_audio (#30878) Nicolò Lucchesi 2025-12-17 18:49:56 +01:00
5a3adf581e fused_moe_lora PDL improvements (#30716) gnovack 2025-12-17 19:55:00 -08:00
6fe5887652 [Chore] Remove v0 dead code for Qwen2.5-omni (#30883) Isotr0py 2025-12-18 11:54:39 +08:00
bc3700e0cd [NIXL] Support P tensor-parallel-size > D tensor-parallel-size (#27274) Nicolò Lucchesi 2025-12-18 04:53:30 +01:00
fd8afdf38d [ROCm][CI] Reduce Flakiness For test_async_scheduling Using ROCM_ATTN With FP32 (#30811) Micah Williamson 2025-12-17 20:27:37 -06:00
a0b782f9cc [Metrics] Model FLOPs Utilization estimation (#30738) SungMinCho 2025-12-17 17:40:51 -08:00
ed2897f336 [CI][Feature] Adds auto-rebase PR rule (#30875) Rafael Vasquez 2025-12-17 19:46:44 -05:00
74a1ac38b0 [v1] Add PrefixLM support to TritonAttention backend (#30386) Isotr0py 2025-12-18 08:05:24 +08:00
05a83dc6ee feat(api): Eager chat template warmup to eliminate first-request latency (#30700) Nathan Price 2025-12-17 18:01:29 -06:00
e3fc374a9a [BugFix] Workspace allocation during profile run : DeepEPHighThroughput + DeepGEMM (#30899) Varun Sundar Rabindranath 2025-12-17 18:00:59 -05:00
e06d0bf0aa 2.9.1 PyTorch release update (#28495) Andrey Talman 2025-12-17 15:20:22 -05:00
e3a0f21e6c [docs]: add ecosystem projects sr in docs/governance (#30844) Xunzhuo 2025-12-18 02:45:56 +08:00
7eb6cb6c18 [Attention] Update tests to remove deprecated env vars (#30563) Matthew Bonanni 2025-12-17 12:49:59 -05:00
9ca8cb38fd [CI][Bugfix] Fix flaky tests/entrypoints/openai/test_audio.py::test_chat_streaming_audio (#30878) Nicolò Lucchesi 2025-12-17 18:49:56 +01:00
2497228ad4 [Chore] Factor out logic for requesting initial memory (#30868) Cyrus Leung 2025-12-17 23:32:17 +08:00
196cdc3224 [Model] Gemma3: Support untied word embeddings (#30827) KimHyemin 2025-12-18 00:11:18 +09:00
b7b6a60aca Adapt the old parameter enable_thinking in chat_template_kwargs (#30852) 高鑫崧 2025-12-17 23:10:59 +08:00
9e67c4ce98 [Docs] fix function name (#30748) rongfu.leng 2025-12-17 20:14:45 +08:00
6e9dbcc50e [Fix] uniform decode batch check (#30747) Jialin Ouyang 2025-12-17 03:58:43 -08:00
6482e3895b chores: adjust the attn register param order (#30688) Hank_ 2025-12-17 19:58:16 +08:00
fb980eb2fd Fix lazy import (#30858) Harry Mellor 2025-12-17 11:33:50 +00:00
84896fda22 [Bugfix] deepseek-V3.2 self.weights_proj has no bias (#30841) baoqian426 2025-12-17 19:32:34 +08:00
4bf6c23668 [ci] Sync test areas yaml file with test-pipeline (#30862) Kevin H. Luu 2025-12-17 02:30:56 -08:00
9ad5b21710 [Refactor] [4/N] Move VLLM_SERVER_DEV endpoints into the serve directory (#30749) Chauncey 2025-12-17 18:27:30 +08:00
f284d7bd0c [Bug] Fix AttributeError: 'ColumnParallelLinear' object has no attribute weight_scale_inv (#30823) Wentao Ye 2025-12-17 05:00:35 -05:00
53cd7f868b [compile] Recompile graph module during Dynamo cache loading. (#30743) Zhengxu Chen 2025-12-17 05:00:12 -05:00
7b966ae2ba [Fix]Load kv-cache dtype from hf_quant_config.json automatically (fix for reverted PR) (#30785) danielafrimi 2025-12-17 11:56:38 +02:00
9db1db5949 [compile] Ignore VLLM_FORCE_AOT_LOAD from cache factors (#30809) Zhengxu Chen 2025-12-17 04:56:24 -05:00
177c391db2 [compile] Disable aot when eager backend is used. (#30810) Zhengxu Chen 2025-12-17 04:55:56 -05:00
519ef9a911 [UX] Make vllm bench serve discover model by default and use --input-len (#30816) Michael Goin 2025-12-17 04:55:30 -05:00
a100152288 [Kernels][FI] Skip trtllm attention when num_kv_heads=1 (#30842) Ye (Charlotte) Qi 2025-12-17 01:54:21 -08:00
4c054d89aa [Doc][ResponsesAPI] add documentation (#30840) Andrew Xia 2025-12-17 17:53:02 +08:00
f4e884f222 [NIXL][Bugfix] Fix NIXL/RDMA registration failure over CuMemAllocator (#29569) Sheng Lin 2025-12-17 17:52:58 +08:00
3b1d440ede CustomOp: grouped topk (#29575) Xinyu Chen 2025-12-17 17:43:00 +08:00
a9e15c21ef [Mamba] Removed disable cascade attn in MambaModelConfig (#30712) Asaf Joseph Gardin 2025-12-17 10:48:53 +02:00
20fda43151 [Bugfix][Frontend] Prevent IndexError in MiniMax M2 tool parser during streaming extraction (#30555) Robin 2025-12-17 16:37:57 +08:00
f124b56786 [XPU] fix broken fp8 online quantization for XPU platform (#30831) (tag: v0.13.0rc3) Yan Ma 2025-12-17 16:28:13 +08:00
4f735babb7 [XPU] fix broken fp8 online quantization for XPU platform (#30831) Yan Ma 2025-12-17 16:28:13 +08:00
d78e128b8b [Bugfix][CPU] Fix CPU backend ROPE dispatch for VL models (#30829) Li, Jiang 2025-12-17 15:25:12 +08:00
761b730dcb [BugFix] Fix memory spike in workspace allocation (#30744) Lucas Wilkinson 2025-12-16 09:46:22 -05:00
0cd5353644 [Bugfix][CPU] Fix CPU backend ROPE dispatch for VL models (#30829) Li, Jiang 2025-12-17 15:25:12 +08:00
d4d2751732 Update note comment for flashinfer attention warmup (#30711) Michael Goin 2025-12-17 00:29:03 -05:00
009a773828 bump up compressed tensors version to 0.13.0 (#30799) shanjiaz 2025-12-17 00:01:04 -05:00
44d3b1df3d [CI/Build] Fix compatibility between #30244 and #30396 (#30787) Cyrus Leung 2025-12-17 12:21:19 +08:00
bb5ac1fe38 [CPU] Add action to automatically label CPU related PRs (#30678) Fadi Arafeh 2025-12-17 04:21:07 +00:00
811cdf5197 Update model-hosting-container-standards to 0.1.10 (#30815) Michael Goin 2025-12-16 20:52:14 -05:00
f34eca5f01 [ROCm] [Bugfix] Fix torch sdpa hallucination (#30789) (tag: v0.13.0rc2) TJian 2025-12-17 07:32:43 +08:00
4cd332f3cf [CI] Skip ci failure test (#30804) Wentao Ye 2025-12-16 17:47:53 -05:00
16484d394c [Core][MM] Optimize encoder cache manager by operating with embeddings only (#30475) Roger Wang 2025-12-16 14:18:17 -08:00
e397bd6592 [CI/Build] Skip broken ViT backend functionality test tempoarily (#30782) Isotr0py 2025-12-16 22:45:25 +08:00
6a88d590bb [Bugfix] Fix broken ViT attention selection for Blackwell device (#30731) Isotr0py 2025-12-16 13:24:32 +08:00
ad8c073131 [CustomOp] Extract ApplyRotaryEmb as CustomOp and unify the dispatch logic (#29873) Shanshan Shen 2025-12-16 11:08:16 +08:00
f5db6385a1 Fix nemotron_nas intermediate_size computation (#30795) Grzegorz K. Karch 2025-12-17 02:06:28 +01:00
c0a88df7f7 [docker] Allow kv_connectors install to fail on arm64 (#30806) Amr Mahdi 2025-12-17 02:41:57 +02:00
e087fbc393 [MM] Pass FA version in ViT Attn (#30756) Nicolò Lucchesi 2025-12-17 00:54:45 +01:00
e80455ca8b Replace deprecated enable_fusion with fuse_norm_quant in test_rms_group_quant (#30817) Michael Goin 2025-12-16 18:40:47 -05:00
2410132bb1 [ROCm] [Bugfix] Fix torch sdpa hallucination (#30789) TJian 2025-12-17 07:32:43 +08:00
0a1ab1e565 [Perf][Kernels] Vectorize csrc/activations_kernels.cu (#29512) Michael Goin 2025-12-16 17:56:02 -05:00
b6ec077e05 [CI] Skip ci failure test (#30804) Wentao Ye 2025-12-16 17:47:53 -05:00
ce96857fdd [Kernel][Quantization][MoE] add marlin kernel support for turing (sm75) (#29901) Jinzhen Lin 2025-12-17 06:35:28 +08:00
eaa82a709a [Bugfix][DSV32] Fix overflow in topk. (#30754) Daniel Cámpora 2025-12-16 23:21:17 +01:00
f5f51e5931 [Core][MM] Optimize encoder cache manager by operating with embeddings only (#30475) Roger Wang 2025-12-16 14:18:17 -08:00
9fec0e13d5 [Attention] Cache attention metadata builds across hybrid KV-cache groups (#29627) Lucas Wilkinson 2025-12-16 17:10:16 -05:00
254a7f8fd6 [Perf] Do FP4 quant before All gather on flashinfer trtllmgen MOE (#30014) jiahanc 2025-12-16 13:01:48 -08:00
f21f5ea38c [Refactor] Small refactor for group topk (#30562) Wentao Ye 2025-12-16 14:50:59 -05:00
ca702a14dc [Frontend] Add max-completion-token option to transcription/translation endpoints (#30769) Nicolò Lucchesi 2025-12-16 20:36:49 +01:00
10ee1c64cf [CI] Generalize gsm8k test args and add Qwen3-Next MTP B200 test (#30723) Michael Goin 2025-12-16 14:28:34 -05:00
66c3537e5d [Docs][API] Remove warning about LoRARequest being internal-only (#30774) Mark McLoughlin 2025-12-16 16:35:46 +00:00
e1625498f4 Update where bytes_to_unicode is imported from (#30771) Harry Mellor 2025-12-16 16:05:01 +00:00
0b0acc758e Remove head_mask from Ultravox and Swin (#30764) Harry Mellor 2025-12-16 16:02:41 +00:00
af506fd76a Fix instantiation of HfHubHTTPError in LoRA test (#30768) Harry Mellor 2025-12-16 16:02:24 +00:00
ce12b407f2 [TRTLLM] Remove the MoE GEMM weight name change (#30713) Ming Yang 2025-12-16 08:01:38 -08:00
