Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

a676e668ee [Bugfix] fix apply_temperature to avoid nan in probs (#24734) courage17340 2025-09-25 13:32:21 +08:00
c85be1f6dd optimize: eliminate duplicate split_enc_dec_inputs calls (#25573) Nicole LiHui 🥜 2025-09-25 13:03:25 +08:00
845adb3ec6 [Model] Add LongCat-Flash (#23991) XuruiYang 2025-09-25 12:53:40 +08:00
90b139cfff Enable Fbgemm NVFP4 on Dense models (#25609) Saman A. Pour 2025-09-24 21:12:53 -07:00
4492e3a554 [Bug] Dynamo Unsupported due to BasevLLMParameter.torch_function calling disabled super() (#25613) Wentao Ye 2025-09-24 21:52:52 -04:00
05c19485a5 [Kernel] Support DCP for Triton backend (#25132) Wei Wei 2025-09-24 18:09:34 -07:00
52d0cb8458 [Model] Improve DotsOCRForCausalLM (#25466) Jee Jee Li 2025-09-25 07:58:08 +08:00
5c1e496a75 [MISC] replace c10::optional with std::optional (#25602) Shiyan Deng 2025-09-24 16:56:21 -07:00
e7f27ea648 Improve --help for enhanced user experience (#24903) Harry Mellor 2025-09-25 00:08:18 +01:00
1f29141258 [Refactor] Use DeepGEMM Col Major TMA Aligned Tensor (#25517) Wentao Ye 2025-09-24 18:52:36 -04:00
6160ba4151 feat: BF16 FlashInfer Fused Cutlass MOE for Hopper and Blackwell Expert Parallel (#25503) Duncan Moss 2025-09-24 15:50:04 -07:00
fea8006062 [Logging] Improve log for when DeepEP HT disables CUDA Graphs (#25531) Tyler Michael Smith 2025-09-24 18:43:06 -04:00
e6750d0b18 [V0 Deprecation] Remove unused classes in attention (#25541) Woosuk Kwon 2025-09-24 13:24:40 -07:00
8c853050e7 [Docs] Enable fail_on_warning for the docs build in CI (#25580) Harry Mellor 2025-09-24 20:30:33 +01:00
f84a472a03 Suppress benign cuBLAS warning when capturing cudagraphs with DBO (#25596) Sage Moore 2025-09-24 12:02:08 -07:00
54e42b72db Support mnnvl all2allv from Flashinfer (#21003) Shu Wang 2025-09-24 13:38:16 -05:00
2dda3e35d0 [Bugfix] add cache model when from object storage get model (#24764) rongfu.leng 2025-09-25 02:11:16 +08:00
d83f3f7cb3 Fixes and updates to bench_per_token_quant_fp8 (#25591) Michael Goin 2025-09-24 11:30:15 -04:00
302eb941f3 [ROCm][Build][Bugfix] Fix ROCm base docker whls installation order (#25415) Gregory Shtrasberg 2025-09-24 11:25:10 -04:00
487745ff49 [ROCm][Bugfix] Only enable +rms_norm based on aiter if not explicitly disabled (#25275) Gregory Shtrasberg 2025-09-24 11:24:39 -04:00
9313be5017 [Misc] Improve type annotations for jsontree (#25577) Cyrus Leung 2025-09-24 22:49:58 +08:00
8938774c79 Move DeviceConfig, ObservabilityConfig, SpeechToTextConfig to their own files (#25564) Harry Mellor 2025-09-24 14:59:05 +01:00
e18b714b2e [Bugfix] Fix DeepSeekV31ToolParser to correctly parse multiple tools in non-streaming output (#25405) Tao Hui 2025-09-24 20:58:00 +08:00
b1068903fd [docs] fix nixl kv_connector_extra_config.backends key (#25565) Peter Pan 2025-09-24 19:00:27 +08:00
164299500b [Benchmark] Fix regression in structured output benchmark (#25500) Russell Bryant 2025-09-24 06:40:42 -04:00
58c360d9be [Bug] fix import and unit test (#25558) Jonas M. Kübler 2025-09-24 12:17:59 +02:00
42488dae69 [Bugfix] Fix dummy video number of frames calculation (#25553) Roger Wang 2025-09-24 02:47:30 -07:00
b67dece2d8 [misc] update the warning message (#25566) youkaichao 2025-09-24 17:24:35 +08:00
2338daffd3 [BugFix] Potential Fix for FA3 full-cudagraph IMA (#25490) Lucas Wilkinson 2025-09-24 05:04:04 -04:00
2e19a848d4 [V0 Deprecation] Remove max_seq_len_to_capture (#25543) Woosuk Kwon 2025-09-24 01:51:39 -07:00
77a7fce1bb [CI/Build] add nightly prime-rl integration tests (#25207) Jackmin801 2025-09-24 01:44:22 -07:00
6488f3481b [Misc]] Move processing context to multimodal directory (#25548) Cyrus Leung 2025-09-24 16:15:00 +08:00
27ec3c78f3 [CI/Build] Fix v1 OOT registration test (#25547) Isotr0py 2025-09-24 16:03:13 +08:00
1cbcfb94de [Bugfix][CPU] Skip unsupported custom op register on CPU (#25534) Li, Jiang 2025-09-24 14:21:51 +08:00
fed8a9b107 [Misc] Retry HF processing if "Already borrowed" error occurs (#25535) Cyrus Leung 2025-09-24 13:32:11 +08:00
190c45a6af [TPU][Bugfix] fix the missing apply_model in tpu worker (#25526) Chengji Yao 2025-09-23 22:18:08 -07:00
5caaeb714c [Bugfix] [Frontend] Cleanup gpt-oss non-streaming chat tool calls (#25514) Ben Browning 2025-09-23 23:20:38 -04:00
d747c2ef18 [Perf] Fix jit compiles at runtime of fla gated delta rule (#25432) Corey Lowman 2025-09-23 23:16:13 -04:00
c30b405b8f [Spec Decode] Enable FlashInfer Spec Decoding (#25196) Benjamin Chislett 2025-09-23 22:29:58 -04:00
77d906995c [KV sharing] Re-land Gemma3n model changes from #22628 (#24357) Yong Hoon Shin 2025-09-23 19:25:34 -07:00
359d293006 [fix]: add Arm 4bit fused moe support (#23809) Nikhil Gupta 2025-09-24 02:32:22 +01:00
9df8da548e [BugFix] Fix MLA assert with CUTLASS MLA (#25478) Lucas Wilkinson 2025-09-23 21:09:43 -04:00
bf68fd76a9 [Compile] Fix AMD Compile Error (#25518) Wentao Ye 2025-09-23 20:42:48 -04:00
de94289a98 [Core] Support weight_loader_v2 for UnquantizedLinearMethod (#23036) Kyle Sayers 2025-09-24 01:30:26 +01:00
1983609239 [Bugfix] Use a separate FlashInfer workspace buffer for trtllm-gen (#25520) Benjamin Chislett 2025-09-23 20:19:56 -04:00
d06b5a95cb [V1][Metrics] Add per-request TPOT histogram (#24015) baxingpiaochong 2025-09-24 08:19:04 +08:00
be0bb568c9 [Model] Support SeedOss Reason Parser (#24263) 0xNullPath 2025-09-24 08:15:51 +08:00
c8bde93367 [BUG] Allows for RunAI Streamer and Torch.compile cache to be used together (#24922) ahao-anyscale 2025-09-23 17:13:32 -07:00
88d7bdbd23 [Bug] Fix AttributeError: 'FusedMoE' object has no attribute 'w13_weight_scale'. Did you mean: 'w13_weight_scale_inv' (#25519) Wentao Ye 2025-09-23 20:07:51 -04:00
0d235b874a Add CUTLASS FP8 MOE benchmark scripts and kernel config (#25302) Chenxi Yang 2025-09-23 17:07:42 -07:00
7ad5e50adf Improve output when failing json.loads() on structured output test (#25483) Doug Smith 2025-09-23 20:03:31 -04:00
dc464a3d39 [BugFix] AssertionError: Do not capture num_reqs > max_num_reqs for uniform batch (#25505) Lucas Wilkinson 2025-09-23 20:00:29 -04:00
1210e4d95b [Bugfix] [B200] cutlass_mla - ensure kv_split == 1 for batch size > 1 (#25509) Alexander Matveev 2025-09-23 19:57:55 -04:00
e0b24ea030 [Perf] Increase default max splits for FA3 full cudagraphs (#25495) Lucas Wilkinson 2025-09-23 19:53:34 -04:00
bde2a1a8a4 [ROCm] Small functional changes for gptoss (#25201) Juan Villamizar 2025-09-23 18:39:50 -05:00
5e25b12236 [Kernel] [Mamba] Remove BLOCK_H=1 from list of tuneable configurations for _chunk_cumsum_fwd_kernel (#25197) Thomas Parnell 2025-09-24 01:23:30 +02:00
c85d75cf08 Add VLLM_NVTX_SCOPES_FOR_PROFILING=1 to enable nvtx.annotate scopes (#25501) Corey Lowman 2025-09-23 18:50:09 -04:00
abad204be6 [BugFix] Fix OOM in vLLM replicas by ensuring consistent NCCL memory accounting (#25359) kourosh hakhamaneshi 2025-09-23 15:49:09 -07:00
7361ab379f Remove redundant mutates_args and dispatch_key for direct_register_custom_op (#25512) Michael Goin 2025-09-23 18:48:40 -04:00
95bc60e4cb [gpt-oss][bugfix] remove logic to require resp_ in ResponseAPI (#25428) Andrew Xia 2025-09-23 15:46:46 -07:00
4f2954f724 Fix triton_reshape_and_cache_flash.py triton import (#25522) Michael Goin 2025-09-23 18:26:10 -04:00
eca7be9077 Add VLLM_ENABLE_INDUCTOR_MAX_AUTOTUNE & VLLM_ENABLE_INDUCTOR_COORDINA… (#25493) rouchenzi 2025-09-23 15:17:49 -07:00
969b4da3a6 [V0 Deprecation] Remove placeholder attn (#25510) Thomas Parnell 2025-09-24 00:12:14 +02:00
4f8c4b890a [Core] Use KVCacheBlock as much as possible instead of dict[block_id, KVCacheBlock] (#24830) Jialin Ouyang 2025-09-23 15:11:14 -07:00
ae002924e9 [CI/Build] Fix and re-enable v1 PP test on CI (#25496) Isotr0py 2025-09-24 05:58:25 +08:00
690f948e4a [Bugfix] Fix for the import error from #24588 (#25481) Gregory Shtrasberg 2025-09-23 17:31:08 -04:00
08275ec0a2 [Build] Update Xgrammar to 0.1.25 (#25467) Chauncey 2025-09-24 05:25:46 +08:00
c828d1bf98 [Bugfix] gpt-oss container tool output bug (#25485) Alec S 2025-09-23 16:43:45 -04:00
8b8a8afc89 [CI] Fix Pre-commit Issue (#25497) Wentao Ye 2025-09-23 16:09:37 -04:00
8bdd8b5c51 Enable symmetric memory all reduce by default only enabling for TP (#25070) Ilya Markov 2025-09-23 21:53:00 +02:00
a8ffc4f0f2 [Bugfix] Lower gpt-oss max cudagraph size to 992 to be compatible with FA3 (#25508) Michael Goin 2025-09-23 15:49:55 -04:00
d5944d5146 [Speculators][Speculative Decoding] Fix gpt-oss eagle3 accuracy issue (#25406) jiahanc 2025-09-23 12:44:35 -07:00
24fab45d96 [Perf] Change default CUDAGraphMode from PIECEWISE to FULL_AND_PIECEWISE (#25444) Michael Goin 2025-09-23 15:29:26 -04:00
63400259d0 [Performance] Move apply_w8a8_block_fp8_linear to an op class (#24666) ElizaWszola 2025-09-23 21:03:10 +02:00
8c1c81a3de [core] add nccl symmetric memory for all reduce (#24532) Amir Samani 2025-09-23 11:33:06 -07:00
a3a7828010 [ROCm] Add skinny gemm bias support for dtypes fp16,bf16,fp8 (#24988) Hashem Hashemi 2025-09-23 11:31:45 -07:00
5abb117901 [Core] Ensure LoRA linear respect the base_layer's tp_size and tp_rank (#25487) Jee Jee Li 2025-09-24 02:19:25 +08:00
867ecdd1c8 [Spec Decode][CI] Add e2e test for examples/spec_decode.py and prevent breaking Acceptance Length (#24531) Ekagra Ranjan 2025-09-23 13:46:40 -04:00
24e8222745 [Misc] Reduce initialization time of auto_tune (#23682) Weida Hong 2025-09-24 01:34:58 +08:00
100b630a60 [V1][Kernel] Add triton implementation for reshape_and_cache_flash (#24503) Burkhard Ringlein 2025-09-23 18:52:40 +02:00
527821d191 Use macro guard CUDA functions for back compatibility in grouped_topk_kernel.cu (#25346) Ming Yang 2025-09-23 09:45:39 -07:00
846197f505 [Log] Optimize kv cache memory log from Bytes to GiB (#25204) Wentao Ye 2025-09-23 12:44:37 -04:00
2357480b1a [BugFix] Fix UB in per_token_group_quant.cu (#24913) rivos-shreeasish 2025-09-23 09:14:22 -07:00
f11e3c516b [Kernels] Support blocked fp8 quantization for compressed tensors MoE (#25219) bnellnm 2025-09-23 12:11:34 -04:00
875d6def90 Add backward compatibility for GuidedDecodingParams (#25422) Harry Mellor 2025-09-23 17:07:30 +01:00
cc1dc7ed6d [Core/DBO][2/N] Dual-Batch Overlap add DeepEP High Throughput support and Prefill support (#24845) Lucas Wilkinson 2025-09-23 12:02:10 -04:00
a903669e10 [V1] Remove V0 code paths for Hybrid models (#25400) Thomas Parnell 2025-09-23 17:26:13 +02:00
2c58742dff [UX] Change kv-cache-memory log level to debug (#25479) Michael Goin 2025-09-23 11:01:24 -04:00
4c966e440e [XPU] Fix MOE DP accuracy issue on XPU (#25465) Fanli Lin 2025-09-23 22:32:57 +08:00
da5e7e4329 [Docs] NixlConnector quickstart guide (#24249) Peter Pan 2025-09-23 22:23:22 +08:00
f05a4f0e34 [P/D] Support NIXL connector to disconnect during a clean shutdown (#24423) Chauncey 2025-09-23 22:08:02 +08:00
61d1b35561 [BugFix] Register expert_map as named buffer for wake_up and sleep (#25458) Joel 2025-09-23 21:49:13 +08:00
b6a136b58c [CI/Build] Fix disabled v1 attention backend selection test (#25471) Isotr0py 2025-09-23 21:05:46 +08:00
0d9fe260dd [docs] Benchmark Serving Incorrect Arg (#25474) vllmellm 2025-09-23 21:05:11 +08:00
273690a50a [Core] Optimize LoRA weight loading (#25403) Jee Jee Li 2025-09-23 18:19:45 +08:00
231c2c63e4 [Bugfix] Fix idefics3 tie_word_embeddings (#25454) Isotr0py 2025-09-23 18:06:48 +08:00
4322c553a6 [Test]: Hermes tool parser stream output error in Qwen3 case (#25203) Andreas Hartel 2025-09-23 11:56:31 +02:00
babad6e5dd [Misc] Move DP for ViT code inside model executor dir (#25459) Cyrus Leung 2025-09-23 17:20:52 +08:00
9383cd6f10 [Frontend] Add a new xml-based tool parser for qwen3-coder (#25028) Zhikaiiii 2025-09-23 16:07:27 +08:00
ba8d2165b6 Handle triton kernel import exception (#25319) Ming Yang 2025-09-23 00:56:00 -07:00
