Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

66a2209645 [Hardware] Replace torch.cuda.synchronize() api with torch.accelerator.synchronize (#36085) Kunshang Ji 2026-03-05 18:36:39 +08:00
0bfa229bf1 [Release] Include source distribution (sdist) in PyPI uploads (#35136) Doug Smith 2026-03-05 04:43:50 -05:00
7493c51c55 [Docs] add Dynamo/aibrix integration and kubeai/aks link (#32767) Paco Xu 2026-03-05 17:39:50 +08:00
ac773bbe80 [Docs] Update docs to include mm processor + encoder benchmarks (#34083) Reagan Lee 2026-03-05 01:38:25 -08:00
48e376a007 qwen3coder tool parser fix anyOf double encoded parameters (#36032) Christian Munley 2026-03-05 01:06:57 -08:00
21eb2c3372 [Chore] Correct MTP models test registry ordering (#36115) Isotr0py 2026-03-05 16:55:04 +08:00
e2b31243c0 [Docs] Update CacheConfig block_size docstring to remove inaccurate limit when using CUDA (#35632) Seiji Eicher 2026-03-04 22:24:08 -08:00
c3598d02fa [Misc] Remove deprecated items that are due for removal (#36006) Martin Hickey 2026-03-05 06:14:50 +00:00
57c629e9c1 [Bugfix] Fix block_size for hybrid model MTP (#36036) Benjamin Chislett 2026-03-05 01:10:54 -05:00
d106bf39f5 [Doc] Add Parallel Draft Models (#35973) zihaoanllm 2026-03-05 13:44:07 +08:00
b0651021e5 [Kernel] [Helion] [11/N] Retune configs for silu_mul_fp8 (#36062) Yanan Cao 2026-03-04 21:25:59 -08:00
f600d5192e [Bugfix] Fix score layer quantization for sequence classification models - Qwen3 (VL) Reranker (#35849) Hanjun Cho 2026-03-05 13:57:20 +09:00
8e7820131e [Perf] Use dummy M for weight prepacking on x86 (#35890) Tianmu Li 2026-03-04 20:56:49 -08:00
0a12cea25f Order config.py in Lexicographical order (#35866) Andrii Skliar 2026-03-05 05:56:47 +01:00
dd6dbd93f8 [compile] Fix extra cache save on warm start. (#35921) Zhengxu Chen 2026-03-04 23:56:30 -05:00
26366009c5 [CI] Don't leave docs preview comment on closed PRs (#36087) Harry Mellor 2026-03-05 04:51:46 +00:00
16c472abe7 [Core] Move ray-specific WorkerWrapperBase methods to RayWorkerWrapper (#35328) Nick Hill 2026-03-04 20:11:59 -08:00
3b23d57c96 [Model] Add LoRA support for Whisper models (#29856) daje0601 2026-03-05 11:38:25 +09:00
2f4226fe52 [CI] Fix pre-commit mypy issue in main (#36049) Wentao Ye 2026-03-04 21:13:12 -05:00
792cbd64ca Add platform method to enable custom collective ops registration (#34760) nkm-meta 2026-03-04 16:50:32 -08:00
2ed4722e26 [compile] Reduce log spam from compile. (#36044) Zhengxu Chen 2026-03-04 19:48:36 -05:00
a3299c3d1d [Model Runner V2] Misc code simplification (#35941) Nick Hill 2026-03-04 15:26:35 -08:00
6c21a0c2d7 [ROCm][CI] Added MI325 mirrors (stage C) (#35239) Andreas Karatzas 2026-03-04 16:48:46 -06:00
562339abc3 [Misc] Support OOT linear method registering (#35981) Shanshan Shen 2026-03-05 06:25:56 +08:00
d7adcadb9b [Bugfix] Fix passing of activation_type to trtllm fused MoE NVFP4 and FP8 (#36017) amitz-nv 2026-03-05 00:23:51 +02:00
f678c3f61a [RL] [Weight Sync] Guard IPC update-info pickle deserialization behind insecure serialization flag (#35928) Simon Mo 2026-03-04 14:05:32 -08:00
be0a3f7570 [Bugfix] Fix race in non-blocking num_accepted_tokens GPU->CPU copy (#36013) Thomas Parnell 2026-03-04 22:52:44 +01:00
17dc9c7fc9 [CI] Bump mypy version (#34950) Harry Mellor 2026-03-04 20:55:11 +00:00
7eca859110 Add PyTorch profiler schedule support with warmup/active iterations (#35240) fenypatel99 2026-03-04 12:53:38 -08:00
636ee223ac [Docs] Document security risks of GPT-OSS Python tool (#35139) Russell Bryant 2026-03-04 15:27:31 -05:00
b7d59ffce2 [UX] Remove NoOpOffloader log (#35678) Robert Shaw 2026-03-04 15:13:40 -05:00
5569f5218d [torch.compile] Stop lazily compiling (#35472) Richard Zou 2026-03-04 15:13:17 -05:00
138d891d7f [Docs] Clarify structured outputs configuration for Qwen3 reasoning mode (#32441) Davina Zaman 2026-03-04 11:44:39 -08:00
d7166e74c1 [CI] Add Blackwell AsyncTP correctness test (#35871) Stefano Castagnetta 2026-03-04 20:41:21 +01:00
417fd28fb1 [Model Runner V2] Fix pooling (#36019) Nick Hill 2026-03-04 10:53:17 -08:00
7faba503c4 [Kernel][Mamba] Optimize Mamba2 SSD prefill Triton kernels (#35397) tomeras91 2026-03-04 20:47:17 +02:00
bc6be89d16 [Frontend] Add vllm launch command for GPU-less preprocessing serving (#34551) Hyunkyun Moon 2026-03-05 03:41:52 +09:00
32224f568a docs: update CPU Docker images to reference Docker Hub instead of AWS ECR (#34882) Maxime Grenu 2026-03-04 19:31:35 +01:00
f3dc292e9f docs: add version requirement note for --profiler-config flag (#32454) Abhishek Mathukiya 2026-03-04 13:13:54 -05:00
138c5fa186 [Docs] Add RunPod GPU deployment guide for vLLM (#34531) Chen 2026-03-04 12:11:34 -06:00
2f2c1d73a7 [Docs] Upgrade dynamic LoRA warning to admonition block (#35218) Russell Bryant 2026-03-04 13:01:42 -05:00
fb3e78ab09 [Feature][CI]: compare func & no_func outputs in test_functionalization.py (#35481) Bhuminjay Soni 2026-03-04 23:31:16 +05:30
fd3bfe74c9 [Docs] Update design/multiprocessing.md (#30677) Michael Yao 2026-03-05 01:58:59 +08:00
bfdb512f11 fix minicpmo4.5: fix attn_mask in vit attn && fix resampler pos_emb i… (#34127) tc-mb 2026-03-05 01:46:17 +08:00
d25c1ec3c9 docs(cpu): Clarify pre-built wheels requirement for CPU Python-only build (#35090) Sage 2026-03-04 19:45:35 +02:00
7cc6058ac6 [Doc] Add MTP docs and update speculative decoding guidance (#35197) Xing Liu 2026-03-05 01:23:34 +08:00
28028dff2f fix(docs): use static rdzv backend in multi-node troubleshooting script (#34784) Manrique Vargas 2026-03-04 12:15:35 -05:00
3417ba5648 docs: add README for logits_processor examples (#35933) Dr Alex Mitre 2026-03-04 11:09:19 -06:00
58cfe0dc44 Fix phi4-mm and remove cuda binding (#35964) Yan Ma 2026-03-05 01:08:05 +08:00
e86221deb6 [Doc] Fix GPU Worker count in Process Count Summary (#36000) simone-dotolo 2026-03-04 18:03:14 +01:00
289fc48ab7 Use MMEncoderAttention (=use FlashAttention) instead of torch.sdpa in radio.py (#35653) Netanel Haber 2026-03-04 18:43:13 +02:00
2f2212e6cc Split generic IO Processor plugins tests from Terratorch specific ones (#35756) Christian Pinto 2026-03-04 16:01:03 +00:00
18e01a0a10 [Misc] Add --attention-backend auto option (#35738) Nicolò Lucchesi 2026-03-04 16:12:27 +01:00
6cb901093f [Core] Add All-to-All communication backend for DCP (#34883) sungsoo ha 2026-03-04 07:01:57 -08:00
ead7bde1ab [Bugfix] Make kaldi_native_fbank optional (#35996) Cyrus Leung 2026-03-04 22:47:32 +08:00
6aa6ad8992 [BugFix] Fix implicit and incorrect assumption on ECConnector is_producer (#34783) Qi Wang 2026-03-04 06:01:30 -08:00
c8c3935b70 [Bugfix][Model] Fix FP8 k_scale/v_scale not loaded for Qwen3-MoE (#35656) Raghavan 2026-03-04 18:45:38 +05:30
bb6888b8b1 [Bugfix][CPUOffloadingManager] Prevent eviction of already-stored blocks in LRU/ARC prepare_store() (#35846) Ronen Schaffer 2026-03-04 14:25:33 +02:00
1aaec59d79 [MISC] fixed tool_parser mypy errors (#35640) Taneem Ibrahim 2026-03-04 06:23:12 -06:00
1659b2e058 [Feature] Add basic metrics for /realtime endpoint (#35500) pougetat 2026-03-04 03:56:32 -08:00
d6e04f4c43 [Bugfix] Cap FULL decode cudagraph sizes for Mamba/hybrid models (#34094) (#34571) haosdent 2026-03-04 18:56:22 +08:00
a8f66cbde8 [XPU] bump vllm-xpu-kernels to v0.1.3 (#35984) Kunshang Ji 2026-03-04 18:23:31 +08:00
16d2ad1d38 [Hardware] Replace torch.cuda.empty_cache with torch.accelerator.empty_cache (#30681) Kunshang Ji 2026-03-04 17:49:47 +08:00
5dc3538736 [ROCm][Bugfix] Fall back from CK MXFP4 MoE when GEMM dimensions are unsupported (#35893) Chuan (Richard) Li 2026-03-04 00:30:54 -08:00
36bf213181 [Bugfix] Add missing dynamic_arg_dims for Qwen3-ASR torch.compile (#35869) Nathan Price 2026-03-04 02:29:01 -06:00
6f0dd93801 [Core] Remove busy loop from idle buffer readers (#28053) Joe Runde 2026-03-04 00:44:20 -07:00
5d199ac8f2 Support Audio Extraction from MP4 Video for Nemotron Nano VL (#35539) Andrii Skliar 2026-03-04 08:20:33 +01:00
9e0f44bec4 [cohere][fix][spec-decode]: fix crash when allowed_token_ids is set without penalties (#35654) Komal Kumar Teru 2026-03-04 12:50:15 +05:30
097eb544e9 [Bugfix] Improve engine ready timeout error message (#35616) (tag: v0.17.0rc0) lailoo 2026-03-04 13:54:32 +08:00
7cdba98edf [BugFix] Support tool_choice=none in the Anthropic API (#35835) ShiJie Zhong 2026-03-04 13:24:46 +08:00
3c85cd9d74 [Rocm][CI] Fix ROCm LM Eval Large Models (8 Card) (#35913) Charlie Fu 2026-03-03 22:50:13 -06:00
edba15045a [Bugfix] Guard mm_token_type_ids kwarg in get_mrope_input_positions (#35711) Andreas Karatzas 2026-03-03 22:12:51 -06:00
e379396167 [Refactor] Clean up processor kwargs extraction (#35872) Cyrus Leung 2026-03-04 11:53:53 +08:00
6e9f21e8a2 [Chore] Remove debug code in model implementation (#35883) Isotr0py 2026-03-04 11:50:58 +08:00
c1d963403c [model] support FireRedASR2 (#35727) AllenDou 2026-03-04 11:41:30 +08:00
77e6dcbbfa [PluggableLayer][MM] Add PluggableLayer for RelPosAttention (#33753) Shanshan Shen 2026-03-04 11:41:27 +08:00
70c73df69e [Bugfix] Fix EVS implementation for Qwen3 VL (#33607) William Zhang 2026-03-03 18:18:11 -08:00
9a9d442464 Enable bnb for multiple indices weight (#35838) xjx 2026-03-04 09:46:47 +08:00
f7da9cdffc [ROCm][CI] Support async weight transfer example with platform-aware determinism (#35710) Andreas Karatzas 2026-03-03 19:44:14 -06:00
f22ff2958c [Bugfix] Fix coord_socket assertion in DPEngineCoreProc for offline DP mode (#35916) Jaewon 2026-03-03 16:10:11 -08:00
d15c3b90fc [Core] Move save_tensorized_model logic to Worker (#35825) Nick Hill 2026-03-03 15:31:59 -08:00
97286a20ed [Model Runner V2] support dp & ep for spec decoding (#35294) zhrrr 2026-03-04 07:19:45 +08:00
12b38c0f45 [CI/Build] Allow mounting AWS credentials for sccache S3 auth (#35912) Amr Mahdi 2026-03-03 14:30:47 -08:00
467886a0c4 [Model Runner V2] Fix inputs_embeds=None bug for MM models (#35917) Woosuk Kwon 2026-03-03 13:47:45 -08:00
a9b8b13e5c [Bugfix] Fix misnamed parameter in compressed_tensors_moe.py (#35813) bnellnm 2026-03-03 16:29:57 -05:00
e7213003cb [ROCm][CI] Fix TP size issue for test_gpt_oss (#35887) Micah Williamson 2026-03-03 14:57:34 -06:00
3a8eef5869 [ROCm][Bugfix]: Disable AITER Triton ROPE by default (#35601) Rohan Potdar 2026-03-03 13:43:56 -06:00
97995f6376 [MoE Refactor] Create MK for TRTLLM Kernels (#32564) Robert Shaw 2026-03-03 13:39:50 -05:00
881a6b011b [CI] Temporarily Disable Llama4 MoE Refactor Test (#35870) Robert Shaw 2026-03-03 13:36:15 -05:00
8e1fd5baf0 [CI] Bump num_speculative_tokens to 3 in nightly DeepSeek tests (#35882) Matthew Bonanni 2026-03-03 12:26:44 -05:00
ae88468bcc fix: Ensure invalid audio files return 400 error (#34715) JasonCohere 2026-03-03 16:47:39 +00:00
e05cb3b93e TRTLLM gen-full attn Test Coverage (#34986) ojhaanshika 2026-03-03 08:35:34 -08:00
28ef9ba399 [BugFix] Add support for MTP num_speculative_tokens > 1 with sparse MLA (#34552) Lucas Wilkinson 2026-03-03 10:21:57 -05:00
fb7fdc49c4 [ROCm] [CI] Add new fusion test cases that are relevant to vLLM IR Ops (#34307) TJian 2026-03-03 22:24:21 +08:00
ea463978bb [Frontend][1/n] Improve pooling entrypoints | classify. (#35604) wang.yuqi 2026-03-03 22:05:36 +08:00
440f0e7dc6 [Bugfix] Avoid src/dst as None in irecv/isend_tensor_dict (#35754) Li, Jiang 2026-03-03 21:56:08 +08:00
fd4a90f337 [CI] And PPL test for Qwen3.5. (#35853) wang.yuqi 2026-03-03 21:15:51 +08:00
ad9d09e2b8 [Perf] [Hybrid] Copy num_accepted_tokens in non-blocking way when not using prefix caching (#35442) Thomas Parnell 2026-03-03 13:15:43 +01:00
4beebfd146 [CI/Build][Intel] Add new performance benchmarks for Intel Gaudi 3 (#31025) Szymon Reginis 2026-03-03 12:48:24 +01:00
b8401cde0e add regression test (#35834) hallerite 2026-03-02 23:32:15 -08:00