Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

5dfc5abe94 [ROCm] [Release] Change the package from aiter to amd-aiter (#35198) TJian 2026-03-03 15:13:39 +08:00
8fa68a8ce4 Fix TYPE_CHECKING stub defaults in envs.py to match actual runtime defaults (#35645) lin-shh 2026-03-03 00:59:43 -05:00
35a6f0bfe2 [Misc] Fix typos in comments: explict→explicit, paramaters→parameters (#35648) lin-shh 2026-03-03 00:59:14 -05:00
3a6cbf16e2 [MISC] Removed unused function find_all_indices() from tool_parsers/utils.py (#35683) Taneem Ibrahim 2026-03-02 23:58:42 -06:00
f44d1ddc8c [BugFix] Fix cmake based incremental install (wrong vllm install dir) (#35773) Lucas Wilkinson 2026-03-03 00:58:16 -05:00
48a54c1e0d [CI/Build] Trigger processor tests on registry update (#35824) Cyrus Leung 2026-03-03 13:55:57 +08:00
8b9e8b7454 [ROCm][CI] Fix Assertion Logic For test_gpt_oss (#35806) Micah Williamson 2026-03-02 23:08:04 -06:00
c21d0039ec [Refactor] Fix maxsim cuda platform and add cli to control it (#35427) Wentao Ye 2026-03-02 23:48:31 -05:00
7d8bbe6f42 [CI/Build] Automatically patch video metadata for multimodal processor test (#35822) Isotr0py 2026-03-03 12:27:45 +08:00
25e02647c2 [Core] Add optional flags to check for repetitive token patterns in engine output (#35451) aykoppol 2026-03-02 20:23:25 -08:00
a0a5178ab4 [Model Runner V2] Use ModelState.prepare_attn() for cuda graph capture [5/N] (#35774) Woosuk Kwon 2026-03-02 20:06:27 -08:00
8ea8ba275e [V0 deprecation] Remove Swin model (#35821) Isotr0py 2026-03-03 12:03:41 +08:00
4f85bae9d6 [Docs][Model Runner V2] Add Design Docs (#35819) Woosuk Kwon 2026-03-02 19:58:14 -08:00
0a7165fd71 [ModelRunnerV2] Rename sampler functions and variables for clarity (#35459) Andy Lo 2026-03-03 04:48:56 +01:00
6521ccf286 [CI] Temporarily Disable Nightly Failures (#35770) Robert Shaw 2026-03-02 20:49:13 -05:00
8ebd872f50 [Tool Parser] Fix Qwen3Coder streaming parameter loss with speculative decode (#35615) Martin Vit 2026-03-03 02:40:37 +01:00
168ee03e1c [Model Runner V2][Perf] align dummy_run tokens to uniform decode for dp cudagraph (#35376) zhrrr 2026-03-03 09:10:47 +08:00
9dd656f0ea [XPU][NIXL] Add GPUDirect RDMA support for XPU (#35270) liuzhenwei 2026-03-03 08:42:49 +08:00
c8b678e53e [Model] Add support for nvidia/llama-nemotron-rerank-vl-1b-v2 (#35735) Jakub Zakrzewski 2026-03-03 01:32:14 +01:00
18c29c746b [ROCm][CI] Fix backslash-continuation in pytest marker re-quoting and treat exit code 5 as success (#35798) Andreas Karatzas 2026-03-02 18:07:51 -06:00
96fc09503a [All Reduce] Change default backend of Flashinfer All Reduce to trtllm (#35793) Hanjie Qiu 2026-03-02 18:57:38 -05:00
1b82b433fc [Bugfix] Fix MM processor test for Qwen3.5 (#35797) Roger Wang 2026-03-02 15:05:08 -08:00
9319044ee9 [MoE][Perf] Wrap DSV3 QKVAProj GEMM in custom op for torch.compile (#35751) Robert Shaw 2026-03-02 18:03:49 -05:00
c42dc402c1 clean unused cudagraph_batch_sizes (#35552) Boyuan Feng 2026-03-02 14:00:16 -08:00
fa6a6be519 [Bugfix] Fix missing sequence_lengths in qwen3_omni_moe_thinker (#35741) Ye (Charlotte) Qi 2026-03-02 13:11:56 -08:00
cad21918e3 [BUG] Fix rlhf_async example (#35788) Aaron Hao 2026-03-02 12:36:40 -08:00
53700bf49b [ci] Add Ray compatibility check informational CI job (#34672) Jeffrey Wang 2026-03-02 12:06:16 -08:00
a13d8c03c9 [KVConnector] Auto-downgrade to PIECEWISE cudagraph mode for layerwise async ops (#31057) Yashwant Bezawada 2026-03-02 14:04:47 -06:00
9433acb8df [Spec Decode] Add hidden states extraction system (#33736) Fynn Schmitt-Ulms 2026-03-02 14:29:09 -05:00
d1a6e96d9e [torch.compile] Improve cold and warm start compile tests (#35709) Richard Zou 2026-03-02 14:27:06 -05:00
2a9e3347e9 [BugFix][Model]Fix the garbled code in Ernie4.5-VL caused by fast_moe_cold_start (#35587) CSWYF3634076 2026-03-03 02:56:33 +08:00
cc0d565f40 [CI/Build] Enable Qwen3.5 tests on CI (#35763) Isotr0py 2026-03-03 01:43:53 +08:00
358e4d5ba7 [CI][HPU] Pin vllm commit compatible with vllm-gaudi - HPU tests (#35307) Patryk Wolsza 2026-03-02 18:02:26 +01:00
792a74b973 [Doc] Improve UX of --enable-log-requests (#35723) Cyrus Leung 2026-03-03 00:24:09 +08:00
4034c3d32e [Core] Move test utility to test file (#35672) Turner Jabbour 2026-03-02 08:56:03 -07:00
7560d674c9 [CI] Fix mypy for vllm/device allocator (#35518) Martin Hickey 2026-03-02 15:53:18 +00:00
d9c7730877 [Performance] Extract kv update ops from MLA attention backends (#34627) ElizaWszola 2026-03-02 16:43:19 +01:00
ada4f4fadd [Fix Bug]num_active_loras always equals to zero (#34119) Runkai Tao 2026-03-02 10:17:46 -05:00
7e9149d9a9 [Docs] Add breadcrumbs for better UX (#35749) Harry Mellor 2026-03-02 14:31:54 +00:00
87c98b0236 [MyPy][BugFix] Check profiler is assigned before calling start() on it (#35505) Martin Hickey 2026-03-02 13:23:42 +00:00
de7dd634b9 Fix unresolved-import errors when using Astral's ty by removing src.root (#35681) Tyler Michael Smith 2026-03-02 05:26:47 -05:00
9a87b0578f [Feat] Supports Anthropic Messages count_tokens API (#35588) Chauncey 2026-03-02 17:48:54 +08:00
510bc9e1df [Misc] Cleanup useless current_platform import (#35715) wangxiyuan 2026-03-02 17:36:54 +08:00
cbd361fd46 [CPU][Distributed] Fix Enable _CPUSHMDistributed only when TP/PP ranks share the same SHM group name (#34169) Charles Ashby 2026-03-02 04:34:35 -05:00
c212202d93 [Misc] Bound NIXL upper bound version (#35495) Nicolò Lucchesi 2026-03-02 09:57:07 +01:00
ec27b36b4b [CI] Defining extended V1 e2e + engine tests (#35580) Andreas Karatzas 2026-03-02 02:10:54 -06:00
3fd1d4ec2c [Rocm][CI] Fix LM Eval Large Models (H100) test group (#34750) Charlie Fu 2026-03-02 01:43:38 -06:00
cb21972a97 [Kernel] Integrate SM100 MXFP8 blockscaled grouped MM and quant kernels (#34448) EdalatiAli 2026-03-02 02:31:19 -05:00
c34963f138 [ROCm][CI] Disable skinny GEMMs in language model standard tests to fix non-determinism (#35152) Andreas Karatzas 2026-03-02 01:04:18 -06:00
f26650d649 [ROCm] add amd-quark package in requirements for rocm to use quantized models (#35658) Hongxia Yang 2026-03-02 01:02:43 -05:00
92f5d0f070 [XPU] fix mxfp4 activation type (#35691) Kunshang Ji 2026-03-02 11:48:39 +08:00
a60985b07e Fix deprecated v1 config tests (#35327) Jesse Cai 2026-03-01 17:32:03 -08:00
8b5014d3dd [Attention] FA4 integration (#32974) Lucas Wilkinson 2026-03-01 18:44:57 -05:00
57a96e26c9 Revert "[Bugfix] Disable TRTLLM attention with KV transfer enabled (#33192)" (#34832) zhanqiuhu 2026-03-01 17:32:37 -05:00
e82fbeec7b [torch.compile] Undo the fast_moe_cold_start hack in torch>=2.11 (#35475) Richard Zou 2026-03-01 16:44:22 -05:00
6290470843 [Bugfix] Fix dtype mismatch in RMSNormGated.forward_native() during torch.compile (#35256) haosdent 2026-03-02 04:14:46 +08:00
72f4d16262 [Model Runner V2] Use block table apis for capture inputs (#35671) Woosuk Kwon 2026-03-01 10:31:13 -08:00
5a435507d8 fix(mxfp4): return is_monolithic=False when LoRA is enabled for Triton backend (#35382) Seungho Yoon 2026-03-01 23:59:30 +09:00
59d7af9c6c [MISC] Fixing a null reference by removing parallel_utils from mypy EXCLUDE (#35630) Taneem Ibrahim 2026-03-01 08:26:44 -06:00
bbf81f9a92 [Mamba1] - Kernel Level Chunk Alignment for Prefix Caching (#34798) Asaf Gardin 2026-03-01 14:40:23 +02:00
da543d1abe [Model Runner V2] Minor refactoring for EncoderRunner (#35628) Woosuk Kwon 2026-03-01 00:15:39 -08:00
87d319c52f [AMD][CI] Support Triton attention with ExampleConnector (#34931) Ryan Rock 2026-03-01 01:58:07 -06:00
a9ec392c86 Fix typo: implictly -> implicitly in isaac.py docstring (#35646) lin-shh 2026-03-01 02:34:37 -05:00
afd089f231 [Bugfix][Model] Fix Qwen3.5/Qwen3Next ignoring --dtype flag on older GPUs (#35617) lailoo 2026-03-01 11:27:37 +08:00
3ecd0bf9fc Add TMA support to fused_moe_lora kernel (#32195) gnovack 2026-02-28 18:55:25 -08:00
e3eb146f7a [Model Runner V2] Add ModelStateInterface [4/N] (#35621) Woosuk Kwon 2026-02-28 13:19:45 -08:00
95a395dbec [Bugfix] Fix Anthropic API base64 image handling in Messages endpoint (#35557) Martin Vit 2026-02-28 21:57:08 +01:00
e94b263bd6 [Chore] Cleanup BNB utilization dead code (#35620) Isotr0py 2026-03-01 03:22:41 +08:00
e113a30113 [Deprecation] Deprecate code in 0.17 as scheduled (#35441) Wentao Ye 2026-02-28 12:32:37 -05:00
1dafb29f91 [Benchmark] Avoid unnecessary video download in MMVU (#35618) Cyrus Leung 2026-03-01 01:07:02 +08:00
49b9ae32e9 [Fix] Avoid sending image input to other PP ranks (#35405) emricksini-h 2026-02-28 17:14:29 +01:00
63d7972f13 Fix Qwen3_5MTP packed_modules_mapping for gate_up_proj (#35581) cwazai 2026-02-28 22:50:55 +08:00
c68e69f144 custom dataset img support base64 (#35280) flutist 2026-02-28 19:49:52 +08:00
7e08c22b8c [Feat] Add CUDA torch fallbacks for fp8_mqa_logits/fp8_paged_mqa_logits_torch function (#35271) Chauncey 2026-02-28 18:12:00 +08:00
8e75d88554 add io_process_plugin for sparse embedding (#34214) Augusto Yao 2026-02-28 17:16:37 +08:00
0892d1ab1f [Feature]Supports Anthropic Thinking Block (#33671) Mario Hong 2026-02-28 17:02:33 +08:00
7600642eae Add padding support to wvSplitK solution for skinny GEMMs (#33762) Hashem Hashemi 2026-02-28 01:02:05 -08:00
1e69c04887 [ROCm][CI] Parametrize vision score tests across attention backends with per-backend tolerances (#35571) Andreas Karatzas 2026-02-28 02:59:26 -06:00
4292e3b807 [Benchmark] Improve UX of sweep scripts (#35600) Cyrus Leung 2026-02-28 16:36:02 +08:00
24d6ea8afd [Benchmark] Rename SLA Finder to Workload Explorer (#35586) Cyrus Leung 2026-02-28 15:31:55 +08:00
57c86c0741 [Misc] Change logging level from info to debug for tool parser import (#35575) Chauncey 2026-02-28 14:51:35 +08:00
06254d4cbb [CI] add trainer_send_weights for MockWeightTransferEngine (#35589) Chauncey 2026-02-28 14:47:43 +08:00
f5d1281c9d [ROCm][CI] Expose tests to AMD production CI and fix amdsmi heap corruption (#35071) Andreas Karatzas 2026-02-27 23:57:31 -06:00
94029ffaf0 [ROCm] Derive device capability from GCN arch string without CUDA init (#35069) Andreas Karatzas 2026-02-27 23:55:28 -06:00
88e8525f2e [ROCm][CI] Adding infiniband mappings for moriio tests (#35170) Andreas Karatzas 2026-02-27 23:53:28 -06:00
b2d8b422b2 [EPLB] Enforce sync eplb for NCCL-based all2all backend (#35212) Ilya Markov 2026-02-28 06:47:12 +01:00
1d5ab5d603 [Bugfix] Move chat completion response_format validation to Pydantic model_validator (#35510) Umut Polat 2026-02-28 08:26:19 +03:00
7b346ba8ed [Bugfix] Propagate compilation_time from workers to main process for TP>1 (#35503) Huy Do 2026-02-27 21:03:22 -08:00
dea268336f [1/N] Elastic EP Milestone 2 (#34861) Itay Alroy 2026-02-28 06:46:42 +02:00
90805ff464 [CI/Build] CPU release supports both of AVX2 and AVX512 (#35466) Ma Jian 2026-02-28 12:35:21 +08:00
2562e0271e [MTP] Validate that MTP weights are actually loaded (#35548) Matthew Bonanni 2026-02-27 23:27:40 -05:00
fd68cd132b [Bugfix] Fixes for SLA finder (#35537) Cyrus Leung 2026-02-28 12:20:55 +08:00
0edf101d2b [ROCm] Add stablelm Head Size 80 To Supported Head Sizes For ROCM_ATTN (#35527) Micah Williamson 2026-02-27 22:16:34 -06:00
d5b6f3ba36 [ROCm][Quantization] Add Composable Kernel (CK) backend support for M… (#34301) Douglas Lehr 2026-02-27 21:37:01 -06:00
1a014a0a93 [Model Runner V2] Move MM encoder to Model States [3/N] (#35564) Woosuk Kwon 2026-02-27 18:32:38 -08:00
86ac7bcf84 [Model Runner V2] Support pooling models (#35120) Woosuk Kwon 2026-02-27 18:03:01 -08:00
405f28d38d [Misc] Clean up ResponsesRequest model validators (#35531) Umut Polat 2026-02-28 04:19:21 +03:00
5323672bc2 [misc] cleanup one level of error stack when nixl fails to initialize (#35517) youkaichao 2026-02-28 08:42:37 +08:00
a201ad72d8 [Refactor][Kernel] Add global helper to deduplicate vectorized memory ops (#35105) Roberto L. Castro 2026-02-28 01:28:17 +01:00
e3691988d0 [ROCm]: fix aiter rope functionalization (#35533) Rohan Potdar 2026-02-27 16:42:30 -06:00
