Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

e94cfd51da [BUG] Qwen3-next MTP. Fix attn metadata build bug (#26564) Vadim Gimpelson 2025-10-10 22:59:03 +04:00
7c12763b24 Fix some typing issues found by mypy==1.18.2 (#26596) Harry Mellor 2025-10-10 19:21:25 +01:00
b8b302cde4 Update CUDA architecture list in build pipeline for 12.9.1 wheels (#26592) v0.11.0 Will Eaton 2025-10-10 14:15:27 -04:00
3b780a4bbb Update CUDA architecture list in build pipeline for 12.9.1 wheels (#26592) Will Eaton 2025-10-10 14:15:27 -04:00
30f78af147 Update pre-commit hook versions (#26591) Harry Mellor 2025-10-10 18:03:44 +01:00
19a9b169bf Add Qwen3-Omni moe thinker (#25550) Xiong Wang 2025-10-11 01:00:56 +08:00
96ad65b7fe [Transform] [Quantization] Add QuTLASS support to vLLM (#24440) Roberto L. Castro 2025-10-10 18:43:40 +02:00
8d2b8c0ff2 [Model] Add FlexOlmo model implementation (#24923) Shane A 2025-10-10 09:43:15 -07:00
b2155ed317 [Model][Qwen3VL] Compute cu_seqlens on CPU to remove (#26496) Lukas Geiger 2025-10-10 17:42:17 +01:00
910abdbd08 [Bugfix] fixed top_logprobs: -1 does not appear to work as intended (#26470) Chauncey 2025-10-11 00:41:17 +08:00
cddce79fda [torch.compile] Make inductor partition rules respect splitting_ops #25691 (#25845) baonudesifeizhai 2025-10-10 12:35:28 -04:00
e519281920 [Metrics] Add test for multi-modal cache stats logging (#26588) Mark McLoughlin 2025-10-10 17:00:50 +01:00
7b03584de8 Silu v2 (#25074) Elvir Crnčević 2025-10-10 17:19:53 +02:00
ae9d0e7da5 [Bugfix] Make DP padding optional in coordinate_batch_across_dp (#26375) Sage Moore 2025-10-10 07:53:33 -07:00
0e67102d93 Added test_top_k_per_row to test-pipeline.yaml. (#26569) Daniel Cámpora 2025-10-10 16:48:33 +02:00
f4ba2061cf [BugFix][torch.compile] Fix fused_scaled_matmul_reduce_scatter signature for PyTorch 2.8 (#26038) Jason Li 2025-10-10 10:42:13 -04:00
1e6848a65d [CI] fix test_run_batch.py::test_completions - AssertionError (#26578) Chauncey 2025-10-10 22:16:28 +08:00
67661375fa [BugFix] Fix noop elimination edge case (#26394) Andy Lo 2025-10-10 14:33:04 +01:00
213b64452a [Bugfix] Convert untraceable GroupShape to list for AMD impl (#26535) Lucas Kabela 2025-10-10 06:32:29 -07:00
784c231151 [NIXL] Ignore abort on already-finished request (#25067) Mark McLoughlin 2025-10-10 11:21:56 +01:00
606b00e80f [bugfix][DCP] fix block_size of hash in DCP prefix caching (#26296) Chen Zhang 2025-10-10 18:02:49 +08:00
720d3cd0f0 [CI] fix ruff format (#26579) Chauncey 2025-10-10 18:02:12 +08:00
ab196edefb Remove LoRA bias support (#25807) Ashwin Phadke 2025-10-10 15:20:33 +05:30
3ee202ea1e [GPT-OSS] Add support for arrays at tool message content (#25593) Luis Tomas Bolivar 2025-10-10 11:00:45 +02:00
ad430a67ca [Metrics] Log multi-modal cache stats and fix reset (#26285) Cyrus Leung 2025-10-10 16:45:55 +08:00
6f0f570c43 [deepseek] kernel block size for UniformTypeKVCacheSpecs (#26559) Chen Zhang 2025-10-10 16:40:41 +08:00
b545a0b207 fix test_simple_inductor_graph_partition (#26522) Boyuan Feng 2025-10-09 23:39:19 -07:00
29255cfc3b [Spec-Decode] Support piecewise cudagraphs for Eagle head (#25109) Lucas Wilkinson 2025-10-10 01:20:31 -04:00
da4455609d [Chore]: One pythonic tool parser test uses the wrong parser (#26515) Ben Browning 2025-10-10 00:03:55 -04:00
aafb99a4d4 [Core] Small simplification in GPUModelRunner._update_states() (#26508) Nick Hill 2025-10-09 19:53:58 -07:00
757fa4a4da [DP][ray] Support different VLLM_RAY_DP_PACK_STRATEGY (#23849) Rui Qiao 2025-10-09 19:53:43 -07:00
c6187f55f7 Refactor MistralTokenizer (#26358) Julien Denize 2025-10-10 00:48:58 +02:00
8983e0216f [CI] Fix Pre-commit Issue Cannot determine type of "rank" and "world_size" (#26448) Wentao Ye 2025-10-09 18:16:48 -04:00
1ee35382cb [Bug] Fix modular_kernel: ZeroDivisionError: integer division or modulo by zero (#26528) Wentao Ye 2025-10-09 18:13:27 -04:00
6e783bc54b [Bugfix] Fix CUDA graph selection bug in FlashInfer at high concurrency (#26499) Benjamin Chislett 2025-10-09 17:12:34 -04:00
c9d33c60dc [UX] Add FlashInfer as default CUDA dependency (#26443) Michael Goin 2025-10-09 17:10:02 -04:00
2e54db4d2b [Core] Remove unused prev_sampled_token_ids_invalid_indices input batch field (#26514) Nick Hill 2025-10-09 13:22:14 -07:00
44f633dba1 [Flashinfer][gpt-oss] Support FP8-qkv Flashinfer TRTLLM Sinks Attention (#25674) elvischenv 2025-10-10 04:13:39 +08:00
a462331e36 [Bugfix] Disable moe inplace for torch >= 2.9 (#26497) bnellnm 2025-10-09 14:07:38 -04:00
4069db3f2e [Bugfix] Enable padded FP4 quantization (#25947) roikoren755 2025-10-09 20:59:41 +03:00
0d37450eb7 [BUGFIX] Add cu_tokens_across_sp to DPMetadata (#26457) Sage Moore 2025-10-09 10:13:56 -07:00
47e66c24e2 [Model] Apply shared experts overlap optimization to all models with shared experts (#26145) bnellnm 2025-10-09 11:31:04 -04:00
3b736e1c38 [Attention][DCP] Support DCP with query length > 1 (MTP) with FA3 (#25049) Ming Yang 2025-10-09 08:06:29 -07:00
2c1c7dfb35 [Models][Qwen] Replace pad with cat for better performance (#26486) Lukas Geiger 2025-10-09 15:51:26 +01:00
e246ad6f0c Upgrade Pydantic to v2.12.0 and remove hack for Python 3.13 (#26481) Harry Mellor 2025-10-09 14:02:40 +01:00
5728da11ea Revert #26113 "[Frontend] CompilationConfig overhaul (#20283): deprecate use_inductor in favor of backend, simplify custom_ops" (#26472) Jiangyun Zhu 2025-10-09 20:43:55 +08:00
92be3f3517 [Feature] Use pydantic validation in parallel.py config (#26417) Simon Danielsson 2025-10-09 14:41:31 +02:00
d1ddf340c8 [V0 deprecation] Remove QKVCrossParallelLinear implementation (#26475) Isotr0py 2025-10-09 18:52:27 +08:00
ec10fd0abc [Bugfix] Move current_platform import to avoid python import cache. (#16601) Wenzheng Bi 2025-10-09 18:46:19 +08:00
0426e3c5e1 [Models][Qwen3VL] Optimise _validate_and_reshape_mm_tensor (#26426) Lukas Geiger 2025-10-09 11:25:48 +01:00
4bdf7ac593 [Bugfix] Fix SHM cache initialization (#26427) Cyrus Leung 2025-10-09 17:48:04 +08:00
dc7976dd9f [Misc] Upgrade more code to Python 3.10 (#26463) Cyrus Leung 2025-10-09 17:43:53 +08:00
e4791438ed [Feature] Use pydantic validation in lora.py and load.py configs (#26413) Simon Danielsson 2025-10-09 11:38:33 +02:00
e6e898f95d [doc] add Volcengine as a compute sponsor (#26477) youkaichao 2025-10-09 17:11:47 +08:00
ddcbc2f334 [Misc] Misc code simplifications (#26450) Nick Hill 2025-10-09 02:10:06 -07:00
a83ff278d6 [torchao] Add support for ModuleFqnToConfig using regex (#26001) Jerry Zhang 2025-10-09 01:32:32 -07:00
cf4cd6c24f Add: Support for multiple hidden layers in Eagle3 (#26164) Rahul Tuli 2025-10-09 13:00:50 +05:30
b960441812 Enable RMSNorm substitution for Transformers backend (#26353) Harry Mellor 2025-10-09 08:28:51 +01:00
1317028aa8 [Model] Gemma3: Fix GGUF loading and quantization (#26189) Luciano Martins 2025-10-09 04:00:53 -03:00
5e49c3e777 Bump Flashinfer to v0.4.0 (#26326) elvischenv 2025-10-09 14:58:44 +08:00
0d7c3cb51d Update Dockerfile and install runai-model-streamer[gcs] package (#26464) pwschuurman 2025-10-08 23:48:51 -07:00
1b2c440cd6 [Core] Relax the LoRA max rank (#26461) Jee Jee Li 2025-10-09 14:47:14 +08:00
0f29dca988 [CI/Build] Fix model nightly tests (#26466) Cyrus Leung 2025-10-09 14:44:16 +08:00
d24cf322e1 [Hybrid]: Decouple Kernel Block Size from KV Page Size (#24486) Zhiyuan Li 2025-10-09 14:43:39 +08:00
d17f0fbf30 [Core][KVConnector] Propagate all tokens on resumed preemptions (#24926) Qier Li 2025-10-09 02:43:31 -04:00
43ab8cfaa5 [MM][Doc] Add documentation for configurable mm profiling (#26200) Wenlong Wang 2025-10-08 23:21:20 -07:00
de253d63b7 [Hardware][AMD] Enable FlexAttention backend on ROCm (#26439) Matt 2025-10-09 01:20:18 -05:00
8bd696fa53 [Bugfix] Incorrect another MM data format in vllm bench throughput (#26462) Huy Do 2025-10-08 22:58:46 -07:00
bb6d8c21f9 [Bugfix] Catch and log invalid token ids in detokenizer #2 (#26445) Nick Hill 2025-10-08 21:20:25 -07:00
ebf6ef1a9b [Minor] Change warning->warning_once in preprocess (#26455) Zhuohan Li 2025-10-08 21:09:06 -07:00
0c52d6ef81 [Bugfix] Set the minimum python version for gpt-oss (#26392) Jee Jee Li 2025-10-09 11:35:49 +08:00
467a4f98f1 [Misc] Redact ray runtime env before logging (#26302) Rui Qiao 2025-10-08 17:43:34 -07:00
e614ab7806 Separate MLAAttention class from Attention (#25103) Naveenraj Kamalakannan 2025-10-08 20:11:11 -04:00
2a03f93de9 [Attention] Register FLASHMLA_SPARSE (#26441) Matthew Bonanni 2025-10-08 18:28:52 -04:00
da364615fc [Kernels] Modular kernel refactor (#24812) bnellnm 2025-10-08 17:51:52 -04:00
f08919b7d1 [Bugfix] Respect min_tokens in scheduler stop check (#26317) Elaine Zhao 2025-10-08 14:08:24 -07:00
93f2c0aa08 [Models] Improve iteration over layers (#26425) Lukas Geiger 2025-10-08 21:48:33 +01:00
4ebc9108a7 [Kernel] Centralize platform kernel import in current_platform.import_kernels (#26286) Nicolò Lucchesi 2025-10-08 22:25:31 +02:00
e1ba235668 [BugFix] Fix failing test quantization/test_compressed_tensors.py::test_compressed_tensors_fp8_block_enabled (#26436) Morrison Turnansky 2025-10-08 16:04:12 -04:00
b82f4307c9 [Bugfix][Flashinfer] fix VLLM_USE_TRTLLM_ATTENTION issue for models with diff hyperparameters (#25924) elvischenv 2025-10-09 03:54:48 +08:00
76879cc160 [Attention] Implement universal BACKEND_MAP (#25900) Matthew Bonanni 2025-10-08 15:00:25 -04:00
b25d7b5657 [Feature] Change cache.py with pydantic validation (#26390) Vinay R Damodaran 2025-10-08 11:12:59 -07:00
e09d1753ec Remove Python 3.9 support ahead of PyTorch 2.9 in v0.11.1 (#26416) Harry Mellor 2025-10-08 18:40:42 +01:00
4ba8875749 [Bug] Fix Test in Batch Invariant (#26128) Wentao Ye 2025-10-08 13:13:47 -04:00
6273fe8d3d [Benchmarks] Fix imports in FP8 tuning script (#26407) Lukas Geiger 2025-10-08 17:31:59 +01:00
9fb3ae4e6f [Bug] Fix DeepGEMM Attention Test (#26423) Wentao Ye 2025-10-08 12:23:41 -04:00
76afe4edf8 [Bugfix] Fix vllm bench ... on CPU-only head nodes (#25283) Aydin Abiar 2025-10-08 09:06:42 -07:00
c1b06fc182 [CI Failure] Fix pre-commit issue for install_nixl_from_source_ubuntu.py (#26424) Michael Goin 2025-10-08 10:55:43 -04:00
241b4cfe66 [Refactor] Refactor FP8 & INT8 Quant Folder inside w8a8 (#25293) Wentao Ye 2025-10-08 10:20:48 -04:00
9fc983c707 [NIXL][non-cuda] Add install script for nixl with non-cuda ucx (#25959) Chendi.Xue 2025-10-08 09:19:53 -05:00
2f99f2f506 Tidy vllm/config/__init__.py to only add classes and functions (#26405) Harry Mellor 2025-10-08 15:10:00 +01:00
338b1bf04f [Benchmarks] Add support for Qwen 3 VL MoE tuning (#26419) Lukas Geiger 2025-10-08 15:01:08 +01:00
e39dc46f8f [CI] Pooling models mteb test disable enforce_eager (#26408) wang.yuqi 2025-10-08 20:15:36 +08:00
10c75b5439 [Docs] Have mergify leave a comment with the docs preview link (#26412) Harry Mellor 2025-10-08 13:04:00 +01:00
f9582fd8f4 [Model] Allow passing custom number of max tiles to Nano 2 VL (#26403) Eugene Khvedchenya 2025-10-08 14:19:39 +03:00
f377333bd7 [Misc] add usedforsecurity=False in md5 hash call (#26357) Daniele 2025-10-08 12:18:32 +02:00
f8607863d8 [Feature] Enable E8M0 by Default on Hopper for DeepGEMM, 5% E2E throughput improvement (#26197) Wentao Ye 2025-10-08 03:33:56 -04:00
335b28f7d1 [TPU] Rename tpu_commons to tpu_inference (#26279) Utkarsh Sharma 2025-10-08 12:00:52 +05:30
5e65d6b2ad fix[DP][v1]: Prevent hangs from mismatched worker configurations (#26218) Ayush Satyam 2025-10-08 11:25:08 +05:30
0d4f48fa10 [Bugfix] Incorrect MM data format in vllm bench throughput (#26395) Cyrus Leung 2025-10-08 13:52:19 +08:00
