Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

127c8b782a Add gather_indexer_k_quant_cache kernel (#25931) Barry Kang 2025-10-08 12:58:57 +08:00
cd9890544b fix(v1/kv_cache): resolve async KV transfer bug in cascade attention (#23485) Ayush Satyam 2025-10-08 10:16:33 +05:30
067da2d1df [Core] Simplify setting new_token_ids in CachedRequestData (#26388) Nick Hill 2025-10-07 20:32:37 -07:00
046118b938 Add SwigluOAI implementation for CPUFusedMOE (#26347) isharif168 2025-10-08 03:17:49 +01:00
b32260ab85 [torchao] safetensors integration (#25969) liangel-02 2025-10-07 19:12:35 -07:00
f80e7866c0 [Misc] Clean up cruft from previous FlashMLA sparse implementation (#26125) Lucas Wilkinson 2025-10-07 22:09:34 -04:00
31a4b3e6c4 Revert #24446 and #26168 (#26332) Thomas Parnell 2025-10-08 00:38:19 +02:00
caf8b1c084 [Bugfix] Fix MTP+FlashInfer crash when trtllm kernels are available but disabled (#26361) Benjamin Chislett 2025-10-07 18:12:26 -04:00
1b86bd8e18 Add more libraries to rlhf.md (#26374) Michael Goin 2025-10-07 16:59:41 -04:00
59012df99b [TPU] update TPU benchmark threshold (#25713) Johnny Yang 2025-10-07 13:53:09 -07:00
01efc7ef78 [ci] fix wheel names for arm wheels (#24898) v0.10.2 Simon Mo 2025-09-15 14:39:08 -07:00
3d1f67616d [Spec Decode] Enable efficient speculative decoding with FlashInfer-MLA (#25984) Benjamin Chislett 2025-10-07 16:05:59 -04:00
6ebaf43ee4 [V1] Logit processors for rejection sampler (#19482) Sergei Skvortsov 2025-10-07 21:02:49 +01:00
0c824fc46f [Frontend] CompilationConfig overhaul (#20283): deprecate use_inductor in favor of backend, simplify custom_ops (#26113) Morrison Turnansky 2025-10-07 15:53:43 -04:00
eb577e4655 [Bugfix] Add missing sink tensor into flash attn cascade attn implementation (#26325) Pei-Lun Liao 2025-10-07 11:56:39 -07:00
8f36850f73 [Bug] Fix Shape Validation for Fallback while Enabling E8M0 for DeepGEMM (#26322) Wentao Ye 2025-10-07 13:50:30 -04:00
29fd2662ba [deepseek] add EP8 FusedMOE config for H200 and B200 (#26331) Chen Zhang 2025-10-08 01:38:54 +08:00
30a3e5af69 [CI] Add Qwen3 MoE NVFP4 to Blackwell lm-eval (#26316) Michael Goin 2025-10-07 13:36:15 -04:00
a38c1bfe09 [ci] Rename test_mxfp4_moe.py to test_ocp_mx_moe.py (#26364) fxmarty-amd 2025-10-07 18:52:24 +02:00
320feae6f5 [Model] Lfm2Moe (#26344) Paul Pak 2025-10-08 01:03:05 +09:00
1e4ecca1d0 [V0 Deprecation] Remove VLLM_USE_V1 from tests (#26341) Cyrus Leung 2025-10-07 23:42:31 +08:00
c0a7b89d8e [Misc] Move LRUCache into its own file (#26342) Cyrus Leung 2025-10-07 23:08:40 +08:00
6f59beaf0b [Model] Add support for ModernBertForTokenClassification (#26340) antrec 2025-10-07 16:29:19 +02:00
41f1cf38f2 [Feature][OCP MX] Support mxfp6 and mixed mxfp6-mxfp4 (#21166) fxmarty-amd 2025-10-07 15:35:26 +02:00
08d26a1b7e [Model] Use merge_by_field_config for MM models (Ovis family) (#26308) Isotr0py 2025-10-07 20:54:22 +08:00
63773a6200 [Docs] add docs for cuda graph v1 (#24374) fhl2000 2025-10-07 20:25:05 +08:00
883b42896a Add TRL example notebook to RLHF docs (#26346) Sergio Paniego Blanco 2025-10-07 13:31:28 +02:00
e1098ced95 Add topk logits torch op for DS3.2. (#25945) Daniel Cámpora 2025-10-07 12:07:32 +02:00
d100d78eb3 Optimize KV cache distribution for asymmetric pipeline parallelism (#25164) Grant Holmes (Ren) 2025-10-07 04:20:30 -05:00
7e4cd070b0 [V0 Deprecation] Remove VLLM_USE_V1 from docs and scripts (#26336) Cyrus Leung 2025-10-07 16:46:44 +08:00
46b0779996 [BugFix] Update KV block hash type from BlockHash to ExternalBlockHash in kv_events_subscriber - #26264 (#26265) Snehlata 2025-10-07 14:12:28 +05:30
de342585ff [Model] Define merge_by_field_config MM interface (R-T) (#26260) Ayush Satyam 2025-10-07 13:40:55 +05:30
185d8ed44f [responsesAPI][bugfix] serialize harmony messages (#26185) Andrew Xia 2025-10-07 00:07:53 -07:00
d9836d4517 [Deprecation] Deprecate LLM.set_tokenizer (#26333) Cyrus Leung 2025-10-07 14:50:57 +08:00
5f7e8a916a [Model] Define merge_by_field_config MM interface (U-Z) (#26261) Ayush Satyam 2025-10-07 12:15:49 +05:30
4dbdf4a294 [BUG] Fix file parsing for load_format runai_streamer_sharded (#26324) ahao-anyscale 2025-10-06 20:23:07 -07:00
c6873c4e6d [UX] Support nested dicts in hf_overrides (#25727) Michael Goin 2025-10-06 23:19:16 -04:00
2111b4643c [Core] Simplify the Dp padding/should ubatch coordination logic (#25768) Sage Moore 2025-10-06 18:57:49 -07:00
c50901f3b9 [Docs][DBO] Add initial doc that describes the DBO implementation (#26024) Sage Moore 2025-10-06 17:47:28 -07:00
8229280a9c [Misc] Define EP kernel arch list in Dockerfile (#25635) Simon Mo 2025-10-06 17:05:33 -07:00
f77df94647 [Perf] Add decode full-graph support to FlashInfer-MLA backend (#26313) Benjamin Chislett 2025-10-06 19:03:49 -04:00
f231e5bc21 [ROCm] Split AITER unified attention into its own backend (#25507) Gregory Shtrasberg 2025-10-06 18:49:23 -04:00
2161efe978 [Bugfix] Allow skipping MoE in NVFP4 (fix for MTP) (#25987) Benjamin Chislett 2025-10-06 16:16:30 -04:00
f23b4c04fd [BugFix] Pad input buffers in _dummy_run (#26209) Varun Sundar Rabindranath 2025-10-06 16:07:51 -04:00
93540958b8 [Docs] Fix broken table in moe_kernel_features doc (#26314) Varun Sundar Rabindranath 2025-10-06 15:58:05 -04:00
44b9af5bb2 [Benchmark] Enable MM Embedding benchmarks (#26310) Cyrus Leung 2025-10-07 03:51:58 +08:00
7cd95dc8a3 [Bugfix] Fix gemma3 with transformers backend (#23178) Raushan Turganbay 2025-10-06 20:42:32 +02:00
c02058c222 Add bias handling to CPUFusedMOE kernel (#26289) Crefeda Rodrigues 2025-10-06 19:39:10 +01:00
b2ea5ba677 [Bugfix][Spec Decode] Fix wrong valid_mask for padded speculation when chunked prefill occurs (#26231) 7mile 2025-10-07 02:24:22 +08:00
824a3f403f [Misc] auto_tune: kill specific vllm process (#26304) Karan Goel 2025-10-06 11:02:51 -07:00
05f6846ede Support llama3 eagle3 head with llama4 verifier (#25961) Rahul Tuli 2025-10-06 23:26:08 +05:30
20db99cc69 [CI Bugfix] Make sure TRTLLM attention is available in test_blackwell_moe (#26188) Michael Goin 2025-10-06 13:50:11 -04:00
6431be808f [Tests] conftest: Extending VllmRunner and HfRunner to accept token_ids as input (#26295) Yannick Schnider 2025-10-06 19:19:34 +02:00
4727a8afa7 [Attention] Remove unused reorder_batch method (#24463) Matthew Bonanni 2025-10-06 13:13:39 -04:00
b8f603cebe [Model] EVS support for nano_nemotron_vl (#26269) tomeras91 2025-10-06 19:23:37 +03:00
fc679696f8 Fix DotsOCR tensor type (#26281) Chatcharin Sangbutsarakum 2025-10-06 19:23:43 +07:00
ab5e7d93f4 [Bugfix] Fix mrope in Transformers Backend (#26087) Raushan Turganbay 2025-10-06 13:40:50 +02:00
0340f45553 Support expert parallel load balancing in Transformers backend (#26287) Harry Mellor 2025-10-06 12:20:16 +01:00
19a00eb210 [Model] Use merge_by_field_config for MM models (Llava family) (#26280) Cyrus Leung 2025-10-06 17:45:26 +08:00
391612e78b [Frontend] Consolidate tokenizer init code (#26276) Cyrus Leung 2025-10-06 17:34:52 +08:00
77c95f72f7 [Doc] add KAITO to integrations (#25521) abhisheksheth28 2025-10-06 02:30:03 -07:00
59f30d0448 [Docs] Edit HF Inference Endpoints documentation (#26275) Aritra Roy Gosthipaty 2025-10-06 14:43:09 +05:30
43c146ca42 [Misc] Clean up unnecessary E501 ignore (#26274) Roger Wang 2025-10-06 00:29:18 -07:00
7c2ec0fe87 [Benchmarking] Add disable_shuffle option for dataset loading (#26258) Yasmin Moslem 2025-10-06 08:05:44 +01:00
039b6bade3 Bump actions/stale from 10.0.0 to 10.1.0 (#26272) dependabot[bot] 2025-10-06 07:01:21 +00:00
6c04638214 Fix per file ruff ignores related to line length (#26262) Harry Mellor 2025-10-06 06:12:40 +01:00
91ac7f764d [CI][gpt-oss] Enable python tool tests in CI (#24315) wuhang 2025-10-06 12:20:06 +08:00
4be7d7c1c9 [MISC] Add heheda12345 to CODEOWNERS of vllm/config/cache.py (#26270) Chen Zhang 2025-10-05 19:58:59 -07:00
59b477645c [Doc] Edited minor typo (#26266) orangeng 2025-10-05 19:53:09 -07:00
778f554157 [V1] [Hybrid] Some additional clean-up in Mamba2 prefix caching (#26222) Thomas Parnell 2025-10-06 04:40:30 +02:00
d3c84297c3 [CI] Add comment about the single cudagraph capture size that is used (#26252) Thomas Parnell 2025-10-06 04:35:37 +02:00
f509a20846 [DOC] Update production-stack.md (#26177) Elieser Pereira 2025-10-05 18:32:48 -03:00
60bc25e74c [CI] Add Blackwell LM Eval Small Models test to nightly (#26052) Michael Goin 2025-10-05 16:59:50 -04:00
b893d661b1 Fix per file ruff ignores related to simplification (#26259) Harry Mellor 2025-10-05 21:31:53 +01:00
6b6e98775f [NVIDIA] flashinfer TRTLLM attention prefill token limit (#25998) Jason Li 2025-10-05 16:24:37 -04:00
9c3c21c519 [CI] fix mamba kernel test (#26250) Jiangyun Zhu 2025-10-06 02:26:59 +08:00
512b8affa4 Update ruff pre-commit hooks version (#26255) Harry Mellor 2025-10-05 17:50:50 +01:00
1c0c68202c Fix per file ruff ignores related to typing (#26254) Harry Mellor 2025-10-05 17:37:55 +01:00
5f317530ec fix(tests): Resolve late binding of loop variable in assert message lambda (#26249) ihb2032 2025-10-06 00:18:22 +08:00
557b2e961d Remove all cases of fmt: on/off (#26253) Harry Mellor 2025-10-05 17:18:14 +01:00
4e256cadc2 Remove all references to yapf as it's no longer used (#26251) Harry Mellor 2025-10-05 17:18:11 +01:00
d6953beb91 Convert formatting to use ruff instead of yapf + isort (#26247) Harry Mellor 2025-10-05 15:06:22 +01:00
17edd8a807 [Platform][Kernel] platform-specific kernel loading (#25823) Hank_ 2025-10-05 19:25:15 +08:00
3303cfb4ac [Bugfix][Hardware][RISC-V] Limit supported dtypes to float32 to avoid scheduler segfault (#26228) ihb2032 2025-10-05 18:36:54 +08:00
b7e8e4e6be [Bugfix] Always apply MM processor even when no MM items are passed (#26240) Cyrus Leung 2025-10-05 18:10:20 +08:00
432e1cbc23 [Bugfix]: Assertion error when using FlashInfer backend (#25933) Simon Danielsson 2025-10-05 09:46:36 +01:00
201c971e96 [Perf][Easy] Early stop in request_block_hasher (#26112) Jialin Ouyang 2025-10-05 01:46:03 -07:00
e0986ea07b Add documentation for granite 4 tool calling (#26175) Maximilien de Bayser 2025-10-05 04:35:42 -03:00
a964e5e6c3 [Bugfix] Allow --skip-tokenizer-init with echo and return_token_ids (#26238) Cyrus Leung 2025-10-05 13:38:53 +08:00
78c1d5bfd2 [Easy] Add str repr for IterationStats (#26232) 22quinn 2025-10-04 22:00:21 -07:00
59a85c366e [Model] Use merge_by_field_config for MM models (H-L) (#26230) Cyrus Leung 2025-10-05 11:54:17 +08:00
119f00630b [Renderer] Clean up renderer code (#26216) Cyrus Leung 2025-10-05 01:05:29 +08:00
a42d2df75f [Frontend] Cache chat template kwargs resolution (#26227) Isotr0py 2025-10-04 23:32:30 +08:00
5c057e068f [CPU] Refine batch reorder of CPU attention backend (#26096) Li, Jiang 2025-10-04 21:54:35 +08:00
ed3aeb25a4 [V1] [Hybrid] Remove code to override default CUDA graph configuration (#26226) Thomas Parnell 2025-10-04 15:47:48 +02:00
86ee949128 Fix tensor device and dtype placement in Qwen2VL model (#26219) yuafng 2025-10-04 06:41:39 -07:00
4570535ec4 [Model] CLIP Embedding Support (#26010) Cyrus Leung 2025-10-04 21:21:42 +08:00
2a6dc67eb5 [Bugfix] Fix _reqs_to_process leak on abort (#26012) Nicolò Lucchesi 2025-10-04 13:39:31 +02:00
f05fea1f5e [Core] Enable decode of context length equal to max model length (#26168) Yannick Schnider 2025-10-04 11:59:26 +02:00
d0df145c2a Add Olmo 3 reasoning parser (#26054) Luca Soldaini 2025-10-04 02:48:29 -07:00
