Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

1ab8fc8197 Make PyTorch profiler gzip and CUDA time dump configurable (#29568) Yifei Zhang 2025-12-01 12:30:46 +08:00
f72a817bdf [MoE] CuteDSL MoE with Nvfp4 DeepEP dispatch (#27141) Shu Wang 2025-11-30 18:05:32 -06:00
ec38a7368d [Model Runner V2] Use packed mask for prompt bin counts (#29756) Woosuk Kwon 2025-11-30 14:15:42 -08:00
21c2627934 [Misc]Remove redundant hidden_size property in ModelConfig (#29749) Xingyu Liu 2025-12-01 01:14:23 +08:00
39d28108f4 [Feat] Support non-gated activations in NVFP4 modelopt path (#29004) Omer Ullman Argov 2025-11-30 18:02:40 +02:00
cd719de5cb Fix RoPE failures in Transformers nightly (#29700) Harry Mellor 2025-11-30 14:29:32 +00:00
8c363ed666 [ROCm][Attention] Sliding window support for AiterFlashAttentionBackend (#29234) Pleaplusone 2025-11-30 19:31:50 +08:00
64bc09ba27 [Core] Enable inputs_embeds_size separate from hidden_size (#29741) Cyrus Leung 2025-11-30 17:31:12 +08:00
47539cfd3e [Bugfix] Fix mismatched nvfp4 gemm output shape (#29742) Isotr0py 2025-11-30 17:15:01 +08:00
2afcec4dec [Misc] Update TokenizerLike interface and move get_cached_tokenizer (#29730) Cyrus Leung 2025-11-30 14:59:47 +08:00
9381b5cde0 [Doc]: Fix typo in fused_moe layer (#29731) 朝 2025-11-30 14:29:13 +08:00
66b5840287 [Bugfix][sleepmode][fp8 kv cache]: Fix FP8 KV cache + sleep(level=2) gibberish output (#28783) Vensen 2025-11-30 14:24:25 +08:00
82c795d6f2 Fix AttributeError about _use_fi_prefill (#29734) Huamin Li 2025-11-29 22:04:55 -08:00
e1464c3a08 [Quantization] Enable compressed-tensors AWQ for Turing GPU (#29732) Isotr0py 2025-11-30 14:04:28 +08:00
a491b0911b [LoRA] Support FusedMoE LoRA Triton kernel for mxfp4 (#29708) Xin Yang 2025-11-29 18:37:25 -08:00
b9d0504a36 [Bugfix] Revert test_tokenization.py (#29729) Jee Jee Li 2025-11-30 00:35:15 +08:00
1656ad3704 [Kernel][Quantization] add w4a8 support for marlin kernel (#24722) Jinzhen Lin 2025-11-29 23:19:33 +08:00
fa59fe417f [Chore] Move detokenizer_utils to vllm/tokenizers (#29727) Cyrus Leung 2025-11-29 22:25:17 +08:00
fe3398fab2 [Chore] Enable passing tokenizer=None into MM processor (#29724) Cyrus Leung 2025-11-29 22:25:10 +08:00
ad7f714d62 hfrunner.classify should return list[list[float]] not list[str] (#29671) Chukwuma Nwaugha 2025-11-29 13:57:00 +00:00
f4341f45d3 [Doc]: fix code block rendering (#29728) dublc 2025-11-29 21:46:48 +08:00
34a984274e [Misc] Refactor tokenizer interface (#29693) Cyrus Leung 2025-11-29 20:02:21 +08:00
f223ed4181 [Model Runner V2] Fuse penalties and temperature into single kernel (#29720) Woosuk Kwon 2025-11-29 02:29:16 -08:00
04a797cd0e [Doc]: fixing typos in various files. (#29717) Didier Durand 2025-11-29 10:15:39 +01:00
6afc0ffaf6 [Model Runner V2] Add sample/ directory and reorganize files (#29719) Woosuk Kwon 2025-11-29 00:41:01 -08:00
39e63dec7c [LoRA] Cleanup LoRA unused code (#29611) Jee Jee Li 2025-11-29 14:52:58 +08:00
4a80ad0a25 [Model Runner V2] Don't use UVA buffer for prefill_len (#29713) Woosuk Kwon 2025-11-28 20:27:16 -08:00
4b17ce6815 Add gpu memory wait before test_async_tp (#28893) Angela Yi 2025-11-28 20:19:05 -08:00
e23f665d83 [BugFix] Fix DBO failing with TypeError: 'NoneType' object is not iterable (#29698) Lucas Wilkinson 2025-11-28 23:19:01 -05:00
ca1b1e7296 [Model Runner V2] Refactor prefill token preparation (#29712) Woosuk Kwon 2025-11-28 19:49:17 -08:00
762a4a6ca9 [Frontend] Perform offline path replacement to tokenizer (#29706) Tsukasa OI 2025-11-29 11:32:08 +09:00
b2c50eda50 [Bugfix] Fix wrong mock attribute (#29704) Cyrus Leung 2025-11-29 10:30:41 +08:00
1dcafb3dea [Model Runner V2] Support penalties using bin counts (#29703) Woosuk Kwon 2025-11-28 17:53:17 -08:00
ea3370b428 [ROCm][Bugfix] Patch for the Multi-Modal Processor Test group (#29702) Andreas Karatzas 2025-11-28 19:31:44 -06:00
c625d7b1c6 [Bugfix] Fix O(n²) multimodal string prompt processing (#29667) Mert Unsal 2025-11-28 16:10:39 -08:00
6173682b6e [compile] Include enable_sleep_mode into caching factors. (#29696) Zhengxu Chen 2025-11-28 18:58:38 -05:00
9726e64530 bugfix: correct attn output with base 2 or e (#28840) Augusto Yao 2025-11-29 07:52:12 +08:00
3fd1fb0b60 Revert "[LoRA] Support FusedMoE LoRA Triton kernel for mxfp4 (#28971)" (#29697) Huamin Li 2025-11-28 15:26:52 -08:00
a51f4186f2 [Bugfix] fix dots.llm1.inst (#29687) Jiangyun Zhu 2025-11-29 07:25:26 +08:00
7675ba30de [Misc] Remove redundant ClassRegistry (#29681) Cyrus Leung 2025-11-29 07:24:47 +08:00
7c1ed45848 [CI/Build]: make it possible to build with a free-threaded interpreter (#29241) Ralf Gommers 2025-11-29 00:21:46 +01:00
1986de1375 [Perf] Optimize EAGLE prepare_inputs_padded with triton kernels (#28597) Benjamin Chislett 2025-11-28 17:25:05 -05:00
3461e7efd8 [Frontend] Remap -O to -cc commandline flag (#29557) Yanan Cao 2025-11-28 13:51:12 -08:00
fecae12cd7 Remove all_special_tokens_extended from tokenizer code (#29686) Harry Mellor 2025-11-28 20:26:51 +00:00
8d9338fae4 [Chore] Rename Processor to InputProcessor (#29682) Cyrus Leung 2025-11-29 01:35:41 +08:00
d40c854009 [CI/Build] Rework CPU multimodal processor test (#29684) Isotr0py 2025-11-29 01:10:29 +08:00
4332955602 [Docs] Add CLI reference doc for vllm bench sweep plot_pareto (#29689) Harry Mellor 2025-11-28 17:10:08 +00:00
f946a8d743 [Chore]: Reorganize model repo operating functions in transformers_utils (#29680) Isotr0py 2025-11-29 00:46:51 +08:00
6f9d81d03b [V0 deprecation] Clean up legacy paged attention helper functions (#28043) Isotr0py 2025-11-29 00:44:33 +08:00
fae6943068 [Doc]: fixing typos in multiple files. (#29685) Didier Durand 2025-11-28 17:41:41 +01:00
3bcbb30cbf add add_truncate_prompt_tokens in repr for PoolingParams (#29683) 果冻虾仁 2025-11-29 00:41:05 +08:00
9e6bcda3ac [mypy] Enable type checking for more directories (#29674) Cyrus Leung 2025-11-29 00:39:27 +08:00
9eec282cb5 Guard FlashInfer sampler using the same check as FlashInfer attention backend (#29415) Harry Mellor 2025-11-28 16:34:48 +00:00
0808eb813b [Misc] Remove yapf directives (#29675) Cyrus Leung 2025-11-28 23:07:23 +08:00
460d8bbf2d Remove upstream fa checks (#29471) Mingyuan Ma 2025-11-28 05:52:42 -08:00
e2f56c309d [CPU] Update torch 2.9.1 for CPU backend (#29664) Li, Jiang 2025-11-28 21:37:54 +08:00
f8151b66fa Revert "Supress verbose logs from model_hosting_container_standards (… (#29335) HappyAmazonian 2025-11-28 05:29:05 -08:00
1168768a2d [Optimization] Early return for _apply_matches and _iter_placeholders (#29668) Cyrus Leung 2025-11-28 21:26:47 +08:00
8e7a891602 [BugFix] Fix spec decoding max_tokens scheduling perf issue (#29542) Nick Hill 2025-11-28 04:52:23 -08:00
953d9c820b [mypy] Pass type checking for vllm/utils and vllm/v1/pool (#29666) Cyrus Leung 2025-11-28 20:40:47 +08:00
33b06a6f24 [Misc] Remove redundant attention var constants (#29650) Cyrus Leung 2025-11-28 20:35:19 +08:00
5c2b5cb422 [Docs] Add SPLADE and Ultravox models to supported models documentation (#29659) Wilson Wu 2025-11-28 18:29:28 +08:00
3cb32e5d6e [Rocm] Set VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS default is disabled (#28985) 杰兮 2025-11-28 18:08:42 +08:00
ccbdf51bd5 [Doc] Reorganize benchmark docs (#29658) Cyrus Leung 2025-11-28 17:19:25 +08:00
5f5521bd5d Fix parameter order in GPT-OSS weight loading function for non-MXFP4 weights (#29506) Filipp Fisin 2025-11-28 09:45:10 +01:00
b2c1d294fa [BUGFIX] MistralTokenizer._call__ adds an invalid EOS token (#29607) Julien Denize 2025-11-28 09:44:47 +01:00
cc0f2a0e19 [Doc] Improve abnormal information string (#29655) maang-h 2025-11-28 16:12:20 +08:00
480598958e [Feature][Bench] Add pareto visualization (#29477) rongfu.leng 2025-11-28 15:53:20 +08:00
b34e8775a3 Revert "[CPU]Update CPU PyTorch to 2.9.0 (#29589)" (#29647) Cyrus Leung 2025-11-28 14:43:18 +08:00
f4b76056ee Improve enable chunked_prefill & prefix_caching logic. (#26623) wang.yuqi 2025-11-28 14:05:48 +08:00
37b15e97e8 [Multimodal][Speculative Decoding]Eagle3 mm support, enablement on qwen3vl (#29594) EanWang211123 2025-11-28 14:05:45 +08:00
c7ba1f6bc7 [BugFix] Fix ValueError in NewRequestData repr methods (#29392) maang-h 2025-11-28 13:42:30 +08:00
18523b87f6 [Docs] Update supported models for Olmo 3 in tool calling documentation (#29411) Wilson Wu 2025-11-28 10:53:55 +08:00
745a3bae1a [LoRA] Support FusedMoE LoRA Triton kernel for mxfp4 (#28971) Xin Yang 2025-11-27 18:48:28 -08:00
35657bcd7a [CPU]Update CPU PyTorch to 2.9.0 (#29589) scydas 2025-11-28 09:34:33 +08:00
be493e0b3c [BugFix] Fix new nightly failures (#29578) Lucas Wilkinson 2025-11-27 16:45:38 -05:00
ae0ce1be27 [Model Runner V2][BugFix] Keep reference to GPU tensors in AsyncOutput (#29623) Woosuk Kwon 2025-11-27 12:38:53 -08:00
a5345bf49d [BugFix] Fix plan API Mismatch when using latest FlashInfer (#29426) Andrii Skliar 2025-11-27 20:34:59 +01:00
e5a621b724 [CI] Add batched audios Whisper test (#29308) Nicolò Lucchesi 2025-11-27 20:31:52 +01:00
38658ec6f3 [Bugfix][MM encoder] Fix ViT attention backend resolving for Turing GPU (#29614) Isotr0py 2025-11-28 03:17:37 +08:00
a24ea5414b [Deprecation] Advance deprecation status (#29617) Cyrus Leung 2025-11-28 03:04:58 +08:00
ea228b4491 [Misc] Remove unused code from protocol.py (#29616) Cyrus Leung 2025-11-28 02:39:59 +08:00
d45269b378 add skip_reading_prefix_cache in repr for PoolingParams (#29620) 果冻虾仁 2025-11-28 01:21:00 +08:00
ee9841daa9 [Bugfix] Fix doc build on main (#29619) Cyrus Leung 2025-11-28 01:08:08 +08:00
0840abdd24 [BugFix] Optional tokenizer argument when loading GGUF models (#29582) Injae Ryou 2025-11-28 01:53:10 +09:00
e1f262337b Update Transformers pin in CI to 4.57.3 (#29418) Harry Mellor 2025-11-27 16:42:14 +00:00
fc1d8be3dc [Attention] Update attention imports (#29540) Matthew Bonanni 2025-11-27 11:19:09 -05:00
cd007a53b4 [bugfix] avoid NIXL_ERR_REMOTE_DISCONNECT in nixl_connector when Prefill dies (#28120) Mathis Felardos 2025-11-27 16:32:38 +01:00
66d3d5422c [Doc]: fixing typos in diverse files (#29492) Didier Durand 2025-11-27 16:15:50 +01:00
bab438ff3e [CI/Build] Skip ray tests on ROCm (#29556) Ryan Rock 2025-11-27 09:01:37 -06:00
882851dc81 [CI/Build][Bugfix] Fix auto label issues for CPU (#29610) Li, Jiang 2025-11-27 22:51:26 +08:00
2f5f9acd55 [LoRA] Continue optimizing MoE LoRA weight loading (#29322) Jee Jee Li 2025-11-27 21:56:28 +08:00
cf348c8d27 [Bugfix] Fix HunyuanVL XD-RoPE (#29593) Roger Wang 2025-11-27 04:36:24 -08:00
a5abd1d384 [CI] Auto label CPU related issues (#29602) Li, Jiang 2025-11-27 19:33:19 +08:00
e6d4f3c254 [Bugfix] Fix pre-commit (#29601) Cyrus Leung 2025-11-27 18:23:06 +08:00
51906c8c55 [Docs] Improve priority parameter documentation (#29572) maang-h 2025-11-27 18:09:24 +08:00
0838b52e2e [Frontend][torch.compile] CompilationConfig Overhaul (#20283): Set up -O infrastructure (#26847) Morrison Turnansky 2025-11-27 04:55:58 -05:00
00d3310d2d [Bugfix] Update Ultravox compatibility (#29588) Cyrus Leung 2025-11-27 17:36:18 +08:00
da3222f371 [Model Runner V2] Implement multi-step Eagle with CUDA graph (#29559) Woosuk Kwon 2025-11-27 00:09:41 -08:00
43c5792592 [ROCm][CI] Fix test_cpu_offloading for ROCm (#29548) Micah Williamson 2025-11-27 01:54:44 -06:00
