Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

3496274663 [Misc] Convert VLLM_TORCH_PROFILER_DIR path to absolute (#23191) Ning Xie 2025-08-22 03:49:09 +08:00
8a19303173 [BugFix][gpt-oss] Fix Chat Completion with Multiple Output Message (#23318) Chen Zhang 2025-08-21 10:31:11 -07:00
603fbbbce0 [Misc] Misc code cleanup/simplification (#23304) Nick Hill 2025-08-21 10:22:55 -07:00
10f535c086 [Bugfix] Fix port conflict by obtaining a list of open ports upfront (#21894) Ming Yang 2025-08-21 10:22:18 -07:00
48bfb0c9b7 [Bug] Fix R1 Accuracy 0 Bug (#23294) Wentao Ye 2025-08-21 13:11:28 -04:00
f8ce022948 add tg-mxfp4-moe-test (#22540) Lain 2025-08-21 10:05:47 -07:00
0278f1ac3a Fix nvfp4 swizzling (#23140) Yi Liu 2025-08-22 00:54:50 +08:00
a482e4e769 Migrate MolmoImageInputs to TensorSchema (#22022) Benji Beck 2025-08-21 09:54:08 -07:00
e0b056e443 [ci/build] Fix abi tag for aarch64 (#23329) youkaichao 2025-08-21 23:32:55 +08:00
79f05e4436 [Multimodal] Always enable hashing mm data (#23308) Roger Wang 2025-08-21 07:23:28 -07:00
f8daddcc4c [Bugfix] set system_message in phi4mini chat template (#23309) jerryzhuang 2025-08-22 00:22:39 +10:00
c8e33c72c6 [V1] Remove unnecessary check for main thread (#23298) Robert Shaw 2025-08-21 10:08:35 -04:00
d70a16625d [Performance] V1 Pooling Models E2E Performance Optimization (#23162) wang.yuqi 2025-08-21 21:26:09 +08:00
5cc54f7c5b [Doc] Fix batch-level DP example (#23325) Cyrus Leung 2025-08-21 21:16:38 +08:00
0c6e40bbaa [Refactor] Simplify code for MM budget (#23310) Cyrus Leung 2025-08-21 16:00:16 +08:00
2e2000f352 [Model] Add LFM2 architecture (#22845) Paul Pak 2025-08-21 01:35:07 -06:00
31282401b6 [BugFix] Fix Python 3.9 Support (#23306) Jared O'Connell 2025-08-21 02:23:56 -04:00
0c31e28e95 [Bugfix] Fix extra whitespace in strings caused by newline (#23272) Cyrus Leung 2025-08-21 13:03:00 +08:00
f571ff8eb6 [Sampler] Support returning final logprobs (#22387) 22quinn 2025-08-20 21:28:32 -07:00
f64ee61d9e [CI] Block the cu126 wheel build while broken (#23285) Michael Goin 2025-08-21 00:21:05 -04:00
8993073dc1 [CI] Delete images older than 24h. (#23291) QiliangCui 2025-08-21 04:15:20 +00:00
655a09f653 [Model][VLM] Support R-4B Model (#23246) 杨奇(yann qi) 2025-08-21 12:08:52 +08:00
f94bf9b924 [Compile] Fix Compile Warning SM100 Cutlass MLA (#23287) Wentao Ye 2025-08-20 23:09:39 -04:00
3663870c72 [V1][Mamba1] - Full CUDA and Piecewise CUDA Graphs Support (#23035) Asaf Joseph Gardin 2025-08-21 06:08:51 +03:00
2461d9e562 [CI/Build] Split out mm processor tests (#23260) Cyrus Leung 2025-08-21 11:05:20 +08:00
7be5d113d8 [CPU] Refactor CPU W8A8 scaled_mm (#23071) Li, Jiang 2025-08-21 09:34:24 +08:00
b029de9902 [Optimization] Make new_block_ids None if empty (#23262) Woosuk Kwon 2025-08-20 18:25:56 -07:00
bbea1cefdd [CI Bugfix] Fix CI by fully removing --enable-prompt-adapter (#23284) Michael Goin 2025-08-20 20:18:12 -04:00
f5aa307d77 Remove duplicate entry in vllm.attention.__all__ (#23296) Russell Bryant 2025-08-20 20:14:59 -04:00
4b795020ed [EP] Add logging for experts map (#22685) 22quinn 2025-08-20 16:46:06 -07:00
c86af22f31 [Fix] remove is_marlin param in benchmark_moe (#23286) shixianc 2025-08-20 15:04:21 -07:00
10cc12ba66 Feature/mla tests (#23195) Matthew Bonanni 2025-08-20 17:46:47 -04:00
a4fbb32fab Remove chunked_prefill_enabled flag in V1 MLA (#23183) Matthew Bonanni 2025-08-20 17:43:17 -04:00
1b125004be [misc] fix multiple arch wheels for the nightly index (#23110) youkaichao 2025-08-21 05:15:34 +08:00
4fbda0b20c [Feature] use --eplb_config to set eplb param (#20562) rongfu.leng 2025-08-21 05:07:28 +08:00
1da94e673c Do not use eval() to convert unknown types (#23266) (tag: v0.10.1.1) Russell Bryant 2025-08-20 16:28:30 -04:00
d8b736f913 Limit HTTP header count and size (#23267) Russell Bryant 2025-08-20 13:57:37 -04:00
3a8708f60a [BugFix] fix CUTLASS MLA full cudagraph (#23200) Lucas Wilkinson 2025-08-19 18:17:08 -04:00
4e51fa8cba Do not use eval() to convert unknown types (#23266) Russell Bryant 2025-08-20 16:28:30 -04:00
bf7c99dfc4 [Perf] Speed up function _convert_tokens_to_string_with_added_encoders by 13.7x (#20413) Saurabh Misra 2025-08-20 13:17:11 -07:00
b95697d731 [Frontend] improve error logging of chat completion (#22957) Chen Zhang 2025-08-20 13:03:37 -07:00
582bbe6bd7 [Fix] correct tool_id for kimi-k2 when use tool_choice=required (#21259) bigmoyan 2025-08-21 03:59:54 +08:00
0cdbf5e61c [Kernel/Quant] Remove the original marlin format and qqq (#23204) Michael Goin 2025-08-20 15:13:36 -04:00
ebe56a0064 Small fix for Command-A-Vision (#23268) dongluw 2025-08-20 14:15:18 -04:00
f77a0802b7 Limit HTTP header count and size (#23267) Russell Bryant 2025-08-20 13:57:37 -04:00
c4477f55e5 Migrate Mistral3ImagePixelInputs to TensorSchema (#21945) Benji Beck 2025-08-20 10:37:29 -07:00
dfd2382039 [torch.compile] Support conditional torch.compile per module (#22269) Yong Hoon Shin 2025-08-20 09:52:59 -07:00
3b11b26b50 [FIXBUG ] Allow disabling rocm_aiter_fa backend for ROCm GPUs not compatible with AITER (#22795) JartX 2025-08-20 18:08:29 +02:00
d6d13bd49e [Misc] Add max_seq_len to CommonAttentionMetadata (#23216) Woosuk Kwon 2025-08-20 09:05:29 -07:00
5efd6905bc [CLI][Doc] Formalize --mm-encoder-tp-mode (#23190) Cyrus Leung 2025-08-20 23:42:28 +08:00
b17109beea [Kernel] CUTLASS MoE FP8: Integrate cuda moe permute/unpermute (#23045) shixianc 2025-08-20 07:35:26 -07:00
4449235843 [Bugfix] Ensure correctness of HCXVision processing (#23254) Cyrus Leung 2025-08-20 22:19:30 +08:00
38217877aa [Fix] fix offline env use local mode path (#22526) rongfu.leng 2025-08-20 21:34:49 +08:00
c6d80a7a96 [Model] Improve olmo and olmo2 (#23228) Jee Jee Li 2025-08-20 20:47:05 +08:00
7cd17e22d7 [Model][V1] Support Ernie MTP (#22169) xyxinyang 2025-08-20 20:41:55 +08:00
50df09fe13 Update to flashinfer-python==0.2.12 and disable AOT compile for non-release image (#23129) Michael Goin 2025-08-20 08:05:54 -04:00
68fcd3fa73 [Bugfix] Ensure correctness of Cohere2Vision processing (#23245) Cyrus Leung 2025-08-20 19:09:18 +08:00
83e69a09d6 [Model] Support deepseek with eagle (#21086) Xin Yang 2025-08-20 04:01:31 -07:00
3aa8c10038 Fix missing quotes (#23242) Shiming Zhang 2025-08-20 18:46:59 +08:00
103f1ec8d3 [Model] use autoWeightsLoader for gptoss (#22446) Calvin Chen 2025-08-20 18:16:27 +08:00
d983769c41 fix cuda graph (#22721) who who who 2025-08-20 14:24:37 +08:00
8fd920924c [BugFix] Fix stuck stats/metrics after requests are aborted (#22995) Nick Hill 2025-08-19 22:50:29 -07:00
de7b67a023 [CI/Build] Sync multimodal tests (#23181) Cyrus Leung 2025-08-20 13:06:42 +08:00
f729023272 [CI/Build] Also check DP in benchmarks throughput script (#23038) Zhewen Li 2025-08-19 21:09:27 -07:00
1a3079a15e chore: support pytorch format in lora (#22790) 길재은 2025-08-20 13:02:50 +09:00
941f56858a Fix a performance comparison issue in Benchmark Suite (#23047) Louie Tsai 2025-08-19 20:14:32 -07:00
a634733f67 [Attention] Optimize make_local_attention_virtual_batches for Flash Attention (#23185) Zebing Lin 2025-08-19 22:57:47 -04:00
64ab3c7253 [Doc] Update V1 status of various pooling models (#23189) Cyrus Leung 2025-08-20 10:33:41 +08:00
e58c5a9768 [Core] Add torch profiler CPU traces for AsyncLLM. (#21794) Chenheli Hua 2025-08-19 19:32:47 -07:00
d46d417b58 [CI Perf] Only test bfloat16 for tests/compile/test_fusion_all_reduce.py (#23132) Michael Goin 2025-08-19 22:18:52 -04:00
0167efe20d [Core] Optimize scheduler request removal for single completions (#21917) 633WHU 2025-08-20 09:25:59 +08:00
c32e6ad1f6 [Quantization] Bump Compressed Tensors Version (#23202) Kyle Sayers 2025-08-19 20:39:28 -04:00
1630cc8d0f [Benchmarks] Add video inputs to ShareGPTDataset. (#23199) Chenheli Hua 2025-08-19 16:42:31 -07:00
14e2b0730b [BugFix] fix CUTLASS MLA full cudagraph (#23200) Lucas Wilkinson 2025-08-19 18:17:08 -04:00
0f4f0191d8 [CI/Build] Replace lm-eval gsm8k tests with faster implementation (#23002) Michael Goin 2025-08-19 18:07:30 -04:00
a38b8af4c3 [NVIDIA] Add SM100 Flashinfer Cutlass MoE fp8 backend (#22357) amirkl94 2025-08-20 01:01:53 +03:00
21dce80ea9 [CI/Build] Add support for Python 3.13 (#13164) Michael Goin 2025-08-19 16:49:34 -04:00
e61bac87ee [Misc] Minor refactoring for FlashInfer backend (#23147) Woosuk Kwon 2025-08-19 13:11:51 -07:00
80141bbf2f fix: use cache_salt for gpt-oss (#23186) Marko Rosenmueller 2025-08-19 20:12:25 +02:00
b94faf9d50 [Bugfix] Fix accuracy issue when using flashinfer cutlass moe, TP=1 and modelopt. (#23125) bnellnm 2025-08-19 14:00:51 -04:00
5b5f350d67 [Misc] Enable yapf for FlashInfer backend (#23193) Woosuk Kwon 2025-08-19 10:33:47 -07:00
f7cf5b512e [Frontend] Add /collective_rpc API endpoint (#23075) 22quinn 2025-08-19 10:29:32 -07:00
03d4235fd2 [Misc] Fix the benchmark's README and improve the error messages for the benchmark's argument checks (#22654) Ruixiang Tan 2025-08-20 01:18:51 +08:00
d6a1a20973 [CI/Build] Update transformers to v4.55.2 (#23093) Isotr0py 2025-08-20 01:06:17 +08:00
a70d0bd0a3 Migrate LlavaOnevisionMultiInputs to TensorSchema (#21844) Benji Beck 2025-08-19 10:02:02 -07:00
24f4d1a224 Add return_token_ids parameter to OpenAI API endpoints (#22587) Yuge Zhang 2025-08-20 00:48:31 +08:00
4f510bc2a1 [Model] Removes redundant all-reduce operation in Qwen3MoeSparseMoeBlock (#23169) yiz-liu 2025-08-20 00:18:41 +08:00
1298c67795 [FEAT] [Performance] Enable DP for ViT in Qwen2.5VL (#22742) TJian 2025-08-19 08:25:57 -07:00
4d9c61993a [Bugfix] Fix benchmark_moe.py (#23177) Jee Jee Li 2025-08-19 21:39:40 +08:00
b87cb97a53 [Model] support new model ovis2.5 (#23084) myselvess 2025-08-19 21:12:59 +08:00
f856c33ce9 [Model] Add multi_label_classification support (#23173) wang.yuqi 2025-08-19 20:54:30 +08:00
03752dba8f [NVIDIA] Support Flashinfer TRTLLM FP8-q/kv/out Attention Kernel (#21716) elvischenv 2025-08-19 20:22:15 +08:00
40f26734b9 [Misc] Fix seq_lens for graph capture (#23175) Woosuk Kwon 2025-08-19 03:58:16 -07:00
2c3f557f08 [Doc] use power of 2 (#23172) Tialo 2025-08-19 13:16:23 +03:00
21bcc8263f [Misc] Avoid accessing req_ids inside a loop (#23159) Woosuk Kwon 2025-08-19 02:39:38 -07:00
5bfe0dea7a [bug fix] Fix llama4 spec decoding (#22691) qizixi 2025-08-19 17:53:24 +09:00
31fd3265c8 [Bugfix] Fix broken Minimax-01-VL model (#22116) Isotr0py 2025-08-19 16:49:29 +08:00
31436e8b4f [Misc] Add request_id into benchmark_serve.py (#23065) hustxiayang 2025-08-19 04:32:18 -04:00
4efd43e9b4 Fix GLM-4.5V-FP8 numerical issue (#22949) qizixi 2025-08-19 16:56:31 +09:00
3c8a787247 [Benchmark] Add flag --served-model-name to benchmark_serving_multi_turn (#22889) Daniel Serebrenik 2025-08-19 10:48:07 +03:00
