Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

b4ac449a83 [Misc] Merge the logs of pp layers partitions (#16225) Kebe 2025-04-08 15:18:15 +08:00
8e5314a468 [V1] Add disable_chunked_mm_input arg to disable partial mm input prefill (#15837) Michael Goin 2025-04-08 00:24:07 -06:00
87918e40c4 [torch.compile][TPU] Make @support_torch_compile work for XLA backend (#15782) Siyuan Liu 2025-04-07 23:23:53 -07:00
f6b32efb7f [Bugfix] Fix and reorganize broken GGUF tests and bump gguf version (#16194) Isotr0py 2025-04-08 13:38:13 +08:00
b99733d092 [Bugfix] Do not skip "empty" parts of chats that are parsable (#16219) Michael Goin 2025-04-07 23:14:15 -06:00
05a015d6a5 Add warning for Attention backends that do not support irope yet (#16212) Yong Hoon Shin 2025-04-07 20:59:26 -07:00
ad971af8c7 [Bugfix] fix use-ep bug to enable ep by dp/tp size > 1 (#16161) zxfan-cpu 2025-04-08 11:48:47 +08:00
f2ebb6f541 [V1] Scatter and gather placeholders in the model runner (#16076) Roger Wang 2025-04-07 19:43:41 -07:00
1d01211264 Update BASE_IMAGE to 2.22 release of Neuron (#16218) Satyajith Chilappagari 2025-04-07 19:11:18 -07:00
f94ab12f79 [Misc] Update compressed-tensors to version 0.9.3 (#16196) Miles Williams 2025-04-08 03:09:06 +01:00
a865bc1ca6 [core] do not send error across process (#16174) youkaichao 2025-04-08 10:09:03 +08:00
21802c4b6d [ROCm][Bugfix][FP8] Make fp8 quant respect fused modules mapping (#16031) Michael Goin 2025-04-07 19:28:14 -06:00
652907b354 Torchao (#14231) Driss Guessous 2025-04-07 16:39:28 -07:00
24f1c01e0f [Bugfix][V0] XGrammar structured output supports Enum (#15878) leon-seidel 2025-04-08 00:38:25 +02:00
fad6e2538e [Misc] add description attribute in CLI (#15921) Reid 2025-04-08 06:30:35 +08:00
7f6d47c1a2 [V1][BugFix] Exit properly if engine core fails during startup (#16137) Nick Hill 2025-04-07 15:30:15 -07:00
3147586ebd [Bugfix] Fix guidance backend for Qwen models (#16210) Benjamin Chislett 2025-04-07 18:15:43 -04:00
ed636d99ca [Misc] Move Llama 4 projector call into encoder execution (#16201) Roger Wang 2025-04-07 14:02:05 -07:00
090c856d76 [Misc] Human-readable max-model-len cli arg (#16181) Nicolò Lucchesi 2025-04-07 20:40:58 +02:00
ad434d4cfe Print the warning only once (#16193) Gregory Shtrasberg 2025-04-07 14:30:06 -04:00
66d433b94f [V1] Revert the default max_num_seqs to V0 values for most hardware (#16158) Cyrus Leung 2025-04-08 01:54:36 +08:00
027b204ff1 [Bugfix] Re-enable support for ChatGLMForConditionalGeneration (#16187) Cyrus Leung 2025-04-07 23:15:58 +08:00
55dcce91df Upstream Llama4 Support to Main (#16113) Lu Fang 2025-04-07 08:06:27 -07:00
8017c8db7f [Doc]Update image to latest version (#16186) Robin 2025-04-07 22:17:39 +08:00
dc3529dbf6 [Misc] improve example mlpspeculator and llm_engine_example (#16175) Reid 2025-04-07 19:53:52 +08:00
7699258ef0 [Model] Add Qwen3 and Qwen3MoE (#15289) YamPengLi 2025-04-07 19:06:41 +08:00
e9ba99f296 [V1][Structured Output] Add supports_structured_output() method to Platform (#16148) Shanshan Shen 2025-04-07 19:06:24 +08:00
7c80368710 [VLM] Florence-2 supports online serving (#16164) Isotr0py 2025-04-07 19:04:02 +08:00
95d63f38c0 doc: fix some typos in doc (#16154) yihong 2025-04-07 13:32:06 +08:00
bb8dab821e [CI] Set max transformers version for Ultravox model test (#16149) Roger Wang 2025-04-06 21:37:58 -07:00
fc0f87768a [Bugfix] Make dummy encoder prompt padding alternative and add missing warnings (#16129) Isotr0py 2025-04-07 12:07:15 +08:00
0a57386721 [Misc] Update Mistral-3.1 example (#16147) Cyrus Leung 2025-04-07 11:57:37 +08:00
3749e28774 [V1][Minor] Minor simplification for get_computed_blocks (#16139) Woosuk Kwon 2025-04-06 20:38:12 -07:00
86fc2321ff [Metrics] Add bucket for request_latency, time_to_first_token and time_per_output_token (#15202) Kay Yan 2025-04-07 11:34:51 +08:00
2549c0dfef Fix requires-python (#16132) Martin Hoyer 2025-04-07 04:22:25 +02:00
b10e519895 [V1][Minor] Optimize get_cached_block (#16135) Woosuk Kwon 2025-04-06 13:48:14 -07:00
9bde5ba127 [TPU] Update PyTorch/XLA (#16130) Chengji Yao 2025-04-06 11:25:55 -07:00
72c8f1ad04 [Misc] update requires-python in pyproject.toml (#16116) Reid 2025-04-06 22:56:34 +08:00
da224daaa9 [Bugfix] add hf_token to EngineArgs (#16093) paolovic 2025-04-06 16:47:33 +02:00
3a100b9278 [Bugfix] LoRA : Fix the order in which the kernels process LoRAs (#16040) Varun Sundar Rabindranath 2025-04-06 10:04:50 -04:00
242a637aea [Model] use AutoWeightsLoader for stablelm,starcoder2,zamba2 (#16103) rongfu.leng 2025-04-06 20:52:01 +08:00
c2a9671510 [Misc] Improve model redirect to accept json dictionary (#16119) Isotr0py 2025-04-06 20:51:45 +08:00
d5ae4f7f42 [Doc][Bugfix] Add missing EOF in k8s deploy doc (#16025) Paul Schweigert 2025-04-06 08:10:57 -04:00
b6c502a150 [Misc] refactor example eagle (#16100) Reid 2025-04-06 17:42:48 +08:00
9ca710e525 [CI][V1] Fix passing tokenizer as kwarg to validate_guidance_grammar (#16117) Roger Wang 2025-04-06 01:18:00 -07:00
eb07c8cb5b [Frontend] Fix typo in tool chat templates for llama3.2 and toolace (#14501) Ben Jackson 2025-04-06 00:44:36 -07:00
ba10801961 [Benchmark] Add sampling parameters to benchmark_serving. (#16022) Hyesoo Yang 2025-04-05 21:30:35 -07:00
620fc2d09e [Model] fix model testing for TeleChat2ForCausalLM and V0 llama4 (#16112) Lucia Fang 2025-04-05 21:23:40 -07:00
296c6572dd Revert "[V1] DP scale-out (1/N): Use zmq ROUTER/DEALER sockets for input queue (#15906)" (tag: v0.8.3) simon-mo 2025-04-05 21:10:57 -07:00
c575232395 [Model] Support Llama4 in vLLM (#16104) Lu Fang 2025-04-05 21:01:00 -07:00
29283eaa7e [Model] use AutoWeightsLoader for phi, gemma, deepseek (#16088) Jonghyun Choe 2025-04-06 12:34:38 +09:00
2fa66ef713 [Bugfix] fix use_atomic_add support of marlin kernel when using v1 engine (#15946) Jinzhen Lin 2025-04-06 11:04:22 +08:00
13affc432d [Misc] Remove redundant code (#16098) Chauncey 2025-04-06 11:03:50 +08:00
d8f094a92a [Misc] format output for encoder_decoder.py (#16095) Reid 2025-04-06 10:57:18 +08:00
97ae6d777f Fix some capitalisations in generated examples doc titles (#16094) Harry Mellor 2025-04-05 14:44:03 +01:00
6baeee70d1 Revert "doc: add info for macos clang errors (#16049)" (#16091) yihong 2025-04-05 19:51:51 +08:00
d2517a4939 [doc] fix 404 (#16082) Reid 2025-04-05 19:39:18 +08:00
6342adc438 fix: support clang17 for macos and fix the real libomp (#16086) yihong 2025-04-05 19:00:12 +08:00
0adba91547 [CI] Fix benchmark script level (#16089) Kevin H. Luu 2025-04-05 03:36:01 -07:00
4285e423a6 [Misc] Auto detect bitsandbytes pre-quantized models (#16027) Tristan Leclercq 2025-04-05 08:30:45 +02:00
63375f0cdb [V1][Spec Decode] Update N-gram Proposer Interface (#15750) (tag: v0.8.3rc1) Woosuk Kwon 2025-04-04 16:32:54 -07:00
70ad3f9e98 [Bugfix][TPU] Fix V1 TPU worker for sliding window (#16059) Michael Goin 2025-04-04 17:31:19 -06:00
d6fc629f4d [Kernel][Minor] Re-fuse triton moe weight application (#16071) bnellnm 2025-04-04 19:27:34 -04:00
af51d80fa1 Revert "[V1] Scatter and gather placeholders in the model runner" (#16075) Roger Wang 2025-04-04 14:50:57 -07:00
f5722a5052 [V1] Scatter and gather placeholders in the model runner (#15712) Cyrus Leung 2025-04-05 05:26:44 +08:00
651cf0fec1 [V1] DP scale-out (1/N): Use zmq ROUTER/DEALER sockets for input queue (#15906) Nick Hill 2025-04-04 12:56:43 -07:00
4dc52e1c53 [CI] Reorganize .buildkite directory (#16001) Kevin H. Luu 2025-04-04 12:16:20 -07:00
4708f13a9c [Bugfix] Fix default behavior/fallback for pp in v1 (#16057) Michael Goin 2025-04-04 11:58:08 -06:00
a6d042df0a [ROCm][Bugfix] Bring back fallback to eager mode removed in #14917, but for ROCm only (#15413) Gregory Shtrasberg 2025-04-04 12:40:37 -04:00
40a36ccfeb [ROCm][Bugfix] Use platform specific FP8 dtype (#15717) Gregory Shtrasberg 2025-04-04 12:40:20 -04:00
ef608c37a7 [Distributed] [ROCM] Fix custom allreduce enable checks (#16010) Ilya Markov 2025-04-04 18:39:08 +02:00
2386803f2a [CPU] Change default block_size for CPU backend (#16002) Li, Jiang 2025-04-05 00:39:05 +08:00
95862f7b4d [Benchmark][Doc] Update throughput benchmark and README (#15998) Ziji Shi (Steven) 2025-04-04 09:39:02 -07:00
230b131b54 [Bugfix][kernels] Fix half2float conversion in gguf kernels (#15995) Isotr0py 2025-04-05 00:38:58 +08:00
0812d8dd41 [Hardware][Gaudi][BugFix] fix arguments of hpu fused moe (#15945) liuzhenwei 2025-04-05 00:38:55 +08:00
bf7e3c51ae [Model] use AutoWeightsLoader for baichuan, gpt-neox, mpt (#15939) Jonghyun Choe 2025-04-05 01:38:52 +09:00
a35a8a8392 [V1][Spec Decode] Avoid logging useless nan metrics (#16023) Mark McLoughlin 2025-04-04 16:52:41 +01:00
4ef0bb1fcf doc: add info for macos clang errors (#16049) yihong 2025-04-04 22:58:16 +08:00
fadc59c0e6 [TPU][V1] Remove ragged attention kernel parameter hard coding (#16041) Chengji Yao 2025-04-04 04:48:50 -07:00
86cbd2eee9 [Misc] improve gguf check (#15974) Reid 2025-04-04 09:33:36 +08:00
092475f738 [ROCm] Tweak the benchmark script to run on ROCm (#14252) Huy Do 2025-04-03 17:12:48 -07:00
dcc56d62da [Bugfix] Fix function names in test_block_fp8.py (#16033) bnellnm 2025-04-03 19:01:34 -04:00
f15e70d906 [TPU] Switch Test to Non-Sliding Window (#15981) Robert Shaw 2025-04-03 14:28:45 -07:00
b6be6f8d1e [TPU] Support sliding window and logit soft capping in the paged attention kernel for TPU. (#15732) iefgnoix 2025-04-03 14:23:28 -07:00
03a70eacaf Re-enable the AMD Testing for the passing tests. (#15586) Alexei-V-Ivanov-AMD 2025-04-03 13:05:17 -05:00
45b1ff7a25 [Misc][Performance] Advance tpu.txt to the most recent nightly torch … (#16024) yarongmu-google 2025-04-03 10:32:54 -07:00
15ba07ef25 [Minor] Fused experts refactor (#15914) bnellnm 2025-04-03 13:19:38 -04:00
d2b58ca203 [Neuron][kernel] Fuse kv cache into a single tensor (#15911) Liangfu Chen 2025-04-03 09:51:32 -07:00
82e7e19a6e [SupportsQuant] Chameleon, Chatglm, Commandr (#15952) Kyle Sayers 2025-04-03 11:25:22 -04:00
421c462948 [SupportsQuant] Bert, Blip, Blip2, Bloom (#15573) Kyle Sayers 2025-04-03 11:23:19 -04:00
84884cd9ac fix: tiny fix make format.sh excutable (#16015) yihong 2025-04-03 23:18:05 +08:00
a43aa183dc [doc] update contribution link (#15922) Reid 2025-04-03 18:47:31 +08:00
463bbb1835 [Bugfix][V1] Fix bug from putting llm_engine.model_executor in a background process (#15367) wwl2755 2025-04-03 02:32:10 -05:00
5e125e74d1 [misc] improve error message for "Failed to infer device type" (#15994) youkaichao 2025-04-03 14:45:03 +08:00
06f21ce7a5 [Benchmark] Add AIMO Dataset to Benchmark (#15955) Ziji Shi (Steven) 2025-04-02 23:09:18 -07:00
57a810db9c [ROCM][V0] PA kennel selection when no sliding window provided (#15982) Aleksandr Malyshev 2025-04-02 22:28:44 -07:00
8b664706aa [bugfix] add seed in torchrun_example.py (#15980) youkaichao 2025-04-03 12:25:01 +08:00
37bfee92bf fix: better error message for get_config close #13889 (#15943) yihong 2025-04-03 11:53:19 +08:00
e73ff24e31 [ROCM][KERNEL] Paged attention for V1 (#15720) Aleksandr Malyshev 2025-04-02 19:48:00 -07:00
bd7599d34a [V1][TPU] Do not compile sampling more than needed (#15883) Nicolò Lucchesi 2025-04-03 03:36:01 +02:00
