Jialin Ouyang
|
b30372cbd0
|
[Perf] Move gc.freeze logic from EngineCoreProc to EngineCore for better coverage (#27896)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-11-10 15:34:18 -08:00 |
|
Wentao Ye
|
4b1ff13221
|
[Feature] Default ignore_eos True for random dataset (#28227)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-11-07 07:35:33 -05:00 |
|
汪志鹏
|
315068eb4a
|
[FixBug]Aeala/ShareGPT_Vicuna_unfiltered marked as multimodal benchmark (#28265)
Signed-off-by: princepride <wangzhipeng628@gmail.com>
|
2025-11-07 09:35:22 +00:00 |
|
Jacob Zhong
|
d72299d47b
|
Make the cv2 dependency optional (#27780)
Signed-off-by: Jacob <cmpute@qq.com>
|
2025-11-06 05:08:55 +00:00 |
|
Sophie du Couédic
|
a4398fbb5e
|
[Feature][Benchmarks] Support inf burstiness (#26941)
Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com>
|
2025-11-03 18:33:17 +00:00 |
|
Seiji Eicher
|
b2e65cb4a7
|
[benchmark] Make request IDs unique across clients by default (#27723)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
|
2025-10-30 17:40:35 -07:00 |
|
Cyrus Leung
|
ecca3fee76
|
[Frontend] Add vllm bench sweep to CLI (#27639)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-29 05:59:48 -07:00 |
|
Eugene Khvedchenya
|
5e72216d17
|
Feature/video support in random mm dataset (#25963)
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>
Signed-off-by: Eugene Khvedchenya <ekhvedchenia@nvidia.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-10-29 18:24:52 +08:00 |
|
Yeshwanth N
|
71b1c8b667
|
[Chore]:Extract math and argparse utilities to separate modules (#27188)
Signed-off-by: Yeshwanth Surya <yeshsurya@gmail.com>
Signed-off-by: Yeshwanth N <yeshsurya@gmail.com>
Signed-off-by: yeshsurya <yeshsurya@gmail.com>
|
2025-10-26 04:03:32 -07:00 |
|
Lucia Fang
|
315b860abe
|
[bugfix]fix empty prompts for async-engine mode in benchmark throughput (#27494)
Signed-off-by: Lucia Fang <fanglu@fb.com>
|
2025-10-26 08:16:35 +00:00 |
|
Cyrus Leung
|
b7030d962b
|
[Benchmark] Enable benchmark to run with encoding_format="bytes" (#27467)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-24 11:16:50 +00:00 |
|
Cyrus Leung
|
6738e4a093
|
[Bugfix] Fix SLA tuner initialization (#27355)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-22 20:43:04 -07:00 |
|
Cyrus Leung
|
ceacedc1f9
|
[Benchmark] Add plot utility for parameter sweep (#27168)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-21 20:30:03 -07:00 |
|
Cyrus Leung
|
d31f7844f8
|
[Misc] Move utils to avoid conflicts with stdlib, and move tests (#27169)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-19 05:20:55 -07:00 |
|
Cyrus Leung
|
b3aba04e5a
|
[Benchmark] Convenience script for multiple parameter combinations (#27085)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-18 23:57:01 -07:00 |
|
Harry Mellor
|
6c9fdbf725
|
[Docs] Replace rst style double-backtick with md single-backtick (#27091)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-17 02:47:34 -07:00 |
|
Tomas Ruiz
|
965c5f4914
|
vllm bench serve shows num of failed requests (#26478)
Signed-off-by: Tomas Ruiz <tomas.ruiz.te@gmail.com>
|
2025-10-16 19:55:09 -07:00 |
|
Cyrus Leung
|
4d4d6bad19
|
[Chore] Separate out vllm.utils.importlib (#27022)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-17 00:48:59 +00:00 |
|
Wentao Ye
|
23583ee28c
|
[Bug] Add Assertion for random-input-len / random-output-len (#26834)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-16 21:36:39 +00:00 |
|
kimbochen
|
013abde6ef
|
Adding Warmup to Benchmark Serving (#26943)
Signed-off-by: Kimbo Chen <chentenghung@gmail.com>
|
2025-10-16 12:44:32 -07:00 |
|
Cyrus Leung
|
334535b6fb
|
[Benchmark] Show E2EL by default for pooling models (#27014)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-16 12:47:09 +00:00 |
|
Cyrus Leung
|
17838e50ef
|
[Benchmark] Use truncation by default for pooling benchmarks (#26992)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-16 16:02:39 +08:00 |
|
Cyrus Leung
|
f6cdc9a02f
|
[Chore] Rename utils submodules (#26920)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-16 03:58:13 +00:00 |
|
Cyrus Leung
|
828523ad8e
|
[Chore] Separate out vllm.utils.async_utils (#26913)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-15 15:33:00 +00:00 |
|
wangxiyuan
|
8f4b313c37
|
[Misc] rename torch_dtype to dtype (#26695)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2025-10-15 12:11:48 +00:00 |
|
rongfu.leng
|
a27b288e4a
|
[Feature] default --extra-body param to disable thinking in vllm bench serve (#26784)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
|
2025-10-15 04:23:44 +00:00 |
|
kourosh hakhamaneshi
|
a2986b3e33
|
[Bugfix] Fixes prefix-repetition benchmark script (#26828)
Signed-off-by: Kourosh Hakhamaneshi <Kourosh@anyscale.com>
|
2025-10-15 02:54:43 +00:00 |
|
Maximilien de Bayser
|
fe3edb4cf0
|
Add support for the /rerank endpoint in vllm bench serve (#26602)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
|
2025-10-14 04:25:43 +00:00 |
|
Harry Mellor
|
8fcaaf6a16
|
Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-12 09:51:31 -07:00 |
|
Cyrus Leung
|
5be7ca1b99
|
[Benchmark] Support Infinity API (#26641)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-12 01:45:32 +08:00 |
|
Cyrus Leung
|
4bdf7ac593
|
[Bugfix] Fix SHM cache initialization (#26427)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-09 02:48:04 -07:00 |
|
Cyrus Leung
|
dc7976dd9f
|
[Misc] Upgrade more code to Python 3.10 (#26463)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-09 10:43:53 +01:00 |
|
Huy Do
|
8bd696fa53
|
[Bugfix] Incorrect another MM data format in vllm bench throughput (#26462)
Signed-off-by: Huy Do <huydhn@gmail.com>
|
2025-10-09 05:58:46 +00:00 |
|
Cyrus Leung
|
0d4f48fa10
|
[Bugfix] Incorrect MM data format in vllm bench throughput (#26395)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-08 13:52:19 +08:00 |
|
Cyrus Leung
|
44b9af5bb2
|
[Benchmark] Enable MM Embedding benchmarks (#26310)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-06 19:51:58 +00:00 |
|
Roger Wang
|
43c146ca42
|
[Misc] Clean up unnecessary E501 ignore (#26274)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2025-10-06 07:29:18 +00:00 |
|
Yasmin Moslem
|
7c2ec0fe87
|
[Benchmarking] Add disable_shuffle option for dataset loading (#26258)
Signed-off-by: Yasmin Moslem <48152713+ymoslem@users.noreply.github.com>
|
2025-10-06 07:05:44 +00:00 |
|
Harry Mellor
|
d6953beb91
|
Convert formatting to use ruff instead of yapf + isort (#26247)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-05 07:06:22 -07:00 |
|
Sergei Skvortsov
|
b71fcd4905
|
[Misc] Add penalties sampling parameters to serve tool (#25974)
Signed-off-by: Sergei Skvortsov <sergeyskv@nebius.com>
Co-authored-by: Sergei Skvortsov <sergeyskv@nebius.com>
|
2025-10-03 15:43:14 -07:00 |
|
Ekagra Ranjan
|
ad2d788016
|
[Bug][Benchmark] Fix duplicate req in oversampling (#26140)
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-10-03 02:55:24 +00:00 |
|
Ekagra Ranjan
|
1cab2f9cad
|
EAGLE 3: Fix preamble so that measured speedup over Eagle 1 becomes 32% instead of 5% on MTBench (#25916)
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
|
2025-10-02 11:29:35 -07:00 |
|
Nathan Scott
|
f9e714813a
|
[Benchmark] Finish documented v0.11.0 deprecation of --endpoint-type (#26007)
Signed-off-by: Nathan Scott <nathans@redhat.com>
|
2025-10-01 12:41:57 +00:00 |
|
Zhuohan Li
|
d3bd171123
|
[Benchmark] Support benchmark throughput for external launcher DP (#25913)
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>
|
2025-09-30 01:43:57 +00:00 |
|
weiliang
|
f4e4088c99
|
Fix random dataset mismatched token length with config. (#24937)
Signed-off-by: Weiliang Liu <weiliangl@nvidia.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-09-28 08:23:44 +00:00 |
|
WeiQing Chen
|
f1d53d150c
|
[Multimodal][Speculative Decoding]Eagle Eagle3 mm support, enablement on qwen2.5vl (#22872)
Signed-off-by: Junhong <liujunhong11@huawei.com>
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com>
Co-authored-by: Junhong <liujunhong11@huawei.com>
Co-authored-by: LJH-LBJ <98734602+LJH-LBJ@users.noreply.github.com>
|
2025-09-27 03:35:47 +00:00 |
|
Lucia Fang
|
eea1783989
|
[benchmarks]allow skip ready check for bench serve (#25420)
Signed-off-by: Lu Fang <fanglu@fb.com>
Signed-off-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>
|
2025-09-23 03:21:48 +00:00 |
|
samzong
|
ce75e15373
|
refactor(benchmarks): add type annotations to wait_for_endpoint parameters (#25218)
Signed-off-by: samzong <samzong.lu@gmail.com>
|
2025-09-19 16:36:52 +00:00 |
|
Roger Wang
|
21da73343a
|
[Misc] Clean up flags in vllm bench serve (#25138)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2025-09-18 12:43:33 +00:00 |
|
Punitvara
|
05b044e698
|
[Doc] Fix cross-reference warnings (#25058)
Signed-off-by: Punit Vara <punitvara@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-18 02:05:16 -07:00 |
|
Simon Mo
|
a904ea78ea
|
[benchmark] add peak throughput metrics and plot (#23867)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-09-17 22:30:02 -07:00 |
|