Matthew Bonanni | d49899732e | 2026-01-06 21:21:42 +00:00
[Spec Decode][UX] Add acceptance stats to vllm bench serve report (#31739)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>

Elvir Crnčević | dba95378a6 | 2026-01-06 20:24:19 +00:00
Report error log after vllm bench serve (#31808)
Signed-off-by: Elvir Crncevic <elvircrn@gmail.com>

Reagan Lee | 1f5b7c41c3 | 2026-01-01 19:26:53 -08:00
Add Multimodal Processor Benchmark (#29105)
Signed-off-by: Reagan Lee <reaganjlee@gmail.com>
Signed-off-by: Reagan <reaganjlee@gmail.com>

Ning Xie | 3b8f31b362 | 2025-12-26 10:55:56 +08:00
[benchmark] use model card root instead of id (#31329)
Signed-off-by: Andy Xie <andy.xning@gmail.com>

Ming Yang | 3bb9561928 | 2025-12-23 21:17:23 -08:00
Revert "[bench] Support common prefix len config (for decode-only bench)" (#31240)
Signed-off-by: Ming Yang <minos.future@gmail.com>

zhrrr | eee600c34f | 2025-12-18 14:52:20 +00:00
[Misc] support nsys profile for bench latency (#29776)
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>

Michael Goin | 519ef9a911 | 2025-12-17 01:55:30 -08:00
[UX] Make vllm bench serve discover model by default and use --input-len (#30816)
Signed-off-by: mgoin <mgoin64@gmail.com>

Junru Shen | 676db55eec | 2025-12-16 01:37:15 -08:00
[Bugfix] Fix prefix_repetition routing in bench throughput (#29663)
Signed-off-by: Junru Shen <jrshen.sjr@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Vensen | add4b0ca44 | 2025-12-14 14:57:15 +00:00
[Bugfix][benchmarks] Fix input token calculation for rerank benchmark metrics (#30596)
Signed-off-by: vensen <vensenmu@gmail.com>

Bin Bao | a8ec486592 | 2025-12-14 13:02:39 +00:00
[Misc] Add a script to benchmark compilation time (#29919)
Signed-off-by: Bin Bao <binbao@meta.com>

Matthew Bonanni | 794a7875ee | 2025-12-10 09:44:02 -08:00
[Misc] Consistent case for vllm bench serve results (#30403)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

Benjamin Chislett | e858bfe051 | 2025-12-09 13:29:33 -05:00
[Cleanup] Refactor profiling env vars into a CLI config (#29912)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

Rohan Potdar | 40a046cd82 | 2025-12-05 20:56:40 -08:00
[Bugfix]: Fix TokenizerLike interface (#30009)
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>

Ming Yang | f16356fe36 | 2025-12-05 10:26:52 +00:00
[bench] Support common prefix len config (for decode-only bench) (#29934)
Signed-off-by: Ming Yang <minos.future@gmail.com>

Copilot | 1c593e117d | 2025-12-02 20:40:56 +00:00
Fix boolean nested params, add dict format support, and enhance plotting for vllm bench sweep (#29025)
Signed-off-by: Luka Govedič <luka.govedic@gmail.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <luka.govedic@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>

Cyrus Leung | 653591d5e7 | 2025-12-02 13:33:37 +08:00
[Chore] Move tokenizer initialization methods (#29793)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Cyrus Leung | 34a984274e | 2025-11-29 04:02:21 -08:00
[Misc] Refactor tokenizer interface (#29693)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

rongfu.leng | 480598958e | 2025-11-27 23:53:20 -08:00
[Feature][Bench] Add pareto visualization (#29477)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>

Didier Durand | 66d3d5422c | 2025-11-27 07:15:50 -08:00
[Doc]: fixing typos in diverse files (#29492)
Signed-off-by: Didier Durand <durand.didier@gmail.com>

rongfu.leng | 68dfe28eae | 2025-11-24 02:02:28 -08:00
[Feature][Benchmark] add --link-vars can filter when serve_param equal bench_param (#28909)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>

scottzh8 | 3bc1175798 | 2025-11-16 10:20:57 +00:00
[Bugfix] Fix host and port join for ipv6 in bench serve (#28679)
Signed-off-by: Scott Zhang <scottzh@fb.com>
Co-authored-by: Scott Zhang <scottzh@fb.com>

Jialin Ouyang | b30372cbd0 | 2025-11-10 15:34:18 -08:00
[Perf] Move gc.freeze logic from EngineCoreProc to EngineCore for better coverage (#27896)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>

Wentao Ye | 4b1ff13221 | 2025-11-07 07:35:33 -05:00
[Feature] Default ignore_eos True for random dataset (#28227)
Signed-off-by: yewentao256 <zhyanwentao@126.com>

汪志鹏 | 315068eb4a | 2025-11-07 09:35:22 +00:00
[FixBug]Aeala/ShareGPT_Vicuna_unfiltered marked as multimodal benchmark (#28265)
Signed-off-by: princepride <wangzhipeng628@gmail.com>

Jacob Zhong | d72299d47b | 2025-11-06 05:08:55 +00:00
Make the cv2 dependency optional (#27780)
Signed-off-by: Jacob <cmpute@qq.com>

Sophie du Couédic | a4398fbb5e | 2025-11-03 18:33:17 +00:00
[Feature][Benchmarks] Support inf burstiness (#26941)
Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com>

Seiji Eicher | b2e65cb4a7 | 2025-10-30 17:40:35 -07:00
[benchmark] Make request IDs unique across clients by default (#27723)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>

Cyrus Leung | ecca3fee76 | 2025-10-29 05:59:48 -07:00
[Frontend] Add vllm bench sweep to CLI (#27639)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

Eugene Khvedchenya | 5e72216d17 | 2025-10-29 18:24:52 +08:00
Feature/video support in random mm dataset (#25963)
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>
Signed-off-by: Eugene Khvedchenya <ekhvedchenia@nvidia.com>
Co-authored-by: Roger Wang <hey@rogerw.io>

Yeshwanth N | 71b1c8b667 | 2025-10-26 04:03:32 -07:00
[Chore]:Extract math and argparse utilities to separate modules (#27188)
Signed-off-by: Yeshwanth Surya <yeshsurya@gmail.com>
Signed-off-by: Yeshwanth N <yeshsurya@gmail.com>
Signed-off-by: yeshsurya <yeshsurya@gmail.com>

Lucia Fang | 315b860abe | 2025-10-26 08:16:35 +00:00
[bugfix]fix empty prompts for async-engine mode in benchmark throughput (#27494)
Signed-off-by: Lucia Fang <fanglu@fb.com>

Cyrus Leung | b7030d962b | 2025-10-24 11:16:50 +00:00
[Benchmark] Enable benchmark to run with encoding_format="bytes" (#27467)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Cyrus Leung | 6738e4a093 | 2025-10-22 20:43:04 -07:00
[Bugfix] Fix SLA tuner initialization (#27355)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Cyrus Leung | ceacedc1f9 | 2025-10-21 20:30:03 -07:00
[Benchmark] Add plot utility for parameter sweep (#27168)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Cyrus Leung | d31f7844f8 | 2025-10-19 05:20:55 -07:00
[Misc] Move utils to avoid conflicts with stdlib, and move tests (#27169)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Cyrus Leung | b3aba04e5a | 2025-10-18 23:57:01 -07:00
[Benchmark] Convenience script for multiple parameter combinations (#27085)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Harry Mellor | 6c9fdbf725 | 2025-10-17 02:47:34 -07:00
[Docs] Replace rst style double-backtick with md single-backtick (#27091)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

Tomas Ruiz | 965c5f4914 | 2025-10-16 19:55:09 -07:00
vllm bench serve shows num of failed requests (#26478)
Signed-off-by: Tomas Ruiz <tomas.ruiz.te@gmail.com>

Cyrus Leung | 4d4d6bad19 | 2025-10-17 00:48:59 +00:00
[Chore] Separate out vllm.utils.importlib (#27022)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Wentao Ye | 23583ee28c | 2025-10-16 21:36:39 +00:00
[Bug] Add Assertion for random-input-len / random-output-len (#26834)
Signed-off-by: yewentao256 <zhyanwentao@126.com>

kimbochen | 013abde6ef | 2025-10-16 12:44:32 -07:00
Adding Warmup to Benchmark Serving (#26943)
Signed-off-by: Kimbo Chen <chentenghung@gmail.com>

Cyrus Leung | 334535b6fb | 2025-10-16 12:47:09 +00:00
[Benchmark] Show E2EL by default for pooling models (#27014)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Cyrus Leung | 17838e50ef | 2025-10-16 16:02:39 +08:00
[Benchmark] Use truncation by default for pooling benchmarks (#26992)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Cyrus Leung | f6cdc9a02f | 2025-10-16 03:58:13 +00:00
[Chore] Rename utils submodules (#26920)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Cyrus Leung | 828523ad8e | 2025-10-15 15:33:00 +00:00
[Chore] Separate out vllm.utils.async_utils (#26913)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

wangxiyuan | 8f4b313c37 | 2025-10-15 12:11:48 +00:00
[Misc] rename torch_dtype to dtype (#26695)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

rongfu.leng | a27b288e4a | 2025-10-15 04:23:44 +00:00
[Feature] default --extra-body param to disable thinking in vllm bench serve (#26784)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>

kourosh hakhamaneshi | a2986b3e33 | 2025-10-15 02:54:43 +00:00
[Bugfix] Fixes prefix-repetition benchmark script (#26828)
Signed-off-by: Kourosh Hakhamaneshi <Kourosh@anyscale.com>

Maximilien de Bayser | fe3edb4cf0 | 2025-10-14 04:25:43 +00:00
Add support for the /rerank endpoint in vllm bench serve (#26602)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>

Harry Mellor | 8fcaaf6a16 | 2025-10-12 09:51:31 -07:00
Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>