biondizzle/vllm - vllm

Author	SHA1	Message	Date
Matthew Bonanni	d49899732e	[Spec Decode][UX] Add acceptance stats to `vllm bench serve` report (#31739 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com> Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>	2026-01-06 21:21:42 +00:00
Elvir Crnčević	dba95378a6	Report error log after vllm bench serve (#31808 ) Signed-off-by: Elvir Crncevic <elvircrn@gmail.com>	2026-01-06 20:24:19 +00:00
Reagan Lee	1f5b7c41c3	Add Multimodal Processor Benchmark (#29105 ) Signed-off-by: Reagan Lee <reaganjlee@gmail.com> Signed-off-by: Reagan <reaganjlee@gmail.com>	2026-01-01 19:26:53 -08:00
Ning Xie	3b8f31b362	[benchmark] use model card root instead of id (#31329 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-12-26 10:55:56 +08:00
Ming Yang	3bb9561928	Revert "[bench] Support common prefix len config (for decode-only bench)" (#31240 ) Signed-off-by: Ming Yang <minos.future@gmail.com>	2025-12-23 21:17:23 -08:00
zhrrr	eee600c34f	[Misc] support nsys profile for bench latency (#29776 ) Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>	2025-12-18 14:52:20 +00:00
Michael Goin	519ef9a911	[UX] Make `vllm bench serve` discover model by default and use --input-len (#30816 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-12-17 01:55:30 -08:00
Junru Shen	676db55eec	[Bugfix] Fix prefix_repetition routing in bench throughput (#29663 ) Signed-off-by: Junru Shen <jrshen.sjr@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-16 01:37:15 -08:00
Vensen	add4b0ca44	[Bugfix][benchmarks] Fix input token calculation for rerank benchmark metrics (#30596 ) Signed-off-by: vensen <vensenmu@gmail.com>	2025-12-14 14:57:15 +00:00
Bin Bao	a8ec486592	[Misc] Add a script to benchmark compilation time (#29919 ) Signed-off-by: Bin Bao <binbao@meta.com>	2025-12-14 13:02:39 +00:00
Matthew Bonanni	794a7875ee	[Misc] Consistent case for `vllm bench serve` results (#30403 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-12-10 09:44:02 -08:00
Benjamin Chislett	e858bfe051	[Cleanup] Refactor profiling env vars into a CLI config (#29912 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-09 13:29:33 -05:00
Rohan Potdar	40a046cd82	[Bugfix]: Fix `TokenizerLike` interface (#30009 ) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>	2025-12-05 20:56:40 -08:00
Ming Yang	f16356fe36	[bench] Support common prefix len config (for decode-only bench) (#29934 ) Signed-off-by: Ming Yang <minos.future@gmail.com>	2025-12-05 10:26:52 +00:00
Copilot	1c593e117d	Fix boolean nested params, add dict format support, and enhance plotting for vllm bench sweep (#29025 ) Signed-off-by: Luka Govedič <luka.govedic@gmail.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <luka.govedic@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-12-02 20:40:56 +00:00
Cyrus Leung	653591d5e7	[Chore] Move tokenizer initialization methods (#29793 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-02 13:33:37 +08:00
Cyrus Leung	34a984274e	[Misc] Refactor tokenizer interface (#29693 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-29 04:02:21 -08:00
rongfu.leng	480598958e	[Feature][Bench] Add pareto visualization (#29477 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-11-27 23:53:20 -08:00
Didier Durand	66d3d5422c	[Doc]: fixing typos in diverse files (#29492 ) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-11-27 07:15:50 -08:00
rongfu.leng	68dfe28eae	[Feature][Benchmark] add --link-vars can filter when serve_param equal bench_param (#28909 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-11-24 02:02:28 -08:00
scottzh8	3bc1175798	[Bugfix] Fix host and port join for ipv6 in bench serve (#28679 ) Signed-off-by: Scott Zhang <scottzh@fb.com> Co-authored-by: Scott Zhang <scottzh@fb.com>	2025-11-16 10:20:57 +00:00
Jialin Ouyang	b30372cbd0	[Perf] Move gc.freeze logic from EngineCoreProc to EngineCore for better coverage (#27896 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-10 15:34:18 -08:00
Wentao Ye	4b1ff13221	[Feature] Default `ignore_eos` True for `random` dataset (#28227 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-07 07:35:33 -05:00
汪志鹏	315068eb4a	[FixBug]Aeala/ShareGPT_Vicuna_unfiltered marked as multimodal benchmark (#28265 ) Signed-off-by: princepride <wangzhipeng628@gmail.com>	2025-11-07 09:35:22 +00:00
Jacob Zhong	d72299d47b	Make the cv2 dependency optional (#27780 ) Signed-off-by: Jacob <cmpute@qq.com>	2025-11-06 05:08:55 +00:00
Sophie du Couédic	a4398fbb5e	[Feature][Benchmarks] Support `inf` burstiness (#26941 ) Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com>	2025-11-03 18:33:17 +00:00
Seiji Eicher	b2e65cb4a7	[benchmark] Make request IDs unique across clients by default (#27723 ) Signed-off-by: Seiji Eicher <seiji@anyscale.com>	2025-10-30 17:40:35 -07:00
Cyrus Leung	ecca3fee76	[Frontend] Add `vllm bench sweep` to CLI (#27639 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-29 05:59:48 -07:00
Eugene Khvedchenya	5e72216d17	Feature/video support in random mm dataset (#25963 ) Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com> Signed-off-by: Eugene Khvedchenya <ekhvedchenia@nvidia.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-10-29 18:24:52 +08:00
Yeshwanth N	71b1c8b667	[Chore]:Extract math and argparse utilities to separate modules (#27188 ) Signed-off-by: Yeshwanth Surya <yeshsurya@gmail.com> Signed-off-by: Yeshwanth N <yeshsurya@gmail.com> Signed-off-by: yeshsurya <yeshsurya@gmail.com>	2025-10-26 04:03:32 -07:00
Lucia Fang	315b860abe	[bugfix]fix empty prompts for async-engine mode in benchmark throughput (#27494 ) Signed-off-by: Lucia Fang <fanglu@fb.com>	2025-10-26 08:16:35 +00:00
Cyrus Leung	b7030d962b	[Benchmark] Enable benchmark to run with `encoding_format="bytes"` (#27467 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-24 11:16:50 +00:00
Cyrus Leung	6738e4a093	[Bugfix] Fix SLA tuner initialization (#27355 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-22 20:43:04 -07:00
Cyrus Leung	ceacedc1f9	[Benchmark] Add plot utility for parameter sweep (#27168 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-21 20:30:03 -07:00
Cyrus Leung	d31f7844f8	[Misc] Move utils to avoid conflicts with stdlib, and move tests (#27169 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-19 05:20:55 -07:00
Cyrus Leung	b3aba04e5a	[Benchmark] Convenience script for multiple parameter combinations (#27085 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-18 23:57:01 -07:00
Harry Mellor	6c9fdbf725	[Docs] Replace `rst` style double-backtick with `md` single-backtick (#27091 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-17 02:47:34 -07:00
Tomas Ruiz	965c5f4914	vllm bench serve shows num of failed requests (#26478 ) Signed-off-by: Tomas Ruiz <tomas.ruiz.te@gmail.com>	2025-10-16 19:55:09 -07:00
Cyrus Leung	4d4d6bad19	[Chore] Separate out `vllm.utils.importlib` (#27022 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-17 00:48:59 +00:00
Wentao Ye	23583ee28c	[Bug] Add Assertion for `random-input-len` / `random-output-len` (#26834 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-16 21:36:39 +00:00
kimbochen	013abde6ef	Adding Warmup to Benchmark Serving (#26943 ) Signed-off-by: Kimbo Chen <chentenghung@gmail.com>	2025-10-16 12:44:32 -07:00
Cyrus Leung	334535b6fb	[Benchmark] Show E2EL by default for pooling models (#27014 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-16 12:47:09 +00:00
Cyrus Leung	17838e50ef	[Benchmark] Use truncation by default for pooling benchmarks (#26992 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-16 16:02:39 +08:00
Cyrus Leung	f6cdc9a02f	[Chore] Rename `utils` submodules (#26920 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-16 03:58:13 +00:00
Cyrus Leung	828523ad8e	[Chore] Separate out `vllm.utils.async_utils` (#26913 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-15 15:33:00 +00:00
wangxiyuan	8f4b313c37	[Misc] rename torch_dtype to dtype (#26695 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-10-15 12:11:48 +00:00
rongfu.leng	a27b288e4a	[Feature] default --extra-body param to disable thinking in vllm bench serve (#26784 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-10-15 04:23:44 +00:00
kourosh hakhamaneshi	a2986b3e33	[Bugfix] Fixes prefix-repetition benchmark script (#26828 ) Signed-off-by: Kourosh Hakhamaneshi <Kourosh@anyscale.com>	2025-10-15 02:54:43 +00:00
Maximilien de Bayser	fe3edb4cf0	Add support for the /rerank endpoint in vllm bench serve (#26602 ) Signed-off-by: Max de Bayser <mbayser@br.ibm.com>	2025-10-14 04:25:43 +00:00
Harry Mellor	8fcaaf6a16	Update `Optional[x]` -> `x | None` and `Union[x, y]` to `x | y` (#26633 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-12 09:51:31 -07:00

137 Commits