Commit Graph

327 Commits

Author SHA1 Message Date
Chen Xia
61c299f81f [Misc]add configurable cuda graph size (#17201)
Signed-off-by: CXIAAAAA <cxia0209@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-01 11:04:50 -07:00
Harry Mellor
6768ff4a22 Move the last arguments in arg_utils.py to be in their final groups (#17531)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-01 10:31:44 -07:00
Chauncey
98060b001d [Feature][Frontend]: Deprecate --enable-reasoning (#17452)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-05-01 06:46:16 -07:00
Harry Mellor
a257d9bccc Improve configs - ObservabilityConfig (#17453)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-01 03:52:05 -07:00
Alec
0be6d05b5e [V1][Metrics] add support for kv event publishing (#16750)
Signed-off-by: alec-flowers <aflowers@nvidia.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
2025-04-30 07:44:45 -07:00
Harry Mellor
13698db634 Improve configs - ModelConfig (#17130)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-30 10:38:22 +08:00
Harry Mellor
a6977dbd15 Simplify (and fix) passing of guided decoding backend options (#17008)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-29 19:02:23 +00:00
Harry Mellor
2ef5d106bb Improve literal dataclass field conversion to argparse argument (#17391)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-29 16:25:08 +00:00
Hyogeun Oh (오효근)
193e78e35d [Fix] Documentation spacing in compilation config help text (#17342)
Signed-off-by: Zerohertz <ohg3417@gmail.com>
2025-04-29 00:16:17 -07:00
Cyrus Leung
ebb3930d28 [Misc] Move config fields to MultiModalConfig (#17343)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-29 06:37:21 +00:00
Harry Mellor
f94886946e Improve conversion from dataclass configs to argparse arguments (#17303)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-28 16:22:12 +00:00
Lucas Wilkinson
d8bccde686 [BugFix] Fix vllm_flash_attn install issues (#17267)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
2025-04-27 17:27:56 -07:00
rasmith
68af5f6c5c [AMD][FP8][BugFix] Remove V1 check in arg_utils.py for FP8 since it is not necessary (#17215)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
2025-04-25 19:55:05 -07:00
Benjamin Chislett
a0e619e62a [V1][Spec Decode] EAGLE-3 Support (#16937)
Signed-off-by: Bryan Lu <yuzhelu@amazon.com>
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
Co-authored-by: Bryan Lu <yuzhelu@amazon.com>
2025-04-25 15:43:07 -07:00
rasmith
a41351f363 [Quantization][FP8] Add support for FP8 models with input_scale for output projection and QK quantization (#15734)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
2025-04-25 00:45:02 -07:00
Harry Mellor
6ca0234478 Move missed SchedulerConfig args into scheduler config group in EngineArgs (#17131)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-24 22:48:53 -07:00
Harry Mellor
0fa939e2d1 Improve configs - LoRAConfig + PromptAdapterConfig (#16980)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-24 10:29:34 -07:00
Harry Mellor
0a05ed57e6 Simplify TokenizerGroup (#16790)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-24 04:43:56 -07:00
Woosuk Kwon
46e678bcff [Minor] Use larger batch sizes for A100/B100/B200/MI300x (#17073)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-04-23 19:18:59 -07:00
Travis Johnson
3cde34a4a4 [Frontend] Support guidance:no-additional-properties for compatibility with xgrammar (#15949)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
2025-04-23 18:34:41 +00:00
Harry Mellor
d059110498 Improve configs - SpeculativeConfig (#16971)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-22 12:55:36 +00:00
Cyrus Leung
8f7bace7c3 [Doc] Improve documentation for multimodal CLI args (#16960)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-22 08:35:35 +00:00
wangxiyuan
b9b4746950 [V1] Remove additional_config check (#16710)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-04-21 20:45:27 -07:00
Michael Goin
986537f1c3 [V1] V1 FlashInfer Attention (#16684)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Aurick Qiao <qiao@aurick.net>
2025-04-22 00:38:41 +00:00
Harry Mellor
4b07d36891 Improve configs - CacheConfig (#16835)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-20 12:25:04 +08:00
Harry Mellor
686623c5e7 Fix nullable_kvs fallback (#16837)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-18 05:58:39 -07:00
Harry Mellor
e78587a64c Improve-mm-and-pooler-and-decoding-configs (#16789)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-17 22:13:32 -07:00
Cyrus Leung
c16fb5dae8 [Doc] Improve help examples for --compilation-config (#16729)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-17 21:22:34 -07:00
Yihua Cheng
3408e47159 [P/D][V1] KV Connector API V1 (#15960)
Signed-off-by: ApostaC <yihua98@uchicago.edu>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Signed-off-by: remi <remi@mistral.ai>
Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Rémi Delacourt <54138269+Flechman@users.noreply.github.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
2025-04-17 13:22:40 -07:00
Harry Mellor
d27ea94034 Improve configs - TokenizerPoolConfig + DeviceConfig (#16603)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-17 11:19:42 +00:00
Harry Mellor
e51929ebca Improve configs - SchedulerConfig (#16533)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-14 17:24:16 +08:00
Cyrus Leung
d9fc8cd9da [V1] Enable multi-input by default (#15799)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-12 08:52:39 +00:00
Harry Mellor
cd77382ac1 Improve configs - LoadConfig (#16422)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-11 20:27:27 +00:00
Michael Goin
aa3b3d76e0 Enforce valid max_num_batched_tokens when disable_chunked_mm_input=True (#16447)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-04-11 08:09:52 +00:00
Russell Bryant
9665313c39 [V1] Set structured output backend to auto by default (#15724)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-04-10 17:53:26 +00:00
Harry Mellor
0c54fc7273 Improve configs - ParallelConfig (#16332)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-10 17:34:37 +00:00
Michael Goin
8e5314a468 [V1] Add disable_chunked_mm_input arg to disable partial mm input prefill (#15837)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-04-07 23:24:07 -07:00
Nicolò Lucchesi
090c856d76 [Misc] Human-readable max-model-len cli arg (#16181)
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-04-07 14:40:58 -04:00
Cyrus Leung
66d433b94f [V1] Revert the default max_num_seqs to V0 values for most hardware (#16158)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-07 13:54:36 -04:00
paolovic
da224daaa9 [Bugfix] add hf_token to EngineArgs (#16093)
Signed-off-by: paolovic <paul-philipp.luley@uzh.ch>
Co-authored-by: paolovic <paul-philipp.luley@uzh.ch>
2025-04-06 14:47:33 +00:00
Tristan Leclercq
4285e423a6 [Misc] Auto detect bitsandbytes pre-quantized models (#16027)
Signed-off-by: Tristan Leclercq <tristanleclercq@gmail.com>
2025-04-04 23:30:45 -07:00
Michael Goin
4708f13a9c [Bugfix] Fix default behavior/fallback for pp in v1 (#16057)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-04-04 17:58:08 +00:00
Woosuk Kwon
e75a6301bd [V1][Spec Decode] Implement Eagle Proposer [1/N] (#15729)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-04-01 12:33:16 -07:00
Varun Sundar Rabindranath
79455cf421 [Misc] Enable V1 LoRA by default (#15320)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-04-01 16:53:56 +08:00
Gregory Shtrasberg
c7e63aa4d8 [ROCm] Use device name in the warning (#15838)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-04-01 00:10:48 -07:00
shangmingc
239b7befdd [V1][Spec Decode] Remove deprecated spec decode config params (#15466)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-03-31 09:19:35 -07:00
youkaichao
555aa21905 [V1] Fully Transparent Implementation of CPU Offloading (#15354)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-03-31 20:22:34 +08:00
Reid
26df46ee59 [Misc] cli auto show default value (#15582)
Signed-off-by: reidliu41 <reid201711@gmail.com>
2025-03-28 22:23:00 +00:00
Russell Bryant
7329ff5468 [V1] Support disable_any_whtespace for guidance backend (#15584)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-28 23:46:45 +08:00
Ce Gao
32b14baf8a [Refactor][Frontend] Keep all logic about reasoning into one class (#14428)
Signed-off-by: Ce Gao <cegao@tensorchord.ai>
2025-03-28 00:23:30 -07:00