Cyrus Leung
|
afb4429b4f
|
[CI/Build] Reorganize models tests (#17459)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-04-30 23:03:08 -07:00 |
|
Aaron Pham
|
da4e7687b5
|
[Fix] Support passing args to logger (#17425)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
|
2025-04-30 08:06:58 -07:00 |
|
Alec
|
0be6d05b5e
|
[V1][Metrics] add support for kv event publishing (#16750)
Signed-off-by: alec-flowers <aflowers@nvidia.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
|
2025-04-30 07:44:45 -07:00 |
|
Harry Mellor
|
13698db634
|
Improve configs - ModelConfig (#17130)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-30 10:38:22 +08:00 |
|
Harry Mellor
|
a6977dbd15
|
Simplify (and fix) passing of guided decoding backend options (#17008)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-29 19:02:23 +00:00 |
|
Harry Mellor
|
2ef5d106bb
|
Improve literal dataclass field conversion to argparse argument (#17391)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-29 16:25:08 +00:00 |
|
Cyrus Leung
|
ebb3930d28
|
[Misc] Move config fields to MultiModalConfig (#17343)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-04-29 06:37:21 +00:00 |
|
Ekagra Ranjan
|
e136000595
|
[V1][Spec Decode] Make Eagle model arch config driven (#17323)
|
2025-04-29 10:22:02 +08:00 |
|
Harry Mellor
|
c7941cca18
|
Explicitly explain quant method override ordering and ensure all overrides are ordered (#17256)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-28 16:55:31 +00:00 |
|
Harry Mellor
|
b6dd32aa07
|
Make name of compressed-tensors quant method consistent across vLLM (#17255)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-28 16:28:13 +00:00 |
|
Harry Mellor
|
fb1c933ade
|
Add missing class docstring for PromptAdapterConfig (#17302)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-28 04:06:59 -07:00 |
|
cascade
|
690fe019f0
|
[Feature] support sequence parallelism using compilation pass (#16155)
Signed-off-by: cascade812 <cascade812@outlook.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-04-27 06:29:35 -07:00 |
|
Chen Zhang
|
838cedade7
|
[Bugfix] Get a specific type of layer from forward context (#17222)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-04-27 00:58:05 -07:00 |
|
Woosuk Kwon
|
513f074766
|
[CI/test] Fix Eagle Correctness Test (#17209)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-04-25 23:40:36 -07:00 |
|
Woosuk Kwon
|
1cf0719ebd
|
[Minor][Spec Decode] Add use_eagle to SpeculativeConfig (#17213)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-04-25 21:08:15 -07:00 |
|
Benjamin Chislett
|
a0e619e62a
|
[V1][Spec Decode] EAGLE-3 Support (#16937)
Signed-off-by: Bryan Lu <yuzhelu@amazon.com>
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
Co-authored-by: Bryan Lu <yuzhelu@amazon.com>
|
2025-04-25 15:43:07 -07:00 |
|
Harry Mellor
|
423e9f1cbe
|
Use Transformers helper get_text_config() instead of checking for text_config (#17105)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-25 08:47:35 -07:00 |
|
rasmith
|
a41351f363
|
[Quantization][FP8] Add support for FP8 models with input_scale for output projection and QK quantization (#15734)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
|
2025-04-25 00:45:02 -07:00 |
|
Harry Mellor
|
6ca0234478
|
Move missed SchedulerConfig args into scheduler config group in EngineArgs (#17131)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-24 22:48:53 -07:00 |
|
Harry Mellor
|
0fa939e2d1
|
Improve configs - LoRAConfig + PromptAdapterConfig (#16980)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-24 10:29:34 -07:00 |
|
wang.yuqi
|
67309a1cb5
|
[Frontend] Using matryoshka_dimensions control the allowed output dimensions. (#16970)
|
2025-04-24 07:06:28 -07:00 |
|
Harry Mellor
|
0a05ed57e6
|
Simplify TokenizerGroup (#16790)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-24 04:43:56 -07:00 |
|
Travis Johnson
|
3cde34a4a4
|
[Frontend] Support guidance:no-additional-properties for compatibility with xgrammar (#15949)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
|
2025-04-23 18:34:41 +00:00 |
|
Harry Mellor
|
bdb3660312
|
Use @property and private field for data_parallel_rank_local (#17053)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-23 08:50:08 -07:00 |
|
Harry Mellor
|
f3a21e9c68
|
CacheConfig.block_size should always be int when used (#17052)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-23 08:50:05 -07:00 |
|
Richard Zou
|
7f58fb9718
|
Add assertion for no objects while hashing hf_config (#16930)
Signed-off-by: rzou <zou3519@gmail.com>
|
2025-04-22 09:32:22 -07:00 |
|
vllmellm
|
30bc3e0f66
|
[FEAT][ROCm]: Support AITER MLA (#15893)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: qli88 <qiang.li2@amd.com>
|
2025-04-22 09:31:13 -07:00 |
|
Harry Mellor
|
d059110498
|
Improve configs - SpeculativeConfig (#16971)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-22 12:55:36 +00:00 |
|
Cyrus Leung
|
8f7bace7c3
|
[Doc] Improve documentation for multimodal CLI args (#16960)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-04-22 08:35:35 +00:00 |
|
Lei Wang
|
8d32dc603d
|
[Kernel] Support Microsoft Runtime Kernel Lib for our Low Precision Computation - BitBLAS (#6036)
Signed-off-by: xinyuxiao <xinyuxiao2024@gmail.com>
Co-authored-by: xinyuxiao <xinyuxiao2024@gmail.com>
|
2025-04-22 09:01:36 +01:00 |
|
omer-dayan
|
71ce44047f
|
Support S3 Sharded loading with RunAI Model Streamer (#16317)
Signed-off-by: Omer Dayan (SW-GPU) <omer@run.ai>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-04-21 21:21:49 -07:00 |
|
Varun Sundar Rabindranath
|
7b8a2ab76f
|
[Kernel] Add expert_map support to Cutlass FP8 MOE (#16861)
Signed-off-by: varun sundar rabindranath <vsundarr@redhat.com>
Co-authored-by: varun sundar rabindranath <vsundarr@redhat.com>
|
2025-04-21 20:44:32 -07:00 |
|
Jee Jee Li
|
c9acbf1141
|
[Misc] Remove the chunked prefill warning for LoRA (#16925)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-04-21 20:44:24 -07:00 |
|
qizixi
|
bb3605db85
|
[Bugfix] Fix v1/spec_decode/test_ngram.py (#16895)
Signed-off-by: qizixi <qizixi@meta.com>
|
2025-04-20 20:54:29 -07:00 |
|
Harry Mellor
|
4b07d36891
|
Improve configs - CacheConfig (#16835)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-20 12:25:04 +08:00 |
|
Harry Mellor
|
686623c5e7
|
Fix nullable_kvs fallback (#16837)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-18 05:58:39 -07:00 |
|
Harry Mellor
|
e78587a64c
|
Improve-mm-and-pooler-and-decoding-configs (#16789)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-17 22:13:32 -07:00 |
|
Harry Mellor
|
d27ea94034
|
Improve configs - TokenizerPoolConfig + DeviceConfig (#16603)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-17 11:19:42 +00:00 |
|
Bryan Lu
|
2cbd4d2999
|
[V1][Spec Dec Bug Fix] Respect Spec Dec Method Specification (#16636)
Signed-off-by: Bryan Lu <yuzhelu@amazon.com>
|
2025-04-16 19:47:26 -07:00 |
|
Shinichi Hemmi
|
3badb0213b
|
[Model] Add PLaMo2 (#14323)
Signed-off-by: Shinichi Hemmi <50256998+Alnusjaponica@users.noreply.github.com>
Signed-off-by: shemmi <shemmi@preferred.jp>
Co-authored-by: Kento Nozawa <nzw0301@preferred.jp>
Co-authored-by: Hiroaki Mikami <mhiroaki@preferred.jp>
Co-authored-by: Calvin Metzger <metzger@preferred.jp>
|
2025-04-15 19:31:30 -07:00 |
|
Richard Zou
|
b590adfdc1
|
Fix vLLM x torch.compile config caching (#16491)
Signed-off-by: rzou <zou3519@gmail.com>
|
2025-04-14 23:11:11 -07:00 |
|
Shuqiao Li
|
d2020acac7
|
config check sleep mode support oot platforms (#16562)
|
2025-04-14 16:31:50 -07:00 |
|
Harry Mellor
|
e51929ebca
|
Improve configs - SchedulerConfig (#16533)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-14 17:24:16 +08:00 |
|
Cyrus Leung
|
d9fc8cd9da
|
[V1] Enable multi-input by default (#15799)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-04-12 08:52:39 +00:00 |
|
wang.yuqi
|
fbf722c6e6
|
[Frontend] support matryoshka representation / support embedding API dimensions (#16331)
|
2025-04-11 23:23:10 -07:00 |
|
Harry Mellor
|
cd77382ac1
|
Improve configs - LoadConfig (#16422)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-11 20:27:27 +00:00 |
|
Richard Zou
|
70de35a881
|
Fix erroneous "model doesn't support compile" warning (#16486)
Signed-off-by: rzou <zou3519@gmail.com>
|
2025-04-11 16:24:36 +00:00 |
|
Jee Jee Li
|
a26f59ccbc
|
[Misc] Raise error for V1 not supporting Long LoRA. (#16415)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-04-11 01:51:20 -07:00 |
|
Russell Bryant
|
9665313c39
|
[V1] Set structured output backend to auto by default (#15724)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-04-10 17:53:26 +00:00 |
|
Harry Mellor
|
0c54fc7273
|
Improve configs - ParallelConfig (#16332)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-10 17:34:37 +00:00 |
|