| Author | Commit | Message | Date |
|---|---|---|---|
| Jee Jee Li | 822de7fb94 | [Misc] Split model loader (#17712); Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> | 2025-05-07 12:42:26 +08:00 |
| Cyrus Leung | 2858830c39 | [Bugfix] Prioritize dtype in root config before checking text config (#17629); Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2025-05-04 12:43:05 +00:00 |
| Harry Mellor | d6484ef3c3 | Add full API docs and improve the UX of navigating them (#17485); Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> | 2025-05-03 19:42:43 -07:00 |
| Cyrus Leung | 887d7af882 | [Core] Gate prompt_embeds behind a feature flag (#17607); Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2025-05-04 00:19:20 +08:00 |
| Chenyaaang | 87baebebd8 | [Frontend][TPU] Add TPU default max-num-batched-tokens based on device name (#17508); Signed-off-by: Chenyaaang <chenyangli@google.com> | 2025-05-02 21:42:44 -07:00 |
| Harry Mellor | 785d75a03b | Automatically tell users that dict args must be valid JSON in CLI (#17577); Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> | 2025-05-02 05:24:55 -07:00 |
| Jerry Zhang | 109e15a335 | Add pt_load_map_location to allow loading to cuda (#16869); Signed-off-by: Jerry Zhang <jerryzh168@gmail.com> | 2025-05-01 23:23:42 -07:00 |
| Chen Xia | 173daac19d | [Bug]change the position of cuda_graph_sizes in dataclasses (#17548); Signed-off-by: CXIAAAAA <cxia0209@gmail.com> | 2025-05-01 11:52:37 -07:00 |
| Cyrus Leung | 9b1769dd9a | [Bugfix] Fix lint error (#17547); Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2025-05-01 11:12:19 -07:00 |
| Chen Xia | 61c299f81f | [Misc]add configurable cuda graph size (#17201); Signed-off-by: CXIAAAAA <cxia0209@gmail.com>; Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>; Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> | 2025-05-01 11:04:50 -07:00 |
| Harry Mellor | 6768ff4a22 | Move the last arguments in arg_utils.py to be in their final groups (#17531); Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> | 2025-05-01 10:31:44 -07:00 |
| Chauncey | 98060b001d | [Feature][Frontend]: Deprecate --enable-reasoning (#17452); Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> | 2025-05-01 06:46:16 -07:00 |
| Harry Mellor | a257d9bccc | Improve configs - ObservabilityConfig (#17453); Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> | 2025-05-01 03:52:05 -07:00 |
| Cyrus Leung | afb4429b4f | [CI/Build] Reorganize models tests (#17459); Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2025-04-30 23:03:08 -07:00 |
| Aaron Pham | da4e7687b5 | [Fix] Support passing args to logger (#17425); Signed-off-by: Aaron Pham <contact@aarnphm.xyz> | 2025-04-30 08:06:58 -07:00 |
| Alec | 0be6d05b5e | [V1][Metrics] add support for kv event publishing (#16750); Signed-off-by: alec-flowers <aflowers@nvidia.com>; Signed-off-by: Mark McLoughlin <markmc@redhat.com>; Co-authored-by: Mark McLoughlin <markmc@redhat.com> | 2025-04-30 07:44:45 -07:00 |
| Harry Mellor | 13698db634 | Improve configs - ModelConfig (#17130); Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> | 2025-04-30 10:38:22 +08:00 |
| Harry Mellor | a6977dbd15 | Simplify (and fix) passing of guided decoding backend options (#17008); Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> | 2025-04-29 19:02:23 +00:00 |
| Harry Mellor | 2ef5d106bb | Improve literal dataclass field conversion to argparse argument (#17391); Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> | 2025-04-29 16:25:08 +00:00 |
| Cyrus Leung | ebb3930d28 | [Misc] Move config fields to MultiModalConfig (#17343); Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2025-04-29 06:37:21 +00:00 |
| Ekagra Ranjan | e136000595 | [V1][Spec Decode] Make Eagle model arch config driven (#17323) | 2025-04-29 10:22:02 +08:00 |
| Harry Mellor | c7941cca18 | Explicitly explain quant method override ordering and ensure all overrides are ordered (#17256); Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> | 2025-04-28 16:55:31 +00:00 |
| Harry Mellor | b6dd32aa07 | Make name of compressed-tensors quant method consistent across vLLM (#17255); Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> | 2025-04-28 16:28:13 +00:00 |
| Harry Mellor | fb1c933ade | Add missing class docstring for PromptAdapterConfig (#17302); Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> | 2025-04-28 04:06:59 -07:00 |
| cascade | 690fe019f0 | [Feature] support sequence parallelism using compilation pass (#16155); Signed-off-by: cascade812 <cascade812@outlook.com>; Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>; Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> | 2025-04-27 06:29:35 -07:00 |
| Chen Zhang | 838cedade7 | [Bugfix] Get a specific type of layer from forward context (#17222); Signed-off-by: Chen Zhang <zhangch99@outlook.com> | 2025-04-27 00:58:05 -07:00 |
| Woosuk Kwon | 513f074766 | [CI/test] Fix Eagle Correctness Test (#17209); Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> | 2025-04-25 23:40:36 -07:00 |
| Woosuk Kwon | 1cf0719ebd | [Minor][Spec Decode] Add use_eagle to SpeculativeConfig (#17213); Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> | 2025-04-25 21:08:15 -07:00 |
| Benjamin Chislett | a0e619e62a | [V1][Spec Decode] EAGLE-3 Support (#16937); Signed-off-by: Bryan Lu <yuzhelu@amazon.com>; Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>; Co-authored-by: Bryan Lu <yuzhelu@amazon.com> | 2025-04-25 15:43:07 -07:00 |
| Harry Mellor | 423e9f1cbe | Use Transformers helper get_text_config() instead of checking for text_config (#17105); Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> | 2025-04-25 08:47:35 -07:00 |
| rasmith | a41351f363 | [Quantization][FP8] Add support for FP8 models with input_scale for output projection and QK quantization (#15734); Signed-off-by: Randall Smith <Randall.Smith@amd.com>; Signed-off-by: Luka Govedič <lgovedic@redhat.com>; Co-authored-by: Luka Govedič <lgovedic@redhat.com> | 2025-04-25 00:45:02 -07:00 |
| Harry Mellor | 6ca0234478 | Move missed SchedulerConfig args into scheduler config group in EngineArgs (#17131); Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> | 2025-04-24 22:48:53 -07:00 |
| Harry Mellor | 0fa939e2d1 | Improve configs - LoRAConfig + PromptAdapterConfig (#16980); Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> | 2025-04-24 10:29:34 -07:00 |
| wang.yuqi | 67309a1cb5 | [Frontend] Using matryoshka_dimensions control the allowed output dimensions. (#16970) | 2025-04-24 07:06:28 -07:00 |
| Harry Mellor | 0a05ed57e6 | Simplify TokenizerGroup (#16790); Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> | 2025-04-24 04:43:56 -07:00 |
| Travis Johnson | 3cde34a4a4 | [Frontend] Support guidance:no-additional-properties for compatibility with xgrammar (#15949); Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com> | 2025-04-23 18:34:41 +00:00 |
| Harry Mellor | bdb3660312 | Use @property and private field for data_parallel_rank_local (#17053); Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> | 2025-04-23 08:50:08 -07:00 |
| Harry Mellor | f3a21e9c68 | CacheConfig.block_size should always be int when used (#17052); Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> | 2025-04-23 08:50:05 -07:00 |
| Richard Zou | 7f58fb9718 | Add assertion for no objects while hashing hf_config (#16930); Signed-off-by: rzou <zou3519@gmail.com> | 2025-04-22 09:32:22 -07:00 |
| vllmellm | 30bc3e0f66 | [FEAT][ROCm]: Support AITER MLA (#15893); Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>; Co-authored-by: qli88 <qiang.li2@amd.com> | 2025-04-22 09:31:13 -07:00 |
| Harry Mellor | d059110498 | Improve configs - SpeculativeConfig (#16971); Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> | 2025-04-22 12:55:36 +00:00 |
| Cyrus Leung | 8f7bace7c3 | [Doc] Improve documentation for multimodal CLI args (#16960); Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2025-04-22 08:35:35 +00:00 |
| Lei Wang | 8d32dc603d | [Kernel] Support Microsoft Runtime Kernel Lib for our Low Precision Computation - BitBLAS (#6036); Signed-off-by: xinyuxiao <xinyuxiao2024@gmail.com>; Co-authored-by: xinyuxiao <xinyuxiao2024@gmail.com> | 2025-04-22 09:01:36 +01:00 |
| omer-dayan | 71ce44047f | Support S3 Sharded loading with RunAI Model Streamer (#16317); Signed-off-by: Omer Dayan (SW-GPU) <omer@run.ai>; Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> | 2025-04-21 21:21:49 -07:00 |
| Varun Sundar Rabindranath | 7b8a2ab76f | [Kernel] Add expert_map support to Cutlass FP8 MOE (#16861); Signed-off-by: varun sundar rabindranath <vsundarr@redhat.com>; Co-authored-by: varun sundar rabindranath <vsundarr@redhat.com> | 2025-04-21 20:44:32 -07:00 |
| Jee Jee Li | c9acbf1141 | [Misc] Remove the chunked prefill warning for LoRA (#16925); Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> | 2025-04-21 20:44:24 -07:00 |
| qizixi | bb3605db85 | [Bugfix] Fix v1/spec_decode/test_ngram.py (#16895); Signed-off-by: qizixi <qizixi@meta.com> | 2025-04-20 20:54:29 -07:00 |
| Harry Mellor | 4b07d36891 | Improve configs - CacheConfig (#16835); Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> | 2025-04-20 12:25:04 +08:00 |
| Harry Mellor | 686623c5e7 | Fix nullable_kvs fallback (#16837); Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> | 2025-04-18 05:58:39 -07:00 |
| Harry Mellor | e78587a64c | Improve-mm-and-pooler-and-decoding-configs (#16789); Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> | 2025-04-17 22:13:32 -07:00 |
|