Harry Mellor
|
4c2b38ce9e
|
Enable Pydantic mypy checks and convert configs to Pydantic dataclasses (#17599)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-28 12:46:04 +00:00 |
|
wang.yuqi
|
de65fc8e1e
|
[CI] improve embed testing (#18747)
|
2025-05-28 00:16:35 -07:00 |
|
Cyrus Leung
|
0c492b7824
|
[Deprecation] Remove fallbacks for Embeddings API (#18795)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-28 15:09:04 +08:00 |
|
wang.yuqi
|
3e9ce609bd
|
[Bugfix] Fix nomic max_model_len (#18755)
|
2025-05-27 20:29:53 -07:00 |
|
Hyogeun Oh (오효근)
|
a68e293cb9
|
[Doc] Convert Sphinx directives ( {class}, {meth}, {attr}, ...) to MkDocs format for better documentation linking (#18663)
Signed-off-by: Zerohertz <ohg3417@gmail.com>
|
2025-05-27 01:44:20 -07:00 |
|
Cyrus Leung
|
61a45e7a72
|
[Bugfix] Fix Mistral-format models with sliding window (#18693)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-26 01:44:04 -07:00 |
|
Feng XiaoLong
|
4fc1bf813a
|
[Bugfix] Migrate to REGEX Library to prevent catastrophic backtracking (#18454)
Signed-off-by: Crucifixion-Fxl <xmufxl@gmail.com>
Co-authored-by: Crucifixion-Fxl <xmufxl@gmail.com>
|
2025-05-23 16:16:26 -07:00 |
|
Cyrus Leung
|
7d9216495c
|
[Doc] Update references to doc files (#18637)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-23 15:49:21 -07:00 |
|
Jiayi Yao
|
2628a69e35
|
[V1] Support Deepseek MTP (#18435)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
Signed-off-by: YaoJiayi <120040070@link.cuhk.edu.cn>
Co-authored-by: Rui Qiao <ruisearch42@gmail.com>
|
2025-05-23 10:26:28 -07:00 |
|
Cyrus Leung
|
273cb3b4d9
|
[Doc] Fix top-level API links/docs (#18621)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-23 09:46:56 -07:00 |
|
cascade
|
71ea614d4a
|
[Feature]Add async tensor parallelism using compilation pass (#17882)
Signed-off-by: cascade812 <cascade812@outlook.com>
|
2025-05-23 01:03:34 -07:00 |
|
aws-elaineyz
|
ed5d408255
|
[Neuron] Remove bypass on EAGLEConfig and add a test (#18514)
Signed-off-by: Elaine Zhao <elaineyz@amazon.com>
|
2025-05-22 21:26:32 -07:00 |
|
lkchen
|
e44d8ce8c7
|
[Bugfix] Set KVTransferConfig.engine_id in post_init (#18576)
Signed-off-by: Linkun Chen <github@lkchen.net>
|
2025-05-23 02:54:42 +00:00 |
|
Harry Mellor
|
4b0da7b60e
|
Enable hybrid attention models for Transformers backend (#18494)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-23 10:12:08 +08:00 |
|
wangxiyuan
|
721fb9b181
|
[Platform] Move platform check to right place (#18470)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2025-05-22 12:11:28 -07:00 |
|
Kebe
|
5d7f545204
|
[Frontend] deprecate --device arg (#18399)
Signed-off-by: Kebe <mail@kebe7jun.com>
|
2025-05-21 01:21:17 -07:00 |
|
Michael Goin
|
f4a8a37465
|
[Minor] Rename quantization nvfp4 to modelopt_fp4 (#18356)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-05-20 09:08:37 -07:00 |
|
cascade
|
9ab2c02ff8
|
Support sequence parallelism combined with pipeline parallelism (#18243)
Signed-off-by: cascade812 <cascade812@outlook.com>
|
2025-05-17 22:47:25 +00:00 |
|
David Ben-David
|
3e0d435027
|
[P/D][V1] Support dynamic loading of external KV connector implementations (#18142)
Signed-off-by: David Ben-David <davidb@pliops.com>
Co-authored-by: David Ben-David <davidb@pliops.com>
|
2025-05-17 06:40:39 +00:00 |
|
Lucia Fang
|
3d2779c29a
|
[Feature] Support Pipeline Parallism in torchrun SPMD offline inference for V1 (#17827)
Signed-off-by: Lucia Fang <fanglu@fb.com>
|
2025-05-15 22:28:27 -07:00 |
|
Nicolò Lucchesi
|
e3f3aee6f4
|
[Misc] Avoid cuda graph log when sizes still match (#18202)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-05-15 09:59:38 -07:00 |
|
omahs
|
a9944aabfa
|
fix: typos (#18151)
Signed-off-by: omahs <73983677+omahs@users.noreply.github.com>
|
2025-05-15 02:16:15 -07:00 |
|
Luka Govedič
|
83f74c698f
|
[Fix][ROCm] Enforce eager for all encoder-decoder models on ROCm (#18154)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
|
2025-05-14 22:04:43 -07:00 |
|
Aaron Pham
|
2fc9075b82
|
[V1] Structured Outputs + Thinking compatibility (#16577)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
|
2025-05-14 15:45:24 -07:00 |
|
Jon Gill
|
754b699cbe
|
[Bug]: Fix S3 model/tokenizer path resolution (#18083)
Signed-off-by: Jon Gill <jon@yurts.ai>
|
2025-05-13 19:34:17 -07:00 |
|
Nick Hill
|
55aa7af994
|
[V1] DP scale-out (2/N): Decouple engine process management and comms (#15977)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-05-13 10:48:21 -07:00 |
|
Woosuk Kwon
|
2ff297dce9
|
[BugFix] Set default random seed to 0 for V1 (#17929)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-05-13 07:52:19 +00:00 |
|
Tao He
|
60f7624334
|
Implements dual-chunk-flash-attn backend for dual chunk attention with sparse attention support (#11844)
|
2025-05-12 19:52:47 -07:00 |
|
Harry Mellor
|
d67085c2c8
|
Remove noisy warnings from SchedulerConfig (#17995)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-13 00:33:45 +00:00 |
|
bwshen-mi
|
acee8f48aa
|
[Model] Support MiMo-7B inference with MTP (#17433)
Signed-off-by: wp-alpha <wangpeng66@xiaomi.com>
Co-authored-by: wangpeng66 <wangpeng66@xiaomi.com>
|
2025-05-12 23:25:33 +00:00 |
|
Robert Shaw
|
d19110204c
|
[P/D] NIXL Integration (#17751)
Signed-off-by: ApostaC <yihua98@uchicago.edu>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Brent Salisbury <bsalisbu@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: ApostaC <yihua98@uchicago.edu>
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Brent Salisbury <bsalisbu@redhat.com>
|
2025-05-12 09:46:16 -07:00 |
|
Harry Mellor
|
68311891f5
|
Don't default construct ModelConfig when default constructing VllmConfig (#17943)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-10 13:23:00 +00:00 |
|
Harry Mellor
|
4b2ed7926a
|
Improve configs - the rest! (#17562)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-09 15:18:44 -07:00 |
|
Isotr0py
|
5c4c08f6f1
|
[Misc] Auto fallback to float16 for pre-Ampere GPUs when detected bfloat16 config (#17265)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-05-09 17:16:12 +00:00 |
|
inkcherry
|
5b2dcbf0b8
|
Fix Whisper crash caused by invalid`` max_num_batched_tokens`` config (#17853)
Signed-off-by: inkcherry <mingzhi.liu@intel.com>
|
2025-05-09 09:16:26 +00:00 |
|
Chanh Nguyen
|
7ea2adb802
|
[Core] Support full cuda graph in v1 (#16072)
Signed-off-by: Chanh Nguyen <cnguyen@linkedin.com>
Co-authored-by: Chanh Nguyen <cnguyen@linkedin.com>
|
2025-05-07 22:30:15 -07:00 |
|
Akshat Tripathi
|
c20ef40fd0
|
[Hardware][TPU][V1] Multi-LoRA implementation for the V1 TPU backend (#14238)
Signed-off-by: Akshat Tripathi <akshat@krai.ai>
Signed-off-by: Chengji Yao <chengjiyao@google.com>
Co-authored-by: Chengji Yao <chengjiyao@google.com>
|
2025-05-07 16:28:47 -04:00 |
|
Satyajith Chilappagari
|
043e4c4955
|
Add NeuronxDistributedInference support, Speculative Decoding, Dynamic on-device sampling (#16357)
Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>
Co-authored-by: Aaron Dou <yzdou@amazon.com>
Co-authored-by: Shashwat Srijan <sssrijan@amazon.com>
Co-authored-by: Chongming Ni <chongmni@amazon.com>
Co-authored-by: Amulya Ballakur <amulyaab@amazon.com>
Co-authored-by: Patrick Lange <patlange@amazon.com>
Co-authored-by: Elaine Zhao <elaineyz@amazon.com>
Co-authored-by: Lin Lin Pan <tailinpa@amazon.com>
Co-authored-by: Navyadhara Gogineni <navyadha@amazon.com>
Co-authored-by: Yishan McNabb <yishanm@amazon.com>
Co-authored-by: Mrinal Shukla <181322398+mrinalks@users.noreply.github.com>
|
2025-05-07 00:07:30 -07:00 |
|
Jee Jee Li
|
822de7fb94
|
[Misc] Split model loader (#17712)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-05-07 12:42:26 +08:00 |
|
Cyrus Leung
|
2858830c39
|
[Bugfix] Prioritize dtype in root config before checking text config (#17629)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-04 12:43:05 +00:00 |
|
Harry Mellor
|
d6484ef3c3
|
Add full API docs and improve the UX of navigating them (#17485)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-03 19:42:43 -07:00 |
|
Cyrus Leung
|
887d7af882
|
[Core] Gate prompt_embeds behind a feature flag (#17607)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-04 00:19:20 +08:00 |
|
Chenyaaang
|
87baebebd8
|
[Frontend][TPU] Add TPU default max-num-batched-tokens based on device name (#17508)
Signed-off-by: Chenyaaang <chenyangli@google.com>
|
2025-05-02 21:42:44 -07:00 |
|
Harry Mellor
|
785d75a03b
|
Automatically tell users that dict args must be valid JSON in CLI (#17577)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-02 05:24:55 -07:00 |
|
Jerry Zhang
|
109e15a335
|
Add pt_load_map_location to allow loading to cuda (#16869)
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
|
2025-05-01 23:23:42 -07:00 |
|
Chen Xia
|
173daac19d
|
[Bug]change the position of cuda_graph_sizes in dataclasses (#17548)
Signed-off-by: CXIAAAAA <cxia0209@gmail.com>
|
2025-05-01 11:52:37 -07:00 |
|
Cyrus Leung
|
9b1769dd9a
|
[Bugfix] Fix lint error (#17547)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-01 11:12:19 -07:00 |
|
Chen Xia
|
61c299f81f
|
[Misc]add configurable cuda graph size (#17201)
Signed-off-by: CXIAAAAA <cxia0209@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-01 11:04:50 -07:00 |
|
Harry Mellor
|
6768ff4a22
|
Move the last arguments in arg_utils.py to be in their final groups (#17531)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-01 10:31:44 -07:00 |
|
Chauncey
|
98060b001d
|
[Feature][Frontend]: Deprecate --enable-reasoning (#17452)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-05-01 06:46:16 -07:00 |
|