Yiming
|
cd587c93ef
|
[BugFix]: Properly set engine_id when using multi connector (#19487)
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: leiyiming <leiyiming@kingsoft.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-07-09 20:32:44 +00:00 |
|
fxmarty-amd
|
332d4cb17b
|
[Feature][Quantization] MXFP4 support for MOE models (#17888)
Signed-off-by: Felix Marty <felmarty@amd.com>
Signed-off-by: Bowen Bao <bowenbao@amd.com>
Signed-off-by: Felix Marty <Felix.Marty@amd.com>
Co-authored-by: Bowen Bao <bowenbao@amd.com>
|
2025-07-09 13:19:02 -07:00 |
|
Tuan, Hoang-Trong
|
47043eb678
|
[Kernel] Triton implementation of causal-conv1d for Mamba-based models (#18218)
Signed-off-by: Tuan M. Hoang-Trong <tmhoangt@us.ibm.com>
Co-authored-by: Tuan M. Hoang-Trong <tmhoangt@us.ibm.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-07-09 12:53:55 -07:00 |
|
Li, Jiang
|
e59ba9e142
|
[CI/Build] Enlarge tolerance for a CPU multi-modal test (#20684)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-07-09 17:48:52 +00:00 |
|
Liangliang Ma
|
a3e4e85ece
|
[XPU][CI] enhance xpu test support (#20652)
Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
Co-authored-by: zhenwei-intel <zhenweiliu@habana.ai>
|
2025-07-09 16:53:09 +00:00 |
|
Chengji Yao
|
eb58f5953d
|
[TPU][Bugfix] fix test_pallas (#20666)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
|
2025-07-09 09:32:48 -07:00 |
|
Sanger Steel
|
4ac9c33f78
|
[Bugfix] Fix handling of Tensorizer arguments for LoadConfig (#20643)
Signed-off-by: Sanger Steel <sangersteel@gmail.com>
|
2025-07-09 15:36:37 +00:00 |
|
Chauncey
|
2155e95ef1
|
[Bugfix] Fix the issue where reasoning_content is None when Thinkng is enabled and tool_choice is set to 'required'. (#20662)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-07-09 07:39:58 +00:00 |
|
Dmitry Rogozhkin
|
e760fcef22
|
[XPU] Use spawn with XPU multiprocessing (#20649)
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
|
2025-07-09 00:34:28 -07:00 |
|
B-201
|
6bbf1795b7
|
[Misc] Fix the size of batched_dummy_mm_inputs in profile_run (#20434)
Signed-off-by: bk-201 <joy25810@foxmail.com>
|
2025-07-08 20:15:44 -07:00 |
|
kourosh hakhamaneshi
|
baed180aa0
|
[tech debt] Revisit lora request model checker (#20636)
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
|
2025-07-09 09:42:41 +08:00 |
|
QiliangCui
|
d8ee5a2ca4
|
[TPU][Bugfix] disable phi-3 test (#20632)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
|
2025-07-08 23:14:26 +00:00 |
|
wang.yuqi
|
baba0389f7
|
[CI] Increase the threshold of the MTEB RERANK tests (#20615)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-07-08 08:10:11 -07:00 |
|
Nicolò Lucchesi
|
71d1d75b7a
|
[PD][Nixl] Remote consumer READ timeout for clearing request blocks (#20139)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-07-08 08:56:40 +01:00 |
|
Sanger Steel
|
72d14d0eed
|
[Frontend] [Core] Integrate Tensorizer in to S3 loading machinery, allow passing arbitrary arguments during save/load (#19619)
Signed-off-by: Sanger Steel <sangersteel@gmail.com>
Co-authored-by: Eta <esyra@coreweave.com>
|
2025-07-07 22:47:43 -07:00 |
|
Li, Jiang
|
7721ef1786
|
[CI/Build][CPU] Fix CPU CI and remove all CPU V0 files (#20560)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-07-07 22:13:44 -07:00 |
|
Chauncey
|
93b9d9f499
|
[Bugfix]: Fix messy code when using logprobs (#19209)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-07-08 11:02:15 +08:00 |
|
Ming Yang
|
afb7cff1b9
|
[Bugfix] Fix Maverick correctness by filling zero to cache space in cutlass_moe (#20167)
Signed-off-by: Ming Yang <yming@meta.com>
|
2025-07-08 01:07:22 +00:00 |
|
Anton
|
e601efcb10
|
[Misc] Add fully interleaved support for multimodal 'string' content format (#14047)
Signed-off-by: drobyshev.anton <drobyshev.anton@wb.ru>
Co-authored-by: drobyshev.anton <drobyshev.anton@wb.ru>
|
2025-07-07 19:43:08 +00:00 |
|
ztang2370
|
a37d75bbec
|
[Front-end] microbatch tokenization (#19334)
Signed-off-by: zt2370 <ztang2370@gmail.com>
|
2025-07-07 17:54:10 +01:00 |
|
wang.yuqi
|
110df74332
|
[Model][Last/4] Automatic conversion of CrossEncoding model (#19675)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-07-07 14:46:04 +00:00 |
|
Jee Jee Li
|
2e610deb72
|
[CI/Build] Enable phi2 lora test (#20540)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-07-07 05:10:41 +00:00 |
|
Woosuk Kwon
|
462b269280
|
Implement OpenAI Responses API [1/N] (#20504)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-07-06 18:32:13 -07:00 |
|
Cyrus Leung
|
9fb52e523a
|
[V1] Support any head size for FlexAttention backend (#20467)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-06 09:54:36 -07:00 |
|
Woosuk Kwon
|
e202dd2736
|
[V0 deprecation] Remove V0 CPU/XPU/TPU backends (#20412)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
|
2025-07-06 08:48:13 -07:00 |
|
Flora Feng
|
fe1e924811
|
[Frontend] Support image object in llm.chat (#19635)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
Signed-off-by: Flora Feng <4florafeng@gmail.com>
|
2025-07-06 06:47:13 +00:00 |
|
Vadim Gimpelson
|
f73d02aadc
|
[BUG] Fix #20484. Support empty sequence in cuda penalty kernel (#20491)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@centml.ai>
|
2025-07-05 19:38:02 -07:00 |
|
Jeremy Reizenstein
|
c5ebe040ac
|
test_attention compat with coming xformers change (#20487)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-07-05 19:37:59 -07:00 |
|
Isotr0py
|
32c9be2200
|
[v1] Re-add fp32 support to v1 engine through FlexAttention (#19754)
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-07-05 09:41:10 +00:00 |
|
Jee Jee Li
|
906e05d840
|
[Misc] Remove the unused LoRA test code (#20494)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-07-05 13:48:16 +08:00 |
|
Michael Goin
|
c108781c85
|
[CI Bugfix] Fix pre-commit failures on main (#20502)
|
2025-07-04 14:17:30 -07:00 |
|
Duncan Moss
|
3d184b95b8
|
[feat]: CUTLASS block scaled group gemm for SM100 (#19757)
Signed-off-by: Duncan Moss <djm.moss@gmail.com>
Co-authored-by: Duncan Moss <dmoss@nvidia.com>
|
2025-07-04 12:58:04 -06:00 |
|
Thomas Parnell
|
2f35a022e6
|
Enable V1 for Hybrid SSM/Attention Models (#20016)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
|
2025-07-04 17:46:53 +00:00 |
|
wang.yuqi
|
2e26f9156a
|
[Model][3/N] Automatic conversion of CrossEncoding model (#20168)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-07-04 05:47:39 -07:00 |
|
sangbumlikeagod
|
9e5452ee34
|
[Bug][Frontend] Fix structure of transcription's decoder_prompt (#18809)
Signed-off-by: sangbumlikeagod <oironese@naver.com>
|
2025-07-04 11:28:07 +00:00 |
|
Jee Jee Li
|
1caca5a589
|
[Misc] Add SPDX-FileCopyrightText (#20428)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-07-04 07:40:42 +00:00 |
|
Aaron Pham
|
4a98edff1f
|
[Structured Outputs][V1] Skipping with models doesn't contain tokenizers (#20365)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-07-04 15:05:49 +08:00 |
|
Seiji Eicher
|
8d1096e7db
|
[Bugfix] Register reducer even if transformers_modules not available (#19510)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
|
2025-07-03 22:08:12 +00:00 |
|
bnellnm
|
78fe77534b
|
[Kernel] Enable fp8 support for pplx and BatchedTritonExperts. (#18864)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-07-03 14:55:40 -07:00 |
|
Ning Xie
|
1dba2c4ebe
|
[Misc] adjust for ipv6 for mookcacke url parse (#20107)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-07-03 20:27:17 +00:00 |
|
wang.yuqi
|
6f1229f91d
|
[Model][2/N] Automatic conversion of CrossEncoding model (#19978)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-07-03 13:59:23 +00:00 |
|
Cyrus Leung
|
b024a42e93
|
[Core] Move multimodal placeholder from chat utils to model definition (#20355)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-03 08:18:30 +00:00 |
|
Nick Hill
|
67d25eca05
|
[Tests] Update online DP tests to verify that requests are balanced (#20157)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-07-03 14:49:13 +08:00 |
|
qscqesze
|
363528de27
|
[Feature] Support MiniMax-M1 function calls features (#20297)
Signed-off-by: QscQ <qscqesze@gmail.com>
Signed-off-by: qingjun <qingjun@minimaxi.com>
|
2025-07-03 06:48:27 +00:00 |
|
Chenheli Hua
|
b616f6a53d
|
[Misc] Small: Fix video loader return type annotations. (#20389)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
|
2025-07-03 03:10:39 +00:00 |
|
Nick Hill
|
657f2f301a
|
[DP] Support external DP Load Balancer mode (#19790)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-07-02 10:21:52 -07:00 |
|
Nick Hill
|
d265414dbc
|
[Minor] Clean up incorrect comment in test (#20382)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-07-02 09:13:37 -07:00 |
|
afeldman-nm
|
48fb076cbc
|
[V1] LogitsProcessor programming model (#16728)
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-07-02 09:10:42 -07:00 |
|
bnellnm
|
c1909e7e8c
|
[Kernels] MoE refactor (#19636)
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Co-authored-by: ElizaWszola <ewszola@redhat.com>
|
2025-07-02 06:08:27 -07:00 |
|
WangHuaqiang
|
ccbfb1d1c9
|
[Bugfix] Fix the max_seq_len limit of 16384 for DeepSeek models (#20322)
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
|
2025-07-02 12:53:36 +00:00 |
|