Nick Hill
|
657f2f301a
|
[DP] Support external DP Load Balancer mode (#19790)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-07-02 10:21:52 -07:00 |
|
Nick Hill
|
d265414dbc
|
[Minor] Clean up incorrect comment in test (#20382)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-07-02 09:13:37 -07:00 |
|
afeldman-nm
|
48fb076cbc
|
[V1] LogitsProcessor programming model (#16728)
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-07-02 09:10:42 -07:00 |
|
bnellnm
|
c1909e7e8c
|
[Kernels] MoE refactor (#19636)
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Co-authored-by: ElizaWszola <ewszola@redhat.com>
|
2025-07-02 06:08:27 -07:00 |
|
WangHuaqiang
|
ccbfb1d1c9
|
[Bugfix] Fix the max_seq_len limit of 16384 for DeepSeek models (#20322)
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
|
2025-07-02 12:53:36 +00:00 |
|
CSWYF3634076
|
e303dcf523
|
[Model] Add Ernie4.5 and Ernie4.5MoE Model Support (#20220)
Signed-off-by: wangyafeng <wangyafeng@baidu.com>
|
2025-07-02 03:37:01 -07:00 |
|
Kwai-Keye
|
8452946c06
|
[Model][VLM] Support Keye-VL-8B-Preview (#20126)
Signed-off-by: Kwai-Keye <Keye@kuaishou.com>
|
2025-07-01 23:35:04 -07:00 |
|
Chenheli Hua
|
2e7cbf2d7d
|
[Frontend] Support configurable mm placeholder strings & flexible video sampling policies via CLI flags. (#20105)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
|
2025-07-01 23:34:03 -07:00 |
|
Chengji Yao
|
7da296be04
|
[TPU] kv cache update kernel supports dynamic grid (#20235)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
|
2025-07-02 06:33:37 +00:00 |
|
Wentao Ye
|
7058d7dd5d
|
[Refactor] Remove duplicate find_free_port (#20333)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-07-01 19:03:07 -07:00 |
|
Liangliang Ma
|
a0389e0554
|
[UT][intel GPU] use current_platform instead of device hardcode in v1 tests (#20169)
Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
|
2025-07-02 09:06:04 +08:00 |
|
czhu-cohere
|
3abfe22154
|
Enable group size 64 for Machete (#20290)
Signed-off-by: czhu-cohere <conway.zhu@cohere.com>
|
2025-07-01 18:05:44 -07:00 |
|
Wentao Ye
|
e81fbefe8a
|
[Refactor] Refactor import utils (#20269)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-07-01 18:05:42 -07:00 |
|
Woosuk Kwon
|
7f280d69c9
|
[Optimization] Cache sampled token ids in model runner (#20291)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-07-01 11:01:31 -07:00 |
|
aiyiwang2025
|
ecad851cbd
|
[Model]Add Tencent HunYuanMoEV1 Model Support (#20114)
Signed-off-by: aiyiwang <aiyiwang@tencent.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: quinnrong <quinnrong@tencent.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-07-01 07:28:13 -07:00 |
|
Yuxuan Zhang
|
ed70f3c64f
|
Add GLM4.1V model (Draft) (#19331)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-07-01 12:48:26 +00:00 |
|
Kyle Sayers
|
9025a9a705
|
[Quant] [Bugfix] Fix quantization config matching with hf_to_vllm_mapper (#20046)
|
2025-07-01 19:20:34 +09:00 |
|
Lionel Villard
|
c05596f1a3
|
[Perf] Validate @config in pre-commit instead of dynamically (#20200)
Signed-off-by: Lionel Villard <villard@us.ibm.com>
|
2025-07-01 05:10:28 -04:00 |
|
TY-AMD
|
96453cfa83
|
[BugFix][V1][ROCm] Triton MLA uses V0 backend on V1 engine (#19067)
Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com>
|
2025-07-01 16:12:19 +08:00 |
|
Varun Sundar Rabindranath
|
08d81f1014
|
[Bugfix] Fix deepep tests (#20288)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-07-01 15:29:08 +08:00 |
|
Li, Jiang
|
6cc1e7d96d
|
[CPU] Update custom ops for the CPU backend (#20255)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-07-01 07:25:03 +00:00 |
|
czhu-cohere
|
9909726d2a
|
Enable ZP Support for Machete (#20268)
Signed-off-by: czhu-cohere <conway.zhu@cohere.com>
|
2025-07-01 07:12:20 +00:00 |
|
Alex Kogan
|
27949354fa
|
[Feature] A calibration-free RTN-based quantization for accurate and accelerated INT4/INT8 inference (#18768)
Signed-off-by: Alex Kogan <alex.kogan@oracle.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-07-01 05:44:38 +00:00 |
|
fyuan1316
|
e28533a16f
|
[Bugfix] Fix include prompt in stream response when echo=true (#15233)
Signed-off-by: Yuan Fang <yuanfang@alauda.io>
|
2025-07-01 01:30:14 +00:00 |
|
Luka Govedič
|
6d42ce8315
|
[CLI] Improve CLI arg parsing for -O/--compilation-config (#20156)
Signed-off-by: luka <luka@neuralmagic.com>
|
2025-07-01 01:03:13 +00:00 |
|
Kyle Sayers
|
d8cf819a9a
|
[Core] [Bugfix] [Multimodal] Fix multimodal profiling and generation for SFT/PTQed models (#20058)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2025-06-30 17:26:49 +00:00 |
|
Wentao Ye
|
551ef1631a
|
[Unit Test] Add unit test for deep gemm (#20090)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-06-30 10:26:42 -06:00 |
|
Woosuk Kwon
|
2863befce3
|
[Optimization] Use Shared CachedRequestData Instance Across All Requests (#20232)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-06-30 09:07:50 -07:00 |
|
redmoe-moutain
|
65b1cbb138
|
[Model] support dots1 (#18254)
Signed-off-by: redmoe-moutain <agiredmoe@gmail.com>
|
2025-06-29 19:34:36 -07:00 |
|
Dipika Sikka
|
6f2f53a82d
|
[Quantization] Add compressed-tensors NVFP4 MoE Support (#19990)
Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Dipika <dipikasikka1@gmail.com>
|
2025-06-29 22:05:40 +00:00 |
|
Michael Goin
|
7b1895e6ce
|
[CI Fix] Try fixing eagle e2e test OOM by reducing block allocation (#20213)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-06-29 10:31:37 +08:00 |
|
Wentao Ye
|
4d36693687
|
[Refactor] Create a function util and cache the results for has_deepgemm, has_deepep, has_pplx (#20187)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-06-28 22:06:38 +00:00 |
|
Stan Wozniak
|
daec9dea6e
|
[Bugfix] Correct behavior of GraniteMoeHybrid for TensorParallel execution (#20137)
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
|
2025-06-28 08:16:41 -07:00 |
|
Thomas Parnell
|
8615d9776f
|
[CI/Build] Add new CI job to validate Hybrid Models for every PR (#20147)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-06-27 23:00:25 -07:00 |
|
Chales Xu
|
e53be6f00a
|
[Misc] Add type assertion of request_id for LLMEngine.add_request (#19700)
Signed-off-by: n2ptr <xuzhanchaomail@163.com>
|
2025-06-27 22:47:36 -07:00 |
|
Michael Goin
|
c329ceca6d
|
[CI Fix] Pin tests/models/registry.py MiniMaxText01ForCausalLM to revision due to model changes (#20199)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-06-28 13:43:06 +08:00 |
|
Luka Govedič
|
aafabaa0d5
|
[Fix][torch.compile] Enable custom ops by default when Inductor off (#20102)
Signed-off-by: luka <luka@neuralmagic.com>
|
2025-06-27 09:00:42 -06:00 |
|
Robert Shaw
|
d1c956dc0f
|
Gemma3n (Text-only) (#20134)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.me>
Co-authored-by: Roger Wang <hey@rogerw.me>
|
2025-06-27 07:16:26 +00:00 |
|
Yazan Sharaya
|
6e244ae091
|
[Perf][Frontend] eliminate api_key and x_request_id headers middleware overhead (#19946)
Signed-off-by: Yazan-Sharaya <yazan.sharaya.yes@gmail.com>
|
2025-06-27 00:44:14 -04:00 |
|
wang.yuqi
|
cd4cfee689
|
[Model][1/N] Automatic conversion of CrossEncoding model (#20012)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-06-26 21:10:04 -07:00 |
|
Thomas Parnell
|
e110930680
|
[Fix] Fix gemma CI test failing on main (#20124)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-06-26 21:06:59 -07:00 |
|
Yang Wang
|
8b64c895c0
|
[CI] Sync test dependency with test.in for torch nightly (#19632)
Signed-off-by: Yang Wang <elainewy@meta.com>
Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Concurrensee <yida.wu@amd.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-06-26 20:55:25 -07:00 |
|
li haoyang
|
0740e29b66
|
[Feature] add quick all reduce (#19744)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Signed-off-by: Haoyang Li <Haoyang.Li@amd.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
|
2025-06-26 20:54:24 -07:00 |
|
Michael Goin
|
71799fd005
|
[CI Failure] Fix OOM with test_oot_registration_embedding (#20144)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-06-27 11:21:04 +08:00 |
|
Bowen Wang
|
e9fd658a73
|
[Feature] Expert Parallelism Load Balancer (EPLB) (#18343)
Signed-off-by: Bowen Wang <abmfy@icloud.com>
|
2025-06-26 15:30:21 -07:00 |
|
Wentao Ye
|
562308816c
|
[Refactor] Rename commnication utils (#20091)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-06-26 22:19:32 +00:00 |
|
Chengji Yao
|
04e1642e32
|
[TPU] add kv cache update kernel (#19928)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
|
2025-06-26 10:01:37 -07:00 |
|
Wentao Ye
|
c894c5dc1f
|
[Bug Fix] Fix address/port already in use error for deep_ep test (#20094)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-06-26 22:33:13 +08:00 |
|
Li, Jiang
|
0567c8249f
|
[CPU] Fix torch version in x86 CPU backend (#19258)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-06-26 03:34:47 -07:00 |
|
Seiji Eicher
|
65397e40f5
|
[Bugfix] Allow CUDA_VISIBLE_DEVICES='' in Platform.device_id_to_physical_device_id (#18979)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
|
2025-06-26 00:01:57 -07:00 |
|