fyuan1316
|
e28533a16f
|
[Bugfix] Fix include prompt in stream response when echo=true (#15233)
Signed-off-by: Yuan Fang <yuanfang@alauda.io>
|
2025-07-01 01:30:14 +00:00 |
|
Luka Govedič
|
6d42ce8315
|
[CLI] Improve CLI arg parsing for -O/--compilation-config (#20156)
Signed-off-by: luka <luka@neuralmagic.com>
|
2025-07-01 01:03:13 +00:00 |
|
Zhonghua Deng
|
ded1fb635b
|
[Bugfix][V1][P/D] Fix the issue of occasional garbled output for P2pNcclConnector (#20263)
Signed-off-by: Abatom <abzhonghua@gmail.com>
|
2025-06-30 16:45:14 -07:00 |
|
Wentao Ye
|
97d9524fe9
|
[Refactor] Remove useless pdb comment (#20266)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-06-30 18:15:24 +00:00 |
|
Kyle Sayers
|
d8cf819a9a
|
[Core] [Bugfix] [Multimodal] Fix multimodal profiling and generation for SFT/PTQed models (#20058)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2025-06-30 17:26:49 +00:00 |
|
Wentao Ye
|
551ef1631a
|
[Unit Test] Add unit test for deep gemm (#20090)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-06-30 10:26:42 -06:00 |
|
Woosuk Kwon
|
2863befce3
|
[Optimization] Use Shared CachedRequestData Instance Across All Requests (#20232)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-06-30 09:07:50 -07:00 |
|
Woosuk Kwon
|
2965c99c86
|
[Spec Decode] Clean up spec decode example (#20240)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-06-30 08:28:13 -07:00 |
|
Woosuk Kwon
|
2062c0723d
|
[Spec Decode] Refactor spec decoding into a separate function (#20238)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-06-30 08:13:50 -07:00 |
|
li haoyang
|
1c50e100a9
|
[Bugfix] fix quark ptpc (#20251)
Signed-off-by: Haoyang Li <Haoyang.Li@amd.com>
Co-authored-by: Haoyang Li <307790822@qq.com>
|
2025-06-30 22:24:50 +09:00 |
|
Michael Yao
|
3ee56e26be
|
[Docs] Fix 1-2-3 list in v1/prefix_caching.md (#20243)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2025-06-30 11:20:51 +00:00 |
|
Jee Jee Li
|
8fe7fc8634
|
[Quantization] Improve BitsAndBytesModelLoader (#20242)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-06-30 18:22:09 +08:00 |
|
Isotr0py
|
e936e401de
|
[Bugfix] Fix processor initialization in transformers 4.53.0 (#20244)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-06-30 10:16:16 +00:00 |
|
noiji
|
f5dfa07531
|
[Bugfix] Skip loading extra parameters for modelopt Qwen3 MoE model (#19598)
Signed-off-by: noiji <>
|
2025-06-30 18:21:56 +09:00 |
|
Reid
|
022c58b80f
|
[doc] Add Slack and Forum to the top navigation (#20208)
Signed-off-by: reidliu41 <reid201711@gmail.com>
|
2025-06-30 07:53:45 +00:00 |
|
Woosuk Kwon
|
19108ef311
|
[Misc] Fix import (#20233)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-06-29 20:34:54 -07:00 |
|
Chendi.Xue
|
5a52f389dd
|
[BUGFIX][DEEPSEEK][MODEL_LOAD] fix w13, w2 weight not initialized assert (#20202)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
|
2025-06-29 19:46:19 -07:00 |
|
redmoe-moutain
|
65b1cbb138
|
[Model] support dots1 (#18254)
Signed-off-by: redmoe-moutain <agiredmoe@gmail.com>
|
2025-06-29 19:34:36 -07:00 |
|
Huy Do
|
6c9837a761
|
Fix cuda_archs_loose_intersection when handling sm_*a (#20207)
Signed-off-by: Huy Do <huydhn@gmail.com>
|
2025-06-29 16:52:34 -07:00 |
|
Dipika Sikka
|
6f2f53a82d
|
[Quantization] Add compressed-tensors NVFP4 MoE Support (#19990)
Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Dipika <dipikasikka1@gmail.com>
|
2025-06-29 22:05:40 +00:00 |
|
Michael Goin
|
7b1895e6ce
|
[CI Fix] Try fixing eagle e2e test OOM by reducing block allocation (#20213)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-06-29 10:31:37 +08:00 |
|
Wentao Ye
|
4d36693687
|
[Refactor] Create a function util and cache the results for has_deepgemm, has_deepep, has_pplx (#20187)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-06-28 22:06:38 +00:00 |
|
Stan Wozniak
|
daec9dea6e
|
[Bugfix] Correct behavior of GraniteMoeHybrid for TensorParallel execution (#20137)
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
|
2025-06-28 08:16:41 -07:00 |
|
Nicolò Lucchesi
|
daceac57c7
|
[Frontend] Generalize v1/audio/transcriptions endpoint (#20179)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-06-28 08:15:26 -07:00 |
|
Thomas Parnell
|
8615d9776f
|
[CI/Build] Add new CI job to validate Hybrid Models for every PR (#20147)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-06-27 23:00:25 -07:00 |
|
Jiayi Yan
|
7b460c25f9
|
[BugFix] Fix the incorrect func name in the comments. (config.py) (#20185)
|
2025-06-27 22:51:16 -07:00 |
|
Michael Goin
|
f719772281
|
[Bugfix] Properly reject requests with empty list guided_choice (#20195)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-06-27 22:50:52 -07:00 |
|
Wentao Ye
|
d45417b804
|
fix ci issue distributed 4 gpu test (#20204)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-06-27 22:50:00 -07:00 |
|
Michael Goin
|
a29e62ea34
|
Fix num_token_padding support for static per-tensor scaled_fp8_quant (#20188)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-06-27 22:48:13 -07:00 |
|
Chales Xu
|
e53be6f00a
|
[Misc] Add type assertion of request_id for LLMEngine.add_request (#19700)
Signed-off-by: n2ptr <xuzhanchaomail@163.com>
|
2025-06-27 22:47:36 -07:00 |
|
Michael Goin
|
c329ceca6d
|
[CI Fix] Pin tests/models/registry.py MiniMaxText01ForCausalLM to revision due to model changes (#20199)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-06-28 13:43:06 +08:00 |
|
Fabien Dupont
|
3c545c0c3b
|
[CI/Build] Allow hermetic builds (#18064)
Signed-off-by: Fabien Dupont <fdupont@redhat.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: Fabien Dupont <fabiendupont@pm.me>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Elias Levy <eliaslevy@google.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-06-27 09:04:39 -07:00 |
|
Tyler Michael Smith
|
e8c3bd2cd1
|
[Bugfix] Fix some narrowing conversion warnings (#20141)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-06-27 09:01:28 -07:00 |
|
bnellnm
|
c6c983053d
|
[Bugfix] Mark 'hidden_states' as mutable in moe_forward registration. (#20152)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-06-27 09:42:22 -06:00 |
|
Luka Govedič
|
aafabaa0d5
|
[Fix][torch.compile] Enable custom ops by default when Inductor off (#20102)
Signed-off-by: luka <luka@neuralmagic.com>
|
2025-06-27 09:00:42 -06:00 |
|
Hosang
|
94a55c7681
|
[Fix][ROCm] Remove unused variables to fix build error on GFX11/12 (#19891)
Signed-off-by: Hosang Yoon <hosang.yoon@amd.com>
|
2025-06-27 07:14:44 -07:00 |
|
Ilya Lavrenov
|
aa0dc77ef5
|
[Perf] Improved perf for resolve_chat_template_content_format (#20065)
Signed-off-by: Ilya Lavrenov <ilya.lavrenov@cerebras.net>
|
2025-06-27 09:16:41 +00:00 |
|
Michael Goin
|
4ab3ac285e
|
[Bugfix] Fix flaky failure when getting DP ports (#20151)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-06-27 15:30:53 +08:00 |
|
Robert Shaw
|
d1c956dc0f
|
Gemma3n (Text-only) (#20134)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.me>
Co-authored-by: Roger Wang <hey@rogerw.me>
|
2025-06-27 07:16:26 +00:00 |
|
Chendi.Xue
|
dec197e3e5
|
Quick Fix by adding conditional import for flash_attn_varlen_func in flash_attn (#20143)
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
|
2025-06-27 05:48:13 +00:00 |
|
Yazan Sharaya
|
6e244ae091
|
[Perf][Frontend] eliminate api_key and x_request_id headers middleware overhead (#19946)
Signed-off-by: Yazan-Sharaya <yazan.sharaya.yes@gmail.com>
|
2025-06-27 00:44:14 -04:00 |
|
wang.yuqi
|
cd4cfee689
|
[Model][1/N] Automatic conversion of CrossEncoding model (#20012)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-06-26 21:10:04 -07:00 |
|
Thomas Parnell
|
e110930680
|
[Fix] Fix gemma CI test failing on main (#20124)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-06-26 21:06:59 -07:00 |
|
Yang Wang
|
8b64c895c0
|
[CI] Sync test dependency with test.in for torch nightly (#19632)
Signed-off-by: Yang Wang <elainewy@meta.com>
Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Concurrensee <yida.wu@amd.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-06-26 20:55:25 -07:00 |
|
li haoyang
|
0740e29b66
|
[Feature] add quick all reduce (#19744)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Signed-off-by: Haoyang Li <Haoyang.Li@amd.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
|
2025-06-26 20:54:24 -07:00 |
|
Michael Goin
|
44d2e6af63
|
[Bugfix] Build moe_data for both sm100 and sm90 (#20086)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-06-26 20:50:12 -07:00 |
|
Ilya Markov
|
2d7779f888
|
[Perf] SM100 FP8 GEMM Optimizations after cutlass_profiler (#20071)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
|
2025-06-26 20:50:09 -07:00 |
|
Dipika Sikka
|
a57d57fa72
|
[Quantization] Bump to use latest compressed-tensors (#20033)
Signed-off-by: Dipika <dipikasikka1@gmail.com>
Co-authored-by: Kyle Sayers <kylesayrs@gmail.com>
|
2025-06-26 20:50:06 -07:00 |
|
Michael Goin
|
71799fd005
|
[CI Failure] Fix OOM with test_oot_registration_embedding (#20144)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-06-27 11:21:04 +08:00 |
|
Bowen Wang
|
e9fd658a73
|
[Feature] Expert Parallelism Load Balancer (EPLB) (#18343)
Signed-off-by: Bowen Wang <abmfy@icloud.com>
|
2025-06-26 15:30:21 -07:00 |
|