Fabien Dupont | 3c545c0c3b | 2025-06-27 09:04:39 -07:00
[CI/Build] Allow hermetic builds (#18064)
Signed-off-by: Fabien Dupont <fdupont@redhat.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: Fabien Dupont <fabiendupont@pm.me>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Elias Levy <eliaslevy@google.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

Tyler Michael Smith | e8c3bd2cd1 | 2025-06-27 09:01:28 -07:00
[Bugfix] Fix some narrowing conversion warnings (#20141)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

bnellnm | c6c983053d | 2025-06-27 09:42:22 -06:00
[Bugfix] Mark 'hidden_states' as mutable in moe_forward registration. (#20152)
Signed-off-by: Bill Nell <bnell@redhat.com>

Luka Govedič | aafabaa0d5 | 2025-06-27 09:00:42 -06:00
[Fix][torch.compile] Enable custom ops by default when Inductor off (#20102)
Signed-off-by: luka <luka@neuralmagic.com>

Hosang | 94a55c7681 | 2025-06-27 07:14:44 -07:00
[Fix][ROCm] Remove unused variables to fix build error on GFX11/12 (#19891)
Signed-off-by: Hosang Yoon <hosang.yoon@amd.com>

Ilya Lavrenov | aa0dc77ef5 | 2025-06-27 09:16:41 +00:00
[Perf] Improved perf for resolve_chat_template_content_format (#20065)
Signed-off-by: Ilya Lavrenov <ilya.lavrenov@cerebras.net>

Michael Goin | 4ab3ac285e | 2025-06-27 15:30:53 +08:00
[Bugfix] Fix flaky failure when getting DP ports (#20151)
Signed-off-by: mgoin <mgoin64@gmail.com>

Robert Shaw | d1c956dc0f | 2025-06-27 07:16:26 +00:00
Gemma3n (Text-only) (#20134)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.me>
Co-authored-by: Roger Wang <hey@rogerw.me>

Chendi.Xue | dec197e3e5 | 2025-06-27 05:48:13 +00:00
Quick Fix by adding conditional import for flash_attn_varlen_func in flash_attn (#20143)
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>

Yazan Sharaya | 6e244ae091 | 2025-06-27 00:44:14 -04:00
[Perf][Frontend] eliminate api_key and x_request_id headers middleware overhead (#19946)
Signed-off-by: Yazan-Sharaya <yazan.sharaya.yes@gmail.com>

wang.yuqi | cd4cfee689 | 2025-06-26 21:10:04 -07:00
[Model][1/N] Automatic conversion of CrossEncoding model (#20012)
Signed-off-by: wang.yuqi <noooop@126.com>

Thomas Parnell | e110930680 | 2025-06-26 21:06:59 -07:00
[Fix] Fix gemma CI test failing on main (#20124)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>

Yang Wang | 8b64c895c0 | 2025-06-26 20:55:25 -07:00
[CI] Sync test dependency with test.in for torch nightly (#19632)
Signed-off-by: Yang Wang <elainewy@meta.com>
Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Concurrensee <yida.wu@amd.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>

li haoyang | 0740e29b66 | 2025-06-26 20:54:24 -07:00
[Feature] add quick all reduce (#19744)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Signed-off-by: Haoyang Li <Haoyang.Li@amd.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>

Michael Goin | 44d2e6af63 | 2025-06-26 20:50:12 -07:00
[Bugfix] Build moe_data for both sm100 and sm90 (#20086)
Signed-off-by: mgoin <mgoin64@gmail.com>

Ilya Markov | 2d7779f888 | 2025-06-26 20:50:09 -07:00
[Perf] SM100 FP8 GEMM Optimizations after cutlass_profiler (#20071)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>

Dipika Sikka | a57d57fa72 | 2025-06-26 20:50:06 -07:00
[Quantization] Bump to use latest compressed-tensors (#20033)
Signed-off-by: Dipika <dipikasikka1@gmail.com>
Co-authored-by: Kyle Sayers <kylesayrs@gmail.com>

Michael Goin | 71799fd005 | 2025-06-27 11:21:04 +08:00
[CI Failure] Fix OOM with test_oot_registration_embedding (#20144)
Signed-off-by: mgoin <mgoin64@gmail.com>

Bowen Wang | e9fd658a73 | 2025-06-26 15:30:21 -07:00
[Feature] Expert Parallelism Load Balancer (EPLB) (#18343)
Signed-off-by: Bowen Wang <abmfy@icloud.com>

Kyle Yu | 07b8fae219 | 2025-06-26 15:22:12 -07:00
[Doc] correct LoRA capitalization (#20135)
Signed-off-by: kyolebu <kyu@redhat.com>

Wentao Ye | 562308816c | 2025-06-26 22:19:32 +00:00
[Refactor] Rename commnication utils (#20091)
Signed-off-by: yewentao256 <zhyanwentao@126.com>

Chengji Yao | 04e1642e32 | 2025-06-26 10:01:37 -07:00
[TPU] add kv cache update kernel (#19928)
Signed-off-by: Chengji Yao <chengjiyao@google.com>

Kunshang Ji | b69781f107 | 2025-06-26 09:27:18 -07:00
[Hardware][Intel GPU] Add v1 Intel GPU support with Flash attention backend. (#19560)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>

Tyler Michael Smith | 0bceac9810 | 2025-06-26 08:19:46 -07:00
Spam folks if config.py changes (#20131)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

Cyrus Leung | 34878a0b48 | 2025-06-26 08:18:49 -07:00
[Doc] Rename page titles (#20130)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Cyrus Leung | 6393b03986 | 2025-06-26 08:18:36 -07:00
[Doc] Auto sign-off for VSCode (#20132)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

wang.yuqi | 0907d507bf | 2025-06-26 14:34:17 +00:00
[Doc] Automatically signed-off by PyCharm (#20120)
Signed-off-by: wang.yuqi <noooop@126.com>

Wentao Ye | c894c5dc1f | 2025-06-26 22:33:13 +08:00
[Bug Fix] Fix address/port already in use error for deep_ep test (#20094)
Signed-off-by: yewentao256 <zhyanwentao@126.com>

Michael Goin | 1f5d178e9c | 2025-06-26 07:32:22 -07:00
Revert "[Bugfix] default set cuda_graph_sizes to max_num_seqs for v1 engine" (#20128)

TJian | 27c065df50 | 2025-06-26 12:42:31 +00:00
[Bugfix][V1][ROCm] Fix AITER Flash Attention Backend (Fix API Break and Local Attention Logic: affecting Llama4) (#19904)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>

Michael Yao | 84c260caeb | 2025-06-26 10:41:51 +00:00
[Docs] Improve frameworks/helm.md (#20113)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>

Reid | 167aca45cb | 2025-06-26 03:35:16 -07:00
[Misc] Use collapsible blocks for benchmark examples. (#20017)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

Li, Jiang | 0567c8249f | 2025-06-26 03:34:47 -07:00
[CPU] Fix torch version in x86 CPU backend (#19258)
Signed-off-by: jiang1.li <jiang1.li@intel.com>

Wentao Ye | d188913d99 | 2025-06-26 09:16:10 +00:00
[Refactor] Remove unused library (#20099)
Signed-off-by: yewentao256 <zhyanwentao@126.com>

Cyrus Leung | 1d7c29f5fe | 2025-06-26 00:47:06 -07:00
[Doc] Update docs for New Model Implementation (#20115)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Seiji Eicher | 65397e40f5 | 2025-06-26 00:01:57 -07:00
[Bugfix] Allow CUDA_VISIBLE_DEVICES='' in Platform.device_id_to_physical_device_id (#18979)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>

Ekagra Ranjan | 9502c38138 | 2025-06-25 22:06:27 -07:00
[Benchmark][Bug] Fix multiple bugs in bench and add args to spec_decode offline (#20083)

Nicolò Lucchesi | 2582683566 | 2025-06-25 20:04:39 -07:00
[PD] Skip tp_size exchange with rank0 (#19413)
Signed-off-by: NickLucche <nlucches@redhat.com>

Michael Goin | 754b00edb3 | 2025-06-26 01:01:17 +00:00
[Bugfix] Fix Mistral tool-parser regex for nested JSON (#20093)
Signed-off-by: mgoin <mgoin64@gmail.com>

Michael Goin | 296ce95d8e | 2025-06-25 16:23:56 -07:00
[CI] Add SM120 to the Dockerfile (#19794)
Signed-off-by: mgoin <mgoin64@gmail.com>

Chenyaaang | 2d7620c3eb | 2025-06-25 15:51:02 -07:00
[TPU] Add TPU specific var VLLM_TPU_MOST_MODEL_LEN (#19919)
Signed-off-by: Chenyaaang <chenyangli@google.com>

Nick Hill | 55c65ab495 | 2025-06-25 15:19:44 -07:00
[P/D] Avoid stranding blocks in P when aborted in D's waiting queue (#19223)
Signed-off-by: Nick Hill <nhill@redhat.com>

Chengji Yao | 2cc2069970 | 2025-06-25 21:24:10 +00:00
[TPU][Bugfix] fix kv cache padding (#20048)
Signed-off-by: Chengji Yao <chengjiyao@google.com>

zhrrr | 9f0608fc16 | 2025-06-25 21:03:17 +00:00
[Bugfix] default set cuda_graph_sizes to max_num_seqs for v1 engine (#20062)
Signed-off-by: izhuhaoran <izhuhaoran@qq.com>

QiliangCui | 4e0db57fff | 2025-06-25 20:48:17 +00:00
Fix the path to the testing script. (#20082)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>

Nick Hill | c40692bf9a | 2025-06-25 13:38:53 -07:00
[Misc] Add parallel state node_count function (#20045)
Signed-off-by: Nick Hill <nhill@redhat.com>

lkchen | 4734704b30 | 2025-06-25 15:17:45 -04:00
[PD] let toy proxy handle /chat/completions (#19730)
Signed-off-by: Linkun <github@lkchen.net>

Eldar Kurtić | 8b8c209e35 | 2025-06-25 15:08:03 -04:00
static_scaled_fp8_quant should not run when scale.numel is not 1 (#20076)

lsz05 | 23a04e0895 | 2025-06-25 15:07:45 -04:00
[Fix] Support cls pooling in ModernBertPooler (#20067)
Signed-off-by: shengzhe.li <shengzhe.li@sbintuitions.co.jp>

Dipika Sikka | 02c97d9a92 | 2025-06-25 14:28:19 -04:00
[Quantization] Add compressed-tensors emulations support for NVFP4 (#19879)
Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Dipika <dipikasikka1@gmail.com>