Michael Goin
|
d46d417b58
|
[CI Perf] Only test bfloat16 for tests/compile/test_fusion_all_reduce.py (#23132)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-19 20:18:52 -06:00 |
|
633WHU
|
0167efe20d
|
[Core] Optimize scheduler request removal for single completions (#21917)
Signed-off-by: chiliu <chiliu@paypal.com>
Signed-off-by: chiliu <cliu_whu@yeah.net>
Co-authored-by: chiliu <chiliu@paypal.com>
|
2025-08-19 18:25:59 -07:00 |
|
Kyle Sayers
|
c32e6ad1f6
|
[Quantization] Bump Compressed Tensors Version (#23202)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-08-20 00:39:28 +00:00 |
|
Chenheli Hua
|
1630cc8d0f
|
[Benchmarks] Add video inputs to ShareGPTDataset. (#23199)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
|
2025-08-19 23:42:31 +00:00 |
|
Lucas Wilkinson
|
14e2b0730b
|
[BugFix] fix CUTLASS MLA full cudagraph (#23200)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-08-19 22:17:08 +00:00 |
|
Michael Goin
|
0f4f0191d8
|
[CI/Build] Replace lm-eval gsm8k tests with faster implementation (#23002)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-19 15:07:30 -07:00 |
|
amirkl94
|
a38b8af4c3
|
[NVIDIA] Add SM100 Flashinfer Cutlass MoE fp8 backend (#22357)
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>
|
2025-08-19 18:01:53 -04:00 |
|
Michael Goin
|
21dce80ea9
|
[CI/Build] Add support for Python 3.13 (#13164)
Signed-off-by: mgoin <michael@neuralmagic.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-08-19 13:49:34 -07:00 |
|
Woosuk Kwon
|
e61bac87ee
|
[Misc] Minor refactoring for FlashInfer backend (#23147)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-19 13:11:51 -07:00 |
|
Marko Rosenmueller
|
80141bbf2f
|
fix: use cache_salt for gpt-oss (#23186)
Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com>
|
2025-08-19 18:12:25 +00:00 |
|
bnellnm
|
b94faf9d50
|
[Bugfix] Fix accuracy issue when using flashinfer cutlass moe, TP=1 and modelopt. (#23125)
Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-08-19 14:00:51 -04:00 |
|
Woosuk Kwon
|
5b5f350d67
|
[Misc] Enable yapf for FlashInfer backend (#23193)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-19 10:33:47 -07:00 |
|
22quinn
|
f7cf5b512e
|
[Frontend] Add /collective_rpc API endpoint (#23075)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-08-19 17:29:32 +00:00 |
|
Ruixiang Tan
|
03d4235fd2
|
[Misc] Fix the benchmark's README and improve the error messages for the benchmark's argument checks (#22654)
Signed-off-by: tanruixiang <tanruixiang0104@gmail.com>
|
2025-08-19 10:18:51 -07:00 |
|
Isotr0py
|
d6a1a20973
|
[CI/Build] Update transformers to v4.55.2 (#23093)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-08-19 10:06:17 -07:00 |
|
Benji Beck
|
a70d0bd0a3
|
Migrate LlavaOnevisionMultiInputs to TensorSchema (#21844)
Signed-off-by: Benji Beck <benjibeck@meta.com>
|
2025-08-19 17:02:02 +00:00 |
|
Yuge Zhang
|
24f4d1a224
|
Add return_token_ids parameter to OpenAI API endpoints (#22587)
Signed-off-by: Yuge Zhang <scottyugochang@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2025-08-19 09:48:31 -07:00 |
|
yiz-liu
|
4f510bc2a1
|
[Model] Removes redundant all-reduce operation in Qwen3MoeSparseMoeBlock (#23169)
Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
|
2025-08-19 16:18:41 +00:00 |
|
TJian
|
1298c67795
|
[FEAT] [Performance] Enable DP for ViT in Qwen2.5VL (#22742)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-08-19 15:25:57 +00:00 |
|
Jee Jee Li
|
4d9c61993a
|
[Bugfix] Fix benchmark_moe.py (#23177)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-08-19 13:39:40 +00:00 |
|
myselvess
|
b87cb97a53
|
[Model] support new model ovis2.5 (#23084)
Signed-off-by: myselvess <244285088@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-08-19 13:12:59 +00:00 |
|
wang.yuqi
|
f856c33ce9
|
[Model] Add multi_label_classification support (#23173)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-08-19 12:54:30 +00:00 |
|
elvischenv
|
03752dba8f
|
[NVIDIA] Support Flashinfer TRTLLM FP8-q/kv/out Attention Kernel (#21716)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-08-19 08:22:15 -04:00 |
|
Woosuk Kwon
|
40f26734b9
|
[Misc] Fix seq_lens for graph capture (#23175)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-19 03:58:16 -07:00 |
|
Tialo
|
2c3f557f08
|
[Doc] use power of 2 (#23172)
|
2025-08-19 03:16:23 -07:00 |
|
Woosuk Kwon
|
21bcc8263f
|
[Misc] Avoid accessing req_ids inside a loop (#23159)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-19 09:39:38 +00:00 |
|
qizixi
|
5bfe0dea7a
|
[bug fix] Fix llama4 spec decoding (#22691)
Signed-off-by: qizixi <qizixi@meta.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
|
2025-08-19 08:53:24 +00:00 |
|
Isotr0py
|
31fd3265c8
|
[Bugfix] Fix broken Minimax-01-VL model (#22116)
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-08-19 08:49:29 +00:00 |
|
hustxiayang
|
31436e8b4f
|
[Misc] Add request_id into benchmark_serve.py (#23065)
Signed-off-by: yangxia <yangxiast@gmail.com>
|
2025-08-19 08:32:18 +00:00 |
|
qizixi
|
4efd43e9b4
|
Fix GLM-4.5V-FP8 numerical issue (#22949)
Signed-off-by: qizixi <qizixi@meta.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-08-19 07:56:31 +00:00 |
|
Daniel Serebrenik
|
3c8a787247
|
[Benchmark] Add flag --served-model-name to benchmark_serving_multi_turn (#22889)
Signed-off-by: daniels <daniels@pliops.com>
|
2025-08-19 07:48:07 +00:00 |
|
Grace Ho
|
01a08739e0
|
[misc] split engine_model into json file for nsys profile tool (#23117)
Signed-off-by: Grace Ho <grho@nvidia.com>
Signed-off-by: Grace Ho <146482179+gracehonv@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-08-19 15:44:53 +08:00 |
|
Jiangyun Zhu
|
fda9537c5e
|
[Model] Support Pipeline Parallelism for moonshotai/Kimi-VL-A3B-Thinking-2506 (#23114)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-08-19 14:24:31 +08:00 |
|
Wentao Ye
|
90bbe0a5ad
|
[Log] Warning Once for Cutlass MLA (#23137)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-08-18 23:24:16 -07:00 |
|
Benji Beck
|
e75f342261
|
Migrate InternVLImagePixelInputs (in nemotron_vl.py) to TensorSchema (#22023)
Signed-off-by: Benji Beck <benjibeck@meta.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-08-19 13:48:26 +08:00 |
|
Nikhil Suryawanshi
|
78dba404ad
|
[Hardware][IBM Z]Enable v1 for s390x and s390x dockerfile fixes (#22725)
Signed-off-by: Nikhil Suryawanshi <suryawanshin74@gmail.com>
|
2025-08-19 04:40:37 +00:00 |
|
Chengji Yao
|
e9d6a3db69
|
[TPU] make ptxla not imported when using tpu_commons (#23081)
Signed-off-by: Chengji Yao <chengjiyao@gmail.com>
Signed-off-by: Chengji Yao <chengjiyao@google.com>
Co-authored-by: Chengji Yao <chengjiyao@gmail.com>
|
2025-08-19 11:46:42 +08:00 |
|
Xiao
|
a4454e9401
|
chore: disable enable_cpp_symbolic_shape_guards (#23048)
Signed-off-by: Xiao Liu <xiszishu@gmail.com>
|
2025-08-18 23:08:05 -04:00 |
|
Woosuk Kwon
|
14006840ea
|
[V0 Deprecation] Remove V0 FlashInfer attention backend (#22776)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-18 19:54:16 -07:00 |
|
Robert Shaw
|
6603288736
|
[CI][V0 Deprecation] Removed V0 Only Chunked Prefill and Prefix Caching Tests (#22871)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-18 17:39:01 -07:00 |
|
Thomas Parnell
|
95e3095136
|
[Misc] Add @tdoublep as a maintainer of hybrid model and Triton-attention related code (#23122)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-08-19 08:31:38 +08:00 |
|
Woosuk Kwon
|
c9b38be8aa
|
[Spec Decode] Make propose_draft_token_ids non-blocking for lower TTFT (#23041)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-18 17:20:38 -07:00 |
|
Woosuk Kwon
|
0dd3f4f5ab
|
[Misc] Minor refactoring for prepare_inputs (#23116)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-18 16:58:05 -07:00 |
|
Xiang Xu
|
498259ccce
|
Install tpu_info==0.4.0 to fix core dump for TPU (#23135)
|
2025-08-18 16:23:33 -07:00 |
|
Michael Goin
|
6d25e3fd6e
|
Use Blackwell FlashInfer MXFP4 MoE by default if available (#23008)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-18 15:25:49 -07:00 |
|
Breno Baldas Skuk
|
ac6eb49de3
|
fix: OpenAI SDK compat (ResponseTextConfig) (#23126)
Signed-off-by: breno.skuk <breno.skuk@hcompany.ai>
Signed-off-by: Breno Baldas Skuk <breno.skuk@hcompany.ai>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-08-18 15:22:59 -07:00 |
|
Michael Goin
|
bf756321c7
|
[CI Bugfix] Pin openai<1.100 to unblock CI (#23118)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-18 12:14:01 -07:00 |
|
Raushan Turganbay
|
0e3bb543f0
|
[Bugfix] Support compile for Transformers multimodal (#23095)
Signed-off-by: raushan <raushan@huggingface.co>
|
2025-08-18 13:35:48 +00:00 |
|
杨朱 · Kiki
|
569aefd134
|
chore: remove unnecessary patch_padding_side for the chatglm model (#23090)
Signed-off-by: carlory <baofa.fan@daocloud.io>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-08-18 12:32:13 +00:00 |
|
Cyrus Leung
|
d3f71f1224
|
[Refactor] Get prompt updates earlier (#23097)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-18 12:31:53 +00:00 |
|