yiz-liu
|
4f510bc2a1
|
[Model] Removes redundant all-reduce operation in Qwen3MoeSparseMoeBlock (#23169)
Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
|
2025-08-19 16:18:41 +00:00 |
|
TJian
|
1298c67795
|
[FEAT] [Performance] Enable DP for ViT in Qwen2.5VL (#22742)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-08-19 15:25:57 +00:00 |
|
Jee Jee Li
|
4d9c61993a
|
[Bugfix] Fix benchmark_moe.py (#23177)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-08-19 13:39:40 +00:00 |
|
myselvess
|
b87cb97a53
|
[Model] support new model ovis2.5 (#23084)
Signed-off-by: myselvess <244285088@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-08-19 13:12:59 +00:00 |
|
wang.yuqi
|
f856c33ce9
|
[Model] Add multi_label_classification support (#23173)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-08-19 12:54:30 +00:00 |
|
elvischenv
|
03752dba8f
|
[NVIDIA] Support Flashinfer TRTLLM FP8-q/kv/out Attention Kernel (#21716)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-08-19 08:22:15 -04:00 |
|
Woosuk Kwon
|
40f26734b9
|
[Misc] Fix seq_lens for graph capture (#23175)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-19 03:58:16 -07:00 |
|
Tialo
|
2c3f557f08
|
[Doc] use power of 2 (#23172)
|
2025-08-19 03:16:23 -07:00 |
|
Woosuk Kwon
|
21bcc8263f
|
[Misc] Avoid accessing req_ids inside a loop (#23159)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-19 09:39:38 +00:00 |
|
qizixi
|
5bfe0dea7a
|
[bug fix] Fix llama4 spec decoding (#22691)
Signed-off-by: qizixi <qizixi@meta.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
|
2025-08-19 08:53:24 +00:00 |
|
Isotr0py
|
31fd3265c8
|
[Bugfix] Fix broken Minimax-01-VL model (#22116)
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-08-19 08:49:29 +00:00 |
|
hustxiayang
|
31436e8b4f
|
[Misc] Add request_id into benchmark_serve.py (#23065)
Signed-off-by: yangxia <yangxiast@gmail.com>
|
2025-08-19 08:32:18 +00:00 |
|
qizixi
|
4efd43e9b4
|
Fix GLM-4.5V-FP8 numerical issue (#22949)
Signed-off-by: qizixi <qizixi@meta.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-08-19 07:56:31 +00:00 |
|
Daniel Serebrenik
|
3c8a787247
|
[Benchmark] Add flag --served-model-name to benchmark_serving_multi_turn (#22889)
Signed-off-by: daniels <daniels@pliops.com>
|
2025-08-19 07:48:07 +00:00 |
|
Grace Ho
|
01a08739e0
|
[misc] split engine_model into json file for nsys profile tool (#23117)
Signed-off-by: Grace Ho <grho@nvidia.com>
Signed-off-by: Grace Ho <146482179+gracehonv@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-08-19 15:44:53 +08:00 |
|
Jiangyun Zhu
|
fda9537c5e
|
[Model] Support Pipeline Parallelism for moonshotai/Kimi-VL-A3B-Thinking-2506 (#23114)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-08-19 14:24:31 +08:00 |
|
Wentao Ye
|
90bbe0a5ad
|
[Log] Warning Once for Cutlass MLA (#23137)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-08-18 23:24:16 -07:00 |
|
Benji Beck
|
e75f342261
|
Migrate InternVLImagePixelInputs (in nemotron_vl.py) to TensorSchema (#22023)
Signed-off-by: Benji Beck <benjibeck@meta.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-08-19 13:48:26 +08:00 |
|
Nikhil Suryawanshi
|
78dba404ad
|
[Hardware][IBM Z]Enable v1 for s390x and s390x dockerfile fixes (#22725)
Signed-off-by: Nikhil Suryawanshi <suryawanshin74@gmail.com>
|
2025-08-19 04:40:37 +00:00 |
|
Chengji Yao
|
e9d6a3db69
|
[TPU] make ptxla not imported when using tpu_commons (#23081)
Signed-off-by: Chengji Yao <chengjiyao@gmail.com>
Signed-off-by: Chengji Yao <chengjiyao@google.com>
Co-authored-by: Chengji Yao <chengjiyao@gmail.com>
|
2025-08-19 11:46:42 +08:00 |
|
Xiao
|
a4454e9401
|
chore: disable enable_cpp_symbolic_shape_guards (#23048)
Signed-off-by: Xiao Liu <xiszishu@gmail.com>
|
2025-08-18 23:08:05 -04:00 |
|
Woosuk Kwon
|
14006840ea
|
[V0 Deprecation] Remove V0 FlashInfer attention backend (#22776)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-18 19:54:16 -07:00 |
|
Robert Shaw
|
6603288736
|
[CI][V0 Deprecation] Removed V0 Only Chunked Prefill and Prefix Caching Tests (#22871)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-18 17:39:01 -07:00 |
|
Thomas Parnell
|
95e3095136
|
[Misc] Add @tdoublep as a maintainer of hybrid model and Triton-attention related code (#23122)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-08-19 08:31:38 +08:00 |
|
Woosuk Kwon
|
c9b38be8aa
|
[Spec Decode] Make propose_draft_token_ids non-blocking for lower TTFT (#23041)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-18 17:20:38 -07:00 |
|
Woosuk Kwon
|
0dd3f4f5ab
|
[Misc] Minor refactoring for prepare_inputs (#23116)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-18 16:58:05 -07:00 |
|
Xiang Xu
|
498259ccce
|
Install tpu_info==0.4.0 to fix core dump for TPU (#23135)
|
2025-08-18 16:23:33 -07:00 |
|
Michael Goin
|
6d25e3fd6e
|
Use Blackwell FlashInfer MXFP4 MoE by default if available (#23008)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-18 15:25:49 -07:00 |
|
Breno Baldas Skuk
|
ac6eb49de3
|
fix: OpenAI SDK compat (ResponseTextConfig) (#23126)
Signed-off-by: breno.skuk <breno.skuk@hcompany.ai>
Signed-off-by: Breno Baldas Skuk <breno.skuk@hcompany.ai>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-08-18 15:22:59 -07:00 |
|
Michael Goin
|
bf756321c7
|
[CI Bugfix] Pin openai<1.100 to unblock CI (#23118)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-18 12:14:01 -07:00 |
|
Raushan Turganbay
|
0e3bb543f0
|
[Bugfix] Support compile for Transformers multimodal (#23095)
Signed-off-by: raushan <raushan@huggingface.co>
|
2025-08-18 13:35:48 +00:00 |
|
杨朱 · Kiki
|
569aefd134
|
chore: remove unnecessary patch_padding_side for the chatglm model (#23090)
Signed-off-by: carlory <baofa.fan@daocloud.io>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-08-18 12:32:13 +00:00 |
|
Cyrus Leung
|
d3f71f1224
|
[Refactor] Get prompt updates earlier (#23097)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-18 12:31:53 +00:00 |
|
Ning Xie
|
5a30bd10d8
|
[Bugfix] fix IntermediateTensors equal method (#23027)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-08-18 02:58:11 -07:00 |
|
Cyrus Leung
|
27e8d1ea3e
|
[Refactor] Define MultiModalKwargsItems separate from MultiModalKwargs (#23053)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-18 09:52:00 +00:00 |
|
Kunshang Ji
|
5c79b0d648
|
[XPU][CI]add xpu env vars in CI scripts (#22946)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2025-08-18 09:47:03 +00:00 |
|
Kunshang Ji
|
5f5664b3e4
|
[XPU] Fix compile size for xpu (#23069)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2025-08-18 00:04:08 -07:00 |
|
Roger Wang
|
89657a557c
|
[Misc] Fix backward compatibility from #23030 (#23070)
Signed-off-by: Roger Wang <hey@rogerw.me>
Co-authored-by: Roger Wang <hey@rogerw.me>
|
2025-08-17 23:33:29 -07:00 |
|
Ning Xie
|
08d5f7113a
|
[Misc] refactor function name (#23029)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-08-17 22:16:21 -07:00 |
|
Andy Lo
|
b2fd0b81e0
|
[Bugfix][CI] Machete kernels: deterministic ordering for more cache hits (#23055)
Signed-off-by: Andy Lo <andy@mistral.ai>
|
2025-08-17 22:10:26 -07:00 |
|
double7
|
9f1c642254
|
[Bugfix] fix Qwen2.5-Omni processor output mapping (#23058)
Signed-off-by: double7 <33449816+DoubleVII@users.noreply.github.com>
Co-authored-by: 杨森 <yangsen.double7@bytedance.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-08-17 22:09:11 -07:00 |
|
Ning Xie
|
7be3a59d8e
|
[Misc] enhance static type hint (#23059)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-08-17 22:09:08 -07:00 |
|
Woosuk Kwon
|
8ea0c2753a
|
[Misc] Minor code cleanup for _get_prompt_logprobs_dict (#23064)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-17 18:16:03 -07:00 |
|
Simon Mo
|
0fc8fa751a
|
fix: gptq marlin weight loading failure (#23066)
v0.10.1rc1
|
2025-08-17 15:56:07 -07:00 |
|
Calvin Chen
|
21e39436c8
|
[XPU] fix xpu to set cudagraph batch sizes (#23044)
Signed-off-by: calvin chen <wen.chen@dynamia.ai>
|
2025-08-17 21:45:42 +00:00 |
|
Woosuk Kwon
|
6d243efeda
|
[Misc] Convert use_structured_output property into constant (#23060)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-17 12:41:38 -07:00 |
|
Woosuk Kwon
|
c55bc1db26
|
[Misc] Remove dead return (#23061)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-17 10:36:46 -07:00 |
|
Lucas Wilkinson
|
292084e72a
|
[BugFix] Fix for IMA in FA3 varlen combine (#22967)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-08-17 08:52:04 -07:00 |
|
Kevinzz
|
16bff144be
|
[Misc] fix typo in the multimodal doc (#23051)
|
2025-08-17 01:56:20 -07:00 |
|
947132885
|
fe0411fc6f
|
[Bugfix] should use stack instead of concat (#22972)
Signed-off-by: 947132885 <947132885@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-08-17 08:46:36 +00:00 |
|