Tyler Michael Smith
|
955c624915
|
[Bugfix][Wide EP] Fix redundant work when using DeepEP, TP Attn, and EP MoE (#24134)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
|
2025-09-08 19:01:51 -07:00 |
|
Yang Kaiyong
|
43d9ad03ba
|
[Model loader]: support multi-thread model weight loading (#23928)
Signed-off-by: Yang Kaiyong <yangkaiyong.yky@antgroup.com>
Signed-off-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2025-09-08 18:49:39 +00:00 |
|
Jee Jee Li
|
8d7f39b48c
|
[Model] Remove quantized mixtral (#24437)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-09-08 11:02:14 -07:00 |
|
Jee Jee Li
|
6f4a82f8b5
|
[Model] Enable BNB support for qwen2_5_omni_thinker (#24420)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-09-08 09:37:08 -07:00 |
|
Chenheli Hua
|
01dfb5e982
|
[Frontend] User-provided uuids for medias in chat. (RFC #22044) (#23449)
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
Signed-off-by: Roger Wang <hey@rogerw.me>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.me>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-09-08 06:42:20 -07:00 |
|
tomeras91
|
e041314184
|
[Bugfix] Fix mamba2 prefill chunking (#23279)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
Signed-off-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-09-08 11:42:41 +00:00 |
|
Li Wang
|
5e537f45b4
|
[Bugfix] Fix get_quant_config when using modelscope (#24421)
Signed-off-by: wangli <wangli858794774@gmail.com>
|
2025-09-08 11:03:02 +00:00 |
|
Didier Durand
|
f4962a6d55
|
[Doc]: fix typos in Python comments (#24417)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
|
2025-09-08 00:22:16 -07:00 |
|
Chatcharin Sangbutsarakum
|
60f0843ef8
|
[Model] Remove unnecessary CUDA sync of Qwen2VL image and video preprocess (#24334)
Signed-off-by: Win <chatcharinsang@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-09-07 23:11:12 -07:00 |
|
Chatcharin Sangbutsarakum
|
8a46602606
|
[Model] Remove unnecessary CUDA sync of GLM-4.1V image and video preprocess (#24332)
Signed-off-by: Win <chatcharinsang@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-09-07 23:10:54 -07:00 |
|
Jee Jee Li
|
62f66be1f7
|
[Bugfix] Fix Qwen3-coder moe tuned config (#24072)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-09-07 05:19:46 +00:00 |
|
Saman A. Pour
|
75334956c2
|
QWEN3 Thinking Fused MoE kernels Optimization configs (#24330)
Signed-off-by: Saman Keon <samanamp@outlook.com>
|
2025-09-07 03:18:54 +00:00 |
|
Benji Beck
|
37a6fa95fd
|
Migrate Qwen2 inputs to TensorSchema (#23475)
Signed-off-by: Benji Beck <benjibeck@meta.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-09-06 20:07:31 -07:00 |
|
Woosuk Kwon
|
4172235ab7
|
[V0 deprecation] Deprecate V0 Neuron backend (#21159)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-06 16:15:18 -07:00 |
|
Isotr0py
|
00a4e56d8d
|
[Bugfix] Fix broken deepseek fp8 TP weights loading (#24367)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-06 09:23:12 -07:00 |
|
Roger Wang
|
eddaafc1c7
|
[Multimodal] Improve max video embedding length estimation in V1 (#24312)
Signed-off-by: Roger Wang <hey@rogerw.me>
Co-authored-by: Roger Wang <hey@rogerw.me>
|
2025-09-06 02:33:19 -07:00 |
|
wang.yuqi
|
6d6c6b05d3
|
[New Model]: google/embeddinggemma-300m (#24318)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-09-05 22:58:36 -07:00 |
|
Isotr0py
|
53b19ccdd5
|
[Core] Allow disabling TP sharding for parallel Linear layer (#23024)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-09-05 22:53:58 -07:00 |
|
Didier Durand
|
35bf193864
|
[Doc]: fix typos in Python comments (#24294)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-09-05 19:41:12 -07:00 |
|
Aaron Pham
|
c29fb540ff
|
[gpt-oss] tool parser supports for /chat/completions [1/n] (#22386)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2025-09-04 20:39:12 -07:00 |
|
Saman A. Pour
|
482e52f56c
|
QWEN3 Coder Fused MoE kernels Optimization configs (#24266)
Signed-off-by: Saman Keon <samanamp@outlook.com>
|
2025-09-04 20:33:43 +00:00 |
|
Jee Jee Li
|
94866d7c93
|
[Misc] Slight improve deepgemm print (#24085)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-09-04 16:06:51 +00:00 |
|
Didier Durand
|
83609ca91d
|
[Doc]: fix typos in Python comments (#24173)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-09-04 08:52:17 -07:00 |
|
nvjullin
|
37241077d5
|
[Misc] Removed force_fp8_e4m3fnuz from FP8LinearOp (#23725)
Signed-off-by: Julien Lin <jullin@nvidia.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-09-04 09:25:40 -04:00 |
|
Yash Pratap Singh
|
c9f7081f9c
|
[LoRA]: Add lora support to qwen-2.5-omni (#24231)
|
2025-09-04 05:50:50 -07:00 |
|
Jiangyun Zhu
|
eafa8dcde6
|
[Model] Add pp support for hunyuan (#24212)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2025-09-04 03:58:26 -07:00 |
|
whx
|
3efb9f4d95
|
[Attention][Platform] Refactor MLA to support Custom Op (#23332)
Signed-off-by: whx-sjtu <2952154980@qq.com>
|
2025-09-04 02:46:37 -07:00 |
|
mgazz
|
51d5e9be7d
|
[Core][Model] Terratorch backend integration (#23513)
Signed-off-by: Michele Gazzetti <michele.gazzetti1@ibm.com>
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
Co-authored-by: Christian Pinto <christian.pinto@ibm.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-09-04 00:22:41 -07:00 |
|
bingchen-mi
|
e7fc70016f
|
[Model] Add MiDashengLM model support (#23652)
Signed-off-by: chenbing8 <chenbing8@xiaomi.com>
Signed-off-by: bingchen-mi <chenbing8@xiaomi.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-04 00:08:09 -07:00 |
|
Li, Jiang
|
57b1ce94f7
|
[CPU] Refactor CPU unquantized linear (#24150)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-09-04 14:28:45 +08:00 |
|
Benji Beck
|
cb55ad86fe
|
Migrate ultravox inputs to TensorSchema (#23503)
Signed-off-by: Benji Beck <benjibeck@meta.com>
|
2025-09-04 06:09:11 +00:00 |
|
Benji Beck
|
731a6940e3
|
Migrate whisper inputs to TensorSchema (#23505)
Signed-off-by: Benji Beck <benjibeck@meta.com>
|
2025-09-03 18:04:00 +00:00 |
|
bnellnm
|
e9b92dcd89
|
[Kernels] Overlap shared experts with send/recv (#23273)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-09-03 12:35:18 -04:00 |
|
nopperl
|
fa4311d85f
|
[V1] v1 engine + full CUDA graph support for PLaMo2 (#23998)
Signed-off-by: Hemmi Shinichi <shemmi@preferred.jp>
Signed-off-by: nopperl <54780682+nopperl@users.noreply.github.com>
Co-authored-by: Hemmi Shinichi <shemmi@preferred.jp>
Co-authored-by: Thomas Parnell <tom.parnell@gmail.com>
|
2025-09-03 08:24:02 -07:00 |
|
qscqesze
|
6997a25ac6
|
[Model] Remove useless code from MiniMax implementation (#23982)
Signed-off-by: QscQ <qscqesze@gmail.com>
Signed-off-by: qingjun <qingjun@minimaxi.com>
|
2025-09-03 11:27:04 +00:00 |
|
Yong Hoon Shin
|
426cc8629f
|
[BugFix] Fix routed_scaling_factor double mul for dots1 and glm4 MoE models (#24132)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-09-03 04:57:59 +00:00 |
|
Didier Durand
|
02d411fdb2
|
[Doc]: fix typos in Python comments (#24115)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
|
2025-09-02 21:14:07 -07:00 |
|
Didier Durand
|
d7e1e59972
|
[Doc]: fix typos in Python comments (#24093)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
|
2025-09-02 21:05:45 -07:00 |
|
co63oc
|
1bd007f234
|
fix some typos (#24071)
Signed-off-by: co63oc <co63oc@users.noreply.github.com>
|
2025-09-02 20:44:50 -07:00 |
|
Wentao Ye
|
930a24144c
|
[Bug] R1 Accuracy: Fix routed_scaling_factor Double Mul Issue (#24119)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-09-02 22:22:30 +00:00 |
|
nathan
|
598bd74cf8
|
Fix weights loading for Apertus (#24100)
Signed-off-by: Nathan Ranchin <nranchin@student.ethz.ch>
|
2025-09-02 18:34:28 +00:00 |
|
Kyuyeun Kim
|
9480ae24e3
|
[Bugfix] Fix packed_factor missing attribute error (#23902)
Signed-off-by: Kyuyeun Kim <kyuyeunk@google.com>
|
2025-09-02 10:56:31 -07:00 |
|
Kyle Sayers
|
1c41310584
|
[Bugfix] Fix transform_config parsing in Compressed Tensors (#23945)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2025-09-02 13:54:10 -04:00 |
|
wang.yuqi
|
e0653f6c0b
|
[Model] Classification models support logit_bias / sigmoid_normalize (#24031)
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-09-02 16:48:57 +00:00 |
|
Kyungmin Lee
|
38ba061f6f
|
[BugFix] Fix EXAONE4 rotary embeddings (#23918)
Signed-off-by: lkm2835 <lkm2835@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-02 14:40:55 +00:00 |
|
Nicolò Lucchesi
|
0a74e9d0f2
|
[Gemma3n] Fix audio batching (#24052)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-09-02 22:23:35 +08:00 |
|
WeiQing Chen
|
2f0bab3f26
|
[Model] Support dp on ViT on GLM-4.5V (#23168)
Signed-off-by: David Chen <530634352@qq.com>
|
2025-09-02 10:48:18 +00:00 |
|
Benji Beck
|
56d04089ef
|
Migrate Interns1 inputs to TensorSchema (#23510)
Signed-off-by: Benji Beck <benjibeck@meta.com>
|
2025-09-02 04:35:45 +00:00 |
|
Yan Ma
|
7be0cb8e9e
|
[XPU][Feature] fp8 online quantization support for XPU (#23148)
Signed-off-by: Yan Ma <yan.ma@intel.com>
Co-authored-by: Qiming Zhang <qiming1.zhang@intel.com>
|
2025-09-02 04:06:53 +00:00 |
|
Benji Beck
|
1fa1d6a9a0
|
Migrate OvisImagePatchInputs to TensorSchema (#22024)
Signed-off-by: Benji Beck <benjibeck@meta.com>
|
2025-09-02 12:01:36 +08:00 |
|