Ignacio Sica | 369a079568 | 2025-09-04 02:48:25 -07:00
[Hardware][Apple-CPU] Disable OneDNN build for Apple Silicon (#24200)
Signed-off-by: ignaciosica <mignacio.sica@gmail.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>

Lucas Wilkinson | 402759d472 | 2025-09-04 02:47:59 -07:00
[Attention] FlashAttn MLA (#14258)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>

Fanli Lin | 2c301ee2eb | 2025-09-04 02:47:08 -07:00
[Bugfix] Fix Incremental Detokenization with tokenizers == 0.22.0 (#24159)
Signed-off-by: Fanli Lin <fanli.lin@intel.com>
Signed-off-by: Fanli Lin <fanli0116@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

whx | 3efb9f4d95 | 2025-09-04 02:46:37 -07:00
[Attention][Platform] Refactor MLA to support Custom Op (#23332)
Signed-off-by: whx-sjtu <2952154980@qq.com>

anthonsu | 04f3c35cff | 2025-09-04 09:41:41 +00:00
Improve flexibility of auto_tune.sh execution. (#23766)
Signed-off-by: Anthony Su <50185138+anthonsu@users.noreply.github.com>
Signed-off-by: anthonsu <50185138+anthonsu@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

mgazz | 51d5e9be7d | 2025-09-04 00:22:41 -07:00
[Core][Model] Terratorch backend integration (#23513)
Signed-off-by: Michele Gazzetti <michele.gazzetti1@ibm.com>
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
Co-authored-by: Christian Pinto <christian.pinto@ibm.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>

bingchen-mi | e7fc70016f | 2025-09-04 00:08:09 -07:00
[Model] Add MiDashengLM model support (#23652)
Signed-off-by: chenbing8 <chenbing8@xiaomi.com>
Signed-off-by: bingchen-mi <chenbing8@xiaomi.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>

Weida Hong | 12e1e63cc5 | 2025-09-04 06:38:26 +00:00
[Misc] Enhance output readability of helper script (#24214)
Signed-off-by: Weida Hong <wdhongtw@google.com>

Li, Jiang | 57b1ce94f7 | 2025-09-04 14:28:45 +08:00
[CPU] Refactor CPU unquantized linear (#24150)
Signed-off-by: jiang1.li <jiang1.li@intel.com>

Benji Beck | cb55ad86fe | 2025-09-04 06:09:11 +00:00
Migrate ultravox inputs to TensorSchema (#23503)
Signed-off-by: Benji Beck <benjibeck@meta.com>

Flora Feng | 712b273f65 | 2025-09-04 05:21:12 +00:00
[Refactor] Introduce basic Renderer for completion-style request (#24010)
Signed-off-by: sfeng33 <4florafeng@gmail.com>

Qiming Zhang | e919d6f549 | 2025-09-04 12:37:37 +08:00
[Kernel][Bugfix] Fix grouped topk cu (#24146)
Signed-off-by: mayuyuace <qiming1.zhang@intel.com>

wuhang | a38f8bd54c | 2025-09-04 04:05:10 +00:00
[Feature][Responses API]Support MCP tools with streaming mode + background mode (#23927)
Signed-off-by: wuhang <wuhang6@huawei.com>

Peter Pan | b5ee1e3261 | 2025-09-03 22:49:16 +00:00
Remove deprecated PyNcclConnector (#24151)
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>

George Nagy II | 36c260dad6 | 2025-09-03 21:08:47 +00:00
[Feature][gpt-oss] Add support for num_cached_tokens and num_reasoning_tokens tracking (#23460)
Signed-off-by: George Nagy II <george.nagy0969@gmail.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>

Kebe | a43a3f1770 | 2025-09-03 13:21:36 -07:00
[Bugfix][DP] DP distribution does not require ray[default] (#23822)
Signed-off-by: Kebe <mail@kebe7jun.com>

WeiQing Chen | 6adaed42f4 | 2025-09-03 19:14:30 +00:00
[Feature][P/D]: Optimize NIXL Connector xfer Launch (#23887)
Signed-off-by: ycyaw66 <497410282@qq.com>
Co-authored-by: ycyaw66 <497410282@qq.com>

Matthew Bonanni | a742322092 | 2025-09-03 14:05:24 -04:00
[Attention] Blackwell FP8 MLA support with CUTLASS_MLA backend (#23289)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

Benji Beck | 731a6940e3 | 2025-09-03 18:04:00 +00:00
Migrate whisper inputs to TensorSchema (#23505)
Signed-off-by: Benji Beck <benjibeck@meta.com>

bnellnm | e9b92dcd89 | 2025-09-03 12:35:18 -04:00
[Kernels] Overlap shared experts with send/recv (#23273)
Signed-off-by: Bill Nell <bnell@redhat.com>

nopperl | fa4311d85f | 2025-09-03 08:24:02 -07:00
[V1] v1 engine + full CUDA graph support for PLaMo2 (#23998)
Signed-off-by: Hemmi Shinichi <shemmi@preferred.jp>
Signed-off-by: nopperl <54780682+nopperl@users.noreply.github.com>
Co-authored-by: Hemmi Shinichi <shemmi@preferred.jp>
Co-authored-by: Thomas Parnell <tom.parnell@gmail.com>

Burkhard Ringlein | 6d80ae83e1 | 2025-09-03 15:01:09 +00:00
[Bugfix] Fixing division by zero in triton_attn if query_heads/kv_heads > 16 (#23424)
Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com>

dongbo910220 | 4ba0c587ba | 2025-09-03 07:17:20 -07:00
FIX: Add libnuma-dev to Dockerfile for dev stage (#20388)
Signed-off-by: dongbo910220 <1275604947@qq.com>

qscqesze | 6997a25ac6 | 2025-09-03 11:27:04 +00:00
[Model] Remove useless code from MiniMax implementation (#23982)
Signed-off-by: QscQ <qscqesze@gmail.com>
Signed-off-by: qingjun <qingjun@minimaxi.com>

Jakub Smid | 28f350e147 | 2025-09-03 10:47:55 +00:00
Support add_generation_prompt in embeddings endpoint with chat request (#23931)
Signed-off-by: biba10 <jaksmid@seznam.cz>

wang.yuqi | 51383bd472 | 2025-09-03 17:23:56 +08:00
[CI] Accelerate mteb test by setting SentenceTransformers mteb score to a constant (#24088)
Signed-off-by: wang.yuqi <noooop@126.com>

Isotr0py | 9c99e4871f | 2025-09-03 08:34:29 +00:00
[Misc] Clean up deadcode for legacy processing pipeline (#24153)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

dsinghvi | 70549c1245 | 2025-09-03 16:13:11 +08:00
[CI/Build] Serve images used by multimodal tests through local HTTP Server (#23907)
Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com>
Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

Nicolò Lucchesi | f0c503f66e | 2025-09-03 15:19:54 +08:00
[Nixl] Heterogeneous TP support FlashInfer (#20189)
Signed-off-by: NickLucche <nlucches@redhat.com>

youkaichao | f38035c123 | 2025-09-03 06:45:25 +00:00
[distributed][rl] remove nccl cumem env var override (#24141)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Yong Hoon Shin | 426cc8629f | 2025-09-03 04:57:59 +00:00
[BugFix] Fix routed_scaling_factor double mul for dots1 and glm4 MoE models (#24132)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>

Jiangyun Zhu | e81d4e69c1 | 2025-09-03 04:19:14 +00:00
[Misc] Add check for dual_chunk_attention (#24070)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>

Didier Durand | 02d411fdb2 | 2025-09-02 21:14:07 -07:00
[Doc]: fix typos in Python comments (#24115)
Signed-off-by: Didier Durand <durand.didier@gmail.com>

Didier Durand | d7e1e59972 | 2025-09-02 21:05:45 -07:00
[Doc]: fix typos in Python comments (#24093)
Signed-off-by: Didier Durand <durand.didier@gmail.com>

Wentao Ye | c4ed78b14f | 2025-09-02 20:45:52 -07:00
[Compile] Fix Compile Warning for w4a8_mm_entry.cu (#23660)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>

co63oc | 1bd007f234 | 2025-09-02 20:44:50 -07:00
fix some typos (#24071)
Signed-off-by: co63oc <co63oc@users.noreply.github.com>

afeldman-nm | 136d853e65 | 2025-09-03 02:52:51 +00:00
[V1] Wrapper which plumbs request-level logits processors into vLLM batch-level logits processing (#23656)
Signed-off-by: Andrew Feldman <afeldman@redhat.com>

Russell Bryant | e32a0e8678 | 2025-09-03 02:32:59 +00:00
Upgrade xgrammar to 0.1.23 (#22988)
Signed-off-by: Russell Bryant <rbryant@redhat.com>

youkaichao | 42dc59dbac | 2025-09-03 10:09:19 +08:00
Update release pipeline post PyTorch 2.8.0 update (#24073)
Signed-off-by: Huy Do <huydhn@gmail.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Huy Do <huydhn@gmail.com>

Chaojun Zhang | 862f2ef893 | 2025-09-03 08:21:18 +08:00
[XPU] Fix the bug of LoRA logits on the XPU platform (#24081)
Signed-off-by: chzhang <chaojun.zhang@intel.com>

Matthew Bonanni | 2fd1a40a54 | 2025-09-02 16:50:28 -07:00
[CI/Build] Disable SiluMul NVFP4 quant fusion tests (#24121)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

Wentao Ye | 930a24144c | 2025-09-02 22:22:30 +00:00
[Bug] R1 Accuracy: Fix routed_scaling_factor Double Mul Issue (#24119)
Signed-off-by: yewentao256 <zhyanwentao@126.com>

rasmith | 457e471971 | 2025-09-02 22:13:57 +00:00
[AMD][Kernel][Bugfix] Cast offsets tensor bn to tl.int64 to avoid GPU segfault (#23692)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>

Thomas Parnell | d328f7894f | 2025-09-02 20:15:06 +00:00
[CI] Enable all hf transformers baselines in test_hybrid (#23936)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>

Wentao Ye | 98aee612aa | 2025-09-02 18:53:34 +00:00
[Log] Only Print Profiler Results on Rank 0 (#23370)
Signed-off-by: yewentao256 <zhyanwentao@126.com>

nathan | 598bd74cf8 | 2025-09-02 18:34:28 +00:00
Fix weights loading for Apertus (#24100)
Signed-off-by: Nathan Ranchin <nranchin@student.ethz.ch>

Mark McLoughlin | 2417798471 | 2025-09-02 18:10:10 +00:00
[Metrics] Deprecate TPOT in favor of ITL (#24110)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>

Kyuyeun Kim | 9480ae24e3 | 2025-09-02 10:56:31 -07:00
[Bugfix] Fix packed_factor missing attribute error (#23902)
Signed-off-by: Kyuyeun Kim <kyuyeunk@google.com>

Chenheli Hua | f399182e8c | 2025-09-02 17:55:32 +00:00
Run ruff format on a few files. (#24075)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>

Kyle Sayers | 1c41310584 | 2025-09-02 13:54:10 -04:00
[Bugfix] Fix transform_config parsing in Compressed Tensors (#23945)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>