Isotr0py
|
fbd8595c5c
|
[Bugfix] Fix basic models tests hanging due to mm processor creation (#22571)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-08-09 11:42:21 -07:00 |
|
Nicolò Lucchesi
|
5a16fa614c
|
[Model] Gemma3n MM (#20495)
Signed-off-by: ShriKode <shrikode@gmail.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Roger Wang <hey@rogerw.me>
Co-authored-by: ShriKode <shrikode@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.me>
|
2025-08-09 09:56:25 -07:00 |
|
Harry Mellor
|
2d18256e47
|
Move ParallelConfig from config/__init__.py to config/parallel.py (#22565)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-09 08:33:46 -07:00 |
|
Harry Mellor
|
56186474f6
|
[Docs] Reduce noise in docs and --help from the JSON tip (#22567)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-09 08:31:32 -07:00 |
|
Thomas Parnell
|
1bf5e1f25b
|
[CI] [Hybrid] Speed up hybrid models test by removing large models (#22563)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-08-09 02:04:42 -07:00 |
|
Yuxuan Zhang
|
a6022e6fbc
|
GLM-4.5V with new class name at transformers (#22520)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-08-09 00:50:21 -07:00 |
|
Thomas Parnell
|
2be07a0db1
|
Update docs for Minimax-Text support (#22562)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-08-09 00:18:18 -07:00 |
|
Jee Jee Li
|
0edc0cd52b
|
[Bugfix] Fix CI moe kernel failure (#22556)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-08-09 00:03:29 -07:00 |
|
Isotr0py
|
7920e9b1c5
|
[Bugfix] Fix failing GPT-OSS initialization test (#22557)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-08-09 00:03:26 -07:00 |
|
Charlie Fu
|
b7c0942b65
|
[ROCm][Misc] Rename the context_len to seq_len in ROCm custom paged attention kernel (#22097)
Signed-off-by: charlifu <charlifu@amd.com>
|
2025-08-08 23:15:06 -07:00 |
|
Kyuyeun Kim
|
9a0c5ded5a
|
[TPU] Add support for online w8a8 quantization (#22425)
Signed-off-by: Kyuyeun Kim <kyuyeunk@google.com>
|
2025-08-08 23:12:54 -07:00 |
|
Eldar Kurtić
|
10a02535d4
|
Fix loading of quantized BigCode models (#22463)
Signed-off-by: Eldar Kurtic <eldar@neuralmagic.com>
|
2025-08-08 23:12:12 -07:00 |
|
Cyrus Leung
|
65552b476b
|
[Misc] Use config definitions from Transformers library (#21913)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-08 23:10:51 -07:00 |
|
Or Ozeri
|
7ad7adb67f
|
v1: Pass KVConnectorOutput to scheduler-side (#22157)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2025-08-08 23:09:51 -07:00 |
|
Thomas Parnell
|
6ade99eafa
|
[V1] [Hybrid] Support Minimax-Text-01 in V1 (#22151)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-08-08 23:08:48 -07:00 |
|
Wentao Ye
|
3157aebb63
|
[Log] Add Warning for Deprecation of DeepGEMM old version (#22194)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-08-08 23:07:48 -07:00 |
|
Thomas Parnell
|
8a0ffd6285
|
Remove mamba_ssm from vLLM requirements; install inside test container using --no-build-isolation (#22541)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-08-08 23:05:32 -07:00 |
|
Roger Wang
|
23472ff51c
|
[Doc] Add usage of implicit text-only mode (#22561)
Signed-off-by: Roger Wang <hey@rogerw.me>
Co-authored-by: Flora Feng <4florafeng@gmail.com>
|
2025-08-08 23:04:19 -07:00 |
|
Roger Wang
|
08b751ba74
|
Implicit language-model-only mode via limit-mm-per-prompt (#22299)
Signed-off-by: Roger Wang <hey@rogerw.me>
Signed-off-by: Andy Xie <andy.xning@gmail.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
Signed-off-by: Shu Wang <shuw@nvidia.com>
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
Signed-off-by: Shu Wang. <shuw@nvidia.com>
Signed-off-by: XIn Li <xinli@nvidia.com>
Signed-off-by: Junhao Li <junhao@ubicloud.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com>
Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: iAmir97 <Amir.balwel@embeddedllm.com>
Signed-off-by: iAmir97 <71513472+iAmir97@users.noreply.github.com>
Signed-off-by: Linkun <github@lkchen.net>
Co-authored-by: Ning Xie <andy.xning@gmail.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
Co-authored-by: Andrew Sansom <andrew@protopia.ai>
Co-authored-by: Zhiyu <zhiyuc@nvidia.com>
Co-authored-by: Shu Wang <shuw@nvidia.com>
Co-authored-by: XIn Li <xinli@nvidia.com>
Co-authored-by: Junhao Li <streaver91@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Yuxuan Zhang <2448370773@qq.com>
Co-authored-by: ZiTian Zhao <zitian.zhao@tencentmusic.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Po-Han Huang (NVIDIA) <53919306+nvpohanh@users.noreply.github.com>
Co-authored-by: iAmir97 <71513472+iAmir97@users.noreply.github.com>
Co-authored-by: iAmir97 <Amir.balwel@embeddedllm.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Hong Hanh <hanh.usth@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: lkchen <github@lkchen.net>
|
2025-08-08 22:21:40 -07:00 |
|
Isotr0py
|
429e4e2d42
|
[Bugfix] Fix ModernBert cuda graph capturing in v1 (#21901)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-08-08 22:17:22 -07:00 |
|
Pradyun92
|
35afe1b30b
|
[BugFix] [P/D] Handle lookahead token count edge-case with Eagle Spec Decoding and P/D (#22317)
Signed-off-by: Pradyun Ramadorai <pradyunr@amazon.com>
Signed-off-by: Pradyun92 <142861237+Pradyun92@users.noreply.github.com>
Co-authored-by: Pradyun Ramadorai <pradyunr@amazon.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
|
2025-08-08 17:04:15 -07:00 |
|
Kunshang Ji
|
81c57f60a2
|
[XPU] upgrade torch 2.8 on for XPU (#22300)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2025-08-08 17:03:45 -07:00 |
|
Russell Bryant
|
311d875614
|
Drop flaky test_healthcheck_response_time (#22539)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-08-08 16:56:47 -07:00 |
|
Harry Mellor
|
e3edc0a7a8
|
Extract CompilationConfig from config.py (#22524)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-08 16:34:25 -07:00 |
|
yyweiss
|
baece8c3d2
|
[Frontend] Add unix domain socket support (#18097)
Signed-off-by: <yyweiss@gmail.com>
Signed-off-by: yyw <yyweiss@gmail.com>
|
2025-08-08 16:23:44 -07:00 |
|
Guy Stone
|
2fcf6b27b6
|
[Docs] fix broken links in metrics.md (#22315)
Signed-off-by: Guy Stone <guys@spotify.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-08 16:22:35 -07:00 |
|
Harry Mellor
|
41b9655751
|
Skip Qwen 1 in CI because remote code is no longer compatible with Transformers (#22536)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-08 16:20:58 -07:00 |
|
Thomas Parnell
|
bd875d2eb7
|
[Bugfix] Update FA commit hash (#22546)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-08-08 16:10:25 -07:00 |
|
Varun Sundar Rabindranath
|
f703b923f3
|
[Misc] DeepGEMM : Avoid JIT generation in the hot-path (#22215)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-08-08 16:09:59 -07:00 |
|
Lucas Wilkinson
|
cd9b9de1fb
|
[BugFix] Fix IMA FlashMLA full cuda-graph and DP + Update FlashMLA (#21691)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-08-08 16:09:42 -07:00 |
|
Chen Zhang
|
fe6d8257a1
|
[gpt-oss] Support tool call and implement MCP tool server (#22427)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-08-08 15:06:37 -07:00 |
|
Ricardo Decal
|
e290594072
|
[Docs] Rename “Distributed inference and serving” to “Parallelism & Scaling” (#22466)
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
|
2025-08-08 19:26:21 +00:00 |
|
Yongye Zhu
|
f756a682d9
|
[gpt-oss] guard import when triton kernel is not installed (#22529)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-08 11:18:33 -07:00 |
|
Daniel Serebrenik
|
f0964e29cb
|
[Benchmark] Add benchmark tool for multi turn conversations (#20267)
|
2025-08-08 10:28:50 -07:00 |
|
Yongye Zhu
|
e789cad6b8
|
[gpt-oss] triton kernel mxfp4 (#22421)
Signed-off-by: <zyy1102000@gmail.com>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
|
2025-08-08 08:24:07 -07:00 |
|
Harry Mellor
|
e5ebeeba53
|
Remove exception for Python 3.8 typing from linter (#22506)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-08 03:06:46 -07:00 |
|
Harry Mellor
|
7be7f3824a
|
[Docs] Improve API docs (+small tweaks) (#22459)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-08 03:02:51 -07:00 |
|
Nick Hill
|
ccdae737a0
|
[BugFix] Don't cancel asyncio tasks directly from destructors (#22476)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-08-08 01:13:18 -07:00 |
|
rongfu.leng
|
904063907c
|
[Misc] fix openai version (#22485)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
|
2025-08-08 01:12:54 -07:00 |
|
Cyrus Leung
|
43c4f3d77c
|
[Misc] Begin deprecation of get_tensor_model_*_group (#22494)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-08 01:11:54 -07:00 |
|
Cyrus Leung
|
1712543df6
|
[CI/Build] Fix multimodal tests (#22491)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-08 00:31:19 -07:00 |
|
lkchen
|
808a7b69df
|
[bench] Fix benchmark/serve.py to ignore unavailable results (#22382)
Signed-off-by: Linkun <github@lkchen.net>
|
2025-08-07 23:15:50 -07:00 |
|
iAmir97
|
099c046463
|
[Doc] Sleep mode documentation (#22310)
Signed-off-by: iAmir97 <Amir.balwel@embeddedllm.com>
Signed-off-by: iAmir97 <71513472+iAmir97@users.noreply.github.com>
Co-authored-by: iAmir97 <Amir.balwel@embeddedllm.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Hong Hanh <hanh.usth@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2025-08-08 12:25:18 +08:00 |
|
Po-Han Huang (NVIDIA)
|
af473f0a85
|
[bugfix] Fix Llama3/4 issues caused by FlashInfer 0.2.10 (#22426)
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
|
2025-08-07 20:25:01 -07:00 |
|
Cyrus Leung
|
157f9c1368
|
Fix pre-commit (#22487)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-07 20:21:54 -07:00 |
|
ZiTian Zhao
|
6f287915d8
|
Optimize MiniCPMO mask creation with vectorized implementation (#22464)
Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com>
Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com>
|
2025-08-07 20:18:50 -07:00 |
|
Yuxuan Zhang
|
c152e2a8a0
|
not tie_word_embeddings for glm-4.5 and glm-4.5v (#22460)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
|
2025-08-07 19:37:23 -07:00 |
|
Chauncey
|
17eaaef595
|
[Bugfix] Fix RuntimeError: Index put requires the source and destination dtypes match (#22065)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-08-07 19:20:21 -07:00 |
|
Junhao Li
|
3303f134e0
|
[Kernel] Add support for block FP8 on SM120 (NVIDIA 5090 and RTX PRO 6000) (#22131)
Signed-off-by: Junhao Li <junhao@ubicloud.com>
|
2025-08-07 19:18:28 -07:00 |
|
Shu Wang
|
b2c8ce57c6
|
Fix Flashinfer CUTLASS MOE Allgather (#21963)
Signed-off-by: Shu Wang <shuw@nvidia.com>
|
2025-08-07 19:18:25 -07:00 |
|