Cyrus Leung
19b927e52d
[Core] Use individual MM items in P0/P1 cache and model runner (#22570)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-13 07:18:07 -07:00

milesial
20d65aa755
[Frontend] Multithreaded async multimodal load_bytes (#22710)
Signed-off-by: Alexandre Milesi <30204471+milesial@users.noreply.github.com>
Co-authored-by: Alexandre Milesi <30204471+milesial@users.noreply.github.com>
2025-08-13 06:09:26 -07:00

Gh0u1L5
b159c0a67a
Fix GGUF loader for Qwen3 MoE. (#22785)
Signed-off-by: Gh0u1L5 <Gh0u1L5@outlook.com>
2025-08-13 06:08:23 -07:00

Yuanyuan Chen
6772bb0f7d
Remove unnecessary CUDA sync of qwen image and video preprocess (#22792)
Signed-off-by: cyy <cyyever@outlook.com>
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-08-13 06:07:28 -07:00

Chen Zhang
fceafaf582
[Bugfix][mamba] Fix type annotation of Mamba2Metadata (#22787)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-08-13 06:07:09 -07:00

Nicolò Lucchesi
6b794c756c
[Nixl][CI] Fix tests (#22806)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-08-13 06:03:53 -07:00

Chi Zhang
98deac3879
[FEATURE] support custom vllm tuned config path for fused moe triton kernels (#22791)
Signed-off-by: Chi Zhang <zhangchi.usc1992@bytedance.com>
2025-08-13 20:27:25 +08:00

Kdump
653124bd46
[Frontend] Add chunked processing to handle long inputs in embedding models (#22280)
Signed-off-by: x22x22 <wadeking@qq.com>
Signed-off-by: Kdump <rootshellexp@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Maximilien de Bayser <maxdebayser@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-13 04:14:24 -07:00

wangxiyuan
0b1bdac6af
[Platform] Custom ops support for FusedMoe (#22509)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-08-13 04:12:00 -07:00

Giancarlo Delfin
d94e3026de
[V1] Add tree drafting tests for eagle spec decoding (#22705)
Signed-off-by: Giancarlo Delfin <gdelfin@meta.com>
2025-08-13 04:11:28 -07:00

633WHU
3f52738dce
[Doc] Add max_lora_rank configuration guide (#22782)
Signed-off-by: chiliu <cliu_whu@yeah.net>
2025-08-13 04:10:07 -07:00

Duc-Viet Hoang
a01e0018b5
[Bugfix] Fix Nemotron VL image processing (#22739)
Co-authored-by: ducviet00-h2 <viet.d.hoang@h2corporation.jp>
2025-08-13 03:11:36 -07:00

Yuxuan Zhang
9e7e5baaa8
[Model] Add missing prefix to glm4_1v (#22716)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
2025-08-13 01:23:33 -07:00

zzh142857
d16aa3dae4
[Model] Add option to run Step3VisionEncoder in DP (#22697)
Signed-off-by: zzh142857 <chaorenzhaozhenghao@gmail.com>
2025-08-13 00:09:13 -07:00

Chen Zhang
6807af8f46
[gpt-oss] upgrade gpt-oss to v0.0.3 and add version check (#22768)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-08-12 21:37:26 -07:00

shixianc
4c558cf62e
[Perf] Support topk softmax fused kernel for broader num_experts (#22211)
Signed-off-by: Shixian Cui <shixian@amazon.com>
Co-authored-by: Shixian Cui <shixian@amazon.com>
2025-08-12 21:34:47 -07:00

Wentao Ye
77a6bf07ae
[Bug] Fix Unexpected Keyword Argument 'w1_bias' (#22757)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-08-12 21:31:47 -07:00

Michael Goin
4082338a25
Remove unneeded ROCm platform import when using CUDA (#22765)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-08-12 21:26:38 -07:00

Michael Goin
c6b928798e
Force TRTLLM attention for gpt-oss on SM100 (#22678)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-08-12 21:22:16 -07:00

Michael Goin
b1361c7273
[Bugfix] Fix default enable for CUTLASS MLA on SM100 (#22738)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-08-12 21:22:05 -07:00

Po-Han Huang (NVIDIA)
4f0f844b16
Fix cuda illegal mem access with Llama4 TP8 + rms_norm custom op (#22701)
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
2025-08-12 21:21:50 -07:00

Woosuk Kwon
c5830381af
[V0 Deprecation] Remove args for multi-step scheduling (#22779)
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-08-12 20:38:18 -07:00

Woosuk Kwon
d31f97cf57
[Misc] Remove tests/multi_step/__init__.py (#22778)
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-08-12 20:21:18 -07:00

Woosuk Kwon
71683ca6f6
[V0 Deprecation] Remove multi-step scheduling (#22138)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-08-12 20:18:39 -07:00

Michael Goin
e18859298d
Add hardware plugins to installation doc (#22732)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-12 17:14:46 -07:00

Jee Jee Li
fde0b611a3
[Model] Decouple glm4v (#22751)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-08-12 17:13:17 -07:00

Harry Mellor
d0a6301588
Fix Transformers backend tensor parallel for multimodal models (#22673)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-12 17:12:30 -07:00

Harry Mellor
45c3936e94
[Docs] Hide the navigation and toc sidebars on home page (#22749)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-12 17:12:26 -07:00

Frank Wang
ba81acbdc1
[Bugfix] Bump DeepGEMM Version to Fix SMXX Layout Issues (#22606)
Signed-off-by: frankwang28 <frank.wbb@hotmail.com>
2025-08-12 15:43:06 -07:00

RUTHLESS-BOT
53c730286c
[Misc] parametrize 'dtype' in test_flash_mla (#22641)
Signed-off-by: RUTHLESS-BOT <wujiafeng@cmbchina.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-12 16:31:48 -04:00

zifeitong
6534d2fc97
Fix torch version check for SM100 mxfp4 (#22535)
Signed-off-by: Zifei Tong <zifeitong@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-08-12 12:54:42 -07:00

Nicolò Lucchesi
422f22e012
[CI][Nixl] Check kv cache layout during handshake (#22745)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-08-12 12:53:52 -07:00

Xiaozhu Meng
6bd8ebf026
[Kernel][AMD] Avoid D2H copy and cumsum kernel (#22683)
Signed-off-by: Xiaozhu <mxz297@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-12 12:53:36 -07:00

Wentao Ye
dab4f9f764
[Chore] Update CODEOWNERS to include @yewentao256 for CUDA kernels, attention backends, quantization, and related tests (#22741)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-08-13 00:50:31 +08:00

TeeKen Lau
c42fe0b63a
Add more test scenario for tensor schema (#22733)
Signed-off-by: teekenl <teekenlau@gmail.com>
2025-08-12 16:34:41 +00:00

Rahul Tuli
5a4b4b3729
Add: SupportsEagle3 interface for explicit EAGLE3 support (#22642)
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
2025-08-12 09:24:52 -07:00

Daniel Serebrenik
e5d3d63c42
[Benchmark] Fix terminal colors in benchmark_serving_multi_turn (python 3.12) (#22730)
Signed-off-by: daniels <daniels@pliops.com>
2025-08-12 14:41:37 +00:00

Nicolò Lucchesi
3d9d40efde
[Bugfix][CI] Fix test_remote_decode_lifecycle.py::test_short_prompt_lifecycle (#22727)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-08-12 07:30:17 -07:00

Po-Han Huang (NVIDIA)
67c153b88a
Fix Llama4 FlashInfer FP4 MoE issues (#22511)
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
2025-08-12 05:50:59 -07:00

wang.yuqi
f7ad6a1eb3
[CI Failure] fix tests/entrypoints/openai/test_skip_tokenizer.py (#22708)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-08-12 05:42:58 -07:00

Harry Mellor
80bb1e8afe
Officially support SmolLM3 using the Transformers backend (#22665)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-12 05:38:48 -07:00

Nicolò Lucchesi
d030b01548
[BugFix][Nixl][PD] Fix heterogenous TP (#22663)
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-08-12 05:37:30 -07:00

Harry Mellor
767e63b860
[Docs] Improve docs navigation (#22720)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-12 04:25:55 -07:00

Yongye Zhu
007dd90859
[gpt-oss] Enable gpt-oss on ampere (#22714)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
2025-08-12 03:21:44 -07:00

Jee Jee Li
b8a9d0e429
[Misc] remove GH discussions link (#22722)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-08-12 03:15:33 -07:00

zejunchen-zejun
50f2aae1b4
[LMCache][Example] Align the PYTHONHASHSEED for prefillers and decoders for KV chunks hashing (#21161)
Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>
2025-08-12 02:05:14 -07:00

RishiAstra
46ae7f6666
[Bugfix] Mamba2 SSD varlen bug fix initstates decay, improve test, assert chunk pwr 2 (#21783)
Signed-off-by: Rishi Astra <40644327+RishiAstra@users.noreply.github.com>
2025-08-12 02:04:37 -07:00

Jun-Howie
1ece7f30ba
Fix: AWQ Marlin get_quant_method does not recognize "modules_to_not_convert" (#21888)
Signed-off-by: JunHowie <JunHowie@aliyun.com>
Co-authored-by: JunHowie <JunHowie@aliyun.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-12 02:03:53 -07:00

phantomlei
bc8372efc3
[Bugfix] Fix erroneous randomly generated cases in bad word testing (#22170)
Signed-off-by: phantomlei <phantomlei3@gmail.com>
2025-08-12 02:03:22 -07:00

Sugar-zsg
8d17fa633e
[V0] Correct CUDA Graph capture for encoder-decoder models (#22630)
2025-08-12 02:01:08 -07:00