Cyrus Leung
d7e17aaacd
[Refactor] Move profiling methods to MM budget (#33559)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-02 23:27:00 +08:00

Kebe
528e9b1490
[Feature][Core] Support Fabric detection to adapt the MNNVL protocol for the GB series (#33540)
Signed-off-by: Kebe <mail@kebe7jun.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Thomas Vegas <tvegas@nvidia.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2026-02-02 22:55:46 +08:00

shanjiaz
d95b4be47a
move spec decode slow test to test_areas.yaml (#33365)
Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>
2026-02-02 06:28:36 -08:00

Isotr0py
4061dcf4c5
[Bugfix] Enable Kimi k25 processor test (#33562)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-02-02 14:25:25 +00:00

danielafrimi
0aca8b8c62
[MoE] Enable Shared/Routed Overlap For Latent MoE (Nemotron-H) (#32790)
Signed-off-by: dafrimi <dafrimi@nvidia.com>
2026-02-02 09:18:50 -05:00

Rabi Mishra
9eb58f8cf1
fix[ROCm]: Remove unconditional aiter import (#32902)
Signed-off-by: rabi <ramishra@redhat.com>
2026-02-02 22:10:02 +08:00

Cyrus Leung
b10d05b8a8
[Model] Use explicit types in get_generation_prompt (#33551)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-02 12:38:49 +00:00

Borushiki
b398e5c819
Update get_expert_mapping to include self parameter (#33525)
Signed-off-by: Borushiki <38628261+Otsutsukii@users.noreply.github.com>
2026-02-02 20:29:07 +08:00

Grzegorz K. Karch
78061ef584
Fix accessing hidden_act from model config (#32686)
Signed-off-by: Grzegorz Karch <gkarch@nvidia.com>
2026-02-02 11:11:33 +00:00

Nicolò Lucchesi
528b3076af
[CI][Bugfix] Fix flaky tests/v1/kv_connector/unit/test_multi_connector.py::test_multi_example_connector_consistency (#33555)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-02-02 03:01:29 -08:00

Cyrus Leung
a502831d36
[Chore] Remove redundant input parsing methods (#33542)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-02 10:50:47 +00:00

Komal Kumar Teru
ba871fb788
[Misc] support arbitrary MM datasets in spec dec bench (#33486)
Signed-off-by: kkt-cohere <komal@cohere.com>
Signed-off-by: Komal Kumar Teru <162363718+kkt-cohere@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-02-02 08:49:48 +00:00

R3hankhan
ab374786c7
[CPU][IBM Z][Dockerfile] Fix IBM Z builds (#33243)
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>
2026-02-01 23:41:29 -08:00

RED
808dd87b30
[Model] Support DeepSeek-OCR-2 (#33165)
Signed-off-by: liuli <ll407707@alibaba-inc.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: liuli <ll407707@alibaba-inc.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-02-02 06:24:10 +00:00

Andy Lo
beb8899482
Fix mistral sliding window parsing (#33521)
Signed-off-by: Andy Lo <andy@mistral.ai>
2026-02-02 05:08:04 +00:00

Sawyer Bowerman
ce88756b96
[Doc]: update paths for Offline/Online/Others example sections (#33494)
Signed-off-by: Sawyer Bowerman <sbowerma@redhat.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-02 03:56:53 +00:00

Paco Xu
a3154a6092
[Doc] add missing model entries in supported_models.md (#33220)
Signed-off-by: Paco Xu <paco.xu@daocloud.io>
2026-02-02 03:37:25 +00:00

jack
7c036432fc
[Bugfix] GLM-4 tool parser: incremental string streaming (#33218)
Signed-off-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com>
Co-authored-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com>
2026-02-02 11:13:31 +08:00

Robert Shaw
318b120766
[Nightly CI] Remove CT Model (#33530)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-02-01 19:09:09 -08:00

csy0225
c3b40dc3e7
[Models] Step-3.5-Flash (#33523)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: i-zhangmingming <i-zhangmingming@stepfun.com>
Co-authored-by: xiewuxun <xiewuxun@stepfun.com>
Co-authored-by: zetaohong <i-hongzetao@stepfun.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2026-02-02 10:21:18 +08:00

Yifan Qiao
a01ef3fa51
[Fix] prefix cache hit rate == 0 bug with gpt-oss style models (#33524)
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>
2026-02-02 01:59:58 +00:00

Runkai Tao
7320ca3942
Add unpermute-aware fused MoE LoRA path (#32655)
Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu>
2026-02-02 09:46:09 +08:00

Nick Hill
cf0a99f84d
[ModelRunner V2] Support spec decode with structured outputs (#33374)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-02-02 00:19:59 +00:00

Nick Hill
e535d90deb
[ModelRunner V2] Misc minor simplifications and optimizations (#33467)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-02-01 22:17:14 +00:00

Komal Kumar Teru
0b225fb7b2
[Misc] skip target model mm emb in draft proposal step when draft is text-only (#33437)
Signed-off-by: kkt-cohere <komal@cohere.com>
2026-02-01 21:13:35 +00:00

will b.
46b4a02794
Fix DeepSeek V2 RoPE initialization error (#33501)
Signed-off-by: Eduardo Salinas <edus@microsoft.com>
Signed-off-by: catswe <212922539+catswe@users.noreply.github.com>
Co-authored-by: Eduardo Salinas <edus@microsoft.com>
2026-02-01 21:00:56 +00:00

shaharmor98
8869cd8ec1
Add MoE config for Super B200 TP2 (#33510)
2026-02-01 18:48:37 +00:00

JartX
cd86fff38f
[BUGFIX] Fix hipErrorIllegalState in Qwen3-Omni during startup profiling allow inference Omni on ROCM (#33077)
Signed-off-by: JartX <sagformas@epdcenter.es>
2026-02-01 13:36:25 +00:00

Maral
b5f8c3092d
[W8A8 Block Linear Refactor][1/N] Keep all quantization types into QuantFP8 class. (#33047)
Signed-off-by: maral <maralbahari.98@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-02-01 09:28:01 +00:00

Cyrus Leung
21997f45b1
[Redo] #33110 with threading limit (#33502)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: YunzhuLu <lucia.yunzhu@gmail.com>
2026-02-01 09:18:11 +00:00

Luka Govedič
672023877b
Change defaults for vllm bench startup (#33489)
2026-01-31 23:46:01 -08:00

Zack Yu
754a8ca942
fix: only include Authorization header when OPENAI_API_KEY is set (#33488)
Signed-off-by: zack041 <zackyu041@gmail.com>
2026-01-31 23:35:09 -08:00

Eduardo Salinas
302ecf64ff
[Models]: lfm2_siglip2 return intermediate encoder layers (#33370)
Signed-off-by: Eduardo Salinas <edus@microsoft.com>
2026-02-01 06:17:49 +00:00

Cyrus Leung
b6bb2842cf
[Critical] Revert #33110 (#33500)
2026-01-31 21:06:42 -08:00

Cyrus Leung
79b6ec6aab
[Bugfix] Fix inconsistent handling of cache reset (#33481)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-31 20:23:41 -08:00

Greg Pereira
d6416fdde9
pin LMCache to v0.3.9 or greater with vLLM v0.15.0 (#33440)
Signed-off-by: greg pereira <grpereir@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2026-01-31 20:50:38 -07:00

Andreas Karatzas
0fb3157267
[ROCm][CI] Update huggingface-hub pin (#33492)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-01 02:51:54 +00:00

Cyrus Leung
a358e4dffe
[Refactor] Make Renderer an abstract class (#33479)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-01 10:36:30 +08:00

René Honig
079781177a
fix: Add SM120 (RTX Blackwell) support for FlashInfer CUTLASS NVFP4 MoE kernels (#33417)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2026-01-31 14:06:42 -08:00

Roy Wang
63c0889416
[Misc] Fix flashinfer related tests (#33462)
Signed-off-by: esmeetu <jasonailu87@gmail.com>
2026-01-31 16:10:24 -05:00

smashyalts
1e86c802d4
Fix grammar (#33121)
Signed-off-by: smashyalts <smashyalts@gmail.com>
2026-01-31 09:59:34 -08:00

linhaifeng
fedf64332e
[Bugfix]: Fix display errors in TORCH_CHECK messages (#32942)
Signed-off-by: linhaifeng <1371675203@qq.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2026-01-31 09:48:48 -08:00

Xiao Yang
2238a12c13
[Misc] support collect_env for endpoint /server_info (#33246)
Signed-off-by: yang.xiao <yang.xiao@daocloud.io>
2026-02-01 01:42:59 +08:00

Harry Mellor
ce0afe2451
Update huggingface-hub pin for the last time before Transformers v5 (#33473)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-01-31 09:14:24 -08:00

Cyrus Leung
88c3e114d8
[Refactor] Move MM data parsing outside processor (#33408)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-31 16:46:14 +00:00

Cyrus Leung
92924b2ddd
[Deprecation] Remove deprecated items related to pooling (#33477)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-31 08:44:40 -08:00

YunzhuLu
27cb2f678f
[Bugfix] Early-reject requests with MM data longer than encode cache capacity (#33110)
Signed-off-by: YunzhuLu <lucia.yunzhu@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2026-01-31 08:41:13 -08:00

jma99_2333
22d9a056d5
Support clear mm and encoder cache (#33452)
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
2026-01-31 15:22:25 +00:00

ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟
13b842f271
[BugFix][Router Replay] Capture Logical Experts with EPLB (#33013)
Signed-off-by: Hollow Man <hollowman@opensuse.org>
2026-01-31 10:12:17 -05:00

Luka Govedič
15f40b20aa
[fix][torch.compile] Fix cold-start compilation time increase by adding kv cache update to splitting ops (#33441)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Richard Zou <zou3519@gmail.com>
2026-01-31 06:48:34 -08:00