Cyrus Leung
|
25e48a3aae
|
[Doc] Update usage of --limit-mm-per-prompt (#34148)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-09 21:12:13 -08:00 |
|
Roger Wang
|
8a5e0e2b2b
|
[Bugfix][Core] Fix CPU memory leak from Request reference cycle in prefix caching (#34183)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2026-02-10 13:03:32 +08:00 |
|
Andreas Karatzas
|
4cde2e0159
|
[ROCm][Bugfix] Resolve Dynamo tracing crash from amdsmi calls in on_gfx* arch detection (#34108)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-02-09 20:50:20 -08:00 |
|
Roger Wang
|
047a457fa4
|
[Bugfix] Adopt ChunkGatedDeltaRule for Qwen3.5 (#34198)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2026-02-10 03:47:54 +00:00 |
|
Yuwei An
|
e94ec59733
|
[LMCache] Token Base IPC API (#34175)
Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com>
|
2026-02-10 01:18:42 +00:00 |
|
Ning Xie
|
13397841ab
|
[structured output] validate unsupported json features first (#33233)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
|
2026-02-09 23:49:09 +00:00 |
|
Gregory Shtrasberg
|
c60f8e3b49
|
[Bugfix][ROCm][GPT-OSS] Use old triton_kernels implementation on ROCm if the new API is not available (#34153)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2026-02-09 17:38:54 -06:00 |
|
Michael Goin
|
5e75a14a66
|
[Doc] Add DCP support to attention backend doc (#33936)
|
2026-02-09 18:33:43 -05:00 |
|
Nick Hill
|
e7e52781ff
|
[ModelRunner V2][BugFix] Fix max_query_len calculation (#34167)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-02-09 21:47:17 +00:00 |
|
Charlie Fu
|
bb9f97308d
|
[torch.compile][Fusion] Fix attention fusion pass removing kv_update op. (#33945)
Signed-off-by: charlifu <charlifu@amd.com>
|
2026-02-09 16:15:43 -05:00 |
|
Hongxia Yang
|
4d39650961
|
[ROCm] update triton branch to support gpt-oss models for gfx11xx devices (#34032)
Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>
|
2026-02-09 19:36:30 +00:00 |
|
Artus Krohn-Grimberghe
|
8fd31f6245
|
[Bugfix] Voxtral prompt/audio placeholder alignment (#34140)
Signed-off-by: Artus KG <artuskg@gmail.com>
|
2026-02-09 19:30:38 +00:00 |
|
Artus Krohn-Grimberghe
|
eadb4e868b
|
[Bugfix] Avoid duplicate k-proj weight emission in helper (#34142)
Signed-off-by: Artus KG <artuskg@gmail.com>
|
2026-02-09 19:17:44 +00:00 |
|
Jiangyun Zhu
|
285bab4752
|
[Kernel] use flashinfer for gdn prefill (#32846)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2026-02-09 12:17:25 -05:00 |
|
TomerBN-Nvidia
|
995bbf38f1
|
[Bugfix] Fix shared expert input for latent MoE in EP+DP (Nemotron-H) (#34087)
Signed-off-by: Tomer Natan <tbarnatan@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
|
2026-02-09 16:44:18 +00:00 |
|
Mohammad Miadh Angkad
|
d4f123cc48
|
[Kernel] FlashInfer: switch allreduce fusion to unified API (#33985)
Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com>
|
2026-02-09 15:43:24 +00:00 |
|
ZhengHongming888
|
cb62e86f83
|
Add NUMA Core binding in nixl_connector for CPU xPyD (#32365)
Signed-off-by: Hongming Zheng <hongming.zheng@intel.com>
Signed-off-by: ZhengHongming888 <hongming.zheng@intel.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-02-09 15:39:12 +00:00 |
|
Luka Govedič
|
781ddf7868
|
[CI][torch.compile] Fix incorrect filtering for E2E fusion tests on B200 (#34031)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
|
2026-02-09 10:05:14 -05:00 |
|
Roger Wang
|
64a9c2528b
|
[UX] Add --language-model-only for hybrid models (#34120)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2026-02-09 14:57:33 +00:00 |
|
Lucas Wilkinson
|
d0d97e2974
|
[Misc] Fix up attention benchmarks (#33810)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-02-09 09:42:03 -05:00 |
|
JJJYmmm
|
9562912cea
|
[MODEL] Adding Support for Qwen3.5 Models (#34110)
Signed-off-by: JJJYmmm <1650675829@qq.com>
Signed-off-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: wulipc <wulipc@users.noreply.github.com>
Co-authored-by: ywang96 <ywang96@users.noreply.github.com>
Co-authored-by: Isotr0py <Isotr0py@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2026-02-09 21:12:58 +08:00 |
|
zofia
|
9bdb06b436
|
[XPU][6/N] add xpu scaled_mm kernel (#34117)
Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>
|
2026-02-09 20:17:35 +08:00 |
|
Nikhil Gupta
|
caad9f1e01
|
[Fix] [CPU Backend] : Prepack weights for w8a8 oneDNN matmul (#33901)
Signed-off-by: nikhil-arm <nikhil.gupta2@arm.com>
|
2026-02-09 18:04:41 +08:00 |
|
Ekagra Ranjan
|
1d5922fade
|
[ASR] Fix audio benchmark and add RTFx metric (#32300)
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
|
2026-02-09 10:02:37 +00:00 |
|
Andreas Karatzas
|
3025b3cebb
|
[CI] Remove empty image_size_factors for fuyu, glm4_1v, glm_ocr (#34107)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-02-09 17:37:04 +08:00 |
|
Jee Jee Li
|
978a37c823
|
[Model] GLM adaptation (#34124)
|
2026-02-09 17:32:52 +08:00 |
|
ihb2032
|
5a5c43511a
|
fix(cpu): fix mla_decode compilation on x86 without AVX512 (#34052)
Signed-off-by: ihb2032 <hebome@foxmail.com>
Co-authored-by: root <root@LAPTOP-FKNHV411.localdomain>
|
2026-02-09 08:55:41 +00:00 |
|
Nick Hill
|
d9bede0314
|
[BugFix] Fix fastsafetensors TP all procs using all GPUs (#34070)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-02-09 15:15:46 +08:00 |
|
wang.yuqi
|
22b64948f6
|
[Frontend][last/5] Make pooling entrypoints' request schema consistent. (#31127)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
v0.16.0rc1
|
2026-02-09 06:42:38 +00:00 |
|
Reagan Lee
|
7c233dbb36
|
[Tiny] Rename encoder budget file to more specific name (#34103)
Signed-off-by: Reagan Lee <reaganjlee@gmail.com>
Co-authored-by: Reagan Lee <reaganjlee@gmail.com>
|
2026-02-09 03:48:19 +00:00 |
|
kourosh hakhamaneshi
|
a75a5b54c7
|
[bug-fix] supported_tasks is breaking backward compatibility at init_app_state (#34027)
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: kourosh hakhamaneshi <31483498+kouroshHakha@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2026-02-09 09:46:46 +08:00 |
|
Andrey Talman
|
f97ca67176
|
[Release 2.10] Update to Torch 2.10 - final release (#30525)
|
2026-02-08 13:51:09 -08:00 |
|
danisereb
|
084aa19f02
|
Add support for ModelOpt MXFP8 dense models (#33786)
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
|
2026-02-08 11:16:48 -08:00 |
|
navmarri14
|
1ecfabe525
|
glm 4.6 fused tuned inference config for B200 (#32958)
|
2026-02-08 18:55:47 +00:00 |
|
Richard Zou
|
4df841fe75
|
[torch.compile] Add an option to force-enable the MOE cold start optimization (#33735)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2026-02-08 18:42:56 +00:00 |
|
TomerBN-Nvidia
|
a263aa6140
|
[BugFix] Change support no act and mul for marlin (#34088)
Signed-off-by: Tomer Natan <tbarnatan@computelab-frontend-8.nvidia.com>
Co-authored-by: Tomer Natan <tbarnatan@computelab-frontend-8.nvidia.com>
|
2026-02-08 17:18:22 +00:00 |
|
aabbccddwasd
|
179ae7da8f
|
[Revert] Fix performance regression for GLM-4.7-GPTQ decode and MTP acceptance rate (#33771)
Signed-off-by: aabbccddwasd <aabbccddwasd@qq.com>
|
2026-02-08 08:13:24 -08:00 |
|
Reagan Lee
|
c4df59ad43
|
Add embedding input functionality for disabled modalities [remake] (#32493)
Signed-off-by: Reagan Lee <reaganjlee@gmail.com>
Signed-off-by: Reagan Lee <96998476+reaganjlee@users.noreply.github.com>
Co-authored-by: Reagan Lee <reaganjlee@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-02-08 04:57:16 -08:00 |
|
TJian
|
785cf28fff
|
[ROCm] [CI] Reduce Resource of two test groups (#34059)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2026-02-08 15:17:26 +08:00 |
|
Nick Hill
|
a96197f564
|
[Perf] Simplify DeepseekV32 tokenizer, ensure fast detokenization used (#33855)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-02-08 07:16:34 +00:00 |
|
Andreas Karatzas
|
ab10d79855
|
[ROCm][Bugfix] fix act_quant_fusion module import error (#34069)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-02-07 19:21:12 -08:00 |
|
Cyrus Leung
|
7fcb705b80
|
[CI/Build] Skip GCS test (#34057)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-07 08:52:38 -08:00 |
|
Cyrus Leung
|
b956cdf818
|
[Doc] Fix run_batch docs (#34056)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-07 06:18:16 -08:00 |
|
Hashem Hashemi
|
ed17f54c8b
|
Perf tuning and expansion of cases covered for wvSplitKrc (#33493)
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>
|
2026-02-07 05:33:11 -08:00 |
|
Jiang Wu
|
860981d8d8
|
Make directory exist ok for ray spinning up multiple replicas on a single instance (#33604)
Signed-off-by: Jiang Wu <jwu@cclgroup.com>
|
2026-02-07 05:30:49 -08:00 |
|
zifeitong
|
52181baaea
|
Update DeepGEMM version pin in Dockerfile to match #32479 (#33935)
Signed-off-by: Zifei Tong <zifeitong@gmail.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2026-02-07 05:30:22 -08:00 |
|
Rohan Potdar
|
de3869bb4d
|
move checks out of unified_kv_cache_update custom op (#33943)
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
|
2026-02-07 05:30:09 -08:00 |
|
whx
|
ce9b3cd3e9
|
[PluggableLayer][3/N] Apply PluggableLayer to mamba layers. (#33660)
Signed-off-by: whx-sjtu <2952154980@qq.com>
|
2026-02-07 05:26:05 -08:00 |
|
Jee Jee Li
|
db4ede9743
|
[Model] Enable Step3p5ForCausalLM testing (#33755)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-02-07 05:25:24 -08:00 |
|
Pooya Davoodi
|
2cb2340f7a
|
[Frontend] Add support for transcriptions and translations to run_batch (#33934)
Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2026-02-07 05:24:57 -08:00 |
|