Luciano Martins
|
8adcf8c40a
|
feat(models): implement Google Gemma 4 architecture support (MoE, Multimodal, Reasoning, Tool-Use) (#38826)
Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Signed-off-by: Luciano Martins <lucianomartins@google.com>
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
(cherry picked from commit 08ed2b9688)
|
2026-04-02 11:49:53 -07:00 |
|
Chauncey
|
3a30a1a6a8
|
[Misc] Rename think_start_str/think_end_str to reasoning_start_str/reasoning_end_str (#38242)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
(cherry picked from commit cbe7d18096)
|
2026-04-01 12:10:53 -07:00 |
|
Juan Pérez de Algaba
|
29982d48b3
|
(security) Enforce frame limit in VideoMediaIO (#38636)
Signed-off-by: jperezde <jperezde@redhat.com>
(cherry picked from commit 58ee614221)
|
2026-04-01 12:10:40 -07:00 |
|
Yifan Qiao
|
1dbbafd3f3
|
[Feat][v1] Simple yet General CPU KV Cache Offloading (#37160)
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>
Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
(cherry picked from commit 91e4521f9f)
|
2026-04-01 01:03:14 -07:00 |
|
Lucas Wilkinson
|
0ee3b7fc3d
|
[Bugfix][MLA] Add logits size budget to sparse indexer prefill chunking (#36178)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
(cherry picked from commit eb47454987)
|
2026-04-01 01:02:58 -07:00 |
|
Matthew Bonanni
|
268bed9cf3
|
[Bugfix][Async] Fix async spec decoding with hybrid models (#38556)
Signed-off-by: SandishKumarHN <sandishkumarhn@gmail.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: SandishKumarHN <sandishkumarhn@gmail.com>
(cherry picked from commit 757068dc65)
|
2026-04-01 01:02:35 -07:00 |
|
Jiangyun Zhu
|
bcc0fdd0f3
|
[CI] fix LM Eval Qwen3.5 Models (B200) (#38632)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
(cherry picked from commit ea7bfde6e4)
|
2026-04-01 01:02:20 -07:00 |
|
wang.yuqi
|
69b8bd4b33
|
[CI Failure] pin colmodernvbert revision (#38612)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
(cherry picked from commit 719735d6c5)
|
2026-04-01 01:02:04 -07:00 |
|
haosdent
|
b92312dfd7
|
[CI] Fix SPLADE pooler test broken by #38139 (#38495)
Signed-off-by: haosdent <haosdent@gmail.com>
(cherry picked from commit a08b7733fd)
|
2026-03-30 21:52:13 -07:00 |
|
Andreas Karatzas
|
bea23536f6
|
[CI] Add temperature=0.0, reduce max_tokens, and add debug prints to audio_in_video tests (#38492)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-30 05:36:45 +00:00 |
|
Stanislav Kirillov
|
a6db99ba02
|
[Bugfix] Support multi-type params parsing for DeepSeek v3.2 (#33703)
Signed-off-by: Stanislav Kirillov <stas@nebius.com>
Co-authored-by: Stanislav Kirillov <stas@nebius.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2026-03-30 04:07:28 +00:00 |
|
Andreas Karatzas
|
4f2ed5fddb
|
[ROCm][CI] Enable hybrid chunked prefill test (#38317)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-30 10:30:26 +08:00 |
|
Kyle Sayers
|
d28d86e8a3
|
[QeRL] Fix online quantized reloading (#38442)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2026-03-29 14:56:41 -06:00 |
|
Wentao Ye
|
995dea1354
|
[Perf] Remove redundant device copies for CPU-only pooling token IDs, 48.9% E2E throughput improvement (#38139)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-29 18:12:50 +00:00 |
|
Andreas Karatzas
|
43cc5138e5
|
[ROCm][CI] Fix cross-attention dispatch for encoder-decoder models (#38450)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-28 22:08:03 -07:00 |
|
haosdent
|
d39b8daf5f
|
[Feature] Add Qwen3-ForcedAligner support via token classification pooling (#35367)
Signed-off-by: haosdent <haosdent@gmail.com>
|
2026-03-29 00:27:52 +00:00 |
|
Walter Beller-Morales
|
fafca38adc
|
[BugFix][Frontend] apply task instruction as system prompt in cohere v2/embed (#38362)
Signed-off-by: walterbm <walter.beller.morales@gmail.com>
|
2026-03-28 18:30:54 +00:00 |
|
haosdent
|
b2bc736b12
|
[CI] Fix Ernie4.5-VL initialization test (#38429)
Signed-off-by: haosdent <haosdent@gmail.com>
|
2026-03-28 22:43:24 +08:00 |
|
Bvicii
|
bda3eda82d
|
[Bugfix] Disallow renderer_num_workers > 1 with mm processor cache (#38418)
Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>
|
2026-03-28 06:32:52 -07:00 |
|
yzong-rh
|
6dad4c5722
|
[Test] Fix flaky race condition in test_abort_final_step (#38414)
Signed-off-by: Yifan <yzong@redhat.com>
|
2026-03-28 09:06:56 +00:00 |
|
Nicolò Lucchesi
|
44a6528028
|
[CI] Skip failing test (#38369)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-03-27 13:25:19 -07:00 |
|
Kyle Sayers
|
648edcf729
|
[QeRL] Compose online quantization with quantized reloading (#38032)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2026-03-27 13:22:33 -07:00 |
|
Jonas M. Kübler
|
98e7f223b9
|
enable skipping of SW attention layers when using FP8 KV cache (#33695)
Signed-off-by: Jonas Kuebler <kuebj@amazon.com>
|
2026-03-27 07:25:02 -06:00 |
|
Juan Pérez de Algaba
|
b111f8a61f
|
fix(security): Add VLLM_MAX_N_SEQUENCES environment variable and enforce limit (#37952)
Signed-off-by: jperezde <jperezde@redhat.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
|
2026-03-27 09:02:10 -04:00 |
|
Sage Moore
|
497e234d38
|
[EPLB] Cleanup the transfer logic for the various eplb maps (#34520)
Signed-off-by: Sage Moore <sagmoore@redhat.com>
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2026-03-27 10:18:46 +01:00 |
|
dtc
|
6287e7fa20
|
[P/D] Mooncake: Add unit tests and minor fixes for mooncake connector (#36946)
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
|
2026-03-27 09:26:40 +01:00 |
|
Flora Feng
|
aee4c14689
|
[Bugfix] Fix Hermes tool parser when stream interval > 1 (#38168)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-03-27 14:42:26 +08:00 |
|
Li, Jiang
|
becaed6ec8
|
[CPU] Support CT W4A16 on CPU MP kernel (#38219)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2026-03-27 14:15:28 +08:00 |
|
Or Ozeri
|
7cc302dd87
|
[kv_offload+HMA][7/N]: Support register_kv_caches for hybrid models (#37853)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2026-03-27 08:38:33 +03:00 |
|
Bvicii
|
999dfc1622
|
[Bugfix] Offload blocking tokenizer ops to shared thread pool to unblock event loop (#34789)
Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-03-26 22:17:00 -07:00 |
|
Giancarlo Delfin
|
c32e97602d
|
[Model Runner V2] Enable forcing a specific acceptance rate during rejection sampling (#38045)
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
|
2026-03-26 13:38:12 -07:00 |
|
Andreas Karatzas
|
9c3ae04bfe
|
[ROCm][CI] Add LM Eval Qwen3.5 Models test for MI355 (#38155)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-26 16:51:18 +00:00 |
|
Divakar Verma
|
b9dbc5c4ab
|
[Mamba][APC] Add test case to compare apc outputs (#34977)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
|
2026-03-26 16:40:35 +00:00 |
|
Andreas Karatzas
|
bdc1719eb9
|
[ROCm][CI] Fix AITER state leak in shared_fused_moe_routed_transform test (#38137)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-26 09:26:46 -07:00 |
|
Zhewen Li
|
be1a85b7a2
|
Revert "[MoE Kernel] Flashinfer nvfp4 cutedsl moe kernel integration" (#38050) (#38169)
Co-authored-by: Zhewen Li <zhewenli@inferact.ai>
|
2026-03-26 07:59:09 -07:00 |
|
Cyrus Leung
|
2e225f7bd2
|
[Renderer] Consolidate factory methods (#38218)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-26 12:19:22 +00:00 |
|
wang.yuqi
|
dcdc145893
|
[CI] Reorganize scoring tests (#38207)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-03-26 12:07:01 +00:00 |
|
Andreas Karatzas
|
f2d16207c7
|
[ROCm][CI] Fix flaky GPTQ compile correctness test (#38161)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-26 19:57:00 +08:00 |
|
Andreas Karatzas
|
37a83007fe
|
[ROCm][CI] Fix wvSplitKrc mock argument order in test_rocm_unquantized_gemm (#38167)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-26 19:54:59 +08:00 |
|
Wentao Ye
|
bf5eec638d
|
[Refactor] Remove unused utils (#38153)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-26 17:08:19 +08:00 |
|
Vadim Gimpelson
|
52069012fe
|
[Bugfix] Fix DeepGemm E8M0 accuracy degradation for Qwen3.5 FP8 on Blackwell (#38083)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2026-03-26 01:21:47 -07:00 |
|
Matej Rojec
|
2908094567
|
Add /v1/chat/completions/batch endpoint for batched chat completions (#38011)
Signed-off-by: Matej Rojec <64556640+MatejRojec@users.noreply.github.com>
|
2026-03-26 12:13:33 +08:00 |
|
Woosuk Kwon
|
144030c84e
|
Relocate Encoder CUDA graph manager (#38116)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
|
2026-03-25 20:52:12 -07:00 |
|
Harry Mellor
|
3c3c084240
|
Various Transformers v5 fixes (#38127)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-26 00:10:08 +00:00 |
|
Ekagra Ranjan
|
7b54f60db0
|
[Cohere] Enable Cohere-Transcribe (#38120)
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
|
2026-03-25 16:13:51 -07:00 |
|
Guillaume Guy
|
70a2152830
|
[MultiModal] add support for numpy array embeddings (#38119)
Signed-off-by: guillaume_guy <guillaume.guy@airbnb.com>
Signed-off-by: Guillaume Guy <guillaume.c.guy@gmail.com>
Co-authored-by: guillaume_guy <guillaume.guy@airbnb.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2026-03-25 20:13:04 +00:00 |
|
Andreas Karatzas
|
7d6917bef5
|
[ROCm] Fix MoE kernel test failures on gfx950 (#37833)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
|
2026-03-25 13:46:40 -05:00 |
|
Nick Hill
|
72cad44d3c
|
[Frontend] Move APIServerProcessManager target server fn (#38115)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-03-25 18:14:41 +00:00 |
|
Cyrus Leung
|
ba2f0acc2d
|
[Misc] Reorganize inputs (#35182)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-25 10:22:54 -07:00 |
|
Yongye Zhu
|
678b3c99e8
|
[MoE Kernel] Flashinfer nvfp4 cutedsl moe kernel integration (#38050)
|
2026-03-25 10:16:40 -07:00 |
|