Julien Denize
|
60ca7981bc
|
Add explicit validation error for tool calls. (#34438)
Signed-off-by: juliendenize <julien.denize@mistral.ai>
|
2026-02-13 20:04:01 -08:00 |
|
Christian S. Perone
|
0ef5b9147b
|
fix: use __annotations__ instead of get_type_hints() for dynamic kwargs detection (#34527)
Signed-off-by: Christian S. Perone <christian.perone@gmail.com>
Signed-off-by: Christian S. Perone <perone@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2026-02-13 20:03:37 -08:00 |
|
Shiyan Deng
|
ed242652d7
|
[bug] Make sure get_modality_with_max_tokens is deterministic (#34533)
Signed-off-by: Shiyan Deng <dsy842974287@meta.com>
|
2026-02-13 20:02:59 -08:00 |
|
Wei Zhao
|
b37b679770
|
[Feature][Perf] Support Selective CPU Weight Offloading (#34535)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
|
2026-02-13 20:02:24 -08:00 |
|
Andreas Karatzas
|
a0638d052d
|
[Bugfix] Fix ROCm UVA CPU weight offloading broken by #32993 (#34543)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-02-13 20:01:42 -08:00 |
|
Harry Huang
|
c027541eaf
|
[Hybrid] Enable spec decoding in mamba cache align mode (#33705)
Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>
|
2026-02-13 13:02:28 -08:00 |
|
Ben Browning
|
fd267bc7b7
|
[Bugfix]: Fix structured output in multi-turn gpt-oss (#34454)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-02-13 11:12:48 -08:00 |
|
Michael Goin
|
bfaa559305
|
Revert "[Bugfix] Fix fused MoE IMA (sans chunking) by using int64 for strides" (#34530)
|
2026-02-13 10:35:29 -08:00 |
|
Richard Zou
|
87789c8364
|
[Misc] vLLM's --enforce-eager should turn off compile and cudagraphs only (#34523)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2026-02-13 09:52:20 -08:00 |
|
Pushpinder Singh
|
bcd65c1f6a
|
[Bugfix] Replace c10::optional with std::optional in topk kernel (#34467)
Signed-off-by: Pushpinder Singh <pushpindersingh135@gmail.com>
|
2026-02-13 08:30:23 -08:00 |
|
Wei Zhao
|
59d53066d8
|
[Feature] Support CPU Offloading without Pytorch Pinned Memory that leads to doubled allocation (#32993)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2026-02-13 08:11:26 -08:00 |
|
LoganJane
|
4a9952ec1b
|
[Bugfix] Add quant_config in ViT of Kimi-K2.5 (#34501)
Signed-off-by: LoganJane <LoganJane73@hotmail.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-02-13 16:05:34 +00:00 |
|
Roger Wang
|
1dae7b7843
|
[Bugfix] Exclude language_model_only key from MM AOT compile hash but include in model one (#34508)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2026-02-13 13:59:00 +00:00 |
|
Roger Wang
|
5885e330ef
|
[Misc] Port Qwen3.5 Configs (#34512)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2026-02-13 05:24:25 -08:00 |
|
Ilya Boytsov
|
071d863e20
|
Extend ColBERT support to non-standard BERT backbones (#34170)
Signed-off-by: Ilya Boytsov <ilya.boytsov@aleph-alpha.com>
|
2026-02-13 09:53:09 +00:00 |
|
Woosuk Kwon
|
0916e7960b
|
[GDN] Use CPU tensors to build GDN metadata (#34498)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-02-13 01:24:45 -08:00 |
|
Wentao Ye
|
3d2a026fd0
|
[Feature] Pipeline Parallel Async send/recv, 2.9% E2E throughput improvement (#33368)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2026-02-13 16:38:16 +08:00 |
|
Aaron Hao
|
dddbff4624
|
[Core] Move pause and resume functions into engine (#34125)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: Aaron Hao <ahao@anyscale.com>
Signed-off-by: hao-aaron <ahao@anyscale.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
|
2026-02-13 00:15:10 -08:00 |
|
Martin Hickey
|
47e9b63e1a
|
[KVConnector] Clean up redundant code in KV connectors (#34147)
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
|
2026-02-13 00:14:30 -08:00 |
|
Matthias Gehre
|
934acddef9
|
[Perf] fused_moe: add int4_w4a16 benchmark support and tuning config (#34130)
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
|
2026-02-13 00:14:27 -08:00 |
|
Marek Michalowski
|
742d214d6e
|
[Bugfix] fix the import path in moe test utils.py (#34245)
Signed-off-by: Marek Michalowski <marek.michalowski@arm.com>
|
2026-02-13 00:13:45 -08:00 |
|
haosdent
|
4137c5dfa7
|
[Bug Fix] Fix MambaManager.cache_blocks() crash on null blocks in align mode (#34418)
Signed-off-by: haosdent <haosdent@gmail.com>
|
2026-02-13 00:13:22 -08:00 |
|
Harry Huang
|
7a8a46ddcb
|
[BugFix] Fix and optimize max_num_blocks_per_req calculation for MambaSpec (#34440)
Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>
|
2026-02-13 00:13:14 -08:00 |
|
myselvess
|
bcf0731aa0
|
[New Model] support new model ovis2.6 (#34426)
Signed-off-by: myselvess <23743269+myselvess@users.noreply.github.com>
|
2026-02-13 00:12:45 -08:00 |
|
Cyrus Leung
|
ec090c2429
|
[Refactor] Call renderer for online IO processor request (#34490)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-02-12 22:48:45 -08:00 |
|
Roger Wang
|
eea3024f43
|
[Bugfix] Fix mamba state dtype setting for Qwen3-Next and Qwen3.5 (#34489)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2026-02-12 22:48:42 -08:00 |
|
Cyrus Leung
|
2f308214c0
|
[Refactor] Pass full VllmConfig to Renderer (#34485)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-12 22:48:38 -08:00 |
|
Cyrus Leung
|
1b4e8e53f8
|
[CI/Build] Fix CUDA re-initialization error in distributed model tests (#34491)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-13 06:43:53 +00:00 |
|
haosdent
|
dcf6ee8592
|
[Bugfix] Fix encoder cache underestimation for GLM-4V/GLM-OCR single image (#34483)
Signed-off-by: haosdent <haosdent@gmail.com>
|
2026-02-12 21:04:06 -08:00 |
|
Cyrus Leung
|
372b2e762a
|
[Bugfix] Standardize getting number of image patches/tokens (#34358)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-12 20:47:01 -08:00 |
|
Andreas Karatzas
|
6afa587d31
|
[ROCm][CI] Fix serving tokens test failures (#34047)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-02-13 11:27:53 +08:00 |
|
Cyrus Leung
|
94ed6cf6ea
|
Add new sections to CODEOWNERS (#34309)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-12 18:39:28 -08:00 |
|
Harry Huang
|
bf37812ca7
|
[Hybrid] Fix and optimize block-aligned splitting in mamba cache align mode (#33706)
Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>
|
2026-02-12 18:21:52 -08:00 |
|
Frank Wang
|
b86bf4417e
|
[Bugfix] Fix Random Dataset Prefix Length Inaccuracy (#33907)
Signed-off-by: frankwang28 <frank.wbb@hotmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2026-02-12 18:21:19 -08:00 |
|
Yanan Cao
|
de13dd781f
|
[Kernel] [Helion] [5/N] Add Helion Autotuning infrastructure (#34025)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
|
2026-02-12 18:21:05 -08:00 |
|
LoganJane
|
62788f99a4
|
[Bugfix] Delete unused redundant code in Kimi-K2.5 (#34427)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-02-12 18:18:42 -08:00 |
|
Cyrus Leung
|
ea5ff3a1f6
|
[Refactor] Simplify BOS/EOS token handling (#34435)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-12 18:18:24 -08:00 |
|
bnellnm
|
04ea31baab
|
[Bugfix] Remove assert that's no longer valid (#34443)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2026-02-12 18:18:15 -08:00 |
|
Harry Huang
|
6f019e6e0a
|
[BugFix] Add block_size validation for mamba cache align mode (#34445)
Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>
|
2026-02-12 18:18:07 -08:00 |
|
Zhuohan Li
|
d707678dfb
|
Fix num_logprobs parameter description in sampler.py (#34451)
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>
|
2026-02-12 18:18:03 -08:00 |
|
Cyrus Leung
|
fc22cae4ac
|
[CI/Build] Update video URLs for testing (#34446)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-12 18:15:36 -08:00 |
|
Yanan Cao
|
96161fe978
|
[Kernel] [Helion] [4/N] Add silu_mul_fp8 Helion kernel (#33373)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
|
2026-02-12 18:13:12 -08:00 |
|
Jaewon
|
4453ba8d9e
|
[Core] Profiler improvements and lazy initialization (#33198)
Signed-off-by: Jaewon Lee <jaewon@meta.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
|
2026-02-12 16:16:38 -08:00 |
|
Jaewon
|
aa181c923b
|
[Core] Add sleep level 0 mode with enqueue/wait pattern (#33195)
Signed-off-by: Jaewon Lee <jaewon@meta.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
|
2026-02-12 16:16:25 -08:00 |
|
Alec S
|
be7370daf3
|
[Frontend] Enable generic structured_outputs for responses API (#33709)
Signed-off-by: Alec Solder <alecs@fb.com>
Co-authored-by: Alec Solder <alecs@fb.com>
|
2026-02-12 16:15:48 -08:00 |
|
Mengtao (Martin) Yuan
|
9ea1f598ce
|
Use paged_attention_v1 for sliding window decode in rocm_aiter_fa (#34378)
Signed-off-by: Martin Yuan <myuan@meta.com>
Co-authored-by: Martin Yuan <myuan@meta.com>
|
2026-02-12 16:14:43 -08:00 |
|
amitz-nv
|
f120bd42d3
|
[Kernel] Support Flashinfer trtllm fused MoE non gated FP8 & NVFP4 (#33506)
Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>
|
2026-02-12 13:06:58 -08:00 |
|
Hashem Hashemi
|
fac4e96940
|
small adjustment to wvSplitKrc (#34410)
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>
|
2026-02-12 20:26:36 +00:00 |
|
Michael Goin
|
6d4e27ce29
|
[Bugfix] Enforce DeepGEMM when using sparse_attn_indexer on CUDA (#34374)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-02-12 12:08:06 -08:00 |
|
Andreas Karatzas
|
4c078fa546
|
[ROCm][CI] Pin TorchCodec to v0.10.0 for ROCm compatibility (#34447)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-02-12 18:47:34 +00:00 |
|