Huy Do
f5fdec8ce2
Upgrade transformers-4.57.5 (#32287)
Signed-off-by: Huy Do <huydhn@gmail.com>
2026-01-22 05:19:19 +00:00
Lucas Wilkinson
889722f3bf
[FlashMLA] Update FlashMLA to expose new arguments (#32810)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2026-01-21 22:02:39 -07:00
Divakar Verma
49d9653852
[ROCm][CI] fix get_valid_backends (#32787)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2026-01-22 04:27:47 +00:00
knlnguyen1802
378385b90c
[EC Connector] Optimize remote cache check in scheduler (#32585)
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
2026-01-22 03:30:59 +00:00
Wentao Ye
6437ff1fb9
[Deprecation] Remove deprecated environment variables (#32812)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-01-22 02:25:16 +00:00
Xin Yang
63227accf5
[Kernel] Add topk_sigmoid kernel (#31246)
Signed-off-by: Xin Yang <xyangx@amazon.com>
2026-01-21 22:49:51 +00:00
elvischenv
808d6fd7b9
Bump Flashinfer to v0.6.1 (#30993)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
2026-01-21 08:49:50 -08:00
whx
1861ae8aae
[PluggableLayer][1/N] Define PluggableLayer (Fix ci) (#32744)
Signed-off-by: whx-sjtu <2952154980@qq.com>
2026-01-21 11:38:04 -05:00
Robert Shaw
4e31b7f228
[Quantization][Deprecation] Remove RTN (#32697)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-01-21 16:34:42 +00:00
Robert Shaw
85f55c943c
[Quantization][Deprecation] Deprecate HQQ (#32681)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-01-21 09:32:40 -05:00
Robert Shaw
42135d6898
[MoE Refactor] Oracle Select FP8+NVFP4 Kernels In Priority (#32414)
2026-01-21 08:22:33 -05:00
Kim Hee Su
7727ce35c2
[Model] Add Eagle2.5-8B Vision-Language Model support (#32456)
Signed-off-by: kimheesu <wlskaka4@gmail.com>
2026-01-21 09:39:53 +00:00
Lucas Wilkinson
b4f64e5b02
Update FlashMLA (#32491)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2026-01-21 13:03:37 +08:00
Nick Hill
6f067b1fb7
[Cleanup] Remove unused KVConnectorModelRunnerMixin methods (#32077)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-01-21 11:16:37 +08:00
Alex Brooks
27b81e010d
[Bugfix] Fix Granite Vision / Don't use Siglip Pooling Head Nested Models by Default (#32299)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2026-01-21 11:11:52 +08:00
Or Ozeri
7013e9ac8f
OffloadingConnector: Prevent redundant loads (#29087)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2026-01-21 01:15:42 +00:00
Robert Shaw
c78ee240b3
Revert "[PluggableLayer][1/N] Define PluggableLayer" (#32725)
2026-01-21 00:21:06 +00:00
Vasiliy Kuznetsov
d2389c1262
fp8 online quant: split out Fp8OnlineLinearMethod (#32189)
2026-01-20 18:13:22 -05:00
Lucas Wilkinson
2261340806
[Misc] Remove pad_for_cudagraphs from config (#30143)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-20 15:05:48 -05:00
dolpm
7c5dedc247
[AOT compilation] support torch.compile inductor artifacts in VllmCompiledFunction (#25205)
Signed-off-by: dolpm <34420038+dolpm@users.noreply.github.com>
2026-01-20 19:45:59 +00:00
Rahul Tuli
f0feb1cf81
Test: added acceptance length tests (#32030)
Signed-off-by: rahul-tuli <rtuli@redhat.com>
2026-01-20 18:55:15 +00:00
whx
4ca62a0dbd
[PluggableLayer][1/N] Define PluggableLayer (#32331)
Signed-off-by: whx-sjtu <2952154980@qq.com>
2026-01-20 16:19:21 +00:00
linhaifeng
7901109ea5
[Bugfix] Fix Off-by-one error in _num_tokens_to_min_blocks calculation (#32603)
Signed-off-by: linhaifeng <1371675203@qq.com>
2026-01-20 11:13:39 -05:00
杨朱 · Kiki
bb9172030e
[Metrics] Complete removal of deprecated vllm:time_per_output_token_seconds metric (#32661)
This PR completes the removal of the vllm:time_per_output_token_seconds metric,
which was deprecated in v0.11, hidden in v0.12, and scheduled for removal in v0.13,
but whose removal was delayed until v0.15.
Signed-off-by: carlory <baofa.fan@daocloud.io>
Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>
2026-01-20 12:28:41 +00:00
Walter Beller-Morales
8be263c3fb
[Core] Cleanup shm based object store on engine shutdown (#32429)
Signed-off-by: walterbm <walter.beller.morales@gmail.com>
2026-01-20 08:53:37 +00:00
vllmellm
148117ea2e
[Refactor] Make FP8 Linear Ops use kernel abstraction (#27814)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2026-01-20 14:48:20 +08:00
Jackmin801
12dab78f49
[Feat] allow inplace loading lora (#31326)
Signed-off-by: Jackmin801 <ongjackm@gmail.com>
Signed-off-by: Jackmin801 <56836461+Jackmin801@users.noreply.github.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2026-01-20 10:15:20 +08:00
Tomas Ruiz
4a5299c93f
feat: spec decode with draft models (#24322)
Signed-off-by: Tomas Ruiz <tomas.ruiz.te@gmail.com>
2026-01-19 16:05:46 -05:00
jiahanc
7350331718
[BugFix] Fix TRT-LLM NVFP4 DP/EP (#32349)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-01-19 14:32:24 -05:00
Yanan Cao
9d1e611f0e
[CI] Add Helion as an optional dependency (#32482)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
2026-01-19 19:09:56 +00:00
Vadim Gimpelson
0727cc9ecf
[BUGFIX] Fix test_mla_backends.py. Scale MLA projection weights to prevent numerical instability (#32529)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
2026-01-19 13:49:29 -05:00
danisereb
aa7f37ccfa
Add support for LoRA adapters in Nemotron-H models (#30802)
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
2026-01-19 22:30:44 +08:00
wang.yuqi
c88860d759
[Frontend] Score entrypoint support data_1 & data_2 and queries & documents as inputs (#32577)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-01-19 14:07:46 +00:00
Nicolò Lucchesi
74c583bc50
[Core] Whisper support torch.compile (#30385)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-01-19 10:02:31 +00:00
Andreas Karatzas
c0a350ca73
[ROCm][CI] Add ROCm attention backend support for EAGLE DP tests (#32363)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-19 09:57:54 +00:00
Yuxuan Zhang
71832ba71e
[GLM-4.7] GLM Model support for GLM-Lite (#31386)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Signed-off-by: Yuxuan Zhang <2448370773@qq.com>
2026-01-19 01:18:38 -08:00
Matt
11bbf86f6a
[CI][Hardware][AMD] Fix test_rotary_embedding_mla_cache_fused (#32408)
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
2026-01-19 08:25:47 +00:00
Hyunkyun Moon
3c8740aacb
[Frontend] Add render endpoints for prompt preprocessing (#32473)
Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>
Signed-off-by: Hyunkyun Moon <mhg5303@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-19 12:21:46 +08:00
Alex Brooks
7518a3dc65
[CI/Build] Use Common Event Map Fixture in Harmony / MCP Server Tests (#32531)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2026-01-19 04:05:51 +00:00
bnellnm
327a02d8db
[MoE Refactor] Separate Router into OO Classes (#30623)
Signed-off-by: Bill Nell <bnell@redhat.com>
2026-01-18 11:40:49 -05:00
Li Xie
c826c72a96
[Model] Support Step1 Model (#32511)
Signed-off-by: xieli <xieli@stepfun.com>
2026-01-18 10:20:46 +00:00
Isotr0py
8cc26acd8b
[Performance] Improve Triton prefill attention kernel's performance (#32403)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-17 20:19:59 -08:00
Robert Shaw
4a6af8813f
[MoE Refactor] Move Test Impl into Test Dirs (#32129)
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
2026-01-18 12:16:59 +08:00
vanshil shah
037a6487af
apply _validate_input to MistralTokenizer token-id chat prompts (#32448)
Signed-off-by: Vanshil Shah <vanshilshah@gmail.com>
2026-01-17 03:23:45 +00:00
Hashem Hashemi
7a1030431a
Atomics Reduce Counting Optimization for SplitK Skinny GEMMs. (#29843)
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>
2026-01-16 11:45:04 -06:00
rasmith
6ca4f400d8
[CI][AMD] Skip test_permute_cols since the kernel is not used and not built for ROCm (#32444)
Signed-off-by: Randall Smith <ransmith@amd.com>
2026-01-16 16:22:53 +08:00
Micah Williamson
b84c426a8c
[ROCm][CI] Skip Qwen3-30B-A3B-MXFP4A16 Eval Test On Non-CUDA Platforms (#32460)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2026-01-16 00:17:44 -08:00
Lucas Wilkinson
14ce524249
[CI] Breakup h200 tests (#30499)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2026-01-16 06:23:22 +00:00
wang.yuqi
4ae77dfd42
[Frontend][1/n] Make pooling entrypoints request schema consensus | CompletionRequest (#32395)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-01-16 06:17:04 +00:00
cjackal
35bf5d08e8
[bugfix] Fix online serving crash when text type response_format is received (#26822)
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>
Signed-off-by: j0shuajun <59368606+j0shuajun@users.noreply.github.com>
Co-authored-by: j0shuajun <59368606+j0shuajun@users.noreply.github.com>
2026-01-16 12:23:54 +08:00