Commit Graph

4076 Commits

Author SHA1 Message Date
inkcherry
4505849b30 [ROCm][PD] add moriio kv connector. (#29304)
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2026-01-09 14:01:57 +00:00
Andreas Karatzas
e02706d2d2 [ROCm][CI][V1] Fix nixl_connector test failure and achieve CUDA parity in test_async_scheduling (#32000)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-09 20:48:32 +08:00
Xin Yang
e7b68f4d6c [Bugfix] Fix Triton FusedMoE LoRA (#30585)
Signed-off-by: Xin Yang <xyangx@amazon.com>
2026-01-09 11:46:59 +00:00
vllmellm
1a19e9cd87 [Bugfix][ROCm]Fix Qwen3-Next-80B-A3B-Thinking inference and optimize non-standard block size (544) support under rocm_atten (#31380)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2026-01-09 19:28:02 +08:00
Cyrus Leung
c8ed39b9dd [Model] Reorganize pooling layers (#31973)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-09 11:02:14 +00:00
Andreas Karatzas
020732800c [Bugfix] Fix OpenAPI schema test failures (#31921)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-09 10:56:20 +00:00
gnovack
bde38c11df fix lora moe sharding when rank < max_lora_rank (#31994)
Signed-off-by: gnovack <gnovack@amazon.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2026-01-09 14:43:25 +08:00
Nick Hill
29ce48221c [Cleanup] Remove obsolete spec decoding compatibility logic (#32003)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-01-09 05:44:18 +00:00
TJian
7a05d2dc65 [CI] [ROCm] Fix tests/entrypoints/test_grpc_server.py on ROCm (#31970)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2026-01-09 12:54:20 +08:00
Divakar Verma
a1648c4045 [ROCm][CI] Fix test_token_classification.py::test_bert_models (#31993)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2026-01-09 04:04:33 +00:00
zhrrr
8ff4a99566 [Async][Feat] support apply penalty or bad_words for async + spec (#30495)
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
Signed-off-by: izhuhaoran <izhuhaoran@qq.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
2026-01-09 02:31:50 +00:00
daniel-salib
a4ec0c5595 [Frontend] Add MCP tool streaming support to Responses API (#31761)
Signed-off-by: Daniel Salib <danielsalib@meta.com>
2026-01-09 09:19:34 +08:00
Robert Shaw
0fa8dd24d2 [Bugfix] Fix Typo from NVFP4 Refactor (#31977)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-01-08 16:18:50 -08:00
Nick Hill
11cec296dd [BugFix] Add spec-decode-incompatible request param validation (#31982)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-01-09 00:08:21 +00:00
Robert Shaw
5825bbc1f7 [Quantization] Deprecate Long Tail of Schemes (#31688)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2026-01-08 19:07:45 -05:00
Lucas Wilkinson
6cdf015c3c [Misc] Fix Current vLLM config is not set. warnings, assert to avoid issues in the future (#31747)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2026-01-08 15:20:49 -08:00
bnellnm
e74698c27a [Misc][Refactor] Add FusedMoERouter object (#30519)
Signed-off-by: Bill Nell <bnell@redhat.com>
2026-01-08 20:52:55 +00:00
Cyrus Leung
aa125ecf0e [Frontend] Improve error message (#31987)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-08 20:07:03 +00:00
Michael Goin
87e07a6b46 Revert "feat(moe): Add is_act_and_mul=False support for Triton MoE kernels" (#31978) 2026-01-08 11:31:53 -08:00
yxing-bj
fe86be66c5 [Model] Support IQuestCoder model (#31575)
Signed-off-by: yxing <yxing@iquestlab.com>
2026-01-08 14:42:57 +00:00
Mary
7645bc524b [OpenAI] Fix tool_choice=required streaming when output has trailing extra data (#31610)
Signed-off-by: maylikenoother <ogedengbemary19@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2026-01-08 21:01:42 +08:00
tianshu-Michael-yu
03fd76c570 [Model] Add LFM2-VL model support (#31758)
Signed-off-by: Tianshu Yu <tianshuyu.formal@gmail.com>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-01-08 05:00:27 -08:00
Bijaya Dangol
59d260f5e4 [Model] Add Grok-2 (#31847)
Signed-off-by: dangoldbj <dangoldbj23@gmail.com>
2026-01-08 04:59:48 -08:00
Cyrus Leung
d1b6fe007f [Chore] Further cleanup pooler (#31951)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-08 02:16:21 -08:00
DevByteAI
1f214290d6 fix(compile): apply partition wrapper when loading AOT cached functions (#31536)
Signed-off-by: Devbyteai <abud6673@gmail.com>
Signed-off-by: DevByteAI <161969603+devbyteai@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2026-01-08 17:27:26 +08:00
Ryan Rock
8cbdc7eb94 [CI/Build] Enable test_kv_cache_events_dp for AMD (#31834)
Signed-off-by: Ryan Rock <ryan.rock@amd.com>
2026-01-08 09:00:24 +00:00
Lumosis
b634e619bb Decouple page_size_bytes calculation in AttentionSpec for TPU/RPA Compatibility. (#31635)
Signed-off-by: Lihao Ran <imlihao.ran@gmail.com>
Signed-off-by: Lumosis <30372757+Lumosis@users.noreply.github.com>
2026-01-08 09:00:07 +00:00
Isotr0py
eac3b96ec0 [Models] Allow converting Qwen3-VL into Reranker model (#31890)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-08 08:10:15 +00:00
Zhiwei
573a1d1119 [ROCm]Skip test_torchao.py::test_pre_quantized_model on CDNA3 arch (#31905)
Signed-off-by: ZhiweiYan-96 <zhiwei.yan@amd.com>
2026-01-08 15:47:44 +08:00
prashanth058
d3235cb503 [Fix] Enable mm_processor_cache with vision LoRA (#31927)
Signed-off-by: prashanth058 <prashanth.dannamaneni@uipath.com>
2026-01-08 15:31:51 +08:00
Chang Su
791b2fc30a [grpc] Support gRPC server entrypoint (#30190)
Signed-off-by: Chang Su <chang.s.su@oracle.com>
Signed-off-by: njhill <nickhill123@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: njhill <nickhill123@gmail.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2026-01-07 23:24:46 -08:00
Andreas Karatzas
5f2a473ff3 [ROCm][CI] v1 cpu offloading attention backend fix (#31833)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-08 14:37:50 +08:00
Andreas Karatzas
087a138963 [ROCm][CI] Fix attention backend test flakiness from uninitialized KV cache memory (#31928)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-08 04:35:25 +00:00
Richard Zou
a79079feef [BugFix] Fix flakiness in test_eagle_dp for PyTorch 2.10 (#31915)
Signed-off-by: Richard Zou <zou3519@gmail.com>
2026-01-08 04:04:58 +00:00
Robert Shaw
9f6dcb71ae [MoE Refactor][16/N] Apply Refactor to NVFP4 (#31692)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Pavani Majety <pmajety@nvidia.com>
2026-01-08 03:46:27 +00:00
Andreas Karatzas
8dd2419fa9 [CI] Skip Qwen-VL in multimodal processing tests due to flaky external dependency (#31932)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-08 02:58:01 +00:00
Rabi Mishra
25eef3dc2e feat(moe): Add is_act_and_mul=False support for Triton MoE kernels (#31645)
Signed-off-by: rabi <ramishra@redhat.com>
2026-01-08 10:27:09 +08:00
Robert Shaw
5dcd7ef1f2 [MoE Refactor][15/N] Apply Refactor to Fp8 (#31415) 2026-01-07 19:42:33 -05:00
Nick Hill
10ef65eded [BugFix] Fix bad words with speculative decoding (#31908)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-01-07 15:46:42 -05:00
Ilya Markov
6170d47d22 [EPLB] Optimize EPLB with numpy (#29499)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2026-01-07 15:21:35 -05:00
Xin Yang
0ada960a20 [Kernel] Support bias type in grouped_topk kernel (#31781)
Signed-off-by: Xin Yang <xyangx@amazon.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2026-01-07 12:16:32 -08:00
Ning Xie
c907d22158 [refactor] refactor memory constants usage (#31865)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2026-01-07 18:37:31 +00:00
Kfir Toledo
b89443b8d9 [KVConnector]: Enable Cross-layers KV cache layout for MultiConnector (#30761)
Signed-off-by: Kfir Toledo <kfir.toledo@ibm.com>
2026-01-07 16:59:43 +00:00
Kate Cheng
cc6dafaef2 [Perf][Kernels] Enable FlashInfer DeepGEMM swapAB on SM90 (for W8A8 Linear Op) (#29213)
Signed-off-by: Kate Cheng <yunhsuanc@nvidia.com>
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
Co-authored-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
2026-01-07 10:53:54 -05:00
Cyrus Leung
b665bbc2d4 [Chore] Migrate V0 attention utils (#31891)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-07 13:44:36 +00:00
weiyu
e7596371a4 [Refactor][TPU] Remove torch_xla path and use tpu-inference (#30808)
Signed-off-by: Wei-Yu Lin <weiyulin@google.com>
Signed-off-by: weiyu <62784299+weiyu0824@users.noreply.github.com>
2026-01-07 16:07:16 +08:00
Kevin McKay
4614c5a539 [Bugfix][Hardware][AMD] Consolidate FP8 min/max values helper function (#31106)
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
Signed-off-by: Kevin McKay <kevin@example.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-07 06:55:03 +00:00
Cyrus Leung
aafd4d2354 [Chore] Try remove init_cached_hf_modules (#31786)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-07 12:34:04 +08:00
vSeamar
6f351548b2 [Frontend] Implement robust video frame recovery for corrupted videos (#29197)
Signed-off-by: cmartinez <cmartinez@roblox.com>
Signed-off-by: vSeamar <cmartinez@roblox.com>
2026-01-07 01:13:24 +00:00
Angela Yi
9a1d20a89c [CI] Add warmup run in test_fusion_attn (#31183)
Signed-off-by: angelayi <yiangela7@gmail.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2026-01-07 00:31:52 +00:00