Cyrus Leung
e5de19ff9a
[CI/Build] Don't auto-rebase PRs with CI failures ( #39443 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-04-09 13:57:37 -07:00
zzaebok
edee96519a
[Spec Decode] fix returning size mismatch on extract hidden states proposer ( #38610 )
...
Signed-off-by: Jaebok Lee <jaebok9541@naver.com >
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2026-04-09 20:39:39 +00:00
Rishi Puri
adaabb8a55
Add nightly b200 test for spec decode eagle correctness ( #38577 )
...
Signed-off-by: Rishi Puri <riship@nvidia.com >
2026-04-09 20:09:09 +00:00
Ekagra Ranjan
f7cad67412
[ASR] Fix spacing between chunks in multi-chunk audio transcription ( #39116 )
...
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com >
2026-04-09 12:46:33 -07:00
Xinyu Chen
a8134aef4e
[XPU] check is_xccl_available before oneccl warmup ( #39302 )
...
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com >
2026-04-09 12:42:17 -07:00
Michael Goin
2800706f06
[Refactor] Move NVFP4 GEMM management into NvFp4LinearKernel ( #39129 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-04-09 15:05:36 -04:00
Cyrus Leung
0d310ffbeb
[CI/Build] Update auto-rebase rule ( #39429 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-04-09 10:59:56 -07:00
Micah Williamson
d5f75fdf50
[ROCm] Correctly guard fused_silu_mul_block_quant on ROCm ( #39387 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2026-04-09 17:59:03 +00:00
PikaPikachu
827268e98d
[Quantization] Support Quark W8A8 INT8 MoE inference ( #36320 )
...
Signed-off-by: kangletian <Letian.Kang@amd.com >
2026-04-09 17:24:43 +00:00
Wentao Ye
56e19d7ee2
[Model Runner V2] Fix flex attention kv blocks calculation issue ( #39353 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-04-09 13:07:43 -04:00
Andreas Karatzas
9036d4c464
[ROCm][CI] Resolved nvidia package deps issue ( #39421 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-04-10 00:06:06 +08:00
Lucas Kabela
a8c6ee9b78
[Performance Improvement] Update batched_count_greater_than to handle batch size 1 without recompile ( #38933 )
...
Signed-off-by: Lucas Kabela <lucaskabela@meta.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2026-04-09 23:51:31 +08:00
Cyrus Leung
3b1d9c3156
[CI/Build] Fix memory cleanup in MM test ( #39411 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-04-09 08:50:45 -07:00
Cyrus Leung
54d244f28f
[UX] Improve error message for MM input too long ( #39409 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-04-09 13:20:19 +00:00
Richard Zou
6c749399b7
[BugFix] fix tests/kernels/moe/test_moe_layer.py ( #39404 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2026-04-09 08:48:59 -04:00
lalit10
91eea72330
[Tests] Add Qwen3-VL multimodal memory leak check ( #39268 )
...
Signed-off-by: Lalit Laxminarayan Bangad <lalitbangad@gmail.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-04-09 04:54:46 -07:00
Andrii Skliar
df2503e125
nemotron-nano-vl: Allow use_audio_in_video to be passed at vllm serve time ( #38538 )
...
Signed-off-by: Andrii Skliar <askliar@nvidia.com >
Co-authored-by: Andrii Skliar <askliar@nvidia.com >
2026-04-09 11:44:39 +00:00
Nick Hill
c8d98f81f6
[Core] Simplify API server handshake ( #39364 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-04-09 18:56:15 +08:00
Harry Mellor
d87fb264df
[Docs] Bring README updates into docs README ( #39397 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-04-09 10:35:00 +00:00
wang.yuqi
66c079ae83
[Frontend][4/n] Improve pooling entrypoints | pooling. ( #39153 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-04-09 10:09:45 +00:00
Shengqi Chen
b6c9be509e
[CI] fix possible user permission issues in nightly index generation ( #39390 )
...
Signed-off-by: Shengqi Chen <harry-chen@outlook.com >
2026-04-09 08:14:07 +00:00
Qidong Su
ed733802f0
Fix NUMA binding on non-CDMM Grace-Blackwell systems ( #39361 )
...
Signed-off-by: Qidong Su <soodoshll@gmail.com >
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com >
2026-04-09 07:36:51 +00:00
Andrew Barnes
8a34c5087a
[ROCm] Remove unnecessary fp8 roundtrip in gather cache NHD dequant ( #39122 )
...
Signed-off-by: Bortlesboat <bortstheboat@gmail.com >
2026-04-09 15:12:22 +08:00
Wentao Ye
ed2f282bc8
[Perf] Optimize redundant sync for pooling model, 3.7% Throughput Improvement ( #39113 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-04-08 23:12:23 -07:00
Zhewen Li
9e78555743
[Docker] Add fastsafetensors to NVIDIA Dockerfile ( #38950 )
2026-04-08 22:21:37 -07:00
sihao_li
e80e633927
[XPU] Skip VLLM_BATCH_INVARIANT for XPU in EAGLE DP test ( #39164 )
...
Signed-off-by: sihao.li <sihao.li@intel.com >
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com >
2026-04-09 12:45:16 +08:00
Khairul Kabir
490f17d0c7
[Multimodal] Fix nested_tensors_equal: add length check for lists and tuple support ( #38388 )
...
Signed-off-by: khairulkabir1661 <khairulkabir1661@users.noreply.github.com >
Co-authored-by: khairulkabir1661 <khairulkabir1661@users.noreply.github.com >
2026-04-09 04:40:37 +00:00
Yongye Zhu
2e98406048
[Refactor] Improve indexer decode path metadata preparation ( #38865 )
2026-04-08 20:49:15 -07:00
Chendi.Xue
ef5a226819
[PD][HeteroArch] Fix accuracy issue with CPU_ATTN as Decoder and Flash_ATTN as prefiller ( #38935 )
...
Signed-off-by: Chendi Xue <chendi.xue@intel.com >
2026-04-09 11:19:07 +08:00
Wentao Ye
aec18492d0
[CI] Fix mypy for vllm/v1/ops ( #39219 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-04-09 11:06:34 +08:00
noobHappylife
2a49284c8a
Fix Responses JSON schema alias serialization ( #38519 )
...
Signed-off-by: noobhappylife <aratar1991@hotmail.com >
Co-authored-by: OpenAI Codex <codex@openai.com >
2026-04-09 10:50:16 +08:00
Ilya Boytsov
d37b378762
[Model] Update ColModernVBERT to support latest HF checkpoint ( #39307 )
...
Signed-off-by: Ilya Boytsov <ilyaboytsov1805@gmail.com >
2026-04-09 10:48:51 +08:00
Wei Zhao
92fbec391b
[Bug] Fix routing bias dtype for trtllm per-block fp8 moe ( #38989 )
...
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2026-04-08 19:42:43 -07:00
Ajay Anubolu
2f41d6c063
[Bugfix] Fix cpu-offload-gb assertion with non-default block sizes ( #36461 )
...
Signed-off-by: AjAnubolu <anuboluajay@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-04-08 19:42:16 -07:00
Dipika Sikka
3aecdf08b4
[Gemma4] Support quantized MoE ( #39045 )
...
Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com >
2026-04-08 21:57:53 -04:00
Michael Goin
eb4205fee5
[UX] Integrate DeepGEMM into vLLM wheel via CMake ( #37980 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Claude <noreply@anthropic.com >
2026-04-08 18:56:32 -07:00
liuzhenwei
83aea2147f
[XPU][UT] update UTs in CI ( #39296 )
...
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com >
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com >
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
Co-authored-by: Kunshang Ji <jikunshang95@gmail.com >
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com >
2026-04-09 09:38:16 +08:00
Maral
2e9034c998
[W8A8 Block Linear Refactor][2/N] Remove W8A8Fp8BlockLinearOp and adopt Fp8 block linear kernel selections. ( #33892 )
...
Signed-off-by: maral <maralbahari.98@gmail.com >
Signed-off-by: Maral <maralbahari.98@gmail.com >
2026-04-09 08:50:39 +08:00
Benjamin Chislett
8332078cfd
[Bugfix] FlashInfer MXINT4 MoE crashes, missing do_finalize ( #39315 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-04-08 20:36:33 -04:00
Richard Zou
ba4a78eb5d
[torch.compile] Allow usage of Opaque Objects in PyTorch 2.11 ( #39286 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2026-04-08 23:21:10 +00:00
Kai Song
f3c7941ec8
[Bugfix] Fix EP precision for Qwen3.5, Qwen3-Next ( #39181 )
...
Signed-off-by: Song Kai <songkai05@baidu.com >
2026-04-09 01:47:48 +04:00
Wentao Ye
3352bf8b03
[CI Bug] Fix pre-commit issue in main ( #39347 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-04-08 14:10:05 -07:00
triangleXIV
7c94ae16c6
[BugFix] --max-model-len=-1 causes over-limit requests to hang and starve the entire service ( #39102 )
...
Signed-off-by: triangle14 <y1019026570@gmail.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
2026-04-08 14:03:17 -07:00
Rishi Puri
ad05edfbca
tests/v1/e2e/spec_decode: assert async scheduling is used ( #39206 )
...
Signed-off-by: Rishi Puri <riship@nvidia.com >
Signed-off-by: Rishi Puri <puririshi98@berkeley.edu >
Signed-off-by: sfeng33 <4florafeng@gmail.com >
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com >
Co-authored-by: Flora Feng <4florafeng@gmail.com >
2026-04-08 20:30:03 +00:00
Wentao Ye
2018137242
[Feature] Batch invariant nvfp4 linear support ( #39322 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-04-08 16:29:13 -04:00
Jackmin801
a776a48b1c
[MoE] Move DEEP_GEMM into experts/ subdirectory ( #39005 )
...
Signed-off-by: Jackmin801 <ongjackm@gmail.com >
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-04-08 19:23:08 +00:00
Ben Browning
8477fe427d
[Tool] adjust_request to reasoning parser, and Gemma4 fixes ( #39027 )
...
Signed-off-by: Ben Browning <bbrownin@redhat.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
Co-authored-by: Cursor <cursoragent@cursor.com >
2026-04-08 19:04:04 +00:00
Lain
e24e0a43a4
[Attention] relax the head dim 512 and paged kv for sm90+FA4 ( #38835 )
...
Signed-off-by: Siyuan Fu <siyuanf@nvidia.com >
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com >
2026-04-08 18:23:18 +00:00
Roberto L. Castro
b55d830ec7
[Perf][Kernel] Persistent TopK scheduler: unified CUDAGraph-safe kernel with dynamic per-row dispatch - DeepSeek-V3.2 DSA decode ( #37421 )
...
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com >
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com >
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com >
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
2026-04-08 13:35:57 -04:00
Shengqi Chen
75e01a39a1
[Feature] NUMA binding support for GPU workers ( #38635 )
...
Signed-off-by: Shengqi Chen <harry-chen@outlook.com >
Co-authored-by: Jason Li <jasonlizhengjian@gmail.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2026-04-08 09:55:24 -07:00
Or Ozeri
512c5eb455
[kv_offload+HMA][5/N]: Track group block hashes and block IDs ( #37109 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com >
2026-04-08 19:50:28 +03:00
Flora Feng
13151a4df4
[Bugfix] Fix Gemma4 streaming tool call corruption for split boolean/number values ( #39114 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com >
2026-04-08 16:46:27 +00:00
Gregory Shtrasberg
56c976c1b5
[ROCm] Enable fused_silu_mul_block_quant on ROCm ( #38817 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2026-04-08 11:23:32 -05:00
Frederik Gossen
d74a306c4b
[Core] Use tuple_return in split_module for tuple-conformant subgraphs ( #38752 )
...
Signed-off-by: Frederik Gossen <frgossen@meta.com >
Co-authored-by: Boyuan Feng <boyuan@meta.com >
2026-04-08 09:09:58 -07:00
Gregory Shtrasberg
0e9f0a516c
[ROCm][CI-Build] Cherry pick triton BUFFER_OPS fix and update AITER ( #38580 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2026-04-08 10:38:03 -05:00
haosdent
8904fc4d19
[Bugfix] Fix V1 logprobs empty strings for multi-byte UTF-8 tokens when logprobs > 0 ( #34875 )
...
Signed-off-by: haosdent <haosdent@gmail.com >
2026-04-08 15:30:00 +00:00
nemanjaudovic
1a2c17634e
[Bugfix] Add missing ASRDataset import and CLI args in benchmarks/throughput.py ( #38114 )
...
Signed-off-by: nemanjaudovic <nudovic@amd.com >
2026-04-08 13:53:53 +00:00
Matthew Bonanni
308cec5864
[FlashAttention] Symlink FA4 instead of copying when using VLLM_FLASH_ATTN_SRC_DIR ( #38814 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-04-08 12:04:34 +00:00
wang.yuqi
4e2ab1861d
[CI Failure] pin nomic-embed-text-v1 revision ( #39292 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-04-08 11:43:06 +00:00
JartX
140cbb1186
[Bugfix] Cuda Clean up scales Kvcache fp8/int8_per_token_head ( #39224 )
...
Signed-off-by: JartX <sagformas@epdcenter.es >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-04-08 04:08:04 -07:00
Kevin H. Luu
6155bbd1dd
[Bugfix][Docs] Fix ReadTheDocs build crash from mocked torch decorator ( #39284 )
...
Signed-off-by: khluu <khluu000@gmail.com >
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com >
2026-04-08 09:43:01 +00:00
rasmith
78434b923c
[CI][AMD][BugFix][Kernel] Cast induction variable to int64 on MI350 for chunk_gated_delta_rule_fwd_kernel_h_blockdim64 to avoid illegal memory access ( #39087 )
...
Signed-off-by: Randall Smith <Randall.Smith@amd.com >
2026-04-08 16:57:18 +08:00
Michael Goin
2488d1dca2
[Docs] Update README ( #39251 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-04-08 11:34:07 +08:00
yoke
d734445fcd
[Bugfix][Frontend] Fix Gemma4 streaming HTML duplication after tool calls ( #38909 )
...
Signed-off-by: yoke233 <yoke2012@gmail.com >
2026-04-08 11:03:54 +08:00
Flora Feng
927975ead8
[Parser] Migrate response api streaming to unified parser ( #38755 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com >
Signed-off-by: Andrew Xia <axia@meta.com >
2026-04-08 10:09:00 +08:00
Flora Feng
9ea7d670d8
[Bugfix] Fix Qwen3 tool parser for Responses API tools ( #38848 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com >
2026-04-08 10:08:51 +08:00
Varun Sundar Rabindranath
7b80cd8ac3
[Docs] Add Phi-4-reasoning-vision to supported models + examples ( #39232 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2026-04-08 02:02:26 +00:00
Andrey Talman
2111997f96
[release 2.11] Update to torch 2.11 ( #34644 )
2026-04-07 18:55:48 -07:00
Flora Feng
5af684c319
[CI] Add reasoning parser tests to CI ( #37025 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com >
2026-04-08 00:57:36 +00:00
Md. Mekayel Anik
d521dcdbcc
docs: clarify SMT and OMP acronyms in CpuPlatform ( #39085 )
2026-04-07 17:42:07 -07:00
Giancarlo Delfin
5daf62271d
[Model Runner V2] Fuse probabilistic rejection sample kernels ( #38496 )
...
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai >
2026-04-07 17:37:37 -07:00
zofia
ad3304425b
[XPU] add xpu backend implementation of mxfp8 quant ( #38682 )
...
Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com >
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com >
2026-04-08 08:30:35 +08:00
Lucas Wilkinson
70406eb1dc
[Attention][V0 Deprecation] Deprecate accept output buffer ( #39125 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2026-04-07 17:14:58 -04:00
Yubo Wang
08bfedc152
[Bugfix] Fix extract_hidden_states crash with quantized KV cache dtype ( #39160 )
...
Signed-off-by: Yubo Wang <yubowang2019@gmail.com >
2026-04-07 11:18:33 -07:00
Flora Feng
0102bd2f4c
[Parser] Pass request.tools to tool parser ( #38860 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com >
2026-04-08 01:36:21 +08:00
rasmith
83d09d36b5
[CI][Bugfix][AMD] Ensure weights are created when emulating OCP MXFP4 ( #36993 )
...
Signed-off-by: Randall Smith <Randall.Smith@amd.com >
2026-04-08 00:37:16 +08:00
Chendi.Xue
92b9afeecd
[XPU] Quick fix for TritonMLA to remove cuda hardcode ( #39088 )
...
Signed-off-by: Chendi Xue <chendi.xue@intel.com >
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com >
2026-04-08 00:17:58 +08:00
Jinzhen Lin
7310555482
[Bugfix] Fix marlin nvfp4 rescaling ( #37502 )
...
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com >
2026-04-07 08:57:17 -07:00
ibifrost
96b5004b71
[KVConnector] Support 3FS KVConnector ( #37636 )
...
Signed-off-by: wuchenxin <wuchenxin.wcx@alibaba-inc.com >
Signed-off-by: ibifrost <47308427+ibifrost@users.noreply.github.com >
Co-authored-by: Simon Mo <simon.mo@hey.com >
2026-04-07 15:46:00 +00:00
kkyyxhll
98e1a43af7
[Bugfix][Quantization] Fix PerTensorScale loading with tuple shard_id in MergedColumnParallelLinear ( #38517 )
...
Signed-off-by: loukang <loukang@xiaohongshu.com >
2026-04-07 11:16:26 -04:00
maobaolong
729eb59f60
[KVConnector]: prioritize external connector over internal registry ( #38301 )
...
Signed-off-by: baoloongmao <baoloongmao@tencent.com >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
2026-04-07 15:03:11 +00:00
Ilya Boytsov
6e1100889e
fix(test): recompute Jina ColBERT rotary inv_freq cleared by transformers v5 weight loader ( #39176 )
...
Signed-off-by: Ilya Boytsov <ilyaboytsov1805@gmail.com >
2026-04-07 22:40:55 +08:00
Harry Mellor
edcc37a8ce
Fix Mistral yarn warning in Transformers v5 ( #37292 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com >
2026-04-07 13:23:33 +00:00
Harry Mellor
79df4a794d
Automatically add links to API docs for matching strings in docs ( #37434 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-04-07 21:21:18 +08:00
Ronen Schaffer
7c139ab23f
[KV Offload] Clean up ARC/LRU refactoring leftovers: group ARC tests and fix stale comment ( #38217 )
...
Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com >
2026-04-07 15:14:45 +03:00
Wei Zhao
0be9516ea4
[Bug] Fix Trtllm Fp8 MoE Weight Shuffle Memory Fragmentation ( #39054 )
...
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com >
2026-04-07 08:04:08 -04:00
Kyle Mylonakis
7b9de7c892
[Bugfix] Correct mistake in chained comparison in static assert logic ( #38699 )
...
Signed-off-by: Kyle Mylonakis <kyle@protopia.ai >
2026-04-07 18:24:39 +08:00
Rohan Potdar
dd9342e6bc
only patch runtime_env for torch >= 2.10 ( #38763 )
...
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com >
2026-04-07 09:29:23 +00:00
Jiangyun Zhu
8060bb0333
[vLLM IR] rework gemma_rms_norm ( #39014 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2026-04-07 01:37:00 -07:00
Rishapveer Singh
da4c0e4db9
[Model] Use AutoWeightsLoader for FalconH1 ( #39092 )
...
Signed-off-by: Rishapveer Singh <215205492+rishaps@users.noreply.github.com >
2026-04-07 16:25:17 +08:00
Netanel Haber
a9a0e0551f
nano-nemotron-vl: get_mm_max_tokens_per_item for audio, video, image == seq_len ( #38727 )
...
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com >
2026-04-07 00:23:29 -07:00
Andrew Barnes
5c35517a3e
[ROCm] Remove unused IS_FNUZ parameter from reshape_and_cache_shuffle_kernel ( #39123 )
...
Signed-off-by: Bortlesboat <bortstheboat@gmail.com >
2026-04-07 07:17:59 +00:00
Andreas Karatzas
a435e3108d
[ROCm][CI] Fix test repo-root assumptions ( #39053 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-04-07 13:36:21 +08:00
Andreas Karatzas
2df2c85be4
[Kernels][MoE] Fix legacy_routing to use bitmatrix-based routing path ( #38504 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-04-07 10:57:09 +08:00
Nick Hill
62095e82c1
[BugFix][MRV2] Fix cuda event reuse race ( #39115 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-04-07 00:21:09 +00:00
bnellnm
b2b2c5239e
[MoE Refactor] Split up compressed_tensors_moe.py ( #38960 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
2026-04-06 20:07:54 -04:00
fxmarty-amd
00d7b497b3
[NVFP4] Support NVFP4 dense models from modelopt and compressed-tensors on AMD Instinct MI300, MI355X and Hopper through emulation ( #35733 )
...
Signed-off-by: Felix Marty <Felix.Marty@amd.com >
Signed-off-by: fxmarty-amd <felmarty@amd.com >
Co-authored-by: Kyle Sayers <kylesayrs@gmail.com >
2026-04-06 16:18:27 -06:00
Matthew Bonanni
9c81f35b1a
[Attention][MLA] Re-enable FA4 as default MLA prefill backend ( #38819 )
2026-04-06 17:51:46 -04:00
Woosuk Kwon
f186cfe75e
[MRV2] Fix hanging issue with DeepSeek V3.2 by setting skip_attn=False ( #39098 )
...
Signed-off-by: WoosukKwon <woosuk.kwon@berkeley.edu >
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-04-06 12:55:13 -07:00
Netanel Haber
dfa5062a8f
NemotronH default mamba_ssm_cache_dtype=float32; enable auto-hook for NemotronHNanoVLV2Config ( #39032 )
...
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com >
2026-04-06 19:47:46 +00:00
Yongye Zhu
e8ebbdde83
[Quantization] Add FlashInfer CuteDSL batched experts backend for NVFP4 MoE ( #38251 )
...
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2026-04-06 11:57:53 -07:00
namgyu-youn
94fbb09894
[EASY] Drop duplicate KV-cache initialization ( #38799 )
...
Signed-off-by: namgyu-youn <namgyu.dev@gmail.com >
2026-04-06 18:05:39 +00:00
Wentao Ye
419e73cdfa
[Bug] Fix mistral version dependency ( #39086 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-04-06 13:31:19 -04:00
bnellnm
f01482408c
[MoE Refactor][Test] FusedMoE layer test ( #24675 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-04-06 17:17:23 +00:00
zhanqiuhu
bfdc0a3a99
[NIXL][Mamba][3/N] Heterogeneous TP: 3-read conv state transfer ( #37635 )
2026-04-06 19:07:02 +02:00
bnellnm
93bada494f
[MoE Refactor] Split of DefaultMoERunner class ( #35326 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-04-06 12:41:59 -04:00
Frederik Gossen
608914de30
[Core] Re-enable Inductor pre-grad passes in standalone compile (torch>=2.12) ( #38944 )
...
Signed-off-by: Frederik Gossen <frgossen@meta.com >
2026-04-06 09:37:13 -07:00
Wentao Ye
4ae218c122
[Refactor] Remove unused dead code ( #38842 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-04-06 11:52:05 -04:00
Lukas Geiger
f40d9879f2
[Models][GDN] Remove GPU/CPU syncs in GDNAttentionMetadata.build during speculative decoding ( #38047 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2026-04-06 15:39:37 +00:00
Lucas Wilkinson
47e605092b
[Gemma4] Enable Fast Prefill Optimization ( #38879 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2026-04-06 11:19:39 -04:00
Walter Beller-Morales
e69a265135
[Feat][Core] safely abort requests when FSM fails to advance ( #38663 )
...
Signed-off-by: walterbm <walter.beller.morales@gmail.com >
2026-04-06 08:00:16 -07:00
Julien Denize
fef56c1855
[Mistral Grammar] Support Grammar Factory ( #38150 )
...
Signed-off-by: juliendenize <julien.denize@mistral.ai >
2026-04-06 10:28:51 -04:00
bhargav-patel-29
c5e3454e5a
[Model] Add support for BharatGen's Param2MoE model ( #38000 )
...
Signed-off-by: bhargav-patel-29 <bhargav.patel@tihiitb.org >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-04-06 16:19:56 +08:00
liuchenbing2026
f6983f01de
MiniMax-M2: add Eagle3 speculative decoding support ( #37512 )
...
Signed-off-by: liuchenbing <chenliumail@163.com >
Signed-off-by: liucb <liuchengbao_work@163.com >
Co-authored-by: liuchenbing <chenliumail@163.com >
2026-04-05 19:50:18 -07:00
Andreas Karatzas
780ba37458
[ROCm][Quantization] Add asymmetric INT8 quantization support to TritonInt8ScaledMMLinearKernel ( #38501 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-04-06 09:42:10 +08:00
Micah Williamson
9570654c6d
[ROCm][CI] Run Kernels Core Operation Test On MI325 and mitigate flakiness ( #38184 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2026-04-06 09:42:02 +08:00
Netanel Haber
d56e952239
nano_nemotron_vl: fix tensor device mismatch exception when video profiling ( #39029 )
...
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com >
2026-04-05 22:23:45 +00:00
Kevin H. Luu
56de443db1
[ci] Switch some CI jobs to H200 MIG slices ( #38956 )
2026-04-05 13:26:11 -07:00
Greg Pereira
4dd49b06f8
[Bug] Fix Import paths for encoder_cudagraph modules ( #38997 )
...
Signed-off-by: greg pereira <grpereir@redhat.com >
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-04-05 19:11:58 +00:00
Greg Pereira
f53fa26e05
[Bugfix] Fix invalid JSON in Gemma 4 streaming tool calls by stripping partial delimiters ( #38992 )
...
Signed-off-by: greg pereira <grpereir@redhat.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-04-05 17:11:18 +00:00
Wei Zhao
1af6f78ae5
[Perf] Change Trtllm fp8 MoE to use Shuffled Weights and BlockMajorK Layout ( #38993 )
...
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-04-05 10:54:31 -04:00
Martin Vit
228023b3a5
[Bugfix][MoE] Fix 6-8% decode regression: prefer multi-stream shared expert overlap ( #38990 )
...
Signed-off-by: Martin Vit <martin@voipmonitor.org >
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-04-05 10:28:31 -04:00
Aaron Batilo
9a528260ef
[Bugfix][Spec Decode] Fix extract_hidden_states for VLM models ( #38987 )
...
Signed-off-by: Aaron Batilo <abatilo@coreweave.com >
2026-04-05 02:41:54 -07:00
Robert Shaw
968ed02ace
[Quantization][Deprecation] Remove Petit NVFP4 ( #32694 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2026-04-05 00:07:45 +00:00
Robert Shaw
7d266abb22
Revert "[vLLM IR] gemma_rms_norm" ( #38998 )
2026-04-04 17:48:08 -04:00
Xiaoshuang Wang
156405d243
[vLLM IR] gemma_rms_norm ( #38780 )
...
Signed-off-by: Icey <1790571317@qq.com >
2026-04-04 13:55:52 -04:00
Artem Perevedentsev
99e5539a67
[Perf][GDN] Align TMA usage with upstream FLA ( #38981 )
...
Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-04-05 00:38:02 +08:00
Linkun
a88ce94bbb
[IR][RmsNorm] pass None if not has_weight ( #38961 )
...
Signed-off-by: Linkun Chen <github@lkchen.net >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2026-04-04 11:02:30 -04:00
Ziming Qi
2a36d8fb72
[Bugfix][CPU] Fix macOS compatibility broken by #36487 ( #38970 )
...
Signed-off-by: Ziming (2imi9) <148090931+2imi9@users.noreply.github.com >
2026-04-04 14:05:58 +00:00
lalit10
93726b2a1c
Refactor Arctic loading to use AutoWeightsLoader ( #38955 )
...
Signed-off-by: Lalit Laxminarayan Bangad <lalitbangad@gmail.com >
Co-authored-by: Lalit Laxminarayan Bangad <lalitbangad@meta.com >
2026-04-04 05:01:09 +00:00
Yongye Zhu
8617f8676b
[Bugfix] Fix DSV32 weight loading ( #38870 )
...
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com >
2026-04-03 19:57:52 -07:00
Andreas Karatzas
06fd9ffcc4
[ROCm][CI] Fix ROCm Dockerfile conftest generation for older Docker parsers ( #38959 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-04-04 10:41:41 +08:00
Wentao Ye
cab4064cd5
[Bug] Fix workspace manager _current_workspaces size ( #38853 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-04-04 01:29:45 +00:00
Wentao Ye
062f1a2d70
[Bug] Fix compile error for swap_blocks_batch in CUDA 13 ( #38915 )
2026-04-03 16:56:38 -07:00
elenalil-aws
81994e1d0e
[Bugfix][LoRA] Fix missing in_proj_z in Qwen3_5ForConditionalGenerati… ( #38927 )
...
Signed-off-by: elenalil-aws <elenalil@amazon.com >
2026-04-03 23:30:09 +00:00
Andreas Karatzas
4b506ff90a
[ROCm][CI] Minor missing import patch ( #38951 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-04-03 23:01:20 +00:00
Andreas Karatzas
5875bb2e9c
[ROCm][CI] Added back missing common deps ( #38937 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-04-03 15:58:57 -07:00
Kevin H. Luu
f0d3ad9f3e
[ci] Remove soft fail for AMD image build job ( #38941 )
...
Signed-off-by: Kevin H. Luu <khluu000@gmail.com >
2026-04-03 20:42:33 +00:00
Divin Honnappa
121ea5a21f
Removed GPU state confirmation and cleanup steps. ( #38238 )
...
Signed-off-by: Divin Honnappa <divin.honnappa@amd.com >
2026-04-03 13:11:08 -07:00
Jeffrey Wang
ab79863e6c
Remove MQ multi-node tests ( #38934 )
...
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com >
2026-04-03 20:00:08 +00:00
Nick Hill
5f1de2b14b
[Model Runner V2] Add config validation for not-yet-supported features ( #38758 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-04-03 12:08:08 -07:00
yzong-rh
a5a623d961
[Bugfix] Re-enable Renormalize routing for TRT-LLM MoE experts ( #38859 )
...
Signed-off-by: Yifan Zong <yzong@redhat.com >
2026-04-04 01:48:17 +08:00
Xiaoshuang Wang
f8c3af2d85
[vLLM IR] add import_ir_kernels() to support OOT platforms ( #38807 )
...
Signed-off-by: Icey <1790571317@qq.com >
2026-04-03 17:25:19 +00:00
danisereb
50cd5674b3
Fix invalid logprobs with MTP enabled and sync scheduling ( #38711 )
...
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com >
2026-04-03 12:24:37 -04:00
Vasiliy Kuznetsov
7b1a7423be
[Frontend] new online quantization frontend ( #38138 )
...
Signed-off-by: Vasiliy Kuznetsov <vasiliy@meta.com >
2026-04-03 11:58:39 -04:00
Nicolò Lucchesi
97f92c6b47
[KVConnector] Skip register_kv_caches on profiling ( #38558 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-04-03 15:40:16 +00:00
Yusuf Mohammad
46f02e00f2
[Bugfix] Fix AWQ models batch invariance issues ( #38670 )
...
Signed-off-by: yusuf <yusuf@deeplearningmachine.mynet >
Co-authored-by: yusuf <yusuf@deeplearningmachine.mynet >
2026-04-03 14:54:15 +00:00
Qiming Zhang
6b4872240f
[XPU] bump up xpu-kernel v0.1.5, transpose moe weights ( #38342 )
...
Signed-off-by: mayuyuace <qiming1.zhang@intel.com >
Signed-off-by: Qiming Zhang <qiming1.zhang@intel.com >
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com >
2026-04-03 14:10:02 +00:00
Necofish
580090db6b
[Kernel] Add swapAB support for SM120 CUTLASS blockwise FP8 GEMM ( #38325 )
2026-04-03 15:49:59 +02:00
Artem Perevedentsev
cb10b7e80b
[GDN] Eliminate GPU->CPU sync in prepare_chunk_indices during prefill ( #38361 )
...
Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com >
Signed-off-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com >
2026-04-03 13:38:02 +00:00
Mieszko Dziadowiec
bf8b022e60
[Intel][Triton] Support round_int8 for Intel backend ( #38825 )
...
Signed-off-by: Mieszko Dziadowiec <mdziadowiec@habana.ai >
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: Stefano Castagnetta <scastagnetta@nvidia.com >
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
Co-authored-by: Stefano Castagnetta <scastagnetta@nvidia.com >
Co-authored-by: Claude <noreply@anthropic.com >
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com >
2026-04-03 20:47:35 +08:00
xiangdong
40ee64c00e
[XPU][CI] Skip test_topp_only and test_topk_and_topp cases on Intel GPU in CI ( #38904 )
...
Signed-off-by: zengxian <xiangdong.zeng@intel.com >
2026-04-03 20:44:52 +08:00
wufann
1b117cb0ac
[ROCm] Fix aiter persistent mode mla with q/o nhead<16 for kimi-k2.5 tp8 ( #38615 )
...
Signed-off-by: wufann <36477220+wufann@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-04-03 03:54:00 -07:00
Anton Ivanov
abebd9323d
[CPU] Replace OMP initialization ( #36487 )
...
Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com >
2026-04-03 18:42:43 +08:00
Hyeonki Hong
25f2b55319
[Frontend] feat: add streaming support for token generation endpoint ( #37171 )
...
Signed-off-by: Hyeonki Hong <hyeonki.hong@moreh.io >
2026-04-03 10:20:32 +00:00
xiangdong
cb4ff07f8b
[XPU][CI] Skip test_topk_only cases on Intel GPU in CI ( #38899 )
...
Signed-off-by: zengxian <xiangdong.zeng@intel.com >
2026-04-03 09:50:41 +00:00
Gregory Shtrasberg
a7d79fa133
[ROCm][CI/Build] Fix the pytest hook to properly print out the summary ( #38585 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2026-04-03 17:24:26 +08:00
Netanel Haber
fa9e68022d
Fix Nano Nemotron VL regressions ( #38655 )
...
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com >
2026-04-03 15:22:06 +08:00
Isotr0py
5506435419
[Misc] Clean up Gemma4 implementation ( #38872 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-04-03 05:47:02 +00:00
Yifan Qiao
311c981647
[MRV2][KVConnector] Fix missing build_connector_worker_meta ( #38698 )
...
Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai >
2026-04-03 08:42:52 +03:00
Li, Jiang
21d7ecc5b0
[CI/Build] Add audio deps in Dockerfile.cpu ( #38876 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2026-04-03 05:05:14 +00:00
Aaron Hao
4729b90838
[Bug] Add e_score_correction_bias to SKIP_TENSORS ( #38746 )
...
Signed-off-by: ahao-anyscale <ahao@anyscale.com >
2026-04-02 21:15:05 -07:00
shunting314
8b141ed8c3
full cudagraph for flex-attn ( #36298 )
...
Signed-off-by: shunting314 <shunting@meta.com >
2026-04-02 21:15:01 -07:00
Varun Sundar Rabindranath
2ad7c0335f
[Model] Add Phi4ForCausalLMV for microsoft/Phi-4-reasoning-vision-15B ( #38306 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2026-04-02 21:14:57 -07:00
Bowen Bao
201d2ea5bf
[CI][ROCm] Add Qwen3.5-35B-A3B-MXFP4 model eval into CI ( #38664 )
...
Signed-off-by: Bowen Bao <bowenbao@amd.com >
2026-04-03 04:05:45 +00:00
Bowen Bao
103f0de565
[ROCm][Quantization][1/N] Refactor quark_moe w_mxfp4 w/ oracle ( #38774 )
...
Signed-off-by: Bowen Bao <bowenbao@amd.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-04-03 03:29:57 +00:00
wliao2
32e0c0bfa2
refactor hard coded device string in test files under tests/v1 and tests/lora ( #37566 )
...
Signed-off-by: Liao, Wei <wei.liao@intel.com >
2026-04-03 11:21:47 +08:00
Itay Etelis
4a06e1246e
[Perf] Batch KV cache swap copies via cuMemcpyBatchAsync ( #38460 )
...
Signed-off-by: Itay Etelis <itay.etelis@ibm.com >
Co-authored-by: Itay Etelis <itay.etelis@ibm.com >
Co-authored-by: Or Ozeri <oro@il.ibm.com >
2026-04-03 03:13:23 +00:00
Carl Y
3bc2734dd0
[Kernel] Fuse FP8 output quantization into merge_attn_states ( #36518 )
...
Signed-off-by: Carl You <4531192+carlyou@users.noreply.github.com >
2026-04-03 01:47:04 +00:00
Carl Y
1f5ec2889c
[mla] Support fused FP8/NVFP4 output quantization in MLA attention ( #35792 ) ( #36205 )
...
Signed-off-by: Carl You <4531192+carlyou@users.noreply.github.com >
Signed-off-by: Carl Y <4531192+carlyou@users.noreply.github.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
2026-04-02 21:16:11 -04:00
Yan Ma
ee3cf45739
[XPU] Initial support for GDN attention on Qwen3-next/Qwen3.5 ( #33657 )
...
Signed-off-by: Yan Ma <yan.ma@intel.com >
Signed-off-by: Chendi Xue <chendi.xue@intel.com >
Co-authored-by: Chendi Xue <chendi.xue@intel.com >
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com >
2026-04-03 08:59:11 +08:00
Matthew Bonanni
05e68e1f81
[CI] Fix test_nixl_connector ( #38838 )
2026-04-02 17:52:13 -07:00
Vadim Gimpelson
771913e4a0
[Bugfix] Fix NVFP4+MTP crash: force unquantized mtp.fc for Qwen3.5 ( #38832 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2026-04-03 04:45:57 +04:00
1096125073
71a9125c67
[New Model]: add support for telechat3 ( #38510 )
...
Signed-off-by: xiayongqiang <xiayq1@chinatelecom.cn >
Co-authored-by: xiayongqiang <xiayq1@chinatelecom.cn >
2026-04-03 08:26:22 +08:00
Nicolò Lucchesi
66e86f1dbd
[Kernel] Mamba support different layout for Conv state ( #37416 )
2026-04-03 01:50:09 +02:00
Michael
bb39382b2b
[Bugfix]: Fix Gemma4ToolParser.__init__() missing tools parameter ( #38847 )
...
Signed-off-by: Michael Hospedales <hospedales@me.com >
2026-04-02 14:35:19 -07:00
zhanqiuhu
7b743ba953
[CI] Fix: pass string cache_dtype in test_register_kv_caches ( #38836 )
2026-04-02 19:42:09 +00:00
Stefano Castagnetta
188defbd0b
[CI] Add flashinfer.py to attention test source deps ( #38792 )
...
Signed-off-by: Stefano Castagnetta <scastagnetta@nvidia.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2026-04-02 19:24:29 +00:00
Luciano Martins
08ed2b9688
feat(models): implement Google Gemma 4 architecture support (MoE, Multimodal, Reasoning, Tool-Use) ( #38826 )
...
Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com >
Signed-off-by: Luciano Martins <lucianomartins@google.com >
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com >
Co-authored-by: Isotr0py <2037008807@qq.com >
2026-04-02 11:13:28 -07:00
Yanan Cao
ecd5443dbc
Bump helion dependency from 0.3.2 to 0.3.3 ( #38062 )
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
2026-04-02 10:59:33 -07:00
Stefano Castagnetta
58262dec6e
[Bugfix] Fix test mocks after SM100 restriction in #38730 ( #38791 )
...
Signed-off-by: Stefano Castagnetta <scastagnetta@nvidia.com >
Co-authored-by: Claude <noreply@anthropic.com >
2026-04-02 13:12:58 -04:00
Lucas Wilkinson
cb3935a8fc
[FA4] Update flash-attention to latest upstream FA4 ( #38690 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2026-04-02 17:02:37 +00:00
Bowen Bao
82a006beeb
[CI][ROCm] Add gpt-oss w4a8 in CI ( #38292 )
...
Signed-off-by: Bowen Bao <bowenbao@amd.com >
2026-04-03 00:06:01 +08:00
wang.yuqi
a9b4f07ba2
[Frontend] Re-enable running MaxSim on GPU ( #38620 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-04-03 00:03:13 +08:00
Koushik Dutta
d9408ffba3
Triton MLA perf fixes ( #33529 )
...
Signed-off-by: Koushik Dutta <koushd@gmail.com >
Co-authored-by: root <root@ubuntu-nvidia.localdomain >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-04-02 09:40:01 -04:00
Yusuf Mohammad
16a65e4173
[Bugfix] Enable batch-invariant Triton matmul on all Ampere GPUs (SM 8x) ( #38427 )
...
Signed-off-by: yusuf <yusufmohammad@live.com >
Signed-off-by: yusuf <yusuf@deeplearningmachine.mynet >
Signed-off-by: Yusuf Mohammad <79484377+YM2132@users.noreply.github.com >
Co-authored-by: Claude <noreply@anthropic.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: yusuf <yusuf@deeplearningmachine.mynet >
2026-04-02 09:29:58 -04:00
bsliu
c0817e4d39
[Model] Add support for Cheers multimodal model ( #38788 )
...
Signed-off-by: bsliu <1187291748@qq.com >
Signed-off-by: 吴炳贤 <wubingxian24@mails.ucas.ac.cn >
2026-04-02 21:01:40 +08:00
Harry Mellor
dfe5e31689
Don't compile vision encoder for Transformers backend ( #30518 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-04-02 12:42:29 +00:00
JartX
2ce3d0ce36
[Feature] KV cache per-token-head INT8/FP8 quantization ( #38378 )
...
Signed-off-by: JartX <sagformas@epdcenter.es >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: yangyang4991 <yangyang4991@gmail.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: Isotr0py <2037008807@qq.com >
2026-04-02 08:13:26 -04:00
Jiangyun Zhu
4eefbf9609
[Perf] fuse kernels in gdn ( #37813 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2026-04-02 11:52:18 +00:00
vllmellm
551b3fb39f
[ROCm] Enable VLLM triton FP8 moe for gfx1201, tuned for Qwen3-30B-A3B-FP8 tp=2 and Qwen/Qwen3.5-35B-A3B-FP8 tp=2 ( #38086 )
...
Signed-off-by: big-yellow-duck <jeffaw99@hotmail.com >
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com >
2026-04-02 08:13:42 +00:00
Li, Jiang
c6f722b93e
[CPU] Support gelu act in cpu_fused_moe ( #38770 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2026-04-02 14:14:32 +08:00
Xin Yang
9bd7231106
Revert "[Kernel] Add gpt-oss Router GEMM kernel ( #37205 )" ( #38778 )
...
Signed-off-by: Xin Yang <xyangx@amazon.com >
2026-04-01 22:02:32 -07:00
Yanan Cao
73f48ce559
[Kernel] [Helion] Use warning_once in get_gpu_name to prevent log spam ( #38743 )
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com >
Co-authored-by: Claude Sonnet 4 <noreply@anthropic.com >
2026-04-01 21:30:31 -07:00
Gregory Shtrasberg
3aab680e3e
[ROCm][Bugfix] Fix ROCm runtime failure due to missing symbol ( #38750 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
Signed-off-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: tjtanaavllm <tunjian.tan@amd.com >
2026-04-01 21:30:11 -07:00
Sergey Zinchenko
5a2d420c17
[Bugfix] Use dedicated MM processor cache in /tokenize to prevent sender-cache pollution ( #38545 )
...
Signed-off-by: Sergey Zinchenko <sergey.zinchenko.rnd@gmail.com >
2026-04-01 21:14:49 -07:00
Benjamin Chislett
5f96f9aff1
[Perf] DSV3.2 Indexer Fused Weights Projection ( #38684 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
2026-04-02 03:34:49 +00:00
Luka Govedič
694449050f
Fix multiline-format string for python 3.10 ( #38739 )
...
Signed-off-by: Luka Govedic <luka.govedic@gmail.com >
2026-04-02 03:19:35 +00:00
Nick Hill
6241521dd2
[BugFix] Fix precommit breakage due to conflicting in-flight merges ( #38759 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-04-01 15:35:06 -07:00
Kevin H. Luu
1785dc5501
Revert "[Bugfix] Fix Qwen3CoderToolParser anyOf/oneOf type resolution for nullable params ( #37831 )" ( #38751 )
2026-04-02 06:34:28 +08:00
Chang Su
54500546ac
[Bugfix] Preserve original ImportError in gRPC server entrypoint ( #38673 )
...
Signed-off-by: Chang Su <chang.s.su@oracle.com >
2026-04-01 22:16:44 +00:00
Jeffrey Wang
de5e6c44c6
[Feat][Executor] Introduce RayExecutorV2 ( #36836 )
...
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com >
2026-04-01 14:34:29 -07:00
yzong-rh
cb268e4e55
[Refactor] Simplify FutureWrapper in MultiprocExecutor ( #38644 )
...
Signed-off-by: Yifan <yzong@redhat.com >
Signed-off-by: Yifan Zong <yzong@redhat.com >
2026-04-01 21:28:26 +00:00
Stefano Castagnetta
6183cae1bd
[Bugfix] Restrict TRTLLM attention to SM100, fixing GB300 (SM103) hang ( #38730 )
...
Signed-off-by: Stefano Castagnetta <scastagnetta@nvidia.com >
2026-04-01 12:08:40 -07:00
Monishver
c09ad767cd
Feature/silu block quant fusion v1 ( #32996 )
...
Signed-off-by: Monishver Chandrasekaran <monishverchandrasekaran@gmail.com >
2026-04-01 18:50:43 +00:00
Wentao Ye
c9a9db0e02
[Compile] Fix nvfp4 compile warning ( #38573 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-04-01 18:28:57 +00:00
Chauncey
cbe7d18096
[Misc] Rename think_start_str/think_end_str to reasoning_start_str/reasoning_end_str ( #38242 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-04-01 09:56:45 -07:00
Michael Goin
db5d0719e1
[Kernel] Add MXFP8 to Marlin GEMM/MoE and refactor Mxfp8LinearOp ( #34664 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-04-01 09:41:42 -07:00
yzong-rh
dc0428ebb8
[NIXL][BUG] Fix Triton heterogeneous TP ( #37940 )
...
Signed-off-by: Yifan <yzong@redhat.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2026-04-01 17:23:15 +02:00
Jesus Talavera
148c2072ec
Add ibm-granite/granite-vision-3.3-2b to supported models documentation ( #38714 )
...
Signed-off-by: Jesus Talavera <jesus.talavera@ibm.com >
2026-04-01 08:22:25 -07:00
majianhan
2f5c3c1ec0
[Misc] Fix docstring typo: buildin -> builtin ( #38722 )
...
Co-authored-by: majianhan <majianhan@kylinos.cn >
2026-04-01 07:39:46 -07:00
Fynn Schmitt-Ulms
fa246d5231
Fix shape comment in extract_hidden_states example ( #38723 )
...
Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com >
2026-04-01 07:29:33 -07:00
bnellnm
7cf56a59a2
[MoE Refactor] Make SharedExperts class for use with DefaultMoERunner ( #35153 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
2026-04-01 09:44:08 -04:00
Elvir Crnčević
5e30e9b9a9
[Bugfix] Revert "Zero-init MLA attention output buffers to prevent NaN from CUDA graph padding" ( #38359 )
...
Signed-off-by: Elvir Crncevic <elvircrn@gmail.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
2026-04-01 09:11:10 -04:00
손세정
582340f273
[Bugfix] Fix Qwen3CoderToolParser anyOf/oneOf type resolution for nullable params ( #37831 )
...
Signed-off-by: AAISSJ <maze0717@g.skku.edu >
Co-authored-by: 세덩 <saison@sedeong-ui-MacBookAir.local >
2026-04-01 20:22:29 +08:00
yjz
992368522f
[KVTransfer] Fix TpKVTopology.is_kv_replicated equality case ( #38179 )
...
Signed-off-by: JianDan0212 <zhangyj0212@gmail.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2026-04-01 12:41:49 +02:00
Juan Pérez de Algaba
58ee614221
(security) Enforce frame limit in VideoMediaIO ( #38636 )
...
Signed-off-by: jperezde <jperezde@redhat.com >
2026-04-01 10:23:45 +00:00
Harry Mellor
f9f6a9097a
Add verified label to trigger pre-commit ( #38708 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-04-01 02:31:02 -07:00
Zhanda Zhu
c75a313824
[Perf] triton bilinear_pos_embed kernel for ViT ( #37948 )
...
Signed-off-by: Zhanda Zhu <zhandazhu@gmail.com >
2026-04-01 01:52:02 -07:00
Lukas Geiger
4f6eed3bd4
[Core] Simplify multimodal masking ( #34246 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2026-04-01 01:18:22 -07:00
Li, Jiang
36d7f19897
[CPU] Support head_size 512 in cpu_attn ( #38676 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2026-04-01 05:42:27 +00:00
Jeffrey Wang
2d725b89c5
[Bugfix] Lazy import diskcache to avoid sqlite3/libstdc++ ImportError at startup ( #38649 )
...
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com >
2026-04-01 05:31:20 +00:00
Augusto Yao
ef53395e2c
[bugfix] do not add extra linebreak for score/rerank with chat template ( #38617 )
...
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com >
Signed-off-by: wang.yuqi <noooop@126.com >
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io >
Co-authored-by: wang.yuqi <noooop@126.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2026-04-01 04:50:07 +00:00
Lucas Wilkinson
eb47454987
[Bugfix][MLA] Add logits size budget to sparse indexer prefill chunking ( #36178 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2026-04-01 00:15:53 -04:00
Matthew Bonanni
116f4be405
[1/N][Cleanup] Standardize on use of is_quantized_kv_cache ( #38659 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-04-01 04:08:01 +00:00
Wentao Ye
7b01d97a22
[Perf] Optimize mean pooling using chunks and index_add, 5.9% E2E throughput improvement ( #38559 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-04-01 03:54:58 +00:00
HarshRathva
17b72fd1c8
Fix priority preemption regression test in scheduler ( #37051 )
...
Signed-off-by: HarshRathva <harshrathvaai@gmail.com >
Co-authored-by: Or Ozeri <oro@il.ibm.com >
2026-04-01 06:36:12 +03:00
Samu Tamminen
c49497726b
[ROCm][perf] Shuffle KV cache to use paged_attention_common ( #32914 )
...
Signed-off-by: Samu Tamminen <stammine@amd.com >
Co-authored-by: Tuukka Sarvi <tuukka.sarvi@amd.com >
2026-04-01 03:30:19 +00:00
Ben Browning
cb0b443274
[Misc] Add 20 regression tests for 11 tool parser bug fixes ( #38172 )
...
Signed-off-by: Ben Browning <bbrownin@redhat.com >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
2026-04-01 03:00:31 +00:00
Luka Govedič
40bb175027
[vLLM IR] 1/N Implement IR skeleton and rms_norm op ( #33825 )
...
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com >
Signed-off-by: chzhang <chaojun.zhang@intel.com >
Signed-off-by: Luka Govedic <luka.govedic@gmail.com >
Co-authored-by: Xinyu Chen <xinyu1.chen@intel.com >
Co-authored-by: Chaojun Zhang <chaojun.zhang@intel.com >
Co-authored-by: Luka Govedič <ProExpertProg@h100-01.nemg-001.lab.rdu2.dc.redhat.com >
2026-03-31 22:15:05 -04:00
Elvir Crnčević
0fab52f0aa
Fix NaN from stale FP4 scale padding in create_fp4_scale_tensor ( #38148 )
...
Signed-off-by: Elvir Crncevic <elvircrn@gmail.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
2026-03-31 19:14:59 -07:00
Yifan Qiao
91e4521f9f
[Feat][v1] Simple yet General CPU KV Cache Offloading ( #37160 )
...
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu >
Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai >
2026-03-31 17:58:37 -07:00
Stig-Arne Grönroos
31a719bcd3
[ROCm][perf] fix Aiter sparse MLA with MTP>1 ( #37887 )
...
Signed-off-by: Stig-Arne Grönroos <stig-arne.gronroos@amd.com >
Signed-off-by: Stig-Arne Grönroos <sgronroo@amd.com >
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com >
2026-03-31 19:22:23 -04:00
Vedant V Jhaveri
2e56975657
Generative Scoring ( #34539 )
...
Signed-off-by: Vedant Jhaveri <vjhaveri@linkedin.com >
Co-authored-by: Vedant Jhaveri <vjhaveri@linkedin.com >
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-03-31 16:02:11 -07:00
Chang Su
36f1dc19ae
feat(grpc): add periodic stats logging and servicer log forwarding ( #38333 )
...
Signed-off-by: Chang Su <chang.s.su@oracle.com >
2026-03-31 15:50:07 -07:00
Asaf Gardin
3dc01ef352
[Quantization] Consolidate dummy format logic into DummyModelLoader ( #38637 )
...
Signed-off-by: Josephasafg <ajgard7@gmail.com >
2026-03-31 22:20:45 +00:00
Yanan Cao
cc671cb110
[Kernel] [Helion] [17/N] Add Helion kernel torch.compile support ( #38592 )
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com >
Co-authored-by: Claude Sonnet 4 <noreply@anthropic.com >
2026-03-31 17:06:42 -04:00
Wentao Ye
856589ed9a
[Refactor] Remove dead code in kv connector and model runner ( #38383 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-03-31 17:05:23 -04:00
czhu-cohere
517b769b58
[Perf] Fix DBO overlap: capture DeepEP event before yield ( #38451 )
...
Signed-off-by: root <conway.zhu@cohere.com >
2026-03-31 20:38:59 +00:00
yzong-rh
d9b90a07ac
[MoE Refactor] Migrate Unquantized to Full Oracle Flow ( #36286 )
...
Signed-off-by: Yifan Zong <yzong@redhat.com >
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Signed-off-by: yzong-rh <yzong@redhat.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2026-03-31 15:43:33 -04:00
Olya Kozlova
598190aac3
[fix] Remove trtllm ragged mla prefills ( #36540 )
...
Signed-off-by: Olya Kozlova <okozlova@nvidia.com >
2026-03-31 12:30:27 -07:00
Xu Jinyang
b779eb3363
[Model] Sync upstream BT=chunk_size fix for GDN chunk_fwd_kernel_o, simplify warmup to single pass ( #38343 )
...
Signed-off-by: AuYang <459461160@qq.com >
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com >
2026-03-31 23:03:24 +04:00
BadrBasowid
077a9a8e37
[torch.compile] Refactor Attention Quant Fusion Pass and Remove Boilerplate ( #37373 )
...
Signed-off-by: BadrBasowid <badr.basowid@gmail.com >
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com >
2026-03-31 14:15:50 -04:00
Run Yu
07edd551cc
[CI/Build] Resolve a dependency deadlock when installing the test dependencies used in CI ( #37766 )
...
Signed-off-by: Run Yu <yurun00@gmail.com >
2026-03-31 18:05:14 +00:00
mikaylagawarecki
7c080dd3c5
[4/n] Migrate FP4/W4A8 CUTLASS kernels to torch stable ABI ( #37503 )
...
Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com >
2026-03-31 10:21:13 -07:00
Yi Liu
0dd25a44ea
[Quantization][Autoround][XPU] Add W4A16 Support ( #37986 )
...
Signed-off-by: yiliu30 <yi4.liu@intel.com >
2026-03-31 16:48:24 +00:00
SandishKumarHN
3896e021a0
[Bugfix] Fix FusedMoE weight loading with padded hidden dimensions ( #37010 )
...
Signed-off-by: SandishKumarHN <sandish@fb.com >
2026-03-31 12:22:26 -04:00
zhang-prog
b6e636c12c
[Fix] handle PaddleOCR-VL image processor max_pixels across Transformers v4/v5 ( #38629 )
...
Signed-off-by: zhangyue66 <zhangyue66@baidu.com >
2026-03-31 15:50:41 +00:00
Jingu Kang
f1ff50c86c
[Bugfix] clamp dA_cumsum differences to prevent Inf in Mamba2 SSD kernels ( #37501 )
...
Signed-off-by: Jingu Kang <jg.k@navercorp.com >
2026-03-31 17:35:51 +02:00
Matthew Bonanni
757068dc65
[Bugfix][Async] Fix async spec decoding with hybrid models ( #38556 )
...
Signed-off-by: SandishKumarHN <sandishkumarhn@gmail.com >
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: SandishKumarHN <sandishkumarhn@gmail.com >
2026-03-31 11:08:54 -04:00
Nicolò Lucchesi
7337ff7f03
[Docs] PD with Nixl compat matrix ( #38628 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-03-31 15:01:21 +00:00
Kyle Sayers
5869f69c5f
[Online Quant] [QeRL] Minor code cleanup ( #38574 )
...
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com >
2026-03-31 14:56:43 +00:00
wliao2
4dfad17ed1
replace cuda_device_count_stateless() to current_platform.device_count() ( #37841 )
...
Signed-off-by: Liao, Wei <wei.liao@intel.com >
Signed-off-by: wliao2 <wei.liao@intel.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com >
2026-03-31 22:32:54 +08:00
wenjun liu
e8057c00bc
[CI] Avoid concurrent docker pull in intel XPU CI runners to prevent rate limit issues ( #38594 )
...
Signed-off-by: wendyliu235 <wenjun.liu@intel.com >
2026-03-31 22:23:18 +08:00
Nicolò Lucchesi
7430389669
[Bugfix][CI] Skip flaky test_eagle test ( #38566 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-03-31 09:42:37 -04:00
ElizaWszola
202f147cf2
Fix MLA runs when use_inductor_graph_partition=True ( #38631 )
...
Signed-off-by: ElizaWszola <ewszola@redhat.com >
2026-03-31 13:37:43 +00:00
Jiangyun Zhu
ea7bfde6e4
[CI] fix LM Eval Qwen3.5 Models (B200) ( #38632 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2026-03-31 13:20:08 +00:00
sihao_li
d71a15041f
[XPU]move testing dependencies from Dockerfile to xpu-test.in ( #38596 )
...
Signed-off-by: sihao.li <sihao.li@intel.com >
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com >
2026-03-31 12:49:43 +00:00
Ilya Markov
abdbb68386
[EPLB] Add alternative communication for EPLB weight exchange ( #33176 )
...
Signed-off-by: ilmarkov <markovilya197@gmail.com >
Signed-off-by: Markov Ilya <markovilya19@gmail.com >
Co-authored-by: Markov Ilya <markovilya19@gmail.com >
2026-03-31 08:17:12 -04:00
liuzhenwei
0c63739135
[EPD] update EPD script arguments ( #36742 )
...
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com >
2026-03-31 12:02:09 +00:00
wang.yuqi
719735d6c5
[CI Failure] pin colmodernvbert revision ( #38612 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Signed-off-by: wang.yuqi <noooop@126.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-31 10:54:54 +00:00
Maosheng Liao
aae3e688f8
Fix document of torchrun_example.py ( #31113 )
2026-03-31 10:54:23 +00:00
Matthew Bonanni
7d65463528
[WIP][CI][Bugfix] Fix test_run_eagle_dp ( #38584 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-03-31 12:30:25 +02:00
Mateusz Sokół
8278825b57
DOC: TPU mention fix ( #38129 )
...
Signed-off-by: Mateusz Sokół <mat646@gmail.com >
2026-03-31 03:27:56 -07:00
Chang Su
acf7292bf2
[Misc] Move --grpc CLI argument into make_arg_parser ( #38570 )
...
Signed-off-by: Chang Su <chang.s.su@oracle.com >
2026-03-31 03:24:05 -07:00
Chauncey
ce884756f0
[Feature]: add presence_penalty and frequency_penalty fields to Responses API ( #38613 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-03-31 08:45:57 +00:00
wang.yuqi
d9d21eb8e3
[Frontend][3/n] Improve pooling entrypoints | scoring. ( #28631 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-03-31 07:52:00 +00:00
Yintong Lu
f09daea261
[CPU] Support int8 compute mode in CPU AWQ ( #35697 )
...
Signed-off-by: Yintong Lu <yintong.lu@intel.com >
2026-03-31 15:27:37 +08:00
Kevin H. Luu
42318c840b
[ci] Remove benchmarks job ( #38611 )
2026-03-31 06:46:21 +00:00
zhangyiming
1ac6694297
[OOT] Add OOT support for linear kernel. ( #37989 )
...
Signed-off-by: menogrey <1299267905@qq.com >
2026-03-31 14:33:21 +08:00
Kfir Toledo
6cc7abdc66
[kv_offload+HMA] Fix num_blocks with different per-layer page sizes and improve assert message ( #38554 )
...
Signed-off-by: Kfir Toledo <kfir.toledo@ibm.com >
Co-authored-by: Or Ozeri <oro@il.ibm.com >
2026-03-31 06:00:40 +00:00
Flora Feng
d53cb9cb8e
[Tool Parser][2/3] Use self.tools instead of request.tools in tool parsers ( #38189 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com >
2026-03-31 13:41:36 +08:00
Louie Tsai
44eef0ca1e
vLLM Benchmark Suite perf regression after PR#32723 ( #38576 )
...
Signed-off-by: louie-tsai <louie.tsai@intel.com >
2026-03-31 05:23:17 +00:00
Andreas Karatzas
b9cdc85207
[ROCm][CI] Fix Whisper translation test attention backend selection ( #38508 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-31 13:21:49 +08:00
Flora Feng
3e802e8786
[Mypy] Fix adjust_request typing ( #38264 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com >
2026-03-31 04:21:18 +00:00
Martin Hickey
350af48e14
[KVConnector] Remove redundant method KVConnectorOutput::merge() ( #38546 )
...
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com >
2026-03-31 07:11:02 +03:00
Lucas Kabela
e31915063d
[Bugfix] Fix for builtins (forward fix of pytorch/177558) ( #37234 )
...
Signed-off-by: Lucas Kabela <lucaskabela@meta.com >
2026-03-31 01:08:11 +00:00
Flora Feng
29e48707e8
[Refactor] Consolidate Tool type alias in tool_parsers/utils.py ( #38265 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com >
2026-03-31 00:55:51 +00:00
sungsoo ha
4ac227222f
[Bugfix][DCP] Fix CUDA graph capture for Decode Context Parallelism ( #36070 )
...
Signed-off-by: Sungsoo Ha <sungsooh@nvidia.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
2026-03-30 20:20:43 -04:00
Vadim Gimpelson
bb51d5b40d
Add @vadiklyutiy as committer ( #38589 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2026-03-31 07:50:04 +08:00
Prathmesh Bhatt
93b3ec1585
feat(attention): extract KV-cache update from FlashAttentionDiffKV ba… ( #36466 )
...
Signed-off-by: Prathmesh Bhatt <71340361+Prathmesh234@users.noreply.github.com >
2026-03-30 23:16:09 +00:00
Netanel Haber
e812bf70bd
Restore non-hf processor path for Nano-Nemotron-VL (bypass call_hf_processor_mm_only) - fixes #38018 ( #38567 )
...
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com >
Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com >
2026-03-30 21:56:52 +00:00
SandishKumarHN
bcc6f67447
[Bugfix] Use null block (0) for padded block table entries ( #35431 )
...
Signed-off-by: SandishKumarHN <sandish@fb.com >
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com >
2026-03-30 14:02:51 -07:00
Asaf Gardin
1fc69f59bb
[Bug fix][Quantization] Fix dummy weight loading ( #38478 )
...
Signed-off-by: Josephasafg <ajgard7@gmail.com >
2026-03-30 16:38:02 -04:00
Micah Williamson
d9c7db18da
[ROCm][CI] Pin test_hybrid test to TRITON_ATTN on ROCm ( #38381 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2026-03-30 20:26:46 +00:00
Ilya Markov
12701e8af2
[EPLB] Optmize eplb mapping and record in router for prefill ( #36261 )
...
Signed-off-by: ilmarkov <markovilya197@gmail.com >
2026-03-30 19:48:33 +00:00
Benjamin Chislett
494636b29d
[Feat][Spec Decode] DFlash ( #36847 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
2026-03-30 15:03:15 -04:00
mikaylagawarecki
ab1a6a43fa
[3/n] Migrate cutlass/scaled_mm_entry.cu torch stable ABI ( #37221 )
...
Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com >
2026-03-30 11:20:13 -07:00
fangyuchu
b5e608258e
[Refactor] Unify engine process monitoring in engine manager and add Ray backend support ( #35862 )
...
Signed-off-by: fangyuchu <fangyuchu@qq.com >
Signed-off-by: Nick Hill <nickhill123@gmail.com >
Co-authored-by: Nick Hill <nickhill123@gmail.com >
2026-03-30 10:16:09 -07:00
Matthew Bonanni
2c734ed0e0
[Bugfix][MLA] Change default SM100 MLA prefill backend back to TRT-LLM ( #38562 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-03-30 09:51:24 -07:00
Chendi.Xue
3b1dbaad4e
[HMA]Fix corner case when hybrid page_size can not be evenly divided issue (blk_size=64,tp=4) ( #37467 )
...
Signed-off-by: Chendi Xue <chendi.xue@intel.com >
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Signed-off-by: Chendi.Xue <chendi.xue@intel.com >
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2026-03-30 16:47:30 +00:00
Johnny
b4a2f3ac36
[NVIDIA] Bugfix NVFP4 DGX Spark and RTX50 ( #38423 )
...
Signed-off-by: johnnynunez <johnnynuca14@gmail.com >
Signed-off-by: Johnny <johnnynuca14@gmail.com >
2026-03-30 09:36:18 -07:00
roikoren755
8e6293e838
[Mamba] Add stochastic rounding support ( #35753 )
...
Signed-off-by: Roi Koren <roik@nvidia.com >
2026-03-30 12:33:49 -04:00
Hongxia Yang
dbdd9ae067
[ROCm][Bugfix] fix exception related to trust_remote_code for MiniMax-M2.1-MXFP4 ( #37698 )
...
Signed-off-by: Hongxia Yang <hongxiay.yang@amd.com >
Co-authored-by: Hongxia Yang <hongxiay.yang@amd.com >
2026-03-30 15:49:23 +00:00
Matthias Gehre
e8b055a5ac
[Bugfix] Handle ParallelLMHead in compressed-tensors get_quant_method ( #37291 )
...
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-03-30 07:30:52 -07:00
tomeras91
246dc7d864
[Misc] Add @tomeras91 as a maintainer of Nemotron related code + mamba block ( #38547 )
...
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com >
2026-03-30 21:12:17 +08:00
Thomas Parnell
7c3f88b2a8
[Bugfix] Remove false-positive format mismatch warnings in FLA ops ( #38255 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
2026-03-30 12:32:26 +00:00
Li, Jiang
6557f4937f
[Bugfix][CPU] Skip set_num_threads after thread binding ( #38535 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2026-03-30 20:13:00 +08:00
Andreas Karatzas
677424c7ac
[Core][CI] Add opt-in media URL caching via VLLM_MEDIA_CACHE ( #37123 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-30 04:58:53 -07:00
Collin McCarthy
1031c84c36
Fix ambiguous num_blocks for hybrid attn mamba ( #37236 )
...
Signed-off-by: Collin McCarthy <cmccarthy@nvidia.com >
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com >
Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com >
2026-03-30 11:09:45 +00:00
aliialsaeedii
7e76af14fa
[Bugfix][Frontend] Return 400 for corrupt/truncated image inputs instead of 500 ( #38253 )
...
Signed-off-by: aliialsaeedii <ali.al-saeedi@nscale.com >
2026-03-30 10:26:46 +00:00
yzong-rh
3683fe6c06
[Bugfix] Fix shared-object aliasing in n>1 streaming with tool calls ( #38158 )
...
Signed-off-by: Yifan Zong <yzong@redhat.com >
Signed-off-by: Yifan <yzong@redhat.com >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
2026-03-30 10:12:13 +00:00
Nicolò Lucchesi
cc06b4e86b
[Mamba][Bugfix] Raise on insufficient cache blocks instead of silently capping cudagraph sizes ( #38270 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-03-30 09:41:50 +00:00
TJian
03ac6ca895
[ROCm] [DOC] Update the Documentation to include ROCm Nightly Wheel support ( #38457 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2026-03-30 02:25:46 -07:00
haosdent
a08b7733fd
[CI] Fix SPLADE pooler test broken by #38139 ( #38495 )
...
Signed-off-by: haosdent <haosdent@gmail.com >
2026-03-30 07:48:33 +00:00
Tan Pin Siang
85c0950b1f
[ROCm] Enable MORI EP for unquantized MoE with AITER backend ( #37529 )
...
Signed-off-by: Tan Pin Siang <pinsiang.tan@amd.com >
2026-03-30 15:19:33 +08:00
Juan Pérez de Algaba
57861ae48d
(security) Fix SSRF in batch runner download_bytes_from_url ( #38482 )
...
Signed-off-by: jperezde <jperezde@redhat.com >
2026-03-30 07:10:01 +00:00
Jee Jee Li
ac30a8311e
[Bugfix][Model] Fix PixtralForConditionalGeneration LoRA ( #36963 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2026-03-29 23:59:42 -07:00
PikaPikachu
63babd17f1
[Model][Quantization] Add GGUF support for MiniMax-M2.1 ( #36965 )
...
Signed-off-by: kangletian <Letian.Kang@amd.com >
2026-03-30 14:24:06 +08:00
Kevin H. Luu
fec5aeca12
[ci] Soft fail and disable retry for AMD build image job ( #38505 )
...
Signed-off-by: Kevin H. Luu <khluu000@gmail.com >
2026-03-29 23:05:26 -07:00
Jaewon
d816834c1a
[MoE] Add RoutingMethodType.Simulated to TRT-LLM FP8/NVFP4 kernel allowlists ( #38329 )
...
Signed-off-by: Jaewon Lee <jaewon@meta.com >
2026-03-29 22:53:43 -07:00
Roger Wang
92f0db57a8
[Misc] Always use forward_mulmat for Conv3d on newer versions of torch. ( #38487 )
2026-03-30 05:39:41 +00:00
Andreas Karatzas
bea23536f6
[CI] Add temperature=0.0, reduce max_tokens, and add debug prints to audio_in_video tests ( #38492 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-30 05:36:45 +00:00
Jiangyun Zhu
c133f33746
Add @ZJY0516 to CODEOWNERS ( #38497 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2026-03-29 21:10:00 -07:00
Stanislav Kirillov
a6db99ba02
[Bugfix] Support multi-type params parsing for DeepSeek v3.2 ( #33703 )
...
Signed-off-by: Stanislav Kirillov <stas@nebius.com >
Co-authored-by: Stanislav Kirillov <stas@nebius.com >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
2026-03-30 04:07:28 +00:00
Andreas Karatzas
4f2ed5fddb
[ROCm][CI] Enable hybrid chunked prefill test ( #38317 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-30 10:30:26 +08:00
Kyle Sayers
d28d86e8a3
[QeRL] Fix online quantized reloading ( #38442 )
...
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com >
2026-03-29 14:56:41 -06:00
Wentao Ye
995dea1354
[Perf] Remove redundant device copies for CPU-only pooling token IDs, 48.9% E2E throughput improvement ( #38139 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-03-29 18:12:50 +00:00
allgather
8c0b6267d7
[Transformers v5] fix missing pixtral/voxtral multimodal dispatch ( #38410 )
...
Signed-off-by: allgather <all2allops@gmail.com >
2026-03-29 09:59:06 +00:00
Andreas Karatzas
43cc5138e5
[ROCm][CI] Fix cross-attention dispatch for encoder-decoder models ( #38450 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-28 22:08:03 -07:00
Shubhra Pandit
5b8c30d62b
[Spec Decode, BugFix] Propagate norm_before_fc from Eagle3 speculator ( #38111 )
...
Signed-off-by: Shubhra Pandit <shubhra.pandit@gmail.com >
2026-03-29 00:42:06 +00:00
haosdent
d39b8daf5f
[Feature] Add Qwen3-ForcedAligner support via token classification pooling ( #35367 )
...
Signed-off-by: haosdent <haosdent@gmail.com >
2026-03-29 00:27:52 +00:00
Walter Beller-Morales
fafca38adc
[BugFix][Frontend] apply task instruction as system prompt in cohere v2/embed ( #38362 )
...
Signed-off-by: walterbm <walter.beller.morales@gmail.com >
2026-03-28 18:30:54 +00:00
Kunshang Ji
aa4eb0db78
[CI]revert initialize_model context manager ( #38426 )
...
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com >
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-03-28 16:56:50 +00:00
Andreas Karatzas
af89140efc
[ROCm][CI] Fix UV install in Dockerfile.rocm to detect curl failures and retry ( #38415 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-29 00:47:42 +08:00
haosdent
b2bc736b12
[CI] Fix Ernie4.5-VL initialization test ( #38429 )
...
Signed-off-by: haosdent <haosdent@gmail.com >
2026-03-28 22:43:24 +08:00
whyiug
58c959a767
[Misc]: clean up non-core lint issues ( #37049 )
...
Signed-off-by: whyiug <whyiug@hotmail.com >
2026-03-28 10:28:16 -04:00
Bvicii
bda3eda82d
[Bugfix] Disallow renderer_num_workers > 1 with mm processor cache ( #38418 )
...
Signed-off-by: Bvicii <yizhanhuang2002@gmail.com >
2026-03-28 06:32:52 -07:00
Michael Goin
2bf5b70ae8
[CI Bugfix] Pre-download missing FlashInfer headers in Docker build ( #38391 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-03-28 06:09:00 -07:00
yzong-rh
6dad4c5722
[Test] Fix flaky race condition in test_abort_final_step ( #38414 )
...
Signed-off-by: Yifan <yzong@redhat.com >
2026-03-28 09:06:56 +00:00
Liwen
171775f306
Fix Device Index for ROCm Ray Workers in MoE Benchmark ( #38108 )
...
Signed-off-by: Liwen <53441624+li-liwen@users.noreply.github.com >
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com >
2026-03-28 08:27:11 +00:00
TJian
58a249bc61
[ROCm] [Release] Update ROCm variant from rocm700 to rocm721 ( #38413 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2026-03-28 06:07:03 +00:00
IriKa
148a5c1226
[Bugfix]fix output Nan/Inf in marlin if dtype=float16 ( #33972 )
...
Signed-off-by: IriKa Qiu <qiujie.jq@gmail.com >
2026-03-27 16:36:08 -07:00
Wei Zhao
b69bf2f0b1
[Perf] Use torch compile to fuse pack topk in trtllm moe ( #37695 )
...
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com >
Signed-off-by: Wei Zhao <51183510+wzhao18@users.noreply.github.com >
2026-03-27 17:30:46 -06:00
rongfu.leng
88149b635e
Add nvidia h800 moe config ( #31201 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
2026-03-27 16:28:48 -07:00
Hongxia Yang
83a4df049d
[ROCm][Documentation] update quickstart and installation to include rocm nightly docker tips ( #38367 )
...
Signed-off-by: Hongxia Yang <hongxiay.yang@amd.com >
Co-authored-by: Hongxia Yang <hongxiay.yang@amd.com >
2026-03-27 23:20:19 +00:00
Gregory Shtrasberg
731285c939
[ROCm][CI/Build] ROCm 7.2.1 release version; torch 2.10; triton 3.6 ( #38252 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2026-03-27 18:03:12 -05:00
Johnny
97d19197bc
[NVIDIA] Fix DGX Spark logic ( #38126 )
...
Signed-off-by: johnnynunez <johnnynuca14@gmail.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: Nick Hill <nickhill123@gmail.com >
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com >
Signed-off-by: Sathish Sanjeevi <sathish.krishnan.p.s@gmail.com >
Signed-off-by: guillaume_guy <guillaume.guy@airbnb.com >
Signed-off-by: Guillaume Guy <guillaume.c.guy@gmail.com >
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
Co-authored-by: Nick Hill <nickhill123@gmail.com >
Co-authored-by: Mark McLoughlin <markmc@redhat.com >
Co-authored-by: Andreas Karatzas <akaratza@amd.com >
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com >
Co-authored-by: Sathish Sanjeevi <SKPsanjeevi@users.noreply.github.com >
Co-authored-by: Guillaume Guy <guillaume.c.guy@gmail.com >
Co-authored-by: guillaume_guy <guillaume.guy@airbnb.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2026-03-27 15:26:07 -07:00
Giancarlo Delfin
384e4d5f48
[Model Runner V2] Rebuild attention metadata before eagle decode full… ( #38311 )
...
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai >
2026-03-27 13:46:42 -07:00
Nicolò Lucchesi
44a6528028
[CI] Skip failing test ( #38369 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-03-27 13:25:19 -07:00
Kyle Sayers
648edcf729
[QeRL] Compose online quantization with quantized reloading ( #38032 )
...
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com >
2026-03-27 13:22:33 -07:00
Michael Goin
7ba425e916
Add short flag -sc for --speculative-config argument ( #38380 )
...
Co-authored-by: Claude <noreply@anthropic.com >
2026-03-27 12:04:22 -07:00
Gregory Shtrasberg
b8665383df
[ROCm] Fix GPT-OSS import for triton 3.6 ( #37453 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2026-03-27 18:00:57 +00:00
Rohan Potdar
0e9358c11d
[ROCm]: gpt-oss fusion/padding fixes ( #38043 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com >
Signed-off-by: Rohan Potdar <66227218+Rohan138@users.noreply.github.com >
Co-authored-by: Andreas Karatzas <akaratza@amd.com >
2026-03-27 12:19:15 -04:00
Harry Mellor
21d2b53f88
Remove need for explicit \n in docstring lists for --help formatting ( #38350 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-27 08:38:00 -07:00
Jonas M. Kübler
98e7f223b9
enable skipping of SW attention layers when using FP8 KV cache ( #33695 )
...
Signed-off-by: Jonas Kuebler <kuebj@amazon.com >
2026-03-27 07:25:02 -06:00
Juan Pérez de Algaba
b111f8a61f
fix(security): Add VLLM_MAX_N_SEQUENCES environment variable and enforce limit ( #37952 )
...
Signed-off-by: jperezde <jperezde@redhat.com >
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: Russell Bryant <rbryant@redhat.com >
2026-03-27 09:02:10 -04:00
Sage Moore
497e234d38
[EPLB] Cleanup the transfer logic for the various eplb maps ( #34520 )
...
Signed-off-by: Sage Moore <sagmoore@redhat.com >
Signed-off-by: Sage Moore <sage@neuralmagic.com >
2026-03-27 10:18:46 +01:00
dtc
6287e7fa20
[P/D] Mooncake: Add unit tests and minor fixes for mooncake connector ( #36946 )
...
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com >
2026-03-27 09:26:40 +01:00
Shengqi Chen
84e439a9cb
[CI/Build] Move nightly wheel index generation to a single post-build step ( #38322 )
...
Signed-off-by: Shengqi Chen <harry-chen@outlook.com >
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
2026-03-27 07:44:18 +00:00
Yuichiro Utsumi
a1746ff9ec
[Doc] Clarify Helm chart location in deployment guide ( #38328 )
...
Signed-off-by: Yuichiro Utsumi <utsumi.yuichiro@fujitsu.com >
Signed-off-by: Yuichiro Utsumi <81412151+utsumi-fj@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-27 15:43:02 +08:00
Flora Feng
aee4c14689
[Bugfix] Fix Hermes tool parser when stream interval > 1 ( #38168 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com >
2026-03-27 14:42:26 +08:00
Bowen Bao
0ae89f18fd
[Refactor] Move FusedMoE hidden_size roundup to quant_method ( #34285 )
...
Signed-off-by: Bowen Bao <bowenbao@amd.com >
2026-03-26 23:38:26 -07:00
wenjun liu
c2b17d71af
[CI] Add xpu auto-label rule for Intel GPU/XPU PRs ( #38320 )
...
Signed-off-by: wendyliu235 <wenjun.liu@intel.com >
2026-03-27 14:22:38 +08:00
Li, Jiang
becaed6ec8
[CPU] Support CT W4A16 on CPU MP kernel ( #38219 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2026-03-27 14:15:28 +08:00
Xiaoshuang Wang
a8eab8f30d
[Model] Extract GatedDeltaNetAttention into shared layer for Qwen3Next and Qwen3.5 ( #37975 )
...
Signed-off-by: wxsIcey <1790571317@qq.com >
Signed-off-by: Icey <1790571317@qq.com >
2026-03-27 14:13:21 +08:00
cjackal
2babac0bed
[frontend] dump openai responses type by alias ( #38262 )
...
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com >
2026-03-27 05:58:20 +00:00
Or Ozeri
7cc302dd87
[kv_offload+HMA][7/N]: Support register_kv_caches for hybrid models ( #37853 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com >
2026-03-27 08:38:33 +03:00
Bvicii
999dfc1622
[Bugfix] Offload blocking tokenizer ops to shared thread pool to unblock event loop ( #34789 )
...
Signed-off-by: Bvicii <yizhanhuang2002@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-03-26 22:17:00 -07:00
wenjun liu
d86060122a
[CI/Build] enable Intel XPU test flow with prebuilt image ( #37447 )
...
Signed-off-by: wendyliu235 <wenjun.liu@intel.com >
2026-03-26 18:16:04 -07:00
Harry Mellor
f73bcb1c51
Various Transformers v5 config fixes ( #38247 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-26 23:06:59 +00:00
yzong-rh
28048bd6b0
[Bugfix] Add missing f-string prefix in xgrammar choices error message ( #38162 )
...
Signed-off-by: Yifan Zong <yzong@redhat.com >
2026-03-26 21:43:03 +00:00
Giancarlo Delfin
c32e97602d
[Model Runner V2] Enable forcing a specific acceptance rate during rejection sampling ( #38045 )
...
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai >
2026-03-26 13:38:12 -07:00
Wei Zhao
0904b6550d
Fix multi-node allreduce fusion ( #38136 )
...
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com >
Co-authored-by: root <root@theia0053.lyris.clusters.nvidia.com >
2026-03-26 20:24:36 +00:00
Stig-Arne Grönroos
f26fcdfb9e
[Bugfix][ROCm] Fix lru_cache on paged_mqa_logits_module ( #37547 )
...
Signed-off-by: Stig-Arne Grönroos <stig-arne.gronroos@amd.com >
2026-03-26 19:01:05 +00:00
TJian
bc9c6fbbe6
[ROCm] [Bugfix] [Release] Fix nightly rocm release pipeline ( #38263 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2026-03-26 18:47:10 +00:00
Andreas Karatzas
bff9a1c266
[ROCm][CI] Override PYTORCH_ROCM_ARCH with detected GPU arch in test containers ( #38165 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-26 18:33:45 +00:00
Andreas Karatzas
db01535e2b
[ROCm][CI] Add uv pip compile workflow for rocm-test.txt lockfile ( #37930 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-26 12:44:01 -05:00
jennyyyyzhen
a4cf9b22ba
[ROCM][Bugfix] Use correct stride in cp_mha_gather_cache_kernel for hybrid model ( #37228 )
...
Signed-off-by: jennyyyyzhen <yzhen@hmc.edu >
Co-authored-by: yZhen <yZhen@fb.com >
2026-03-26 10:33:39 -07:00
Andreas Karatzas
9c3ae04bfe
[ROCm][CI] Add LM Eval Qwen3.5 Models test for MI355 ( #38155 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-26 16:51:18 +00:00
Andreas Karatzas
a8e48a7b85
[CI] Fix conch kernel crash on 3D input by reshaping to 2D before GEMM ( #38178 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-26 11:46:03 -05:00
Divakar Verma
b9dbc5c4ab
[Mamba][APC] Add test case to compare apc outputs ( #34977 )
...
Signed-off-by: Divakar Verma <divakar.verma@amd.com >
2026-03-26 16:40:35 +00:00
TJian
60af7b967b
[Releases] [ROCm] Enable Nightly Docker Image and Wheel Releases for ROCm ( #37283 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
Co-authored-by: Hongxia Yang <hongxiay.yang@amd.com >
2026-03-26 16:32:25 +00:00
Andreas Karatzas
bdc1719eb9
[ROCm][CI] Fix AITER state leak in shared_fused_moe_routed_transform test ( #38137 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-26 09:26:46 -07:00
haosdent
0aac2048bf
[Bugfix] Restore CUDA graph persistent buffers for FP8 FlashMLA decode ( #35175 )
...
Signed-off-by: haosdent <haosdent@gmail.com >
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com >
2026-03-26 16:13:39 +00:00
Chuan (Richard) Li
cb2263218e
[Bugfix][Minor] Fix potential NameError in mamba backend selector and misc typos ( #35886 )
...
Signed-off-by: Li <chuali@amd.com >
2026-03-26 11:59:24 -04:00
Wentao Ye
e054f152fa
[CI] Add batch invariant test for b200 ( #38014 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-03-26 11:54:54 -04:00
zhang-prog
0f5b526040
[Fix] Remove unused packing_position_embedding from PaddleOCRVL for better checkpoint compatibility ( #38232 )
...
Signed-off-by: zhangyue66 <zhangyue66@baidu.com >
2026-03-26 15:34:49 +00:00
Zhewen Li
be1a85b7a2
Revert "[MoE Kernel] Flashinfer nvfp4 cutedsl moe kernel integration" ( #38050 ) ( #38169 )
...
Co-authored-by: Zhewen Li <zhewenli@inferact.ai >
2026-03-26 07:59:09 -07:00
Cyrus Leung
2e225f7bd2
[Renderer] Consolidate factory methods ( #38218 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-03-26 12:19:22 +00:00
Jared Wen
757eafcf37
[bug-fix] GLM OCR Patch Merger context_dim ( #37962 )
...
Signed-off-by: JaredforReal <w13431838023@gmail.com >
2026-03-26 05:11:21 -07:00
wang.yuqi
dcdc145893
[CI] Reorganize scoring tests ( #38207 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-03-26 12:07:01 +00:00
Andreas Karatzas
f2d16207c7
[ROCm][CI] Fix flaky GPTQ compile correctness test ( #38161 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-26 19:57:00 +08:00
Andreas Karatzas
37a83007fe
[ROCm][CI] Fix wvSplitKrc mock argument order in test_rocm_unquantized_gemm ( #38167 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-26 19:54:59 +08:00
Wentao Ye
bf5eec638d
[Refactor] Remove unused utils ( #38153 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-03-26 17:08:19 +08:00
Mateusz Sokół
b1cb1d3d2c
DOC: Documentation pages fixes ( #38125 )
...
Signed-off-by: Mateusz Sokół <mat646@gmail.com >
2026-03-26 16:55:42 +08:00
Kunshang Ji
6ae8bbd0c2
[XPU] Disable xpu graph by default ( #38193 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2026-03-26 01:53:45 -07:00
Cyrus Leung
a9213c0ffe
[Doc] Fix outdated reference to CUDAGraphManager ( #38209 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-03-26 01:52:38 -07:00
Cyrus Leung
502c41a8f6
[Model] Use helper function to run MM processors with token inputs (where applicable) ( #38018 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-03-26 16:44:04 +08:00
Vadim Gimpelson
52069012fe
[Bugfix] Fix DeepGemm E8M0 accuracy degradation for Qwen3.5 FP8 on Blackwell ( #38083 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2026-03-26 01:21:47 -07:00
Fadi Arafeh
71161e8b63
[cpu][ci] remove soft-fail for Arm CI and add quant model tests ( #37691 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
2026-03-26 07:03:31 +00:00
Terry Gao
38de822310
[Model] Add torch.compile support for InternVL vision encoder ( #38049 )
...
Signed-off-by: tianrengao <terrygao87@gmail.com >
2026-03-25 23:52:29 -07:00
Jee Jee Li
2bfbdca23c
[Bugfix] Fix benchmark_fused_collective.py ( #38082 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2026-03-25 23:51:00 -07:00
Matej Rojec
2908094567
Add /v1/chat/completions/batch endpoint for batched chat completions ( #38011 )
...
Signed-off-by: Matej Rojec <64556640+MatejRojec@users.noreply.github.com >
2026-03-26 12:13:33 +08:00
BadrBasowid
e6bf9f15ec
[Bugfix][CI] Fix Marlin FP8 Linear Kernel for Compressed Tensors Format ( #38092 )
...
Signed-off-by: BadrBasowid <Badr.Basowid@gmail.com >
Signed-off-by: BadrBasowid <61441185+BadrBasowid@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-25 21:11:43 -07:00
Woosuk Kwon
144030c84e
Relocate Encoder CUDA graph manager ( #38116 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
Signed-off-by: Nick Hill <nickhill123@gmail.com >
Co-authored-by: Nick Hill <nickhill123@gmail.com >
2026-03-25 20:52:12 -07:00
Flora Feng
e2db2b4234
[Tool Parser][1/3] Pass tools to ToolParser constructor ( #38029 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com >
2026-03-26 10:29:06 +08:00
Chauncey
87f05d6880
[Revert] Remove DeepGEMM availability check in DeepseekV32IndexerMetadataBuilder ( #38076 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-03-26 01:43:51 +00:00
Andreas Karatzas
36f6aede23
[Misc] Optimized check to encapsulate both CUDA and ROCm platforms ( #34549 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-26 09:43:07 +08:00
Xin Yang
9704a5c310
Disable dual stream execution of input projection for Qwen3 ( #38152 )
...
Signed-off-by: Xin Yang <xyangx@amazon.com >
2026-03-26 01:20:39 +00:00
Wei Zhao
74056039b7
Fix minimax m2.5 nvfp4 kv scales weight loading ( #37214 )
...
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com >
2026-03-26 00:48:06 +00:00
Jacob Platin
d7d51a7ee5
[Bugfix] Fix Qwen3.5-FP8 Weight Loading Error on TPU ( #37348 )
...
Signed-off-by: Jacob Platin <jacobplatin@google.com >
2026-03-26 00:46:01 +00:00
Harry Mellor
3c3c084240
Various Transformers v5 fixes ( #38127 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-26 00:10:08 +00:00
Ekagra Ranjan
7b54f60db0
[Cohere] Enable Cohere-Transcribe ( #38120 )
...
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com >
2026-03-25 16:13:51 -07:00
Rohan Potdar
a0e8c74005
[ROCm]: Update rope+kvcache fusion conditions and disable custom op by default ( #36716 )
...
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com >
2026-03-25 20:58:44 +00:00
Guillaume Guy
70a2152830
[MultiModal] add support for numpy array embeddings ( #38119 )
...
Signed-off-by: guillaume_guy <guillaume.guy@airbnb.com >
Signed-off-by: Guillaume Guy <guillaume.c.guy@gmail.com >
Co-authored-by: guillaume_guy <guillaume.guy@airbnb.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2026-03-25 20:13:04 +00:00
Sathish Sanjeevi
978fc18bf0
[ROCm] Utilize persistent MLA kernel from AITER ( #36574 )
...
Signed-off-by: Sathish Sanjeevi <sathish.krishnan.p.s@gmail.com >
2026-03-26 03:00:42 +08:00
Andreas Karatzas
7d6917bef5
[ROCm] Fix MoE kernel test failures on gfx950 ( #37833 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com >
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com >
2026-03-25 13:46:40 -05:00
Mark McLoughlin
e38817fadb
[Core][KV Connector] Remove use of num_cached_tokens in error handling ( #38096 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2026-03-25 18:20:48 +00:00
Nick Hill
72cad44d3c
[Frontend] Move APIServerProcessManager target server fn ( #38115 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-03-25 18:14:41 +00:00
Cyrus Leung
ba2f0acc2d
[Misc] Reorganize inputs ( #35182 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-03-25 10:22:54 -07:00
Yongye Zhu
678b3c99e8
[MoE Kernel] Flashinfer nvfp4 cutedsl moe kernel integration ( #38050 )
2026-03-25 10:16:40 -07:00
mikaylagawarecki
bf4cc9ed2d
[2/n] Migrate per_token_group_quant to torch stable ABI ( #36058 )
...
Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com >
2026-03-25 10:15:13 -07:00
Ben Browning
1ac2ef2e53
[CI/Docs] Improve aarch64/DGX Spark support for dev setup ( #38057 )
...
Signed-off-by: Ben Browning <bbrownin@redhat.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-25 09:24:42 -07:00
Richard Zou
6e37c46b35
[compile] Add some more startup tests for top models ( #38046 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2026-03-25 12:02:22 -04:00
Wentao Ye
1bf2ddd0ee
[Refactor] Rename WAITING_FOR_FSM to WAITING_FOR_STRUCTURED_OUTPUT_GRAMMAR ( #38048 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-03-25 11:41:44 -04:00
Necofish
e7221180e1
[Kernel] Optimize SM120 CUTLASS blockwise FP8 GEMM ( #37970 )
...
Signed-off-by: Necofish <liuxiangyang@mail.ustc.edu.cn >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-03-25 08:20:04 -07:00
RobTand
4a76ad12e0
[Bugfix] Preserve CUDA arch suffix (a/f) for SM12x — fixes NVFP4 NaN on desktop Blackwell ( #37725 )
...
Signed-off-by: Rob Tand <robert.tand@icloud.com >
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
2026-03-25 08:18:25 -07:00
Wentao Ye
d7e93e13fb
[Feature] EPLB Support for GPU Model Runner v2 ( #37488 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
Co-authored-by: Woosuk Kwon <woosuk@inferact.ai >
2026-03-25 08:16:39 -07:00
Andrii Skliar
cd7643015e
[Feature] Support per-draft-model MoE backend via --speculative-config ( #37880 )
...
Signed-off-by: Andrii Skliar <askliar@nvidia.com >
Signed-off-by: [Andrii Skliar] <askliar@nvidia.com >
Co-authored-by: Andrii Skliar <askliar@nvidia.com >
2026-03-25 14:31:52 +00:00
Ben Browning
a1a2566447
[Docs] Add guide for editing agent instruction files ( #37819 )
...
Signed-off-by: Ben Browning <bbrownin@redhat.com >
2026-03-25 13:54:09 +00:00
yjz
b745e8b5d3
[KVTransfer][Mooncake] Add heterogeneous TP support for disaggregated P/D in MooncakeConnector ( #36869 )
...
Signed-off-by: JianDan0212 <zhangyj0212@gmail.com >
2026-03-25 14:24:07 +01:00
Harry Mellor
d215d1efca
[Mypy] Better fixes for the mypy issues in vllm/config ( #37902 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-25 06:14:43 -07:00
Fadi Arafeh
34d317dcec
[CPU][UX][Perf] Enable tcmalloc by default ( #37607 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
2026-03-25 20:39:57 +08:00
grYe99
7ac48fd357
[Model] Add AutoWeightsLoader support for jais ( #38074 )
...
Signed-off-by: grYe99 <guorongye99@gmail.com >
Co-authored-by: grYe99 <guorongye99@gmail.com >
2026-03-25 12:38:40 +00:00
Harry Mellor
d6bb2a9d9a
Fix Plamo 2/3 & LFM2 for Transformers v5 ( #38090 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-25 12:29:49 +00:00
Harry Mellor
1e673a43ce
Better weight tying check for multimodal models ( #38035 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-25 12:07:23 +00:00
Andreas Karatzas
04417ecd5f
[ROCm][CI] Rename filepath test to point to correct file ( #38102 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-25 20:05:46 +08:00
R0CKSTAR
242c93f744
[Docs] Adds vllm-musa to custom_op.md ( #37840 )
...
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com >
2026-03-25 11:54:36 +00:00
Matthias Gehre
a889b7f584
[Bugfix] Pass drafter quant_config to ParallelLMHead in Eagle3 ( #37280 )
...
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com >
2026-03-25 11:42:58 +00:00
Harry Mellor
ba2910f73a
Fix offline mode test for Transformers v5 ( #38095 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-25 11:39:48 +00:00
Andreas Karatzas
f262a62aa1
[ROCm][CI] Fix flaky Cohere/OpenAI embedding parity test ( #37616 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-25 10:55:51 +00:00
Andreas Karatzas
9ac2fcafbb
[CI] Fix realtime WebSocket timeout deadlock and unhandled model validation errors ( #37483 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-25 11:24:33 +01:00
Kunshang Ji
e9ae3f8077
[Hardware][XPU] Align memory usage with cuda on xpu ( #37029 )
...
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com >
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2026-03-25 18:14:29 +08:00
Andreas Karatzas
04cec4f927
[ROCm][CI] Increase OpenAPI schema test timeouts ( #38088 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-25 18:06:58 +08:00
Kunshang Ji
14771f7150
[XPU] support MLA model on Intel GPU ( #37143 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2026-03-25 17:43:42 +08:00
Gregory Shtrasberg
189ddefbfd
[ROCm] Attention selector reordering ( #36702 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
Co-authored-by: Micah Williamson <micah.williamson@amd.com >
2026-03-25 17:42:56 +08:00
Chauncey
09c3dc9186
[Revert] Remove CUDA torch fallbacks for fp8_mqa_logits/fp8_paged_mqa_logits_torch function ( #37968 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-03-25 06:19:37 +00:00
vllmellm
42e9547976
[ROCm][Test] Fix ROCM_AITER_UNIFIED_ATTN attn+quant fusion test ( #37640 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2026-03-25 05:06:15 +00:00
Chauncey
a32783bb35
[Bugfix] Fix IndexError when accessing prev_tool_call_arr in OpenAIToolParser ( #37958 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-03-25 12:06:21 +08:00
Baorun (Lauren) Mu
9d0351c91d
[Docs] Add Encoder (ViT) CUDA Graphs section to CUDA Graphs design doc ( #37914 )
...
Signed-off-by: Baorun Mu <bmu@nvidia.com >
2026-03-24 19:53:24 -07:00
Artem Perevedentsev
a93a53f8a1
[Performance] Auto-enable prefetch on NFS with RAM guard ( #37673 )
...
Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com >
2026-03-24 17:31:14 -07:00
Andreas Karatzas
679c6a3ecc
[Bugfix][ROCm][MoE] Fix mxfp4 oracle regressions from #37128 ( #37787 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-25 08:17:33 +08:00
Andreas Karatzas
8bbb7c7f20
[ROCm][CI][PD] Add Hybrid SSM integration tests to CI ( #37924 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-25 07:58:39 +08:00
Kevin H. Luu
af945615b5
[release] Move the rest of release jobs to release queue ( #38044 )
...
Signed-off-by: khluu <khluu000@gmail.com >
2026-03-24 16:40:58 -07:00
Terry Gao
82580b10ac
[Perf] Disable inductor runtime asserts by default for serving perfor… ( #37485 )
...
Signed-off-by: tianrengao <terrygao87@gmail.com >
Co-authored-by: Tianren Gao <tianren@fb.com >
2026-03-24 19:37:51 -04:00
Netanel Haber
a0d487b2e1
nano_nemotron_vl: suppress readonly torch.from_numpy() warning in image and video resize paths ( #37903 )
...
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com >
2026-03-24 23:25:56 +00:00
Junhao
b73b5b0629
Make microbatch optimization (DBO) work with general models ( #37926 )
...
Signed-off-by: Junhao Li <junhao@ubicloud.com >
2026-03-24 14:40:08 -07:00
Michael Goin
0f0e03890e
[UX] Add flashinfer-cubin as CUDA default dep ( #37233 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-03-24 14:13:08 -07:00
Woosuk Kwon
4b53740d7f
[MRV2] Fix for DS v3.2 ( #38030 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-03-24 14:03:24 -07:00
Nick Hill
4e824d1c83
[Model Runner V2][Minor] Simplify PP logic ( #38031 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-03-24 13:57:17 -07:00
amey asgaonkar
0c1809c806
Add Ubuntu 24.04 support for Docker builds ( #35386 )
...
Signed-off-by: aasgaonkar <aasgaonkar@nvidia.com >
2026-03-24 13:34:44 -07:00
liangel-02
8c47fdfdb1
[FlexAttention] allow custom mask mod ( #37692 )
...
Signed-off-by: Angel Li <liangel@meta.com >
2026-03-24 16:03:24 -04:00
Javier De Jesus
54b0578ada
[Bugfix] Pass hf_token through config loading paths for gated model support ( #37920 )
...
Signed-off-by: javierdejesusda <javier.dejesusj9@gmail.com >
2026-03-24 15:22:05 -04:00
Richard Zou
89f572dbc0
[BugFix] fix VLLM_USE_STANDALONE_COMPILE=0 ( #38015 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2026-03-24 19:08:26 +00:00
Richard Zou
71a4a2fbd0
[BugFix] Fix order of compile logging ( #38012 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2026-03-24 18:58:18 +00:00
Nick Cao
935c46dd9b
[Model] Add Granite 4.0 1B speech to supported models ( #38019 )
...
Signed-off-by: Nick Cao <ncao@redhat.com >
2026-03-24 18:23:41 +00:00
Willy Hardy
057fc94cbd
[Bugfix] Fix structured output crash on CPU due to pin_memory=True ( #37706 )
...
Signed-off-by: Willy Hardy <whardy@redhat.com >
Signed-off-by: Will Hardy <whardy@redhat.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
2026-03-24 17:44:17 +00:00
Vineeta Tiwari
b58c5f28aa
docs: fix broken offline inference paths in documentation ( #37998 )
...
Signed-off-by: Vineeta Tiwari <vineeta.tiwari2@ibm.com >
Signed-off-by: Vineeta Tiwari <vineetatiwari2000@gmail.com >
Co-authored-by: Vineeta Tiwari <vineeta.tiwari2@ibm.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-24 17:35:14 +00:00
Ming Yang
c07e2ca6e0
Fix Mamba state corruption from referencing stale block table entries ( #37728 )
2026-03-24 10:29:59 -07:00
Dhruv Singal
4df5fa7439
[Bugfix] Force continuous usage stats when CLI override is enabled ( #37923 )
...
Signed-off-by: Your Name <you@example.com >
Co-authored-by: Your Name <you@example.com >
Co-authored-by: OpenCode <noreply@openai.com >
2026-03-24 10:29:50 -07:00
sihao_li
a5416bc52e
[XPU] Support Intel XPU hardware information collection in usage stats ( #37964 )
...
Signed-off-by: sihao.li <sihao.li@intel.com >
2026-03-24 10:29:17 -07:00
Harry Mellor
b3601da6e7
[Mypy] Fix mypy for vllm/model_executor (except vllm/model_executor/layers) ( #37904 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-24 17:14:01 +00:00
Dan Blanaru
dc78c2c933
[Core] add option to schedule requests based on full ISL ( #37307 )
...
Signed-off-by: Dan Blanaru <48605845+DanBlanaru@users.noreply.github.com >
Co-authored-by: Claude <noreply@anthropic.com >
2026-03-24 13:01:12 -04:00
Sungjae Lee
4731884796
[Feature] limit thinking tokens (hard limit) ( #20859 )
...
Signed-off-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com >
Signed-off-by: Sungjae Lee <sung-jae.lee@navercorp.com >
Signed-off-by: Chauncey <chaunceyjiang@gmail.com >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-24 09:53:07 -07:00
Harry Mellor
8de5261e69
Update new contributor message ( #37999 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-24 16:01:41 +00:00
wang.yuqi
1b6cb920e6
[Deprecate] Deprecate pooling multi task support. ( #37956 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Signed-off-by: wang.yuqi <noooop@126.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2026-03-24 14:07:47 +00:00
Li, Jiang
352b90c4a4
[Bugfix] Add replacement of _compute_slot_mapping_kernel on CPU ( #37987 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2026-03-24 07:00:20 -07:00
Sage
1c0aabdeb0
[Bugfix] Suppress spurious CPU KV cache warning in launch render ( #37911 )
...
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com >
2026-03-24 12:36:18 +00:00
Ilya Markov
14acf429ac
[EPLB] Remove main waits in case of slow EPLB ( #36271 )
...
Signed-off-by: ilmarkov <markovilya197@gmail.com >
2026-03-24 11:50:44 +00:00
Harry Mellor
ce57fd5557
[Docs] Fix build ( #37991 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-24 03:20:49 -07:00
Flora Feng
2e67fa756d
Fix tool_parser_cls type annotation from Callable to type[ToolParser] ( #37957 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com >
2026-03-23 22:58:27 -07:00
Ronen Schaffer
e3c6c10cad
[KV Offload] Refactor CPU offloading: pluggable CachePolicy, remove Backend abstraction, restructure into cpu/ package ( #37874 )
...
Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com >
2026-03-24 07:02:51 +02:00
jetxa
16a664df24
[Frontend][Bugfix] Pass default_chat_template_kwargs to AnthropicServingMessages ( #37899 )
...
Signed-off-by: jetxa <jetxzhang@outlook.com >
2026-03-24 05:00:12 +00:00
Kevin H. Luu
7281199a8c
[release] Move agent queue to Release cluster queues ( #37783 )
...
Signed-off-by: khluu <khluu000@gmail.com >
2026-03-23 20:36:47 -07:00
Kevin H. Luu
b2dd75eb48
Downsize CPU jobs to use small queue ( #37913 )
...
Signed-off-by: khluu <khluu000@gmail.com >
Signed-off-by: Kevin H. Luu <khluu000@gmail.com >
2026-03-23 20:36:37 -07:00
Wentao Ye
c59a132f96
[V0 Deprecation] Refactor kv cache from list to element ( #37487 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-03-23 20:10:11 -07:00
Andreas Karatzas
de99d91ece
[ROCm][CI] Split Entrypoints Integration (API Server 1) into 3 jobs ( #37906 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-24 09:48:37 +08:00
Wentao Ye
83c9d525b6
[CI] Add batch invariant test: Block FP8 + small MOE ( #37895 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-03-23 21:16:14 -04:00
Giancarlo Delfin
8f4824b664
[Model Runner V2] Gather multimodal embeddings before draft model postprocess ( #37932 )
...
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai >
2026-03-23 18:14:13 -07:00
roikoren755
56777b5c89
[Test] E2E Nemotron-3-Super tests ( #36803 )
...
Signed-off-by: Roi Koren <roik@nvidia.com >
2026-03-23 17:49:56 -07:00
Kevin H. Luu
2488a82f89
[CI] Split V1 Others into 3 separate jobs ( #37016 )
...
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
2026-03-24 06:44:38 +08:00
Ranran
dc6908ac6a
[Bugfix] Register VLLM_BATCH_INVARIANT in envs.py to fix spurious unknown env var warning ( #35007 )
...
Signed-off-by: Ranran <1012869439@qq.com >
Signed-off-by: Ranran <hzz5361@psu.edu >
Signed-off-by: ran <hzz5361@psu.edu >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2026-03-23 18:31:14 -04:00
yzong-rh
e85f8f0932
[Bug][MoE] Strengthen _supports_current_device() checks in the TRTLLM FP8, NVFP4, and FlashInfer CuteDSL MoE experts ( #36728 )
...
Signed-off-by: Yifan Zong <yzong@redhat.com >
2026-03-23 17:02:57 -04:00
Robert Shaw
5bf3c42d4c
[Bug][MoE] Fix TRTLLM NVFP4 Routing Kernel Precision ( #36725 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2026-03-23 20:19:06 +00:00
Kyle Sayers
38364a7e32
[Sparse24] [Deprecation] Remove Sparse24 CT integration and kernels ( #36799 )
...
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com >
2026-03-23 16:03:29 -04:00
Matthew Bonanni
fafe76b4af
[Async][Spec Decoding] Zero-bubble async scheduling + spec decoding ( #32951 )
...
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com >
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Co-authored-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com >
Co-authored-by: zhrrr <43847754+izhuhaoran@users.noreply.github.com >
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com >
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com >
2026-03-23 15:37:22 -04:00
Woosuk Kwon
ffb5b32b5f
[MRV2] Consider spec decoding in warmup ( #37812 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
Signed-off-by: Nick Hill <nickhill123@gmail.com >
Co-authored-by: Nick Hill <nickhill123@gmail.com >
2026-03-23 17:45:43 +00:00
Kunshang Ji
91fd695b75
[CI] split Entrypoints Integration (API Server 1) into 3 jobs ( #37882 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2026-03-23 10:37:56 -07:00
Nicolò Lucchesi
1cbbcfe8a3
[CI][PD] Add Hybrid SSM integration tests to CI ( #37657 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-03-23 23:58:19 +08:00
Angela Yi
aceadb5ee1
Use lazy graph module during split_module to defer recompile() ( #37609 )
...
Signed-off-by: angelayi <yiangela7@gmail.com >
2026-03-23 11:21:29 -04:00
Yufeng He
ec2280611a
[Bugfix] Fix RoBERTa position_ids accumulation on CUDA graph padding ( #37884 )
2026-03-23 15:15:12 +00:00
yanghui1-arch
7151ae6528
[Bugfix] RoBERTa position_id accumulation in CUDA graph padding region ( #37873 )
...
Signed-off-by: dass90 <3053034939@qq.com >
2026-03-23 14:59:21 +00:00
Wentao Ye
45bd5c8e75
[Mypy] Fix mypy for vllm/config ( #37808 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-03-23 14:33:59 +00:00
Zhaodong Bing
10a1018c12
[ROCm] fix sleep mode not releasing GPU memory problem on ROCm ( #37533 )
...
Signed-off-by: bingzhaodong <aaab8b@gmail.com >
Co-authored-by: TJian <tunjian.tan@embeddedllm.com >
2026-03-23 06:07:19 -07:00
Jee Jee Li
aec2dc6c0d
[Bugfix][LoRA] Fix incorrect LoRA Log ( #37877 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2026-03-23 11:42:52 +00:00
DorBernsohn
7938d12119
[Bugfix] Fix CPU backend crash in KV cache block zeroing ( #37550 )
...
Signed-off-by: DorBernsohn <dor.bernsohn@gmail.com >
2026-03-23 11:35:45 +00:00
Kunshang Ji
debd6e768c
[XPU][MoE Refactor] Refactor xpu mxfp4 support into oracle ( #37784 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2026-03-23 11:10:41 +00:00
Andrew Xia
9ace378a63
[Frontend][Responses API] Fix arrival_time recording for TTFT on initial request ( #37498 )
...
Signed-off-by: Andrew Xia <axia@meta.com >
2026-03-23 09:58:08 +00:00
Kunshang Ji
27d5ee3e6f
[FP8]add FP8 WoQ kernel abstraction. ( #32929 )
...
Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com >
2026-03-23 09:47:47 +00:00
wangxiyuan
35141a7eed
[Misc]Update gitignore ( #37863 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2026-03-23 01:14:10 -07:00
Chuan (Richard) Li
e99fb98867
[ROCm] Fix fused_moe_fake signature mismatch and other AITER bugs ( #36100 )
...
Signed-off-by: Li <chuali@amd.com >
2026-03-23 15:48:31 +08:00
Artem Perevedentsev
a16133a0f1
[Perf] [Bugfix] Fix Triton autotuning in inference for Qwen3.5 ( #37338 )
...
Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com >
2026-03-23 00:37:58 -07:00
Hojin Yang
54ab804e87
[Bugfix] Store Qwen3Next A_log in fp32 ( #37810 )
...
Signed-off-by: effortprogrammer <yhjhoward7@gmail.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2026-03-23 15:36:57 +08:00
r266-tech
02e6efe56d
[Bugfix] JAIS: Only apply ALiBi when position_embedding_type='alibi' ( #37820 )
...
Co-authored-by: r266-tech <r266-tech@users.noreply.github.com >
2026-03-23 07:36:34 +00:00
Matthias Gehre
410d300893
[ROCm][Refactor] Enable AWQMarlinConfig on ROCm to use choose_mp_linear_kernel ( #36505 )
...
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-03-23 15:36:08 +08:00
Yan Ma
d3fe857135
update doc for online fp8 quantization ( #37851 )
...
Signed-off-by: Yan Ma <yan.ma@intel.com >
2026-03-23 05:19:03 +00:00
Baorun (Lauren) Mu
f85e479e66
[Feature] ViT Full CUDA Graph ( #35963 )
...
Signed-off-by: Baorun Mu <bmu@nvidia.com >
2026-03-23 13:01:10 +08:00
Jee Jee Li
1f0d210641
[CI/Build][LoRA] Update Qwen35 LoRA testing ( #37816 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2026-03-23 12:55:49 +08:00
Ben Browning
3bbe2e1e6e
[Test] Consolidate tool parser unit tests to tests/tool_parsers ( #37834 )
...
Signed-off-by: Ben Browning <bbrownin@redhat.com >
2026-03-23 04:24:25 +00:00
Augusto Yao
6e04e79326
always use embed&token_classify for bge-m3 ( #37632 )
...
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com >
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-03-23 03:10:57 +00:00
Lasha Koroshinadze
e7767eccae
Fix AudioFlamingo3/MusicFlamingo HF parity and RoTE handling ( #37643 )
...
Signed-off-by: Lasha <26011196+lashahub@users.noreply.github.com >
2026-03-23 10:29:07 +08:00
Woosuk Kwon
43877a620b
[MRV2] Enable PP CUDA graph test ( #37830 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-03-22 16:30:25 -07:00
zhanqiuhu
63f49b8bd4
[Model Runner V2] Enable piecewise CUDA graphs for pipeline parallelism ( #35162 )
...
Signed-off-by: Zhanqiu Hu <zh338@cornell.edu >
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
Co-authored-by: Woosuk Kwon <woosuk@inferact.ai >
2026-03-22 20:48:25 +00:00
Woosuk Kwon
a5e9d511de
[MRV2] Use FP64 for Gumbel noise ( #37798 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-03-22 12:28:10 -07:00
Yongye Zhu
c058ff44d4
[Bugfix] fix lora test by passing padded size back to the layer ( #37811 )
2026-03-22 13:20:13 -06:00
Woosuk Kwon
ce9b1d76cf
[MRV2] Skip hidden states allocation for PW CUDA graphs ( #37818 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-03-22 11:47:21 -07:00
Netanel Haber
e74c17e153
Enable NemotronHPuzzle + NemotronHMTP ( #37803 )
2026-03-22 15:13:58 +00:00
Wentao Ye
eaf4978621
[Test] Only Run MLA model when user explicitly set for batch invariance ( #37719 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-03-22 09:09:12 -04:00
Wentao Ye
77d24c4bfe
[Bug] Fix fp8 deepgemm batch invariant ( #37718 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-03-22 08:57:20 -04:00
Giancarlo Delfin
b3e846017d
[Model Runner V2] Support multi-modal embeddings for spec decode model ( #36097 )
...
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai >
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
Co-authored-by: Woosuk Kwon <woosuk@inferact.ai >
2026-03-22 02:48:43 -07:00
Andreas Karatzas
cd1242d82a
[ROCm][CI] Stabilize ROCm speech-to-text translation test with lower min acc threshold ( #37723 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-22 17:32:08 +08:00
Robert Shaw
4383f1532e
[MoE] Move PF Methods to Folder ( #35927 )
2026-03-22 02:42:59 -06:00
Andreas Karatzas
6eedec6e36
[ROCm][CI] Make some duplicated tests optional so that they are only evaluated in our nightly ( #37780 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-22 16:03:18 +08:00
Andreas Karatzas
ffc8531524
[ROCm][CI] Added missing resampy dependency for MM audio tests ( #37778 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-22 16:02:41 +08:00
Andreas Karatzas
6ecba840d7
[ROCm][CI] get_cu_count was renamed to num_compute_units in #35042 ( #37764 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-22 16:02:21 +08:00
Andreas Karatzas
3b06c55c78
[ROCm][CI] Fix MEGA_AOT_ARTIFACT fallback when PyTorch < 2.10.0 lacks AOT support ( #37763 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-22 16:02:03 +08:00
Yang Liu
b050700462
[Perf] Optimize glm4.xv VIT ( #37779 )
...
Signed-off-by: Yang <lymailforjob@gmail.com >
2026-03-22 06:12:34 +00:00
Andreas Karatzas
5dac719b2b
[Bugfix] Handle libsndfile sf_error(NULL) race condition in audio fallback ( #37782 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-22 13:37:29 +08:00
Andreas Karatzas
c862481c02
[CI] Skip ISAAC multimodal tests due to broken upstream HF model weights ( #37781 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-22 13:23:32 +08:00
Andreas Karatzas
c86b17cfe6
[ROCm][CI] Add large_gpu_mark to test_max_tokens_none for ROCm ( #37717 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-22 12:25:16 +08:00
Andreas Karatzas
66f927f205
[Bugfix] Fix pooling non-determinism from pinned prompt_lens aliasing ( #37775 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-22 03:22:24 +00:00
Andreas Karatzas
e78bc74268
[ROCm][CI] close missing quote in kernels/moe block in run-amd-test.sh ( #37774 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-22 09:42:34 +08:00
Robert Shaw
6b2fa3a762
[MoE] Move FlashInfer CuteDSL experts into fused_moe/experts/ ( #37759 )
...
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com >
2026-03-21 19:15:16 -04:00
Robert Shaw
eeee5b262d
[Quantization][Deprecation] Remove PTPC FP8 ( #32700 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2026-03-21 22:10:16 +00:00
Robert Shaw
5ad0446572
Revert "Consolidate AWQ quantization into single awq_marlin.py file" ( #37768 )
2026-03-21 17:20:41 -04:00
Robert Shaw
8cc700dd6a
Consolidate AWQ quantization into single awq_marlin.py file
...
Merge awq.py and awq_marlin.py into a single file, eliminating the
circular import between them. awq.py becomes a backward-compat shim.
Follows the same structure as gptq_marlin.py.
Co-authored-by: Claude
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com >
2026-03-21 17:09:17 -04:00
Brandon Pelfrey
80b70884eb
Add tensor IPC transfer mechanism for multimodal data ( #32104 )
...
Signed-off-by: Brandon Pelfrey <bpelfrey@nvidia.com >
Signed-off-by: Brandon Pelfrey <brandonpelfrey@gmail.com >
Signed-off-by: Nick Hill <nickhill123@gmail.com >
Co-authored-by: Nick Hill <nickhill123@gmail.com >
2026-03-21 20:10:20 +00:00
Mohammad Miadh Angkad
61e381dcf0
[Perf] Add SM 10.3 (B300/GB300) all-reduce communicator tuning ( #37756 )
...
Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com >
2026-03-21 19:43:47 +00:00
Mohammad Miadh Angkad
88f1b374f5
[Core] Enable allreduce fusion by default for SM 10.3 (B300/GB300) ( #37755 )
...
Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com >
2026-03-21 19:40:37 +00:00
Francesco Fusco
298e510848
[Hybrid] calling get_mamba_groups() once at MambaCopyBuffers.create() ( #37318 )
...
Signed-off-by: Francesco Fusco <ffu@zurich.ibm.com >
2026-03-21 09:29:43 +00:00
Chaitanya Sri Krishna Lolla
3982bc2cd0
[ROCm] Enable DeepEP ROCm as all2allbackend for AMD GPUs. ( #34692 )
...
Signed-off-by: Tej Kiran <vpolamre@amd.com >
Co-authored-by: Tej Kiran <vpolamre@amd.com >
2026-03-21 00:32:31 -07:00
Andreas Karatzas
02eec7ecbe
[ROCm][CI] Update GSM8K eval config to use fp8-and-mixed models list (MI355) ( #37721 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-21 15:27:12 +08:00
Bongwoo Bak
17ee641c45
[Responses API] Add kv_transfer_params for PD disaggregation ( #37424 )
...
Signed-off-by: bongwoobak <bongwoobak@gmail.com >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
2026-03-21 13:48:54 +08:00
Andreas Karatzas
0d50fa1db6
[ROCm][CI] Mark gemma3 as large GPU test to avoid OOM on MI250 ( #37610 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-21 12:57:25 +08:00
Simon Mo
1fa1e53a73
Revert "[compile] Initialize passes at VllmBackend init" ( #37733 )
2026-03-20 21:35:49 -07:00
Andreas Karatzas
3ffa52009f
[ROCm][CI] Guard CudaPlatform/RocmPlatform imports to fix test collection on cross-platform builds ( #37617 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-21 11:58:58 +08:00
Yongye Zhu
87bd91892f
[MoE Refactor] Mxfp4 oracle rebased ( #37128 )
...
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com >
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com >
2026-03-21 03:37:04 +00:00
Isotr0py
c7f98b4d0a
[Frontend] Remove librosa from audio dependency ( #37058 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-03-21 11:36:15 +08:00
tmm77
1c472f8fe1
Add get_device_uuid for rocm ( #37694 )
...
Signed-off-by: Tiffany Mintz <Tiffany.Mintz@amd.com >
2026-03-21 11:33:16 +08:00
Itay Alroy
c57d38d603
elastic_ep: Fix issues with repeated scale up/down cycles ( #37131 )
...
Signed-off-by: Itay Alroy <ialroy@nvidia.com >
Co-authored-by: Ron Tourgeman <rtourgeman@nvidia.com >
2026-03-20 23:13:02 +00:00
Kaihang Jiang
e5ed6c6c13
[BugFix] Allow qk_nope_head_dim=192 in FlashInfer MLA backend checks ( #37475 )
...
Signed-off-by: Kaihang Jiang <kaihangj@nvidia.com >
2026-03-20 16:14:55 -06:00
Wentao Ye
b3d0b37908
[Refactor] Remove unused dead code ( #36171 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-03-20 16:12:51 -06:00
Santino Ramos
85f671b8e1
[Model Runner V2] Support Streaming Inputs ( #37028 )
...
Signed-off-by: Santino Ramos <elsantinoramos@gmail.com >
2026-03-20 20:42:25 +00:00
Andreas Karatzas
8bc6b5cdb0
[ROCm][CI] Setting some mi325_4 tests back to optional (in parity with upstream) ( #37711 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-20 12:25:08 -07:00
Vadim Gimpelson
4f16ebbbd3
[Bugfix] Disable monolithic TRTLLM MoE for Renormalize routing ( #37591 ) ( #37605 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2026-03-20 12:19:26 -07:00
Angela Yi
12fd17eb51
[compile] Initialize passes at VllmBackend init ( #35216 )
...
Signed-off-by: angelayi <yiangela7@gmail.com >
2026-03-20 11:40:33 -07:00
Cyrus Leung
37aadf6237
[Model] Update Kimi-K25 and Isaac processors to fit HF-style ( #37693 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-03-20 18:30:22 +00:00
Le Yang
d7d2b5e405
[Bugfix] Disable --calculate-kv-scales for hybrid GDN/Mamba+Attention… ( #37565 )
...
Signed-off-by: Young-Leo <562593859@qq.com >
2026-03-20 18:28:34 +00:00
SherryC41
6ec5e9fd37
refactor: abstract deepgemm support into platform ( #37519 )
...
Co-authored-by: sherryC41 <sherry.c.c41@gmail.com >
2026-03-20 17:54:08 +00:00
Lucas Wilkinson
e1d85e5c24
[Attention] Support distinguishing between short extends and decodes ( #37303 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2026-03-20 10:49:36 -07:00
Peter Pan
79eb9369c5
fix CUDAGraph memory being counted twice ( #37426 )
...
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io >
Signed-off-by: Peter Pan <peter.pan@daocloud.io >
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com >
2026-03-20 17:36:32 +00:00
Woosuk Kwon
e80cfe575d
[MRV2] Avoid recompilation of _gather_block_tables_kernel ( #37645 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-03-20 10:31:45 -07:00
Xin Yang
d0532bf38d
[Perf] Eliminate redundant SparseMatrix creation in gpt_oss_triton_kernels ( #37683 )
...
Signed-off-by: Xin Yang <xyangx@amazon.com >
2026-03-20 11:28:41 -06:00
Andreas Karatzas
fb4e8bf442
[ROCm][CI] Fix accuracy for llama-nemotron-vl pooling tests ( #37613 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-20 10:16:59 -07:00
Harry Mellor
6ade4bc5a5
Fix various config related issues for Transformers v5 ( #37681 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-20 16:30:12 +00:00
Zhengxu Chen
2e089b96a8
[compile] Add compiled artifact counter for VLLM_USE_MEGA_AOT_ARTIFACT=1. ( #37589 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com >
2026-03-20 16:22:46 +00:00
Martin Hickey
880be2b1b8
[Metrics] Some small refactoring for better maintainability ( #33898 )
...
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com >
2026-03-20 16:11:34 +00:00
Zhengxu Chen
c0f5fae601
[compile] Fix aot test failures with torch 2.12. ( #37604 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com >
2026-03-20 16:06:29 +00:00
Rémi Delacourt
aa84e43ccb
[Pixtral] Enable Pixtral language model support Eagle3 ( #37182 )
...
Signed-off-by: remi <remi@mistral.ai >
2026-03-20 15:50:15 +00:00
Matthias Gehre
5e806bcf54
[Bugfix] Fix ConchLinearKernel channelwise quantization (group_size=-1) ( #37329 )
...
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com >
2026-03-20 10:32:21 -05:00
Matthias Gehre
56a62c310c
[Bugfix] Reject channelwise quantization (group_size <= 0) in ExllamaLinearKernel ( #37331 )
...
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com >
2026-03-20 10:31:57 -05:00
L.B.R.
1779c09898
[ROCm] Enable wvSplitK skinny GEMM kernel for RDNA4/gfx1x decode ( #34709 )
...
Signed-off-by: L.B.R. <lbr@mmonad.com >
Co-authored-by: L.B.R. <lbr@mmonad.com >
2026-03-20 10:11:23 -05:00
xuebwang-amd
44eea10f68
[ROCm][Quantization] make quark ocp mx dtype parser robust for weight-only quantization ( #36232 )
...
Signed-off-by: xuebwang-amd <xuebwang@amd.com >
2026-03-20 10:10:03 -05:00
Ilya Boytsov
8b6c6b9505
[Model] Add LFM2-ColBERT-350M support ( #37528 )
...
Signed-off-by: Ilya Boytsov <ilyaboytsov1805@gmail.com >
2026-03-20 14:57:57 +00:00
Harry Mellor
9f6d9dd371
Fix attribute error in isaac_patch_hf_runner ( #37685 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-20 14:49:40 +00:00
Jee Jee Li
dd20ee4e3e
[UX] Enable torch_profiler_with_stack ( #37571 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2026-03-20 11:17:26 +00:00
Chauncey
0523449c9c
[Misc] Use logger.info_once for auto tool choice log message ( #37661 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-03-20 10:40:36 +00:00
Flora Feng
b4c1aef21c
[Refactor] Relocate tests from tests/v1/entrypoints/ to tests/entrypoints/ ( #37500 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com >
2026-03-20 02:50:34 -07:00
Flora Feng
6050b93bed
[Refactor] Move serve entrypoint tests under tests/entrypoints/serve/ ( #37595 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com >
2026-03-20 02:10:47 -07:00
Andreas Karatzas
5a4a179591
[ROCm][CI] Fix granite_speech test for gfx90a by selecting compatible attention backend ( #37611 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-20 17:07:26 +08:00
Andreas Karatzas
37cd9fc107
[ROCm][CI] Remove deepep DBO tests on gfx90a ( #37614 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-20 17:07:07 +08:00
Andreas Karatzas
9cfd4ebb5e
[ROCm][CI] Update GSM8K eval config to use fp8-and-mixed models list ( #37619 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-20 17:06:53 +08:00
wang.yuqi
ed359c497a
[Model] Deprecate the score task (this will not affect users). ( #37537 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-03-20 08:07:56 +00:00
Giancarlo Delfin
dcee9be95a
[Model Runner V2] Fix draft logits not populated during cudagraph replay ( #37639 )
...
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai >
2026-03-20 07:43:47 +00:00
Andreas Karatzas
bd8c4c0752
[CI] Removing deprecated rlhf examples reference ( #37585 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-20 15:20:33 +08:00
Wei Zhao
0140eafb15
[Bug] Fix FlashInfer allreduce fusion workspace uninitialized error ( #37461 )
...
Signed-off-by: root <root@prenyx0169.a51.clusters.nvidia.com >
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com >
Signed-off-by: <>
Co-authored-by: root <root@prenyx0169.a51.clusters.nvidia.com >
Co-authored-by: root <root@prenyx0042.a51.clusters.nvidia.com >
2026-03-20 03:09:21 -04:00
Kunshang Ji
bdf6a0a57b
[XPU] bump vllm-xpu-kernels to v0.1.4 ( #37641 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2026-03-20 15:04:38 +08:00
Wangbei25
0674d1fee7
[PluggableLayer][MM] Add PluggableLayer for CustomQwen2Decoder ( #37293 )
...
Signed-off-by: Wangbei25 <wangbei41@huawie.com >
Signed-off-by: Wangbei25 <wangbei41@huawei.com >
Co-authored-by: Wangbei25 <wangbei41@huawie.com >
2026-03-20 06:24:07 +00:00
Cyrus Leung
30108fc8b0
[Model] Refactor Step3-VL processor to HF style ( #37579 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-03-20 06:05:08 +00:00
Flora Feng
e2d1c8b5e8
[Refactor] Relocate entrypoint tests to match serving code structure ( #37593 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com >
2026-03-20 05:31:23 +00:00
Huanxing
6951fcd44f
[XPU] Automatically detect target platform as XPU in build. ( #37634 )
...
Signed-off-by: huanxing <huanxing.shen@intel.com >
2026-03-20 13:30:15 +08:00
Giancarlo Delfin
39474513f6
[Model Runner V2] fix draft attention metadata generation ( #37364 )
...
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai >
2026-03-19 21:05:15 -07:00
Yuxiang Liang
638a872d77
fix(xpu): Re-compute compile ranges after platform-specific config updates ( #37523 )
...
Signed-off-by: Yuxiang Liang <yuxiang.liang@intel.com >
Signed-off-by: Yuxiang Liang <yuliang@habana.ai >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-20 03:52:35 +00:00
Flora Feng
9040151fe1
[V0 Deprecation] Deprecate --disable-frontend-multiprocessing ( #37612 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com >
2026-03-20 11:31:43 +08:00
Jee Jee Li
8fbe3f303f
[Bugfix][LoRA] Fix Qwen35 LoRA ( #36976 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2026-03-20 11:09:32 +08:00
Xiao
ea2c148fa7
[compile][graph_partition]Add tensor size handling ( #36038 )
...
Signed-off-by: Xiao Fu <xiaofu@meta.com >
2026-03-19 19:55:25 -07:00
Tianmu Li
47b7af0d87
[Feat] Enable CompressedTensorW4A8Int for XPU ( #37207 )
...
Signed-off-by: Li, Tianmu <tianmu.li@intel.com >
2026-03-20 02:34:28 +00:00
tianshu-Michael-yu
269bf46d99
fix: disambiguate multimodal prefix cache keys ( #36708 )
...
Signed-off-by: tianshu.yu <tianshuyu.formal@gmail.com >
2026-03-20 10:33:20 +08:00
Flora Feng
e5a77a5015
[CI] Update mergify tool-calling label paths ( #37478 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com >
2026-03-20 02:22:23 +00:00
Itay Alroy
ca1ac1a4b4
Fix DP coordinator ZMQ port TOCTOU ( #37452 )
...
Signed-off-by: Itay Alroy <ialroy@nvidia.com >
2026-03-20 00:58:31 +00:00
Divakar Verma
4ca3fa6bb4
[ROCm][Bugfix] fix cache block size mismatch for aiter unified attention ( #37606 )
...
Signed-off-by: Divakar Verma <divakar.verma@amd.com >
2026-03-20 00:00:08 +00:00
Flora Feng
be12afd284
[Bugfix] Fix Deepseekv32 tool parser when stream interval > 1 ( #36056 )
2026-03-19 19:51:25 -04:00
Wentao Ye
df3c0291a3
[Bug] Fix EmbedIOprocessor "classify" <-> "embed" ( #37573 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-03-20 07:40:10 +08:00
Wentao Ye
2be1a0f74b
[Refactor] Remove dead code in pooling model ( #37572 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-03-20 07:39:43 +08:00
Jim Smith
4120a05ff1
Fix AttributeError in Qwen3.5 GDN layers with quantized models ( #37448 )
...
Signed-off-by: Jim Smith <jim@joshua8.ai >
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
Co-authored-by: Xin Yang <105740670+xyang16@users.noreply.github.com >
2026-03-19 19:21:14 -04:00
rasmith
98ff042917
[CI][BugFix][AMD] Don't set VLLM_ROCM_USE_AITER anymore in test_rocm_aiter_topk since its not necessary ( #36996 )
...
Signed-off-by: Randall Smith <Randall.Smith@amd.com >
2026-03-20 07:12:45 +08:00
Artem Perevedentsev
b55156eae9
[Performance] Enable Triton autotuning disk cache by default ( #37188 )
...
Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com >
2026-03-19 17:36:28 -04:00
Laith Sakka
112944fab9
test Qwen/Qwen3-4B-Instruct-2507 for unbacked ( #36064 )
...
Signed-off-by: Laith Sakka <lsakka@meta.com >
2026-03-19 17:28:45 -04:00
bnellnm
91be5f9be3
[MoE Refactor] Rename "naive" all2all backend ( #36294 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
2026-03-19 15:50:34 -04:00
Aaron Hao
4ee847e400
Comment fix for async rl example ( #35244 )
...
Signed-off-by: hao-aaron <ahao@anyscale.com >
2026-03-19 19:46:07 +00:00
Andreas Karatzas
040a505ff5
[ROCm][CI] Cleaning and restructuring amd-ci legacy pipeline ( #34839 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-19 14:30:58 -05:00
bnellnm
9279c59a0e
[MoE Refactor] DefaultMoERunner simplification ( #33049 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
2026-03-19 15:07:44 -04:00
Wentao Ye
7454096199
[Log] Log once in local node by default ( #37568 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-03-19 12:04:59 -07:00
Andreas Karatzas
fb8b5e05fc
[CI] Add retry with 4x backoff to HTTP fetches for transient failures ( #37218 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-19 19:00:20 +00:00
Harry Mellor
e5d96dc8fc
Fix SpeculatorsConfig now that PreTrainedConfig is a dataclass in Transformers ( #37574 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-19 18:04:40 +00:00
EdalatiAli
daa05bf340
[Bugfix] Fix AttributeError when serving MXFP8 models with DeepGEMM installed ( #37358 )
...
Signed-off-by: EdalatiAli <aliedalati@cohere.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-19 17:58:33 +00:00
Lucas Kabela
7769b58307
[torch.compile][BE][Multimodal] Remove requirement to set_model_tag to avoid cache conflict ( #37345 )
...
Signed-off-by: Lucas Kabela <lucaskabela@meta.com >
2026-03-19 17:26:12 +00:00
Chauncey
2f9f946b22
[P/D] AnthropicMessages add kv_transfer_params for PD disaggregation ( #37535 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-03-19 16:41:20 +00:00
Fadi Arafeh
2890aecce5
[CPU][UX] Do not crash when tcmalloc/libiomp are not ldpreloaded ( #37561 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
2026-03-19 16:35:45 +00:00
Harry Mellor
34f093b417
[CI] Gate pre-commit on ready label or number of contributions ( #37544 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-19 16:21:57 +00:00
Harry Mellor
4dce8321a9
Run MacOS smoke test on daily cron job instead of every commit ( #37567 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-19 16:19:50 +00:00
Cyrus Leung
657855ab41
[Misc] Cleanup more configs and processors ( #37560 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-03-19 15:45:23 +00:00
Wei Zhao
e27b8ba3d1
[Bug] Fix fp8 trtllm MoE modular kernel supported routing methods ( #37346 )
...
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com >
2026-03-19 11:43:06 -04:00
Woosuk Kwon
40b8363b45
[MRV2] Use fp32 for draft logits ( #37526 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-03-19 08:41:21 -07:00
mikaylagawarecki
8b10e4fb31
[1/n] Migrate permute_cols to libtorch stable ABI ( #31509 )
...
Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com >
2026-03-19 11:27:26 -04:00
Ifta khairul Alam Adil
104605cbf2
Remove deprecated reasoning_content message field(part-2) ( #37480 )
...
Signed-off-by: JartX <sagformas@epdcenter.es >
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com >
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Philip Ottesen <phiott256@gmail.com >
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai >
Signed-off-by: Andy Lo <andy@mistral.ai >
Signed-off-by: Thillai Chithambaram <thillaichithambaram.a@gmail.com >
Signed-off-by: sihao.li <sihao.li@intel.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: JartX <sagformas@epdcenter.es >
Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: Philip Ottesen <phiott256@gmail.com >
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Giancarlo Delfin <32987265+TheEpicDolphin@users.noreply.github.com >
Co-authored-by: Andy Lo <andy@mistral.ai >
Co-authored-by: Thillai Chithambaram <79466435+thillai-c@users.noreply.github.com >
Co-authored-by: sihao_li <165983188+1643661061leo@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-19 15:20:08 +00:00
Jee Jee Li
96266f119b
[LoRA] Minor improvements to LoRA log ( #37557 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2026-03-19 15:18:06 +00:00
Sage Moore
7c0cf3bcd0
Cap the number of API servers to 1 when using Elastic EP. ( #37466 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
2026-03-19 10:42:57 -04:00
Harry Mellor
572b432913
Stop bench CLI from recursively casting all configs to dict ( #37559 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-19 14:04:03 +00:00
Cyrus Leung
9515c20868
[Misc] Clean up processing logic ( #37541 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-03-19 13:30:20 +00:00
DorBernsohn
c63ca2b2e6
[Bugfix] Add Kimi-K2.5 reasoning/tool parser aliases and tool_call_id support ( #37438 )
...
Signed-off-by: DorBernsohn <dor.bernsohn@gmail.com >
2026-03-19 21:08:00 +08:00
Harry Mellor
a32eaf5bb2
[CI] Merge cleanup_pr_body.yml and reminder_comment.yml ( #37552 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-19 12:55:07 +00:00
XueLiang Yang
e390742c59
Fix KV Offloading + MLA AssertionError by using num_kv_heads=1 in cpu… ( #37536 )
...
Signed-off-by: xueliangyang-oeuler <yxl546827391@gmail.com >
Co-authored-by: xueliangyang-oeuler <yxl546827391@gmail.com >
2026-03-19 12:05:07 +00:00
Cyrus Leung
7a6ebcbfcf
[Model] Remove unnecessary get_language_model ( #37545 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-03-19 20:00:36 +08:00
Cyrus Leung
c7bc12c20f
[CI/Build] Split out MM pooling tests ( #37542 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-03-19 11:36:11 +00:00
wang.yuqi
f9e2a38386
[Docs] Reorganize pooling docs. ( #35592 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Signed-off-by: wang.yuqi <noooop@126.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-19 11:25:47 +00:00
Harry Mellor
4426447bba
Don't log exc_info when vLLM tries to download a file that doesn't exist ( #37458 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-19 10:38:29 +00:00
Li, Jiang
3322e26420
[Bugfix] Avoid more OpenMP thread reallocation in CPU torch compile ( #37538 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2026-03-19 10:24:39 +00:00
Cyrus Leung
765e461065
[Bugfix] Fix Nemotron Parse loading ( #37407 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-03-19 09:55:29 +00:00
Duyi-Wang
6a9cceb219
[Bugfix][ROCm] Fix MoRI + AITER FP8 dispatch compatibility for defer_input_quant ( #37418 )
...
Signed-off-by: Duyi-Wang <duyi.wang@amd.com >
2026-03-19 09:49:27 +00:00
yassha
199f914183
fix(cpu): add null check for aligned_alloc in ScratchPadManager ( #37369 )
...
Signed-off-by: yassha <50112520+yassha@users.noreply.github.com >
2026-03-19 17:45:06 +08:00
Kunshang Ji
ca21483bf9
[MISC] fix pin_memory=torch.cuda.is_available(), use is_pin_memory_available ( #37415 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2026-03-19 09:23:24 +00:00
TJian
da70c87e81
[CI] Fix wrong path test file, missing rlhf_async_new_apis.py ( #37532 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2026-03-19 02:21:55 -07:00
Collin McCarthy
0b6d52629f
Support temporal compression for Nemotron-3-VL videos ( #36808 )
...
Signed-off-by: Collin McCarthy <cmccarthy@nvidia.com >
2026-03-19 08:02:19 +00:00
Ziming Huang
d3cc379567
[Perf] Fix slow hasattr in CUDAGraphWrapper.__getattr__ ( #37425 )
...
Signed-off-by: 智鸣 <hzm414167@alibaba-inc.com >
2026-03-19 15:43:48 +08:00
cdpath
354cd580d5
fix(anthropic): remove non-standard 'data: [DONE]' from Anthropic streaming ( #37510 )
...
Signed-off-by: cdpath <cdpath@outlook.com >
2026-03-19 07:23:35 +00:00
zhanqiuhu
d49f273144
[SSM/Mamba] Follow-up: N-1 prefill for P/D disaggregation ( #37310 )
2026-03-19 08:22:00 +01:00
Flora Feng
b21d384304
[Refactor] Relocate endpoint tests to mirror serving code directory structure ( #37504 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com >
2026-03-19 07:19:36 +00:00
Hongxia Yang
e3126cd107
[ROCm] issue management - request information for bug issues on ROCm ( #37009 )
...
Signed-off-by: Hongxia Yang <hongxiay.yang@amd.com >
2026-03-19 03:51:29 +00:00
Wentao Ye
e37ff5b5c8
[Perf] Optimize token_embed for pooling models, 1.0% token throughput improvement ( #37347 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-03-19 10:27:51 +08:00
Aaron Hao
6accb21f2a
[bug] Fix deadlock with pause resume and collective_rpc ( #37024 )
...
Signed-off-by: hao-aaron <ahao@anyscale.com >
2026-03-19 01:49:02 +00:00
Giancarlo Delfin
053f3b6309
[Model Runner V2] Spec decode rejection sampler logprobs support ( #37237 )
...
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai >
2026-03-19 01:36:27 +00:00
Aaron Hao
5f82706a21
[BUG] Exclude SKIP_TENSORS from get_layer_size() + new weight sync example for dpep ( #37334 )
...
Signed-off-by: ahao-anyscale <ahao@anyscale.com >
2026-03-19 00:45:10 +00:00
Sage Moore
c32a58cc2a
[EPLB] Simplify EPLB rearrange by only returning one map ( #36267 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
2026-03-18 20:34:00 -04:00
Elvir Crnčević
ef2c4f778d
[Bugfix] Zero-init MLA attention output buffers to prevent NaN from CUDA graph padding ( #37442 )
...
Signed-off-by: Elvir Crncevic <elvircrn@gmail.com >
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com >
2026-03-19 00:28:37 +00:00
sihao_li
9dade5da3a
[XPU]Unify xpu test dependencies in dockerfile.xpu ( #36477 )
...
Signed-off-by: sihao.li <sihao.li@intel.com >
2026-03-19 08:12:07 +08:00
Thillai Chithambaram
828f862acb
[Bugfix] Expand quantization method support in perf metrics ( #37231 )
...
Signed-off-by: Thillai Chithambaram <thillaichithambaram.a@gmail.com >
2026-03-18 23:54:19 +00:00
Andy Lo
577df69b26
[Bugfix] Fix KV scales inconsistency in fp8 MLA & FlashInfer kv_cache_dtype "auto" leading to gibberish ( #37054 )
...
Signed-off-by: Andy Lo <andy@mistral.ai >
2026-03-18 23:07:29 +00:00
Giancarlo Delfin
04244fd0e1
[Model Runner V2] Spec decode rejection sampler greedy support ( #37238 )
...
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai >
2026-03-18 15:59:03 -07:00
Michael Goin
9482b0b085
[Bugfix] Remove assertion for NVFP4 scale dynamic range ( #37465 )
...
Signed-off-by: Michael Goin <mgoin64@gmail.com >
2026-03-18 15:37:49 -07:00
Woosuk Kwon
5bc1da147f
[LoRA][BugFix] Fix skipped LoRA adapters for Mistral3 ( #36928 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-03-18 22:34:19 +00:00
Philip Ottesen
0091017188
fix(worker): optimize swap_states to copy only active token prefixes ( #34733 )
...
Signed-off-by: Philip Ottesen <phiott256@gmail.com >
2026-03-18 14:59:27 -07:00
Wentao Ye
0d81a1fe61
[V0 Deprecation] Deprecate virtual engine ( #37195 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-03-18 14:30:14 -07:00
Netanel Haber
6ae4c8d6fc
chunk parakeet into 30s clips to prevent OOMs on long audios ( #36671 )
...
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com >
2026-03-18 14:22:24 -07:00
JartX
a913b612d8
[Bugfix] Fix ROCm crash in qwen3_next multi-stream events ( #36795 ) ( #37427 )
...
Signed-off-by: JartX <sagformas@epdcenter.es >
2026-03-18 16:06:31 -04:00
Harry Mellor
5ce2d10e4a
Fix models which use layer_type_validation for Transformers v5 ( #37398 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-18 18:41:51 +00:00
Chengyu Fang
738d0a281f
[Bugfix] Fix incorrect use of merge_size in Qwen3-VL video timestamp calculation ( #37439 )
...
Signed-off-by: chengyufang <cnyvfang@outlook.com >
2026-03-18 11:36:34 -07:00
youkaichao
70b81c4f3d
[bugfix][async scheduling] fix extra cuda context in device 0 with EP/DP ( #37449 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2026-03-18 18:32:30 +00:00
Cyrus Leung
7476d148db
[Model] Remove unnecessary processor definition for Nemotron Parse ( #37456 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-03-18 18:25:13 +00:00
Cyrus Leung
f3732bd931
[Misc] Clean up model registry ( #37457 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-03-18 18:24:44 +00:00
Wentao Ye
0ef7f79054
[Perf] Add tuned triton moe config for Qwen3.5 H200, 9.9% E2E throughput improvement ( #37340 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-03-18 14:18:34 -04:00
Or Ozeri
5dd8df0701
[kv_offload+HMA][2/N]: Support multiple KV groups in GPULoadStoreSpec ( #36642 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com >
2026-03-18 19:26:40 +02:00
Harry Mellor
39bfb57b7c
Add API docs link if the CLI arg is a config class ( #37432 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-18 17:19:35 +00:00
RonaldBXu
c9d838fc33
Adding deterministic lora benchmarking to vLLM Bench ( #36057 )
...
Signed-off-by: Ubuntu <ubuntu@ip-172-31-43-201.ap-northeast-1.compute.internal >
Signed-off-by: Ronald Xu <ronaldxu@amazon.com >
2026-03-18 16:02:03 +00:00
Xin Yang
b1169d7be8
[Kernel] Add gpt-oss Router GEMM kernel ( #37205 )
...
Signed-off-by: Xin Yang <xyangx@amazon.com >
2026-03-18 08:15:56 -07:00
XLiu-2000
17808394bc
standardize load_weights using AutoWeightsLoader for kimi_linear and minimax_text_01 ( #37371 )
...
Signed-off-by: XuLiu <xuliu40@gmail.com >
Co-authored-by: XuLiu <xuliu40@gmail.com >
2026-03-18 15:05:37 +00:00
elvischenv
296839a1b0
[Perf] Eliminate padding and slicing op for GPT-OSS with Flashinfer MXFP4 MXFP8 MoE ( #30647 )
...
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com >
2026-03-18 15:01:26 +00:00
Wentao Ye
c373b5c00d
[Log] Reduce duplicate log ( #37313 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-03-18 10:57:44 -04:00
Itay Alroy
de1a86b7de
elastic_ep: Fix stateless group port races ( #36330 )
...
Signed-off-by: Itay Alroy <ialroy@nvidia.com >
2026-03-18 14:36:18 +00:00
Cyrus Leung
99267c23ca
[2/3] Refactor InternVL-based processors ( #37324 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-03-18 22:22:19 +08:00
Or Ozeri
525f2eeb0b
[kv_offload+HMA][6/N]: Split offloading_connector.py ( #37405 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com >
2026-03-18 14:42:46 +01:00
Yufeng He
918b7890a1
[Bugfix] Fix base64 JPEG video frames returning empty metadata ( #37301 )
...
Signed-off-by: Yufeng He <40085740+universeplayer@users.noreply.github.com >
Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Yufeng He <40085740+universeplayer@users.noreply.github.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-03-18 13:40:03 +00:00
Andy Lo
98b09ddc27
[NIXL][Bugfix] metrics & testing minor bug ( #36051 )
...
Signed-off-by: Andy Lo <andy@mistral.ai >
2026-03-18 14:39:14 +01:00
Shwetha Poojary
cef1f302d2
[Model] Enable LoRA support for tower and connector in H2OVL ( #31696 )
...
Signed-off-by: shwetha-s-poojary <shwetha.s-poojary@ibm.com >
2026-03-18 13:26:47 +00:00
Elvir Crnčević
17c47fb869
[Bugfix] Fix EP weight filter breaking EPLB and NVFP4 accuracy ( #37322 )
...
Signed-off-by: Elvir Crncevic <elvircrn@gmail.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
Co-authored-by: Kevin H. Luu <khluu000@gmail.com >
2026-03-18 18:30:29 +08:00
Chauncey
b322b197f1
[Build] Bump python openai version ( #32316 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-03-18 18:20:10 +08:00
Andreas Karatzas
eaf7c9b976
[CI] Fix PaddleOCR-VL HF test failure due to create_causal_mask API rename ( #37328 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-18 09:44:12 +00:00
Aaron Hao
47a1f11bff
[docs] Add docs for new RL flows ( #36188 )
...
Signed-off-by: ahao-anyscale <ahao@anyscale.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-18 09:04:26 +00:00
Karan Bansal
fad09e8a1f
fix(glm47): improve tool call parsing and content normalization ( #37386 )
...
Signed-off-by: karanb192 <karan@example.com >
Co-authored-by: karanb192 <karan@example.com >
2026-03-18 08:12:21 +00:00
Jee Jee Li
8c31f47c63
[LoRA] Make LoRA respect language_model_only ( #37375 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2026-03-18 07:53:34 +00:00
Li, Jiang
261801242f
[Bugfix] Avoid OpenMP thread reallocation in CPU torch compile ( #37391 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2026-03-18 07:51:39 +00:00
Or Ozeri
fcf0687b27
[kv_offload+HMA][0/N]: Support block-level preemption handling ( #34805 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2026-03-18 08:49:53 +02:00
liuzhenwei
86b7e3c95a
[XPU] skip unsupported ut and update test_nixl_connector ( #37179 )
...
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com >
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com >
2026-03-18 13:32:59 +08:00
Andrew Xia
0e95916155
[responsesAPI] parser.extract_response_outputs can take in token IDs ( #37130 )
...
Signed-off-by: Andrew Xia <axia@meta.com >
2026-03-18 05:31:31 +00:00
Andreas Karatzas
ce2ef42fd3
[CI] Stabilize test_cpu_offloading by waiting for async offload before cache reset ( #37335 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-18 05:26:20 +00:00
Andreas Karatzas
8b6325758c
[ROCm][CI] Add ROCM_EXTRA_ARGS to audio_in_video test server fixture ( #37349 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-18 04:55:40 +00:00
gxd3
a0dd1995c7
[Hardware][TPU] Add supports_async_scheduling() method to Executor interface so that it can be extended for Executor implementations. ( #36924 )
...
Signed-off-by: Guangxiang Du <gxd@google.com >
2026-03-18 12:53:28 +08:00
Xin Yang
f1740006e4
[Perf] Enable dual stream execution of input projection for Qwen3 ( #36795 )
...
Signed-off-by: Xin Yang <xyangx@amazon.com >
2026-03-18 11:13:27 +08:00
Andreas Karatzas
58cde5c026
[ROCm][CI] Skip trtllm kvfp8 dequant tests on ROCm ( #37330 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-18 11:12:26 +08:00
Roy Wang
761e0aa7a0
[Performance] Add --enable-ep-weight-filter CLI option ( #37351 )
...
Signed-off-by: esmeetu <jasonailu87@gmail.com >
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com >
2026-03-18 09:36:55 +08:00
Yanan Cao
ff9fbc9aff
[Kernel][Helion] [16/N] Refactor register_kernel API to be more Dynamo-friendly ( #36705 )
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
2026-03-18 01:23:35 +00:00
Divakar Verma
e6c4797704
[ROCm][Quantization] add fp8xfp8 attn support for rocm_aiter_unified_attn ( #36927 )
...
Signed-off-by: Divakar Verma <divakar.verma@amd.com >
2026-03-18 08:49:32 +08:00
Michael Goin
09e4576f65
[Kernel] Add non-gated support for NVFP4 CUTLASS MoE ( #37320 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-03-17 18:12:04 -04:00
Andreas Karatzas
3ed7b1e6e0
[ROCm] Validate block_size for explicitly selected attention backends ( #36846 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-17 17:04:40 -05:00
JartX
e8f9dbc369
[Bugfix][ROCm] Fix worker startup OOM on ROCm by skipping unreliable cudagraph memory profiling ( #36720 )
...
Signed-off-by: JartX <sagformas@epdcenter.es >
2026-03-17 17:55:34 -04:00
Yong Hoon Shin
de35c06c66
Make KV connector metadata build overridable via plugin ( #37336 )
...
Signed-off-by: Yong Hoon Shin <yhshin@meta.com >
2026-03-17 21:29:06 +00:00
Athrael Soju
c0745a851a
[Model] Add ColQwen3.5 4.5B support ( #36887 )
...
Signed-off-by: Athrael Soju <athrael.soju@gmail.com >
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-03-17 21:17:02 +00:00
Ekagra Ranjan
b5ca9c3557
[Models] Cohere ASR ( #35809 )
...
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com >
2026-03-17 21:04:17 +00:00
Chao-Ju Chen
245758992e
[Bugfix] Rescale NVFP4 weight scales to fix BF16 dequant underflow ( #34577 )
...
Signed-off-by: ricky-chaoju <ricky.chen@infinirc.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-03-17 20:48:42 +00:00
Dimitrios Bariamis
1204cf0a9d
[Bugfix] Fix mock.patch resolution failure for standalone_compile.FakeTensorMode on Python <= 3.10 ( #37158 )
...
Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com >
Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com >
2026-03-17 20:13:06 +00:00
Wei Zhao
b36adfa349
[Perf] Set Flashinfer sparse MLA as default backend for FP8 kv cache ( #37252 )
...
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com >
2026-03-17 20:09:20 +00:00
Michael Goin
e78821b438
[Deprecation] Deprecate --calculate-kv-scales option ( #37201 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
2026-03-17 19:57:24 +00:00
Cyrus Leung
51f0acda79
[Model] Remove unused handle_oov_mm_token ( #37321 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-03-17 19:44:52 +00:00
Brian Dellabetta
fa75204b16
bump compressed-tensors version to 0.14.0.1 ( #36988 )
...
Signed-off-by: Brian Dellabetta <bdellabe@redhat.com >
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com >
2026-03-17 15:36:19 -04:00
Wentao Ye
bdb903bb5f
[Bug] Fix FlashInfer MNNVL socket collisions under concurrent vLLM jobs ( #36674 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-03-17 15:19:52 -04:00
Andrey Talman
68f783a727
[Torch 2.11] Guard torch._C._cpu attribute checks for forward compatibility ( #35673 )
...
Signed-off-by: atalman <atalman@fb.com >
2026-03-17 18:47:59 +00:00
Avinash Singh
c5030c439d
[CI] Split Distributed Tests (4 GPUs) and Kernel MoE tests ( #37100 )
...
Signed-off-by: Avinash Singh <avinashsingh.rcoem@gmail.com >
Signed-off-by: Avinash Singh <107198269+avinashsingh77@users.noreply.github.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
Co-authored-by: Kevin H. Luu <khluu000@gmail.com >
2026-03-17 11:44:55 -07:00
Michael Goin
51b2333be1
[Perf] Optimize top-k search in apply_top_k_top_p_triton sampler ( #37225 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-03-17 11:35:17 -07:00
Andreas Karatzas
4ed51308c8
[CI] Fix GPU memory leak when RemoteOpenAIServer fails to start in __init__ ( #37230 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-17 09:08:08 -07:00
Cyrus Leung
c781fbbab3
[Bugfix] Standardize custom HF Processor init ( #37289 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-03-17 15:38:55 +00:00
Richard Zou
979ff44cea
[BugFix] PyTorch Compilation Tests should error if any test fails ( #37300 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2026-03-17 15:26:38 +00:00
Benjamin Chislett
f63ed7b5ac
[Bugfix] Fix DP MTP Dummy Run ( #35243 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
2026-03-17 11:16:48 -04:00
Ning Xie
c9e5096256
[openapi] remove redundant exception stack trace[4/N] ( #37157 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2026-03-17 15:06:25 +00:00
Anton Vlasjuk
2ff0ad9694
[UltraVox] Fix output type ( #37224 )
...
Signed-off-by: vasqu <antonprogamer@gmail.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-17 14:51:17 +00:00
Isotr0py
a836524d20
[Chore] Replace all base64 usages with faster pybase64 package ( #37290 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-03-17 14:44:19 +00:00
Bhoomit
3717a4dd47
[Misc][LoRA] Add --lora-target-modules to restrict LoRA to specific modules ( #34984 )
...
Signed-off-by: Bhoomit Vasani <bhoomit.2010@gmail.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-17 14:36:41 +00:00
Harry Mellor
ecfcdd2ce4
Fix Phi3 test that fails with Transformers v5 ( #37298 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-17 14:29:24 +00:00
Siew's Capital Jarvis
c25dbc2d27
[Bugfix] Fix unclean shutdown crash with AllReduce Fusion workspace ( #36955 )
...
Signed-off-by: Jarvis <brayden.stanley.0127@gmail.com >
2026-03-17 14:22:09 +00:00
Jonas M. Kübler
77d2a5f17b
pick up tuned prefill configs for FP8 FA3 ( #36265 )
...
Signed-off-by: Jonas M. Kübler <44084297+jmkuebler@users.noreply.github.com >
Signed-off-by: Jonas Kuebler <kuebj@amazon.com >
2026-03-17 07:00:26 -07:00
Sage
59192dfd39
[Frontend] Complete OpenAI render delegation ( #37287 )
...
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com >
2026-03-17 13:53:55 +00:00
Umut Polat
56cb1baa66
[Misc] Use VLLMValidationError in batch, pooling, and tokenize protocol validators ( #36256 )
...
Signed-off-by: umut-polat <52835619+umut-polat@users.noreply.github.com >
2026-03-17 13:52:30 +00:00
Cyrus Leung
f340324335
[1/2] Move InternVL-based processors ( #37260 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-03-17 21:50:56 +08:00
sfbemerk
2660b9289c
Bugfix for offloading+prefetch for GLM-4.7-FP8 ( #37178 )
...
Signed-off-by: Benjamin Merkel <benjamin.merkel@tngtech.com >
Co-authored-by: Benjamin Merkel <benjamin.merkel@tngtech.com >
2026-03-17 21:22:09 +08:00
Viacheslav
293f036e6d
Add gigachat 3.1 tool parser + fix gigachat3 tool parser ( #36664 )
...
Signed-off-by: Viacheslav Barinov <viacheslav.teh@gmail.com >
2026-03-17 12:03:20 +00:00
youkaichao
0fb142a454
[perf][connector] optimize build_connector_meta when host buffer transfer is not used ( #37165 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2026-03-17 11:59:35 +00:00
Sage
00f8e0d211
[Frontend] Delegate tokenization serving preprocessing to OpenAIServingRender ( #37266 )
...
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com >
2026-03-17 11:22:54 +00:00
zhao, zhenhui
4af9ed21cb
[Bugfix](xpu): prevent “selected index k out of range” in TP decode path ( #37259 )
...
Signed-off-by: zhenzhao <zhenzhao@habana.ai >
2026-03-17 11:14:07 +00:00
Augusto Yao
9c7cab5ebb
[Feature]: Support for multiple embedding types in a single inference call ( #35829 )
...
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com >
2026-03-17 17:05:42 +08:00
Chauncey
132bfd45b6
[Bugfix][ResponsesAPI] Fix crash when tool_choice=required exceeds max_output_tokens ( #37258 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-03-17 08:54:52 +00:00
xiao-llm
24b4272a8c
Fix infinite recursive search issue in quark.py ( #32779 )
...
Signed-off-by: Yanwen Lin <lyw1124278064@gmail.com >
Signed-off-by: Xiao Yu <xiao.yu.dc@outlook.com >
Signed-off-by: kimheesu <wlskaka4@gmail.com >
Co-authored-by: Yanwen Lin <lyw1124278064@gmail.com >
Co-authored-by: Kim Hee Su <wlskaka4@gmail.com >
2026-03-17 07:19:15 +00:00
Benjamin Chislett
8a680463fa
[Bugfix] Fix NemotronH MTP + Chunked Prefill ( #35447 )
2026-03-17 07:07:33 +01:00
Nick Cao
20b14095a4
[Bugfix] Fix loading Music Flamingo ( #35535 )
...
Signed-off-by: Nick Cao <ncao@redhat.com >
2026-03-17 05:24:40 +00:00
PatchyTIS
17c1bdf371
[Bugfix] dtype mismatch in ngram gpu propose ( #37246 )
...
Signed-off-by: PatchouliTaisa <patchychen@tencent.com >
Co-authored-by: PatchouliTaisa <patchychen@tencent.com >
2026-03-17 05:19:55 +00:00
Flora Feng
3e3d320c1b
[Refactor] Relocate responses API tests ( #37241 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com >
2026-03-17 05:14:52 +00:00
Andreas Karatzas
54a62a79f7
[ROCm] Fix AttributeError for torch.compiler.skip_all_guards_unsafe on older PyTorch ( #37219 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-17 11:34:49 +08:00
Flora Feng
384dc7f77b
[Refactor] Relocate completion and chat completion tests ( #37125 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com >
2026-03-17 11:31:23 +08:00
Flora Feng
f04d5226f8
[CI] Fix flaky tool_use chat completion tests with deterministic seed ( #37027 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com >
2026-03-17 03:24:34 +00:00
Kyuyeun Kim
0a0a1a198b
Add ability to replace oot ops when using lora ( #37181 )
...
Signed-off-by: Kyuyeun Kim <kyuyeunk@google.com >
2026-03-16 18:04:15 -07:00
Vadim Gimpelson
6c1cfbad32
Support non-contiguous KV cache in TRTLLM fp8 dequant kernel ( #36867 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
Signed-off-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com >
Co-authored-by: Pavani Majety <pavanimajety@gmail.com >
2026-03-16 17:48:42 -07:00
Harry Huang
45f526d652
[BugFix] Correct max memory usage for multiple KV-cache groups ( #36030 )
...
Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com >
2026-03-17 00:38:52 +00:00
Julien Denize
5db91f0aaf
Fix some Mistral parser issues ( #37209 )
...
Signed-off-by: juliendenize <julien.denize@mistral.ai >
2026-03-17 00:08:56 +00:00
Walter Beller-Morales
061980c36a
[Feature][Frontend] add support for Cohere Embed v2 API ( #37074 )
...
Signed-off-by: walterbm <walter.beller.morales@gmail.com >
2026-03-16 19:55:53 -04:00
Ben Browning
7a49742b88
[CI/Build] Add common tool call parser test suite ( #27599 )
...
Signed-off-by: Ben Browning <bbrownin@redhat.com >
2026-03-16 19:46:20 -04:00
Terry Gao
3e6a1e1686
[Custom Ops] Add functional + out variant for scaled_fp4_quant ( #34389 )
...
Signed-off-by: tianrengao <terrygao87@gmail.com >
2026-03-16 18:51:46 -04:00
Julien Denize
7961486a9b
Fix EagleMistralLarge3Model initialization ( #37232 )
...
Signed-off-by: juliendenize <julien.denize@mistral.ai >
2026-03-16 15:41:00 -07:00
Andreas Karatzas
4f9b14c21c
[CI] Stabilize multinode DP internal LB completion tests ( #36356 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-16 15:40:23 -07:00
Yuchen Fama
31a458c091
[Doc] Clarify schema enforcement behavior for tool_choice modes ( #37064 )
...
Signed-off-by: yfama <yuchengu@gmail.com >
2026-03-16 22:27:42 +00:00
Wei Zhao
a3a51d20e7
[Benchmark] Improvements to attention benchmark script ( #37115 )
...
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com >
2026-03-16 22:22:40 +00:00
EdalatiAli
e5b807607c
[Quant][Feature] Support online MXFP8 quantization for MoE and dense models ( #35448 )
...
Signed-off-by: EdalatiAli <aliedalati@cohere.com >
2026-03-16 18:07:39 -04:00
Elvir Crnčević
fd4d96302a
Fix eplb nvfp4 experts hook ( #37217 )
...
Signed-off-by: Elvir Crncevic <elvircrn@gmail.com >
Signed-off-by: Elvir Crncevic <elvir@anthropic.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
2026-03-16 22:03:54 +00:00
Krish Gupta
c0f011918d
[Bugfix] opcheck false mutation error in rms_norm_per_block_quant ( #36688 ) ( #36779 )
...
Signed-off-by: Krish Gupta <krishom70@gmail.com >
2026-03-16 21:11:33 +00:00
Zhengxu Chen
e6ae4b1be1
[compile] Enable mega aot artifact for torch 2.12+. ( #37198 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com >
2026-03-16 21:05:51 +00:00
zhanqiuhu
2dccb38f73
[Bugfix][MultiConnector] Fix MultiConnector for SupportsHMA sub-connectors ( #36549 )
2026-03-16 20:51:04 +00:00
Kunshang Ji
d157216093
[BUGFIX][Mamba] Use uint64 for address in KVBlockZeroer ( #37197 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2026-03-16 21:39:56 +01:00
Matthew Bonanni
93f3c8e531
[Misc] Add float16 to CacheDType ( #37199 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-03-16 13:24:48 -07:00
rasmith
2cc26c3a99
[CI][BugFix][MORI][AMD] Add transfer_id to kv transfer params for test ( #37213 )
...
Signed-off-by: Randall Smith <Randall.Smith@amd.com >
2026-03-16 13:22:57 -07:00
Flora Feng
dfa8852db2
[Refactor] Consolidate GPT-OSS reasoning parser tests ( #36915 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com >
Signed-off-by: Flora Feng <4florafeng@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-16 15:53:07 -04:00
Lucas Kabela
714c6e0eab
[torch.compile][BE] Modify cudagraph callable to check for is_forward_context_set ( #36288 )
...
Signed-off-by: Lucas Kabela <lucaskabela@meta.com >
2026-03-16 19:42:34 +00:00
Sage
0fefd00e6c
[Bugfix] Fix render server crash for quantized models on CPU-only hosts ( #37215 )
...
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com >
2026-03-16 18:59:01 +00:00
Nicolò Lucchesi
f5c081d432
[PD][Nixl] Add support for hybrid SSM-FA models ( #36687 )
2026-03-16 19:58:06 +01:00
Matthew Bonanni
c88ea8338b
[MTP][Sparse MLA] Take advantage of native MTP support in indexer when possible ( #36982 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-03-16 13:51:21 -04:00
Max de Bayser
9f9ecff4cd
Add simple granite4 tool parser ( #36827 )
...
Signed-off-by: Max de Bayser <maxdebayser@gmail.com >
Signed-off-by: Max de Bayser <mbayser@br.ibm.com >
2026-03-16 10:49:09 -07:00
haosdent
ca1954d58c
[Bugfix] Disable cross-layer KV cache for MLA attention backends ( #37090 )
...
Signed-off-by: haosdent <haosdent@gmail.com >
Co-authored-by: Or Ozeri <oro@il.ibm.com >
2026-03-16 19:03:10 +02:00
Raushan Turganbay
55e6d3d5c0
[Bugfix] Make siglip/clip compatible with transformers v5 ( #37200 )
...
Signed-off-by: raushan <raushan@huggingface.co >
2026-03-16 16:48:18 +00:00
Chauncey
6682c231fa
[Bugfix] Add error handling for FINISHED_ERROR in OpenAIServing ( #37148 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-03-16 16:27:47 +00:00
Itay Etelis
5ae685c1c8
[Bugfix] Relax TRTLLM KV cache contiguity assertion for cross-layer layout ( #34158 )
...
Signed-off-by: Itay Etelis <itay.etelis@ibm.com >
Co-authored-by: Itay Etelis <itay.etelis@ibm.com >
2026-03-16 11:20:51 -04:00
Wentao Ye
ce8cf9161d
[Compile] Fix compile warning st256_cs in cuda_vec_utils.cuh ( #36693 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-03-16 11:12:15 -04:00
xjx
18be11fd59
[BUGFIX]fix CUDA OOM ERROR : invalid argument at cumem_allocator.cpp:119 ( #35594 )
...
Signed-off-by: xjx <493337577@qq.com >
2026-03-16 15:10:42 +00:00
Yuanheng Zhao
8d8855fdae
[Bugfix] Add safety check and fallback for null scaling factor ( #36106 )
...
Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-16 14:27:29 +00:00
Wentao Ye
e855d380fa
[Compile] Fix compile warning in moe_permute ( #36529 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-03-16 10:16:14 -04:00
Benjamin Bartels
0e5a9382af
[Bugfix] accept redacted thinking blocks in Anthropic messages ( #36992 )
...
Signed-off-by: Benjamin Bartels <benjaminba@tiglab-ubuntu.ilab.local >
Signed-off-by: bbartels <benjamin@bartels.dev >
Co-authored-by: Benjamin Bartels <benjaminba@tiglab-ubuntu.ilab.local >
2026-03-16 22:01:57 +08:00
Fynn Schmitt-Ulms
04bf5a35fa
[Spec Decode] Update extract_hidden_states to use deferred kv_connector clear ( #37013 )
2026-03-16 14:53:45 +01:00
Tianyu Guo
43a73f853b
Remove unused EVS functions in qwen3_vl.py ( #37183 )
...
Signed-off-by: Tianyu Guo <guoty9@mail2.sysu.edu.cn >
2026-03-16 13:09:09 +00:00
Julien Denize
ffbc2e5bdb
Patch Mistral config ( #37104 )
...
Signed-off-by: juliendenize <julien.denize@mistral.ai >
2026-03-16 12:22:18 +00:00
Lukas Geiger
f9e6db3034
[Models][Qwen3 ViT] Keep max_seqlen on CPU to prevent D2H sync ( #37139 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-03-16 12:11:59 +00:00
elvischenv
d61d2b08e9
[Build] Fix API rate limit exceeded when using VLLM_USE_PRECOMPILED=1 ( #36229 )
...
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-16 12:09:27 +00:00
Artem Perevedentsev
f5e59ee7a6
[Performance] Add prefetch for checkpoints to OS page cache ( #36012 )
...
Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com >
2026-03-16 11:32:02 +00:00
Harry Mellor
9b005edc48
[Docs] Make the link to hardware plugins clearer ( #37174 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-16 04:12:58 -07:00
Robin Nabel
bf9a185395
GLM4 tool parser: fix streaming mode ( #35208 )
...
Signed-off-by: Robin Nabel <opensource@nabel.co >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
2026-03-16 18:48:52 +08:00
Harry Mellor
ad041c79db
Fix text only inputs for MRoPE models with the Transformers modelling backend ( #37055 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-16 10:31:16 +00:00
Kunshang Ji
747b068136
[Hardware] Replace memory related torch.cuda APIs ( #37031 )
...
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com >
2026-03-16 10:24:48 +00:00
Harry Mellor
122f75d939
Fix pipeline parallel with multimodal models with the Transformers modelling backend ( #37057 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-16 10:20:37 +00:00
SoluMilken
d8f8a7aad2
[Misc] Sync pre-commit to 4.5.1 in workflows and docs ( #36675 )
...
Signed-off-by: SoluMilken <ypiheyn.imm02g@g2.nctu.edu.tw >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-16 10:03:21 +00:00
Roy Wang
0115e957d4
[Frontend][Misc] Remove unused log in /is_sleeping ( #37093 )
...
Signed-off-by: esmeetu <jasonailu87@gmail.com >
2026-03-16 17:46:28 +08:00
haosdent
116ed130f4
[Bugfix] Fix GDN attention crash with mixed decode/spec-decode batches ( #34871 )
...
Signed-off-by: haosdent <haosdent@gmail.com >
2026-03-16 10:30:23 +01:00
Vadim Gimpelson
8374387bd8
[FlashInfer] Revert block_size 16 + head_size 256 workaround on Blackwell ( #36987 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2026-03-16 09:04:29 +00:00
Isotr0py
912fbe9555
[Bugfix] Fix Qwen2.5-Omni/Qwen3-Omni use_audio_in_video with multi-video inputs ( #37147 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-03-16 08:56:06 +00:00
Laith Sakka
52131f88d9
use skip_all_guards_unsafe to drop global_state and torch_function_mode_stack guards instead of previous hacks ( #36204 )
...
Signed-off-by: Laith Sakka <lsakka@meta.com >
2026-03-16 08:52:31 +00:00
Roy Wang
821eb80c0d
[Performance][Model Loader] Skip non-local expert weights during EP model loading ( #37136 )
...
Signed-off-by: esmeetu <jasonailu87@gmail.com >
2026-03-16 01:33:36 -07:00
Andreas Karatzas
a2956a0f8e
[ROCm][CI] Retrying in case of batch variance effects and reducing flakiness ( #36442 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-16 16:08:51 +08:00
Andreas Karatzas
911355e216
[ROCm] Fix KV copy methods and auto-select attention backend for ROCm ( #36845 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-16 16:07:27 +08:00
Chauncey
8d3f8f485e
[Bugfix] fix Qwen3.5 tool calling bug ( #36774 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-03-16 15:38:42 +08:00
Woosuk Kwon
96efb91480
[Model Runner V2] Fix processed logits in sample() ( #37144 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-03-16 00:35:49 -07:00
leo-cf-tian
2754231ba3
[Kernel] Add FlashInfer MoE A2A Kernel ( #36022 )
...
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com >
Signed-off-by: Leo Tian <lctian@nvidia.com >
Co-authored-by: wzhao18 <wzhao18.sz@gmail.com >
Co-authored-by: Stefano Castagnetta <scastagnetta@nvidia.com >
Co-authored-by: root <root@lyris0267.lyris.clusters.nvidia.com >
2026-03-15 23:45:32 -07:00
bigshanedogg
2390d44209
[Model] Add HyperCLOVAX-SEED-Think-14B language model support ( #37107 )
...
Signed-off-by: bigshanedogg <bigshane319@gmail.com >
2026-03-16 06:40:05 +00:00
Li, Jiang
7362b4450a
[Bugfix] Avoid LD_PRELOAD check on MacOS ( #37145 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2026-03-15 23:31:44 -07:00
Andreas Karatzas
57a314d155
[CI][Bugfix] Fix 500 errors from priority overflow and TemplateError subclasses in schema fuzz tests ( #37127 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-16 05:27:21 +00:00
Andreas Karatzas
d4c57863f7
[ROCm][CI] Fix engine teardown and text normalization to stabilize voxtral test ( #37138 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-16 04:49:31 +00:00
Wang, Yiting
68e1b711f1
[XPU] Add deepseek_scaling_rope fused kernel ( #36612 )
...
Signed-off-by: yitingw1 <yiting.wang@intel.com >
2026-03-16 12:35:08 +08:00
rasmith
0024f39a32
[ROCm][P/D][MORI][BugFix] Add transfer_id for moriio_connector to restore P/D functionality ( #34907 )
...
Signed-off-by: Randall Smith <Randall.Smith@amd.com >
2026-03-16 10:36:51 +08:00
Andrew Xia
e9163b536e
[responsesAPI][ez] add a unit test for SimpleContext logprobs ( #37126 )
...
Signed-off-by: Andrew Xia <axia@meta.com >
2026-03-15 17:12:26 -07:00
Lalithnarayan C
7acaea634c
In-Tree AMD Zen CPU Backend via zentorch [1/N] ( #35970 )
...
Signed-off-by: Lalithnarayan C <Lalithnarayan.C@amd.com >
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Co-authored-by: Chinmay-Kulkarni-AMD <Chinmay.Kulkarni@amd.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-03-15 23:35:35 +00:00
Jiangyun Zhu
697e4ff352
[GDN] add a config for gdn kernel selection ( #36647 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2026-03-16 00:40:17 +08:00
Hari
a3e2e250f0
[Feature] Add Azure Blob Storage support for RunAI Model Streamer ( #34614 )
...
Signed-off-by: hasethuraman <hsethuraman@microsoft.com >
2026-03-15 19:38:21 +08:00
Isotr0py
143e4dccdf
[Misc] Add online audio_in_video test ( #36775 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-03-15 00:14:11 -07:00
Isotr0py
6590a3ecda
[Frontend] Remove torchcodec from audio dependency ( #37061 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-03-15 05:15:59 +00:00
Russell Bryant
b3debb7e77
[Build] Upgrade xgrammar to get a security fix ( #36168 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2026-03-15 03:13:48 +00:00
Nick Hill
458c1a4b2d
[Frontend] Reduce chat template warmup logging levels ( #37062 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-03-14 13:48:59 -07:00
Karan Bansal
821fde2df4
[Bugfix] Fix xgrammar dtype mismatch on macOS CPU inference ( #32384 )
...
Signed-off-by: Karan Bansal <karanb192@gmail.com >
Co-authored-by: Inokinoki <inoki@inoki.cc >
2026-03-14 17:29:06 +00:00
arlo
8c29042bb9
[Feature] Add InstantTensor weight loader ( #36139 )
2026-03-14 18:05:23 +01:00
Cyrus Leung
5467d137b3
[Frontend] Avoid startup error log for models without chat template ( #37040 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-03-14 09:36:11 -07:00
Santino Ramos
3ed46f374b
[Model Runner V2] Add Support for XD-RoPE ( #36817 )
...
Signed-off-by: Santino Ramos <elsantinoramos@gmail.com >
2026-03-14 09:27:55 -07:00
seanmamasde
84868e4793
[Bugfix][Frontend] Fix audio transcription for MP4, M4A, and WebM formats ( #35109 )
...
Signed-off-by: seanmamasde <seanmamasde@gmail.com >
2026-03-14 08:44:03 -07:00
Isotr0py
a8e8d62dd8
[Misc] Clean up Kimi-audio whisper encoder loading ( #36903 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-03-14 23:37:52 +08:00
Julien Denize
e42b49bd69
Mistral common v10 ( #36971 )
...
Signed-off-by: juliendenize <julien.denize@mistral.ai >
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com >
Co-authored-by: root <root@h200-bar-196-227.slurm-bar-compute.tenant-slurm.svc.cluster.local >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-03-14 07:26:43 -07:00
Sergey Zinchenko
4a718e770d
[Bug] Fix Failure in /v1/chat/completions/render for Multimodal Requests ( https://github.com/vllm-project/vllm/issues/35665 ) ( #35684 )
2026-03-14 14:10:11 +00:00
Kevin H. Luu
600a039f57
[CI] Shard Multi-Modal Models (Standard) into 4 parallel jobs ( #37014 )
...
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
2026-03-14 08:26:54 +00:00
Harry Mellor
ffa5d74f15
Enable loading of fused expert weights in the Transformers modelling backend ( #36997 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-14 07:01:06 +00:00
Kevin H. Luu
74fe80ee95
[CI] Split Distributed Tests (4 GPUs) into 3 parallel jobs ( #37015 )
...
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
2026-03-14 12:21:13 +08:00
Flora Feng
bcfdadb1bc
[Refactor] Relocate chat completion and anthropic tests ( #36919 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com >
2026-03-14 12:16:16 +08:00
Yanan Cao
236de72e49
[CI] Pin helion version ( #37012 )
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
2026-03-13 23:25:29 -04:00
sbeurnier
a116f96930
[V1] Remove pin_memory() in async_copy_to_gpu to fix sporadic stalls ( #37006 )
...
Signed-off-by: Sebastien Beurnier <sbeurnier@together.ai >
2026-03-14 01:37:32 +00:00
Li, Jiang
092ace9e3a
[UX] Improve UX of CPU backend ( #36968 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
Signed-off-by: Li, Jiang <bigpyj64@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-14 09:27:29 +08:00
Andrew Xia
f680dc1b39
[responsesAPI] prioritize content over summary in reasoning item input ( #36516 )
...
Signed-off-by: Andrew Xia <axia@meta.com >
Signed-off-by: Andrew Xia <mitandrewxia@gmail.com >
Signed-off-by: Andrew Xia <axia@fb.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Andrew Xia <axia@fb.com >
2026-03-14 09:20:30 +08:00
Giulio Leone
b41aa264f9
fix: resolve chat template names before kwargs detection ( #36937 )
...
Co-authored-by: giulio-leone <giulio.leone@users.noreply.github.com >
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com >
2026-03-14 00:20:16 +00:00
Dimitrios Bariamis
367cf5cd3e
[Feat][Bugfix] Enable additional dimension for Flashinfer MLA and fix routing dtype ( #36931 )
...
Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com >
Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com >
2026-03-13 16:41:16 -07:00
haosdent
6d53efd2a5
[Bugfix] Fix MLA attention crash with AWQ/GPTQ quantized models ( #34695 )
...
Signed-off-by: haosdent <haosdent@gmail.com >
2026-03-13 23:25:41 +00:00
Benjamin Chislett
8b346309a5
[Refactor] Consolidate SupportsEagle ( #36063 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
2026-03-13 23:22:40 +00:00
Nick Hill
54a6db827f
[BugFix] Fix "DP Coordinator receives unexpected..." messages ( #37008 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-03-13 23:18:05 +00:00
Matthew Bonanni
9efc4db965
[Bugfix] Fix DeepSeek-V3.2 tokenizer stripping spaces ( #37004 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-03-13 22:55:36 +00:00
Kevin H. Luu
f1816fb192
[CI] Split V1 e2e + engine (1 GPU) into separate jobs ( #36945 )
...
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
2026-03-13 14:16:02 -07:00
Harry Mellor
0005d2a3c9
Use Transformers v5 WeightRenaming for Transformers modeling backend ( #31545 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-13 20:49:08 +00:00
Ekagra Ranjan
d0b402974f
[Bugfix][Spec Decode] Avoid double call of Ngram CPU ( #36952 )
...
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com >
2026-03-13 20:33:19 +00:00
Divakar Verma
6341d43043
[ROCm][Quantization] add quark w4a8 mxfp4_fp8 for LinearLayer ( #35316 )
...
Signed-off-by: Divakar Verma <divakar.verma@amd.com >
2026-03-13 19:44:24 +00:00
Mark McLoughlin
7afe0faab1
[Frontend][Core] Re-add shutdown timeout - allowing in-flight requests to finish ( #36666 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
Signed-off-by: Nick Hill <nickhill123@gmail.com >
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com >
Co-authored-by: Nick Hill <nickhill123@gmail.com >
2026-03-13 12:10:06 -07:00
Harry Mellor
5a3f1eb62f
[Misc] Set default kv_buffer_device in a better way ( #36862 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-13 19:07:33 +00:00
yugong333
b3ce711b93
Fp8 lora dense kernel ( #35242 )
...
Signed-off-by: Yu Gong <yu3.gong@gmail.com >
2026-03-13 19:05:08 +00:00
Isotr0py
abf61aaa8e
[Bugfix] Fix Qwen2.5-omni/Qwen3-omni mm_processor cache for audio_in_video request ( #36800 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-03-13 18:16:05 +00:00
bigmoyan
4508532fbd
[Bugfix] fix paddleocr crash on some image shape ( #36959 )
...
Signed-off-by: wangzhengtao <wangzhengtao@msh.team >
Signed-off-by: bigmoyan <moyan_work@foxmail.com >
Co-authored-by: wangzhengtao <wangzhengtao@msh.team >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-13 13:46:55 +00:00
Itay Alroy
d5af196c18
[2/N] Elastic EP Milestone 2: Integrating NIXL-EP ( #35627 )
...
Signed-off-by: Itay Alroy <ialroy@nvidia.com >
Co-authored-by: Yongji Wu <wuyongji317@gmail.com >
Co-authored-by: Ron Tourgeman <rtourgeman@nvidia.com >
2026-03-13 09:25:33 -04:00
Chaojun Zhang
82f836d976
[XPU] Support LoRA via torch.compile on XPU platform ( #36962 )
...
Signed-off-by: chzhang <chaojun.zhang@intel.com >
2026-03-13 10:34:59 +00:00
Andreas Karatzas
4fccd30f19
[ROCm][CI] Upgrading orchestrator to handle python pipeline markers and options ( #36181 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-13 02:04:22 -07:00
Or Ozeri
cfaf4668f7
[kv_offload+HMA][1/N]: Support multiple KV groups in OffloadingSpec ( #36610 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com >
2026-03-13 08:04:21 +00:00
Andreas Karatzas
99a57bdf74
[ROCm][CI] Corrected the GPT-OSS test root path ( #36711 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-13 15:53:43 +08:00
Sage
a2268617cf
[Frontend] Delegate preprocessing to OpenAIServingRender ( #36483 )
...
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com >
2026-03-13 00:39:43 -07:00
Rohan Potdar
a4ad9db541
Enable RoPE+KV cache fusion for ROCm AITER FA (non-shuffle layout) ( #35786 )
...
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com >
2026-03-13 07:33:22 +00:00
Nick Hill
b373b5102a
[Tests] Shutdown test RemoteVLLMServer cleanly ( #36950 )
...
Recent PR #33949 changed the teardown logic of the RemoteVLLMServer test utility class to
send SIGTERM to all vllm (sub)processes at once, which breaks the clean/coordinated
shutdown logic that assumes only the top-level process will receive a signal (for example
when running in a container that's shut down).
This caused a bunch of errors and stacktraces in some test logs, even though those tests
still pass. We should still attempt a normal shutdown and only kill other procs if they are
still running after a few seconds.
Example: tests/v1/distributed/test_external_lb_dp.py::test_external_lb_completion_streaming
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-03-13 07:32:55 +00:00
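The shutdown order described in the #36950 commit message above (signal only the top-level process, wait a few seconds, and kill leftovers only afterwards) can be sketched roughly as follows. This is not the RemoteVLLMServer code: shutdown_gracefully is a hypothetical helper, and it assumes a POSIX host and a server launched with start_new_session=True so that the process group id equals the top-level pid.

```python
import os
import signal
import subprocess
import sys


def shutdown_gracefully(proc: subprocess.Popen, timeout: float = 5.0) -> bool:
    """Hypothetical helper: coordinated shutdown with a kill fallback.

    Returns True if the top-level process exited on its own within
    `timeout` seconds of SIGTERM, False if it had to be killed.
    """
    # Assumption: proc was started with start_new_session=True,
    # so its process group id equals its pid.
    pgid = proc.pid

    # Step 1: signal ONLY the top-level process so it can coordinate
    # the shutdown of its own subprocesses.
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=timeout)
        clean = True
    except subprocess.TimeoutExpired:
        clean = False

    # Step 2: fallback sweep. Kill whatever is still left in the group.
    try:
        os.killpg(pgid, signal.SIGKILL)
    except ProcessLookupError:
        pass  # nothing left in the group
    proc.wait()
    return clean
```

The boolean return value lets a test utility report whether the fallback was needed, without turning a slow but successful shutdown into a test failure.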
Thomas Parnell
f296a1966d
[Bugfix] Fix FlashInfer GDN warmup ValueError on SM90 GPUs ( #36876 )
2026-03-13 07:09:39 +01:00
Csrayz
bc2c0c86ef
[Frontend] Fix usage incorrectly returned with empty stream_options ( #36379 )
...
Signed-off-by: Csrayz <33659823+Csrayz@users.noreply.github.com >
2026-03-13 03:33:04 +00:00
jaime campos salas
891c60dcd5
fix(kv-cache): increase hybrid attention grouping threshold from 1.25 to 1.5 ( #36684 )
...
Signed-off-by: Jaime Campos Salas <jaime.campos.salas@gmail.com >
2026-03-12 23:28:27 -04:00
whyiug
1ce13cf992
[Model] Add support for BERT-like Chinese ERNIE pooling models ( #36385 )
...
Signed-off-by: whyiug <whyiug@hotmail.com >
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-03-13 03:23:53 +00:00
Nikita
10f08dedfa
[Model] Add ColPali late interaction model for multi-modal retrieval ( #36818 )
...
Signed-off-by: Nikita Sukharev <kaonael@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-03-13 02:18:57 +00:00
Aaron Hao
5e1a373d2e
[BUG] Fix rank calculation in NCCLWeightTransferEngine ( #36940 )
...
Signed-off-by: hao-aaron <ahao@anyscale.com >
2026-03-13 01:56:51 +00:00
Simo Lin
572c776bfb
build: update smg-grpc-servicer to use vllm extra ( #36938 )
...
Signed-off-by: Simo Lin <linsimo.mark@gmail.com >
2026-03-13 01:31:36 +00:00
Yifan Qiao
55d8073d06
[Bugfix] ep_scatter kernel store-load race condition ( #34991 )
...
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu >
2026-03-13 01:07:59 +00:00
Nick Hill
cd32d6f586
[Model Runner V2] Some code simplification ( #36929 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-03-13 00:59:23 +00:00
Jaewon
aaa3092f51
[MoE] Add routing simulation override for MXFP4 quantized MoE ( #33595 )
...
Signed-off-by: Jaewon Lee <jaewon@meta.com >
2026-03-13 00:30:44 +00:00
Shubhra Pandit
87985077a4
[Speculative Decoding] Add norm_before_fc for gpt-oss draft models ( #36545 )
...
Signed-off-by: Shubhra Pandit <shubhra.pandit@gmail.com >
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com >
Co-authored-by: Benjamin Chislett <bchislett@nvidia.com >
2026-03-12 23:03:32 +00:00
Ryan Rock
a79c1c2c80
[AMD][Build] Add DeepEP to ROCm Dockerfile ( #36086 )
...
Signed-off-by: Ryan Rock <ryan.rock@amd.com >
2026-03-12 21:33:32 +00:00
Andreas Karatzas
cc8f1f4764
[ROCm][CI] Preparing gfx90a mirroring ( #36210 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-12 13:42:25 -07:00
Michael Goin
05b9e8ab5b
Revise environment setup in AGENTS.md ( #36909 )
...
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-12 19:21:11 +00:00
Xinan Miao
2cdf92228c
[Feature]: Remove Chunking From FusedMoE ( #34086 )
...
Signed-off-by: SouthWest7 <am1ao@qq.com >
Signed-off-by: Southwest <1403572259@qq.com >
Signed-off-by: southwest <am1ao@qq.com >
Signed-off-by: Xinan Miao <1403572259@qq.com >
Co-authored-by: SouthWest7 <am1ao@qq.com >
2026-03-12 14:24:38 -04:00
Marc Sun
c973ecdead
[bnb] Skip moe + bnb test ( #36896 )
...
Signed-off-by: Marc Sun <marc@huggingface.co >
2026-03-12 18:03:25 +00:00
Harry Mellor
e39257a552
Add AGENTS.md ( #36877 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-12 10:20:50 -07:00
Dimitrios Bariamis
cc16b24b17
Update Flashinfer to 0.6.6 ( #36768 )
...
Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com >
Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com >
2026-03-12 13:19:19 -04:00
Eunkwang Jeon
bdc2343454
[Bugfix] Fix KeyError in parse_response_input for reasoning items with optional content ( #34499 )
...
Signed-off-by: jeonsworld <jeonsworld@gmail.com >
2026-03-13 00:13:36 +08:00
Matthew Bonanni
f444c05c32
[Attention] Use FA4 for MLA prefill ( #34732 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-03-12 12:10:17 -04:00
SoluMilken
85199f9681
[Bugfix] fix main branch pre-commit error (1 line change) ( #36897 )
...
Signed-off-by: SoluMilken <ypiheyn.imm02g@g2.nctu.edu.tw >
2026-03-12 09:08:37 -07:00
grimulkan
a1257fd1ea
[Kernel] Add FP8 KV cache support to Triton MLA decode attention ( #34597 )
...
Signed-off-by: grimulkan <grimulkan@gmail.com >
2026-03-12 08:32:34 -07:00
Thomas Parnell
abcffbba8c
[CI] Fix mypy pre-commit errors on main ( #36882 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
2026-03-12 08:22:29 -07:00
Kunshang Ji
53ec16a705
[Hardware] Replace torch.cuda.device_count/current_device/set_device API ( #36145 )
...
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com >
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2026-03-12 07:57:47 -07:00
Wei Zhao
2e693f48e7
[Perf] Add TRTLLM FP8 MoE Modular Kernel ( #36307 )
...
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-03-12 07:32:31 -07:00
Martin Hickey
7f1f36bf91
[CI] Fix mypy for vllm/reasoning ( #35742 )
...
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-12 12:21:33 +00:00
Mark McLoughlin
5282c7d4d0
[docs] Add lightweight AI assisted contribution policy ( #30947 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2026-03-12 11:46:13 +00:00
caozuoba
9e19f8338b
[Perf] add packed recurrent fast path for decode ( #36596 )
...
Signed-off-by: hdj <1293066020@qq.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2026-03-12 04:01:57 -07:00
Sage
06e0bc21d2
[Frontend] Split OpenAIServingModels into OpenAIModelRegistry + OpenAIServingModels ( #36536 )
...
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com >
2026-03-12 03:29:37 -07:00
Chauncey
5a71cdd76e
[Bugfix] Fix crash when tool_choice=required exceeds max_tokens ( #36841 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-03-12 03:28:45 -07:00
Shanshan Shen
f0d3658c0f
[MM][OOT] Support CPU seq_lens for OOT MMEncoderAttention kernels ( #36605 )
...
Signed-off-by: shen-shanshan <467638484@qq.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-03-12 03:28:23 -07:00
Michael Goin
57431d8231
[UX] Only show FP4 Marlin fallback warning for w4a4 models ( #36806 )
...
Co-authored-by: Claude <noreply@anthropic.com >
2026-03-12 05:19:35 -04:00
Xu Jinyang
3e64fe4a18
[Bugfix] Warm up Triton autotuner for GDN layers during V1 profiling ( #36599 )
...
Signed-off-by: AuYang <459461160@qq.com >
2026-03-12 00:51:09 -07:00
sfeiqiang
8cb24d3aed
[KV Connector] Support using FlexKV as KV Cache Offloading option. ( #34328 )
...
Signed-off-by: phaedonsun <phaedonsun@tencent.com >
Co-authored-by: phaedonsun <phaedonsun@tencent.com >
2026-03-12 00:46:20 -07:00
István Ketykó
00726c74c9
[Bugfix][Model] Fix DeepSeek-OCR TensorSchema crash on empty images_crop ( #36670 )
...
Signed-off-by: István Ketykó <istvan.ketyko@gmail.com >
2026-03-12 15:35:54 +08:00
Chauncey
9fe404ed04
[Frontend] OpenAI Responses API supports Tool/Function calling with streaming ( #29947 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-03-12 15:03:50 +08:00
Sage
802f306cd1
[Tests] Skip model weight download for render-only test server ( #36813 )
...
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com >
2026-03-12 06:24:42 +00:00
Yan Ma
894843eb25
Replace `with torch.cuda.device` with `with torch.accelerator.device_index` ( #36144 )
...
Signed-off-by: Yan Ma <yan.ma@intel.com >
2026-03-11 23:12:57 -07:00
Yanan Cao
584a3f56de
[Kernel][Helion][13/N] Force static_shapes=False in helion register ( #36677 )
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
2026-03-12 05:35:29 +00:00
Nick Hill
36735fd772
[BugFix] Fix multiple/duplicate stdout prefixes ( #36822 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-03-12 12:23:21 +08:00
wang.yuqi
6ecabe4936
[CI Failure] Fix Language Models Test (Extended Pooling) daily CI Failure ( #36761 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-03-12 12:22:05 +08:00
Woosuk Kwon
2f8b4ce0c0
[Model Runner V2] Do not initialize sampler for non-last PP ranks ( #36824 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-03-12 03:55:28 +00:00
Yuwei An
2ef69456f5
[LMCache] Fault Tolerance Mechanism ( #36586 )
...
Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com >
2026-03-12 03:54:39 +00:00
Louie Tsai
17852aa503
more models for vLLM Benchmark Suite ( #35086 )
...
Signed-off-by: louie-tsai <louie.tsai@intel.com >
2026-03-12 11:36:51 +08:00
Flora Feng
8647c6cf51
[Bugfix] Fix minimax_m2 tool parser when stream interval > 1 ( #35895 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com >
2026-03-12 10:25:14 +08:00
Kunshang Ji
513949f95f
[XPU][Doc] Remove manual OneAPI install step, now handled by torch-xpu ( #36831 )
...
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com >
2026-03-12 01:46:02 +00:00
Nick Hill
262b76a09f
[Frontend] Exclude anthropic billing header to avoid prefix cache miss ( #36829 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-12 01:20:34 +00:00
Wentao Ye
c34ba6b961
[Perf] Optimize compute maxsim using batched version, 3.2% E2E throughput improvement ( #36710 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-03-12 08:37:01 +08:00
Matthias Gehre
24062b704f
[ROCm][CI/Build] Add gfx1152/gfx1153 (Krackan) to HIP supported architectures ( #36499 )
...
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com >
2026-03-11 23:14:40 +00:00
Aaron Hao
d6b61e5166
[BUG] Fix async rlhf tests ( #35811 )
...
Signed-off-by: ahao-anyscale <ahao@anyscale.com >
2026-03-11 18:06:10 -04:00
Yanan Cao
cf632499ee
[Kernel] [Helion] [15/N] Split config files into per-platform files ( #36698 )
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
2026-03-11 17:25:29 -04:00
Yanan Cao
a3774a8198
[Kernel] [Helion] [12/N] Use FakeTensorMode to avoid GPU allocation during config key computation ( #36563 )
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
2026-03-11 17:25:16 -04:00
Yanan Cao
0ce21c46a0
[Kernel] [Helion] [14/N] Set autotune_ignore_errors=True during autotuning ( #36683 )
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
2026-03-11 17:25:04 -04:00
Woosuk Kwon
55eed6b7a5
[Model Runner V2] Add WhisperModelState [6/N] ( #35790 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-03-11 14:20:38 -07:00
Giancarlo Delfin
c77181e534
[Model Runner V2] Add probabilistic rejection sampling for spec decoding ( #35461 )
...
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai >
2026-03-11 14:04:32 -07:00
maobaolong
12001f2ebc
[LMCache] Pass TP size in lookup for MLA multi-reader locking ( #36129 )
...
Signed-off-by: baoloongmao <baoloongmao@tencent.com >
Co-authored-by: Yihua Cheng <yihua98@uchicago.edu >
2026-03-11 20:45:20 +00:00
Or Ozeri
7ee5d5093b
[BugFix][kv_offload] Fix offloading decodes with async scheduling ( #33881 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2026-03-11 20:43:40 +00:00
jennyyyyzhen
428bc718bd
[Bugfix][ROCm] Strip block_size before attention backend validation ( #36274 )
...
Signed-off-by: jennyyyyzhen <yzhen@hmc.edu >
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com >
2026-03-11 13:37:31 -07:00
汪志鹏
ff1e3d9c63
[BugFix]: add bagel to MM_PREFIX_LM_MODELS ( #36316 )
...
Signed-off-by: princepride <wangzhipeng628@gmail.com >
2026-03-11 19:55:59 +00:00
Wentao Ye
35bdca5431
[Refactor] Remove dead code in KV connector ( #36424 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-03-11 19:40:17 +00:00
Amanzhol Salykov
8a24842765
[ROCm] add tuned moe_wna16_triton kernel configs for CDNA4 ( #35093 )
...
Signed-off-by: salykova <amsalykov@gmail.com >
Signed-off-by: amd-asalykov <asalykov@amd.com >
2026-03-11 19:00:08 +00:00
Harry Mellor
65986db6ba
Make Gemma and Gemma 2 accept inputs_embeds like Gemma 3 ( #36787 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-11 18:12:43 +00:00
Luka Govedič
9556af87d5
[torch.compile] Add support for non-contiguous fused RMSNorm + group quant ( #36551 )
...
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com >
Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com >
2026-03-11 10:56:55 -07:00
Or Ozeri
a1a3523a56
[KVConnector] Support worker -> scheduler metadata ( #31964 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2026-03-11 17:36:37 +00:00
tianshu-Michael-yu
741f4e046b
fix: align lfm2 thumbnail token counting with HF ( #36707 )
2026-03-11 10:28:38 -07:00
Julien Denize
a5d06dc557
Add 320 dimension size support to MLA ( #36161 )
...
Signed-off-by: Julien Denize <julien.denize@mistral.ai >
2026-03-11 10:21:22 -07:00
Harry Mellor
5efa206a8c
Fix ExaoneMoeMTP test that never ran in Transformers v4 ( #36792 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-11 17:10:23 +00:00
Cyrus Leung
196802dfa6
[Misc] Clean up renderers ( #36770 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-03-11 16:39:29 +00:00
Isotr0py
c84b519cf3
[Bugfix] Fix negative max_tokens when input prompt is too long ( #36789 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-03-11 16:30:51 +00:00
Flora Feng
741ecf0630
[CI] Add bfcl tool call correctness eval ( #36560 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-03-11 12:27:36 -04:00
Robert Shaw
b7e5a588d8
[Bugfix] Fix DP/EP Shared Expert With Monolithic Kernels ( #36061 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2026-03-11 16:07:14 +00:00
Richard Zou
822e250ab7
[torch.compile] Use FakeTensors instead of real GPU tensors for single-size compilation ( #36093 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2026-03-11 16:07:09 +00:00
Hongxin Xu
bea02cdf93
Fix routed experts capture for hybrid models (Mamba + Attention) ( #35744 )
...
Signed-off-by: arlenxu <arlenxu@tencent.com >
Signed-off-by: xhx1022 <1737006628@qq.com >
Co-authored-by: arlenxu <arlenxu@tencent.com >
2026-03-11 08:53:10 -07:00
Julien Denize
a3ea760ea5
Add 'none' reasoning effort to ChatCompletionRequest ( #36238 )
...
Signed-off-by: Julien Denize <julien.denize@mistral.ai >
2026-03-11 15:45:34 +00:00
Harry Mellor
35db669f1d
Correct link to supported hardware on vllm.ai ( #36798 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-11 08:43:28 -07:00
Julien Denize
afebeffbfb
Add support to Mistral large 3 eagle with dense layers ( #36163 )
...
Signed-off-by: juliendenize <julien.denize@mistral.ai >
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-11 15:42:56 +00:00
Jhao-Ting Chen
5573894737
Kimi k2.5 MLA based eagle3 ( #36361 )
...
Signed-off-by: Izzy Putterman <iputterman@nvidia.com >
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com >
Co-authored-by: Izzy Putterman <iputterman@nvidia.com >
2026-03-11 11:36:11 -04:00
Harry Mellor
d5816c8c2f
Fix tied weights in weight mapping test for Transformers v5 ( #36788 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-11 15:10:26 +00:00
Woosuk Kwon
8ccbcda5c0
[Model Runner V2] Remove unused warmup_for_prefill method ( #36762 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-03-11 08:02:44 -07:00
tvirolai-amd
a9e532afe2
[ROCm][Perf] Allow MTP lens > 1 in Sparse MLA ( #36681 )
...
Signed-off-by: Teemu Virolainen <teemu.virolainen@amd.com >
2026-03-11 14:43:03 +00:00
Harry Mellor
f3163bba67
Disable docs build skipping until a better solution is found ( #36790 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-11 13:53:23 +00:00
Martin Hickey
700a1ddc65
[Misc] Use envs module to get VLLM_DISABLED_KERNELS ( #35776 )
...
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com >
2026-03-11 13:37:46 +00:00
Silvia Colabrese
f33251ffc8
[Bugfix] Fix Mistral-small --format ( #36782 )
...
Signed-off-by: 12010486 <silvia.colabrese@intel.com >
2026-03-11 04:47:52 -07:00
Wuxun Zhang
e584dce52b
Add XPU MLA Sparse backend for DeepSeek v3.2 ( #33230 )
...
Signed-off-by: Zhang, Wuxun <wuxun.zhang@intel.com >
2026-03-11 19:19:15 +08:00
Ning Xie
40c0461f24
[openapi] refactor render related openapi [3/N] ( #36749 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2026-03-11 03:14:34 -07:00
Weiguang Li
724759684c
[Bugfix] Fix Qwen3-VL timestamp mismatch when using num_frames without fps ( #36136 )
...
Signed-off-by: OiPunk <codingpunk@gmail.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
2026-03-11 03:13:06 -07:00
Michael Goin
9c34e9d24f
Disable cascade attention by default ( #36318 )
2026-03-11 03:12:23 -07:00
Richard Zou
09b6f99852
[compile] aot_compile should respect VLLM_DISABLE_COMPILE_CACHE ( #36358 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2026-03-11 03:12:03 -07:00
Ethan T.
c87fb515ed
fix(lora): use replaced_module_name in pooling model name check ( #36402 )
...
Signed-off-by: gambletan <ethanchang32@gmail.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
2026-03-11 03:11:27 -07:00
Itay Alroy
5353c9b016
platforms: Fix Ray DP startup crash ( #36665 )
...
Signed-off-by: Itay Alroy <ialroy@nvidia.com >
2026-03-11 03:08:55 -07:00
Angela Yi
13e79fc811
[ci] Update rtol for test_classification ( #36556 )
...
Signed-off-by: angelayi <yiangela7@gmail.com >
Co-authored-by: Richard Zou <zou3519@users.noreply.github.com >
2026-03-11 03:08:16 -07:00
Rahul Tuli
9d07a3d6e4
Add: Eagle3 support for Qwen3.5 ( #36658 )
...
Signed-off-by: Rahul-Tuli <rtuli@redhat.com >
2026-03-11 03:07:42 -07:00
Cyrus Leung
646b85544b
[Refactor] Remove Molmo2 processor wrapper ( #36667 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-03-11 03:07:20 -07:00
tc-mb
4286cc5ec2
fix(minicpmv): fix audio inference by handling meta device in init_re… ( #36751 )
...
Signed-off-by: caitianchi <caitianchi@modelbest.cn >
2026-03-11 03:06:28 -07:00
LoganJane
545d18d81b
[Bugfix] Support other quantization methods in glm41v ( #36321 )
...
Signed-off-by: g00887675/loganJane <g00887675/loganJane73@hotmail.com >
Co-authored-by: g00887675/loganJane <g00887675/loganJane73@hotmail.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-03-11 09:48:05 +00:00
roikoren755
e661b9ee83
[NemotronH] Small fix reasoning parser ( #36635 )
...
Signed-off-by: Roi Koren <roik@nvidia.com >
2026-03-11 02:44:41 -07:00
YiSheng5
c910eeb125
[XPU]Bug fix for some unexpected error when use AgRs backend on XPU device. ( #36593 )
...
Signed-off-by: yisheng <yi.sheng@intel.com >
2026-03-11 09:17:46 +00:00
Harry Mellor
f4ae58b38b
Remove unused config field from Gemma2 ( #36672 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-11 01:51:19 -07:00
Isotr0py
e568cf88bc
[UX] Infer dtype for local checkpoint ( #36218 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-03-11 08:50:04 +00:00
Nicolò Lucchesi
098d844731
[NIXL][1/N] Refactor kernel_block_size detection ( #35752 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-03-11 01:11:23 -07:00
JartX
a40ee486f2
[Bugfix] Add Multiple of 16 block_size to triton fallback on rocm Attention to support qwen3_5 ( #35923 )
...
Signed-off-by: JartX <sagformas@epdcenter.es >
Co-authored-by: akaratza <akaratza@amd.com >
Co-authored-by: TJian <tunjian.tan@embeddedllm.com >
2026-03-11 07:45:57 +00:00
pschlan-amd
eac2dc2b41
AITER MLA backend: Avoid CPU sync in _build_decode ( #35765 )
...
Signed-off-by: Patrick Schlangen <pschlan@amd.com >
2026-03-11 07:25:00 +00:00
Flora Feng
d5080aeaa4
[Refactor] Remove deadcode in Responses API serving ( #36726 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com >
Co-authored-by: Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-03-11 07:11:41 +00:00
liuzhenwei
f22d6e0267
[Hardware][NIXL] set default kv buffer type for different platform ( #36438 )
...
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com >
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com >
2026-03-11 05:19:28 +00:00
Kunshang Ji
76c6e6da08
[XPU] Support block fp8 moe by fallback to TritonExpert on XPU ( #36458 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2026-03-10 21:54:09 -07:00
typer-J
4184653775
feat: add RISC-V support for CPU backend (v2) ( #36578 )
...
Signed-off-by: typer-J <2236066784@qq.com >
Co-authored-by: Li, Jiang <jiang1.li@intel.com >
2026-03-10 21:51:39 -07:00
Sladyn
4aaaf8c8ce
feat(spec_decode): fuse EAGLE step slot mapping and metadata updates ( #33503 )
...
Signed-off-by: sladynnunes <snunes@usc.edu >
2026-03-11 04:35:33 +00:00
Hongbin Guo
4bf533623b
[Doc] Fix duplicate words in comments ( #36713 )
...
Signed-off-by: Hongbin10 <jdmjdm1998@163.com >
2026-03-10 21:28:31 -07:00
Matthew Bonanni
5f77ef15ae
[Misc][Attention] Clean up unused method in CPU_ATTN ( #36673 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-03-10 21:27:22 -07:00
elvischenv
7d6abdd022
[Fix] Use torch.empty for output in attention+quant fusion ( #31785 )
...
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com >
2026-03-10 21:26:14 -07:00
Wentao Ye
a8ff2cca92
[Perf] Optimize scheduler overhead for PD disaggregation, around 5% E2E perf improvement ( #35781 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: Or Ozeri <oro@il.ibm.com >
2026-03-10 21:25:30 -07:00
tunglinwood
42fadebecb
[Model] Add support for moonshotai/Kimi-Audio-7B-Instruct ( #36127 )
...
Signed-off-by: tunglinwood <tunglinwood@gmail.com >
Signed-off-by: tunglinwood <tomwu.tunglin@gmail.com >
Signed-off-by: tunglinwood <113751333+tunglinwood@users.noreply.github.com >
2026-03-10 21:24:48 -07:00
tianshu-Michael-yu
a197eda9c3
Add tuned H100 MoE configs for LFM2 8B and 24B ( #36699 )
2026-03-10 21:22:02 -07:00
Kevin H. Luu
82b110d50e
[ci] Bound nvidia-cudnn-frontend version ( #36719 )
...
Signed-off-by: khluu <khluu000@gmail.com >
2026-03-11 12:17:35 +08:00
Benjamin Chislett
9040cd40af
[DSV3.2][MTP] Optimize Indexer MTP handling ( #36723 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
2026-03-11 12:16:56 +08:00
fangyuchu
fa0d353acf
[Bugfix] Surface exceptions from non-blocking execute_model in UniProcExecutor to avoid DP deadlocks ( #35194 )
...
Signed-off-by: fangyuchu <fangyuchu@qq.com >
2026-03-11 03:22:21 +00:00
Augusto Yao
b386bb3d7c
fix bugs when token_classify & classify run concurrently ( #36614 )
...
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com >
2026-03-10 20:16:34 -07:00
Ning Xie
fe714dd507
[openapi server] log exception in exception handler(2/N) ( #36201 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2026-03-10 20:16:30 -07:00
Matthew Bonanni
8ab3d7427c
[Bugfix] Fix DeepSeek V3.2 OOM during CG memory profiling ( #36691 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-03-11 03:01:07 +00:00
Wei Zhao
84e436ed1c
[Bug] Fix TRTLLM Block FP8 MoE Monolithic ( #36296 )
...
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-03-10 22:04:47 -04:00
Andreas Karatzas
81939e7733
[ROCm][CI] Making some tests optional to reduce workload ( #36090 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-10 16:45:27 -07:00
Woosuk Kwon
195d1ca3e8
[Minor] Enhance error message for TRTLLM decode uniformity check ( #36609 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-03-10 15:38:45 -07:00
Nick Hill
8d983d7cd6
[Model Runner V2] Add initial CI tests ( #36041 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-03-10 14:55:21 -07:00
Nick Hill
65b2f405dc
[Core] Simplify core kv-cache blocks initialization logic ( #36521 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-03-10 20:20:02 +00:00
Nick Hill
2a68464c5b
[Test] test_async_scheduling.py improvements ( #36340 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-03-10 11:17:26 -07:00
Zhengxu Chen
bdd8981dab
[compile] Apply stored functorch config while finalizing loaded artifacts. ( #36582 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com >
2026-03-10 09:34:35 -07:00
Woosuk Kwon
f088a831dd
[Model Runner V2] Use unpadded num_tokens for PW CUDA graph attn metadata ( #36626 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-03-10 09:30:56 -07:00
Harry Mellor
f83b933b84
[CI] Bump mypy version to 1.19.1 ( #36104 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-10 09:18:28 -07:00
Pleaplusone
82f3f30e26
[ROCm][Perf] Enable sparse_mla's cudagraph on ROCm platform ( #35719 )
...
Signed-off-by: ganyi <ygan@amd.com >
2026-03-10 09:14:35 -07:00
Matthew Bonanni
9095cbbfb6
[Bugfix][Sparse MLA] report indexer CG support properly ( #36519 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-03-10 09:14:31 -07:00
Hashem Hashemi
721ae79f50
Improvements to wvSplitKrc skinny GEMM solution ( #34304 )
...
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com >
2026-03-10 09:14:27 -07:00
AllenDou
aefc59f088
FunASR model bugfix ( #36633 )
...
Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com >
Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com >
2026-03-10 08:14:21 -07:00
Harry Mellor
d88f28da05
Fix hf_override_fn when it modifies model_type ( #35200 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-10 15:03:18 +00:00
Srinivasoo7
106ff69c4e
feat(kv-offload): Strategy A — StoreReusedOffloadingManager gates CPU stores on reuse frequency ( #35342 )
...
Signed-off-by: srinivas_oo7 <Sriusa4414@gmail.com >
Signed-off-by: Sriusa4414@gmail.com
Signed-off-by: Srinivasoo7 <158864704+Srinivasoo7@users.noreply.github.com >
Co-authored-by: srinivas_oo7 <sklinkedin0120@gmail.com >
Co-authored-by: Srinivasoo7 <158864704+Srinivasoo7@users.noreply.github.com >
Co-authored-by: Or Ozeri <oro@il.ibm.com >
2026-03-10 14:43:40 +00:00
Jiangyun Zhu
ca5fb4bbd8
[Bugfix] Avoid merging empty-only partitions into splitting-op subgraphs ( #36595 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2026-03-10 07:39:01 -07:00
Alvin Tang
cf88b23749
fix: check HTTP status in batch read_file to prevent silent failures ( #36397 )
...
Signed-off-by: gambletan <ethanchang32@gmail.com >
Co-authored-by: gambletan <ethanchang32@gmail.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
2026-03-10 07:22:40 -07:00
wang.yuqi
a3189a08b0
[Model] Consolidate score logic by introduce score_type ( #36479 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-03-10 13:32:25 +00:00
SoluMilken
409c4e632d
[Misc] fix typo: homogenous-> homogeneous (2 lines change) ( #36508 )
...
Signed-off-by: SoluMilken <ypiheyn.imm02g@g2.nctu.edu.tw >
2026-03-10 06:25:37 -07:00
Raushan Turganbay
8850738b70
[Bugfix] Fix processor signature ( #36630 )
...
Signed-off-by: raushan <raushan@huggingface.co >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-10 06:20:47 -07:00
Mark McLoughlin
234860399b
[Frontend][Core] Revert "Add shutdown timeout" ( #34730 and #36270 ) ( #36628 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2026-03-10 06:20:41 -07:00
Harry Mellor
c88510083b
Fix Qwen2.5-VL test for Transformers v5 ( #36532 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-10 12:05:34 +00:00
Vadim Gimpelson
4ff8c3c8f9
[BUGFIX][Mamba][Qwen3.5] Zero freed SSM cache blocks on GPU ( #35219 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2026-03-10 03:32:20 -07:00
Chang Su
507ddbe992
feat(grpc): extract gRPC servicer into smg-grpc-servicer package, add --grpc flag to vllm serve ( #36169 )
...
Signed-off-by: Chang Su <chang.s.su@oracle.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2026-03-10 03:29:59 -07:00
Nick Hill
ddbb0d230a
[Model Runner V2] Fix mm input embeddings lookup ( #36588 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-03-10 00:24:58 -07:00
Nick Hill
9efc3bdcd6
[Model Runner V2] Fix _compute_slot_mappings_kernel for chunked prefill ( #36580 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-03-10 00:23:42 -07:00
amirkl94
156e33553c
Fix: Re-Enable EP for trtllm MoE FP8 backend ( #36494 )
...
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com >
2026-03-09 23:11:27 -07:00
hallerite
d0cd736caa
[Bugfix] Fix RuntimeError: Already borrowed that degrades VLM serving throughput under concurrent load. ( #36557 )
...
Signed-off-by: hallerite <hallerite@users.noreply.github.com >
Signed-off-by: hallerite <git@hallerite.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-03-09 22:30:51 -07:00
Harry Mellor
195c997203
Fix LFM2 MoE test for Transformers v5 ( #36534 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-09 22:29:17 -07:00
Zhuohan Li
04b67d8f62
Remove unused disable_fallback field ( #36546 )
2026-03-09 20:56:54 -07:00
Wentao Ye
7279374f91
[Perf] Compute maxsim in worker side, reducing redundant copies, 2.7% E2E throughput improvement ( #36159 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-03-09 20:55:58 -07:00
Woosuk Kwon
006aea17d7
[BugFix] Remove incorrect assert in split_decodes_and_prefills ( #36553 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-03-09 20:02:02 -07:00
Hojin Yang
0836be3b03
[Model] Add HyperCLOVAX-SEED-Think-32B vision-language model support ( #31471 )
...
Signed-off-by: effortprogrammer <yhjhoward7@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-03-10 10:59:19 +08:00
Ajay Anubolu
4e95ec111c
[Bugfix] Fix Qwen3-Next in_proj_ba weight sharding with TP > 1 ( #36242 )
...
Signed-off-by: AjAnubolu <anuboluajay@gmail.com >
2026-03-09 19:16:26 -07:00
Andreas Karatzas
179547d62c
[ROCm][CI] Fix ROCm GPT-OSS Eval test group ( #36179 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-09 17:55:20 -07:00
youkaichao
f85b4eda3a
[bugfix] fix nvlink for nixl/ucx ( #36475 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2026-03-10 07:49:47 +08:00
Woosuk Kwon
2a194ddd72
[Model Runner V2] Add model_state inputs to CUDA graph capture ( #36544 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-03-09 15:14:51 -07:00
Shaun Kotek
203a7f27da
add nemotron v3 reasoning parser ( #36393 )
...
Signed-off-by: Shaun Kotek - Nvidia <skotek@nvidia.com >
Co-authored-by: root <root@gpu-259.slurm-workers-slurm.slurm.svc.cluster.local >
2026-03-09 15:11:41 -07:00
Lucas Wilkinson
483463f735
[MRV2] Extensible CG dispatch rework ( #35959 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2026-03-09 13:58:45 -07:00
Matthew Bonanni
4e571ce643
[MTP][Misc] Clean up dead code ( #36507 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-03-09 14:43:06 -04:00
Micah Williamson
4ff9b045fe
[ROCm][CI] Prep Tests For Change To ROCM_ATTN As New Default Backend On ROCm ( #36025 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2026-03-09 13:27:55 -05:00
Lucas Kabela
3fd03f1ec2
[BE] Rename should_torch_compile_mm_vit to should_torch_compile_mm_encoder ( #36281 )
...
Signed-off-by: Lucas Kabela <lucaskabela@meta.com >
2026-03-09 18:22:05 +00:00
Woosuk Kwon
10a5f4d53d
[Model Runner V2] Use NamedTuple for execute_model_state ( #35930 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-03-09 11:17:34 -07:00
Simon Mo
fe0c085c28
[Docs] Remove the reo beacon ( #36528 )
...
Co-authored-by: Cursor Agent <cursoragent@cursor.com >
2026-03-09 11:16:50 -07:00
Taneem Ibrahim
8d6b3d5dda
[Misc] Refactored 5 duplicate helper functions that were copied-pasted across multiple parsers ( #36436 )
...
Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com >
2026-03-09 14:14:11 -04:00
Copilot
4b87ffbefb
[torch.compile] Rename compile_ranges_split_points to compile_ranges_endpoints ( #36027 )
...
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com >
Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2026-03-09 18:04:40 +00:00
Shaun Kotek
fa028207aa
Fix/resupport nongated fused moe triton ( #36412 )
...
Signed-off-by: Shaun Kotek - Nvidia <skotek@nvidia.com >
Signed-off-by: Natan Bagrov <nbagrov@nvidia.com >
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com >
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Signed-off-by: liweiguang <codingpunk@gmail.com >
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Signed-off-by: wang.yuqi <noooop@126.com >
Signed-off-by: Alex Brooks <albrooks@redhat.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: cong-or <conchubhar.gannon@gmail.com >
Signed-off-by: Tushar Shetty <tushar.shetty@abbyy.com >
Signed-off-by: Tushar Shetty <54362365+tusharshetty61@users.noreply.github.com >
Signed-off-by: jiang1.li <jiang1.li@intel.com >
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com >
Signed-off-by: Xin Yang <xyangx@amazon.com >
Signed-off-by: Kevin H. Luu <khluu000@gmail.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: nvnbagrov <nbagrov@nvidia.com >
Co-authored-by: Sage <80211083+sagearc@users.noreply.github.com >
Co-authored-by: danisereb <daserebrenik@nvidia.com >
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com >
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com >
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Weiguang Li <codingpunk@gmail.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
Co-authored-by: Li, Jiang <jiang1.li@intel.com >
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io >
Co-authored-by: Alex Brooks <albrooks@redhat.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
Co-authored-by: cong-or <conchubhar.gannon@gmail.com >
Co-authored-by: Tushar Shetty <54362365+tusharshetty61@users.noreply.github.com >
Co-authored-by: liuzhenwei <zhenwei.liu@intel.com >
Co-authored-by: Xin Yang <105740670+xyang16@users.noreply.github.com >
Co-authored-by: Kevin H. Luu <khluu000@gmail.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-03-09 11:01:18 -07:00
Russell Bryant
d460a18fc6
[Docs] Expand --allowed-media-domains security guidance with threat details ( #36506 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2026-03-09 17:43:42 +00:00
Woosuk Kwon
6e956d9eca
[Model Runner V2] Add dummy profile_cudagraph_memory API ( #36520 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-03-09 10:20:13 -07:00
Andreas Karatzas
1e0f917b34
[ROCm][CI] Fix logprob divergence for TitanML/tiny-mixtral under AITER rms_norm ( #36101 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-09 12:07:44 -05:00
Andreas Karatzas
c174d54f86
[ROCm][CI] Fix ROCm attention backend validation for head sizes, block sizes, and compute capability checks ( #36292 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-09 12:02:41 -05:00
SoluMilken
55d27cca55
[Misc] fix typo: dependant -> dependent (2 lines change) ( #36511 )
...
Signed-off-by: SoluMilken <ypiheyn.imm02g@g2.nctu.edu.tw >
2026-03-09 10:00:12 -07:00
Roberto L. Castro
580864d81e
[Attention][Perf][Kernel] Replace torch.cat with vectorized CUDA kernel MLA query concat - DeepSeek-V3.2 ( #34917 )
...
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com >
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com >
2026-03-09 09:50:36 -07:00
Roberto L. Castro
2b28b9b269
[Attention][Perf] Optimize cp_gather_and_upconvert_fp8_kv_cache - DeepSeek-v3.2 ( #35290 )
...
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com >
Co-authored-by: Claude <noreply@anthropic.com >
2026-03-09 09:46:57 -07:00
Taoyu Zhu
70485a11bd
[ROCM] Optimize the fused_topk_bias to use aiter instead of fallback torch ops. ( #36253 )
...
Signed-off-by: zhutaoyu <zhutaoyu97@gmail.com >
2026-03-09 11:30:35 -05:00
Harry Mellor
74a9f54cdb
[CI] Fix edge case that could lead to broken docs builds on main ( #36515 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-09 09:06:19 -07:00
Matthew Bonanni
00c4cb5606
[Bugfix] Clear stale CG keys after memory profiling ( #36416 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-03-09 11:56:00 -04:00
Wentao Ye
941e52c298
[Refactor] Simplify chat_completion_full_generator for tool parsers ( #35634 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-03-09 23:33:46 +08:00
Wentao Ye
be292b7c14
[Bug] Fix pooling model benchmark script ( #36300 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-03-09 11:17:45 -04:00
Matthew Bonanni
77a73458e3
Reapply [Attention] Refactor check_and_update_config ( #35122 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-03-09 07:17:14 -07:00
Tianyu Guo
5578f2a4d3
Support online use_audio_in_video ( #36319 )
...
Signed-off-by: Tianyu Guo <guoty9@mail2.sysu.edu.cn >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-03-09 07:16:44 -07:00
Cyrus Leung
3ec2115015
[Frontend] Move warmup into Renderer ( #36482 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-03-09 06:03:21 -07:00
Isotr0py
b0906d8b02
[MM Encoder] Default to use TORCH_SDPA backend for ViT on Volta/Turing GPU ( #36472 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-03-09 03:43:44 -07:00
Kevin H. Luu
aaf5fa9abf
[ci] Bound openai dependency to 2.24.0 ( #36471 )
...
Signed-off-by: Kevin H. Luu <khluu000@gmail.com >
2026-03-09 03:43:26 -07:00
Cyrus Leung
f96c3ab08c
[Deprecation][1/2] Remove items deprecated in v0.18 ( #36470 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-03-09 03:43:23 -07:00
Xin Yang
dc6b578466
[Kernel] Add fused_sigmoid_gating_delta_rule_update kernel for Qwen3 Next ( #35777 )
...
Signed-off-by: Xin Yang <xyangx@amazon.com >
2026-03-08 23:41:01 -07:00
liuzhenwei
1bc9c77f6d
[XPU] Add test script of PD disaggregation ( #36434 )
...
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com >
2026-03-09 05:50:27 +00:00
Alex Brooks
65a4da1504
[Frontend] Add Support for MM Encoder/Decoder Beam Search (Online Transcriptions) ( #36160 )
...
Signed-off-by: Alex Brooks <albrooks@redhat.com >
2026-03-09 05:46:23 +00:00
Li, Jiang
217f27598d
[Bugfix] Avoid to replace non-tensor members in cpu model runner ( #36430 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2026-03-09 13:06:28 +08:00
wang.yuqi
fff3711a24
[Frontend][2/n] Improve pooling entrypoints | embed. ( #36110 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Signed-off-by: wang.yuqi <noooop@126.com >
2026-03-09 11:42:19 +08:00
Tushar Shetty
c4d859c274
[Bugfix] Skip out-of-stage layers in get_layers_from_vllm_config for pipeline parallel ( #36243 )
...
Signed-off-by: Tushar Shetty <tushar.shetty@abbyy.com >
Signed-off-by: Tushar Shetty <54362365+tusharshetty61@users.noreply.github.com >
2026-03-08 20:40:16 -07:00
cong-or
747431044d
feat(attention): extract KV-cache update from FlexAttention backend ( #36263 )
...
Signed-off-by: cong-or <conchubhar.gannon@gmail.com >
2026-03-08 20:40:12 -07:00
Cyrus Leung
d62856b928
[Misc] Move processors to transformers_utils ( #35953 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-03-09 11:31:39 +08:00
Alex Brooks
bd2659a566
Increase Flexibility for OOV Multimodal Token Handling ( #34858 )
...
Signed-off-by: Alex Brooks <albrooks@redhat.com >
2026-03-08 20:30:49 -07:00
Shaun Kotek
90512b2e8b
fix: Use iterator as not to store all the file loads in memory at once ( #36149 )
...
Signed-off-by: Shaun Kotek - Nvidia <skotek@nvidia.com >
2026-03-08 20:25:21 -07:00
wang.yuqi
dcf8862fd4
[Examples][1/n] Resettle basic examples. ( #35579 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Signed-off-by: wang.yuqi <noooop@126.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-08 20:22:53 -07:00
Weiguang Li
43aa389231
[Bugfix] Fix CPU OMP autobind assertion to use local_world_size ( #35815 )
...
Signed-off-by: liweiguang <codingpunk@gmail.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
Co-authored-by: Li, Jiang <jiang1.li@intel.com >
2026-03-08 20:07:29 -07:00
Wentao Ye
384425f84e
[Dependency] Remove default ray dependency ( #36170 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-08 20:06:22 -07:00
Harry Mellor
a0f44bb616
Allow markdownlint to run locally ( #36398 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-08 20:05:24 -07:00
Kunshang Ji
fde4771bbd
[XPU][Doc] update xpu document about triton dependency/conflict issue. ( #36301 )
...
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com >
2026-03-09 02:09:22 +00:00
Jiangyun Zhu
e5ff140216
[cudagraph] fix cudagraph warning in deepseekv32 ( #28044 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2026-03-08 20:27:41 -04:00
danisereb
0a6a3a1290
Add support for ModelOpt MXFP8 MoE models ( #35986 )
...
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com >
2026-03-08 13:00:05 -07:00
Sage
4497431df6
[Frontend] Add GPU-less render serving path (vllm launch render) ( #36166 )
2026-03-08 16:35:09 +01:00
nvnbagrov
b7332b058c
[Model] Nano Nemotron VL - fast media preprocessing ( #35657 )
...
Signed-off-by: Natan Bagrov <nbagrov@nvidia.com >
2026-03-08 03:04:05 -07:00
Andreas Karatzas
40077ea3de
[CI] fix flaky empty responses and add diagnostic assertions in vision chat tests ( #36341 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-08 14:42:24 +08:00
Samuel Shen
5d6aae4577
[LMCache MP Patch]: Race Condition + Duplicated Block Ids ( #35831 )
2026-03-07 13:52:48 -08:00
Roy Huang
63298ee173
[Bugfix][LMCache][KVConnector] fix potential memory leak in LMCache multiprocess mode ( #35931 )
2026-03-07 13:52:35 -08:00
Richard Zou
2dde535df1
[compile] Split compile/warmup monitoring ( #36098 )
2026-03-07 13:52:11 -08:00
Wei Zhao
379689d533
[Perf] Support FP8 KV cache for Flashinfer MLA Sparse ( #35891 )
2026-03-07 13:51:54 -08:00
PatchyTIS
a6be75dbd2
[Core] NGram GPU Implementation compatible with Async Scheduler ( #29184 )
2026-03-07 13:51:37 -08:00
Micah Williamson
ee54f9cdb9
[ROCm][CI] Accept Different But Valid Output for test_olmoe_tp ( #35224 )
2026-03-07 13:50:52 -08:00
Micah Williamson
fc4657756f
[ROCm][CI] Enable AITER for failing test_gpt_oss test case on MI355 ( #36174 )
2026-03-07 13:50:17 -08:00
qli88
eebd14651f
[CI] Enable Crosslayer KV layout tests for ROCm platforms ( #35416 )
2026-03-07 13:49:56 -08:00
Matthew Bonanni
ebb9cc5f2b
[UX][Startup] Account for CUDA graphs during memory profiling ( #30515 )
2026-03-07 13:49:23 -08:00
rahul-sarvam
85f50eb41f
Adding support to Sarvam's MoE models ( #33942 )
...
Signed-off-by: rahul-sarvam <140298821+rahul-sarvam@users.noreply.github.com >
2026-03-08 01:16:24 +08:00
Taneem Ibrahim
5261223c2d
[Misc] Remove duplicate parser registration ( #36303 )
...
Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com >
2026-03-07 09:37:01 -05:00
lif
00b814ba5a
[V0 Deprecation] Remove unused swap_space parameter ( #36216 )
...
Signed-off-by: majiayu000 <1835304752@qq.com >
Co-authored-by: mcelrath
2026-03-07 22:09:55 +08:00
vllmellm
ee8a29511f
[Bugfix] Fix compressed-tensors quantization failure for DeepSeek-R1 on MI300x ( #36247 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2026-03-07 09:26:59 +00:00
milesial
755356b3d1
feat: expose media_io_kwargs at runtime ( #34778 )
...
Signed-off-by: Alexandre Milesi <milesial@users.noreply.github.com >
2026-03-07 04:27:04 +00:00
Andreas Karatzas
58928475e4
[ROCm][CI] Making entrypoints more deterministic on ROCm ( #36293 )
2026-03-06 19:04:40 -08:00
Mengtao (Martin) Yuan
1a9718085c
Fix CUDA graph decode capture crash in AITER FlashAttention ( #36042 )
...
Signed-off-by: Martin Yuan <myuan@meta.com >
Co-authored-by: Martin Yuan <myuan@meta.com >
2026-03-06 18:12:07 -08:00
Kunshang Ji
7eb524e64c
refine vllm bench throughput --backend hf ( #35971 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2026-03-07 02:10:33 +00:00
Nick Hill
c7f32e08c2
[BugFix] Avoid ignored trust_remote_code warnings ( #36290 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-03-07 01:24:18 +00:00
Nick Hill
b354686524
[Model Runner V2] Fix warmup for pipeline parallel ( #36280 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-03-06 16:58:51 -08:00
Nick Hill
6a18d8789b
[Core] Fix benign error log during normal shutdown ( #36270 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
Co-authored-by: Mark McLoughlin <markmc@redhat.com >
2026-03-07 00:39:21 +00:00
Itay Alroy
24a03915f5
mla: don't update kv cache on dummy forwards ( #36282 )
...
Signed-off-by: Itay Alroy <ialroy@nvidia.com >
2026-03-07 00:36:00 +00:00
Andreas Karatzas
b5e34e1fca
[ROCm][CI] Fixing yaml file for external amd-ci signal ( #36284 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-06 18:30:39 -06:00
Copilot
ce8546a12b
[docs][torch.compile] Add fusions.md — kernel/operator fusion reference page ( #35538 )
...
Signed-off-by: ProExpertProg <luka.govedic@gmail.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com >
Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com >
Co-authored-by: ProExpertProg <luka.govedic@gmail.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2026-03-06 23:55:06 +00:00
Chuan (Richard) Li
c188749bcd
[ROCm] Support MLA with nhead<16 and FP8 KV cache for TP=8 (Kimi K2.5/Linear) ( #35850 )
...
Signed-off-by: Li <chuali@amd.com >
2026-03-06 20:24:03 +00:00
Alexei-V-Ivanov-AMD
225d1090a0
Enabling some B200-specific tests on MI355 ( #35253 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
Signed-off-by: Alexei-V-Ivanov-AMD <156011006+Alexei-V-Ivanov-AMD@users.noreply.github.com >
2026-03-06 19:27:20 +00:00
eellison
f3c6c9c9d7
[CustomOp] CustomOp FusedRMSNormGated ( #35877 )
...
Signed-off-by: Elias Ellison <elias.ellison@gmail.com >
Signed-off-by: eellison <elias.ellison@gmail.com >
2026-03-06 10:53:37 -08:00
Nick Hill
26bd43b52d
Revert "[BugFix] Fix engine hanging after KV cache initialization fai… ( #36262 )
2026-03-06 08:28:09 -08:00
Travis Johnson
6b625a8807
[Bugfix] Quickfix followups to busy loop removal in #28053 ( #36068 )
...
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com >
Signed-off-by: Nick Hill <nickhill123@gmail.com >
Co-authored-by: Nick Hill <nickhill123@gmail.com >
2026-03-06 08:13:05 -08:00
Richard Zou
54756b6109
[compile] Stop unconditionally patching constrain_to_fx_strides ( #36152 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2026-03-06 10:17:27 -05:00
Raphaël Rialland
39f9ea0da4
[Bugfix] Fix cudagraph_mode:FULL dispatch (This does not impact FULL_AND_PIECEWISE (default)) ( #36165 )
2026-03-06 09:15:31 -05:00
Isotr0py
e4ae148a78
[Refactor] Modular video loader backend refactoring ( #35202 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-03-06 06:06:59 -08:00
Isotr0py
1d0c0d209c
[Misc] Lazy import registered processors ( #36024 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Roger Wang <hey@rogerw.io >
2026-03-06 06:06:45 -08:00
Chenguang Zheng
fcb73f306c
[bugfix] add api process rank in default multimodal request ( #36150 )
...
Signed-off-by: fake0fan <645327136@qq.com >
Signed-off-by: Chenguang ZHENG <645327136@qq.com >
2026-03-06 12:00:09 +00:00
Harry Mellor
e2090bf3af
[CI] Fix startup error test ( #36230 )
...
A change in engine startup error messages in #35478 caused this test failure.
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-06 11:50:28 +00:00
Andreas Karatzas
2a00d3241f
[CI][MM] Gate vision encoder attention mask to MiniCPM only, fixing Aria regression ( #36206 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-06 01:17:08 -08:00
Alex Brooks
10f4db4dbe
[Frontend] Add Support for MM Encoder/Decoder Beam Search (Offline) ( #36153 )
...
Signed-off-by: Alex Brooks <albrooks@redhat.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-06 01:16:56 -08:00
Nicolò Lucchesi
5b3ba94ab4
[Core][KVConnector] Support HMA+NixlConnector ( #35758 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-03-06 08:51:21 +01:00
zhanqiuhu
90f3c01fa4
[Spec Decode][KV Connector] Fix KV transfer in PD + speculative decoding ( #35158 )
...
Signed-off-by: Claude <noreply@anthropic.com >
Signed-off-by: Zhanqiu Hu <zh338@cornell.edu >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2026-03-06 08:50:44 +01:00
Andreas Karatzas
807d680337
[ROCm][CI] Fix tool use test stability - disable skinny GEMM, prefix caching, eliminate batch variance ( #35553 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-06 15:15:12 +08:00
Tyler Michael Smith
5afb387bd4
Change "following fields were present in the request but ignored" log from warn to debug ( #36173 )
...
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
2026-03-05 22:15:46 -08:00
Walter Beller-Morales
43e77e59ab
[BugFix] avoid infinite loop with VLLM_PORT and get_open_ports_list ( #36191 )
...
Signed-off-by: walterbm <walter.beller.morales@gmail.com >
2026-03-05 22:15:29 -08:00
Russell Bryant
00bd08edee
[Security] Respect user trust_remote_code setting in NemotronVL and KimiK25 ( #36192 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2026-03-05 22:15:19 -08:00
Ajay Anubolu
43f10573c9
[Bugfix] Fix misleading context length error messages ( #36197 )
...
Signed-off-by: AjAnubolu <anuboluajay@gmail.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
2026-03-05 22:15:12 -08:00
Yongye Zhu
86e1060b17
[Bugfix] Fix inner_dp_world initialization order for multi-node TP ( #35892 )
...
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com >
Signed-off-by: Nick Hill <nickhill123@gmail.com >
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com >
Co-authored-by: Nick Hill <nickhill123@gmail.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2026-03-05 22:04:44 -08:00
Mark McLoughlin
27066d1b2b
[Frontend][Core] Add shutdown timeout - allowing in-flight requests to finish ( #34730 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com >
2026-03-05 22:04:31 -08:00
cong-or
57c84ff129
perf: add __slots__ to KVCacheBlock ( #36164 )
...
Signed-off-by: cong-or <conchubhar.gannon@gmail.com >
2026-03-05 22:04:09 -08:00
Xiang Shi
e68de8adc0
docs: fix wrong cc in int8.md ( #36209 )
...
Signed-off-by: Xiang Shi <realkevin@tutanota.com >
2026-03-06 06:01:02 +00:00
Andreas Karatzas
a1ffa56a1e
[CI] Fix bge-m3 similarity reference values after *Defination* typo fix ( #36208 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-06 05:07:29 +00:00
Shiyan Deng
0a208d1f54
[BugFix] Fix engine hanging after KV cache initialization failure ( #35478 )
...
Signed-off-by: Shiyan Deng <dsy842974287@meta.com >
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com >
2026-03-05 20:58:09 -08:00
Shiyan Deng
03a49bb8f0
[Feature] Add --distributed-timeout-seconds CLI option ( #36047 )
...
Signed-off-by: Shiyan Deng <dsy842974287@meta.com >
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com >
2026-03-05 20:57:51 -08:00
Shiyan Deng
8e87cc57f1
[Bug] Fix a corner case in _process_simple_streaming_events ( #34754 )
...
Signed-off-by: Shiyan Deng <dsy842974287@meta.com >
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com >
2026-03-05 20:57:32 -08:00
Cyrus Leung
6dd302653f
[Misc] Rename group_mm_kwargs_by_modality -> group_and_batch_mm_kwargs ( #36158 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-03-06 12:32:48 +08:00
Cyrus Leung
de00ebeac4
[Bugfix] Fix simple Mistral-Small example ( #36156 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-03-05 20:25:11 -08:00
Andreas Karatzas
639680d220
[ROCm][CI] Adding missing dependencies for Multi-modal models tests ( #36177 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-06 12:23:10 +08:00
Rohan Potdar
c5362c739f
Reenable features for ROCm attention backends ( #36185 )
...
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com >
2026-03-05 20:21:06 -08:00
Nikhil Gupta
0a49676fb0
cpu: aarch64: Upgrade OneDNN for aarch64 to add support for int8 matmul ( #36147 )
...
Signed-off-by: Nikhil Gupta <nikhil.gupta2@arm.com >
2026-03-06 03:48:59 +00:00
Jeffrey Wang
c012a8c477
Don't fire ray compatibility webhook when PR or branch is not provided ( #36088 )
...
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com >
2026-03-06 00:42:21 +00:00
Dor Huri
ebed80a7c8
[Performance] Extract KV-cache update from TreeAttention backend ( #35384 )
...
Signed-off-by: dorhuri123 <dor.huri1@live.biu.ac.il >
2026-03-06 00:22:43 +00:00
Nick Hill
a73af584fe
[Model Runner V2] Fix warmup for very small kvcache and/or blocksizes ( #36176 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-03-05 14:48:10 -08:00
Zhengxu Chen
a97954b6a8
[compile] Consistent compiler config for saved/loaded vllm backends. ( #35810 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com >
2026-03-05 15:08:12 -05:00
Yanhong Li
a911f4dd20
[Model] Add support for OLMo Hybrid ( #32550 )
2026-03-05 14:51:06 -05:00
Russell Bryant
5395471d29
[CI] Add explicit permissions to macOS smoke test workflow ( #35775 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2026-03-05 19:08:48 +00:00
Frank Wang
a57c877f18
[BugFix] Fallback from FA4->FA2 for Batch Invariance ( #36059 )
...
Signed-off-by: frankwang28 <frank.wbb@hotmail.com >
2026-03-05 14:05:56 -05:00
Xin Yang
f917020983
[Perf] Optimize FusedMoEModularKernel output tensor using torch.empty ( #35794 )
...
Signed-off-by: Xin Yang <xyangx@amazon.com >
2026-03-05 13:47:53 -05:00
tomeras91
86483ca774
[Bugfix] Disable FlashInfer TRTLLM BF16 path for non-gated MoE ( #36146 )
...
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com >
2026-03-05 09:49:05 -08:00
Netanel Haber
b93a9e6f6d
ParakeetProjection.norm = RMSNorm instead of nn.LayerNorm ( #36133 )
...
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com >
2026-03-05 17:29:30 +00:00
Xinyu Chen
d8839ef7d9
[XPU] Enable ModelRunnerV2 on XPU ( #36078 )
...
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com >
2026-03-05 17:19:18 +00:00
Avery Miao
e998fa76b9
[BUGFIX]Fix Qwen-Omni models audio max_token_per_item estimation error leading to encoder_cache_size is 0 ( #35994 )
...
Signed-off-by: Miao, Avery <avery.miao@intel.com >
2026-03-05 09:16:29 -08:00
Jiayi Yan
6a895197fa
[Bugfix][CI] fix typos ( #34934 )
...
Signed-off-by: 1195343015 <1195343015@qq.com >
Signed-off-by: Jiayi Yan <66017932+1195343015@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-05 17:05:46 +00:00
Sage Moore
8c760b6ab6
[ROCm] Refactor ROCm attention backend selection logic ( #35246 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
2026-03-05 10:51:26 -06:00
AllenDou
3ee68590c7
refactor funasr model. ( #36108 )
...
Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com >
Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-03-05 08:07:37 -08:00
Cyrus Leung
7196348157
[Bugfix] Fix Qwen-VL tokenizer implementation ( #36140 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-03-05 08:07:19 -08:00
Ning Xie
176c799f4c
[openai api] log exception in exception handler (1/N) ( #31164 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2026-03-05 16:00:12 +00:00
Or Ozeri
612e7729c2
[KVConnector] Scheduler: Fix num_computed_tokens after async KV load ( #34616 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com >
2026-03-05 14:25:15 +00:00
Harry Mellor
ecde7af9c4
Fix import that was moved in Transformers 5.2.0 ( #36120 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-05 13:59:44 +00:00
Harry Mellor
8df523351f
[Docs] Only build docs if documentation or ready labels are present ( #36135 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-05 13:58:16 +00:00
Andreas Karatzas
b03ff6a96b
[CI] Stabilize test_no_args_tool_call and add ROCm-specific server args ( #36107 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-05 21:52:49 +08:00
Ajay Anubolu
ed81d5edd1
[Bugfix] Fix RunAI streamer crash with S3-hosted model paths ( #35976 )
...
Signed-off-by: AjAnubolu <anuboluajay@gmail.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
2026-03-05 12:14:20 +00:00
Shiyan Deng
3c23ac840e
[Bugfix] Fix mypy errors in hermes_tool_parser.py ( #36114 )
...
Signed-off-by: Shiyan Deng <dsy842974287@meta.com >
2026-03-05 11:37:47 +00:00
cjackal
a708ef5944
[Misc] Fix SyntaxWarning - invalid escape sequence '\e' ( #36020 )
...
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com >
2026-03-05 10:55:31 +00:00
Kunshang Ji
66a2209645
[Hardware] Replace torch.cuda.synchronize() api with torch.accelerator.synchronize ( #36085 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2026-03-05 10:36:39 +00:00
Doug Smith
0bfa229bf1
[Release] Include source distribution (sdist) in PyPI uploads ( #35136 )
...
Signed-off-by: dougbtv <dosmith@redhat.com >
Co-authored-by: Daniele Trifirò <dtrifiro@redhat.com >
2026-03-05 01:43:50 -08:00
Paco Xu
7493c51c55
[Docs] add Dynamo/aibrix integration and kubeai/aks link ( #32767 )
...
Signed-off-by: Paco Xu <paco.xu@daocloud.io >
2026-03-05 17:39:50 +08:00
Reagan Lee
ac773bbe80
[Docs] Update docs to include mm processor + encoder benchmarks ( #34083 )
...
Signed-off-by: Reagan <reaganjlee@gmail.com >
2026-03-05 01:38:25 -08:00
Christian Munley
48e376a007
qwen3coder tool parser fix anyOf double encoded parameters ( #36032 )
...
Signed-off-by: Christian Munley <cmunley@nvidia.com >
2026-03-05 09:06:57 +00:00
Isotr0py
21eb2c3372
[Chore] Correct MTP models test registry ordering ( #36115 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-03-05 08:55:04 +00:00
Seiji Eicher
e2b31243c0
[Docs] Update CacheConfig block_size docstring to remove inaccurate limit when using CUDA ( #35632 )
...
Signed-off-by: Seiji Eicher <seiji@anyscale.com >
2026-03-05 06:24:08 +00:00
Martin Hickey
c3598d02fa
[Misc] Remove deprecated items that are due for removal ( #36006 )
...
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com >
2026-03-05 06:14:50 +00:00
Benjamin Chislett
57c629e9c1
[Bugfix] Fix block_size for hybrid model MTP ( #36036 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
2026-03-05 06:10:54 +00:00
zihaoanllm
d106bf39f5
[Doc] Add Parallel Draft Models ( #35973 )
...
Signed-off-by: <zihaoan2@amd.com >
Signed-off-by: zihaoanllm <zihaoan2@amd.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-05 05:44:07 +00:00
Yanan Cao
b0651021e5
[Kernel] [Helion] [11/N] Retune configs for silu_mul_fp8 ( #36062 )
2026-03-04 21:25:59 -08:00
Hanjun Cho
f600d5192e
[Bugfix] Fix score layer quantization for sequence classification models - Qwen3 (VL) Reranker ( #35849 )
...
Signed-off-by: Hanjun Cho <gkswns0531@gmail.com >
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-03-04 20:57:20 -08:00
Tianmu Li
8e7820131e
[Perf] Use dummy M for weight prepacking on x86 ( #35890 )
...
Signed-off-by: Li, Tianmu <tianmu.li@intel.com >
2026-03-05 04:56:49 +00:00
Andrii Skliar
0a12cea25f
Order config.py in Lexicographical order ( #35866 )
...
Signed-off-by: Andrii Skliar <askliar@nvidia.com >
Co-authored-by: Andrii Skliar <askliar@nvidia.com >
2026-03-04 20:56:47 -08:00
Zhengxu Chen
dd6dbd93f8
[compile] Fix extra cache save on warm start. ( #35921 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com >
2026-03-05 12:56:30 +08:00
Harry Mellor
26366009c5
[CI] Don't leave docs preview comment on closed PRs ( #36087 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-05 04:51:46 +00:00
Nick Hill
16c472abe7
[Core] Move ray-specific WorkerWrapperBase methods to RayWorkerWrapper ( #35328 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-03-05 12:11:59 +08:00
daje0601
3b23d57c96
[Model] Add LoRA support for Whisper models ( #29856 )
...
Signed-off-by: daje0601 <englishmt4118@gmail.com >
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com >
2026-03-05 10:38:25 +08:00
Wentao Ye
2f4226fe52
[CI] Fix pre-commit mypy issue in main ( #36049 )
2026-03-04 18:13:12 -08:00
nkm-meta
792cbd64ca
Add platform method to enable custom collective ops registration ( #34760 )
...
Signed-off-by: Naina Kuruballi Mahesh <nainakm@meta.com >
2026-03-05 00:50:32 +00:00
Zhengxu Chen
2ed4722e26
[compile] Reduce log spam from compile. ( #36044 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com >
2026-03-05 00:48:36 +00:00
Nick Hill
a3299c3d1d
[Model Runner V2] Misc code simplification ( #35941 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-03-04 15:26:35 -08:00
Andreas Karatzas
6c21a0c2d7
[ROCm][CI] Added MI325 mirrors (stage C) ( #35239 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-04 14:48:46 -08:00
Shanshan Shen
562339abc3
[Misc] Support OOT linear method registering ( #35981 )
...
Signed-off-by: shen-shanshan <467638484@qq.com >
2026-03-04 22:25:56 +00:00
amitz-nv
d7adcadb9b
[Bugfix] Fix passing of activation_type to trtllm fused MoE NVFP4 and FP8 ( #36017 )
...
Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com >
2026-03-04 22:23:51 +00:00
Simon Mo
f678c3f61a
[RL] [Weight Sync] Guard IPC update-info pickle deserialization behind insecure serialization flag ( #35928 )
...
Co-authored-by: Cursor Agent <cursoragent@cursor.com >
2026-03-04 17:05:32 -05:00
Thomas Parnell
be0a3f7570
[Bugfix] Fix race in non-blocking num_accepted_tokens GPU->CPU copy ( #36013 )
...
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
2026-03-04 13:52:44 -08:00
Harry Mellor
17dc9c7fc9
[CI] Bump mypy version ( #34950 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-04 20:55:11 +00:00
fenypatel99
7eca859110
Add PyTorch profiler schedule support with warmup/active iterations ( #35240 )
2026-03-04 12:53:38 -08:00
Russell Bryant
636ee223ac
[Docs] Document security risks of GPT-OSS Python tool ( #35139 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2026-03-04 20:27:31 +00:00
Robert Shaw
b7d59ffce2
[UX] Remove NoOpOffloader log ( #35678 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2026-03-04 12:13:40 -08:00
Richard Zou
5569f5218d
[torch.compile] Stop lazily compiling ( #35472 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2026-03-04 12:13:17 -08:00
Davina Zaman
138d891d7f
[Docs] Clarify structured outputs configuration for Qwen3 reasoning mode ( #32441 )
...
Signed-off-by: Davina Zaman <davzaman@users.noreply.github.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-04 11:44:39 -08:00
Stefano Castagnetta
d7166e74c1
[CI] Add Blackwell AsyncTP correctness test ( #35871 )
...
Signed-off-by: Stefano Castagnetta <scastagnetta@nvidia.com >
2026-03-04 19:41:21 +00:00
Nick Hill
417fd28fb1
[Model Runner V2] Fix pooling ( #36019 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-03-04 10:53:17 -08:00
tomeras91
7faba503c4
[Kernel][Mamba] Optimize Mamba2 SSD prefill Triton kernels ( #35397 )
...
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com >
2026-03-04 19:47:17 +01:00
Hyunkyun Moon
bc6be89d16
[Frontend] Add vllm launch command for GPU-less preprocessing serving ( #34551 )
...
Signed-off-by: HyunKyun Moon <mhg5303@gmail.com >
2026-03-04 18:41:52 +00:00
Maxime Grenu
32224f568a
docs: update CPU Docker images to reference Docker Hub instead of AWS ECR ( #34882 )
...
Signed-off-by: Maxime Grenu <69890511+cluster2600@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-04 10:31:35 -08:00
Abhishek Mathukiya
f3dc292e9f
docs: add version requirement note for --profiler-config flag ( #32454 )
...
Signed-off-by: abhishkh <mathukiya.a@northeastern.edu >
2026-03-04 18:13:54 +00:00
Chen
138c5fa186
[Docs] Add RunPod GPU deployment guide for vLLM ( #34531 )
...
Signed-off-by: lisperz <zhuchen200245@163.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-04 10:11:34 -08:00
Russell Bryant
2f2c1d73a7
[Docs] Upgrade dynamic LoRA warning to admonition block ( #35218 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2026-03-04 10:01:42 -08:00
Bhuminjay Soni
fb3e78ab09
[Feature][CI]: compare func & no_func outputs in test_functionalization.py ( #35481 )
...
Signed-off-by: Bhuminjay <bhuminjaysoni@gmail.com >
Signed-off-by: Bhuminjay Soni <Soni5Happy@gmail.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2026-03-04 18:01:16 +00:00
Michael Yao
fd3bfe74c9
[Docs] Update design/multiprocessing.md ( #30677 )
...
Signed-off-by: windsonsea <haifeng.yao@daocloud.io >
2026-03-04 17:58:59 +00:00
tc-mb
bfdb512f11
fix minicpmo4.5: fix attn_mask in vit attn && fix resampler pos_emb i… ( #34127 )
...
Signed-off-by: tc-mb <caitianchi@modelbest.cn >
Co-authored-by: hezhihui <hezhihui@modelbest.cn >
2026-03-04 17:46:17 +00:00
Sage
d25c1ec3c9
docs(cpu): Clarify pre-built wheels requirement for CPU Python-only build ( #35090 )
...
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com >
2026-03-04 17:45:35 +00:00
Xing Liu
7cc6058ac6
[Doc] Add MTP docs and update speculative decoding guidance ( #35197 )
...
Signed-off-by: liuxing <945764858@qq.com >
2026-03-04 17:23:34 +00:00
Manrique Vargas
28028dff2f
fix(docs): use static rdzv backend in multi-node troubleshooting script ( #34784 )
...
Signed-off-by: machov <mv1742@nyu.edu >
2026-03-04 17:15:35 +00:00
Dr Alex Mitre
3417ba5648
docs: add README for logits_processor examples ( #35933 )
2026-03-04 17:09:19 +00:00
Yan Ma
58cfe0dc44
Fix phi4-mm and remove cuda binding ( #35964 )
...
Signed-off-by: Yan Ma <yan.ma@intel.com >
2026-03-05 01:08:05 +08:00
simone-dotolo
e86221deb6
[Doc] Fix GPU Worker count in Process Count Summary ( #36000 )
...
Signed-off-by: simone-dotolo <simonedotolo@libero.it >
Signed-off-by: simone-dotolo <84937474+simone-dotolo@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-04 17:03:14 +00:00
Netanel Haber
289fc48ab7
Use MMEncoderAttention (=use FlashAttention) instead of torch.sdpa in radio.py ( #35653 )
2026-03-04 08:43:13 -08:00
Christian Pinto
2f2212e6cc
Split generic IO Processor plugins tests from Terratorch specific ones ( #35756 )
...
Signed-off-by: Christian Pinto <christian.pinto@ibm.com >
2026-03-05 00:01:03 +08:00
Nicolò Lucchesi
18e01a0a10
[Misc] Add --attention-backend auto option ( #35738 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-03-04 15:12:27 +00:00
sungsoo ha
6cb901093f
[Core] Add All-to-All communication backend for DCP ( #34883 )
...
Signed-off-by: Sungsoo Ha <sungsooh@nvidia.com >
Signed-off-by: sungsoo ha <hasungsoo@gmail.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-04 10:01:57 -05:00
Cyrus Leung
ead7bde1ab
[Bugfix] Make kaldi_native_fbank optional ( #35996 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-03-04 06:47:32 -08:00
Qi Wang
6aa6ad8992
[BugFix] Fix implicit and incorrect assumption on ECConnector is_producer ( #34783 )
...
Signed-off-by: Qi Wang <qiwa@nvidia.com >
2026-03-04 15:01:30 +01:00
Raghavan
c8c3935b70
[Bugfix][Model] Fix FP8 k_scale/v_scale not loaded for Qwen3-MoE ( #35656 )
...
Signed-off-by: raghavan <oneraghavan@gmail.com >
2026-03-04 13:15:38 +00:00
Ronen Schaffer
bb6888b8b1
[Bugfix][CPUOffloadingManager] Prevent eviction of already-stored blocks in LRU/ARC prepare_store() ( #35846 )
...
Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com >
2026-03-04 14:25:33 +02:00
Taneem Ibrahim
1aaec59d79
[MISC] fixed tool_parser mypy errors ( #35640 )
...
Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-04 12:23:12 +00:00
pougetat
1659b2e058
[Feature] Add basic metrics for /realtime endpoint ( #35500 )
...
Signed-off-by: Thomas Pouget-Abadie <thomaspou@microsoft.com >
Signed-off-by: pougetat <thomas.pougetabadie@gmail.com >
Co-authored-by: Thomas Pouget-Abadie <thomaspou@microsoft.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-04 19:56:32 +08:00
haosdent
d6e04f4c43
[Bugfix] Cap FULL decode cudagraph sizes for Mamba/hybrid models ( #34094 ) ( #34571 )
...
Signed-off-by: haosdent <haosdent@gmail.com >
Co-authored-by: zjy0516 <riverclouds.zhu@qq.com >
2026-03-04 11:56:22 +01:00
Kunshang Ji
a8f66cbde8
[XPU] bump vllm-xpu-kernels to v0.1.3 ( #35984 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2026-03-04 18:23:31 +08:00
Kunshang Ji
16d2ad1d38
[Hardware] Replace torch.cuda.empty_cache with torch.accelerator.empty_cache ( #30681 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-04 09:49:47 +00:00
Chuan (Richard) Li
5dc3538736
[ROCm][Bugfix] Fall back from CK MXFP4 MoE when GEMM dimensions are unsupported ( #35893 )
...
Signed-off-by: Li <chuali@amd.com >
2026-03-04 08:30:54 +00:00
Nathan Price
36bf213181
[Bugfix] Add missing dynamic_arg_dims for Qwen3-ASR torch.compile ( #35869 )
...
Signed-off-by: Nathan Price <nathan@abridge.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-03-04 08:29:01 +00:00
Joe Runde
6f0dd93801
[Core] Remove busy loop from idle buffer readers ( #28053 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com >
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com >
Signed-off-by: Nick Hill <nickhill123@gmail.com >
Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com >
Co-authored-by: Nick Hill <nickhill123@gmail.com >
2026-03-04 07:44:20 +00:00
Andrii Skliar
5d199ac8f2
Support Audio Extraction from MP4 Video for Nemotron Nano VL ( #35539 )
...
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com >
Signed-off-by: Andrii Skliar <askliar@nvidia.com >
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
Signed-off-by: Andrii <askliar@nvidia.com >
Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com >
Co-authored-by: Andrii Skliar <askliar@oci-nrt-cs-001-vscode-01.cm.cluster >
Co-authored-by: Andrii <askliar@nvidia.com >
Co-authored-by: root <root@pool0-03748.cm.cluster >
Co-authored-by: Roger Wang <hey@rogerw.io >
Co-authored-by: root <root@pool0-02416.cm.cluster >
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com >
Co-authored-by: root <root@pool0-04880.cm.cluster >
2026-03-03 23:20:33 -08:00
Komal Kumar Teru
9e0f44bec4
[cohere][fix][spec-decode]: fix crash when allowed_token_ids is set without penalties ( #35654 )
...
Signed-off-by: kkt-cohere <komal@cohere.com >
2026-03-03 23:20:15 -08:00
lailoo
097eb544e9
[Bugfix] Improve engine ready timeout error message ( #35616 )
...
Signed-off-by: damaozi <1811866786@qq.com >
2026-03-04 05:54:32 +00:00
ShiJie Zhong
7cdba98edf
[BugFix] Support tool_choice=none in the Anthropic API ( #35835 )
...
Signed-off-by: ZhongsJie <zhongsjie@gmail.com >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
2026-03-04 05:24:46 +00:00
Charlie Fu
3c85cd9d74
[Rocm][CI] Fix ROCm LM Eval Large Models (8 Card) ( #35913 )
...
Signed-off-by: charlifu <charlifu@amd.com >
2026-03-04 04:50:13 +00:00
Andreas Karatzas
edba15045a
[Bugfix] Guard mm_token_type_ids kwarg in get_mrope_input_positions ( #35711 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-04 04:12:51 +00:00
Cyrus Leung
e379396167
[Refactor] Clean up processor kwargs extraction ( #35872 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-03-03 19:53:53 -08:00
Isotr0py
6e9f21e8a2
[Chore] Remove debug code in model implementation ( #35883 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-03-03 19:50:58 -08:00
AllenDou
c1d963403c
[model] support FireRedASR2 ( #35727 )
...
Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-03-03 19:41:30 -08:00
Shanshan Shen
77e6dcbbfa
[PluggableLayer][MM] Add PluggableLayer for RelPosAttention ( #33753 )
...
Signed-off-by: shen-shanshan <467638484@qq.com >
2026-03-03 19:41:27 -08:00
William Zhang
70c73df69e
[Bugfix] Fix EVS implementation for Qwen3 VL ( #33607 )
...
Signed-off-by: 2ez4bz <133824995+2ez4bz@users.noreply.github.com >
2026-03-04 02:18:11 +00:00
xjx
9a9d442464
Enable bnb for multiple indices weight ( #35838 )
...
Signed-off-by: xjx <493337577@qq.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-03-04 01:46:47 +00:00
Andreas Karatzas
f7da9cdffc
[ROCm][CI] Support async weight transfer example with platform-aware determinism ( #35710 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-04 09:44:14 +08:00
Jaewon
f22ff2958c
[Bugfix] Fix coord_socket assertion in DPEngineCoreProc for offline DP mode ( #35916 )
...
Signed-off-by: Jaewon Lee <jaewon@meta.com >
2026-03-04 00:10:11 +00:00
Nick Hill
d15c3b90fc
[Core] Move save_tensorized_model logic to Worker ( #35825 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-03-03 15:31:59 -08:00
zhrrr
97286a20ed
[Model Runner V2] support dp & ep for spec decoding ( #35294 )
...
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai >
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com >
Co-authored-by: Giancarlo Delfin <gdelfin@inferact.ai >
2026-03-03 15:19:45 -08:00
Amr Mahdi
12b38c0f45
[CI/Build] Allow mounting AWS credentials for sccache S3 auth ( #35912 )
...
Signed-off-by: Amr Mahdi <amrmahdi@meta.com >
2026-03-03 14:30:47 -08:00
Woosuk Kwon
467886a0c4
[Model Runner V2] Fix inputs_embeds=None bug for MM models ( #35917 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-03-03 13:47:45 -08:00
bnellnm
a9b8b13e5c
[Bugfix] Fix misnamed parameter in compressed_tensors_moe.py ( #35813 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-03-03 16:29:57 -05:00
Micah Williamson
e7213003cb
[ROCm][CI] Fix TP size issue for test_gpt_oss ( #35887 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2026-03-03 20:57:34 +00:00
Rohan Potdar
3a8eef5869
[ROCm][Bugfix]: Disable AITER Triton ROPE by default ( #35601 )
...
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com >
2026-03-03 13:43:56 -06:00
Robert Shaw
97995f6376
[MoE Refactor] Create MK for TRTLLM Kernels ( #32564 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com >
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com >
2026-03-03 10:39:50 -08:00
Robert Shaw
881a6b011b
[CI] Temporarily Disable Llama4 MoE Refactor Test ( #35870 )
...
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-03-03 10:36:15 -08:00
Matthew Bonanni
8e1fd5baf0
[CI] Bump num_speculative_tokens to 3 in nightly DeepSeek tests ( #35882 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-03-03 09:26:44 -08:00
JasonCohere
ae88468bcc
fix: Ensure invalid audio files return 400 error ( #34715 )
...
Signed-off-by: Jason Ozuzu <jasonozuzu@cohere.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2026-03-03 08:47:39 -08:00
ojhaanshika
e05cb3b93e
TRTLLM gen-full attn Test Coverage ( #34986 )
...
Signed-off-by: Anshika Ojha <anshikao@nvidia.com >
Co-authored-by: Anshika Ojha <anshikao@gb-nvl-059-compute09.nvidia.com >
2026-03-03 11:35:34 -05:00
Lucas Wilkinson
28ef9ba399
[BugFix] Add support for MTP num_speculative_tokens > 1 with sparse MLA ( #34552 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com >
2026-03-03 07:21:57 -08:00
TJian
fb7fdc49c4
[ROCm] [CI] Add new fusion test cases that are relevant to vLLM IR Ops ( #34307 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com >
2026-03-03 06:24:21 -08:00
wang.yuqi
ea463978bb
[Frontend][1/n] Improve pooling entrypoints | classify. ( #35604 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Signed-off-by: wang.yuqi <noooop@126.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2026-03-03 06:05:36 -08:00
Li, Jiang
440f0e7dc6
[Bugfix] Avoid src/dst as None in irecv/isend_tensor_dict ( #35754 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2026-03-03 05:56:08 -08:00
wang.yuqi
fd4a90f337
[CI] Add PPL test for Qwen3.5. ( #35853 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Signed-off-by: wang.yuqi <noooop@126.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-03 13:15:51 +00:00
Thomas Parnell
ad9d09e2b8
[Perf] [Hybrid] Copy num_accepted_tokens in non-blocking way when not using prefix caching ( #35442 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
2026-03-03 04:15:43 -08:00
Szymon Reginis
4beebfd146
[CI/Build][Intel] Add new performance benchmarks for Intel Gaudi 3 ( #31025 )
...
Signed-off-by: Szymon Reginis <sreginis@habana.ai >
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com >
2026-03-03 19:48:24 +08:00
hallerite
b8401cde0e
add regression test ( #35834 )
...
Signed-off-by: hallerite <git@hallerite.com >
2026-03-03 07:32:15 +00:00
TJian
5dfc5abe94
[ROCm] [Release] Change the package from aiter to amd-aiter ( #35198 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2026-03-02 23:13:39 -08:00
lin-shh
8fa68a8ce4
Fix TYPE_CHECKING stub defaults in envs.py to match actual runtime defaults ( #35645 )
2026-03-02 21:59:43 -08:00
lin-shh
35a6f0bfe2
[Misc] Fix typos in comments: explict→explicit, paramaters→parameters ( #35648 )
2026-03-02 21:59:14 -08:00
Taneem Ibrahim
3a6cbf16e2
[MISC] Removed unused function find_all_indices() from tool_parsers/utils.py ( #35683 )
...
Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com >
2026-03-03 13:58:42 +08:00
Lucas Wilkinson
f44d1ddc8c
[BugFix] Fix cmake based incremental install (wrong vllm install dir) ( #35773 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2026-03-02 21:58:16 -08:00
Cyrus Leung
48a54c1e0d
[CI/Build] Trigger processor tests on registry update ( #35824 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-03-03 13:55:57 +08:00
Micah Williamson
8b9e8b7454
[ROCm][CI] Fix Assertion Logic For test_gpt_oss ( #35806 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2026-03-03 05:08:04 +00:00
Wentao Ye
c21d0039ec
[Refactor] Fix maxsim cuda platform and add cli to control it ( #35427 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-03-03 12:48:31 +08:00
Isotr0py
7d8bbe6f42
[CI/Build] Automatically patch video metadata for multimodal processor test ( #35822 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-03-03 04:27:45 +00:00
aykoppol
25e02647c2
[Core] Add optional flags to check for repetitive token patterns in engine output ( #35451 )
...
Signed-off-by: aykoppol <aykoppol+git@gmail.com >
2026-03-03 12:23:25 +08:00
Woosuk Kwon
a0a5178ab4
[Model Runner V2] Use ModelState.prepare_attn() for cuda graph capture [5/N] ( #35774 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-03-02 20:06:27 -08:00
Isotr0py
8ea8ba275e
[V0 deprecation] Remove Swin model ( #35821 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-03-02 20:03:41 -08:00
Woosuk Kwon
4f85bae9d6
[Docs][Model Runner V2] Add Design Docs ( #35819 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-03-02 19:58:14 -08:00
Andy Lo
0a7165fd71
[ModelRunnerV2] Rename sampler functions and variables for clarity ( #35459 )
...
Signed-off-by: Andy Lo <andy@mistral.ai >
2026-03-02 19:48:56 -08:00
Robert Shaw
6521ccf286
[CI] Temporarily Disable Nightly Failures ( #35770 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2026-03-03 01:49:13 +00:00
Martin Vit
8ebd872f50
[Tool Parser] Fix Qwen3Coder streaming parameter loss with speculative decode ( #35615 )
...
Signed-off-by: Martin Vit <martin@voipmonitor.org >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
2026-03-03 09:40:37 +08:00
zhrrr
168ee03e1c
[Model Runner V2][Perf] align dummy_run tokens to uniform decode for dp cudagraph ( #35376 )
...
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com >
2026-03-02 17:10:47 -08:00
liuzhenwei
9dd656f0ea
[XPU][NIXL] Add GPUDirect RDMA support for XPU ( #35270 )
...
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com >
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com >
2026-03-03 08:42:49 +08:00
Jakub Zakrzewski
c8b678e53e
[Model] Add support for nvidia/llama-nemotron-rerank-vl-1b-v2 ( #35735 )
...
Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com >
2026-03-03 08:32:14 +08:00
Andreas Karatzas
18c29c746b
[ROCm][CI] Fix backslash-continuation in pytest marker re-quoting and treat exit code 5 as success ( #35798 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-02 16:07:51 -08:00
Hanjie Qiu
96fc09503a
[All Reduce] Change default backend of Flashinfer All Reduce to trtllm ( #35793 )
...
Signed-off-by: hjjq <hanjieq@nvidia.com >
2026-03-02 18:57:38 -05:00
Roger Wang
1b82b433fc
[Bugfix] Fix MM processor test for Qwen3.5 ( #35797 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2026-03-02 23:05:08 +00:00
Robert Shaw
9319044ee9
[MoE][Perf] Wrap DSV3 QKVAProj GEMM in custom op for torch.compile ( #35751 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2026-03-02 23:03:49 +00:00
Boyuan Feng
c42dc402c1
clean unused cudagraph_batch_sizes ( #35552 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2026-03-02 22:00:16 +00:00
Ye (Charlotte) Qi
fa6a6be519
[Bugfix] Fix missing sequence_lengths in qwen3_omni_moe_thinker ( #35741 )
...
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com >
2026-03-02 21:11:56 +00:00
Aaron Hao
cad21918e3
[BUG] Fix rlhf_async example ( #35788 )
...
Signed-off-by: ahao-anyscale <ahao@anyscale.com >
2026-03-02 20:36:40 +00:00
Jeffrey Wang
53700bf49b
[ci] Add Ray compatibility check informational CI job ( #34672 )
...
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com >
2026-03-02 12:06:16 -08:00
Yashwant Bezawada
a13d8c03c9
[KVConnector] Auto-downgrade to PIECEWISE cudagraph mode for layerwise async ops ( #31057 )
...
Signed-off-by: Yashwant Bezawada <yashwant_b@me.com >
2026-03-02 15:04:47 -05:00
Fynn Schmitt-Ulms
9433acb8df
[Spec Decode] Add hidden states extraction system ( #33736 )
...
Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com >
2026-03-02 14:29:09 -05:00
Richard Zou
d1a6e96d9e
[torch.compile] Improve cold and warm start compile tests ( #35709 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2026-03-02 19:27:06 +00:00
CSWYF3634076
2a9e3347e9
[BugFix][Model]Fix the garbled code in Ernie4.5-VL caused by fast_moe_cold_start ( #35587 )
...
Signed-off-by: wangyafeng <wangyafeng@baidu.com >
2026-03-02 18:56:33 +00:00
Isotr0py
cc0d565f40
[CI/Build] Enable Qwen3.5 tests on CI ( #35763 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-03-02 17:43:53 +00:00
Patryk Wolsza
358e4d5ba7
[CI][HPU] Pin vllm commit compatible with vllm-gaudi - HPU tests ( #35307 )
...
Signed-off-by: PatrykWo <patryk.wolsza@intel.com >
2026-03-02 17:02:26 +00:00
Cyrus Leung
792a74b973
[Doc] Improve UX of --enable-log-requests ( #35723 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-03-02 08:24:09 -08:00
Turner Jabbour
4034c3d32e
[Core] Move test utility to test file ( #35672 )
...
Signed-off-by: Turner Jabbour <doubleujabbour@gmail.com >
2026-03-02 10:56:03 -05:00
Martin Hickey
7560d674c9
[CI] Fix mypy for vllm/device allocator ( #35518 )
...
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-02 15:53:18 +00:00
ElizaWszola
d9c7730877
[Performance] Extract kv update ops from MLA attention backends ( #34627 )
...
Signed-off-by: ElizaWszola <ewszola@redhat.com >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Di Wu <dw2761@nyu.edu >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2026-03-02 10:43:19 -05:00
Runkai Tao
ada4f4fadd
[Fix Bug]num_active_loras always equals to zero ( #34119 )
...
Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2026-03-02 23:17:46 +08:00
Harry Mellor
7e9149d9a9
[Docs] Add breadcrumbs for better UX ( #35749 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-02 14:31:54 +00:00
Martin Hickey
87c98b0236
[MyPy][BugFix] Check profiler is assigned before calling start() on it ( #35505 )
...
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-02 13:23:42 +00:00
Tyler Michael Smith
de7dd634b9
Fix unresolved-import errors when using Astral's ty by removing src.root ( #35681 )
...
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
2026-03-02 10:26:47 +00:00
Chauncey
9a87b0578f
[Feat] Supports Anthropic Messages count_tokens API ( #35588 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-03-02 09:48:54 +00:00
wangxiyuan
510bc9e1df
[Misc] Cleanup useless current_platform import ( #35715 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2026-03-02 09:36:54 +00:00
Charles Ashby
cbd361fd46
[CPU][Distributed] Fix Enable _CPUSHMDistributed only when TP/PP ranks share the same SHM group name ( #34169 )
...
Signed-off-by: Charles Ashby <charlesa.l@hotmail.com >
2026-03-02 09:34:35 +00:00
Nicolò Lucchesi
c212202d93
[Misc] Bound NIXL upper bound version ( #35495 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-03-02 16:57:07 +08:00
Andreas Karatzas
ec27b36b4b
[CI] Defining extended V1 e2e + engine tests ( #35580 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-02 08:10:54 +00:00
Charlie Fu
3fd1d4ec2c
[Rocm][CI] Fix LM Eval Large Models (H100) test group ( #34750 )
...
Signed-off-by: charlifu <charlifu@amd.com >
2026-03-02 07:43:38 +00:00
EdalatiAli
cb21972a97
[Kernel] Integrate SM100 MXFP8 blockscaled grouped MM and quant kernels ( #34448 )
...
Signed-off-by: EdalatiAli <aliedalati@cohere.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-03-01 23:31:19 -08:00
Andreas Karatzas
c34963f138
[ROCm][CI] Disable skinny GEMMs in language model standard tests to fix non-determinism ( #35152 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-02 15:04:18 +08:00
Hongxia Yang
f26650d649
[ROCm] add amd-quark package in requirements for rocm to use quantized models ( #35658 )
...
Signed-off-by: Hongxia Yang <hongxiay.yang@amd.com >
Co-authored-by: Hongxia Yang <hongxiay.yang@amd.com >
2026-03-02 06:02:43 +00:00
Kunshang Ji
92f5d0f070
[XPU] fix mxfp4 activation type ( #35691 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2026-03-02 11:48:39 +08:00
Jesse Cai
a60985b07e
Fix deprecated v1 config tests ( #35327 )
...
Signed-off-by: Jesse Cai <jessecai@fb.com >
2026-03-01 20:32:03 -05:00
Lucas Wilkinson
8b5014d3dd
[Attention] FA4 integration ( #32974 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
2026-03-01 23:44:57 +00:00
zhanqiuhu
57a96e26c9
Revert "[Bugfix] Disable TRTLLM attention with KV transfer enabled ( #33192 )" ( #34832 )
...
Signed-off-by: Zhanqiu Hu <zh338@cornell.edu >
2026-03-01 22:32:37 +00:00
Richard Zou
e82fbeec7b
[torch.compile] Undo the fast_moe_cold_start hack in torch>=2.11 ( #35475 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2026-03-01 21:44:22 +00:00
haosdent
6290470843
[Bugfix] Fix dtype mismatch in RMSNormGated.forward_native() during torch.compile ( #35256 )
...
Signed-off-by: haosdent <haosdent@gmail.com >
2026-03-01 15:14:46 -05:00
Woosuk Kwon
72f4d16262
[Model Runner V2] Use block table apis for capture inputs ( #35671 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-03-01 10:31:13 -08:00
Seungho Yoon
5a435507d8
fix(mxfp4): return is_monolithic=False when LoRA is enabled for Triton backend ( #35382 )
...
Signed-off-by: Seungho Yoon <yoonsnowdev@gmail.com >
2026-03-01 09:59:30 -05:00
Taneem Ibrahim
59d7af9c6c
[MISC] Fixing a null reference by removing parallel_utils from mypy EXCLUDE ( #35630 )
...
Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com >
2026-03-01 09:26:44 -05:00
Asaf Gardin
bbf81f9a92
[Mamba1] - Kernel Level Chunk Alignment for Prefix Caching ( #34798 )
...
Signed-off-by: Josephasafg <ajgard7@gmail.com >
2026-03-01 20:40:23 +08:00
Woosuk Kwon
da543d1abe
[Model Runner V2] Minor refactoring for EncoderRunner ( #35628 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-03-01 00:15:39 -08:00
Ryan Rock
87d319c52f
[AMD][CI] Support Triton attention with ExampleConnector ( #34931 )
...
Signed-off-by: Ryan Rock <ryan.rock@amd.com >
2026-03-01 09:58:07 +02:00
lin-shh
a9ec392c86
Fix typo: implictly -> implicitly in isaac.py docstring ( #35646 )
2026-02-28 23:34:37 -08:00
lailoo
afd089f231
[Bugfix][Model] Fix Qwen3.5/Qwen3Next ignoring --dtype flag on older GPUs ( #35617 )
2026-03-01 03:27:37 +00:00
gnovack
3ecd0bf9fc
Add TMA support to fused_moe_lora kernel ( #32195 )
...
Signed-off-by: gnovack <gnovack@amazon.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2026-03-01 10:55:25 +08:00
Woosuk Kwon
e3eb146f7a
[Model Runner V2] Add ModelStateInterface [4/N] ( #35621 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-28 13:19:45 -08:00
Martin Vit
95a395dbec
[Bugfix] Fix Anthropic API base64 image handling in Messages endpoint ( #35557 )
...
Signed-off-by: Martin Vit <martin@voipmonitor.org >
2026-02-28 20:57:08 +00:00
Isotr0py
e94b263bd6
[Chore] Cleanup BNB utilization dead code ( #35620 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-28 19:22:41 +00:00
Wentao Ye
e113a30113
[Deprecation] Deprecate code in 0.17 as scheduled ( #35441 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-28 17:32:37 +00:00
Cyrus Leung
1dafb29f91
[Benchmark] Avoid unnecessary video download in MMVU ( #35618 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-28 09:07:02 -08:00
emricksini-h
49b9ae32e9
[Fix] Avoid sending image input to other PP ranks ( #35405 )
...
Signed-off-by: emricksini-h <emrick.birivoutin@hcompany.ai >
Co-authored-by: Roger Wang <hey@rogerw.io >
2026-03-01 00:14:29 +08:00
cwazai
63d7972f13
Fix Qwen3_5MTP packed_modules_mapping for gate_up_proj ( #35581 )
2026-02-28 14:50:55 +00:00
flutist
c68e69f144
custom dataset img support base64 ( #35280 )
...
Signed-off-by: xjx <493337577@qq.com >
2026-02-28 11:49:52 +00:00
Chauncey
7e08c22b8c
[Feat] Add CUDA torch fallbacks for fp8_mqa_logits/fp8_paged_mqa_logits_torch function ( #35271 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-02-28 10:12:00 +00:00
Augusto Yao
8e75d88554
add io_process_plugin for sparse embedding ( #34214 )
...
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com >
Signed-off-by: Augusto Yao <augusto.yjh@antgroup.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2026-02-28 09:16:37 +00:00
Mario Hong
0892d1ab1f
[Feature]Supports Anthropic Thinking Block ( #33671 )
...
Signed-off-by: mariohong <mariohong128@gmail.com >
Co-authored-by: zetaohong <i-hongzetao@stepfun.com >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
2026-02-28 09:02:33 +00:00
Hashem Hashemi
7600642eae
Add padding support to wvSplitK solution for skinny GEMMs ( #33762 )
...
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com >
2026-02-28 09:02:05 +00:00
Andreas Karatzas
1e69c04887
[ROCm][CI] Parametrize vision score tests across attention backends with per-backend tolerances ( #35571 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-28 08:59:26 +00:00
Cyrus Leung
4292e3b807
[Benchmark] Improve UX of sweep scripts ( #35600 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-28 00:36:02 -08:00
Cyrus Leung
24d6ea8afd
[Benchmark] Rename SLA Finder to Workload Explorer ( #35586 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-27 23:31:55 -08:00
Chauncey
57c86c0741
[Misc] Change logging level from info to debug for tool parser import ( #35575 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-02-28 14:51:35 +08:00
Chauncey
06254d4cbb
[CI] add trainer_send_weights for MockWeightTransferEngine ( #35589 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-02-28 06:47:43 +00:00
Andreas Karatzas
f5d1281c9d
[ROCm][CI] Expose tests to AMD production CI and fix amdsmi heap corruption ( #35071 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-28 13:57:31 +08:00
Andreas Karatzas
94029ffaf0
[ROCm] Derive device capability from GCN arch string without CUDA init ( #35069 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-28 13:55:28 +08:00
Andreas Karatzas
88e8525f2e
[ROCm][CI] Adding infiniband mappings for moriio tests ( #35170 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-28 13:53:28 +08:00
Ilya Markov
b2d8b422b2
[EPLB] Enforce sync eplb for NCCL-based all2all backend ( #35212 )
...
Signed-off-by: ilmarkov <markovilya197@gmail.com >
2026-02-28 05:47:12 +00:00
Umut Polat
1d5ab5d603
[Bugfix] Move chat completion response_format validation to Pydantic model_validator ( #35510 )
...
Signed-off-by: umut-polat <52835619+umut-polat@users.noreply.github.com >
2026-02-27 21:26:19 -08:00
Huy Do
7b346ba8ed
[Bugfix] Propagate compilation_time from workers to main process for TP>1 ( #35503 )
...
Signed-off-by: Huy Do <huydhn@gmail.com >
2026-02-28 05:03:22 +00:00
Itay Alroy
dea268336f
[1/N] Elastic EP Milestone 2 ( #34861 )
...
Signed-off-by: Yongji Wu <wuyongji317@gmail.com >
Signed-off-by: Itay Alroy <ialroy@nvidia.com >
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Signed-off-by: Ron Tourgeman <rtourgeman@nvidia.com >
Co-authored-by: Yongji Wu <wuyongji317@gmail.com >
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Co-authored-by: Ron Tourgeman <rtourgeman@nvidia.com >
2026-02-28 04:46:42 +00:00
Ma Jian
90805ff464
[CI/Build] CPU release supports both of AVX2 and AVX512 ( #35466 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
Co-authored-by: jiang1.li <jiang1.li@intel.com >
2026-02-28 04:35:21 +00:00
Matthew Bonanni
2562e0271e
[MTP] Validate that MTP weights are actually loaded ( #35548 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-02-28 12:27:40 +08:00
Cyrus Leung
fd68cd132b
[Bugfix] Fixes for SLA finder ( #35537 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-27 20:20:55 -08:00
Micah Williamson
0edf101d2b
[ROCm] Add stablelm Head Size 80 To Supported Head Sizes For ROCM_ATTN ( #35527 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2026-02-28 12:16:34 +08:00
Douglas Lehr
d5b6f3ba36
[ROCm][Quantization] Add Composable Kernel (CK) backend support for M… ( #34301 )
...
Signed-off-by: Doug Lehr <douglehr@amd.com >
Signed-off-by: Douglas Lehr <91553416+dllehr-amd@users.noreply.github.com >
Signed-off-by: Douglas Lehr <Doug.Lehr@amd.com >
Co-authored-by: Doug Lehr <douglehr@amd.com >
Co-authored-by: Cursor <cursoragent@cursor.com >
Co-authored-by: Rohan Potdar <66227218+Rohan138@users.noreply.github.com >
2026-02-28 03:37:01 +00:00
Woosuk Kwon
1a014a0a93
[Model Runner V2] Move MM encoder to Model States [3/N] ( #35564 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-27 18:32:38 -08:00
Woosuk Kwon
86ac7bcf84
[Model Runner V2] Support pooling models ( #35120 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-27 18:03:01 -08:00
Umut Polat
405f28d38d
[Misc] Clean up ResponsesRequest model validators ( #35531 )
...
Signed-off-by: umut-polat <52835619+umut-polat@users.noreply.github.com >
2026-02-28 01:19:21 +00:00
youkaichao
5323672bc2
[misc] cleanup one level of error stack when nixl fails to initialize ( #35517 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2026-02-28 08:42:37 +08:00
Roberto L. Castro
a201ad72d8
[Refactor][Kernel] Add global helper to deduplicate vectorized memory ops ( #35105 )
...
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com >
Signed-off-by: LopezCastroRoberto <roberto.lopez.castro@udc.es >
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com >
2026-02-27 16:28:17 -08:00
Rohan Potdar
e3691988d0
[ROCm]: fix aiter rope functionalization ( #35533 )
...
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com >
2026-02-27 22:42:30 +00:00
Gregory Shtrasberg
9fa6c68fa6
[ROCm] Enabling encoder and encoder-decoder on ROCm and AITER unified backends ( #35334 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2026-02-27 21:32:55 +00:00
Aaron Hao
2ce6f3cf67
[Feat][RL][2/2] Native Weight Syncing API: IPC ( #34171 )
...
Signed-off-by: hao-aaron <ahao@anyscale.com >
Signed-off-by: Aaron Hao <ahao@anyscale.com >
Signed-off-by: ahao-anyscale <ahao@anyscale.com >
2026-02-27 13:45:21 -07:00
Jakub Zakrzewski
1f3dbd95fd
[Bugfix][Model] Fix gpt-oss batch invariance ( #35404 )
...
Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com >
2026-02-27 20:41:24 +00:00
Lucas Wilkinson
1d532f9d8f
[DP] Only use DP padding when cudagraphs are actually used ( #34102 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2026-02-27 15:14:31 -05:00
Lucas Kabela
234a65b781
[Bugfix] Add monkeypatch to prevent race condition from writing ( #35420 )
...
Signed-off-by: Lucas Kabela <lucaskabela@meta.com >
2026-02-27 14:51:36 -05:00
SteadfastAsArt
2decec9856
[Transformers backend] Ignore MTP weights when num_nextn_predict_layers=0 ( #34888 )
...
Signed-off-by: SteadfastAsArt <695488173@qq.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-27 19:39:23 +00:00
Zhengxu Chen
29b35477b0
[compile] Fix caching error over pytree slice node. ( #35308 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com >
2026-02-27 19:34:16 +00:00
Nick Hill
b1d9f5372d
[Model Runner V2] Warmup kernels ( #35172 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-02-27 10:43:30 -08:00
Raushan Turganbay
fd6de37fca
[BugFix] Fix 3D rope in transformers backend ( #35097 )
...
Signed-off-by: raushan <raushan@huggingface.co >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-27 18:34:49 +00:00
Netanel Haber
c8aca0c9e1
Support parakeet as audio encoder for nemotron-nano-vl ( #35100 )
...
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2026-02-27 11:07:38 -07:00
Martin Hickey
b602e4f299
[Doc] Fix link to Llama chat template for usability ( #35525 )
...
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2026-02-27 17:51:09 +00:00
Huamin Li
157722da75
[perf] Use pinned memory for async H2D transfer in do_mamba_copy_block ( #35480 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com >
2026-02-28 01:50:37 +08:00
Nick Hill
1d897ff04f
[Misc] Fill in some v1 CODEOWNERS gaps ( #35524 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-02-27 09:34:37 -08:00
fort726
905d76b51d
[Model] Add huggingface skt/A.X-K1 model ( #32407 )
...
Signed-off-by: Sungwan(Alex) Kim <sw0726.kim@sktelecom.com >
Signed-off-by: fort726 <38447663+fort726@users.noreply.github.com >
Co-authored-by: Sungwan(Alex) Kim <sw0726.kim@sktelecom.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com >
2026-02-27 09:26:02 -08:00
Yanan Cao
9098ce690c
[Kernel] [Helion] [7/N] Use HOP to represent Helion Kernel call to enable fx tracing and pattern matching ( #34390 )
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com >
2026-02-27 09:21:35 -08:00
Nick Hill
876312f0b5
[Core] Fix gpu_worker.py pre-commit errors ( #35312 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-02-27 07:54:24 -08:00
Boyuan Feng
5de98abc12
Add @BoyuanFeng to CODEOWNERS ( #35317 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2026-02-27 15:53:47 +00:00
Koushik Dutta
9251ed5c4f
[Bugfix] Handle case when kimi ends reasoning with a tool call ( #33646 )
...
Signed-off-by: Koushik Dutta <koushd@gmail.com >
Co-authored-by: mondaylord <20212010046@fudan.edu.cn >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2026-02-27 14:58:28 +00:00
Yueqian Lin
e8249378e4
[Bugfix] Fix check_interleaved_audio_video false positive for batched non-interleaved requests ( #35487 )
...
Signed-off-by: linyueqian <linyueqian@outlook.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2026-02-27 06:48:25 -08:00
haosdent
6d4f9d3ad5
[Bugfix] Fix DCP + FA3 crash due to missing num_splits in _forward_with_dcp ( #35082 )
...
Signed-off-by: haosdent <haosdent@gmail.com >
2026-02-27 22:27:06 +08:00
Harry Mellor
fbe3f0120a
Revert "Add GlmOcrConfig for GLM-OCR model type recognition" ( #35512 )
2026-02-27 06:13:27 -08:00
Jason Li
66c1751d13
[compile] Cleanup: Remove unnecessary +rms_norm forcing for sequence parallelism ( #35410 )
...
Signed-off-by: jasonlizhengjian <jasonlizhengjian@gmail.com >
2026-02-27 08:36:37 -05:00
Tib
6467b635b6
[Bugfix] Add missing activation attr to RMSNormGated ( #35423 )
...
Signed-off-by: tibG <naps@qubes.milou >
Co-authored-by: tibG <naps@qubes.milou >
2026-02-27 12:53:35 +00:00
Max Hu
9c3fe9936b
Flashinfer cuDNN backend for Qwen3 VL ViT attention ( #34580 )
...
Signed-off-by: Max Hu <maxhu@nvidia.com >
Signed-off-by: Max Hu <hyoung2991@gmail.com >
Co-authored-by: Max Hu <maxhu@nvidia.com >
Co-authored-by: Shang Wang <shangw@nvidia.com >
2026-02-27 20:20:23 +08:00
Umut Polat
b66a74649e
[Bugfix] Replace assert with ValueError for response_format validation in completions endpoint ( #35456 )
...
Signed-off-by: umut-polat <52835619+umut-polat@users.noreply.github.com >
2026-02-27 08:01:06 +00:00
Wang Xingran
07bdabef03
[Bugfix] Use 'sum' reduction instead of 'avg' in Async TP reduce-scatter ( #33088 )
...
Signed-off-by: Xingran Wang <wangxingran123456@outlook.com >
Signed-off-by: Hongjian Zhang <hirokenovo@gmail.com >
Co-authored-by: Hongjian Zhang <hirokenovo@gmail.com >
2026-02-27 07:06:08 +00:00
Chengyi Nie
a572baff5e
[Model Performance] Add Qwen3MoE tuned MoE configs for H200 ( #35457 )
...
Signed-off-by: Chengyi Nie <cnie@roblox.com >
Co-authored-by: Chengyi Nie <cnie@roblox.com >
2026-02-27 13:51:14 +08:00
zofia
516cf26698
[Bug] correct out dtype of rms_norm_gated native path ( #35369 )
...
Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com >
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com >
2026-02-27 05:19:51 +00:00
Jiangyun Zhu
487e5c51f7
[Bugfix] disable allreduce_rms_fusion by default when pp size > 1 ( #35424 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2026-02-27 04:18:52 +00:00
Daniel Huang
1a8c71674e
[BugFix] Repo utils debug print patch ( #35434 )
...
Signed-off-by: Daniel Huang <daniel1.huang@intel.com >
2026-02-27 03:50:56 +00:00
Wentao Ye
062b789632
[Bug] Fix outdated links in source code ( #35314 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-02-27 03:50:46 +00:00
gnovack
a532c83849
use 'max_active_experts' for moe lora input size ( #33197 )
...
Signed-off-by: gnovack <gnovack@amazon.com >
2026-02-27 03:50:43 +00:00
Jee Jee Li
1e5ad9b74f
[Bugfix] Fix Qwen3NextForCausalLM packed_modules_mapping ( #35413 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2026-02-26 19:46:30 -08:00
Nicolò Lucchesi
cabdaa7619
[Misc] Move GPUModelRunner.prepare_kernel_block_sizes to utils ( #35400 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-02-27 11:42:51 +08:00
Chenyaaang
06be53563b
[Core]Extract is_last_rank in Ray for tpu to override ( #33012 )
...
Signed-off-by: Chenyaaang <chenyangli@google.com >
2026-02-27 03:18:52 +00:00
Angela Yi
c29ee9c326
[compile] Invalidate cache for cpu flags ( #35119 )
...
Signed-off-by: angelayi <yiangela7@gmail.com >
2026-02-27 02:54:11 +00:00
daniel-salib
d43048ce05
[Bugfix] Emit reasoning_part events in simple streaming path for Resp… ( #35184 )
...
Signed-off-by: Daniel Salib <danielsalib@meta.com >
2026-02-27 09:49:06 +08:00
Michael Goin
4fec53cfcb
[CI] Actually run tests/kernels/quantization/test_block_fp8.py in CI ( #34274 )
2026-02-26 17:58:03 -07:00
roikoren755
38c498b8e3
[Performance] Cublas Bf16 Gate with Fp32 Output ( #35121 )
...
Signed-off-by: Roi Koren <roik@nvidia.com >
2026-02-26 16:51:28 -08:00
Andrii Skliar
56a6371706
[Update] Use FlashInfer fast_decode_plan directly instead of replication ( #34687 )
...
Signed-off-by: Andrii <askliar@nvidia.com >
Co-authored-by: Andrii <askliar@nvidia.com >
2026-02-26 16:31:43 -08:00
Pavani Majety
6283021142
[Bugfix] Fix KV Scale loading for MLA Models ( #35430 )
...
Signed-off-by: Pavani Majety <pmajety@nvidia.com >
2026-02-26 23:38:19 +00:00
Aleksandr Malyshev
01923eec70
[ROCm][Quantization] GPT OSS Upstream MoE wmxfp4_afp8 with static scales ( #30357 )
...
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com >
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com >
2026-02-26 16:50:16 -06:00
pkousha
31fb6f43da
[Kernel][perf] optimize NCCL symm_mem vs custom_AR selection thresholds ( #33839 )
...
Signed-off-by: <>
Signed-off-by: pkousha <43781676+pkousha@users.noreply.github.com >
Co-authored-by: Pouya Kousha <pkousha@login-eos01.eos.clusters.nvidia.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-02-26 14:35:58 -08:00
Tyler Michael Smith
eb19955c37
[WideEP] Remove pplx all2all backend ( #33724 )
...
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
2026-02-26 14:30:10 -08:00
Lucia Fang
0f2f24c8b2
[Bugfix] Fix MessageQueue connect_ip for cross-node data parallelism ( #35429 )
...
Signed-off-by: Lu Fang <fanglu@fb.com >
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com >
2026-02-26 22:08:16 +00:00
sychen52
d0105b84f0
add mixed precision support for modelopt ( #35047 )
...
Signed-off-by: Shiyang Chen <shiychen@nvidia.com >
2026-02-26 21:56:24 +00:00
danielafrimi
832a780f3a
Nemotron: use per-layer config in NemotronHMLPDecoderLayer for heterogeneous models ( #35396 )
...
Signed-off-by: dafrimi <dafrimi@nvidia.com >
2026-02-26 16:55:19 -05:00
ElizaWszola
98217b09f9
[Performance] Extract KV cache update op from flashinfer forward ( #35422 )
...
Signed-off-by: ElizaWszola <ewszola@redhat.com >
2026-02-26 21:29:01 +00:00
不做了睡大觉
967572dd5f
fix(reasoning): Qwen3ReasoningParser returns truncated output as reasoning ( #35230 )
...
Signed-off-by: stakeswky <stakeswky@users.noreply.github.com >
Co-authored-by: stakeswky <stakeswky@users.noreply.github.com >
2026-02-26 20:30:45 +00:00
Woosuk Kwon
3d66502e1b
[Model Runner V2] Prepare attn metadata in ModelState [2/N] ( #35383 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-26 11:47:02 -08:00
Woosuk Kwon
c66aa48e99
[Model Runner V2] Add model states [1/N] ( #35350 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-26 11:20:35 -08:00
Nick Hill
b6d5a17298
[Model Runner V2] Fix error-handling ( #35063 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-02-26 11:00:19 -08:00
Lucas Wilkinson
5e58bdc711
[Bugfix] Remove erroneous lower bound on LoRA vocab size constraint ( #35354 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2026-02-26 18:44:50 +00:00
Runkai Tao
a1f53addb1
[BugFix] Align fused MoE-LoRA kernel config with actual weight shapes ( #34396 )
...
Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu >
2026-02-26 18:03:10 +00:00
Wentao Ye
05970c772c
[Refactor] Remove dead code for attention benchmark script ( #35418 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-02-26 09:53:46 -08:00
Yiliu Dong
d940607629
[Core] Support min_tokens with speculative decoding ( #32642 )
...
Signed-off-by: qianlihuang <yiliu.dong@qq.com >
Co-authored-by: qianlihuang <yiliu.dong@qq.com >
2026-02-26 12:31:28 -05:00
Wentao Ye
99c7892c5b
[Perf] Optimize maxsim scores computation for pooling models, 13.9% E2E throughput improvement ( #35330 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-02-26 17:14:54 +00:00
hujia177
ec8f943db1
Add GlmOcrConfig for GLM-OCR model type recognition ( #34982 )
2026-02-26 17:04:42 +00:00
Or Ozeri
f2ad952f40
[BugFix][kv_offload]: Fix kernel block size detection ( #35125 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com >
2026-02-26 16:29:34 +00:00
Sage Moore
9e2cabdf9c
[ROCm] Update the torch version in rocm_build.txt to use the official 2.10 release ( #34387 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
2026-02-26 16:28:45 +00:00
Douglas Lehr
ec8ab9d254
[ROCm] Add dynamic mxfp4 quantization for DeepSeek V2 projection layers ( #34157 )
...
Signed-off-by: Doug Lehr <douglehr@amd.com >
Signed-off-by: Douglas Lehr <91553416+dllehr-amd@users.noreply.github.com >
Co-authored-by: Doug Lehr <douglehr@amd.com >
Co-authored-by: Rohan Potdar <66227218+Rohan138@users.noreply.github.com >
Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com >
2026-02-26 10:00:49 -06:00
Wentao Ye
05972ea7e5
[Refactor] Remove dead or duplicate func utils or variables ( #35318 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-02-26 10:57:56 -05:00
Jakub Zakrzewski
111d869069
[Model] Add nvidia/llama-nemotron-embed-vl-1b-v2 multimodal embedding model ( #35297 )
...
Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com >
2026-02-26 14:17:17 +00:00
stingoChen
7fea7250a4
[Bug] Fix missing <think> tag after tool call in MiniMax 2.1 ( #35352 )
...
Signed-off-by: 冬马 <chenxinke@cai-inc.com >
Co-authored-by: 冬马 <chenxinke@cai-inc.com >
2026-02-26 22:11:07 +08:00
Cyrus Leung
845ee348ef
[Misc] Standardize handling of mm_processor_kwargs.size ( #35284 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-26 13:05:46 +00:00
Asaf Gardin
ec13e549d3
[Bugfix] Fix uint32 overflow in Mamba selective scan state pointer arithmetic ( #35275 )
...
Signed-off-by: Josephasafg <ajgard7@gmail.com >
2026-02-26 12:22:06 +00:00
Li-Yongwen
c6ca51598a
[Bugfix] fix device_name for routing replay ( #34336 )
...
Signed-off-by: liyongwen <1310439159@qq.com >
2026-02-26 12:18:38 +00:00
Yueqian Lin
c0615a296d
[Bugfix] Fix Qwen2.5-Omni and Qwen3-Omni mixed-modality embed regression ( #35368 )
...
Signed-off-by: linyueqian <linyueqian@outlook.com >
2026-02-26 11:58:23 +00:00
Harry Mellor
01914445b0
Remove bc-lint ( #35274 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-26 03:01:01 -08:00
Kunshang Ji
5281713e11
[XPU] use fixed UMD version in dockerfile.xpu ( #35392 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2026-02-26 18:54:55 +08:00
HZY
32693db8ce
[Bugfix] [Qwen3.5]Fix Qwen3.5 FP8 quantization: tuple shard_id weight loading ( #35289 )
...
Signed-off-by: daowu.hzy <daowu.hzy@alibaba-inc.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-26 18:26:15 +08:00
Akash kaothalkar
e03ddcfbd4
[Hardware][Powerpc]Enable prefix caching and chunked prefill for ppc64le ( #35081 )
...
Signed-off-by: Akash kaothalkar <akash.kaothalkar@ibm.com >
Co-authored-by: Akash kaothalkar <akash.kaothalkar@ibm.com >
2026-02-26 10:21:24 +00:00
Sophie du Couédic
02acd16861
[Benchmarks] Plot benchmark timeline and requests statistics ( #35220 )
...
Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2026-02-26 02:17:43 -08:00
Jiangyun Zhu
ab87f85231
[Model] Ring 2.5 ( #35102 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2026-02-26 02:17:11 -08:00
Krish Gupta
3827c8c55a
[Test] Add tests for n parameter in chat completions API ( #35283 )
...
Signed-off-by: KrxGu <krishom70@gmail.com >
2026-02-26 09:14:07 +00:00
Kevin McKay
ade81f17fe
[Bugfix][Hardware][AMD] Gate FP4 ops on gfx950 to prevent MI300X crash ( #35250 )
...
Signed-off-by: c0de128 <kevin.mckay@outlook.com >
2026-02-26 16:11:07 +08:00
Gregory Shtrasberg
6042e66cd5
[ROCm] Add extra step in config initialization to populate custom ops before compilation config init ( #34848 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2026-02-26 16:05:40 +08:00
Chaojun Zhang
9f9a675b23
[XPU][8/N] Fix kernel bugs in XPU LoRA and MOE LORA ( #34115 )
...
Signed-off-by: chzhang <chaojun.zhang@intel.com >
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com >
2026-02-26 15:46:44 +08:00
Ofir Zafrir
a07c4c5939
[BugFix][XPU] Fix speculative decoding on Intel XPU due to bug with IGC_ForceOCLSIMDWidth=16 ( #35298 )
...
Signed-off-by: Ofir Zafrir <ofir.zafrir@intel.com >
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com >
2026-02-26 07:15:16 +00:00
Cyrus Leung
d3a51da92a
[Benchmark] Simplify SLA scan ( #35306 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-25 22:35:41 -08:00
Flora Feng
186ea22efe
[Misc][Harmony] Move Responses API only harmony utils to responses/harmony.py ( #35339 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com >
2026-02-26 14:35:16 +08:00
Daniele
4a9c07a0a2
[BugFix] anthropic/serving_messages: fix tool call arguments streaming ( #34887 )
...
Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2026-02-26 05:39:48 +00:00
Jason Li
9d37941017
[torch.compile] Sequence Parallelism threshold compile ranges ( #28672 )
...
Signed-off-by: jasonlizhengjian <jasonlizhengjian@gmail.com >
Signed-off-by: Jason Li <jasonlizhengjian@gmail.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2026-02-26 05:00:12 +00:00
Fadi Arafeh
4171ff6dd9
[CPU][Feat] Enable KleidiAI INT8_W4A8 for all input dtypes ( #34890 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
2026-02-26 05:00:10 +00:00
Woosuk Kwon
13025e71e8
[Model Runner V2] Add coding style guide ( #35325 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-25 20:42:40 -08:00
Hanjie Qiu
71dfce6aa6
[Kernel] Refactor FlashInfer allreduce for mnnvl backend ( #34109 )
...
Signed-off-by: hjjq <50634613+hjjq@users.noreply.github.com >
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com >
Co-authored-by: wzhao18 <wzhao18.sz@gmail.com >
Co-authored-by: Wei Zhao <51183510+wzhao18@users.noreply.github.com >
2026-02-26 03:17:20 +00:00
hujiaxin0
2aa4140402
openpangu-vl support video input ( #34134 )
...
Signed-off-by: hujiaxin <524446785@qq.com >
Signed-off-by: Emilie1001 <79921183+Emilie1001@users.noreply.github.com >
Co-authored-by: Emilie1001 <79921183+Emilie1001@users.noreply.github.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-26 03:08:09 +00:00
Roberto L. Castro
86c3b5a808
[BugFix] Fix fp4 quant kernel on CUDA 12.8 ( #35210 )
...
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com >
2026-02-25 18:32:50 -08:00
Seungmin Kim
160424a937
[Bugfix] Fix CUDA compatibility path setting for both datacenter and consumer NVIDIA GPUs ( #33992 )
...
Signed-off-by: Seungmin Kim <8457324+ehfd@users.noreply.github.com >
Signed-off-by: Andrew Mello <19512127+88plug@users.noreply.github.com >
Co-authored-by: 88plug <19512127+88plug@users.noreply.github.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-02-25 18:15:51 -08:00
Lucas Wilkinson
9511a3f8ee
[Bugfix] Fix AttributeError in SMControlContextManager ( #35338 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2026-02-25 18:01:10 -08:00
Michael Goin
de527e1cec
[UX] Add --moe-backend arg for explicit kernel selection ( #33807 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-02-25 17:44:44 -08:00
Yongye Zhu
1976356ee6
[MoE Refactor] MXFP4 Cutlass Experts to MK ( #34542 )
...
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com >
2026-02-25 17:32:39 -08:00
Michael Goin
cbf8f7028c
[UX] Add --performance-mode {balanced,interactivity,throughput} ( #34936 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-02-25 17:28:31 -08:00
Ming Yang
6831650c40
[offloader] v2: Hide weight onloading latency via prefetching ( #29941 )
...
Signed-off-by: Ming Yang <minos.future@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-02-25 17:20:59 -08:00
Andreas Karatzas
ed42507f6d
[ROCm][CI] Amending deletion of AMD mirror ( #35322 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-25 14:17:56 -08:00
Andreas Karatzas
9571e99945
[ROCm][CI] Extending attention backend coverage for Eagle spec decode tests ( #35265 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-25 14:16:18 -08:00
Elizabeth Thomas
c97234c08b
fix(mxfp4): Disable monolithic path for TRITON backend with EP ( #34270 )
...
Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-02-25 13:33:42 -08:00
rasmith
b188bab441
[CI][AMD][BugFix] Add torch.cuda.set_device to test_punica_ops so punica kernels execute on same device as tensor ( #34985 )
...
Signed-off-by: Randall Smith <Randall.Smith@amd.com >
2026-02-25 19:18:00 +00:00
Lucas Wilkinson
15d76f74e2
Revert "[Misc] Enable weights loading tracking for quantized models" ( #35309 )
2026-02-25 09:20:15 -08:00
Andreas Karatzas
8fd6975479
[ROCm][CI] Disable skinny GEMMs in multimodal tests to fix non-deterministic results ( #35049 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-25 16:48:37 +00:00
pushkar
5d18bf8b32
[Bugfix] Fix Harmony preamble visibility in Responses API ( #32114 )
...
Signed-off-by: Pushkar Patel <git@thepushkarp.com >
Signed-off-by: pupa <pupa@users.noreply.github.com >
2026-02-25 08:08:16 -08:00
haosdent
0788ff0a15
[Bugfix] Gracefully disable AllReduceFusionPass on GPUs without multicast support ( #35085 )
...
Signed-off-by: haosdent <haosdent@gmail.com >
2026-02-25 07:31:45 -08:00
Chendi.Xue
d72b0be33c
[XPU]Fix for Qwen-OMNI crash ( #35249 )
...
Signed-off-by: Chendi Xue <chendi.xue@intel.com >
2026-02-25 07:31:07 -08:00
Bhoomit
42489e43c2
[Misc][LoRA] Increase max vocab size limit to 258048 in logits processor ( #34773 )
...
Signed-off-by: Bhoomit Vasani <vbhoomit@amazon.com >
2026-02-25 23:30:55 +08:00
Mario Hong
af5e6afa0a
[Bugfix] Fix step3p5 reasoning with interleaved thinking ( #34211 )
...
Signed-off-by: mariohong <mariohong128@gmail.com >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
2026-02-25 15:13:01 +00:00
Benjamin Chislett
ee59a7c615
[Tests] Add GSM8k check to SpecDec E2E tests ( #34772 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
2026-02-25 07:51:14 -05:00
Joao Gante
709eadbb0b
Doc link typo ( #35281 )
...
Signed-off-by: Joao Gante <joaofranciscocardosogante@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-25 03:00:31 -08:00
Harry Mellor
90fc7f9109
Fix custom processors that use deleted behaviour for Transformers v5 ( #35107 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-25 02:36:21 -08:00
Yanwen Lin
675ec59aa9
[Bugfix][CPU] Fix basic unit tests failing in CPU platforms ( #34677 )
...
Signed-off-by: Yanwen Lin <lyw1124278064@gmail.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-25 08:36:15 +00:00
Yanwen Lin
80e60a6133
[Doc] Suggest "--managed-python" flag when installing python using uv ( #33069 )
...
Signed-off-by: Yanwen Lin <lyw1124278064@gmail.com >
2026-02-25 08:19:43 +00:00
jonoillar
26e722f906
[DOC][BugFix] Specify build dependency installation ( #34513 )
...
Signed-off-by: Jon OILLARBURU <jon.oillarburu@multiversecomputing.com >
Co-authored-by: Jon OILLARBURU <jon.oillarburu@multiversecomputing.com >
2026-02-25 08:04:06 +00:00
lichuang
2c619e5e3f
[Docs]Fix documentation formatting in architecture overview ( #34679 )
...
Signed-off-by: codedump <lichuang1982@gmail.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-25 08:00:15 +00:00
Simon Mo
8a685be8d9
docs: document committer proposal process in governance ( #35225 )
...
Signed-off-by: Simon Mo <simon.mo@hey.com >
Co-authored-by: Cursor <cursoragent@cursor.com >
2026-02-25 07:58:48 +00:00
Laura Wang
2465071510
[Perf] Add opt-in SM100 Oink RMSNorm custom-op path ( #31828 )
...
Signed-off-by: Laura Wang <3700467+Laurawly@users.noreply.github.com >
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com >
2026-02-24 23:01:53 -08:00
wenshuai
cd43673668
[Perf] Optimize FP8 gemm of sm120. ( #34424 )
...
Signed-off-by: wenshuai <wenshuai@xiaomi.com >
2026-02-24 22:25:24 -08:00
Xinyu Chen
35d44b4557
[XPU]Support CUDAGraph on XPU Platform ( #34482 )
...
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com >
Co-authored-by: chzhang <chaojun.zhang@intel.com >
Co-authored-by: zhenwei-intel <zhenwei.liu@intel.com >
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com >
2026-02-24 22:22:52 -08:00
Kunshang Ji
8ad54a991b
[Platform] Add current_platform.num_compute_units interface ( #35042 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com >
2026-02-24 22:22:49 -08:00
Kunshang Ji
92510edc32
remove cuda check in top_k_top_p_triton kernel ( #35011 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2026-02-24 22:22:31 -08:00
Isotr0py
a6c137521c
[Misc] Add shard_id validation for MergedColumnLinear ( #35055 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-24 22:12:28 -08:00
Isotr0py
4572a06afe
[Misc] Enable weights loading tracking for quantized models ( #35074 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-24 22:11:03 -08:00
Zhengxu Chen
5cc29cfb8b
[compile] Improve error message during artifacts load failure. ( #35115 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com >
2026-02-24 22:01:09 -08:00
Chen Zhang
8fae54faff
[Linear Attention] fix bug for linear attention + prefix caching + reset_prefix_cache ( #35157 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2026-02-24 22:00:19 -08:00
Harry Mellor
f7967577f5
Remove requirement to use --hf-overrides for DeepseekVLV2ForCausalLM ( #35203 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-24 22:00:06 -08:00
pks
af770b8e7b
[Bugfix] Fix AttributeError when passing StructuredOutputsParams to CompletionRequest ( #35237 )
...
Signed-off-by: Patrick Simianer <patrick@lilt.com >
2026-02-24 22:00:03 -08:00
Andreas Karatzas
2ff3e436ad
[Responses][CI] Filter negative token IDs in schema fuzz test to avoid 500 errors ( #35231 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-25 05:52:44 +00:00
Jhao-Ting Chen
c2c4c4611a
[FIX] fused moe with lora shared expert dual stream (1.07x otps) ( #34933 )
...
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-02-25 04:40:45 +00:00
Rohan Potdar
f38f8c9742
[ROCm]: Enable customop and rope+kvcache fusion for AITER RoPE ( #35180 )
...
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com >
2026-02-25 04:36:40 +00:00
Flora Feng
ec1d30c0f6
[Responses] Decouple SSE event helpers from Harmony context ( #35148 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com >
2026-02-24 20:05:25 -08:00
Pooya Davoodi
e3b2324ec4
[Frontend] Use init_app_state and FrontendArgs in run_batch ( #32967 )
...
Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-24 19:40:39 -08:00
Nick Hill
dbf0da817a
[Core] Cleanup engine pause/sleep logic ( #34528 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-02-24 19:33:34 -08:00
Xin Yang
3bbb2046ff
[Bugfix] Fix expert_ids padding values in moe_align_block_size kernel ( #35161 )
...
Signed-off-by: Xin Yang <xyangx@amazon.com >
2026-02-24 17:14:24 -08:00
yugong333
576fe50333
Adding Nemotron fp8 Triton MoE Config ( #34674 )
...
Signed-off-by: Yu Gong <yu3.gong@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-02-24 15:56:38 -08:00
Hashem Hashemi
a0e50a4260
Convert wvSplitKQ to 16x16 MFMA in prep for mi4xx. ( #34100 )
...
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com >
2026-02-24 23:35:21 +00:00
Benjamin Chislett
9fa5b25a23
[Bug][DSV3.2] Always prepare metadata for DeepGEMM Sparse Attention ( #35075 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
2026-02-24 14:55:22 -08:00
Robert Shaw
ea97750414
[CI] Fix Distributed Tests ( #35236 )
...
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com >
2026-02-24 22:31:56 +00:00
Andreas Karatzas
067c5d9ad1
[ROCm][CI] Added MI325 mirrors ( #34923 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-24 13:37:15 -08:00
Benjamin Chislett
f5972a872f
[Model][Spec Decode] Nemotron-H MTP and Mamba Speculative Decoding Support ( #33726 )
...
Signed-off-by: Shahar Mor <smor@nvidia.com >
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Co-authored-by: Shahar Mor <smor@nvidia.com >
Co-authored-by: Roi Koren <roik@nvidia.com >
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com >
2026-02-24 09:49:56 -08:00
Matthew Bonanni
a9e15e040d
Add @MatthewBonanni to CODEOWNERS ( #35207 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-02-24 10:45:10 -07:00
Lucas Wilkinson
542ca66357
Revert "[CI/Build] Remove redundant OpenTelemetry pip install from CI configs" ( #35211 )
2026-02-24 09:26:42 -08:00
Cyrus Leung
fc8456c336
[CI/Build] Fix kernels test location ( #35205 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-24 09:20:34 -08:00
Wentao Ye
9ce8fad2a9
[Perf] Optimize Python Slice for Structured Output using islice instead of [:] ( #33593 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-24 09:02:36 -08:00
Harry Mellor
c38b8d5a31
Remove padding_index from models that don't use it for better Transformers v5 compatibility ( #35189 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-24 08:04:46 -08:00
Robert Shaw
60da0e1544
[CI] Remove Duplicated Tests ( #35199 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2026-02-24 23:53:30 +08:00
danisereb
9609b1f18d
Integrate flashinfer mm_mxfp8 in ModelOpt MXFP8 ( #35053 )
...
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com >
2026-02-24 08:45:13 -07:00
danisereb
a0c7081695
Fix fallback to default tactic (flashinfer autotuner) with trtllm_fp4_block_scale_moe ( #35088 )
...
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com >
2026-02-24 07:25:44 -08:00
R3hankhan
34ce0ffd1f
[CPU][Perf] Accelerate Attention head for s390x using vector intrinsics ( #34434 )
...
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com >
Co-authored-by: Li, Jiang <jiang1.li@intel.com >
2026-02-24 07:25:39 -08:00
Robin Nabel
0de5333989
Fix GLM4 parser tests ( #34905 )
...
Signed-off-by: Robin Nabel <opensource@nabel.co >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
2026-02-24 22:27:42 +08:00
Eldar Kurtić
a87cc50859
[Attn,KV-cache] Use per-head scales in the attention selector ( #34281 )
...
Signed-off-by: Your Name <you@example.com >
Signed-off-by: Eldar Kurtic <research@neuralmagic.com >
Co-authored-by: Eldar Kurtic <research@neuralmagic.com >
Co-authored-by: Your Name <you@example.com >
2026-02-24 09:02:43 -05:00
Cyrus Leung
761e63e541
[Frontend] Always pass supported_tasks to validation ( #35186 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-24 04:16:33 -08:00
Isotr0py
d12d201409
[Bugfix] Fix failing FunASR processor test ( #35111 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-24 04:13:45 -08:00
eustlb
b3ad37c5db
[glm-asr] change defaults dummy audio size ( #35108 )
...
Signed-off-by: Eustache Le Bihan <eulebihan@gmail.com >
2026-02-24 04:13:33 -08:00
Wentao Ye
14561fabfd
[Perf] Optimize pooling model redundant copy, 1.8% throughput improvement ( #35127 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-02-24 04:13:11 -08:00
Zhengxu Chen
c77f3e1207
[compile] Save aot compile artifacts atomically. ( #35117 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com >
2026-02-24 04:11:01 -08:00
Dor Huri
012dee9233
[Feature] Add LoRA tower/connector support for Llama 4 Vision (mllama4) ( #35147 )
...
Signed-off-by: dorhuri123 <dor.huri1@live.biu.ac.il >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2026-02-24 04:10:32 -08:00
Tugsbayasgalan Manlaibaatar
f1c664545b
Make voxtral compile friendly ( #33959 )
...
Signed-off-by: Tugsbayasgalan Manlaibaatar <tmanlaibaatar@fb.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2026-02-24 09:33:35 +01:00
Xin Yang
c870eb9e0f
[LoRA] Update LoRA expand kernel block_n calculation ( #32621 )
...
Signed-off-by: Xin Yang <xyangx@amazon.com >
2026-02-23 23:17:53 -08:00
BadrBasowid
6af03f2394
[Refactor] [1/N] Reorganize kernel abstraction directory ( #34055 )
...
Signed-off-by: BadrBasowid <badr.basowid@gmail.com >
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com >
Co-authored-by: TJian <tunjian.tan@embeddedllm.com >
2026-02-24 06:47:22 +00:00
Vlad Tiberiu Mihailescu
1a6cf39dec
[CI/Build] Remove redundant OpenTelemetry pip install from CI configs ( #35032 )
...
Signed-off-by: Vlad Mihailescu <vtmihailescu@gmail.com >
2026-02-23 22:24:11 -08:00
Nicolò Lucchesi
f91808ae0d
[MM] Allow audio chunking for offline LLM ( #34628 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-02-23 21:04:28 -08:00
Vadim Gimpelson
33a0d43c71
[BUGFIX][Qwen3.5] Hardcode mlp.gate as not quantizable ( #35156 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2026-02-23 19:42:24 -08:00
pschlan-amd
80d93fd6da
gpu_model_runner: Cache is_encoder_decoder from model config ( #35099 )
...
Signed-off-by: Patrick Schlangen <pschlan@amd.com >
2026-02-23 19:08:34 -08:00
Jia Guo
ec85340531
[Quantization] Support FP8 MoE bias for models like GPT-OSS ( #34906 )
...
Signed-off-by: jasperjiaguo <jasperg662@gmail.com >
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com >
2026-02-23 19:07:47 -08:00
Rohan Potdar
2ff4e51152
[ROCm] AITER fused RoPE+KVCache ( #33443 )
...
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com >
Signed-off-by: charlifu <charlifu@amd.com >
Signed-off-by: Rohan Potdar <66227218+Rohan138@users.noreply.github.com >
Co-authored-by: charlifu <charlifu@amd.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Douglas Lehr <91553416+dllehr-amd@users.noreply.github.com >
2026-02-23 19:06:00 -08:00
Asaf Gardin
95642441d0
[Mamba1] - Change supports_update_block_table to True ( #35054 )
...
Signed-off-by: Josephasafg <ajgard7@gmail.com >
2026-02-23 19:05:57 -08:00
Xin Yang
a7c9f7b7ec
[Bugfix] Fix lora_ids in FusedMoE LoRA test ( #35135 )
...
Signed-off-by: Xin Yang <xyangx@amazon.com >
2026-02-23 21:49:25 -05:00
Michael Goin
a4bd661fb3
[Perf] Enable FlashInfer DeepGEMM swapAB on SM90 by default ( #34924 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-02-23 17:34:41 -08:00
Michael Goin
3ef9fd0f98
[Bugfix] Fix DSV3 kernels breaking _C and _moe_C on unsupported arches ( #35123 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-02-23 17:11:27 -08:00
Michael Goin
22a97e6613
[Perf] Improve default triton fused moe configs ( #34846 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-02-23 16:01:28 -08:00
Aaron Hao
596ed1f02e
[RL] Validation for pause_mode='keep' ( #34992 )
...
Signed-off-by: ahao-anyscale <ahao@anyscale.com >
2026-02-23 16:30:56 -05:00
Nicolò Lucchesi
b8d8b7e934
[Misc] Monitor interface changes ( #35113 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-02-23 17:14:51 +00:00
Harry Mellor
28c5e69ba0
Enforce that model is the first positional arg when --served-model-name is used ( #34973 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-23 08:38:05 -08:00
Harry Mellor
864167d376
Fix custom processors that use deleted import for Transformers v5 ( #35101 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-23 08:38:00 -08:00
haosdent
a2ba6a5244
[Bugfix] Fix prefix caching for Mamba 'all' mode (Nemotron models) ( #34874 )
...
Signed-off-by: haosdent <haosdent@gmail.com >
2026-02-23 17:31:51 +01:00
Harry Mellor
c4f38696f7
Use Xet high performance mode for Transformers v5 ( #35098 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-23 08:19:30 -08:00
haosdent
a7f341c323
[Bugfix] Fix MRotaryEmbedding missing truncate attr with YaRN scaling ( #35080 )
...
Signed-off-by: haosdent <haosdent@gmail.com >
2026-02-23 16:05:52 +00:00
Robert Shaw
d13ece38d7
[CI] Skip Responses API ( #34990 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2026-02-23 07:46:45 -08:00
Mark McLoughlin
5cc7c4452e
[Metrics] Add Prometheus counters for Model FLOPs Utilization (MFU) ( #30950 )
...
Export the existing Model FLOPs Utilization (MFU) metrics via Prometheus.
`--enable-mfu-metrics` is required for these to be exposed.
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2026-02-23 15:01:07 +00:00
Eldar Kurtić
b95bb6927f
[kv-cache, ct] Use compressed-tensors as a source of ground-truth for quant strategies ( #34254 )
...
Signed-off-by: Your Name <you@example.com >
Co-authored-by: Your Name <you@example.com >
2026-02-23 07:37:55 -07:00
Cyrus Leung
392645454b
[Refactor] Decouple TimingContext from InputProcessingContext ( #35083 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-23 14:15:50 +00:00
Eldar Kurtić
1e8438a89a
[Llama4,CI] Bring back Llama-4 bug fixes, and also fix Maverick tests ( #35033 )
...
Signed-off-by: Eldar Kurtic <you@example.com >
Co-authored-by: Eldar Kurtic <you@example.com >
2026-02-23 09:04:34 -05:00
Robert Shaw
8435b2e049
[ModelBash][DSV3] Add TRTLLM DSV3 Router GEMM kernel (6% B1 Speedup) ( #34302 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2026-02-23 14:02:26 +00:00
Yan Ma
b1b5e045df
[XPU] allow TORCH_SDPA/TRITON_ATTN as XPU vit Backend ( #35010 )
...
Signed-off-by: Yan Ma <yan.ma@intel.com >
2026-02-23 05:06:44 -08:00
Andreas Karatzas
5f68464f92
[ROCm][CI] Fix spec decode profile assertion and logprob test determinism ( #35043 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-23 05:05:54 -08:00
Vincent Gimenes
aa08a30fc9
[CLEANING] Remove unused disable_by_batch_size from SpeculativeConfig ( #35060 )
...
Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com >
2026-02-23 05:05:36 -08:00
Wentao Ye
7f40e9e516
[Refactor] Remove dead private func _fp8_perm and _extract_mask_for_item ( #35068 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-02-23 05:05:20 -08:00
Harry Mellor
103e614b14
Fix pipeline parallel with embed scaling in the Transformers modelling backend ( #35094 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-23 05:04:47 -08:00
Neil Schemenauer
54e2f83d0a
[Feature] Lazy import for the "mistral" tokenizer module. ( #34651 )
...
Signed-off-by: Neil Schemenauer <nas@arctrix.com >
2026-02-23 00:43:01 -08:00
Gabe Goodhart
e631f8e78e
fix: Apply embedding_multiplier to inputs_embeds ( #34813 )
...
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-23 00:42:46 -08:00
Martin Hickey
e97c46a92d
[BugFix]: Fix local mypy issues ( #34739 )
...
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-23 00:40:29 -08:00
Jee Jee Li
7291d1b288
[Bugfix] Fix kernel benchmark ( #33752 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2026-02-22 21:18:08 -08:00
Cyrus Leung
987506bca6
[Refactor] Simplify dummy data generation ( #35025 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-22 20:55:27 -08:00
Woosuk Kwon
c645e9a214
[Model Runner V2] Remove propose_draft method ( #35070 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-22 18:27:12 -08:00
Nick Hill
944ffb5968
[Model Runner V2][Minor] Remove redundant do_spec_decode field ( #35039 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
Co-authored-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-22 16:18:04 -08:00
qizixi
2bcf71b9c0
[Spec Decode] Reduce TP communication for speculative decoding draft token generation ( #34049 )
...
Signed-off-by: qizixi <qizixi@meta.com >
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com >
2026-02-22 14:59:16 -08:00
tacos8me
b7892a3bef
[Model] Add NVFP4 quantization support for Step3.5-Flash ( #34478 )
...
Signed-off-by: tacos8me <ian@cloudhabit.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-02-22 12:30:46 -07:00
Benjamin Chislett
682566b18e
[Bug] Refactor max_num_batched_tokens to account for drafting ( #34898 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
2026-02-22 11:18:46 -05:00
qizixi
b9c2a565cc
[Spec Decode] Defer clearing KV connector metadata for EAGLE3 speculative decode + prefill / decode disagg setup ( #34529 )
...
Signed-off-by: qizixi <qizixi@meta.com >
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com >
2026-02-22 08:08:32 -08:00
Andreas Karatzas
dd8c3a7fb2
[ROCm][CI] Fix realtime test timeouts caused by aiter JIT compilation delays ( #35052 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-22 10:07:18 +00:00
Andreas Karatzas
a8a47c17b6
[ROCm][CI] Fix flaky embedding chat test by using tolerance-based comparison ( #35050 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-22 09:03:44 +00:00
Roger Wang
40f88d8318
[Bugfix] Fix Qwen3/Qwen3.5 Reasoning Parser ( #34779 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2026-02-21 23:15:35 -08:00
Woosuk Kwon
2cbf9656ce
[Model Runner V2] Enable CUDA graph for Eagle3 ( #35040 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-21 21:42:50 -08:00
Xiao Li
30132cd144
Fix apply_top_k_top_p_triton called by non-cuda logits Tensor ( #35030 )
...
Signed-off-by: Xiao Li <ilx@meta.com >
2026-02-21 21:11:54 -08:00
Cyrus Leung
cbd95a2dd1
[Benchmark] Use sns.relplot for plotting ( #35027 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-21 20:26:48 -08:00
Athrael Soju
970861ac0c
[New Model] Add ColModernVBERT ( #34558 )
...
Signed-off-by: Athrael Soju <athrael.soju@gmail.com >
Signed-off-by: athrael-soju <athrael-soju@users.noreply.github.com >
2026-02-22 12:23:41 +08:00
Wentao Ye
d24bdd7c4b
[CI] Bump mteb version to mteb[bm25s]>=2, <3 for pooling model unit tests ( #34961 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-02-21 20:23:24 -08:00
Andreas Karatzas
d403c1da1c
[CI] Stabilizing ROCm amd-ci signal and minor name fix in upstream ( #35008 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-22 04:01:10 +00:00
Woosuk Kwon
b71fbd06e2
[Model Runner V2] Support attention group ( #35036 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-21 16:42:53 -08:00
Vadim Gimpelson
74d90b1ce4
[Model Bash][DSR1] Add selective dynamic shape marking for CustomOp ( #34900 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2026-02-21 19:28:01 -05:00
Woosuk Kwon
a4047d4ea9
[Model Runner V2] Support Eagle3 (no CUDA graph) ( #35029 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-21 12:55:24 -08:00
Cyrus Leung
965fe45935
[CI/Build] Fix gRPC version mismatch ( #35013 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-21 12:14:41 -07:00
Roman
98b0205c3c
[Frontend] Add automatic language detection for Whisper transcription ( #34342 )
...
Signed-off-by: space_check <roman.vuskov@rwth-aachen.de >
Signed-off-by: Roman <45857014+spacecheck@users.noreply.github.com >
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2026-02-21 04:49:41 -08:00
Huy Do
272b535ab3
[Bugfix] Gate 256-bit instructions to CUDA 12.9+ ( #34791 )
...
Signed-off-by: Huy Do <huydhn@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-21 04:48:14 -08:00
Cyrus Leung
f74f1572ca
[Benchmark] Improve benchmarks ( #35012 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-21 10:31:58 +00:00
petrpechman
bebfe55b1c
[Doc] Fix example of eagle3 ( #34960 )
...
Signed-off-by: Petr Pechman <petr.pechman@firma.seznam.cz >
Co-authored-by: Petr Pechman <petr.pechman@firma.seznam.cz >
2026-02-21 09:57:53 +00:00
Nick Hill
820d7815eb
[Core] Minor structured-output related scheduler optimization ( #34765 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-21 01:38:28 -08:00
Nicolò Lucchesi
ab6f3487a6
[PD] Change kv_load_failure_policy Default from "recompute" to "fail" ( #34896 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-21 01:34:57 -08:00
BADAOUI Abdennacer
8dc8a99b56
[ROCm] Enable bitsandbytes quantization support on ROCm ( #34688 )
...
Signed-off-by: badaoui <abdennacerbadaoui0@gmail.com >
2026-02-21 00:34:55 -08:00
jennyyyyzhen
2aab2bb543
[ROCM] Optimize ROCM_AITER_FA spec decode eagle performance ( #34541 )
...
Signed-off-by: jennyyyyzhen <yzhen@hmc.edu >
2026-02-20 20:32:05 -08:00
Andreas Karatzas
54254f7a61
[ROCm][CI] Fix spec decode logprobs flakiness and parametrize tree attention backends ( #34599 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-20 20:25:23 -08:00
Andreas Karatzas
cf93c1a128
[ROCm][AITER] Fix aiter paged_attention_v1 decode for sliding window and head_size < 64 ( #34570 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-20 20:25:07 -08:00
Andreas Karatzas
89358f0d35
[CI] Fix ColBERT HF comparison tests on AMD CI + refactor ( #34567 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-20 20:12:05 -08:00
zhongdaor-nv
a0fe7ea2f0
[feat] Add per-block extra_keys to KV events ( #33304 )
...
Signed-off-by: zhongdaor-nv <zhongdaor@nvidia.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-20 20:11:40 -08:00
Andreas Karatzas
991d6bff38
[CI][MCP][Harmony] Heavy refactoring Harmony & MCP response tests and stabilizing with deterministic test infrastructure ( #33949 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-20 20:03:32 -08:00
Kata Coder
5719a4e4e6
[Frontend] Support multimodal inputs for late-interaction scoring (ColQwen3) + NewModel: nvidia/nemotron-colembed ( #34574 )
...
Signed-off-by: craftsangjae <craftsangjae@gmail.com >
2026-02-20 20:01:40 -08:00
pougetat
11be2c74dc
[Realtime] Add Qwen3-ASR realtime streaming support ( #34613 )
...
Signed-off-by: Thomas Pouget-Abadie <thomaspou@microsoft.com >
Co-authored-by: Thomas Pouget-Abadie <thomaspou@microsoft.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2026-02-20 19:59:42 -08:00
Xin Yang
7a5adad480
[Kernel] Optimize sample_recovered_tokens_kernel ( #34974 )
...
Signed-off-by: Xin Yang <xyangx@amazon.com >
2026-02-20 19:59:06 -08:00
Li
59c6233297
Support prompt_embeds for pooling requests in output processor ( #34904 )
...
Signed-off-by: Li Zhang <lzhanga@amazon.com >
Co-authored-by: Li Zhang <lzhanga@amazon.com >
2026-02-20 19:57:38 -08:00
Taneem Ibrahim
d38cd3dde5
[Misc] Fix mypy errors in vllm/profiler and remove from exclude list ( #34959 )
...
Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com >
2026-02-20 19:56:33 -08:00
Rohan Potdar
ded333fb9b
[ROCm][Bugfix]: Only save unpadded sizes for shared_experts in MoERunner to fix rmsnorm pad fusion ( #34636 )
...
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com >
2026-02-20 19:56:16 -08:00
Yanan Cao
9d7577b2bd
[Kernel] [Helion] [9/N] Canonicalize GPU variant names to base model names ( #34928 )
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
2026-02-20 19:55:51 -08:00
Vlad Tiberiu Mihailescu
e739c29ea4
[CI/Build] Add opentelemetry libs in default vllm build (requirements/common.txt) ( #34466 )
...
Signed-off-by: Vlad Mihailescu <vtmihailescu@gmail.com >
2026-02-20 19:54:55 -08:00
yugong333
a55caf6ae9
[LoRA] Support Quantized Adapters ( #30286 )
...
Signed-off-by: Yu Gong <yu3.gong@gmail.com >
Signed-off-by: wz1qqx <ziqi.wang@novita.ai >
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: wz1qqx <55830058+wz1qqx@users.noreply.github.com >
Co-authored-by: wz1qqx <ziqi.wang@novita.ai >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-20 19:54:35 -08:00
Lucas Wilkinson
0e22cd618b
Revert "[Llama4,Quantization] Simplify and generalize logic for Q/K permutations in quantized self-attn layers" ( #34997 )
2026-02-20 17:19:19 -08:00
Wei Zhao
ea5f903f80
Bump Flashinfer Version and Re-enable DeepSeek NVFP4 AR+Norm Fusion ( #34899 )
...
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-20 13:37:31 -08:00
Ryan Rock
0632ed8778
[AMD][CI] Fix test_custom_allreduce for A100 testgroup ( #34735 )
...
Signed-off-by: Ryan Rock <ryan.rock@amd.com >
2026-02-20 21:33:04 +00:00
Lucas Wilkinson
aaefc58ee0
[CI] Revert PRs 34818 and 33600 ( #34979 )
2026-02-20 13:25:50 -08:00
Wei Zhao
f24b2de3d3
[Test] Add FP8 KV Cache Testing for MLA Backends ( #34473 )
...
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com >
2026-02-20 18:51:58 +00:00
Michael Goin
fac1507f03
[CI] Remove failing prime-rl integration test ( #34843 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
2026-02-20 10:17:42 -08:00
Zhengxu Chen
f863994084
[compile] Fix torch.compile time discrepancy in logging. ( #34912 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-20 08:47:14 -08:00
Zhengxu Chen
e4a5d8c653
[compile] Move torch_aot_compile directory under torch_compile_cache ( #34831 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com >
2026-02-20 08:46:45 -08:00
Yanan Cao
a6d0299c75
[Kernel] [Helion] [6/N] Add num_tokens dimension to silu_mul autotuning and dispatching ( #34185 )
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com >
2026-02-20 08:36:51 -08:00
Harry Mellor
6ce80f7071
Ensure that MkDocs v2 does not get installed ( #34958 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-20 15:38:11 +00:00
Huamin Li
1fe462168c
[perf] Avoid dtype promotion sync in mamba_get_block_table_tensor ( #34870 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-20 06:21:56 -08:00
Flora Feng
ed31a020ee
[Refactor] Extract Harmony streaming SSE event builders into streaming_events.py ( #34909 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-20 06:20:46 -08:00
Cyrus Leung
f9ac19204f
[V0 Deprecation] Remove unused MM placeholders in request output ( #34944 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-20 06:19:23 -08:00
Vadim Gimpelson
59965affbd
[BUGFIX] Fix _dummy_run missing prepare_inputs_event synchronization ( #34866 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2026-02-20 05:54:27 -08:00
Xin Yang
b1c4f0b265
[Kernel] Optimize grouped topk kernel ( #34206 )
...
Signed-off-by: Xin Yang <xyangx@amazon.com >
2026-02-20 01:34:45 -08:00
Kevin McKay
8de7c636cc
[Bugfix][Hardware][AMD] Fix ROCM_AITER_FA speculative decoding support ( #32877 )
...
Signed-off-by: c0de128 <kevin.mckay@outlook.com >
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com >
2026-02-19 22:25:46 -08:00
Frank Wang
059779231f
[Minor] Add logging when using MXFP4 MXFP8 TRTLLM backend ( #34916 )
...
Signed-off-by: frankwang28 <frank.wbb@hotmail.com >
Signed-off-by: Frank Wang <41319051+frankwang28@users.noreply.github.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2026-02-19 22:07:57 -08:00
tianshu-Michael-yu
ea37530b47
[Models] LFM2: Support LoRA ( #34921 )
...
Co-authored-by: Piotr Mazurek <piotr635@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-19 22:07:23 -08:00
Micah Williamson
f5432e35a3
[ROCm][CI] Loosen RemoteOpenAIServer Startup Timeout ( #34922 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2026-02-20 05:37:49 +00:00
杨朱 · Kiki
07cab212f0
[Misc] Add deprecated environment variable utilities ( #33677 )
...
Signed-off-by: carlory <baofa.fan@daocloud.io >
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-19 21:33:25 -08:00
rasmith
0c1dc42748
[CI][AMD][BugFix][P/D] Add default_vllm_config to test_moriio_connector.py so tests pass ( #33739 )
...
Signed-off-by: Randall Smith <Randall.Smith@amd.com >
2026-02-19 21:32:40 -08:00
Varun Chawla
676f82ae81
Add validation to reject non-text content in system messages ( #34072 )
...
Signed-off-by: Varun Chawla <varun_6april@hotmail.com >
2026-02-19 21:30:33 -08:00
Elizabeth Thomas
81bfc21a6a
[Model Bash]: Improve FP8 Oracle for Config Specific Kernel Selection ( #34260 )
...
Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com >
Signed-off-by: Robert Shaw <robertgshaw2-redhat@h100-02.nemg-001.lab.rdu2.dc.redhat.com >
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com >
Co-authored-by: Robert Shaw <robertgshaw2-redhat@h100-02.nemg-001.lab.rdu2.dc.redhat.com >
Co-authored-by: Robert Shaw <robertgshaw2@gmail.com >
2026-02-19 21:29:08 -08:00
Matthias Gehre
4e2c7caf2d
[Bugfix] Add regression test for MoE quant_config under torch.compile ( #34335 )
...
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com >
2026-02-20 13:27:26 +08:00
Bowen Bao
d9e62c03eb
[Quark] Fix MoE fp8 activation scale handling on mi300 ( #34386 )
...
Signed-off-by: Bowen Bao <bowenbao@amd.com >
2026-02-19 21:27:14 -08:00
Kevin H. Luu
a1a2d79442
[ci] Use the right tag for CPU arm64 image ( #34915 )
...
Signed-off-by: Kevin H. Luu <khluu000@gmail.com >
2026-02-19 19:59:15 -08:00
Cyrus Leung
ac900c89bb
[Refactor] Implement output type check in LLM ( #34794 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-19 19:57:55 -08:00
Mark McLoughlin
76df6072ff
[Core] Fix state names in pause_scheduler() ( #34840 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2026-02-19 17:21:46 -08:00
Michael Goin
16f24e8797
[CI] Add GPT-OSS Eval job for H100 ( #34359 )
...
Signed-off-by: Michael Goin <mgoin64@gmail.com >
2026-02-19 17:14:54 -08:00
Nick Hill
40b2f1c3d9
[Model Runner V2] Minor CPU optimizations ( #34856 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-02-19 16:05:37 -08:00
Mayank Ketkar
648951a9c3
[Bugfix] Fix benchmark_fused_collective crash on CustomOp init ( #34665 )
...
Signed-off-by: Mayank Ketkar <mketkar@zoox.com >
Signed-off-by: Mayank Ketkar <mayket04@gmail.com >
Co-authored-by: Mayank Ketkar <mketkar@zoox.com >
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com >
2026-02-19 19:01:00 -05:00
Michael Goin
f72061a19a
[UX] More descriptive reasons in is_supported_config for MoE ( #34908 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-02-19 15:20:52 -08:00
Matthew Bonanni
662205d34e
[Bugfix] Fix Basic Models Test ( #34818 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-02-19 14:49:07 -08:00
Roger Wang
4fb8beefaa
[Bugfix] Fix cutlass fp8 kernel on hopper for Qwen3.5 ( #34914 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2026-02-19 13:34:55 -08:00
Alexei-V-Ivanov-AMD
304319c4ed
Change targets for AMD build in the "CI" pipeline ( #34918 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
2026-02-19 21:26:53 +00:00
Wentao Ye
c683d11c94
[Refactor] Deprecate head_first for chunk_gated_delta_rule ( #34263 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-02-19 13:23:49 -05:00
roikoren755
3eff45d793
Revert "[NemotronH] Do not force router to run in fp32 ( #34582 )" ( #34808 )
...
Signed-off-by: Roi Koren <roik@nvidia.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-02-19 09:47:05 -08:00
Robert Shaw
4685a630a2
[Model Bash][DeepSeekR1] Remove Shared Expert Clone ( #34344 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2026-02-19 07:56:14 -08:00
Eldar Kurtić
ee1d25f199
[Llama4,Quantization] Simplify and generalize logic for Q/K permutations in quantized self-attn layers ( #34471 )
...
Signed-off-by: Your Name <you@example.com >
Co-authored-by: Your Name <you@example.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-02-19 07:55:41 -08:00
Linda
6fff24f30f
[Bugfix] Qwen3.5 kv-scale weight remapping ( #34719 )
...
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com >
2026-02-19 04:13:37 -08:00
Cyrus Leung
23210a911e
[CI/Build] Try to make beam search test less flaky ( #34885 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-19 19:16:58 +08:00
Cyrus Leung
1391378861
[Bugfix] Fix edge case in UUID data parsing ( #34884 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-19 02:24:30 -08:00
Andreas Karatzas
f6220f9877
[ROCm][Test] Fix beam search determinism failures from batch-size-dependent FP divergence and removed wrong marker ( #34878 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-19 08:25:26 +00:00
Andreas Karatzas
2df2bb27b0
[ROCm][CI] Removing all blocking labels from MI355 until stable infra ( #34879 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-19 07:53:08 +00:00
Tal Nir
f75b61a9e9
[Voxtral Realtime] Fix engine crash on empty multimodal embeddings ( #34862 )
...
Signed-off-by: Tal Nir <tal@nervexneurotech.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
2026-02-18 23:21:47 -08:00
Wei Zhao
7f51e93864
[Bug] Fix DeepSeek V3 weight loading caused by incorrect prefix ( #34876 )
...
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com >
2026-02-18 23:20:30 -08:00
Alex Brooks
4611af1663
[Bugfix] Add Quant Config to Llava Next Projector ( #34847 )
...
Signed-off-by: Alex Brooks <albrooks@redhat.com >
2026-02-18 23:18:23 -08:00
Manrique Vargas
ad5aa6bd9f
fix(docs): fix typos in comments and docstrings ( #34836 )
...
Signed-off-by: machov <mv1742@nyu.edu >
2026-02-18 23:17:41 -08:00
Jaeyeon Kim(김재연)
9681068cf9
[Frontend] Fix reasoning_tokens for text-based parsers in Responses API ( #33513 )
...
Signed-off-by: Jaeyeon Kim <anencore94@gmail.com >
2026-02-18 23:16:41 -08:00
Kevin H. Luu
b6101d384d
Deprecate test-pipeline.yaml ( #34864 )
...
Signed-off-by: Kevin H. Luu <khluu000@gmail.com >
2026-02-19 02:15:27 +00:00
Woosuk Kwon
5fcb0cdd68
[Model Runner V2] Use FP32 for Gumbel Noise ( #34854 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-18 17:07:37 -08:00
Woosuk Kwon
c878b43b64
[Model Runner V2] Remove unnecessary copies in PW CUDA graph capture ( #34849 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-18 15:52:50 -08:00
rasmith
2b84ac669c
[CI][AMD][BugFix] Use torch.testing.assert_close instead of assert torch.allclose in test_rocm_skinny_gemms.py ( #34181 )
...
Signed-off-by: Randall Smith <Randall.Smith@amd.com >
2026-02-18 23:10:19 +00:00
zhrrr
11d3976b88
[Model Runner V2] support piecewise & mixed cudagraph ( #32771 )
...
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com >
2026-02-18 15:03:17 -08:00
Yongye Zhu
40da9625a1
[MoE Refactor] Convert mxfp4 marlin into modular kernel format ( #34588 )
...
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-02-18 14:37:14 -08:00
Flora Feng
8d9babd4de
Fix empty tool_call_id in Anthropic messages API tool result conversion ( #34745 )
...
Signed-off-by: <>
Signed-off-by: sfeng33 <4florafeng@gmail.com >
Co-authored-by: Flora Feng <sfeng33@h100-01.nemg-001.lab.rdu2.dc.redhat.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2026-02-18 14:31:59 -08:00
Aaron Hao
e99ba957ec
[BUG] Fixing Weight Sync unit test ( #34841 )
...
Signed-off-by: ahao-anyscale <ahao@anyscale.com >
2026-02-18 17:20:10 -05:00
Kyle Sayers
64ac1395e8
[Docs] Clean up speculators docs ( #34065 )
...
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com >
2026-02-18 13:48:11 -08:00
Cyrus Leung
61cf087680
[Bugfix] Fix lora tests ( #34834 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-02-18 13:22:31 -08:00
Wenlong Wang
847a57cd12
[Bugfix][MoE Kernel] Fix incorrect routing selection for models without expert groups (e.g., MiniMax-M2.1) ( #34673 )
...
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com >
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-02-18 13:03:24 -08:00
rasmith
fcd6ac97ed
[CI][AMD][BugFix] Skip tests in test_unquantized_backend_selection that should not run on ROCm ( #34655 )
...
Signed-off-by: Randall Smith <Randall.Smith@amd.com >
2026-02-18 15:00:40 -05:00
Woosuk Kwon
95be2a7f22
[Model Runner V2] Minor simplification for DCP ( #34786 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-18 11:04:53 -08:00
Jaden Mathias
0e60c925cf
[Bugfix] Remove assert causing hipErrorStreamCaptureUnsupported ( #34455 )
...
Signed-off-by: Jaden Mathias <jaden.mathias@amd.com >
2026-02-18 18:54:54 +00:00
Teng Ma
d7ff22204a
[Misc] Add mooncake-transfer-engine to kv_connectors requirements ( #34826 )
...
Signed-off-by: Teng Ma <teng-ma@linux.alibaba.com >
2026-02-18 18:26:24 +00:00
Isotr0py
c0bd8b13da
[Bugfix] Redo Qwen3.5/Qwen3-Next GDN projector fusion ( #34697 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com >
2026-02-18 09:46:53 -08:00
Michael Goin
caeb887bf6
[Bugfix] Fix NVFP4 TRTLLM MoE non-gated support; add gsm8k for Nemotron-3-Nano FP8+NVFP4 ( #34725 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-02-18 09:39:22 -08:00
Ilya Markov
6b3166a7c7
[CI][Bugfix] Fix multinode test script ( #34820 )
...
Signed-off-by: ilmarkov <markovilya197@gmail.com >
2026-02-18 11:45:10 -05:00
Robert Shaw
25e2e136ef
[CI] temporarily disable multi-node tests ( #34825 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2026-02-18 11:32:44 -05:00
Robert Shaw
6874638bc4
[Model Bash] DeepSeek R1 BF16 Min Latency QKV A GEMM (0.5% E2E Speedup) ( #34758 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2026-02-18 07:42:36 -08:00
Burkhard Ringlein
e24663c5a9
Add unit tests for fp8 output fusion of triton_attn ( #34228 )
...
Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2026-02-18 06:22:49 -05:00
Nick Hill
c50e105a88
[Model Runner V2] Avoid prepare prefill kernel launch overhead ( #34780 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-02-18 00:49:21 -08:00
Cyrus Leung
a766b30349
[Renderer] Deprecate code paths for old input processing ( #34775 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-18 00:35:04 -08:00
Asaf Joseph Gardin
1faa8cb73c
[Quantization] - Added uses_meta_device_weights to quant config ( #34645 )
...
Signed-off-by: Josephasafg <ajgard7@gmail.com >
2026-02-17 23:43:44 -08:00
Marek Michalowski
e89a91d927
[Bugfix] fix activation in cpu_fused_moe_torch call ( #34696 )
...
Signed-off-by: Marek Michalowski <marek.michalowski@arm.com >
2026-02-17 23:39:46 -08:00
Michael Goin
909b147197
[Bugfix] Fix prefix creation for Qwen3.5 ( #34723 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-02-17 23:39:15 -08:00
ElizaWszola
a88b3be7c4
[Bugfix] Fix quant RMS norm fusion for quantization with TMA-aligned scales ( #33255 )
...
Signed-off-by: ElizaWszola <ewszola@redhat.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2026-02-17 23:35:04 -08:00
Nick Hill
a49ea5a58f
[Model Runner V2] A bit more PP simplification ( #34766 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-02-17 21:39:07 -08:00
Cyrus Leung
30ebe0dc3c
[CI/Build] Remove use of skip_v1 ( #34699 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-18 12:19:11 +08:00
Andreas Karatzas
cef65f0715
[ROCm][CI] Removed hard-coded attn backend requirement for Qwen VL ( #34753 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-18 03:59:53 +00:00
Russell Bryant
6f3b2047ab
[Core] Fix SSRF bypass via backslash-@ URL parsing inconsistency ( #34743 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: isotr0py <2037008807@qq.com >
2026-02-18 03:53:35 +00:00
Luka Govedič
02e8f26cea
[torch.compile] Turn on silu+fp4 quant fusion by default for O1+ ( #34718 )
...
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
2026-02-18 03:29:15 +00:00
Hongxia Yang
4a00a511bb
[BugFix] [Build] fix string literals comparison in indexer_k_quant_and_cache calling site ( #34653 )
...
Signed-off-by: Hongxia Yang <hongxiay.yang@amd.com >
Co-authored-by: Hongxia Yang <hongxiay.yang@amd.com >
2026-02-17 19:19:41 -08:00
Cyrus Leung
a0d8d944e2
[Renderer] Move MM Hash parsing into Renderer ( #34711 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-17 19:18:55 -08:00
Amr Mahdi
df3f537a66
[CI] Remove unused precompiled wheel args from image build ( #34767 )
...
Signed-off-by: Amr Mahdi <amrmahdi@meta.com >
2026-02-17 18:58:18 -08:00
Matthew Bonanni
7743152957
[Attention] Refactor check_and_update_config ( #33600 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-02-17 17:06:54 -08:00
Wentao Ye
ab33d2a629
[Feature] Decode Context Parallel support for GPU model runner v2 ( #34179 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-02-17 16:27:15 -08:00
Woosuk Kwon
be3af2d29e
[Model Runner V2] Further simplification for PP ( #34724 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-17 15:18:18 -08:00
Jongseok Park
c656ba3b4d
[Kernel] Triton-based Top-k and Top-p sampler kernels ( #33538 )
...
Signed-off-by: js_park <cakeng@naver.com >
Signed-off-by: Jongseok Park <37990712+cakeng@users.noreply.github.com >
Signed-off-by: Sunga Kim <sunga.kim@berkeley.edu >
Signed-off-by: Nick Hill <nickhill123@gmail.com >
Co-authored-by: Sunga Kim <sunga.kim@berkeley.edu >
Co-authored-by: Nick Hill <nickhill123@gmail.com >
2026-02-17 23:14:30 +00:00
Matthew Bonanni
dc5fa77a4e
[Bugfix][MTP][Sparse MLA] Allow sparse MLA with MTP to run with FULL cudagraphs ( #34457 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2026-02-17 14:01:27 -05:00
Flora Feng
1e4a084c8e
[CI] Fix flaky test_parsable_context ( #34717 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com >
2026-02-17 18:42:52 +00:00
Richard Zou
7967e854da
[BugFix] Fix sp tests ( #34716 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2026-02-17 17:07:56 +00:00
almayne
6bd6d0c3c1
Fixed whisper CPU test that does not spawn properly. ( #34324 )
...
Signed-off-by: Anna Mayne <anna.mayne@arm.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-17 06:46:23 -08:00
Nicolò Lucchesi
8e962fef5f
[CI][Nixl] Add CrossLayer KV layout tests ( #34615 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-02-17 21:35:40 +08:00
Cyrus Leung
574fe75245
[Renderer] Move InputPreprocessor into Renderer (2/2) ( #34560 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-17 05:29:01 -08:00
junuxyz
c61a98f529
[CI][BugFix] ShellCheck cleanup to remove baseline and preserve runtime behavior ( #34514 )
...
Signed-off-by: junuxyz <216036880+junuxyz@users.noreply.github.com >
2026-02-17 12:22:56 +00:00
Harry Mellor
28bffe9466
Fix docs build warning ( #34686 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-17 02:31:40 -08:00
ChenqianCao
ad65177a19
[Bugfix] Fix 'remove_instance_endpoint' method logic in disagg_proxy_demo ( #32922 )
...
Signed-off-by: ChenqianCao <39755070+ChenqianCao@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-17 10:06:53 +00:00
Tim Dettmers
d44a5b6c47
Remove dead bitsandbytes CxB code from 8-bit inference path ( #34633 )
...
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
2026-02-17 01:49:14 -08:00
Jiangyun Zhu
1d65283e95
Revert "[Models] Fuse Qwen3.5 GDN's qkvz_proj and ba_proj" ( #34683 )
2026-02-17 01:29:27 -08:00
kourosh hakhamaneshi
c464b57374
[Ray] Propagate third-party env vars to Ray workers via prefix matching ( #34383 )
...
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com >
Co-authored-by: Cursor <cursoragent@cursor.com >
2026-02-17 01:08:42 -08:00
Amr Mahdi
c5c38e152a
[CI] Fix bake config artifact path for AMI rebuild pipeline ( #34656 )
...
Signed-off-by: Amr Mahdi <amrmahdi@meta.com >
2026-02-17 06:39:44 +00:00
Woosuk Kwon
d00df624f3
[Model Runner V2] Minor refactoring for penalties ( #34662 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-16 21:43:00 -08:00
Woosuk Kwon
9752da9d9c
[Model Runner V2] Minor simplification for BadWordsState ( #34669 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-16 21:27:24 -08:00
Woosuk Kwon
04925b2202
[Model Runner V2] Minor cleanup for PP ( #34666 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-16 19:15:31 -08:00
Woosuk Kwon
d74278fb67
[Model Runner V2] Fix unintended CPU-GPU sync in make_dummy ( #34667 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-16 19:00:29 -08:00
haosdent
b68fd899d1
[Bugfix] Fix fused MoE int32 overflow in stride*offset without perf regression ( #34507 )
...
Signed-off-by: haosdent <haosdent@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-02-16 17:58:49 -08:00
Aneesh Puttur
0b5f9b7204
[CI] Enable mypy import following for vllm/v1/kv_offload ( #34639 )
...
Signed-off-by: Aneesh Puttur <aneeshputtur@gmail.com >
2026-02-17 09:58:15 +08:00
zhanqiuhu
9a8853f781
[Core] Pipeline Parallel support for Model Runner V2 ( #33960 )
...
Signed-off-by: Zhanqiu Hu <zh338@cornell.edu >
2026-02-16 17:48:16 -08:00
zhrrr
387a1898d9
[Model Runner V2] support bad_words sampling param ( #33433 )
...
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com >
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
Co-authored-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-16 16:36:06 -08:00
roikoren755
3b30e61507
[NemotronH] Do not force router to run in fp32 ( #34582 )
...
Signed-off-by: Roi Koren <roik@nvidia.com >
2026-02-16 10:15:32 -08:00
Alexei-V-Ivanov-AMD
824f9e8f3c
Targeting the MI355 agent pool with all existing tests ( #34629 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
2026-02-16 17:02:27 +00:00
Nicolò Lucchesi
6cc403e67d
[Bugfix][CI] Fix flaky entrypoints/openai/test_response_api_with_harmony.py::test_function_calling[openai/gpt-oss-20b] ( #34624 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-02-16 16:11:07 +00:00
Almog Tavor
72d5951d02
[Bugfix] Treat generation_config max_tokens as default not ceiling ( #34063 )
...
Signed-off-by: almogtavor <almogtavor@gmail.com >
2026-02-16 07:58:24 -08:00
Lucas Kabela
a3205beffb
[CI] Enable mypy coverage for individual excluded files ( #34292 )
...
Signed-off-by: Lucas Kabela <lucaskabela@meta.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-16 07:34:29 -08:00
Christian Pinto
6930becd45
(bugfix): Fixed encode in LLM entrypoint for IOProcessr plugin prompts ( #34618 )
...
Signed-off-by: Christian Pinto <christian.pinto@ibm.com >
2026-02-16 07:33:55 -08:00
Andreas Karatzas
03a8770a6d
[ROCm][CI] Fix plugins test group; updating terratorch and dependencies ( #34589 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-16 07:33:42 -08:00
Yiqi Xue
bc56a1d56e
[Bugfix] Fix ARC touch KeyError for non-ready T1 blocks in kv offload ( #34576 )
...
Signed-off-by: Yiqi Xue <xuey666@gmail.com >
2026-02-16 07:33:19 -08:00
danisereb
ec7d9e6745
Fix call to moe_mk in modelopt MoE modules (required for LoRA) ( #34575 )
...
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com >
2026-02-16 07:33:09 -08:00
Isotr0py
3bb4e4311c
[Models] Fuse Qwen3.5 GDN's qkvz_proj and ba_proj ( #34492 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-16 07:32:51 -08:00
Amr Mahdi
08f8c198ae
[CI] Disable precompiled wheel path in CI image builds ( #34606 )
...
Signed-off-by: Amr Mahdi <amrmahdi@meta.com >
2026-02-16 15:14:43 +00:00
Harry Mellor
a21cedf4ff
Bump lm-eval version for Transformers v5 compatibility ( #33994 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-16 05:24:35 -08:00
emricksini-h
3ef74cde5d
[CI][Tracing] Fix race condition by adding server readiness check ( #34364 )
...
Attempt to resolve #34284: "Metrics Tracing (2GPU)" fails with a
segmentation fault.
Signed-off-by: emricksini-h <emrick.birivoutin@hcompany.ai >
2026-02-16 12:57:39 +00:00
Ekagra Ranjan
cd81cdb399
[Scheduler][ASR] Fix CrossAttn blocks per-request for Variable length encoder inputs ( #31058 )
...
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2026-02-16 11:08:44 +00:00
Andreas Karatzas
1e828573b4
[CI][Metrics] Stabilize tests with polling and subprocess guards ( #34566 )
...
test_abort_metrics_reset is flaky due to hardware-dependent
fixed sleeps: replace fixed sleeps with polling.
test_metrics_exist_run_batch passes even when the engine crashes
on startup (false positive): add subprocess lifecycle guards.
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-16 10:52:02 +00:00
Samu Tamminen
a5ccc85c8c
[Bugfix] Fix Dynamo unexpected keyword argument ( #34320 )
...
Signed-off-by: Samu Tamminen <stammine@amd.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2026-02-16 01:32:30 -08:00
Roger Wang
b5475d0534
Revert "[Misc] fix qwen3.5 config" ( #34610 )
2026-02-16 01:06:05 -08:00
JJJYmmm
9521002f0a
[Misc] fix qwen3.5 config ( #34604 )
2026-02-16 00:25:38 -08:00
Cyrus Leung
ec17bdd894
[Renderer] Move InputPreprocessor into Renderer (1.5/2) ( #34598 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-15 23:46:33 -08:00
Amr Mahdi
bb59c90248
[CI] Write bake config to temp directory instead of repo root ( #34569 )
...
Signed-off-by: Amr Mahdi <amrmahdi@meta.com >
2026-02-15 22:15:47 -08:00
bnellnm
5bff999d12
[Bugfix] Add method to swap quant_method on FusedMoE to fix LoRA issues ( #34453 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
2026-02-15 20:10:50 -08:00
Lucas Wilkinson
bb85929aa6
[BugFix] Fix Python 3.13 FlashMLA import error ( #34548 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2026-02-15 20:09:18 -08:00
Parth Bansal
5653021094
[Doc] Add Mistral-7b-v0.3 model to the batch invariance validated model ( #34584 )
...
Signed-off-by: Parth Bansal <parthbansal127@gmail.com >
2026-02-16 12:09:00 +08:00
Andreas Karatzas
974d829b05
[CI][Frontend] Return 422 instead of 500 for invalid Anthropic tool_choice ( #34590 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-15 20:06:48 -08:00
Isotr0py
91ac5d9bfd
[CI/Build] Enable tests for recent day-0 new models ( #34585 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-15 18:17:04 -08:00
Luka Govedič
23d825aba1
[torch.compile] Disable ar-rms fusion for ds3-fp4 & DP, fix CI test ( #34392 )
...
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-15 06:33:57 -08:00
Maryam Tahhan
f07a128413
[CPU][ARM] Add ARM BF16 cross-compilation support and improve documen… ( #33079 )
...
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com >
Co-authored-by: Li, Jiang <jiang1.li@intel.com >
2026-02-15 06:33:08 -08:00
Isotr0py
71cd89264f
[MM Encoder] Add Triton ViT attention backend ( #32183 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-15 06:32:47 -08:00
Isotr0py
19fab44152
[Doc] Update Encoder-Decoder models support doc with Florence-2 ( #34581 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-15 04:18:57 -08:00
Seiji Eicher
79c7e09235
[KV Connector] Add temporary, off-by-default VLLM_DISABLE_REQUEST_ID_RANDOMIZATION workaround ( #34415 )
...
Signed-off-by: Seiji Eicher <seiji@anyscale.com >
2026-02-14 23:26:10 -08:00
haosdent
79f3fab05a
[Bugfix] Handle num_expert_group=None in flashinfer block-scale FP8 MoE ( #34494 )
...
Signed-off-by: haosdent <haosdent@gmail.com >
2026-02-14 23:25:46 -08:00
Vadim Gimpelson
604b9eaec5
[BUGFIX] Fix accuracy regression for NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4 with TP>1 ( #34476 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2026-02-14 23:25:17 -08:00
Stanislav Kirillov
50dbd6c9e6
[bugfix] Fix critical bug when reporting for all paths where handler.create_error_response is used ( #34516 )
...
Signed-off-by: Stanislav Kirillov <stas@nebius.com >
Co-authored-by: Stanislav Kirillov <stas@nebius.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-14 23:24:25 -08:00
Andreas Karatzas
98bcc6ca59
[CI][Entrypoints] Validate detokenize token IDs to prevent int64 overflow causing 500 ( #34468 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-14 23:08:38 -08:00
Andreas Karatzas
f13e86d8dd
[Kernels] Fix Helion GPU utils to use platform-agnostic device name API ( #34537 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-14 20:29:23 -08:00
Woosuk Kwon
9ca768c740
[Model Runner V2] Minor cleanup for Sampler ( #34563 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-14 18:29:03 -08:00
Thomas Parnell
d5fe3f702c
[Hybrid] Enable mamba prefix cache "align" mode with async scheduling ( #33997 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
2026-02-14 13:15:56 -08:00
Cyrus Leung
73391a1baa
[Renderer] Move InputPreprocessor into Renderer (1/2) ( #34510 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2026-02-14 10:14:21 -08:00
Andreas Karatzas
b3c14229b0
[ROCm][CI] Guard sparse MLA backend imports for ROCm compatibility in tests ( #34538 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-14 07:32:09 -08:00
Roger Wang
2f186635cb
[Bugfix] Fix Qwen3.5 config loading ( #34554 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2026-02-14 03:56:11 -08:00
Christian Pinto
342a7cda2d
[Misc] Update tests and examples for Prithvi/Terratorch models ( #34416 )
...
Signed-off-by: Christian Pinto <christian.pinto@ibm.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-13 23:03:51 -08:00
Kata Coder
d1ea65d0a1
[new model] add COLQwen3 code & Inference ( #34398 )
...
Signed-off-by: craftsangjae <craftsangjae@gmail.com >
Signed-off-by: katacoder <craftsangjae@gmail.com >
2026-02-14 12:15:19 +08:00
Andreas Karatzas
de42abb366
[CI] Heavy refactoring of Voxtral multimodal audio model tests ( #34294 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-13 20:04:29 -08:00
Julien Denize
60ca7981bc
Add explicit validation error for tool calls. ( #34438 )
...
Signed-off-by: juliendenize <julien.denize@mistral.ai >
2026-02-13 20:04:01 -08:00
Christian S. Perone
0ef5b9147b
fix: use __annotations__ instead of get_type_hints() for dynamic kwargs detection ( #34527 )
...
Signed-off-by: Christian S. Perone <christian.perone@gmail.com >
Signed-off-by: Christian S. Perone <perone@users.noreply.github.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2026-02-13 20:03:37 -08:00
Shiyan Deng
ed242652d7
[bug] Make sure get_modality_with_max_tokens is deterministic ( #34533 )
...
Signed-off-by: Shiyan Deng <dsy842974287@meta.com >
2026-02-13 20:02:59 -08:00
Wei Zhao
b37b679770
[Feature][Perf] Support Selective CPU Weight Offloading ( #34535 )
...
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com >
2026-02-13 20:02:24 -08:00
Andreas Karatzas
a0638d052d
[Bugfix] Fix ROCm UVA CPU weight offloading broken by #32993 ( #34543 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-13 20:01:42 -08:00
Harry Huang
c027541eaf
[Hybrid] Enable spec decoding in mamba cache align mode ( #33705 )
...
Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com >
2026-02-13 13:02:28 -08:00
Ben Browning
fd267bc7b7
[Bugfix]: Fix structured output in multi-turn gpt-oss ( #34454 )
...
Signed-off-by: Ben Browning <bbrownin@redhat.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-13 11:12:48 -08:00
Michael Goin
bfaa559305
Revert "[Bugfix] Fix fused MoE IMA (sans chunking) by using int64 for strides" ( #34530 )
2026-02-13 10:35:29 -08:00
Richard Zou
87789c8364
[Misc] vLLM's --enforce-eager should turn off compile and cudagraphs only ( #34523 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2026-02-13 09:52:20 -08:00
Pushpinder Singh
bcd65c1f6a
[Bugfix] Replace c10::optional with std::optional in topk kernel ( #34467 )
...
Signed-off-by: Pushpinder Singh <pushpindersingh135@gmail.com >
2026-02-13 08:30:23 -08:00
Wei Zhao
59d53066d8
[Feature] Support CPU Offloading without Pytorch Pinned Memory that leads to doubled allocation ( #32993 )
...
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-02-13 08:11:26 -08:00
LoganJane
4a9952ec1b
[Bugfix] Add quant_config in ViT of Kimi-K2.5 ( #34501 )
...
Signed-off-by: LoganJane <LoganJane73@hotmail.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-13 16:05:34 +00:00
Roger Wang
1dae7b7843
[Bugfix] Exclude language_model_only key from MM AOT compile hash but include in model one ( #34508 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2026-02-13 13:59:00 +00:00
Roger Wang
5885e330ef
[Misc] Port Qwen3.5 Configs ( #34512 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2026-02-13 05:24:25 -08:00
Ilya Boytsov
071d863e20
Extend ColBERT support to non-standard BERT backbones ( #34170 )
...
Signed-off-by: Ilya Boytsov <ilya.boytsov@aleph-alpha.com >
2026-02-13 09:53:09 +00:00
Woosuk Kwon
0916e7960b
[GDN] Use CPU tensors to build GDN metadata ( #34498 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-13 01:24:45 -08:00
Wentao Ye
3d2a026fd0
[Feature] Pipeline Parallel Async send/recv, 2.9% E2E throughput improvement ( #33368 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2026-02-13 16:38:16 +08:00
Aaron Hao
dddbff4624
[Core] Move pause and resume functions into engine ( #34125 )
...
Signed-off-by: ahao-anyscale <ahao@anyscale.com >
Signed-off-by: Aaron Hao <ahao@anyscale.com >
Signed-off-by: hao-aaron <ahao@anyscale.com >
Signed-off-by: Nick Hill <nickhill123@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com >
2026-02-13 00:15:10 -08:00
Martin Hickey
47e9b63e1a
[KVConnector] Clean up redundant code in KV connectors ( #34147 )
...
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com >
2026-02-13 00:14:30 -08:00
Matthias Gehre
934acddef9
[Perf] fused_moe: add int4_w4a16 benchmark support and tuning config ( #34130 )
...
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com >
Co-authored-by: TJian <tunjian.tan@embeddedllm.com >
2026-02-13 00:14:27 -08:00
Marek Michalowski
742d214d6e
[Bugfix] fix the import path in moe test utils.py ( #34245 )
...
Signed-off-by: Marek Michalowski <marek.michalowski@arm.com >
2026-02-13 00:13:45 -08:00
haosdent
4137c5dfa7
[Bug Fix] Fix MambaManager.cache_blocks() crash on null blocks in align mode ( #34418 )
...
Signed-off-by: haosdent <haosdent@gmail.com >
2026-02-13 00:13:22 -08:00
Harry Huang
7a8a46ddcb
[BugFix] Fix and optimize max_num_blocks_per_req calculation for MambaSpec ( #34440 )
...
Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com >
2026-02-13 00:13:14 -08:00
myselvess
bcf0731aa0
[New Model] support new model ovis2.6 ( #34426 )
...
Signed-off-by: myselvess <23743269+myselvess@users.noreply.github.com >
2026-02-13 00:12:45 -08:00
Cyrus Leung
ec090c2429
[Refactor] Call renderer for online IO processor request ( #34490 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-12 22:48:45 -08:00
Roger Wang
eea3024f43
[Bugfix] Fix mamba state dtype setting for Qwen3-Next and Qwen3.5 ( #34489 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2026-02-12 22:48:42 -08:00
Cyrus Leung
2f308214c0
[Refactor] Pass full VllmConfig to Renderer ( #34485 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-12 22:48:38 -08:00
Cyrus Leung
1b4e8e53f8
[CI/Build] Fix CUDA re-initialization error in distributed model tests ( #34491 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-13 06:43:53 +00:00
haosdent
dcf6ee8592
[Bugfix] Fix encoder cache underestimation for GLM-4V/GLM-OCR single image ( #34483 )
...
Signed-off-by: haosdent <haosdent@gmail.com >
2026-02-12 21:04:06 -08:00
Cyrus Leung
372b2e762a
[Bugfix] Standardize getting number of image patches/tokens ( #34358 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-12 20:47:01 -08:00
Andreas Karatzas
6afa587d31
[ROCm][CI] Fix serving tokens test failures ( #34047 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-13 11:27:53 +08:00
Cyrus Leung
94ed6cf6ea
Add new sections to CODEOWNERS ( #34309 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-12 18:39:28 -08:00
Harry Huang
bf37812ca7
[Hybrid] Fix and optimize block-aligned splitting in mamba cache align mode ( #33706 )
...
Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com >
2026-02-12 18:21:52 -08:00
Frank Wang
b86bf4417e
[Bugfix] Fix Random Dataset Prefix Length Inaccuracy ( #33907 )
...
Signed-off-by: frankwang28 <frank.wbb@hotmail.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2026-02-12 18:21:19 -08:00
Yanan Cao
de13dd781f
[Kernel] [Helion] [5/N] Add Helion Autotuning infrastructure ( #34025 )
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com >
2026-02-12 18:21:05 -08:00
LoganJane
62788f99a4
[Bugfix] Delete unused redundant code in Kimi-K2.5 ( #34427 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-12 18:18:42 -08:00
Cyrus Leung
ea5ff3a1f6
[Refactor] Simplify BOS/EOS token handling ( #34435 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-12 18:18:24 -08:00
bnellnm
04ea31baab
[Bugfix] Remove assert that's no longer valid ( #34443 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
2026-02-12 18:18:15 -08:00
Harry Huang
6f019e6e0a
[BugFix] Add block_size validation for mamba cache align mode ( #34445 )
...
Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com >
2026-02-12 18:18:07 -08:00
Zhuohan Li
d707678dfb
Fix num_logprobs parameter description in sampler.py ( #34451 )
...
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com >
2026-02-12 18:18:03 -08:00
Cyrus Leung
fc22cae4ac
[CI/Build] Update video URLs for testing ( #34446 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-12 18:15:36 -08:00
Yanan Cao
96161fe978
[Kernel] [Helion] [4/N] Add silu_mul_fp8 Helion kernel ( #33373 )
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com >
2026-02-12 18:13:12 -08:00
Jaewon
4453ba8d9e
[Core] Profiler improvements and lazy initialization ( #33198 )
...
Signed-off-by: Jaewon Lee <jaewon@meta.com >
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com >
2026-02-12 16:16:38 -08:00
Jaewon
aa181c923b
[Core] Add sleep level 0 mode with enqueue/wait pattern ( #33195 )
...
Signed-off-by: Jaewon Lee <jaewon@meta.com >
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com >
2026-02-12 16:16:25 -08:00
Alec S
be7370daf3
[Frontend] Enable generic structured_outputs for responses API ( #33709 )
...
Signed-off-by: Alec Solder <alecs@fb.com >
Co-authored-by: Alec Solder <alecs@fb.com >
2026-02-12 16:15:48 -08:00
Mengtao (Martin) Yuan
9ea1f598ce
Use paged_attention_v1 for sliding window decode in rocm_aiter_fa ( #34378 )
...
Signed-off-by: Martin Yuan <myuan@meta.com >
Co-authored-by: Martin Yuan <myuan@meta.com >
2026-02-12 16:14:43 -08:00
amitz-nv
f120bd42d3
[Kernel] Support Flashinfer trtllm fused MoE non gated FP8 & NVFP4 ( #33506 )
...
Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com >
2026-02-12 13:06:58 -08:00
Hashem Hashemi
fac4e96940
small adjustment to wvSplitKrc ( #34410 )
...
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com >
2026-02-12 20:26:36 +00:00
Michael Goin
6d4e27ce29
[Bugfix] Enforce DeepGEMM when using sparse_attn_indexer on CUDA ( #34374 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-02-12 12:08:06 -08:00
Andreas Karatzas
4c078fa546
[ROCm][CI] Pin TorchCodec to v0.10.0 for ROCm compatibility ( #34447 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-12 18:47:34 +00:00
Patrick von Platen
6c0baee610
[Voxtral Realtime] Refactor & Improve buffering logic ( #34428 )
...
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-12 09:46:43 -08:00
Patrick von Platen
1100a97621
[Voxtral Realtime] Enable tests ( #33803 )
...
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com >
2026-02-12 09:43:24 -08:00
xuebwang-amd
766e167821
[ROCm][quantization] improve OCP weight quant parser robust ( #34431 )
...
Signed-off-by: xuebwang-amd <xuebwang@amd.com >
Co-authored-by: TJian <tunjian.tan@embeddedllm.com >
2026-02-12 09:40:19 -08:00
Isotr0py
becbe24808
[Bugfix] Remove broken raw url GGUF model loading support ( #34433 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-12 09:40:01 -08:00
Harry Mellor
679ca5d8d3
Fix MoE for the Transformers modelling backend ( #34436 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-12 09:29:42 -08:00
Matthew Bonanni
f2c47886fd
[Attention] Add FlashInfer Sparse MLA backend ( #33451 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com >
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
2026-02-12 17:21:54 +00:00
Nicolò Lucchesi
334c715e0f
[Docs] Spec decoding docs warning removal ( #34439 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-02-12 09:01:51 -08:00
Aaron Hao
7b5a8b4a9d
[BUG] Reset running requests when clearing cache for pause/resume ( #34382 )
...
Signed-off-by: hao-aaron <ahao@anyscale.com >
2026-02-12 16:19:13 +00:00
danisereb
dea63512bb
Add config file for fused MoE for Nemotron (TP4, B200) ( #34411 )
...
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com >
2026-02-12 06:09:55 -08:00
Douglas Lehr
8a798be929
[ROCm] Enable MXFP4 MoE weight pre-shuffling on gfx950 and update aiter ( #34192 )
...
Signed-off-by: Doug Lehr <douglehr@amd.com >
Co-authored-by: Doug Lehr <douglehr@amd.com >
Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com >
Co-authored-by: tjtanaavllm <tunjian.tan@amd.com >
2026-02-12 05:06:33 -08:00
Cyrus Leung
fb455ed547
[V0 Deprecation] Remove code related to per-request logits processors ( #34400 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-12 20:44:28 +08:00
baonudesifeizhai
f5897613fb
Fix Mistral config remap to accept compressed-tensors quantization #34028 ( #34104 )
...
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com >
2026-02-12 08:22:06 +00:00
Louie Tsai
55a1a9563a
Vllm CPU benchmark suite improvement ( #34128 )
...
Signed-off-by: louie-tsai <louie.tsai@intel.com >
2026-02-12 16:04:44 +08:00
AllenDou
386bfe5d08
[bugfix] refactor FunASR's _get_data_parser ( #34397 )
...
Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com >
Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com >
2026-02-12 07:26:49 +00:00
Kyle Sayers
e9cd691132
[Bugfix] Fix Sparse24 Compressed Tensors models ( #33446 )
...
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-02-11 23:15:16 -08:00
Yichuan Wang
80f2ba6ea6
Fix DeepSeek-OCR tensor validation for all size variants ( #34085 )
...
Co-authored-by: Cursor <cursoragent@cursor.com >
2026-02-11 22:50:23 -08:00
Lucas Wilkinson
136b0bfa59
[BugFix] Fix DP chunking ( #34379 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: Bill Nell <bnell@redhat.com >
Co-authored-by: Bill Nell <bnell@redhat.com >
2026-02-12 06:44:03 +00:00
Cyrus Leung
b96f7314b4
[Refactor] Pass Renderer to Input Processor ( #34329 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-11 19:38:11 -08:00
Cyrus Leung
ced2a92f40
[Refactor] Move validation to params definitions ( #34362 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-11 19:33:15 -08:00
Runkai Tao
e1d97c38f8
[Bug Fix] Fix naive_block_assignment always defaulting to False due to arg misalignment ( #33848 )
...
Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu >
2026-02-12 11:30:57 +08:00
Michael Goin
ec12d39d44
[Bugfix] Fix MTP accuracy for GLM-5 ( #34385 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-02-12 11:08:19 +08:00
Michael Goin
ff1f83b056
[Refactor] Replace activation: str with MoEActivation enum ( #33843 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
2026-02-11 17:29:32 -08:00
Kevin H. Luu
83b47f67b1
[ci] Integrate AMD tests into CI ( #33626 )
...
Signed-off-by: Kevin H. Luu <khluu000@gmail.com >
Signed-off-by: khluu <khluu000@gmail.com >
Co-authored-by: TJian <tunjian.tan@embeddedllm.com >
2026-02-12 08:54:17 +08:00
Micah Williamson
fb7b30c716
[ROCm][CI] Revert Test Groups From mi325_8 to mi325_1 Agent Pool In AMD CI ( #34384 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2026-02-11 15:52:34 -08:00
bnellnm
31d992d215
[Bugfix] Fix some issues with MoERunner PR #32344 ( #34371 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
2026-02-11 14:33:14 -08:00
Wei Zhao
5aff2699bd
Fix CI failure - Flashinfer Kernel tests ( #34316 )
...
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com >
2026-02-11 14:17:16 -08:00
Raushan Turganbay
527ca32197
[Bugfix] Fix more multimodal tests for transformers V5 ( #34334 )
...
Signed-off-by: raushan <raushan@huggingface.co >
2026-02-11 22:02:05 +01:00
Junseo Park
5458eb835d
[Bugfix] send None sentinel on final commit so server properly sends transcription.done ( #33963 )
...
Signed-off-by: pjs102793 <pjs102793@naver.com >
Co-authored-by: Nick Hill <nickhill123@gmail.com >
2026-02-11 21:01:53 +00:00
Tomas Ruiz
144d9b7cc8
[Benchmarks] Reduce ready checker log verbosity ( #34349 )
...
Signed-off-by: Tomas Ruiz <tomas.ruiz.te@gmail.com >
2026-02-11 20:57:57 +00:00
elvischenv
83e26c834e
[GPT-OSS] Remove unnecessary contiguous ( #34337 )
...
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com >
2026-02-11 15:29:29 -05:00
TJian
5001211369
[ROCm] [CI] fix test_unrecognized_env ( #34350 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2026-02-11 18:50:44 +00:00
Eldar Kurtić
11c7ace340
[Bugfix] Enable attn quantization of Llama-4 by correctly permuting scales for rope (int8, fp8) ( #34243 )
...
Signed-off-by: Your Name <you@example.com >
Co-authored-by: Your Name <you@example.com >
2026-02-11 13:24:22 -05:00
Xinyu Dong
be7f3d5d20
[Bugfix] fix default is_neox_style is True for deepseek ( #34353 )
...
Signed-off-by: dongxinyu03 <dongxinyu03@baidu.com >
2026-02-11 18:20:45 +00:00
Isotr0py
0ab06100f4
[Multimodal] Expose mm_processor_kwargs for DummyInputsBuilder ( #34330 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-11 09:37:40 -08:00
Xinyu Chen
ffb3d553cc
[Model Runner V2] Init cuda graph pool when necessary ( #33217 )
...
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com >
2026-02-11 09:12:13 -08:00
junuxyz
fa7e0bfacf
[CI][BugFix] Fix silent failure in shellcheck hook and baseline exist… ( #32458 )
...
Signed-off-by: junuxyz <216036880+junuxyz@users.noreply.github.com >
2026-02-11 17:03:48 +00:00
SorenDreano
48134a2c22
[Docs] Fix typo ("defult") and double spacing ( #34348 )
...
Signed-off-by: SorenDreano <71752785+SorenDreano@users.noreply.github.com >
Co-authored-by: Soren Dreano <soren@numind.ai >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-11 09:02:27 -08:00
kliuae
64f570ab56
[ROCm] [aiter] Split KV cache update for AiterFlashAttention ( #33681 )
...
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com >
2026-02-11 16:26:44 +00:00
Rohan Potdar
fd618871b4
[Bugfix]: Fix ROCm fusion attn test; use AttentionBackend utils to create kv cache ( #33948 )
...
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com >
2026-02-11 11:12:05 -05:00
Harry Mellor
67a42b5a44
Don't try and run GLM-ASR with remote code ( #34352 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-11 08:09:40 -08:00
Lucas Wilkinson
c7914d30f9
Reapply [Attention][FA3] Update FA3 to include new swizzle optimization ( #34043 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2026-02-11 07:07:56 -08:00
Adam Binford
1b8756562e
Responses harmony system message structured ( #34268 )
...
Signed-off-by: Adam Binford <adamq43@gmail.com >
2026-02-11 05:14:28 -08:00
Linda
275e0d2a99
[NVIDIA][test] Tests for flashinfer TRTLLM BF16 MoE ( #33715 )
...
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com >
Co-authored-by: Pavani Majety <pmajety@nvidia.com >
2026-02-11 12:38:11 +00:00
Harry Mellor
0f5e55e7a8
Make JAIS compatible with Transformers v5 ( #34264 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-11 12:30:37 +00:00
Harry Mellor
1e9204bff3
Make Qwen3VL compatible with Transformers v5 ( #34262 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.io >
2026-02-11 04:13:23 -08:00
Li, Jiang
05339a7b20
[Bugfix][CPU] Fix llama4 inference on CPU ( #34321 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2026-02-11 19:07:23 +08:00
Harry Mellor
40b8f55358
[Docs] Reduce time spent generating API docs ( #34255 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-11 02:56:02 -08:00
Seiji Eicher
5045d5c983
Patch protobuf for CVE-2026-0994 ( #34253 )
...
Signed-off-by: Seiji Eicher <seiji@anyscale.com >
Co-authored-by: Kevin H. Luu <khluu000@gmail.com >
2026-02-11 02:25:04 -08:00
Nick Hill
e09546cf05
[Frontend] Exploit tokenizers "new stream" in FastIncrementalDetokenizer ( #34217 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-02-11 11:03:24 +01:00
Tianqi Ren
786806dd44
[Doc] Update Marlin support matrix for Turing ( #34319 )
...
Signed-off-by: Tianqi Ren <tianqi.r@outlook.com >
2026-02-11 09:03:41 +00:00
Nick Hill
79504027ef
[Misc] Bump fastsafetensors version for latest fixes ( #34273 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-02-11 00:30:09 -08:00
Luka Govedič
addac0e653
[torch.compile] Enable AR+rms fusion by default available for -O2 ( #34299 )
...
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
2026-02-11 00:30:00 -08:00
Cyrus Leung
675a22ed66
[Chore] Move BaseRenderer to base.py ( #34308 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-11 00:29:51 -08:00
Kunshang Ji
cb9574eb85
[XPU][9/N] clean up existing ipex code/doc ( #34111 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2026-02-11 00:27:15 -08:00
AllenDou
21dfb842d7
[model] support FunASR model ( #33247 )
...
Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com >
Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com >
2026-02-11 07:37:09 +00:00
R3hankhan
d1b837f0ae
[CPU] Enable FP16 (Half dtype) support for s390x ( #34116 )
...
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com >
2026-02-11 14:41:42 +08:00
Roger Wang
0b20469c62
[Bugfix] Fix weight naming in Qwen3.5 ( #34313 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2026-02-10 21:37:14 -08:00
Tyler Michael Smith
d7982daff5
[Bugfix] Fix fused MoE IMA (sans chunking) by using int64 for strides ( #34279 )
...
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
2026-02-11 05:15:52 +00:00
Robert Shaw
9b17c57460
[ModelBash][DSR1 NVFp4] Removed Bf16 Bias Cast ( #34298 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2026-02-11 05:00:00 +00:00
Hashem Hashemi
1b3540e6c6
Threshold fix wvSplitk for occasional CI fails ( #34013 )
...
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com >
2026-02-11 03:59:14 +00:00
Matthias Gehre
7a048ee65f
[Bugfix] Fix benchmark_moe.py inplace assertion with torch >= 2.9 ( #34149 )
...
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com >
2026-02-11 03:58:56 +00:00
Cyrus Leung
c9a1923bb4
[Plugin] Simplify IO Processor Plugin interface ( #34236 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-10 19:47:39 -08:00
zofia
b482f71e9f
[XPU][7/N] enable xpu fp8 moe ( #34202 )
...
Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com >
2026-02-11 03:33:59 +00:00
Дзержи́нский
1485396abb
[Kernel] Apply 256bit LDG/STG To Activation Kernels ( #33022 )
...
Signed-off-by: Dzerzhinsky <256908701+AstroVoyager7@users.noreply.github.com >
Signed-off-by: Дзержи́нский <256908701+AstroVoyager7@users.noreply.github.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2026-02-10 19:31:51 -08:00
Kebe
5ee5c86eeb
[Bugfix][DeepSeek-V3.2] fix fp8 kvcache type cast ( #33884 )
...
Signed-off-by: Kebe <mail@kebe7jun.com >
2026-02-10 19:31:36 -08:00
Cyrus Leung
b5dcb372e4
[Misc] Clean up validation logic in input processor ( #34144 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-10 19:29:29 -08:00
Tyler Michael Smith
066c6da6a0
[WideEP] Fix nvfp4 DeepEP High Throughput All2All backend ( #33738 )
...
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-02-10 19:15:43 -08:00
Richard Zou
e30cedd44b
[torch.compile] Stop doing unnecessary FakeTensorProp in PiecewiseCompileInterpreter ( #34093 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2026-02-10 19:15:40 -08:00
Cyrus Leung
3bcd494ef4
[Redo] Add --trust-remote-code to dataset bench args ( #34251 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-11 11:10:12 +08:00
tianshu-Michael-yu
0e725a7d22
[Bugfix] Fix Worker.load_model context-manager composition for sleep mode ( #34021 )
...
Signed-off-by: tianshu.yu <tianshuyu.formal@gmail.com >
2026-02-11 11:07:51 +08:00
Lucas Wilkinson
ba0511fd80
[Misc] Add run one batch script that supports profiling ( #32968 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2026-02-10 18:29:49 -08:00
Micah Williamson
4a1550d22d
[ROCm][CI] Fix test_sequence_parallel.py location in AMD CI pipeline ( #34280 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2026-02-11 01:08:11 +00:00
bnellnm
d1481ba783
[MoE Refactor] Introduce MoERunner abstraction and move execution logic from FusedMoE to DefaultMoERunner ( #32344 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
2026-02-10 19:51:07 -05:00
7. Sun
dc6de33c3d
[CI] Add pip caching to cleanup_pr_body workflow ( #32979 )
...
Signed-off-by: 7. Sun <jhao.sun@gmail.com >
2026-02-11 00:45:28 +00:00
Tyler Michael Smith
c4b9e6778f
[Misc] Add pre-commit hook to catch boolean ops in with-statements ( #34271 )
...
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
2026-02-10 15:13:20 -08:00
Richard Zou
341eed3d30
[torch.compile] Disable recursive pre_grad_passes ( #34092 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2026-02-10 18:02:31 -05:00
Zhengkai Zhang
6f2f59f2b3
[Misc][Spec Decode] support different load config for draft model ( #34022 )
...
Signed-off-by: zzhengkai <zzhengkai@devgpu049.ldc1.facebook.com >
Co-authored-by: zzhengkai <zzhengkai@devgpu049.ldc1.facebook.com >
2026-02-10 14:52:43 -08:00
Ilya Markov
bb2fc8b5e7
[BugFix] Fix async EPLB hang with DeepEP LL all2all backend ( #32860 )
...
Signed-off-by: ilmarkov <markovilya197@gmail.com >
2026-02-10 22:34:47 +00:00
Ilya Markov
67132945bb
[Perf] Move eplb rebalance algo to async thread ( #30888 )
...
Signed-off-by: ilmarkov <markovilya197@gmail.com >
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
2026-02-10 22:19:10 +00:00
Gregory Shtrasberg
f0ca0671c7
[Feature] Warn about unrecognized environment variables ( #33581 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2026-02-10 15:45:38 -06:00
Pavani Majety
578977bb5e
[SM100] Resubmit FMHA FP8 prefill for MLA ( #31195 )
...
Signed-off-by: Pavani Majety <pmajety@nvidia.com >
2026-02-10 16:18:43 -05:00
Roger Wang
9615575afc
[Bugfix] Fix mamba cache dtype for Qwen3.5 ( #34200 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2026-02-10 13:12:31 -08:00
Matthew Bonanni
4293c00b84
[Benchmarks] Fix attention benchmark smoke test ( #34269 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-02-10 16:04:07 -05:00
J Seppänen
506ad7d7c1
[Bugfix] Fix weights offloading for sleep mode ( #32947 )
...
Signed-off-by: Jarno Seppänen <jseppanen@nvidia.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
2026-02-10 20:38:17 +00:00
Reagan Lee
fdd6f2ad58
Convert online APIs to use Renderer ( #34084 )
...
Signed-off-by: Reagan Lee <“reaganjlee@gmail.com ”>
Co-authored-by: Reagan Lee <“reaganjlee@gmail.com ”>
2026-02-10 19:44:31 +00:00
Qi Wang
33bcd3dc3b
[Misc] Introduce ec_both role EC (encoder cache) connector ( #34182 )
...
Signed-off-by: Qi Wang <qiwa@nvidia.com >
2026-02-10 18:55:35 +00:00
Michael Goin
1f5febb4b8
[UX nit] Fix non-default api_server_count message ( #34152 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-02-10 10:35:58 -08:00
Andy Lo
ae871ca923
Minor cleanup for Voxtral ( #34247 )
...
Signed-off-by: Andy Lo <andy@mistral.ai >
2026-02-10 18:18:30 +00:00
Woosuk Kwon
a2443de5fa
[Model Runner V2] Use pinned memory for write_contents ( #34222 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-10 08:55:22 -08:00
Harry Mellor
f84a2a8f31
[Docs] Speed up build environment set-up ( #34240 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-10 16:34:43 +00:00
Vadim Gimpelson
000214c4bb
[BUGFIX] Fix accuracy bugs in Qwen3-Next MTP ( #34077 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2026-02-10 10:57:11 -05:00
junuxyz
c5a66d1697
[Core][BugFix] Fix PP KV cache sharding memory validation ( #33698 )
...
Signed-off-by: junuxyz <216036880+junuxyz@users.noreply.github.com >
2026-02-10 10:46:24 -05:00
Roberto L. Castro
afdce12c89
[Perf][Kernel] Add faster topKperRow decode kernel for DeepSeek-V3.2 sparse attention ( #33680 )
...
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com >
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com >
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com >
2026-02-10 10:29:52 -05:00
Zhengxu Chen
82e11973cc
[compile] Enable AOT compile with 2.10 in trunk. ( #34155 )
...
Signed-off-by: Zhengxu Chen <zhxchen17@meta.com >
2026-02-10 23:24:42 +08:00
xuebwang-amd
b129136c7a
[ROCm][Quantization] GPT_OSS in amd-quark format model loading and emulations ( #29008 )
...
Signed-off-by: xuebwang-amd <xuebwang@amd.com >
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-02-10 10:08:05 -05:00
mgazz
599e4335a4
Support benchmarking of Geospatial models ( #33922 )
...
Signed-off-by: Michele Gazzetti <michele.gazzetti1@ibm.com >
2026-02-10 07:04:16 -08:00
Fan Yang
a1946570d8
add --insecure arg to the vllm bench to skip TLS ( #34026 )
...
Signed-off-by: Fan Yang <yan9fan@meta.com >
Co-authored-by: Fan Yang <yan9fan@meta.com >
2026-02-10 22:23:52 +08:00
Harry Mellor
d0bc520569
Bump mamba-ssm version in CI for Transformers v5 compatibility ( #34233 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-10 14:46:01 +01:00
Krish Gupta
748625cdaf
[V1][BugFix] Fix EAGLE3 encoder cache miss with disable_chunked_mm_input ( #34220 )
...
Signed-off-by: KrxGu <krishom70@gmail.com >
2026-02-10 13:05:32 +00:00
Harry Mellor
61413973e8
Stop testing for slow tokenizers as they will not exist soon ( #34235 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-10 12:08:20 +00:00
Phúc H. Lê Khắc
94de871546
[Misc] allow specify is_mm_prefix_lm in hf_config ( #34215 )
2026-02-10 11:16:21 +00:00
tc-mb
e042d7e685
Add flagos in MiniCPM-o ( #34126 )
...
Signed-off-by: tc-mb <caitianchi@modelbest.cn >
Signed-off-by: Vincent-Xiao <vincent.xiao.me@gmail.com >
Co-authored-by: Vincent-Xiao <vincent.xiao.me@gmail.com >
2026-02-10 02:51:48 -08:00
Roger Wang
ae4e280602
[Bugfix] Fix FI kernel chunk_gated_delta_rule output shape for Qwen3.5 ( #34219 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2026-02-10 10:41:24 +00:00
zzaebok
cbea11c9f0
[Docs] Fix format error in KV load failure recovery doc ( #34137 )
...
Signed-off-by: Jaebok Lee <jaebok9541@naver.com >
2026-02-10 02:16:26 -08:00
Cyrus Leung
2c32558a3c
[Bugfix] Fix --trust-remote-code conflict ( #34218 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-10 00:29:10 -08:00
Zetong Li
5f970120f0
[Bugfix] Fix memory inconsistency in cross-process shared memory ( #32022 )
...
Signed-off-by: Zetong Li <slippersss@126.com >
2026-02-10 08:22:03 +00:00
Cyrus Leung
998e2d91f8
Revert #34208 ( #34216 )
2026-02-09 23:59:04 -08:00
Wentao Ye
e1060a71a1
[Perf] Optimize detokenizer python logic ( #32975 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2026-02-09 23:54:41 -08:00
Chen Zhang
97fa8f6590
[BugFix] Avoid prefix cache hit in the same schedule step for mamba layers ( #29387 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2026-02-10 07:41:16 +00:00
wang.yuqi
dab1de9f38
[Frontend][CI] Consolidate instrumentator entrypoints ( #34123 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-02-10 07:30:19 +00:00
Balaxxe
8d48d0a9d9
[Bugfix] Sort hf_weights_files in fastsafetensors_weights_iterator to match #33491 ( #34190 )
...
Signed-off-by: Balaxxe <136368465+jaim12005@users.noreply.github.com >
2026-02-09 23:06:30 -08:00
Andrew Xia
9608844f96
[responsesAPI] fix simpleContext streaming output_messages ( #34188 )
...
Signed-off-by: Andrew Xia <axia@meta.com >
Signed-off-by: Andrew Xia <axia@fb.com >
Co-authored-by: Andrew Xia <axia@fb.com >
2026-02-09 22:53:07 -08:00
Cyrus Leung
f69b903b4c
[Bugfix] Add --trust-remote-code to dataset bench args ( #34208 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-09 22:37:50 -08:00
Lucas Wilkinson
81e217fe6b
[Bugfix] Fix DP Attention Padding in Dummy Run ( #34187 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Co-authored-by: Benjamin Chislett <bchislett@nvidia.com >
2026-02-10 05:29:39 +00:00
Cyrus Leung
ab97bcf662
[CI/Build] Relax test_mcp_tool_call ( #34204 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-10 05:18:57 +00:00
Cyrus Leung
25e48a3aae
[Doc] Update usage of --limit-mm-per-prompt ( #34148 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-09 21:12:13 -08:00
Roger Wang
8a5e0e2b2b
[Bugfix][Core] Fix CPU memory leak from Request reference cycle in prefix caching ( #34183 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2026-02-10 13:03:32 +08:00
Andreas Karatzas
4cde2e0159
[ROCm][Bugfix] Resolve Dynamo tracing crash from amdsmi calls in on_gfx* arch detection ( #34108 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-09 20:50:20 -08:00
Roger Wang
047a457fa4
[Bugfix] Adopt ChunkGatedDeltaRule for Qwen3.5 ( #34198 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2026-02-10 03:47:54 +00:00
Yuwei An
e94ec59733
[LMCache] Token Base IPC API ( #34175 )
...
Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com >
2026-02-10 01:18:42 +00:00
Ning Xie
13397841ab
[structured output] validate unsupported json features first ( #33233 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
Co-authored-by: Russell Bryant <rbryant@redhat.com >
2026-02-09 23:49:09 +00:00
Gregory Shtrasberg
c60f8e3b49
[Bugfix][ROCm][GPT-OSS] Use old triton_kernels implementation on ROCm if the new API is not available ( #34153 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2026-02-09 17:38:54 -06:00
Michael Goin
5e75a14a66
[Doc] Add DCP support to attention backend doc ( #33936 )
2026-02-09 18:33:43 -05:00
Nick Hill
e7e52781ff
[ModelRunner V2][BugFix] Fix max_query_len calculation ( #34167 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-02-09 21:47:17 +00:00
Charlie Fu
bb9f97308d
[torch.compile][Fusion] Fix attention fusion pass removing kv_update op. ( #33945 )
...
Signed-off-by: charlifu <charlifu@amd.com >
2026-02-09 16:15:43 -05:00
Hongxia Yang
4d39650961
[ROCm] update triton branch to support gpt-oss models for gfx11xx devices ( #34032 )
...
Signed-off-by: Hongxia Yang <hongxia.yang@amd.com >
2026-02-09 19:36:30 +00:00
Artus Krohn-Grimberghe
8fd31f6245
[Bugfix] Voxtral prompt/audio placeholder alignment ( #34140 )
...
Signed-off-by: Artus KG <artuskg@gmail.com >
2026-02-09 19:30:38 +00:00
Artus Krohn-Grimberghe
eadb4e868b
[Bugfix] Avoid duplicate k-proj weight emission in helper ( #34142 )
...
Signed-off-by: Artus KG <artuskg@gmail.com >
2026-02-09 19:17:44 +00:00
Jiangyun Zhu
285bab4752
[Kernel] use flashinfer for gdn prefill ( #32846 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2026-02-09 12:17:25 -05:00
TomerBN-Nvidia
995bbf38f1
[Bugfix] Fix shared expert input for latent MoE in EP+DP (Nemotron-H) ( #34087 )
...
Signed-off-by: Tomer Natan <tbarnatan@nvidia.com >
Co-authored-by: Cursor <cursoragent@cursor.com >
2026-02-09 16:44:18 +00:00
Mohammad Miadh Angkad
d4f123cc48
[Kernel] FlashInfer: switch allreduce fusion to unified API ( #33985 )
...
Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com >
2026-02-09 15:43:24 +00:00
ZhengHongming888
cb62e86f83
Add NUMA Core binding in nixl_connector for CPU xPyD ( #32365 )
...
Signed-off-by: Hongming Zheng <hongming.zheng@intel.com >
Signed-off-by: ZhengHongming888 <hongming.zheng@intel.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-09 15:39:12 +00:00
Luka Govedič
781ddf7868
[CI][torch.compile] Fix incorrect filtering for E2E fusion tests on B200 ( #34031 )
...
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
2026-02-09 10:05:14 -05:00
Roger Wang
64a9c2528b
[UX] Add --language-model-only for hybrid models ( #34120 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2026-02-09 14:57:33 +00:00
Lucas Wilkinson
d0d97e2974
[Misc] Fix up attention benchmarks ( #33810 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com >
2026-02-09 09:42:03 -05:00
JJJYmmm
9562912cea
[MODEL] Adding Support for Qwen3.5 Models ( #34110 )
...
Signed-off-by: JJJYmmm <1650675829@qq.com >
Signed-off-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: wulipc <wulipc@users.noreply.github.com >
Co-authored-by: ywang96 <ywang96@users.noreply.github.com >
Co-authored-by: Isotr0py <Isotr0py@users.noreply.github.com >
Co-authored-by: Isotr0py <2037008807@qq.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2026-02-09 21:12:58 +08:00
zofia
9bdb06b436
[XPU][6/N] add xpu scaled_mm kernel ( #34117 )
...
Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com >
2026-02-09 20:17:35 +08:00
Nikhil Gupta
caad9f1e01
[Fix] [CPU Backend] : Prepack weights for w8a8 oneDNN matmul ( #33901 )
...
Signed-off-by: nikhil-arm <nikhil.gupta2@arm.com >
2026-02-09 18:04:41 +08:00
Ekagra Ranjan
1d5922fade
[ASR] Fix audio benchmark and add RTFx metric ( #32300 )
...
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com >
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com >
2026-02-09 10:02:37 +00:00
Andreas Karatzas
3025b3cebb
[CI] Remove empty image_size_factors for fuyu, glm4_1v, glm_ocr ( #34107 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-09 17:37:04 +08:00
Jee Jee Li
978a37c823
[Model] GLM adaptation ( #34124 )
2026-02-09 17:32:52 +08:00
ihb2032
5a5c43511a
fix(cpu): fix mla_decode compilation on x86 without AVX512 ( #34052 )
...
Signed-off-by: ihb2032 <hebome@foxmail.com >
Co-authored-by: root <root@LAPTOP-FKNHV411.localdomain >
2026-02-09 08:55:41 +00:00
Nick Hill
d9bede0314
[BugFix] Fix fastsafetensors TP all procs using all GPUs ( #34070 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-09 15:15:46 +08:00
wang.yuqi
22b64948f6
[Frontend][last/5] Make pooling entrypoints request schema consensus. ( #31127 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-02-09 06:42:38 +00:00
Reagan Lee
7c233dbb36
[Tiny] Rename encoder budget file to more specific name ( #34103 )
...
Signed-off-by: Reagan Lee <“reaganjlee@gmail.com ”>
Co-authored-by: Reagan Lee <“reaganjlee@gmail.com ”>
2026-02-09 03:48:19 +00:00
kourosh hakhamaneshi
a75a5b54c7
[bug-fix] supported_tasks is breaking backward compatibility at init_app_state ( #34027 )
...
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com >
Signed-off-by: kourosh hakhamaneshi <31483498+kouroshHakha@users.noreply.github.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2026-02-09 09:46:46 +08:00
Andrey Talman
f97ca67176
[Release 2.10] Update to Torch 2.10 - final release ( #30525 )
2026-02-08 13:51:09 -08:00
danisereb
084aa19f02
Add support for ModelOpt MXFP8 dense models ( #33786 )
...
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com >
2026-02-08 11:16:48 -08:00
navmarri14
1ecfabe525
glm 4.6 fused tuned inference config for B200 ( #32958 )
2026-02-08 18:55:47 +00:00
Richard Zou
4df841fe75
[torch.compile] Add an option to force-enable the MOE cold start optimization ( #33735 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2026-02-08 18:42:56 +00:00
TomerBN-Nvidia
a263aa6140
[BugFix] Change support no act and mul for marlin ( #34088 )
...
Signed-off-by: Tomer Natan <tbarnatan@computelab-frontend-8.nvidia.com >
Co-authored-by: Tomer Natan <tbarnatan@computelab-frontend-8.nvidia.com >
2026-02-08 17:18:22 +00:00
aabbccddwasd
179ae7da8f
[Revert] Fix performance regression for GLM-4.7-GPTQ decode and MTP acceptance rate ( #33771 )
...
Signed-off-by: aabbccddwasd <aabbccddwasd@qq.com >
2026-02-08 08:13:24 -08:00
Reagan Lee
c4df59ad43
Add embedding input functionality for disabled modalities [remake] ( #32493 )
...
Signed-off-by: Reagan Lee <“reaganjlee@gmail.com ”>
Signed-off-by: Reagan Lee <reaganjlee@gmail.com >
Signed-off-by: Reagan Lee <96998476+reaganjlee@users.noreply.github.com >
Co-authored-by: Reagan Lee <“reaganjlee@gmail.com ”>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-08 04:57:16 -08:00
TJian
785cf28fff
[ROCm] [CI] Reduce Resource of two test groups ( #34059 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2026-02-08 15:17:26 +08:00
Nick Hill
a96197f564
[Perf] Simplify DeepseekV32 tokenizer, ensure fast detokenization used ( #33855 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-02-08 07:16:34 +00:00
Andreas Karatzas
ab10d79855
[ROCm][Bugfix] fix act_quant_fusion module import error ( #34069 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-07 19:21:12 -08:00
Cyrus Leung
7fcb705b80
[CI/Build] Skip GCS test ( #34057 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-07 08:52:38 -08:00
Cyrus Leung
b956cdf818
[Doc] Fix run_batch docs ( #34056 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-07 06:18:16 -08:00
Hashem Hashemi
ed17f54c8b
Perf tuning and expansion of cases covered for wvSplitKrc ( #33493 )
...
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com >
2026-02-07 05:33:11 -08:00
Jiang Wu
860981d8d8
Make directory exist ok for ray spinning up multiple replicas on a single instance ( #33604 )
...
Signed-off-by: Jiang Wu <jwu@cclgroup.com >
2026-02-07 05:30:49 -08:00
zifeitong
52181baaea
Update DeepGEMM version pin in Dockerfile to match #32479 ( #33935 )
...
Signed-off-by: Zifei Tong <zifeitong@gmail.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2026-02-07 05:30:22 -08:00
Rohan Potdar
de3869bb4d
move checks out of unified_kv_cache_update custom op ( #33943 )
...
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com >
2026-02-07 05:30:09 -08:00
whx
ce9b3cd3e9
[PluggableLayer][3/N] Apply PluggableLayer to mamba layers. ( #33660 )
...
Signed-off-by: whx-sjtu <2952154980@qq.com >
2026-02-07 05:26:05 -08:00
Jee Jee Li
db4ede9743
[Model] Enable Step3p5ForCausalLM testing ( #33755 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2026-02-07 05:25:24 -08:00
Pooya Davoodi
2cb2340f7a
[Frontend]Add support for transcriptions and translations to run_batch ( #33934 )
...
Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io >
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2026-02-07 05:24:57 -08:00
TundeAtSN
4df44c16ba
Enable Eagle3 speculative decoding for Mistral3ForConditionalGeneration to support eagle3 ( #33939 )
...
Signed-off-by: Akintunde Oladipo <akintunde.oladipo@servicenow.com >
Signed-off-by: TundeAtSN <akintunde.oladipo@servicenow.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-07 05:24:52 -08:00
Richard Zou
81fe69cae5
[torch.compile] Stop compiling identical artifacts ( #34003 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2026-02-07 05:24:48 -08:00
Mohammad Miadh Angkad
dd6a6e1190
[Kernel] Add KernelConfig flag to enable/disable FlashInfer autotune ( #34006 )
...
Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2026-02-07 05:24:44 -08:00
Cyrus Leung
edb359cce4
[Renderer] Define render_cmpl and render_chat ( #34039 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-07 05:24:40 -08:00
wang.yuqi
6ed5eda300
[CI][Build] Pin grpcio-tools==1.78.0 ( #34048 )
...
Signed-off-by: wang.yuqi <noooop@126.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2026-02-07 05:24:35 -08:00
Cyrus Leung
11a4c9d30d
[Misc] Simplify get_max_tokens ( #34036 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-07 00:59:49 -08:00
lukec
15a0b9e570
Fix spelling errors ( #33978 )
2026-02-06 23:58:50 -08:00
Andreas Karatzas
c490d8cc73
[ROCm][CI] Pinning lm-eval version to resolve multi-modal small eval bug ( #34038 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-06 22:21:08 -08:00
Cyrus Leung
48312e579a
[Misc] Make PlaceholderRange.get_num_embeds a method ( #34035 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-07 05:30:17 +00:00
Vel
bc32444b23
[Kernel] Add enable_sm120_or_later for SM121 (DGX Spark) CUTLASS support ( #33517 )
...
Signed-off-by: code4me2 <velvetmoon222999@gmail.com >
2026-02-06 20:28:01 -08:00
Wentao Ye
18e8545297
[Revert] Add util handle_deprecated back ( #33998 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-02-07 04:14:45 +00:00
果冻虾仁
6f7adc533a
fix description in plugin_system.md ( #33999 )
2026-02-06 19:37:02 -08:00
Nick Hill
40218a82ba
[ModelRunner V2] Revert token rank comparison difference for now ( #34017 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-02-07 11:11:05 +08:00
kourosh hakhamaneshi
1c3b22058f
[Misc] Add backward-compatible import aliases for renamed translations module ( #34015 )
...
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com >
Co-authored-by: Cursor <cursoragent@cursor.com >
2026-02-07 11:01:41 +08:00
Xin Yang
3920cafdd6
[Bugfix] Fix _fused_moe_lora_expand signature mismatch ( #33821 )
...
Signed-off-by: Xin Yang <xyangx@amazon.com >
2026-02-07 10:45:59 +08:00
rasmith
ec28784fdc
[CI][AMD][Bugfix] Check that model_config is not None in enable_norm_pad_fusion ( #34007 )
...
Signed-off-by: Randall Smith <Randall.Smith@amd.com >
2026-02-07 02:43:25 +00:00
Nicolò Lucchesi
55aeec04f5
[Bugfix] Fix Whisper tokenization ( #34011 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-02-07 10:42:52 +08:00
Ikenna
906077181b
[Bugfix] Fix QK Norm+RoPE fusion pattern matching on B200+FP8 ( #33967 )
...
Signed-off-by: Ikenna <ikennachifo@gmail.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2026-02-07 02:27:33 +00:00
Aaron Hao
89a385d79f
[Feat][RL] Pause and Resume with keep requests for single engine ( #32351 )
...
Signed-off-by: ahao-anyscale <ahao@anyscale.com >
Signed-off-by: Aaron Hao <ahao@anyscale.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-02-07 00:08:58 +00:00
kourosh hakhamaneshi
4a2d00eafd
[bugfix] [ROCm] Fix premature CUDA initialization in platform detection ( #33941 )
...
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com >
2026-02-06 16:17:55 -06:00
Dimitrios Bariamis
207c3a0c20
Fix RoutingMethodType logic ( #33919 )
...
Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
2026-02-06 14:03:34 -08:00
Sumanth R Hegde
ae2e93f89b
[Fix] Fix logprobs=0 handling for /inference/v1/generate endpoint ( #34010 )
...
Signed-off-by: SumanthRH <sumanthrh99@gmail.com >
2026-02-06 20:33:40 +00:00
xuebwang-amd
9e9acce577
[Bugfix] Fix no attribute error of SharedFusedMoE (DeepSeek-V3.1 as test model) ( #33993 )
...
Signed-off-by: xuebwang-amd <xuebwang@amd.com >
2026-02-06 19:11:32 +00:00
Charlie Fu
fe5438200b
[Rocm][Bugfix] Fix dtype not same for gemm_a4w4 op ( #33734 )
...
Signed-off-by: charlifu <charlifu@amd.com >
2026-02-06 19:09:59 +00:00
Wentao Ye
77c09e1130
[Refactor] Remove align block size logic in moe_permute ( #33449 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-02-06 10:57:06 -08:00
zhrrr
16786da735
[Model Runner V2] support apply penalty for spec decode ( #33251 )
...
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com >
2026-02-06 10:56:48 -08:00
vllmellm
aaa2efbe98
[DOC] [ROCm] Update docker deployment doc ( #33971 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: TJian <tunjian.tan@embeddedllm.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-06 10:05:35 -08:00
Seiji Eicher
aca5967416
[KV Connector] Add missing method overrides to MultiConnector ( #33292 )
...
Signed-off-by: Seiji Eicher <seiji@anyscale.com >
2026-02-06 12:58:21 -05:00
Wentao Ye
67a746e87f
[Log] Optimize duplicate startup log ( #33944 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-02-06 17:49:56 +00:00
Chauncey
7bec435130
[Bugfix] Fix the issue where tool calling does not work when using fast detokenization with dsv32 ( #33964 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-02-06 09:23:44 -08:00
Eldar Kurtić
5c52644b10
[Docs] Update link to Benchmark CLI documentation ( #33254 )
...
Signed-off-by: Eldar Kurtić <8884008+eldarkurtic@users.noreply.github.com >
2026-02-06 16:00:59 +00:00
zofia
2ce9fe4ad0
[XPU][5/N] add wna16 xpu kernel ( #33973 )
...
Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com >
2026-02-06 15:59:53 +00:00
Cyrus Leung
cd8b405bd0
[Refactor] Consolidate sequence normalization and enc-dec parsing ( #33928 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-06 15:43:47 +00:00
tc-mb
4707f7ebb4
[Model] Support MiniCPM-o 4.5 ( #33431 )
...
Signed-off-by: caitianchi <caitianchi@modelbest.cn >
Signed-off-by: tc-mb <caitianchi@modelbest.cn >
Co-authored-by: mslv <mslv@baai.ac.cn >
2026-02-06 15:29:10 +00:00
Michael Goin
c39ee9ee2b
[Docs] Add sections on process architecture and minimum CPU resources ( #33940 )
...
It seems users can be confused about vLLM's performance when running
with very few CPU cores available. We are missing a clear overview of
vLLM's process architecture, so I added one, along with some diagrams,
in arch_overview.md, and included a section on CPU resource
recommendations in optimization.md.
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-02-06 15:26:43 +00:00
Andreas Karatzas
350ca72c04
[ROCm][AITER] Fix AITER import regression for explicit backend selection ( #33749 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-06 15:08:16 +00:00
FredericOdermatt
1fb0495a72
[FIX] guidance: use max(vocab_size, len(tokenizer)) for n_vocab ( #33509 )
...
Signed-off-by: Frederic Odermatt <frederic.odermatt@44ai.ch >
2026-02-06 14:23:03 +00:00
Raushan Turganbay
85ee1d962b
[Bugfix] Fix models and tests for transformers v5 ( #33977 )
...
Signed-off-by: raushan <raushan@huggingface.co >
Signed-off-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-06 21:47:41 +08:00
Harry Mellor
51a7bda625
Update WeightTransferConfig to be more standard like the others ( #33989 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-06 13:15:00 +00:00
SorenDreano
6e7b1c4b59
[Docs] Improve documentation ( #33799 )
...
Co-authored-by: Soren Dreano <soren@numind.ai >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2026-02-06 12:57:09 +00:00
Kurt Shuster
2991dd3d22
[Bugfix][Model] Support LoRA on Qwen3 Output Embedding ( #29816 )
...
Signed-off-by: kurt <kurt@thinkingmachines.ai >
2026-02-06 20:25:31 +08:00
Luka Govedič
ac32e66cf9
[torch.compile] Reorganize vllm/compilation and tests/compile (0/N for vLLM IR) ( #33731 )
...
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
Signed-off-by: ProExpertProg <luka.govedic@gmail.com >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2026-02-06 04:19:49 -08:00
Fadi Arafeh
f79d9dce16
[CPU][BugFix] Fix loading of w8a8int models with bias ( #33582 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
2026-02-06 11:59:20 +00:00
Harry Mellor
ba5cbbf107
Bump HF Hub client to get bug fix ( #33984 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-06 11:25:33 +00:00
zhang-prog
233b26ab35
[PaddleOCR-VL] Add BC for transformers 5.0 config ( #33976 )
...
Signed-off-by: zhangyue66 <zhangyue66@baidu.com >
2026-02-06 10:33:49 +00:00
Harry Mellor
791a94bed0
Consolidate and fix forbidden import pre-commit checks ( #33982 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-06 01:47:41 -08:00
Xinyu Chen
e969a169ef
support view_from_cpu_tensor on XPU ( #33868 )
...
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com >
2026-02-06 08:34:20 +00:00
Harry Mellor
6d8d34be6d
Fix main pre-commit ( #33975 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-06 00:08:05 -08:00
Gassan Salama
1363e3d6d5
[cpu][performance] CPU Paged Attention NEON BFMMLA BF16 Implementation ( #32263 )
...
Signed-off-by: Gassan <gassan.salama@arm.com >
2026-02-06 15:01:48 +08:00
chengchengpei
965525667b
Onboard voyage-4-nano ( #33720 )
...
Signed-off-by: Chengcheng Pei <chengchengpei@outlook.com >
Signed-off-by: chengchengpei <5881383+chengchengpei@users.noreply.github.com >
Co-authored-by: chengchengpei <5881383+chengchengpei@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-06 06:23:34 +00:00
sihao_li
6550815c3a
[XPU]Replace pip in docker.xpu with uv pip ( #31112 )
...
Signed-off-by: sihao.li <sihao.li@intel.com >
2026-02-06 14:02:33 +08:00
Kunshang Ji
7439e4f41b
[XPU][4/N] add mxfp4 moe model support ( #33679 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2026-02-06 13:03:59 +08:00
R3hankhan
ac04dd374f
[CPU] Add BF16 Kernel type for s390x ( #33788 )
...
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com >
2026-02-06 04:57:02 +00:00
Cyrus Leung
035a6cb09a
[Misc] Update code for encoder-decoder models ( #33900 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-06 11:38:39 +08:00
Mingliang Li
a32cb49b60
feat(frontend): early-fail tokenization guard for user requests ( #31366 )
...
Signed-off-by: limingliang <limingliang@stepfun.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: limingliang <limingliang@stepfun.com >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-05 19:38:02 -08:00
Rabi Mishra
20d7454c9b
fix(ROCm): Make flash_attn import optional in MLA attention ( #33511 )
...
Signed-off-by: rabi <ramishra@redhat.com >
2026-02-06 02:22:53 +00:00
Simon Mo
5819ca8944
[Docs] Add reo analytics ( #33957 )
...
Signed-off-by: simon-mo <simon.mo@hey.com >
2026-02-05 17:42:22 -08:00
Xin Yang
79028d4388
[Perf] Disable clean_logits in deepgemm fp8_mqa_logits kernel ( #33568 )
2026-02-05 20:34:00 -05:00
emricksini-h
325ab6b0a8
[Feature] OTEL tracing during loading ( #31162 )
2026-02-05 16:59:28 -08:00
Wei Zhao
91a07ff618
[Bugfix] Fix DeepSeek v3.2 tokenizer outputting None issue ( #33832 )
...
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com >
2026-02-05 23:50:49 +00:00
Hashem Hashemi
d5c4800112
Adds padding and perf improvements to wvSplitK_fp8 ( #33527 )
...
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com >
2026-02-05 22:16:02 +00:00
Lumosis
42d5d705f9
[Minor] Sort safetensors files to ensure deterministic loading order ( #33491 )
...
Signed-off-by: Lihao Ran <imlihao.ran@gmail.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
2026-02-05 17:05:09 -05:00
Cyrus Leung
116880a5a0
[Bugfix] Make MM batching more robust ( #33817 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-05 20:40:58 +00:00
Matthew Bonanni
4145e50d85
[Bugfix] Fix DSV3.2 NVFP4 ( #33932 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-02-05 19:22:19 +00:00
Nicolò Lucchesi
20f5d185a6
[Misc] Rename translations to speech_to_text for OAI serving component ( #33904 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-02-05 19:16:52 +00:00
Harry Mellor
1887acca9e
Fix tokenizer test for renamed attr on Transformers v5 ( #33902 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-05 19:16:20 +00:00
Tsukasa OI
92e7562a99
[Bugfix] Suppress non-TTY color output on the process name part of the log ( #29714 )
...
Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com >
2026-02-05 18:47:09 +00:00
Isotr0py
87d0d17ab5
[Models] Consolidate Deepseek-OCR2 processor ( #33909 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-05 18:29:20 +00:00
bnellnm
a57c8228ff
[Moe Refactor] Make Inplace Flag for FusedMoEModularKernel part of the constructor ( #33375 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-02-05 18:07:18 +00:00
zackyoray
1ee95841bd
[Bugfix] Fix swapped engine_ids in NIXL Llama 4 local attention path ( #33795 )
...
Signed-off-by: Yoray Zack <yorayz@nvidia.com >
2026-02-05 17:51:58 +00:00
Nicolò Lucchesi
7d8c6804e2
[Misc] Add debug logs ( #33931 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-02-05 09:42:40 -08:00
Benjamin Chislett
af3162d3aa
[Spec Decode] Unified Parallel Drafting ( #32887 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
2026-02-05 12:37:18 -05:00
danisereb
5b2a9422f0
[BugFix] Fix LoRA Fp8 ( #33879 )
...
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com >
2026-02-05 17:25:55 +00:00
Aaron Hao
c1858b7ec8
[Feat][RL][1/2] Native Weight Syncing API: NCCL ( #31943 )
...
Signed-off-by: ahao-anyscale <ahao@anyscale.com >
Signed-off-by: Aaron Hao <ahao@anyscale.com >
Co-authored-by: SumanthRH <sumanthrh99@gmail.com >
2026-02-05 12:13:23 -05:00
Mario Hong
82914d2ae8
[Bugfix] Fix step3p5 parser when using mtp ( #33690 )
...
Signed-off-by: mariohong <mariohong128@gmail.com >
2026-02-05 16:04:04 +00:00
Nicolò Lucchesi
81a90e5277
[Docs] Add bart-plugin to docs ( #33905 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-02-05 12:20:25 +00:00
wang.yuqi
1c3a221d3b
[Bugfix] Fix corner case of sparse embedding ( #33886 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-02-05 02:51:22 -08:00
Cyrus Leung
7bd42e609d
[Refactor] Clean up input preprocessing ( #33687 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-05 18:43:42 +08:00
Isotr0py
a2522839d8
[Bugfix] Fix Kimi-K2.5 NVFP4 checkpoints weight loading ( #33876 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-05 10:29:54 +00:00
jiahanc
59a5cb387a
[perf] Integrate flashinfer concat_mla_k ( #31171 )
2026-02-05 05:23:11 -05:00
liranschour
8322d4e47f
Enable Cross layers KV cache layout at NIXL Connector V2 ( #33339 )
...
Signed-off-by: Liran Schour <lirans@il.ibm.com >
Signed-off-by: liranschour <liranschour@users.noreply.github.com >
Co-authored-by: Or Ozeri <or@ozery.com >
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2026-02-05 02:17:02 -08:00
Andreas Karatzas
3e472e81f9
[ROCm][Bugfix][CI] Fix hybrid models and their tests (Mamba/Jamba/Bamba) ( #32710 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com >
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com >
2026-02-05 10:01:23 +00:00
Cyrus Leung
038914b7c8
[Refactor] Move task outside of PoolingParams.verify ( #33796 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-02-05 09:33:11 +00:00
Pavani Majety
d2f4a71cd5
[Bugfix] Kimi-K2 grouped_topk usage for Flashinfer monolithic kernels. ( #33858 )
...
Signed-off-by: Pavani Majety <pmajety@nvidia.com >
2026-02-05 09:32:10 +00:00
Mark McLoughlin
2abd97592f
[KV Connector][Metrics] Do not count local prefix cache hits in connector queries ( #30522 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2026-02-05 09:57:27 +02:00
Chauncey
6abb0454ad
[Perf] Optimize the performance of structured output + reasoning ( #33557 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-02-05 15:45:29 +08:00
Li, Jiang
db6f71d4c9
[CI/Build] Fix CPU CI test case title ( #33870 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2026-02-05 15:07:14 +08:00
Fadi Arafeh
fd03538bf9
[CPU][BugFix] Allow w8a8 oneDNN quantized matmul to support 3D inputs ( #33727 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
2026-02-05 06:26:09 +00:00
Andreas Karatzas
1f70313e59
[Bugfix] Fix ScoreMultiModalParam multi-document scoring returning single result ( #33837 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-02-05 06:17:00 +00:00
Li, Jiang
07daee132b
[CI/Build] Parallelize CPU CI tests ( #33778 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2026-02-05 13:53:48 +08:00
Andrew Xia
9595afda18
[2/N] move responses/serving _make_response_output_items logic to parser ( #33281 )
...
Signed-off-by: Andrew Xia <axia@fb.com >
Signed-off-by: Andrew Xia <axia@meta.com >
Co-authored-by: Andrew Xia <axia@fb.com >
2026-02-05 13:46:15 +08:00
rasmith
c1395f72cd
[CI][AMD][BugFix] Ensure VLLM_ROCM_USE_AITER is set so test_rocm_aiter_topk.py can run correctly ( #33840 )
...
Signed-off-by: Randall Smith <Randall.Smith@amd.com >
2026-02-05 05:05:48 +00:00
rinbaro
007b183d74
[docs] fix unintentional misspellings ( #33863 )
...
Signed-off-by: rinbaro <ilgomishra@gmail.com >
2026-02-04 20:50:59 -08:00
Nick Hill
add9f1fbd9
[Minor] Include StreamingInput in inputs package ( #33856 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-02-05 04:38:20 +00:00
Luka Govedič
e3bf79ffa0
Revert "[Attention][FA3] Update FA3 to include new swizzle optimization" ( #33841 )
2026-02-04 19:54:27 -08:00
Andreas Karatzas
fb1270f1f8
[CI][Bugfix]: return McpCall for built-in MCP tools in non-streaming mode ( #32762 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-05 11:14:06 +08:00
Kevin H. Luu
72bb24e2db
[release] Minor fixes to release annotation ( #33849 )
...
Signed-off-by: Kevin H. Luu <khluu000@gmail.com >
2026-02-05 02:07:35 +00:00
Chauncey
a7be77beef
[Bugfix] fix DeepSeek R1 with CUTLASS MLA Broken on B200 ( #33637 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-02-05 01:28:36 +00:00
zhanqiuhu
bbe0574d8e
[Bugfix] Disable TRTLLM attention when KV transfer is enabled ( #33192 )
...
Signed-off-by: Zhanqiu Hu <zh338@cornell.edu >
2026-02-05 00:49:18 +00:00
Luka Govedič
4d9513537d
[CI][torch.compile] Reduce e2e fusion test time ( #33293 )
...
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
Signed-off-by: ProExpertProg <luka.govedic@gmail.com >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2026-02-04 19:09:03 -05:00
Ilya Boytsov
439afa4eea
feat: Add ColBERT late interaction model support ( #33686 )
...
Signed-off-by: Ilya Boytsov <ilyaboytsov1805@gmail.com >
Signed-off-by: Ilya Boytsov <boytsovpanamera@mail.ru >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-02-05 08:05:13 +08:00
Nick Hill
fa4e0fb028
[Core] Don't schedule spec tokens with prefill chunks ( #33652 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-02-04 23:40:22 +00:00
Sage Moore
ce498a6d61
Change the type signature of MixtureOfExperts.expert_weights to MutableSequence[Sequence[Tensor]] ( #33573 )
...
Signed-off-by: Sage Moore <sagmoore@redhat.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-02-04 17:02:46 -05:00
Richard Zou
9f14c9224d
Revert "[torch.compile] Significantly speed up cold start times" ( #33820 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2026-02-04 21:59:59 +00:00
Muhammad Hashmi
535de06cb1
[Model] Add transcription support for Qwen3-Omni ( #29828 )
...
Signed-off-by: Muhammad Hashmi <mhashmi@berkeley.edu >
Signed-off-by: NickLucche <nlucches@redhat.com >
Co-authored-by: NickLucche <nlucches@redhat.com >
2026-02-04 21:17:47 +00:00
Simon Danielsson
4292c90a2a
[Bugfix] Support RotaryEmbedding CustomOp for gpt-oss ( #33800 )
...
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com >
2026-02-04 20:17:41 +00:00
Taeksang Kim
6e98f6d8b6
Implement zero-copy GQA for multimodal and CPU ( #33732 )
...
Signed-off-by: Taeksang Kim <ts.kim@hyperaccel.ai >
2026-02-04 20:11:39 +00:00
kourosh hakhamaneshi
2f6d17cb2f
[rocm][ray] Fix: Unify Ray device visibility handling across CUDA and ROCm ( #33308 )
...
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com >
2026-02-04 10:09:14 -08:00
Isotr0py
192ad4648b
[Bugfix] Fix interns1-pro initialization and PP ( #33793 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-04 17:54:45 +00:00
Lucas Wilkinson
0e92298622
[Misc] Delay deprecation of CommonAttentionMetadata properties ( #33801 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2026-02-04 08:41:57 -08:00
jiangkuaixue123
87d9a26166
[Bugfix] Fix ubatch wrapper num_tokens calculate ( #33694 )
...
Signed-off-by: jiangkuaixue123 <jiangxiaozhou111@163.com >
2026-02-04 16:41:45 +00:00
Cyrus Leung
80f921ba4b
[Bugfix] Fix normalize still being passed to PoolerConfig ( #33794 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-04 23:56:02 +08:00
Wentao Ye
711edaf0d0
[Perf] Optimize spec decoding + async scheduling, 1.5% Throughput improvement ( #33612 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2026-02-04 09:34:32 -05:00
Micah Williamson
1d367a738e
[Bugfix][ROCm] Include float8_e4m3fnuz in NCCL Dtype Dispatching ( #33713 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2026-02-04 05:36:29 -08:00
Cyrus Leung
32a02c7ca2
Apply #33621 to main ( #33758 )
...
Signed-off-by: Zachary Aristei <zaristei@nvidia.com >
Co-authored-by: zaristei2 <zaristei2@gmail.com >
Co-authored-by: Zachary Aristei <zaristei@nvidia.com >
2026-02-04 05:35:39 -08:00
Chauncey
f67ee8b859
[Perf] Optimize chat completion streaming performance ( #33782 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-02-04 12:30:36 +00:00
Cyrus Leung
e57ef99b40
[Model] Apply #32631 for recent models ( #33785 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-04 12:23:01 +00:00
Yueqian Lin
f8516a1ab9
[Bugfix][Model] Fix audio-in-video support for Qwen2.5-Omni and Qwen3-Omni ( #33605 )
...
Signed-off-by: linyueqian <linyueqian@outlook.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.io >
2026-02-04 12:15:29 +00:00
Vadim Gimpelson
824058076c
[PERF] Change GDN Attention State Layout from [N, HV, K, V] to [N, HV, V, K] ( #33291 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2026-02-04 11:20:52 +00:00
Or Ozeri
8e32690869
[KV Connector][BugFix] scheduler: Delay freeing blocks of aborted async loads ( #32255 )
...
Fixes a not-yet-reported case where it was possible for blocks to be
freed by an abort before an async transfer completed, resulting
in corrupted KV data.
Signed-off-by: Or Ozeri <oro@il.ibm.com >
2026-02-04 11:16:34 +00:00
Zhengxu Chen
a208439537
[compile] Remove runner type from ignored caching factor list. ( #33712 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com >
2026-02-04 10:56:45 +00:00
Zhengxu Chen
bcd2f74c0d
[compile] Clean up AOT compile bypass on evaluate_guards. ( #33578 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com >
2026-02-04 02:12:53 -08:00
Kunshang Ji
f79f777803
[XPU][2/N] add support unquantized moe support for xpu ( #33659 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2026-02-04 02:12:25 -08:00
Augusto Yao
4c8d1bf361
use ORJSONResponse when available to improve the efficiency of request process ( #33548 )
...
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com >
2026-02-04 10:04:11 +00:00
Kunshang Ji
061da6bcf7
[XPU] remove common path warning log ( #33769 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2026-02-04 16:40:17 +08:00
zhanqiuhu
4403e3ed4c
[Metrics] Add labeled prompt token metrics for P/D disaggregation ( #33290 )
...
Add labeled Prometheus metrics to distinguish where prompt tokens come
from in P/D disaggregated deployments.
In P/D disaggregation, decode instances receive KV cache from prefill instances.
Currently, decode reports inflated prompt throughput because it counts all
prompt tokens as "computed", even though most were transferred.
This PR adds labeled metrics so users can understand actual compute work vs
transferred work:
vllm:prompt_tokens_by_source_total{source="local_compute"} # Tokens prefilled locally
vllm:prompt_tokens_by_source_total{source="external_kv_transfer"} # Tokens received via KV transfer
vllm:prompt_tokens_by_source_total{source="local_cache_hit"} # Tokens from local prefix cache
vllm:prompt_tokens_cached_total # Total cached (local + external, -1 when all
Signed-off-by: Zhanqiu Hu <zh338@cornell.edu >
2026-02-04 07:46:48 +00:00
Matt
08e094997e
[Hardware][AMD][CI] Refactor AMD tests to properly use BuildKite parallelism ( #32745 )
...
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com >
2026-02-04 14:51:33 +08:00
Wentao Ye
d88a1df699
[Deprecation] Deprecate profiling envs ( #33722 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-02-04 05:58:21 +00:00
Cyrus Leung
90d74ebaa4
[Deprecation] Remove _get_data_parser in MM processor ( #33757 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-04 05:51:52 +00:00
Frank Wang
45f8fd6f97
[Feature] Enable TRITON_ATTN for Batch Invariance ( #33688 )
...
Signed-off-by: frankwang28 <frank.wbb@hotmail.com >
2026-02-04 13:27:34 +08:00
Wentao Ye
5e1e0a0fbd
[Refactor] Remove unused dead code ( #33718 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-02-03 21:25:11 -08:00
Michael Goin
eb5ed20743
[Bugfix] Define router_logits_dtype for remaining MoE models ( #33737 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-02-04 13:24:14 +08:00
Huy Do
2647163674
Save startup benchmark results as a list of values ( #33629 )
...
Signed-off-by: Huy Do <huydhn@gmail.com >
2026-02-03 20:37:51 -08:00
Shanshan Shen
9fb27dd3b3
[MM] Align the prefix of MMEncoderAttention with Attention ( #33750 )
...
Signed-off-by: shen-shanshan <467638484@qq.com >
2026-02-04 04:07:30 +00:00
R3hankhan
4dffc5e044
[CPU] Split attention dispatch by head_dim alignment ( #32161 )
...
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com >
2026-02-03 19:37:15 -08:00
Andrew Xia
e1bf04b6c2
[1/N] Initial Implementation of Parser for ResponsesAPI ( #32712 )
...
Signed-off-by: Andrew Xia <axia@fb.com >
Co-authored-by: Andrew Xia <axia@fb.com >
2026-02-04 10:59:03 +08:00
Isotr0py
02080179a3
[Bugfix] Fix torchrun PP broadcast deadlock with async scheduling ( #33701 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-04 02:17:37 +00:00
wang.yuqi
1b8fe6f7c4
[Frontend][4/n] Make pooling entrypoints request schema consensus | ScoreRequest ( #33060 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-02-04 01:48:40 +00:00
Nick Hill
52ee21021a
[BugFix][Spec Decoding] Fix negative accepted tokens metric crash ( #33729 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-02-03 23:34:41 +00:00
Wentao Ye
655efb3e69
[Dependency] Remove comments of ray in dependency files ( #33351 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-02-03 15:30:47 -08:00
Matthew Bonanni
bd8da29a66
[Bugfix] Fix sparse MLA metadata building ( #33579 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-02-03 15:29:48 -08:00
Michael Goin
2a99c5a6c8
[Bugfix] Disable TRTLLM FP8 MoE if router_logits_dtype==float32 and routing_method!=DeepSeekV3 ( #33613 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-02-03 13:26:51 -08:00
Patrick von Platen
3f7662d650
[Voxtral Realtime] Change name ( #33716 )
...
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com >
2026-02-03 13:03:28 -08:00
Vadim Gimpelson
a372f3f40a
[MISC] Fix Tensor Parallelism for Quantized Mamba Models with n_groups=1 ( #33257 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2026-02-03 15:10:31 -05:00
Harry Mellor
61e632aea1
Turn @config into a dataclass_transform ( #31541 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-03 17:40:59 +00:00
Richard Zou
b1bb18de8d
[torch.compile] Significantly speed up cold start times ( #33641 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2026-02-03 09:12:11 -08:00
Lucas Wilkinson
2267cb1cfd
[Attention][FA3] Update FA3 to include new swizzle optimization ( #23465 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2026-02-03 08:08:47 -08:00
dtc
0d6ccf68fa
[P/D] rework mooncake connector and introduce its bootstrap server ( #31034 )
...
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com >
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com >
2026-02-03 08:08:25 -08:00
Cyrus Leung
18e7cbbb15
[Bugfix] Fix startup hang for Granite Speech ( #33699 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-03 15:57:56 +00:00
Patrick von Platen
f0d5251715
[Voxtral models] Skip warm-up to skip confusing error message in warm-up ( #33576 )
...
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-03 07:22:34 -08:00
Shanshan Shen
5c4f2dd6ef
[MM] Pass prefix parameter to MMEncoderAttention ( #33674 )
...
Signed-off-by: shen-shanshan <467638484@qq.com >
2026-02-03 06:47:41 -08:00
wang.yuqi
f3d8a34671
[Bugfix] Do not add extra \n for image-only cases when constructing multimodal text prompts. ( #33647 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-02-03 06:43:47 -08:00
shaharmor98
4bc913aeec
Feat/add nemotron nano v3 tests ( #33345 )
2026-02-03 08:52:49 -05:00
Kuntai Du
fbb3cf6981
[Bugfix][Async][Connector] avoid vllm-side double free during async scheduling + request abort + async KV cache transfer ( #33377 )
...
Signed-off-by: KuntaiDu <kuntai@uchicago.edu >
2026-02-03 21:50:15 +08:00
Krish Gupta
2df2b3499d
Document NixlConnector backend selection via kv_connector_extra_config ( #33552 )
...
Signed-off-by: KrxGu <krishom70@gmail.com >
2026-02-03 05:49:59 -08:00
Harry Mellor
2a8d84e66d
Fix Gemma3n audio encoder for Transformers v5 ( #33673 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-03 05:49:49 -08:00
zxy
a3acfa1071
[Models] Intern-S1-Pro ( #33636 )
...
Signed-off-by: zxy <zhou0493@e.ntu.edu.sg >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-03 05:49:45 -08:00
Harry Mellor
be8168ff88
Fix Gemma3 GGUF for Transformers v5 ( #33683 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-03 12:36:53 +00:00
Harry Mellor
f6af34626d
Fix offline test for Transformers v5 ( #33682 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-03 12:07:24 +00:00
Song Zhixin
ceab70c89d
[Bugfix] fix qwen3-asr response error ( #33644 )
...
Signed-off-by: jesse <szxfml@gmail.com >
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2026-02-03 03:33:56 -08:00
Cyrus Leung
52683ccbe1
[Misc] Update default image format of encode_base64 ( #33656 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-03 03:13:16 -08:00
Michael Goin
e346e2d056
[Bugfix] Disable RoutingMethodType.[Renormalize,RenormalizeNaive] TRTLLM per-tensor FP8 MoE ( #33620 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-02-03 10:37:15 +00:00
Cyrus Leung
83449a5ff0
[Refactor] Clean up pooling serial utils ( #33665 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-03 10:29:18 +00:00
Lucas Hänke de Cansino
dad2d6a590
[Bugfix][Model] Fix DeepSeek-OCR-2 chat template to include BOS token ( #33642 )
...
Signed-off-by: l4b4r4b4b4 <lucas.cansino@mail.de >
2026-02-03 00:35:58 -08:00
Isotr0py
32e84fa1ff
[CI/Build] Investigate torchrun distributed tests hanging issue ( #33650 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-03 15:49:17 +08:00
Richard Zou
fd9c83d0e0
[torch.compile] Document the workaround to standalone_compile failing ( #33571 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2026-02-03 07:16:55 +00:00
杨朱 · Kiki
b95cc5014d
[Misc] Remove deprecated VLLM_ALL2ALL_BACKEND environment variable ( #33535 )
...
Signed-off-by: carlory <baofa.fan@daocloud.io >
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com >
2026-02-03 15:01:59 +08:00
Nick Hill
61397891ce
[Minor] Some code simplification in scheduler.py ( #33597 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-02-03 15:00:00 +08:00
杨朱 · Kiki
ef248ff740
[Misc] Remove deprecated profiler environment variables ( #33536 )
...
Signed-off-by: carlory <baofa.fan@daocloud.io >
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com >
2026-02-03 14:58:44 +08:00
Kunshang Ji
e10604480b
[XPU][1/N] Deprecate ipex and switch to vllm-xpu-kernels for xpu platform ( #33379 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2026-02-02 22:46:10 -08:00
Chauncey
bf001da4bf
[Bugfix] Interleaved thinking keeps compatibility with reasoning_content ( #33635 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
Co-authored-by: Koushik Dutta <koushd@gmail.com >
2026-02-03 06:46:05 +00:00
杨朱 · Kiki
a0a984ac2e
[CI/Build] Remove hardcoded America/Los_Angeles timezone from Dockerfiles ( #33553 )
...
Signed-off-by: carlory <baofa.fan@daocloud.io >
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com >
2026-02-02 22:32:39 -08:00
Shengliang Xu
f1cb9b5544
Fix quantized Falcon-H1 model loading issues ( #32728 )
...
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-02 22:31:27 -08:00
Daniel Mescheder
4c4b6f7a97
[Frontend] Add sampling parameters to Responses API ( #32609 )
...
Signed-off-by: Daniel Mescheder <dmesch@amazon.com >
Co-authored-by: Daniel Mescheder <dmesch@amazon.com >
2026-02-03 13:51:10 +08:00
Roger Wang
10546f925a
[Bugfix] Fix mm budget setting for Qwen Omni models ( #33634 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2026-02-03 04:56:25 +00:00
Radu Salavat
e69c990c21
[Feature][CPU Backend]: Optimize ARM vectorization backend ( #30329 )
...
Signed-off-by: Radu Salavat <radu.salavat@arm.com >
2026-02-02 20:17:56 -08:00
Richard Zou
5eac9a1b34
[torch.compile] Don't do the fast moe cold start optimization if there is speculative decoding ( #33624 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-02-03 03:38:49 +00:00
Nathan Weinberg
1b60b45d0d
[CI/Build] add directions for CPU image upload to Docker Hub ( #32032 )
...
Signed-off-by: Nathan Weinberg <nweinber@redhat.com >
Signed-off-by: Nathan Weinberg <31703736+nathan-weinberg@users.noreply.github.com >
Co-authored-by: Li, Jiang <bigpyj64@gmail.com >
2026-02-03 02:48:06 +00:00
Dezhan
4b3803d180
[BugFix] DPMetadata raises assert error for dense model ( #32739 )
...
Co-authored-by: Dezhan Tu <dztu@meta.com >
2026-02-03 00:56:44 +00:00
Patrick von Platen
5019c59dd2
[Voxtral Realtime] Introduce global log mel max ( #33574 )
...
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-02 17:01:47 -05:00
Lain
089cd4f002
fix cutlass_3x_gemm_fp8_blockwise on sm103a ( #32224 )
...
Signed-off-by: Siyuan Fu <siyuanf@nvidia.com >
Co-authored-by: Pavani Majety <pmajety@nvidia.com >
2026-02-02 11:47:46 -08:00
Vasiliy Kuznetsov
0130223bd9
fix memory for online fp8 quantization with streaming weight load ( #31914 )
...
Signed-off-by: vasiliy <vasiliy@fb.com >
2026-02-02 14:17:42 -05:00
Matthew Bonanni
5d1aef3004
[UX] Format attention backend log line ( #33570 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-02-02 18:57:12 +00:00
yugong333
ffe1fc7a28
Reduce the kernel overhead when num of active loras is smaller than max loras. Multiple cuda graphs are captured for each num of active-loras. ( #32005 )
...
Signed-off-by: Yu Gong <yu3.gong@gmail.com >
2026-02-02 12:30:06 -05:00
Harry Mellor
8b7346d5f1
Update huggingface-hub again ( #33567 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-02 09:20:54 -08:00
Harry Mellor
6141ebe0dd
Remove incorrect tokenizer info test ( #33565 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-02 17:11:44 +00:00
Yang Liu
199e3cb476
[Model] Use mm_position to compute mrope positions for GLM-4.xV ( #33039 )
...
Signed-off-by: Yang <lymailforjob@gmail.com >
2026-02-02 16:55:48 +00:00
Matthew Bonanni
9f8cb81b44
[CI] Add DeepSeek V3.2 nightly eval ( #33566 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-02-02 16:10:02 +00:00
Cyrus Leung
d7e17aaacd
[Refactor] Move profiling methods to MM budget ( #33559 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-02 23:27:00 +08:00
Kebe
528e9b1490
[Feature][Core] Support Fabric detection to adapt the MNNVL protocol for the GB series ( #33540 )
...
Signed-off-by: Kebe <mail@kebe7jun.com >
Signed-off-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: Thomas Vegas <tvegas@nvidia.com >
Co-authored-by: youkaichao <youkaichao@gmail.com >
2026-02-02 22:55:46 +08:00
shanjiaz
d95b4be47a
move spec decode slow test to test_areas.yaml ( #33365 )
...
Signed-off-by: shanjiaz <zsjwpianpian@gmail.com >
2026-02-02 06:28:36 -08:00
Isotr0py
4061dcf4c5
[Bugfix] Enable Kimi k25 processor test ( #33562 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-02 14:25:25 +00:00
danielafrimi
0aca8b8c62
[MoE] Enable Shared/Routed Overlap For Latent MoE (Nemotron-H) ( #32790 )
...
Signed-off-by: dafrimi <dafrimi@nvidia.com >
2026-02-02 09:18:50 -05:00
Rabi Mishra
9eb58f8cf1
fix[ROCm]: Remove unconditional aiter import ( #32902 )
...
Signed-off-by: rabi <ramishra@redhat.com >
2026-02-02 22:10:02 +08:00
Cyrus Leung
b10d05b8a8
[Model] Use explicit types in get_generation_prompt ( #33551 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-02 12:38:49 +00:00
Borushiki
b398e5c819
Update get_expert_mapping to include self parameter ( #33525 )
...
Signed-off-by: Borushiki <38628261+Otsutsukii@users.noreply.github.com >
2026-02-02 20:29:07 +08:00
Grzegorz K. Karch
78061ef584
Fix accessing hidden_act from model config ( #32686 )
...
Signed-off-by: Grzegorz Karch <gkarch@nvidia.com >
2026-02-02 11:11:33 +00:00
Nicolò Lucchesi
528b3076af
[CI][Bugfix] Fix flaky tests/v1/kv_connector/unit/test_multi_connector.py::test_multi_example_connector_consistency ( #33555 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-02-02 03:01:29 -08:00
Cyrus Leung
a502831d36
[Chore] Remove redundant input parsing methods ( #33542 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-02 10:50:47 +00:00
Komal Kumar Teru
ba871fb788
[Misc] support arbitrary MM datasets in spec dec bench ( #33486 )
...
Signed-off-by: kkt-cohere <komal@cohere.com >
Signed-off-by: Komal Kumar Teru <162363718+kkt-cohere@users.noreply.github.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2026-02-02 08:49:48 +00:00
R3hankhan
ab374786c7
[CPU][IBM Z][Dockerfile] Fix IBM Z builds ( #33243 )
...
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com >
2026-02-01 23:41:29 -08:00
RED
808dd87b30
[Model] Support DeepSeek-OCR-2 ( #33165 )
...
Signed-off-by: liuli <ll407707@alibaba-inc.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: liuli <ll407707@alibaba-inc.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-02 06:24:10 +00:00
Andy Lo
beb8899482
Fix mistral sliding window parsing ( #33521 )
...
Signed-off-by: Andy Lo <andy@mistral.ai >
2026-02-02 05:08:04 +00:00
Sawyer Bowerman
ce88756b96
[Doc]: update paths for Offline/Online/Others example sections ( #33494 )
...
Signed-off-by: Sawyer Bowerman <sbowerma@redhat.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-02 03:56:53 +00:00
Paco Xu
a3154a6092
[Doc] add missing model entries in supported_models.md ( #33220 )
...
Signed-off-by: Paco Xu <paco.xu@daocloud.io >
2026-02-02 03:37:25 +00:00
jack
7c036432fc
[Bugfix] GLM-4 tool parser: incremental string streaming ( #33218 )
...
Signed-off-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com >
Co-authored-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com >
2026-02-02 11:13:31 +08:00
Robert Shaw
318b120766
[Nightly CI] Remove CT Model ( #33530 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2026-02-01 19:09:09 -08:00
csy0225
c3b40dc3e7
[Models] Step-3.5-Flash ( #33523 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
Co-authored-by: i-zhangmingming <i-zhangmingming@stepfun.com >
Co-authored-by: xiewuxun <xiewuxun@stepfun.com >
Co-authored-by: zetaohong <i-hongzetao@stepfun.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2026-02-02 10:21:18 +08:00
Yifan Qiao
a01ef3fa51
[Fix] prefix cache hit rate == 0 bug with gpt-oss style models ( #33524 )
...
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu >
2026-02-02 01:59:58 +00:00
Runkai Tao
7320ca3942
Add unpermute-aware fused MoE LoRA path ( #32655 )
...
Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu >
2026-02-02 09:46:09 +08:00
Nick Hill
cf0a99f84d
[ModelRunner V2] Support spec decode with structured outputs ( #33374 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-02-02 00:19:59 +00:00
Nick Hill
e535d90deb
[ModelRunner V2] Misc minor simplifications and optimizations ( #33467 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-02-01 22:17:14 +00:00
Komal Kumar Teru
0b225fb7b2
[Misc] skip target model mm emb in draft proposal step when draft is text-only ( #33437 )
...
Signed-off-by: kkt-cohere <komal@cohere.com >
2026-02-01 21:13:35 +00:00
will b.
46b4a02794
Fix DeepSeek V2 RoPE initialization error ( #33501 )
...
Signed-off-by: Eduardo Salinas <edus@microsoft.com >
Signed-off-by: catswe <212922539+catswe@users.noreply.github.com >
Co-authored-by: Eduardo Salinas <edus@microsoft.com >
2026-02-01 21:00:56 +00:00
shaharmor98
8869cd8ec1
Add MoE config for Super B200 TP2 ( #33510 )
2026-02-01 18:48:37 +00:00
JartX
cd86fff38f
[BUGFIX] Fix hipErrorIllegalState in Qwen3-Omni during startup profiling allow inference Omni on ROCM ( #33077 )
...
Signed-off-by: JartX <sagformas@epdcenter.es >
2026-02-01 13:36:25 +00:00
Maral
b5f8c3092d
[W8A8 Block Linear Refactor][1/N] Keep all quantization types into QuantFP8 class. ( #33047 )
...
Signed-off-by: maral <maralbahari.98@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-01 09:28:01 +00:00
Cyrus Leung
21997f45b1
[Redo] #33110 with threading limit ( #33502 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: YunzhuLu <lucia.yunzhu@gmail.com >
2026-02-01 09:18:11 +00:00
Luka Govedič
672023877b
Change defaults for vllm bench startup ( #33489 )
2026-01-31 23:46:01 -08:00
Zack Yu
754a8ca942
fix: only include Authorization header when OPENAI_API_KEY is set ( #33488 )
...
Signed-off-by: zack041 <zackyu041@gmail.com >
2026-01-31 23:35:09 -08:00
Eduardo Salinas
302ecf64ff
[Models]: lfm2_siglip2 return intermediate encoder layers ( #33370 )
...
Signed-off-by: Eduardo Salinas <edus@microsoft.com >
2026-02-01 06:17:49 +00:00
Cyrus Leung
b6bb2842cf
[Critical] Revert #33110 ( #33500 )
2026-01-31 21:06:42 -08:00
Cyrus Leung
79b6ec6aab
[Bugfix] Fix inconsistent handling of cache reset ( #33481 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-31 20:23:41 -08:00
Greg Pereira
d6416fdde9
pin LMCache to v0.3.9 or greater with vLLM v0.15.0 ( #33440 )
...
Signed-off-by: greg pereira <grpereir@redhat.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-01-31 20:50:38 -07:00
Andreas Karatzas
0fb3157267
[ROCm][CI] Update huggingface-hub pin ( #33492 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-01 02:51:54 +00:00
Cyrus Leung
a358e4dffe
[Refactor] Make Renderer an abstract class ( #33479 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-01 10:36:30 +08:00
René Honig
079781177a
fix: Add SM120 (RTX Blackwell) support for FlashInfer CUTLASS NVFP4 MoE kernels ( #33417 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
2026-01-31 14:06:42 -08:00
Roy Wang
63c0889416
[Misc] Fix flashinfer related tests ( #33462 )
...
Signed-off-by: esmeetu <jasonailu87@gmail.com >
2026-01-31 16:10:24 -05:00
smashyalts
1e86c802d4
Fix grammar ( #33121 )
...
Signed-off-by: smashyalts <smashyalts@gmail.com >
2026-01-31 09:59:34 -08:00
linhaifeng
fedf64332e
[Bugfix]: Fix display errors in TORCH_CHECK messages ( #32942 )
...
Signed-off-by: linhaifeng <1371675203@qq.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2026-01-31 09:48:48 -08:00
Xiao Yang
2238a12c13
[Misc] support collect_env for endpoint /server_info ( #33246 )
...
Signed-off-by: yang.xiao <yang.xiao@daocloud.io >
2026-02-01 01:42:59 +08:00
Harry Mellor
ce0afe2451
Update huggingface-hub pin for the last time before Transformers v5 ( #33473 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-01-31 09:14:24 -08:00
Cyrus Leung
88c3e114d8
[Refactor] Move MM data parsing outside processor ( #33408 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-31 16:46:14 +00:00
Cyrus Leung
92924b2ddd
[Deprecation] Remove deprecated items related to pooling ( #33477 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-31 08:44:40 -08:00
YunzhuLu
27cb2f678f
[Bugfix] Early-reject requests with MM data longer than encode cache capacity ( #33110 )
...
Signed-off-by: YunzhuLu <lucia.yunzhu@gmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2026-01-31 08:41:13 -08:00
jma99_2333
22d9a056d5
Support clear mm and encoder cache ( #33452 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.io >
2026-01-31 15:22:25 +00:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟
13b842f271
[BugFix][Router Replay] Capture Logical Experts with EPLB ( #33013 )
...
Signed-off-by: Hollow Man <hollowman@opensuse.org >
2026-01-31 10:12:17 -05:00
Luka Govedič
15f40b20aa
[fix][torch.compile] Fix cold-start compilation time increase by adding kv cache update to splitting ops ( #33441 )
...
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
Co-authored-by: Richard Zou <zou3519@gmail.com >
2026-01-31 06:48:34 -08:00
Cyrus Leung
793af538a3
[Doc] Update plugin deprecation notices ( #33476 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-31 22:48:28 +08:00
cmunley1
6f5e7cda57
support return prompt token ids in responses ( #33378 )
2026-01-31 06:04:20 -08:00
Roy Wang
68feb76a6f
[Misc] Replace deprecated interface seed_everything ( #33474 )
...
Signed-off-by: esmeetu <jasonailu87@gmail.com >
2026-01-31 05:38:39 -08:00
Cyrus Leung
4cb59dea6a
[Bugfix] Fix incompatibility between #33372 and #32863 ( #33475 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-31 05:21:32 -08:00
Angela Yi
608b556507
[ez] Add structured torch.compile logs ( #33213 )
...
Signed-off-by: angelayi <yiangela7@gmail.com >
2026-01-31 21:00:54 +08:00
Cyrus Leung
f0a1c8453a
[Frontend] Use new Renderer for Completions and Tokenize API ( #32863 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-31 04:51:15 -08:00
caozuoba
8980001c93
[perf] v1/spec_decode: skip softmax for all-greedy rejection sampling ( #32852 )
...
Signed-off-by: hdj <1293066020@qq.com >
2026-01-31 09:51:26 +00:00
jennyyyyzhen
527bcd14d4
[ROCM] Enable aiter attn backend for qwen3-next model ( #32492 )
...
Signed-off-by: jennyyyyzhen <yzhen@hmc.edu >
2026-01-31 17:03:57 +08:00
Jinwu
f68e3ea4e1
[BugFix] Add synchronize in CutlassW4A8LinearKernel to ensure data is ready for use. ( #33078 )
...
Co-authored-by: jinwuguo <jinwuguo@tencent.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2026-01-31 08:14:54 +00:00
Yanan Cao
d5c41db35b
[Kernel] [Helion] [3/N] Helion kernel registry ( #33203 )
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com >
2026-01-31 15:38:46 +08:00
Fadi Arafeh
1618e25492
[CPU][Feat] Enable KleidiAI accelerated int4 dynamic quant with BF16 activations on Arm CPUs ( #33122 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
2026-01-31 07:16:22 +00:00
AutumnAurelium
f3888aca83
Add EAGLE3 support for AFMoE ( #33111 )
...
Signed-off-by: AutumnAurelium <88015631+AutumnAurelium@users.noreply.github.com >
2026-01-31 06:53:08 +00:00
Dimitrios Bariamis
f0bca83ee4
Add support for Mistral Large 3 inference with Flashinfer MoE ( #33174 )
...
Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com >
Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-01-30 22:48:27 -08:00
Matthias Gehre
73419abfae
[Bugfix] Handle Asym W4A16 (ConchLinearKernel) for CT ( #33200 )
...
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com >
Co-authored-by: Cursor <cursoragent@cursor.com >
2026-01-31 06:21:51 +00:00
Nicolò Lucchesi
e77f162cf5
[Bugfix] Fix Qwen3ASR language asr tag in output ( #33410 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-01-31 05:24:49 +00:00
Yanan Cao
8ecd213c0b
[Kernel] [Helion] [2/N] Helion kernel wrapper ( #32964 )
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com >
2026-01-31 12:53:01 +08:00
Francesco Fusco
5b55c0bea7
[Attention] Clarify comment explaining attn_logits +1 dimension ( #33427 )
...
Signed-off-by: Francesco Fusco <ffu@zurich.ibm.com >
2026-01-31 04:50:30 +00:00
Patrick von Platen
15e0bb9c42
[Streaming -> Realtime] Rename all voxtral related classes, fn, files ( #33415 )
...
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com >
2026-01-31 04:49:00 +00:00
Micah Williamson
6c64c41b4a
[ROCm][CI] Force max_num_seqs=1 on ROCm In test_sharded_state_loader to reduce flakiness ( #33277 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2026-01-31 12:28:29 +08:00
Russell Bryant
a2ef06e1b3
[Misc] offest -> offset in comments and variable names ( #33444 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2026-01-30 20:19:22 -08:00
Lucas Wilkinson
0a3c71e7e5
[BugFix] Fix whisper FA2 + full cudagraphs ( #33360 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-01-31 12:15:06 +08:00
Michael Goin
29fba76781
[UX] Use gguf repo_id:quant_type syntax for examples and docs ( #33371 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-01-31 12:14:54 +08:00
Isotr0py
9df152bbf6
[Misc] Align Qwen3-VL-embedding image example outputs with HF repo example ( #33419 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-01-30 19:36:56 -08:00
Nick Hill
876a16f4fb
[ModelRunner V2] Fix spec decoding + logprobs ( #33391 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-01-31 03:33:26 +00:00
Matthew Bonanni
aaa901ad55
[Attention] Move MLA forward from backend to layer ( #33284 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-01-30 19:30:00 -08:00
Wentao Ye
010ec0c30e
[Deprecation] Deprecate seed_everything and scatter_mm_placeholders in v0.15 ( #33362 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-01-31 02:54:16 +00:00
Alberto Ferrer
64a40a7ab4
[Bugfix] Fix typo in read_offset variable name ( #33426 )
...
Signed-off-by: Alberto Ferrer <albertof@barrahome.org >
2026-01-31 01:26:15 +00:00
Gregory Shtrasberg
31aedfe7d6
[Bugfix][ROCm] Fixing the skinny gemm dispatch logic from #32831 ( #33366 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2026-01-30 19:05:23 -06:00
Michael Goin
67ebaff528
Refactor NVFP4 Linear utils for ModelOpt and CT ( #33201 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-01-30 16:37:42 -08:00
Chendi.Xue
2b465570e6
[CI][HPU]accelerate hpu test by skip python re-install and clean container name ( #33286 )
...
Signed-off-by: Chendi Xue <chendi.xue@intel.com >
2026-01-30 21:36:29 +00:00
Huy Do
9ca66ecc10
Indicate compile mode in the benchmark results ( #32990 )
...
Signed-off-by: Huy Do <huydhn@gmail.com >
2026-01-30 15:34:36 -05:00
Pavani Majety
c3a9752b0c
[Hardware][SM100] Add TRTLLM Kernel for INT4 W4A16 Kernel. ( #32437 )
...
Signed-off-by: Pavani Majety <pmajety@nvidia.com >
2026-01-30 10:30:46 -08:00
xuebwang-amd
f451b4558b
[Quantization][ROCm] Fix MoE weight loading to be robust (Qwen3_MoE/Qwen3_next as example models) ( #33173 )
...
Signed-off-by: xuebwang-amd <xuebwang@amd.com >
2026-01-30 17:50:23 +00:00
Vasiliy Kuznetsov
3f96fcf646
fix QERL attention import path ( #33432 )
...
Signed-off-by: vasiliy <vasiliy@fb.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-01-30 09:29:09 -08:00
Yanan Cao
6c1f9e4c18
[Kernel] [Helion] [1/N] Add Helion ConfigManager ( #32740 )
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com >
2026-01-30 12:19:19 -05:00
Harry Mellor
67239c4c42
Fix encoder-decoder model disabling mm processor cache ( #33236 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-01-30 16:30:10 +00:00
Nicolò Lucchesi
8ece60768f
[CI] Qwen3-ASR transcriptions tests ( #33414 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-01-30 16:17:56 +00:00
Michael Goin
fd0e377244
Support FP8 block quant for CompressedTensorsW8A16Fp8 ( #33280 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-01-30 11:15:20 -05:00
Kyle Sayers
f857a03f6b
[QeRL] Layerwise Reloading ( #32133 )
...
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com >
2026-01-30 08:50:05 -07:00
Danielle Robinson
74898a7015
[BugFix][LoRA] TritonExperts is ModularMoEPath for FP8 models ( #33393 )
...
Signed-off-by: Danielle Robinson <dmmaddix@amazon.com >
Co-authored-by: Danielle Robinson <dmmaddix@amazon.com >
2026-01-30 15:27:42 +00:00
Frank Wang
8f5d51203b
Disable Cascade Attention for Batch Invariance ( #32561 )
...
Signed-off-by: frankwang28 <frank.wbb@hotmail.com >
Signed-off-by: Frank Wang <41319051+frankwang28@users.noreply.github.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2026-01-30 10:00:46 -05:00
Julien Denize
ae5b7aff2b
Improve Mistral format checks. ( #33253 )
...
Signed-off-by: Julien Denize <julien.denize@mistral.ai >
Signed-off-by: juliendenize <julien.denize@mistral.ai >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-01-30 06:23:33 -08:00
Harry Mellor
a11bc12d53
Fix test_moe.py for Transformers v5 ( #33413 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-01-30 14:03:25 +00:00
Nathan Weinberg
58cb55e4de
[Doc] Enhance documentation around CPU container images ( #32286 )
...
Signed-off-by: Nathan Weinberg <nweinber@redhat.com >
2026-01-30 13:36:20 +00:00
杨朱 · Kiki
cf896ae0e3
[Misc] Clean up HIDDEN_DEPRECATED_METRICS after metric removal ( #33323 )
...
Signed-off-by: carlory <baofa.fan@daocloud.io >
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com >
2026-01-30 13:31:17 +00:00
Harry Mellor
c5113f60f2
Remove deprecated reasoning_content message field ( #33402 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-01-30 11:48:15 +00:00
vllmellm
174f16700b
[Doc] [ROCm] Update Documentation to reflect v0.15.0 release ( #33388 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2026-01-30 19:06:08 +08:00
Julien Denize
8e2ad97ad0
[BUGFIX] Pixtral cannot be loaded with --limit-mm-per-prompt 0 ( #33406 )
...
Signed-off-by: juliendenize <julien.denize@mistral.ai >
2026-01-30 02:52:02 -08:00
Patrick von Platen
10152d2194
[Realtime API] Adds minimal realtime API based on websockets ( #33187 )
...
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com >
Co-authored-by: Nick Hill <nickhill123@gmail.com >
2026-01-30 18:41:29 +08:00
杨朱 · Kiki
1a7894dbdf
[Misc] Replace Optional[X] with X | None syntax ( #33332 )
...
Signed-off-by: carlory <baofa.fan@daocloud.io >
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com >
2026-01-30 01:56:59 -08:00
Cyrus Leung
c87eac18f7
[Refactor] Move MM item count validation outside of processor ( #33396 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-30 09:27:31 +00:00
tianshu-Michael-yu
f45870b53f
fix: allow LFM2 MoE prefix caching (align) ( #33376 )
...
Signed-off-by: Tianshu Yu <tianshuyu.formal@gmail.com >
2026-01-30 08:23:14 +00:00
hujiaxin0
ba45bedfd1
[model] Add support for openPangu7B-VL ( #32449 )
...
Signed-off-by: hujiaxin <524446785@qq.com >
Signed-off-by: Emilie1001 <79921183+Emilie1001@users.noreply.github.com >
Co-authored-by: Emilie1001 <79921183+Emilie1001@users.noreply.github.com >
2026-01-30 15:54:27 +08:00
Harry Mellor
9432ed8c7e
Explicitly set return_dict for apply_chat_template ( #33372 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-01-30 07:27:04 +00:00
Lucas Kabela
726d89720c
[CI] Enable mypy import following for vllm/spec_decode ( #33282 )
...
Signed-off-by: Lucas Kabela <lucaskabela@meta.com >
2026-01-30 06:43:32 +00:00
Harry Mellor
d334dd26c4
Move decode context parallel validation to ParallelConfig ( #33239 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-01-30 06:18:41 +00:00
Ryan Rock
070c811d6f
[CI][AMD] Skip 4 GPUs testgroup ray tests ( #33305 )
...
Signed-off-by: Ryan Rock <ryan.rock@amd.com >
Co-authored-by: TJian <tunjian.tan@embeddedllm.com >
2026-01-29 21:39:53 -08:00
Isotr0py
8bfc8d5600
[Models] Refactor Kimi-K2.5 weight loading ( #33346 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-01-30 05:31:20 +00:00
Harry Huang
ec51831a22
[BugFix] Disable async scheduling for Mamba prefix caching ( #33352 )
...
Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com >
2026-01-30 04:40:19 +00:00
Harry Mellor
80b918f2bd
Fix tie_word_embeddings for multimodal models in Transformers v5 ( #33359 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-01-30 03:37:39 +00:00
Wang Haoyu
c46b0cd0af
[Model][Multimodal] Add explicit MusicFlamingo adapter ( #32696 )
...
Signed-off-by: WangHaoyuuu <mailwhaoyu@gmail.com >
2026-01-30 11:01:29 +08:00
Aidan Reilly
133765760b
[Docs] Adding links and intro to Speculators and LLM Compressor ( #32849 )
...
Signed-off-by: Aidan Reilly <aireilly@redhat.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-01-29 14:12:35 -08:00
Michael Goin
bfb9bdaf3f
[Bugfix] Enable Triton MoE for FP8 per-tensor dynamic ( #33300 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-01-29 12:15:17 -08:00
Kevin H. Luu
2284461d02
[release] Minor fixes to release annotation and wheel upload ( #33129 )
...
Signed-off-by: khluu <khluu000@gmail.com >
2026-01-29 12:09:35 -08:00
danisereb
8e2a469b3b
Add Triton fused MoE config for B200 (Nemotron Nano) ( #32804 )
2026-01-29 19:21:33 +00:00
CarstyYou
23591e631e
[Bugfix][Kernel] Fix negative memory offset in GDN Triton kernel ( #33326 )
...
Signed-off-by: CarstyYou <186021327+CarstyYou@users.noreply.github.com >
2026-01-29 10:40:11 -08:00
Linda
0493d897c4
[NVIDIA] [feat] Integrate flashinfer Trtllmgen bf16 moe ( #32954 )
...
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com >
2026-01-29 10:00:13 -08:00
Chendi.Xue
8c8ebeb941
[BUGFIX][XPU] fix memory check after XPU reuse GPU_worker ( #33358 )
...
Signed-off-by: Chendi Xue <chendi.xue@intel.com >
2026-01-29 09:56:30 -08:00
Cyrus Leung
831453fcef
[Chore] Move MediaConnector to vllm.multimodal.media ( #33324 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-29 16:54:31 +00:00
Angela Yi
5a66c9cc76
[ez] Delete torch25_custom_graph_pass ( #33287 )
...
Signed-off-by: angelayi <yiangela7@gmail.com >
2026-01-29 16:47:05 +00:00
Isotr0py
5e73e4900c
[Bugfix] Fix broken GLM-OCR initialization ( #33350 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-01-29 07:56:05 -08:00
Cyrus Leung
c6e7404cc5
[Multimodal] Simplify MM input definitions ( #33331 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-29 13:32:04 +00:00
sthWrong
17b17c0684
[Backport] [Kimi-K2.5] Replace torch.cuda with current_platform for d… ( #33320 )
2026-01-29 12:29:17 +00:00
Kunshang Ji
8bb6271c77
[Intel GPU] refine xpu worker ( #32894 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2026-01-29 12:26:52 +00:00
Roger Wang
8b3f0a99dd
[Models] Qwen3-ASR ( #33312 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2026-01-29 19:27:15 +08:00
Li, Jiang
8311f083bd
[Bugfix][CPU] Fix thread num for shared memory communication ( #33317 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
Signed-off-by: Li, Jiang <bigpyj64@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-29 03:26:58 -08:00
Patrick von Platen
40c35038d2
[Voxtral] Streaming example ( #33042 )
...
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2026-01-29 03:22:49 -08:00
zofia
a5aa4d5c0f
[Quantization][Refactor] use platform dict to choose kernel ( #33130 )
...
Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com >
Signed-off-by: zofia <110436990+zufangzhu@users.noreply.github.com >
2026-01-29 10:44:58 +00:00
andrii.pasternak
615e8033e5
[Bug Fix] Handle variable-length tensors in MultiModalFlatField batching ( #31751 )
...
Signed-off-by: Andrii Pasternak <andriipasternak31@gmail.com >
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com >
2026-01-29 10:42:59 +00:00
Ilya Markov
d09135fbd0
[BugFix] Async Eplb fix potential race condition ( #32881 )
...
Signed-off-by: ilmarkov <markovilya197@gmail.com >
2026-01-29 10:31:40 +00:00
daniel-salib
8688c3d460
[fix] test mcp_tool_calling_streaming with a more complex math question ( #32769 )
...
Signed-off-by: Daniel Salib <danielsalib@meta.com >
2026-01-29 10:25:58 +00:00
Isotr0py
5400014d55
[Chore] Remove use_data_parallel kwargs from ViT implementation ( #33310 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-01-29 10:20:52 +00:00
Isotr0py
3a92c6f3b5
[Misc] Cleanup Kimi-K2.5's vision chunk modality entrypoints ( #33157 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-01-29 09:46:02 +00:00
amirkl94
e01ff5c070
Bugfix: Pass router logits dtype in nemotron shared experts ( #32669 )
...
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com >
2026-01-29 09:36:34 +00:00
Harry Mellor
fb946a7f89
Make mypy opt-out instead of opt-in ( #33205 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-01-29 09:12:26 +00:00
Lucas Wilkinson
a650ad1588
[Misc] Remove missed pad_for_cudagraph ( #33283 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2026-01-29 09:12:05 +00:00
graftim
d697581a7c
[Doc] Update outdated link to Ray documentation ( #32660 )
...
Signed-off-by: graftim <38649219+graftim@users.noreply.github.com >
2026-01-29 00:56:06 -08:00
shanjiaz
5eeba80c74
Adding optional speculator tests for larger models ( #32943 )
...
Signed-off-by: shanjiaz <zsjwpianpian@gmail.com >
2026-01-29 16:54:02 +08:00
whx
08b1195e62
[PluggableLayer][2/N] Apply PluggableLayer to linear layers ( #33152 )
...
Signed-off-by: whx-sjtu <2952154980@qq.com >
2026-01-29 16:53:15 +08:00
cmunley1
3bba2edb0f
support returning tokenids in responses api ( #33212 )
...
Signed-off-by: Christian Munley <cmunley@nvidia.com >
2026-01-29 16:52:39 +08:00
Ilya Markov
53fc166402
[BugFix] Fix EPLB fail for MoeFP4 model with Marlin backend ( #33262 )
...
Signed-off-by: ilmarkov <markovilya197@gmail.com >
2026-01-29 16:52:11 +08:00
Didier Durand
31b25f6516
[Doc]: fixing multiple typos in diverse files ( #33256 )
...
Signed-off-by: Didier Durand <durand.didier@gmail.com >
Signed-off-by: Didier Durand <2927957+didier-durand@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-29 16:52:03 +08:00
wang.yuqi
abb34ac43a
[Bugfix] Fix Qwen3-VL-Reranker load. ( #33298 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Signed-off-by: wang.yuqi <noooop@126.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-29 08:42:53 +00:00
Pengchao Wang
2515bbd027
[CI/Build][BugFix] fix cuda/compat loading order issue in docker build ( #33116 )
...
Signed-off-by: Pengchao Wang <wpc@fb.com >
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com >
2026-01-29 00:19:05 -08:00
TJian
c487a8eef4
[Release] [ROCm] Remove old build step ( #33316 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2026-01-28 23:35:51 -08:00
Kiersten Stokes
9e138cb01d
[Misc][Build] Lazy load cv2 in nemotron_parse.py ( #33189 )
...
Signed-off-by: kiersten-stokes <kierstenstokes@gmail.com >
2026-01-29 06:55:50 +00:00
TJian
f9d03599ef
[Release] [CI] Optim release pipeline ( #33156 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2026-01-28 22:45:42 -08:00
wangln19
39037d258e
Fix tool call indexing double-counting ( #33141 )
...
Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn >
2026-01-29 05:57:09 +00:00
Cyrus Leung
51550179fc
[Refactor] Define MM data parser in processing info instead of processor itself ( #33260 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-29 13:55:17 +08:00
Angela Yi
07ea184f00
[ez] Delete more torch version checks <= 2.8 ( #33288 )
...
Signed-off-by: angelayi <yiangela7@gmail.com >
2026-01-29 05:28:46 +00:00
Or Ozeri
a663b218ae
[Misc] Add orozery to CODEOWNERS (core, kv_transfer, kv_offload) ( #33227 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com >
2026-01-29 04:24:20 +00:00
Michael Goin
1bd47d6e5a
[Bugfix] Register fp8 cutlass_group_gemm as supported for only SM90+SM100 ( #33285 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-01-28 18:40:59 -08:00
Michael Goin
141cd43967
[UX] Remove noisy CT UnquantizedLinearMethod warn ( #33273 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-01-28 16:09:30 -08:00
Nick Hill
6bf3b46d78
[ModelRunner V2] Misc code simplification and cleanup ( #33266 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-01-28 14:41:23 -08:00
Matthew Bonanni
77c4f45c6c
[7/N][Attention][Docs] Add documentation for attention backends ( #32477 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-01-28 17:20:22 -05:00
Michael Goin
ca1969186d
[UX] Enable nested configs in config yaml files ( #33193 )
2026-01-28 16:54:25 -05:00
Gregory Shtrasberg
ab597c869a
[Bugfix] Add missing encoder only guard for do_kv_cache_update ( #33269 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2026-01-28 21:25:07 +00:00
Angela Yi
4197168ea5
[ez] Remove checks for torch version <= 2.8 ( #33209 )
...
Signed-off-by: angelayi <yiangela7@gmail.com >
2026-01-28 16:03:56 -05:00
Rohan Potdar
59bcc5b6f2
Use aiter triton fused_add_rmsnorm_pad for gpt-oss ( #30976 )
...
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com >
2026-01-28 20:47:47 +00:00
Wentao Ye
3e440786af
[Feature] Fully support for async scheduling + PP, 30.8% E2E throughput improvement, 31.8% TPOT improvement ( #32618 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Signed-off-by: Nick Hill <nickhill123@gmail.com >
Co-authored-by: Nick Hill <nickhill123@gmail.com >
2026-01-28 20:30:32 +00:00
Kevin H. Luu
8bdd3979d8
[CI] Change GPU key to device key for B200 test ( #33275 )
...
Signed-off-by: khluu <khluu000@gmail.com >
2026-01-28 19:14:29 +00:00
Wentao Ye
c4e744dbd4
[Perf] Optimize moe_permute for CUTLASS FP8 ( #32892 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-01-28 10:15:24 -08:00
Nicolò Lucchesi
8ebf372e9d
[CI] Whisper tests enforce_eager=False ( #33098 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-01-28 09:36:56 -08:00
cwazai
f210f0b7b1
[lora/moe] Avoid extra intermediate buffer & Python slicing in expand phase when split_k == 1 ( #32774 )
...
Signed-off-by: 陈建华 <1647430658@qq.com >
2026-01-29 00:22:45 +08:00
Bin Bao
392c5af4fe
[Benchmark] Add startup benchmarking to buildkite run ( #33183 )
...
Signed-off-by: Bin Bao <binbao@meta.com >
2026-01-28 16:03:07 +00:00
Robert Shaw
af9b69f977
[Quantization][Deprecation] Remove Marlin 24 ( #32688 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-01-28 15:54:59 +00:00
Chauncey
8e5e40daf4
[Misc] Provide a DeepSeek ReasoningParser with thinking enabled by default ( #33221 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-01-28 21:16:53 +08:00
Or Ozeri
2e8de86777
Revert "Enable Cross layers KV cache layout at NIXL Connector ( #30207 )" ( #33241 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com >
Co-authored-by: Kevin H. Luu <khluu000@gmail.com >
2026-01-28 04:36:00 -08:00
Robert Shaw
247d1a32ea
[Quantization][Deprecation] Remove BitBlas ( #32683 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2026-01-28 11:06:22 +00:00
Kevin H. Luu
ecb4f82209
[CI] Update job dependency syntax for Intel and AMD jobs ( #33240 )
...
Signed-off-by: khluu <khluu000@gmail.com >
2026-01-28 01:33:59 -08:00
Kevin H. Luu
5914090765
[CI] Update job dependency for hardware and CPU jobs ( #33237 )
...
Signed-off-by: khluu <khluu000@gmail.com >
2026-01-28 01:10:05 -08:00
Harry Mellor
f1acbd68c5
[CI] Enable mypy import following for vllm/compilation ( #33199 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-01-28 08:59:54 +00:00
Yan Ma
9581185d51
[XPU]disable test_acceptance_length UT ( #33226 )
2026-01-28 15:24:13 +08:00
Maryam Tahhan
2dd359f953
[Docs] Simplify CPU x86 Docker build documentation ( #33071 )
...
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com >
2026-01-28 06:37:09 +00:00
Gregory Shtrasberg
22ad649501
[ROCm] Enabling forward_includes_kv_cache on ROCm MHA backends ( #33106 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2026-01-28 14:36:14 +08:00
ramos
36d450e3b8
Adds FunAudioChat multimodal audio model support ( #2 ) ( #33058 )
...
Signed-off-by: ramos <49182011+nemoramo@users.noreply.github.com >
Signed-off-by: mayufeng <mayufeng@example.com >
Co-authored-by: mayufeng <mayufeng@example.com >
2026-01-28 05:18:09 +00:00
22quinn
a2b877df6c
[Bugfix] Lazy import NgramProposer in GPU model runner ( #32821 )
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
2026-01-27 21:07:16 -08:00
Harry Mellor
35fb0b8613
Don't use min_pixels/max_pixels from Qwen2VL's processor ( #33208 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-01-28 05:02:08 +00:00
Harry Mellor
2eb673a088
Add flake8-implicit-str-concat rules to Ruff ( #33191 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-01-28 04:56:10 +00:00
Jeffrey Wang
a97b5e206d
Relax protobuf library version constraints ( #33202 )
...
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com >
2026-01-28 04:15:53 +00:00
Micah Williamson
911b51b69f
[ROCm][CI] Add TORCH_NCCL_BLOCKING_WAIT For Distributed Tests (A100) ( #32891 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2026-01-28 11:32:31 +08:00
Xinan Miao
604e3b87e8
[Feature]: Container image WORKDIR consistency ( #33159 )
...
Signed-off-by: SouthWest7 <am1ao@qq.com >
Co-authored-by: SouthWest7 <am1ao@qq.com >
2026-01-28 11:06:48 +08:00
Harry Mellor
706f123b23
[Docs] Use definition lists for CLI reference docs ( #33186 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Ashwin Phadke <23502062+ashwin-phadke@users.noreply.github.com >
2026-01-28 02:22:48 +00:00
Angela Yi
fb7abfc1d0
[docs] Improve tlparse section ( #33211 )
...
Signed-off-by: angelayi <yiangela7@gmail.com >
2026-01-28 02:07:37 +00:00
Kevin H. Luu
5d3d6e44e8
[CI] minor fixes to pipeline generator and tests ( #33151 )
...
Signed-off-by: khluu <khluu000@gmail.com >
2026-01-27 17:04:02 -08:00
Woosuk Kwon
46ec6d71c7
[Model Runner V2] Use a different stream for grammar bitmask h2d copy ( #33059 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
Co-authored-by: Nick Hill <nhill@redhat.com >
2026-01-27 16:37:43 -08:00
Matthew Bonanni
e82fa448c4
Add attention benchmarking tools ( #26835 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: Claude <noreply@anthropic.com >
2026-01-28 00:09:20 +00:00
Richard Zou
d9aa39a3bb
[torch.compile] Speed up MOE handling in forward_context ( #33184 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2026-01-27 15:17:54 -08:00
Wentao Ye
3a6d5cbefd
[Perf] Optimize dcp allocate tensor ( #33102 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-01-27 17:24:41 -05:00
linhaifeng
f5d7049cc1
[Bugfix] Fix display error (inconsistent with context) ( #33020 )
...
Signed-off-by: linhaifeng <1371675203@qq.com >
2026-01-27 20:33:29 +00:00
Alexei-V-Ivanov-AMD
3c3c547ce0
Enabling "2 node" distributed tests in the AMD CI pipeline. ( #32719 )
...
Signed-off-by: DCCS-4560 <alivanov@chi-mi325x-pod1-112.ord.vultr.cpe.ice.amd.com >
Co-authored-by: DCCS-4560 <alivanov@chi-mi325x-pod1-112.ord.vultr.cpe.ice.amd.com >
Co-authored-by: TJian <tunjian.tan@embeddedllm.com >
2026-01-27 19:13:21 +00:00
Matthew Bonanni
1cbccb6dba
[Attention] Use has_flashinfer helper ( #33177 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-01-27 18:33:17 +00:00
Iris
bd92089d33
feature: support eagle3 for HunyuanVL & Hunyuan ( #33035 )
...
Signed-off-by: irisliu10 <601012173@qq.com >
Signed-off-by: Iris <38269816+irisliu10@users.noreply.github.com >
2026-01-27 17:55:48 +00:00
Karan Bansal
a6760f1525
[Doc] Improve serve parameter documentation with meaningful defaults ( #33082 )
...
Signed-off-by: Karan Bansal <karanb192@gmail.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-01-27 09:19:37 -08:00
IriKa
66e601ef79
Support compress-tensors with nvfp4 or fp8 weights and modelopt with nvfp4 weights on Turing ( #33076 )
...
Signed-off-by: IriKa Qiu <qiujie.jq@gmail.com >
2026-01-27 11:04:05 -05:00
Nick Hill
0cd259b2d8
[BugFix] Fix P/D with non-MoE DP ( #33037 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-01-27 08:03:47 -08:00
danielafrimi
83fb2d09e8
Support heterogeneous NemotronHPuzzle model ( #32549 )
...
Signed-off-by: <dafrimi@nvidia.com >
Signed-off-by: Daniel Afrimi <dafrimi@nvidia.com >
Signed-off-by: root <dafrimi@nvidia.com >
2026-01-27 10:55:54 -05:00
danisereb
f3a5ee705f
[LoRA][Spec Decode] Support LoRA for Nemotron-H MTP models ( #32265 )
...
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2026-01-27 07:53:26 -08:00
wang.yuqi
7cbbca9aaa
[Frontend] Cleanup api server ( #33158 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Signed-off-by: wang.yuqi <noooop@126.com >
2026-01-27 15:18:10 +00:00
omkhalil
5ec44056f7
[Metrics][MFU] Fix UnembedMetrics FLOP overcounting for prefill ( #33045 ) ( #33045 )
...
Fix UnembedMetrics to correctly count FLOPs for the unembedding (LM head) layer.
The bug: UnembedMetrics used total_num_tokens(), which counts every token in the
batch, to compute projection FLOPs. In the autoregressive use case, the vocab
projection runs on just the last token.
Co-authored-by: Omar Mohamed Khalil <omarkhalil@meta.com >
2026-01-27 15:16:49 +00:00
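The overcounting described in the UnembedMetrics entry above can be illustrated with a minimal sketch. It is not vLLM code: every function name here is hypothetical, it assumes the common convention of 2 FLOPs per multiply-accumulate, and it assumes one projected token per request during prefill.

```python
# Hypothetical illustration of LM-head FLOP counting; none of these names come from vLLM.

def unembed_flops(num_projected_tokens: int, hidden_size: int, vocab_size: int) -> int:
    """FLOPs for projecting hidden states to vocab logits.

    Assumes 2 FLOPs (one multiply, one add) per weight per projected token.
    """
    return 2 * num_projected_tokens * hidden_size * vocab_size


def overcounted_prefill_flops(prompt_lens: list[int], hidden_size: int, vocab_size: int) -> int:
    # Counts every prompt token in the batch, as the buggy behaviour is described.
    return unembed_flops(sum(prompt_lens), hidden_size, vocab_size)


def corrected_prefill_flops(prompt_lens: list[int], hidden_size: int, vocab_size: int) -> int:
    # Counts one projected token per request (assumption: only the last token is projected).
    return unembed_flops(len(prompt_lens), hidden_size, vocab_size)


if __name__ == "__main__":
    lens = [4, 6]  # two requests with 4 and 6 prompt tokens
    print(overcounted_prefill_flops(lens, hidden_size=8, vocab_size=10))
    print(corrected_prefill_flops(lens, hidden_size=8, vocab_size=10))
```

Under these assumptions the overcount grows with prompt length, so long prefills would inflate the reported figure the most.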
Nicolò Lucchesi
492a7983dd
[Bugfix] Fix DeepseekV32 AssertionError: num_kv_heads == 1 ( #33090 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-01-27 15:03:20 +00:00
Matthew Bonanni
a608b4c6c2
[5/N][Attention] Finish eliminating vllm/attention folder ( #32064 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-01-27 10:02:51 -05:00
Nicolò Lucchesi
1f3a2c2944
[Bugfix] Disable CG for Whisper+FA2 ( #33164 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-01-27 21:46:51 +08:00
omerpaz95
7227d06156
[Metrics] [KVConnector] Add Offloading Connector metrics ( #27942 )
...
Added query and hit metrics for the Offloading Connector.
Also added timing metrics for store and load operations, which report the
average per-token time of a load or store.
The metrics are available from Prometheus and from the StatLogger.
Signed-off-by: omerpaz95 <omerpaz95@gmail.com >
Co-authored-by: Omer Paz <Omer.Paz@ibm.com >
2026-01-27 13:34:49 +00:00
Harry Mellor
14385c80fc
Fix weight mapping test for Transformers v5 ( #33162 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-01-27 12:30:14 +00:00
wang.yuqi
76139d0801
[Frontend] Frontend will only attach supported tasks corresponding entrypoints. ( #33139 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Signed-off-by: wang.yuqi <noooop@126.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2026-01-27 12:15:43 +00:00
Lifan Shen
da8d0c441a
[AMD][QWEN3-NEXT] FP8 Tunings ( #32042 )
...
Signed-off-by: Lifan Shen <lifans@meta.com >
2026-01-27 09:34:13 +00:00
rasmith
58996f3589
[AMD][Kernel][BugFix] Use correct scale in concat_and_cache_ds_mla_kernel when on gfx942 ( #32976 )
...
Signed-off-by: Randall Smith <ransmith@amd.com >
Signed-off-by: Randall Smith <Randall.Smith@amd.com >
Co-authored-by: Randall Smith <ransmith@amd.com >
2026-01-27 07:16:43 +00:00
Roger Wang
b539f988e1
[Models] Kimi-K2.5 ( #33131 )
...
Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn >
Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: youkaichao <youkaichao@gmail.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: wanglinian <wanglinian@stu.pku.edu.cn >
Co-authored-by: wangln19 <96399074+wangln19@users.noreply.github.com >
Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Nick Hill <nickhill123@gmail.com >
Co-authored-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-27 14:50:31 +08:00
Andreas Karatzas
6c00645712
[CI][Pooling] Stabilize ModernBERT test ( #32909 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-01-27 05:26:48 +00:00
Ning Xie
b781eeaa15
[code clean] remove duplicate code ( #33135 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2026-01-27 04:57:16 +00:00
Cyrus Leung
e0b005d9cf
[Frontend] Cleanup serving engine ( #33103 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-26 20:47:26 -08:00
Richard Zou
3b8f0fe59e
[torch.compile] Stop assuming 32 bit indexing ( #33113 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2026-01-27 04:25:02 +00:00
Cyrus Leung
c831911be2
[Frontend] Reduce mixin usage in serving pooling ( #33101 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-27 11:50:37 +08:00
Paco Xu
157caf511b
[Perf] avoid duplicate mem_get_info() call in get_current_memory_usage ( #33064 )
...
Signed-off-by: Paco Xu <paco.xu@daocloud.io >
2026-01-27 03:45:45 +00:00
Vincent Gimenes
0b53bec60b
[DOC]: Add warning about max_num_batched_tokens and max_model_len when chunked prefill is disabled ( #33109 )
...
Signed-off-by: Vincent Gimenes <147169146+VincentG1234@users.noreply.github.com >
2026-01-27 03:05:02 +00:00
Strahinja Stamenkovic
c568581ff3
Fix IndexError with encoder-decoder models when using Custom Paged Attention ( #33112 )
...
Signed-off-by: sstamenk <strahinja.stamenkovic@amd.com >
2026-01-27 10:33:37 +08:00
wangln19
2d7053438a
fix: preserve native tool call ID in multi-turn tool calling ( #32768 )
...
Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn >
Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Isotr0py <2037008807@qq.com >
2026-01-27 10:22:35 +08:00
Robert Shaw
5a93b9162b
[MoE Refactor] Integrate Naive Prepare Finalize into MK ( #32567 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: amirkl94 <203507526+amirkl94@users.noreply.github.com >
2026-01-27 01:28:02 +00:00
Woosuk Kwon
6d86fde09c
[Model Runner V2] Remove UvaBufferPool for cpu->gpu copy ( #33055 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
Co-authored-by: Nick Hill <nhill@redhat.com >
2026-01-26 16:47:35 -08:00
XiongfeiWei
510ed1e8d3
[Bugfix][TPU] Return a Default fp8 MoE Backend ( #32908 )
...
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com >
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-01-26 18:46:11 -05:00
Pengchao Wang
8caffd92df
[Bugfix][MXFP4] Call trtllm_fp4_block_scale_moe with kwargs ( #33104 )
...
Signed-off-by: Pengchao Wang <wpc@fb.com >
2026-01-26 15:13:18 -08:00
dolpm
58a05b0ca1
[fix] CPUDNNLGEMMHandler pointer baked into inductor artifact ( #32913 )
...
Signed-off-by: dolpm <34420038+dolpm@users.noreply.github.com >
2026-01-26 16:59:44 -05:00
Jared Wen
6ee7f18f33
[Logging] add --disable-access-log-for-endpoints CLI option ( #30011 )
...
Add a new CLI option --disable-access-log-for-endpoints to suppress
uvicorn access logs for specified endpoints (e.g., /health, /metrics, /ping).
This addresses the need to reduce log noise in production environments
where health check endpoints are frequently polled by load balancers or
monitoring systems, generating excessive log entries that obscure
meaningful request logs.
Fixes #29982
Signed-off-by: JaredforReal <w13431838023@gmail.com >
2026-01-26 21:49:03 +00:00
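The behavior described in the commit above (suppressing access logs for selected endpoints) can be approximated with a standard `logging.Filter`. This is a sketch, not the code added in the PR: the class name is hypothetical, and it assumes uvicorn's access logger passes the request path as the third element of the record's `args` tuple.

```python
import logging
from typing import Iterable


class EndpointAccessLogFilter(logging.Filter):
    """Drop access-log records whose request path is in an exclusion set."""

    def __init__(self, excluded_paths: Iterable[str]) -> None:
        super().__init__()
        self.excluded_paths = frozenset(excluded_paths)

    def filter(self, record: logging.LogRecord) -> bool:
        args = record.args
        # Assumed layout: (client_addr, method, full_path, http_version, status).
        if isinstance(args, tuple) and len(args) >= 3:
            path = str(args[2]).split("?", 1)[0]  # ignore any query string
            return path not in self.excluded_paths
        return True  # unknown record shape: keep it


def make_access_record(path: str) -> logging.LogRecord:
    """Build a record shaped like the assumed access-log layout, for demos."""
    return logging.LogRecord(
        "uvicorn.access", logging.INFO, "example.py", 0,
        '%s - "%s %s HTTP/%s" %d',
        ("127.0.0.1:1234", "GET", path, "1.1", 200), None,
    )
```

Such a filter would typically be attached with `logging.getLogger("uvicorn.access").addFilter(EndpointAccessLogFilter(["/health", "/metrics", "/ping"]))`, leaving all other request logs untouched.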
Wentao Ye
8f987883cb
[Refactor] Remove unused _moe_permute function ( #33108 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-01-26 16:06:45 -05:00
Kevin H. Luu
ebe0ba91db
[ci] Sync test areas with test-pipeline.yaml and enable new pipeline generator ( #33080 )
...
Signed-off-by: Kevin H. Luu <khluu000@gmail.com >
Signed-off-by: khluu <khluu000@gmail.com >
Co-authored-by: Kevin Luu <khluu@Kevins-MacBook-Pro.local >
2026-01-26 12:28:20 -08:00
Robert Shaw
43a013c3a2
[Bugfix] Fix Dtypes for Pynccl Wrapper ( #33030 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2026-01-26 20:09:32 +00:00
Cyrus Leung
c25dbee40d
[Model] Bump transformers version for test registry ( #33100 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-26 18:53:22 +00:00
Nicolò Lucchesi
19ab0f7ce5
[Bugfix] Fix Voxtral streaming slot_mapping ( #33073 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-01-26 10:40:40 -08:00
danielafrimi
67fe677c53
[FIX] Always support TP > 4 for FP4 Gemm ( #31099 )
...
Signed-off-by: dafrimi <dafrimi@nvidia.com >
Co-authored-by: root <root@gpu-51.slurm-workers-slurm.slurm.svc.cluster.local >
2026-01-26 11:04:20 -07:00
Andy Lo
d56afd45fd
Remove unused logic in models/mistral.py ( #33095 )
...
Signed-off-by: Andy Lo <andy@mistral.ai >
2026-01-26 09:01:52 -08:00
Chauncey
a2393ed496
[CI] Fix AssertionError: MCP tool call not found in output_messages ( #33093 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-01-26 15:19:57 +00:00
Pleaplusone
be6931ee27
[ROCm][Bugfix] Fix ptpc scale load issue for fused shared expert path in deepseek mtp ( #33018 )
...
Signed-off-by: ganyi <ygan@amd.com >
2026-01-26 23:19:04 +08:00
Chauncey
9ef3b718d9
[Bugfix] Fix Can't instantiate abstract class DeepseekV32IndexerBackend ( #33052 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-01-26 06:44:02 -08:00
Yuxuan Zhang
bb17e8f11c
[GLM-OCR] GLM-OCR with MTP Support ( #33005 )
...
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-01-26 06:24:43 -08:00
Cyrus Leung
dcd80206b7
[Chore] Update type annotation of input_ids in model forward ( #33063 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-26 06:02:10 -08:00
danisereb
f4a0921c9c
[Performance] Tune Mamba selective scan kernel for B200 ( #32873 )
...
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-01-26 05:56:54 -08:00
VihaanThat
208c56256f
[Feature] Add LoRA support for Gemma3 vision components ( #32764 )
2026-01-26 13:56:40 +00:00
Alex Brooks
9ac818a551
[Misc] HF Hub LoRA Resolver ( #20320 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
2026-01-26 13:56:32 +00:00
Itay Etelis
6ca2c91b96
[Model] Use mm_position to compute mrope positions for Qwen3-Omni ( #33010 )
...
Signed-off-by: Itay Etelis <itay.etelis@ibm.com >
Co-authored-by: Itay Etelis <itay.etelis@ibm.com >
2026-01-26 13:48:07 +00:00
cwazai
e33192b269
[lora/moe] Improve fused MoE‑LoRA kernel indexing and memory access ( #32770 )
...
Signed-off-by: 陈建华 <1647430658@qq.com >
Signed-off-by: Yanwen Lin <lyw1124278064@gmail.com >
Signed-off-by: kimheesu <wlskaka4@gmail.com >
Signed-off-by: Divakar Verma <divakar.verma@amd.com >
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Signed-off-by: ganyi <ygan@amd.com >
Signed-off-by: whx-sjtu <2952154980@qq.com >
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com >
Signed-off-by: Nick Hill <nickhill123@gmail.com >
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com >
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com >
Signed-off-by: Xin Yang <xyangx@amazon.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com >
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com >
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com >
Signed-off-by: Ifta khairul Alam Adil <25082512+ikaadil@users.noreply.github.com >
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com >
Signed-off-by: Huy Do <huydhn@gmail.com >
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
Signed-off-by: Kebe <mail@kebe7jun.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: Alex Sun <alex.s@amd.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: Liran Schour <lirans@il.ibm.com >
Signed-off-by: liranschour <liranschour@users.noreply.github.com >
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Signed-off-by: NickLucche <nlucches@redhat.com >
Signed-off-by: Shengqi Chen <harry-chen@outlook.com >
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
Signed-off-by: Or Ozeri <oro@il.ibm.com >
Signed-off-by: Lucas Kabela <lucaskabela@meta.com >
Signed-off-by: Richard Zou <zou3519@gmail.com >
Signed-off-by: Max de Bayser <mbayser@br.ibm.com >
Signed-off-by: Max de Bayser <maxdebayser@gmail.com >
Signed-off-by: AuYang <459461160@qq.com >
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Signed-off-by: rickychen-infinirc <ricky.chen@infinirc.com >
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com >
Signed-off-by: eldarkurtic <8884008+eldarkurtic@users.noreply.github.com >
Signed-off-by: Bill Nell <bnell@redhat.com >
Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com >
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Signed-off-by: Karan Bansal <karanb192@gmail.com >
Signed-off-by: jiang1.li <jiang1.li@intel.com >
Signed-off-by: Li, Jiang <bigpyj64@gmail.com >
Signed-off-by: wang.yuqi <noooop@126.com >
Signed-off-by: Tianshu Yu <tianshuyu.formal@gmail.com >
Signed-off-by: raushan <raushan@huggingface.co >
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com >
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
Signed-off-by: sangbumlikeagod <oironese@naver.com >
Signed-off-by: sangbumlikeagod <98077576+sangbumlikeagod@users.noreply.github.com >
Signed-off-by: Matteo Fari <matteofari06@gmail.com >
Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com >
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
Signed-off-by: Orion Reblitz-Richardson <orionr@meta.com >
Signed-off-by: Orion Reblitz-Richardson <orionr@gmail.com >
Signed-off-by: marksverdhei <marksverdhei@hotmail.com >
Signed-off-by: Markus / Mark <46672778+marksverdhei@users.noreply.github.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Randall Smith <ransmith@amd.com >
Signed-off-by: jon <joninco@bullpoint.org >
Signed-off-by: dolpm <34420038+dolpm@users.noreply.github.com >
Signed-off-by: ElizaWszola <ewszola@redhat.com >
Signed-off-by: Luka Govedič <luka.govedic@gmail.com >
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com >
Signed-off-by: mohammad najafi <mohammad.najafi@amd.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Signed-off-by: 7. Sun <jhao.sun@gmail.com >
Signed-off-by: esmeetu <jasonailu87@gmail.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Signed-off-by: Reagan <reaganjlee@gmail.com >
Signed-off-by: Reagan Lee <96998476+reaganjlee@users.noreply.github.com >
Signed-off-by: Hongjian Zhang <zhanghongjian@xiaohongshu.com >
Signed-off-by: Xingran Wang <wangxingran123456@outlook.com >
Signed-off-by: Hiroken. <105287758+HirokenOvo@users.noreply.github.com >
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
Signed-off-by: Tsai, Louie <louie.tsai@intel.com >
Signed-off-by: Louie Tsai <louie.tsai@intel.com >
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com >
Signed-off-by: Joshua Deng <joshuakdeng@gmail.com >
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com >
Signed-off-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com >
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
Signed-off-by: cwazai <38356712+cwazai@users.noreply.github.com >
Co-authored-by: Yanwen Lin <lyw1124278064@gmail.com >
Co-authored-by: Kim Hee Su <wlskaka4@gmail.com >
Co-authored-by: Divakar Verma <137818590+divakar-amd@users.noreply.github.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Pleaplusone <ygan@amd.com >
Co-authored-by: whx <56632993+whx-sjtu@users.noreply.github.com >
Co-authored-by: elvischenv <219235043+elvischenv@users.noreply.github.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
Co-authored-by: danisereb <daserebrenik@nvidia.com >
Co-authored-by: Yanan Cao <gmagogsfm@users.noreply.github.com >
Co-authored-by: Xin Yang <105740670+xyang16@users.noreply.github.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: Matt <156021403+mawong-amd@users.noreply.github.com >
Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com >
Co-authored-by: Lucain <lucainp@gmail.com >
Co-authored-by: Ifta khairul Alam Adil <25082512+ikaadil@users.noreply.github.com >
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com >
Co-authored-by: Huy Do <huydhn@gmail.com >
Co-authored-by: Micah Williamson <micah.williamson@amd.com >
Co-authored-by: Andreas Karatzas <akaratza@amd.com >
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com >
Co-authored-by: Kebe <mail@kebe7jun.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
Co-authored-by: Alex Sun <minchsun@amd.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: liranschour <liranschour@users.noreply.github.com >
Co-authored-by: Or Ozeri <or@ozery.com >
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
Co-authored-by: Shengqi Chen <harry-chen@outlook.com >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
Co-authored-by: Or Ozeri <oro@il.ibm.com >
Co-authored-by: Lucas Kabela <lucaskabela@meta.com >
Co-authored-by: Richard Zou <zou3519@users.noreply.github.com >
Co-authored-by: Maximilien de Bayser <maxdebayser@gmail.com >
Co-authored-by: Xu Jinyang <72930776+AuYang261@users.noreply.github.com >
Co-authored-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com >
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Co-authored-by: David Ramon Prados <davidramon3@hotmail.es >
Co-authored-by: RickyChen / 陳昭儒 <ricky.chen@infinirc.com >
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: Fadi Arafeh <115173828+fadara01@users.noreply.github.com >
Co-authored-by: Eldar Kurtić <8884008+eldarkurtic@users.noreply.github.com >
Co-authored-by: bnellnm <49004751+bnellnm@users.noreply.github.com >
Co-authored-by: Rishabh Saini <rishabhsaini01@gmail.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Karan Bansal <karanb192@users.noreply.github.com >
Co-authored-by: Li, Jiang <jiang1.li@intel.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: tianshu-Michael-yu <101950379+tianshu-Michael-yu@users.noreply.github.com >
Co-authored-by: Raushan Turganbay <raushan@huggingface.co >
Co-authored-by: baonudesifeizhai <85092850+baonudesifeizhai@users.noreply.github.com >
Co-authored-by: Mark McLoughlin <markmc@redhat.com >
Co-authored-by: sangbumlikeagod <98077576+sangbumlikeagod@users.noreply.github.com >
Co-authored-by: Matteo Fari <matteofari06@gmail.com >
Co-authored-by: Harry Huang <vastrockhuang162@gmail.com >
Co-authored-by: Chen Zhang <zhangch99@outlook.com >
Co-authored-by: Orion Reblitz-Richardson <orionr@gmail.com >
Co-authored-by: Kevin H. Luu <khluu000@gmail.com >
Co-authored-by: Markus / Mark <46672778+marksverdhei@users.noreply.github.com >
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com >
Co-authored-by: rasmith <Randall.Smith@amd.com >
Co-authored-by: Randall Smith <ransmith@amd.com >
Co-authored-by: joninco <joninco@bullpoint.org >
Co-authored-by: dolpm <34420038+dolpm@users.noreply.github.com >
Co-authored-by: ElizaWszola <ewszola@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com >
Co-authored-by: Luka Govedič <luka.govedic@gmail.com >
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com >
Co-authored-by: Luka Govedič <lgovedic@redhat.com >
Co-authored-by: Joe Runde <Joseph.Runde@ibm.com >
Co-authored-by: monajafi-amd <mohammad.najafi@amd.com >
Co-authored-by: ruizcrp <ruiz.crp@gmail.com >
Co-authored-by: Shengqi Chen <i@harrychen.xyz >
Co-authored-by: 7. Sun <jhao.sun@gmail.com >
Co-authored-by: Roy Wang <jasonailu87@gmail.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Reagan Lee <96998476+reaganjlee@users.noreply.github.com >
Co-authored-by: Hiroken. <105287758+HirokenOvo@users.noreply.github.com >
Co-authored-by: Xingran Wang <wangxingran123456@outlook.com >
Co-authored-by: david guan <102001211+Chenhao-Guan@users.noreply.github.com >
Co-authored-by: Lukas Geiger <lukas.geiger94@gmail.com >
Co-authored-by: Louie Tsai <louie.tsai@intel.com >
Co-authored-by: Maryam Tahhan <mtahhan@redhat.com >
Co-authored-by: Joshua Deng <91448271+joshuadeng@users.noreply.github.com >
Co-authored-by: Nick Hill <nickhill123@gmail.com >
Co-authored-by: TJian <tunjian.tan@embeddedllm.com >
Co-authored-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com >
Co-authored-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2026-01-26 04:56:34 -08:00
Cyrus Leung
61274bdef5
[Doc] Further update multi-modal impl doc ( #33065 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-26 10:54:20 +00:00
ltd0924
b40db4dfec
[StepVL] add step vl offline example ( #33054 )
...
Signed-off-by: luotingdan <luotingdan@stepfun.com >
Co-authored-by: luotingdan <luotingdan@stepfun.com >
2026-01-26 01:00:32 -08:00