Jeffrey Wang
5f7f9ea884
Relax protobuf library version constraints (#33202)
...
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
(cherry picked from commit a97b5e206d)
2026-01-28 02:17:19 -08:00
Nick Hill
7779de34da
[BugFix] Fix P/D with non-MoE DP (#33037)
...
Signed-off-by: Nick Hill <nickhill123@gmail.com>
(cherry picked from commit 0cd259b2d8)
2026-01-28 02:17:08 -08:00
Nicolò Lucchesi
0d8ce320a2
[Bugfix] Fix DeepseekV32 AssertionError: num_kv_heads == 1 (#33090)
...
Signed-off-by: NickLucche <nlucches@redhat.com>
(cherry picked from commit 492a7983dd)
2026-01-28 02:16:56 -08:00
Nicolò Lucchesi
d51e1f8b62
[Bugfix] Disable CG for Whisper+FA2 (#33164)
...
Signed-off-by: NickLucche <nlucches@redhat.com>
(cherry picked from commit 1f3a2c2944)
2026-01-28 02:16:41 -08:00
Roger Wang
5042815ab6
[Models] Kimi-K2.5 (#33131)
...
Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>
Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: wanglinian <wanglinian@stu.pku.edu.cn>
Co-authored-by: wangln19 <96399074+wangln19@users.noreply.github.com>
Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
(cherry picked from commit b539f988e1)
2026-01-28 02:16:28 -08:00
Chauncey
afb390ab02
[CI] Fix AssertionError: MCP tool call not found in output_messages (#33093)
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
(cherry picked from commit a2393ed496)
2026-01-28 02:16:14 -08:00
Robert Shaw
cf1167e50b
[Bugfix] Fix Dtypes for Pynccl Wrapper (#33030)
...
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
(cherry picked from commit 43a013c3a2)
2026-01-26 12:37:16 -08:00
Cyrus Leung
11b556878b
[Refactor] Use data parser for matching data items to multi-modal UUIDs (#32955)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-26 15:00:28 +08:00
Danielle Robinson
ee484b3f4b
Set splitk=1 for fused-moe-lora expand kernel (#32882)
...
Signed-off-by: Danielle Robinson <dmmaddix@amazon.com>
Co-authored-by: Danielle Robinson <dmmaddix@amazon.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2026-01-25 22:52:34 -08:00
Woosuk Kwon
a9b53dd435
[Model Runner V2] Add LoRAState to consolidate lora logic (#33062)
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
2026-01-25 22:21:12 -08:00
Robert Shaw
254db42ede
[Tests] Remove Duplicates (#33032)
...
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-01-26 05:23:54 +00:00
ltd0924
105d104576
[StepVL] support close img patch (#32923)
...
Signed-off-by: luotingdan <luotingdan@stepfun.com>
Signed-off-by: ltd0924 <32387785+ltd0924@users.noreply.github.com>
Co-authored-by: luotingdan <luotingdan@stepfun.com>
2026-01-25 20:56:39 -08:00
Lucas Wilkinson
566cdb6cfb
[CI] Fix MHA attention test failure (AttributeError when model_config is None in ViT attention backend) (#33033)
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2026-01-25 19:49:53 -08:00
Woosuk Kwon
2f0d3ba745
[Model Runner V2] Minor simplification for finish_requests (#33048)
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
2026-01-25 18:35:02 -08:00
Woosuk Kwon
edf927bc9f
[Model Runner V2] Fix slot_mapping after #25954 (#33046)
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Co-authored-by: Woosuk Kwon <woosuk@inferact.ai>
2026-01-25 18:29:49 -08:00
Andreas Karatzas
22aeb43007
[Bugfix][VLM] Fix transformers backend embed_multimodal for Qwen2.5-VL profiling (#32969)
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-26 08:34:05 +08:00
Itay Etelis
a698e8e7ad
[Model] Use mm_position to compute mrope positions for Qwen2.5-Omni (#32772)
...
Signed-off-by: Itay Etelis <itay.etelis@ibm.com>
Co-authored-by: Itay Etelis <itay.etelis@ibm.com>
2026-01-25 20:15:53 +08:00
zhanqiuhu
151e5451c2
[Doc] Add Qwen2.5 models to batch invariance tested models (#33016)
...
Signed-off-by: Zhanqiu Hu <zh338@cornell.edu>
2026-01-25 09:20:46 +00:00
Jee Jee Li
73b243463b
[BugFix] Add env variable to control PDL in LoRA (#32836)
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2026-01-25 16:32:30 +08:00
JJJYmmm
7e67df5570
[Bugfix] fix encoder cache hang in Qwen3VL (#32684)
...
Signed-off-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-25 05:17:31 +00:00
7. Sun
ff6c1da4e6
[Docs] Fix Apple silicon include path in CPU installation docs (#32977)
...
Signed-off-by: 7. Sun <jhao.sun@gmail.com>
2026-01-25 01:51:49 +00:00
Roberto L. Castro
fcb9df99bd
[Perf][Kernel] Optimize FP4 quantization kernels (SM100F) (#32520)
...
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>
2026-01-24 18:45:27 -07:00
TJian
1ebdff412a
[DOC] [ROCm] Update doc for v0.14.1 (#32998)
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2026-01-25 09:13:21 +08:00
Joshua Deng
91601ff478
[Feature] add session based streaming input support to v1 (#28973)
...
Signed-off-by: Joshua Deng <joshuakdeng@gmail.com>
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
2026-01-24 12:06:28 -08:00
yugong333
d4dbb7af63
Using max_loras + 1 to construct grid in fused_moe_lora (#32277)
...
Signed-off-by: Yu Gong <yu3.gong@gmail.com>
2026-01-24 12:39:30 -05:00
Maryam Tahhan
203d0bc0c2
[CPU] Improve CPU Docker build (#30953)
...
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
2026-01-24 17:08:24 +00:00
Fadi Arafeh
17ab54de81
[CPU Backend][BugFix] Fix failing Darwin pipelines (#33002)
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2026-01-24 17:02:22 +00:00
7. Sun
cd775bdbe0
[Tests] Replace flaky sleep with polling in test_background_cancel (#32986)
...
Signed-off-by: 7. Sun <jhao.sun@gmail.com>
2026-01-24 16:39:07 +00:00
Lucas Wilkinson
da5e7b12be
[MLA] Fuse cat and quant for fp8 kv-cache (#32950)
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2026-01-24 16:03:02 +00:00
Louie Tsai
719ac592ed
Update CPU doc according to feedback (#32963)
...
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
Signed-off-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-24 16:02:44 +00:00
Hiroken.
1209b784f2
[Bugfix]: resolve torch.compile cache conflict between mm_encoder_tp_modes (#32842)
...
Signed-off-by: Hongjian Zhang <zhanghongjian@xiaohongshu.com>
Signed-off-by: Xingran Wang <wangxingran123456@outlook.com>
Co-authored-by: Xingran Wang <wangxingran123456@outlook.com>
2026-01-24 14:45:14 +00:00
Lukas Geiger
5fa0f6efa9
[EncoderCacheManager] Remove unnecessary copy (#32800)
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2026-01-24 14:28:57 +00:00
david guan
bc0d291bfe
feat: Complete LoRA support for MiniMaxM2 Fixes #32736 (#32763)
...
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-24 20:48:46 +08:00
Isotr0py
9ad7f89f55
[Models]: Make Multimodal config implicit in ViT implementation (#31972)
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-24 20:34:26 +08:00
Hiroken.
6450b536a6
[Bugfix] Fix E2E latency calculation and add warmup support in mm_processor benchmark (#32646)
...
Signed-off-by: Hongjian Zhang <zhanghongjian@xiaohongshu.com>
Signed-off-by: Xingran Wang <wangxingran123456@outlook.com>
Signed-off-by: Hiroken. <105287758+HirokenOvo@users.noreply.github.com>
Co-authored-by: Xingran Wang <wangxingran123456@outlook.com>
2026-01-24 10:31:41 +00:00
7. Sun
0f19427db5
[Perf] Cache exc.errors() result in validation exception handler (#32984)
...
Signed-off-by: 7. Sun <jhao.sun@gmail.com>
2026-01-24 02:01:35 -08:00
Cyrus Leung
51931c5c9a
[UX] Deduplicate sampling parameter startup logs (#32953)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-24 17:37:28 +08:00
Reagan Lee
06b557ecd9
feat(benchmark): add encoder forward pass benchmarking to mm-processor (#31655)
...
Signed-off-by: Reagan <reaganjlee@gmail.com>
Signed-off-by: Reagan Lee <96998476+reaganjlee@users.noreply.github.com>
Co-authored-by: Hiroken. <105287758+HirokenOvo@users.noreply.github.com>
2026-01-24 08:24:44 +00:00
Roger Wang
81c2a889ce
[Doc] Ignore typo check on doc (#32999)
...
Signed-off-by: Roger Wang <hey@rogerw.io>
2026-01-23 23:52:22 -08:00
Isotr0py
8edaf38570
[Models] Add SharedFusedMoE support to Qwen3MoE (#32082)
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-23 23:36:31 -08:00
Roy Wang
5c86a89805
[docs] Update governance process links (#32995)
...
Signed-off-by: esmeetu <jasonailu87@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
2026-01-23 23:32:44 -08:00
7. Sun
0ccecf8833
[Tests] Standardize RNG seed utility across test files (#32982)
...
Signed-off-by: 7. Sun <jhao.sun@gmail.com>
2026-01-24 06:47:14 +00:00
7. Sun
0b9a735e11
[Tests] Clarify pytest skip reasons with actionable context (#32981)
...
Signed-off-by: 7. Sun <jhao.sun@gmail.com>
2026-01-24 06:38:50 +00:00
7. Sun
14d03b8ddb
[Perf] Cache xpu_get_mem_info() result to avoid duplicate calls (#32983)
...
Signed-off-by: 7. Sun <jhao.sun@gmail.com>
2026-01-23 20:56:23 -08:00
Michael Goin
d0cbac5827
[Dev UX] Add auto-detection for VLLM_PRECOMPILED_WHEEL_VARIANT during install (#32948)
...
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Shengqi Chen <i@harrychen.xyz>
2026-01-23 19:15:17 -08:00
ruizcrp
c0d820457a
Auth_token added in documentation as it is required (#32988)
...
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2026-01-24 03:03:05 +00:00
monajafi-amd
97ef11dd34
[ROCm][ViT] Enable Flash Attention Triton backend on RDNA3/RDNA4 (#32944)
...
Signed-off-by: mohammad najafi <mohammad.najafi@amd.com>
2026-01-24 10:03:07 +08:00
Xin Yang
ecc3dd66cc
[Bugfix] Fix FusedMoE LoRA kernel offs_token out of bound value (#32279)
...
Signed-off-by: Xin Yang <xyangx@amazon.com>
2026-01-24 01:41:35 +00:00
Joe Runde
7e1f10d562
[Core][Bugfix] allow graceful worker termination (#32965)
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2026-01-23 17:28:45 -08:00
ElizaWszola
a28b94e6ef
[Performance] Split FlashAttn attention and cache update (#25954)
...
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Luka Govedič <luka.govedic@gmail.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <luka.govedic@gmail.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
2026-01-23 17:28:06 -08:00
dolpm
0118cdcc02
[fix] add VLLM_OBJECT_STORAGE_SHM_BUFFER_NAME to compile factors (#32912)
...
Signed-off-by: dolpm <34420038+dolpm@users.noreply.github.com>
2026-01-23 22:53:10 +00:00
Shengqi Chen
136c499f6e
[CI] fix version comparison and exclusion patterns in upload-release-wheels.sh (#32971)
...
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
2026-01-23 22:21:49 +00:00
joninco
ebd0a17e0e
[Bugfix] Fix missing is_layer_skipped check for FusedMoE in AWQConfig (#32935)
...
Signed-off-by: jon <joninco@bullpoint.org>
2026-01-23 17:19:56 -05:00
Wentao Ye
37c9859fab
[Refactor] Clean up unused variables & func (#32692)
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-01-23 17:04:25 -05:00
Michael Goin
4561f13985
[Refactor] Rename gptq_marlin to marlin to match MoE (#32952)
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-01-23 16:48:12 -05:00
rasmith
6cc6d92be5
[CI][AMD][BugFix] Update wvSplitK (and other skinny_gemm wrappers) to ensure tensors passed will be made contiguous for the kernel (#32831)
...
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2026-01-23 13:35:48 -08:00
Wentao Ye
dfab5f3764
[Bug] Fix benchmark script moe_permute_unpermute (#32949)
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-01-23 16:18:56 -05:00
Markus / Mark
586a57ad7e
fix: Add glm4_moe_lite to MLA detection (#32614)
...
Signed-off-by: marksverdhei <marksverdhei@hotmail.com>
Signed-off-by: Markus / Mark <46672778+marksverdhei@users.noreply.github.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2026-01-23 12:38:57 -08:00
Lucas Wilkinson
3a41459501
[cudagraphs] Refactor cudagraph capture loop (#32946)
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2026-01-23 13:22:20 -07:00
Nick Hill
8518b30447
[Model Runner V2] Add KV Connector support (#32742)
...
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-01-23 10:49:17 -08:00
Matthew Bonanni
2d6b537157
[Bugfix][CI] Fix pre-commit (#32956)
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-23 10:26:56 -08:00
Orion Reblitz-Richardson
68b0a6c1ba
[CI][torch nightlies] Use main Dockerfile with flags for nightly torch tests (#30443)
...
Signed-off-by: Orion Reblitz-Richardson <orionr@meta.com>
Signed-off-by: Orion Reblitz-Richardson <orionr@gmail.com>
Co-authored-by: Kevin H. Luu <khluu000@gmail.com>
2026-01-23 10:22:56 -08:00
Harry Huang
5206e5e28c
[V1][Hybrid] Mamba Prefix Caching with align mode (#30877)
...
Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
2026-01-23 09:56:48 -08:00
Matteo Fari
fec9da0af4
[Model] Enable LoRA support for internvl2 (#32397)
...
Signed-off-by: Matteo Fari <matteofari06@gmail.com>
2026-01-24 01:39:01 +08:00
Luka Govedič
bbbd696af9
[torch.compile][CI] Add back attn fusion on hopper/ada (#32940)
...
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
2026-01-23 16:49:20 +00:00
sangbumlikeagod
9b77bb790d
[Frontend] add logprob, compression_rate to 'verbose_json' features (#31059)
...
Signed-off-by: sangbumlikeagod <oironese@naver.com>
Signed-off-by: sangbumlikeagod <98077576+sangbumlikeagod@users.noreply.github.com>
2026-01-23 16:35:13 +00:00
Matt
305e53ade8
[Hardware][AMD][CI][Bugfix] Fix Kernels Attention Cache test (#32904)
...
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
2026-01-23 16:24:26 +00:00
Mark McLoughlin
1cb4341fbc
[ROCm][PD] Remove unused moriio connector proxy code (#32939)
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2026-01-23 15:59:04 +00:00
baonudesifeizhai
1fb648bf10
[Bugfix] Fix FP8 MoE EP Weight Loading for ModelOpt Llama4 (#32886)
...
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
2026-01-23 10:31:48 -05:00
Nicolò Lucchesi
7e22309755
[Misc] Postpone torch_profiler deprecation (#32867)
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-01-23 14:39:48 +00:00
Xin Yang
90c2007932
[Bugfix] Disable tma_aligned_scales in test_fusions_e2e (#32916)
...
Signed-off-by: Xin Yang <xyangx@amazon.com>
2026-01-23 14:34:30 +00:00
Raushan Turganbay
d95d650762
[Bugfix] Fix getting vision features in Transformer Multimodal backend (#32933)
...
Signed-off-by: raushan <raushan@huggingface.co>
2026-01-23 13:34:48 +00:00
tianshu-Michael-yu
13d8746c54
[Feature]: Remove DtoH Copy for lfm2_vl On Default Stream (#32815)
...
Signed-off-by: Tianshu Yu <tianshuyu.formal@gmail.com>
2026-01-23 13:20:30 +00:00
Fadi Arafeh
10e94c84f6
[CPU][Feat] Update PyTorch to v2.10 for CPU Backend (#32869)
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
2026-01-23 21:13:06 +08:00
Isotr0py
243e78c20f
[Benchmark][Bugfix] Fix race condition when starting server for sweep benchmark (#32927)
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-23 12:11:18 +00:00
Fadi Arafeh
aac0b817fa
[CPU Backend][BugFix] Fix failing CPU MoE test (#32876)
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2026-01-23 12:06:51 +00:00
wang.yuqi
05f3d714db
[Frontend][3/n] Make pooling entrypoints request schema consensus | EmbedRequest & ClassifyRequest (#32905)
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-23 12:03:44 +00:00
Patrick von Platen
3f3f89529d
[Voxtral] Add new streaming arch (#32861)
...
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-23 12:41:52 +01:00
Li, Jiang
5da4c7d789
[CI/Build][CPU] Fix failed pooling tests and macos smoke test (#32907)
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: Li, Jiang <bigpyj64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-23 10:48:20 +00:00
Nicolò Lucchesi
160c6fa387
[Misc] Add get_name to missing AttentionBackends (#32698)
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-01-23 10:35:44 +00:00
Andreas Karatzas
a8eb1182f1
[CI][Models] Add VLM Support for Sequence Classification Conversion (#32885)
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-23 16:22:51 +08:00
Karan Bansal
fa6e599a61
[Bugfix] Fix _CPU_MOE_ACT AssertionError when vLLM config not set (#32777)
...
Signed-off-by: Karan Bansal <karanb192@gmail.com>
2026-01-23 08:22:37 +00:00
Wentao Ye
7ef5873752
[CI] Fix mypy for vllm/v1/structured_output (#32722)
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-01-23 11:55:51 +08:00
Luka Govedič
5e4e0e51f4
[torch.compile] Compile CustomOp.forward_native for SiluAndMul and QuantFP8 to avoid raw torch ops inside opaque custom ops (#32806)
...
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2026-01-22 19:52:26 -08:00
Rishabh Saini
f61c9da711
[BugFix] deepseek_v32_encoding: Replace asserts with proper exceptions (#32884)
...
Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com>
2026-01-23 03:44:11 +00:00
Nick Hill
7fe255889e
[Misc] Log vLLM logo when starting server (#32796)
...
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-01-23 11:15:12 +08:00
bnellnm
dc917cceb8
[MoE Refactor] Move select_experts from FusedMoEQuantMethod -> FusedMoE (#31996)
...
Signed-off-by: Bill Nell <bnell@redhat.com>
2026-01-22 18:21:35 -05:00
Fadi Arafeh
fc56f4a071
[BugFix] Fix invalid flashinfer_fused_moe_blockscale_fp8 op registration (#32855)
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2026-01-22 22:27:40 +00:00
Xin Yang
d08b356ee0
[Perf] Create TMA-aligned input scale tensor for DeepGemm on Hopper (#32619)
...
Signed-off-by: Xin Yang <xyangx@amazon.com>
2026-01-22 15:47:04 -05:00
Wentao Ye
f744810184
[Refactor] Remove unused tpu files (#32610)
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-01-22 15:35:18 -05:00
Eldar Kurtić
44f08af3a7
Add llmcompressor fp8 kv-cache quant (per-tensor and per-attn_head) (#30141)
...
Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com>
Signed-off-by: eldarkurtic <8884008+eldarkurtic@users.noreply.github.com>
2026-01-22 13:29:57 -07:00
Matthew Bonanni
955b43a5a5
[Bugfix][Attention] Explicitly report support for kv_cache_dtype bfloat16 (#32795)
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-22 19:05:18 +00:00
Fadi Arafeh
744ef30484
[CPU Backend] [Perf] Accelerate tensor-parallel/data-parallel inference across NUMA domains on Arm (#32792)
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2026-01-22 18:55:23 +00:00
Matthew Bonanni
300622e609
[CI][Attention] Add more CI dependencies for attention tests (#32487)
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-22 18:44:56 +00:00
RickyChen / 陳昭儒
69d09fdd6c
[Feature] Add --ssl-ciphers CLI argument for TLS cipher control (#30937)
...
Signed-off-by: rickychen-infinirc <ricky.chen@infinirc.com>
2026-01-22 09:53:24 -08:00
David Ramon Prados
3a63be0faa
Support custom URI schemes and trace handlers for profiler (#32393)
2026-01-22 09:45:40 -08:00
Tyler Michael Smith
803e3f3f68
[UX] Default api_server_count to dp_size if not specified (#32525)
...
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
2026-01-22 17:35:35 +00:00
Vadim Gimpelson
70917b1c55
[MISC] Add .cursor to .gitignore (#32868)
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
2026-01-22 17:27:13 +00:00
Matt
c517d8c934
[Hardware][AMD][CI][Bugfix] Fix regressions from deprecated env vars (#32837)
...
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
2026-01-23 00:59:15 +08:00
Xu Jinyang
fc37187a51
[Bugfix] ModelScope is supported when downloading LORA models. (#32844)
...
Signed-off-by: AuYang <459461160@qq.com>
2026-01-22 16:33:21 +00:00
Maximilien de Bayser
ff365eea94
Support bge-m3 sparse embeddings and colbert embeddings (#14526)
...
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
2026-01-22 23:52:57 +08:00
Isotr0py
444e2e7e1f
[Misc] Bump opencv-python dependency version to 4.13 (#32668)
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-22 15:51:15 +00:00
Nick Hill
bc14663e6a
[Cleanup] Move scheduler get_routed_experts logic to separate method (#32706)
...
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-01-22 10:46:00 -05:00
Richard Zou
654a71fc3c
[torch.compile] Improve Cold Start for MoEs (#32805)
...
Signed-off-by: Richard Zou <zou3519@gmail.com>
2026-01-22 10:44:40 -05:00
Lucas Kabela
15e302dfce
[Misc][BE] Turn on strict type coverage for vllm/compilation (#31756)
...
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
2026-01-22 15:12:26 +00:00
Cyrus Leung
d117a4d1a9
[Frontend] Introduce Renderer for processing chat messages (using ModelConfig) (#30200)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-22 12:44:22 +00:00
Or Ozeri
421012b63a
OffloadingConnector: Support kernel_block_size != block_size (#30692)
...
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2026-01-22 12:30:04 +00:00
Chauncey
841d53aaa8
[Frontend] add prompt_cache_key for openresponses (#32824)
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2026-01-22 11:34:14 +00:00
Shengqi Chen
1752262e96
[CI] refactor release pipeline config into groups (#32833)
...
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
2026-01-22 11:27:21 +00:00
Nicolò Lucchesi
ea6102b85d
[Bugfix] Fix Whisper/encoder-decoder GPU memory leak (#32789)
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-01-22 10:50:37 +00:00
wang.yuqi
328cbb2773
[Frontend][2/n] Make pooling entrypoints request schema consensus | ChatRequest (#32574)
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-01-22 10:32:44 +00:00
liranschour
64e3d67ac0
Enable Cross layers KV cache layout at NIXL Connector (#30207)
...
Signed-off-by: Liran Schour <lirans@il.ibm.com>
Signed-off-by: liranschour <liranschour@users.noreply.github.com>
Co-authored-by: Or Ozeri <or@ozery.com>
2026-01-22 10:12:58 +00:00
Nick Hill
098b2d66fe
[Benchmark] Don't default to temperature==0 in vllm bench serve (#32723)
...
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-01-22 10:03:15 +00:00
Isotr0py
8ebf271bb6
[Misc] Replace urllib's urlparse with urllib3's parse_url (#32746)
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-22 16:37:15 +08:00
Alex Sun
49a1262267
[AMD][ROCm] MoRI EP: a high-performance all2all backend (#28664)
...
Signed-off-by: Alex Sun <alex.s@amd.com>
2026-01-22 16:33:18 +08:00
Cyrus Leung
2b8a38b6d6
[Model] Extend collect_children and no_init_weights contexts (#32757)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-22 08:20:27 +00:00
Kebe
1bf1a34b19
[bench] add start_times field to vllm bench serve json result (#32667)
...
Signed-off-by: Kebe <mail@kebe7jun.com>
2026-01-22 07:10:14 +00:00
Andreas Karatzas
a810299838
[ROCm][CI][Docs] Add comment explaining TRITON_ATTN fallback for ROCm (#32835)
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-21 22:11:09 -08:00
Andreas Karatzas
eb1629da24
[ROCm][CI] Fix AITER test flakiness by using explicit attention backend (#32346)
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
2026-01-22 13:55:25 +08:00
Micah Williamson
019e2c3b7c
[ROCm][CI] Lower Acceptance Len Threshold For test_draft_model_quantization (#32731)
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2026-01-22 05:47:33 +00:00
Huy Do
f5fdec8ce2
Upgrade transformers-4.57.5 (#32287)
...
Signed-off-by: Huy Do <huydhn@gmail.com>
2026-01-22 05:19:19 +00:00
Patrick von Platen
1579c9b5fd
[Llama.py -> mistral.py] Extract mistral-only relevant code into separate file (#32780)
...
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
2026-01-22 05:14:57 +00:00
Lucas Wilkinson
889722f3bf
[FlashMLA] Update FlashMLA to expose new arguments (#32810)
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2026-01-21 22:02:39 -07:00
Divakar Verma
49d9653852
[ROCm][CI] fix get_valid_backends (#32787)
...
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2026-01-22 04:27:47 +00:00
Ifta khairul Alam Adil
a1d82466ea
[Docs] Remove outdated async_scheduling limitation with speculative decoding (#32775)
...
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Signed-off-by: Ifta khairul Alam Adil <25082512+ikaadil@users.noreply.github.com>
2026-01-21 20:19:25 -08:00
Lucain
24a163ed77
Cleanup some huggingface_hub-related stuff (#32788)
2026-01-22 03:38:17 +00:00
knlnguyen1802
378385b90c
[EC Connector] Optimize remote cache check in scheduler (#32585)
...
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
2026-01-22 03:30:59 +00:00
Matt
c5487e2b96
[Bugfix] Fix potential EAGLE spec decode segfault during graph capture (#32818)
...
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
2026-01-22 03:11:55 +00:00
Wentao Ye
6437ff1fb9
[Deprecation] Remove deprecated environment variables (#32812)
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-01-22 02:25:16 +00:00
Woosuk Kwon
5e00b561cd
[Model Runner V2] Do not error on attention backends (#32820)
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2026-01-21 17:02:48 -08:00
Woosuk Kwon
408195ec59
[Model Runner V2] Refactor Prompt Logprobs (#32811)
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2026-01-21 15:12:20 -08:00
Xin Yang
63227accf5
[Kernel] Add topk_sigmoid kernel (#31246)
...
Signed-off-by: Xin Yang <xyangx@amazon.com>
2026-01-21 22:49:51 +00:00
Yanan Cao
e675dda67b
[Misc] Add Helion version check to collect_env (#32797)
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
2026-01-21 21:54:46 +00:00
Nick Hill
24dc30f7ff
[ModelRunner V2] Don't pin reused flashinfer tensors (#32799)
...
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-01-21 13:17:43 -08:00
Divakar Verma
180fba653e
[ROCm] fix import for on_gfx9 (#32783)
...
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2026-01-21 18:41:11 +00:00
danisereb
f999539869
Add missing import of fused_topk to benchmark_moe (#32784)
...
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
2026-01-21 18:30:10 +00:00
Woosuk Kwon
e1da249c93
[Model Runner V2] Minor refactor for compute_slot_mappings (#32794)
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2026-01-21 10:24:35 -08:00
Nick Hill
9b693d023c
[Misc] Omit "disable NCCL for DP sync" startup log when not applicable (#32707)
...
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-01-21 17:03:39 +00:00
elvischenv
808d6fd7b9
Bump Flashinfer to v0.6.1 (#30993)
...
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
2026-01-21 08:49:50 -08:00
whx
1861ae8aae
[PluggableLayer][1/N] Define PluggableLayer (Fix ci) (#32744)
...
Signed-off-by: whx-sjtu <2952154980@qq.com>
2026-01-21 11:38:04 -05:00
Robert Shaw
4e31b7f228
[Quantization][Deprecation] Remove RTN (#32697)
...
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-01-21 16:34:42 +00:00
Pleaplusone
6c20e89c02
[ROCm][Deepseekv3.2] Refactor Sparse Indexer as CustomOp (#29287)
...
Signed-off-by: ganyi <ygan@amd.com>
2026-01-21 23:16:30 +08:00
Robert Shaw
85f55c943c
[Quantization][Deprecation] Deprecate HQQ (#32681)
...
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-01-21 09:32:40 -05:00
Robert Shaw
cea3c754c4
[Quantization][Deprecation] Remove DeepSpeedFp8 (#32679)
...
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-01-21 09:32:12 -05:00
Robert Shaw
42135d6898
[MoE Refactor] Oracle Select FP8+NVFP4 Kernels In Priority (#32414)
2026-01-21 08:22:33 -05:00
Divakar Verma
e14467be43
[bugfix] Aria model (#32727)
...
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2026-01-21 05:11:31 -08:00
Kim Hee Su
7727ce35c2
[Model] Add Eagle2.5-8B Vision-Language Model support (#32456)
...
Signed-off-by: kimheesu <wlskaka4@gmail.com>
2026-01-21 09:39:53 +00:00
Yanwen Lin
6bb2bc71e2
[Bugfix] Force using spawn multiprocess method when it's the WSL platform (#32749)
...
Signed-off-by: Yanwen Lin <lyw1124278064@gmail.com>
2026-01-21 09:35:55 +00:00
Lucas Kabela
c80f92c14d
[Documentation] Fix typo in docs/design/torch_compile_multimodal.md (#32741)
...
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
2026-01-20 23:54:20 -08:00
RickyChen / 陳昭儒
f23fb5a7c1
[Bugfix] Support HF sharded weights for Mistral3/Pixtral models (#32673)
...
Signed-off-by: ricky-chaoju <ricky.chen@infinirc.com>
Signed-off-by: vllm-dev <ricky.chen@infinirc.com>
2026-01-20 23:27:30 -08:00
Paco Xu
360aa93f8f
[Docs] Fix GitHub handle in governance process (#32582)
...
Signed-off-by: Paco Xu <paco.xu@daocloud.io>
2026-01-21 07:07:50 +00:00
Netanel Haber
27ca95b3c9
[Bugfix] Fix Nemotron-Nano-v2-vlm static resolution (#32682)
...
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
2026-01-21 06:28:21 +00:00
Lucas Wilkinson
b4f64e5b02
Update FlashMLA (#32491)
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2026-01-21 13:03:37 +08:00
shanjiaz
7ab80a8e37
Added qwen3 vision language moe support for speculative decoding (#32048)
...
Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>
Signed-off-by: shanjiaz <43143795+shanjiaz@users.noreply.github.com>
2026-01-21 03:24:05 +00:00
gopalsarda
0900cedb3f
Enable Eagle3 speculative decoding for Pixtral (LlavaForConditionalGeneration) (#32542)
...
Signed-off-by: gopalsarda <gopal.sarda@servicenow.com>
2026-01-21 11:18:05 +08:00
Nick Hill
6f067b1fb7
[Cleanup] Remove unused KVConnectorModelRunnerMixin methods ( #32077 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-01-21 11:16:37 +08:00
Alex Brooks
27b81e010d
[Bugfix] Fix Granite Vision / Don't use Siglip Pooling Head Nested Models by Default ( #32299 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
2026-01-21 11:11:52 +08:00
Or Ozeri
7013e9ac8f
OffloadingConnector: Prevent redundant loads ( #29087 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com >
2026-01-21 01:15:42 +00:00
Robert Shaw
c78ee240b3
Revert "[PluggableLayer][1/N] Define PluggableLayer" ( #32725 )
2026-01-21 00:21:06 +00:00
Vasiliy Kuznetsov
d2389c1262
fp8 online quant: split out Fp8OnlineLinearMethod ( #32189 )
2026-01-20 18:13:22 -05:00
Micah Williamson
22375f8d13
[ROCm][CI] Remove DS async eplb accuracy test from AMD CI ( #32717 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2026-01-20 13:40:48 -08:00
TJian
9b67338b78
[Bugfix] Suppress log on non-ROCm platform ( #32703 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2026-01-20 13:38:20 -08:00
Lucas Wilkinson
2261340806
[Misc] Remove pad_for_cudagraphs from config ( #30143 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com >
2026-01-20 15:05:48 -05:00
Shinichi Hemmi
86c69dc54c
[Bugfix] Fix byte fallback handling when using outlines ( #31391 )
...
Signed-off-by: Shinichi Hemmi <50256998+Alnusjaponica@users.noreply.github.com >
Co-authored-by: Kenichi Maehashi <maehashi@preferred.jp >
2026-01-20 19:48:08 +00:00
dolpm
7c5dedc247
[AOT compilation] support torch.compile inductor artifacts in VllmCompiledFunction ( #25205 )
...
Signed-off-by: dolpm <34420038+dolpm@users.noreply.github.com >
2026-01-20 19:45:59 +00:00
Cyrus Leung
193069d129
[5/N] Initialize MM components in context managers (Q-Z) ( #32695 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-20 19:10:23 +00:00
Rahul Tuli
f0feb1cf81
Test: added acceptance length tests ( #32030 )
...
Signed-off-by: rahul-tuli <rtuli@redhat.com >
2026-01-20 18:55:15 +00:00
Cyrus Leung
09194b90a5
[Doc] Update docs for MM model development with context usage ( #32691 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-20 10:37:35 -08:00
Woosuk Kwon
9ab4388cd3
[Model Runner V2] Support FLASHINFER_MLA backend ( #32709 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2026-01-20 10:26:17 -08:00
JJJYmmm
04a9e064db
[Bugfix] fix the ima issue of qwen-vit ( #32687 )
...
Signed-off-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com >
2026-01-20 17:21:25 +00:00
TJian
c025263ddd
[Doc] [ROCm] Update ROCm getting started doc ( #32580 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
Co-authored-by: Hongxia Yang <hongxia.yang@amd.com >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-20 09:20:08 -08:00
Wentao Ye
6c97b9b9b6
[Perf] Only clone when needed for moe_permute ( #32273 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-01-20 11:34:39 -05:00
whx
4ca62a0dbd
[PluggableLayer][1/N] Define PluggableLayer ( #32331 )
...
Signed-off-by: whx-sjtu <2952154980@qq.com >
2026-01-20 16:19:21 +00:00
linhaifeng
7901109ea5
[Bugfix] Fix Off-by-one error in _num_tokens_to_min_blocks calculation ( #32603 )
...
Signed-off-by: linhaifeng <1371675203@qq.com >
2026-01-20 11:13:39 -05:00
YiSheng5
13f6630a9e
[XPU]Support AgRsAll2AllManager on XPU device ( #32654 )
...
Signed-off-by: yisheng <yi.sheng@intel.com >
2026-01-20 14:27:24 +00:00
Cyrus Leung
fda3f03eb2
[4/N] Initialize MM components in context managers (M-P) ( #32663 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-20 14:06:32 +00:00
杨朱 · Kiki
bb9172030e
[Metrics] Complete removal of deprecated vllm:time_per_output_token_seconds metric ( #32661 )
...
This PR completes the removal of the vllm:time_per_output_token_seconds metric,
which was deprecated in v0.11, hidden in v0.12, and scheduled for removal in v0.13,
but whose removal was delayed until v0.15.
Signed-off-by: carlory <baofa.fan@daocloud.io >
Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com >
2026-01-20 12:28:41 +00:00
Chauncey
c4e5bdf61b
[Bugfix] Fix the fp8_mqa_logits dim mismatch ( #32652 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-01-20 18:48:07 +08:00
Cyrus Leung
7f1bcd18ff
[3/N] Initialize MM components in context managers (I-L) ( #32650 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-20 10:21:56 +00:00
Walter Beller-Morales
8be263c3fb
[Core] Cleanup shm based object store on engine shutdown ( #32429 )
...
Signed-off-by: walterbm <walter.beller.morales@gmail.com >
2026-01-20 08:53:37 +00:00
Cyrus Leung
e1a34c3a5d
[2/N] Initialize MM components in context managers (E-H) ( #32641 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-20 08:12:56 +00:00
vllmellm
148117ea2e
[Refactor] Make FP8 Linear Ops use kernel abstraction ( #27814 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2026-01-20 14:48:20 +08:00
Woosuk Kwon
e9c83cdc51
[Model Runner V2] Skip kernel launch for penalties & logit_bias ( #32634 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2026-01-19 22:20:19 -08:00
Cyrus Leung
b75e85dede
[1/N] Initialize MM components in context managers (A-D) ( #32632 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-20 14:12:42 +08:00
Cyrus Leung
4753f3bf69
[Model] Use context managers for encoder- and LM-only mode ( #32605 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-20 11:43:38 +08:00
Woosuk Kwon
6c01ffb897
[Model Runner V2] Decouple temperature from penalties ( #32629 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2026-01-19 19:13:24 -08:00
Woosuk Kwon
7b7cdce968
[Model Runner V2] Refactor get_cudagraph_and_dp_padding ( #32625 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2026-01-19 18:25:02 -08:00
Jackmin801
12dab78f49
[Feat] allow inplace loading lora ( #31326 )
...
Signed-off-by: Jackmin801 <ongjackm@gmail.com >
Signed-off-by: Jackmin801 <56836461+Jackmin801@users.noreply.github.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2026-01-20 10:15:20 +08:00
Woosuk Kwon
05dc4bfab6
[Model Runner V2] Initialized communication buffer for DP ( #32624 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2026-01-19 17:27:06 -08:00
Matthew Bonanni
1a1fc3bbc0
[Attention][MLA] Make FLASHINFER_MLA the default MLA backend on Blackwell, and TRTLLM the default prefill ( #32615 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2026-01-19 18:41:34 -05:00
Woosuk Kwon
43fada5360
[Model Runner V2] Refactor dummy_run ( #32533 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2026-01-19 14:50:59 -08:00
Tomas Ruiz
4a5299c93f
feat: spec decode with draft models ( #24322 )
...
Signed-off-by: Tomas Ruiz <tomas.ruiz.te@gmail.com >
2026-01-19 16:05:46 -05:00
lon
73f2a81c75
docs: prefix caching seems quite outdated ( #28784 )
...
Signed-off-by: lon <114724657+longregen@users.noreply.github.com >
Signed-off-by: Russell Bryant <russell.bryant@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Russell Bryant <russell.bryant@gmail.com >
2026-01-19 11:49:52 -08:00
jiahanc
7350331718
[BugFix] Fix TRT-LLM NVFP4 DP/EP ( #32349 )
...
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com >
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2026-01-19 14:32:24 -05:00
Yanan Cao
9d1e611f0e
[CI] Add Helion as an optional dependency ( #32482 )
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com >
2026-01-19 19:09:56 +00:00
Vadim Gimpelson
0727cc9ecf
[BUGFIX] Fix test_mla_backends.py. Scale MLA projection weights to prevent numerical instability ( #32529 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2026-01-19 13:49:29 -05:00
qli88
a0490be8f1
[CI][amd] Revert NIXL connector change to avoid crash ( #32570 )
...
Signed-off-by: Qiang Li <qiang.li2@amd.com >
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com >
2026-01-19 18:39:16 +00:00
Netanel Haber
cd3ac5b797
support dynamic resolution image encoding for Nemotron Nano VL ( #32121 )
...
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com >
2026-01-19 18:15:58 +00:00
Jee Jee Li
2636d76257
[Misc] Remove unused ModelKeys ( #32608 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2026-01-19 17:34:59 +00:00
danisereb
aa7f37ccfa
Add support for LoRA adapters in Nemotron-H models ( #30802 )
...
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com >
2026-01-19 22:30:44 +08:00
wang.yuqi
c88860d759
[Frontend] Score entrypoint support data_1 & data_2 and queries & documents as inputs ( #32577 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-01-19 14:07:46 +00:00
Nicolò Lucchesi
758df5afe7
[NIXL][Metrics] Track nixl_num_kv_expired_reqs metric in Prometheus ( #32340 )
...
Add a new metric to track the number of requests that had their KV blocks
expire. This scenario is particularly important to surface and track, as it is a
vital indicator of the health of the deployment.
Currently we resort to tracking these failures through unstructured log
parsing (which, among other things, depends on the exact error string); on current main:
> Releasing expired KV blocks for request cmpl-071d which were retrieved by 0 decode worker(s) within 0 seconds.
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-01-19 12:28:27 +00:00
Daniel Mescheder
cdd03d25d3
[CI/Build] Fix dependency conflict between model-hosting-container-standards and starlette ( #32560 )
...
Signed-off-by: Daniel Mescheder <dmesch@amazon.com >
Co-authored-by: Daniel Mescheder <dmesch@amazon.com >
2026-01-19 03:27:08 -08:00
Nicolò Lucchesi
74c583bc50
[Core] Whisper support torch.compile ( #30385 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-01-19 10:02:31 +00:00
Andreas Karatzas
c0a350ca73
[ROCm][CI] Add ROCm attention backend support for EAGLE DP tests ( #32363 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-01-19 09:57:54 +00:00
Yuxuan Zhang
71832ba71e
[GLM-4.7] GLM Model support for GLM-Lite ( #31386 )
...
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com >
Signed-off-by: Yuxuan Zhang <2448370773@qq.com >
2026-01-19 01:18:38 -08:00
Matt
11bbf86f6a
[CI][Hardware][AMD] Fix test_rotary_embedding_mla_cache_fused ( #32408 )
...
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com >
2026-01-19 08:25:47 +00:00
Hyunkyun Moon
3c8740aacb
[Frontend] Add render endpoints for prompt preprocessing ( #32473 )
...
Signed-off-by: HyunKyun Moon <mhg5303@gmail.com >
Signed-off-by: Hyunkyun Moon <mhg5303@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-19 12:21:46 +08:00
Alex Brooks
7518a3dc65
[CI/Build] Use Common Event Map Fixture in Harmony / MCP Server Tests ( #32531 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
2026-01-19 04:05:51 +00:00
honglyua
976af2f314
[BugFix] Fix embed_input_ids argument error of QwenVLForConditionalGeneration ( #32462 )
2026-01-19 03:06:02 +00:00
Woosuk Kwon
9a1f16da1e
[Model Runner V2] Refactor update_states ( #32562 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2026-01-18 17:32:42 -08:00
Woosuk Kwon
bb1848cd62
[Model Runner V2] Support VLM ( #32546 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2026-01-18 16:58:51 -08:00
Vadim Gimpelson
6101a26dc9
[BUGFIX] Fix degenerate strides in TRTLLM query tensors for FlashInfer backend. Fixes issue #32353 ( #32417 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2026-01-18 16:57:32 -08:00
Iryna Boiko
f5d1740030
[Bugfix] Add OOT backend option ( #32471 )
...
Signed-off-by: Iryna Boiko <iboiko@habana.ai >
2026-01-18 22:20:39 +00:00
Wentao Ye
eebc58df0c
[Refactor] Remove unused cutlass moe problem size function ( #32047 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-01-18 12:46:59 -08:00
Wentao Ye
16de822c71
[Refactor] Remove unused file pallas_kv_cache_update.py ( #32433 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-01-18 12:46:39 -08:00
Deming
5480c6b1fa
[Doc] Correct comment for _jobs dict in OffloadingConnectorWorker ( #32556 )
2026-01-18 12:46:00 -08:00
Andrey Khalyavin
ba29ab441e
Use the same memory for workspace13 and fused_output. ( #31531 )
...
Signed-off-by: Andrey Khalyavin <halyavin@yandex-team.ru >
2026-01-18 19:14:22 +00:00
Robert Shaw
afc3622602
[CI] Move Distributed Tests from H200 -> H100 ( #32555 )
2026-01-18 10:25:23 -08:00
bnellnm
327a02d8db
[MoE Refactor] Separate Router into OO Classes ( #30623 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
2026-01-18 11:40:49 -05:00
tjp_zju
2f03035a61
"refactor: refactor_repeated_interfaces" ( #32486 )
...
Signed-off-by: tom-zju <tanjianpingzju1990@gmail.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2026-01-18 22:07:01 +08:00
Isotr0py
38bf2ffb21
[Bugfix] Fix GLM-ASR audio encoder RoPE dim ( #32540 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-01-18 19:17:59 +08:00
Li Xie
c826c72a96
[Model] Support Step1 Model ( #32511 )
...
Signed-off-by: xieli <xieli@stepfun.com >
2026-01-18 10:20:46 +00:00
Canlin Guo
fe36bf5e80
[Model] Remove the unnecessary dtype conversion in MiniCPM ( #32523 )
...
Signed-off-by: gcanlin <canlinguosdu@gmail.com >
2026-01-18 08:07:28 +00:00
Woosuk Kwon
963dc0b865
[Model Runner V2] Minor optimization for eagle input processing ( #32535 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2026-01-17 21:55:17 -08:00
Isotr0py
8cc26acd8b
[Performance] Improve Triton prefill attention kernel's performance ( #32403 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-01-17 20:19:59 -08:00
Robert Shaw
4a6af8813f
[MoE Refactor] Move Test Impl into Test Dirs ( #32129 )
...
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com >
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com >
2026-01-18 12:16:59 +08:00
Woosuk Kwon
4147910f1e
[Model Runner V2] Move mrope_positions buffer to MRopeState ( #32532 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2026-01-17 20:09:48 -08:00
Karan Bansal
3055232ba0
[Feature] Add FIPS 140-3 compliant hash algorithm option for multimodal hashing ( #32386 )
...
Signed-off-by: Karan Bansal <karanb192@gmail.com >
2026-01-18 11:02:01 +08:00
Shengqi Chen
965765aef9
[build] fix cu130 related release pipeline steps and publish as nightly image ( #32522 )
...
Signed-off-by: Shengqi Chen <harry-chen@outlook.com >
2026-01-17 18:36:11 -08:00
Mritunjay Kumar Sharma
9e078d0582
[CI/Build][Docker] Add centralized version manifest for Docker builds ( #31492 )
...
Signed-off-by: Mritunjay Sharma <mritunjay.sharma@chainguard.dev >
2026-01-17 13:45:30 +00:00
Guofang.Tang
2b99f210f5
[Misc] Fix typo: seperator -> separator in flashmla_sparse.py ( #32411 )
...
Signed-off-by: Guofang Tang <tinggofun@gmail.com >
Co-authored-by: Guofang Tang <tinggofun@gmail.com >
2026-01-17 12:18:30 +00:00
Kim Hee Su
1646fea672
[Model] Molmo2: Enable quantized weight mapping for vision backbone ( #32385 )
...
Signed-off-by: kimheesu <wlskaka4@gmail.com >
2026-01-17 09:33:05 +00:00
Paul Pak
d3317bbba4
[Models] Lfm2Moe: minor name changes for resolving lora conflicts ( #29063 )
...
Signed-off-by: Paul Pak <paulpak58@gmail.com >
2026-01-16 22:12:55 -08:00
Shengqi Chen
8e61425ee6
[CI] Implement uploading to PyPI and GitHub in the release pipeline, enable release image building for CUDA 13.0 ( #31032 )
2026-01-17 04:52:33 +00:00
Matthew Bonanni
2e7c89e708
Revert "[Attention][MLA] Make FLASHINFER_MLA the default MLA backen… ( #32484 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-01-17 04:42:39 +00:00
vanshil shah
037a6487af
apply _validate_input to MistralTokenizer token-id chat prompts ( #32448 )
...
Signed-off-by: Vanshil Shah <vanshilshah@gmail.com >
2026-01-17 03:23:45 +00:00
Simon Mo
5a3050a089
[Docs][Governance] Add @robertshaw2-redhat to lead maintainers group ( #32498 )
...
Co-authored-by: Claude <noreply@anthropic.com >
2026-01-16 18:35:49 -08:00
Chenyaaang
484e22bc18
[TPU][Core] Enable Pipeline Parallelism on TPU backend ( #28506 )
...
Signed-off-by: Chenyaaang <chenyangli@google.com >
2026-01-16 15:29:20 -08:00
Lucas Wilkinson
ca21288080
[CI] Fix OOM in Hopper Fusion E2E Tests (H100) ( #32489 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2026-01-16 21:27:16 +00:00
Andrew Xia
4c82b6fac7
[responsesAPI] allow tuning include_stop_str_in_output ( #32383 )
...
Signed-off-by: Andrew Xia <axia@fb.com >
Co-authored-by: Andrew Xia <axia@fb.com >
2026-01-16 21:14:40 +00:00
Xin Yang
a884bc62d6
[LoRA] Update LoRA expand kernel heuristic ( #32425 )
...
Signed-off-by: Xin Yang <xyangx@amazon.com >
2026-01-16 18:38:07 +00:00
Hashem Hashemi
7a1030431a
Atomics Reduce Counting Optimization for SplitK Skinny GEMMs. ( #29843 )
...
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com >
2026-01-16 11:45:04 -06:00
Wentao Ye
9fd918e510
[CI] Update deepgemm to newer version ( #32479 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-01-17 01:18:05 +08:00
Ilya Markov
c9a533079c
[EPLB][BugFix]Possible deadlock fix ( #32418 )
...
Signed-off-by: ilmarkov <markovilya197@gmail.com >
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
2026-01-16 09:11:01 -05:00
rasmith
6ca4f400d8
[CI][AMD] Skip test_permute_cols since the kernel is not used and not built for ROCm ( #32444 )
...
Signed-off-by: Randall Smith <ransmith@amd.com >
2026-01-16 16:22:53 +08:00
Cyrus Leung
180e981d56
[Chore] Replace swish with silu ( #32459 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-16 08:22:45 +00:00
Micah Williamson
b84c426a8c
[ROCm][CI] Skip Qwen3-30B-A3B-MXFP4A16 Eval Test On Non-CUDA Platforms ( #32460 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2026-01-16 00:17:44 -08:00
Rabi Mishra
b66b0d6abb
fix(rocm): Enable non-gated MoE (is_act_and_mul=False) support on ROCm ( #32244 )
...
Signed-off-by: rabi <ramishra@redhat.com >
2026-01-16 15:31:10 +08:00
Hongxin Xu
03da3b52ef
[Bugfix] Refactor to support DP parallel in R3 ( #32306 )
...
Signed-off-by: xhx1022 <1737006628@qq.com >
Co-authored-by: arlenxu <arlenxu@tencent.com >
2026-01-16 15:13:58 +08:00
Lucas Wilkinson
14ce524249
[CI] Breakup h200 tests ( #30499 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2026-01-16 06:23:22 +00:00
wang.yuqi
4ae77dfd42
[Frontend][1/n] Make pooling entrypoints request schema consensus | CompletionRequest ( #32395 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-01-16 06:17:04 +00:00
XiongfeiWei
73f635a75f
[Bug] Add TPU backend option ( #32438 )
...
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com >
2026-01-16 05:17:12 +00:00
cjackal
35bf5d08e8
[bugfix] Fix online serving crash when text type response_format is received ( #26822 )
...
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com >
Signed-off-by: j0shuajun <59368606+j0shuajun@users.noreply.github.com >
Co-authored-by: j0shuajun <59368606+j0shuajun@users.noreply.github.com >
2026-01-16 12:23:54 +08:00
Kebe
5de6dd0662
[Bugfix] [DeepSeek-V3.2] fix sparse_attn_indexer padding ( #32175 )
...
Signed-off-by: Kebe <mail@kebe7jun.com >
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com >
2026-01-16 03:21:55 +00:00
ltd0924
709502558c
[Model] Add Step3vl 10b ( #32329 )
...
Signed-off-by: luotingdan <luotingdan@stepfun.com >
Signed-off-by: ltd0924 <32387785+ltd0924@users.noreply.github.com >
Co-authored-by: luotingdan <luotingdan@stepfun.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2026-01-15 19:04:16 -08:00
Micah Williamson
46f8a982b1
[ROCm][CI] Enable AITER Unified Attention On ROCm For gpt-oss Test ( #32431 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2026-01-16 00:55:57 +00:00
Matthew Bonanni
bcf2333cd6
[CI] Fix LM Eval Large Models (H100) ( #32423 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-01-16 00:52:49 +00:00
Michael Goin
83239ff19a
Add thread_n=64 support to Marlin MoE ( #32360 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-01-15 16:45:44 -08:00
TomerBN-Nvidia
c277fbdf31
[Feat] Support non-gated MoE with Marlin, NVFP4 CUTLASS, FP8, INT8, compressed-tensors ( #32257 )
...
Signed-off-by: Tomer Natan <tbarnatan@computelab-frontend-8.nvidia.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: Tomer Natan <tbarnatan@computelab-frontend-8.nvidia.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
Co-authored-by: Tomer Natan <tbarnatan@ipp1-1429.ipp1a1.colossus.nvidia.com >
2026-01-15 16:15:05 -08:00
Wentao Ye
aca5c51487
[Refactor] Remove unused file ( #32422 )
2026-01-15 15:59:38 -07:00
Yongye Zhu
31c29257c8
[MoE Refactor][17/N] Apply Refactor to Bf16 ( #31827 )
...
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-01-15 12:53:40 -08:00
Aleksandr Malyshev
8c11001ba2
[ROCM] DSfp4 mla projection gemms weight dynamic quantization ( #32238 )
...
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com >
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com >
2026-01-15 14:13:08 -06:00
Richard Zou
bd292be0c0
[BugFix] Python file source reading can fail on UnicodeDecodeError ( #32416 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2026-01-15 20:01:41 +00:00
TJian
41c544f78a
[ROCm] [CI] [Release] Rocm wheel pipeline with sccache ( #32264 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2026-01-16 02:56:18 +08:00
Michael Goin
1be5a73571
[UX] Use kv_offloading_backend=native by default ( #32421 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-01-15 18:55:11 +00:00
Lucas Wilkinson
c36ba69bda
[BugFix] Fix assert x_s.shape[-1] == x_q.shape[-1] // group_shape[1] in Blackwell Quantized MoE Test ( #32362 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-15 10:19:12 -08:00
Matthias Gehre
047413375c
[Attention][AMD] Make flash-attn optional ( #30361 )
...
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com >
2026-01-15 17:18:24 +00:00
smit kadvani
74e4bb1c5a
fixing podman build issue ( #32131 )
...
Signed-off-by: Smit Kadvani <smit.kadvani@gmail.com >
Co-authored-by: Smit Shaileshbhai Kadvani <kadvani@meta.com >
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com >
2026-01-15 11:07:08 -06:00
Wentao Ye
b34474bf2c
[Feature] Support async scheduling + PP ( #32359 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-01-15 12:06:23 -05:00
Woosuk Kwon
6218034dd7
[Model Runner V2] Support FlashInfer backend & Fix CUDA Graph bug [1/2] ( #32348 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2026-01-15 08:59:23 -08:00
Pleaplusone
77c16df31d
[ROCm][Bugfix] Disable hip sampler to fix deepseek's accuracy issue on ROCm ( #32413 )
...
Signed-off-by: ganyi <ygan@amd.com >
2026-01-15 16:35:47 +00:00
Pleaplusone
130d6c9514
[ROCm][Perf] Enable shuffle kv cache layout and assembly paged attention kernel for AiterFlashAttentionBackend ( #29887 )
...
Signed-off-by: ganyi <ygan@amd.com >
2026-01-15 15:29:53 +00:00
Dipika Sikka
361dfdc9d8
[Quant] Support MXFP4 W4A16 for compressed-tensors MoE models ( #32285 )
...
Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-01-15 07:25:55 -08:00
Matthew Bonanni
8ebfacaa75
[Attention][MLA] Make FLASHINFER_MLA the default MLA backend on Blackwell, and TRTLLM the default prefill ( #32339 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2026-01-15 09:49:57 -05:00
brian033
b89275d018
[ROCm] Improve error handling while loading quantized model on gfx120… ( #31715 )
...
Signed-off-by: brian033 <85883730+brian033@users.noreply.github.com >
Co-authored-by: TJian <tunjian.tan@embeddedllm.com >
2026-01-15 04:16:00 -08:00
Cyrus Leung
28459785ff
[3/N] Group together media-related code ( #32406 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-15 11:52:12 +00:00
rasmith
8853a50af2
[CI][BugFix][AMD][FP8] Fix test_rms_norm so it runs correctly on ROCm ( #32372 )
...
Signed-off-by: Randall Smith <ransmith@amd.com >
Co-authored-by: Randall Smith <ransmith@amd.com >
2026-01-15 19:05:54 +08:00
Douglas Lehr
c5891b5430
[ROCM] Add ROCm image build to release pipeline ( #31995 )
...
Signed-off-by: Doug Lehr <douglehr@amd.com >
Co-authored-by: Doug Lehr <douglehr@amd.com >
2026-01-15 19:01:40 +08:00
Chauncey
707b44cc28
[Refactor] [11/N] to simplify the mcp architecture ( #32396 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-01-15 18:49:31 +08:00
rongfu.leng
3a4e10c847
[Benchmark] [Feature] add vllm bench sweep startup command ( #32337 )
...
Signed-off-by: lengrongfu <lenronfu@gmail.com >
2026-01-15 09:25:46 +00:00
Cyrus Leung
cbbae38f93
[2/N] Move cache factories to MM registry ( #32382 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-15 01:02:30 -08:00
Cyrus Leung
cdba4c74b3
[Model] Avoid token selection in SigLIP pooling head ( #32389 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-15 17:01:59 +08:00
seeksky
a52d1396a7
fix: avoid crash on zero-arg tool calls in glm4 parser ( #32321 )
...
Signed-off-by: seekskyworld <djh1813553759@gmail.com >
2026-01-15 08:45:59 +00:00
dtc
1e584823f8
[Bugfix] Strengthen the check of X-data-parallel-rank in Hybrid LB mode ( #32314 )
...
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com >
2026-01-15 16:31:16 +08:00
Chauncey
4c1c501a7e
[Refactor] [10/N] to simplify the vLLM openai completion serving architecture ( #32369 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-01-15 07:41:34 +00:00
Andreas Karatzas
ae1eba6a9a
[ROCm][CI] Pin transformers 4.57.3 to fix jina test failures ( #32350 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-01-15 15:19:34 +08:00
Ofir Zafrir
e9ec2a72d8
[Bugfix] Fix stale common_attn_metadata.max_seq_len in speculative decoding with Eagle ( #32312 )
...
Signed-off-by: Ofir Zafrir <ofir.zafrir@intel.com >
2026-01-15 06:39:37 +00:00
Lucas Wilkinson
2c9b4cf5bf
[BugFix] Fix DeepSeek-V3.1 + DeepGEMM incompatible scale shapes ( #32361 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Co-authored-by: Eldar Kurtić <8884008+eldarkurtic@users.noreply.github.com >
2026-01-15 06:32:22 +00:00
Ning Xie
9d7ae3fcdb
[code clean] remove duplicate check ( #32376 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2026-01-15 05:29:34 +00:00
rasmith
3c2685645e
[CI][AMD][Quantization][BugFix] Fix fp8 max in quant_utils.py and update test_fp8_quant.::test_static_fp8_quant_group_2d to use correct fp8 dtype and adjust atol/rtol ( #32201 )
...
Signed-off-by: Randall Smith <ransmith@amd.com >
2026-01-15 05:04:34 +00:00
Micah Williamson
773d7073ae
[ROCm][CI] Disable async scheduling on ROCm for test_structured_output[meta-llama/Meta-Llama-3.1-8B-Instruct-xgrammar-auto-speculative_config9] ( #32355 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2026-01-15 04:53:43 +00:00
kzwrime
edadca109c
[Bugfix] Add CpuCommunicator.dispatch and combine to fix DP+MoE inference ( #31867 )
...
Signed-off-by: kunzh <zhikun.wu@outlook.com >
2026-01-15 04:50:48 +00:00
Li Wang
d86fc23bdd
[Misc] Remove redundant line ( #32366 )
...
Signed-off-by: wangli <wangli858794774@gmail.com >
2026-01-15 04:29:56 +00:00
Shiyan Deng
375e5984fe
Support configure skip_special_tokens in openai response api ( #32345 )
...
Signed-off-by: Shiyan Deng <dsy842974287@meta.com >
2026-01-15 04:07:26 +00:00
baonudesifeizhai
19b251fe3d
Fix optional parameter parsing in MiniMax M2 tool parser #32278 ( #32342 )
...
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com >
2026-01-15 04:05:48 +00:00
Ryan Rock
15422ed3f7
[CI/Build][Hardware][AMD] Fix v1/shutdown ( #31997 )
...
Signed-off-by: Ryan Rock <ryan.rock@amd.com >
2026-01-15 04:01:42 +00:00
dolpm
8471b27df9
[compile] raise on compile_size implicit padding ( #32343 )
...
Signed-off-by: dolpm <34420038+dolpm@users.noreply.github.com >
2026-01-14 20:46:56 +00:00
Lumosis
66652e8082
[BugFix] Assign page_size_padded when unifying kv cache spec. ( #32283 )
...
Signed-off-by: Lihao Ran <imlihao.ran@gmail.com >
2026-01-14 20:10:01 +00:00
vllmellm
e27078ea80
[Bugfix][ROCm][performance] Resolve the performance regression issue of the Qwen3-Next-80B-A3B-Thinking under rocm_atten ( #32336 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2026-01-14 19:32:48 +00:00
Aleksandr Samarin
d084e9fca7
[MODEL] Fix handling of multiple channels for gpt-oss with speculative decoding ( #26291 )
...
Signed-off-by: Aleksandr Samarin <astrlrd@nebius.com >
Signed-off-by: southfreebird <yvorott@gmail.com >
Co-authored-by: southfreebird <yvorott@gmail.com >
2026-01-14 13:20:52 -05:00
qli88
3a612322eb
[CI] Move rixl/ucx from Dockerfile.rocm_base to Dockerfile.rocm ( #32295 )
...
Signed-off-by: Qiang Li <qiang.li2@amd.com >
2026-01-14 16:53:36 +00:00
Cyrus Leung
9ea07b41da
[1/N] Reorganize multimodal processing code ( #32327 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-14 15:25:31 +00:00
Ning Xie
552b262936
rename tokenize serving api request id prefix to tokenize ( #32328 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2026-01-14 14:52:20 +00:00
Chauncey
00e6402d56
[Frontend] track responsesAPI server_load ( #32323 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-01-14 12:00:37 +00:00
Shanshan Shen
ce0946249d
[Misc] Make mem utils reusable by other platforms ( #32322 )
...
Signed-off-by: shen-shanshan <467638484@qq.com >
2026-01-14 03:46:01 -08:00
Cyrus Leung
3f28174c6a
[Frontend] Standardize use of create_error_response ( #32319 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-14 11:22:26 +00:00
Chauncey
769d0629e1
[Refactor] [9/N] to simplify the vLLM openai translations serving architecture ( #32313 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-01-14 10:20:58 +00:00
Cyrus Leung
90db5b31e4
[Refactor] Move top-level dummy data generation to registry ( #32310 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-14 02:17:46 -08:00
Roger Wang
b8199f6049
[Model] Re-implement Qwen3Omni Audio Encoder ( #32167 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2026-01-14 15:40:30 +08:00
sangho.lee
7e6f123810
Add Molmo2 multimodal model support ( #30997 )
...
Signed-off-by: sanghol <sanghol@allenai.org >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-01-14 15:33:09 +08:00
Chauncey
9312a6c03a
[Refactor] [8/N] to simplify the vLLM openai responsesapi_serving architecture ( #32260 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-01-14 07:26:24 +00:00
Michael Goin
6388b50058
[Docs] Add docs about OOT Quantization Plugins ( #32035 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-01-14 15:25:45 +08:00
Hongxia Yang
048bb59728
AMD CI Test - unskip moe_sum test and moe_align_block_size tests ( #32039 )
...
Signed-off-by: Hongxia Yang <hongxia.yang@amd.com >
2026-01-13 23:25:10 -08:00
Angela Yi
7933638051
[misc] Remove is_torch_equal_or_newer(2.4) cases ( #32296 )
...
Signed-off-by: angelayi <yiangela7@gmail.com >
2026-01-13 23:22:07 -08:00
David
6b176095e3
[Build] Relax anthropic version pin from ==0.71.0 to >=0.71.0 ( #32289 )
...
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-01-13 23:21:39 -08:00
Andreas Karatzas
9d0d7f48d5
[ROCm][CI] Handle missing vision_config in Isaac model attention patch ( #32281 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-01-14 07:21:26 +00:00
Yi Liu
50632adc58
Consolidate Intel Quantization Toolkit Integration in vLLM ( #31716 )
...
Signed-off-by: yiliu30 <yi4.liu@intel.com >
2026-01-14 07:11:30 +00:00
Micah Williamson
6fa6e7ef0c
[ROCm][CI] Disable Async Scheduling For Qwen3-Next-80B-A3B-Instruct MTP Async EPLB Accuracy Test ( #32275 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2026-01-14 13:29:42 +08:00
Woosuk Kwon
90c0836902
[Model Runner V2] Refactor Sampler ( #32245 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2026-01-13 17:58:12 -08:00
Roberto L. Castro
8ef50d9a6b
[Kernel][Performance] Enable smaller Scaling Factor tiling for NVFP4 small-batch decoding ( #30885 )
...
Signed-off-by: LopezCastroRoberto <roberto.lopez.castro@udc.es >
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com >
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com >
2026-01-13 15:22:53 -08:00
emricksini-h
2a60ac91d0
[Improvement] Persist CUDA compat libraries paths to prevent reset on apt-get ( #30784 )
...
Signed-off-by: emricksini-h <emrick.birivoutin@hcompany.ai >
2026-01-13 14:35:05 -08:00
Michael Goin
9e65bb4ef4
Add mergify label job for "bug" in PR titles ( #31980 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-01-13 14:28:19 -08:00
Simon Mo
0db574b185
[Build] Add scripts for cherry-picking and trigger build ( #32282 )
...
Co-authored-by: Cursor Agent <cursoragent@cursor.com >
2026-01-13 13:21:05 -08:00
HappyAmazonian
2f4a71daf2
[Misc] Add In-Container restart capability through supervisord for sagemaker entrypoint ( #28502 )
...
Signed-off-by: Shen Teng <sheteng@amazon.com >
Signed-off-by: HappyAmazonian <91216626+HappyAmazonian@users.noreply.github.com >
2026-01-13 13:06:10 -08:00
Rabi Mishra
69f8a0ea37
fix(rocm): Use refresh_env_variables() for rocm_aiter_ops in test_moe ( #31711 )
...
Signed-off-by: rabi <ramishra@redhat.com >
2026-01-13 19:11:54 +00:00
Wentao Ye
f28125d87b
[Perf] Optimize grouped topk kernel, 1.2%~2% E2E Throughput improvement ( #32058 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-01-13 10:58:18 -08:00
Dmitry Tokarev
46f8c6b725
Fix CUDA 13 wheel installation doc ( #32276 )
...
Signed-off-by: Dmitry Tokarev <dtokarev@nvidia.com >
2026-01-13 10:48:37 -08:00
Andrew Xia
af54d2e2d0
[responseAPI] support partial message generation ( #32100 )
...
Signed-off-by: Andrew Xia <axia@fb.com >
Signed-off-by: Andrew Xia <mitandrewxia@gmail.com >
Signed-off-by: Lu Fang <30275821+houseroad@users.noreply.github.com >
Co-authored-by: Andrew Xia <axia@fb.com >
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com >
2026-01-13 10:41:26 -08:00
Sage Moore
6beef12b9b
[EPLB][Cleanup] Remove is_async_enabled from EplbModelState ( #32050 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
2026-01-13 18:19:03 +00:00
Mark McLoughlin
ab74b2a27a
[Trivial] Remove duplicate enable_mfu_metrics ( #32246 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2026-01-14 01:09:23 +08:00
Matthew Bonanni
2263d44b68
[4/N][Attention] Move MLA common to model_executor ( #32060 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2026-01-13 09:08:45 -08:00
Mathis Felardos
4f3676e726
nixl_connector: export UCX_MEM_MMAP_HOOK_MODE=none to avoid a UCX memory leak ( #32181 )
...
Signed-off-by: Mathis Felardos <mathis@mistral.ai >
2026-01-13 16:21:10 +00:00
Martin Hickey
510265472c
[BugFix] [KVConnector] Fix KV events for LMCache connector ( #32169 )
...
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-13 15:50:34 +00:00
Chauncey
4f02cb2eac
[Refactor] [7/N] to simplify the vLLM lora serving architecture ( #32251 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-01-13 15:37:34 +00:00
Cyrus Leung
252c011012
[Refactor] Remove MultiModalProfiler ( #32254 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-13 15:10:20 +00:00
Matthew Bonanni
98f60e5acb
[6/N][Attention] Move utils to more appropriate locations ( #32215 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-01-13 05:38:52 -08:00
Chauncey
fefce49807
[Refactor] [6/N] to simplify the vLLM openai chat_completion serving architecture ( #32240 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-01-13 13:01:39 +00:00
Mickaël Seznec
a5bbbd2f24
[Quantization] fix: overflow with static per-tensor scaling ( #29867 )
...
Signed-off-by: Mickael Seznec <mickael@mistral.ai >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-01-13 12:56:01 +00:00
Nicolò Lucchesi
8c8653b672
[Docs] Nixl Usage recommend fail kv_load_failure_policy ( #32198 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-01-13 12:51:57 +00:00
Cyrus Leung
232214b2ae
[Bugfix] Replace PoolingParams.normalize with use_activation ( #32243 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-13 10:45:42 +00:00
Cyrus Leung
eb28e8068d
[Refactor] Remove get_encoder_dummy_data ( #32241 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-13 09:21:23 +00:00
YunzhuLu
542a4059b2
[Model] Use mm_position to compute mrope positions for Qwen2-VL/2.5-VL ( #32126 )
...
Signed-off-by: YunzhuLu <lucia.yunzhu@gmail.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-01-13 09:04:29 +00:00
Andreas Karatzas
df7e12715f
[ROCm][CI] Fix engine core client tests for ROCm spawn multiprocessing ( #32061 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-01-13 15:14:30 +08:00
Roy Wang
44c34f22d9
[Doc] Update installation from source command ( #32239 )
...
Signed-off-by: esmeetu <jasonailu87@gmail.com >
2026-01-12 23:10:27 -08:00
Xingyu Liu
80221e1884
[BugFix]Fix eagle draft_model_config and add tests ( #31753 )
...
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com >
2026-01-12 23:09:36 -08:00
Andreas Karatzas
5e714f7ff4
[ROCm][CI] Fix HuggingFace flash_attention_2 accuracy issue in Isaac vision encoder ( #32233 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-01-12 22:33:59 -08:00
Andreas Karatzas
11b6af5280
[ROCm][Bugfix] Fix Mamba batched decode producing incorrect output ( #32099 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-01-13 05:46:53 +00:00
Wentao Ye
2a719e0865
[Perf] Optimize requests abort ( #32211 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-01-13 04:11:37 +00:00
Andrew Bennett
f243abc92d
Fix various typos found in docs ( #32212 )
...
Signed-off-by: Andrew Bennett <potatosaladx@meta.com >
2026-01-13 03:41:47 +00:00
Sanghoon Yoon
60b77e1463
[Frontend] Add reasoning_effort to OpenAIServing._preprocess_chat() ( #31956 )
...
Signed-off-by: Sanghoon Yoon <seanyoon@kakao.com >
2026-01-13 03:21:49 +00:00
cjackal
15b33ff064
[Misc] improve warning/assert messages ( #32226 )
...
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com >
2026-01-13 03:11:23 +00:00
Nick Hill
c6bb5b5603
[BugFix] Fix engine crash caused by chat tools + response_format ( #32127 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-01-13 10:33:14 +08:00
Nick Hill
9273a427b5
[Misc] Allow enabling NCCL for DP sync when async scheduling ( #32197 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-01-13 02:03:08 +00:00
Cyrus Leung
78d13ea9de
[Model] Handle trust_remote_code for transformers backend ( #32194 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-13 09:30:12 +08:00
Andrew Xia
a307ac0734
[responsesAPI] add unit test for optional function tool call id ( #32036 )
...
Signed-off-by: Andrew Xia <axia@fb.com >
Co-authored-by: Andrew Xia <axia@fb.com >
2026-01-12 16:14:54 -08:00
Divakar Verma
a28d9f4470
[ROCm][CI] Handle pytest status code 5 when a shard isn't allocated any tests ( #32040 )
...
Signed-off-by: Divakar Verma <divakar.verma@amd.com >
2026-01-12 17:35:49 -05:00
xuebwang-amd
629584bfc9
[Kernel][MoE] fix computation order of MoE weight multiplication and improve flow ( #31962 )
...
Signed-off-by: xuebwang-amd <xuebwang@amd.com >
2026-01-12 17:17:30 -05:00
Woosuk Kwon
0a7dd23754
[Model Runner V2] Add support for M-RoPE ( #32143 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2026-01-12 13:37:43 -08:00
Woosuk Kwon
dec28688c5
[Model Runner V2] Minor refactor for logit_bias ( #32209 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2026-01-12 13:08:30 -08:00
Vadim Gimpelson
9f430c94bd
[BUGFIX] Add missed remapping of the names of fp8 kv-scale ( #32199 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2026-01-12 20:42:06 +00:00
Nicolò Lucchesi
f8bd8394e3
[NIXL][Bugfix] Failure logging overhaul + early metadata free on failure ( #32031 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-01-12 20:38:49 +00:00
Woosuk Kwon
ca81811bfe
[Model Runner V2] Support logit_bias, allowed_token_ids, min_tokens ( #32163 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2026-01-12 11:31:10 -08:00
Lucas Kabela
ad8818bb5e
[Misc][BE] Type coverage for vllm/compilation [3/3] ( #31748 )
...
Signed-off-by: Lucas Kabela <lucaskabela@meta.com >
2026-01-12 19:24:38 +00:00
Nicolò Lucchesi
08e8e99ce7
[Misc] Change log level for batch queue log ( #32192 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-01-12 18:59:31 +00:00
Or Ozeri
2be765b68a
[BugFix] scheduler: Fix ordering preserving of skipped requests ( #32173 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com >
2026-01-12 18:39:38 +00:00
Roger Wang
16abe6b85a
[Misc] Set default torch num threads for input processing ( #31879 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2026-01-12 10:28:16 -08:00
Ilya Markov
1eb61ab34b
[Refactor] EPLB rebalance algo to NumPy ( #30697 )
...
Signed-off-by: ilmarkov <markovilya197@gmail.com >
2026-01-12 18:13:23 +00:00
Kyungmin Lee
3d962d72ab
[BugFix] fix FusedMoE.make_expert_params_mapping in EXAONE-MoE ( #32196 )
...
Signed-off-by: lkm2835 <lkm2835@gmail.com >
2026-01-12 10:00:45 -08:00
Matthew Bonanni
20228cb851
[3/N][Attention] Move AttentionMetadata-related code from utils.py to backend.py ( #32054 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-01-12 09:13:56 -08:00
Cyrus Leung
7c0d3c5152
[Benchmark] Share data between SLA runs ( #32184 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-13 01:12:22 +08:00
Nicolò Lucchesi
5b68107411
[Misc][PD] Fix get_attn_backend usage in transfer connectors ( #31988 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-01-12 18:10:05 +01:00
Asaf Joseph Gardin
8fb2c135be
[Bugfix] Fix stale SSM state for new Mamba requests scheduled as decode ( #32118 )
...
Signed-off-by: Josephasafg <ajgard7@gmail.com >
2026-01-12 17:02:38 +00:00
Cyrus Leung
8863c2b25c
[Model] Standardize pooling heads ( #32148 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-12 17:01:49 +00:00
danielafrimi
3f72639d36
[FIX] Add NO_MUL activation support for modular kernel path ( #31528 )
...
Signed-off-by: dafrimi <dafrimi@nvidia.com >
Signed-off-by: <>
Co-authored-by: root <root@gpu-267.slurm-workers-slurm.slurm.svc.cluster.local >
Co-authored-by: root <root@gpu-537.slurm-workers-slurm.slurm.svc.cluster.local >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: root <root@pool0-01777.cm.cluster >
2026-01-12 11:55:49 -05:00
Jaehyun An
6bc9c8473e
[MODEL] New model support for kakaocorp/kanana-1.5-v-3b-instruct ( #29384 )
...
Signed-off-by: Jaehyun An <steve.ai@kakaocorp.com >
2026-01-12 16:39:02 +00:00
Kyungmin Lee
63ed2409e8
Add K-EXAONE-236B-A23B ( #31621 )
...
Signed-off-by: lkm2835 <lkm2835@gmail.com >
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: lgai-exaone <exaonemodels@lgresearch.ai >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2026-01-12 16:30:50 +00:00
Andy Zhang
95e53d907c
doc: Update model references in supported_models.md ( #32188 )
2026-01-12 08:15:28 -08:00
TJian
0346396e94
[ROCm] [Bugfix] Fix order of mori build in Dockerfile.rocm_base ( #32179 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2026-01-12 15:33:21 +00:00
Andy Zhang
e68b0dad8b
doc: Update model name for Qwen3-Coder in documentation ( #32185 )
...
Signed-off-by: Andy Zhang <xiazhang@microsoft.com >
2026-01-12 07:10:50 -08:00
Or Ozeri
9cddbdba6d
OffloadingConnector: Add cpu_bytes_to_use configuration ( #24498 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com >
2026-01-12 15:00:43 +00:00
Hongxin Xu
49e6b86c91
[Feature] Support recording expert indices for rollout router replay ( #28284 )
...
Signed-off-by: xhx1022 <1737006628@qq.com >
Signed-off-by: Hongxin Xu <70438206+xhx1022@users.noreply.github.com >
Signed-off-by: arlenxu <arlenxu@tencent.com >
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com >
Co-authored-by: arlenxu <arlenxu@tencent.com >
2026-01-12 06:23:04 -08:00
dtc
0565f1fdec
[P/D] Refactor mooncake connector sender thread using async coroutines ( #31573 )
...
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com >
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com >
2026-01-12 12:35:35 +00:00
Isotr0py
9dbe1fe960
[Bugfix] Fix missing scale passing for encoder Triton Attention implementation ( #32149 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-01-12 11:13:41 +00:00
RickyChen / 陳昭儒
a5f89ae296
[Doc] Add documentation for offline API docs feature ( #32134 )
...
Signed-off-by: rickychen-infinirc <ricky.chen@infinirc.com >
2026-01-12 10:33:48 +00:00
Jee Jee Li
05e8981234
[Doc] Improve LoRA docs ( #32159 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-12 02:19:17 -08:00
XlKsyt
899541bdb1
[doc] fix broken links ( #32158 )
...
Signed-off-by: minimAluminiumalism <caixuesen@outlook.com >
2026-01-12 10:18:38 +00:00
daniel-salib
d7b2e57097
[Frontend] Fix Flaky MCP Streaming Test ( #32153 )
...
Signed-off-by: Daniel Salib <danielsalib@meta.com >
2026-01-12 18:03:32 +08:00
Andika Rachman
5e034f2e3d
[cpu][bench] Add Fused MoE Micro Benchmark for CPU Backend ( #32092 )
...
Signed-off-by: andikarachman <andika.rachman.y@gmail.com >
2026-01-12 10:03:28 +00:00
Nicolò Lucchesi
22970c1626
[Misc] Disable default --ready-check-timeout-sec extra call in vllm bench ( #30975 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-01-12 01:58:21 -08:00
Cyrus Leung
600aaab8d6
[Model] Remove incorrect SupportsPP from MTP models ( #32150 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-12 01:19:30 -08:00
wang.yuqi
60446cd684
[Model] Improve multimodal pooling examples ( #32085 )
...
Signed-off-by: wang.yuqi <noooop@126.com >
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-01-12 07:54:09 +00:00
Cyrus Leung
9101dc756c
[Model] Avoid hardcoding pooling type ( #32119 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-11 21:28:12 -08:00
Woosuk Kwon
025a32f9ed
[Model Runner V2] Remove async barrier ( #32083 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2026-01-11 20:24:30 -08:00
Woosuk Kwon
19504ac07f
[Model Runner V2] Skip building deprecated fields in attn metadata ( #32132 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2026-01-11 14:31:04 -08:00
Jiangyun Zhu
3df619ac94
[CI] fix test_concat_and_cache_mla_rope_fused ( #32117 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2026-01-11 15:11:11 +00:00
Ning Xie
d74132ca3b
fix offline inference chat response prompt ( #32088 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2026-01-11 14:01:18 +00:00
maang
a34abc49b7
[FixBug] Improve exception string in tensorizer.py ( #31680 )
...
Signed-off-by: maang <maang_h@163.com >
Signed-off-by: maang-h <55082429+maang-h@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-11 05:01:53 -08:00
rongfu.leng
d70249e2e9
[Misc] fix this log format not space ( #32112 )
...
Signed-off-by: lengrongfu <lenronfu@gmail.com >
2026-01-11 05:01:16 -08:00
Cyrus Leung
a374532111
[CI/Build] Separate out flaky responses API tests ( #32110 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-11 05:01:12 -08:00
Isotr0py
cee7436a26
[Misc] Make scipy as optional audio/benchmark dependency ( #32096 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-01-11 00:18:57 -08:00
Or Ozeri
4c16ba617f
[KVConnector] OffloadingConnector: Fix bug in handling of preemptions ( #29870 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com >
2026-01-11 08:05:36 +00:00
Matt
bde57ab2ed
[Hardware][AMD][CI][Bugfix] Fix AMD Quantization test group ( #31713 )
...
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com >
2026-01-10 23:19:46 -08:00
Fadi Arafeh
9103ed1696
[CPU][BugFix] Disable AOT Compile for CPU ( #32037 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
2026-01-10 23:15:49 -08:00
Laith Sakka
46eb30f519
make assume_32_bit_indexing configurable ( #32044 )
...
Signed-off-by: Laith Sakka <lsakka@meta.com >
2026-01-10 23:15:46 -08:00
Andy Liu
0dd63639be
[MTP][GLM][Bugfix] Fixed .weight_scale loading logic that dropped MTP prediction accuracy with fp8+mtp ( #32101 )
...
Signed-off-by: Andy Liu <andyliu@roblox.com >
2026-01-10 23:14:54 -08:00
Cyrus Leung
ef96fa3f1f
[Benchmark][2/2] Use spline interpolation to tune SLA variables ( #32095 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-10 20:27:27 -08:00
Or Ozeri
2a4dbe24ea
[BugFix] Wait for compute before offloading KV to CPU ( #31341 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com >
2026-01-10 22:25:08 +00:00
RickyChen / 陳昭儒
8020a60402
[Bugfix] Fix Qwen3-VL-Reranker model loading for sequence classification ( #32089 )
...
Signed-off-by: rickychen-infinirc <ricky.chen@infinirc.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-01-10 12:40:09 -08:00
Vadim Gimpelson
e15a5ff07b
[MISC] Add strict contiguity check for FlashInfer attention tensors ( #32008 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
Signed-off-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com >
2026-01-10 12:40:05 -08:00
Vensen
6ea001cfb7
[Bugfix][Quantization] Ensure input contiguity in per_token_quant_int8 ( #31637 )
...
Signed-off-by: vensen <vensenmu@gmail.com >
2026-01-10 12:40:02 -08:00
shyeh25
1c46dea001
Revert "[Kernels][FI] Skip trtllm attention when num_kv_heads=1 (#308… ( #31617 )
...
Signed-off-by: shyeh25 <206795756+shyeh25@users.noreply.github.com >
2026-01-10 12:39:59 -08:00
Or Ozeri
028599739d
[BugFix] scheduler: Fix resuming of preempted requests after async load ( #31583 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com >
2026-01-10 12:39:25 -08:00
gnovack
d1fd802fa3
fused_moe_kernel - cast accumulator after applying router weights ( #32002 )
...
Signed-off-by: gnovack <gnovack@amazon.com >
2026-01-11 04:36:45 +08:00
Xin Yang
543c23be78
[LoRA][Perf] Improve FusedMoE LoRA performance for small rank ( #32019 )
...
Signed-off-by: Xin Yang <xyangx@amazon.com >
2026-01-10 11:04:18 -08:00
jvlunteren
b8bf5c45bb
[Kernel] Optimize Sliding Window Attention in 3D Triton Kernel ( #31984 )
...
Signed-off-by: Jan van Lunteren <jvl@zurich.ibm.com >
2026-01-10 18:13:44 +00:00
Michael Goin
e6c6f2c79d
[Quant] Support MXFP4 W4A16 for compressed-tensors dense models ( #31926 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
2026-01-10 06:44:35 -08:00
Jeremy Teboul
07286ec5a6
[Bugfix] Fix integer overflow in Gemma3n audio processing ( #31657 )
...
Signed-off-by: Jeremy Teboul <jeremyte@meta.com >
2026-01-10 17:52:53 +08:00
Ning Xie
14fc7a68c7
[Bugfix] fix offline chat output prompt ( #32076 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2026-01-10 07:50:57 +00:00
Cyrus Leung
5f2385a4c8
[Benchmark][1/2] Generalize SLA criterion validation from binary flags to margins ( #32075 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-10 07:11:03 +00:00
Frelam
a01a1c0d69
[Bugfix] fix encoder cache leak of waiting requests in scheduler to solve stuck in CPU scheduling ( #31857 )
...
Signed-off-by: frelam <frelam112233@gmail.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2026-01-10 06:27:58 +00:00
Lucas Wilkinson
da6709c9fe
[Misc] Delay deprecation of CommonAttentionMetadata properties ( #32074 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2026-01-09 21:06:44 -08:00
Andreas Karatzas
d83becd503
[ROCm][CI] Fix flaky test_function_calling_with_stream and reduce schema test examples ( #32063 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-01-10 05:02:35 +00:00
roikoren755
0c9614876e
Update modelopt KV cache quantization resolution to new scheme ( #31895 )
...
Signed-off-by: Roi Koren <roik@nvidia.com >
2026-01-10 04:54:13 +00:00
Cyrus Leung
583a90e005
[Refactor] Separate sequence and token pooling types ( #32026 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-10 04:53:24 +00:00
maang
52d428295d
[Core] Refactor ColumnParallelLinear: remove unused parameter and optimize forward ( #31939 )
...
Signed-off-by: maang <maang_h@163.com >
2026-01-10 04:19:49 +00:00
Kevin McKay
c60578de0a
[Bugfix][Hardware][AMD] Use dynamic WARP_SIZE in sampler vectorized_process ( #31295 )
...
Signed-off-by: c0de128 <kevin.mckay@outlook.com >
2026-01-10 03:57:38 +00:00
PatrykSaffer
80fead8bf6
Fuse RoPE and MLA KV-cache write ( #25774 )
...
Signed-off-by: Patryk Saffer <patryk.saffer99@gmail.com >
Signed-off-by: PatrykSaffer <patryk.saffer@mistral.ai >
Co-authored-by: Patryk Saffer <patryk.saffer99@gmail.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-01-09 19:18:37 -08:00
Akshat Shrivastava
e45946bd91
feature/issac 0.2 ( #31550 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.io >
2026-01-10 03:18:05 +00:00
Lucas Kabela
ea6d067a2a
[Misc][LLaMa4] Compile LLaMa Vision Encoder ( #30709 )
...
Signed-off-by: Lucas Kabela <lucaskabela@meta.com >
2026-01-09 22:01:38 -05:00
Ning Xie
abd9224280
resolve pydantic error in startup benchmark ( #31348 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2026-01-10 02:41:27 +00:00
Kevin McKay
4dc0d606b7
[Bugfix] Narrow broad exceptions in compilation backends ( #31616 )
...
Signed-off-by: c0de128 <kevin.mckay@outlook.com >
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-01-09 21:39:22 -05:00
Micah Williamson
ac0675ff6b
[CI] Allow Deprecated Quantization For LM Eval Tests ( #32065 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2026-01-09 19:10:47 -07:00
Wentao Ye
e18464a57d
[Perf] Optimize async scheduling placeholder using empty ( #32056 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-01-10 00:46:11 +00:00
Russell Bryant
1963245ed1
[Core] Use weights_only=True with torch.load ( #32045 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2026-01-10 00:28:57 +00:00
Matthew Bonanni
0308901975
[2/N][Attention] Fix pre-commit errors ( #32052 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-01-10 00:27:15 +00:00
Lucas Kabela
aaf4b70aae
[Misc][BE] Type coverage for vllm/compilation [2/3] ( #31744 )
2026-01-09 18:30:38 -05:00
Nick Hill
3adffd5b90
[Misc] Enable async scheduling by default with spec decoding ( #31998 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-01-09 23:09:34 +00:00
zhrrr
97ba96fbe9
[perf][async] support non cpu sync get logprob tensors for spec ( #31336 )
...
Signed-off-by: izhuhaoran <izhuhaoran@qq.com >
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com >
2026-01-09 21:24:51 +00:00
Chendi.Xue
94578127a4
[NIXL] refine decoder side post process for heterogeneous BlockSize and kv_layout ( #30275 )
2026-01-09 21:22:19 +00:00
Matthew Bonanni
2612ba9285
[1/N][Attention] Restructure attention: move files ( #31916 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-01-09 13:10:24 -08:00
Andrew Xia
1f8b7c536b
[responsesAPI] fix incomplete_messages for simple/parsable context ( #31836 )
...
Signed-off-by: Andrew Xia <axia@fb.com >
Co-authored-by: Andrew Xia <axia@fb.com >
2026-01-09 21:00:57 +00:00
Lucas Wilkinson
0a0aa07747
[Quant] Make static quant support all group shapes ( #30833 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2026-01-09 12:49:27 -08:00
jiahanc
f9e2a75a1e
[fix] add cutedsl to global sf ( #32001 )
...
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com >
2026-01-09 12:03:02 -08:00
Runkai Tao
a4d5d663e2
Add unpermute-aware fused MoE path and small-batch fallback ( #29354 )
...
Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2026-01-09 12:58:39 -07:00
Jeremy Teboul
657e9c0e18
[Fix] Introduce audio channels spec ( #31595 )
...
Signed-off-by: Jeremy Teboul <jeremyte@meta.com >
2026-01-09 19:34:51 +00:00
Wentao Ye
308feab33f
[Perf] Optimize cutlass moe problem size calculation, 5.3% E2E Throughput improvement, 2.2% TTFT improvement ( #31830 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2026-01-09 11:13:43 -08:00
Wentao Ye
28ae32a5d3
[Refactor] Remove numpy split in async scheduling ( #32034 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-01-09 19:09:02 +00:00
Andrew Xia
f32c629eb4
[Frontend][gpt-oss] Allow system message to overwrite model identity ( #31737 )
...
Signed-off-by: lacora <hyelacora@gmail.com >
Signed-off-by: Andrew Xia <axia@fb.com >
Co-authored-by: lacora <hyelacora@gmail.com >
Co-authored-by: Andrew Xia <axia@fb.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-01-09 14:03:57 -05:00
Yifan Qiao
cd4a95e3aa
[Feat][Core] Support multiple KV cache groups in Hybrid KV Coordinator ( #31707 )
...
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu >
2026-01-09 10:53:20 -08:00
Michael Goin
d5ec6c056f
[UX] Add vLLM model inspection view ( #29450 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-01-09 10:12:35 -07:00
Shanshan Shen
08d954f036
[Doc] Add developer guide for CustomOp ( #30886 )
...
Signed-off-by: shen-shanshan <467638484@qq.com >
2026-01-09 16:21:11 +00:00
Kevin Šuc
ac9f9330e6
Rename --exclude-log-deltas to --enable-log-deltas ( #32020 )
...
Signed-off-by: Catacomba <kevinsuc16@gmail.com >
2026-01-09 15:30:40 +00:00
Isotr0py
2d0c5b630e
[Doc] Remove hardcoded Whisper in example openai translation client ( #32027 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-01-09 14:44:52 +00:00
Michael Goin
34cd32fe30
[Perf][Kernel] Fused SiLU+Mul+Quant kernel for NVFP4 cutlass_moe ( #31832 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
2026-01-09 07:40:33 -07:00
R3hankhan
8e27663b6a
[CPU] Add head sizes 80 and 112 with vec16 fallback ( #31968 )
...
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com >
2026-01-09 22:14:46 +08:00
maang
7cdf7e2fe0
[Model] Remove redundant None check in DeepSeekOCR image input processing ( #32016 )
...
Signed-off-by: maang <maang_h@163.com >
2026-01-09 06:12:44 -08:00
Adolfo Victoria
bbf80ede43
Fix type error ( #31999 )
...
Signed-off-by: Adolfo Victoria <adolfokarim@gmail.com >
Co-authored-by: Adolfo Victoria <adovi@meta.com >
2026-01-09 22:03:32 +08:00
inkcherry
4505849b30
[ROCm][PD] add moriio kv connector. ( #29304 )
...
Signed-off-by: inkcherry <mingzhi.liu@amd.com >
2026-01-09 14:01:57 +00:00
Roger Wang
db07433ce5
[Misc] Skip hashing kwargs if value is None ( #32025 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2026-01-09 13:20:59 +00:00
Andreas Karatzas
e02706d2d2
[ROCm][CI][V1] Fix nixl_connector test failure and achieve CUDA parity in test_async_scheduling ( #32000 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-01-09 20:48:32 +08:00
Sophie du Couédic
b474782ad7
[Feature][Benchmarks] Custom dataset: read output length from dataset ( #31881 )
...
Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com >
2026-01-09 12:40:59 +00:00
Bofeng Xue
55212c1404
fix: remove duplicate engine_id check in nixl_connector ( #31948 )
...
Signed-off-by: Bofeng BF1 Xue <xuebf1@Lenovo.com >
Co-authored-by: Bofeng BF1 Xue <xuebf1@Lenovo.com >
2026-01-09 12:13:17 +00:00
Xin Yang
e7b68f4d6c
[Bugfix] Fix Triton FusedMoE LoRA ( #30585 )
...
Signed-off-by: Xin Yang <xyangx@amazon.com >
2026-01-09 11:46:59 +00:00
vllmellm
1a19e9cd87
[Bugfix][ROCm]Fix Qwen3-Next-80B-A3B-Thinking inference and optimize non-standard block size (544) support under rocm_atten ( #31380 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2026-01-09 19:28:02 +08:00
Cyrus Leung
c8ed39b9dd
[Model] Reorganize pooling layers ( #31973 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-09 11:02:14 +00:00
Andreas Karatzas
020732800c
[Bugfix] Fix OpenAPI schema test failures ( #31921 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-01-09 10:56:20 +00:00
Alex Brooks
dc77cb7129
[Bugfix] Fix Var Length Batched Padding in Granite Speech ( #31906 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
2026-01-09 10:28:43 +00:00
gnovack
bde38c11df
fix lora moe sharding when rank < max_lora_rank ( #31994 )
...
Signed-off-by: gnovack <gnovack@amazon.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2026-01-09 14:43:25 +08:00
Xin Yang
707b240d7e
[Bugfix] Fix FusedMoE LoRA w2_output_size ( #31949 )
...
Signed-off-by: Xin Yang <xyangx@amazon.com >
2026-01-09 00:54:05 -05:00
Nick Hill
29ce48221c
[Cleanup] Remove obsolete spec decoding compatibility logic ( #32003 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-01-09 05:44:18 +00:00
TJian
7a05d2dc65
[CI] [ROCm] Fix tests/entrypoints/test_grpc_server.py on ROCm ( #31970 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2026-01-09 12:54:20 +08:00
Divakar Verma
a1648c4045
[ROCm][CI] Fix test_token_classification.py::test_bert_models ( #31993 )
...
Signed-off-by: Divakar Verma <divakar.verma@amd.com >
2026-01-09 04:04:33 +00:00
RioS
e2d49ec2a4
[Bugfix] missing tokens occur in harmony streaming ( #30437 )
...
Signed-off-by: RioS <aa248424@gmail.com >
Signed-off-by: Ri0S <aa248424@gmail.com >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
2026-01-09 03:59:34 +00:00
Xin Yang
8413868dab
[Bugfix] Fix typo in FusedMoE LoRA reshape comment ( #31992 )
...
Signed-off-by: Xin Yang <xyangx@amazon.com >
2026-01-08 18:46:05 -08:00
zhrrr
8ff4a99566
[Async][Feat] support apply penalty or bad_words for async + spec ( #30495 )
...
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com >
Signed-off-by: izhuhaoran <izhuhaoran@qq.com >
Signed-off-by: Nick Hill <nickhill123@gmail.com >
Co-authored-by: Nick Hill <nickhill123@gmail.com >
2026-01-09 02:31:50 +00:00
daniel-salib
a4ec0c5595
[Frontend] Add MCP tool streaming support to Responses API ( #31761 )
...
Signed-off-by: Daniel Salib <danielsalib@meta.com >
2026-01-09 09:19:34 +08:00
Robert Shaw
0fa8dd24d2
[Bugfix] Fix Typo from NVFP4 Refactor ( #31977 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2026-01-08 16:18:50 -08:00
Max Hu
6ebe34d6fa
[Feature] Add iteration level logging and enhance nvtx marker ( #31193 )
...
Signed-off-by: Max Hu <maxhu@nvidia.com >
Signed-off-by: Max Hu <hyoung2991@gmail.com >
Co-authored-by: Max Hu <maxhu@nvidia.com >
2026-01-09 00:13:39 +00:00
Nick Hill
11cec296dd
[BugFix] Add spec-decode-incompatible request param validation ( #31982 )
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-01-09 00:08:21 +00:00
Robert Shaw
5825bbc1f7
[Quantization] Deprecate Long Tail of Schemes ( #31688 )
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2026-01-08 19:07:45 -05:00
Yongye Zhu
d62cfe546d
[MoE Refactoring][Bugfix]Wrap WNA16 Triton kernel into mk and change compressed tensor kernel selection ( #31752 )
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2026-01-08 19:01:30 -05:00
Lucas Wilkinson
6cdf015c3c
[Misc] Fix Current vLLM config is not set. warnings, assert to avoid issues in the future ( #31747 )
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2026-01-08 15:20:49 -08:00
Dipika Sikka
5d3b6097ad
[Compressed-Tensors] Simplify NVFP4 Conditions, enable marlin support for NVFP4A16 MoEs ( #30881 )
2026-01-08 17:45:17 -05:00
bnellnm
e74698c27a
[Misc][Refactor] Add FusedMoERouter object ( #30519 )
Signed-off-by: Bill Nell <bnell@redhat.com >
2026-01-08 20:52:55 +00:00
Cyrus Leung
aa125ecf0e
[Frontend] Improve error message ( #31987 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-08 20:07:03 +00:00
Lucas Kabela
f16bfbe5bc
[Documentation][torch.compile] Add documentation for torch.compile + multimodal encoders ( #31627 )
Signed-off-by: Lucas Kabela <lucaskabela@meta.com >
2026-01-08 14:33:24 -05:00
Michael Goin
87e07a6b46
Revert "feat(moe): Add is_act_and_mul=False support for Triton MoE kernels" ( #31978 )
2026-01-08 11:31:53 -08:00
Woosuk Kwon
7508243249
[Model Runner V2] Simplify BlockTables with UVA ( #31965 )
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2026-01-08 10:24:26 -08:00
Nicolò Lucchesi
83e1c76dbe
[CI][ROCm] Fix NIXL tests on ROCm ( #31728 )
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-01-09 01:34:43 +08:00
Nishidha Panpaliya
a563866b48
Fix ijson build for Power. ( #31702 )
Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com >
2026-01-08 17:12:33 +00:00
Nick Hill
a3d909ad2b
[Misc] Tidy up some spec decode logic in GPUModelRunner ( #31591 )
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-01-08 09:10:07 -08:00
Jee Jee Li
49568d5cf9
[Doc] Improve MM models LoRA notes ( #31979 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2026-01-08 08:55:22 -08:00
danisereb
b8112c1d85
[Bugfix] Fix vllm serve failure with Nemotron Nano V3 FP8 ( #31960 )
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com >
2026-01-08 16:08:37 +00:00
Chauncey
eaba8ece77
[Bugfix]: Fix Step3ReasoningParser missing is_reasoning_end_streaming ( #31969 )
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-01-08 15:28:13 +00:00
yxing-bj
fe86be66c5
[Model] Support IQuestCoder model ( #31575 )
Signed-off-by: yxing <yxing@iquestlab.com >
2026-01-08 14:42:57 +00:00
Chauncey
1da3a5441a
[Docs]: update claude code url ( #31971 )
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-01-08 14:04:55 +00:00
TJian
72c068b8e0
[CI] [Bugfix] Fix unbounded variable in run-multi-node-test.sh ( #31967 )
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2026-01-08 05:42:01 -08:00
Mary
7645bc524b
[OpenAI] Fix tool_choice=required streaming when output has trailing extra data ( #31610 )
Signed-off-by: maylikenoother <ogedengbemary19@gmail.com >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
2026-01-08 21:01:42 +08:00
Ce Zhao
1123a87892
[Model] Enable LoRA support for Pixtral ( #31724 )
Signed-off-by: <>
Signed-off-by: 赵策 <alcor@zhaocedeMacBook-Air.local >
Signed-off-by: 赵策 <alcor@mac.mynetworksettings.com >
Co-authored-by: 赵策 <alcor@mac.mynetworksettings.com >
2026-01-08 05:00:57 -08:00
tianshu-Michael-yu
03fd76c570
[Model] Add LFM2-VL model support ( #31758 )
Signed-off-by: Tianshu Yu <tianshuyu.formal@gmail.com >
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2026-01-08 05:00:27 -08:00
Bijaya Dangol
59d260f5e4
[Model] Add Grok-2 ( #31847 )
Signed-off-by: dangoldbj <dangoldbj23@gmail.com >
2026-01-08 04:59:48 -08:00
Patrick von Platen
18d4e481d0
[Voxtral] Fix speech transcription api ( #31388 )
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: bk-201 <joy25810@foxmail.com >
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
Signed-off-by: prashanth058 <prashanth.dannamaneni@uipath.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
Co-authored-by: bk-201 <joy25810@foxmail.com >
Co-authored-by: prashanth058 <prashanth.dannamaneni@uipath.com >
Co-authored-by: Anexdeus <5142168@mail.ru >
Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com >
2026-01-08 18:34:19 +08:00
Isotr0py
2972a05473
[MM Encoder]: Make MMEncoderAttention's scale takes effect properly ( #31950 )
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-01-08 02:33:48 -08:00
Cyrus Leung
5576227bc1
[Model] Standardize common vision encoders ( #31947 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-08 02:33:16 -08:00
Cyrus Leung
d1b6fe007f
[Chore] Further cleanup pooler ( #31951 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-08 02:16:21 -08:00
omer-dayan
04a49669d1
RayLLM Bugfix - Preserve obj store URL for multi engine_config creation ( #30803 )
Signed-off-by: Omer Dayan <omdayan@nvidia.com >
Signed-off-by: Isotr0py <2037008807@qq.com >
Co-authored-by: Isotr0py <2037008807@qq.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-01-08 10:00:25 +00:00
BingjiaWang
96fcd3c267
[Misc] Support qwen3-next lora ( #31719 )
2026-01-08 09:27:50 +00:00
DevByteAI
1f214290d6
fix(compile): apply partition wrapper when loading AOT cached functions ( #31536 )
Signed-off-by: Devbyteai <abud6673@gmail.com >
Signed-off-by: DevByteAI <161969603+devbyteai@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2026-01-08 17:27:26 +08:00
Ryan Rock
8cbdc7eb94
[CI/Build] Enable test_kv_cache_events_dp for AMD ( #31834 )
Signed-off-by: Ryan Rock <ryan.rock@amd.com >
2026-01-08 09:00:24 +00:00
Lumosis
b634e619bb
Decouple page_size_bytes calculation in AttentionSpec for TPU/RPA Compatibility. ( #31635 )
Signed-off-by: Lihao Ran <imlihao.ran@gmail.com >
Signed-off-by: Lumosis <30372757+Lumosis@users.noreply.github.com >
2026-01-08 09:00:07 +00:00
Isotr0py
eac3b96ec0
[Models] Allow converting Qwen3-VL into Reranker model ( #31890 )
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-01-08 08:10:15 +00:00
Zhiwei
573a1d1119
[ROCm]Skip test_torchao.py::test_pre_quantized_model on CDNA3 arch ( #31905 )
Signed-off-by: ZhiweiYan-96 <zhiwei.yan@amd.com >
2026-01-08 15:47:44 +08:00
Shang Wang
33156f56e0
[docker] A follow-up patch to fix #30913 : [docker] install cuda13 version of lmcache and nixl ( #31775 )
Signed-off-by: Shang Wang <shangw@nvidia.com >
2026-01-07 23:47:02 -08:00
Rabi Mishra
107cf8e92f
fix(rocm): Add get_supported_kernel_block_sizes() to ROCM_ATTN ( #31712 )
Signed-off-by: rabi <ramishra@redhat.com >
2026-01-08 15:46:07 +08:00
Zyyeric
63baa28cf5
[Model] Enable LoRA support for tower and connector in GLM4-V ( #31652 )
Signed-off-by: Zyyeric <eric1976808123@gmail.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2026-01-08 15:45:53 +08:00
Andy Liu
e5173d3bac
[Bugfix] Remove the num_hidden_layers override for glm4_moe ( #31745 )
2026-01-08 15:45:10 +08:00
prashanth058
d3235cb503
[Fix] Enable mm_processor_cache with vision LoRA ( #31927 )
Signed-off-by: prashanth058 <prashanth.dannamaneni@uipath.com >
2026-01-08 15:31:51 +08:00
Nick Hill
287b37cda4
[BugFix] Fix spec decoding edge case bugs ( #31944 )
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-01-08 15:31:03 +08:00
Chang Su
791b2fc30a
[grpc] Support gRPC server entrypoint ( #30190 )
Signed-off-by: Chang Su <chang.s.su@oracle.com >
Signed-off-by: njhill <nickhill123@gmail.com >
Signed-off-by: Nick Hill <nickhill123@gmail.com >
Co-authored-by: njhill <nickhill123@gmail.com >
Co-authored-by: Simon Mo <simon.mo@hey.com >
2026-01-07 23:24:46 -08:00
Lucas Wilkinson
be6a81f31b
[chore] Update FA commit ( #30460 )
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2026-01-07 23:24:18 -08:00
Ronald
2ab441befe
[platform] add dp_metadata arg to set_additional_forward_context ( #31942 )
Signed-off-by: Ronald1995 <ronaldautomobile@163.com >
2026-01-08 06:56:44 +00:00
ShaanveerS
9572f74f15
[Model] Enable LoRA support for tower and connector in DotsOCR ( #31825 )
Signed-off-by: ShaanveerS <shaanver.singh@gmail.com >
2026-01-08 14:50:16 +08:00
Andreas Karatzas
5f2a473ff3
[ROCm][CI] v1 cpu offloading attention backend fix ( #31833 )
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-01-08 14:37:50 +08:00
Michael Goin
6b2a672e47
[Doc] Add Claude code usage example ( #31188 )
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-01-08 13:50:23 +08:00
rasmith
f1b1bea5c3
[CI][BugFix][AMD] Actually skip tests marked @pytest.mark.skip_v1 ( #31873 )
Signed-off-by: Randall Smith <ransmith@amd.com >
Co-authored-by: Randall Smith <ransmith@amd.com >
2026-01-08 13:06:09 +08:00
Charlie Fu
cddbc2b4b2
[ROCm][CI] Add rocm support for run-multi-node-test.sh ( #31922 )
Signed-off-by: charlifu <charlifu@amd.com >
Signed-off-by: Charlie Fu <Charlie.Fu@amd.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-08 04:36:39 +00:00
Andreas Karatzas
087a138963
[ROCm][CI] Fix attention backend test flakiness from uninitialized KV cache memory ( #31928 )
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-01-08 04:35:25 +00:00
Andreas Karatzas
c4041f37a4
[ROCm][LoRA] Fix MoE accuracy regression by preserving float32 router weight scaling ( #31931 )
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-01-08 04:17:56 +00:00
Richard Zou
a79079feef
[BugFix] Fix flakiness in test_eagle_dp for PyTorch 2.10 ( #31915 )
Signed-off-by: Richard Zou <zou3519@gmail.com >
2026-01-08 04:04:58 +00:00
Robert Shaw
9f6dcb71ae
[MoE Refactor][16/N] Apply Refactor to NVFP4 ( #31692 )
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Pavani Majety <pmajety@nvidia.com >
2026-01-08 03:46:27 +00:00
Andreas Karatzas
8dd2419fa9
[CI] Skip Qwen-VL in multimodal processing tests due to flaky external dependency ( #31932 )
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-01-08 02:58:01 +00:00
Rabi Mishra
39d82005f7
fix(rocm): add early return in get_flash_attn_version for ROCm ( #31286 )
Signed-off-by: rabi <ramishra@redhat.com >
2026-01-08 10:28:07 +08:00
Rabi Mishra
25eef3dc2e
feat(moe): Add is_act_and_mul=False support for Triton MoE kernels ( #31645 )
Signed-off-by: rabi <ramishra@redhat.com >
2026-01-08 10:27:09 +08:00
Matthew Bonanni
0d7667419f
[0/N][Attention] Fix miscellaneous pre-commit issues ( #31924 )
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-01-08 01:15:17 +00:00
Robert Shaw
5dcd7ef1f2
[MoE Refactor][15/N] Apply Refactor to Fp8 ( #31415 )
2026-01-07 19:42:33 -05:00
Elvir Crnčević
ffc0a2798b
Add back missing DeepEP LL params ( #31911 )
Signed-off-by: Elvir Crncevic <elvircrn@gmail.com >
2026-01-07 17:47:54 -05:00
Nick Hill
10ef65eded
[BugFix] Fix bad words with speculative decoding ( #31908 )
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-01-07 15:46:42 -05:00
Ilya Markov
6170d47d22
[EPLB] Optimize EPLB with numpy ( #29499 )
Signed-off-by: ilmarkov <markovilya197@gmail.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
2026-01-07 15:21:35 -05:00
Xin Yang
0ada960a20
[Kernel] Support bias type in grouped_topk kernel ( #31781 )
Signed-off-by: Xin Yang <xyangx@amazon.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-01-07 12:16:32 -08:00
Ning Xie
c907d22158
[refactor] refactor memory constants usage ( #31865 )
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2026-01-07 18:37:31 +00:00
Michael Goin
f347ac6c34
[Perf] Fuse stride preparation for NVFP4 cutlass_moe ( #31837 )
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-01-07 13:31:26 -05:00
Festus Ayobami Owumi
05f47bd8d2
[Doc] Fix: Correct vLLM announcing blog post link in docs ( #31868 )
Signed-off-by: enfinity <festusowumi@gmail.com >
2026-01-07 10:06:42 -08:00
roikoren755
bf184a6621
Enable quantized attention in NemotronH models ( #31898 )
Signed-off-by: Roi Koren <roik@nvidia.com >
2026-01-07 17:37:19 +00:00
Jee Jee Li
30399cc725
UX: add vLLM env info in '/server_info' ( #31899 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2026-01-07 17:13:02 +00:00
Kfir Toledo
b89443b8d9
[KVConnector]: Enable Cross-layers KV cache layout for MultiConnector ( #30761 )
Signed-off-by: Kfir Toledo <kfir.toledo@ibm.com >
2026-01-07 16:59:43 +00:00
Marko Rosenmueller
1d9e9ae8a4
[Bugfix]: prevent leaking tokens in crash log ( #30751 )
Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com >
2026-01-07 16:15:19 +00:00
Cyrus Leung
b7036c87a1
[Refactor] Clean up pooler modules ( #31897 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-08 00:07:43 +08:00
Kate Cheng
cc6dafaef2
[Perf][Kernels] Enable FlashInfer DeepGEMM swapAB on SM90 (for W8A8 Linear Op) ( #29213 )
Signed-off-by: Kate Cheng <yunhsuanc@nvidia.com >
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com >
Co-authored-by: Jhao-Ting Chen <jhaotingc@nvidia.com >
2026-01-07 10:53:54 -05:00
R3hankhan
1ab055efe6
[OpenAI] Extend VLLMValidationError to additional validation parameters ( #31870 )
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com >
2026-01-07 14:45:49 +00:00
Cyrus Leung
b665bbc2d4
[Chore] Migrate V0 attention utils ( #31891 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-07 13:44:36 +00:00
Jared Wen
974138751b
[Refactor] GLM-ASR Modeling ( #31779 )
Signed-off-by: JaredforReal <w13431838023@gmail.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-01-07 13:08:29 +00:00
vllmellm
41cfa50632
[ROCm][AITER] fix wrong argument passed to AITER flash_attn_varlen_func ( #31880 )
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2026-01-07 11:25:03 +00:00
Andy Liu
d111bc53ad
[Bugfix][MTP] Fix GLM4 MoE fp8 loading with MTP on ( #31757 )
Signed-off-by: Andy Liu <andyliu@roblox.com >
2026-01-07 09:18:52 +00:00
BlankR
0790f07695
[Misc] Improve error messages for unsupported types and parameters ( #30593 )
Signed-off-by: BlankR <hjyblanche@gmail.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2026-01-07 09:00:16 +00:00
maang
1f33e38e81
[Model] Cleanup: Remove redundant manual definition of make_empty_intermediate_tensors in GLM-4-MoE ( #31869 )
Signed-off-by: maang <maang_h@163.com >
2026-01-07 08:18:28 +00:00
sihao_li
59fe6f298e
[XPU]fallback to TRITON_ATTN on xpu when use float32 dtype ( #31762 )
Signed-off-by: sihao.li <sihao.li@intel.com >
2026-01-07 08:10:29 +00:00
weiyu
e7596371a4
[Refactor][TPU] Remove torch_xla path and use tpu-inference ( #30808 )
Signed-off-by: Wei-Yu Lin <weiyulin@google.com >
Signed-off-by: weiyu <62784299+weiyu0824@users.noreply.github.com >
2026-01-07 16:07:16 +08:00
xuebwang-amd
0dd5dee9b9
[Bugfix][Kernel] fix bias adding in triton kernel implemented fused moe ( #31676 )
Signed-off-by: xuebwang-amd <xuebwang@amd.com >
2026-01-07 07:36:13 +00:00
Kevin McKay
4614c5a539
[Bugfix][Hardware][AMD] Consolidate FP8 min/max values helper function ( #31106 )
Signed-off-by: c0de128 <kevin.mckay@outlook.com >
Signed-off-by: Kevin McKay <kevin@example.com >
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com >
2026-01-07 06:55:03 +00:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟
482914849c
[BugFix] LoRA: Support loading base_layer of experts ( #31104 )
Signed-off-by: Hollow Man <hollowman@opensuse.org >
2026-01-07 14:49:39 +08:00
tianshu-Michael-yu
efeaac92f2
[Bugfix] Fix race condition in async-scheduling for vlm model ( #31841 )
Signed-off-by: Tianshu Yu <tianshuyu.formal@gmail.com >
2026-01-07 06:45:10 +00:00
tjp_zju
55caa6051d
refactor: find_loaded_library ( #31866 )
Signed-off-by: tjp_zju <tanjianpingzju1990@gmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2026-01-07 06:42:20 +00:00
Lucas Wilkinson
c7a79d41a0
[Attention][3/n] Remove usage of deprecated seq_lens_cpu and num_computed_tokens_cpu CommonAttentionMetadata properties ( #31850 )
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2026-01-07 13:31:34 +08:00
vllmellm
6409004b26
[ROCm][AITER] bugfix accuracy regression in ROCM_AITER_TRITON_MLA backend ( #31816 )
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2026-01-07 05:04:53 +00:00
Cyrus Leung
aafd4d2354
[Chore] Try remove init_cached_hf_modules ( #31786 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-07 12:34:04 +08:00
Jack Yang
0a2c2dc3f1
fixed mypy warnings for files vllm/v1/attention with TEMPORARY workaround ( #31465 )
Signed-off-by: Zhuohao Yang <zy242@cornell.edu >
Co-authored-by: Zhuohao Yang <zy242@cornell.edu >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2026-01-07 04:08:47 +00:00
Tyler Michael Smith
f09c5feb7c
Change warning in get_current_vllm_config to report caller's line number ( #31855 )
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
2026-01-07 03:48:13 +00:00
Cyrus Leung
1b8af957f6
[Doc] Update release docs ( #31799 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-07 03:27:40 +00:00
Ce Zhao
a051525e07
[Model] Enable LoRA support for PaliGemma ( #31656 )
Signed-off-by: 赵策 <alcor@mac.mynetworksettings.com >
Signed-off-by: Alcor <alcor_zhao@outlook.com >
Co-authored-by: 赵策 <alcor@mac.mynetworksettings.com >
2026-01-07 10:09:32 +08:00
Yihua Cheng
5b833be49e
[1/2][lmcache connector] clean up lmcache multi-process adapter ( #31838 )
Signed-off-by: ApostaC <yihua98@uchicago.edu >
2026-01-07 02:02:42 +00:00
Lucas Kabela
873480d133
[Misc][BE] Type coverage for vllm/compilation [1/3] ( #31554 )
Signed-off-by: Lucas Kabela <lucaskabela@meta.com >
2026-01-06 20:37:51 -05:00
vSeamar
6f351548b2
[Frontend] Implement robust video frame recovery for corrupted videos ( #29197 )
Signed-off-by: cmartinez <cmartinez@roblox.com >
Signed-off-by: vSeamar <cmartinez@roblox.com >
2026-01-07 01:13:24 +00:00
Andreas Karatzas
364a8bc6dc
[ROCm][CI] Fix plugin tests (2 GPUs) failures on ROCm and removing VLLM_FLOAT32_MATMUL_PRECISION from all ROCm tests ( #31829 )
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-01-07 01:12:23 +00:00
Angela Yi
9a1d20a89c
[CI] Add warmup run in test_fusion_attn ( #31183 )
Signed-off-by: angelayi <yiangela7@gmail.com >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2026-01-07 00:31:52 +00:00
Cyrus Leung
309a8f66ee
[Bugfix] Handle mistral tokenizer in get_hf_processor ( #31817 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-07 07:46:56 +08:00
Andreas Karatzas
e5d427e93a
[ROCm][CI] Pinning timm lib version to fix ImportError in Multi-Modal Tests (Nemotron) ( #31835 )
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-01-06 23:23:11 +00:00
Andreas Karatzas
2a42ae790d
[ROCm][CI] Fix ModernBERT token classification test numerical accuracy on ROCm ( #31820 )
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-01-06 23:21:15 +00:00
Matthew Bonanni
d49899732e
[Spec Decode][UX] Add acceptance stats to vllm bench serve report ( #31739 )
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com >
Co-authored-by: Benjamin Chislett <bchislett@nvidia.com >
2026-01-06 21:21:42 +00:00
Elvir Crnčević
dba95378a6
Report error log after vllm bench serve ( #31808 )
Signed-off-by: Elvir Crncevic <elvircrn@gmail.com >
2026-01-06 20:24:19 +00:00
Nikhil G
ada6f91d56
Fix RecursionError in MediaWithBytes unpickling ( #31191 )
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com >
2026-01-06 20:11:26 +00:00
Li, Jiang
8becf146bd
[Quantization][Refactor] Move CPU GPTQ kernel into MP linear ( #31801 )
Signed-off-by: jiang1.li <jiang1.li@intel.com >
Signed-off-by: Li, Jiang <bigpyj64@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-06 19:10:18 +00:00
Charlie Fu
c07163663d
[ROCm][CI] Fix tests/compile unit tests ( #28895 )
Signed-off-by: charlifu <charlifu@amd.com >
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
Signed-off-by: Charlie Fu <Charlie.Fu@amd.com >
Co-authored-by: Micah Williamson <micah.williamson@amd.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2026-01-06 18:50:43 +00:00
Benjamin Chislett
f7008ce1c4
[Perf] Async Scheduling + Speculative Decoding + Structured Outputs ( #29821 )
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
Signed-off-by: Nick Hill <nickhill123@gmail.com >
Co-authored-by: Nick Hill <nickhill123@gmail.com >
2026-01-06 18:50:37 +00:00
Yakine Tahtah
4e67a8f616
[Bugfix] Fix GLM-4 MoE router logits dtype for data parallel chunking ( #31055 )
Signed-off-by: ReinforcedKnowledge <reinforced.knowledge@gmail.com >
2026-01-06 17:57:56 +00:00
Masataro Asai
142c4d1738
make 500: InternalServerError more informative ( #20610 )
Signed-off-by: Masataro Asai <guicho2.71828@gmail.com >
2026-01-06 17:36:24 +00:00
Ning Xie
6f5e653383
[Log] add log about gpu worker init snapshot and requested memory ( #29493 )
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2026-01-06 17:32:55 +00:00
Vadim Gimpelson
22dffca982
[PERF] Speed-up of GDN attention decode part (Qwen3-Next) ( #31722 )
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2026-01-06 17:32:46 +00:00
Lucas Wilkinson
4c73be14e0
[Attention][2/n] Remove usage of deprecated seq_lens_cpu and num_computed_tokens_cpu CommonAttentionMetadata properties ( #31774 )
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2026-01-06 17:32:14 +00:00
Jinzhen Lin
2f4bdee61e
[Quantization][MoE] remove unused ep logic from moe marlin ( #31571 )
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-01-06 09:07:19 -08:00
roikoren755
28c94770ad
[NemotronH] Use ReplicatedLinear for fc1_latent_proj ( #31807 )
Signed-off-by: Roi Koren <roik@nvidia.com >
2026-01-06 16:00:40 +00:00
Robert Shaw
af8fd73051
[MoE Refactor][14/N] Clean Up FI Quant Config Smuggling ( #31593 )
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2026-01-06 15:47:04 +00:00
Robert Shaw
d3e477c013
[MoE Refactor] Add Temporary Integration Tests - H100/B200 ( #31759 )
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2026-01-06 10:34:17 -05:00
Isotr0py
02809af1e7
[Bugfix]: Fix cross attention backend selection for Turing GPU ( #31806 )
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-01-06 23:15:56 +08:00
Jee Jee Li
cbd4690a03
[LoRA]Disable linear LoRA kernel PDL ( #31777 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2026-01-06 23:12:25 +08:00
wang.yuqi
96860af655
[Model] rename use_pad_token to use_sep_token ( #31784 )
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-01-06 14:16:04 +00:00
Chauncey
0202971a48
[Frontend] Support GLM-4.5 / GLM-4.7 with enable_thinking: false ( #31788 )
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-01-06 13:53:21 +00:00
Jzz1943
2c1a4f2488
[Bugfix]: avoid overriding audio/text kwargs (Qwen3-Omni) ( #31790 )
Signed-off-by: Zhongze Jiang <jiangzhongze.jzz@ant-intl.com >
2026-01-06 12:59:17 +00:00
Cyrus Leung
6444824873
[Misc] Implement TokenizerLike.convert_tokens_to_ids ( #31796 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-06 12:08:22 +00:00
kzwrime
bf0f3a4638
[Bugfix] Fix torch.compile error for DP + MoE on CPU Backend ( #31650 )
Signed-off-by: kunzh <zhikun.wu@outlook.com >
2026-01-06 12:06:20 +00:00
Lucas Wilkinson
e0327c9db2
[Attention][1/n] Remove usage of deprecated seq_lens_cpu and num_computed_tokens_cpu CommonAttentionMetadata properties ( #31773 )
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2026-01-06 04:05:17 -08:00
Cyrus Leung
14df02b4e1
[Chore] Cleanup mem_utils.py ( #31793 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-06 19:55:59 +08:00
BlankR
6ebb66ccea
[Doc] Fix format of multimodal_inputs.md ( #31800 )
Signed-off-by: BlankR <hjyblanche@gmail.com >
2026-01-06 03:30:24 -08:00
wang.yuqi
43d384bab4
[CI] Increase the MTEB_EMBED_TOL threshold to 5e-4. ( #31797 )
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-01-06 19:30:05 +08:00
Cyrus Leung
db318326a5
[Misc] Use deprecated for seed_everything ( #31780 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-06 11:29:55 +00:00
Fadi Arafeh
799b5721f6
[cpu][bench] Add CPU paged attention benchmarks ( #31720 )
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
2026-01-06 10:57:57 +00:00
Cyrus Leung
97ca4c3b60
[Chore] Remove more V0 dead code from sequence.py ( #31783 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-06 10:25:14 +00:00
Isotr0py
ee2e69d6cd
[Bugfix][CI/Build] Fix failing pooling models test due to Triton kernel accuracy diff ( #31776 )
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-01-06 00:44:22 -08:00
Isotr0py
7101e0851f
[Models]: Use MMEncoderAttention for MoonViT ( #31738 )
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: Roger Wang <hey@rogerw.io >
Signed-off-by: h100 <h100@inferact.ai >
Co-authored-by: Roger Wang <hey@rogerw.io >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: h100 <h100@inferact.ai >
2026-01-06 08:00:25 +00:00
vllmellm
e9717801bd
[Bugfix][ROCm] Fix Unsupported attention metadata type for speculative decoding in eagle.py ( #31714 )
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2026-01-06 07:53:22 +00:00
Cyrus Leung
da71d44410
[Doc] Show that use_audio_in_video is supported in docs ( #30837 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-05 23:27:19 -08:00
Kevin McKay
1fb0209bbc
[Bugfix][Hardware][AMD] Fix exception types in AITER MLA FP8 check ( #31177 )
Signed-off-by: c0de128 <kevin.mckay@outlook.com >
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com >
2026-01-06 14:10:59 +08:00
Robert Shaw
81323ea221
[CI] Fix CPU MM PRocessor Test ( #31764 )
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2026-01-06 04:22:18 +00:00
Michael Goin
e1cd7a5faf
[Bugfix] Add init_workspace_manager to moe kernel benchmarks ( #31042 )
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-01-05 19:14:33 -08:00
Michael Goin
a68e703c32
[UX] Add -ep shorthand for --enable-expert-parallel ( #30890 )
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-01-05 19:13:36 -08:00
maang
cd1245a184
[Cleanup] Remove redundant decoder_layer_type assignment in Qwen2 ( #31760 )
Signed-off-by: maang <maang_h@163.com >
2026-01-05 18:09:18 -08:00
Wentao Ye
ffec815422
[Perf] Optimize additional fill(0) in cutlass moe, 2.9% E2E throughput improvement, 10.8% TTFT improvement ( #31754 )
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-01-05 18:01:13 -08:00
maang
d386ab1412
[Docs] Improve malformed exception caused by backslash line continuations ( #31694 )
Signed-off-by: maang <maang_h@163.com >
Signed-off-by: maang <55082429+maang-h@users.noreply.github.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2026-01-05 17:51:54 -08:00
Michael Goin
ccb309a964
Revert "[CI Failure] Disable B200 tests while runner is broken" ( #31750 )
Signed-off-by: Michael Goin <mgoin64@gmail.com >
2026-01-05 17:26:33 -08:00
John Calderon
2f4e6548ef
[Bugfix] vLLM produces invalid UTF-8 tokens and “�” ( #28874 )
Signed-off-by: John Calderon <jcalderon@nvidia.com >
Co-authored-by: Benjamin Chislett <bchislett@nvidia.com >
2026-01-06 00:23:00 +00:00
Seiji Eicher
3c98c2d21b
[CI/Build] Allow user to configure NVSHMEM version via ENV or command line ( #30732 )
Signed-off-by: Seiji Eicher <seiji@anyscale.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-01-05 15:56:08 -08:00
Michael Goin
9513029898
[Bugfix] Properly apply v_scale for mimo_v2_flash ( #31175 )
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-01-05 23:20:46 +00:00
Robert Shaw
f6c0009afa
[Bugfix] Fix Broken ModelOpt NVFP4 MoE ( #31742 )
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2026-01-05 23:18:38 +00:00
Yongye Zhu
776ca1e187
[MoE Refactor] Aiter Experts for BF16 MoE ( #31542 )
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-01-05 14:52:59 -08:00
Wentao Ye
af9a7ec255
[Bug] Revert torch warning fix ( #31585 )
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-01-05 22:31:21 +00:00
Matthew Bonanni
276e03b92c
[CI][DeepSeek] Add nightly DeepSeek R1 lm_eval tests on H200 ( #30356 )
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-01-05 17:17:59 -05:00
Nick Hill
32f4e4db00
[Cleanup] Remove deprecated fields from CachedRequestData class ( #31734 )
Signed-off-by: njhill <nickhill123@gmail.com >
2026-01-05 21:07:14 +00:00
amitz-nv
ee21291825
[Model] Nemotron Parse 1.1 Support ( #30864 )
Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-01-05 13:00:14 -08:00
Qidong Su
af1b07b0c5
[docker] install cuda13 version of lmcache and nixl ( #30913 )
Signed-off-by: Qidong Su <soodoshll@gmail.com >
2026-01-05 12:50:39 -08:00
gnovack
c77a993cc2
pin lora_b moe weights on cpu ( #31317 )
Signed-off-by: gnovack <gnovack@amazon.com >
2026-01-05 12:15:40 -08:00
Roberto L. Castro
fdcc5176be
[BugFix] Fix architecture flags to prevent issues on SM103 ( #31150 )
Signed-off-by: LopezCastroRoberto <robertol.c510@gmail.com >
2026-01-05 20:11:35 +00:00
Wang Kunpeng
5708297e4e
[Misc][Model][Refactor] Pass the prefix into Linear layers ( #31669 )
Signed-off-by: Wang Kunpeng <1289706727@qq.com >
2026-01-05 20:03:18 +00:00
baonudesifeizhai
02dbb933cb
Fix GLM-4.6v flash tool calling in transformers 5.x ( #31622 )
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com >
2026-01-05 11:32:43 -08:00
Isotr0py
51e38a8e30
[Misc] Enable Paligemma's PrefixLM attention mask computation ( #31725 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-01-06 03:31:49 +08:00
Or Ozeri
d8e38d4939
Triton Attention: Support cross-layers blocks ( #30687 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com >
2026-01-05 19:29:16 +00:00
kzwrime
21156ff199
[Bugfix] Add missing extra_tensors arg to DeviceCommunicatorBase.disp… ( #31644 )
...
Signed-off-by: kunzh <zhikun.wu@outlook.com >
2026-01-06 01:26:09 +08:00
RickyChen / 陳昭儒
c455b771fd
[Bugfix][CPU] Fix RotaryEmbedding fallback causing gibberish with --enforce-eager ( #31643 )
...
Signed-off-by: rickychen-infinirc <ricky.chen@infinirc.com >
2026-01-06 01:25:38 +08:00
Michael Goin
eefa713a66
[CI Failure] Disable B200 tests while runner is broken ( #31732 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-01-05 08:50:51 -08:00
Kevin Šuc
79ed460dd5
[Frontend] [Doc] Exclude log deltas feature ( #30322 )
...
Signed-off-by: Catacomba <kevinsuc16@gmail.com >
Signed-off-by: Kevin Šuc <kevinsuc16@gmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2026-01-05 16:34:35 +00:00
Isotr0py
6aa5b18e1d
[v1] Add encoder-only/cross attention support to Triton Attention backend ( #31406 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-01-06 00:00:23 +08:00
wang.yuqi
911d38ed99
[Model] Let more models support the score template. ( #31335 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Signed-off-by: wang.yuqi <noooop@126.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2026-01-05 11:54:26 +00:00
zzzzwwjj
caaa482aca
[platform] Support additional forward context for OOT ( #31674 )
...
Signed-off-by: zzzzwwjj <1183291235@qq.com >
Signed-off-by: zzzzwwjj <34335947+zzzzwwjj@users.noreply.github.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2026-01-05 10:25:13 +00:00
Yihua Cheng
b471aad41f
[KVconnector][LMCache] remove the import of legacy LMCache code ( #31704 )
...
Signed-off-by: ApostaC <yihua98@uchicago.edu >
2026-01-05 10:11:01 +00:00
Jee Jee Li
d5503ca7f9
[LoRA] LoRA PDL improvement ( #31660 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2026-01-05 08:28:46 +00:00
Qiping Pan
a2ad15c070
[Model] Enable LoRA support for BLIP2 ( #31620 )
...
Signed-off-by: Qiping Pan <panqiping@outlook.com >
2026-01-05 08:02:24 +00:00
Tres
3133c192a3
[ROCM] Reorder arguments and rename parameters for rope_cached_thd_positions_2c_fwd_inplace ( #29993 )
...
Signed-off-by: Tres Popp <tres.popp@amd.com >
2026-01-05 15:37:57 +08:00
wang.yuqi
76fd458aa7
[CI] Bump sentence-transformer from 3.2.1 to 5.2.0 ( #31664 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-01-04 21:45:01 -08:00
cjackal
e2701cc525
[Frontend] [Bugfix] respect server-level default chat template kwargs in reasoning parser ( #31581 )
...
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
2026-01-05 05:42:47 +00:00
Tyler Michael Smith
fe8a9fbd2e
[Bugfix] Fix EPLB state logging error ( #31455 )
...
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
2026-01-05 04:06:28 +00:00
Ning Xie
98b8b3abaa
[log] enable max_log_len trim only when needed ( #31482 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2026-01-05 03:55:43 +00:00
CHENYUE
346e56455a
Add chat prefix completion feature to DeepSeek v3.2 ( #31147 )
2026-01-05 11:20:25 +08:00
wang.yuqi
8be6432bda
[CI Failure] Fix NomicBert max_model_len validation ( #31662 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-01-05 11:06:52 +08:00
Nick Hill
43e3f8e4a9
[Misc] Various code simplifications ( #31666 )
...
Signed-off-by: njhill <nickhill123@gmail.com >
2026-01-04 18:35:56 -08:00
wangxiyuan
bb4337b34c
[Platform] Deprecate seed_everything ( #31659 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2026-01-04 18:34:04 -08:00
Isotr0py
367856de14
[CI/Build] Revive skipped reward models e2e test ( #31665 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-01-05 02:33:46 +00:00
Nick Hill
da436f868a
[Minor] Small pooler output processing optimization ( #31667 )
...
Signed-off-by: njhill <nickhill123@gmail.com >
2026-01-04 18:33:12 -08:00
Jee Jee Li
f099cd557a
[Bugfix] Fix AttributeError: 'Stream' object has no attribute 'dp_size' ( #31663 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2026-01-05 02:31:31 +00:00
Andreas Karatzas
f2b6dfd237
[ROCm][CI] Fix language generation test accuracy by disabling HF flash_sdp and mem_efficient_sdp ( #31597 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-01-05 02:17:05 +00:00
Andreas Karatzas
89f1f25310
[CI] Skip Phi-MoE test due to old API util ( #31632 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-01-05 08:52:07 +08:00
Nick Hill
b53b89fdb3
[BugFix] Async scheduling: handle model forward errors more cleanly ( #31611 )
...
Signed-off-by: njhill <nickhill123@gmail.com >
2026-01-04 11:04:37 -08:00
Ning Xie
6522721d17
[misc] Sort uvicorn log level description according to verbosity ( #31137 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2026-01-04 18:45:37 +00:00
Yuxuan Zhang
0d4044edd8
fix no think of GLM-4.5 / GLM-4.7 ( #31449 )
...
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com >
2026-01-04 11:43:00 +08:00
Reagan Lee
41ab179738
[Docs] Fix argparse include path for mm-processor benchmark ( #31654 )
...
Signed-off-by: Reagan <reaganjlee@gmail.com >
2026-01-04 03:31:29 +00:00
Robert Shaw
268b1c55ad
[MoE Refactor][13/N] Convert FI to Use PFNoEP ( #31533 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com >
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2026-01-03 12:26:36 -08:00
Andreas Karatzas
4f9ce35afe
[CI][Bugfix] Fix token counting in chunked prefill compl test ( #31630 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-01-03 14:28:49 +08:00
jeremyteboul
97a01308e9
Improve HF qwen3_omni: preserve audio_sample_rate in kwargs restructuring ( #29255 )
...
Signed-off-by: Jeremy Teboul <jeremyteboul@fb.com >
Co-authored-by: Jeremy Teboul <jeremyteboul@fb.com >
2026-01-03 04:31:09 +00:00
Xingyu Liu
0eee877f67
[Core] Parse vLLM engine required fields from hf_config to model_arch_config ( #28454 )
...
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com >
Signed-off-by: Xingyu Liu <38244988+charlotte12l@users.noreply.github.com >
2026-01-02 15:13:15 -08:00
Alfred
a0e9ee83c7
[Benchmark] Fix OOM during MoE kernel tuning for large models ( #31604 )
...
Signed-off-by: Alfred <massif0601@gmail.com >
2026-01-02 22:24:51 +00:00
Yongye Zhu
a3f2f40947
[MoE Refactor] Explicit construct mk for flashinfer bf16 kernel ( #31504 )
...
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-01-02 13:54:50 -08:00
Yongye Zhu
5a468ff7c7
[MoE Refactor] Split invoke_fused_moe_kernel ( #31050 )
...
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-01-02 13:47:15 -08:00
Andreas Karatzas
6ef770df7c
[MoE] Fix output_shape calculation in Attention layer to handle 3D query inputs ( #31596 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-01-02 15:46:23 +00:00
Nick Hill
bd877162eb
[BugFix] Support online dense model DP without overhead ( #30739 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
Signed-off-by: njhill <nickhill123@gmail.com >
2026-01-02 23:36:38 +08:00
Xinyu Chen
08f425bad1
CustomOp: test forward dispatch for grouped_topk ( #31530 )
...
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com >
2026-01-02 10:04:01 -05:00
labAxiaoming
a01f2faedf
Add multimodal input method in the documentation ( #31601 )
...
Signed-off-by: xiaoming <1259730330@qq.com >
2026-01-02 12:43:30 +00:00
Kyuyeun Kim
cc410e8644
[Bugfix] Fix weight_loader v1 block scale ( #31103 )
...
Signed-off-by: Kyuyeun Kim <kyuyeunk@google.com >
2026-01-02 13:14:10 +08:00
Kevin McKay
825c2dc133
[Bugfix][Hardware][AMD] Fix last_page_len calculation in AITER MLA decode ( #31282 )
...
Signed-off-by: c0de128 <kevin.mckay@outlook.com >
2026-01-01 21:14:00 -08:00
Vaibhav Sourirajan
1f43c121d5
Remove unused use_marlin variable in Mxfp4MoEMethod ( #31549 )
...
Signed-off-by: vaibhav sourirajan <vs2787@columbia.edu >
2026-01-01 21:13:36 -08:00
Tmn07
ca179d0f64
[Bugfix] Fix activation quantization for compressed-tensors W4A16 ( #31572 )
...
Signed-off-by: Tmn07 <tmn0796@gmail.com >
2026-01-01 21:13:22 -08:00
Andreas Karatzas
013b54088c
[ROCm][CI] Fix ModernBERT token classification test ( #31612 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-01-02 04:19:08 +00:00
Jay Hemnani
5ac55eb30f
[Model] Enable LoRA support for tower and connector in LLaVA ( #31513 )
...
Signed-off-by: Jay Hemnani <jayhemnani9910@gmail.com >
Co-authored-by: Jay Hemnani <jayhemnani9910@gmail.com >
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com >
2026-01-01 19:32:39 -08:00
Benjamin Chislett
ea53ca5e85
[Bugfix] Fix block size used in EAGLE slot mapping ( #31540 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
2026-01-01 19:32:30 -08:00
zhima771
27864a851c
feat: support LoRA for DeepSeek-OCR(Language Model part) ( #31569 )
...
Signed-off-by: zhima771 <15836938703@163.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2026-01-01 19:32:11 -08:00
Andreas Karatzas
5cc4876630
[ROCm][CI] Fix failure in Language Models Tests (Extra Standard) by reducing agent pool size ( #31553 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-01-01 19:29:42 -08:00
Kevin McKay
5fff44064b
[Bugfix] Replace BaseException with specific exceptions in FLA utils ( #31590 )
...
Signed-off-by: c0de128 <kevin.mckay@outlook.com >
2026-01-01 19:27:54 -08:00
Reagan Lee
1f5b7c41c3
Add Multimodal Processor Benchmark ( #29105 )
...
Signed-off-by: Reagan Lee <reaganjlee@gmail.com >
Signed-off-by: Reagan <reaganjlee@gmail.com >
2026-01-01 19:26:53 -08:00
Ekagra Ranjan
adcf682fc7
[Audio] Improve Audio Inference Scripts (offline/online) ( #29279 )
...
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com >
2025-12-31 23:34:18 +00:00
Andreas Karatzas
21de6d4b02
[CI][Bugfix] Fix token counting in chunked prefill streaming test ( #31565 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2025-12-31 23:05:14 +00:00
Nick Hill
6c2cfb62ff
[BugFix] Fix async scheduling for pooling models ( #31584 )
...
Signed-off-by: njhill <nickhill123@gmail.com >
2025-12-31 14:48:51 -08:00
Fanjiang Ye
d8da76f3b7
[Bugfix] Fix BAGEL online serving for text and image understanding ( #31546 )
...
Signed-off-by: Dylan1229 <yvanphys@gmail.com >
Signed-off-by: UED <zxr3611244710@gmail.com >
Signed-off-by: mr-ye-cao <yecaoyc2019@gmail.com >
Co-authored-by: UED <zxr3611244710@gmail.com >
Co-authored-by: mr-ye-cao <yecaoyc2019@gmail.com >
Co-authored-by: Mr-Ye-Cao <60802056+Mr-Ye-Cao@users.noreply.github.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-12-31 14:46:10 -08:00
baonudesifeizhai
d722e9e614
Add GLM-ASR multimodal support ( #31436 )
...
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com >
Signed-off-by: baonudesifeizhai <85092850+baonudesifeizhai@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-31 23:12:24 +08:00
Andreas Karatzas
cf16342d43
[ROCm][CI] Update MiniCPM model test: MiniCPM3-4B to MiniCPM4.1-8B and simplify attention backend testing ( #31551 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2025-12-31 00:12:01 -08:00
Wentao Ye
357d435c54
[Bug] Fix log issue with \n ( #31390 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2025-12-30 21:16:55 -08:00
danisereb
108a2728f7
Add get_expert_mapping to NemotronHModel (for LoRA support) ( #31539 )
...
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com >
2025-12-30 21:09:03 -08:00
TJian
578c8f51f6
[CI] [Critical] [CUDA] Fix duplicated test name ( #31562 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-12-30 21:01:09 -08:00
maang-h
b4bb5f312f
[Core] Remove unused num_tokens parameter from _init_model_kwargs ( #31517 )
...
Signed-off-by: maang <maang_h@163.com >
2025-12-30 20:47:23 -08:00
SameerAsal
70e1acefcd
[BugFix] Fix NUMA node validation in CPU platform ( #31520 )
...
Signed-off-by: SameerAsal <SameerAsal@users.noreply.github.com >
Co-authored-by: SameerAsal <SameerAsal@users.noreply.github.com >
2025-12-31 04:06:49 +00:00
Qiu
84f6cd741b
[Misc] add pcp basic support to MoE model ( #31003 )
2025-12-30 20:01:29 -08:00
B-201
ecd49ce7e6
[Fix] Align fused moe lora_b shape with peft ( #31534 )
...
Signed-off-by: bk-201 <joy25810@foxmail.com >
2025-12-31 09:44:59 +08:00
Amr Mahdi
e1ee11b2a5
Add docker buildx bake configuration ( #31477 )
...
Signed-off-by: Amr Mahdi <amrmahdi@meta.com >
2025-12-31 01:08:54 +00:00
vintipandey
04147dcfa7
[Bugfix]Fix pooling model always disabled due to incorrect PP rank check ( #31505 )
...
Signed-off-by: vintipandey <vinti.pandey@gmail.com >
2025-12-30 11:27:10 -08:00
JartX
07728bf5cd
[BugFix] add select_gemm_impl on CompressedTensorsWNA16MoEMethod to support LoRA ( #31453 )
...
Signed-off-by: JartX <sagformas@epdcenter.es >
2025-12-30 11:20:15 -08:00
yt0428
3f52fa5aa2
[Model] Add support for openPangu moe model ( #28775 )
...
Signed-off-by: yuantao <2422264527@qq.com >
Signed-off-by: yt0428 <51468697+yt0428@users.noreply.github.com >
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-12-30 08:11:38 -08:00
Li, Jiang
7157596103
[CPU] Disable async schedule on CPU ( #31525 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2025-12-30 12:34:08 +00:00
Nicolò Lucchesi
ab1af6aa3e
[CI][NIXL] Split DPEP tests ( #31491 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-12-30 07:26:12 -05:00
Pleaplusone
1a834df2d4
[ROCm][Bugfix] Fix accuracy issue on fmoe when VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS enabled ( #31523 )
...
Signed-off-by: ganyi <ygan@amd.com >
2025-12-30 09:21:49 +00:00
Kevin
51085c2aeb
[Frontend] add continue_final_message parameter to /embeddings endpoint ( #31497 )
...
Signed-off-by: Kevin P-W <140451262+kevin-pw@users.noreply.github.com >
2025-12-30 07:21:13 +00:00
Roger Feng
3d973764ce
[xpu] [bugfix] upgrade to latest oneccl in dockerfile ( #31522 )
...
Signed-off-by: roger feng <roger.feng@intel.com >
2025-12-30 14:52:28 +08:00
Nick Hill
3b312fb792
[Minor] Various small code cleanups/simplifications ( #31508 )
...
Signed-off-by: njhill <nickhill123@gmail.com >
2025-12-29 22:42:06 -08:00
ZT-AIA
f84bf7d79b
Add Loraconfig parameter to get_punica_wrapper function ( #31408 )
...
Signed-off-by: ZT-AIA <1028681969@qq.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-12-29 22:27:31 -08:00
Roy Wang
99dcf5dcc5
Migrate meetups & sponsors [2/N] ( #31500 )
...
Signed-off-by: esmeetu <jasonailu87@gmail.com >
2025-12-30 04:26:15 +00:00
Hojin Yang
dc837bc23e
feat(frontend): add --default-chat-template-kwargs CLI argument ( #31343 )
...
Signed-off-by: effortprogrammer <yhjhoward7@gmail.com >
2025-12-30 03:38:47 +00:00
Nick Hill
e54ee3ea33
[Core] Deduplicate generate/encode logic in AsyncLLM ( #31510 )
...
Signed-off-by: njhill <nickhill123@gmail.com >
2025-12-30 10:42:45 +08:00
wangln19
358bfd315c
fix: update kimi k2 tool parser logic ( #31207 )
...
Signed-off-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local >
Signed-off-by: Wang Linian <wanglinian@stu.pku.edu.cn >
Co-authored-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
2025-12-30 10:01:58 +08:00
Sage
39512aba72
[Prefix Cache] Include lora_name in BlockStored event for deterministic KV-cache reconstruction ( #27577 )
...
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com >
Co-authored-by: Sage <80211083+sagiahrac@users.noreply.github.com >
2025-12-30 00:17:16 +00:00
qli88
0f35429a0c
[CI]Test Group 'NixlConnector PD accuracy tests' is fixed ( #31460 )
...
Signed-off-by: qli88 <qiang.li2@amd.com >
2025-12-29 23:48:56 +00:00
Alexei-V-Ivanov-AMD
d63b969675
[CI/ROCm] Fixing "V1 Test attention (H100)" test group. ( #31187 )
...
Signed-off-by: DCCS-4560 <alivanov@chi-mi325x-pod1-108.ord.vultr.cpe.ice.amd.com >
Signed-off-by: <>
Co-authored-by: DCCS-4560 <alivanov@chi-mi325x-pod1-108.ord.vultr.cpe.ice.amd.com >
Co-authored-by: root <root@chi-mi325x-pod1-108.ord.vultr.cpe.ice.amd.com >
2025-12-29 16:53:59 -05:00
Robert Shaw
56f516254c
[Bugfix][ROCm] Fix Static Quant Issue ( #31502 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: TJian <tunjian.tan@embeddedllm.com >
2025-12-29 13:27:55 -08:00
Robert Shaw
9152a30d8f
[MoE Refactor][12/N] Marlin Fp8 MoE Pure Function ( #31499 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2025-12-29 13:27:00 -08:00
Nick Hill
c2ff33cc8c
[Core] Enable async scheduling by default ( #27614 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2025-12-29 13:20:55 -07:00
chunxiaozheng
b12cb38398
implements register kv caches in lmcache connector ( #31397 )
...
Signed-off-by: idellzheng <idellzheng@tencent.com >
2025-12-29 11:13:42 -08:00
Roger Young
5bc664110f
Optimize QKNorm for MiniMax-M2/M2.1 ( #31493 )
...
Signed-off-by: xuebi <xuebi@minimaxi.com >
Co-authored-by: xuebi <xuebi@minimaxi.com >
2025-12-29 16:30:18 +00:00
RickyChen / 陳昭儒
b3a2bdf1ac
[Feature] Add offline FastAPI documentation support for air-gapped environments ( #30184 )
...
Signed-off-by: rickychen-infinirc <ricky.chen@infinirc.com >
Signed-off-by: RickyChen / 陳昭儒 <ricky.chen@infinirc.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-12-29 16:22:39 +00:00
Harry Mellor
e37e7349e6
Replace nn.ConvNd with vLLM's ConvNdLayer for Transformers modeling backend ( #31498 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-12-29 16:20:01 +00:00
Roy Wang
b5d2d71d26
Migrate doc to website: Hardware Plugins (1/N) ( #31496 )
...
Signed-off-by: esmeetu <jasonailu87@gmail.com >
2025-12-29 15:55:20 +00:00
Harry Mellor
decc244767
[Docs] Use relative md links instead of absolute html links for cross referencing ( #31494 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-12-29 13:33:44 +00:00
amittell
9c884faa95
[Bugfix] Preserve tool call id/type/name in streaming finish chunk ( #31438 )
...
Signed-off-by: amittell <mittell@me.com >
Signed-off-by: Alex Mittell <mittell@me.com >
2025-12-29 21:10:52 +08:00
Chauncey
48d5ca4e8b
[CI] fix test_chat_truncation_content_not_null test ( #31488 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-12-29 12:47:08 +00:00
twj
bf73a3e4d7
[Bugfix][Frontend] Fix Jina reranker multimodal input compatibility ( #31445 )
...
Signed-off-by: tianwenjing <tianwenjing@jfgenius.com >
Signed-off-by: twj <151701930+twjww@users.noreply.github.com >
Co-authored-by: tianwenjing <tianwenjing@jfgenius.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-29 01:13:18 -08:00
Andreas Karatzas
3ecfdc3776
[ROCm][GPTQ][Bugfix] Fix GPTQ GEMM kernel output zeroing race condition ( #30719 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2025-12-29 01:13:14 -08:00
Andreas Karatzas
45c1ca1ca1
[ROCm][CI] Skip DeepGemm-dependent test on ROCm platform ( #31462 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2025-12-29 16:31:10 +09:00
Li, Jiang
17347daaa2
[CI/Build][CPU] Update CPU CI test cases ( #31466 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2025-12-29 14:17:52 +08:00
Mamy Ratsimbazafy
b9793e6a8c
Add Fused MoE Triton kernels for GLM-4.5-Air, GLM-4.5v, GLM-4.6v on 2x RTX Pro 6000 ( #31407 )
...
Signed-off-by: Mamy Ratsimbazafy <mamy_github@numforge.co >
2025-12-28 08:38:33 -08:00
Jzz1943
0b6b701050
[Model] Add tuned triton fused_moe configs for Qwen3Moe on B200 ( #31448 )
...
Signed-off-by: Zhongze Jiang <jiangzhongze.jzz@ant-intl.com >
2025-12-28 08:38:07 -08:00
Nick Hill
094fcce250
[BugFix] Re-fix async multimodal cpu tensor race condition ( #31373 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
Signed-off-by: njhill <nickhill123@gmail.com >
2025-12-28 03:05:08 -08:00
Andreas Karatzas
573dd0e6f0
[ROCm] Migrate xgrammar to upstream release ( #31327 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2025-12-28 00:08:29 -08:00
Andreas Karatzas
f70368867e
[ROCm][CI] Add TorchCodec source build for transcription tests ( #31323 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2025-12-28 16:06:05 +08:00
Andreas Karatzas
96142f2094
[ROCm][CI] Added perceptron lib in requirements for isaac multi-modal test ( #31441 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2025-12-28 04:15:14 +00:00
Boyuan Feng
62def07d67
[BugFix] register quant scale tensors as buffer ( #31395 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2025-12-28 11:20:02 +08:00
yitingdc
b326598e97
add tip for VLLM_USE_PRECOMPILED arg to reduce docker build time ( #31385 )
...
Signed-off-by: yiting.jiang <yiting.jiang@daocloud.io >
2025-12-28 03:19:47 +00:00
Robert Shaw
727c41f3fd
[MoE Refactor][10/N] Cleanup Fp8 Process Weights After Loading ( #31169 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2025-12-27 20:22:48 +00:00
Boyuan Feng
2f12cd32c0
[BugFix] Fix cache issue in compilation_config ( #31376 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2025-12-27 09:30:39 -05:00
Isotr0py
40a8756224
[Chore]: Remove HF format Phi4-MM examples ( #31405 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-12-27 13:42:02 +00:00
Isotr0py
3d024985ab
[CI/Build] Ignore max transformers version for more common tests ( #31401 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-12-27 13:06:26 +00:00
baonudesifeizhai
8711b21676
Fix/get raw stream patch #30905 ( #30912 )
...
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com >
Signed-off-by: baonudesifeizhai <85092850+baonudesifeizhai@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-12-26 20:08:47 -08:00
Yifan Qiao
52bf066516
[Core][Hybrid allocator + connector] Support hybrid allocator + kv cache connector ( #30166 )
...
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu >
Co-authored-by: KuntaiDu <kuntai@uchicago.edu >
2025-12-26 18:25:46 -08:00
Kunshang Ji
5326c89803
[XPU][CI]skip test_preprocess_error_handling due to fork/spawn issue ( #31381 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2025-12-26 21:40:44 +00:00
Xinyu Chen
87f1b8ca2c
CustomOp: Unify aiter impl into GroupedTopk ( #31221 )
...
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com >
2025-12-26 12:44:29 -05:00
rongfu.leng
887e900b77
[Docs] Add profiler user docs for http request ( #31370 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
2025-12-26 23:48:15 +08:00
Patrick von Platen
48e744976c
[Mistral common] Ensure all functions are imported from the top & only use public methods ( #31138 )
...
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com >
Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-12-26 04:48:24 -08:00
Jee Jee Li
ce1eafd1a5
[Core] Initialize LoRA support for tower and connector in multi-modal models ( #26674 )
...
Signed-off-by: bk-201 <joy25810@foxmail.com >
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
Signed-off-by: prashanth058 <prashanth.dannamaneni@uipath.com >
Co-authored-by: bk-201 <joy25810@foxmail.com >
Co-authored-by: prashanth058 <prashanth.dannamaneni@uipath.com >
Co-authored-by: Anexdeus <5142168@mail.ru >
2025-12-26 04:48:20 -08:00
Harry Mellor
0b544e6476
[Docs] Fix some snippets ( #31378 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-12-26 12:47:41 +00:00
Jee Jee Li
c3666f56fd
[Misc] Fix Qwen2-MoE shared_expert_gate ( #31339 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-12-26 05:10:39 +00:00
Andreas Karatzas
c79dbfa9ad
[CI] Fix flaky vision beam search test with flexible semantic validation ( #31324 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2025-12-26 04:39:32 +00:00
Shinichi Hemmi
9ee05cbe7f
Support LoRA and GPTQModel for PLaMo 2/3 ( #31322 )
...
Signed-off-by: Shinichi Hemmi <50256998+Alnusjaponica@users.noreply.github.com >
2025-12-26 11:41:33 +08:00
Ning Xie
3b8f31b362
[benchmark] use model card root instead of id ( #31329 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2025-12-26 10:55:56 +08:00
Isotr0py
2cd94259c8
[CI/Build] Ignore max transformers version skipping for initialization tests ( #30619 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-12-26 10:50:32 +08:00
oscardev256
b7165d53c6
Feature/isaac 0.1 ( #28367 )
...
Signed-off-by: oscardev256 <42308241+oscardev256@users.noreply.github.com >
Signed-off-by: Oscar Gonzalez <ogonzal6@alumni.jh.edu >
Signed-off-by: Yang <lymailforjob@gmail.com >
Co-authored-by: Yang <lymailforjob@gmail.com >
2025-12-25 18:49:11 -08:00
Nick Hill
81786c8774
[BugFix] Fix async scheduling + reasoning with struct output ( #31332 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2025-12-25 23:01:02 +00:00
Stan Wozniak
f1531d9f2a
[Hybrid] Mamba2 prefix cache blocks freeing for running requests ( #28047 )
...
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com >
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
Co-authored-by: Chen Zhang <zhangch99@outlook.com >
2025-12-25 20:54:06 +00:00
SongHe
2d6001f491
[Model][Ernie4.5-VL] Support video metadata for timestamp rendering ( #31274 )
...
Signed-off-by: dengsonghe <dengsonghe@baidu.com >
Co-authored-by: dengsonghe <dengsonghe@baidu.com >
2025-12-25 14:07:15 +00:00
Amir Samani
030fc44914
use the same stream for cuda graph capture and replay for NCCL ( #29207 )
...
Signed-off-by: Amir Samani <asamani@nvidia.com >
Signed-off-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: youkaichao <youkaichao@gmail.com >
2025-12-25 19:10:03 +08:00
Isotr0py
2532f437ee
[Doc] Add troubleshooting for Triton PTX error about undefined gpu-name ( #31338 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: Isotr0py <2037008807@qq.com >
Co-authored-by: youkaichao <youkaichao@gmail.com >
2025-12-25 02:26:34 -08:00
Louie Tsai
f15185fbdb
[Benchmark Suite] improve cpu Benchmark Suite tests and comparison report for 0.12.0 ( #30994 )
...
Signed-off-by: Tsai, Louie <louie.tsai@intel.com >
2025-12-25 08:51:45 +00:00
Mark Gatere
ba25a65992
[Frontend] add FunctionGemma tool parser support ( #31218 )
...
Signed-off-by: gateremark <gateremg@gmail.com >
2025-12-25 15:29:25 +08:00
Amith KK
42826bbccd
[Doc] Add tool call parser documentation for GPT-OSS models ( #31212 )
...
Signed-off-by: Amith KK <amithkumaran@gmail.com >
2025-12-25 05:29:10 +00:00
Richard Zou
254f6b9867
[Bugfix] Fix eagle dp tests on A100 ( #31241 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2025-12-25 00:05:04 +00:00
Michael Goin
bc5ef333e0
[Perf] Add skip_clone to SamplingParams for internal request handling ( #31041 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-12-24 14:35:57 -08:00
Cyrus Leung
09dc7c690c
[Chore][1/2] Drop v0.14 deprecations ( #31285 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-12-24 09:54:01 -08:00
ゆり
506eb0f454
[Bugfix] Remove dead block_quant_to_tensor_quant function ( #31294 )
...
Co-authored-by: yurekami <yurekami@users.noreply.github.com >
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com >
2025-12-24 17:22:48 +00:00
Ning Xie
5d93089686
[cli] complete vllm cli help message ( #31226 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2025-12-24 15:45:47 +00:00
Kevin McKay
66c9887440
[Bugfix][Hardware][AMD] Fix FP8 dtype in silu_mul quantization ( #31179 )
...
Signed-off-by: c0de128 <kevin.mckay@outlook.com >
2025-12-24 10:37:11 -05:00
wang.yuqi
1ff67df182
[CI] Reorganization pooling_mteb_test ( #31265 )
...
Signed-off-by: wang.yuqi <noooop@126.com >
2025-12-24 23:36:20 +08:00
skaraban3807
7cd288a4b3
[PERF] Add interleaved memory allocation to NUMA module ( #30800 )
2025-12-24 13:47:49 +00:00
Cyrus Leung
d201807339
[Chore] Bump lm-eval version ( #31264 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-12-24 05:39:13 -08:00
Cyrus Leung
aa3868ecfe
[Chore] Remove unused noqas ( #31263 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-12-24 05:38:46 -08:00
Cyrus Leung
7adeb4bfa8
[Bugfix] Fix max_model_len="auto" handling ( #31260 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-12-24 19:15:27 +08:00
wang.yuqi
bd89ce16d2
[Model] Introduce verify_and_update_model_config for VerifyAndUpdateConfig. ( #31131 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Signed-off-by: wang.yuqi <noooop@126.com >
2025-12-24 09:54:57 +00:00
Pleaplusone
b41aeb3468
[Bugfix][ROCm] Fix load issue on deepseek quark quantization when shared expert enabled ( #31261 )
...
Signed-off-by: ganyi <ygan@amd.com >
2025-12-24 16:47:44 +08:00
Ryan Rock
ddfac7034e
[CI/Build] Ignore data_parallel_size_local ( #30281 )
...
Signed-off-by: Ryan Rock <ryan.rock@amd.com >
2025-12-24 07:40:54 +00:00
Micah Williamson
6559d96796
[ROCm][CI] Set TORCH_NCCL_BLOCKING_WAIT Distributed Tests On ROCm ( #31259 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2025-12-24 07:19:07 +00:00
kliuae
1c74150bca
[ROCm][CI] Fix "Distributed Tests (H200)" Test ( #31227 )
...
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com >
2025-12-24 06:56:30 +00:00
Andreas Karatzas
0247a91e00
[ROCm][CI] Fix entrypoints tests and Python-only installation test on ROCm ( #28979 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2025-12-23 22:42:30 -08:00
Michael Goin
8ee90c83f8
Add --max-model-len auto to auto-fit context to available memory ( #29431 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-12-23 21:37:14 -08:00
Nick Cao
d7e05ac743
[docker] Fix downloading sccache on aarch64 platform ( #30070 )
...
Signed-off-by: Nick Cao <nickcao@nichi.co >
2025-12-23 21:36:33 -08:00
sihao_li
471ddb99a0
[XPU] Remove distributed_executor_backend check ( #30760 )
...
Signed-off-by: sihao.li <sihao.li@intel.com >
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com >
2025-12-23 21:34:33 -08:00
Xiong Wang
bb24592d13
[Qwen3-Omni] fixed _get_feat_extract_output_lengths function ( #31007 )
...
Signed-off-by: Xiong Wang <wangxiongts@163.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-12-23 21:33:54 -08:00
Matthew Bonanni
369f47aa0f
[DeepSeek v3.2] Remove unnecessary syncwarps ( #31047 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-12-23 21:33:30 -08:00
zejunchen-zejun
dabff12ed3
[Bugfix][ROCm][Dynamo][DS 3.1][FP8] fix unsupported hasattr call when Dynamo tracing for ROCm device ( #31149 )
...
Signed-off-by: zejunchen-zejun <zejun.chen@amd.com >
2025-12-23 21:32:19 -08:00
Ming Yang
3bb9561928
Revert "[bench] Support common prefix len config (for decode-only bench)" ( #31240 )
...
Signed-off-by: Ming Yang <minos.future@gmail.com >
2025-12-23 21:17:23 -08:00
Micah Williamson
3ce791ac77
[ROCm][CI] Set VLLM_FLOAT32_MATMUL_PRECISION="tf32" For terratorch Tests In AMD CI ( #31242 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2025-12-24 03:21:50 +00:00
Andreas Karatzas
e42894f5b5
[ROCm][CI][Bugfix] Fix Siglip2 rotary embedding dispatch and InternVL video test tolerance ( #31235 )
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2025-12-24 02:56:58 +00:00
Wentao Ye
76e6a95192
[Bug] Fix Number of dimensions of tensors must match. for Deepseek V3.2 ( #31160 )
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-12-24 10:41:09 +08:00
Chao Lei
8b59753cdb
[P/D] Mooncake connector support more protocols ( #30133 )
Signed-off-by: LCAIZJ <leichao139636@163.com >
2025-12-24 10:24:07 +08:00
Chen Zhang
538e830caa
[KVEvent] User request.block_hash for parent block_hash ( #30544 )
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu >
Co-authored-by: Yifan Qiao <yifanqiao@berkeley.edu >
2025-12-23 18:23:43 -08:00
rongfu.leng
4ed11105d7
[Misc] Remove unused custom ops copy_blocks and copy_blocks_mla ( #30967 )
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
2025-12-23 18:22:35 -08:00
Cyrus Leung
dd424571c8
[Bugfix] Enable dynamic_dims for different embeds shape ( #31223 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-12-24 10:15:47 +08:00
Cyrus Leung
ca6a95ba25
[Chore] Simplify logic of _execute_mm_encoder ( #31222 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-12-23 18:15:16 -08:00
Vadim Gimpelson
bc0a5a0c08
[CI] Add Qwen3-Next-FP8 to Blackwell model tests ( #31049 )
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2025-12-23 17:21:50 -08:00
Andreas Karatzas
bfa2c0bbb9
[ROCm][Bugfix] Fix RuntimeError in MMEncoderAttention by replacing .view() with .reshape() ( #31203 )
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2025-12-23 21:48:01 +00:00
Mark McLoughlin
f790068600
[Core] Add a random suffix to frontend-provided request IDs ( #27987 )
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-12-23 13:05:39 -08:00
Asaf Joseph Gardin
34916ae37f
[Mamba] - Consolidate Mambas Attention Logic ( #28133 )
2025-12-23 21:57:00 +01:00
Yuan Tang
0736f901e7
docs: Add llm-d integration to the website ( #31234 )
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com >
2025-12-23 20:27:22 +00:00
Harry Mellor
c016c95b45
Use helper function instead of looping through attribute names ( #29788 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-12-23 17:31:56 +00:00
Harry Mellor
1339878e13
Only patch original_max_position_embeddings for Transformers v4 ( #31214 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-12-23 16:46:32 +00:00
danielafrimi
b94f80ffb8
[FIX] FP4 quantization kernel padding initialization bug ( #31097 )
Signed-off-by: <>
Co-authored-by: root <root@gpu-193.slurm-workers-slurm.slurm.svc.cluster.local >
Co-authored-by: root <root@gpu-951.slurm-workers-slurm.slurm.svc.cluster.local >
2025-12-23 08:45:18 -08:00
Joachim Studnia
38c361f99d
Fix edge case Mistral tool parser ( #30724 )
Signed-off-by: Joachim Studnia <joachim@mistral.ai >
Signed-off-by: Joachim Studnia <studniajoachim@gmail.com >
Signed-off-by: juliendenize <julien.denize@mistral.ai >
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: juliendenize <julien.denize@mistral.ai >
Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com >
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com >
2025-12-23 14:19:58 +00:00
Cyrus Leung
bb62dda2c3
[Misc] Introduce encode_*_url utility function ( #31208 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-12-23 13:45:21 +00:00
Patrick von Platen
3faa8bee57
adapt voxtral ( #31095 )
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com >
2025-12-23 05:31:55 -08:00
Harry Mellor
b10d47e0e0
Add util function for checking nesting of rope parameters ( #31146 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-12-23 11:41:49 +00:00
R3hankhan
769f27e701
[OpenAI] Add parameter metadata to validation errors ( #30134 )
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com >
2025-12-23 11:30:12 +00:00
Jakub Zakrzewski
23daef548d
[Frontend] Support using chat template as custom score template for reranking models ( #30550 )
Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com >
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Signed-off-by: wang.yuqi <noooop@126.com >
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io >
2025-12-23 11:19:16 +00:00
Jee Jee Li
27c6c2f98c
[Bugfix] Fix MoE LoRA bin/pt loading ( #31161 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-12-23 19:09:15 +08:00
Weida Hong
73cfb7a722
Correct position of docstring of class attributes ( #31209 )
Signed-off-by: Weida Hong <wdhongtw@google.com >
2025-12-23 02:08:58 -08:00
vllmellm
f32cfd7d97
[ROCm][FEAT] Support AITER RMSNorm quantization fusion pass ( #26575 )
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
Co-authored-by: TJian <tunjian.tan@embeddedllm.com >
2025-12-23 02:07:54 -08:00
Jee Jee Li
6b16fff01b
[Bugfix] Fix Jais2ForCausalLM ( #31198 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-12-23 07:44:01 +00:00
Yan Ma
f1c2c20136
[XPU] decrease IGC_ForceOCLSIMDWidth for speculative decoding triton-xpu kernel compilation ( #30538 )
Signed-off-by: Yan Ma <yan.ma@intel.com >
2025-12-23 05:22:15 +00:00
Cyrus Leung
8cef137689
[Chore] Update more locations to use attention_config.backend ( #31153 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-12-22 19:19:50 -08:00
quanliu
a37328fc5c
[Feature] Batch invariant: Lora ( #30097 )
Signed-off-by: quanliu <18646313696@163.com >
2025-12-23 10:32:47 +08:00
Pavani Majety
3e10262356
Revert "[SM100] Enable fp8 compute for prefill MLA ( #30746 )" ( #31197 )
Signed-off-by: Pavani Majety <pmajety@nvidia.com >
2025-12-22 18:15:33 -08:00
Angela Yi
612d5ffdab
[ci] Fix Pytorch compilation test oom in 2.10 ( #31194 )
Signed-off-by: angelayi <yiangela7@gmail.com >
2025-12-23 01:56:47 +00:00
Divakar Verma
78e5e62bbf
[AMD][CI] fix v1/engine test_preprocess_error_handling ( #31192 )
Signed-off-by: Divakar Verma <divakar.verma@amd.com >
2025-12-23 01:28:19 +00:00
Robert Shaw
b57b967386
[MoE Refactor][7/N] AITER MK ( #31102 )
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2025-12-22 16:42:58 -07:00
Michael Goin
6d518ffbaa
[CI Failure] Disable mosaicml/mpt-7b and databricks/dbrx-instruct tests ( #31182 )
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-12-22 15:40:35 -08:00
Benjamin Chislett
85aff45e24
[Perf] Remove blocking copy in GDN Attention ( #31167 )
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
2025-12-22 14:25:22 -08:00
Wentao Ye
5312a7284e
[Bug] Fix 'CutlassMLAImpl' object has no attribute '_workspace_buffer' ( #31173 )
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-12-22 14:24:27 -08:00
Lucas Wilkinson
de71747655
[SpecDecode] Simplified alternative padded-speculation acceptance rate fix ( #29845 )
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-12-22 13:06:10 -08:00
Michael Goin
9586354053
[Doc] Add vllm-metal to hardware plugin documentation ( #31174 )
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-12-22 20:06:29 +00:00
Pavani Majety
b10f41c894
[SM100] Enable fp8 compute for prefill MLA ( #30746 )
Signed-off-by: Pavani Majety <pmajety@nvidia.com >
2025-12-22 19:15:57 +00:00
Yongye Zhu
7b926e8901
[MoE Refactor][9/N] Use modular kernel for unquantized Triton MoE ( #31052 )
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com >
2025-12-22 17:34:19 +00:00
Gregory Shtrasberg
ab3a85fd68
[ROCm][CI/Build] Fix triton version to one that has triton_kernels required for gpt-oss to run ( #31159 )
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-12-22 17:19:27 +00:00
Boyuan Feng
8dd0db687b
[UX] improve profiler error message ( #31125 )
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2025-12-22 08:45:59 -08:00
TJian
022f3cea53
[ROCm] [Critical]: Remove unused variable ( #31156 )
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-12-22 08:28:22 -08:00
Micah Williamson
a5bc77c253
[AMD][CI] Add "V1 Test e2e + engine" to mi325_8 Agent Pool ( #31040 )
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2025-12-22 10:41:56 -05:00
Nicolò Lucchesi
b1c3f96ae3
[CI][Bugfix] Fix entrypoints/openai/test_audio.py ( #31151 )
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-12-22 07:21:40 -08:00
dengyunyang
8f8f469b1b
[BugFix] skip language model in Encoder ( #30242 )
Signed-off-by: dengyunyang <584797741@qq.com >
2025-12-22 05:25:59 -08:00
Shengqi Chen
2cf91c2ea4
[CI] add polling for precompiled wheel in python_only_compile.sh, fix index generation for releases ( #30781 )
Signed-off-by: Shengqi Chen <harry-chen@outlook.com >
2025-12-22 13:24:21 +00:00
AlonKejzman
bd6d5a7475
[gpt-oss] Fix harmony parser in streaming responses ( #30205 )
Signed-off-by: AlonKejzman <alonkeizman@gmail.com >
2025-12-22 20:56:06 +08:00
Li Wang
256a33ecb4
[Model] Fix bagel failed to run ( #31132 )
Signed-off-by: wangli <wangli858794774@gmail.com >
2025-12-22 02:15:54 -08:00
Roger Young
c02a2705f9
Update MiniMax-M2 ToolCall and add MiniMax-M2.1 in Docs ( #31083 )
Signed-off-by: xuebi <xuebi@minimaxi.com >
Co-authored-by: xuebi <xuebi@minimaxi.com >
2025-12-22 05:28:40 +00:00
Kevin McKay
cf8eed7bef
[Bugfix][ROCm] Fix typo: is_linear_fp8_enaled -> is_linear_fp8_enabled ( #31109 )
Signed-off-by: c0de128 <kevin.mckay@outlook.com >
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com >
2025-12-21 21:14:58 -08:00
Kevin McKay
44ae85f725
[Misc] Fix typo: 'occured' -> 'occurred' ( #31120 )
Signed-off-by: c0de128 <kevin.mckay@outlook.com >
2025-12-21 21:14:27 -08:00
Kevin McKay
14c3e6ade3
[Misc] Fix spelling typos in model comments ( #31117 )
Signed-off-by: c0de128 <kevin.mckay@outlook.com >
2025-12-21 21:14:14 -08:00
Kevin McKay
42b42824ae
[Misc] Fix grammar errors in comments and messages ( #31115 )
Signed-off-by: c0de128 <kevin.mckay@outlook.com >
2025-12-21 21:14:02 -08:00
Kevin McKay
ec58c10ce1
[Misc] Fix quantization-related typos ( #31116 )
Signed-off-by: c0de128 <kevin.mckay@outlook.com >
2025-12-21 21:13:48 -08:00
Kevin McKay
8c084de59d
[Misc] Fix spelling typos in comments ( #31114 )
Signed-off-by: c0de128 <kevin.mckay@outlook.com >
2025-12-21 21:13:14 -08:00
CedricHuang
19cc9468fd
[Feature]: Support NVIDIA ModelOpt HF FP8 variants FP8_PER_CHANNEL_PER_TOKEN and FP8_PB_WO in vLLM ( #30957 )
2025-12-21 22:34:49 -05:00
Jee Jee Li
097978a15d
[Kernel] Enable fused_qknorm_rope_kernel supports partial rope ( #30821 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-12-21 18:39:22 -08:00
Lucas Wilkinson
7e065eba59
[CI] Fix "2 Node Tests (4 GPUs in total)" ( #31090 )
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-12-22 10:32:40 +08:00
Steve Westerhouse
9d701e90d8
[Doc] Clarify FP8 KV cache computation workflow ( #31071 )
Signed-off-by: westers <steve.westerhouse@origami-analytics.com >
2025-12-22 08:41:37 +08:00
Michael Goin
06d490282f
[NVFP4][Perf] Tune NVFP4 input quant kernel for small batch size ( #30897 )
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-12-21 09:41:57 -08:00
Robert Shaw
b471092d3a
[MoE Refactor][4/N] Marlin Fp8 Mk ( #31036 )
2025-12-21 12:37:42 -05:00
Ameen Patel
93cabc417c
ci: add nvidia-smi warmup before Prime-RL integration test ( #31093 )
Signed-off-by: AmeenP <ameenp360@gmail.com >
2025-12-21 15:43:01 +00:00
Chauncey
bb80f69bc9
add aarnphm and chaunceyjiang to the new tool_parser directory ( #31088 )
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-12-21 03:24:34 +00:00
汪志鹏
3e92b2b7ac
[BugFix]fix gpt-oss v1/completions response bug ( #30608 )
Signed-off-by: princepride <wangzhipeng628@gmail.com >
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
Co-authored-by: bbrowning <bbrownin@redhat.com >
2025-12-21 10:39:31 +08:00
Jinzhen Lin
7c73ceb581
[Quantization] add marlin w4a8/w8a8 check ( #31061 )
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com >
2025-12-20 21:58:11 +00:00
Lucas Wilkinson
ae0770fa6b
[CI] Fix H200 Distributed test ( #31054 )
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-12-20 16:48:49 -05:00
Jinzhen Lin
ee52d9901d
[Quantization] support logical_widths for fp8 marlin ( #30962 )
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com >
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-20 12:02:57 -08:00
baonudesifeizhai
54c8924384
[MoE Refactor][5/N] Isolate zero expert to LongCatFlash ( #28891 )
Signed-off-by: baonudesifeizhai <85092850+baonudesifeizhai@users.noreply.github.com >
Signed-off-by: Dongjie Zou <85092850+baonudesifeizhai@users.noreply.github.com >
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com >
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robertgshaw2@gmail.com >
2025-12-20 18:22:04 +00:00
Yan Ma
560ae9638c
[XPU] enable fp8 online streaming quantization ( #30944 )
Signed-off-by: Yan Ma <yan.ma@intel.com >
2025-12-20 13:45:27 +00:00
Jeffrey Wang
1501a4070e
[Bugfix] Read truncate_prompt_tokens from pooling_params in AsyncLLM.encode() ( #31013 )
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com >
2025-12-20 10:29:31 +00:00
Lucas Wilkinson
ff2168bca3
[CI] FIx fixture 'siglip_attention_config' not found ( #31053 )
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-12-20 03:46:15 +00:00
Gregory Shtrasberg
0be149524c
[ROCm][CI/Build] Update ROCm dockerfiles ( #30991 )
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-12-20 03:19:12 +00:00
zejunchen-zejun
d52c5096d7
[Bugfix] fix the alias bug of AttentionBackendEnum when register CUSTOM attention backend to vllm ( #30869 )
Signed-off-by: zejunchen-zejun <zejun.chen@amd.com >
2025-12-20 09:03:35 +08:00
Yuxuan Zhang
8a7a414374
GLM-4.7 Tool Parser and Doc Update ( #30876 )
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com >
2025-12-20 00:09:58 +00:00
Robert Shaw
95befecc18
[MoE Refactor][2/N] Use Modular Kernels for Fp8 ( #30825 )
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2025-12-19 23:36:38 +00:00
Wentao Ye
4cf9429897
[Bug] Fix error 'Dynamo failed to run FX node with fake tensors for Deepseek V3.2 ( #31046 )
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-12-19 23:31:31 +00:00
Robert Shaw
83a317f650
[MoE Refactor][3/N] Deprecate cutlass block quant fp8 (b200) ( #30990 )
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2025-12-19 13:09:54 -08:00
Lucas Wilkinson
5f6477d1d0
[BugFix] Fix TypeError: unhashable type: 'dict' when serving deepseek32 ( #30924 )
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-12-19 16:07:54 -05:00
Wentao Ye
3bd8335bd0
[Refactor] Refactor for DeepGemmQuantScaleFMT using cache ( #30898 )
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-12-19 13:50:39 -07:00
Seiji Eicher
1ab5213531
Make engine core client handshake timeout configurable ( #27444 )
Signed-off-by: Seiji Eicher <seiji@anyscale.com >
2025-12-19 20:38:30 +00:00
Zhonghua Deng
969bbc7c61
[Model] Add MiMo-V2-Flash support ( #30836 )
Signed-off-by: Abatom <abzhonghua@gmail.com >
Signed-off-by: Jumiar <liuanqim10@126.com >
Signed-off-by: Zyann7 <zyann7@outlook.com >
Co-authored-by: Jumiar <liuanqim10@126.com >
Co-authored-by: Zyann7 <zyann7@outlook.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-12-19 17:17:03 +00:00
Andrey Talman
268a972c62
Update Pytorch version update docs ( #30982 )
2025-12-19 16:08:53 +00:00
Jinzhen Lin
5fbfa8d9ef
[Quantization] fix marlin w8a8 check ( #30961 )
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com >
2025-12-19 07:33:22 -08:00
Shanshan Shen
23a1946e3b
[CustomOp][Refactor] Extract common methods for ApplyRotaryEmb CustomOp ( #31021 )
Signed-off-by: shen-shanshan <467638484@qq.com >
2025-12-19 22:16:09 +08:00
Thomas Parnell
b5545d9d5c
[Bugfix] [Kernel] Triton attention kernels: mask out V blocks that fall outside sliding window ( #30887 )
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
2025-12-19 21:39:54 +08:00
Nishidha Panpaliya
bd2b52fc2d
[CPU][Bugfix] Fix ppc64le CPU build ( #30871 )
Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com >
2025-12-19 12:26:35 +00:00
Li, Jiang
420ba2dbb6
Enable aarch64 CPU performance benchmarks ( #26494 )
Signed-off-by: Ioana Ghiban <ioana.ghiban@arm.com >
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
Co-authored-by: Ioana Ghiban <ioana.ghiban@arm.com >
Co-authored-by: Fadi Arafeh <fadi.arafeh@arm.com >
2025-12-19 12:16:18 +00:00
Marko Rosenmueller
455949675d
[Frontend][Bug] allow tool calls in analysis channel ( #28139 )
Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
2025-12-19 10:47:44 +00:00
lif
086b96339f
[Bugfix] Add validation for tool requests when tool_parser is unavailable ( #30613 )
Signed-off-by: majiayu000 <1835304752@qq.com >
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com >
2025-12-19 18:23:28 +08:00
Jinzhen Lin
9187de9fac
[Quantization] enable compressed-tensors marlin support for turing (2) ( #31008 )
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com >
2025-12-19 08:56:35 +00:00
Isotr0py
ac1c934276
[Bugfix] Fix incorrect tiles creation for mm prefix triton attention ( #30974 )
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-12-19 16:00:33 +08:00
Wenqi Glantz
4924ac582c
Add hidden dimension validation for multimodal embedding inputs ( #30968 )
Signed-off-by: Wenqi Glantz <wglantz@nvidia.com >
2025-12-19 07:59:36 +00:00
Li, Jiang
096b25c9ed
[Doc][CPU] Fix index link for CPU regular release wheels ( #31015 )
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2025-12-19 07:29:52 +00:00
Jinzhen Lin
de08b8f61b
[Quantization] enable compressed-tensors marlin support for turing ( #31000 )
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com >
2025-12-18 20:29:48 -08:00
Nick Hill
2ac85a4544
[BugFix] Fix logprobs with spec decode and modified logits ( #30846 )
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-12-18 19:58:28 -08:00
Andreas Karatzas
7b43db210c
[ROCm][CI][Bugfix] Multi-Modal Model Support Fixes and Attention Backend Improvements ( #30270 )
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2025-12-19 02:17:27 +00:00
PlatinumGod
6a09612b2e
[Bugfix] Fix tool_choice="none" being ignored by GPT-OSS/harmony models ( #30867 )
Signed-off-by: yujiepu <pyjapple@gmail.com >
Signed-off-by: PlatinumGod <pyjapple@gmail.com >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
2025-12-19 09:34:27 +08:00
Nick Hill
45c0526ac9
[BugFix] Handle errors when preprocessing added requests ( #30895 )
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-12-19 01:29:11 +00:00
Benjamin Chislett
d6b3d39b6d
[Cleanup] Refactor FlashInferMetadataBuilder ( #29128 )
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-12-18 14:45:30 -08:00
Chendi.Xue
6ca74bc11a
[NIXL][BUG FIX] Fix both failing issue and accuracy issue with nixl + host_buffer on CUDA ( #30419 )
Signed-off-by: Chendi Xue <chendi.xue@intel.com >
Signed-off-by: Chendi.Xue <chendi.xue@intel.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2025-12-18 22:10:02 +00:00
Harry Mellor
19c583398a
Check for truthy rope_parameters not the existence of it ( #30983 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-12-18 13:59:10 -08:00
Nick Hill
b0b77c4655
[BugFix] Fix spec decode + structured outputs + preemption edge case ( #30916 )
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-12-18 12:59:55 -08:00
Kayvan Mivehnejad
634a14bd7d
Strengthen input validation and tests for 'parse_raw_prompts’. ( #30652 )
Signed-off-by: Kayvan Mivehnejad <K.Mivehnejad@gmail.com >
2025-12-18 19:51:58 +00:00
Chen Zhang
24b65eff0d
[BugFix] Spec decode with VLLM_ENABLE_V1_MULTIPROCESSING=0 ( #30319 )
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-12-18 19:47:56 +00:00
Elizabeth Thomas
41b6f9200f
Remove all2all backend envvar ( #30363 )
Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-12-18 19:46:28 +00:00
Wentao Ye
97000a2be7
[Bug] Fix compressed tensor not using deepgemm ( #30820 )
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-12-18 14:45:55 -05:00
Isotr0py
d2dc5dfc6e
[Bugfix] Remove tile_size=64 for mm_prefix triton attention ( #30973 )
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-12-18 20:42:32 +01:00
navmarri14
b8c477c115
tuned fused configs for B300 ( #30629 )
2025-12-18 11:41:59 -08:00
jiahanc
53ad423f26
[Perf] enable flashinfer rotary_embedding custom ops in DeepSeek rotary ( #30729 )
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com >
2025-12-18 14:31:18 -05:00
wz1qqx
889f8bb250
[BugFix]Reclaim resources to prevent memory leaks when use LMCacheMPConnector ( #30745 )
Signed-off-by: wz1qqx <ziqi.wang@novita.ai >
Co-authored-by: wz1qqx <ziqi.wang@novita.ai >
2025-12-18 19:09:51 +00:00
Fanli Lin
058926d48c
[XPU] allow custom workers (e.g. vllm-omni workers) to be used on XPU ( #30935 )
Signed-off-by: Fanli Lin <fanli.lin@intel.com >
2025-12-18 10:16:36 -08:00
Isotr0py
700a5ad6c6
[MM Encoder]: Migrate legacy ViT MultiHeadAttention to new MMEncoderAttention interface ( #30684 )
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-12-19 02:04:19 +08:00
Alec
62be3670cb
[BugFix] Add sleep to fix tight loop and release GIL ( #29476 )
Signed-off-by: alec-flowers <aflowers@nvidia.com >
Signed-off-by: Alec <35311602+alec-flowers@users.noreply.github.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-12-18 09:52:55 -08:00
inkcherry
500f26e6d3
[Bugfix] fix DP-aware routing in OpenAI API requests ( #29002 )
Signed-off-by: inkcherry <mingzhi.liu@amd.com >
2025-12-18 09:50:42 -08:00
Nick Hill
686cbaac64
[Cleanup] Remove unused ModelRunner V1 InputBatch.num_tokens field ( #30218 )
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-12-18 09:17:00 -08:00
Vasiliy Kuznetsov
f4ee2c3d90
fix fp8 online quantization streaming with tp > 1 ( #30900 )
Signed-off-by: vasiliy <vasiliy@fb.com >
2025-12-18 11:45:15 -05:00
Xin Yang
9a5e96523b
[LoRA] Set default MXFP4 LoRA backend to Marlin ( #30598 )
Signed-off-by: Xin Yang <xyangx@amazon.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-12-18 08:42:22 -08:00
wzyrrr
326e7c3105
[Doc] Add Sophgo TPU Support ( #30949 )
Co-authored-by: zhaoyang.wang <zhaoyang.wang@sophgo.com >
2025-12-18 16:29:33 +00:00
Lucas Kabela
0db5439ded
[Bugfix][torch2.10] Fix test_qwen2_5_vl_compilation with 2.10 RC ( #30822 )
Signed-off-by: Lucas Kabela <lucaskabela@meta.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-12-18 08:23:31 -08:00
sarathc-cerebras
28d15ab56b
adds jais 2 support ( #30188 )
Signed-off-by: sarathc-cerebras <sarath.chandran@cerebras.net >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-12-18 15:46:58 +00:00
Wentao Ye
6628758233
[Bug] Fix batch invariant in torch 2.10 ( #30907 )
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-12-18 07:27:51 -08:00
zhrrr
eee600c34f
[Misc] support nsys profile for bench latency ( #29776 )
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com >
2025-12-18 14:52:20 +00:00
Michael Goin
100f93d2be
Filter safetensors files to download if .safetensors.index.json exists ( #30537 )
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-12-18 14:51:17 +00:00
vllmellm
96bf50a2c0
[ROCm] Serving Fails on Radeon Due to AITER Dtype Import ( #30952 )
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2025-12-18 11:47:46 +00:00
Li, Jiang
f90d3636e2
[Bugfix][CPU] Fix Mac CPU build ( #30955 )
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2025-12-18 01:38:22 -08:00
Ming Yang
8372be2828
[moe] Use enable_chunking func (to support disabling chunking) ( #29935 )
Signed-off-by: Ming Yang <minos.future@gmail.com >
2025-12-18 09:02:38 +00:00
Andreas Karatzas
8da6ae49c3
[ROCm][Bugfix] Fix fa_version argument error in flash_attn_maxseqlen_wrapper for ROCm without aiter ( #30909 )
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2025-12-18 16:45:51 +08:00
Lucas Wilkinson
30bb19a760
[BugFix] Partial revert of #29558 (DeepEP HT + PIECEWISE CG support) ( #30910 )
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-17 23:50:15 -08:00
Chauncey
aa7e836055
[Bugfix] Fix Unicode issues in GLM-4 tool calling ( #30920 )
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-12-18 07:12:17 +00:00
Andreas Karatzas
be2ad5f920
[ROCm][Bugfix] fix(structured_output): Skip guidance backend for schemas with patternProperties ( #30730 )
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2025-12-18 07:04:57 +00:00
wangxiyuan
a85724bd6e
[Platform] Let EPD work with non-cuda platform ( #30225 )
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-12-18 06:45:29 +00:00
Yifan Qiao
11a89cf95c
[Fix][FlexAttention] return max logical block index to handle reused blocks ( #30915 )
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu >
2025-12-18 06:42:21 +00:00
Li, Jiang
e3ab93c896
[CPU] Refactor CPU fused MOE ( #30531 )
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2025-12-18 14:36:49 +08:00
Nathan Price
fc2ae6d617
fix: add warmup for audio preprocessing ( #30706 )
Signed-off-by: Nathan Price <nathan@abridge.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-12-18 06:12:29 +00:00
Yihua Cheng
ec965569d9
[KV connector][LMCache] Only record the cuda event when there are request to store/load ( #30814 )
Signed-off-by: ApostaC <yihua98@uchicago.edu >
2025-12-18 05:31:34 +00:00
Divakar Verma
82dc338ad6
[AMD][CI] fix lm eval ci arg ( #30911 )
Signed-off-by: Divakar Verma <divakar.verma@amd.com >
2025-12-18 13:18:26 +08:00
Vadim Gimpelson
717ac33d9c
[PERF] Qwen3-next. Add fp8 cutlass MoE tuned configs. chmod -x *MI308X.json ( #29553 )
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2025-12-18 13:16:04 +08:00
Li, Jiang
cfb7e55515
[Doc][CPU] Update CPU doc ( #30765 )
Signed-off-by: jiang1.li <jiang1.li@intel.com >
Signed-off-by: Li, Jiang <bigpyj64@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-18 04:59:09 +00:00
zzhxxx
b166ef20e1
[refactor] Add prefix support to embed_tokens in DeepSeek MTP ( #30788 )
Signed-off-by: zzhx1 <zzh_201018@outlook.com >
2025-12-18 04:45:56 +00:00
Zhengxu Chen
5f2f3fba1d
[compile] Fix CI for test_gpt2_cache_hit ( #30902 )
Signed-off-by: zhxchen17 <zhxchen17@fb.com >
2025-12-17 20:22:23 -08:00
Matthew Bonanni
4a8412f773
[UX] Reduce DeepGEMM warmup log output to single progress bar ( #30903 )
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-12-17 20:21:51 -08:00
Bowen Bao
0c738b58bc
[Quantization] Support Quark int4-fp8 w4a8 for MoE ( #30071 )
Signed-off-by: Bowen Bao <bowenbao@amd.com >
2025-12-18 04:20:42 +00:00
gnovack
5a3adf581e
fused_moe_lora PDL improvements ( #30716 )
Signed-off-by: gnovack <gnovack@amazon.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-12-17 19:55:00 -08:00
Isotr0py
6fe5887652
[Chore] Remove v0 dead code for Qwen2.5-omni ( #30883 )
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-12-17 19:54:39 -08:00
Nicolò Lucchesi
bc3700e0cd
[NIXL] Support P tensor-parallel-size > D tensor-parallel-size ( #27274 )
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-12-18 11:53:30 +08:00
Micah Williamson
fd8afdf38d
[ROCm][CI] Reduce Flakiness For test_async_scheduling Using ROCM_ATTN With FP32 ( #30811 )
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2025-12-18 10:27:37 +08:00
SungMinCho
a0b782f9cc
[Metrics] Model FLOPs Utilization estimation ( #30738 )
Signed-off-by: SungMinCho <tjdals4565@gmail.com >
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
Co-authored-by: Mark McLoughlin <markmc@redhat.com >
2025-12-18 01:40:51 +00:00
Rafael Vasquez
ed2897f336
[CI][Feature] Adds auto-rebase PR rule ( #30875 )
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com >
Co-authored-by: Kevin H. Luu <khluu000@gmail.com >
2025-12-18 00:46:44 +00:00
Isotr0py
74a1ac38b0
[v1] Add PrefixLM support to TritonAttention backend ( #30386 )
2025-12-17 16:05:24 -08:00
Nathan Price
05a83dc6ee
feat(api): Eager chat template warmup to eliminate first-request latency ( #30700 )
Signed-off-by: Nathan Price <nathan@abridge.com >
2025-12-18 00:01:29 +00:00
Varun Sundar Rabindranath
e3fc374a9a
[BugFix] Workspace allocation during profile run : DeepEPHighThroughput + DeepGEMM ( #30899 )
2025-12-17 15:00:59 -08:00
Andrey Talman
e06d0bf0aa
2.9.1 PyTorch release update ( #28495 )
2025-12-17 12:20:22 -08:00
Xunzhuo
e3a0f21e6c
[docs]: add ecosystem projects sr in docs/governance ( #30844 )
Signed-off-by: bitliu <bitliu@tencent.com >
2025-12-17 18:45:56 +00:00
Matthew Bonanni
7eb6cb6c18
[Attention] Update tests to remove deprecated env vars ( #30563 )
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-12-17 09:49:59 -08:00
Nicolò Lucchesi
9ca8cb38fd
[CI][Bugfix] Fix flaky tests/entrypoints/openai/test_audio.py::test_chat_streaming_audio ( #30878 )
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-12-17 18:49:56 +01:00
Cyrus Leung
2497228ad4
[Chore] Factor out logic for requesting initial memory ( #30868 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-12-17 07:32:17 -08:00
KimHyemin
196cdc3224
[Model] Gemma3: Support untied word embeddings ( #30827 )
Signed-off-by: www-spam <panmahm@naver.com >
2025-12-17 07:11:18 -08:00
高鑫崧
b7b6a60aca
Adapt the old parameter enable_thinking in chat_template_kwargs ( #30852 )
Signed-off-by: xinsong.gao <1418762819@qq.com >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
2025-12-17 07:10:59 -08:00
rongfu.leng
9e67c4ce98
[Docs] fix function name ( #30748 )
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
2025-12-17 12:14:45 +00:00
Jialin Ouyang
6e9dbcc50e
[Fix] uniform decode batch check ( #30747 )
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-12-17 19:58:43 +08:00
Hank_
6482e3895b
chores: adjust the attn register param order ( #30688 )
Signed-off-by: Hank <hcc.mayday@gmail.com >
2025-12-17 19:58:16 +08:00
Harry Mellor
fb980eb2fd
Fix lazy import ( #30858 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-12-17 03:33:50 -08:00
baoqian426
84896fda22
[Bugfix] deepseek-V3.2 self.weights_proj has no bias ( #30841 )
Signed-off-by: baoqian <1354987947@qq.com >
Signed-off-by: baoqian426 <1354987947@qq.com >
2025-12-17 03:32:34 -08:00
Kevin H. Luu
4bf6c23668
[ci] Sync test areas yaml file with test-pipeline ( #30862 )
...
Signed-off-by: Kevin H. Luu <khluu000@gmail.com >
2025-12-17 02:30:56 -08:00
Chauncey
9ad5b21710
[Refactor] [4/N] Move VLLM_SERVER_DEV endpoints into the serve directory ( #30749 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-12-17 02:27:30 -08:00
Wentao Ye
f284d7bd0c
[Bug] Fix AttributeError: 'ColumnParallelLinear' object has no attribute weight_scale_inv ( #30823 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-12-17 02:00:35 -08:00
Zhengxu Chen
53cd7f868b
[compile] Recompile graph module during Dynamo cache loading. ( #30743 )
...
Signed-off-by: Zhengxu Chen <zhxchen17@fb.com >
2025-12-17 02:00:12 -08:00
danielafrimi
7b966ae2ba
[Fix]Load kv-cache dtype from hf_quant_config.json automatically (fix for reverted PR) ( #30785 )
...
Signed-off-by: <>
Co-authored-by: root <root@gpu-937.slurm-workers-slurm.slurm.svc.cluster.local >
2025-12-17 01:56:38 -08:00
Zhengxu Chen
9db1db5949
[compile] Ignore VLLM_FORCE_AOT_LOAD from cache factors ( #30809 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com >
2025-12-17 01:56:24 -08:00
Zhengxu Chen
177c391db2
[compile] Disable aot when eager backend is used. ( #30810 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com >
2025-12-17 01:55:56 -08:00
Michael Goin
519ef9a911
[UX] Make vllm bench serve discover model by default and use --input-len ( #30816 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-12-17 01:55:30 -08:00
Ye (Charlotte) Qi
a100152288
[Kernels][FI] Skip trtllm attention when num_kv_heads=1 ( #30842 )
...
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com >
2025-12-17 01:54:21 -08:00
Andrew Xia
4c054d89aa
[Doc][ResponsesAPI] add documentation ( #30840 )
...
Signed-off-by: Andrew Xia <axia@fb.com >
Co-authored-by: Andrew Xia <axia@fb.com >
2025-12-17 01:53:02 -08:00
Sheng Lin
f4e884f222
[NIXL][Bugfix] Fix NIXL/RDMA registration failure over CuMemAllocator ( #29569 )
...
Signed-off-by: Somoku <linsh0@protonmail.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2025-12-17 01:52:58 -08:00
Xinyu Chen
3b1d440ede
CustomOp: grouped topk ( #29575 )
...
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com >
2025-12-17 17:43:00 +08:00
Asaf Joseph Gardin
a9e15c21ef
[Mamba] Removed disable cascade attn in MambaModelConfig ( #30712 )
...
Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com >
2025-12-17 08:48:53 +00:00
Robin
20fda43151
[Bugfix][Frontend] Prevent IndexError in MiniMax M2 tool parser during streaming extraction ( #30555 )
...
Signed-off-by: WangErXiao <863579016@qq.com >
2025-12-17 16:37:57 +08:00
Yan Ma
4f735babb7
[XPU] fix broken fp8 online quantization for XPU platform ( #30831 )
...
Signed-off-by: Yan Ma <yan.ma@intel.com >
2025-12-17 00:28:13 -08:00
Li, Jiang
0cd5353644
[Bugfix][CPU] Fix CPU backend ROPE dispatch for VL models ( #30829 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
Signed-off-by: Li, Jiang <bigpyj64@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-16 23:25:12 -08:00
Michael Goin
d4d2751732
Update note comment for flashinfer attention warmup ( #30711 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-12-16 21:29:03 -08:00
shanjiaz
009a773828
bump up compressed tensors version to 0.13.0 ( #30799 )
...
Signed-off-by: shanjiaz <zsjwpianpian@gmail.com >
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com >
2025-12-16 21:01:04 -08:00
Cyrus Leung
44d3b1df3d
[CI/Build] Fix compatibility between #30244 and #30396 ( #30787 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-12-16 20:21:19 -08:00
Fadi Arafeh
bb5ac1fe38
[CPU] Add action to automatically label CPU related PRs ( #30678 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
2025-12-17 04:21:07 +00:00
Michael Goin
811cdf5197
Update model-hosting-container-standards to 0.1.10 ( #30815 )
...
Signed-off-by: Michael Goin <mgoin64@gmail.com >
2025-12-16 17:52:14 -08:00
Grzegorz K. Karch
f5db6385a1
Fix nemotron_nas intermediate_size computation ( #30795 )
...
Signed-off-by: Grzegorz Karch <gkarch@nvidia.com >
2025-12-17 01:06:28 +00:00
Amr Mahdi
c0a88df7f7
[docker] Allow kv_connectors install to fail on arm64 ( #30806 )
...
Signed-off-by: Amr Mahdi <amrmahdi@meta.com >
2025-12-16 16:41:57 -08:00
Nicolò Lucchesi
e087fbc393
[MM] Pass FA version in ViT Attn ( #30756 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-12-17 07:54:45 +08:00
Michael Goin
e80455ca8b
Replace deprecated enable_fusion with fuse_norm_quant in test_rms_group_quant ( #30817 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-12-16 23:40:47 +00:00
TJian
2410132bb1
[ROCm] [Bugfix] Fix torch sdpa hallucination ( #30789 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-12-16 15:32:43 -08:00
Michael Goin
0a1ab1e565
[Perf][Kernels] Vectorize csrc/activations_kernels.cu ( #29512 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-12-16 14:56:02 -08:00
Wentao Ye
b6ec077e05
[CI] Skip ci failure test ( #30804 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-12-16 22:47:53 +00:00
Jinzhen Lin
ce96857fdd
[Kernel][Quantization][MoE] add marlin kernel support for turing (sm75) ( #29901 )
...
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-12-16 14:35:28 -08:00
Daniel Cámpora
eaa82a709a
[Bugfix][DSV32] Fix overflow in topk. ( #30754 )
...
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
2025-12-16 14:21:17 -08:00
Roger Wang
f5f51e5931
[Core][MM] Optimize encoder cache manager by operating with embeddings only ( #30475 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Sun Kim <sunytokki@gmail.com >
2025-12-16 14:18:17 -08:00
Lucas Wilkinson
9fec0e13d5
[Attention] Cache attention metadata builds across hybrid KV-cache groups ( #29627 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Co-authored-by: Stanislaw Wozniak <stw@zurich.ibm.com >
2025-12-16 17:10:16 -05:00
jiahanc
254a7f8fd6
[Perf] Do FP4 quant before All gather on flashinfer trtllmgen MOE ( #30014 )
...
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com >
2025-12-16 13:01:48 -08:00
Wentao Ye
f21f5ea38c
[Refactor] Small refactor for group topk ( #30562 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2025-12-16 14:50:59 -05:00
Nicolò Lucchesi
ca702a14dc
[Frontend] Add max-completion-token option to transcription/translation endpoints ( #30769 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-12-16 19:36:49 +00:00
Michael Goin
10ee1c64cf
[CI] Generalize gsm8k test args and add Qwen3-Next MTP B200 test ( #30723 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-12-16 14:28:34 -05:00
Mark McLoughlin
66c3537e5d
[Docs][API] Remove warning about LoRARequest being internal-only ( #30774 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-12-16 08:35:46 -08:00
Harry Mellor
e1625498f4
Update where bytes_to_unicode is imported from ( #30771 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-12-16 08:05:01 -08:00
Harry Mellor
0b0acc758e
Remove head_mask from Ultravox and Swin ( #30764 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-12-16 08:02:41 -08:00
Harry Mellor
af506fd76a
Fix instantiation of HfHubHTTPError in LoRA test ( #30768 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-12-16 08:02:24 -08:00
Ming Yang
ce12b407f2
[TRTLLM] Remove the MoE GEMM weight name change ( #30713 )
...
Signed-off-by: Ming Yang <minos.future@gmail.com >
2025-12-16 11:01:38 -05:00
Wentao Ye
59bd5f6a71
[Feat] Enable eplb with default all2all backend ( #30559 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-12-16 10:33:52 -05:00
Lucas Wilkinson
00a8d7628c
[BugFix] Fix memory spike in workspace allocation ( #30744 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-12-16 06:46:22 -08:00
Isotr0py
4de08ad698
[CI/Build] Skip broken ViT backend functionality test temporarily ( #30782 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-12-16 06:45:25 -08:00
Nicolò Lucchesi
75eb302a2e
[Bugfix] Whisper fix number of allocated CrossAttn blocks per-request ( #30772 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-12-16 14:20:19 +00:00
Pleaplusone
9dbbc59b15
[ROCm][MTP] Support MTP for AITER MLA backend ( #28624 )
...
Signed-off-by: ganyi <ygan@amd.com >
2025-12-16 14:10:26 +00:00
Boyuan Feng
104003dc77
update piecewise cudagraph warning when splitting_ops=[] ( #30728 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2025-12-16 06:09:34 -08:00
TJian
d0fb572929
[ROCm] [AITER] [DOC] Add usage description about check functions in _aiter_ops ( #30586 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-12-16 13:50:47 +00:00
Harry Mellor
6f15ac5de7
Don't assume position_embedding_type will be present for BERT and RoBERTa models ( #30770 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-12-16 13:40:26 +00:00
Junru Shen
676db55eec
[Bugfix] Fix prefix_repetition routing in bench throughput ( #29663 )
...
Signed-off-by: Junru Shen <jrshen.sjr@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-16 01:37:15 -08:00
Jee Jee Li
0e391e7570
[Bugfix] Fix RequestOutput miss lora_request ( #30636 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-12-16 01:36:35 -08:00
Andrew Xia
0d0c929f23
[responsesAPI][8] input/output messages for ResponsesParser ( #30158 )
...
Signed-off-by: Andrew Xia <axia@fb.com >
Signed-off-by: Andrew Xia <axia@meta.com >
Co-authored-by: Andrew Xia <axia@fb.com >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
2025-12-16 13:54:59 +08:00
Isotr0py
e94384bbad
[Bugfix] Fix broken ViT attention selection for Blackwell device ( #30731 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-12-16 05:24:32 +00:00
jiangkuaixue123
b9ff4f2a8d
[feature] extend DBO to XBO ( #30120 )
...
Signed-off-by: jiangkuaixue123 <jiangxiaozhou111@163.com >
Co-authored-by: root <root@hk01dgx028.cm.cluster >
2025-12-16 00:04:01 -05:00
Boyuan Feng
c881db364e
improve lazy import test ( #30733 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2025-12-16 03:12:05 +00:00
Shanshan Shen
3bd9c49158
[CustomOp] Extract ApplyRotaryEmb as CustomOp and unify the dispatch logic ( #29873 )
...
Signed-off-by: shen-shanshan <467638484@qq.com >
Co-authored-by: gcanlin <canlinguosdu@gmail.com >
Co-authored-by: TJian <tunjian.tan@embeddedllm.com >
2025-12-15 19:08:16 -08:00
Amr Mahdi
ff21a0fc85
[docker] Restructure Dockerfile for more efficient and cache-friendly builds ( #30626 )
...
Signed-off-by: Amr Mahdi <amrmahdi@meta.com >
2025-12-15 18:52:19 -08:00
penfree
bbd850e597
[Bugfix] fix streaming final output for non harmony ( #30237 )
...
Signed-off-by: penfree <qiupengfei@baidu.com >
Co-authored-by: penfree <qiupengfei@baidu.com >
2025-12-16 09:03:11 +08:00
Shengqi Chen
511e81e7c9
[BUILD] use sm_100f when compiling flashmla to fix support on sm103 ( #30705 )
...
Signed-off-by: Shengqi Chen <harry-chen@outlook.com >
2025-12-15 14:48:01 -08:00
Matthew Bonanni
a182be4308
[UX][Attention] Add attention_config argument to LLM() ( #30710 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-12-15 17:29:09 -05:00
Kevin Musgrave
c01d589813
[Benchmarks] auto_tune.sh: Use hostname variable for server requests ( #30529 )
...
Signed-off-by: Kevin Musgrave <kevin.musgrave@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-15 22:00:29 +00:00
Matthew Bonanni
60dbf7d8f1
Update batch invariant to use attention config ( #30704 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-15 15:24:16 -05:00
Michael Goin
a450c64a30
[Bugfix] Fail instead of ignoring when CompilationConfig gets invalid args ( #30708 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-12-15 20:18:02 +00:00
Fadi Arafeh
b2191abdca
[docs][fix] Update Arm CPU vLLM wheel installation docs ( #30594 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
2025-12-15 19:46:25 +00:00
Matthew Bonanni
51e5b3e3c4
[Bugfix] Fix ViT with FlashAttention on ROCm ( #30703 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-12-15 19:45:21 +00:00
Isotr0py
ec154c36ee
[Platform] Refactor Platform attention backend selection to avoid breakpoint for OOT platform ( #30212 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: Isotr0py <2037008807@qq.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-15 17:36:07 +00:00
Harry Mellor
970713d4a4
Remove SkipValidation from ModelConfig ( #30695 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-12-15 17:34:08 +00:00
mondaylord
17fec3af09
[Bugfix] Fix missing first token in tool calls during reasoning-to-tool transition ( #30671 )
...
Signed-off-by: mondaylord <20212010046@fudan.edu.cn >
2025-12-15 16:13:37 +00:00
yjc9696
855b101d75
[Frontend] add tools for dsv32 developer role ( #30040 )
...
Signed-off-by: pridejcyang <pridejcyang@tencent.com >
Co-authored-by: pridejcyang <pridejcyang@tencent.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-12-15 15:08:47 +00:00
Robert Shaw
d0502b4928
[MoE][Refactor 1/N] Separate Online Quantization ( #30627 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2025-12-15 06:54:53 -08:00
Max Hu
3f175f18a2
[Bugfix] Fix multimodal configuration for Qwen3VL MOE model ( #30670 )
...
Signed-off-by: Max Hu <hyoung2991@gmail.com >
2025-12-15 14:06:01 +00:00
Cyrus Leung
ed586e7724
[Refactor] [3/N] Move tool parser tests and run on CPU ( #30693 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-12-15 13:45:36 +00:00
Chauncey
2a1776b7ac
[Refactor] [2/N] Move tool parsers into the vLLM main directory ( #30675 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-12-15 12:54:52 +00:00
Nicolò Lucchesi
185c22bf2f
[Misc][Hybrid allocator + kv connector] Optionally enable hybrid allocator + KV cache connector ( #29805 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-12-15 11:17:58 +00:00
duke
e4806d973a
[BugFix] Add embed_input_ids method to make QWenLMHeadModel a vllm model ( #30674 )
...
Signed-off-by: root <iwzbi@zju.edu.cn >
Co-authored-by: root <iwzbi@zju.edu.cn >
2025-12-15 10:38:29 +00:00
wang.yuqi
4429d934de
[Model] Automatic conversion of TokenClassification model ( #30666 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2025-12-15 08:13:00 +00:00
ゆり
33278073d6
typing: Add type hints to TurnMetrics class in context.py ( #30552 )
...
Co-authored-by: zkexorability <zkexorability@gmail.com >
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com >
2025-12-14 23:00:39 -08:00
汪志鹏
1adeb3b84c
[New Model] BAGEL support (AR only) ( #28439 )
...
Signed-off-by: princepride <wangzhipeng628@gmail.com >
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-12-15 14:58:23 +08:00
Kunshang Ji
e3a1cd1c59
[XPU] fix Dockerfile.xpu, avoid wheel conflicts ( #30662 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2025-12-15 13:32:06 +08:00
Wentao Ye
3778673ea8
[Feat] Refactor for parallel_config in FusedMoEModularKernel ( #30282 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2025-12-15 04:21:36 +00:00
Seokhyun An
b337647aa0
[Bugfix] Drop empty tool_calls lists to keep assistant replies in chat template ( #30648 )
...
Signed-off-by: Seokhyun An <iamseokhyun@gmail.com >
2025-12-15 04:21:12 +00:00
Jee Jee Li
a524d1ba0a
[Bugfix] Fix deepseek_v32 tokenizer_mode ( #30658 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-12-15 04:20:31 +00:00
Shanshan Shen
87b4d1557d
[CustomOp][MM] Extract MMEncoderAttention as CustomOp and replace the backend of QwenVisionAttention with it. ( #30125 )
...
Signed-off-by: shen-shanshan <467638484@qq.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-12-15 11:13:32 +08:00
Wenqi Glantz
84e23d103d
additional protection for CVE-2025-62164 ( #30649 )
...
Signed-off-by: Wenqi Glantz <wglantz@nvidia.com >
2025-12-15 03:07:10 +00:00
Shanshan Shen
738648fb81
[CustomOp] Support object-level enable for CustomOp ( #30547 )
...
Signed-off-by: shen-shanshan <467638484@qq.com >
2025-12-15 11:02:09 +08:00
Boyuan Feng
917fdae5b2
[Log] Skip piecewise cudagraph warn when using full cudagraph ( #30657 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2025-12-15 02:49:45 +00:00
Robert Shaw
e2ed238885
Revert "[Fix]Load kv-cache dtype from hf_quant_config.json automatically" ( #30653 )
2025-12-14 19:33:41 -05:00
Or Ozeri
174e39ead7
CPU KV Offloading: Use more CUDA streams ( #29013 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com >
2025-12-14 23:50:45 +00:00
RioS
9ccbf6b692
[responsesAPI]add extra body parameters ( #30532 )
...
Signed-off-by: Ri0S <aa248424@gmail.com >
2025-12-14 19:25:45 +00:00
Chendi.Xue
ae2e503dda
[NIXL][BUG FIX] Fix a bug for PD with host_buffer after merging 29665 ( #30420 )
...
Signed-off-by: Chendi Xue <chendi.xue@intel.com >
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
Co-authored-by: Mark McLoughlin <markmc@redhat.com >
2025-12-14 15:38:28 +00:00
Tsukasa OI
9e33a1a75b
[Model][Quantization] Override HF defaults to GGUF ones (incl. Qwen3 MoE) ( #30118 )
...
Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com >
2025-12-14 15:01:42 +00:00
Vensen
add4b0ca44
[Bugfix][benchmarks] Fix input token calculation for rerank benchmark metrics ( #30596 )
...
Signed-off-by: vensen <vensenmu@gmail.com >
2025-12-14 14:57:15 +00:00
ZiTian Zhao
ae88aada38
[Feature]Add EVS (Efficient Video Sampling) Support for Qwen3-VL ( #29752 )
...
Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com >
Co-authored-by: deitxfge <huhaibo1990@126.com >
2025-12-14 05:24:56 -08:00
yifant-code
5ccf0efa84
[Bugfix] Improve error messages in ModelConfig validation ( #30213 )
...
Signed-off-by: ytian218 <ytian218@bloomberg.net >
Co-authored-by: ytian218 <ytian218@bloomberg.net >
2025-12-14 21:23:37 +08:00
ElizaWszola
994acec0cc
[Bugfix] Fix fusion for VL models ( #30244 )
...
Signed-off-by: ElizaWszola <ewszola@redhat.com >
2025-12-14 21:22:37 +08:00
zifeitong
48b8456ff9
[Bugfix] Revert Qwen2-VL part of change in #28271 ( #30542 )
...
Signed-off-by: Zifei Tong <zifeitong@gmail.com >
2025-12-14 05:20:08 -08:00
Drew Botwinick
5b64ac21f9
[Bugfix] Update get_processor_data to use get_all method ( #30583 )
...
Signed-off-by: Drew Botwinick <6953152+dbotwinick@users.noreply.github.com >
2025-12-14 21:19:20 +08:00
Bin Bao
a8ec486592
[Misc] Add a script to benchmark compilation time ( #29919 )
...
Signed-off-by: Bin Bao <binbao@meta.com >
2025-12-14 13:02:39 +00:00
tjp_zju
6ecc1e411b
[Bugfix] fix _get_quant_method of FusedMoE for deepseekV3.2 on non-NV… ( #30057 )
...
Signed-off-by: tjp_zju <tanjianpingzju1990@gmail.com >
2025-12-14 02:20:51 -08:00
Shengliang Xu
0bb0bae436
Nvidia ModelOpt workaround for issue 28072 ( #30164 )
...
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com >
Co-authored-by: Pavani Majety <pmajety@nvidia.com >
2025-12-14 18:18:31 +08:00
Johannes F
060893654d
fix: Update json features supported by xGrammar ( #30390 )
...
Signed-off-by: Johannes Flommersfeld <johannes.flommersfeld@tngtech.com >
Signed-off-by: Johannes F <johannesflommersfeld@users.noreply.github.com >
Co-authored-by: Johannes Flommersfeld <johannes.flommersfeld@tngtech.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-14 02:16:06 -08:00
Matthias Gehre
e9add129ad
[Bugfix] awq_gemm: fix argument order swap ( #30364 )
...
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-12-14 18:15:37 +08:00
Ilya Markov
3224ea9915
[torch.compile] Add encoder tag for compilation ( #30489 )
...
Signed-off-by: ilmarkov <markovilya197@gmail.com >
2025-12-14 18:15:11 +08:00
Lasha Koroshinadze
3a20450d31
Add AudioFlamingo3 model support ( #30539 )
...
Signed-off-by: Lasha <26011196+lashahub@users.noreply.github.com >
Signed-off-by: Lasha Koroshinadze <26011196+lashahub@users.noreply.github.com >
Co-authored-by: Isotr0py <2037008807@qq.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-12-14 02:14:55 -08:00
Didier Durand
1a55cfafcb
[Doc]: fixing typos in various files ( #30540 )
...
Signed-off-by: Didier Durand <durand.didier@gmail.com >
Signed-off-by: Didier Durand <2927957+didier-durand@users.noreply.github.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-12-14 02:14:37 -08:00
drslark
add1b9d3de
[main][BugFix] Fixed an accuracy bug of Qwen3-next-MTP when batched inferring ( #30632 )
...
Signed-off-by: drslark <slarksblood@qq.com >
2025-12-14 01:32:16 -08:00
Cyrus Leung
dcb31196da
[Chore] Remove redundant RequestPrompt ( #30612 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-12-14 09:22:37 +00:00
Laith Sakka
f569c654e1
enable unbacked with aot_compile ( #30462 )
...
Signed-off-by: Laith Sakka <lsakka@meta.com >
2025-12-14 08:14:06 +00:00
Micah Williamson
97f2f160fd
[ROCm][CI] Add "Qwen3-Next-80B-A3B-Instruct MTP Async EPLB Accuracy Test" Back Into AMD CI ( #30590 )
...
Signed-off-by: David Chen <530634352@qq.com >
Signed-off-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com >
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
Co-authored-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-12-14 06:56:26 +00:00
Kayvan Mivehnejad
29f7d97715
Improve parse_raw_prompt test cases for invalid input .v2 ( #30512 )
...
Signed-off-by: Kayvan Mivehnejad <K.Mivehnejad@gmail.com >
2025-12-14 11:18:41 +08:00
Qier Li
dc7fb5bebe
[Bug][KVConnector][Metrics] Remove a vacuous assertion breaking external-launcher ( #30577 )
...
Co-authored-by: Qier Li <qier@fb.com >
2025-12-14 01:23:08 +00:00
Qidong Su
24429d5924
[Doc] Add instructions for building docker image on GB300 with CUDA13 ( #30414 )
...
Signed-off-by: Qidong Su <soodoshll@gmail.com >
2025-12-13 21:56:53 +00:00
Wentao Ye
6e78ed6ba7
[Logs] Optimize startup logs 4 ( #29903 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-13 16:12:53 -05:00
Isotr0py
7c16f3fbcc
[Doc] Add documents for multi-node distributed serving with MP backend ( #30509 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-12-13 18:02:29 +00:00
lif
ddbfbe5278
[Docs] Clarify Expert Parallel behavior for attention and MoE layers ( #30615 )
...
Signed-off-by: majiayu000 <1835304752@qq.com >
2025-12-13 08:37:59 -09:00
Laith Sakka
763963aa73
set assume_32bit_indexing and pass unbacked hints ( #30459 )
...
Signed-off-by: Laith Sakka <lsakka@meta.com >
2025-12-13 15:36:53 +00:00
Cyrus Leung
39cefbdf17
[Refactor] TokenizerRegistry only uses lazy imports ( #30609 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-12-13 23:16:22 +08:00
Chen Zhang
ace34e3783
[Bugfix] Qwen3-next with --hf-overrides \{\"num_hidden_layers\":8\} ( #30433 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-12-13 22:12:45 +08:00
Isotr0py
e5db3e2774
[CI/Build] Fix broken mm processor test Mistral-3-large ( #30597 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-12-13 04:43:01 -08:00
Cyrus Leung
64251f48df
[Chore] Adjust tokenizer import to avoid circular imports ( #30601 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-12-13 04:42:39 -08:00
Nick Hill
1cec5b7ea9
[Scheduler] Simplify stop checking for pooling models ( #30591 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-12-13 09:45:26 +00:00
Cyrus Leung
b09806e28f
[Bugfix] Dictionary MM embeddings for online chat ( #30507 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-12-13 15:48:56 +08:00
Tsukasa OI
fdc135d768
[Misc][Quantization] Clarify the intent of GGUF FusedMoE weight materialization ( #30310 )
...
Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com >
2025-12-13 13:55:14 +08:00
Roberto L. Castro
4fa7ce46f3
[Feature] Add SM103 (Blackwell Ultra) Support to vLLM ( #30484 )
...
Signed-off-by: LopezCastroRoberto <robertol.c510@gmail.com >
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com >
Co-authored-by: youkaichao <youkaichao@gmail.com >
2025-12-12 19:34:23 -08:00
Nicolò Lucchesi
57e9bf1864
[CI] Whisper logprobs tests ( #30504 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-12-13 10:49:11 +08:00
Michael Goin
2f32a68d75
[CI] Update several models in registry that are available online now ( #30514 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Isotr0py <2037008807@qq.com >
2025-12-12 18:28:13 -08:00
Matthew Bonanni
f5dfbbd8e9
[Docs] Remove references to VLLM_ATTENTION_BACKEND ( #30564 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-12-13 10:20:15 +08:00
Michael Goin
fc0119425c
Add IBM and Red Hat to compute resources sponsors ( #30581 )
...
Signed-off-by: Michael Goin <mgoin64@gmail.com >
2025-12-13 01:34:23 +00:00
Matthew Bonanni
86a3261525
[Bugfix] Pass FA version in MultiHeadAttention ( #30575 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-12-13 00:02:11 +00:00
rasmith
08f8a5627e
[CI/Build][Kernel][BugFix][AMD] Fix per_token_group_quant_fp8 to use correct fp8 min/max values and update atol/rtol in test_quantfp8_group_functionality ( #30292 )
...
Signed-off-by: Randall Smith <ransmith@amd.com >
Co-authored-by: Randall Smith <ransmith@amd.com >
2025-12-12 18:41:56 -05:00
Kevin H. Luu
b4039c08b5
[ci] Mark PrimeRL integration test as soft fail ( #30578 )
...
Signed-off-by: Kevin H. Luu <khluu000@gmail.com >
2025-12-12 14:13:09 -08:00
Wentao Ye
1e6b115300
[Refactor] Reduce duplicate code in per_token_group_quant cuda kernels ( #30496 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-12-12 16:45:23 -05:00
danielafrimi
13618626df
[MoE-FP8-modelopt] Add FlashInfer alignment padding for intermediate dimensions ( #29748 )
...
Signed-off-by: Daniel Afrimi <dafrimi@pool0-00589.cm.cluster >
Signed-off-by: dafrimi <dafrimi@nvidia.com >
Co-authored-by: Daniel Afrimi <dafrimi@pool0-00589.cm.cluster >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
2025-12-12 20:42:32 +00:00
danielafrimi
6ec0d8dbe4
[Fix]Load kv-cache dtype from hf_quant_config.json automatically ( #29980 )
...
Signed-off-by: Daniel Afrimi <dafrimi@nvidia.com >
2025-12-12 11:27:47 -08:00
Li, Jiang
9693dd0fe3
[CI/Build] Add x86 CPU wheel release pipeline ( #28848 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2025-12-12 19:21:35 +00:00
Xin Yang
1f19d8f899
[Perf] Set split_k to 1 for triton_kernels ( #30528 )
...
Signed-off-by: Xin Yang <xyangx@amazon.com >
2025-12-12 14:07:57 -05:00
shivampr
cd7740ac5c
[ROCm] Enable Triton ScaledMM fallback + kernel selection fix ( #26668 )
...
Signed-off-by: Shivam <shivampr.dev@gmail.com >
Signed-off-by: Shivam <shivamprasad91@gmail.com >
2025-12-12 13:28:20 -05:00
Wentao Ye
02a5880394
[CI] Fix mypy for vllm/v1/executor ( #30517 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-12-12 18:05:34 +00:00
realliujiaxu
d2c919dcc2
[bugfix] fix bug when top_logprobs=0 with spec decoding ( #30059 )
...
Signed-off-by: realliujiaxu <realliujiaxu@163.com >
2025-12-12 09:03:35 -08:00
Benjamin Bartels
f3237f3f6b
[Frontend] Fixes anthropic streaming message_start usage nesting ( #30266 )
...
Signed-off-by: bbartels <benjamin@bartels.dev >
2025-12-12 16:28:54 +00:00
jvlunteren
9c0ee995a8
[Kernel] Support CUDA Graphs in 3D Triton Attention Kernel ( #28306 )
...
Signed-off-by: Jan van Lunteren <jvl@zurich.ibm.com >
Signed-off-by: jvlunteren <161835099+jvlunteren@users.noreply.github.com >
Co-authored-by: Thomas Parnell <tom.parnell@gmail.com >
Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com >
2025-12-12 16:55:40 +01:00
Michael Goin
09ad3b76b3
[Bug] Fix attention_backend arg string parsing ( #30534 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-12-12 08:40:50 -07:00
Christina Norman
dc13c99eed
fix(gguf): Disable bfloat16 for GGUF on blackwell device ( #30408 )
...
Signed-off-by: Christina <truffle@gmail.com >
Signed-off-by: Isotr0py <2037008807@qq.com >
Signed-off-by: Christina Norman <christina@example.com >
Co-authored-by: Isotr0py <isotr0py@users.noreply.github.com >
Co-authored-by: Isotr0py <2037008807@qq.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-12 23:10:12 +08:00
Vladislav Nosivskoy
3e34adcdfb
[DeepSeek V3.2] Proper drop_thinking logic ( #30490 )
...
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com >
2025-12-12 15:01:06 +00:00
Lucas Wilkinson
3e41992fec
[Attention] Use sparse prefill kernel for fp8 kv-cache in DeepSeek-v3.2 ( #27532 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-12-12 05:57:47 -08:00
吴坎
91401c7a26
[Bugfix] Fix CMakeLists Environment Variable ( #21804 )
...
Signed-off-by: wu-kan <github@wu-kan.com >
Signed-off-by: 吴坎 <github@wu-kan.cn >
Signed-off-by: wu-kan <github@wu-kan.cn >
2025-12-12 10:54:52 +00:00
Jaehwang Jung
f90319d5d1
[Bugfix] Schedule failure due to wrong get_image_size_with_most_features ( #29692 )
2025-12-12 02:27:20 -08:00
rasmith
302b2c1eb9
[CI/Build][AMD] Fix ref_dynamic_per_token_quant reference implementation on ROCm. ( #30291 )
...
Signed-off-by: Randall Smith <ransmith@amd.com >
Co-authored-by: Randall Smith <ransmith@amd.com >
2025-12-12 09:30:23 +00:00
Ben Browning
8f8fda261a
[Bugfix] Multiple fixes for gpt-oss Chat Completion prompting ( #28729 )
...
Signed-off-by: Ben Browning <bbrownin@redhat.com >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
2025-12-12 12:59:53 +08:00
Zhengxu Chen
fe1787107e
[compile] Parse compile range cache keys as Range during cache loading. ( #30516 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com >
2025-12-12 04:30:51 +00:00
Andreas Karatzas
783644e4ac
[ROCm][CI] Skip multi-GPU speculative decoding tests when insufficient GPUs available ( #30527 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2025-12-12 03:54:56 +00:00
Ryan Rock
197473c4e7
[CI/Build] Use spawn subprocess for ROCm ( #30272 )
...
Signed-off-by: Ryan Rock <ryan.rock@amd.com >
2025-12-12 03:33:17 +00:00
Nick Hill
947dfda9c2
[LMCache] Relax lmcache version requirement ( #30425 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-12-11 18:18:47 -09:00
Michael Goin
9f2fc16a69
[Bugfix][Model] Fix Afmoe rope_parameters issue ( #30505 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-12-12 02:53:57 +00:00
Bhanu Prakash Voutharoja
6a6fc41c79
gptq marlin quantization support for fused moe with lora ( #30254 )
...
Signed-off-by: Bhanu068 <voutharoja.bhanu06@gmail.com >
2025-12-12 02:27:22 +00:00
Fadi Arafeh
f355ad5412
[CPU][FIX] Fix build failures on Arm CPUs with torch nightly ( #30481 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
2025-12-12 02:09:25 +00:00
Lucas Wilkinson
042da73244
[Core] Refactor _build_attention_metadata ( #29628 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-12-11 17:54:12 -08:00
Andreas Karatzas
b5945d49c0
[ROCm][CI] Use mi325_4 agent pool for V1 e2e tests ( #30526 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2025-12-12 01:37:24 +00:00
rasmith
ba80926681
[CI/Build][AMD] Skip test_cutlass_w4a8_moe tests on ROCm since they require cutlass_pack_scale_fp8 ( #30508 )
...
Signed-off-by: Randall Smith <ransmith@amd.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: Randall Smith <ransmith@amd.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-12 01:02:19 +00:00
jiahanc
0ab23c2b2b
[fix] fix SM check for Flashinfer TRTLLM MOE ( #30314 )
...
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com >
2025-12-12 01:00:58 +00:00
rasmith
48661d275f
[CI/Build][AMD] Skip tests in test_fusions_e2e and test_dbo_dp_ep_gsm8k that require non-existing imports for ROCm ( #30417 )
...
Signed-off-by: Randall Smith <ransmith@amd.com >
Co-authored-by: Randall Smith <ransmith@amd.com >
2025-12-12 00:24:20 +00:00
Ev Lacey
d527cf0b3d
[FIX]Patch run-cluster.sh (fix for #28328 ) ( #30002 )
...
Signed-off-by: elacey <elacey@nvidia.com >
Signed-off-by: Ev Lacey <github@everettlacey.com >
2025-12-11 23:36:31 +00:00
Concurrensee
2cc5affc38
[ROCM][CI] Fix AMD Examples Test Group ( #30276 )
...
Signed-off-by: Yida Wu <yida.wu@amd.com >
Signed-off-by: Yida <yida.wu@amd.com >
2025-12-11 18:03:54 -05:00
Andrew Briand
a00d88973d
[EPLB] Support EPLB w/ NVFP4 ( #29804 )
...
Signed-off-by: Andrew Briand <abriand@nvidia.com >
Co-authored-by: Andrew Briand <abriand@nvidia.com >
2025-12-11 22:59:40 +00:00
Wentao Ye
61249b177d
[Refactor] Remove useless syncwarp ( #30510 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-12-11 17:43:41 -05:00
Wentao Ye
c817b14151
[Perf] Optimize deepgemm experts initialization, 3.9% TTFT improvement ( #30494 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Co-authored-by: li-jinpeng <3332126450@qq.com >
Co-authored-by: youkaichao <youkaichao@gmail.com >
2025-12-11 17:28:34 -05:00
ioana ghiban
3efdc3feae
[Docs][CPU backend] Add pre-built Arm CPU Docker images ( #30491 )
...
Signed-off-by: Ioana Ghiban <ioana.ghiban@arm.com >
2025-12-11 22:03:29 +00:00
Nicolò Lucchesi
0efd9f867c
[Core] Whisper Enable Encoder Batching ( #29421 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-12-11 21:06:51 +00:00
Xingyu Liu
90d6cf921f
[BugFix][MM]support VLLM_RANDOMIZE_DP_DUMMY_INPUTS ( #30472 )
...
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-12-11 21:00:15 +00:00
Harry Mellor
cf3eacfe58
Standardise get_rope to use rope_parameters["partial_rotary_factor"], not rotary_dim ( #30389 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-12-11 20:45:23 +00:00
Zhengxu Chen
92fea56fd1
[compile] Stop one-off setting enable_aot_compile and use context manager instead. ( #30503 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com >
2025-12-11 20:28:03 +00:00
Ye (Charlotte) Qi
e458270a95
[Misc] Add mcp to requirements ( #30474 )
...
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com >
2025-12-11 20:06:09 +00:00
Andreas Karatzas
72aaac5b66
[ROCm][Bugfix] Add MLACommonMetadata to allowed attention types for speculative decoding ( #30430 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2025-12-11 19:25:01 +00:00
汪志鹏
0e71eaa644
[Feature] AWQ marlin quantization support for fused moe with lora ( #30442 )
...
Signed-off-by: princepride <wangzhipeng628@gmail.com >
2025-12-11 18:03:32 +00:00
Harry Mellor
8781cd6b88
Add Eagle and Eagle3 support to Transformers modeling backend ( #30340 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-12-11 17:02:10 +00:00
Julien Denize
aa3c250c48
[IMPROVEMENT] Change MistralReasoningParser behavior ( #30391 )
...
Signed-off-by: juliendenize <julien.denize@mistral.ai >
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com >
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com >
2025-12-11 17:53:26 +01:00
Shengqi Chen
305b168a9f
[CI] refine more logic when generating and using nightly wheels & indices, add cuda130 build for aarch64, specify correct manylinux version ( #30341 )
...
Signed-off-by: Shengqi Chen <harry-chen@outlook.com >
2025-12-12 00:42:30 +08:00
Harry Mellor
93db3256a4
Give pooling examples better names ( #30488 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-12-11 16:22:58 +00:00
ioana ghiban
17cb540248
[Docs][CPU Backend] Add nightly and per revision pre-built Arm CPU wheels ( #30402 )
...
Signed-off-by: Ioana Ghiban <ioana.ghiban@arm.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-12-11 15:57:10 +00:00
Harry Mellor
97a042f3bc
Make the httpx logger less annoying when Transformers v5 is installed ( #30480 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-12-11 15:44:56 +00:00
Cyrus Leung
3a3b06ee70
[Misc] Improve error message for is_multimodal ( #30483 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-12-11 06:39:51 -08:00
Martin Hickey
f4417f8449
[KVConnector] Add KV events to KV Connectors ( #28309 )
...
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com >
2025-12-11 15:30:29 +01:00
Qiu
a11f4a81e0
[Misc][PCP&DCP] relocate PCP feature check ( #30050 )
...
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-12-11 03:36:18 -08:00
Kenichi Maehashi
853611bb18
Fix typo of endpoint name in CLI args docs ( #30473 )
...
Signed-off-by: Kenichi Maehashi <maehashi@preferred.jp >
2025-12-11 11:07:56 +00:00
Cyrus Leung
d917747c95
[Bugfix] Fix task still being passed in tests/benchmarks ( #30476 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-12-11 10:33:55 +00:00
wang.yuqi
a5f9fb5960
[Deprecation] Deprecate --convert reward; use --convert embed instead. ( #30463 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2025-12-11 10:18:25 +00:00
jeremyteboul
4515eb1a0b
[Fix] Update lazy loading of video loader backend ( #30444 )
...
Signed-off-by: Jeremy Teboul <jeremyteboul@fb.com >
Co-authored-by: Jeremy Teboul <jeremyteboul@fb.com >
2025-12-11 10:14:57 +00:00
Cyrus Leung
13d63b65e0
[Deprecation] Remove missed fallback for embed_input_ids ( #30469 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-12-11 10:06:36 +00:00
wz1qqx
b4e8b91278
[Fix]fix import error from lmcache ( #30376 )
...
Signed-off-by: wz1qqx <ziqi.wang@novita.ai >
Co-authored-by: wz1qqx <ziqi.wang@novita.ai >
2025-12-11 09:23:52 +00:00
Rei.
6299628d32
[bugfix] fix MiniMaxM2ReasoningParser streaming output not separating reasoning_content. ( #29882 )
...
Signed-off-by: Rei <1477174254@qq.com >
2025-12-11 09:05:08 +00:00
Ming Yang
fba8906930
[perf] Use direct copy (broadcast) instead of cat for k_nope/k_pe in MLA prefill ( #29710 )
...
Signed-off-by: Ming Yang <minos.future@gmail.com >
2025-12-11 08:20:45 +00:00
Ning Xie
d02d1043de
fix: enhance human_readable_int function ( #30337 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2025-12-10 23:30:33 -08:00
Cyrus Leung
979f50efd0
[Deprecation] Remove fallbacks for embed_input_ids and embed_multimodal ( #30458 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-12-11 06:58:23 +00:00
gh-wf
36c9ce2554
Ensure minimum frames for GLM 4.6V compatibility ( #30285 )
...
Signed-off-by: Wayne Ferguson <wayneferguson@gmail.com >
2025-12-11 05:26:49 +00:00
xyDong0223
1a516557e1
[Doc] Add Baidu Kunlun XPU support ( #30455 )
...
Signed-off-by: xyDong0223 <dongxinyu23@gmail.com >
2025-12-11 04:52:17 +00:00
Wentao Ye
d6464f2679
[Chore] Fix torch precision warning ( #30428 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-12-11 04:05:56 +00:00
Cyrus Leung
7e24e5d4d6
[Deprecation] Remove deprecated task, seed and MM settings ( #30397 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-12-10 19:59:39 -08:00
Cyrus Leung
5a87d8b9b1
[Deprecation] Remove deprecated plugin and compilation fields for v0.13 release ( #30396 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-12-10 19:59:35 -08:00
Divakar Verma
d1e1fb4363
[Bugfix] Fix grouped_topk pytorch impl when num_experts can't be grouped properly ( #29439 )
...
Signed-off-by: Divakar Verma <divakar.verma@amd.com >
Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com >
Co-authored-by: TJian <tunjian.tan@embeddedllm.com >
2025-12-10 19:47:18 -08:00
Andreas Karatzas
b51255f369
[ROCm] Fix broken import in platform attention backend dispatching ( #30432 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2025-12-11 01:12:58 +00:00
Sage Moore
b4054c8ab4
Revert "[CI] Add Async Eplb nightly CI tests ( #29385 )" ( #30431 )
2025-12-11 00:48:35 +00:00
Xu Song
25221b44bb
Add more docs for regex ( #30106 )
...
Signed-off-by: Xu Song <xusong.vip@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-11 00:12:21 +00:00
shivampr
8580919ac3
[Bugfix] fix confusing OOM errors during v1 init ( #28051 )
...
Signed-off-by: Shivam <shivamprasad91@gmail.com >
Signed-off-by: shivampr <shivampr.dev@gmail.com >
Co-authored-by: Chen Zhang <zhangch99@outlook.com >
2025-12-10 23:17:41 +00:00
Christina Norman
166ac3c94d
fix(shm): Add memory barriers for cross-process shared memory visibility ( #30407 )
...
Signed-off-by: Christina Holland <hey@christinaholland.com >
Signed-off-by: Christina <truffle@gmail.com >
2025-12-10 23:01:19 +00:00
Seiji Eicher
b9e0951f96
[docs] Improve wide-EP performance + benchmarking documentation ( #27933 )
...
Signed-off-by: Seiji Eicher <seiji@anyscale.com >
2025-12-10 22:15:54 +00:00
Michael Goin
fcb894222f
[Docs] Update EPLB docs ( #30426 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-12-10 11:56:51 -09:00
Nick Hill
6ccb7baeb1
[LMCache] Fix breakage due to new LMCache version ( #30216 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-12-10 11:52:01 -08:00
Po-Han Huang (NVIDIA)
eea41804a4
[bug] Fix "Current vLLM config is not set." warnings when FlashInfer attention is used ( #30241 )
...
Signed-off-by: Po-Han Huang <pohanh@nvidia.com >
2025-12-10 11:18:51 -08:00
Jialin Ouyang
9f042ba26b
[Perf] Enable environment cache in EngineCore to enable the feature for UniProcExecutor as well ( #29289 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-12-10 14:13:01 -05:00
Cyrus Leung
e72d65b959
[Deprecation] Remove tokenizer setter ( #30400 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-12-10 19:10:58 +00:00
Will Eaton
a9e4106f28
[P/D] KV Load Failure Recovery/Abort Configuration ( #26813 )
...
Signed-off-by: Will Eaton <weaton@redhat.com >
Signed-off-by: Will Eaton <me@wseaton.com >
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Mark McLoughlin <markmc@redhat.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
Co-authored-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-12-10 11:00:52 -08:00
Anker
e8e8cd73e5
[Bugfix] Fix HunyuanOCR cross-image contamination in batch processing ( #30344 )
...
Signed-off-by: Lennart Brog <lennart.borg@list-ag.de >
Signed-off-by: Anker <20343812+anker-c2@users.noreply.github.com >
2025-12-10 18:09:31 +00:00
Cyrus Leung
253305d5b2
[Chore] Delay recent deprecations ( #30398 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-12-10 17:48:38 +00:00
Matthew Bonanni
794a7875ee
[Misc] Consistent case for vllm bench serve results ( #30403 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-12-10 09:44:02 -08:00
Mark McLoughlin
2dcbac9077
[Docs] Generate full list of metrics in user docs ( #30388 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
Co-authored-by: Claude <noreply@anthropic.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-12-10 16:09:34 +00:00
Lucas Wilkinson
aacf0abf8b
[BugFix] Fix AttributeError: 'MergedColumnParallelLinear' object has no attribute 'weight_scale' ( #30399 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-12-10 07:59:23 -08:00
Nicolò Lucchesi
c756fb6781
[Core] Whisper enable FULL_DECODE_ONLY CudaGraph ( #30072 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-12-10 06:14:24 -08:00
Roger Young
d017bceb08
[BugFix] Fix minimax m2 model rotary_dim ( #30384 )
...
Signed-off-by: xuebi <xuebi@minimaxi.com >
Co-authored-by: xuebi <xuebi@minimaxi.com >
2025-12-10 04:58:50 -08:00
Aditya Tewari
cebda2a4af
[CPU] Support for Whisper ( #30062 )
...
Signed-off-by: Aditya Tewari <aditya.tewari@arm.com >
2025-12-10 04:58:42 -08:00
Daniele
53d2420b44
[Bugfix] tpu_model_runner: set vllm config context when calling reset_dynamo_cache() ( #30331 )
...
Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com >
2025-12-10 04:58:35 -08:00
Chauncey
9db78f34dc
[Bugfix] Fix the issue where DeepSeek v3.2 cannot use structured_output ( #30371 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-12-10 08:30:16 +00:00
Fadi Arafeh
434ac76a7c
[cpu][ci] Add CPU Attention Tests for Neon Backend ( #30347 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
2025-12-10 05:37:35 +00:00
Andreas Karatzas
ed7af3178a
[ROCm][CI] Attempt to fix the failures under a subgroup of the e2e test group ( #29358 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
Co-authored-by: Micah Williamson <micah.williamson@amd.com >
2025-12-10 05:33:13 +00:00
Radu Salavat
180345807f
[CMake][Build]: Remove unused ACL CMake env variables ( #30339 )
...
Signed-off-by: Radu Salavat <radu.salavat@arm.com >
2025-12-10 04:27:19 +00:00
Mingliang Li
d007387aa7
[Bugfix] Cache added_vocab to avoid per-token overhead ( #30351 )
...
Signed-off-by: limingliang <limingliang@stepfun.com >
Co-authored-by: limingliang <limingliang@stepfun.com >
2025-12-10 12:05:51 +08:00
Wilson Wu
3bdd426636
Fix typos in comments across multiple files ( #30345 )
...
Signed-off-by: Wilson Wu <iwilsonwu@gmail.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-12-09 20:05:28 -08:00
haoyangli-amd
06462392e4
[bugfix][quantization] fix quark qwen3 kv_cache quantization ( #30308 )
...
Signed-off-by: Haoyang Li <lihaoyang0109@gmail.com >
2025-12-10 03:24:12 +00:00
Micah Williamson
7d80c73d42
[CI] Reduce Flakiness For test_spec_decode.py::test_suffix_decoding_acceptance ( #30367 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2025-12-10 02:35:49 +00:00
rasmith
b75f826fca
[CI/Build][AMD] Skip quantization kernels tests that require CUTLASS or e4m3fn when not supported by platform ( #30020 )
...
Signed-off-by: Randall Smith <ransmith@amd.com >
Co-authored-by: Randall Smith <ransmith@amd.com >
2025-12-10 02:28:37 +00:00
Andrew Xia
c3487aca34
[responsesAPI][6] Fix multi turn MCP tokenization ( #30230 )
...
Signed-off-by: Andrew Xia <axia@fb.com >
Co-authored-by: Andrew Xia <axia@fb.com >
2025-12-10 10:13:13 +08:00
Lucas Wilkinson
abe93bce59
[Attention] Make seq_lens_cpu optional in CommonAttentionMetadata to enable true async spec-decode ( #29624 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com >
2025-12-09 17:18:10 -08:00
ElizaWszola
2e7035dd8c
[Bugfix] Fix fp8 DeepGemm compilation issues ( #30336 )
2025-12-09 20:17:25 -05:00
PatrykSaffer
4c2e10ea19
[Bugfix] Fix cuda graph sizes when running with speculative decoding ( #30330 )
...
Signed-off-by: Patryk Saffer <patryk.saffer99@gmail.com >
Signed-off-by: PatrykSaffer <patryk.saffer@mistral.ai >
Co-authored-by: Patryk Saffer <patryk.saffer99@gmail.com >
2025-12-10 00:47:07 +00:00
dongbo910220
03b5f940fd
[V1][Spec Decode] Optimize Medusa proposer to avoid GPU-CPU sync ( #29723 )
...
Signed-off-by: dongbo910220 <1275604947@qq.com >
2025-12-10 00:15:01 +00:00
Hashem Hashemi
2e7054da06
Improve wvsplitK tile and balance heuristics. ( #29937 )
...
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com >
2025-12-09 23:51:32 +00:00
Charlie Fu
3c680f4a17
[Rocm][torch.compile] Adding layernorm + fp8 block quant and silu + fp8 block quant for Aiter ( #25693 )
...
Signed-off-by: charlifu <charlifu@amd.com >
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
Signed-off-by: Charlie Fu <Charlie.Fu@amd.com >
Co-authored-by: Micah Williamson <micah.williamson@amd.com >
Co-authored-by: wuhuikx <hattie.wu@amd.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com >
2025-12-09 22:39:26 +00:00
Kyle Sayers
fccd532587
[Quantization] FP8 Weight Reloading for Quantized RL Rollout ( #28480 )
...
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com >
2025-12-09 13:54:32 -08:00
bnellnm
00e5cbb967
[MoE][Refactor] Remove most arguments to FusedMoEMethodBase.apply ( #29066 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
2025-12-09 13:48:25 -08:00
rasmith
7618dc973d
[CI/Build] Make test_mha_attn.py run on correct platform only and check for flash_attn_varlen_func in layer.py ( #29145 )
2025-12-09 20:18:17 +00:00
dependabot[bot]
f8dacc66b6
Bump actions/stale from 10.1.0 to 10.1.1 ( #30234 )
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-09 20:12:14 +00:00
dependabot[bot]
7cab92fd45
Bump actions/checkout from 6.0.0 to 6.0.1 ( #30233 )
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-09 20:03:16 +00:00
Tsukasa OI
73a484caa1
[Model][Quantization] Fix / Add GGUF support for Qwen2 MoE models ( #30307 )
...
Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com >
2025-12-09 19:13:10 +00:00
Lucas Wilkinson
b37bf51e75
[CI/Test] Fix FP8 per-tensor quant test reference scale shape ( #30352 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-12-09 12:52:20 -06:00
Lucas Wilkinson
95501a70ec
[BugFix] Fix DeepSeek-R1 hang with DP and MTP ( #30119 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com >
2025-12-09 18:51:19 +00:00
Benjamin Chislett
e858bfe051
[Cleanup] Refactor profiling env vars into a CLI config ( #29912 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-12-09 13:29:33 -05:00
Woosuk Kwon
d471b2aff0
[Model Runner V2] Support num NaNs in logits ( #30187 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-12-09 10:00:49 -08:00
Woosuk Kwon
9e6562a3f6
[Model Runner V2] Fix Triton warning on tl.where ( #30355 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-12-09 09:59:54 -08:00
Ilya Markov
0b6a8a304c
[BugFix] Fix non detected failing tests ( #30277 )
...
Signed-off-by: ilmarkov <markovilya197@gmail.com >
2025-12-09 17:57:55 +00:00
Alexei-V-Ivanov-AMD
804e3468c0
Update AMD test definitions (2025-12-08) ( #30298 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
2025-12-09 17:31:30 +00:00
Wentao Ye
83319b44c2
[Compile] Fix torch warning TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled ( #29897 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-12-09 10:40:37 -05:00
Lucas Wilkinson
56037dfa2f
[BugFix] Fix assert batch_descriptor.num_tokens == num_tokens_padded ( #30173 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-12-09 10:36:12 -05:00
quanliu
5dcd593baf
[Feature] Batch-Invariant Support for FA2 and LoRA ( #30018 )
...
Signed-off-by: quanliu <18646313696@163.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-12-09 10:01:38 -05:00
Julien Denize
5c213d2899
[BUGFIX] Mistral tool call parser v11+ ( #30332 )
...
Signed-off-by: juliendenize <julien.denize@mistral.ai >
2025-12-09 14:55:38 +00:00
vllmellm
ee14644ba9
[ROCm] Aiter Quant Kernels ( #25552 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2025-12-09 14:27:37 +00:00
Dongjie Zou
1166c31cc7
[Bugfix]: Fix glm46 awq marlin moe wna16 compatibility ( #30210 )
...
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com >
2025-12-09 12:20:21 +00:00
haoyangli-amd
03416eada6
[bugfix][quantization] Fix fp8 per_tensor scale shape ( #30257 )
...
Signed-off-by: Haoyang Li <lihaoyang0109@gmail.com >
2025-12-09 19:28:50 +08:00
Hubert de La Jonquiere
c72ea10723
[Structured Output][Reasoning] Improves decoding throughput for models using single-token reasoning endings. ( #30056 )
2025-12-09 18:54:08 +08:00
Jaya Yuan
67475a6e81
[DCP][Bugfix][CI] Fix accuracy issue of DCP when using FLASH_ATTN_MLA ( #30309 )
...
Signed-off-by: FENP <yuanyongjie.yyj@antgroup.com >
2025-12-09 08:22:14 +00:00
wang.yuqi
9c32df6101
[Bugfix] Qwen 3 VL Embedding loading ( #30303 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Signed-off-by: wang.yuqi <noooop@126.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-09 08:04:02 +00:00
Micah Williamson
aeb82b1930
[CI] Fix Flaky test_eagle_max_len Test ( #30306 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2025-12-09 07:33:34 +00:00
Lucas Wilkinson
aed846917f
[Attention] Make split_decodes_and_prefills(..., require_uniform=True) support padding ( #29644 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com >
2025-12-09 07:24:01 +00:00
Yongtao Huang
e4605d225e
[Misc] Fix safetensors import for safe_open ( #30300 )
...
Signed-off-by: Yongtao Huang <yongtaoh2022@gmail.com >
2025-12-09 06:50:06 +00:00
Tsukasa OI
58d5b3f514
[Model][Quantization] Restore MoE + GGUF models support (incl. Qwen3 MoE) by allowing Sideload Parameters ( #30116 )
...
Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-12-09 05:30:05 +00:00
Fanli Lin
c2e1987a6e
[Doc] update Intel GPU MM status in Feature x Hardware matrix ( #30294 )
...
Signed-off-by: Lin, Fanli <fanli.lin@intel.com >
2025-12-09 05:16:44 +00:00
Fadi Arafeh
e130845984
[CPU][CI] Enable fused MoE tests in Arm CI ( #30132 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
2025-12-09 04:55:39 +00:00
liangel-02
4b03b50211
update torchao safetensors impl ( #30155 )
...
Signed-off-by: Angel Li <liangel@meta.com >
2025-12-09 12:46:35 +08:00
Or Ozeri
4c6fd25880
kv_transfer: Rename the shared storage connectors ( #30201 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com >
2025-12-08 20:46:09 -08:00
Michael Goin
03b91f7262
[Bugfix] Fix compressed-tensors models failing to load with transformers backend ( #30287 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-08 20:44:28 -08:00
czhu-cohere
f6227c22ab
[Kernel]Support W4A8 Grouped GEMM on Hopper ( #29691 )
...
Signed-off-by: czhu-cohere <conway.zhu@cohere.com >
2025-12-08 19:29:06 -08:00
gnovack
ea657f2078
Lora MoE Align Improvements ( #29257 )
...
Signed-off-by: gnovack <gnovack@amazon.com >
2025-12-09 10:35:16 +08:00
Kevin H. Luu
db14f61f2d
[ci] Refactor CI file structure ( #29343 )
2025-12-08 17:25:43 -09:00
Micah Williamson
78c7503364
[ROCm][CI] Skip NVIDIA-Only Prime-RL Test in AMD CI ( #29420 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2025-12-09 02:14:02 +00:00
Christina Norman
e41312a2f5
[Bugfix] Skip generation config fallback for GGUF to prevent multi-process hang ( #30209 )
...
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com >
2025-12-09 01:52:43 +00:00
Yanan Cao
7b35011ad1
Mark qwen2_5_vl as xfail ( #30283 )
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com >
2025-12-09 01:14:10 +00:00
Zhewen Li
ae339b1a67
[Bugfix] Fix DeepGEMM after #29546 ( #30267 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
Signed-off-by: Zhewen Li <zhewenli@meta.com >
2025-12-09 01:05:27 +00:00
Wentao Ye
0ee6416f67
[Perf] Optimize group_topk kernel, 1.9% Throughput improvement, 2.1% TPOT improvement ( #30159 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-12-08 19:44:01 -05:00
Wentao Ye
d9417096d1
[Feature] Batch invariant: Enable TRITON_MLA without prefix-caching ( #29125 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-12-08 19:31:57 -05:00
Ming Yang
9d6235ca9a
[moe] Allow disabling DP chunking ( #29936 )
...
Signed-off-by: Ming Yang <minos.future@gmail.com >
2025-12-09 00:29:36 +00:00
Victor Ziliang Peng
f1599ca55d
feat(metrics): Add prefill KV compute metric excluding cached tokens ( #30189 )
...
Signed-off-by: Ziliang Peng <ziliang@character.ai >
2025-12-09 00:08:48 +00:00
Ming Yang
60d17251c9
[Disagg] Support large batch size in proxy server and update NixlConnector doc for DP ( #28782 )
...
Signed-off-by: Ming Yang <minos.future@gmail.com >
2025-12-09 00:01:08 +00:00
Lain
1fb632fdb6
[Perf] Improve fp8 quant in mla; replace ReduceSum with ReduceScatterSum ( #29795 )
...
Signed-off-by: Siyuan Fu <siyuanf@nvidia.com >
2025-12-08 15:02:34 -08:00
Charlie Fu
6af70e11a0
[ROCm][CI] Fix test_max_len.py for Rocm ( #29916 )
...
Signed-off-by: charlifu <charlifu@amd.com >
Signed-off-by: Charlie Fu <Charlie.Fu@amd.com >
2025-12-08 16:58:30 -05:00
roikoren755
ae0f69b16a
Add SpecDec support to selective_state_update ( #29488 )
...
Signed-off-by: Roi Koren <roik@nvidia.com >
2025-12-08 16:45:18 -05:00
Dmitry Tokarev
799804d140
Bump nvshmem to 3.3.24 and fix CUDA 13 installation ( #30149 )
...
Signed-off-by: Dmitry Tokarev <dtokarev@nvidia.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-08 20:24:34 +00:00
Vasiliy Kuznetsov
0d402d2600
online fp8 quant with streaming weight post-processing ( #29196 )
...
Signed-off-by: vasiliy <vasiliy@fb.com >
2025-12-08 20:15:10 +00:00
Johnny Yang
d1b5e7afbf
[TPU] Bump tpu-inference to 0.12.0 ( #30221 )
...
Signed-off-by: Johnny Yang <johnnyyang@google.com >
2025-12-08 20:10:10 +00:00
shaharmor98
fcd5306f65
Add latent MoE support ( #30203 )
...
Signed-off-by: Shahar Mor <smor@nvidia.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
2025-12-08 17:35:01 +00:00
weiguihua2
398a596ed2
[MP executor] fix get device count for multi node of mp executor feature ( #30042 )
...
Signed-off-by: weiguihua2 <weiguihua2@huawei.com >
2025-12-09 01:33:48 +08:00
Jee Jee Li
67312cad11
[Misc] Split the LoRA code ( #30253 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-12-09 00:59:31 +08:00
Laith Sakka
87aee9ed2b
Add evaluate_guards option to DynamicShapesConfig ( #27432 )
...
Signed-off-by: Laith Sakka <lsakka@meta.com >
2025-12-08 10:46:15 -05:00
Daniel Cámpora
184076c3fe
[DeepSeek v3.2] Make top-k work for any logit values. ( #27568 )
...
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-12-08 06:55:58 -08:00
Ye (Charlotte) Qi
eb1051fb95
[ROCm] Guard group quant RMS norm fusion patterns ( #30239 )
2025-12-08 14:44:48 +00:00
Jee Jee Li
80433e225e
[LoRA] Reduce the loading time of MoE LoRA ( #30243 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-12-08 13:29:47 +00:00
Harry Mellor
5c2433a6f3
Add tip for mypy and markdownlint to the pre-commit comment ( #30259 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-12-08 13:11:51 +00:00
Simon Mo
77072e93b3
[docs] governance documents ( #24801 )
...
Signed-off-by: simon-mo <simon.mo@hey.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Signed-off-by: Simon Mo <simon.mo@hey.com >
Co-authored-by: Mark McLoughlin <markmc@redhat.com >
Co-authored-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-12-08 12:06:20 +00:00
wang.yuqi
2e660c2434
[Frontend] Binary embedding response does not return metadata by setting encoding_format to bytes_only. ( #30249 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Signed-off-by: wang.yuqi <noooop@126.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-08 12:01:21 +00:00
Shiming Zhang
408cf42f67
[CI] Prevents triggering of an inactive issue/PR check for forked repository. ( #29654 )
...
Signed-off-by: Shiming Zhang <wzshiming@hotmail.com >
2025-12-08 10:29:14 +00:00
wang.yuqi
9e77ffca3f
[Model][7/N] Improve all pooling task | Deprecation as_reward_model. Extract hidden states prefer using new multi-vector retrieval API ( #26686 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2025-12-08 08:10:09 +00:00
Dazhi Jiang
bcb6f5947f
[Perf] Remove sync point in vit torch sdpa attn backend ( #30232 )
...
Signed-off-by: Dazhi Jiang <dazhi_jiang@163.com >
2025-12-08 07:12:42 +00:00
Zhiyu
cd00c443d2
[Misc] Rename TensorRT Model Optimizer to Model Optimizer ( #30091 )
...
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com >
2025-12-08 07:05:27 +00:00
Jiangyun Zhu
d143271234
[Bugfix] fix fuse_allreduce_rms when tp =1 ( #30178 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2025-12-08 06:43:47 +00:00
Zhiwei
c6df05ebb4
[ROCm] [Fused Moe EP] Use binary expert mask for aiter fused moe kernel ( #29773 )
...
Signed-off-by: ZhiweiYan-96 <zhiwei.yan@amd.com >
2025-12-08 05:23:46 +00:00
Nick Hill
d726a7b0ed
[BugFix] Unblock use of LoRA with data parallel mode ( #30220 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-12-08 12:21:05 +08:00
Zhijian Jiang
344b50d525
Address comment to mergify.yml in #30117 ( #30219 )
...
Signed-off-by: Zhijian Jiang <Zhijian.Jiang@outlook.com >
2025-12-08 11:26:25 +08:00
Andrew Xia
735284ed86
[responsesAPI][7] Browser, Container MCP tools for non harmony models ( #29989 )
...
Signed-off-by: Andrew Xia <axia@meta.com >
Signed-off-by: Andrew Xia <axia@fb.com >
Co-authored-by: Andrew Xia <axia@fb.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-12-08 10:04:03 +08:00
daniel-salib
444f0e3f33
[Frontend] Add MCP type support infrastructure to Responses API ( #30054 )
...
Signed-off-by: Daniel Salib <danielsalib@meta.com >
2025-12-08 10:02:52 +08:00
ElizaWszola
af0444bf40
[Performance] Fused blockwise quant RMS norm ( #27883 )
...
Signed-off-by: ElizaWszola <ewszola@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Co-authored-by: yewentao256 <zhyanwentao@126.com >
2025-12-07 16:38:04 +00:00
Lucas Wilkinson
0044c4038c
[BugFix][DeepSeek-V3.2] Fix backend selection logic for Blackwell ( #30195 )
2025-12-07 10:53:51 -05:00
Isotr0py
b952f4d3c3
[v1] Add PrefixLM support to FlexAttention backend ( #27938 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-12-07 15:51:36 +00:00
Wentao Ye
541a2ef892
[Perf] Deepgemm fused layout kernel for activations, 4.3% throughput improvement, 10.7% TTFT improvement. ( #29546 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-12-07 20:31:14 +08:00
Jee Jee Li
b0f4866a77
[CI/Build] Temporary workaround for test_default_mm_loras timeout ( #30202 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-12-07 20:27:11 +08:00
Jinzhen Lin
879ddb09c3
[Kernel][MoE] optimize moe_align_block_size ( #29642 )
...
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-12-07 01:58:47 -08:00
Yifan Qiao
1b0482b9d1
[Misc][Core] Remove unused req_index increment in scheduler ( #30176 )
...
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu >
2025-12-07 08:39:21 +00:00
Cyrus Leung
e83b7e379c
Revert "[Renderer] Separate out RendererConfig from ModelConfig ( #30145 )" ( #30199 )
2025-12-07 00:00:22 -08:00
Cyrus Leung
27f4c2fd46
[Renderer] Separate out RendererConfig from ModelConfig ( #30145 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-12-06 23:15:42 -08:00
Luke
a49d813fa8
Lazy loading to avoid importing all files ( #29716 )
...
Signed-off-by: Luke <yq0536@gmail.com >
2025-12-07 07:13:14 +00:00
Wentao Ye
17eb25e327
[Perf] Enable cuda graph for deepepHT, 5.3% throughput improvement, 4.4% TTFT improvement ( #29558 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-12-07 04:44:50 +00:00
jeremyteboul
dce6d229f7
Support multiple image/audio embeddings per request ( #29988 )
...
Signed-off-by: Jeremy Teboul <jeremyteboul@fb.com >
Co-authored-by: Jeremy Teboul <jeremyteboul@fb.com >
2025-12-07 04:34:24 +00:00
Yanan Cao
cbedb703cc
[Frontend] Remove confusing -O.xx flag error ( #30169 )
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com >
2025-12-07 02:53:42 +00:00
AuruTus
8d3da4c79d
[MISC]: change NIXL compatibility hash logging level to debug ( #30182 )
2025-12-07 00:21:03 +00:00
Andrew Xia
421125d03a
[ez] move harmony utils to parser folder ( #30117 )
...
Signed-off-by: Andrew Xia <axia@fb.com >
Co-authored-by: Andrew Xia <axia@fb.com >
2025-12-06 17:34:34 -05:00
Cyrus Leung
671427efbf
[Model] Move multimodal_cpu_fields definition to field config ( #30181 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-12-06 13:40:02 +00:00
Viacheslav
21bb323542
Gigachat 3 tool parser and tests ( #29905 )
...
Signed-off-by: Viacheslav Barinov <viacheslav.teh@gmail.com >
2025-12-06 12:04:14 +00:00
Chukwuma Nwaugha
17a9abec2b
simplify requires_files list creation ( #29656 )
...
Signed-off-by: Chukwuma Nwaugha <nwaughac@gmail.com >
2025-12-06 09:42:41 +00:00
Ye (Charlotte) Qi
92c35abb24
[Misc] Fix circular import in vllm.transformers_utils.config ( #30179 )
...
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com >
2025-12-06 09:24:03 +00:00
Yu Jiaqi
43e7593031
Support tokenization_kwargs override ( #29794 )
...
Signed-off-by: piood <2477084691@qq.com >
2025-12-06 09:12:53 +00:00
Cyrus Leung
c46b932df2
[Chore] Deprecate SupportsMultiModal.merge_by_field_config ( #30170 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-12-06 07:57:28 +00:00
redwrasse
6476382384
prefix caching design doc sha256 now default ( #29261 )
...
Signed-off-by: redwrasse <mail@redwrasse.io >
2025-12-06 07:39:56 +00:00
kx
d6aeaddf4a
[bugfix] fix type[AttentionBackend] bug in kv_connector_base_v1 ( #30051 )
...
Signed-off-by: 01267596 <xiongkai123@cmbchina.com >
Co-authored-by: 01267596 <xiongkai123@cmbchina.com >
2025-12-06 07:11:31 +00:00
Woosuk Kwon
a238cbd89d
[Model Runner V2] Support min-p sampling ( #30171 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-12-05 21:42:47 -08:00
Nick Hill
4026ae31e9
[Misc] Move disable_nccl_for_dp_synchronization init logic into VllmConfig ( #30161 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-12-05 20:59:04 -08:00
rasmith
b12f4a9830
[CI/Build][AMD] Use ROCM_ATTN instead of FLASH_ATTN test for test_register_kv_caches for ROCm and update test for TRITON_ATTN ( #29985 )
...
Signed-off-by: Randall Smith <ransmith@amd.com >
Co-authored-by: Randall Smith <ransmith@amd.com >
Co-authored-by: TJian <tunjian.tan@embeddedllm.com >
2025-12-05 20:57:38 -08:00
Rohan Potdar
40a046cd82
[Bugfix]: Fix TokenizerLike interface ( #30009 )
...
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com >
2025-12-05 20:56:40 -08:00
Peter Salas
e858bc4d14
[Model] Add support for transformer-based Ultravox v0.7 projector ( #30089 )
...
Signed-off-by: Peter Salas <peter@fixie.ai >
2025-12-05 20:55:43 -08:00
Dongjie Zou
e3fbb6f152
Fix #30092: Kimi-Linear model loading failure with missing indexer_rotary_emb ( #30093 )
...
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com >
2025-12-05 20:55:09 -08:00
yuttian1
c4d62618ca
Fix AWQ MoE marlin check issue in marlin_utils.py for AMD backend ( #30102 )
...
Signed-off-by: yuttian1 <yuttian@amd.com >
2025-12-05 20:54:38 -08:00
rasmith
62079d8600
[CI/Build][AMD] Skip marlin, machete, and hadacore tests since these require _C functions not defined for ROCm ( #30109 )
...
Signed-off-by: Randall Smith <ransmith@amd.com >
Co-authored-by: Randall Smith <ransmith@amd.com >
2025-12-06 12:54:17 +08:00
Harry Mellor
bf4a901af9
Better error when world size is larger than node and distributed_executor_backend is not set ( #30140 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-12-05 20:53:52 -08:00
Samuel Shen
7e31c3a3f6
[CI]: Remove unnecessary imports from test_lmache_integration ( #30157 )
...
Signed-off-by: Samuel Shen <slshen@uchicago.edu >
Co-authored-by: Samuel Shen <slshen@uchicago.edu >
2025-12-06 12:53:34 +08:00
rasmith
dc839ad03d
[CI/Build][AMD][Quantization] Fix test_int8_kernel.py by updating int8_utils to use hip.libdevice.round ( #30151 )
...
Signed-off-by: Randall Smith <ransmith@amd.com >
Co-authored-by: Randall Smith <ransmith@amd.com >
2025-12-05 20:52:11 -08:00
Deboleina
02a4169193
[Tests] Tool call tests for openai/gpt-oss-20b ( #26237 )
...
Signed-off-by: Debolina Roy <debroy@redhat.com >
2025-12-05 19:03:29 -08:00
Wentao Ye
7b5575fa7d
[Bug] Fix vLLM config is not set error ( #29999 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-12-05 16:42:12 -05:00
Bangsheng Tang
77e4472809
let draft model follow target model's config_format ( #30152 )
2025-12-05 13:33:42 -08:00
Divakar Verma
962d703818
[Bugfix][llama4_eagle] Fix missing 'lm_head' attribute ( #29926 )
...
Signed-off-by: Divakar Verma <divakar.verma@amd.com >
2025-12-05 19:57:26 +00:00
Nicolò Lucchesi
e23ca3a0e8
[CI] Re-use whisper_client for all tests ( #30148 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-12-05 19:47:37 +00:00
Russell Bryant
3633035a3f
[Misc] Rename CohereForAI references to CohereLabs ( #30147 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-12-05 19:41:40 +00:00
Nicolò Lucchesi
bff78310d9
[Enc-Dec] Fix OOT tokenizer issue ( #30144 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-12-05 19:23:33 +00:00
Tova Movshovitz
adb315060c
[KVConnector][Feature] Support KV connector cache reset via /reset_prefix_cache ( #27170 )
...
Signed-off-by: tovam <tovam@pliops.com >
Signed-off-by: Tova Movshovitz <tovam@pliops.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-05 18:33:26 +00:00
Ilya Markov
4e26d3b09e
[Compile] Conditional compilation. Introduce compile_ranges ( #24252 )
...
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Signed-off-by: ilmarkov <markovilya197@gmail.com >
Signed-off-by: Luka Govedič <luka.govedic@gmail.com >
Signed-off-by: ProExpertProg <lgovedic@redhat.com >
Co-authored-by: Luka Govedič <lgovedic@redhat.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
Co-authored-by: Luka Govedič <luka.govedic@gmail.com >
2025-12-05 18:17:32 +00:00
Matthew Bonanni
66e674cdd5
[Attention][UX][1/N] Add AttentionConfig and change attention env vars to CLI arguments ( #26315 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
2025-12-05 09:48:43 -08:00
Mark McLoughlin
dff0a2b394
[NIXL] Add remote_request_id to kv_transfer_params ( #29665 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-12-05 09:43:48 -08:00
Nick Hill
dc264bcea1
[BugFix] Eagerly abort cancelled final-step requests ( #29987 )
...
Currently, when requests are cancelled while executing their final
step, "completion" is handled based on normal stop processing (e.g.
length or stop token), so the abort has no effect. This is typically
not a problem, but when a kv connector is involved it thinks the
request completed successfully rather than being aborted.
This is problematic for disaggregated prefill, which frees kv
cache blocks if the request was aborted but not if it completed
successfully. Since the cancelled request will never be sent to
the decode side, its kv cache blocks remain pinned until the fallback
timeout expires. The problem is exacerbated when many requests
are cancelled and/or there are large prefills whose forward pass
takes a long time (since the window is bigger).
This PR fixes the problem by processing pending aborts
immediately prior to processing model output each step; we process
only aborts, not new requests, since it's preferable for latency to
process model outputs before new incoming requests.
Fixes #26400 .
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-12-05 17:28:32 +00:00
Nicolò Lucchesi
78c44fd722
[NIXL] Small cleanup of unused variables ( #29618 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-12-05 18:17:36 +01:00
Angela Yi
e7296b08da
[bugfix] Pass globals to aot_compiled function ( #29428 )
...
Signed-off-by: angelayi <yiangela7@gmail.com >
2025-12-05 16:54:26 +00:00
Andrew Xia
da7bc54ea8
[responsesAPI][5] ResponsesParser with tools for full MCP python loop ( #29798 )
...
Signed-off-by: Andrew Xia <axia@fb.com >
Signed-off-by: Andrew Xia <axia@meta.com >
Co-authored-by: Andrew Xia <axia@fb.com >
2025-12-05 11:11:50 -05:00
Mark McLoughlin
949a6a19d2
[NIXL] Add compatibility checking to NIXL KV connector handshake ( #29503 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-12-05 15:52:45 +01:00
Alec S
2c174420f5
Reduce validation to a warning ( #28749 )
...
Signed-off-by: Alec Solder <alecs@fb.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Alec Solder <alecs@fb.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-12-05 14:02:49 +00:00
Yi Liu
0d8a7d8a26
[Compressed Tensors] Add XPU wNa16 support ( #29484 )
...
Signed-off-by: yiliu30 <yi4.liu@intel.com >
2025-12-05 22:02:09 +08:00
Elham
9843e332da
[CPU][Perf] Add fast vectorized exp impl from Arm Optimized Routines ( #30068 )
...
Signed-off-by: Ubuntu <ubuntu@ip-10-252-30-150.eu-west-1.compute.internal >
Signed-off-by: Elham Harirpoush <elham.harirpoush@arm.com >
Co-authored-by: Ubuntu <ubuntu@ip-10-252-30-150.eu-west-1.compute.internal >
2025-12-05 13:09:20 +00:00
Harry Mellor
b7d85cf25c
[CI] Have pre-commit comment on a PR if pre-commit was not used ( #30077 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-12-05 13:03:45 +00:00
Max Hu
c2894d3883
[Feature] Add Layer-wise NVTX Support ( #29990 )
...
Signed-off-by: Max Hu <hyoung2991@gmail.com >
Signed-off-by: Max Hu <maxhu@nvidia.com >
Co-authored-by: Max Hu <maxhu@nvidia.com >
2025-12-05 11:20:07 +00:00
Zhiwei
3628bcaaf2
[ROCm][MXFP4] Infer w4a4 quant method in rocm aiter fused moe ( #29775 )
...
Signed-off-by: ZhiweiYan-96 <zhiwei.yan@amd.com >
2025-12-05 11:01:16 +00:00
strinczer
b73b158ab0
[Bugfix] Fix parse_output_message crash on commentary with no recipient ( #29972 )
...
Signed-off-by: Shai Trinczer <strinczer@icloud.com >
Signed-off-by: strinczer <strinczer@icloud.com >
2025-12-05 10:51:12 +00:00
Ning Xie
7ae13c66ba
[typing] fix type ( #29964 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2025-12-05 10:46:08 +00:00
Ming Yang
f16356fe36
[bench] Support common prefix len config (for decode-only bench) ( #29934 )
...
Signed-off-by: Ming Yang <minos.future@gmail.com >
2025-12-05 10:26:52 +00:00
Alec S
65ee97288a
[BugFix] Adding env variable to disable async grammar compilation ( #29996 )
...
Signed-off-by: Alec Solder <alecs@fb.com >
Signed-off-by: Alec S <10566873+alecsolder@users.noreply.github.com >
Co-authored-by: Alec Solder <alecs@fb.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com >
2025-12-05 00:49:37 -08:00
Yanan Cao
62b3333448
[Frontend] Remove deprecated -O.xx flag ( #29991 )
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com >
2025-12-05 00:47:22 -08:00
rasmith
feecba09af
[CI/Build][AMD] Use float16 in test_reset_prefix_cache_e2e to avoid accuracy issues ( #29997 )
...
Signed-off-by: Randall Smith <ransmith@amd.com >
Co-authored-by: Randall Smith <ransmith@amd.com >
2025-12-05 08:42:25 +00:00
amitz-nv
6038b1b04b
[Frontend][Model] Add 'float16' to possible mamba cache dtype values, override mamba SSM cache dtype value for NemotronH ( #29978 )
...
Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com >
2025-12-05 00:34:33 -08:00
Tiger Xu / Zhonghu Xu
60a66ea2dc
[DOC]: Add kthena to integrations ( #29931 )
...
Signed-off-by: Zhonghu Xu <xuzhonghu@huawei.com >
2025-12-05 08:11:03 +00:00
Micah Williamson
06579f9a82
[AMD][CI] Add ray[default] Dependency On ROCm To Pass v1/metrics/test_engine_logger_apis.py ( #30110 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2025-12-05 06:48:23 +00:00
Chukwuma Nwaugha
6e865b6a83
Refactor example prompts fixture ( #29854 )
...
Signed-off-by: nwaughac@gmail.com
2025-12-05 06:44:32 +00:00
Jingchun Gao
d698bb382d
[Bugfix] Correct num_q_heads on DCP for Flashinfer backends ( #29487 )
...
Signed-off-by: Jingchun Gao <gaojingchun1@huawei.com >
Signed-off-by: Jingchun Gao <63247409+gjc0824@users.noreply.github.com >
Co-authored-by: Jingchun Gao <gaojingchun1@huawei.com >
2025-12-05 05:54:31 +00:00
Charlie Fu
2c22c4ca2d
[ROCm][CI] Increase the memory threshold for test_deep_sleep_fp8_kvcache ( #30104 )
...
Signed-off-by: charlifu <charlifu@amd.com >
2025-12-05 04:51:44 +00:00
Laith Sakka
5867819eaf
Do not guard during noop elimination pass ( #30095 )
...
Signed-off-by: Laith Sakka <lsakka@meta.com >
2025-12-05 04:10:12 +00:00
Charlie Fu
7c9b2c8f81
[ROCm][CI] Add jiwer dependency for testing ( #30081 )
...
Signed-off-by: charlifu <charlifu@amd.com >
2025-12-05 03:34:51 +00:00
Qiu
0098a6e3da
[PCP&DCP] move CUDAGraph check for PCP&DCP to the check func of platforms ( #29952 )
...
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-12-04 21:40:51 -05:00
Hubert de La Jonquiere
befb59e5b1
[Model] Add Holo2 reasoning parser ( #30048 )
...
Signed-off-by: hdlj-h <hubert@hcompany.ai >
2025-12-05 10:38:45 +08:00
Shengqi Chen
aaddc9c82a
[CI] fix silent error in nightly wheel index generation script, add generation time to HTML index ( #30060 )
...
Signed-off-by: Shengqi Chen <harry-chen@outlook.com >
2025-12-05 00:48:59 +00:00
Zhewen Li
263c38d74d
[CI/Build] Update batch invariant test trigger ( #30080 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-12-05 00:42:37 +00:00
Zhewen Li
bcf43ab1f3
[CI/Build][AMD] Add Llama4 Maverick FP8 to AMD CI ( #28695 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-12-04 16:07:20 -08:00
Alexander Matveev
4470ee2f90
[Perf] Enable separate shared_experts stream only for CUDA ( #30085 )
...
Signed-off-by: Alexander Matveev <amatveev@redhat.com >
2025-12-05 00:03:17 +00:00
TimWang
690cc3ef20
docs: update metrics design doc to use new vllm:kv_cache_usage_perc ( #30041 )
...
Signed-off-by: Tim <tim.wang03@sap.com >
2025-12-04 23:37:14 +00:00
Laith Sakka
1f0d184590
[aot_compile]change VLLM backend to read fake args from example_value ( #29104 )
...
Signed-off-by: Laith Sakka <lsakka@meta.com >
2025-12-04 17:33:45 -05:00
Lucas Wilkinson
c8ab988b15
[BugFix] Fix DBO assert assert B_block_table == B_q ( #29933 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-12-04 14:48:54 -05:00
Peng-YM
48a5fff66e
[Bugfix] Missing tokens in return_token_ids when a tool parser is enabled in streaming mode ( #29074 )
...
Signed-off-by: Peng-YM <1048217874pengym@gmail.com >
2025-12-04 19:09:39 +00:00
Mercykid-bash
1119f6e47a
Abstract eplb algo ( #26471 )
...
Signed-off-by: Che Ruan <cr623@ic.ac.uk >
Signed-off-by: mengxingkongzhouhan <117415539+mengxingkongzhouhan@users.noreply.github.com >
Signed-off-by: Mercykid-bash <ruanche0218@gmail.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Che Ruan <cr623@ic.ac.uk >
Co-authored-by: mengxingkongzhouhan <117415539+mengxingkongzhouhan@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-12-04 19:09:09 +00:00
Harry Mellor
e10c84e06a
Access partial_rotary_factor from rope_parameters ( #29966 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-12-04 18:42:49 +00:00
Kuntai Du
ece2825a29
[KVConnector] Remove v0-related kv connector components such as kv pipe and kv lookup buffer ( #29705 )
...
Signed-off-by: KuntaiDu <kuntai@uchicago.edu >
2025-12-04 18:20:48 +00:00
Jee Jee Li
652ba93da3
[Bugfix] Fix FP8 MoE LoRA ( #29890 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-12-04 18:17:49 +00:00
Tao Yun
6dcb07f676
Support qwen3-vl handling requests with embeddings ( #30037 )
...
Signed-off-by: taoyun <1069423820@qq.com >
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-12-04 17:34:06 +00:00
Qiu
46cbbca05c
[CI][DCP][Perf] reduce DCP CI execution time ( #29858 )
...
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com >
2025-12-04 17:28:21 +00:00
Cyrus Leung
b286a311c2
[Chore] Deprecate merge_by_field_config arg ( #30035 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-12-04 17:21:24 +00:00
Shengqi Chen
990f806473
[Doc] clarify nightly builds in developer docs ( #30019 )
...
Signed-off-by: Shengqi Chen <harry-chen@outlook.com >
2025-12-05 00:28:37 +08:00
Doug Smith
5b4b42c0b6
Mark DBO test as flaky on b200 for Distributed B200 test ( #29913 )
...
Signed-off-by: dougbtv <dosmith@redhat.com >
2025-12-04 10:38:03 -05:00
Woosuk Kwon
cc050558f4
[Model Runner V2] Implement get_num_sampled_and_rejected kernel ( #30029 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-12-04 07:19:42 -08:00
Harry Mellor
5c32a06a04
Use Transformers v5 RoPE standardisation and validation ( #30046 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-12-04 14:54:28 +00:00
Yongtao Huang
dd97e047e0
Fix broken multiline assert in LoRAModelManager.register_module ( #30032 )
...
Signed-off-by: Yongtao Huang <yongtaoh2022@gmail.com >
2025-12-04 22:04:42 +08:00
Harry Mellor
9998ea5b57
Delete HF version of Phi 4 MM ( #30049 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-12-04 13:44:50 +00:00
wang.yuqi
74c4d80c6c
[Model][6/N] Improve all pooling task | Support chunked prefill with ALL pooling ( #27145 )
...
Signed-off-by: wang.yuqi <noooop@126.com >
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-04 13:44:15 +00:00
Kevin H. Luu
1b7c7f5159
[release] install regex ( #30008 )
...
Signed-off-by: Kevin H. Luu <khluu000@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-04 03:18:29 -08:00
Chauncey
6796ce8bdb
[Bugfix] Fix the issue with interleaved thinking when using streaming ( #30033 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
Signed-off-by: Chauncey <chaunceyjiang@gmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-12-04 11:11:59 +00:00
Andreas Karatzas
e96a6a6dca
[ROCm][CI][Bugfix] Fixing the Multi-Modal Models Test (Extended) 1 group ( #30013 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2025-12-04 11:00:16 +00:00
Noa Neria
6366c098d7
Validating Runai Model Streamer Integration with S3 Object Storage ( #29320 )
...
Signed-off-by: Noa Neria <noa@run.ai >
2025-12-04 18:04:43 +08:00
dtc
842aba501d
[P/D] Introduce Mooncake Transfer Engine as kv_connector ( #24718 )
...
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com >
Signed-off-by: dtc <dtcccc@linux.alibaba.com >
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com >
2025-12-04 09:51:36 +00:00
rasmith
f2f4cea6cc
[CI/Build][AMD] Skip test on test_hybrid_attention_mamba_tensor_shapes on ROCm, requires FLASHINFER ( #29995 )
...
Signed-off-by: Randall Smith <ransmith@amd.com >
Co-authored-by: Randall Smith <ransmith@amd.com >
2025-12-04 09:30:22 +00:00
Arpit Khandelwal
dfdda96747
[Core] Remove forced None assignment for deprecated PassConfig flags ( #29994 )
...
Signed-off-by: arpitkh101 <arpit5khandelwal@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-12-04 09:15:04 +00:00
Xu Wenqing
ffdd18111b
Add DeepSeek-V3.2 tool parser. ( #29848 )
...
Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com >
2025-12-04 08:46:34 +00:00
Ye (Charlotte) Qi
b8a6ae4158
[ROCm] add fallback for aiter fp8 decode mla ( #30005 )
...
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com >
2025-12-04 08:45:57 +00:00
Mark McLoughlin
899e2ef558
[Core] Fix standalone runs of test_reset_prefix_cache_e2e ( #29899 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-12-04 16:22:03 +08:00
Cyrus Leung
68eb5c8d97
[Misc] Move functions into PoolingMetadata ( #30027 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-12-04 08:21:19 +00:00
Micah Williamson
5430e110c0
[CI][AMD] Match Main CI Behavior By Skipping test_eplb_spec_decode In AMD CI ( #30006 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2025-12-04 16:20:54 +08:00
TJian
3f1b03739a
[ROCm] [Bugfix] compute_attn_mask_seqlen for qwen3 omni ( #29974 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-12-04 08:20:24 +00:00
Charlie Fu
9aa33a74b0
[Rocm][CI] Fix test_speculator_eagle3 by skipping the CompressedTensorw4a16 Model ( #30001 )
...
Signed-off-by: charlifu <charlifu@amd.com >
Co-authored-by: Alexei-V-Ivanov-AMD <156011006+Alexei-V-Ivanov-AMD@users.noreply.github.com >
2025-12-04 07:52:28 +00:00
CYJiang
fd68e909db
[docs] Remove _total from counter metrics names ( #30028 )
...
In Prometheus, counters always expose their numeric value under a metric name that ends in _total. We should document the base name, as this is what appears in the get_metrics() API.
Signed-off-by: CYJiang <86391540+googs1025@users.noreply.github.com >
2025-12-04 07:46:15 +00:00
daniel-salib
404fc4bfc0
[Frontend] refactor harmony utils output message parsing ( #29820 )
...
Signed-off-by: Daniel Salib <danielsalib@meta.com >
2025-12-04 15:36:57 +08:00
Chauncey
82a64b3d8f
[Bugfix] fixed deepseekv32 tool calling error ( #30025 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-12-04 15:12:12 +08:00
Cyrus Leung
9ae2f60374
[Misc] Various cleanups for MM input processing ( #29970 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-12-04 06:22:20 +00:00
Jianwei Mao
80f8af4b2f
Fix error while downloading dependencies for CPU backend ( #29797 )
...
Signed-off-by: Jianwei Mao <maojianwei2016@126.com >
2025-12-04 06:04:44 +00:00
Kuntai Du
8aaa81b35f
[KVConnector] remove unused code (the model aware kv ops class) ( #29709 )
...
Signed-off-by: KuntaiDu <kuntai@uchicago.edu >
2025-12-04 06:00:52 +00:00
Benjamin Bartels
fca3f46658
[Frontend] Fixes anthropic /v1/messages streaming not containing input_tokens on first chunk ( #29971 )
...
Signed-off-by: bbartels <benjamin@bartels.dev >
2025-12-04 05:50:27 +00:00
gausah01
28097d5638
[Bugfix][CPU] Fix CPU KV cache fallback memory allocation ( #29604 )
...
Signed-off-by: Gauri Sahnan <gauri.sahnan@arm.com >
Co-authored-by: Li, Jiang <jiang1.li@intel.com >
2025-12-04 13:01:15 +08:00
Jee Jee Li
dd38ba3a26
[Bugfix] Fix adapter_enabled IMA ( #29977 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-12-04 12:51:15 +08:00
Li Wang
5f91cdda75
[Misc] Add docker build env for Ascend NPU ( #30015 )
...
Signed-off-by: wangli <wangli858794774@gmail.com >
2025-12-03 19:53:00 -08:00
Iceber Gu
33a3d6c798
fix LoRA-related examples ( #29956 )
...
Signed-off-by: Iceber Gu <caiwei95@hotmail.com >
2025-12-04 11:48:30 +08:00
Zhewen Li
c493b9d092
[CI/Build] Add MM code path to Examples Test ( #29986 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-12-03 19:21:45 -08:00
Xieyang Xu
ad32e3e19c
enable multi-node in external launcher mode ( #29833 )
2025-12-03 17:02:02 -08:00
Shengqi Chen
1109f98288
[CI] fix docker image build by specifying merge-base commit id when downloading pre-compiled wheels ( #29930 )
...
Signed-off-by: Shengqi Chen <harry-chen@outlook.com >
2025-12-03 14:08:19 -08:00
Elizabeth Thomas
b5407869c8
[Bugfix] Respect VLLM_CONFIGURE_LOGGING value ( #28671 )
...
Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Signed-off-by: Jane Xu <janeyx@meta.com >
Signed-off-by: Nick Hill <nhill@redhat.com >
Signed-off-by: Johnny Yang <johnnyyang@google.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: bruceszchen <bruceszchen@tencent.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Johnny Yang <24908445+jcyang43@users.noreply.github.com >
2025-12-03 22:00:52 +00:00
bnellnm
2902c34826
[Kernels] Remove BatchedTritonOrDeepGemmExperts and default fallback to Triton ( #29929 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
Signed-off-by: bnellnm <49004751+bnellnm@users.noreply.github.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
2025-12-03 20:49:00 +00:00
Wentao Ye
ac1886588f
[CI] Fix re import error ( #29973 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-12-03 15:16:54 -05:00
Yongtao Huang
2fc5d6e0d7
Fix LLMEngine.del dp_group cleanup condition ( #29954 )
...
Signed-off-by: Yongtao Huang <yongtaoh2022@gmail.com >
2025-12-03 12:14:44 -08:00
elvischenv
afe9eb408e
[Bugfix] Fix flashinfer ar+norm kernel not available issue ( #29960 )
...
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com >
2025-12-03 18:50:53 +00:00
Varun Sundar Rabindranath
19bee6d12d
[Performance][DP/EP] Add silu_mul_per_token_group_quant_fp8_colmajor kernel ( #29470 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
2025-12-03 18:04:59 +00:00
avigny
dd5d1ef780
[Bugfix] Mistral tool parser streaming update ( #19425 )
...
Signed-off-by: avigny <47987522+avigny@users.noreply.github.com >
Signed-off-by: Chauncey <chaunceyjiang@gmail.com >
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
Co-authored-by: Jeff Cook <jeff@jeffcook.io >
Co-authored-by: sfbemerk <benjaminmerkel@mail.de >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-12-03 17:45:31 +00:00
Micah Williamson
d1f7392c5f
[ROCm][CI] Fix v1/logits_processors failure on ROCm ( #29927 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2025-12-04 01:17:07 +08:00
Yu Jiaqi
9ae3c55b10
SigLIP example add chat_template ( #29902 )
...
Signed-off-by: piood <2477084691@qq.com >
2025-12-03 16:12:58 +00:00
Lumis Chen
9bcf92295a
[Core] Add xxHash as a high-performance hash option for accelerating prefix caching ( #29163 )
...
Signed-off-by: LuminolT <lumischen01@gmail.com >
Signed-off-by: Lumis Chen <lumischen01@gmail.com >
Co-authored-by: Russell Bryant <rbryant@redhat.com >
2025-12-03 16:06:57 +00:00
rasmith
5aa9b09040
[CI/Build][AMD] Skip test_shared_storage_connector_hashes in test_shared_storage_connector.py due to hipErrorLaunchFailure when calling .cpu() ( #29839 )
...
Signed-off-by: Randall Smith <ransmith@amd.com >
Co-authored-by: Randall Smith <ransmith@amd.com >
2025-12-03 22:56:35 +08:00
ioana ghiban
1bb17ecb39
[CPU Backend] [Doc]: Update Installation Docs for CPUs ( #29868 )
...
Signed-off-by: Ioana Ghiban <ioana.ghiban@arm.com >
2025-12-03 13:33:50 +00:00
ioana ghiban
15b1511a15
[GPU Backend] [Doc]: Remove duplicate statements on missing GPU wheels. ( #29962 )
...
Signed-off-by: Ioana Ghiban <ioana.ghiban@arm.com >
2025-12-03 12:56:47 +00:00
Chauncey
b78772c433
[Frontend] supports deepseekv32 chat template ( #29837 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-12-03 20:53:44 +08:00
Amr Mahdi
f5d3d93c40
[docker] Build CUDA kernels in separate Docker stage for faster rebuilds ( #29452 )
...
Signed-off-by: Amr Mahdi <amrmahdi@meta.com >
2025-12-03 11:41:53 +00:00
Fadi Arafeh
78f4bb0ba8
[DOC] Add Arm to list of compute resources providers ( #29894 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
2025-12-03 11:36:58 +00:00
HDCharles
b294e28db2
[refactor] CTMoEMethods to use QuantizationArgs ( #28871 )
...
Signed-off-by: HDCharles <charlesdavidhernandez@gmail.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-12-03 11:00:56 +00:00
Roger Wang
787b84a9fc
[Bugfix] Follow-up fix on MediaWithBytes ( #29951 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2025-12-03 10:42:49 +00:00
Tsukasa OI
42c1949643
[Bugfix][Quantization] Support BF16 tensors on GGUF ( #29948 )
...
Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com >
2025-12-03 10:33:46 +00:00
Isotr0py
cc4e296ea6
[CI/Build] Avoid duplicate empty inputs test for common multimodal generation tests ( #29907 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-12-03 10:27:36 +00:00
Isotr0py
a21cd9ed23
[Bugfix] Fix incorrect image_grid_thw rank for HunyuanOCR from missing merge_by_field_config=True ( #29950 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-12-03 10:05:10 +00:00
WeiQing Chen
7fe9c1a223
[CI] Add Async Eplb nightly CI tests ( #29385 )
...
Signed-off-by: David Chen <530634352@qq.com >
Signed-off-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-12-03 09:51:08 +00:00
Chauncey
3f42b05fbc
[Refactor] [1/N] to simplify the vLLM serving architecture ( #28040 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-12-03 01:26:39 -08:00
Yong Hoon Shin
69520bc695
Add logging for cudagraph related info ( #29825 )
...
Signed-off-by: Yong Hoon Shin <yhshin@meta.com >
2025-12-03 01:01:48 -08:00
Andrew Xia
3a7751485b
[responsesAPI] support input output messages for non harmony models ( #29549 )
...
Signed-off-by: Andrew Xia <axia@fb.com >
Co-authored-by: Andrew Xia <axia@fb.com >
2025-12-02 23:59:23 -08:00
Cyrus Leung
bbfb55c29e
[Misc] Allow fetch_* utils to access local files by default ( #29932 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-12-03 15:49:34 +08:00
JackieWu
0bec63fa31
[BugFix] fix imgs_pos in hunyuan_vl ( #29879 )
...
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-12-03 06:20:37 +00:00
elvischenv
c719c40540
[Bugfix] Defunctionalize TRTLLM AR+Norm op for avoiding extra clone kernel before it ( #29631 )
...
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-12-03 05:15:50 +00:00
Russell Bryant
b08025a83b
[Docs] Discuss api key limitations in security guide ( #29922 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-12-02 20:57:28 -08:00
Arpit Khandelwal
d7284a2604
[Core] Rename PassConfig flags as per RFC #27995 ( #29646 )
...
Signed-off-by: arpitkh101 <arpit5khandelwal@gmail.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-12-03 03:38:55 +00:00
Andreas Karatzas
506ed87e87
[ROCm][CI][Bugfix] Disable Flash/MemEfficient SDP on ROCm to avoid HF Transformers accuracy issues ( #29909 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2025-12-03 10:36:49 +08:00
Roger Wang
4dd7978374
[Bugfix] Fix regression on pooling models from PR#29621 ( #29921 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-03 10:33:45 +08:00
Lucas Wilkinson
5cdd664509
[BugFix] Fix assert in build_for_cudagraph_capture ( #29893 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-12-02 16:56:54 -08:00
Alexei-V-Ivanov-AMD
5f67361fd1
Reverting re-direction to amd_mi355_X. ( #29914 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
2025-12-03 00:40:02 +00:00
maang-h
5d91d2b292
[Doc] Add allocate_slots parameter docs ( #29777 )
...
Signed-off-by: maang <maang_h@163.com >
Signed-off-by: maang-h <55082429+maang-h@users.noreply.github.com >
Co-authored-by: Chen Zhang <zhangch99@outlook.com >
2025-12-02 23:23:09 +00:00
Micah Williamson
c014de1ec7
[ROCm][CI] Fix test_cudagraph_mode.py Failure For AMD CI ( #29808 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2025-12-02 22:54:36 +00:00
Julien Denize
1b1e35aaf9
[BUGFIX] Fix regex pattern for Mistral Tool Call ( #29918 )
...
Signed-off-by: juliendenize <julien.denize@mistral.ai >
2025-12-02 14:51:58 -08:00
Julien Denize
5e5646e206
[BUGFIX] llama_4_scaling wrongly passed to DeepseekAttention ( #29908 )
...
Signed-off-by: juliendenize <julien.denize@mistral.ai >
2025-12-02 14:51:20 -08:00
Chauncey
0a9caca9f5
[Bugfix] fix --scheduling-policy=priority & n>1 crashes engine ( #29764 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-12-02 22:42:28 +00:00
Sage Moore
e6f114ac25
[Bugfix][EPLB] Prevent user-provided EPLB config from being overwritten with defaults ( #29911 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
2025-12-02 13:20:22 -09:00
Harry Mellor
6fc5841db1
Fix some more Transformers nightly tests ( #29872 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-12-02 21:49:44 +00:00
dependabot[bot]
3ff5b53bc2
Bump actions/setup-python from 6.0.0 to 6.1.0 ( #29768 )
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-02 21:29:32 +00:00
jthomson04
1528e079e2
[Perf] Avoid pageable HtoD transfer in MinTokensLogitsProcessor ( #29826 )
...
Signed-off-by: jthomson04 <jwillthomson19@gmail.com >
2025-12-02 21:25:52 +00:00
Divakar Verma
afb1e5b380
[CI][ROCm][tests/v1/e2e] Fix multiprocessing launch for the test ( #29123 )
...
Signed-off-by: Divakar Verma <divakar.verma@amd.com >
2025-12-02 20:46:10 +00:00
Copilot
1c593e117d
Fix boolean nested params, add dict format support, and enhance plotting for vllm bench sweep ( #29025 )
...
Signed-off-by: Luka Govedič <luka.govedic@gmail.com >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com >
Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com >
Co-authored-by: Luka Govedič <luka.govedic@gmail.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-12-02 20:40:56 +00:00
Navanit Dubey
a2b053dc85
feat(model): Add BitsAndBytes quantization support for Qwen3-Omni-MoE ( #29896 )
...
Signed-off-by: navanit-git <navanitdubey@gmail.com >
2025-12-02 19:28:35 +00:00
Matthew Bonanni
1d93f11675
[Attention][CUDAGraph] Remove CG padding from attention backends ( #29352 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-12-02 13:48:08 -05:00
Benjamin Bartels
2d613de9ae
[CI/Build] Fixes missing runtime dependencies ( #29822 )
...
Signed-off-by: bbartels <benjamin@bartels.dev >
2025-12-02 10:21:49 -08:00
Alexei-V-Ivanov-AMD
c77b9929a0
Update AMD-CI testing mirror (as of 2025-12-02) ( #29898 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
2025-12-02 08:52:54 -09:00
Isotr0py
63b1da76ba
[Chore]: Reorganize gguf utils functions under transformers_utils ( #29891 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-12-02 17:33:23 +00:00
Andrew Xia
52cb349fc0
[responsesAPI][3] ResponsesParser to set up non harmony MCP ( #29413 )
...
Signed-off-by: Andrew Xia <axia@fb.com >
Co-authored-by: Andrew Xia <axia@fb.com >
2025-12-02 11:24:45 -05:00
Isotr0py
0ec8422171
[Bugfix] Fix incorrect channel order for idefics3 in edge case ( #29881 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: Isotr0py <2037008807@qq.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-02 16:03:52 +00:00
wang.yuqi
2eb4fe9129
[examples] Resettle pooling examples. ( #29365 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-12-02 15:54:28 +00:00
Matthew Bonanni
51c57b51dd
[Bugfix] Fix DeepSeek R1 MTP weight loading ( #29545 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: Benjamin Chislett <bchislett@nvidia.com >
2025-12-02 15:52:18 +00:00
ImaGoodFella
60c3d413af
[Multimodal][Core] Optimize multimodal preprocessing cache by hashing image bytes instead of pixel values ( #29621 )
...
Signed-off-by: Rahul Steiger <rasteiger@ethz.ch >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-12-02 21:49:02 +08:00
Cyrus Leung
68ffbca7e4
[Chore] Use tokenizer.encode and tokenizer.decode directly ( #29851 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-12-02 12:30:40 +00:00
Harry Mellor
951445a52d
Remove default values from InitVars so that they're not stored ( #29859 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-12-02 12:16:37 +00:00
Julien Denize
d8c6210eea
Add Mistral Large 3 and Ministral 3 ( #29757 )
...
Signed-off-by: Julien Denize <julien.denize@mistral.ai >
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com >
Signed-off-by: Mickael Seznec <mickael@mistral.ai >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Mickael Seznec <mickael@mistral.ai >
2025-12-02 10:29:00 +00:00
Louie Tsai
8bbcf8b6e7
[vLLM Benchmark Suite] Add default parameters section and update CPU benchmark cases ( #29381 )
...
Signed-off-by: Tsai, Louie <louie.tsai@intel.com >
Signed-off-by: Louie Tsai <louie.tsai@intel.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Li, Jiang <bigpyj64@gmail.com >
2025-12-02 09:00:23 +00:00
Boyuan Feng
70fb77b4dc
[BugFix] add max-num-batched-token to scheduler hash ( #29829 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2025-12-02 08:55:02 +00:00
杰兮
48d15a32aa
[CI] Fix Bad_words test for tokenizer encode/decode asymmetry ( #28193 )
...
Signed-off-by: zhyajie <yajizhan@amd.com >
Co-authored-by: zhyajie <yajizhan@amd.com >
2025-12-02 00:02:12 -08:00
Boyuan Feng
3b221cb661
[BugFix] respect VLLM_LOGGING_LEVEL in logger ( #29761 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2025-12-02 07:49:16 +00:00
Wushi Dong
0037b5746a
[Core] Eliminate redundant is_encoder_decoder lookups (20-40us/step) ( #29800 )
...
Signed-off-by: Wushi Dong <dongws@meta.com >
2025-12-02 07:08:07 +00:00
Harry Mellor
f5b0846ba0
Fix some Transformers nightly tests ( #29802 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-12-02 07:05:27 +00:00
Zhang Xiangze
13ea39bc09
[CPU]Parallelize over tokens in int4 moe ( #29600 )
...
Signed-off-by: Zhang Xiangze <Xiangze.Zhang@arm.com >
2025-12-02 06:21:39 +00:00
Shengqi Chen
4b612664fd
[CI] Renovation of nightly wheel build & generation (take 2) ( #29838 )
...
Signed-off-by: Shengqi Chen <harry-chen@outlook.com >
2025-12-01 22:17:10 -08:00
Cyrus Leung
653591d5e7
[Chore] Move tokenizer initialization methods ( #29793 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-12-02 13:33:37 +08:00
Divakar Verma
e2fbfc955e
[CI][AMD] spec_decode:eagle skip FLASH_ATTN for deepseek on ROCm ( #29827 )
...
Signed-off-by: Divakar Verma <divakar.verma@amd.com >
2025-12-02 05:27:46 +00:00
Divakar Verma
a690fb5bd6
[CI][ROCm] Fix test_correctness_sliding_window ( #29243 )
...
Signed-off-by: Divakar Verma <divakar.verma@amd.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-12-02 04:53:27 +00:00
usberkeley
81fe3f82af
[BugFix] Fix index error in ngram_proposer ( #29779 )
...
Signed-off-by: Bradley <bradley.b.pitt@gmail.com >
2025-12-02 04:48:11 +00:00
Zuyi Zhao
53bf71b0f0
[Misc] Update conftest for entrypoints/sagemaker test folder ( #29799 )
...
Signed-off-by: Zuyi Zhao <zhaozuy@amazon.com >
2025-12-01 18:56:39 -09:00
Johnny Yang
f441d36cee
Add missing return in _check_vllm_model_embed_input_ids ( #29834 )
...
Signed-off-by: Johnny Yang <johnnyyang@google.com >
2025-12-01 19:22:50 -08:00
Seiji Eicher
22274b2184
[Misc] Add ReplicaId to Ray metrics ( #24267 )
...
Signed-off-by: Seiji Eicher <seiji@anyscale.com >
Co-authored-by: rongfu.leng <1275177125@qq.com >
2025-12-02 03:21:44 +00:00
Wei Wei
fc95521ba5
[Misc] Throw error on unintended access to scheduler_config.max_model_len ( #29771 )
...
Signed-off-by: Wei Wei <wwei6@meta.com >
2025-12-02 10:58:44 +08:00
Zhuohan Li
d0cd728907
[Core] Support resetting all running requests' KV while calling reset_prefix_cache ( #28827 )
...
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com >
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-12-02 02:25:05 +00:00
Andrew Xia
fa8804ad9c
[responsesAPI][4] fix responseOutputItem Kimi K2 thinking bug ( #29555 )
...
Signed-off-by: Andrew Xia <axia@fb.com >
Co-authored-by: Andrew Xia <axia@fb.com >
2025-12-02 02:11:35 +00:00
Divakar Verma
4b40924998
[ROCm] Fallback pytorch GELU with tanh approximation to GELU() ( #29244 )
...
Signed-off-by: Divakar Verma <divakar.verma@amd.com >
Signed-off-by: Divakar Verma <137818590+divakar-amd@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-02 02:02:22 +00:00
Hendrik Holtmann
c0dfc89485
SM120 / NVFP4: add device guard and runtime SM dispatch to cutlass_scaled_fp4_mm ( #29711 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
2025-12-01 17:24:18 -08:00
Nick Hill
44822d7ff2
[BugFix] Preserve spec decoding uniform decode when scheduling ( #29759 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-12-01 17:15:52 -08:00
Alexei-V-Ivanov-AMD
342c4f1472
Updated CI mirror 2025-11-25 ( #29434 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
Signed-off-by: Alexei-V-Ivanov-AMD <156011006+Alexei-V-Ivanov-AMD@users.noreply.github.com >
Co-authored-by: Kevin H. Luu <khluu000@gmail.com >
2025-12-01 23:44:33 +00:00
Kevin H. Luu
1336a1ea24
Revert #29787 and #29690 ( #29815 )
2025-12-01 13:42:03 -08:00
Nengjun Ma
eaf81485ed
[Ascend]: Fixed the issue where OOT Platform vllm-ascend could not enable SP in Eager mode ( #28935 )
...
Signed-off-by: leo-pony <nengjunma@outlook.com >
2025-12-01 15:02:18 -05:00
Finbarr Timbers
38caf7fa1a
Update FAQ on interleaving sliding windows support ( #29796 )
...
Signed-off-by: Finbarr Timbers <finbarrtimbers@gmail.com >
2025-12-01 19:15:19 +00:00
shivampr
cabc77cc86
[Core][Observability] Add KV cache residency metrics ( #27793 )
...
Introduces three new Prometheus histograms for fine-grained observability of KV cache residency behavior:
- vllm:kv_block_lifetime_seconds: total lifetime from allocation to free
- vllm:kv_block_idle_before_evict_seconds: idle duration before eviction
- vllm:kv_block_reuse_gap_seconds: time between consecutive reuses of the same block
These metrics help operators analyze KV cache efficiency, reuse patterns, and eviction timing beyond simple utilization rates.
The implementation uses monotonic timestamps for accuracy and 1% sampling for minimal overhead (~48 bytes/block), and it is fully thread-safe, with zero runtime cost when disabled.
Two new runtime flags are introduced:
- --kv-cache-metrics: enable KV cache residency metrics
- --kv-cache-metrics-sample: control the sampling ratio (default: 0.01)
Signed-off-by: Shivam <shivamprasad91@gmail.com >
2025-12-01 18:27:53 +00:00
Kevin H. Luu
ec7035c9d4
[ci] Make distributed 8 gpus test optional ( #29801 )
...
Signed-off-by: Kevin H. Luu <khluu000@gmail.com >
2025-12-01 10:22:05 -08:00
knlnguyen1802
fc6acc88ca
[Bugfix] Missing cached item in the MultiModalReceiverCache ( #28525 )
...
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com >
Co-authored-by: Chenguang Zheng <645327136@qq.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-12-01 10:18:07 -08:00
BADAOUI Abdennacer
d0985c5feb
[Hardware][AMD] Remove ROCm skip conditions for transformers backend tests ( #29782 )
...
Signed-off-by: badaoui <abdennacerbadaoui0@gmail.com >
2025-12-02 02:03:13 +08:00
sangbumlikeagod
092bb73b8a
[Frontend] add 'verbose_json' and 'timestamp' feature on Whisper Transcription/Translation ( #24209 )
...
Signed-off-by: sangbumlikeagod <oironese@naver.com >
Signed-off-by: sangbumlikeagod <98077576+sangbumlikeagod@users.noreply.github.com >
2025-12-01 18:19:17 +01:00
FredericOdermatt
5d43f7372e
[Doc] Update description disable_any_whitespace ( #29784 )
...
Signed-off-by: Frederic Odermatt <frederic.odermatt@44ai.ch >
2025-12-01 16:48:33 +00:00
Shengqi Chen
37593deb02
[CI] fix url-encoding behavior in nightly metadata generation ( #29787 )
...
Signed-off-by: Shengqi Chen <harry-chen@outlook.com >
2025-12-01 23:17:20 +08:00
Liu Jinyi
f5516039c5
[Doc] fix heading levels ( #29783 )
...
Signed-off-by: KKKZOZ <kkkzoz@qq.com >
2025-12-01 14:49:22 +00:00
Shengqi Chen
36db0a35e4
[CI] Renovation of nightly wheel build & generation ( #29690 )
...
Signed-off-by: Shengqi Chen <harry-chen@outlook.com >
2025-12-01 21:25:39 +08:00
Marcin Ostrowski
5cfa967efa
[Bugfix] TypeError: 'NoneType' object is not callable ( #29414 )
...
Signed-off-by: Marcin Ostrowski <marcinx.ostrowski@intel.com >
2025-12-01 13:16:44 +00:00
Isotr0py
b95db244ee
[v1] Add real sliding window calculation to FlexAttention direct BlockMask building ( #26015 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com >
Co-authored-by: baonudesifeizhai <baonudesifeizhai@gmail.com >
2025-12-01 13:12:51 +00:00
Zhengxu Chen
ad9d656bfa
[multimodal][test] Reduce memory utilization for test_siglip to avoid OOM ( #29504 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-12-01 20:41:48 +08:00
Fanli Lin
f37e8938d2
[XPU] Fix AWQ skipped layer detection in IPEX quantization ( #29774 )
...
Signed-off-by: Fanli Lin <fanli.lin@intel.com >
2025-12-01 12:00:52 +00:00
Cyrus Leung
f0a28bf661
[Misc] Unify tokenizer registration ( #29767 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-12-01 11:34:58 +00:00
Mickaël Seznec
86e178f7c4
[crashfix] Eagle + multimodal can crash on mm cache miss ( #29750 )
...
Signed-off-by: Mickael Seznec <mickael@mistral.ai >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-12-01 17:29:33 +08:00
daniel-salib
014ece97c7
[Frontend] Add tool filtering support to ToolServer ( #29224 )
...
Signed-off-by: Daniel Salib <danielsalib@meta.com >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
2025-12-01 08:03:57 +00:00
wang.yuqi
62de4f4257
[Frontend] Resettle pooling entrypoints ( #29634 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2025-12-01 15:30:43 +08:00
Huamin Li
83805a6078
[CI] Skip paddleocr_vl for transformer 4.57.3 ( #29758 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com >
2025-12-01 04:38:06 +00:00
Yifei Zhang
1ab8fc8197
Make PyTorch profiler gzip and CUDA time dump configurable ( #29568 )
...
Signed-off-by: Yifei Zhang <yifei.zhang1992@outlook.com >
2025-12-01 04:30:46 +00:00
Shu Wang
f72a817bdf
[MoE] CuteDSL MoE with Nvfp4 DeepEP dispatch ( #27141 )
...
Signed-off-by: Shu Wang <shuw@nvidia.com >
Signed-off-by: Shu Wang. <shuw@nvidia.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: root <root@umbriel-b200-017.ipp4a1.colossus.nvidia.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-11-30 16:05:32 -08:00
Woosuk Kwon
ec38a7368d
[Model Runner V2] Use packed mask for prompt bin counts ( #29756 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-11-30 14:15:42 -08:00
Xingyu Liu
21c2627934
[Misc]Remove redundant hidden_size property in ModelConfig ( #29749 )
...
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-30 17:14:23 +00:00
Omer Ullman Argov
39d28108f4
[Feat] Support non-gated activations in NVFP4 modelopt path ( #29004 )
2025-11-30 11:02:40 -05:00
Harry Mellor
cd719de5cb
Fix RoPE failures in Transformers nightly ( #29700 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-30 14:29:32 +00:00
Pleaplusone
8c363ed666
[ROCm][Attention] Sliding window support for AiterFlashAttentionBackend ( #29234 )
...
Signed-off-by: ganyi <ygan@amd.com >
2025-11-30 11:31:50 +00:00
Cyrus Leung
64bc09ba27
[Core] Enable inputs_embeds_size separate from hidden_size ( #29741 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-30 17:31:12 +08:00
Isotr0py
47539cfd3e
[Bugfix] Fix mismatched nvfp4 gemm output shape ( #29742 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-11-30 09:15:01 +00:00
Cyrus Leung
2afcec4dec
[Misc] Update TokenizerLike interface and move get_cached_tokenizer ( #29730 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-30 14:59:47 +08:00
朝
9381b5cde0
[Doc]: Fix typo in fused_moe layer ( #29731 )
...
Signed-off-by: BowTen <bowten@qq.com >
2025-11-29 22:29:13 -08:00
Vensen
66b5840287
[Bugfix][sleepmode][fp8 kv cache]: Fix FP8 KV cache + sleep(level=2) gibberish output ( #28783 )
...
Signed-off-by: vensen <vensenmu@gmail.com >
Co-authored-by: TJian <tunjian.tan@embeddedllm.com >
2025-11-30 14:24:25 +08:00
Huamin Li
82c795d6f2
Fix AttributeError about _use_fi_prefill ( #29734 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com >
2025-11-30 06:04:55 +00:00
Isotr0py
e1464c3a08
[Quantization] Enable compressed-tensors AWQ for Turing GPU ( #29732 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-11-30 06:04:28 +00:00
Xin Yang
a491b0911b
[LoRA] Support FusedMoE LoRA Triton kernel for mxfp4 ( #29708 )
...
Signed-off-by: Xin Yang <xyangx@amazon.com >
Signed-off-by: Xin Yang <105740670+xyang16@users.noreply.github.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-30 10:37:25 +08:00
Jee Jee Li
b9d0504a36
[Bugfix] Revert test_tokenization.py ( #29729 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-29 16:35:15 +00:00
Jinzhen Lin
1656ad3704
[Kernel][Quantization] add w4a8 support for marlin kernel ( #24722 )
...
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Michael Goin <mgoin@redhat.com >
2025-11-29 07:19:33 -08:00
Cyrus Leung
fa59fe417f
[Chore] Move detokenizer_utils to vllm/tokenizers ( #29727 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-29 06:25:17 -08:00
Cyrus Leung
fe3398fab2
[Chore] Enable passing tokenizer=None into MM processor ( #29724 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-29 06:25:10 -08:00
Chukwuma Nwaugha
ad7f714d62
hfrunner.classify should return list[list[float]] not list[str] ( #29671 )
...
Signed-off-by: Chukwuma Nwaugha <nwaughac@gmail.com >
2025-11-29 13:57:00 +00:00
dublc
f4341f45d3
[Doc]: fix code block rendering ( #29728 )
...
Signed-off-by: dublc <jdublc0x@gmail.com >
2025-11-29 13:46:48 +00:00
Cyrus Leung
34a984274e
[Misc] Refactor tokenizer interface ( #29693 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-29 04:02:21 -08:00
Woosuk Kwon
f223ed4181
[Model Runner V2] Fuse penalties and temperature into single kernel ( #29720 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-11-29 02:29:16 -08:00
Didier Durand
04a797cd0e
[Doc]: fixing typos in various files. ( #29717 )
...
Signed-off-by: Didier Durand <durand.didier@gmail.com >
2025-11-29 01:15:39 -08:00
Woosuk Kwon
6afc0ffaf6
[Model Runner V2] Add sample/ directory and reorganize files ( #29719 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-11-29 00:41:01 -08:00
Jee Jee Li
39e63dec7c
[LoRA] Cleanup LoRA unused code ( #29611 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-11-28 22:52:58 -08:00
Woosuk Kwon
4a80ad0a25
[Model Runner V2] Don't use UVA buffer for prefill_len ( #29713 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-11-28 20:27:16 -08:00
Angela Yi
4b17ce6815
Add gpu memory wait before test_async_tp ( #28893 )
...
Signed-off-by: angelayi <yiangela7@gmail.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-11-28 20:19:05 -08:00
Lucas Wilkinson
e23f665d83
[BugFix] Fix DBO failing with TypeError: 'NoneType' object is not iterable ( #29698 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-11-28 20:19:01 -08:00
Woosuk Kwon
ca1b1e7296
[Model Runner V2] Refactor prefill token preparation ( #29712 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-11-28 19:49:17 -08:00
Tsukasa OI
762a4a6ca9
[Frontend] Perform offline path replacement to tokenizer ( #29706 )
...
Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com >
2025-11-28 18:32:08 -08:00
Cyrus Leung
b2c50eda50
[Bugfix] Fix wrong mock attribute ( #29704 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-29 10:30:41 +08:00
Woosuk Kwon
1dcafb3dea
[Model Runner V2] Support penalties using bin counts ( #29703 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-11-28 17:53:17 -08:00
Andreas Karatzas
ea3370b428
[ROCm][Bugfix] Patch for the Multi-Modal Processor Test group ( #29702 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2025-11-29 01:31:44 +00:00
Mert Unsal
c625d7b1c6
[Bugfix] Fix O(n²) multimodal string prompt processing ( #29667 )
...
Signed-off-by: mertunsall <mertunsal1905@gmail.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-11-28 16:10:39 -08:00
Zhengxu Chen
6173682b6e
[compile] Include enable_sleep_mode into caching factors. ( #29696 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com >
2025-11-29 07:58:38 +08:00
Augusto Yao
9726e64530
bugfix: correct attn output with base 2 or e ( #28840 )
...
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com >
2025-11-29 07:52:12 +08:00
Huamin Li
3fd1fb0b60
Revert "[LoRA] Support FusedMoE LoRA Triton kernel for mxfp4 ( #28971 )" ( #29697 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com >
2025-11-28 15:26:52 -08:00
Jiangyun Zhu
a51f4186f2
[Bugfix] fix dots.llm1.inst ( #29687 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-11-28 15:25:26 -08:00
Cyrus Leung
7675ba30de
[Misc] Remove redundant ClassRegistry ( #29681 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-28 15:24:47 -08:00
Ralf Gommers
7c1ed45848
[CI/Build]: make it possible to build with a free-threaded interpreter ( #29241 )
...
Signed-off-by: Ralf Gommers <ralf.gommers@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-11-28 15:21:46 -08:00
Benjamin Chislett
1986de1375
[Perf] Optimize EAGLE prepare_inputs_padded with triton kernels ( #28597 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com >
2025-11-28 22:25:05 +00:00
Yanan Cao
3461e7efd8
[Frontend] Remap -O to -cc commandline flag ( #29557 )
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com >
Co-authored-by: Claude <noreply@anthropic.com >
2025-11-28 21:51:12 +00:00
Harry Mellor
fecae12cd7
Remove all_special_tokens_extended from tokenizer code ( #29686 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-28 20:26:51 +00:00
Cyrus Leung
8d9338fae4
[Chore] Rename Processor to InputProcessor ( #29682 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-28 09:35:41 -08:00
Isotr0py
d40c854009
[CI/Build] Rework CPU multimodal processor test ( #29684 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-11-28 17:10:29 +00:00
Harry Mellor
4332955602
[Docs] Add CLI reference doc for vllm bench sweep plot_pareto ( #29689 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-28 08:10:08 -09:00
Isotr0py
f946a8d743
[Chore]: Reorganize model repo operating functions in transformers_utils ( #29680 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-11-28 08:46:51 -08:00
Isotr0py
6f9d81d03b
[V0 deprecation] Clean up legacy paged attention helper functions ( #28043 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-11-28 16:44:33 +00:00
Didier Durand
fae6943068
[Doc]: fixing typos in multiple files. ( #29685 )
...
Signed-off-by: Didier Durand <durand.didier@gmail.com >
2025-11-28 08:41:41 -08:00
果冻虾仁
3bcbb30cbf
add add_truncate_prompt_tokens in repr for PoolingParams ( #29683 )
2025-11-28 08:41:05 -08:00
Cyrus Leung
9e6bcda3ac
[mypy] Enable type checking for more directories ( #29674 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-28 08:39:27 -08:00
Harry Mellor
9eec282cb5
Guard FlashInfer sampler using the same check as FlashInfer attention backend ( #29415 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-11-28 08:34:48 -08:00
Cyrus Leung
0808eb813b
[Misc] Remove yapf directives ( #29675 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-28 15:07:23 +00:00
Mingyuan Ma
460d8bbf2d
Remove upstream fa checks ( #29471 )
...
Signed-off-by: mingyuanm <mingyuanm@nvidia.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-11-28 05:52:42 -08:00
Li, Jiang
e2f56c309d
[CPU] Update torch 2.9.1 for CPU backend ( #29664 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2025-11-28 13:37:54 +00:00
HappyAmazonian
f8151b66fa
Revert "Supress verbose logs from model_hosting_container_standards (… ( #29335 )
...
Signed-off-by: Shen Teng <sheteng@amazon.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-11-28 05:29:05 -08:00
Cyrus Leung
1168768a2d
[Optimization] Early return for _apply_matches and _iter_placeholders ( #29668 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-28 13:26:47 +00:00
Nick Hill
8e7a891602
[BugFix] Fix spec decoding max_tokens scheduling perf issue ( #29542 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-28 20:52:23 +08:00
Cyrus Leung
953d9c820b
[mypy] Pass type checking for vllm/utils and vllm/v1/pool ( #29666 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-28 20:40:47 +08:00
Cyrus Leung
33b06a6f24
[Misc] Remove redundant attention var constants ( #29650 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-28 04:35:19 -08:00
Wilson Wu
5c2b5cb422
[Docs] Add SPLADE and Ultravox models to supported models documentation ( #29659 )
...
Signed-off-by: Wilson Wu <iwilsonwu@gmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-11-28 01:29:28 -09:00
杰兮
3cb32e5d6e
[Rocm] Set VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS to disabled by default ( #28985 )
...
Signed-off-by: zhyajie <yajizhan@amd.com >
Co-authored-by: zhyajie <yajizhan@amd.com >
Co-authored-by: TJian <tunjian.tan@embeddedllm.com >
2025-11-28 02:08:42 -08:00
Cyrus Leung
ccbdf51bd5
[Doc] Reorganize benchmark docs ( #29658 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-28 17:19:25 +08:00
Filipp Fisin
5f5521bd5d
Fix parameter order in GPT-OSS weight loading function for non-MXFP4 weights ( #29506 )
...
Signed-off-by: Filipp Fisin <48059208+qGentry@users.noreply.github.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-11-28 00:45:10 -08:00
Julien Denize
b2c1d294fa
[BUGFIX] MistralTokenizer.__call__ adds an invalid EOS token ( #29607 )
...
Signed-off-by: Julien Denize <julien.denize@mistral.ai >
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-11-28 16:44:47 +08:00
maang-h
cc0f2a0e19
[Doc] Improve abnormal information string ( #29655 )
...
Signed-off-by: maang <maang_h@163.com >
2025-11-28 00:12:20 -08:00
rongfu.leng
480598958e
[Feature][Bench] Add pareto visualization ( #29477 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
2025-11-27 23:53:20 -08:00
Cyrus Leung
b34e8775a3
Revert "[CPU]Update CPU PyTorch to 2.9.0 ( #29589 )" ( #29647 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-27 22:43:18 -08:00
wang.yuqi
f4b76056ee
Improve enable chunked_prefill & prefix_caching logic. ( #26623 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Signed-off-by: wang.yuqi <noooop@126.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-11-27 22:05:48 -08:00
EanWang211123
37b15e97e8
[Multimodal][Speculative Decoding]Eagle3 mm support, enablement on qwen3vl ( #29594 )
...
Signed-off-by: Tsai, Louie <louie.tsai@intel.com >
Signed-off-by: EanWang211123 <wangyiheng@sangfor.com.cn >
Co-authored-by: Louie Tsai <louie.tsai@intel.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-11-27 22:05:45 -08:00
maang-h
c7ba1f6bc7
[BugFix] Fix ValueError in NewRequestData repr methods ( #29392 )
...
Signed-off-by: maang <maang_h@163.com >
2025-11-28 13:42:30 +08:00
Wilson Wu
18523b87f6
[Docs] Update supported models for Olmo 3 in tool calling documentation ( #29411 )
...
Signed-off-by: Wilson Wu <iwilsonwu@gmail.com >
2025-11-28 02:53:55 +00:00
Xin Yang
745a3bae1a
[LoRA] Support FusedMoE LoRA Triton kernel for mxfp4 ( #28971 )
...
Signed-off-by: Xin Yang <xyangx@amazon.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-28 10:48:28 +08:00
scydas
35657bcd7a
[CPU]Update CPU PyTorch to 2.9.0 ( #29589 )
Signed-off-by: scyda <scyda@outlook.com >
Co-authored-by: Li, Jiang <jiang1.li@intel.com >
2025-11-28 09:34:33 +08:00
Lucas Wilkinson
be493e0b3c
[BugFix] Fix new nightly failures ( #29578 )
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-11-27 13:45:38 -08:00
Woosuk Kwon
ae0ce1be27
[Model Runner V2][BugFix] Keep reference to GPU tensors in AsyncOutput ( #29623 )
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-11-27 12:38:53 -08:00
Andrii Skliar
a5345bf49d
[BugFix] Fix plan API Mismatch when using latest FlashInfer ( #29426 )
Signed-off-by: Andrii Skliar <askliar@askliar-mlt.client.nvidia.com >
Co-authored-by: Andrii Skliar <askliar@askliar-mlt.client.nvidia.com >
2025-11-27 11:34:59 -08:00
Nicolò Lucchesi
e5a621b724
[CI] Add batched audios Whisper test ( #29308 )
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-11-27 19:31:52 +00:00
Isotr0py
38658ec6f3
[Bugfix][MM encoder] Fix ViT attention backend resolving for Turing GPU ( #29614 )
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-11-27 19:17:37 +00:00
Cyrus Leung
a24ea5414b
[Deprecation] Advance deprecation status ( #29617 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-27 19:04:58 +00:00
Cyrus Leung
ea228b4491
[Misc] Remove unused code from protocol.py ( #29616 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-27 18:39:59 +00:00
果冻虾仁
d45269b378
add skip_reading_prefix_cache in repr for PoolingParams ( #29620 )
2025-11-27 09:21:00 -08:00
Cyrus Leung
ee9841daa9
[Bugfix] Fix doc build on main ( #29619 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-27 09:08:08 -08:00
Injae Ryou
0840abdd24
[BugFix] Optional tokenizer argument when loading GGUF models ( #29582 )
Signed-off-by: Injae Ryou <injaeryou@gmail.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-11-27 16:53:10 +00:00
Harry Mellor
e1f262337b
Update Transformers pin in CI to 4.57.3 ( #29418 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-27 08:42:14 -08:00
Matthew Bonanni
fc1d8be3dc
[Attention] Update attention imports ( #29540 )
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-11-27 11:19:09 -05:00
Mathis Felardos
cd007a53b4
[bugfix] avoid NIXL_ERR_REMOTE_DISCONNECT in nixl_connector when Prefill dies ( #28120 )
Signed-off-by: Mathis Felardos <mathis@mistral.ai >
2025-11-27 15:32:38 +00:00
Didier Durand
66d3d5422c
[Doc]: fixing typos in diverse files ( #29492 )
Signed-off-by: Didier Durand <durand.didier@gmail.com >
2025-11-27 07:15:50 -08:00
Ryan Rock
bab438ff3e
[CI/Build] Skip ray tests on ROCm ( #29556 )
Signed-off-by: Ryan Rock <ryan.rock@amd.com >
2025-11-27 07:01:37 -08:00
Li, Jiang
882851dc81
[CI/Build][Bugfix] Fix auto label issues for CPU ( #29610 )
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2025-11-27 14:51:26 +00:00
Jee Jee Li
2f5f9acd55
[LoRA] Continue optimizing MoE LoRA weight loading ( #29322 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-27 05:56:28 -08:00
Roger Wang
cf348c8d27
[Bugfix] Fix HunyuanVL XD-RoPE ( #29593 )
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored by: grider-transwithai <grider@transwith.ai >
2025-11-27 12:36:24 +00:00
Li, Jiang
a5abd1d384
[CI] Auto label CPU related issues ( #29602 )
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2025-11-27 11:33:19 +00:00
Cyrus Leung
e6d4f3c254
[Bugfix] Fix pre-commit ( #29601 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-27 02:23:06 -08:00
maang-h
51906c8c55
[Docs] Improve priority parameter documentation ( #29572 )
Signed-off-by: maang <maang_h@163.com >
Signed-off-by: maang-h <55082429+maang-h@users.noreply.github.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-11-27 02:09:24 -08:00
Morrison Turnansky
0838b52e2e
[Frontend][torch.compile] CompilationConfig Overhaul ( #20283 ): Set up -O infrastructure ( #26847 )
Signed-off-by: morrison-turnansky <mturnans@redhat.com >
Signed-off-by: adabeyta <aabeyta@redhat.com >
Signed-off-by: Morrison Turnansky <mturnans@redhat.com >
Co-authored-by: adabeyta <aabeyta@redhat.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-27 01:55:58 -08:00
Cyrus Leung
00d3310d2d
[Bugfix] Update Ultravox compatibility ( #29588 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-27 01:36:18 -08:00
Woosuk Kwon
da3222f371
[Model Runner V2] Implement multi-step Eagle with CUDA graph ( #29559 )
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-11-27 00:09:41 -08:00
Micah Williamson
43c5792592
[ROCm][CI] Fix test_cpu_offloading for ROCm ( #29548 )
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2025-11-27 07:54:44 +00:00
Johnny Yang
3ecabd06ee
Fix tpu-inference platform path ( #29554 )
Signed-off-by: Johnny Yang <johnnyyang@google.com >
2025-11-26 23:25:21 -08:00
Jee Jee Li
c069086b9c
[Bugfix] Fix getting device for MoE LoRA ( #29475 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-26 23:16:07 -08:00
Woosuk Kwon
11ea5ec1ff
[Model Runner V2] Refactor CudaGraphManager ( #29583 )
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-11-26 21:37:59 -08:00
Fadi Arafeh
ecb1952378
[cpu][fix] Fix Arm CI tests ( #29552 )
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
2025-11-27 13:09:41 +08:00
TJian
da8e1a1bf9
[DOC] Add vLLM Bangkok Meetup info ( #29561 )
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-11-27 04:42:50 +00:00
Woosuk Kwon
ee80aee1ca
[Model Runner V2] Minor cleanup for build_attn_metadata ( #29576 )
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-11-26 20:10:12 -08:00
Woosuk Kwon
0aeb698b77
[Model Runner V2] Minor code cleanup ( #29570 )
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-11-26 19:47:17 -08:00
Louie Tsai
9bb33c8919
add xpu supported model and model id for cpu ( #29380 )
Signed-off-by: Tsai, Louie <louie.tsai@intel.com >
2025-11-27 11:30:50 +08:00
Jinzhen Lin
a67dec7cba
[Bugfix] fix IMA issue in certain cases of the moe marlin kernel ( #28619 )
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com >
Co-authored-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-11-26 19:02:21 -08:00
Matthew Bonanni
77740191de
[Attention][Async] Eliminate seq_lens_cpu in FlashAttention metadata building with DCP > 1 ( #29449 )
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-11-26 18:48:43 -08:00
HDCharles
df01eda4dc
[Bugfix] Make compressed-tensors MoEs respect ignored layers ( #28878 )
Signed-off-by: HDCharles <charlesdavidhernandez@gmail.com >
2025-11-26 21:35:13 -05:00
Johnny Yang
ba1fcd84a7
[TPU] add tpu_inference ( #27277 )
Signed-off-by: Johnny Yang <johnnyyang@google.com >
2025-11-26 14:46:36 -08:00
Lucas Wilkinson
56539cddac
[Core] Refactor padding logic and pad for CUDA graphs before attention metadata building ( #28579 )
2025-11-26 14:07:13 -05:00
Matthew Bonanni
430dd4d9eb
[Attention] Remove imports from vllm/attention/__init__.py ( #29342 )
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-11-26 10:53:15 -07:00
Alec
c4c0354eec
[CI/Build] allow user modify pplx and deepep ref by ENV or command line ( #29131 )
Signed-off-by: alec-flowers <aflowers@nvidia.com >
2025-11-26 17:41:16 +00:00
HDCharles
e603129505
[refactor] CTConfig methods to static/class methods ( #28870 )
Signed-off-by: HDCharles <charlesdavidhernandez@gmail.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-11-26 17:21:58 +00:00
Wentao Ye
0b0aa874e8
[Perf] Optimize batch invariant BMM, 18.1% Throughput improvement, 10.7% TTFT improvement ( #29345 )
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-26 09:38:52 -07:00
Huamin Li
70d5953f82
Revert "[Bugfix] Fix GPT-OSS AR+NORM fusion ( #28841 )" ( #29483 )
Signed-off-by: Huamin Li <3ericli@gmail.com >
2025-11-26 22:27:26 +08:00
yxt
3650a74ed8
Optimize the wording of the document and unify the terminology and th… ( #29491 )
2025-11-26 05:16:12 -08:00
Yejing Lai
bb706d6048
Fix TeleChatForCausalLM not register issue ( #29473 )
Signed-off-by: Lai, Yejing <yejing.lai@intel.com >
2025-11-26 05:15:00 -08:00
Cyrus Leung
e30859dff3
[Bugfix] Fix handling of image embeds in models ( #29480 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-26 05:00:15 -08:00
Roger Wang
452a7c9f7c
[Misc] Allow LM only loading for Pixtral ( #29451 )
Signed-off-by: Roger Wang <hey@rogerw.io >
2025-11-26 05:00:00 -08:00
Pleaplusone
d9d342d214
[Performance][MLA][ROCm] Remove redundant D2D copy in deepseek ( #27457 )
Signed-off-by: ganyi <ygan@amd.com >
2025-11-26 12:45:28 +08:00
Xin Yang
53d7f1f601
[Kernel] Use pre-allocated output buffer for triton kernel fused_experts ( #29219 )
Signed-off-by: Xin Yang <xyangx@amazon.com >
2025-11-26 10:21:00 +08:00
dependabot[bot]
c5ee430328
Bump actions/checkout from 4 to 6 ( #29293 )
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-11-26 01:57:08 +00:00
Michael Goin
8d6a89dffd
[UX] Suppress gloo log spam ( #29250 )
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-11-25 17:19:35 -08:00
George D. Torres
56531b79cc
[Misc] Add backup hash algorithm for FIPS constrained environments ( #28795 )
Signed-off-by: George D. Torres <gdavtor@gmail.com >
Signed-off-by: George D. Torres <41129492+geodavic@users.noreply.github.com >
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: Russell Bryant <rbryant@redhat.com >
2025-11-26 00:50:22 +00:00
Xieyang Xu
12866af748
dummy run corner case ( #29433 )
2025-11-26 00:20:35 +00:00
Lucia Fang
d8819c88eb
fix assertion for single world use case (uni) ( #29429 )
Signed-off-by: Lu Fang <fanglu@fb.com >
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com >
2025-11-26 00:14:23 +00:00
Andrey Khalyavin
de75b0bb70
[BugFix] Fix initialization of draft model. ( #29319 )
Signed-off-by: Andrey Khalyavin <halyavin@yandex-team.ru >
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
2025-11-25 18:45:58 -05:00
Michael Goin
7df0289782
Change warning logs to debug for unimplemented MXFP4 Linear/Attention ( #29441 )
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-11-25 22:52:31 +00:00
Zhengxu Chen
0abc79482a
[caching] Add enable_prompt_embeds and cpu_offload_gb to compile hashes. ( #29435 )
Signed-off-by: zhxchen17 <zhxchen17@fb.com >
2025-11-25 21:46:41 +00:00
Nick Hill
4e57c6587f
[Core] Support logprobs with spec decode + async scheduling ( #29223 )
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-25 12:55:24 -08:00
Ilya Markov
e7d776273d
[Compile] Refactor. Move PostGradPassManager out of Compilation config ( #29340 )
Signed-off-by: ilmarkov <markovilya197@gmail.com >
2025-11-25 19:58:56 +00:00
Eldar Kurtić
c32a18cbe7
Attempt to fix GPU OOM in a spec-decoding test ( #29419 )
Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com >
2025-11-25 14:23:36 -05:00
Andrew Xia
b07555d26f
[responsesAPI][2] parse ResponseFunctionToolCallOutputItem ( #29383 )
Signed-off-by: Andrew Xia <axia@fb.com >
Co-authored-by: Andrew Xia <axia@fb.com >
2025-11-25 10:27:26 -08:00
Harry Mellor
0353d2e162
Fix RoPE related failures in Transformers nightly tests ( #29333 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-25 16:23:45 +00:00
Harry Mellor
a1f2676879
Scheduled removal of override_pooler_config and disable_log_requests ( #29402 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-25 16:08:57 +00:00
Yifan Qiao
48ddb02b79
[Hybrid Allocator] Support KV cache groups with different block_size ( #29143 )
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu >
Co-authored-by: Chen Zhang <zhangch99@outlook.com >
2025-11-25 10:30:57 -05:00
Michael Goin
e502098643
[Kernel] Add NVFP4 MoE CUTLASS support for SM120 ( #29242 )
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
2025-11-25 06:59:07 -08:00
Michael Goin
dbc3d9991a
[UX] Put CUDA attention backend selection log into one line ( #29337 )
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-11-25 06:46:18 -08:00
Injae Ryou
794029f012
[Feature]: Improve GGUF loading from HuggingFace user experience like repo_id:quant_type ( #29137 )
Signed-off-by: Injae Ryou <injaeryou@gmail.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-11-25 14:28:53 +00:00
Eldar Kurtić
0231ce836a
Revert back to torch.equal over torch.allclose from #28819 ( #29086 )
Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com >
2025-11-25 14:23:38 +00:00
Thomas Parnell
516c3f7847
[Bugfix] Fix logic for choosing default prefix caching setting ( #29393 )
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
2025-11-25 14:05:10 +00:00
Harry Mellor
51fc9e017a
Scheduled removal of CompilationConfig.use_inductor ( #29323 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-25 12:55:42 +00:00
Harry Mellor
bf0c75cd4f
Make Transformers Nightly tests soft-fail and enable all tests ( #29401 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-25 12:41:15 +00:00
Roger Wang
c2c661af9b
[Bugfix] Fix overallocation in MM profiling ( #29386 )
Signed-off-by: Roger Wang <hey@rogerw.io >
2025-11-25 12:38:36 +00:00
Nicolò Lucchesi
798e87db5c
[Core] Generalize Encoder-Decoder seq_lens computation to avoid Whisper hardcoded logic ( #29268 )
Signed-off-by: NickLucche <nlucches@redhat.com >
Co-authored-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com >
2025-11-25 11:32:11 +00:00
wang.yuqi
de6889946b
[Misc] Suppress log outputs when constructing the default vllm config. ( #29291 )
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Signed-off-by: wang.yuqi <noooop@126.com >
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-25 03:00:44 -08:00
wang.yuqi
7a80b01889
[CI] Resettle pooling entrypoints tests. ( #29370 )
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2025-11-25 10:39:10 +00:00
Ben Browning
e1dd706cd1
[Frontend] Respect Chat Completion parallel_tool_calls param ( #26233 )
Signed-off-by: Ben Browning <bbrownin@redhat.com >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
2025-11-25 09:56:15 +00:00
Andrew Xia
a685b47c57
[responsesAPI] refactor construct_input_messages ( #29359 )
Signed-off-by: Andrew Xia <axia@fb.com >
Co-authored-by: Andrew Xia <axia@fb.com >
2025-11-25 09:47:10 +00:00
Avishek Goswami
32c40b95e0
[BugFix] bad_words filtering ineffective when n > 1 ( #29313 )
Signed-off-by: GOavi101 <1704178@kiit.ac.in >
2025-11-25 09:36:34 +00:00
Nick Hill
db2906108a
[Misc] Streamline unique id generation ( #29375 )
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-25 08:30:11 +00:00
wang.yuqi
67fc16cd8c
[Bugfix] If chunked_prefill is disabled, end the scheduling early. ( #28911 )
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2025-11-25 16:06:09 +08:00
elvischenv
6330f9477d
[Bugfix] Fix GPT-OSS AR+NORM fusion ( #28841 )
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com >
2025-11-25 07:59:40 +00:00
Micah Williamson
ef1f7030f0
[ROCm][CI] Fix test_cudagraph_mode failure in AMD CI ( #29367 )
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2025-11-25 07:55:09 +00:00
Rémi Delacourt
12c007e288
EAGLE Support DP>1 ( #26086 )
Signed-off-by: Rémi Delacourt <remi@mistral.ai >
Signed-off-by: Rémi Delacourt <54138269+Flechman@users.noreply.github.com >
Signed-off-by: remi <remi@mistral.ai >
2025-11-25 07:32:21 +00:00
zhrrr
f242cfcdd5
[Perf] use cpu all reduce to avoid sync when async_scheduling & dp > 1 ( #29311 )
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com >
2025-11-25 15:31:07 +08:00
Icey
888152bf87
Allow oot custom compiler extension via CompilerInterface ( #28623 )
Signed-off-by: wxsIcey <1790571317@qq.com >
Signed-off-by: Mengqing Cao <cmq0113@163.com >
Signed-off-by: Icey <1790571317@qq.com >
Co-authored-by: Mengqing Cao <cmq0113@163.com >
2025-11-25 15:25:15 +08:00
Ryan Rock
fe3a4f5b34
[CI/Build] Pin torchgeo dependency for AMD ( #29353 )
Signed-off-by: Ryan Rock <ryan.rock@amd.com >
2025-11-25 07:14:59 +00:00
Fadi Arafeh
98caeadd54
[fix][cpu] Use a SwigluOAI impl which supports interleaved gate-up wei ( #29273 )
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
2025-11-25 15:11:11 +08:00
vllmellm
64deead719
[Bugfix] [ROCm] [UX]: revert Flex attention backend ( #29371 )
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2025-11-25 06:56:06 +00:00
Nick Hill
7992324f23
[BugFix] Use unique ids for different transcription prompts ( #29372 )
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-25 06:55:16 +00:00
Inoki
40a6f53f6c
Display warning only when ROCm version is less than Pytorch required version ( #29200 )
Signed-off-by: Inoki <inoki@inoki.cc >
2025-11-25 14:40:06 +08:00
kflu
ce58fdc1c3
Fix PoolingParams.skip_reading_prefix_cache type ( #29364 )
Signed-off-by: KFL <kludev@gmail.com >
2025-11-25 06:39:29 +00:00
Fanli Lin
a21256c463
Add TP CLI argument to multimodal inference examples ( #29301 )
Signed-off-by: Lin, Fanli <fanli.lin@intel.com >
2025-11-25 06:03:20 +00:00
Harry Mellor
316c8492bf
Scheduled removal of guided_* config fields ( #29326 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-25 05:24:05 +00:00
Lucas Wilkinson
2d9ee28cab
[CI/Test Fix] Fix CP tests on Blackwell ( #29338 )
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-11-24 20:55:57 -08:00
Jiangyun Zhu
81db702ed2
[Attention] add _cudagraph_support for linear attention ( #28934 )
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2025-11-25 12:25:20 +08:00
Isotr0py
92effb07a4
[Model] Add HunyuanOCR support ( #29327 )
Signed-off-by: manayang <jackmanayang@gmail.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: sergeywang <sergeywang@tencent.com >
Co-authored-by: manayang <jackmanayang@gmail.com >
Co-authored-by: manayang <manayang@tencent.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-11-25 03:28:51 +00:00
Maryam Tahhan
87185c88d5
[Bugfix] Make deprecated --task embedding consistent with `--runner… ( #29312 )
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com >
2025-11-25 03:19:52 +00:00
Mark McLoughlin
9cf4edae6e
[Metrics] Scheduled removal of deprecated metrics ( #29330 )
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-11-25 11:15:13 +08:00
汪志鹏
7012d8b45e
[Docker] Optimize Dockerfile: consolidate apt-get and reduce image size by ~200MB ( #29060 )
Signed-off-by: princepride <wangzhipeng628@gmail.com >
2025-11-24 19:54:00 -07:00
Divakar Verma
22b42b5402
[CI][ROCm] Install arctic-inference on ROCm tests ( #29344 )
Signed-off-by: Divakar Verma <divakar.verma@amd.com >
2025-11-25 02:15:39 +00:00
gbyu-amd
cb7214d8ea
[ROCm][MLA] enable fp8 MLA decode on ROCm ( #28032 )
Signed-off-by: guanbao <gyu@amd.com >
Signed-off-by: Guanbao Yu <gyu@amd.com >
Signed-off-by: gbyu-amd <Guanbao.Yu@amd.com >
Co-authored-by: guanbao <gyu@amd.com >
2025-11-25 10:15:02 +08:00
Pleaplusone
77e10c9cab
[Perf][Deepseek] optimize gather_and_maybe_dequant_cache kernel's perf for extremely long sequence ( #28029 )
Signed-off-by: ganyi <ygan@amd.com >
2025-11-24 19:05:46 -07:00
Michael Goin
6f1355a1b7
[Perf] Disable DeepGEMM MoE by default when TP=8 is used ( #29346 )
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-11-24 19:01:40 -07:00
Harry Mellor
a4ad43ad5a
Scheduled removal of ParallelConfig's direct child EPLB fields ( #29324 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-25 01:58:58 +00:00
Nick Hill
a178a0b40b
[BugFix] Fix duplicate id tool-call race condition ( #29355 )
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-25 01:54:26 +00:00
Kunshang Ji
b8328b49fb
[XPU] upgrade torch & ipex 2.9 on XPU platform ( #29307 )
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2025-11-25 09:34:47 +08:00
Hanjie Qiu
5f9679a43b
[Spec Decode] Add support for EAGLE3 heads that do not use_aux_hidden_states ( #27688 )
Signed-off-by: hjjq <hanjieq@nvidia.com >
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
Co-authored-by: Benjamin Chislett <bchislett@nvidia.com >
2025-11-24 20:13:12 -05:00
Wentao Ye
699bca76c0
[UX] Raise error for attn backend of batch invariant ( #29348 )
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-11-24 17:49:01 -07:00
Michael Goin
c17610e2ba
[Bugfix] Only use triton_kernels for MXFP4 on SM90 and SM100 ( #29339 )
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-11-24 18:22:46 -05:00
Chen Zhang
71df2a57ef
[Hybrid Allocator] Better layer padding strategy for gpt-oss eagle ( #29303 )
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-11-24 14:28:32 -08:00
Tyler Michael Smith
4dd42db566
Remove VLLM_SKIP_WARMUP tip ( #29331 )
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
2025-11-24 22:16:05 +00:00
Nick Hill
84371daf75
[Tests] Verify gpt_oss package is installed in harmony tests ( #29336 )
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-24 22:04:31 +00:00
Woosuk Kwon
f32c7d6f54
[Model Runner V2] Simplify Eagle bookkeeping with num_rejected ( #29347 )
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-11-24 13:54:59 -08:00
Yan Ma
3cfa63ad99
[XPU]fix Kimi-VL-A3B-thinking on xpu ( #29309 )
Signed-off-by: Yan Ma <yan.ma@intel.com >
2025-11-24 21:02:21 +00:00
Benjamin Bartels
4d6afcaddc
[CI/Build] Moves to cuda-base runtime image while retaining minimal JIT dependencies ( #29270 )
Signed-off-by: bbartels <benjamin@bartels.dev >
Signed-off-by: Benjamin Bartels <benjamin@bartels.dev >
2025-11-24 11:40:54 -08:00
Woosuk Kwon
97588c4d12
[Model Runner V2] Add minor clarification comments for Eagle ( #29332 )
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-11-24 11:28:56 -08:00
Chenheli Hua
839c6b7b72
[Multimodal][Qwen3 Omni] Make Qwen3 Omni work with audio-in-video inputs in V1 engine. ( #27721 )
Signed-off-by: Chenheli Hua <huachenheli@outlook.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-11-24 19:24:37 +00:00
bnellnm
8f066146c3
[MoE][Refactor] Make select_experts a non-static method ( #29067 )
Signed-off-by: Bill Nell <bnell@redhat.com >
2025-11-24 13:38:04 -05:00
Woosuk Kwon
cec418b5df
[Model Runner V2] Change Numba AoT to JIT ( #29328 )
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-11-24 09:34:37 -08:00
Woosuk Kwon
cc313cb73d
[Model Runner V2] Implement Single-step Eagle 1 ( #29300 )
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-11-24 09:32:27 -08:00
Nicolò Lucchesi
26a465584a
[NIXL] Use config to enable telemetry + NIXL version bump ( #29305 )
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-11-24 17:18:04 +00:00
Varun Sundar Rabindranath
e924bbb4f4
[Build/CI][DP/EP] Add QWen/Qwen3-30B-A3B-FP8 + EPLB tests to Nightly H100 and B200 ( #29195 )
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-11-24 16:06:17 +00:00
Aydin Abiar
656516c315
[Bugfix] properly handle nested json with llama3 tool parser ( #27701 )
Signed-off-by: Aydin Abiar <aydin@anyscale.com >
Signed-off-by: Aydin Abiar <62435714+Aydin-ab@users.noreply.github.com >
Co-authored-by: Aydin Abiar <aydin@anyscale.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
2025-11-24 15:28:51 +00:00
vllmellm
e48b2e6848
[Bugfix] [ROCm] [UX] Reorganize ROCm Backend Selection Logic ( #26980 )
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2025-11-24 15:24:49 +00:00
Laith Sakka
7a228b5305
Add option to use unbacked, and backed size obl dynamic shapes for more sounds compilation. ( #26199 )
Signed-off-by: Laith Sakka <lsakka@meta.com >
2025-11-24 10:12:41 -05:00
Yuan Tang
f716a15372
Update KServe guide link in documentation ( #29258 )
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com >
2025-11-24 14:40:05 +00:00
WeiQing Chen
2601f18a82
[EPLB] Optimize EPLB for Async Rearrange Experts ( #22179 )
Signed-off-by: David Chen <530634352@qq.com >
Co-authored-by: SunChenxiang123 <1291824390@qq.com >
2025-11-24 09:08:29 -05:00
R3hankhan
4de87866a8
[CPU][IBM Z] Fix BF16 support and vectorize math operations for s390x ( #28926 )
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com >
2025-11-24 12:08:09 +00:00
Didier Durand
eca7a8fb59
[Doc]: fix typos in various files ( #29230 )
Signed-off-by: Didier Durand <durand.didier@gmail.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-24 11:10:48 +00:00
杰兮
8005e606bf
[Bugfix][Rocm] Fix shared expert weight loading failure in DeepSeek-MTP ( #27563 )
Signed-off-by: zhyajie <yajizhan@amd.com >
Co-authored-by: zhyajie <yajizhan@amd.com >
2025-11-24 10:16:52 +00:00
rongfu.leng
68dfe28eae
[Feature][Benchmark] add --link-vars can filter when serve_param equal bench_param ( #28909 )
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
2025-11-24 02:02:28 -08:00
Fanli Lin
ed40d85929
[BugFix] Fix R-VL model loading error ( #29299 )
Signed-off-by: Lin, Fanli <fanli.lin@intel.com >
2025-11-23 22:48:45 -08:00
Roger Wang
0ff70821c9
[Core] Deprecate xformers ( #29262 )
Signed-off-by: Roger Wang <hey@rogerw.io >
2025-11-24 04:18:55 +00:00
tongqiu
5253f4276f
[ROCm] Support for Whisper v1 with Aiter Unified Attention and Aiter Flash Attention ( #28376 )
Signed-off-by: apinge <Tong.Qiu2@amd.com >
2025-11-24 03:26:00 +00:00
Zero
30854783ad
[Model] Add OpenCUA-7B support ( #29068 )
Signed-off-by: lim4349 <rockmanzero@naver.com >
Signed-off-by: Zero <rockmanzero@naver.com >
Co-authored-by: Cloud User <ubuntu@a100-80g-4.novalocal >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-24 10:27:55 +08:00
Jee Jee Li
1073ba68b0
[LoRA] Optimize 3D MoE logic ( #29222 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-24 10:27:23 +08:00
Josh Moore
c309bb5245
[Bugfix] Update Gradio OpenAI Chatbot Webserver example to new Gradio message history format ( #29249 )
Signed-off-by: joshiemoore <joshiemoore98@gmail.com >
2025-11-24 00:47:54 +00:00
Woosuk Kwon
3e1ad40655
[Model Runner V2] Add apply_temperature option to gumbel_sample ( #29276 )
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-11-23 14:13:00 -08:00
Woosuk Kwon
62d54ba46d
[Model Runner V2] Optimize CUDA graph capture time ( #29275 )
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-11-23 11:15:32 -08:00
Woosuk Kwon
b004c00418
[Model Runner V2] Support spec decoding [1/N] ( #29274 )
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-11-23 10:09:06 -08:00
Woosuk Kwon
7f12c82fa6
[Model Runner V2] Change bookkeeping logic in preparation for spec decoding ( #29194 )
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-11-23 09:42:52 -08:00
Luke
6fb0215eee
[Bugfix] Use lazy string reference for DeepseekV3Config in config registry ( #28958 )
Signed-off-by: Luke <yq0536@gmail.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-11-23 11:43:21 +00:00
Micah Williamson
55c21c8836
[ROCm][CI] Fix "Cannot re-initialize CUDA in forked subprocess" in test_pynccl.py ( #29119 )
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2025-11-23 13:05:00 +08:00
rasmith
3999442f1c
[CI/Build][AMD] Add check for flash_att_varlen_func to test_tree_attention.py ( #29252 )
Signed-off-by: Randall Smith <ransmith@amd.com >
Co-authored-by: Randall Smith <ransmith@amd.com >
2025-11-23 04:45:08 +00:00
rasmith
71362ffab4
[CI/Build][AMD] Skip test_multi_shared_storage_connector_consistency in test_multi_connector.py due to hipErrorLaunchFailure when calling .cpu() ( #29253 )
Signed-off-by: Randall Smith <ransmith@amd.com >
Co-authored-by: Randall Smith <ransmith@amd.com >
2025-11-23 04:42:49 +00:00
Woosuk Kwon
20ee418adc
[Model Runner V2] Minor fix for cudagraph_utils ( #29256 )
2025-11-22 20:12:50 -08:00
Cyrus Leung
389aa1b2eb
[Doc] Update more docs with respect to V1 ( #29188 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-23 10:58:48 +08:00
Michael Act
3ed767ec06
docs: fixes distributed executor backend config for multi-node vllm ( #29173 )
Signed-off-by: Michael Act <michael.a.c.tulenan@gdplabs.id >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-11-23 10:58:28 +08:00
jiahanc
5f96c00c55
[Fix] Add SM check to flashinfer MOE backend ( #29144 )
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
2025-11-23 00:39:30 +00:00
Qidong Su
4587063267
Patch DeepEP when building docker image with CUDA 13 ( #29154 )
Signed-off-by: Qidong Su <soodoshll@gmail.com >
2025-11-22 23:25:13 +00:00
Wentao Ye
472fdee974
[Chore] Update batch invariant code owner ( #29246 )
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-11-22 13:50:02 -08:00
Yizhou
df78aeef08
Refactor: Move CUDA graph dispatch logic earlier ( #27382 )
Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com >
2025-11-22 16:10:31 -05:00
Nick Hill
7df331c66b
[BugFix] Fix chunked prompt logprobs + preemption ( #29071 )
2025-11-22 16:07:18 -05:00
Benjamin Bartels
eb5352a770
[CI/build] Removes source compilation from runtime image ( #26966 )
Signed-off-by: bbartels <benjamin@bartels.dev >
2025-11-22 10:23:09 -08:00
Cyrus Leung
d1cf8214e5
[Bugfix] Use HF config fields as fallback when loading Mistral config ( #29239 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-22 11:22:48 -07:00
Fadi Arafeh
730bd35378
[perf][cpu] Accelerate paged attention GEMMs (QK, PV) on Arm CPUs with NEON ( #29193 )
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
2025-11-22 09:04:36 -08:00
Federico
f55c76c2b3
chore: add RTX_PRO_6000 GLM4.6-FP8 kernel tuning ( #29240 )
2025-11-22 08:42:48 -08:00
ZiTian Zhao
d84d8f4429
Fix EVS crash when using video_embeds inputs in Qwen2.5-VL ( #29232 )
Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-11-22 06:48:59 -08:00
Cyrus Leung
ae66818379
[Misc] Fix pre-commit ( #29238 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-22 06:48:01 -08:00
Nick Hill
d44a63c6d6
[BugFix] Fix returned logprobs with spec decode + prefill chunking ( #29216 )
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-22 22:41:25 +08:00
Nicolò Lucchesi
066209a045
[Attention] Refactor FA block_size limitations to hybrid models only ( #29084 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-11-22 06:38:44 -08:00
Bram Wasti
5f7209a793
[tiny] Remove unsupported TRITON_MLA backend from batch invariance ( #28832 )
...
Signed-off-by: Bram Wasti <bwasti@meta.com >
Signed-off-by: Bram Wasti <bwasti@fb.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-11-22 21:00:50 +08:00
yihong
2d4978a57e
fix: clean up function never use in setup.py ( #29061 )
...
Signed-off-by: yihong0618 <zouzou0208@gmail.com >
2025-11-22 05:00:04 -08:00
Nandan Vallamdasu
6965a392a4
Fix: Resolve circular import in model_loader/utils.py ( #29189 )
...
Signed-off-by: nandan2003 <nandan.vallamdasu@outlook.com >
Signed-off-by: Nandan Vallamdasu <nandan.vallamdasu@outlook.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-11-22 04:58:22 -08:00
Cyrus Leung
5a4802588e
[Misc] Further clean up chunked prefill and prefix caching init ( #29186 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-22 19:34:15 +08:00
rasmith
8e22da1d7f
[CI/Build] Don't add FLASHINFER backend in test_cpu_offloading.py ( #29229 )
...
Signed-off-by: Randall Smith <ransmith@amd.com >
Co-authored-by: Randall Smith <ransmith@amd.com >
2025-11-22 11:00:54 +00:00
rasmith
a4fdf2405c
[CI/Build] Skip tests that require libcudart in test_lmcache_integration.py ( #29228 )
...
Signed-off-by: Randall Smith <ransmith@amd.com >
Co-authored-by: Randall Smith <ransmith@amd.com >
2025-11-22 10:59:39 +00:00
Jane (Yuan) Xu
e6309acdba
Simplify from_blob usage in get_cuda_view_from_cpu_tensor ( #29027 )
...
Signed-off-by: Jane Xu <janeyx@meta.com >
2025-11-22 10:35:32 +00:00
jinghanhu
988ee66b0d
Handle triton kernel import exception ( #29062 )
2025-11-22 10:07:50 +00:00
Mads Kildegård
ea38474ac5
[Frontend][Responses API] Multi-turn (with type: "output_text") support for non-harmony requests ( #29175 )
...
Signed-off-by: Mads Kildegård <mkildegaard99@gmail.com >
2025-11-22 09:58:22 +00:00
Andrew Xia
742e9ff6b3
[responsesAPI] parse reasoning item input ( #28248 )
...
Signed-off-by: Andrew Xia <axia@fb.com >
Co-authored-by: Andrew Xia <axia@fb.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-11-22 15:42:11 +08:00
Woosuk Kwon
e9056056fb
[Model Runner V2] Limit cudagraph size to max decode batch size ( #29221 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-11-21 20:21:35 -08:00
Jee Jee Li
1489902b53
[LoRA] Cleanup FusedMoEWithLoRA ( #29187 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-22 04:01:30 +00:00
Yanan Cao
933f67ecd8
[Bugfix]Fix a conditional to not check zero value ( #28754 )
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com >
2025-11-21 19:59:07 -08:00
rasmith
fd65015a14
[CI/Build] Only use supported types and features on ROCm in MoE kernel tests ( #29149 )
...
Signed-off-by: Randall Smith <ransmith@amd.com >
Co-authored-by: Randall Smith <ransmith@amd.com >
2025-11-21 20:34:33 -07:00
Yihua Cheng
77e1c035d0
[chore][LMCache connector] Remove useless logs from lmcache connector ( #29069 )
...
Signed-off-by: ApostaC <yihua98@uchicago.edu >
2025-11-22 03:18:00 +00:00
rasmith
6f403501a0
[CI/Build][AMD] Enable Entrypoints Integration Test (Pooling) to run without error on ROCm ( #29212 )
...
Signed-off-by: Randall Smith <ransmith@amd.com >
Co-authored-by: Randall Smith <ransmith@amd.com >
2025-11-22 02:13:18 +00:00
FlintyLemming
052950e5b3
Add fused MoE config for H200 E160 N192 fp8 ( #29182 )
...
Signed-off-by: FlintyLemming <admin@flinty.moe >
2025-11-21 17:37:51 -08:00
qli88
1ef9c9e294
[CI/Build] Disable test_gptoss_tp.py in 'LoRA TP Test' group for ROCm platform ( #29204 )
...
Signed-off-by: qli88 <qiang.li2@amd.com >
2025-11-21 17:36:19 -08:00
Jie Luo
5c8f2adf50
[Bugfix] Fix block size in block_table with PCP ( #29094 )
...
Signed-off-by: Livinfly <luojie3m@gmail.com >
2025-11-22 01:34:28 +00:00
Ryan Rock
ed8e6843cc
[CI/Build] Add terratorch for AMD ( #29205 )
...
Signed-off-by: Ryan Rock <ryan.rock@amd.com >
2025-11-21 17:31:22 -08:00
Lukas Geiger
d045e22dfe
[Model][Qwen3VL] Tune Triton w8a8 block fp8 kernel for L40s ( #29217 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-11-21 17:30:55 -08:00
Wentao Ye
1d34eb11e0
[CI] Bug: Fix triton import issue ( #29202 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-11-21 17:14:49 -08:00
Charlie Fu
9a3101b2ba
[Rocm][CI] Fix DeepSeek V2-Lite Accuracy CI ( #29135 )
...
Signed-off-by: charlifu <charlifu@amd.com >
2025-11-21 17:11:02 -08:00
Angela Yi
d5dbdbfcb2
[docs] Fix cudagraph mode config ( #29170 )
...
Signed-off-by: angelayi <yiangela7@gmail.com >
2025-11-21 17:10:27 -08:00
Lucas Wilkinson
30d6466238
[BugFix] Fix Eagle IndexError: list index out of range for even num_speculative_tokens ( #29102 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-11-22 00:47:05 +00:00
Woosuk Kwon
e9af6ba62a
[Model Runner V2] Optimize Gumbel Sampling Kernel ( #29210 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-11-21 15:52:28 -08:00
Mark McLoughlin
c6fa3895e9
[KV Connector] Fix async connector prefix cache metrics ( #28585 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2025-11-21 17:45:00 -05:00
Varun Sundar Rabindranath
3137991f55
[BugFix] EPLB + B200 + DeepGEMM : Handle column-major scales tensor ( #29162 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-11-21 14:28:17 -08:00
Julien Denize
57430fc95c
Default model load/config/tokenizer to mistral format if relevant files exist ( #28659 )
...
Signed-off-by: Julien Denize <julien.denize@mistral.ai >
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com >
2025-11-21 13:58:59 -08:00
Lucas Wilkinson
c68c7b403d
[BugFix] Fix missing symbol triggering FA2 fallback on Hopper ( #29107 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-11-21 13:58:32 -08:00
Ning Xie
53a1ba6ec5
[log] add weights loading time log to sharded_state loader ( #28628 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2025-11-21 21:06:09 +00:00
Lucas Wilkinson
1840c5cb18
[BugFix] Make sure to allocate worst case MoE workspace during profile run in the DP + EP case ( #27426 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-11-21 11:41:52 -08:00
Woosuk Kwon
1bed891f72
[Chore] Fix pre-commit error after #25266 ( #29190 )
2025-11-21 10:21:40 -08:00
Cyrus Leung
ceca060501
[Deprecation] Deprecate seed=None ( #29185 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-21 18:19:25 +00:00
Charlie Fu
75648b16dd
[ROCm][CI] Fix config/test_config_generation.py ( #29142 )
...
Signed-off-by: charlifu <charlifu@amd.com >
2025-11-21 17:12:16 +00:00
Chendi.Xue
460d02a417
[NIXL] Fix after virtual block_size for host_buffer with heter kv_layout ( #29122 )
...
Signed-off-by: Chendi Xue <chendi.xue@intel.com >
2025-11-21 08:55:27 -08:00
Mingyuan Ma
b4c8fbaae2
Add TRTLLM MoE NVFP4 kernel to CompressedTensorsW4A4MoeMethod ( #28892 )
...
Signed-off-by: mingyuanm <mingyuanm@nvidia.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
2025-11-21 09:54:11 -07:00
rasmith
e99e467384
[CI/Build][Kernel][AMD] Move extra dim to after load in _fwd_kv_parallel in lighting_attn.py ( #29132 )
...
Signed-off-by: Randall Smith <ransmith@amd.com >
Co-authored-by: Randall Smith <ransmith@amd.com >
2025-11-21 11:53:09 -05:00
Wentao Ye
a42ab317ac
[Log] Optimize startup log ( #28948 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-11-21 08:46:20 -08:00
Aleksandr Malyshev
b7f1f490a6
Upstream triton fp4 weight preshuffle ( #28888 )
...
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com >
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com >
2025-11-21 11:34:46 -05:00
Woosuk Kwon
30b44a1598
GPU Model Runner V2 ( #25266 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-11-21 08:20:55 -08:00
Wentao Ye
1f400c58b8
[CI] Add batch invariant test to ci ( #27842 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-11-21 09:20:33 -07:00
rasmith
711241c13c
[CI/Build] Fix illegal memory access and unsupported test in kernels/attention/test_cache.py ( #29118 )
...
Signed-off-by: Randall Smith <ransmith@amd.com >
Co-authored-by: Randall Smith <ransmith@amd.com >
2025-11-21 10:58:38 -05:00
Cyrus Leung
d7219bcda3
[Misc] Move dynamic seed initialization to EngineArgs ( #29165 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-21 15:27:44 +00:00
wangxiyuan
4050bae417
[Doc] Update plugin doc ( #28532 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-11-21 14:57:26 +00:00
skaraban3807
f1805db1a6
[Perf] These changes enhance the NUMA functionality of vllm for systems with more than one NUMA nodes per socket ( #25559 )
...
Signed-off-by: Siddappa Karabannavar <siddappa.karabannavar@amd.com >
2025-11-21 14:13:52 +00:00
Julien Denize
434f3d3eb8
Fix mistral config ( #29172 )
...
Signed-off-by: Julien Denize <julien.denize@mistral.ai >
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com >
2025-11-21 14:01:20 +00:00
sfbemerk
2092ce8c39
Tool Call Parser logs should not contain user input / model output except on DEBUG ( #29160 )
...
Signed-off-by: Benjamin Merkel <benjamin.merkel@tngtech.com >
Co-authored-by: Benjamin Merkel <benjamin.merkel@tngtech.com >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
2025-11-21 20:57:19 +08:00
who who who
fc9f821d20
fix cross attention ( #28346 )
...
Signed-off-by: fsx950223 <fsx950223@outlook.com >
2025-11-21 04:55:43 -08:00
Cyrus Leung
9452863088
Revert "Revert #28875 ( #29159 )" ( #29179 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-21 04:27:43 -08:00
Bhagyashri
2b1b3dfa4b
Update Dockerfile to use gcc-toolset-14 and fix test case failures on power (ppc64le) ( #28957 )
...
Signed-off-by: Bhagyashri <Bhagyashri.Gaikwad2@ibm.com >
2025-11-21 12:24:09 +00:00
Russell Bryant
cca2d2cdbe
[Core] Align whisper closer to other multimodal models ( #27292 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-11-21 12:01:54 +00:00
Cyrus Leung
aab0102a26
[V0 deprecation] Remove more V0 references ( #29088 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-21 11:56:59 +00:00
WeiQing Chen
b34129bf8e
[Misc] remove useless v1 env ( #29164 )
...
Signed-off-by: David Chen <530634352@qq.com >
2025-11-21 01:41:20 -08:00
Cyrus Leung
4d7231e774
Revert #28875 ( #29159 )
2025-11-21 01:40:17 -08:00
Huamin Li
8ac3a41487
[CI Failure] Fix Gemma3 RoPE configuration for sliding attention layers ( #29111 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-11-20 23:53:30 -08:00
Canlin Guo
7d6da483b0
[Minor][Clean] Remove the legacy assertion in video ( #29150 )
...
Signed-off-by: gcanlin <canlinguosdu@gmail.com >
2025-11-20 23:52:34 -08:00
Chenheli Hua
e4c3182c68
[Small] Capture AttributeError when checking ray dependency. ( #29024 )
...
Signed-off-by: Chenheli Hua <huachenheli@outlook.com >
2025-11-20 22:54:10 -08:00
Alex Brooks
b4734b9550
[Bugfix] Fix default MM LoRA alignment for single str prompts ( #29140 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
2025-11-21 13:32:30 +08:00
Jialin Ouyang
30b9c67743
Revert "[Redo] #26368 ( #28771 )" ( #29121 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-11-20 21:27:45 -08:00
Matthew Bonanni
11857a00b0
[Attention] Add ROCM_AITER_MLA_SPARSE to attention backend registry ( #29103 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-11-20 20:24:43 -08:00
Boyuan Feng
8c25f9cfb6
[BugFix] skip combo kernel on cpu ( #29129 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2025-11-21 11:50:59 +08:00
Cyrus Leung
56e96b37e4
[V0 Deprecation] Remove best_of ( #29090 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-21 11:40:40 +08:00
Qidong Su
698024ecce
[Doc] update installation guide regarding aarch64+cuda pytorch build ( #28875 )
...
Signed-off-by: Qidong Su <soodoshll@gmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-11-20 19:40:25 -08:00
jeremyteboul
0730414999
[Core] Add audio_embeds support to chat completions ( #29059 )
...
Signed-off-by: Jeremy Teboul <jeremyteboul@fb.com >
Co-authored-by: Jeremy Teboul <jeremyteboul@fb.com >
2025-11-21 11:39:47 +08:00
zhrrr
a982f5b5ea
[kernel][perf] support uncontiguous input for rms_norm kernel ( #28103 )
...
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com >
Signed-off-by: izhuhaoran <izhuhaoran@qq.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-11-20 19:39:09 -08:00
Cyrus Leung
0e741c12e3
[Bugfix] Fix Plamo3 rope handling ( #29092 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-21 11:38:35 +08:00
Wentao Ye
56669c1f29
[CI] Fix mypy for vllm/v1/worker ( #29037 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-11-21 11:36:07 +08:00
Hongxia Yang
3f5f36da3f
[ROCm] Fix for import when building with upstream triton for gfx1100 for gpt-oss serving ( #29127 )
...
Signed-off-by: Hongxia Yang <hongxia.yang@amd.com >
2025-11-21 03:30:07 +00:00
Wentao Ye
e1eefa4c40
[Bug] Fix torch warning of tf32 usage ( #29112 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-11-21 01:54:59 +00:00
Xiao Li
ed6ae1e36a
[AITER] [ROCm] Fix crash when loading llama4 model with old aiter version installed, fallback to forward_native implementation ( #29124 )
...
Signed-off-by: Xiao Li <ilx@meta.com >
2025-11-20 17:54:35 -08:00
Jee Jee Li
9875be6431
[LoRA][2/2]Remove LoRA extra vocab ( #28545 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-21 09:46:43 +08:00
Wentao Ye
df44df0143
[Feature] Shared Experts Overlap with FI deepgemm swap kernel, 2.2% throughput improvement and 3.6% TTFT improvement ( #28879 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-11-20 18:41:49 -07:00
Michael Goin
87cbbdff63
Update model references for OLMo3 ( #29099 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-11-21 09:16:52 +08:00
Michael Goin
986ab5db63
[CI Bugfix] Fix Kernels DeepGEMM Test (H100) ( #29106 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-11-20 16:42:33 -08:00
Rob Mulla
dd39f91edb
[Doc] cleanup TPU documentation and remove outdated examples ( #29048 )
...
Signed-off-by: Rob Mulla <rob.mulla@gmail.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-21 00:05:59 +00:00
rasmith
c7a29d2c8d
[CI/Build] Remove skip global cleanup in test_struct_output_generate.py ( #29022 )
...
Signed-off-by: Randall Smith <ransmith@amd.com >
Co-authored-by: Randall Smith <ransmith@amd.com >
2025-11-20 21:44:37 +00:00
rasmith
8237ab8a2b
[CI/Build] Skip lm-format-enforcer tests in test_struct_output_generate.py for now ( #29021 )
...
Signed-off-by: Randall Smith <ransmith@amd.com >
Co-authored-by: Randall Smith <ransmith@amd.com >
2025-11-20 21:35:14 +00:00
Driss Guessous
3fd74189db
Fixes bench ( #29058 )
...
Signed-off-by: drisspg <drisspguessous@gmail.com >
2025-11-20 21:21:54 +00:00
rasmith
5e5a7eb16f
[CI/Build] Make test_attention_selector.py run tests on correct platform ( #29064 )
...
Signed-off-by: Randall Smith <ransmith@amd.com >
Signed-off-by: rasmith <Randall.Smith@amd.com >
Co-authored-by: Randall Smith <ransmith@amd.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-20 20:45:56 +00:00
rasmith
3d84ef9054
[CI/Build][AMD] Skip if flash_attn_varlen_func not available in test_aiter_flash_attn.py ( #29043 )
...
Signed-off-by: Randall Smith <ransmith@amd.com >
Co-authored-by: Randall Smith <ransmith@amd.com >
2025-11-20 20:39:49 +00:00
Software Developer
4d01b64284
[Bugfix] - Add Trace Headers to Beam Search Path ( #29100 )
...
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com >
2025-11-20 20:00:33 +00:00
Kevin H. Luu
114b0e2500
[chore] Update annotate release scripts ( #29077 )
...
Signed-off-by: Kevin H. Luu <khluu000@gmail.com >
2025-11-20 10:22:40 -08:00
Or Ozeri
647464719b
[KVConnector][Core] Support cross-layer KV blocks ( #27743 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com >
2025-11-20 19:09:59 +01:00
Pan Li
e5bfcb6a88
[BugFix][PD]: make example proxy usable with P2pNcclConnector ( #26628 )
...
Signed-off-by: PAN <1162953505@qq.com >
2025-11-20 17:38:31 +00:00
Alexei-V-Ivanov-AMD
22924383e1
Updating the mirror of test-amd.yaml as of 2025-11-18 ( #29016 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
2025-11-20 12:07:06 -05:00
rookie
56f45eddaf
[Frontend] Optimize beam search loop by sorting and then splicing ( #19347 )
...
Signed-off-by: zhangguozhu <zhangguozhu@360.cn >
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: zhangguozhu <zhangguozhu@360.cn >
Co-authored-by: mgoin <mgoin64@gmail.com >
2025-11-20 09:02:30 -08:00
TJian
82b05b15e6
[BugFix] [FEAT] Enable fastsafetensors for ROCm platform ( #28225 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-11-20 16:34:11 +00:00
Fanli Lin
a2e9ebe9e2
[BugFix] Fix flash_attn import in siglip2navit.py ( #29082 )
...
Signed-off-by: Fanli Lin <fanli.lin@intel.com >
2025-11-20 12:14:29 +00:00
Zhewen Li
93c8672ceb
[Bugfix] Fix spec decode memory regression after #28549 ( #28819 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-11-20 19:05:50 +08:00
Samit
371b1d4c61
[RL] Add Pause and Resume Generation for Asynchronous RL Training ( #28037 )
...
Signed-off-by: SamitHuang <285365963@qq.com >
Signed-off-by: Samit <285365963@qq.com >
Signed-off-by: samithuang <285365963@qq.com >
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com >
2025-11-20 03:01:03 -08:00
Shinichi Hemmi
c9e093116c
[MODEL] Implement plamo3 ( #28834 )
...
Signed-off-by: Shinichi Hemmi <50256998+Alnusjaponica@users.noreply.github.com >
2025-11-20 03:00:19 -08:00
Or Ozeri
c0c2dd1e0b
[BugFix] kv_offloading: Fix bug in loading of partial cpu blocks ( #28951 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-11-20 18:55:10 +08:00
Pleaplusone
06c20c9904
[ROCm] Add AMD GPU support on Deepseek v3.2 and SparseMLA ( #26670 )
...
Signed-off-by: ganyi <ygan@amd.com >
2025-11-20 02:54:01 -08:00
Anna Shors
6eb745d9bd
Add truncate arg to yarn to match openai implementation of gpt-oss ( #28244 )
...
Signed-off-by: ashors1 <ashors@nvidia.com >
Co-authored-by: Chen Zhang <zhangch99@outlook.com >
2025-11-20 18:53:50 +08:00
cjackal
66483a9d00
[Chore] Update xgrammar version from 0.1.25 to 0.1.27 ( #28221 )
...
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com >
2025-11-20 02:53:09 -08:00
Jinzhen Lin
edfe867208
[Misc] don't cache CUTLASS_REVISION var in CMakeLists.txt ( #28518 )
...
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com >
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
2025-11-20 02:52:53 -08:00
Dezhan
dc45efc8ef
[BugFix] Fix Llama4 Pipeline Parallelism Assert Error ( #28577 )
...
Co-authored-by: Dezhan Tu <dztu@meta.com >
2025-11-20 02:52:36 -08:00
Vensen
fb8851f254
[Bugfix][cache_kernels]: Fix OOB in cache_kernels.cu ( #28760 )
...
Signed-off-by: vensen <vensenmu@gmail.com >
Signed-off-by: Vensenmu <vensenmu@gmail.com >
2025-11-20 02:52:02 -08:00
Boyuan Feng
a903d59ffa
cleanup at::Tag::needs_fixed_stride_order ( #28974 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-11-20 02:51:36 -08:00
rasmith
322cb02872
[CI/Build][AMD] Fix import errors in tests/kernels/attention ( #29032 )
...
Signed-off-by: Randall Smith <ransmith@amd.com >
Co-authored-by: Randall Smith <ransmith@amd.com >
2025-11-20 17:48:09 +08:00
Wentao Ye
2c52c7fd9a
[Bug] Fix torch dynamo warning Dynamo detected a call to a functools.lru_cache ( #29038 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-11-20 16:52:23 +08:00
Bradley D
1e1c06789e
[ci][amd] fix EPLB execution test ( #28742 )
...
Signed-off-by: Bradley Davis <bradleyhd@meta.com >
2025-11-20 14:53:38 +07:00
Pleaplusone
7218f83992
[ROCm][BugFix] Fix shared expert loading error when disable VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS ( #28633 )
...
Signed-off-by: ganyi <ygan@amd.com >
2025-11-20 14:50:23 +07:00
Cyrus Leung
20e4497be2
[V0 Deprecation] Remove num_lookahead_slots ( #29000 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-11-20 06:39:10 +00:00
Quentin Gallouédec
1c7bcc55b8
[Frontend] Allow parsed tool arguments ( #28820 )
...
Signed-off-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-19 22:20:12 -08:00
Lukas Geiger
a9705a290a
[Model][QwenVL] Replace torch.repeat_interleave with faster np.repeat ( #28964 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-11-19 22:04:23 -08:00
Isotr0py
64192d5624
[Bugfix] Revert custom attention mask for gemma3-mm ( #28995 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-11-20 13:23:22 +08:00
Canlin Guo
fe25772aa9
[Bugfix] Handle broken frames in video loading ( #29001 )
...
Signed-off-by: gcanlin <canlinguosdu@gmail.com >
Signed-off-by: 凌葭 <lvjiang.lj@alibaba-inc.com >
Co-authored-by: 凌葭 <lvjiang.lj@alibaba-inc.com >
2025-11-20 04:38:12 +00:00
prashanth058
0cca9b4d13
[Bugfix] Fix precision loss in LoRA-wrapped RowParallelLinear by fusing bias into GEMM ( #28972 )
...
Signed-off-by: prashanth058 <prashanth.dannamaneni@uipath.com >
2025-11-20 03:50:37 +00:00
Shengliang Xu
a8c536829c
Consolidate Nvidia ModelOpt quant config handling for all quantization methods ( #28076 )
...
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com >
2025-11-19 22:39:36 -05:00
Benjamin Chislett
fcbcba6c70
[Feat] Iteration-level profiling for Torch and CUDA profiler ( #28987 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-19 19:17:48 -08:00
Fadi Arafeh
3168285fca
[cpu][ci] Add initial set of tests for Arm CPUs ( #28657 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
2025-11-20 02:37:09 +00:00
Qiang Zhang
3fb0d90999
[AMD] Use Decoupled Kernel Block Size to Support AITER MLA block_size=1 ( #27715 )
...
Signed-off-by: chiangzhang <chiangzhang@tencent.com >
2025-11-20 02:11:52 +00:00
Kuntai Du
05c2dee7e9
[DeepSeek + LMCache Multiprocess] handle MLA for deepseek model + LMCache Multiprocess connector ( #29039 )
...
Signed-off-by: KuntaiDu <kuntai@uchicago.edu >
2025-11-20 01:40:49 +00:00
liangel-02
1d642872a2
[torchao] fix safetensors for sharding ( #28169 )
...
Signed-off-by: Angel Li <liangel@meta.com >
2025-11-19 16:39:45 -08:00
Nick Hill
9ccef8e333
[Misc] Colorize logs ( #29017 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-19 19:26:04 -05:00
Jialin Ouyang
537cc635c7
[GC Debugger] Simplify and improve GC Debugger Utils ( #29029 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-11-20 00:10:22 +00:00
Wentao Ye
5031cd5d55
[Refactor] Optimize select_experts ( #28069 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-11-19 18:53:15 -05:00
Alexander Matveev
3aaa94ac99
[Performance] Reduce DeepGEMM N dim restriction from 128 to 64 multiplier ( #28687 )
...
Signed-off-by: Alexander Matveev <amatveev@redhat.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
2025-11-19 15:47:13 -08:00
JartX
8e38e99829
[Feature] EPLB on Qwen3VLMoe and CompressedTensorsWNA16MoEMethod ( #28849 )
2025-11-19 18:30:08 -05:00
Wentao Ye
0075bfffd4
[CI] Fix precommit rope_theta issue ( #29040 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-11-19 14:22:43 -08:00
Max Hu
cb0a7b4bea
[Bugfix] Move flashinfer kernel check into `__init__` function of `FusedMoE` ( #29018 )
...
Signed-off-by: Max Hu <hyoung2991@gmail.com >
2025-11-19 21:54:15 +00:00
Lucas Wilkinson
8f4f77a727
[BugFix] Fix false assertion with spec-decode=[2,4,..] and TP>2 ( #29036 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-11-19 13:43:54 -08:00
Micah Williamson
22e44ad589
[ROCm][CI] Fix Weight Loading With Multiple GPU Tests on ROCm ( #28984 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2025-11-19 21:31:33 +00:00
Yongye Zhu
88f5b19f0b
[DeepSeek] Fix DeepSeek V3.2 Rope Embedding ( #28968 )
...
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com >
2025-11-19 16:30:04 -05:00
Shu Wang
613abb50d5
[MoE] Nvfp4 Masked Gemm: Add flashinfer grouped_gemm_nt_masked ( #25990 )
...
Signed-off-by: Shu Wang. <shuw@nvidia.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-11-19 13:29:06 -08:00
Julien Denize
cdeec2e606
[BugFix] Ray with multiple nodes ( #28873 )
...
Signed-off-by: Julien Denize <julien.denize@mistral.ai >
2025-11-19 21:20:58 +00:00
Wentao Ye
1607e664f0
[Bug] Fix Batch Invariant MLA test ( #28967 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-11-19 21:18:32 +00:00
Ryan Rock
68d7231991
[CI/Build] Fix test_prefix_prefill for AMD ( #28905 )
...
Signed-off-by: Ryan Rock <ryan.rock@amd.com >
2025-11-19 16:04:36 -05:00
Qiu
2fd893b4ce
[Feature] Prefill Context Parallel (PCP) basic support ( #28718 )
...
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com >
Signed-off-by: FENP <yuanyongjie.yyj@antgroup.com >
Signed-off-by: LookAround <lixushi@huawei.com >
Signed-off-by: Jingchun Gao <gaojingchun1@huawei.com >
Signed-off-by: zhenwenqi2024 <zhenwenqi_2022@qq.com >
Co-authored-by: FENP <yuanyongjie.yyj@antgroup.com >
Co-authored-by: LookAround <lixushi@huawei.com >
Co-authored-by: Jingchun Gao <gaojingchun1@huawei.com >
Co-authored-by: zhenwenqi2024 <zhenwenqi_2022@qq.com >
Co-authored-by: Jingchun Gao <63247409+gjc0824@users.noreply.github.com >
2025-11-19 15:52:44 -05:00
Izzy Putterman
02f5903b84
Eagle: MM Cuda Graphs with MRope ( #28896 )
...
Signed-off-by: Izzy Putterman <iputterman@nvidia.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-11-19 15:01:05 -05:00
Aleksandr Malyshev
ac10fd3c69
Upstreaming aiter triton attention backend as a new backend ( #28701 )
...
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com >
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com >
2025-11-19 19:59:30 +00:00
杰兮
9d2d561257
[Bugfix] Fix precision corruption when shared_experts_stream=None ( #28942 )
...
Signed-off-by: zhyajie <yajizhan@amd.com >
Co-authored-by: zhyajie <yajizhan@amd.com >
2025-11-19 19:30:57 +00:00
Robert Shaw
fe69f331f8
[Kernels] Improve H200 Fused MoE Config ( #28992 )
...
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2025-11-19 19:23:54 +00:00
Jialin Ouyang
3319a493fc
[Core] Reuse created spec tokens lists to mitigate GC cost ( #28917 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-11-19 19:20:22 +00:00
Copilot
61728cd1df
Re-enable FlashInfer for Llama4 on Blackwell in e2e fusion tests ( #28966 )
...
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com >
Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-11-19 13:32:19 -05:00
Yuxuan Zhang
0c80efd94f
GLM-V video segmentation solution adjustment ( #28941 )
...
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com >
2025-11-19 17:32:55 +00:00
Harry Mellor
a8b70304d6
Update rope_scaling to rope_parameters in preparation for Transformers v5 ( #28542 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-19 09:06:36 -08:00
Shanshan Shen
d44e9df7d4
[Model][Mamba] Add selector for mamba attention backend and make it pluggable for other device ( #26487 )
...
Signed-off-by: shen-shanshan <467638484@qq.com >
2025-11-19 16:24:55 +00:00
Lucas Wilkinson
48fc8b1e59
[BugFix] Fix async-scheduling + FlashAttn MLA ( #28990 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-11-19 10:04:07 -05:00
vnadathur
1ffe934c8a
[torch.compile] caching of config fields should be opt-out by default ( #26468 )
...
Signed-off-by: vnadathur <glvikramn@gmail.com >
Signed-off-by: WorldExplored <srreyansh.sethi@gmail.com >
Signed-off-by: Srreyansh Sethi <srreyansh.sethi@gmail.com >
Signed-off-by: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com >
Co-authored-by: WorldExplored <srreyansh.sethi@gmail.com >
Co-authored-by: Srreyansh Sethi <107075589+worldexplored@users.noreply.github.com >
Co-authored-by: vnadathur <236933696+vnadathur@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-11-19 06:13:54 -08:00
Yanan Cao
2c8b9182b5
[CI] Reorganize compile tests so new tests are automatically included in CI ( #28625 )
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com >
2025-11-19 06:13:50 -08:00
Harry Mellor
4f5299f717
Relax Transformers modeling backend MoE experts check ( #28952 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-19 21:50:30 +08:00
Didier Durand
09540cd918
[Doc]: fix typos in various files ( #29010 )
...
Signed-off-by: Didier Durand <durand.didier@gmail.com >
2025-11-19 04:56:21 -08:00
Chen Bruce
da2f6800e0
[Feat][Perf] Enable deepep-low-latency with round-robin expert placement. ( #28449 )
...
Signed-off-by: bruceszchen <bruceszchen@tencent.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-19 13:46:24 +01:00
Tova Movshovitz
ba558c029a
[config] Expose get_total_num_hidden_layers() in ModelConfig ( #28961 )
...
Signed-off-by: tovam <tovam@pliops.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-11-19 11:37:11 +00:00
Harry Mellor
97cfa99d59
[Docs] Take env var definition out of folded admonition ( #29005 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-19 03:32:04 -08:00
j20120307
bbc6c2f1e5
[CI/Build] Fix broken build on Apple M1 ( #28999 )
...
Signed-off-by: Kan Zhu <j20120307@gmail.com >
2025-11-19 11:07:22 +00:00
ihb2032
8151609583
refactor(cpu_types_scalar.hpp): Unify scalar loop implementations using unroll_loop ( #28847 )
...
Signed-off-by: ihb2032 <1355790728@qq.com >
Co-authored-by: lyd1992 <liuyudong@iscas.ac.cn >
2025-11-19 11:05:44 +00:00
Michael Yao
fdf93486d6
[Docs] Clean up moe_kernel_features.md ( #28530 )
...
Signed-off-by: windsonsea <haifeng.yao@daocloud.io >
2025-11-19 02:35:29 -08:00
gnovack
d69062c67a
add support for --fully-sharded-loras in fused_moe ( #28761 )
...
Signed-off-by: gnovack <gnovack@amazon.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-19 16:32:00 +08:00
Louie Tsai
ae4821a108
Add CPU support model ( #28697 )
...
Signed-off-by: Tsai, Louie <louie.tsai@intel.com >
2025-11-18 23:47:57 -08:00
Didier Durand
7ed27f3cb5
[Doc]: fix typos in various files ( #28945 )
...
Signed-off-by: Didier Durand <durand.didier@gmail.com >
2025-11-18 22:52:30 -08:00
Michael Goin
a4511e38db
Speed up macOS smoke test ( #28954 )
...
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-11-18 22:46:32 -08:00
Roman Solomatin
71d0ae1c54
[Misc] Update embedding/cross encoder tests to use mteb v2 ( #27329 )
...
Signed-off-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com >
Signed-off-by: wang.yuqi <noooop@126.com >
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: wang.yuqi <noooop@126.com >
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io >
2025-11-18 22:28:40 -08:00
Lukas Geiger
3d4e7d34be
[Model][QwenVL] Simplify cos/sin rotary embedding indexing ( #28962 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-11-19 05:43:01 +00:00
Uranus
6a25ea5f0e
[Docs] Update oneshot imports ( #28188 )
...
Signed-off-by: UranusSeven <109661872+UranusSeven@users.noreply.github.com >
2025-11-19 05:30:08 +00:00
Gleb Kurchanov
73ff872db0
[Bugfix] Fix typo in Qwen3 Next model executor ( #28960 )
...
Signed-off-by: Gleb Kurchanov <nepherpitou@gmail.com >
2025-11-19 05:21:02 +00:00
Xin Yang
468a8d72ba
[Bugfix] Fix FusedMoEModularKernel for triton backend ( #28913 )
...
Signed-off-by: Xin Yang <xyangx@amazon.com >
2025-11-19 13:05:22 +08:00
Matthew Bonanni
4c23690f43
[Attention] FlashAttention ViT support, make default backend ( #28763 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-11-18 20:06:21 -08:00
Strahinja Stamenkovic
814843e021
Enable bitsandbytes quantization on AMD GPUs that use warp size 32 ( #27307 )
...
Signed-off-by: sstamenk <strahinja.stamenkovic@amd.com >
2025-11-19 03:12:31 +00:00
Li, Jiang
20852c8f4c
[CPU] Refactor CPU WNA16 ( #28826 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2025-11-19 10:32:00 +08:00
Jialin Ouyang
40b6b38f2c
[Core] Switch Flat logprob control from environment variable to SamplingParams ( #28914 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com >
2025-11-19 02:10:02 +00:00
Jerry Zhang
da94c7c0eb
Move online quantization to model.load_weights ( #26327 )
...
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com >
2025-11-18 16:52:41 -08:00
tomeras91
1395461f5f
[Hybrid][torch.compile] Refactor mamba2 forward to avoid obscuring linear projections under custom op ( #28587 )
...
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com >
2025-11-18 16:49:36 -08:00
Varun Sundar Rabindranath
9912b8ccb8
[Build] Add OpenAI triton_kernels ( #28788 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-11-18 16:45:20 -08:00
Johnny
49ef847aa8
[NVIDIA] Guard SM100 CUTLASS MoE macro to SM100 builds v2 ( #28938 )
...
Signed-off-by: johnnynunez <johnnynuca14@gmail.com >
Signed-off-by: Johnny <johnnynuca14@gmail.com >
2025-11-18 16:44:27 -08:00
Michael Goin
67745d189f
Supress verbose logs from model_hosting_container_standards ( #28949 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-11-18 12:29:06 -08:00
Kunshang Ji
2a2d5d2780
Replace torch.cuda.Event with torch.Event for better hardware compatibility ( #26985 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2025-11-18 11:34:36 -08:00
Chendi.Xue
c3e2978620
[NIXL] fix cpu PD after physical <> logical block_size PR ( #28904 )
...
Signed-off-by: Chendi Xue <chendi.xue@intel.com >
2025-11-18 14:03:23 -05:00
Isotr0py
e4bb2684bc
[Models] Replace all nn.Conv2d with vLLM's Conv2dLayer ( #28842 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-11-18 18:56:04 +00:00
Kevin H. Luu
c64c0b78de
[chore] Move the rest of wikimedia url to S3 ( #28921 )
...
Signed-off-by: Kevin H. Luu <khluu000@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-18 09:44:18 -08:00
vllmellm
0af3d4f0df
[FEAT] [AITER] [ROCm] integrate aiter sampling ops ( #26084 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2025-11-18 17:28:34 +00:00
Nick Hill
da8dadf68b
[Minor] Rename ec_producer field to is_ec_producer ( #28884 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-18 17:26:07 +00:00
Nicolò Lucchesi
f226a3f0c1
[CI][NIXL] Change default block_size for tests ( #28927 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-11-18 09:22:30 -08:00
Luciano Martins
c2612371ad
[Model] Add Gemma3 GGUF multimodal support ( #27772 )
...
Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-11-18 08:56:29 -08:00
Ido Segev
49a986ecd4
[Benchmark] multi_turn: Report warmup-inclusive runtime ( #28937 )
...
Signed-off-by: Ido Segev <idos@pliops.com >
2025-11-18 16:38:22 +00:00
Alex
f6aa122698
[CI Sprint] Quantization CI Cleanup ( #24130 )
...
Signed-off-by: Alex Yun <alexyun04@gmail.com >
2025-11-18 09:21:48 -05:00
Nicolò Lucchesi
184b12fdc6
[Bugfix][NIXL] Fix block_size_ratio when logical !=physical blocks ( #28925 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-11-18 22:07:50 +08:00
Canlin Guo
b9489f51e1
[Model][Perf] Use cos and sin cache in QwenVL ( #28798 )
...
Signed-off-by: gcanlin <canlinguosdu@gmail.com >
2025-11-18 11:51:54 +00:00
Song Zhixin
285eaa4285
[Bugfix] Safeguard against missing backend in AttentionBackendEnum ( #28846 )
...
Signed-off-by: jesse <szxfml@gmail.com >
Signed-off-by: Song Zhixin <szxfml@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-18 10:53:44 +00:00
Nick Hill
439368496d
[BugFix] Fix PP/async scheduling with pooling models ( #28899 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-11-18 00:20:45 -08:00
Isotr0py
896e41ae04
[CI/Build] Replace wikipedia url with local server ones ( #28908 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-11-18 08:10:55 +00:00
Kuntai Du
5bb1da5190
[MISC] Remove format.sh ( #28906 )
...
Signed-off-by: Kuntai Du <kuntai@uchicago.edu >
2025-11-18 05:28:31 +00:00
Nick Hill
5bdd155277
[CI] Fix async scheduling + spec decoding test flake ( #28902 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-18 05:26:32 +00:00
Ning Xie
0168f69e50
[Misc] Remove unnecessary parentheses from log statements ( #28897 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2025-11-17 20:33:46 -08:00
Didier Durand
083cf326dc
[Doc]: fix typos in various files ( #28863 )
...
Signed-off-by: Didier Durand <durand.didier@gmail.com >
2025-11-17 20:32:14 -08:00
Cyrus Leung
bf9e1e8767
[Bugfix] Fix wrong CLI defaults for dynamic SchedulerConfig fields ( #28872 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-17 20:30:29 -08:00
Wentao Ye
3ddcf46011
[Refactor] Remove Unused Func in Batch Invariant ( #28881 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-11-17 20:29:29 -08:00
xuebwang-amd
d0a73620cc
[ROCm][Quantization] add apply_vllm_mapper in quark config for models like gpt-oss ( #28638 )
...
Signed-off-by: xuebwang-amd <xuebwang@amd.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-18 11:16:45 +08:00
Michael Goin
88ab591f0b
Run macos smoke test workflow on main commit ( #28752 )
...
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-11-18 11:16:03 +08:00
Benjamin Bartels
b6e04390d3
[Bugfix] Fix Kimi-K2 tool parser concatenated tool calls parsing ( #28831 )
...
Signed-off-by: Thomas Mao <yiyeguhu@gmail.com >
Signed-off-by: bbartels <benjamin@bartels.dev >
Co-authored-by: Thomas Mao <yiyeguhu@gmail.com >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
2025-11-17 19:13:25 -08:00
Zhuohan Li
552cac95b5
[Misc] Fix wrong comment in scheduler ( #28880 )
...
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com >
2025-11-17 15:32:22 -08:00
Bangsheng Tang
61485844fc
[BugFix] Corner case that could cause out-of-sync with external launcher mode and dp >1 ( #28774 )
2025-11-17 15:22:11 -08:00
Pranav
f77bce001a
[Model] Add Afmoe architecture implementation ( #28332 )
...
Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr >
Signed-off-by: Pranav <veldurthipranav@gmail.com >
Co-authored-by: Maziyar Panahi <maziyar.panahi@iscpif.fr >
2025-11-17 15:11:20 -08:00
Wentao Ye
a289cc1dde
[Test] Batch Invariant: Rename and organize tests ( #27421 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-11-17 18:09:47 -05:00
Shreyas Kulkarni
95ae50b7d1
[Quantization] [Eagle] Add complete quantization support to the draft model in Eagle ( #28435 )
...
Signed-off-by: Shreyas Kulkarni <shreyas.gp269@gmail.com >
2025-11-17 15:01:34 -08:00
Nick Hill
7765e5ba75
[BugFix] Fix PP performance and PP kv connector output regression ( #28768 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-17 14:08:50 -08:00
Ronald
d8874c61a5
[Core] Async Scheduling X Spec Decoding Compatibility ( #24799 )
...
Signed-off-by: Ronald1995 <ronaldautomobile@163.com >
Signed-off-by: Nick Hill <nhill@redhat.com >
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com >
2025-11-17 12:16:20 -08:00
Zhewen Li
f8b19c0ffd
[Bugfix] Fix GPT-OSS on AMD after #28603 ( #28816 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-11-17 13:15:26 -05:00
tiehexue
e42bd8c2e3
Cast return value to int64_t for cache size ( #28814 )
...
Signed-off-by: tiehexue <tiehexue@hotmail.com >
2025-11-17 16:02:32 +00:00
Roger Wang
7f064491f8
[Bugfix][Perf] Revert applying HF processor on text-only inputs for multimodal models ( #28858 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2025-11-17 14:49:25 +00:00
Lucas Wilkinson
64e39d667c
[BugFix] Temporary fix for IMA with MTP = 2 and full-cg ( #28315 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-11-17 09:41:22 -05:00
Kunshang Ji
1b82fb0ad3
[XPU] work around for sp, avoid custom op import error ( #28822 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2025-11-17 13:16:44 +00:00
Jae-Won Chung
d4acf518d0
[Metrics] Fix KV cache usage percent metric multiproc ( #28792 )
...
The `vllm:kv_cache_usage_perc` Gauge metric is missing `multiprocess_mode="mostrecent"` and ends up returning
```
vllm:kv_cache_usage_perc{engine="0",model_name="Qwen/Qwen3-VL-8B-Instruct",pid="277"} 0.0
vllm:kv_cache_usage_perc{engine="0",model_name="Qwen/Qwen3-VL-8B-Instruct",pid="275"} 0.0
vllm:kv_cache_usage_perc{engine="0",model_name="Qwen/Qwen3-VL-8B-Instruct",pid="273"} 0.6530455880475035
...
```
The deprecated `vllm:gpu_cache_usage_perc` Gauge metric has `multiprocess_mode="mostrecent"`.
Signed-off-by: Jae-Won Chung <jwnchung@umich.edu >
2025-11-17 09:54:15 +00:00
wuyaoxuehun
ab01cd14e5
[BugFix] Fix glm4_moe_mtp load weights bug ( #28805 )
...
Signed-off-by: wuyaoxuehun <798143193@qq.com >
2025-11-17 17:13:11 +08:00
Li, Jiang
577bb34fff
[CPU][Bugfix] Fix _to_list in CPU model runner ( #28824 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2025-11-17 07:47:24 +00:00
Jee Jee Li
3380ed5e11
[Doc] Add llama4 LoRA tag ( #28825 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-17 14:08:48 +08:00
Jay Caldwell
6f37419244
[Bugfix][Model] Prevent special token leakage in KimiK2ToolParser streaming mode ( #28543 )
...
Signed-off-by: Jscaldwell55 <jay.s.caldwell@gmail.com >
2025-11-17 13:54:46 +08:00
Xiake Sun
60e089f0b9
[ROCm][Qwen3-32B] Fix AITER MHA accuracy issue cause by #25763 ( #28670 )
...
Signed-off-by: Xiake Sun <xiake.sun@amd.com >
2025-11-16 20:52:11 -08:00
liuzhenwei
d64429bb36
[NIXL][XPU] update install script of NIXL ( #28778 )
...
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com >
2025-11-17 03:01:33 +00:00
jiahanc
561253b37f
[Performance][Fix] update nvfp4 code to support renorm routing ( #28569 )
...
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-11-16 18:02:42 -08:00
Nick Hill
80b6080ddc
[BugFix] Fix async scheduling + chunked prefill + preemption ( #28787 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-17 06:46:46 +08:00
amirkl94
03ee48111d
Feature: Support Relu2 in FusedMoE fp8 cutlass path ( #27261 )
2025-11-16 13:39:44 -05:00
Lukas Geiger
5a87076d6e
[Model][QwenVL] Optimize Qwen2_5_VisionAttention q,k preparation ( #28769 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-11-16 17:37:15 +00:00
Ning Xie
ac1daf3233
fix comment typo ( #28802 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2025-11-16 17:03:21 +00:00
Didier Durand
63fed55506
[Doc]: fix typos in various files ( #28811 )
...
Signed-off-by: Didier Durand <durand.didier@gmail.com >
2025-11-16 14:30:06 +00:00
Anna Shors
8d259fad6c
Fix gpt oss weight loading with EP + bf16 ( #28765 )
...
Signed-off-by: ashors1 <ashors@nvidia.com >
2025-11-16 13:12:45 +00:00
scottzh8
3bc1175798
[Bugfix] Fix host and port join for ipv6 in bench serve ( #28679 )
...
Signed-off-by: Scott Zhang <scottzh@fb.com >
Co-authored-by: Scott Zhang <scottzh@fb.com >
2025-11-16 10:20:57 +00:00
Dezhan
af02c40970
Fixed gpt-oss _load_weights_other() parameter position bug ( #28715 )
...
Co-authored-by: Dezhan Tu <dztu@meta.com >
2025-11-16 09:46:29 +00:00
Lucia Fang
b316ac6589
[V1] Support MP Executor for multi node distributed inference ( #23691 )
...
Signed-off-by: Lu Fang <fanglu@fb.com >
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Signed-off-by: Lucia Fang <fanglu@fb.com >
Signed-off-by: Lucia Fang <116399278+luccafong@users.noreply.github.com >
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-11-16 09:01:21 +00:00
wang.yuqi
a55b64635c
[Model] Allow users to control skip reading cache per request. ( #28194 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Signed-off-by: wang.yuqi <noooop@126.com >
2025-11-16 00:04:50 -08:00
ai-jz
d231876ce3
[Benchmark] Fix client seed synchronization in multi-turn benchmark ( #28512 )
...
Signed-off-by: ai-jz <aijz.xplr@gmail.com >
2025-11-16 15:04:32 +08:00
Bram Wasti
f849ee739c
Adding a benchmark for batch invariance ( #28161 )
...
Signed-off-by: Bram Wasti <bwasti@meta.com >
Signed-off-by: Bram Wasti <bwasti@fb.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-16 13:22:17 +08:00
Lucas Wilkinson
be263f7645
[BugFix] Fix AssertionError: DCP not support reorder_batch_threshold > 1 now. ( #28751 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-11-15 22:35:06 +00:00
Didier Durand
2bb4435cb7
[Doc]: fix typos in various files ( #28567 )
...
Signed-off-by: Didier Durand <durand.didier@gmail.com >
2025-11-15 19:27:50 +00:00
Lukas Geiger
07cadab27a
[Model][Qwen3VL] Cache positional embedding indices ( #28475 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-11-15 19:03:09 +00:00
Nick Hill
637f292196
[CI] Fix broken pipeline ( #28781 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-15 08:44:14 -08:00
Eldar Kurtić
e439c784fa
Add support for Eagle with separate lm-head and embed_tokens layers ( #28549 )
...
Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com >
2025-11-15 06:12:02 -08:00
hwhaokun
085a525332
[Model] Fix lmhead init bug of bailing_moe ( #28777 )
...
Signed-off-by: hwhaokun <haokun0405@163.com >
Co-authored-by: zhaozx-cn <zhaozx2116@163.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-15 05:44:12 -08:00
Cyrus Leung
89d3679221
[Doc] Fix failing doc build ( #28772 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-15 05:33:27 -08:00
tingtinggithub
cb15ee28db
Allow Gemma3 to take image embeddings ( #28483 )
...
Signed-off-by: tingtinggithub <streamttt@gmail.com >
2025-11-15 04:18:08 -08:00
Angela Yi
f36292dbee
[compile] Enable sequence parallelism matching w/o custom ops enabled ( #27126 )
...
Signed-off-by: angelayi <yiangela7@gmail.com >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Signed-off-by: ProExpertProg <lgovedic@redhat.com >
Co-authored-by: Luka Govedič <lgovedic@redhat.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Luka Govedič <luka.govedic@gmail.com >
2025-11-15 11:46:12 +00:00
Vadim Gimpelson
173b356abf
[PERF] Remove TRTLLM Gen attn kernel limitation max_seq_len <=131072 ( #28755 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2025-11-15 15:43:41 +05:30
Cyrus Leung
638e4196d1
[Misc] Make SchedulerConfig.max_model_len init-only ( #28733 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-15 01:59:31 -08:00
Zhewen Li
1ec978c209
[Kernel][Moe Configs] llama4 maverick fp8 moe config tp8 on mi325 ( #28709 )
...
Signed-off-by: Zhewen Li <zhewenli@meta.com >
2025-11-15 01:10:48 -08:00
Jane (Yuan) Xu
74b5267d3a
Use narrow over indexing in hadacore_transform to prep for ABI stable ( #28756 )
...
Signed-off-by: Jane Xu <janeyx@meta.com >
2025-11-15 01:10:15 -08:00
Zhuohan Li
dd6ac1c2bb
[RL] [V1] Remove unused device argument from reset_kv_cache ( #28766 )
...
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com >
2025-11-14 23:59:42 -08:00
Cyrus Leung
98b4d389ed
[Redo] #26368 ( #28771 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-11-14 22:47:41 -08:00
Varun Sundar Rabindranath
6965ef436f
[Performance][DeepGEMM] Estimate expected_m ( #28694 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-11-15 13:52:14 +08:00
Chendi.Xue
c9e665852a
[NIXL] heterogeneous block_size support ( #26759 )
...
Signed-off-by: Chendi Xue <chendi.xue@intel.com >
Signed-off-by: Chendi.Xue <chendi.xue@intel.com >
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com >
2025-11-14 21:51:32 -08:00
Mohammad Othman
363aaeef0f
Fix IntermediateTensors initialization and add type hints ( #28743 )
...
Signed-off-by: Mohammad Othman <Mo@MohammadOthman.com >
Co-authored-by: Mohammad Othman <Mo@MohammadOthman.com >
2025-11-15 04:31:36 +00:00
Nick Hill
ac86bff8cb
Revert "[Core] Performance: Use list[np.ndarray] instead of list[list… ( #28773 )
2025-11-14 20:24:00 -08:00
Michael Goin
edfe498189
[Bugfix] Build hadacore kernels on >SM90 ( #28748 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-11-14 19:51:05 -08:00
Lukas Geiger
f05d474c8a
[Model][Qwen3VL] Use mm_position to compute mrope positions ( #28730 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-11-14 19:45:11 -08:00
QiliangCui
9fc81ec765
[TPU] Fix import error in tpu launch ( #28758 )
...
Signed-off-by: Qiliang Cui <derrhein@gmail.com >
2025-11-15 00:58:32 +00:00
Jialin Ouyang
186352b270
[Core] Performance: Use list[np.ndarray] instead of list[list[int]] for output tokens for GC optimization ( #26368 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-11-14 16:04:04 -08:00
Nick Hill
58e61e56b7
[Test] Rework e2e async scheduling tests ( #28744 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-14 16:01:09 -08:00
Gregory Shtrasberg
75f01b9d3c
[ROCm][CI/Build] Upgrade to ROCm 7.1 and AITER main ( #28753 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-11-14 15:53:21 -08:00
rasmith
ba041d980b
[Log] Save profiler results to file instead of stdout ( #28144 )
...
Signed-off-by: Randall Smith <ransmith@amd.com >
Co-authored-by: Randall Smith <ransmith@amd.com >
2025-11-14 23:26:39 +00:00
Thomas Parnell
e0c910bb89
[Hybrid] [Kernel] Fix chunk scan kernel when BLOCK_SIZE_DSTATE > 128 ( #28295 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
2025-11-14 22:55:42 +00:00
Benjamin Chislett
bf3ffb61e6
[Bugfix] Fix ChunkedLocalAttention CUDA Graph setting ( #28739 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
2025-11-14 14:14:46 -08:00
Alexander Matveev
e5c78956c0
[Bugfix] Fix incorrect use of hidden_states for shared_experts due to do_naive_dispatch_combine ( #28740 )
...
Signed-off-by: Alexander Matveev <amatveev@redhat.com >
2025-11-14 14:13:46 -08:00
Laith Sakka
2e0ad629b0
Avoid bytecode hook and simplify TorchCompileWrapperWithCustomDipatch ( #25110 )
...
Signed-off-by: Laith Sakka <lsakka@meta.com >
2025-11-14 14:11:10 -08:00
Gregory Shtrasberg
5a84b76b86
[ROCm][CI/Build] Change install location of uv ( #28741 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-11-14 21:34:18 +00:00
Marcin Ostrowski
0de4f217ab
[Bugfix] TypeError: 'NoneType' object is not callable ( #27410 )
...
Signed-off-by: Marcin Ostrowski <marcinx.ostrowski@intel.com >
2025-11-14 21:13:53 +00:00
Michael Goin
f08eab2acc
[CI] Fix macos smoke test uv cache issue ( #28736 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-11-14 13:29:55 -07:00
Sage Moore
8977ffb5e6
[ROCm][Bugfix] Fix compilation errors with fused_qknorm_rope_kernel.cu ( #28682 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
2025-11-14 11:06:01 -08:00
Andrey Khalyavin
fd4555089a
[BugFix] Fix misprint introduced by modular_kernel refactoring. ( #28728 )
...
Signed-off-by: Andrey Khalyavin <halyavin@yandex-team.ru >
2025-11-14 10:58:18 -08:00
GuanH
cec275efce
[Bugfix] resolve Qwen3-VL GPTQModel quantized model loading failure ( #28663 )
...
Signed-off-by: GuanH <guansdrailib@gmail.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-11-14 18:44:27 +00:00
Cyrus Leung
e2741f6cbc
[Chore] Rename SchedulerConfig.chunked_prefill_enabled ( #28735 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-14 18:39:57 +00:00
Harry Mellor
67187554dd
[Docs] Enable some more markdown lint rules for the docs ( #28731 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-14 18:39:19 +00:00
TJian
a425dc256e
[Bugfix] [ROCm] [AITER]: Fix aiter block quant not compatible with torch compile dynamo ( #28716 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-11-14 10:30:50 -08:00
Fardin Hoque
964d65deed
LLaMA4 LoRA Adapter Enablement ( #28602 )
...
Signed-off-by: Fardin Hoque <kfhfar@amazon.com >
Co-authored-by: Wei Wei <wwei6@meta.com >
2025-11-14 13:27:56 -05:00
Chen Wang
9261eb3dc1
docs(lora_resolvers): clarify multi-resolver order and storage path requirement ( #28153 )
...
Signed-off-by: Chen Wang <Chen.Wang1@ibm.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-14 18:08:30 +00:00
czhu-cohere
cdd7025961
[kernel] Improve FP8 PTPC on Hopper for larger shapes ( #28692 )
...
Signed-off-by: czhu-cohere <conway.zhu@cohere.com >
2025-11-14 09:59:11 -08:00
Julien Denize
085424808e
Remove audio optional dependency for mistral-common ( #28722 )
...
Signed-off-by: Julien Denize <julien.denize@mistral.ai >
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-11-14 09:54:38 -08:00
Mohammad Othman
a17e36f223
Fix typo in comment: existance -> existence ( #28737 )
...
Signed-off-by: Mohammad Othman <emranm226@hotmail.com >
2025-11-14 09:35:45 -08:00
Matthew Bonanni
8cc40f8992
[Attention] Bump FA for removed method ( #28429 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-11-14 09:13:37 -08:00
Nicolò Lucchesi
6f1e7f7226
[DisaggEverything] Tokens in<>out /generate endpoint ( #24261 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-14 09:58:01 -07:00
Michael Goin
d54a18a47e
[CI][CPU] Smoke test for Apple Silicon using GHA MacOS runner ( #28688 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-11-14 09:37:18 -07:00
Harry Mellor
5f3cd7f7f2
[Docs] Update the name of Transformers backend -> Transformers modeling backend ( #28725 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-14 16:34:14 +00:00
dongbo910220
c934caee88
[Fix] improve aspect ratio in dummy image generation and add common VLM tests for PaddleOCR-VL ( #28711 )
...
Signed-off-by: dongbo910220 <1275604947@qq.com >
2025-11-14 16:07:20 +00:00
Duncan Moss
3f8a874065
[Kernels] Enable FlashInfer FP8 Blockscale on SM90 (for TEP DSR1) ( #27134 )
...
Signed-off-by: Duncan Moss <djm.moss@gmail.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-11-14 08:02:44 -08:00
Cyrus Leung
511a6b611d
[Config] Clean up SchedulerConfig initialization ( #28665 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-14 22:41:02 +08:00
Nicolò Lucchesi
96b23b8e3b
[Bugfix][Nixl] Fix kernel physical<>logical block_size issue ( #28677 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-11-14 22:40:05 +08:00
zhaozx-cn
433c0f8675
[Model] Fix bailing_moe accuracy problem ( #28277 )
...
Signed-off-by: zhaozx-cn <zhaozx2116@163.com >
2025-11-14 13:33:02 +00:00
Fasal Shah
8d3748d3c7
[Doc] Fix macOS installation dependency resolution issue ( #26721 )
...
Signed-off-by: faisal shah <fashah@redhat.com >
2025-11-14 12:43:56 +00:00
Lucas Wilkinson
db56a59970
[BugFix] Fix FA3 IMA with FULL_AND_PIECEWISE and cascade attention (default) ( #28702 )
2025-11-14 12:19:22 +00:00
Yong Hoon Shin
9324e10275
Fix KV sharing fast prefill with cudagraph enabled ( #28537 )
...
Signed-off-by: Yong Hoon Shin <yhshin@meta.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-11-14 11:53:42 +00:00
Jingchun Gao
4516d44b7f
[DCP] Support Decode Context Parallel (DCP) for GQA with Flashinfer ( #25438 )
...
Signed-off-by: gaojc <1055866782@qq.com >
Signed-off-by: Jingchun Gao <gaojingchun1@huawei.com >
Signed-off-by: Jingchun Gao <63247409+gjc0824@users.noreply.github.com >
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com >
Co-authored-by: gaojingchun (A) <g00955623@china.huawei.com >
Co-authored-by: Jingchun Gao <gaojingchun1@huawei.com >
Co-authored-by: QiuChunshuo <qiuchunshuo@huawei.com >
2025-11-14 11:24:10 +00:00
Shanshan Shen
41b92f7d38
[Model][MM] Extract conv layer as CustomOp ( #28455 )
...
Signed-off-by: shen-shanshan <467638484@qq.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-11-14 19:16:13 +08:00
Srreyansh Sethi
360bd8762f
[Frontend] Added chat-style multimodal support to /classify. ( #27516 )
...
Signed-off-by: WorldExplored <srreyansh.sethi@gmail.com >
Signed-off-by: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com >
Signed-off-by: vnadathur <glvikramn@gmail.com >
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Co-authored-by: vnadathur <236933696+vnadathur@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: vnadathur <glvikramn@gmail.com >
Co-authored-by: wang.yuqi <noooop@126.com >
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io >
2025-11-14 11:03:55 +00:00
lyn610
ecf8230d4d
[Metrics] Log number of preempted requests ( #28522 )
...
Add tracking and periodic logging for the number of preempted requests in the
metrics logger. This helps monitor system behavior under load.
Signed-off-by: Yining Liu <610lyn@gmail.com >
2025-11-14 09:47:45 +00:00
Xing Liu
8cfbe89b93
[Misc] fix comment in test_envs ( #28529 )
...
Signed-off-by: Xing Liu <xingliu14@gmail.com >
2025-11-14 09:32:46 +00:00
Boyuan Feng
fd75d3e8c0
[Minor] avoid register new custom and just import silly_attn ( #28578 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2025-11-14 09:32:31 +00:00
Michael Goin
c9a3a02149
Add output token counting to gsm8k eval ( #28594 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-11-14 09:32:03 +00:00
Nick Hill
bc3e43069a
[BugFix] Fix multi-modal async scheduling race condition ( #28706 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-14 01:11:13 -08:00
Jiangyun Zhu
c36bcfe6b3
[Bugfix] fix dots.ocr pp support ( #28705 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2025-11-14 09:01:26 +00:00
Yan Ma
529cea343d
use default CCL_ZE_IPC_EXCHANGE ( #28700 )
...
Signed-off-by: Yan Ma <yan.ma@intel.com >
2025-11-14 16:55:29 +08:00
rasmith
93103575ce
[BugFix][CI/Build][ROCM] Fix import error and apply assert in appropriate case in test_struct_output_generate ( #28311 )
...
Signed-off-by: Randall Smith <ransmith@amd.com >
Co-authored-by: Randall Smith <ransmith@amd.com >
2025-11-13 22:41:29 -08:00
rasmith
15ae8e0784
[Bugfix][CI/Test][Spec Decode] Fix illegal memory access in offline_inference/spec_decode.py (Issue 27619) ( #28432 )
...
Signed-off-by: Randall Smith <ransmith@amd.com >
Co-authored-by: Randall Smith <ransmith@amd.com >
Co-authored-by: TJian <tunjian.tan@embeddedllm.com >
2025-11-13 22:34:01 -08:00
haoyangli-amd
0b25498990
[Misc] add ignore mapper for quark quantization ( #28275 )
...
Signed-off-by: Haoyang Li <lihaoyang0109@gmail.com >
2025-11-14 05:56:35 +00:00
Roger Wang
0aecd9138f
[Misc] Update xformers to 0.33.0.post1 ( #28678 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2025-11-13 21:52:53 -08:00
Kunshang Ji
da14ae0fad
[XPU][CI]disable lm cache uts ( #28696 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2025-11-14 03:15:50 +00:00
Cyrus Leung
01bea115c4
[Misc] Remove warn_for_unimplemented_methods ( #28613 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-14 11:10:10 +08:00
Bradley D
b39a5026eb
[ci][amd] fix basic models extra init test ( #28676 )
...
Signed-off-by: Bradley Davis <bradleyhd@meta.com >
2025-11-14 02:44:36 +00:00
Michael Goin
622e6106a9
[CPU][Bugfix] Fix Apple Silicon M1 compilation failure ( #28681 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-11-14 09:49:55 +08:00
Sage Moore
2aa75c752b
[ROCm] Bump up the version of amd-smi to 6.4.3 ( #28680 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
2025-11-14 01:24:28 +00:00
Hank_
4d5943bda6
[quantization][config] enable override existing quant_config ( #28510 )
...
Signed-off-by: Hank <hcc.mayday@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-11-14 01:24:10 +00:00
Alexei-V-Ivanov-AMD
f2b8e1c551
Mirrored test group definitions for AMD (2025-11-11) ( #28573 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
2025-11-14 00:16:34 +00:00
Mark McLoughlin
6e25b1cddf
[KV Connector] Test async mode in scheduler tests ( #28550 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-11-13 18:30:59 -05:00
Wentao Ye
e64011f29a
[CI] Bug: Fix ci entrypoint pooling ( #28684 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-11-13 14:19:35 -08:00
Simon Mo
1b622deba7
[Misc] Update CODEOWNERS for simon-mo and comaniac ( #28675 )
Signed-off-by: Simon Mo <simon.mo@hey.com >
2025-11-13 21:01:43 +00:00
Kebe
faed7bf07e
[Bugfix] [CPU] bump torch to 2.9.0 for Darwin to fix segmentation fault ( #27791 )
Signed-off-by: Kebe <mail@kebe7jun.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-11-13 12:48:08 -08:00
Yanan Cao
262d263f6c
[Bugfix] Eliminate tuple inputs to submodules in graph partitioning ( #28533 )
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com >
2025-11-13 15:09:05 -05:00
Qiu
968060c15a
[bugfix] correct local_chunk_len for DCP in reorg_kvcache with long context ( #28526 )
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-13 11:29:22 -08:00
elvischenv
5d6ce2b960
[Perf] Support stream interval for reducing host overhead ( #27869 )
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-11-13 13:21:25 -05:00
Matthew Bonanni
f9f3b596f3
[Attention][Bugfix] Fix FA sink support ( #28660 )
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-11-13 13:20:01 -05:00
Yannick Schnider
119c4927b3
[Bugfix] Fix validate model input for decoder models ( #27099 )
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com >
Signed-off-by: Yannick Schnider <Yannick.Schnider1@ibm.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-11-13 10:18:47 -08:00
Varun Sundar Rabindranath
fe1cd7704d
[Performance][B200] silu_mul_quant: pack scales in int32 ( #28358 )
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-11-13 10:16:55 -08:00
Johnny Yang
fdfd5075aa
[TPU] patch TPU wheel build script to resolve metadata issue ( #27279 )
Signed-off-by: Johnny Yang <johnnyyang@google.com >
2025-11-13 09:36:54 -08:00
Nick Hill
327c0a9a23
[BugFix] Ensure EngineArgs.create_engine_config is idempotent ( #28515 )
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-13 17:14:08 +00:00
Jane (Yuan) Xu
06c4873d95
Rewrite C++ meta funcs to Python ( #28595 )
Signed-off-by: Jane Xu <janeyx@meta.com >
2025-11-14 00:52:50 +08:00
Roger Wang
d3387750f1
[Misc] Turn off encoder torch compile by default ( #28634 )
Signed-off-by: Roger Wang <hey@rogerw.io >
2025-11-13 08:38:08 -08:00
Harry Mellor
b230286fbc
Fix get_num_experts when config sets it explicitly to None ( #28652 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: bruceszchen <bruceszchen@tencent.com >
2025-11-13 16:02:42 +00:00
Yuanping Song
3035d1a166
[BugFix] DeepSeek-OCR: apply NoRepeatNGramLogitsProcessor to greedy path ( #28617 )
Signed-off-by: Yuanping Song <yuanping.song@outlook.com >
2025-11-13 15:24:35 +00:00
Huamin Li
07a606aa7e
[CI Failure] Fix backend selection for encoder-only models ( #28534 )
Signed-off-by: Huamin Li <3ericli@gmail.com >
2025-11-13 10:11:27 -05:00
amdfaa
a7791eac9d
[CI/Build] Install uv for AMD MI300: Language Models Tests (Hybrid) %N ( #28142 )
Signed-off-by: amdfaa <107946068+amdfaa@users.noreply.github.com >
Signed-off-by: zhewenli <zhewenli@meta.com >
Co-authored-by: zhewenli <zhewenli@meta.com >
2025-11-13 14:34:55 +00:00
Pleaplusone
8da2f28f53
[ROCm][BugFix]Fix get_cu_count in rocm_aiter_fa.py ( #28618 )
Signed-off-by: ganyi <ygan@amd.com >
2025-11-13 14:18:20 +00:00
Akash kaothalkar
86d15bfd8d
[Hardware][PowerPC] Fix fp16 compilation error for Power in cpu attention backend and bump oneDNN version ( #28535 )
Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com >
Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com >
2025-11-13 13:32:21 +00:00
Fanli Lin
c9fe6abe7c
[Bugfix] Fix FPS value type for Qwen2.5-Omni video processing ( #28630 )
Signed-off-by: Lin, Fanli <fanli.lin@intel.com >
2025-11-13 13:06:06 +00:00
zofia
c47b6c85ac
[XPU] add sym params to IPEXConfig ( #28611 )
Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com >
2025-11-13 11:35:04 +00:00
baonudesifeizhai
c428e8d80b
Fix io processor pooling #28273 ( #28484 )
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com >
2025-11-13 11:34:14 +00:00
Zijing Liu
5e973209aa
[BugFix] Fix type error when assign a trition kernel tensor to a torch.nn.Parameter ( #28603 )
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com >
2025-11-13 11:30:04 +00:00
Di Wu
e63fd44560
Fix: Correctly filter special tokens in benchmark_prefix_caching ( #28615 )
Signed-off-by: Di Wu <dw2761@nyu.edu >
2025-11-13 10:57:44 +00:00
Yong Hoon Shin
11ac9ddd03
Support all interleaved layer types ( #28485 )
Signed-off-by: Yong Hoon Shin <yhshin@meta.com >
2025-11-13 08:57:20 +00:00
Chauncey
5c9ad138d5
[Frontend] supports interleaved thinking ( #28531 )
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-11-13 16:14:13 +08:00
Jiangyun Zhu
fa183e9271
[Bugfix] fix kimi-linear crash ( #28445 )
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2025-11-13 07:59:58 +00:00
usberkeley
4ab34f6ef1
Add NUMA node validation for CPU thread binding ( #28555 )
Signed-off-by: Bradley <bradley.b.pitt@gmail.com >
2025-11-13 07:03:52 +00:00
Huy Do
c33b87e777
Use official xformers-0.0.33 built for PT 2.9 ( #28600 )
Signed-off-by: Huy Do <huydhn@gmail.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-11-12 22:48:53 -08:00
tjandy98
4504e8029b
[Bugfix] Prevent crash on empty grammar string ( #28210 )
Signed-off-by: tjandy98 <3953059+tjandy98@users.noreply.github.com >
2025-11-13 06:42:29 +00:00
Pleaplusone
ca00b1bfc6
[ROCm][BugFix] Remove the usage of device_info from aiter ( #28383 )
Signed-off-by: ganyi <ygan@amd.com >
2025-11-12 21:43:42 -08:00
Radu Salavat
d44fbbab0e
[build][cmake]: Bundle static ACL and torch libgomp for CPU extension builds ( #28059 )
Signed-off-by: Radu Salavat <radu.salavat@arm.com >
2025-11-13 05:43:08 +00:00
Lucia Fang
7e082bc14e
Support DeepEP for Kimi-k2-thinking through enabling gemm selection for compressed-tensor marlin wna16 ( #28574 )
Signed-off-by: Lu Fang <fanglu@fb.com >
2025-11-12 21:40:45 -08:00
Fanli Lin
dbbe0c756a
[XPU] Support Triton path for LoRA operations on XPU ( #28511 )
Signed-off-by: Fanli Lin <fanli.lin@intel.com >
2025-11-13 05:31:42 +00:00
Pleaplusone
7dca0c90cb
[BugFix][ROCm] Fix get_cu_count missing variable error ( #28608 )
Signed-off-by: ganyi <ygan@amd.com >
2025-11-13 05:18:56 +00:00
Andrew Xia
1a0b157a2e
[Frontend][responsesAPI][1/n] convert responses API tool input to chat completions tool format ( #28231 )
Signed-off-by: Andrew Xia <axia@fb.com >
Co-authored-by: Andrew Xia <axia@fb.com >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
2025-11-13 04:47:22 +00:00
Andrew Xia
7c38ed0f1c
[Frontend] split append tool output ( #28333 )
Signed-off-by: Andrew Xia <axia@fb.com >
Co-authored-by: Andrew Xia <axia@fb.com >
2025-11-13 04:03:23 +00:00
Jialin Ouyang
a1d3866dda
[n-gen] DO NOT repeatedly return finished child requests ( #28591 )
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-11-13 03:36:07 +00:00
Harry Mellor
97d1c99302
Rename clashing method names for vLLM model protocol ( #27583 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-12 19:14:33 -08:00
Harry Mellor
3226283461
[Docs] Add some details about what the MoE block needs for the Transformers backend ( #28588 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-13 03:12:14 +00:00
Nick Hill
8832fff972
[BugFix] Fix mm_encoder_attn_backend arg type checking ( #28599 )
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-13 03:06:03 +00:00
Michael Goin
a543e678b4
[Bugfix] Fix SM100 gpt-oss regression due to faulty attn sink support ( #28561 )
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-11-12 19:40:59 -07:00
wangxiyuan
2dacd57394
[platform] Move get_cu_count to utils ( #27005 )
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-11-13 08:48:47 +08:00
Gregory Shtrasberg
d75ad04818
[ROCm][Bugfix] Revert removing setuptools version restriction ( #28592 )
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-11-12 16:46:58 -08:00
Michael Goin
52eadcec9e
[Docs] Update meetups.md description ( #28583 )
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-11-13 00:00:23 +00:00
Harry Mellor
51c599f0ec
Skip models that cannot currently init on Transformers v5 ( #28471 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-12 23:43:57 +00:00
Alexander Matveev
69d0e90313
[MoE][Kernel][Perf] Improve Shared Expert Stream Overlap ( #28406 )
Signed-off-by: Alexander Matveev <amatveev@redhat.com >
2025-11-12 23:37:24 +00:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟
4ca5cd5740
[Core][AMD] Migrate fully transparent sleep mode to ROCm platform ( #12695 )
Signed-off-by: Hollow Man <hollowman@opensuse.org >
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com >
Co-authored-by: kliuae <kuanfu.liu@embeddedllm.com >
2025-11-12 15:24:12 -08:00
Michael Goin
10f01d5a3a
[Bugfix] Adjust Marlin CUDA arch selection to 8.0+PTX;9.0+PTX ( #28294 )
2025-11-12 15:14:13 -08:00
QiliangCui
3eb0c2673e
[TPU] Support GCS path in VLLM_TORCH_PROFILER_DIR ( #28487 )
Signed-off-by: Qiliang Cui <derrhein@gmail.com >
2025-11-12 22:31:14 +00:00
vllmellm
d8140b9833
[ROCM] Fix ROCm warnings, environment flag access, and GEMM kernel naming for consistency in _aiter_ops.py ( #28464 )
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2025-11-12 21:46:57 +00:00
Varun Sundar Rabindranath
74a9a9faad
[Performance][B200] Fix deepgemm prologue ( #27897 )
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-11-12 13:13:03 -08:00
Wei Wei
478ee511de
[Misc]Fix typo in llm_engine.py ( #28584 )
Signed-off-by: Wei Wei <wwei6@meta.com >
2025-11-12 12:59:43 -08:00
Andy Lo
58ce8d12b7
[BugFix] Priority scheduling and spec tokens preemption ( #28558 )
Signed-off-by: Andy Lo <andy@mistral.ai >
2025-11-12 20:29:21 +00:00
Yihua Cheng
94a9ebcf31
[KV connector][WIP] KV cache proxy based on LMCache multi-process mode ( #27902 )
Signed-off-by: ApostaC <yihua98@uchicago.edu >
2025-11-12 20:25:43 +00:00
Harry Mellor
a39dd7bb06
[CI] Skip "Multi-Modal Models Test (Extended) 3" test that's broken in current Transformers ( #28559 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-12 19:38:13 +00:00
Thomas Parnell
64d57c3be7
[Model] [Config] Correctly identify granite-4.0-micro as non-hybrid model ( #28563 )
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
2025-11-12 18:17:55 +00:00
PerryZhang01
a1e7fa362a
[EPLB][ROCm]: support EPBL for ROCm backend ( #27731 )
Signed-off-by: Perry Zhang <perzhang@amd.com >
Co-authored-by: Perry Zhang <perzhang@amd.com >
2025-11-12 18:16:35 +00:00
alberto
bac904565f
Implement ARC KV cache eviction policy for CPU offloader ( #27039 )
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com >
Signed-off-by: alberto <aperdomo@redhat.com >
Co-authored-by: Or Ozeri <or@ozery.com >
2025-11-12 09:51:39 -08:00
Benjamin Chislett
304419576a
[Perf] Refactor cudagraph_support to enable full CUDA graphs for spec decoding with FlashInfer ( #28479 )
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
2025-11-13 01:56:40 +09:00
Harry Mellor
a742134cc5
Remove deprecated fields from CompilationConfig ( #27593 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-12 16:10:28 +00:00
Nicolò Lucchesi
728a9eb70e
[Misc] Refactor Attention kv transfer methods into decorator ( #27816 )
Signed-off-by: NickLucche <nlucches@redhat.com >
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
Co-authored-by: Mark McLoughlin <markmc@redhat.com >
2025-11-12 16:05:44 +00:00
Canlin Guo
bc5bd45c7d
[Refactor] Remove redundant TP gather/split in split_qkv in QwenVL ( #28271 )
Signed-off-by: gcanlin <canlinguosdu@gmail.com >
2025-11-12 15:56:47 +00:00
Alexander Matveev
f76e85c299
[Performance][Hopper] Avoid M dim padding to 4x for most cases (due to cuda graphs paddings) ( #28492 )
Signed-off-by: Alexander Matveev <amatveev@redhat.com >
2025-11-12 10:51:43 -05:00
Harry Mellor
54aecd9ed5
Fix pre-commit (and XPU) on main ( #28556 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-12 06:13:41 -08:00
wangxiyuan
10138c92a5
[V0 deprecation] Deprecate use_v1 parameter ( #28112 )
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-11-12 14:03:52 +00:00
Jee Jee Li
a9d18b5107
[Bugfix] Fix gpt_oss packed_modules_mapping ( #28536 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-12 21:02:06 +08:00
TJian
edb59a9470
[ROCm] [Bugfix] Fix fused_qknorm_rope_kernel rocm compatibility ( #28500 )
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-11-12 05:01:14 -08:00
ZhengHongming888
c5f10cc139
add cpu option for p/d in nixl_connector ( #28356 )
Signed-off-by: Hongming Zheng <hongming.zheng@intel.com >
2025-11-12 11:53:08 +00:00
ziruiliu
d143152308
[KVConnector] Enable get_block_ids_with_load_errors() in LMCache connector ( #27978 )
Signed-off-by: Zirui Liu <ziliu@ddn.com >
Signed-off-by: ziruiliu <ziliu@ddn.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2025-11-12 11:44:58 +01:00
Chaojun Zhang
a4730c1b4f
[XPU]Fix crash due to removed VLLM_USE_V1 attribute ( #28520 )
Signed-off-by: chaojun-zhang <chaojun.zhang@intel.com >
2025-11-12 10:20:55 +00:00
wuyaoxuehun
d3ade61e42
[Model] fix glm4_moe_mtp load weights with GLM-4.6 checkpoint. ( #27597 )
Signed-off-by: wuao.scotty <wuao.scotty@bytedance.com >
Co-authored-by: wuao.scotty <wuao.scotty@bytedance.com >
2025-11-12 10:14:00 +00:00
yyzxw
1761dea1a8
[BugFix]: --enable-lora with model granite-4.0-micro crash ( #27733 )
Signed-off-by: zxw <1020938856@qq.com >
2025-11-12 09:03:56 +00:00
Huamin Li
c748355e0d
[CI] Introduce autorun_on_main feature ( #27836 )
Signed-off-by: Huamin Li <3ericli@gmail.com >
2025-11-12 08:51:19 +00:00
Chenguang Zheng
91864b79b3
[CI/Build] Fix crash due to removed VLLM_USE_V1 attribute in EPD ( #28521 )
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com >
Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-11-11 23:09:33 -08:00
Lukas Geiger
ac0bb2c307
[Core] Cache vllm_is_batch_invariant ( #28304 )
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-11-12 05:03:01 +00:00
ai-jz
f31419ed8b
[Benchmark] Add retry support to fix workload bias in multi-turn benchmark ( #28493 )
2025-11-12 05:00:45 +00:00
Fanli Lin
b9ce9a3013
[BugFix] Add fallback path in apply_rotary_pos_emb_flashattn for non-cuda platforms ( #28447 )
Signed-off-by: Lin, Fanli <fanli.lin@intel.com >
2025-11-12 03:13:21 +00:00
Chenguang Zheng
4ccffe561f
[Core] Encoder separation for Encode-Prefill-Decode Disaggregation ( #25233 )
Signed-off-by: n00909098 <nguyen.kha.long@huawei.com >
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com >
Signed-off-by: herotai214 <herotai214@gmail.com >
Signed-off-by: Khuong Le <khuong.le.manh@huawei.com >
Signed-off-by: Khuong Le <lemanhkhuong2611@gmail.com >
Co-authored-by: n00909098 <nguyen.kha.long@huawei.com >
Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com >
Co-authored-by: herotai214 <herotai214@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Khuong Le <khuong.le.manh@huawei.com >
Co-authored-by: Khuong Le <lemanhkhuong2611@gmail.com >
2025-11-11 18:58:33 -08:00
Lukas Geiger
cbb799e314
[Model][Qwen3VL] Simplify get_mrope_input_positions using numpy ( #28302 )
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-11-12 02:55:10 +00:00
Andreas Karatzas
9f0247cfa4
VLLM_USE_TRITON_FLASH_ATTN V0 variable deprecation ( #27611 )
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
Signed-off-by: Andreas Karatzas <Andreas.Karatzas@amd.com >
2025-11-11 18:34:36 -08:00
Li, Jiang
7f829be7d3
[CPU] Refactor CPU attention backend ( #27954 )
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2025-11-12 09:43:06 +08:00
wangxiyuan
e1710393c4
[[V0 deprecation]]Remove VLLM_USE_V1 env ( #28204 )
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-11-11 18:22:16 -07:00
Isotr0py
3f770f4427
[Performance] Cache loaded custom logitsprocs to avoid overheads ( #28462 )
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-11-11 16:49:29 -08:00
Yanan Cao
48c879369f
[Frontend] Change CompilationMode to a proper Enum ( #28165 )
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com >
2025-11-11 19:46:18 -05:00
Ilya Markov
1788aa1efb
[BugFix] Graceful handling of torch symm mem errors. ( #27671 )
Signed-off-by: ilmarkov <markovilya197@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-11-11 17:41:54 -07:00
Adrian Abeyta
d23539549a
Use FLASHINFER MLA backend when testing fp8_kv_scale_compile ( #28491 )
Signed-off-by: adabeyta <aabeyta@redhat.com >
2025-11-12 00:34:58 +00:00
Max Hu
412e153df5
[Feature] Allow configuring FlashInfer workspace size ( #28269 )
Signed-off-by: Max Hu <hyoung2991@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-11 23:32:20 +00:00
Michael Goin
e5f599d4d1
[Bugfix] Disable shared expert overlap if Marlin MoE is used ( #28410 )
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-11-11 23:16:12 +00:00
Michael Goin
28534b92b9
Add Zurich vLLM Meetup ( #28488 )
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-11-11 14:53:59 -08:00
wangxiyuan
d4902ba56d
[Misc] Cleanup Executor interface ( #28441 )
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-11-11 22:28:07 +00:00
Kyuyeun Kim
df4d3a44a8
[TPU] Rename path to tpu platform ( #28452 )
Signed-off-by: Kyuyeun Kim <kyuyeunk@google.com >
2025-11-11 19:16:47 +00:00
Jee Jee Li
9d1c474704
[LoRA][1/N]Remove LoRA extra vocab ( #28382 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-11 11:06:21 -08:00
Jie Luo
8c32c6e4b4
[Misc] fix typo in DCP comment ( #28389 )
Signed-off-by: Livinfly <luojie3m@gmail.com >
2025-11-11 10:59:16 -08:00
Canlin Guo
de120bc94f
[V0 deprecation] Clean up num_prefill_tokens logic for V0 ( #28203 )
Signed-off-by: gcanlin <canlinguosdu@gmail.com >
2025-11-11 10:57:12 -08:00
Jialin Ouyang
4228be7959
[Perf] Use np.ndarray instead of list[list[int]] to reduce GC overhead ( #28245 )
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-11-11 10:28:47 -08:00
Lukas Geiger
76e4dcf225
[Misc] Remove unused attention prefix prefill ops functions ( #26971 )
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-11-11 18:26:04 +00:00
Fanli Lin
d5edcb8678
[BugFix] Fix Siglip2Attention on XPU ( #28448 )
Signed-off-by: Lin, Fanli <fanli.lin@intel.com >
2025-11-11 18:18:02 +00:00
Xin Yang
6c3c0f8235
[Kernel] Optimize rms_norm kernel ( #27931 )
Signed-off-by: Xin Yang <xyangx@amazon.com >
2025-11-11 18:02:23 +00:00
Matthew Bonanni
684f254585
Prefer FlashAttention MLA as default over FlashMLA ( #27363 )
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-11-11 17:13:51 +00:00
Zhewen Li
e553424919
[CI/Build] Refactor Attention backend for test_prefix_prefill from xformers to SDPA ( #28424 )
Signed-off-by: zhewenli <zhewenli@meta.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-11-12 01:09:47 +08:00
xuebwang-amd
5a1271d83a
[Quantization] fix attention quantization of gpt_oss model ( #27334 )
Signed-off-by: xuebwang-amd <xuebwang@amd.com >
2025-11-11 12:06:00 -05:00
xuebwang-amd
05576df85c
[ROCm][Quantization] extend AMD Quark to support mixed-precision quantized model ( #24239 )
Signed-off-by: xuebwang-amd <xuebwang@amd.com >
Co-authored-by: fxmarty-amd <felmarty@amd.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-11-11 12:05:22 -05:00
zhrrr
68c09efc37
[Kernel][Perf] fuse QK Norm and RoPE into one cuda kernel for Qwen Model ( #27165 )
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com >
2025-11-11 12:00:31 -05:00
Nicolò Lucchesi
a7ef3eb0cd
[NIXL] Generalize block-first backend layouts (FlashInfer-like) ( #28282 )
2025-11-11 16:57:43 +00:00
Michael Goin
f9a4087182
Remove weight_scale.T special case for SM90 Block FP8 CUTLASS kernel ( #28431 )
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-11-11 11:46:04 -05:00
the-codeboy
287bbbeb06
[Doc] Fix typo in serving docs ( #28474 )
Signed-off-by: the-codeboy <71213855+the-codeboy@users.noreply.github.com >
2025-11-11 16:45:49 +00:00
usberkeley
3143eb23fc
[BugFix] Add test_outputs.py to CI pipeline ( #28466 )
Signed-off-by: Bradley <bradley.b.pitt@gmail.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-11 16:01:30 +00:00
Fanli Lin
b886068056
[BugFix] Fix RuntimeError in PixtralHFAttention on CPU/XPU ( #28444 )
Signed-off-by: Lin, Fanli <fanli.lin@intel.com >
2025-11-11 15:29:33 +00:00
Mark McLoughlin
a90ad7d838
Add @markmc to CODEOWNERS for Observability ( #28457 )
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-11-11 23:03:22 +08:00
jvlunteren
533b018f72
[BugFix] Fix Failing Ruff Check ( #28469 )
Signed-off-by: Jan van Lunteren <jvl@zurich.ibm.com >
2025-11-11 06:41:43 -08:00
bnellnm
a1448b4b69
[Kernels] Split up fused_moe/layer.py, isolate more modular kernel code ( #28064 )
2025-11-11 07:29:02 -07:00
Maryam Tahhan
fa1970201d
[Docs] Fix grammar in CPU installation guide ( #28461 )
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com >
2025-11-11 14:01:11 +00:00
Ido Segev
3380543b20
Add request timeout override for multi-turn benchmarks ( #28386 )
Signed-off-by: Ido Segev <idos@pliops.com >
2025-11-11 13:41:18 +00:00
Cyrus Leung
afffd3cc8a
[Model] Pass mm_features directly into get_mrope_input_positions ( #28399 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-11 21:14:48 +08:00
Chaojun Zhang
7dbe6d81d6
Fix Fused MoE LoRA Triton kernel bug ( #28450 )
Signed-off-by: chaojun-zhang <chaojun.zhang@intel.com >
2025-11-11 20:46:47 +08:00
Matthew Bonanni
b30dfa03c5
[Attention] Refactor CUDA attention backend selection logic ( #24794 )
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-11-11 07:40:44 -05:00
Michael Goin
2e78150d24
[CI] Add mergify rules for nvidia label ( #28417 )
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-11-11 04:28:28 -08:00
Ido Segev
d381eb967f
Multi turn benchmark progress bar for synthetic conversation generation ( #28394 )
Signed-off-by: Ido Segev <idos@pliops.com >
2025-11-11 11:06:04 +00:00
Lukas Geiger
9973e6e04a
[Model][Qwen3VL] Slighly speedup fast_pos_embed_interpolate ( #28434 )
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-11-11 10:35:10 +00:00
Fanli Lin
c7991269dd
[BugFix] 'DeepseekV2Config' object has no attribute 'use_mla'` ( #28387 )
Signed-off-by: Lin, Fanli <fanli.lin@intel.com >
2025-11-11 08:45:38 +00:00
Jiangyun Zhu
f0359fffa4
[Bugfix] fix qwen3-next crash ( #28202 )
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2025-11-11 08:24:28 +00:00
Sage Moore
798c7bebca
[EPLB] Refactor balance_packing to use numpy and optimize GPU-CPU transfers in EPLB ( #28369 )
Signed-off-by: Sage Moore <sage@neuralmagic.com >
2025-11-11 00:19:51 -08:00
Roger Wang
4fd4b743a2
[Bugfix] Fix max image size for PaddleOCR-VL ( #28442 )
Signed-off-by: Roger Wang <hey@rogerw.io >
2025-11-11 08:07:24 +00:00
David Ben-David
cc079763c5
[BugFix] Avoid calling KV connector layer APIs when metadata is unset ( #28253 )
Signed-off-by: David Ben-David <davidb@pliops.com >
Co-authored-by: David Ben-David <davidb@pliops.com >
Co-authored-by: Mark McLoughlin <markmc@redhat.com >
2025-11-10 23:39:36 -08:00
iAmir97
a7adbc6c6b
[Doc] Sleep mode documentation ( #28357 )
Signed-off-by: Amir Balwel <amir.balwel@embeddedllm.com >
Signed-off-by: iAmir97 <71513472+iAmir97@users.noreply.github.com >
Co-authored-by: Amir Balwel <amir.balwel@embeddedllm.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-10 22:44:35 -08:00
Robert Shaw
e605e8e323
[Bugfix] Fix Stream Sync for Shared Expert Overlap ( #28430 )
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com >
Co-authored-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2025-11-11 05:59:08 +00:00
Zuyi Zhao
bca74e32b7
[Frontend] Add sagemaker_standards dynamic lora adapter and stateful session management decorators to vLLM OpenAI API server ( #27892 )
Signed-off-by: Zuyi Zhao <zhaozuy@amazon.com >
Signed-off-by: Shen Teng <sheteng@amazon.com >
Co-authored-by: Shen Teng <sheteng@amazon.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2025-11-11 04:57:01 +00:00
Zhuohan Li
8d706cca90
[Misc] FlattenLogprobs -> FlatLogprobs ( #28335 )
2025-11-11 03:41:23 +00:00
Xin Yang
57201a6a4c
Fix rotary embedding benchmark script ( #28323 )
Signed-off-by: Xin Yang <xyangx@amazon.com >
2025-11-10 21:57:12 -05:00
Michael Goin
f2d9ad0620
Only register rocm_aiter_ops if aiter is found ( #28428 )
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-11-11 02:53:24 +00:00
Wentao Ye
de540c0354
[Feature] Add env var VLLM_MOE_USE_DEEP_GEMM ( #28422 )
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-11-11 02:29:48 +00:00
Lucas Wilkinson
39029d5192
[CI/Test Fix] Fix CP tests on Blackwell ( #28404 )
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-11 01:36:29 +00:00
Wentao Ye
35d801f13f
[Feature] Refactor batch invariant fp8 DeepGEMM ( #27606 )
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-11-11 00:08:40 +00:00
Matthew Bonanni
0bf29fadf5
[Test] Remove old non-varlen FA2 test ( #28420 )
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-11-10 23:57:41 +00:00
Adrian Abeyta
a5a790eea6
[Bugfix] Ensure calculated KV scales are applied in attention. ( #27232 )
Signed-off-by: adabeyta <aabeyta@redhat.com >
2025-11-10 23:42:37 +00:00
Jialin Ouyang
b30372cbd0
[Perf] Move gc.freeze logic from EngineCoreProc to EngineCore for better coverage ( #27896 )
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-11-10 15:34:18 -08:00
Ilya Markov
d17ecc6b19
[PERF] Allreduce fusion. Support torch native matching. Tuning of the thresholds ( #24248 )
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Signed-off-by: ilmarkov <markovilya197@gmail.com >
Co-authored-by: Luka Govedič <lgovedic@redhat.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2025-11-10 18:33:11 -05:00
Yong Hoon Shin
021143561f
[ROCm] Add missing gemm_a8w8_blockscale import ( #28378 )
Signed-off-by: Yong Hoon Shin <yhshin@meta.com >
2025-11-10 23:13:36 +00:00
Robert Shaw
30700b1cd7
[CI] Fix Plugin Tests Tests ( #28413 )
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com >
2025-11-10 22:36:11 +00:00
Andrew Xia
4b94ed8f92
[Frontend][2/n] remove empty content from _parse_tool_calls_from_content ( #28331 )
Signed-off-by: Andrew Xia <axia@fb.com >
Co-authored-by: Andrew Xia <axia@fb.com >
2025-11-10 14:07:49 -08:00
Lucas Wilkinson
6dec9f6109
[BugFix] Fix DeepGEMM over-allocating workspace ( #28254 )
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-11-10 17:01:17 -05:00
Wei Wei
bf6a3d0ff5
[Misc] Add more scoping for improved trace ( #28329 )
Signed-off-by: Wei Wei <wwei6@meta.com >
2025-11-10 21:03:21 +00:00
Sage Moore
40d33264c6
[Bugfix][EPLB] Disabled shared expert overlap when EPLB is enabled ( #28377 )
Signed-off-by: Sage Moore <sage@neuralmagic.com >
Signed-off-by: Sage Moore <sagemoore@utexas.edu >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-11-10 20:39:19 +00:00
Jonas M. Kübler
9c84ca8293
[FA/Chore] Bump FA version for FP8 two-level accumulation ( #27889 )
Signed-off-by: Jonas Kuebler <kuebj@amazon.com >
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
2025-11-10 12:06:04 -08:00
Rémi Delacourt
6d54336ae5
[Bugfix] Fix llguidance backend, rollback when EOS was encountered ( #25905 )
Signed-off-by: Rémi Delacourt <remi@mistral.ai >
Signed-off-by: remi <remi@mistral.ai >
Co-authored-by: Russell Bryant <rbryant@redhat.com >
2025-11-10 14:53:32 -05:00
jiahanc
34553b9d27
[Performance] Support FP8 flashinfer TRTLLM MOE on Qwen3 and Qwen-3next ( #27492 )
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com >
2025-11-10 12:34:57 -05:00
Varun Sundar Rabindranath
b039bfda8f
[Bugfix] Fix persistent_masked_m_silu_mul_quant tests ( #28366 )
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-11-10 09:21:52 -08:00
Cyrus Leung
d0e186c16f
[V0 Deprecation] Remove unused context_len and seq_len from M-RoPE ( #28395 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-11 00:30:06 +08:00
vllmellm
f080a83511
[RFC][ROCm][AITER] Keep all AITER kernels in _aiter_ops class like _custom_ops and _ipex_ops ( #24490 )
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-11-10 08:20:53 -08:00
caozuoba
40e2eeeb92
[Kernel] Optimization of the mm_k operator. ( #28280 )
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-10 16:03:46 +00:00
zejunchen-zejun
b06b9470ca
[Rocm][fused_moe][fp4] view weight to torch.float4_e2m1fn_x2 when running aiter fused moe for fp4 model ( #27474 )
Signed-off-by: zejunchen-zejun <zejun.chen@amd.com >
2025-11-10 10:38:56 -05:00
TJian
4673e465ff
Add @tjtanaa to codeowner for ROCm and multi-modal ( #28360 )
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-11-10 21:39:17 +08:00
Ferrebo
912744d066
[Fix] optimize visual token mask with caching and multi-token support ( #28374 )
Signed-off-by: Ferrebo <itachi971009@gmail.com >
Signed-off-by: kebo01 <kebo01@baidu.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-10 13:23:49 +00:00
Yu Jiaqi
15be507c86
[bugfix] fix siglip batch text output error ( #28365 )
Signed-off-by: piood <2477084691@qq.com >
2025-11-10 21:21:15 +08:00
Mark McLoughlin
6f7de33bed
[Metrics] Refactor LoRA state tracking ( #26801 )
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-11-10 16:34:36 +08:00
Shinichi Hemmi
a98cc35c34
Restore PlaMo2 unit test as pfnet/plamo-2-1b now supports transformers >=4.56 ( #28019 )
Signed-off-by: Shinichi Hemmi <50256998+Alnusjaponica@users.noreply.github.com >
2025-11-10 06:50:02 +00:00
Lucas Wilkinson
e8697faf03
[V0 deprecation] Remove no longer used get_metadata_cls ( #28370 )
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-11-10 14:32:09 +08:00
Xiake Sun
03fa4d3fb3
[Hardware][AMD][Model] Add Triton MoE tuning support and optimized configs for Qwen3 omni for MI308X ( #28373 )
Signed-off-by: Xiake Sun <xiake.sun@amd.com >
Signed-off-by: Xiake Sun <xisun@amd.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-10 04:53:40 +00:00
Varun Sundar Rabindranath
6b2b9fd934
[CI] lora/test_mixtral.py : Add additional expected outputs due to flakiness ( #28322 )
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-11-10 10:45:29 +08:00
JartX
c5f685b3ae
[ROCm][Platform] Add RX7900XTX device id in _ROCM_DEVICE_ID_NAME_MAP ( #28279 )
Signed-off-by: JartX <sagformas@epdcenter.es >
2025-11-09 23:09:36 +00:00
Jiangyun Zhu
c4768dcf47
[Kernel] Fix fused_gdn_gating ( #28343 )
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2025-11-09 14:26:35 -07:00
Zhewen Li
a65a934ebe
[CI/Build] Temporary fix to LM Eval Small Models ( #28324 )
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-11-09 21:08:38 +00:00
usberkeley
4a8d6bd168
Fix cu_num_generated_tokens slicing logic in LogprobsLists.slice() method ( #28214 )
Signed-off-by: Bradley <bradley.b.pitt@gmail.com >
2025-11-09 19:11:46 +00:00
Lucas Wilkinson
636efd10a5
[Core] Separate out attention metadata building logic from prepare inputs ( #26764 )
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-11-09 13:51:43 -05:00
Nick Hill
289eb6c537
[Core] Simplify async KV output aggregation ( #28327 )
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-09 09:44:13 -08:00
Nicolò Lucchesi
19d91ece4b
[CI] Fix flaky test_eagle_correctness test ( #28364 )
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-11-09 16:04:59 +00:00
Jiangyun Zhu
7ae5a5fb11
[Misc] Add some comments in qwen3-next ( #28267 )
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2025-11-08 23:59:24 -08:00
Yong Hoon Shin
de2b78305f
[ROCm] Add env to enable/disable aiter triton gemm ( #28321 )
Signed-off-by: Yong Hoon Shin <yhshin@meta.com >
2025-11-08 22:27:00 -08:00
Ning Xie
e5e9067e61
[Misc] fix typo and add detailed log ( #28178 )
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2025-11-09 05:33:46 +00:00
yihong
3a7d580343
fix: close issue 28338 by fixed python version ( #28339 )
Signed-off-by: yihong0618 <zouzou0208@gmail.com >
2025-11-09 05:07:26 +00:00
Kevin H. Luu
05f8d69077
[chore] Move some wikimedia images to S3 ( #28351 )
Signed-off-by: Kevin H. Luu <khluu000@gmail.com >
2025-11-09 01:58:26 +00:00
Mohammad Miadh Angkad
404d7a9d14
[Performance][gpt-oss] Revert gpt-oss max cudagraph size to 1024 ( #28345 )
Signed-off-by: Mohammad Miadh Angkad <MAngkad.BSDSBA2027@aim.edu >
2025-11-08 15:50:10 -07:00
ElizaWszola
171133f929
[Bugfix] Fix test fused quant layernorm tests ( #27865 )
Signed-off-by: ElizaWszola <ewszola@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Co-authored-by: yewentao256 <zhyanwentao@126.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-11-08 14:31:33 -08:00
Cole Murray
32787d0644
Remove setuptools upper bound constraint (<80) ( #28337 )
Signed-off-by: Cole Murray <colemurray.cs@gmail.com >
2025-11-08 22:30:18 +00:00
Benjamin Chislett
975676d174
[Feat] Drop-in Torch CUDA Profiler ( #27841 )
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
2025-11-08 14:07:37 -08:00
Ev Lacey
77d702a22b
Enhance run_cluster.sh for multi-NIC support ( #28328 )
Signed-off-by: Ev Lacey <elacey@nvidia.com >
2025-11-08 22:04:16 +00:00
zhangsicheng5
2108a571d7
[DCP] Support dcp kv_cache interleave size > 1 ( #26696 )
Signed-off-by: zhangsicheng5 <zhangsicheng5@huawei.com >
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com >
Signed-off-by: Qiu <qiuchunshuo@huawei.com >
Co-authored-by: QiuChunshuo <qiuchunshuo@huawei.com >
2025-11-09 04:45:27 +09:00
Andy Lo
47604137a2
[Bugfix] Spec decode + structured output + spec model max len edge case ( #28298 )
Signed-off-by: Andy Lo <andy@mistral.ai >
2025-11-08 19:44:25 +00:00
Robert Shaw
26990d25dc
[Bugfix] Update device name for H200 detection ( #28349 )
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2025-11-08 19:01:11 +00:00
Harry Mellor
d9ab1ad9d1
reasoning_content -> reasoning (#27752 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-08 12:15:08 +00:00
22quinn
608bb14462
[Attention] Remove max cudagraph size limit of 992 ( #27840 )
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
2025-11-07 22:33:27 -08:00
Xiaozhu Meng
4a36681f85
[flashinfer][fix] do not check nvcc availability when using pre-downloaded cubins ( #27990 )
Signed-off-by: Xiaozhu <mxz297@gmail.com >
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com >
2025-11-07 22:25:21 -08:00
Abolfazl Shahbazi
d15afc1fd0
Refactor CPU/GPU extension targets for CMake build ( #28026 )
Signed-off-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com >
2025-11-08 14:17:35 +08:00
Isotr0py
934a9c3b79
[Model] Consolidate Deepseek-MoE implementation with DeepSeek-v2 ( #28101 )
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com >
2025-11-08 05:01:27 +00:00
gnovack
70af44fd10
[bugfix] support eagle with lora cudagraph specialization ( #28318 )
Signed-off-by: gnovack <gnovack@amazon.com >
2025-11-08 03:25:45 +00:00
Aurick Qiao
781f5ebf52
Bump arctic-inference requirement ( #28174 )
Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-11-07 18:31:18 -08:00
Michael Goin
0852527647
[Perf][DeepSeek] Add sigmoid+bias fusion to fused_grouped_topk from TRTLLM ( #28124 )
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-11-07 18:20:55 -08:00
Hamid Mukhtar
61d25dc44b
Update gpu.rocm.inc.md to add support for AMD Ryzen AI MAX / AI 300 Series (gfx1151, gfx1150) ( #28308 )
Signed-off-by: Hamid Mukhtar <15519013+hammmmy@users.noreply.github.com >
2025-11-08 02:09:21 +00:00
Xiaohong (Sean) Chen
d0c7792004
[Bugfix][LoRA][Spec Decode] Support LoRA with speculative decoding ( #21068 )
Signed-off-by: Sean Chen <xiaohong_chen1991@hotmail.com >
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
Co-authored-by: Danielle Robinson <dcmaddix@gmail.com >
Co-authored-by: Haipeng Li <li2haipeng@gmail.com >
Co-authored-by: li2haipeng <44383182+li2haipeng@users.noreply.github.com >
2025-11-08 01:58:22 +00:00
Boyuan Feng
b158df2813
remove resolve_op_overloads and use splitting_ops directly ( #28081 )
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2025-11-08 01:13:13 +00:00
Kunshang Ji
1aaecda078
[XPU] Enable Expert parallel for MoE models ( #28263 )
Signed-off-by: Yan Ma <yan.ma@intel.com >
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2025-11-08 00:33:11 +00:00
Harry Mellor
811df41ee9
Update Flashinfer from v0.4.1 to v0.5.2 ( #27952 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-07 16:24:42 -08:00
Nick Hill
67a2da890e
[PerfFix] Avoid separate thread for MP executor shm spin (take 2) ( #28319 )
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-07 22:11:03 +00:00
Nick Hill
da786e339e
[Core] Rework handling of async scheduling config ( #28250 )
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-07 20:01:23 +00:00
Benjamin Chislett
18903216f5
[Bugfix] Fix and add tests for GptOss reasoning parser ( #28000 )
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
2025-11-07 19:28:04 +00:00
Simon Mo
d0ceb38ae8
[Build] Fix release pipeline failing annotation ( #28272 )
Signed-off-by: simon-mo <simon.mo@hey.com >
Signed-off-by: Simon Mo <simon.mo@hey.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-07 10:06:45 -08:00
youkaichao
155ad56d7b
[doc] add guide about the provided PTX was compiled with an unsupported toolchain ( #28305 )
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-11-08 00:26:34 +08:00
Fadi Arafeh
5fb4137c99
[README] Add Arm CPUs to the list of supported targets ( #28290 )
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
2025-11-07 15:41:47 +00:00
Nicolò Lucchesi
68a72a5cc1
Revert "[PerfFix] Avoid separate thread for MP executor shm spin ( #28012 )" ( #28289 )
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-11-07 15:07:01 +00:00
Boyuan Feng
0f872b7977
[Log] update shm wait time msg ( #28255 )
2025-11-07 09:43:30 -05:00
Wentao Ye
4b1ff13221
[Feature] Default ignore_eos True for random dataset ( #28227 )
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-11-07 07:35:33 -05:00
Iceber Gu
e0d6b4a867
[CLI] add --max-tokens to vllm complete ( #28109 )
Signed-off-by: Iceber Gu <caiwei95@hotmail.com >
2025-11-07 12:21:40 +00:00
Pavani Majety
72b1c2ae2c
[Bugfix] Use latency MOE backend as default for Flashinfer and other misc fixes ( #27439 )
Signed-off-by: Pavani Majety <pmajety@nvidia.com >
2025-11-07 04:18:39 -08:00
Lukas Geiger
e0919f331d
[Core][MM] Add mechanism to configure multimodal fields which should stay on CPU ( #28168 )
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-11-07 12:14:29 +00:00
Kevin H. Luu
8e19d470af
[fix] Revert "fixing mm placeholder replacement issue with gemma3" ( #28285 )
Signed-off-by: Kevin H. Luu <khluu000@gmail.com >
2025-11-07 12:09:09 +00:00
Mengqing Cao
1958bda9b4
[Misc][Model][Refactor] Pass the prefix into Linear layers ( #28259 )
Signed-off-by: MengqingCao <cmq0113@163.com >
2025-11-07 19:38:38 +08:00
Zhang Xiangze
7bdb42b2f2
[CPU]Avoid repeated random sample compile ( #28260 )
Signed-off-by: Zhang Xiangze <Xiangze.Zhang@arm.com >
2025-11-07 11:03:57 +00:00
汪志鹏
315068eb4a
[FixBug]Aeala/ShareGPT_Vicuna_unfiltered marked as multimodal benchmark ( #28265 )
Signed-off-by: princepride <wangzhipeng628@gmail.com >
2025-11-07 09:35:22 +00:00
Jialin Ouyang
ccd98b59c1
[Perf] Introduce FlattenLogprobs to store logprobs results to reduce GC overhead ( #28171 )
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-11-07 00:27:12 -08:00
Jee Jee Li
21b82f4ea2
[Kernel] LoRA triton kernels support PDL ( #27402 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-07 08:05:48 +00:00
Copilot
a736e5ff77
[CI] Reduce Blackwell Fusion test runtime by filtering tests and only run all tests in nightly ( #28074 )
2025-11-07 15:58:16 +08:00
baonudesifeizhai
9da9208b20
[Bug] Fix missing token_ids for reasoning parser models in chat completions #28246 ( #28256 )
2025-11-07 07:31:58 +00:00
smit kadvani
11fd69dd54
[amd][gptoss] Perf gain because of block alignment ( #28024 )
Signed-off-by: Smit Kadvani <smit.kadvani@gmail.com >
Co-authored-by: Smit Shaileshbhai Kadvani <kadvani@meta.com >
2025-11-07 05:27:42 +00:00
Harry Mellor
c0a4b95d64
Fix issues from #28242 ( #28257 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-07 04:23:17 +00:00
Alexis MacAskill
a47d94f18c
Add runai model streamer e2e test for GCS ( #28079 )
Signed-off-by: Alexis MacAskill <amacaskill@google.com >
2025-11-07 03:07:54 +00:00
Alex Brooks
e70fbc599b
[CI/Build] Loosen STT LoRA Translate Check (Flaky Test) ( #28247 )
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
Signed-off-by: Alex Brooks <alex.brooks@ibm.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-11-07 02:51:27 +00:00
Lucas Kabela
4bf56c79cc
[Multimodal][torch.compile] Add compilation config field for turning off ViT/MM compile ( #28242 )
Signed-off-by: Lucas Kabela <lucaskabela@meta.com >
2025-11-07 00:16:03 +00:00
Junhong Liu
59b453eaa2
Speed up mm processor kwargs per request by spliting dynamic and static kwargs ( #26483 )
Signed-off-by: Junhong <liujunhong11@huawei.com >
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com >
Co-authored-by: Junhong <liujunhong11@huawei.com >
2025-11-07 07:51:28 +08:00
Eugene Khvedchenya
827e4237bc
Fix failing test for CRadio ( #27738 )
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com >
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: wang.yuqi <noooop@126.com >
2025-11-06 15:32:25 -08:00
Varun Sundar Rabindranath
ca6f755d24
[BugFix] Fix FusedMoELoRA + ModularKernel Integration ( #28237 )
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-11-06 22:53:30 +00:00
Matthew Bonanni
ca90f50304
[Test] Add non-MoE DP test coverage ( #28235 )
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-11-06 20:59:57 +00:00
Fang Han
da855b42d2
[Doc]: Make extraInit containers fully configurable in helm chart ( #27497 )
Signed-off-by: Fang Han <fhan0520@gmail.com >
2025-11-06 20:27:16 +00:00
Aleksandr Malyshev
449de9001a
[ROCm] triton fp8 kernel ( #27058 )
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com >
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com >
Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com >
2025-11-06 14:46:44 -05:00
Vico Chu
d4aa65c998
[Chore] eliminate duplicated and unconditional object serialization in anthropic messages api ( #27792 )
Signed-off-by: Vico Chu <vico24826@gmail.com >
2025-11-06 19:09:19 +00:00
Julien Denize
7a8375f8a0
Add llama 4 scaling support ( #28145 )
Signed-off-by: Julien Denize <julien.denize@mistral.ai >
2025-11-06 18:55:17 +00:00
Andy Lo
5e0c1fe69c
[Structured outputs] Upgrade llguidance to 1.3.0 ( #28039 )
Signed-off-by: Andy Lo <andy@mistral.ai >
Co-authored-by: Russell Bryant <rbryant@redhat.com >
2025-11-06 10:24:47 -08:00
Russell Bryant
4507a6dae4
CODEOWNERS: Add myself as reviewer on security docs ( #28216 )
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-11-06 17:39:42 +00:00
Roy Wang
d1dd5f53e4
[Frontend] Fix logging format when enable response logging ( #28049 )
Signed-off-by: esmeetu <jasonailu87@gmail.com >
2025-11-06 16:25:39 +00:00
StanHatko
e52e4da971
[HARDWARE][CPU] Add Option for Disabling Binding to Specific CPU Cores ( #27953 )
Signed-off-by: Stan Hatko <stan_hatko@live.com >
Co-authored-by: Li, Jiang <jiang1.li@intel.com >
2025-11-06 23:47:11 +08:00
Milos Puzovic
2176778cd3
[Doc] Add Arm CPUs are on the list of supported targets in vLLM ( #26018 )
Signed-off-by: Milos Puzovic <milos.puzovic@arm.com >
2025-11-06 15:30:26 +00:00
Eric Yue
0370679ce9
[Kernel][Model] Tune fused_moe Triton configs for MiniMax-M2 on H100 ( #28200 )
Signed-off-by: minatoaquaMK2 <jiacheng.yue@foxmail.com >
2025-11-06 07:29:46 -08:00
Harry Mellor
8816e375d3
[Docs] Switch to directory style URLs ( #28058 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-06 07:06:33 -08:00
Michael Goin
f32229293e
Disable nm-testing models with issues in CI ( #28206 )
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-11-06 06:19:07 -08:00
xiangze-arm
c757a15f0f
[CPU]Improve cpu fused moe perf ( #27244 )
Signed-off-by: Zhang Xiangze <Xiangze.Zhang@arm.com >
2025-11-06 11:04:18 +00:00
Chauncey
59a50afa08
[Frontend] OpenAI Responses API supports Tool/Function calling - non-harmony ( #26874 )
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-11-06 10:40:03 +00:00
courage17340
981cadb35c
[Bugfix][Kernel] fix merge attn states when both prefix and suffix are empty ( #28181 )
Signed-off-by: courage17340 <courage17340@163.com >
2025-11-06 17:52:13 +08:00
wangxiyuan
c3ee80a01a
[V0 deprecation]clean up is_v1_supported_oracle ( #28116 )
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-11-06 16:05:32 +08:00
Aditya Tewari
3755c14532
[CPU] Enable torch profiling ( #28130 )
Signed-off-by: Aditya Tewari <aditya.tewari@arm.com >
2025-11-06 07:32:05 +00:00
Seungduk Kim
201dc98acc
Fix hard-coded parameter name in gemma3n.py ( #27946 )
Signed-off-by: Seungduk Kim <seungduk.kim@yanolja.com >
Signed-off-by: Biswa Panda <biswa.panda@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Biswa Panda <biswa.panda@gmail.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2025-11-05 23:07:36 -08:00
Julien Denize
a404e2c0f1
Patch Mistral Tokenizer ( #28146 )
Signed-off-by: Julien Denize <julien.denize@mistral.ai >
2025-11-06 06:43:16 +00:00
Xiaozhu Meng
e31946f86e
[flashinfer] fix FI all2all with FI cutlass moe ( #28166 )
Signed-off-by: Xiaozhu <mxz297@gmail.com >
2025-11-06 05:52:16 +00:00
gmagogsfm
bde5039325
[CI] Add compile/test_multimodal_compile.py to CI ( #28151 )
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-11-06 05:41:47 +00:00
Jacob Zhong
d72299d47b
Make the cv2 dependency optional ( #27780 )
Signed-off-by: Jacob <cmpute@qq.com >
2025-11-06 05:08:55 +00:00
Lukas Geiger
80679f108f
[Core][MM] Use non-blocking CPU-GPU copy of multimodal data ( #28141 )
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-11-06 04:05:12 +00:00
Isotr0py
43ecd0a900
[Chore] Clean up deepseek v2/v3 config copy ( #28055 )
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-11-06 03:46:30 +00:00
Chauncey
07d614511f
[Misc] Remove the duplicate code ( #28111 )
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-11-05 21:07:47 -05:00
Vadim Gimpelson
f948ab6945
[CI Failure] nm-testing/Qwen2-0.5B-Instruct-FP8-SkipQKV was removed from HF. Skip it in tests ( #28170 )
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2025-11-06 01:22:13 +00:00
Wentao Ye
d71af5f502
[Feature] Enable TP + EP shared_experts overlap with router, 3.7% E2E performance improvement ( #28164 )
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-11-05 17:21:08 -08:00
Wentao Ye
90189c71a9
[Bug] Fix env string "0" same to True ( #28159 )
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-11-05 17:04:20 -08:00
Wentao Ye
d79d9f0780
[Bug] Fix cpu disable shared_experts VLLM_DISABLE_SHARED_EXPERTS_STREAM ( #28157 )
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-11-05 17:03:09 -08:00
Vadim Gimpelson
b6a248bdd7
[PERF] Decouple projections from GDN custom op. Attempt 2 ( #28083 )
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2025-11-05 17:01:12 -08:00
Dayeol Lee
1767658559
[Debugging] Add annotation for easier trace analysis ( #22496 )
2025-11-05 16:52:52 -08:00
Kuntai Du
efe73e9b57
[Core][Hybrid allocator + connector 2/n] Unify remove_skipped_blocks by get_last_useful_token ( #25431 )
Signed-off-by: KuntaiDu <kuntai@uchicago.edu >
2025-11-06 00:12:00 +00:00
Zhewen Li
0b8e871e5e
[CI/Build] Fix test_defaults_with_usage_context in AMD CI ( #27926 )
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-11-05 15:40:24 -08:00
Zhewen Li
5ee93a5956
[CI/Build] Update checking logic in cutlass_group_gemm_supported ( #27948 )
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-11-05 15:40:10 -08:00
Snehlata
e15601789b
[Feature]: Add corrupted request metric to V1 metrics system. ( #27306 )
Signed-off-by: atalhens <sneh.lata@nutanix.com >
2025-11-05 13:45:29 -08:00
Richard Zou
65ac8d8dc4
[Docs] Add guide to debugging vLLM-torch.compile integration ( #28094 )
Signed-off-by: Richard Zou <zou3519@gmail.com >
2025-11-05 21:31:46 +00:00
Isotr0py
ffb08379d8
[Chore] Remove Nemotron-Nano-VL config copy ( #28126 )
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-11-05 20:06:45 +00:00
R3hankhan
e04492449e
[Hardware][IBM Z] Optimize s390x Dockerfile ( #28023 )
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com >
2025-11-05 11:25:44 -08:00
Michael Yao
518ec6b722
[Docs] Clean up README_TUNING.md ( #28088 )
Signed-off-by: windsonsea <haifeng.yao@daocloud.io >
2025-11-05 19:01:34 +00:00
wang.yuqi
802748bddb
[Bugfix] Fix Qwen3-Reranker-8B load ( #28117 )
Signed-off-by: wang.yuqi <noooop@126.com >
2025-11-05 18:33:50 +00:00
Paul Zhang
faedbb4d4f
[Feature] Extend batch invariant torch.compile to B200 ( #27856 )
Signed-off-by: PaulZhang12 <paulzhan@fb.com >
2025-11-05 10:04:49 -08:00
Samuel Shen
40db194446
[CI]: Add LMCacheConnector Unit Tests ( #27852 )
Signed-off-by: Samuel Shen <slshen@uchciago.edu >
Co-authored-by: Samuel Shen <slshen@uchciago.edu >
Co-authored-by: Yihua Cheng <yihua98@uchicago.edu >
2025-11-05 09:45:57 -08:00
Chen Zhang
c765f0b443
[FlashInfer] Avoid FlashInfer block_size 16 + head_size 256 on blackwell ( #27994 )
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-11-05 09:25:32 -08:00
gmagogsfm
002b07c4b2
[Bugfix] vLLM should check Inductor config for compile cache enablement status ( #27637 )
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com >
2025-11-05 12:22:44 -05:00
Walter Beller-Morales
752ddeacaa
[Core] add support for reasoning parser plugins ( #28075 )
Signed-off-by: walter beller-morales <walter.beller.morales@gmail.com >
2025-11-06 01:15:06 +08:00
Jiangyun Zhu
c18f88c6ca
[Kernel] Fuse computation of g and beta for Gated Delta Net ( #28095 )
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2025-11-05 09:14:55 -08:00
Jiaju Zhang
6fd0df8132
[misc] add vLLM Beijing Meetup ( #28127 )
Signed-off-by: Jiaju Zhang <jjzhang@redhat.com >
2025-11-05 17:12:59 +00:00
Isotr0py
3f5a4b6473
[Bugfix] Validate custom logits processor xargs for online serving ( #27560 )
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-11-05 16:53:33 +00:00
Pleaplusone
6cae1e5332
[ROCm][MLA] Support block-size > 1 for AITER MLA backend ( #27224 )
Signed-off-by: ganyi <ygan@amd.com >
Co-authored-by: wuhuikx <hattie.wu@amd.com >
2025-11-05 10:43:02 -05:00
Alexei-V-Ivanov-AMD
80c9275348
Enabling cooperative multi-gpu tests on multi-gpu nodes ( #27986 )
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
2025-11-05 10:35:49 -05:00
Ilya Markov
e50c454672
[BugFix] Support EP/DP + EPLB with MTP ( #25311 )
Signed-off-by: ilmarkov <markovilya197@gmail.com >
Signed-off-by: Sage Moore <sage@neuralmagic.com >
Co-authored-by: Sage Moore <sage@neuralmagic.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
2025-11-05 15:22:17 +00:00
Chen Zhang
5d16d0fa62
[DCP] check return_lse for all layers in dcp ( #27929 )
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-11-05 22:27:25 +08:00
bigmoyan
0606bea2b6
add kimi reasoning parser ( #28128 )
Signed-off-by: wangzhengtao <wangzhengtao@msh.team >
Co-authored-by: wangzhengtao <wangzhengtao@msh.team >
2025-11-05 21:48:33 +08:00
Frost Mitchell
6e97eccf5d
[XPU] Enable custom routing functions in IPEX for Llama4 ( #28004 )
Signed-off-by: frost-intel <frost.mitchell@intel.com >
2025-11-05 13:39:57 +00:00
Boyuan Feng
6ab183813c
[Graph Partition][Cache] Use inductor partition ops config ( #27702 )
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2025-11-05 13:04:48 +00:00
amirkl94
6b7a81185d
Bugfix: Cutlass FP8 FusedMoE bad scaling factors ( #27255 )
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-11-05 06:06:06 -05:00
Eric Yue
b57789b62b
Fix excessive logging noise by reducing the log level of the MinimaxM2ToolParser import success message ( #27635 )
Signed-off-by: minatoaquaMK2 <jiacheng.yue@foxmail.com >
2025-11-05 19:03:51 +08:00
Chauncey
377061d481
[Misc] fix import error for DeepSeekR1ReasoningParser ( #28114 )
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-11-05 19:02:32 +08:00
Kuntai Du
86dca07d9b
[Hybrid allocator + kv connector] revert connector test changes related to hybrid allocator ( #28011 )
Signed-off-by: KuntaiDu <kuntai@uchicago.edu >
2025-11-05 10:36:31 +00:00
Qiu
16b37f3119
[bugfix] fix wrong dcp_local_seq_lens calc ( #27518 )
Signed-off-by: Qiu <qiuchunshuo@huawei.com >
2025-11-05 17:58:13 +08:00
Chauncey
0976711f3b
[Refactor] to simplify and extract the shared logic between chat completion and responses ( #27961 )
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-11-05 15:46:39 +08:00
Chauncey
e261d37c9a
[Refactor] Lazy-loaded reasoning_parser ( #28092 )
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-11-05 15:37:02 +08:00
Alex Brooks
b7cbc25416
[Model, Core] Support Granite Speech & LoRA for STT ( #24455 )
2025-11-05 08:33:48 +01:00
Lucas Wilkinson
d43ad5a757
[BugFix] Fix DCP Assert (AssertionError: DCP not support reorder_batch_threshold > 1 now.) ( #28100 )
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-11-05 14:54:43 +08:00
Isotr0py
0ff05e3770
[Bugfix] Fix encoder-only model support for transformers backend ( #28021 )
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-04 22:24:41 -08:00
wangxiyuan
428bc7bf1c
[V0 deprecation] Remove VLLM_USE_V1 usage in most modules ( #27955 )
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-11-04 20:51:16 -08:00
Zhewen Li
878fd5a16f
[CI/Build] Enable some fixed tests in AMD CI ( #28078 )
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-11-05 03:15:59 +00:00
Kunshang Ji
18b39828d9
[XPU] Add gpt-oss model support for Intel GPU ( #27786 )
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2025-11-05 02:17:23 +00:00
tou
4ea62b77f5
[Qwen3-Next] MOE configs for A100-SXM4-80GB TP4 TP8 ( #27740 )
2025-11-05 09:25:09 +08:00
Vadim Gimpelson
d4e547bb7e
Revert "[PERF] Decouple projections from GDN custom op" ( #28080 )
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2025-11-04 15:58:23 -08:00
Aleksandr Malyshev
2d977a7a9e
[ROCm] gemm_a16w16 upstreaming ( #26969 )
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com >
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com >
2025-11-04 16:01:00 -05:00
Chenheli Hua
1fb4217a05
[Multimodal] Make MediaConnector extensible. ( #27759 )
Signed-off-by: Chenheli Hua <huachenheli@outlook.com >
2025-11-04 18:28:01 +00:00
nadavkluger
611c86ea3c
Added disable rule to track files under benchmarks/lib ( #28048 )
Signed-off-by: Nadav Kluger <nadav.k@fmr.ai >
2025-11-04 18:18:43 +00:00
Pleaplusone
dc937175d4
[ROCm][Perf] New design on ROCm AITER MHA backend Implementation ( #25763 )
Signed-off-by: ganyi <ygan@amd.com >
2025-11-04 18:05:33 +00:00
Harry Mellor
2f1cc8cef1
Remove deprecated --rope-scaling and --rope-theta ( #28006 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-04 18:01:56 +00:00
Nick Hill
938a81692e
[AsyncScheduling] Don't schedule past request max_tokens ( #27922 )
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-04 17:06:28 +00:00
Nick Hill
c9f66da8fd
[PerfFix] Avoid separate thread for MP executor shm spin ( #28012 )
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-04 08:33:55 -08:00
yt0428
05cae69f0f
[model] Add support for openPangu_Ultra_MoE ( #27521 )
Signed-off-by: yuantao <2422264527@qq.com >
Signed-off-by: yt0428 <51468697+yt0428@users.noreply.github.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-04 08:17:20 -08:00
Vadim Gimpelson
5fd8f02ea9
[PERF] Decouple projections from GDN custom op ( #27512 )
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2025-11-04 08:11:41 -08:00
lyrisz
97e3dda84b
[Perf] SM100 - add swap AB optimization to CUTLASS FP8 GEMM ( #27284 )
Signed-off-by: Faqin Zhong <faqin.zhong@gmail.com >
Co-authored-by: Faqin Zhong <zhofaqin@amazon.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-11-04 07:49:25 -08:00
Nick Hill
5a0a6dfd55
[BugFix] Fix incorrect preallocated sampled_token_ids tensor size ( #28025 )
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-04 07:38:16 -08:00
bnellnm
938772af03
[Kernels] Isolate modular kernel code from FusedMoEMethodBase subclasses. ( #27123 )
2025-11-04 21:59:45 +08:00
tomeras91
e4ee658672
[Model] add optimal triton fused moe configs for NemotronH MoE ( #27967 )
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com >
2025-11-04 12:59:43 +00:00
tomeras91
77f8001f53
[Model][Bugfix] fix pipeline parallelism support for NemotronH ( #27968 )
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com >
2025-11-04 12:28:36 +00:00
Zhuohan Li
300a265978
[Core] Enable StatLogger in LLMEngine ( #28020 )
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com >
2025-11-04 04:13:35 -08:00
Jerry Zhang
03c4c4aa9d
Support using Int4PreshuffledTensor after loading ( #26066 )
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com >
2025-11-04 06:00:57 -05:00
yugong333
2ec401bc39
Load tuned fused_moe_lora shrink and expand kernel configs separately ( #27435 )
Signed-off-by: Yu Gong <yu3.gong@gmail.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-04 18:27:35 +08:00
Varun Sundar Rabindranath
4022a9d279
[BugFix][Performance] Restore flashinfer autotuning for all scenarios ( #27904 )
2025-11-04 15:56:21 +08:00
Zhewen Li
53f6e81dfd
[CI/Build] Fix OpenAI API correctness on AMD CI ( #28022 )
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-11-04 07:20:50 +00:00
CSWYF3634076
43a6acfb7d
[Model] fix ernie45 reasoning_parser ( #27973 )
Signed-off-by: wangyafeng <wangyafeng@baidu.com >
2025-11-04 07:16:46 +00:00
Mark McLoughlin
58279c60b5
[KV Connector] Make KVCacheConfig an explicit constructor argument ( #27887 )
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-11-03 23:00:49 -08:00
Zhewen Li
2f84ae1f27
[CI/Build] Update LM Eval Version in AMD CI ( #27944 )
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-11-04 06:36:40 +00:00
xiangze-arm
f32cbc9a0c
[CPU]Improve dynamic 4bit moe performance ( #27240 )
Signed-off-by: Zhang Xiangze <Xiangze.Zhang@arm.com >
2025-11-04 06:33:23 +00:00
Wentao Ye
7e4be74104
[Bug] Batch invariant: Fix flash attn MLA RuntimeError: scheduler_metadata must have shape (metadata_size) ( #27884 )
2025-11-04 14:05:55 +08:00
Mark McLoughlin
380ba6816d
[Metrics] Enable sleep state metric outside of dev mode ( #27867 )
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-11-03 20:35:36 -08:00
liuzhenwei
14a125a06d
[NIXL][XPU] Pin NIXL version to 0.7.0 ( #27849 )
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com >
2025-11-04 03:28:35 +00:00
Chauncey
c02fccdbd2
[Refactor] Lazy import tool_parser ( #27974 )
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-11-04 10:10:10 +08:00
li2haipeng
6ddae74054
[LoRA] Lora shrink swizzle ( #27694 )
Signed-off-by: li2haipeng <44383182+li2haipeng@users.noreply.github.com >
Signed-off-by: Haipeng Li <li2haipeng@gmail.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-04 09:30:20 +08:00
vllmellm
b13a447546
[Bugfix][ROCm] Fix ViT rotary embeddings for torch.compile compatibility on ROCm ( #27748 )
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2025-11-03 17:12:19 -08:00
QiliangCui
7956b0c0bc
Remove the tpu docker image nightly build. ( #27997 )
Signed-off-by: Qiliang Cui <derrhein@gmail.com >
2025-11-04 00:35:54 +00:00
Tyler Michael Smith
3758757377
[Bugfix] Fix MoE Routing Simulation ( #28002 )
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
2025-11-03 22:26:49 +00:00
Hank_
ccd3e55e51
[Bugfix][plugin] fla crash on plugin ( #27322 )
2025-11-04 05:27:03 +08:00
Matthew Bonanni
01baefe674
Add TP parameter to attention tests ( #27683 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-11-03 13:04:40 -08:00
Ning Xie
786030721e
[Docs] add runai_streamer_sharded to LoadConfig ( #27937 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2025-11-03 20:35:16 +00:00
Matthew Bonanni
145c00a4d3
[Bugfix] change FlashMLA reorder_batch_threshold ( #27777 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-11-03 15:17:10 -05:00
Lucas Kabela
55011aef24
[Bugfix][Qwen][Multimodal] Move Qwen2_5_vl sdpa to custom op and reenable compile ( #27764 )
...
Signed-off-by: Lucas Kabela <lucaskabela@meta.com >
2025-11-03 11:12:15 -08:00
Sophie du Couédic
a4398fbb5e
[Feature][Benchmarks] Support inf burstiness ( #26941 )
...
Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com >
2025-11-03 18:33:17 +00:00
Aurick Qiao
2c19d96777
[Spec Decode] Integrate Suffix Decoding from Arctic Inference ( #25784 )
...
Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com >
2025-11-03 09:23:31 -08:00
Lucas Wilkinson
4bc400f47e
[CI/Testing] Add basic single node dual batch overlap test ( #27235 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-11-03 17:00:46 +00:00
ahao-anyscale
cac4c10ef0
[BUG] Make 'binary' default option for saving torch compile artifacts when using standalone_compile ( #27616 )
...
Signed-off-by: ahao-anyscale <ahao@anyscale.com >
2025-11-03 11:13:51 -05:00
pwschuurman
f7d2946e99
[Bugfix] Skip gs:// model paths for speculator detection ( #27846 )
...
Signed-off-by: Peter Schuurman <psch@google.com >
2025-11-03 14:31:03 +00:00
gnovack
294c805f1d
Early exit for MoE LoRA kernels ( #27131 )
...
Signed-off-by: gnovack <gnovack@amazon.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-03 20:22:17 +08:00
zhang-prog
40b69e33e7
[Model] Add PaddleOCR-VL Model Support ( #27758 )
...
Signed-off-by: zhangyue <zhangyue66@baidu.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: zhangyue66 <zhangyue66@baidu.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-11-03 19:04:22 +08:00
Jee Jee Li
32257297dd
[CI/Build] Remove the flaky gpt-oss lora test ( #27966 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-03 16:50:06 +08:00
Misha Efimov
ba464e6ae2
Add ORCA endpoint load metrics support ( #24905 )
...
Signed-off-by: Misha Efimov <mef@google.com >
2025-11-03 08:21:31 +00:00
Kunshang Ji
7f4bdadb92
[XPU]Refine Dockerfile.xpu, avoid oneccl dependency issue ( #27964 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2025-11-03 07:36:59 +00:00
Rémi Delacourt
cec7c28833
[Bugfix] Padded Eagle Specdec with Chunked Prefill ( #26263 )
...
Signed-off-by: Rémi Delacourt <remi@mistral.ai >
Signed-off-by: Rémi Delacourt <54138269+Flechman@users.noreply.github.com >
Signed-off-by: remi <remi@mistral.ai >
Co-authored-by: Benjamin Chislett <bchislett@nvidia.com >
2025-11-03 02:22:46 -05:00
Thomas Parnell
18961c5ea6
[Hybrid] Pass kernel block size to builders ( #27753 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
2025-11-03 05:48:03 +00:00
Sungyoon Jeong
470ad118b6
[Frontend] Align finish_reason when tool is called with OpenAI ( #25054 )
...
Signed-off-by: Sungyoon Jeong <sungyoon.jeong@furiosa.ai >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
2025-11-03 04:21:18 +00:00
Biswa Panda
1bf43ae35d
[BugFix][LoRA] use adapter_id instead of id field of lora_request ( #27728 )
...
Signed-off-by: Biswa Panda <biswa.panda@gmail.com >
2025-11-03 10:08:08 +08:00
Vensen
0ce743f4e1
Fix(llm): Abort orphaned requests when llm.chat() batch fails Fixes #26081 ( #27420 )
...
Signed-off-by: vensenmu <vensenmu@gmail.com >
2025-11-02 16:24:01 +00:00
Cyrus Leung
6c317a656e
[Misc] Provide Siglip2 chat template ( #27939 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-02 13:42:38 +00:00
Asaf Joseph Gardin
00b31a36a2
[V1] [Hybrid] Mamba1 Automatic Prefix Caching ( #26377 )
...
Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com >
2025-11-02 04:16:23 -08:00
Julien Denize
73444b7b56
Performance fix MistralTokenizer: cache special ids and tokens ( #27925 )
...
Signed-off-by: Julien Denize <julien.denize@mistral.ai >
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com >
2025-11-02 08:48:33 +00:00
Cyrus Leung
853a8eb53b
[Bugfix] Fix Qwen Omni audio inference ( #27920 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-02 05:06:05 +00:00
Ben Browning
758ea2e980
[CI/Build] Fix flaky test_transcription_validation.py::test_basic_audio_gemma ( #27924 )
...
Signed-off-by: Ben Browning <bbrownin@redhat.com >
2025-11-02 03:45:02 +00:00
Yue Zhang
685c99ee77
[KV offload] Offloading connector async scheduling support ( #27648 )
...
Signed-off-by: KevinCheung2259 <2651309292@qq.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-11-01 21:08:56 +00:00
Benjamin Bartels
1e88fb751b
Adds anthropic /v1/messages endpoint to openai api_server ( #27882 )
...
Signed-off-by: bbartels <benjamin@bartels.dev >
Signed-off-by: Benjamin Bartels <benjamin@bartels.dev >
2025-11-01 12:45:42 -07:00
Nick Hill
c2ed069b32
[BugFix] Fix mixed penalties batch with async scheduling ( #27910 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-01 10:51:24 -07:00
wenxindongwork
af6e19f50f
[Core][TPU] Support TPU Data Parallalism ( #27365 )
...
Signed-off-by: wenxindongwork <wenxindong@google.com >
2025-11-01 17:14:44 +00:00
Cyrus Leung
99d69af9ec
[Bugfix] Python 3.10 compatibility for Self ( #27918 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-01 15:28:54 +00:00
Haco
d811b442d3
[Bugfix] DeepSeek V3.2 MTP metadata & CUDA graph issues ( #26779 )
...
Signed-off-by: xiaohajiayou <923390377@qq.com >
2025-11-01 10:52:43 -04:00
wangxiyuan
30a14b034f
[V0 deprecation] Remove VLLM_USE_V1 usage in platform and v1 module ( #27798 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-01 10:17:45 +00:00
Harry Mellor
799ce45cc1
[Docs] Mock all imports for docs ( #27873 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-01 10:02:23 +00:00
ai-jz
2c0c7c39bd
feat(benchmarks): support HF model names in multi-turn benchmark ( #27850 )
2025-11-01 08:04:52 +00:00
Yihua Cheng
e675118849
[Add] cmdline argument parsing for KV cache offloading modules ( #27621 )
...
Signed-off-by: ApostaC <yihua98@uchicago.edu >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-01 07:17:07 +00:00
TJian
e2347dbf58
[Bugfix] [Model] Missing MRoPE function definition from KeyeForConditionalGeneration ( #27895 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-11-01 13:45:23 +08:00
Cyrus Leung
879a06579e
[CI/Build] Bump transformers version ( #27528 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-31 22:11:07 -07:00
yugong333
29de3cdee4
Adding SplitK in fused_moe_lora kernel ( #27818 )
...
Signed-off-by: Yu Gong <yu3.gong@gmail.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-01 12:55:46 +08:00
Yan Ma
7e2729b57e
[Multimodal][XPU]Enable vision attn backend for xpu platform ( #27525 )
...
Signed-off-by: Yan Ma <yan.ma@intel.com >
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
Co-authored-by: Yejing Lai <yejing.lai@intel.com >
Co-authored-by: Guancheng Fu <110874468+gc-fu@users.noreply.github.com >
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com >
2025-11-01 04:45:02 +00:00
Jee Jee Li
3a5de7d2d6
[Bugfix] Fix KDA output ( #27905 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-01 11:54:36 +08:00
Jee Jee Li
bc4486d609
[Kernel] Enable FusedMoEModularKernel support bias ( #27754 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-01 02:05:12 +00:00
Nick Hill
0cdbe7b744
[Core] Async scheduling + structured outputs compatibility ( #26866 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-01 00:35:04 +00:00
Chen Zhang
df334868ca
[Hybrid] A simpler algorithm to find kernel_block_size ( #26476 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-10-31 21:30:28 +00:00
Bram Wasti
0e0a638c3b
Batch invariance doc ( #27839 )
...
Signed-off-by: Bram Wasti <bwasti@meta.com >
Signed-off-by: Bram Wasti <bwasti@fb.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-31 17:22:19 -04:00
Matthew Bonanni
f29aeb5a25
Add FLASHINFER_MLA to test_mla_backends and add B200 CI run ( #27663 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-10-31 11:12:19 -07:00
Vinay R Damodaran
5e8862e9e0
[Feature] Pydantic validation for scheduler.py and structured_outputs.py ( #26519 )
...
Signed-off-by: Vinay Damodaran <vrdn@hey.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-31 18:05:50 +00:00
Nick Hill
9e5bd3076e
[Cleanup] Remove no-longer-used SpeculativeConfig.enable_chunked_prefill ( #27826 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-31 10:57:45 -07:00
Shu Wang
fc16f1c477
Flashinfer_CUTLASS_MOE fuses quantization for TP ( #27223 )
...
Signed-off-by: Shu Wang. <shuw@nvidia.com >
2025-10-31 17:54:29 +00:00
ZiTian Zhao
bc306fe5e9
fix incorrect type annotation in KimiMLP ( #27885 )
...
Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com >
2025-10-31 17:38:02 +00:00
Chenguang Zheng
103a468bbf
[bugfix] Missing cached item in beam search ( #27874 )
...
Signed-off-by: fake0fan <645327136@qq.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-10-31 17:34:27 +00:00
Rob Mulla
70bfbd7b16
Docs update tpu install instructions ( #27824 )
...
Signed-off-by: Rob Mulla <rob.mulla@gmail.com >
Signed-off-by: Rob Mulla <RobMulla@users.noreply.github.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-31 10:29:55 -07:00
GuanLuo
d6517be3cd
[Bugfix] Missing NIXL metadata for handshake initialization if instance spans multi-node ( #26338 )
...
Signed-off-by: Guan Luo <gluo@nvidia.com >
Signed-off-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com >
Signed-off-by: Guan Luo <41310872+GuanLuo@users.noreply.github.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2025-10-31 10:16:00 -07:00
Isotr0py
7e06c40e63
[Bugfix] Fix broken MRoPE for GLM-4.1V/GLM-4.5V ( #27860 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-31 17:04:51 +00:00
Madeesh Kannan
675704ac01
[Bugfix] Allow 64-bit integer values for LoRA IDs to avoid overflow/truncation ( #27876 )
...
Signed-off-by: Madeesh Kannan <shadeMe@users.noreply.github.com >
2025-10-31 16:58:42 +00:00
Jee Jee Li
0384aa7150
[CI/Build] Add gpt-oss LoRA test ( #27870 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-31 22:17:21 +08:00
Jiangyun Zhu
3857eb8725
[Perf] Decouple torch op from GDA to leverage torch.compile ( #27871 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2025-10-31 21:35:52 +08:00
Huamin Li
933cdea440
[BugFix] Don’t compute reorder threshold when there are no attention groups ( #27861 )
2025-10-31 11:36:18 +00:00
Isotr0py
3933f18a5e
[Bugfix] Avoid too small block m/n for FlexAttention kernel option ( #27853 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-31 19:33:12 +08:00
toncao
e5ef4dfc11
[Kimi-Linear] Correct prefixes and add compatibility to AWQ quants ( #27834 )
...
Signed-off-by: toncao <cpatonn@gmail.com >
Co-authored-by: toncao <cpatonn@gmail.com >
2025-10-31 17:36:37 +08:00
Akash kaothalkar
36960501d3
[Hardware][Powerpc] Fix VLLM_CPU_OMP_THREADS_BIND="auto" low CPU utilization for Power ( #27734 )
...
Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com >
Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com >
2025-10-31 07:45:26 +00:00
Seiji Eicher
b2e65cb4a7
[benchmark] Make request IDs unique across clients by default ( #27723 )
...
Signed-off-by: Seiji Eicher <seiji@anyscale.com >
2025-10-30 17:40:35 -07:00
Wentao Ye
2bf0bcc1fc
[CI Test] Add Scheduled Integration Test ( #27765 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-30 17:29:26 -07:00
Jakub Sochacki
697f507a8e
[CI/Build][Intel] Enable performance benchmarks for Intel Gaudi 3 ( #26919 )
...
Signed-off-by: jakub-sochacki <jakub.sochacki@wp.pl >
2025-10-31 07:57:22 +08:00
Matthew Bonanni
d5d2a0fe74
[Misc] Make all tool scripts executable ( #27831 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-10-30 23:46:02 +00:00
Nick Hill
c9791f1813
[BugFix] Fix broken import in initialize_ray_cluster() ( #27838 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-30 16:26:13 -07:00
Paul Zhang
e7acb20076
[Feature] Batch invariant torch.compile ( #27660 )
...
Signed-off-by: PaulZhang12 <paulzhan@fb.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-10-30 13:11:29 -07:00
Jialin Ouyang
4b68c4a55b
[Core][Perf] Only invoke save_new_computed_blocks when computed blocks are not empty ( #27799 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-30 19:47:30 +00:00
Wentao Ye
a8141fa649
[Refactor] Remove VLLM_DEEPEP_LOW_LATENCY_ALLOW_NVLINK ( #27750 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-30 15:32:39 -04:00
Sumanth R Hegde
4917002523
[Fix] Skip record_sleep_state logic in PrometheusStatsLogger if not in dev mode ( #27789 )
...
Signed-off-by: SumanthRH <sumanthrh99@gmail.com >
2025-10-30 19:26:27 +00:00
cong-meta
a2981c4272
[EP/DP][API Server] Enable DP-aware routing in OpenAI API requests ( #24945 )
...
Co-authored-by: Cong Chen <prowindy@gmail.com >
2025-10-30 12:10:16 -07:00
Jialin Ouyang
4574d48bab
[Core][Bookkeeping] Update cu_num_accepted_tokens for all req_index ( #27629 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-30 11:52:36 -07:00
Tyler Michael Smith
ab98f6556f
[Bugfix] Fix 2 precommit issues - (mamba_block_size, kv_cache_config) ( #27811 )
...
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-10-30 11:52:18 -07:00
Roger Meier
2918c1b49c
[Model] Use the same fused_moe configs for all H200 devices ( #23642 )
...
Signed-off-by: Roger Meier <r.meier@siemens.com >
2025-10-30 17:36:56 +00:00
Mengqing Cao
1004205795
[MTP] Refactor mtp predictor to avoid d2h operation ( #27643 )
...
Signed-off-by: MengqingCao <cmq0113@163.com >
2025-10-30 17:27:39 +00:00
Huy Do
ba33e8830d
Reapply "Install pre-built xformers-0.0.32.post2 built with pt-2.9.0" ( #27768 )
...
Signed-off-by: Huy Do <huydhn@gmail.com >
2025-10-30 10:22:30 -07:00
Kebe
33a0ea5f32
[Docs] add Shanghai Meetup - 2025/10 ( #27545 )
...
Signed-off-by: Kebe <mail@kebe7jun.com >
Signed-off-by: esmeetu <jasonailu87@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: esmeetu <jasonailu87@gmail.com >
2025-10-31 00:33:13 +08:00
Ilya Markov
60f76baa66
[Misc] Replace CUDA_VISIBLE_DEVICES in DP with torch.cuda.set_device for device selection on cuda-like devices ( #27564 )
...
Signed-off-by: ilmarkov <markovilya197@gmail.com >
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
2025-10-30 11:41:44 -04:00
Varun Sundar Rabindranath
e5e076cad7
[BugFix] Stopgap - Flashinfer Autotuner + GPT-OSS + DP/TP ( #27762 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-10-30 08:24:31 -07:00
Li, Jiang
eebf00cb0c
[Bugfix][CPU] Fix MRoPE dispatch on the CPU backend ( #27800 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2025-10-30 15:12:05 +00:00
Fan Yin
9956aae4ea
[Model][Ouro] Support Ouro Model ( #27794 )
...
Signed-off-by: yinfan.1024 <yinfan.1024@bytedance.com >
Signed-off-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: yinfan.1024 <yinfan.1024@bytedance.com >
Co-authored-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-30 22:34:41 +08:00
Zhewen Li
0fe0140408
[KV offload] Enable CPU KV offload on CUDA alike Platforms ( #27770 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-10-30 22:10:29 +08:00
Zhiyuan Li
4e68cc9b6a
[Model] Introduce Kimi Linear to vLLM ( #27809 )
...
Signed-off-by: lizhiyuan <lizhiyuan@moonshot.cn >
Signed-off-by: Zhiyuan Li <uniartisan2017@gmail.com >
2025-10-30 21:02:27 +08:00
Huamin Li
1994de99ea
[CI Failure] Fix test_kv_cache_model_load_and_run ( #27717 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com >
2025-10-30 12:27:53 +00:00
wang.yuqi
4464723f22
[Frontend][Doc][5/N] Improve all pooling task | Polish encode (pooling) api & Document. ( #25524 )
...
Signed-off-by: wang.yuqi <noooop@126.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-10-30 12:13:05 +00:00
Sairam Pillai
74374386e2
[Bugfix] Improve GPU validation logging in Ray fallback scenarios ( #25775 )
...
Signed-off-by: Sairam Pillai <sairam.pillai61@gmail.com >
2025-10-30 11:57:59 +00:00
Wentao Ye
c01f6e525f
[CI] Fix mypy for vllm/v1/core and vllm/v1/engine ( #27108 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-30 11:32:17 +00:00
Huamin Li
c7d2a554ba
[CI Failure] fix test_default_mm_loras ( #27795 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com >
2025-10-30 18:13:03 +08:00
wangxiyuan
af826e0820
[V0 deprecation] Remove VLLM_USE_V1 usage in config module ( #27784 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-10-30 09:42:49 +00:00
Zhewen Li
e806178d2a
[BugFix][VL] Fix FA selection on Qwen2.5-VL ( #27790 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-30 07:54:44 +00:00
Huamin Li
5be1bed790
[CI/Build]Add eval config for Qwen3-235B-A22B-Instruct-2507-FP8 ( #27113 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com >
2025-10-30 07:50:56 +00:00
yitingdc
31b55ffc62
use stringData in secret yaml to store huggingface token ( #25685 )
...
Signed-off-by: yiting.jiang <yiting.jiang@daocloud.io >
2025-10-30 00:47:36 -07:00
Bram Wasti
ded8ada86a
Add more dims for batch invariant shims ( #27489 )
...
Signed-off-by: Bram Wasti <bwasti@meta.com >
Signed-off-by: Bram Wasti <bwasti@fb.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-30 05:28:45 +00:00
Kuntai Du
8bff831f0a
[Benchmark] Cleanup deprecated nightly benchmark and adjust the docstring for performance benchmark ( #25786 )
...
Signed-off-by: KuntaiDu <kuntai@uchicago.edu >
2025-10-30 04:43:37 +00:00
Lucas Wilkinson
b5d70751d8
[BugFix] Reordering extend logic fix ( #27739 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-10-29 21:39:34 -07:00
Fardin Hoque
b8c48c5d72
kernels/moe test pruning ( #27053 )
...
Signed-off-by: Fardin Hoque <kfhfar@amazon.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-10-30 12:10:34 +08:00
Benjamin Bartels
17d055f527
[Feat] Adds runai distributed streamer ( #27230 )
...
Signed-off-by: bbartels <benjamin@bartels.dev >
Signed-off-by: Benjamin Bartels <benjamin@bartels.dev >
Co-authored-by: omer-dayan <omdayan@nvidia.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-10-29 21:09:10 -07:00
Nick Hill
2ce5c5d3d6
[BugFix] Handle unscheduled requests properly when async scheduling ( #27756 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-29 21:04:25 -07:00
Kunshang Ji
b5bae42f91
[XPU] Update latest IPEX 2.8 release ( #27735 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2025-10-30 11:17:13 +08:00
Chen Zhang
d7fb10c574
[Bugfix] mamba-block-size is set for vision language model ( #27773 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-10-29 19:39:57 -07:00
Yan Ma
b798e39f93
[XPU][bugfix] fix rope for llama4 and deepseek ( #25145 )
...
Signed-off-by: Yan Ma <yan.ma@intel.com >
2025-10-30 09:43:13 +08:00
Chenheli Hua
48eb8eba58
[Temp fix] Disable torch.compile for Qwen2.5 VL's VisionBlock temporarily. ( #27760 )
...
Signed-off-by: Chenheli Hua <huachenheli@outlook.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-29 23:17:48 +00:00
Wentao Ye
b5d90f7400
[Bug] Fix DBO IMA issue for DeepEPHT ( #27666 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-29 16:28:27 -04:00
Nick Hill
d4aa144343
[BugFix] Fix handling of resumed reqs in SharedStorageConnector ( #27719 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-29 20:16:52 +00:00
Wentao Ye
fcb1d570bb
[Bug] Fix DeepEP low latency assert self.batched_router_logits.size(-1) == full_router_logits.size(-1) Bug ( #27682 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-29 14:50:39 -04:00
Nicolò Lucchesi
accb8fab07
[KVConnector] Add metrics to Prometheus-Grafana dashboard ( #26811 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
Co-authored-by: Mark McLoughlin <markmc@redhat.com >
2025-10-29 18:44:49 +00:00
Wentao Ye
5b0448104f
[Bug] Raise error explicitly if using incompatible backend ( #27424 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-29 13:29:20 -04:00
22quinn
f7a6682872
[CI/Build] Test torchrun with 8 cards ( #27548 )
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
2025-10-29 10:26:06 -07:00
Boyuan Feng
a9fe0793f2
use_aot_compile should respect VLLM_DISABLE_COMPILE_CACHE ( #27698 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2025-10-29 17:08:54 +00:00
JartX
7568a282b9
[FIXBUG] Qwen3VL hallucinations without Contiguous on Torch.SDPA ( #27744 )
...
Signed-off-by: JartX <sagformas@epdcenter.es >
Co-authored-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-10-29 16:55:35 +00:00
Braulio Dumba
1da3309ace
[Core] Exposing engine sleep & wake_up state as prometheus metrics ( #24176 )
...
Signed-off-by: Braulio Dumba <Braulio.Dumba@ibm.com >
2025-10-29 09:32:01 -07:00
Wentao Ye
5522fb274b
[Chore] Optimize P2PNCCLEngine http_address ( #27488 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-30 00:05:09 +08:00
Nicolò Lucchesi
0f95a1c3f2
[CI] Fix flaky test_two_responses_with_same_prev_id test ( #27745 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-10-29 15:10:35 +00:00
Xiake Sun
ded24e3e54
[ROCm][Platform] Add MI308X device id in _ROCM_DEVICE_ID_NAME_MAP ( #27623 )
...
Signed-off-by: Xiake Sun <xiake.sun@amd.com >
2025-10-29 14:44:03 +00:00
Roger Young
d6704dd099
Fix MiniMax-M2 rmsnorm precision and remove useless code ( #27627 )
...
Signed-off-by: xuebi <xuebi@minimaxi.com >
Co-authored-by: xuebi <xuebi@minimaxi.com >
2025-10-29 21:01:05 +08:00
Cyrus Leung
ecca3fee76
[Frontend] Add vllm bench sweep to CLI ( #27639 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-29 05:59:48 -07:00
Zhewen Li
9a0d2f0d92
[CI/Build] Skip cpu offloading test on AMD ( #27690 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-10-29 12:55:51 +00:00
Isotr0py
ad3ec89532
[VLM] Add Qwen3-VL generation test ( #25185 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-29 12:19:37 +00:00
Kevin H. Luu
3481e40743
[chore] Remove models weight on S3 logic ( #27725 )
...
Signed-off-by: kevin <kevin@anyscale.com >
2025-10-29 10:29:49 +00:00
Eugene Khvedchenya
5e72216d17
Feature/video support in random mm dataset ( #25963 )
...
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com >
Signed-off-by: Eugene Khvedchenya <ekhvedchenia@nvidia.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-29 18:24:52 +08:00
Isotr0py
1a33aacf82
[Misc] Raise error for missing video metadata in MultiModalDataParser ( #27664 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: Isotr0py <2037008807@qq.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-29 10:06:42 +00:00
Yue Zhang
7ba6aa8f56
[Fix] import get_kv_cache_torch_dtype error in LMCacheConnector integration ( #27670 )
...
Signed-off-by: KevinCheung2259 <2651309292@qq.com >
2025-10-29 10:03:54 +00:00
Alec S
ab2eb27b74
[Frontend] [gpt-oss] Mcp type bug ( #27689 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
Signed-off-by: Alec Solder <alecs@fb.com >
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com >
Co-authored-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
Co-authored-by: Alec Solder <alecs@fb.com >
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com >
2025-10-29 10:01:32 +00:00
Alec S
3c7fefdeba
[Frontend] [gpt-oss] Tool json call parsing error retry ( #27675 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
Signed-off-by: Alec Solder <alecs@fb.com >
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com >
Co-authored-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
Co-authored-by: Alec Solder <alecs@fb.com >
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com >
2025-10-29 09:42:44 +00:00
bnellnm
1891cf605a
[Bugfix] Fix modular kernel tests ( #27707 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
2025-10-29 16:14:33 +08:00
Jiangyun Zhu
8df98c2161
[perf] Enable concurrent execution of "shared_experts" and "selected_experts" in qwen3-next ( #27578 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2025-10-29 08:12:54 +00:00
Cyrus Leung
4fb8771cc0
[CI/Build] Move pre-commit only scripts to tools/pre_commit ( #27657 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-29 08:04:33 +00:00
Dipika Sikka
413ef7a3b4
[Speculators] Move tests + fix integration ( #27308 )
...
Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com >
Signed-off-by: Rahul Tuli <rtuli@redhat.com >
Signed-off-by: rahul-tuli <rtuli@redhat.com >
Co-authored-by: Rahul Tuli <rtuli@redhat.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2025-10-29 00:54:21 -07:00
Zhewen Li
8b62495076
[Bugfix] Fix non-contiguous tensor error in rocm_unquantized_gemm_impl ( #27605 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-10-29 00:00:15 -07:00
Zhewen Li
83fd49b1fc
[CI/Build][Bugfix]Fix Quantized Models Test on AMD ( #27712 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-10-29 06:27:30 +00:00
Shaoting
a4a4f0f617
[KV Connector] Update lmcache connector with latest compatibility ( #27681 )
...
Signed-off-by: Samuel Shen <slshen@uchicago.edu >
Co-authored-by: Samuel Shen <slshen@uchicago.edu >
2025-10-29 05:38:37 +00:00
Lukas Geiger
0d8161b075
[Model] Fix Qwen3VL and Qwen3Omni after torch.compile changes ( #27705 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-29 05:28:20 +00:00
liuzhenwei
d2c33c397a
[NIXL][XPU] update name of nixl wheel ( #27631 )
...
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com >
2025-10-29 12:43:29 +08:00
Varun Sundar Rabindranath
f6d5f5888c
[Build] Revert triton_kernels requirements ( #27659 )
2025-10-28 21:07:09 -07:00
Simon Mo
9007bf57e6
Revert "Install pre-built xformers-0.0.32.post2 built with pt-2.9.0" ( #27714 )
2025-10-28 20:58:01 -07:00
Huy Do
f257544709
Install pre-built xformers-0.0.32.post2 built with pt-2.9.0 ( #27598 )
...
Signed-off-by: Huy Do <huydhn@gmail.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-28 19:39:15 -07:00
Jialin Ouyang
0b51c9bd8b
[Core] Early return in SlidingWindowManager.remove_skipped_blocks ( #27673 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-29 01:32:33 +00:00
Wentao Ye
d3ab240f39
[Bug] Fix deepep low latency use nvlink by default ( #27677 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-28 23:53:12 +00:00
Lucas Kabela
94666612a9
[Misc][qwen2_5_vl][torch.compile] Enable supports_torch_compile on generic nn.Module and demonstrate speedup on Qwen Vision model ( #23207 )
...
Signed-off-by: Lucas Kabela <lucaskabela@meta.com >
Signed-off-by: Lucas Kabela <lucasakabela@gmail.com >
2025-10-28 22:36:43 +00:00
Nick Hill
4fe5895361
[AsyncScheduling] Make async overlap work with logprobs ( #27615 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-28 22:35:54 +00:00
Or Ozeri
111faf1118
[Core] Scheduler: Publish connector events after output ( #25875 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com >
2025-10-28 21:01:33 +00:00
Wentao Ye
6afc28a9ba
[Test] Batch Invariant: Unit test using parameterized backend ( #27478 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-28 13:51:35 -07:00
Lucas Wilkinson
141e6a0505
[Misc] Make reorder batch also separate extends ( #27367 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-10-28 10:55:10 -07:00
Matvei Pashkovskii
130aa8cbcf
Add load pattern configuration guide to benchmarks ( #26886 )
...
Signed-off-by: Matvei Pashkovskii <mpashkov@amd.com >
Signed-off-by: Matvei Pashkovskii <matvei.pashkovskii@amd.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-28 10:49:15 -07:00
Zhengxu Chen
e3d8186666
[compile] Add fallback path to AOT compile when serialization fails. ( #27350 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-10-28 12:54:26 -04:00
Cyrus Leung
f5710ef02a
[Misc] Make LayerBlockType a Literal instead of Enum ( #27658 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-28 16:23:35 +00:00
Mohammad Miadh Angkad
a8c02fb5bf
[Bugfix][CI] Fix v1 attention backend tests and add CI coverage ( #26597 )
...
Signed-off-by: Mohammad Miadh Angkad <MAngkad.BSDSBA2027@aim.edu >
Signed-off-by: Mohammad Miadh Angkad <mangkad.bsdsba2027@aim.edu >
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com >
2025-10-28 11:42:05 -04:00
Kero Liang
02af36df36
[Bugfix] Fix allocation & free logic of SingleWriterShmRingBuffer ( #27117 )
...
Signed-off-by: Kero Liang <kerorek@outlook.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: donglu <donglu@cohere.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-28 15:01:24 +00:00
Zhiyuan Li
e88bdd60d9
[FLA] Introduce Kimi Delta Attention(KDA) to VLLM ( #27654 )
...
Signed-off-by: lizhiyuan <lizhiyuan@moonshot.cn >
2025-10-28 22:56:28 +08:00
Samuel Shen
05e034f085
[nit]: Fix import for the lmcache integration ( #27600 )
...
Signed-off-by: Samuel Shen <slshen@uchicago.edu >
Co-authored-by: Samuel Shen <slshen@uchicago.edu >
2025-10-28 14:40:55 +00:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟
936643a868
[BugFix] Also consider RAY_EXPERIMENTAL_NOSET_* when storing compilation cache ( #27294 )
...
Signed-off-by: Hollow Man <hollowman@opensuse.org >
2025-10-28 10:22:28 -04:00
Junpu Fan
b186149e8e
[Bugfix][Frontend] validate arg priority in frontend LLM class before add request ( #27596 )
...
Signed-off-by: Junpu Fan <junpufan@gmail.com >
2025-10-28 14:02:43 +00:00
22quinn
2abbd351ef
[Core] Enable async scheduling for external_launcher mode ( #27394 )
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com >
2025-10-28 13:52:47 +00:00
wangln19
446912d1cb
fix: allow HuggingFace standard chat template params via **kwargs (#27622)
Signed-off-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local>
Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com>
Co-authored-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-10-28 21:12:34 +08:00
Zhengxu Chen
a00d6254e9
[compile] Disable dynamo guards check for AOT compilation. (#27288)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-10-28 12:58:12 +00:00
Asaf Joseph Gardin
05181cc57f
[Hybrid] Add mamba_block_size to Engine Args (#27289)
Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com>
2025-10-28 12:54:24 +00:00
Zhengxu Chen
259504e147
[compile] Add enable_prompt_embeds to compile hash. (#27285)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-10-28 20:46:03 +08:00
Wentao Ye
0484b64248
[Bug] Fix shape issue for eplb expert weights (#27589)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-10-28 20:44:05 +08:00
Cyrus Leung
f58d9b6404
[Misc] Separate out utils.counter and move utils.Device to engine (#27588)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-28 12:20:46 +00:00
Matthew Bonanni
44b5ce956d
[Bugfix] In LongRoPE, decide short vs long based on max_model_len (#27431)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-10-28 12:00:56 +00:00
Nick Hill
7a865f2325
[V0 Deprecation] Remove vestigial V0 logits_processors.py file (#27601)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-10-28 19:17:45 +08:00
wangln19
2fa90bda27
Fix a robust parsing issue in KimiK2ToolParser that causes IndexError (#27565)
Signed-off-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local>
Co-authored-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local>
2025-10-28 11:11:50 +00:00
Zhewen Li
0291fbf65c
[CI/Build] Fix amd model executor test (#27612)
Signed-off-by: zhewenli <zhewenli@meta.com>
2025-10-28 08:58:11 +00:00
Jialin Ouyang
b46e4a06f1
[Core][Bookkeeping Optimization] Update against numpy view of is_token_ids tensor (#27618)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
2025-10-28 08:13:10 +00:00
Li, Jiang
d34f5fe939
[Bugfix][CPU] Fallback oneDNN linear to torch linear to fix half gemm support on legecy platforms (#27526)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-10-27 23:25:44 -07:00
Eric Yue
bdb01a38fe
[Hardware][AMD][Model] Triton MoE tuning configs for GLM-4.6 for MI300X (#27323)
Signed-off-by: minatoaquaMK2 <jiacheng.yue@foxmail.com>
2025-10-27 22:58:06 -07:00
vllmellm
5b3c35a68e
[ROCm] [Doc] Update ROCm installation docs (#27327)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-10-28 13:00:50 +08:00
Chauncey
61fbfe5274
[Bugfix] fixed inconsistent finish_reason handling between V0 and V1 engines (#27555)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-10-28 02:18:08 +00:00
Kuntai Du
255e34ca50
[Stability fix] turn off HMA allocator when connector is set (#27592)
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Signed-off-by: Kuntai Du <kuntai@uchicago.edu>
2025-10-27 18:32:23 -07:00
Roger Wang
a8d2e326ec
[Bugfix][CI] Fix config resolving logic with remote models (#27610)
2025-10-28 00:48:32 +00:00
Andrew Xia
53a56e658b
[gpt-oss][2/N] Support input_messages in responsesRequest (#26962)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
2025-10-27 23:15:49 +00:00
usberkeley
69f064062b
Code quality improvements: version update, type annotation enhancement, and enum usage simplification (#27581)
Signed-off-by: Bradley <bradley.b.pitt@gmail.com>
2025-10-27 17:50:22 +00:00
Micah Williamson
921e78f4bb
[ROCm] Update AITER branch for ROCm base docker (#27586)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2025-10-27 17:22:33 +00:00
Cyrus Leung
6ebffafbb6
[Misc] Clean up more utils (#27567)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-27 15:30:38 +00:00
Ben Browning
3b96f85c36
[Chore]: Stream tokens vs characters in tool call parser tests (#26513)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-10-27 23:06:25 +08:00
tingtinggithub
23ad820553
fixing mm placeholder replacement issue with gemma3 (#27538)
Signed-off-by: tingtingtang1992 <streamttt@gmail.com>
2025-10-27 14:34:01 +00:00
Varun Sundar Rabindranath
5d3be3ba4c
[Bugfix][LoRA][FusedMoE] Select MxFP4 Backend based on LoRA Enablement (#27487)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-10-27 07:32:50 -07:00
Yu Jiaqi
4f882be4a0
[Model] Siglip2 Model Support (#27566)
Signed-off-by: piood <2477084691@qq.com>
2025-10-27 06:57:37 -07:00
Asaf Joseph Gardin
9273754222
[Hybrid] Added supports_mamba_prefix_caching Protocol (#27339)
Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com>
2025-10-27 13:05:20 +00:00
Jee Jee Li
f4e8154076
[Kernel] Enable moe LoRA kernel support FP16 (#27468)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-10-27 19:48:37 +08:00
Fadi Arafeh
a663f6ae64
[cpu][perf] Fix low CPU utilization with VLLM_CPU_OMP_THREADS_BIND on AArch64 (#27415)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2025-10-27 11:14:55 +00:00
Chauncey
a4fc21895e
[Bugfix] Fixed when return_token_ids=False, the first event still contains prompt_token_ids. (#27561)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-10-27 11:06:43 +00:00
Shanshan Shen
a3e8611da5
[Bugfix] Limit the default value of max_model_len when it is not specified by users (#27556)
Signed-off-by: shen-shanshan <467638484@qq.com>
2025-10-27 10:16:20 +00:00
Cyrus Leung
7c2bdb83dc
[Misc] Clean up utils (#27552)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-27 09:05:40 +00:00
Danielle Robinson
9932ed6a83
[Kernel] Adding split_K implementation for fused_moe_lora (#27291)
Signed-off-by: Danielle Robinson <dmmaddix@amazon.com>
Signed-off-by: Danielle Robinson <dcmaddix@gmail.com>
Co-authored-by: Danielle Robinson <dmmaddix@amazon.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-10-27 02:05:24 -07:00
Jee Jee Li
2d631d28c6
[Doc] Slight improvement to M2 and beyond (#27554)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-10-27 09:02:10 +00:00
Cyrus Leung
b368382964
[Model] Deprecate merge_by_field_config=False (#27551)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-27 16:43:00 +08:00
gnovack
a806c14cc7
[Performance][LoRA] add context varying params to 'do_not_specialize' in fused moe lora (#27445)
Signed-off-by: gnovack <gnovack@amazon.com>
2025-10-27 06:31:55 +00:00
yyzxw
181bf5bbde
[Docs] reemove the incorrect enable_reasoning parameter (#27550)
Signed-off-by: zxw <1020938856@qq.com>
2025-10-26 23:17:19 -07:00
Cyrus Leung
cbd5e07a51
[Model] Use merge_by_field_config for MM models (Qwen series) (#27546)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-27 05:38:05 +00:00
CSWYF3634076
63b22e0dbb
[Model][Bugfix] fix ernie45 moe 300B SharedFusedMoE output tuple (#27316)
Signed-off-by: wangyafeng <wangyafeng@baidu.com>
2025-10-26 20:53:31 -07:00
Roger Young
5980604c44
Fix MiniMax-M2 copyright (#27537)
Signed-off-by: xuebi <xuebi@minimaxi.com>
Co-authored-by: xuebi <xuebi@minimaxi.com>
2025-10-27 03:29:51 +00:00
youkaichao
361a7463d3
fix m2 test (#27536)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-10-27 01:04:36 +08:00
Roger Young
720af6ab79
[Model][MiniMax-M2] Support MiniMax-M2 Model (#27535)
Signed-off-by: xuebi <xuebi@minimaxi.com>
Co-authored-by: xuebi <xuebi@minimaxi.com>
2025-10-27 00:59:11 +08:00
Cyrus Leung
55cba4a05c
[CI/Build] Update causal-conv1d installation (#27529)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-26 22:14:22 +08:00
Cyrus Leung
c7abff2990
Revert "[CI/Build] Use CPU for mm processing test on CI (#27522)" (#27531)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-26 04:44:27 -07:00
Yeshwanth N
71b1c8b667
[Chore]:Extract math and argparse utilities to separate modules (#27188)
Signed-off-by: Yeshwanth Surya <yeshsurya@gmail.com>
Signed-off-by: Yeshwanth N <yeshsurya@gmail.com>
Signed-off-by: yeshsurya <yeshsurya@gmail.com>
2025-10-26 04:03:32 -07:00
Cyrus Leung
8fb7b2fab9
[Doc] Fix links to GH projects (#27530)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-26 17:55:51 +08:00
Cyrus Leung
be7b55a83d
[Doc] Remove Molmo warning (#27527)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-26 16:22:52 +08:00
Lucia Fang
315b860abe
[bugfix]fix empty prompts for async-engine mode in benchmark throughput (#27494)
Signed-off-by: Lucia Fang <fanglu@fb.com>
2025-10-26 08:16:35 +00:00
rongfu.leng
87c41c26ad
[Bugfix] Fix processor initialization for model from modelscope instead of HF (#27461)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-10-26 07:44:31 +00:00
JartX
65d2cf9511
[BUGFIX][ROCM] ViT FlashAttention on ROCm (no GFX9) and contiguous on qwen3vl ROCm TORCH_SDPA (#27190)
Signed-off-by: JartX <sagformas@epdcenter.es>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-10-26 15:08:52 +08:00
Isotr0py
d63cd9ff10
[CI/Build] Use CPU for mm processing test on CI (#27522)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-10-26 13:09:18 +08:00
Cyrus Leung
66a168a197
[CI/Build] Refactor processing tests (#27470)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-10-25 16:14:30 +00:00
Matthew Bonanni
a99564ac5b
[Attention] Add missing kv cache scale setup (#27490)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-10-25 00:12:49 -07:00
Cyrus Leung
4c5f632165
[Misc] Simplify max tokens in multimodal registry (#27500)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-24 23:56:01 -07:00
Kuntai Du
b853540388
[Core][Hybrid allocator + kv connector 1/n] Enable hybrid allocator + KV cache connector (#25712)
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Signed-off-by: Kuntai Du <kuntai@uchicago.edu>
2025-10-24 23:34:18 -07:00
Zhuohan Li
56ed7609a9
Revert "[Misc] Remove use of CUDA_VISIBLE_DEVICES for device selectio… (#27502)
2025-10-25 05:31:43 +00:00
Jiangyun Zhu
29c9cb8007
[CI] Add tests for cudagraph (#27391)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2025-10-25 02:37:33 +00:00
Yihua Cheng
83f478bb19
[KVConnector] Migrate the LMCache integration code to be vLLM native (#25542)
Signed-off-by: ApostaC <yihua98@uchicago.edu>
2025-10-25 00:23:53 +00:00
Varun Sundar Rabindranath
269c4db0a4
[Misc][DP] Guard mxfp4 implementation selection (#27484)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-10-24 23:29:24 +00:00
Wentao Ye
52efc34ebf
[Log] Optimize Startup Log (#26740)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-24 19:27:04 -04:00
Pengchao Wang
d95d0f4b98
[Distributed] Basic set of configuration for large EP deployment on GB200 (#27328)
Signed-off-by: Pengchao Wang <wpc@fb.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
2025-10-24 14:16:44 -07:00
Lehua Ding
0402428200
[Perf][Async Scheduling] Remove CPU->GPU sync in dummy_run (#27455)
Signed-off-by: Lehua Ding <lehuading@tencent.com>
2025-10-24 20:45:36 +00:00
jinghanhu
17af6aa0da
[Document] Add ms-swift library to rlhf.md (#27469)
2025-10-24 20:31:50 +00:00
Zhewen Li
fc168c33f3
[CI/Build] Fix test_torch_utils in AMD CI (#27317)
Signed-off-by: zhewenli <zhewenli@meta.com>
2025-10-24 12:26:00 -07:00
Isotr0py
acc78aeb88
[Bugfix] Fix interns1-vit qk norm code path (#27480)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-10-24 17:43:45 +00:00
Ming Yang
0f67d4d962
[Attention] Add MLA prefill backend: trtllm_ragged_attention_deepseek (#26397)
Signed-off-by: Ming Yang <minos.future@gmail.com>
2025-10-24 10:24:08 -07:00
kourosh hakhamaneshi
7e1d697b56
[Bugfix] Fix MultiConnector stats reconstruction across process boundaries (#27366)
Signed-off-by: Kourosh Hakhamaneshi <Kourosh@anyscale.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
2025-10-24 17:08:05 +00:00
Chendi.Xue
699d62e6cf
[NIXL][BUGFIX] delay done_recving queue cleanup to bottom of get_finished (#27297)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
2025-10-24 17:01:41 +00:00
Richard Zou
cd390b609d
[compile] Turn standalone_compile back on (#27460)
Signed-off-by: Richard Zou <zou3519@gmail.com>
2025-10-24 16:30:27 +00:00
Fadi Arafeh
2080b05099
[cpu][fix] Fix onednn_mm crash on consecutive matmuls with same M,K,N and different dtype (#27472)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2025-10-24 15:57:48 +00:00
Lifans
6454afec90
[Doc] Fix minor issues in docs/design/metrics.md (#27436)
Signed-off-by: Lifan Shen <lifans@meta.com>
2025-10-24 05:40:54 -07:00
Chauncey
41a62564a7
Fix test named tool use (#27458)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-10-24 20:27:45 +08:00
fhl2000
284cc92275
[MISC] cudagraph_capture_sizes related improvements (#26016)
Signed-off-by: fhl <2410591650@qq.com>
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-24 05:11:05 -07:00
ioana ghiban
435be10db9
Fix AArch64 CPU Docker pipeline (#27331)
Signed-off-by: Ioana Ghiban <ioana.ghiban@arm.com>
2025-10-24 05:11:01 -07:00
Cyrus Leung
b7030d962b
[Benchmark] Enable benchmark to run with encoding_format="bytes" (#27467)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-24 11:16:50 +00:00
Chauncey
3567816932
[Refactor] move tool parsing logic from protocol.py to the tool parser (#27383)
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
2025-10-24 09:53:23 +00:00
22quinn
e0ef8a2920
[BugFix] Fix torchrun DP with LLM class (#27395)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-10-24 08:11:37 +00:00
Isotr0py
42efe609ba
[MM][Bugfix] Replace PatchEmbed's conv3d to linear layer (#27418)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-10-24 07:32:47 +00:00
Yu Jiaqi
88d3141ec6
[Docs] remove v1 column for embedding models (#27446)
Signed-off-by: piood <2477084691@qq.com>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-10-23 23:55:03 -07:00
Rui Qiao
09a6a49eaf
[Misc] Avoid "PyTorch non-writable tensors" warning in RayPPCommunicator (#27443)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2025-10-24 14:53:09 +08:00
strinczer
074475541a
[Bugfix] Fix Pydantic union resolution for ResponseFunctionToolCall in Responses API (#26706)
Signed-off-by: Shai Trinczer <strinczer@icloud.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-10-23 22:53:42 -07:00
Aaron Pham
d4c574c39f
[Chore] remove structural tags logging lines (#27451)
2025-10-24 05:35:45 +00:00
usberkeley
c528b9006a
Fix EventPublisherFactory logic for disabled KV cache events (#27419)
Signed-off-by: Bradley <bradley.b.pitt@gmail.com>
2025-10-24 05:00:01 +00:00
fhl2000
85fee74b33
[Bugfix][CI] Move resolving cudagraph_mode before initializing attn_metadata_builder (#27427)
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
2025-10-23 20:31:14 -07:00
hfan
8dbe0c527f
[Misc] Add TPU usage report when using tpu_inference. (#27423)
Signed-off-by: Hongmin Fan <fanhongmin@google.com>
2025-10-23 20:29:37 -07:00
Xiangyu Li
5cc6bddb6e
[Kernel] Add GPTQv2 format support for low-bit or asymmetric quantization, by adapting gptq_gemm (#26092)
2025-10-23 23:26:13 -04:00
Harry Mellor
1f9460c4c1
Fix pooling adapters for Transformers backend (#27338)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-23 20:23:55 -07:00
xiao-llm
70022ffc00
Granite 4.0 quark quantization support (#26944)
Signed-off-by: Xiao YU <Xiao.YU@xilinx.com>
Signed-off-by: Xiao Yu <xiao.yu.dc@outlook.com>
Co-authored-by: Xiao YU <Xiao.YU@xilinx.com>
2025-10-24 02:14:03 +00:00
Akash kaothalkar
f417746ad7
[Hardware][POWERPC] Disable oneDNN path in vllm/model_executor/layers/utils.py for Powerpc (#27422)
Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>
Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>
2025-10-23 21:21:36 +00:00
Yu Jiaqi
0552cfb195
[Model] Siglip Embedding Support (#27324)
Signed-off-by: piood <2477084691@qq.com>
2025-10-23 20:19:48 +00:00
Kebe
51dd14ac2b
[Bugfix][DP] Fix creating too many DP Placement Groups (#26880)
Signed-off-by: Kebe <mail@kebe7jun.com>
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
Co-authored-by: Rui Qiao <ruisearch42@gmail.com>
2025-10-23 20:16:51 +00:00
Matthew Bonanni
dbfbf9f324
[Attention] Fix FlashMLA metadata builder arguments for q_len > 1 (#27368)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-10-23 15:58:15 -04:00
Jonathan Chen
ca76486a16
[Chore] Separate out vllm.utils.platform_utils.py (#27374)
Signed-off-by: Jonathan <chenleejonathan@gmail.com>
2025-10-23 19:08:06 +00:00
Varun Sundar Rabindranath
a9f55dc588
[Misc] Add triton_kernels dependency (#27370)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-10-23 12:04:14 -07:00
Isotr0py
81d5bb765a
[Bugfix] Fix AWQ marlin layer skipping (#27416)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-10-23 18:30:28 +00:00
Gregory Shtrasberg
0825197bee
[Bugfix][ROCm][DeepSeek] Fix for forward_hip in rope for DeepSeek (#27373)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-10-23 17:43:53 +00:00
Alexander Matveev
9ef3d5b875
[Bugfix] Fix dp_chunking enablement logic in FusedMoE layer (#27220)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
2025-10-24 00:03:14 +08:00
Alexei-V-Ivanov-AMD
295c7f0267
Mirroring the test definitions (2025-10-22) (#27362)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
2025-10-24 00:02:26 +08:00
wang.yuqi
3fa2c12185
[Frontend][4/N] Improve all pooling task | Add plugin pooling task (#26973)
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Christian Pinto <christian.pinto@ibm.com>
2025-10-23 14:46:18 +00:00
Cyrus Leung
fe2016de2d
[CI/Build] Remove unnecessary flags from test registry (#27353)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-23 14:42:40 +00:00
Ilya Markov
237cf6d32a
[Misc] Remove use of CUDA_VISIBLE_DEVICES for device selection (fix DP slow startup time &c) (#26709)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
2025-10-23 20:58:39 +08:00
Navya Srivastava
faee3ccdc2
[Feature] Pydantic validation for speculative.py (#27156)
Signed-off-by: Navya Srivastava <navya.srivastava1707@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-23 12:19:33 +00:00
Bradley D
570c3e1cd4
[Bugfix] Honor --mm_encoder_attn_backend when used (#27124)
Co-authored-by: Bradley D <4551889+bradleyhd@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-10-23 20:09:52 +08:00
Harry Mellor
3a4255c7c4
Run mypy on the lowest supported Python version instead of system Python (#27048)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-23 05:07:44 -07:00
tomeras91
61089465a6
[Model] Add MoE support for NemotronH (#25863)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
2025-10-23 10:27:23 +00:00
Tova Movshovitz
88afa11010
[Metrics] [KVConnector] Add connector prefix cache hit rate stats (#26245)
Signed-off-by: tovam <tovam@pliops.com>
2025-10-23 12:21:08 +02:00
Chauncey
d00ce29d89
[CI] Reorganize entrypoints tests (#27403)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-10-23 10:10:06 +00:00
Louie Tsai
3b7bdf983b
add SLA information into comparison graph for vLLM Benchmark Suite (#25525)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
Signed-off-by: louie-tsai <louie.tsai@intel.com>
Signed-off-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-23 08:04:59 +00:00
Zhewen Li
50b788a17a
[CI/Build] Fix AMD CI: test_cpu_gpu.py (#27388)
Signed-off-by: zhewenli <zhewenli@meta.com>
2025-10-23 07:55:00 +00:00
Lucia Fang
fc059c7061
[Bugfix] Fix args settings for guided decoding args (#27375)
Signed-off-by: Lucia Fang <fanglu@fb.com>
2025-10-23 07:34:06 +00:00
Cyrus Leung
bfb240cc49
[CI/Build] Fix Prithvi plugin test (#27393)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-23 07:30:44 +00:00
Jonathan Chen
e255d92990
[Chore] Remove duplicate has_ functions in vllm.utils (#27372)
Signed-off-by: Jonathan <chenleejonathan@gmail.com>
2025-10-23 06:11:59 +00:00
wang.yuqi
3729ed00ba
[Model] Add num_cached_tokens for PoolingRequestOutput (#27378)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-10-23 14:03:42 +08:00
Giancarlo Delfin
6644796bf4
[V1][spec decode] return logprobs for spec decoding (#26060)
Signed-off-by: Giancarlo Delfin <gdelfin@meta.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-10-22 22:59:59 -07:00
Andrew Sansom
ff93cc8c84
[CORE] Support Prefix Caching with Prompt Embeds (#27219)
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
2025-10-22 22:18:07 -07:00
PiteXChen
243ed7d32e
[Bugfix][Core] running queue index leakage exception (#26754)
Signed-off-by: CLFutureX <chenyongqyl@163.com>
2025-10-22 21:40:12 -07:00
fangpings
7e0941055f
[Bugfix] Fix incorrect kv cache metrics in grafana.json (#27133)
Signed-off-by: Fangping Shi <fangping_shi@apple.com>
Co-authored-by: Fangping Shi <fangping_shi@apple.com>
2025-10-22 20:58:36 -07:00
Cyrus Leung
6738e4a093
[Bugfix] Fix SLA tuner initialization (#27355)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-22 20:43:04 -07:00
Isotr0py
2566dca2a9
[Bugfix] Fix deepseek-ocr multi-image inference and add merge_by_field_config=True with tensor schema support (#27361)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-10-22 17:15:38 -07:00
Matthew Bonanni
b4fda58a2d
[MLA] Bump FlashMLA (#27354)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-10-22 15:48:37 -07:00
dongbo910220
a0003b56b0
[Chore] Separate out system utilities from vllm.utils (#27201)
Signed-off-by: dongbo910220 <1275604947@qq.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-10-22 20:25:25 +00:00
Daisy-Ma-coder
5beacce2ea
[BugFix] bugfix for Flash Attention MLA with full cuda graph IMA following pr-25490 (#27128)
Signed-off-by: qqma <qqma@amazon.com>
Co-authored-by: qqma <qqma@amazon.com>
2025-10-22 19:36:39 +00:00
rongfu.leng
8669c69afa
[Feature] publisher default set zmq in kv_event config (#26915)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-10-22 19:19:33 +00:00
Sage
1651003c35
[Prefix Cache] Use LoRA name for consistent KV-cache block hashing (#27211)
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
2025-10-22 18:13:03 +00:00
William Song
1cb8c6c5fe
[Doc] Fix numbering sequence in prefix caching (#27357)
Signed-off-by: William Song <jinwook@umich.edu>
2025-10-22 17:35:47 +00:00
Luciano Martins
e05a6754a8
[Model] Revert PR #26715: Restore custom PaliGemma and Gemma3-MM impl… (#27309)
Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>
2025-10-22 10:05:34 -07:00
Isotr0py
084a9dae80
[Bugfix] Disable FlexAttention direct block mask building for encoder-only models (#27344)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-10-22 16:39:08 +00:00
RED
c9461e05a4
Support Anthropic API /v1/messages Endpoint (#22627)
Signed-off-by: liuli <ll407707@alibaba-inc.com>
Co-authored-by: liuli <ll407707@alibaba-inc.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-10-22 09:13:18 -07:00
Nicolò Lucchesi
4dfdb821c8
[P/D] Dynamic kv_output_aggregator collect size (#26734)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-10-22 18:07:58 +02:00
Russell Bryant
58fab50d82
[Frontend] Require flag for loading text and image embeds (#27204)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-22 15:52:02 +00:00
Isotr0py
db6f28d898
[Bugfix] Fix HF format InternVL large variants video processing (#27330)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-10-22 08:39:23 -07:00
Cyrus Leung
14e2f1231e
[Bugfix] Make get_mrope_input_positions instance methods (#27342)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-22 08:38:34 -07:00
Chendi.Xue
7c4767f1eb
[NIXL] use Host buffer to support TP_ratio > 1 for XPU (#27140)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
2025-10-22 15:28:13 +00:00
Jee Jee Li
9771e0b432
[Bugfix] Add missing 'is_internal_router' attribute to FusedMoEWithLoRA (#27351)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-10-22 08:19:12 -07:00
Reinforce-II
980de31ca0
[bugfix] remove unused parameters to reduce unnecessary vram usage (#26789)
Signed-off-by: Reinforce-II <fate@eastal.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-10-22 08:16:09 -07:00
Wentao Ye
1c160841ea
[Bug] Fix DeepSeek-V2.5-1210-FP8 issue (#27267)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-22 11:00:10 -04:00
Mark McLoughlin
4ca13a8667
[NIXL] Terminate handshake listener thread in shutdown (#26404)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-10-22 16:59:53 +02:00
Isotr0py
675aa2ec64
[Model] Upstream Deepseek-OCR model (#27247)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-10-22 07:59:15 -07:00
dongbo910220
3ae082c373
[Chore] Separate out optional dependency checks from vllm.utils (#27207)
Signed-off-by: dongbo910220 <1275604947@qq.com>
Signed-off-by: dongbo910220 <32610838+dongbo910220@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-10-22 10:44:21 -04:00
Alexei-V-Ivanov-AMD
49c00fe304
Mirroring changes in test-pipeline.yaml into test-amd.yaml (#27242)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
2025-10-22 09:59:45 -04:00
Mark McLoughlin
141d3b9fc5
[docs] Update v1 metrics design doc (#27332)
Signed-off-by: Simon Mo <simon.mo@hey.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: atalhens <sneh.lata@nutanix.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: atalhens <sneh.lata@nutanix.com>
2025-10-22 06:29:15 -07:00
Jee Jee Li
abf3db40ef
[Core] Handle MoE LoRA edge cases (#27335)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-10-22 13:14:33 +00:00
gnovack
8e4ca4d14e
Bugfix - pass 'max_num_tokens_padded' into 'moe_lora_align_block_size' (#27311)
Signed-off-by: gnovack <gnovack@amazon.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-10-22 12:23:57 +00:00
Wentao Ye
1a0f4defb7
[Log] Add Warning for LLM(data_parallel_size=k) single-process DP Usage (#27282)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-22 12:12:21 +00:00
Li, Jiang
843af7f7fc
[Bugfix][CPU] Disable dual stream execution for experts on CPU (#27320)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-10-22 11:02:27 +00:00
wang.yuqi
1f633b8632
[Frontend][3/N] Improve all pooling task | Support binary embedding response (#27066)
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-10-22 18:38:57 +08:00
ExtReMLapin
a4c29e6e82
fixed reasoning streaming with tool_choice="required" (#24108)
Signed-off-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr>
Signed-off-by: ExtReMLapin <3909752+ExtReMLapin@users.noreply.github.com>
Co-authored-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2025-10-22 09:42:55 +00:00
Harry Mellor
8f18feb191
Remove last level references not removed in #26355 (#27260)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-22 09:18:17 +00:00
Huy Do
ed540d6d4c
Update release pipeline for PyTorch 2.9.0 (#27303)
Signed-off-by: Huy Do <huydhn@gmail.com>
2025-10-22 09:18:01 +00:00
wangxiyuan
f6027b2855
[1/N][Platform] Cleanup useless function (#26982)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-10-22 09:04:57 +00:00
Jiangyun Zhu
ab3e80042e
[torch.compile] Enable silu_mul_fp8_quant fusion without custom ops enabled (#27146)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2025-10-22 00:22:39 -04:00
Cyrus Leung
ceacedc1f9
[Benchmark] Add plot utility for parameter sweep (#27168)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-21 20:30:03 -07:00
Nicolò Lucchesi
bfa59be8f1
[CI] Nixl integration tests DP-EP (#27199)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-10-22 11:17:48 +08:00
vllmellm
265ecb05fb
[DOC] [ROCm] Add ROCm quickstart guide (#26505)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-10-22 03:10:48 +00:00
Lain
09a7e6f617
[Deepseek v3.2] Remove extra logics in indexer (#26465)
Signed-off-by: Siyuan Fu <siyuanf@nvidia.com>
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
Signed-off-by: Lain <siyuanf@nvidia.com>
Co-authored-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-10-21 23:34:03 +00:00
Tyler Michael Smith
6c2eef5a5d
[P/D] KVConnector for decode benchmarking (#25986)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-10-21 16:30:47 -07:00
Benjamin Chislett
19748806f0
[Bugfix] skip cuda graph for drafter when running with eager (#26821)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
2025-10-21 15:39:09 -07:00
ExtReMLapin
4a8a567e16
Updated xgrammar backend to not deny supported string formats (#27253)
Signed-off-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr>
Signed-off-by: ExtReMLapin <3909752+ExtReMLapin@users.noreply.github.com>
Co-authored-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-21 22:25:23 +00:00
Alexander Matveev
344a0017c0
[Performance] Dual stream execution of "shared_experts" and "selected_experts" inside FusedMoE ( #26440 )
Signed-off-by: Alexander Matveev <amatveev@redhat.com >
2025-10-21 21:38:29 +00:00
Huy Do
becb7de40b
Update PyTorch to 2.9.0+cu129 ( #24994 )
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-10-21 17:20:18 -04:00
Tao He
250fb1b8ea
[Bugfix] fixes the decoding metadata of dense mla's fp8 kvcache. ( #27144 )
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com >
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-10-21 18:27:03 +00:00
Nick Hill
647214f3d5
[V0 Deprecation] Remove V0 executors ( #27142 )
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-21 11:09:37 -07:00
David Whyte-Gray
ddeec11ba9
[Bugfix][P/D] Reduce num_threads used by nixl ucx backend ( #27196 )
Signed-off-by: David Whyte-Gray <40244437+dagrayvid@users.noreply.github.com >
2025-10-21 13:41:52 -04:00
Wentao Ye
86ed77022d
[Feature] Batch Invariant for R1 TP 8 on Blackwell ( #27229 )
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-21 10:25:55 -07:00
Micah Williamson
aa1356ec53
[ROCm] Update Triton, Torch, and AITER branches for ROCm base Dockerfile ( #27206 )
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2025-10-21 12:01:23 -04:00
Pavani Majety
ecc3c0940a
Add @pavanimajety to .github/codeowners for Flashinfer, ModelOpt related code ( #27213 )
Signed-off-by: Pavani Majety <pmajety@nvidia.com >
2025-10-21 22:59:53 +08:00
JartX
ba09652de2
[ROCM] Enable CompressedTensorsWNA16 ( #27187 )
Signed-off-by: JartX <sagformas@epdcenter.es >
2025-10-21 10:43:23 -04:00
Harry Mellor
bd66b8529b
[CI] Install pre-release version of apache-tvm-ffi for flashinfer ( #27262 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-21 14:23:56 +00:00
dongbo910220
6c728f7771
[Chore] Separate out NCCL utilities from vllm.utils ( #27197 )
Signed-off-by: dongbo910220 <1275604947@qq.com >
2025-10-21 06:18:23 -07:00
Daniel Cámpora
80e9452984
[Deepseek v3.2] Optimize top_k_per_row ( #26763 )
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com >
2025-10-21 08:30:07 +00:00
Roger Wang
c3a2c6ac5f
[MM][Core] Decouple ViT backend from LM backend ( #27061 )
Signed-off-by: Roger Wang <hey@rogerw.io >
2025-10-21 00:30:10 -07:00
Nicolò Lucchesi
72f431e709
[Nixl] Minor refactor to handshake related metadata ( #26410 )
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-10-21 09:07:47 +02:00
Zebing Lin
be4445072c
[Fix][Spec Decode] Fix llama4 draft loading with different quantization ( #27136 )
Signed-off-by: linzebing <linzebing1995@gmail.com >
2025-10-20 23:19:00 -07:00
Benjamin Chislett
f381cf2302
[Bugfix] Fix broken MTP weight loading for FP8 KV Scales ( #27227 )
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
2025-10-20 22:51:44 -07:00
Varun Sundar Rabindranath
5ff5d94e77
[Bugfix] Fix gpt-oss w4a8 DP/EP on B200 ( #26729 )
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-10-21 01:51:14 -04:00
Shu Wang
f95da13c3d
[ModelOpt] Load w13/w2_input_scale for all experts, nvfp4 ( #26135 )
Signed-off-by: Shu Wang <shuw@nvidia.com >
Signed-off-by: Shu Wang. <shuw@nvidia.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-10-21 01:50:31 -04:00
Po-Han Huang (NVIDIA)
aef368aa08
[BugFix] GPT-OSS Attention DP + MoE TP weight loading issue ( #24032 )
Signed-off-by: Po-Han Huang <pohanh@nvidia.com >
2025-10-21 04:03:47 +00:00
Chen Wu
5f6cbf60d6
[Feature][Kernel]FusedMoE LoRA ( #21229 )
Signed-off-by: wuchen <cntryroa@gmail.com >
Signed-off-by: banjuede <lmklhc@163.com >
Signed-off-by: Chen Wu <cntryroa@gmail.com >
Signed-off-by: Danielle Robinson <dmmaddix@amazon.com >
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
Signed-off-by: bk-201 <joy25810@foxmail.com >
Co-authored-by: wuchen <wuchen@zetyun.com >
Co-authored-by: Nathan Van Gheem <vangheem@gmail.com >
Co-authored-by: banjuede <lmklhc@163.com >
Co-authored-by: Danielle Robinson <dmmaddix@amazon.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
Co-authored-by: bk-201 <joy25810@foxmail.com >
2025-10-21 03:01:37 +00:00
Russell Bryant
3ada34f9cb
[Frontend] Enforce tokenize=False when applying chat template ( #27205 )
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-21 02:57:34 +00:00
Lunwen He
0eb8f2b880
create is_in_the_same_node on cpu ( #26832 )
Co-authored-by: Lunwen He <lunwenh@meta.com >
2025-10-21 02:04:14 +00:00
Fadi Arafeh
163965d183
[cpu] Dispatch un-quantized linear to oneDNN/ACL by default for AArch64 ( #27183 )
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
Co-authored-by: Michael Yang <Michael.Yang@arm.com >
2025-10-21 02:02:58 +00:00
Nick Hill
a03cf9bc70
[V0 Deprecation] Remove V0 metrics code ( #27215 )
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-21 02:02:10 +00:00
Isotr0py
352c0c8a28
[Quantization] Automatically infer AWQ modules_to_not_convert field ( #26909 )
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-21 01:49:28 +00:00
Andrew Xia
bfe0b4bd2a
[ez] add uv lock to gitignore ( #27212 )
Signed-off-by: Andrew Xia <axia@fb.com >
Co-authored-by: Andrew Xia <axia@fb.com >
2025-10-21 00:37:44 +00:00
Concurrensee
58fbbcb2f5
[ROCm] enable some tests in entrypoints test groups on AMD ( #26725 )
Signed-off-by: Yida <yida.wu@amd.com >
2025-10-21 00:37:16 +00:00
Heng Guo
87778d5f00
[Feature][Quantization] auto_round support for mixed bits quantization ( #23812 )
Signed-off-by: n1ck-guo <heng.guo@intel.com >
Signed-off-by: Heng Guo <heng.guo@intel.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-20 22:23:30 +00:00
Nicolò Lucchesi
f9e7ad5400
[Bugfix][CI] Fix Distributed Tests (4 GPUs) async_sched+ray test ( #27195 )
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-10-20 16:34:54 +00:00
shivampr
4d0f266113
[Kernel][Model] Tune fused_moe Triton configs for Qwen3-30B A3/A3B on H100 (FP8/BF16) ( #26268 )
Signed-off-by: Shivam <shivampr.dev@gmail.com >
2025-10-20 07:48:01 -07:00
Eugene Khvedchenya
e93ff6c8b9
Nemotron Nano V2 VL + EVS Video Support ( #27107 )
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com >
Signed-off-by: Natan Bagrov <nbagrov@nvidia.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Natan Bagrov <nbagrov@nvidia.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-20 22:19:11 +08:00
ioana ghiban
1c691f4a71
AArch64 CPU Docker pipeline ( #26931 )
2025-10-20 07:09:40 -04:00
Jiangyun Zhu
9fce7bee74
[Kernel] Accelerate solve_tril with TMA ( #26746 )
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2025-10-20 05:39:02 +00:00
Andy Lo
b63f2143f8
[LoRA] LoRA cuda graph specialization ( #25914 )
Signed-off-by: Andy Lo <andy@mistral.ai >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-20 04:21:09 +00:00
Yi Zhang
f32bf7582e
[Model][VLM] Support Bee-8B Model ( #27012 )
Signed-off-by: uyzhang <yi.zhang.4096@gmail.com >
Signed-off-by: Yi Zhang <zhangyi970819@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-20 02:31:26 +00:00
Yongtao Huang
8a81d776ce
Fix typo in ValueError message: use kv_role instead of kv_disagg_role ( #27166 )
Signed-off-by: Yongtao Huang <yongtaoh2022@gmail.com >
2025-10-19 19:47:19 +00:00
Sergei Skvortsov
f6fdacd82c
[Bugfix] Fix error with penalties when speculative decoding and structural output are enabled ( #26586 )
Signed-off-by: southfreebird <yvorott@gmail.com >
2025-10-19 19:24:46 +00:00
Cyrus Leung
d31f7844f8
[Misc] Move utils to avoid conflicts with stdlib, and move tests ( #27169 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-19 05:20:55 -07:00
iAmir97
7a6c8c3fa1
[Chore] Separate out vllm.utils.network_utils ( #27164 )
Signed-off-by: iAmir97 <Amir.balwel@embeddedllm.com >
Co-authored-by: iAmir97 <Amir.balwel@embeddedllm.com >
2025-10-19 03:06:32 -07:00
Jianyu Huang
221bf72577
output type conversion fix ( #27159 )
2025-10-19 08:10:07 +00:00
Cyrus Leung
b3aba04e5a
[Benchmark] Convenience script for multiple parameter combinations ( #27085 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-18 23:57:01 -07:00
dongbo910220
8a297115e2
[Chore] Separate out hashing utilities from vllm.utils ( #27151 )
Signed-off-by: dongbo910220 <1275604947@qq.com >
2025-10-19 11:09:38 +08:00
22quinn
191eed0bb9
[BugFix] Fix lazy imports involving outlines_core ( #27158 )
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
2025-10-19 02:35:32 +00:00
Woosuk Kwon
fb860670da
[Minor] Remove unused env variable ( #27161 )
2025-10-18 18:48:35 -07:00
Tova Movshovitz
83e760c57d
[V1][Metrics][Plugin] Add plugin support for custom StatLoggerBase implementations ( #22456 )
Signed-off-by: tovam <tovam@pliops.com >
2025-10-18 15:12:46 -07:00
Lucas Wilkinson
c2bba69065
[BugFix] Disable fp8 kv-cache by default for DeepSeek V3.2 ( #27121 )
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-18 22:05:23 +00:00
Boyuan Feng
e133d6d218
[BugFix] fix graph partition signature ( #27139 )
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2025-10-18 17:34:36 -04:00
dongbo910220
a1946c9f61
[Chore] Separate out profiling utilities from vllm.utils ( #27150 )
Signed-off-by: dongbo910220 <1275604947@qq.com >
2025-10-18 19:12:01 +00:00
Lucas Wilkinson
9f020f4f31
[BugFix] Fix failing gemma-3-1b-it test: test_lm_eval_accuracy_v1_engine[google/gemma-3-1b-it] ( #27111 )
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-10-18 12:44:39 -06:00
Nick Hill
3b45075206
[Minor] Add some clarifying comments to recent changes ( #27130 )
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-18 09:52:45 -07:00
Yongtao Huang
168e578efc
Fix incorrect string formatting in barrier timeout exceptions ( #27149 )
Signed-off-by: Yongtao Huang <yongtaoh2022@gmail.com >
2025-10-18 09:51:57 -07:00
Isotr0py
6ac5e06f7c
[Chore] Clean up pytorch helper functions in vllm.utils ( #26908 )
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: isotr0py <2037008807@qq.com >
2025-10-18 09:48:22 -07:00
Lukas Geiger
5c2acb270a
[Models][QwenVL] Remove unnecessary .contiguous() calls ( #27106 )
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-10-18 07:05:05 -07:00
Nicolò Lucchesi
b26b70bec4
[Misc] Refactor get_kv_cache_spec into AttentionLayerBase ( #26587 )
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-10-18 13:51:21 +00:00
Fadi Arafeh
ab4be40fc5
[fix][cpu] fix prefill attention in CPU attention backend ( #27035 )
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
2025-10-18 13:30:21 +00:00
Wentao Ye
245e4f2c01
[Feature] Batch Invariant: Support DeepGEMM and Blackwell ( #27127 )
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-18 09:28:05 -04:00
iAmir97
1d165d6d85
[Chore] Separate out vllm.utils.mem_utils ( #27143 )
Signed-off-by: iAmir97 <Amir.balwel@embeddedllm.com >
Signed-off-by: iAmir97 <71513472+iAmir97@users.noreply.github.com >
Co-authored-by: iAmir97 <Amir.balwel@embeddedllm.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-18 10:06:59 +00:00
dongbo910220
83004020fd
[Test] Add test for /health endpoint on engine failure ( #26074 )
Signed-off-by: dongbo910220 <1275604947@qq.com >
2025-10-18 09:59:05 +00:00
Chendi.Xue
12e21701e7
[DOC][FEATURES][CPU]update cpu feature for v1 ( #27135 )
Signed-off-by: Chendi Xue <chendi.xue@intel.com >
2025-10-18 01:10:45 -07:00
Varun Sundar Rabindranath
30a33b92ee
[Misc] Rev DeepEP ( #27122 )
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-10-18 14:54:29 +08:00
Hanchenli
7c572544e4
[GPT-OSS] Structure_Tag support for gpt-oss tool-call in cot ( #25515 )
Signed-off-by: Hanchenli <lihanc2002@gmail.com >
Signed-off-by: Hanchenli <61769611+Hanchenli@users.noreply.github.com >
Signed-off-by: Wei Wei <wwei6@meta.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Wei Wei <wwei6@meta.com >
Co-authored-by: Wei Wei <weiweinpu@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-10-17 21:55:54 -07:00
Huamin Li
c312320764
[CI/Build] tests(v1): feed Triton attention the (num_blocks, 2, …) KV cache layout in backend-correctness tests ( #26663 )
Signed-off-by: Huamin Li <3ericli@gmail.com >
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com >
2025-10-17 21:11:26 -07:00
ZiTian Zhao
c981f0ea78
[Perf] Add H100 fused MoE config ( #25398 )
Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com >
2025-10-18 02:21:27 +00:00
Lehua Ding
6367bde739
[BugFix][Core] Fix error when enable async-scheduling in multi-node env ( #25887 )
Signed-off-by: Lehua Ding <lehuading@tencent.com >
Signed-off-by: Lehua Ding <lehuading@qq.com >
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com >
2025-10-17 22:16:18 +00:00
Wentao Ye
f50cc221ea
[Test] Make test_failure more stable for batch invariance ( #27054 )
2025-10-17 16:59:08 -04:00
Pradyun92
acedc74b1a
[V1][Spec Decode] Fix greedy temperature detection after sampler refactor ( #27077 )
Signed-off-by: Pradyun Ramadorai <pradyunr@amazon.com >
Co-authored-by: Pradyun Ramadorai <pradyunr@amazon.com >
2025-10-17 13:27:47 -07:00
Zhuohan Li
d29483b58a
[Minor] Remove unnecessary error message ( #27115 )
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com >
2025-10-17 20:02:12 +00:00
Michael Goin
950cf9e58e
[Bugfix] Use PIECEWISE cudagraphs on Blackwell if max_model_len > 131072 ( #27114 )
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-17 19:47:18 +00:00
Isotr0py
3125d79950
[Chore] Remove unused PolyNorm layer ( #27110 )
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-17 19:03:43 +00:00
vllmellm
e33ee23ee3
[Bugfix] [AITER] [ROCm] Fix Quark MoE Quant Config and AITER Fused MoE quant type logic ( #27029 )
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2025-10-17 12:51:10 -06:00
rasmith
b10c64c834
[ROCm][Bugfix][Model] Fix illegal memory access when running qwen3_moe models with rms_norm (Qwen3-235B-A22B, Qwen3-30B-A3B, etc.) ( #26192 )
Signed-off-by: Randall Smith <ransmith@amd.com >
Signed-off-by: Randall Smith <Randall.Smith@amd.com >
Signed-off-by: rasmith <Randall.Smith@amd.com >
Co-authored-by: Randall Smith <ransmith@amd.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-10-17 14:17:18 -04:00
Aleksandr Malyshev
0925b28a8e
[ROCM] MoE fp4 CK kernel ( #26545 )
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com >
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com >
2025-10-17 14:06:33 -04:00
Nicolò Lucchesi
99722d5f0e
[CI] Remove forbidden slash ( #27112 )
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-10-17 09:38:00 -07:00
燃
4c91a28e30
[bugfix] Qwen3-VL fix video incorrect timestamp calculations while do_sample_frames=True ( #27104 )
Co-authored-by: 松灵 <wpf272043@alibaba-inc.com >
2025-10-17 16:26:33 +00:00
Patrick von Platen
b038d9c40c
[Data-parallel] Allow DP>1 for world_size > num_gpus on node (8) ( #26367 )
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com >
Signed-off-by: Rui Qiao <ruisearch42@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Rui Qiao <ruisearch42@gmail.com >
2025-10-17 08:24:42 -07:00
Nicolò Lucchesi
2ba60ec7fe
[CI] Nixl integration tests ( #27010 )
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-10-17 07:13:31 -07:00
Luka Govedič
bd7157a071
[torch.compile] Enable attention and allreduce fusion without custom ops enabled ( #24604 )
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-10-17 08:10:23 -06:00
Yongtao Huang
be429d0cfd
Fix incorrect docstring for stop_profile() method ( #27101 )
Signed-off-by: Yongtao Huang <yongtaoh2022@gmail.com >
2025-10-17 06:30:23 -07:00
Reima Karhila (AMD)
c253745eb8
[Hardware][AMD][Model] Triton MoE tuning configs for GLM-4.5 for MI350 and MI355 ( #25586 )
Signed-off-by: Reima Karhila <reima.karhila@amd.com >
Signed-off-by: xaguilar <Xavier.AguilarFruto@amd.com >
Co-authored-by: xaguilar <Xavier.AguilarFruto@amd.com >
2025-10-17 04:56:12 -07:00
Jee Jee Li
daec4d2624
[Model]Improve Qwen3VLMoeForConditionalGeneration packed_modules_mapping ( #27096 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-17 04:47:00 -07:00
Harry Mellor
6c9fdbf725
[Docs] Replace rst style double-backtick with md single-backtick ( #27091 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-17 02:47:34 -07:00
Harry Mellor
483ea64611
[Docs] Replace all explicit anchors with real links ( #27087 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-17 02:22:06 -07:00
Mengqing Cao
e20eba753b
[VLM][Refactor] Remove useless func get_input_positions in MRotaryEmbedding ( #27088 )
Signed-off-by: MengqingCao <cmq0113@163.com >
2025-10-17 02:00:30 -07:00
cong-meta
bbc1b29665
Update troubleshooting.md and remind VLLM_TRACE_FUNCTION usage ( #27069 )
Signed-off-by: cong-meta <prowindy@hotmail.com >
2025-10-17 01:53:06 -07:00
Chauncey
acb1bfa601
[CI] fix docs build failed ( #27082 )
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-10-17 07:53:40 +00:00
zhrrr
75c7ad9918
[Kernel][Performance] Fuse float cast and renormalize to topk softmax kernel ( #26717 )
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com >
Signed-off-by: izhuhaoran <izhuhaoran@qq.com >
2025-10-17 07:30:35 +00:00
Li, Jiang
5550ff9c25
[CI/Build] Update compressed tensor test path to fix CPU CI ( #27068 )
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2025-10-16 22:34:56 -07:00
Said Taghadouini
3aeb19a39e
[Model] Add support for LightOnOCR ( #26916 )
Signed-off-by: Said Taghadouini <taghadouinisaid@gmail.com >
Signed-off-by: Said Taghadouini <84044788+staghado@users.noreply.github.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-10-17 05:05:24 +00:00
Cyrus Leung
8c017b3490
[Model] Always use Transformers backend for PaliGemma and Gemma3-MM ( #26715 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-17 05:03:35 +00:00
Zhewen Li
9c2c2287a0
[CI/Build] Update Llama4 eval yaml ( #27070 )
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-10-17 04:59:47 +00:00
Jee Jee Li
fec2b341ad
[Kernel] Lazy import FlashInfer ( #26977 )
2025-10-17 04:48:18 +00:00
Jee Jee Li
87bc0c492f
[Bugfix] Fix ReplicatedLinearWithLoRA ( #27065 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-17 04:43:16 +00:00
Nick Hill
fe3b9372ad
[Core] Change execute_model_with_error_logging() to be a ctx manager ( #27060 )
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-17 11:45:32 +08:00
Tao He
bde9e2272a
[Bugfix][Qwen] fixes the weights dtype in qwen3_next: it is actually a bfloat16 ( #27030 )
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com >
2025-10-17 03:37:52 +00:00
Boyuan Feng
08405609cc
disable graph partition in custom op ( #26952 )
Signed-off-by: Boyuan Feng <boyuan@meta.com >
Signed-off-by: Boyuan Feng <fby.1994@gmail.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-10-17 11:08:47 +08:00
Nick Hill
ab81379ea6
[Perf] Exploit out-of-band buffers in shm_broadcast ( #26961 )
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-16 20:08:03 -07:00
Harry Mellor
4ffd6e8942
[Docs] Reduce custom syntax used in docs ( #27009 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-16 20:05:34 -07:00
Tomas Ruiz
965c5f4914
vllm bench serve shows num of failed requests ( #26478 )
Signed-off-by: Tomas Ruiz <tomas.ruiz.te@gmail.com >
2025-10-16 19:55:09 -07:00
Lukas Geiger
4d055ef465
Remove unused imports ( #26972 )
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-10-16 19:51:17 -07:00
Boyuan Feng
17c540a993
[torch.compile] fix simple inductor graph partition test ( #27050 )
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2025-10-16 21:09:36 -04:00
Cyrus Leung
4d4d6bad19
[Chore] Separate out vllm.utils.importlib ( #27022 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-17 00:48:59 +00:00
Lucia Fang
11ae016bd7
[torch.compile] Passing only necessary compilation config to inductor pass config ( #27041 )
Signed-off-by: Lu Fang <fanglu@fb.com >
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com >
2025-10-17 00:01:52 +00:00
jiahanc
41d3071918
[NVIDIA] [Perf] Update to leverage flashinfer trtllm FP4 MOE throughput kernel ( #26714 )
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-10-16 16:20:25 -07:00
Harry Mellor
fb5e10d3fb
Refactor Transformers backend to use mixins ( #26906 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-16 21:50:39 +00:00
Bram Wasti
b2f78cbad4
[small][batch invariance] Rename the env and internal flags to simplify usage ( #26855 )
Signed-off-by: Bram Wasti <bwasti@meta.com >
2025-10-16 21:40:25 +00:00
Wentao Ye
23583ee28c
[Bug] Add Assertion for random-input-len / random-output-len ( #26834 )
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-16 21:36:39 +00:00
Michael Goin
01c977e96d
[CI] Prune Quantization Tests and skip compilation ( #27038 )
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-16 17:26:35 -04:00
Wentao Ye
b3dda72c23
[Feature] Migrate DeepGEMM API from get_m_alignment_for_contiguous_layout to get_mk_alignment_for_contiguous_layout ( #26935 )
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-16 16:46:48 -04:00
Varun Sundar Rabindranath
fb0571b077
[GPTOSS][DP/EP][Marlin] Enable GPTOSS Batched DP/EP using Marlin kernels ( #25997 )
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-10-16 12:53:11 -07:00
Wentao Ye
2ed8b6b3d0
[Bug] Fix batch invariant test has to is ( #27032 )
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-16 19:45:14 +00:00
kimbochen
013abde6ef
Adding Warmup to Benchmark Serving ( #26943 )
Signed-off-by: Kimbo Chen <chentenghung@gmail.com >
2025-10-16 12:44:32 -07:00
Kyle Sayers
a5464dcf92
[Compressed Tensors] Always clone output for compile robustness ( #26849 )
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-10-16 19:29:59 +00:00
Mandy Li
ac3ed5a815
Support block size of 256 used by Intel HPU ( #26883 )
Signed-off-by: mandy-li <mandy.j.li@intel.com >
2025-10-16 15:10:57 -04:00
Andrew Xia
e6ba2000ae
[gpt-oss][1/N] EZ: refactor serving_responses for modularity ( #26948 )
Signed-off-by: Andrew Xia <axia@meta.com >
2025-10-16 18:44:06 +00:00
Harry Mellor
aa255ff55a
Support set in the CLI generation ( #27031 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-16 18:07:18 +00:00
ZiTian Zhao
7bb736d00e
Fix Qwen2.5 VL image grid docstring ( #27033 )
Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com >
2025-10-16 09:57:36 -07:00
Jee Jee Li
9f4e30904b
[Model] Fix Qwen3VL mm mapping ( #27027 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-16 09:45:59 -07:00
rongfu.leng
5afd3276df
[Feature] Add process_weights_after_loading to AttentionImpl ( #26870 )
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
2025-10-16 08:02:30 -07:00
Tahsin Tunan
43721bc67f
[CI] Replace large models with tiny alternatives in tests ( #24057 )
Signed-off-by: Tahsin Tunan <tahsintunan@gmail.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-16 15:51:27 +01:00
Kay Yan
02d709a6f1
[docs] standardize Hugging Face env var to HF_TOKEN (deprecates HUGGING_FACE_HUB_TOKEN) ( #27020 )
Signed-off-by: Kay Yan <kay.yan@daocloud.io >
2025-10-16 15:31:02 +01:00
Mark McLoughlin
4a510ab487
[NIXL] Improve request_finished() debug logs ( #25665 )
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-10-16 15:55:17 +02:00
Matthew Bonanni
314fa8abbf
[Attention] Tune CUTLASS MLA num_splits ( #26846 )
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-10-16 06:36:09 -07:00
Cyrus Leung
334535b6fb
[Benchmark] Show E2EL by default for pooling models ( #27014 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-16 12:47:09 +00:00
bogdanm
dcbb3f1871
[Bugfix] Correct LayerNorm epsilon parameter in modernbert.py ( #27008 )
Signed-off-by: bogdanm <152898065+bogdan01m@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-16 12:27:44 +00:00
Sungjae Lee
00417f4e44
[MISC] fix import violations for re and triton modules ( #26654 )
Signed-off-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com >
Co-authored-by: Mengqing Cao <cmq0113@163.com >
2025-10-16 03:38:27 -07:00
Lukas Geiger
ed344f4116
Cleanup code after Python 3.10 upgrade ( #26520 )
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-10-16 03:38:23 -07:00
CSWYF3634076
e51928793e
[Model][Bugfix] fix ernie45 vl run failed from shared experts optimization ( #26885 )
Signed-off-by: wangyafeng <wangyafeng@baidu.com >
2025-10-16 03:37:35 -07:00
Cyrus Leung
d2740fafbf
[Chore] Separate out vllm.utils.collections ( #26990 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-16 08:35:35 +00:00
Cyrus Leung
17838e50ef
[Benchmark] Use truncation by default for pooling benchmarks ( #26992 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-16 16:02:39 +08:00
Zhewen Li
44c8555621
[CI/Build] Fix AMD import failures in CI ( #26841 )
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-10-16 07:28:20 +00:00
Akash kaothalkar
f7d318de2b
[Hardware][CPU][PowerPC]Disable torch.compile() in toptopk sampling ( #26987 )
Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com >
Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com >
2025-10-15 22:36:59 -07:00
Cyrus Leung
76f0d05bc6
[CI/Build] Update expected beam search output for Phi3V ( #26978 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-16 05:12:44 +00:00
Bram Wasti
7d8975de84
Deepseek-v3 Batch Invariant on 8xH100 ( #26609 )
Signed-off-by: Bram Wasti <bwasti@meta.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-10-15 22:06:02 -07:00
Vadim Gimpelson
785d8b6410
[PERF] Qwen3-next MTP speedup (change bool mask indexing to index_select / index_copy to reduce d2h) ( #26437 )
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2025-10-16 12:18:31 +08:00
Cyrus Leung
f6cdc9a02f
[Chore] Rename utils submodules ( #26920 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-16 03:58:13 +00:00
Chendi.Xue
509cdc0370
[DOC][XPU]update feature parity with Intel GPU ( #26954 )
Signed-off-by: Chendi Xue <Chendi.Xue@intel.com >
Signed-off-by: Chendi Xue <chendi.xue@intel.com >
2025-10-15 20:07:10 -07:00
Richard Zou
9b6504c307
[BugFix] Work around graph partition x torch.compile cache issue ( #26956 )
Signed-off-by: Richard Zou <zou3519@gmail.com >
2025-10-15 20:06:11 -07:00
Angela Yi
e19b16dde6
[bugfix] Fix SP + PP without specifying compile size ( #26955 )
Signed-off-by: angelayi <yiangela7@gmail.com >
2025-10-15 20:05:33 -07:00
ahao-anyscale
582f2c6be7
[BUG] Allow runai_streamer_sharded in config check ( #26958 )
Signed-off-by: ahao-anyscale <ahao@anyscale.com >
2025-10-15 20:05:14 -07:00
Michael Goin
f8a0acbdbe
[CI] Enable Blackwell Llama4 MoE tests ( #26731 )
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-15 21:02:57 -06:00
kliuae
1317034379
[ROCm][FEAT] Fuse DeepSeek shared experts into AITER fused_moe ops ( #24097 )
Signed-off-by: chenjun <junchen2@amd.com >
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com >
Co-authored-by: valarLip <103567126+valarLip@users.noreply.github.com >
Co-authored-by: TJian <tunjian.tan@embeddedllm.com >
2025-10-16 10:41:34 +08:00
InChang Jeong
0ecc553ee6
[Bugfix] reasoning_parser parameter handling in run_batch.py ( #26225 )
Signed-off-by: inc-jeong <inc.jeong@navercorp.com >
Signed-off-by: InChang Jeong <inc.jeong@navercorp.com >
Co-authored-by: USER <user@AL02367916.local >
2025-10-16 10:24:05 +08:00
felixzhu555
f96bc3649c
[Qwen3-Next] Add tuned MoE config for Qwen3-Next FP8 on H100 tp2 ( #26887 )
Signed-off-by: Felix Zhu <felixzhu555@gmail.com >
2025-10-15 18:55:05 -07:00
Alexei-V-Ivanov-AMD
938c43ea7f
[ci] Adjusting AMD test composition 2025-10-14 ( #26852 )
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
2025-10-15 23:52:13 +00:00
Adrian Abeyta
0a9ef0cfce
Move query quantization to attention layer for Flashinfer & Triton. ( #26534 )
Signed-off-by: adabeyta <aabeyta@redhat.com >
Signed-off-by: Adrian Abeyta <aabeyta@redhat.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-10-15 19:01:38 -04:00
Wentao Ye
e5b438a247
[Bug] Temporarily Disable VLLM_ALLREDUCE_USE_SYMM_MEM by Default ( #26925 )
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-15 16:18:50 -04:00
XiaobingZhang
0b99f5d302
support flashinfer_fp4 moe for 5090 gpu ( #26669 )
Signed-off-by: XiaobingSuper <xiaobingzhangupc@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-10-15 15:06:47 -04:00
Benji Beck
1f491aa0c8
Vectorize RMS norm variance using vectorize_read_with_alignment ( #26234 )
Signed-off-by: Benji Beck <benjibeck@meta.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-10-15 11:54:41 -07:00
Kaixi Hou
de92d916fe
[NVIDIA] Add support for cudnn fp4 gemm via flashinfer ( #26107 )
Signed-off-by: kaixih <kaixih@nvidia.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
2025-10-15 13:53:00 -04:00
Woosuk Kwon
a1063628a4
[Chore] Clean up CODEOWNERS ( #26923 )
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-10-15 10:52:54 -07:00
XiaobingZhang
d796375258
[ModelOpt] Remove NVFP4 MoE K%16==0 constraint ( #26891 )
Signed-off-by: XiaobingSuper <xiaobingzhangupc@gmail.com >
2025-10-15 13:06:17 -04:00
Sam/Samuel
14f8456344
[Feature]: Use pydantic validation in observability.py config ( #26637 )
Signed-off-by: Samuel Wu <cernunnos1710@gmail.com >
Signed-off-by: Sam/Samuel <57896620+cern1710@users.noreply.github.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-15 16:44:03 +00:00
Pradeep Dasigi
4794c2bd92
Olmo 3 tool parser and tests ( #26143 )
...
Signed-off-by: Pradeep Dasigi <pradeepd@allenai.org >
2025-10-15 16:36:12 +00:00
Harry Mellor
d3cbaa08dc
Lower severity of log when model info cache misses due to exception ( #26917 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-15 09:01:09 -07:00
Cyrus Leung
828523ad8e
[Chore] Separate out vllm.utils.async_utils ( #26913 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-15 15:33:00 +00:00
Cyrus Leung
136a17fe6e
[Chore] Separate out vllm.utils.func ( #26904 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-15 13:03:58 +00:00
Boyuan Feng
f57438338d
[BugFix] Patch inductor memory plan logic ( #26878 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-15 12:51:45 +00:00
Max Wittig
5d598680e3
chore: remove unused marker ( #26890 )
...
Signed-off-by: Max Wittig <max.wittig@siemens.com >
2025-10-15 05:40:33 -07:00
wangxiyuan
8f4b313c37
[Misc] rename torch_dtype to dtype ( #26695 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-10-15 12:11:48 +00:00
Cyrus Leung
f93e348010
[Misc] Remove isort and yapf ignores ( #26888 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-15 12:09:03 +00:00
wang.yuqi
f54f85129e
[Model][2/N] Improve all pooling task | Support multi-vector retrieval ( #25370 )
...
Signed-off-by: wang.yuqi <noooop@126.com >
2025-10-15 11:14:41 +00:00
li2haipeng
d4d1a6024f
[Lora]Load tuned multi-lora kernel configs from json files ( #26319 )
...
Signed-off-by: li2haipeng <44383182+li2haipeng@users.noreply.github.com >
Signed-off-by: Haipeng Li <li2haipeng@gmail.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-15 09:45:14 +00:00
wangxiyuan
db1764e4e0
[Platform] allow platform to init dp group ( #22243 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-10-15 02:32:17 -07:00
Jialin Ouyang
7f83b4ee8e
[Easy] Get rid of unnecessary parentheses in kv_cache_manager ( #26842 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-15 09:17:43 +00:00
ant-yy
5c3bae1a6a
[Fix] Remove divisibility requirement between num_kv_heads and tp_size in bailing_moe ( #26876 )
...
Signed-off-by: vito.yy <vito.yy@antgroup.com >
2025-10-15 16:44:04 +08:00
Xudong Ma
5210dc3940
[Misc] Update TritonLanguagePlaceholder to have attributes that are used by Flash Linear Attention ops. ( #26853 )
...
Co-authored-by: Xudong Ma <mxd@meta.com >
2025-10-15 08:37:49 +00:00
youkaichao
650b51f9f9
[doc] add Context Parallel Deployment doc ( #26877 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-10-15 16:33:52 +08:00
Cyrus Leung
6256697997
[Doc] ruff format remaining Python examples ( #26795 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-15 01:25:49 -07:00
Wentao Ye
71557a5f7c
[CI] Fix mypy for vllm/executor ( #26845 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-15 01:23:33 -07:00
Zhewen Li
f3c378ffa7
[CI/Build] Add Qwen2.5-VL-7B-Instruct ChartQA Accuracy Tests in CI ( #21810 )
...
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com >
Signed-off-by: zhewenli <zhewenli@meta.com >
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com >
Co-authored-by: Ye (Charlotte) Qi <ye.charlotte.qi@gmail.com >
2025-10-15 08:09:56 +00:00
Yongye Zhu
f5ed68ef63
[Deepseek-V3.2][Kernel] Integrate cuda indexer k cache gather ( #26456 )
...
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com >
2025-10-15 16:05:01 +08:00
Angela Yi
efdef57b1f
[bugfix] Lazy import cv2 ( #26869 )
...
Signed-off-by: angelayi <yiangela7@gmail.com >
2025-10-15 07:47:50 +00:00
Cyrus Leung
b8a4572157
[Misc] Use helper function to generate dummy messages in OpenAI MM tests ( #26875 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-15 07:17:37 +00:00
Mengqing Cao
302ef403a2
[DSA][MLA] Tiny refactor on DeepSeek to make it reusable for different backends ( #26656 )
...
Signed-off-by: MengqingCao <cmq0113@163.com >
2025-10-15 00:16:44 -07:00
sangho.lee
8865da157b
[Bugfix][Multi Modal] Fix incorrect Molmo token processing ( #26873 )
...
Signed-off-by: sanghol <sanghol@allenai.org >
2025-10-15 07:13:59 +00:00
Boyuan Feng
f0862eae43
[Graph Partition] pass tests for decorator ( #26831 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2025-10-15 06:39:48 +00:00
Isotr0py
8c851f6d04
[Bugfix] Fix qwen3-omni audio truncation issue ( #26815 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-15 05:38:36 +00:00
Angela Yi
7cfa420f49
[BugFix] Patch inductor partitioning logic ( #26735 )
...
Signed-off-by: angelayi <yiangela7@gmail.com >
2025-10-15 05:04:32 +00:00
rongfu.leng
a27b288e4a
[Feature] default --extra-body param to disable thinking in vllm bench serve ( #26784 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
2025-10-15 04:23:44 +00:00
zhrrr
e471d7ca7e
[CI/Build][Bugfix] fix qutlass cmake error when set QUTLASS_SRC_DIR ( #26773 )
...
Signed-off-by: izhuhaoran <izhuhaoran@qq.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-10-15 04:09:44 +00:00
Michael Yao
c43ca8259e
[Docs] Move build.inc into arm.inc ( #26862 )
...
Signed-off-by: windsonsea <haifeng.yao@daocloud.io >
2025-10-14 20:35:08 -07:00
Tao Hui
85a65e7f51
[Model] Add DeepSeek-V3.1 reasoning parser (split from PR #24972 ) ( #25589 )
...
Signed-off-by: taohui <taohui3@gmail.com >
Signed-off-by: Tao Hui <taohui3@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
2025-10-15 11:09:52 +08:00
kourosh hakhamaneshi
a2986b3e33
[Bugfix] Fixes prefix-repetition benchmark script ( #26828 )
...
Signed-off-by: Kourosh Hakhamaneshi <Kourosh@anyscale.com >
2025-10-15 02:54:43 +00:00
Morrison Turnansky
96b9aa5aa0
[Frontend][torch.compile] CompilationConfig Overhaul ( #20283 ): name change compilation level to compilation mode, deprecation compilation level ( #26355 )
...
Signed-off-by: morrison-turnansky <mturnans@redhat.com >
Signed-off-by: Morrison Turnansky <mturnans@redhat.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-10-15 02:51:16 +00:00
Michael Goin
e66d787bce
Disable FlashInfer sampler by default ( #26859 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-15 02:35:18 +00:00
Chendi.Xue
bfad142e25
[BUGFIX][NIXL] quick fix for 'assert self.connector_worker is not None' in get_kv_connector_stats ( #26851 )
...
Signed-off-by: Chendi Xue <chendi.xue@intel.com >
2025-10-15 02:33:25 +00:00
Zhikaiiii
9354660036
[Bugfix]fix Qwen3 xml tool parser ( #26345 )
...
Signed-off-by: Zhikaiiii <1658973216@qq.com >
2025-10-15 09:50:30 +08:00
Jialin Ouyang
07ca70af8d
[Core][Easy] Use envs.__getattr__ for all Unify to environment variable access ( #26810 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-15 01:41:18 +00:00
Luka Govedič
2dcd12d357
[torch.compile] Fix tests for torch==2.9 inductor partition ( #26116 )
...
Signed-off-by: ProExpertProg <lgovedic@redhat.com >
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
2025-10-14 19:55:02 -04:00
Tyler Michael Smith
579d2e5458
[WideEP][P/D] Add usage stats for DP+EP and KV Connector ( #26836 )
...
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
2025-10-14 23:51:54 +00:00
Ye Hu
0512c04aee
[frontend][gptoss] Add per turn stats into Harmony Context ( #25061 )
...
Signed-off-by: lacora <hyelacora@gmail.com >
Co-authored-by: Ye Hu <yehu@fb.com >
2025-10-14 16:48:13 -07:00
Michael Goin
7e0ef4084a
[CI Failure] Fix torchao dep failure for Quantization Test ( #26824 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-14 16:41:43 -07:00
Nick Hill
4aed506b65
[Core] Streamline some structured output related code ( #26737 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-14 23:27:44 +00:00
Boyuan Feng
a86b4c58e8
remove attn output view kernel ( #26680 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
Signed-off-by: Boyuan Feng <fby.1994@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-14 22:53:10 +00:00
Nick Hill
ff4810ba73
[Minor] Group async_scheduling related fields in model runner init ( #26736 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-14 14:46:37 -07:00
Nan Qin
9d6964926e
fix: response_format for completion ( #23212 )
...
Signed-off-by: Nan2018 <qinnanjoshua@gmail.com >
2025-10-14 21:23:22 +00:00
Dhruvil Bhatt
0e65818910
Added MoE configs for llama 4, H200 device with tp=4/8 tuning ( #26837 )
...
Signed-off-by: Dhruvil Bhatt <bhattdbh@amazon.com >
2025-10-14 14:21:03 -07:00
Jialin Ouyang
380f17527c
[Perf] Cache vllm.env.__getattr__ result to avoid recomputation ( #26146 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-14 17:03:21 -04:00
HDCharles
b92ab3deda
Notice for deprecation of AutoAWQ ( #26820 )
...
Signed-off-by: HDCharles <39544797+HDCharles@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-14 13:39:59 -07:00
Jialin Ouyang
acaa2c0a4a
[Core] Reuse empty block lists whenever possible in KVCacheBlocks to mitigate GC costs ( #24964 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-14 12:58:43 -07:00
Matthew Bonanni
82af928c41
[Attention][Spec Decode] FlashMLA spec decode support ( #26541 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-10-14 19:38:20 +00:00
Huamin Li
87efc681db
llama4_vision_rope: add HIP override to accept (q, k) and avoid (positions, q, k) mismatch ( #26790 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com >
2025-10-14 11:54:12 -07:00
Michael Goin
c3a722fcb2
[CI Failure] Fix tests with missing TinyLlama-1.1B-Chat-v1.0-FP8-e2e ( #26816 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-14 18:38:59 +00:00
Ze'ev Klapow
aba48f7db1
[Kernel][MoE] Add MoE tunings for GLM 4.6-FP8 and GLM 4.5 Air on NVidia B200 ( #26818 )
2025-10-14 11:20:39 -07:00
Michael Goin
04b5f9802d
[CI] Raise VLLM_MAX_SIZE_MB to 500 due to failing Build wheel - CUDA 12.9 ( #26722 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-14 10:52:05 -07:00
Reza Barazesh
efc8f7d814
Update coveragerc and add codecov.yml for path fixes ( #26435 )
...
Signed-off-by: Reza Barazesh <rezabarazesh@meta.com >
2025-10-14 09:45:06 -07:00
Wentao Ye
6d87a2838c
[Config] Remove Unused Environment Variable VLLM_DISABLE_PAD_FOR_CUDAGRAPH ( #26743 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-14 11:47:49 -04:00
wang.yuqi
e6cdbd6792
Revert "[issues template] Encourage the author implement their own ideas" ( #26814 )
2025-10-14 08:37:34 -07:00
Chauncey
df850c4912
[Feature][Responses API] Stream Function Call - harmony ( #24317 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-10-14 08:31:43 -07:00
Qier Li
720394de43
[KVConnector][Metrics] Aggregate scheduler-side KVConnectorStats ( #26046 )
...
Signed-off-by: Qier Li <kevin44036@gmail.com >
2025-10-14 14:38:07 +00:00
wang.yuqi
88a49745af
[issues template] Encourage the author implement their own ideas ( #26671 )
...
Signed-off-by: wang.yuqi <noooop@126.com >
2025-10-14 22:32:36 +08:00
Boyuan Feng
ca683a2a72
use combo kernel to fuse qk-norm and qk-rope ( #26682 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2025-10-14 09:40:59 -04:00
汪志鹏
e9f1b8c9e9
Adjusted the model order of the model registration file ( #26798 )
...
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com >
2025-10-14 13:26:11 +00:00
Jaya Yuan
ea97940d6c
[DCP] Support Decode Context Parallel (DCP) for GQA with FlashAttention ( #24864 )
...
Signed-off-by: yuanyongjie.yyj <yuanyongjie.yyj@antgroup.com >
Signed-off-by: FENP <32334296+FENP@users.noreply.github.com >
Signed-off-by: Jaya Yuan <yuanyongjie.yyj@antgroup.com >
2025-10-14 13:07:50 +00:00
Jee Jee Li
fdd32750f0
[CI/Build] Cleanup LoRA test ( #26752 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-14 12:06:35 +00:00
Vladislav Bronzov
c715ba3735
[Feature] Change vllm.py with pydantic validation ( #26726 )
...
Signed-off-by: Vladislav <vladislav.bronzov@gmail.com >
Signed-off-by: Vladislav Bronzov <58587565+VladOS95-cyber@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-14 12:00:54 +00:00
Cyrus Leung
9c4cb68339
[Chore] Remove SupportsV0Only interface and update supported models docs ( #26783 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-14 04:55:10 -07:00
Chauncey
780eb03d9b
[CI] Fix test_tool_id_kimi_k2 ( #26787 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-10-14 10:27:07 +00:00
Cyrus Leung
ef9676a1f1
[Doc] ruff format some Python examples ( #26767 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-14 03:21:53 -07:00
Harry Mellor
70b1b330e1
Don't allow typos to fix by default ( #26785 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-14 03:05:15 -07:00
Cyrus Leung
d1d063a588
[Chore] Use max_transformers_version for Qwen-VL test ( #26792 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-14 03:03:46 -07:00
Chendi.Xue
7e6edb1469
[NIXL][HeteroTP] Enable KV transfer from HND prefill to NHD decode ( #26556 )
...
Signed-off-by: Chendi Xue <chendi.xue@intel.com >
2025-10-14 09:46:05 +00:00
Cyrus Leung
74704d4553
[Model] Use merge_by_field_config for MM models (O-P) ( #26776 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-14 09:42:45 +00:00
Cyrus Leung
d2f816d6ff
[Bugfix] Standardize merging multimodal embeddings ( #26771 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-14 09:36:21 +00:00
wangxiyuan
577d498212
[Plugin] Make plugin group clear ( #26757 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-10-14 07:49:59 +00:00
Max Wittig
fd85c9f426
[Bugfix][FE]: Always include usage with --enable-force-include-usage ( #20983 )
...
Signed-off-by: Max Wittig <max.wittig@siemens.com >
Signed-off-by: Antoine Auger <antoineauger@users.noreply.github.com >
Co-authored-by: Antoine Auger <antoineauger@users.noreply.github.com >
2025-10-14 09:17:39 +02:00
Ye (Charlotte) Qi
d32c611f45
[CI/Build] Use 127.0.0.1 instead of localhost in utils ( #26750 )
...
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com >
2025-10-14 07:04:00 +00:00
CSWYF3634076
01ad27faff
[Model][Bugfix]fix ernie45 load failed due to ernie45 eplb code ( #26684 )
...
Signed-off-by: wangyafeng <wangyafeng@baidu.com >
2025-10-14 06:55:23 +00:00
Ryan Li
481545b397
scheduler.py: Update the name of the default scheduler. ( #26758 )
...
Signed-off-by: Ryan Li <ryanli@ryanli.org >
2025-10-14 06:52:21 +00:00
Alexei-V-Ivanov-AMD
d3cc8427c0
[ci] Adding the test-amd.yaml for test definitions for the AMD backend. (alternative PR) ( #26718 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
2025-10-13 23:10:23 -07:00
vllmellm
4821ac1b4d
[CI] [ROCm] Automate CC list for ROCm related issue ( #26753 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2025-10-14 13:57:26 +08:00
XiongfeiWei
4497c8f821
Fix lora tests failure in TPU CI due to the removal of LoRA bias ( #26723 )
...
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com >
2025-10-14 13:04:23 +08:00
Michael Yao
2e36cdbe2b
[Docs] Add a start tag to build.inc.md ( #26747 )
...
Signed-off-by: windsonsea <haifeng.yao@daocloud.io >
2025-10-13 21:51:55 -07:00
Maximilien de Bayser
fe3edb4cf0
Add support for the /rerank endpoint in vllm bench serve ( #26602 )
...
Signed-off-by: Max de Bayser <mbayser@br.ibm.com >
2025-10-14 04:25:43 +00:00
Heng Guo
29350922c6
[Feature][Quantization] auto_round format add support for regex ( #24024 )
...
Signed-off-by: n1ck-guo <heng.guo@intel.com >
Signed-off-by: Heng Guo <heng.guo@intel.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-14 03:03:16 +00:00
Varun Sundar Rabindranath
8ae169286f
[torch.compile] Unwrap fused_marlin_moe custom op ( #26739 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-10-14 02:22:16 +00:00
youkaichao
8a0af6a561
[build][torch.compile] upgrade depyf version ( #26702 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-10-14 10:12:09 +08:00
Jialin Ouyang
cfded80793
[Easy] Fix env type check errors from VLLM_DEBUG_LOG_API_SERVER_RESPONSE ( #26742 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-14 01:46:44 +00:00
Angela Yi
b59dd19b55
[compile] Enable sequence parallelism for full cuda graph without specifying compile sizes ( #26681 )
...
Signed-off-by: angelayi <yiangela7@gmail.com >
2025-10-13 18:15:34 -07:00
Michael Goin
3e051bda82
[UX] Replace VLLM_ALL2ALL_BACKEND with --all2all-backend ( #26732 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-13 18:12:52 -07:00
Lucia Fang
8317f72354
[Misc][DP] support customized aggregated logger for dp ( #24354 )
...
Signed-off-by: Lu Fang <fanglu@fb.com >
2025-10-13 17:45:59 -07:00
Maximilien de Bayser
d8bebb008a
Add tests for chunked prefill and prefix cache with causal pooling models ( #26526 )
...
Signed-off-by: Max de Bayser <mbayser@br.ibm.com >
Co-authored-by: Ayush Singh <ayush1009208@gmail.com >
2025-10-14 07:45:04 +08:00
Jialin Ouyang
35bc22f23c
[ResponseAPI] Further polish message serialization and unit tests ( #26728 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-13 23:31:35 +00:00
Fardin Hoque
fa96fb9c70
Pruning kernel Core Tests ( #26727 )
...
Signed-off-by: Fardin Hoque <kfhfar@amazon.com >
2025-10-13 23:08:18 +00:00
Morrison Turnansky
e3fdb627d9
[FrontEnd] UNREVERT CompilationConfig overhaul ( #20283 ): deprecate use_inductor in favor of backend, simplify custom_ops ( #26502 )
...
Signed-off-by: morrison-turnansky <mturnans@redhat.com >
Signed-off-by: Morrison Turnansky <mturnans@redhat.com >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com >
2025-10-13 22:47:16 +00:00
Wentao Ye
7200a21cd1
[Bug] Fix Assertion error DeepEP/csrc/kernels/intranode.cu:928: 'false and Unsupported type' ( #26532 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-13 18:26:37 -04:00
Fardin Hoque
577c72a227
[CI Perf]Prune Tests in kernel/mamba ( #26538 )
...
Signed-off-by: Fardin Hoque <kfhfar@amazon.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-10-13 18:22:31 -04:00
Wentao Ye
314285d4f2
[CI] Fix mypy for vllm/distributed ( #26593 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-13 16:02:24 -04:00
wang.yuqi
d2a7938582
[Frontend][1/N] Improve all pooling task | Support FP16 Embedding Base64 (Still uses fp32 by default). ( #26414 )
...
Signed-off-by: wang.yuqi <noooop@126.com >
Co-authored-by: Maximilien de Bayser <maxdebayser@gmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-10-13 19:06:43 +00:00
Alex Kogan
89342ce4c0
[Quantization] [Performance] Enable Marlin GEMM kernels for the calibration-free RTN-based quantization ( #26051 )
...
Signed-off-by: Alex Kogan <alex.kogan@oracle.com >
Signed-off-by: Alex Kogan <82225080+sakogan@users.noreply.github.com >
2025-10-13 18:52:54 +00:00
Yibo Cai
f89f599395
[CI][Release][Arm64]: Build arm64 release for gpu arch 8.9 ( #26698 )
2025-10-13 18:42:12 +00:00
Wentao Ye
e251e457c5
[Log] Optimize Startup Log ( #26601 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-14 02:06:57 +08:00
Cyrus Leung
afc47e4de7
[Model] Use merge_by_field_config for MM models (M-N) ( #26710 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-14 01:27:01 +08:00
Rahul Tuli
e3b90c1ba2
[Bugfix][Speculative Decoding] Extend Eagle quantization config fix to llama_eagle.py ( #26590 )
...
Signed-off-by: Rahul Tuli <rtuli@redhat.com >
2025-10-13 17:17:13 +00:00
haoyangli-amd
134f70b3ed
[Bugfix][Rocm] fix qr error when different inp shape ( #25892 )
...
Signed-off-by: Haoyang Li <lihaoyang0109@gmail.com >
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
Co-authored-by: ilmarkov <markovilya197@gmail.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
2025-10-13 10:04:21 -07:00
Sangyeon Cho
a1b2d658ee
[CI/Build] upgrade compressed-tensors to 0.12.2 to address LGPLv3 ( #26501 )
...
Signed-off-by: Sangyeon Cho <josang1204@gmail.com >
2025-10-13 12:58:33 -04:00
Aleksei Tsvetkov
5c7fe25491
[Misc] Separate prompt logging to debug ( #26713 )
...
Signed-off-by: Aleksei Tsvetkov <aitsvet@ya.ru >
2025-10-13 09:04:18 -07:00
Will Eaton
53c9a7cee2
[P/D] [NixlConnector] kv load recovery integration ( #26171 )
...
Signed-off-by: Will Eaton <weaton@redhat.com >
2025-10-13 08:48:04 -07:00
Michael Goin
0d21b9b51e
[UX] Speedup DeepGEMM warmup with heuristics ( #25619 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-10-13 07:59:27 -07:00
Anand Roy
10214b6935
[FEATURE]: Use pydantic validation in multimodal.py config ( #26629 )
...
Signed-off-by: Anand Roy <86306690+andycandy@users.noreply.github.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-13 07:56:59 -07:00
ihb2032
4a61950f4d
[Hardware][CPU] Disable torch.compile for RISC-V to prevent APIError ( #26693 )
...
Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn >
Signed-off-by: ihb2032 <1355790728@qq.com >
Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn
2025-10-13 07:56:01 -07:00
Bram Wasti
3263799056
[unrevert] Add batch invariant kernel override for FlashInfer backend [2/n] ( #26373 )
...
Signed-off-by: Bram Wasti <bwasti@meta.com >
Signed-off-by: Bram Wasti <bwasti@fb.com >
2025-10-13 10:24:53 -04:00
Isotr0py
8e67b2557a
[Bugfix] Fix out of bound index issue for Jina-embedding-v3 RoPE with cuda graph ( #26687 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-13 03:21:48 -07:00
Jialin Ouyang
4073c82c4e
[ResponseAPI] Simplify input/output message serialization ( #26620 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-13 09:59:15 +00:00
wang.yuqi
767c3ab869
[Model][0/N] Improve all pooling task | clean up ( #25817 )
...
Signed-off-by: wang.yuqi <noooop@126.com >
2025-10-13 16:44:50 +08:00
Harry Mellor
4f207c7174
Ignore large reformatting PRs in git blame ( #26690 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-13 01:20:47 -07:00
CSWYF3634076
782505ed8e
[Model] Add reasoning_parser and tool_parser for Ernie45 thinking ( #25027 )
...
Signed-off-by: wangyafeng <wangyafeng@baidu.com >
2025-10-13 15:55:20 +08:00
Jee Jee Li
98f30b8cba
[Model] Fix Skywork R1V mlp ( #26673 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-12 22:42:17 -07:00
yihong
3cd36660f7
docs: wrong command in structured_outputs README ( #26677 )
...
Signed-off-by: yihong0618 <zouzou0208@gmail.com >
2025-10-12 20:59:01 -07:00
yyzxw
46ad73955a
[FIX] Throwing an exception when the model does not support pool tasks ( #25840 ) ( #25855 )
...
Signed-off-by: zxw <1020938856@qq.com >
Co-authored-by: wang.yuqi <noooop@126.com >
2025-10-12 20:56:21 -07:00
quanliu
41f3884438
[Bugfix][Core]Fix block table out-of-range issue in priority scheduling ( #26661 )
...
Signed-off-by: quanliu <18646313696@163.com >
2025-10-13 01:25:42 +00:00
bnellnm
60e419c1ee
[Misc] cache result of disable_inplace ( #26666 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
2025-10-13 00:17:50 +00:00
Michael Goin
7ef6052804
[CI/Build] Add tool to build vllm-tpu wheel ( #19165 )
...
Signed-off-by: mgoin <michael@neuralmagic.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-12 16:25:40 -06:00
Huamin Li
4fca1a1bd2
[easy] fix pre commit error on trunk ( #26665 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com >
2025-10-12 21:25:34 +00:00
Lukas Geiger
a6049be73c
[Models][Qwen3VL] Speedup fast_pos_embed_interpolate ( #26647 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-10-13 01:20:07 +08:00
gjgjos
18ed7746ea
[Feature] Add support for naver/splade-v3 (BERT-based sparse embedding model) ( #26339 )
...
Signed-off-by: gjgjos <gjgjos@naver.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-10-12 17:00:52 +00:00
Harry Mellor
8fcaaf6a16
Update Optional[x] -> x | None and Union[x, y] to x | y ( #26633 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-12 09:51:31 -07:00
Chendi.Xue
9bb38130cb
[Bugfix] Fix GPU_ID issue in test script ( #26442 )
...
Signed-off-by: Chendi Xue <chendi.xue@intel.com >
2025-10-12 11:39:05 +00:00
Jaya Yuan
b91d8db873
[Bugfix][DCP] Set default CUDAGraphMode to PIECEWISE for DCP ( #26574 )
...
Signed-off-by: FENP <32334296+FENP@users.noreply.github.com >
2025-10-12 09:58:38 +00:00
Isotr0py
045b396d09
[Bugfix][CI/Build] Fix failing Mteb CI ( #26638 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-12 02:42:42 -07:00
wang.yuqi
76852017ea
[MISC] Rename the torch profiler filename as instance_id+rank_id for merging the Profiler results of each Rank ( #25867 )
...
Signed-off-by: wang.yuqi <noooop@126.com >
2025-10-12 09:29:08 +00:00
Vadim Gimpelson
82e64c7a20
[PERF] [Qwen3-next] Speed up gated RMSNorm ( #26207 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
Signed-off-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-12 08:27:50 +00:00
wang.yuqi
4ca204055e
Add @noooop to codeowner for pooling models ( #26652 )
...
Signed-off-by: wang.yuqi <noooop@126.com >
2025-10-12 14:04:44 +08:00
Haisheng Chen
c5c8f5ea59
[EPLB] Support ernie4.5-moe ( #22100 )
...
Signed-off-by: Haisheng Chen <langzs335@outlook.com >
Signed-off-by: Haisheng Chen <60504847+HsChen-sys@users.noreply.github.com >
Signed-off-by: Haisheng Chen <hac048@ucsd.edu >
Co-authored-by: Haisheng Chen <langzs335@outlook.com >
2025-10-12 10:40:47 +08:00
Angela Yi
01653a917b
[compile] Fix inductor partition config ( #26645 )
...
Signed-off-by: angelayi <yiangela7@gmail.com >
2025-10-11 21:03:14 +00:00
Huamin Li
0cd103e7cb
CP: make correct_attn_out robust to 4‑D views and fix Triton arg binding ( #26509 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com >
2025-10-11 20:50:57 +00:00
Cyrus Leung
5be7ca1b99
[Benchmark] Support Infinity API ( #26641 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-12 01:45:32 +08:00
Jee Jee Li
f0a30a067b
[Bugfix] Fix qwen-moe packed_modules_mapping ( #26634 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-11 15:21:33 +00:00
JJJYmmm
9d6cff3ede
[Bugfix][Qwen3VL] fix deepstack in qwen3vl ( #26626 )
...
Signed-off-by: liuye.hj <liuye.hj@alibaba-inc.com >
Signed-off-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com >
Co-authored-by: liuye.hj <liuye.hj@alibaba-inc.com >
2025-10-11 05:58:33 -07:00
Angela Yi
a25f2adee9
[compile] Add patched_fused_scaled_matmul_reduce_scatter ( #26604 )
...
Signed-off-by: angelayi <yiangela7@gmail.com >
2025-10-11 05:44:43 -07:00
Chauncey
d0bed837ac
[Refactor]Reduce duplicate code in serving_chat ( #26627 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-10-11 12:04:49 +00:00
muzian666
f7ee69868a
[CPU] fix the issue when the node is '-' cause json decode error. ( #26562 )
...
Signed-off-by: muzian666 <andylee_2001@163.com >
Co-authored-by: qingan.li <qingan.li@wizpresso.com >
2025-10-11 12:04:04 +00:00
Rahul Tuli
d2a71530c1
Add EAGLE-3 Speculative Decoding Support for Qwen3 MoE ( #26485 )
...
Signed-off-by: Rahul Tuli <rtuli@redhat.com >
2025-10-11 10:14:41 +00:00
ihb2032
086609de64
fix(nix): Allow local oneDNN path to fix vLLM CPU build failure ( #26401 )
...
Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn >
Signed-off-by: ihb2032 <1355790728@qq.com >
2025-10-11 09:12:16 +00:00
dsinghvi
727144bed1
[Refactor]: Use M-RoPE interface directly while defining model class instead of maintaining model specific M-RoPE implementation in mrope.py ( #24172 )
...
Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com >
Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: wwl2755 <wangwenlong2755@gmail.com >
2025-10-11 07:21:04 +00:00
sangho.lee
55392bc879
[Bugfix][Multi Modal] Fix incorrect Molmo image processing ( #26563 )
...
Signed-off-by: sanghol <sanghol@allenai.org >
2025-10-10 22:28:23 -07:00
Roger Wang
ddaff2938e
[MM] Move Qwen3Omni MRoPE impl to model file ( #26608 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-10 22:17:24 -07:00
liuzhenwei
27ed39a347
[XPU] Upgrade NIXL to remove CUDA dependency ( #26570 )
...
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com >
2025-10-11 05:15:23 +00:00
Nishidha Panpaliya
8f8474fbe3
[CI/Build] Fix ppc64le CPU build and tests ( #22443 )
...
Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com >
2025-10-11 13:04:42 +08:00
Chauncey
be067861c6
[Frontend] Improve the performance of is_reasoning_end ( #25735 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-10-11 10:43:39 +08:00
Nick Hill
5bc26c438d
[BugFix] Make penalties and bad_words work with async scheduling ( #26467 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-10 23:27:04 +00:00
Zhengxu Chen
eef921f45e
AOT Compilation for torch.compile (Bundled) ( #24274 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com >
2025-10-10 19:02:11 -04:00
Bram Wasti
e317414ce1
Cache the environment variable check for batch invariance ( #26510 )
...
Signed-off-by: Bram Wasti <bwasti@meta.com >
2025-10-10 22:47:34 +00:00
Nick Hill
949cb0170d
[BugFix] Fix async scheduling + request preemption ( #26385 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-10 20:29:57 +00:00
Vadim Gimpelson
e94cfd51da
[BUG] Qwen3-next MTP. Fix attn metadata build bug ( #26564 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2025-10-10 14:59:03 -04:00
Harry Mellor
7c12763b24
Fix some typing issues found by mypy==1.18.2 ( #26596 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-10 18:21:25 +00:00
Will Eaton
3b780a4bbb
Update CUDA architecture list in build pipeline for 12.9.1 wheels ( #26592 )
...
Signed-off-by: Will Eaton <wseaton@users.noreply.github.com >
2025-10-10 11:15:27 -07:00
Harry Mellor
30f78af147
Update pre-commit hook versions ( #26591 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-10 17:03:44 +00:00
Xiong Wang
19a9b169bf
Add Qwen3-Omni moe thinker ( #25550 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: Roger Wang <hey@rogerw.io >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Xiong Wang <feizi.wx@alibaba-inc.com >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-10 17:00:56 +00:00
Roberto L. Castro
96ad65b7fe
[Transform] [Quantization] Add QuTLASS support to vLLM ( #24440 )
...
Signed-off-by: LopezCastroRoberto <roberto.lopez.castro@udc.es >
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com >
Signed-off-by: Andrei Panferov <andrei@panferov.org >
Co-authored-by: Andrei Panferov <andrei@panferov.org >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-10-10 09:43:40 -07:00
Shane A
8d2b8c0ff2
[Model] Add FlexOlmo model implementation ( #24923 )
...
Signed-off-by: Shane A <shanea@allenai.org >
2025-10-10 09:43:15 -07:00
Lukas Geiger
b2155ed317
[Model][Qwen3VL] Compute cu_seqlens on CPU to remove ( #26496 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-10 09:42:17 -07:00
Chauncey
910abdbd08
[Bugfix] fixed top_logprobs: -1 does not appear to work as intended ( #26470 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-10-11 00:41:17 +08:00
baonudesifeizhai
cddce79fda
[torch.compile] Make inductor partition rules respect splitting_ops #25691 ( #25845 )
...
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com >
Signed-off-by: baonudesifeizhai <85092850+baonudesifeizhai@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-10-10 16:35:28 +00:00
Mark McLoughlin
e519281920
[Metrics] Add test for multi-modal cache stats logging ( #26588 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-10-10 16:00:50 +00:00
Elvir Crnčević
7b03584de8
Silu v2 ( #25074 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: elvircrn <elvircrn@gmail.com >
Signed-off-by: Elvir Crnčević <elvircrn@gmail.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com >
2025-10-10 15:19:53 +00:00
Sage Moore
ae9d0e7da5
[Bugfix] Make DP padding optional in coordinate_batch_across_dp ( #26375 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
2025-10-10 10:53:33 -04:00
Daniel Cámpora
0e67102d93
Added test_top_k_per_row to test-pipeline.yaml. ( #26569 )
...
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com >
2025-10-10 10:48:33 -04:00
Jason Li
f4ba2061cf
[BugFix][torch.compile] Fix fused_scaled_matmul_reduce_scatter signature for PyTorch 2.8 ( #26038 )
...
Signed-off-by: jasonlizhengjian <jasonlizhengjian@gmail.com >
Signed-off-by: <>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-10-10 07:42:13 -07:00
Chauncey
1e6848a65d
[CI] fix test_run_batch.py::test_completions - AssertionError ( #26578 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-10-10 22:16:28 +08:00
Andy Lo
67661375fa
[BugFix] Fix noop elimination edge case ( #26394 )
...
Signed-off-by: Andy Lo <andy@mistral.ai >
2025-10-10 13:33:04 +00:00
Lucas Kabela
213b64452a
[Bugfix] Convert untraceable GroupShape to list for AMD impl ( #26535 )
...
Signed-off-by: Lucas Kabela <lucaskabela@meta.com >
2025-10-10 13:32:29 +00:00
Mark McLoughlin
784c231151
[NIXL] Ignore abort on already-finished request ( #25067 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-10-10 12:21:56 +02:00
Chen Zhang
606b00e80f
[bugfix][DCP] fix block_size of hash in DCP prefix caching ( #26296 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-10-10 03:02:49 -07:00
Chauncey
720d3cd0f0
[CI] fix ruff format ( #26579 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-10-10 03:02:12 -07:00
Ashwin Phadke
ab196edefb
Remove LoRA bias support ( #25807 )
...
Signed-off-by: Ashwin Phadke <ashwinphadke12@rediffmail.com >
Signed-off-by: Ashwin Phadke <23502062+ashwin-phadke@users.noreply.github.com >
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-10 09:50:33 +00:00
Luis Tomas Bolivar
3ee202ea1e
[GPT-OSS] Add support for arrays at tool message content ( #25593 )
...
Signed-off-by: Luis Tomas Bolivar <ltomasbo@redhat.com >
2025-10-10 09:00:45 +00:00
Cyrus Leung
ad430a67ca
[Metrics] Log multi-modal cache stats and fix reset ( #26285 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-10 01:45:55 -07:00
Chen Zhang
6f0f570c43
[deepseek] kernel block size for UniformTypeKVCacheSpecs ( #26559 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-10-10 16:40:41 +08:00
Boyuan Feng
b545a0b207
fix test_simple_inductor_graph_partition ( #26522 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2025-10-10 06:39:19 +00:00
Lucas Wilkinson
29255cfc3b
[Spec-Decode] Support piecewise cudagraphs for Eagle head ( #25109 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com >
2025-10-10 01:20:31 -04:00
Ben Browning
da4455609d
[Chore]: One pythonic tool parser test uses the wrong parser ( #26515 )
...
Signed-off-by: Ben Browning <bbrownin@redhat.com >
2025-10-10 04:03:55 +00:00
Nick Hill
aafb99a4d4
[Core] Small simplification in GPUModelRunner._update_states() ( #26508 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-10 10:53:58 +08:00
Rui Qiao
757fa4a4da
[DP][ray] Support different VLLM_RAY_DP_PACK_STRATEGY ( #23849 )
...
Signed-off-by: Rui Qiao <ruisearch42@gmail.com >
2025-10-09 19:53:43 -07:00
Julien Denize
c6187f55f7
Refactor MistralTokenizer ( #26358 )
...
Signed-off-by: Julien Denize <julien.denize@mistral.ai >
2025-10-09 22:48:58 +00:00
Wentao Ye
8983e0216f
[CI] Fix Pre-commit Issue Cannot determine type of "rank" and "world_size" ( #26448 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-09 15:16:48 -07:00
Wentao Ye
1ee35382cb
[Bug] Fix modular_kernel: ZeroDivisionError: integer division or modulo by zero ( #26528 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-09 15:13:27 -07:00
Benjamin Chislett
6e783bc54b
[Bugfix] Fix CUDA graph selection bug in FlashInfer at high concurrency ( #26499 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
2025-10-09 17:12:34 -04:00
Michael Goin
c9d33c60dc
[UX] Add FlashInfer as default CUDA dependency ( #26443 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-10-09 14:10:02 -07:00
Nick Hill
2e54db4d2b
[Core] Remove unused prev_sampled_token_ids_invalid_indices input batch field ( #26514 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-09 20:22:14 +00:00
elvischenv
44f633dba1
[Flashinfer][gpt-oss] Support FP8-qkv Flashinfer TRTLLM Sinks Attention ( #25674 )
...
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com >
2025-10-09 16:13:39 -04:00
bnellnm
a462331e36
[Bugfix] Disable moe inplace for torch >= 2.9 ( #26497 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
2025-10-09 18:07:38 +00:00
roikoren755
4069db3f2e
[Bugfix] Enable padded FP4 quantization ( #25947 )
...
Signed-off-by: Roi Koren <roik@nvidia.com >
2025-10-09 10:59:41 -07:00
Sage Moore
0d37450eb7
[BUGFIX] Add cu_tokens_across_sp to DPMetadata ( #26457 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
2025-10-09 17:13:56 +00:00
bnellnm
47e66c24e2
[Model] Apply shared experts overlap optimization to all models with shared experts ( #26145 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
2025-10-09 11:31:04 -04:00
Ming Yang
3b736e1c38
[Attention][DCP] Support DCP with query length > 1 (MTP) with FA3 ( #25049 )
...
Signed-off-by: Ming Yang <minos.future@gmail.com >
2025-10-09 08:06:29 -07:00
Lukas Geiger
2c1c7dfb35
[Models][Qwen] Replace pad with cat for better performance ( #26486 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-10-09 14:51:26 +00:00
Harry Mellor
e246ad6f0c
Upgrade Pydantic to v2.12.0 and remove hack for Python 3.13 ( #26481 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-09 06:02:40 -07:00
Jiangyun Zhu
5728da11ea
Revert #26113 "[Frontend] CompilationConfig overhaul ( #20283 ): deprecate use_inductor in favor of backend, simplify custom_ops" ( #26472 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2025-10-09 05:43:55 -07:00
Simon Danielsson
92be3f3517
[Feature] Use pydantic validation in parallel.py config ( #26417 )
...
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-09 12:41:31 +00:00
Isotr0py
d1ddf340c8
[V0 deprecation] Remove QKVCrossParallelLinear implementation ( #26475 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-09 10:52:27 +00:00
Wenzheng Bi
ec10fd0abc
[Bugfix] Move current_platform import to avoid python import cache. ( #16601 )
...
Signed-off-by: iwzbi <wzbi@zju.edu.cn >
2025-10-09 10:46:19 +00:00
Lukas Geiger
0426e3c5e1
[Models][Qwen3VL] Optimise _validate_and_reshape_mm_tensor ( #26426 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-10-09 10:25:48 +00:00
Cyrus Leung
4bdf7ac593
[Bugfix] Fix SHM cache initialization ( #26427 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-09 02:48:04 -07:00
Cyrus Leung
dc7976dd9f
[Misc] Upgrade more code to Python 3.10 ( #26463 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-09 10:43:53 +01:00
Simon Danielsson
e4791438ed
[Feature] Use pydantic validation in lora.py and load.py configs ( #26413 )
...
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com >
2025-10-09 02:38:33 -07:00
youkaichao
e6e898f95d
[doc] add Volcengine as a compute sponsor ( #26477 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-10-09 17:11:47 +08:00
Nick Hill
ddcbc2f334
[Misc] Misc code simplifications ( #26450 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-09 02:10:06 -07:00
Jerry Zhang
a83ff278d6
[torchao] Add support for ModuleFqnToConfig using regex ( #26001 )
...
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com >
2025-10-09 08:32:32 +00:00
Rahul Tuli
cf4cd6c24f
Add: Support for multiple hidden layers in Eagle3 ( #26164 )
...
Signed-off-by: Rahul Tuli <rtuli@redhat.com >
2025-10-09 07:30:50 +00:00
Harry Mellor
b960441812
Enable RMSNorm substitution for Transformers backend ( #26353 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-09 07:28:51 +00:00
Luciano Martins
1317028aa8
[Model] Gemma3: Fix GGUF loading and quantization ( #26189 )
...
Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-09 07:00:53 +00:00
elvischenv
5e49c3e777
Bump Flashinfer to v0.4.0 ( #26326 )
...
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com >
2025-10-08 23:58:44 -07:00
pwschuurman
0d7c3cb51d
Update Dockerfile and install runai-model-streamer[gcs] package ( #26464 )
...
Signed-off-by: Peter Schuurman <psch@google.com >
2025-10-08 23:48:51 -07:00
Jee Jee Li
1b2c440cd6
[Core] Relax the LoRA max rank ( #26461 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-08 23:47:14 -07:00
Cyrus Leung
0f29dca988
[CI/Build] Fix model nightly tests ( #26466 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-08 23:44:16 -07:00
Zhiyuan Li
d24cf322e1
[Hybrid]: Decouple Kernel Block Size from KV Page Size ( #24486 )
...
Signed-off-by: lizhiyuan <uniartisan2017@gmail.com >
Signed-off-by: Zhiyuan Li <uniartisan2017@gmail.com >
2025-10-08 23:43:39 -07:00
Qier Li
d17f0fbf30
[Core][KVConnector] Propagate all tokens on resumed preemptions ( #24926 )
...
Signed-off-by: Qier Li <kevin44036@gmail.com >
Co-authored-by: Qier Li <qier@fb.com >
2025-10-09 14:43:31 +08:00
Wenlong Wang
43ab8cfaa5
[MM][Doc] Add documentation for configurable mm profiling ( #26200 )
...
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com >
2025-10-08 23:21:20 -07:00
Matt
de253d63b7
[Hardware][AMD] Enable FlexAttention backend on ROCm ( #26439 )
...
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com >
2025-10-09 06:20:18 +00:00
Huy Do
8bd696fa53
[Bugfix] Incorrect another MM data format in vllm bench throughput ( #26462 )
...
Signed-off-by: Huy Do <huydhn@gmail.com >
2025-10-09 05:58:46 +00:00
Nick Hill
bb6d8c21f9
[Bugfix] Catch and log invalid token ids in detokenizer #2 ( #26445 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-08 21:20:25 -07:00
Zhuohan Li
ebf6ef1a9b
[Minor] Change warning->warning_once in preprocess ( #26455 )
...
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com >
2025-10-08 21:09:06 -07:00
Jee Jee Li
0c52d6ef81
[Bugfix] Set the minimum python version for gpt-oss ( #26392 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-08 20:35:49 -07:00
Rui Qiao
467a4f98f1
[Misc] Redact ray runtime env before logging ( #26302 )
...
Signed-off-by: Rui Qiao <ruisearch42@gmail.com >
2025-10-08 17:43:34 -07:00
Naveenraj Kamalakannan
e614ab7806
Separate MLAAttention class from Attention ( #25103 )
...
Signed-off-by: Naveenraj Kamalakannan <therealnaveenkamal@gmail.com >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-10-08 17:11:11 -07:00
Matthew Bonanni
2a03f93de9
[Attention] Register FLASHMLA_SPARSE ( #26441 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-10-08 22:28:52 +00:00
bnellnm
da364615fc
[Kernels] Modular kernel refactor ( #24812 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
2025-10-08 17:51:52 -04:00
Elaine Zhao
f08919b7d1
[Bugfix] Respect min_tokens in scheduler stop check ( #26317 )
...
Signed-off-by: Elaine Zhao <elaineyz@amazon.com >
2025-10-08 14:08:24 -07:00
Lukas Geiger
93f2c0aa08
[Models] Improve iteration over layers ( #26425 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-10-08 20:48:33 +00:00
Nicolò Lucchesi
4ebc9108a7
[Kernel] Centralize platform kernel import in current_platform.import_kernels ( #26286 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-10-08 20:25:31 +00:00
Morrison Turnansky
e1ba235668
[BugFix] Fix failing test quantization/test_compressed_tensors.py::test_compressed_tensors_fp8_block_enabled ( #26436 )
...
Signed-off-by: morrison-turnansky <mturnans@redhat.com >
2025-10-08 20:04:12 +00:00
elvischenv
b82f4307c9
[Bugfix][Flashinfer] fix VLLM_USE_TRTLLM_ATTENTION issue for models with diff hyperparameters ( #25924 )
...
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com >
2025-10-08 19:54:48 +00:00
Matthew Bonanni
76879cc160
[Attention] Implement universal BACKEND_MAP ( #25900 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-10-08 12:00:25 -07:00
Vinay R Damodaran
b25d7b5657
[Feature] Change cache.py with pydantic validation ( #26390 )
...
Signed-off-by: Vinay Damodaran <vrdn@hey.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-08 11:12:59 -07:00
Harry Mellor
e09d1753ec
Remove Python 3.9 support ahead of PyTorch 2.9 in v0.11.1 ( #26416 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-08 10:40:42 -07:00
Wentao Ye
4ba8875749
[Bug] Fix Test in Batch Invariant ( #26128 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-08 10:13:47 -07:00
Lukas Geiger
6273fe8d3d
[Benchmarks] Fix imports in FP8 tuning script ( #26407 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-10-08 16:31:59 +00:00
Wentao Ye
9fb3ae4e6f
[Bug] Fix DeepGEMM Attention Test ( #26423 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-08 12:23:41 -04:00
Aydin Abiar
76afe4edf8
[Bugfix] Fix vllm bench ... on CPU-only head nodes ( #25283 )
...
Signed-off-by: Aydin Abiar <aydin@anyscale.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Aydin Abiar <aydin@anyscale.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-10-08 16:06:42 +00:00
Michael Goin
c1b06fc182
[CI Failure] Fix pre-commit issue for install_nixl_from_source_ubuntu.py ( #26424 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-08 07:55:43 -07:00
Wentao Ye
241b4cfe66
[Refactor] Refactor FP8 & INT8 Quant Folder inside w8a8 ( #25293 )
...
Signed-off-by: nicole-lihui <nicole.li@daocloud.io >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: courage17340 <courage17340@163.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: Jacob Kahn <jacobkahn1@gmail.com >
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: zxw <1020938856@qq.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: wang.yuqi <noooop@126.com >
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
Signed-off-by: chenlang <chen.lang5@zte.com.cn >
Signed-off-by: youkaichao <youkaichao@gmail.com >
Signed-off-by: Jonas Kuebler <kuebj@amazon.com >
Signed-off-by: jiang1.li <jiang1.li@intel.com >
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Signed-off-by: NickLucche <nlucches@redhat.com >
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
Signed-off-by: AlonKejzman <alonkeizman@gmail.com >
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: taohui <taohui3@gmail.com >
Signed-off-by: Tao Hui <taohui3@gmail.com >
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com >
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com >
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com >
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com >
Signed-off-by: Shu Wang. <shuw@nvidia.com >
Signed-off-by: Nick Hill <nhill@redhat.com >
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com >
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com >
Signed-off-by: Eugene Khvedchenya <ekhvedchenya@gmail.com >
Signed-off-by: yiting.jiang <yiting.jiang@daocloud.io >
Signed-off-by: Andrew Sansom <andrew@protopia.ai >
Signed-off-by: xaguilar <Xavier.AguilarFruto@amd.com >
Signed-off-by: Iceber Gu <caiwei95@hotmail.com >
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com >
Signed-off-by: Icey <1790571317@qq.com >
Signed-off-by: Sage Moore <sage@neuralmagic.com >
Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com >
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com >
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
Signed-off-by: Seiji Eicher <seiji@anyscale.com >
Signed-off-by: Seiji Eicher <58963096+eicherseiji@users.noreply.github.com >
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
Signed-off-by: Kosseila (CloudThrill) <klouddude@gmail.com >
Signed-off-by: frankwang28 <frank.wbb@hotmail.com >
Signed-off-by: Frank Wang <41319051+frankwang28@users.noreply.github.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com >
Signed-off-by: zixi-qi <qizixi@meta.com >
Signed-off-by: Bram Wasti <bwasti@meta.com >
Signed-off-by: Naman Lalit <nl2688@nyu.edu >
Signed-off-by: Chenheli Hua <huachenheli@outlook.com >
Signed-off-by: Junhong <liujunhong11@huawei.com >
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com >
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
Signed-off-by: rentianyue-jk <rentianyue-jk@360shuke.com >
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io >
Signed-off-by: Patrick Toulme <ptoulme@meta.com >
Signed-off-by: Patrick Toulme <pctoulme+1@gmail.com >
Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com >
Signed-off-by: Clayton Coleman <smarterclayton@gmail.com >
Signed-off-by: Jialin Ouyang <jialino@meta.com >
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
Signed-off-by: Weiliang Liu <weiliangl@nvidia.com >
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com >
Signed-off-by: liuye.hj <liuye.hj@alibaba-inc.com >
Signed-off-by: Juechen Liu <jueliu@meta.com >
Signed-off-by: simon-mo <simon.mo@hey.com >
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
Signed-off-by: isotr0py <2037008807@qq.com >
Signed-off-by: yingjun-mou <renzomou@gmail.com >
Signed-off-by: zhoukz <me@zhoukz.com >
Signed-off-by: Chenxi Yang <cxyang@fb.com >
Signed-off-by: Rahul Tuli <rtuli@redhat.com >
Signed-off-by: Lee Nau <lnau@nvidia.com >
Signed-off-by: adabeyta <aabeyta@redhat.com >
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com >
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com >
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com >
Signed-off-by: Lucia Fang <fanglu@meta.com >
Signed-off-by: a120092009 <zhaoty0121@gmail.com >
Signed-off-by: sergiopaniego <sergiopaniegoblanco@gmail.com >
Signed-off-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com >
Signed-off-by: wangyafeng <wangyafeng@baidu.com >
Signed-off-by: Lehua Ding <lehuading@tencent.com >
Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn >
Signed-off-by: ihb2032 <1355790728@qq.com >
Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com >
Signed-off-by: anion <1005128408@qq.com >
Signed-off-by: Anion <123177548+Anionex@users.noreply.github.com >
Signed-off-by: Pavani Majety <pmajety@nvidia.com >
Signed-off-by: Bill Nell <bnell@redhat.com >
Signed-off-by: bnellnm <49004751+bnellnm@users.noreply.github.com >
Signed-off-by: Or Ozeri <oro@il.ibm.com >
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com >
Signed-off-by: David Ben-David <davidb@pliops.com >
Signed-off-by: Andrew Xia <axia@meta.com >
Signed-off-by: Andrew Xia <axia@fb.com >
Signed-off-by: Lu Fang <fanglu@fb.com >
Signed-off-by: Salvatore Cena <cena@cenas.it >
Signed-off-by: padg9912 <phone.and.desktop@gmail.com >
Signed-off-by: nadathurv <work.vnadathur@gmail.com >
Signed-off-by: WorldExplored <srreyansh.sethi@gmail.com >
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com >
Signed-off-by: billishyahao <bill.he@amd.com >
Signed-off-by: Nathan Scott <nathans@redhat.com >
Signed-off-by: Kenichi Maehashi <maehashi@preferred.jp >
Signed-off-by: Johnny <johnnynuca14@gmail.com >
Signed-off-by: johnnynunez <johnnynuca14@gmail.com >
Signed-off-by: Johnny <johnnync13@gmail.com >
Signed-off-by: Huamin Li <3ericli@gmail.com >
Signed-off-by: Hosang Yoon <hosang.yoon@amd.com >
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com >
Signed-off-by: Peter Schuurman <psch@google.com >
Signed-off-by: Huy Do <huydhn@gmail.com >
Signed-off-by: leo-pony <nengjunma@outlook.com >
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
Signed-off-by: ElizaWszola <ewszola@redhat.com >
Signed-off-by: ElizaWszola <elizaw.9289@gmail.com >
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
Signed-off-by: zhewenli <zhewenli@meta.com >
Signed-off-by: ahao-anyscale <ahao@anyscale.com >
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Signed-off-by: huijjj <huijong.jeong@squeezebits.com >
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com >
Signed-off-by: kyt <eluban4532@gmail.com >
Signed-off-by: Egor <e.a.krivov@gmail.com >
Signed-off-by: Yang <lymailforjob@gmail.com >
Signed-off-by: Paul Pak <paulpak58@gmail.com >
Signed-off-by: whx-sjtu <2952154980@qq.com >
Signed-off-by: Xiang Si <sixiang@google.com >
Signed-off-by: Aleksandr Samarin <astrlrd@nebius.com >
Signed-off-by: Jun Jiang <jasl9187@hotmail.com >
Signed-off-by: Chendi Xue <Chendi.Xue@intel.com >
Signed-off-by: Chendi.Xue <chendi.xue@intel.com >
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com >
Co-authored-by: Nicole LiHui 🥜 <nicolelihui@outlook.com >
Co-authored-by: courage17340 <courage17340@users.noreply.github.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
Co-authored-by: Jacob Kahn <jacobkahn1@gmail.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Nicole LiHui 🥜 <nicole.li@daocloud.io >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
Co-authored-by: Fadi Arafeh <115173828+fadara01@users.noreply.github.com >
Co-authored-by: Agata Dobrzyniewicz <160237065+adobrzyn@users.noreply.github.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: yyzxw <34639446+yyzxw@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: wang.yuqi <noooop@126.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com >
Co-authored-by: chenlang <chen.lang5@zte.com.cn >
Co-authored-by: chenlang <10346245@zte.com.cn >
Co-authored-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: Jonas M. Kübler <44084297+jmkuebler@users.noreply.github.com >
Co-authored-by: Li, Jiang <jiang1.li@intel.com >
Co-authored-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
Co-authored-by: AlonKejzman <alonkeizman@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
Co-authored-by: Tao Hui <taohui3@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
Co-authored-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com >
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com >
Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com >
Co-authored-by: Shu Wang <shuw@nvidia.com >
Co-authored-by: Aleksandr Malyshev <164964928+maleksan85@users.noreply.github.com >
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com >
Co-authored-by: Doug Lehr <douglehr@amd.com >
Co-authored-by: Eugene Khvedchenya <ekhvedchenya@gmail.com >
Co-authored-by: yitingdc <59356937+yitingdc@users.noreply.github.com >
Co-authored-by: Andrew Sansom <andrew@protopia.ai >
Co-authored-by: xaguilar-amd <xavier.aguilarfruto@amd.com >
Co-authored-by: Iceber Gu <caiwei95@hotmail.com >
Co-authored-by: Tao He <linzhu.ht@alibaba-inc.com >
Co-authored-by: Icey <1790571317@qq.com >
Co-authored-by: Sage Moore <sage@neuralmagic.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
Co-authored-by: Xu Wenqing <121550081+Xu-Wenqing@users.noreply.github.com >
Co-authored-by: Chih-Chieh Yang <7364402+cyang49@users.noreply.github.com >
Co-authored-by: RishiAstra <40644327+RishiAstra@users.noreply.github.com >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
Co-authored-by: Seiji Eicher <58963096+eicherseiji@users.noreply.github.com >
Co-authored-by: Rui Qiao <161574667+ruisearch42@users.noreply.github.com >
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: 阿丹(adan) <47373076+LDLINGLINGLING@users.noreply.github.com >
Co-authored-by: liudan <adan@minicpm.com >
Co-authored-by: liudan <liudan@qq.com >
Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com >
Co-authored-by: Clouddude <kouss.hd@gmail.com >
Co-authored-by: Frank Wang <41319051+frankwang28@users.noreply.github.com >
Co-authored-by: fhl2000 <63384265+fhl2000@users.noreply.github.com >
Co-authored-by: qizixi <22851944+zixi-qi@users.noreply.github.com >
Co-authored-by: Bram Wasti <bwasti@fb.com >
Co-authored-by: Naman Lalit <nl2688@nyu.edu >
Co-authored-by: Chenheli Hua <huachenheli@outlook.com >
Co-authored-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com >
Co-authored-by: Junhong <liujunhong11@huawei.com >
Co-authored-by: LJH-LBJ <98734602+LJH-LBJ@users.noreply.github.com >
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com >
Co-authored-by: Xiaohan Zou <renovamenzxh@gmail.com >
Co-authored-by: rentianyue-jk <rentianyue-jk@360shuke.com >
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Co-authored-by: Peter Pan <peter.pan@daocloud.io >
Co-authored-by: Patrick C. Toulme <135739773+patrick-toulme@users.noreply.github.com >
Co-authored-by: Clayton Coleman <smarterclayton@gmail.com >
Co-authored-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
Co-authored-by: Jialin Ouyang <jialino@meta.com >
Co-authored-by: weiliang <weiliangl@nvidia.com >
Co-authored-by: Yuxuan Zhang <2448370773@qq.com >
Co-authored-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com >
Co-authored-by: liuye.hj <liuye.hj@alibaba-inc.com >
Co-authored-by: Juechen Liu <grinchcoder@gmail.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com >
Co-authored-by: Yingjun Mou <renzomou@gmail.com >
Co-authored-by: Zhou Jiahao <me@zhoukz.com >
Co-authored-by: Chenxi Yang <cxyang@cs.utexas.edu >
Co-authored-by: Chenxi Yang <cxyang@fb.com >
Co-authored-by: Rahul Tuli <rtuli@redhat.com >
Co-authored-by: Lee Nau <lee.nau@gmail.com >
Co-authored-by: Adrian Abeyta <aabeyta@redhat.com >
Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com >
Co-authored-by: Aaron Pham <contact@aarnphm.xyz >
Co-authored-by: acisseJZhong <40467976+acisseJZhong@users.noreply.github.com >
Co-authored-by: Simon Danielsson <70206058+simondanielsson@users.noreply.github.com >
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com >
Co-authored-by: Chen Zhang <zhangch99@outlook.com >
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com >
Co-authored-by: Lucia Fang <fanglu@meta.com >
Co-authored-by: Siyuan Fu <siyuanf@nvidia.com >
Co-authored-by: Xiaozhu Meng <mxz297@gmail.com >
Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com >
Co-authored-by: a120092009 <33205509+a120092009@users.noreply.github.com >
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com >
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com >
Co-authored-by: Lehua Ding <lehuading@tencent.com >
Co-authored-by: Reza Barazesh <3146276+rzabarazesh@users.noreply.github.com >
Co-authored-by: ihb2032 <40718643+ihb2032@users.noreply.github.com >
Co-authored-by: Asaf Joseph Gardin <39553475+Josephasafg@users.noreply.github.com >
Co-authored-by: Anion <123177548+Anionex@users.noreply.github.com >
Co-authored-by: Pavani Majety <pmajety@nvidia.com >
Co-authored-by: bnellnm <49004751+bnellnm@users.noreply.github.com >
Co-authored-by: Or Ozeri <oro@il.ibm.com >
Co-authored-by: cjackal <44624812+cjackal@users.noreply.github.com >
Co-authored-by: David Ben-David <sdavidbd@gmail.com >
Co-authored-by: David Ben-David <davidb@pliops.com >
Co-authored-by: Andrew Xia <axia@mit.edu >
Co-authored-by: Andrew Xia <axia@fb.com >
Co-authored-by: Salvatore Cena <cena@cenas.it >
Co-authored-by: Param <psch@cs.unc.edu >
Co-authored-by: Zhewen Li <zhewenli@meta.com >
Co-authored-by: nadathurv <work.vnadathur@gmail.com >
Co-authored-by: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com >
Co-authored-by: Wenlong Wang <wangwenlong2755@gmail.com >
Co-authored-by: billishyahao <bill.he@amd.com >
Co-authored-by: Nathan Scott <natoscott@users.noreply.github.com >
Co-authored-by: Kenichi Maehashi <939877+kmaehashi@users.noreply.github.com >
Co-authored-by: Johnny <johnnync13@gmail.com >
Co-authored-by: Aidyn-A <31858918+Aidyn-A@users.noreply.github.com >
Co-authored-by: Huamin Li <3ericli@gmail.com >
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com >
Co-authored-by: Hosang <156028780+hyoon1@users.noreply.github.com >
Co-authored-by: Jerry Zhang <jerryzh168@gmail.com >
Co-authored-by: pwschuurman <psch@google.com >
Co-authored-by: Huy Do <huydhn@gmail.com >
Co-authored-by: leo-pony <nengjunma@outlook.com >
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com >
Co-authored-by: ElizaWszola <ewszola@redhat.com >
Co-authored-by: Luka Govedič <lgovedic@redhat.com >
Co-authored-by: Benjamin Chislett <bchislett@nvidia.com >
Co-authored-by: Andrew Xia <axia@meta.com >
Co-authored-by: Simon Mo <simon.mo@hey.com >
Co-authored-by: TJian <tunjian.tan@embeddedllm.com >
Co-authored-by: ahao-anyscale <ahao@anyscale.com >
Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Liu-congo <1502632128@qq.com >
Co-authored-by: HUIJONG JEONG <64083281+huijjj@users.noreply.github.com >
Co-authored-by: Yannick Schnider <Yannick.Schnider1@ibm.com >
Co-authored-by: kyt <eluban4532@gmail.com >
Co-authored-by: Egor <e.a.krivov@gmail.com >
Co-authored-by: Yang Liu <127183760+KKSK-DON@users.noreply.github.com >
Co-authored-by: Paul Pak <52512091+paulpak58@users.noreply.github.com >
Co-authored-by: whx <56632993+whx-sjtu@users.noreply.github.com >
Co-authored-by: Xiang Si <sixiang@google.com >
Co-authored-by: Aleksandr Samarin <samarin_ad@mail.ru >
Co-authored-by: Jun Jiang <jasl9187@hotmail.com >
Co-authored-by: Chendi.Xue <chendi.xue@intel.com >
Co-authored-by: Nikhil G <nrghosh@users.noreply.github.com >
2025-10-08 10:20:48 -04:00
Chendi.Xue
9fc983c707
[NIXL][non-cuda] Add install script for nixl with non-cuda ucx ( #25959 )
...
Signed-off-by: Chendi Xue <Chendi.Xue@intel.com >
2025-10-08 14:19:53 +00:00
Harry Mellor
2f99f2f506
Tidy vllm/config/__init__.py to only add classes and functions ( #26405 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-08 07:10:00 -07:00
Lukas Geiger
338b1bf04f
[Benchmarks] Add support for Qwen 3 VL MoE tuning ( #26419 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-10-08 14:01:08 +00:00
wang.yuqi
e39dc46f8f
[CI] Pooling models mteb test disable enforce_eager ( #26408 )
...
Signed-off-by: wang.yuqi <noooop@126.com >
2025-10-08 12:15:36 +00:00
Harry Mellor
10c75b5439
[Docs] Have mergify leave a comment with the docs preview link ( #26412 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-08 12:04:00 +00:00
Eugene Khvedchenya
f9582fd8f4
[Model] Allow passing custom number of max tiles to Nano 2 VL ( #26403 )
...
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com >
2025-10-08 11:19:39 +00:00
Daniele
f377333bd7
[Misc] add usedforsecurity=False in md5 hash call ( #26357 )
...
Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com >
2025-10-08 10:18:32 +00:00
Wentao Ye
f8607863d8
[Feature] Enable E8M0 by Default on Hopper for DeepGEMM, 5% E2E throughput improvement ( #26197 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-08 15:33:56 +08:00
Utkarsh Sharma
335b28f7d1
[TPU] Rename tpu_commons to tpu_inference ( #26279 )
...
Signed-off-by: Utkarsh Sharma <utksharma@google.com >
Co-authored-by: Utkarsh Sharma <utksharma@google.com >
Co-authored-by: Chengji Yao <chengjiyao@google.com >
2025-10-07 23:30:52 -07:00
Ayush Satyam
5e65d6b2ad
fix[DP][v1]: Prevent hangs from mismatched worker configurations ( #26218 )
...
Signed-off-by: Ayush Satyam <ayushsatyam146@gmail.com >
2025-10-07 22:55:08 -07:00
Cyrus Leung
0d4f48fa10
[Bugfix] Incorrect MM data format in vllm bench throughput ( #26395 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-08 13:52:19 +08:00
Barry Kang
127c8b782a
Add gather_indexer_k_quant_cache kernel ( #25931 )
...
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com >
Signed-off-by: Simon Mo <simon.mo@hey.com >
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
Co-authored-by: Simon Mo <simon.mo@hey.com >
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com >
Co-authored-by: Chen Zhang <zhangch99@outlook.com >
2025-10-08 04:58:57 +00:00
Ayush Satyam
cd9890544b
fix(v1/kv_cache): resolve async KV transfer bug in cascade attention ( #23485 )
...
Signed-off-by: Ayush Satyam <ayushsatyam146@gmail.com >
2025-10-08 04:46:33 +00:00
Nick Hill
067da2d1df
[Core] Simplify setting new_token_ids in CachedRequestData ( #26388 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-08 03:32:37 +00:00
isharif168
046118b938
Add SwigluOAI implementation for CPUFusedMOE ( #26347 )
...
Signed-off-by: Sharif Inamdar <sharif.inamdar@arm.com >
2025-10-07 20:17:49 -06:00
liangel-02
b32260ab85
[torchao] safetensors integration ( #25969 )
...
Signed-off-by: Angel Li <liangel@meta.com >
2025-10-07 20:12:35 -06:00
Lucas Wilkinson
f80e7866c0
[Misc] Clean up cruft from previous FlashMLA sparse implementation ( #26125 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-10-08 10:09:34 +08:00
Thomas Parnell
31a4b3e6c4
Revert #24446 and #26168 ( #26332 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
2025-10-07 16:38:19 -06:00
Benjamin Chislett
caf8b1c084
[Bugfix] Fix MTP+FlashInfer crash when trtllm kernels are available but disabled ( #26361 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-07 22:12:26 +00:00
Michael Goin
1b86bd8e18
Add more libraries to rlhf.md ( #26374 )
...
Signed-off-by: Michael Goin <mgoin64@gmail.com >
2025-10-07 20:59:41 +00:00
Johnny Yang
59012df99b
[TPU] update TPU benchmark threshold ( #25713 )
...
Signed-off-by: Johnny Yang <johnnyyang@google.com >
2025-10-07 13:53:09 -07:00
Benjamin Chislett
3d1f67616d
[Spec Decode] Enable efficient speculative decoding with FlashInfer-MLA ( #25984 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
2025-10-07 16:05:59 -04:00
Sergei Skvortsov
6ebaf43ee4
[V1] Logit processors for rejection sampler ( #19482 )
...
Signed-off-by: southfreebird <yvorott@gmail.com >
Signed-off-by: Sergei Skvortsov <sergeyskv@nebius.com >
Signed-off-by: Sergei Skvortsov <yvorott@gmail.com >
Co-authored-by: Sergei Skvortsov <sergeyskv@nebius.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-10-07 13:02:49 -07:00
Morrison Turnansky
0c824fc46f
[Frontend] CompilationConfig overhaul ( #20283 ): deprecate use_inductor in favor of backend, simplify custom_ops ( #26113 )
...
Signed-off-by: morrison-turnansky <mturnans@redhat.com >
Signed-off-by: Morrison Turnansky <mturnans@redhat.com >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com >
2025-10-07 12:53:43 -07:00
Pei-Lun Liao
eb577e4655
[Bugfix] Add missing sink tensor into flash attn cascade attn implementation ( #26325 )
2025-10-07 18:56:39 +00:00
Wentao Ye
8f36850f73
[Bug] Fix Shape Validation for Fallback while Enabling E8M0 for DeepGEMM ( #26322 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-07 13:50:30 -04:00
Chen Zhang
29fd2662ba
[deepseek] add EP8 FusedMOE config for H200 and B200 ( #26331 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-10-07 10:38:54 -07:00
Michael Goin
30a3e5af69
[CI] Add Qwen3 MoE NVFP4 to Blackwell lm-eval ( #26316 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-07 10:36:15 -07:00
fxmarty-amd
a38c1bfe09
[ci] Rename test_mxfp4_moe.py to test_ocp_mx_moe.py ( #26364 )
...
Signed-off-by: Felix Marty <Felix.Marty@amd.com >
2025-10-07 09:52:24 -07:00
Paul Pak
320feae6f5
[Model] Lfm2Moe ( #26344 )
...
Signed-off-by: Paul Pak <paulpak58@gmail.com >
2025-10-07 16:03:05 +00:00
Cyrus Leung
1e4ecca1d0
[V0 Deprecation] Remove VLLM_USE_V1 from tests ( #26341 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-07 15:42:31 +00:00
Cyrus Leung
c0a7b89d8e
[Misc] Move LRUCache into its own file ( #26342 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-07 15:08:40 +00:00
antrec
6f59beaf0b
[Model] Add support for ModernBertForTokenClassification ( #26340 )
...
Signed-off-by: Antoine Recanati Le Goat <antoine.recanati@sancare.fr >
Signed-off-by: antrec <antoine.recanati@gmail.com >
Co-authored-by: Antoine Recanati Le Goat <antoine.recanati@sancare.fr >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-07 14:29:19 +00:00
fxmarty-amd
41f1cf38f2
[Feature][OCP MX] Support mxfp6 and mixed mxfp6-mxfp4 ( #21166 )
2025-10-07 09:35:26 -04:00
Isotr0py
08d26a1b7e
[Model] Use merge_by_field_config for MM models (Ovis family) ( #26308 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-07 12:54:22 +00:00
fhl2000
63773a6200
[Docs] add docs for cuda graph v1 ( #24374 )
...
Signed-off-by: fhl <2410591650@qq.com >
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-10-07 05:25:05 -07:00
Sergio Paniego Blanco
883b42896a
Add TRL example notebook to RLHF docs ( #26346 )
...
Signed-off-by: sergiopaniego <sergiopaniegoblanco@gmail.com >
2025-10-07 11:31:28 +00:00
Daniel Cámpora
e1098ced95
Add topk logits torch op for DS3.2. ( #25945 )
...
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com >
Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com >
Co-authored-by: youkaichao <youkaichao@gmail.com >
2025-10-07 10:07:32 +00:00
Grant Holmes (Ren)
d100d78eb3
Optimize KV cache distribution for asymmetric pipeline parallelism ( #25164 )
...
Signed-off-by: gholmes829 <g.holmes429@gmail.com >
2025-10-07 09:20:30 +00:00
Cyrus Leung
7e4cd070b0
[V0 Deprecation] Remove VLLM_USE_V1 from docs and scripts ( #26336 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-07 16:46:44 +08:00
Snehlata
46b0779996
[BugFix] Update KV block hash type from BlockHash to ExternalBlockHash in kv_events_subscriber - #26264 ( #26265 )
...
Signed-off-by: atalhens <sneh.lata@nutanix.com >
2025-10-07 08:42:28 +00:00
Ayush Satyam
de342585ff
[Model] Define merge_by_field_config MM interface (R-T) ( #26260 )
...
Signed-off-by: Ayush Satyam <ayushsatyam146@gmail.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-07 16:10:55 +08:00
Andrew Xia
185d8ed44f
[responsesAPI][bugfix] serialize harmony messages ( #26185 )
...
Signed-off-by: Andrew Xia <axia@meta.com >
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com >
2025-10-07 07:07:53 +00:00
Cyrus Leung
d9836d4517
[Deprecation] Deprecate LLM.set_tokenizer ( #26333 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-07 06:50:57 +00:00
Ayush Satyam
5f7e8a916a
[Model] Define merge_by_field_config MM interface (U-Z) ( #26261 )
...
Signed-off-by: Ayush Satyam <ayushsatyam146@gmail.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-07 06:45:49 +00:00
ahao-anyscale
4dbdf4a294
[BUG] Fix file parsing for load_format runai_streamer_sharded ( #26324 )
...
Signed-off-by: ahao-anyscale <ahao@anyscale.com >
2025-10-07 11:23:07 +08:00
Michael Goin
c6873c4e6d
[UX] Support nested dicts in hf_overrides ( #25727 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-07 11:19:16 +08:00
Sage Moore
2111b4643c
[Core] Simplify the Dp padding/should ubatch coordination logic ( #25768 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
2025-10-07 01:57:49 +00:00
Sage Moore
c50901f3b9
[Docs][DBO] Add initial doc that describes the DBO implementation ( #26024 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
2025-10-07 00:47:28 +00:00
Simon Mo
8229280a9c
[Misc] Define EP kernel arch list in Dockerfile ( #25635 )
...
Signed-off-by: Simon Mo <simon.mo@hey.com >
2025-10-07 00:05:33 +00:00
Benjamin Chislett
f77df94647
[Perf] Add decode full-graph support to FlashInfer-MLA backend ( #26313 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
2025-10-06 23:03:49 +00:00
Gregory Shtrasberg
f231e5bc21
[ROCm] Split AITER unified attention into its own backend ( #25507 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-10-06 22:49:23 +00:00
Benjamin Chislett
2161efe978
[Bugfix] Allow skipping MoE in NVFP4 (fix for MTP) ( #25987 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
2025-10-06 16:16:30 -04:00
Varun Sundar Rabindranath
f23b4c04fd
[BugFix] Pad input buffers in _dummy_run ( #26209 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-10-06 16:07:51 -04:00
Varun Sundar Rabindranath
93540958b8
[Docs] Fix broken table in moe_kernel_features doc ( #26314 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-10-06 15:58:05 -04:00
Cyrus Leung
44b9af5bb2
[Benchmark] Enable MM Embedding benchmarks ( #26310 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-06 19:51:58 +00:00
Raushan Turganbay
7cd95dc8a3
[Bugfix] Fix gemma3 with transformers backend ( #23178 )
...
Signed-off-by: raushan <raushan@huggingface.co >
Signed-off-by: Raushan Turganbay <raushan@huggingface.co >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-06 18:42:32 +00:00
Crefeda Rodrigues
c02058c222
Add bias handling to CPUFusedMOE kernel ( #26289 )
...
Signed-off-by: Crefeda Rodrigues <crefeda.rodrigues@arm.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: Crefeda Rodrigues <65665931+cfRod@users.noreply.github.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Sharif Inamdar <Sharif.Inamdar@arm.com >
Co-authored-by: Isotr0py <2037008807@qq.com >
2025-10-06 18:39:10 +00:00
7mile
b2ea5ba677
[Bugfix][Spec Decode] Fix wrong valid_mask for padded speculation when chunked prefill occurs ( #26231 )
...
Signed-off-by: seven-mile <i@7li.moe >
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
Co-authored-by: Benjamin Chislett <bchislett@nvidia.com >
2025-10-06 18:24:22 +00:00
Karan Goel
824a3f403f
[Misc] auto_tune: kill specific vllm process ( #26304 )
...
Signed-off-by: Karan Goel <karangoel@google.com >
2025-10-06 18:02:51 +00:00
Rahul Tuli
05f6846ede
Support llama3 eagle3 head with llama4 verifier ( #25961 )
...
Signed-off-by: rahul-tuli <rtuli@redhat.com >
Signed-off-by: Rahul Tuli <rtuli@redhat.com >
2025-10-06 13:56:08 -04:00
Michael Goin
20db99cc69
[CI Bugfix] Make sure TRTLLM attention is available in test_blackwell_moe ( #26188 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-06 13:50:11 -04:00
Yannick Schnider
6431be808f
[Tests] conftest: Extending VllmRunner and HfRunner to accept token_ids as input ( #26295 )
...
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com >
Signed-off-by: Yannick Schnider <Yannick.Schnider1@ibm.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-06 17:19:34 +00:00
Matthew Bonanni
4727a8afa7
[Attention] Remove unused reorder_batch method ( #24463 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-10-06 13:13:39 -04:00
tomeras91
b8f603cebe
[Model] EVS support for nano_nemotron_vl ( #26269 )
...
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com >
Signed-off-by: tomeras91 <57313761+tomeras91@users.noreply.github.com >
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com >
2025-10-07 00:23:37 +08:00
Chatcharin Sangbutsarakum
fc679696f8
Fix DotsOCR tensor type ( #26281 )
...
Signed-off-by: what_in_the_nim <chatcharinsang@gmail.com >
2025-10-06 12:23:43 +00:00
Raushan Turganbay
ab5e7d93f4
[Bugfix] Fix mrope in Transformers Backend ( #26087 )
...
Signed-off-by: raushan <raushan@huggingface.co >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-06 11:40:50 +00:00
Harry Mellor
0340f45553
Support expert parallel load balancing in Transformers backend ( #26287 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-06 11:20:16 +00:00
Cyrus Leung
19a00eb210
[Model] Use merge_by_field_config for MM models (Llava family) ( #26280 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-06 09:45:26 +00:00
Cyrus Leung
391612e78b
[Frontend] Consolidate tokenizer init code ( #26276 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-06 09:34:52 +00:00
abhisheksheth28
77c95f72f7
[Doc] add KAITO to integrations ( #25521 )
...
Signed-off-by: "Abhishek Sheth" <absheth@microsoft.com >
2025-10-06 17:30:03 +08:00
Aritra Roy Gosthipaty
59f30d0448
[Docs] Edit HF Inference Endpoints documentation ( #26275 )
...
Signed-off-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com >
Signed-off-by: ariG23498 <aritra.born2fly@gmail.com >
2025-10-06 10:13:09 +01:00
Roger Wang
43c146ca42
[Misc] Clean up unnecessary E501 ignore ( #26274 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2025-10-06 07:29:18 +00:00
Yasmin Moslem
7c2ec0fe87
[Benchmarking] Add disable_shuffle option for dataset loading ( #26258 )
...
Signed-off-by: Yasmin Moslem <48152713+ymoslem@users.noreply.github.com >
2025-10-06 07:05:44 +00:00
dependabot[bot]
039b6bade3
Bump actions/stale from 10.0.0 to 10.1.0 ( #26272 )
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-06 07:01:21 +00:00
Harry Mellor
6c04638214
Fix per file ruff ignores related to line length ( #26262 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-06 05:12:40 +00:00
wuhang
91ac7f764d
[CI][gpt-oss] Enable python tool tests in CI ( #24315 )
...
Signed-off-by: wuhang <wuhang6@huawei.com >
2025-10-06 04:20:06 +00:00
Chen Zhang
4be7d7c1c9
[MISC] Add heheda12345 to CODEOWNERS of vllm/config/cache.py ( #26270 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-10-06 10:58:59 +08:00
orangeng
59b477645c
[Doc] Edited minor typo ( #26266 )
...
Signed-off-by: Orange Ng <ngquanhao@outlook.com >
2025-10-05 19:53:09 -07:00
Thomas Parnell
778f554157
[V1] [Hybrid] Some additional clean-up in Mamba2 prefix caching ( #26222 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
2025-10-06 10:40:30 +08:00
Thomas Parnell
d3c84297c3
[CI] Add comment about the single cudagraph capture size that is used ( #26252 )
2025-10-06 02:35:37 +00:00
Elieser Pereira
f509a20846
[DOC] Update production-stack.md ( #26177 )
...
Signed-off-by: Elieser Pereira <elieser.pereiraa@gmail.com >
2025-10-05 21:32:48 +00:00
Michael Goin
60bc25e74c
[CI] Add Blackwell LM Eval Small Models test to nightly ( #26052 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-05 14:59:50 -06:00
Harry Mellor
b893d661b1
Fix per file ruff ignores related to simplification ( #26259 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-05 20:31:53 +00:00
Jason Li
6b6e98775f
[NVIDIA] flashinfer TRTLLM attention prefill token limit ( #25998 )
...
Signed-off-by: jasonlizhengjian <jason.li@centml.ai >
Signed-off-by: jasonlizhengjian <jasonlizhengjian@gmail.com >
2025-10-05 14:24:37 -06:00
Jiangyun Zhu
9c3c21c519
[CI] fix mamba kernel test ( #26250 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2025-10-05 18:26:59 +00:00
Harry Mellor
512b8affa4
Update ruff pre-commit hooks version ( #26255 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-10-05 09:50:50 -07:00
Harry Mellor
1c0c68202c
Fix per file ruff ignores related to typing ( #26254 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-05 16:37:55 +00:00
ihb2032
5f317530ec
fix(tests): Resolve late binding of loop variable in assert message lambda ( #26249 )
...
Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn >
Signed-off-by: ihb2032 <1355790728@qq.com >
2025-10-05 09:18:22 -07:00
Harry Mellor
557b2e961d
Remove all cases of fmt: on/off ( #26253 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-05 09:18:14 -07:00
Harry Mellor
4e256cadc2
Remove all references to yapf as it's no longer used ( #26251 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-05 09:18:11 -07:00
Harry Mellor
d6953beb91
Convert formatting to use ruff instead of yapf + isort ( #26247 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-05 07:06:22 -07:00
Hank_
17edd8a807
[Platform][Kernel] platform-specific kernel loading ( #25823 )
...
Signed-off-by: Hank <hcc.mayday@gmail.com >
2025-10-05 13:25:15 +02:00
ihb2032
3303cfb4ac
[Bugfix][Hardware][RISC-V] Limit supported dtypes to float32 to avoid scheduler segfault ( #26228 )
...
Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn >
Signed-off-by: ihb2032 <1355790728@qq.com >
2025-10-05 10:36:54 +00:00
Cyrus Leung
b7e8e4e6be
[Bugfix] Always apply MM processor even when no MM items are passed ( #26240 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-05 10:10:20 +00:00
Simon Danielsson
432e1cbc23
[Bugfix]: Assertion error when using FlashInfer backend ( #25933 )
...
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-10-05 16:46:36 +08:00
Jialin Ouyang
201c971e96
[Perf][Easy] Early stop in request_block_hasher ( #26112 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-05 16:46:03 +08:00
Maximilien de Bayser
e0986ea07b
Add documentation for granite 4 tool calling ( #26175 )
...
Signed-off-by: Max de Bayser <mbayser@br.ibm.com >
2025-10-05 07:35:42 +00:00
Cyrus Leung
a964e5e6c3
[Bugfix] Allow --skip-tokenizer-init with echo and return_token_ids ( #26238 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-05 05:38:53 +00:00
22quinn
78c1d5bfd2
[Easy] Add str repr for IterationStats ( #26232 )
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
2025-10-05 05:00:21 +00:00
Cyrus Leung
59a85c366e
[Model] Use merge_by_field_config for MM models (H-L) ( #26230 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-05 11:54:17 +08:00
Cyrus Leung
119f00630b
[Renderer] Clean up renderer code ( #26216 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-04 17:05:29 +00:00
Isotr0py
a42d2df75f
[Frontend] Cache chat template kwargs resolution ( #26227 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-04 15:32:30 +00:00
Li, Jiang
5c057e068f
[CPU] Refine batch reorder of CPU attention backend ( #26096 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2025-10-04 21:54:35 +08:00
Thomas Parnell
ed3aeb25a4
[V1] [Hybrid] Remove code to override default CUDA graph configuration ( #26226 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
2025-10-04 13:47:48 +00:00
yuafng
86ee949128
Fix tensor device and dtype placement in Qwen2VL model ( #26219 )
...
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Yuanfeng Li <yuanfengli@meta.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-10-04 06:41:39 -07:00
Cyrus Leung
4570535ec4
[Model] CLIP Embedding Support ( #26010 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-04 06:21:42 -07:00
Nicolò Lucchesi
2a6dc67eb5
[Bugfix] Fix _reqs_to_process leak on abort ( #26012 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-10-04 11:39:31 +00:00
Yannick Schnider
f05fea1f5e
[Core] Enable decode of context length equal to max model length ( #26168 )
...
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com >
2025-10-04 09:59:26 +00:00
Luca Soldaini
d0df145c2a
Add Olmo 3 reasoning parser ( #26054 )
...
Signed-off-by: Luca Soldaini <luca@soldaini.net >
2025-10-04 17:48:29 +08:00
Cyrus Leung
1838cd4860
Revert "Add batch invariant kernel override for FlashInfer backend [2/n]" ( #26220 )
2025-10-04 02:45:08 -07:00
Huamin Li
7d6b03381e
[CI Failure] fix_test_auto_prefix_cache_support ( #26053 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com >
2025-10-04 02:44:49 -07:00
Cyrus Leung
7c2e91c4e0
[Misc] Remove unused executor.apply_model ( #26215 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-04 01:45:53 -07:00
Cyrus Leung
736fbf4c89
[Misc] Require merge_by_field_config argument ( #26214 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-04 01:40:14 -07:00
Cyrus Leung
44ea85137a
[Model] Support nested structures for TensorSchema ( #26212 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-04 01:20:32 -07:00
Harry Mellor
d3d649efec
Support expert parallel in Transformers backend ( #26162 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-04 04:35:04 +00:00
Stan Wozniak
ea507c3a93
[V1] [Hybrid] Mamba2 Automatic Prefix Caching ( #25752 )
...
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com >
Signed-off-by: Thomas Ortner <boh@zurich.ibm.com >
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
Co-authored-by: Thomas Ortner <boh@zurich.ibm.com >
Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com >
2025-10-04 06:34:22 +02:00
Fadi Arafeh
9705fba7b7
[cpu][perf] Accelerate unquantized-linear for AArch64 through oneDNN/ACL and weight prepack ( #25948 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
Co-authored-by: Li, Jiang <jiang1.li@intel.com >
2025-10-04 12:16:38 +08:00
Bram Wasti
2f7dbc9b42
Add batch invariant kernel override for FlashInfer backend [2/n] ( #25769 )
...
Signed-off-by: Bram Wasti <bwasti@meta.com >
Signed-off-by: Bram Wasti <bwasti@fb.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-10-03 19:49:30 -07:00
Ben Browning
ea25a76c05
[BugFix] Use async Mistral Tokenizer in Chat Completions ( #26134 )
...
Signed-off-by: Ben Browning <bbrownin@redhat.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-10-04 09:42:08 +08:00
Roger Wang
67bc0c003e
[Bugfix] Fix qwen3 vl dummy data generation with overrides ( #26193 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2025-10-04 01:40:20 +00:00
Eugene Khvedchenya
5a05f26603
Fix issue of using only the part of video frame [Nemotron Nano] ( #26186 )
...
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com >
2025-10-04 00:21:00 +00:00
Varun Sundar Rabindranath
7ef40bb983
[GPTOSS][DP/EP][Marlin] Enable GPTOSS DP/EP using Marlin kernels ( #25488 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
2025-10-03 20:13:13 -04:00
Wentao Ye
767cbb011d
[CI] Fix Pre-commit Mypy Error ( #26181 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 16:08:03 -07:00
Angela Yi
7cfa4b24bf
[BugFix] Fix de-functionalization pass for rotary_embedding ( #23953 )
...
Signed-off-by: angelayi <yiangela7@gmail.com >
2025-10-03 15:44:18 -07:00
Sergei Skvortsov
b71fcd4905
[Misc] Add penalties sampling parameters to serve tool ( #25974 )
...
Signed-off-by: Sergei Skvortsov <sergeyskv@nebius.com >
Co-authored-by: Sergei Skvortsov <sergeyskv@nebius.com >
2025-10-03 15:43:14 -07:00
Sahithi Chigurupati
75003f34e8
[CI] Push multiarch manifests as nightly builds ( #25764 )
...
Signed-off-by: Sahithi Chigurupati <chigurupati.sahithi@gmail.com >
2025-10-03 15:42:55 -07:00
Bowen Bao
78b8015a4d
[Bugfix] Relax tokenizer regex for mixtral to include 'tokenizer.model' ( #25964 )
...
Signed-off-by: Bowen Bao <bowenbao@amd.com >
2025-10-03 18:31:59 -04:00
Andrew Xia
831b124151
[responsesAPI] add better error messaging for long prompts ( #25724 )
...
Signed-off-by: Andrew Xia <axia@meta.com >
Signed-off-by: Andrew Xia <axia@fb.com >
Co-authored-by: Andrew Xia <axia@fb.com >
2025-10-03 14:33:13 -07:00
Wentao Ye
c1ffcb55da
[Refactor] Optimize FP8 MOE Backend Choice and Log ( #26044 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 15:23:42 -06:00
Corey Lowman
0879736aab
[Perf] Remove hardcoded num_warps=1 ( #26183 )
...
Signed-off-by: Corey Lowman <clowman1993@gmail.com >
2025-10-03 20:38:50 +00:00
Pavani Majety
a26917332f
[Quantization/NVFP4] Speed up TRTLLM NVFP4 MOE weight loading and fix K/V scale loading for MLA Attn ( #25968 )
...
Signed-off-by: Pavani Majety <pmajety@nvidia.com >
2025-10-03 19:35:06 +00:00
Nikhil G
cd9e5b8340
Fix V1 engine serialization error with Ray distributed executor ( #26148 )
...
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com >
2025-10-03 18:39:45 +00:00
Matthew Bonanni
300a59c4c3
Avoid division by zero in cache DS MLA kernel ( #26174 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-10-03 17:35:17 +00:00
Harry Mellor
d76541a6c5
Stop mergify from keeping stale PRs alive ( #26169 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-03 16:42:34 +00:00
Chendi.Xue
dd96465fd7
[BugFix][QWEN-VL]fix wrong apply_rotary_emb_torch selection introduced by #24642 ( #26123 )
...
Signed-off-by: Chendi Xue <Chendi.Xue@intel.com >
Signed-off-by: Chendi.Xue <chendi.xue@intel.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-03 08:52:26 -07:00
Jun Jiang
4f8f47e87e
Fix undefined symbol: cutlass_moe_mm_sm100 ( #26098 )
...
Signed-off-by: Jun Jiang <jasl9187@hotmail.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-10-03 15:48:32 +00:00
Cyrus Leung
d78fda7cda
[Renderer] Move Processor out of LLMEngine ( #26165 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-03 15:08:22 +00:00
Aleksandr Samarin
73a99cc2a5
[Model] Fixed stream generator for gpt-oss + spec-decoding ( #26027 )
...
Signed-off-by: Aleksandr Samarin <astrlrd@nebius.com >
2025-10-03 13:43:41 +00:00
Xiang Si
adae0c1f43
[CI/Build] do not enforce precompilation on tpu ci tests ( #25992 )
...
Signed-off-by: Xiang Si <sixiang@google.com >
2025-10-03 13:38:42 +00:00
whx
cbf9221992
[Model] Supplement to PR 24862: Pass param prefix to LLMHead ( #25805 )
...
Signed-off-by: whx-sjtu <2952154980@qq.com >
2025-10-03 21:34:53 +08:00
Paul Pak
5f42fc53b6
[backends][short_conv] CUDA graph piecewise edits ( #24215 )
...
Signed-off-by: Paul Pak <paulpak58@gmail.com >
2025-10-03 12:59:48 +00:00
Yannick Schnider
8ee846c27c
[Bugfix] Re-enable prefill of max model length ( #24446 )
...
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com >
2025-10-03 14:13:34 +02:00
Yang Liu
812b7f54a8
[Renderer] Move Processor out of AsyncLLM ( #24138 )
...
Signed-off-by: Yang <lymailforjob@gmail.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-03 11:29:45 +00:00
Sage Moore
5f2cacdb1e
Quick fix for IMA with the Prefix Prefill kernel during graph capture ( #25983 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
2025-10-03 11:28:22 +00:00
Egor
aa5053e3fe
[Doc] Fixed shape description for fused_batched_moe.py ( #25668 )
...
Signed-off-by: Egor <e.a.krivov@gmail.com >
2025-10-03 04:00:23 -07:00
Wenlong Wang
79aa244678
[Multi Modal] Configurable MM Profiling ( #25631 )
...
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-03 03:59:10 -07:00
kyt
2ed3f20dba
[openai] Fix missing tool usage check (system message) ( #24768 )
...
Signed-off-by: kyt <eluban4532@gmail.com >
2025-10-03 18:55:44 +08:00
Nicolò Lucchesi
48f309029a
[NIXL][Misc] Expose metrics from NIXL for logging to CLI ( #25388 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-10-03 10:47:59 +00:00
Thomas Parnell
0e93ac0b3a
[CI] Fix distributed hybrid tests in CI ( #26155 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
2025-10-03 09:14:18 +00:00
Yannick Schnider
5446ad1d24
[test utils] correct wrong typing ( #26159 )
...
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com >
2025-10-03 02:11:49 -07:00
Cyrus Leung
f9a8084e48
[Model] Use merge_by_field_config for MM models (InternVL family) ( #26153 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-03 01:59:06 -07:00
HUIJONG JEONG
3e70e3d4d5
add(v1): RequestStatesStats to RequestOutput ( #24947 )
...
Signed-off-by: huijjj <huijong.jeong@squeezebits.com >
2025-10-03 08:56:25 +00:00
Jiangyun Zhu
eb0fa43868
[Perf] Optimize reshape_and_cache CUDA Kernel ( #25955 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
Co-authored-by: Liu-congo <1502632128@qq.com >
2025-10-03 01:33:46 -07:00
Cyrus Leung
0ad9951c41
[Input] Remove unused prompt field ( #26097 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-03 00:23:21 -07:00
Varun Sundar Rabindranath
8c9117181d
[Misc] Remove typing.List ( #26150 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-10-03 07:00:33 +00:00
ahao-anyscale
c4b48d3c0f
[BUG] Reorder model config creation ( #26124 )
...
Signed-off-by: ahao-anyscale <ahao@anyscale.com >
2025-10-03 14:59:36 +08:00
Harry Mellor
10d765482d
FusedMoE support for the Transformers backend ( #22650 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-02 23:12:15 -07:00
Cyrus Leung
39b643dc1a
[Model] Use merge_by_field_config for MM models (G) ( #26117 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-02 22:38:29 -07:00
Zhewen Li
711f485643
[Bugfix] Fix import gemm_afp4wfp4 failure on AMD ( #26068 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-10-02 22:37:25 -07:00
TJian
9c5ee91b2a
[ROCm] [VL] [Bugfix] Fix vit flash attn dispatcher logic for ROCm ( #26104 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-10-02 22:34:53 -07:00
Tyler Michael Smith
27edd2aeb4
[Build/CI] Revert back to Ubuntu 20.04, install python 3.12 with uv ( #26103 )
...
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Co-authored-by: Simon Mo <simon.mo@hey.com >
2025-10-02 22:21:01 -07:00
Andrew Xia
e5017cd6d6
[gpt-oss] disable tool server initialization if no tool in request ( #25790 )
...
Signed-off-by: Andrew Xia <axia@meta.com >
Signed-off-by: Andrew Xia <axia@fb.com >
Co-authored-by: Andrew Xia <axia@fb.com >
2025-10-03 05:08:35 +00:00
Benjamin Chislett
6a7796e871
[Bug]: Limit num_reqs in dummy_run when max_num_seqs is small ( #26144 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
2025-10-03 04:00:20 +00:00
Matthew Bonanni
47b9339546
[DeepSeek] Improve performance of DS MLA cache kernel ( #26132 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-10-02 20:35:47 -07:00
Michael Goin
5d5146eee3
[CI/Build] Conditionally register cutlass_fp4_group_mm to fix building on Hopper ( #26138 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-02 20:32:38 -07:00
Matthew Bonanni
2aaa423842
[Attention] Move Backend enum into registry ( #25893 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-10-02 20:32:24 -07:00
Ekagra Ranjan
ad2d788016
[Bug][Benchmark] Fix duplicate req in oversampling ( #26140 )
...
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-03 02:55:24 +00:00
Wentao Ye
36ce76c632
[Log] Optimize DeepGEMM Missing Log ( #26106 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-02 20:02:26 -06:00
Michael Goin
f1fc2107a3
[Bugfix] Disable cascade attention with FlashInfer ( #26130 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-02 16:30:37 -07:00
Matthew Bonanni
13cdc02173
Fix MTP with deepep_low_latency ( #25904 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-10-02 21:29:49 +00:00
ElizaWszola
502640c3f9
[Perf] Fix and reapply move apply w8a8 block fp8 linear to class ( #25696 )
...
Signed-off-by: ElizaWszola <ewszola@redhat.com >
Signed-off-by: ElizaWszola <elizaw.9289@gmail.com >
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Luka Govedič <lgovedic@redhat.com >
2025-10-02 19:35:13 +00:00
Chen Zhang
3d5f1c8640
[Mamba][KVCacheManager] Simplify kv cache manage logic for mamba + MTP ( #25119 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-10-02 18:48:31 +00:00
Ekagra Ranjan
1cab2f9cad
EAGLE 3: Fix preamble so that measured speedup over Eagle 1 becomes 32% instead of 5% on MTBench ( #25916 )
...
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com >
2025-10-02 11:29:35 -07:00
Chen Zhang
1e50f1be70
[Deepseek v3.2] Support indexer prefill chunking ( #25999 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-10-02 10:29:12 -07:00
Chenheli Hua
ad87ba927a
[Small] Prevent bypassing media domain restriction via HTTP redirects ( #26035 )
...
Signed-off-by: Chenheli Hua <huachenheli@outlook.com >
2025-10-02 10:27:10 -07:00
Lucas Wilkinson
decf7f794b
[BugFix] Fix FI accuracy issue when used for MLA prefill ( #26063 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com >
2025-10-02 17:18:13 +00:00
Cyrus Leung
d00d652998
[CI/Build] Replace vllm.entrypoints.openai.api_server entrypoint with vllm serve command ( #25967 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-02 10:04:57 -07:00
Michael Goin
3b279a84be
[CI] Add Blackwell DeepSeek FP8 FlashInfer MoE tests ( #26040 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-02 09:07:19 -07:00
vllmellm
5e4a8223c6
[Qwen][ROCm] Flash Attention Rotary Embeddings ( #24642 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2025-10-02 08:26:08 -07:00
leo-pony
e51de388a2
[Platform][CI] Added OOT platform interface e2e test that running on Ascend NPU ( #25470 )
...
Signed-off-by: leo-pony <nengjunma@outlook.com >
2025-10-02 23:19:22 +08:00
Cyrus Leung
cc253b73d3
[Model] Use merge_by_field_config for MM models (D-F) ( #26076 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-02 08:17:35 -07:00
Cyrus Leung
7d6fb905d9
[Model] Use merge_by_field_config for MM models (A-C) ( #26073 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-02 08:17:31 -07:00
Lucas Wilkinson
418d111f8c
[FA/Chore] Bump vllm-flash-attention ( #25537 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-10-02 11:06:14 -04:00
Thomas Parnell
be8921fbba
Change size of single CUDA graph for CI to 4 ( #26089 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
2025-10-02 14:14:28 +00:00
Huy Do
d4e7a1152d
Update base image to 22.04 (jammy) ( #26065 )
...
Signed-off-by: Huy Do <huydhn@gmail.com >
2025-10-02 05:48:04 -07:00
pwschuurman
be22bb6f3d
Run:ai model streamer add GCS package support ( #24909 )
...
Signed-off-by: Peter Schuurman <psch@google.com >
2025-10-01 20:59:13 -07:00
Nick Hill
169313b9f8
[Misc] Make handling of SamplingParams clearer in n>1 case ( #26032 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-01 19:31:39 -07:00
Gregory Shtrasberg
0b018d8baf
[ROCm][Bugfix] Add missing parameter to ROCm backend ( #26029 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-10-01 19:23:14 -07:00
Jerry Zhang
c31246800c
Support RL online quantization with torchao ( #23014 )
...
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com >
2025-10-01 16:39:29 -07:00
Lucas Wilkinson
4134312b35
[BugFix] ChunkedLocalAttention is currently not CG compatible ( #26034 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-10-01 16:28:00 -07:00
Wentao Ye
da554f932e
[Bug] Fix Negative Cuda Memory Usage ( #25683 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-01 18:16:26 -04:00
Hosang
aac622e0cd
[ROCm][Build] Add support for AMD Ryzen AI MAX / AI 300 Series ( #25908 )
...
Signed-off-by: Hosang Yoon <hosang.yoon@amd.com >
2025-10-01 21:39:49 +00:00
Lucas Wilkinson
1726e93ef1
[BugFix][DP/EP] Fix CUTLASS MLA hang under load ( #26026 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com >
2025-10-01 12:30:00 -07:00
Michael Goin
ee04c0cd04
[CI] Tweaks to GPT-OSS Eval (Blackwell) for stability ( #26030 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-01 12:02:17 -07:00
Huamin Li
c36f0aa300
Fix test_mamba_ssm_ssd.py due to missing _query_start_loc_to_chunk_indices_offsets ( #25995 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com >
2025-10-01 18:18:36 +00:00
Johnny
5234dc7451
[NVIDIA] Blackwell Family ( #24673 )
...
Signed-off-by: Johnny <johnnynuca14@gmail.com >
Signed-off-by: johnnynunez <johnnynuca14@gmail.com >
Signed-off-by: Johnny <johnnync13@gmail.com >
Signed-off-by: Salvatore Cena <cena@cenas.it >
Co-authored-by: Aidyn-A <31858918+Aidyn-A@users.noreply.github.com >
Co-authored-by: Salvatore Cena <cena@cenas.it >
2025-10-01 10:50:54 -07:00
Kenichi Maehashi
3b7c20a6b5
[Bugfix] Apply same sampling parameters for both n=1 and n>1 ( #26005 )
...
Signed-off-by: Kenichi Maehashi <maehashi@preferred.jp >
2025-10-01 14:37:35 +00:00
Nathan Scott
f9e714813a
[Benchmark] Finish documented v0.11.0 deprecation of --endpoint-type ( #26007 )
...
Signed-off-by: Nathan Scott <nathans@redhat.com >
2025-10-01 12:41:57 +00:00
billishyahao
2518230d3e
[MISC] Fix misleading batch_size_capture_list when cuda_graph_sizes < 4 ( #25829 )
...
Signed-off-by: billishyahao <bill.he@amd.com >
Co-authored-by: Luka Govedic <ProExpertProg@users.noreply.github.com >
2025-10-01 08:39:45 -04:00
Harry Mellor
a332b84578
[CI] Only capture a single CUDA graph size in CI by default ( #25951 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-01 10:03:44 +01:00
Cyrus Leung
1405f0c7ba
[Misc] Factor out common _apply_feature_select_strategy ( #26003 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-01 01:31:03 -07:00
Wenlong Wang
84d57342b6
[BugFix][MM] Fix Nonetype error when video is cache in qwen2.5-omni-thinker ( #26004 )
...
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com >
2025-10-01 08:03:25 +00:00
nadathurv
57b46d769e
[Doc] updating torch.compile doc link ( #25989 )
...
Signed-off-by: nadathurv <work.vnadathur@gmail.com >
Signed-off-by: WorldExplored <srreyansh.sethi@gmail.com >
Co-authored-by: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com >
2025-10-01 07:04:56 +00:00
Lucia Fang
f48b6a03ba
[Misc]allow disable pynccl ( #25421 )
...
Signed-off-by: Lu Fang <fanglu@fb.com >
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com >
2025-10-01 06:04:13 +00:00
Harry Mellor
2a69ab4899
Update to Transformers v4.56.2 ( #24638 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-09-30 22:07:07 -07:00
Lucas Wilkinson
8d7da92fd7
[BugFix] Fix default kv-cache-dtype default for DeepseekV3.2 ( #25988 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-09-30 21:58:31 -07:00
Zhewen Li
e952eee698
[Bugfix] Fix __syncwarp on ROCM ( #25996 )
2025-09-30 21:15:11 -07:00
Roger Wang
66bca9b8bd
[MM] Add text-only mode for Qwen3-VL ( #26000 )
2025-09-30 21:13:42 -07:00
Param
99028fda44
Fix INT8 quantization error on Blackwell GPUs (SM100+) ( #25935 )
...
Signed-off-by: padg9912 <phone.and.desktop@gmail.com >
2025-09-30 19:19:53 -07:00
Wentao Ye
1244948885
[Log] Optimize Log for FP8MOE ( #25709 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-09-30 19:18:43 -07:00
Salvatore Cena
a73f6491c8
Update launch_bounds_utils.h for correct compile on Multiple Cuda Arch - PTXAS out of range Warning ( #25843 )
...
Signed-off-by: Salvatore Cena <cena@cenas.it >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-30 19:18:19 -07:00
Lucia Fang
001e50c92c
[Model] MTP fallback to eager for DeepSeek v32 ( #25982 )
...
Signed-off-by: Lu Fang <fanglu@fb.com >
2025-10-01 01:53:22 +00:00
Lucas Wilkinson
96ebcaa3ad
[Misc] Make EP kernels install script support uv ( #25785 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-09-30 23:38:34 +00:00
Andrew Xia
5db1870bb9
[gpt-oss] use vLLM instead of openai types for streaming ( #25186 )
...
Signed-off-by: Andrew Xia <axia@meta.com >
Signed-off-by: Andrew Xia <axia@fb.com >
Co-authored-by: Andrew Xia <axia@fb.com >
2025-09-30 22:47:07 +00:00
Harry Mellor
2ce26b9b5d
[Docs] Remove API Reference from search index ( #25949 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-09-30 22:10:02 +00:00
Harry Mellor
a388252ac4
Add explicit pooling classes for the Transformers backend ( #25322 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-09-30 23:07:06 +01:00
David Ben-David
9a9f48dff7
[V1] [P/D] Add Support for KV Load Failure Recovery ( #19330 )
...
Signed-off-by: David Ben-David <davidb@pliops.com >
Co-authored-by: David Ben-David <davidb@pliops.com >
2025-09-30 14:57:08 -07:00
Jee Jee Li
67f3fb0844
[Bench] Add DeepSeekV32 to MoE benchmark ( #25962 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-09-30 14:13:48 -07:00
cjackal
43b752c325
[Llama4] [multimodal] Fix misplaced dtype cast of cos_sin_cache in Llama4VisionRotaryEmbedding ( #25889 )
...
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com >
2025-09-30 20:35:15 +00:00
Or Ozeri
cfd302db9b
OffloadingConnector: Fix GPU block tracking bug ( #25856 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com >
2025-09-30 19:53:04 +00:00
bnellnm
fb610ae684
[Docs] Add moe kernel features doc ( #25297 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
Signed-off-by: bnellnm <49004751+bnellnm@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-09-30 19:03:15 +00:00
Cyrus Leung
2f652e6cdf
[Doc] Improve MM Pooling model documentation ( #25966 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-09-30 18:58:29 +00:00
Wentao Ye
e6a226efba
[Bug] Fix AttributeError: 'QKVParallelLinear' object has no attribute 'orig_dtype' ( #25958 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-09-30 11:13:03 -07:00
youkaichao
a2e6fa7e03
[bugfix][deepseek] fix flashmla kernel selection ( #25956 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-10-01 00:30:36 +08:00
Cyrus Leung
9f1c4ecaf2
[Bugfix] Token type and position embeddings fail to be applied to inputs_embeds ( #25922 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-01 00:23:12 +08:00
Pavani Majety
ef283548f7
[Bugfix] Fix accuracy issue of TRTLLM FP8 MOE and improve logging ( #25895 )
...
Signed-off-by: Pavani Majety <pmajety@nvidia.com >
2025-09-30 10:51:31 -04:00
Anion
f4db5e6de1
[Bugfix][Model] Fix inference for Hunyuan dense models ( #25354 )
...
Signed-off-by: anion <1005128408@qq.com >
Signed-off-by: Anion <123177548+Anionex@users.noreply.github.com >
2025-09-30 14:38:07 +00:00
Sergio Paniego Blanco
099aaee536
Add Hugging Face Inference Endpoints guide to Deployment docs ( #25886 )
...
Signed-off-by: sergiopaniego <sergiopaniegoblanco@gmail.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-09-30 14:35:06 +00:00
Asaf Joseph Gardin
35fe398c7c
[Kernel][Moe Configs] Add more tuned triton configs for ExpertsInt8 and FP8 ( #25858 )
...
Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com >
2025-09-30 07:30:44 -07:00
ihb2032
bb6d43047e
[Fix] Improve CPU backend compatibility for RISC-V ( #25816 )
...
Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn >
Signed-off-by: ihb2032 <1355790728@qq.com >
2025-09-30 13:48:07 +00:00
Reza Barazesh
bc546f76a1
[CI] Move applicable tests to CPU ( #24080 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-09-30 14:45:20 +01:00
Nicolò Lucchesi
80608ba5af
[NIXL] Add support for MLA caches with different latent dim ( #25902 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
Co-authored-by: Chen Zhang <zhangch99@outlook.com >
2025-09-30 12:18:29 +00:00
Lehua Ding
e184c9c510
[perf] Use CPU tensor to reduce GPU->CPU sync ( #25884 )
...
Signed-off-by: Lehua Ding <lehuading@tencent.com >
2025-09-30 19:51:16 +08:00
Cyrus Leung
d7e34b4210
[Model] Move vision_feature_select_strategy into resolve_visual_encoder_outputs ( #25938 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-09-30 11:24:57 +00:00
CSWYF3634076
ef6e0e7132
[Bugfix][Model]fix ernie45 moe gate&bias dtype to float32 ( #25936 )
...
Signed-off-by: wangyafeng <wangyafeng@baidu.com >
2025-09-30 19:11:21 +08:00
Sergio Paniego Blanco
1ad3aca682
Updated TRL integration docs ( #25684 )
...
Signed-off-by: sergiopaniego <sergiopaniegoblanco@gmail.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-09-30 03:10:55 -07:00
a120092009
8d0afa9b42
[Doc] Add Cambricon MLU support ( #25942 )
...
Signed-off-by: a120092009 <zhaoty0121@gmail.com >
2025-09-30 17:59:47 +08:00
Yongye Zhu
fa7e254a7f
[New Model] DeepSeek-V3.2 (Rebased to Main) ( #25896 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
Signed-off-by: youkaichao <youkaichao@gmail.com >
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: NickLucche <nlucches@redhat.com >
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com >
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com >
Signed-off-by: Lucia Fang <fanglu@meta.com >
Co-authored-by: Chen Zhang <zhangch99@outlook.com >
Co-authored-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
Co-authored-by: yewentao256 <zhyanwentao@126.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com >
Co-authored-by: Lucia Fang <fanglu@meta.com >
Co-authored-by: NickLucche <nlucches@redhat.com >
Co-authored-by: Siyuan Fu <siyuanf@nvidia.com >
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: Xiaozhu Meng <mxz297@gmail.com >
Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com >
2025-09-30 17:14:41 +08:00
Simon Danielsson
e23cacda35
[Bugfix]: Clean up chunked prefill logging when using whisper ( #25075 )
...
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com >
2025-09-30 08:17:49 +00:00
Zhou Jiahao
2e1b8bc2b6
[Model][Bugfix] Fix MiDashengLM audio encoder mask by removing incorrect logical_not ( #25925 )
...
Signed-off-by: zhoukz <me@zhoukz.com >
2025-09-30 08:15:23 +00:00
acisseJZhong
e47433b3c1
[BugFix] Pass config_format via try_get_generation_config ( #25912 )
2025-09-30 05:09:50 +00:00
Lucas Wilkinson
23194d83e8
[BugFix] Fix DP/EP hang ( #25906 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-09-30 04:18:59 +00:00
Harry Mellor
61aedb5ffe
Move VllmConfig from config/__init__.py to config/vllm.py ( #25271 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-09-29 19:49:49 -07:00
Zhuohan Li
d3bd171123
[Benchmark] Support benchmark throughput for external launcher DP ( #25913 )
...
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com >
2025-09-30 01:43:57 +00:00
Wentao Ye
89e4050af4
[Bug] Fix Weight Loading for Block FP8 Cutlass SM90 ( #25909 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-30 09:15:19 +08:00
Andrew Sansom
78a47f87ce
Test Prompt Embeds/LoRA compatibility and Enable LoRA Support for OPT Models ( #25717 )
...
Signed-off-by: Andrew Sansom <andrew@protopia.ai >
2025-09-30 08:10:58 +08:00
Aaron Pham
6a113d9aed
[V0 Deprecation] Remove vllm.worker and update according imports ( #25901 )
2025-09-29 23:26:11 +00:00
Nicolò Lucchesi
2e4fe48c37
[NIXL] Increase default KV block eviction timeout on P ( #25897 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-09-29 21:35:14 +00:00
Zhuohan Li
8eb0a1d906
[Doc] Polish example for torchrun dp ( #25899 )
2025-09-29 21:31:34 +00:00
Thomas Parnell
fea3e476aa
[Kernel] Chunk-aligned mamba2 ( #24683 )
2025-09-29 23:18:25 +02:00
Gregory Shtrasberg
61a3431613
[Bugfix][ROCm] Fixing trying to import non-existent symbols from libnccl.so ( #25605 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-09-29 17:01:50 -04:00
Naman Lalit
9bedac9623
[Doc] Add documentation for vLLM continuous benchmarking and profiling ( #25819 )
...
Signed-off-by: Naman Lalit <nl2688@nyu.edu >
2025-09-29 20:49:49 +00:00
Adrian Abeyta
c42ff4f4fd
[BugFix][torch.compile] KV scale calculation issues with FP8 quantization ( #25513 )
...
Signed-off-by: adabeyta <aabeyta@redhat.com >
2025-09-29 15:52:04 -04:00
Lee Nau
d5ab28511c
[Bugfix] Use correct key "ignore" for config.json non-quantized layers ( #25706 )
...
Signed-off-by: Lee Nau <lnau@nvidia.com >
2025-09-29 15:07:29 -04:00
Jee Jee Li
e61eb5e09d
[Model] Remove MotifForCausalLM ( #25866 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-09-30 00:36:30 +08:00
Isotr0py
0899ba5b42
[CI/Build] Include Transformers backend test in nightly transformers test ( #25885 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-09-29 09:33:39 -07:00
Rahul Tuli
145ac73317
[Bugfix][Speculative Decoding] Fix Eagle3 quantization config issue ( #25883 )
...
Signed-off-by: Rahul Tuli <rtuli@redhat.com >
2025-09-29 11:37:20 -04:00
Chenxi Yang
d0d138bc55
[Nixl][P/D] Add cuda2cpu support (HD->DH transfer) ( #24690 )
...
Signed-off-by: Chenxi Yang <cxyang@fb.com >
Co-authored-by: Chenxi Yang <cxyang@fb.com >
2025-09-29 14:31:51 +00:00
Jiangyun Zhu
43227236ec
[torch.compile] serialize cudagraph_mode as its enum name instead of value ( #25868 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2025-09-29 13:54:52 +00:00
Zhou Jiahao
8616300ae2
[Model][Bugfix] Fix issues in MiDashengLM implementation for quantized models ( #25854 )
...
Signed-off-by: zhoukz <me@zhoukz.com >
2025-09-29 10:59:04 +00:00
Yingjun Mou
edbaadd91f
[Bugfix] Fix requirements paths in install instructions ( #25827 )
...
Signed-off-by: yingjun-mou <renzomou@gmail.com >
2025-09-29 03:49:35 -07:00
youkaichao
9360d34fa1
update to latest deepgemm for dsv3.2 ( #25871 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-09-29 17:51:43 +08:00
Cyrus Leung
1b67b04656
[Misc] Remove more get_input_embeddings_v0 ( #25857 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-09-29 08:03:37 +00:00
Isotr0py
bd51f78e39
[V0 Deprecation][Models] Remove all V0 condition for mm embeddings merge ( #25331 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: isotr0py <2037008807@qq.com >
2025-09-29 14:09:18 +08:00
Roger Wang
65ecb4f134
[Bugfix] Fallback ViT attn backend to SDPA for blackwell ( #25851 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2025-09-29 06:03:51 +00:00
Kunshang Ji
143844fa43
[XPU]Fix xpu spec decoding UTs, avoid using cuda graph ( #25847 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2025-09-29 05:15:10 +00:00
Thomas Parnell
219cfbe7f6
Add Phi4FlashForCausalLM to _PREVIOUSLY_SUPPORTED_MODELS ( #25832 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
2025-09-29 05:08:17 +00:00
Robert Shaw
9b44a7d926
[P/D] NIXL Updates ( #25844 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
Signed-off-by: simon-mo <simon.mo@hey.com >
Signed-off-by: rentianyue-jk <rentianyue-jk@360shuke.com >
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: Chenheli Hua <huachenheli@outlook.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Signed-off-by: NickLucche <nlucches@redhat.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Sage Moore <sage@neuralmagic.com >
Co-authored-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: rentianyue-jk <rentianyue-jk@360shuke.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Chenheli Hua <huachenheli@outlook.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2025-09-29 04:46:30 +00:00
Juechen Liu
a3ae45a38c
[Misc] fix tests failure by using current_platform ( #25825 )
...
Signed-off-by: Juechen Liu <jueliu@meta.com >
2025-09-29 04:18:57 +00:00
Michael Goin
0307428d65
Remove redundant cudagraph dispatcher warning ( #25841 )
2025-09-28 17:12:42 -04:00
JJJYmmm
471997adf6
[Bugfix] fix Qwen3VLMoe load when pp > 1 ( #25838 )
...
Signed-off-by: liuye.hj <liuye.hj@alibaba-inc.com >
Co-authored-by: liuye.hj <liuye.hj@alibaba-inc.com >
2025-09-28 17:56:12 +00:00
Yuxuan Zhang
b1ded114b9
Update GLM-4.5 Doc transformers version ( #25830 )
...
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com >
2025-09-28 12:05:51 +00:00
weiliang
f4e4088c99
Fix random dataset mismatched token length with config. ( #24937 )
...
Signed-off-by: Weiliang Liu <weiliangl@nvidia.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-09-28 08:23:44 +00:00
Isotr0py
0efd540dbc
[VLM] Update Qwen3-VL max_num_video_tokens calculation for configurable video profiling ( #25557 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-09-28 04:21:01 +00:00
Roger Wang
6144754014
[Bugfix] Fix Qwen3-VL regression from #24982 ( #25814 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2025-09-28 03:21:09 +00:00
Roger Wang
69311446ba
[MM] Optimize memory profiling for scattered multimodal embeddings ( #25810 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2025-09-28 02:17:58 +00:00
Nicolò Lucchesi
da63274d9f
[Bugfix][NIXL] Fix Async Scheduler timeout issue ( #25808 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-09-27 15:17:35 -04:00
Jialin Ouyang
c216119d64
[Core] GC Debug callback ( #24829 )
...
Signed-off-by: Jialin Ouyang <jialino@meta.com >
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
Co-authored-by: Jialin Ouyang <jialino@meta.com >
2025-09-27 17:53:31 +00:00
Clayton Coleman
5546acb463
[Bug]: Set LD_LIBRARY_PATH to include the 'standard' CUDA location ( #25766 )
...
Signed-off-by: Clayton Coleman <smarterclayton@gmail.com >
2025-09-27 13:36:28 -04:00
Jiangyun Zhu
c0ec81836f
[torch.compile]: Add VLLM_DEBUG_DUMP_PATH environment variable ( #25651 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-09-27 16:09:00 +00:00
Patrick C. Toulme
b65e56babe
[Core] Refactor self.model() to call a helper for subclassing. ( #25084 )
...
Signed-off-by: Patrick Toulme <ptoulme@meta.com >
Signed-off-by: Patrick Toulme <pctoulme+1@gmail.com >
2025-09-27 08:40:59 -07:00
Peter Pan
49996cd597
[env] default nixl side port conflicts with kv-event zmq port ( #25056 )
...
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io >
2025-09-27 15:02:40 +00:00
yyzxw
ecb37e276a
[docs] transcriptions API audio upload ( #25446 )
...
Signed-off-by: zxw <1020938856@qq.com >
2025-09-27 15:00:35 +00:00
Tyler Michael Smith
a5354b3ed2
[Bugfix][WideEP] Apply TP Attn + EP MoE fix to other models ( #24982 )
...
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
2025-09-27 14:22:28 +00:00
Tyler Michael Smith
f9df8b4ad7
[Bugfix] Fix triton import precommit failure ( #25803 )
...
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
2025-09-27 07:13:11 -07:00
Harry Mellor
ec152c8748
Fix GPTQ model loading in Transformers backend ( #25770 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-09-27 12:18:20 +00:00
Russell Bryant
7977e5027c
Add filtering for chat template kwargs ( #25794 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-09-27 10:46:49 +00:00
Russell Bryant
3f5d902d2a
Validate API tokens in constant time ( #25781 )
...
Signed-off-by: rentianyue-jk <rentianyue-jk@360shuke.com >
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: rentianyue-jk <rentianyue-jk@360shuke.com >
2025-09-27 18:09:26 +08:00
Cyrus Leung
27d7638b94
[Bugfix] Merge MM embeddings by index instead of token IDs ( #16229 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: NickLucche <nlucches@redhat.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: NickLucche <nlucches@redhat.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-09-27 08:15:12 +00:00
Xiaohan Zou
176173989a
[Bugfix] Add missing image_size for phi4_multimodal ( #25796 )
2025-09-27 07:59:22 +00:00
Roger Wang
23b8ee672d
[Misc] Update openai client example file for multimodal ( #25795 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-27 07:57:07 +00:00
22quinn
3939152069
[Misc] Fix codeowners override for v1 sample and attention ( #25037 )
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
2025-09-27 07:47:29 +00:00
Cyrus Leung
cd87bfbf37
[CI/Build] Reorganize root-level V1 tests ( #25767 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-09-27 13:51:15 +08:00
22quinn
b3613e3ace
[CI/Build] Add timing to Model Executor Test ( #25799 )
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
2025-09-26 21:57:27 -07:00
Cyrus Leung
d346ec695e
[CI/Build] Consolidate model loader tests and requirements ( #25765 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-09-26 21:45:20 -07:00
Wentao Ye
c242c98031
[Bugfix] Allow Only SDPA Backend for ViT on B200 for Qwen3-VL ( #25788 )
2025-09-26 20:44:52 -07:00
WeiQing Chen
f1d53d150c
[Multimodal][Speculative Decoding]Eagle Eagle3 mm support, enablement on qwen2.5vl ( #22872 )
Signed-off-by: Junhong <liujunhong11@huawei.com >
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com >
Co-authored-by: Junhong <liujunhong11@huawei.com >
Co-authored-by: LJH-LBJ <98734602+LJH-LBJ@users.noreply.github.com >
2025-09-27 03:35:47 +00:00
Michael Goin
92da847cf5
Add flashinfer-build.sh and register precompiled cu128 wheel in Dockerfile ( #25782 )
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-09-26 18:54:09 -07:00
Russell Bryant
3958b96bf5
Add option to restrict media domains ( #25783 )
Signed-off-by: Chenheli Hua <huachenheli@outlook.com >
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: Chenheli Hua <huachenheli@outlook.com >
2025-09-27 01:23:52 +00:00
Zhuohan Li
8bf8f45822
[Core] Don't count preempted tokens in prefix cache hit rate ( #25787 )
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com >
2025-09-27 00:16:40 +00:00
Jonas M. Kübler
6f5c0931c1
[Spec decode] automatically disable mm for text-only draft models ( #25667 )
Signed-off-by: Jonas Kuebler <kuebj@amazon.com >
2025-09-27 08:10:21 +08:00
Naman Lalit
4e33a7ea85
[Bugfix] Optimize CpuGpuBuffer initialization ( #25447 )
Signed-off-by: Naman Lalit <nl2688@nyu.edu >
2025-09-27 08:07:36 +08:00
Bram Wasti
dc48ba0c75
Kernel-override Determinism [1/n] ( #25603 )
Signed-off-by: Bram Wasti <bwasti@meta.com >
2025-09-26 16:59:09 -07:00
Sage Moore
4778b42660
Reduce the Cuda Graph memory footprint when running with DBO ( #25779 )
Signed-off-by: Sage Moore <sage@neuralmagic.com >
2025-09-26 22:29:56 +00:00
qizixi
c70ac4b8ff
[spec decode] Consolidate speculative decode method name for MTP ( #25232 )
Signed-off-by: zixi-qi <qizixi@meta.com >
2025-09-26 22:27:05 +00:00
Michael Goin
cf89202855
[CI] Fix FlashInfer AOT in release docker image ( #25730 )
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-09-26 14:11:40 -07:00
fhl2000
f075693da7
[V1] address post issues related to #20059 (part 1) ( #23046 )
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-09-26 15:58:19 -04:00
Michael Goin
f708bd4904
[CI] Add E2E Blackwell Quantized MoE Test ( #25723 )
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-09-26 12:23:00 -07:00
Michael Goin
0002b7f0d1
[Docs] Add Toronto Meetup ( #25773 )
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-09-26 12:00:46 -07:00
Frank Wang
11aafd9886
[Bugfix] Improve GLM4 MoE Reasoning Parser's is_reasoning_end Condition ( #25355 )
Signed-off-by: frankwang28 <frank.wbb@hotmail.com >
Signed-off-by: Frank Wang <41319051+frankwang28@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
2025-09-26 11:54:00 -07:00