Robert Shaw
|
95befecc18
|
[MoE Refactor][2/N] Use Modular Kernels for Fp8 (#30825)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2025-12-19 23:36:38 +00:00 |
|
Wentao Ye
|
4cf9429897
|
[Bug] Fix error 'Dynamo failed to run FX node with fake tensors' for Deepseek V3.2 (#31046)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-12-19 23:31:31 +00:00 |
|
Robert Shaw
|
83a317f650
|
[MoE Refactor][3/N] Deprecate cutlass block quant fp8 (b200) (#30990)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2025-12-19 13:09:54 -08:00 |
|
Lucas Wilkinson
|
5f6477d1d0
|
[BugFix] Fix TypeError: unhashable type: 'dict' when serving deepseek32 (#30924)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-12-19 16:07:54 -05:00 |
|
Wentao Ye
|
3bd8335bd0
|
[Refactor] Refactor for DeepGemmQuantScaleFMT using cache (#30898)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-12-19 13:50:39 -07:00 |
|
Seiji Eicher
|
1ab5213531
|
Make engine core client handshake timeout configurable (#27444)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
|
2025-12-19 20:38:30 +00:00 |
|
Zhonghua Deng
|
969bbc7c61
|
[Model] Add MiMo-V2-Flash support (#30836)
Signed-off-by: Abatom <abzhonghua@gmail.com>
Signed-off-by: Jumiar <liuanqim10@126.com>
Signed-off-by: Zyann7 <zyann7@outlook.com>
Co-authored-by: Jumiar <liuanqim10@126.com>
Co-authored-by: Zyann7 <zyann7@outlook.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-12-19 17:17:03 +00:00 |
|
Andrey Talman
|
268a972c62
|
Update PyTorch version update docs (#30982)
|
2025-12-19 16:08:53 +00:00 |
|
Jinzhen Lin
|
5fbfa8d9ef
|
[Quantization] fix marlin w8a8 check (#30961)
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
|
2025-12-19 07:33:22 -08:00 |
|
Shanshan Shen
|
23a1946e3b
|
[CustomOp][Refactor] Extract common methods for ApplyRotaryEmb CustomOp (#31021)
Signed-off-by: shen-shanshan <467638484@qq.com>
|
2025-12-19 22:16:09 +08:00 |
|
Thomas Parnell
|
b5545d9d5c
|
[Bugfix] [Kernel] Triton attention kernels: mask out V blocks that fall outside sliding window (#30887)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-12-19 21:39:54 +08:00 |
|
Nishidha Panpaliya
|
bd2b52fc2d
|
[CPU][Bugfix] Fix ppc64le CPU build (#30871)
Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com>
|
2025-12-19 12:26:35 +00:00 |
|
Li, Jiang
|
420ba2dbb6
|
Enable aarch64 CPU performance benchmarks (#26494)
Signed-off-by: Ioana Ghiban <ioana.ghiban@arm.com>
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
Co-authored-by: Ioana Ghiban <ioana.ghiban@arm.com>
Co-authored-by: Fadi Arafeh <fadi.arafeh@arm.com>
|
2025-12-19 12:16:18 +00:00 |
|
Marko Rosenmueller
|
455949675d
|
[Frontend][Bug] allow tool calls in analysis channel (#28139)
Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2025-12-19 10:47:44 +00:00 |
|
lif
|
086b96339f
|
[Bugfix] Add validation for tool requests when tool_parser is unavailable (#30613)
Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
|
2025-12-19 18:23:28 +08:00 |
|
Jinzhen Lin
|
9187de9fac
|
[Quantization] enable compressed-tensors marlin support for turing (2) (#31008)
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
|
2025-12-19 08:56:35 +00:00 |
|
Isotr0py
|
ac1c934276
|
[Bugfix] Fix incorrect tiles creation for mm prefix triton attention (#30974)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-12-19 16:00:33 +08:00 |
|
Wenqi Glantz
|
4924ac582c
|
Add hidden dimension validation for multimodal embedding inputs (#30968)
Signed-off-by: Wenqi Glantz <wglantz@nvidia.com>
|
2025-12-19 07:59:36 +00:00 |
|
Li, Jiang
|
096b25c9ed
|
[Doc][CPU] Fix index link for CPU regular release wheels (#31015)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-12-19 07:29:52 +00:00 |
|
Jinzhen Lin
|
de08b8f61b
|
[Quantization] enable compressed-tensors marlin support for turing (#31000)
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
|
2025-12-18 20:29:48 -08:00 |
|
Nick Hill
|
2ac85a4544
|
[BugFix] Fix logprobs with spec decode and modified logits (#30846)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-12-18 19:58:28 -08:00 |
|
Andreas Karatzas
|
7b43db210c
|
[ROCm][CI][Bugfix] Multi-Modal Model Support Fixes and Attention Backend Improvements (#30270)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2025-12-19 02:17:27 +00:00 |
|
PlatinumGod
|
6a09612b2e
|
[Bugfix] Fix tool_choice="none" being ignored by GPT-OSS/harmony models (#30867)
Signed-off-by: yujiepu <pyjapple@gmail.com>
Signed-off-by: PlatinumGod <pyjapple@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
v0.14.0rc0
|
2025-12-19 09:34:27 +08:00 |
|
Nick Hill
|
45c0526ac9
|
[BugFix] Handle errors when preprocessing added requests (#30895)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-12-19 01:29:11 +00:00 |
|
Benjamin Chislett
|
d6b3d39b6d
|
[Cleanup] Refactor FlashInferMetadataBuilder (#29128)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-12-18 14:45:30 -08:00 |
|
Chendi.Xue
|
6ca74bc11a
|
[NIXL][BUG FIX] Fix both failing issue and accuracy issue with nixl + host_buffer on CUDA (#30419)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2025-12-18 22:10:02 +00:00 |
|
Harry Mellor
|
19c583398a
|
Check for truthy rope_parameters, not just their existence (#30983)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-12-18 13:59:10 -08:00 |
|
Nick Hill
|
b0b77c4655
|
[BugFix] Fix spec decode + structured outputs + preemption edge case (#30916)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-12-18 12:59:55 -08:00 |
|
Kayvan Mivehnejad
|
634a14bd7d
|
Strengthen input validation and tests for 'parse_raw_prompts'. (#30652)
Signed-off-by: Kayvan Mivehnejad <K.Mivehnejad@gmail.com>
|
2025-12-18 19:51:58 +00:00 |
|
Chen Zhang
|
24b65eff0d
|
[BugFix] Spec decode with VLLM_ENABLE_V1_MULTIPROCESSING=0 (#30319)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-12-18 19:47:56 +00:00 |
|
Elizabeth Thomas
|
41b6f9200f
|
Remove all2all backend envvar (#30363)
Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-12-18 19:46:28 +00:00 |
|
Wentao Ye
|
97000a2be7
|
[Bug] Fix compressed tensor not using deepgemm (#30820)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-12-18 14:45:55 -05:00 |
|
Isotr0py
|
d2dc5dfc6e
|
[Bugfix] Remove tile_size=64 for mm_prefix triton attention (#30973)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-12-18 20:42:32 +01:00 |
|
navmarri14
|
b8c477c115
|
Tuned fused configs for B300 (#30629)
|
2025-12-18 11:41:59 -08:00 |
|
jiahanc
|
53ad423f26
|
[Perf] enable flashinfer rotary_embedding custom ops in DeepSeek rotary (#30729)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
|
2025-12-18 14:31:18 -05:00 |
|
wz1qqx
|
889f8bb250
|
[BugFix] Reclaim resources to prevent memory leaks when using LMCacheMPConnector (#30745)
Signed-off-by: wz1qqx <ziqi.wang@novita.ai>
Co-authored-by: wz1qqx <ziqi.wang@novita.ai>
|
2025-12-18 19:09:51 +00:00 |
|
Fanli Lin
|
058926d48c
|
[XPU] allow custom workers (e.g. vllm-omni workers) to be used on XPU (#30935)
Signed-off-by: Fanli Lin <fanli.lin@intel.com>
|
2025-12-18 10:16:36 -08:00 |
|
Isotr0py
|
700a5ad6c6
|
[MM Encoder]: Migrate legacy ViT MultiHeadAttention to new MMEncoderAttention interface (#30684)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-12-19 02:04:19 +08:00 |
|
Alec
|
62be3670cb
|
[BugFix] Add sleep to fix tight loop and release GIL (#29476)
Signed-off-by: alec-flowers <aflowers@nvidia.com>
Signed-off-by: Alec <35311602+alec-flowers@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-12-18 09:52:55 -08:00 |
|
inkcherry
|
500f26e6d3
|
[Bugfix] fix DP-aware routing in OpenAI API requests (#29002)
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
|
2025-12-18 09:50:42 -08:00 |
|
Nick Hill
|
686cbaac64
|
[Cleanup] Remove unused ModelRunner V1 InputBatch.num_tokens field (#30218)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-12-18 09:17:00 -08:00 |
|
Vasiliy Kuznetsov
|
f4ee2c3d90
|
Fix fp8 online quantization streaming with tp > 1 (#30900)
Signed-off-by: vasiliy <vasiliy@fb.com>
|
2025-12-18 11:45:15 -05:00 |
|
Xin Yang
|
9a5e96523b
|
[LoRA] Set default MXFP4 LoRA backend to Marlin (#30598)
Signed-off-by: Xin Yang <xyangx@amazon.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-12-18 08:42:22 -08:00 |
|
wzyrrr
|
326e7c3105
|
[Doc] Add Sophgo TPU Support (#30949)
Co-authored-by: zhaoyang.wang <zhaoyang.wang@sophgo.com>
|
2025-12-18 16:29:33 +00:00 |
|
Lucas Kabela
|
0db5439ded
|
[Bugfix][torch2.10] Fix test_qwen2_5_vl_compilation with 2.10 RC (#30822)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-12-18 08:23:31 -08:00 |
|
sarathc-cerebras
|
28d15ab56b
|
Add Jais 2 support (#30188)
Signed-off-by: sarathc-cerebras <sarath.chandran@cerebras.net>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-12-18 15:46:58 +00:00 |
|
Wentao Ye
|
6628758233
|
[Bug] Fix batch invariant in torch 2.10 (#30907)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-12-18 07:27:51 -08:00 |
|
zhrrr
|
eee600c34f
|
[Misc] support nsys profile for bench latency (#29776)
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
|
2025-12-18 14:52:20 +00:00 |
|
Michael Goin
|
100f93d2be
|
Filter safetensors files to download if .safetensors.index.json exists (#30537)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-12-18 14:51:17 +00:00 |
|
vllmellm
|
96bf50a2c0
|
[ROCm] Serving Fails on Radeon Due to AITER Dtype Import (#30952)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-12-18 11:47:46 +00:00 |
|