Didier Durand
|
d7e1e59972
|
[Doc]: fix typos in Python comments (#24093)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
|
2025-09-02 21:05:45 -07:00 |
|
Wentao Ye
|
c4ed78b14f
|
[Compile] Fix Compile Warning for w4a8_mm_entry.cu (#23660)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-09-02 20:45:52 -07:00 |
|
co63oc
|
1bd007f234
|
fix some typos (#24071)
Signed-off-by: co63oc <co63oc@users.noreply.github.com>
|
2025-09-02 20:44:50 -07:00 |
|
afeldman-nm
|
136d853e65
|
[V1] Wrapper which plumbs request-level logits processors into vLLM batch-level logits processing (#23656)
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
|
2025-09-03 02:52:51 +00:00 |
|
Russell Bryant
|
e32a0e8678
|
Upgrade xgrammar to 0.1.23 (#22988)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-09-03 02:32:59 +00:00 |
|
youkaichao
|
42dc59dbac
|
Update release pipeline post PyTorch 2.8.0 update (#24073)
Signed-off-by: Huy Do <huydhn@gmail.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Huy Do <huydhn@gmail.com>
|
2025-09-03 10:09:19 +08:00 |
|
Chaojun Zhang
|
862f2ef893
|
[XPU] Fix the bug of LoRA logits on the XPU platform (#24081)
Signed-off-by: chzhang <chaojun.zhang@intel.com>
|
2025-09-03 08:21:18 +08:00 |
|
Matthew Bonanni
|
2fd1a40a54
|
[CI/Build] Disable SiluMul NVFP4 quant fusion tests (#24121)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-09-02 16:50:28 -07:00 |
|
Wentao Ye
|
930a24144c
|
[Bug] R1 Accuracy: Fix routed_scaling_factor Double Mul Issue (#24119)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-09-02 22:22:30 +00:00 |
|
rasmith
|
457e471971
|
[AMD][Kernel][Bugfix] Cast offsets tensor bn to tl.int64 to avoid GPU segfault (#23692)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
|
2025-09-02 22:13:57 +00:00 |
|
Thomas Parnell
|
d328f7894f
|
[CI] Enable all hf transformers baselines in test_hybrid (#23936)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-09-02 20:15:06 +00:00 |
|
Wentao Ye
|
98aee612aa
|
[Log] Only Print Profiler Results on Rank 0 (#23370)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-09-02 18:53:34 +00:00 |
|
nathan
|
598bd74cf8
|
Fix weights loading for Apertus (#24100)
Signed-off-by: Nathan Ranchin <nranchin@student.ethz.ch>
|
2025-09-02 18:34:28 +00:00 |
|
Mark McLoughlin
|
2417798471
|
[Metrics] Deprecate TPOT in favor of ITL (#24110)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-09-02 18:10:10 +00:00 |
|
Kyuyeun Kim
|
9480ae24e3
|
[Bugfix] Fix packed_factor missing attribute error (#23902)
Signed-off-by: Kyuyeun Kim <kyuyeunk@google.com>
|
2025-09-02 10:56:31 -07:00 |
|
Chenheli Hua
|
f399182e8c
|
Run ruff format on a few files. (#24075)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
|
2025-09-02 17:55:32 +00:00 |
|
Kyle Sayers
|
1c41310584
|
[Bugfix] Fix transform_config parsing in Compressed Tensors (#23945)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2025-09-02 13:54:10 -04:00 |
|
Jiangyun Zhu
|
c83c4ff815
|
[Benchmark] Add support for local hf dataset path in benchmark (#23999)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2025-09-02 17:49:16 +00:00 |
|
Peter Pan
|
0e1759cd54
|
[docs] add SYS_NICE cap & security-opt for docker/k8s (#24017)
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Signed-off-by: Peter Pan <peter.pan@daocloud.io>
Co-authored-by: Li, Jiang <bigpyj64@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-02 17:27:20 +00:00 |
|
Michael Goin
|
e66ed3e675
|
[CI Failure] Skip failing nvfp4 silu test (#23959)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-09-02 13:18:15 -04:00 |
|
wang.yuqi
|
e0653f6c0b
|
[Model] Classification models support logit_bias / sigmoid_normalize (#24031)
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-09-02 16:48:57 +00:00 |
|
Kyungmin Lee
|
38ba061f6f
|
[BugFix] Fix EXAONE4 rotary embeddings (#23918)
Signed-off-by: lkm2835 <lkm2835@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-02 14:40:55 +00:00 |
|
Nicolò Lucchesi
|
0a74e9d0f2
|
[Gemma3n] Fix audio batching (#24052)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-09-02 22:23:35 +08:00 |
|
Christian Berge
|
8bd5844989
|
correct LWS deployment yaml (#23104)
Signed-off-by: cberge908 <42270330+cberge908@users.noreply.github.com>
|
2025-09-02 12:04:59 +00:00 |
|
Aziz
|
ce30dca5c4
|
[CI]: reduce HTTP calls inside entrypoints openai tests (#23646)
Signed-off-by: AzizCode92 <azizbenothman76@gmail.com>
Signed-off-by: Aziz <azizbenothman76@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-02 10:49:32 +00:00 |
|
WeiQing Chen
|
2f0bab3f26
|
[Model] Support dp on ViT on GLM-4.5V (#23168)
Signed-off-by: David Chen <530634352@qq.com>
|
2025-09-02 10:48:18 +00:00 |
|
Didier Durand
|
fad73be1a5
|
[Doc]: fix typos in Python comments (#24077)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
|
2025-09-02 02:38:55 -07:00 |
|
Benji Beck
|
56d04089ef
|
Migrate Interns1 inputs to TensorSchema (#23510)
Signed-off-by: Benji Beck <benjibeck@meta.com>
|
2025-09-02 04:35:45 +00:00 |
|
Yan Ma
|
7be0cb8e9e
|
[XPU][Feature] fp8 online quantization support for XPU (#23148)
Signed-off-by: Yan Ma <yan.ma@intel.com>
Co-authored-by: Qiming Zhang <qiming1.zhang@intel.com>
|
2025-09-02 04:06:53 +00:00 |
|
Benji Beck
|
1fa1d6a9a0
|
Migrate OvisImagePatchInputs to TensorSchema (#22024)
Signed-off-by: Benji Beck <benjibeck@meta.com>
|
2025-09-02 12:01:36 +08:00 |
|
Maximilien de Bayser
|
d59c986444
|
Remove runtime checks based on pooling params (#24051)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
|
2025-09-02 11:54:37 +08:00 |
|
damon
|
04d0c60770
|
[Bugfix] Fix the issue that 'Blip2ForConditionalGeneration' object has… (#24028)
Signed-off-by: Dazhi Jiang <dazhi_jiang@163.com>
|
2025-09-02 11:54:20 +08:00 |
|
Asaf Joseph Gardin
|
2b41cbbf03
|
[V1][Mamba1] - FP32 SSM Kernel Support (#23506)
Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com>
|
2025-09-01 20:53:00 -07:00 |
|
Didier Durand
|
0235103cbb
|
[Doc]: fix typos in Python comments (#24042)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-09-01 19:07:45 -07:00 |
|
Lucia Fang
|
a344a5aa0a
|
[bugfix] fix MTP hidden states (#24056)
Signed-off-by: Lu Fang <fanglu@fb.com>
|
2025-09-01 21:09:37 +00:00 |
|
Woosuk Kwon
|
5685370271
|
[Chore][V0 Deprecation] Move LogProb to a separate file (#24055)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-01 12:07:53 -07:00 |
|
WeiQing Chen
|
a0e0efd6bd
|
[Model] Support DP for ViT on Kimi-VL-A3B-Thinking-2506 (#23817)
Signed-off-by: Junhong <liujunhong11@huawei.com>
Signed-off-by: LJH-LBJ <98734602+LJH-LBJ@users.noreply.github.com>
Co-authored-by: Junhong <liujunhong11@huawei.com>
Co-authored-by: LJH-LBJ <98734602+LJH-LBJ@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2025-09-01 16:56:56 +00:00 |
|
Christian Pinto
|
cf91a89dd2
|
[docs][misc] IOProcessor plugins fixes (#24046)
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
|
2025-09-01 09:17:41 -07:00 |
|
Woosuk Kwon
|
39a22dcaac
|
[Misc] Minor code simplification for spec decode (#24053)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-01 08:54:01 -07:00 |
|
Julien Debache
|
41c80698b3
|
Document multi-proc method selection for profiling (#23802)
Signed-off-by: jdebache <jdebache@nvidia.com>
|
2025-09-01 06:28:26 -07:00 |
|
Kwai-Keye
|
7c8271cd1e
|
[Model]: support KeyeVL-1_5-8B (#23838)
Signed-off-by: wangruitao <wangruitao@kuaishou.com>
Co-authored-by: wangruitao <wangruitao@kuaishou.com>
|
2025-09-01 03:50:27 -07:00 |
|
Kay Yan
|
3e330fcb21
|
[Doc]: Fix CPU install docs: force torch-backend=cpu to avoid GPU torchvision errors (#24033)
Signed-off-by: Kay Yan <kay.yan@daocloud.io>
|
2025-09-01 03:34:52 -07:00 |
|
Nicolò Lucchesi
|
d46934b229
|
[Frontend] Gemma3n audio transcriptions/translations endpoint (#23735)
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-09-01 18:07:46 +08:00 |
|
Didier Durand
|
107284959a
|
[Doc]: fix typos in Python comments (#24026)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
|
2025-09-01 09:38:20 +00:00 |
|
Jee Jee Li
|
dc1a53186d
|
[Kernel] Update DeepGEMM to latest commit (#23915)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-09-01 02:38:04 -07:00 |
|
wang.yuqi
|
55602bb2e6
|
[Frontend] Update the warning log when using VLLM_ALLOW_LONG_MAX_MODEL_LEN (#20904)
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-01 08:50:25 +00:00 |
|
Isotr0py
|
d7fbc6ddac
|
[Misc] Enable V1 FP16 inference on pre-Ampere GPUs (#24022)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-01 08:12:22 +00:00 |
|
Ning Xie
|
5438967fbc
|
[Misc] add hash_function doc string (#24014)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-08-31 23:11:20 -07:00 |
|
Code Jesus
|
422e793fa6
|
[Bugfix] Add support for <tool_call> format in streaming mode for XLAM Tool Parser (#22769)
Signed-off-by: Devon Peroutky <devon@kindo.ai>
|
2025-09-01 14:07:54 +08:00 |
|
Christian Pinto
|
1cb39dbcdd
|
[Misc] IO Processor plugins for pooling models (#22820)
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Max de Bayser <mbayser@br.ibm.com>
|
2025-08-31 23:07:12 -07:00 |
|