Benji Beck
56d04089ef
Migrate Interns1 inputs to TensorSchema (#23510)
Signed-off-by: Benji Beck <benjibeck@meta.com>
2025-09-02 04:35:45 +00:00

Yan Ma
7be0cb8e9e
[XPU][Feature] fp8 online quantization support for XPU (#23148)
Signed-off-by: Yan Ma <yan.ma@intel.com>
Co-authored-by: Qiming Zhang <qiming1.zhang@intel.com>
2025-09-02 04:06:53 +00:00

Benji Beck
1fa1d6a9a0
Migrate OvisImagePatchInputs to TensorSchema (#22024)
Signed-off-by: Benji Beck <benjibeck@meta.com>
2025-09-02 12:01:36 +08:00

Maximilien de Bayser
d59c986444
Remove runtime checks based on pooling params (#24051)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2025-09-02 11:54:37 +08:00

damon
04d0c60770
[Bugfix] Fix the issue that Blip2ForConditionalGeneration' object has… (#24028)
Signed-off-by: Dazhi Jiang <dazhi_jiang@163.com>
2025-09-02 11:54:20 +08:00

Asaf Joseph Gardin
2b41cbbf03
[V1][Mamba1] - FP32 SSM Kernel Support (#23506)
Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com>
2025-09-01 20:53:00 -07:00

Didier Durand
0235103cbb
[Doc]: fix typos in Python comments (#24042)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-01 19:07:45 -07:00

Lucia Fang
a344a5aa0a
[bugfix]fix MTP hidden states (#24056)
Signed-off-by: Lu Fang <fanglu@fb.com>
2025-09-01 21:09:37 +00:00

Woosuk Kwon
5685370271
[Chore][V0 Deprecation] Move LogProb to a separate file (#24055)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-01 12:07:53 -07:00

WeiQing Chen
a0e0efd6bd
[Model] Support DP for ViT on Kimi-VL-A3B-Thinking-2506 (#23817)
Signed-off-by: Junhong <liujunhong11@huawei.com>
Signed-off-by: LJH-LBJ <98734602+LJH-LBJ@users.noreply.github.com>
Co-authored-by: Junhong <liujunhong11@huawei.com>
Co-authored-by: LJH-LBJ <98734602+LJH-LBJ@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-09-01 16:56:56 +00:00

Christian Pinto
cf91a89dd2
[docs][misc] IOProcessor plugins fixes (#24046)
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
2025-09-01 09:17:41 -07:00

Woosuk Kwon
39a22dcaac
[Misc] Minor code simplification for spec decode (#24053)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-01 08:54:01 -07:00

Julien Debache
41c80698b3
Document multi-proc method selection for profiling (#23802)
Signed-off-by: jdebache <jdebache@nvidia.com>
2025-09-01 06:28:26 -07:00

Kwai-Keye
7c8271cd1e
[Model]: support KeyeVL-1_5-8B (#23838)
Signed-off-by: wangruitao <wangruitao@kuaishou.com>
Co-authored-by: wangruitao <wangruitao@kuaishou.com>
2025-09-01 03:50:27 -07:00

Kay Yan
3e330fcb21
[Doc]: Fix CPU install docs: force torch-backend=cpu to avoid GPU torchvision errors (#24033)
Signed-off-by: Kay Yan <kay.yan@daocloud.io>
2025-09-01 03:34:52 -07:00

Nicolò Lucchesi
d46934b229
[Frontend] Gemma3n audio transcriptions/translations endpoint (#23735)
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-09-01 18:07:46 +08:00

Didier Durand
107284959a
[Doc]: fix typos in Python comments (#24026)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-09-01 09:38:20 +00:00

Jee Jee Li
dc1a53186d
[Kernel] Update DeepGEMM to latest commit (#23915)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-09-01 02:38:04 -07:00

wang.yuqi
55602bb2e6
[Frontend] Update the warning log when using VLLM_ALLOW_LONG_MAX_MODEL_LEN (#20904)
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-01 08:50:25 +00:00

Isotr0py
d7fbc6ddac
[Misc] Enable V1 FP16 inference on pre-Ampere GPUs (#24022)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-01 08:12:22 +00:00

Ning Xie
5438967fbc
[Misc] add hash_function doc string (#24014)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-08-31 23:11:20 -07:00

Code Jesus
422e793fa6
[Bugfix] Add support for <tool_call> format in streaming mode for XLAM Tool Parser (#22769)
Signed-off-by: Devon Peroutky <devon@kindo.ai>
2025-09-01 14:07:54 +08:00

Christian Pinto
1cb39dbcdd
[Misc] IO Processor plugins for pooling models (#22820)
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Max de Bayser <mbayser@br.ibm.com>
2025-08-31 23:07:12 -07:00

Benji Beck
437c3ce026
Migrate Phi4 inputs to TensorSchema (#23471)
Signed-off-by: Benji Beck <benjibeck@meta.com>
2025-09-01 14:05:59 +08:00

Ning Xie
499b074bfd
[Misc] refactor code by import as for torch._inductor.config (#23677)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-09-01 14:05:42 +08:00

Isotr0py
ff0e59d83a
[CI/Build] Improve Tensor Schema tests speed by avoid engine core initialization (#23357)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-08-31 22:52:20 -07:00

Woosuk Kwon
b55713683c
[Misc] Move fast prefill logic to separate method (#24013)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-01 05:40:38 +00:00

Jun-Howie
acc1a6e10a
Fix the bug related to loading GPTP INT3 weights. (#23328)
Signed-off-by: JunHowie <JunHowie@aliyun.com>
Co-authored-by: JunHowie <JunHowie@aliyun.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-01 05:39:57 +00:00

Woosuk Kwon
8c742a66d1
[Misc] Avoid redundant copy for encoder-only models (#24012)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-01 04:02:43 +00:00

JartX
183a70967a
[BUGFIX] GPTQ quantization compatibility for Qwen3 MOE models (AutoGPTQ and AutoRound-GPTQ) (#23994)
Signed-off-by: JartX <sagformas@epdcenter.es>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-01 03:33:40 +00:00

Or Ozeri
14b4326b94
v1: Support KV events from connectors (#19737)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-09-01 01:13:21 +00:00

Nick Hill
752d2e1c36
[Minor] Fix some random typos in comments (#24009)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-08-31 16:42:17 -07:00

Xiaodong Wang
81eea3d348
vllm fix check on max vocab size (#22471)
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Roger Wang <hey@rogerw.me>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.me>
2025-08-31 20:57:05 +08:00

Didier Durand
9701352e4b
[Doc]: fix typos in Python comments (#24001)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-08-31 08:21:59 +00:00

Roger Wang
749be00a98
[Core][Multimodal] Allow passing multi_modal_uuids as multimodal identifiers. (#23394)
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-08-30 18:01:22 -07:00

Gabriel Marinho
5b8077b8ac
Fix wrong truncate_prompt_tokens type hint (#22761)
Signed-off-by: Gabriel Marinho <gmarinho@ibm.com>
Signed-off-by: Gabriel Marinho <104592062+gmarinho2@users.noreply.github.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Max de Bayser <mbayser@br.ibm.com>
2025-08-30 20:39:38 +00:00

Andy Lo
038e9be4eb
[LoRA] Much faster startup when LoRA is enabled (#23777)
Signed-off-by: Andy Lo <andy@mistral.ai>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-08-30 15:37:39 +00:00

Ning Xie
68a349114f
[Misc] enhance type hint for rearrange return value (#23519)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-08-30 06:43:33 -07:00

Ning Xie
e80bca309e
[Refactor] refactor freezing_value/cuda_event initialize outside try finally (#23758)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-08-30 06:42:25 -07:00

Ning Xie
fb4983e112
[Misc] add reorder_batch AttentionMetadataBuilder (#23798)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-08-30 06:41:45 -07:00

sadegh.shokatian
379ea2823a
Add LoRA support for DeepSeek models (V2, V3, R1-0528) (#23971)
Signed-off-by: sadeghja1070 <sadegh.ja1070@gmail.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-08-30 06:40:02 -07:00

Jiangyun Zhu
3a6acad431
[Model] Enable encoder DP for MiniCPM-V (#23948)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-08-30 06:31:26 -07:00

Ning Xie
5490d633ce
[UT] fix unify_kv_cache_configs when kv cache config needs sort (#23843)
2025-08-30 11:22:14 +00:00

Jee Jee Li
628d00cd7b
[Bugfix] Fix test_lora_resolvers.py (#23984)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-08-30 11:16:11 +00:00

Thomas Parnell
4071c76cf3
[V1] [Hybrid] Move MiniMaxLinearAttention into layers/mamba (#23831)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-08-30 00:16:15 -07:00

Cyrus Leung
f1bddbd852
[Core] Cleanup TPU model runner for MM (#23894)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-30 00:14:58 -07:00

Yong Hoon Shin
9748c5198b
[CI] Fix broken compile tests due to unsupported SiluMul+Nvfp4Quant fusion (#23973)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-08-30 00:14:43 -07:00

Roger Wang
ee52a32705
[CI] Move testing image from remote URL to S3 (#23980)
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-08-29 21:41:25 -07:00

Xin Yang
8fb85b7bb6
Add routed_scaling_factor to MoE grouped topk (#23123)
Signed-off-by: Xin Yang <xyangx@amazon.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-08-29 21:36:48 -07:00

dubejf
5b31cb1781
[Bugfix] Fix --config arg expansion called from api_server.py (#23944)
Signed-off-by: Jean-Francois Dube <dubejf+gh@gmail.com>
Co-authored-by: Jean-Francois Dube <dubejf+gh@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-08-29 21:36:39 -07:00