Zebing Lin
82dfb12e52
[Core] Use sha256 bytes instead of BlockHash to reduce GC overhead (#23673)
Signed-off-by: linzebing <linzebing1995@gmail.com>
2025-09-08 21:34:37 -07:00
zhiweiz
170129eb28
[gpt-oss] Harmony changes with container tool support (#23386)
Signed-off-by: zhiweiz <zhiweiz@fb.com>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
Co-authored-by: zhiweiz <zhiweiz@fb.com>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
2025-09-08 19:03:50 -07:00
Woosuk Kwon
4172235ab7
[V0 deprecation] Deprecate V0 Neuron backend (#21159)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-06 16:15:18 -07:00
Bangsheng Tang
848562bd49
break execute_model in gpu_model_runner into sub-functions for custom scopes (#24265)
Co-authored-by: Bangsheng Tang <bangsheng@meta.com>
2025-09-06 14:02:47 -07:00
Didier Durand
35bf193864
[Doc]: fix typos in Python comments (#24294)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-09-05 19:41:12 -07:00
Lucas Wilkinson
402759d472
[Attention] FlashAttn MLA (#14258)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
2025-09-04 02:47:59 -07:00
Divakar Verma
04d1dd7f4a
[ROCm][Aiter] Add triton fp8 bmm kernel for mla (#23264)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
Co-authored-by: ShaoChunLee <Shao-Chun.Lee@amd.com>
2025-08-28 18:18:08 +00:00
Wentao Ye
321938e9ac
[Feature] Add VLLM_DISABLE_PAD_FOR_CUDAGRAPH to Avoid Hang Issue (#23595)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-27 21:52:24 +00:00
Wentao Ye
3af47c3cc6
[Feature] Add Hopper DeepGEMM E8M0 for DeepSeekV3.1 scale_fmt (#23666)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-08-27 14:09:08 +00:00
nvjullin
7ea22e42d5
[Misc] Add override for allreduce fusion thresholds (#23639)
Signed-off-by: Julien Lin <jullin@nvidia.com>
2025-08-26 15:53:04 +00:00
Xin Yang
8a3cd90af5
[Kernel] Add fused grouped_topk kernel for MoE (#23274)
Signed-off-by: Xin Yang <xyangx@amazon.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-08-25 11:47:52 -07:00
Ilya Markov
0313cf854d
[PERF] PyTorch Symmetric Memory All-Reduce (#20759)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-08-22 15:39:08 -06:00
Shiyan Deng
da65bec309
add an env var for path to pre-downloaded flashinfer cubin files (#22675)
2025-08-22 19:25:45 +00:00
Pavani Majety
1d353b6352
[Core] Always use tensor cores for Flashinfer Decode Wrapper (#23214)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
2025-08-21 16:02:11 -04:00
Ning Xie
3496274663
[Misc] Convert VLLM_TORCH_PROFILER_DIR path to absolute (#23191)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-08-21 15:49:09 -04:00
Chenheli Hua
e58c5a9768
[Core] Add torch profiler CPU traces for AsyncLLM. (#21794)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
2025-08-20 02:32:47 +00:00
nvjullin
79899b63f6
[Bugfix] Added more env vars to hash (#22449)
Signed-off-by: Julien Lin <jullin@nvidia.com>
2025-08-15 20:08:37 +00:00
Csrayz
a0632a3e03
[Frontend] Expose do_log_stats interval to env (#22905)
Signed-off-by: Csrayz <jover@cmbchina.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-15 13:00:20 +00:00
nvjullin
279a5f31b3
[Kernel] Add nvfp4 gemm flashinfer backends (#22346)
Signed-off-by: Julien Lin <jullin@nvidia.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-08-14 16:03:55 -04:00
Jinzhen Lin
33c63e9547
[Kernel] [Quantization] Add MXFP4 and bias support for marlin kernel (#22428)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Signed-off-by: Huzaifa Sidhpurwala <huzaifas@redhat.com>
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Animesh Jain <anijain@umich.edu>
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: kf <kuanfu.liu@embeddedllm.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: tjtanaavllm <tunjian.tan@amd.com>
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
Signed-off-by: Roger Wang <hey@rogerw.me>
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@centml.ai>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Signed-off-by: Chih-Chieh Yang <7364402+cyang49@users.noreply.github.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Signed-off-by: yan <yan.ma@intel.com>
Signed-off-by: Yan Ma <yan.ma@intel.com>
Signed-off-by: Xiao Liu <xiszishu@gmail.com>
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
Signed-off-by: LopezCastroRoberto <roberto.lopez.castro@udc.es>
Signed-off-by: Andy Xie <andy.xning@gmail.com>
Signed-off-by: Haibin Lin <haibin.lin@bytedance.com>
Signed-off-by: David Ben-David <davidb@pliops.com>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com>
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Signed-off-by: Abirdcfly <fp544037857@gmail.com>
Signed-off-by: Giancarlo Delfin <gdelfin@meta.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: huangweixiao <huangweixiao@msh.team>
Signed-off-by: alyosha-swamy <raghav@arcee.ai>
Signed-off-by: Eric Hanley <ericehanley@google.com>
Signed-off-by: Abatom <abzhonghua@gmail.com>
Signed-off-by: CLFutureX <775523362@qq.com>
Signed-off-by: Linkun Chen <github@lkchen.net>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Signed-off-by: tlipoca9 <tlipoca9@gmail.com>
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com>
Signed-off-by: mgoin <michael@neuralmagic.com>
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: Benji Beck <benjibeck@meta.com>
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
Signed-off-by: isotr0py <2037008807@qq.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: simon-mo <xmo@berkeley.edu>
Signed-off-by: LucasWilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Zhang Jason <ning.zhang2@amd.com>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Signed-off-by: asafg <asafg@ai21.com>
Signed-off-by: Siyuan Fu <siyuanf@nvidia.com>
Signed-off-by: Lain <fusiyuan2000@hotmail.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: QscQ <qscqesze@gmail.com>
Signed-off-by: qingjun <qingjun@minimaxi.com>
Signed-off-by: Syed Muhammad Bin Asif <syedmba7@connect.hku.hk>
Signed-off-by: Lionel Villard <villard@us.ibm.com>
Signed-off-by: ycyaw66 <497410282@qq.com>
Signed-off-by: David Chen <530634352@qq.com>
Signed-off-by: Linkun <github@lkchen.net>
Signed-off-by: Moritz Sanft <58110325+msanft@users.noreply.github.com>
Signed-off-by: Ming Yang <minos.future@gmail.com>
Signed-off-by: Adrian Garcia <adrian.garcia@inceptionai.ai>
Signed-off-by: shaojunqi <shaojunqi.sjq@alibaba-inc.com>
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
Signed-off-by: Andrew Chan <andrewkchan.akc@gmail.com>
Signed-off-by: Felix Marty <Felix.Marty@amd.com>
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
Signed-off-by: Shu Wang <shuw@nvidia.com>
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
Signed-off-by: Shu Wang. <shuw@nvidia.com>
Signed-off-by: XIn Li <xinli@nvidia.com>
Signed-off-by: Junhao Li <junhao@ubicloud.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: iAmir97 <Amir.balwel@embeddedllm.com>
Signed-off-by: iAmir97 <71513472+iAmir97@users.noreply.github.com>
Signed-off-by: <zyy1102000@gmail.com>
Signed-off-by: Guy Stone <guys@spotify.com>
Signed-off-by: <yyweiss@gmail.com>
Signed-off-by: yyw <yyweiss@gmail.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: Pradyun Ramadorai <pradyunr@amazon.com>
Signed-off-by: Pradyun92 <142861237+Pradyun92@users.noreply.github.com>
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
Co-authored-by: rongfu.leng <rongfu.leng@daocloud.io>
Co-authored-by: Huzaifa Sidhpurwala <huzaifas@redhat.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Animesh Jain <jainanimesh2305@yahoo.com>
Co-authored-by: Rui Qiao <161574667+ruisearch42@users.noreply.github.com>
Co-authored-by: XiongfeiWei <isaacwxf23@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: JartX <sagformas@gmail.com>
Co-authored-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: kf <kuanfu.liu@embeddedllm.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Co-authored-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: tjtanaavllm <tunjian.tan@amd.com>
Co-authored-by: Yong Hoon Shin <48474650+sarckk@users.noreply.github.com>
Co-authored-by: Chih-Chieh Yang <7364402+cyang49@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.me>
Co-authored-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com>
Co-authored-by: Yuxuan Zhang <2448370773@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Yan Ma <yan.ma@intel.com>
Co-authored-by: Xiao <xiszishu@gmail.com>
Co-authored-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>
Co-authored-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
Co-authored-by: Ning Xie <andy.xning@gmail.com>
Co-authored-by: H <linhaibin.eric@gmail.com>
Co-authored-by: David Ben-David <sdavidbd@gmail.com>
Co-authored-by: David Ben-David <davidb@pliops.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
Co-authored-by: TankNee <nee@tanknee.cn>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Seiji Eicher <58963096+eicherseiji@users.noreply.github.com>
Co-authored-by: ZiTian.Zhao <zitian.zhao@tencentmusic.com>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: Abirdcfly <fp544037857@gmail.com>
Co-authored-by: Giancarlo Delfin <32987265+TheEpicDolphin@users.noreply.github.com>
Co-authored-by: Chenxi Yang <cxyang@cs.utexas.edu>
Co-authored-by: Chenxi Yang <cxyang@meta.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Weixiao Huang <hwx.simle@gmail.com>
Co-authored-by: Raghav Ravishankar <113712354+alyosha-swamy@users.noreply.github.com>
Co-authored-by: ericehanley <ericehanley@google.com>
Co-authored-by: Zhonghua Deng <abzhonghua@gmail.com>
Co-authored-by: Po-Han Huang (NVIDIA) <53919306+nvpohanh@users.noreply.github.com>
Co-authored-by: PiteXChen <44110731+CLFutureX@users.noreply.github.com>
Co-authored-by: lkchen <github@lkchen.net>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>
Co-authored-by: tlipoca9 <160737620+tlipoca9@users.noreply.github.com>
Co-authored-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: wang.yuqi <noooop@126.com>
Co-authored-by: Benji Beck <benjibeck@meta.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Siyuan Liu <lsiyuan@google.com>
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>
Co-authored-by: Minseok Lee <47620120+minseokl@users.noreply.github.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Zhang Jason <ning.zhang2@amd.com>
Co-authored-by: Asaf Joseph Gardin <39553475+Josephasafg@users.noreply.github.com>
Co-authored-by: asafg <asafg@ai21.com>
Co-authored-by: Lain <siyuanf@nvidia.com>
Co-authored-by: tc-mb <157115220+tc-mb@users.noreply.github.com>
Co-authored-by: imning3 <hbning@pku.edu.cn>
Co-authored-by: Maximilien de Bayser <mbayser@br.ibm.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Tao He <linzhu.ht@alibaba-inc.com>
Co-authored-by: qscqesze <qingjun@minimaxi.com>
Co-authored-by: Syed Muhammad Bin Asif <92625830+syedmba@users.noreply.github.com>
Co-authored-by: Lionel Villard <villard@us.ibm.com>
Co-authored-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com>
Co-authored-by: ycyaw66 <497410282@qq.com>
Co-authored-by: Moritz Sanft <58110325+msanft@users.noreply.github.com>
Co-authored-by: Ming Yang <minos.future@gmail.com>
Co-authored-by: Adrián García García <adrigarvk8@gmail.com>
Co-authored-by: Michael Goin <mgoin@redhat.com>
Co-authored-by: JaceyShao <65159281+JaceyShao@users.noreply.github.com>
Co-authored-by: shaojunqi <shaojunqi.sjq@alibaba-inc.com>
Co-authored-by: Ricardo Decal <crypdick@users.noreply.github.com>
Co-authored-by: Andrew Chan <andrewkchan.akc@gmail.com>
Co-authored-by: fxmarty-amd <felmarty@amd.com>
Co-authored-by: Andrew Sansom <andrew@protopia.ai>
Co-authored-by: Zhiyu <zhiyuc@nvidia.com>
Co-authored-by: Shu Wang <shuw@nvidia.com>
Co-authored-by: XIn Li <xinli@nvidia.com>
Co-authored-by: Junhao Li <streaver91@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: iAmir97 <71513472+iAmir97@users.noreply.github.com>
Co-authored-by: iAmir97 <Amir.balwel@embeddedllm.com>
Co-authored-by: Hong Hanh <hanh.usth@gmail.com>
Co-authored-by: Daniel Serebrenik <74646983+pliops-daniels@users.noreply.github.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Guy Stone <guys@spotify.com>
Co-authored-by: yyweiss <70619747+yyweiss@users.noreply.github.com>
Co-authored-by: Pradyun92 <142861237+Pradyun92@users.noreply.github.com>
Co-authored-by: Pradyun Ramadorai <pradyunr@amazon.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
2025-08-14 11:23:22 -07:00
milesial
20d65aa755
[Frontend] Multithreaded async multimodal load_bytes (#22710)
Signed-off-by: Alexandre Milesi <30204471+milesial@users.noreply.github.com>
Co-authored-by: Alexandre Milesi <30204471+milesial@users.noreply.github.com>
2025-08-13 06:09:26 -07:00
Chi Zhang
98deac3879
[FEATURE] support custom vllm tuned config path for fused moe triton kernels (#22791)
Signed-off-by: Chi Zhang <zhangchi.usc1992@bytedance.com>
2025-08-13 20:27:25 +08:00
Wentao Ye
f7dcce7a4a
[Feature] Add VLLM_USE_DEEP_GEMM_E8M0 Env to Control E8M0 Scale (#21968)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-08-11 09:39:08 -07:00
Doug Smith
d1af8b7be9
enable Docker-aware precompiled wheel setup (#22106)
Signed-off-by: dougbtv <dosmith@redhat.com>
2025-08-10 16:29:02 -07:00
Shu Wang
a3b9c17b56
Support Tensorrt-LLM MoE fp4 for low-latency (#21331)
Signed-off-by: Shu Wang <shuw@nvidia.com>
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
Signed-off-by: Shu Wang. <shuw@nvidia.com>
Signed-off-by: XIn Li <xinli@nvidia.com>
Co-authored-by: XIn Li <xinli@nvidia.com>
2025-08-07 19:18:22 -07:00
Cyrus Leung
139d155781
[Frontend] Use engine argument to control MM cache size (#22441)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-07 09:47:10 -07:00
Cyrus Leung
766bc8162c
[Core] Store only the keys for multi-modal data in P0 (#22198)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-07 01:45:04 -07:00
Ming Yang
82216dc21f
[Misc] Support routing logic simulation (#21990)
Signed-off-by: Ming Yang <minos.future@gmail.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-06 23:06:20 -07:00
Lain
9a3835aaa9
Fix trtllm-gen attention env and add attention sink (#22378)
Signed-off-by: Siyuan Fu <siyuanf@nvidia.com>
Signed-off-by: Lain <fusiyuan2000@hotmail.com>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
2025-08-06 18:07:41 -07:00
Yongye Zhu
31f09c615f
[gpt-oss] flashinfer mxfp4 (#22339)
Signed-off-by: simon-mo <xmo@berkeley.edu>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
2025-08-06 12:37:27 -07:00
Woosuk Kwon
6e20924350
Add attention sink in attention backends (#22320)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>
Co-authored-by: Minseok Lee <47620120+minseokl@users.noreply.github.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
2025-08-05 22:37:21 -07:00
Wentao Ye
ae87ddd040
[Refactor] Remove Unused Environment Variable VLLM_NO_DEPRECATION_WARNING (#22199)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-08-05 09:40:23 -07:00
elvischenv
83156c7b89
[NVIDIA] Support Flashinfer TRT-LLM Prefill Attention Kernel (#22095)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
2025-08-05 02:45:34 -07:00
Woosuk Kwon
9af654cc38
[Responses API] Ignore store=True and process the request by default (#22185)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-04 05:12:48 -07:00
Woosuk Kwon
6d98843b31
[Responses API] Disable response store by default (#22137)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-03 04:04:21 -07:00
Varun Sundar Rabindranath
a65f46be5e
[Misc] DeepGemmExperts : Avoid JIT generation in the hot-path (#21955)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-08-01 19:42:03 -07:00
Nicolò Lucchesi
57393715e8
[Misc] VLLM_TARGET_DEVICE.lower() (#22101)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-08-01 19:41:40 -07:00
Rui Qiao
d331759488
Introduce RayPPCommunicator for ray-based PP (#21660)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2025-08-01 11:50:58 -07:00
Simon Mo
da31f6ad3d
Revert precompile wheel changes (#22055)
2025-08-01 08:26:24 +00:00
wenxindongwork
8f0d516715
[TPU] Support Pathways in vLLM (#21417)
Signed-off-by: wenxindongwork <wenxindong@google.com>
2025-07-30 10:02:12 -07:00
youkaichao
e91d3c9cda
[misc] skip p2p check by default (#21904)
2025-07-30 22:05:04 +08:00
Csrayz
b917da442b
Expose PyTorch profiler configuration to environment variables (#21803)
Signed-off-by: Csrayz <33659823+Csrayz@users.noreply.github.com>
2025-07-29 19:46:31 -07:00
Doug Smith
a1873db23d
docker: docker-aware precompiled wheel support (#21127)
Signed-off-by: dougbtv <dosmith@redhat.com>
2025-07-29 14:45:19 -07:00
Lucas Wilkinson
8aa1485fcf
[Perf] Disable chunked local attention by default with llama4 (#21761)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-07-28 18:49:04 -04:00
Chauncey
6da0078523
[Feat] Allow custom naming of vLLM processes (#21445)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-07-24 03:15:23 -07:00
deven-labovitch
63d92abb7c
[Frontend] Set MAX_AUDIO_CLIP_FILESIZE_MB via env var instead of hardcoding (#21374)
Signed-off-by: Deven Labovitch <deven@videa.ai>
2025-07-23 20:22:19 -07:00
Michael Goin
f3137cdd81
[Core] Freeze gc during cuda graph capture to speed up init (#21146)
Signed-off-by: Codex <codex@openai.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-07-23 17:20:14 -07:00
Li, Jiang
a15a50fc17
[CPU] Enable shared-memory based pipeline parallel for CPU backend (#21289)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-07-21 09:07:08 -07:00
Li, Jiang
e3a0e43d7f
[bugfix] Fix auto thread-binding when world_size > 1 in CPU backend and refactor code (#21032)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-07-19 05:13:55 -07:00
Kaixi Hou
6d0734c562
[NVIDIA] Add SM100 Flashinfer MoE blockscale fp8 backend for low latency (#20645)
Signed-off-by: kaixih <kaixih@nvidia.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-07-19 02:33:01 -07:00