Commit Graph

8877 Commits

Author SHA1 Message Date
Shaoyu Yang
d637b96099 [BugFix] [Vul] Add missing usedforsecurity=False in MD5 hashing to enable FIPS (#18319)
Signed-off-by: cascade812 <cascade812@outlook.com>
Signed-off-by: shaoyuyoung <shaoyuyoung@gmail.com>
Co-authored-by: cascade <cascade812@outlook.com>
2025-05-19 01:31:23 -07:00
CYJiang
275c5daeb0 fix: Add type specifications for CLI arguments in tensorizer options (#18314) 2025-05-18 23:42:17 -07:00
Simon Mo
47fda6d089 [Build] Supports CUDA 12.6 and 11.8 after Blackwell Update (#18316)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-05-18 23:19:33 -07:00
Reid
27d0952600 [Misc] extract parser.parse_args() (#18323)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-19 04:06:26 +00:00
Nan Qin
221cfc2fea Feature/vllm/input embedding completion api (#17590)
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
Signed-off-by: Nan2018 <nan@protopia.ai>
Co-authored-by: 临景 <linjing.yx@alibaba-inc.com>
Co-authored-by: Bryce1010 <bryceyx@gmail.com>
Co-authored-by: Andrew Sansom <andrew@protopia.ai>
Co-authored-by: Andrew Sansom <qthequartermasterman@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-05-18 20:18:05 -07:00
wwl2755
9da1095daf [Spec Decode][V0] Fix spec decode correctness test in V0 eagle/medusa (#18175)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
2025-05-18 19:49:46 -07:00
Robin
d1211f8794 [Doc] Add doc to explain the usage of Qwen3 thinking (#18291)
Signed-off-by: WangErXiao <863579016@qq.com>
2025-05-18 23:04:07 +00:00
Reid
b6a6e7a529 [Misc] add litellm integration (#18320)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-18 15:32:30 +00:00
Lifu Huang
4fb349f66a Fix copy-paste error in phi4mm image processing (#18315)
Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>
2025-05-18 07:00:12 -07:00
22quinn
908733aca7 [Model] Use sigmoid for single-label classification (#18313)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-05-18 07:00:09 -07:00
Reid
1a8f68bb90 [doc] update reasoning doc (#18306)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-18 06:59:14 -07:00
cascade
9ab2c02ff8 Support sequence parallelism combined with pipeline parallelism (#18243)
Signed-off-by: cascade812 <cascade812@outlook.com>
2025-05-17 22:47:25 +00:00
Ning Xie
66e63e86ec [MISC] fix typo (#18305)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-05-17 10:52:09 -07:00
rongfu.leng
9214e60631 [Model] use AutoWeightsLoader for solar (#18113) 2025-05-17 00:24:17 -07:00
Nishidha
f880d42582 Fixed build on ppc64le due to openssl conflicts (#18262)
Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com>
2025-05-17 00:23:46 -07:00
Michael Goin
dcfe95234c Update Dockerfile to build for Blackwell (#18095) 2025-05-17 00:23:25 -07:00
Siyuan Liu
48ac2bed5b [Hardware][TPU] Optionally import for TPU backend (#18269)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
Co-authored-by: Carol Zheng <cazheng@google.com>
Co-authored-by: Jade Zheng <zheng.shoujian@outlook.com>
Co-authored-by: Hongmin Fan <fanhongmin@google.com>
2025-05-17 15:23:12 +08:00
David Ben-David
3e0d435027 [P/D][V1] Support dynamic loading of external KV connector implementations (#18142)
Signed-off-by: David Ben-David <davidb@pliops.com>
Co-authored-by: David Ben-David <davidb@pliops.com>
2025-05-17 06:40:39 +00:00
汪志鹏
4ee4826ede [BugFix] Correct max_model_len derivation from config.json for Mistral format (#17937)
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Co-authored-by: tracelogfb <48808670+tracelogfb@users.noreply.github.com>
Co-authored-by: Stephen Chen <tracelog@meta.com>
2025-05-17 04:20:13 +00:00
Reid
60017dc841 [Misc] reformat the collect-env output (#18285)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-16 19:46:18 -07:00
Trevor Royer
55f1a468d9 Move cli args docs to its own page (#18228) (#18264)
Signed-off-by: Trevor Royer <troyer@redhat.com>
2025-05-16 19:43:45 -07:00
Michael Goin
fd195b194e [V1][P/D] Local attention optimization for NIXL (#18170)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-05-16 21:16:33 -04:00
Woosuk Kwon
fabe89bbc4 [Spec Decode] Don't fall back to V0 when spec decoding is enabled (#18265) 2025-05-16 16:10:27 -07:00
Jinzhen Lin
e73b7dfd69 [Bugfix] fix an illegal memory access was encountered of marlin kernel + act_order (#18245) 2025-05-16 16:02:44 -07:00
Bowen Wang
7fdfa01530 [Sampler] Adapt to FlashInfer 0.2.3 sampler API (#15777)
Signed-off-by: Bowen Wang <abmfy@icloud.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-05-16 15:14:03 -07:00
Sanger Steel
aef94c6d07 [CI] Assign reviewer to mergify with changes to Tensorizer files (#18278) 2025-05-16 12:04:14 -07:00
Nick Hill
0ceaebf87b [BugFix] Fix ordering of KVConnector finished send/rcv sets (#18211)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-05-16 09:20:54 -07:00
Nick Hill
1db4f47f81 [BugFix] Fix multi async save in MultiConnector (#18246)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-05-16 08:13:47 -07:00
Reid
d3d91b6f71 [Misc][MacOS] fix bfloat16 error (#18249)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-16 15:05:59 +00:00
learner0810
87d871470d [Model] Use autoweightloader for dbrx (#18251)
Signed-off-by: learner0810 <zhongjun.li@daocloud.io>
2025-05-16 07:54:13 -07:00
fxmarty-amd
a5f8c111c2 [Fix] Fix typo in resolve_hf_chat_template (#18259)
Signed-off-by: Felix Marty <felmarty@amd.com>
2025-05-16 14:52:41 +00:00
Lain
e23564cb70 use ceil_div in cutlass block scaling shape check (#17918) 2025-05-16 03:02:58 -07:00
Isotr0py
390ec88905 [Misc] Consolidate Audio tests into multimodal common generation tests (#18214)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-05-16 09:18:08 +00:00
Seiji Eicher
541817670c [Misc] Add Ray Prometheus logger to V1 (#17925)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
2025-05-16 01:02:42 -07:00
Vadim Gimpelson
67da5720d4 [PERF] Speed up Qwen2.5-VL model by speed up rotary position embedding (#17973)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@centml.ai>
2025-05-15 23:31:02 -07:00
David Xia
5c04bb8b86 [doc] fix multimodal example script (#18089)
Signed-off-by: David Xia <david@davidxia.com>
2025-05-16 06:05:34 +00:00
Lucia Fang
3d2779c29a [Feature] Support Pipeline Parallism in torchrun SPMD offline inference for V1 (#17827)
Signed-off-by: Lucia Fang <fanglu@fb.com>
2025-05-15 22:28:27 -07:00
Will Eaton
6b31c84aff Throw better error for when running into k8s service discovery issue (#18209)
Signed-off-by: Will Eaton <weaton@redhat.com>
2025-05-15 21:07:28 -07:00
Harry Mellor
b18201fe06 Allow users to pass arbitrary JSON keys from CLI (#18208)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-15 21:05:34 -07:00
Sky Lee
f4937a51c1 [Model] vLLM v1 supports Medusa (#17956)
Signed-off-by: lisiqi23 <lisiqi23@xiaomi.com>
Signed-off-by: skylee-01 <497627264@qq.com>
Co-authored-by: lisiqi23 <lisiqi23@xiaomi.com>
2025-05-15 21:05:31 -07:00
kliuae
ee659e3b60 [Bugfix][ROCm] Use chunked_prefill_paged_decode as fallback for V1 attention on ROCm (#18093)
Signed-off-by: kf <kuanfu.liu@embeddedllm.com>
2025-05-15 19:30:17 -07:00
Lucas Wilkinson
4e1c6a0264 [Bugfix] fix rotary embedding test for _get_padded_tensor_shape (#18229)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2025-05-16 01:32:45 +00:00
Lucas Wilkinson
c7852a6d9b [Build] Allow shipping PTX on a per-file basis (#18155)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2025-05-15 16:41:55 -07:00
Lucia Fang
8795eb9975 [Bugfix] Fix test_eagle test (#18223)
Signed-off-by: Lucia Fang <fanglu@fb.com>
2025-05-15 15:59:42 -07:00
Alexei-V-Ivanov-AMD
0b34593017 Adding "AMD: Tensorizer Test" to amdproduction. (#18216) 2025-05-15 11:01:25 -07:00
Nicolò Lucchesi
e3f3aee6f4 [Misc] Avoid cuda graph log when sizes still match (#18202)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-05-15 09:59:38 -07:00
TJian
92540529c0 [Bugfix] [ROCm]: Remove assertion logic when using AITER fused moe in unquantizedMethod to reenable LLama4 BF16 (#18205)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-05-15 09:53:18 -07:00
Zhonghua Deng
fadb8d5c2d [Bugfix]Change the exception thrown by call_hf_processor from RuntimeError to ValueError (#18181)
Signed-off-by: Abatom <abzhonghua@gmail.com>
2025-05-15 09:01:47 -07:00
Sebastian Schoennenbeck
2aa5470ac5 [Frontend] Fix chat template content format detection (#18190)
Signed-off-by: Sebastian Schönnenbeck <sebastian.schoennenbeck@comma-soft.com>
2025-05-15 09:00:21 -07:00
Harry Mellor
51ff154639 Improve examples rendering in docs and GitHub (#18203)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-15 15:57:49 +00:00