Pavani Majety
|
b02fd288b2
|
[Hardware][NV] Fix Modelopt model loading for k-v-scales for Llama models. (#11787)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
|
2025-01-29 01:46:12 -08:00 |
|
Alphi
|
d93bf4da85
|
[Model] Refactoring of MiniCPM-V and add MiniCPM-o-2.6 support for vLLM (#12069)
Signed-off-by: hzh <hezhihui_thu@163.com>
Signed-off-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com>
Signed-off-by: shaochangxu.scx <shaochangxu.scx@antgroup.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
Signed-off-by: Akshat Tripathi <akshat@krai.ai>
Signed-off-by: Oleg Mosalov <oleg@krai.ai>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu>
Signed-off-by: Chenguang Li <757486878@qq.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Shanshan Shen <467638484@qq.com>
Signed-off-by: elijah <f1renze.142857@gmail.com>
Signed-off-by: Yikun <yikunkero@gmail.com>
Signed-off-by: mgoin <michael@neuralmagic.com>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
Co-authored-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com>
Co-authored-by: shaochangxu <85155497+shaochangxu@users.noreply.github.com>
Co-authored-by: shaochangxu.scx <shaochangxu.scx@antgroup.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: sixgod <evethwillbeok@outlook.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Rafael Vasquez <rafvasq21@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Akshat Tripathi <Akshat.tripathi6568@gmail.com>
Co-authored-by: Oleg Mosalov <oleg@krai.ai>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Avshalom Manevich <12231371+avshalomman@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
Co-authored-by: Yangcheng Li <liyangcheng.lyc@alibaba-inc.com>
Co-authored-by: Siyuan Li <94890248+liaoyanqing666@users.noreply.github.com>
Co-authored-by: Concurrensee <yida.wu@amd.com>
Co-authored-by: Chenguang Li <757486878@qq.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Alex Brooks <alex.brooks@ibm.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Shanshan Shen <467638484@qq.com>
Co-authored-by: elijah <30852919+e1ijah1@users.noreply.github.com>
Co-authored-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: Steve Luo <36296769+SunflowerAries@users.noreply.github.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Konrad Zawora <kzawora@habana.ai>
Co-authored-by: TJian <tunjian1996@gmail.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: maang-h <55082429+maang-h@users.noreply.github.com>
Co-authored-by: Elfie Guo <164945471+elfiegg@users.noreply.github.com>
Co-authored-by: Rui Qiao <161574667+ruisearch42@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2025-01-29 09:24:59 +00:00 |
|
Travis Johnson
|
036ca94c25
|
[Bugfix] handle alignment of arguments in convert_sparse_cross_attention_mask_to_dense (#12347)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Wallas Santos <wallashss@ibm.com>
Co-authored-by: Wallas Santos <wallashss@ibm.com>
|
2025-01-29 08:54:35 +00:00 |
|
Michael Goin
|
bd02164cf9
|
Bugfix for whisper quantization due to fake k_proj bias (#12524)
Signed-off-by: mgoin <michael@neuralmagic.com>
|
2025-01-29 04:49:03 +00:00 |
|
Cyrus Leung
|
8f58a51358
|
[VLM] Merged multi-modal processor and V1 support for Qwen-VL (#12504)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-01-28 16:25:05 +00:00 |
|
Harry Mellor
|
823ab79633
|
Update pre-commit hooks (#12475)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-01-27 17:23:08 -07:00 |
|
Bowen Wang
|
2bc3fbba0c
|
[FlashInfer] Upgrade to 0.2.0 (#11194)
Signed-off-by: Bowen Wang <abmfy@icloud.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2025-01-27 18:19:24 +00:00 |
|
Isotr0py
|
ce69f7f754
|
[Bugfix] Fix gpt2 GGUF inference (#12467)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-01-27 18:31:49 +08:00 |
|
Cyrus Leung
|
5204ff5c3f
|
[Bugfix] Fix Granite 3.0 MoE model loading (#12446)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-01-26 21:26:44 -08:00 |
|
Tyler Michael Smith
|
aa2cd2c43d
|
[Bugfix] Disable w16a16 2of4 sparse CompressedTensors24 (#12417)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
|
2025-01-26 19:59:58 +08:00 |
|
Divakar Verma
|
bf21481dde
|
[ROCm][MoE] MI300 tuned configs Mixtral-8x(7B,22B) | fp16, fp8 (#12408)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
|
2025-01-25 12:17:19 +08:00 |
|
Cyrus Leung
|
fb30ee92ee
|
[Bugfix] Fix BLIP-2 processing (#12412)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-01-25 11:42:42 +08:00 |
|
Russell Bryant
|
d3d6bb13fb
|
Set weights_only=True when using torch.load() (#12366)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-01-24 02:17:30 +00:00 |
|
Dipika Sikka
|
eb5cb5e528
|
[BugFix] Fix parameter names and process_after_weight_loading for W4A16 MoE Group Act Order (#11528)
Signed-off-by: ElizaWszola <eliza@neuralmagic.com>
Co-authored-by: ElizaWszola <eliza@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
|
2025-01-23 21:40:33 +00:00 |
|
Gregory Shtrasberg
|
e97f802b2d
|
[FP8][Kernel] Dynamic kv cache scaling factors computation (#11906)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>
|
2025-01-23 18:04:03 +00:00 |
|
Isotr0py
|
c5b4b11d7f
|
[Bugfix] Fix k_proj's bias for whisper self attention (#12342)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-01-23 10:15:33 +00:00 |
|
rasmith
|
68c4421b6d
|
[AMD][Quantization] Add TritonScaledMMLinearKernel since int8 is broken for AMD (#12282)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
|
2025-01-23 00:10:37 +00:00 |
|
Jee Jee Li
|
84bee4bd5c
|
[Misc] Improve the readability of BNB error messages (#12320)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-01-22 16:56:54 +00:00 |
|
Cyrus Leung
|
6609cdf019
|
[Doc] Add docs for prompt replacement (#12318)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-01-22 14:56:29 +00:00 |
|
Roger Wang
|
16366ee8bb
|
[Bugfix][VLM] Fix mixed-modality inference backward compatibility for V0 (#12313)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2025-01-22 21:06:36 +08:00 |
|
zhou fan
|
528dbcac7d
|
[Model][Bugfix]: correct Aria model output (#12309)
Signed-off-by: xffxff <1247714429@qq.com>
|
2025-01-22 11:39:19 +00:00 |
|
Cyrus Leung
|
cd7b6f0857
|
[VLM] Avoid unnecessary tokenization (#12310)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-01-22 11:08:31 +00:00 |
|
Kevin H. Luu
|
64ea24d0b3
|
[ci/lint] Add back default arg for pre-commit (#12279)
Signed-off-by: kevin <kevin@anyscale.com>
|
2025-01-22 01:15:27 +00:00 |
|
Cyrus Leung
|
df76e5af26
|
[VLM] Simplify post-processing of replacement info (#12269)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-01-21 16:48:13 -08:00 |
|
wangxiyuan
|
fa9ee08121
|
[Misc] Set default backend to SDPA for get_vit_attn_backend (#12235)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2025-01-21 11:52:11 -08:00 |
|
Cyrus Leung
|
f2e9f2a3be
|
[Misc] Remove redundant TypeVar from base model (#12248)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-01-21 08:40:39 +00:00 |
|
Jee Jee Li
|
1f1542afa9
|
[Misc] Add BNB quantization for PaliGemmaForConditionalGeneration (#12237)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-01-21 07:49:08 +00:00 |
|
Cyrus Leung
|
96912550c8
|
[Misc] Rename MultiModalInputsV2 -> MultiModalInputs (#12244)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-01-21 07:31:19 +00:00 |
|
Nicolò Lucchesi
|
5fe6bf29d6
|
[BugFix] Fix GGUF tp>1 when vocab_size is not divisible by 64 (#12230)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-01-21 12:23:14 +08:00 |
|
Gregory Shtrasberg
|
d4b62d4641
|
[AMD][Build] Porting dockerfiles from the ROCm/vllm fork (#11777)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-01-21 12:22:23 +08:00 |
|
Cheng Kuan Yong Jason
|
06a760d6e8
|
[bugfix] catch xgrammar unsupported array constraints (#12210)
Signed-off-by: Jason Cheng <jasoncky96@gmail.com>
|
2025-01-20 16:42:02 -08:00 |
|
Cyrus Leung
|
b37d82791e
|
[Model] Upgrade Aria to transformers 4.48 (#12203)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-01-20 17:58:48 +08:00 |
|
Cyrus Leung
|
59a0192fb9
|
[Core] Interface for accessing model from VllmRunner (#10353)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-01-20 15:00:59 +08:00 |
|
Isotr0py
|
83609791d2
|
[Model] Add Qwen2 PRM model support (#12202)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-01-20 14:59:46 +08:00 |
|
Martin Gleize
|
bbe5f9de7d
|
[Model] Support for fairseq2 Llama (#11442)
Signed-off-by: Martin Gleize <mgleize@meta.com>
Co-authored-by: mgleize user <mgleize@a100-st-p4de24xlarge-4.fair-a100.hpcaas>
|
2025-01-19 10:40:40 -08:00 |
|
Roger Wang
|
81763c58a0
|
[V1] Add V1 support of Qwen2-VL (#12128)
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: imkero <kerorek@outlook.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-01-19 19:52:13 +08:00 |
|
Isotr0py
|
edaae198e7
|
[Misc] Add BNB support to GLM4-V model (#12184)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-01-19 19:49:22 +08:00 |
|
Cyrus Leung
|
630eb5b5ce
|
[Bugfix] Fix multi-modal processors for transformers 4.48 (#12187)
|
2025-01-18 19:16:34 -08:00 |
|
Michal Adamczyk
|
4e94951bb1
|
[BUGFIX] Move scores to float32 in case of running xgrammar on cpu (#12152)
Signed-off-by: Michal Adamczyk <madamczyk@habana.ai>
|
2025-01-19 11:12:05 +08:00 |
|
yancong
|
32eb0da808
|
[Misc] Support register quantization method out-of-tree (#11969)
|
2025-01-18 16:13:16 -08:00 |
|
Isotr0py
|
02798ecabe
|
[Model] Port deepseek-vl2 processor, remove dependency (#12169)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-01-18 13:59:39 +08:00 |
|
Gregory Shtrasberg
|
b5b57e301e
|
[AMD][FP8] Using MI300 FP8 format on ROCm for block_quant (#12134)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-01-17 17:12:26 +00:00 |
|
Li, Jiang
|
d4e6194570
|
[CI/Build][CPU][Bugfix] Fix CPU CI (#12150)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-01-17 19:39:52 +08:00 |
|
Isotr0py
|
62b06ba23d
|
[Model] Add support for deepseek-vl2-tiny model (#12068)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-01-16 17:14:48 +00:00 |
|
Roger Wang
|
874f7c292a
|
[Bugfix] Fix max image feature size for Llava-one-vision (#12104)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2025-01-16 14:54:06 +00:00 |
|
youkaichao
|
bf53e0c70b
|
Support torchrun and SPMD-style offline inference (#12071)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-01-16 19:58:53 +08:00 |
|
Isotr0py
|
dd7c9ad870
|
[Bugfix] Remove hardcoded head_size=256 for Deepseek v2 and v3 (#12067)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-01-16 10:11:54 +00:00 |
|
Michael Goin
|
9aa1519f08
|
Various cosmetic/comment fixes (#12089)
Signed-off-by: mgoin <michael@neuralmagic.com>
|
2025-01-16 09:59:06 +00:00 |
|
Elfie Guo
|
fa0050db08
|
[Core] Default to using per_token quantization for fp8 when cutlass is supported. (#8651)
Signed-off-by: mgoin <michael@neuralmagic.com>
Co-authored-by: Michael Goin <mgoin@redhat.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
|
2025-01-16 04:31:27 +00:00 |
|
kewang-xlnx
|
de0526f668
|
[Misc][Quark] Upstream Quark format to VLLM (#10765)
Signed-off-by: kewang-xlnx <kewang@xilinx.com>
Signed-off-by: kewang2 <kewang2@amd.com>
Co-authored-by: kewang2 <kewang2@amd.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
|
2025-01-15 11:05:15 -05:00 |
|