Commit Graph

1235 Commits

Author SHA1 Message Date
Jee Jee Li
a655eb3025 [Misc]Add BNB quantization for Qwen2VL (#11719)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-01-03 15:19:02 -07:00
Aurick Qiao
e1a5c2f0a1 [Model] Whisper model implementation (#11280)
Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com>
2025-01-03 16:39:19 +08:00
Lu Fang
07064cb1d4 [Bugfix] Check chain_speculative_sampling before calling it (#11673)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-01-02 16:58:56 -08:00
bjmsong
187e32997c [Bugfix] Change kv scaling factor by param json on nvidia gpu (#11688)
Signed-off-by: bjmsong <bjmsong@126.com>
Co-authored-by: bjmsong <bjmsong@126.com>
2025-01-02 21:11:39 +00:00
Cyrus Leung
8c38ee7007 [VLM] Merged multi-modal processor for LLaVA-NeXT (#11682)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-02 16:39:27 +00:00
Cyrus Leung
a115ac46b5 [VLM] Move supported limits and max tokens to merged multi-modal processor (#11669)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-01-01 15:44:42 +00:00
Kazuhiro Serizawa
6d70198b17 [Doc] Fix typo (#11666)
Signed-off-by: Kazuhiro Serizawa <nserihiro@gmail.com>
2025-01-01 08:10:10 +00:00
Jee Jee Li
11d8a091c6 [Misc] Optimize Qwen2-VL LoRA test (#11663)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-01-01 14:42:23 +08:00
Cyrus Leung
365801fedd [VLM] Add max-count checking in data parser for single image models (#11661)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-12-31 22:15:21 -08:00
Roger Wang
e7c7c5e822 [V1][VLM] V1 support for selected single-image models. (#11632)
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Isotr0py <2037008807@qq.com>
2024-12-31 21:17:22 +00:00
Michael Goin
74fa1d123c [Bugfix] Fix OpenAI parallel sampling when using xgrammar (#11637)
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-12-31 03:43:54 +00:00
Matthias Vogler
a2a40bcd0d [Model][LoRA]LoRA support added for MolmoForCausalLM (#11439)
Signed-off-by: Matthias Vogler <matthias.vogler@joesecurity.org>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Matthias Vogler <matthias.vogler@joesecurity.org>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2024-12-30 17:33:06 -08:00
whyiug
36e7670045 [Bugfix] Validate and concatenate image embeddings in MiniCPMVBaseModel (#11631) 2024-12-30 18:51:04 +00:00
Cyrus Leung
8d9b6721e7 [VLM] Abstract out multi-modal data parsing in merged processor (#11620)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-30 15:01:35 +00:00
youkaichao
b12e87f942 [platforms] enable platform plugins (#11602)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-12-30 20:24:45 +08:00
Li, Jiang
5dbf854553 [CI/Build][CPU] Fix CPU CI by lazy importing triton FP8 kernels (#11618)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2024-12-30 10:17:04 +00:00
Michael Goin
0aa38d16f5 Remove print statement in DeepseekScalingRotaryEmbedding (#11604) 2024-12-29 20:16:46 +00:00
youkaichao
dba4d9dec6 [v1][bugfix] fix cudagraph with inplace buffer assignment (#11596)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-12-29 09:03:49 +00:00
youkaichao
328841d002 [bugfix] interleaving sliding window for cohere2 model (#11583)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-12-28 16:55:42 +00:00
Roger Wang
b7dcc003dc [Model] Remove hardcoded image tokens ids from Pixtral (#11582)
Signed-off-by: Roger Wang <ywang@roblox.com>
2024-12-28 10:54:23 +00:00
Isotr0py
d34be24bb1 [Model] Support InternLM2 Reward models (#11571)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-12-28 06:14:10 +00:00
Selali
ac79799403 [Bugfix] Fix for ROCM compressed tensor support (#11561) 2024-12-27 20:12:11 +00:00
Isotr0py
dde1fa18c9 [Misc] Improve BNB loader to handle mixture of sharded and merged weights with same suffix (#11566)
Signed-off-by: Isotr0py <2037008807@qq.com>
2024-12-27 19:45:13 +00:00
Jee Jee Li
0240402c46 [Misc]Add BNB quantization for MolmoForCausalLM (#11551)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-12-27 18:48:24 +00:00
ErezSC42
55509c2114 [MODEL] LoRA support for Jamba model (#11209)
Signed-off-by: Erez Schwartz <erezs@ai21.com>
2024-12-27 17:58:21 +00:00
Cyrus Leung
101418096f [VLM] Support caching in merged multi-modal processor (#11396)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-27 17:22:48 +00:00
Jee Jee Li
2c9b8ea2b0 [Bugfix] Fix TeleChat2ForCausalLM weights mapper (#11546)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-12-27 10:39:15 +00:00
Mengqing Cao
6c6f7fe8a8 [Platform] Move model arch check to platform (#11503)
Signed-off-by: Mengqing Cao <cmq0113@163.com>
2024-12-27 08:45:25 +00:00
Robert Shaw
2339d59f92 [BugFix] Fix quantization for all other methods (#11547)
Some checks failed
Create Release / Create Release (push) Has been cancelled
2024-12-26 22:23:29 -08:00
Simon Mo
f49777ba62 Deepseek v3 (#11502)
Some checks failed
Create Release / Create Release (push) Has been cancelled
Signed-off-by: mgoin <michael@neuralmagic.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
Co-authored-by: robertgshaw2-neuralmagic <rshaw@neuralmagic.com>
2024-12-26 16:09:44 -08:00
Michael Goin
2072924d14 [Model] [Quantization] Support deepseek_v3 w8a8 fp8 block-wise quantization (#11523)
Signed-off-by: mgoin <michael@neuralmagic.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
Signed-off-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: simon-mo <simon.mo@hey.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: HandH1998 <1335248067@qq.com>
2024-12-26 15:33:30 -08:00
Cyrus Leung
eec906d811 [Misc] Add placeholder module (#11501)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-26 13:12:51 +00:00
Jee Jee Li
f57ee5650d [Model] Modify MolmoForCausalLM MLP (#11510)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-12-26 13:12:05 +00:00
sroy745
dcb1a944d4 [V1] Adding min tokens/repetition/presence/frequence penalties to V1 sampler (#10681)
Signed-off-by: Sourashis Roy <sroy@roblox.com>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-12-26 19:02:58 +09:00
Cyrus Leung
3f3e92e1f2 [Model] Automatic conversion of classification and reward models (#11469)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-24 18:22:22 +00:00
Jee Jee Li
196c34b0ac [Misc] Move weights mapper (#11443)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-12-24 13:05:25 +00:00
Jee Jee Li
b1b1038fbd [Bugfix] Fix Qwen2-VL LoRA weight loading (#11430)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-12-24 09:56:10 +00:00
Michael Goin
60fb4f3bcf [Bugfix] Add kv cache scales to gemma2.py (#11269) 2024-12-23 19:30:45 +00:00
Dipika Sikka
b866cdbd05 [Misc] Add assertion and helpful message for marlin24 compressed models (#11388) 2024-12-24 02:23:38 +08:00
Michael Goin
5bfb30a529 [Bugfix] Fix CFGGuide and use outlines for grammars that can't convert to GBNF (#11389)
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-12-23 23:06:20 +08:00
Roger Wang
c2d1b075ba [Bugfix] Fix issues for Pixtral-Large-Instruct-2411 (#11393)
Signed-off-by: ywang96 <ywang@example.com>
Co-authored-by: ywang96 <ywang@example.com>
2024-12-21 10:15:03 +00:00
George
51ff216d85 [Bugfix] update should_ignore_layer (#11354)
Signed-off-by: George Ohashi <george@neuralmagic.com>
2024-12-21 06:36:23 +00:00
omer-dayan
995f56236b [Core] Loading model from S3 using RunAI Model Streamer as optional loader (#10192)
Signed-off-by: OmerD <omer@run.ai>
2024-12-20 16:46:24 +00:00
Wallas Henrique
86c2d8fd1c [Bugfix] Fix spec decoding when seed is none in a batch (#10863)
Signed-off-by: Wallas Santos <wallashss@ibm.com>
2024-12-20 05:15:31 +00:00
Isotr0py
276738ce0f [Bugfix] Fix broken CPU compressed-tensors test (#11338)
Signed-off-by: Isotr0py <2037008807@qq.com>
2024-12-19 17:37:31 +00:00
Isotr0py
e24113a8fe [Model] Refactor Qwen2-VL to use merged multimodal processor (#11258)
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-19 16:28:00 +00:00
Roger Wang
7379b3d4b2 [V1] Fix multimodal profiling for Molmo (#11325)
Signed-off-by: ywang96 <ywang@example.com>
Co-authored-by: ywang96 <ywang@example.com>
2024-12-19 16:27:22 +00:00
Yehoshua Cohen
6c7f881541 [Model] Add JambaForSequenceClassification model (#10860)
Signed-off-by: Yehoshua Cohen <yehoshuaco@ai21.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Yehoshua Cohen <yehoshuaco@ai21.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-19 22:48:06 +08:00
Cyrus Leung
a0f7d53beb [Bugfix] Cleanup Pixtral HF code (#11333)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-19 13:22:00 +00:00
Cyrus Leung
6142ef0ada [VLM] Merged multimodal processor for Qwen2-Audio (#11303)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-19 06:14:17 +00:00