Jee Jee Li
|
a655eb3025
|
[Misc]Add BNB quantization for Qwen2VL (#11719)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2025-01-03 15:19:02 -07:00 |
|
Aurick Qiao
|
e1a5c2f0a1
|
[Model] Whisper model implementation (#11280)
Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com>
|
2025-01-03 16:39:19 +08:00 |
|
Lu Fang
|
07064cb1d4
|
[Bugfix] Check chain_speculative_sampling before calling it (#11673)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-01-02 16:58:56 -08:00 |
|
bjmsong
|
187e32997c
|
[Bugfix] Change kv scaling factor by param json on nvidia gpu (#11688)
Signed-off-by: bjmsong <bjmsong@126.com>
Co-authored-by: bjmsong <bjmsong@126.com>
|
2025-01-02 21:11:39 +00:00 |
|
Cyrus Leung
|
8c38ee7007
|
[VLM] Merged multi-modal processor for LLaVA-NeXT (#11682)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-01-02 16:39:27 +00:00 |
|
Cyrus Leung
|
a115ac46b5
|
[VLM] Move supported limits and max tokens to merged multi-modal processor (#11669)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2025-01-01 15:44:42 +00:00 |
|
Kazuhiro Serizawa
|
6d70198b17
|
[Doc] Fix typo (#11666)
Signed-off-by: Kazuhiro Serizawa <nserihiro@gmail.com>
|
2025-01-01 08:10:10 +00:00 |
|
Jee Jee Li
|
11d8a091c6
|
[Misc] Optimize Qwen2-VL LoRA test (#11663)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-01-01 14:42:23 +08:00 |
|
Cyrus Leung
|
365801fedd
|
[VLM] Add max-count checking in data parser for single image models (#11661)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-12-31 22:15:21 -08:00 |
|
Roger Wang
|
e7c7c5e822
|
[V1][VLM] V1 support for selected single-image models. (#11632)
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2024-12-31 21:17:22 +00:00 |
|
Michael Goin
|
74fa1d123c
|
[Bugfix] Fix OpenAI parallel sampling when using xgrammar (#11637)
Signed-off-by: mgoin <michael@neuralmagic.com>
|
2024-12-31 03:43:54 +00:00 |
|
Matthias Vogler
|
a2a40bcd0d
|
[Model][LoRA]LoRA support added for MolmoForCausalLM (#11439)
Signed-off-by: Matthias Vogler <matthias.vogler@joesecurity.org>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Matthias Vogler <matthias.vogler@joesecurity.org>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-12-30 17:33:06 -08:00 |
|
whyiug
|
36e7670045
|
[Bugfix] Validate and concatenate image embeddings in MiniCPMVBaseModel (#11631)
|
2024-12-30 18:51:04 +00:00 |
|
Cyrus Leung
|
8d9b6721e7
|
[VLM] Abstract out multi-modal data parsing in merged processor (#11620)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-30 15:01:35 +00:00 |
|
youkaichao
|
b12e87f942
|
[platforms] enable platform plugins (#11602)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-30 20:24:45 +08:00 |
|
Li, Jiang
|
5dbf854553
|
[CI/Build][CPU] Fix CPU CI by lazy importing triton FP8 kernels (#11618)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2024-12-30 10:17:04 +00:00 |
|
Michael Goin
|
0aa38d16f5
|
Remove print statement in DeepseekScalingRotaryEmbedding (#11604)
|
2024-12-29 20:16:46 +00:00 |
|
youkaichao
|
dba4d9dec6
|
[v1][bugfix] fix cudagraph with inplace buffer assignment (#11596)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-29 09:03:49 +00:00 |
|
youkaichao
|
328841d002
|
[bugfix] interleaving sliding window for cohere2 model (#11583)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-28 16:55:42 +00:00 |
|
Roger Wang
|
b7dcc003dc
|
[Model] Remove hardcoded image tokens ids from Pixtral (#11582)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2024-12-28 10:54:23 +00:00 |
|
Isotr0py
|
d34be24bb1
|
[Model] Support InternLM2 Reward models (#11571)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2024-12-28 06:14:10 +00:00 |
|
Selali
|
ac79799403
|
[Bugfix] Fix for ROCM compressed tensor support (#11561)
|
2024-12-27 20:12:11 +00:00 |
|
Isotr0py
|
dde1fa18c9
|
[Misc] Improve BNB loader to handle mixture of sharded and merged weights with same suffix (#11566)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2024-12-27 19:45:13 +00:00 |
|
Jee Jee Li
|
0240402c46
|
[Misc]Add BNB quantization for MolmoForCausalLM (#11551)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-12-27 18:48:24 +00:00 |
|
ErezSC42
|
55509c2114
|
[MODEL] LoRA support for Jamba model (#11209)
Signed-off-by: Erez Schwartz <erezs@ai21.com>
|
2024-12-27 17:58:21 +00:00 |
|
Cyrus Leung
|
101418096f
|
[VLM] Support caching in merged multi-modal processor (#11396)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-27 17:22:48 +00:00 |
|
Jee Jee Li
|
2c9b8ea2b0
|
[Bugfix] Fix TeleChat2ForCausalLM weights mapper (#11546)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-12-27 10:39:15 +00:00 |
|
Mengqing Cao
|
6c6f7fe8a8
|
[Platform] Move model arch check to platform (#11503)
Signed-off-by: Mengqing Cao <cmq0113@163.com>
|
2024-12-27 08:45:25 +00:00 |
|
Robert Shaw
|
2339d59f92
|
[BugFix] Fix quantization for all other methods (#11547)
|
2024-12-26 22:23:29 -08:00 |
|
Simon Mo
|
f49777ba62
|
Deepseek v3 (#11502)
Signed-off-by: mgoin <michael@neuralmagic.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
Co-authored-by: robertgshaw2-neuralmagic <rshaw@neuralmagic.com>
|
2024-12-26 16:09:44 -08:00 |
|
Michael Goin
|
2072924d14
|
[Model] [Quantization] Support deepseek_v3 w8a8 fp8 block-wise quantization (#11523)
Signed-off-by: mgoin <michael@neuralmagic.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
Signed-off-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: simon-mo <simon.mo@hey.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: HandH1998 <1335248067@qq.com>
|
2024-12-26 15:33:30 -08:00 |
|
Cyrus Leung
|
eec906d811
|
[Misc] Add placeholder module (#11501)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-26 13:12:51 +00:00 |
|
Jee Jee Li
|
f57ee5650d
|
[Model] Modify MolmoForCausalLM MLP (#11510)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-12-26 13:12:05 +00:00 |
|
sroy745
|
dcb1a944d4
|
[V1] Adding min tokens/repetition/presence/frequence penalties to V1 sampler (#10681)
Signed-off-by: Sourashis Roy <sroy@roblox.com>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-12-26 19:02:58 +09:00 |
|
Cyrus Leung
|
3f3e92e1f2
|
[Model] Automatic conversion of classification and reward models (#11469)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-24 18:22:22 +00:00 |
|
Jee Jee Li
|
196c34b0ac
|
[Misc] Move weights mapper (#11443)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-12-24 13:05:25 +00:00 |
|
Jee Jee Li
|
b1b1038fbd
|
[Bugfix] Fix Qwen2-VL LoRA weight loading (#11430)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-12-24 09:56:10 +00:00 |
|
Michael Goin
|
60fb4f3bcf
|
[Bugfix] Add kv cache scales to gemma2.py (#11269)
|
2024-12-23 19:30:45 +00:00 |
|
Dipika Sikka
|
b866cdbd05
|
[Misc] Add assertion and helpful message for marlin24 compressed models (#11388)
|
2024-12-24 02:23:38 +08:00 |
|
Michael Goin
|
5bfb30a529
|
[Bugfix] Fix CFGGuide and use outlines for grammars that can't convert to GBNF (#11389)
Signed-off-by: mgoin <michael@neuralmagic.com>
|
2024-12-23 23:06:20 +08:00 |
|
Roger Wang
|
c2d1b075ba
|
[Bugfix] Fix issues for Pixtral-Large-Instruct-2411 (#11393)
Signed-off-by: ywang96 <ywang@example.com>
Co-authored-by: ywang96 <ywang@example.com>
|
2024-12-21 10:15:03 +00:00 |
|
George
|
51ff216d85
|
[Bugfix] update should_ignore_layer (#11354)
Signed-off-by: George Ohashi <george@neuralmagic.com>
|
2024-12-21 06:36:23 +00:00 |
|
omer-dayan
|
995f56236b
|
[Core] Loading model from S3 using RunAI Model Streamer as optional loader (#10192)
Signed-off-by: OmerD <omer@run.ai>
|
2024-12-20 16:46:24 +00:00 |
|
Wallas Henrique
|
86c2d8fd1c
|
[Bugfix] Fix spec decoding when seed is none in a batch (#10863)
Signed-off-by: Wallas Santos <wallashss@ibm.com>
|
2024-12-20 05:15:31 +00:00 |
|
Isotr0py
|
276738ce0f
|
[Bugfix] Fix broken CPU compressed-tensors test (#11338)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2024-12-19 17:37:31 +00:00 |
|
Isotr0py
|
e24113a8fe
|
[Model] Refactor Qwen2-VL to use merged multimodal processor (#11258)
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-19 16:28:00 +00:00 |
|
Roger Wang
|
7379b3d4b2
|
[V1] Fix multimodal profiling for Molmo (#11325)
Signed-off-by: ywang96 <ywang@example.com>
Co-authored-by: ywang96 <ywang@example.com>
|
2024-12-19 16:27:22 +00:00 |
|
Yehoshua Cohen
|
6c7f881541
|
[Model] Add JambaForSequenceClassification model (#10860)
Signed-off-by: Yehoshua Cohen <yehoshuaco@ai21.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Yehoshua Cohen <yehoshuaco@ai21.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-19 22:48:06 +08:00 |
|
Cyrus Leung
|
a0f7d53beb
|
[Bugfix] Cleanup Pixtral HF code (#11333)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-19 13:22:00 +00:00 |
|
Cyrus Leung
|
6142ef0ada
|
[VLM] Merged multimodal processor for Qwen2-Audio (#11303)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-19 06:14:17 +00:00 |
|