Author | Commit | Message | Date

vllmellm | 5ebf66748b | [FEAT][ROCm] Integrate Fused MoE Kernels from AITER (#14967) | 2025-03-26 16:30:30 +08:00
    Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
    Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
    Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
Cyrus Leung | 997c8811d6 | [Model] Support multi-image for Molmo (#15438) | 2025-03-26 11:26:33 +08:00
    Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Harry Mellor | e42389f9d7 | Transformers backend already supports V1 (#15463) | 2025-03-25 20:26:16 -07:00
    Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Cyrus Leung | a9e879b316 | [Misc] Clean up MiniCPM-V/O code (#15337) | 2025-03-25 10:22:52 +00:00
    Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Naitong Yu | 2f4bd358f1 | [Model] Support Tele-FLM Model (#15023) | 2025-03-22 02:04:44 -07:00
    Signed-off-by: Naitong Yu <ntyu@baai.ac.cn>
    Signed-off-by: jiangxin <horizon94@outlook.com>
    Co-authored-by: Jason Fang <jasonfang3900@gmail.com>
    Co-authored-by: jiangxin <horizon94@outlook.com>
TJian | ec870fba9a | [FEAT] [ROCm]: Add AITER RMS Norm (Layer Norm) Feature (#14959) | 2025-03-21 22:36:14 -07:00
    Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Isotr0py | 1e508343e1 | [Bugfix] Fix incorrect qwen2.5-vl attention mask pre-computation (#15200) | 2025-03-20 19:18:04 -07:00
    Signed-off-by: Isotr0py <2037008807@qq.com>
Matt Ritter | a8652f4f0f | Enable CUDA graph support for llama 3.2 vision (#14917) | 2025-03-19 23:29:16 -07:00
    Signed-off-by: Matt Ritter <100659061+mritterfigma@users.noreply.github.com>
Cyrus Leung | f690372b68 | [Core] Update dtype detection and defaults (#14858) | 2025-03-19 13:49:33 +08:00
    Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Jee Jee Li | 46c759c165 | [Bugfix] Fix LoRA extra vocab size (#15047) | 2025-03-18 09:40:29 -07:00
    Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
yury-tokpanov | 452e8fd968 | [MODEL] Add support for Zamba2 models (#13185) | 2025-03-18 08:56:21 -07:00
    Signed-off-by: Yury Tokpanov <yury@zyphra.com>
    Signed-off-by: Quentin Anthony <qganthony@yahoo.com>
    Co-authored-by: Quentin Anthony <qganthony@yahoo.com>
    Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
    Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Patrick von Platen | f863ffc965 | [Mistral-Small 3.1] Update docs and tests (#14977) | 2025-03-18 03:29:42 -07:00
    Signed-off-by: Roger Wang <ywang@roblox.com>
    Co-authored-by: Roger Wang <ywang@roblox.com>
Cyrus Leung | b89fb2a4a1 | [CI/Build] Use AutoModelForImageTextToText to load VLMs in tests (#14945) | 2025-03-17 18:35:17 +00:00
    Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
vllmellm | 2bb0e1a799 | [Bugfix][ROCm] running new process using spawn method for rocm in tests. (#14810) | 2025-03-17 11:33:35 +00:00
    Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
    Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
    Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
    Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Sibi | a73e183e36 | [Misc] Replace os environ to monkeypatch in test suite (#14516) | 2025-03-16 20:35:57 -07:00
    Signed-off-by: sibi <85477603+t-sibiraj@users.noreply.github.com>
    Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
    Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
    Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
Robert Shaw | bb3aeddfaf | [CI] Nightly Tests (#14898) | 2025-03-17 02:06:43 +00:00
    Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
    Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
    Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Isotr0py | def232e122 | [VLM] Clean up Phi-4-MM ViT implementation (#14812) | 2025-03-15 18:53:52 -07:00
    Signed-off-by: Isotr0py <2037008807@qq.com>
    Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Rémi Delacourt | 61c6a5a796 | [VLM] Merged multi-modal processor for Pixtral (#12211) | 2025-03-15 06:28:27 -07:00
    Signed-off-by: remi <remi@mistral.ai>
    Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
    Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Cyrus Leung | 3556a41434 | [VLM] Limit multimodal input cache by memory (#14805) | 2025-03-15 02:52:05 -07:00
    Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Li, Jiang | a2ae496589 | [CPU] Support FP8 KV cache (#14741) | 2025-03-14 22:07:36 -07:00
    Signed-off-by: jiang1.li <jiang1.li@intel.com>
Robert Shaw | d4d93db2c5 | [V1] V1 Enablement Oracle (#13726) | 2025-03-14 22:02:20 -07:00
    Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
    Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
    Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
    Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
    Co-authored-by: Michael Goin <michael@neuralmagic.com>
Cyrus Leung | 613c5bb945 | [Bugfix] Fix Aria test loading (#14823) | 2025-03-14 09:11:23 -07:00
    Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Roger Wang | 0c2af17c76 | [CI] Fix missing example model id in processor test (#14787) | 2025-03-14 13:52:15 +08:00
    Signed-off-by: Roger Wang <ywang@roblox.com>
Cyrus Leung | 8e9ffd37d6 | [Misc] Clean up processor tests (#14771) | 2025-03-13 18:25:37 +00:00
    Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Cyrus Leung | f53a0586b9 | [Bugfix] Fix prompt format of GLM4V (#14539) | 2025-03-13 11:37:17 +00:00
    Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Cyrus Leung | 382403921f | [VLM] Support pan-and-scan for Gemma3 multi-modal processor (#14672) | 2025-03-13 02:23:12 -07:00
    Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
    Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
    Signed-off-by: Roger Wang <ywang@roblox.com>
    Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
    Co-authored-by: Roger Wang <ywang@roblox.com>
TJian | 916836bbfb | [FEAT] [ROCm] [Embedding] Add encoder-only model support into ROCm Flash Attention to enable embedding models. (#14664) | 2025-03-12 09:31:19 -07:00
    Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Woosuk Kwon | c0c25e25fa | [Model] Add support for Gemma 3 (#14660) | 2025-03-12 08:36:33 -07:00
    Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
    Signed-off-by: Roger Wang <ywang@roblox.com>
    Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
    Co-authored-by: Roger Wang <ywang@roblox.com>
    Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Pavani Majety | debd6bbf09 | [Kernel] Add ModelOpt FP4 Checkpoint Support (#12520) | 2025-03-12 05:13:11 +00:00
    Signed-off-by: Pavani Majety <pmajety@nvidia.com>
Isotr0py | e392d85831 | [Core] Refactor QKVCrossParallelLinear implementation to support BNB 4-bit quantization (#14545) | 2025-03-11 20:12:52 -07:00
    Signed-off-by: Isotr0py <2037008807@qq.com>
Farzad Abdolhosseini | 80e78d02ac | [Model] Extend Ultravox to accept audio longer than 30s (#13631) | 2025-03-12 10:27:10 +08:00
    Signed-off-by: Farzad Abdolhosseini <farzad@fixie.ai>
Cyrus Leung | af295e9b01 | [Bugfix] Update --hf-overrides for Alibaba-NLP/gte-Qwen2 (#14609) | 2025-03-11 07:59:43 -07:00
    Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
kYLe | 1769928079 | [Model] Update Paligemma multimodal processing with PromptUpdate (#14015) | 2025-03-06 08:31:38 +00:00
    Signed-off-by: Kyle Huang <kylhuang@nvidia.com>
    Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
    Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Congcong Chen | 0a995d5434 | [Model] New model support for Phi-4-multimodal-instruct (#14119) | 2025-03-04 20:57:01 -08:00
Travis Johnson | c060b71408 | [Model] Add support for GraniteMoeShared models (#13313) | 2025-03-04 08:04:52 +08:00
    Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
    Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Harry Mellor | cf069aa8aa | Update deprecated Python 3.8 typing (#13971) | 2025-03-02 17:34:51 -08:00
Harry Mellor | 76c89fcadd | Use smaller embedding model when not testing model specifically (#13891) | 2025-02-28 00:50:43 -08:00
Travis Johnson | 73e0225ee9 | [Bugfix] Check that number of images matches number of <|image|> tokens with mllama (#13911) | 2025-02-28 04:00:45 +00:00
    Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Isotr0py | edf309ebbe | [VLM] Support multimodal inputs for Florence-2 models (#13320) | 2025-02-27 02:06:41 -08:00
Cyrus Leung | 7b700ec8c8 | [Bugfix] Add test example for Ultravox v0.5 (#13890) | 2025-02-26 02:31:43 -08:00
Roger Wang | 7ca1da020f | [Misc] Fix input processing for Ultravox (#13871) | 2025-02-25 23:56:34 -08:00
Michael Goin | 07c4353057 | [Model] Support Grok1 (#13795) | 2025-02-26 01:07:12 +00:00
    Signed-off-by: mgoin <mgoin64@gmail.com>
Isotr0py | ba5106e519 | [LMM] Implement merged multimodal processor for whisper (#13278) | 2025-02-23 01:46:03 -08:00
Kevin H. Luu | 2c5e637b57 | [ci] Use env var to control whether to use S3 bucket in CI (#13634) | 2025-02-22 19:19:45 -08:00
Cyrus Leung | 377d10bd14 | [VLM][Bugfix] Pass processor kwargs properly on init (#13516) | 2025-02-19 13:13:50 +00:00
    Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Lucia Fang | f525c0be8b | [Model][Speculative Decoding] DeepSeek MTP spec decode (#12755) | 2025-02-19 17:06:23 +08:00
    Signed-off-by: Lu Fang <fanglu@fb.com>
    Co-authored-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
Alex Brooks | 983a40a8bb | [Bugfix] Fix Positive Feature Layers in Llava Models (#13514) | 2025-02-19 08:50:07 +00:00
    Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
Kevin H. Luu | d5d214ac7f | [1/n][CI] Load models in CI from S3 instead of HF (#13205) | 2025-02-19 07:34:59 +00:00
    Signed-off-by: <>
    Co-authored-by: EC2 Default User <ec2-user@ip-172-31-20-117.us-west-2.compute.internal>
Isotr0py | 67ef8f666a | [Model] Enable quantization support for transformers backend (#12960) | 2025-02-17 19:52:47 -08:00
Tyler Michael Smith | 1f69c4a892 | [Model] Support Mamba2 (Codestral Mamba) (#9292) | 2025-02-17 20:17:50 +08:00
    Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
    Co-authored-by: Yu Chin Fabian Lim <flim@sg.ibm.com>