Woosuk Kwon
|
371d04d39b
|
[V1] Use FlashInfer Sampling Kernel for Top-P & Top-K Sampling (#11394)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-12-27 09:32:38 +09:00 |
|
Simon Mo
|
f49777ba62
|
Deepseek v3 (#11502)
Signed-off-by: mgoin <michael@neuralmagic.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
Co-authored-by: robertgshaw2-neuralmagic <rshaw@neuralmagic.com>
|
2024-12-26 16:09:44 -08:00 |
|
Robert Shaw
|
55fb97f7bd
|
[2/N] API Server: Avoid ulimit footgun (#11530)
|
2024-12-26 23:43:05 +00:00 |
|
Michael Goin
|
2072924d14
|
[Model] [Quantization] Support deepseek_v3 w8a8 fp8 block-wise quantization (#11523)
Signed-off-by: mgoin <michael@neuralmagic.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
Signed-off-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: simon-mo <simon.mo@hey.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: HandH1998 <1335248067@qq.com>
|
2024-12-26 15:33:30 -08:00 |
|
Robert Shaw
|
720b10fdc6
|
[1/N] API Server (Remove Proxy) (#11529)
|
2024-12-26 23:03:43 +00:00 |
|
Cyrus Leung
|
eec906d811
|
[Misc] Add placeholder module (#11501)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-26 13:12:51 +00:00 |
|
Jee Jee Li
|
f57ee5650d
|
[Model] Modify MolmoForCausalLM MLP (#11510)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-12-26 13:12:05 +00:00 |
|
sroy745
|
dcb1a944d4
|
[V1] Adding min tokens/repetition/presence/frequence penalties to V1 sampler (#10681)
Signed-off-by: Sourashis Roy <sroy@roblox.com>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-12-26 19:02:58 +09:00 |
|
Jee Jee Li
|
aa25985bd1
|
[Misc][LoRA] Fix LoRA weight mapper (#11495)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-12-26 15:52:48 +08:00 |
|
Lucas Tucker
|
dbeac95dbb
|
Mypy checking for vllm/compilation (#11496)
Signed-off-by: lucast2021 <lucast2021@headroyce.org>
Co-authored-by: lucast2021 <lucast2021@headroyce.org>
|
2024-12-26 05:04:07 +00:00 |
|
Cyrus Leung
|
51a624bf02
|
[Misc] Move some multimodal utils to modality-specific modules (#11494)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-26 04:23:20 +00:00 |
|
Cyrus Leung
|
b689ada91e
|
[Frontend] Enable decord to load video from base64 (#11492)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-25 16:33:55 +00:00 |
|
Rui Qiao
|
9832e5572a
|
[V1] Unify VLLM_ENABLE_V1_MULTIPROCESSING handling in RayExecutor (#11472)
|
2024-12-24 19:49:46 -08:00 |
|
Cyrus Leung
|
3f3e92e1f2
|
[Model] Automatic conversion of classification and reward models (#11469)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-24 18:22:22 +00:00 |
|
Jee Jee Li
|
196c34b0ac
|
[Misc] Move weights mapper (#11443)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-12-24 13:05:25 +00:00 |
|
Mengqing Cao
|
5c7963249d
|
[attn][tiny fix] fix attn backend in MultiHeadAttention (#11463)
Signed-off-by: Mengqing Cao <cmq0113@163.com>
|
2024-12-24 12:39:36 +00:00 |
|
Isotr0py
|
7a5286cc04
|
[Bugfix][Hardware][CPU] Fix CPU input_positions creation for text-only inputs with mrope (#11434)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2024-12-24 17:59:51 +08:00 |
|
Jee Jee Li
|
b1b1038fbd
|
[Bugfix] Fix Qwen2-VL LoRA weight loading (#11430)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-12-24 09:56:10 +00:00 |
|
Cyrus Leung
|
9edca6bf8f
|
[Frontend] Online Pooling API (#11457)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-24 17:54:30 +08:00 |
|
dpxa
|
4f074fbf53
|
[Misc]Suppress irrelevant exception stack trace information when CUDA… (#11438)
Co-authored-by: shiquan <shiquan>
|
2024-12-24 08:43:39 +00:00 |
|
Rui Qiao
|
a491d6f535
|
[V1] TP Ray executor (#11107)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2024-12-23 23:00:12 +00:00 |
|
Rafael Vasquez
|
32aa2059ad
|
[Docs] Convert rST to MyST (Markdown) (#11145)
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
|
2024-12-23 22:35:38 +00:00 |
|
yansh97
|
94d545a1a1
|
[Doc] Fix typo in the help message of '--guided-decoding-backend' (#11440)
|
2024-12-23 20:20:44 +00:00 |
|
Michael Goin
|
60fb4f3bcf
|
[Bugfix] Add kv cache scales to gemma2.py (#11269)
|
2024-12-23 19:30:45 +00:00 |
|
Dipika Sikka
|
b866cdbd05
|
[Misc] Add assertion and helpful message for marlin24 compressed models (#11388)
|
2024-12-24 02:23:38 +08:00 |
|
Michael Goin
|
5bfb30a529
|
[Bugfix] Fix CFGGuide and use outlines for grammars that can't convert to GBNF (#11389)
Signed-off-by: mgoin <michael@neuralmagic.com>
|
2024-12-23 23:06:20 +08:00 |
|
Lucas Tucker
|
e51719ae72
|
mypy type checking for vllm/worker (#11418)
Signed-off-by: lucast2021 <lucast2021@headroyce.org>
Co-authored-by: lucast2021 <lucast2021@headroyce.org>
|
2024-12-23 13:55:49 +00:00 |
|
youkaichao
|
f30581c518
|
[misc][perf] remove old code (#11425)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-23 08:01:08 +00:00 |
|
Jason T. Greene
|
f1d1bf6288
|
[Bugfix] Fix fully sharded LoRAs with Mixtral (#11390)
Signed-off-by: Jason Greene <jason.greene@redhat.com>
|
2024-12-22 23:25:10 +08:00 |
|
Roger Wang
|
c2d1b075ba
|
[Bugfix] Fix issues for Pixtral-Large-Instruct-2411 (#11393)
Signed-off-by: ywang96 <ywang@example.com>
Co-authored-by: ywang96 <ywang@example.com>
|
2024-12-21 10:15:03 +00:00 |
|
Ricky Xu
|
584f0ae40d
|
[V1] Make AsyncLLMEngine v1-v0 opaque (#11383)
Signed-off-by: Ricky Xu <xuchen727@hotmail.com>
|
2024-12-21 15:14:08 +08:00 |
|
George
|
51ff216d85
|
[Bugfix] update should_ignore_layer (#11354)
Signed-off-by: George Ohashi <george@neuralmagic.com>
|
2024-12-21 06:36:23 +00:00 |
|
Woosuk Kwon
|
dd2b5633dd
|
[V1][Bugfix] Skip hashing empty or None mm_data (#11386)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-12-21 14:22:21 +09:00 |
|
Michael Goin
|
d573aeadcc
|
[Bugfix] Don't log OpenAI field aliases as ignored (#11378)
Signed-off-by: mgoin <michael@neuralmagic.com>
|
2024-12-20 19:03:50 +00:00 |
|
omer-dayan
|
995f56236b
|
[Core] Loading model from S3 using RunAI Model Streamer as optional loader (#10192)
Signed-off-by: OmerD <omer@run.ai>
|
2024-12-20 16:46:24 +00:00 |
|
Roger Wang
|
04139ade59
|
[V1] Fix profiling for models with merged input processor (#11370)
Signed-off-by: ywang96 <ywang@roblox.com>
|
2024-12-20 12:04:21 +00:00 |
|
youkaichao
|
c954f21ac0
|
[misc] add early error message for custom ops (#11355)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-19 21:18:25 -08:00 |
|
Wallas Henrique
|
86c2d8fd1c
|
[Bugfix] Fix spec decoding when seed is none in a batch (#10863)
Signed-off-by: Wallas Santos <wallashss@ibm.com>
|
2024-12-20 05:15:31 +00:00 |
|
Michael Goin
|
b880ffb87e
|
[Misc] Add tqdm progress bar during graph capture (#11349)
Signed-off-by: mgoin <michael@neuralmagic.com>
|
2024-12-20 04:35:18 +00:00 |
|
Akash kaothalkar
|
48edab8041
|
[Bugfix][Hardware][POWERPC] Fix auto dtype failure in case of POWER10 (#11331)
Signed-off-by: Akash Kaothalkar <0052v2@linux.vnet.ibm.com>
|
2024-12-20 01:32:07 +00:00 |
|
yangzhibin
|
e461c262f0
|
[Misc] Remove unused vllm/block.py (#11336)
|
2024-12-19 17:54:24 +00:00 |
|
Isotr0py
|
276738ce0f
|
[Bugfix] Fix broken CPU compressed-tensors test (#11338)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2024-12-19 17:37:31 +00:00 |
|
Cyrus Leung
|
cdf22afdda
|
[Misc] Clean up and consolidate LRUCache (#11339)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-20 00:59:32 +08:00 |
|
Isotr0py
|
e24113a8fe
|
[Model] Refactor Qwen2-VL to use merged multimodal processor (#11258)
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-19 16:28:00 +00:00 |
|
Roger Wang
|
7379b3d4b2
|
[V1] Fix multimodal profiling for Molmo (#11325)
Signed-off-by: ywang96 <ywang@example.com>
Co-authored-by: ywang96 <ywang@example.com>
|
2024-12-19 16:27:22 +00:00 |
|
Yehoshua Cohen
|
6c7f881541
|
[Model] Add JambaForSequenceClassification model (#10860)
Signed-off-by: Yehoshua Cohen <yehoshuaco@ai21.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Yehoshua Cohen <yehoshuaco@ai21.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-19 22:48:06 +08:00 |
|
Cyrus Leung
|
a0f7d53beb
|
[Bugfix] Cleanup Pixtral HF code (#11333)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-19 13:22:00 +00:00 |
|
Yanyi Liu
|
5aef49806d
|
[Feature] Add load generation config from model (#11164)
Signed-off-by: liuyanyi <wolfsonliu@163.com>
Signed-off-by: Yanyi Liu <wolfsonliu@163.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2024-12-19 10:50:38 +00:00 |
|
Rui Qiao
|
f26c4aeecb
|
[Misc] Optimize ray worker initialization time (#11275)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
|
2024-12-18 23:38:02 -08:00 |
|
Cyrus Leung
|
6142ef0ada
|
[VLM] Merged multimodal processor for Qwen2-Audio (#11303)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-19 06:14:17 +00:00 |
|