Cody Yu
|
4f065f12f5
|
[Misc][V1] Skip device checking if not available (#15061)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
|
2025-03-18 19:33:43 -07:00 |
|
Jennifer Zhao
|
228b768db6
|
[Doc] Minor v1_user_guide update (#15064)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
|
2025-03-18 16:10:45 -07:00 |
|
Chujie Zheng
|
027827cc1d
|
fix long dtype in topk sampling (#15049)
|
2025-03-18 15:57:31 -07:00 |
|
Alexander Matveev
|
72a8639b68
|
[V1] TPU - CI/CD use smaller model (#15054)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
|
2025-03-18 21:39:21 +00:00 |
|
Woosuk Kwon
|
99abb8b650
|
[V1][Spec Decode] Optimize Rejection Sampler with Triton Kernels (#14930)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-18 14:31:54 -07:00 |
|
Russell Bryant
|
3a1e648158
|
[V1] Refactor Structured Output for multiple backends (#14694)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-18 19:49:15 +00:00 |
|
Jee Jee Li
|
46c759c165
|
[Bugfix] Fix LoRA extra vocab size (#15047)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-03-18 09:40:29 -07:00 |
|
Isotr0py
|
179a619c21
|
[Bugfix] Fix broken CPU quantization due to triton import (#15038)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-03-18 08:57:39 -07:00 |
|
yury-tokpanov
|
452e8fd968
|
[MODEL] Add support for Zamba2 models (#13185)
Signed-off-by: Yury Tokpanov <yury@zyphra.com>
Signed-off-by: Quentin Anthony <qganthony@yahoo.com>
Co-authored-by: Quentin Anthony <qganthony@yahoo.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-03-18 08:56:21 -07:00 |
|
ekuznetsov139
|
8b793f7ec6
|
MI325 configs, fused_moe_kernel bugfix (#14987)
Signed-off-by: Eugene Kuznetsov <eugene.kuznetsov@amd.com>
|
2025-03-18 08:05:18 -07:00 |
|
Nicolò Lucchesi
|
af35d3a3cc
|
[TPU][V1][Bugfix] Fix chunked prefill with padding (#15037)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-03-18 07:34:45 -07:00 |
|
Simon Mo
|
3b457143d2
|
[Bugfix] Register serializers for V0 MQ Engine (#15009)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-03-18 09:14:47 -04:00 |
|
Cyrus Leung
|
ab656f2c2f
|
[Bugfix] Loosen type check to avoid errors in V1 (#15021)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-18 12:54:40 +00:00 |
|
Serena
|
64fc2193dc
|
[Misc][Docs] fix the comments of KV_T and CACHE_T in CALL_RESHAPE_AND_CACHE_XX macros (#14347)
|
2025-03-18 05:50:19 -07:00 |
|
Sebastian Schoennenbeck
|
dd732028f5
|
[Bugfix][Frontend] Fix validation of logprobs in ChatCompletionRequest (#14352)
Signed-off-by: Sebastian Schönnenbeck <sebastian.schoennenbeck@comma-soft.com>
|
2025-03-18 05:50:05 -07:00 |
|
hoshi-hiyouga
|
414919138b
|
[Bugfix] torchrun compatibility (#14899)
Signed-off-by: hiyouga <hiyouga@buaa.edu.cn>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2025-03-18 05:49:27 -07:00 |
|
Jee Jee Li
|
db7c8ca910
|
[Misc] Embedding model support LoRA (#14935)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-03-18 12:07:00 +00:00 |
|
Patrick von Platen
|
f863ffc965
|
[Mistral-Small 3.1] Update docs and tests (#14977)
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2025-03-18 03:29:42 -07:00 |
|
Varun Sundar Rabindranath
|
400d483e87
|
[Kernels] LoRA - Retire SGMV and BGMV Kernels (#14685)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
|
2025-03-18 09:47:53 +00:00 |
|
Shanshan Shen
|
d1695758b2
|
[Doc][V1] Fix V1 APC doc (#14920)
|
2025-03-18 08:15:46 +00:00 |
|
Liangfu Chen
|
53a0cf8b95
|
[Neuron] trim attention kernel tests to fit trn1.2x instance (#14988)
Signed-off-by: Liangfu Chen <liangfc@amazon.com>
|
2025-03-18 15:05:52 +08:00 |
|
Tristan Leclercq
|
5eeabc2a44
|
[Bugfix] Fix bnb quantization for models with both HF-format and Mistral-format weights (#14950)
|
2025-03-17 23:27:26 +00:00 |
|
Alexander Matveev
|
18551e820c
|
[V1] TPU - Fix CI/CD runner (#14974)
|
2025-03-17 21:07:07 +00:00 |
|
Robert Shaw
|
e41e160263
|
[V1] Guard Against Main Thread Usage (#14972)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-03-17 13:23:02 -07:00 |
|
Cyrus Leung
|
b89fb2a4a1
|
[CI/Build] Use AutoModelForImageTextToText to load VLMs in tests (#14945)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-17 18:35:17 +00:00 |
|
Roger Wang
|
5340b0e221
|
[Bugfix] Fix interface for Olmo2 on V1 (#14976)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2025-03-17 11:26:38 -07:00 |
|
Roger Wang
|
37e3806132
|
[Bugfix] Make Gemma3 MM V0 only for now (#14971)
Signed-off-by: Roger Wang <ywang@roblox.com>
v0.8.0rc2
|
2025-03-17 10:04:21 -07:00 |
|
Aaron Pham
|
c0efdd655b
|
[Fix][Structured Output] using vocab_size to construct matcher (#14868)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2025-03-17 11:42:45 -04:00 |
|
Quentin
|
aaaec52ad9
|
[Bugfix][Model] Mixtral: use unused head_dim config argument (#14961)
Signed-off-by: Quentin Torroba <quentin.torroba@mistral.ai>
|
2025-03-17 07:44:18 -07:00 |
|
Tyler Michael Smith
|
e1eb45d397
|
[Bugfix] Fix precommit - line too long in pixtral.py (#14960)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-17 07:18:50 -07:00 |
|
Simon Mo
|
89fca671fb
|
[V1] Default MLA to V1 (#14921)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-03-17 06:54:40 -07:00 |
|
Patrick von Platen
|
d20b0c139c
|
Add patch merger (#14957)
|
2025-03-17 06:47:50 -07:00 |
|
Cyrus Leung
|
166a168b0f
|
[Doc] Fix misleading log during multi-modal profiling (#14955)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-17 06:14:32 -07:00 |
|
vllmellm
|
2bb0e1a799
|
[Bugfix][ROCm] running new process using spawn method for rocm in tests. (#14810)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-03-17 11:33:35 +00:00 |
|
Cyrus Leung
|
6eaf1e5c52
|
[Misc] Add --seed option to offline multi-modal examples (#14934)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-17 03:00:17 -07:00 |
|
Cyrus Leung
|
868a8c5b2c
|
[Bugfix] Fix Ultravox on V1 (#14929)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-17 17:15:20 +08:00 |
|
iefgnoix
|
b4ad56c1bd
|
[V1][TPU] Apply the ragged paged attention kernel fix and remove the padding. (#14846)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
|
2025-03-17 01:48:28 -07:00 |
|
kushanam
|
69698f257e
|
fix minor miscalled method (#14327)
|
2025-03-17 01:47:58 -07:00 |
|
Lu Fang
|
cd0cd85102
|
[MISC] More AMD unused var clean up (#14926)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-03-17 16:40:41 +08:00 |
|
Russell Bryant
|
0a74bfce9c
|
setup.py: drop assumption about local main branch (#14692)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-17 01:37:42 -07:00 |
|
Chen Zhang
|
dd3b865854
|
[Doc] Add vLLM Beijing meetup slide (#14938)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-03-17 16:29:36 +08:00 |
|
Yan Ma
|
9b87a579aa
|
[Misc][XPU] Use None as device capacity for XPU (#14932)
Signed-off-by: yan ma <yan.ma@intel.com>
|
2025-03-17 01:22:14 -07:00 |
|
Cyrus Leung
|
b539222d4e
|
[V1] Remove input cache client (#14864)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2025-03-16 23:42:06 -07:00 |
|
Lily Liu
|
8d6cf89526
|
[V1] [Spec Decode] Support random sampling for spec decode (#13933)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
v0.8.0rc1
|
2025-03-16 22:00:20 -07:00 |
|
Simon Mo
|
583a9778e0
|
[Benchmark] Do not save detailed info to json by default (#14879)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-03-16 21:48:11 -07:00 |
|
Sibi
|
a73e183e36
|
[Misc] Replace os environ to monkeypatch in test suite (#14516)
Signed-off-by: sibi <85477603+t-sibiraj@users.noreply.github.com>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
|
2025-03-16 20:35:57 -07:00 |
|
Lucas Wilkinson
|
1e799b7ec1
|
[BugFix] Fix MLA + V1 + TP==1 causing reinitialization of cuda context (#14910)
|
2025-03-17 03:35:37 +00:00 |
|
Woosuk Kwon
|
7f6c5ee06c
|
[V1][Minor] Add __repr__ to ConstantList (#14907)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-16 20:20:15 -07:00 |
|
Woosuk Kwon
|
faa0275730
|
[V1] Optimize the overhead of rewinding (#14905)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-16 20:19:30 -07:00 |
|
Cyrus Leung
|
8a5a9b70d7
|
[CI/Build] Update defaults for test reproducibility (#14893)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-17 10:38:15 +08:00 |
|