Kunshang Ji
|
68cf1601d3
|
[CI][Intel GPU] update XPU dockerfile and CI script (#15109)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2025-03-19 01:29:25 -07:00 |
|
Cyrus Leung
|
61f412187d
|
[Bugfix] Re-enable Gemma3 for V1 (#14980)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-18 23:58:22 -07:00 |
|
Woosuk Kwon
|
05ccd0aa35
|
[V1] Ensure using int64 for sampled token ids (#15065)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-18 23:52:19 -07:00 |
|
Cyrus Leung
|
f690372b68
|
[Core] Update dtype detection and defaults (#14858)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-19 13:49:33 +08:00 |
|
Brayden Zhong
|
8b3e94a357
|
[Model] Remove duplicated message check in Mistral chat completion request (#15069)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
|
2025-03-19 05:09:32 +00:00 |
|
Julien Denize
|
437f9162d0
|
[Model] Pixtral: Remove layer instantiation duplication (#15053)
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
|
2025-03-19 10:34:03 +08:00 |
|
Cody Yu
|
4f065f12f5
|
[Misc][V1] Skip device checking if not available (#15061)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
|
2025-03-18 19:33:43 -07:00 |
|
Jennifer Zhao
|
228b768db6
|
[Doc] Minor v1_user_guide update (#15064)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
|
2025-03-18 16:10:45 -07:00 |
|
Chujie Zheng
|
027827cc1d
|
fix long dtype in topk sampling (#15049)
|
2025-03-18 15:57:31 -07:00 |
|
Alexander Matveev
|
72a8639b68
|
[V1] TPU - CI/CD use smaller model (#15054)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
|
2025-03-18 21:39:21 +00:00 |
|
Woosuk Kwon
|
99abb8b650
|
[V1][Spec Decode] Optimize Rejection Sampler with Triton Kernels (#14930)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-18 14:31:54 -07:00 |
|
Russell Bryant
|
3a1e648158
|
[V1] Refactor Structured Output for multiple backends (#14694)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-18 19:49:15 +00:00 |
|
Jee Jee Li
|
46c759c165
|
[Bugfix] Fix LoRA extra vocab size (#15047)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-03-18 09:40:29 -07:00 |
|
Isotr0py
|
179a619c21
|
[Bugfix] Fix broken CPU quantization due to triton import (#15038)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-03-18 08:57:39 -07:00 |
|
yury-tokpanov
|
452e8fd968
|
[MODEL] Add support for Zamba2 models (#13185)
Signed-off-by: Yury Tokpanov <yury@zyphra.com>
Signed-off-by: Quentin Anthony <qganthony@yahoo.com>
Co-authored-by: Quentin Anthony <qganthony@yahoo.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-03-18 08:56:21 -07:00 |
|
ekuznetsov139
|
8b793f7ec6
|
MI325 configs, fused_moe_kernel bugfix (#14987)
Signed-off-by: Eugene Kuznetsov <eugene.kuznetsov@amd.com>
|
2025-03-18 08:05:18 -07:00 |
|
Nicolò Lucchesi
|
af35d3a3cc
|
[TPU][V1][Bugfix] Fix chunked prefill with padding (#15037)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-03-18 07:34:45 -07:00 |
|
Simon Mo
|
3b457143d2
|
[Bugfix] Register serializers for V0 MQ Engine (#15009)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-03-18 09:14:47 -04:00 |
|
Cyrus Leung
|
ab656f2c2f
|
[Bugfix] Loosen type check to avoid errors in V1 (#15021)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-18 12:54:40 +00:00 |
|
Serena
|
64fc2193dc
|
[Misc][Docs] fix the comments of KV_T and CACHE_T in CALL_RESHAPE_AND_CACHE_XX macros (#14347)
|
2025-03-18 05:50:19 -07:00 |
|
Sebastian Schoennenbeck
|
dd732028f5
|
[Bugfix][Frontend] Fix validation of logprobs in ChatCompletionRequest (#14352)
Signed-off-by: Sebastian Schönnenbeck <sebastian.schoennenbeck@comma-soft.com>
|
2025-03-18 05:50:05 -07:00 |
|
hoshi-hiyouga
|
414919138b
|
[Bugfix] torchrun compatibility (#14899)
Signed-off-by: hiyouga <hiyouga@buaa.edu.cn>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2025-03-18 05:49:27 -07:00 |
|
Jee Jee Li
|
db7c8ca910
|
[Misc] Embedding model support LoRA (#14935)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-03-18 12:07:00 +00:00 |
|
Patrick von Platen
|
f863ffc965
|
[Mistral-Small 3.1] Update docs and tests (#14977)
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2025-03-18 03:29:42 -07:00 |
|
Varun Sundar Rabindranath
|
400d483e87
|
[Kernels] LoRA - Retire SGMV and BGMV Kernels (#14685)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
|
2025-03-18 09:47:53 +00:00 |
|
Shanshan Shen
|
d1695758b2
|
[Doc][V1] Fix V1 APC doc (#14920)
|
2025-03-18 08:15:46 +00:00 |
|
Liangfu Chen
|
53a0cf8b95
|
[Neuron] trim attention kernel tests to fit trn1.2x instance (#14988)
Signed-off-by: Liangfu Chen <liangfc@amazon.com>
|
2025-03-18 15:05:52 +08:00 |
|
Tristan Leclercq
|
5eeabc2a44
|
[Bugfix] Fix bnb quantization for models with both HF-format and Mistral-format weights (#14950)
|
2025-03-17 23:27:26 +00:00 |
|
Alexander Matveev
|
18551e820c
|
[V1] TPU - Fix CI/CD runner (#14974)
|
2025-03-17 21:07:07 +00:00 |
|
Robert Shaw
|
e41e160263
|
[V1] Guard Against Main Thread Usage (#14972)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-03-17 13:23:02 -07:00 |
|
Cyrus Leung
|
b89fb2a4a1
|
[CI/Build] Use AutoModelForImageTextToText to load VLMs in tests (#14945)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-17 18:35:17 +00:00 |
|
Roger Wang
|
5340b0e221
|
[Bugfix] Fix interface for Olmo2 on V1 (#14976)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2025-03-17 11:26:38 -07:00 |
|
Roger Wang
|
37e3806132
|
[Bugfix] Make Gemma3 MM V0 only for now (#14971)
Signed-off-by: Roger Wang <ywang@roblox.com>
v0.8.0rc2
|
2025-03-17 10:04:21 -07:00 |
|
Aaron Pham
|
c0efdd655b
|
[Fix][Structured Output] using vocab_size to construct matcher (#14868)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2025-03-17 11:42:45 -04:00 |
|
Quentin
|
aaaec52ad9
|
[Bugfix][Model] Mixtral: use unused head_dim config argument (#14961)
Signed-off-by: Quentin Torroba <quentin.torroba@mistral.ai>
|
2025-03-17 07:44:18 -07:00 |
|
Tyler Michael Smith
|
e1eb45d397
|
[Bugfix] Fix precommit - line too long in pixtral.py (#14960)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-17 07:18:50 -07:00 |
|
Simon Mo
|
89fca671fb
|
[V1] Default MLA to V1 (#14921)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-03-17 06:54:40 -07:00 |
|
Patrick von Platen
|
d20b0c139c
|
Add patch merger (#14957)
|
2025-03-17 06:47:50 -07:00 |
|
Cyrus Leung
|
166a168b0f
|
[Doc] Fix misleading log during multi-modal profiling (#14955)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-17 06:14:32 -07:00 |
|
vllmellm
|
2bb0e1a799
|
[Bugfix][ROCm] running new process using spawn method for rocm in tests. (#14810)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-03-17 11:33:35 +00:00 |
|
Cyrus Leung
|
6eaf1e5c52
|
[Misc] Add --seed option to offline multi-modal examples (#14934)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-17 03:00:17 -07:00 |
|
Cyrus Leung
|
868a8c5b2c
|
[Bugfix] Fix Ultravox on V1 (#14929)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-17 17:15:20 +08:00 |
|
iefgnoix
|
b4ad56c1bd
|
[V1][TPU] Apply the ragged paged attention kernel fix and remove the padding. (#14846)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
|
2025-03-17 01:48:28 -07:00 |
|
kushanam
|
69698f257e
|
fix minor miscalled method (#14327)
|
2025-03-17 01:47:58 -07:00 |
|
Lu Fang
|
cd0cd85102
|
[MISC] More AMD unused var clean up (#14926)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-03-17 16:40:41 +08:00 |
|
Russell Bryant
|
0a74bfce9c
|
setup.py: drop assumption about local main branch (#14692)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-17 01:37:42 -07:00 |
|
Chen Zhang
|
dd3b865854
|
[Doc] Add vLLM Beijing meetup slide (#14938)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-03-17 16:29:36 +08:00 |
|
Yan Ma
|
9b87a579aa
|
[Misc][XPU] Use None as device capacity for XPU (#14932)
Signed-off-by: yan ma <yan.ma@intel.com>
|
2025-03-17 01:22:14 -07:00 |
|
Cyrus Leung
|
b539222d4e
|
[V1] Remove input cache client (#14864)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2025-03-16 23:42:06 -07:00 |
|
Lily Liu
|
8d6cf89526
|
[V1] [Spec Decode] Support random sampling for spec decode (#13933)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
v0.8.0rc1
|
2025-03-16 22:00:20 -07:00 |
|