Cyrus Leung
|
5994430b84
|
[Misc] Remove redundant num_embeds (#15443)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-25 18:27:57 +08:00 |
|
Cyrus Leung
|
a9e879b316
|
[Misc] Clean up MiniCPM-V/O code (#15337)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-25 10:22:52 +00:00 |
|
Md. Shafi Hussain
|
3e2f37a69a
|
Dockerfile.ppc64le changes to move to UBI (#15402)
Signed-off-by: Md. Shafi Hussain <Md.Shafi.Hussain@ibm.com>
|
2025-03-25 10:15:14 +00:00 |
|
Thien Tran
|
4f044b1d67
|
[Kernel][CPU] CPU MLA (#14744)
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>
|
2025-03-25 09:34:59 +00:00 |
|
Siyuan Liu
|
4157f563b4
|
[Hardware][TPU][Bugfix] Fix v1 mp profiler (#15409)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
|
2025-03-25 01:43:00 -07:00 |
|
Lu Fang
|
051da7efe3
|
Fix CUDA kernel index data type in vllm/csrc/quantization/gptq_marlin/awq_marlin_repack.cu +10 (#15160)
Signed-off-by: Lu Fang <lufang@fb.com>
Co-authored-by: Richard Barnes <rbarnes@meta.com>
|
2025-03-25 15:36:45 +08:00 |
|
Woosuk Kwon
|
25f560a62c
|
[V1][Spec Decode] Update target_logits in place for rejection sampling (#15427)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
v0.8.2
|
2025-03-24 21:04:41 -07:00 |
|
Russell Bryant
|
a09ad90a72
|
[V1] guidance backend for structured output + auto fallback mode (#14779)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Loc Huynh <jc1da.3011@gmail.com>
Co-authored-by: Michal Moskal <michal@moskal.me>
|
2025-03-24 21:02:33 -07:00 |
|
Chauncey
|
10b34e36b9
|
[Bugfix] Fixed the issue of not being able to input video and image simultaneously (#15387)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-03-25 03:48:08 +00:00 |
|
Tyler Michael Smith
|
b5269db959
|
Revert "Fix non-contiguous input passed to Marlin kernel (#15319)" (#15398)
|
2025-03-24 20:43:51 -07:00 |
|
Jee Jee Li
|
6db94571d7
|
[Misc] Remove LoRA log (#15388)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-03-24 20:43:48 -07:00 |
|
Harry Mellor
|
97cfa65df7
|
Add pipeline parallel support to TransformersModel (#12832)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2025-03-25 10:41:45 +08:00 |
|
Woosuk Kwon
|
911c8eb000
|
[Minor][Spec Decode] Remove compiled_softmax (#15416)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-24 19:09:04 -07:00 |
|
Woosuk Kwon
|
ebcebeeb6b
|
[V1][Spec Decode] Enable spec decode for top-p & top-k sampling (#15063)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-24 17:16:46 -07:00 |
|
Gregory Shtrasberg
|
f533b5837f
|
[ROCm][Kernel] MoE weights padding (#14454)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Signed-off-by: charlifu <charlifu@amd.com>
Co-authored-by: charlifu <charlifu@amd.com>
|
2025-03-24 23:45:30 +00:00 |
|
Gregory Shtrasberg
|
8279201ce6
|
[Build] Cython compilation support fix (#14296)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-03-24 23:37:54 +00:00 |
|
Siyuan Liu
|
23fdab00a8
|
[Hardware][TPU] Skip failed compilation test (#15421)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
|
2025-03-24 23:28:57 +00:00 |
|
Nick Hill
|
623e2ed29f
|
[BugFix][V1] Quick fix for min_tokens with multiple EOS (#15407)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-03-24 15:58:59 -07:00 |
|
Nick Hill
|
9d72daf4ce
|
[V1][Perf] Simpler request output queues (#15156)
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-03-24 22:44:08 +00:00 |
|
Cyrus Leung
|
6dd55af6c9
|
[Doc] Update docs on handling OOM (#15357)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2025-03-24 14:29:34 -07:00 |
|
Yuan Tang
|
3eb08ed9b1
|
[DOC] Add Kubernetes deployment guide with CPUs (#14865)
|
2025-03-24 10:48:43 -07:00 |
|
liuzhenwei
|
5eeadc2642
|
[Hardware][Gaudi][Feature] Enable Dynamic MoE for Mixtral (#12303)
Signed-off-by: zhenwei <zhenweiliu@habana.ai>
|
2025-03-24 09:48:40 -07:00 |
|
Nick Hill
|
3aee6573dc
|
[V1] Aggregate chunked prompt logprobs in model runner (#14875)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-03-24 12:27:57 -04:00 |
|
Yi Liu
|
9cc645141d
|
[MISC] Refine no available block debug msg (#15076)
Signed-off-by: Yi Liu <yiliu4@habana.ai>
Signed-off-by: yiliu30 <yi4.liu@intel.com>
Co-authored-by: Yi Liu <yiliu4@habana.ai>
|
2025-03-25 00:01:10 +08:00 |
|
Chen1022
|
0893567db9
|
[V1][Minor] fix comments (#15392)
Signed-off-by: chenjincong <chenjincong@baidu.com>
Signed-off-by: Chen-0210 <chenjincong11@gmail.com>
Co-authored-by: chenjincong <chenjincong@baidu.com>
|
2025-03-24 08:45:32 -07:00 |
|
Russell Bryant
|
8abe69b499
|
[Core] Don't force uppercase for VLLM_LOGGING_LEVEL (#15306)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-24 08:27:30 -07:00 |
|
Manish Sethi
|
761702fd19
|
[Core] Integrate fastsafetensors loader for loading model weights (#10647)
Signed-off-by: Manish Sethi <Manish.sethi1@ibm.com>
|
2025-03-24 08:08:02 -07:00 |
|
youkaichao
|
9606d572ed
|
[distributed] fix dp group (#15355)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-03-24 14:54:27 +00:00 |
|
Cyrus Leung
|
cbcdf2c609
|
[Bugfix] Fix chat template loading (#15143)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2025-03-24 13:50:09 +00:00 |
|
Russell Bryant
|
038de04d7b
|
Fix zmq IPv6 URL format error (#15341)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-24 09:30:41 -04:00 |
|
Jinzhen Lin
|
6b3cc75be0
|
[Kernel] allow non-contiguous input for marlin kernel (#14658)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
|
2025-03-24 09:21:33 -04:00 |
|
Simon Mo
|
7ffcccfa5c
|
Revert "[CI/Build] Use uv python for docker rather than ppa:deadsnakess/ppa (#13569)" (#15377)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-03-24 05:53:10 -07:00 |
|
sfbemerk
|
cc8accfd53
|
[Misc] Update guided decoding logs to debug (#15310)
Signed-off-by: Benjamin Merkel <benjamin.merkel@tngtech.com>
Co-authored-by: Benjamin Merkel <benjamin.merkel@tngtech.com>
|
2025-03-24 04:25:20 -07:00 |
|
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟
|
948ab03e7e
|
[Bugfix][V1] Avoid importing PreTrainedModel (#15366)
Signed-off-by: Hollow Man <hollowman@opensuse.org>
|
2025-03-24 10:33:12 +00:00 |
|
Rui Qiao
|
5797fb97e9
|
[Misc] Remove ignore_reinit_error for ray.init() (#15373)
|
2025-03-24 07:41:53 +00:00 |
|
Jee Jee Li
|
3892e58ad7
|
[Misc] Upgrade BNB version (#15183)
|
2025-03-24 05:51:42 +00:00 |
|
Qubitium-ModelCloud
|
d20e261199
|
Fix non-contiguous input passed to Marlin kernel (#15319)
|
2025-03-24 03:09:44 +00:00 |
|
Luka Govedič
|
f622dbcf39
|
[Fix] [torch.compile] Improve UUID system for custom passes (#15249)
Signed-off-by: luka <luka@neuralmagic.com>
|
2025-03-24 01:54:07 +00:00 |
|
Lucas Wilkinson
|
dccf535f8e
|
[V1] Enable V1 Fp8 cache for FA3 in the oracle (#15191)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-03-23 15:07:04 -07:00 |
|
Roger Wang
|
9c5c81b0da
|
[Misc][Doc] Add note regarding loading generation_config by default (#15281)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2025-03-23 14:00:55 -07:00 |
|
Robin
|
d6cd59f122
|
[Frontend] Support tool calling and reasoning parser (#14511)
Signed-off-by: WangErXiao <863579016@qq.com>
|
2025-03-23 14:00:07 -07:00 |
|
Woosuk Kwon
|
bc8ed3c4ba
|
[V1][Spec Decode] Use better defaults for N-gram (#15358)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-23 10:52:30 -07:00 |
|
Woosuk Kwon
|
b9bd76ca14
|
[V1][Spec Decode] Respect prompt_lookup_max (#15348)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-23 10:41:44 -07:00 |
|
DefTruth
|
6ebaf9ac71
|
[Bugfix] consider related env vars for torch.compiled cache hash (#14953)
Signed-off-by: DefTruth <31974251+DefTruth@users.noreply.github.com>
|
2025-03-23 15:53:09 +00:00 |
|
DefTruth
|
f90d34b498
|
[Misc] Add tuned R1 w8a8 and MoE configs for NVIDIA L20 (#15322)
Signed-off-by: DefTruth <qiustudent_r@163.com>
|
2025-03-23 01:10:10 -07:00 |
|
youkaichao
|
f68cce8e64
|
[ci/build] fix broken tests in LLM.collective_rpc (#15350)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-03-23 14:49:48 +08:00 |
|
youkaichao
|
09b6a95551
|
[ci/build] update torch nightly version for GH200 (#15135)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-03-23 14:04:13 +08:00 |
|
shangmingc
|
50c9636d87
|
[V1][Usage] Refactor speculative decoding configuration and tests (#14434)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2025-03-22 19:28:10 -10:00 |
|
hijkzzz
|
0661cfef7a
|
Fix v1 supported oracle for worker-cls and worker-extension-cls (#15324)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2025-03-23 10:23:35 +08:00 |
|
Chen Zhang
|
a827aa815d
|
[doc] Add back previous news (#15331)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-03-22 17:38:33 -07:00 |
|