Sijia(Jackson) Chen
|
92edf35826
|
[ROCM] enable aiter fused moe kernel for llama4 bf16 checkpoints (#16674)
|
2025-04-17 11:44:34 -07:00 |
|
Nicolò Lucchesi
|
eb5819b2d9
|
[V1][TPU] Enable Top K (#15489)
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Hyesoo Yang <hyeygit@gmail.com>
Co-authored-by: Hyesoo Yang <hyeygit@gmail.com>
|
2025-04-17 18:18:11 +00:00 |
|
Nicolò Lucchesi
|
5989f4684d
|
[TPU][V1] Fix padding recompilation when max-num-batched-tokens is not even (#16726)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-04-17 18:09:57 +00:00 |
|
rongfu.leng
|
5125d72f02
|
[Model] use AutoWeightsLoader for olmoe,opt,orion,persimmon,phi3_small (#16548)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
|
2025-04-17 17:48:31 +00:00 |
|
Ximingwang-09
|
a018e555fd
|
[Kernel] Add fp8_w8a8 fused MoE kernel tuning configs for DeepSeek V3/R1 on NVIDIA H20 (#16753)
Signed-off-by: ximing.wxm <ximing.wxm@antgroup.com>
Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com>
|
2025-04-18 00:01:30 +08:00 |
|
Robin
|
6211b92273
|
[Bugfix] Fix index out of range error in api server log (#16787)
Signed-off-by: WangErXiao <863579016@qq.com>
|
2025-04-17 09:01:07 -07:00 |
|
Nick Hill
|
05fcd1b430
|
[V1][Perf] Faster incremental detokenization (#15137)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-04-17 07:45:24 -07:00 |
|
Insu Kim
|
7c02d6a137
|
[Doc] Changed explanation of generation_tokens_total and prompt_tokens_total counter type metrics to avoid confusion (#16784)
Signed-off-by: insukim1994 <insu.kim@moreh.io>
|
2025-04-17 14:10:08 +00:00 |
|
wang.yuqi
|
11c3b98491
|
[Doc] Document Matryoshka Representation Learning support (#16770)
|
2025-04-17 13:37:37 +00:00 |
|
Cyrus Leung
|
dbe7f07001
|
[Doc] Make sure to update vLLM when installing latest code (#16781)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-04-17 06:53:31 -06:00 |
|
Reid
|
c69bf4ee06
|
fix: hyperlink (#16778)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-04-17 11:34:20 +00:00 |
|
Harry Mellor
|
d27ea94034
|
Improve configs - TokenizerPoolConfig + DeviceConfig (#16603)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-17 11:19:42 +00:00 |
|
Reid
|
99ed526101
|
[Misc] refactor examples series - lmcache (#16758)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-04-17 11:02:35 +00:00 |
|
Michael Yao
|
207da28186
|
[Doc] Fix a 404 link in installation/cpu.md (#16773)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2025-04-17 10:46:21 +00:00 |
|
intervitens
|
5b1aca2ae3
|
[Bugfix] Fix GLM4 model (#16618)
Signed-off-by: intervitens <intervitens@tutanota.com>
|
2025-04-17 03:35:07 -07:00 |
|
Reid
|
d8e557b5e5
|
[doc] add open-webui example (#16747)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-04-17 18:27:32 +08:00 |
|
Cyrus Leung
|
61a44a0b22
|
[Doc] Add more tips to avoid OOM (#16765)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-04-17 09:54:34 +00:00 |
|
DefTruth
|
a6481525b8
|
[misc] ignore marlin_moe_wna16 local gen codes (#16760)
Signed-off-by: DefTruth <qiustudent_r@163.com>
|
2025-04-17 17:15:14 +08:00 |
|
Richard Liaw
|
8cac35ba43
|
[Ray] Improve documentation on batch inference (#16609)
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
|
2025-04-16 22:19:26 -07:00 |
|
Russell Bryant
|
9dbf7a2dc1
|
[V1] Remove log noise when idle (#16735)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-04-16 21:34:08 -07:00 |
|
David Heineman
|
607029e515
|
[Bugfix] Revert max_prompt_len validation for decoder-only models. (#16741)
Signed-off-by: David Heineman <david@davidheineman.com>
|
2025-04-16 21:33:15 -07:00 |
|
Isotr0py
|
cb072ce93b
|
[Bugfix] Update Florence-2 tokenizer to make grounding tasks work (#16734)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-04-17 04:17:39 +00:00 |
|
Divakar Verma
|
95aca283b4
|
[rocm][V0] fix selection logic for custom PA in V0 (#16426)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
|
2025-04-16 19:52:11 -07:00 |
|
Robert Shaw
|
2b05b8ce69
|
[V1][Frontend] Improve Shutdown And Logs (#11737)
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Andrew Feldman <afeldman@neuralmagic.com>
Co-authored-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-04-16 19:48:34 -07:00 |
|
Aaruni Aggarwal
|
3c776dcefb
|
Adding vllm buildkite job for IBM Power (#16679)
Signed-off-by: Aaruni Aggarwal <aaruniagg@gmail.com>
|
2025-04-17 10:47:47 +08:00 |
|
Bryan Lu
|
2cbd4d2999
|
[V1][Spec Dec Bug Fix] Respect Spec Dec Method Specification (#16636)
Signed-off-by: Bryan Lu <yuzhelu@amazon.com>
|
2025-04-16 19:47:26 -07:00 |
|
Staszek Paśko
|
3092375e27
|
[V1][Performance] Implement custom serialization for MultiModalKwargs [Rebased] (#16432)
Signed-off-by: Staszek Pasko <staszek@gmail.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-04-16 19:28:32 -07:00 |
|
Harry Mellor
|
3cd91dc955
|
Help user create custom model for Transformers backend remote code models (#16719)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-17 01:05:59 +00:00 |
|
Jade Zheng
|
8a7368e069
|
[Misc] Remove redundant comment (#16703)
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
|
2025-04-17 00:44:52 +00:00 |
|
Harry Mellor
|
93e561ec4d
|
Improve error for structured output backend selection (#16717)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-17 00:35:35 +00:00 |
|
Joe Runde
|
e1b004839a
|
[Hardware] Add processor inputs to platform validation (#16680)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
|
2025-04-16 09:28:42 -07:00 |
|
xsank
|
ee378f3d49
|
[Model] support modernbert (#16648)
Signed-off-by: 唯勤 <xsank.mz@alibaba-inc.com>
Co-authored-by: 唯勤 <xsank.mz@alibaba-inc.com>
|
2025-04-16 05:30:15 -07:00 |
|
DefTruth
|
e82ee40de3
|
[Bugfix][Kernel] fix potential cuda graph broken for merge_attn_states kernel (#16693)
Signed-off-by: DefTruth <qiustudent_r@163.com>
|
2025-04-16 03:31:39 -07:00 |
|
Cyrus Leung
|
facbe2a114
|
[Doc] Improve OOM troubleshooting (#16704)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-04-16 18:29:48 +08:00 |
|
Reid
|
7168920491
|
[Misc] refactor examples series (#16708)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-04-16 10:16:36 +00:00 |
|
Kay Yan
|
21378a2323
|
[CI] Cleanup additional_dependencies: [toml] for pre-commit yapf hook (#16405)
Signed-off-by: Kay Yan <kay.yan@daocloud.io>
|
2025-04-16 10:05:31 +00:00 |
|
Shanshan Shen
|
976711d9db
|
[V1][Structured Output] Move xgrammar related utils to backend_xgrammar.py (#16578)
Signed-off-by: shen-shanshan <467638484@qq.com>
|
2025-04-16 17:01:36 +08:00 |
|
Sage Moore
|
44fa4d556c
|
[ROCM] Bind triton version to 3.2 in requirements-built.txt (#16664)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-04-16 14:05:28 +08:00 |
|
billishyahao
|
3ac98edcb1
|
[Feature] add model aware kv ops helper (#16020)
Signed-off-by: billishyahao <bill.he@amd.com>
|
2025-04-15 23:00:43 -07:00 |
|
Richard Zou
|
966c742ed2
|
Disable remote caching when calling compile_fx (#16611)
Signed-off-by: rzou <zou3519@gmail.com>
|
2025-04-15 22:18:28 -07:00 |
|
Jee Jee Li
|
0d7d05f4b6
|
[Misc] Modify LRUCache touch (#16689)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-04-16 04:51:38 +00:00 |
|
rongfu.leng
|
96bb8aa68b
|
[Bugfix] fix gpu docker image missing benchmarks dir (#16628)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
|
2025-04-15 21:21:14 -07:00 |
|
Shinichi Hemmi
|
3badb0213b
|
[Model] Add PLaMo2 (#14323)
Signed-off-by: Shinichi Hemmi <50256998+Alnusjaponica@users.noreply.github.com>
Signed-off-by: shemmi <shemmi@preferred.jp>
Co-authored-by: Kento Nozawa <nzw0301@preferred.jp>
Co-authored-by: Hiroaki Mikami <mhiroaki@preferred.jp>
Co-authored-by: Calvin Metzger <metzger@preferred.jp>
|
2025-04-15 19:31:30 -07:00 |
|
Angky William
|
fdcb850f14
|
[Misc] Enable vLLM to Dynamically Load LoRA from a Remote Server (#10546)
Signed-off-by: Angky William <angkywilliam@Angkys-MacBook-Pro.local>
Co-authored-by: Angky William <angkywilliam@Angkys-MacBook-Pro.local>
|
2025-04-15 22:31:38 +00:00 |
|
Dipika Sikka
|
54a66e5fee
|
[Misc] Update compressed-tensors WNA16 to support zero-points (#14211)
|
2025-04-15 07:33:51 -06:00 |
|
DefTruth
|
280d62b8a2
|
[Kernel] Remove redundant Exp calculations (#16123)
Signed-off-by: DefTruth <qiustudent_r@163.com>
|
2025-04-15 12:58:37 +00:00 |
|
Xihui Cang
|
1666e66443
|
Add "/server_info" endpoint in api_server to retrieve the vllm_config. (#16572)
Signed-off-by: Xihui Cang <xihuicang@gmail.com>
|
2025-04-15 11:50:38 +00:00 |
|
Jee Jee Li
|
1575c1701a
|
[CI/Build] Fix LoRA OOM (#16624)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-04-15 16:38:19 +08:00 |
|
Reid
|
6ae996a873
|
[Misc] refactor argument parsing in examples (#16635)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-04-15 08:05:30 +00:00 |
|
Richard Zou
|
b590adfdc1
|
Fix vLLM x torch.compile config caching (#16491)
Signed-off-by: rzou <zou3519@gmail.com>
|
2025-04-14 23:11:11 -07:00 |
|