wang.yuqi
|
3d3ab3689f
|
[New Model]: Snowflake Arctic Embed (Family) (#16649)
|
2025-04-18 08:11:57 -07:00 |
|
Harry Mellor
|
686623c5e7
|
Fix nullable_kvs fallback (#16837)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-18 05:58:39 -07:00 |
|
Cyrus Leung
|
aadb656562
|
[Misc] Clean up Kimi-VL (#16833)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-04-18 05:15:09 -07:00 |
|
Jonghyun Choe
|
87e067de41
|
[Model] use AutoWeightsLoader for BigCode, GPT-J (#16823)
Signed-off-by: Jonghyun Choe <andy.choe729@gmail.com>
|
2025-04-18 10:42:41 +00:00 |
|
Michael Yao
|
26507f8973
|
[Docs] Fix a link and grammar issue in production-stack.md (#16809)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2025-04-18 06:42:58 +00:00 |
|
Nathan Weinberg
|
9c1d5b456d
|
[Doc] add podman setup instructions for official image (#16796)
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
|
2025-04-18 06:10:49 +00:00 |
|
Lucia Fang
|
e31045f95c
|
[Bugfix] fix pp for llama4 (#16746)
Signed-off-by: Lu Fang <fanglu@fb.com>
|
2025-04-18 13:51:30 +08:00 |
|
Luka Govedič
|
aaec845f8e
|
[ROCm] [Attention] Cleanup ROCm output passing (#16431)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
|
2025-04-18 05:46:45 +00:00 |
|
rongfu.leng
|
7bdfd29a35
|
[Misc] add collect_env to cli and docker image (#16759)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
|
2025-04-17 22:13:35 -07:00 |
|
Harry Mellor
|
e78587a64c
|
Improve-mm-and-pooler-and-decoding-configs (#16789)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-17 22:13:32 -07:00 |
|
Lucas Wilkinson
|
7eb4255628
|
[BugFix] Accuracy fix for llama4 int4 - improperly casted scales (#16801)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
|
2025-04-17 22:13:29 -07:00 |
|
Michael Goin
|
6a0f547561
|
Add hardware print to TPU V1 test (#16792)
|
2025-04-17 22:13:26 -07:00 |
|
Shanshan Shen
|
30ed81b7ca
|
[V1][Structured Output] Minor modification to _validate_structured_output() (#16748)
Signed-off-by: shen-shanshan <467638484@qq.com>
|
2025-04-18 13:12:54 +08:00 |
|
Chauncey
|
7a4a5de729
|
[Misc] Update outdated note: LMCache now supports chunked prefill (#16697)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-04-18 05:12:42 +00:00 |
|
Cyrus Leung
|
c16fb5dae8
|
[Doc] Improve help examples for --compilation-config (#16729)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-04-17 21:22:34 -07:00 |
|
Tarun Kumar
|
e37073efd7
|
Add property-based testing for vLLM endpoints using an API defined by an OpenAPI 3.1 schema (#16721)
Signed-off-by: Tarun Kumar <takumar@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-04-17 21:08:27 -07:00 |
|
Lucas Wilkinson
|
183dad7a85
|
[Attention] Update to latest FA3 code (#13111)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
|
2025-04-17 15:14:07 -07:00 |
|
Yihua Cheng
|
3408e47159
|
[P/D][V1] KV Connector API V1 (#15960)
Signed-off-by: ApostaC <yihua98@uchicago.edu>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Signed-off-by: remi <remi@mistral.ai>
Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Rémi Delacourt <54138269+Flechman@users.noreply.github.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
|
2025-04-17 13:22:40 -07:00 |
|
Nick Hill
|
0377b8310b
|
[MLA] Simplification to batch P/D reordering (#16673)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-04-17 16:12:09 -04:00 |
|
Mark McLoughlin
|
e4755f7fac
|
[V1][Metrics] Fix http metrics middleware (#15894)
|
2025-04-17 19:52:18 +00:00 |
|
Sijia(Jackson) Chen
|
92edf35826
|
[ROCM] enable aiter fused moe kernel for llama4 bf16 checkpoints (#16674)
|
2025-04-17 11:44:34 -07:00 |
|
Nicolò Lucchesi
|
eb5819b2d9
|
[V1][TPU] Enable Top K (#15489)
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Hyesoo Yang <hyeygit@gmail.com>
Co-authored-by: Hyesoo Yang <hyeygit@gmail.com>
|
2025-04-17 18:18:11 +00:00 |
|
Nicolò Lucchesi
|
5989f4684d
|
[TPU][V1] Fix padding recompilation when max-num-batched-tokens is not even (#16726)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-04-17 18:09:57 +00:00 |
|
rongfu.leng
|
5125d72f02
|
[Model] use AutoWeightsLoader for olmoe,opt,orion,persimmon,phi3_small (#16548)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
|
2025-04-17 17:48:31 +00:00 |
|
Ximingwang-09
|
a018e555fd
|
[Kernel] Add fp8_w8a8 fused MoE kernel tuning configs for DeepSeek V3/R1 on NVIDIA H20 (#16753)
Signed-off-by: ximing.wxm <ximing.wxm@antgroup.com>
Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com>
|
2025-04-18 00:01:30 +08:00 |
|
Robin
|
6211b92273
|
[Bugfix]Fix index out of range error in api server log (#16787)
Signed-off-by: WangErXiao <863579016@qq.com>
|
2025-04-17 09:01:07 -07:00 |
|
Nick Hill
|
05fcd1b430
|
[V1][Perf] Faster incremental detokenization (#15137)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-04-17 07:45:24 -07:00 |
|
Insu Kim
|
7c02d6a137
|
[Doc] Changed explanation of generation_tokens_total and prompt_tokens_total counter type metrics to avoid confusion (#16784)
Signed-off-by: insukim1994 <insu.kim@moreh.io>
|
2025-04-17 14:10:08 +00:00 |
|
wang.yuqi
|
11c3b98491
|
[Doc] Document Matryoshka Representation Learning support (#16770)
|
2025-04-17 13:37:37 +00:00 |
|
Cyrus Leung
|
dbe7f07001
|
[Doc] Make sure to update vLLM when installing latest code (#16781)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-04-17 06:53:31 -06:00 |
|
Reid
|
c69bf4ee06
|
fix: hyperlink (#16778)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-04-17 11:34:20 +00:00 |
|
Harry Mellor
|
d27ea94034
|
Improve configs - TokenizerPoolConfig + DeviceConfig (#16603)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-17 11:19:42 +00:00 |
|
Reid
|
99ed526101
|
[Misc] refactor examples series - lmcache (#16758)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-04-17 11:02:35 +00:00 |
|
Michael Yao
|
207da28186
|
[Doc] Fix a 404 link in installation/cpu.md (#16773)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2025-04-17 10:46:21 +00:00 |
|
intervitens
|
5b1aca2ae3
|
[Bugfix] Fix GLM4 model (#16618)
Signed-off-by: intervitens <intervitens@tutanota.com>
|
2025-04-17 03:35:07 -07:00 |
|
Reid
|
d8e557b5e5
|
[doc] add open-webui example (#16747)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-04-17 18:27:32 +08:00 |
|
Cyrus Leung
|
61a44a0b22
|
[Doc] Add more tips to avoid OOM (#16765)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-04-17 09:54:34 +00:00 |
|
DefTruth
|
a6481525b8
|
[misc] ignore marlin_moe_wna16 local gen codes (#16760)
Signed-off-by: DefTruth <qiustudent_r@163.com>
|
2025-04-17 17:15:14 +08:00 |
|
Richard Liaw
|
8cac35ba43
|
[Ray] Improve documentation on batch inference (#16609)
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
|
2025-04-16 22:19:26 -07:00 |
|
Russell Bryant
|
9dbf7a2dc1
|
[V1] Remove log noise when idle (#16735)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-04-16 21:34:08 -07:00 |
|
David Heineman
|
607029e515
|
[Bugfix] Revert max_prompt_len validation for decoder-only models. (#16741)
Signed-off-by: David Heineman <david@davidheineman.com>
|
2025-04-16 21:33:15 -07:00 |
|
Isotr0py
|
cb072ce93b
|
[Bugfix] Update Florence-2 tokenizer to make grounding tasks work (#16734)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-04-17 04:17:39 +00:00 |
|
Divakar Verma
|
95aca283b4
|
[rocm][V0] fix selection logic for custom PA in V0 (#16426)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
|
2025-04-16 19:52:11 -07:00 |
|
Robert Shaw
|
2b05b8ce69
|
[V1][Frontend] Improve Shutdown And Logs (#11737)
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Andrew Feldman <afeldman@neuralmagic.com>
Co-authored-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-04-16 19:48:34 -07:00 |
|
Aaruni Aggarwal
|
3c776dcefb
|
Adding vllm buildkite job for IBM Power (#16679)
Signed-off-by: Aaruni Aggarwal <aaruniagg@gmail.com>
|
2025-04-17 10:47:47 +08:00 |
|
Bryan Lu
|
2cbd4d2999
|
[V1][Spec Dec Bug Fix] Respect Spec Dec Method Specification (#16636)
Signed-off-by: Bryan Lu <yuzhelu@amazon.com>
|
2025-04-16 19:47:26 -07:00 |
|
Staszek Paśko
|
3092375e27
|
[V1][Performance] Implement custom serialization for MultiModalKwargs [Rebased] (#16432)
Signed-off-by: Staszek Pasko <staszek@gmail.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-04-16 19:28:32 -07:00 |
|
Harry Mellor
|
3cd91dc955
|
Help user create custom model for Transformers backend remote code models (#16719)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-17 01:05:59 +00:00 |
|
Jade Zheng
|
8a7368e069
|
[Misc] Remove redundant comment (#16703)
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
|
2025-04-17 00:44:52 +00:00 |
|
Harry Mellor
|
93e561ec4d
|
Improve error for structured output backend selection (#16717)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-17 00:35:35 +00:00 |
|