Or Ozeri
|
9d1c50a5ac
|
[KV offload][2/N] Introduce LRU-based CPU offloading management (#20075)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2025-09-19 00:20:51 +00:00 |
|
Andrew Sansom
|
9a4600e4dc
|
[CORE] Prompt Embeddings Support for v1 Engine (#24278)
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
Signed-off-by: Andrew Sansom <qthequartermasterman@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-09-19 08:03:09 +08:00 |
|
Lucas Wilkinson
|
9fac6aa30b
|
[BugFix] Fix DeepGEMM warmup, no m.weight_scale_inv (#25206)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-09-18 14:26:28 -07:00 |
|
Or Ozeri
|
a53ad626d6
|
[KV offload][1b/N] rename offloading to kv_offload (#25191)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2025-09-18 20:53:52 +00:00 |
|
Woosuk Kwon
|
1c3dad22ff
|
[V0 Deprecation] Remove unused async_timeout.py (#25190)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-18 20:35:21 +00:00 |
|
Wentao Ye
|
d2a30a2d93
|
[Bug] Fix torch Compilation Cache Hit Error (#25093)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-09-18 12:38:37 -07:00 |
|
Wentao Ye
|
75fb112d80
|
[Bug] Fix returned_lse not Defined issue (#25106)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-09-18 19:32:24 +00:00 |
|
Aziz
|
38db529f66
|
[feat]: Create interface for model-specific M-RoPE (#24194)
Signed-off-by: AzizCode92 <azizbenothman76@gmail.com>
Signed-off-by: Aziz <azizbenothman76@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-09-18 19:18:56 +00:00 |
|
Nikhil Gupta
|
064cac7bb7
|
[fix]: remove data type hardcoding from gptoss model implementation (#23807)
Signed-off-by: Nikhil Gupta <nikhil.gupta2@arm.com>
|
2025-09-18 18:15:23 +00:00 |
|
Woosuk Kwon
|
e19bce40a1
|
[V0 Deprecation] Remove AsyncLLMEngine (#25025)
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-18 11:07:42 -07:00 |
|
Or Ozeri
|
505805b645
|
[KV offload][1/N] Introduce an offloading component (#19848)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2025-09-18 10:57:07 -07:00 |
|
Rohan Potdar
|
bbdc0f2366
|
[ROCm][AITER][Bugfix] Switch AITER to use PIECEWISE_AND_FULL compilation (#25104)
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
|
2025-09-18 17:46:47 +00:00 |
|
Gregory Shtrasberg
|
dc34059360
|
[ROCm][CI/Build] Use ROCm7.0 as the base (#25178)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-09-18 09:36:55 -07:00 |
|
qizixi
|
c4cb0af98a
|
[spec decode] Fix MTP inference path for MiMo-7B model (#25136)
Signed-off-by: zixi-qi <qizixi@meta.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-09-18 09:12:19 -07:00 |
|
Harry Mellor
|
1c3b1634aa
|
[Misc] Add codeowner for Transformers backend (#25180)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-18 09:01:50 -07:00 |
|
Shu Wang
|
2ea50e977a
|
Enable Allgather/ReduceScatter backend for NaiveAllToAll (#23964)
Signed-off-by: Shu Wang. <shuw@nvidia.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Signed-off-by: Shu Wang <shuw@nvidia.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-09-18 15:52:58 +00:00 |
|
Hyogeun Oh (오효근)
|
b419937c78
|
[Docs] Fix warnings in mkdocs build (continued) (#25163)
Signed-off-by: Zerohertz <ohg3417@gmail.com>
|
2025-09-18 08:23:26 -07:00 |
|
wang.yuqi
|
5f696c33b1
|
[New Model] Support BertForTokenClassification / Named Entity Recognition (NER) task (#24872)
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-18 23:22:01 +08:00 |
|
dongbo910220
|
67244c86f0
|
feat(api): Return 503 on /health when engine is dead (#24897)
Signed-off-by: dongbo910220 <1275604947@qq.com>
Co-authored-by: Claude <noreply@anthropic.com>
|
2025-09-18 14:29:40 +00:00 |
|
Vadim Gimpelson
|
072d7e53e5
|
[PERF] Add conv1d metadata to GDN attn (#25105)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2025-09-18 14:27:49 +00:00 |
|
jvlunteren
|
01a583fea4
|
[Kernel] Decouple Tile Size from Block Size in Triton Unified Attention Kernel (#21197)
Signed-off-by: Jan van Lunteren <jvl@zurich.ibm.com>
|
2025-09-18 14:27:01 +00:00 |
|
Nicolò Lucchesi
|
bc19d75985
|
[Misc] Add kv-connector label (#25156)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-09-18 13:56:07 +00:00 |
|
Michael Goin
|
fbd6523ac0
|
Refactor dense FP8 tensor/channel/block utils and add CT FP8 block (#21404)
|
2025-09-18 08:53:45 -04:00 |
|
Shanshan Shen
|
470484a4f5
|
[Structured Output][Refactor] Move apply_grammar_bitmask() method from ModelRunner to structured output utils (#21999)
Signed-off-by: shen-shanshan <467638484@qq.com>
|
2025-09-18 20:44:31 +08:00 |
|
Roger Wang
|
21da73343a
|
[Misc] Clean up flags in vllm bench serve (#25138)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2025-09-18 12:43:33 +00:00 |
|
Asaf Joseph Gardin
|
66072b36db
|
[Bugfix][Mamba] - Fix Conv State Kernel FP32 Support (#24883)
Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com>
|
2025-09-18 12:21:17 +00:00 |
|
Harry Mellor
|
3ed1ec4af2
|
Fix validate-config pre-commit check (#25157)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-18 12:06:28 +00:00 |
|
Harry Mellor
|
5a33ae9a3f
|
Fix forward reference warning in documentation (#25150)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-18 11:41:41 +00:00 |
|
William Song
|
c9ff9e6f0c
|
[Docs] add the parallel sampling usage in LLMEngine and AsyncLLM (#24222)
|
2025-09-18 04:37:08 -07:00 |
|
Kay Yan
|
eaffe4486c
|
[Docs] Fix pooling-params doc references in openai_compatible_server.md (#24939)
|
2025-09-18 04:36:47 -07:00 |
|
Harry Mellor
|
8ed039d527
|
Move StructuredOutputsConfig from config/__init__.py to config/structured_outputs.py (#25153)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-18 11:24:27 +00:00 |
|
Jee Jee Li
|
37970105fe
|
[Model] Improve Pooling Model (#25149)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-09-18 11:04:21 +00:00 |
|
Chauncey
|
cc935fdd7e
|
[Frontend] Support setting logprobs to -1 (#25031)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-09-18 10:34:42 +00:00 |
|
Elvir Crnčević
|
abdfcd4f3d
|
silu-v1: Fix EPS not being used during max-reduction (#25069)
Signed-off-by: elvircrn <elvircrn@gmail.com>
|
2025-09-18 10:25:12 +00:00 |
|
ihb2032
|
4f02b77de4
|
Fix: Add explicit #include <omp.h> for OpenMP compatibility on certain toolchains (#24951)
Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn>
Signed-off-by: ihb2032 <1355790728@qq.com>
|
2025-09-18 17:43:23 +08:00 |
|
Aaron Pham
|
29283e8976
|
[Chore] Cleanup guided namespace, move to structured outputs config (#22772)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-18 09:20:27 +00:00 |
|
Punitvara
|
05b044e698
|
[Doc] Fix cross-reference warnings (#25058)
Signed-off-by: Punit Vara <punitvara@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-18 02:05:16 -07:00 |
|
Gerard Finol
|
aa3f105c59
|
Add 'path' option to ImagePrompt data_format (#25081)
Signed-off-by: Gerard Finol <gerard.finol@urv.cat>
|
2025-09-18 02:02:14 -07:00 |
|
Tao He
|
ef7eefe17a
|
[Qwen] Add fp8 checkpoint support for qwen3-next. (#25079)
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
|
2025-09-18 08:16:04 +00:00 |
|
rongfu.leng
|
350c94deb3
|
[Bugfix] when use s3 model cannot use default load_format (#24435)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-09-18 07:47:43 +00:00 |
|
Harry Mellor
|
f4cd80f944
|
Retrieve sliding_window from text config in Gemma3 MM (#25085)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-18 06:29:05 +00:00 |
|
Harry Mellor
|
349e0e3462
|
[Docs] Fix API Reference (#25140)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-17 23:23:29 -07:00 |
|
Lumina
|
81b16a2bc9
|
[Kernel] Better inf handling for grouped topk cu (#24886)
Signed-off-by: lumina37 <starry.qvq@gmail.com>
|
2025-09-18 05:53:55 +00:00 |
|
Simon Mo
|
e111d5b0ae
|
[CLI] Use streaming in CLI chat and completion commands (#23769)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-09-17 22:30:26 -07:00 |
|
Simon Mo
|
a904ea78ea
|
[benchmark] add peak throughput metrics and plot (#23867)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-09-17 22:30:02 -07:00 |
|
Benjamin Chislett
|
b7433ca1a4
|
[Spec Decode] Efficient padded speculation (#24539)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
|
2025-09-18 01:07:24 -04:00 |
|
Woosuk Kwon
|
5c65a72bb1
|
[V0 Deprecation] Remove more V0 tests (#25117)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-17 22:05:25 -07:00 |
|
YiwenC
|
9d8a2d86d2
|
[EPLB] Add EPLB support for hunyuan_v1 (#23078)
|
2025-09-18 04:51:35 +00:00 |
|
Chaojun Zhang
|
3bc18127ff
|
[XPU] Whisper model support on XPU Platform (#25123)
Signed-off-by: chzhang <chaojun.zhang@intel.com>
|
2025-09-18 04:30:10 +00:00 |
|
Andrew Sansom
|
bec060fd99
|
Mark prompt logprobs as incompatible with prompt embeds at API level (#25077)
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
|
2025-09-17 21:25:07 -07:00 |
|