Gregory Shtrasberg
|
b9a1c4c8a2
|
[ROCm][CI/Build] Sync ROCm dockerfiles with the ROCm fork (#24279)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-09-09 12:21:56 -04:00 |
|
youkaichao
|
1aa427fdc1
|
[Kernels] Add Flash Linear Attention Kernels (#24518)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-09-10 00:04:41 +08:00 |
|
Micah Williamson
|
1c63a16b65
|
[Core] Run garbage collector after CUDA graph capture to fix throughput regression (#24128)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-09-09 10:38:10 -04:00 |
|
d.transposed
|
922d3b401b
|
[Bugfix] Handle the edge case in detokenizer where processed tokens contain both stop str and eos token (#23938)
Signed-off-by: dtransposed <damian.bogunowicz@gmail.com>
|
2025-09-09 07:30:24 -07:00 |
|
wang.yuqi
|
19332c0479
|
[Model] Systematic support for fp32 head, pooling models part (#23810)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-09-09 07:29:50 -07:00 |
|
Wentao Ye
|
a55cf41a09
|
[Compilation][WideEP] Enable Piecewise CUDAGraph for DeepEPHT (#24123)
|
2025-09-09 10:21:10 -04:00 |
|
Ye (Charlotte) Qi
|
6fb2788163
|
[CI/Build][Doc] Fully deprecate old bench scripts for serving / throughput / latency (#24411)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
|
2025-09-09 10:02:35 +00:00 |
|
Weixiao Huang
|
3d2a2de8f7
|
[RL] fast weight update with zmq + ipc handles (#24295)
Signed-off-by: huangweixiao <huangweixiao@msh.team>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2025-09-09 16:57:46 +08:00 |
|
Chen Zhang
|
1116590b16
|
[gpt-oss] Validate gpt-oss python tool during initialization (#23856)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-09-09 08:37:48 +00:00 |
|
Roger Wang
|
ccb97338af
|
[Misc] Add Codex settings to gitignore (#24493)
Signed-off-by: Roger Wang <hey@rogerw.me>
Co-authored-by: Roger Wang <hey@rogerw.me>
|
2025-09-09 01:25:44 -07:00 |
|
Ye (Charlotte) Qi
|
45c9cb5835
|
[Misc] Add claude settings to gitignore (#24492)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
|
2025-09-09 01:14:45 -07:00 |
|
WeiQing Chen
|
e283976f3a
|
[Performance][MM] Building the inverse permutation in O(n) time in Qwen2_5_VisionTransformer (#24443)
Signed-off-by: Junhong <liujunhong11@huawei.com>
Co-authored-by: Junhong <liujunhong11@huawei.com>
|
2025-09-09 00:24:11 -07:00 |
|
Didier Durand
|
46876dff32
|
[Doc]: fixing typos to improve docs (#24480)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
|
2025-09-08 23:06:04 -07:00 |
|
Ming Yang
|
1823a00d67
|
[Misc] Support bench serve long context (#24373)
Signed-off-by: Ming Yang <minos.future@gmail.com>
|
2025-09-08 22:53:10 -07:00 |
|
Mickaël Seznec
|
ed16d0f26f
|
[Doc] mention fpdb for multiprocess breakpoints (#24452)
Signed-off-by: Mickael Seznec <mickael@mistral.ai>
|
2025-09-08 21:46:45 -07:00 |
|
22quinn
|
0cdd213641
|
[Misc] Improve Worker process title and logging prefix (#22205)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-09-08 21:43:48 -07:00 |
|
Cyrus Leung
|
948dd3443b
|
[Bugfix] Fix Apertus HF repo name (#24447)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-08 21:40:29 -07:00 |
|
cong-meta
|
b2f7745774
|
Add data_parallel_size to VllmConfig string representation (#24298)
Co-authored-by: Cong Chen <congc@meta.com>
|
2025-09-08 21:35:18 -07:00 |
|
Zebing Lin
|
82dfb12e52
|
[Core] Use sha256 bytes instead of BlockHash to reduce GC overhead (#23673)
Signed-off-by: linzebing <linzebing1995@gmail.com>
|
2025-09-08 21:34:37 -07:00 |
|
elvischenv
|
bba1042c6f
|
[Flashinfer] Support Flashinfer TRTLLM FP8-qkv BF16/FP16-out Attention Kernel (#23647)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
|
2025-09-08 20:53:07 -07:00 |
|
CSWYF3634076
|
b6fbc15634
|
[BugFix][Model] Fix Ernie4.5-VL hanging on long inputs (#24074)
Signed-off-by: wangyafeng <wangyafeng@baidu.com>
|
2025-09-09 11:37:16 +08:00 |
|
Harry Mellor
|
3e0d4a3475
|
Move KVTransferConfig from config/__init__.py to config/kv_transfer.py (#24434)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-08 20:30:32 -07:00 |
|
dependabot[bot]
|
562663a044
|
Bump actions/github-script from 7.0.1 to 8.0.0 (#24413)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
|
2025-09-09 03:12:44 +00:00 |
|
dependabot[bot]
|
ed1623a88a
|
Bump actions/stale from 9.1.0 to 10.0.0 (#24412)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
|
2025-09-09 03:11:20 +00:00 |
|
cjackal
|
13b89bd823
|
[doc] update vllm serve cli args documentation (#24329)
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>
|
2025-09-09 03:07:58 +00:00 |
|
dependabot[bot]
|
22a0070530
|
Bump actions/setup-python from 5.4.0 to 6.0.0 (#24414)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
|
2025-09-09 02:54:58 +00:00 |
|
zhiweiz
|
170129eb28
|
[gpt-oss] Harmony changes with container tool support (#23386)
Signed-off-by: zhiweiz <zhiweiz@fb.com>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
Co-authored-by: zhiweiz <zhiweiz@fb.com>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
|
2025-09-08 19:03:50 -07:00 |
|
Tyler Michael Smith
|
955c624915
|
[Bugfix][Wide EP] Fix redundant work when using DeepEP, TP Attn, and EP MoE (#24134)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
|
2025-09-08 19:01:51 -07:00 |
|
Zhiyu
|
4f87abdcc6
|
Update reviewers for modelopt related files (#24468)
|
2025-09-09 01:53:13 +00:00 |
|
Sahithi Chigurupati
|
6910b56da2
|
[CI] Add nightly multiarch manifests to dockerhub (#24102)
Signed-off-by: Sahithi Chigurupati <chigurupati.sahithi@gmail.com>
Signed-off-by: Simon Mo <simon.mo@hey.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2025-09-09 01:18:09 +00:00 |
|
R3hankhan
|
e10fef0883
|
[Hardware][IBM Z] Fix Outlines Core issue for s390x (#24034)
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>
|
2025-09-08 16:50:34 -07:00 |
|
Chauncey
|
e680723eba
|
[Bugfix] Disable the statslogger if the api_server_count is greater than 1 (#22227)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-09-08 15:28:03 -07:00 |
|
Matthew Bonanni
|
620db1fc58
|
[Attention] FlashAttention MLA cudagraph support (#23958)
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2025-09-08 22:05:26 +00:00 |
|
Ekagra Ranjan
|
41183c1fe0
|
[Spec Decode] Fix offline spec_decode.py (#24257)
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-09-08 20:44:13 +00:00 |
|
Yang Kaiyong
|
43d9ad03ba
|
[Model loader]: support multi-thread model weight loading (#23928)
Signed-off-by: Yang Kaiyong <yangkaiyong.yky@antgroup.com>
Signed-off-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2025-09-08 18:49:39 +00:00 |
|
Jiangyun Zhu
|
7be141b2c5
|
[CI] Enable encoder model compilation test (#24442)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2025-09-08 11:48:06 -07:00 |
|
Jee Jee Li
|
8d7f39b48c
|
[Model] Remove quantized mixtral (#24437)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-09-08 11:02:14 -07:00 |
|
Ekagra Ranjan
|
cd08636926
|
[Spec Decode][Benchmark] Add Blitzedit dataset (#23605)
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-09-08 10:32:52 -07:00 |
|
Ekagra Ranjan
|
3feeeb9fea
|
[Spec Decode][Benchmark] Add Spec Bench Dataset for benchmarking (#23563)
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
|
2025-09-08 10:32:42 -07:00 |
|
Jee Jee Li
|
6f4a82f8b5
|
[Model] Enable BNB support for qwen2_5_omni_thinker (#24420)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-09-08 09:37:08 -07:00 |
|
rongfu.leng
|
c44797a4d6
|
[Docs]add eplb_config param use docs (#24213)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
|
2025-09-08 09:36:57 -07:00 |
|
Didier Durand
|
55be93baf5
|
[Doc]: fix 2 hyperlinks leading to Ray site after they changed Ray's doc structure (#24438)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-08 09:36:54 -07:00 |
|
Harry Mellor
|
717fc00e98
|
[Docs] Move feature compatibility tables to README (#24431)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-08 06:45:14 -07:00 |
|
Chenheli Hua
|
01dfb5e982
|
[Frontend] User-provided uuids for medias in chat. (RFC #22044) (#23449)
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
Signed-off-by: Roger Wang <hey@rogerw.me>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.me>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-09-08 06:42:20 -07:00 |
|
Harry Mellor
|
03dd652c16
|
Move KVEventsConfig from config/__init__.py to config/kv_events.py (#24433)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-08 06:41:27 -07:00 |
|
Christian Pinto
|
9cd76b71ab
|
[Misc] Terratorch related fixes (#24337)
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-09-08 06:40:26 -07:00 |
|
tomeras91
|
e041314184
|
[Bugfix] Fix mamba2 prefill chunking (#23279)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
Signed-off-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-09-08 11:42:41 +00:00 |
|
Li Wang
|
5e537f45b4
|
[Bugfix] Fix get_quant_config when using modelscope (#24421)
Signed-off-by: wangli <wangli858794774@gmail.com>
|
2025-09-08 11:03:02 +00:00 |
|
Michael Yao
|
c2a8b08fcd
|
[Doc] Fix issues in integrations/llamastack.md (#24428)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2025-09-08 02:28:32 -07:00 |
|
Didier Durand
|
f4962a6d55
|
[Doc]: fix typos in Python comments (#24417)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
|
2025-09-08 00:22:16 -07:00 |
|