wang.yuqi
|
a8b0361c92
|
[CI] Split pooling from entrypoints Test (#24632)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-09-11 01:53:09 -07:00 |
|
Kyuyeun Kim
|
ed5ae4aace
|
[Bugfix] Fix _synced_weight_loader (#24565)
Signed-off-by: Kyuyeun Kim <kyuyeunk@google.com>
|
2025-09-11 16:52:33 +08:00 |
|
Xingyu Liu
|
0fc36463e0
|
[CI] Add transformers_utils to Async Engine, Inputs, Utils, Worker Test (#24615)
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
|
2025-09-11 01:52:10 -07:00 |
|
Michael Yao
|
d14c4ebf08
|
[Docs] Use 1-2-3 list for deploy steps in deployment/frameworks/ (#24633)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2025-09-11 01:50:12 -07:00 |
|
Russell Bryant
|
ba6011027d
|
[Docs] Update V1 doc to reflect whisper support (#24606)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-09-11 01:50:08 -07:00 |
|
Michael Yao
|
85df8afdae
|
[Docs] Revise frameworks/anything-llm.md (#24489)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2025-09-11 01:50:05 -07:00 |
|
Cyrus Leung
|
6aeb1dab4a
|
[Bugfix] Fix incorrect import of CacheConfig (#24631)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-11 01:48:25 -07:00 |
|
Tao He
|
e93f4cc9e3
|
Add the support for the qwen3 next model (a hybrid attention model). (#24526)
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-09-11 15:32:09 +08:00 |
|
Jerry Zhang
|
2048c4e379
|
[torchao] Support quantization configs using module swap (#21982)
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
|
2025-09-10 23:53:24 -07:00 |
|
Chenxi Yang
|
d13360183a
|
Remove redundant all gather + split (#23441)
Co-authored-by: Chenxi Yang <cxyang@meta.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
|
2025-09-10 23:45:07 -07:00 |
|
TaehyunKim
|
9bd831f501
|
[Model] New model support for Motif-1-Tiny (#23414)
Signed-off-by: ca1207 <ca1207zzz@gmail.com>
Signed-off-by: TaehyunKim <73943231+ca1207@users.noreply.github.com>
Co-authored-by: WyldeCat <skan1543@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-09-10 23:29:40 -07:00 |
|
Didier Durand
|
e2b1f863aa
|
[Doc]: fixing doc typos (#24635)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
|
2025-09-10 23:19:28 -07:00 |
|
shengshiqi-google
|
41329a0ff9
|
[Core] feat: Add --safetensors-load-strategy flag for faster safetensors loading from Lustre (#24469)
Signed-off-by: Shiqi Sheng <shengshiqi@google.com>
Signed-off-by: shengshiqi-google <160179165+shengshiqi-google@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-09-10 23:10:01 -07:00 |
|
Tomas Ruiz
|
ee0bc5e1b4
|
Enable --profile in 'vllm bench throughput' (#24575)
Signed-off-by: Tomas Ruiz <tomas.ruiz.te@gmail.com>
|
2025-09-10 23:06:19 -07:00 |
|
Saman A. Pour
|
3d1393f6fc
|
Kimi K2 Fused MoE kernels Optimization configs (#24597)
Signed-off-by: Saman Keon <samanamp@outlook.com>
|
2025-09-10 23:06:16 -07:00 |
|
Guy Stone
|
8a894084d2
|
[Engine][Chore] use local variable and remove output var assignment (#24554)
Signed-off-by: Guy Stone <guys@spotify.com>
|
2025-09-10 23:05:42 -07:00 |
|
Nick Hill
|
e2d8c27f68
|
[BugFix] Fix pipeline parallel (#24621)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-09-10 23:05:30 -07:00 |
|
Li, Jiang
|
29799ddacc
|
[Bugfix] Add missing VIT backend dispatch on CPU (#24623)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-09-10 22:28:41 -07:00 |
|
Peter Salas
|
f17a6aa4ec
|
[Ultravox] Fix Gemma instantiation, support quantization via --hf-overrides (#24131)
Signed-off-by: Peter Salas <peter@fixie.ai>
|
2025-09-10 22:25:34 -07:00 |
|
Wenlong Wang
|
6c8deacd72
|
[Bug] [Spec Decode] Fix model_initialization test and mismatch in aux_hidden_layers (#24613)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-09-10 21:23:18 -07:00 |
|
Chauncey
|
55b823ba0f
|
Add @chaunceyjiang to codeowner for Reasoning and Tool parser (#24406)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-09-11 04:23:04 +00:00 |
|
youkaichao
|
8c5a747246
|
[distributed] update known issues (#24624)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-09-11 11:09:38 +08:00 |
|
Alexandre Marques
|
5931b7e5d9
|
[Models][Quantization] Add quantization configuration update in Voxtral model (#24122)
Signed-off-by: Alexandre Marques <almarque@redhat.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-09-10 19:13:56 -07:00 |
|
Jonathan Berkhahn
|
cc99baf14d
|
[Misc] Make timeout passable in init_distributed_environment (#24522)
Signed-off-by: jberkhahn <jaberkha@us.ibm.com>
|
2025-09-10 15:41:12 -07:00 |
|
Hanjie Qiu
|
dcb28a332b
|
[Kernel] Flashinfer MLA (trtllm-gen) decode kernel integration (#21078)
Signed-off-by: hjjq <hanjieq@nvidia.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-09-10 15:31:10 -07:00 |
|
Michael Goin
|
fba7856581
|
[Perf] Warmup FlashInfer attention during startup (#23439)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Matthew Bonanni <mbonanni001@gmail.com>
|
2025-09-10 15:03:17 -07:00 |
|
Chen Zhang
|
b5e383cd8b
|
[gpt-oss] raise error for flashinfer backend without trtllm (#24482)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-09-10 14:33:13 -07:00 |
|
Gregory Shtrasberg
|
9a161307f5
|
[torch.compile][ROCm][V1] Enable attention output FP8 fusion for V1 attention backends (#19767)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-09-10 13:59:55 -07:00 |
|
Russell Bryant
|
37e8182bfe
|
[v1] Add Whisper model support (encoder-decoder) (#21088)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: NickLucche <nlucches@redhat.com>
|
2025-09-10 13:53:35 -07:00 |
|
Nick Hill
|
4db4426404
|
[CI] Fail subprocess tests with root-cause error (#23795)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-09-10 13:53:21 -07:00 |
|
Thien Tran
|
a0933c3bd6
|
[Bugfix] Enable FP8 KV cache for FlashInfer and Triton backend on non-sm100 GPUs (#24577)
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>
|
2025-09-10 12:33:41 -07:00 |
|
rongfu.leng
|
09e68bce34
|
[Misc] update log level debug to warning when process port is used by (#24226)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-09-10 11:32:57 -07:00 |
|
Xingyu Liu
|
9fb74c27a7
|
[Core] Support configuration parsing plugin (#24277)
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
Signed-off-by: Xingyu Liu <38244988+charlotte12l@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-09-10 11:32:43 -07:00 |
|
Ming Yang
|
4032949630
|
[Bugfix] Fix DeepEP config for DP4TP4 (#23619)
Signed-off-by: Ming Yang <minos.future@gmail.com>
|
2025-09-10 10:37:56 -07:00 |
|
tomeras91
|
08abfa78ec
|
[Bugfix] fix modelopt exclude_modules name mapping (#24178)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-09-10 10:20:46 -07:00 |
|
Shiyan Deng
|
2bef2d1405
|
[Logging] allow config logging stream (#24336)
Signed-off-by: Shiyan Deng <dsy842974287@meta.com>
|
2025-09-10 15:02:01 +00:00 |
|
Robin
|
36cacd0958
|
[Doc] Add documentation for GLM-4.5 series models: tool-calling and reasoning parser (#24589)
Signed-off-by: WangErXiao <863579016@qq.com>
|
2025-09-10 07:50:55 -07:00 |
|
Jee Jee Li
|
bb3eb80d92
|
[Core] Split LoRA layers (#24574)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-09-10 07:47:51 -07:00 |
|
pwschuurman
|
fcc0a3130a
|
[CI] Fix tensorizer test assertion (#24545)
Signed-off-by: Peter Schuurman <psch@google.com>
|
2025-09-10 06:57:36 -07:00 |
|
zzhxxx
|
736569da8d
|
[Platform] Custom ops support for LMhead and LogitsProcessor (#23564)
Signed-off-by: zzhx1 <zzh_201018@outlook.com>
|
2025-09-10 06:26:31 -07:00 |
|
Kay Yan
|
2eb9986a2d
|
[BugFix] python collect_env.py and vllm collect-env compatibility with uv venv (#24066)
Signed-off-by: Kay Yan <kay.yan@daocloud.io>
|
2025-09-10 21:25:33 +08:00 |
|
Hyogeun Oh (오효근)
|
ccee371e86
|
[Docs] Fix warnings in mkdocs build (continued) (#24092)
Signed-off-by: Zerohertz <ohg3417@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-09-10 06:23:28 -07:00 |
|
RoadToNowhereX
|
c0bd6a684a
|
Fix Auto_Round Quantization Loading on SM75 and Lower GPUs (#24217)
Signed-off-by: RoadToNowhereX <37441177+RoadToNowhereX@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-09-10 06:22:31 -07:00 |
|
co63oc
|
3144d90217
|
fix some typos (#24167)
Signed-off-by: co63oc <co63oc@users.noreply.github.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
|
2025-09-10 06:21:23 -07:00 |
|
Daniele
|
2f5e5c18de
|
[CI/Build] bump timm dependency (#24189)
Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
|
2025-09-10 06:20:59 -07:00 |
|
wang.yuqi
|
bd98842c8a
|
[CI] Add PPL test for generation models (#24485)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-09-10 06:16:39 -07:00 |
|
Lifans
|
d6069887c6
|
[rocm] enable torchao quantization for rocm (#24400)
Signed-off-by: Lifan Shen <lifans@meta.com>
|
2025-09-10 06:16:21 -07:00 |
|
Ye (Charlotte) Qi
|
492196ed0e
|
[CI/Build] split true unit tests to Entrypoints Unit Tests (#24418)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
|
2025-09-10 06:16:07 -07:00 |
|
Nick Hill
|
f4f1a8df22
|
[BugFix] Ensure integrity of reused CPU tensors during async scheduling (#24527)
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: guoze.lin <guozelin@tencent.com>
|
2025-09-10 21:15:14 +08:00 |
|
lacora
|
0b9a612fa3
|
[BugFix][easy] Fix flaky test test_gpt_oss_multi_turn_chat (#24549)
Signed-off-by: lacora2017 <yehu@meta.com>
Co-authored-by: lacora2017 <yehu@meta.com>
|
2025-09-10 21:14:55 +08:00 |
|