Harry Mellor
|
a332b84578
|
[CI] Only capture a single CUDA graph size in CI by default (#25951)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-01 10:03:44 +01:00 |
|
Cyrus Leung
|
1405f0c7ba
|
[Misc] Factor out common _apply_feature_select_strategy (#26003)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-01 01:31:03 -07:00 |
|
Wenlong Wang
|
84d57342b6
|
[BugFix][MM] Fix NoneType error when video is cached in qwen2.5-omni-thinker (#26004)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
|
2025-10-01 08:03:25 +00:00 |
|
nadathurv
|
57b46d769e
|
[Doc] Update torch.compile doc link (#25989)
Signed-off-by: nadathurv <work.vnadathur@gmail.com>
Signed-off-by: WorldExplored <srreyansh.sethi@gmail.com>
Co-authored-by: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com>
|
2025-10-01 07:04:56 +00:00 |
|
Lucia Fang
|
f48b6a03ba
|
[Misc] Allow disabling pynccl (#25421)
Signed-off-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>
|
2025-10-01 06:04:13 +00:00 |
|
Harry Mellor
|
2a69ab4899
|
Update to Transformers v4.56.2 (#24638)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-30 22:07:07 -07:00 |
|
Lucas Wilkinson
|
8d7da92fd7
|
[BugFix] Fix default kv-cache-dtype for DeepseekV3.2 (#25988)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-09-30 21:58:31 -07:00 |
|
Zhewen Li
|
e952eee698
|
[Bugfix] Fix __syncwarp on ROCM (#25996)
|
2025-09-30 21:15:11 -07:00 |
|
Roger Wang
|
66bca9b8bd
|
[MM] Add text-only mode for Qwen3-VL (#26000)
|
2025-09-30 21:13:42 -07:00 |
|
Param
|
99028fda44
|
Fix INT8 quantization error on Blackwell GPUs (SM100+) (#25935)
Signed-off-by: padg9912 <phone.and.desktop@gmail.com>
|
2025-09-30 19:19:53 -07:00 |
|
Wentao Ye
|
1244948885
|
[Log] Optimize Log for FP8MOE (#25709)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-09-30 19:18:43 -07:00 |
|
Salvatore Cena
|
a73f6491c8
|
Update launch_bounds_utils.h for correct compilation on multiple CUDA archs - PTXAS out-of-range warning (#25843)
Signed-off-by: Salvatore Cena <cena@cenas.it>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-09-30 19:18:19 -07:00 |
|
Lucia Fang
|
001e50c92c
|
[Model] MTP fallback to eager for DeepSeek v32 (#25982)
Signed-off-by: Lu Fang <fanglu@fb.com>
|
2025-10-01 01:53:22 +00:00 |
|
Lucas Wilkinson
|
96ebcaa3ad
|
[Misc] Make EP kernels install script support uv (#25785)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-09-30 23:38:34 +00:00 |
|
Andrew Xia
|
5db1870bb9
|
[gpt-oss] use vLLM instead of openai types for streaming (#25186)
Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
|
2025-09-30 22:47:07 +00:00 |
|
Harry Mellor
|
2ce26b9b5d
|
[Docs] Remove API Reference from search index (#25949)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-30 22:10:02 +00:00 |
|
Harry Mellor
|
a388252ac4
|
Add explicit pooling classes for the Transformers backend (#25322)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-30 23:07:06 +01:00 |
|
David Ben-David
|
9a9f48dff7
|
[V1] [P/D] Add Support for KV Load Failure Recovery (#19330)
Signed-off-by: David Ben-David <davidb@pliops.com>
Co-authored-by: David Ben-David <davidb@pliops.com>
|
2025-09-30 14:57:08 -07:00 |
|
Jee Jee Li
|
67f3fb0844
|
[Bench] Add DeepSeekV32 to MoE benchmark (#25962)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-09-30 14:13:48 -07:00 |
|
cjackal
|
43b752c325
|
[Llama4] [multimodal] Fix misplaced dtype cast of cos_sin_cache in Llama4VisionRotaryEmbedding (#25889)
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>
|
2025-09-30 20:35:15 +00:00 |
|
Or Ozeri
|
cfd302db9b
|
OffloadingConnector: Fix GPU block tracking bug (#25856)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2025-09-30 19:53:04 +00:00 |
|
bnellnm
|
fb610ae684
|
[Docs] Add moe kernel features doc (#25297)
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: bnellnm <49004751+bnellnm@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-30 19:03:15 +00:00 |
|
Cyrus Leung
|
2f652e6cdf
|
[Doc] Improve MM Pooling model documentation (#25966)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-30 18:58:29 +00:00 |
|
Wentao Ye
|
e6a226efba
|
[Bug] Fix AttributeError: 'QKVParallelLinear' object has no attribute 'orig_dtype' (#25958)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-09-30 11:13:03 -07:00 |
|
youkaichao
|
a2e6fa7e03
|
[bugfix][deepseek] fix flashmla kernel selection (#25956)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-10-01 00:30:36 +08:00 |
|
Cyrus Leung
|
9f1c4ecaf2
|
[Bugfix] Token type and position embeddings fail to be applied to inputs_embeds (#25922)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-01 00:23:12 +08:00 |
|
Pavani Majety
|
ef283548f7
|
[Bugfix] Fix accuracy issue of TRTLLM FP8 MOE and improve logging (#25895)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
|
2025-09-30 10:51:31 -04:00 |
|
Anion
|
f4db5e6de1
|
[Bugfix][Model] Fix inference for Hunyuan dense models (#25354)
Signed-off-by: anion <1005128408@qq.com>
Signed-off-by: Anion <123177548+Anionex@users.noreply.github.com>
|
2025-09-30 14:38:07 +00:00 |
|
Sergio Paniego Blanco
|
099aaee536
|
Add Hugging Face Inference Endpoints guide to Deployment docs (#25886)
Signed-off-by: sergiopaniego <sergiopaniegoblanco@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-30 14:35:06 +00:00 |
|
Asaf Joseph Gardin
|
35fe398c7c
|
[Kernel][Moe Configs] Add more tuned triton configs for ExpertsInt8 and FP8 (#25858)
Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com>
|
2025-09-30 07:30:44 -07:00 |
|
ihb2032
|
bb6d43047e
|
[Fix] Improve CPU backend compatibility for RISC-V (#25816)
Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn>
Signed-off-by: ihb2032 <1355790728@qq.com>
|
2025-09-30 13:48:07 +00:00 |
|
Reza Barazesh
|
bc546f76a1
|
[CI] Move applicable tests to CPU (#24080)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-30 14:45:20 +01:00 |
|
Nicolò Lucchesi
|
80608ba5af
|
[NIXL] Add support for MLA caches with different latent dim (#25902)
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
|
2025-09-30 12:18:29 +00:00 |
|
Lehua Ding
|
e184c9c510
|
[perf] Use CPU tensor to reduce GPU->CPU sync (#25884)
Signed-off-by: Lehua Ding <lehuading@tencent.com>
|
2025-09-30 19:51:16 +08:00 |
|
Cyrus Leung
|
d7e34b4210
|
[Model] Move vision_feature_select_strategy into resolve_visual_encoder_outputs (#25938)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-30 11:24:57 +00:00 |
|
CSWYF3634076
|
ef6e0e7132
|
[Bugfix][Model] Fix ernie45 MoE gate & bias dtype to float32 (#25936)
Signed-off-by: wangyafeng <wangyafeng@baidu.com>
|
2025-09-30 19:11:21 +08:00 |
|
Sergio Paniego Blanco
|
1ad3aca682
|
Updated TRL integration docs (#25684)
Signed-off-by: sergiopaniego <sergiopaniegoblanco@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-30 03:10:55 -07:00 |
|
a120092009
|
8d0afa9b42
|
[Doc] Add Cambricon MLU support (#25942)
Signed-off-by: a120092009 <zhaoty0121@gmail.com>
|
2025-09-30 17:59:47 +08:00 |
|
Yongye Zhu
|
fa7e254a7f
|
[New Model] DeepSeek-V3.2 (Rebased to Main) (#25896)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Lucia Fang <fanglu@meta.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
Co-authored-by: Lucia Fang <fanglu@meta.com>
Co-authored-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Siyuan Fu <siyuanf@nvidia.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Xiaozhu Meng <mxz297@gmail.com>
Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
|
2025-09-30 17:14:41 +08:00 |
|
Simon Danielsson
|
e23cacda35
|
[Bugfix]: Clean up chunked prefill logging when using whisper (#25075)
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
|
2025-09-30 08:17:49 +00:00 |
|
Zhou Jiahao
|
2e1b8bc2b6
|
[Model][Bugfix] Fix MiDashengLM audio encoder mask by removing incorrect logical_not (#25925)
Signed-off-by: zhoukz <me@zhoukz.com>
|
2025-09-30 08:15:23 +00:00 |
|
acisseJZhong
|
e47433b3c1
|
[BugFix] Pass config_format via try_get_generation_config (#25912)
|
2025-09-30 05:09:50 +00:00 |
|
Lucas Wilkinson
|
23194d83e8
|
[BugFix] Fix DP/EP hang (#25906)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-09-30 04:18:59 +00:00 |
|
Harry Mellor
|
61aedb5ffe
|
Move VllmConfig from config/__init__.py to config/vllm.py (#25271)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-29 19:49:49 -07:00 |
|
Zhuohan Li
|
d3bd171123
|
[Benchmark] Support benchmark throughput for external launcher DP (#25913)
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>
|
2025-09-30 01:43:57 +00:00 |
|
Wentao Ye
|
89e4050af4
|
[Bug] Fix Weight Loading for Block FP8 Cutlass SM90 (#25909)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-09-30 09:15:19 +08:00 |
|
Andrew Sansom
|
78a47f87ce
|
Test Prompt Embeds/LoRA compatibility and Enable LoRA Support for OPT Models (#25717)
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
|
2025-09-30 08:10:58 +08:00 |
|
Aaron Pham
|
6a113d9aed
|
[V0 Deprecation] Remove vllm.worker and update corresponding imports (#25901)
|
2025-09-29 23:26:11 +00:00 |
|
Nicolò Lucchesi
|
2e4fe48c37
|
[NIXL] Increase default KV block eviction timeout on P (#25897)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-09-29 21:35:14 +00:00 |
|
Zhuohan Li
|
8eb0a1d906
|
[Doc] Polish example for torchrun dp (#25899)
|
2025-09-29 21:31:34 +00:00 |
|