Stefano Castagnetta
|
6183cae1bd
|
[Bugfix] Restrict TRTLLM attention to SM100, fixing GB300 (SM103) hang (#38730)
Signed-off-by: Stefano Castagnetta <scastagnetta@nvidia.com>
|
2026-04-01 12:08:40 -07:00 |
|
bnellnm
|
7cf56a59a2
|
[MoE Refactor] Make SharedExperts class for use with DefaultMoERunner (#35153)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2026-04-01 09:44:08 -04:00 |
|
wliao2
|
4dfad17ed1
|
replace cuda_device_count_stateless() to current_platform.device_count() (#37841)
Signed-off-by: Liao, Wei <wei.liao@intel.com>
Signed-off-by: wliao2 <wei.liao@intel.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-31 22:32:54 +08:00 |
|
Lucas Kabela
|
e31915063d
|
[Bugfix] Fix for builtins (forward fix of pytorch/177558) (#37234)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
|
2026-03-31 01:08:11 +00:00 |
|
Andreas Karatzas
|
43cc5138e5
|
[ROCm][CI] Fix cross-attention dispatch for encoder-decoder models (#38450)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-28 22:08:03 -07:00 |
|
TJian
|
58a249bc61
|
[ROCm] [Release] Update ROCm variant from rocm700 to rocm721 (#38413)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2026-03-28 06:07:03 +00:00 |
|
Harry Mellor
|
b3601da6e7
|
[Mypy] Fix mypy for vllm/model_executor (except vllm/model_executor/layers) (#37904)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-24 17:14:01 +00:00 |
|
Wentao Ye
|
45bd5c8e75
|
[Mypy] Fix mypy for vllm/config (#37808)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-23 14:33:59 +00:00 |
|
Wei Zhao
|
b36adfa349
|
[Perf] Set Flashinfer sparse MLA as default backend for FP8 kv cache (#37252)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
|
2026-03-17 20:09:20 +00:00 |
|
Isotr0py
|
a836524d20
|
[Chore] Replace all base64 usages with faster pybase64 package (#37290)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-17 14:44:19 +00:00 |
|
Kunshang Ji
|
747b068136
|
[Hardware] Replace memory related torch.cuda APIs (#37031)
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
|
2026-03-16 10:24:48 +00:00 |
|
Dimitrios Bariamis
|
cc16b24b17
|
Update Flashinfer to 0.6.6 (#36768)
Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
|
2026-03-12 13:19:19 -04:00 |
|
Kunshang Ji
|
53ec16a705
|
[Hardware] Replace torch.cuda.device_count/current_device/set_device API (#36145)
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-12 07:57:47 -07:00 |
|
Martin Hickey
|
7f1f36bf91
|
[CI] Fix mypy for vllm/reasoning (#35742)
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-12 12:21:33 +00:00 |
|
Yan Ma
|
894843eb25
|
replace with torch.cuda.device with with torch.accelerator.device_index (#36144)
Signed-off-by: Yan Ma <yan.ma@intel.com>
|
2026-03-11 23:12:57 -07:00 |
|
Harry Mellor
|
a0f44bb616
|
Allow markdownlint to run locally (#36398)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-08 20:05:24 -07:00 |
|
Wei Zhao
|
379689d533
|
[Perf] Support FP8 KV cache for Flashinfer MLA Sparse (#35891)
|
2026-03-07 13:51:54 -08:00 |
|
Kunshang Ji
|
66a2209645
|
[Hardware] Replace torch.cuda.synchronize() api with torch.accelerator.synchronize (#36085)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-05 10:36:39 +00:00 |
|
Taneem Ibrahim
|
1aaec59d79
|
[MISC] fixed tool_parser mypy errors (#35640)
Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-04 12:23:12 +00:00 |
|
Kunshang Ji
|
16d2ad1d38
|
[Hardware] Replace torch.cuda.empty_cache with torch.accelerator.empty_cache (#30681)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-04 09:49:47 +00:00 |
|
TJian
|
5dfc5abe94
|
[ROCm] [Release] Change the package from aiter to amd-aiter (#35198)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2026-03-02 23:13:39 -08:00 |
|
Martin Hickey
|
7560d674c9
|
[CI] Fix mypy for vllm/device allocator (#35518)
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-02 15:53:18 +00:00 |
|
Lucas Wilkinson
|
8b5014d3dd
|
[Attention] FA4 integration (#32974)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2026-03-01 23:44:57 +00:00 |
|
Taneem Ibrahim
|
59d7af9c6c
|
[MISC] Fixing a null reference by removing parallel_utils from mypy EXCLUDE (#35630)
Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>
|
2026-03-01 09:26:44 -05:00 |
|
Aaron Hao
|
2ce6f3cf67
|
[Feat][RL][2/2] Native Weight Syncing API: IPC (#34171)
Signed-off-by: hao-aaron <ahao@anyscale.com>
Signed-off-by: Aaron Hao <ahao@anyscale.com>
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
|
2026-02-27 13:45:21 -07:00 |
|
Wentao Ye
|
062b789632
|
[Bug] Fix outdated links in source code (#35314)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-02-27 03:50:46 +00:00 |
|
Tyler Michael Smith
|
eb19955c37
|
[WideEP] Remove pplx all2all backend (#33724)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-02-26 14:30:10 -08:00 |
|
Taneem Ibrahim
|
d38cd3dde5
|
[Misc] Fix mypy errors in vllm/profiler and remove from exclude list (#34959)
Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>
|
2026-02-20 19:56:33 -08:00 |
|
junuxyz
|
c61a98f529
|
[CI][BugFix] ShellCheck cleanup to remove baseline and preserve runtime behavior (#34514)
Signed-off-by: junuxyz <216036880+junuxyz@users.noreply.github.com>
|
2026-02-17 12:22:56 +00:00 |
|
Aneesh Puttur
|
0b5f9b7204
|
[CI] Enable mypy import following for vllm/v1/kv_offload (#34639)
Signed-off-by: Aneesh Puttur <aneeshputtur@gmail.com>
|
2026-02-17 09:58:15 +08:00 |
|
Lucas Kabela
|
a3205beffb
|
[CI] Enable mypy coverage for individual excluded files (#34292)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-02-16 07:34:29 -08:00 |
|
Andreas Karatzas
|
4c078fa546
|
[ROCm][CI] Pin TorchCodec to v0.10.0 for ROCm compatibility (#34447)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-02-12 18:47:34 +00:00 |
|
Matthew Bonanni
|
f2c47886fd
|
[Attention] Add FlashInfer Sparse MLA backend (#33451)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
|
2026-02-12 17:21:54 +00:00 |
|
junuxyz
|
fa7e0bfacf
|
[CI][BugFix] Fix silent failure in shellcheck hook and baseline exist… (#32458)
Signed-off-by: junuxyz <216036880+junuxyz@users.noreply.github.com>
|
2026-02-11 17:03:48 +00:00 |
|
Tyler Michael Smith
|
c4b9e6778f
|
[Misc] Add pre-commit hook to catch boolean ops in with-statements (#34271)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-02-10 15:13:20 -08:00 |
|
Michael Goin
|
5e75a14a66
|
[Doc] Add DCP support to attention backend doc (#33936)
|
2026-02-09 18:33:43 -05:00 |
|
zifeitong
|
52181baaea
|
Update DeepGEMM version pin in Dockerfile to match #32479 (#33935)
Signed-off-by: Zifei Tong <zifeitong@gmail.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2026-02-07 05:30:22 -08:00 |
|
Harry Mellor
|
791a94bed0
|
Consolidate and fix forbidden import pre-commit checks (#33982)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-02-06 01:47:41 -08:00 |
|
Harry Mellor
|
61e632aea1
|
Turn @config into a dataclass_transform (#31541)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-02-03 17:40:59 +00:00 |
|
Lucas Kabela
|
726d89720c
|
[CI] Enable mypy import following for vllm/spec_decode (#33282)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
|
2026-01-30 06:43:32 +00:00 |
|
Harry Mellor
|
fb946a7f89
|
Make mypy opt-out instead of opt-in (#33205)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-01-29 09:12:26 +00:00 |
|
TJian
|
f9d03599ef
|
[Release] [CI] Optim release pipeline (#33156)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2026-01-28 22:45:42 -08:00 |
|
Matthew Bonanni
|
77c4f45c6c
|
[7/N][Attention][Docs] Add documentation for attention backends (#32477)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-01-28 17:20:22 -05:00 |
|
Harry Mellor
|
f1acbd68c5
|
[CI] Enable mypy import following for vllm/compilation (#33199)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-01-28 08:59:54 +00:00 |
|
Matthew Bonanni
|
a608b4c6c2
|
[5/N][Attention] Finish eliminating vllm/attention folder (#32064)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-01-27 10:02:51 -05:00 |
|
Wentao Ye
|
7ef5873752
|
[CI] Fix mypy for vllm/v1/structured_output (#32722)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-01-23 11:55:51 +08:00 |
|
Lucas Kabela
|
15e302dfce
|
[Misc][BE] Turn on strict type coverage for vllm/compilation (#31756)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
|
2026-01-22 15:12:26 +00:00 |
|
Cyrus Leung
|
d117a4d1a9
|
[Frontend] Introduce Renderer for processing chat messages (using ModelConfig) (#30200)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-22 12:44:22 +00:00 |
|
Robert Shaw
|
42135d6898
|
[MoE Refactor] Oracle Select FP8+NVFP4 Kernels In Priority (#32414)
|
2026-01-21 08:22:33 -05:00 |
|
dolpm
|
7c5dedc247
|
[AOT compilation] support torch.compile inductor artifacts in VllmCompiledFunction (#25205)
Signed-off-by: dolpm <34420038+dolpm@users.noreply.github.com>
|
2026-01-20 19:45:59 +00:00 |
|