Wentao Ye
|
d88a1df699
|
[Deprecation] Deprecate profiling envs (#33722)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-02-04 05:58:21 +00:00 |
|
杨朱 · Kiki
|
b95cc5014d
|
[Misc] Remove deprecated VLLM_ALL2ALL_BACKEND environment variable (#33535)
Signed-off-by: carlory <baofa.fan@daocloud.io>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-02-03 15:01:59 +08:00 |
|
杨朱 · Kiki
|
ef248ff740
|
[Misc] Remove deprecated profiler environment variables (#33536)
Signed-off-by: carlory <baofa.fan@daocloud.io>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-02-03 14:58:44 +08:00 |
|
Pavani Majety
|
c3a9752b0c
|
[Hardware][SM100] Add TRTLLM Kernel for INT4 W4A16 Kernel. (#32437)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
|
2026-01-30 10:30:46 -08:00 |
|
Harry Mellor
|
fb946a7f89
|
Make mypy opt-out instead of opt-in (#33205)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-01-29 09:12:26 +00:00 |
|
Roger Wang
|
b539f988e1
|
[Models] Kimi-K2.5 (#33131)
Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>
Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: wanglinian <wanglinian@stu.pku.edu.cn>
Co-authored-by: wangln19 <96399074+wangln19@users.noreply.github.com>
Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-01-27 14:50:31 +08:00 |
|
dolpm
|
58a05b0ca1
|
[fix] CPUDNNLGEMMHandler pointer baked into inductor artifact (#32913)
Signed-off-by: dolpm <34420038+dolpm@users.noreply.github.com>
|
2026-01-26 16:59:44 -05:00 |
|
Alex Brooks
|
9ac818a551
|
[Misc] HF Hub LoRA Resolver (#20320)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
|
2026-01-26 13:56:32 +00:00 |
|
Jee Jee Li
|
73b243463b
|
[BugFix] Add env variable to control PDL in LoRA (#32836)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-01-25 16:32:30 +08:00 |
|
dolpm
|
0118cdcc02
|
[fix] add VLLM_OBJECT_STORAGE_SHM_BUFFER_NAME to compile factors (#32912)
Signed-off-by: dolpm <34420038+dolpm@users.noreply.github.com>
|
2026-01-23 22:53:10 +00:00 |
|
Xin Yang
|
90c2007932
|
[Bugfix] Disable tma_aligned_scales in test_fusions_e2e (#32916)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2026-01-23 14:34:30 +00:00 |
|
Nick Hill
|
7fe255889e
|
[Misc] Log vLLM logo when starting server (#32796)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-01-23 11:15:12 +08:00 |
|
Isotr0py
|
8ebf271bb6
|
[Misc] Replace urllib's urlparse with urllib3's parse_url (#32746)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-01-22 16:37:15 +08:00 |
|
Alex Sun
|
49a1262267
|
[AMD][ROCm] MoRI EP: a high-performance all2all backend (#28664)
Signed-off-by: Alex Sun <alex.s@amd.com>
|
2026-01-22 16:33:18 +08:00 |
|
Wentao Ye
|
6437ff1fb9
|
[Deprecation] Remove deprecated environment variables (#32812)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-01-22 02:25:16 +00:00 |
|
dolpm
|
7c5dedc247
|
[AOT compilation] support torch.compile inductor artifacts in VllmCompiledFunction (#25205)
Signed-off-by: dolpm <34420038+dolpm@users.noreply.github.com>
|
2026-01-20 19:45:59 +00:00 |
|
Walter Beller-Morales
|
8be263c3fb
|
[Core] Cleanup shm based object store on engine shutdown (#32429)
Signed-off-by: walterbm <walter.beller.morales@gmail.com>
|
2026-01-20 08:53:37 +00:00 |
|
Karan Bansal
|
3055232ba0
|
[Feature] Add FIPS 140-3 compliant hash algorithm option for multimodal hashing (#32386)
Signed-off-by: Karan Bansal <karanb192@gmail.com>
|
2026-01-18 11:02:01 +08:00 |
|
TomerBN-Nvidia
|
c277fbdf31
|
[Feat] Support non-gated MoE with Marlin, NVFP4 CUTLASS, FP8, INT8, compressed-tensors (#32257)
Signed-off-by: Tomer Natan <tbarnatan@computelab-frontend-8.nvidia.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Tomer Natan <tbarnatan@computelab-frontend-8.nvidia.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Tomer Natan <tbarnatan@ipp1-1429.ipp1a1.colossus.nvidia.com>
|
2026-01-15 16:15:05 -08:00 |
|
Aleksandr Malyshev
|
8c11001ba2
|
[ROCM] DSfp4 mla projection gemms weight dynamic quantization (#32238)
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
|
2026-01-15 14:13:08 -06:00 |
|
Pleaplusone
|
130d6c9514
|
[ROCm][Perf] Enable shuffle kv cache layout and assembly paged attention kernel for AiterFlashAttentionBackend (#29887)
Signed-off-by: ganyi <ygan@amd.com>
|
2026-01-15 15:29:53 +00:00 |
|
Roberto L. Castro
|
8ef50d9a6b
|
[Kernel][Performance] Enable smaller Scaling Factor tiling for NVFP4 small-batch decoding (#30885)
Signed-off-by: LopezCastroRoberto <roberto.lopez.castro@udc.es>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>
|
2026-01-13 15:22:53 -08:00 |
|
Fadi Arafeh
|
9103ed1696
|
[CPU][BugFix] Disable AOT Compile for CPU (#32037)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
|
2026-01-10 23:15:49 -08:00 |
|
Matthew Bonanni
|
2612ba9285
|
[1/N][Attention] Restructure attention: move files (#31916)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-01-09 13:10:24 -08:00 |
|
Michael Goin
|
d5ec6c056f
|
[UX] Add vLLM model inspection view (#29450)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-01-09 10:12:35 -07:00 |
|
inkcherry
|
4505849b30
|
[ROCm][PD] add moriio kv connector. (#29304)
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
|
2026-01-09 14:01:57 +00:00 |
|
Kate Cheng
|
cc6dafaef2
|
[Perf][Kernels] Enable FlashInfer DeepGEMM swapAB on SM90 (for W8A8 Linear Op) (#29213)
Signed-off-by: Kate Cheng <yunhsuanc@nvidia.com>
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
Co-authored-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
|
2026-01-07 10:53:54 -05:00 |
|
Wentao Ye
|
af9a7ec255
|
[Bug] Revert torch warning fix (#31585)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-01-05 22:31:21 +00:00 |
|
Seiji Eicher
|
1ab5213531
|
Make engine core client handshake timeout configurable (#27444)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
|
2025-12-19 20:38:30 +00:00 |
|
Elizabeth Thomas
|
41b6f9200f
|
Remove all2all backend envvar (#30363)
Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-12-18 19:46:28 +00:00 |
|
SungMinCho
|
a0b782f9cc
|
[Metrics] Model FLOPs Utilization estimation (#30738)
Signed-off-by: SungMinCho <tjdals4565@gmail.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
|
2025-12-18 01:40:51 +00:00 |
|
Zhengxu Chen
|
9db1db5949
|
[compile] Ignore VLLM_FORCE_AOT_LOAD from cache factors (#30809)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
|
2025-12-17 01:56:24 -08:00 |
|
Lucas Wilkinson
|
9fec0e13d5
|
[Attention] Cache attention metadata builds across hybrid KV-cache groups (#29627)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Stanislaw Wozniak <stw@zurich.ibm.com>
|
2025-12-16 17:10:16 -05:00 |
|
Lucas Wilkinson
|
3e41992fec
|
[Attention] Use sparse prefill kernel for fp8 kv-cache in DeepSeek-v3.2 (#27532)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-12-12 05:57:47 -08:00 |
|
Wentao Ye
|
d6464f2679
|
[Chore] Fix torch precision warning (#30428)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-12-11 04:05:56 +00:00 |
|
Cyrus Leung
|
7e24e5d4d6
|
[Deprecation] Remove deprecated task, seed and MM settings (#30397)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-10 19:59:39 -08:00 |
|
Jialin Ouyang
|
9f042ba26b
|
[Perf] Enable environment cache in EngineCore to enable the feature for UniProcExecutor as well (#29289)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-12-10 14:13:01 -05:00 |
|
Benjamin Chislett
|
e858bfe051
|
[Cleanup] Refactor profiling env vars into a CLI config (#29912)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-12-09 13:29:33 -05:00 |
|
Wentao Ye
|
83319b44c2
|
[Compile] Fix torch warning TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled (#29897)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-12-09 10:40:37 -05:00 |
|
Ming Yang
|
9d6235ca9a
|
[moe] Allow disabling DP chunking (#29936)
Signed-off-by: Ming Yang <minos.future@gmail.com>
|
2025-12-09 00:29:36 +00:00 |
|
dtc
|
842aba501d
|
[P/D] Introduce Mooncake Transfer Engine as kv_connector (#24718)
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: dtc <dtcccc@linux.alibaba.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
|
2025-12-04 09:51:36 +00:00 |
|
Shengqi Chen
|
1109f98288
|
[CI] fix docker image build by specifying merge-base commit id when downloading pre-compiled wheels (#29930)
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
|
2025-12-03 14:08:19 -08:00 |
|
Elizabeth Thomas
|
b5407869c8
|
[Bugfix] Respect VLLM_CONFIGURE_LOGGING value (#28671)
Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Jane Xu <janeyx@meta.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Johnny Yang <johnnyyang@google.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: bruceszchen <bruceszchen@tencent.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Johnny Yang <24908445+jcyang43@users.noreply.github.com>
|
2025-12-03 22:00:52 +00:00 |
|
Amr Mahdi
|
f5d3d93c40
|
[docker] Build CUDA kernels in separate Docker stage for faster rebuilds (#29452)
Signed-off-by: Amr Mahdi <amrmahdi@meta.com>
|
2025-12-03 11:41:53 +00:00 |
|
Andrew Xia
|
52cb349fc0
|
[responsesAPI][3] ResponsesParser to set up non harmony MCP (#29413)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
|
2025-12-02 11:24:45 -05:00 |
|
Shengqi Chen
|
4b612664fd
|
[CI] Renovation of nightly wheel build & generation (take 2) (#29838)
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
|
2025-12-01 22:17:10 -08:00 |
|
Kevin H. Luu
|
1336a1ea24
|
Revert #29787 and #29690 (#29815)
|
2025-12-01 13:42:03 -08:00 |
|
Shengqi Chen
|
36db0a35e4
|
[CI] Renovation of nightly wheel build & generation (#29690)
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
|
2025-12-01 21:25:39 +08:00 |
|
Yifei Zhang
|
1ab8fc8197
|
Make PyTorch profiler gzip and CUDA time dump configurable (#29568)
Signed-off-by: Yifei Zhang <yifei.zhang1992@outlook.com>
|
2025-12-01 04:30:46 +00:00 |
|
Shu Wang
|
f72a817bdf
|
[MoE] CuteDSL MoE with Nvfp4 DeepEP dispatch (#27141)
Signed-off-by: Shu Wang <shuw@nvidia.com>
Signed-off-by: Shu Wang. <shuw@nvidia.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: root <root@umbriel-b200-017.ipp4a1.colossus.nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-11-30 16:05:32 -08:00 |
|