ddeec11ba9
[Bugfix][P/D] Reduce num_threads used by nixl ucx backend (#27196 )
David Whyte-Gray
2025-10-21 13:41:52 -04:00
86ed77022d
[Feature] Batch Invariant for R1 TP 8 on Blackwell (#27229 )
Wentao Ye
2025-10-21 13:25:55 -04:00
aa1356ec53
[ROCm] Update Triton, Torch, and AITER branches for ROCm base Dockerfile (#27206 )
Micah Williamson
2025-10-21 11:01:23 -05:00
ecc3c0940a
Add @pavanimajety to .github/codeowners for Flashinfer, ModelOpt related code (#27213 )
Pavani Majety
2025-10-21 07:59:53 -07:00
ba09652de2
[ROCM] Enable CompressedTensorsWNA16 (#27187 )
JartX
2025-10-21 16:43:23 +02:00
bd66b8529b
[CI] Install pre-release version of apache-tvm-ffi for flashinfer (#27262 )
Harry Mellor
2025-10-21 15:23:56 +01:00
6c728f7771
[Chore] Separate out NCCL utilities from vllm.utils (#27197 )
dongbo910220
2025-10-21 21:18:23 +08:00
80e9452984
[Deepseek v3.2] Optimize top_k_per_row (#26763 )
Daniel Cámpora
2025-10-21 10:30:07 +02:00
c3a2c6ac5f
[MM][Core] Decouple ViT backend from LM backend (#27061 )
Roger Wang
2025-10-21 00:30:10 -07:00
72f431e709
[Nixl] Minor refactor to handshake related metadata (#26410 )
Nicolò Lucchesi
2025-10-21 09:07:47 +02:00
be4445072c
[Fix][Spec Decode] Fix llama4 draft loading with different quantization (#27136 )
Zebing Lin
2025-10-21 02:19:00 -04:00
f381cf2302
[Bugfix] Fix broken MTP weight loading for FP8 KV Scales (#27227 )
Benjamin Chislett
2025-10-21 01:51:44 -04:00
5ff5d94e77
[Bugfix] Fix gpt-oss w4a8 DP/EP on B200 (#26729 )
Varun Sundar Rabindranath
2025-10-21 01:51:14 -04:00
f95da13c3d
[ModelOpt] Load w13/w2_input_scale for all experts, nvfp4 (#26135 )
Shu Wang
2025-10-21 00:50:31 -05:00
aef368aa08
[BugFix] GPT-OSS Attention DP + MoE TP weight loading issue (#24032 )
Po-Han Huang (NVIDIA)
2025-10-21 12:03:47 +08:00
5f6cbf60d6
[Feature][Kernel]FusedMoE LoRA (#21229 )
Chen Wu
2025-10-21 11:01:37 +08:00
3ada34f9cb
[Frontend] Enforce tokenize=False when applying chat template (#27205 )
Russell Bryant
2025-10-20 22:57:34 -04:00
0eb8f2b880
create is_in_the_same_node on cpu (#26832 )
Lunwen He
2025-10-20 19:04:14 -07:00
163965d183
[cpu] Dispatch un-quantized linear to oneDNN/ACL by default for AArch64 (#27183 )
Fadi Arafeh
2025-10-21 03:02:58 +01:00
a03cf9bc70
[V0 Deprecation] Remove V0 metrics code (#27215 )
Nick Hill
2025-10-20 19:02:10 -07:00
352c0c8a28
[Quantization] Automatically infer AWQ modules_to_not_convert field (#26909 )
Isotr0py
2025-10-21 09:49:28 +08:00
bfe0b4bd2a
[ez] add uv lock to gitignore (#27212 )
Andrew Xia
2025-10-20 17:37:44 -07:00
58fbbcb2f5
[ROCm] enable some tests in entrypoints test groups on AMD (#26725 )
Concurrensee
2025-10-20 19:37:16 -05:00
87778d5f00
[Feature][Quantization] auto_round support for mixed bits quantization (#23812 )
Heng Guo
2025-10-21 06:23:30 +08:00
f9e7ad5400
[Bugfix][CI] Fix Distributed Tests (4 GPUs) async_sched+ray test (#27195 )
Nicolò Lucchesi
2025-10-20 18:34:54 +02:00
4d0f266113
[Kernel][Model] Tune fused_moe Triton configs for Qwen3-30B A3/A3B on H100 (FP8/BF16) (#26268 )
shivampr
2025-10-20 07:48:01 -07:00
e93ff6c8b9
Nemotron Nano V2 VL + EVS Video Support (#27107 )
Eugene Khvedchenya
2025-10-20 17:19:11 +03:00
1c691f4a71
AArch64 CPU Docker pipeline (#26931 )
ioana ghiban
2025-10-20 13:09:40 +02:00
9fce7bee74
[Kernel] Accelerate solve_tril with TMA (#26746 )
Jiangyun Zhu
2025-10-20 13:39:02 +08:00
b63f2143f8
[LoRA] LoRA cuda graph specialization (#25914 )
Andy Lo
2025-10-20 05:21:09 +01:00
f32bf7582e
[Model][VLM] Support Bee-8B Model (#27012 )
Yi Zhang
2025-10-20 10:31:26 +08:00
8a81d776ce
Fix typo in ValueError message: use kv_role instead of kv_disagg_role (#27166 )
Yongtao Huang
2025-10-20 03:47:19 +08:00
f6fdacd82c
[Bugfix] Fix error with penalties when speculative decoding and structural output are enabled (#26586 )
Sergei Skvortsov
2025-10-19 20:24:46 +01:00
d31f7844f8
[Misc] Move utils to avoid conflicts with stdlib, and move tests (#27169 )
Cyrus Leung
2025-10-19 20:20:55 +08:00
7a6c8c3fa1
[Chore] Separate out vllm.utils.network_utils (#27164 )
iAmir97
2025-10-19 17:06:32 +07:00
221bf72577
output type conversion fix (#27159 )
Jianyu Huang
2025-10-19 01:10:07 -07:00
b3aba04e5a
[Benchmark] Convenience script for multiple parameter combinations (#27085 )
Cyrus Leung
2025-10-19 14:57:01 +08:00
8a297115e2
[Chore] Separate out hashing utilities from vllm.utils (#27151 )
dongbo910220
2025-10-19 11:09:38 +08:00
191eed0bb9
[BugFix] Fix lazy imports involving outlines_core (#27158 )
22quinn
2025-10-18 19:35:32 -07:00
fb860670da
[Minor] Remove unused env variable (#27161 )
Woosuk Kwon
2025-10-18 18:48:35 -07:00
83e760c57d
[V1][Metrics][Plugin] Add plugin support for custom StatLoggerBase implementations (#22456 )
Tova Movshovitz
2025-10-19 01:12:46 +03:00
c2bba69065
[BugFix] Disable fp8 kv-cache by default for DeepSeek V3.2 (#27121 )
Lucas Wilkinson
2025-10-18 18:05:23 -04:00
e133d6d218
[BugFix] fix graph partition signature (#27139 )
Boyuan Feng
2025-10-18 14:34:36 -07:00
a1946c9f61
[Chore] Separate out profiling utilities from vllm.utils (#27150 )
dongbo910220
2025-10-19 03:12:01 +08:00
9f020f4f31
[BugFix] Fix failing gemma-3-1b-it test: test_lm_eval_accuracy_v1_engine[google/gemma-3-1b-it] (#27111 )
Lucas Wilkinson
2025-10-18 14:44:39 -04:00
3b45075206
[Minor] Add some clarifying comments to recent changes (#27130 )
Nick Hill
2025-10-18 09:52:45 -07:00
168e578efc
Fix incorrect string formatting in barrier timeout exceptions (#27149 )
Yongtao Huang
2025-10-19 00:51:57 +08:00
6ac5e06f7c
[Chore] Clean up pytorch helper functions in vllm.utils (#26908 )
Isotr0py
2025-10-19 00:48:22 +08:00
5c2acb270a
[Models][QwenVL] Remove unnecessary .contiguous() calls (#27106 )
Lukas Geiger
2025-10-18 16:05:05 +02:00
b26b70bec4
[Misc] Refactor get_kv_cache_spec into AttentionLayerBase (#26587 )
Nicolò Lucchesi
2025-10-18 15:51:21 +02:00
ab4be40fc5
[fix][cpu] fix prefill attention in CPU attention backend (#27035 )
Fadi Arafeh
2025-10-18 14:30:21 +01:00
245e4f2c01
[Feature] Batch Invariant: Support DeepGEMM and Blackwell (#27127 )
Wentao Ye
2025-10-18 09:28:05 -04:00
1d165d6d85
[Chore] Separate out vllm.utils.mem_utils (#27143 )
iAmir97
2025-10-18 17:06:59 +07:00
83004020fd
[Test] Add test for /health endpoint on engine failure (#26074 )
dongbo910220
2025-10-18 17:59:05 +08:00
12e21701e7
[DOC][FEATURES][CPU]update cpu feature for v1 (#27135 )
Chendi.Xue
2025-10-18 03:10:45 -05:00
30a33b92ee
[Misc] Rev DeepEP (#27122 )
Varun Sundar Rabindranath
2025-10-18 02:54:29 -04:00
7c572544e4
[GPT-OSS] Structure_Tag support for gpt-oss tool-call in cot (#25515 )
Hanchenli
2025-10-17 21:55:54 -07:00
c312320764
[CI/Build] tests(v1): feed Triton attention the (num_blocks, 2, …) KV cache layout in backend-correctness tests (#26663 )
Huamin Li
2025-10-17 21:11:26 -07:00
c981f0ea78
[Perf] Add H100 fused MoE config (#25398 )
ZiTian Zhao
2025-10-18 10:21:27 +08:00
6367bde739
[BugFix][Core] Fix error when enable async-scheduling in multi-node env (#25887 )
Lehua Ding
2025-10-18 06:16:18 +08:00
f50cc221ea
[Test] Make test_failure more stable for batch invariance (#27054 )
Wentao Ye
2025-10-17 16:59:08 -04:00
acedc74b1a
[V1][Spec Decode] Fix greedy temperature detection after sampler refactor (#27077 )
Pradyun92
2025-10-17 16:27:47 -04:00
d29483b58a
[Minor] Remove unnecessary error message (#27115 )
Zhuohan Li
2025-10-17 13:02:12 -07:00
950cf9e58e
[Bugfix] Use PIECEWISE cudagraphs on Blackwell if max_model_len > 131072 (#27114 )
Michael Goin
2025-10-17 15:47:18 -04:00
3125d79950
[Chore] Remove unused PolyNorm layer (#27110 )
Isotr0py
2025-10-18 03:03:43 +08:00
e33ee23ee3
[Bugfix] [AITER] [ROCm] Fix Quark MoE Quant Config and AITER Fused MoE quant type logic (#27029 )
vllmellm
2025-10-18 02:51:10 +08:00
b10c64c834
[ROCm][Bugfix][Model] Fix illegal memory access when running qwen3_moe models with rms_norm (Qwen3-235B-A22B, Qwen3-30B-A3B, etc.) (#26192 )
rasmith
2025-10-17 13:17:18 -05:00
0925b28a8e
[ROCM] MoE fp4 CK kernel (#26545 )
Aleksandr Malyshev
2025-10-17 11:06:33 -07:00
99722d5f0e
[CI] Remove forbidden slash (#27112 )
Nicolò Lucchesi
2025-10-17 18:38:00 +02:00
4c91a28e30
[bugfix] Qwen3-VL fix video incorrect timestamp calculations while do_sample_frames=True (#27104 )
燃
2025-10-18 00:26:33 +08:00
b038d9c40c
[Data-parallel] Allow DP>1 for world_size > num_gpus on node (8) (#26367 )
Patrick von Platen
2025-10-17 17:24:42 +02:00
2ba60ec7fe
[CI] Nixl integration tests (#27010 )
Nicolò Lucchesi
2025-10-17 16:13:31 +02:00
bd7157a071
[torch.compile] Enable attention and allreduce fusion without custom ops enabled (#24604 )
Luka Govedič
2025-10-17 10:10:23 -04:00
be429d0cfd
Fix incorrect docstring for stop_profile() method (#27101 )
Yongtao Huang
2025-10-17 21:30:23 +08:00
c253745eb8
[Hardware][AMD][Model] Triton MoE tuning configs for GLM-4.5 for MI350 and MI355 (#25586 )
Reima Karhila (AMD)
2025-10-17 14:56:12 +03:00
daec4d2624
[Model]Improve Qwen3VLMoeForConditionalGeneration packed_modules_mapping (#27096 )
Jee Jee Li
2025-10-17 19:47:00 +08:00
6c9fdbf725
[Docs] Replace rst style double-backtick with md single-backtick (#27091 )
Harry Mellor
2025-10-17 10:47:34 +01:00
483ea64611
[Docs] Replace all explicit anchors with real links (#27087 )
Harry Mellor
2025-10-17 10:22:06 +01:00
e20eba753b
[VLM][Refactor] Remove useless func get_input_positions in MRotaryEmbedding (#27088 )
Mengqing Cao
2025-10-17 17:00:30 +08:00
bbc1b29665
Update troubleshooting.md and remind VLLM_TRACE_FUNCTION usage (#27069 )
cong-meta
2025-10-17 01:53:06 -07:00
acb1bfa601
[CI] fix docs build failed (#27082 )
Chauncey
2025-10-17 15:53:40 +08:00
75c7ad9918
[Kernel][Performance] Fuse float cast and renormalize to topk softmax kernel (#26717 )
zhrrr
2025-10-17 15:30:35 +08:00
5550ff9c25
[CI/Build] Update compressed tensor test path to fix CPU CI (#27068 )
Li, Jiang
2025-10-17 13:34:56 +08:00
3aeb19a39e
[Model] Add support for LightOnOCR (#26916 )
Said Taghadouini
2025-10-17 07:05:24 +02:00
8c017b3490
[Model] Always use Transformers backend for PaliGemma and Gemma3-MM (#26715 )
Cyrus Leung
2025-10-17 13:03:35 +08:00
9c2c2287a0
[CI/Build] Update Llama4 eval yaml (#27070 )
Zhewen Li
2025-10-16 21:59:47 -07:00
fec2b341ad
[Kernel] Lazy import FlashInfer (#26977 )
Jee Jee Li
2025-10-17 12:48:18 +08:00
87bc0c492f
[Bugfix] Fix ReplicatedLinearWithLoRA (#27065 )
Jee Jee Li
2025-10-17 12:43:16 +08:00
fe3b9372ad
[Core] Change execute_model_with_error_logging() to be a ctx manager (#27060 )
Nick Hill
2025-10-16 20:45:32 -07:00
bde9e2272a
[Bugfix][Qwen] fixes the weights dtype in qwen3_next: it is actually a bfloat16 (#27030 )
Tao He
2025-10-17 11:37:52 +08:00
08405609cc
disable graph partition in custom op (#26952 )
Boyuan Feng
2025-10-16 20:08:47 -07:00
ab81379ea6
[Perf] Exploit out-of-band buffers in shm_broadcast (#26961 )
Nick Hill
2025-10-16 20:08:03 -07:00
4ffd6e8942
[Docs] Reduce custom syntax used in docs (#27009 )
Harry Mellor
2025-10-17 04:05:34 +01:00
965c5f4914
vllm bench serve shows num of failed requests (#26478 )
Tomas Ruiz
2025-10-17 04:55:09 +02:00
4d055ef465
Remove unused imports (#26972 )
Lukas Geiger
2025-10-17 03:51:17 +01:00
17c540a993
[torch.compile] fix simple inductor graph partition test (#27050 )
Boyuan Feng
2025-10-16 18:09:36 -07:00
4d4d6bad19
[Chore] Separate out vllm.utils.importlib (#27022 )
Cyrus Leung
2025-10-17 08:48:59 +08:00
11ae016bd7
[torch.compile] Passing only necessary compilation config to inductor pass config (#27041 )
Lucia Fang
2025-10-16 17:01:52 -07:00
41d3071918
[NVIDIA] [Perf] Update to leverage flashinfer trtllm FP4 MOE throughput kernel (#26714 )
jiahanc
2025-10-16 16:20:25 -07:00
fb5e10d3fb
Refactor Transformers backend to use mixins (#26906 )
Harry Mellor
2025-10-16 22:50:39 +01:00