83319b44c2
[Compile] Fix torch warning TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled (#29897)
Wentao Ye
2025-12-09 10:40:37 -05:00
c72ea10723
[Structured Output][Reasoning] Improves decoding throughput for models using single-token reasoning endings. (#30056)
Hubert de La Jonquiere
2025-12-09 11:54:08 +01:00
67475a6e81
[DCP][Bugfix][CI] Fix accuracy issue of DCP when using FLASH_ATTN_MLA (#30309)
Jaya Yuan
2025-12-09 16:22:14 +08:00
184076c3fe
[DeepSeek v3.2] Make top-k work for any logit values. (#27568)
Daniel Cámpora
2025-12-08 15:55:58 +01:00
eb1051fb95
[ROCm] Guard group quant RMS norm fusion patterns (#30239)
Ye (Charlotte) Qi
2025-12-08 06:44:48 -08:00
80433e225e
[LoRA] Reduce the loading time of MoE LoRA (#30243)
Jee Jee Li
2025-12-08 21:29:47 +08:00
5c2433a6f3
Add tip for mypy and markdownlint to the pre-commit comment (#30259)
Harry Mellor
2025-12-08 13:11:51 +00:00
77072e93b3
[docs] governance documents (#24801)
Simon Mo
2025-12-08 03:06:20 -09:00
2e660c2434
[Frontend] Binary embedding response does not return metadata by setting encoding_format to bytes_only. (#30249)
wang.yuqi
2025-12-08 20:01:21 +08:00
408cf42f67
[CI] Prevents triggering of an inactive issue/PR check for forked repository. (#29654)
Shiming Zhang
2025-12-08 18:29:14 +08:00
9e77ffca3f
[Model][7/N] Improve all pooling task | Deprecation as_reward_model. Extract hidden states prefer using new multi-vector retrieval API (#26686)
wang.yuqi
2025-12-08 16:10:09 +08:00
bcb6f5947f
[Perf] Remove sync point in vit torch sdpa attn backend (#30232)
Dazhi Jiang
2025-12-08 15:12:42 +08:00
cd00c443d2
[Misc] Rename TensorRT Model Optimizer to Model Optimizer (#30091)
Zhiyu
2025-12-07 23:05:27 -08:00
4026ae31e9
[Misc] Move disable_nccl_for_dp_synchronization init logic into VllmConfig (#30161)
Nick Hill
2025-12-05 20:59:04 -08:00
b12f4a9830
[CI/Build][AMD] Use ROCM_ATTN instead of FLASH_ATTN test for test_register_kv_caches for ROCm and update test for TRITON_ATTN (#29985)
rasmith
2025-12-05 22:57:38 -06:00
e858bc4d14
[Model] Add support for transformer-based Ultravox v0.7 projector (#30089)
Peter Salas
2025-12-05 20:55:43 -08:00
e3fbb6f152
fix#30092 Kimi-Linear model loading failure with missing indexer_rotary_emb (#30093)
Dongjie Zou
2025-12-05 23:55:09 -05:00
c4d62618ca
Fix AWQ MoE marlin check issue in marlin_utils.py for AMD backend (#30102)
yuttian1
2025-12-06 12:54:38 +08:00
62079d8600
[CI/Build][AMD] Skip marlin, machete, and hadacore tests since these require _C functions not defined for ROCm (#30109)
rasmith
2025-12-05 22:54:17 -06:00
bf4a901af9
Better error when world size is larger than node and distributed_executor_backend is not set (#30140)
Harry Mellor
2025-12-06 04:53:52 +00:00
7e31c3a3f6
[CI]: Remove unnecessary imports from test_lmcache_integration (#30157)
Samuel Shen
2025-12-05 20:53:34 -08:00
dc839ad03d
[CI/Build][AMD][Quantization] Fix test_int8_kernel.py by updating int8_utils to use hip.libdevice.round (#30151)
rasmith
2025-12-05 22:52:11 -06:00