Kunshang Ji
|
53ec16a705
|
[Hardware] Replace torch.cuda.device_count/current_device/set_device API (#36145)
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-12 07:57:47 -07:00 |
|
Harry Mellor
|
f83b933b84
|
[CI] Bump mypy version to 1.19.1 (#36104)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-10 09:18:28 -07:00 |
|
Jiayi Yan
|
6a895197fa
|
[Bugfix][CI] fix typos (#34934)
Signed-off-by: 1195343015 <1195343015@qq.com>
Signed-off-by: Jiayi Yan <66017932+1195343015@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-05 17:05:46 +00:00 |
|
Kunshang Ji
|
66a2209645
|
[Hardware] Replace torch.cuda.synchronize() api with torch.accelerator.synchronize (#36085)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-05 10:36:39 +00:00 |
|
Robert Shaw
|
97995f6376
|
[MoE Refactor] Create MK for TRTLLM Kernels (#32564)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
|
2026-03-03 10:39:50 -08:00 |
|
Hongxia Yang
|
f26650d649
|
[ROCm] add amd-quark package in requirements for rocm to use quantized models (#35658)
Signed-off-by: Hongxia Yang <hongxiay.yang@amd.com>
Co-authored-by: Hongxia Yang <hongxiay.yang@amd.com>
|
2026-03-02 06:02:43 +00:00 |
|
Jesse Cai
|
a60985b07e
|
Fix deprecated v1 config tests (#35327)
Signed-off-by: Jesse Cai <jessecai@fb.com>
|
2026-03-01 20:32:03 -05:00 |
|
Michael Goin
|
de527e1cec
|
[UX] Add --moe-backend arg for explicit kernel selection (#33807)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-02-25 17:44:44 -08:00 |
|
Matthias Gehre
|
4e2c7caf2d
|
[Bugfix] Add regression test for MoE quant_config under torch.compile (#34335)
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
|
2026-02-20 13:27:26 +08:00 |
|
Linda
|
275e0d2a99
|
[NVIDIA][test] Tests for flashinfer TRTLLM BF16 MoE (#33715)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
Co-authored-by: Pavani Majety <pmajety@nvidia.com>
|
2026-02-11 12:38:11 +00:00 |
|
Kunshang Ji
|
cb9574eb85
|
[XPU][9/N] clean up existing ipex code/doc (#34111)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-02-11 00:27:15 -08:00 |
|
Vasiliy Kuznetsov
|
0130223bd9
|
fix memory for online fp8 quantization with streaming weight load (#31914)
Signed-off-by: vasiliy <vasiliy@fb.com>
|
2026-02-02 14:17:42 -05:00 |
|
Michael Goin
|
67ebaff528
|
Refactor NVFP4 Linear utils for ModelOpt and CT (#33201)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-01-30 16:37:42 -08:00 |
|
Kyle Sayers
|
f857a03f6b
|
[QeRL] Layerwise Reloading (#32133)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2026-01-30 08:50:05 -07:00 |
|
Robert Shaw
|
af9b69f977
|
[Quantization][Deprecation] Remove Marlin 24 (#32688)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-01-28 15:54:59 +00:00 |
|
Eldar Kurtić
|
44f08af3a7
|
Add llmcompressor fp8 kv-cache quant (per-tensor and per-attn_head) (#30141)
Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com>
Signed-off-by: eldarkurtic <8884008+eldarkurtic@users.noreply.github.com>
|
2026-01-22 13:29:57 -07:00 |
|
Robert Shaw
|
4e31b7f228
|
[Quantization][Deprecation] Remove RTN (#32697)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2026-01-21 16:34:42 +00:00 |
|
Vasiliy Kuznetsov
|
d2389c1262
|
fp8 online quant: split out Fp8OnlineLinearMethod (#32189)
|
2026-01-20 18:13:22 -05:00 |
|
vllmellm
|
148117ea2e
|
[Refactor] Make FP8 Linear Ops use kernel abstraction (#27814)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2026-01-20 14:48:20 +08:00 |
|
Yi Liu
|
50632adc58
|
Consolidate Intel Quantization Toolkit Integration in vLLM (#31716)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
|
2026-01-14 07:11:30 +00:00 |
|
Matt
|
bde57ab2ed
|
[Hardware][AMD][CI][Bugfix] Fix AMD Quantization test group (#31713)
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
|
2026-01-10 23:19:46 -08:00 |
|
Robert Shaw
|
5825bbc1f7
|
[Quantization] Deprecate Long Tail of Schemes (#31688)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2026-01-08 19:07:45 -05:00 |
|
Lucas Wilkinson
|
6cdf015c3c
|
[Misc] Fix Current vLLM config is not set. warnings, assert to avoid issues in the future (#31747)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-01-08 15:20:49 -08:00 |
|
Zhiwei
|
573a1d1119
|
[ROCm]Skip test_torchao.py::test_pre_quantized_model on CDNA3 arch (#31905)
Signed-off-by: ZhiweiYan-96 <zhiwei.yan@amd.com>
|
2026-01-08 15:47:44 +08:00 |
|
Robert Shaw
|
5dcd7ef1f2
|
[MoE Refactor][15/N] Apply Refactor to Fp8 (#31415)
|
2026-01-07 19:42:33 -05:00 |
|
Li, Jiang
|
8becf146bd
|
[Quantization][Refactor] Move CPU GPTQ kernel into MP linear (#31801)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: Li, Jiang <bigpyj64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-01-06 19:10:18 +00:00 |
|
Kevin McKay
|
42b42824ae
|
[Misc] Fix grammar errors in comments and messages (#31115)
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
|
2025-12-21 21:14:02 -08:00 |
|
Kevin McKay
|
ec58c10ce1
|
[Misc] Fix quantization-related typos (#31116)
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
|
2025-12-21 21:13:48 -08:00 |
|
CedricHuang
|
19cc9468fd
|
[Feature]: Support NVIDIA ModelOpt HF FP8 variants FP8_PER_CHANNEL_PER_TOKEN and FP8_PB_WO in vLLM (#30957)
|
2025-12-21 22:34:49 -05:00 |
|
Robert Shaw
|
b471092d3a
|
[MoE Refactor][4/N] Marlin Fp8 Mk (#31036)
|
2025-12-21 12:37:42 -05:00 |
|
Roberto L. Castro
|
4fa7ce46f3
|
[Feature] Add SM103 (Blackwell Ultra) Support to vLLM (#30484)
Signed-off-by: LopezCastroRoberto <robertol.c510@gmail.com>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2025-12-12 19:34:23 -08:00 |
|
Cyrus Leung
|
5a87d8b9b1
|
[Deprecation] Remove deprecated plugin and compilation fields for v0.13 release (#30396)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-10 19:59:35 -08:00 |
|
Kyle Sayers
|
fccd532587
|
[Quantization] FP8 Weight Reloading for Quantized RL Rollout (#28480)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2025-12-09 13:54:32 -08:00 |
|
HDCharles
|
df01eda4dc
|
[Bugfix] Make compressed-tensors MoEs respect ignored layers (#28878)
Signed-off-by: HDCharles <charlesdavidhernandez@gmail.com>
|
2025-11-26 21:35:13 -05:00 |
|
liangel-02
|
1d642872a2
|
[torchao] fix safetensors for sharding (#28169)
Signed-off-by: Angel Li <liangel@meta.com>
|
2025-11-19 16:39:45 -08:00 |
|
Li, Jiang
|
20852c8f4c
|
[CPU] Refactor CPU WNA16 (#28826)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-11-19 10:32:00 +08:00 |
|
Alex
|
f6aa122698
|
[CI Sprint] Quantization CI Cleanup (#24130)
Signed-off-by: Alex Yun <alexyun04@gmail.com>
|
2025-11-18 09:21:48 -05:00 |
|
Hank_
|
4d5943bda6
|
[quantization][config] enable override existing quant_config (#28510)
Signed-off-by: Hank <hcc.mayday@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-11-14 01:24:10 +00:00 |
|
Andreas Karatzas
|
9f0247cfa4
|
VLLM_USE_TRITON_FLASH_ATTN V0 variable deprecation (#27611)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Andreas Karatzas <Andreas.Karatzas@amd.com>
|
2025-11-11 18:34:36 -08:00 |
|
xuebwang-amd
|
05576df85c
|
[ROCm][Quantization] extend AMD Quark to support mixed-precision quantized model (#24239)
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Co-authored-by: fxmarty-amd <felmarty@amd.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-11-11 12:05:22 -05:00 |
|
Vadim Gimpelson
|
f948ab6945
|
[CI Failure] nm-testing/Qwen2-0.5B-Instruct-FP8-SkipQKV was removed from HF. Skip it in tests (#28170)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2025-11-06 01:22:13 +00:00 |
|
Jerry Zhang
|
03c4c4aa9d
|
Support using Int4PreshuffledTensor after loading (#26066)
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
|
2025-11-04 06:00:57 -05:00 |
|
Varun Sundar Rabindranath
|
4022a9d279
|
[BugFix][Performance] Restore flashinfer autotuning for all scenarios (#27904)
|
2025-11-04 15:56:21 +08:00 |
|
Huamin Li
|
1994de99ea
|
[CI Failure] Fix test_kv_cache_model_load_and_run (#27717)
Signed-off-by: Huamin Li <3ericli@gmail.com>
|
2025-10-30 12:27:53 +00:00 |
|
Xiangyu Li
|
5cc6bddb6e
|
[Kernel] Add GPTQv2 format support for low-bit or asymmetric quantization, by adapting gptq_gemm (#26092)
|
2025-10-23 23:26:13 -04:00 |
|
wangxiyuan
|
f6027b2855
|
[1/N][Platform] Cleanup useless function (#26982)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2025-10-22 09:04:57 +00:00 |
|
Varun Sundar Rabindranath
|
5ff5d94e77
|
[Bugfix] Fix gpt-oss w4a8 DP/EP on B200 (#26729)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-10-21 01:51:14 -04:00 |
|
Michael Goin
|
01c977e96d
|
[CI] Prune Quantization Tests and skip compilation (#27038)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-10-16 17:26:35 -04:00 |
|
Michael Goin
|
f8a0acbdbe
|
[CI] Enable Blackwell Llama4 MoE tests (#26731)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-10-15 21:02:57 -06:00 |
|
Michael Goin
|
7e0ef4084a
|
[CI Failure] Fix torchao dep failure for Quantization Test (#26824)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-10-14 16:41:43 -07:00 |
|