biondizzle/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Kunshang Ji	53ec16a705	[Hardware] Replace torch.cuda.device_count/current_device/set_device API (#36145) Signed-off-by: Kunshang Ji <jikunshang95@gmail.com> Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-12 07:57:47 -07:00
Harry Mellor	f83b933b84	[CI] Bump `mypy` version to 1.19.1 (#36104) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-10 09:18:28 -07:00
Jiayi Yan	6a895197fa	[Bugfix][CI] fix typos (#34934) Signed-off-by: 1195343015 <1195343015@qq.com> Signed-off-by: Jiayi Yan <66017932+1195343015@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-05 17:05:46 +00:00
Kunshang Ji	66a2209645	[Hardware] Replace `torch.cuda.synchronize()` api with `torch.accelerator.synchronize` (#36085) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-05 10:36:39 +00:00
Robert Shaw	97995f6376	[MoE Refactor] Create MK for TRTLLM Kernels (#32564) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> Signed-off-by: Robert Shaw <robertgshaw2@gmail.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>	2026-03-03 10:39:50 -08:00
Hongxia Yang	f26650d649	[ROCm] add amd-quark package in requirements for rocm to use quantized models (#35658) Signed-off-by: Hongxia Yang <hongxiay.yang@amd.com> Co-authored-by: Hongxia Yang <hongxiay.yang@amd.com>	2026-03-02 06:02:43 +00:00
Jesse Cai	a60985b07e	Fix deprecated v1 config tests (#35327) Signed-off-by: Jesse Cai <jessecai@fb.com>	2026-03-01 20:32:03 -05:00
Michael Goin	de527e1cec	[UX] Add `--moe-backend` arg for explicit kernel selection (#33807) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-02-25 17:44:44 -08:00
Matthias Gehre	4e2c7caf2d	[Bugfix] Add regression test for MoE quant_config under torch.compile (#34335) Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>	2026-02-20 13:27:26 +08:00
Linda	275e0d2a99	[NVIDIA][test] Tests for flashinfer TRTLLM BF16 MoE (#33715) Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com> Co-authored-by: Pavani Majety <pmajety@nvidia.com>	2026-02-11 12:38:11 +00:00
Kunshang Ji	cb9574eb85	[XPU][9/N] clean up existing ipex code/doc (#34111) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-02-11 00:27:15 -08:00
Vasiliy Kuznetsov	0130223bd9	fix memory for online fp8 quantization with streaming weight load (#31914) Signed-off-by: vasiliy <vasiliy@fb.com>	2026-02-02 14:17:42 -05:00
Michael Goin	67ebaff528	Refactor NVFP4 Linear utils for ModelOpt and CT (#33201) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-01-30 16:37:42 -08:00
Kyle Sayers	f857a03f6b	[QeRL] Layerwise Reloading (#32133) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>	2026-01-30 08:50:05 -07:00
Robert Shaw	af9b69f977	[Quantization][Deprecation] Remove Marlin 24 (#32688) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-01-28 15:54:59 +00:00
Eldar Kurtić	44f08af3a7	Add llmcompressor fp8 kv-cache quant (per-tensor and per-attn_head) (#30141) Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com> Signed-off-by: eldarkurtic <8884008+eldarkurtic@users.noreply.github.com>	2026-01-22 13:29:57 -07:00
Robert Shaw	4e31b7f228	[Quantization][Deprecation] Remove RTN (#32697) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-01-21 16:34:42 +00:00
Vasiliy Kuznetsov	d2389c1262	fp8 online quant: split out Fp8OnlineLinearMethod (#32189)	2026-01-20 18:13:22 -05:00
vllmellm	148117ea2e	[Refactor] Make FP8 Linear Ops use kernel abstraction (#27814) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2026-01-20 14:48:20 +08:00
Yi Liu	50632adc58	Consolidate Intel Quantization Toolkit Integration in vLLM (#31716) Signed-off-by: yiliu30 <yi4.liu@intel.com>	2026-01-14 07:11:30 +00:00
Matt	bde57ab2ed	[Hardware][AMD][CI][Bugfix] Fix AMD Quantization test group (#31713) Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>	2026-01-10 23:19:46 -08:00
Robert Shaw	5825bbc1f7	[Quantization] Deprecate Long Tail of Schemes (#31688) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2026-01-08 19:07:45 -05:00
Lucas Wilkinson	6cdf015c3c	[Misc] Fix `Current vLLM config is not set.` warnings, assert to avoid issues in the future (#31747) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-01-08 15:20:49 -08:00
Zhiwei	573a1d1119	[ROCm]Skip test_torchao.py::test_pre_quantized_model on CDNA3 arch (#31905) Signed-off-by: ZhiweiYan-96 <zhiwei.yan@amd.com>	2026-01-08 15:47:44 +08:00
Robert Shaw	5dcd7ef1f2	[MoE Refactor][15/N] Apply Refactor to Fp8 (#31415)	2026-01-07 19:42:33 -05:00
Li, Jiang	8becf146bd	[Quantization][Refactor] Move CPU GPTQ kernel into MP linear (#31801) Signed-off-by: jiang1.li <jiang1.li@intel.com> Signed-off-by: Li, Jiang <bigpyj64@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-01-06 19:10:18 +00:00
Kevin McKay	42b42824ae	[Misc] Fix grammar errors in comments and messages (#31115) Signed-off-by: c0de128 <kevin.mckay@outlook.com>	2025-12-21 21:14:02 -08:00
Kevin McKay	ec58c10ce1	[Misc] Fix quantization-related typos (#31116) Signed-off-by: c0de128 <kevin.mckay@outlook.com>	2025-12-21 21:13:48 -08:00
CedricHuang	19cc9468fd	[Feature]: Support NVIDIA ModelOpt HF FP8 variants FP8_PER_CHANNEL_PER_TOKEN and FP8_PB_WO in vLLM (#30957)	2025-12-21 22:34:49 -05:00
Robert Shaw	b471092d3a	[MoE Refactor][4/N] Marlin Fp8 Mk (#31036)	2025-12-21 12:37:42 -05:00
Roberto L. Castro	4fa7ce46f3	[Feature] Add SM103 (Blackwell Ultra) Support to vLLM (#30484) Signed-off-by: LopezCastroRoberto <robertol.c510@gmail.com> Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-12-12 19:34:23 -08:00
Cyrus Leung	5a87d8b9b1	[Deprecation] Remove deprecated plugin and compilation fields for v0.13 release (#30396) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-10 19:59:35 -08:00
Kyle Sayers	fccd532587	[Quantization] FP8 Weight Reloading for Quantized RL Rollout (#28480) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>	2025-12-09 13:54:32 -08:00
HDCharles	df01eda4dc	[Bugfix] Make compressed-tensors MoEs respect ignored layers (#28878) Signed-off-by: HDCharles <charlesdavidhernandez@gmail.com>	2025-11-26 21:35:13 -05:00
liangel-02	1d642872a2	[torchao] fix safetensors for sharding (#28169) Signed-off-by: Angel Li <liangel@meta.com>	2025-11-19 16:39:45 -08:00
Li, Jiang	20852c8f4c	[CPU] Refactor CPU WNA16 (#28826) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-11-19 10:32:00 +08:00
Alex	f6aa122698	[CI Sprint] Quantization CI Cleanup (#24130) Signed-off-by: Alex Yun <alexyun04@gmail.com>	2025-11-18 09:21:48 -05:00
Hank_	4d5943bda6	[quantization][config] enable override existing quant_config (#28510) Signed-off-by: Hank <hcc.mayday@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-11-14 01:24:10 +00:00
Andreas Karatzas	9f0247cfa4	`VLLM_USE_TRITON_FLASH_ATTN` V0 variable deprecation (#27611) Signed-off-by: Andreas Karatzas <akaratza@amd.com> Signed-off-by: Andreas Karatzas <Andreas.Karatzas@amd.com>	2025-11-11 18:34:36 -08:00
xuebwang-amd	05576df85c	[ROCm][Quantization] extend AMD Quark to support mixed-precision quantized model (#24239) Signed-off-by: xuebwang-amd <xuebwang@amd.com> Co-authored-by: fxmarty-amd <felmarty@amd.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-11 12:05:22 -05:00
Vadim Gimpelson	f948ab6945	[CI Failure] `nm-testing/Qwen2-0.5B-Instruct-FP8-SkipQKV` was removed from HF. Skip it in tests (#28170) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2025-11-06 01:22:13 +00:00
Jerry Zhang	03c4c4aa9d	Support using Int4PreshuffledTensor after loading (#26066) Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>	2025-11-04 06:00:57 -05:00
Varun Sundar Rabindranath	4022a9d279	[BugFix][Performance] Restore flashinfer autotuning for all scenarios (#27904)	2025-11-04 15:56:21 +08:00
Huamin Li	1994de99ea	[CI Failure] Fix test_kv_cache_model_load_and_run (#27717) Signed-off-by: Huamin Li <3ericli@gmail.com>	2025-10-30 12:27:53 +00:00
Xiangyu Li	5cc6bddb6e	[Kernel] Add GPTQv2 format support for low-bit or asymmetric quantization, by adapting gptq_gemm (#26092)	2025-10-23 23:26:13 -04:00
wangxiyuan	f6027b2855	[1/N][Platform] Cleanup useless function (#26982) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-10-22 09:04:57 +00:00
Varun Sundar Rabindranath	5ff5d94e77	[Bugfix] Fix gpt-oss w4a8 DP/EP on B200 (#26729) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-10-21 01:51:14 -04:00
Michael Goin	01c977e96d	[CI] Prune Quantization Tests and skip compilation (#27038) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-10-16 17:26:35 -04:00
Michael Goin	f8a0acbdbe	[CI] Enable Blackwell Llama4 MoE tests (#26731) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-10-15 21:02:57 -06:00
Michael Goin	7e0ef4084a	[CI Failure] Fix torchao dep failure for Quantization Test (#26824) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-10-14 16:41:43 -07:00

200 Commits