biondizzle/vllm - vllm

Author	SHA1	Message	Date
Andrey Talman	68f783a727	[Torch 2.11] Guard torch._C._cpu attribute checks for forward compatibility (#35673) Signed-off-by: atalman <atalman@fb.com>	2026-03-17 18:47:59 +00:00
Wei Zhao	a3a51d20e7	[Benchmark] Improvements to attention benchmark script (#37115) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>	2026-03-16 22:22:40 +00:00
Kunshang Ji	747b068136	[Hardware] Replace memory related torch.cuda APIs (#37031) Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>	2026-03-16 10:24:48 +00:00
Matthew Bonanni	f444c05c32	[Attention] Use FA4 for MLA prefill (#34732) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-03-12 12:10:17 -04:00
Kunshang Ji	53ec16a705	[Hardware] Replace torch.cuda.device_count/current_device/set_device API (#36145) Signed-off-by: Kunshang Ji <jikunshang95@gmail.com> Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-12 07:57:47 -07:00
Yan Ma	894843eb25	replace `with torch.cuda.device` with `with torch.accelerator.device_index` (#36144) Signed-off-by: Yan Ma <yan.ma@intel.com>	2026-03-11 23:12:57 -07:00
Roberto L. Castro	580864d81e	[Attention][Perf][Kernel] Replace torch.cat with vectorized CUDA kernel MLA query concat - DeepSeek-V3.2 (#34917) Signed-off-by: LopezCastroRoberto <rocastro@redhat.com> Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>	2026-03-09 09:50:36 -07:00
Roberto L. Castro	2b28b9b269	[Attention][Perf] Optimize cp_gather_and_upconvert_fp8_kv_cache - DeepSeek-v3.2 (#35290) Signed-off-by: LopezCastroRoberto <rocastro@redhat.com> Co-authored-by: Claude <noreply@anthropic.com>	2026-03-09 09:46:57 -07:00
Harry Mellor	a0f44bb616	Allow `markdownlint` to run locally (#36398) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-08 20:05:24 -07:00
lif	00b814ba5a	[V0 Deprecation] Remove unused swap_space parameter (#36216) Signed-off-by: majiayu000 <1835304752@qq.com> Co-authored-by: mcelrath	2026-03-07 22:09:55 +08:00
Jiayi Yan	6a895197fa	[Bugfix][CI] fix typos (#34934) Signed-off-by: 1195343015 <1195343015@qq.com> Signed-off-by: Jiayi Yan <66017932+1195343015@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-05 17:05:46 +00:00
Kunshang Ji	66a2209645	[Hardware] Replace `torch.cuda.synchronize()` api with `torch.accelerator.synchronize` (#36085) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-05 10:36:39 +00:00
Kunshang Ji	16d2ad1d38	[Hardware] Replace `torch.cuda.empty_cache` with `torch.accelerator.empty_cache` (#30681) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> Signed-off-by: Kunshang Ji <jikunshang95@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-04 09:49:47 +00:00
Robert Shaw	97995f6376	[MoE Refactor] Create MK for TRTLLM Kernels (#32564) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> Signed-off-by: Robert Shaw <robertgshaw2@gmail.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>	2026-03-03 10:39:50 -08:00
Cyrus Leung	792a74b973	[Doc] Improve UX of `--enable-log-requests` (#35723) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-02 08:24:09 -08:00
Wentao Ye	05970c772c	[Refactor] Remove dead code for attention benchmark script (#35418) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-02-26 09:53:46 -08:00
Wentao Ye	05972ea7e5	[Refactor] Remove dead or duplicate func utils or variables (#35318) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-02-26 10:57:56 -05:00
Hanjie Qiu	71dfce6aa6	[Kernel] Refactor FlashInfer allreduce for mnnvl backend (#34109) Signed-off-by: hjjq <50634613+hjjq@users.noreply.github.com> Signed-off-by: wzhao18 <wzhao18.sz@gmail.com> Co-authored-by: wzhao18 <wzhao18.sz@gmail.com> Co-authored-by: Wei Zhao <51183510+wzhao18@users.noreply.github.com>	2026-02-26 03:17:20 +00:00
Michael Goin	22a97e6613	[Perf] Improve default triton fused moe configs (#34846) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-02-23 16:01:28 -08:00
Jee Jee Li	7291d1b288	[Bugfix] Fix kernel benchmark (#33752) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2026-02-22 21:18:08 -08:00
Mayank Ketkar	648951a9c3	[Bugfix] Fix benchmark_fused_collective crash on CustomOp init (#34665) Signed-off-by: Mayank Ketkar <mketkar@zoox.com> Signed-off-by: Mayank Ketkar <mayket04@gmail.com> Co-authored-by: Mayank Ketkar <mketkar@zoox.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-19 19:01:00 -05:00
Jongseok Park	c656ba3b4d	[Kernel] Triton-based Top-k and Top-p sampler kernels (#33538) Signed-off-by: js_park <cakeng@naver.com> Signed-off-by: Jongseok Park <37990712+cakeng@users.noreply.github.com> Signed-off-by: Sunga Kim <sunga.kim@berkeley.edu> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Sunga Kim <sunga.kim@berkeley.edu> Co-authored-by: Nick Hill <nickhill123@gmail.com>	2026-02-17 23:14:30 +00:00
junuxyz	c61a98f529	[CI][BugFix] ShellCheck cleanup to remove baseline and preserve runtime behavior (#34514) Signed-off-by: junuxyz <216036880+junuxyz@users.noreply.github.com>	2026-02-17 12:22:56 +00:00
Matthias Gehre	934acddef9	[Perf] fused_moe: add int4_w4a16 benchmark support and tuning config (#34130) Signed-off-by: Matthias Gehre <matthias.gehre@amd.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>	2026-02-13 00:14:27 -08:00
Matthew Bonanni	f2c47886fd	[Attention] Add FlashInfer Sparse MLA backend (#33451) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>	2026-02-12 17:21:54 +00:00
Michael Goin	ff1f83b056	[Refactor] Replace `activation: str` with `MoEActivation` enum (#33843) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com>	2026-02-11 17:29:32 -08:00
Matthias Gehre	7a048ee65f	[Bugfix] Fix benchmark_moe.py inplace assertion with torch >= 2.9 (#34149) Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>	2026-02-11 03:58:56 +00:00
Mohammad Miadh Angkad	d4f123cc48	[Kernel] FlashInfer: switch allreduce fusion to unified API (#33985) Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com>	2026-02-09 15:43:24 +00:00
Lucas Wilkinson	d0d97e2974	[Misc] Fix up attention benchmarks (#33810) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>	2026-02-09 09:42:03 -05:00
Jee Jee Li	978a37c823	[Model] GLM adaptation (#34124)	2026-02-09 17:32:52 +08:00
Wentao Ye	77c09e1130	[Refactor] Remove align block size logic in `moe_permute` (#33449) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-02-06 10:57:06 -08:00
Eldar Kurtić	5c52644b10	[Docs] Update link to Benchmark CLI documentation (#33254) Signed-off-by: Eldar Kurtić <8884008+eldarkurtic@users.noreply.github.com>	2026-02-06 16:00:59 +00:00
Runkai Tao	7320ca3942	Add unpermute-aware fused MoE LoRA path (#32655) Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu>	2026-02-02 09:46:09 +08:00
Roy Wang	68feb76a6f	[Misc] Replace deprecated interface seed_everything (#33474) Signed-off-by: esmeetu <jasonailu87@gmail.com>	2026-01-31 05:38:39 -08:00
Dimitrios Bariamis	f0bca83ee4	Add support for Mistral Large 3 inference with Flashinfer MoE (#33174) Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com> Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-01-30 22:48:27 -08:00
Robert Shaw	af9b69f977	[Quantization][Deprecation] Remove Marlin 24 (#32688) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-01-28 15:54:59 +00:00
Robert Shaw	247d1a32ea	[Quantization][Deprecation] Remove BitBlas (#32683) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-01-28 11:06:22 +00:00
Matthew Bonanni	e82fa448c4	Add attention benchmarking tools (#26835) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Claude <noreply@anthropic.com>	2026-01-28 00:09:20 +00:00
Lifan Shen	da8d0c441a	[AMD][QWEN3-NEXT] FP8 Tunings (#32042) Signed-off-by: Lifan Shen <lifans@meta.com>	2026-01-27 09:34:13 +00:00
Robert Shaw	5a93b9162b	[MoE Refactor] Integrate Naive Prepare Finalize into MK (#32567) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: amirkl94 <203507526+amirkl94@users.noreply.github.com>	2026-01-27 01:28:02 +00:00
Wentao Ye	8f987883cb	[Refactor] Remove unused `_moe_permute` function (#33108) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-26 16:06:45 -05:00
Roberto L. Castro	fcb9df99bd	[Perf][Kernel] Optimize FP4 quantization kernels (SM100F) (#32520) Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>	2026-01-24 18:45:27 -07:00
Michael Goin	4561f13985	[Refactor] Rename `gptq_marlin` to `marlin` to match MoE (#32952) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-01-23 16:48:12 -05:00
Wentao Ye	dfab5f3764	[Bug] Fix benchmark script `moe_permute_unpermute` (#32949) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-23 16:18:56 -05:00
Xin Yang	d08b356ee0	[Perf] Create TMA-aligned input scale tensor for DeepGemm on Hopper (#32619) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-01-22 15:47:04 -05:00
Xin Yang	63227accf5	[Kernel] Add topk_sigmoid kernel (#31246) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-01-21 22:49:51 +00:00
danisereb	f999539869	Add missing import of fused_topk to benchmark_moe (#32784) Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>	2026-01-21 18:30:10 +00:00
whx	1861ae8aae	[PluggableLayer][1/N] Define PluggableLayer (Fix ci) (#32744) Signed-off-by: whx-sjtu <2952154980@qq.com>	2026-01-21 11:38:04 -05:00
Robert Shaw	42135d6898	[MoE Refactor] Oracle Select FP8+NVFP4 Kernels In Priority (#32414)	2026-01-21 08:22:33 -05:00
Yuxuan Zhang	71832ba71e	[GLM-4.7] GLM Model support for GLM-Lite (#31386) Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com> Signed-off-by: Yuxuan Zhang <2448370773@qq.com>	2026-01-19 01:18:38 -08:00

589 Commits