biondizzle/vllm

Author	SHA1	Message	Date
Carl Y	3bc2734dd0	[Kernel] Fuse FP8 output quantization into merge_attn_states (#36518) Signed-off-by: Carl You <4531192+carlyou@users.noreply.github.com>	2026-04-03 01:47:04 +00:00
Olya Kozlova	598190aac3	[fix] Remove trtllm ragged mla prefills (#36540) Signed-off-by: Olya Kozlova <okozlova@nvidia.com>	2026-03-31 12:30:27 -07:00
Pleaplusone	d9d342d214	[Performance][MLA][ROCm] Remove redundant D2D copy in deepseek (#27457) Signed-off-by: ganyi <ygan@amd.com>	2025-11-26 12:45:28 +08:00
courage17340	981cadb35c	[Bugfix][Kernel] fix merge attn states when both prefix and suffix are empty (#28181) Signed-off-by: courage17340 <courage17340@163.com>	2025-11-06 17:52:13 +08:00
Lucas Wilkinson	ce75efeecb	[BugFix] FA2 MLA Accuracy Issue (#18807) Signed-off-by: LucasWilkinson <lwilkinson@neuralmagic.com>	2025-05-28 08:59:39 +00:00
DefTruth	e82ee40de3	[Bugfix][Kernel] fix potential cuda graph broken for merge_attn_states kernel (#16693) Signed-off-by: DefTruth <qiustudent_r@163.com>	2025-04-16 03:31:39 -07:00
DefTruth	e9528f6dc6	[Kernel] support merge_attn_states CUDA kernel, 3x speedup (#16173) Signed-off-by: DefTruth <qiustudent_r@163.com>	2025-04-11 06:50:50 -06:00