nvfp4-megamoe-kernel/cutedsl at 02b57071bee5212e511f068e891cbdb5524c7cfd

biondizzle/nvfp4-megamoe-kernel

biondizzle 6ceb05327f Add blackwell_attention module and comprehensive test

2026-05-19 15:30:29 +00:00

..

refactor: copy CuTeDSL kernel into repo with local imports

2026-05-16 02:57:54 +00:00

__init__.py

refactor: copy CuTeDSL kernel into repo with local imports

2026-05-16 02:57:54 +00:00

blackwell_attention.py

Add blackwell_attention module and comprehensive test

2026-05-19 15:30:29 +00:00

bridge.py

Fix torch.compile crash: remove threading.Lock from LUT cache path

2026-05-18 20:54:55 +00:00

csa_attention.py

Fix device reference in full_attention_reference

2026-05-19 08:01:31 +00:00

custom_ops.py

Replace autograd.Function with torch.library.custom_op for Dynamo compat

2026-05-19 01:54:48 +00:00

inverse_rope.py

Replace DeepGEMM fp8_einsum with CuTeDSL NVFP4 for wo_a (o_proj)

2026-05-19 02:36:30 +00:00

moe_pipeline.py

Add pipeline test with real model weights, add swiglu_limit to reference moe_pipeline

2026-05-17 18:07:44 +00:00

nvfp4_linear.py

Replace autograd.Function with torch.library.custom_op for Dynamo compat

2026-05-19 01:54:48 +00:00

runner.py

Fix gs None error in legacy _ensure_stacked path

2026-05-19 02:17:53 +00:00

shared_expert_pipeline.py

Fix garbled shared_expert_pipeline.py: imports/class were merged

2026-05-19 07:18:10 +00:00

wo_a_grouped_linear.py

Fix wo_a: scatter each group's data at correct offset in padded buffer

2026-05-19 02:45:57 +00:00