nvfp4-megamoe-kernel/Dockerfile at 48fa64dfda170bb0b60dbeb796acdda62c2089e9


biondizzle c043a11bcc Register CuTeDSL as proper NvFp4LinearKernel for NVFP4 linear layers

- Create CuTeDSLNvFp4LinearKernel extending NvFp4LinearKernel base class
- Register it via init_nvfp4_linear_kernel() selection mechanism
  (inserted at top of _POSSIBLE_NVFP4_KERNELS, before FlashInfer)
- process_weights_after_loading: uint8→FP4, permute, create CuTeDSL runner
- apply_weights: route through CuTeDSL GEMM
- Update Dockerfile: copy kernel + registration script
- Fix attention: always use forward() for quantized compressor/indexer
  layers (the dtype check was fragile once the kernel swaps the weights
  for dummy BF16 tensors)

2026-05-19 00:44:44 +00:00

3.5 KiB
