
Tiled Matmul Benchmark

Runs jit(matmul) live on your GPU across several tile configurations and matrix sizes. All numbers are in GFLOP/s (higher is better).
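The page does not say how timings are converted to GFLOP/s. The sketch below assumes the conventional count of 2·N³ floating-point operations for an N×N by N×N product (N² outputs, each with N multiplies and N adds). The helper names `matmul_flops` and `gflops` are hypothetical and not part of the benchmark.

```python
def matmul_flops(n: int) -> int:
    """FLOPs for an n x n by n x n matmul, using the conventional 2 * n^3 count."""
    return 2 * n ** 3


def gflops(n: int, seconds: float) -> float:
    """Throughput in GFLOP/s given the wall-clock time of one n x n matmul."""
    return matmul_flops(n) / seconds / 1e9


# Work per matmul for each matrix size in the table.
for n in (256, 512, 1024, 2048):
    print(f"N={n:5d}: {matmul_flops(n) / 1e9:.3f} GFLOP per matmul")
```

For example, a 1024 f32 matmul that takes 10 ms corresponds to roughly 215 GFLOP/s. On a GPU the timing is only meaningful once the asynchronous work has actually finished.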

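"tt" = threadTile (register tiling per thread: each thread computes a small block of outputs held in registers, so tt22 is a 2×2 block and tt44 a 4×4 block). "Bk" = contraction block size (how many elements of the shared K dimension are staged per step). "auto" = chooseTileConfig picks the best candidate for the device.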
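To make the three knobs concrete, here is a minimal CPU model of a tiled matmul. It is a sketch of the general technique, not the benchmark's GPU kernel: the parameter names `tile`, `bk` and `tt` and the function `tiled_matmul` are hypothetical, each "workgroup" computes one tile×tile block of C, the contraction dimension is walked in Bk-sized slices (standing in for shared-memory loads), and each simulated "thread" accumulates a tt×tt block in its own "registers".

```python
import random
from typing import List

Matrix = List[List[float]]


def tiled_matmul(a: Matrix, b: Matrix, tile: int = 32, bk: int = 32, tt: int = 2) -> Matrix:
    """CPU model of a tiled matmul C = A @ B (hypothetical sketch)."""
    m, kdim, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile):              # one "workgroup" per output tile
        for j0 in range(0, n, tile):
            i1, j1 = min(i0 + tile, m), min(j0 + tile, n)
            # Per-thread register tiles, keyed by the thread's top-left output coordinate.
            acc = {(ti, tj): [[0.0] * tt for _ in range(tt)]
                   for ti in range(i0, i1, tt) for tj in range(j0, j1, tt)}
            for k0 in range(0, kdim, bk):     # walk K in Bk-sized contraction blocks
                k1 = min(k0 + bk, kdim)
                # Stage the A and B slices (stands in for shared-memory loads).
                a_blk = [row[k0:k1] for row in a[i0:i1]]
                b_blk = [row[j0:j1] for row in b[k0:k1]]
                for (ti, tj), regs in acc.items():   # every thread updates its register tile
                    for r in range(min(tt, i1 - ti)):
                        a_row = a_blk[ti - i0 + r]
                        for s in range(min(tt, j1 - tj)):
                            col = tj - j0 + s
                            regs[r][s] += sum(a_row[k] * b_blk[k][col] for k in range(k1 - k0))
            for (ti, tj), regs in acc.items():       # write registers back to C
                for r in range(min(tt, i1 - ti)):
                    for s in range(min(tt, j1 - tj)):
                        c[ti + r][tj + s] = regs[r][s]
    return c


def naive_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Untiled reference, playing the role of the eager baseline."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


random.seed(0)
n = 40  # deliberately not a multiple of 32, so partial edge tiles are exercised
A = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
B = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
ref = naive_matmul(A, B)
C = tiled_matmul(A, B, tile=32, bk=32, tt=2)
max_err = max(abs(C[i][j] - ref[i][j]) for i in range(n) for j in range(n))
print(f"max abs error vs naive: {max_err:.2e}")
```

Tiling changes only the order in which partial sums are accumulated, so results match the untiled reference up to floating-point rounding. On a GPU the payoff is data reuse: each staged slice and each value loaded into a register feeds several multiply-adds.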

| Config | 256 f32 | 512 f32 | 1024 f32 | 2048 f32 |
|---|---|---|---|---|
| eager (ref) | | | | |
| tiled 16×16 | | | | |
| tiled 32×32 tt22 | | | | |
| tiled 32×32 Bk32 tt22 | | | | |
| tiled 32×32 tt44 | | | | |
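The page does not describe how a config maps onto GPU threads. The arithmetic below assumes the usual convention: one workgroup per tile×tile output block, one thread per tt×tt register tile, and a 1×1 thread tile for the plain 16×16 row. The helpers `launch_shape` and `staged_bytes` are hypothetical.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b."""
    return (a + b - 1) // b


def launch_shape(n: int, tile: int, tt: int) -> tuple:
    """(workgroups, threads per workgroup, outputs per thread) for an n x n output,
    assuming one workgroup per tile x tile block and one thread per tt x tt sub-block."""
    workgroups = ceil_div(n, tile) ** 2
    threads_per_workgroup = ceil_div(tile, tt) ** 2
    return workgroups, threads_per_workgroup, tt * tt


def staged_bytes(tile: int, bk: int, bytes_per_elem: int = 4) -> int:
    """Bytes staged per contraction step: a tile x bk slice of A plus a bk x tile
    slice of B (f32 = 4 bytes per element)."""
    return 2 * tile * bk * bytes_per_elem


configs = {
    "tiled 16×16": (16, 1),  # thread tile assumed to be 1x1, since none is listed
    "tiled 32×32 tt22": (32, 2),
    "tiled 32×32 tt44": (32, 4),
}
for name, (tile, tt) in configs.items():
    wg, threads, outs = launch_shape(2048, tile, tt)
    print(f"{name}: {wg} workgroups x {threads} threads, {outs} outputs/thread at N=2048")
print(f"32×32 with Bk32 stages {staged_bytes(32, 32)} bytes of f32 per step")
```

Under these assumptions, tt44 uses a quarter as many threads per workgroup as tt22 (64 versus 256), each computing four times as many outputs, which trades parallelism for more register reuse.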