Tiled Matmul Benchmark
Runs jit(matmul) live on your GPU across tile configurations and
matrix sizes. All numbers are in GFLOP/s (higher is better).
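
For readers who want to check a reported figure by hand, here is a minimal sketch of how a GFLOP/s number can be derived from a measured run time. It assumes square n×n operands and the usual 2·n³ FLOP convention for matmul; the page does not state which convention the benchmark uses. The helper names `matmulFlops`, `median` and `gflops` are hypothetical and are not part of the benchmarked library.

```typescript
// An (m×k)·(k×n) matmul does k multiplies and k adds for each of the m*n outputs.
// Assumption: the benchmark counts FLOPs this way (the common 2*m*k*n convention).
function matmulFlops(m: number, k: number, n: number): number {
  return 2 * m * k * n;
}

// Median of timing samples; less sensitive to a slow warm-up run than the mean.
function median(samples: number[]): number {
  if (samples.length === 0) throw new RangeError("need at least one sample");
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Throughput in GFLOP/s for a square n×n matmul timed in milliseconds.
function gflops(n: number, elapsedMs: number): number {
  if (!(elapsedMs > 0)) throw new RangeError("elapsedMs must be positive");
  // FLOP per millisecond divided by 1e6 gives GFLOP per second.
  return matmulFlops(n, n, n) / (elapsedMs * 1e6);
}
```

Under this convention, a 1024×1024 matmul that takes 1 ms corresponds to roughly 2147 GFLOP/s. Taking the median of several timed runs before converting is one way to keep a slow first run from skewing the result.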
"tt" = threadTile (register tiling per thread). "Bk" = contraction block
size. "auto" = chooseTileConfig picks the best candidate for the
device.
| Config | 256 f32 | 512 f32 | 1024 f32 | 2048 f32 |
|---|---|---|---|---|
| eager (ref) | — | — | — | — |
| tiled 16×16 | — | — | — | — |
| tiled 32×32 tt22 | — | — | — | — |
| tiled 32×32 Bk32 tt22 | — | — | — | — |
| tiled 32×32 tt44 | — | — | — | — |
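
The legend says "auto" lets chooseTileConfig pick the best candidate for the device, but the page does not show how that choice is made. One simple possibility, shown here purely as an illustration, is to measure each candidate and keep the fastest. The `TileConfig` shape, its field names, and `pickFastest` are all hypothetical, and reading "tt22" as a 2×2 thread tile is an assumption.

```typescript
// Hypothetical description of one row of the table; field names are assumptions.
interface TileConfig {
  label: string;
  tile: [number, number];        // output tile per workgroup, e.g. [32, 32]
  bk?: number;                   // contraction block size ("Bk"), if set
  threadTile?: [number, number]; // register tile per thread ("tt"), e.g. [2, 2]
}

// Returns the candidate with the highest measured throughput.
// `measure` is supplied by the caller and returns GFLOP/s for one config.
// This is NOT the library's chooseTileConfig, only a sketch of the idea.
function pickFastest(
  candidates: TileConfig[],
  measure: (config: TileConfig) => number,
): TileConfig {
  if (candidates.length === 0) throw new Error("no candidates to choose from");
  let best = candidates[0];
  let bestScore = measure(best);
  for (const candidate of candidates.slice(1)) {
    const score = measure(candidate);
    if (score > bestScore) {
      best = candidate;
      bestScore = score;
    }
  }
  return best;
}
```

On a tie the earlier candidate wins, so listing the simplest configuration first makes it the default when two configs perform the same. A real picker might instead rely on device limits or cached results rather than timing every candidate.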