Timing & Benchmarking

Measure execution time and benchmark matrix operations

Equana includes built-in timing functions so you can measure how long your code takes to run. This is especially useful when working with numerical code — you can verify that operations run at the speed you expect and find bottlenecks.

tic / toc

The simplest way to time a block of code is with tic() and toc(). Calling tic() starts a timer; calling toc() returns the elapsed time in seconds:

Code [1]
tic()

# Do some work
A = zeros(100, 100)
B = ones(100, 100)
C = A + B

elapsed = toc()
println("Elapsed: " + string(elapsed) + " s")

Named Timers

tic() returns a handle — a timestamp you can pass to toc(handle) later. This lets you run multiple independent timers at once:

Code [2]
# Start two timers
t_total = tic()

t_create = tic()
A = ones(200, 200)
B = ones(200, 200)
println("Matrix creation: " + string(toc(t_create)) + " s")

t_mult = tic()
C = A * B
println("Multiplication:  " + string(toc(t_mult)) + " s")

println("Total:           " + string(toc(t_total)) + " s")

Benchmarking with timeit

For more reliable measurements, use timeit(f). It runs a function many times and returns statistics as a tuple (mean, min, max, n):

Code [3]
function work() do
  A = ones(50, 50)
  B = ones(50, 50)
  C = A * B
end

result = timeit(work)
println("Mean: " + string(result[1]) + " s")
println("Min:  " + string(result[2]) + " s")
println("Max:  " + string(result[3]) + " s")
println("Runs: " + string(result[4]))

You can also specify the number of iterations:

Code [4]
result = timeit(work, 10)
println("Mean over " + string(result[4]) + " runs: " + string(result[1]) + " s")

Benchmarking Dense Matrix Multiply

Matrix multiplication (A * B) is the core operation behind most scientific and machine learning workloads. Under the hood, Equana uses a WASM SIMD kernel with cache-blocking and register tiling — the same techniques used by high-performance libraries like OpenBLAS.
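The kernel itself is internal to Equana, but the cache-blocking idea is easy to sketch. The C code below is an illustration only, not Equana's actual kernel source (the function names and block size are invented here, and it omits the SIMD part): it computes C one tile at a time, so each tile of A and B is loaded into cache once and fully reused before moving on.

```c
#include <assert.h>

/* Naive reference: C = A * B for row-major n x n matrices. */
void matmul_naive(int n, const double *A, const double *B, double *C) {
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double s = 0.0;
            for (int k = 0; k < n; k++) {
                s += A[i * n + k] * B[k * n + j];
            }
            C[i * n + j] = s;
        }
    }
}

/* Cache-blocked multiply: walk the matrices in bs x bs tiles so each
   tile of A and B stays resident in cache while it is reused, instead
   of being evicted between iterations of the inner loops. */
void matmul_blocked(int n, int bs,
                    const double *A, const double *B, double *C) {
    for (int i = 0; i < n * n; i++) C[i] = 0.0;
    for (int ii = 0; ii < n; ii += bs) {
        for (int kk = 0; kk < n; kk += bs) {
            for (int jj = 0; jj < n; jj += bs) {
                /* Clamp tile edges for sizes not divisible by bs. */
                int imax = (ii + bs < n) ? ii + bs : n;
                int kmax = (kk + bs < n) ? kk + bs : n;
                int jmax = (jj + bs < n) ? jj + bs : n;
                for (int i = ii; i < imax; i++) {
                    for (int k = kk; k < kmax; k++) {
                        double a = A[i * n + k];  /* held in a register */
                        for (int j = jj; j < jmax; j++) {
                            C[i * n + j] += a * B[k * n + j];
                        }
                    }
                }
            }
        }
    }
}
```

Register tiling and SIMD take this one step further: the innermost loop is unrolled so that several accumulators live in CPU registers at once, and each multiply-add operates on a whole vector of values per instruction.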

Let's benchmark it at increasing matrix sizes to see how throughput scales:

Code [5]
# Benchmark matrix multiply at different sizes
sizes = [32, 64, 128, 256]

for n in sizes do
  function C = multiply() do
    A = ones(n, n)
    B = ones(n, n)
    C = A * B
  end

  result = timeit(multiply, 5)
  # FLOP count for matrix multiply: 2 * n^3
  flops = 2.0 * n * n * n
  gflops = flops / result[2] / 1e9
  println(string(n) + "x" + string(n) + ":  " + string(result[2]) + " s  (" + string(gflops) + " GFLOPS)")
end

Note: We use the minimum time (result[2]) for the GFLOPS calculation. The minimum is the best estimate of peak throughput because it excludes noise from garbage collection and other transient overhead: three runs of 10, 10, and 35 ms have a mean of 18.3 ms but a minimum of 10 ms, and only the minimum reflects what the kernel can actually sustain.

The numbers above run entirely in your browser using WebAssembly. For comparison, the Under the Hood page shows how Equana's WASM SIMD kernel achieves ~100% of native SSE performance — around 430 GFLOPS on a modern desktop CPU with 16 threads.

