Hello,
I have the following test code, which compares the runtime of MA57 when solving a linear system with a single thread and (assuming correct behavior) with multiple threads.
```julia
using HSL, MatrixMarket, SuiteSparseMatrixCollection, OpenBLAS32 #, MKL
using LinearAlgebra, Printf, BenchmarkTools

ssmc = ssmc_db(verbose=false)
matrix = ssmc_matrices(ssmc, "Boeing", "pwtk")
path = fetch_ssmc(matrix, format="MM")
n = matrix.nrows[1]
A = MatrixMarket.mmread(joinpath(path[1], "$(matrix.name[1]).mtx"))
b = ones(n)
b_norm = norm(b)

# should be single-threaded:
HSL.omp_set_num_threads(1)
#BLAS.set_num_threads(1)
LDL = @btime Ma57($A) # 594 ms (57 alloc, 345.11 MiB)
@btime ma57_factorize!($LDL) # 1.2 s (0 alloc, 0 bytes)
@btime ma57_solve($LDL, $b) # 36 ms (6 alloc, 3.3 MiB)

# should use four threads:
ENV["OPENBLAS_NUM_THREADS"]=4
HSL.omp_set_num_threads(4)
LDL = @btime Ma57($A) # 604 ms (57 alloc, 345 MiB)
@btime ma57_factorize!($LDL) # 1.2 s (0 alloc, 0 bytes)
@btime ma57_solve($LDL, $b) # 36 ms (6 alloc, 3.3 MiB)
```

But both the included timings and a manual check of htop during execution seem to indicate that libHSL is not actually making use of multiple cores. I can reproduce this (I think incorrect) behavior on both an Intel i7-1260P mobile processor and an AMD EPYC 9554P server processor.
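As a control experiment, it may help to confirm that BLAS threading works at all in the same session, independently of HSL. The following is only a sketch: `time_gemm` is a hypothetical helper (not part of HSL.jl or any package), and the matrix size and repetition count are arbitrary choices. It uses only `BLAS.set_num_threads` and `@elapsed` from Julia's standard library.

```julia
using LinearAlgebra

# Hypothetical helper: best-of-`reps` wall time of a dense matrix product
# after asking the active BLAS backend to use `nthreads` threads.
function time_gemm(nthreads::Integer; n::Integer = 2000, reps::Integer = 3)
    BLAS.set_num_threads(nthreads)
    X = rand(n, n)
    Y = rand(n, n)
    X * Y  # warm-up call so that compilation is not included in the timing
    return minimum([@elapsed(X * Y) for _ in 1:reps])
end

t1 = time_gemm(1)
t4 = time_gemm(4)
println("1 thread:  ", round(t1; digits = 3), " s")
println("4 threads: ", round(t4; digits = 3), " s")
println("speedup:   ", round(t1 / t4; digits = 2), "x")
```

If the dense product scales with the thread count while the MA57 timings do not, that would suggest the BLAS that Julia itself uses is threaded correctly and the question is which BLAS libHSL ends up calling. That interpretation is an assumption, not something this sketch proves.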
If I switch to the MKL backend by uncommenting `MKL` in the `using` line above, I do see the right number of threads engage. However, the runtime is actually a bit slower than the single-threaded runtime, which makes me wonder whether everything is working correctly even in that case. That said, I'm perfectly willing to believe that the parallel speedup is quite dependent on the sparsity pattern.
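To see which backend is actually active after loading `OpenBLAS32` or `MKL`, one option is to inspect the libblastrampoline configuration that Julia exposes through `LinearAlgebra.BLAS`. This is a minimal diagnostic sketch; it only reports what Julia's own BLAS calls are forwarded to, and whether libHSL uses the same library and thread setting is an assumption that this does not verify.

```julia
using LinearAlgebra

# List the BLAS/LAPACK libraries currently loaded through libblastrampoline.
config = BLAS.get_config()
for lib in config.loaded_libs
    println(basename(lib.libname), " (interface: ", lib.interface, ")")
end

# Report the thread count of the active backend before and after a request.
println("BLAS threads before: ", BLAS.get_num_threads())
BLAS.set_num_threads(4)
println("BLAS threads after:  ", BLAS.get_num_threads())
```

Running this once with and once without `using MKL` should show whether the list of loaded libraries and the reported thread count change as expected.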
Thanks very much for your time and thoughts! CC @amontoison, who I have chatted with about this a bit off-thread already.