Distributed linear algebra in Julia
In the previous session there was a question about parallel linear algebra with DistributedArrays.jl: is it possible to solve a linear system whose matrix and right-hand side are distributed arrays, without having to code your own solver?
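To make the setup concrete, here is a minimal sketch (assuming 4 local worker processes) of such a system built with DistributedArrays.jl; as far as I know, the package itself does not ship a dense solver for it:
using Distributed
addprocs(4)                          # 4 local worker processes
@everywhere using DistributedArrays
A = drandn(1000, 1000)               # 1000×1000 matrix distributed across the workers
b = drandn(1000)                     # distributed right-hand side
# DistributedArrays.jl handles storage and element-wise operations,
# but (to my knowledge) provides no built-in dense solver for A \ b
Let's start with serial solvers, which can scale well to surprisingly large linear systems.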
Serial solvers
LinearSolve.jl provides a unified interface to many linear solvers; here we use its single-process dense solvers, which are quite fast.
using LinearSolve
n = 100;
A = rand(n,n); # uniform numbers from [0, 1)
b = rand(n);
@time prob = LinearProblem(A, b) # define a linear system problem
@time sol = solve(prob) # solve this problem
A * sol.u - b # check the result: sol.u holds the solution vector
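By default solve picks a suitable factorization on its own; you can also request a specific method. A quick sketch with two of the factorizations LinearSolve.jl provides (LUFactorization and QRFactorization):
sol = solve(prob, LUFactorization())   # dense LU factorization
sol = solve(prob, QRFactorization())   # dense QR: slower, but more robust for ill-conditioned systems
A * sol.u - b                          # same residual check as above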
Let’s check performance on progressively larger systems:
k = 1000
for n in (10, 100, 1k, 10k, 20k, 30k)   # numeric literal coefficients: 1k == 1*k == 1000, etc.
    A = rand(n,n);
    b = rand(n);
    prob = LinearProblem(A, b);
    @time sol = solve(prob);
end
For these sizes I get:
0.000026 seconds (14 allocations: 2.375 KiB)
0.000115 seconds (17 allocations: 81.984 KiB)
0.010350 seconds (15 allocations: 7.654 MiB)
2.819498 seconds (18 allocations: 763.170 MiB, 0.07% gc time)
23.127729 seconds (18 allocations: 2.981 GiB, 0.55% gc time)
123.673966 seconds (18 allocations: 6.706 GiB, 0.28% gc time)
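One caveat when interpreting these single-process timings: the dense factorization calls into BLAS/LAPACK, which may itself run multithreaded. You can check, and if desired pin, the BLAS thread count with the standard LinearAlgebra functions:
using LinearAlgebra
BLAS.get_num_threads()       # number of BLAS threads currently in use
# BLAS.set_num_threads(1)    # uncomment to force single-threaded BLAS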
Distributed solvers
There are a few projects that use Distributed.jl to implement parallel dense and/or sparse linear solvers. The example below uses Elemental.jl, a Julia wrapper around the Elemental distributed linear algebra library. You have to use the Elemental.DistMatrix type, which becomes truly distributed only when you run it on top of MPI workers (here launched via MPIClusterManagers.jl):
using MPI, MPIClusterManagers, Distributed
man = MPIWorkerManager(4)                        # manager for 4 MPI worker processes
addprocs(man);                                   # launch the MPI workers and add them as Julia workers
@everywhere using DistributedArrays, Elemental
@everywhere n = 100                              # problem size, defined on all processes
@mpi_do man A = Elemental.DistMatrix(Float64);   # create a distributed matrix handle on each rank
@mpi_do man Elemental.gaussian!(A,n,n);          # resize A to n×n and fill with Gaussian random numbers
@mpi_do man b = Elemental.DistMatrix(Float64);
@mpi_do man Elemental.gaussian!(b,n);            # n-element Gaussian random right-hand side
@time @mpi_do man Elemental.solve!(A,b)          # solve Ax=b; b is overwritten with the solution
@mpi_do man println(size(b)) # each worker prints global size
@mpi_do man println(b) # each worker prints the (same) solution
In this example storage and computation happen on all worker processes. You can monitor all Julia processes with the following command on the compute node that is running them:
htop -p $(echo $(pgrep julia) | sed 's/ /,/g')
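As an aside, Elemental.jl is also supposed to accept DArrays from DistributedArrays.jl directly, which is presumably why that package is loaded in the example above. A sketch based on the example in the Elemental.jl README, reusing the same workers but issuing the calls from the master process:
AD = drandn(1000, 800)        # a DArray distributed over the workers
Elemental.svdvals(AD)[1:5]    # its first five singular values, computed in parallel by Elemental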
I had trouble printing the array b from a single process, or printing its subsections from all processes, but I probably just used the wrong syntax. I have not tried scaling to bigger problems or to more CPU cores.
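If I were to try scaling, an untested sketch might look like the following, reusing the MPI workers and the same Elemental calls as above (the sizes are arbitrary):
for n in (1_000, 5_000, 10_000)
    @everywhere n = $n                         # broadcast the current size to all processes
    @mpi_do man A = Elemental.DistMatrix(Float64)
    @mpi_do man Elemental.gaussian!(A, n, n)
    @mpi_do man b = Elemental.DistMatrix(Float64)
    @mpi_do man Elemental.gaussian!(b, n)
    @time @mpi_do man Elemental.solve!(A, b)   # time only the distributed solve
end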