Distributed linear algebra in Julia




In the previous session there was a question about parallel linear algebra with DistributedArrays.jl, i.e. whether it is possible to solve linear systems defined with distributed arrays without having to code your own solver.
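To make the question concrete, here is a minimal sketch of the kind of setup it refers to, assuming 4 local worker processes (drand is the distributed analogue of rand in DistributedArrays.jl):

using Distributed
addprocs(4)                        # 4 local worker processes
@everywhere using DistributedArrays
A = drand(1000, 1000);             # DArray with blocks spread across the workers
b = drand(1000);
# there is no ready-made distributed solver for A \ b, which is what prompted this survey

With that in mind, let's start with serial solvers, which can scale well to surprisingly large linear systems.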

Serial solvers

LinearSolve.jl provides a unified, performance-focused interface to many linear solvers; for dense matrices the default is a fast LU-based factorization.

using LinearSolve
n = 100;
A = rand(n,n);                     # uniform numbers from [0, 1)
b = rand(n);
@time prob = LinearProblem(A, b)   # define a linear system problem
@time sol = solve(prob)            # solve this problem
A * sol.u - b                      # check the residual (sol.u holds the solution vector)
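You can also pass an explicit algorithm as the second argument to solve instead of relying on the default heuristic. A minimal sketch with two algorithms from LinearSolve's interface:

using LinearSolve
A = rand(100, 100);
b = rand(100);
prob = LinearProblem(A, b);
sol1 = solve(prob, LUFactorization());    # force a dense LU factorization
sol2 = solve(prob, KrylovJL_GMRES());     # iterative GMRES via Krylov.jl
maximum(abs, sol1.u .- sol2.u)            # the two solutions should agree closely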

Let’s check performance on progressively larger systems:

k = 1000
for n in (10, 100, 1k, 10k, 20k, 30k)    # juxtaposition: 10k == 10*k == 10_000
    A = rand(n,n);
    b = rand(n);
    prob = LinearProblem(A, b);
    @time sol = solve(prob);
end

For these sizes I get:

  0.000026 seconds (14 allocations: 2.375 KiB)
  0.000115 seconds (17 allocations: 81.984 KiB)
  0.010350 seconds (15 allocations: 7.654 MiB)
  2.819498 seconds (18 allocations: 763.170 MiB, 0.07% gc time)
 23.127729 seconds (18 allocations: 2.981 GiB, 0.55% gc time)
123.673966 seconds (18 allocations: 6.706 GiB, 0.28% gc time)
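These timings are consistent with the O(n³) cost of a dense LU factorization: doubling n from 10,000 to 20,000 increases the solve time roughly eightfold (2.8 s → 23 s).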

Distributed solvers

There are a few projects that use Distributed.jl or MPI.jl to implement parallel dense and/or sparse linear solvers:

ParallelLinalg.jl
ParallelLinalg.jl implements distributed dense linear algebra but was last updated 7 years ago …

Pardiso.jl
Pardiso.jl provides an interface to Intel's MKL PARDISO library, which is a highly optimized direct solver for sparse linear systems. It supports distributed computation through MPI.jl.
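I have not benchmarked it here, but a minimal serial sketch of its high-level API (assuming MKL is available on your system) looks like this:

using Pardiso, SparseArrays, LinearAlgebra
ps = MKLPardisoSolver()             # shared-memory MKL PARDISO instance
n = 1000;
A = sprand(n, n, 0.01) + 10I;       # random sparse matrix, made strongly diagonally dominant
b = rand(n);
x = solve(ps, A, b)                 # analyze, factorize and solve in one call
norm(A*x - b)                       # residual check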

PartitionedArrays.jl
PartitionedArrays.jl provides HPC sparse linear algebra solvers and relies on MPI.jl. I played with it, but launching it was kind of tricky …

IterativeSolvers.jl
IterativeSolvers.jl provides iterative solvers for linear systems and eigenvalue problems. It supports distributed computation through DistributedArrays.jl and ParallelStencil.jl. I played with it, but ran into errors while following their examples …
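For reference, here is a minimal serial sketch of its conjugate-gradient solver; CG requires a symmetric positive-definite matrix, hence the A'A construction:

using IterativeSolvers, LinearAlgebra
n = 1000;
A = rand(n, n); A = A'A + n*I;      # symmetric positive definite
b = rand(n);
x = cg(A, b)                        # conjugate gradient
norm(A*x - b)                       # residual check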

PETSc.jl
PETSc.jl provides a low-level interface to PETSc (Portable, Extensible Toolkit for Scientific Computation), a parallel library for linear algebra, PDEs, and optimization. It supports distributed computation through MPI.jl. I have not tried it in Julia.

Elemental.jl
Elemental.jl provides a distributed-memory library of linear algebra routines, including solvers for linear systems, eigenvalue problems, and singular value problems. It supports a wide range of parallel architectures, including multicore CPUs, GPUs, and clusters. This one seems to work quite nicely, although you cannot use DistributedArrays out of the box.

You have to use the Elemental.DistMatrix type, which gets distributed only when you run it on top of Julia's MPI:

using MPI, MPIClusterManagers, Distributed
man = MPIWorkerManager(4)          # manager for 4 MPI worker processes
addprocs(man);                     # launch the workers
@everywhere using DistributedArrays, Elemental
@everywhere n = 100
@mpi_do man A = Elemental.DistMatrix(Float64);   # empty distributed matrix
@mpi_do man Elemental.gaussian!(A,n,n);          # fill n×n with Gaussian random numbers
@mpi_do man b = Elemental.DistMatrix(Float64);
@mpi_do man Elemental.gaussian!(b,n);
@time @mpi_do man Elemental.solve!(A,b)          # overwrites b with the solution
@mpi_do man println(size(b))   # each worker prints the global size
@mpi_do man println(b)         # each worker prints the (same) solution

In this example storage and computation happen on all worker processes. You can monitor all Julia processes with the following command on the compute node that is running them:

htop -p $(pgrep -d, julia)   # pgrep -d, prints the Julia PIDs comma-separated

I had trouble printing the array b from a single process, or printing its subsections from all processes, but I probably just used the wrong syntax. I have not tried scaling to bigger problems or to more CPU cores.