A few years ago, as part of a commercial Sioux project with a focus on minimizing computation times, I dove into the MATLAB roots
command to get a feeling for the complexity and computation time of the algorithm. What I found surprised me. It was not something I had learned in any university course, and I found it particularly enjoyable, because it is exactly the opposite of the approach that most people who have taken a basic linear algebra course will have come across.
Hoping for the same surprise and enjoyment in whoever reads this, let’s get started.
We start out in a different branch of mathematics, linear algebra, and show how it connects to polynomial rootfinding through the eigenvalue problem.
Given a square $n\times n$ matrix $A$, an eigenvalue $\lambda$ is a scalar for which the equation \(Av = \lambda v\) has a non-zero solution. Those non-zero solutions are the eigenvectors $v$, commonly normalized since any $\alpha v$ is also an eigenvector. An $n \times n$ matrix can have at most $n$ distinct eigenvalues (with corresponding eigenvectors), and under certain conditions the eigenvectors form a basis of $\mathbf{R}^n$ that ‘diagonalizes’ the linear operator $A$. Then the eigenvalue problem yields a matrix decomposition
\(A = VDV^{-1}\) with $D$ a diagonal matrix holding the eigenvalues $\lambda_i$, and $V$ the matrix whose columns form the eigenbasis. If $A$ is a symmetric matrix, the eigenbasis can be chosen orthonormal, so that $V^{-1} = V^T$. The singular value decomposition (SVD) is a closely related decomposition that even works for rectangular matrices. Eigendecompositions are very powerful, and there is the saying (I can’t find a reference) that ‘you know a matrix when you know its SVD’. I also like the analogy that the SVD is the DNA of a matrix.
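As a small illustration (a minimal sketch; np.linalg.eigh handles the symmetric case and returns an orthonormal eigenbasis):
import numpy as np
# Symmetric 2x2 example: eigh returns the eigenvalues and an orthonormal
# eigenbasis V, so the matrix is reconstructed as V D V^T.
S = np.array([[2.0, 1.0],
              [1.0, 2.0]])
w, V = np.linalg.eigh(S)
np.allclose(S, V @ np.diag(w) @ V.T)   # True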
In introductory linear algebra courses, which are typically theoretical rather than numerical, the determinant is the standard tool for finding the eigenvalues of a matrix. A square system $Cx = 0$ only has a non-zero solution if the determinant of $C$ is zero. Therefore, a condition on the eigenvalues is that \(\det(A - \lambda I) = 0\). The determinant is a simple computation using only multiplications and additions/subtractions of matrix entries, so the expression $\det(A - \lambda I)$ becomes a polynomial in the scalar variable $\lambda$. The polynomial is of degree $n$, and the eigenvalues of matrix $A$ correspond to the roots of this ‘characteristic’ polynomial $p_A(\lambda) = \det(A - \lambda I)$. Degree $n$ polynomials have at most $n$ distinct roots, which matches up nicely with the earlier statement on distinct eigenvalues. Before we further explore this polynomial rootfinding problem, let’s set up a numerical example. We construct a symmetric $4 \times 4$ matrix $A$ starting from the known eigenvalues $-5, -1, 2,$ and $3$.
%matplotlib notebook
import numpy as np
import scipy.linalg
import matplotlib.pyplot as plt
D = np.diag([-5, -1, 2, 3])                  # the desired eigenvalues on the diagonal
Q, _ = np.linalg.qr(np.random.randn(4, 4))   # a random orthogonal matrix
A = Q @ D @ Q.T                              # symmetric matrix with those eigenvalues
A
array([[-0.21455278, 2.45041637, 1.70683099, 1.29521153],
[ 2.45041637, -1.16044339, -0.6460939 , -2.25219366],
[ 1.70683099, -0.6460939 , -1.40413652, -0.38731631],
[ 1.29521153, -2.25219366, -0.38731631, 1.77913269]])
We can check that $A$ was indeed constructed as intended:
np.linalg.eig(A)[0]
array([-5., -1., 2., 3.])
Deriving the determinant of the $4\times 4$ matrix $A - \lambda I$ by hand is a tedious calculation. Numerical tools like numpy
don’t allow the $\lambda$ to remain unspecified; we would need a symbolic tool for that. Without resorting to a symbolic tool, we can still recover the characteristic polynomial, as shown below.
We can loop over a range of $\lambda$ values and numerically evaluate the determinant for each value of $\lambda$. This allows us to plot the characteristic polynomial. Marked in orange are 5 distinct points on the graph that we feed into numpy.polyfit
to retrieve the characteristic polynomial numerically. For nicely spaced points, small matrices, and points in the vicinity of the eigenvalues, this works fine.
lambdas = np.linspace(-6, 4)
dets = np.zeros_like(lambdas)
for i, l in enumerate(lambdas):
    dets[i] = np.linalg.det(A - l*np.eye(4))   # evaluate det(A - lambda*I) numerically
lambdas_sub = lambdas[::12]    # every 12th point: 5 samples, enough for a degree-4 fit
dets_sub = dets[::12]
plt.figure()
plt.plot(lambdas, dets)
plt.plot(lambdas_sub, dets_sub, 'o')
plt.grid()
p = np.polyfit(lambdas_sub, dets_sub, 4)
p
array([ 1., 1., -19., 11., 30.])
The characteristic polynomial of matrix $A$ is \(p_A(\lambda) = \lambda^4 + \lambda^3 -19\lambda^2 + 11\lambda + 30\)
As we have integer-valued eigenvalues, all coefficients of the characteristic polynomial are integer too.
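As a quick sanity check (np.poly expands the monic polynomial that has a given list of roots), the fitted coefficients also match the polynomial built directly from the chosen eigenvalues:
# Expand (x+5)(x+1)(x-2)(x-3) from its roots and compare to the fitted coefficients p.
print(np.poly([-5, -1, 2, 3]))   # [  1.   1. -19.  11.  30.]
print(np.allclose(np.poly([-5, -1, 2, 3]), p))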
How do we find the roots of this polynomial? In basic linear algebra courses, the worked examples are usually only $2\times 2$ or $3 \times 3$ matrices, yielding second or third degree polynomials, for which explicit expressions for the roots exist. From Galois theory, on the other hand, we know that root finding beyond degree 4 becomes a completely different matter. Roots of polynomials are called ‘algebraic numbers’, but only in special cases can they be expressed in a formula consisting of the polynomial coefficients and the simple operations of addition, subtraction, multiplication, division, and taking radicals.
Without explicit formulae, we must resort to iterative algorithms that (hopefully) converge to these algebraic numbers. For example, the roots
function in MATLAB or numpy
yields all the roots of polynomial $p_A(\lambda)$:
np.roots(p)
array([-5., 3., 2., -1.])
So how does roots
do polynomial rootfinding? Newton’s method can converge to zeros of general non-linear functions very quickly if started sufficiently close to one, but that only gives a single root. How can we get all roots, with certainty? Run Newton’s method with multiple, well-chosen starting values?
The answer is: No
The surprising solution is that we flip it upside down! Given a polynomial $p(\lambda)$, one can create matrices that have that polynomial as their characteristic polynomial. The ‘companion matrix’ is one way to do so. A polynomial $p(\lambda)$ can be represented by a list of scalars $p_i$ such that \(p(\lambda) = p_0 + p_1 \lambda + p_2 \lambda^2 + \ldots + p_n\lambda^n\)
Before we construct the companion matrix, we normalize the coefficients such that $p_n = 1$, which means we get a ‘monic’ polynomial.
The companion matrix is now \(A_p = \begin{pmatrix} -p_{n-1} & -p_{n-2} & \cdots & -p_1 & -p_0 \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{pmatrix}\)
Ap = scipy.linalg.companion(p)
Ap
array([[ -1., 19., -11., -30.],
[ 1., 0., 0., 0.],
[ 0., 1., 0., 0.],
[ 0., 0., 1., 0.]])
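For illustration, here is a minimal hand-rolled construction following the formula above (a sketch under the same coefficient ordering that np.polyfit returns, highest degree first; not necessarily how scipy implements it). It should reproduce the matrix scipy just gave us:
def companion_from_coeffs(coeffs):
    c = np.asarray(coeffs, dtype=float)
    c = c / c[0]                       # normalize to a monic polynomial
    n = len(c) - 1                     # degree of the polynomial
    C = np.zeros((n, n))
    C[0, :] = -c[1:]                   # top row: -p_{n-1}, ..., -p_1, -p_0
    C[1:, :-1] = np.eye(n - 1)         # ones on the first subdiagonal
    return C

np.allclose(companion_from_coeffs(p), Ap)   # True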
lambdas = np.linspace(-6, 4)
Apdets = np.zeros_like(lambdas)
for i, l in enumerate(lambdas):
    Apdets[i] = np.linalg.det(Ap - l*np.eye(4))   # same determinant sweep, now for Ap
plt.figure()
plt.plot(lambdas, dets)
plt.plot(lambdas, Apdets, '--')
plt.grid()
plt.legend(['Characteristic polynomial for $A$', 'Characteristic polynomial for companion matrix $A_p$'])
The figure above confirms that the companion matrix has the same characteristic polynomial as our original matrix $A$.
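Besides the visual check, we can also compare coefficients directly. A small sketch: np.poly applied to a square matrix returns the coefficients of its characteristic polynomial, highest degree first (note that it does so by computing the eigenvalues internally, which is a wink at where this post is heading):
# Characteristic-polynomial coefficients of Ap; they should match the fitted p.
print(np.poly(Ap))
print(np.allclose(np.poly(Ap), p))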
If we have a way to compute the eigenvalues of $A_p$, we can find all the roots of $p(\lambda)$. And of course, the trick now is that the numerical way of computing eigenvalues does not involve finding the roots of a characteristic polynomial at all! Instead, we use the famous QR algorithm for finding eigenvalues. Not to be confused with the QR decomposition, though heavily reliant on it.
First, note that a mapping $A \to QAQ^T$ with $Q$ an $n\times n$ orthogonal matrix is called a similarity transform and leaves the eigenvalues unchanged. The QR algorithm creates a sequence of similar $A_i$ matrices (with $A_0 = A$) that (under conditions) converges to an upper triangular matrix. As triangular matrices have the eigenvalues on the diagonal, we obtain the roots we are looking for.
The recipe for creating this sequence $A_i$ is to iteratively take a QR decomposition $A_i = Q_i R_i$ and then recombine the factors in swapped order: $A_{i+1} \leftarrow R_i Q_i$. Note that $A_{i+1} = R_i Q_i = Q_i^T (Q_i R_i) Q_i = Q_i^T A_i Q_i$, so each step is indeed a similarity transform. Behold!
np.set_printoptions(suppress=True)
Ap = scipy.linalg.companion(p)
N = 20
Aps = np.zeros((4, 4, N))    # log every iterate so we can analyse convergence later
Aps[..., 0] = Ap
for i in range(1, N):
    if i < 10:
        print(np.round(Ap, decimals=2))
    Q, R = np.linalg.qr(Ap)  # factor the current iterate ...
    Ap = R @ Q               # ... and recombine with the factors swapped
    Aps[..., i] = Ap
[[ -1. 19. -11. -30.]
[ 1. 0. 0. 0.]
[ 0. 1. 0. 0.]
[ 0. 0. 1. 0.]]
[[-10.5 7.9 -22.56 3.34]
[ -9.53 8.92 -22.54 3.27]
[ 0. 0.09 1.26 0.6 ]
[ 0. 0. 1.18 -0.68]]
[[ -2.54 16.17 -21.81 -25.19]
[ 0.88 0.87 -1. -1.15]
[ 0. 0.12 1.35 1.18]
[ 0. 0. 0.59 -0.68]]
[[ -7.43 12.91 -27.7 14.29]
[ -1.99 5.61 -10.97 5.66]
[ 0. 0.03 1.88 -0.35]
[ 0. 0. 0.41 -1.05]]
[[ -3.84 14.99 -25.96 -21.18]
[ 0.54 1.96 -2.99 -2.45]
[ 0. 0.03 1.81 0.99]
[ 0. 0. 0.2 -0.93]]
[[ -5.86 13.7 -27.45 17.81]
[ -0.56 3.93 -7.12 4.62]
[ 0. 0.01 1.95 -0.72]
[ 0. 0. 0.11 -1.03]]
[[ -4.53 14.36 -27.02 -19.7 ]
[ 0.25 2.58 -4.32 -3.16]
[ 0. 0.01 1.93 0.9 ]
[ 0. 0. 0.05 -0.98]]
[[ -5.31 13.84 -27.33 18.74]
[ -0.18 3.34 -5.91 4.06]
[ 0. 0.01 1.98 -0.83]
[ 0. 0. 0.03 -1.01]]
[[ -4.82 14.07 -27.27 -19.26]
[ 0.1 2.84 -4.91 -3.48]
[ 0. 0. 1.97 0.88]
[ 0. 0. 0.01 -1. ]]
We have logged all the iterates in a 3D structure to analyse the convergence:
plt.figure()
_ = plt.semilogy(np.abs(np.reshape(Aps, (-1, N)).T))   # one curve per matrix entry: |value| vs. iteration
We observe how the values in the lower triangle converge to zero linearly (in a $\rho^k$ with $\rho < 1$ sense). And after 20 iterations, the diagonal values are indeed already quite close to the eigenvalues we started out with (which are also the roots of the polynomial we constructed along the way).
np.diag(Ap)
array([-5.00066328, 3.00090486, 1.99976051, -1.00000209])
A few final notes: in Golub & Van Loan we can find theory on the convergence rate of the QR algorithm. The convergence rate relates to the ratios of eigenvalues (after sorting in descending order of magnitude, $|\lambda_1| \geq \ldots \geq |\lambda_n|$) \(\left| \frac{\lambda_{p+1}}{\lambda_p} \right|\) Eigenvalues close to each other slow down convergence, and identical eigenvalues will not be resolved. Slow convergence can partly be remedied by a shifting strategy, which adds and subtracts a matrix $\mu I$ at the right places in the algorithm, to yield rates like \(\left| \frac{\lambda_{p+1} - \mu}{\lambda_p - \mu} \right|\)
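To make the shifting idea concrete, here is a minimal sketch of a shifted QR iteration. The choice of $\mu$ as the current bottom-right diagonal entry is a simple stand-in of my own; production codes use e.g. the Wilkinson shift together with deflation (see Golub & Van Loan):
# Shifted QR step: factor (B - mu*I), recombine, and add the shift back.
# Each step is still a similarity transform: R @ Q + mu*I = Q.T @ B @ Q.
B = scipy.linalg.companion(p)
n = B.shape[0]
for _ in range(10):
    mu = B[-1, -1]                          # crude shift: current bottom-right entry
    Q, R = np.linalg.qr(B - mu * np.eye(n))
    B = R @ Q + mu * np.eye(n)
# The bottom-right entry locks onto one eigenvalue quickly; a real solver
# would then 'deflate' and continue on the remaining (n-1) x (n-1) block.
np.diag(B)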
A final remark is that typical usage of the QR algorithm for computing the eigenvalues of a matrix is preceded by a transformation to upper Hessenberg form. A QR decomposition of a full matrix requires $O(n^3)$ operations, while for a Hessenberg matrix (upper triangular plus one subdiagonal) it requires only $O(n^2)$. Moreover, the QR algorithm preserves the Hessenberg form, so the initial cost of forming the Hessenberg matrix is offset by each iteration becoming considerably cheaper. The companion matrix constructed above is already of Hessenberg form, which is a nice coincidence.
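As a small illustration of that reduction (using scipy.linalg.hessenberg; for our symmetric $A$ the Hessenberg form is even tridiagonal):
# One O(n^3) orthogonal reduction to upper Hessenberg form up front;
# every subsequent QR step on H then costs only O(n^2).
H = scipy.linalg.hessenberg(A)
print(np.round(H, 2))
# The reduction is a similarity transform, so the eigenvalues are unchanged:
print(np.allclose(np.sort(np.linalg.eigvals(H)), np.sort(np.linalg.eigvals(A))))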
Finally, we conclude that the pen-and-paper/theoretical method of computing matrix eigenvalues from polynomial roots is exactly the opposite of what is actually done in numerical mathematics: polynomial roots are computed using the eigenvalue decomposition of a companion matrix, and that eigenvalue decomposition is computed with the QR algorithm.
I’m very interested in your comments, but I’m still figuring out the best way to collect them. For now, feel free to send me an email.