
An Introduction to Singular Spectrum Analysis (SSA)

  • Writer: Peter Urbani
  • Feb 5, 2013
  • 7 min read

Most of you will be familiar with Principal Component Analysis (PCA), used mostly in fixed income models, where the first three ordered principal components, or dominant eigenvalues, are deemed to represent the level, slope and curvature of the yield curve. This month we review a related method, namely Singular Spectrum Analysis (SSA). I have also included a worked example and a spreadsheet implementation that can be downloaded here.

 

The mathematics behind SSA goes by multiple names. It is also known as: Proper Orthogonal Decomposition, PCA, Principal Value Decomposition, Singular Value Decomposition, Singular System Analysis, bi-orthogonal decomposition, Karhunen–Loève decomposition and the Caterpillar method.

 

While most traditional econometric time series analysis is performed in the time domain, spectral analysis is performed in the frequency domain. The idea dates back to the 1960s, and even earlier to a paper by Edward Lorenz, who proposed its use for the measurement and forecasting of localised weather phenomena. Until fairly recently its use was mostly confined to the fields of Geostatistics and Digital Signal Processing, but interest has grown with the attention given to the Mañé–Takens embedding theorem.

 

The main aim of these techniques is to detect the low, medium and high frequency components carrying the most information in the time series, thus providing precise filtering methods, the identification of the signal's dominant cycles, trend-cycle separation, business cycle extraction, and the analysis of co-movements among different series.

Although evidence of chaos in economic datasets has not yet been confirmed, Broomhead and King (1986) demonstrate that SSA works well even with mildly nonlinear data, which economic series effectively are.

 

One of SSA's main advantages with respect to classical Fourier methods is its ability to detect oscillations modulated in both amplitude and phase (Allen and Smith, 1996). Thus, the original signal is not simply decomposed into periodic sine and cosine functions, but rather into data-adaptive waves that may exhibit non-constant amplitude and/or phase.

 

The method is not conceived to build models, but rather to extract information about the time series' deterministic and stochastic parts (Ormerod and Campbell, 1997). In particular, SSA should both accurately forecast the short-term evolution of the system and capture its long-term features, highlighting some of the system's peculiar properties, such as its degree of randomness (Gershenfeld and Weigend, 1993).



Like Principal Component Analysis, SSA separates the time series into empirical orthogonal functions (EOFs) or statistical "modes". Usually the majority of the variance in the time series is contained in the first few EOFs. The patterns of those first few EOFs may be linked to dynamic mechanisms such as the trend and the cycles.

 

SSA can be used for:

Smoothing,

Trend extraction,

The identification of seasonal effects,

Filling in missing values,

Change point detection,

Forecasting.

 

It is a non-parametric and model-free method and hence can be applied to any series.

 

It does not require stationarity of the series.

It does not require log-transformation.

The signals obtained by SSA decomposition differ from those obtained by filtering out frequency bands with the Fourier transform.

 

The main model assumption behind SSA is that all interpretable subseries can be approximated by time series described by certain linear recurrence relations (LRRs) of small order, and that SSA is able to approximately separate these subseries from each other and from noise.

 

The class of time series which satisfy LRRs consists of sums of products of polynomials, exponentials and sinusoids.
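
To make that concrete, here is a quick numerical check in Python (my own sketch, not part of the spreadsheet): a damped sinusoid, the product of an exponential and a cosine, satisfies an LRR of order two.

import numpy as np

rho, omega = 0.98, 0.3                    # damping and angular frequency
t = np.arange(100)
y = rho**t * np.cos(omega * t)            # exponential x sinusoid

a1 = 2 * rho * np.cos(omega)              # recurrence coefficients from the
a2 = -rho**2                              # characteristic roots rho*exp(+-i*omega)
print(np.allclose(y[2:], a1 * y[1:-1] + a2 * y[:-2]))   # True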

 

SSA is based on an orthogonal decomposition of the lag-covariance matrix. The eigenvectors represent lagged sequences of length L providing the new orthonormal basis onto which the signal is decomposed (Vautard et al., 1992). They are generally called Empirical Orthogonal Functions and come directly from the data, while the corresponding eigenvalues, associated with the so-called Principal Components, give the fraction of total variance explained by each orthogonal direction.

 

Vautard et al. (1992) demonstrate that the sum of the PCs' spectra is identical to the power spectrum of the original series. This result is particularly interesting, since it underlines the completely linear nature of SSA, which valuably simplifies the analysis of nonlinear time series through linear tools.

 

Despite the linear summation, the transform between X(t) and X-hat(t) is nonlinear, since the relation between each EOF and the original series is itself nonlinear: this fact allows a proper decomposition of nonlinear time series.

 

When we perform an SSA analysis, we specify the time-delay embedding dimension L. This value specifies the number of EOFs, or statistically independent data streams, into which the time series will be separated. As mentioned above, most of the time series' variance is contained in the first few EOFs. The trend component is almost always the first one. When a cycle is present, it will appear as a pair of EOFs with nearly identical variance. The eigenvalue plot lets us visually inspect how the variance is distributed among the EOFs.
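
A quick check of that paired-eigenvalue behaviour in numpy (my own sketch): for a pure sine the two leading singular values are nearly identical and the rest are essentially zero.

import numpy as np

y = np.sin(2 * np.pi * np.arange(200) / 20)                # pure cycle, no trend or noise
L = 40
X = np.column_stack([y[i:i + L] for i in range(len(y) - L + 1)])
s = np.linalg.svd(X, compute_uv=False)
print(s[:4])   # two nearly equal dominant singular values, the rest near zero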

 

One of the criticisms of SSA is that it is non-causal. This is due to the diagonal averaging which takes place in the last reconstruction step. This means that when new information arrives, or the window period is extended, the change in the size of the matrix (see X-hat in the worked example) causes the prior values to change or be redrawn. This makes it difficult for traders to use SSA as is. Fortunately it can be made causal by recording only the end point (end-point SSA), or the forecast values, and then iterating over some data window, although this is obviously more computationally expensive.
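
For the curious, here is a minimal sketch of the end-point idea in Python (my own illustration; the function name and parameters are mine, not the spreadsheet's). Because the last anti-diagonal of the trajectory matrix contains a single element, the newest reconstructed point needs no averaging, and re-running the decomposition on each expanding window gives a series that never redraws:

import numpy as np

def ssa_last_point(y, L, r):
    """Reconstruct only the newest point of y from the first r eigentriples."""
    N = len(y)
    K = N - L + 1
    X = np.column_stack([y[i:i + L] for i in range(K)])   # L x K trajectory matrix
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Xhat = (U[:, :r] * s[:r]) @ Vt[:r, :]                 # rank-r approximation
    # the last anti-diagonal holds a single element, so no averaging is needed
    return Xhat[L - 1, K - 1]

# each output uses only data available at that time, so the line is causal
np.random.seed(0)
y = np.sin(np.linspace(0, 20, 300)) + 0.3 * np.random.randn(300)
endpoint = [ssa_last_point(y[:t], L=30, r=2) for t in range(60, len(y) + 1)]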

 

The validity of any forecasting performed using the SSA method depends on the extent to which there is underlying structure in the original time series, the persistence of this structure, and the separability of the signal from the noise. For natural and highly cyclical phenomena this may well be high, but for quasi-random data such as financial time series such pockets of predictability may be scarce and temporary at best.

 

For interest's sake I have provided a few forecast examples but, except in the case of the GDP forecasts, which most likely do have considerable cyclical structure, they should be treated with caution. As far as I am aware, SSA is primarily used in finance for pre-processing and smoothing of data rather than primary trade signal generation. Reported results suggest that it performs in line with other time series forecasting methods such as double exponential smoothing (Winters, Holt) and ARMA methods without requiring as much parameterisation.





Identification of Key Components in the Grouping Stage

 

The frequency-domain approach to time-series analysis is based on the Wiener–Khinchin theorem, which states the equality between the power spectrum and the Fourier transform of the autocorrelation function (ACF) of a time series.
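
In the discrete, circular setting the theorem is easy to verify numerically. A small Python check (my own sketch): the FFT of the circular autocorrelation of a series equals its periodogram.

import numpy as np

np.random.seed(0)
x = np.random.randn(256)
n = len(x)

power = np.abs(np.fft.fft(x)) ** 2                       # power spectrum (periodogram)
acf = np.array([x @ np.roll(x, -k) for k in range(n)])   # circular autocorrelation
print(np.allclose(np.fft.fft(acf).real, power))          # True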

 

The eigenvalues and eigenvectors define empirical orthogonal functions (EOFs).

 

Signal-to-noise separation can usually be obtained by merely inspecting the slope break in a "scree diagram" of eigenvalues.

 

Usually the singular spectrum is plotted using the log of the singular values (i.e. the log of the square root of the eigenvalues), ranked by decreasing variance (i.e. decreasing singular value). For the log version of the singular spectrum the uncertainty, or 95% confidence interval, is ±(1.96/N)^0.5, where N is the number of data points (p. 408 of Vautard and Ghil, 1989).
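
A sketch of such a plot in Python (my own illustration, using the interval quoted above):

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(1)
N = 300
t = np.arange(N)
y = np.sin(2 * np.pi * t / 40) + 0.5 * np.random.randn(N)   # one cycle plus noise

L = 60
K = N - L + 1
X = np.column_stack([y[i:i + L] for i in range(K)])         # trajectory matrix
s = np.linalg.svd(X, compute_uv=False)                      # singular values, descending

err = (1.96 / N) ** 0.5        # the interval quoted above from Vautard and Ghil (1989)
plt.errorbar(np.arange(1, L + 1), np.log(s), yerr=err, fmt='o', markersize=3)
plt.xlabel('rank')
plt.ylabel('log singular value')   # the cycle shows up as a leading pair above the noise floor
plt.show()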

 

The outcome of adding together the principal components is clearly a filtered version of the original data. Thus singular spectrum analysis can be used as a rapid method for FIR filtering; for example, high-frequency noise can be removed (Kantz and Schreiber, 1997).

 



Method

 

The Basic SSA technique consists of two complementary stages, decomposition and reconstruction, each of which comprises two separate steps.

 

Stage 1: Decomposition

First step: Embedding

Embedding can be regarded as a mapping that transfers the one-dimensional time series into a two-dimensional (L, K) matrix of K lagged sub-series of length L drawn from the original data. L represents the window length (sometimes called M) and N is the number of data points. K is given as K = N − L + 1. The result of this step is known as the trajectory matrix X, a Hankel matrix which has equal elements along each anti-diagonal.
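
In Python (a minimal numpy sketch of this step, not the spreadsheet implementation):

import numpy as np

def embed(y, L):
    """Map a 1-D series of length N onto the L x K trajectory (Hankel) matrix."""
    N = len(y)
    K = N - L + 1
    return np.column_stack([y[i:i + L] for i in range(K)])

y = np.arange(1, 7, dtype=float)   # N = 6
X = embed(y, L=3)                  # 3 x 4: equal elements on each anti-diagonal
print(X)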

 

Second step: Singular value decomposition (SVD)

Generate the lag-covariance matrix C of X by taking C = X.X'

Perform a singular value decomposition on C to arrive at:

Matrix S, containing on its diagonal the eigenvalues or ‘singular’ values of X.X'

Matrix U, containing the eigenvectors of X.X'

Matrix V, containing the transpose of U (since C is symmetric)
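
A numpy sketch of this step (my own illustration). Note that the eigenvalues of X.X' are simply the squared singular values of X itself:

import numpy as np

np.random.seed(2)
y = np.sin(0.2 * np.arange(100)) + 0.1 * np.random.randn(100)
L = 20
K = len(y) - L + 1
X = np.column_stack([y[i:i + L] for i in range(K)])   # L x K trajectory matrix

C = X @ X.T                          # lag-covariance matrix X.X' (L x L)
U, s, Vt = np.linalg.svd(C)          # C = U.S.V'; s holds the eigenvalues of X.X'
print(np.allclose(C, (U * s) @ Vt))  # the decomposition reproduces C: True

# the eigenvalues of X.X' equal the squared singular values of X
print(np.allclose(s, np.linalg.svd(X, compute_uv=False) ** 2))   # True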

 

Stage 2: Reconstruction

First step: Grouping

For each of the L eigenvectors, calculate the matrix X-hat as the eigentriple grouping X-hat = U.U'.X, or equivalently U.V.X, taking U to contain only the eigenvector(s) in the group.
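
In numpy terms (my own sketch), each eigenvector contributes one elementary matrix, and the elementary matrices sum back to X exactly:

import numpy as np

np.random.seed(3)
y = np.sin(0.2 * np.arange(100)) + 0.1 * np.random.randn(100)
L = 20
K = len(y) - L + 1
X = np.column_stack([y[i:i + L] for i in range(K)])

U, s, Vt = np.linalg.svd(X @ X.T)                           # eigenvectors of the lag-covariance matrix
Xhat = [np.outer(U[:, i], U[:, i]) @ X for i in range(L)]   # one elementary matrix per eigenvector
print(np.allclose(sum(Xhat), X))                            # they sum back to X: True
# grouping = summing a chosen subset, e.g. Xhat[0] + Xhat[1] for a dominant cycle pair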

 

Second step: Diagonal averaging

Diagonal averaging transfers each of the L X-hat matrices into a time series, which is an additive component of the initial series Y_T, by reversing the Hankelisation process of step one.
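
A numpy sketch of the averaging (mine, not the spreadsheet's). Applied to an unmodified trajectory matrix it recovers the original series exactly, which confirms that it reverses the embedding:

import numpy as np

def diagonal_average(M):
    """Average each anti-diagonal of an L x K matrix into a series of length N = L + K - 1."""
    L, K = M.shape
    y = np.zeros(L + K - 1)
    counts = np.zeros(L + K - 1)
    for i in range(L):
        for j in range(K):
            y[i + j] += M[i, j]
            counts[i + j] += 1
    return y / counts

y = np.arange(10.0)
X = np.column_stack([y[i:i + 4] for i in range(7)])   # L = 4, K = 7
print(np.allclose(diagonal_average(X), y))            # True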

 

Forecasting Step

Forecasting can be done either by identifying the linear recurrent formula (LRF) or by vector forecasting, whereby the last eigenvector is used as a basis for extending or projecting the trajectory matrix X-hat, or one of its reconstructed factors, H steps ahead, where H is at most L − 1.
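
Here is a compact sketch of the LRF (recurrent) variant in Python (my own illustration; the function name and parameter choices are mine). The recurrence coefficients come from the leading eigenvectors, and the forecast iterates the recurrence on the reconstructed series:

import numpy as np

def ssa_forecast(y, L, r, steps):
    """Fit the linear recurrent formula from the first r eigenvectors, then iterate it."""
    N = len(y)
    K = N - L + 1
    X = np.column_stack([y[i:i + L] for i in range(K)])
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Ur = U[:, :r]                        # leading eigenvectors
    pi = Ur[-1, :]                       # their last coordinates
    nu2 = pi @ pi                        # verticality coefficient, must be < 1
    R = (Ur[:-1, :] @ pi) / (1 - nu2)    # LRF coefficients, length L - 1
    # reconstruct the rank-r signal by diagonal averaging, then iterate the LRF
    Xhat = (U[:, :r] * s[:r]) @ Vt[:r, :]
    rec = np.zeros(N)
    cnt = np.zeros(N)
    for i in range(L):
        for j in range(K):
            rec[i + j] += Xhat[i, j]
            cnt[i + j] += 1
    out = list(rec / cnt)
    for _ in range(steps):
        out.append(R @ np.array(out[-(L - 1):]))   # next value from the previous L - 1
    return np.array(out[N:])

y = np.sin(0.2 * np.arange(200))
print(ssa_forecast(y, L=40, r=2, steps=5))   # close to sin(0.2 * (200..204))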

 

Decomposition: Window length and SVD

The window length L (sometimes M) is the only parameter in the decomposition stage. Selection of the proper window length depends on the problem in hand and on preliminary information about the time series. Theoretical results tell us that L should be large enough but not greater than N/2.


 

For more on window length selection and the optimal parameters see:

 

Window Length Selection and Signal–Noise Separation and Reconstruction in Singular Spectrum Analysis by Atikur Khan

 

Singular Spectrum Analysis: Methodology and Comparison by Hossein Hassani



You can purchase the corresponding spreadsheet here:



 

 

 

 

 

 
