Time series analysis focuses on sequences of random variables. In this article, I will briefly review the basic definitions and properties of time series, the typical models, and the prediction of time series. The concepts and formulas are from Applicational Time Series Analysis by Shuyuan He, published by Peking University Press. Note that this is only a memo containing key knowledge. If you are a beginner, please refer to other tutorials.

I picked up most of my statistics knowledge at university in Mandarin, so there might be slight mistakes or imperfections in the terminology and grammar of this article. If you have any suggestions, please email me and help me correct them.

# Basic Definitions

In this section, the basic definitions related to time series, including stationary time series, the correlation function, white noise, and the spectrum function, will be reviewed, along with some of their useful properties.

## Time Series

A sequence of random variables ordered by time index is called a time series, represented as \[ X_1, X_2, \ \dots \] For a time series, a sequence of its observed samples is called a realization of the time series, represented as \[ x_1, x_2, \ \dots \] Most of the time, a time series is considered as the sum of 3 components: the trend component \(T_t\), the seasonal component \(S_t\), and the random component \(R_t\). \[ X_t = T_t + S_t + R_t \] Usually, \(T_t\) and \(S_t\) are considered non-random functions and can be estimated by methods such as regression. This article will mainly focus on the random component, \(R_t\).

## Stationary Time Series

A time series \(\{X_t\}\) is called a **stationary time series** if it satisfies the following.

- For any \(t \in \mathbb{N}\), \(EX_t^2 < \infty\)
- For any \(t \in \mathbb{N}\), \(EX_t = \mu\)
- For any \(s, t \in \mathbb{N}\), \(E[(X_t - \mu)(X_s - \mu)] = \gamma_{t - s}\)

That is, for a stationary time series, every term has a finite second moment and the same expectation \(\mu\), and the covariance of any two terms is determined by the difference of their time indices rather than the indices themselves.

## Correlation Function

For a stationary time series \(\{X_t\}\) with mean value \(\mu\), its correlation is defined by the following formula. \[
\gamma_{k} = E[(X_t - \mu)(X_{t + k} - \mu)]
\] The sequence \(\{\gamma_k\}\) is called the **correlation function** of the stationary time series. And the correlation matrix \(\Gamma_n\) is defined as follows. \[
\Gamma_n = (\gamma_{k - j})_{k, j = 1}^n = \left(\begin{matrix}
\gamma_0 & \gamma_1 & \dots & \gamma_{n - 1} \\
\gamma_1 & \gamma_0 & \dots & \gamma_{n - 2} \\
\vdots & \vdots & & \vdots \\
\gamma_{n - 1} & \gamma_{n - 2} & \dots & \gamma_0
\end{matrix}\right)
\] The correlation function \(\{\gamma_k\}\) and the correlation matrix \(\Gamma_n\) satisfy the following properties.

- \(\gamma_k = \gamma_{-k}\)
- \(\Gamma_n\) is non-negative definite for any \(n \in \mathbb{N}_+\)
- \(|\gamma_k| \leq \gamma_0\) for any \(k \in \mathbb{Z}\)

### Estimation of Correlation Function

Given \(N\) samples from a stationary time series, the correlation function can be estimated as follows, where \(\bar{x}_N\) is the sample mean. \[ \hat{\gamma}_k = \frac{1}{N} \sum_{j = 1}^{N - k} (x_j - \bar{x}_N)(x_{j + k} - \bar{x}_N) \]
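As a quick numerical sketch (mine, not from the book; the helper name `acov` is made up), the estimator can be written with NumPy:

```python
import numpy as np

def acov(x, k):
    """Estimate gamma_k from samples x[0..N-1]. Dividing by N (not N - k)
    keeps the estimated correlation matrix non-negative definite."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    d = x - x.mean()
    return np.sum(d[:N - k] * d[k:]) / N
```

Note that the divisor is \(N\) rather than \(N - k\), matching the formula above.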

## Partial Correlation Coefficient

For a stationary time series with positive definite \(\Gamma_{n + 1}\) and \(1 \leq k \leq n\), the Levinson recursive formula holds, where \(a_{k, k}\) is called the **partial correlation coefficient** at lag \(k\). \[ \begin{cases} a_{1, 1} = \frac{\gamma_1}{\gamma_0} \\ \sigma_0^2 = \gamma_0 \\ \sigma_k^2 = \sigma_{k - 1}^2 (1 - a_{k, k}^2) \\ a_{k + 1, k + 1} = \frac{\gamma_{k + 1} - \sum_{j = 1}^k a_{k, j}\gamma_{k - j + 1}}{\gamma_0 - \sum_{j = 1}^k a_{k, j} \gamma_j} \\ a_{k + 1, j} = a_{k, j} - a_{k + 1, k + 1} a_{k, k - j + 1} \end{cases} \] with \[ \textbf{a}_k = (a_{k, 1}, \dots, a_{k, k})^T \\ \sigma_k^2 = E(X_{k + 1} - \textbf{a}_k^T\textbf{X}_k)^2 \\ \textbf{X}_n = (X_n, X_{n - 1}, \dots, X_1)^T \]
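The Levinson recursion translates directly into code. Below is my own sketch (the function name `levinson` is hypothetical, and I assume \(n \geq 1\)); it returns the coefficients \(a_{n, j}\) and the prediction error \(\sigma_n^2\):

```python
import numpy as np

def levinson(gamma):
    """Levinson recursion: gamma = [gamma_0, ..., gamma_n] -> (a, sigma2),
    where a[j - 1] = a_{n, j} and sigma2 = sigma_n^2. Assumes n >= 1."""
    n = len(gamma) - 1
    a = np.zeros(n + 1)                       # a[j] stores a_{k, j}
    a[1] = gamma[1] / gamma[0]
    sigma2 = gamma[0] * (1.0 - a[1] ** 2)
    for k in range(1, n):
        num = gamma[k + 1] - sum(a[j] * gamma[k - j + 1] for j in range(1, k + 1))
        den = gamma[0] - sum(a[j] * gamma[j] for j in range(1, k + 1))
        akk = num / den                       # a_{k+1, k+1}
        prev = a.copy()
        a[k + 1] = akk
        for j in range(1, k + 1):
            a[j] = prev[j] - akk * prev[k - j + 1]
        sigma2 *= (1.0 - akk ** 2)
    return a[1:], sigma2
```

For an AR(1)-shaped correlation function the partial correlation coefficient vanishes beyond lag 1, which makes a handy sanity check.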

## White Noise

Let \(\{\epsilon_t\}\) be a stationary time series. If for any \(s, t \in \mathbb{N}\), \[
E\epsilon_t = \mu \\
cov(\epsilon_t, \epsilon_s) = \begin{cases}
\sigma^2 & t = s \\
0 & t \neq s
\end{cases}
\] then \(\{\epsilon_t\}\) is called **white noise**, usually represented as \(WN(\mu, \sigma^2)\).

### White Noise Test

Given the estimates of the correlation coefficients for samples from a time series \[ \hat{\rho}_k = \frac{\hat{\gamma}_k}{\hat{\gamma}_0} \] the following random vector approximately follows an \(m\)-dimensional standard normal distribution when the series is white noise. \[ \sqrt{N} (\hat{\rho}_1, \hat{\rho}_2, \dots, \hat{\rho}_m) \] Thus, the following statistic approximately follows the chi-square distribution with \(m\) degrees of freedom. \[ \chi_m^2 = N \sum_{i = 1}^m \hat{\rho}_i^2 \] By applying the chi-square test, one can decide whether the samples come from white noise.
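A minimal sketch of the test statistic (the function name is mine, not the book's):

```python
import numpy as np

def chi2_white_noise_stat(x, m):
    """Statistic N * sum_{k=1}^m rho_k^2 for the white noise test."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    d = x - x.mean()
    gamma0 = np.sum(d * d) / N
    rho = [np.sum(d[:N - k] * d[k:]) / N / gamma0 for k in range(1, m + 1)]
    return N * sum(r * r for r in rho)
```

Compare the statistic with the upper quantile of \(\chi_m^2\); a large value rejects the white noise hypothesis. A strongly autocorrelated input, such as a slow sinusoid, produces a statistic far above any reasonable critical value.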

## Spectrum Function

For a stationary time series \(\{X_t\}\) with correlation function \(\{\gamma_k\}\), if there exists a non-decreasing and right-continuous function \(F(\lambda)\) defined on \([-\pi, \pi]\) such that: \[
\gamma_k = \int_{-\pi}^\pi e^{ik\lambda} d F(\lambda) \\
F(- \pi) = 0
\] Then \(F(\lambda)\) is called the **spectrum distribution function** for \(\{X_t\}\).

If there exists a non-negative function \(f(\lambda)\) such that: \[
\gamma_k = \int_{-\pi}^\pi f(\lambda) e^{ik\lambda} d\lambda
\] Then \(f(\lambda)\) is called the **spectrum density function** for \(\{X_t\}\).

For a stationary time series, its spectrum distribution function uniquely exists. The spectrum density function of a real-valued stationary time series is an even function.

### Spectrum Density Function for Stationary Time Series

If the correlation function \(\{\gamma_k\}\) for stationary time series \(\{X_t\}\) is absolute summable, then \(\{X_t\}\) has its spectrum density function \[ f(\lambda) = \frac{1}{2 \pi} \sum_{k = - \infty}^\infty \gamma_k e^{-ik\lambda} \]
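Numerically, the sum can be truncated at some lag \(K\); using \(\gamma_{-k} = \gamma_k\) for a real series, the terms pair up into cosines. A sketch of mine (hypothetical function name, assuming the correlation function decays fast enough):

```python
import numpy as np

def spectral_density(gamma, lam):
    """Truncated f(lambda) = (1/2pi) sum_{|k| <= K} gamma_k e^{-ik lambda},
    given gamma = [gamma_0, ..., gamma_K] of a real stationary series."""
    tail = sum(g * np.cos(k * lam) for k, g in enumerate(gamma[1:], start=1))
    return (gamma[0] + 2.0 * tail) / (2.0 * np.pi)
```

For white noise (\(\gamma_0 = \sigma^2\), all other \(\gamma_k = 0\)) this gives the flat density \(\sigma^2 / 2\pi\), which is where the name "white" comes from.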

### Spectrum Density Function for Moving Average

Let \(\{\epsilon_t\}\) be a \(WN(0, \sigma^2)\) and let the sequence \(\{a_j\}\) be square summable; then the moving average over \(\{\epsilon_t\}\) \[ X_t = \sum_{j = -\infty}^\infty a_j \epsilon_{t - j} \] has spectrum density function \[ f(\lambda) = \frac{\sigma^2}{2 \pi} \left| \sum_{j = - \infty}^\infty a_j e^{ij\lambda} \right|^2 \]

## Hilbert Space for Stationary Time Series

Let \(L^2(X)\) be the set containing all finite linear combinations of random variables from a stationary time series \(\{X_t\}\): \[
L^2(X) = \left\{ \sum_{j = 1}^k a_j X(t_j) | a_j \in \mathbb{R}, t_j \in \mathbb{Z}, 1 \leq j \leq k, k \in \mathbb{N}_+ \right\}
\] Then \(L^2(X)\) is a linear space. With the inner product of elements \(X\) and \(Y\) defined as \(E(XY)\), the closure of \(L^2(X)\) is a complete inner product space, also called a **Hilbert space**.

# Time Series Models

In this section, three typical time series models (AR, MA, and ARMA) and their related properties will be reviewed.

## AR(p) Model

Let \(\{\epsilon_t\}\) be a \(WN(0, \sigma^2)\), and \(A(z)\) be a polynomial that satisfies \[ A(z) = 1 - \sum_{j = 1}^p a_j z^j \neq 0, \qquad |z| \leq 1 \] the difference equation \[ X_t = \sum_{j = 1}^p a_j X_{t - j} + \epsilon_t \] is called an autoregressive model, or \(AR(p)\) model. If a stationary time series satisfies the model, it is called an \(AR(p)\) series.

### Stable Solution of AR(p) Model

For the \(AR(p)\) model, its unique stationary solution is given as follows, where \(\mathscr B\) is the backshift operator (\(\mathscr B X_t = X_{t - 1}\)) and the sequence \(\{\psi_j\}\) is called the **Wold coefficients** of the time series. \[
X_t = A^{-1}(\mathscr B) \epsilon_t = \sum_{j = 0}^\infty \psi_j \epsilon_{t - j}
\] The Wold coefficients can be solved by the following recursive formula. \[
\psi_k = \begin{cases}
0 & k < 0 \\
1 & k = 0 \\
\sum_{j = 1}^p a_j \psi_{k - j} & k \geq 1
\end{cases}
\]
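The recursion is easy to run by hand or in code; here is my own sketch (hypothetical function name), which takes the coefficients \(a_1, \dots, a_p\) as a list:

```python
def ar_wold(a, kmax):
    """Wold coefficients psi_0..psi_kmax of an AR(p) model with
    coefficients a = [a_1, ..., a_p], via the recursion above."""
    psi = [1.0] + [0.0] * kmax
    for k in range(1, kmax + 1):
        psi[k] = sum(a[j - 1] * psi[k - j]
                     for j in range(1, len(a) + 1) if k - j >= 0)
    return psi
```

For \(AR(1)\) the recursion reduces to \(\psi_k = a_1^k\), matching the geometric decay of the impulse response.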

### Spectrum Density Function of AR(p) Series

For \(AR(p)\) series \(\{X_t\}\), its spectrum density function is \[ f(\lambda) = \frac{\sigma^2}{2 \pi} \left| \sum_{j = 0}^\infty \psi_j e^{ij\lambda}\right|^2 = \frac{\sigma^2}{2 \pi} \left| \frac{1}{A(e^{i\lambda})} \right|^2 \]

### Yule-Walker Equation

For an \(AR(p)\) series and \(n \geq p\), the following equations hold. \[ \begin{cases} \boldsymbol{\gamma}_n = \Gamma_n \boldsymbol{a}_n \\ \gamma_0 = \boldsymbol{\gamma}_n^T \boldsymbol{a}_n + \sigma^2 \end{cases} \] with \[ \boldsymbol{a}_n = (a_1, a_2, \dots, a_p, 0, \dots, 0)^T \\ \boldsymbol{\gamma}_n = (\gamma_1, \gamma_2, \dots, \gamma_n)^T \] The Yule-Walker equation can be used to calculate the correlation function given \(\boldsymbol{a}_n\) and \(\sigma^2\), or to estimate \(\boldsymbol{a}_n\) and \(\sigma^2\) given an estimated correlation function.
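Taking \(n = p\), the estimation direction is a single linear solve. A sketch of mine (hypothetical function name, NumPy for the linear algebra):

```python
import numpy as np

def yule_walker(gamma, p):
    """Solve Gamma_p a = gamma_p for the AR coefficients and recover
    sigma^2 = gamma_0 - gamma_p^T a, given gamma = [gamma_0, ..., gamma_p]."""
    Gamma = np.array([[gamma[abs(k - j)] for j in range(p)] for k in range(p)])
    g = np.array(gamma[1:p + 1])
    a = np.linalg.solve(Gamma, g)
    sigma2 = gamma[0] - g @ a
    return a, sigma2
```

In practice one plugs in the estimated \(\hat{\gamma}_k\); for an exact \(AR(1)\) correlation function \(\gamma_k = a_1^k \gamma_0\) with \(\gamma_0 = \sigma^2 / (1 - a_1^2)\), the solve recovers \(a_1\) and \(\sigma^2\) exactly.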

## MA(q) Model

Let \(\{\epsilon_t\}\) be a \(WN(0, \sigma^2)\), and \(B(z)\) be a polynomial that satisfies \[ B(z) = 1 + \sum_{j = 1}^q b_j z^j \neq 0, \qquad |z| < 1 \] the equation \[ X_t = \epsilon_t + \sum_{j = 1}^q b_j \epsilon_{t - j} \] is called a moving average model, or \(MA(q)\) model. If a stationary time series satisfies the model, it is called an \(MA(q)\) series.

### Correlation Function for MA(q) Series

Given an \(MA(q)\) series, its correlation function \(\{\gamma_k\}\) can be calculated as follows, with \(b_0 = 1\). \[ \gamma_k = \begin{cases} \sigma^2 \sum_{j = 0}^{q - k} b_j b_{j + k}, \qquad 0 \leq k \leq q \\ 0, \qquad k > q \end{cases} \]
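The cut-off at lag \(q\) is the signature property of MA series. A short sketch of mine (hypothetical function name) evaluating the formula:

```python
def ma_gamma(b, sigma2):
    """Autocovariances gamma_0..gamma_q of an MA(q) series with
    coefficients b = [b_1, ..., b_q] and b_0 = 1."""
    c = [1.0] + list(b)                        # c[j] = b_j
    q = len(c) - 1
    return [sigma2 * sum(c[j] * c[j + k] for j in range(q - k + 1))
            for k in range(q + 1)]
```

All lags beyond \(q\) are zero, so only \(\gamma_0, \dots, \gamma_q\) are returned.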

### Spectrum Density Function for MA(q) Series

Given an \(MA(q)\) series, its spectrum density function \(f(\lambda)\) can be calculated as follows. \[ f(\lambda) = \frac{\sigma^2}{2 \pi} \left| \sum_{j = 0}^q b_j e^{ij\lambda}\right|^2 = \frac{1}{2 \pi} \sum_{k = -q}^q \gamma_k e^{-ik\lambda} = \frac{\sigma^2}{2 \pi} \left| B(e^{i\lambda}) \right|^2 \]

## ARMA(p, q) Model

Let \(\{\epsilon_t\}\) be a \(WN(0, \sigma^2)\), and \(A(z)\), \(B(z)\) be polynomials with no common roots, \(b_0 = 1\), \(a_p b_q \neq 0\), and \[ A(z) = 1 - \sum_{j = 1}^p a_j z^j \neq 0, \qquad |z| \leq 1 \\ B(z) = \sum_{j = 0}^q b_j z^j \neq 0, \qquad |z| < 1 \] the difference equation \[ X_t = \sum_{j = 1}^p a_j X_{t - j} + \sum_{j = 0}^q b_j \epsilon_{t - j} \] is called an autoregressive moving average model, or an \(ARMA(p, q)\) model. If a stationary time series satisfies the model, it is called an \(ARMA(p, q)\) series.

### Stable Solution of ARMA(p, q) Model

For the \(ARMA(p, q)\) model, its unique stationary solution is given as follows, where the sequence \(\{\psi_j\}\) is called the **Wold coefficients** of the time series. \[
X_t = A^{-1}(\mathscr B) B(\mathscr B) \epsilon_t = \Phi(\mathscr B) \epsilon_t = \sum_{j = 0}^{\infty} \psi_j \epsilon_{t - j}
\] The Wold coefficients can be solved by the following recursive formula, with \(b_k = 0\) for \(k > q\). \[
\psi_k = \begin{cases}
0 & k < 0 \\
1 & k = 0 \\
b_k + \sum_{j = 1}^p a_j \psi_{k - j} & k \geq 1
\end{cases}
\]
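The ARMA recursion differs from the AR one only by the extra \(b_k\) term. My own sketch (hypothetical function name), with coefficients passed as lists:

```python
def arma_wold(a, b, kmax):
    """Wold coefficients psi_0..psi_kmax of an ARMA(p, q) model with
    a = [a_1, ..., a_p] and b = [b_1, ..., b_q]:
    psi_k = b_k + sum_j a_j psi_{k-j}, where b_k = 0 for k > q."""
    psi = [1.0] + [0.0] * kmax
    for k in range(1, kmax + 1):
        bk = b[k - 1] if k <= len(b) else 0.0
        psi[k] = bk + sum(a[j - 1] * psi[k - j]
                          for j in range(1, len(a) + 1) if k - j >= 0)
    return psi
```

Beyond lag \(q\) the \(b_k\) term drops out and the coefficients decay exactly like those of the pure AR part.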

### Correlation Function for ARMA(p, q) Series

Given an \(ARMA(p, q)\) series, its correlation function \(\{\gamma_k\}\) can be calculated as follows. \[ \gamma_k = \sigma^2 \sum_{j = 0}^\infty \psi_j \psi_{j + k} = \sum_{j = 1}^p a_j \gamma_{k - j} + \sigma^2 \sum_{j = 0}^q b_j \psi_{j - k} \]

### Spectrum Density Function for ARMA(p, q) Series

Given an \(ARMA(p, q)\) series, its spectrum density function \(f(\lambda)\) can be calculated as follows. \[ f(\lambda) = \frac{\sigma^2}{2 \pi} \left| \sum_{j = 0}^\infty \psi_j e^{ij\lambda}\right|^2 = \frac{1}{2 \pi} \sum_{k = -\infty}^\infty \gamma_k e^{-ik\lambda} = \frac{\sigma^2}{2 \pi} \left| \frac{B(e^{i\lambda})}{A(e^{i\lambda})} \right|^2 \]

# Time Series Prediction

One goal of analyzing time series is to make better predictions of their future terms. Since a time series can be considered as the sum of a trend component, a seasonal component, and a random component, and the random component is usually stationary, this section mainly focuses on the prediction of stationary time series.

## Linear Prediction for Time Series

Let \(Y\) and \(X_1, \dots, X_n\) be random variables with mean value \(0\) and finite variance, and write \(\boldsymbol{X} = (X_1, \dots, X_n)^T\). If there exists \(\boldsymbol{a} \in \mathbb{R}^n\) such that for any \(\boldsymbol{b} \in \mathbb{R}^n\), \[
E(Y - \boldsymbol{a}^T \boldsymbol{X})^2 \leq E(Y - \boldsymbol{b}^T \boldsymbol{X})^2
\] then \(\boldsymbol{a}^T \boldsymbol{X}\) is called the **best linear prediction** of \(Y\) by \(\boldsymbol{X}\), represented as \(L(Y | \boldsymbol{X})\).

### Mean Square Error for Best Linear Prediction

If \(\boldsymbol{a} \in \mathbb{R}^n\) satisfies \[ \Gamma \boldsymbol{a} = E(\boldsymbol{X}Y) \] where \(\Gamma = E(\boldsymbol{X}\boldsymbol{X}^T)\), then the following equations hold. \[ L(Y | \boldsymbol{X}) = \boldsymbol{a}^T \boldsymbol{X} \\ E(Y - L(Y | \boldsymbol{X}))^2 = EY^2 - E[L(Y | \boldsymbol{X})]^2 = EY^2 - \boldsymbol{a}^T \Gamma \boldsymbol{a} \]

## Stepwise Prediction for Time Series

Let \(\{Y_t\}\) be a stationary time series with mean value \(0\) and positive definite correlation matrices. Let \(\hat{Y}_1 = 0\) and \(W_j = Y_j - \hat{Y}_j\) be the prediction errors (innovations). Then the best linear prediction can be derived from the prediction errors: \[ \hat{Y}_{n + 1} = L(Y_{n + 1} | \boldsymbol{Y}_n) = \sum_{j = 1}^n \theta_{n, j} W_{n + 1 - j} \] The coefficients \(\{\theta_{n, j}\}\) and the mean square prediction error \(\nu_n = EW_{n + 1}^2\) satisfy the following recursive formula, for \(k = 0, 1, \dots, n - 1\). \[ \begin{cases} \nu_0 = \gamma_0 \\ \theta_{n, n - k} = \nu_k^{-1} \left[ \gamma_{n - k} - \sum_{j = 0}^{k - 1} \theta_{k, k - j} \theta_{n, n - j} \nu_j \right] \\ \nu_n = \gamma_0 - \sum_{j = 0}^{n - 1} \theta_{n, n - j}^2 \nu_j \end{cases} \]
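This recursion is known as the innovations algorithm. Here is my own sketch (the function name and dictionary layout are mine), computing the \(\theta_{m, j}\) and \(\nu_m\) from the correlation function:

```python
def innovations(gamma, n):
    """Innovations algorithm: from gamma = [gamma_0, ..., gamma_n], compute
    theta[(m, j)] = theta_{m, j} and the mean square errors v[0..n]."""
    v = [gamma[0]]                              # v[m] = nu_m
    theta = {}
    for m in range(1, n + 1):
        for k in range(m):                      # fills theta_{m, m-k}
            s = sum(theta.get((k, k - j), 0.0) * theta[(m, m - j)] * v[j]
                    for j in range(k))
            theta[(m, m - k)] = (gamma[m - k] - s) / v[k]
        v.append(gamma[0] - sum(theta[(m, m - j)] ** 2 * v[j] for j in range(m)))
    return theta, v
```

For an MA(1) correlation function (\(\gamma_0 = \sigma^2(1 + b_1^2)\), \(\gamma_1 = \sigma^2 b_1\), \(\gamma_k = 0\) otherwise), the \(\theta_{n, 1}\) converge to \(b_1\) and \(\nu_n\) to \(\sigma^2\) as \(n\) grows.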