Author: Xiaochen Xian

In this article, I introduce stationary time series and their properties. A stationary time series is one whose statistical properties do not depend on the time at which the series is observed. The concept is widely used in statistical analysis. A time series with a trend, a seasonal pattern, or both is not stationary. Such series, however, can often be converted to stationary time series using techniques like de-trending or differencing.

**1. What is a time series?**

A time series, as its name suggests, is a sequence of observations indexed by time. We denote the time series by $\{Y_t: t=\cdots, -1, 0, 1, \cdots\}$.

There are many examples of time series in real life. The most common way to display a time series is a time plot. Figure 1 shows the daily average temperature in Madison for August 2017 (data from the National Weather Service). The series fluctuates but shows a slowly decreasing trend.

Figure 1. The daily average temperature in Madison, August 2017.

Another example is shown in Figure 2, which plots the quarterly beer production in Australia over the years 1974 – 1985 (data from DataMarket). The beer production shows a clear seasonal pattern.

Figure 2. Quarterly beer production in Australia in 1974 – 1985.

**2. Summary measures of time series**

For a time series $\{Y_t: t=\cdots, -1, 0, 1, \cdots \}$, we introduce some summary measures that are important in describing it. Denote the probability density function (p.d.f.) of $Y_t$ by $f_t(y_t)$. Then some measures of $\{Y_t\}$ are:

- $\mu_t=\mathbb{E}(Y_t)=\int_{-\infty}^{\infty}y_tf_t(y_t)dy_t$ is the mean function of the variable $Y_t$, for $t=\cdots, -1, 0, 1, \cdots$.
- $\sigma_t^2=\mathbb{E}[(Y_t-\mu_t)^2] = \int_{-\infty}^{\infty}(y_t-\mu_t)^2f_t(y_t)dy_t$ is the variance function of the variable $Y_t$, for $t=\cdots, -1, 0, 1, \cdots$.
- Similarly, we can define the auto-covariance function of $Y_{t_1}$ and $Y_{t_2}$ as\[\gamma(t_1, t_2)=Cov(Y_{t_1}, Y_{t_2})=\mathbb{E}[(Y_{t_1}-\mu_{t_1})( Y_{t_2}-\mu_{t_2})]\]
- Based on the auto-covariance function, we define the auto-correlation function (ACF) as \[\rho(t_1, t_2)=Cor(Y_{t_1}, Y_{t_2})=\frac{Cov(Y_{t_1}, Y_{t_2})}{\sqrt{Var(Y_{t_1})} \cdot \sqrt{Var(Y_{t_2})}}=\frac{\gamma(t_1, t_2)}{\sigma_{t_1} \cdot \sigma_{t_2}}.\]

**3. Stationary time series**

In this section, two types of stationarity of time series are introduced.

A process $\{Y_t\}$ is **strictly stationary** if, for every $n$, every set of times $t_1, t_2, \cdots, t_n$ and every integer $s$, the joint probability distribution of the random variables $Y_{t_1}, Y_{t_2}, \cdots, Y_{t_n}$ is the same as the joint probability distribution of the random variables $Y_{t_1+s}, Y_{t_2+s}, \cdots, Y_{t_n+s}$.

From the definition, we can see that the probability distribution characteristics of a strictly stationary process remain the same as the process shifts over time. In other words, the probability characteristics are time invariant. In particular, this implies that for any $t_1, t_2$, the covariance function $Cov(Y_{t_1}, Y_{t_2})=\gamma(t_1-t_2)$ depends only on the time lag $t_1-t_2$, provided that each $Y_t$ has a finite second moment.

Strict stationarity imposes a very strong distributional constraint on the time series, which might not hold in real applications. Therefore, we introduce another type of stationarity, named **weak stationarity**. A process $\{Y_t\}$ is weakly stationary if it satisfies:

- $\mathbb{E}|Y_t|^2<\infty$ for all $t$.
- $\mathbb{E}(Y_t)=\mu$ does not depend on $t$.
- $Cov(Y_t, Y_{t+k})$ depends only on lag $k$, not on $t$.
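As a quick illustration of these conditions (the two processes below are standard textbook examples, not taken from the data in this article), let $\{\varepsilon_t\}$ be independent random variables with mean $0$ and finite variance $\sigma^2$. Then \[ \mathbb{E}(\varepsilon_t)=0, \qquad Cov(\varepsilon_t, \varepsilon_{t+k})=\begin{cases}\sigma^2, & k=0,\\ 0, & k\neq 0,\end{cases} \] so all three conditions hold and $\{\varepsilon_t\}$ is weakly stationary. In contrast, the running sum $Y_t=\sum_{i=1}^{t}\varepsilon_i$ for $t=1, 2, \cdots$ has \[ \mathbb{E}(Y_t)=0, \qquad Var(Y_t)=Cov(Y_t, Y_t)=t\sigma^2, \] which depends on $t$, so the third condition fails and the running sum is not weakly stationary.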

A natural question is whether strict stationarity implies weak stationarity, or the other way around. In fact, strict stationarity does not imply weak stationarity, since a strictly stationary time series might not have a finite second moment (a sequence of i.i.d. Cauchy random variables is a standard example). On the other hand, weak stationarity does not imply strict stationarity either, since a weakly stationary time series can have different distributions at different times $t$. In some special cases, however, the two notions are equivalent. For example, if the process $\{Y_t\}$ is weakly stationary and Gaussian, then it is strictly stationary. The proof follows from the fact that a Gaussian distribution is fully determined by its mean vector and covariance matrix, both of which are unchanged by a time shift.

Since the covariance function of a stationary time series depends only on the lag and not on time, we can write the covariance function and the ACF as \[ \gamma(k)=Cov(Y_t, Y_{t+k}), \] \[ \rho(k)=Cor(Y_t, Y_{t+k})=\frac{\gamma(k)}{\gamma(0)}, \quad k=0, \pm 1, \pm 2, \cdots. \]

Some basic properties of the auto-covariance function are

- $|\gamma(k)| \leq \gamma(0)$ for all $k$. This can be proved using the Cauchy-Schwarz inequality.
- $\gamma(k)$ is even. That is, $\gamma(-k)=\gamma(k)$.
- $\gamma(k)$ is non-negative definite, i.e., for every $n$, times $t_1, t_2, \cdots, t_n$, and constants $c_1, c_2, \cdots, c_n$, \[ \sum_{i=1}^n\sum_{j=1}^nc_ic_j\gamma(t_i-t_j)\geq0. \]
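The first and third properties can each be checked in one line. The following is a sketch that assumes a weakly stationary $\{Y_t\}$ with finite variance. By the Cauchy-Schwarz inequality, \[ |\gamma(k)|=|Cov(Y_t, Y_{t+k})| \leq \sqrt{Var(Y_t)}\cdot\sqrt{Var(Y_{t+k})}=\gamma(0). \] For non-negative definiteness, note that the variance of any linear combination of the observations is non-negative: \[ 0 \leq Var\Big(\sum_{i=1}^n c_iY_{t_i}\Big)=\sum_{i=1}^n\sum_{j=1}^n c_ic_j\,Cov(Y_{t_i}, Y_{t_j})=\sum_{i=1}^n\sum_{j=1}^n c_ic_j\gamma(t_i-t_j), \] where the last step uses $Cov(Y_{t_i}, Y_{t_j})=\gamma(t_j-t_i)$ together with the evenness of $\gamma$.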

Some basic properties of the ACF are

- $-1 \leq \rho(k) \leq 1$ for all $k$, with $\rho(0)=1$.
- $\rho(k)$ is even. That is, $\rho(-k)=\rho(k)$.
- $\rho(k)$ is non-negative definite.

The ACF $\{\rho(k)\}$ is of prime interest in the study of stationary processes, because it summarizes the relations between values at different time lags.

To study a given time series $\{Y_t\}$ with observations $y_1, y_2, \cdots, y_T$, the summary measures have to be estimated from the sample in the following way.

- Sample mean \[\hat{\mu}=\bar{y}=\frac{1}{T} \sum_{t=1}^T y_t \] is an estimator of $\mu$.

- Sample auto-covariance \[ \hat{\gamma}(k)=C(k)=\frac{1}{T} \sum_{t=1}^{T-k}(y_t-\bar{y})(y_{t+k}-\bar{y}) \] is an estimator of $\gamma(k)$ for $k=0, 1, 2, \cdots, K$, where $K$ is small relative to $T$.

- Sample ACF \[ \hat{\rho}(k)=r(k)=\frac{\hat{\gamma}(k)}{\hat{\gamma}(0)}=\frac{\sum_{t=1}^{T-k}(y_t-\bar{y})(y_{t+k}-\bar{y})}{\sum_{t=1}^T(y_t-\bar{y})^2} \] is an estimator of $\rho(k)$ for $k=0, 1, 2, \cdots, K$, where $K$ is small relative to $T$.
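The three estimators above translate almost directly into code. Here is a minimal Python sketch using only the standard library. The function names and the four-point series are made up for illustration, and the sketch is not meant to replace a statistics package.

```python
def sample_mean(y):
    """Sample mean: (1/T) * sum of y_t."""
    return sum(y) / len(y)


def sample_autocovariance(y, k):
    """C(k) = (1/T) * sum_{t=1}^{T-k} (y_t - ybar) * (y_{t+k} - ybar)."""
    T = len(y)
    if not 0 <= k < T:
        raise ValueError("lag k must satisfy 0 <= k < len(y)")
    ybar = sample_mean(y)
    # Note the divisor is T (not T - k), matching the formula in the text.
    return sum((y[t] - ybar) * (y[t + k] - ybar) for t in range(T - k)) / T


def sample_acf(y, max_lag):
    """r(k) = C(k) / C(0) for k = 0, 1, ..., max_lag."""
    c0 = sample_autocovariance(y, 0)
    return [sample_autocovariance(y, k) / c0 for k in range(max_lag + 1)]


# Made-up series, small enough to verify by hand.
y = [2.0, 4.0, 6.0, 8.0]
r = sample_acf(y, 2)
print(r)
```

For this toy series $\bar{y}=5$, $C(0)=5$, $C(1)=1.25$ and $C(2)=-1.5$, so the sample ACF is $r(0)=1$, $r(1)=0.25$ and $r(2)=-0.3$.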

As an example, Figure 3 shows the sample ACF for the time series in the daily temperature example. The ACF of a time series can be plotted easily in R by calling `acf()` on a `ts` (time series) object. The dashed lines in Figure 3 mark the 95% bounds within which the sample correlations are expected to fall when the true correlations are zero. It can therefore be observed that only $\rho(0)$, which always equals 1, and $\rho(1)$ are significant at significance level $\alpha=0.05$. We can regard this type of ACF as coming from a stationary time series, and the series could thus be modeled as an order 1 moving average (MA) model, since no correlation beyond lag 1 is significant.
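The dashed bounds can be reproduced by hand. The sketch below assumes the usual large-sample approximation, in which the bounds sit at $\pm z_{0.975}/\sqrt{T}$; this is the default that R's `acf()` plot draws. The helper names and the example correlations are hypothetical, and $T=31$ is simply the number of days in August.

```python
from math import sqrt
from statistics import NormalDist


def acf_bounds(n, level=0.95):
    """Half-width of the approximate band for sample correlations
    when the true correlations are zero: z / sqrt(n)."""
    z = NormalDist().inv_cdf((1 + level) / 2)  # about 1.96 for level=0.95
    return z / sqrt(n)


def significant_lags(acf_values, n, level=0.95):
    """Lags k >= 1 whose sample correlation falls outside the band.
    Lag 0 is skipped because r(0) = 1 by construction."""
    bound = acf_bounds(n, level)
    return [k for k, r in enumerate(acf_values) if k > 0 and abs(r) > bound]


bound = acf_bounds(31)                  # 31 daily observations in August
made_up_acf = [1.0, 0.45, 0.20, -0.10]  # illustrative values, not from Figure 3
flagged = significant_lags(made_up_acf, 31)
print(round(bound, 3), flagged)
```

With 31 observations the band is roughly $\pm 0.35$, so in the made-up example only lag 1 is flagged.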

Figure 3. ACF for the time series in the daily temperature example.

Similarly, the ACF for the time series in the Australian beer example is shown in Figure 4. The correlations show a clear periodic pattern, and all correlations for the first and third quarters are significant at significance level $\alpha=0.05$. In practice, we have to remove the seasonal pattern from this type of data before analyzing it as a stationary time series.

Figure 4. ACF for the time series in the Australian beer example.

The ACF plot is an important tool for identifying non-stationary time series. For a stationary time series, the ACF drops to zero relatively quickly (as shown in Figure 3), while the ACF of non-stationary data decreases slowly (as in Figure 4).

A typical way to make a time series stationary is to compute the differences between consecutive observations $y_{t+1}$ and $y_t$, which is known as differencing. Differencing can help stabilize the mean of a time series by removing changes in its level, thereby eliminating or reducing trend and seasonality. In addition, transformations such as logarithms can help stabilize the variance of a time series.
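To make the idea concrete, here is a small Python sketch of differencing. The `difference` helper and both toy series are made up for illustration. Seasonal differencing at lag 4 is one common choice for quarterly data, shown here as an assumption rather than as the method used for Figure 2.

```python
def difference(y, lag=1):
    """Return y_t - y_{t-lag} for t = lag, ..., T-1.
    lag=1 gives the usual first difference between consecutive observations."""
    if not 1 <= lag < len(y):
        raise ValueError("lag must satisfy 1 <= lag < len(y)")
    return [y[t] - y[t - lag] for t in range(lag, len(y))]


# A series with a linear trend: the first difference is constant.
trend = [10.0, 12.0, 14.0, 16.0, 18.0]
first_diff = difference(trend)

# A purely seasonal quarterly pattern repeated over two years:
# differencing at lag 4 removes it completely.
seasonal = [5.0, 9.0, 7.0, 3.0, 5.0, 9.0, 7.0, 3.0]
seasonal_diff = difference(seasonal, lag=4)

print(first_diff)
print(seasonal_diff)
```

The differenced trend series is constant at 2, and the seasonally differenced series is identically zero. Real data would leave a noisy remainder instead, and that remainder is what would then be checked for stationarity.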

Just a thought: the goal of time series forecasting is to predict what may happen in the future based on a large volume of existing data. So I wonder, can we smooth non-stationary data into stationary data and thereby reduce the influence of outliers or accidental cases?