Show Summary Details

Page of

PRINTED FROM OXFORD HANDBOOKS ONLINE (www.oxfordhandbooks.com). © Oxford University Press, 2018. All Rights Reserved. Under the terms of the licence agreement, an individual user may print out a PDF of a single chapter of a title in Oxford Handbooks Online for personal use (for details see Privacy Policy and Legal Notice).

Subscriber: null; date: 16 January 2019

Analysis of economic data with multiscale spatio-temporal models

Abstract and Keywords

This article discusses the use of Bayesian multiscale spatio-temporal models for the analysis of economic data. It demonstrates the utility of a general modelling approach for multiscale analysis of spatio-temporal processes with areal data observations in an economic study of agricultural production in the Brazilian state of Espìrito Santo during the period 1990–2005. The article first describes multiscale factorizations for spatial processes before presenting an exploratory multiscale data analysis and explaining the motivation for multiscale spatio-temporal models. It then examines the temporal evolution of the underlying latent multiscale coefficients and goes on to introduce a Bayesian analysis based on the multiscale decomposition of the likelihood function along with Markov chain Monte Carlo (MCMC) methods. The results from agricultural production analysis show that the spatio-temporal framework can effectively analyse massive economics data sets.

12.1 Introduction

Many economic processes are naturally spatio-temporal. Often times, quantities related to these processes are available as areal data, partitioned by the geographical domain of interest, for example totals or averages of some variable of interest for each state of a country. Furthermore, these economic processes may often be considered at several different levels of spatial resolution. Here we present a general modelling approach for multiscale analysis of spatio-temporal processes with areal data observations and illustrate the utility of this approach for economic applications with an analysis of agricultural production per county in the Brazilian state of Espírito Santo from 1990 to 2005.

In practice, economic data may be considered at different levels of resolution, e.g. census tract, county, state, country, continent. Our analysis of agricultural production in Espírito Santo is based on the states geopolitical organization as depicted in Figure 12.1. Specifically, each state in Brazil is divided into counties, microregions and macroregions; counties are grouped into microregions, which are then grouped into macroregions, according to their socioeconomic similarity. Thus, our analysis considers three levels of resolution: county, microregion, and macroregion.

The county-level data are aggregated to obtain the data at the other scales of resolution. Then, at each time point, the observed data are decomposed into empirical multiscale coefficients that represent a vector of weighted differences between each subregion and its descendants. To this extent, we consider a multiscale decomposition, proposed by Kolaczyk and Huang (2001), which includes Haar wavelet decompositions (Vidakovic, 1999) as particular case. This decomposition is quite flexible and can accommodate irregular grids and heteroscedastic errors. Finally, we assume that the empirical multiscale coefficients correspond to latent multiscale coefficients that evolve through time following singular random walks.

An interesting feature of our approach is that the multiscale coefficients have an interpretation of their own; thus, the multiscale spatio-temporal framework (p. 296)

Click to view larger

Fig. 12.1 Geopolitical organization of Espírito Santo State by (a) counties, (b) microregions, and (c) macroregions.

may offer new insight on understudied multiscale aspects of spatio-temporal observations. Moreover, the analysis of the estimated multiscale coefficients that we propose sheds light on the similarities and differences in agricultural production between regions within each scale of resolution. In addition, the temporal change in relative agricultural importance of those regions is explicitly described by the evolution of the estimated latent multiscale coefficients.

Multiscale spatio-temporal modeling has only recently been addressed in the literature, with just a hand full of papers published to date. One early attempt at multiscale spatio-temporal modeling was given by Berliner et al. (1999), where the authors propose a hierarchical model for turbulence and consider a wavelet decomposition for an underlying latent process, and allow the wavelet coefficients to evolve through time. Note that the multiscale decomposition that they consider is a wavelet decomposition applied to data on a regular grid. In contrast, we employ a decomposition that can be applied naturally to data on irregular grids without having to appeal to lifting schemes (Sweldens, 1996). Additionally, wavelets yield a decomposition where each node, in a given resolution level, gives rise to two children nodes (i.e. a dyadic expansion); this would lead to a two-dimensional wavelet coefficient but, as the sum of its elements is zero, only one of its elements need to be retained. Conversely, in our construction, there is no limitation on the number of children nodes at each level of resolution in the decomposition. For example, in the agricultural production application each macroregion has between two and five descendant microregions, and each microregion has between two and twelve descendant counties. As each empirical multiscale coefficient is a vector with number of elements equal to the corresponding number of descendants, and the sum of its elements is zero, the distribution of the empirical multiscale coefficient is a degenerate Gaussian distribution.

(p. 297) A more recent reference related to our work is Johannesson et al. (2007). Although the authors propose a multiscale decomposition that uses a coarse to fine construction, there are several differences between their approach and ours. First, they consider a regular grid while we consider more general data structures. Second, they estimate the relationship between the different levels of resolution subject to a mass balance constraint; conversely, we obtain the relationship between the different levels of resolution as a function of the different variances of the different counties, and use the resulting relationship in order to define our multiscale coefficients. Lastly, they assume temporal dynamics only for the aggregated coarse level. In contrast, we include dynamics not only for the aggregated coarse level, but also for the multiscale coefficients at each resolution level. The inclusion of the latter dynamics is fundamentally difficult due to the singular nature of the multiscale coefficients and it is one of the main contributions of the approach developed in this paper.

A desirable property for methods that provide estimates at different scales of resolution is that the estimates respect deterministic relationships between the different scales of resolution. For example, the sum of per county agricultural production within a given microregion is equal to the total production of that microregion. Here we assume that there is an underlying latent process that evolves through time such that the observations per county are independent conditional on the underlying process. Moreover, conditional on this underlying process we can compute the expected value of the observations at the different resolution levels; these expected values should respect the fact that the sum of per county agricultural production within a given microregion is equal to the total production of that microregion. This is achieved by requiring that the latent multiscale coefficients satisfy the same constraint as their empirical multiscale coefficients counterparts; namely, that the sum of its elements is equal to zero. Moreover, this is accomplished by assuming that the latent multiscale coefficients evolve through time according to singular random walks, with singular covariance matrix proportional to the singular covariance matrix of the empirical multiscale coefficients. As a consequence, our framework insures that deterministic relationships between different levels of resolution are automatically respected for both the observations and the latent process as well as for the estimated latent process.

Finally, we propose a Bayesian analysis for the spatio-temporal multiscale model, performed in three steps. First, we obtain a preliminary estimate of the variance for each county using a standard first-order dynamic linear model (West and Harrison, 1997). Then, based on these preliminary estimates of the county variances we compute the empirical multiscale coefficients. The original likelihood function for the county observations is then decomposed into the product of the likelihood for the coarse level observations and the (p. 298) likelihood for the empirical multiscale coefficients at the different levels of resolution. This allows us to implement a very efficient MCMC algorithm which combines the standard Forward Filter Backward Sampler (FFBS) (Carter and Kohn, 1994; Frühwirth-Schnatter, 1994) with a novel Singular Forward Filter Backward Sampler (SFFBS) specially tailored for the simulation of the latent multiscale coefficients.

Since the seminal work by Donoho and Johnstone (1994) on nonparametric regression using wavelets, wavelet-based multiscale methods and models have flourished. In fact, for many researchers the term multiscale means use of wavelets. However, this is far from the truth: there are many multiscale methods and models for processes defined at several scales of resolution that are not wavelet-based. Our multiscale spatio-temporal model fits better in this last category as it explores a non-wavelet-based multiscale decomposition useful for Gaussian areal data observed on an irregular grid. For this reason, the discussion of other multiscale modelling techniques is out of the scope of this chapter. For a comprehensive discussion on multiscale modelling see Ferreira and Lee (2007) and the references therein.

The remainder of this paper is organized as follows. Section 12.2 describes multiscale factorizations for spatial processes. Section 12.3 presents an exploratory multiscale data analysis and provides the motivation for multiscale spatio-temporal models. The temporal evolution of the underlying latent multi-scale coefficients is introduced in Section 12.4. Section 12.5 presents a Bayesian analysis based on the multiscale decomposition of the likelihood function along with MCMC. Section 12.6 illustrates our methodology through analysis of data from our motivating example. Conclusions and possible extensions are presented in Section 12.7. For convenience of exposition the derivation of all full conditional distributions is left to the Appendix.

12.2 Multiscale factorization

At each time point we decompose the data into empirical multiscale coefficients using the spatial multiscale modelling framework of Kolaczyk and Huang (2001), where the authors consider spatial aggregations by sums. In what follows we review their approach, in the context of our motivating example, using the notation of Ferreira and Lee (2007, Chapter 9). For simplicity of notation, in this section, we omit any temporal dependence.

In our motivating example, interest lies in agricultural production observed at the county level, which we assume is the Lth level of resolution (i.e. the finest level of resolution), on a partition of a domain $S⊂ ℝ 2$ denoted by {BL1, … , BLnL}, with $B Lj ∈S, j=1,…, n L , B Li ∩ B Li = 0 , i≠j$ and $∪ j=1 n L B Lj =S$. In our case, the region of interest S is the Brazilian state of Espírito (p. 299) Santo, L = 3, and BLj = B3j refers to the jth county. Furthermore, for the jth county, let yLj and μLj = E(yLj) respectively denote agricultural production and its latent expected value.

We assume that yL1, … , yLnL are conditionally independent given the latent mean process $μ L,1: n L = ( μ L1 ,… μ LnL ) ′$. This is a fairly reasonable assumption and is equivalent to assuming independent measurement errors. Note that in general the latent mean process will be correlated and this dependence will carry over to the observations $y L,1: n L = ( y L1 ,… y L n L ) ′$. Thus, the marginal distribution of the observations will exhibit dependence.

Interest not only lies in the latent mean process μ at the Lth scale of resolution but also in the process at aggregated coarser scales. At the lth scale of resolution, the domain S is partitioned in nl subregions Bl1, … , Blnl, l = 1, … , L. In our case, l = 1 corresponds to the macroregion resolution level, whereas l = 2 corresponds to the microregion level. It is assumed that the partition at level l + 1 is a refinement of the partition at level l. That is, $B lj = ∪ ( l+1, j ′ )∈ D lj B l+1, j ′$ where Dlj is the set of descendants of subregion j at level l, and DljDli = ∅, ij . Note that the number of descendants does not need to be constant. In what follows we denote the number of descendants of a particular subregion (l, j) by mlj. This construction is very similar to the construction of Basseville et al. (1992a,b) for multiscale models on trees.

With this notation at hand, the aggregated measurements at the lth level of resolution are recursively defined by

$Display mathematics$

Analogously, the aggregated agricultural production mean process is defined by

$Display mathematics$

Next, let A be a set of subregions and denote by yA and μA the corresponding vectors of measurements and means, respectively. Then yDlj represents the vector of descendants of ylj. Note that ylj is a deterministic function of its descendants; thus, the distribution of yDlj conditional on ylj is a degenerate Gaussian distribution with a singular covariance matrix.

In addition, let $σ lj 2$ denote the variance of ylj and define $Σ l =diag( σ l,1: n l 2 )$. Assuming yL1, … , yLnL are conditionally independent given μL,,1:nL and $σ L,1: n L 2$, the variance at subregion (l, j) can be recursively computed as

$Display mathematics$

(p. 300) With these assumptions and as explained in Appendix B.1, the likelihood function can be factorized as (Kolaczyk and Huang, 2001)

$Display mathematics$

where $v lj = σ D lj 2 / σ lj 2 , θ lj = μ D lj − v lj μ lj$ and $Ω lj = Σ Dlj − σ lj −2 σ D lj 2 ( σ D lj 2 ) ′ ,l=1,…,L−1,j=1,…, n l$. Note that the scale specific parameter θlj is a vector corresponding to the difference between the mean level process at the descendants of subregion Blj and the scaled mean level process at Blj. Finally, $θ lj e = y D lj − v lj y lj$ is an empirical estimate of θlj, and $θ lj e | y lj , μ L , σ L 2 ∼N( θ lj , Ω lj )$.

This factorization is equivalent to a reparametrization of the model, where μL,,1:nL are replaced by the mean level process at the coarsest level μ1,1:n1 and by the scale-specific multiscale parameters θljs.

12.3 Exploratory multiscale data analysis

In this section we present an exploratory analysis of agricultural production data, from 1990 to 2005, in Espírito Santo State across several levels of resolution. The data are in inflation-corrected monetary value in the Brazilian currency which is the Real. The first column of Figure 12.2 displays time series of total per macroregion agricultural production from 1990 to 2005 in Espírito Santo State. As a result of these plots we can immediately make several observations. First, Macroregion 1 shows a rapid increase in production from 1993 to 1995 and extremely volatile behavior from 1995 to 2005. Second, Macroregion 2 presents a strong steady increase in production from 1990 to 2005. Additionally, Macroregion 3 displays a less impressive increase in production, whereas Macroregion 4 shows a striking increase in production from 1993 to 1997 and a not less surprising sharp decrease from 1997 to 2001.

The second column of Figure 12.2 displays the time series of agricultural production, for each macroregion, from 1990 to 2005 in Espírito Santo State per microregion. The disaggregation of production from Macroregion 1 shows that Microregion 2 has the largest production in almost all the years considered, and also the largest volatility. Microregions 1 and 3 have comparatively fairly low production while Microregions 4 and 5 have moderate and increasing production.

Macroregion 2 is formed by Microregions 6 and 7, and both microregions had an increase in production during the period of interest. Macroregion 3 contains Microregions 8, 9, and 10 and each of these microregions showed an increase in production, with Microregion 8 also exhibiting high volatility. (p. 301)

Click to view larger

Fig. 12.2 Agriculture production per macroregion of Espírito Santo State. Macroregion 1 (a–c), 2 (d–f), 3 (g–i), and 4 (j–l). Macroregion totals (a,d,g,j); macroregion disaggregated by microregion (b,e,h,k); empirical multiscale coefficients (c,f,i,l). Note, for ease of comparison, all plots have been constructed using the same range on the agricultural production-axis (y-axis).

Finally, Macroregion 4 contains Microregions 11 and 12; both microregions exhibit similar behaviour with rapid increase from 1993 to about 1998 and then rapid decrease from 1999 to 2001.

The third column of Figure 12.2 displays the temporal evolution of the empirical multiscale coefficients, for different macroregions. Recall that, for each macroregion, the empirical multiscale coefficient, at a given time, is a (p. 302) vector with each element corresponding to a descendant microregion. Thus, for Macroregion 1 we have 5 time series representing the evolution of the empirical multiscale coefficient. Moreover, this figure illustrates several other interesting aspects. First, the empirical multiscale coefficient varies through time in a fairly smooth manner. Second, the coefficient corresponding to Microregion 2 is the most prominent, clearly indicating that the microregion lost relative importance from 1996 to 1998, but then regained importance within Macroregion 1. Unfortunately, due to noise, it is more difficult to evaluate what is transpiring with the other microregions contained in Macroregion 1.

Macroregions 2 and 4 have only two microregions each, and thus at each time point the respective empirical multiscale coefficient has only two elements. These two elements sum to zero, and thus through time the respective series have symmetric behaviour around zero. Finally, Macroregion 3 contains three microregions, all of which do not seem to differ significantly.

Another interesting aspect is that the empirical multiscale coefficient for all of the macroregions seem to be noisy versions of latent multiscale coefficients that vary somewhat smoothly over time. Thus, it makes sense to consider a model that focuses on the temporal evolution of the mean level process at the coarsest level and on the temporal evolution of multiscale coefficients. The multiscale coefficient parameter acts similar to a wavelet coefficient in wavelet analysis; that is, it provides the detail information necessary to recover the mean process at resolution level l + 1 from the mean process at resolution level l. Thus, we can think of the multiscale factorization as providing a hierarchical structure, where each level of the structure is related to a particular resolution level and its specific mean process behavior.

12.4 Dynamic multiscale modelling

12.4.1 Multiscale coefficients and dynamic priors

In order to analyse the data arising out of our application of interest and to generally increase the flexibility of the approach reviewed in Section 12.2, we introduce temporal dynamics to the spatial multiscale framework. In particular, we add a subscript t to indicate time and assume that the mean level process at the coarsest level μt1 and the scale-specific parameter θtlj evolve through time according to random walks

(12.1)

$Display mathematics$

(12.2)

$Display mathematics$

Fundamental to the construction of our multiscale spatio-temporal modeling approach is the specification of the evolution covariance matrices W and ϒlj. (p. 303) These matrices have to be properly specified in order to preserve the special correlation structure of the elements of the scale specific parameters θtljs and to respect the constraint $1 ′ m lj θ tlj =0$.

The model we propose sets out a structure for the covariance matrices defined in (12.1) and (12.2). Particularly, for the covariance matrix at the coarsest level, we assume W = diag(W1:n1), with $W k = ξ k σ 1k 2$. This parametrization is extremely useful since ξk can be interpreted as the signal-to-noise ratio at the coarsest level and, as a result, it is easier to assign a meaningful prior to ξk than to Wk. Now, for the covariance matrix of the evolution innovations of the multiscale coefficients, we assume $ϒ lj = ψ lj Ω lj$. This specification both preserves the special correlation structure of the θtljs and implies the constraint $1 ′ m lj θ tlj =0$.

12.4.2 Priors for the hyperparameters

The parameters μ01k and θ0lj provide the initial conditions at time t = 0. Let D0 be the prior information about the system at time 0, and recursively define the accumulated information about the system at time t as Dt = Dt−1 ∪ {yL}. Then, we assume the following conjugate priors for μ01k and θ0lj

$Display mathematics$

and

$Display mathematics$

where m01k, c01k, m0lj, and C0lj are fixed a priori. Note that noninformative uniform priors p(μ01k) ∝ 1 and $p( θ 0lj )α1( p ′ D lj θ 0lj =0 )$ where 1(·) is the indicator function and is equal to 1 if the condition is true, and is equal to 0 otherwise, can be obtained as limiting cases of the conjugate priors when c01k → ∞ and C0lj → ∞.

The signal-to-noise ratio parameters ξk and ψlj, described in Section 12.4.1, are most likely small. Otherwise, the components of the latent process would vary too much over time making it difficult to predict the several multi-scale components of the multivariate spatio-temporal process. As a result, we expect ξk, k = 1, … , n1, and ψlj, l = 1, … , L, j = 1, … , nl, to be much smaller than 1. Therefore, we assume that the prior distribution for each of ξk is IG(0.5τk, 0.5κk) with density $p( ξ k )α ξ k − 0.5 τk −1 exp( − 0.5 κk / ξ k )$, where τk and κk are fixed a priori such that there is a high probability that ξk is smaller than 0.3. Similarly, the prior distribution for ψlj is IG(0.5ϱlj, 0.5ςlj) where ϱlj and ςlj are fixed a priori such that there is a high probability that ψlj is smaller than 0.3.

(p. 304) 12.4.3 The multiscale spatio-temporal model

In the previous sections we have detailed an approach to multiscale spatio-temporal modeling. For the sake of clarity we summarize our multiscale spatio-temporal model.

Observation equation:

(12.3)

$Display mathematics$

where $Σ L =diag( σ L,1: n L 2 )$.

Multiscale decomposition of the observation equation:

(12.4)

$Display mathematics$

(12.5)

$Display mathematics$

System equations:

$Display mathematics$

12.5 Estimation

12.5.1 Empirical Bayes estimation of $σ tLj 2 , v j$ and Ωlj

The parameter νlj is the vector of relative volatility of the descendants of subregion (l, j), and Ωlj is the singular covariance matrix of the empirical multiscale coefficient of subregion (l, j). Here we use empirical Bayes (Carlin and Louis, 2000) for the estimation of νlj and Ωlj, that is, we estimate these parameters upfront and then hold them fixed at their estimates when we perform the analysis of the other parameters of the model.

The parameters νlj and Ωlj, l = 1, … , L, j = 1, … , nl, are deterministic functions of $σ tLj 2 ,j=1,…, n L$ the conditional variance of ytLj given μtLj. A preliminary analysis of the agricultural production of each county through time indicates that $σ tLj 2 ,j=1,…, n L$, may be assumed constant over time. Henceforth, we assume that $σ tLj 2 = σ Lj 2 ,t=1,…,T,j=1,…, n L$. In order to obtain an estimate of $σ Lj 2$ we perform a univariate time series analysis for each county using first-order dynamic linear models (West and Harrison, 1997). These analyses yield estimates $σ ˜ Lj 2$ that we use for the empirical Bayes estimation of νlj and Ωlj. More specifically, we estimate these parameters by

$Display mathematics$

(p. 305) where the estimates $σ ˜ Dlj 2 , σ ˜ lj 2$ and $∑ ˜ Dlj$ are directly computed from $σ ˜ Lj 2 ,j=1,…, n L$.

12.5.2 Posterior exploration

Here we present a Markov chain Monte Carlo algorithm for performing a Bayesian analysis of our multiscale multivariate spatio-temporal model. More specifically, we build a Gibbs sampler (Geman and Geman, 1984; Gelfand and Smith, 1990) to explore the posterior distribution. We block the parameters of the model as follows: $ξ 1 ,…, ξ n 1 , ψ 11 ,…, ψ 1 n 1 ,…, ψ L−1,1 ,…, ψ L−1, n L−1$, $μ 0:T,11 ,…, μ 0:T,1 n 1 , θ 0:T,11 ,…, θ 0:T,1 n 1 ,…, θ 0:T,L−1,1 ,…, θ 0:T,L−1, n L−1$. It can be shown that, given $σ L,1: n L 2 , ξ 1: n 1$, and $ψ 1,1: n 1 ,…, ψ L−1,1: n L−1$, the vectors $μ 0:T,11 ,…, μ 0:T,1 n 1 , θ 0:T,11 ,…, θ 0:T,1 n 1 ,…, θ 0:T,L−1,1 ,…, θ 0:T,L−1, n L−1$ are conditionally independent and normally distributed a posteriori. As a result, simulation of $μ 0:T11 ,…, μ 0:T,1 n 1 , θ 0:T,11 ,…, θ 0:T,1 n 1 ,…, θ 0:T,L−1,1 ,…, θ 0:T,L−1, n L−1$ can be performed in parallel at each iteration of the MCMC algorithm. In this direction, we simulate each $μ 0:T,1k ,k=1,…, n 1$, from its full conditional distribution using the Forward Filter Backward Sampler (FFBS) (Carter and Kohn, 1994; Frühwirth-Schnatter, 1994). Simulation of each θ0:T,lj, l = 1, … , L − 1; j = 1, … , nl, poses a non-trivial step in the MCMC procedure because the covariance matrix of the system equation is singular. Therefore, to accommodate this simulation we describe a modified FFBS for models with singular system equation covariance matrix (see Appendix B.3).

Simulation of ξk and ψlj can be performed with Gibbs steps sampling directly from their full conditional distributions as we describe below. Specifically, the parameter ξk, k = 1, … , n1, is sampled from its full conditional distribution: $ξ k | μ 0:T,1k , σ 1k 2 , D T ∼IG( 0.5 τ k * , 0.5 κ k * )$, where $τ k * = τ k +T$ and $κ k * = κ k + σ 1k 2 ∑ t=1 T ( μ t1k − μ t−1,1k ) 2$. Finally, the parameter ψlj, l = 1, … , L, j = 1, … , nl, is sampled from its full conditional distribution: $ψ lj | θ 0:T,lj , D T ∼IG( 0.5 ϱ lj * ,0.5 ς lj * )$ where $ϱ lj * = ϱ jl +T( m lj −1 )$ and $ς lj * = ς lj + ∑ t=1 T ( θ tlj − θ t−1,lj ) ′ Ω lj − ( θ tlj − θ t−1,lj )$, where $Ω lj −$ is a generalized inverse of Ωlj.

Simulation of μ0:T,1k is performed with the usual FFBS as introduced and described in Carter and Kohn (1994) and Frühwirth-Schnatter (1994). This step is nowadays fairly standard, therefore we omit the exact equations for the sake of brevity. For a comprehensive discussion, see Gamerman and Lopes (2006).

Simulation of θ·lj is performed with the Singular Forward Filter Backward Sampler (SFFBS) which we describe in Appendix B.3.

12.5.3 Reconstruction of the latent mean process

One of the main interests of any multiscale analysis is the estimation of the latent agricultural production mean process at each scale of resolution. This (p. 306) task can be performed using the sample of the posterior distribution simulated with the MCMC algorithm described above and in Appendix B. In particular, from the gth draw from the posterior distribution, we can recursively compute the corresponding latent mean process at each level of resolution using the equation

$Display mathematics$

proceeding from the coarsest to the finest resolution level. With these draws, we can then compute the posterior mean, standard deviation and credible intervals for the latent mean process. Finally, the results can be presented graphically via maps and dynamic movies of the estimated latent mean process over the region of interest.

12.6 Agricultural production in Espírito Santo

Espírito Santo State has a tropical humid climate, with an average annual temperature around 73 degrees Fahrenheit and rain precipitation around 1400 millimeters per year. The state borders the Atlantic Ocean to the east, and the states of Rio de Janeiro to the south, Bahia to the north and Minas Gerais to the west. Its main agriculture products are coffee, sugar cane, manioc, coconut, and tropical fruits such as banana, passion fruit, and papaya. Fluctuations in the total value of agricultural production may be caused by price changes, expansions and reductions in cultivated area, variations in weather, crop diseases, and productivity improvement.

Here we analyse the data on agricultural production in Espírito Santo with the multiscale spatio-temporal model using the estimation procedure described in Section 12.5. For this analysis, we ran the MCMC algorithm for a total of 10,000 iterations. Convergence of the MCMC is verified through trace plots of the posterior and was achieved after the first 500 iterations. Consequently, for the purposes of inference, we have conservatively discarded the first 1000 iterations as burnin.

Figure 12.3 shows the marginal posterior densities for the signal-to-noise ratio ξk of the evolution of the aggregated coarsest level. The posterior density for ξ3 of Macroregion 3 is relatively closer to 0 than for the other macroregions, implying that the underlying latent process μt3 varies through time with less variability than μtk for k = 1, 2, 4. This is reflected in Figure 12.4, which displays the posterior mean and 95% credible interval for μt1k, k = 1, , 4, as a function of time, along with the coarse level observations. Even though the coarse level observations are fairly noisy, we are still able to estimate a meaningful and reasonably smooth underlying latent mean process. This allows us to refine our conclusions from the exploratory data analysis of (p. 307)

Fig. 12.3 Marginal posterior densities for the signal-to-noise ratio ξk of the evolution of the aggregated coarsest level.

Section 12.3. First, Macroregion 1 shows a rapid increase in production from 1993 to about 1995 due to expansion of the planted area; after 1996 the expansion stopped and the production seems to have just varied around the same level due to price fluctuations. Second, Macroregion 2 presents a steady and relatively strong increase in production from 1990 to 2005 due to incorporation of new areas for agriculture. Additionally, Macroregion 3 displays a slow increase in production probably due to a combination of slow expansion of cultivated area and improvement in productivity. Finally, Macroregion 4 shows a striking increase in production from 1994 to 1998 and a not less surprising decrease from 1999 to 2001. This coincides with a period of economic stability in Brazil from 1994 to 1998 which abruptly ended in the beginning of 1999 with a strong devaluation of the Real and a credit squeeze. Probably many farmers in Macroregion 4 used credit to expand the planted area during the period of economic stability and were driven out of business by the 1999 Brazilian economic crisis.

Figure 12.5 shows the marginal posterior densities for the signal-to-noise ratio ψ1j of the evolution of the coarsest level multiscale coefficients. The signal-to-noise ratio for Macroregion 1 is relatively large, reflecting the fact that the underlying multiscale coefficient varies significantly through time (see, for example, Figure 12.2c). The signal-to-noise ratio for Macroregion 2 is smaller than for Macroregion 1, reflecting a less volatile temporal evolution. Meanwhile, the signal-to-noise ratios for Macroregions 3 and 4 are concentrated close to 0 (p. 308)

Click to view larger

Fig. 12.4 Coarse level observations (dots), and posterior mean (solid line) and 95% credible interval (dashed lines) for μt1k, k = 1, , 4, through time. Note, for ease of comparison, all plots have been constructed with identical range on the y-axis.

implying that most of the variability of the empirical multiscale coefficients in Figures 12.2(i) and (l) are due to noise.

The analysis of the latent multiscale coefficient sheds light on possible changes over time in the relative importance of different subregions within the same parent region. With regard to agricultural production in Espírito Santo State, there are four multiscale coefficients that relate each macroregion with the respective microregions and 12 multiscale coefficients that relate each microregion with the respective counties. In the present application, these multiscale coefficients are vectors with lengths varying from 2 to 12. Thus, the number of graphs to be examined is quite extensive; therefore, for the sake of brevity we focus here only on a subset of these graphs. Specifically, we illustrate possible patterns that may emerge with the analyses of the multiscale coefficients for Macroregions 1, 2 and 4. Figure 12.6 presents the posterior mean (p. 309)

Fig. 12.5 Marginal posterior densities for the signal-to-noise ratio ψ1j of the evolution of the coarsest level multiscale coefficients.

and 95% credible interval through time, as well as the empirical multiscale coefficient for each of the five components of the multiscale coefficient for Macroregion 1, θt11 = θt11,1:5. From this figure we note the significant change through time in the relative importance of Microregions 2, 4, and 5, within Macroregion 1. Figure 12.7 focuses on Macroregion 2; particularly, this figure illustrates that the relative change in importance of Microregions 6 and 7 within Macroregion 2 is evident. Finally, Figure 12.8 displays posterior summaries for the multiscale coefficient for Macroregion 4; whereas the empirical multiscale coefficients are fairly noisy, the posterior mean and credible intervals seem to indicate that the relative importance of Microregions 11 and 12 has not changed significantly through time.

Figure 12.9 presents, on the logarithm scale, maps of observed agricultural production and estimated latent processes for years 1993, 1997, 2001, and 2005. Whereas the observed production varies substantially both spatially and temporally, the estimated latent process is reasonably smooth in both the spatial and temporal domains. The reason for this is because each subregion borrows information both across time and from the other subregions through its parent subregion. At the same time, differences between subregions are respected and thus the maps of estimated latent process do not exhibit the spatial over-smoothing typical of many spatial models. (p. 310)

Click to view larger

Fig. 12.6 Multiscale coefficient for Macroregion 1, θt11,1:5 = (θt111, … , θt115). Empirical multiscale coefficient (dots), posterior mean (solid line) and 95% credible interval (dashed lines). Note, for ease of comparison, all plots have been constructed with identical range on the y-axis.

(p. 311)

Click to view larger

Fig. 12.7 Multiscale coefficient for Macroregion 2, θt12,1:2 = (θt121, θt122). Empirical multiscale coefficient (dots), posterior mean (solid line) and 95% credible interval (dashed lines).

Click to view larger

Fig. 12.8 Multiscale coefficient for Macroregion 4, θt14,1:2 = (θt141, θt142). Empirical multiscale coefficient (dots), posterior mean (solid line) and 95% credible interval (dashed lines).

12.7 Further discussion

We have presented an analysis of agricultural production in Espírito Santo State using a multiscale spatio-temporal model. This model decomposes the agricultural production data into multiscale coefficients at different scales of resolution and evolves the multiscale coefficients through time using state-space models. The resulting framework is able to produce estimates of the underlying latent production mean process that are coherent across the different scales of resolution.

A potentially important aspect of our spatio-temporal framework is the ability to effectively analyse massive economics data sets. More specifically, the (p. 312)

Click to view larger

Fig. 12.9 Logarithm of observed agriculture production and logarithm estimated latent process for years 1993, 1997, 2001, and 2005.

(p. 313) multiscale decomposition coupled with the state-space time evolution allow an extremely efficient divide-and-conquer estimation algorithm. This estimation algorithm has good granularity and can, therefore, be implemented in a parallel computing environment. As a consequence, our multiscale spatio-temporal approach can be made extremely computationally efficient and thus is well designed for dealing with massive data sets. For example, it is conceptually feasible to analyse economic data sets comprising county level information for large countries through several years.

A. Broader context and background

A.1 Multiscale models

Multiscale modelling broadly refers to models for data and processes that may be structured by scale. A broad coverage of existing statistical multiscale models is given by Ferreira and Lee (2007). Multiscale modelling includes not only the well-known wavelet multiscale decompositions (Daubechies, 1992; Mallat, 1999; Vidakovic, 1999), but many other models such as for example Gaussian models on trees (Willsky, 2002), hidden Markov models on trees (Kato et al., 1996), multiscale random fields (Chapter 10, Ferreira and Lee, 2007), multi-scale time series (Ferreira et al., 2006), change of support models (Banerjee et al., 2003), and implicit computationally linked multiscale models (Higdon et al., 2002; Holloman et al., 2006). Multiscale models have been successfully applied to several different scientific areas such as image segmentation (Nowak, 1999), permeability estimation (Ferreira et al., 2003), agronomy (Banerjee and Johnson, 2006), single photon emission computed tomography (Holloman et al., 2006), and disease mapping (Louie and Kolaczyk, 2006).

A.2 Dynamic linear models

Dynamic models, also known as state-space models, have been successfully used in many areas of science to model processes that evolve through time, as for example in economics (Azevedo et al., 2006), finance (Jacquier et al., 2007), ecology (Wikle, 2003), epidemiology (Knorr-Held and Richardson, 2003), and climatology (Wikle et al., 2001). The books by West and Harrison (1997) and Harvey (1989) give thorough coverage of dynamic models from the Bayesian and frequentist perspectives, respectively. Migon et al. (2005) provide a review along with recent developments on Bayesian dynamic models.

In this chapter we consider dynamic linear models, which can be written as

$Display mathematics$

(p. 314) where the first equation is known as the observation equation and the second equation is the system equation. At time t, yt is the vector of observations, θt is the latent process, Ft relates the observations to the latent process, Gt describes the evolution of the latent process through time, Vt is the observational covariance matrix, and Wt is the covariance matrix of the system equation innovation. Typically, Ft, Gt, Vt and Wt are known up to a few hyperparameters.

If the matrices Ft, Gt, Vt, and Wt are completely known, then the Kalman filter can be used to estimate the latent process. If Ft, Gt, Vt, and/or Wt are functions of unknown hyperparameters, then estimation can be performed using Markov chain Monte Carlo (MCMC). See Robert and Casella (2004) and Gamerman and Lopes (2006) for detailed information on MCMC methods. In this context, each iteration of the MCMC algorithm is then divided in two blocks: simulation of the unknown hyperparameters, and simulation of the latent process. The details of the simulation of the hyperparameters is model-specific. If the matrices Vt and Wt are positive definite, the latent process can be efficiently simulated using the forward filter backward sampler (Frühwirth-Schnatter, 1994; Carter and Kohn, 1994).

B. Multiscale decomposition and computations

B.1 Multiscale decomposition

Since the joint distribution of the observations at the finest level L conditional on the mean process μL,1:nL is multivariate Gaussian, it follows that $y l,1: n l | μ l,1: n l , Σ l ∼N( μ l,1: n l , Σ l )$. Moreover, the joint distribution of ylj and yDlj is

$Display mathematics$

Therefore, using standard results from the theory of multivariate normal distributions (Mardia et al., 1979), the conditional distribution of yDlj given ylj is

$Display mathematics$

with

$Display mathematics$

and

$Display mathematics$

(p. 315) l = 1, … , L − 1, j = 1, … , nl. Note that the scale specific parameter θlj is a vector corresponding to the difference between the mean level process at the descendants of subregion Blj and the scaled mean level process at Blj.

Next, consider

$Display mathematics$

which is an empirical estimate of θlj. Then,

(12.6)

$Display mathematics$

which is a singular Gaussian distribution (Anderson, 1984, p. 33). Straightforward linear algebra shows that $θ lj e$ is subject to the constraint $1 ′ m lj θ lj e =0$. Moreover, this constraint is implicitly embedded in the singular Gaussian distribution in (12.6). In order to see this is true, note that

$Display mathematics$

and

$Display mathematics$

B.2 Derivation of full conditional distributions

Full conditional of ξk. The full conditional density of ξk, k = 1, … , n1, is

$Display mathematics$

Therefore, $ξ k | μ 0:T,1k , σ 1k 2 , D T ∼~IG( 0.5 τ k * ,0.5 κ k * )$, where $τ k * = τ k +T$ and $κ k * = κ k + σ 1k −2 ∑ t=1 T ( μ t1k − μ t−1,1k ) 2$.

Full conditional of ψlj. The full conditional density of ψlj, l = 1, … , L, j = 1, … , nl, is

$Display mathematics$

Therefore $ψ lj | θ 0:T,lj , σ 2 , D T ∼IG( 0.5 τ lj * ,0.5 κ lj * )$ where $τ lj * =T( m lj −1 )+τ$ and $κ lj * =κ+ ∑ t=1 T ( θ tlj − θ t−1,lj ) ′ Ω lj − ( θ tlj − θ t−1,lj )$.

(p. 316) B.3 Singular forward filter backward sampler

From the multiscale decomposition of the observation equation, we have that

$Display mathematics$

Moreover, from the system equations,

$Display mathematics$

Then the singular forward filter backward sampler proceeds as follows.

1. 1. Use the Kalman filter to obtain the mean and covariance matrix of $p( θ 1lj | σ 2 , ψ lj , D 1 ),…,p( θ Tlj | σ 2 , ψ lj , D T )$:

• posterior at $t−1: θ t−1,lj | D t−1 ∼N( m t−1,lj , C t−1,lj Ω lj )$;

• prior at $t: θ tlj | D t−1 ∼N( a tlj , R tlj Ω lj )$, where $a tlj = m t−1,lj$ and $R tlj = C t−1,lj + ψ lj$;

• posterior at $t: θ tlj | D t ∼N( m tlj , C tlj Ω lj )$, where $C tlj = ( 1+ R tlj −1 ) −1$ and $m tlj = C tlj ( θ tlj e + R tlj −1 a tlj )$.

2. 2. Simulate θTlj from $θ Tlj | σ 2 , ψ lj , D T ∼N( m Tlj , C Tlj Ω lj )$.

3. 3. Recursively simulate θtlj, t = T − 1, , 0, from

$Display mathematics$

where $H tlj = ( C tlj −1 + ψ lj −1 ) −1$ and $h tlj = H tlj ( C tlj −1 m tlj + ψ lj −1 θ t+1,lj )$.

Acknowledgments

Marco A. R. Ferreira was partially supported by CNPq grant 479093/2004-0. Part of this work was performed while Adelmo I. Bertolde was a graduate student in the Statistics Program at the Federal University of Rio de Janeiro with a scholarship from CAPES, Brazil.

References

Anderson, T. W. (1984). An Introduction to Multivariate Statistical Analysis (2nd edn). John Wiley, New York.Find this resource:

Azevedo, J. V., Koopman, S. J. and Rua, A. (2006). Tracking the business cycle of the euro area: A multivariate model-based bandpass filter. Journal of Business and Economic Statistics, 24, 278–290.Find this resource:

Banerjee, S., Carlin, B. P. and Gelfand, A. E. (2003). Hierarchical Modeling and Analysis for Spatial Data. Chapman & Hall, Boca Raton, Florida.Find this resource:

(p. 317) Banerjee, S. and Johnson, G. A. (2006). Coregionalized single- and multi-resolution spatially-varying growth curve modelling with application to weed growth. Biometrics, 61, 617–625.Find this resource:

Basseville, M., Benveniste, A. and Willsky, A. S. (1992a). Multiscale autoregressive processes, part I: Schur-Levinson parametrizations. IEEE Transactions on Signal Processing, 40, 1915–1934.Find this resource:

Basseville, M., Benveniste, A. and Willsky, A. S. (1992b). Multiscale autoregressive processes, part II: Lattice structures for whitening and modeling. IEEE Transactions on Signal Processing, 40, 1935–1953.Find this resource:

Berliner, L. M., Wikle, C. K. and Milliff, R. F. (1999). Multiresolution wavelet analyses in hierarchical Bayesian turbulence models. In Bayesian Inference in Wavelet Based Models, (ed. P. Müller and B. Vidakovic), pp. 341–359. Springer-Verlag, New York.Find this resource:

Carlin, B. P. and Louis, T. A. (2000). Bayes and Empirical Bayes Methods for Data Analysis, (2nd edn.) Chapman Hall / CRC. Boca Raton, Florida.Find this resource:

Carter, C. K. and Kohn, R. (1994). On Gibbs sampling for state space models. Biometrika, 81(3), 541–553.Find this resource:

Daubechies, I. (1992). Ten Lectures on Wavelets. SIAM: Philadelphia, PA.Find this resource:

Donoho, D. L. and Johnstone, I. M. (1994). Ideal spatial adaptation via wavelet shrinkage. Biometrika, 81, 425–455.Find this resource:

Ferreira, M. A. R., West, M., Bi, Z., Lee, H. and Higdon, D. (2003). Multi-scale modelling of 1-D permeability fields. In Bayesian Statistics 7, (ed. J. Bernardo, M. J. Bayarri, J. O. Berger, A. P. Dawid, D. Heckerman, A. F. M. Smith, and M. West), pp. 519–527. Oxford University Press, Oxford.Find this resource:

Ferreira, M. A. R. and Lee, H. K. H. (2007). Multiscale Modeling: A Bayesian Perspective. Springer Series in Statistics. Springer, New York.Find this resource:

Ferreira, M. A. R., West, M. Lee, H. K. H. and Higdon, D. (2006). Multi-scale and hidden resolution time series models. Bayesian Analysis, 1, 947–968.Find this resource:

Frühwirth-Schnatter, S. (1994). Data augmentation and dynamic linear models. Journal of Time Series Analysis, 15, 183–202.Find this resource:

Gamerman, D. and Lopes, H. F. (2006). Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference, (2nd edn.) Chapman Hall/CRC, Boca Raton, Florida.Find this resource:

Gelfand, A. E. and Smith, A. F. M. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85, 398–409.Find this resource:

Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721–741.Find this resource:

Harvey, A. C. (1989). Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press, Cambridge.Find this resource:

Higdon, D., Lee, H. and Bi, Z. (2002). A Bayesian approach to characterizing uncertainty in inverse problems using coarse and fine scale information. IEEE Transactions on Signal Processing, 50, 389–399.Find this resource:

Holloman, C. H., Lee, H. K. H. and Higdon, D. M. (2006). Multi-resolution genetic algorithms and Markov chain Monte Carlo. Journal of Computational and Graphical Statistics, 15, 861–879.Find this resource:

Jacquier, E., Johannes, M. and Polson, N. (2007). MCMC maximum likelihood for latent space models. Journal of Econometrics, 137, 615–640.Find this resource:

Johannesson, G., Cressie, N. and Huang, H. (2007). Dynamic multi-resolution spatial models. Environmental and Ecological Statistics, 14, 5–25.Find this resource:

(p. 318) Kato, Z., Berthod, M. and Zerubia, J. (1996). A hierarchical markov random field model and multi-temperature annealing for parallel image classification. Graphical Models and Image Processing, 58, 18–37.Find this resource:

Knorr-Held, L. and Richardson, S. (2003). A hierarchical model for space-time surveillance data on meningococcal disease incidence. Applied Statistics, 52, 169–183.Find this resource:

Kolaczyk, E. D. and Huang, H. (2001). Multiscale statistical models for hierarchical spatial aggregation. Geographical Analysis, 33, 95–118.Find this resource:

Louie, M. M. and Kolaczyk, E. D. (2006). A multiscale method for disease mapping in spatial epidemiology. Statistics in Medicine, 25, 1287–1308.Find this resource:

Mallat, S. G. (1999). A Wavelet Tour of Signal Processing, (2nd edn). Academic Press: San Diego.Find this resource:

Mardia, K. V., Kent, J. T. and Bibby, J. M. (1979). Multivarite Analysis. Academic Press: San Diego.Find this resource:

Migon, H. S., Gamerman, D., Lopes, H. F. and Ferreira, M. A. R. (2005). Bayesian dynamic models. In Handbook of Statistics, Volume 25, (ed. D. Dey and C. R. Rao), pp. 553–588. Elsevier, Amsterdam.Find this resource:

Nowak, R. D. (1999). Multiscale hidden Markov models for Bayesian image analysis. In Bayesian Inference in Wavelet Based Models, (ed. P. Müller and B. Vidakovic), pp. 243–265. Springer-Verlag, New York.Find this resource:

Robert, C. P. and Casella, G. (2004). Monte Carlo Statistical Methods, (2nd edn.) Springer-Verlag, New York.Find this resource:

Sweldens, W. (1996). The lifting scheme: a custom-design construction of biorthogonal wavelets. Applied and Computational Harmonic Analysis, 3, 186–200.Find this resource:

Vidakovic, B. (1999). Statistical Modeling by Wavelets. John Wiley, New York.Find this resource:

West, M. and Harrison, J. (1997). Bayesian Forecasting and Dynamic Models (2nd ed.). Springer-Verlag, New York.Find this resource:

Wikle, C. K. (2003). A kernel-based spectral approach for spatiotemporal dynamic models. Technical report, Department of Statistics, University of Missouri.Find this resource:

Wikle, C. K., Milliff, R. F., Nychka, D. and Berliner, L. M. (2001). Spatio-temporal hierarchical Bayesian modeling: Tropical ocean surface winds. Journal of the American Statistical Association, 96, 382–397.Find this resource:

Willsky, A. S. (2002). Multiresolution Markov models for signal and image processing. Proceedings of the IEEE, 90, 1396–1458.Find this resource: