
Herramientas de modelizacion para series


Structure

  • Thesis by Ignacio Arbués Lombardía

    • COVER

    • ACKNOWLEDGMENTS

    • GENERAL INDEX

    • CHAPTER 1: INTRODUCTION

    • CHAPTER 2: AN EXTENDED PORTMANTEAU TEST FOR VARMA MODELS WITH MIXING NONLINEAR CONSTRAINTS

    • CHAPTER 3: DEPARTURE FROM NORMALITY OF INCREASING-DIMENSION MARTINGALES

    • CHAPTER 4: DETERMINING THE MSE-OPTIMAL CROSS SECTION TO FORECAST

Contents


Historical context

VAR and DFM models in macroeconomics

Until the seventies of the last century, the joint behavior of macroeconomic variables was studied mainly by means of Simultaneous Equations Models. These models had become widespread following the work of the Cowles Commission in the forties, and they consist of systems of linear equations involving the variables whose behavior is to be studied, possibly lags of these variables, and random disturbances. A model of this type can be written in the form

A(L) y_t = u_t,  (1.1)

where y_t is a vector containing the variables of the model, A(z) = A_0 + A_1 z + ... + A_p z^p is a polynomial whose coefficients are square matrices (which we can also regard as a matrix whose elements are polynomials), L is the lag operator (that is, L^j y_t = y_{t-j}) and u_t is a vector of random disturbances. Unfortunately, a model of the form (1.1) has a drawback: it is not identified. This means that two different sets of parameters can give rise to models that cannot be distinguished from the data, because their probability distributions are identical. Identification problems can be understood as the consequence of searching for the model in too large a space.1 Hence, one way to proceed is to reduce this space by imposing restrictions.

1 Here we follow the usual practice of using the word "model" to refer both to the general object represented in (1.1) and to the particular instance obtained by assigning concrete values to the parameters; the context allows us to distinguish between the two meanings.

A common restriction is to require the covariance matrix of the disturbances to be diagonal (i.e., the shocks are uncorrelated or, in the Gaussian case, independent). Another type of restriction introduces a distinction between endogenous and exogenous variables. Denoting by y_t the vector of endogenous variables and by x_t the vector of exogenous ones, the model is then formulated in the following form:

B(L) y_t = C(L) x_t + u_t,

which is equivalent to assuming that, in the matrix A, the coefficients of x in the equations for y were zero.

Imposing restrictions on the parameters of a model increases the precision of the estimates, which can tempt researchers to include more restrictions than are strictly necessary to achieve identifiability. But if those restrictions are incorrect, the model is misspecified. This suggests that restrictions should be supported by empirical evidence or theoretical justification, a caveat echoed by Sims in the influential article "Macroeconomics and Reality" ([4]).

Within economic modeling, when different models end up with substantially different right-hand-side variables, the divergence is not driven by alternative economic theories; in the case of demand equations, it often reflects an intuitive, econometrician's blend of psychological and sociological reasoning rather than a strict reliance on traditional economic theory.

In the reference just cited, Sims advocates a different approach, according to which all variables are considered endogenous and the identification problem is addressed by moving to the reduced form. To obtain the reduced form, we multiply the model on the left by A_0^{-1} and rewrite it as Φ(L) y_t = ε_t, where Φ(z) = I + Φ_1 z + ... + Φ_p z^p, Φ_j = A_0^{-1} A_j and ε_t = A_0^{-1} u_t.
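As a numerical sketch of this reduced-form step (the matrices below are invented illustrative values, not taken from the thesis):

```python
import numpy as np

# Hypothetical structural VAR(1): A0 y_t + A1 y_{t-1} = u_t
A0 = np.array([[1.0, 0.5],
               [0.0, 1.0]])
A1 = np.array([[-0.4, 0.1],
               [0.2, -0.3]])
Sigma_u = np.diag([1.0, 0.5])  # structural shocks, mutually uncorrelated

A0_inv = np.linalg.inv(A0)
Phi1 = A0_inv @ A1                       # reduced-form coefficient Phi_j = A0^{-1} A_j
Sigma_eps = A0_inv @ Sigma_u @ A0_inv.T  # covariance of eps_t = A0^{-1} u_t

print(Phi1)
print(Sigma_eps)
```

Note that even with a diagonal Sigma_u, the reduced-form innovation covariance is generally not diagonal: the reduced form sidesteps the identification problem at the cost of the structural interpretation of the shocks.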

This type of model became known as a VAR and turned into one of the most common tools for analyzing the dynamic behavior of macroeconomic variables.

Mathematically, the VAR model can be viewed as a special case of the VARMA (Vector AutoRegressive Moving-Average) model. In this perspective, the VARMA model is the multivariate counterpart of the univariate ARMA model, incorporating both autoregressive and moving-average components to capture dynamics across multiple time series.

ARMA models are widely used for modeling univariate time series, a legacy going back to Box and Jenkins. VARMA models have the form Φ(L) y_t = Θ(L) ε_t, with Θ another matrix of polynomials, so that the right-hand side represents a moving-average component. VARMA models have never enjoyed the same popularity as VARs or univariate ARMA models, partly because their estimation is computationally much more demanding. Nevertheless, they deserve mention for reasons that will become evident later.

Around the same time that Sims was publishing these ideas, and indeed somewhat earlier, another type of model was introduced by Thomas Sargent and John Geweke, designed to address several problems of the classical models as well as additional concerns. In [6], besides criticizing the a priori restrictions, the authors highlight one of the drawbacks of these models.

The authors contend that macroeconometric work embeds very little a priori theory, whether in simultaneous-equation models or in vector autoregressions, and that the field suffers from a proliferation of parameters. Sargent and Sims argued that this combination of weak theoretical grounding and an expanding parameter space undermines the credibility and usefulness of empirical macroeconomic analysis.


One central issue is that the number of parameters in an n-variate VAR(p) model grows as p n^2 + n(n+1)/2, so the quadratic term in n makes the inclusion of many variables difficult. Restrictions can mitigate this, but instead of intuition-based restrictions specific to each equation or variable, the authors advocate a single general restriction that is symmetric, treating all variables equally. This stands in contrast with restrictions drawn from macroeconomic models lacking explicit microfoundations of individual behavior; with the later introduction of Dynamic Stochastic General Equilibrium models, this shortcoming would be corrected.

Cyclical interactions among macroeconomic variables typically operate with lags of eight quarters or more. A general autoregressive system with ten equations and order ten (ten lags of ten variables in every equation) would leave essentially zero degrees of freedom when estimated on postwar U.S. data. Rather than reducing dimensionality by imposing a priori restrictions on specific equations, as is customary in standard methodologies, simplifying constraints are adopted that are symmetric in the variables (invariant under their permutations). Informally, this restriction says that the dynamics of the n variables can be explained by a fixed, reduced number of common factors.

Formally, a dynamic factor model (DFM) can be written as y_t = A(L) f_t + v_t, where A is a matrix of polynomials, f_t is a k-dimensional vector of common factors and v_t is a vector of idiosyncratic components. As in traditional factor analysis, the idiosyncratic components v_t are assumed to be mutually uncorrelated, so that the relationships among the elements of y_t are driven solely by the common factors; this structure dramatically reduces the proliferation of parameters. The factors themselves follow their own dynamic models, for example a VAR for f_t and univariate autoregressive (AR) models for each component of v_t: Φ(L) f_t = ξ_t and φ_i(L) v_t^i = η_t^i.
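A minimal simulation sketch of this structure (one factor, AR(1) dynamics; every coefficient below is an arbitrary illustrative value):

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 500, 4                           # sample size, number of observed series
lam = np.array([1.0, 0.8, -0.6, 0.5])   # loadings (A(L) of degree zero here)

# Common factor: AR(1) dynamics, f_t = 0.7 f_{t-1} + xi_t
f = np.zeros(T)
for t in range(1, T):
    f[t] = 0.7 * f[t - 1] + rng.standard_normal()

# Idiosyncratic components: n mutually independent AR(1) processes
v = np.zeros((T, n))
for t in range(1, T):
    v[t] = 0.3 * v[t - 1] + rng.standard_normal(n)

y = f[:, None] * lam + v                # y_t = lam f_t + v_t
```

All cross-correlation between the columns of y comes from the common factor f; the signs of the sample correlations mirror the signs of the loadings.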

Suppose that the orders of the dynamic models of the factors and the degree of A are at most p. Then the total number of parameters is n k p + k^2 p + k(k+1)/2 + n(p+1). There is no quadratic term in n, only terms linear in n with coefficients of order k p, so if k p remains controlled one can, in principle, let n be large.
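The two counts can be compared directly (a small sketch; var_params and dfm_params are our own helper names implementing the formulas just stated):

```python
def var_params(n, p):
    """Unrestricted n-variate VAR(p): p coefficient matrices (n x n)
    plus the free elements of the innovation covariance matrix."""
    return p * n**2 + n * (n + 1) // 2

def dfm_params(n, k, p):
    """DFM count as in the text: loadings, factor VAR, factor covariance,
    and n univariate idiosyncratic AR models."""
    return n * k * p + k**2 * p + k * (k + 1) // 2 + n * (p + 1)

for n in (10, 50, 100):  # quadratic vs linear growth in n
    print(n, var_params(n, 4), dfm_params(n, 2, 4))
```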

Inference in linear systems

As these changes in the modeling of macroeconomic variables were taking shape, parallel advances in the statistical theory of linear systems emerged that are relevant to our discussion. This theory rests on a foundational result known as the Wold decomposition, which can be found, for example, in [15]. It asserts that any linearly regular process5 can be expressed as x_t = Ψ(L) ε_t, where the coefficient matrices of Ψ are square-summable and ε_t is white noise, that is, weakly stationary and uncorrelated with ε_s for s ≠ t.

This result allows one to look for linear models to explain the dynamics of any stationary process,6 although it can be used in at least two ways:

The most common approach adds a further assumption: the transfer function Ψ is rational, meaning it can be written as Ψ = Φ^{-1} Θ, where Φ and Θ are polynomials. This is equivalent to saying that the process satisfies an ARMA model.

5 A stationary process x_t is linearly regular if E[x(t)] = 0 and lim_{τ→∞} x̂(t + τ | t) = 0, where x̂(t + τ | t) denotes the minimum mean-square-error linear predictor of x(t + τ) based on {x(s) : s ≤ t}.

6 This does not imply that there cannot be nonlinear models with better properties.

Without that additional assumption, the construction of a linear model becomes an approximation problem: the relationship that was previously considered exact is now treated as approximate. For example, Ψ can be approximated by a rational function Φ^{-1} Θ, although it is more common to transform the Wold representation into an infinite autoregressive form, Π(L) x_t = ε_t, and to approximate the power series Π by a polynomial Φ.

Two contrasting modeling strategies are therefore considered. In the first, a linear model Φ̂^{-1} Θ̂ is estimated, and the difference between the true transfer function of the process and the estimated one is attributable solely to sampling variability; in the second, the finite order of the polynomials additionally separates the model from the true behavior of the process. These distinctions have important implications: (i) under the rationality assumption one can speak of model identification, whereas in the approximate setting we can only speak of model selection, since no candidate model is correct; (ii) consequently, the first framework favors criteria such as the Bayesian Information Criterion (BIC), while the second is better served by the Akaike Information Criterion (AIC); (iii) in the approximate case, goodness-of-fit tests lose their meaning. This work contributes to both frameworks, as will be discussed later.

In 1970, Box and Jenkins published Time Series Analysis: Forecasting and Control, a tremendously influential work that provided practitioners working with univariate time series with a practical, step-by-step framework for building the models now known as ARIMA. ARIMA models are ARMA models augmented with differencing on the left-hand side, either regular differences ∇ = (1 − L) or seasonal differences ∇_s = (1 − L^s), where s denotes the number of observations per seasonal cycle (for economic data, typically one year).
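The differencing operators amount to a one-line computation (diff is a hypothetical helper name):

```python
import numpy as np

def diff(x, s=1):
    """Apply (1 - L^s): a regular difference for s = 1, seasonal for s > 1."""
    x = np.asarray(x, dtype=float)
    return x[s:] - x[:-s]

x = np.array([1.0, 3.0, 6.0, 10.0, 15.0, 21.0])
print(diff(x))         # (1 - L)x
print(diff(x, s=3))    # (1 - L^3)x, seasonal period 3
print(diff(diff(x)))   # second regular difference
```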

This methodology rests on a comprehensive mathematical theory that provides conditions for the consistency of the parameter estimates, the correct identification of the model orders, and the asymptotic behavior needed for reliable prediction and control. Of particular interest here are the hypothesis tests that have become standard tools for validating time-series models. In this context, the residual autocorrelation tests of Box and Pierce and of Ljung and Box play a central role; the latter is a small-sample correction of the former, designed to improve performance on limited data.

These tests are based on the following steps: after estimating the model, the residuals ε̂_t are obtained, their sample autocorrelations ρ̂_j are computed, and the Ljung-Box statistic Q_k = T(T + 2) Σ_{j=1}^{k} ρ̂_j^2 / (T − j) is formed from them. Under the null hypothesis that the model is correctly specified, Q_k is asymptotically distributed as a chi-square with k − r degrees of freedom, where r is the number of estimated parameters. In practice, k must be taken large enough for the chi-square approximation to be reliable.
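The computation can be sketched as follows (ljung_box is our own helper; the residuals here are just seeded white noise):

```python
import numpy as np

def ljung_box(resid, k):
    """Q_k = T (T + 2) * sum_{j=1..k} rho_j^2 / (T - j), where rho_j are
    the sample autocorrelations of the residuals."""
    e = np.asarray(resid, dtype=float)
    e = e - e.mean()
    T = e.size
    denom = e @ e
    rho = np.array([e[j:] @ e[:-j] / denom for j in range(1, k + 1)])
    return T * (T + 2) * np.sum(rho**2 / (T - np.arange(1, k + 1)))

rng = np.random.default_rng(42)
q = ljung_box(rng.standard_normal(200), k=10)
# q is then compared against a chi-square quantile with k - r degrees
# of freedom, r being the number of estimated parameters
```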

As in [17], there is neither a precise statement of the asymptotic properties of the test nor a complete statement of the required hypotheses. This leads to the first contribution of this work.

Contribution 1: We state precisely the hypotheses and asymptotic properties underlying the residual autocorrelation tests. Clarifying the convergence in greater detail has direct practical consequences for the choice of the maximum lag included in the test, and hence for the reliability and power of residual autocorrelation analysis in time-series models.

Throughout the 1970s and into the early 1980s, many of the theoretical results available for univariate ARMA models were adapted to the multivariate case. In particular, Dunsmuir and Hannan ([18]) and Deistler, Dunsmuir and Hannan ([19]) established the asymptotic properties of the estimators of multivariate ARMA models.

These results concern maximum-likelihood estimators for linear models parameterized by a finite number of parameters, and for VARMA models in particular; their precise statement requires a detailed description of the topology of spaces of linear models (see [20]).7

7 Chapter 2, Section 2.3, and Chapter 3, Section 4.1.

The residual autocorrelation tests were also extended to the vector case. The first multivariate version was proposed by Hosking, and its asymptotic properties were studied by Poskitt and Tremayne; these works retain the same level of imprecision noted in the univariate case. In the literature, these multivariate tests are usually called Portmanteau tests (a name sometimes applied to the univariate version as well). Later, in 1988, Ahn showed that the Portmanteau test can be applied to VAR models with restrictions, provided the restrictions affect only the coefficients of the autoregressive part and not the covariance matrix of the disturbances; in this setting, the degrees of freedom are obtained by subtracting the number of free parameters, that is, the number of included autocorrelations minus the number of parameters plus the number of restrictions. The literature generally treats this result as valid for restricted VARMA models as well, as reflected in standard references such as Lütkepohl, although it is not always proven. However, as we will show, the Portmanteau test cannot be applied in general to VARMA models with mixing restrictions, that is, restrictions that simultaneously affect the AR and MA coefficients and the covariance matrix; by contrast, the maximum likelihood estimation theory for linear models is fully applicable in that case, which has practical implications for model assessment.

Dynamic factor models are equivalent to VARMA models subject to certain restrictions, and these restrictions are precisely of the mixing type. Consequently, one of the main diagnostic tools for linear models cannot be applied to DFMs. This leads to the second contribution of the Thesis.

Contribution 2: We set out the conditions under which the Portmanteau test can be applied under mixing restrictions and we introduce an extended Portmanteau test as an alternative for when those conditions fail, showing in particular how it applies to the case of the DFM.8

Beyond the traditional test, our extended test applies to other model families, notably Stock and Watson's factorial structural vector autoregression (FSVAR) model [25] and the model of Peña and Box [26].

Approximation by rational functions

As noted earlier, one can work without assuming that the transfer function is rational. In that case, an ARMA model is only an approximation of the true dynamics of the process. Suppose that a process x_t admits an infinite-order VAR representation, x_t = Σ_{k=1}^{∞} A_k x_{t−k} + ε_t, and that we estimate a finite VAR model of the form x_t = Σ_{k=1}^{p} Φ_k x_{t−k} + e_t. The estimated model then deviates from the truth in two ways, corresponding respectively to the concepts of truncation error and estimation error: the truncation error arises from omitting the lags beyond p, while the estimation error reflects the uncertainty in estimating the finite set of coefficients Φ_k from the data.
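The two error sources can be made concrete with a univariate sketch (illustrative values throughout): we fit finite AR(p) models by least squares to data from an MA(1) process, whose exact AR(∞) coefficients are 0.5, −0.25, 0.125, and so on.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 2000
eps = rng.standard_normal(T + 1)
x = eps[1:] + 0.5 * eps[:-1]   # MA(1) with theta = 0.5

def fit_ar(x, p):
    """OLS fit of a truncated AR(p); returns coefficients and residual variance."""
    Y = x[p:]
    X = np.column_stack([x[p - j:-j] for j in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ coef
    return coef, resid.var()

for p in (1, 2, 4, 8):
    coef, s2 = fit_ar(x, p)
    print(p, np.round(coef[:2], 3), round(s2, 3))
```

As p grows, the residual variance approaches the true innovation variance (1 here) and the leading coefficients approach their AR(∞) values; the price is a larger number of estimated, and hence noisier, coefficients.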

Approximation: we look for the VAR(p) model that best approximates the true data-generating process. If we knew the exact dynamics of the true model, we could determine the optimal order p by minimizing a suitable measure of the distance between models.

8 Chapter 2.

But instead of the true dynamics we only have a sample of the process, which brings the following concept into play.

Estimation: the coefficients of the approximating model are chosen as those that optimize an objective function computed from the sample, for example maximizing the likelihood or minimizing the sum of squared errors.

This framework is very different from estimation under the hypothesis that the process truly satisfies a model of the class within which we estimate. In particular, the approximation and estimation parts have opposite consequences for the choice of the model order p. A large p is good from the approximation point of view, since within a larger class of models it is possible to get closer to the true model. From the estimation point of view, however, a large p is bad, because it deteriorates the ratio between the number of parameters and the number of observations, making the estimates noisier. Thus, choosing the model order amounts to balancing these two effects.

The theory on this question generally imposes conditions on the growth of p as a function of T. The condition p/(T log T)^{1/2} → 0 guarantees consistency ([27]), and p^3/T → 0 guarantees normality for linear combinations of the coefficients ([28] and [29]). If we want to go further and prove other asymptotic properties of the estimators, such as the joint normality of the parameter vector, we run into a difficulty. In most situations, the parameters of the model can be represented jointly as a vector in a vector space of some finite dimension d, R^d; we can then apply theoretical results such as the various Central Limit Theorems. However, if the model order p diverges, the parameters cannot be confined to any finite-dimensional vector space. This increasing-dimension problem appears in other situations as well (for example, in [30] and in the approximate factor models mentioned earlier) and forces the use of alternative techniques. The normality results cited above are limited to showing that certain linear combinations of the coefficients are asymptotically normal. In order to prove the joint asymptotic normality of the parameter vector, we develop a preliminary result that constitutes the third contribution of this Thesis:

Contribution 3: We provide an upper bound on the rate at which a sequence of martingales of increasing dimension converges to normality. To achieve this, we generalize a previous result on martingales in Banach spaces.

We apply this result to the normality of the estimated parameters of an AR model whose order p tends to infinity with T, which gives rise to the fourth contribution.

Contribution 4: We give growth conditions under which the estimators are asymptotically jointly normal. This tool can also be used for other problems. For example, returning to the residual autocorrelation tests introduced in Section 1.1.2, the standard way to show that the statistic Q_k follows a chi-square distribution is to prove that the vector of autocovariances is asymptotically normal; for this to hold, however, the dimension k must grow without bound with the sample size T, so the traditional results on the asymptotic normality of martingales cannot be applied, which probably explains why the properties of the test were never stated precisely. In this Thesis, leveraging Contribution 3, we obtain the fifth contribution.

Contribution 5: We give conditions on the growth of k under which the statistic Q_k is asymptotically distributed as a χ^2 when k and T diverge simultaneously.11

10 Chapter 3, Section 4.2.

Without the theoretical tool introduced in Contribution 3, we could only prove the sequential result, in which the limit T → ∞ is taken first and then the limit k → ∞. This is less relevant in practice, because we never wait for T to reach infinity before choosing k. By imposing joint growth conditions on k and T, we obtain guidance on how to choose k as a function of the T that the circumstances actually provide.

Model selection for prediction

So far we have described models and diagnostic tools without distinguishing between the possible uses of the model. The theory addressed in this section, by contrast, is oriented specifically toward linear models as instruments for predicting certain processes; as noted at the beginning of this introduction, prediction is one of the most relevant functions of time-series models.

In any case, before addressing the question of prediction, let us pause to consider how to choose a linear model for a multivariate time series, say x_t = (x^0_t, …, x^n_t). One way of doing this is to use a model selection criterion. The best-known criteria have the following form:

−2 log L + k g(T),  (1.3)

where L is the likelihood of the model, k is the number of free parameters, T is the number of observations and g(·) is a nondecreasing function. If we chose the model that minimizes the first term of (1.3), we would merely be searching for the best fit.

The second term penalizes model complexity, and the function g governs how the penalty grows with the number of observations. When the likelihood is Gaussian, criterion (1.3) can be rewritten, up to constants, as

T log σ̂^2 + k g(T),  (1.4)

where σ̂^2 is the estimated variance of the disturbances.

The most common choices are g(T) = log T, which gives the Bayesian Information Criterion (BIC), and g(T) = 2 log log T, which gives the Hannan-Quinn (HQ) criterion. Both criteria identify the model consistently under the hypothesis of a rational transfer function. The Akaike Information Criterion (AIC), with g(T) = 2, is inconsistent for rational models but provides asymptotically optimal predictions in the non-rational case. Other tools, such as goodness-of-fit tests or likelihood-ratio tests, have significant limitations for this task: using them to select among several models would require sequences of tests and control of the overall error rate, which is practically infeasible, so tests are better suited as diagnostic aids after model selection, or when the set of candidate models is already small. For this reason, automatic modeling procedures such as TRAMO rely on selection criteria (BIC, in the case of that program).
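A sketch of how the three criteria trade fit against complexity (the residual variances below are invented for illustration; the criteria are written per observation, i.e., dividing the penalty by T under a Gaussian likelihood):

```python
import numpy as np

def criteria(sigma2_hat, k, T):
    """Per-observation criteria: log(sigma2_hat) + k * g(T) / T."""
    base = np.log(sigma2_hat)
    return {"AIC": base + 2 * k / T,
            "BIC": base + k * np.log(T) / T,
            "HQ":  base + 2 * k * np.log(np.log(T)) / T}

# Invented residual variances for candidate orders k = 1..4, with T = 200
sig = {1: 1.30, 2: 1.10, 3: 1.08, 4: 1.075}
T = 200
best = {name: min(sig, key=lambda k: criteria(sig[k], k, T)[name])
        for name in ("AIC", "BIC", "HQ")}
print(best)
```

With these numbers the heavier BIC penalty selects a smaller order than AIC, illustrating the consistency-versus-prediction trade-off discussed above.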

If our goal is to find a model that predicts x^0_t well, there is no reason to restrict attention to a single multivariate series x_t = (x^0_t, …, x^n_t). Instead, we can consider multivariate series built from different subsets of the variables, or even univariate series, in order to identify which combination offers the best forecasting performance.

If we could obtain the true model and the exact values of its parameters, the decision about which series to include would be irrelevant, since predictions based on more variables would be at least as good as those based on fewer. In practice, models are estimated and therefore imperfect: a more complex model, such as a large vector autoregression, carries more estimation uncertainty, which can harm predictive performance if the extra complexity does not pay off; conversely, adding series can sometimes improve accuracy. Therefore, just as one selects ARMA orders carefully rather than letting the order grow large enough to contain the true model, the aim here is to identify the subset of variables genuinely relevant for prediction, balancing model complexity and accuracy.

Thus, the model selection process can be decomposed into two parts:

(A) choosing the subset of variables, and

(B) choosing the appropriate model for them.

Until now, the most common way of addressing this issue has been to treat the models independently of the subset of variables for which they are defined and to use tests of predictive ability. The literature on predictive ability tests takes off with a seminal article that laid the groundwork for this approach.

Diebold and Mariano laid the foundations of this literature, notably the use of out-of-sample predictive comparisons. The test they propose is one of equal predictive ability: the null hypothesis states that two competing models show no difference in predictive accuracy, as measured by a specified loss function. The test statistic resembles a t-statistic: the numerator is the average loss differential between the models, and the denominator is the estimated standard error (the square root of the estimated variance) of that differential. In essence, the DM test assesses whether the observed differences in forecast losses are statistically significant.
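Under squared-error loss the statistic reduces to a few lines (a sketch for one-step-ahead errors; for h-step forecasts the variance in the denominator would need a HAC estimate, which we omit):

```python
import numpy as np

def dm_statistic(e1, e2, loss=np.square):
    """Mean loss differential divided by its estimated standard error,
    assuming the differential series is serially uncorrelated."""
    d = loss(np.asarray(e1)) - loss(np.asarray(e2))
    return d.mean() / np.sqrt(d.var(ddof=1) / d.size)

rng = np.random.default_rng(7)
e1 = rng.standard_normal(300)        # forecast errors of model 1
e2 = 1.3 * rng.standard_normal(300)  # model 2 makes larger errors
dm = dm_statistic(e1, e2)            # a large negative value favors model 1
```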

Unfortunately, it has been shown that the Diebold-Mariano statistic has a standard normal limiting distribution only when the models being compared are non-nested; for nested models it converges to a functional of a Brownian motion (Clark and McCracken, [40]). The theory of nested model comparisons has since undergone substantial development, primarily in [39].

As discussed earlier, selecting a model among many candidates by means of tests can be problematic. In fact, Chapter 4 shows that the successive application of tests makes the choice of the optimal subset inconsistent. To address this difficulty, we propose a family of selection criteria similar to (1.4), with the difference that they serve to decide not the model itself (part (B) of the decomposition above) but the subset of series for which the model is built (part (A)). This constitutes the final contribution.

Contribution 6: We introduce a family of criteria for selecting the optimal subset of time series and prove their consistency. We also compare the performance of these criteria with that of several predictive ability tests.

Structure of the Thesis

Chapter 2: Extended portmanteau test

The main objective of this chapter is to propose and evaluate a goodness-of-fit test, based on the autocorrelation of the residuals, for VARMA models with nonlinear constraints. The test is a modification of the multivariate portmanteau test introduced by Hosking, whose statistic is

Q_k = T^2 Σ_{j=1}^{k} (T − j)^{-1} tr(Ĉ_j' Ĉ_0^{-1} Ĉ_j Ĉ_0^{-1}),

where Ĉ_j is the j-th residual autocovariance matrix.
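For reference, the classical (Hosking-form) statistic is straightforward to compute from the residual autocovariance matrices; the following is a sketch with seeded white-noise "residuals" (portmanteau is our own helper name):

```python
import numpy as np

def portmanteau(resid, k):
    """Q_k = T^2 * sum_{j=1..k} tr(C_j' C_0^{-1} C_j C_0^{-1}) / (T - j),
    with C_j the j-th residual autocovariance matrix."""
    e = np.asarray(resid, dtype=float)
    e = e - e.mean(axis=0)
    T = e.shape[0]
    C0_inv = np.linalg.inv(e.T @ e / T)
    q = 0.0
    for j in range(1, k + 1):
        Cj = e[j:].T @ e[:-j] / T
        q += np.trace(Cj.T @ C0_inv @ Cj @ C0_inv) / (T - j)
    return T**2 * q

rng = np.random.default_rng(3)
q = portmanteau(rng.standard_normal((400, 3)), k=8)  # approx chi-square for white noise
```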

The classical portmanteau statistic (which we will simply call the portmanteau statistic, to distinguish it from our own version) has an asymptotic chi-square distribution when applied to the residuals of unrestricted VARMA models or of VAR models with restrictions. However, the convergence of its distribution breaks down when the estimated model carries restrictions that simultaneously affect the covariance matrix of the innovations and the coefficients of the autoregressive and moving-average parts (mixing restrictions). Such restrictions may seem artificial, but they complicate the asymptotic behavior and can undermine standard inference based on a chi-square limit.

These restrictions appear naturally when considering Dynamic Factor Models, which can be viewed as VARMA models with mixing restrictions; this type of restriction also occurs in state-space models. To understand the relation between restricted VARMA models and factor or state-space models, note that a VARMA whose coefficients are functions of a set of parameters (a structured parameterization) can be interpreted as a VARMA with the restriction of belonging to the image of the parameterization. Conversely, by the implicit function theorem, a model with restrictions can be understood as a model with a structured parameterization. Section 3 details this equivalence.

Restrictions on the covariance matrix of the innovations can be assessed by checking whether the residual covariance matrix is compatible with them, which provides a natural goodness-of-fit check for the model. The proposed statistic Q*_k offers two advantages: (a) it directly tests the incompatibility between the imposed restrictions and the observed residual structure, and (b) unlike the classical portmanteau statistic, it converges to a chi-square distribution under mixing restrictions. Its form is given by

(1.5)

where Σ̂ is the covariance matrix estimated under the restrictions. We also consider a variant of (1.5) in which Ĉ_0 is replaced by Σ̂ in the second term.

A second objective of this chapter is to sharpen the theory of convergence of the classical statistic by clarifying the required hypotheses and the meaning of the asymptotic convergence. Previous theoretical results in the literature are vague on both points, since they consider only convergence in T and neglect the role of k. Accordingly, the chapter makes two theoretical contributions: first, Theorem 2.1 proves the convergence in distribution of our statistic in the general case; second, Theorem 2.2 specifies the conditions under which the classical statistic can be employed.

In the following sections we analyze the application of the tests to several classes of models: dynamic factor models (DFMs), the model of Peña and Box [26], and the vector autoregression with factor structure proposed by Stock and Watson [44]. To show that all of these fall within the scope of our results, we introduce a more general class of models, the Factorial Impulse Model (MIF), which differs from a VARMA in that the moving-average part has a factorial structure.

We then verify that: (a) the MIF satisfies the required conditions and (b) all of the above models can be expressed as MIFs with restrictions.

There remains a theoretical difficulty to resolve Factor models are identified only up to linear transformations of the common factors; that is, a process that can be represented by a given dynamic factor model (DFM) may also admit alternate representations whose common factors are linear combinations of those in the original DFM We show that this indeterminacy does not prevent the asymptotic distribution of the statistic from being chi-square, but it does require introducing a correction to the number of degrees of freedom.

We apply the test to simulated data to verify that the empirical rejection rate is close to the theoretical one under the null hypothesis, and to compare its power with that of the classical test under several alternative hypotheses. Finally, we present an application to data from Spain's Industrial Production Index (IPI), illustrating the method's performance on real economic time series.

Chapter 3: Departure from normality of increasing-dimension martingales

This chapter is divided into two parts: first, Sections 2 and 3 introduce a result on the rate of convergence to normality for a certain class of martingales, and then, Section 4 applies this tool to two inference problems for multivariate time-series models.

These martingales are indexed as X_{n,i} for i = 1, , n, forming a triangular array Each X_{n,i} is a vector whose dimension depends on n, and we are particularly interested in the regime where this dimension grows without bound When the dimension remains finite, traditional tools apply, but the focus here is on the infinite-dimensional limit Concretely, we aim to obtain bounds on the distance between the distribution of the average of the differences and the corresponding normal distribution in the associated (growing) dimension.

There are many ways to measure the distance between two distributions, but for our purposes the most convenient is the Prokhorov metric (see [45], which is the most exhaustive reference on probability metrics). The Kantorovich metric, in turn, provides a bound for the Prokhorov metric and has favorable properties. For this reason, we take as the starting point of our work the results of Rachev and Rüschendorf ([46]) for martingales in Banach spaces.

We establish a modified version of the principal theorem in [46], and then use this result to bound the Kantorovich distance between the distribution of the mean of the martingale differences and the corresponding multivariate normal distribution More precisely, we prove that when the martingale dimension does not grow too quickly, the distance decays at a polynomial rate with respect to the number of terms.

A primary application of these results is to residual autocorrelation tests. As noted earlier, the theory behind these tests is often stated in somewhat vague terms. It is generally asserted that the statistic Q_k, where k denotes the maximum autocorrelation order, is approximately chi-square with d(k) degrees of freedom, where d is a specified function, for large k and large sample size T. The argument rests on Q_k being a sum of quadratic functions of a martingale-difference mean plus terms that vanish as T → ∞. The Central Limit Theorem is then invoked, but the difficulty is that the asymptotic covariance matrix is only approximately idempotent when k is large. Consequently, convergence results must account for the limiting behavior in both k and T.

The first result of this type appears to be Theorem 2.1, but there the limit is sequential: first in T and then in k. This type of asymptotic property is not the most convenient; a result ensuring convergence when k, T → ∞ subject to some relation between k and T is more adequate, since it gives an indication of how to choose k when T is determined in advance by the availability of the data. The asymptotic properties of the statistic matter because the empirical significance level of the test approximates the theoretical one more or less closely depending on how close the distribution of the statistic is to its limit. A desirable property, therefore, is that the error due to employing the theoretical rejection region instead of the true one tends to zero: if F_T(x) is the distribution function of the statistic and G_T(x) is that of a chi-square with the corresponding number of degrees of freedom, then we want that, for a certain p, T^p |F_T(x) − G_T(x)| tends to zero. This is precisely the result we prove.

The second application concerns inference for autoregressive models when the true process is AR(∞). If an autoregressive model of order k is estimated for a time series of length T generated by an AR(∞) process, the order must not grow too quickly relative to T to preserve the properties of the estimates. It has been shown that if k grows so that k³/T → 0 and the coefficient series is absolutely summable, then the estimators are asymptotically normal. Unfortunately, this property has been established only for linear combinations of the parameter vector rather than for the entire vector itself: if φ̂(k) denotes the estimated coefficient vector and l(k) is a sequence of vectors satisfying certain conditions, then l(k)'φ̂(k) is asymptotically normal. By contrast, we prove that the distribution of φ̂(k) as a whole is close to a multivariate normal, which allows, for example, constructing confidence regions for the parameters.
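As an illustration of this setting, one can fit an AR(k) by least squares to data whose true autoregressive representation is infinite. The sketch below uses an ARMA(1,1) process, chosen purely for illustration; its AR(∞) representation has lag-one coefficient φ + θ, which the AR(k) fit should approximately recover when k is small relative to T:

```python
import numpy as np

def fit_ar(y, k):
    """Least-squares AR(k) fit: regress y_t on its first k lags."""
    T = len(y)
    Y = y[k:]
    X = np.column_stack([y[k - j : T - j] for j in range(1, k + 1)])
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return coef

# ARMA(1,1) data: the AR(inf) representation has pi_1 = phi + theta = 1.1
rng = np.random.default_rng(1)
phi, theta, T = 0.7, 0.4, 5000
e = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = phi * y[t - 1] + e[t] + theta * e[t - 1]

coef = fit_ar(y, k=10)   # k small relative to T, so k^3/T stays well below 1
print(round(coef[0], 2))
```

The first estimated coefficient should be close to 1.1; growing k faster than the k³/T → 0 rate would degrade these estimates.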

Cap´ıtulo 4: Determinaci´ on de la secci´ on cruzada ´ opti-

ma en el sentido del error medio cuadr´atico de predic- ci´on

This chapter proposes a family of criteria to extract, from a set of time series, the optimal subset for predicting a given target series By optimal here we mean the subset that minimizes the mean squared forecast error (MSFE) at a horizon h, and among all subsets achieving the same MSFE, the smallest possible subset The theoretical part introduces the criteria and proves their consistency under certain assumptions; we then assess their performance via simulations, comparing them with other well-known methods, and finally illustrate the approach with a real-world case drawn from the forecasting literature.

To construct our criteria, we adopt a model-selection philosophy similar to well-known criteria such as Akaike, Schwarz (BIC), and Hannan–Quinn, but instead of relying on the likelihood, we base decisions on the mean squared forecast error, denoted as σ̂^2_h(I) Here I represents a particular subset of the available variables The form of these criteria blends the predictive error for a given horizon with a penalty that reflects the complexity of the chosen variable subset I, mirroring the spirit of traditional model-selection approaches.

σ̂²_h(I) + δ(I) S_T / T, where δ(I) is a measure of the size of I, T is the number of predictions, and S_T is an increasing function that determines how parsimonious the selection is.
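A minimal sketch of how such a criterion ranks subsets, assuming the simple penalized form σ̂²_h(I) + δ(I)·S_T/T with δ(I) = |I| and S_T = log T (illustrative choices, not the thesis's exact specification), and hypothetical MSFE values:

```python
import numpy as np

def criterion(msfe, size, T):
    """Penalized forecast-error criterion: MSFE plus a parsimony penalty."""
    S_T = np.log(T)                  # illustrative BIC-like choice of S_T
    return msfe + size * S_T / T

# Hypothetical h-step MSFEs for each candidate predictor subset I
T = 200
msfe = {(): 2.0, (1,): 1.2, (2,): 1.9, (1, 2): 1.19}
best = min(msfe, key=lambda I: criterion(msfe[I], len(I), T))
print(best)   # the tiny MSFE gain of {1, 2} does not offset its extra penalty
```

In this toy example the criterion picks the singleton subset: among subsets with essentially the same forecast error, the penalty selects the smallest one, which is exactly the optimality notion described above.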

Here we show that the subset selected by any of these criteria converges to the optimal one under certain assumptions. These assumptions are fairly mild: they permit a broad class of models, as long as they are estimated consistently, and the models may be either nested or non-nested. They also allow the criteria to be computed with predictions obtained after model selection as well as in-sample. On the other hand, the models must be well specified.

As an example, we show that these assumptions hold for VARMA models.

Next, we present two generalizations The first demonstrates that the method works when subsets are selected from a random class This scenario arises when a data-driven preselection is performed among all possible subsets.

The second generalization presents a version of the criteria for the case where the objective is to predict several time series. In this setting, the usual prediction mean squared error is replaced by a certain function of the covariance matrix of the predictions. We show that if this function satisfies certain conditions, the consistency obtained in the univariate case carries over to the multivariate one.

In the simulation section, we compare the probability of selecting the optimal subset using our criteria with that obtained from several hypothesis tests: the Diebold–Mariano test; Clark and McCracken's ENC-T and ENC-NEW tests; the conditional Giacomini–White test; and the Granger causality test. To do this, we generate 5,000 realizations of various bivariate processes, arranged so that in some cases the optimal subset contains only the series to be forecast, and in others it also includes the second series. We also assess the method's performance under misspecification. The results indicate that our criteria improve on some of the tests and are not outperformed by any of them.

To conclude, we apply the same methods to real data. Following [40], we assess whether U.S. inflation can be better forecast using the unemployment rate. Our criteria, like the tests of Clark and McCracken and unlike those of Diebold and Mariano and of Giacomini and White, indicate that the unemployment rate is indeed a useful predictor of inflation one quarter ahead.

The central idea behind this thesis is to provide tools for tackling the challenges that arise as the dimensionality of multivariate time series increases To address this issue, two distinct strategies emerge: one is to select models whose complexity grows in a controlled way with the dimension, and the other is to keep the dimensionality itself small.

Regarding the first strategy, we present a goodness-of-fit test for dynamic factor models (DFMs) that can be applied to high-dimensional time series and contrasted with vector autoregression (VAR) models Regarding the second, we propose a family of variable-selection criteria that extract from a larger set of variables an optimal subset, optimized for predictive accuracy by minimizing the mean squared error of the target variable.

Moreover, the techniques developed to obtain these two results have also enabled us to advance the theory of residual autocorrelation tests for VARMA models and to improve results on the estimation of approximate autoregressive models.

[1] Yule, G. U. (1927), On a method of investigating periodicities in disturbed series, with special reference to Wolfer's sunspot numbers. Philosophical Transactions of the Royal Society of London, Series A 226, 267–298.

[2] Hosking, J. R. M. (1984), Modeling persistence in hydrological time series using fractional differencing. Water Resources Research 20, 1898–1908.

[3] Kaplan, D. (2008), Univariate and Multivariate Autoregressive Time Series Models of Offensive Baseball Performance: 1901–2005. Journal of Quantitative Analysis in Sports 4.

[4] Sims, C. A. (1980), Macroeconomics and reality. Econometrica 48, 1–48.

[5] Box, G. E. P. and Jenkins, G. M. (1976), Time Series Analysis: Forecasting and Control, Holden-Day, San Francisco.

[6] Sargent, T. J. and Sims, C. A. (1977), Business Cycle Modeling Without Pretending to Have Too Much A Priori Economic Theory, in C. Sims et al. (eds.), New Methods in Business Cycle Research, Minneapolis: Federal Reserve Bank of Minneapolis.

[7] Geweke, J. (1977), The Dynamic Factor Analysis of Economic Time Series, in D. J. Aigner and A. S. Goldberger (eds.), Latent Variables in Socio-Economic Models, Amsterdam: North Holland, Ch. 19.

[8] Chamberlain, G. and Rothschild, M. (1983), Arbitrage, Factor Structure and Mean-Variance Analysis in Large Asset Markets. Econometrica

[9] Bai, J. and Ng, S. (2002), Determining the Number of Factors in Approximate Factor Models. Econometrica 70, 191–221.

[10] Bai, J. (2003), Inferential Theory for Factor Models of Large Dimensions. Econometrica 71, 135–171.

[11] Onatski, A. (2009), Testing hypotheses about the number of factors in large factor models. Econometrica 77, 1447–1479.

[12] Forni, M., Hallin, M., Lippi, F. and Reichlin, L. (2000), The Generalized Dynamic Factor Model: Identification and Estimation. Review of Economics and Statistics 82, 540–554.

[13] Forni, M., Hallin, M., Lippi, F. and Reichlin, L. (2004), The Generalized Dynamic Factor Model: Consistency and Rates. Journal of Econometrics 119, 231–255.

[14] Forni, M., Hallin, M., Lippi, F. and Reichlin, L. (2005), The Generalized Dynamic Factor Model: One-Sided Estimation and Forecasting. Journal of the American Statistical Association 100, 830–840.

[15] Hannan, E. J. and Deistler, M. (1988), The Statistical Theory of Linear Systems, New York: John Wiley and Sons.

[16] Box, G. E. P. and Pierce, D. (1970), Distribution of Residual Autocorrelations in Autoregressive-Integrated Moving Average Time Series Models. Journal of the American Statistical Association 65, 1509–1526.

[17] Ljung, G. M. and Box, G. E. P. (1978), On a Measure of Lack of Fit in Time Series Models. Biometrika 65, 297–303.

[18] Dunsmuir, W. and Hannan, E. J. (1976), Vector linear time series models. Advances in Applied Probability 8, 339–364.

[19] Deistler, M., Dunsmuir, W. and Hannan, E. J. (1978), Vector linear time series models: corrections and extensions. Advances in Applied Probability 10, 360–372.

[20] Deistler, M. and Pötscher, B. M. (1984), The behaviour of the likelihood function for ARMA models. Advances in Applied Probability 16, 843–865.

[21] Hosking, J. R. M. (1980), The Multivariate Portmanteau Statistic. Journal of the American Statistical Association 75, 602–608.

[22] Poskitt, D. S. and Tremayne, A. R. (1982), Diagnostic Tests for Multiple Time Series Models. The Annals of Statistics 10, 114–120.

[23] Ahn, S. K. (1988), Distribution for Residual Autocovariances in Multivariate Autoregressive Models with Structured Parameterization. Biometrika 75, 590–593.

[24] Lütkepohl, H. (1991), Introduction to Multiple Time Series Analysis, Berlin: Springer-Verlag.

[25] Stock, J. H. and Watson, M. W. (2005), Implications of Dynamic Factor Models for VAR Analysis, NBER WP 11467.

[26] Peña, D. and Box, G. E. P. (1987), Identifying a Simplifying Structure in Time Series. Journal of the American Statistical Association 82, 836–843.

[27] Hannan, E. J. and Kavalieris, L. (1986), Regression, autoregression models. Journal of Time Series Analysis 7, 27–50.

[28] Berk, K. N. (1974), Consistent autoregressive spectral estimates. The Annals of Statistics 2, 489–502.

[29] Lewis, R. and Reinsel, G. C. (1985), Prediction of multivariate time series by autoregressive model fitting. Journal of Multivariate Analysis 16, 393–411.

[30] Mammen, E. (1989), Asymptotics with increasing dimension for robust regression with applications to the bootstrap. The Annals of Statistics 17, 382–400.

[31] Hannan, E. J. and Quinn, B. G. (1979), The determination of the order of an autoregression. Journal of the Royal Statistical Society B 41, 190–195.

[32] Schwarz, G. (1978), Estimating the dimension of a model. The Annals of Statistics 6, 461–464.

[33] Akaike, H. (1973), Information Theory and an Extension of the Maximum Likelihood Principle, in B. N. Petrov and F. Csaki (eds.), Second International Symposium on Information Theory. Academiai Kiado: Budapest, pp. 267–281.

[34] Akaike, H. (1974), A new look at the statistical model identification. IEEE Transactions on Automatic Control 19, 716–723.

[35] Shibata, R. (1980), Asymptotically efficient selection of the order of the model for estimating parameters of a linear process. The Annals of Statistics

[36] Shibata, R. (1981), An optimal autoregressive spectral estimate. The Annals of Statistics

[37] Maravall, A. (2008), Notes on programs TRAMO and SEATS, www.bde.es/webbde/es/secciones/servicio/software/tramo/Part II Tramo.pdf

[38] Diebold, F. and Mariano, R. (1995), Comparing Predictive Accuracy. Journal of Business and Economic Statistics 13, 252–263.

[39] Harvey, D. I., Leybourne, S. J. and Newbold, P. (1998), Tests for Forecast Encompassing. Journal of Business and Economic Statistics 16, 254–

[40] Clark, T. E. and McCracken, M. W. (2001), Tests of Equal Forecast Accuracy and Encompassing for Nested Models. Journal of Econometrics

[41] Clark, T. E. and McCracken, M. W. (2005), Evaluating Direct Multistep

[42] Clark, T. E. and West, K. D. (2007), Approximately Normal Tests for Equal Predictive Accuracy in Nested Models. Journal of Econometrics 138,

[43] Clark, T. E. and McCracken, M. W. (2011), Reality Checks and Comparisons of Nested Predictive Models. Journal of Business and Economic Statistics

[44] Stock, J. H. and Watson, M. W. (2003), Understanding Changes in International Business Cycle Dynamics. NBER Working Paper No. W9859.

[45] Rachev, S. T. (1991), Probability metrics and the stability of stochastic models. Wiley, New York.

[46] Rachev, S. T. and Rüschendorf, L. (1994), On the rate of convergence in the CLT with respect to the Kantorovich metric. In Probability in Banach Spaces (Hoffmann-Jorgensen, J., Kuelbs, J. and Marcus, B., eds.) 9, 193–207. Birkhäuser, London.

[47] Giacomini, R. and White, H. (2006), Tests of conditional predictive ability. Econometrica 74, 1545–1578.

[48] Granger, C. W. J. (1969), Investigating causal relations by econometric models and cross-spectral methods. Econometrica 37, 424–438.

An extended portmanteau test for VARMA models with mixing nonlinear constraints *

Residual autocorrelation diagnostics remain among the most widely used tools for evaluating time-series models. In the univariate setting, residuals from estimated ARMA models are assessed with Box–Pierce and Ljung–Box tests to gauge goodness of fit. This paper shifts the focus to multivariate ARMA processes, modeled as y_t = Φ_1 y_{t-1} + ⋯ + Φ_p y_{t-p} + ε_t + Θ_1 ε_{t-1} + ⋯ + Θ_q ε_{t-q}, (2.1) where y_t ∈ R^n, E[ε_t] = 0, and E[ε_t ε_t'] = Σ. The multivariate ARMA representation can be written compactly as Φ(B) y_t = Θ(B) ε_t, with Φ(z) = I − Φ_1 z − ⋯ − Φ_p z^p and Θ(z) = I + Θ_1 z + ⋯ + Θ_q z^q, where B denotes the backshift operator.

* Journal of Time Series Analysis, 29, (2008) 741–761

If T residuals ε̂_t are available, we can compute the residual covariances Ĉ_j = T^{-1} Σ_{t=1}^{T−j} ε̂_t ε̂'_{t+j}, and then the classical multivariate portmanteau statistic is obtained as,

When the coefficients in (2.1) are estimated without restrictions, Hosking (1980) and Li and McLeod (1981) showed that Q_k follows, asymptotically, a chi-square distribution with (k − p − q) n^2 degrees of freedom Ahn (1988) extended this result to the autoregressive case where the matrices are parameterized by a vector, i.e., when Φ(z) is constrained to lie in the image of the parameterization In that setting, the asymptotic distribution remains chi-square but with kn^2 − b degrees of freedom, where b denotes the number of free parameters.
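As an illustration, the statistic can be computed directly from the residual covariances. Below is a minimal numpy sketch assuming Hosking's (1980) form Q_k = T Σ_{j=1}^k tr(Ĉ_j' Ĉ_0^{-1} Ĉ_j Ĉ_0^{-1}); the white-noise "residuals" are generated only for the example:

```python
import numpy as np

def residual_covariances(eps, k):
    """C_j = (1/T) * sum_t eps_t eps_{t+j}' for j = 0, ..., k."""
    T, _ = eps.shape
    return [eps[: T - j].T @ eps[j:] / T for j in range(k + 1)]

def portmanteau(eps, k):
    """Multivariate portmanteau statistic, assuming Hosking's (1980) form."""
    T, _ = eps.shape
    C = residual_covariances(eps, k)
    C0inv = np.linalg.inv(C[0])
    return T * sum(np.trace(C[j].T @ C0inv @ C[j] @ C0inv)
                   for j in range(1, k + 1))

# Illustrative white noise: Q_k should then be near the chi-square mean k*n^2 = 90
rng = np.random.default_rng(0)
eps = rng.standard_normal((500, 3))
Q = portmanteau(eps, k=10)
print(round(Q, 1))
```

For residuals of an estimated unrestricted VARMA the reference distribution would instead have (k − p − q)n² degrees of freedom, as stated above.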

All the matrices in (2.1), including Σ, depend on a b×1 vector β. When it is necessary to emphasize this dependency, we write Σ(β), Φ_j(β), Θ_j(β), Φ(β) and Θ(β); in the last two cases this means that the coefficients of the polynomials depend on β, not that the polynomials are evaluated at β. Thus, the whole system, including the covariance matrix, is constrained to a specific class of models. The generalization of the asymptotic distribution results for Q_k to this case is not possible in general.

Since Σ cannot be freely estimated, goodness of fit should also be assessed at lag zero. If β̂ is an estimator of the true β, it is natural to quantify how much the estimated covariance Σ̂ = Σ(β̂) deviates from the sample residual covariance Ĉ_0, so instead of (2.2) we propose,

Under the null hypothesis, Σ̂ and Ĉ_0 are both consistent estimates of Σ, and thus either of them can be used in the second term of (2.3).


Extended Portmanteau Statistic

Let us assume the following,

A1. Φ_j(β̇), Θ_j(β̇) and Σ(β̇) > 0 are n×n twice continuously differentiable functions of a b×1 vector β̇ in a set Ω.

A2. The stationary process y_t is generated by equation (2.1) for some β̇ = β in the interior of Ω. The innovation ε_t is an i.i.d. process whose fourth-order moments depend on the first- and second-order moments in the same way as those of a multivariate Gaussian distribution.

Under A3, we can also define Ψ(z) = Φ(z)^{-1}Θ(z) and Π(z) = Θ(z)^{-1}Φ(z). Let us call φ = vec(Φ), θ = vec(Θ), π = vec(Π), ψ = vec(Ψ), σ = vec(Σ), π* = [σ', π']' and ψ* = [σ', ψ']'.

Under A3, assumption A4 is equivalent to ∂ψ*/∂β̇ being of rank b. A4 ensures uniqueness of β in a neighborhood of the true (Φ, Θ, Σ).

Theorem 2.1. Let F*_{k,T}(x) be the distribution function of R*_k or Q*_k and χ²_{ν,α} the 1−α quantile of a chi-square distribution with ν degrees of freedom. Then, under A1–A4, there exists some ρ ∈ (0,1) such that,

lim sup_{T→∞} T |F*_{k,T}(χ²_{n(n+1)/2+kn²−b, α}) − (1−α)| = O(ρ^k). (2.5)

Equation (2.5) states that T|F*_{k,T}(χ²_{n(n+1)/2+kn²−b, α}) − (1−α)| is of order O(ρ^k). Note that this bound alone does not imply that |F*_{k,T}(·) − (1−α)| vanishes as k, T → ∞ jointly. In practical applications, the sample size T is fixed and one must select a sequence k(T); the objective is to choose k(T) so that the error converges to zero as fast as possible. The difficulty of obtaining convergence rates for the Central Limit Theorem prevents us from determining the optimal k(T) precisely. Nevertheless, O_p(T^{-1}) terms appear in the proof of Theorem 2.1, so no choice of k(T) can make the convergence faster than T^{-1}. The minimum k(T) for which O(ρ^{k(T)}) has order T^{-1} is k(T) = −log(T)/log(ρ). The coefficient ρ depends on the decay rate of the coefficients of Ψ(z); therefore, if |Φ(z)| has roots near the unit circle, greater values of k are required.
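The minimal lag choice k(T) = −log(T)/log(ρ) is easy to compute; a toy sketch (the values of ρ are purely illustrative):

```python
import math

def minimal_k(T, rho):
    """Smallest k for which rho**k is of order 1/T: k(T) = -log(T)/log(rho)."""
    return math.ceil(-math.log(T) / math.log(rho))

# Slowly decaying Psi(z) (rho closer to 1) forces a larger k at the same T
k_fast = minimal_k(500, 0.5)
k_slow = minimal_k(500, 0.9)
print(k_fast, k_slow)
```

With T = 500, a fast-decaying filter (ρ = 0.5) needs k = 9, while ρ = 0.9 already requires k = 59, illustrating why roots of |Φ(z)| near the unit circle call for larger k.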

The Classical Portmanteau

Using the portmanteau statistic (2.2) requires additional assumptions. A sufficient condition is that β̇ can be decomposed as (β̇_0, β̇_1) so that Σ̇ depends only on β̇_0 while the terms {Φ̇_j, Θ̇_j} depend only on β̇_1, i.e., a non-mixing parameterization. This assumption can be relaxed to allow mixing dependence if the derivatives of Σ̇ with respect to β̇_1 and the derivatives of {Φ̇_j, Θ̇_j} with respect to β̇_0 vanish at the true values: in that case the parameterization is asymptotically non-mixing as T → ∞ and β̂ converges to β.

A parameterization Σ̃(λ), Φ̃(λ), Θ̃(λ) is equivalent to Σ(β), Φ(β), Θ(β) whenever there exists a diffeomorphism h from Λ onto Ω such that Σ̃(λ) = Σ(h(λ)), Φ̃(λ) = Φ(h(λ)) and Θ̃(λ) = Θ(h(λ)), with h(λ) = β. Since the asymptotic distribution of (2.2) is the same under any two equivalent parameterizations, the existence of one parameterization satisfying these conditions is sufficient.

Another sufficient condition is that the constraint be equivalent to two independent constraints: one that governs the ARMA coefficients and another that governs the covariance matrix This condition can also be relaxed to a single constraint on the first-order derivatives.

Let us now state the additional assumptions more precisely.

A5. There is a locally equivalent parameterization Σ̃(λ̇), Φ̃(λ̇), Θ̃(λ̇) with λ̇ = (λ̇_0, λ̇_1) ∈ R^{b_0} × R^{b_1} such that, at λ, ∂σ/∂λ̇_1 = ∂φ_j/∂λ̇_0 = ∂θ_j/∂λ̇_0 = 0 for any j > 0.

A5'. The tangent space at β to the set {(σ(β̇)', φ(β̇)', θ(β̇)')' | β̇ ∈ Ω} is equal to E_σ × E_{φ,θ}, where E_σ ⊂ R^{n²} and E_{φ,θ} ⊂ R^{(p+q)n²} have dimensions b_0 and b_1 respectively.

Proposition 2.1 Assumptions A5 and A5’ are equivalent.

Either of these assumptions allows the use of the classical portmanteau test.

Theorem 2.2. Let F_{k,T}(x) be the distribution function of Q_k and χ²_{ν,α} as in Theorem 2.1. Then, under A1–A5, there exists some ρ ∈ (0,1) such that,

lim sup_{T→∞} T |F_{k,T}(χ²_{n(n+1)/2+kn²−b_1, α}) − (1−α)| = O(ρ^k). (2.6)

Equation (2.6) states that T|F_{k,T}(χ²_{n(n+1)/2+kn²−b_1, α}) − (1 − α)| is of order O(ρ^k). Note that the first term in (2.4) is asymptotically distributed as a chi-square with n(n+1)/2 − b_0 degrees of freedom. Since the Gaussian assumption is required only to establish the asymptotic distribution of the zero-lag autocovariances, it is not necessary for Theorem 2.2, which makes this result a generalization of the earlier work by Hosking (1980) and Ahn (1988).

Under non-mixing parameterization, the substantive part of rank condition A4 is to ensure that the derivative of ψ with respect to β1 has full rank b1 This holds, for example, when affine restrictions, together with the left-coprimeness of Φ(z) and Θ(z), define an identifiable class as in Theorem 2.7.3 of Hannan and Deistler (1988) Suppose Φ(z) and Θ(z) are left coprime If the rank is smaller than b1, there exist tangent-space directions U(z) and V(z) to the constraint manifold satisfying U(z)Φ(z) − Θ(z) = V(z) Then for any real γ one can form perturbed filters ˜Φ(z) = Φ(z) + γU(z) and ˜Θ(z) = Θ(z) + γV(z) such that ˜Φ(z) is invertible for small γ and the pair [˜Φ(z), ˜Θ(z)] remains left coprime and satisfies the same restrictions; but this perturbation shows that ˜Φ(z) and ˜Θ(z) do not define a unique model, which contradicts identifiability.

Testing Dynamic Factor Models

Dynamic Factor Models (DFMs) were introduced by Geweke (1977), Sargent and Sims (1977), and Engle and Watson (1981) and have since become a cornerstone for analyzing multivariate time series The idea is that the cross‑sectional correlations among y_{ti}, i = 1, , n, are driven by a small number of unobserved latent common factors f_t, with each series expressed as a linear combination L_i f_t plus an idiosyncratic component v_{it} The model is dynamic, positing autoregressive (or ARMA) dynamics for both the common factors and the idiosyncratic terms It can be written as y_{ti} = L_i f_t + v_{it}, Φ(B) f_t = ξ_t, ϕ_i(B) v_{it} = η_{ti}, where L_i is the loading vector for series i, f_t is the vector of common factors, and η_{ti}, ξ_t are uncorrelated white-noise processes with variances σ_i^2 and covariance Ξ, respectively.
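To make the structure concrete, here is a minimal simulation of the model above with one AR(1) common factor and AR(1) idiosyncratic terms; all parameter values (loadings, AR coefficients, sample size) are illustrative choices, not taken from the text:

```python
import numpy as np

def simulate_dfm(T, n, L, phi_f=0.8, phi_v=0.3, seed=0):
    """y_{ti} = L_i f_t + v_{it}, with AR(1) factor and AR(1) idiosyncratic terms."""
    rng = np.random.default_rng(seed)
    f = np.zeros(T)
    v = np.zeros((T, n))
    for t in range(1, T):
        f[t] = phi_f * f[t - 1] + rng.standard_normal()       # common factor
        v[t] = phi_v * v[t - 1] + rng.standard_normal(n)      # idiosyncratic
    return f[:, None] * L[None, :] + v

L = np.array([1.0, 0.5, -0.7, 0.9])      # illustrative loading vector
y = simulate_dfm(2000, 4, L)
# The single factor induces cross-correlations with the sign pattern of L_i * L_j
corr = np.corrcoef(y.T)
print(corr.shape)
```

Series with same-sign loadings come out positively correlated and opposite-sign loadings negatively correlated, which is exactly the cross-sectional dependence the factor structure is meant to capture.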

Under the model specification, the covariance matrix of the forecasting error, Σ, cannot be an arbitrary symmetric positive definite matrix; it is constrained to a manifold. Consequently, unlike in the ordinary VARMA framework, it is as important to check that the covariance structure of the residuals is consistent with the model as to test their autocorrelation.

State-space models are typically estimated by maximum likelihood, but this approach becomes computationally prohibitive as the number of time series grows To tackle this challenge, Quah and Sargent (1993) demonstrated that a model for a large set of series can be effectively estimated by employing the Expectation-Maximization algorithm.

In the static framework, Chamberlain and Rothschild (1983) introduced approximate factor models that allow the idiosyncratic components to be correlated, a setting that requires letting the cross-sectional size n tend to infinity and imposing specific conditions on the covariance matrices of the factors and on the loadings. These approximate models have been extended to the dynamic setting and have attracted substantial attention, with important contributions from Bai and Ng (2002), Bai (2003), Stock and Watson (1998, 2002), and Forni, Hallin, Lippi and Reichlin (2000, 2004, 2005). The relaxed assumptions of these models make them unsuited for the portmanteau test. The generalized factor model of Forni et al. is strongly nonparametric and thus even less suited to our test. Consequently, in this paper we focus on the exact or strict factor models.

A related model is the one by Peña and Box (Peña and Box, 1987; Hu and Chou, 2004), which allows a more general structure for the dynamic factors (e.g., VARMA) and assumes that v_{it} is a white noise process.

We can also consider the Factor-Structural Vector Autoregression (FSVAR) (Stock and Watson, 2003). This model consists of a VAR model for y_t with a factor structure for the innovations, Φ(B)y_t = Lξ_t + η_t. (2.10)

Thus, the correlations between variables are explained partially by the com- mon factors, and partially by the off-diagonal elements of Φ(z).

In the next subsection, we analyze a Factor Shock Model (FSM) and show that it can be represented as a constrained ARMA Section 2.4.2 demonstrates that dynamic factor models such as the DFM, Peña-Box, and FSVAR can be regarded as constrained FSMs Section 2.4.3 explains how to address the identification issues inherent in factor models, providing practical guidance for resolving them.

Consider a vector autoregressive model with common shocks given by Φ(B)y_t = Θ_c(B)ξ_t + Θ_s(B)η_t, (2.11) where Φ(B) and Θ_s(B) are n×n polynomial matrices of degrees p and q, Θ_c(B) is an n×r polynomial matrix of degree q, and Θ_s(B) is diagonal; ξ_t and η_t are uncorrelated Gaussian processes of dimensions r×1 and n×1, with ξ_t capturing the common shocks and η_t the idiosyncratic shocks. Since the zero-order terms are free, we can normalize the factor processes to unit variances, writing Θ_j(z) = Θ_{j,0} + Θ_{j,1}z + ⋯ + Θ_{j,q}z^q for j = c, s.

To apply our tests to the FSM, we must confirm that it has a VARMA representation and that its parameterization satisfies assumptions A1–A4. The existence of such a VARMA representation is guaranteed by Lütkepohl (1984), but since regularity properties are required, we need to make our own derivation.

We can write model (2.11) as Φ(B)y_t = Θ̄(B)χ_t, (2.12) with Θ̄(B) = [Θ_c(B), Θ_s(B)] and χ_t = (ξ_t', η_t')'.

Then, y_t has the following state-space representation, y_t = HY_t

The Kalman Filter theory (for example, Hannan and Deistler, 1988) implies that if (2.13) holds, the process y_t can be represented as y_t = [I + H(I − F B)^{-1} F K B] ε_t, where ε_t are innovations with covariance matrix V and K and V are the asymptotic Kalman gain and covariance matrix of the filter.

We can use this to analyze the VARMA representation of y_t. It suffices to check that the right-hand side of (2.11) is an MA process.

If we set a_t = ¯Θ(B) χ_t, then (2.13) holds for a_t in place of y_t with Φ_j = 0 for j > 0 In this case the matrix F is nilpotent, and this has two consequences: (i) (I − F B)^{-1} has only a finite number of terms, so the Neumann series terminates after a finite number of steps; (ii) the resulting expressions become finite polynomials in B and F, which simplifies the analysis and computation of the transformed quantities.
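The effect of nilpotency in (i) can be checked numerically; a small sketch with an illustrative 3×3 shift matrix F:

```python
import numpy as np

# A strictly subdiagonal matrix is nilpotent: here F^3 = 0, so the Neumann
# series (I - F)^{-1} = I + F + F^2 terminates after finitely many terms.
F = np.diag([1.0, 1.0], k=-1)            # 3x3, ones on the first subdiagonal

powers = [np.linalg.matrix_power(F, j) for j in range(4)]
neumann = powers[0] + powers[1] + powers[2]   # I + F + F^2
exact = np.linalg.inv(np.eye(3) - F)

print(np.allclose(powers[3], 0))          # F^3 vanishes
print(np.allclose(neumann, exact))        # the finite series equals the inverse
```

This is the matrix counterpart of the statement above: with F nilpotent, (I − FB)^{-1} is a finite polynomial in B, so the implied MA representation has finitely many terms.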

Indeed, the MA(∞) is actually a finite MA, and the filter converges in a finite number of steps, so K_t equals the asymptotic K from some finite t on. Since K_t is obtained through algebraic operations, it is infinitely differentiable as a function of the coefficients in Θ̄, and the matrices of the MA representation are smooth as well. The autoregressive part is parameterized by itself, so conditions A1 and A2 hold.

A3 holds under the usual stability condition for Φ(z). Since Θ(z), obtained via the Kalman filter, satisfies |Θ(z)| ≠ 0 for |z| < 1, only the unit circle |z| = 1 matters. Thus it suffices to require that the spectral density matrix of the process has full rank at all frequencies. Let f(ω) denote the spectral density matrix of y_t.

Let Γ(k) denote the autocovariance function of y_t. Its spectral density can be written as f(ω) = (2π)^{-1} Φ(z)^{-1} Θ̄(z) [Φ(z)^{-1} Θ̄(z)]*, with z = e^{iω}, where * denotes the conjugate transpose. If the rank of Θ̄(z) is full on the unit circle |z| = 1, then f(ω) has full rank at all frequencies, and consequently |Θ(z)| ≠ 0 for |z| = 1. Since Θ_s(z) is diagonal, a sufficient condition is that none of the polynomials in its diagonal has roots on the unit circle.
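The full-rank condition can be checked numerically by scanning the smallest eigenvalue of f(ω) over frequencies. A sketch under assumed VARMA(1,1) coefficients with a diagonal MA part whose roots lie off the unit circle:

```python
import numpy as np

n = 3
Phi1 = 0.4 * np.eye(n)               # stable AR part
Theta1 = np.diag([0.5, -0.3, 0.2])   # diagonal MA part, roots off the unit circle
Sigma = np.eye(n)

def f(w):
    """Spectral density (2*pi)^{-1} A(z) Sigma A(z)*, A(z) = Phi(z)^{-1} Theta(z), z = e^{iw}."""
    z = np.exp(1j * w)
    A = np.linalg.inv(np.eye(n) - Phi1 * z) @ (np.eye(n) + Theta1 * z)
    return (A @ Sigma @ A.conj().T) / (2 * np.pi)

# Full rank at all frequencies: smallest eigenvalue stays bounded away from zero
min_eig = min(np.linalg.eigvalsh(f(w)).min() for w in np.linspace(0, np.pi, 64))
```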

Summarizing the discussion above, there exists a function fulfilling A1- A3 that maps the parameters in the FSM to the corresponding VARMA models.

2.4.2 DFM and FSVAR as constrained FSM

The FSVAR model is clearly a FSM with q = 1. It is easy to see how the DFM is related to the FSM. Since Φ(z)^{-1} = |Φ(z)|^{-1} adj(Φ(z)), we can write

By setting |Φ(B)|f_t = adj(Φ(B))ξ_t and v_t = φ_i(B)^{-1}η_{t,i}, and substituting these into (2.7), we obtain |Φ(B)|φ_i(B)y_{t,i} = φ_i(B)L_i adj(Φ(B))ξ_t + |Φ(B)|η_{t,i}, which is a form contained in model (2.11) Importantly, allowing (2.8) and (2.9) to be ARMA structures rather than pure AR does not preclude the dynamic factor model (DFM) from being embedded in this scheme The same argument applies readily to the Peña–Box model.
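The adjugate identity Φ(z)^{-1} = |Φ(z)|^{-1} adj(Φ(z)) used above can be verified numerically for any invertible matrix, since adj(A) = det(A)·A^{-1}:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))
detA = np.linalg.det(A)

adjA = detA * np.linalg.inv(A)   # adjugate via adj(A) = det(A) * inv(A)

# A adj(A) = det(A) I, hence inv(A) = adj(A) / det(A)
check = A @ adjA
```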

2.5 Simulation Results

While a comprehensive evaluation of the test’s power across a wide range of cases falls outside the scope of this paper, we report a small set of experiments to verify the test’s asymptotic distribution under the null hypothesis and to assess its power against several alternatives.

For the null hypothesis, we have simulated with MATLAB, for different values of T, N = 2500 realizations of a process with one common factor,

Equation (2.16) defines a factor model for y_{t,i} as a linear combination of a common factor ξ_t, its lag ξ_{t−1}, and contemporaneous and lagged idiosyncratic terms η_{t,i}, η_{t−1,i}, with coefficients θ_{c,0,i1} = θ_{s,0,ii} = 1, θ_{c,1,i1} = i/(n+2), and θ_{s,1,ii} = (n+1−i)/(n+2) for i = 1, ..., n with n = 5, where ξ_t is the common factor and the η_{t,i} are the idiosyncratic factors. The model is estimated by maximum likelihood, and the likelihood calculation was programmed in C for speed. Table 2.1 reports the rejection frequencies of the statistics Q̃_k, R̃*_k and Q̃*_k for significance levels α = 0.1, 0.05 and 0.01. Since there is no exact criterion for the degrees of freedom of Q̃_k, we approximate it by a χ²_{kn² − 2n} distribution, under the rationale that the conditions of Theorem 2.2 are approximately met because Σ depends mainly on θ_{c,0,i1} and θ_{s,0,ii}, and the MA coefficients θ_{c,1,i1}/θ_{c,0,i1} and θ_{s,1,ii}/θ_{s,0,ii} drive the remaining dependence.
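For reference, the classical multivariate portmanteau statistic underlying Q̃_k can be computed from residual autocovariances as follows. This is a generic sketch, not the MATLAB/C estimation code used for the simulations:

```python
import numpy as np

def portmanteau_Q(resid, k):
    """T * sum_j tr(C_j' C_0^{-1} C_j C_0^{-1}) over lags j = 1..k."""
    T, n = resid.shape
    e = resid - resid.mean(axis=0)
    C0inv = np.linalg.inv(e.T @ e / T)
    Q = 0.0
    for j in range(1, k + 1):
        Cj = e[:-j].T @ e[j:] / T    # lag-j residual autocovariance
        Q += np.trace(Cj.T @ C0inv @ Cj @ C0inv)
    return T * Q

# On white noise, Q_k is approximately chi-square with about k * n^2 degrees of freedom
rng = np.random.default_rng(2)
Q = portmanteau_Q(rng.normal(size=(2500, 5)), k=10)
```

With k = 10 and n = 5, the statistic should fall near the χ²_{250} mean of 250 for white-noise residuals.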

Now, we consider a process generated with an additional common factor, [A2]: y_{t,i} = θ_{c,0,i1} ξ_{t,1} + θ_{c,1,i1} ξ_{t−1,1} + θ_{c,0,i2} ξ_{t−1,2} + θ_{s,0,ii} η_{t,i} + θ_{s,1,ii} η_{t−1,i}. (2.17)

For A2 we use the same equation as A1 but with the coefficients θ_{c,0,i2} = i/5.

Finally, we generate data with two additional common factors,

We estimate the model A0 using data generated with A0–A3. In table 2.1, we present the rejection frequencies under A0–A3. The frequencies under the null hypothesis A0 are the expected ones.

In A1, the additional factor is difficult to detect, since its loadings are the same as those of the zero lag of ξ_t. Nevertheless, for very large series, R̃*_k outperforms both Q̃*_k and Q̃_k.

In the A2 scenario, detecting a lack of fit is easier, so all three tests yield higher rejection probabilities even for moderately sized series (T = 8) The test based on R*_k appears to be the most powerful, with Q*_k occupying a middle position As the series grow large, the performances of R*_k and Q*_k converge, while the unstarred Q_k remains clearly weaker.

For A3, the increase of the rejection frequencies of R̃*_k and Q̃*_k is significant even for moderate T. The performance of Q̃_k is far below.

2.5.2 Detecting a lag of the common factor

In this case, the null hypothesis is that the series is generated by a model with a common factor without lags,

[B0]: y_{t,i} = θ_{c,0,i1} ξ_{t,1} + θ_{s,0,ii} η_{t,i} + θ_{s,1,ii} η_{t−1,i}, (2.19)

and the alternative hypothesis is the model with a lagged factor,

Under the alternative, y_{t,i} follows equation (2.20): y_{t,i} = θ_{c,0,i1} ξ_{t,1} + θ_{c,1,i1} ξ_{t−1,1} + θ_{s,0,ii} η_{t,i} + θ_{s,1,ii} η_{t−1,i}, with the coefficients taken from model A0 in 2.5.1. We then consider Case B2, which uses the same model as B0 but replaces ξ_t with an AR(1) process: ξ_{t,1} = φ ξ_{t−1,1} + ε_t, where the ε_t are iid with variance 1 − φ² and φ = 0.3. The rejection frequencies reported in Table 2.2 show that the power of R̃*_k exceeds that of Q̃*_k and Q̃_k, though the gaps are smaller than those observed in 2.5.1.
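The choice Var(ε_t) = 1 − φ² in B2 keeps the factor on the same scale as under the null, since the stationary variance of an AR(1) is σ_ε²/(1 − φ²). A quick Monte Carlo check:

```python
import numpy as np

phi = 0.3
T = 100_000
rng = np.random.default_rng(3)
eps = rng.normal(scale=np.sqrt(1 - phi**2), size=T)  # innovation variance 1 - phi^2

xi = np.empty(T)
xi[0] = rng.normal()                 # start at the stationary distribution (variance 1)
for t in range(1, T):
    xi[t] = phi * xi[t - 1] + eps[t]

sample_var = xi.var()                # should be close to 1
```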

2.6 Real Data Example

Using the extended portmanteau test, this study models Spain’s industrial production from 1992:1 to 2005:9 with a one-factor factor model The analysis covers five series—Consumer durables, Consumer nondurables, Capital, Intermediate, and Energy—denoted x_i,t for i = 1,…,5, all working-day adjusted and available from the Instituto Nacional de Estadística (INE) A univariate examination suggests the log-transformed series are adequately described by a linear model, and by defining y_i,t = (1−B)(1−B^12) log x_i,t, the framework is a factor model with a single common factor and seasonal MA idiosyncratic components In this specification, the observed y_i,t are driven by one common factor plus seasonally correlated idiosyncratic terms, aligning with the chosen one-factor structure.
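The transformation y_{i,t} = (1 − B)(1 − B^{12}) log x_{i,t} expands to log x_t − log x_{t−1} − log x_{t−12} + log x_{t−13}; a minimal sketch:

```python
import numpy as np

def seasonal_diff_log(x):
    """Apply (1 - B)(1 - B^12) to log x: regular plus seasonal differencing."""
    lx = np.log(np.asarray(x, dtype=float))
    return lx[13:] - lx[12:-1] - lx[1:-12] + lx[:-13]

# A deterministic log-linear trend plus an exact period-12 seasonal pattern is annihilated
t = np.arange(60)
x = np.exp(0.01 * t + np.sin(2 * np.pi * t / 12))
y = seasonal_diff_log(x)
```

The double differencing removes both the trend and any fixed annual pattern, which is why the residual structure can then be modelled with a one-factor plus seasonal-MA specification.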

Equation (2.21) models y_it as the sum of a term driven by a common latent factor and a lagged idiosyncratic noise component: y_it = θ c,0,i1 ξ_t + (1−ϑ_s,1,i B)(1−ϑ_s,12,i B^12) η_it The η_it are independent white-noise processes with zero mean and variance σ_i^2, while the common factor ξ_t is not given a dynamic model; it is treated as another white-noise process with unit variance, independent of the η_it This specification separates global fluctuations from idiosyncratic disturbances, providing a simple yet flexible stochastic representation for y_it.

Maximum likelihood estimation yields the parameter values shown in table 2.3. For k = 24, the classical portmanteau test would accept the null hypothesis of good fit only up to the 90% level (see table 2.4), whereas the extended portmanteau test rejects the null hypothesis even at the 99% level, signaling a lack of fit. As a result, we modify the model in light of the correlation between the residuals and the common factor: concretely, we add additional lags of the common factor, arriving at a revised model M1 (table 2.3). With the revised model, the p-value for R̃*_k allows accepting the null hypothesis at the 99% level.

Using data from 1992:1 to 2004:9, the previous analysis included estimation and testing of both models. The final year of the data serves as the out-of-sample forecast evaluation period, yielding a mean squared error of 9.30 for M0 and 8.11 for M1.

2.7 Conclusions

We prove that the statistics R*_k and Q*_k can be used for goodness-of-fit testing in constrained VARMA models when the constraints affect all coefficients, including those of the covariance matrix of the innovations Under the null hypothesis, R*_k and Q*_k follow a chi-square distribution, enabling standard inference The classical portmanteau test remains applicable when the constraints do not locally mix the ARMA parameters with the coefficients of the innovation covariance matrix.

Simulated and real data suggest that the power of the test is greater with R̃*_k than with Q̃*_k. If we apply the corrected classical portmanteau test Q̃_k despite its lack of theoretical basis, our results show that its power is lower than that of the extended tests.

Lemma 2.1. If Φ(z) and Θ(z) are matrix polynomials such that |Θ(z)| ≠ 0 for |z| ≤ 1, then Π = Θ^{-1}Φ satisfies, for any k,

Proof. If we differentiate Π(z)Θ(z) = Φ(z) and multiply by (Θ^{-1} ⊗ I_n), we get,

Lemma 2.2. The asymptotic covariance matrix of T^{1/2}(β̂ − β) is,

Let τ denote the varying elements on Φ and Θ, and let λ be the Lagrange multipliers for the constrained maximization of the log-likelihood From Theorem 4.3.1 in Hannan and Deistler (1988), we can obtain the asymptotic covariance of T^{1/2}(ˆτ − τ, σ̂ − σ, λ̂) A first-order Taylor expansion then yields the asymptotic covariance of T^{1/2}(ˆβ − β).

Under the Gaussian assumption, M_4 = (I + K_nn)(Σ ⊗ Σ), and then,

Thus, the asymptotic covariance matrix becomes I(β)^{-1}, and to conclude we only need to derive the likelihood function and to use,

Lemma 2.3. Let X_i, i = 0, 1, be b × n_i matrices of rank b_i, such that b = b_0 + b_1 and X_0'X_1 = 0. Then, X_0'(X_0X_0' + X_1X_1')^{-1}X_1 = 0.

Let Xi = Ui Di Vi be the singular value decomposition (SVD) of Xi, with Di diagonal and Ui, Vi orthogonal By reordering, we can assume Di splits so that D0 has nonzero entries only in the first b0 positions of the main diagonal and D1 occupies the last b1 positions Because the last b1 columns of U0 and the first b0 columns of U1 are multiplied by zero in the corresponding diagonal blocks, these columns do not contribute to the decomposition, yielding a block-structured form that clarifies the separation between the two parts of the matrix Xi and paves the way for the subsequent argument.

With the singular value decomposition (SVD), we can substitute certain components by arbitrary values without violating the established identities By assembling a new orthogonal matrix U from the first b0 columns of U0 and the last b1 columns of U1, we obtain X_i = U D_i V_i This demonstrates that the SVD factorization persists under these substitutions, confirming that the representation Xi = U Di Vi remains valid for the constructed U, D_i, and V_i.

X_0'(X_0X_0' + X_1X_1')^{-1}X_1 = X_0'U(D_0D_0' + D_1D_1')^{-1}U'X_1. Then, D_0D_0' + D_1D_1' is a diagonal matrix, while the last b_1 columns of X_0'U = V_0D_0' and the first b_0 columns of X_1'U = V_1D_1' are null, so the lemma follows.
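The lemma can also be checked numerically by building X_0 and X_1 with orthogonal column spaces of complementary ranks (an illustrative construction):

```python
import numpy as np

rng = np.random.default_rng(4)
b, b0, b1, n0, n1 = 5, 2, 3, 4, 6
U, _ = np.linalg.qr(rng.normal(size=(b, b)))   # orthogonal basis of R^b

# rank(X0) = b0, rank(X1) = b1, orthogonal column spaces, b = b0 + b1
X0 = U[:, :b0] @ rng.normal(size=(b0, n0))
X1 = U[:, b0:] @ rng.normal(size=(b1, n1))
assert np.allclose(X0.T @ X1, 0)               # hypothesis of the lemma

M = X0 @ X0.T + X1 @ X1.T                      # full rank b, hence invertible
result = X0.T @ np.linalg.inv(M) @ X1          # conclusion: this is the zero matrix
```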

Proof of Theorem 2.1 If A1–A4 hold, the assumptions of Theorem 4.2.1 of Hannan and Deistler (1988) are also satisfied Consequently, Σ(β̂), Φi(β̂) and Θj(β̂) are consistent, and for large T they enter arbitrarily close to the true values The rank condition implies that there exists a neighborhood in which the model is locally identifiable, enabling consistent estimation of the relevant parameters.

The derivative matrix in (2.29) has full rank b. The Rank Theorem (Bröcker and Jänich, 1982) guarantees the existence of twice differentiable constraints under which Theorem 4.3.1 can be applied and the estimators satisfy the Central Limit Theorem. In a neighborhood of the true parameter, the relations Σ̇ = Σ(β̇), Φ̇_i = Φ_i(β̇), and Θ̇_i = Θ_i(β̇) can be solved for β̇ with regularity, hence the Central Limit Theorem holds for β̂. In Lemma 2.2 we obtain the asymptotic covariance matrix of β̂.

Let y_t be defined for all t from −∞ to T For computational simplicity, the theoretical residuals ε̇_t, i.e., those that could be computed from the entire process y_t observed from −∞, are, up to O_p(T−1) terms, equal to the actual residuals ė_t computed using only the first T terms Both ε̇_t and ė_t satisfy the relation Θ̇(B) ε_t = Φ̇(B) y_t This implies that, under the stability hypothesis, E|ė_t − ε̇_t|^2 = O(ρ^t).

Define Ċ_j = T^{-1} ∑_t ε̇_t ε̇'_{t+j} and Ḋ_j = T^{-1} ∑_t ė_t ė'_{t+j}. Then, the difference between Ċ_j and Ḋ_j is O_p(T^{-1}).

To determine the asymptotic distribution of the autocovariances, stack them up to lag k into c = [vec(C_1)', ..., vec(C_k)']' and form the augmented vector c* = [vec(C_0)', c']'. It can be shown that T^{1/2}(c* − σ*), with σ* = [σ', 0, ..., 0]', is asymptotically normal with zero mean and covariance matrix Σ*.

To derive the asymptotic distribution of the estimator ĉ*, we expand it in a Taylor series about the true value c*. The first step is to compute the derivatives of the residuals with respect to β̇. The residuals can be written as ε̇_t = ∑_{i=1}^∞ (I_n ⊗ y_{t−i}) π̇_i, with π̇_i = vec(Π̇_i) and Π̇(z) = Θ̇(z)^{-1} Φ̇(z). By Lemma 2.1, the infinite series is differentiable term by term in the invertibility region, which yields the differentiated residuals necessary to characterize the limiting distribution of ĉ*.

We write the vectorized covariance matrix vec(C_j) as vec(C_j) = T^{-1} ∑_t (ε_{t+j} ⊗ I_n) ε_t. Taking derivatives with respect to β̇, we obtain,

Due to the ergodicity of y_t, the sum above converges to E_j = ∑_{i=1}^{j} ∂π_i'/∂β (I_n ⊗ Ψ_{j−i}Σ) for j > 0, and to zero for j = 0. Thus, since ĉ* = c* + ∂c*/∂β (β̂ − β) + O_p(T^{-1}) and ∂c/∂β = X + o_p(1), defining X̃ = [0, E_1', ..., E_k']' = [0, X']', we find that ĉ* = c* + X̃(β̂ − β) + o_p(T^{-1/2}). (2.33)

Before using the expression above to obtain the asymptotic distribution of ĉ*, we need the asymptotic covariance of c* and β̂. The log-likelihood of β̇ is, up to constants and normalizing by T, L_T(β̇) = −(2T)^{-1} ∑_{t=1}^{T} log|Σ̇_t| − (2T)^{-1} ∑_{t=1}^{T} ė_t' Σ̇_t^{-1} ė_t, where Σ̇_t is the covariance matrix of ė_t. Under A3, Σ_t = Σ + O(ρ^t), and thus only O_p(T^{-1}) terms are neglected in the following calculations if, instead of L_T, we use l_T(β̇) = −(1/2) log|Σ̇| − (2T)^{-1} ∑_{t=1}^{T} ε̇_t' Σ̇^{-1} ε̇_t.

By a first-order Taylor expansion of ∂l_T/∂β̇, β̂ − β = I(β)^{-1} ∂l_T/∂β̇ + o_p(T^{-1/2}). (2.35)

If we denote vec(Σ^{-1}) by σ^{-1}, the derivative ∂l_T/∂β̇ equals,

The first term in equation (2.36) is deterministic, so its covariance with vec(C_j) is zero. It also holds that cov(Σ^{-1}ε_t ⊗ Σ^{-1}ε_t, vec(C_j)) = 0 for j > 0, due to the independence of the ε's. By applying equation (2.31) and following the approach in Ahn (1988), one obtains, for j > 0, that cov(T ∂l_T/∂β̇, vec(C_j)) equals T^{-1} ∑_s ∑_{i=1}^{j} ∂π_i'/∂β (I_n ⊗ y_{s+j−i} ε_s'), (2.37) which converges to E_j when T → ∞. For the case j = 0, the first term between braces { } in (2.36) is uncorrelated with vec(C_0), so cov(T ∂l_T/∂β̇, vec(C_0)) equals,

It can be proved that E[(Σ^{-1}ε_t ⊗ Σ^{-1}ε_t) ε_t'(I_n ⊗ ε_t')] = σ^{-1}σ' + I_{n²} + K_nn, so (2.38) yields,

which, by the symmetry of Σ, equals ∂σ'/∂β. Let us call E_0 = ∂σ'/∂β,

X* = [E_0', X']', and then lim_{T→∞} cov(T ∂l_T/∂β̇, c*) = X*'. Now, from the equation above and (2.35), cov(β̂, c*) = T^{-1} I(β)^{-1} X*'. Since Σ is smooth enough, we can write σ̂* = σ* + ∂σ*/∂β (β̂ − β) + o_p(T^{-1/2}). (2.40)

From (2.33) and (2.40), T^{1/2}(ĉ* − σ̂*) = T^{1/2}(c* − σ*) − T^{1/2} X*(β̂ − β) + o_p(1), and then the asymptotic covariance matrix is,

Since Σ is positive definite, there exists some S such that S'S = Σ^{-1} and SΣS' = I_n. We define,
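One explicit choice of such an S is the inverse of the Cholesky factor of Σ: if Σ = LL', then S = L^{-1} gives S'S = Σ^{-1} and SΣS' = I_n. A minimal numerical check:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.normal(size=(4, 4))
Sigma = A @ A.T + 4.0 * np.eye(4)    # a positive definite Sigma

L = np.linalg.cholesky(Sigma)        # Sigma = L L'
S = np.linalg.inv(L)                 # S' S = Sigma^{-1}  and  S Sigma S' = I
```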

Let us call Ĝ_k a consistent estimate of G_k. If we define c̃* = Ĝ_k(ĉ* − σ̂*), then T^{1/2} c̃* is asymptotically distributed with covariance matrix V_k = Z_k − G_k X* I(β)^{-1} X*' G_k'.

The expression T(c̃*)'c̃* equals Q*_k or R*_k depending on which Ĝ_k we choose among the following,

Let us check that for large k, V_k is nearly an idempotent matrix of the required rank; we still need some calculations. First, we can see that J(β) = X*'G_k'G_kX* equals,

∂β, (2.46)

which is the same as I(β) save for ∑_{l=1}^{k} Ψ_{l−i} Σ Ψ'_{l−j} appearing instead of Γ(i−j) = ∑_{l=1}^{∞} Ψ_{l−i} Σ Ψ'_{l−j}. Thus, the difference J(β) − I(β) is O(ρ^k). Then we can write V_k as,

where ‖U_k‖ ≤ O(ρ^k)‖X*'G_k'‖². The matrix (I_{n²} + K_nn)/2 is idempotent and has rank n(n+1)/2 (see Magnus and Neudecker, 1982, p. 48). On the other hand, Z_k G_k X* = G_k X*, due to K_nn(S ⊗ S) = (S ⊗ S)K_nn and

From the relation K_nn ∂σ/∂β = ∂σ/∂β, W_k is the difference of two commuting idempotent matrices, and hence W_k is idempotent itself; its rank equals rank(Z_k) − rank(X^*), since for idempotent matrices the rank equals the trace The rank of X^* is b for large k due to A4 together with (2.23).
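The rank-equals-trace argument for idempotent matrices can be seen with orthogonal projectors: if range(Q) ⊂ range(P), the two projectors commute and P − Q is again idempotent, with trace equal to its rank.

```python
import numpy as np

rng = np.random.default_rng(6)
U, _ = np.linalg.qr(rng.normal(size=(8, 8)))
P = U[:, :5] @ U[:, :5].T            # orthogonal projector of rank 5
Q = U[:, :2] @ U[:, :2].T            # projector onto a subspace of range(P), rank 2

W = P - Q                            # difference of commuting idempotents
idempotent = np.allclose(W @ W, W)
rank_W = np.linalg.matrix_rank(W)    # equals trace(W) = 5 - 2 = 3
```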

Since W_k is idempotent of rank d = n(n+1)/2 + kn² − b, there exists some (k+1)n² × (k+1)n² matrix H such that H'H = I_{(k+1)n²} and,

If H = (H_0', H_1')' with H_0 sized d × (k+1)n², then u = H_0 c̃* satisfies,

T(c̃*)'c̃* = T u'u + Δ_1, where E|Δ_1|² ≤ M‖U_k‖. Now, we define v = (I_d + H_0 U_k H_0')^{-1/2} u, and then T E[vv'] = I_d and T v'v = T u'u + Δ_2, with Δ_2 = T u'[(I_d + H_0 U_k H_0')^{-1} − I_d]u.

Let us see that X*'G_k' is bounded uniformly in k. It is easy to see that the norm of G_k is bounded. For X*, we can use that,

Since Ψ_i and Π_i are both O(ρ^i), we have ‖E_j‖ ≤ M ∑_{i=1}^{j} ρ^i ρ^{j−i}, and thus the norm of X* is uniformly bounded.

From (2.33), (2.35) and (2.36) we find that, up to o_p(T^{-1/2}) terms, any linear combination of (v', β̂' − β')' is a martingale difference. The Central Limit Theorem for martingales (Billingsley, 1995, p. 476) applies, since the martingale version of the Lindeberg condition holds under bounded fourth-order moments. Thus, T^{1/2} v converges in distribution to a multivariate normal with zero mean and unit covariance matrix, and therefore T v'v is asymptotically distributed as a chi-square with d degrees of freedom. Summarizing, the extended portmanteau statistic equals T v'v plus O(ρ^k) terms. It holds that for any α ∈ (0,1), δ > 0,

Letting T → ∞ and using that the maximum of the density function of a χ²_l is O(l^{-1/2}), lim sup

If we choose δ = ζ^k with ρ < ζ²,
