I. Matrix Algebra

II. Topic: Matrix Calculus

1. Derivative of a Scalar with Respect to a Vector

1.1 Definition

$$\frac{\partial f}{\partial X}=\left[\frac{\partial f}{\partial x_{ij}}\right]$$

Differentiate $f$ with respect to each element of $X$ and arrange the results into a matrix of the same size as $X$. In particular, when $X$ is an $m$-dimensional column vector,

$$\frac{\partial}{\partial X}=
\begin{pmatrix}
\frac{\partial}{\partial x_1} \\
\frac{\partial}{\partial x_2} \\
\vdots \\
\frac{\partial}{\partial x_m}
\end{pmatrix}$$
1.2 Useful Results

1.2.1 Derivative of a linear function

$$y=a'x=x'a=\sum_{i=1}^{n}a_i x_i$$

$$\frac{\partial{a'x}}{\partial x}=a,\qquad
\frac{\partial{x'b}}{\partial x}=b$$

Notice: mind the difference between $a'$ and $a$ here!
1.2.2 Quadratic forms

$$A\ \text{symmetric}:\ \frac{\partial {x'Ax}}{\partial x}=2Ax\\
A\ \text{non-symmetric}:\ \frac{\partial {x'Ax}}{\partial x}=(A+A')x \\
\frac{\partial {x'Ax}}{\partial a_{ij}}=x_i x_j \\
\frac{\partial {x'Ax}}{\partial A}=x\cdot x'$$
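A quick numerical sanity check of the quadratic-form rule is sketched below: a central finite-difference gradient of $x'Ax$ is compared with $(A+A')x$ for a deliberately non-symmetric $A$. The dimensions and random seed are illustrative choices, not part of the notes.

```python
import numpy as np

# Finite-difference check of d(x'Ax)/dx = (A + A')x (illustrative sizes).
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))          # deliberately non-symmetric
x = rng.normal(size=4)

f = lambda v: v @ A @ v              # the scalar quadratic form x'Ax

eps = 1e-6
grad_fd = np.array([(f(x + eps * np.eye(4)[i]) - f(x - eps * np.eye(4)[i])) / (2 * eps)
                    for i in range(4)])
grad_formula = (A + A.T) @ x         # reduces to 2Ax when A is symmetric

assert np.allclose(grad_fd, grad_formula, atol=1e-5)
```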
1.2.3 Derivative of a determinant

$$\frac{\partial |A|}{\partial a_{ij}}=(-1)^{i+j}|A_{ij}|=c_{ij}$$

where $|A_{ij}|$ is the $(i,j)$ minor and $c_{ij}$ the cofactor. Since $(A^{-1})_{ij}=\dfrac{c_{ji}}{|A|}$,

$$\frac{\partial \ln|A|}{\partial {a_{ij}}}=\frac{c_{ij}}{|A|}=(A^{-1})_{ji}
\quad\Longrightarrow\quad
\frac{\partial \ln|A|}{\partial {A}}=(A^{-1})'$$
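The log-determinant rule admits the same kind of check; the sketch below perturbs each entry of a well-conditioned $A$ and compares the finite-difference gradient with $(A^{-1})'$. Again the matrix is an arbitrary illustration.

```python
import numpy as np

# Finite-difference check of d(ln|A|)/dA = (A^{-1})'.
rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3)) + 3 * np.eye(3)   # keeps |A| safely positive

logdet = lambda M: np.linalg.slogdet(M)[1]

eps = 1e-6
grad_fd = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        E = np.zeros((3, 3)); E[i, j] = eps
        grad_fd[i, j] = (logdet(A + E) - logdet(A - E)) / (2 * eps)

assert np.allclose(grad_fd, np.linalg.inv(A).T, atol=1e-5)
```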
2. Derivative of a Vector with Respect to a Vector (omitted for now)

III. Probability Theory

1. Probability Basics

2. Distributions of Multivariate Random Vectors

3. Conditional Distributions

4. Numerical Characteristics of Random Variables: Population Moments

4.1 Discrete vs. continuous

4.1.1 Discrete distributions

$$p_k\equiv P(X=x_k),\qquad
E(X)\equiv \mu \equiv \sum_{k=1}^{\infty}x_k p_k$$
4.1.2 Continuous distributions

$$\int_{-\infty}^{\infty}f(x)dx=1,\qquad
E(X)\equiv \mu \equiv \int_{-\infty}^{\infty}xf(x)dx$$
4.2 Moments

4.2.1 Definition

$$E(g(x))=\int_{-\infty}^{\infty}g(x)f(x)dx$$
4.2.2 The two families of moments commonly used in statistics

4.3 Conditional Expectation and Conditional Variance

4.3.1 Conditional expectation

$$E(Y|X=x)=E(Y|x)=\int_{-\infty}^{\infty}yf(y|x)dy$$
Notice: this is a function of $x$; $y$ has been integrated out.

4.3.2 Conditional variance

$$Var(Y|X=x)\equiv Var(Y|x)=\int_{-\infty}^{\infty}[y-E(Y|x)]^{2} f(y|x)dy$$
4.4 Expectation and Variance of Multivariate Random Vectors

4.4.1 Expectation of a vector (matrix)

$$E(X)=E\begin{pmatrix}X_1\\X_2\\ \vdots \\ X_n\end{pmatrix}=\begin{pmatrix}E(X_1)\\E(X_2)\\ \vdots \\ E(X_n)\end{pmatrix}$$
4.4.2 Covariance matrix

$$\scriptstyle
\begin{aligned}
Cov(X,Y)_{m\times n}&=E[(X-E(X))(Y-E(Y))']\\
&=E(XY')-E(X)E(Y)' \\
&=E\left[\begin{pmatrix}X_1-E(X_1)\\X_2-E(X_2)\\ \vdots \\ X_m-E(X_m)\end{pmatrix}\begin{pmatrix}Y_1-E(Y_1)&Y_2-E(Y_2) &\cdots &Y_n-E(Y_n)\end{pmatrix}\right]\\
&=E\begin{pmatrix}(X_1-E(X_1))(Y_1-E(Y_1)) &(X_1-E(X_1))(Y_2-E(Y_2)) &\cdots&(X_1-E(X_1))(Y_n-E(Y_n))\\(X_2-E(X_2))(Y_1-E(Y_1)) &(X_2-E(X_2))(Y_2-E(Y_2)) &\cdots &(X_2-E(X_2))(Y_n-E(Y_n))\\ \vdots &\vdots &\ddots &\vdots \\ (X_m-E(X_m))(Y_1-E(Y_1)) &(X_m-E(X_m))(Y_2-E(Y_2)) &\cdots &(X_m-E(X_m))(Y_n-E(Y_n))\end{pmatrix}\\
&=\begin{pmatrix}Cov(X_1, Y_1) &Cov(X_1, Y_2) &\cdots&Cov(X_1, Y_n)\\Cov(X_2,Y_1) &Cov(X_2,Y_2) &\cdots &Cov(X_2,Y_n)\\ \vdots &\vdots &\ddots &\vdots \\ Cov(X_m,Y_1) &Cov(X_m,Y_2)&\cdots &Cov(X_m,Y_n)\end{pmatrix}
\end{aligned}$$
$$\begin{aligned}
Var(X)_{m\times m}&=E[(X-E(X))(X-E(X))']\\&=E(XX')-E(X)E(X)'\\
&=
\begin{pmatrix}Var(X_1) &Cov(X_1, X_2) &\cdots&Cov(X_1, X_m)\\Cov(X_2,X_1) &Var(X_2) &\cdots &Cov(X_2,X_m)\\ \vdots &\vdots &\ddots &\vdots \\ Cov(X_m,X_1) &Cov(X_m,X_2)&\cdots &Var(X_m)\end{pmatrix}
\end{aligned}$$
4.4.3 Sandwich formulas ($A$, $B$ constant matrices)

$$Cov(AX,BY)=ACov(X,Y)B',\qquad
Var(AX)=AVar(X)A'$$
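A Monte Carlo sketch of the sandwich formula $Var(AX)=AVar(X)A'$, assuming only that numpy is available; the covariance matrix, $A$, and sample size are made up for illustration.

```python
import numpy as np

# Monte Carlo check of Var(AX) = A Var(X) A' for a constant matrix A.
rng = np.random.default_rng(2)
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
X = rng.multivariate_normal(np.zeros(3), Sigma, size=200_000)  # rows are draws

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])
var_AX = np.cov((X @ A.T).T)          # sample variance matrix of AX (2x2)

assert np.allclose(var_AX, A @ Sigma @ A.T, atol=0.1)
```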
4.4.4 Variance of a sum

$$Var(X+Y)=Var(X)+Var(Y)+Cov(X,Y)+Cov(X,Y)'$$
5. Sample Moments: Estimating Population Moments

5.1 Law of iterated expectations

$$E(Y)=E_{X}[E(Y|X=x)],\qquad
E(g(Y))=E_X[E(g(Y)|X=x)]$$
5.2 Variance decomposition

$$Var(y)=Var_{X}[E(y|X)]+E_{X}[Var(y|X)]$$

For the OLS estimator in particular (the first term vanishes because $E(b|X)=\beta$ is a constant):

$$\begin{aligned}
Var(b)&=Var_{X}[E(b|X)]+E_{X}[Var(b|X)] \\
&=E_X[Var(b|X)]\\
&=E_X[\sigma^2(X'X)^{-1}]\\
&=\sigma^2E_X[(X'X)^{-1}]
\end{aligned}$$
6. Three Levels of "Unrelatedness": Mutual Independence → Mean Independence → Zero Correlation

Mutual independence: $f(x,y)=f_x(x)f_y(y)$

Mean independence: $E(Y|X=x)=E(Y)$

Notice: this does not imply that $X$ is mean-independent of $Y$.

Theorem: if $Y$ is mean-independent of $X$, or $X$ is mean-independent of $Y$, then $Cov(X,Y)=0$.
$$\begin{aligned}
Cov(X,Y)&=E[(X-EX)(Y-EY)]\\
&=E_X E_Y[(X-EX)(Y-EY)|X=x]\\
&=E_X [(X-EX)E_Y[(Y-EY)|X=x]]\\
&=E_X[[X-E(X)][E(Y|x)-E(Y)]]\\
&=0
\end{aligned}$$
Zero correlation: $Cov(x,y)=0$
IV. Statistical Foundations

1. Common Continuous Distributions

1.1 The normal distribution

1.1.1 Univariate normal

$$f(x)=\frac{1}{\sqrt{2\pi \sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
1.1.2 Multivariate normal

$$f(X_1,X_2,\cdots,X_n)=\frac{1}{(2\pi)^{\frac{n}{2}}|\Sigma|^{1/2}}e^{-\frac{1}{2}(X-\mu)'\Sigma^{-1}(X-\mu)}$$

where

$$\Sigma=\begin{pmatrix}Var(X_1) &Cov(X_1, X_2) &\cdots&Cov(X_1, X_n)\\Cov(X_2,X_1) &Var(X_2) &\cdots &Cov(X_2,X_n)\\ \vdots &\vdots &\ddots &\vdots \\ Cov(X_n,X_1) &Cov(X_n,X_2)&\cdots &Var(X_n)\end{pmatrix}$$
1.2 The $\chi^2$ distribution

If $Z_1,Z_2,\cdots,Z_k$ are i.i.d. $N(0,1)$, then

$$\sum_{i=1}^{k}Z_{i}^2 \sim \chi^2(k)$$

with mean $k$ and variance $2k$.

If $Z\sim N(0,1)$, then $Z^2\sim \chi^2(1)$.

Supplement: if the $m$-dimensional random vector $x$ follows $N(\mu, \Sigma)$ with $\Sigma$ nonsingular (full rank), then the quadratic form

$$(x-\mu)'\Sigma^{-1}(x-\mu)\sim \chi^2(m)$$
1.3 The t distribution

If $Z\sim N(0,1)$, $Y\sim \chi^2(k)$, and $Z$ and $Y$ are independent, then

$$\frac{Z}{\sqrt{Y/k}}\sim t(k)$$
1.4 The F distribution

If $Y_1 \sim \chi^2(k_1)$, $Y_2\sim \chi^2(k_2)$, and $Y_1, Y_2$ are independent, then

$$\frac{Y_1/k_1}{Y_2/k_2}\sim F(k_1,k_2)$$

If $X\sim t(k)$, then $X^2\sim F(1,k)$.

The F and $\chi^2$ distributions are equivalent in large samples.

**Proposition:** if $F\sim F(m, n-K)$, then as $n\rightarrow\infty$, $mF \stackrel{d}\longrightarrow \chi^2(m)$.
2. Concepts in Statistical Inference

2.1 Mean squared error

$$MSE(\hat{\theta})=E[(\hat{\theta}-\theta)^2]=Var(\hat{\theta})+[Bias(\hat{\theta})]^2$$

Proof:

$$\begin{aligned}
MSE(\hat{\theta})&=E[(\hat{\theta}-\theta)^2]\\
&=E\{[\hat{\theta}-E(\hat{\theta})+E(\hat{\theta})-\theta]^2\}\\
&=E[\hat{\theta}-E(\hat{\theta})]^2 +2E\{[\hat{\theta}-E(\hat{\theta})][E(\hat{\theta})-\theta]\}+E[E(\hat{\theta})-\theta]^2\\
&=Var(\hat{\theta})+2[E(\hat{\theta})-\theta]E[\hat{\theta}-E(\hat{\theta})]+[Bias(\hat{\theta})]^2\\
&=Var(\hat{\theta})+[Bias(\hat{\theta})]^2
\end{aligned}$$

The cross term vanishes because $E[\hat{\theta}-E(\hat{\theta})]=0$.
V. Finite-Sample OLS

1. Deriving OLS

1.1 Scalar form

The least-squares problem and the resulting normal equations:

$$\mathop{min}_{\hat{\alpha},\hat{\beta}} \sum_{i=1}^{n}(y_i-\hat{\alpha}-\hat{\beta}x_i)^2,
\qquad e_i=y_i-\hat{\alpha}-\hat{\beta}x_i$$

$$\begin{cases}
\frac{1}{n}\sum_{i=1}^{n}e_{i}=0 \\
\frac{1}{n}\sum_{i=1}^{n}x_i e_{i}=0
\end{cases}$$
Derivation:

$$\bar{y}=\hat{\alpha}+\hat{\beta}\bar{x}\\
\downarrow \\
\sum x_i[(y_i-\bar{y})-\hat{\beta}(x_i-\bar{x})]=0 \\
\downarrow \\
\sum x_i(y_i-\bar{y})-\hat{\beta}\sum x_i(x_i-\bar{x})=0 \\
\downarrow \\
\hat{\beta}=\frac{\sum x_i(y_i-\bar{y})}{\sum x_i(x_i-\bar{x})}=\frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sum (x_i-\bar{x})(x_i-\bar{x})} \\
\downarrow \\
\hat{\alpha}=\bar{y}-\hat{\beta}\bar{x}$$
1.2 Vector form

1.2.1 Algebraic approach

$$Y=X\hat{\beta}+e$$

$$\begin{aligned}
\mathop{min}_{\hat{\beta}}SSR &=\mathop{min}_{\hat{\beta}} (Y-X\hat{\beta})'(Y-X\hat{\beta})\\
&=\mathop{min}_{\hat{\beta}}(Y'Y-Y'X\hat{\beta}-\hat{\beta}'X'Y+\hat{\beta}'X'X\hat{\beta})\\
&=\mathop{min}_{\hat{\beta}}(Y'Y-\underbrace{2Y'X\hat{\beta}}_{scalar!!!}+\hat{\beta}'X'X\hat{\beta})
\end{aligned}$$

$$\frac{\partial SSR}{\partial \hat{\beta}}=-2X'Y+2X'X\hat{\beta}=0$$

Notice: this uses the matrix-calculus results from II.1.2.

$$\hat{\beta}=(X'X)^{-1}X'Y$$
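A minimal numpy sketch of this result: solve the normal equations $X'X\hat{\beta}=X'Y$ on simulated data and confirm $X'e=0$. The data-generating values are arbitrary.

```python
import numpy as np

# OLS via the normal equations: beta_hat = (X'X)^{-1} X'Y.
rng = np.random.default_rng(3)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # constant + 2 regressors
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # safer than forming the inverse
e = y - X @ beta_hat

print(beta_hat)          # close to (1, 2, -0.5)
print(X.T @ e)           # ~0: residuals orthogonal to the regressors
```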
1.2.2 Geometric approach

$$X'e=
\begin{pmatrix}
1 &1 & \cdots &1 \\
x_{12} &x_{22} &\cdots &x_{n2} \\
\vdots &\vdots &\ddots &\vdots \\
x_{1k} &x_{2k} &\cdots &x_{nk}
\end{pmatrix}_{k\times n}\cdot
\begin{pmatrix}
e_1 \\ e_2 \\ \vdots \\ e_n
\end{pmatrix}_{n\times 1}=0$$

$$e=Y-X\hat{\beta}\\
X'(Y-X\hat{\beta})=0 \\
\hat{\beta}=(X'X)^{-1}X'Y$$
2. Projection Matrix, Residual Making Matrix, and the Frisch-Waugh-Lovell Theorem

2.1 Derivation

$$\hat{y}=X\hat{\beta}=X(X'X)^{-1}X'Y\\
Set\ P_x=X(X'X)^{-1}X'\ [Projection\ Matrix]\\
e=y-P_{x}y=[I-P_x]y \\
Set\ M_x=I-P_x=I-X(X'X)^{-1}X'\ [Residual\ Making\ Matrix]\\
\therefore\ e=M_x y=M_xu,\qquad \hat{y}=P_x y$$
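The sketch below builds $P_x$ and $M_x$ for a random $X$ and verifies the properties listed in 2.2 (symmetry, idempotency, orthogonality, $P_xX=X$, $M_xX=0$); the sizes are illustrative.

```python
import numpy as np

# Construct the projection and residual-making matrices and check properties.
rng = np.random.default_rng(4)
X = rng.normal(size=(50, 3))

P = X @ np.linalg.solve(X.T @ X, X.T)   # P = X(X'X)^{-1}X'
M = np.eye(50) - P

assert np.allclose(P, P.T) and np.allclose(M, M.T)      # symmetric
assert np.allclose(P @ P, P) and np.allclose(M @ M, M)  # idempotent
assert np.allclose(P @ M, 0)                            # orthogonal
assert np.allclose(P @ X, X) and np.allclose(M @ X, 0)
```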
2.2 Properties of $P_x$ and $M_x$

- $P_x+M_x=I$
- Symmetry: $P_x=P_x'$, $M_x=M_x'$
- Idempotency: $P_x P_x=P_x,\ M_x M_x=M_x$
- Pythagorean theorem: $y'y=y'P'Py+y'M'My=\hat{y}'\hat{y}+e'e$
- $PX=X,\ Pe=0,\ MX=0$
- $P_x$ and $M_x$ are orthogonal: $P_xM_x=M_xP_x=0$

2.3 The Frisch-Waugh-Lovell Theorem

2.3.1 Statement

In the least-squares regression of $\vec{y}$ on two blocks of regressors $X_1, X_2$: regress $\vec{y}$ on $X_1$ alone and keep the residuals; regress each column of $X_2$ on $X_1$ and keep those residuals; then regressing the former residuals on the latter set of residuals yields $\hat{\beta_2}$ [partialling out]. (See the sketch after 2.3.2.)

2.3.2 Mechanics

$$X=[X_1,X_2]\\
y=X_1\beta_1+X_2 \beta_2+u\\
Define\ P_1=X_1(X_1'X_1)^{-1}X_1',\qquad M_1=I-P_1\\
M_1 y=\underbrace{M_1 X_1\beta_1}_{=0}+M_1X_2 \beta_2+M_1u$$

where

$$\begin{aligned}
&M_1y: \text{the residual vector from regressing } y \text{ on } X_1
\\ &M_1X_2: \text{the matrix whose columns are the residuals from regressing each column of } X_2 \text{ on } X_1
\end{aligned}$$
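A numerical illustration of FWL under the setup above: the coefficients on $X_2$ from the full regression coincide with those from regressing $M_1y$ on $M_1X_2$. The simulated design is arbitrary.

```python
import numpy as np

# FWL: partialling X1 out of y and X2 reproduces the full-regression b2.
rng = np.random.default_rng(5)
n = 400
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = rng.normal(size=(n, 2))
y = X1 @ [1.0, 0.5] + X2 @ [2.0, -1.0] + rng.normal(size=n)

b_full = np.linalg.lstsq(np.hstack([X1, X2]), y, rcond=None)[0]

M1 = np.eye(n) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)   # residual maker of X1
b2 = np.linalg.lstsq(M1 @ X2, M1 @ y, rcond=None)[0]     # partialled-out regression

assert np.allclose(b2, b_full[2:])
```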
2.3.3 A more formal derivation: partitioned and partial regression

Setup:

$$y=X\beta+u=\begin{pmatrix}X_1&X_2\end{pmatrix} \begin{pmatrix}\beta_1 \\ \beta_2\end{pmatrix}+u=X_1\beta_1+X_2\beta_2+u$$
Normal equations:

$$\begin{pmatrix}X_1{'}X_1 &X_1{'}X_2\\ X_2{'}X_1 &X_2{'}X_2
\end{pmatrix}\begin{pmatrix}\hat{\beta_1}\\ \hat{\beta_2}
\end{pmatrix}=\begin{pmatrix}X_1{'}{y}\\ X_2{'}{y}
\end{pmatrix}$$
Orthogonal decomposition theorem (for mutually orthogonal regressors $u_1,\cdots,u_k$):

$$\hat{y}=\frac{u_1' y}{u_1' u_1}u_1+\frac{u_2' y}{u_2' u_2}u_2+\cdots+\frac{u_k' y}{u_k' u_k}u_k$$
Deriving $b_2$:

$$X_1{'}X_1 b_1+X_1{'}X_2b_2=X_1{'}Y\\
X_2{'}X_1 b_1+X_2{'}X_2b_2=X_2{'}Y$$

$$b_1=(X_1{'}X_1)^{-1}X_1{'}Y-(X_1{'}X_1)^{-1}X_1{'}X_2b_2=(X_1{'}X_1)^{-1}X_1{'}(Y-X_2b_2)$$

$$X_2{'}X_1(X_1{'}X_1)^{-1}X_1{'}Y-X_2{'}X_1(X_1{'}X_1)^{-1}X_1{'}X_2b_2+X_2{'}X_2b_2=X_2{'}Y$$

$$\begin{aligned}
\therefore\ b_2&=[X_2{'}X_2-X_2{'}X_1(X_1{'}X_1)^{-1}X_1{'}X_2]^{-1}\cdot (X_2{'}Y-X_2{'}X_1(X_1{'}X_1)^{-1}X_1{'}Y)\\
&=[X_2{'}(I-X_1(X_1{'}X_1)^{-1}X_1{'})X_2]^{-1}\cdot (X_2{'}(I-X_1(X_1{'}X_1)^{-1}X_1{'})Y)\\
&=(X_2{'}M_1X_2)^{-1}(X_2{'}M_1Y)
\end{aligned}$$

$$\begin{aligned}
&Define:\ X_2^{*}=M_1X_2,\quad Y^{*}=M_1Y \\
&\text{so that}\ \ b_2=(X_2^{*}{'}X_2^{*})^{-1}(X_2^{*}{'}Y^{*})
\end{aligned}$$
3. Proving the Properties of the OLS Estimator

3.1 Unbiasedness: $E(\hat{\beta})=\beta$

Proof:

$$\hat{\beta}=(X'X)^{-1}X'Y=(X'X)^{-1}X'(X\beta+u)=\beta+(X'X)^{-1}X'u$$

$$E(\hat{\beta})=\beta+(X'X)^{-1}X'E(u)=\beta+(X'X)^{-1}X'E_XE(u|X)=\beta$$

First-moment assumption: $E(u)=E_X(E(u|X))=0\leftarrow E(u|X)=0\rightarrow Cov(X,u)=0$

Proof:

$$\begin{aligned}
Cov(X,u)&=E(Xu)-E(X)E(u)\\
&=E(Xu)\\
&=E_X(XE(u|X))\\
&=0
\end{aligned}$$
Second-moment assumption (spherical disturbances) [not needed for unbiasedness]: $Var(u)=\sigma^2I$, equivalently $Var(Y)=\sigma^2I$.

$$\begin{aligned}
Var(\hat{\beta})&=Var(\beta+(X'X)^{-1}X'u)\\
&=Var((X'X)^{-1}X'u)\\
&=(X'X)^{-1}X'Var(u)X(X'X)^{-1}\\
&=\sigma^2(X'X)^{-1}
\end{aligned}$$
3.2 Consistency

$$Pr(|\hat{\beta}_n-\beta|>\varepsilon)\rightarrow0 \ \ as\ \ n\rightarrow\infty$$

Proof:

$$\hat{\beta}_n-\beta=(X'X)^{-1}X'u=\left[\frac{1}{n}\sum_{i=1}^n x_i x_i'\right]^{-1}\left[\frac{1}{n}\sum_{i=1}^n x_i u_i\right]\stackrel{P}\longrightarrow E(x_ix_i')^{-1}E(x_iu_i)=0$$

where the last equality holds because $E(x_iu_i)=0$.

Note: the second-moment assumption is not used here. That is, heteroskedasticity and autocorrelation do not affect consistency!
3.3 A further point: the unbiased estimator $\hat{\sigma}^2$ of $\sigma^2$

$$\hat{\sigma}^2=\frac{\hat{u}'\hat{u}}{n-K}$$
4. The Gauss-Markov Theorem

4.1 Statement

When both the first-moment and second-moment assumptions hold, the OLS estimator is the most efficient among all linear unbiased estimators.

4.2 Classical Linear Regression Model (CLRM) assumptions

- Linearity (in $\beta$)
- Regressors are randomly sampled
- No exact multicollinearity, i.e. $X$ has full column rank: $rank(X)=k$
- First-moment assumption (exogeneity of the regressors):
  - Finite-sample OLS (strict exogeneity): $E(u_i|X)=0$, which implies $Cov(u_i,x_{jk})=0\ \forall j,k$
  - Large-sample OLS (a weaker assumption: contemporaneous uncorrelatedness): $E(u_i|X)=c$, with $E(u_i)=E_x(u_i|X)=E_x(c)=0$
- Second-moment assumption (spherical disturbances) [if violated, OLS remains unbiased and consistent, but is no longer BLUE]: $Var(u_i|X)=\sigma^2$

$$\begin{aligned}
Var(u|X)&=E(uu'|X) \\
&=\begin{pmatrix}
\sigma^2 &0 &\cdots &0 \\
0 &\sigma^2 &\cdots &0 \\
\vdots&\vdots&\ddots&\vdots \\
0 &0 &\cdots &\sigma^2
\end{pmatrix} \\
&=\sigma^2I
\end{aligned}$$
Note: normality is not one of the CLRM assumptions, but it is crucial for finite-sample hypothesis testing.

4.3 Proof of the Gauss-Markov theorem

Let $\hat{\beta}$ be any linear unbiased estimator.

To show: $Var(b|X)\leq Var(\hat{\beta}|X)$.

By linearity, $\hat \beta=C_{k\times n}y$.

$$\begin{aligned}
&\because\ b=Ay, \ A=(X'X)^{-1}X'\\
&Define\ D\equiv C-A\\
&\therefore\ \hat \beta=Cy=(A+D)y=D(X\beta+u)+b=DX\beta+Du+b\\
&\therefore\ \beta=E(\hat \beta |X)=E(DX\beta+Du+b|X)=DX\beta+\beta \\
&\therefore\ DX=0 \\
&\therefore\ \hat \beta=Du+b \\
&\therefore\ \hat{\beta}-\beta=(D+A)u \\
&\therefore\ Var(\hat{\beta}|X)=Var(\hat{\beta}-\beta|X)=Var((D+A)u|X)\\
&\qquad\qquad\qquad\ \ =(D+A)Var(u|X)(D+A)' \\
&\qquad\qquad\qquad\ \ =\sigma^2(D+A)(D'+A')\\
&\qquad\qquad\qquad\ \ =\sigma^2(DD'+(X'X)^{-1})\qquad [DA'=DX(X'X)^{-1}=0]\\
&\therefore\ Var(\hat{\beta}|X)-Var(b|X)=\sigma^2DD' \\
&Notice:\ DD'\ \text{is positive semidefinite!}
\end{aligned}$$
5. Goodness of Fit $R^2$: Related Discussion

5.1 Another quadratic-form expression for the residual sum of squares

$$e'e=Y'M'MY=Y'MY=Y'e=e'Y$$
5.2 The idempotent demeaning matrix $M^0$

$$x=\begin{pmatrix} x_1 \\ x_2 \\ \vdots\\ x_n\end{pmatrix},\qquad
\bar{x}=\frac{1}{n}i'x$$

$$i\bar{x}=i\frac{1}{n}i'x=\begin{pmatrix} \bar{x} \\ \bar{x} \\ \vdots\\ \bar{x}\end{pmatrix}
=\frac{1}{n}ii'x$$

$$\therefore\ \begin{pmatrix} x_1-\bar{x}\\ x_2-\bar{x} \\ \vdots\\ x_n-\bar{x}\end{pmatrix}
=[x-i\bar{x}]=[x-\frac{1}{n}ii'x]=[I-\frac{1}{n}ii']x=M^0x$$

$$M^0=I-\frac{1}{n}ii',\qquad M^0i=0$$
5.3 Sum of deviations from the mean

$$\sum_{i=1}^{n}(x_i-\bar{x})=i'[M^0x]=0'x=0$$
5.4 Sum of squared deviations from the mean

$$\begin{aligned}
\sum_{i=1}^{n}(x_i-\bar{x})^2&=\sum_{i=1}^{n}x_i^2-n\bar{x}^2\\
&=(x-i\bar{x})'(x-i\bar{x})\\
&=(M^0x)'(M^0x)\\
&=x'M^{0}{'}M^0x \\
&=x'M^0x
\end{aligned}$$
Summary:

$$\begin{pmatrix}
\sum_{i=1}^{n}(x_i-\bar{x})^2 &\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})\\
\sum_{i=1}^{n}(y_i-\bar{y})(x_i-\bar{x}) &\sum_{i=1}^{n}(y_i-\bar{y})^2
\end{pmatrix}=
\begin{pmatrix}
x'M^0x &x'M^0y\\
y'M^0x &y'M^0y
\end{pmatrix}$$
5.5 Deriving $R^2$

$$\begin{aligned}
&\because\ y_i-\bar{y}=\hat{y}_i-\bar{y}+e_i\\
&\therefore\ M^0y=M^0Xb+M^0e \\
&\because\ e'M^0X=e'X=0\\
&\therefore\ y'M^0y=(Xb)'M^0Xb+e'e=\hat{y}'M^0\hat{y}+e'e\\
&\therefore\ SST=SSE+SSR
\end{aligned}$$

First form:

$$R^2=\frac{SSE}{SST}=\frac{(Xb)'M^0Xb}{y'M^0y}=1-\frac{SSR}{SST}=1-\frac{e'e}{y'M^0y}$$

Since $e=M_x y=M_x u$, we have $e'e=u'M_xu$, giving the second form:

$$R^2=1-\frac{u'M_xu}{y'M^0y}$$

$$\begin{aligned}
&\because\ (Xb)'M^0Xb=\hat{y}'M^0\hat{y}\\ &\hat{y}=Xb,\quad y=\hat{y}+e,\quad M^0e=e,\quad X'e=0 \\
&\therefore\ \hat{y}'M^0\hat{y}=\hat{y}'M^0(y-e)=\hat{y}'M^0y-\hat{y}'M^0e=\hat{y}'M^0y
\end{aligned}$$

Third form:

$$R^2=\frac{\hat{y}'M^0\hat{y}}{y'M^0y}= \frac{\hat{y}'M^0y}{y'M^0y}\cdot \frac{\hat{y}'M^0y}{\hat{y}'M^0\hat{y}}=\frac{[\sum(\hat{y}_i-\bar{y})(y_i-\bar{y})]^2}{\sum(y_i-\bar{y})^2 \sum(\hat{y}_i-\bar{y})^2}$$
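The three representations can be confirmed numerically; the sketch below computes each on simulated data (the demeaning by $M^0$ is done directly by subtracting means).

```python
import numpy as np

# R^2 computed three equivalent ways.
rng = np.random.default_rng(6)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ [1.0, 2.0, -0.5] + rng.normal(size=n)

b = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat, e = X @ b, y - X @ b
M0y = y - y.mean()                  # M0 just demeans
M0yhat = y_hat - y_hat.mean()

r2_a = (M0yhat @ M0yhat) / (M0y @ M0y)                           # SSE/SST
r2_b = 1 - (e @ e) / (M0y @ M0y)                                 # 1 - SSR/SST
r2_c = (M0yhat @ M0y) ** 2 / ((M0y @ M0y) * (M0yhat @ M0yhat))   # squared correlation
assert np.allclose([r2_a, r2_b], r2_c)
```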
5.6 Adjusted $R^2$

$$\bar{R}^2=1-\frac{e'e/(n-K)}{y'M^0y/(n-1)}=1-\frac{n-1}{n-K}(1-R^2)$$
Theorem: in a multiple regression, if a regressor $X$ has a $t$-value greater than 1 in absolute value, dropping it lowers $\bar{R}^2$.

Proof:

Let $R_K^2$ be the $R^2$ of the regression with all variables, and $R_1^2$ that of the regression omitting $x_K$.

$$\begin{aligned}
&R_K^2=1-\frac{e'e}{y'M^0y}\\
&R_1^2=1-\frac{e_1{'}e_{1}}{y'M^0y} \\
&\bar{R}_K^2=1-\frac{n-1}{n-K}(1-R_K^2)\\
&\bar{R}_1^2=1-\frac{n-1}{n-K+1}(1-R_1^2)\\
&\bar{R}_K^2-\bar{R}_1^2=\frac{n-1}{n-K+1}\frac{e_1{'}e_{1}}{y'M^0y}-\frac{n-1}{n-K}\frac{e'e}{y'M^0y}
\end{aligned}$$

Dropping the regressor lowers adjusted $R^2$, i.e. $\bar{R}_K^2-\bar{R}_1^2>0$, exactly when $(n-K)e_1'e_1>(n-K+1)e'e$. Moreover,

$$e_1'e_1=e'e+b_K^2(X_K'M_1X_K),\qquad
e'e=(n-K)s^2$$

so the condition reduces to

$$\frac{b_K^2(X_K'M_1X_K)}{s^2}=t_K^2>1$$

which proves the theorem.
Supplement: why is $e_1'e_1=e'e+b_K^2(X_K'M_1X_K)$? By FWL, $e_1=M_1y=(M_1X_K)b_K+e$ with $(M_1X_K)'e=0$, and expanding $e_1'e_1$ gives the identity.
5.7 Proving $SST=SSR+SSE$: scalar form

To show: $\sum(y_i-\bar y)^2=\sum(\hat{y}_i-\bar{y})^2+\sum e_i^2$

$$\begin{aligned}
\sum(y_i-\bar y)^2&=\sum(y_i-\hat{y}_i+\hat{y}_i-\bar y)^2
\\&=\sum(y_i-\hat y_i)^2+\sum(\hat{y}_i-\bar{y})^2+2\sum(y_i-\hat{y}_i)(\hat{y}_i-\bar{y})
\\&=\sum(y_i-\hat y_i)^2+\sum(\hat{y}_i-\bar{y})^2+2\sum(y_i-\hat{y}_i)(a+bx_i-\bar{y})
\\&=\sum(y_i-\hat y_i)^2+\sum(\hat{y}_i-\bar{y})^2+2[\sum(y_i-\hat{y}_i)(a-\bar{y})+b\sum(y_i-\hat{y}_i)x_i]
\\&=\sum(y_i-\hat y_i)^2+\sum(\hat{y}_i-\bar{y})^2+2[\sum(y_i-a-bx_i)(a-\bar{y})+b\sum(y_i-a-bx_i)x_i]
\\&=\sum(y_i-\hat y_i)^2+\sum(\hat{y}_i-\bar{y})^2
\\&=\sum(\hat{y}_i-\bar{y})^2+\sum e_i^2
\end{aligned}$$

The cross terms vanish by the normal equations $\sum e_i=0$ and $\sum x_ie_i=0$.
5.8 The t-test for a single coefficient

5.9 The F-test for linear hypotheses

Null hypothesis: $H_0:R\beta=r$

Distribution of the F statistic:

$$F\equiv\frac{(Rb-r)'[R(X'X)^{-1}R']^{-1}(Rb-r)/m}{s^2}\sim F(m,n-K)$$
5.10 The likelihood-ratio form of the F statistic

$$F=\frac{(e^*{'}e^*-e'e)/m}{e'e/(n-K)}$$

where $e^*$ is the residual vector of the restricted regression.
5.11 Prediction

True value at the prediction point:

$$y_0=x_0^{'}\beta+\varepsilon_0$$

Prediction error:

$$\hat{y}_0-y_0=x_0^{'}(b-\beta)-\varepsilon_0$$

Variance of the prediction error:

$$Var(\hat{y}_0-y_0)=Var(\varepsilon_0)+Var[x_0'(b-\beta)]=\underbrace{\sigma^2}_{\text{intrinsic uncertainty in }y_0}+\underbrace{\sigma^2x_0'(X'X)^{-1}x_0}_{\text{sampling error}}$$

With normally distributed disturbances, $\hat{y}_0-y_0\sim N(0,\sigma^2+\sigma^2x_0'(X'X)^{-1}x_0)$, so

$$\frac{\hat{y}_0-y_0}{s\sqrt{1+x_0'(X'X)^{-1}x_0}}\sim t(n-K)$$
VI. Large-Sample OLS: Asymptotic Properties

1. Multicollinearity & One Remedy: Increase the Sample Size

1.1 Concept: exact vs. near multicollinearity

1.2 Diagnostics

Is $X'X$ of full rank?

- No: exact multicollinearity.
- Yes: look for symptoms of near multicollinearity:
  - $R^2$ or the $F$ statistic is high, yet individual variables are rarely significant
  - High correlation among the regressors: regressing $x_j$ on the remaining regressors in $\{x_1,x_2,...x_k\}$ yields a high $R_j^2$
  - $Var(\hat{\beta}_k|X)=\frac{\sigma^2}{(1-R_k^2)S_k}$, where $S_k=\sum_{i=1}^n (x_{ik}-\bar{x}_k)^2$
  - Variance inflation factor: $VIF_k=\frac{1}{1-R_k^2}$; as a rule of thumb, $max\{VIF_1, VIF_2,...,VIF_k\}\leq 10$

1.3 Remedies

- Best option: if possible, add more observations (more data)
- Drop a variable (at the risk of omitted-variable bias)
- Standardize: $\tilde{x}\equiv \frac{x-\bar{x}}{S_x}$

2. Why Do We Need Large-Sample Theory?

- The finite-sample assumptions are too strong.
  - Finite-sample theory requires strict exogeneity: $Cov(u_i,x_{jk})=0\ \forall j,k$. For an $AR(1)$ model $y_t=\rho y_{t-1}+\varepsilon_t\ (t=1,2,...,T)$ this means the regressor must be orthogonal to past, present, and future disturbances, i.e. $Cov(y_{t-1},\varepsilon_i)=0\ (i=1,2,...,T)$; yet $Cov(y_t,\varepsilon_t)=Cov(\rho y_{t-1}+\varepsilon_t,\varepsilon_t)=Var(\varepsilon_t)>0$, so strict exogeneity fails. Large-sample theory only requires the regressor to be uncorrelated with the contemporaneous disturbance: $Cov(y_{t-1},\varepsilon_t)=0$.
  - The assumption of normally distributed disturbances is also too strong.
- Exact finite-sample distributions of statistics are hard to derive.
- Notice the drawback of large-sample theory: it needs a large sample, conventionally $n\geq30$, usually 100 or more.

3. Stochastic Convergence

3.1 Convergence in probability

$\{x_n\}_{n=1}^\infty=\{x_1,x_2,x_3,...\}$ converges in probability to a constant $a$, written $plim\ x_n=a$ or $x_n\stackrel{P}\longrightarrow a$, if for any $\varepsilon>0$, $\lim P(|x_n-a|>\varepsilon)=0$ as $n\rightarrow\infty$.
Convergence between random variables: $x_n\stackrel{P}\longrightarrow x$ if $\{x_n-x\}_{n=1}^{\infty}$ converges in probability to 0.

Convergence between random vectors: $plim\ \vec{x_n}=\vec{x}$

3.2 Convergence in mean square

3.3 Convergence in distribution

Denote the cumulative distribution functions of $\{x_n\}_{n=1}^{\infty}$ and of the random variable $x$ by $F_n(x)$ and $F(x)$. If $\forall x$, $\lim F_n(x)=F(x)$, we write $x_n\stackrel{d}\longrightarrow x$ and call $x$ the asymptotic distribution of $\{x_n\}$.

For example, the asymptotic distribution of the t distribution is the normal: $t(k)\stackrel{d}\longrightarrow N(0,1)$

3.4 Asymptotic normality: definition

If $x_n\stackrel{d}\longrightarrow x$ and $x$ is normally distributed, then $x_n$ is said to be asymptotically normal.

3.5 Relationships among the modes of convergence

Convergence in mean square $\rightarrow$ convergence in probability $\rightarrow$ convergence in distribution
4. Tools of Large-Sample Theory: the LLN and the CLT

4.1 Laws of large numbers

When the sample size $n$ is large, the sample mean tends to the population mean.

- [Strong law of large numbers: almost-sure convergence]
- [Chebyshev's law of large numbers: convergence in probability]

4.2 The Central Limit Theorem (CLT)

Whatever the specific distribution of $\{x_n\}_{n=1}^{\infty}$, as $n\rightarrow\infty$ the asymptotic distribution of the sample mean $\bar{x_n}$ is normal [but the draws must be i.i.d.].
$$\frac{\bar{x}_n-\mu}{\sqrt{\frac{\sigma^2}{n}}}\stackrel{d}\longrightarrow N(0,1)\Rightarrow \bar{x}_n \stackrel{d}\longrightarrow N(\mu, \sigma^2/n)$$

Rearranging:

$$\sigma\left(\frac{\bar{x}_n-\mu}{\sqrt{\frac{\sigma^2}{n}}}\right)\stackrel{d}\longrightarrow \sigma N(0,1)\\
\frac{\bar{x}_n-\mu}{\sqrt{\frac{1}{n}}}\stackrel{d}\longrightarrow N(0,\sigma^2)\\
Root\text{-}n\ Convergence:\ \sqrt{n}(\bar{x}_n-\mu)\stackrel{d}\longrightarrow N(0,\sigma^2)$$

$\bar{x}_n-\mu$ shrinks to 0 at roughly the rate $\frac{1}{\sqrt{1}}, \frac{1}{\sqrt{2}},..., \frac{1}{\sqrt{n}}$.
The multivariate central limit theorem:

$$\sqrt{n}(\vec{\bar{x}_n}-\vec{\mu})\stackrel{d}\longrightarrow N(\vec{0}, \Sigma)$$
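A simulation sketch of root-$n$ convergence, using exponential draws purely as an illustrative non-normal distribution: $\sqrt{n}(\bar{x}_n-\mu)$ should behave like $N(0,\sigma^2)$.

```python
import numpy as np

# Root-n convergence: sqrt(n)*(xbar - mu) for iid exponential draws.
rng = np.random.default_rng(7)
n, reps = 1_000, 20_000
x = rng.exponential(scale=2.0, size=(reps, n))   # mu = 2, sigma^2 = 4

z = np.sqrt(n) * (x.mean(axis=1) - 2.0)
print(z.mean(), z.var())    # ~0 and ~4, matching N(0, sigma^2)
```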
5. Large-Sample Properties of Statistics

5.1 Consistent estimator

$$plim\ \hat{\beta}_n=\beta\ \ [\text{convergence in probability}]$$

5.2 Asymptotic normality and asymptotic variance

$$\sqrt{n}(\hat{\beta}_n-\beta)\stackrel{d}\longrightarrow N(0,\Sigma)$$

The limiting variance $\Sigma$ is called the asymptotic variance and written $Avar(\hat{\beta}_n)$.

5.3 Asymptotic efficiency

If $Avar(\hat{\beta}_n)<Avar(\tilde{\beta}_n)$, then $\hat{\beta}_n$ is asymptotically more efficient than $\tilde{\beta}_n$.

5.4 Mean squared error

$$MSE(\hat{\beta})\equiv E[(\hat{\beta}-\beta)^2]$$

Proof that $MSE(\hat{\beta})=Var(\hat{\beta})+[Bias(\hat{\beta})]^2$:

$$MSE(\hat{\beta})\equiv E[(\hat{\beta}-\beta)^2]=E[(\hat{\beta}-E(\hat{\beta})+E(\hat{\beta})-\beta)^2]=Var(\hat{\beta})+[Bias(\hat{\beta})]^2$$

Multivariate form: $MSE(\hat{\beta})=E[(\hat{\beta}-\beta)(\hat{\beta}-\beta)']=Var(\hat{\beta})+[Bias(\hat{\beta})][Bias(\hat{\beta})]'$
6. Large-Sample OLS Assumptions

Assumption 1: linearity.

Assumption 2: the $(K+1)$-dimensional stochastic process $\{y_i,x_{i1},...,x_{ik}\}$ is asymptotically independent and stationary, so the LLN and CLT apply.

Assumption 3: contemporaneous exogeneity, $E(x_{ik}u_i)=0,\ \forall i,k$.

Assumption 4: rank condition: $X$ has full column rank.

Assumption 5: define the vector

$$g_i=x_i\varepsilon_i=\begin{pmatrix}x_{i1}\\x_{i2}\\ \vdots \\ x_{iK} \end{pmatrix}\varepsilon_i$$

$g_i$ is a martingale difference sequence, and its covariance matrix $S=E[g_ig_i']=E(\varepsilon_i^2x_ix_i')$ is nonsingular.

Notice: neither strict exogeneity nor normality of the disturbances is assumed!
Assumption 6: the regressors have finite fourth moments: $E[(x_{ik}x_{ij})^2]$ exists and is finite $(\forall i,j,k)$.

7. Large-Sample Properties of OLS

Since $X=\begin{pmatrix}x_1'\\x_2'\\ \vdots \\x_n'\end{pmatrix}$, we have $X'X=\begin{pmatrix}x_1&x_2& \cdots &x_n\end{pmatrix}\begin{pmatrix}x_1'\\x_2'\\ \vdots \\x_n'\end{pmatrix}=\sum_{i=1}^n[x_ix_i']_{K\times K}$.

Define $S_{XX}=\frac{1}{n}X'X=\frac{1}{n}\sum x_ix_i'$.

On the other hand, $X'y=\begin{pmatrix}x_1&x_2& \cdots &x_n\end{pmatrix}\begin{pmatrix}y_1\\y_2\\ \vdots \\y_n\end{pmatrix}=\sum_{i=1}^nx_iy_i$, and with $S_{XY}=\frac{1}{n}X'y$,

$$\therefore\ b=(X'X)^{-1}X'y=S_{XX}^{-1}S_{XY}$$

Property 1: $\hat{\beta}$ is consistent, $plim\ \hat{\beta}_n=\beta$.

Property 2: $\hat{\beta}$ is asymptotically normal (needed for statistical inference):

$$\sqrt{n}(\hat{\beta}_n-\beta)\stackrel{d}\longrightarrow N(0,Avar(\hat{\beta}))$$

$$Var(\hat{\beta}|X)=(X'X)^{-1}X'Var(u|X)X(X'X)^{-1}$$

$$Avar(\hat{\beta})=[E(x_ix_i')]^{-1}S[E(x_ix_i')]^{-1},\qquad S\equiv E(g_ig_i')=E(\varepsilon_i^2 x_ix_i')$$
Proof that in large samples $s^2$ is a consistent estimator of the unconditional variance $E(\varepsilon_i^2)=\sigma^2$:

$$\begin{aligned}
s^2&\equiv \frac{e'e}{n-K}=\frac{\varepsilon'M\varepsilon}{n-K}=\frac{\varepsilon'[I_n-X(X'X)^{-1}X']\varepsilon}{n-K}\\
&=\frac{1}{n-K}[\varepsilon'\varepsilon-\varepsilon'X(X'X)^{-1}X'\varepsilon]\\
&=\frac{n}{n-K}\left[\frac{\varepsilon'\varepsilon}{n}-\frac{\varepsilon'X(X'X)^{-1}X'\varepsilon}{n}\right]\\
&=\frac{n}{n-K}\left[\frac{1}{n}\sum\varepsilon_i^2-\bar{g}'S_{XX}^{-1}\bar{g}\right]\\
&\mathop{\longrightarrow}_{n\rightarrow \infty}\sigma^2
\end{aligned}$$

since $\frac{1}{n}\sum\varepsilon_i^2\stackrel{p}\longrightarrow\sigma^2$ and $\bar{g}\stackrel{p}\longrightarrow0$.
8. Large-Sample Tests of Linear Hypotheses

8.1 Testing a single coefficient, $H_0:\beta_k=\bar{\beta}_k$:

$$t_k=\frac{\sqrt{n}(b_k-\bar{\beta}_k)}{\sqrt{\widehat{Avar}(b_k)}}=\frac{b_k-\bar{\beta}_k}{\sqrt{\frac{1}{n}\widehat{Avar}(b_k)}}\equiv\frac{b_k-\bar{\beta_k}}{SE^*(b_k)}\stackrel{d}\longrightarrow N(0,1)$$

$$SE^*(b_k)\equiv \sqrt{\frac{1}{n}\widehat{Avar}(b_k)}=\sqrt{\frac{1}{n}(S_{XX}^{-1}\hat{S}S_{XX}^{-1})_{kk}}$$
This $SE^*(b_k)$ is the heteroskedasticity-robust standard error.

Notice: the statistic $t_k$ follows the standard normal distribution, not the t distribution!

Proposition: under homoskedasticity, the robust standard error reduces to the ordinary standard error.

Proof: conditional homoskedasticity means $E(\varepsilon_i^2|x_i)=\sigma^2>0$. By the law of iterated expectations,

$$S\equiv E(x_ix_i'\varepsilon_i^2)=E_{x_i}E(x_ix_i'\varepsilon_i^2|x_i)=E_{x_i}[x_ix_i'E(\varepsilon_i^2|x_i)]=\sigma^2E(x_ix_i')$$

$$\begin{aligned}
\because& s^2\stackrel{p}\longrightarrow \sigma^2, S_{XX}\stackrel{p}\longrightarrow E(x_ix_i')\\
\therefore&\ s^2S_{XX}\text{ is a consistent estimator of }S
\\ \therefore&\ \widehat{Avar}(b)=S_{XX}^{-1}(s^2S_{XX})S_{XX}^{-1}=ns^2(X'X)^{-1}
\\ \therefore&\ SE^*(b_k)=\sqrt{\frac{1}{n}\widehat{Avar}(b_k)}=\sqrt{\frac{1}{n}ns^2(X'X)_{kk}^{-1}}=\sqrt{s^2(X'X)_{kk}^{-1}}
\end{aligned}$$
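A numpy sketch of the heteroskedasticity-robust (White) variance estimator in the form $\widehat{Avar}(b)=S_{XX}^{-1}\hat{S}S_{XX}^{-1}$, computed directly as $(X'X)^{-1}\left(\sum_i e_i^2x_ix_i'\right)(X'X)^{-1}$; the skedastic function used to simulate the errors is made up.

```python
import numpy as np

# Heteroskedasticity-robust (White) standard errors.
rng = np.random.default_rng(8)
n = 1_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
u = rng.normal(size=n) * (1 + np.abs(X[:, 1]))       # heteroskedastic errors
y = X @ [1.0, 2.0] + u

b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b

XtX_inv = np.linalg.inv(X.T @ X)
meat = X.T @ (X * e[:, None] ** 2)                   # sum_i e_i^2 x_i x_i'
V_robust = XtX_inv @ meat @ XtX_inv                  # = (1/n) * Avar_hat(b)
print(np.sqrt(np.diag(V_robust)))                    # robust SEs of b
```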
8.2 Testing linear hypotheses $H_0: R\beta=r$, where $R$ has full row rank:

$$W\equiv [\sqrt{n}(Rb-r)]'[R \widehat{Avar}(b)R']^{-1}[\sqrt{n}(Rb-r)]\stackrel{d}\longrightarrow \chi^2(m)$$
VII. Violations of the Second-Moment Assumption: Heteroskedasticity and Autocorrelation

1. Heteroskedasticity

1.1 Origin of the problem

$$Var(u|X)\equiv\Sigma=\begin{pmatrix}\sigma_1^2 &0 &\cdots &0\\ 0 &\sigma_2^2 &\cdots &0
\\ \vdots &\vdots &\ddots &\vdots\\ 0 &0 &\cdots &\sigma_n^2\end{pmatrix}$$
1.2 Consequences of heteroskedasticity

$$Var(u)=\Sigma\neq\sigma^2I \\
Var(\hat{\beta}|X)=(X'X)^{-1}X'\Sigma X(X'X)^{-1}$$

The conventional OLS estimate of $Var(u)$ has a downward bias, i.e. it is smaller than the true error variance, which leads to over-rejection in tests.

1.3 Detecting heteroskedasticity

Method 1: residual plots

- scatter plot of the residuals $e_i$ against the fitted values $\hat{y}_i$
- scatter plot of the residuals $e_i$ against some regressor $x_{ik}$

Method 2: formal tests
1.4 Remedies

Method 1: weighted least squares (WLS). With $Var(u_i)=\sigma^2v_i$, minimize the weighted residual sum of squares

$$\mathop{SSR}_{\tilde{\beta}}=\sum_{i=1}^{n}\frac{e_i^2}{v_i}$$

Equivalently, transform the model with

$$H=\begin{pmatrix} 1/\sigma_1 &0 &\cdots &0 \\
0 &1/\sigma_2 &\cdots &0 \\
\vdots &\vdots &\ddots &\vdots\\
0 &0 &\cdots &1/\sigma_n \end{pmatrix}$$

$$HY=HX\beta+Hu,\qquad
\tilde Y=HY,\quad
\tilde X=HX,\quad
\tilde u =Hu$$

$$\begin{aligned}
\hat{\beta}_{WLS}&=(\tilde{X}'\tilde{X})^{-1}\tilde{X}'\tilde{Y}\\
&=[(HX)'HX]^{-1}(HX)'(HY)\\
&=[X'H'HX]^{-1}X'H'HY\\
&=(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}Y\\
&=(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}(X\beta+u)\\
&=\beta+(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}u
\end{aligned}$$

$$\begin{aligned}
Var(\hat{\beta}_{WLS})&=Var((X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}u)\\
&=(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}Var(u)\Sigma^{-1}X(X'\Sigma^{-1}X)^{-1}\\
&=(X'\Sigma^{-1}X)^{-1}
\end{aligned}$$
In practice we do not know all the $\sigma_i^2$'s, and estimating these $n$ values from $n$ observations is impossible. So WLS ultimately rests on some theory; for instance, we may posit that the error variance grows with the value of the regressor:

$$Y_i=\beta_0+\beta_1X_i+u_i,\qquad
Var(u_i)=\sigma^2X_i^2$$

$$Y_i/X_i=\beta_0/X_i+\beta_1+u_i/X_i\\
\therefore\ Var(u_i/X_i)=\frac{1}{X_i^2}Var(u_i)=\sigma^2\\
\therefore\ H=\begin{pmatrix}1/X_1 &0 &\cdots &0
\\ 0 &1/X_2 &\cdots &0 \\
\vdots &\vdots &\ddots &\vdots \\
0 &0 &\cdots &1/X_n\end{pmatrix}\rightarrow Var(\tilde{u})=Var(Hu)=\sigma^2I$$
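A sketch of this textbook WLS case: with $Var(u_i)=\sigma^2X_i^2$, dividing every observation by $X_i$ restores homoskedasticity. The data-generating numbers are illustrative.

```python
import numpy as np

# WLS for Var(u_i) = sigma^2 * X_i^2: divide each observation by X_i.
rng = np.random.default_rng(9)
n = 800
x = rng.uniform(1.0, 5.0, size=n)
u = rng.normal(size=n) * x                # error sd proportional to x
y = 1.0 + 2.0 * x + u

h = 1.0 / x                               # rows of H = diag(1/X_i)
Xt = np.column_stack([h, np.ones(n)])     # transformed regressors: (1/x, 1)
b_wls = np.linalg.lstsq(Xt, h * y, rcond=None)[0]
print(b_wls)                              # ~ (beta0, beta1) = (1, 2)
```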
Grouped data: if each observation is a group average over $N_g$ underlying units,

$$Var(u_g)=Var\left(\frac{1}{N_g}\sum_{i}u_{gi}\right)=\frac{1}{N_g}\sigma^2,\qquad
H=\begin{pmatrix}\sqrt{N_1} &0 &\cdots &0\\
0 &\sqrt{N_2} &\cdots &0\\
\vdots &\vdots&\ddots&\vdots\\ 0 &0 &\cdots &\sqrt{N_g} \end{pmatrix}$$
Method 2: heteroskedasticity-robust standard errors

$$Var_{rse}(\hat{\beta}_{OLS})=(X'X)^{-1}X'diag(e_1^2,e_2^2,\cdots,e_n^2)X(X'X)^{-1}$$

This is a consistent estimator of $Var(\hat{\beta})$, not of $Var(u)$!

$Var_{rse}(\hat{\beta}_{OLS})$ still has a downward bias, but in most cases it is better (larger) than $Var_{CLRM}(\hat{\beta})$. Which cases?

- Severe heteroskedasticity: $Var_{CLRM}(\hat{\beta}_{OLS})\leq Var_{rse}(\hat{\beta})\leq Var(\hat{\beta}_{OLS})$
- Mild heteroskedasticity: $Var_{rse}(\hat{\beta}_{OLS})\leq Var_{CLRM}(\hat{\beta})\leq Var(\hat{\beta}_{OLS})$

The most conservative choice: $max[\frac{e'e}{n-K},Var_{rse}(\hat{\beta}_{OLS})]$
Method 3: GLS, a more general treatment

Suppose $Var(\varepsilon|X)=\sigma^2V(X)\neq\sigma^2I_n$, where $V(X)$ is a known symmetric positive definite matrix.

Proposition: for a positive definite matrix $V_{n\times n}$ there exists a nonsingular matrix $C_{n\times n}$ such that $V^{-1}=C'C$.

$$y=X\beta+\varepsilon \\
\downarrow
\\ Cy=CX\beta+C\varepsilon
\\ \downarrow
\\ \tilde{y}=\tilde{X}\beta+\tilde{\varepsilon}
\\ \downarrow
\\Var(\tilde{\varepsilon}|X)=E(\tilde \varepsilon\tilde \varepsilon'|X)=\sigma^2CVC'=\sigma^2C(C'C)^{-1}C'=\sigma^2CC^{-1}(C')^{-1}C'=\sigma^2I_n\\
\downarrow
\\ \hat{\beta}_{GLS}=(\tilde{X}'\tilde{X})^{-1}\tilde{X}'\tilde{y}=[(CX)'(CX)]^{-1}(CX)'Cy=(X'V^{-1}X)^{-1}X'V^{-1}y$$

But $V$ is unknown in practice!
Feasible GLS (FGLS):

$$\hat{\beta}_{FGLS}=(X'\hat{V}^{-1}X)^{-1}X'\hat{V}^{-1}y$$

Estimation in practice when there is only heteroskedasticity (see the sketch below):

(1) Posit $e_i^2=\sigma^2exp(\delta_1+\delta_2x_{i2}+\cdots+\delta_{K}x_{iK})v_{i}$

(2) Regress $lne_i^2=(ln\sigma^2+\delta_1)+\delta_2x_{i2}+\cdots+\delta_Kx_{iK}+lnv_i$ to obtain the fitted values $ln\hat\sigma_i^2$ of $lne_i^2$

(3) Set $\hat{\sigma_i}^2=e^{ln\hat{\sigma_i}^2}$ and run WLS with weights $1/\hat{\sigma_i}^2$
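A compact sketch of steps (1)-(3), assuming an exponential skedastic function in one regressor; in practice the specification of $\hat{\sigma}_i^2$ is a modeling choice.

```python
import numpy as np

# FGLS: model ln(e_i^2) on the regressors, then WLS with 1/sigma_hat_i^2.
rng = np.random.default_rng(10)
n = 1_000
x = rng.normal(size=n)
sigma = np.exp(0.5 + 0.5 * x)                        # true skedastic function
y = 1.0 + 2.0 * x + sigma * rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]     # step 0: OLS residuals

d = np.linalg.lstsq(X, np.log(e**2), rcond=None)[0]  # steps (1)-(2)
sigma2_hat = np.exp(X @ d)                           # fitted sigma_i^2

w = 1.0 / np.sqrt(sigma2_hat)                        # step (3): run WLS
b_fgls = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)[0]
print(b_fgls)                                        # ~ (1, 2)
```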
2. Autocorrelation

2.1 Origin of the problem: $\exists i\neq j,\ E(\varepsilon_i\varepsilon_j|X)\neq0$

2.2 Consequences of serial correlation

- The OLS estimator remains unbiased and consistent
- The OLS estimator remains asymptotically normal
- t-tests and F-tests become invalid
- The Gauss-Markov theorem no longer holds, i.e. OLS is no longer BLUE

2.3 Canonical example: AR(1)

2.4 Diagnostics

Plotting: scatter plot of $e_t$ against $e_{t-1}$.

The BG test:

$$y_t=\beta_0+\beta_1x_{t1}+\cdots+\beta_{K}x_{tK}+\varepsilon_{t}\\
\varepsilon_{t}=\rho_1\varepsilon_{t-1}+\cdots+\rho_{p}\varepsilon_{t-p}+u_{t}\\
H_0:\rho_1=\cdots=\rho_{p}=0$$
Use the auxiliary regression $e_t\stackrel{OLS}\longrightarrow x_{t1},\cdots,x_{tK},e_{t-1},\cdots, e_{t-p}\ (t=p+1,\cdots,n)$; then

$$(n-p)R^2\stackrel{d}\longrightarrow\chi^2(p)$$

Davidson-MacKinnon (1993): keep the sample size at $n$ (padding the missing pre-sample residuals with zeros), so that $nR^2\stackrel{d}\longrightarrow\chi^2(p)$.
Q tests
Sample autocorrelations of the residuals:
$$\hat{\rho}_j\equiv \frac{\sum_{t=j+1}^{n}e_te_{t-j}}{\sum_{t=1}^ne_t^2}\quad (j=1,2,\cdots,p)$$
$$
Q_{BP}\equiv n\sum_{j=1}^{p}\hat{\rho}_j^2\stackrel{d}\longrightarrow\chi^2(p)\\
Q_{LB}\equiv n(n+2)\sum_{j=1}^{p}\frac{\hat{\rho}_j^2}{n-j}\stackrel{d}\longrightarrow\chi^2(p)
$$
Choice of the lag order $p$: $p=min\{floor(n/2)-2,\ 40\}$
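A numpy sketch of $\hat\rho_j$, $Q_{BP}$ and $Q_{LB}$ as defined above (the residual series here is simulated white noise, so the statistics should be insignificant):

```python
import numpy as np

def q_tests(e, p):
    """Sample autocorrelations of residuals and the Q_BP / Q_LB statistics."""
    n = len(e)
    denom = np.sum(e**2)
    rho = np.array([np.sum(e[j:] * e[:n - j]) / denom for j in range(1, p + 1)])
    q_bp = n * np.sum(rho**2)
    q_lb = n * (n + 2) * np.sum(rho**2 / (n - np.arange(1, p + 1)))
    return rho, q_bp, q_lb

rng = np.random.default_rng(1)
e = rng.standard_normal(200)      # white-noise "residuals"
rho, q_bp, q_lb = q_tests(e, p=10)
print(q_bp, q_lb)                 # compare with chi2(10): 5% critical value ~18.31
```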
DW test
2.5 Remedies
VIII. Model Specification and Data Problems
1. Omitted variables 1.1 Two cases: if the omitted variable is uncorrelated with the included regressor, $Cov(x_{i1},x_{i2})=0$, consistency is unaffected but the disturbance variance increases; if it is correlated, $Cov(x_{i1},x_{i2})\neq0$, OLS is no longer consistent ("omitted-variable bias")
1.2 Remedies for omitted variables
2. Irrelevant variables True model: $y_i=x_{i1}'\beta_1+\varepsilon_i$; estimated model: $y_i=x_{i1}'\beta_1+x_{i2}'\beta_2+(\varepsilon_{i}-x_{i2}'\beta_2)$. The estimator stays consistent, but its variance increases
3. Choosing the regressors
$$\mathop{min}_{K}\ AIC\equiv ln(e'e/n)+\frac{2}{n}K$$
$$\mathop{min}_{K}\ BIC\equiv ln(e'e/n)+\frac{ln\ n}{n}K$$
$$\mathop{min}_{K}\ HQIC\equiv ln(e'e/n)+\frac{ln[ln\ n]}{n}K$$
BIC penalizes extra regressors more heavily than AIC, but BIC selects the true model order consistently.
4. Testing the functional form: RESET test
Idea: if nonlinear terms are omitted, add them to the equation and test whether their coefficients are significant: $y=x'\beta+\delta_2\hat{y}^2+\delta_3\hat{y}^3+\delta_4\hat{y}^4+\mu$, $H_0:\delta_2=\delta_3=\delta_4=0$. Drawback: it does not reveal which higher-order terms were omitted
5. Multicollinearity 5.1 Exact multicollinearity: $(X'X)^{-1}$ does not exist 5.2 Near multicollinearity: OLS is still BLUE, but $Var(b|X)$ becomes very large, so the coefficients are estimated imprecisely
Symptoms:
individual $t$-tests insignificant while the overall $R^2$ is large; coefficient estimates change a lot when regressors are added or dropped
$$Var(b_k|X)=\frac{\sigma^2}{(1-R_{k}^2)S_{kk}}$$
Variance inflation factor: $VIF=\frac{1}{1-R_k^2}$; rule of thumb: $max\{VIF_1,VIF_2,\cdots,VIF_k\}\leq10$
6. Extreme observations Leverage: $lev_i\equiv x_i'(X'X)^{-1}x_i$
$0\leq lev_i\leq1$ and $\sum_{i=1}^n lev_i=K$ (the number of regressors). Let $b^{(i)}$ denote the OLS estimate with observation $i$ dropped; one can show:
$$b-b^{(i)}=\Big(\frac{1}{1-lev_i}\Big)(X'X)^{-1}x_ie_i$$
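A small numpy check, on made-up data, of $\sum_i lev_i=K$ and of the deleted-observation identity above:

```python
import numpy as np

rng = np.random.default_rng(2)
n, K = 50, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, K - 1))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.standard_normal(n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
lev = np.einsum('ij,jk,ik->i', X, XtX_inv, X)   # lev_i = x_i'(X'X)^{-1}x_i
print(lev.sum())                                 # equals K = 3

i = int(lev.argmax())                            # highest-leverage observation
b_i = np.linalg.lstsq(np.delete(X, i, 0), np.delete(y, i), rcond=None)[0]
rhs = XtX_inv @ X[i] * (e[i] / (1 - lev[i]))
print(np.allclose(b - b_i, rhs))                 # the identity above: True
```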
7. Dummy variables 8. Testing for structural change 8.1 Break date known: Chow test
$$
\text{Unrestricted equations:}\\
y^1=X^1\beta^1+\varepsilon^1\\
y^2=X^2\beta^2+\varepsilon^2\\
\text{Restricted equation:}\\
y=X\beta+\varepsilon\\
H_0: \beta^1=\beta^2\ (K\ \text{constraints in total})\\
F=\frac{(e'e-e_1'e_1-e_2'e_2)/K}{(e_1'e_1+e_2'e_2)/(n-2K)}\sim F(K,n-2K)
$$
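A minimal sketch of the Chow F statistic, assuming the two sub-samples (y1, X1) and (y2, X2) are given as arrays (the function and argument names are ours):

```python
import numpy as np

def chow_F(y1, X1, y2, X2):
    """Chow test: OLS on each sub-sample and on the pooled (restricted) sample."""
    def ssr(y, X):
        b = np.linalg.lstsq(X, y, rcond=None)[0]
        e = y - X @ b
        return e @ e
    K = X1.shape[1]
    n = len(y1) + len(y2)
    e1e1, e2e2 = ssr(y1, X1), ssr(y2, X2)
    ee = ssr(np.concatenate([y1, y2]), np.vstack([X1, X2]))   # restricted SSR
    # compare with the F(K, n-2K) critical value
    return ((ee - e1e1 - e2e2) / K) / ((e1e1 + e2e2) / (n - 2 * K))
```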
Dummy-variable approach
$$y_t=\alpha+\beta x_t+\gamma D_t+\delta D_tx_t+\varepsilon_t$$
8.2 Break date unknown: pick an interval $[\tau_0,\tau_1]\subseteq[1,T]$, compute the F statistic for every candidate date, and take the maximum — the Quandt likelihood ratio (QLR) statistic; 15% trimming is common
9. Missing data and linear interpolation
IX. Panel Data
X. IV, 2SLS, GMM
1. Examples of regressors correlated with the disturbance: simultaneity bias (endogenous variables correlated with the error)
Conditions for a valid instrument:
the instrument is correlated with the endogenous regressor, $Cov(z_i,x_i)\neq0$ (relevance); the instrument is uncorrelated with the disturbance, $Cov(z_i,u_i)=0$ (exclusion restriction)
A first look at 2SLS
First stage (extract the exogenous part): regress the endogenous regressor on the instrument, $p_t\stackrel{OLS}\longrightarrow z_t$, and keep the fitted values $\hat{p}_t$. Second stage: regress the dependent variable on the first-stage fitted values, $q_t\stackrel{OLS}\longrightarrow \hat{p}_t$.
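A minimal 2SLS sketch in numpy; the simultaneity-style simulation and all names are invented for illustration:

```python
import numpy as np

def tsls(y, X, Z):
    """2SLS: first stage X_hat = P_Z X, second stage regress y on X_hat."""
    Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T)           # projection P_Z = Z(Z'Z)^{-1}Z'
    X_hat = Pz @ X                                    # fitted (exogenous) part of X
    return np.linalg.solve(X_hat.T @ X, X_hat.T @ y)  # = (X'P_Z X)^{-1} X'P_Z y

rng = np.random.default_rng(3)
n = 2000
z = rng.standard_normal(n)
v = rng.standard_normal(n)
eps = 0.8 * v + rng.standard_normal(n)       # correlated with p: p is endogenous
p = 1 + 0.5 * z + v                          # endogenous regressor
q = 2 - 1.0 * p + eps                        # structural equation, truth [2, -1]
X = np.column_stack([np.ones(n), p])
Z = np.column_stack([np.ones(n), z])
print(np.linalg.lstsq(X, q, rcond=None)[0])  # OLS: slope inconsistent (plim ~ -0.36)
print(tsls(q, X, Z))                          # 2SLS: close to [2, -1]
```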
Measurement error in a regressor
True model:
$$y=\alpha+\beta x^*+\varepsilon,\qquad Cov(x^*,\varepsilon)=0$$
We observe:
$$x=x^*+u,\qquad Cov(x^*,u)=0,\ Cov(u,\varepsilon)=0$$
Hence the model actually estimated is
$$y=\alpha+\beta x+(\varepsilon-\beta u)$$
$$Cov(x^*+u,\ \varepsilon-\beta u)=-\beta Var(u)\neq0$$
$$\hat{\beta}\stackrel{p}\longrightarrow \frac{Cov(x_i,y_i)}{Var(x_i)}=\frac{\beta Var(x_i^*)}{Var(x_i^*)+Var(u)}=\beta\,\frac{1}{1+\sigma_u^2/\sigma_{x^*}^2}$$
so $\hat\beta$ is biased toward zero (attenuation bias).
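The attenuation result can be checked with a quick simulation (all parameters invented; with $\sigma_u^2=\sigma_{x^*}^2$ the OLS slope should converge to $\beta/2$):

```python
import numpy as np

rng = np.random.default_rng(4)
n, beta = 200_000, 2.0
x_star = rng.normal(0, 1, n)                 # true regressor, Var = 1
u = rng.normal(0, 1, n)                      # measurement error, Var = 1
x = x_star + u                               # observed regressor
y = 1 + beta * x_star + rng.normal(0, 1, n)
S = np.cov(x, y)                             # sample covariance matrix
print(S[0, 1] / S[0, 0])                     # ~ beta/(1+1) = 1.0: biased toward 0
```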
Measurement error in the dependent variable
True model: $y^*=\beta x+\varepsilon$, $Cov(x,\varepsilon)=0$, $\beta\neq0$
Measurement error: $y=y^*+v$
Estimated model: $y=\beta x+(\varepsilon+v)$; the disturbance variance increases
2. IV as a method of moments 2.1 Method of moments Idea: replace population moments by sample moments
OLS as a method-of-moments estimator:
$$
\begin{aligned}
E(x_i\varepsilon_i)=0&\to E[x_i(y_i-x_i'\beta)]=0\\
&\to \beta=[E(x_ix_i')]^{-1}E(x_iy_i)
\end{aligned}
$$
$$\hat{\beta}_{MM}=[\tfrac{1}{n}\sum(x_ix_i')]^{-1}(\tfrac{1}{n}\sum x_iy_i)=(X'X)^{-1}(X'y)=\hat{\beta}_{OLS}$$
2.2 IV as a method of moments Orthogonality condition: $E(z_i\varepsilon_i)=E[z_i(y_i-x_i'\beta)]=0\to \beta=[E(z_ix_i')]^{-1}E(z_iy_i)$
$$\hat{\beta}_{IV}=[\tfrac{1}{n}\sum(z_ix_i')]^{-1}(\tfrac{1}{n}\sum z_iy_i)=(Z'X)^{-1}(Z'y)$$
Rank condition
If $rank[E(z_ix_i')]=K$, then under suitable regularity conditions $\hat{\beta}_{IV}$ is consistent for $\beta$ and asymptotically normal
Order condition
The number of instruments excluded from the equation must be at least the number of endogenous regressors in it
Under-identified
Just identified: the IV estimator above applies only to this case
Over-identified
3. 2SLS 4. Tests concerning instruments 4.1 Under-identification test 4.2 Weak-instrument tests Four tests:
partial $R^2$; the minimum-eigenvalue statistic; the "Cragg-Donald Wald F statistic" (Cragg and Donald, 1993) [assumes iid disturbances]; the "Kleibergen-Paap Wald rk F statistic"
Remedies for weak instruments:
find stronger instruments; use the less sensitive limited-information maximum likelihood (LIML) estimator; drop redundant instruments
4.3 Over-identification test — Sargan statistic $H_0:$ all instruments are exogenous
$$e_{i,IV}=\gamma_1x_{i1}+\cdots+\gamma_{K-r}x_{i,K-r}+\delta_1z_{i1}+\cdots+\delta_mz_{im}+error_{i}$$
Sargan statistic: $nR^2\stackrel{d}\longrightarrow\chi^2(m-r)$. The over-identification test presupposes that the model is at least just identified [one must maintain that at least one of these instruments is exogenous]
4.4 OLS or IV? The Hausman test $H_0:$ all regressors are exogenous
$$(\hat{\beta}_{IV}-\hat{\beta}_{OLS})'D^{-}(\hat{\beta}_{IV}-\hat{\beta}_{OLS})\stackrel{d}\longrightarrow \chi^2(r)$$
Note: the traditional Hausman test is invalid under heteroskedasticity
Remedies under heteroskedasticity:
bootstrap; the Durbin-Wu-Hausman test
First-stage regression: $x_2=x_1'\gamma+z'\delta+v$
Original model: $y=x_1'\beta_1+\beta_2x_2+\varepsilon$, with $\varepsilon=\rho v+\xi$
Auxiliary regression: $y=x_1'\beta_1+\beta_2x_2+\rho\hat{v}+error$, $H_0: \rho=0$
5. GMM: assumptions 2SLS is efficient under spherical disturbances; with heteroskedasticity or autocorrelation in the disturbances, GMM is more efficient
Linearity; asymptotically independent stationary process; orthogonality of the instruments
Define the $L$-dimensional column vector $g_i\equiv z_i\varepsilon_i$, $E(g_i)=E(z_i\varepsilon_i)=0$
Rank condition
$E(z_ix_i')$ has full column rank
$\{g_i\}$ is a martingale difference sequence
The covariance matrix $S=E(g_ig_i')=E(\varepsilon_i^2z_iz_i')$ is nonsingular; the fourth moments $E[(x_{ik}z_{ij})^2]$ exist and are finite for all $i,j,k$
6. Deriving GMM Population moment condition: $E(g_i)=E(z_i\varepsilon_i)=0$; sample analogue: $g_n(\hat\beta)\equiv \frac{1}{n}\sum z_i(y_i-x_i'\hat \beta)=0$. $K$ unknowns (the dimension of $\beta$), $L$ equations (the number of instruments):
K>L: infinitely many solutions, not identified; K=L: unique solution, just identified; K<L: no solution, over-identified
Find $\hat \beta$ making $g_n(\hat \beta)$ as close to $\textbf{0}$ as possible. Let $\hat W$ be an $L\times L$ symmetric positive definite matrix with $plim\,\hat W=W$, and define the minimization problem
$$\mathop{min}_{\hat \beta }J(\hat \beta, \hat W)\equiv n\,(g_n(\hat \beta))'\hat W(g_n(\hat \beta))$$
$$
\begin{aligned}
\hat \beta_{GMM}(\hat W)&\equiv \mathop{argmin}_{\hat \beta}J(\hat{\beta},\hat{W})\\
&=(S_{ZX}'\hat W S_{ZX})^{-1}S_{ZX}' \hat W S_{Zy}
\end{aligned}
$$
where $S_{ZX}=\frac{1}{n}\sum z_ix_i'$ and $S_{Zy}=\frac{1}{n}\sum z_iy_i$.
In the just-identified case GMM reduces to IV, since $\hat \beta_{GMM}(\hat W)=S_{ZX}^{-1}\hat W^{-1}(S_{ZX}')^{-1}S_{ZX}' \hat W S_{Zy}=S_{ZX}^{-1}S_{Zy}=\hat \beta_{IV}$
$$
\begin{aligned}
J(\hat{\beta},\hat{W})&=n(S_{Zy}-S_{ZX}\hat{\beta})'\hat{W}(S_{Zy}-S_{ZX}\hat{\beta})=n(S_{Zy}'-\hat{\beta}'S_{ZX}')\hat{W}(S_{Zy}-S_{ZX}\hat{\beta})\\
&=n(S_{Zy}'\hat{W}-\hat{\beta}'S_{ZX}'\hat{W})(S_{Zy}-S_{ZX}\hat{\beta})\\
&=n(S_{Zy}'\hat{W}S_{Zy}-2\hat{\beta}'S_{ZX}'\hat{W}S_{Zy}+\hat{\beta}'S_{ZX}'\hat{W}S_{ZX}\hat{\beta})\\
\frac{\partial J(\hat{\beta},\hat{W})}{\partial \hat{\beta}}&=n(-2S_{ZX}'\hat{W}S_{Zy}+2S_{ZX}'\hat{W}S_{ZX}\hat{\beta})=0\\
\hat \beta_{GMM}(\hat W)&\equiv \mathop{argmin}_{\hat \beta}J(\hat{\beta},\hat{W})=(S_{ZX}'\hat W S_{ZX})^{-1}S_{ZX}' \hat W S_{Zy}
\end{aligned}
$$
7. Large-sample properties of GMM
$\hat{\beta}_{GMM}$ is consistent: $plim_{n\to \infty}\hat{\beta}_{GMM}(\hat{W})=\beta$; $\hat{\beta}_{GMM}$ is asymptotically normal
Proposition: the optimal weighting matrix minimizing $Avar(\hat{\beta}_{GMM})$ is $\hat{W}=\hat{S}^{-1}$, where $\hat{S}\equiv \frac{1}{n}\sum e_i^2z_iz_i'$ is a consistent estimate of $S\equiv E(\varepsilon_i^2z_iz_i')$
Two-step efficient GMM:
Step 1: run 2SLS, keep the residuals, and compute $\hat {S}\equiv \frac{1}{n}\sum e_i^2z_iz_i'$
Step 2: minimize $J(\hat \beta, \hat{S}^{-1})$
Proposition: under conditional homoskedasticity (given the instruments), efficient GMM is exactly 2SLS
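A numpy sketch of two-step efficient GMM, writing the first-step 2SLS as GMM with weight matrix $(Z'Z/n)^{-1}$; the over-identified, heteroskedastic simulation is invented for illustration:

```python
import numpy as np

def two_step_gmm(y, X, Z):
    """Two-step efficient GMM: 2SLS residuals give S_hat, then W = S_hat^{-1}."""
    n = len(y)
    Szx, Szy = Z.T @ X / n, Z.T @ y / n
    W1 = np.linalg.inv(Z.T @ Z / n)                      # 2SLS weight matrix
    b1 = np.linalg.solve(Szx.T @ W1 @ Szx, Szx.T @ W1 @ Szy)
    e = y - X @ b1
    S = (Z * e[:, None]**2).T @ Z / n                    # (1/n) sum e_i^2 z_i z_i'
    W2 = np.linalg.inv(S)                                # optimal weight matrix
    return np.linalg.solve(Szx.T @ W2 @ Szx, Szx.T @ W2 @ Szy)

rng = np.random.default_rng(1)
n = 5000
z1, z2 = rng.standard_normal(n), rng.standard_normal(n)
v = rng.standard_normal(n)
x = 1 + z1 + 0.5 * z2 + v                        # endogenous, two instruments
eps = (0.5 * v + rng.standard_normal(n)) * (1 + 0.5 * np.abs(z1))  # heteroskedastic
y = 2 + 1.0 * x + eps
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z1, z2])
print(two_step_gmm(y, X, Z))                     # close to the truth [2, 1]
```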
8. Finding instruments: list as many variables correlated with the regressor $x$ as possible; then drop from the list those correlated with the disturbance
9. MLE is also GMM: as long as $E(s_i(\theta_0;y_i))=0$ holds, QMLE remains consistent
XI. Maximum Likelihood 1. Definition
$$L(\theta;y_1,\cdots,y_n)=\prod_{i=1}^{n}f(y_i;\theta)$$
$$lnL(\theta;y_1,\cdots,y_n)=\sum_{i=1}^{n}lnf(y_i;\theta)$$
$$\hat{\theta}_{ML}\equiv \mathop{argmax}\ lnL(\theta;y)$$
$$s(\theta;y)\equiv \frac{\partial\, lnL(\theta;y)}{\partial \theta}\equiv \begin{pmatrix}\frac{\partial lnL(\theta;y)}{\partial \theta_1} \\ \frac{\partial lnL(\theta;y)}{\partial \theta_2}\\ \vdots \\ \frac{\partial lnL(\theta;y)}{\partial \theta_K}\end{pmatrix}=0$$
$$s(\theta;y)=\frac{\partial \sum_{i=1}^{n}lnf(y_i;\theta)}{\partial \theta}=\sum_{i=1}^{n}\frac{\partial\, lnf(y_i;\theta)}{\partial \theta}=\sum_{i=1}^{n}s_i(\theta;y_i)$$
$$H(\theta;y)\equiv \frac{\partial^2 lnL(\theta;y)}{\partial \theta \partial \theta'}=\sum_{i=1}^{n}\frac{\partial^2 lnf(y_i;\theta)}{\partial \theta \partial \theta'}\equiv\sum_{i=1}^{n}H_i(\theta;y_i)$$
2. MLE of the linear regression model First assume a conditional distribution for the disturbances, e.g. normality: $\varepsilon|X\sim N(0,\sigma^2I_n)$, so $y|X\sim N(X\beta,\sigma^2I_n)$
$$y=X\beta+\varepsilon$$
$$f(y|X)=(2\pi\sigma^2)^{-n/2}exp\{-\frac{1}{2\sigma^2}(y-X\beta)'(y-X\beta)\}$$
$$lnL(\tilde{\beta},\tilde{\sigma}^2)=-\frac{n}{2}ln2\pi-\frac{n}{2}ln\tilde{\sigma}^2-\frac{1}{2\tilde{\sigma}^2}(y-X\tilde{\beta})'(y-X\tilde{\beta})$$
Step 1: maximizing over $\tilde\beta$ amounts to minimizing the sum of squared residuals, so $\hat{\beta}_{ML}=\hat{\beta}_{OLS}=(X'X)^{-1}X'y$
Step 2: $lnL(\tilde{\beta},\tilde{\sigma}^2)=-\frac{n}{2}ln2\pi-\frac{n}{2}ln\tilde{\sigma}^2-\frac{1}{2\tilde{\sigma}^2}e'e$; differentiating with respect to $\tilde\sigma^2$ gives
$$\hat\sigma^2_{ML}=\frac{e'e}{n}\neq \hat\sigma^2_{OLS}=\frac{e'e}{n-K}\equiv s^2$$
3. Numerical solution of the MLE 3.1 Grid search 3.2 For multi-dimensional problems: iterative methods, e.g. Newton's method
$$x_{i+1}=x_i-\frac{f(x_i)}{f'(x_i)}$$
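A sketch of the Newton iteration applied to an MLE with a known closed form, so the answer can be checked; the exponential-rate example is ours, not from the notes:

```python
import numpy as np

def newton(f, fprime, x0, tol=1e-10, max_iter=100):
    """Newton iteration x_{i+1} = x_i - f(x_i)/f'(x_i), here used on the score."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Exponential sample: lnL = n*ln(lam) - lam*sum(y), score s(lam) = n/lam - sum(y).
# The closed-form MLE is 1/ybar, which lets us verify the iteration.
rng = np.random.default_rng(5)
y = rng.exponential(scale=2.0, size=1000)    # true lam = 0.5
n, s_y = len(y), y.sum()
lam_hat = newton(lambda l: n / l - s_y, lambda l: -n / l**2, x0=0.1)
print(lam_hat, 1 / y.mean())                 # the two agree
```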
4. The information matrix and the minimum variance of unbiased estimators 4.1 Information matrix
$$I(\theta)\equiv -E\Big[\frac{\partial^2lnL(\theta;y)}{\partial \theta\partial \theta'}\Big]$$
4.2 Cramer-Rao lower bound 5. Large-sample properties of MLE (1) Consistency
$plim\,\hat{\theta}_{ML}=\theta_0$ (2) Asymptotic efficiency
$Avar(\hat{\theta}_{ML})=n[I(\theta_0)]^{-1}$ (3) Asymptotic normality
$\sqrt{n}(\hat{\theta}_{ML}-\theta_0)\stackrel{d}\longrightarrow N(0,n[I(\theta_0)]^{-1})$ Proofs of these properties to be added.
6. Asymptotic covariance matrix of the MLE
$$Avar(\hat{\theta}_{ML})=n[I(\theta_0)]^{-1}=n\Big\{-E\Big[\frac{\partial^2lnL(\theta_0;y)}{\partial \theta\partial \theta'}\Big]\Big\}^{-1}$$
This depends on the unknown parameter $\theta_0$, which must itself be estimated. (1) Expected information:
$$Avar(\hat{\theta}_{ML})=n\Big\{-E\Big[\frac{\partial^2lnL(\hat\theta_{ML};y)}{\partial \hat{\theta}\partial \hat{\theta}'}\Big]\Big\}^{-1}$$
(2) Observed information matrix (OIM):
$$Avar(\hat{\theta}_{ML})=n\Big[-\frac{\partial^2lnL(\hat\theta_{ML};y)}{\partial \hat{\theta}\partial \hat{\theta}'}\Big]^{-1}$$
(3) Outer product of gradients, or BHHH (OPG):
$Avar(\hat{\theta}_{ML})=n(\sum\hat{s}_i\hat{s}_i')^{-1}$, where $\hat{s}_i\equiv \frac{\partial lnL(\hat\theta_{ML};y_i)}{\partial \theta}$ is the estimated contribution of observation $i$ to the score
7. Three asymptotically equivalent tests 8. Quasi-MLE 9. Testing the normality assumption Plots (histogram; kernel density estimation; QQ-plot); Jarque-Bera test
$$JB\equiv \frac{n}{6}\Big[\Big(\frac{1}{n\hat{\sigma}^3}\sum e_i^3\Big)^2+\frac{1}{4}\Big(\frac{1}{n\hat{\sigma}^4}\sum e_i^4-3\Big)^2\Big]\stackrel{d}\longrightarrow\chi^2(2)$$
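A direct numpy transcription of the JB formula (the test samples are simulated; the second, skewed sample should be strongly rejected):

```python
import numpy as np

def jarque_bera(e):
    """JB statistic as above; asymptotically chi2(2), 5% critical value ~5.99."""
    n = len(e)
    e = e - e.mean()                # residuals already have zero mean; be safe
    sigma2 = np.mean(e**2)
    skew = np.mean(e**3) / sigma2**1.5
    kurt = np.mean(e**4) / sigma2**2
    return n / 6 * (skew**2 + 0.25 * (kurt - 3)**2)

rng = np.random.default_rng(6)
print(jarque_bera(rng.standard_normal(1000)))    # normal sample: small
print(jarque_bera(rng.exponential(size=1000)))   # skewed sample: very large
```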
XII. Binary Choice Models 1. Linear Probability Model (LPM) 1.1 First problem: $\varepsilon_i$ is related to $x_i$. Since $y$ only takes the values 0 and 1, $\varepsilon_i=1-x_i'\beta$ or $\varepsilon_i=-x_i'\beta$, which is correlated with $x$: inconsistent
$\varepsilon$ follows a two-point distribution, not a normal distribution
$Var(\varepsilon_i|x_i)=x_i'\beta(1-x_i'\beta)$ depends on $x_i$: heteroskedasticity
1.2 Another problem: out-of-range predictions, $\hat y>1$ or $\hat y<0$, inconsistent with a binary outcome
1.3 Advantages of the LPM 2. The link function $F(x,\beta)$ 2.1 Interpret $\hat y=E(y|x)$ as the probability of "$y=1$":
$$E(y|x)=1\cdot P(y=1|x)+0\cdot P(y=0|x)=P(y=1|x)$$
2.2 Probit model
$$P(y=1|x)=F(x,\beta)=\Phi(x'\beta)=\int_{-\infty}^{x'\beta}\phi(t)dt$$
where $F(x,\beta)$ is the standard normal cdf 2.3 Logit model
$$P(y=1|x)=F(x,\beta)=\Lambda(x'\beta)\equiv \frac{exp(x'\beta)}{1+exp(x'\beta)}$$
3. The logit model: analysis 3.1 Estimation: MLE The density of observation $i$:
$$f(y_i|x_i,\beta)=\begin{cases} \Lambda(x_i'\beta) & if\ y_i=1\\ 1-\Lambda(x_i'\beta) & if\ y_i=0\end{cases}$$
Hence
$$f(y_i|x_i,\beta)=[\Lambda(x_i'\beta)]^{y_i}[1-\Lambda(x_i'\beta)]^{1-y_i}$$
Taking logs,
$$lnf(y_i|x_i,\beta)=y_i\, ln[\Lambda(x_i'\beta)]+(1-y_i)\, ln[1-\Lambda(x_i'\beta)]$$
Log-likelihood:
$$lnL(\beta|y,x)=\sum_{i=1}^{n} y_i\, ln[\Lambda (x_i'\beta)]+\sum_{i=1}^{n}(1-y_i)\,ln[1-\Lambda(x_i'\beta)]$$
$$\hat{\beta}_{MLE}=\mathop{argmax}\ lnL(\beta; y,x)$$
3.2 $\beta_{MLE}$ is not the marginal effect
$$\frac{\partial P(y=1|x)}{\partial x_k}=\frac{\partial P(y=1|x)}{\partial (x'\beta)}\cdot \frac{\partial (x'\beta)}{\partial x_k}=F'(x'\beta)\cdot \beta_k$$
(for probit $F'=\phi$; for logit $F'=\Lambda(1-\Lambda)$)
Three marginal effects in common use (see the sketch below):
average marginal effect: compute the marginal effect at every observation, then take the arithmetic mean
marginal effect at the sample mean: the marginal effect at $x=\bar{x}$
marginal effect at a representative value: the marginal effect at $x=x^*$
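A sketch putting the pieces together: fit a logit by Newton's method (score $X'(y-p)$, Hessian $-X'WX$) and compute the average marginal effect and the effect at the sample mean. All data are simulated; for logit, $F'=\Lambda(1-\Lambda)=p(1-p)$:

```python
import numpy as np

def fit_logit(X, y, iters=25):
    """Newton/IRLS for the logit MLE: beta += (X'WX)^{-1} X'(y - p)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ beta))
        W = p * (1 - p)                                  # Lambda'(x'b)
        beta += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - p))
    return beta

rng = np.random.default_rng(7)
n = 5000
x = rng.standard_normal(n)
X = np.column_stack([np.ones(n), x])
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-(0.5 + 1.0 * x)))).astype(float)
beta = fit_logit(X, y)

p = 1 / (1 + np.exp(-X @ beta))
ame = np.mean(p * (1 - p)) * beta[1]                     # average marginal effect
p_bar = 1 / (1 + np.exp(-X.mean(axis=0) @ beta))
mem = p_bar * (1 - p_bar) * beta[1]                      # effect at the sample mean
print(beta, ame, mem)
```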
3.3 Log-odds ratio
$$
p\equiv P(y=1|x),\quad 1-p\equiv P(y=0|x)\\
p=\frac{exp(x'\beta)}{1+exp(x'\beta)},\quad 1-p=\frac{1}{1+exp(x'\beta)}\\
\frac{p}{1-p}=exp(x'\beta)\\
ln\frac{p}{1-p}=x'\beta
$$
$ln\frac{p}{1-p}$ can be read as a semi-elasticity of the odds: a one-unit change in the regressor changes the odds by that percentage. Interpretation of $exp(\beta_j)$: a one-unit change in the regressor multiplies the odds by
$$\frac{p^*}{1-p^*}\Big/\frac{p}{1-p}=\frac{exp(\beta_1+\beta_2x_2+\cdots+\beta_j(x_j+1)+\cdots+\beta_Kx_K)}{exp(\beta_1+\beta_2x_2+\cdots+\beta_j x_j+\cdots+\beta_Kx_K)}=exp(\beta_j)$$
4. Goodness of fit for binary choice models
$$Pseudo\ R^2\equiv \frac{ln\ L_0-ln\ L_1}{ln\ L_0}=\frac{ln \ L_1-ln\ L_0}{ln\ L_{max}-ln\ L_0}$$
5. Microfoundations of binary choice 5.1 One interpretation of the disturbance: a latent variable Net benefit (unobserved) $y^*$:
$$y^*=x'\beta+\varepsilon$$
Act if the net benefit is positive, otherwise don't. Index function:
$$y=\begin{cases} 1 & if\ y^*>0\\ 0 & if\ y^*\leq 0\end{cases}$$
5.2 Another interpretation: the random utility model (RUM) Assume $U_a=x'\beta_a+\varepsilon_a,\ U_b=x'\beta_b+\varepsilon_b$
$$P(y=1|x)=P(U_a>U_b|x)=P[x'(\beta_a-\beta_b)+(\varepsilon_a-\varepsilon_b)>0|x]$$
XIII. Multinomial Choice Models 1. Multinomial logit and multinomial probit
$$U_{ij}=x_i'\beta_j+\varepsilon_{ij} \quad(i=1,\cdots,n;\ j=1,\cdots,J)$$
$$
\begin{aligned}
P(y_i=j\mid x_i)&=P(U_{ij}\geqslant U_{ik},\ \forall k\neq j)\\
&=P(U_{ik}-U_{ij}\leqslant 0,\ \forall k\neq j)\\
&=P(\varepsilon_{ik}-\varepsilon_{ij}\leqslant x_i'\beta_j-x_i'\beta_k,\ \forall k\neq j)
\end{aligned}
$$
If the disturbances $\{\varepsilon_{ij}\}$ are iid with a type-I extreme value distribution, then
$$P(y_i=j\mid x_i)=\frac{exp(x_i'\beta_j)}{\sum_{k=1}^{J}exp(x_i'\beta_k)}$$
With category 1 as the base category (normalize $\beta_1=0$):
$$P(y_i=j\mid x_i)=\begin{cases}
\frac{1}{1+\sum_{k=2}^{J}exp(x_i'\beta_k)} & (j=1)\\
\frac{exp(x_i'\beta_j)}{1+\sum_{k=2}^{J}exp(x_i'\beta_k)} & (j=2,\cdots,J)
\end{cases}$$
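A numpy sketch of the base-category formula above; the coefficient matrix is invented, with its first column fixed at zero:

```python
import numpy as np

def mnl_probs(X, B):
    """Multinomial logit probabilities with category 1 as base (beta_1 = 0).
    B is K x J with its first column fixed at zero."""
    U = X @ B                                   # n x J matrix of x_i' beta_j
    U -= U.max(axis=1, keepdims=True)           # for numerical stability
    expU = np.exp(U)
    return expU / expU.sum(axis=1, keepdims=True)

rng = np.random.default_rng(8)
X = np.column_stack([np.ones(5), rng.standard_normal(5)])
B = np.column_stack([np.zeros(2), [0.2, 1.0], [-0.5, 0.3]])
P = mnl_probs(X, B)
print(P, P.sum(axis=1))                         # each row sums to 1
```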
XIV. Stationary Time Series 1. Moments of a time series k-th order autocovariance:
$$\gamma_{k}\equiv Cov(y_{t},y_{t+k})=E[(y_{t}-\mu)(y_{t+k}-\mu)]$$
Sample autocovariance:
$$\hat{\gamma}_{k}\equiv \frac{1}{T-k}\sum_{t=1}^{T-k}(y_{t}-\bar{y})(y_{t+k}-\bar{y})$$
k-th order autocorrelation (for a strictly stationary process it depends only on the lag $k$, not on $t$, hence the name autocorrelation function, ACF):
$$\rho_{k}\equiv Corr(y_{t},y_{t+k})\equiv \frac{Cov(y_{t},y_{t+k})}{Var(y_{t})}$$
Sample autocorrelation:
$$\hat{\rho}_{k}\equiv \hat{\gamma}_{k}/\hat{\gamma}_{0}$$
k-th order partial autocorrelation: the correlation conditional on the intervening values (PACF):
$$\rho_{k}^{*}\equiv Corr(y_{t},y_{t+k}\mid y_{t+1},\cdots,y_{t+k-1})$$
To estimate $\hat \rho_k^*$: regress $y_t$ on $y_{t-1},\cdots,y_{t-k}$ by OLS and take the coefficient on $y_{t-k}$
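A numpy sketch of the sample ACF and of the OLS-based PACF just described, checked on a simulated AR(1) where $ACF(k)=0.8^k$ and the PACF cuts off after lag 1:

```python
import numpy as np

def acf(y, k):
    """Sample autocorrelation rho_k = gamma_k / gamma_0 (definitions above)."""
    T, ybar = len(y), y.mean()
    g0 = np.mean((y - ybar)**2)
    gk = np.sum((y[:T - k] - ybar) * (y[k:] - ybar)) / (T - k)
    return gk / g0

def pacf(y, k):
    """OLS-based PACF: regress y_t on y_{t-1},...,y_{t-k}; coefficient on y_{t-k}."""
    T = len(y)
    X = np.column_stack([np.ones(T - k)] + [y[k - j:T - j] for j in range(1, k + 1)])
    return np.linalg.lstsq(X, y[k:], rcond=None)[0][-1]

# AR(1): y_t = 0.8 y_{t-1} + eps_t
rng = np.random.default_rng(9)
y = np.zeros(2000)
for t in range(1, 2000):
    y[t] = 0.8 * y[t - 1] + rng.standard_normal()
print([round(acf(y, k), 2) for k in (1, 2, 3)])    # ~ 0.8, 0.64, 0.51
print([round(pacf(y, k), 2) for k in (1, 2, 3)])   # ~ 0.8, 0, 0
```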
2. AR(p): the autoregressive model
$$y_{t}=\beta_{0}+\beta_{1}y_{t-1}+\cdots+\beta_{p}y_{t-p}+\varepsilon_{t}$$
2.1 Estimating the coefficients OLS: loses p observations; exact MLE: iterative, more precise but computationally heavier, and requires $\varepsilon_t\sim N(0,\sigma_{\varepsilon}^2)$; conditional MLE: equivalent to OLS, suitable for large samples, does not rely on normality
2.2 Choosing the lag order General-to-specific sequential t rule; or pick $\hat p$ minimizing an information criterion (AIC, BIC, or HQIC)
2.3 Properties of white noise Zero mean: $E(\varepsilon_t)=0$; homoskedastic: $Var(\varepsilon_t)=\sigma_{\varepsilon}^2$; no autocorrelation: $Cov(\varepsilon_t,\varepsilon_s)=0,\ t\neq s$
3. MA(q): the moving-average model
$$y_t=\mu+\varepsilon_t+\theta_1 \varepsilon_{t-1}+\theta_2 \varepsilon_{t-2}+\cdots + \theta_q \varepsilon_{t-q}$$
4. ARMA and identification via ACF & PACF
$$y_{t}=\beta_{0}+\beta_{1}y_{t-1}+\cdots+\beta_{p}y_{t-p}+\varepsilon_{t}+\theta_{1}\varepsilon_{t-1}+\cdots+\theta_{q}\varepsilon_{t-q}$$
Estimate $(\hat p ,\hat q)$; diagnostic check: confirm the residuals are white noise
5. Autoregressive Distributed Lag Model: ADL(p,q)
$$y_{t}=\beta_{0}+\beta_{1}y_{t-1}+\cdots+\beta_{p}y_{t-p}+\gamma_0 x_t+\gamma_{1}x_{t-1}+\cdots+\gamma_{q}x_{t-q}+\varepsilon_{t}$$
Note: conditions under which OLS can be used
$E(\varepsilon_{t}\mid y_{t-1},y_{t-2},\cdots,x_{1,t-1},x_{1,t-2},\cdots,x_{K,t-1},x_{K,t-2},\cdots)=0$, i.e. the disturbance is unrelated to the entire history of all regressors; asymptotically independent stationary series; nonzero finite fourth moments; no perfect multicollinearity among the regressors
6. Error correction model (ECM) Idea: short-run changes of the variable partially adjust toward the long-run equilibrium relation. The ECM of an AR(1):
$$\Delta y_{t}=\underbrace{(1-\beta_{1})(y^{*}-y_{t-1})}_{\text{error correction}}+\varepsilon_{t}$$
ADL的ECM:
Original ADL: $y_{t}=\beta_{0}+\beta_{1}y_{t-1}+\gamma_{0}x_{t}+\gamma_{1}x_{t-1}+\varepsilon_{t}$
Long-run relation: $y^{*}=\frac{\beta_{0}}{1-\beta_{1}}+\frac{\gamma_{0}+\gamma_{1}}{1-\beta_{1}}x^{*}$
Long-run multiplier: $\theta=\frac{\gamma_{0}+\gamma_{1}}{1-\beta_{1}}$; long-run constant: $\phi=\frac{\beta_0}{1-\beta_1}$
ECM:
$$\Delta y_{t}=\gamma_{0}\Delta x_{t}+\underbrace{(\beta_{1}-1)(y_{t-1}-\phi-\theta x_{t-1})}_{\text{error correction}}+\varepsilon_{t}$$
7. MA($\infty$) and the lag operator 7.1 MA($\infty$)
$$y_{t}=\mu+\sum_{j=0}^{\infty}\theta_{j}\varepsilon_{t-j},\quad \theta_0=1$$
"绝对值可加总"(Absolutely Summable, AS) 7.2 滞后算子 L y t = y t − 1 , L 2 y t = L ( L y t ) = y t − 2 , ⋯ , L p y t = y t − p L y_{t}=y_{t-1}, L^{2} y_{t}=L\left(L y_{t}\right)=y_{t-2}, \cdots, L^{p} y_{t}=y_{t-p}
L y t = y t − 1 , L 2 y t = L ( L y t ) = y t − 2 , ⋯ , L p y t = y t − p
In particular, $L^0y_t=1\cdot y_t=y_t$, and $L^p\cdot L^q=L^{p+q}$. Difference operator: $\Delta=1-L$, $\Delta y_t=y_t-y_{t-1}=(1-L)y_t$ 7.3 $AR(p)$ is also $MA(\infty)$
$$(1-\beta_{1}L-\cdots-\beta_{p}L^{p})y_{t}=\beta_{0}+\varepsilon_{t}$$
with lag polynomial $\beta(L)=1-\beta_1L-\cdots-\beta_pL^p$ 7.4 Filters
$$\alpha(L)=\alpha_0+\alpha_1 L+\alpha_2L^2+\cdots$$
Proposition: a weakly stationary process remains weakly stationary after an AS filter. Definition: product of filters
$$
\begin{aligned}
\delta(L) & \equiv \alpha(L)\beta(L)\equiv(\alpha_{0}+\alpha_{1}L+\alpha_{2}L^{2}+\cdots)(\beta_{0}+\beta_{1}L+\beta_{2}L^{2}+\cdots)\\
&=\alpha_{0}\beta_{0}+(\alpha_{0}\beta_{1}+\alpha_{1}\beta_{0})L+(\alpha_{2}\beta_{0}+\alpha_{1}\beta_{1}+\alpha_{0}\beta_{2})L^{2}+\cdots
\end{aligned}
$$
Application: show that $AR(1)$ is $MA(\infty)$. Method 1 (recursive substitution):
$$
\begin{aligned}
y_{t}&=\beta_{0}+\beta_{1}y_{t-1}+\varepsilon_{t}\\
&=\beta_{0}+\beta_{1}(\beta_{0}+\beta_{1}y_{t-2}+\varepsilon_{t-1})+\varepsilon_{t}\\
&=(\beta_{0}+\beta_{0}\beta_{1})+\beta_{1}^{2}y_{t-2}+\beta_{1}\varepsilon_{t-1}+\varepsilon_{t}\\
&=(\beta_{0}+\beta_{0}\beta_{1})+\beta_{1}^{2}(\beta_{0}+\beta_{1}y_{t-3}+\varepsilon_{t-2})+\beta_{1}\varepsilon_{t-1}+\varepsilon_{t}\\
&=\beta_{0}(1+\beta_{1}+\beta_{1}^{2})+\beta_{1}^{3}y_{t-3}+\beta_{1}^{2}\varepsilon_{t-2}+\beta_{1}\varepsilon_{t-1}+\varepsilon_{t}\\
&=\cdots\\
&=\beta_{0}(1+\beta_{1}+\beta_{1}^{2}+\cdots)+\varepsilon_{t}+\beta_{1}\varepsilon_{t-1}+\beta_{1}^{2}\varepsilon_{t-2}+\beta_{1}^{3}\varepsilon_{t-3}+\cdots
\end{aligned}
$$
Method 2 (lag operator):
$$
\begin{aligned}
y_{t}&=(1-\beta_{1}L)^{-1}(\beta_{0}+\varepsilon_{t})\\
&=(1+\beta_{1}L+\beta_{1}^{2}L^{2}+\cdots)\beta_{0}+(1+\beta_{1}L+\beta_{1}^{2}L^{2}+\cdots)\varepsilon_{t}\\
&=\beta_{0}(1+\beta_{1}+\beta_{1}^{2}+\cdots)+\varepsilon_{t}+\beta_{1}\varepsilon_{t-1}+\beta_{1}^{2}\varepsilon_{t-2}+\beta_{1}^{3}\varepsilon_{t-3}+\cdots\\
&=\frac{\beta_{0}}{1-\beta_{1}}+\varepsilon_{t}+\beta_{1}\varepsilon_{t-1}+\beta_{1}^{2}\varepsilon_{t-2}+\beta_{1}^{3}\varepsilon_{t-3}+\cdots
\end{aligned}
$$
Note:
$$(1-\beta L)^{-1}=1+\beta L + \beta^{2}L^{2}+\beta^{3}L^{3}+\cdots \quad (|\beta|<1)$$
7.5 Impulse response function and cumulative impulse response For the AR(1):
$$IRF(j)\equiv \frac{\partial y_{t+j}}{\partial \varepsilon_{t}}=\beta_{1}^{j}$$
$$CIRF(k)\equiv \sum_{j=0}^{k}\frac{\partial y_{t+j}}{\partial \varepsilon_{t}}$$
7.6 ARMA(p,q) is also $MA(\infty)$
$$
\begin{array}{c}
y_{t}=\beta_{0}+\beta_{1}y_{t-1}+\cdots+\beta_{p}y_{t-p}+\varepsilon_{t}+\theta_{1}\varepsilon_{t-1}+\cdots+\theta_{q}\varepsilon_{t-q}\\
y_{t}-\beta_{1}Ly_{t}-\cdots-\beta_{p}L^{p}y_{t}=\beta_{0}+\varepsilon_{t}+\theta_{1}L\varepsilon_{t}+\cdots+\theta_{q}L^{q}\varepsilon_{t}\\
\beta(L)y_{t}=\beta_{0}+\theta(L)\varepsilon_{t}
\end{array}
$$
where $\theta(L)\equiv 1+\theta_{1}L+\cdots+\theta_{q}L^{q}$ 8. VAR: vector autoregression 8.1 A bivariate VAR(p) system