# Finally!

My paper "A Test of Endogeneity without Instrumental Variables in Models with Bunching" is now forthcoming at Econometrica!

# Resubmitted my endogeneity paper to Econometrica (2nd round)

I sent the second-round revision of my paper to Econometrica. The full application from the first version is back in this version. The layout is very different from that of the previous versions, but the essence is the same. The theory for the test statistic is now entirely contained in the online appendix. You can find the new version here.

# Submitted my paper with Christoph Rothe and Nese Yildiz.

We finally submitted our paper for publication! You can see the paper at this link.

The idea is related to that of my endogeneity paper, but the problem has several new complications. Here I tell you the gist of it very informally. Suppose that the structural equation is

$Y=g(X,U),$
but X is endogenous. Now suppose that you have a variable Z that you think is an instrumental variable (IV), and you plan to use a control function approach. If the approach is correct and all the conditions are met, then

$U\perp X|V,$
where V is the control function. Therefore, provided

A1) g is continuous in X,

$E(Y|X,V)=\int g(X,U) dF(U|V)$
is also continuous in X. This looks like the same setup as in my endogeneity paper, except that instead of controlling for the covariates, we control for the control function. The test would have power if

A2) F(U|X,V) is discontinuous in X

where F is the distribution of U. This happens, for example, when X has bunching points. We pursue the following case:

$X=\max\{0,h(Z,V)\},$
where the bunching, and therefore the discontinuity, is generated by a corner solution type of restriction. The test could be based on a quantity such as

$\Delta(V)=E(Y|X=0,V)-\lim_{x\downarrow 0} E(Y|X=x,V).$
Here the complications begin. The first problem is that we need to estimate V, but V is identified only when X>0. The way around it is to look at the following quantity instead:

$\theta=E(E(Y|X=0,V)-\lim_{x\downarrow 0} E(Y|X=x,V)|X=0)$
$=E(Y|X=0)-E(\lim_{x\downarrow 0} E(Y|X=x,V)|X=0)$
which eliminates the need to estimate V when X=0 thanks to the law of iterated expectations. However, the second term,

$\int \lim_{x\downarrow 0} E(Y|X=x,V)dF(V|X=0)$
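requires the estimation of F(V|X=0). It turns out that although it is impossible to estimate V when X=0, it is possible to estimate F(V|X=0). The trick is to observe that, without loss of generality, V is uniformly distributed, so P(V≤v)=v. By the law of total probability, v=F(v|X=0)P(X=0)+F(v|X>0)P(X>0), and thus F(v|X=0)=[v-F(v|X>0)P(X>0)]/P(X=0). Since V is identified when X>0, we can estimate F(V|X>0), and P(X=0) is just the share of observations at the bunching point.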
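A quick numerical check of the uniform-V argument, written as the mixture identity v = P(X=0) F(v|X=0) + P(X>0) F(v|X>0). This is only a sketch under assumptions that are not in the paper: the function `recover_cdf_at_bunching` is a hypothetical helper, Z is uniform and independent of V, and h(Z,V) = Z + V - 1 is made up so that V can be backed out exactly when X>0. In that toy design the true answer is F(v|X=0) = 2v - v^2, which gives something to compare against.

```python
import numpy as np

def recover_cdf_at_bunching(x, v_pos, grid):
    """Estimate F(v | X=0) on `grid` using only V from observations with X > 0."""
    p0 = np.mean(x == 0)  # share of observations at the bunching point, P(X=0)
    # Empirical CDF of V among the X > 0 observations, evaluated on the grid
    f_pos = np.searchsorted(np.sort(v_pos), grid, side="right") / v_pos.size
    # Mixture identity: v = p0 * F(v|X=0) + (1 - p0) * F(v|X>0)
    return (grid - (1.0 - p0) * f_pos) / p0

rng = np.random.default_rng(0)
n = 200_000
v = rng.uniform(size=n)           # control function, uniform by normalization
z = rng.uniform(size=n)           # hypothetical instrument, independent of V
x = np.maximum(0.0, z + v - 1.0)  # hypothetical h(Z, V) = Z + V - 1

# V is only recoverable where X > 0: there X = Z + V - 1, so V = X - Z + 1
pos = x > 0
v_pos = x[pos] - z[pos] + 1.0

grid = np.linspace(0.1, 0.9, 9)
f_zero_hat = recover_cdf_at_bunching(x, v_pos, grid)
f_zero_true = 2 * grid - grid**2  # closed form for this toy design only
max_err = np.max(np.abs(f_zero_hat - f_zero_true))
print(f"largest gap between estimate and truth: {max_err:.4f}")
```

The values of V at X=0 are never used, which is the point: the distribution of V among the bunched observations comes out as a residual from what is known (the uniform marginal) and what is estimable (the X>0 part).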

# Resubmitted my endogeneity paper to Econometrica.

Find the new version here. The intuitive idea is simple. Here is the structural equation:

$Y=g(X,Z)+U,$
and we want to know whether X is endogenous or not. We don’t care about Z, which is simply a vector of controls. We need the following assumption:

A1) g is continuous in X

Now, if X is exogenous, then E(U|X,Z)=E(U|Z), so

$E(Y|X,Z)=g(X,Z)+E[U|Z]$
will be continuous in X.

This means that if A1 holds and E(Y|X,Z) is discontinuous in X, then X must be endogenous! We can test the exogeneity of X by looking for a discontinuity in E(Y|X,Z).

Ok, but does this test have power? The answer is yes, in certain cases. When X is endogenous, E(U|X,Z) varies with X. For the test to have power we need more than that: when X is endogenous, we need

A2) E(U|X,Z) is discontinuous in X.

This is not a general phenomenon, but it does happen in some cases, especially if the variable X has bunching points. Check the paper to see plenty of examples.
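To see how bunching can deliver A2, here is a small simulation. It is a sketch under assumptions of its own, not an example from the paper: there are no controls Z, the latent choice is normal with a corner solution at zero, g(X) = X, and the parameter `rho` (hypothetical) governs how strongly U depends on the latent choice. Because everything is linear for X>0, the limit of E(Y|X=x) as x approaches zero from above is just the intercept of a regression of Y on X among the X>0 observations.

```python
import numpy as np

def simulate(rho, n=100_000, seed=0):
    """Draw (Y, X) with X bunched at zero; rho != 0 makes X endogenous."""
    rng = np.random.default_rng(seed)
    x_star = 0.5 + rng.normal(size=n)   # latent choice
    x = np.maximum(0.0, x_star)         # corner solution creates bunching at 0
    u = rho * (x_star - 0.5) + rng.normal(size=n)
    y = x + u                           # g(X) = X is continuous, so A1 holds
    return y, x

def discontinuity_at_zero(y, x):
    """E(Y | X=0) minus the intercept of a linear fit of Y on X for X > 0."""
    at_zero = x == 0
    # np.polyfit returns the highest-degree coefficient first
    slope, intercept = np.polyfit(x[~at_zero], y[~at_zero], deg=1)
    return y[at_zero].mean() - intercept

delta_exog = discontinuity_at_zero(*simulate(rho=0.0))
delta_endog = discontinuity_at_zero(*simulate(rho=1.0))
print(f"exogenous X:  {delta_exog:+.3f}")
print(f"endogenous X: {delta_endog:+.3f}")
```

In this design the population jump is zero when `rho` is 0 and about -0.64 when `rho` is 1. The jump appears because the observations at X=0 pool all latent choices below zero, so their average U differs from the U of those who choose a tiny positive X.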

So, the test could be built in the following way: estimate the quantity

$\Delta(Z)=E(Y|X=0,Z)-\lim_{x\rightarrow 0}E(Y|X=x,Z).$
and test whether it is equal to zero. This is a sound strategy in a linear model, for example, where the quantities above can be estimated with simple regressions. However, if we want to use nonparametric regressions, two problems arise. The first is practical: the second term is a boundary quantity, which should be estimated with a local linear regression. Unfortunately, those don’t run if Z has even as few as two dimensions. I’ll add a post about that sometime. The second problem is the curse of dimensionality: the variance of the estimator of any of the terms above can be huge. The solution is to aggregate the discontinuities, and I choose to do it in the following way:

$\theta=\lim_{x\rightarrow 0} \int \left[ E(Y|X=0,Z)-E(Y|X=x,Z)\right]dF(Z|X=x)$
which is the same as

$\theta=\lim_{x\rightarrow 0} E(E(Y|X=0,Z)-Y|X=x)$
because of the law of iterated expectations. This eliminates the curse of dimensionality, and the need to estimate the second term. To test the exogeneity of X, estimate theta and test whether it is equal to zero.
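The recipe for theta can be sketched in three steps: fit E(Y|X=0,Z) on the bunched observations, form the difference between that fit and Y for the observations with X>0, and take the limit of its conditional mean as x goes to zero with a one-dimensional local linear regression. What follows is an illustration under assumptions that are not from the paper: X is nonnegative with bunching at zero, Z is a scalar, the first step is a cubic polynomial in Z, the kernel is triangular with a fixed bandwidth, and the data come from a made-up design. The names `estimate_theta`, `boundary_limit`, and `simulate` are hypothetical.

```python
import numpy as np

def boundary_limit(d, x, bandwidth):
    """Local linear estimate of lim_{x -> 0+} E(D | X=x), triangular kernel."""
    w = np.clip(1.0 - x / bandwidth, 0.0, None)  # weights vanish beyond the bandwidth
    keep = w > 0
    sw = np.sqrt(w[keep])
    design = np.column_stack([np.ones(keep.sum()), x[keep]])
    # Weighted least squares: scale rows by sqrt(weight), then solve ordinary LS
    coef, *_ = np.linalg.lstsq(design * sw[:, None], d[keep] * sw, rcond=None)
    return coef[0]  # the intercept is the fitted value at x = 0

def estimate_theta(y, x, z, bandwidth=0.5, degree=3):
    at_zero = x == 0
    # Step 1: E(Y | X=0, Z), here a polynomial fit (an assumption of this sketch)
    coefs = np.polyfit(z[at_zero], y[at_zero], deg=degree)
    # Step 2: D = fitted E(Y | X=0, Z) - Y for the observations with X > 0
    d = np.polyval(coefs, z[~at_zero]) - y[~at_zero]
    # Step 3: limit of E(D | X=x) as x -> 0+; only X is a regressor, not Z
    return boundary_limit(d, x[~at_zero], bandwidth)

def simulate(rho, n=200_000, seed=0):
    """Toy design: rho = 0 means X is exogenous, rho != 0 means endogenous."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)
    eta = rng.normal(size=n)
    x = np.maximum(0.0, 0.5 + 0.5 * z + eta)  # bunching at zero
    u = rho * eta + rng.normal(size=n)
    y = x + z + u                              # g(X, Z) = X + Z, continuous in X
    return y, x, z

theta_exog = estimate_theta(*simulate(rho=0.0))
theta_endog = estimate_theta(*simulate(rho=1.0))
print(f"theta, exogenous X:  {theta_exog:+.3f}")
print(f"theta, endogenous X: {theta_endog:+.3f}")
```

The only nonparametric boundary regression is in one dimension, whatever the dimension of Z, which is where the aggregation pays off. A real test would also need a standard error for the estimate, which this sketch does not attempt.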