An Econometric Analysis of the 'Backward-Bending' Labor Supply of Canadian Women

By Adib J. Rahman
2013, Vol. 5 No. 09 | pg. 3/6 |

The Econometric Model and Estimation Method

Consistent with existing literature (Heckman, 1974), let the desired hours in the cross-section of females be given by:

(1) hi* = δ0 + δ1wi + δ2Zi + εi

where Z includes non-labor income and taste variables such as age and its square, education dummies, a dummy for whether the female lives with her spouse, husband’s earnings, marital status, an alimony dummy variable, and a dummy variable for whether the female lives with a child less than six years old. We can think of εi as unobserved “tastes for work” (an unobserved, person-specific factor that makes female i work more or fewer hours than other observationally identical females). We will refer to (1) as the structural labor supply equation. It represents the behavioral response of the individual female’s labor supply decision to her economic environment, and our goal here is to estimate δ1.

Suppose the market wage that female i can command is given by:

(2) wi = β0 + β1Xi + µi

where X includes productivity and human capital variables such as age and its square, years of experience and its square, and education and region dummies. In practice there may be considerable overlap between the variables in X and Z. It may be helpful to think of µi as “unobserved (wage-earning) ability” here. We will refer to (2) as the structural wage equation. The wage equation includes dummy variables that distinguish between different regions of residence. Since there is no theoretical reason justifying their inclusion in the labor supply equation, the region dummies are excluded from it.

In the above situation we already know that OLS estimates of either (1) or (2) on the sample of female workers only will be biased (in the case of (1) because the sample includes only those females with positive hours; in the case of (2) because the sample includes only those females with wages above their reservation wage). We therefore formalize the nature and size of these biases, and show how to obtain consistent estimates of the δ’s and β’s.

We begin by substituting (2) into (1), which yields:

(3) hi* = δ0 + δ1[β0 + β1Xi + µi] + δ2Zi + εi

(4) hi* = [δ0 + δ1β0] + δ1β1Xi + δ2Zi + [εi + δ1µi]

(5) hi* = α0 + α1Xi + α2Zi + ηi

where α0 = δ0 + δ1β0; α1 = δ1β1; α2 = δ2; ηi = εi + δ1µi. We will refer to equation (5) as the reduced form hours equation.
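The substitution in (3) to (5) can be checked with a few lines of arithmetic. The sketch below is only an illustration: it treats X and Z as single variables, and every parameter value and the single observation are hypothetical numbers, not estimates from the paper.

```python
def reduced_form(delta0, delta1, delta2, beta0, beta1):
    """Map the structural parameters of (1) and (2) into the alphas of (5)."""
    alpha0 = delta0 + delta1 * beta0
    alpha1 = delta1 * beta1
    alpha2 = delta2
    return alpha0, alpha1, alpha2

# Hypothetical parameter values and one hypothetical observation (scalar X and Z).
delta0, delta1, delta2 = -10.0, 2.0, -0.5
beta0, beta1 = 5.0, 1.5
x, z, eps, mu = 4.0, 6.0, 0.3, -0.2

# Structural route: wage from (2), then desired hours from (1).
wage = beta0 + beta1 * x + mu
hours_structural = delta0 + delta1 * wage + delta2 * z + eps

# Reduced-form route: equation (5) with the composite error eta.
alpha0, alpha1, alpha2 = reduced_form(delta0, delta1, delta2, beta0, beta1)
eta = eps + delta1 * mu
hours_reduced = alpha0 + alpha1 * x + alpha2 * z + eta

print(hours_structural, hours_reduced)
```

Both routes give the same desired hours, which is all that the reduced form asserts: the wage has been substituted out, and its unobserved component is absorbed into ηi.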

As a final step in setting up the problem, note that given our assumptions female i will work a positive number of hours if and only if (iff):

(6) hi* > 0; i.e. ηi > - α0 - α1Xi - α2Zi

Note that, conditional on observables (X and Z), either high unobserved tastes for work (εi) or (provided δ1 > 0) high unobserved wage-earning ability (µi) tend to put a woman into the sample of working women.

Next, to greatly simplify matters, we assume that the underlying error terms (εi and µi) follow a joint normal distribution. Note that (a) it therefore follows that the “composite” error term ηi is jointly normally distributed with εi and µi; and (b) we have not assumed that εi and µi are independent. In fact, it seems plausible that work decisions and wages could have a common unobserved component. Indeed, one probably would not have much confidence in an estimation strategy that required them to be independent.

Recalling that an observation is in the sample iff equation (6) is satisfied for that observation we get:

(7) E(εi|hi > 0) = E(εi| ηi > - α0 - α1Xi - α2Zi)

(8) ≡ θ1λi 

where in equation (8), the first term, θ1, is a parameter that does not vary across observations. It is proportional to the coefficient from a regression of εi on ηi, that is, of εi on εi + δ1µi (under joint normality, θ1 equals the covariance of εi and ηi divided by the standard deviation of ηi). Unless δ1 (the true labor supply elasticity) is zero or negative, or there is a strong negative correlation between underlying tastes for work, εi, and wage-earning ability, µi, this will be positive. In words, conditioning on observables, women who are more likely to make it into the sample (i.e. have a high ηi) will on average have a higher residual in the labor supply equation, εi.

The second term in (8), λi, has an i subscript and therefore varies across observations. Mathematically, it is the ratio of the normal density to one minus the normal cdf (both evaluated at the same point, which in turn depends on X and Z). This ratio is sometimes called the inverse Mills ratio. For the normal distribution, this ratio gives the mean of a truncated variate: if x is a standard normal variate, E(x|x > a) = φ(a)/(1 - Φ(a)).
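The truncated-mean property can be verified numerically. The following is a minimal sketch using only the Python standard library; the function names, the truncation points and the number of draws are illustrative choices, not part of the paper.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def inverse_mills(a):
    """Return phi(a) / (1 - Phi(a)), i.e. E(x | x > a) for standard normal x."""
    return STD_NORMAL.pdf(a) / (1.0 - STD_NORMAL.cdf(a))

def simulated_truncated_mean(a, draws=200_000, seed=0):
    """Average of the standard normal draws that exceed the truncation point a."""
    rng = random.Random(seed)
    kept = [x for x in (rng.gauss(0.0, 1.0) for _ in range(draws)) if x > a]
    return sum(kept) / len(kept)

# Compare the formula with the simulated mean at a few truncation points.
for a in (-1.0, 0.0, 1.0):
    print(a, round(inverse_mills(a), 3), round(simulated_truncated_mean(a), 3))
```

The ratio rises with the truncation point a: the more severe the truncation (the less likely an observation is to be selected), the larger the mean of the selected draws.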

Now that we have an expression for the expectation of the error term in the structural labor supply equation (1) we can write:

(9) εi = E(εi|hi > 0) + εi* = θ1λi + εi*, where E(εi*) = 0.

In a sample of participants, we can therefore write (1) as:

(10) hi* = δ0 + δ1wi + δ2Zi + θ1λi + εi*

We call this the augmented labor supply equation. It demonstrates that we can decompose the error term in a selected sample into a part that potentially depends on the values of the regressors (X and Z) and a part that does not. It also tells us that, if we had data on λi and included it in the above regression, we could estimate (1) by OLS and not encounter any bias. Thus, one can think of sample selection bias as a specific type of omitted variable bias [Heckman (1979)].
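The conditional mean in (7) and (8) can also be checked by simulation. The sketch below assumes that, under joint normality, θ1 equals Cov(εi, ηi) divided by the standard deviation of ηi, and that λi is the inverse Mills ratio evaluated at the selection cutoff divided by that standard deviation. The standard deviations, the correlation, δ1 and the cutoff are all hypothetical values chosen for illustration.

```python
import math
import random
from statistics import NormalDist

def theta_and_lambda(sd_eps, sd_mu, rho, delta1, cutoff):
    """Return (theta1, lambda) for selection on eta = eps + delta1*mu > cutoff."""
    cov_eps_mu = rho * sd_eps * sd_mu
    var_eta = sd_eps ** 2 + delta1 ** 2 * sd_mu ** 2 + 2.0 * delta1 * cov_eps_mu
    sd_eta = math.sqrt(var_eta)
    theta1 = (sd_eps ** 2 + delta1 * cov_eps_mu) / sd_eta  # Cov(eps, eta) / sd(eta)
    a = cutoff / sd_eta                                    # standardized cutoff
    std_normal = NormalDist()
    lam = std_normal.pdf(a) / (1.0 - std_normal.cdf(a))    # inverse Mills ratio
    return theta1, lam

def simulated_mean_eps(sd_eps, sd_mu, rho, delta1, cutoff, draws=200_000, seed=1):
    """Average eps among simulated observations that satisfy eta > cutoff."""
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(draws):
        u1, u2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        eps = sd_eps * u1
        mu = sd_mu * (rho * u1 + math.sqrt(1.0 - rho ** 2) * u2)  # corr(eps, mu) = rho
        if eps + delta1 * mu > cutoff:
            total += eps
            count += 1
    return total / count

# Hypothetical values: sd(eps)=1, sd(mu)=0.5, corr=0.3, delta1=2, cutoff=0.5.
params = (1.0, 0.5, 0.3, 2.0, 0.5)
theta1, lam = theta_and_lambda(*params)
print("theta1 * lambda:", round(theta1 * lam, 3))
print("simulated E(eps | selected):", round(simulated_mean_eps(*params), 3))
```

The two printed numbers should agree up to simulation noise, which is the sense in which θ1λi is the part of the error term that a regression on the selected sample omits.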

Following the same reasoning for the market wage equation we get:

(11) E(µi|hi > 0) = E(µi| ηi > - α0 - α1Xi - α2Zi)

(12) ≡ θ2λi 

Note that λi in (12) is exactly the same λi that appeared in (8). The parameter θ2 is proportional to the coefficient from a regression of µi on ηi, that is, of µi on εi + δ1µi (under joint normality, θ2 equals the covariance of µi and ηi divided by the standard deviation of ηi). As before, unless δ1 (the true labor supply elasticity) is zero or negative, or there is a strong negative correlation between εi and µi, this will be positive: on average, conditioning on observables, women who are more likely to make it into the sample (i.e. have a high ηi) will have a higher residual in the wage equation, µi.

Equation (12) allows us to write an augmented wage equation:

(13) wi = β0 + β1Xi + θ2λi + µi*, where E(µi*) = 0.

Thus, data on λi would allow us to eliminate the bias in wage equations fitted to the sample of working women only.

When (as we have assumed) all our error terms follow a joint normal distribution, the reduced form hours equation (5) defines a probit equation where the dependent variable is the dichotomous decision of whether to work or not (i.e. whether to be in the sample for which we can estimate our wage and hours equations). Note that all the variables in this probit (the X’s, Z’s and whether a female works) are observed for both female workers and female non-workers. Thus we can estimate the parameters of this equation consistently. In particular (recalling that the variance term in a probit model is not identified) we can get consistent estimates of α0/ση, α1/ση and α2/ση, where ση is the standard deviation of ηi. Combined with the data on the X’s and Z’s, these estimates allow us to calculate an estimated λi for each observation in our data.

Now that we have consistent estimates of λi, we can include them as regressors in a labor supply equation estimated on the sample of participants only. Once we do so, the expectation of the error term in that equation is identically zero, so it can be estimated consistently via OLS. We can do the same thing in the wage equation. This procedure is known as the Heckit method. When we implement this, we will also get, as a by-product, estimates of the θ parameters (θ1 in the case where the second stage is an hours equation; θ2 in the case where the second stage is a wage equation). These in turn provide some information about the covariance between the underlying error terms εi and µi.
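The two-step procedure can be sketched on simulated data. The example below is an illustration under stated assumptions, not the paper's estimation code: it uses NumPy, one scalar x standing in for X and one scalar z standing in for Z, hypothetical "true" parameter values, and a small hand-written probit (Fisher scoring) in place of a statistics package. The second stage shown is the wage equation (13).

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical "true" parameters, chosen only for illustration.
beta0, beta1 = 1.0, 0.5                     # wage equation (2)
delta0, delta1, delta2 = -1.0, 1.0, -0.8    # labor supply equation (1)

x = rng.normal(size=n)   # stands in for X (enters the wage equation)
z = rng.normal(size=n)   # stands in for Z (enters the hours equation only)
# Correlated unobservables: tastes for work (eps) and ability (mu).
eps, mu = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=n).T

wage = beta0 + beta1 * x + mu
desired_hours = delta0 + delta1 * wage + delta2 * z + eps
works = desired_hours > 0    # wages are treated as observed only for workers

_erf = np.vectorize(math.erf)

def norm_pdf(t):
    return np.exp(-0.5 * t ** 2) / math.sqrt(2.0 * math.pi)

def norm_cdf(t):
    return 0.5 * (1.0 + _erf(t / math.sqrt(2.0)))

def fit_probit(W, y, iterations=25):
    """Probit by Fisher scoring; returns the scaled coefficients alpha / sigma_eta."""
    gamma = np.zeros(W.shape[1])
    for _ in range(iterations):
        index = W @ gamma
        p = np.clip(norm_cdf(index), 1e-10, 1.0 - 1e-10)
        d = norm_pdf(index)
        weight = d / (p * (1.0 - p))
        score = W.T @ ((y - p) * weight)
        information = (W * (d * weight)[:, None]).T @ W
        gamma = gamma + np.linalg.solve(information, score)
    return gamma

def ols(A, y):
    return np.linalg.lstsq(A, y, rcond=None)[0]

# Step 1: participation probit on the full sample (workers and non-workers).
W = np.column_stack([np.ones(n), x, z])
gamma_hat = fit_probit(W, works.astype(float))
index_hat = W @ gamma_hat
# phi(index)/Phi(index) equals phi(a)/(1 - Phi(a)) evaluated at a = -index.
lambda_hat = norm_pdf(index_hat) / norm_cdf(index_hat)

# Step 2: wage regressions on workers only, without and with lambda_hat.
ones = np.ones(works.sum())
naive = ols(np.column_stack([ones, x[works]]), wage[works])
heckit = ols(np.column_stack([ones, x[works], lambda_hat[works]]), wage[works])

print("true beta1:", beta1)
print("naive OLS beta1:", round(float(naive[1]), 3))
print("Heckit beta1:", round(float(heckit[1]), 3))
print("Heckit theta2:", round(float(heckit[2]), 3))
```

In this simulated setup the naive slope is pulled away from the true β1, while the regression that includes the estimated λi recovers it more closely. The variable z matters for the design: it shifts participation without entering the wage equation, so the estimated λi is not merely a nonlinear function of x. The usual OLS standard errors from the second step are not valid without further correction, so the sketch reports point estimates only.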

In general, this technique is used whenever we are running a regression on a sample where there is a possible (or likely) correlation between the realization of the dependent variable and the likelihood of being in the sample. In principle, one can correct for sample selection bias by (i) estimating a reduced-form probit in a larger data set where the dependent variable is inclusion in the subsample of interest; then (ii) estimating the regression in the selected sample with an extra regressor, λi. According to the reasoning above, including this extra regressor should eliminate any inconsistency due to nonrandom selection in our sample.
