Econometrics Assignment help

SECTION A

Answer all questions from this section

1. Consider the following regression model

$Y_i = \beta X_i + u_i, \quad i = 1, \dots, n.$

The error term has zero mean, variance equal to $\sigma^2 / X_i^2$, and $E(u_i u_j) = 0$ for $i \neq j$. You are given a sample of observations $\{(Y_i, X_i)\}_{i=1}^{n}$. You may treat $X_i$ as being non-stochastic. Clearly annotating your answers:

(a) (5 marks) Derive the OLS estimator of $\beta$. In the presence of heteroskedasticity, the OLS estimator remains unbiased (you are not asked to show this). Derive the variance of the OLS estimator of $\beta$.

(b) (3 marks) Discuss how you can obtain the Best Linear Unbiased Estimator (BLUE) of $\beta$ given the heteroskedasticity.
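As a numerical companion to Question 1, the following is a minimal sketch, not a model answer. It assumes the error variance is $\sigma^2 / X_i^2$ as stated, so multiplying the model through by $X_i$ gives a homoskedastic regression on which OLS can be run. The function names, the parameter values, and the simulated sample are all hypothetical.

```python
import random

def ols_through_origin(x, y):
    """OLS slope for Y_i = beta * X_i + u_i (no intercept)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

def wls_known_form(x, y):
    """OLS on the transformed model X_i*Y_i = beta*X_i^2 + X_i*u_i.

    The transformed error X_i*u_i has constant variance sigma^2.
    """
    x_star = [xi * xi for xi in x]
    y_star = [xi * yi for xi, yi in zip(x, y)]
    return ols_through_origin(x_star, y_star)

def ols_variance(x, sigma2):
    """Var of the OLS slope: sum(X_i^2 * sigma^2 / X_i^2) / (sum X_i^2)^2."""
    return len(x) * sigma2 / sum(xi * xi for xi in x) ** 2

def wls_variance(x, sigma2):
    """Var of the transformed-model estimator: sigma^2 / sum(X_i^4)."""
    return sigma2 / sum(xi ** 4 for xi in x)

# Hypothetical simulated sample (values chosen for illustration only).
rng = random.Random(0)
beta, sigma = 2.0, 1.0
x = [rng.uniform(0.5, 3.0) for _ in range(200)]
y = [beta * xi + rng.gauss(0.0, sigma / xi) for xi in x]  # sd(u_i) = sigma / X_i

print("OLS:", ols_through_origin(x, y), "variance:", ols_variance(x, sigma ** 2))
print("WLS:", wls_known_form(x, y), "variance:", wls_variance(x, sigma ** 2))
```

By the Cauchy-Schwarz inequality, the variance from the transformed model can never exceed the OLS variance, which the two variance functions let you confirm for any choice of regressors.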

2. Consider the simple linear regression model

$Y_i = \alpha + \beta X_i + u_i, \quad i = 1, \dots, n$

in the presence of correlation between the error and the regressor. The regressor exhibits variability in the sample, i.e., $\sum_{i=1}^{n} (X_i - \bar{X})^2 \neq 0$. Under assumptions of homoskedasticity and the absence of autocorrelation, the IV estimator of $\beta$ that uses the instrument $Z$ has the following (asymptotic) variance (no need to prove this statement)

$\mathrm{Var}(\hat{\beta}_{IV}) = \dfrac{\sigma_u^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \cdot \dfrac{1}{r_{XZ}^2},$

where $r_{XZ} \neq 0$ is the sample correlation between $X$ and $Z$ and $\sigma_u^2$ is the variance of the disturbance term $u$.

(a) (1 mark) Give the formula for $\hat{\beta}_{IV}$ (you are not asked to derive it).

(b) (4 marks) Provide at least three factors that will help obtain more precise IV parameter estimates for $\beta$. In your answer explain why the precision of parameter estimates is important.

(c) (3 marks) Discuss the following statement: “If X is not correlated with u, the best choice of instrument is using the regressor itself.”
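To make the variance expression in Question 2 concrete, here is a small illustrative sketch. It uses the standard just-identified IV slope (sample covariance of Z and Y over sample covariance of Z and X) and evaluates the variance formula quoted above. The helper names and the toy data are hypothetical, and this is not intended as a full answer to parts (a) to (c).

```python
import math

def _mean(v):
    return sum(v) / len(v)

def _cross(a, b):
    """Sum of cross-products of deviations from the sample means."""
    am, bm = _mean(a), _mean(b)
    return sum((ai - am) * (bi - bm) for ai, bi in zip(a, b))

def iv_slope(y, x, z):
    """Simple IV slope estimate using the single instrument z."""
    return _cross(z, y) / _cross(z, x)

def sample_corr(a, b):
    """Sample correlation coefficient between a and b."""
    return _cross(a, b) / math.sqrt(_cross(a, a) * _cross(b, b))

def iv_variance(x, z, sigma2_u):
    """Variance formula from the question: sigma_u^2 / S_xx / r_XZ^2."""
    return sigma2_u / _cross(x, x) / sample_corr(x, z) ** 2

# Hypothetical toy data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
z = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [1.1, 2.3, 2.9, 4.2, 5.1]

ols_slope = _cross(x, y) / _cross(x, x)
print("IV slope with Z:", iv_slope(y, x, z))
print("IV slope with Z = X:", iv_slope(y, x, x), "OLS slope:", ols_slope)
print("variance with Z:", iv_variance(x, z, 1.0))
print("variance with Z = X:", iv_variance(x, x, 1.0))
```

Using the regressor as its own instrument sets the sample correlation to one, so the formula collapses to the usual OLS variance, and any other instrument inflates it by the factor $1 / r_{XZ}^2$.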


  1. You are interested in the extent to which removing the option of paid childcare has affected the share of time mothers, compared to fathers, spend with their children. The economic and health response to the pandemic has caused job losses, increased telecommuting, closed day care and schools, and placed state restrictions on in-home paid childcare, all of which may have affected the bargaining dynamic between parents in deciding how much time to spend caring for their children. The American Time Use Survey provides nationally representative estimates of how and with whom Americans spend their time, including hours spent on paid work and childcare. It is an annual cross-sectional survey, administered during weeks when school is typically in session, that is linked to the Current Population Survey (demographic questions were asked several months before the Time Use questions); thus you can observe demographic information and family relationships.

You have data from 2015-2019 and will have data in 2020 next year. You are crafting your econometric specification. Prior to 2020, all states allowed paid in-home care and had schools open. As of the time of the survey in 2020, all states had closed schools (shutdown of physical buildings and in-person instruction) but there was state variation in allowing paid in-home care for children less than 6 years old. That is, some, but not all, states deemed in-home childcare essential when stay-at-home orders went into effect in 2020, such that the availability of paid in-home care depended on the state you lived in. You want to learn the effect of disallowing paid childcare and closing schools on spousal allocation of childcare hours as measured by y, the ratio of hours the mother spent on childcare to hours the father spent on childcare. For example, y = 1 when mothers and fathers spent the same amount of time on childcare and y = 1.5 when mothers spent 50% more than fathers.

You restrict your sample to opposite-sex married couples with children where both parents worked in the prior year and both parents report positive hours of childcare.[1] You can observe the following variables for each couple:

[1] For simplicity, we exclude all couples with first responders, like health care workers, for whom special rules applied in the state stay-at-home orders.

yit = hours mother spent on childcare divided by hours father spent on childcare in couple i in year t

x1it =1 if wife in couple i in year t is currently working for pay, 0 otherwise

x2it =wife’s paid work hours last week reported in year t for couple i

x3it =1 if husband in couple i in year t is currently working for pay, 0 otherwise

x4it =husband’s paid work hours last week reported in year t for couple i

x5it =1 if couple i has a child under 6 years old in year t, 0 otherwise

x6it =1 if couple i’s youngest child is 6-17 years old in year t, 0 otherwise

x7i =1 if wife in couple i has years of education = husband’s, 0 otherwise

x8i =1 if wife in couple i has years of education > husband’s, 0 otherwise

x9i =age of husband – age of wife

x10i =1 if wife’s occupation is teaching, 0 otherwise

x11i =1 if husband’s occupation is teaching, 0 otherwise

pandemict =1 if year is after pandemic hit (2020 or later), 0 otherwise

notavailit =1 if paid in-home childcare was NOT available for couple i in year t, 0 otherwise

Let’s simply refer to the effect of removing the option of paid childcare on the ratio of mother’s to father’s hours of childcare as the treatment effect of interest: TE.

  • Since some but not all states allowed paid in-home childcare in 2020 at the time of the survey, you could estimate TE with $\hat{\alpha}_3$ by running ordinary least squares using only 2020 data with the following specification:

$\hat{y} = \hat{\alpha}_0 + \hat{\alpha}_1 x_{5i} + \hat{\alpha}_2 x_{6i} + \hat{\alpha}_3 (\mathit{notavail}_i \times x_{5i}) + \hat{\alpha}_4 x_{7i} + \hat{\alpha}_5 x_{8i} + \hat{\alpha}_6 x_{9i}$

  1. 4 points Interpret $\hat{\alpha}_3$.
  2. 4 points Interpret $\hat{\alpha}_5$.
  3. 5 points Do you think all of the necessary assumptions hold for $\hat{\alpha}_3$ to be an unbiased estimator for TE? Explain your reasoning.
  • 4 points Instead, you assume that

$E[y \mid X] = \beta_0 + \beta_1 x_{5it} + \beta_2 x_{6it} + \beta_3 (\mathit{notavail}_{it} \times x_{5it}) + \beta_4 (\mathit{pandemic}_t \times x_{6it})$

Interpret $\beta_3$.

  • 6 points If you ran an OLS regression based on the specification above, what type of estimator is $\hat{\beta}_3$? And what assumptions are necessary for $\hat{\beta}_3$ to be unbiased?
  • 4 points Compare the advantages and disadvantages of the two potential ordinary least squares estimators for TE: $\hat{\beta}_3$ from the specification above and $\hat{\gamma}_3$ from assuming

$E[y \mid X] = \gamma_0 + \gamma_1 x_{5it} + \gamma_2 x_{6it} + \gamma_3 (\mathit{notavail}_{it} \times x_{5it}) + \gamma_4 (\mathit{pandemic}_t \times x_{6it}) + \gamma_5 x_{1it} + \gamma_6 x_{2it} + \gamma_7 x_{3it} + \gamma_8 x_{4it}$

  • 4 points The treatment effect of removing the option of paid childcare may be different for families where one of the parents is themselves a teacher. How would you suggest modifying your specification and why?
  • 20 points To complete the Master of Science in Economics at a University, students must complete the core Economics course and one of the four advanced econometrics courses, among other requirements. Students receiving a B- or better on their final course grade receive credit for the course, while students with a C+ or below do not receive credit. Instructors of the core Econometrics course calculate a final number score for the course with cutoff values for assigning the letter grades. Failing to receive credit for a course that you attended and paid tuition for can be discouraging, possibly affecting your enthusiasm for the subject matter. However, successfully retaking a course you struggled in may boost your confidence and enthusiasm. Suppose that you are interested in estimating the treatment effect of receiving a passing grade on the number of advanced econometrics courses taken.

Let x0 be the cutoff value for receiving a B- in a core Econometrics course. You observe for each student:

yi =number of advanced econometrics courses student i enrolled in

x1i=core Econometrics course score for student i

x2i=fraction of students failed by student i’s core Econometrics instructor

x3i=student i’s mean GPA (excluding econometrics courses)

x4i=1 if student i received an A in their Statistics class, 0 otherwise

x5i=1 if student i took Mathematical Methods for Economists, 0 otherwise

  • 4 points To estimate the average treatment effect of receiving a passing grade on number of advanced econometrics courses taken, would you use a sharp or fuzzy Regression Discontinuity design? Explain.
  • 4 points Write down your model specification to estimate the average treatment effect.
  • 6 points Describe two graphical tests that would be important to perform and explain why.
  • 6 points How would you design your falsification test? Explain why you chose it.
  • 24 points Suppose that your goal was to estimate the effect of education on weekly hours worked for individuals approaching retirement. Assume that education is exogenous. The data set includes men over the age of 50 but less than 60 years old.
  • Suppose the data set includes a categorical variable that equals 1 if the man works 0 hours (that is, does not work), 2 if he works more than 0 but less than 35 hours per week, and 3 if he works more than 35 hours per week. There are no missing values.
    • 4 points Which is the appropriate econometric model?
    • 4 points Explain why you chose this model.
    • 4 points Write the log likelihood function implied by your model choice.
  • Instead, suppose that you had the actual number of hours worked per week (e.g. 0, 1, 2, ...) and you observe that a substantial share of men don’t work at all (have 0 hours of work).
    • 4 points Which is the appropriate econometric model?
    • 4 points Explain why you chose this model.
    • 4 points Write the log likelihood function implied by your model choice.

Advanced Studies in Econometrics

ECON 8820: Advanced Studies in Econometrics

You may work in groups on the assignment, but you must write up your own answers in your own words.
See the course profile for information on the due date of this assignment and penalties for late submission.

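  1. Suppose that $U$ is uniformly distributed on $[0, 1]$ and $X$ is continuously distributed with distribution function $F_X(\cdot)$. Show that $Y := F_X^{-1}(U)$ is also distributed as $F_X(\cdot)$. (Or, $X \sim Y$, i.e., $X$ and $Y$ are distributionally equivalent.)

(Answer) You may start from the definition of the CDF. Here is an alternative method that uses the change of variables. Let $g(U) := F_X^{-1}(U)$. Then $g^{-1}(Y) = F_X(Y)$ and $\partial g^{-1}(y) / \partial y = f_X(y)$. Since $f_U(u) = 1(u \in [0, 1])$, the change-of-variables formula $f_Y(y) = f_U(g^{-1}(y)) \, |\partial g^{-1}(y) / \partial y|$ gives

$f_Y(y) = 1(F_X(y) \in [0, 1]) \cdot f_X(y)$

on $y \in \mathcal{Y}$, which is the support of $X$. The indicator always equals one because $F_X(y) \in [0, 1]$, so $f_Y(y) = f_X(y)$.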
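The result above can be checked numerically. The sketch below is only an illustration under an assumed example distribution: it takes $F_X$ to be the exponential CDF (a choice made here, not part of the problem), pushes uniform draws through $F_X^{-1}$, and compares the empirical CDF of the result with $F_X$ at a few points. All names are hypothetical.

```python
import math
import random

def exp_quantile(u, rate=1.0):
    """F_X^{-1}(u) for the exponential distribution: -log(1 - u) / rate."""
    return -math.log(1.0 - u) / rate

def exp_cdf(x, rate=1.0):
    """F_X(x) for the exponential distribution."""
    return 1.0 - math.exp(-rate * x)

def ecdf(sample, x):
    """Empirical CDF of the sample evaluated at x."""
    return sum(1 for s in sample if s <= x) / len(sample)

rng = random.Random(42)
# Y = F_X^{-1}(U) with U ~ Uniform[0, 1); rng.random() never returns 1.0.
draws = [exp_quantile(rng.random()) for _ in range(20000)]

check_points = (0.25, 0.5, 1.0, 2.0, 3.0)
max_gap = max(abs(ecdf(draws, x) - exp_cdf(x)) for x in check_points)
print("largest gap between empirical and true CDF:", max_gap)
```

The gap shrinks as the number of draws grows, which is what the distributional equivalence of $X$ and $Y$ predicts.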

  1. The dataset D in the MATLAB data file dataHW2.mat contains information on $T = 50$ markets, each with $J = 5$ differentiated products. The first column indicates each market $t = 1, \dots, T$, the second column indicates each product $j = 1, \dots, J$. The third column shows market share $S_{jt}$ for all $(j, t)$. The fourth to seventh columns are observed characteristics $x_j = [1, x_{j1}, x_{j2}, x_{j3}]$, which vary across products but not across markets. The eighth is the price $p_{jt}$. The other columns, say $z_{jt}$, collect all the variables, including $x_j$, that are correlated with the price but not correlated with unobserved characteristics $\xi_{jt}$.

Consumer $i$’s utility of purchasing product $j$ in market $t$ is represented as

$u_{ijt} = x_j' \beta_{it} + \alpha_{it} p_{jt} + \xi_{jt} + \varepsilon_{ijt}$

and we assume that consumer $i$ chooses one (and only one) product (including the outside option that gives utility $u_{i0t} = \varepsilon_{i0t}$) that gives the largest utility in market $t$. Assume that $\{\varepsilon_{ijt}\}$ are i.i.d. with the type I extreme value distribution so that the probability of consumer $i$ choosing $j$ is

$P_{ijt} = \dfrac{\exp(x_j' \beta_{it} + \alpha_{it} p_{jt} + \xi_{jt})}{1 + \sum_{\tilde{j}=1}^{J} \exp(x_{\tilde{j}}' \beta_{it} + \alpha_{it} p_{\tilde{j}t} + \xi_{\tilde{j}t})}$

where $p_{jt}$ and $\xi_{jt}$ are potentially correlated, but the dataset does not have $\xi_{jt}$. Assume that

$\begin{pmatrix} \beta_{it} \\ \alpha_{it} \end{pmatrix} \overset{iid}{\sim} N(\mu, \Sigma)$

across all $i$ and $t$ where $\mu$ is a 5-dimensional vector and $\Sigma$ is a $5 \times 5$ symmetric positive definite matrix.
Using the BLP method, estimate the structural parameter $\theta := \{\mu, \Sigma\}$.
For this exercise:
• Report optimal GMM estimates for every element in μ and Σ.
• You do not need to report standard errors, but discuss in detail how you would compute them.
• Use a tight tolerance, e.g., 1e-13, for the inner loop (fixed point), and use a sufficiently large number of iterations for the outer loop (evaluation of the GMM objective function).
• Employ 1,000 draws for the simulation of the random coefficients $(\beta_{it}', \alpha_{it})$, using the same randomness for every iteration of the outer loop.
• Comment your code in detail.
• Submit your code.
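The inner loop mentioned in the bullets is commonly implemented as the BLP contraction mapping $\delta \leftarrow \delta + \log S - \log s(\delta)$. The sketch below illustrates only that step under strong simplifications that are assumptions of this example, not of the assignment: a single market, three products, a random coefficient on price only, 200 draws, and made-up parameter values. It is written in Python rather than MATLAB, and every name in it is hypothetical.

```python
import math
import random

def simulated_shares(delta, prices, nu, sigma):
    """Average the logit choice probabilities over the simulated draws nu."""
    n_products = len(delta)
    shares = [0.0] * n_products
    for v in nu:
        expu = [math.exp(delta[j] + sigma * v * prices[j]) for j in range(n_products)]
        denom = 1.0 + sum(expu)  # the 1 is the outside option
        for j in range(n_products):
            shares[j] += expu[j] / denom
    return [s / len(nu) for s in shares]

def invert_shares(obs_shares, prices, nu, sigma, tol=1e-13, max_iter=5000):
    """Fixed-point iteration: delta <- delta + log(S) - log(s(delta))."""
    outside = 1.0 - sum(obs_shares)
    delta = [math.log(s / outside) for s in obs_shares]  # plain-logit starting value
    for _ in range(max_iter):
        sim = simulated_shares(delta, prices, nu, sigma)
        new = [d + math.log(big_s) - math.log(s)
               for d, big_s, s in zip(delta, obs_shares, sim)]
        gap = max(abs(a - b) for a, b in zip(new, delta))
        delta = new
        if gap < tol:
            break
    return delta

# Hypothetical inputs: the draws are generated once and reused in every call,
# mirroring the "same randomness" requirement for the outer loop.
rng = random.Random(0)
nu = [rng.gauss(0.0, 1.0) for _ in range(200)]
prices = [1.0, 1.5, 2.0]
true_delta = [0.5, 0.2, -0.3]
sigma = 0.4

obs = simulated_shares(true_delta, prices, nu, sigma)
delta_hat = invert_shares(obs, prices, nu, sigma)
print("recovered mean utilities:", delta_hat)
```

In a full implementation the recovered mean utilities would feed the GMM objective in the outer loop; that part, the instruments, and the five-dimensional random coefficients are deliberately left out here.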

Advanced Studies in Econometrics

You may work in groups on the assignment, but you must write up your own answers in your own words.
Submit your outcomes in one PDF and also submit your codes so that I can implement them on my laptop.
This assignment is due on 20 May. See the course profile for information on penalties for late submission.

  1. The Bernstein polynomial density (BPD) is given as

$f(x \mid \theta) := \sum_{j=1}^{k} \theta_j \, \mathrm{beta}(x \mid j, k - j + 1) \quad (1)$

where $\mathrm{beta}(x \mid a, b) \propto x^{a-1} (1 - x)^{b-1} 1(x \in [0, 1])$ denotes the PDF of $\mathrm{Be}(a, b)$ with $a > 0$ and $b > 0$, and $\theta \in \Delta^{k-1}$, the $(k-1)$-dimensional unit simplex, i.e., $\theta \in \mathbb{R}^k_+$ and $\sum_{j=1}^{k} \theta_j = 1$. By construction, we see that the specification (1) is a weighted average of the beta densities, $\{\mathrm{beta}(x \mid j, k - j + 1)\}_{j=1}^{k}$. In order to understand that (1) is a histogram smoothing, make a diagram of $\{\mathrm{beta}(x \mid j, k - j + 1)\}_{j=1}^{k}$ on the unit support in a similar fashion to Figure 8 in Kim (2015, QE) for each $k \in \{5, 7, 10\}$.
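A possible starting point for the diagram is to tabulate the beta components on a grid. The sketch below assumes integer shape parameters, for which the normalizing constant of $\mathrm{Be}(j, k - j + 1)$ equals $k \binom{k-1}{j-1}$. The function names and the 101-point grid are choices made for this illustration, and plotting is left to whatever library you prefer.

```python
import math

def bernstein_beta_pdf(x, j, k):
    """PDF of Be(j, k - j + 1) at x: k * C(k-1, j-1) * x^(j-1) * (1-x)^(k-j)."""
    if not 0.0 <= x <= 1.0:
        return 0.0
    return k * math.comb(k - 1, j - 1) * x ** (j - 1) * (1.0 - x) ** (k - j)

def bpd(x, theta):
    """Bernstein polynomial density (1): weighted average of the beta components."""
    k = len(theta)
    return sum(t * bernstein_beta_pdf(x, j, k) for j, t in enumerate(theta, start=1))

# Tabulate each component on a grid over [0, 1] for k = 5, 7, 10.
grid = [i / 100 for i in range(101)]
curves = {
    k: [[bernstein_beta_pdf(x, j, k) for x in grid] for j in range(1, k + 1)]
    for k in (5, 7, 10)
}

# With equal weights 1/k the mixture is the uniform density on [0, 1].
print(bpd(0.3, [1.0 / 5] * 5))
```

Each list in `curves[k]` is one curve to draw against `grid`. The equal-weights check reflects the fact that the Bernstein basis polynomials sum to one, which is one way to see why (1) behaves like a smoothed histogram with $k$ bins.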
  1. Replicate the figure on page 31 of the lecture slides on “Bayesian Methods – Computation.” In particular,

• Refer to the first two bullet points on page 29 regarding the true data generating process. Use the CDF inversion method (pages 21 and 22) to simulate the data from the truncated normal distribution.
• In addition, see page 30 for computational details, e.g., tuning parameters and number of iterations for the Metropolis-Hastings algorithm.
• You need to evaluate the density function (1) with different θ at each data point to implement the Metropolis-Hastings algorithm. Observe, however, that you need to evaluate the beta components only once, which may reduce computation time.
• Make a 2 by 2 figure just like the one in the lecture slides with a relevant title for each diagram and labels for the vertical and horizontal axes, when they are informative.
• For the 95% credible band on panel (d), use the 2.5 and 97.5 percentiles at each point in the unit interval. You may use 100 equidistant grid points for this exercise. Make sure you use different line styles for the different objects in the diagram so that they are visually distinguished.
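The CDF inversion step for the truncated normal can be sketched as follows. The lecture slides are not reproduced here, so the mean, standard deviation, truncation bounds, and sample size below are placeholders rather than the values on page 29, and the function name is hypothetical. The sketch relies on `statistics.NormalDist` from the Python standard library.

```python
import random
from statistics import NormalDist

def rtruncnorm(n, mu, sigma, lower, upper, rng):
    """Draw n values from N(mu, sigma^2) truncated to [lower, upper] by CDF inversion."""
    dist = NormalDist(mu, sigma)
    p_lo, p_hi = dist.cdf(lower), dist.cdf(upper)
    draws = []
    for _ in range(n):
        u = rng.random()  # U ~ Uniform[0, 1)
        # Rescale u into [F(lower), F(upper)) and invert the untruncated CDF.
        draws.append(dist.inv_cdf(p_lo + u * (p_hi - p_lo)))
    return draws

# Placeholder parameters (assumed for illustration, not taken from the slides).
rng = random.Random(123)
sample = rtruncnorm(5000, mu=0.5, sigma=0.2, lower=0.0, upper=1.0, rng=rng)
print(min(sample), max(sample), sum(sample) / len(sample))
```

Because every uniform draw maps to a point inside the truncation interval, no draws are rejected, which is the main practical advantage over accept-reject sampling when the truncation region has low probability.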

  1. Similarly, replicate the figure on page 8 of the lecture slides on “Bayesian Methods – Computation 2.”
    • If your code for Problem 2 works well, you only need to modify your code slightly; see page 7 of
    the lecture note.