Data Science Assignment Tips

How do you successfully write a Data Science assignment?

Where will I come across Data Science assignments?
As a college student pursuing a degree in Computer Science, Economics, MBA, Statistics, or any related field, you will come across the subject of Data Science. In simple terms, Data Science involves using computational tools to draw insights from a dataset. Understanding the data well enough to make decisions is one of the prime characteristics expected of a Data Scientist.

Personal and Professional Skill Sets
To write and solve a Data Science assignment successfully, a student must develop certain personal and professional skills. A curious mindset helps: ask yourself whether you are excited to learn what the dataset has to say. You should also be able to ask questions; this immediately gives you a context for analyzing the data and framing your report. A curious mind is essential for a good data science assignment. You have to pay attention to minute details and understand them clearly. Along with knowledge of statistics, you also need the ability to communicate your results.

Getting Started with the Assignment
When you get an assignment in Data Science, the first thing you should do is read the questions thoroughly. Some assignments might ask you to be creative; some may ask you to be very specific. Understanding what has been asked for is of utmost importance. This overview will set your path to solving the assignment. Right after this, recall the concepts and methodologies that you might need to solve the questions. Drill down to the specific concepts, and then decide which tool or software you would like to use.

Before you start solving the assignment, explore the dataset you have been given. We prefer using R, which has many packages that give you an exploratory overview of a dataset. Check for invalid data, missing data, and any outliers or errors in the dataset, and correct them before the analysis. Identify the type of each variable: categorical (nominal or ordinal) or numeric (interval or ratio scale). Knowing the variable types will help you in both analysis and visualization. You might also want to create some basic visualizations before getting into the analysis.
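The checks described above can be sketched in a few lines. The example below uses Python's standard library rather than R, purely for illustration; the function name `summarize_column`, the sample `ages` data, and the 1.5 × IQR outlier rule are assumptions made for this sketch, not part of any specific assignment.

```python
import statistics

def summarize_column(values):
    """Count missing entries and flag outliers in one numeric column.

    Missing values are represented as None. Outliers are flagged with the
    common 1.5 * IQR rule, which is only one of several possible conventions.
    """
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)
    q1, median, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in present if v < lower or v > upper]
    return {
        "n": len(values),
        "missing": missing,
        "mean": statistics.mean(present),
        "median": median,
        "outliers": outliers,
    }

# Hypothetical column: one missing value and one likely data-entry error (230).
ages = [21, 22, 23, None, 22, 24, 21, 230, 23, 22]
summary = summarize_column(ages)
print(summary)
```

A summary like this tells you what to correct before the analysis: here the missing entry and the implausible value would both need a decision (drop, impute, or fix at the source).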

Analyzing the dataset
Stick to the basic concepts and start solving the questions. One common source of confusion for students is which variable is the dependent variable and which are the independent variables. This becomes especially important when running regressions. Be thorough with the standard statistical tests and how to perform them in your software. Pay extra attention to the units of measurement of the variables. Ordinal and coded variables are typically harder to interpret, so take particular care with them.

Communicating the results
Communicating your findings is of great importance. This means you have to pay attention to how you interpret the results of your data analysis. Keep it short and to the point. Try to connect each interpretation to the variables under consideration. This is where you might initially need some help from experts. Report writing contributes significantly to your marks. With experience comes the ability to connect the dots between statistical results and what they mean for the real-world variables behind them. Make sure the report is well formatted.

Challenges we may help you with
Students, at times, are unable to relate the theory and concepts to the dataset questions given. When exactly to use a logit model, or which variables to use when you want to run a Chi-Square test on a dataset, is what experts can help you with. As you solve more questions, you will gain confidence. Report writing is another area where you might initially take help from experts. Once you correct the errors you are making and see examples of good reports, you will eventually get the hang of what is expected of a data science assignment.
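As an illustration of the Chi-Square case mentioned above: the test of independence applies to two categorical variables, and the statistic compares observed cell counts with the counts expected if the variables were unrelated. The sketch below is a minimal stdlib Python version; the function name and the gender/answer data are made up for the example.

```python
from collections import Counter

def chi_square_independence(pairs):
    """Pearson chi-square statistic and degrees of freedom.

    pairs: list of (row_category, column_category) observations for two
    categorical variables.
    """
    row_totals = Counter(r for r, _ in pairs)
    col_totals = Counter(c for _, c in pairs)
    observed = Counter(pairs)
    n = len(pairs)
    stat = 0.0
    for r in row_totals:
        for c in col_totals:
            expected = row_totals[r] * col_totals[c] / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof

# Hypothetical survey: 100 respondents, gender vs. a yes/no answer.
pairs = (
    [("male", "yes")] * 30 + [("male", "no")] * 20
    + [("female", "yes")] * 20 + [("female", "no")] * 30
)
stat, dof = chi_square_independence(pairs)
print(stat, dof)
```

With one degree of freedom, the 5% critical value of the chi-square distribution is about 3.84, so a statistic above that would lead you to reject independence at that level.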

Your Future
Whether you solve them on your own or take help, Data Science assignments given at the college level are of great importance for a successful career ahead. So make sure the assignment gets solved, even if you cannot do it alone. Directly or indirectly, it will be a game-changer for your future.
So if you are still finding it difficult to finish your assignments on time, we at Edumanta have a team of experts and educators who will guide you through them. We assure you the best-quality assignments and, even better, fast delivery. Besides guiding you through assignments, we also offer a 24×7 online tutoring service. You will pass with flying colors, gain knowledge, and eventually pave your way to a successful career. Visit: https://assignmenthelp.edumanta.com/

Why assignment help plays a critical role

What is the importance of an assignment?

The first thing that comes to mind when we hear about an assignment is that it is a complete waste of time. Writing assignments is something disliked by almost everyone reading this blog. However, this view is entirely wrong. Teachers deliver the necessary knowledge and information to students, which helps them understand the topics in various subjects. But a teacher should not hand everything to students and pamper them. Doing so harms students' ability to learn, and education becomes meaningless to them. Therefore, through assignments and homework, students are expected to learn at home. Many of us might still ask why we are given assignments and what their primary purpose is.

Well, let me tell you some positive aspects of an assignment.
• It helps you invest your time wisely.
• It allows you to explore topics in more depth than standard classroom time permits.
• It helps maintain continuity in your studies and provides an excellent way to revise your subjects before the exam.
• It teaches you to work independently and to take responsibility for your work.
• It helps you deal with deadlines in a real-life scenario and teaches you to use your time effectively by building a proper schedule.
• It lets you learn how to use tools such as e-books, research materials, and computer software to find knowledge and apply it smartly.
• It teaches you to apply your knowledge beyond the textbook.
• It gives you a feel for your real exam and trains you to do better in the review.
• The primary purpose of assignments is to increase the learning capabilities of students. The more we use our brains, the more they develop. This is a proven scientific fact, and it is the principle behind giving students highly creative and involving assignments. Students learn a lot more when they read or practice something by themselves.

If you still think that assignments are a total waste of time, then it's time to change your mind if you want to succeed in the future. Assignments are a necessary and essential part of our education, and it is better to adapt to them than to oppose them. It may be challenging to build the skills initially, so Edumanta brings you top tutors and mentors who will guide you in solving your assignments. We provide a 24×7 service. We also offer online tutoring where you can ask questions about your assignment.
You can visit https://assignmenthelp.edumanta.com/, and everything will be a cakewalk for you.

Econometrics Assignment help

SECTION A

Answer all questions from this section

1. Consider the following regression model

$$Y_i = \beta X_i + u_i, \quad i = 1, \ldots, n.$$

The error term has a zero mean, variance equal to $\sigma^2 / X_i^2$, and $E(u_i u_j) = 0$ for $i \neq j$. You are given a sample of observations $\{(Y_i, X_i)\}_{i=1}^{n}$. You may treat $X_i$ as being non-stochastic. Clearly annotating your answers:

(a) (5 marks) Derive the OLS estimator of $\beta$. In the presence of heteroskedasticity, the OLS estimator remains unbiased (you are not asked to show this). Derive the variance of the OLS estimator of $\beta$.

(b) (3 marks) Discuss how you can obtain the Best Linear Unbiased Estimator (BLUE) of $\beta$ given the heteroskedasticity.
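A small simulation can help check a derivation for this question. The sketch below is not a model answer: it assumes the error variance is $\sigma^2 / X_i^2$ (as the problem statement reads), in which case multiplying the model through by $X_i$ gives a homoskedastic regression, so weighted least squares is one route to the BLUE. The function names and the values of `beta`, `sigma`, and the range of `x` are hypothetical.

```python
import random

def ols_through_origin(x, y):
    """OLS slope for Y = beta*X + u without intercept: sum(X*Y) / sum(X^2)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)

def wls_through_origin(x, y):
    """Weighted least squares, assuming Var(u_i) = sigma^2 / X_i^2.

    Multiplying the model by X_i gives X_i*Y_i = beta*X_i^2 + X_i*u_i, whose
    error has constant variance sigma^2. OLS on the transformed data is
    sum(X^3 * Y) / sum(X^4).
    """
    return sum(xi ** 3 * yi for xi, yi in zip(x, y)) / sum(xi ** 4 for xi in x)

def ols_variance(x, sigma2):
    """Var(b_OLS) = sum(X_i^2 * sigma^2 / X_i^2) / (sum X_i^2)^2
                  = n * sigma^2 / (sum X_i^2)^2 under the assumed variance."""
    return len(x) * sigma2 / sum(xi ** 2 for xi in x) ** 2

def wls_variance(x, sigma2):
    """Var(b_WLS) = sigma^2 / sum(X_i^4) under the assumed variance."""
    return sigma2 / sum(xi ** 4 for xi in x)

# Hypothetical simulation settings.
rng = random.Random(0)
beta, sigma = 2.0, 1.0
x = [rng.uniform(1.0, 5.0) for _ in range(200)]
y = [beta * xi + rng.gauss(0.0, sigma / xi) for xi in x]

b_ols = ols_through_origin(x, y)
b_wls = wls_through_origin(x, y)
print(b_ols, b_wls, ols_variance(x, sigma ** 2), wls_variance(x, sigma ** 2))
```

Both estimators should land close to the true `beta`; comparing the two variance expressions shows the efficiency gain from weighting.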

2. Consider the simple linear regression model

$$Y_i = \alpha + \beta X_i + u_i, \quad i = 1, \ldots, n$$

in the presence of correlation between the error and regressor. The regressor exhibits variability in the sample, i.e., $\sum_{i=1}^{n} (X_i - \bar{X})^2 \neq 0$. Under assumptions of homoskedasticity and the absence of autocorrelation, the IV estimator of $\beta$ that uses the instrument $Z$ has the following (asymptotic) variance (no need to prove this statement)

$$\mathrm{Var}\left(\hat{\beta}_{IV}\right) = \frac{\sigma_u^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \cdot \frac{1}{r_{XZ}^2},$$

where $r_{XZ} \neq 0$ is the sample correlation between $X$ and $Z$ and $\sigma_u^2$ is the variance of the disturbance term $u$.

(a) (1 mark) Give the formula for $\hat{\beta}_{IV}$ (you are not asked to derive it).

(b) (4 marks) Provide at least three factors that will help obtain more precise IV parameter estimates for $\beta$. In your answer explain why the precision of parameter estimates is important.

(c) (3 marks) Discuss the following statement: “If $X$ is not correlated with $u$, the best choice of instrument is using the regressor itself.”
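To experiment with the quantities in this question, the sketch below computes the simple IV slope (the ratio of the sample covariance of $Z$ and $Y$ to that of $Z$ and $X$), the sample correlation $r_{XZ}$, and the asymptotic variance expression quoted in the problem. It is an illustration under assumptions, not a model answer; the function names and the toy data are hypothetical.

```python
import math

def _centered(v):
    """Return the values of v with their sample mean subtracted."""
    m = sum(v) / len(v)
    return [vi - m for vi in v]

def iv_slope(y, x, z):
    """Simple IV slope: sum((Z-Zbar)(Y-Ybar)) / sum((Z-Zbar)(X-Xbar))."""
    yc, xc, zc = _centered(y), _centered(x), _centered(z)
    return sum(a * b for a, b in zip(zc, yc)) / sum(a * b for a, b in zip(zc, xc))

def sample_corr(x, z):
    """Sample correlation between x and z."""
    xc, zc = _centered(x), _centered(z)
    num = sum(a * b for a, b in zip(xc, zc))
    return num / math.sqrt(sum(a * a for a in xc) * sum(b * b for b in zc))

def iv_variance(sigma2_u, x, z):
    """Variance expression from the problem: sigma_u^2 / sum((X-Xbar)^2) / r_XZ^2."""
    xc = _centered(x)
    return sigma2_u / sum(a * a for a in xc) / sample_corr(x, z) ** 2

# Toy data: y is exactly linear in x, so any valid instrument recovers slope 2.
x = [1.0, 2.0, 3.0, 4.0]
z = [2.0, 1.0, 4.0, 3.0]
y = [1.0 + 2.0 * xi for xi in x]

print(iv_slope(y, x, z), sample_corr(x, z))
print(iv_variance(1.0, x, z), iv_variance(1.0, x, x))
```

Setting `z = x` makes $r_{XZ} = 1$ and reduces the IV slope to the OLS slope, which is a useful starting point for thinking about part (c).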

Econometrics Assignment help

  1. You are interested in the extent to which removing the option of paid childcare has affected the share of time mothers, compared to fathers, spend with their children. The economic and health response to the pandemic has caused job losses, increased telecommuting, closed day care and schools, and placed state restrictions on in-home paid childcare, all of which may have affected the bargaining dynamic between parents in deciding how much time to spend caring for their children. The American Time Use Survey provides nationally representative estimates of how and with whom Americans spend their time, including hours spent on paid work and childcare. It is an annual cross-sectional survey, asked during weeks when school is typically in session, that is linked to the Current Population Survey (demographic questions were asked several months before the Time Use questions); thus you can observe demographic information and family relationships.

You have data from 2015-2019 and will have data in 2020 next year. You are crafting your econometric specification. Prior to 2020, all states allowed paid in-home care and had schools open. As of the time of the survey in 2020, all states had closed schools (shutdown of physical buildings and in-person instruction) but there was state variation in allowing paid in-home care for children less than 6 years old. That is, some, but not all, states deemed in-home childcare essential when stay-at-home orders went into effect in 2020, such that the availability of paid in-home care depended on the state you lived in. You want to learn the effect of disallowing paid childcare and closing schools on spousal allocation of childcare hours as measured by $y$, the ratio of hours the mother spent on childcare to hours the father spent on childcare. For example, $y = 1$ when mothers and fathers spent the same amount of time on childcare and $y = 1.5$ when mothers spent 50% more than fathers.

You restrict your sample to opposite sex, married couples with children where both parents worked in the prior year and both parents report positive hours of childcare.¹ You can observe the following variables for each couple:

¹ For simplicity, we exclude all couples with first responders, like health care workers, for whom special rules applied in the state stay-at-home orders.

$y_{it}$ = hours mother spent on childcare divided by hours father spent on childcare in couple $i$ in year $t$

$x_{1it}$ = 1 if wife in couple $i$ in year $t$ is currently working for pay, 0 otherwise

$x_{2it}$ = wife’s paid work hours last week reported in year $t$ for couple $i$

$x_{3it}$ = 1 if husband in couple $i$ in year $t$ is currently working for pay, 0 otherwise

$x_{4it}$ = husband’s paid work hours last week reported in year $t$ for couple $i$

$x_{5it}$ = 1 if couple $i$ has a child under 6 years old in year $t$, 0 otherwise

$x_{6it}$ = 1 if couple $i$’s youngest child is 6-17 years old in year $t$, 0 otherwise

$x_{7i}$ = 1 if wife in couple $i$ has years of education = husband’s, 0 otherwise

$x_{8i}$ = 1 if wife in couple $i$ has years of education > husband’s, 0 otherwise

$x_{9i}$ = age of husband − age of wife

$x_{10i}$ = 1 if wife’s occupation is teaching, 0 otherwise

$x_{11i}$ = 1 if husband’s occupation is teaching, 0 otherwise

$\text{pandemic}_t$ = 1 if year is after pandemic hit (2020 or later), 0 otherwise

$\text{notavail}_{it}$ = 1 if paid in-home childcare was NOT available for couple $i$ in year $t$, 0 otherwise

Let’s simply refer to the effect of removing the option of paid childcare on the ratio of mother’s to father’s hours of childcare as the treatment effect of interest: TE.

  (a) Since some but not all states allow paid in-home childcare in 2020 at the time of the survey, you could estimate TE with $\hat{\alpha}_3$ by running ordinary least squares using only 2020 data with the following specification:

$$\hat{y} = \hat{\alpha}_0 + \hat{\alpha}_1 x_{5i} + \hat{\alpha}_2 x_{6i} + \hat{\alpha}_3 (\text{notavail}_i \times x_{5i}) + \hat{\alpha}_4 x_{7i} + \hat{\alpha}_5 x_{8i} + \hat{\alpha}_6 x_{9i}$$

    i. 4 points Interpret $\hat{\alpha}_3$.
    ii. 4 points Interpret $\hat{\alpha}_5$.
    iii. 5 points Do you think all of the necessary assumptions hold for $\hat{\alpha}_3$ to be an unbiased estimator for TE? Explain your reasoning.

  (b) 4 points Instead, you assume that

$$E[y \mid X] = \beta_0 + \beta_1 x_{5it} + \beta_2 x_{6it} + \beta_3 (\text{notavail}_{it} \times x_{5it}) + \beta_4 (\text{pandemic}_t \times x_{6it})$$

Interpret $\beta_3$.

  (c) 6 points If you ran an OLS regression based on the specification above, what type of estimator is $\hat{\beta}_3$? And what assumptions are necessary for $\hat{\beta}_3$ to be unbiased?
  (d) 4 points Compare the advantages and disadvantages of the two potential ordinary least squares estimators for TE: $\hat{\beta}_3$ from the specification above and $\hat{\gamma}_3$ from assuming

$$E[y \mid X] = \gamma_0 + \gamma_1 x_{5it} + \gamma_2 x_{6it} + \gamma_3 (\text{notavail}_{it} \times x_{5it}) + \gamma_4 (\text{pandemic}_t \times x_{6it}) + \gamma_5 x_{1it} + \gamma_6 x_{2it} + \gamma_7 x_{3it} + \gamma_8 x_{4it}$$

  (e) 4 points The treatment effect of removing the option of paid childcare may be different for families where one of the parents is themselves a teacher. How would you suggest modifying your specification and why?
  2. 20 points To complete the Master of Science in Economics at a University, students must complete the core Economics course and one of the four advanced econometrics courses, among other requirements. Students receiving a B- or better on their final course grade receive credit for the course, while students with a C+ or below do not receive credit. Instructors of the core Econometrics course calculate a final number score for the course with cutoff values for assigning the letter grades. Failing to receive credit for a course that you attended and paid tuition for can be discouraging, possibly affecting your enthusiasm for the subject matter. However, successfully retaking a course you struggled in may boost your confidence and enthusiasm. Suppose that you were interested in estimating the Treatment Effect of receiving a passing grade on the number of advanced econometrics courses taken.

Let $x_0$ be the cutoff value for receiving a B- in a core Econometrics course. You observe for each student:

$y_i$ = number of advanced econometrics courses student $i$ enrolled in

$x_{1i}$ = core Econometrics course score for student $i$

$x_{2i}$ = fraction of students failed by student $i$’s core Econometrics instructor

$x_{3i}$ = student $i$’s mean GPA (excluding econometrics courses)

$x_{4i}$ = 1 if student $i$ received an A in their Statistics class, 0 otherwise

$x_{5i}$ = 1 if student $i$ took Mathematical Methods for Economists, 0 otherwise

  (a) 4 points To estimate the average treatment effect of receiving a passing grade on number of advanced econometrics courses taken, would you use a sharp or fuzzy Regression Discontinuity design? Explain.
  (b) 4 points Write down your model specification to estimate the average treatment effect.
  (c) 6 points Describe two graphical tests that would be important to perform and explain why.
  (d) 6 points How would you design your falsification test? Explain why you chose it.

  3. 24 points Suppose that your goal was to estimate the effect of education on weekly hours worked for individuals approaching retirement. Assume that education is exogenous. The data set includes men over the age of 50 but less than 60 years old.
  (a) Suppose the data set includes a categorical variable that equals 1 if the man works 0 hours (that is, does not work), 2 if he works more than 0 but less than 35 hours per week, and 3 if he works more than 35 hours per week. There are no missing values.
    i. 4 points Which is the appropriate econometric model?
    ii. 4 points Explain why you chose this model.
    iii. 4 points Write the log likelihood function implied by your model choice.
  (b) Instead, suppose that you had the actual number of hours worked per week (e.g. 0, 1, 2, ...) and you observe that a substantial share of men don’t work at all (have 0 hours of work).
    i. 4 points Which is the appropriate econometric model?
    ii. 4 points Explain why you chose this model.
    iii. 4 points Write the log likelihood function implied by your model choice.
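For the three-category hours variable, one candidate model is an ordered probit; other choices (for example an ordered logit) are also defensible, so treat the following only as a sketch of what "writing the log likelihood" looks like in code under that assumption. The single regressor, the coefficient `beta`, the cutoffs `cut1 < cut2`, and the toy data are all hypothetical.

```python
import math
from statistics import NormalDist

_PHI = NormalDist().cdf  # standard normal CDF

def ordered_probit_probs(xb, cut1, cut2):
    """Probabilities of categories 1, 2, 3 given index xb and cutoffs cut1 < cut2."""
    p1 = _PHI(cut1 - xb)
    p2 = _PHI(cut2 - xb) - _PHI(cut1 - xb)
    p3 = 1.0 - _PHI(cut2 - xb)
    return p1, p2, p3

def ordered_probit_loglik(beta, cut1, cut2, educ, y):
    """Sum over individuals of the log probability of the observed category."""
    total = 0.0
    for e, yi in zip(educ, y):
        probs = ordered_probit_probs(beta * e, cut1, cut2)
        total += math.log(probs[yi - 1])
    return total

# Toy data: years of education and hours category (1 = none, 2 = part time, 3 = full time).
educ = [8, 12, 12, 16, 18]
hours_cat = [1, 2, 3, 3, 3]
ll = ordered_probit_loglik(0.1, 0.5, 1.5, educ, hours_cat)
print(ll)
```

In practice the parameters would be chosen to maximize this function; the values passed here are arbitrary and only show that the likelihood evaluates.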

Advanced Studies in Econometrics

ECON 8820: Advanced Studies in Econometrics

You may work in groups on the assignment, but you must write up your own answers in your own words.
See the course profile for information on the due date of this assignment and penalties for late submission.

  1. Suppose that $U$ is uniformly distributed on $[0, 1]$ and $X$ is continuously distributed as some distribution $F_X(\cdot)$. Show that $Y := F_X^{-1}(U)$ is also distributed as $F_X(\cdot)$. (Or, $X \sim Y$, i.e., $X$ and $Y$ are distributionally equivalent.)

(Answer) You may start from the definition of the CDF. Here is an alternative method that uses the change of variables. Let $g(U) := F_X^{-1}(U)$. Then, $g^{-1}(Y) = F_X(Y)$ and $\frac{\partial g^{-1}(y)}{\partial y} = f_X(y)$. So,

$$f_Y(y) = 1(F_X(y) \in [0, 1]) \cdot f_X(y)$$

on $y \in \mathcal{Y}$, which is the support of $X$.
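The result can also be checked numerically. The sketch below draws $U$ from the uniform distribution and applies $F_X^{-1}$ for an exponential distribution, chosen only because its inverse CDF has a closed form; the rate, seed, and sample size are arbitrary assumptions.

```python
import math
import random

def exp_cdf(x, rate):
    """CDF of the exponential distribution with the given rate."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0

def exp_inv_cdf(u, rate):
    """Inverse CDF (quantile function) of the exponential distribution."""
    return -math.log(1.0 - u) / rate

def sample_by_inversion(inv_cdf, n, rng):
    """Draw n values of Y = F^{-1}(U) with U uniform on [0, 1)."""
    return [inv_cdf(rng.random()) for _ in range(n)]

rng = random.Random(42)
rate = 2.0
draws = sample_by_inversion(lambda u: exp_inv_cdf(u, rate), 20000, rng)
sample_mean = sum(draws) / len(draws)
print(sample_mean)  # should be close to the theoretical mean 1 / rate
```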

  2. The dataset D in the MATLAB data file dataHW2.mat contains information on $T = 50$ markets, each with $J = 5$ differentiated products. The first column indicates each market $t = 1, \ldots, T$, the second column indicates each product $j = 1, \ldots, J$. The third column shows market share $S_{jt}$ for all $(j, t)$. The fourth to seventh columns are observed characteristics $x_j = [1, x_{j1}, x_{j2}, x_{j3}]$, which vary across products but not across markets. The eighth is the price $p_{jt}$. The other columns, say $z_{jt}$, collect all the variables, including $x_j$, that are correlated with the price but not correlated with unobserved characteristics $\xi_{jt}$.

Consumer $i$’s utility of purchasing product $j$ in market $t$ is represented as

$$u_{ijt} = x_j' \beta_{it} + \alpha_{it} p_{jt} + \xi_{jt} + \varepsilon_{ijt}$$

and we assume that consumer $i$ chooses one (and only one) product (including the outside option that gives utility $u_{i0t} = \varepsilon_{i0t}$) that gives the largest utility in market $t$. Assume that $\{\varepsilon_{ijt}\}$ are i.i.d. with the type I extreme value distribution so that the probability of consumer $i$ choosing $j$ is

$$P_{ijt} = \frac{\exp(x_j' \beta_{it} + \alpha_{it} p_{jt} + \xi_{jt})}{1 + \sum_{\tilde{j}=1}^{J} \exp(x_{\tilde{j}}' \beta_{it} + \alpha_{it} p_{\tilde{j}t} + \xi_{\tilde{j}t})}$$

where $p_{jt}$ and $\xi_{jt}$ are potentially correlated, but the dataset does not have $\xi_{jt}$. Assume that

$$\begin{pmatrix} \beta_{it} \\ \alpha_{it} \end{pmatrix} \overset{iid}{\sim} N(\mu, \Sigma)$$

across all $i$ and $t$ where $\mu$ is a 5 dimensional vector and $\Sigma$ is a $5 \times 5$ symmetric positive definite matrix.
Using the BLP method, estimate the structural parameter $\theta := \{\mu, \Sigma\}$.
For this exercise:
• Report optimal GMM estimates for every element in μ and Σ.
• You do not need to report standard errors, but discuss in detail how you would compute them.
• Use a tight tolerance, e.g., 1e-13, for the inner loop (fixed point), and use a sufficiently large number of iterations for the outer loop (evaluation of the GMM objective function).
• Employ 1,000 draws for the simulation of the random coefficients $(\beta_{it}', \alpha_{it})$, using the same randomness for every iteration of the outer loop.
• Comment your code in detail.
• Submit your code.
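The inner loop mentioned in the bullets is the fixed-point iteration that recovers mean utilities from observed shares. The sketch below shows only that step for a single market, in Python rather than MATLAB, with made-up shares and with the individual-specific utility deviations drawn directly instead of being built from $x_j$, $p_{jt}$, and $\Sigma$. All names and numbers are hypothetical; it is not a full BLP estimator.

```python
import math
import random

def predicted_shares(delta, mu_draws):
    """Average logit choice probabilities over simulated consumers.

    delta: mean utility of each product.
    mu_draws: for each simulated consumer, a list of individual-specific
    utility deviations (one per product).
    """
    n_products = len(delta)
    shares = [0.0] * n_products
    for mu in mu_draws:
        expu = [math.exp(delta[j] + mu[j]) for j in range(n_products)]
        denom = 1.0 + sum(expu)  # the 1 is the outside option
        for j in range(n_products):
            shares[j] += expu[j] / denom
    return [s / len(mu_draws) for s in shares]

def contraction(obs_shares, mu_draws, tol=1e-13, max_iter=2000):
    """Iterate delta <- delta + ln(S) - ln(s(delta)) until the step is below tol."""
    delta = [0.0] * len(obs_shares)
    for _ in range(max_iter):
        pred = predicted_shares(delta, mu_draws)
        step = [math.log(s) - math.log(p) for s, p in zip(obs_shares, pred)]
        delta = [d + st for d, st in zip(delta, step)]
        if max(abs(st) for st in step) < tol:
            break
    return delta

# Hypothetical single market: 5 products, 1,000 simulated consumers.
rng = random.Random(0)
n_products = 5
# The same draws are reused on every call, as the assignment requires.
mu_draws = [[rng.gauss(0.0, 0.5) for _ in range(n_products)] for _ in range(1000)]
obs_shares = [0.10, 0.15, 0.05, 0.12, 0.08]  # outside share is 0.50

delta_hat = contraction(obs_shares, mu_draws)
print(delta_hat)
```

In a full implementation this routine would be called inside the GMM objective for each candidate $\Sigma$, with the deviations recomputed from the fixed simulation draws.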

Advanced Studies in Econometrics

You may work in groups on the assignment, but you must write up your own answers in your own words.
Submit your results in one PDF and also submit your code so that I can run it on my laptop.
This assignment is due on 20 May. See the course profile for information on penalties for late submission.

  1. The Bernstein polynomial density (BPD) is given as

$$f(x \mid \theta) := \sum_{j=1}^{k} \theta_j \, \mathrm{beta}(x \mid j, k - j + 1) \qquad (1)$$

where $\mathrm{beta}(x \mid a, b) \propto x^{a-1} (1 - x)^{b-1} 1(x \in [0, 1])$ denotes the PDF of $\mathrm{Be}(a, b)$ with $a > 0$ and $b > 0$, and $\theta \in \Delta_{k-1}$, the $k - 1$ dimensional unit simplex, i.e., $\theta \in \mathbb{R}^k_+$ and $\sum_{j=1}^{k} \theta_j = 1$. By construction, we see that the specification (1) is a weighted average of the beta densities, $\{\mathrm{beta}(x \mid j, k - j + 1)\}_{j=1}^{k}$. In order to understand that (1) is a histogram smoothing, make a diagram of $\{\mathrm{beta}(x \mid j, k - j + 1)\}_{j=1}^{k}$ on the unit support in a similar fashion to Figure 8 in Kim (2015, QE) for each $k \in \{5, 7, 10\}$.
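As a starting point for the diagram, the beta components and the mixture (1) can be evaluated on a grid. The sketch below uses only the Python standard library and leaves the plotting to whatever tool you prefer; the function names and the 100-point grid are assumptions for illustration.

```python
import math

def beta_pdf(x, a, b):
    """Density of Be(a, b) at x, zero outside [0, 1]."""
    if not 0.0 <= x <= 1.0:
        return 0.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm) * x ** (a - 1) * (1.0 - x) ** (b - 1)

def bernstein_density(x, theta):
    """Equation (1): weighted average of beta(x | j, k - j + 1), j = 1..k."""
    k = len(theta)
    return sum(t * beta_pdf(x, j, k - j + 1) for j, t in enumerate(theta, start=1))

def component_curves(k, grid):
    """One list of density values per component j = 1..k, ready for plotting."""
    return [[beta_pdf(x, j, k - j + 1) for x in grid] for j in range(1, k + 1)]

grid = [(i + 0.5) / 100 for i in range(100)]  # midpoints of 100 equal cells
curves = {k: component_curves(k, grid) for k in (5, 7, 10)}

# With equal weights the mixture collapses to the uniform density on [0, 1].
uniform_check = bernstein_density(0.3, [0.2] * 5)
print(uniform_check)
```

The equal-weights check is a quick way to confirm the components are normalized correctly before plotting them.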
  2. Replicate the figure on page 31 of the lecture slides on “Bayesian Methods – Computation.” In particular,

• Refer to the first two bullet points on page 29 regarding the true data generating process. Use the CDF inversion method (pages 21 and 22) to simulate the data from the truncated normal distribution.
• In addition, see page 30 for computational details, e.g., tuning parameters and number of iterations for the Metropolis-Hastings algorithm.
• You need to evaluate the density function (1) with different $\theta$ at each data point to implement the Metropolis-Hastings algorithm. Observe however that you need to evaluate the beta components only once, which may reduce computation time.
• Make a 2 by 2 figure just like the one in the lecture slides with a relevant title for each diagram and labels for the vertical and horizontal axes, when they are informative.
• For the 95% credible band on panel (d), use the 2.5 and 97.5 percentiles at each point in the unit interval. You may use 100 equidistant grid points for this exercise. Make sure you use a different line style for each object in the diagram so that they are visually distinguished.
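The CDF inversion step for a truncated normal can be sketched as follows. The true data generating process is defined in the lecture slides and is not reproduced here, so the mean, standard deviation, truncation bounds, sample size, and seed below are placeholders, and the function name is hypothetical.

```python
import random
from statistics import NormalDist

def sample_truncated_normal(mu, sigma, lower, upper, n, rng):
    """Draw n values from N(mu, sigma^2) truncated to [lower, upper].

    Uses CDF inversion: map a uniform draw into the interval
    [F(lower), F(upper)] and apply the normal quantile function.
    """
    dist = NormalDist(mu, sigma)
    p_lo, p_hi = dist.cdf(lower), dist.cdf(upper)
    return [dist.inv_cdf(p_lo + rng.random() * (p_hi - p_lo)) for _ in range(n)]

# Placeholder parameters; replace them with the values from the slides.
rng = random.Random(123)
data = sample_truncated_normal(mu=0.5, sigma=0.2, lower=0.0, upper=1.0, n=5000, rng=rng)
data_mean = sum(data) / len(data)
print(min(data), max(data), data_mean)
```

Fixing the seed keeps the simulated data identical across runs, which makes the replicated figure reproducible.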

  3. Similarly, replicate the figure on page 8 of the lecture slides on “Bayesian Methods – Computation 2.”
    • If your code for Problem 2 works well, you only need to modify your code slightly; see page 7 of the lecture slides.