Data Science Assignment Tips

How to Successfully Write a Data Science Assignment

Where will I come across Data Science assignments?
If you are a college student pursuing a degree in Computer Science, Economics, Statistics, an MBA, or any related field, you will come across the subject of Data Science. In simple terms, Data Science involves using computational software to draw insights from a dataset. Understanding the data well enough to make decisions from it is one of the prime characteristics expected of a Data Scientist.

Personal and Professional Skill Sets
To write and solve a Data Science assignment successfully, you need to develop some personal and professional skills. A curious mindset helps: ask yourself whether you are excited to find out what the dataset has to say. You should also be able to ask questions of the data; this immediately gives you a context for your analysis and a frame for your report. Pay attention to minute details and make sure your understanding is clear. Along with knowledge of statistics, you also need the ability to communicate your results.

Getting Started with the Assignment
When you get an assignment in Data Science, the first thing you should do is read the questions thoroughly. Some assignments ask you to be creative; others ask you to be very specific. Understanding what has been asked for is of utmost importance, because this overview sets your path to solving the assignment. Right after this, recollect the concepts and methodologies that you might need to solve the questions. Drill down to specific concepts, and then finally decide which tool or software you would like to use.

Before starting to solve the assignment, explore the dataset you have been given. We prefer using R, which has many packages that give an exploratory overview of a dataset. Check for invalid data, missing data, and any outliers or errors present in the dataset. Correcting them before you start the analysis is very important. Understand the type of each variable: whether it is categorical (nominal or ordinal) or measured on an interval or ratio scale. Understanding the kind of variables will help you in both analysis and visualization. You might also want to create some basic visualizations before getting into the analysis.
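
The checks described above can be sketched in a few lines. The example below is a minimal illustration in plain Python rather than R, and it makes several assumptions: the column of readings is made up, missing entries are represented as None or NaN, and outliers are flagged with the common 1.5 × IQR rule, which is only one of several reasonable conventions. The function name `summarize_column` is hypothetical.

```python
import math
import statistics


def summarize_column(values):
    """Report missing entries and IQR-rule outliers for one numeric column.

    `values` is a list in which missing readings are None or NaN.
    """
    present = [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]
    n_missing = len(values) - len(present)

    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    # Anything outside [low, high] is flagged, not deleted.
    outliers = [v for v in present if v < low or v > high]
    return {
        "n": len(values),
        "missing": n_missing,
        "mean": statistics.mean(present),
        "outliers": outliers,
    }


# Made-up temperature readings, purely for illustration.
readings = [14.2, 15.1, None, 13.8, 14.9, 15.4, 98.0, 14.6, float("nan"), 15.0]
report = summarize_column(readings)
print(report)
```

Flagging rather than silently dropping suspicious values keeps the decision about how to correct them (remove, cap, or replace) explicit in your report.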

Analyzing the dataset
Stick to the basic concepts and start solving the questions. One common source of confusion for students is which variable is the dependent variable and which are the independent variables. This becomes especially important when running regressions. Be thorough with the standard statistical tests and how to perform them in your software. Pay extra attention to the units of measurement of the variables. Ordinal and coded variables are typically harder to interpret, so take particular care with them.

Communicating the results
Communicating your findings is of great importance. This means you have to pay attention to how you interpret the results of your data analysis. Keep it short and to the point. Try to connect each interpretation with a write-up on how it relates to the variables under consideration. This is where you might initially need some help from experts. Report writing contributes significantly to your marks. With experience comes the ability to connect the dots between statistical results and what they mean for the real-world variables. Make sure the report is well-formatted.

Challenges we may help you with
Students are at times unable to relate the theory and concepts to the questions asked about a dataset. When exactly to use a logit model, or which variables to use when you want to run a Chi-Square test on a dataset, is what experts can help you with. As you solve more questions, you will gain confidence. Report writing is another aspect where you might initially take help from experts. Once you have corrected the errors you are making and seen examples of good reports, you will get the hang of what is expected of a data science assignment.

Your Future
Whether you solve them on your own or take help, Data Science assignments given at the college level are of great importance for a successful career ahead. So make sure to get them solved even if you cannot do it alone. Directly or indirectly, it will be a game-changer for your future.
If you are still finding it difficult to finish your assignments on time, we at Edumanta have a team of experts and educators who will guide you through them. We assure you the best quality assignments and, even better, fast delivery. Besides guiding you through assignments, we also offer a 24/7 online tutoring service. You will pass with flying colours, gain knowledge, and eventually pave your way to a successful career. Visit: https://assignmenthelp.edumanta.com/

Programming for Data Analysis and Visualization

CA 2

Submit a single R file with the solutions to the following questions, named: Firstname.Surname.R
Q1.
The dataset “Power Plant” records variables which the company’s engineers believe are
important factors in the operation of the plant. The company is interested in maximising net
hourly electrical energy output (recorded as PE in the dataset). For each hour of energy
output recorded, the variable “Temperature” (AT), in the range 1.81°C to 37.11°C, is also
recorded.
Steps:

  1. Run a linear regression model for PE over AT. Record the value for the slope
    and take it as the actual population parameter $\beta$.
  2. For 1000 iterations:
    a. Take 50 random samples from the dataset. Run the regression model and,
    using the expression for the CI for $\beta$ that we found in the lecture, find a 95%
    CI for $\beta$.
    b. Find what percentage of the CIs generated in step 2 would contain the
    $\beta$ that you got in step 1.
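
The steps above amount to a coverage simulation, and the logic can be sketched as follows. This is only an illustration in plain Python (the brief itself asks for an R file), under several stated assumptions: the Power Plant dataset is not available here, so a synthetic stand-in with made-up intercept, slope and noise is generated; the CI is taken to be the usual slope estimate ± t × standard error; and the t quantile for 48 degrees of freedom is hard-coded instead of being looked up. The names `fit_slope` and `ci_coverage` are hypothetical.

```python
import math
import random

# t quantile t_{0.975} for n - 2 = 48 degrees of freedom (sample size 50 only).
T_CRIT_48 = 2.0106


def fit_slope(xs, ys):
    """Return (slope, standard error of the slope) for a simple OLS fit."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares, then the standard error of the slope.
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return slope, std_err


def ci_coverage(at, pe, n_iter=1000, sample_size=50, seed=1):
    """Percentage of 95% CIs from random subsamples that contain the full-data slope."""
    rng = random.Random(seed)
    true_slope, _ = fit_slope(at, pe)  # step 1: treat as the population parameter
    hits = 0
    for _ in range(n_iter):  # step 2: repeat the sampling experiment
        idx = rng.sample(range(len(at)), sample_size)
        b, se = fit_slope([at[i] for i in idx], [pe[i] for i in idx])
        if b - T_CRIT_48 * se <= true_slope <= b + T_CRIT_48 * se:
            hits += 1
    return true_slope, 100.0 * hits / n_iter


# Synthetic stand-in for the dataset: all numbers below are made up.
data_rng = random.Random(0)
at = [data_rng.uniform(1.81, 37.11) for _ in range(5000)]
pe = [497.0 - 2.17 * x + data_rng.gauss(0.0, 5.0) for x in at]

true_slope, coverage = ci_coverage(at, pe)
print(f"full-data slope: {true_slope:.3f}, CI coverage: {coverage:.1f}%")
```

If the intervals are constructed correctly, the reported percentage should land close to the nominal 95%, which is the point of the exercise.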

Q2.
If $X$ and $Y$ are independent random samples from the Uniform distribution $U(0,1)$, by
generating random samples find $P(|X - Y| < \ldots)$.
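
A question of this kind is typically answered by Monte Carlo simulation. The sketch below assumes that Q2 asks for a probability of the form P(|X − Y| < c) with X and Y independent U(0,1) draws. The bound is not recoverable from the text above, so c = 0.5 is a purely hypothetical value chosen for illustration, and the function name `estimate_prob_abs_diff` is likewise made up. For 0 ≤ c ≤ 1 the exact value is 1 − (1 − c)², which gives a check on the simulation.

```python
import random


def estimate_prob_abs_diff(bound, n_draws=200_000, seed=42):
    """Monte Carlo estimate of P(|X - Y| < bound) for independent U(0,1) draws."""
    rng = random.Random(seed)
    hits = sum(abs(rng.random() - rng.random()) < bound for _ in range(n_draws))
    return hits / n_draws


BOUND = 0.5  # hypothetical bound, not taken from the assignment
estimate = estimate_prob_abs_diff(BOUND)

# Closed form for 0 <= bound <= 1: the complement is two triangles of
# total area (1 - bound)^2 in the unit square.
exact = 1 - (1 - BOUND) ** 2

print(f"simulated: {estimate:.4f}, exact: {exact:.4f}")
```

Comparing the simulated value with the closed form is a quick sanity check before reporting the estimate.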

Q3.
a. If $X_1, \dots, X_n$ are independent random samples from the Beta distribution
$\mathrm{Beta}(1, 1 + \theta)$, by generating random samples for 3 different values of $\theta$ find

$$\frac{\sum_{\ldots} \ln(1 - X_i)}{\sum_{\ldots} \ln(1 - X_i)}$$

and show that the result is independent of $\theta$.
b. Using the distribution in Q2, show that the result is even independent of the
distribution of $X_i$.

Programming for Data Analysis

Assessment Requirements / Tasks (include all guidance notes)
This assignment will use employment data of Wales from the StatsWales data
source. This dataset provides workplace employment estimates, or estimates of
total jobs, for Wales and its NUTS2 areas, along with comparable UK data
disaggregated by industry section.
For this assignment students will undertake a data analysis and machine learning
approach to reveal the workplace employment landscape of Wales.

  1. Data processing
    1.1. Download the dataset for the period 2009 – 2018 and create a dataframe that
    concatenates the Wales (total) employment values only.
    1.2. Check for any null values or outliers. If found, replace them with the mean value.
    1.3. Change the names of the industries as below.
    The final dataframe should look like the following:

    Industry              2009  2010  2011  2012  2013  2014  2015  2016  2017  2018
    Agriculture
    Production
    Construction
    Retail
    ICT
    Finance
    Real_Estate
    Professional_Service
    Public_Adminstration
    Other_Service

  2. Data analysis
    For each question provide a graph/chart along with your own interpretation (~ 50
    words).
    2.1. Which industries employed the highest and lowest number of workers over the period?
    2.2. Which industries have the highest and lowest overall growth over the period?
    2.3. Which years are the best and worst performing years in terms of total
    employment (highest and lowest employment)?
  3. Visual analysis
    Create a dynamic scatter/bubble plot showing the change in workforce numbers over
    the period using Plotly Express.
  4. PCA/Correlation
    4.1. Undertake a PCA (PC=2; columns should be like PC1, PC2, Industry) and
    produce a scatter plot. Write your interpretation of the plot in relation
    to the analysis of sections 2 & 3 (for example, which industries are correlated
    over the years as well as in the PCA).
    4.2. Make a year-wise correlation for each industry. Are the aforementioned
    industries also correlated over the years? Explain your answer.

  5. Clustering (k means & hierarchical)
    5.1. Using the employment data from the best and worst performing year columns (2.3),
    undertake a K means clustering analysis (K=2 & 3) and identify which industries
    cluster together. Write your own interpretation (~100 words).
    5.2. Using the same dataset (best & worst performing years), create a hierarchical
    clustering. Compare its clusters with the k means clusters.
  6. Discussion
    Provide a brief discussion (~ 300 words) on the employment landscape of Wales based
    on the employment data analysis results.
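
To make the mechanics of task 5.1 concrete, here is a minimal sketch of K means (Lloyd's algorithm) in plain Python. It is an illustration only: the employment figures are invented, the two coordinates stand in for a "best year" and a "worst year" column, only a subset of the industry names is used, and the first k points seed the centroids so that the run is deterministic. In practice a library implementation with better initialization would normally be used; the function name `kmeans` here is hypothetical.

```python
import math


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = list(points[:k])
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# Invented employment figures (thousands): (best year, worst year).
employment = {
    "Agriculture": (40.0, 35.0),
    "Production": (170.0, 150.0),
    "Construction": (95.0, 85.0),
    "Retail": (350.0, 330.0),
    "ICT": (30.0, 25.0),
    "Finance": (33.0, 30.0),
    "Public_Adminstration": (430.0, 410.0),
}

labels, centroids = kmeans(list(employment.values()), k=2)
by_industry = dict(zip(employment, labels))
print(by_industry)
```

Calling `kmeans(list(employment.values()), k=3)` gives the three-cluster variant. Because the result depends on the starting centroids, it is worth comparing a few initializations before interpreting the clusters.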

Assessment Criteria

    Task  Criterion         Marks
    1.1   Data preparation  05
    1.2   Data preparation  05
    1.3   Data preparation  05
    2.1   Data analysis     05
    2.2   Data analysis     05
    2.3   Data analysis     05
    3     Visual analysis   20
    4.1   PCA               10
    4.2   Correlation       10
    5.1   Clustering        10
    5.2   Clustering        10
    6     Discussion        10

Submission Details
Please see Moodle for confirmation of the Assessment submission date.
The presentation will be at 4:00 PM on the submission date.
Any assessments submitted after the deadline will not be marked and will be
recorded as a Non-Attempt.
The assessment must be submitted as a zip file / pdf / word document through the
Turnitin submission point in Moodle.
Your assessment should be titled with your Student ID Number, module code and
assessment id, e.g. st12345678 CIS4000 WRIT1

Feedback
Feedback for the assessment will be provided electronically via Moodle, and will
normally be available 4 working weeks after initial submission. The feedback return
date will be confirmed on Moodle.
Feedback will be provided in the form of a rubric and supported with comments on
your strengths and the areas in which you can improve.
All marks are preliminary and are subject to quality assurance processes and
confirmation at the Examination Board.
Further information on the Academic and Feedback Policy is available in the
Academic Handbook (Vol 1, Section 4.0).
Marking Criteria
70 – 100% (1st)
Addressed all sections and provided correct answers with elegant
presentation of results. Applied correct data analysis approaches
and provided excellent interpretation on each section.

60 – 69% (2:1)
Addressed all sections and provided correct answers with good
presentation of results. Applied mostly correct data analysis
approaches and provided very good interpretation on each section.

50 – 59% (2:2)
Addressed most of the sections and provided mostly correct answers
with average presentation of results. Applied some correct data
analysis approaches and provided an average interpretation on each
section.

40 – 49% (3rd)
Addressed few sections with few correct answers with/out any
presentation of results. Applied mostly incorrect data analysis
approaches and provided