How would you characterize the skewness of the distribution in Question 1: positively skewed, negatively skewed, or approximately normal? Provide a rationale for your answer.
The distribution in Question 1 is negatively skewed. In a negatively skewed distribution the longer tail points to the left and most of the data sit on the right side of the graph. That matches this sample: most of the people who participated in the study were older, and the skewness statistic for Age at Enrollment is -.622. If the distribution were positively skewed, the tail would point to the right and most of the data would sit on the left, meaning mostly younger people participated.
Compare the original skewness statistic and Shapiro-Wilk statistic with those of the smaller dataset (n = 15) for the variable “Age at First Arrest.” How did the statistics change, and how would you explain these differences?
Compared with the original dataset (n = 20), the smaller dataset (n = 15) has a lower skewness statistic (.990), and its graph looks closer to a normal distribution than the original one. The Shapiro-Wilk test for the smaller dataset gives a statistic of .923 with a p value of 0.211. Because p is greater than 0.05, the distribution does not differ significantly from normal. Part of this change may come from the smaller sample itself, since the Shapiro-Wilk test has less power to detect non-normality when n is small.
Age at Enrollment

| | | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | 41 | 1 | 6.7 | 6.7 | 6.7 |
| | 43 | 1 | 6.7 | 6.7 | 13.3 |
| | 47 | 1 | 6.7 | 6.7 | 20.0 |
| | 49 | 1 | 6.7 | 6.7 | 26.7 |
| | 52 | 2 | 13.3 | 13.3 | 40.0 |
| | 56 | 3 | 20.0 | 20.0 | 60.0 |
| | 58 | 1 | 6.7 | 6.7 | 66.7 |
| | 60 | 2 | 13.3 | 13.3 | 80.0 |
| | 62 | 1 | 6.7 | 6.7 | 86.7 |
| | 63 | 2 | 13.3 | 13.3 | 100.0 |
| | Total | 15 | 100.0 | 100.0 | |
Age at 1st Arrest

| | | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | 12 | 1 | 6.7 | 6.7 | 6.7 |
| | 14 | 1 | 6.7 | 6.7 | 13.3 |
| | 16 | 1 | 6.7 | 6.7 | 20.0 |
| | 17 | 1 | 6.7 | 6.7 | 26.7 |
| | 19 | 1 | 6.7 | 6.7 | 33.3 |
| | 20 | 1 | 6.7 | 6.7 | 40.0 |
| | 23 | 1 | 6.7 | 6.7 | 46.7 |
| | 27 | 1 | 6.7 | 6.7 | 53.3 |
| | 28 | 1 | 6.7 | 6.7 | 60.0 |
| | 29 | 1 | 6.7 | 6.7 | 66.7 |
| | 31 | 1 | 6.7 | 6.7 | 73.3 |
| | 38 | 1 | 6.7 | 6.7 | 80.0 |
| | 42 | 1 | 6.7 | 6.7 | 86.7 |
| | 43 | 1 | 6.7 | 6.7 | 93.3 |
| | 59 | 1 | 6.7 | 6.7 | 100.0 |
| | Total | 15 | 100.0 | 100.0 | |
Descriptives

| | | Statistic | Std. Error |
|---|---|---|---|
| Age at Enrollment | Mean | 54.53 | 1.818 |
| | 95% Confidence Interval for Mean, Lower Bound | 50.64 | |
| | 95% Confidence Interval for Mean, Upper Bound | 58.43 | |
| | 5% Trimmed Mean | 54.81 | |
| | Median | 56.00 | |
| | Variance | 49.552 | |
| | Std. Deviation | 7.039 | |
| | Minimum | 41 | |
| | Maximum | 63 | |
| | Range | 22 | |
| | Interquartile Range | 11 | |
| | Skewness | -.622 | .580 |
| | Kurtosis | -.600 | 1.121 |
| Age at 1st Arrest | Mean | 27.87 | 3.366 |
| | 95% Confidence Interval for Mean, Lower Bound | 20.65 | |
| | 95% Confidence Interval for Mean, Upper Bound | 35.09 | |
| | 5% Trimmed Mean | 27.02 | |
| | Median | 27.00 | |
| | Variance | 169.981 | |
| | Std. Deviation | 13.038 | |
| | Minimum | 12 | |
| | Maximum | 59 | |
| | Range | 47 | |
| | Interquartile Range | 21 | |
| | Skewness | .990 | .580 |
| | Kurtosis | .746 | 1.121 |
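The basic descriptives can be rechecked by hand from the two frequency tables. The sketch below is only an illustration in Python (not SPSS): it rebuilds the 15 ages for each variable and recomputes the mean, median, variance, standard deviation, and range. The names `expand` and `describe` are made up for this example, and it assumes the frequency tables list every case.

```python
import statistics

# Age at Enrollment as {value: frequency}, copied from the frequency table.
enrollment_freq = {41: 1, 43: 1, 47: 1, 49: 1, 52: 2,
                   56: 3, 58: 1, 60: 2, 62: 1, 63: 2}
# Age at 1st Arrest: every value occurs once, so a plain list is enough.
arrest_values = [12, 14, 16, 17, 19, 20, 23, 27, 28, 29, 31, 38, 42, 43, 59]


def expand(freq):
    """Turn a {value: frequency} table into a flat list of observations."""
    return [value for value, count in freq.items() for _ in range(count)]


def describe(values):
    """Basic descriptives using the sample (n - 1) variance."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "variance": statistics.variance(values),
        "sd": statistics.stdev(values),
        "range": max(values) - min(values),
    }


enrollment = describe(expand(enrollment_freq))
arrest = describe(arrest_values)

for name, stats in (("Age at Enrollment", enrollment),
                    ("Age at 1st Arrest", arrest)):
    print(name, {key: round(val, 3) for key, val in stats.items()})
```

For Age at Enrollment the mean (about 54.53) falls below the median (56), which is what a tail on the left side would be expected to produce.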
Tests of Normality

| | Kolmogorov-Smirnov (a): Statistic | df | Sig. | Shapiro-Wilk: Statistic | df | Sig. |
|---|---|---|---|---|---|---|
| Age at Enrollment | .183 | 15 | .192 | .927 | 15 | .248 |
| Age at 1st Arrest | .138 | 15 | .200* | .923 | 15 | .211 |

*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction
Q4. I would describe the kurtosis of the distribution in Question 4 as leptokurtic, which is where the scores are bunched around the mean, which results in a higher peak.
Statistics

| Age at Enrollment | | |
|---|---|---|
| N | Valid | 15 |
| | Missing | 0 |
| Skewness | | -.622 |
| Std. Error of Skewness | | .580 |
The skewness statistic can be seen in the table above: it is -0.622. A negative skewness means the longer tail of the distribution is on the left. The table shows that the data are moderately skewed, because a value between -1 and -0.5 (or between 0.5 and 1) indicates moderate skewness.
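To show where the skewness value and its standard error come from, here is a minimal Python sketch. It assumes the adjusted Fisher-Pearson formula for sample skewness and the usual standard error formula that depends only on n; with the ages rebuilt from the frequency table these give values that agree with the reported -.622 and .580. The function names and the cutoff labels are just for illustration.

```python
import math
import statistics

# Age at Enrollment, rebuilt from the frequency table (n = 15).
ages = [41, 43, 47, 49, 52, 52, 56, 56, 56, 58, 60, 60, 62, 63, 63]


def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness (bias-corrected sample form)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 in the denominator
    cubed = sum(((x - mean) / sd) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed


def skewness_std_error(n):
    """Standard error of skewness; it depends only on the sample size."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def describe_skew(skew):
    """Rule of thumb used in the answer: 0.5 to 1 in magnitude is moderate."""
    size = abs(skew)
    if size < 0.5:
        return "approximately symmetric"
    if size <= 1:
        return "moderately skewed"
    return "highly skewed"


skew = sample_skewness(ages)
se = skewness_std_error(len(ages))
direction = "left" if skew < 0 else "right"
print(f"skewness = {skew:.3f}, std. error = {se:.3f}")
print(f"{describe_skew(skew)}, longer tail on the {direction}")
```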
Statistics

| Years of Education | | |
|---|---|---|
| N | Valid | 15 |
| | Missing | 0 |
| Skewness | | .658 |
| Std. Error of Skewness | | .580 |
| Kurtosis | | -.936 |
| Std. Error of Kurtosis | | 1.121 |
The kurtosis for Years of Education is -0.936. A negative kurtosis value means the tails of the distribution are light and the distribution is flatter than a normal curve (platykurtic), so the data are spread out rather than bunched tightly around the mean. Because the magnitude is less than 1, the kurtosis is moderate.
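The raw Years of Education scores are not listed in this write-up, so the -.936 itself cannot be recomputed here. As a rough check of the method instead, the Python sketch below assumes the bias-corrected sample excess kurtosis formula and applies it to the Age at Enrollment ages, where the Descriptives table reports -.600. It also computes the standard error of kurtosis for n = 15, which applies to Years of Education as well. Function names are made up for the example.

```python
import math
import statistics

# Age at Enrollment, rebuilt from the frequency table (n = 15).
ages = [41, 43, 47, 49, 52, 52, 56, 56, 56, 58, 60, 60, 62, 63, 63]


def sample_excess_kurtosis(values):
    """Bias-corrected sample excess kurtosis (0 for a normal curve)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 in the denominator
    fourth = sum(((x - mean) / sd) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction


def kurtosis_std_error(n):
    """Standard error of kurtosis; it depends only on the sample size."""
    ses = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return 2 * ses * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


def kurtosis_label(kurt):
    """Name the shape from the sign of the excess kurtosis."""
    if kurt < 0:
        return "platykurtic (flatter, lighter tails)"
    if kurt > 0:
        return "leptokurtic (more peaked, heavier tails)"
    return "mesokurtic"


kurt_enrollment = sample_excess_kurtosis(ages)
se_kurt = kurtosis_std_error(15)
print(f"Age at Enrollment kurtosis = {kurt_enrollment:.3f}")
print(f"std. error of kurtosis for n = 15: {se_kurt:.3f}")
# Reported value for Years of Education, taken from the Statistics table.
print("Years of Education:", kurtosis_label(-0.936))
```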
Tests of Normality

| | Kolmogorov-Smirnov (a): Statistic | df | Sig. | Shapiro-Wilk: Statistic | df | Sig. |
|---|---|---|---|---|---|---|
| Number of Times Fired from Job | .311 | 15 | .000 | .737 | 15 | .001 |

a. Lilliefors Significance Correction
In SPSS, the Shapiro-Wilk test checks whether a distribution diverges from a normal distribution: if the p value is less than 0.05, the distribution differs significantly from normal. In this example the Shapiro-Wilk statistic is .737 with a p value of 0.001, meaning the distribution of the number of times fired from a job is not normal.
The Kolmogorov-Smirnov test is inappropriate to report because it is usually used for larger sample sizes; normally it is not used until the sample size reaches about 2,000. With only 15 cases here, the Shapiro-Wilk test is the one to report.
It is not uncommon for the skewness to be low while the Shapiro-Wilk test still indicates a non-normal distribution. Skewness only measures how far the tails of the graph are pulled away from the mean to one side; if it is low, the two tails are roughly equal. The Shapiro-Wilk test, on the other hand, looks at the whole shape of the distribution together, so a distribution with balanced tails can still fail to be normal, for example because it is too flat, too peaked, or has more than one peak.
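A small made-up example may help show how low skewness and a non-normal shape can go together. The ten scores below are invented for illustration and are not from the study: they are perfectly symmetric around 5 but sit in two separate clusters. The sketch uses the same assumed sample skewness and excess kurtosis formulas as above; it does not run the Shapiro-Wilk test itself, which would need a statistics package.

```python
import statistics

# Invented scores: symmetric around 5, but split into two clusters.
scores = [1, 1, 1, 2, 2, 8, 8, 9, 9, 9]


def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness (bias-corrected sample form)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    cubed = sum(((x - mean) / sd) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed


def sample_excess_kurtosis(values):
    """Bias-corrected sample excess kurtosis (0 for a normal curve)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    fourth = sum(((x - mean) / sd) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction


skew = sample_skewness(scores)
kurt = sample_excess_kurtosis(scores)
print(f"skewness = {skew:.3f}")  # essentially zero: the tails balance out
print(f"kurtosis = {kurt:.3f}")  # strongly negative: no single central peak
```

The skewness comes out at essentially zero because the two sides mirror each other, yet the strongly negative kurtosis shows the shape is far from a bell curve, which is the kind of departure a whole-shape test is designed to pick up.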