
By the end of this chapter you should be familiar with:
Contingency tables (also called crosstabs or two-way tables) are used in statistics to summarize the relationship between several categorical variables. A contingency table is a special type of frequency distribution table, where two variables are shown simultaneously.
The Spearman’s Rank Correlation Coefficient is used to discover the strength of a link between two sets of data. The notation used is rs.
Spearman’s correlation coefficient shows the extent to which one variable increases or decreases as the other variable increases. Such behaviour is described as ‘monotonic’.
A value of 1 means that one variable strictly increases as the other increases, a value of -1 means that one variable strictly decreases as the other increases, and a value of 0 means there is no monotonic relationship.
Spearman’s rank correlation coefficient is calculated from a sample of N data pairs (X, Y) by first creating a variable U as the ranks of X and a variable V as the ranks of Y (ties replaced with average ranks). Spearman’s correlation is then calculated from U and V using:
rs = 1 – 6∑d² / (N(N² – 1)), where d = U – V
Example: Find the Spearman’s rank correlation coefficient for the following data:
Height | 65 | 66 | 67 | 67 | 68 | 69 | 70 | 72 |
Weight | 67 | 68 | 65 | 68 | 72 | 72 | 69 | 71 |
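Solution: Make a table of ranks, giving tied values the average of the ranks they share:
Height | Weight | Rank H | Rank W | d | d² |
65 | 67 | 1 | 2 | -1 | 1 |
66 | 68 | 2 | 3.5 | -1.5 | 2.25 |
67 | 65 | 3.5 | 1 | 2.5 | 6.25 |
67 | 68 | 3.5 | 3.5 | 0 | 0 |
68 | 72 | 5 | 7.5 | -2.5 | 6.25 |
69 | 72 | 6 | 7.5 | -1.5 | 2.25 |
70 | 69 | 7 | 5 | 2 | 4 |
72 | 71 | 8 | 6 | 2 | 4 |
∑d² = 26 |
Where d = rank of H – rank of W.
Each group of m tied values adds a correction factor m(m² – 1)/12. Height has one tied pair (67, 67) and weight has two tied pairs (68, 68 and 72, 72), so
CF1 = 2(4 − 1)/12 = (2 × 3)/12 = 0.5, CF2 = (2 × 3)/12 = 0.5, CF3 = (2 × 3)/12 = 0.5
Corrected sum = ∑d² + CF1 + CF2 + CF3 = 26 + 0.5 + 0.5 + 0.5 = 27.5
rs = 1 – (6 × 27.5)/(8 × 63) = 0.673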
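The ranking and tie-correction steps of this example can be checked with a short script. This is only a sketch using the Python standard library; the helper names average_ranks, tie_correction and spearman_rs are made up for illustration, and the formula is the tie-corrected shortcut used in this example rather than a library routine.

```python
from collections import Counter

def average_ranks(values):
    """Rank values from smallest to largest; tied values share the average rank."""
    ordered = sorted(values)
    first_pos = {}
    for pos, v in enumerate(ordered, start=1):
        first_pos.setdefault(v, pos)  # first 1-based position of each distinct value
    counts = Counter(values)
    # A group of m tied values fills positions p .. p+m-1, whose average is p + (m-1)/2.
    return [first_pos[v] + (counts[v] - 1) / 2 for v in values]

def tie_correction(values):
    """Sum of m(m^2 - 1)/12 over every group of m tied values."""
    return sum(m * (m * m - 1) / 12 for m in Counter(values).values() if m > 1)

def spearman_rs(x, y):
    """Tie-corrected shortcut: 1 - 6(sum d^2 + corrections) / (n(n^2 - 1))."""
    n = len(x)
    u, v = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    corrected = sum_d2 + tie_correction(x) + tie_correction(y)
    return 1 - 6 * corrected / (n * (n * n - 1))

height = [65, 66, 67, 67, 68, 69, 70, 72]
weight = [67, 68, 65, 68, 72, 72, 69, 71]

print("ranks of height:", average_ranks(height))
print("ranks of weight:", average_ranks(weight))
print("rs:", round(spearman_rs(height, weight), 3))
```

The shortcut formula is an approximation when ties are present, so a statistics package that computes the correlation of the ranks directly may report a slightly different value.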
A statistical hypothesis is an assumption about a population parameter. This assumption may or may not be true. Hypothesis testing refers to the formal procedures used by statisticians to accept or reject statistical hypotheses.
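There are two types: the null hypothesis (H0), which states that nothing has changed or that any observed effect is due to chance, and the alternative hypothesis (H1), which states that there is a real effect.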
Statisticians follow a formal process to determine whether to reject a null hypothesis, based on sample data. This process, called hypothesis testing, consists of four steps: state the hypotheses, formulate an analysis plan, analyse sample data, and interpret results.
A test of a statistical hypothesis, where the region of rejection is on only one side of the sampling distribution, is called a one-tailed test and a test of a statistical hypothesis, where the region of rejection is on both sides of the sampling distribution, is called a two-tailed test.
To carry out a hypothesis test with the binomial distribution, we must calculate the probability, p, of the observed event or any more extreme event happening, assuming the null hypothesis is true. We compare this to the level of significance α. If p > α then we do not reject the null hypothesis. If p < α we reject the null hypothesis and accept the alternative hypothesis.
Example: A coin is tossed twenty times, landing on heads six times. Perform a hypothesis test at a 5% significance level to see if the coin is biased.
Solution: First, we need to write down the null and alternative hypotheses. In this case
H0: The coin is not biased.
H1: The coin is biased in favour of tails.
The important thing to note here is that we only need a one-tailed test as the alternative hypothesis says “in favour of tails”. A two-tailed test would be the result of an alternative hypothesis saying “The coin is biased”.
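Let X be the number of heads in twenty tosses. Under H0, X ∼ B(20, 0.5), and the more extreme events are those with fewer heads, so
P[X ≤ 6] = 0.058
Since 0.058 > 0.05, we do not reject the null hypothesis: there is not enough evidence at the 5% level that the coin is biased in favour of tails. If the coin had landed on heads only five times, then
P[X ≤ 5] = 0.021
and, since 0.021 < 0.05, we would have had to reject the null hypothesis and accept the alternative hypothesis. So the point at which we switch from not rejecting the null hypothesis to rejecting it is when we obtain 5 heads. This means that 5 is the critical value.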
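The two tail probabilities and the critical value in this example can be reproduced directly from the binomial formula. The sketch below uses only the Python standard library; binom_cdf is a helper name chosen for illustration.

```python
from math import comb

def binom_cdf(k, n, p):
    """P[X <= k] for X ~ B(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

n, p0, alpha = 20, 0.5, 0.05   # twenty tosses, fair coin under H0, 5% level

p_six = binom_cdf(6, n, p0)    # observed result: six heads or fewer
p_five = binom_cdf(5, n, p0)   # five heads or fewer

# Critical value for the left-tailed test: the largest k with P[X <= k] < alpha.
critical = max(k for k in range(n + 1) if binom_cdf(k, n, p0) < alpha)

print(round(p_six, 3), round(p_five, 3), critical)
```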
Testing hypotheses with the Poisson distribution is very similar to testing them with the binomial distribution. If the probability is greater than α, the level of significance, then the null hypothesis is not rejected. If it is less than α, we reject the null hypothesis and accept the alternative hypothesis.
Example: An existing make of car is known to break down on average one and a half times per year. A new model is introduced and the manufacturer claims that this model is less likely to break down. Ten randomly selected cars break down a total of eight times within the first year. Test the manufacturer’s claim at a 5% significance level.
Solution: Let X be the number of break downs of the new model of car in a year. Since we have an average rate and the data is discrete, we need to use a Poisson distribution. So X ∼ Poisson(λ) with λ=1.5. The null and alternative hypotheses will be
H0: λ = 1.5, H1: λ < 1.5
We need to decide whether P[X ≤ 8] < α, where α=0.05 is the significance level. Firstly, the expected number of breakdowns λt = 1.5×10 = 15.
We use the cumulative tables with λt = 15 and x=8 to see P[X ≤ 8] = 0.0374
P[X ≤ 8] = 0.0374 < 0.05 = α
So we reject the null hypothesis and accept the alternative hypothesis: at the 5% significance level there is evidence that the average rate of breakdowns has decreased.
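Instead of reading cumulative tables, the same probability can be computed from the Poisson formula. This is a minimal sketch with the standard library; poisson_cdf and the variable names are illustrative only.

```python
from math import exp, factorial

def poisson_cdf(k, mean):
    """P[X <= k] for a Poisson variable with the given mean."""
    return exp(-mean) * sum(mean**i / factorial(i) for i in range(k + 1))

rate_per_car = 1.5   # breakdowns per car per year under H0
cars = 10            # size of the sample
observed = 8         # total breakdowns seen in the sample
alpha = 0.05

mean_total = rate_per_car * cars          # expected breakdowns for ten cars: 15
p_value = poisson_cdf(observed, mean_total)
reject_h0 = p_value < alpha

print(round(p_value, 4), reject_h0)
```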
When constructing a confidence interval with the standard normal distribution, these are the most important values that will be needed.
Significance Level | 10% | 5% | 1% |
z_{1 – α} (one-tailed) | 1.28 | 1.645 | 2.33 |
z_{1 – α/2} (two-tailed) | 1.645 | 1.96 | 2.58 |
These values are obtained from the inverse of the cumulative distribution function of the standard normal distribution, i.e. we need to consider Φ⁻¹(x). For example, when we look for the probability that z < 2.33, we get P[z < 2.33] = 0.99. Now if we have a 1% significance level, we need a 99% confidence interval, so for a one-tailed test we need z = Φ⁻¹(0.99) = 2.33.
To test a hypothesis about a population mean we use the distribution of sample means, where μ is the true mean and μ0 is the current accepted population mean. Draw samples of size n from the population. When n is large enough and the null hypothesis is true, the sample means approximately follow a normal distribution with mean μ0 and standard deviation σ/√n. This is called the distribution of sample means and can be denoted by x̄ ∼ N(μ0, σ/√n), where the second parameter is the standard deviation. This follows from the central limit theorem.
The z-score will this time be obtained with the formula
z = (x̄ − μ0) / (σ/√n)
So if μ = μ0, x̄ ∼ N(μ0, σ/√n) and z ∼ N(0, 1)
The alternative hypothesis will then take one of the following forms, depending on what we are testing: H1: μ < μ0, H1: μ > μ0 or H1: μ ≠ μ0.
Example: An automobile company is looking for fuel additives that might increase gas mileage. Without additives, their cars are known to average 25 mpg (miles per gallon) with a standard deviation of 2.4 mpg on a road trip from London to Edinburgh. The company now asks whether a particular new additive increases this value. In a study, thirty cars are sent on a road trip from London to Edinburgh. Suppose it turns out that the thirty cars averaged x̄ = 25.5 mpg with the additive. Can we conclude from this result that the additive is effective?
Solution: We are asked to show if the new additive increases the mean miles per gallon. The current mean μ=25 so the null hypothesis will be that nothing changes. The alternative hypothesis will be that μ>25 because this is what we have been asked to test.
H0: μ = 25 H1: μ > 25
Now we need to calculate the test statistic. We start with the assumption the normal distribution is still valid. This is because the null hypothesis states there is no change in μ. Thus, as the value σ=2.4 mpg is known, we perform a hypothesis test with the standard normal distribution. So the test statistic will be a z score. We compute the z score using the formula
z = (x̄ − μ0) / (σ/√n) = (25.5 − 25) / (2.4/√30) ≈ 1.14
We are using a 5% significance level and a (right-sided) one-tailed test, so α = 0.05 and from the tables we obtain the critical value z_{1 – α} = 1.645.
As 1.14 < 1.645, the test statistic is not in the critical region so we cannot reject H0. Thus, the observed sample mean x̄ is consistent with the hypothesis H0: μ = 25 at the 5% significance level, and we cannot conclude that the additive is effective.
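The z score and the critical value for this example can be checked with Python's statistics.NormalDist, which provides the standard normal cdf and its inverse. The variable names below are chosen for illustration; this is a sketch of the calculation, not a general-purpose test routine.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 25.0     # accepted population mean under H0
sigma = 2.4    # known population standard deviation
n = 30         # sample size
x_bar = 25.5   # observed sample mean
alpha = 0.05   # significance level, right-sided one-tailed test

standard_error = sigma / sqrt(n)
z = (x_bar - mu0) / standard_error

standard_normal = NormalDist()                 # mean 0, standard deviation 1
z_critical = standard_normal.inv_cdf(1 - alpha)
p_value = 1 - standard_normal.cdf(z)           # probability of a z at least this large
reject_h0 = z > z_critical

print(round(z, 2), round(z_critical, 3), round(p_value, 3), reject_h0)
```

Comparing z with the critical value and comparing the p-value with α are two ways of reaching the same decision.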
The t-test is a statistical test which is widely used to compare the means of two groups of samples. It is therefore used to evaluate whether the means of the two sets of data are statistically significantly different from each other.
There are several types of t-test: the one-sample t-test, the independent (unpaired) two-sample t-test and the paired-sample t-test.
ONE SAMPLE T-TEST
As mentioned above, one-sample t-test is used to compare the mean of a population to a specified theoretical mean (μ).
Let X represent a set of values of size n, with mean m and standard deviation s. The comparison of the observed mean (m) of the sample to a theoretical value μ is performed with the formula below:
t = (m − μ) / (s/√n)
To evaluate whether the difference is statistically significant, you first have to read from a t-test table the critical value of Student’s t distribution corresponding to the significance level alpha of your choice (for example 5%). The degrees of freedom (df) used in this test are:
df = n−1
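INDEPENDENT TWO SAMPLE T-TEST
The independent (or unpaired two sample) t-test is used to compare the means of two unrelated groups of samples.
The t-test statistic value to test whether the means are different can be calculated as follows:
t = (mA − mB) / √(S²/nA + S²/nB)
where mA and mB are the means of the two groups, nA and nB are their sizes, and S² is the pooled variance of the two groups, S² = (∑(x − mA)² + ∑(x − mB)²) / (nA + nB − 2).
Once the t-test statistic value is determined, you have to read from a t-test table the critical value of Student’s t distribution corresponding to the significance level alpha of your choice (for example 5%). The degrees of freedom (df) used in this test are: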
df = nA + nB – 2
PAIRED SAMPLE T-TEST
To compare the means of the two paired sets of data, the differences between all pairs must first be calculated.
Let d represent the differences between all pairs. The average of the differences d is compared to 0. If there is any significant difference between the two paired samples, then the mean of d is expected to be far from 0.
The t-test statistic value can be calculated as follows:
t = m / (s/√n)
where m and s are the mean and the standard deviation of the differences (d), respectively, and n is the size of d.
Once the t value is determined, you have to read from a t-test table the critical value of Student’s t distribution corresponding to the significance level alpha of your choice (for example 5%). The degrees of freedom (df) used in this test are:
df = n – 1
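The paired-sample statistic is the one-sample statistic applied to the differences with a theoretical mean of 0, which the sketch below makes explicit. The function names and the before/after readings are hypothetical, invented purely to illustrate the formulas.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(values, mu):
    """Return (t, df) for comparing the mean of values with a theoretical mean mu."""
    n = len(values)
    t = (mean(values) - mu) / (stdev(values) / sqrt(n))  # stdev uses the n - 1 divisor
    return t, n - 1

def paired_t(first, second):
    """Return (t, df) for paired data: a one-sample test of the differences against 0."""
    differences = [b - a for a, b in zip(first, second)]
    return one_sample_t(differences, 0)

# Hypothetical paired measurements (illustration only, not data from this chapter).
before = [10, 12, 9, 11]
after = [12, 14, 10, 15]

t_value, df = paired_t(before, after)
print(round(t_value, 3), df)
```

The resulting t is then compared with the critical value of Student's t distribution for df degrees of freedom.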
Example: Find the t-test value for the following two independent sets of values: 7, 2, 9, 8 and 1, 2, 3, 4.
Solution:
x1 | x1 − x̄1 | (x1 − x̄1)² |
7 | 0.5 | 0.25 |
2 | -4.5 | 20.25 |
9 | 2.5 | 6.25 |
8 | 1.5 | 2.25 |
∑(x1 − x̄1)² = 29 |
Mean for the first set of data = (7+2+9+8)/4 = 6.5
Standard deviation for the first set of data = 3.11
x2 | x2 − x̄2 | (x2 − x̄2)² |
1 | -1.5 | 2.25 |
2 | -0.5 | 0.25 |
3 | 0.5 | 0.25 |
4 | 1.5 | 2.25 |
∑(x2 − x̄2)² = 5 |
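Mean for the second set of data = (1+2+3+4)/4 = 2.5
Standard deviation for the second set of data = √(5/3) = 1.29
For t-test value, with the pooled variance (29 + 5)/(4 + 4 − 2) = 5.67: t = (6.5 − 2.5) / √(5.67 × (1/4 + 1/4)) = 4/1.683 = 2.38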
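The arithmetic of this example can be reproduced with a few lines of Python. The sketch assumes the pooled-variance (equal variance) form of the independent two-sample statistic, which matches df = nA + nB – 2; pooled_t is an illustrative name, not a library function.

```python
from math import sqrt
from statistics import mean

def pooled_t(a, b):
    """Return (t, df) for two independent samples using the pooled variance."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss_a = sum((x - ma) ** 2 for x in a)   # sum of squared deviations, first set
    ss_b = sum((x - mb) ** 2 for x in b)   # sum of squared deviations, second set
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    t = (ma - mb) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2

first_set = [7, 2, 9, 8]
second_set = [1, 2, 3, 4]

t_value, df = pooled_t(first_set, second_set)
print(round(t_value, 2), df)
```

Because both sets have the same size here, the pooled form and the form that uses each sample variance separately give the same t.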
CHI-SQUARED TEST FOR INDEPENDENCE
A chi-square test for independence is applied when you have two categorical variables from a single population. It is used to determine whether there is a significant association between the two variables.
Degrees of freedom. The degrees of freedom (df) is equal to:
df = (r – 1) (c – 1)
where r is the number of levels for one categorical variable, and c is the number of levels for the other categorical variable.
The test statistic is a chi-square random variable (Χ²) defined by the following equation.
Χ² = ∑ (f0 − fc)² / fc
Where f0 are the observed values and fc are the expected values.
As before, if this number is larger than the critical value then we reject the null hypothesis.
The p-value is the probability of observing a sample statistic as extreme as the test statistic. Since the test statistic is a chi-square, use the chi-square distribution calculator to assess the probability associated with the test statistic. Use the degrees of freedom computed above.
Example: A public opinion poll surveyed a simple random sample of 1000 voters. Respondents were classified by gender (male or female) and by voting preference (Republican, Democrat, or Independent). Results are shown in the contingency table below.
Voting preferences by gender:
Gender | Rep | Dem | Ind | Row total |
Male | 200 | 150 | 50 | 400 |
Female | 250 | 300 | 50 | 600 |
Column total | 450 | 450 | 100 | 1000 |
Is there a gender gap? Do the men’s voting preferences differ significantly from the women’s preferences? Use a 0.05 level of significance.
Solution:
The solution to this problem takes four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyse sample data, and (4) interpret results. We work through those steps below:
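The null hypothesis H0 is that gender and voting preference are independent, and the alternative hypothesis H1 is that they are not independent. With df = (2 – 1)(3 – 1) = 2 and expected frequencies equal to (row total × column total)/1000, the test statistic is Χ² = 16.2, which gives a P-value of 0.0003.
Since the P-value (0.0003) is less than the significance level (0.05), we reject the null hypothesis. Thus, we conclude that there is a relationship between gender and voting preference.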
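The expected frequencies, the test statistic and the P-value for this contingency table can be computed as in the sketch below. It uses only the standard library, so the P-value relies on the fact that for exactly 2 degrees of freedom the chi-square upper tail probability is exp(−x/2); for other degrees of freedom a chi-square table or calculator is needed.

```python
from math import exp

# Observed counts: rows are Male, Female; columns are Rep, Dem, Ind.
observed = [
    [200, 150, 50],
    [250, 300, 50],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count in each cell = row total x column total / grand total.
expected = [[r * c / total for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Valid only because df == 2: P(chi-square > x) = exp(-x / 2).
p_value = exp(-chi2 / 2)

print(expected, round(chi2, 2), df, round(p_value, 4))
```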
A chi-square goodness of fit test is applied when you have one categorical variable from a single population. It is used to determine whether sample data are consistent with a hypothesized distribution.
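We follow the same steps as for the chi-square test for independence, but here the degrees of freedom are df = k – 1, where k is the number of levels of the categorical variable, and the expected frequency of each level is the sample size multiplied by the hypothesized proportion for that level.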
Example: Acme Toy Company prints baseball cards. The company claims that 30% of the cards are rookies, 60% veterans but not All-Stars, and 10% are veteran All-Stars. Suppose a random sample of 100 cards has 50 rookies, 45 veterans, and 5 All-Stars. Is this consistent with Acme’s claim? Use a 0.05 level of significance.
Solution: The solution to this problem takes four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyse sample data, and (4) interpret results. We work through those steps below:
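With expected frequencies 30, 60 and 10 and df = 3 – 1 = 2, the test statistic is Χ² = (50 − 30)²/30 + (45 − 60)²/60 + (5 − 10)²/10 = 19.58.
We use the chi-square distribution calculator to find P(Χ² > 19.58) = 0.0001. Since the P-value (0.0001) is less than the significance level (0.05), we reject the null hypothesis: the sample is not consistent with Acme’s claim.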
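The goodness of fit statistic for the baseball card sample can be checked in the same way. This sketch again uses the closed form exp(−x/2) for the upper tail, which holds only for 2 degrees of freedom; the variable names are illustrative.

```python
from math import exp

observed = [50, 45, 5]            # rookies, veterans, All-Stars in the sample
claimed = [0.30, 0.60, 0.10]      # proportions claimed by the company
sample_size = sum(observed)

# Expected count for each category = sample size x claimed proportion.
expected = [sample_size * p for p in claimed]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

# Valid only because df == 2: P(chi-square > x) = exp(-x / 2).
p_value = exp(-chi2 / 2)
reject_h0 = p_value < 0.05

print(round(chi2, 2), df, reject_h0)
```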
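Two types of errors can result from a hypothesis test. A type I error occurs when a true null hypothesis is rejected; its probability is the significance level α. A type II error occurs when a false null hypothesis is not rejected.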
The probability of not committing a type II error is called the power of a hypothesis test.
To compute the power of the test, one offers an alternative view about the “true” value of the population parameter, assuming that the null hypothesis is false. The effect size is the difference between the true value and the value specified in the null hypothesis.
Effect size = True value – Hypothesized value
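As an illustration of how power depends on the effect size, the sketch below computes the power of a right-tailed z test. It reuses the numbers of the fuel additive example (μ0 = 25, σ = 2.4, n = 30, α = 0.05) together with a hypothetical true mean of 26 mpg, which is an assumption made only for this illustration; power_right_tailed_z is likewise an invented helper name.

```python
from math import sqrt
from statistics import NormalDist

def power_right_tailed_z(mu0, true_mean, sigma, n, alpha):
    """Probability of rejecting H0: mu = mu0 in favour of mu > mu0 when mu = true_mean."""
    standard_normal = NormalDist()
    standard_error = sigma / sqrt(n)
    z_critical = standard_normal.inv_cdf(1 - alpha)
    effect_size = true_mean - mu0
    # H0 is rejected when z > z_critical. Under the true mean, z is normal with
    # mean effect_size / standard_error and standard deviation 1.
    return 1 - standard_normal.cdf(z_critical - effect_size / standard_error)

# Hypothetical true mean of 26 mpg, i.e. an effect size of 1 mpg.
power = power_right_tailed_z(mu0=25, true_mean=26, sigma=2.4, n=30, alpha=0.05)
print(round(power, 2))
```

Calling the function with a larger n or a larger true mean gives a higher power, and the probability of a type II error is 1 minus the power.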
The power of a hypothesis test is affected by three factors.
Example: A machine fills milk bottles. The mean amount of milk in each bottle is supposed to be 32 oz with a standard deviation of 0.06 oz. Suppose the amount of milk in a bottle is approximately normally distributed. To check if the machine is operating properly, 36 filled bottles will be chosen at random and the mean amount will be determined.
Solution: