Inference means drawing a conclusion about a population from an analysis of a sample. Any such assumption about the population has to be tested before we reach a conclusion, and this process is called hypothesis testing. A word of warning: this post is purely theoretical and aims to help you understand hypothesis testing through simple examples. Before you read it, make sure you understand the basics of statistics, such as mean, standard deviation, variance and normal distribution. Once you have a good grasp of these topics, you can use this post to understand statistical inference using hypothesis testing.
Hypothesis testing involves two types of hypothesis:
1. Null hypothesis
2. Alternate hypothesis
Hypothesis testing is always about the population. Every test develops two statements, a null and an alternate, in order to support business decisions. These statements are expressions about a population parameter, here the mean, and they are tested using the mean of the sample data. We want to make decisions on the basis of how the population performs, so we take a sample from the population and analyze it in order to reject, or fail to reject, the claim made about the population.
The null hypothesis states the claim made by the organization or customer, which is then compared with the results observed in the sample data of the product. It is represented by Ho. The alternate hypothesis, represented by Ha, challenges the null hypothesis and works on proving that the claim is wrong. It is also used to help companies understand how well the product or model they have created performs.
For example, suppose we claim that the population mean is equal to 100. The null hypothesis, denoted Ho, can then be written as
Ho: µ = 100
The alternate hypothesis then challenges this statement with one of the following:
Ha: µ != 100
Ha: µ < 100
Ha: µ > 100
Here, not equal to 100 means the mean could be either < 100 or > 100. So in a real scenario, the two possibilities we usually consider for the alternate hypothesis are less than 100 and greater than 100.
Take the example of Ford, which has redesigned its F150 truck to reduce noise and claims in its advertising that the truck is now quieter. The average noise of the truck was 68 decibels at 60 mph, which is loud by market standards.
If the average noise of the redesigned truck is still 68 decibels, then the claim made by the company is false. To justify the claim, the average noise has to be less than 68. So we take a sample from the population, which is the trucks, and perform hypothesis testing. The existing figure of 68 decibels is taken as the null hypothesis. Since 68 decibels is already loud, there is also a possibility that the noise could be more than 68, so the null covers that case too. If the analysis supports the null hypothesis, the company has to stop production. So the null hypothesis is written as
Ho: µ >= 68
So the alternate hypothesis is written as
Ha: µ < 68
This challenges the null hypothesis in order to prove that the noise is lower than 68 decibels, that the company's statement about a quieter truck is true, and that production can continue. In a one-sided scenario like this, the alternate takes only a strict greater than or less than; the equal-to case always stays with the null hypothesis.
If the null hypothesis is rejected, Ford has enough evidence to support its claim that it is producing trucks with reduced noise. This is how decisions are made on the basis of hypothesis testing. To perform the test, the sample data taken from the population is assumed to be normally distributed, so that its distribution graph forms a bell curve. On this curve you need to find the exact region that corresponds to the alternate hypothesis. For this we introduce the critical region, which is the region of the alternate hypothesis. If the value from our analysis falls in the critical region, the company has made a successful product, since the average noise of the trucks will be less than 68 decibels.
The region of the alternate hypothesis, which is the critical region, can be right tailed, left tailed, or sometimes both.
For example, when the null hypothesis is Ho: µ <= 2, the alternate hypothesis is Ha: µ > 2. This is a single tailed test, since there is only one statement to analyze; the critical region falls on the right side of the mean and the test is called right tailed or upper tailed. When the alternate is Ha: µ < 2, the critical region falls on the left side of the mean and the test is called left tailed or lower tailed.
In a right tailed test, the critical region is the hashed region on the right, indicated by the arrow in the figure above, and the remaining region is the region of the null hypothesis Ho. When the alternate can take two possibilities, there are two critical regions, one in each tail, and the remaining region in the middle is Ho.
So what is the significance of this critical region, and where is it used?
To understand this, we need to follow the steps of hypothesis testing:
- Formulate the hypothesis statements: null and alternate (about the mean/average).
- Take the sample data.
- Measure the sample for the mean.
- Use a test statistic to perform the hypothesis test.
We need a test statistic because there is a claim that the population has a certain outcome. To test it, we take a sample and perform operations on the sample, but an outcome seen in the sample does not automatically hold for the population. Some analysis is needed to bridge the gap between sample and population, and this is done using a test statistic.
Commonly used test statistics are the Z test, t test, F test and chi-square test.
Z Statistic test:
When we take a sample from a population, we need to infer something about the population on the basis of that sample.
If we know the standard deviation of the population and the sample size is more than 30, then the test statistic to use is the Z test. A hypothesis test always has a significance level, commonly 5% or 10%, which is the probability of a type 1 error that we are willing to tolerate when making the decision. This value is usually given by the company as per its standards. A type 1 error occurs when we reject the null hypothesis by mistake even though it is actually true. Such errors are possible because the decision is based on a sample rather than the whole population.
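The link between the significance level and the type 1 error can be checked with a small simulation. The sketch below is only an illustration under assumed values: the population mean of 100, the standard deviation of 15, the sample size of 36 and the number of trials are all hypothetical, and it uses the Z statistic that the following paragraphs describe. It draws many samples from a population where the null is actually true and counts how often a right tailed Z test rejects it anyway.

```python
import random
from statistics import NormalDist, fmean

def type1_error_rate(alpha=0.05, n=36, trials=4000, seed=0):
    """Estimate how often a right tailed Z test rejects a null that is true."""
    rng = random.Random(seed)
    mu0, sigma = 100.0, 15.0                   # hypothetical population; the null holds
    z_crit = NormalDist().inv_cdf(1 - alpha)   # critical value for the chosen alpha
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / n ** 0.5)
        if z > z_crit:                         # null rejected although it is true
            rejections += 1
    return rejections / trials

rate = type1_error_rate()
print(f"estimated type 1 error rate: {rate:.3f}")
```

With a 5% significance level, the estimated rate should land close to 0.05, which is what the significance level promises.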
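Once we calculate the Z statistic using the sample mean, the hypothesized mean, the population standard deviation and the sample size, it is compared with the Z critical value, which depends on the significance level. The formula for the Z statistic is
z = (x̄ - µ) / (σ / √n)
where x̄ is the sample mean, µ is the hypothesized mean, σ is the population standard deviation and n is the sample size.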
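The Z statistic can be sketched as a small Python function. This is a minimal illustration, not a reference implementation: the function name and the input numbers are hypothetical and do not come from any real data set.

```python
from math import sqrt

def z_statistic(sample_mean, hypothesized_mean, population_sd, n):
    """z = (sample mean - hypothesized mean) / (population SD / sqrt(n))."""
    standard_error = population_sd / sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error

# Hypothetical numbers for illustration only.
z = z_statistic(sample_mean=103.0, hypothesized_mean=100.0, population_sd=12.0, n=36)
print(z)  # 1.5
```

Here the standard error is 12 / 6 = 2, so a sample mean that sits 3 units above the hypothesized mean gives a Z statistic of 1.5.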
Once you have the Z statistic, you need to find the Z critical value, denoted Zα. For a right tailed test it can be computed as qnorm(1 - alpha) (qnorm is a function used in R Studio).
The Z critical value is usually given by the company as per the standards it follows; if not, it can be calculated from the significance level.
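For readers who do not use R, the same critical value can be obtained in Python. The sketch below assumes that `statistics.NormalDist().inv_cdf` from the Python standard library plays the role of R's qnorm for the standard normal distribution; the helper name `z_critical` is hypothetical.

```python
from statistics import NormalDist

def z_critical(alpha, tail="right"):
    """Critical z value for a one tailed test at significance level alpha."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "right":
        return standard_normal.inv_cdf(1 - alpha)   # like qnorm(1 - alpha) in R
    if tail == "left":
        return standard_normal.inv_cdf(alpha)       # like qnorm(alpha) in R
    raise ValueError("tail must be 'right' or 'left'")

print(round(z_critical(0.05, "right"), 3))  # 1.645
print(round(z_critical(0.05, "left"), 3))   # -1.645
print(round(z_critical(0.10, "right"), 3))  # 1.282
```

A smaller significance level pushes the critical value further into the tail, which makes the null harder to reject.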
The critical region in the normal distribution is the alternate hypothesis region. If the value of Zα is positive, the region is plotted on the right hand side of the normal distribution. The critical region starts at the Z critical value, and the remaining region is the null hypothesis region. When the Z statistic is greater than the Z critical value, it falls in the alternate hypothesis region, so we can reject the null hypothesis.
The same applies to a left tailed test, where the Z critical value is negative and a Z statistic smaller than it falls in the region of the alternate hypothesis.
To summarize, the steps of the Z test are:
1. We should know the population standard deviation.
2. The sample size should be more than 30.
3. Formulate the hypothesis statements and check whether the test is right tailed or left tailed.
4. Calculate the Z statistic.
5. The Z critical value will be given, or we can find it using the function qnorm and the alpha value.
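The five points above can be walked through on the Ford truck example. The post gives no sample data, so every number other than the 68 decibels is an assumption made up for illustration: a known population standard deviation of 3, a sample of 36 trucks, a sample mean of 66.5 and a 5% significance level. As a stand-in for R's qnorm, the sketch uses `NormalDist().inv_cdf` from the Python standard library.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical values for illustration only; the post gives no sample data.
mu0 = 68.0          # hypothesized mean noise in decibels, from Ho: µ >= 68
sigma = 3.0         # assumed known population standard deviation (point 1)
n = 36              # assumed sample size (point 2)
sample_mean = 66.5  # assumed mean noise measured on the sample
alpha = 0.05        # assumed significance level

# Points 1 and 2: the Z test needs a known sigma and more than 30 observations.
if n <= 30:
    raise ValueError("sample size should be more than 30 for the Z test")

# Point 3: Ha: µ < 68, so the test is left tailed.
# Point 4: calculate the Z statistic.
z_stat = (sample_mean - mu0) / (sigma / sqrt(n))

# Point 5: critical value for a left tailed test, like qnorm(alpha) in R.
z_crit = NormalDist().inv_cdf(alpha)

reject_null = z_stat < z_crit
print(f"z statistic = {z_stat:.2f}, z critical = {z_crit:.3f}, reject null: {reject_null}")
```

Under these assumed numbers the Z statistic is -3.00, which lies below the critical value of about -1.645, so the null would be rejected and the quieter-truck claim supported.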
Rule of thumb:
For a right tailed test:
1. if z statistic > z critical, then reject null
2. if z statistic < z critical, then do not reject null
For a left tailed test:
1. if z statistic < z critical, then reject null
2. if z statistic > z critical, then do not reject null
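The rule of thumb can be written as one small function, reading the first pair of rules as the right tailed case and the second pair as the left tailed case. This is a sketch with a hypothetical function name and made-up input values; the outcome where the null survives is reported as "do not reject null".

```python
def decide(z_stat, z_crit, tail):
    """Apply the rule of thumb for a one tailed Z test."""
    if tail == "right":
        reject = z_stat > z_crit   # statistic lies in the right critical region
    elif tail == "left":
        reject = z_stat < z_crit   # statistic lies in the left critical region
    else:
        raise ValueError("tail must be 'right' or 'left'")
    return "reject null" if reject else "do not reject null"

# Hypothetical z values, compared against the 5% one tailed critical values.
print(decide(2.10, 1.645, "right"))   # reject null
print(decide(1.20, 1.645, "right"))   # do not reject null
print(decide(-3.00, -1.645, "left"))  # reject null
print(decide(-0.80, -1.645, "left"))  # do not reject null
```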
Two tailed Test:
In a two tailed test the significance level is divided into two parts, since there are two alternate regions that reject the null hypothesis. With alpha equal to 0.05, the Z critical value is qnorm(1 - alpha/2), which gives 1.96.
So 1.96 marks the critical region on the right side of the normal distribution and -1.96 marks it on the left side. This is how hypothesis testing is performed: locate the critical region and check where the Z statistic falls, in order to reject or not reject the null hypothesis.
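The two tailed case can be sketched the same way in Python, again assuming that `NormalDist().inv_cdf` stands in for R's qnorm. The function name and the z values passed to it are hypothetical.

```python
from statistics import NormalDist

def two_tailed_decision(z_stat, alpha=0.05):
    """Reject the null if z falls in either tail of the standard normal curve."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # like qnorm(1 - alpha/2) in R
    reject = z_stat > z_crit or z_stat < -z_crit
    return z_crit, reject

z_crit, reject = two_tailed_decision(z_stat=2.3)
print(round(z_crit, 2), reject)             # 1.96 True
print(two_tailed_decision(z_stat=-1.2)[1])  # False
```

Because alpha is split across both tails, each tail holds 2.5% of the area, which is why the cut-off is 1.96 rather than the one tailed 1.645.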