Distribution Normality

In probability theory, the normal (Gaussian) distribution is one of the most commonly used continuous probability distributions.

Normal distributions are important in statistics and are often used in industry and in the natural and social sciences to represent real-valued random variables whose distributions are not known. Several very important statistical tools rely on the assumption of normality. For this reason, it is essential to check this assumption for continuous variables so that we can choose the right (Six Sigma) tools.

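The normal distribution is useful because of the central limit theorem, which states that under certain conditions (for example, independent variables with finite variance) the sum of many random variables will have an approximately normal distribution, whatever the distribution of the individual variables.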
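The effect of the central limit theorem can be illustrated with a small simulation. The sketch below is only an illustration: the function names, sample sizes, and the choice of uniform random variables are assumptions made for this example. It compares the share of values lying within one standard deviation of the mean, which is about 68% for a normal distribution.

```python
import random
from statistics import fmean, stdev


def sums_of_uniforms(n_terms, n_samples, seed=0):
    """Draw n_samples values, each the sum of n_terms uniform(0, 1) variables."""
    rng = random.Random(seed)
    return [sum(rng.random() for _ in range(n_terms)) for _ in range(n_samples)]


def fraction_within_one_sd(values):
    """Share of values lying within one standard deviation of the mean."""
    mean = fmean(values)
    sd = stdev(values)
    inside = sum(1 for v in values if abs(v - mean) <= sd)
    return inside / len(values)


single = sums_of_uniforms(1, 20000)   # one uniform variable: not normal
summed = sums_of_uniforms(30, 20000)  # sum of 30 uniform variables

print(fraction_within_one_sd(single))  # about 0.58 for a uniform variable
print(fraction_within_one_sd(summed))  # close to the normal value of about 0.68
```

A single uniform variable is clearly not normal, but the sum of 30 of them already behaves very much like a normal variable on this simple measure.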

In statistics, normality tests are used to determine whether a set of data follows a normal distribution. More precisely, the tests are a form of model selection, and can be interpreted in several ways.

The normal probability plot is a graphical technique for identifying deviations from normality. This includes identifying outliers, asymmetry (skewness), flattening (kurtosis), etc. Normal probability plots are made from the raw data and the parameters estimated from them.

The Probability Plot compares the theoretical values with the observed values to test whether the distribution is normal.

On the vertical axis we see the percentage values for a theoretical normal distribution having the same mean and standard deviation as the collected data. The observed data appears on the horizontal axis (see adjacent chart).

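If the two distributions match perfectly, the points on the graph will fall on a straight line; when observed and theoretical values are plotted on the same scale, this is the 45° line with a slope equal to 1. For the data to be considered normal, all points should lie between the two curved bands above and below the 45° line, and the p-value of the normality test should be greater than 0.05.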
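The construction of the plot can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular package: the function names are hypothetical, and the median-rank plotting position (i - 0.3) / (n + 0.4) is an assumption, since statistical software offers several conventions. The correlation between observed and theoretical values is used here as a simple numerical measure of how straight the line is.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def probability_plot_points(data):
    """Return (observed, theoretical, percent) triples, one per data point."""
    ordered = sorted(data)
    n = len(ordered)
    # Theoretical normal with the same mean and standard deviation as the data.
    fitted = NormalDist(fmean(ordered), stdev(ordered))
    points = []
    for i, observed in enumerate(ordered, start=1):
        p = (i - 0.3) / (n + 0.4)  # median-rank plotting position (assumed)
        points.append((observed, fitted.inv_cdf(p), 100.0 * p))
    return points


def straightness(points):
    """Pearson correlation between observed and theoretical values."""
    xs = [obs for obs, _, _ in points]
    ys = [theo for _, theo, _ in points]
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)
normal_sample = [rng.gauss(50.0, 5.0) for _ in range(200)]
skewed_sample = [rng.expovariate(1.0) for _ in range(200)]

print(straightness(probability_plot_points(normal_sample)))  # near 1
print(straightness(probability_plot_points(skewed_sample)))  # noticeably lower
```

Plotting the first element of each triple on the horizontal axis against the third on a normal-probability-scaled vertical axis gives the kind of chart described above.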

The Anderson-Darling test statistic measures how well the data follows a specified distribution: the smaller the statistic, the better the fit. For example, we can use the test to determine whether the data meets the assumption of normality for a t-test. We can use MINITAB to do this.
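For reference, the Anderson-Darling statistic is commonly written as below. The notation is not taken from the original text and is introduced here only for illustration: n is the sample size, x_(1) ≤ ... ≤ x_(n) are the data sorted in ascending order, and F is the cumulative distribution function of the normal distribution fitted to the data.

```latex
A^2 = -n - \frac{1}{n} \sum_{i=1}^{n} (2i - 1)
      \left[ \ln F\left(x_{(i)}\right) + \ln\left(1 - F\left(x_{(n+1-i)}\right)\right) \right]
```

Because the logarithms become large in magnitude when F is close to 0 or 1, the statistic gives more weight to the tails of the distribution than to its center.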

The hypotheses for the Anderson-Darling test are:

❖ H0: The data follows a normal distribution;

❖ H1: The data does not follow a normal distribution.

We use p-values to test for normality. If the p-value is greater than the chosen significance level, also called the "alpha risk" (usually 0.05 or 0.10), the null hypothesis cannot be rejected. In the image, we can see that the p-value is 0.1 (> 0.05).

Under these conditions, we cannot reject the null hypothesis: the data shows no significant departure from normality, so we can treat it as following a normal distribution.
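The decision rule above can also be sketched in Python for readers who do not have MINITAB at hand. This is a rough sketch under stated assumptions, not a reproduction of any package's output: the function name is hypothetical, the mean and standard deviation are estimated from the data, and the p-value uses the approximation published by D'Agostino and Stephens (1986) for that case, so results may differ slightly from other software.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def anderson_darling_normal(data):
    """Return (A2, p_value) for H0: the data follows a normal distribution."""
    ordered = sorted(data)
    n = len(ordered)
    fitted = NormalDist(fmean(ordered), stdev(ordered))
    # Clamp the CDF away from 0 and 1 so the logarithms stay finite.
    eps = 1e-12
    cdf = [min(max(fitted.cdf(x), eps), 1.0 - eps) for x in ordered]
    total = sum(
        (2 * i - 1) * (math.log(cdf[i - 1]) + math.log(1.0 - cdf[n - i]))
        for i in range(1, n + 1)
    )
    a2 = -n - total / n
    # Small-sample adjustment used when the parameters are estimated.
    a2_adj = a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)
    # Approximate p-value (D'Agostino and Stephens, 1986).
    if a2_adj >= 0.6:
        p = math.exp(1.2937 - 5.709 * a2_adj + 0.0186 * a2_adj ** 2)
    elif a2_adj > 0.34:
        p = math.exp(0.9177 - 4.279 * a2_adj - 1.38 * a2_adj ** 2)
    elif a2_adj > 0.2:
        p = 1.0 - math.exp(-8.318 + 42.796 * a2_adj - 59.938 * a2_adj ** 2)
    else:
        p = 1.0 - math.exp(-13.436 + 101.14 * a2_adj - 223.73 * a2_adj ** 2)
    return a2, p


alpha = 0.05  # significance level ("alpha risk")
rng = random.Random(1)
samples = {
    "normal": [rng.gauss(50.0, 5.0) for _ in range(200)],
    "skewed": [rng.expovariate(1.0) for _ in range(200)],
}
for name, sample in samples.items():
    a2, p = anderson_darling_normal(sample)
    verdict = "cannot reject H0" if p > alpha else "reject H0"
    print(f"{name}: A2 = {a2:.3f}, p = {p:.4f} -> {verdict}")
```

The simulated samples are only there to make the example runnable; in practice the function would be applied to the measured data. The p-value approximation is intended for moderate values of the statistic and should not be trusted for extremely large ones.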