![]() |
We shall simulate the creation of a normal probability plot for a
case where the data are known to be non-normal, namely data drawn
from an exponential distribution. The standard exponential
distribution We shall reproduce the steps in Tutorial #2 for Excel to generate a random sample from an exponential distribution. |
On the main menu bar, click on 'Data', then 'Data Analysis'.
In the new dialog box, |
![]() |
In the new dialog box, Number of variables: set to 1 Use the pull down menu to set the Accept the default parameters 0 and 1. In the output options group, Click 'OK'. |
![]() |
![]() |
A set of random numbers appears in cells 'A2' to 'A101'. Note that your values will be different from the ones you see here. |
![]() |
Let us fix the top row, so that we. On the main menu bar at the top, Click on the middle option |
To generate random values from the exponential distribution with mean
μ = 1 / λ
(and standard deviation = σ = μ = 1 / λ),
we need to take the negative of the natural logarithm of these
uniform random values and scale by the mean:
y = –μ × ln(x).
For the standard exponential distribution, μ = 1.
In cell 'B2', enter the formula |
![]() |
![]() |
Click and hold on the bottom right corner of cell 'B2', Copy-drag the formula down to cell 'B101' and release. Column 'B' now contains a random sample of 100 values drawn from the standard exponential distribution (mean 1). |
We shall reproduce some of the steps from
Tutorial #1.
Enter the text as shown in columns 'D' and 'G'.
Enter formulas for the summary statistics: | ![]() | |
Cell 'E2' (n): | = COUNT(B:B) | |
Cell 'E3' (Sum): | = SUM(B:B) | |
Cell 'E4' (Mean): | = E3/E2 | |
Cell 'E6' (s.d.): | = STDEV.S(B:B) | |
Cell 'E8' (min): | = MIN(B:B) | |
Cell 'E9' (Q1): | = QUARTILE.EXC(B:B, 1) | |
Cell 'E10' (median): | = MEDIAN(B:B) | |
Cell 'E11' (Q3): | = QUARTILE.EXC(B:B, 3) | |
Cell 'E12' (max): | = MAX(B:B) | |
Cell 'H5' (IQR): | = E11-E9 | |
Cell 'H6' (SIQR): | = H5 / 2 | |
Cell 'H8' (UOF): | = E11 + 3*H5 | |
Cell 'H9' (UIF): | = E11 + 3*H6 | |
Cell 'H11' (LIF): | = E9 - 3*H6 | |
Cell 'H12' (LOF): | = E9 - 3*H5 |
![]() |
First select all of column 'B'. |
By clicking on various elements within this chart you can
customize it.
Note that the details of your graph are likely to differ somewhat from this
graph. Your random sample will be different from this one.
Open Excel and import the data set, for which you wish to create a normal probability plot, into a column.
In this demonstration, we shall use the same file that was created above.
Alongside the genuine data, The new values can be any set of distinct
numbers. |
![]() |
![]() |
Click on the menu item
" Click on " |
Scroll down the list of Analysis Tools
and click on " then click on the " |
![]() |
![]() |
In the Regression dialog box: Select the range where your data are stored Select the range where the junk data are stored Check the box for confidence level. Choose a location for the tables of values,
(most of which will be irrelevant!). Check the last box " OK " button. |
At the location where the
output was chosen to be, various tables and a graph appear.
Ignore the tables.
Resize the graph.
Right-click on any of the plotted points.
Click on "Add Trendline...
".
A new pane appears to the right.
The default option (linear trend) is OK.
You may also wish to tidy the graph up somewhat, to produce a result
like this:
A random sample drawn from a normal distribution will have all (or nearly all) of its points near the straight line of a normal probability plot. It is visually obvious that this sample is inconsistent with having been drawn from a normal population.
There are problems with this plot:
The two axes are in the wrong order
The scale for the 'sample percentile' axis should be double logarithmic,
not linear.
Confidence interval curves are missing.
Minitab overcomes all of these problems easily.
We can also see what a normal probability plot looks like for data that really are drawn from a normal population.
In the main menu bar, click on 'Data' then 'Data Analysis'
Find and select 'Random Number Generation' Click on 'OK' |
![]() |
Enter the values shown in the 'Random Number Generation' dialog box
and click 'OK'.
Now we can repeat the steps to create a normal probability plot.
Scroll down the list of Analysis Tools
and click on " then click on the " |
![]() |
![]() |
In the Regression dialog box: Select the range where your data are stored Select the range where the junk data are stored Check the box for confidence level. Choose a location for the tables of values,
(most of which will be irrelevant!). Check the last box " OK " button. |
At the location where the output was chosen to be, various tables and a graph appear.
Resize the graph.
Right-click on any of the plotted points.
Click on "Add Trendline...
".
Ignore the new pane to the right.
You may also wish to tidy the graph up somewhat, to produce a result
like this:
Except for a few points at each end, these sample values fall very close to the straight line of a normal probability plot. It is visually obvious that this sample is consistent with having been drawn from a normal population.
The Excel file is NormPlot3.xlsx.
You can compare the graphs of the
probability density functions for some standard probability
distributions.
standard exponential![]() |
standard normal![]() |
---|---|
standard log-normal![]() |
standard Cauchy![]() |
beta (18, 2)![]() |
We shall use Excel to
In a new Excel workbook, type in the text shown.
Again I have chosen to change the font for the entire workbook to Arial 12 and
to change the width of the first few columns to 10.
Right clicks on cells (or on characters within a cell entry) allow you to control
cell alignment (left, centre or right) and font (Symbol font italic to change
'm' to the mean μ and 's' to the standard deviation σ)
![]() |
To change 'zcrit' to 'zcrit', In the pop-up menu, |
In the new dialog box, Click on 'OK'. |
![]() |
Copy the newly modified cell 'D5' to 'G5'.
The standard error (s.e.) is
the standard deviation σ divided by
the square root of the sample size n.
In cell 'B5' enter = B3/SQRT(B4)
For two sided 95% confidence intervals,
the critical z value is z.025
In cell 'E5' enter
= -NORM.S.INV(0.025)
For one sided 95% confidence intervals,
the critical z value is z.050
In cell 'H5' enter
= -NORM.S.INV(0.05)
In the main menu bar, click on 'Data' then 'Data Analysis'
Find and select 'Random Number Generation' Click on 'OK' |
![]() |
Enter the values shown in the 'Random Number Generation' dialog box
and click 'OK'.
When we scroll down through the data, we wish to keep the top eight rows in view.
![]() |
Click on cell 'A9', In the main menu bar, click 'View', All rows above cell 'A9' remain in view. |
Column 'A' contains values of the standard normal random quantity
For each simulated random sample, the sample mean is z standard errors
above the true population mean μ.
![]() |
In cell 'B9' enter the formula (The '$' provides an absolute reference |
Drag-copy cell 'B9' down to row 208.
When the population standard deviation σ is known, the
boundaries cL, cU
of the two-sided
xBar ± zα/2×(s.e.)
The one sided lower confidence interval is
μ < xBar +
zα×(s.e.)
The one sided upper confidence interval is
μ > xBar –
zα×(s.e.)
The two sided confidence interval captures the true value of the
population mean if and only if
Similarly, the one sided confidence interval captures the true value of the population mean if and only if μ is inside the interval.
Enter the following formulas:
In cell 'D9': | = B9-E$5*B$5 |
In cell 'E9': | = B9+E$5*B$5 |
In cell 'F9': | = IF(B$2>D9,IF(B$2<E9,1,0),0) |
In cell 'G9': | = B9-H$5*B$5 |
In cell 'H9': | = IF(B$2>G9,1,0) |
In cell 'I9': | = B9+H$5*B$5 |
In cell 'J9': | = IF(B$2<I9,1,0) |
Drag-copy cells 'D9' to 'J9' down to row 208.
Column 'E' now records how many times the 2-sided confidence interval captured the true value of μ.
In cell 'J4' enter = SUM(F9:F208)/2
Unfortunately, pressing 'F9' will not draw a new set of 200
random samples.
One has to overwrite the values in column 'A'
with a new set of random normal values.
The proportion of intervals that contain the true mean will vary,
but will mostly be close to 95%.
The user can vary the values of μ, σ and n.
One can extend this spreadsheet to simulate 99% confidence intervals (or any other level of confidence) by amending the formula for the critical z value in cells 'E5' and 'H5'.
A more extensive revision, requiring the use of Excel macros or a commercial add-in, is to estimate the population standard deviation by the sample standard deviation of each random sample. Critical values of z should be replaced by the corresponding critical values of tn-1.
Also available is a Minitab macro to simulate the construction of confidence intervals, from many random samples, for a population mean and to show more effectively that the proportion of 95% confidence intervals that capture the true value of the population mean is close to 95%.