Statistical data can be "distributed" (spread, dispersed, scattered) in a variety of ways.
See Shapes of Distributions.

Bell-shaped Curve:

For some sets of data, the graph of the data is symmetrical with a single central peak at the mean (average) of the data. The shape of the curve is described as bell-shaped, with the graph falling off evenly on either side of the mean. Fifty percent of the distribution lies to the left of the mean and fifty percent lies to the right of the mean. Such graphs are called normal curves, and the data are said to follow a normal distribution. The mean, median and mode are all the same in a normal distribution.

Notice how the histogram closely follows the form of the bell curve.

A normal distribution is the most widely known and used of all distributions. It is an extremely important statistical data distribution pattern occurring in many natural phenomena, such as blood pressure, machined parts, human height, error in measurement, IQ scores, sizes of snowflakes, lifespans of light bulbs, weights of loaves of bread, test scores, milk production in cows, etc. When data pertaining to these phenomena are graphed as histograms, with the data values on the horizontal axis and the frequency (the amount of data) on the vertical axis, a bell-shaped curve (normal curve) may be created.

A normal distribution is actually a "family of distributions", since the mean and standard deviation, which determine the shape of the distribution, may differ from graph to graph.

Normal Distribution Basic Properties:
1. symmetric about the mean
2. the mean = the mode = the median
3. the mean divides the data in half
4. defined by mean and standard deviation
5. the curve is unimodal (one peak)
6. the curve approaches, but never touches, the x-axis, as it extends farther and farther away from the mean
7. total area under the curve = 1
(Characteristics of perfectly normal distributions.)

While the four normal curves shown above (at the right) share all of these basic properties, they are still unique (different from one another) as to mean and standard deviation.

Standard Deviation: See Refresher: Variance and Standard Deviation.
       (Used in this context, the term "deviation" refers to "how far from the mean".)

A normal distribution can have any mean and any positive standard deviation. The mean determines the line of symmetry of the graph, and the standard deviation determines how much the data are spread out.

The smaller the standard deviation, the more concentrated the data and the narrower the graph. The larger the standard deviation, the more dispersed the data and the wider the graph.
Population standard deviation = σ (lowercase Greek sigma); population mean = μ (lowercase Greek mu).
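
To see the effect of the standard deviation numerically, here is a minimal Python sketch using the standard library's NormalDist. The mean of 80 and the three standard deviations are hypothetical values chosen only for illustration, and the helper name peak_and_central_share is our own.

```python
from statistics import NormalDist

def peak_and_central_share(mu, sigma, half_width):
    """Return (height of the curve at the mean,
    share of the data within half_width units of the mean)."""
    curve = NormalDist(mu, sigma)
    peak = curve.pdf(mu)
    share = curve.cdf(mu + half_width) - curve.cdf(mu - half_width)
    return peak, share

# Hypothetical values: same mean (80), three different standard deviations.
for sigma in (5, 10, 20):
    peak, share = peak_and_central_share(80, sigma, 10)
    print(f"sigma = {sigma:>2}: peak height = {peak:.4f}, "
          f"within 10 units of the mean = {share:.1%}")
```

As the standard deviation grows, the peak gets lower and a smaller share of the data sits close to the mean, which is the "wider graph" described above.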

A Closer Look at the Shape of the Normal Curve:
The graph shown above states that for this normal distribution the mean is 80 and the standard deviation is 20. The graph also shows that the mean (80) + 1 standard deviation (20) equals 100, the mean (80) + 2 standard deviations (40) = 120, and so on, to both the left and right sides of the mean.

Could you approximate the mean and standard deviation of a normal curve if this specific information was not stated on the graph?

If you were given a normal curve, without being told the mean and the standard deviation, you could approximate this information based upon the shape of the curve.
• The mean in a normal curve divides the curve symmetrically. Therefore, the vertical line at the mean will pass through the highest point on the graph. In this example, the peak sits above 80, so it is logical to assume that the mean is 80.

• In a normal curve, the point at which the graph changes from curving downward to curving upward is called an inflection point and occurs at plus (or minus) one standard deviation from the mean. Examining a normal curve for this location will yield an approximation of the value of the standard deviation. In this example, an inflection point can be seen to be occurring around 100, or approximately 20 points above the mean. The standard deviation could be approximated to be 20.
(Think of an inflection point as the center point when drawing a figure eight. It is the point where the bending changes from the top of the 8 to the bottom.)
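
The inflection-point idea can be checked numerically. The sketch below is only an illustration: it uses the mean (80) and standard deviation (20) from the example above, and it estimates the "bending" of the curve with a simple finite-difference approximation (the helper name curvature and the step size h are our own choices).

```python
from statistics import NormalDist

def curvature(curve, x, h=0.01):
    """Approximate the second derivative of the curve's height at x
    using a central difference. Negative = curving downward."""
    return (curve.pdf(x + h) - 2 * curve.pdf(x) + curve.pdf(x - h)) / h ** 2

curve = NormalDist(mu=80, sigma=20)  # mean and standard deviation from the example

# Sample points on both sides of 60 and 100 (one standard deviation from the mean).
for x in (50, 70, 90, 110):
    bend = "downward" if curvature(curve, x) < 0 else "upward"
    print(f"x = {x:>3}: curving {bend}")
```

The curve bends downward between 60 and 100 and upward outside that interval, so the change in bending happens at one standard deviation on either side of the mean.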

Percentages Under a Normal Curve:

As seen in the previous section, the standard deviation can be used to sub-divide the space (the area) under a normal curve, starting from the mean. Each of these sub-divided sections can be used to represent a portion (a percentage) of the data falling into these sections of the graph. The normal curve actually shows how likely it is to find a value within a specific distance from the mean.

Using 1 standard deviation to create the subdivisions:
The most popular subdivision utilizes distances from the mean in increments of one standard deviation of that specific normal curve. When dealing with a normal curve:
approximately 68% of the data will fall within one standard deviation of the mean
      (between the mean minus one standard deviation and the mean plus one standard deviation),
approximately 95% of the data will fall within two standard deviations of the mean
      (between the mean minus two standard deviations and the mean plus two standard deviations), and
approximately 99.7% of the data will fall within three standard deviations of the mean
      (between the mean minus three standard deviations and the mean plus three standard deviations).
These three facts make up what is referred to as the Empirical Rule (or the 68-95-99.7 Rule).
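
For readers who want to see where these rounded percentages come from, here is a short Python sketch using the standard library's NormalDist. The helper name share_within is our own; the default mean of 0 and standard deviation of 1 are simply a convenient choice, since the result is the same for any normal curve.

```python
from statistics import NormalDist

def share_within(k, mu=0.0, sigma=1.0):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    curve = NormalDist(mu, sigma)
    return curve.cdf(mu + k * sigma) - curve.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s) of the mean: {share_within(k):.2%}")
```

The computed values (about 68.27%, 95.45% and 99.73%) round to the 68%, 95% and 99.7% of the Empirical Rule.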

These percentages represent the probability of data falling within given distances from the mean of a normal curve. The probability of the data falling somewhere on the graph is 100%. Expressed as decimals, we have 0.68, 0.95, 0.997, and 1.0. There is a correspondence between probability and area under the curve that will be discussed in Understanding Z-Scores.

NOTE: Normal distributions may also be referred to as normal probability distributions.

Using ½ of 1 standard deviation to create the subdivisions:
It is possible to subdivide the area under a normal curve into smaller intervals, such as widths of 0.5 standard deviations, as shown in the graph below. The sums of the percentages in this graph differ slightly from the Empirical Rule values, which are rounded approximations. These smaller subdivisions are used when the information presented in a question falls on increments of one-half of one standard deviation from the mean.

Graph based upon 0.5 standard deviation subdivisions.
[68.2% - 95.4% - 99.8%]
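
As a rough check on totals such as 68.2%, 95.4% and 99.8%, the following Python sketch computes the percent of data in each half-standard-deviation band on one side of the mean, rounded to one decimal place. It is a sketch under the assumption that the chart's entries are rounded this way; the helper name half_sd_bands is our own.

```python
from statistics import NormalDist

def half_sd_bands():
    """Percent of data in each 0.5-standard-deviation band,
    from the mean out to 3 standard deviations, rounded to 0.1."""
    z = NormalDist()                      # standard normal: mean 0, standard deviation 1
    edges = [0.5 * i for i in range(7)]   # 0, 0.5, 1.0, ..., 3.0
    return [round(100 * (z.cdf(b) - z.cdf(a)), 1)
            for a, b in zip(edges, edges[1:])]

bands = half_sd_bands()
print("bands on one side of the mean:", bands)

# Add the rounded bands on both sides of the mean.
for k in (1, 2, 3):
    total = 2 * sum(bands[: 2 * k])
    print(f"within {k} standard deviation(s): {total:.1f}%")
```

Because each band is rounded before adding, the totals come out slightly different from the Empirical Rule values.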

Not all data is normally distributed.

Statisticians use both simple and complex mathematical techniques to determine if a data set is distributed normally. The more data that is available, the more reliably it can be determined whether the population data is normal or not.

The simplest test for normality is to make a histogram of the data. If the shape of the distribution resembles a bell curve, the data is likely normal. Further examinations such as whether the mean equals the median, and whether approximately 68% of the data is within one standard deviation of the mean, 95% within two standard deviations and 99.7% within three standard deviations may help verify if the data comes from a population that is normally distributed.

Example:

Kayla's scores in Chemistry this semester were rather inconsistent: 100, 85, 55, 95, 75, 100. What percent of Kayla's scores are within one standard deviation of the mean?
Solution: At first reading, you may want to jump to the conclusion that the answer is 68%, because the Empirical Rule tells us that 68% of the data falls within one standard deviation of the mean (for a normal distribution). But this is NOT a normal distribution.

Looking at the histogram above, it can be seen that this data will not resemble a bell-shaped curve. With a little mathematical computation (or a little help from the graphing calculator), it can be shown that the mean (85) and the median (90) are not equal.
• The mean for this set: 85
• The population standard deviation: 16.07275127
• The interval for one standard deviation about the mean:    from 85 - 16 to 85 + 16 or (69 to 101)
• The number of scores in this interval: 5
• The total number of scores: 6
• Percentage: 83.33333333%
This is an extremely small data set. In this question, the data is the "population" since we are only looking at Kayla's scores (not in relation to a bigger set of school-wide scores).
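
The computations for Kayla's scores can be reproduced without a graphing calculator. This is a minimal Python sketch using the standard library; the helper name percent_within_one_sd is our own, and pstdev is used because the scores are being treated as the whole population.

```python
from statistics import mean, median, pstdev

def percent_within_one_sd(data):
    """Percent of values within one population standard deviation of the mean."""
    mu = mean(data)
    sigma = pstdev(data)
    inside = [x for x in data if mu - sigma <= x <= mu + sigma]
    return 100 * len(inside) / len(data)

scores = [100, 85, 55, 95, 75, 100]   # Kayla's Chemistry scores

print("mean:", mean(scores))
print("median:", median(scores))
print("population standard deviation:", pstdev(scores))
print("percent within one standard deviation:", percent_within_one_sd(scores))
```

The mean and median differ, and five of the six scores fall inside the interval, which is well above the 68% expected from a normal distribution.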

If Kayla's scores were part of a bigger set of school-wide scores, this data would not necessarily tell us that the larger "population" was also not normally distributed. This would be too small of a sample set to be used to make inferences about the larger population.

If a small sample set appears to be normal, it is dangerous to make the assumption that the population is also normal. Such an assumption may lead to an incorrect statistical analysis, and incorrect implications regarding the data.

More sophisticated tests for normality require computer software packages and complex calculations.

Examples:
Directions: Look for the words "normally distributed" in the question. If you are NOT told that the data is normally distributed, do not use the percentages in the Empirical Rule; use your graphing calculator to determine the information directly from the data. Some examples may deal with graphical increments of width one standard deviation or one-half of one standard deviation, as seen in the charts, and some may need a graphing calculator.

1. The ages of employees at Google are normally distributed. Within this curve, 95.4% of the ages, centered about the mean, are between 22.6 and 35.4 years. Find the mean age and the standard deviation of this data.
SOLUTION: 95.4% is the value that appears in the chart based on 0.5 standard deviation subdivisions. It corresponds to a span of 2 standard deviations on each side of the mean.

The mean age is symmetrically located between -2 standard deviations (22.6) and +2 standard deviations (35.4).
The mean age is (22.6 + 35.4) / 2 = 29 years of age.
From 29 to 35.4 (a distance of 6.4 years) is 2 standard deviations. Therefore one standard deviation is 6.4 / 2 = 3.2 years.
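
The same reasoning can be written as a small Python sketch. The helper name mean_and_sd_from_interval is our own, and the final line is only a sanity check that an interval of 2 standard deviations on each side of the mean does hold roughly 95.4% of a normal distribution.

```python
from statistics import NormalDist

def mean_and_sd_from_interval(low, high, k):
    """Given an interval centered on the mean that reaches k standard
    deviations to each side, return (mean, standard deviation)."""
    mu = (low + high) / 2        # the mean is the midpoint of the interval
    sigma = (high - mu) / k      # distance from mean to the edge, split into k parts
    return mu, sigma

mu, sigma = mean_and_sd_from_interval(22.6, 35.4, 2)
print(f"mean = {mu:.1f} years, standard deviation = {sigma:.1f} years")

# Sanity check: share of ages between 22.6 and 35.4 for this normal curve.
ages = NormalDist(mu, sigma)
print(f"share between 22.6 and 35.4: {ages.cdf(35.4) - ages.cdf(22.6):.1%}")
```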

2. The amount of time a trainer spends with a young elephant in any given week is normally distributed. If the trainer spends on average 12 hours per week, with a standard deviation of 3 hours, what is the probability that the trainer spends between 12 and 15 hours a week with the elephant?
SOLUTION: The average (mean) is 12 hours per week. If the standard deviation is 3, the interval between 12 and 15 is the interval between the mean and one standard deviation above the mean. The probability of spending between 12 and 15 hours per week is 34% if you use the Empirical Rule (half of 68%), or 34.1% if you use the chart with 0.5 standard deviation subdivisions.
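
For comparison, this probability can also be computed directly. The following is a minimal Python sketch using the standard library's NormalDist; the helper name prob_between is our own.

```python
from statistics import NormalDist

def prob_between(mu, sigma, low, high):
    """Probability that a normally distributed value falls between low and high."""
    curve = NormalDist(mu, sigma)
    return curve.cdf(high) - curve.cdf(low)

# Mean 12 hours, standard deviation 3 hours, interval from 12 to 15 hours.
p = prob_between(12, 3, 12, 15)
print(f"P(12 <= hours <= 15) = {p:.3f}")
```

The result is about 0.341, in line with the 34.1% read from the chart.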

3. A litter of 12 puppies arrived at a shelter for adoption. The puppies weighed 5.2, 6.1, 6.4, 4.9, 5.3, 6.0, 5.3, 5.4, 6.1, 5.0, 5.1, and 5.0 pounds.
a) Show that this data is not normally distributed.
b) Find the mean and the standard deviation.
c) What percent of the puppies weigh more than one standard deviation above the mean?
d) What percent of the puppy weights would have been more than one standard deviation above the mean if the data had been normally distributed?
SOLUTION:
a) Examine a histogram of the data to see if a bell-shaped curve is visible. This histogram is not bell-shaped. This data is not normally distributed.
b) The mean is 5.483333333.
The population standard deviation is 0.4980517599.

c) % more than 1 standard deviation above mean: 5.483333333 + 0.4980517599 = 5.981385093
There are 4 puppies whose weight is above 5.981385093 pounds.
4/12= 0.3333333333 = 33 1/3%

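d) If the data had been normally distributed, approximately 16% of the weights would have been more than one standard deviation above the mean. (About 68% lies within one standard deviation of the mean, leaving 32% outside, half of which lies above.)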
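
Parts b) through d) can be reproduced with a short Python sketch using only the standard library. The variable names are our own, and pstdev is used because the solution above works with the population standard deviation.

```python
from statistics import NormalDist, mean, pstdev

weights = [5.2, 6.1, 6.4, 4.9, 5.3, 6.0, 5.3, 5.4, 6.1, 5.0, 5.1, 5.0]

mu = mean(weights)                 # part b) mean
sigma = pstdev(weights)            # part b) population standard deviation
cutoff = mu + sigma                # one standard deviation above the mean

heavy = [w for w in weights if w > cutoff]
observed = len(heavy) / len(weights)       # part c) share actually above the cutoff
expected = 1 - NormalDist().cdf(1)         # part d) share above +1 SD on a normal curve

print(f"mean = {mu:.4f}, standard deviation = {sigma:.4f}, cutoff = {cutoff:.4f}")
print(f"observed above cutoff: {len(heavy)} of {len(weights)} = {observed:.1%}")
print(f"expected if normal:    {expected:.1%}")
```

The computed value for part d) is slightly under 16% because the Empirical Rule's 68% is itself a rounded figure.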

4. The lifetime of a battery is normally distributed with a mean life of 40 hours and a standard deviation of 1.2 hours. Find the probability (percentage) that a randomly selected battery lasts longer than 42 hours.

Even though the data in this problem are normally distributed, the most accurate answer cannot be obtained by using the Empirical Rule or the charts on this page. For this problem, one standard deviation above the mean will be located at 41.2 hours, two standard deviations will be at 42.4, and one and one-half standard deviations will be at 41.8. None of these locations corresponds exactly to the needed 42 hours. We need more power to find this solution. Let's use the graphing calculator!

We will see in Understanding Z-Scores that this problem can also be solved without a graphing calculator. But if you know how to use your graphing calculator, you will find the process easier and faster.

SOLUTION by Graphing Calculator:
Graph the normal curve. We can see from the location of 42 on the graph that the answer is going to be quite small. Common sense says that a lifespan over 42 hours should be unlikely, since 42 is more than one and one-half standard deviations above the mean of 40.

Go to the "Y=" menu first.
2nd VARS (DISTR)
• we wish to see all x values
• mean μ is 40
• standard deviation σ is 1.2

"Paste" places the parameters into Y1 (or you can type them there directly).
See Graphing Calculator Statistics 2 for guidelines for the "WINDOW".

42 lies toward the right side of the graph, showing little space (area) left to the right of 42.

Now, determine the probability (percentage) of a value falling to the right of 42 hours (between 42 hours and infinity).
2nd VARS (DISTR)
The E results from typing EE (above the comma key).
Or type directly:
ShadeNorm(42,1E99,40,1.2)
Answer under "AREA": 4.8%
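
If a graphing calculator is not at hand, the same area can be approximated in Python. This is a minimal sketch using the standard library's NormalDist; the variable names are our own, and the second computation simply mirrors the calculator's use of 1E99 as a stand-in for infinity.

```python
from statistics import NormalDist

battery = NormalDist(mu=40, sigma=1.2)   # lifetime in hours

# Area to the right of 42 hours = 1 minus the area to the left of 42.
p_longer = 1 - battery.cdf(42)

# Same area written the calculator's way: from 42 up to a huge upper bound.
p_shaded = battery.cdf(1e99) - battery.cdf(42)

print(f"P(lifetime > 42 hours) = {p_longer:.4f}")
print(f"as a percentage: {p_longer:.1%}")
```

Both computations give roughly 0.048, which agrees with the 4.8% shown under "AREA".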

NOTE: The re-posting of materials (in part or whole) from this site to the Internet is copyright violation
and is not considered "fair use" for educators. Please read the "Terms of Use".