12. Statistical Analysis

This section is “voluntary”. You can skip it and still self-certify that you have covered the course.

It is a long section about statistical analysis rather than experimental design, giving a single example of each type of statistical analysis. The raw data is available so that you can try it out on your own software, should you want to do so.

1. Steps in the statistical analysis of your experiment

2. Statistical software

3. Example 1 A two-sample experiment using a t-test,

4. Example 2. A one-way ANOVA

5. Example 3. A randomised block design

6. Example 4. A factorial experimental design

7. Example 5 A split plot design

8. Example 6. Correlation

9. Example 7. Linear Regression

Steps in the statistical analysis of your experiment

1. Prepare the data for entry into your statistical software package (see below). It is probably easiest to do this in a spread sheet, although all software offers you the possibility of manipulating it.

The data needs to be entered with one row per experimental unit, with columns for “Treatment”, “Block”, “Factor A” etc, depending on the number and types of factor.

2. Look at the raw data graphically using a plot which shows individual observations. The aim is to get a general feeling for the results and to pick out any unexpected observations or patterns.

3. Obtain summary statistics. These are group means/medians, standard deviations, inter-quartile ranges, correlation coefficients, regression coefficients or whatever other statistics are of interest.

4. Do a statistical analysis to estimate statistical the significance of differences between means etc. with standard errors and/or confidence intervals.

5. Interpret the results.

6. Prepare graphical presentations of the results

Statistical software

If you have access to one of the commercial statistical packages designed for teaching and general use, such as MINITAB, SPSS or Graphpad then you are advised to use it. If you don’t have such access, then you might like to try using “R Commander”.

“R” is a statistical package widely used by professional statisticians. It is command driven and is therefore more difficult to learn than software which is menu-driven. But there is a front end called “R Commander” (Rcmdr) which is menu driven. It is easier to use but at the cost of being less flexible.  R and Rcmdr can be freely down loaded to PC, Unix and Mac computers from the CRAN web site. http://cran.r-project.org/bin/windows/base/ . Details of how to install Rcmdr are found here. Read them carefully. Once Rcmdr is installed it is invoked from within R. So start R by clicking its ikon and then give the command “library(Rcmdr)” (case sensitive). An introduction to Rcmdr is available on the Help menu.

Data entry

Data entry is described for the for Rcmdr. This (and most other software) requires the data to be prepared with one row per experimental unit. Columns should be labeled using a single word or two words without a gap. A period or other separator (e.g. “Dose.mg”) can be used if two words are needed. One or more columns should be used to indicate the factor(s) (treatments and blocks, if present), and another to show the outcome measurement. Missing data should be entered as NA (when using R), Although this differs between software packages.

The figure below is a screen shot of the R Commander interface. The data shown in the box at the right side of the screen shot was prepared in EXCEL, copied to the clipboard and then imported into Rcmdr (Data-> Import data->, clipboard, then click the clipboard button).

The Rcmdr window has menu items along the top (File, Edit, Data etc) and some buttons, one of which is “View data set”. There are two large windows. The top one shows the commands generated by the menu manipulations (in this case all about loading the data), and a lower “Output window” which shows the results.

These results can be marked and copied to a word processor etc. Graphs can be right-clicked and saved as a metafile. A command in the top window can be modified and run by marking it and clicking “Submit”. This might be useful in modifying the titles in graphs.

Having loaded the data the aim is to look at it graphically, obtain summary statistics (means and SDs) and do a t-test to see whether the group means differ significantly.

Example 1 A two-sample experiment using a t-test, (Raw data here)

This is data on the body weights of mice maintained on a diet with either roast or raw peanuts (made-up data). The raw data is shown here. In this data set the Group is coded numerically. We need to tell Rcmdr that “Group” is a factor, and not a numerical variable (it would have been better to use alphabetic characters for the group variable such as “Control” and “Treated”. Failure to do this will give the wrong result in most cases). This is done using the menu commands: “Data”->, “Manage variables in the data set”-> “Convert numerical variables to factors”, then indicate which column is to be converted. There is a provision to rename the group variables, but in this case numbers have been retained (but now the software knows that they are factors).

A graphical display of the data is obtained from Graphs-> Stripchart, and in this case “jitter” has been chosen. This adds some random variability on the X-axis so that points do not overlap too much..

Inspection of the plot shows no obvious outliers, so we proceed to get means and standard deviations. This is obtained by “Statistics ->Summaries->, Numerical summaries” and filling in a box to indicate that the summary should be by groups. The output is given here (output can be marked and copied. It is best expressed in the Courier typeface to maintain correct spacing).

mean    sd     n
1 40.6     4.50    11
2 44.2     2.96    11

(rounded to three significant digits)

A two-sample t-test is obtained from Statistics-> Means-> Independent samples t-test (below)

Two Sample t-test

data:  Weight by Group
t = -2.1829, df = 20, p-value = 0.04114
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-6.9334705 -0.1574386
sample estimates:
mean in group 1 mean in group 2
40.63636        44.18182

When presenting the results, the means and SDs should be rounded to three significant digits (e.g. 40.6, SD= 4.50), the t-value and p-values should be given (also rounded to three significant digits) so t=2.18, df=20, p=0.04. The 95% confidence interval for the difference (-6.93 to -0.16) can also be given. (Note that the minus signs here are because differences were Group1 minus Group 2, but which group is taken from which is purely arbitrary).

Example 2. A one-way ANOVA (raw data here)

The Analysis of variance partitions the total variation in a set of data into various parts, depending on the design of the experiment. In the simplest case the total variation will be split into treatment variation and residual or error variation.

First, the data was read into Rcmdr. Then a “stripchart” was produced (the figure below) It is easy to see that the total variation (the length of the Y-axis) has been increased by the urethane treatment. The ANOVA will indicate whether this increase is likely to be due to chance sampling variation. If that is unlikely, then it will be attributed to the effect of the treatment,

Mice were treated with either urethane or methylcholanthrene (MCA) in a completely randomised design. A mouse was the experimental unit. The number of micronuclei were counted in a sample of blood. The aim was to see if these two compounds (carcinogens) increased the number of micronuclei, compared with the controls.

Inspection suggests that Urethane might be different from the controls, but MCA does not appear to do so.

Using Statistics-> Summaries-> Numerical summaries produces the means, SDs, quartiles (omitted here) and n. It is debatable whether the quartiles are wanted in many cases.

Rounded to three decimal places

Next a one-way analysis of variance (Statistics->Means->One-way ANOVA.) was used to see if the means differed significantly. “Pairwise comparisons of means” was also checked to see which means differed significantly from each other.

The ANOVA comes out in the form of a table with the first column being the source of variation. Note that unlike some other statistical packages, the ANOVA table in R does not show a “Source” heading which has been added here (in green). In this case there are two sources: Group (the treatment) and Residuals. Most other packages call this “Error”, but residuals is an alternative (better?) term.

The second column (Df) shows the degrees of freedom (the total number of objects minus one). In this case there were three treatments so that represents two Df. Most other packages also show the total Df., 35 in this case. The column headed Sum Sq is a quantification of the variation associated with each source. In this case the variation is split about equally between the Group (treatment) and Residuals.

The column headed Mean Sq. is the Sum Sq divided by the residual Df. The F value is the Group Mean Sq. divided by the residuals mean sq. (11.0982/0.7929). F is a test statistic just like T, and it can be used in conjunction with the Df to calculate a p-value, in this case of 0.0000391. This is so low that we must reject the null hypothesis and accept the alternative hypothesis that group means differ.

More output from Rcmdr shows comparisons among the groups. This uses Tukey’s method for correcting for multiple comparisons. In this case MCA and Control means do not differ but Urethan and Control means do and the Urethane mean differs from the MCA mean (p=0.000 to three decimal places). There is a graphical representation, not shown here.

Finally, there are three assumptions which need to be met when using parametric statistics like the t-test and ANOVA.

1. The observations are independent. This depends on correct experimental design (correct experimental unit, randomisation and blinding),

2.The residuals (deviation of each observation from its group mean) have a normal distribution,

3. The variance is the same in each group. The last two assumptions are usually investigated using “Residuals models diagnostic plots”. Such a plot is found in Models->Graphs->Basic diagnostic plots. It is shown on the right.

The first plot is of Residuals vs Fitted (group means) to see if the variation is about the same in each group. The Urethane group is a bit more variable than the other two largely due to two “outliers”. However, from investigating the raw data there is no evidence that they are not real points, so are retained.

The second plot is designed so that if the residual have a normal distribution there will be a straight line. That is the case here apart from those two points. The third plot, Scale location” is a repeat of the first one but with a different scale on the Y-axis. As a rule of thumb, any points above 2.0 on the Y-axis would suggest heterogeneity of variance. This is not the case here.

The final plot “Constant leverage” gives more information about the individual points and helps to show up any outliers.

There are various ways of presenting the results graphically. A box plot “Graphs->Boxplot” gives the boxplot below and “Graphs->Plot of means” can also be obtained. If you want to modify the axis labels this can be done by changing the command in the top box (xlab= and ylab=)

More command can be obtained over the graphics by using R directly rather than R-Commander, but the language has to be learned.

Example 3. A randomised block design (Raw data here)

The aim of this study was to see whether mice preferred certain tastes. There were four cages, each containing two mice. They were given two water bottles, one of distilled water and the other of a test solution (salt, sucrose, ethanol and saccharin with distilled water as the control). The position of the bottles was changed each day and the solutions were offered for one week. The mice were then rested over the weekend and a new solution was given, in random order. The experimental unit is a cage of two mice for a period of time, and the blocking factor is cage. The data is given here.

The data was read into Rcmdr in the usual way, and the numerical designations for Cage was changed to make Cage a factor not a numerical variable.

A stripchart (Graphs->Stripchart) is shown on the right. There is clear evidence that mice prefer some solutions. The points are quite spread out because of differences between blocks (cages). A slightly irritating aspect of R and Rcmdr is that groups are shown in alphabetic order, although this can be changed.

The data needs to be analysed as a two-way ANOVA without interaction. The commands are Statistics->Fit models->Linear model.

A box then opens up and needs to be filled in as shown.

Cage and Treatment are entered as factors, separated by as plus (+) sign, as shown.

The immediate output is not very helpful. It shows which cages and treatments differ from the first cage (the factors are taken in alphabetic order).

The ANOVA table is shown in Models->Hypothesis tests-> ANOVA table. (Note that it is important to check that the Df for Cage and Treatment are correct. This would  not be the case if Cages had not been changed from a numerical variable to a factor)

As usual, the residuals diagnostic plots (Models-> Graphs-> Basic diagnostic plots) should be studied to test the normality of the residuals and homogeneity of variances. In this case all is normal (not shown)

Compared with the one-way ANOVA, there is now an additional row concerned with the variation caused by the blocking, which is separated out. Note that “Cages” variation is of no particular interest. It is a nuisance variable which is removed so that the effect of the treatment can be more clearly seen. Its effect was relatively small as shown by the Sum Sq, but had it not been removed the variation would have been part of the residuals, so would have made the treatment effect less significant. The treatment has a large and significant effect. The means can be calculated in Rcmdr as for the previous example:

However, the standard deviations include the variation between blocks. With a randomised block design the appropriate standard deviation is a pooled one and it comes from the square root of the error mean square in the ANOVA table, i.e. sqrt(15.4) = 3.92.

The ANOVA only tests the over-all hypothesis that there are differences among treatments. It does not indicate which ones differ from each other.  Rcmdr does not provide for multiple comparisons in a two-way design, but a test called “Fisher’s Protected Least Significant Difference” test can be done reasonably easily by hand (but only if there is over-all significance). The formula is shown here, where M1 and M2 are means, t is Student’s t with the appropriate df and significance level and S-squared is the error mean square in the ANOVA table. In this case t (0.05) with 12 Df  can be looked up in Rcmdr (Distributions->continuous distributions->t distribution->quantiles. Fill in the box for a probability of 0.05 and 12 Df  press return and t=1.782288). The variance (15.4 from the error M Sq in the ANOVA table) is multiplied by 2 and divided by n, which is 4 as each mean is made up of four numbers. The square root of that is 2.77, So the least significant difference is 1.7822 * 2.77 = 4.945. Any means that do not differ by this amount are not significantly different at p=0.05.

The results can be presented in a table by bracketing together means which do not differ from each other at the chosen significance level. The pooled SD (3.87 in this case) should also be reported. The results could also be presented as bar diagrams, in which case half the LSD of 4.945 should go up and half down.

Rcmdr also provides a “Treatment effect plot” of the points with confidence intervals (Models->Graphs->Effect plots) (not shown).

Example 4. A factorial experimental design (Raw data here)

The data used here comes from a larger study of the effect of chloramphenicol on haematology in four inbred strains and one outbred stock of mice ( Festing et al. Strain differences in haematological response to chloramphenicol succinate in mice: implications for toxicological research. Food and Chemical Toxicology 2001;39:375-383). The data is shown  here. It was read into Rcmdr.

Unfortunately there is no good way of looking at individual points classified in a two-way design in Rcmdr. However, if the data is regarded as being four groups (by adding “Group” which is not used in the statistical analysis) it is possible to produce a stripchart (Graphs->Stripchart).

Individual observations can now be seen. It is clear that strain C3H is responding, but CD-1 is not.

Tables showing means and standard deviations can be obtained from Statistics->Summaries->Table of statistics. These should only be quoted to three significant digits (as shown)

Treatment means

Strain Chloram Control
C3H  7.27   8.33
CD-1 8.66   8.51

Treatment standard deviations

Strain   Chloram   Control
C3H  0.374       0.391
CD-1 0.378     0.555

However, a pooled standard deviation should be quoted. This is obtained as the square root of the error mean square in the ANOVA table (see below, sqrt(0.1859)= 0.431).

An analysis of variance is done using Statistics->Fit models-> linear model and filling in the box as shown. Note that in this case the RBC counts depend on both the strain and the treatment, and we want to know whether there is an interaction with the two strains responding differently. So the separator between Strain and Treatment is “*”, implying multiplication.

As before, the immediate output is not very helpful. The ANOVA table is found in Models->Hypothesis tests-> ANOVA table.

It will be noted that in this case there is a statistically significant treatment x strain interaction (p=0.016).

A plot of means can be obtained from Graphs->Plot of means. However, it shows the strains on the X-axis when it is easier to understand if treatments are on that axis. Editing the script window by changing the order of the commands as shown below, and pressing “submit” as shown below results in a reasonable looking plot.

plotMeans(Dataset\$RBC, Dataset\$Strain, Dataset\$Treatment, error.bars="se")

plotMeans(Dataset\$RBC, Dataset\$Treatment,Dataset\$Strain, error.bars="se")

In view of the statistically significant interaction, it does not make sense to estimate the over-all means. Each mean will need to be shown separately as shown above. However, a pooled standard deviation should be used. This is obtained as the square root of the error mean square in the ANOVA table, i.e. sqrt(0.1859)= 0.431.

Example 5 A split plot design (Raw data here)

This is made up date (split-plot designs are not common). It is assumed that the “Treated” and “Control” animals are housed in the same cage. The treatment could be, say, something injected individually, or the animals could be different genotypes, or it could be a within-animal experiment so that each animal has two measurements, one control and one treated. Half the animals are males and the other half females. Assume that the data is “activity” (possibly behaviour or activity of an enzyme)

A split plot design is one in which there are two types of experimental units. In this case an animal (or part of an animal or an animal for a time) is the experimental unit for comparing treated versus control the sex x treatment interaction (whether the two sexes respond in the same way), using Error(B). However, it is also possible to compare the two sexes averaged across the treatments using Error A. In this case the cage is the experimental unit. The analysis of variance table therefore has two separate error terms (A and B).

The means for the two sexes are:

Sex  Mean     n
F    3.750    12
M    6.925     12

Each mean is based on 12 observations. The difference is tested against the cage (Error A) variation, giving an F-value of 46.82, p<0.000. The standard deviation appropriate to these means is a pooled estimate from the square root of 1.29175 = 1.14.

The treatment means are:

Treatment    mean    n
Control        6.03     12
Treated        4.64    12

The statistical significance in this case is tested using the Cager:Treat (Error B), giving an F-value of 11.62, p<0.000. The pooled standard deviation is the square root of 0.25803 = 0.508. The sex x treatment interaction, showing whether the two sexes respond differently is also tested against Error B, giving an F-value of 0.13, p=0.725. Had this been statistically significant hen the response would have needed to be studied separately for males and females.

Coding the statistical analysis is tricky, and probably requires professional assistance. An example is given by Crawley (2005, p176) for the R  package and the above ANOVA was coded as he suggests (it only takes two lines). It does not seem to be possible to code it using Rcmdr.

Example 6. Correlation (Raw data here)

Data on body and organ weight of 15 male mice was recorded, read into Rcmdr, and the following product-moment correlation matrix was obtained using Rcmdr (Statistics -> Summaries-> Correlation matrix).

Clearly, body weight and fat weight were highly correlated (r=0.94, p<0.0000), while fat and brain weight were un-correlated (r=-0.04, p, adjusted=1.000). Rcmdr will also do a t-test of the correlation between two variables (not shown)

Rcmdr will also do a scatter plot matrix as shown on the right. The correlation between body and fat weight is clearly seen.

Example 7. Linear Regression (Raw data here)

The data used in this example comes from an experiment by Strickland et al. “Experimental study of large-volume microwave ablation in the liver”. Br J Surg 2002;89:1003-1007. Surgical section of liver tumours is not feasible, but a company has produced a microwave probe that could be inserted into a liver tumour to ablate it.

This study used pigs as a model, and the aim of the experiment was to determine what dose of microwaves was needed to ablate a section of the liver of a given size. The experiment involved nine pigs, but as there were no differences among them the data has been combined. Pigs were anesthatised and then received 2-4 ablations of varying power. After a recovery period they were again anesthatised and the diameter of the lesions (cm) was recorded.

The raw data is given in the data page. It was read into R commander in the usual way, and was plotted as shown here (Graphs->Scatter plot). Some “jitter” was added to the X-axis so that the points would not overlap too much.

The output from the regression analysis (Statistics->Fit models->Linear regression) is:

There is really no question of whether there is a statistically significant regression. The formula for a regression line is Y=a+bX, where Y is the Y-value, a and b are constants calculated from the data and X is the power in Watts. In this case a=1.91 and b=0.0213. From this the average diameter of a lesion for a given power (in Watts) can be estimated as 1.91 + 0.0213*Power. The “Multipl R-squared” of 0.786 says that 78.6% of the total variation is accounted for by linear regression, a reasonably high value.