
Medical University of South Carolina

Statistical Significance


Inferential Statistics: Methods for drawing conclusions about a population from a sample, including how likely it is that a given result occurred by chance alone. Since we can rarely study an entire population, we study a sample of it and, by inference, apply the result to the entire population.

Null Hypothesis: The proposal that no difference exists between groups, or that there is no association between the risk indicator and outcome variables. If the null hypothesis is true, then any differences found in the study are the result of chance or random factors. The overall purpose of a typical study is to "reject the null hypothesis." For example, a p value < 0.05 means that if the null hypothesis were true, differences at least as large as those seen in the trial would occur by chance less than 1 time in 20. It does not mean that there is less than a 1 in 20 chance that the null hypothesis is true.

Chance: Random variation. The difference between the outcome observed in a sample of the population and the true value that would be obtained by studying the entire population. Statistical methods are used to estimate the probability that chance alone accounts for the differences in outcomes.

Clinical vs. Statistical Significance: A result is statistically significant when the difference found between groups would be unlikely to occur by chance alone if there were no true difference. In most clinical trials, a result is considered statistically significant if a difference at least this large would occur by chance alone less than 1 time in 20. This is expressed as a p value < 0.05. Remember that a trivial difference can have a very low p value if the number of subjects is large enough! Clinical significance has little to do with statistics and is a matter of judgment. It answers the question "Is the difference between groups large enough to be worth achieving?" Studies can be statistically significant yet clinically insignificant.
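To illustrate how a trivial difference can reach a very low p value, the sketch below holds a hypothetical 1 mm Hg difference in blood pressure (with an assumed standard deviation of 10 mm Hg) fixed and increases only the number of subjects. It uses a simple two-sided z test that assumes the standard deviation is known; the function name `two_sided_p` and all numbers are illustrative assumptions, not taken from any particular trial.

```python
import math
from statistics import NormalDist

def two_sided_p(diff, sd, n_per_group):
    """Two-sided p value for a difference in means between two equal-sized
    groups, using a z test that assumes the common SD `sd` is known."""
    se = sd * math.sqrt(2 / n_per_group)   # standard error of the difference
    z = diff / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical: the same 1 mm Hg difference, SD 10 mm Hg, growing sample size
for n in (50, 500, 5000):
    print(f"n per group = {n:5d}: p = {two_sided_p(1.0, 10.0, n):.2g}")
```

The difference never changes, yet the p value falls from about 0.6 to well below 0.001. Whether 1 mm Hg is worth achieving remains a clinical judgment.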

Level of Significance: The probability of incorrectly rejecting the null hypothesis, i.e. concluding that there is a difference between two groups when actually there is none. Also known as the probability of a Type I error, or alpha. By convention, the level of significance is usually set at 0.05 or, more stringently, 0.01; a result is declared significant when its p value falls below this threshold.

p Value: The probability of obtaining a result at least as extreme as the one observed, by chance alone, given that the null hypothesis is actually true. By convention, a p value < 0.05 is often considered significant. ("If there were truly no difference, a result at least this extreme would occur less than 5% of the time.")
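The definition above ("at least as extreme ... given that the null hypothesis is true") can be made concrete with a permutation test, a technique swapped in here purely for illustration. If the null hypothesis holds, the drug and placebo labels are interchangeable, so shuffling them shows how often chance alone produces a difference as large as the one observed. The function name, shuffle count, and blood pressure values below are hypothetical.

```python
import random
from statistics import fmean

def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Two-sided permutation p value for a difference in means.

    Under the null hypothesis the group labels are interchangeable, so we
    shuffle them and count how often a difference at least as extreme as the
    observed one arises by chance alone."""
    rng = random.Random(seed)
    observed = abs(fmean(group_a) - fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:n_a]) - fmean(pooled[n_a:]))
        if diff >= observed - 1e-12:   # small tolerance for float rounding
            extreme += 1
    # Count the observed labelling as well, so the p value is never exactly 0
    return (extreme + 1) / (n_shuffles + 1)

# Hypothetical systolic blood pressures (mm Hg) after treatment
drug = [121, 124, 126, 122, 125, 123, 127, 120]
placebo = [130, 133, 129, 135, 131, 134, 132, 128]
print(permutation_p_value(drug, placebo))
```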

Power: The probability of detecting a difference between the treatment and control groups if a difference actually exists. Power is always stated for a specified size of difference. For example, a paper describing a clinical trial with a new hypertension medication may contain the following statement - "The study had a power of 80% to detect a difference of 5 mm Hg in diastolic blood pressure between the treatment and control groups." Typical power is 80% or greater. Power = 1 - β (see Type II Error, below).
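The example statement above can be turned into a rough sample-size calculation. The sketch uses the standard normal-approximation formula for comparing two means, n per group = 2 (z for 1 - α/2 + z for 1 - β)² σ² / Δ², and assumes a standard deviation of 10 mm Hg, which the statement does not give. The function names are hypothetical, and a real trial may have used a different method (for example, one based on the t distribution).

```python
import math
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate subjects per group to detect a mean difference `delta`
    (two-sided test, normal approximation, common SD `sigma`)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)            # about 0.84 for 80% power
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

def approx_power(delta, sigma, n, alpha=0.05):
    """Approximate power with n subjects per group (ignores the negligible
    chance of rejecting in the wrong direction)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    return z.cdf(delta / (sigma * math.sqrt(2 / n)) - z_alpha)

# 5 mm Hg difference in diastolic BP, assumed SD 10 mm Hg, 80% power
n = n_per_group(delta=5.0, sigma=10.0)
print(n, round(approx_power(5.0, 10.0, n), 3))
```

Under these assumptions roughly 63 patients per group are needed; halving the detectable difference would roughly quadruple that number.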

Type I Error: Mistakenly rejecting the null hypothesis when it is actually true; a false positive conclusion. The maximum probability of making a Type I error that the researcher is willing to accept is called alpha (α). Alpha is set before the study begins; studies commonly set alpha at 1 in 20 (0.05).

Type II Error: Mistakenly accepting (failing to reject) the null hypothesis when it is actually false; a false negative conclusion. The probability of making a Type II error is called beta (β), and Power = 1 - β (see above). Trials commonly set β at 0.20, i.e. a 20% chance of missing a true difference of the specified size (80% power).

Testing the Null Hypothesis to Assess Efficacy of Two Treatments (e.g. drug vs. placebo)

Decision (based on statistical test) | Truth: null hypothesis is true (no difference) | Truth: null hypothesis is not true (difference)
Accept null hypothesis | Correct | Type II error (beta)
Reject null hypothesis | Type I error (alpha) | Correct: 1 - beta (power)
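The two columns of the table can be explored by simulation. The sketch below runs many hypothetical two-arm trials: when the null hypothesis is true, the fraction of trials that reject it estimates the Type I error rate (alpha); when a real difference exists, the rejection fraction estimates power (1 - beta). It assumes normally distributed outcomes with a known SD of 10 units, 63 patients per group, and a two-sided z test; all of these are illustrative choices.

```python
import math
import random
from statistics import NormalDist, fmean

def rejection_rate(true_diff, sigma, n_per_group, alpha=0.05, n_trials=4000, seed=1):
    """Fraction of simulated two-arm trials in which a two-sided z test
    (known sigma) rejects the null hypothesis of no difference."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    se = sigma * math.sqrt(2 / n_per_group)        # SE of the difference in means
    rejections = 0
    for _ in range(n_trials):
        control = [rng.gauss(0.0, sigma) for _ in range(n_per_group)]
        treated = [rng.gauss(true_diff, sigma) for _ in range(n_per_group)]
        z = (fmean(treated) - fmean(control)) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_trials

# Left column: null hypothesis true, so rejections are Type I errors
type_i = rejection_rate(true_diff=0.0, sigma=10.0, n_per_group=63)
# Right column: a true 5-unit difference, so rejections are correct (power)
power = rejection_rate(true_diff=5.0, sigma=10.0, n_per_group=63)
print(f"Type I error ~ {type_i:.3f}, power ~ {power:.3f}, beta ~ {1 - power:.3f}")
```

With these settings the Type I error rate comes out close to 0.05 and power close to 0.80, so beta is close to 0.20.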

Standard Error of the Mean (SEM): A measure of the variability of the sample mean as an estimate of the true population mean; it quantifies how accurately the true population mean is known. It equals the standard deviation divided by the square root of the sample size, so the larger the sample, the smaller the SEM. The SEM is used in computing confidence intervals: in a clinical trial, the larger the sample size, the tighter the 95% CI around the study's point estimate.
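A minimal sketch of the SEM calculation, using Python's `statistics.stdev` (the sample standard deviation, with an n - 1 denominator). The measurements and the helper names `sem` and `sem_from_sd` are hypothetical.

```python
import math
from statistics import stdev

def sem(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return stdev(values) / math.sqrt(len(values))

def sem_from_sd(sd, n):
    """SEM from a reported SD and sample size."""
    return sd / math.sqrt(n)

sample = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical measurements
print(round(sem(sample), 3))

# Same SD, four times as many subjects: the SEM is halved
print(sem_from_sd(10, 25), sem_from_sd(10, 100))
```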

Standard Deviation: A measure of variability. The standard deviation quantifies how much individual values vary from each other: it measures the spread of individual observations around the sample mean. In a normal (unskewed) distribution, about 34% of cases lie between the mean and 1 standard deviation above it (and another 34% between the mean and 1 standard deviation below it), about 68% lie within 1 standard deviation of the mean, and about 95.4% lie within 2 standard deviations of the mean.
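The percentages above come from the standard normal curve and can be checked with Python's `statistics.NormalDist`. The sketch also shows that exactly 95% of a normal distribution lies within about 1.96 standard deviations of the mean, which is where the familiar 1.96 multiplier comes from. The helper name `fraction_within` is hypothetical.

```python
from statistics import NormalDist

def fraction_within(k):
    """Fraction of a normal distribution lying within k SDs of the mean."""
    std_normal = NormalDist()   # mean 0, SD 1
    return std_normal.cdf(k) - std_normal.cdf(-k)

print(f"mean to +1 SD:  {fraction_within(1) / 2:.3f}")
print(f"within 1 SD:    {fraction_within(1):.3f}")
print(f"within 2 SD:    {fraction_within(2):.3f}")
print(f"within 1.96 SD: {fraction_within(1.96):.3f}")
```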

Confidence Interval: Often expressed as a 95% confidence interval (CI). A measure of the precision of an estimate, expressed as the sample value plus and minus a specified amount; the CI quantifies uncertainty. Most clinical trials study a sample of the population at risk, not the whole population. Because a sample is a subset of the population, the mean value obtained for the sample may not be the same as the mean value that would be obtained if the entire population were studied. The 95% CI is the range of values within which we can be 95% confident that the true value lies for the whole population of patients from whom the study patients were selected. (Strictly, if a study were repeated many times, about 95% of the intervals calculated this way would contain the true value.) The CI is derived from the sample mean and the standard error. Results from a sample with a wider spread of values will have broader confidence intervals than results from a sample with a narrower spread, and increasing the number of patients in the sample narrows the confidence interval.
Please see: Use of confidence intervals to indicate uncertainty in research findings.
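As a sketch of how a CI is derived from the sample mean and the standard error, the code below uses the large-sample normal approximation, mean ± 1.96 × SEM. This is an assumption suited to reasonably large samples; small studies normally use a multiplier from the t distribution instead. The function name and blood pressure figures are hypothetical.

```python
import math
from statistics import NormalDist

def normal_ci(mean, sd, n, level=0.95):
    """Large-sample CI for a mean: mean +/- z * SEM (normal approximation)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # about 1.96 for 95%
    sem = sd / math.sqrt(n)
    return mean - z * sem, mean + z * sem

# Hypothetical: mean systolic BP 120 mm Hg, SD 10 mm Hg
lo, hi = normal_ci(120.0, 10.0, 100)
print(f"n=100: 95% CI {lo:.2f} to {hi:.2f}")

# Four times as many patients: the interval is half as wide
lo4, hi4 = normal_ci(120.0, 10.0, 400)
print(f"n=400: 95% CI {lo4:.2f} to {hi4:.2f}")
```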

Interobserver variability: Variability between observers. Do two or more radiologists give the same reading from the same radiograph?

Intraobserver variability: Variability by the same observer. Does a radiologist give the same reading of a radiograph when viewed on more than one occasion?

Survival Analysis: Statistical procedures for estimating survival (prognosis) in a population under study.

Cox Proportional-Hazards Model: A type of multivariate analysis used to identify the combination of factors that best predicts prognosis in a group of patients. It can also test the effect of individual factors independently. It is used when the outcome is the time to an event, and it can handle censored data, i.e. situations where practical considerations preclude observing the survival time of every patient being studied (for example, patients lost to follow-up or still free of the endpoint when the study ends).

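Hazard (or Hazard Rate): The instantaneous rate at which the endpoint occurs at a given time among patients who have not yet reached it. A technical name for failure rate; strictly a rate rather than a probability.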

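Hazard Ratio: The ratio of the hazard in one group to the hazard in another (e.g. treatment vs. control); interpreted as the relative risk of the endpoint at any given time.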
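A crude illustration of a hazard ratio, assuming the hazard is constant over time so that it can be estimated as events divided by person-time. This is a simplification chosen for clarity, not a Cox model, which estimates the hazard ratio without assuming a constant hazard and can adjust for other factors. The counts and function names are hypothetical.

```python
def hazard_rate(events, person_time):
    """Events per unit of person-time (assumes a constant hazard)."""
    return events / person_time

def crude_hazard_ratio(events_tx, time_tx, events_ctl, time_ctl):
    """Hazard in the treatment group divided by hazard in the control group."""
    return hazard_rate(events_tx, time_tx) / hazard_rate(events_ctl, time_ctl)

# Hypothetical: 10 events in 500 patient-years on treatment,
# 20 events in 500 patient-years on control
hr = crude_hazard_ratio(10, 500.0, 20, 500.0)
print(hr)
```

Here the treatment group experiences the endpoint at half the rate of the control group at any given time (hazard ratio 0.5).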

Multivariate Analysis: An analysis where the effects of many variables are considered. Can select a subset of variables that significantly contribute to the variation in outcome.

Kaplan-Meier Curve: Used to estimate the probability of surviving over successive units of time. It produces a survival curve even when not all survival times are exactly known (censored observations).
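A minimal sketch of the Kaplan-Meier (product-limit) estimate. At each time an endpoint occurs, survival is multiplied by the fraction of at-risk patients who did not reach the endpoint; censored patients count as at risk up to their last follow-up and then drop out without lowering the curve. The follow-up data and the function name are hypothetical, and patients censored at the same time as an event are treated as still at risk at that time, a common convention.

```python
def kaplan_meier(times, events):
    """Return [(time, estimated survival)] at each distinct event time.

    times:  follow-up time for each patient
    events: 1 if the endpoint occurred at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed   # events and censored patients leave the risk set
    return curve

# Hypothetical follow-up (months) for 8 patients; 0 = censored
times = [2, 3, 3, 5, 6, 8, 9, 12]
events = [1, 0, 1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```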


Last Modified January 1, 2001