
Tutorial on Introduction to Biostatistics
Inferential data analysis
Because a researcher draws scientific conclusions from a study using only a sample rather than the whole population, those conclusions must be justified with the tools of statistical inference. The principal concepts involved in statistical inference are the theory of estimation and hypothesis testing.
Theory of estimation
Point Estimation:
A single value is used to provide the best estimate of the parameter of interest.
Interval Estimation:
An interval estimate gives the estimate of the parameter along with an indication of the confidence the researcher has in that estimate. This leads us to the concept of confidence intervals.
Confidence interval (CI)
A confidence interval estimate of a parameter consists of an interval, along with a probability that the interval contains the unknown parameter. The level of confidence is a probability that represents the percentage of intervals that would contain the parameter if a large number of repeated samples were obtained. The level of confidence is denoted (1 − α) × 100%.
The narrower the confidence interval, the lower the error of the point estimate it contains. The sample size, the sample variance and the level of confidence all affect the width of the confidence interval:
- If the sample size increases, the width of the confidence interval decreases.
- If the level of confidence increases, the width increases.
- If the variation in the sample increases, the width increases.
Confidence intervals can be computed for estimating a single mean or proportion, and also for comparing the difference between two means or proportions. Confidence intervals are widely used to report the main clinical outcomes instead of p-values, as they have many advantages (such as giving information about effect size, variability and the possible range of values). The most commonly used confidence interval is the 95% CI. Increasingly, medical journals and publications require authors to calculate and report the 95% CI wherever appropriate, since it gives a measure of the range of possible effect sizes – information that is of great relevance to clinicians. The term 95% CI means that it is the interval within which we can be 95% sure the true population value lies. Note that the remaining 5% of the time, the value may fall outside this interval. The point estimate, which is the effect size observed in the particular study, is the value at which the true value is most likely to fall, though it can theoretically occur at any point within the confidence interval (or even outside it, as just alluded to).
Example:
A study is conducted to estimate the average glucose levels in patients admitted with diabetic ketoacidosis. A sample of 100 patients was selected and the mean was found to be 500 mg/dL with a 95% confidence interval of 320-780. This means we can be 95% confident that the true mean glucose level of all such patients lies between 320 and 780 mg/dL.
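As a rough illustration, the sketch below computes a 95% CI for a mean using scipy; the glucose readings and sample size are invented for the example, not taken from the study above.

```python
# Minimal sketch: 95% CI for a mean, using hypothetical glucose readings (mg/dL).
import numpy as np
from scipy import stats

glucose = np.array([480, 510, 495, 530, 470, 505, 520, 490])  # hypothetical values

n = len(glucose)
mean = glucose.mean()
sem = stats.sem(glucose)  # standard error of the mean
ci_low, ci_high = stats.t.interval(0.95, df=n - 1, loc=mean, scale=sem)

print(f"mean = {mean:.1f} mg/dL, 95% CI = ({ci_low:.1f}, {ci_high:.1f})")
```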
Hypothesis testing vs. Estimation
Similarity: Both use sample data to infer something about a population
Difference: Designed to answer different questions
Does a new drug lower cholesterol levels?
Measure the cholesterol of 25 patients before and after taking the drug – the change in cholesterol is 15 mg/dL (225 mg/dL before; 210 mg/dL after)
Hypothesis test: Did the drug alter cholesterol levels?
Yes/no decision. Reject or fail to reject H0
Estimation: By how much did the drug alter cholesterol levels?
Hypothesis testing
Setting up the Hypotheses:
The basic concept used in hypothesis testing is that it is far easier to show that something is false than to prove that it is true.
a) Two mutually exclusive & competing hypotheses:
Let us consider a situation where we want to test whether a new drug has superior efficacy to one of the standard drugs currently on the market for the treatment of tuberculosis. We will have to construct a null hypothesis and an alternative hypothesis for this experiment as below:
1. The “null” hypothesis (H0)
The null hypothesis indicates a neutral position (or the status quo in an interventional trial) in the given study or experiment. Typically the investigator hopes to prove this hypothesis wrong so that the alternative hypothesis (which encompasses the concept of interest to the investigator) can be accepted.
Example:
In the situation given above, though we actually want to prove the new drug to be effective, we should proceed with a neutral attitude while doing the experiment, so our null hypothesis will be stated as follows:
H0: There is no difference between the effect of the new drug and the standard drug in treating tuberculosis.
2. The “alternative” hypothesis (H1)
This is the hypothesis we believe or hope is true.
Example: In the above situation if we want to prove the new drug is superior then our alternative hypothesis will be:
H1: New drug’s effect is superior to that of the standard drug.
Based on the alternative hypothesis, the test will be either a one-tailed test or a two-tailed test. A two-tailed test is used when the researcher wants to test in both directions for the population parameter specified in the null hypothesis (i.e. either greater or lesser). If the parameter is to be tested in only one direction, greater or lesser, it becomes a one-tailed test.
In the above example the researcher framed the alternative hypothesis in only one direction (the new drug is superior to the standard drug), so the test becomes a one-tailed test.
b) Selecting a "significance level": α
The significance level is the probability of rejecting the null hypothesis when it is actually true (Type I error). It is usually set at 5%, i.e. α = 0.05.
c) Calculate the test statistic and p-value
Test statistic
The appropriate test statistic depends on our null hypothesis: we may be testing a single mean or proportion, or comparing two means or proportions.
p-value
A p-value gives the probability of observing the study effect (or a larger one), given that the null hypothesis is true. For example, a p-value of .03 means that, assuming the treatment has no effect, and given the sample size, an effect as large as the observed effect would be seen in only 3% of studies.
In other words, it gives the chance of observing the difference (effect) seen in the sample when the null hypothesis is true. For example, a p-value of 0.02 means there is only a 2% chance of observing such a difference (effect) in the sample if we assume the null hypothesis is true.
The p-value obtained in the study is evaluated against the significance level α. If α is set at 0.05, then a p-value of 0.05 or less is required to reject the null hypothesis and establish statistical significance.
d) Decision rule:
We can reject H0 if the p-value < α.
Most statistical packages calculate the p-value for a 2-tailed test. If we are conducting a 1-tailed test we must divide the p-value by 2 before deciding if it is acceptable. (In SPSS output, the p-value is labeled “Sig (2-tailed).”)
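A minimal sketch of this decision rule, using hypothetical p-values (the 0.06 figure is invented for illustration):

```python
# Minimal sketch of the decision rule, with hypothetical p-values.
alpha = 0.05

p_two_tailed = 0.06              # hypothetical two-tailed p-value from software output
p_one_tailed = p_two_tailed / 2  # valid only if the effect is in the hypothesized direction

for label, p in [("two-tailed", p_two_tailed), ("one-tailed", p_one_tailed)]:
    decision = "reject H0" if p < alpha else "fail to reject H0"
    print(f"{label}: p = {p:.3f} -> {decision}")
```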
Table 1: Step by step guide to applying hypothesis testing in research
1. Formulate a research question
2. Formulate a research/alternative hypothesis
3. Formulate the null hypothesis
4. Collect data
5. Reference a sampling distribution of the particular statistic assuming that H0 is true (in the cases so far, a sampling distribution of the mean)
6. Decide on a significance level (α), typically 0.05
7. Compute the appropriate test statistic
8. Calculate p value
9. Reject H0 if the p-value is less than the set level of significance; otherwise fail to reject H0
Hypothesis Testing for Different Situations
Testing for Single mean – Large Samples: Z-test
The Z-test for a single mean is useful when we want to test a sample mean against the population mean when the sample size is large (i.e. more than 30).
Example:
A researcher wants to test the statement that the mean level of dopamine is greater than 36 in individuals with schizophrenia. He collects a sample of 54 patients with schizophrenia.
The researcher can test the hypothesis using the Z-test for a single mean.
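A minimal sketch of this one-sample Z-test; the source gives only n = 54 and the hypothesized mean of 36, so the sample mean and standard deviation below are assumed for illustration.

```python
# Minimal sketch of a one-sample z-test (H1: mean > 36), with hypothetical data.
import numpy as np
from scipy.stats import norm

mu0 = 36            # hypothesized population mean
n = 54
sample_mean = 38.2  # hypothetical sample mean
sample_sd = 6.5     # hypothetical sample standard deviation

z = (sample_mean - mu0) / (sample_sd / np.sqrt(n))
p_one_tailed = 1 - norm.cdf(z)  # upper-tailed test

print(f"z = {z:.2f}, one-tailed p = {p_one_tailed:.4f}")
```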
Testing for Two means – Large Samples: Z-test for comparing two means.
The Z-test for comparing two means is useful when we want to compare two sample means when the sample sizes are large (i.e. more than 30).
Example:
Past studies show that Indian men have higher cholesterol levels than Indian women. A sample of 100 males and females was taken and their cholesterol levels measured – males were found to have a mean cholesterol level of 188 mg/dL and females a mean level of 164 mg/dL. Is there sufficient evidence to conclude that males indeed have higher cholesterol levels?
Here we can test the hypothesis using the Z-test for comparing two sample means.
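A minimal sketch of the two-sample Z-test for this example; the standard deviations and the group sizes of 100 each are assumed for illustration.

```python
# Minimal sketch of a two-sample z-test for means, with hypothetical summary data.
import numpy as np
from scipy.stats import norm

mean_m, sd_m, n_m = 188, 35, 100  # males: SD and group size assumed
mean_f, sd_f, n_f = 164, 32, 100  # females: SD and group size assumed

se = np.sqrt(sd_m**2 / n_m + sd_f**2 / n_f)  # standard error of the difference
z = (mean_m - mean_f) / se
p_one_tailed = 1 - norm.cdf(z)               # H1: males have higher mean cholesterol

print(f"z = {z:.2f}, one-tailed p = {p_one_tailed:.4f}")
```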
Testing for Single mean – t-test
The t-test for a single mean is useful when we want to test a sample mean against the population mean when the sample size is small (i.e. less than 30).
Example:
A researcher wants to test the statement that the mean age of diabetic patients in his district is greater than 60 years. He draws a sample of 25 persons.
Here we can test the hypothesis using the t-test for a single mean.
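A minimal sketch using scipy's one-sample t-test; the 25 ages below are hypothetical.

```python
# Minimal sketch of a one-sample t-test (H1: mean age > 60), with hypothetical ages.
import numpy as np
from scipy import stats

ages = np.array([62, 58, 65, 70, 61, 59, 66, 63, 64, 57,
                 68, 60, 71, 62, 65, 58, 67, 63, 61, 69,
                 64, 66, 60, 62, 65])  # hypothetical sample of 25 patients

t_stat, p_value = stats.ttest_1samp(ages, popmean=60, alternative='greater')
print(f"t = {t_stat:.2f}, one-tailed p = {p_value:.4f}")
```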
Independent Sample t-test for two means
The t-test for comparing two means is appropriate when we want to compare two independent sample means when the sample sizes are small (i.e. less than 30).
Example:
A study was conducted to compare males and females in terms of average years of education, with a sample of 9 females and 13 males. It was found that males had an average of 17 years of formal education while females had 14. Can it be concluded that males have a higher level of education than females in this population?
Here we can test the hypothesis using the t-test for comparing two sample means.
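A minimal sketch using scipy's independent-samples t-test; the individual education values are hypothetical (only the group sizes and rough means come from the example).

```python
# Minimal sketch of an independent-samples t-test, with hypothetical years of education.
import numpy as np
from scipy import stats

females = np.array([12, 14, 16, 13, 15, 14, 12, 16, 14])                  # 9 hypothetical values
males = np.array([16, 18, 17, 15, 19, 16, 18, 17, 16, 18, 17, 15, 17])    # 13 hypothetical values

t_stat, p_value = stats.ttest_ind(males, females, alternative='greater')
print(f"t = {t_stat:.2f}, one-tailed p = {p_value:.4f}")
```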
Paired t-test for two means.
The paired t-test is useful when we want to compare two sample means when the two sets of measurements are taken on the same subjects in the study, such as pre and post measurements.
Example:
A study was conducted to compare the effect of a drug in treating hypertension by administering it to 20 patients. BP was recorded immediately before and one hour after the drug was given. The question of interest: is the drug effective in reducing blood pressure?
A paired t-test can be used for hypothesis testing and comparing two paired sample means.
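A minimal sketch using scipy's paired t-test on hypothetical before/after readings for the 20 patients.

```python
# Minimal sketch of a paired t-test on hypothetical systolic BP readings (mmHg).
import numpy as np
from scipy import stats

bp_before = np.array([150, 160, 148, 155, 162, 158, 149, 151, 157, 154,
                      159, 152, 161, 156, 150, 153, 158, 155, 149, 160])
bp_after = np.array([142, 150, 145, 148, 155, 149, 144, 146, 150, 147,
                     151, 145, 153, 149, 143, 146, 150, 148, 142, 152])

t_stat, p_value = stats.ttest_rel(bp_before, bp_after, alternative='greater')
print(f"t = {t_stat:.2f}, one-tailed p = {p_value:.4f}")
```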
Testing for Single proportion: Binomial test for proportion
If we want to test a sample proportion against the population proportion we can use the binomial test for single proportion.
Example:
A random sample of patients is recruited for a clinical study. The researcher wants to establish that the proportion of female patients is not equal to 0.5.
The binomial test for proportion is the appropriate statistical method here.
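A minimal sketch using scipy's binomtest (available in scipy >= 1.7); the counts of 34 females out of 80 patients are hypothetical.

```python
# Minimal sketch of a binomial test for a single proportion (H0: p = 0.5),
# with a hypothetical sample of 34 females among 80 patients.
from scipy.stats import binomtest

result = binomtest(k=34, n=80, p=0.5, alternative='two-sided')
print(f"observed proportion = {34/80:.3f}, p = {result.pvalue:.4f}")
```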
Testing for Two proportion: Z-test for two proportions
If we want to compare two sample proportions we can use the Z-test for two proportions when the sample sizes are large (i.e. more than 30).
Example:
Two types of hypodermic needles, the old type and a new type, are used for giving injections. It is hoped that the new design will lead to less painful injections. The patients are allocated at random to two groups, one to receive the injections using a needle of the old type, the other to receive injections with needles of the new type.
Does the information support the belief that the proportion of patients having severe pain with injections using needles of the old type is greater than the proportion of patients with severe pain in the group getting injections using the new type?
Here we can test the hypothesis using the Z-test for comparing two sample proportions.
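A minimal sketch of the two-proportion Z-test computed from first principles; the pain counts and group sizes are hypothetical.

```python
# Minimal sketch of a two-proportion z-test, with hypothetical pain counts.
import numpy as np
from scipy.stats import norm

x_old, n_old = 28, 60  # hypothetical: severe pain with the old needle
x_new, n_new = 15, 60  # hypothetical: severe pain with the new needle

p_old, p_new = x_old / n_old, x_new / n_new
p_pool = (x_old + x_new) / (n_old + n_new)  # pooled proportion under H0
se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_old + 1 / n_new))
z = (p_old - p_new) / se
p_one_tailed = 1 - norm.cdf(z)              # H1: old needle > new needle

print(f"z = {z:.2f}, one-tailed p = {p_one_tailed:.4f}")
```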
Chi-square test (χ2)
It is a statistical procedure used to analyze categorical data.
We will explore two different types of χ2 tests:
1. One categorical variable: Goodness-of-fit test
2. Two categorical variables: Contingency table analysis
One categorical variable: Goodness-of-fit test
A test for comparing observed frequencies with theoretically predicted frequencies.
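A minimal sketch of the goodness-of-fit test using scipy; the observed counts and the uniform expected counts are hypothetical.

```python
# Minimal sketch of a chi-square goodness-of-fit test with hypothetical counts.
from scipy.stats import chisquare

observed = [18, 22, 30, 30]  # hypothetical observed frequencies
expected = [25, 25, 25, 25]  # theoretically predicted frequencies

chi2, p_value = chisquare(f_obs=observed, f_exp=expected)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```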
Two categorical variables: Contingency table analysis
Defined: a statistical procedure to determine if the distribution of one categorical variable is contingent on a second categorical variable
- Allows us to see if two categorical variables are independent from one another or are related
- Conceptually, it allows us to determine if two categorical variables are correlated
Note:
If the expected frequencies in the cells are “too small,” the χ2 test may not be valid
A conservative rule is that you should have expected frequencies of at least 5 in all cells
Example:
We want to test the association between cancer and smoking habit in 250 patients. The chi-square test would be appropriate here.
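A minimal sketch using scipy's chi-square test of independence; the cell counts of the 2x2 table are hypothetical (they sum to the 250 patients in the example).

```python
# Minimal sketch of a chi-square test of association on a hypothetical 2x2 table.
import numpy as np
from scipy.stats import chi2_contingency

#                 cancer  no cancer
table = np.array([[60,     40],     # smokers (hypothetical counts)
                  [35,    115]])    # non-smokers

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p_value:.4f}")
```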
Analysis of Variance (ANOVA)
When we want to compare more than two means we will have to use an analysis of variance test.
Example:
A researcher has assembled three groups of psychology students. He teaches the same topic to each group using three different educational methodologies. The researcher wishes to determine if the three modalities are giving equivalent results. He tests all the students and records the marks obtained.
An ANOVA analysis can be used to test the hypothesis.
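A minimal sketch using scipy's one-way ANOVA; the marks for the three teaching methods are hypothetical.

```python
# Minimal sketch of a one-way ANOVA on hypothetical marks from three teaching methods.
from scipy.stats import f_oneway

method_a = [72, 85, 78, 90, 66, 81, 75]  # hypothetical marks
method_b = [68, 74, 70, 79, 65, 72, 69]
method_c = [80, 88, 84, 91, 77, 86, 83]

f_stat, p_value = f_oneway(method_a, method_b, method_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```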
Repeated Measures ANOVA
Repeated measures ANOVA is useful when we want to compare more than two sample means when the measurements are taken from the same subjects enrolled in the study.
Example:
A trial was conducted to compare the effect of a drug in treating hypertension by administering it to 20 patients. BP was recorded immediately before and one, two and four hours after the drug was administered.
Is the drug effective in reducing blood pressure?
Repeated measures ANOVA would be the right way to get an answer.
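A minimal sketch using the AnovaRM class from statsmodels; the BP readings are hypothetical, and only 5 of the 20 patients are shown to keep the example short.

```python
# Minimal sketch of a repeated measures ANOVA with hypothetical BP readings
# at four time points for 5 patients (long-format data).
import pandas as pd
from statsmodels.stats.anova import AnovaRM

data = pd.DataFrame({
    "patient": [1, 2, 3, 4, 5] * 4,
    "time": ["pre"] * 5 + ["1h"] * 5 + ["2h"] * 5 + ["4h"] * 5,
    "bp": [150, 158, 162, 149, 155,   # pre (hypothetical)
           142, 150, 153, 141, 147,   # 1 hour
           138, 146, 150, 137, 144,   # 2 hours
           140, 148, 151, 139, 145],  # 4 hours
})

result = AnovaRM(data, depvar="bp", subject="patient", within=["time"]).fit()
print(result)
```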
Parametric Tests
Statistical hypothesis tests such as the z-test, t-test and ANOVA assume that the distribution of the variable being assessed comes from a parametrized probability distribution. The parameters usually used are the mean and standard deviation. For example, the t-test assumes the variable comes from a normal population, and analysis of variance assumes that the underlying distributions are normally distributed and that the variances are similar.
Parametric techniques are more powerful at detecting differences or similarities than nonparametric tests.
Nonparametric/Distribution-free tests
Nonparametric tests: statistical tests that do not involve population parameters and do not make assumptions about the shape of the population(s) from which sample(s) originate.
Nonparametric tests are used in the following circumstances:
1. Useful when statistical assumptions have been violated
2. Ideal for nominal (categorical) and ordinal (ranked) data
3. Useful when sample sizes are small (as this is often when assumptions are violated)
What are the disadvantages of Nonparametric/Distribution-free tests?
1. Tend to be less powerful than their parametric counterparts
2. H0 & H1 not as precisely defined
There is a nonparametric/distribution-free counterpart to many parametric tests.
· The Mann-Whitney U Test: The nonparametric counterpart of the independent samples t-test
· The Wilcoxon Signed Rank Test: The nonparametric counterpart of the related samples t-test
· The Kruskal-Wallis Test: The nonparametric counterpart of one-way ANOVA
· Kolmogorov-Smirnov Test: A nonparametric test used to check whether the distributions of two data sets are the same or not
· Run Test: A run is a series of similar values followed by a different value; the run test is used to check whether the runs in a data set occurred randomly or not
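As a rough illustration, the sketch below runs three of these nonparametric tests from scipy on hypothetical data.

```python
# Minimal sketch running three nonparametric tests on hypothetical data.
from scipy.stats import mannwhitneyu, wilcoxon, kruskal

group_a = [12, 15, 11, 18, 14, 16]  # hypothetical independent samples
group_b = [22, 19, 25, 21, 24, 20]
group_c = [30, 28, 33, 29, 31, 27]

before = [150, 158, 162, 149, 155, 160]  # hypothetical paired readings
after = [142, 150, 153, 141, 147, 152]

u_stat, p_u = mannwhitneyu(group_a, group_b)      # counterpart of the independent t-test
w_stat, p_w = wilcoxon(before, after)             # counterpart of the paired t-test
h_stat, p_h = kruskal(group_a, group_b, group_c)  # counterpart of one-way ANOVA

print(f"Mann-Whitney U: p = {p_u:.4f}")
print(f"Wilcoxon signed rank: p = {p_w:.4f}")
print(f"Kruskal-Wallis: p = {p_h:.4f}")
```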
Table 2: Statistical tests at a glance

Type of variable in the study | Parameter to be tested | Number of variables | Sample size | Test
Ratio variables | Mean | One | >30 | Z-test
Ratio variables | Mean | Two | >30 | Z-test
Ratio variables | Mean | One | <30 | t-test
Ratio variables | Mean | Two | <30 | Independent sample t-test
Ratio variables | Mean (same subject) | Two | <30 | Paired sample t-test
Ratio variables | Proportion | One | - | Binomial test
Ratio variables | Proportion | Two | >30 | Z-test
Ratio variables | Mean | More than two | >30 | ANOVA
Ratio variables | Mean (same subject) | More than two | >30 | Repeated measures ANOVA
Nominal/Categorical variables | Association | Two or more | - | Chi-square test
Ratio variables | Mean | Two | When normality assumption is violated | Mann-Whitney test
Ratio variables | Mean (same subject) | Two | When normality assumption is violated | Wilcoxon signed rank test
Ratio variables | Mean | More than two | When normality assumption is violated | Kruskal-Wallis test