Slide 1:Introduction to Biostatistics Dr. M. H. Rahbar
Professor of Biostatistics
Department of Epidemiology
Director, Data Coordinating Center
College of Human Medicine
Michigan State University
What does “STATISTICS” mean? :What does “STATISTICS” mean? The word “Statistics” has several meanings:
It is frequently used in referring to recorded data
Statistics also denotes characteristics calculated for a set of data, for example, sample mean
Statistics also refers to statistical methodology: the techniques and procedures for designing experiments and for collecting, organizing, and analyzing the information in a data set in order to make inferences about population parameters
What do statisticians do? :What do statisticians do? To guide the design of an experiment or survey prior to the data collection
To analyze data using proper statistical procedures and techniques
To present and interpret results to the researchers and other decision makers including the government and industries
WHY STUDY STATISTICS? :WHY STUDY STATISTICS? Knowledge of statistics is essential for people going into research, management or graduate study
A basic understanding of statistics is useful for conducting investigations and presenting results effectively
Understanding of statistics can help anyone discriminate between fact and fancy in daily life
A course in statistics should help one know when, and for what, a statistician should be consulted
Definition of Population & Sample :Definition of Population & Sample A population is a set of measurements of interest to the researcher.
Examples:
1. Income of households living in Karachi
2. The number of children in families living in Pakistan
3. The health status of adults in a community
A subset of the population is called a sample. A sample is usually selected so that it is representative of the population
Descriptive & Inferential Statistics :Descriptive & Inferential Statistics 1. Descriptive Statistics deal with the enumeration, organization and graphical representation of data
2. Inferential Statistics are concerned with reaching conclusions from incomplete information, that is, generalizing from the specific sample
An example of inferential statistics is using available information about the health status of people in a sample to draw inferences about the underlying population from which the sample was selected
INFERENTIAL STATISTICS :INFERENTIAL STATISTICS The objective of inferential statistics is to make inference about the population parameters based on the information contained in the sample.
Estimation (e.g., Estimating the prevalence of hypertension among adults living in Karachi)
Testing Hypothesis (e.g., Testing the effectiveness of a new drug for reducing cholesterol levels)
Sources of Data :Sources of Data Data may come from different sources:
Surveillance systems (e.g., NIH)
Planned surveys (Government, Universities, NGOs)
Experiments (Pharmaceutical Companies)
Health Organizations (Administrative Data sets)
Private sector (Banks, Companies, etc)
Government (All government agencies)
Here we will focus on surveys and experiments
What is the difference between a survey and an experiment?
Difference between Surveys & Experiments :Difference between Surveys & Experiments Survey data represent observations of events or phenomena over which few, if any, controls are imposed.
(e.g., Assessing the association between different lifestyles and heart disease)
In an experiment we design a research plan purposely to impose controls over the amount of exposure (treatment) to a drug. (e.g., Clinical Trials)
Sampling Methods :Sampling Methods Random Sampling (Simple)
Systematic Sampling
Stratified Sampling
Cluster Sampling
Convenience Sampling
More complex sampling
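The first three methods in the list can be sketched in a few lines. This is only an illustration: the sampling frame of 100 households, the urban/rural strata, the sample size, and the seed are all hypothetical and do not come from the slides.

```python
import random

# Hypothetical sampling frame: 100 households, 60 urban and 40 rural.
households = [
    {"id": i, "area": "urban" if i < 60 else "rural"} for i in range(100)
]

rng = random.Random(42)  # fixed seed so the sketch is reproducible
n = 10                   # desired sample size (assumed)

# Simple random sampling: every household has the same chance of selection.
simple_sample = rng.sample(households, n)

# Systematic sampling: pick a random start, then take every k-th household.
k = len(households) // n
start = rng.randrange(k)
systematic_sample = households[start::k]

# Stratified sampling: draw a simple random sample within each stratum,
# here in proportion to the stratum's share of the frame.
stratified_sample = []
for area in ("urban", "rural"):
    stratum = [h for h in households if h["area"] == area]
    stratum_n = round(n * len(stratum) / len(households))
    stratified_sample.extend(rng.sample(stratum, stratum_n))

print(len(simple_sample), len(systematic_sample), len(stratified_sample))
```

With proportional allocation the stratified sample always contains 6 urban and 4 rural households, whereas the simple random sample may by chance over- or under-represent either group.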
Some Epidemiologic Studies :Some Epidemiologic Studies Retrospective Studies:
Retrospective studies gather past data from selected cases and controls to determine differences, if any, in the exposure to a suspected factor. They are commonly referred to as case-control studies
Prospective Studies:
Prospective studies are usually cohort studies in which one enrolls a group of healthy people and follows them over a certain period to determine the frequency with which a disease develops
Qualitative and Quantitative Variables :Qualitative and Quantitative Variables Examples of qualitative variables are occupation, sex, and marital status
Variables that yield observations that can be measured are considered to be quantitative variables. Examples of quantitative variables are weight, height, and age
Quantitative variables can further be classified as discrete or continuous
Slide 13:VARIABLES TYPES Categorical variables (e.g., Sex, Marital Status, income category)
Continuous variables (e.g., Age, income, weight, height, time to achieve an outcome)
Discrete variables (e.g., Number of Children in a family)
Binary or Dichotomous variables (e.g., responses to Yes or No questions)
Slide 14:VARIABLES SCALE SCALE OF VARIABLE
Nominal Scale
Ordinal Scale
Interval Scale
Interval Ratio Scale
Scale of Data :Scale of Data 1. Nominal: These data do not represent an amount or quantity (e.g., Marital Status, Sex)
2. Ordinal: These data represent an ordered series of relationships (e.g., level of education)
3. Interval: These data are measured on an interval scale having equal units but an arbitrary zero point (e.g., temperature in Fahrenheit)
4. Interval Ratio: These data have equal units and a true zero point, so one value can be compared meaningfully with another as a ratio (e.g., for weight, 100 kg is twice 50 kg)
VARIABLES IN THE PROTOCOL :VARIABLES IN THE PROTOCOL TYPES OF VARIABLE
independent
dependent
intermediate
confounding
Independent Variable :Independent Variable The characteristic being observed and/or measured that is hypothesized to influence an event or outcome (dependent variable).
NOTE
The independent variable is not influenced by the event or outcome, but may cause it or contribute to its variation.
Dependent Variable :Dependent Variable A variable whose value is dependent on the effect of other variables (i.e., “independent variables”) in the relationship being studied. Synonyms: outcome or response variable.
NOTE
an event or outcome whose variation we seek to explain or account for by the influence of independent variables.
Intermediate Variable :Intermediate Variable A variable that occurs in a causal pathway from an independent to a dependent variable. Synonyms: intervening, mediating
NOTES
it produces variation in the dependent variable, and is caused to vary by the independent variable.
such a variable is “associated” with both the dependent and independent variables.
Confounding Variable :Confounding Variable A factor (that is itself a determinant of the outcome), that distorts the apparent effect of a study variable on the outcome.
NOTE
such a factor may be unequally distributed among the exposed and the unexposed, and thereby influence the apparent magnitude and even the direction of the effect.
Organizing Data :Organizing Data Frequency Table
Frequency Histogram
Relative Frequency Histogram
Frequency polygon
Relative Frequency polygon
Bar chart
Pie chart
stem-and-leaf display
Box Plot
Frequency Table :Frequency Table Suppose we are interested in studying the number of children in the families living in a community. The following data has been collected based on a random sample of n = 30 families from the community.
2, 2, 5, 3, 0, 1, 3, 2, 3, 4, 1, 3, 4, 5, 7, 3, 2, 4, 1, 0, 5, 8, 6, 5, 4 , 2, 4, 4, 7, 6
Organize this data in a Frequency Table!
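One way to carry out this exercise is sketched below, using Python's standard `collections.Counter` to tally each value. The relative-frequency column and the print layout are additions for illustration, not something the slide prescribes.

```python
from collections import Counter

# Number of children in each of the n = 30 sampled families (from the slide).
children = [2, 2, 5, 3, 0, 1, 3, 2, 3, 4, 1, 3, 4, 5, 7,
            3, 2, 4, 1, 0, 5, 8, 6, 5, 4, 2, 4, 4, 7, 6]

n = len(children)
freq = Counter(children)  # maps each distinct value to its frequency

print(f"{'Children':>8} {'Frequency':>9} {'Relative':>8}")
for value in sorted(freq):
    # Relative frequency = frequency / sample size.
    print(f"{value:>8} {freq[value]:>9} {freq[value] / n:>8.3f}")
```

The frequencies sum to n = 30 and the relative frequencies sum to 1, which is a quick check that no observation was missed.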
Frequency Table :Frequency Table Now suppose we need to construct a similar frequency table for the age of patients with Heart related problems in a clinic.
The following data has been collected based on a random sample of n = 30 patients who went to the emergency room of the clinic for Heart related problems.
The measurements are: 42, 38, 51, 53, 40, 68, 62, 36, 32, 45, 51, 67, 53, 59, 47, 63, 52, 64, 61, 43, 56, 58, 66, 54, 56, 52, 40, 55, 72, 69.
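Because age takes many distinct values, the table is usually built on class intervals rather than single values. The sketch below assumes intervals of width 10 starting at 30; the slide does not specify the intervals, so this choice is only one reasonable option.

```python
from collections import Counter

# Ages of the n = 30 sampled patients (from the slide).
ages = [42, 38, 51, 53, 40, 68, 62, 36, 32, 45, 51, 67, 53, 59, 47,
        63, 52, 64, 61, 43, 56, 58, 66, 54, 56, 52, 40, 55, 72, 69]

width = 10  # assumed class width; not given in the slides

# Map each age to the lower limit of its class interval (30, 40, 50, ...).
class_counts = Counter((age // width) * width for age in ages)

for lower in sorted(class_counts):
    count = class_counts[lower]
    # Print the interval, its frequency, and its relative frequency.
    print(f"{lower}-{lower + width - 1}: {count:>2}  ({count / len(ages):.3f})")
```

The same class counts are what a frequency histogram of these ages would display as bar heights.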
Measures of Central Tendency :Measures of Central Tendency Where is the heart of the distribution?
1. Mean
2. Median
3. Mode
Sample Mean :Sample Mean The arithmetic mean (or, simply, mean) is computed by summing all the observations in the sample and dividing the sum by the number of observations.
For a sample of five household incomes, 6,000, 10,000, 10,000, 14,000, 50,000, the sample mean is (6,000 + 10,000 + 10,000 + 14,000 + 50,000) / 5 = 90,000 / 5 = 18,000
Sample Median :Sample Median In a list ranked from smallest measurement to the highest, the median is the middle value
In our example of five household incomes, first we rank the measurements
6,000, 10,000, 10,000, 14,000, 50,000
Sample Median is 10,000
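The three measures can be checked for the five household incomes with Python's standard `statistics` module. This is a minimal sketch; the variable names are chosen here for illustration.

```python
import statistics

# The five household incomes used in the slides.
incomes = [6000, 10000, 10000, 14000, 50000]

sample_mean = statistics.mean(incomes)      # sum of observations / n
sample_median = statistics.median(incomes)  # middle value of the ranked list
sample_mode = statistics.mode(incomes)      # most frequent value

print("mean:", sample_mean)
print("median:", sample_median)
print("mode:", sample_mode)
```

The single large income of 50,000 pulls the mean well above the median, which illustrates why the median is often preferred for skewed data such as income.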
Measures of Dispersion or Variability :Measures of Dispersion or Variability Range
Variance
Standard deviation
Formula for Sample Variance & Standard deviation S :Formula for Sample Variance & Standard deviation S Standard deviation = S
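The formula itself does not appear in the slide text. The conventional definition is given here as an assumption, with x_i denoting the observations, x-bar the sample mean, n the sample size, and the usual n - 1 divisor for a sample:

```latex
S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1},
\qquad
S = \sqrt{S^2}
```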
Calculation of Variance and Standard deviation :Calculation of Variance and Standard deviation
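The worked calculation for this slide is not in the text. As an illustration only, the sketch below applies the usual sample formulas (divisor n - 1, an assumption about what the slide showed) to the five household incomes from the earlier slides, and cross-checks the result against Python's `statistics.variance`.

```python
import math
import statistics

# The five household incomes from the earlier slides.
incomes = [6000, 10000, 10000, 14000, 50000]

n = len(incomes)
mean = sum(incomes) / n

# Range: largest observation minus smallest observation.
data_range = max(incomes) - min(incomes)

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
squared_deviations = [(x - mean) ** 2 for x in incomes]
sample_variance = sum(squared_deviations) / (n - 1)

# Standard deviation: square root of the variance.
sample_sd = math.sqrt(sample_variance)

print("range:", data_range)
print("variance:", sample_variance)
print("standard deviation:", round(sample_sd, 2))

# statistics.variance also uses the n - 1 divisor, so the two should agree.
print("matches statistics.variance:",
      math.isclose(sample_variance, statistics.variance(incomes)))
```

The standard deviation is expressed in the same units as the data, which makes it easier to interpret than the variance.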
Empirical Rule :Empirical Rule For a Normal distribution approximately,
a) 68% of the measurements fall within one standard deviation around the mean
b) 95% of the measurements fall within two standard deviations around the mean
c) 99.7% of the measurements fall within three standard deviations around the mean
Suppose the reaction time of a particular drug has a Normal distribution with a mean of 10 minutes and a standard deviation of 2 minutes :Suppose the reaction time of a particular drug has a Normal distribution with a mean of 10 minutes and a standard deviation of 2 minutes Approximately,
a) 68% of the subjects taking the drug will have reaction time between 8 and 12 minutes
b) 95% of the subjects taking the drug will have reaction time between 6 and 14 minutes
c) 99.7% of the subjects taking the drug will have reaction time between 4 and 16 minutes
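These percentages can be verified numerically. The sketch below uses `statistics.NormalDist` from the Python standard library with the mean of 10 minutes and standard deviation of 2 minutes stated above; the variable names are illustrative.

```python
from statistics import NormalDist

# Reaction time modelled as Normal with mean 10 and standard deviation 2.
reaction_time = NormalDist(mu=10, sigma=2)

coverage = {}
for k in (1, 2, 3):
    low = reaction_time.mean - k * reaction_time.stdev
    high = reaction_time.mean + k * reaction_time.stdev
    # Probability of falling within k standard deviations of the mean.
    coverage[k] = reaction_time.cdf(high) - reaction_time.cdf(low)
    print(f"within {k} SD ({low:g} to {high:g} min): {coverage[k]:.4f}")
```

The exact probabilities are slightly different from the rounded 68%, 95%, and 99.7% figures, which is why the rule is stated as approximate.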