Introduction to statistics I : 

Sophia King, Rm. P24 HWB, sk219@le.ac.uk

Descriptive Statistics : 

Statistical procedures used to summarise, organise, and simplify data. This should be carried out in a way that reflects the overall findings: raw data is made more manageable, raw data is presented in a logical form, and patterns can be seen in the organised data. The main tools are frequency tables, graphical techniques, measures of central tendency, and measures of spread (variability).

Plotting Data: describing spread of data : 

A researcher is investigating short-term memory capacity. The number of symbols remembered is recorded for 20 participants:
4, 6, 3, 7, 5, 7, 8, 4, 5, 10, 10, 6, 8, 9, 3, 5, 6, 4, 11, 6
We can describe our data by using a frequency distribution, which can be presented as a table or a graph. A frequency distribution always presents the set of categories that make up the original measurement scale and the frequency of each score or category. A distribution has three important characteristics: shape, central tendency, and variability.

Frequency Distribution Tables : 

The highest score is placed at the top, and all observed scores are listed. The table gives information about distribution, variability, and centrality.
X = score value; f = frequency; fX = total value associated with each frequency.
Σf = N
ΣX = ΣfX
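As a rough illustration, the frequency table for the 20 memory scores could be built as follows. This is only a sketch: the variable names (scores, freq, rows) are chosen here for illustration and do not come from the slides.

```python
from collections import Counter

# Memory scores for the 20 participants (from the slides).
scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10,
          10, 6, 8, 9, 3, 5, 6, 4, 11, 6]

freq = Counter(scores)  # maps each score X to its frequency f

# One row per score (X, f, fX), with the highest score at the top.
rows = [(x, freq[x], freq[x] * x) for x in sorted(freq, reverse=True)]

print(f"{'X':>3} {'f':>3} {'fX':>4}")
for x, f, fx in rows:
    print(f"{x:>3} {f:>3} {fx:>4}")

total_f = sum(f for _, f, _ in rows)     # sum of f, should equal N
total_fx = sum(fx for _, _, fx in rows)  # sum of fX, should equal sum of X
print("sum f =", total_f, " sum fX =", total_fx)
```

The two totals give a quick check on the table: the frequencies should add up to the number of participants, and the fX column should add up to the same total as the raw scores.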

Grouped Frequency Distribution Tables : 

Sometimes the spread of the data is too wide to list every score. Grouped tables present scores as class intervals. Use about 10 intervals. The interval width should be a simple round number (2, 5, 10, etc.), and all intervals should have the same width. The bottom score of each interval should be a multiple of the width. Class intervals represent a continuous variable X: for example, 51 is bounded by the real limits 50.5 and 51.5. If X is 8 and f is 3, this does not mean all three participants had exactly the same score; it means they all fell somewhere between 7.5 and 8.5.
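The grouping rules can be sketched in code using the memory scores from the earlier slide. The width of 2 is an assumption made purely for illustration: this data set is narrow enough that grouping would not normally be needed, and it yields fewer than the suggested 10 intervals.

```python
from collections import Counter

scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10,
          10, 6, 8, 9, 3, 5, 6, 4, 11, 6]

width = 2  # hypothetical interval width, chosen only for illustration

# Each score falls in the interval whose bottom value is the nearest
# multiple of `width` at or below the score.
grouped = Counter((x // width) * width for x in scores)

for low in sorted(grouped, reverse=True):
    high = low + width - 1
    # Real limits extend half a unit beyond the apparent limits.
    print(f"{low:>2}-{high:<2}  real limits {low - 0.5}-{high + 0.5}  f = {grouped[low]}")
```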

Percentiles and Percentile Ranks : 

X values are raw scores, which carry no context on their own. The percentile rank of a score is the percentage of the sample with scores at or below that value. This can be represented by a cumulative frequency (cf) column. The cumulative percentage is obtained by:
c% = (cf / N) × 100
This gives information about the relative position of a score in the data distribution.
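A minimal sketch of the cumulative percentage calculation, applied to the memory scores from the earlier slide. The names cf and percentile_rank are illustrative only.

```python
from collections import Counter

scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10,
          10, 6, 8, 9, 3, 5, 6, 4, 11, 6]

n = len(scores)
freq = Counter(scores)

cf = 0                 # running cumulative frequency
percentile_rank = {}
for x in sorted(freq):                  # accumulate from the lowest score upward
    cf += freq[x]
    percentile_rank[x] = cf / n * 100   # c% = (cf / N) x 100

for x in sorted(percentile_rank, reverse=True):
    print(f"X = {x:>2}  c% = {percentile_rank[x]:.0f}%")
```

Under this calculation a score of 6 has a cumulative percentage of 60%, because 12 of the 20 participants scored 6 or lower.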

Representing data as graphs : 

A frequency distribution graph presents all the information available in a frequency table (and can also be fitted to a grouped frequency table). Histograms are used for this. The bar width corresponds to the real limits of the intervals. Histograms can be modified so that each bar is built from blocks representing individual scores.
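The "blocks" variant can be approximated in plain text, with one block per participant. This is a sketch only: it draws the bars horizontally rather than vertically, and uses the memory scores from the earlier slide.

```python
from collections import Counter

scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10,
          10, 6, 8, 9, 3, 5, 6, 4, 11, 6]

freq = Counter(scores)

# One row per score value; each '#' is a block standing for one participant.
for x in range(min(scores), max(scores) + 1):
    print(f"{x:>2} | {'#' * freq[x]}")
```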

Frequency Distribution Polygons : 

A polygon shows the same information with lines, tracing the 'shape' of the distribution. Both histograms and polygons represent continuous data. For non-numerical data, the frequency distribution can be represented by a bar graph. Bar graphs have spaces between adjacent bars to represent distinct categories.

Frequencies of Populations and Samples : 

Population: all the individuals of interest to the study.
Sample: the particular group of participants you are testing, selected from the population.
Graphs of population distributions are possible, but unlike graphs of sample distributions they cannot normally show exact frequencies. However, you can display graphs of relative frequencies (categorical data) or use smooth curves to indicate relative frequencies (interval or ratio data).

Frequency Distribution: the Normal Distribution : 

Bell-shaped: a specific shape that can be defined by an equation. Symmetrical around the midpoint, where the greatest frequency of scores occurs. The tails of the perfect curve are asymptotic: they never quite meet the horizontal axis. A normal distribution is an assumption of parametric testing.

Measures of Central Tendency : 

A measure of central tendency summarises the data using a single value that is in some way representative of the entire data set. It is not always possible to follow the same procedure to produce a central representative value: the appropriate measure changes with the shape of the distribution.
Mode: the most frequent value. It does not take into account exact scores and is unaffected by extreme scores. It is not useful when several values occur equally often in a set.

Measures of Central Tendency : 

Median: the value that falls exactly at the midpoint of a ranked distribution. It does not take into account exact scores and is unaffected by extreme scores. In a small set it can be unrepresentative.
Mean (arithmetic average):
Sample mean: M = ΣX / n
Population mean: μ = ΣX / N
The mean takes into account all values, but it is easily distorted by extreme values.

Measures of Central Tendency : 

For our set of memory scores:
4, 6, 3, 7, 5, 7, 8, 4, 5, 10, 10, 6, 8, 9, 3, 5, 6, 4, 11, 6
Mode = 6, Median = 6, Mean = 6.35
The mean is the preferred measure of central tendency, except when there are extreme scores or skewed distributions, when the data are not on an interval scale, or when the variable is discrete.
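These three values can be checked with Python's standard statistics module. This is a quick sketch for verification; the slides themselves do not prescribe any software.

```python
import statistics

scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10,
          10, 6, 8, 9, 3, 5, 6, 4, 11, 6]

mode = statistics.mode(scores)      # most frequent value
median = statistics.median(scores)  # midpoint of the ranked scores
mean = statistics.mean(scores)      # sum of scores divided by n

print("Mode =", mode, " Median =", median, " Mean =", mean)
```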

Central Tendencies and Distribution Shape : 

(Figure not reproduced in this transcript.)

Describing Variability : 

Variability describes, as an exact quantitative measure, how spread out or clustered together the scores are. It is usually defined in terms of distance: how far apart scores are from each other, how far apart scores are from the mean, and how representative a score is of the data set as a whole.

Describing Variability: the Range : 

The range is the simplest and most obvious way of describing variability:
Range = highest score - lowest score
The range only takes into account the two extreme scores and ignores any values in between. To counter this, the distribution is divided into quarters (quartiles): Q1 = 25%, Q2 = 50%, Q3 = 75%.
The interquartile range is the distance between the first and third quartiles (Q3 - Q1).
The semi-interquartile range is one half of the interquartile range.
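A sketch of these measures for the memory scores from the earlier slides. Quartile conventions differ between textbooks and software, so the "inclusive" interpolation method used here is an assumption. Other methods give slightly different quartiles for the same data.

```python
import statistics

scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10,
          10, 6, 8, 9, 3, 5, 6, 4, 11, 6]

value_range = max(scores) - min(scores)  # highest score minus lowest score

# Cut points that divide the ranked scores into quarters.
# method="inclusive" is one convention among several (an assumption here).
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")

iqr = q3 - q1        # interquartile range
semi_iqr = iqr / 2   # semi-interquartile range

print("Range =", value_range)
print("Q1 =", q1, " Q2 =", q2, " Q3 =", q3)
print("IQR =", iqr, " Semi-IQR =", semi_iqr)
```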

Describing Variability: Deviation : 

A more sophisticated measure of variability is one that shows how scores cluster around the mean. Deviation is the distance of a score from the mean:
X - μ, e.g. 11 - 6.35 = 4.65 and 3 - 6.35 = -3.35
A measure representative of the variability of all the scores would be the mean of the deviation scores, found by adding all the deviations and dividing by n:
Σ(X - μ) / n
However, the deviation scores always add up to zero (the mean serves as the balance point for the scores), so this average is always zero.
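The balance-point property is easy to confirm numerically for the memory scores. In this sketch the total is compared with zero using a small tolerance, because floating-point arithmetic can leave a tiny rounding error.

```python
scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10,
          10, 6, 8, 9, 3, 5, 6, 4, 11, 6]

mean = sum(scores) / len(scores)

# Signed distance of every score from the mean.
deviations = [x - mean for x in scores]

total = sum(deviations)
print("Largest deviation: ", round(max(deviations), 2))
print("Smallest deviation:", round(min(deviations), 2))
print("Sum of deviations: ", round(total, 10))  # effectively zero
```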

Describing Variability: Variance : 

To remove the +/- signs we simply square each deviation before finding the average. This is called the variance:
Σ(X - μ)² / n = 106.55 / 20 = 5.33
The numerator is referred to as the Sum of Squares (SS), as it is the sum of the squared deviations around the mean.
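The same arithmetic, written out step by step for the memory scores. This is a sketch for checking the figures, with illustrative variable names.

```python
scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10,
          10, 6, 8, 9, 3, 5, 6, 4, 11, 6]

n = len(scores)
mean = sum(scores) / n

# Sum of Squares: the squared deviations around the mean, added together.
ss = sum((x - mean) ** 2 for x in scores)

# Variance as the average squared deviation (dividing by n).
variance = ss / n

print("SS =", round(ss, 2))
print("Variance =", round(variance, 2))
```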

Describing Variability: Population Variance : 

Population variance is designated by σ²:
σ² = Σ(X - μ)² / N = SS / N
Sample variance is designated by s². Samples are less variable than populations, so they give biased estimates of population variability.
Degrees of freedom (df): the number of independent (free to vary) scores. In a sample, the sample mean must be known before the variance can be calculated, so the final score is dependent on the earlier scores: df = n - 1.
s² = Σ(X - M)² / (n - 1) = SS / (n - 1) = 106.55 / (20 - 1) = 5.61
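Python's standard statistics module provides both versions, which makes the effect of dividing by n - 1 easy to see. This sketch treats the memory scores first as a whole population and then as a sample.

```python
import statistics

scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10,
          10, 6, 8, 9, 3, 5, 6, 4, 11, 6]

pop_var = statistics.pvariance(scores)    # divides SS by N
sample_var = statistics.variance(scores)  # divides SS by n - 1 (df)

print("Population variance:", round(pop_var, 2))
print("Sample variance:    ", round(sample_var, 2))
```

The sample value is slightly larger, since the same SS is divided by 19 rather than 20.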

Describing Variability: the Standard Deviation : 

Variance is a measure based on squared distances. To get back to the original units, we can take the square root of the variance, which gives us the standard deviation.
Population standard deviation: σ = √( Σ(X - μ)² / N )
Sample standard deviation: s = √( Σ(X - M)² / (n - 1) )
So for our memory score example we simply take the square root of the sample variance: s = √5.61 = 2.37
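As a quick check, both standard deviations for the memory scores can be computed with the standard statistics module. This is a sketch, and the population figure is shown only for comparison.

```python
import statistics

scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10,
          10, 6, 8, 9, 3, 5, 6, 4, 11, 6]

pop_sd = statistics.pstdev(scores)    # square root of SS / N
sample_sd = statistics.stdev(scores)  # square root of SS / (n - 1)

print("Population SD:", round(pop_sd, 2))
print("Sample SD:    ", round(sample_sd, 2))
```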

Describing Variability : 

The standard deviation is the most common measure of variability, but the others can be used. A good measure of variability must be stable and reliable: it should not be greatly affected by little details in the data. Factors to consider are extreme scores, multiple sampling from the same population, and open-ended distributions. Both the variance and the SD are related to other statistical techniques.

Descriptive statistics : 

A researcher is investigating short-term memory capacity. The number of symbols remembered is recorded for 20 participants:
4, 6, 3, 7, 5, 7, 8, 4, 5, 10, 10, 6, 8, 9, 3, 5, 6, 4, 11, 6
What statistics can we display about this data, and what do they mean?
Frequency table: shows how often different scores occur.
Frequency graph: gives information about the shape of the distribution.
Measures of central tendency and variability.

References and Further Reading : 

Gravetter & Wallnau: Chapters 2, 3, and 4
