Comparative study between Descriptive Statistics & Inferential statistics

Comparative study between Descriptive Statistics & Inferential statistics:

Presented by Partha Chatterjee Rahul Chakraborty Debadrita Dey Linza Biswas Jaya Suman Soni verma Parna Ghosh

Statistics:

The mathematics of the collection, organization, and interpretation of numerical data, especially the analysis of population characteristics by inference from sampling.

Types Of Statistics :

- Descriptive statistics
- Inferential statistics

Descriptive Statistics :

Descriptive statistics include the numbers, tables, charts, and graphs used to describe, organize, summarize, and present raw data.

Descriptive Statistics Are Most Often Used To Examine:

- central tendency (location) of data, i.e. where data tend to fall, as measured by the mean, median, and mode.
- dispersion (variability) of data, i.e. how spread out data are, as measured by the variance and its square root, the standard deviation.
- skew (symmetry) of data, i.e. how concentrated data are at the low or high end of the scale, as measured by the skew index.
- kurtosis (peakedness) of data, i.e. how concentrated data are around a single value, as measured by the kurtosis index.

Advantages Of Descriptive Statistics:

Descriptive statistics can:
- be essential for arranging and displaying data
- form the basis of rigorous data analysis
- be much easier to work with, interpret, and discuss than raw data
- help examine the tendencies, spread, normality, and reliability of a data set
- be rendered both graphically and numerically
- include useful techniques for summarizing data in visual form
- form the basis for more advanced statistical methods

Disadvantages Of Descriptive Statistics:

Descriptive statistics can:
- be misused, misinterpreted, and incomplete
- be of limited use when samples and populations are small
- demand a fair amount of calculation and explanation
- fail to fully specify the extent to which non-normal data are a problem
- offer little information about causes and effects
- be dangerous if not analyzed completely

Mean:

Mean is the arithmetic average, the most common measure of central tendency. The mean of a population is designated by the Greek letter mu (μ). The mean of a sample is designated by the symbol x-bar (x̄). The mean may not always be the best measure of central tendency, especially if data are skewed. For example, average income is often misleading, since a few individuals with extremely high incomes may raise the overall average.

Median:

Median is the value in the middle of the data set when the measurements are arranged in order of magnitude. For example, if 11 individuals were weighed and their weights arranged in ascending or descending order, the sixth value is the median, since five values fall above it and five fall below it.

Mode:

Mode is the value occurring most often in the data. If the largest group of people in a sample measuring age were 25 years old, then 25 would be the mode. The mode is the least commonly used measure of central tendency, particularly in large data sets. However, the mode is still important for describing a data set, especially when more than one value occurs frequently.
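The three measures of central tendency described on the Mean, Median, and Mode slides can be compared on a small data set. The following is a minimal sketch using Python's standard `statistics` module; the income figures are hypothetical and chosen only to show how one extreme value pulls the mean away from the median and mode.

```python
import statistics

# Hypothetical annual incomes in thousands; the last value is an extreme outlier.
incomes = [30, 32, 35, 35, 38, 40, 42, 45, 48, 50, 500]

mean_income = statistics.mean(incomes)      # pulled upward by the single outlier
median_income = statistics.median(incomes)  # sixth of the 11 ordered values
mode_income = statistics.mode(incomes)      # most frequently occurring value

print(f"mean={mean_income:.1f} median={median_income} mode={mode_income}")
```

With these made-up numbers the mean is about 81.4 while the median is 40 and the mode is 35, which illustrates why the mean can mislead for skewed data such as income.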

Variance:

Variance is expressed as the sum of the squares of the differences between each observation and the mean, divided by the number of observations. (For a population the divisor is N; for a sample the usual divisor is n - 1.) For populations, it is designated by the square of the Greek letter sigma (σ²). For samples, it is designated by the square of the letter s (s²). Variance is used less frequently than standard deviation as a measure of dispersion. Variance can be used when we want to quickly compare the variability of two or more sets of interval data.

Standard Deviation:

Standard deviation is expressed as the positive square root of the variance, i.e. σ for populations and s for samples. It can be read as a typical distance between observed values and the mean. The standard deviation is used when dispersion should be expressed in the same units as the original measurements. It is used more commonly than the variance in expressing the degree to which data are spread out.
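The definitions of variance and standard deviation can be written out directly. This is a sketch under stated assumptions: the function names and the data are illustrative, the population form divides by N, and the sample form divides by n - 1, which is the convention used by `statistics.variance`.

```python
import math
import statistics

def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)

def sample_variance(values):
    """Same sum of squares, divided by n - 1 (the usual sample estimate)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / (len(values) - 1)

observations = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data

sigma_squared = population_variance(observations)
s_squared = sample_variance(observations)
sigma = math.sqrt(sigma_squared)  # population standard deviation
s = math.sqrt(s_squared)          # sample standard deviation

print(sigma_squared, sigma, round(s_squared, 3), round(s, 3))
```

The hand-written functions should agree with `statistics.pvariance` and `statistics.variance` on the same data.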

Coefficient Of Variation:

Coefficient of variation measures relative dispersion by dividing the standard deviation by the mean and then multiplying by 100 to render a percent. This number is designated as V for populations and v for samples. Because it is unit-free, it compares the variability of two data sets with different means or units better than the standard deviation does.
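A short sketch of the coefficient of variation, assuming the sample standard deviation is used. The two data sets are hypothetical; they have the same standard deviation but very different means, so their relative dispersion differs sharply.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

small_scale = [10, 20, 30]        # hypothetical data, mean 20
large_scale = [1000, 1010, 1020]  # hypothetical data, mean 1010

cv_small = coefficient_of_variation(small_scale)
cv_large = coefficient_of_variation(large_scale)

print(f"{cv_small:.1f}% vs {cv_large:.2f}%")
```

Both sets have a standard deviation of 10, yet the first varies by 50 percent of its mean and the second by roughly 1 percent.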

Range:

Range measures the distance between the lowest and highest values in the data set and generally describes how spread out data are. For example, after an exam, an instructor may tell the class that the lowest score was 65 and the highest was 95. The range would then be 30.

Percentiles:

Percentiles measure the percentage of data points which lie below a certain value when the values are ordered. For example, a student scores 1280 on the Scholastic Aptitude Test (SAT). Her scorecard informs her she is in the 90th percentile of students taking the exam. Thus, 90 percent of the students scored lower than she did.

Quartiles:

Quartiles divide the ordered observations into four groups, each holding 25 percent of the values. The top 25 percent of values are referred to as the upper quartile. The lowest 25 percent of the values are referred to as the lower quartile. Often the two quartiles on either side of the median are reported together as the interquartile range.
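Range, percentiles, and quartiles can be sketched together on one list of exam scores. The scores are hypothetical (chosen so the lowest is 65 and the highest is 95, as in the Range slide), `percent_below` is an illustrative helper, and `statistics.quantiles` (Python 3.8 or later) is called with its default "exclusive" method; other quartile conventions give slightly different cut points.

```python
import statistics

# Hypothetical exam scores, lowest 65 and highest 95.
scores = [65, 70, 72, 75, 78, 80, 82, 85, 88, 90, 95]

score_range = max(scores) - min(scores)

def percent_below(values, cutoff):
    """Percentage of values that lie strictly below the cutoff."""
    return 100 * sum(1 for v in values if v < cutoff) / len(values)

# Three cut points that split the ordered data into four groups.
q1, q2, q3 = statistics.quantiles(scores, n=4)
interquartile_range = q3 - q1

print(score_range, q1, q2, q3, interquartile_range)
print(f"{percent_below(scores, 90):.1f}% of scores fall below 90")
```

The middle cut point `q2` coincides with the median of the scores.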

Measures Of Skew:

Measures of skew describe how concentrated data points are at the high or low end of the scale of measurement. Skew is designated by the symbol Sk for both populations and samples. Skew indicates the degree of symmetry in a data set. The more skewed the distribution, the higher the variability of the measures, and the higher the variability, the less reliable are the data. If the distribution is skewed right (positive skew), the mean lies to the right of the median and the mode. That is, mode < median < mean. But, if the distribution is skewed left (negative skew), the mean lies to the left of the median and the mode. That is, mean < median < mode. In a perfectly symmetric, single-peaked distribution, mean = median = mode, and skew is 0.
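The slide does not say which skew index it has in mind. As an assumption, the sketch below uses the common moment-based coefficient (third central moment divided by the 1.5 power of the second central moment). The data sets are hypothetical.

```python
import statistics

def skewness(values):
    """Moment-based skew: m3 / m2**1.5 (one common definition, assumed here)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 20]  # long tail toward high values
left_skewed = [-x for x in right_skewed]     # mirror image, long tail toward low values

for name, data in [("symmetric", symmetric),
                   ("right", right_skewed),
                   ("left", left_skewed)]:
    print(name, round(skewness(data), 3),
          statistics.mean(data), statistics.median(data))
```

For the right-skewed set the mean exceeds the median, and for its mirror image the mean falls below the median, matching the orderings described above.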

Measures Of Kurtosis:

Measures of kurtosis describe how concentrated data are around a single value, usually the mean. Thus, kurtosis assesses how peaked or flat the data distribution is. The more peaked or flat the distribution, the less normally distributed the data. And the less normal the distribution, the less reliable the data. Mesokurtic distributions are, like the normal bell curve, neither peaked nor flat. Platykurtic distributions are flatter than the normal bell curve. Leptokurtic distributions are more peaked than the normal bell curve.
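The kurtosis index is not defined on the slide. As an assumption, this sketch uses excess kurtosis (fourth central moment divided by the squared second central moment, minus 3), so that a normal distribution scores 0, flatter distributions score below 0, and more peaked ones score above 0. The data are hypothetical.

```python
def excess_kurtosis(values):
    """m4 / m2**2 - 3; approximately 0 for normally distributed data."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2 - 3

flat = [1, 2, 3, 4, 5]                       # evenly spread values
peaked = [0, 0, 0, 0, 0, 0, 0, 0, -5, 5]     # most values at the center, two far out

def classify(value, tolerance=0.1):
    """Label a distribution; the tolerance is an arbitrary illustrative choice."""
    if value > tolerance:
        return "leptokurtic"
    if value < -tolerance:
        return "platykurtic"
    return "mesokurtic"

print(classify(excess_kurtosis(flat)), classify(excess_kurtosis(peaked)))
```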

Inferential Statistics:

Inferential statistics pertain to the procedures used to make forecasts, estimates, or judgments about a large set of data (a population) on the basis of the statistical characteristics of a smaller set (a sample).

Inferential statistics Are Most Often Used To:

Inferential statistics are frequently used to answer cause-and-effect questions and make predictions.

Advantages Of Inferential statistics:

Inferential statistics can:
- provide more detailed information than descriptive statistics
- yield insight into relationships between variables
- reveal causes and effects and make predictions
- generate convincing support for a given theory
- be generally accepted due to widespread use in business and academia

Disadvantages Of Inferential statistics:

Inferential statistics can:
- be quite difficult to learn and use properly
- be vulnerable to misuse and abuse
- depend more on sound theory than on implications of a data set

Chi-square (χ²) tests:

Chi-square (χ²) tests are used to identify differences between groups when all variables are nominal, e.g., gender, ethnicity, salary group, political party affiliation, and so forth. Such tests are normally used with contingency tables, which group observations based on common characteristics.
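A minimal sketch of the chi-square statistic for a contingency table, assuming the standard Pearson form: expected counts come from the row and column totals, and the statistic sums (observed - expected)² / expected over all cells. The function name and the 2x2 table of counts are hypothetical.

```python
def chi_square_statistic(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom

# Hypothetical counts: rows are two groups, columns are two response categories.
observed_counts = [[30, 10],
                   [20, 40]]

chi2, dof = chi_square_statistic(observed_counts)
print(round(chi2, 3), dof)
```

With one degree of freedom, the conventional 5 percent critical value is about 3.84, so a statistic well above that would suggest the two groups differ in how they are distributed across the categories.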

Analysis of variance (ANOVA):

Analysis of variance (ANOVA) permits comparison of two or more populations when interval variables are used. ANOVA does this by comparing the dispersion of samples in order to make inferences about their means.
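How ANOVA compares dispersion to draw conclusions about means can be sketched with the one-way F statistic: the variability between group means is divided by the variability within groups. The function name and the three groups are hypothetical, and turning F into a p-value (which needs the F distribution) is left out.

```python
def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # Dispersion of the group means around the grand mean.
    ss_between = sum(len(group) * (mean - grand_mean) ** 2
                     for group, mean in zip(groups, group_means))
    # Dispersion of observations around their own group mean.
    ss_within = sum((x - mean) ** 2
                    for group, mean in zip(groups, group_means)
                    for x in group)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical measurements from three samples.
samples = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_statistic = one_way_anova_f(samples)
print(round(f_statistic, 3))
```

A large F means the group means are spread out far more than the scatter inside each group would explain.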

Analysis of covariance (ACOVA):

Analysis of covariance (ACOVA, more commonly abbreviated ANCOVA) examines whether or not interval variables move together in ways that are independent of their mean values. Ideally, variables should move independently of one another, regardless of their means. Unfortunately, in the real world, groups of observations usually differ on a number of dimensions. This makes simple analysis of variance tests problematic, since differences in other characteristics could cause the observed differences in the values of the variables of interest.

Correlation:

Correlation (ρ), like ACOVA, is used to measure the similarity in the changes of values of interval variables but is not influenced by the units of measure. Another advantage of correlation is that it is always bounded by the interval -1 ≤ ρ ≤ 1. Here -1 indicates a perfect inverse linear relationship, i.e. y increases uniformly as x decreases, and 1 indicates a perfect direct linear relationship, i.e. x and y move uniformly together. A value of 0 indicates no linear relationship.
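The bounds and the unit-independence of correlation can be checked with a short sketch of the Pearson coefficient. The function name and data are illustrative; the sample estimate is usually written r.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y scaled by their spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(spread_x * spread_y)

x = [1, 2, 3, 4, 5]
y_direct = [2, 4, 6, 8, 10]    # rises uniformly with x
y_inverse = [10, 8, 6, 4, 2]   # falls uniformly as x rises
x_rescaled = [value * 100 for value in x]  # same quantity in different units

print(pearson_r(x, y_direct), pearson_r(x, y_inverse),
      pearson_r(x_rescaled, y_direct))
```

Rescaling x changes its units but leaves the coefficient unchanged, which is the property the slide highlights.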

Regression analysis:

Regression analysis is often used to determine the effect of independent variables on a dependent variable. Regression measures the relative impact of each independent variable and is useful in forecasting. It is used most appropriately when both the independent and dependent variables are interval, though some social scientists also use regression on ordinal data. Like correlation, regression analysis assumes that the relationship between variables is linear.
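For a single independent variable, the linear relationship assumed by regression can be fitted by ordinary least squares. This is a sketch only: the function name and data are hypothetical, and real analyses typically involve several independent variables and diagnostic checks.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations that happen to lie exactly on a line.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]

slope, intercept = fit_line(x, y)
forecast_at_6 = intercept + slope * 6  # extend the fitted line to a new x value

print(slope, intercept, forecast_at_6)
```

The slope estimates how much the dependent variable changes per unit change in the independent variable, and the fitted line can then be used for forecasting.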

Logistic regression analysis:

Logistic regression analysis is used to examine relationships between variables when the dependent variable is nominal, while the independent variables may be nominal, ordinal, interval, or some mixture thereof. Suppose that one wanted to determine which program interventions were associated with a JOBS Program client's ability to get a job within six months of exiting the program. The outcome variable would be "job" or "no job," clearly a nominal variable. One could then use several independent variables such as GED completion, job training, post-secondary education and the like to predict the odds of getting a job. Such a method was applied to the JOBS Program audit.
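How a fitted logistic model turns independent variables into a probability or odds can be sketched as follows. Every number here is hypothetical: the coefficients are invented for illustration and are not results from the JOBS Program audit, and the fitting step itself is omitted.

```python
import math

def logistic(z):
    """Map a log-odds value to a probability between 0 and 1."""
    return 1 / (1 + math.exp(-z))

# Hypothetical log-odds coefficients, NOT taken from any real audit.
intercept = -1.0
coefficients = {"ged_completion": 0.8, "job_training": 1.2}

def probability_of_job(client):
    """Predicted probability of 'job' for a client given 0/1 indicators."""
    z = intercept + sum(weight * client.get(name, 0)
                        for name, weight in coefficients.items())
    return logistic(z)

no_services = probability_of_job({})
with_training = probability_of_job({"job_training": 1})

# exp(coefficient) is the multiplicative change in the odds of getting a job.
odds_ratio_training = math.exp(coefficients["job_training"])

print(round(no_services, 3), round(with_training, 3), round(odds_ratio_training, 2))
```

The outcome stays nominal ("job" or "no job"); the model only describes how the odds of one category shift with each independent variable.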

Discriminant analysis:

Discriminant analysis is similar to logistic regression in that the outcome variable is categorical. However, here the independent variables must be interval. In the audit of the Probation System, SAO staff used continuous ratings to explore how well a probationer's rating on drug abuse severity, social adjustment, and similar characteristics predicted whether or not the probationer committed another crime.

Factor analysis:

Factor analysis simultaneously examines multiple variables to determine if they reflect larger underlying dimensions. Factor analysis is commonly used when analyzing data from multi-question surveys to reduce the numerous questions to a smaller set of more global issues.

Forecasting:

Forecasting exists in many variations. The predictive power of regression analysis can be an effective forecasting tool, but time series forecasting is more common when time is a significant independent variable.
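The slide does not name a particular time series method. As one simple illustration, and purely as an assumption, the sketch below forecasts the next value as the average of the most recent observations (a moving average). The function name and series are hypothetical.

```python
def moving_average_forecast(series, window):
    """Forecast the next value as the mean of the last `window` observations."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and the series length")
    recent = series[-window:]
    return sum(recent) / window

# Hypothetical monthly values, ordered from oldest to newest.
monthly_values = [10, 12, 14, 16]
next_value = moving_average_forecast(monthly_values, window=2)

print(next_value)
```

A moving average lags behind a steady trend, as it does here, which is one reason regression on time or more elaborate time series models are often preferred.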

Thank You:

Thank You
