Slide 1: CLASSIFICATION AND TABULATION; SKEWNESS AND KURTOSIS

Slide 2: CLASSIFICATION OF DATA

DEFINITION
“Classification is the process of arranging things (either actually or notionally) in groups or classes according to their resemblances and affinities and gives expression to the unity of attributes that may subsist amongst a diversity of individuals.” - Conner

FUNCTIONS OF CLASSIFICATION
1. Reduces the bulk of the data
2. Simplifies the data
3. Facilitates comparison of characteristics
4. Renders the data ready for statistical analysis

CHARACTERISTICS OF CLASSIFICATION
1. Unambiguous
2. Stable
3. Flexible
4. Exhaustive
5. Mutually exclusive
6. Suitable
7. Homogeneous
8. Revealing

OBJECTIVES OF CLASSIFICATION
1. To condense the mass of data
2. To make the data easy to grasp
3. To prepare the data for tabulation
4. To study relationships
5. To facilitate comparison

TYPES OF CLASSIFICATION
1. Geographical (or spatial) classification
2. Chronological classification
3. Qualitative classification
4. Quantitative classification

GEOGRAPHICAL (OR SPATIAL) CLASSIFICATION
When data are classified according to geographical location or region (such as states, cities, zones or areas), the classification is called geographical. For example, the production of food grains in India may be presented state-wise as follows:
[Table: State-wise estimates of production of food grains]

CHRONOLOGICAL CLASSIFICATION
When data are observed over a period of time and classified on the basis of their time of occurrence, the classification is called chronological.
Time series such as national income figures, the annual output of wheat, the monthly expenditure of a household and the daily consumption of milk are examples of chronological classification. For example, figures of population (or production, sales, etc.) may be presented as follows:
[Table: Population of India from 1941 to 1991]

QUALITATIVE CLASSIFICATION
In qualitative classification, data are grouped according to attributes that cannot be measured numerically. For example, we may first divide a population into males and females on the basis of the attribute ‘sex’; each of these classes may then be subdivided into ‘literate’ and ‘illiterate’ on the basis of the attribute ‘literacy’. Further classification can be made on the basis of another attribute, say, employment.

QUANTITATIVE CLASSIFICATION
Quantitative classification refers to the classification of data according to characteristics that can be measured, such as height, weight, income, sales, profits or production. For example, the students of a college may be classified according to weight as follows:
[Table: Students of a college classified according to weight]

Slide 16: INTRODUCTION TO TABULATION
DEFINITION
According to Tuttle, “A statistical table is the logical listing of related quantitative data in vertical columns and horizontal rows of numbers, with sufficient explanatory and qualifying words, phrases and statements in the form of titles, heading and footnotes to make clear the full meaning of the data and their origin.”

Slide 17: OBJECTIVES OF TABULATION
1. To simplify complex data
2. To economize space
3. To facilitate comparison
4. To facilitate statistical analysis
5. To save time
6. To depict trends
7. To aid reference

Slide 18: COMPONENTS OF A TABLE
1. Table number
2. Title of the table
3. Caption / box head
4. Stub
5. Body / field
6. Head note
7. Foot note
8. Source note

Slide 19: [Figure: skeleton of a table. Stub headings occupy the top-left cell; the caption spans the data columns and is divided into subheads, each with its own column heads; a "Total (rows)" column sits at the right. Stub entries run down the left side, a "Total (columns)" row closes the body, and the foot note and source note appear below the table.]

Slide 20: REQUIREMENTS OF GOOD STATISTICAL TABLES
1. Suited to the purpose
2. Scientifically prepared
3. Clear
4. Of manageable size
5. Columns and rows numbered
6. Figures suitably approximated
7. Attractive get-up
8. Units stated
9. Averages and totals shown
10. Logical arrangement of items
11. Proper lettering

Slide 21: TYPES OF TABLES
1. Simple and complex tables
a) One-way table:

Age (in years) | No. of employees
25–35 | 10
35–45 | 12
45–55 | 14
Total | 36

b) Two-way table:

Slide 22:
Age (in years) | Male employees | Female employees | Total
25–35 | 5 | 5 | 10
35–45 | 7 | 5 | 12
45–55 | 8 | 6 | 14
Total | 20 | 16 | 36

c) Higher-order table
2. General-purpose and special-purpose tables

Slide 23: ADVANTAGES OF CLASSIFICATION AND TABULATION
1. Clarifies the objective
2. Simplifies complex data
3. Economizes space
4. Facilitates comparison
5. Helps in reference
6. Depicts trends

Slide 24: DISADVANTAGES OF CLASSIFICATION AND TABULATION
1. It is a complicated process
2. Not all data can be put into tables
3. Lack of flexibility

SKEWNESS
DEFINITION
“In probability theory and statistics, skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable.”
The skewness value can be positive, negative or even undefined. A zero value indicates that the values are relatively evenly distributed on both sides of the mean, typically but not necessarily implying a symmetric distribution.

Slide 27: A normal distribution is a bell-shaped distribution of data in which the mean, median and mode all coincide.
A frequency curve of a normal distribution looks like this:
[Figure: bell-shaped frequency curve of a normal distribution]
In a normal distribution, approximately 68% of the values lie within one standard deviation of the mean and approximately 95% lie within two standard deviations of the mean.

Slide 28:
Negatively skewed: Qualitatively, a negative skew indicates that the tail on the left side of the probability density function is longer than the tail on the right, and the bulk of the values (including the median) lie to the right of the mean.
Positively skewed: A positive skew indicates that the tail on the right side is longer than the tail on the left, and the bulk of the values lie to the left of the mean.

MEASURES OF SKEWNESS
Measures of skewness tell us the direction and extent of asymmetry in a series.

Absolute measures of skewness (expressed in the units of the data):
$Sk = \text{Mean} - \text{Mode}$
Based on quartiles: $Sk = Q_3 + Q_1 - 2\,\text{Median}$, where $Q_3$ is the third quartile and $Q_1$ the first quartile.
If Mean > Mode, skewness is positive; if Mean < Mode, skewness is negative.

Relative measures of skewness (unit-free):
A. Karl Pearson's coefficient of skewness:
$Sk_P = \dfrac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}}$
B. Bowley's coefficient of skewness, based on quartiles:
$Sk_B = \dfrac{(Q_3 - \text{Median}) - (\text{Median} - Q_1)}{(Q_3 - \text{Median}) + (\text{Median} - Q_1)} = \dfrac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1}$

Slide 31:
C. Kelly's coefficient of skewness, based on the 10th and 90th percentiles (equivalently, the 1st and 9th deciles):
$Sk_K = \dfrac{P_{90} + P_{10} - 2\,\text{Median}}{P_{90} - P_{10}} = \dfrac{D_9 + D_1 - 2\,\text{Median}}{D_9 - D_1}$
D. Measure of skewness based on the third moment:
$\beta_1 = \dfrac{\mu_3^2}{\mu_2^3}$
where $\beta_1$ is the relative measure of skewness, $\mu_3$ the third central moment and $\mu_2$ the second central moment. Because $\mu_3$ is squared, $\beta_1$ gives the extent but not the direction of skewness; the direction follows the sign of $\mu_3$.

KURTOSIS
DEFINITION
Kurtosis refers to the degree of flatness or peakedness in the region about the mode of a frequency curve.

Slide 34:
1. Leptokurtic: the curve is more peaked than the normal curve.
2. Mesokurtic: the normal curve itself is mesokurtic.
3. Platykurtic: the curve is more flat-topped than the normal curve.
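As a rough illustration, the skewness measures listed earlier (Karl Pearson, Bowley, Kelly and $\beta_1$) can be computed for raw, ungrouped data with Python's standard `statistics` module. This is a minimal sketch under stated assumptions, not the method from the slides: the function names are hypothetical, the standard deviation is the population form (`pstdev`), and quartiles and deciles come from `statistics.quantiles` with its default "exclusive" interpolation, so values may differ slightly from textbook formulas for grouped frequency distributions.

```python
import statistics

# Hypothetical helper names; assumes raw (ungrouped) numeric data.

def pearson_sk(data):
    """Karl Pearson's coefficient: (mean - mode) / standard deviation."""
    # statistics.mode returns the first mode met if there are several (Python 3.8+);
    # pstdev is the population standard deviation.
    return (statistics.mean(data) - statistics.mode(data)) / statistics.pstdev(data)


def bowley_sk(data):
    """Bowley's coefficient: (Q3 + Q1 - 2*Median) / (Q3 - Q1)."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # default 'exclusive' method
    med = statistics.median(data)
    return (q3 + q1 - 2 * med) / (q3 - q1)


def kelly_sk(data):
    """Kelly's coefficient: (D9 + D1 - 2*Median) / (D9 - D1)."""
    deciles = statistics.quantiles(data, n=10)  # nine cut points D1..D9
    d1, d9 = deciles[0], deciles[-1]
    med = statistics.median(data)
    return (d9 + d1 - 2 * med) / (d9 - d1)


def central_moment(data, r):
    """r-th moment about the mean: sum((x - mean)**r) / N."""
    m = statistics.mean(data)
    return sum((x - m) ** r for x in data) / len(data)


def beta1(data):
    """Moment measure mu3**2 / mu2**3: extent only, the sign of mu3 gives direction."""
    return central_moment(data, 3) ** 2 / central_moment(data, 2) ** 3


if __name__ == "__main__":
    skewed = [2, 3, 3, 3, 4, 5, 6, 8, 12, 20]  # long right tail
    print(round(pearson_sk(skewed), 2))    # 0.68
    print(round(bowley_sk(skewed), 2))     # 0.5
    print(round(kelly_sk(skewed), 2))      # 0.72
    print(round(beta1(skewed), 2))         # 2.41
    print(central_moment(skewed, 3) > 0)   # True: the skew is positive
```

On this right-skewed sample the mean (6.6) exceeds the mode (3), and all the coefficients come out positive, in line with the sign rule stated above.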
EXCESS OF KURTOSIS: The degree of peakedness or flat-toppedness of a curve, relative to the normal curve, is known as the excess of kurtosis.

Slide 35: [Figure: Types of kurtosis: curves labelled L (leptokurtic), M (mesokurtic) and P (platykurtic)]

Slide 36: MEASURES OF KURTOSIS
The most important measure of kurtosis is the coefficient $\beta_2$:
$\beta_2 = \dfrac{\mu_4}{\mu_2^2}$
where $\mu_4$ is the fourth central moment and $\mu_2$ the second central moment.

Interpretation of $\beta_2$:
The greater the value of $\beta_2$, the more peaked the distribution.
For the normal curve, $\beta_2 = 3$.
For a leptokurtic curve $\beta_2 > 3$; for a platykurtic curve $\beta_2 < 3$.
A measure derived from $\beta_2$ is $\gamma_2 = \beta_2 - 3$: a positive value indicates a leptokurtic curve, a negative value a platykurtic curve, and zero a mesokurtic curve.
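The coefficients $\beta_2$ and $\gamma_2$ can be sketched the same way. This minimal Python sketch assumes raw data and uses population moments (dividing by N), so it will not match the bias-corrected sample estimates reported by some statistical packages. The function names and the `tol` band used to call a curve mesokurtic are hypothetical choices for illustration only.

```python
import statistics

# Hypothetical helper names; assumes raw (ungrouped) numeric data.

def central_moment(data, r):
    """r-th moment about the mean: sum((x - mean)**r) / N."""
    m = statistics.fmean(data)
    return sum((x - m) ** r for x in data) / len(data)


def beta2(data):
    """beta2 = mu4 / mu2**2, which equals 3 for the normal curve."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2


def gamma2(data):
    """gamma2 = beta2 - 3 (excess kurtosis)."""
    return beta2(data) - 3


def kurtosis_type(data, tol=0.1):
    """Label a series by the sign of gamma2; `tol` is an arbitrary illustrative band."""
    g = gamma2(data)
    if g > tol:
        return "leptokurtic"
    if g < -tol:
        return "platykurtic"
    return "mesokurtic"


if __name__ == "__main__":
    flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]             # values spread evenly
    peaked = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]     # mass at the centre, thin tails
    print(round(beta2(flat), 2), kurtosis_type(flat))      # 1.77 platykurtic
    print(round(beta2(peaked), 2), kurtosis_type(peaked))  # 5.0 leptokurtic
```

The evenly spread series gives $\beta_2 < 3$ (platykurtic), while the series concentrated at its centre gives $\beta_2 = 5$, so $\gamma_2 = 2$ and the curve is classed as leptokurtic.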