Applications
Uploaded by aSGuest9342. Added: January 06, 2009. Category: Education. License: All Rights Reserved.

Presentation Transcript

Slide 1: General Principles of Data Analysis
Choosing an appropriate statistical technique is a complex and somewhat arbitrary issue:
Real-life data often contain a mixture of different types of data.
Two statisticians may select different methods, depending on which assumptions they are willing to make.
Extraneous factors must also be taken into account: the availability of software and its limitations, and the availability of time and financial resources.

Slide 2: General Principles of Data Analysis
Warnings
Figures allow us to calculate them.
Applying different techniques and obtaining different results does not mean that something is wrong.
Looking for an answer to the same question with several methods may lead to a better understanding.
A negative result may be as informative as a positive one.
Failing to obtain an answer with one technique does not mean that there is no answer at all.
Etc.

Slide 3: General Principles of Data Analysis
The choice of a statistical technique depends essentially on:
the characteristics of the analysis question;
the characteristics of the data;
the characteristics of the sampling design.
Characteristics of the Analysis Question
Whether or not there is a distinction between independent and dependent variables.
Whether the research problem requires description, exploration, estimation, or testing of a hypothesis or model.
Whether the focus of the research is on 'variables' or on 'objects'.

Slide 4: General Principles of Data Analysis
Characteristics of the Data
Types of data sets:
Individuals - variables data sets
Proximities data sets (variable - variable proximities; individual - individual proximities)
Types of variables:
Continuous or quantitative variables
Discrete or qualitative variables
Variable types by measurement level:
Nominal-scale variables
Ordinal-scale variables
Interval-scale variables
Ratio-scale variables

Slide 5: General Principles of Data Analysis
Techniques for problems without a distinction between independent and dependent variables

Slide 6: General Principles of Data Analysis
Techniques for problems with a distinction between independent and dependent variables

Slide 7: General Principles of Data Analysis
Usual way of statistical problem solving:
Formulate the question using the terms and logic of the specific field of the problem (science management, pedagogy, economics, etc.).
Reformulate the question using statistical terms and logic.
Find appropriate statistical model(s) and technique(s).
Apply the selected model(s) and technique(s).
Give a statistical interpretation of the results obtained.
Reformulate the interpretation in the terms of the original field of application.

Slide 8: Scientific products by country
Question in research management
Research groups have multiple outputs, including publications, patents and experimental materials. What differences, if any, are there in the performance of the research groups of selected countries?
Statistical question
Can we construct a reasonable productivity index from the following measures of scientific output: articles in country, patents, articles abroad, algorithms and designs, original research reports, and experimental material? Can we find a significant difference between countries in the productivity index?
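The two statistical questions above can be illustrated with a short Python sketch. It is a simplified stand-in and does not reproduce the IDAMS POSCOR or ONEWAY algorithms. Each unit gets a dominance score: the number of other units it dominates (at least as good on every output measure and better on at least one) minus the number that dominate it, divided by n - 1. The helper `one_way_anova` then returns the F ratio and the unadjusted eta. The function names are hypothetical and the data are invented toy values.

The degrees of freedom reported later follow the same arithmetic. F(5, 1454) gives k - 1 = 5, so k = 6 countries, which matches the six codes in the $RECODE statement. It also gives N - k = 1454, so N = 1460 research units.

```python
from math import sqrt


def dominates(a, b):
    """True if profile a is at least as good as b on every measure and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def dominance_scores(profiles):
    """Simplified partial-order score: (#units it dominates - #units dominating it) / (n - 1)."""
    n = len(profiles)
    scores = []
    for i, a in enumerate(profiles):
        below = sum(dominates(a, b) for j, b in enumerate(profiles) if j != i)
        above = sum(dominates(b, a) for j, b in enumerate(profiles) if j != i)
        scores.append((below - above) / (n - 1))
    return scores


def one_way_anova(values, groups):
    """Return (F, df_between, df_within, eta) for a one-way analysis of variance."""
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    n, k = len(values), len(by_group)
    grand_mean = sum(values) / n
    means = {g: sum(vs) / len(vs) for g, vs in by_group.items()}
    ss_between = sum(len(vs) * (means[g] - grand_mean) ** 2 for g, vs in by_group.items())
    ss_within = sum((v - means[g]) ** 2 for g, vs in by_group.items() for v in vs)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    eta = sqrt(ss_between / (ss_between + ss_within))  # unadjusted eta
    return f, df_between, df_within, eta


# Toy example: four research units, three output counts each, two hypothetical countries.
profiles = [(3, 2, 1), (1, 1, 0), (3, 2, 2), (0, 0, 0)]
countries = ["A", "B", "A", "B"]
scores = dominance_scores(profiles)
f, df_b, df_w, eta = one_way_anova(scores, countries)
print(scores, f, df_b, df_w, eta)
```

Dominance is only a partial order. Two units that each beat the other on some measure are incomparable, so they add nothing to each other's score. This lets the index combine different kinds of output without choosing weights for them.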
Slide 9: Scientific products by country
Statistical model and technique
Partial order scoring for constructing the index of research output
Analysis of variance for testing whether the difference between countries is significant
Use of the selected model and technique
    $RUN POSCOR
    $FILES
    PRINT   = POSCOR.LST
    DICTIN  = R2R3RU.DIC
    DATAIN  = R2RU.DAT
    DICTOUT = POSCOR.DIC
    DATAOUT = POSCOR.DAT
    $SETUP
    POSCOR SCORES OF RU OUTPUTS
    BADDATA=MD1 -
      IDVAR=V2 -
      TRANSVARS=(V1)
    POSCOR ORDER=DESR -
      ANAME='RU OUTPUT' -
      VARS=(V116,V118,V122,V126,V128,V130)
    $RUN ONEWAY
    $FILES
    PRINT  = ONEWAY1.LST
    DICTIN = POSCOR.DIC
    DATAIN = POSCOR.DAT
    $SETUP
    ANALYSIS OF VARIANCE OF RU OUTPUT
    BADDATA=MD1 -
      PRINT=CDICT
    DEPVARS=(V8) CONVARS=(R1)
    $RECODE
    R1=RECODE V15 (40)=1, (360)=2, (410)=3, (638)=4, (844)=5, (868)=6

Slide 10: Scientific products by country
Use of the selected model and technique (results)

Slide 11: Scientific products by country
Statistical interpretation
The value F(5, 1454) = 56.018 shows a highly significant difference between countries in the constructed performance index. There is also a medium-strength differentiation between the countries: Eta(adj) = 0.398. The mean values show the level of each country.
Interpretation for research management
Two countries have a low, two a medium, and two a high productivity index.
Source: P.S. Nagpaul, Guide to Advanced Data Analysis using IDAMS Software

Slide 12: Performance, motivation and creativity of school children
Question in psychology - pedagogy
Intellectual performance, motivation and creativity of school children can be measured with several indicators. Some are produced by the children themselves (e.g. IQ tests); others are based on evaluations by their teachers (e.g. average grade). What perceivable dimensions, if any, lie behind these indicators?
Statistical question
Among the listed indicators, are there groups with statistical inter-correlation within each group and statistical independence between groups?
Indicators (T = produced by teachers, C = produced by children):
T Average grade
T Creative behaviour
C IQ
C Achievement motivation
C Creativity test
T Motivated behaviour
C Creative attitude
T Motivation index

Slide 13: Performance, motivation and creativity of school children
Statistical model and technique
Pearson correlation between the measured indicators
Multidimensional scaling, cluster analysis
Use of the selected model and technique
Executing PEARSON, MDSCAL and CLUSFIND in IDAMS
MDSCAL result (plot with the indicator groups labelled Teachers and Children)

Slide 14: Performance, motivation and creativity of school children
Use of the selected model and technique
CLUSFIND result

Slide 15: Performance, motivation and creativity of school children
Statistical interpretation
Multidimensional scaling shows a clear separation between the indicators produced by children and those produced by teachers.
Cluster analysis supports this separation of the variables coming from teachers and from children.
Pedagogical/psychological interpretation
Just one aspect: the ratings teachers give to children are nearly the same, regardless of the ability, attitude or behaviour dimension being evaluated.
Source: M. Hunya, Multidimensional statistical techniques in pedagogical studies
Data: A. Deak, B. Kozeki, Study into the effect of motivation and creativity factors on the performance of school children

Slide 16: Prediction of river flow values
Question in hydrology
We have more than 40 years of water level data on four rivers in North Africa. Can the water flow level be predicted from past data? If so, with what precision? What if the average flow level is considered instead of individual values?
Statistical question
Can river flow values be predicted from a set of values from the preceding period? How does the prediction change if the 6-month average flow is used?
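The two steps asked for here can be sketched in Python under a few assumptions. The data are taken to be monthly. "A lag of 12 to 36" on the next slide is read as an AR(p) model that uses the p = 12, 24 or 36 preceding values as predictors. The model is fitted by ordinary least squares with an intercept. The helper names `moving_average`, `solve` and `fit_ar` are hypothetical. The R^2 returned is the ordinary one, not the unbiased R**2 that IDAMS reports.

```python
def moving_average(series, width=6):
    """Trailing moving average; the first width - 1 positions have no value and are dropped."""
    return [sum(series[i - width + 1:i + 1]) / width for i in range(width - 1, len(series))]


def solve(a, b):
    """Solve the square linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ar(series, p):
    """Least-squares AR(p) with intercept: y[t] ~ c + a1*y[t-1] + ... + ap*y[t-p].

    Returns (coefficients [c, a1, ..., ap], ordinary R^2 of the fitted values).
    """
    rows = [[1.0] + series[t - p:t][::-1] for t in range(p, len(series))]
    ys = series[p:]
    k = p + 1
    # Normal equations X'X beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in rows]
    mean_y = sum(ys) / len(ys)
    sse = sum((y - fit) ** 2 for y, fit in zip(ys, fitted))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return beta, 1 - sse / sst


# Toy check: a noiseless series generated by y[t] = 1 + 0.5 * y[t-1].
series = [0.0]
for _ in range(30):
    series.append(1 + 0.5 * series[-1])
beta, r2 = fit_ar(series, 1)
smoothed = moving_average([1, 2, 3, 4, 5, 6, 7], 6)
print(beta, r2, smoothed)
```

On real data, `fit_ar(moving_average(flows, 6), 12)` would fit the smoothed series with 12 lags. Here `flows` stands for a hypothetical list of monthly values. Averaging removes much of the month-to-month noise and damps the peaks. This is consistent with the much higher R^2 reported later for the moving-average series.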
Slide 17: Prediction of river flow values
Statistical model and technique
Autoregression model (with a lag of 12 to 36 months) applied to the river flow time series
Transformation of the original data into a time series of moving averages (interval length = 6)
Use of the selected model and technique
Time Series Analysis option of the IDAMS interactive facilities
R**2 by lag      Original series   Moving average series
12 months        0.32              0.92
24 months        0.35              0.93
36 months        0.36              (not reported)

Slide 18: Prediction of river flow values
Use of the selected model and technique
Plots: original series; moving average series

Slide 19: Prediction of river flow values
Statistical interpretation
Autoregression shows that individual values can be predicted with moderate precision (unbiased R**2 = 0.32 - 0.36 for lags of 12 to 36 months), but high peak values are reproduced very poorly. For the 6-month moving average, the prediction is nearly perfect (unbiased R**2 = 0.92 for 12 months).
Hydrological interpretation
Although the pattern of changes can be reproduced fairly well, even three years of past data are far from enough to predict the height of peak flows. 6-month averages, however, can be predicted with almost full precision.
Data: UNESCO, Water Science Division

Slide 20: Business
Question concerning company management
What factors influence the economic performance of a company? Economic performance is measured by the return on capital employed.
Statistical question
Can the return on capital be predicted from a set of economic and production indicators characterizing the company? How does the prediction change if we look for a subset of the best predictors?
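Looking for "a subset of the best predictors" can be illustrated with greedy forward selection, one simple variant of the stepwise regression named below. The entry and removal rules of IDAMS REGRESSN are not reproduced here. The sketch also computes the determinant of the predictors' correlation matrix, the multicollinearity indicator quoted for the full model. This determinant equals 1 for uncorrelated predictors and approaches 0 as they become collinear. The helper names and the `min_gain` stopping threshold are hypothetical. The toy data are invented: y is exactly x1 + x2, and x3 is uncorrelated with y.

```python
from math import sqrt


def corr(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corr_matrix(cols):
    return [[corr(a, b) for b in cols] for a in cols]


def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in m]
    n, d = len(m), 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[piv][col] == 0:
            return 0.0
        if piv != col:
            m[col], m[piv] = m[piv], m[col]
            d = -d
        d *= m[col][col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= f * m[col][c]
    return d


def solve(a, b):
    """Solve a x = b (a square, non-singular) by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [v] for row, v in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(cols, y):
    """R^2 of an OLS regression of y on cols (with intercept), computed as r' R^-1 r."""
    r = [corr(c, y) for c in cols]
    return sum(ri * wi for ri, wi in zip(r, solve(corr_matrix(cols), r)))


def forward_select(columns, y, min_gain=0.01):
    """Repeatedly add the predictor giving the largest R^2; stop when the gain < min_gain."""
    chosen, best = [], 0.0
    remaining = list(columns)
    while remaining:
        scores = {n: r_squared([columns[c] for c in chosen + [n]], y) for n in remaining}
        name = max(scores, key=scores.get)
        if scores[name] - best < min_gain:
            break
        chosen.append(name)
        remaining.remove(name)
        best = scores[name]
    return chosen, best


# Toy data (invented): y is exactly x1 + x2; x3 is uncorrelated with y.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
x3 = [1, 1, 0, 0, 1, 1]
y = [a + b for a, b in zip(x1, x2)]
print("det(R) for x1, x2, x3:", det(corr_matrix([x1, x2, x3])))
chosen, best = forward_select({"x1": x1, "x2": x2, "x3": x3}, y)
print(chosen, best)
```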
Statistical model and technique
Multiple linear regression
Stepwise regression

Slide 21: Business
Use of the selected model and technique
Running REGRESSN
Results
The full regression model explains 70% of the adjusted variance of the dependent variable. Its standard error is about half of the mean, and the determinant of the correlation matrix is .79478E-05. Eight of the 12 variables have high covariance ratio values.
The stepwise regression model selects 3 variables that explain 80% of the variance. There is no multicollinearity (0.77647). The standard error of the estimate of the dependent variable is 0.06135, which is quite low, indicating high reliability of estimation.

Slide 22: Business
Statistical interpretation
Full regression model: the reliability of prediction is poor. Strong multicollinearity is present, and the variables that contribute to it can be identified.
Stepwise regression model: 3 variables explain 80% of the variance, there is no multicollinearity, and the estimation is highly reliable.
Interpretation for management
Although the full indicator set can give a good prediction, it cannot be recommended for real use because of its poor prediction reliability. With 3 carefully selected indicators, however, we can get a fair prediction.
Source: P.S. Nagpaul, India

Slide 23: Education
Question concerning measurement of knowledge level
Tests are very often used in education to check the level of knowledge in a subject. Long tests with many questions can meet the reliability requirement relatively easily. The question is whether a short, interactive, adaptive test can be made from a long test while preserving nearly the original reliability.
Statistical question
Can we give a good estimate of the original test score using a tree-structured prediction?
Statistical model and technique
Regression tree

Slide 24: Education
Use of the selected model and technique
Running SEARCH
Results
Starting from a standardized test of 20 questions (checking a specific verbal aptitude), a regression tree asking 3-4 questions was obtained. The tree has 10 final subgroups (leaves), with estimates of the original test score ranging from 6.4 to 59.2. The explained variance is 90.4%.

Slide 25: Education
Statistical interpretation
The obtained regression tree gives a very good estimate of the original test score.
Interpretation for test designers
Using the tree structure, a computer-assisted test can be constructed that is much shorter without losing the power of the original test.
Source: M. Hunya, Finding optimal interactive test structures (1982)
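The tree-structured test can be sketched in Python as an AID-style binary segmentation. As far as we know, this is the family of methods SEARCH belongs to. At each node the sketch chooses the question whose wrong/right split most reduces the within-group sum of squares of the total score. Each leaf uses its group mean as the estimate. The names `grow`, `predict` and `explained_variance` are hypothetical. The `depth` and `min_size` stopping rules are assumptions, and the eight-row answer sheet is invented toy data.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty group)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def grow(rows, target, items, depth, min_size=2):
    """Grow a binary regression tree on 0/1 item answers (AID-style, simplified).

    Each split picks the item whose wrong/right partition most reduces the
    within-group sum of squares of `target`; leaves hold the group mean.
    """
    ys = [r[target] for r in rows]
    leaf = {"estimate": sum(ys) / len(ys), "n": len(rows)}
    if depth == 0:
        return leaf
    best = None
    for item in items:
        wrong = [r for r in rows if r[item] == 0]
        right = [r for r in rows if r[item] == 1]
        if len(wrong) < min_size or len(right) < min_size:
            continue
        gain = sse(ys) - sse([r[target] for r in wrong]) - sse([r[target] for r in right])
        if best is None or gain > best[0]:
            best = (gain, item, wrong, right)
    if best is None or best[0] <= 0:
        return leaf
    _, item, wrong, right = best
    rest = [i for i in items if i != item]
    return {
        "item": item,
        "wrong": grow(wrong, target, rest, depth - 1, min_size),
        "right": grow(right, target, rest, depth - 1, min_size),
    }


def predict(tree, answers):
    """Follow one root-to-leaf path, asking only the items on that path."""
    while "item" in tree:
        tree = tree["right"] if answers[tree["item"]] == 1 else tree["wrong"]
    return tree["estimate"]


def explained_variance(tree, rows, target):
    """Share of the total sum of squares of `target` reproduced by the leaf estimates."""
    ys = [r[target] for r in rows]
    residual = sum((r[target] - predict(tree, r)) ** 2 for r in rows)
    return 1 - residual / sse(ys)


# Invented answer sheet: the full-test score is 10*q1 + 5*q2; q3 is irrelevant.
rows = [{"q1": a, "q2": b, "q3": c, "score": 10 * a + 5 * b}
        for a in (0, 1) for b in (0, 1) for c in (0, 1)]
tree = grow(rows, "score", ["q1", "q2", "q3"], depth=2)
print(tree["item"], predict(tree, {"q1": 1, "q2": 0, "q3": 1}), explained_variance(tree, rows, "score"))
```

A test taker answers only the questions on one root-to-leaf path. A tree of depth 3-4 therefore asks 3-4 questions, while the leaf means stand in for the full-test score.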