Methods for Assessing Progress in Face Recognition

Two Methods: Face Recognition Evaluations
- Independent evaluations
- Dependent evaluations: meta-analysis

Meta-Analysis: Outline
- Introduction to meta-analysis
- Methodology for selecting papers
- Analysis of performance scores: viewing the data through histograms
- Evaluation of experiments with a baseline
- Meta-analysis conclusions

Introduction to Meta-Analysis
- Meta-analysis is a quantitative technique for analyzing experimental results across multiple papers on the same subject.
- Example: face recognition papers published in the scientific literature.
- Meta-analysis has been used extensively in medicine, psychology, and the social sciences.

Two Types of Meta-Analysis
- First type: a statistical analysis of results from multiple papers on a subject, gathered from different research groups. The goal is to combine a series of possibly inconclusive studies into a conclusive result.
- Second type: examines a field to identify potential methodological problems. Every field has established conventions for conducting and reporting research results. The field examined here: automatic face recognition algorithms.
- By performing a meta-analysis, we can investigate the validity of the reported results, report on the underlying causes, and recommend possible solutions.

Automatic Face Recognition: Reasons for Selection
1. A very active area of research for the last decade.
2. An accepted quantitative performance measure exists: the probability of identification.
3. Databases of facial images exist.
4. Independent measures of performance exist, for example the FERET evaluations.
5. An accepted baseline algorithm exists that is easily implemented: principal component analysis (PCA)-based algorithms, also known as eigenfaces.

Methodology for Selecting Papers
- Selected papers that ran experiments on either the FERET or the AT&T Laboratories-Cambridge (ORL) database and reported identification performance for full frontal facial images.
- The literature search produced 47 papers.

Methodology for Selecting Papers: Algorithm Categories
- Experimental algorithm: assessed by performing experiments on one or more data sets. When a paper reports several variations of an algorithm, the variation with the best overall performance is chosen.
- Baseline algorithm: an additional algorithm reported for comparison, either a correlation-based or a PCA-based face recognition algorithm. When a paper reports multiple baseline algorithms, the PCA-based variation with the best performance is selected as the baseline.

Methodology for Selecting Papers: Combining Results
Combining the results produces three sets of papers:
- First set: papers using the ORL and FERET databases.
- Second set: two papers by Moghaddam and Pentland that used the same image sets.
- Third set: three papers by Liu and Wechsler that used the same image sets.

Analysis of Performance Scores: Viewing the Data Through Histograms
- First question: "Are researchers working on interesting problems?" This is examined through histograms.
- Histograms summarize the distribution of performance scores (error rates in this meta-analysis) and allow peaks in the distribution to be easily identified.
- Peaks at low error rates indicate that researchers are concentrating on an easy problem that is not interesting.
- Peaks at high error rates indicate that researchers are concentrating on hard problems.

Analysis of Performance Scores: Error Rate
- In this meta-analysis, performance is characterized by the identification error rate: the percentage of probes that are not correctly identified, which is one minus the top-match identification rate.

Histogram (figure slide: distribution of error rates)

Analysis of Performance Scores: Overall Results
- Of the 68 performance scores, 56% (38 of 68) have an error rate below 0.10.
- Of the 40 experimental algorithms (33 of which have corresponding baseline algorithms), 73% (29 of 40) report error rates of 0.10 or less.
- Seven experimental algorithms (40 − 33 = 7) have no baseline score. Their error rates are 0.008, 0.01, 0.02, 0.034, 0.045, 0.046, and 0.28; the median is 0.034, and 6 of the 7 have an error rate below 0.05.
- Thus, results from experimental algorithms without a supporting baseline algorithm are highly biased toward low error rates.

Analysis of Performance Scores: ORL Results
- Seven papers report experimental results on the ORL database, producing 11 performance scores: 10 experimental and 1 baseline.
- The PCA algorithm in Lawrence et al. served as the baseline algorithm for the ORL experiments; its error rate is 0.105.
- The error rates for the experimental algorithms range from 0.029 to 0.13, with 7 of the 10 scores at or below 0.05.

Analysis of Performance Scores: Conclusion
- These results indicate that the ORL database does not define a sufficiently challenging problem for automatic face recognition.
- In the ORL database, all the pictures of a person are taken on the same day, so experiments on ORL are equivalent to fb (same day, same lighting) experiments on FERET data.
- The ORL database no longer represents an interesting problem.
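The error-rate bookkeeping above can be sketched in a few lines of Python. The seven no-baseline scores are the values quoted in the text; the function name `error_rate` and the example top-match rate are illustrative, not from the study.

```python
# Minimal sketch of the error-rate analysis.  The seven scores are the
# no-baseline error rates quoted in the text; the 0.90 top-match rate
# below is an illustrative example, not a value from the study.
from statistics import median

def error_rate(top_match_rate):
    """Identification error = one minus the top-match identification rate."""
    return 1.0 - top_match_rate

# Error rates of the 7 experimental algorithms without a baseline score:
no_baseline = [0.008, 0.01, 0.02, 0.034, 0.045, 0.046, 0.28]

print(median(no_baseline))                      # 0.034
print(sum(1 for e in no_baseline if e < 0.05))  # 6 of the 7 are below 0.05
print(round(error_rate(0.90), 2))               # 90% top-match rate -> 0.1 error
```

A histogram of such error rates across all 68 scores is what reveals the peak at the easy (low-error) end of the distribution.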
Evaluation of Experiments with a Baseline
- Second question: "Is progress being made on interesting face recognition problems?"
- Direct method: compare the performance of the experimental algorithms against one another. This is not possible, since the algorithms were evaluated on different experiments.
- Indirect method: measure the relationship between the performance scores of the experimental and baseline algorithms on the same experiments.

Result of Scatter Plot Analysis (figure slides: experimental versus baseline error rates)

Evaluation of Experiments with a Baseline: Conclusion
- All the algorithms are making the same incremental improvement over the baseline algorithm.
- PCA-based algorithms are not appropriate baseline algorithms for low-error experiments.

Meta-Analysis Conclusions
- First question: in addition to results on this easy problem, papers need to include experimental results on harder problems. Results on harder problems will give the face recognition community a chance to properly assess whether a new algorithm represents an improvement over existing approaches.
- Second question: the answer from this meta-analysis is no. The results show that baseline and experimental algorithm scores are correlated, and that all algorithms are making the same incremental improvement over the baseline.
- The meta-analysis suggests that a new methodology for conducting and reporting experiments needs to be adopted by the face recognition community.
The following recommendations are made:
- The face recognition community should establish an algorithm implementation as a baseline.
- The face recognition community should establish a set of standard challenge problems.
- Published papers should report results on appropriately challenging problems and, for new data sets, provide new baseline performance results.

Meta-Analysis Conclusions (continued)
- As face recognition technology progresses, the baseline and the challenge problems will need to be updated.
- Scientific fields advance by conducting research on hard and interesting problems.
- The FERET evaluations and FRVT 2000 have identified three hard problems with real-world applications: temporal changes between gallery and probe images, pose variations, and recognition from outdoor imagery.
- Following these recommendations will enable the face recognition community to make quantifiable progress while working on problems that are relevant to the real world.