Publish or Perish: Towards a Ranking of Scientists using Bibliographic Data Mining:

Lior Rokach Department of Information Systems Engineering Ben-Gurion University of the Negev Publish or Perish: Towards a Ranking of Scientists using Bibliographic Data Mining

About Me:

About Me Prof. Lior Rokach, Department of Information Systems Engineering, Faculty of Engineering Sciences, Head of the Machine Learning Lab, Ben-Gurion University of the Negev. Email: liorrk@bgu.ac.il http://www.ise.bgu.ac.il/faculty/liorr/ PhD (2004) from Tel Aviv University

Outline::

Outline: What is bibliometrics? Short tutorial on bibliometrics measures Our methodology: data mining Task 1: Academic positions Task 2: AAAI Fellowship Results Conclusions

Bibliometrics “Man is an animal that writes letters” – attributed to Lewis Carroll (Charles Dodgson). A scientist is an animal that writes papers. Bibliometrics is the measurement of (scientific) publications. The simplest measure – number of publications. Disadvantage: counts quantity and disregards quality.

Publish or Perish:

Publish or Perish “I don’t mind your thinking slowly. I mind your publishing faster than you can think.” (the Nobel laureate physicist Wolfgang Pauli)

Metrics: Do metrics matter?:

Metrics: Do metrics matter? According to Abbott et al. (Nature, 2010): Department heads say “No” – “External letters trump everything.” But they admit that “those ‘qualitative’ letters of recommendation sometimes bring in quantitative metrics by the back door.” Most researchers (70%) believe metrics have an effect.

Quick Guide To Bibliometrics Measures:

Quick Guide To Bibliometrics Measures

Citation Index:

Citation Index A citation index is an index of citations between publications, allowing the user to easily establish which later documents cite which earlier documents

The First Citation Index :

The First Citation Index The first citation index is attributed to the Hebrew Talmud, dated to the 16th century (Weinberg, 1997), while others refer to Shepard's Citations, created in 1873, as the first citation index.

Simple Citations-Based Measures to Evaluate Scientists:

Simple Citations-Based Measures to Evaluate Scientists Total citations (and its square root) Total citations normalized by number of authors Mean number of citations per year Mean number of citations per paper
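
All four simple measures listed above can be computed from a researcher's publication record in a few lines. A minimal sketch, assuming the record is given as a hypothetical list of (citations, number-of-authors) pairs plus a career length in years:

```python
import math

def simple_measures(papers, career_years):
    """papers: list of (citations, n_authors) pairs -- a hypothetical shape."""
    total = sum(c for c, _ in papers)
    return {
        "total": total,                               # total citations
        "sqrt_total": math.sqrt(total),               # and its square root
        "per_author": sum(c / a for c, a in papers),  # normalized by authors
        "per_year": total / career_years,             # mean citations per year
        "per_paper": total / len(papers),             # mean citations per paper
    }
```

For example, two papers with 10 citations (2 authors) and 4 citations (1 author) over a 7-year career give a total of 14, 9.0 per-author citations, 2.0 per year, and 7.0 per paper.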

Why citations are not always an ideal way to evaluate researchers’ publications:

Why citations are not always an ideal way to evaluate researchers’ publications. Uncitedness: It is a sobering fact that some 90% of articles published in academic journals are never cited. Even Nobel laureates have a rather large fraction (10% or more) of uncited publications (Egghe et al., 2011). But the terms “uncited” or “seldom cited” usually refer to being uncited or seldom cited in the journals monitored by Thomson Reuters and similar databases, not in all journals, books, and reports; “uncited” or “seldom-cited” is not a synonym for “not used” (MacRoberts and MacRoberts, 2011). Expert judgment is the best, and in the last resort the only, criterion of performance.

A Brief History of Citation Analysis:

A Brief History of Citation Analysis 1955: Eugene Garfield (linguist) develops the impact factor; founder of the Institute for Scientific Information (ISI). 1997: Lee Giles, Kurt D. Bollacker, and Steve Lawrence crawl and harvest papers on the web, focusing mainly on CS. 2004: “Stand on the shoulders of giants” – a freely accessible web search engine for scholarly literature. 2005: Jorge E. Hirsch (physicist) develops the h-index. 2007: Carl Bergstrom (biologist) establishes http://eigenfactor.org/, using the PageRank algorithm to rank journals.

1. Impact Factor (Garfield, 1955) :

1. Impact Factor (Garfield, 1955) Citation Indexes for Science: A New Dimension in Documentation through Association of Ideas, Garfield, E., Science, 1955, 122, 108–111. The impact factor of a journal, as used by Thomson Scientific, is the average number of citations received in the census year by papers published in the two preceding years. “The 2007 impact factor for journal ABC” = Number of times articles published in ABC during 2005–2006 were cited in indexed journals during 2007, divided by the number of “citable” articles published by ABC in 2005 and 2006.
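
The 2007-for-ABC example above can be turned into a small computation. A sketch, assuming per-article records in a hypothetical shape (a publication year plus a mapping of citing year to citation count):

```python
def impact_factor(articles, census_year):
    """Two-year impact factor: citations in census_year to articles
    published in the two preceding years, divided by the article count.
    articles: list of dicts with 'pub_year' and 'cites_by_year' (hypothetical shape)."""
    window = (census_year - 2, census_year - 1)
    pool = [a for a in articles if a["pub_year"] in window]
    if not pool:
        return 0.0
    cites = sum(a["cites_by_year"].get(census_year, 0) for a in pool)
    return cites / len(pool)
```

Note how an article from 2004 is excluded from the 2007 calculation even if it is heavily cited, which is exactly the "short snapshot" criticism raised later in the talk.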

Criticisms of the Impact Factor:

Criticisms of the Impact Factor Subject variation: citation studies should be normalized to take into account variables such as field and discipline. Long tail: the citation count of an individual paper is largely uncorrelated with the impact factor of the journal in which it was published. Only a limited subset of journals is indexed. Biased toward English-language journals. Short (two-year) snapshot of a journal. Includes self-citations; some journals unfairly promote their own papers. Journal inclusion criteria reflect more than just quality.

Variations of Impact Factor and more:

Variations of Impact Factor and more: Five-year impact factor. Cited half-life – a measure of citation longevity: the cited half-life of journal J in year X is the number of years after which 50% of the lifetime citations of J’s content published in X have been received. Ranking – journals are often ranked by impact factor within a Thomson Reuters subject category. Journals can be categorised in multiple subject categories, which causes their rank to differ, so a rank should always be read in the context of the subject category being used. Other journal rankings: Eigenfactor – a similar algorithm to Google’s PageRank; journals are considered influential if they are cited often by other influential journals. Removes self-citations; looks at five years of data.

2. H-Index (Hirsch, 2005; Egghe and Rousseau, 2006):

2. H-Index (Hirsch, 2005; Egghe and Rousseau, 2006) A scientist is said to have Hirsch index h if h of their N total papers have at least h citations each.
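
The definition above translates directly into code: sort the citation counts in decreasing order and find the last rank at which the count still meets the rank. A minimal sketch, assuming citation counts are given as a plain list of integers:

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    cites = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(cites, start=1):
        if c >= rank:
            h = rank      # this paper still satisfies the threshold
        else:
            break         # sorted descending, so no later paper can
    return h
```

For example, citation counts [10, 8, 5, 4, 3] give h = 4: four papers have at least 4 citations, but not five papers with at least 5.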

Using H-Index for Physicists by Hirsch:

Using H-Index for Physicists by Hirsch: 10–12 for tenure decisions; 18 for a full professorship; 15–20 for a fellowship in the American Physical Society; 45 or higher for membership in the United States National Academy of Sciences. H-Index in IS (Clarke, 2008), using Google Scholar.

h ~ mn (m=gradient, n=number of years) :

h ~ mn (m = gradient, n = number of years) 1. m ~ 1, h=20 after 20 years: “successful scientists”. 2. m ~ 2, h=40 after 20 years: “outstanding scientists”. 3. m ~ 3, h=60 (20 years) or h=90 (30 years): “truly unique individuals”. Physics Nobel prizes (last 20 years): median h = 35; 84% had h ≥ 30; 49% had m < 1.

Modified H-Index Metrics Scientists with the same H-Index:

Modified H-Index Metrics: Scientists with the same H-Index
Rational H-Index Distance (Ruane and Tol, 2008): first calculates how many new citations are needed to increase the h-index by one point. Let m denote the additional citations needed; then hΔ = (h+1) − m/(2h+1).
Rational H-Index X (Ruane and Tol, 2008): a researcher has an h-index of h if h is the largest number of papers with at least h citations. However, a researcher may have more than h papers, say n, with at least h citations. Define x = n − h; then hX = h + x/(s − h), where s is the total number of publications.
e-index (Chun-Ting Zhang, 2009): the square root of the surplus of citations in the h-set beyond h², i.e., beyond the theoretical minimum required to obtain an h-index of h. The aim of the e-index is to differentiate between scientists with similar h-indices but different citation patterns.
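
Of the three tie-breakers above, the e-index is the simplest to compute: take the citations of the h-core papers and measure how far they exceed the h² minimum. A sketch, assuming a plain list of citation counts:

```python
import math

def e_index(citations):
    """e = sqrt(surplus citations in the h-core beyond h^2) (Zhang, 2009)."""
    cites = sorted(citations, reverse=True)
    # h = number of ranks whose citation count meets the rank
    h = sum(1 for rank, c in enumerate(cites, start=1) if c >= rank)
    surplus = sum(cites[:h]) - h * h   # citations in h-core beyond the minimum
    return math.sqrt(surplus)
```

Two scientists with h = 4 but h-core sums of 16 and 27 get e = 0 and e = √11 respectively, which is exactly the distinction the e-index is meant to draw.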

Modified H-Index Metrics To share the fame in a fair way multi-authored manuscripts:

Modified H-Index Metrics: Sharing the fame fairly for multi-authored manuscripts
Individual h-index (Batista et al., 2006): divides the standard h-index by the average number of authors in the articles that contribute to the h-index, in order to reduce the effects of co-authorship.
Norm individual h-index: first normalizes the number of citations for each paper by dividing it by the paper’s number of authors, then calculates hI,norm as the h-index of the normalized citation counts. This approach is more fine-grained than Batista et al.’s: it more accurately accounts for any co-authorship effects that might be present, and it better approximates per-author impact, which is what the original h-index set out to provide.
Schreiber individual h-index (Schreiber, 2008): uses fractional paper counts (for example, one third for three authors) instead of reduced citation counts to account for shared authorship, and then determines the multi-authored hm index from the resulting effective rank of the papers using undiluted citation counts.
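
The norm individual h-index described above is a two-step recipe: divide each paper's citations by its author count, then run the ordinary h-index on the normalized values. A sketch, assuming papers come as hypothetical (citations, n_authors) pairs:

```python
def norm_individual_h(papers):
    """h-index computed over author-normalized citation counts.
    papers: list of (citations, n_authors) pairs -- a hypothetical shape."""
    normed = sorted((c / a for c, a in papers), reverse=True)
    # ordinary h-index rule applied to the normalized counts
    return sum(1 for rank, c in enumerate(normed, start=1) if c >= rank)
```

A paper with 12 citations and 2 authors contributes 6.0, while a 9-citation, 3-author paper contributes only 3.0, so heavily co-authored papers are discounted before the threshold rule is applied.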

Modified H-Index Metrics Age Adjusted :

Modified H-Index Metrics: Age Adjusted
Contemporary h-index (Sidiropoulos et al., 2006): adds an age-related weighting to each cited article, giving less weight to older articles. The weighting is parametrized; with gamma=4 and delta=1, the citations of an article published in the current year count four times, those of an article published 4 years ago count once, those of an article published 6 years ago count 4/6 times, and so on.
AR-index (Jin, 2007): an age-weighted citation rate, where the number of citations to a given paper is divided by the age of that paper. Jin defines the AR-index as the square root of the sum of all age-weighted citation counts over the papers that contribute to the h-index.
AWCR: like the AR-index, but summed over all papers. In particular, it allows younger, as yet less-cited papers to contribute, even though they may not yet contribute to the h-index.
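
Jin's AR-index combines both earlier recipes: find the h-core, divide each core paper's citations by its age, and take the square root of the sum. A sketch, assuming papers come as hypothetical (citations, publication_year) pairs; ages below one year are clamped to 1 to avoid division by zero (an assumption, not part of Jin's definition):

```python
import math

def ar_index(papers, current_year):
    """AR-index (Jin, 2007): sqrt of summed age-weighted citations
    over the h-core. papers: (citations, pub_year) pairs (hypothetical shape)."""
    ranked = sorted(papers, key=lambda p: p[0], reverse=True)
    h = sum(1 for rank, (c, _) in enumerate(ranked, start=1) if c >= rank)
    return math.sqrt(
        sum(c / max(current_year - y, 1) for c, y in ranked[:h])
    )
</```

Replacing `ranked[:h]` with `ranked` (all papers) would give the AWCR variant described above.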

Revised H-Index Metrics Others:

Revised H-Index Metrics: Others
AWCRpA: the per-author age-weighted citation rate; similar to the plain AWCR, but normalized by the number of authors of each paper.
g-index (Leo Egghe, 2006): given a set of articles ranked in decreasing order of the number of citations they received, the g-index is the (unique) largest number such that the top g articles together received at least g² citations. It aims to improve on the h-index by giving more weight to highly cited articles.
pi-index (Vinkler, 2009): one hundredth of the number of citations obtained by the elite set of papers – the top √P journal papers, where P is the total number of journal papers – ranked by decreasing number of citations.
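
The g-index definition above replaces the h-index's per-paper threshold with a cumulative one: the top g papers must together hold at least g² citations. A sketch, assuming a plain list of citation counts:

```python
def g_index(citations):
    """Largest g such that the top g papers together have >= g^2 citations."""
    cites = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, c in enumerate(cites, start=1):
        total += c                  # cumulative citations of the top `rank`
        if total >= rank * rank:    # does the cumulative sum cover rank^2?
            g = rank
    return g
```

For [10, 5, 3, 1] the h-index is 3, but the g-index is 4, because the single highly cited paper lifts the cumulative total past 4² = 16 — exactly the extra weight for highly cited articles that Egghe intended.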

Limitations of H-Index :

Limitations of H-Index The h-index ignores the importance of publications: Évariste Galois’ h-index is 2, and will remain so forever. Had Albert Einstein died in early 1906, his h-index would be stuck at 4 or 5, despite his high reputation at that date. It ignores the context of citations: some papers are cited merely to flesh out the introduction (related work); some citations are made in a negative context. Gratuitous authorship.

Education Subject Category…:

Education Subject Category…

Eigenfactor.org Scores:

Eigenfactor.org Scores Eigenfactor score: the higher the better. A measure of the overall value provided by all of the articles published in a given journal in a year; accounts for differences in prestige among citing journals. A measure of the journal’s total importance to the scientific community. Eigenfactor scores are scaled so that the scores of all journals listed in Thomson’s Journal Citation Reports (JCR) sum to 100. Article Influence score: the higher the better. Article Influence measures the average influence, per article, of the papers in a journal; as such, it is comparable to the impact factor. Article Influence scores are normalized so that the mean article in the entire Thomson JCR database has an Article Influence of 1.00. Still, it’s best to compare within subjects. Cost effectiveness: the lower the better – price / Eigenfactor [2006 data].

Other Journal Ranking Efforts… :

Other Journal Ranking Efforts… SCImago Journal Rank (SJR): similar to the Eigenfactor methods, but based on citations in Scopus. Freely available at scimagojr.com. More journals (~13,500) and more international diversity. Uses the PageRank algorithm (like eigenfactor.org); 3 years of citations; no self-citations. But: Scopus only has citations back to ~1995.

SCImago:

SCImago

SCImago Journal Indicator Search…:

SCImago Journal Indicator Search…

SCImago Journal Search (Agronomy Journal):

SCImago Journal Search ( Agronomy Journal )

A Few Other Journal Ranking Proposals… many would like to use journal usage stats:

A Few Other Journal Ranking Proposals… many would like to use journal usage stats Usage Factors – based on journal usage (COUNTER stats [Counting Online Usage of Networked Electronic Resources]); uksg.org/usagefactors/final. Y factor – a combination of the impact factor and the weighted PageRank developed by Google (Bollen et al., 2006). MESUR: MEtrics from Scholarly Usage of Resources – uses citations and COUNTER stats; http://www.mesur.org/MESUR.html

Other Measures for Evaluating Researchers (Tang, et al. 2008):

Other Measures for Evaluating Researchers (Tang et al., 2008) Uptrend – nothing catches people’s eyes more than a rising star. Uptrend measures the rising degree of a researcher, using each paper’s publication date and its conference’s impact factor. We use the least-squares method to fit a curve to the papers published in the last N years, then use the curve to predict the researcher’s score in the next year, which is defined as the Uptrend score.
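
The Uptrend recipe above — fit a curve by least squares over recent years, then extrapolate one year ahead — can be sketched for the simplest (linear) case. This is an illustration of the least-squares step only, with hypothetical yearly scores, not a reconstruction of Tang et al.'s exact scoring function:

```python
def uptrend(yearly_scores):
    """Fit y = a*x + b by ordinary least squares over the last N years'
    scores (x = 0..N-1) and extrapolate to x = N: the predicted next-year
    score plays the role of the Uptrend score. Needs at least 2 points."""
    n = len(yearly_scores)
    xs = range(n)
    mx = sum(xs) / n
    my = sum(yearly_scores) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, yearly_scores)) \
        / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return slope * n + intercept   # prediction one year ahead
```

A researcher whose scores over three years were 1, 2, 3 gets an Uptrend of 4; a flat 2, 2, 2 record extrapolates to 2, so only genuinely rising trajectories are rewarded.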

Other Measures for Evaluating Researchers (Tang, et al. 2008):

Other Measures for Evaluating Researchers (Tang et al., 2008) Activity – a researcher’s activity is defined based on the papers published in recent years. We consider the importance of each paper and thus define the activity score as:

Other Measures for Evaluating Researchers (Tang, et al. 2008):

Other Measures for Evaluating Researchers (Tang et al., 2008) Diversity – generally, an expert’s research may span several different research fields. Diversity quantitatively reflects this degree. In particular, we first use the author-conference-topic model (Tang et al., 2008) to obtain the research fields of each expert.

Other Measures for Evaluating Researchers (Tang, et al. 2008):

Other Measures for Evaluating Researchers (Tang et al., 2008) Sociability – the sociability score is based on how many coauthors an expert has. We define the score as: where #copaper_c denotes the number of papers coauthored between the expert and coauthor c. In the next step, we will further consider location, organization, nationality, and research fields.

Richard Van Noorden (2010):

Richard Van Noorden (2010)

Bibliometrics Predictive Power:

Bibliometrics Predictive Power Prediction of Nobel laureates – Thomson Reuters ranks the top 0.1% of researchers in their fields, based on citations of their published papers over the last two decades. Since 2002, 12 of those named Thomson Reuters Citation Laureates have gone on to win Nobel Prizes. Jensen et al. (2009) used such measurements to predict which of the CNRS researchers would be promoted: the h-index leads to 48% “correct” promoted scientists; the number of citations gives 46%; the number of published papers only 42%.

Research Questions:

Research Questions Primary questions: To what extent do bibliometrics reflect scientists’ ranking in CS? Which single measure is the best predictor? How should different measures be combined? Secondary questions: Which types of manuscripts should be taken into consideration? Does self-citation really matter? Which citation index is better?

Research Methods:

Research Methods Retrospective analysis of scientists’ careers: correlating academic positions with bibliometrics values as they evolve over time; AAAI Fellowship. Using data mining techniques for building: a snapshot classifier for assigning scientists to their academic position; a decision-making model for promoting scientists; a classifier for deciding who should be awarded the AAAI Fellowship each year. Comparative analysis.

Process:

Process

ISI Web of Knowledge:

ISI Web of Knowledge Coverage: most journals (13,000 journals); some conferences (192,000 conference proceedings); almost no books (5,000 books); all patents (23 million patents); 256 subject categories in Science, Social Sciences, and Arts and Humanities, covering the full range of scholarship and research; many citations (716 million); only citations that fully match are counted. Accuracy: very few errors; very few missing values; no duplications.

Google Scholar:

Google Scholar Coverage: the largest, but still limited coverage of pre-1990 publications; criticized for including gray literature in its citation counts (Sanderson, 2008). Accuracy: missing values; wrong values; duplicate entries.

Why CS?:

Why CS? Variety of sub-fields with different citation patterns (Bioinformatics vs. AI). Different types of important manuscripts (journals, conferences, books, chapters, patents, etc.). An evolving field (senior professors completed their PhDs in other fields). We are personally interested in this field.

Task 1: Nominating Committee:

Task 1: Nominating Committee

Inclusion/Exclusion Criteria:

Inclusion/Exclusion Criteria 47 researchers: from Stanford, MIT, Berkeley, and Yale; completed their PhD after 1970; researcher name can be disambiguated; CV: promotion years are known; no shortcuts in the career. Total of 724 “research years”. ISI – total number of items: 50K (2,300 written by the targeted researchers). Google Scholar – total number of items: 300K.

H-Index Over Time (for 7 professors) :

H-Index Over Time (for 7 professors)

Citations Over Time (for 7 professors) :

Citations Over Time (for 7 professors)

Evaluation:

Evaluation Procedure: leave one researcher out. Base classifier – logistic regression. Publication type: All–All; All–Journals; Journals–Journals. Self-citations: All; Self-Citation 1 (the target researcher is not one of the authors); Self-Citation 2 (no overlap between the original set of authors and the citing paper’s authors).
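
The leave-one-researcher-out procedure above differs from ordinary leave-one-out in that all career snapshots of the held-out scientist leave the training set together (otherwise snapshots of the same person would leak across the split). A sketch of that splitter, with a hypothetical record shape; the classifier itself (logistic regression) is omitted:

```python
def leave_one_researcher_out(records):
    """records: (researcher_id, features, label) tuples -- hypothetical shape.
    Yields one split per researcher, holding out ALL of that researcher's
    career snapshots at once."""
    ids = sorted({rid for rid, _, _ in records})
    for held_out in ids:
        train = [r for r in records if r[0] != held_out]
        test = [r for r in records if r[0] == held_out]
        yield held_out, train, test
```

Each yielded split would then be fed to the base classifier, and accuracies averaged over researchers.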

Task 1.1: Ranking Researchers:

Task 1.1: Ranking Researchers Rank a researcher into one of the following positions, given only a snapshot of her bibliometrics measures: Post-doc, Assistant, Associate, Full. Note that we are not aware of the scientist’s previous position or seniority. Default accuracy = 35%.

The Ranking Task – Results Top 10 Measures:

The Ranking Task – Results: Top 10 Measures

Measure | Self-Citation Level | Citing Type | Cited Type | Source | Accuracy
g-Index | 1 | Journal | Journal | ISI | 59.95%
g-Index | 0 | Journal | Journal | ISI | 59.30%
g-Index | 2 | Journal | Journal | ISI | 59.30%
Norm h-index | 0 | Journal | All | ISI | 58.65%
Norm h-index | 1 | Journal | All | ISI | 58.65%
Norm h-index | 2 | Journal | All | ISI | 58.65%
Norm h-index | 1 | Journal | Journal | ISI | 58.00%
Norm h-index | 0 | Journal | Journal | ISI | 57.74%
Norm h-index | 2 | Journal | Journal | ISI | 57.74%
Rational H-Index X | 2 | Journal | Journal | Google | 57.48%

Combining several measures does not improve accuracy. Using other classification algorithms does not result in higher accuracy.

The Ranking Task – Results Least Predictive Measures:

The Ranking Task – Results: Least Predictive Measures

Measure | Self-Citation Level | Citing Type | Cited Type | Source | Accuracy
# Publications | * | * | Journal | Google | 37.06%
Individual # Publications | * | * | Journal | Google | 37.06%
Schreiber h-index | 0 | Journal | Journal | Google | 37.19%
Individual h-index | 1 | All | All | Google | 38.10%
Individual h-index | 2 | All | All | Google | 38.10%
Schreiber h-index | 1 | All | All | ISI | 38.10%
Schreiber h-index | 0 | All | All | ISI | 38.23%
Schreiber h-index | 2 | All | All | ISI | 38.23%
Schreiber h-index | 0 | Journal | All | ISI | 38.75%
Schreiber h-index | 2 | Journal | All | ISI | 38.75%

* Statistical significance has been found.

Not by bibliometrics alone:

Not by bibliometrics alone: years from PhD alone gives accuracy = 73.7%!

Task 1.2: Promoting Researchers:

Task 1.2: Promoting Researchers Given the researcher’s current position and her bibliometrics measures, decide if she should be promoted. Measure the absolute deviation in years from the actual promotion time.

Promotion Decision Task - Results:

Promotion Decision Task – Results

Measure | Calculated as | Source | Self-Cit. Level | Cited Type | Citing Type | Assistant | Associate | Full | Average
Rational H-Index 1 | Absolute Value | Google | 1 | All | Journal | 1.26 | 1.58 | 1.88 | 1.51
Total Citations | Change from Last Rank | Google | 0 | Journal | All | 1.26 | 1.68 | 1.88 | 1.55
Total Citations | Change from Last Rank | Google | 2 | Journal | All | 1.26 | 1.68 | 1.88 | 1.55
Total Citations | Change from Last Rank | Google | 1 | Journal | All | 1.26 | 1.71 | 1.88 | 1.56
Norm Individual H-Index | Change from Last Rank | Google | – | All | Journal | 1.28 | 1.74 | 1.79 | 1.56
… | … | … | … | … | … | … | … | … | …
Individual H-Index | Change from Last Rank | Google | 1 | Journal | Journal | 1.30 | 2.03 | 2.38 | 1.80
Contemporary H-Index | Absolute Value | Google | 1 | Journal | All | 1.46 | 2.00 | 2.17 | 1.81

* No statistical significance has been found. * In about 2% of the cases, our system did not recommend promoting a researcher although the promotion actually took place.

Not by bibliometrics alone:

Not by bibliometrics alone

Measure | Assistant | Associate | Full | Average
Rational H-Index 1 | 1.26 | 1.58 | 1.88 | 1.51
Years from PhD | 1.02 | 1.72 | 2.38 | 1.45

Promoted to Associate: ~6 years from PhD. Promoted to Full: ~13 years from PhD.

Google Scholar vs. ISI Thomson:

Google Scholar vs. ISI Thomson

Google Scholar vs. ISI Thomson:

Google Scholar vs. ISI Thomson

Self-Citations:

Self-Citations

Which Manuscripts Should be Taken into Consideration?:

Which Manuscripts Should be Taken into Consideration?

Which Citing Manuscripts Should be Taken into Consideration?:

Which Citing Manuscripts Should be Taken into Consideration?

Conclusions – Take 1:

Conclusions – Take 1 Seniority is a good indicator for promoting scientists in leading US universities. Variation in bibliometrics among scientists contributes only slightly to the promotion timing. No significant difference between ISI and Google. Self-citation is not so important. After all, journals are more reliable than other publication types.

Task 2: And the AAAI Fellowship Goes To:

Task 2: And the AAAI Fellowship Goes To

AAAI Fellowship:

AAAI Fellowship Try to determine if and when an AI scientist is qualified to be elected an AAAI Fellow. Data set: 92 researchers who won the award from 1995 to 2009; 200 randomly selected AI researchers with at least 5 papers in top-tier AI journals/conferences. Using ISI data only; Google Scholar coming soon.

Task 2.1 – Leave One Scientist Out:

Task 2.1 – Leave One Scientist Out Performance criterion (average): not identifying a fellow (false negative): 21%; wrongly identifying a non-fellow (false positive): 8.2%.

Using a single measure:

Using a single measure: h-index of fellows. Performance criterion (average): not identifying a fellow (false negative): 48%; wrongly identifying a non-fellow (false positive): 6.1%.

Task 2.2 – Predicting Next Year Fellows:

Task 2.2 – Predicting Next Year Fellows

Task 2.2 – Predicting Coming Fellows:

Task 2.2 – Predicting Coming Fellows

Rules Example:

Rules Example
(TC/A = '(65.7085-inf)') and (TP/A = '(26.084-inf)') and (Ih = '(3.565-inf)') and (CpY = '(13.191-inf)') => FellowWon=TRUE (49.0/5.0)
(Pi = '(0.645-inf)') and (AWCR = '(1.0555-3.6035]') and (TC/A = '(80.875-inf)') => FellowWon=TRUE (29.0/3.0)
(TP = '(7.5-inf)') and (e = '(6.595-inf)') and (TP = '(47.5-inf)') and (AWCR = '(1.0735-3.849]') and (AWCRpA = '(2.1705-inf)') and (SIh = '(0.5-3.5]') => FellowWon=TRUE (18.0/1.0)
…

Task 2.3 – Social Network :

Task 2.3 – Social Network Based on the idea of the Erdős number: predict fellowship based on co-authorship with other fellows. http://academic.research.microsoft.com/VisualExplorer.aspx#1802181&84132

Task 2.3:

Task 2.3
Social network alone – not identifying a fellow (false negative): 52%; wrongly identifying a non-fellow (false positive): 6.6%.
+ Bibliometrics alone (Task 2.1) – false negative: 21%; false positive: 8.2%.
= Combined – false negative: 16%; false positive: 5.9%.

Task 2.3:

Task 2.3
(Count >= 5) and (CpP >= 7) and (TP/A >= 6.883) => Fellow=TRUE (51.0/3.0)
(TP/A >= 22.944) and (Avg <= 3.266667) and (TP <= 40) => Fellow=TRUE (23.0/3.0)
(Count >= 5) and (e >= 7.071) and (CpP <= 1.618) => Fellow=TRUE (11.0/1.0)
…

Conclusions – Take 2:

Conclusions – Take 2 Bibliometric measures can be used to predict fellowship. Combining various measures using data mining techniques improves predictive power. Co-authorship relations can slightly boost the accuracy.

Very Near Future Work:

Very Near Future Work Adding the Google Scholar dataset. Examining the contribution of conferences to predicting the fellowship. “Tell Me Who Cites You, …”

Why God Never Received Tenure at Any University :

Why God Never Received Tenure at Any University He had only one major publication. It was in Hebrew. It had no references. It wasn't published in a refereed journal. Some even doubt he wrote it himself. It may be true that he created the world, but what has he done since then? His cooperative efforts have been quite limited. The scientific community has had a hard time replicating his results. He never applied to the Ethics Board for permission to use human subjects. When an experiment went awry, he tried to cover it up by drowning the subjects. When subjects didn't behave as predicted, he deleted them from the sample. He rarely came to class, just told students to read the book. Some say he had his son teach the class. He expelled his first two students for learning. Although there were only ten requirements, most students failed his tests. His office hours were infrequent and usually held on a mountaintop.

References:

References
Bollen, J., Rodriguez, M. A., Van de Sompel, H., Journal status, Scientometrics, Vol. 69, No. 3 (2006), 669–687.
Christenson, J. A., Sigelman, L., Accrediting knowledge: Journal stature and citation impact in social science, Soc. Sci. Quart., 66: 964–975, 1985.
Van Raan, A. F. J. (2006), Performance-related differences of bibliometric statistical properties of research groups: cumulative advantages and hierarchically layered networks, Journal of the American Society for Information Science and Technology, 57(14): 1919–1935.
Epstein, D. (2007), Impact factor manipulation, The Write Stuff, 16: 133–134.
Andrade, A., González-Jonte, R., Campanario, J. M., Journals that increase their impact factor at least fourfold in a few years: The role of journal self-citations, Scientometrics, Vol. 80, No. 2 (2009), 517–530.
Vinkler, P., The pi-index: a new indicator for assessing scientific impact, Journal of Information Science, Vol. 35, No. 5 (2009), 602–612.
Vinkler, P., An attempt for defining some basic categories of scientometrics and classifying the indicators of evaluative scientometrics, Scientometrics, Vol. 50, No. 3 (2001), 539–544.
Jacso, P., Testing the calculation of a realistic h-index in Google Scholar, Scopus, and Web of Science for F. W. Lancaster, Library Trends, Vol. 56, No. 4 (Spring 2008), 784–815.
Merton, R. K., The Matthew Effect in Science, Science, Vol. 159, No. 3810 (January 1968), 56–63.
Beel, J., Gipp, B., The potential of collaborative document evaluation for science, in: 11th International Conference on Asian Digital Libraries (ICADL'08), LNCS Vol. 5362, Springer, December 2008, 375–378.
Tang, J., Zhang, J., Yao, L., Li, J., Zhang, L., Su, Z., ArnetMiner: Extraction and mining of academic social networks, Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 990–998, 2008, ACM.
Weinberg, B. H., The earliest Hebrew citation indexes, Journal of the American Society for Information Science, 48(4): 318–330, 1997.
Van Noorden, R. (2010), A profusion of measures, Nature, Vol. 465.
Egghe, L., Guns, R., Rousseau, R. (2011), Thoughts on uncitedness: Nobel laureates and Fields medalists as case studies.
MacRoberts, M. H., MacRoberts, B. R., Problems of citation analysis: A study of uncited and seldom-cited influences (2011).
