Bug_reports_AAAI

AAAI 2011 CosTriage: A Cost-Aware Algorithm for Bug Reporting Systems:

Jin-woo Park (1), Mu-Woong Lee (1), Jinhan Kim (1), Seung-won Hwang (1), Sunghun Kim (2)
(1) POSTECH, Korea, Republic of
(2) HKUST, Hong Kong

Bug reporting systems:

Bugs!!
  More than 300 bug reports per day in Mozilla (a big software project)
Bug solving
  One of the important issues in the software development process
  Bug reports are posted, discussed, and assigned to developers
Open source projects
  Apache, Eclipse, Linux kernel, Mozilla

Bug reporting systems:

Bug reports
  Have a bug ID, title, description, status, and other metadata
  Are assigned to developers
  Are fixed by developers
Challenges
  Bug triage
  Duplicate bug detection
  …
[Figure: a bug report annotated with bug ID, title (summary), description, status, other data, and bug fix history (time)]

Bug reporting systems:

Bug triage
  Assigning a new bug report to a suitable developer
  Bottleneck of the bug fixing process
    Labor intensive
    Misassignment can lead to slow bug fixes
  Can be automated!
[Figure: bug reports from an open source project go to a triager, who assigns them to developers]

Preliminary (recommendation):

Recommender algorithms
  Content-based recommendation (CBR)
    Predicts a user's interests based on item features
    Uses machine learning methods
    Suffers from the over-specialization problem
  Collaborative filtering recommendation (CF)
    Predicts a user's interests based on the interests of similar users (the user neighborhood)
    Suffers from the sparsity problem
  Hybrid recommendation
    Content-boosted collaborative filtering (CBCF)
    Combines an existing CBR with a CF
    Performs better than either approach alone

Preliminary (recommendation):

Content-based recommendation (CBR)
  Predicts a user's interests based on item features
  Uses machine learning methods
  Learns features and recommends similar items
  Suffers from the over-specialization problem

Developers and bug reports (Bug 4 is the new Bug Report 4, not yet scored):

Title   Bug 1  Bug 2  Bug 3  Bug 4
Dev 1      10      9      1      ?
Dev 2       6      5     10      ?
Dev 3       8      7      7      ?

Features of Bug Report 4:

Feature  word 1  word 2  word 3
Count         1       1       4

Preliminary (recommendation):

Content-based recommendation (CBR)
  Predicts a user's interests based on item features
  Uses machine learning methods
  Finds and recommends similar items
  Suffers from the over-specialization problem

Developers and bug reports (the Bug 4 column is now filled in for Bug Report 4):

Title   Bug 1  Bug 2  Bug 3  Bug 4
Dev 1      10      9      1      9
Dev 2       6      5     10      5
Dev 3       8      7      7      7

Features of Bug Report 4:

Feature  word 1  word 2  word 3
Count         1       1       4

Preliminary (recommendation):

Collaborative filtering recommendation (CF)
  Predicts a user's interests based on the interests of similar users (the user neighborhood)
  Suffers from the sparsity problem

Developers and bug reports (one value is unknown):

             Bug 1  Bug 2  Bug 3
Developer 1     10      5     10
Developer 2      5     10      6
Developer 3      7      7      7
Developer 4      9      5      ?

Preliminary (recommendation):

Collaborative filtering recommendation (CF)
  Predicts a user's interests based on the interests of similar users (the user neighborhood)
  Suffers from the sparsity problem

Developers and bug reports (the unknown value for Developer 4 is now predicted):

             Bug 1  Bug 2  Bug 3
Developer 1     10      5     10
Developer 2      5     10      6
Developer 3      7      7      7
Developer 4      9      5      9
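The user-neighborhood idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the computation behind the slide's numbers: the cosine similarity over co-rated bugs, the similarity-weighted average, and the names `cosine` and `predict` are all choices made for this example.

```python
from math import sqrt

# Values from the slide's table; None marks the unknown value to predict.
ratings = {
    "Developer 1": [10, 5, 10],
    "Developer 2": [5, 10, 6],
    "Developer 3": [7, 7, 7],
    "Developer 4": [9, 5, None],
}

def cosine(a, b):
    """Cosine similarity over the bugs both developers have values for (assumed measure)."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return 0.0
    dot = sum(x * y for x, y in pairs)
    norm_a = sqrt(sum(x * x for x, _ in pairs))
    norm_b = sqrt(sum(y * y for _, y in pairs))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict(target, bug, k=2):
    """Similarity-weighted average of the k most similar developers' values for `bug`."""
    neighbors = [
        (cosine(ratings[target], r), r[bug])
        for name, r in ratings.items()
        if name != target and r[bug] is not None
    ]
    neighbors.sort(key=lambda t: t[0], reverse=True)
    top = neighbors[:k]
    total = sum(sim for sim, _ in top)
    return sum(sim * value for sim, value in top) / total if total else None

print(predict("Developer 4", bug=2))
```

Under these assumptions the nearest neighbor of Developer 4 is Developer 1, so a larger `k` pulls the prediction toward the other developers' values.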

Preliminary (recommendation):

Collaborative filtering recommendation (CF)
  Predicts a user's interests based on the interests of similar users (the user neighborhood)
  Suffers from the sparsity problem

Developers and bug reports after the new Bug Report 4 arrives:

             Bug 1  Bug 2  Bug 3  Bug 4
Developer 1     10      5     10      ?
Developer 2      5     10      6      ?
Developer 3      7      7      7      ?
Developer 4      9      5      9      ?

Problem 1: no one has solved the new bug!

Preliminary (recommendation):

Collaborative filtering recommendation (CF)
  Predicts a user's interests based on the interests of similar users (the user neighborhood)
  Suffers from the sparsity problem

Developers and bug reports, with Bug Report 4:

             Bug 1  Bug 2  Bug 3  Bug 4
Developer 1     10      5      ?      ?
Developer 2      ?      ?      ?      ?
Developer 3      ?      ?     10      ?
Developer 4      ?      ?      ?      3

Problem 2: the ratings are extremely sparse!

Preliminary (recommendation):

Hybrid recommendation
  Content-boosted collaborative filtering (CBCF)
  Combines an existing CBR with a CF
  Performs better than either approach alone
  Two phases
    CBR phase
    CF phase

Developers and bug reports (Bug 5 is the new Bug Report 5):

        Bug 1  Bug 2  Bug 3  Bug 4  Bug 5
Dev 1      10      ?      ?      ?      ?
Dev 2       ?      8      3      ?      ?
Dev 3       ?      ?      ?      7      ?

Features of Bug Report 5:

Feature  word 1  word 2  word 3
Count         3       2       4

Preliminary (recommendation):

Hybrid recommendation
  Content-boosted collaborative filtering (CBCF)
  Combines an existing CBR with a CF
  Performs better than either approach alone

CBR phase, with Bug Report 5:

        Bug 1  Bug 2  Bug 3  Bug 4  Bug 5
Dev 1      10     10     10     10     10
Dev 2       3      8      3      8      3
Dev 3       7      7      7      7      7

Preliminary (recommendation):

Hybrid recommendation
  Content-boosted collaborative filtering (CBCF)
  Combines an existing CBR with a CF
  Performs better than either approach alone

CF phase, with Bug Report 5:

        Bug 1  Bug 2  Bug 3  Bug 4  Bug 5
Dev 1      10      9      7      9      8
Dev 2       5      8      3      8      5
Dev 3       7      8      5      7      6

Existing recommendation approaches are not suitable!

Preliminary (Bug triage):

PureCBR [Anvik06]
  Treats bug triage as a multi-class classification problem, using an SVM classifier
  Applies the classifier to estimate each developer's score for a new bug report
  Bug reports B are converted into pairs <F(B), D> for training
    F(B) is the feature vector holding the count of each keyword w in the description of B
    D is the developer who fixed B (the class)
[Figure: the bug fix history is used for classifier training; the classifier's scores for a new bug report lead to assigning the new bug to Dev 1]
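The conversion of bug reports into training pairs <F(B), D> might look like the following Python sketch. The bug fix history, the whitespace tokenization, and the helper name `features` are hypothetical; the slide does not specify how keywords are extracted. The resulting pairs would then be passed to an SVM implementation of your choice.

```python
from collections import Counter

# Hypothetical bug fix history: (description of B, developer D who fixed B).
history = [
    ("crash when rendering memory view", "Dev 1"),
    ("memory view rendering is slow", "Dev 1"),
    ("login page rejects valid password", "Dev 2"),
]

def features(description, vocabulary):
    """F(B): the count of each vocabulary keyword in the description."""
    counts = Counter(description.lower().split())
    return [counts[word] for word in vocabulary]

# The vocabulary is every keyword seen in the history, in a fixed order.
vocabulary = sorted({word for text, _ in history for word in text.lower().split()})

# One <F(B), D> pair per fixed bug; D is the class label.
training_pairs = [(features(text, vocabulary), dev) for text, dev in history]

for vector, developer in training_pairs:
    print(developer, vector)
```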

Preliminary (Bug triage):

PureCBR [Anvik06]
  Problems
    Over-specialization problem
    This approach considers only accuracy
    Are developers happy?
  → We consider the developer's cost (e.g., interests, bug fix time, and expertise)
[Figure: Bug Report 1 and Bug Report 2, labeled 50 days and 2 days]

Goal:

Goal
  Reformulating the bug triage problem
    Optimizing not only accuracy but also cost
    Constructing developer profiles for cost
Challenge
  Enhancing the CBCF approach for sparse data
    The past bug fix history data is extremely sparse
      A bug is fixed by a developer
    Sparseness must be reduced to improve the quality of CBCF
  Bug fix time from bug fix history

Overview:

Merging the classifier's scores and the developers' cost scores
  The accuracy scores are obtained using PureCBR [Anvik06]
  The developer cost scores are obtained by our proposed approach from the extremely sparse bug fix history
  Accuracy scores + cost scores = hybrid scores
[Figure: a new bug report is scored by the bug classifier (SVM) for accuracy and by the developer profiles for cost; the accuracy score and cost score are aggregated to give the recommended developer (here, the new bug is assigned to Dev 1)]

CosTriage (Cost Estimation):

A cost-aware triage algorithm for bug reporting systems
  Using topic modeling to reduce the sparseness
  Enhancing the quality of CBCF for extremely sparse data
    Categorizing bugs to reduce the sparseness
    Bug fix time from bug fix history

CosTriage:

Categorizing bugs
  Using a topic modeling approach
    Latent Dirichlet Allocation (LDA) [BleiNg03]
    Each topic represents a bug type
    The topic distribution of a report determines its bug type
  Finding the natural number of topics (# bug types)
    We adopt the divergence measure proposed in [Arun, R. PAKDD '10]
    t is the natural number of bug types

CosTriage:

Developer profile modeling
  Developer profiles
    The element P_u[i] of a developer profile denotes the developer's cost for bugs of the i-th type
    T denotes the number of bug types
  Obtaining profiles
    The average time to fix bugs of the i-th type
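A developer profile of this kind can be sketched as a list of average fix times, one entry per bug type. The fix history below and the function name `build_profiles` are hypothetical, and bug types are numbered from 0 for convenience; entries for types a developer has never fixed are left as `None`.

```python
from collections import defaultdict

def build_profiles(fix_history, num_types):
    """Return {developer: [average fix time per bug type, or None if never fixed]}."""
    sums = defaultdict(lambda: [0.0] * num_types)
    counts = defaultdict(lambda: [0] * num_types)
    for developer, bug_type, days in fix_history:
        sums[developer][bug_type] += days
        counts[developer][bug_type] += 1
    return {
        dev: [
            sums[dev][i] / counts[dev][i] if counts[dev][i] else None
            for i in range(num_types)
        ]
        for dev in sums
    }

# Hypothetical fix history: (developer, bug type index, fix time in days).
fix_history = [
    ("Dev 1", 0, 4.0), ("Dev 1", 0, 6.0), ("Dev 1", 2, 10.0),
    ("Dev 2", 1, 3.0),
]
profiles = build_profiles(fix_history, num_types=3)
print(profiles)
```

The many `None` entries in even this small example show why the profiles are sparse and need to be completed before use.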

CosTriage:

Predicting missing values in profiles
  Using CF for developer profiles
  Similarity measure
  k = 1
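Filling the gaps in a profile with a single nearest neighbor (k = 1) could be sketched as follows. The similarity formula is not preserved in this transcript, so the cosine similarity over commonly observed bug types used here is an assumption, as are the names `similarity` and `fill_missing`, the sample profiles, and the choice to copy the neighbor's value unchanged.

```python
from math import sqrt

def similarity(p, q):
    """Cosine similarity over bug types observed in both profiles (assumed measure)."""
    common = [(a, b) for a, b in zip(p, q) if a is not None and b is not None]
    if not common:
        return 0.0
    dot = sum(a * b for a, b in common)
    norm = sqrt(sum(a * a for a, _ in common)) * sqrt(sum(b * b for _, b in common))
    return dot / norm if norm else 0.0

def fill_missing(profiles):
    """Fill each missing entry from the most similar developer (k = 1) that has it."""
    filled = {}
    for dev, profile in profiles.items():
        row = list(profile)
        for i, value in enumerate(profile):
            if value is not None:
                continue
            candidates = [
                (similarity(profile, other), other[i])
                for name, other in profiles.items()
                if name != dev and other[i] is not None
            ]
            if candidates:
                row[i] = max(candidates, key=lambda c: c[0])[1]
        filled[dev] = row
    return filled

# Hypothetical profiles: average fix time in days per bug type, None if unknown.
profiles = {
    "Dev 1": [5.0, None, 10.0],
    "Dev 2": [5.0, 3.0, 9.0],
    "Dev 3": [20.0, 8.0, 1.0],
}
print(fill_missing(profiles))
```

Here Dev 1's profile resembles Dev 2's far more than Dev 3's, so the missing entry is taken from Dev 2.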

CosTriage:

Obtaining a developer's cost for a new bug report
[Figure: a new bug report with bug type = 1 and the resulting developer cost for the new bug]
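Once the profiles are complete, looking up the cost of a new bug reduces to finding its bug type and reading the matching profile entry. In this sketch the bug type is taken to be the dominant topic of the report's topic distribution, which is an assumption; the profiles, the distribution, and the function names are hypothetical, and types are indexed from 0.

```python
def bug_type(topic_distribution):
    """Pick the index of the dominant topic as the bug type (assumed rule)."""
    return max(range(len(topic_distribution)), key=lambda i: topic_distribution[i])

def costs_for_new_bug(topic_distribution, profiles):
    """Read each developer's profile entry for the new bug's type."""
    t = bug_type(topic_distribution)
    return {dev: profile[t] for dev, profile in profiles.items()}

# Hypothetical completed profiles (days per bug type) and topic distribution.
profiles = {"Dev 1": [5.0, 3.0, 10.0], "Dev 2": [7.0, 2.0, 9.0]}
new_bug_topics = [0.1, 0.7, 0.2]

print(costs_for_new_bug(new_bug_topics, profiles))
```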

Merging:

Merging the classifier's scores and the developers' cost scores
  Accuracy scores [Anvik06] + cost scores (CosTriage) = hybrid scores
[Figure: bug reports are scored by the bug classifier (SVM) for accuracy and by the developer profiles (bug types) for cost; the accuracy score and cost score are aggregated to give the recommended developer]
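One way to aggregate the two scores is a weighted sum of a normalized accuracy score and a normalized inverse cost, so that a lower cost raises the hybrid score. The slide does not give the aggregation formula, so this particular form, the weight `alpha`, the function name `hybrid_scores`, and the sample numbers are all assumptions for illustration.

```python
def hybrid_scores(accuracy, cost, alpha=0.5):
    """Weighted sum of normalized accuracy and inverse cost; higher is better.

    Assumes every cost is positive and at least one accuracy score is positive.
    """
    max_acc = max(accuracy.values())
    min_cost = min(cost.values())
    return {
        dev: alpha * accuracy[dev] / max_acc + (1 - alpha) * min_cost / cost[dev]
        for dev in accuracy
    }

# Hypothetical classifier scores and estimated costs (fix time in days).
accuracy = {"Dev 1": 0.8, "Dev 2": 0.6, "Dev 3": 0.2}
cost = {"Dev 1": 50.0, "Dev 2": 2.0, "Dev 3": 10.0}

scores = hybrid_scores(accuracy, cost, alpha=0.5)
recommended = max(scores, key=scores.get)
print(recommended, scores)
```

With `alpha=1.0` the ranking falls back to accuracy alone; lowering `alpha` trades accuracy for lower cost.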

Experiments:

Subject systems
  97,910 valid bug reports
  255 active developers
  From four open source projects
Approaches
  PureCBR
  CBCF
  CosTriage

Experiments:

Two research questions
  Q1. How much can our approach improve cost (bug fix time) without sacrificing bug assignment accuracy?
  Q2. What are the trade-offs between accuracy and cost (bug fix time)?
Evaluation measures
  W is the set of bug reports predicted correctly
  N is the number of bug reports in the test set
  Because the real fix time is unknown, we use the fix time only for correctly matched bugs
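The measure formulas themselves are not preserved here. A natural reading of the definitions, assumed in this sketch, is accuracy = |W| / N and an average fix time taken over the bugs in W only. The function name `evaluate` and the sample data are hypothetical.

```python
def evaluate(predicted, actual, fix_time):
    """Return (accuracy, average fix time over correctly predicted bugs)."""
    # W: bug reports whose predicted developer matches the real fixer.
    correct = [bug for bug in predicted if predicted[bug] == actual[bug]]
    accuracy = len(correct) / len(predicted)  # |W| / N
    avg_time = sum(fix_time[b] for b in correct) / len(correct) if correct else None
    return accuracy, avg_time

# Hypothetical test set: predicted developer, real fixer, and fix time in days.
predicted = {"b1": "Dev 1", "b2": "Dev 2", "b3": "Dev 1", "b4": "Dev 3"}
actual = {"b1": "Dev 1", "b2": "Dev 1", "b3": "Dev 1", "b4": "Dev 3"}
fix_time = {"b1": 4.0, "b2": 9.0, "b3": 2.0, "b4": 6.0}

print(evaluate(predicted, actual, fix_time))
```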

Experiments:

Estimating bug fix time
  The relative error of the expected bug fix time
Improving bug fix time (Q1)
  CosTriage reduces the cost by 30% without seriously compromising accuracy

Experiments:

Trade-off between accuracy and bug fix time (Q2)

Conclusion:

We proposed a new bug triaging technique
  Optimizing not only accuracy but also cost
  Enhancing CBCF for sparse data
    Using topic modeling to reduce the extreme sparseness
      Bug reports, Q&A, …
    Categorizing bugs
  Experiments using four real bug report corpora
    Apache, Eclipse, Linux, and Mozilla

Q & A:

Thank you! Do you have any questions?

Back up:

Finding the natural number of topics (# bug types)
  We adopt the divergence measure proposed in [Arun10]
  t is the natural number of bug types
[Figure: panels for Mozilla and Apache]

Back up - Bug features:

Bug features
  Keywords of the title and description
  Other metadata
Example: a new bug report
  Title: Traditional Memory Rendering refactoring request
  Description: Request additional refactoring so we can … Traditional Rendering.
  After removing stopwords
    Traditional Memory Rendering refactoring request
    Request refactoring … Traditional Rendering
