Dots of Tech Perception

September 14, 2008

Paper Published

Filed under: bugzilla,defect prediction,mozilla,mozilla firefox — GG @ 6:19 pm

So finally my work in defect prediction was published. Here is a presentation.

March 5, 2008

Intuitively linking CVS and Bugzilla

The main problem with using software repositories in defect prediction is the lack of integration between the CVS history files and the defect tracking system. You can link the problem reports (PRs) and modification reports (MRs) using the PR identification number, which is available both in the MRs in CVS and in the PRs in Bugzilla. However, the real challenge is to associate the bug reports in Bugzilla with specific Firefox releases. The data collection process took place at moment t3, and the goal was to collect the bugs that were in the source code at the moment of the release, t1 (Figure below). This is not trivial, as the following example illustrates. Suppose that at the time of the release t1 a defect was in the source code. If the defect was solved after the release, say at t2 or t’, then at t3, when we collected the data, the bug is labeled as resolved even though it was present in the release.

We approximate the defects that were in the source code at the time of a Firefox release t1 as all the bugs with a creation timestamp before t1 AND:

1. with the status CLOSED, RESOLVED or VERIFIED set after t1 OR
2. with a committed-to-CVS timestamp after t1 OR
3. with the status NEW, ASSIGNED or REOPENED at t3.

It may happen that a bug was solved and the commit message exists in the CVS history file at t2, but the bug status was never updated in Bugzilla. It may also be the case that the commit message in CVS does not reflect the change performed: it has no PR identification number associated with it, even though the change resolves a problem that is reported in Bugzilla at t’. We also selected only the PRs with the severity marked as blocker, critical, major, normal, or minor and, where applicable, with the resolution set to FIXED. The limitation of this approach is that there may be defects in the code that are undiscovered at moment t1 and are reported only after the release. Because there is no way to tell which release such bugs belong to, we simply did not consider them.
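The selection rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the script used for the study: the `BugRecord` type, its field names, and the `present_at_release` helper are hypothetical, and `status_changed` is assumed to hold the time at which the status seen at t3 was set.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

OPEN_STATUSES = {"NEW", "ASSIGNED", "REOPENED"}
DONE_STATUSES = {"CLOSED", "RESOLVED", "VERIFIED"}
SEVERITIES = {"blocker", "critical", "major", "normal", "minor"}


@dataclass
class BugRecord:
    """Hypothetical snapshot of one Bugzilla PR as seen at collection time t3."""
    bug_id: int
    created: datetime
    status: str
    severity: str
    resolution: Optional[str] = None            # None while the bug is open
    status_changed: Optional[datetime] = None   # when the current status was set
    committed: Optional[datetime] = None        # timestamp of the linked CVS commit


def present_at_release(bug: BugRecord, t1: datetime) -> bool:
    """Approximate whether the bug was in the source code at release time t1."""
    if bug.created >= t1:
        return False  # reported after the release: not considered
    if bug.severity not in SEVERITIES:
        return False  # drops trivial reports and enhancements
    if bug.resolution is not None and bug.resolution != "FIXED":
        return False  # where a resolution applies, it must be FIXED
    closed_after = (
        bug.status in DONE_STATUSES
        and bug.status_changed is not None
        and bug.status_changed > t1
    )
    committed_after = bug.committed is not None and bug.committed > t1
    still_open = bug.status in OPEN_STATUSES
    return closed_after or committed_after or still_open
```

A bug filed before t1 and resolved as FIXED after t1 is counted, while one resolved before t1 with no later commit is not. The three boolean terms map one to one onto conditions 1 to 3 in the list.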

Mining for defects – Mozilla Firefox

Filed under: bugzilla,defect prediction,mozilla,mozilla firefox — GG @ 1:58 am

The main problem with using software repositories in defect prediction is the lack of integration between the CVS history files and the defect tracking system. You can link the problem reports (PRs) with the modification reports (MRs) using the PR identification number, which is available both in the MRs in CVS and in the PRs in Bugzilla.

A real challenge is to associate the bug reports in Bugzilla with specific Firefox releases. The data collection process takes place at moment t3, and the goal is to collect the bugs that are in the source code at the moment of the release, t1. This is not trivial, as the following example illustrates. Suppose that at the time of the release t1 a defect was in the source code. If the defect was solved after the release, say at t2 or t’, then at t3, when the data is collected, the bug is labeled as resolved.

But we are dealing with an open source environment. It may happen that a bug was solved and the commit message exists in the CVS history file at t2, but the bug status was never updated in Bugzilla. It may also be the case that the commit message in CVS does not reflect the change performed: it has no PR identification number associated with it, even though the change resolves a problem that is reported in Bugzilla at t’.

There is a lot of debate about whether size and complexity can predict defects. We argue that size and complexity metrics do have value for defect prediction, and that research should instead focus on the extent to which size and complexity can predict defects, or on the particular cases in which defects can be predicted from size and complexity metrics. In this context, we present our interpretation of the results.

Data Collection – Mining for Defects

Filed under: bugzilla,defect prediction,mozilla,mozilla firefox — GG @ 1:50 am

As a non-critical software system, Firefox is widely recognized to contain post-release defects. OSS facilitates the collection of data to be used in defect prediction models. An important requirement for OSS code is that it should be rigorously modular, self-contained and self-explanatory, to allow development at remote sites. Therefore, the data that can be used for prediction models in OSS can be retrieved from the source code version repositories (CVS) and bug tracking systems (Bugzilla). On the other hand, OSS development is characterized by the lack of a formal process, poor design and architecture, and development tools that are not comparable to those used in commercial development. Few of the defect prediction approaches used for commercial software can be directly applied to OSS development; however, results obtained from OSS prediction models can be used in an industry environment.

1. Versions: Firefox is based on independent Mozilla Core components layered together. Due to this architecture some of Mozilla’s applications share many components, but they are fundamentally different in functionality.

The Mozilla source code is organized in several branches. The trunk is the main branch, the central source code that is used for continuous and ongoing development. Trunk builds contain the very latest changes and updates. However, the trunk can also be very unstable at times. When development is started for a specific Mozilla version, a new branch is created. At conception, a derived branch contains everything that the principal branch contains. The Firefox 1.0 branch was derived from Mozilla Branch 1.7, while the Firefox 1.5 branch was derived from Mozilla Branch 1.8. Firefox branches that are forked from the existing Mozilla branch will be used for all future releases of Firefox. The term release is used in OSS development to refer to different types of releases: major and minor, alpha and beta.

Firefox Branch 1.5 resynchronized the code base with the trunk, which contained additional features not available in Firefox 1.0. On the other hand, in release 1.5.0.3 the focus was not on adding features but on improving security-related aspects, which were bypassed in version 1.5.0. This peculiarity of the three selected releases allowed us to test whether the performance of a defect prediction model increases when it is trained on data collected from major releases instead of minor ones.

2. Module Selection: The reason behind branching is that components that need to be prepared for a future release are at the same time continuously developed on the trunk. A distinction needs to be made between Firefox-specific source code, i.e. code that does not support any other Mozilla application, and the Mozilla components that support Firefox.

3. Metrics: To derive the product metrics for each source file, Understand C++ can be used. The tool computes the source code metrics for C and C++ programs and generates metrics reports. The reports contain three categories of metrics: project level, file level, and function level. They also contain object-oriented metrics for the .cpp files.

March 4, 2008

Mozilla Bugzilla Reporting Process – aka a bug’s lifecycle

Filed under: bugzilla,defect prediction,mozilla firefox — GG @ 6:51 am

The Mozilla project relies on Bugzilla, a defect tracking system, to monitor problem reports (PR), i.e. bugs. A PR in Bugzilla has several pre-defined attributes. Some fields, such as the PR identification number and creation timestamp, are created when the report is first filed. Other fields, such as the product, component, and severity, are selected by the testers when the report is filed and may be changed over the lifetime of the report. Other fields routinely change over time, such as the current status of the report, and if resolved, its resolution state.

Studying the lifecycle of a bug facilitates linking the Bugzilla PRs and CVS Modification Reports (MRs). The status and resolution fields define bugs as evolving entities that change over time. When a tester enters a new bug in Bugzilla, the status of the bug is set to UNCONFIRMED. The Mozilla quality assurance team then looks at it, confirms that the bug exists, and changes its status to NEW. After a developer looks at the bug and either accepts it or assigns it to someone else, the bug’s status becomes ASSIGNED. Once the bug is fixed, its status changes to RESOLVED. Finally, the quality assurance team verifies that the bug was indeed fixed, and the status is set to VERIFIED and then CLOSED. If the quality assurance team is not satisfied with the solution, then the bug is REOPENED and the process starts again.

A report can be RESOLVED in various ways. Bugzilla PRs indicate this in the resolution field. If the bug was solved and this resulted in a change to the code base, the bug is resolved as FIXED. When a developer determines that the bug is a duplicate of an existing report, it is marked as DUPLICATE. If the developer is unable to reproduce the defect, the resolution is set to WORKSFORME. If the report describes a problem that will not be fixed, or one that is not an actual bug, the report is marked as WONTFIX or INVALID.
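The lifecycle described above can be written down as a small transition table. The sketch below is a simplification for illustration only: the `TRANSITIONS` table and the `is_valid_history` helper are hypothetical names, a real Bugzilla installation permits more transitions than the ones listed in this post, and the move from REOPENED back to ASSIGNED is an assumption about what "the process starts again" means.

```python
# Status transitions taken from the description in this post (simplified).
TRANSITIONS = {
    "UNCONFIRMED": {"NEW"},
    "NEW": {"ASSIGNED"},
    "ASSIGNED": {"RESOLVED"},
    "RESOLVED": {"VERIFIED", "REOPENED"},
    "VERIFIED": {"CLOSED"},
    "REOPENED": {"ASSIGNED"},  # assumption: a reopened bug is picked up again
    "CLOSED": set(),
}

# Values of the resolution field once a report is RESOLVED.
RESOLUTIONS = {"FIXED", "DUPLICATE", "WORKSFORME", "WONTFIX", "INVALID"}


def is_valid_history(statuses):
    """Check that a list of statuses follows the simplified lifecycle."""
    if not statuses or statuses[0] != "UNCONFIRMED":
        return False
    return all(
        nxt in TRANSITIONS.get(cur, set())
        for cur, nxt in zip(statuses, statuses[1:])
    )
```

For example, the history UNCONFIRMED, NEW, ASSIGNED, RESOLVED, VERIFIED, CLOSED passes the check, while a history that jumps from UNCONFIRMED straight to ASSIGNED does not.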

In Bugzilla terminology, a bug can be anything that needs to be tracked. Some entries are not real bugs, i.e. defects, but rather enhancements. When analyzing a report in Bugzilla, the quality assurance team rates the severity of the bug using one of the following labels: blocker, critical, major, normal, minor, trivial, or enhancement.

While Bugzilla contains information about defects, it does not contain information about the location of the defects in the source code. Instead, this information is captured in the CVS log files. CVS Modification Reports (MRs) keep the complete history of any file in the project, including when and what was modified. Bonsai, Mozilla’s web interface to its CVS repository, can be used to retrieve the MRs related to source files, the comments associated with the files, and the timestamp of the commit message. Each comment acknowledges the people who submitted the change and contains the relevant PR identification numbers (if any). Every number that appeared in an MR’s comment field was a potential link to a bug, indicating that the commit solved a PR. We selected a number as a candidate bug id if two conditions were met: the number was fewer than 6 digits long, and the comment contained one of the keywords bug, bug id, id or # before the number.
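The two conditions can be approximated with a regular expression. The following is a minimal sketch rather than the original extraction script: the pattern, the `candidate_bug_ids` name, and the choice to require the keyword immediately before the number (optionally separated by whitespace, a colon, or a #) are all assumptions made for illustration.

```python
import re

# A keyword (bug id, bug, id, or #) directly followed by a run of digits.
BUG_REF = re.compile(r"(?:\bbug\s+id|\bbug|\bid|#)\s*[:#]?\s*(\d+)\b", re.IGNORECASE)

MAX_DIGITS = 5  # "fewer than 6 digits"


def candidate_bug_ids(comment):
    """Return the candidate bug ids found in one CVS commit comment."""
    return [
        int(digits)
        for digits in BUG_REF.findall(comment)
        if len(digits) <= MAX_DIGITS
    ]


if __name__ == "__main__":
    print(candidate_bug_ids("Fix for bug #4321 and bug id 987, r=someone"))
```

Numbers with no keyword in front of them, such as a year in a copyright notice, are ignored, and so are numbers with six or more digits. Candidates found this way would still need to be checked against Bugzilla, since a keyword followed by a number does not guarantee that the number is a real PR id.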
