Delta debugging framework
Delta debugging framework
Delta debugging is an automated approach to debugging that isolates failures systematically. Given a failing test that can be mechanically verified (including a browser crash), delta debugging is a way of automatically isolating the change that introduced the failure. Having a framework in place to pull builds from CVS, bisect by date and change set (using bonsai data -- remember, CVS doesn't have changesets!), and report results would let computers make developers more productive.
Name(s) of primary people working on the project. If you want to join a project as leader, discuss with other leaders first. Include links to personal pages within wiki.
Name(s) of people casually working on the project, or who have contributed significant help. Include links to personal pages within wiki.
NOTE: only Project Leader(s) should add names here. You can’t add your own name to the Contributor list.
Provides more depth than the Project Description. This is the place for technical discussions, project specs, or other details. If this gets very long, you might consider breaking this part into multiple pages and linking to them.
Based on the papers on delta debugging that I've sifted through (with From Automated Testing to Automated Debugging by Andreas Zeller being the most understandable to non-math wizards, and Yesterday, my program worked. Today, it does not. Why? also by Andreas Zeller probably being the most relevant to our project), here is an outline of my understanding of delta debugging, a summary of the concepts that must be taken into account while working on the delta debugging framework project, and an outline of the conceptual stages of developing the delta debugging framework.
Delta debugging is an algorithm that can automatically and systematically isolate/narrow down the failure-inducing circumstances that are necessary to produce the bug. Delta debugging can be applied to isolating various types of failure-inducing circumstances, including:
- program input
- user interaction (key presses, button presses, mouse clicks, etc.)
- program code modification (adding / updating / deleting variables, functions, classes, etc.)
How? From my understanding, given a known bug and a known set of circumstances that can reproduce the bug, we can continually execute tests that vary the circumstances until a minimal subset of circumstances that can reproduce the bug is left. That is, for each test, a circumstance is removed, and if the bug is still present, than that circumstance can theoretically be eliminated as the cause of the bug from the set of circumstances. In this project's case, the circumstance will be certain change(s) to the program code that caused a regression in the program.
[Note: The rest of this post contains mathematical concepts that I may not fully understand but am trying to put into terms understandable to non-math wizards. A dangerous combination.]
Before we continue, here are some important terms to understand.
- Configuration. A subset of all changes made to source code since the last known good version and the version with the regression.
- Test. A test case or function that can determine whether a configuration contains the failure-inducing change (Fails), doesn't contain the failure-inducing change (Passes), or produces indeterminate results (possibly because of interference or inconsistency).
- Subset. Given two sets (A and B), A is a subset of B if and only if all of the elements of set A are in set B.
- Superset. Given two sets (A and B), B is a superset of A if and only if B contains all of the elements of A.
- Union. Given two sets (A and B), the union of A and B contain all elements of A and B.
- Intersection. Given two sets (A and B), the intersection of A and B are the elements that are in both A and B.
With regards to the delta debugging algorithm, we must take into account these possible concepts that may complicate it:
- Interference. Each individual change may not cause the program to regress, but applying a combination of changes together causes the regression. Thus, there must be a method to identify the set of changes that cause the regression.
In the case of interference, we must recursively test both halves with all changes in one half of the configuration remaining applied.
- Inconsistency. Changes in the source code may be dependent on other changes in the source code and without them the program cannot be compiled and tested successfully. Thus, there must be a method to identify and handle dependencies (such as a dependency graph).
[Wild, far left field thought: How can we ensure that when applying a configuration, it is consistent? Well, in the compiler process, there is a lexical analysis step which breaks the source code into tokens, then there is a dependence analysis step which produces constraints/dependencies between the tokens. Theoretically, if we can harness parts of the compilation process, we could have a method of knowing the dependencies between the changes in the source code.]
- Granularity. A single logical change may consist of hundreds of lines of code, yet only a couple lines of the change may be responsible for the regression. Thus, There must be a method to break the change into smaller manageable chunks.
- Monotony. If a change causes a failure, any configuration that includes this change fails as well (makes sense to me). If a configuration does not cause a failure, then all subsets of the configuration do not cause a failure, thus they can be eliminated from the change set (makes some sense to me. However, if my understanding is correct, while a subset of the configuration does not cause the failure by itself, the concept of interference suggests that the subset combined with another configuration may cause the regression).
- Unambiguity. A failure is caused by only one configuration (No interference). Thus, for efficiency, we do not have to search the other configuration for more failure-inducing changes. Whether or not a configuration is ambiguous or unambiguous, a failure-inducing change will be produced. However, for completeness with regard to finding all failure-inducing changes, it is preferred to search both configurations.
Now that we are aware of the different concepts that we must take into account with regards to delta debugging, the next section will outline some facts and assumptions that are being made, and attempt to define the vision and process of the delta debugging framework.
Project Facts and Assumptions
- The source tree for the Mozilla project is HUGE. With many different source file types (C++, JS, XUL, etc.) in many different directories.
- Failure-inducing change(s) will unlikely be localized to a single directory and file. Failure-inducing change(s) may be spread across many different directories and source files.
- The source files could be of the same type (C++), mixed type (C++, JS), same directory, different directory. It shouldn't matter. The framework should be source type and location agnostic.
- The failure-inducing change(s) may not be localized to a single developer. The failure-inducing change(s) may have been caused by another developer's change(s) to a source file they were working on. That is, a single developer's source scope may not be encapsulated but interconnected and interdependent on other developers source code.
- The developer's current version of the source code contains the regression.
- The developer has a test case that can be used indicate whether the test passes/fails/is indeterminate.
- The developer will NOT know the date/version of the last known good version.
- Bonsai is a tool that can produce a list of differences between versions of a source file. (Bonsai's functionality has not been examined closely yet but will have to as it may be a key component to the framework)
Possible Vision of the Delta Debugging Framework:
(subject to change based on stakeholder consultation/feedback, feasibility study)
- Since the last time a developer executed a test case that passed, the developer modified some source files. The source files may be of the same type or mixed type, same directory or different directory. It shouldn't matter. The framework should be source type and location agnostic. Upon executing the test case again, the result is now a failure. The developer panics. It's only days before the deadline to submit bug patches before the source tree is supposed to be closed for release and the bug is a blocker. The developer doesn't want to be shamed for delaying the release, and the source code is too complex to find the bug in time, so what should they do? Use the delta debugging framework! that's what. How? you may ask. Well keep reading to find out. * scenario may vary.
- The delta debugging framework may require the developer to input one piece of information. The test case/function that used to pass but now fails. It will be used to determine whether the source files with progressive changes passes/fails the test.
- Once the developer has inputted this piece of information, it will use Bonsai to query the source tree and compile a list of all the changes to the source files since a certain amount of time.
- (If there was a method of determining change dependencies so as to eliminate the possibility of inconsistencies, it would be done in this step. One possible way of reducing the possibility of inconsistencies is to logically group changes by location or check in time.)
- This step would be where the delta debugging algorithm would come into play. The algorithm should basically:
- Recursively, incrementally remove changes from the source code with the regression.
- Recompile the source tree.
- Execute the test case. There may be 3 outcomes:
- The test case passes. We know that the failure-inducing change(s) are in the change(s) that were removed.
- The test case fails. We know that the failure-inducing change(s) are not exclusively in the change(s) that were removed. I say not exclusively because of the concept of Interference (described above).
- The test case is indeterminate. There were some inconsistencies.
The flowchart represents the simplistic version of the delta debugging algorithm. It will theoretically find a failure-inducing change set but not necessarily the minimal set or the full set of failure-inducing change(s). The algorithm is depicted as recursively linear however it could be binarily recursive. In the linear version, the theoretical maximum number of iterations (worst case scenario) is:
where n represents the total number of changes and r is a subset of n.
In other words, the summation of the combinations of changes without repetitions that can be made given that the size of the change set can vary from 1 to n.
FLOWCHART TO BE UPDATED SOON. I HOPE.
Project Source Repository
Assuming you have SVN, the project's source can be obtained via SVN using the following command:
svn checkout svn://cdot.senecac.on.ca/deltadbg
Project Task List
|GNU Make (http://www.gnu.org/software/make/)|
|Wrapper around the GNU make utility||Mozilla uses the GNU make utility to build their source tree. your mission is to make a wrapper around the GNU make utility so that the make command can be programmatically called to build the source tree.||Richard Chu||Work in progress. Initial wrapper created: makewrapper.pl. Requires thorough test case (maketest.pl needs more test cases).|
|Subversion (SVN) Repository (http://subversion.tigris.org/, http://svnbook.red-bean.com/nightly/en/index.html)|
|Wrapper around the necessary SVN commands||For the automated debugging to work, we may need to automatically modify the working copy by reverting to a different revision or updating certain directories and files. It may also need to know the differences between revisions and changesets.||Richard Chu||Work in progress. Initial wrapper created: svn.pl. Currently has subroutines for commit, update, diff, and checkout commands. May need to wrap other SVN commands. Requires thorough test case (svntest.pl needs more test cases).|
|Query SVN repository for revision history||Everytime a user commits with svn, the revision number is incremented. Your mission is to find the relevant commands that will return all of the changes that were committed between 2 different revision numbers. How is this data stored? formatted? what meta-data is available? How can we parse this data into usable format?||Richard Chu||Work in progress.|
|Query SVN repository for differences between revisions||Your mission is to find out the relevant commands that can return the differences between two revisions, the meta-data that is kept with each revision, how differences between two revisions are stored and formatted, and how this data can be parsed into a usable form for our project (wrapper?).||Richard Chu||Work in progress.|
|SVN revert between revisions||Your mission is to find out the relevant commands that can revert changes in the local sandbox to earlier revisions so that the changeset in the revision can be tested.||Richard Chu||Work in progress.|
|CVS/Mozilla Bonsai (http://www.mozilla.org/bonsai.html)
You can do these tasks by trying to interpret the Bonsai source code yourself, or preferably by finding a person who has intimate knowledge of the Bonsai source code and asking them.
|Query CVS via Bonsai for checkins||You can use Bonsai to search for the checkins made within a certain time frame, within a certain directory, made by a certain developer, etc. Your mission is to find the relevant source files, functions, variables, etc. that drive this functionality.||TBD.||Not started.|
|Results of querying CVS via Bonsai for checkins||Bonsai obviously returns results from the query. The question is how? Your mission is to find the relevant source files, functions, variables, etc. that is used to return and store results. What you need to find out is what type of data is returned and how are the results formatted?||TBD.||Not started.|
|Query CVS via Bonsai for version history||For each file in the CVS repository, Bonsai has the ability to list a history of versions. From the first to the latest. Your mission is to find out the relevant source files, functions, variables, etc. that are used to obtain the version history for a certain source file.||TBD.||Not started.|
|Results of querying CVS via Bonsai for version history||Your mission is to find out the relevant source files, functions, variables, etc. that are used to return and store the results of searching for a file's version history. What data is returned and how is the data stored/formatted?||TBD.||Not started.|
|Query CVS via Bonsai for differences between versions||Using Bonsai, a user can see the differences between two different versions of a source file. Your mission is to find out the relevant source files, functions, variables, etc. that are used to find the differences between two different versions of a source file.||TBD.||Not started.|
|Results of querying CVS via Bonsai for differences between versions||Using Bonsai, a user can see the differences between two different versions of a source file. How are the results returned? How is it formatted? Your mission is to find out the relevant source files, functions, variables, etc. that are used to return the results of the query. You are to find out how the data is returned and how it is formatted.||TBD.||Not started.|
NewsForge: An Introduction to Delta Debugging Delta debugging simplifies debugging process in a program by automating the process and continually splitting the program to smaller chunks called deltas. This technique is useful in three circumstances:
- Error occurs due to user inputs (e.g. keypress, file I/O). Delta debugging is used to eliminate user actions irrelevant to the nature of the error and pinpoint the cause of the error.
- Error occurs due to recent changes to the code. In this situation, deltas are retrieved from the net differences from both codes.
- Multithreading environment. Delta debugging can track down the exact order of operations originating from multiple threads that caused the error.
We need to focus on one of these circumstances. Judging from the project description, we should work on the first case, while perhaps opening the door to future expansion.
- Bonsai issue -- unresolved
This is where your regular updates will go. In these you should discuss the status or your work, your interactions with other members of the community (e.g., Seneca and Mozilla), problems you have encountered, etc. Put detailed technical information into the Project Details page (i.e., update it as you go), and save this section for news about participation in the project.
Oct. 17, 2006
The quest to build a suitable "wrapper" for the build system is not doing well. So far I have only observed various build logs posted on Tinderbox for the SeaMonkey build. Here are some observations I come up with.
- Successful builds - green or yellow on Tinderbox
seamonkey-bin exists, is nonzero, and executable. seamonkey-bin exists, is nonzero, and executable. seamonkey-bin binary exists, build successful.
- Broken builds - red on Tinderbox
gmake: *** [nsAttrValue.o] Error 1 ... gmake: *** [alldep] Error 2
Analyzing build logs may be the "lazy" solution to building a wrapper to detect build failures, but that's the best thing we've got so far. Searches for any of the strings listed above in LXR did not yield valuable result, which means the build system remains a mystery to us. Thoughts:
- Tinderbox builds rely on report logs to capture build data. If we can figure out how to get tap into this process, we can better plan our prototype.
- Tinderbox performs a build operation first, and then automatically starts a bunch of tests (e.g. MozillaAliveTest, regxpcom test). These tests do not exist within LXR, either, which means they reside outside the tree. If we can find the script that fires these tests, we can probably create our first prototype as a test and hook it into the build system.
So many questions... Perhaps I should start hanging out at #build.
Oct. 15, 2006
Updated the file svn.pl. Added the diff subroutine. It's a wrapper around the svn diff command. Updated the update subroutine. Added another argument and made them both optional. Subroutine can now accept a directory/file argument in addition to revision number.
Also, currently working on creating some subroutines that can break up a changeset based on the type requested. Changeset types include:
- Revision. A revision may be a changeset. However, MANY changes may occur in each revision so it may be wise to break up the revision into smaller changesets.
- Directory. All changes made within a directory may be considered a changeset. Again, MANY changes may occur within a directory, so it may be wise to break it up into smaller changesets.
- File. All changes made within a file may be considered a changeset. Again, MANY changes may occur within a file, so it may be wise to break up the revision into smaller changesets.
- Code block. Logical code blocks may be considered a changeset.
- Line. A line in a file is the lowest change denominator.
Except for the first type (revision), changeset may stagger between more than one revision. The lower down the list you go, the higher the possibility of inconsistencies (inability to successfully compile source tree).
Oct. 10, 2006
Added two files to the SVN repository: makewrapper.pl and maketest.pl. I have no idea if the files will be useful or if I just wasted my time.
makewrapper.pl is a light wrapper around the GNU make utility used to build the source tree. This wrapper was created to be able to programmatically execute the make command with various options. One thing I haven't figured out yet is how to get the exit status code that GNU make returned (0 - successful; 2 - error) after it finishes executing so that I can return that exit code in the build subroutine.
maketest.pl just tests the subroutines in makewrapper.pl.
Starting to get a hang of Perl again after not using it for quite some time.
Oct. 09, 2006
Added two files to the SVN repository: svn.pl and svntest.pl. I have no idea if the files will be useful or if I just wasted my time.
svn.pl is a light wrapper around a couple svn commands used to manipulate the svn repository and user's working copy. This wrapper was created to be able to programmatically execute SVN commands. One thing I haven't figured out yet is how to get the exit status code that is returned after every SVN command is executed as I would like to return this exit code in the subroutines.
svntest.pl just tests the subroutines in svn.pl.
Of note, Perl's object-orientedness is quite different from the other object-oriented languages that I am used to such as C++ and Java.
Oct. 03, 2006
Discussed the project with dave humphrey. The gist of the discussion is:
<vipers101> dave: I hereby present you, richard!
<vipers101> richard: say something!
<dave> pffft...way to play like a team
<vipers101> I prefer to have my team share my griefs
<dave> let's assume you've got a program that used to work, but now it crashes. you'd like to figure out what changes to the code introduced the crash. you can programmatically test for the existence of the crash. so you want to have the computer go back in time and figure out what was the delta (e.g., changeset) where the crash started. you want to be able to isolate the problem down to a set of changes to a set of files. further if possible. let's say you have revisions 200 to 500, and you know the bug is caused by *something* in there. so you want to bisect your way through those changesets (in SVN each of those numbers is a changeset) until you can see where the bug was introduced
<vipers101> dave: you mean the bugs are not introduced by the recent work?
<dave> you don't know. you can't assume. maybe, maybe not. remember that working on a huge codebase means that there will be lots of changes going on around your work. so if you do a checkin, they take the weekend off, 50 other people could checkin before your next commit. what you need is a way to test for the problem (maybe a return code). then you need to be able to automatically pull and build the code, and run that test. wash, rinse, repeat
<richard> if i get this, i can't assume that the user will already have a test case/function that returns pass/fail?
<dave> right. you probably need to write something to figure this out. so a hung browser might mean you have a timer that watches for an event. or you might wait on a return code. a crash is probably easiest to test for. this is what I mean by writing a "wrapper".
<vipers101> richard: I think we just have a new task list
Sept. 26, 2006
I read through your documentation here, and it is looking good. I also spoke to Shaver by phone this morning, and we chatted briefly about this project. He suggests that you start your work by looking for a suitable Crash Case, one that happens reliably. Then you need at what would be necessary in order to bisect the change set (e.g., bonsai data) in order to get closer to the change that introduced the bug. Shaver suggested that robc (Rob Campbell) might be a good person to help you brainstorm on this.