Difference between revisions of "Code Indexer"

Latest revision as of 14:19, 16 January 2007

Project Name

Source Code Indexing Service Analysis

Project Description

Mozilla’s source code is enormous: millions of lines of C, C++, JavaScript, Perl, Python, Java, C#, etc. Developers currently use the LXR system to quickly search and browse it online: http://lxr.mozilla.org. Mozilla is planning a move from CVS to Subversion for revision control, and at the same time wants to evaluate other source indexing services. Two BSD students are working to set up, document, and test other potential services (e.g., Fisheye, OpenGrok, MXR) on one of the Seneca-Mozilla servers (see below). In each case this requires configuration changes and some scripting to get the services to integrate properly with Mozilla’s other online tools. When the test services are installed and synced with the live source tree, Mozilla will point its developers to them and get feedback; the students will help collect and synthesize this feedback.

Project Leader(s)

John Ford (John64)

Project Contributor(s)

Status

I now have some time to resume work on the project. This time I will manage my time more wisely. ~~I have too many assignments. I need to do work on them, but I should have more free time during the Christmas break.~~

Options

Help with LXR/MXR/Bonsai development.
Make a sort-of branch-respecting version of OpenGrok, though this would be a very shoddy implementation that doesn't really do what it should.
Set up one OpenGrok per active branch of the Mozilla project. This would have no version history whatsoever, apart from file dates.
[Re]write major portions of how OpenGrok deals with history, changesets and the like. This is my personal preference.
Try to fit Fisheye into the current development model, but this seems more like finding a problem for a solution. It is a very powerful tool, but it is not really like LXR or OpenGrok: it is more useful for analyzing CVS/SVN histories than for searching for functions, files, definitions and the like. Fisheye is also extremely slow. Within my LAN, queries take a long time, and over the internet it is unusable. I don't know why, especially since OpenGrok uses the same basic technologies. (One page load takes more than 10 minutes on a LAN connection; in its defence, it is still indexing the code.)
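
The "one OpenGrok per active branch" option amounts to giving every branch its own source tree and its own index directory. Below is a minimal sketch of that layout. The helper name `branch_layout` and the directory `/var/opengrok/data` are assumptions made for illustration; nothing here is part of OpenGrok itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: map each branch to its own source and index directory.
sub branch_layout {
    my ($src_root, $data_root, @branches) = @_;
    my %layout;
    foreach my $branch (@branches) {
        $layout{$branch} = {
            src  => "$src_root/$branch",   # where the checkout would live
            data => "$data_root/$branch",  # where that branch's index would live
        };
    }
    return %layout;
}

# Example paths only; /var/opengrok/data is an assumed location.
my %layout = branch_layout("/var/mozilla", "/var/opengrok/data",
                           "HEAD", "MOZILLA_1_8_BRANCH");
foreach my $branch (sort keys %layout) {
    print "$branch: source in $layout{$branch}{src}, index in $layout{$branch}{data}\n";
}
```

Each entry could then be handed to a separate indexer run, which is why this option gives no history across branches: every instance only sees its own tree.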

Why I like OpenGrok

Apart from the fact that it does not support branches, this is in my opinion the perfect tool. It is fast, open source and, most importantly, it produces well-thought-out pages that are really easy to navigate and just work. OpenSolaris does not use branches at all: it versions files linearly and treats versions as a sort of branch, in the way that Office 12 is the "2007 branch" and Office 11 is the "2003 branch". Mozilla doesn't do this, so it would be necessary to implement branch support. Luckily, OpenGrok is very modular and atomic in nature. The OpenGrok page has a more complete explanation, but the basic gist is that there are many "Gurus", each with a task. Files are first read by the History Guru, which looks at each file and decides what type of versioning it uses. Once the versions have been analyzed, the files are passed on to the file analyzer guru, which decides what type of file each one is and passes it on to a file type analyzer. This allows portions of the code to be changed without changing the whole system, so if we wanted to handle XUL/XPCOM symbols in a special way, we would write one module that does not depend on any other file analyzer. In the same way, if Mozilla switches to SVN, we would just port the branching support to SVN. If Mozilla switches to something other than CVS or SVN, a HistoryGuru could be written for that type of versioning history. The OpenGrok project is licensed under the CDDL, which is derived from the MPL 1.1.

In closing, I really like the OpenGrok project because it is very fast, very powerful and VERY modular!
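
The modular design described above can be illustrated with a small dispatch sketch. OpenGrok itself is written in Java and its real classes are not shown here; the handler tables and the `index_file` function below are hypothetical names, used only to show how a history handler and a file analyzer can be chosen independently of each other.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical history handlers, one per versioning system.
my %history_handlers = (
    cvs => sub { my ($file) = @_; return "CVS history for $file" },
    svn => sub { my ($file) = @_; return "SVN history for $file" },
);

# Hypothetical file analyzers, keyed by file extension.
my %file_analyzers = (
    c   => sub { return "C symbols" },
    js  => sub { return "JavaScript symbols" },
    xul => sub { return "XUL symbols" },
);

# Pick a handler for each stage independently. Unknown file types fall
# back to a plain-text analyzer so one missing module does not stop indexing.
sub index_file {
    my ($file, $vcs) = @_;
    my $history = $history_handlers{$vcs}
        or die "No history handler for '$vcs'\n";
    my ($ext) = $file =~ /\.([^.\/]+)$/;
    my $analyzer = (defined $ext && $file_analyzers{lc $ext})
        || sub { return "plain text" };
    return ($history->($file), $analyzer->($file));
}

my ($history, $symbols) = index_file("browser/base/content/browser.xul", "cvs");
print "$history; analyzed as $symbols\n";
```

In this sketch, supporting another versioning system means adding one entry to `%history_handlers`, and special handling for a file type means adding one entry to `%file_analyzers`; neither change touches the other table.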

Links

Official Blurb just in case I forget what I am doing :P
John's School Page
CVS2SVN Tool to convert CVS to SVN. Will be used to test SVN interop.
Devmo:CVS Checkout
Devmo: Rsyncing the CVS
Devmo: CVS Tags - to get the branches to checkout
Tomcat5 on Ubuntu
Tomcat Tips
CVS on nongnu.org
CVS "Guide"
Subversion
Blog Entry on OpenGrok
Java5 on Ubuntu - "sudo update-alternatives --config java" and "apt-get remove --purge java-gcj-compat"
Mozilla Jargon
Misc
CDDL - explanation of the differences between the MPL and CDDL

Pulling CVS

This script pulls the CVS source for the branches listed in @branches, or it did at some point; your mileage may vary.

#!/usr/bin/perl
use strict;
use warnings;

# Pull CVS from the mozilla project server
# Where you want the branch folders
my $src_root = "/var/mozilla";

# Where is your run.sh for opengrok? (or equivalent script to start the indexer)
my $opengroker = "/var/opengrok/run.sh";

# Where is your server?
my $cvsserver = ':pserver:anonymous@hera.senecac.on.ca:/cvsroot';

# Branches to be pulled
my @branches = (
  "HEAD",
  "MOZILLA_1_8_BRANCH", 
  "MOZILLA_1_8_0_BRANCH", 
  "AVIARY_1_0_1_20050124_BRANCH",
  "REFLOW_20061031_BRANCH" 
);

# Descriptions for each branch; keep old entries even if the branch is no longer pulled
my %descriptions = (
  "HEAD" => "Trunk - development branch",
  "MOZILLA_1_8_BRANCH" => "Firefox 2.0 - development branch",
  "MOZILLA_1_8_0_BRANCH" => "Firefox 1.5 - maintenance branch",
  "MOZILLA_1_7_BRANCH" => "Firefox 1.0 - maintenance branch",
  "AVIARY_1_0_1_20050124_BRANCH" => "Suite - maintenance branch",
  "REFLOW_20061031_BRANCH" => "Reflow Refactoring"
);

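# Where to write the branch list. The original script never set this variable,
# so the path below is an assumption; adjust it to suit your setup.
my $branchlistpath = "/var/opengrok/branches.txt";

# Open the branch list file for writing, or stop
open BRANCHLIST, ">", $branchlistpath or die "Could not open $branchlistpath: $!";

# Clear out whatever source was there
system("rm -rf ${src_root}/*");

foreach my $branch (@branches) {
  # Record the branch and its description (assumed format: the original
  # opened this file but never wrote to it)
  print BRANCHLIST "$branch\t$descriptions{$branch}\n";

  # Download the makefile, then checkout from the makefile
  system("
    mkdir ${src_root}/$branch;
    cd ${src_root}/$branch;
    cvs -d ${cvsserver} co -r $branch mozilla/client.mk;
    make -f ${src_root}/$branch/mozilla/client.mk checkout MOZ_CO_PROJECT=all;
  ") == 0 or warn "Checkout of $branch failed\n";
}

close BRANCHLIST;

# Start the OpenGrok indexer
system("bash $opengroker");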
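
The script above passes whole command strings to the shell, so a failed step is easy to miss. A possible refinement, sketched here under the assumption that each step is run as a separate command, is a small helper that calls `system` in list form and reports failures. The name `run_step` is hypothetical and not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: run one command in list form (which bypasses the
# shell when more than one argument is given) and report any failure.
# Returns 1 on success, 0 otherwise.
sub run_step {
    my (@cmd) = @_;
    my $status = system(@cmd);
    if ($status == -1) {
        warn "Could not start '@cmd': $!\n";
        return 0;
    }
    if ($status != 0) {
        warn "'@cmd' failed (wait status $status)\n";
        return 0;
    }
    return 1;
}

# Example with a harmless command: the running perl interpreter exiting 0.
# A real run would pass the cvs and make commands instead.
run_step($^X, "-e", "exit 0") or die "Step failed\n";
print "Step succeeded\n";
```

With a helper like this, the pull loop could skip the `make` step for a branch whose `cvs` checkout did not succeed, rather than carrying on regardless.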

@@ Line 10: / Line 10: @@
 ==Project Contributor(s)==
-==Project News and Details==
+==Status==
-====Status====
+I now have some time to resume work on the project.  This time I will manage my time more wisely
-* '''Issue:'''  OpenGrok is very demanding.  It has allready filled a 12GB VMware image, and i still dont know how large the index grows.  This could be a problem, not because i dont have space, but because this may be too much for the school server.  Likely OpenGrok would need a dedicated, reasonibly fast machine.  It seems to be a very powerful, extremely well done application but it is missing the ability to associate "blame" from what I see.  I also don't know how it takes care of file versioning.  This happened when indexing the entire 2.9GB mozilla CVSROOT.  If you take a look at the example OpenGrok I link to, you will see that its a really nice application!
+<strike>I have too many assignments.  I need to do work on them, but I should have more free time during the Christmas break.</strike>
-* Switched to a local VM from my P4 --[[User:John64|John64]] 21:44, 4 October 2006 (EDT)
-* Set Java's location in /etc/default/tomcat5 --[[User:John64|John64]] 20:15, 3 October 2006 (EDT)
-* Figured out port for Tomcat5, 8180 private.  Setting up port forwarding as we speak using public port 81 Oct 3, 2006
-* Using personal Machine instead of the VM due to it being double NAT-ed, and thus inaccessible from outside the host machine Oct 2, 2006
-* Set up server using Ubuntu 6.06LTS with Linux 2.6.17ck1 Oct 2, 2006
-* RSyncing machine Oct 2, 2006
-====Candidates====
+==Options==
-* [http://lxr.mozilla.org/ LXR]/[http://landfill.mozilla.org/mxr-test/ MXR]/[http://www.mozilla.org/bonsai.html Bonsai] - Not working on setting one up because there is already one
+* Help with [http://lxr.mozilla.org/ LXR]/[http://landfill.mozilla.org/mxr-test/ MXR]/[http://www.mozilla.org/bonsai.html Bonsai] development.
-* [http://gonzui.sourceforge.net Gonzui] - Impressive looking thing.  This will be my first target to setup due to its simplicity and apparently very powerful nature
+* Make a sort-of branch respectful version of [[OpenGrok]] but this would be a very shoddy implementation that doesn't really do what it should
-* [http://www.opensolaris.org/os/project/opengrok/ OpenGrok] - This is by far the coolest project I have come across so far.  It uses Java Server Pages, something I know nothing about, so lots of reading.  Since this is part of the opensolaris project, I have been thinking of trying to run it in an OpenSolaris Virtual Machine, as that OS is picking up steam.  It is available for Linux, and that is the target system.  By the way, this one is my favorite so far. Example: [http://cvs.opensolaris.org/source/ Opensolaris]
+* Setup one OpenGrok per active branch of the Mozilla Project, this would have no version history whatsoever, apart from file dates.
-* [http://www.cenqua.com/fisheye/index.html Fisheye] - Commercial Solution that is free (as in beer) for open source projects.  Before I start to look at it, I would like to exhaust the numerous open source prospects.
+* [Re]write major portions of how [[OpenGrok]] deals with history and changesets and the likes, this is my personal preference.
-* Hosted by Tigris - I forgot the name, but its hosted by "Tigris" and it looked pretty good.  If you know the name, please edit as appropriate
+* Try to fit [http://www.cenqua.com/fisheye/index.html Fisheye] into the current development model, but it seems this might be more like finding a problem for a solution.  This is a very very powerful tool, but it is not really like LXR or OpenGrok, it is more useful to analyze CVS/SVN histories more than search for functions, files, definitions and the likes.  Fisheye is also '''extremely''' slow.  Within my lan, it takes a long time to do any queries, and over the internet it is impossible.  I don't know why, especially since OpenGrok uses the same basic technologies. (10 minutes plus for one page load on a lan connection, in its defence, it is still indexing the code)
-* [http://savannah.nongnu.org/projects/horus Horus] - Not really what is needed, but its a nice interface for programming students own code.  I will not be actively doing anything to it.
-* [http://bazaar-vcs.org Bazaar] - I dont really know what this is, it might be what is needed, but it might be something irrelevant.
-* [http://sourceforge.net/projects/sourcenav/ Sourcenav] - investigating this project
-* [http://www.msnbc.msn.com/id/15138828/from/RS.5/ Google Code Search] - Investigating
-====Links====
+====Why I like OpenGrok====
+Apart from the fact that it does not support branches, this is in my opinion the perfect tool.  It is fast, open souce and most importantly, it makes really easy to navigate, well thought out pages that just work.    Because of the way OpenSolaris does file versions for their code, they don't use branches at all.  OpenSolaris currently uses a linear method of file versioning, they don't use branches, they use versions as a sory of branch, basically the idea that Office 12 is the "2007 branch" and Office 11 is the "2003 branch".  Mozilla doesn't do this, so it would be nessecary to implement this feature.  Luckily, however, OpenGrok is very modularized and atomic in nature.  If you go to the OpenGrok page, you can get a more complete explanation, but the basic jist of it is that there are many "Guru's", each with a task.  The files are first read by the History Guru who looks at the file and decides what type of versioning the file uses.  Once the versions have been analyzed, they are passed on to the file analyzer guru who then decides what type of file it is, and passes it on to a file type analyzer.  The allows for portions of the code to be changed without changing the whole system, so if we wanted to be able to do special things with XUL/XPCOM as far as how to handle its symbols, we would write one module which is not dependent at all on any other file analyzer.  The same way, if Mozilla switchs to SVN, we would just port the branching support to SVN.  On the chances that Mozilla switches to something other than CVS or SVN, a HistoryGuru could be written for that type of versioning history.  The OpenGrok project is under the [http://www.sun.com/cddl/ CDDL] which [http://www.sun.com/cddl/CDDL_MPL_redline.pdf derives from the MPL 1.1]
+In closing, I really like the OpenGrok project because it is '''very''' fast, '''very''' powerful and '''VERY''' modular!
+==Links==
 * [https://sparc.senecacollege.ca/portal.php?project&pid=23 Official Blurb] just in case I forget what I am doing :P
 * [http://matrix.senecac.on.ca/~jhford/ John's School Page]
 * [http://cvs2svn.tigris.org CVS2SVN] Tool to convert CVS to SVN.  Will be used to test SVN interop.
-* [http://en.wikipedia.org/wiki/Apache_Tomcat Tomcat on Wikipedia]
+* [http://developer.mozilla.org/en/docs/Mozilla_Source_Code_Via_CVS Devmo:CVS Checkout]
-* [http://developer.mozilla.org/en/docs/Mozilla_Source_Code_Via_CVS CVS Checkout]
+* [http://developer.mozilla.org/en/docs/Rsyncing_the_CVS_Repository Devmo: Rsyncing the CVS]
-* [http://developer.mozilla.org/en/docs/Rsyncing_the_CVS_Repository Rsyncing the CVS]
+* [http://developer.mozilla.org/en/docs/CVS_Tags Devmo: CVS Tags] - to get the branches to checkout
 * [http://www.ubuntuforums.org/showthread.php?t=219985 Tomcat5 on Ubuntu]
 * [http://www.onjava.com/pub/a/onjava/2003/06/25/tomcat_tips.html Tomcat Tips]
 * [http://www.nongnu.org/cvs/ CVS] on non-gnu.org
+* [http://durak.org/cvswebsites/doc/cvs.php CVS "Guide"]
 * [http://subversion.tigris.org/ Subversion]
 * [http://atucker.typepad.com/blog/2005/11/a_new_source_br.html Blog Entry] on OpenGrok
 * [http://ubuntuforums.org/showthread.php?t=124431 Java5 on Ubuntu] - "sudo update-alternatives --config java" and "apt-get remove --purge java-gcj-compat"
+* [http://www.mozilla.org/docs/jargon.html Mozilla Jargon]
+* [http://www.deitel.com/CodeSearchEngines/CodeSearchEngines_ResourceCenter_MerobaseOpenGrokCodeProject.html Misc]
+* [http://www.sun.com/cddl/CDDL_why_details.html CDDL] - explanation of the diferences between the MPL and CDDL
+==Pulling CVS==
+This code will pull the CVS for the branches specified in @branches, or it did at some point,  your mileage may vary
+<pre>
+#!/usr/bin/perl
+use strict;
+use warnings;
+# Pull CVS from the mozilla project server
+# Where you want the branch folders
+my $src_root = "/var/mozilla";
+# Where is your run.sh for opengrok? (or equivalent script to start the indexer)
+my $opengroker = "/var/opengrok/run.sh";
+# Where is your server?
+my $cvsserver = ':pserver:anonymous@hera.senecac.on.ca:/cvsroot';
+# Branches to be pulled
+my @branches = (
+  "HEAD",
+  "MOZILLA_1_8_BRANCH",
+  "MOZILLA_1_8_0_BRANCH",
+  "AVIARY_1_0_1_20050124_BRANCH",
+  "REFLOW_20061031_BRANCH"
+);
+# Descriptions for each branch, don't delete old ones for the sake of deleting them
+my %descriptions = (
+  "HEAD" => "Trunk - development branch",
+  "MOZILLA_1_8_BRANCH" => "Firefox 2.0 - development branch",
+  "MOZILLA_1_8_0_BRANCH" => "Firefox 1.5 - maintainance branch",
+  "MOZILLA_1_7_BRANCH" => "Firefox 1.0 - maintainance branch",
+  "AVIARY_1_0_1_20050124_BRANCH" => "Suite - maintainance branch",
+  "REFLOW_20061031_BRANCH" => "Reflow Refactoring"
+);
+# Open the file or
+open BRANCHLIST, ">$branchlistpath" or die "Could not open file";
+# Clear out what ever source was there
+system ("rm -rf ${src_root}/*");
-==Notes on Accessing Test Server==
+foreach (@branches){
-If you want to access the test server through anything other than port 80, you are going to have to type in the following address into a browser and note the IP address you get in your address bar.  This is because I have dynamic DNS.  Everything, including the source itself, will be in the http root for easy access to the files.  This is not optimal, and will not stay this way once things advance.
+# Download the makefile, then checkout from the makefile
+  system("
+    mkdir ${src_root}/$_;
+    cd ${src_root}/$_;
+    cvs -d ${cvsserver} co -r $_ mozilla/client.mk;
+    make -f ${src_root}/${_}/mozilla/client.mk checkout MOZ_CO_PROJECT=all;
+  ");
+}
-[http://superfind.bounceme.net Superfind] - Will resolve as www.no-ip.com computer, which is why you have to use a browser to get the IP
+system ("bash $opengroker");
-==Questions==
-Please edit in an answer if you know
-====What to index====
+</pre>
-Q: I was wondering if it is prefered to index the source for the current development or stable branch.  More specifically, I am unsure how each solution handles file versions. <br/>
-A: Please Edit Me
