The result of this milestone can be found at source:tags/RELEASE-0.2.0.1
The previous steps are needed to make it runnable via cron-based systems.
You can scan a repository from the command line and search the result of a scan. You can also create
a configuration that lets SupoSE handle time-based scanning.
Source can be accessed via source:branches/B_0.4.0
Test integrating the Apache Tika framework to see whether we can use it.
Bug Fix Release to fix problem with commons collections (source:/repositories/browse/supose/branches/R_0.5.0.1)
Here we go on...
--query "+filename:*Repos*"
or
--query "+filename:*.java"
New Release
This new milestone is intended to create a different project structure to represent planned features like SOAP, RESTlet interfaces etc. The development of this milestone is done on the following branch source:/branches/B_0.7.0
The current source can be found for the 0.7.1 release source:/trunk
The usual pattern for branches
------------------------------------------------------------------------
r242 | kama | 2009-04-02 21:02:01 +0200 (Do, 02 Apr 2009) | 1 line
Changed paths:
   A /branches/F184 (from /trunk:241)

- Branch to implemente Feature#184
------------------------------------------------------------------------
You would like to start from scratch with SupoSE.
supose scan --url URLRepository --create --index index.Test
supose scan --url URLRepository --create --index index.Test --torev 500
supose scan --url URLRepository --fromrev 501 --index index.Test
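The pattern in the commands above (create the index with a first revision range, then continue from the next revision) can be generalized. The following Python sketch only builds the argument lists and does not run anything. The helper name `scan_commands`, the default chunk size and the idea of scanning in fixed-size chunks are assumptions made for illustration; the flags are the ones shown above.

```python
def scan_commands(url, index, last_rev, chunk=500):
    """Build `supose scan` argument lists covering revisions 1..last_rev.

    Hypothetical helper: only the first command carries --create,
    mirroring the two-step example in the text.
    """
    commands = []
    start = 1
    while start <= last_rev:
        end = min(start + chunk - 1, last_rev)
        cmd = ["supose", "scan", "--url", url]
        if start == 1:
            cmd.append("--create")  # create the index on the first chunk only
        else:
            cmd += ["--fromrev", str(start)]  # continue where the last chunk ended
        cmd += ["--index", index, "--torev", str(end)]
        commands.append(cmd)
        start = end + 1
    return commands


if __name__ == "__main__":
    for cmd in scan_commands("URLRepository", "index.Test", last_rev=1200):
        print(" ".join(cmd))
```

Building the commands as lists keeps each argument separate, so they could later be handed to a process launcher without any shell quoting.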
Searching is straightforward: you give the index you want to use and the query you want to run.
supose search --index index.Test --query "QUERY"
If you want to define which fields are printed in the result, use the *--fields* option. There you can give one or more field names to be printed. The following command searches the index with the given query and prints the fields revision and message.
supose search --index index.Test --fields revision message --query "QUERY"
You can schedule the scanning process if you want to get rid of manual scanning. You may have thought about cron-based scanning of the repositories, but the more comfortable way is to let SupoSE do the work for you.
Command line parameters are
supose schedule --configuration ./repositories.ini --configbase ./
If you got multiple index files and want to merge them into one, you've got to enter something like the following
supose merge --destination ./mergedindexfolder --index ./firstindexfolder ./secondindexfolder ./thirdindexfolder
If you would like to see what command line options are also possible, just type
supose --help
(1) For all examples, remember to append ".sh" or ".cmd" to the command name, depending on the platform you are working on: ".sh" for Linux/Unix and ".cmd" for Windows.
(2) I suggest using the file:// protocol for scanning to get maximum performance. The http:// protocol works too, but is a little slower.
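Note (1) can be expressed in a few lines if the commands are driven from a script. This is only a sketch: the helper names `launcher` and `merge_command` are hypothetical, and the only facts taken from the text are the ".sh"/".cmd" suffixes and the merge options shown above.

```python
import platform


def launcher(system=None):
    """Return the script name for the platform (".cmd" on Windows, ".sh" otherwise)."""
    system = system or platform.system()
    return "supose.cmd" if system == "Windows" else "supose.sh"


def merge_command(destination, indexes, system=None):
    """Build the argument list for merging several index folders into one."""
    return [launcher(system), "merge", "--destination", destination, "--index", *indexes]


if __name__ == "__main__":
    print(" ".join(merge_command("./mergedindexfolder",
                                 ["./firstindexfolder", "./secondindexfolder"])))
```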
The configuration for the scheduler uses a single file which defines the information needed to access the repositories: which URL and which
authorization information will be used, when the scanning process should run and, of course, where to put the scanned information.
Let us start with the following simple configuration file for a single repository which should be scanned on a time based system
[SupoSE]
url = http://svn.soebes.de/supose
indexusername = anonymous
indexpassword = guest
fromrev = 1
torev = HEAD
resultindex = summary
The first part, given in square brackets, defines a unique name for the repository. This name is stored in a field which can later be used to distinguish repositories in search queries. The name is also needed to distinguish the entries in
the configuration file from each other.
The url defines the URL which is used to access the repository. You can use the http, https, file, svn or svn+ssh protocol, whatever you need.
The next parts, indexusername and indexpassword, define the authorization information to access the repository. The fromrev and torev entries define from which revision to which revision the scan process will run the first time.
The last part, resultindex, defines where to put the result of the scanning process.
You can configure multiple repositories to be scanned by SupoSE. To do this, you need to give the information for every repository in its own section
of the configuration file. This means you have to extend the above configuration file for every new repository.
[SupoSE]
url = http://svn.soebes.de/supose
indexusername = anonymous
indexpassword = guest
fromrev = 1
torev = HEAD

[AntSVK]
url = http://svn.soebes.de/antsvk
indexusername = anonymous
indexpassword = guest
fromrev = 1
torev = HEAD

[GForge]
url = http://svn.soebes.de/gforge
indexusername = anonymous
indexpassword = guest
fromrev = 1
torev = HEAD

To define when each repository is scanned, add a cron entry to every section:

[SupoSE]
url = http://svn.soebes.de/supose
indexusername = anonymous
indexpassword = guest
fromrev = 1
torev = HEAD
cron = 0 * * ? * *

[AntSVK]
url = http://svn.soebes.de/antsvk
indexusername = anonymous
indexpassword = guest
fromrev = 1
torev = HEAD
cron = 0 0 2 ? * *

[GForge]
url = http://svn.soebes.de/gforge
indexusername = anonymous
indexpassword = guest
fromrev = 1
torev = HEAD
cron = 0 0 18 ? * *
Detailed explanations of the cron expression can be found here and here and further details can be seen here
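The cron entries above have six fields and use "?", which matches the Quartz scheduler style (seconds, minutes, hours, day of month, month, day of week). That SupoSE reads the expression in exactly this layout is an assumption here, not something this page states. Under that assumption, the following Python sketch labels the fields of an expression so it is easier to read; `label_cron` is a hypothetical helper name.

```python
# Field order assumed to follow the Quartz six-field layout.
QUARTZ_FIELDS = ("seconds", "minutes", "hours", "day_of_month", "month", "day_of_week")


def label_cron(expression):
    """Split a six-field cron expression and label each field."""
    parts = expression.split()
    if len(parts) != len(QUARTZ_FIELDS):
        raise ValueError(f"expected {len(QUARTZ_FIELDS)} fields, got {len(parts)}")
    return dict(zip(QUARTZ_FIELDS, parts))


if __name__ == "__main__":
    for expr in ("0 * * ? * *", "0 0 2 ? * *", "0 0 18 ? * *"):
        print(expr, "->", label_cron(expr))
```

Read this way, "0 0 2 ? * *" would run once a day at 02:00 and "0 * * ? * *" once every minute. The sketch rejects anything that does not have exactly six fields, including the optional seventh (year) field that Quartz allows.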
The index is the result of the scanning process for a repository, every time it is scanned. SupoSE is designed to put the result of every scan process into a result index, which is then used for the actual search.
You can, for example, scan three repositories and put the results into a single index, or you can spread the results over different indexes: e.g. Repository 1 and 2 can be put into an index indexresult1 and Repository 3 into an index indexresult2.

[SupoSE]
url = http://svn.soebes.de/supose
indexusername = anonymous
indexpassword = guest
fromrev = 1
torev = HEAD
resultindex = indexresult1

[AntSVK]
url = http://svn.soebes.de/antsvk
indexusername = anonymous
indexpassword = guest
fromrev = 1
torev = HEAD
resultindex = indexresult1

[GForge]
url = http://svn.soebes.de/gforge
indexusername = anonymous
indexpassword = guest
fromrev = 1
torev = HEAD
resultindex = indexresult2
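To check which repositories end up in which result index, a configuration like the one above can be read with any INI parser. The following Python sketch uses the standard `configparser` module. It is not how SupoSE itself reads the file, and the helper name `group_by_result_index` and the shortened sample are assumptions for illustration.

```python
import configparser
from collections import defaultdict

# Shortened sample: only the keys needed for the grouping are kept.
SAMPLE = """\
[SupoSE]
url = http://svn.soebes.de/supose
resultindex = indexresult1

[AntSVK]
url = http://svn.soebes.de/antsvk
resultindex = indexresult1

[GForge]
url = http://svn.soebes.de/gforge
resultindex = indexresult2
"""


def group_by_result_index(text):
    """Map each resultindex value to the repository sections that write into it."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    groups = defaultdict(list)
    for section in parser.sections():
        groups[parser.get(section, "resultindex")].append(section)
    return dict(groups)


if __name__ == "__main__":
    for index, repositories in group_by_result_index(SAMPLE).items():
        print(index, "<-", ", ".join(repositories))
```

Interpolation is switched off so that values are read literally, whatever characters they contain.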
During the scanning process, all files are read from the Subversion repository and SupoSE checks whether a special document handler exists for the particular document type (decided by the file extension).
This document handler gets the whole contents of the file and can do with it whatever it likes. The basic idea is to scan e.g. Word or Excel files, using 3rd party libraries like POI, to extract the text information from such files. The scanned information is stored in the index and can
be searched later.
If you have an idea for a particular document type, just give me a hint about it and what kind of data would be of interest.
Older Releases are available via the archive.
Just download the package and extract the contents.
After that you can just call the supose.sh (supose.cmd on Windows based systems) and give a command line like this:
supose.sh scan --url http://svn.soebes.de/supose
Some performance hints: whenever you can, use the file:// access method, because it is faster than the http protocol. Don't blame me with
questions like "It's too slow...".
This is an aggregation milestone which summarizes all feature requests that are not yet scheduled for a particular milestone, and of course the feature requests made by users via the web site.
The following fields will be added during indexing of every document (file/directory):
Starting with the Release 0.6.0 the installation has been simplified. Just download the current release (0.6.0 or later) for Windows (*-bin.zip) or Unix (*-bin-unix.tar.gz) and unzip/untar the distribution package. Now you have two choices:
set path=%path%;C:\supose-0.5.3\bin
export PATH=$PATH:/home/username/supose-0.5.3/bin
Or the other option is to call SupoSE directly from the bin directory of the distribution.
Currently you will get into trouble if you try to scan large repositories (more than 5000-6000 revisions) with large files (100 MiB and above), because you will run out of Java heap space.
This bug is fixed with Release 0.2.0.1.
mvn release:prepare
------------------------------------------------------------------------
r234 | kama | 2009-02-21 21:17:01 +0100 (Sa, 21 Feb 2009) | 1 line
Changed paths:
   A /tags/R_0.5.1 (from /trunk:232)
   R /tags/R_0.5.1/pom.xml (from /trunk/pom.xml:233)

[maven-release-plugin] copy for tag R_0.5.1
------------------------------------------------------------------------
The basic idea for scanning multiple repositories is to have a scanning part which is configurable by parameters, so that it can scan different repositories and store the resulting indexes in different directories.
The second question which comes up with such an approach is how to configure the scanning process for multiple repositories.
Support things like "ParentPath", as the Apache module supports for multiple repositories.
Merge the indexes of different repositories together to get a single index which is searchable.
Maybe we make it possible to merge indexes by configuration and to define a searchable result for different combinations of repositories.
OpenOffice uses a file format called "Open Document Format" or ODF (ISO/IEC 26300).
The format is based on XML and available for everyone.
For more details on ODF, see wikipedia.
The current release 0.6.1 can scan a replicated repository via file protocol in relatively short time.
The Subversion Repository
The following has been tested with SupoSE (0.6.2 RC1 - 447)
The "Apache Software Foundation Repository" (02.April 2010)
Here I have documented the circumstances of the test. Currently working on improvements #309
After the improvements the new release needed only 23 hours to index the whole repository incl.
merging the indexes together.
So this means after the performance improvements the whole indexing process for the ASF Repository took less than 23 hours (ok ok 6 mins less ;-)).
The current release planning is given in the Roadmap. There you can see which releases are planned and which
features, bugs etc. will be solved for the particular milestones.
The current state of the milestones is more or less beta. Any information about bugs, improvements, feature requests etc.
is welcome. You can find the releases in the download area. I have started a list of known bugs.
(TODO: Finish this....)
This page will describe how releases will be named and which naming convention is behind that.
Basically the release number consists of the following parts:
major.minor.patch[.bugfix]-RevNumber
Currently we have the major number 0, which means everything can still change....
I hope to stabilize the first things in 0.6.0, maybe in 0.7.0.
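The naming scheme major.minor.patch[.bugfix]-RevNumber can be checked mechanically. The following Python sketch assumes that every part is a plain decimal number and that RevNumber is the repository revision; the helper name `parse_release` and the sample names in the usage lines are illustrative, not actual release names.

```python
import re

RELEASE_RE = re.compile(
    r"^(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)"
    r"(?:\.(?P<bugfix>\d+))?"   # optional fourth component
    r"-(?P<rev>\d+)$"           # revision number (assumed to be numeric)
)


def parse_release(name):
    """Return the numeric parts of a release name, with bugfix as None if absent."""
    match = RELEASE_RE.match(name)
    if match is None:
        raise ValueError(f"not a release name: {name!r}")
    return {key: (int(value) if value is not None else None)
            for key, value in match.groupdict().items()}


if __name__ == "__main__":
    print(parse_release("0.6.2-447"))     # illustrative name without bugfix part
    print(parse_release("0.5.0.1-225"))   # illustrative name with bugfix part
```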
The plan is to have Major Release (x.0.0)
new features etc. may be break backwards compatibility and
Minor releases can break backwards compatibility.
Changes allowed
This page will outline how releases will be handled
Major Release (x.0.0)
SupoSE currently does not have a major release (1.0.0).
Minor releases can break backwards compatibility.
Changes allowed
Changes not allowed
None at this moment.
Patch Release (0.0.x)
Patch releases shouldn't break backwards compatibility.
Changes allowed
Changes not allowed
Never answered 'til yet...
The queries consist of fields and their required values. A more detailed description of the query syntax can be found on the Lucene Query Syntax page. The available commands etc. can be found in the command description.
Simple kind of questions:
+message:fixed +message:1
If you omit the second part, you will get a bunch of results where someone has written "Fixed" anywhere in the log message.
+tag:*
+path:*/tags/*
+filename:/tags/* +node:dir
+filename:/tags/* +node:(dir unknown)
If you want to see only the deleted tags, just change this to:
+filename:/tags/* +kind:d
+filename:/branches/* +kind:d
+filename:*.doc
+filename:*.doc +kind:d
+propertyname:Y
You have to be a little careful, because property names usually contain ":". The colon has to be escaped like this:
+svn\:executable:*
This will search for all files which have the executable flag set. If you want to search for the asterisk itself you have to change this to
+svn\:executable:\*
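The escaping shown above can be automated when queries are generated by a script. This Python sketch escapes the characters that have a special meaning in the classic Lucene query syntax; the helper names `escape` and `property_query` are hypothetical and not part of SupoSE.

```python
# Characters with special meaning in the classic Lucene query syntax.
LUCENE_SPECIAL = set('+-&|!(){}[]^"~*?:\\')


def escape(text):
    """Backslash-escape every Lucene special character in text."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in text)


def property_query(name, value=None):
    """Build a required clause for a property; value=None means 'any value'."""
    pattern = "*" if value is None else escape(value)
    return f"+{escape(name)}:{pattern}"


if __name__ == "__main__":
    print(property_query("svn:executable"))        # matches any value
    print(property_query("svn:executable", "*"))   # matches a literal asterisk
```

Note that a given value is always escaped, so this helper cannot be used to pass wildcard patterns; those would have to be written by hand as in the examples above.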
+revision:[1 TO 100]
If you like to see the result of a current Maven 2 build just take a look here.
For unit testing we use the TestNG framework.
The command line analyzer uses the Apache Commons CLI2 library, of which unfortunately no release package has been published so far. So if you want to build SupoSE from source, you have to download the package and create a .jar, which has to be put into the local Maven repository by using the following
mvn install:install-file -Dfile=commons-cli2-2.0-dev.jar -DpomFile=commons-cli2-2.0.pom -Dclassifier=dev
You can download those needed files here.
Starting with Milestone 0.4.0 Mars, I have introduced a first Java parser which is able to extract all kinds of comments and the method names from a given Java source code file. This information is put into the index created during
the scanning process.
Take a look at the following example file:
package com.soebes.supose.parse.java;

/*
 * Default comment.
 */
public class Test1 {
    private String value;

    /** This is a JavaDoc comment */
    public Test1 () {
    }

    /* Comment voidMethod1
     */
    public void voidMethod1() {
    }

    //Line Comment staticMethod1
    public static void staticMethod1() {
    }

    public String getValue() {
        return value;
    }

    public void setValue(String value) {
        this.value = value;
    }

    private void helperMethod() {
        setValue("test");
    }
}
The current parser extracts all the comments and all the method names, except the constructor. Maybe I will change this if more is needed.
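To get a feeling for what "comments and method names, except the constructor" means, here is a rough Python sketch based on regular expressions. It is not the parser used by SupoSE and it is far less robust (for example, it is confused by comment markers inside string literals). The names `extract`, `COMMENT_RE` and `METHOD_RE` are made up for this illustration.

```python
import re

# Block comments (including JavaDoc) and line comments.
COMMENT_RE = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)
# A modifier, an optional "static", a return type, then the method name before "(".
# Constructors have no return type, so they do not match.
METHOD_RE = re.compile(
    r"\b(?:public|protected|private)\s+(?:static\s+)?[\w<>\[\]]+\s+(\w+)\s*\("
)

SAMPLE = """\
public class Test1 {
    /** This is a JavaDoc comment */
    public Test1 () {
    }
    //Line Comment staticMethod1
    public static void staticMethod1() {
    }
    public String getValue() {
        return value;
    }
}
"""


def extract(source):
    """Return (comments, method_names) found in a Java source string."""
    comments = COMMENT_RE.findall(source)
    code_only = COMMENT_RE.sub("", source)  # avoid matching inside comments
    methods = METHOD_RE.findall(code_only)
    return comments, methods


if __name__ == "__main__":
    found_comments, found_methods = extract(SAMPLE)
    print(found_comments)
    print(found_methods)
```

The sketch only recognizes methods that start with an access modifier; package-private methods would be missed.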
If you want to search the contents of files, you can use the contents field, which contains the content of the files.
By using the contents field within the search query you define the pattern you would like to search for. The pattern can contain wildcards, which work as you might expect, e.g. as known from the command line.
The following query will find all entries which contain the word Company in the contents field of all revisions in all paths.
+contents:Company
If you like to search multiple words you can simply use the following:
+contents:Word +contents:Word2
If you need to search for a particular phrase you can use this:
+contents:"This is the Phrase"
If you want to search for filenames, you can use the filename field, which contains the filename.
By using the filename field within the search query you define the pattern you would like to search for.
The pattern can contain wildcards, which work as you might expect, e.g. as known from the command line.
The following query will find all filenames which contain the given pattern in all revisions and all paths within the repository.
+filename:*.txt
A common search pattern to search for all existing Word files (.doc)
+filename:*.doc
+filename:*.docx
If you know only parts of the filename you are searching for, simply put those parts into the query. The following query searches for files whose name contains scm. This will find filenames with uppercase or lowercase characters; in other words, the search is
case-insensitive.
+filename:*scm*.doc
The following will search for the svn:externals property which contains https.
In other words it will search for any entry which contains an svn:externals which uses https protocol.
+svn\:externals:*https\:*
The following will narrow down the above to entries which reference svn.apache as part of their externals reference.
+svn\:externals:*https\://svn\.apache*
The following will search any revision/path etc. where the property svk:merge has been used
+svk\:merge:*
You can of course search with more information, like the following
+svn\:mergeinfo:*/subversion/branches/1.5.x*
If you like to search for particular revisions in the repository you can use the revision field which contains the revision number of the repository.
The following query will find all entries which are related to revision 20 in the repository. This is more or less equivalent to svn log -r 20 URL -v.
+revision:20
If you like to use multiple revisions you can simply use:
+revision:20 +revision:30
If you like to use a revision range you can use the following which is more or less equivalent to svn log -r1:200 -v URL.
+revision:[1 TO 200]
This will search in the revisions from 1 till 200.
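The query forms shown so far (a required field clause, a phrase and a revision range) can be combined freely. The following Python sketch composes them as plain strings; all helper names are hypothetical, and the sketch does no escaping, so patterns are passed through exactly as given.

```python
def required(field, pattern):
    """One mandatory clause, e.g. +filename:*.doc (pattern is used as-is)."""
    return f"+{field}:{pattern}"


def phrase(field, text):
    """A mandatory phrase clause; the text is wrapped in double quotes."""
    return f'+{field}:"{text}"'


def revision_range(first, last):
    """A mandatory inclusive revision range clause."""
    return f"+revision:[{first} TO {last}]"


def build_query(*clauses):
    """Join clauses with spaces, the way the examples in the text combine them."""
    return " ".join(clauses)


if __name__ == "__main__":
    print(build_query(required("filename", "*.doc"), revision_range(1, 200)))
    print(build_query(phrase("contents", "This is the Phrase")))
```

The resulting string is what would be passed to the --query option of the search command.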
If you like to search for tags or branches you can use the tag or branch field which contains the names of the tags or branches.
By using the tag field within the search query you define the pattern you would like to search for. The pattern can contain wildcards, which work as you might expect, e.g. as known from the command line.
The following query will find all tags which are existing in all revisions within the repository.
+tag:*
The result list will contain all existing tags incl. Maven Tags which have a particular pattern.
If you like to see only the list of tags without the Maven Tags just extend the query as follows:
+tag:* -maventag:*
If you like to search for Subversion Tags (particular type used by the Subversion Team) you can use a search query as follows:
+subversiontag:*
If you like to search for a branch you can use the branch field to define the pattern for the branch name you would like to search for.
+branch:*
svn_version.h.
------------------------------------------------------------------------
r34864 | hwright | 2008-12-19 20:58:27 +0100 (Fr, 19 Dez 2008) | 1 line
Changed paths:
   A /tags/1.5.5 (from /branches/1.5.x:34862)
   M /tags/1.5.5/subversion/include/svn_version.h

Tagging 1.5.5 with svn_version.h matching tarball.
------------------------------------------------------------------------
SupoSE is an abbreviation for *Su*bversion Re*po*sitory *S*earch *E*ngine.
This is an abbreviation for SupoSE *We*b-*F*ront-*E*nd.
We will start a simple Web-Front-End for SupoSE which is called SupoSEWeFE
The usual pattern for Tags is:
------------------------------------------------------------------------
r226 | kama | 2009-02-20 16:09:02 +0100 (Fr, 20 Feb 2009) | 2 lines
Changed paths:
   A /tags/R_0.5.0.1 (from /branches/R_0.5.0.1:225)

- Release 0.5.0.1
- Bugfix release for Issue#169
------------------------------------------------------------------------
Under active development (If you find things which are not clear etc. just drop a ticket).
This is a Java-based approach to real searching within a complete Subversion repository. For performance reasons, I have decided not to scan the repository in real time on every search. Instead, the whole content of a Subversion repository is scanned in advance. The result, called the index, can then be used for the actual searching. Another purpose of this approach is to be able to search through multiple repositories instead of one.
With the exception of binary files where no particular document handler exists, all files will be indexed. This means we do index Word, Excel and Powerpoint files (2007 Office variants as well).
This means we do not index only the trunk or the HEAD revision, we index all revisions on all paths within a whole Repository. Filename, path, log message, properties etc. are made searchable (see Fields for further details).
If you wonder what you can search for, take a look at
the questions you never tried to ask your repository
The up-to-date builds can be looked at here: http://78.46.16.202:8080/jenkins/job/SupoSE-default/ or http://78.46.16.202:8080/jenkins/job/SupoSE-site/
You can find a detailed description of the features etc. in the users guide.
An overview of the release plan and the currently existing releases can be found here.
The requirements describe what is needed to build SupoSE from source code.
If you like to checkout the current state of development you can simply check the Subversion Repository
The source code and the application which I have written are licensed under the GNU General Public License Version 2. Other parts of the application, in particular the 3rd party libraries, have different licenses.
In the Download Area you find all current releases of the project.
In the archive you can find older releases if you like to take a look into.
Currently a nightly build can give you an up-to-date release before real delivery date.
Information about the changes which had been made can be seen at the bulletin board.
An overview about the releases can be found in the Release Notes.
The release management is described on the ReleaseManagement page in detail.
-Dsvnkit.http.methods=Basic,Digest,NTLM
A detailed explanation can be found here.
Searching and scanning of repositories is currently done via a command line interface. The description
of the Command Line explains what you can do and which options are available. You can of course use
Luke too. Or take a look at the command line examples. For a detailed
description of the queries and features take a look at the users guide.
If you want to post feature requests, questions, suggestions, bugs or anything else about SupoSE, please use the ticket system to check whether the bug has already been reported; if you want to report a new one, just use the new issue area
or just write an email to me.
It would be nice if you gave an email address so I can get in contact with you to ask questions etc.