QuickGuide
To Windows users: currently Clairv can only be deployed on *nix (including Linux) machines. We are working on the Windows version.
Assumptions
Before we start, we make a few assumptions to simplify the description that follows. Note that these assumptions are not restrictions.
We suppose the following:
- You are running Linux OS on your machine.
- You have the read, write, and execute permissions needed for every operation described below.
We will try to take OS differences into account in the following guide. Users of other operating systems (especially Windows) may find some steps more troublesome. If you run into problems caused by these differences, please feel free to leave a message in the forum.
Okay, let's start to build your search engine.
Installation and Configuration
Download the Clairv all-in-one binary package and decompress it. If you downloaded the source package instead, please refer to the installation guide to build it, and adjust the following steps accordingly.
After decompression, you get two directories: indexer and frontend. As the names suggest, the former contains everything used to index files and the latter provides a front end. Clairv currently consists of these two modules, which have few dependencies on each other. For complete API references of these two modules, please refer to the User Manual section.
The locations of the two modules do not matter. However, you have to prepare a place to store the indices which will be collected. Let's say you decide to place them in /home/<user>/clairv/indices where <user> is your username and you have write permission over this directory.
Now open the file config.xml in the indexer directory using your favorite editor. You will probably see something like the following:
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-2.0.xsd">
  <bean id="config" class="net.sf.clairv.index.Config">
    <property name="config">
      <value>
        indexDir=/home/foo/clairv/indices
      </value>
    </property>
  </bean>
  <bean id="indexBuilder" class="net.sf.clairv.index.BufferedResourceProcessor">
    <property name="analyzer">
      <bean class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
    </property>
    <property name="maxBufferSize" value="300"/>
  </bean>
</beans>
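As a minimal sketch of the change this guide has in mind, the fragment below points indexDir at the index directory chosen earlier. The username alice is a placeholder assumed for illustration; substitute your own. Do not type the literal text <user> inside config.xml, because the XML parser would read the angle brackets as the start of a tag.

```xml
<!-- Sketch only: "alice" is a hypothetical username; replace it with yours. -->
<bean id="config" class="net.sf.clairv.index.Config">
  <property name="config">
    <value>
      indexDir=/home/alice/clairv/indices
    </value>
  </property>
</bean>
```

The rest of the file can stay as shown in the sample configuration.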
If you have experience programming with Spring, this configuration format should look familiar. Next you need to tailor the configuration to suit your environment and requirements. Two typical kinds of resources, which are also the only two kinds currently supported by Clairv, are the database resource and the file system resource. Let us show how they are used in the following scenario.
- You want to search files with .doc (MS Word format) or .ppt (MS PowerPoint format) extensions in the /home/ftp/ directory
- You also want to search the contents of the forum hosted on your site
Before we dive into the configuration for the resources, we have to explain some terminology and issues.
- Resource
- The abstraction of the physical resources mentioned above. It is the most important concept in Clairv. The corresponding
- Field -
Let's configure these two resources one by one.
Configuring FTP Contents Resource
Here we use net.sf.clairv.resource.DefaultFileSystemResource. Add a bean in your config.xml like the following:
<bean class="net.sf.clairv.resource.DefaultFileSystemResource">
  <property name="name" value="Slides"/>
  <property name="baseDirs">
    <list>
      <value>/home/ftp/</value>
    </list>
  </property>
  <property name="fileHandler">
    <bean class="net.sf.clairv.resource.ExtensionFileHandler">
      <property name="extensionMappings">
        <value>
          ppt=net.sf.clairv.index.builder.impl.POIPptBuilder
          doc=net.sf.clairv.index.builder.impl.POIDocBuilder
        </value>
      </property>
    </bean>
  </property>
  <property name="documentListeners">
    <list>
      <bean class="net.sf.clairv.resource.url.SimpleUrlFieldGenerator">
        <property name="urlPattern">
          <value>ftp://localhost/${filePath}</value>
        </property>
      </bean>
    </list>
  </property>
  <property name="hitTextPattern">
    <value>
      <![CDATA[
      <a href="${url}">${fileName}</a><br/>
      ${body}
      <br/><span class="annotation">${url} - ${size}</span>
      ]]>
    </value>
  </property>
  <property name="searchableFields">
    <list>
      <value>body</value>
      <value>fileName</value>
    </list>
  </property>
</bean>
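Because baseDirs is declared as a Spring list, it can presumably hold more than one directory. The fragment below is a sketch under that assumption: /home/shared/slides/ is a hypothetical second location, and whether Clairv scans every listed directory is inferred from the property name rather than confirmed by this guide.

```xml
<!-- Sketch only: /home/shared/slides/ is a hypothetical second directory. -->
<property name="baseDirs">
  <list>
    <value>/home/ftp/</value>
    <value>/home/shared/slides/</value>
  </list>
</property>
```

If the second directory is not served over FTP, the urlPattern in the bean would no longer produce valid links for its files, so a separate resource bean with its own pattern may be the safer choice.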
Producing Indices
Now we are ready to build our indices against the configured resources. This step is fairly simple once you have configured your resources in config.xml. Change your current working directory to the indexer directory and run the startup script by issuing ./start.sh (as stated, a Windows batch file will be offered later).
Setting up the Web Front End
Clairv comes with a simple web front end for searching resources. You can, of course, use the searcher API directly in your own application and ignore the built-in front end completely.
More with Clairv
The guide above should be sufficient for typical use. If you want to learn more, see "How to extend Clairv" and the User Manual.

