HTTP-Server Configuration

It is not necessary to run an HTTP server on the Harvest machine. The best approach is to run the Harvest CGI scripts on a separate machine (e.g. the web server of your institution). The script nph-search.cgi handles all communication between the web server and the Harvest Broker. For this it needs a file called "Brokers.cf" in the following format:

    #NAME         HOST                              PORT
    Physics       harvest.physik.uni-oldenburg.de   8501
    UniOldenburg  harvest.physik.uni-oldenburg.de   8531
    ...

All files needed for this are located in $HARVEST_HOME/cgi-bin and its subdirectory "lib".
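To check a Brokers.cf file before the CGI scripts read it, a small parser can help. This is a minimal Python sketch, not part of Harvest: it assumes whitespace-separated NAME HOST PORT columns and treats lines starting with '#' as comments, as with the header line of the format described above. The names BrokerEntry and parse_brokers_cf are hypothetical.

```python
from typing import List, NamedTuple


class BrokerEntry(NamedTuple):
    """One Brokers.cf line: display name, broker host and broker port."""
    name: str
    host: str
    port: int


def parse_brokers_cf(text: str) -> List[BrokerEntry]:
    """Parse Brokers.cf text (assumed layout: NAME HOST PORT, '#' comments)."""
    entries = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Skip blank lines and comments such as "#NAME HOST PORT".
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected NAME HOST PORT, got {raw!r}")
        name, host, port_text = fields
        port = int(port_text)  # raises ValueError for a non-numeric port
        if not 0 < port < 65536:
            raise ValueError(f"line {lineno}: port {port} out of range")
        entries.append(BrokerEntry(name, host, port))
    return entries
```

For example, `parse_brokers_cf(open("Brokers.cf").read())` returns one entry per broker, so a typo in a port number is reported with its line number instead of surfacing later as a failed search.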
RunHarvest

Harvest should now be started. The easiest way to do this is to call $HARVEST_HOME/RunHarvest. This program asks the user a series of questions about the environment in which Harvest should run and then starts a Gatherer and a Broker. The program asks:
Actually, this is very impractical, because the Gatherer and the Broker are started before they have been configured.
Use ps -ef | grep harvest to find the PIDs of the processes gatherd, broker and glimpseserver, and stop these processes with kill -9 PID. It is better to restart them manually later, once the Gatherer and the Broker that were just set up have been configured individually.
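Looking up the PIDs can also be scripted. The following Python sketch is not part of Harvest, and the names find_pids and stop_harvest are hypothetical. Instead of parsing ps -ef, whose STIME column contains a space on some systems (e.g. "Aug 07"), it reads the output of the POSIX form ps -e -o pid= -o args= and compares the base name of each program with gatherd, broker and glimpseserver. stop_harvest only sends SIGKILL (the equivalent of kill -9) when dry_run is False.

```python
import os
import signal
import subprocess
from typing import Dict, Iterable

# Daemon names from the text; adjust if your installation differs.
HARVEST_PROCESSES = frozenset({"gatherd", "broker", "glimpseserver"})


def find_pids(ps_output: str, names: Iterable[str] = HARVEST_PROCESSES) -> Dict[int, str]:
    """Map PID -> program name for lines of the form "PID ARGS..."."""
    wanted = set(names)
    found = {}
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) < 2 or not parts[0].isdigit():
            continue
        pid, args = int(parts[0]), parts[1]
        # Use only the base name, so /usr/local/harvest/bin/gatherd matches "gatherd".
        program = os.path.basename(args.split()[0])
        if program in wanted:
            found[pid] = program
    return found


def stop_harvest(dry_run: bool = True) -> Dict[int, str]:
    """Find the Harvest daemons; kill them with SIGKILL unless dry_run is set."""
    out = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "args="],  # '=' suppresses the header
        capture_output=True, text=True, check=True,
    ).stdout
    pids = find_pids(out)
    if not dry_run:
        for pid in pids:
            os.kill(pid, signal.SIGKILL)
    return pids
```

Calling stop_harvest() first with the default dry_run=True shows which processes would be killed. A daemon started through an interpreter such as perl appears under the interpreter's name and would not be matched.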
Gatherer Configuration

The Gatherer collects the data that reside on the WWW server. For this reason it should be configured carefully, in particular to prevent it from indexing data that should not be collected. RunHarvest creates a .cf file in the Gatherer directory; the Gatherer is configured by editing this file. A detailed listing of the options, with examples, can be found in the Harvest manual; also have a look at the examples in
$HARVEST_HOME/gatherers/example-...
Caution, an example: the Gatherer lands on a private home page that links to a search engine (e.g. Yahoo). It follows the link to Yahoo and keeps on running... That is why it is important to configure the Gatherer carefully; please think about exactly what it should index. An example host filter could be:

    Deny xxx
    Deny arXiv.org
    Deny ojps.aip.org
    Deny www.adobe.com
    Deny www.yahoo.de
    Deny www.w3.org
    Deny www.slac.stanford.edu
    Deny lycos
    Deny .com
    Allow .*

This is nearly the default of one of the PhysDep gatherers.

Broker Configuration

The Broker is the part of Harvest that accesses the data collected by the Gatherer and provides the user with a query interface. The command RunHarvest creates a Broker automatically; however, further Brokers that access the same 'database' can be created at any time with the command CreateBroker.
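Back to the Gatherer's host filter: its effect can be tried out before the Gatherer runs. This Python sketch approximates the filter rather than reproducing Harvest's code. It assumes that each rule is a regular expression searched anywhere in the host name, that matching is case-insensitive, and that the first matching rule decides; a host matched by no rule is rejected. parse_filter and host_allowed are hypothetical names.

```python
import re

# The example filter from the text (nearly a PhysDep default).
EXAMPLE_HOST_FILTER = """\
Deny xxx
Deny arXiv.org
Deny ojps.aip.org
Deny www.adobe.com
Deny www.yahoo.de
Deny www.w3.org
Deny www.slac.stanford.edu
Deny lycos
Deny .com
Allow .*
"""


def parse_filter(text: str) -> list:
    """Turn 'Allow <regex>' / 'Deny <regex>' lines into (allow, pattern) pairs."""
    rules = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        action, pattern = line.split(None, 1)
        if action not in ("Allow", "Deny"):
            raise ValueError(f"unknown action {action!r} in {raw!r}")
        # Assumption: host names are compared case-insensitively.
        rules.append((action == "Allow", re.compile(pattern, re.IGNORECASE)))
    return rules


def host_allowed(host: str, rules: list) -> bool:
    """First matching rule wins (assumed semantics); unmatched hosts are denied."""
    for allow, pattern in rules:
        if pattern.search(host):
            return allow
    return False
```

Under these assumptions, the unescaped '.' in Deny .com matches any character, so the rule also rejects a host such as telecom.example.de; a pattern like \.com$ would restrict it to the .com domain.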
In $HARVEST_HOME/brokers/BROKER/admin/broker.conf
you should change the line:
There is still a limit in the glimpseindex program: once the total size of all indexed data exceeds 1 GB, the memory it needs grows quite fast, so simply test different values (-M 300 ... -M 2000 ...),
but take care not to use more memory than is available (physical plus swap).
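Before trying large -M values it can help to compute how much memory is actually available. The following Python sketch reads MemTotal and SwapTotal from /proc/meminfo (Linux only) and keeps those candidate -M values that fit into physical memory plus swap, minus some headroom. It assumes that -M is given in megabytes; the 256 MB reserve, the candidate list and the function names are hypothetical and should be adjusted.

```python
# Hypothetical headroom left for the OS, the broker and glimpseserver (MB).
RESERVE_MB = 256


def parse_meminfo_kb(text: str) -> dict:
    """Return the numeric fields of /proc/meminfo-style text, in kB."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def usable_mb(meminfo_text: str, reserve_mb: int = RESERVE_MB) -> int:
    """Physical memory plus swap, minus the reserve, in MB."""
    info = parse_meminfo_kb(meminfo_text)
    total_kb = info.get("MemTotal", 0) + info.get("SwapTotal", 0)
    return max(total_kb // 1024 - reserve_mb, 0)


def safe_m_values(meminfo_text: str, candidates=(300, 500, 1000, 2000),
                  reserve_mb: int = RESERVE_MB) -> list:
    """Keep only the -M values (assumed to be MB) that fit into usable memory."""
    limit = usable_mb(meminfo_text, reserve_mb)
    return [m for m in candidates if m <= limit]


def read_meminfo(path: str = "/proc/meminfo") -> str:
    """Read the Linux memory summary; other systems need a different source."""
    with open(path) as f:
        return f.read()
```

On a Linux host, `safe_m_values(read_meminfo())` lists the values worth testing; the largest of them is the upper bound for the glimpseindex experiments.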
Please do not forget to restart the glimpseserver after rebuilding the glimpse index (just kill the process)!

CGI-DIR/Brokers.cf: add every new Broker to this file manually.

Files worth knowing:
with funds of the German Ministry of Education and Research (BMBF) and of the Government of Lower Saxony.
Last Update: 07. Aug. 2002 © 2001-2002, ISN Oldenburg GmbH