GoSWISH Manual

By Daniel Hellerstein

05 March 2001. GoSWISH ver 1.6 Daniel Hellerstein, danielh@crosslink.net

GoSWISH: A Search Engine Utility For OS/2

Abstract: GoSWISH is a free search engine for OS/2 Web Sites. GoSWISH consists of ver 1.3 of the SWISH "web indexer", a script to automate its use, and a script to provide a front-end to WWW clients.

-- Table of Contents:

I.      Introduction
II.     Installation
IIa.    Installing as a CGI-BIN script
IIb.    Installing as an SRE-http addon
IIc.    Notes for upgraders
III.    Using GoSWISH
IIIa.   Using GOSWISH.HTM
IV.     The GOSWISH.CMD program
IV.a.   Invoking GOSWISH.CMD
IV.b.   GOSWISH.CMD:  Make a SWISH index file
IV.c.   GOSWISH.CMD:  Search a SWISH index file
V.      The MKDCT.CMD program
AppA.   Hints on using SWISH
AppB.   Acknowledgements and Legal Stuff

--

I. Introduction
The Simple Web Indexing System for Humans (SWISH) is a fast (and free) multi-platform program for generating and searching indices of the contents of a set of files. SWISH is designed to be used as a "keyword search" tool, and will return output that can be incorporated into WWW output.

SWISH does the hard part of indexing (extracting words from selected documents and organizing them in compact fashion), and of searching this index for specified keywords. Although not terribly difficult to use, it is a bit idiosyncratic. By providing a simple, web-based interface to SWISH, GoSWISH can help the typical (i.e.; lazy and/or overworked) web administrator use SWISH to keep his search tools up-to-date and easy-to-use. In addition, GoSWISH makes it easy to create, and display, summaries of the files that are found during a keyword search.

GoSWISH includes a copy of SWISH ver 1.3. This is GNU-freeware (see the appendix for further details).

To use GoSWISH you should have an OS/2 web server that understands CGI-BIN. Especially useful is a web-server that can handle POSTed CGI-BIN requests (since the set of options can get rather long).

Even better, GoSWISH can also be run (somewhat more efficiently) as an addon for the SRE-http freeware web server for OS/2 (see http://www.srehttp.org for details).

II. Installation

First, you should UNZIP GOSWISH to an empty temporary directory.

Most people will want to use the INSTALL.CMD program (that comes with GoSWISH) to install. This program will ask you whether you are installing GoSWISH to be run as a CGI-BIN program, or as an addon for the SRE-http web server. It will then ask you for a few directories, modify a few files, and then copy them.

After it's done, you can try using GoSWISH!

For those who like to install software "by hand", the following instructions can be followed.

IIa: Installing GoSwish as a CGI-BIN Script

In addition to copying files to appropriate places, the installation of GoSWISH requires a few changes to the GOSWISH.CMD file; and possibly to the GOSWISH.HTM file. Therefore, pay especial attention to step iv of the following instructions.

0) Unzip GOSWISH.ZIP to an empty temporary directory

i) Create a SWISH subdirectory in some convenient location. For the sake of explanation, let's assume that you create D:\HTTP\SWISH.

ii) Copy the following files to this directory (say, to D:\HTTP\SWISH):
        SWISH-E.EXE
        GOSWISH.CMD

iii) Copy GOSWISH.HTM to somewhere in your web tree. That is, copy it to some place accessible via the WWW. For the sake of explanation, let's assume you copy it to E:\WWW, and E:\WWW is the "root" of your web tree.

Then, you must edit GOSWISH.HTM (with your favorite text editor) and make the following changes:

a) Change all occurrences of the string "search_document_directory" (without the quotes) to the relative directory you'd like your "search documents" written to by default (or change it to an empty string).
     * For example, change it to "SWISH/" (without the quotes).

b) Change all occurrences of the string "GOSWISH.CMD" (in URLs and in forms) to /CGI-BIN/GOSWISH.CMD (or whatever string your server uses to signify CGI-BIN scripts).

iv) With your favorite text editor, edit GOSWISH.CMD (the copy in your SWISH directory) and change the following parameters (they'll be very clearly marked):

a) SWISH_DIR    -- The directory used to store "SWISH" indices.
b) WEB_ROOT_DIR -- The root of your web tree.
   For example:
        SWISH_DIR='D:\HTTP\SWISH'
        WEB_ROOT_DIR='E:\WWW'

c) If you are planning on using the "regenerate swish index" option, and your web server uses something other than /cgi-bin/ as a "cgi-bin script" signal, then you should also change the CGI_STRING variable.

v) Copy this "modified" version of GOSWISH.CMD to your CGI-BIN   directory.

Assuming that there are no other access restrictions (is there an HTACCESS file you need to modify?) you are now ready to use GoSWISH!

II.b. Installing GoSwish as an SRE-http addon

SRE-http users can use the above instructions and run GoSWISH as a CGI-BIN script. However, for purposes of speed and flexibility, we recommend running GoSWISH as an SRE-http addon.

In addition to the above steps:

vi) As part of step iv, you might want to modify the NEED_PRIVS parameter in GOSWISH.CMD. NEED_PRIVS is used to limit who can create new SWISH indices.

vii) Instead of (or in addition to) copying to the CGI-BIN directory, copy GOSWISH.CMD to your SRE-http "addon" directory. If you are using a version of SRE-http newer than 1.2L.1297d, you can delete the version of GOSWISH.CMD on "D:\HTTP\SWISH". Otherwise, you'll have to keep the two copies synchronized (later versions of SRE-http keep track of where a script is running from).

viii) Optional. Copy MKDCT.CMD to your SWISH (i.e., D:\HTTP\SWISH) directory.

In addition, you may be interested in the following "sample" files; you should copy them to your SWISH directory.

SAMPLE.CON   : a sample SWISH "configuration" file
SAMPLE.SWI   : a sample SWISH "index" file
SRCHSAMP.HTM : a sample "use GoSWISH to search an index" document
SAMPLE.DCT   : a sample "description-cache file"
MKDCT.IN     : a sample "list-of-URLS" (used by MKDCT.CMD)
DESCRIBE.TXT : a sample "directory-specific" description file (used by MKDCT)

IIc. Notes for upgraders:

GoSWISH 1.4 incorporates a number of changes, many of which are "under the hood". The most important change is support for SWISH 1.3; which is shipped as SWISH-E.EXE (a copy of SWISH-E.EXE is included in GOSWISH.ZIP).

SWISH 1.3 is less buggy than SWISH 1.1, and supports some useful new features. Unfortunately, the "index" files created by SWISH 1.3 are different (just a bit, but enough) from those created by SWISH 1.1.

Therefore, if you have older SWISH index files (say, as created by GoSWISH 1.2 or before), you may want to NOT use SWISH 1.3. GoSWISH will work with SWISH 1.1 and SWISH 1.2 (albeit with these new features disabled), but you must set a few parameters.

In particular, the SWISH_VERSION parameter in GOSWISH.CMD must be set to: SWISH_VERSION=1.1 (see GOSWISH.CMD files for details).

As of version 1.44, GoSWISH is now distributed with rxSWISH.DLL -- a dynamic link library that emulates SWISH-E.EXE (ver 1.3). We recommend using this -- just set the SWISH_VERSION parameter to:
     SWISH_VERSION='13_DLL'
Hint: you should move (or copy) rxSWISH.DLL (from your server's root directory) to a directory in your LIBPATH (such as x:\os2\dll).

Lastly -- see the READ.ME for late breaking news.

Minor notes:

* SRCHINDX.CMD is no longer supported -- for all intents and purposes, all its functionality is now incorporated into GOSWISH.CMD.
* The WWWDIR option is no longer supported -- you can't change the "WEB_ROOT_DIRECTORY" on the fly. But, since you can specify fully qualified directories, and multiple replacement rules, this capability is no longer needed.
* Note the use of SWISH_DIR and WEB_ROOT_DIR instead of INDEX_DIR and WWW_DIR.
* The HEADER option has been modified (it no longer adds heading tags).
* H1 and H2 options, similar to HEADER, are now available.
* GoSWISH can now tell SWISH-E to use "stemming" rules when indexing.
* "Property" retention, with later display, is supported in GoSWISH.
* FOOTER_FILEs and HEADER_FILEs are now assumed to be in the SWISH_DIR directory.

III) Using GoSWISH

To get started with GoSWISH, we recommend GOSWISH.HTM: it provides a complete interface to GoSWISH and to SWISH. With GOSWISH.HTM you can:

a) Specify the directory (fully qualified, or web relative) to create an index of, and then create the index.
b) Create "summaries" of all text (plain and html) documents encountered whilst creating the index.
c) Create an HTML document that allows you to enter keywords, and then search the index you just created.

The HTML documents created in step c can then be made "available to the public". Alternatively, GoSWISH automatically tracks the various indices created (say, one for each of several major areas), and you can instruct GoSWISH to offer a menu listing these various indices.

More ambitious users can structure their own "calls" to GOSWISH.CMD. The various options understood by GOSWISH.CMD are listed in Section IV.

You may have noticed that we've mentioned "descriptions" and "summaries" a few times. This refers to the generation of short (300 word) summaries for matched files. Thus, not only will matching files be found, but descriptions (extracted from the contents of the file) can also be displayed.

An important point should be remembered: the SWISH "index of your site" is a static document -- it will NOT reflect recent changes in the contents of your website. Frequent recreations of this index may be necessary (hopefully, GoSWISH will make that easy to do).

IIIa. GOSWISH.HTM

GOSWISH.HTM automates practically everything having to do with index creation and use -- you may never need to use any other tool. However, please do not hesitate to modify, change, or otherwise chop GOSWISH.HTM into its constituent pieces.

GOSWISH.HTM can be used for three purposes:
   a) to create a Swish Index (and a GoSWISH description-cache file)
   b) to search a Swish Index (and a GoSWISH description-cache file)
   c) to list currently available "search forms" (created by GoSWISH)

GOSWISH.HTM has two index creation modes: a "simple" and a "custom" mode. In most cases the simple mode will be quite adequate. The custom mode is actually quite similar, with defaults that are the same as those used when the "simple" mode is used. Do note that the "custom" mode uses a POST style request; so your server must understand POST requests.

Regardless of which creation mode you use, GoSWISH will activate GOSWISH.CMD. GOSWISH.CMD will then:
   a) create a "SWISH configuration" file,
   b) launch SWISH in a new process on your server, and feed it this configuration file,
   c) if desired, generate descriptive summaries (and store them in a description-cache file),
   d) return a short status document to you (the client), which will contain a link to a sample "search this index" HTML document. This document contains a form that allows you to specify keywords, and a few simple options (such as number of matches). In addition, if you chose to create summaries, you can also specify whether or not to display summaries.

This document can be used as is. Or, you can append, amend or otherwise modify it. If you want to use it "as is", you might need to wait until the creation of the index and descriptions (steps b and c) is completed; a task which may take a few minutes (depending on the number and size of the files to be indexed).

Notes:

* GOSWISH will launch daughter processes; so receipt of a complete response from the server does not mean all the work's been done. To avoid this uncertainty, SRE-http users can "monitor swish while it runs", as well as monitor the generation of descriptions.

Unfortunately, due to the simplicity of the CGI-BIN protocol, when run as a CGI-BIN script this "monitoring" feature is not available.

* GoSWISH will generate a sample "search this index" document that uses the "search mode" of GoSWISH. You can use this search document as is (it's a reasonably efficient interface), or you can modify it with your favorite text editor. If you want to take a shot at this sort of customization, you should read section IV.c.

IV. GOSWISH.CMD

The following describes the various GOSWISH.CMD options. Note that many of these refer to SWISH options -- see Appendix A for an overview of what the SWISH options do. In addition, the "custom" section of GOSWISH.HTM describes most of these options.

IV.a. Invoking GOSWISH.CMD

GOSWISH.CMD can be invoked with a URL of the form
     /cgi-bin/GOSWISH?mode=x&option1=val1&option2=val2&etc.
or, if you are an SRE-http user:
     /GOSWISH?mode=x&option1=val1&option2=val2&etc.

The MODE argument determines what type of action GOSWISH.CMD will perform. MODE can take one of the following values:

MODE=M : Create a swish index

MODE=S : Search a SWISH index

MODE=L : List currently available SWISH indices (and provide links to forms that can be used to search them)

MODE=REGEN : List currently available SWISH configuration files, which can be used to regenerate a SWISH index. This is provided as an alternative to running SWISH-E (with a pre-existing configuration file) from an OS/2 command prompt. Note that "descriptive summaries" are NOT regenerated, only the SWISH index.

MODE=2REGEN : Regenerate a SWISH index (using a given SWISH configuration file). MODE=2REGEN is generated by MODE=REGEN -- it will not be described in this document.

Example: MODE=M

The remaining options understood by GOSWISH depend on the value of MODE: one set of options is used when MODE='M', and a second set is used when MODE='S'. Note that when MODE='L' or MODE='REGEN', no other options are used, and when MODE='2REGEN', a set of filename options is used.

The following sections describe the GOSWISH.CMD options. Note that some of these examples presume you are using a URL to invoke GOSWISH, hence URL encoding rules (such as using a + for a space) are displayed. However, if you are using a FORM with INPUT elements, you should enter the unencoded characters instead (for example, a space rather than a +).
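As an illustration of the URL form described above, the following Python sketch assembles a GOSWISH request from a list of option/value pairs. The helper name goswish_url is hypothetical (it is not part of GoSWISH); the script prefix is the /cgi-bin/GOSWISH form shown earlier, and SRE-http users would pass "/GOSWISH" instead.

```python
from urllib.parse import urlencode

def goswish_url(options, script="/cgi-bin/GOSWISH"):
    """Build a GOSWISH request URL from (name, value) pairs.

    Hypothetical helper for illustration; not part of GoSWISH.
    """
    # urlencode() applies quote_plus by default, so a space becomes "+".
    # safe="/" keeps the slashes in directory names readable.
    return script + "?" + urlencode(options, safe="/")

# A "create an index" request that selects two directories.
print(goswish_url([("mode", "M"), ("sel", "/DIR1 /DIR2"), ("dostem", "1")]))
```

A list of pairs is used rather than a dictionary so that an option which may appear several times (such as OPTION or COMMENT in search mode) can simply be repeated.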

But before listing these options, please note that GOSWISH.CMD contains a number of configuration parameters (some of which can be overridden by the following options). You can change these parameters by editing GOSWISH.CMD with your favorite text editor -- the parameters are in a section at the top of the file, and are documented.

IV.b. GOSWISH.CMD: Creating a SWISH Index (mode=M)

In "create a swish index" mode, there are four classes of options: file options, indexing rules, description options, and other options.

 * File Options: You MUST set the SEL option (defaults are used for the others).

SEL : The directories to be searched. You can enter relative directories (directories that do not have a drive letter), or fully qualified directories, in a space delimited list. Relative directories are assumed to be under (subdirectories of) the WEB_ROOT_DIR directory.
   Examples:
        SEL=/    (search WEB_ROOT_DIR and all its subdirectories)
        SEL=DIR1/
        SEL=/DIR1 /DIR10  /DIR2
        SEL=/DIR1/ /DIR2/*  /DIR3/
   Notes:
     * SWISH will index the subdirectories of each directory (relative or absolute) that you enter.
     * To specify an explicit set of files in a directory, but not in subdirectories (of a directory), you can use * as a wildcard. For example:
          samples/        means all files in samples/, and in subdirectories of samples/
          samples/*       means all files in samples/ but NOT in subdirectories of samples/
          samples/foo*.*  means all files that match foo*.* in samples/ but NOT in subdirectories of samples/
     * Trailing and leading / (or \) characters in relative directory names are strictly optional (they will be removed and added as need be).
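The wildcard rules for SEL entries can be summarized in a few lines of Python. This is only a sketch of the rules as stated above, not the code GOSWISH.CMD actually uses; the function name parse_sel_entry is hypothetical.

```python
def parse_sel_entry(entry):
    """Split one SEL entry into (directory, pattern, include_subdirs).

    Hypothetical helper that mirrors the documented rules:
    no wildcard -> all files, subdirectories included;
    a wildcard  -> only matching files in that one directory.
    """
    path = entry.replace("\\", "/")
    if "*" in path:
        directory, _, pattern = path.rpartition("/")
        recurse = False
    else:
        directory, pattern, recurse = path, "*", True
    # Leading and trailing slashes are optional, so drop them.
    return directory.strip("/"), pattern, recurse

for entry in "samples/ samples/* samples/foo*.*".split():
    print(entry, "->", parse_sel_entry(entry))
```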

SWIFILE: The name of the "swish index" to create. If not a fully qualified name, it will be written to the SWISH_DIR directory. If not specified, a unique random name (in SWISH_DIR) will be used.

SWISHVERSION: Which version of SWISH to use. This overrides the default SWISH_VERSION variable set in GOSWISH.CMD. SWISHVERSION can take one of the following values:
        11 == version 1.1
        12 == version 1.2
        13 == version 1.3
   Example: SWISHVERSION=13

SEARCHDOC: The name of the "search this index" HTML document. Relative names are assumed to be relative to the WEB_ROOT_DIR. If not specified, a unique random name (in WEB_ROOT_DIR) will be used. If a fully-qualified name is used, you should also include the "selector" that will invoke this file. For example:
        SEARCHDOC=D:\WWWNEW\SDOCS\SEARCHX.HTM+/WWW2/SEARCHX.HTM
   (note the use of a "url-encoded space" (the +) to delimit the fully qualified file name and the "selector").

OVERWRITE: If set to 1, new files will overwrite pre-existing files of the same name. Note that this overrides the OVERWRITE variable in GOSWISH.CMD.


 * Indexing Rules:

EXTLIST : List of extensions to index -- if a file does not have one of these extensions, it will be ignored.

EXTLIST_NOFOLLOW: Do NOT extract "words" from these documents. You should ONLY extract words from text documents (such as HTML files).

DOSTEM If 1, then use the "stemming algorithm". If 0, do NOT use the stemming algorithm. The default value is 1.
   Example: DOSTEM=1

METANAMES List of META name fields to assign words to. If you specify a list of such fields (such as DESCRIPTION and CONTENT), then you'll be able to search explicitly for words appearing in these META elements in HTML files. For example, if you specify:
        METANAMES=DESCRIPTION
   then you can later search using a KEYWORD of
        DESCRIPTION=myword
   and SWISH will find all files that have the word "myword" in a "description" META element.

There is one drawback -- these words will NOT be found under a usual search (however, you can "or" together normal keyword searches and METANAME keyword searches).

PROPNAMES List of "property names" (defined as values of META tags) to retain for each file. This is extra descriptive information that can be shown with other search results.
   Example: PROPNAMES=DESCRIPTION+AUTHOR+MODIFIEDDATA

IGNORELIMIT: Used to ignore "common" words that occur too frequently.

IGNOREWORDS: A set of common words to ignore (such as "the" and "or"). If not specified, a list of about 1000 "swishdefault" words is used.

REPWITH: SWISH will store files using fully qualified file names. If you want to store URLs (a requirement if you want clients to be able to "click to receive" matched documents), REPWITH can be used to specify a "replacement rule". By default, a default REPWITH is used (that will create a selector that is relative to the value of the SEL option). Note that if you specify multiple directories to search, by default a separate replacement rule will be generated for each directory you specify.

If you specify any REPWITH rules, default ReplaceRules will NOT be generated!
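To make the idea of a replacement rule concrete, here is a small Python sketch that turns a fully qualified file name into a URL selector by swapping a directory prefix. It illustrates the concept only; the function name and the exact matching behavior are assumptions, and the real rules are applied by SWISH itself.

```python
def apply_replace_rule(filename, old_prefix, new_prefix):
    """Replace a leading directory prefix, then switch to URL-style slashes.

    Hypothetical illustration of a replacement rule; not GoSWISH code.
    """
    # OS/2 file names are case-insensitive, so compare prefixes that way.
    if filename.lower().startswith(old_prefix.lower()):
        filename = new_prefix + filename[len(old_prefix):]
    # Backslashes are converted whether or not the prefix matched.
    return filename.replace("\\", "/")

# With WEB_ROOT_DIR='E:\WWW', a file under the web root maps to a selector.
print(apply_replace_rule(r"E:\WWW\SAMPLES\intro.htm", r"E:\WWW", ""))
```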

These FR_ options are used to suppress indexing of files and directories. Please see SWISH.HTM for details.
   FR_DIRECTORY: The "FILE RULES DIRECTORY" instructions
   FR_TITLE:     The "FILE RULES TITLE" instructions
   FR_FILENAME:  The "FILE RULES FILENAME" instructions
   FR_PATHNAME:  The "FILE RULES PATHNAME" instructions

These index options are stored in the swish index file; they just provide identifying information.
   INDEXNAME:        Name of the index
   INDEXADMIN:       Administrator
   INDEXPOINTER:     Pointer to this index
   INDEXDESCRIPTION: Description of the index


 * Description options.

The "description-cache" (DCT) file is created by extracting information from each matched document. For HTML documents, a META NAME="DESCRIPTION" or HTTP-EQUIV="DESCRIPTION" element is used (if available).

Otherwise, values of headers are used.

For non-HTML documents, the first few hundred characters are used.

Note to SRE-http users: The MKDCT program can also be used to create a "description-cache file".

MAKESUMMARY: Make a description-cache (DCT) file (that contains file summaries). Can be one of:
     0 = Do NOT create a DCT file. If you select 0, then descriptions will NOT be available (of course, you can always rerun GOSWISH to make descriptions at a later date).
         Note: on sites with well-documented HTML files, you can use DESCRIPTION (and other such) "properties" instead of summaries.
     1 = Read descriptive summaries from a DESCRIBEFILE.
     2 = Read descriptive summaries from a DESCRIBEFILE. If no such directory-specific description file exists, and this is a "text" file, then create a descriptive summary by examining the contents of the file.
         Note: "text files" are defined as "indexed" files -- files that do NOT match the EXTLIST_NOFOLLOW list.

DCTFILE: The name to use for the "description-cache" (DCT) file. Relative names are written relative to the SWISH_DIR. If not specified, a random name (in SWISH_DIR) is used.
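The three MAKESUMMARY settings amount to a small decision rule, sketched below in Python. The function name, its arguments, and the 300-character cut-off are illustrative assumptions; GOSWISH.CMD may differ in detail (for instance, in how it summarizes HTML files).

```python
def pick_summary(makesummary, describe_entry, is_text_file, contents, limit=300):
    """Return the summary to cache for one file, or None if there is none.

    Hypothetical sketch of the MAKESUMMARY rules; limit=300 is an assumption.
    describe_entry is the description found in a DESCRIBEFILE, or None.
    """
    if makesummary == 0:
        return None                 # no DCT file is written at all
    if describe_entry is not None:
        return describe_entry       # explicit descriptions always win
    if makesummary == 2 and is_text_file:
        return contents[:limit]     # fall back to the file's own contents
    return None

print(pick_summary(2, None, True, "Plain text about OS/2 web servers."))
```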

DESCRIBEFILE: A text file that contains explicit descriptions. The DESCRIBEFILE, if specified, is an "own directory" specific file -- a separate one should be specified for each directory (and subdirectory) being indexed. If an entry for an indexed file is found in the "own directory" DESCRIBEFILE, then the associated description is used (a description will NOT be constructed from the contents of the file).
   DESCRIBEFILE files should look like:
        filename.ext  a description
        filenam2.ext  another description
        filenam3.ext  another description, this one on
                  | two lines

Example: DESCRIBEFILE=DESCRIBE.TXT


 * Other options.

WATCH: If WATCH=1, and you are running GOSWISH as an SRE-http addon, then status information will be shown.

Note: when run as a CGI-BIN script, WATCH is ignored (status is not "watched"). However, GOSWISH will "START" SWISH -- a descriptive name will show up in the task list. When run as an SRE-http addon, programs are DETACHed (and will not show in the task list).

DOSTEM: If DOSTEM=1, then apply a "stemming" algorithm when indexing. This algorithm will strip suffixes (such as "s", "ed", etc.) from words before indexing them.

INDEXCOMMENTS: If INDEXCOMMENTS=0, then SWISH will index HTML comments. If INDEXCOMMENTS=1, comments will not be indexed.


 * Example.

Note that this would typically be a single long request, or would be the body of a POST request (also note the use of URL encoding):

/CGI-BIN/GOSWISH?
   mode=M&
   sel=/SAMPLES&
   swifile=&
   searchdoc=&
   repwith=&
   extlist=.htm+.txt+.gif+.jpg+.doc+.sht+.html+.shtml&
   extlist_nofollow=+.gif+.xbm+.jpg+&
   fr_pathname=contains+admin+testing+demo+trash+construction+PRIVATE+private+confidential+&
   fr_directory=contains+.htaccess+&
   fr_filename=contains+%23+%25+%7E+.bak+.orig+.old+old.+&
   fr_title=contains+construction+example+pointers+&
   ignorelimit=50+100&
   ignorewords=SwishDefault&
   indexname=&indexadmin=&indexdescription=&indexpointer=&
   makesummary=2&
   htmls=+HTM+HTML+SHTML+SHT+&
   describefile=DESCRIBE.TXT&
   watch=Y

IV.c. GoSWISH MODE=S options

MODE='S' is the "search a SWISH index" MODE. The following parameters can be fed to GoSWISH, either as part of an HTML form, or as part of a (possibly quite long) URL.

INDEX: A space delimited list of "SWISH" indices. Each index in this list may be a fully qualified name, or a relative name. If a relative name is used, it is assumed to be relative to the SWISH_DIR.

DCT_FILE: A space delimited list of description-cache files. Each file in this list may be a fully qualified name, or a relative name. If a relative name is used, it is assumed to be relative to the SWISH_DIR.

EXIST If EXIST=1, then GoSWISH will check to see if the matches are accessible. When matches are URLs (which is the usual case), socket calls are used. If the URL does not exist, the match will be displayed, but will not be linked.

In general, use of EXIST=1 is NOT recommended -- it dramatically increases response time.

CONDITION Controls search logic. CONDITION should be set to OR or NOT. Depending on the value of CONDITION, an "OR" or a "NOT" will be placed between keywords. If CONDITION is not included, "AND" is used.
   * Keywords for which an explicit "logical control" was included will not be affected by the CONDITION parameter. That is, CONDITION only applies to keywords that do not have a preceding AND, OR, or NOT.

   * Caution: CONDITION does not work well with (phrases) or in combination with complex user specified search strings.

Example: 

Note: the list of "keywords" can also contain OR, NOT, AND and -- CONDITION will NOT override these explicit boolean terms.
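The way CONDITION interacts with explicit boolean terms can be sketched as follows. This Python function is a hypothetical illustration of the rule described above, not GoSWISH's own code; in particular, it writes the default "AND" out explicitly, whereas GoSWISH may simply leave it implied.

```python
OPERATORS = {"AND", "OR", "NOT"}

def apply_condition(keywords, condition="AND"):
    """Place `condition` between keywords that lack an explicit operator.

    Hypothetical sketch; phrases in parentheses are not handled.
    """
    result = []
    for word in keywords.split():
        joined_already = (
            not result                          # first word needs no joiner
            or word.upper() in OPERATORS        # the word is itself an operator
            or result[-1].upper() in OPERATORS  # an operator precedes the word
        )
        if not joined_already:
            result.append(condition)
        result.append(word)
    return " ".join(result)

# The explicit OR is kept; NOT is only inserted where nothing was given.
print(apply_condition("apple pie OR cake", "NOT"))
```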

COMMENT Comments to place (using ) under header. You can include multiple COMMENT elements.

Example:  

FOOTER_FILE A file to use as a footer. * The FOOTER_FILE is assumed to be relative to the SWISH directory (or to a virtual directory) * Server side includes will NOT be attempted on the footer_file.

Example: 

HEADER_FILE A file to use as a header. If HEADER_FILE is specified, the HEADER, H1, and H2 options are ignored (COMMENTS are NOT ignored).

   * If you use a HEADER_FILE, you MUST include a  statement in it.
   * The HEADER_FILE is assumed to be relative to the SWISH_DIR directory -- either in it, or in a subdirectory of SWISH_DIR.
   * Server side includes will NOT be attempted on the header_file.

Example: 

HEADER, H1, and H2    A header to display (at the top of the results page). A default header is used if neither HEADER_FILE nor HEADER is specified.

H1 will automatically prepend an <H1>, and append a </H1>, to your header.
H2 will automatically prepend an <H2>, and append a </H2>, to your header.

You should only use one of HEADER, H1, or H2

Example: <INPUT TYPE="hidden" NAME="header" VALUE=" Results of search ">

KEYWORD A space delimited list of words to search for, with OR AND NOT used as (optional) logical controls (AND is assumed). If KEYWORD is not included, a keyword of HELP is used.

Example (from a FORM):
     <INPUT TYPE="text" NAME="keyword">
Another example, assuming you defined a METANAME of AUTHOR:
     <INPUT TYPE="text" name="keyword" value="AUTHOR="> enter author's name
(the user should enter the author's name after the AUTHOR= in the input box)

OPTION A search-modification option. You can include multiple OPTION elements. Valid OPTIONs include:
     -t HBthec : search in head, body, title, header, emphasized, or comments
     -m nn     : display a maximum of nn matches
For a description of these options, see the SWISH documentation at: http://www.eit.com/software/swish/swish.html or run SWISH (from an OS/2 command prompt) for a short synopsis.

Examples (display 10 matches, searching HTML documents only):
     <INPUT TYPE="hidden" NAME="option" VALUE="-m10">
     <INPUT TYPE="hidden" name="option" value="-t+HB">

SPECIAL OPTIONS (requires a DCT_FILE)
     Option="FILE"    Instead of looking for keywords, you can search for "URL names" that contain a matching substring.
     Option="SUMMARY" Searches the "automatic description" for matching substrings.

* The URL name will be displayed as the link (instead of           the TITLE). However, the descriptions will be the same as          those used in regular keyword searches through the SWISH index.

* When file or summary search is selected, the HBthec options are ignored.

Example: <INPUT TYPE="hidden" name="option" value="file">
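If you generate your own search forms, the hidden INPUT elements shown in these examples can be produced programmatically. The Python sketch below is one way to do it; the helper name hidden_inputs is hypothetical, and the values are written unencoded (with a plain space) because the browser encodes form values itself when the form is submitted.

```python
from html import escape

def hidden_inputs(pairs):
    """Render (name, value) pairs as hidden INPUT elements, one per line.

    Hypothetical helper for illustration; not part of GoSWISH.
    """
    return "\n".join(
        '<INPUT TYPE="hidden" NAME="%s" VALUE="%s">' % (escape(name), escape(value))
        for name, value in pairs
    )

# Repeating a name (here "option") yields multiple OPTION elements.
print(hidden_inputs([("mode", "S"), ("option", "-m 10"), ("option", "file")]))
```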

SEARCH_LINK: Specify the target for a "New Search?" link (which will be displayed at the bottom of results pages). This should be a valid URL pointing to your search form.
   Example:
        <input type="hidden" name="SEARCH_LINK" value="http://mysite.net/search1.htm">

SHOWPROP: Specify "properties" to display (if available). You can specify multiple occurrences of SHOWPROP -- a cumulative list is built.

In addition to property names, you can include a special "property" of _SUMMARY_n (n=0, 1, or 2) These are synonymous with specifying a SUMMARY=n option.

Examples:
     <input type="hidden" name="SHOWPROP" value="Description">
     <input type="hidden" name="SHOWPROP" value="Author">
(note that BOTH of these could be specified simultaneously -- which would mean "display both the Description and the Author properties for each matched file").

Warning: in order to use SHOWPROP, you MUST have used an appropriate PropertyNames option when you created the SWISH configuration file. For example:
     PropertyNames description Author

START: Display the first m matches, starting with the START match. By default, START=1.

The most frequent use of START is to tell GoSWISH to "make links to the next 20 matches". To do this, you should use a special form of the START option:
     START=1 0
.... that is, a 1, a space, then a 0 (or, if used in a URL: START=1+0)

GoSWISH will interpret this to mean:
   * start at the #1 match and display the selected quantity of matches
   * if there are undisplayed matches, provide links to the next (or prior) set of matches
Please note that an inefficient algorithm is used: GoSWISH will re-search the SWISH index, and selectively display the appropriate matches (say, matches 21 to 40 if you specified "show 20 matches"). Nevertheless, this option does give you the ability to display lots of matches "a page at a time".

Examples:
     START=20
     START=1+0 (where the + is a URL-encoded space)

Note: the "m" (in "first m matches") is specified by using an OPTION. For example, to specify "display 10 matches at a time, starting from the first match":
     <INPUT TYPE="hidden" NAME="option" VALUE="-m 10">
     <INPUT TYPE="hidden" NAME="start" VALUE="1 0">
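The paging arithmetic implied by START and the -m option is simple enough to sketch in Python. The function below is a hypothetical illustration (GoSWISH performs this internally after re-running the search); it reports which matches fall on a page and whether "prior" and "next" links would be needed.

```python
def page_bounds(start, per_page, total):
    """Return (first, last, has_prior, has_next) for one page of matches.

    Hypothetical sketch: start is the 1-based START match, per_page is
    the -m value, and total is the number of matches found.
    """
    first = max(1, start)
    last = min(first + per_page - 1, total)
    return first, last, first > 1, last < total

# The second page of 20 matches, out of 55 found, covers matches 21 to 40.
print(page_bounds(21, 20, 55))
```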

SUMMARY: Display a summary.
     summary=0 : do not display summaries
     summary=1 : display summaries.
     summary=2 : display summaries; if no summary is available, try to create one by reading the file, or the URL of the file (depending on what appears in the G.

Notes:
   * If summary=1, then a DCT_FILE must be specified.
   * If summary=2, then a DCT_FILE should be specified, but need not be. However, we do NOT recommend using SUMMARY=2, since creating summaries "on the fly" can bog down your server.
   * See SHOWPROP for another way of requesting display of summaries.
   * Summary display will indicate the "source" of the summary:
       >> if from a "DESCRIBEFILE", or from a <META> element, then the standard font is used
       >> if generated from a non-html text file, or from the <BODY> of an HTML file, <TT> font is used.

 * Example (note use of URL encoding):

/CGI-BIN/GOSWISH?
   MODE=S&
   INDEX=index32.swi&
   DCT_FILE=index32.dct&
   KEYWORD=daniel&
   COND=AND&
   OPTION=-m+20&
   SUMMARY=1&
   HEADER=Search+of+my+files


 * Miscellaneous comments

* Please remember that the SWISH "index" of your directory is a static document, and will not reflect subsequent changes in the contents of    your site (this is also true of the "description file"). So, if you make substantial changes in site content, you should rerun GoSWISH.

* If you do NOT need the "keyword search" features (that is, you only want     to search filenames or summaries), you can skip the use of SWISH. This does require providing MKDCT with a list-of-URLs (see MKDCT for    details).

* The "search documents" produced by GoSWISH can be easily modified. In particular, you can add HEADER_FILE, FOOTER_FILE, and COMMENT options.

* GoSWISH will auto-detect whether the target SWISH index is version 1.1 or version 1.3, and will run the appropriate version of SWISH (assuming that SWISH 1.1 is named SWISH.EXE and ver 1.3 is SWISH-E.EXE).

* When specifying multiple index files, you can NOT mix version 1.1 and version 1.3 swish indices.

V. The MKDCT program

MKDCT.CMD is a standalone program used to create a "description-cache" (DCT) file". The output of MKDCT differs from the description file that    can be (optionally) produced by GOSWISH.CMD in several ways:

a) You can create either "regular" or "structured" DCT files (see below for a description of these two forms of DCT files).

b) You can run MKDCT at any time. In contrast, GOSWISH only produces its description file while producing the SWISH index.

c) MKDCT has a few extra options.

d) MKDCT contains a simple "description-cache file" editor.

MKDCT has two file selection modes: a "SWISH" mode and a "List of URLs" mode.

* SWISH mode: The SWISH mode uses the ".CON" file you used to create the SWISH index, and the ".SWI" SWISH index file.

* List-of-URLs mode: The List-of-URLs mode requires a text file containing "URLS" to be examined (see MKDCT.IN for a simple example). Entries in these files should have the following form:
    URL " short description" byte_size fully_qualified_filename
where the URL should be a valid "link" to your site, and the last three terms are optional.
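To make the entry format concrete, here is a minimal Python sketch (not part of MKDCT) that splits one such entry into its four terms. The function name parse_entry and the sample entry are hypothetical, and the sketch assumes that the description is always enclosed in double quotes and that the byte size is a plain integer; MKDCT's own parsing may be more forgiving.

```python
import re

# URL, then optionally: a quoted description, a byte size, and a filename.
ENTRY_RE = re.compile(
    r'^\s*(?P<url>\S+)'
    r'(?:\s+"(?P<desc>[^"]*)")?'
    r'(?:\s+(?P<size>\d+))?'
    r'(?:\s+(?P<file>\S.*?))?\s*$'
)


def parse_entry(line):
    """Split one list-of-URLs entry into its terms; return None for blank lines."""
    match = ENTRY_RE.match(line)
    if match is None:
        return None
    desc = match.group("desc")
    size = match.group("size")
    return {
        "url": match.group("url"),
        "description": desc.strip() if desc is not None else None,
        "byte_size": int(size) if size is not None else None,
        "filename": match.group("file"),
    }


# Hypothetical entry, for illustration only.
entry = parse_entry('/samples/index.htm " Sample home page" 1234 D:\\WWW\\SAMPLES\\INDEX.HTM')
print(entry)
```

A regular expression is used rather than a shell-style splitter so that backslashes in OS/2 file names are left untouched.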

In general, if you've gone to the trouble of obtaining and using SWISH, it's probably easier to use the SWISH mode. When it comes to creating the descriptions, MKDCT will either generate descriptions for HTML and plain-text files by examining the contents of each file, or it will examine "directory-specific" description files -- text files that may be in each of the (several) directories being indexed. Each of these (possibly several) files should contain descriptions of the files in its own directory.

The basic structure of these "directory-specific" description files is simple. It should contain entries that look like:

    filename.ext a description
    filenam2.ext another description
    filenam3.ext another description, this one on
    | two lines (this is the continuation of the filenam3.ext description)

These descriptions can be of any length -- just be sure to start the "continuation lines" with a | character. Furthermore, the files can be of any mime type -- they need not be "HTML" or "plain text" files. Lastly, you should NOT include path information on the filename.ext portion of an entry -- a given "directory-specific" description file ONLY refers to files in "its own directory".
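The layout of a "directory-specific" description file can be illustrated with a short Python sketch (not part of MKDCT) that reads such a file into a filename-to-description table. The function name parse_descriptions is hypothetical, and the sketch assumes that the file name is the first whitespace-delimited word on a line and that leading blanks before the | character are tolerated; MKDCT itself may be stricter.

```python
def parse_descriptions(text, continuation="|"):
    """Map each filename to its description, joining continuation lines."""
    descriptions = {}
    current = None  # filename of the entry most recently started
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(continuation):
            # Continuation line: append to the entry above it, if any.
            if current is not None:
                extra = line[len(continuation):].strip()
                descriptions[current] = (descriptions[current] + " " + extra).strip()
            continue
        parts = line.split(None, 1)
        current = parts[0]
        descriptions[current] = parts[1].strip() if len(parts) > 1 else ""
    return descriptions


sample = """filename.ext a description
filenam2.ext another description
filenam3.ext another description, this one on
| two lines
"""
table = parse_descriptions(sample)
print(table)
```

The continuation character is a parameter because MKDCT allows the "continuation flag" to be changed.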

Structured vs. Regular DCT Files

"Regular" DCT files are the same as DCT files produced by GoSWISH.

"Structured" DCT files (which can be read, but not generated, by GoSWISH) contain the same information, but use structured records to speed up data retrieval. While not important for an index of a small (say, less than a few hundred) set of files, for large (several thousand) indices, extraction of summaries from structured DCT files can be several times faster.

Other than the need to use MKDCT, there is one disadvantage to structured DCT files -- they can NOT be combined. Regular DCT files can be combined, say by using an HTML form statement of:
    <input type="hidden" name="DCT_FILE" value="index1.dct INDEX2.DCT">

Notes:

* As with SWISH's index, the description file is permanent (at least until you delete or replace it). Thus, changes to the contents of your files will not affect the descriptions (nor will such changes affect the SWISH index).

* MKDCT will ask you to supply a fully qualified name for the description file.

* At the top of MKDCT.CMD are a number of user-changeable parameters. For example, you can modify the value of the | "continuation flag". Of more general use, if you intend to use MKDCT frequently (for example, if your site is changing rapidly), you may want to change some of the default file names.

* Hint: Creating a DCT file for a large set of files can monopolize your CPU for several minutes. If you do not want to bog your machine down, and are willing to accept a longer completion time, you can instruct MKDCT to "run at a lower priority".

Appendix A) Hints on using SWISH
The following offers a brief description of how to run SWISH as a standalone program. We do not necessarily advocate running SWISH directly (i.e., rather than running it via GoSWISH)... it's a matter of taste.

Serious users should obtain and read the SWISH documentation, which can be found at http://sunsite.berkeley.edu/SWISH-E. It's actually fairly well written!

However, for those who aren't real ambitious, the following will probably be all they really need to know to use SWISH effectively! Note that this example does NOT use features unique to SWISH 1.3 (it will work with both SWISH 1.1 and SWISH 1.3).

First, as mentioned in the installation section above, the following sample files are included:
    SAMPLE.CON : A "configuration" file used by SWISH
    SAMPLE.SWI : The results of using SAMPLES.CON, ready to be used as an "INDEX" file.
    SRCHSAMP.HTM : An html document that uses GOSWISH to search sample.swi.

Since SAMPLES.CON is tersely documented, let's discuss some of its more important variables.

IndexDir A space-delimited list of "directories" to search (note that subdirectories of these directories are also searched). These should be fully qualified directories (though you don't need the drive letter).

Note that in SAMPLES.CON, two directories are indexed: SAMPLES and IMGS. Further note that we assume that the GoServe data directory is \WWW.

IndexFile The "index" file generated by SWISH, and used by the INDEX option of GOSWISH. Since we try to be FAT friendly, we usually give it a .SWI extension, but you can call it anything.

IndexOnly Only files with these extensions will be indexed.

NoContents Files with these extensions will only have their names indexed (that is, their contents will not be examined). This may not work all the time (SWISH has several such bugs in it).

ReplaceRules Replaces the portion of the file name with some other string.

THIS IS CRUCIAL -- if you don't get this one right, the links created by GOSWISH will NOT work.

Note that ReplaceRules only seems to work on strings listed in the IndexDir. Thus, you can't give different ReplaceRules to different branches of a subdirectory tree (unless each branch was explicitly mentioned in the IndexDir option).

You can use any string in the "convert to" portion of your ReplaceRules. However, note the following:

FileRules Used to suppress reporting certain directories. Caution: FileRules seem to be a bit flaky; you may want to experiment with them.

Once you've created your .CON file, you can run it through SWISH (using the -c option). Then, run SWISH again to search the resulting index, using the options:
    -f swish_index_file_name -w word1 word2
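The two steps can be sketched as a pair of command lines. The Python fragment below (not part of GoSWISH) only builds the argument lists; the helper names index_command and search_command are hypothetical, the executable name SWISH.EXE assumes the SWISH 1.1 naming mentioned earlier, and the file names are the sample files described above. Only the -c, -f and -w options named in this appendix are used.

```python
def index_command(config_file, exe="SWISH.EXE"):
    """Argument list for building an index from a .CON configuration file."""
    return [exe, "-c", config_file]


def search_command(index_file, words, exe="SWISH.EXE"):
    """Argument list for searching an index file for one or more words."""
    return [exe, "-f", index_file, "-w"] + list(words)


make_index = index_command("SAMPLE.CON")
find_words = search_command("SAMPLE.SWI", ["word1", "word2"])
print(" ".join(make_index))
print(" ".join(find_words))
```

Either list could then be handed to a process launcher such as Python's subprocess.run, on a machine where the SWISH executable is on the path.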

-

Appendix B. Acknowledgements and Legal Stuff
The original creator of SWISH (in 1994) was Kevin Hughes (then of EIT). Custody of the rights has since passed to UC Berkeley, which distributes new versions of SWISH as GNU style freeware (see http://www.fsf.org/copyleft/gpl.html for the generic GNU license).

The current (February 1999) home page for the SWISH project is: http://sunsite.berkeley.edu/SWISH-E/

If you are interested in the complete SWISH for OS/2 package, you can look for it on:
* http://sunsite.berkeley.edu/SWISH-E/Ports/OS2/swishe131.zip
* hobbes.nmsu.edu (search for SWISH)
* the SRE-http home page: http://www.srehttp.org/pubfiles/swish11.zip (swish 1.1), or http://www.srehttp.org/pubfiles/swish13.zip (swish 1.3)

GoSWISH was developed by Daniel Hellerstein (danielh@crosslink.net). It too is freeware, with the following GNU-like disclaimer:

Copyright 1998,1999,2001 by Daniel Hellerstein.

Permission to use the GoSWISH software package for any purpose is hereby granted without fee, provided that the author's name not be used in advertising or publicity pertaining to distribution of the software without specific written prior permission. With some provisos, this includes the right to subset and reuse the code, with proper attribution.

Furthermore you may also charge a reasonable re-distribution fee for GoSWISH; with the understanding that this does not remove the work from the public domain and that the above provisos remain in effect.

Note that this disclaimer is only in regard to the various files comprising GoSWISH, which does NOT include the SWISH executable(s) -- see the GNU license for information on distribution of SWISH.

Many kudos to Christopher McRae (christopher.mcrae@mq.edu.au) who ported the version 1.3 C source code (from Berkeley) to OS/2, and who generously created rxSWISH.DLL.

Also thanks to Stewart Buckingham (stu@mailroom.com) who bravely stepped up to the beta testing plate.