
State of Oregon - Online Services

155 Cottage St NE Salem OR 97301
Phone: 503-378-2135    Fax: 503-378-5543
Email: support@dialoregon.net



Directory Search How To


The services below are provided to users of Online Services for use in web pages on this server. Online Services does not provide consulting, development, instruction or training for any of these services, or web page development in general. Please use the documentation provided with each of these services for support. If you feel there are errors in any documentation or problems with any of these services, please contact Online Services Support.

Online Services shall in no way be responsible for any damages or losses resulting from the use of any of these services. Please refer to the Online Services User Agreement for further details on usage policy.


This document explains how to add search capabilities for your own personal files on the web server. Please be sure to read the requirements section, as it provides important information on getting started. The options section provides additional information on configuring and customizing the search engine.

Overview

The directory search script lets users search through files in your public_html directory, or any subdirectories below it. You provide a search form that allows users to enter a simple search query. The search script uses the query to search through the files in each specified directory and returns a hyperlink to any file that matches the query. There are a variety of required inputs and optional features (described below) that you may use to configure and customize the search.

There is no standard for the search form, other than it must conform to the requirements and options below. This tutorial assumes that you have working knowledge of how to create an HTML form and does not teach the specifics of HTML form syntax. A generic sample form is provided as a reference in getting started.

Requirements

1. The user's browser must supply the web server with certain environment variables. Browsers that do not supply these variables will not be able to use your search form. Fortunately, most browsers (such as Netscape and MS Internet Explorer) supply these variables.

2. Your opening <form> tag must be:

<form method="POST" action="/cgi-bin/dirsearch.cgi">

Note: POST should be uppercase or some browsers may not work properly.

The directory search only supports the POST method. It will return an error if any other method is used.

3. You must specify the directories that the search script will search through using the directories variable. You may define several directories at once by separating each with a comma, or you may define the directories variable several times over, each with a different directory or a comma-separated list of directories.

Your public_html directory is the root directory for the directory search. All directory names included in the directories variable must begin with either public_html or "." (period), which can be used in place of public_html. For example, to include your public_html directory in the directory search, set the directories variable to one of the following:

<input type="hidden" name="directories" value="public_html">
<input type="hidden" name="directories" value=".">
All subdirectories of public_html are denoted by their path name beginning from public_html. For example, if you had the directory public_html/files/search, you could add that directory, as well as your public_html directory, to the same search as follows:
<input type="hidden" name="directories" value=".,./files/search">
where "." represents your public_html directory, "./files/search" represents the public_html/files/search directory and "," is the separator between the two directories. Optionally, you can leave off the "./" at the beginning of a subdirectory path (for example, files/search).

If the directories variable is not set to at least one valid directory, the search script will return an error message.
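
The naming rules above can be illustrated with a short Python sketch. It is not the code used by dirsearch.cgi; the function name normalize_directories is hypothetical, and the sketch assumes that the repeated form values arrive as a list of strings. It does not check that a directory exists and does not guard against paths containing "..".

```python
def normalize_directories(values):
    """Expand raw `directories` form values into paths rooted at public_html.

    `values` holds one string per occurrence of the form field; each
    string may itself be a comma-separated list of directories.
    """
    result = []
    for value in values:
        for name in value.split(","):
            name = name.strip()
            if not name:
                continue
            if name in (".", "public_html"):
                path = "public_html"
            elif name.startswith("./"):
                path = "public_html/" + name[2:]
            elif name.startswith("public_html/"):
                path = name
            else:
                # Bare subdirectory path such as "files/search".
                path = "public_html/" + name
            path = path.rstrip("/")
            if path not in result:
                result.append(path)
    if not result:
        raise ValueError("directories must name at least one directory")
    return result
```

With this sketch, the value ".,./files/search" and the two separate values "." and "files/search" both expand to public_html and public_html/files/search.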

4. For each directory you specify in the directories variable, you must create a configuration file called dirsearch and place it in the directory on the web server. The configuration file is used to mark the directory as searchable and also allows you to restrict access to files in the directory. The search script will not search a directory unless the dirsearch file exists and is created as follows:

  1. The first line of the file must be either ALLOW ALL EXCEPT or DENY ALL EXCEPT
  2. All subsequent lines contain the names of files to either ALLOW or DENY (as dictated by the first line of the file). Each line can consist of one or more file names separated by spaces. It is not necessary to add any files below the first line if you wish to allow or deny access to all files in a directory.
  3. The file must be named dirsearch (all lowercase with no file extension) and must be saved as a plain text file.
For example, in order to allow search access to every file in your public_html directory except index.html, you would create a dirsearch file in your public_html directory with the following content:
ALLOW ALL EXCEPT
index.html
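
To make the ALLOW and DENY semantics concrete, here is a minimal Python sketch of how a dirsearch file could be interpreted. The function names parse_dirsearch and is_searchable are hypothetical, and the way the real search script reads the file may differ.

```python
def parse_dirsearch(text):
    """Return (mode, names) from the contents of a dirsearch file."""
    lines = text.splitlines()
    if not lines:
        raise ValueError("dirsearch file is empty")
    first = lines[0].strip()
    if first not in ("ALLOW ALL EXCEPT", "DENY ALL EXCEPT"):
        raise ValueError("first line must be ALLOW ALL EXCEPT or DENY ALL EXCEPT")
    names = set()
    for line in lines[1:]:
        # Each line may hold one or more file names separated by spaces.
        names.update(line.split())
    mode = "allow" if first.startswith("ALLOW") else "deny"
    return mode, names


def is_searchable(filename, mode, names):
    """Listed files get the opposite of the default set by the first line."""
    listed = filename in names
    return not listed if mode == "allow" else listed
```

For the example file above, is_searchable("index.html", ...) is false and every other file name is searchable. With DENY ALL EXCEPT as the first line, only the listed files are searchable.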

5. You must specify the file extensions for each type of file the search script will search through using the file_ext variable. You may define several extensions at once by separating each with a comma, or you may define the file_ext variable several times over, each with a different file extension or a comma-separated list of file extensions.

For instance, if you want to search through HTML pages, you would set the file_ext variable as follows:

<input type="hidden" name="file_ext" value=".htm,.html">
where ".htm" and ".html" represent file extensions for HTML pages and "," is the separator between the two file extensions.

If the file_ext variable is not set to at least one valid file extension, the search script will return an error message.
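
As an illustration only, the following Python sketch shows one way repeated and comma-separated file_ext values could be merged and applied to file names. The function names are hypothetical, and the sketch assumes that extensions are compared case-sensitively, which this document does not state.

```python
def parse_extensions(values):
    """Merge repeated and comma-separated file_ext values into a tuple."""
    extensions = []
    for value in values:
        for ext in value.split(","):
            ext = ext.strip()
            if ext and ext not in extensions:
                extensions.append(ext)
    if not extensions:
        raise ValueError("file_ext must name at least one extension")
    return tuple(extensions)


def has_selected_extension(filename, extensions):
    """True if filename ends with one of the selected extensions."""
    # str.endswith accepts a tuple of candidate suffixes.
    return filename.endswith(extensions)
```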

6. You must define the query variable, which should allow users to input a query for the search. Typically, this variable would be an input text area which allows users to type in a simple query to search on, such as in the following example:

<input type="text" name="query">
There are several restrictions on forming queries that you will want to be aware of and may want to make users of your search aware of.
  1. Words in a query are limited to uppercase/lowercase letters, numbers and spaces. All other characters will be stripped from the query before it is processed, unless the strict variable is set (see option 4 below).
  2. A query must consist of at least a single word, but may contain multiple words separated by spaces.
  3. Words in a query may be joined together in a search with either an AND or OR. A query may consist of either one or more ANDs or one or more ORs, but may NOT contain both AND and OR in the same query. AND will force the search script to return only those files that contain all of the words ANDed together in the query. For example, the query "mice AND men" will return only those files that contain both the words mice and men. OR will return files that contain at least one of the ORed words in the query. For example, the query "cat OR dog" will return any files that contain either the word cat or the word dog (or both words).
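
The query rules can be sketched in Python as follows. This is an illustration, not the search script's own parser. The names parse_query and matches are hypothetical. The sketch assumes that only uppercase AND and OR act as operators, and that a run of words with no operator between them is matched as a single phrase; neither point is stated in this document.

```python
import re


def parse_query(query):
    """Return (operator, terms) after the default clean-up of a query."""
    # Keep only letters, digits and spaces (restriction 1).
    cleaned = re.sub(r"[^A-Za-z0-9 ]", "", query)
    tokens = cleaned.split()
    # If both operators appear, the first one wins in this sketch.
    operator = next((t for t in tokens if t in ("AND", "OR")), "AND")
    terms, current = [], []
    for token in tokens:
        if token in ("AND", "OR"):
            if current:
                terms.append(" ".join(current))
                current = []
        else:
            current.append(token)
    if current:
        terms.append(" ".join(current))
    if not terms:
        raise ValueError("a query needs at least one word")
    return operator, terms


def matches(text, operator, terms):
    """Case-insensitive substring match of the terms against file text."""
    haystack = text.lower()
    hits = [term.lower() in haystack for term in terms]
    return all(hits) if operator == "AND" else any(hits)
```

Here "mice AND men" matches only text that contains both words, while "cat OR dog" matches text that contains either one.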

Options

1. By default, the search is case insensitive. You can make the search case sensitive by setting the case variable. For example, to allow users to decide whether they want a case sensitive search, you can provide the following radio button:
<input type="radio" name="case">
If the user selects the radio button, the case variable will be set and the search will be case sensitive. If the user does not select the radio button, the case variable will not be set and the search will use the default, case-insensitive behavior.

2. By default, the search script will allow a query word to match as a substring of any word in a file. For example, the query car would be a match not only for the word car, but also carry, vicar and escargot. To force the search script to match on complete words only, set the word_match variable. For example, to make every user search match on complete words only, add

<input type="hidden" name="word_match">
to your form. In the above example, this would force the search script to only match the query car with the word car.
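
The effect of the case and word_match variables can be sketched with Python regular expressions. The function word_matches is hypothetical, and its keyword arguments simply mirror the two form variables. The sketch also assumes that a "complete word" is delimited by the regular expression word boundary \b, which may not be how the search script defines a word.

```python
import re


def word_matches(word, text, case=False, word_match=False):
    """Return True if `word` occurs in `text` under the given options."""
    # Case-insensitive unless the case option is set.
    flags = 0 if case else re.IGNORECASE
    pattern = re.escape(word)
    if word_match:
        # Require a word boundary on both sides of the query word.
        pattern = r"\b" + pattern + r"\b"
    return re.search(pattern, text, flags) is not None
```

With the defaults, word_matches("car", "The vicar left") is true; with word_match=True it is false, because car only appears inside vicar.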

3. By default, the minimum length for each word in a query is 3. You can override this default by setting the word_length variable to a new number. For example, to make the minimum word length 4, add

<input type="hidden" name="word_length" value="4">
to your form. If the value of word_length is less than 1 or is not a number, the search reverts to the default length of 3. If the user enters a word that is shorter than the minimum length, an error message is displayed. To produce more accurate search results for your users, it is recommended that you not decrease the minimum word length below 3.
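
The fallback rule for word_length can be written out as a small Python sketch. The names resolve_word_length and too_short are hypothetical, and the sketch assumes that "a number" means a whole number.

```python
DEFAULT_WORD_LENGTH = 3


def resolve_word_length(raw):
    """Return the minimum word length for a raw word_length form value."""
    try:
        value = int(raw)
    except (TypeError, ValueError):
        # Missing or non-numeric value: fall back to the default.
        return DEFAULT_WORD_LENGTH
    return value if value >= 1 else DEFAULT_WORD_LENGTH


def too_short(words, minimum):
    """Return the query words that are shorter than the minimum length."""
    return [word for word in words if len(word) < minimum]
```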

4. The strict variable allows you to control the behavior of the query parser for the search script. By default, the query parser attempts to correct any mistakes a user might make submitting a query. In particular, the parser will do the following by default:

  1. Remove any illegal characters from the query string (Requirement 6.1).
  2. Fix any illegal use of AND or OR (Requirement 6.3). For example, if the query contains both AND and OR, the parser will change all occurrences of each to whichever occurs first in the query.
To override this default behavior, set the strict variable, which will cause the search script to display an error message if the user enters an invalid query. The following hidden form tag activates strict parsing:
<input type="hidden" name="strict">
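
The difference between the default and strict behavior can be sketched in Python. The function prepare_query is hypothetical and covers only the two corrections listed above; other illegal uses of AND or OR, such as an operator at the start of a query, are not handled in this sketch.

```python
import re

OPERATORS = ("AND", "OR")


def prepare_query(query, strict=False):
    """Return a cleaned query string, or raise ValueError in strict mode."""
    if strict:
        if re.search(r"[^A-Za-z0-9 ]", query):
            raise ValueError("query contains illegal characters")
        tokens = query.split()
        if "AND" in tokens and "OR" in tokens:
            raise ValueError("query mixes AND and OR")
        return " ".join(tokens)
    # Default: silently drop illegal characters.
    tokens = re.sub(r"[^A-Za-z0-9 ]", "", query).split()
    first = next((t for t in tokens if t in OPERATORS), None)
    # Rewrite every operator to whichever kind appeared first.
    return " ".join(first if t in OPERATORS else t for t in tokens)
```

By default, "cat OR dog AND bird" is repaired to "cat OR dog OR bird"; with strict=True the same query raises an error instead.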

5. By default, the search script will print out a simple header, the user query and the corresponding URLs to any matching files. To customize the presentation of your results, you can set the template variable to a file in which the results will be displayed. For example, to use a template file called results.html, set the template variable as follows:

<input type="hidden" name="template" value="results.html">
The template file does not have to be an HTML page, but should contain the text and other information you want your results to be displayed within. To place the results within your template file, add the following HTML tag to the file on a line by itself:

<!--#dirsearch-->

The search script will print out your template file and incorporate your results into the file at the point where you placed the tag.
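
The substitution step can be pictured with a short Python sketch. The function render_results is hypothetical, and the sketch assumes that the results have already been formatted as a block of text; how the search script formats its results is not described in this document.

```python
PLACEHOLDER = "<!--#dirsearch-->"


def render_results(template, results):
    """Replace the placeholder line in a template with the results text."""
    output = []
    for line in template.splitlines():
        # The tag is expected on a line by itself.
        if line.strip() == PLACEHOLDER:
            output.append(results)
        else:
            output.append(line)
    return "\n".join(output)
```

All other lines of the template are passed through unchanged, so headings, footers and styling in results.html surround the results.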


Online Services is a division of the Technology Support Center - State of Oregon