Center for International Earth Science Information Network (CIESIN) Columbia University
User's Guide to Searching with the CIESIN Search Engine
R. Bourdeau

This is the second part of a two-part document describing the use of CIESIN's search engine. The first part, the basic search help, is available here.

Advanced Features

TELLING THE SEARCH ENGINE TO KEEP IT SIMPLE
As you saw in the previous section, the search engine has a small number of special words, such as AND, OR, NOT, and WITH that modify the meaning of the search expression. Expressions such as:
death and taxes

will be interpreted as a search for documents containing both of the words "death" and "taxes", rather than as a search for the intended phrase "death and taxes".

You can tell the search engine to ignore the special meaning of words in the search expression by enclosing those portions of the expression in braces, in the form {phrase}.

For example:
death {and} taxes
to be {or not} to be
{to be or not to be}water
{within} $house
(death {and} taxes AND health) NOT social security

The last example above matches only those documents that contain the phrase "death and taxes" as well as the word "health" but excludes documents that contain the phrase "social security".

WORDS THAT ARE NEAR EACH OTHER IN A DOCUMENT
When words or phrases occur near one another in a document, there is a good chance that they relate to a single topic of interest. A search that checks the nearness of words and phrases is called a "proximity search".

The CIESIN search engine allows for nearness of words to be described in two ways:

word1 NEAR word2

and

NEAR( (word1, word2, ...), n)

where n is some number.

In the first form above, the expression will select only those documents containing word1 and word2, and only when word1 and word2 occur within 100 words of each other somewhere in the document.

For example:
death NEAR cause

The second form is more complicated. The expression will select only those documents containing all of the words word1, word2, ..., and only when all of the words occur in a group no longer than n words in length. In other words, there must be some excerpt that can be taken out of the document, consisting of no more than n words, and that excerpt must contain all the search terms.

Consider the following example:
near((red tide,cause,sewage),50)

First, only documents that contain the phrase "red tide", and the words "cause" and "sewage" will be considered.

Consider the following scenario:
Document 1: ... red tide...(30 words)...sewage...(30 words)...cause...
Document 2:  ...cause...(20 words)...red tide...(30 words)...sewage...

With the search expression above, only the second document would match the query, because the total distance from the first word to the last word in Document 1 is 60 words, while in Document 2 the distance is 50 words.
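The arithmetic in this scenario can be sketched in a few lines of Python. This is only an illustration of the idea, not the search engine's actual implementation: the function names and the word offsets are assumptions chosen to mirror the two documents above, and the distance is counted the same way as in the text (from the first term to the last term).

```python
from itertools import product


def min_span(positions):
    """Return the smallest first-to-last distance, in words, over all
    groupings that contain one occurrence of every search term.

    `positions` maps each term to the word offsets where it occurs
    (hypothetical input; a real engine would read these from its index).
    """
    best = None
    for combo in product(*positions.values()):
        span = max(combo) - min(combo)
        if best is None or span < best:
            best = span
    return best


def near(positions, n):
    """True when every term occurs and some grouping spans at most n words."""
    if not positions or any(not offsets for offsets in positions.values()):
        return False
    return min_span(positions) <= n


# Offsets mirror the scenario above (assumed for illustration).
document_1 = {"red tide": [0], "sewage": [30], "cause": [60]}
document_2 = {"cause": [0], "red tide": [20], "sewage": [50]}

print(near(document_1, 50))  # distance is 60 words, so no match
print(near(document_2, 50))  # distance is 50 words, so it matches
```

Trying every combination of offsets is the simplest way to show the rule; it would be too slow for terms that occur many times in a long document.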

The search engine also understands sentences and paragraphs. You can use the WITHIN operator to require that two or more words occur in the same sentence or paragraph, as follows:

(death AND cause) WITHIN SENTENCE
(death AND unnatural) WITHIN PARAGRAPH

USING PATTERNS TO SEARCH
Words can be misspelled, can occur in different tenses, can be pluralized, and can take other forms that make it difficult to find matches using exact matching of words and phrases. To address these problems, the search engine supports a number of pattern matching tools that allow for more flexible searching. Here we will discuss only a few of them: wildcards, word stemming, soundex, and fuzzy searches.

Wildcard (%)

A wildcard, %, matches any number of characters. It is used when you want to specify only a portion of a word when searching. Examples are as follows:
polluti% matches words beginning with "polluti", such as pollution and polluting.
pol%ing matches words beginning with "pol" and ending with "ing", such as polling, polluting, and politicking.
%lution% matches words containing the sequence of letters "lution", such as pollution, solution, and resolutions.
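For readers familiar with regular expressions, the behavior of % can be approximated as follows. This is a minimal sketch, not the engine's own matching code: the helper names are hypothetical, and case-insensitive matching is an assumption.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a %-style wildcard pattern into a compiled regex.

    Each % becomes ".*" (any number of characters, including none);
    everything else is matched literally.
    """
    pieces = (re.escape(piece) for piece in pattern.split("%"))
    # Case-insensitive matching is an assumption made for this sketch.
    return re.compile(".*".join(pieces), re.IGNORECASE)


def matches(pattern, word):
    """True when the whole word fits the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(word) is not None


for word in ("polling", "polluting", "politicking", "pollute"):
    print(word, matches("pol%ing", word))
```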

Word Stem ($)

The stem pattern finds words with the same stem form. This is useful for finding "GOING" and "WENT" from "GO", for instance. Examples:
$go matches words having the same stem as "go", including going, gone, and went.
$pollution matches words having the same stem as "pollution", such as polluting, pollute, and pollutant.

Soundex (!)

A soundex query finds words that sound similar. Example:

!hog
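To give a feel for what "sounds similar" can mean, here is a sketch of the classic American Soundex code, which reduces a word to its first letter plus three digits. This guide does not say which Soundex variant the search engine uses, so treat the function below as an assumption made for illustration only.

```python
def soundex(word):
    """Return the classic four-character American Soundex code of a word."""
    codes = {}
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")):
        for letter in letters:
            codes[letter] = digit

    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""

    result = word[0].upper()
    previous = codes.get(word[0], "")
    for ch in word[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters that share a code
        digit = codes.get(ch, "")
        if digit and digit != previous:
            result += digit
        previous = digit  # a vowel resets this, so a repeat after it counts
    return (result + "000")[:4]  # pad or trim to one letter and three digits


# Words with the same code are treated as sounding alike.
for word in ("hog", "hug", "hawk", "dog"):
    print(word, soundex(word))
```

Under this scheme "hog", "hug", and "hawk" share one code while "dog" does not, because the first letter is always kept.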

Fuzzy (?)

The fuzzy pattern finds words with similar form. This is useful for finding mis-typed or mis-OCR'd words. The fuzzy operator is ?. Example:

?dog
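The idea of matching words of "similar form" can be illustrated with Python's standard difflib module. This is a stand-in technique, not the algorithm the search engine uses: the vocabulary, the function name, and the 0.6 similarity cutoff are all assumptions made for the sketch.

```python
from difflib import get_close_matches


def fuzzy_matches(term, vocabulary, cutoff=0.6):
    """Return the vocabulary words whose spelling is close to `term`.

    Similarity is difflib's ratio (0.0 to 1.0); words scoring below
    `cutoff` are dropped.
    """
    limit = max(1, len(vocabulary))  # get_close_matches requires n > 0
    return get_close_matches(term, vocabulary, n=limit, cutoff=cutoff)


# Hypothetical list of indexed words.
indexed_words = ["dogs", "dig", "dot", "fog", "cat"]
print(fuzzy_matches("dog", indexed_words))
```

Here "cat" is excluded because it shares no letters in sequence with "dog", while the other words differ from it by a single character.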

A SIMPLE TOOL THAT DOES A LOT
ABOUT ()

The ABOUT operator applies word stemming, wildcards, and other patterns to find variations on the words and phrases given in the query. It uses a variety of strategies to find as much information as possible that might be relevant to your search expression. Examples:

about(temperature)
about(global climate change in the southern hemisphere)

THE AMAZINGLY COMPLICATED FINAL EXAMPLE
In order to illustrate the flexibility that you have in defining search criteria, we offer the following very complicated but potentially useful example:
about(causes of disease that result in unnatural death)
AND ($cause near water)
AND ( (pollut% AND infect%) WITHIN SENTENCE )

CIESIN's search engine is built on the InterMedia Text Cartridge from Oracle Corporation, and supports most of the InterMedia query language. Details on the InterMedia query language can be found here.

This page last modified: May 07, 2002