User's Guide to Searching with the CIESIN Search Engine
R. Bourdeau

This is the second part of a two-part document describing the use of CIESIN's search engine. The first, or basic search help, is available here.

Advanced Features
TELLING THE SEARCH ENGINE TO KEEP IT SIMPLE
As you saw in the previous section, the search engine has a small number of special words, such as AND, OR, NOT, and WITH, that modify the meaning of the search expression. An expression such as:

    death and taxes

will be interpreted as a search for documents containing both the words "death" and "taxes", not for the intended phrase "death and taxes". You can tell the search engine to ignore the special meaning of words in the search expression by writing portions of the search expression in the form {phrase}. For example:

    death {and} taxes

Escaped phrases can be combined with the ordinary operators. For instance, a single expression can match only those documents that contain the phrase "death and taxes" as well as the word "health", while excluding documents that contain the phrase "social security".
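The brace escaping described above can be sketched in a few lines of Python. This is an illustration only, not part of the search engine: the helper name `escape_reserved` is hypothetical, and the reserved-word list is limited to the four special words named in this guide (the engine itself may reserve more).

```python
# Assumption: only the special words named in this guide are listed here;
# the real engine may treat additional words as reserved.
RESERVED_WORDS = {"AND", "OR", "NOT", "WITH"}


def escape_reserved(phrase: str) -> str:
    """Wrap each reserved word in braces so it is searched as plain text."""
    escaped = []
    for word in phrase.split():
        if word.upper() in RESERVED_WORDS:
            escaped.append("{" + word + "}")
        else:
            escaped.append(word)
    return " ".join(escaped)


print(escape_reserved("death and taxes"))  # death {and} taxes
```

A helper like this is useful when search text comes from a form field and should always be treated literally.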
WORDS THAT ARE NEAR EACH OTHER IN A DOCUMENT
When words or phrases occur near one another in a document, there is a good chance that they relate to a single topic of interest. A search that checks the nearness of words and phrases is called a "proximity search".

The CIESIN search engine allows the nearness of words to be described in two ways:

    word1 NEAR word2
    near((word1,word2,...),n)

In the first form above, the expression will select only those documents containing word1 and word2, and only when word1 and word2 occur within 100 words of each other somewhere in the document. For example:

    death NEAR cause

The second form is more complicated. The expression will select only those documents containing all of the words word1, word2, ..., and only when all of the words occur in a group no longer than n words in length. In other words, there must be some excerpt that can be taken out of the document, consisting of no more than n words, and that excerpt must contain all the search terms. Consider the following example:

    near((red tide,cause,sewage),50)

First, only documents that contain the phrase "red tide" and the words "cause" and "sewage" will be considered. Consider the following scenario:

    Document 1: ... red tide...(30 words)...sewage...(30 words)...cause...

With the search expression above, Document 1 would not match the query, because the total distance from the first word to the last word is 60 words. A second document in which the same terms span only 50 words would match.

The search engine also understands sentences and paragraphs. You can use the WITHIN operator to indicate that two or more words occur in the same sentence or paragraph, as follows:

    (death AND cause) WITHIN SENTENCE
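The "shortest excerpt" rule for the second form can be illustrated with a small Python sketch. It is not the engine's implementation. The function names are hypothetical, words are split on whitespace only, and the span is counted as the length of the excerpt including the search terms themselves, which is an assumption (the engine may count distance differently, for example by counting only the words in between).

```python
from itertools import product


def find_positions(tokens, term):
    """Return (start, end) token indexes for each occurrence of a word or phrase."""
    words = term.lower().split()
    n = len(words)
    return [
        (i, i + n - 1)
        for i in range(len(tokens) - n + 1)
        if tokens[i:i + n] == words
    ]


def smallest_span(text, terms):
    """Length in words of the shortest excerpt containing every term, or None."""
    tokens = text.lower().split()
    occurrences = [find_positions(tokens, term) for term in terms]
    if any(not occ for occ in occurrences):
        return None  # at least one term is missing from the document
    best = None
    # Brute force: try every combination of one occurrence per term.
    for combo in product(*occurrences):
        start = min(s for s, _ in combo)
        end = max(e for _, e in combo)
        length = end - start + 1
        if best is None or length < best:
            best = length
    return best


def near(text, terms, max_span):
    """True when all terms fit inside an excerpt of at most max_span words."""
    span = smallest_span(text, terms)
    return span is not None and span <= max_span


filler = " ".join(["x"] * 30)
document = f"red tide {filler} sewage {filler} cause"
print(near(document, ["red tide", "cause", "sewage"], 50))  # False
```

The brute-force search over occurrences keeps the sketch short; it is adequate for a handful of terms but would be replaced by a sliding-window scan for long documents.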
USING PATTERNS TO SEARCH
Words can be misspelled, can occur in different tenses, can be pluralized, and can have other forms that make it more difficult to find matches using exact matching of words and phrases. To address these problems, the search engine supports numerous pattern matching tools to allow for more flexible searching. Here we will discuss only a few of them: wildcards, word stemming, soundex, and fuzzy searches.
A wildcard, %, matches any number of characters. It is used when it is desirable to specify only a portion of a word when searching.

The stem pattern finds words with the same stem form. This is useful for finding "GOING" and "WENT" from "GO", for instance.

A soundex query finds words which sound similar.

The fuzzy pattern finds words with similar form. This is useful for finding mis-typed or mis-OCR'd words. The fuzzy operator is ?.
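To make two of these patterns concrete, the following Python sketch mimics wildcard and soundex matching outside the search engine. It is an approximation for illustration: the function names are hypothetical, % is assumed to match zero or more characters, and the soundex shown is the common American Soundex code, which may differ in detail from what the engine uses.

```python
import re

# Consonant groups used by the common American Soundex scheme.
SOUNDEX_CODES = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def wildcard_match(pattern: str, word: str) -> bool:
    """Match a word against a pattern in which % stands for any run of characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.fullmatch(regex, word, re.IGNORECASE) is not None


def soundex(word: str) -> str:
    """Return a four-character code; similar-sounding words share a code."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    previous = SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "hw":
            continue  # h and w do not separate letters with the same code
        digit = SOUNDEX_CODES.get(c, "")
        if digit and digit != previous:
            code += digit
        previous = digit  # vowels reset the previous code
    return (code + "000")[:4]


print(wildcard_match("temp%", "temperature"))   # True
print(soundex("Robert"), soundex("Rupert"))     # R163 R163
```

Two words are treated as sounding alike when their codes are equal, which is why "Robert" and "Rupert" would both be found by a sound-alike search for either one.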
A SIMPLE TOOL THAT DOES A LOT
ABOUT ()
ABOUT applies word stemming, wildcards, and other patterns to find variations on the words and phrases given in the query. It uses a variety of strategies to find as much information as possible that might be relevant to your search expression. For example:

    about(temperature)
THE AMAZINGLY COMPLICATED FINAL EXAMPLE
In order to illustrate the flexibility that you have in defining search criteria, we offer the following very complicated but potentially useful example:
    about(causes of disease that result in unnatural death)
CIESIN's search engine is built on the InterMedia Text Cartridge from Oracle Corporation, and supports most of the InterMedia query language. Details on the InterMedia query language can be found here.
This page last modified: May 07, 2002