Thales: An Open Source Solution

Thales provides a Java full-text resource indexer and search engine. Multiple search contexts may be created. Any LDBC-supported database may be used, including MySQL, Oracle, MS-SQL, HSQLDB and DB2.

thales : noun. [ thA-lEs ] Named after Thales of Miletus (624 BC to 545 BC), the first known Greek philosopher, scientist and mathematician, although his occupation was that of an engineer. He is believed to have been the teacher of Anaximander (611 BC - 545 BC) and was the first natural philosopher in the Milesian School.

Thales differs from other "bag-of-words" indexes in that there may be multiple bags (contexts), and a search may be limited to one or more of them. The engine indexes text and associates the text with a reference. Each reference has a title. Thus an engine that indexes and searches an entire web site might read each page on the site, use the URL and the title element to create the reference and the reference title, and then index the textual contents of the page. In addition, if the author is specified, the author's name might be indexed in the "author" context.

Thales uses JDBC and the LDBC package. The JDBC driver must be one of those supported by LDBC.

Concepts

Words and phrases are indexed within the scope of a Reference. They may further be restricted to a single Context within the reference.

Reference
An object being indexed. Each reference has a name and a title. The name is generally a machine-readable reference, while the title is the human-readable title for the same item. Thales does not store the actual referenced item, just the name and title. So Thales could be used as a catalog of images where the reference name is the actual image URL and the title is a brief description of the image. The text to be indexed would be a long description and perhaps some information about the creator of the image.
Word
Words and/or phrases indexed as whole items. Thales allows phrases to be indexed as complete items. It is up to the application to recreate the same phrases when building the query.
Context
The context in which the word is being used. For example the description of an image or the author of a document.
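
Because a phrase is indexed as a whole item, the application has to produce the identical phrase at query time. The sketch below shows one way to do that, assuming the application normalizes every phrase with the same helper before indexing and before querying. The class and method names are hypothetical and are not part of the Thales API.

```java
import java.util.Locale;

/** Hypothetical helper; not part of the Thales API. */
final class PhraseKeySketch {

    /**
     * Normalizes a phrase so that the text indexed and the text queried
     * collapse to the same whole-item key: trimmed, lower-cased, and with
     * runs of whitespace reduced to a single space.
     */
    static String phraseKey(String phrase) {
        return phrase.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String indexed = phraseKey("  Thales   of Miletus ");
        String queried = phraseKey("thales of miletus");
        // Both calls yield the same key, so the query can match the indexed phrase.
        System.out.println(indexed.equals(queried)); // prints "true"
    }
}
```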

Querying

When querying the database, the context(s) to be searched may be specified; if null is given, all contexts are searched. For example, you may want to index documents with the body of each document in the "body" context and its authors in the "author" context. This allows a query to locate all documents that John Smith authored separately from all documents about John Smith.
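
The John Smith example can be sketched with a small in-memory index. This is only an illustration of the multiple-bag idea: it uses plain Java collections rather than a database, and the class and method names are hypothetical, not the Thales API.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical in-memory sketch of a multi-context index; not the Thales API. */
final class ContextIndexSketch {

    // context name -> word -> names of the references containing that word
    private final Map<String, Map<String, Set<String>>> bags = new HashMap<>();

    /** Splits the text into lower-cased words and files them under one context. */
    void index(String referenceName, String context, String text) {
        Map<String, Set<String>> bag = bags.computeIfAbsent(context, k -> new HashMap<>());
        for (String word : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!word.isEmpty()) {
                bag.computeIfAbsent(word, k -> new TreeSet<>()).add(referenceName);
            }
        }
    }

    /** Looks a word up in the given contexts; a null collection means all contexts. */
    Set<String> find(String word, Collection<String> contexts) {
        Collection<String> scope = (contexts == null) ? bags.keySet() : contexts;
        Set<String> hits = new TreeSet<>();
        for (String context : scope) {
            Map<String, Set<String>> bag = bags.get(context);
            if (bag != null) {
                hits.addAll(bag.getOrDefault(word.toLowerCase(Locale.ROOT),
                        Collections.<String>emptySet()));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        ContextIndexSketch index = new ContextIndexSketch();
        index.index("doc-1", "author", "John Smith");
        index.index("doc-1", "body", "Notes on indexing");
        index.index("doc-2", "author", "Jane Doe");
        index.index("doc-2", "body", "A biography of John Smith");

        // Documents written by Smith: only the "author" bag is searched.
        System.out.println(index.find("smith", Arrays.asList("author"))); // [doc-1]
        // Documents by or about Smith: null searches every context.
        System.out.println(index.find("smith", null)); // [doc-1, doc-2]
    }
}
```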

Words are located by direct hits, synonym hits and soundex hits. The relative value associated with each hit type is set in the ThalesConfig object. Setting a value to 0 (zero) removes that hit type from the calculation. The relative value of each word, the number of words in the query and the number of times the word appears in the database all factor into a final rank for a resource, with the highest rank being the closest match to what was requested. The results are sorted in rank order.
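
The exact rank formula is not given here, so the following is only a sketch of how weighted hit types could combine. It assumes each query word contributes the weight of its best hit type and that the total is divided by the number of query words; it ignores how often a word appears in the database. The class, its constructor and its weights are hypothetical stand-ins, not the ThalesConfig API, and the soundex method is a plain American Soundex written for this example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical ranking sketch; none of these names come from the Thales API. */
final class HitRankSketch {
    private final double directWeight;
    private final double synonymWeight;
    private final double soundexWeight;
    private final Map<String, Set<String>> synonyms = new HashMap<>();

    HitRankSketch(double directWeight, double synonymWeight, double soundexWeight) {
        this.directWeight = directWeight;
        this.synonymWeight = synonymWeight;
        this.soundexWeight = soundexWeight;
    }

    void addSynonym(String word, String synonym) {
        synonyms.computeIfAbsent(word, k -> new HashSet<>()).add(synonym);
    }

    /** American Soundex: first letter plus three digits, zero-padded. */
    static String soundex(String word) {
        String codes = "01230120022455012623010202"; // codes for a..z, '0' = not coded
        StringBuilder out = new StringBuilder();
        char last = '0';
        for (char c : word.toLowerCase(Locale.ROOT).toCharArray()) {
            if (c < 'a' || c > 'z') continue;
            char code = codes.charAt(c - 'a');
            if (out.length() == 0) {
                out.append(Character.toUpperCase(c));
            } else if (code != '0' && code != last) {
                out.append(code);
            }
            if (c != 'h' && c != 'w') last = code; // h and w do not separate equal codes
            if (out.length() == 4) break;
        }
        while (out.length() < 4) out.append('0');
        return out.toString();
    }

    /** Weight of the strongest hit type for one query word; 0 if nothing matches. */
    private double bestHit(String queryWord, Set<String> indexedWords) {
        double best = 0;
        if (indexedWords.contains(queryWord)) best = Math.max(best, directWeight);
        for (String synonym : synonyms.getOrDefault(queryWord, Collections.<String>emptySet())) {
            if (indexedWords.contains(synonym)) best = Math.max(best, synonymWeight);
        }
        if (soundexWeight > 0) { // a zero weight removes this hit type entirely
            String code = soundex(queryWord);
            for (String word : indexedWords) {
                if (soundex(word).equals(code)) best = Math.max(best, soundexWeight);
            }
        }
        return best;
    }

    /** Average best-hit weight over all query words (an assumed formula). */
    double score(List<String> queryWords, Set<String> indexedWords) {
        if (queryWords.isEmpty()) return 0;
        double total = 0;
        for (String queryWord : queryWords) total += bestHit(queryWord, indexedWords);
        return total / queryWords.size();
    }

    public static void main(String[] args) {
        HitRankSketch ranker = new HitRankSketch(1.0, 0.6, 0.3);
        ranker.addSynonym("picture", "image");
        List<String> query = Arrays.asList("picture", "smith");
        Set<String> refA = new HashSet<>(Arrays.asList("image", "smith"));   // synonym + direct
        Set<String> refB = new HashSet<>(Arrays.asList("picture", "smyth")); // direct + soundex
        System.out.println("A: " + ranker.score(query, refA));
        System.out.println("B: " + ranker.score(query, refB));
    }
}
```

With these weights, reference A ranks above reference B because a synonym hit is valued more highly than a soundex hit. Constructing the ranker with a soundex weight of 0 drops the "smyth" match from B's score altogether.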

Finding

When finding words, only exact words are located. This is much quicker than the Query operation but much more restrictive.

Cross Referencing

Once a reference has been located it is possible to find all other references that share the same indexed words. The XRef mechanism provides this functionality.
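
The cross-referencing idea can be sketched as a set intersection. The example assumes that "share the same indexed words" means sharing at least one indexed word, and it reports how many words each related reference has in common with the starting reference. The names are hypothetical and do not reflect the actual XRef API; a database-backed implementation would presumably use the word index rather than scanning every reference.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical cross-reference sketch; not the Thales XRef API. */
final class XRefSketch {

    // reference name -> words indexed for that reference
    private final Map<String, Set<String>> wordsByReference = new TreeMap<>();

    void index(String referenceName, String... words) {
        wordsByReference.computeIfAbsent(referenceName, k -> new HashSet<>())
                .addAll(Arrays.asList(words));
    }

    /** Returns every other reference sharing a word, with the shared-word count. */
    Map<String, Integer> xref(String referenceName) {
        Set<String> mine = wordsByReference.getOrDefault(referenceName,
                Collections.<String>emptySet());
        Map<String, Integer> related = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : wordsByReference.entrySet()) {
            if (entry.getKey().equals(referenceName)) continue; // skip the starting reference
            Set<String> common = new HashSet<>(entry.getValue());
            common.retainAll(mine);
            if (!common.isEmpty()) related.put(entry.getKey(), common.size());
        }
        return related;
    }

    public static void main(String[] args) {
        XRefSketch index = new XRefSketch();
        index.index("img-1", "sunset", "beach", "smith");
        index.index("img-2", "beach", "smith", "portrait");
        index.index("img-3", "mountain");
        System.out.println(index.xref("img-1")); // {img-2=2}
    }
}
```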

Thales Requirements

  • A Java runtime (JRE) version 1.3.1 or above. A complete Java SDK is needed to compile the source code.
  • A JDBC driver and the LDBC package are required. The JDBC driver must be one of those supported by LDBC.
  • Jakarta Log4J for flexible log file handling.
  • The Xenei Utilities.

Optional Packages

  • Jakarta Ant version 1.5.2 or above if you plan to compile the source code.
All trademarks and copyrights are the property of their respective owners.
Copyright © 2002-2004 by Xenei.com, All Rights Reserved