The FullText indexing system for documents - server part



Introduction
This document describes the server part of a working client-server FullText application. The document builds on:



Registry keys used
   CollectionsDirectory
   Dx1Directory
   ExeAndLiterals
   GroupDirectory
   SharedDirectory
   ShellExe
   WorkingDirectory



History
New - 27.3.2000:
  - added command 'X'
  - also returns CAPITALIZED words
New - 17.6.2000:
  - processes HTML files
  - processes UTF-8 input
  - command 'A': returns list of unresolved codes (limited size) in response file
  - works with alternative command file separators
New - 31.12.2000:
  - added command 'Z'
New - 30.1.2001:
  - full formula in 'F' (fulltext_search_mode bits 2048, 4096), see Fult.htm
  - file_id max. size increased to 512 B
New - 6.3.2001:
  - added command 'W'
New - 18.3.2001:
  - added "index without existence check" to 'A'
New - 30.9.2002:
  - added as_well_EN to 'F'
  - added FULLTEXT_SEARCH_PUT_WEIGHT_OUT to 'F'
  - added FULLTEXT_SEARCH_PUT_EXPRESSION_WORDS to 'F'
  - added FULLTEXT_SEARCH_PUT_LANGUAGE_OUT to 'F'

Proposals:
  - print to the console for PACK: count_merged count_discarded duration filesize_out
  - check that 32 bits are not exceeded during MERGE



List of command types for input files (scheme *.I_?)
  The commands are sorted by priority: the priority order is the order in which they are described in this document. In the program source code, the order is given by the FULLTEXT_ExamineSharedIn() routine.
Each command returns a response file (for 'A' and 'D' commands, unless stated otherwise). The types of response files are described after the input files.
Collection names and file_ids are expected in upper ASCII.

  Both input and output files use separator-delimited values. The choice of separator is limited by possible collisions with characters used in the texts:

The separator is detected automatically as the second character of the command file. The same separator is then used in the response file.
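The separator rule above can be illustrated with a short Python sketch. It is not the server's code: the function names are hypothetical, the field values in the test are invented, and the list of permitted separators is not reproduced in this section. A trailing line break is assumed not to belong to the last field.

```python
from typing import List, Tuple


def detect_separator(command_text: str) -> str:
    """Return the separator, i.e. the second character of the command file."""
    if len(command_text) < 2:
        raise ValueError("command file is too short to contain a separator")
    return command_text[1]


def split_fields(command_text: str) -> Tuple[str, List[str]]:
    """Split a command file into (separator, fields).

    Assumption: a trailing line break is not part of the last field.
    """
    sep = detect_separator(command_text)
    return sep, command_text.rstrip("\r\n").split(sep)


def join_fields(sep: str, fields: List[str]) -> str:
    """Build a response line with the same separator, rejecting collisions."""
    for field in fields:
        if sep in field:
            # A field containing the separator would be split wrongly on reading.
            raise ValueError(f"field {field!r} contains the separator {sep!r}")
    return sep.join(fields)
```

Rejecting a value that contains the separator, rather than escaping it, mirrors the text: no escaping mechanism is mentioned, so the separator has to be chosen so that it does not occur in the data.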

The encoding of input data, query words and returned data is given by code_page; internally, the words are always stored in DB code. The codes are:

Each command starts with its id letter, followed by the individual parameters; the last field is the same letter as the id.
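The framing rule can be sketched as a small builder and checker. The parameter list of each command is not given in this section, so the names and the placeholder parameters (P1, P2) are hypothetical; the sketch only shows the id-letter framing and the separator, and it assumes command ids are single upper-case letters, as all commands named in this document are.

```python
from typing import Sequence


def build_command(cmd_id: str, params: Sequence[str], sep: str = "|") -> str:
    """Compose a command: id letter, the parameters, then the id letter again."""
    if len(cmd_id) != 1 or not ("A" <= cmd_id <= "Z"):
        raise ValueError("command id must be a single upper-case letter")
    if len(sep) != 1:
        raise ValueError("separator must be a single character")
    for value in params:
        if sep in value:
            raise ValueError(f"parameter {value!r} collides with separator {sep!r}")
    # The id letter first guarantees the separator is the second character.
    return sep.join([cmd_id, *params, cmd_id])


def is_well_formed(command_text: str) -> bool:
    """Check the framing: the first field is one letter and equals the last field."""
    if len(command_text) < 3:
        return False
    sep = command_text[1]  # separator = second character of the command file
    fields = command_text.rstrip("\r\n").split(sep)
    return len(fields) >= 2 and len(fields[0]) == 1 and fields[0] == fields[-1]
```

Repeating the id letter at the end lets a reader detect a truncated file: if the last field is not the id letter, the command was not written completely.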

List of types for output (response) files (scheme *.O_?)
  If a file type is missing here, the "general response" (described below) is used. Each process performs its work based on file commands, except for:

Indexing of a large number of files



Safety flag
This flag protects all files against power failure or a machine reset. It can be set OFF and ON with a 'Q' command file. Setting it OFF speeds up update operations by a factor of 2, at the cost of possible index corruption if power fails during indexing. It has no effect on retrieval operations.



Example of FullText server installation and usage as the Basic Network version
1.1. On a network machine, create a directory for FullText server files. The machine must be visible from all client machines (see also 1.3).
1.2. Copy the installation files there.
1.3. Under the directory for FullText server files, create the following directories (if you do not, the server creates them itself, but the access rights may not be set correctly):
       - FULLTEXT.IN  as FullText server input directory for user request files,
       - FULLTEXT.OUT as FullText server output directory for FullText server response files,
       - FULLTEXT.SAF as FullText server safety directory for recovery after crash.
     Make sure both the input and output directories are visible and accessible for Create/Read/Write/Delete from all potential clients.
     All clients must have access to either the full paths or the aliases of the collections used. The first collection (in the FullText server list) can also be referenced
     as DEFAULT_COLLECTION. Other collections created under the "Collections" registry value can be accessed as DEFAULT\collection_name.
1.4. Set up the server to run from the Startup menu (or create a shortcut):
       - "ISYS.EXE /FULLTC" for Server with CONSOLE started from its starting directory
       - "ISYS.EXE /FULLT" for Server without CONSOLE started from exe directory.
1.5. Using a program or creating the files by hand, place the installation command files into the input shared directory. At least one 'B' file must be used.
       More command files can be placed there at any time later.
1.6. Run the FullText server. It displays a console window, processes any files in the input shared directory and waits for user requests.
1.7. To shut the server down, place an 'S' command file into the input shared directory.
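The directory layout of step 1.3 can be prepared by a short script, for example from an installer. This is a minimal Python sketch; the function name prepare_server_dirs is hypothetical, and it only creates the directories. The network access rights required by step 1.3 are not set here and still have to be granted by the administrator.

```python
from pathlib import Path
from typing import Dict

# Directory names taken from step 1.3.
SERVER_SUBDIRS = ("FULLTEXT.IN", "FULLTEXT.OUT", "FULLTEXT.SAF")


def prepare_server_dirs(server_root: str) -> Dict[str, Path]:
    """Create the input, output and safety directories under the server directory.

    Existing directories are left untouched, so the call can be repeated.
    Create/Read/Write/Delete rights for the clients are NOT granted here.
    """
    root = Path(server_root)
    created = {}
    for name in SERVER_SUBDIRS:
        path = root / name
        path.mkdir(parents=True, exist_ok=True)
        created[name] = path
    return created
```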

2.1. Install the client programs for the individual applications.
2.2. Data is indexed when a client program writes an appropriate 'A' command file. The indexed data can be:
       - part of the command file,
       - placed as a separate file in a directory visible to the FullText server.
     The FullText server:
       - automatically invalidates the old index (if one exists),
       - builds a new index,
       - sends a notification back to the client (if requested),
       - deletes the command file,
       - does nothing with the separate data file.
2.3. Word occurrences are retrieved when a client program writes an appropriate 'F' or 'G' command file. The FullText server:
       - always places a response file into the output shared directory,
       - deletes the command file.
     It is up to the client program to delete the response file; otherwise, the server deletes such files after a given amount of time.
Client programs must take care to avoid collisions between the names of the files they place into the input shared directory.
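A client-side round trip for steps 2.2 and 2.3 might look like the Python sketch below. Several points are assumptions, not confirmed by this section: command files are named <base>.I_<letter> following the *.I_? scheme; the server answers with <base>.O_<letter> using the same base name; the server ignores files that do not match *.I_? (here a *.TMP file); and a response file is complete once it becomes visible. The function names are hypothetical.

```python
import os
import time
import uuid
from pathlib import Path
from typing import Optional


def submit_command(in_dir: str, cmd_id: str, text: str, encoding: str = "utf-8") -> str:
    """Place a command file into the input shared directory; return its base name.

    A random base name avoids collisions with files written by other clients.
    The file is written under a temporary name and renamed afterwards, so the
    server should never pick up a half-written command (assumption: the
    server ignores names not matching *.I_?).
    """
    base = uuid.uuid4().hex.upper()
    tmp_path = Path(in_dir) / f"{base}.TMP"
    final_path = Path(in_dir) / f"{base}.I_{cmd_id}"
    # The encoding must match the code_page declared for the data (assumption).
    tmp_path.write_text(text, encoding=encoding)
    os.replace(tmp_path, final_path)
    return base


def wait_for_response(out_dir: str, base: str, timeout: float = 30.0,
                      poll: float = 0.2, encoding: str = "utf-8") -> Optional[str]:
    """Poll the output shared directory for <base>.O_? and return its text.

    The response is deleted after reading, because cleanup is the client's job.
    Returns None if no response appears within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Any response letter is accepted, since the response type may differ
        # from the command letter (e.g. the general response).
        matches = sorted(Path(out_dir).glob(f"{base}.O_?"))
        if matches:
            text = matches[0].read_text(encoding=encoding)
            matches[0].unlink()
            return text
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)
```

A retrieval client would call submit_command with an 'F' or 'G' command and then wait_for_response with the returned base name.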

3.1. Index file maintenance (Pack&Merge) is performed automatically by the FullText server.
3.2. An integrity check can be requested by placing a 'U' file into the input shared directory. The FullText server:
       - deletes the command file,
       - always places a 'U' response file into the output shared directory; if no error has been found, the error text is "Check performed O.K.".



Example of FullText server installation and usage as the Basic Standalone version
1.1. Create a directory for FullText server files.
1.2. Copy the installation files there.
1.3. The following directories are created by the server automatically:
       - FULLTEXT.IN  as FullText server input directory for user request files,
       - FULLTEXT.OUT as FullText server output directory for FullText server response files,
       - FULLTEXT.SAF as FullText server safety directory for recovery after crash.
1.4. Set up the server to run from the Startup menu (or create a shortcut):
       - "ISYS.EXE /FULLTC" for Server with CONSOLE started from its starting directory
       - "ISYS.EXE /FULLT" for Server without CONSOLE started from exe directory.
1.5. Using a program or creating the files by hand, place the installation command files into the input shared directory. At least one 'B' file must be used.
       More command files can be placed there at any time later.
1.6. Run the FullText server. It displays a console window, processes any files in the input shared directory and waits for user requests.
1.7. To shut the server down, place an 'S' command file into the input shared directory.

2.x. Same as for Network version.

3.x. Same as for Network version.



Speed considerations
Several factors influence the speed of the FullText indexing system:
1. The file-based communication between server and clients - this is a given feature that cannot be changed.
2. Using 'I' commands instead of 'A' commands - this can increase the speed for large batches (more than 5000 files) to index. For general use this is not
    a good method.
3. Suppressing response files for 'A' or 'I' commands - a good method is to check the response for only every 10th file.
4. Turning the safety flag OFF - this doubles the speed at the cost of possible index corruption if power fails during the indexing process.
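Item 3 can be illustrated with a hypothetical helper that decides, for a batch of files to be indexed, which 'A' (or 'I') commands should ask for a response. How a response is suppressed inside the command file is not described in this section, so the caller has to map the returned flag onto the real command parameter. As a design choice that is not in the original text, the last file of the batch always requests a response, so the final file is always checked.

```python
from typing import Iterable, Iterator, Tuple


def plan_batch(paths: Iterable[str], every: int = 10) -> Iterator[Tuple[str, bool]]:
    """Yield (path, want_response) for each file of an indexing batch.

    A response is requested for every `every`-th file and for the last file;
    all other commands would be sent with their response suppressed.
    """
    if every < 1:
        raise ValueError("every must be at least 1")
    items = list(paths)
    for position, path in enumerate(items, start=1):
        want_response = position % every == 0 or position == len(items)
        yield path, want_response
```

For a batch of 25 files, responses are requested for files 10, 20 and 25.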