(C) Copyright 1999 International Business Machines Corporation
All Rights Reserved
 
Instructions and Information About the www.vm.ibm.com Searcher
 
 
This package contains the searcher we use to index and search
the web pages for www.vm.ibm.com.  The basic idea is that our
site is completely contained in an SFS directory tree.  We
have a scanner (an exec) that walks the tree to detect keywords
and an RSK-based module that reads the scanner's output into
memory and services lookup requests sent by a simple CGI program.
 
The searcher is NOT generalized, so you will have to modify it
to get it to work for your site.  To modify it you will need to
have Rexx, Pipelines, CGI, and HTML skills.
 
There are three main parts to the searcher:
 
 
1.  The Scanner
 
The scanner, KWDS EXEC, consists of these files:
  KWDS EXEC
  KWDS REXX
  TAGS REXX
  BB   CONFIG
 
The scanner walks the SFS directory tree containing our site and
builds a set of CMS files that record which keywords appear in
which URLs.
 
To get the scanner to work for you, you will have to:
 
a.  In KWDS EXEC, change variable 'sitesfsroot' to nominate
    the root of your site
 
b.  create a directory /include/ in your site's directory
    tree and place file BB CONFIG in that directory, OR
    place BB CONFIG somewhere else and modify KWDS EXEC so
    it will be able to find BB CONFIG in the place you put it
 
c.  In file BB CONFIG, modify the EXCLUDE clauses at the end
    of the file to list the directories that should be
    excluded from keyword scans.  (You can delete everything
    else in the file; it is there for another exec we run at
    our site.)
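If you are adapting the scanner, it may help to see the tree walk
and the EXCLUDE pruning in isolation.  The Python sketch below is
not KWDS EXEC.  It walks an ordinary directory tree (standing in
for the SFS tree), skips any directory listed in a hypothetical
exclusion set (standing in for the EXCLUDE clauses in BB CONFIG),
and yields the HTML and HTM files it would scan.

```python
import os

def scan_tree(root, excluded):
    """Yield paths of .html/.htm files under root, skipping excluded dirs.

    `excluded` is a set of directory paths relative to root, written with
    forward slashes, e.g. {"private", "drafts/old"}.  This set is a
    hypothetical stand-in for the EXCLUDE clauses in BB CONFIG.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root).replace(os.sep, "/")
        # Prune excluded subdirectories in place so os.walk never enters them.
        dirnames[:] = sorted(
            d for d in dirnames
            if (d if rel == "." else rel + "/" + d) not in excluded
        )
        for name in sorted(filenames):
            # Only HTML and HTM files are candidates for keyword scanning.
            if name.lower().endswith((".html", ".htm")):
                yield os.path.join(dirpath, name)
```

Pruning `dirnames` in place, rather than filtering results afterward,
keeps the walk from ever reading the excluded subtrees, which matters
when an excluded directory is large.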
 
You must run the scanner in a userid that has write authority
to your entire site.  Perhaps a filepool administrator userid
is a good choice for you - that's what we do here.
 
KWDS EXEC indexes only the part of each HTML file that lies
between the "<!-- BEGIN CONTENT -->" and "<!-- END CONTENT -->"
markers.  (Each file on our site contains a lot of boilerplate,
and the "real content" is between those HTML comments.)  If your
pages are laid out differently, you will need to change KWDS EXEC.
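To make the marker rule concrete, here is a minimal Python sketch
of "index only what is between the content markers".  The
tag-stripping regex and the word pattern are assumptions for
illustration; they are not KWDS EXEC's actual tokenizer.

```python
import re

BEGIN = "<!-- BEGIN CONTENT -->"
END = "<!-- END CONTENT -->"

def content_keywords(html):
    """Return the set of lowercase words between the content markers.

    Pages lacking either marker yield an empty set.  Tags are removed
    with a crude regex; this is illustrative only.
    """
    start = html.find(BEGIN)
    if start == -1:
        return set()
    end = html.find(END, start + len(BEGIN))
    if end == -1:
        return set()
    body = html[start + len(BEGIN):end]
    text = re.sub(r"<[^>]*>", " ", body)          # drop HTML tags
    return {w.lower() for w in re.findall(r"[A-Za-z0-9]+", text)}
```

For example, words in a page's title bar or navigation boilerplate
outside the markers never reach the index.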
 
The scanner leaves output files throughout your site.  The files
it leaves are:
 
 INDEX IX-DIRS   (left only in root)
 This file names the directories the scanner searched.  It is
 read later by the RSK search server.
 
 INDEX IX-FILES  (one in each directory)
 This file names the HTML and HTM files that were scanned and
 records a little data about each scanned file.  The most
 important thing recorded herein is the correspondence between
 scanned files and scanner output files.
 
 nnnnnnnn IX-WORDS (one for each HTM or HTML file scanned)
 This file contains the words found in the scan of one HTM or
 HTML file.  The HTM/HTML file this IX-WORDS file corresponds
 to is recorded in INDEX IX-FILES.
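The relationship among these files amounts to an inverted index:
IX-FILES maps each numbered IX-WORDS file back to its page, and
inverting the word lists gives keyword-to-URL lookups.  The Python
sketch below models that idea with plain dictionaries.  The record
layouts and the AND-style lookup are assumptions; the real files
are CMS records and IXSERV's query semantics are not described here.

```python
from collections import defaultdict

def build_index(ix_files, ix_words):
    """Invert per-file word lists into a keyword -> set-of-URLs map.

    ix_files: dict mapping an IX-WORDS file name (e.g. "00000001") to the
              URL of the page it came from (the role INDEX IX-FILES plays).
    ix_words: dict mapping the same IX-WORDS name to the words in that page.
    Both layouts are hypothetical.
    """
    index = defaultdict(set)
    for words_file, url in ix_files.items():
        for word in ix_words.get(words_file, ()):
            index[word].add(url)
    return index

def lookup(index, *words):
    """Return the URLs containing every given word (an assumed AND search)."""
    sets = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*sets) if sets else set()
```

In this model, the number of keys in the index is the number of
distinct keywords, and the number of distinct URLs is the number
of pages indexed; these loosely correspond to the Kxxxxxxx and
Uxxxxxxx counts that ENROLL LIST reports.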
 
 
2.  The Server
 
The server consists of the following files:
 
IXSERV MODULE
PROFILE $EXEC  (rename to PROFILE EXEC to activate)
PROFILE RSK
IXSERV BKWUMAP
IXSERV BKWSGP
BKWRTE MODULE      (this is actually an RSK part)
BKWUME TEXT        (this is actually an RSK part)
 
You will need to create a userid to run IXSERV MODULE disconnected.
This userid will need to have the following statements in its
CP directory entry:
 
 XCONFIG ADDRSPACE MAXNUMBER 32 TOTSIZE 64G SHARE
 IUCV *MSG MSGLIMIT 65535
 MACHINE XC
 
The rest is pretty much up to you.  Name your userid IXSERV if
possible (if not possible, you must reconfigure file SEARCH CGI -
see below).
 
IXSERV MODULE attempts to allocate some large memory buffers
(4 MB).  I recommend a virtual machine of at least 48 MB; 64 MB
might be required, depending on what else is running in it
(nucleus extensions, for example).  If you get an 801 error from
ssMemoryAllocate, make the virtual machine bigger.
 
Probably the easiest thing for you to do is to put all of the
above-named files on the index server's A-disk.  Then customize
the files, as follows:
 
 PROFILE EXEC         tinker with as desired
 PROFILE RSK          change variable 'sitesfsroot' to nominate
                       the root of your web site
 
IPL the server machine and issue command "IXSERV" to start the
server.  DO THIS ONLY AFTER YOU HAVE RUN THE SCANNER AT LEAST ONCE.
 
For a complete list of the RSK commands you can type at the IXSERV
console, see the RSK Programmer's Guide.  However, here is a very
abbreviated list of commands you might find interesting.
 
- CP cmdstring - issues CP command, writes results to console
 
- CMS cmdstring - issues CMS command, writes results to console
 
- ENROLL LIST - shows you some statistics about the keywords
   and URLs your server has indexed.  The "Entries" column
   is the column of interest.  The number of entries in the
   Uxxxxxxx set is the number of URLs you have indexed.  The
   number of entries in the Kxxxxxxx set is the number of
   distinct keywords in your index.
 
 
3.  The CGI
 
The following files comprise the user interface to your searcher:
 
INDEX HTML
SEARCH CGI
SEARCH FORM
GENSRCH REXX
VMHOME HEADER
VMHOME TRAILER
 
To install:
 
a.  Create a directory /search/ on your site and dump the
    above-mentioned files into it.
 
b.  Edit INDEX HTML to remove everything EXCEPT what's between
    "<!-- BEGIN CONTENT -->" and "<!-- END CONTENT -->".  Supply
    your own HTML at top and bottom instead.
 
c.  If you want to tinker with the search form's appearance, do so
    in INDEX HTML, making the corresponding changes in SEARCH FORM.
 
d.  In SEARCH CGI, change the Rexx constants at the top of the
    exec to nominate your own SFS site root, name of search
    machine, and so on.
 
e.  Tinker with files VMHOME HEADER and VMHOME TRAILER so that
    SEARCH CGI builds HTML to your liking.  (Look at label
    "answerbrowser" in SEARCH CGI to see what it does with the
    header and trailer.  Adjust the header and trailer files
    accordingly.)
 
f.  Make sure your HTTP server machines all have this statement
    in their CP directory entries:
     IUCV *MSG MSGLIMIT 65535
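Step e is easier to plan if you picture the response as header,
then results, then trailer.  The Python sketch below shows that
shape only.  The function name, the result markup, and the idea
that the header and trailer are emitted verbatim are assumptions;
check label "answerbrowser" in SEARCH CGI for what it really does.

```python
import html

def answer_page(header, trailer, query, urls):
    """Wrap a list of result URLs between header and trailer text.

    Hypothetical stand-in for the "answerbrowser" step of SEARCH CGI.
    The query and URLs are HTML-escaped before being placed in the page.
    """
    lines = [header]
    lines.append("<p>%d result(s) for <b>%s</b>:</p>"
                 % (len(urls), html.escape(query)))
    lines.append("<ul>")
    for url in urls:
        u = html.escape(url, quote=True)
        lines.append('<li><a href="%s">%s</a></li>' % (u, u))
    lines.append("</ul>")
    lines.append(trailer)
    return "\n".join(lines)
```

Because the header opens the page and the trailer closes it, any
tags your VMHOME HEADER opens (for example <body>) must be closed
in VMHOME TRAILER.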
 
File SEARCH CGI was written for use with EnterpriseWeb/VM.
 
If you are using Velocity Software's ESAWEB, you might find SEARCH
ESAWEB to be a suitable replacement for SEARCH CGI.  It was
supplied by James Weissman of Velocity Software, Inc., and you
should direct all questions about it to him:  james@velocity-software.com.
 
If you are using some other HTTP server you will have to customize
SEARCH CGI to work with your server.
 
 
3a.  Use of TCP
 
(This section used to be called "use of UDP", but the UDP stuff
never did work quite right, so I deleted it and wrote TCP support
instead.)
 
Here is how you can make SEARCH CGI talk to IXSERV over TCP.
 
a.  Set up the TCP stack so that IXSERV can use port 85 for
    TCP (changes in PROFILE TCPIP).  You can pick a different
    port number if you like; if you do, use it in place of 85
    in the TCP START command below.
 
b.  Add the following commands to PROFILE RSK:
 
    CONFIG NOMAP_TCP ON
    SUBCOM START TCP
    TCP START IXFIND 85 50 0.0.0.0 TCPIP
 
    (In the TCP START command, replace "TCPIP" with the name
    of your TCP/IP stack machine.)
 
c.  In SEARCH CGI, change "comm_method" to "TCP".
 
d.  In SEARCH CGI, change "tcp_p", "tcp_a", and "tcp_s" to be
    the right values for your environment.
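If you need to test the TCP path from some other client, the
general shape of a request/response exchange is sketched below in
Python.  The wire format (one newline-terminated request, with the
reply read until the server closes the connection) and the ASCII
encoding are ASSUMPTIONS for illustration; look at the TCP code in
SEARCH CGI for what IXSERV actually expects.

```python
import socket

def ask_ixserv(sock, query, bufsize=4096):
    """Send one query over a connected socket and return the whole reply.

    Assumed protocol: a single newline-terminated request line, after which
    the server sends its answer and closes the connection.
    """
    sock.sendall(query.encode("ascii") + b"\n")
    sock.shutdown(socket.SHUT_WR)        # tell the server the request is done
    chunks = []
    while True:
        data = sock.recv(bufsize)
        if not data:                     # server closed: reply is complete
            break
        chunks.append(data)
    return b"".join(chunks).decode("ascii")
```

A caller might open the connection with
socket.create_connection(("vmhost.example.com", 85)), where the host
name is hypothetical and 85 is the port chosen in step a.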
 
4.  Ongoing Concerns
 
Your site's content isn't static, so you need to refresh your
keyword index periodically.  The general idea is that you should
configure your system's programmable operator so that it
periodically runs KWDS EXEC and then tells IXSERV to reload
its index.  Each time you want the index recomputed, your
programmable operator must issue these commands:
 
  EXEC KWDS /
  CP MSG IXSERV IXLOAD LOAD 262144 sitesfsroot
 
where "sitesfsroot" is the fully-qualified SFS directory (that is,
filepool:filespace.directory) that is the root of your site.
 
(NB: "262144" is the size of an index data space in 4 KB pages,
that is, 1 GB.  If you are running into out-of-space problems, you
can make this number bigger, up to 524288, which is 2 GB.)
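The page arithmetic is easy to get wrong when scripting the reload,
so here is a small Python sketch that builds the IXLOAD message and
checks the page count against the 524288 limit given above.  The
function name and the sample SFS root are hypothetical.  Note that
524288 pages of 4 KB is 2 GB, and 32 such data spaces total 64 GB,
which is consistent with the XCONFIG ADDRSPACE MAXNUMBER 32
TOTSIZE 64G statement in the server's directory entry.

```python
PAGE_SIZE = 4096               # bytes per CP page
MAX_DATASPACE_PAGES = 524288   # upper limit for the IXLOAD page count

def ixload_command(sitesfsroot, pages=262144):
    """Return (command, data_space_bytes) for an IXLOAD request.

    Raises ValueError if the page count is outside 1..524288.
    """
    if not 0 < pages <= MAX_DATASPACE_PAGES:
        raise ValueError("pages must be between 1 and %d" % MAX_DATASPACE_PAGES)
    cmd = "CP MSG IXSERV IXLOAD LOAD %d %s" % (pages, sitesfsroot)
    return cmd, pages * PAGE_SIZE
```

For instance, ixload_command("WEBPOOL:WEBADMIN.SITE") returns the
default command together with a size of 1073741824 bytes (1 GB).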
 
 
5.  If You Have Problems
 
If you have problems getting this to work, contact me:
 
Brian Wade
IBM VM/ESA Development
bkw at us.ibm.com