Configuration file format -- Attributes
ht://Dig © 1995, 1996, 1997 Andrew Scherpbier
<andrew@contigo.com>
Please see the file COPYING for license
information.
Alphabetical list of attributes
- add_anchors_to_excerpt
-
- type:
- boolean
- used by:
- htsearch
- default:
- true
- description:
- If set to true, the first occurrence of each matched word
in the excerpt
will be linked to the closest anchor in the document.
This only has an effect if the excerpt
is actually going to be displayed.
- example:
- add_anchors_to_excerpt: no
- allow_numbers
-
- type:
- boolean
- used by:
- htdig
- default:
- false
- description:
- If set to true, numbers are considered words. This means
that searches can be done on numbers as well as regular
words. All the same rules apply to numbers as to words.
See the description of
valid_punctuation
for the rules used to determine what a word is.
- example:
- allow_numbers: true
- allow_virtual_hosts
-
- type:
- boolean
- used by:
- htdig
- default:
- true
- description:
- If set to true, htdig will index virtual web sites as
expected. If false, all URL host names will be
normalized to whatever name the DNS server reports for the IP
address. If this option is set to false,
there is no way to index either "soft" or "hard" virtual
web sites.
- example:
- allow_virtual_hosts: false
- bad_extensions
-
- type:
- string list
- used by:
- htdig
- default:
- .wav .gz .z .sit .au .zip .tar .hqx .exe .com .gif .jpg
.jpeg .aiff
- description:
- This is a list of extensions on URLs which are considered
non-parsable. This list is used mainly to supplement the
MIME-types that the HTTP server provides with documents.
Some HTTP servers do not have a correct list of MIME-types
and so can advertise certain documents as text while they
are some binary format.
- example:
- bad_extensions: .foo .bar .bad
- bad_word_list
-
- type:
- string
- used by:
- htdig and
htsearch
- default:
- ${common_dir}/bad_words
- description:
- This specifies a file which contains words which should be
excluded when digging or searching. This list should include
the most common words or other words that you don't want
to be able to search on (things like sex or
smut are examples of these.)
The file should contain one word per line. A sample bad
words file is located in the examples directory.
- example:
- bad_word_list: ${common_dir}/badwords.txt
- common_dir
-
- type:
-
string
- used by:
-
htdig,
htnotify,
htfuzzy,
htmerge and
htsearch
- default:
-
COMMON_DIR
- description:
-
Specifies the directory for files that will or can be
shared among different search databases. The default
value for this attribute is defined at compile time.
- example:
-
common_dir: /tmp
- create_image_list
-
- type:
-
boolean
- used by:
-
htdig
- default:
-
false
- description:
-
If set to true, a file with all the image
URLs that were seen will be created, one URL
per line. This list will not be in any order
and there will be lots of duplicates, so
after htdig has completed, it should be
piped through sort -u to get a
unique list.
- example:
-
create_image_list: yes
- create_url_list
-
- type:
-
boolean
- used by:
-
htdig
- default:
-
false
- description:
-
If set to true, a file with all the URLs
that were seen will be created, one URL per
line. This list will not be in any order and
there will be lots of duplicates, so after
htdig has completed, it should be piped
through sort -u to get a unique list.
- example:
-
create_url_list: yes
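The sort -u step described above can also be done in a few lines of Python. This is only a sketch: the function name unique_urls and the sample URLs are illustrative and are not part of ht://Dig.

```python
def unique_urls(lines):
    """Return the distinct, non-empty URLs from `lines` in sorted order,
    roughly what piping the file through `sort -u` produces."""
    return sorted({line.strip() for line in lines if line.strip()})


# Hypothetical contents of a URL list file written by htdig.
seen = [
    "http://www.somewhere.org/b.html\n",
    "http://www.somewhere.org/a.html\n",
    "http://www.somewhere.org/b.html\n",
]
print(unique_urls(seen))
```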
- database_base
-
- type:
-
string
- used by:
-
htdig,
htnotify,
htfuzzy,
htmerge and
htsearch
- default:
-
${database_dir}/db
- description:
-
This is the common prefix for files that are specific to a
search database. Many different attributes use this
prefix to specify filenames. Several search databases can
share the same directory by just changing this value for
each of the databases.
- example:
-
database_base: ${database_dir}/sales
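Many defaults in this list are written in terms of other attributes, for example ${database_dir}/db. The following sketch shows one way such references could be resolved. It is an illustration only: the expand function and the dictionary layout are assumptions, not ht://Dig's actual configuration parser.

```python
import re


def expand(name, config, _seen=()):
    """Resolve ${other} references in config[name] recursively."""
    if name in _seen:
        raise ValueError("circular reference: " + name)
    value = config[name]
    return re.sub(
        r"\$\{(\w+)\}",
        lambda m: expand(m.group(1), config, _seen + (name,)),
        value,
    )


# Values taken from the examples and defaults in this document.
config = {
    "database_dir": "/var/htdig",
    "database_base": "${database_dir}/sales",
    "doc_db": "${database_base}.docdb",
}
print(expand("doc_db", config))  # /var/htdig/sales.docdb
```

With these values, several search databases can share /var/htdig simply by giving each one a different database_base.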
- database_dir
-
- type:
-
string
- used by:
-
htdig,
htnotify,
htfuzzy,
htmerge and
htsearch
- default:
-
DATABASE_DIR
- description:
-
This is the directory which contains all database and
other files related to ht://Dig. It is never used directly
by any of the programs, but other attributes are defined
in terms of this one.
The default value of this attribute is determined at
compile time.
- example:
-
database_dir: /var/htdig
- doc_db
-
- type:
-
string
- used by:
-
htdig,
htmerge and
htsearch
- default:
-
${database_base}.docdb
- description:
-
This file will contain a GDBM database of documents
indexed by URL. It contains all the information gathered
for each document, so this file can become rather large if
max_head_length
is set to a large value.
- example:
-
doc_db: ${database_base}documents.gdbm
- doc_index
-
- type:
-
string
- used by:
-
htmerge and
htsearch
- default:
-
${database_base}.docs.index
- description:
-
This file will contain a GDBM database which
maps document numbers to document URLs. It
is basically an intermediate database from
the word database to the document database.
- example:
-
doc_index: documents.index.gdbm
- doc_list
-
- type:
-
string
- used by:
-
htdig
- default:
-
${database_base}.docs
- description:
-
This file is basically a text version of the
file specified in doc_db.
Its only use is to have a human readable
database of all documents. The file is easy
to parse with tools like perl or tcl.
- example:
-
doc_list: /tmp/documents.text
- end_elipses
-
- type:
-
string
- used by:
-
htsearch
- default:
-
<b><tt> ...</tt></b>
- description:
-
When excerpts are displayed in the search
output, this string will be appended to the
excerpt if there is text following the text
displayed. This is just a visual reminder to
the user that the excerpt is only part of
the complete document.
- example:
-
end_elipses: ...
- endings_affix_file
-
- type:
-
string
- used by:
-
htfuzzy
- default:
-
${common_dir}/english.aff
- description:
-
Specifies the location of the file which contains the
affix rules used to create the endings search algorithm
databases. Consult the documentation on
htfuzzy for more information on
the format of this file.
- example:
-
endings_affix_file: /var/htdig/affix_rules
- endings_dictionary
-
- type:
-
string
- used by:
-
htfuzzy
- default:
-
${common_dir}/english.0
- description:
-
Specifies the location of the file which contains the
dictionary used to create the endings search algorithm
databases. Consult the documentation on
htfuzzy for more information on
the format of this file.
- example:
-
endings_dictionary: /var/htdig/dictionary
- endings_root2word_db
-
- type:
-
string
- used by:
-
htfuzzy and
htsearch
- default:
-
${common_dir}/root2word.gdbm
- description:
-
This attribute specifies the database filename to be used in
the 'endings' fuzzy search algorithm. The database maps
word roots to all legal words with that root. For more
information about this and other fuzzy search algorithms,
consult the htfuzzy
documentation.
Note that the default value uses the common_dir attribute instead of
the database_dir attribute.
This is because this database can be shared with different
search databases.
- example:
-
endings_root2word_db: /var/htdig/r2t.gdbm
- endings_word2root_db
-
- type:
-
string
- used by:
-
htfuzzy and
htsearch
- default:
-
${common_dir}/word2root.gdbm
- description:
-
This attribute specifies the database filename to be used in
the 'endings' fuzzy search algorithm. The database maps
words to their root. For more information about this and
other fuzzy search algorithms, consult the htfuzzy documentation.
Note that the default value uses the common_dir attribute instead of
the database_dir attribute.
This is because this database can be shared with different
search databases.
- example:
-
endings_word2root_db: /var/htdig/w2r.gdbm
- excerpt_length
-
- type:
-
number
- used by:
-
htsearch
- default:
-
300
- description:
-
This is the maximum number of characters the
displayed excerpt will be limited to. The
first matched word will be bolded in the
middle of the excerpt so that there is some
surrounding context.
The start_elipses and
end_elipses
are used to indicate
that the document contains text before and
after the displayed excerpt respectively.
- example:
-
excerpt_length: 500
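The way the excerpt window and the two ellipsis strings fit together can be sketched as follows. This is a simplified illustration, not htsearch's code: the function name, the plain-text markers and the centering rule are assumptions, and bolding of the matched word is left out.

```python
def make_excerpt(text, word, length=300, start_mark="... ", end_mark=" ..."):
    """Cut at most `length` characters out of `text`, roughly centred on
    the first occurrence of `word`, and mark truncated ends."""
    pos = text.lower().find(word.lower())
    if pos < 0:
        pos = 0  # no match: fall back to the top of the document
    # Centre the window on the middle of the matched word.
    start = max(0, pos + len(word) // 2 - length // 2)
    end = min(len(text), start + length)
    start = max(0, end - length)  # use the full width near the end
    excerpt = text[start:end]
    if start > 0:
        excerpt = start_mark + excerpt   # text exists before the excerpt
    if end < len(text):
        excerpt = excerpt + end_mark     # text exists after the excerpt
    return excerpt


text = "a" * 50 + " needle " + "b" * 50
print(make_excerpt(text, "needle", length=20))
```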
- excerpt_show_top
-
- type:
-
boolean
- used by:
-
htsearch
- default:
-
false
- description:
-
If set to true, the excerpt of a match will always show
the top of the matching document. If it is false (the
default), the excerpt will attempt to show the part of the
document that actually contains one of the words.
- example:
-
excerpt_show_top: yes
- exclude_urls
-
- type:
-
string list
- used by:
-
htdig
- default:
-
cgi-bin
- description:
-
If a URL contains any of the space separated
patterns, it will be rejected. This is used
to exclude such common things as an infinite virtual
web tree that starts with cgi-bin.
- example:
-
exclude_urls: students.html cgi-bin
- heading_factor_1 -
heading_factor_6
-
- type:
-
number
- used by:
-
htdig
- default:
-
heading_factor_1: 5
heading_factor_2: 4
heading_factor_3: 3
heading_factor_4: 1
heading_factor_5: 1
heading_factor_6: 0
- description:
-
This is a factor which will be used to
multiply the weight of words between heading tags.
heading_factor_1 applies to words between <h1> and
</h1> tags, heading_factor_2 to <h2>, and so on.
It is used to assign the level of
importance to certain headers. Setting a factor to 0 will
cause words in this heading to be ignored. The number may
be a floating point number. See also the title_factor and text_factor attributes.
- example:
-
heading_factor_1: 7.75
heading_factor_2: 5.3
heading_factor_3: 2
heading_factor_4: 0
heading_factor_5: 0
heading_factor_6: 0
- htnotify_sender
-
- type:
-
string
- used by:
-
htnotify
- default:
-
webmaster@www
- description:
-
This specifies the email address that htnotify email
messages get sent out from. The address is forged using
/usr/lib/sendmail. Check htnotify/htnotify.cc for
details on how this is done.
- example:
- htnotify_sender: bigboss@yourcompany.com
- http_proxy
-
- type:
-
string
- used by:
-
htdig
- default:
-
<empty>
- description:
-
When this attribute is set, all HTTP document retrievals
will be done using the HTTP-PROXY protocol. The URL
specified in this attribute points to the host and port
where the proxy server resides.
The use of a proxy server can greatly improve the performance
of the indexing process.
- example:
- http_proxy: http://proxy.bigbucks.com:3128
- image_list
-
- type:
-
string
- used by:
-
htdig
- default:
-
${database_base}.images
- description:
-
This is the file that a list of image URLs gets written to
by htdig when the create_image_list attribute is set to
true. As image URLs are seen, they are just appended to
this file, so after htdig finishes it is probably a good
idea to run sort -u on the file to eliminate
duplicates from the file.
- example:
- image_list: allimages
- keyword_factor
-
- type:
-
number
- used by:
-
htdig
- default:
-
100
- description:
-
This is a factor which will be used to
multiply the weight of words in the list of keywords of a document.
The number may
be a floating point number. See also the title_factor and text_factor attributes.
- example:
-
keyword_factor: 12
- keywords_meta_tag_names
-
- type:
-
list
- used by:
-
htdig
- default:
-
keywords htdig-keywords
- description:
-
The words in this list are used to search for keywords in HTML
META tags. This list can contain any number of strings
that each will be seen as the name for whatever keyword convention
is used.
The META tags have the following format:
<META name="somename" value="somevalue">
- example:
-
keywords_meta_tag_names: keywords description
- limit_urls_to
-
- type:
-
string list
- used by:
-
htdig
- default:
-
.sdsu.edu/
- description:
-
This specifies a set of patterns that all
URLs have to match against in order for them
to be included in the search.
Any number of strings can be specified,
separated by spaces. If multiple patterns
are given, at least one of the patterns has
to match the URL.
Matching is a case-insensitive string match
on the URL to be used. The match will be performed
after the relative references have been converted
to a valid URL. This means that the URL will
always start with http://.
Granted, this is not the
perfect way of doing this, but it is simple
enough and it covers most cases.
- example:
-
limit_urls_to: .sdsu.edu kpbs
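The matching rule described above (at least one pattern must match, ignoring case) can be sketched like this. The function name is hypothetical, and treating each pattern as a plain substring of the URL is an assumption about what "string match" means here.

```python
def url_in_limits(url, limit_urls_to):
    """True if `url` contains at least one of the space-separated
    patterns, compared case-insensitively (assumed substring match)."""
    lowered = url.lower()
    return any(p.lower() in lowered for p in limit_urls_to.split())


limits = ".sdsu.edu kpbs"
print(url_in_limits("http://WWW.SDSU.EDU/index.html", limits))  # True
print(url_in_limits("http://www.kpbs.org/", limits))            # True
print(url_in_limits("http://www.ucsd.edu/", limits))            # False
```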
- locale
-
- type:
-
string
- used by:
-
htdig
- default:
-
iso_8859_1
- description:
-
Set this to whatever locale you want your search
database to cover. It affects the way international
characters are dealt with.
On most systems a list of legal locales can be found in
/usr/lib/locale. Also check the
setlocale(3C) man page.
- example:
-
locale: en_US
- maintainer
-
- type:
-
string
- used by:
-
htdig
- default:
-
badguy@localhost
- description:
-
This should be the email address of the
person in charge of the digging operation.
This string is added to the user-agent:
field when the digger sends a request to a
server.
- example:
-
maintainer: ben.dover@uptight.com
- match_method
-
- type:
-
string
- used by:
-
htsearch
- default:
-
or
- description:
-
This is the default method for matching that htsearch
uses. The valid choices are:
- or
- and
- boolean
This attribute will only be used if the HTML form that
calls htsearch didn't have the method
value set.
- example:
-
match_method: boolean
- matches_per_page
-
- type:
-
number
- used by:
-
htsearch
- default:
-
10
- description:
-
If this is set to a relatively small number, the matches
will be shown in pages instead of all at once.
- example:
-
matches_per_page: 999
- max_description_length
-
- type:
-
number
- used by:
-
htdig
- default:
-
60
- description:
-
While gathering descriptions of URLs, htdig will only record
those descriptions which are shorter than
this length. This is used mostly to deal
with broken HTML. (If a hyperlink is not
terminated with a </a> the description
will go on until the end of the document.)
- example:
-
max_description_length: 40
- max_doc_size
-
- type:
-
number
- used by:
-
htdig
- default:
-
100000
- description:
-
This is the upper limit to the amount of data retrieved
for documents. This is mainly used to prevent
unreasonable memory consumption since each document will
be read into memory by htdig.
- example:
-
max_doc_size: 5000000
- max_head_length
-
- type:
-
number
- used by:
-
htdig
- default:
-
512
- description:
-
For each document retrieved, the top of the
document is stored. This attribute
determines the size of this block. The text
that will be stored is only the text; no
markup is stored.
We found that storing 50,000 bytes will
store about 95% of all the documents
completely. This really depends on how much
storage is available and how much you want
to show.
- example:
-
max_head_length: 50000
- max_hop_count
-
- type:
-
number
- used by:
-
htdig
- default:
-
999999
- description:
-
Instead of limiting the indexing process by URL pattern,
it can also be limited by the number of hops or
clicks a document is removed from the starting URL.
Unfortunately, this only works reliably when a complete
index is created, not an update.
The starting page will have hop count 0.
- example:
-
max_hop_count: 4
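Hop counting amounts to a breadth-first walk from the starting URL. The sketch below shows the idea on an in-memory link table. The function name and the links dictionary are made up for illustration; htdig itself fetches pages over the network and is not structured this way.

```python
from collections import deque


def hop_counts(links, start_url, max_hop_count):
    """Breadth-first walk over `links` (url -> list of linked urls),
    returning {url: hops} for every page within max_hop_count hops."""
    hops = {start_url: 0}           # the starting page has hop count 0
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if hops[url] == max_hop_count:
            continue                # do not follow links any further
        for target in links.get(url, []):
            if target not in hops:
                hops[target] = hops[url] + 1
                queue.append(target)
    return hops


# Hypothetical site: "/" links to "/a" and "/b", "/a" to "/c", "/c" to "/d".
links = {"/": ["/a", "/b"], "/a": ["/c"], "/c": ["/d"]}
print(hop_counts(links, "/", 2))
```

With a limit of 2, the page "/d" is three clicks away from the start and is therefore left out.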
- max_stars
-
- type:
-
number
- used by:
-
htsearch
- default:
-
4
- description:
-
When stars are used to display the score of
a match, this value determines the maximum
number of stars that can be displayed.
- example:
-
max_stars: 6
- metaphone_db
-
- type:
-
string
- used by:
-
htfuzzy and
htsearch
- default:
-
${database_base}.metaphone.gdbm
- description:
-
The database file used for the fuzzy "metaphone" search
algorithm. This database is created by htfuzzy and used by htsearch.
- example:
-
metaphone_db: ${database_base}.mp.db
- method_names
-
- type:
- string list
- used by:
- htsearch
- default:
-
and All or Any boolean Boolean
- description:
- These values are used to create the
method menu. It consists of pairs.
The first element of each pair is one of the known
methods, the second element is the text that will be
shown in the menu for that method. This text needs to
be quoted if it contains spaces.
- example:
- method_names: or Or and And
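A list of pairs with optional quoting can be split as shown below. This is a sketch under the assumption that the quoting behaves like shell-style quoting, which Python's shlex module implements; ht://Dig's own parser may treat quotes differently. The label "Any word" is an invented example.

```python
import shlex


def parse_pairs(value):
    """Split a list attribute into (method, label) pairs, honouring quotes."""
    tokens = shlex.split(value)
    if len(tokens) % 2:
        raise ValueError("expected an even number of elements")
    return list(zip(tokens[0::2], tokens[1::2]))


print(parse_pairs('and All or "Any word" boolean Boolean'))
```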
- minimum_word_length
-
- type:
- number
- used by:
- htdig and
htsearch
- default:
-
3
- description:
- This sets the minimum length of words that will be
indexed. Words shorter than this value will be silently
ignored but still put into the excerpt.
Note that by making this value less than 3, a lot more
words that are very frequent will be indexed. It might
be advisable to add some of these to the bad_word_list file.
- example:
- minimum_word_length: 2
- next_page_text
-
- type:
- string
- used by:
- htsearch
- default:
- [next]
- description:
-
The text displayed in the hyperlink to go to the next
page of matches.
- example:
- next_page_text: <img
src="/htdig/buttonr.gif">
- no_excerpt_text
-
- type:
- string
- used by:
- htsearch
- default:
- <em>(None of the search words were found in the top
of this document.)</em>
- description:
- This text will be displayed in place of the excerpt if
there is no excerpt available. If this attribute is set
to nothing (blank), the excerpt label will not be
displayed in this case.
- example:
- no_excerpt_text:
- no_next_page_text
-
- type:
- string
- used by:
- htsearch
- default:
- [next]
- description:
-
The text displayed where there would normally be a
hyperlink to go to the next page of matches.
- example:
- no_next_page_text:
- no_prev_page_text
-
- type:
- string
- used by:
- htsearch
- default:
- [prev]
- description:
-
The text displayed where there would normally be a
hyperlink to go to the previous page of matches.
- example:
- no_prev_page_text:
- nothing_found_file
-
- type:
-
string
- used by:
-
htsearch
- default:
-
${common_dir}/nothing_found.html
- description:
-
This specifies the file which contains the HTML
text to display when no matches were found.
The file should contain a complete HTML
document.
Note that this attribute could also be defined in terms of
database_base to make it
specific to the current search database.
- example:
-
nothing_found_file: /www/searching/nothing.html
- prev_page_text
-
- type:
- string
- used by:
- htsearch
- default:
- [prev]
- description:
-
The text displayed in the hyperlink to go to the previous
page of matches.
- example:
- prev_page_text: <img
src="/htdig/buttonl.gif">
- remove_bad_urls
-
- type:
- boolean
- used by:
- htmerge
- default:
- false
- description:
-
If TRUE, htmerge will remove from the database any URLs
which htdig marked as unreachable. If FALSE, it will
not do this. When htdig is run in initial mode, documents
which were referred to but could not be accessed should
probably be removed, so this option should then be
set to TRUE. If htdig is run to update the
database, however, this may cause documents on a server which is
temporarily unavailable to be removed. This is probably
NOT what was intended, so this option should be set
to FALSE in that case.
- example:
- remove_bad_urls: true
- robotstxt_name
-
- type:
- string
- used by:
- htdig
- default:
- htdig
- description:
-
Sets the name that htdig will look for when parsing
robots.txt files. This can be used to make htdig appear
as a different spider than ht://Dig. Useful to
distinguish between a private and a global index.
- example:
- robotstxt_name: myhtdig
- search_algorithm
-
- type:
- string list
- used by:
- htsearch
- default:
- exact:1
- description:
- Specifies the search algorithms and their weight to use
when searching.
Each entry in the list consists of the algorithm name,
followed by a colon (:) followed by a weight multiplier.
The multiplier is a floating point number between 0 and 1.
Current algorithms supported are:
- exact
-
The default exact word matching algorithm. This
will find only exactly matched words.
- soundex
-
Uses a slightly modified soundex algorithm to match
words. This requires that the soundex database be
present. It is generated with the
htfuzzy program.
- metaphone
-
Uses the metaphone algorithm for matching words.
This algorithm is more specific to the English
language than soundex. It requires the metaphone
database, which is generated with the
htfuzzy program.
- endings
-
This algorithm uses language specific word endings
to find matches. Each word is first reduced to its
word root and then all known legal endings are used
for the matching. This algorithm uses two databases
which are generated with
htfuzzy.
- synonyms
-
Performs a dictionary lookup on all the words.
This algorithm uses a database generated with the htfuzzy program.
- example:
- search_algorithm: exact:1 soundex:0.3
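The name:weight entries described above are easy to take apart. The following sketch parses the example value into pairs and enforces the 0 to 1 range. The function name is illustrative and this is not htsearch's own parser.

```python
def parse_search_algorithm(value):
    """Turn 'exact:1 soundex:0.3' into [('exact', 1.0), ('soundex', 0.3)]."""
    result = []
    for entry in value.split():
        name, _, weight = entry.partition(":")
        weight = float(weight)  # raises ValueError if the weight is missing
        if not 0.0 <= weight <= 1.0:
            raise ValueError("weight must be between 0 and 1: " + entry)
        result.append((name, weight))
    return result


print(parse_search_algorithm("exact:1 soundex:0.3"))
```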
- search_results_footer
-
- type:
-
string
- used by:
-
htsearch
- default:
-
${common_dir}/footer.html
- description:
-
This specifies a filename to be output at the end of
search results. While outputting the footer, some
variables will be expanded. Variables use the same syntax
as the Bourne shell. If there is a variable VAR, the
following will all be recognized:
- $VAR
- $(VAR)
- ${VAR}
The following variables are available:
- MATCHES
-
The number of documents that were matched.
- PLURAL_MATCHES
-
If MATCHES is not 1, this will be the string "s",
else it is an empty string. This can be used to say
something like "$(MATCHES) document$(PLURAL_MATCHES)
were found"
- MAX_MATCHES
-
The maximum number of matches displayed.
- MAX_STARS
-
The value of the max_stars
attribute.
- LOGICAL_WORDS
-
A string of the search words with either "and" or
"or" between the words, depending on the type of
search.
- WORDS
-
A string of the search words with spaces in between.
Note that this file will NOT be
output if no matches were found. In this case the nothing_found_file
attribute is used instead.
- example:
-
search_results_footer: /usr/local/etc/ht/end-stuff.html
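The variable expansion can be pictured with a small sketch. It handles only the $(VAR) form that appears in the PLURAL_MATCHES example above, and the function names are assumptions made for illustration, not htsearch internals.

```python
import re


def expand_template(template, variables):
    """Replace $(VAR) references; unknown names are left untouched."""
    def substitute(match):
        name = match.group(1)
        return str(variables.get(name, match.group(0)))
    return re.sub(r"\$\((\w+)\)", substitute, template)


def footer_variables(matches):
    """Build the two match-count variables described above."""
    return {
        "MATCHES": matches,
        # "s" unless exactly one document matched
        "PLURAL_MATCHES": "" if matches == 1 else "s",
    }


line = "$(MATCHES) document$(PLURAL_MATCHES) were found"
print(expand_template(line, footer_variables(1)))  # 1 document were found
print(expand_template(line, footer_variables(3)))  # 3 documents were found
```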
- search_results_header
-
- type:
-
string
- used by:
-
htsearch
- default:
-
${common_dir}/header.html
- description:
-
This specifies a filename to be output at the start of
search results. While outputting the header, some
variables will be expanded. Variables use the same syntax
as the Bourne shell. If there is a variable VAR, the
following will all be recognized:
- $VAR
- $(VAR)
- ${VAR}
The following variables are available:
- MATCHES
- The number of documents that were matched.
- PLURAL_MATCHES
- If MATCHES is not 1, this will be the string "s",
else it is an empty string. This can be used to say
something like "$(MATCHES) document$(PLURAL_MATCHES)
were found"
- MAX_MATCHES
- The maximum number of matches displayed.
- MAX_STARS
- The value of the max_stars attribute.
- LOGICAL_WORDS
- A string of the search words with either "and" or
"or" between the words, depending on the type of search.
- WORDS
- A string of the search words with spaces in between.
Note that this file will NOT be
output if no matches were found. In this case the nothing_found_file
attribute is used instead.
- example:
- search_results_header: /usr/local/etc/ht/start-stuff.html
- soundex_db
-
- type:
-
string
- used by:
-
htfuzzy and
htsearch
- default:
-
${database_base}.soundex.gdbm
- description:
-
The database file used for the fuzzy "soundex" search
algorithm. This database is created by htfuzzy and used by htsearch.
- example:
-
soundex_db: ${database_base}.snd.gdbm
- star_blank
-
- type:
-
string
- used by:
-
htsearch
- default:
-
${image_url_prefix}/star_blank.gif
- description:
-
This specifies the URL to use to display a blank of the
same size as the star defined in the star_image attribute or in the
star_patterns attribute.
- example:
-
star_blank: http://www.somewhere.org/icons/elephant.gif
- star_image
-
- type:
-
string
- used by:
-
htsearch
- default:
-
${image_url_prefix}/star.gif
- description:
-
This specifies the URL to use to display a
star. This allows you to use some other icon
instead of a star. (We like the star...)
The display of stars can be turned on or off
with the use_star_image
attribute
and the maximum number of stars that can be
displayed is determined by the
max_stars attribute.
Even though the image can be changed, the ALT
value for the image will always be a '*'.
- example:
-
star_image: http://www.somewhere.org/icons/elephant.gif
- star_patterns
-
- type:
-
string list
- used by:
-
htsearch
- default:
-
- description:
-
This attribute allows the star image to be changed
depending on the URL or the match it is used for. This
is mainly to make a visual distinction between matches
on different web sites. The star image could be
replaced with the logo of the company the match refers to.
It is advisable to keep all the images the same size in
order to line things up properly in a short result listing.
The format is simple. It is a list of pairs. The first
element of each pair is a pattern, the second element is
a URL to the image for that pattern.
- example:
- star_patterns: http://www.sdsu.edu /sdsu.gif \
http://www.ucsd.edu /ucsd.gif
- start_elipses
-
- type:
-
string
- used by:
-
htsearch
- default:
-
<b><tt>... </tt></b>
- description:
-
When excerpts are displayed in the search
output, this string will be prepended to the
excerpt if there is text before the text
displayed. This is just a visual reminder to
the user that the excerpt is only part of
the complete document.
- example:
-
start_elipses: ...
- start_url
-
- type:
-
string list
- used by:
-
htdig
- default:
-
http://www/
- description:
-
This is the list of URLs that will be used to start a dig
when there was no existing database. Note that multiple
URLs can be given here.
- example:
-
start_url: http://www.somewhere.org/alldata/index.html
- substring_max_words
-
- type:
-
integer
- used by:
-
htsearch
- default:
-
25
- description:
-
The substring fuzzy algorithm could potentially match a
very large number of words. This value limits the
number of words each substring pattern can match. Note
that this does not limit the number of documents that
are matched in any way.
- example:
-
substring_max_words: 100
- synonym_dictionary
-
- type:
-
string
- used by:
-
htfuzzy
- default:
-
${common_dir}/synonyms
- description:
-
This points to a text file containing the synonym
dictionary used for the synonyms search algorithm.
Each line of this file has at least two words. The
first word is the word to replace, the rest of the words
are synonyms for that word.
- example:
-
synonym_dictionary: /usr/dict/synonyms
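The line format described above maps directly onto a lookup table. A minimal sketch follows, with an invented sample dictionary. The function name is hypothetical, and skipping lines with fewer than two words is an assumption.

```python
def parse_synonyms(lines):
    """Map the first word of each line to the list of its synonyms.
    Lines with fewer than two words are skipped."""
    table = {}
    for line in lines:
        words = line.split()
        if len(words) >= 2:
            table[words[0]] = words[1:]
    return table


# Hypothetical dictionary contents, one entry per line.
sample = ["car automobile vehicle\n", "\n", "quick fast\n"]
print(parse_synonyms(sample))
```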
- synonym_db
-
- type:
-
string
- used by:
-
htsearch and
htfuzzy
- default:
-
${common_dir}/synonyms.gdbm
- description:
-
Points to the database that htfuzzy creates when the
synonyms algorithm is used.
htsearch uses this to
perform synonym dictionary lookups.
- example:
-
synonym_db: ${database_base}.syn.gdbm
- syntax_error_file
-
- type:
-
string
- used by:
-
htsearch
- default:
-
${common_dir}/syntax.html
- description:
-
This points to the file which will be displayed if a
boolean expression syntax error was found.
- example:
-
syntax_error_file: ${common_dir}/synerror.html
- template_map
-
- type:
-
string list
- used by:
-
htsearch
- default:
-
Long builtin-long builtin-long Short builtin-short builtin-short
- description:
-
This maps match template names to internal names and
template file names. It is a list of triplets. The
first element in each triplet is the name that will be
displayed in the FORMAT menu. The second element is the
name used internally and the third element is a filename
of the template to use.
There are two predefined templates, namely
builtin-long and
builtin-short. If the filename is one
of those, they will be used instead.
More information about templates can be found in the htsearch documentation.
- example:
- template_map: Short short ${common_dir}/short.html \
Normal normal builtin-long \
Detailed detail ${common_dir}/detail.html
- template_name
-
- type:
-
string
- used by:
-
htsearch
- default:
-
builtin-long
- description:
-
Specifies the default template if none is given by the
search form. This needs to match one of the internal names defined in template_map.
- example:
-
template_name: long
- text_factor
-
- type:
-
number
- used by:
-
htdig
- default:
-
1
- description:
-
This is a factor which will be used to
multiply the weight of words that are not in any special
part of a document.
Setting a factor to 0 will
cause normal words to be ignored. The number may
be a floating point number. See also the heading_factor_[1-6], title_factor, and keyword_factor attributes.
- example:
-
text_factor: 0
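To get a feel for how the various factor attributes interact, the sketch below adds up a word's weight from the places where it occurs, using the default values listed in this document. It assumes that every occurrence carries the same base weight of 1, which is a simplification made for illustration. The weighting htdig really applies may differ.

```python
# Default factors as listed in this document.
FACTORS = {
    "title": 100, "keywords": 100, "text": 1,
    "h1": 5, "h2": 4, "h3": 3, "h4": 1, "h5": 1, "h6": 0,
}


def word_weight(occurrences, factors=FACTORS, base=1.0):
    """Sum base * factor over the places a word occurs.
    ASSUMPTION: every occurrence has the same base weight; the real
    htdig weighting may differ."""
    return sum(base * factors[where] for where in occurrences)


print(word_weight(["title", "h1", "text", "text"]))  # 107.0
print(word_weight(["h6"]))                            # 0.0
```

A factor of 0, as with the default heading_factor_6, makes occurrences in that part of the document contribute nothing.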
- timeout
-
- type:
-
number
- used by:
-
htdig
- default:
-
30
- description:
-
Specifies the time the digger will wait to
complete a network read. This is just a
safeguard against unforeseen things like the
all too common transformation from a network
to a notwork.
The timeout is specified in seconds.
- example:
-
timeout: 42
- title_factor
-
- type:
-
number
- used by:
-
htdig
- default:
-
100
- description:
-
This is a factor which will be used to
multiply the weight of words in the title of a document.
Setting a factor to 0 will
cause words in the title to be ignored. The number may
be a floating point number. See also the heading_factor_[1-6] attribute.
- example:
-
title_factor: 12
- url_list
-
- type:
-
string
- used by:
-
htdig
- default:
-
${database_base}.urls
- description:
-
This file is only created if
create_url_list is
set to true. It will contain a list of all URLs that were
seen.
- example:
-
url_list: /tmp/urls
- use_star_image
-
- type:
-
boolean
- used by:
-
htsearch
- default:
-
true
- description:
-
If set to true, the star_image
attribute is used to display up to
max_stars images for each match.
- example:
-
use_star_image: no
- valid_punctuation
-
- type:
-
string
- used by:
-
htdig and
htsearch
- default:
-
.-_/!#$%^&*'
- description:
-
This is the set of characters which will be
deleted from the document before determining
what a word is. This means that if a
document contains something like
Andrew's the digger will see
this as Andrews.
The same transformation is performed on the
keywords the search engine gets.
- example:
-
valid_punctuation: -'
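The effect of deleting these characters can be tried out with the sketch below, which also applies the minimum_word_length cutoff. It is a simplification: the function name is made up, and splitting on whitespace only is an assumption, since the digger's real word-splitting rules are not spelled out here.

```python
def index_words(text, valid_punctuation=".-_/!#$%^&*'", minimum_word_length=3):
    """Delete the valid_punctuation characters, then split into words
    and drop words shorter than minimum_word_length."""
    stripped = text.translate({ord(c): None for c in valid_punctuation})
    return [w for w in stripped.split() if len(w) >= minimum_word_length]


print(index_words("Andrew's web-site is up"))  # ['Andrews', 'website']
```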
- word_db
-
- type:
-
string
- used by:
-
htdig,
htmerge and
htsearch
- default:
-
${database_base}.words.gdbm
- description:
-
This is the main word database. It is an
index of all the words to a list of
documents that contain the words. This
database can grow large pretty quickly.
- example:
-
word_db: ${database_base}.allwords.gdbm
- word_list
-
- type:
-
string
- used by:
-
htdig and
htmerge
- default:
-
${database_base}.wordlist
- description:
-
This is the input file that htmerge
uses to create
the main words database specified by
word_db.
This file gets about as large as the main
words database. If this file exists when
htdig is running, it will append data to
this file. htmerge will then use the
existing data and the appended data to
create a completely new main word database.
- example:
-
word_list: ${database_base}.allwords.text
Last modified: Thu Jul 3 10:31:29 PDT
andrew@contigo.com