Hi,
I'm trying to make some changes to search.conf on CA 2.0.3.
In Italian, many words are joined by an apostrophe ('),
for example "dell'opera" or "l'esercizio", and in the ca_sql_search_word table I see these words indexed as "dellopera" or "lesercizio".
I can't use stop words because some of the candidate words, like "un" and "uno", are too significant:
for example, "uno nessuno e centomila" is a famous work by Pirandello.
Sorry, but I'm not very good with regular expressions...
How can I make the apostrophe act as a separator character?
Many thanks for your help,
antonio.
Below is the change I tried to make:
# Stop words are common terms unlikely to be selective and therefore filtered from search input
# prior to execution of the search. Filtering stop words may reduce the size of the search
# index and improve performance. However, it may adversely affect search accuracy and
# relevance if the stop word list contains terms significant for current content.
#
# Stop words will be applied to all cataloguing languages for which a word list is defined
use_stop_words = 0
# Regular expression defining characters to be considered whitespace when indexing using
# the SqlSearch2 search engine plugin
#whitespace_tokenizer_regex = "[\s\"“”\—]+"
whitespace_tokenizer_regex = "[\s\"“”\—\/]+"
# Regular expression defining punctuation characters to be stripped prior to indexing using
# the SqlSearch2 search engine plugin
#punctuation_tokenizer_regex = "[,;:\(\){\}[\]\|\\\+\!\&«»\'’]+"
punctuation_tokenizer_regex = "[,;:\(\){\}[\]\|\\\+\!\&«»\'’]+"
# Regular expression defining characters to be stripped from beginning and end only prior to indexing using
# the SqlSearch2 search engine plugin. These are characters typically used in identifiers where leading
# and trailing occurrences are not significant, but interior occurrences are.
#separator_tokenizer_regex = "[\.\-\/]+"
separator_tokenizer_regex = "[\.'’\-\/]+"
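
To make the question more concrete, here is a rough Python sketch of how I understand the three settings to interact, going only by the comments in search.conf. It is not the actual SqlSearch2 code: the `tokenize` function, the order of the three steps, and the lowercasing are my own assumptions, and the patterns are adapted to Python's regex syntax. The `*_ALT` patterns are a hypothetical variant in which the apostrophes are moved from the punctuation list to the whitespace list. I have not tested that variant in CA itself.

```python
import re

# Patterns adapted from the search.conf excerpt above (Python syntax).
WHITESPACE = r"[\s\"“”—/]+"
PUNCTUATION = r"[,;:(){}\[\]|\\+!&«»'’]+"
SEPARATOR = r"[.'’\-/]+"

# Hypothetical variant: apostrophes treated as whitespace, not punctuation.
WHITESPACE_ALT = r"[\s\"“”—/'’]+"
PUNCTUATION_ALT = r"[,;:(){}\[\]|\\+!&«»]+"


def tokenize(text, whitespace, punctuation, separator):
    """Assumed pipeline: split on whitespace characters, strip punctuation
    everywhere, then strip separators from the start and end of each token."""
    tokens = []
    for chunk in re.split(whitespace, text.lower()):
        chunk = re.sub(punctuation, "", chunk)
        chunk = re.sub(rf"^{separator}|{separator}$", "", chunk)
        if chunk:
            tokens.append(chunk)
    return tokens


# With the settings I tried, the apostrophe is removed as punctuation,
# so the two halves of the word are glued together.
print(tokenize("dell'opera", WHITESPACE, PUNCTUATION, SEPARATOR))
# ['dellopera']

# With the hypothetical variant, the apostrophe splits the word instead.
print(tokenize("dell'opera l’esercizio", WHITESPACE_ALT, PUNCTUATION_ALT, SEPARATOR))
# ['dell', 'opera', 'l', 'esercizio']
```

If this model is roughly right, adding the apostrophe to separator_tokenizer_regex has no effect on "dell'opera": the apostrophe is in the middle of the word, and it is still listed in punctuation_tokenizer_regex, so it gets stripped there.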