Indexing Cyrillic docs
#1
Hi,

Doesn't LetoDMS index words with Cyrillic letters? I tried to index documents containing English and Russian words, but only the English words showed up in "Fulltext index info". I had to change the Lucene/IndexedDocument.php file to get full-text indexing of Cyrillic text. Here are my changes:
Code:
--- IndexedDocument.php.orig    2012-08-30 16:24:40.199586601 +0300
+++ IndexedDocument.php    2012-08-30 17:43:55.233553208 +0300
@@ -39,6 +39,7 @@
        if($convcmd) {
            $_convcmd = $convcmd;
        }
+        Zend_Search_Lucene_Analysis_Analyzer::setDefault(new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8_CaseInsensitive() );
        $version = $document->getLatestContent();
        $this->addField(Zend_Search_Lucene_Field::Keyword('document_id', $document->getID()));
        $this->addField(Zend_Search_Lucene_Field::Keyword('mimetype', $version->getMimeType()));

I inserted only one statement, and full-text search now works with Russian words. I'm not sure whether this change is correct, though. What do you think about it?
#2
(08-31-2012, 02:36 PM)sarulezzz Wrote: I inserted only one statement, and full-text search now works with Russian words. I'm not sure whether this change is correct, though. What do you think about it?

Looks reasonable. Zend_Search_Lucene probably uses latin1 by default.

Uwe
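
To illustrate why switching the analyzer can matter, here is a minimal, self-contained sketch of what happens when text is split into tokens by ASCII letters only, compared with a UTF-8 aware split. The functions tokenizeAscii() and tokenizeUtf8() are hypothetical helpers written for this example. They are not the Zend_Search_Lucene implementation and only approximate the difference between an ASCII-oriented analyzer and a UTF-8 one. The sketch assumes PHP 7 or later with the mbstring extension and PCRE built with Unicode support.

```php
<?php
// Illustration only: these helpers mimic the difference between an
// ASCII-letter tokenizer and a UTF-8 aware one. They are NOT Zend's code.

/** Split text into lowercased runs of ASCII letters (hypothetical helper). */
function tokenizeAscii(string $text): array
{
    // Bytes of Cyrillic characters never match [a-zA-Z], so they are dropped.
    preg_match_all('/[a-zA-Z]+/', $text, $matches);
    return array_map('strtolower', $matches[0]);
}

/** Split text into lowercased runs of Unicode letters (hypothetical helper). */
function tokenizeUtf8(string $text): array
{
    // The /u modifier makes PCRE treat the input as UTF-8; \p{L} is any letter.
    preg_match_all('/\p{L}+/u', $text, $matches);
    return array_map(function ($token) {
        return mb_strtolower($token, 'UTF-8');
    }, $matches[0]);
}

$text = 'Report Договор 2012';

echo implode(', ', tokenizeAscii($text)), "\n"; // report
echo implode(', ', tokenizeUtf8($text)), "\n";  // report, договор
```

With the ASCII-only split, the Russian word never becomes a token, so it never reaches the index. That matches the symptom of seeing only English words in "Fulltext index info". If the analyzer is changed for indexing, the same analyzer presumably has to be active when queries are parsed as well, so that search terms are tokenized the same way as the indexed text.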