10-18-2011, 07:57 PM
(10-18-2011, 08:08 AM) simonbaev Wrote:
(10-06-2011, 08:35 PM) steinm Wrote: It uses pdftotext, catdoc, ssconvert and cat to extract words from the whole document.
It also takes the comment and keywords from the letodms database into account.
I have catdoc and pdftotext installed, but fulltext search doesn't search in PDFs. I checked both php.log and apache2/error.log -- neither contains anything helpful.
Is ssconvert required for fulltext search to work? On Debian it appears to be part of the gnumeric package, which in turn has a bunch of dependencies... I'm really not impressed with the idea of installing 180 MB+ of something just to search in spreadsheets.
Could there be some other reason that fulltext search doesn't work? I can confirm that Zend Framework is properly installed, as I can see the fulltext index info in Admin-tools.
Yes, ssconvert is part of gnumeric. PDF documents should be indexed even without ssconvert.
The fulltext index info should give you a list of terms. Search for 'mimetype'; there should
be lines like
Code:
mimetype:application/msword
mimetype:application/octet-stream
mimetype:application/pdf
mimetype:application/pdf; charset=binary
mimetype:application/vnd.ms-office
mimetype:application/x-empty
mimetype:application/zip
mimetype:image/jpeg
mimetype:image/png
mimetype:image/png; charset=binary
mimetype:text/plain; charset=us-ascii
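
If it helps with narrowing this down, here is a rough Python sketch of the two checks discussed above: whether the converter commands can be found on the PATH, and whether a pasted term list contains a PDF mimetype. It is not part of letodms. The function names and the CONVERTERS table are made up for illustration, and the PATH seen by the web server may differ from the one in your shell.
Code:
```python
import shutil

# Tools named in this thread. What each one handles is an assumption for
# illustration; the exact commands letodms runs are not shown here.
CONVERTERS = {
    "pdftotext": "PDF",
    "catdoc": "MS Word",
    "ssconvert": "spreadsheets (part of gnumeric)",
    "cat": "plain text",
}


def missing_converters(names=CONVERTERS):
    """Return the converter commands that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


def indexed_mimetypes(terms):
    """Extract the values of all 'mimetype:...' terms from an index term list."""
    prefix = "mimetype:"
    return sorted({t[len(prefix):].strip() for t in terms if t.startswith(prefix)})


def has_pdf(terms):
    """True if any mimetype term is application/pdf, ignoring '; charset=...'."""
    return any(
        m.split(";")[0].strip() == "application/pdf"
        for m in indexed_mimetypes(terms)
    )


if __name__ == "__main__":
    print("Not on PATH:", missing_converters())
    pasted = ["mimetype:application/pdf; charset=binary", "mimetype:image/png"]
    print("PDF terms present:", has_pdf(pasted))
```
If has_pdf() is false for the terms copied from the index info, that would suggest no PDF has been indexed at all, rather than a problem with the search itself.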
Uwe