
steinm · 10-18-2011, 07:57 PM

(10-18-2011, 08:08 AM)simonbaev Wrote:
(10-06-2011, 08:35 PM)steinm Wrote: It uses pdftotext, catdoc, ssconvert and cat to extract words from the whole document.
It also takes the comment and keywords from the letodms database into account.

I have catdoc and pdftotext installed, but fulltext search doesn't find anything in PDFs. I checked both php.log and apache2/error.log, and neither contains anything helpful.

Is ssconvert required for fulltext search to work? On Debian it appears to be part of the gnumeric package, which in turn has a bunch of dependencies... I'm really not keen on installing 180 MB+ of packages just to search in spreadsheets.

Could there be some other reason that fulltext search doesn't work? I can confirm that Zend Framework is properly installed, as I can see the fulltext index info in Admin-tools.

Yes, ssconvert is part of gnumeric. PDF documents should be indexed even without ssconvert.
The fulltext index info should give you a list of terms. Search for 'mimetype'. There should
be lines like

Code:
mimetype:application/msword
mimetype:application/octet-stream
mimetype:application/pdf
mimetype:application/pdf; charset=binary
mimetype:application/vnd.ms-office
mimetype:application/x-empty
mimetype:application/zip
mimetype:image/jpeg
mimetype:image/png
mimetype:image/png; charset=binary
mimetype:text/plain; charset=us-ascii

depending on the mimetypes used in your dms.
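
If it helps with checking, here is a rough Python sketch (not part of letodms) that takes mimetype terms like the ones above, strips the "mimetype:" prefix and any "; charset=..." suffix, and reports whether a matching converter is on the PATH. The CONVERTERS table is only an assumption built from the tools named in this thread. The real mapping lives in your letodms setup, and the names base_mimetype and converter_status are made up for this example.

```python
import shutil

# Assumed mapping from base mimetype to the external tool named in this thread.
# Illustrative only: the actual mapping is defined by your letodms installation.
CONVERTERS = {
    "application/pdf": "pdftotext",
    "application/msword": "catdoc",
    "application/vnd.ms-excel": "ssconvert",
    "text/plain": "cat",
}


def base_mimetype(term):
    """Strip the 'mimetype:' prefix and any '; charset=...' parameter."""
    value = term.strip()
    if value.startswith("mimetype:"):
        value = value[len("mimetype:"):]
    return value.split(";", 1)[0].strip().lower()


def converter_status(terms, which=shutil.which):
    """Return {base mimetype: (tool name or None, True if tool is on PATH)}."""
    status = {}
    for term in terms:
        mime = base_mimetype(term)
        tool = CONVERTERS.get(mime)
        status[mime] = (tool, bool(tool and which(tool)))
    return status


if __name__ == "__main__":
    # Paste the mimetype lines from the fulltext index info here.
    terms = [
        "mimetype:application/pdf",
        "mimetype:application/pdf; charset=binary",
        "mimetype:application/msword",
        "mimetype:image/png",
    ]
    for mime, (tool, found) in converter_status(terms).items():
        if tool is None:
            print(f"{mime}: no converter assumed")
        else:
            print(f"{mime}: {tool} {'found' if found else 'NOT found'} on PATH")
```

This only tells you whether the tools are visible to the user running the script. The web server user may have a different PATH, so a tool that shows up as found here could still be missing for the indexer.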

Uwe
