10-11-2011, 07:56 PM
Lucene.php
(10-06-2011, 08:35 PM)steinm Wrote: It uses pdftotext, catdoc, ssconvert and cat to extract words from the whole document.

I have catdoc and pdftotext installed, but fulltext search doesn't find anything in PDFs. I checked both php.log and apache2/error.log; neither contains anything helpful. Does ssconvert have to be installed for fulltext search to work? In Debian it appears to be part of the gnumeric package, which in turn has a bunch of dependencies... I'm really not impressed with the idea of installing 180 MB+ of something just to search in spreadsheets. Could there be some other reason that full-text search doesn't work? I can confirm that Zend Framework is properly installed, as I can see the fulltext index info in Admin-tools.
10-18-2011, 07:57 PM
(10-18-2011, 08:08 AM)simonbaev Wrote:
(10-06-2011, 08:35 PM)steinm Wrote: It uses pdftotext, catdoc, ssconvert and cat to extract words from the whole document.

Yes, ssconvert is part of gnumeric. PDF documents should be indexed even without ssconvert. The fulltext index info should give you a list of terms. Search for 'mimetype'. There should be lines like

Code:
mimetype:application/msword

Uwe

(10-18-2011, 07:57 PM)steinm Wrote: The fulltext index info should give you a list of terms? Search for 'mimetype'. There should

In the lucene subdirectory of the content directory I can see many files, primarily with cfs and sti extensions. I grepped through them for pdf and then for mimetype, and they seem to contain these terms, but in a mixture of text and binary. Please find an example attached (I wasn't able to attach the real file, so I attached a screenshot of the file opened in vi). I was expecting to find an actual list of words from all files in my DMS, but the cfs and sti files are small. Could the issue be that catdoc can't handle Word 2007 files?

-- Simon

PS: I selected "Fulltext index info" in Admin-Tools and it displayed 149 items. Somewhere in the middle I found this:

Code:
mimetype:application/msword
10-18-2011, 09:42 PM
(10-18-2011, 08:35 PM)simonbaev Wrote: PS: I selected "Fulltext index info" in Admin-Tools and it displayed 149 items. Somewhere in the middle I found this:

Yes, this is a good sign. If you look at the other items in that list, there should be some starting with 'content:'. Those are keywords found in the content of the documents. Search for one of them and you should get the associated document. If there aren't any items starting with 'content:', then the content could not be indexed. You could then try adding a trivial text/plain document and updating the fulltext index. text/plain documents need only 'cat' for indexing.

Uwe
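Before rebuilding the index, it may help to confirm that the converters named in this thread can actually be found. The following is only a rough sketch: the function name check_converters is made up for illustration, and it checks the PATH of the shell it runs in, which is not necessarily the PATH the web server uses.

```shell
#!/bin/sh
# Report whether each converter is visible on the current PATH.
# Tool names come from the thread; check_converters is a hypothetical helper.
check_converters() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: found at $(command -v "$tool")"
        else
            echo "$tool: MISSING"
        fi
    done
}

check_converters pdftotext catdoc ssconvert cat
```

If pdftotext shows up here but PDFs still produce no 'content:' terms, running the same check as the web server user would be a reasonable next step.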
I finally got it to work on the 1and1 hosting
After calling the technical department, who had no idea at all, I came across this FAQ on the 1and1 website:

Quote: Every version of PHP has a default or global php.ini file for the default PHP settings. Normally it is suggested to leave the default or global php.ini file unaltered and to simply create your own php.ini file wherever needed to override the default settings. You can create the php.ini file using a text editor and saving the file to the folder where the settings should apply. Please note that using a php.ini file to override the default settings will only alter the PHP settings for all PHP files in the directory where it is saved. This means that any PHP files in subfolders will not recognize the changes you've made in the php.ini file. You will have to copy the file to any subdirectories needed or created symbolic links in subfolders to the custom php.ini file.

So what I did was create a local php.ini file with an include_path pointing to the local Zend folder on my hosted space, and copy that php.ini to the root and to the subfolders /LetoDMS_Lucene, /LetoDMS_Core and /adodb. I also changed the .htaccess file, and now it works.
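The php.ini step described above could be scripted roughly as follows. This is a sketch under assumptions: deploy_php_ini is a made-up helper, every path is a placeholder, and it assumes the include_path entry should be the directory that contains the Zend folder. It does not cover the .htaccess change, since the post does not say what was changed there.

```shell
#!/bin/sh
# Write a php.ini that sets include_path, then copy it into each subfolder,
# because per-directory php.ini files are not inherited by subfolders on
# this kind of hosting (per the FAQ quoted above).
# deploy_php_ini and all paths are placeholders for this sketch.
deploy_php_ini() {
    root=$1          # installation root
    zend_parent=$2   # assumed: directory that contains the Zend folder
    shift 2
    printf 'include_path = ".:%s"\n' "$zend_parent" > "$root/php.ini"
    for sub in "$@"; do
        cp "$root/php.ini" "$root/$sub/php.ini"
    done
}

# Example call (hypothetical paths):
# deploy_php_ini /path/to/letodms /path/to/zend-parent LetoDMS_Lucene LetoDMS_Core adodb
```

Symbolic links to one php.ini, as the FAQ also suggests, would avoid having to re-copy the file after each change.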