Lucene.php
#24
(10-18-2011, 07:57 PM)steinm Wrote: The fulltext index info should give you a list of terms. Search for 'mimetype'. There should be lines like
Code:
mimetype:application/msword
mimetype:application/octet-stream
mimetype:application/pdf
mimetype:application/pdf; charset=binary
mimetype:application/vnd.ms-office
mimetype:application/x-empty
mimetype:application/zip
mimetype:image/jpeg
mimetype:image/png
mimetype:image/png; charset=binary
mimetype:text/plain; charset=us-ascii
depending on the mimetypes used in your dms.

In the lucene subdirectory of the content directory I can see many files, mostly with cfs and sti extensions. I grepped through them for "pdf" and then for "mimetype", and they seem to contain these terms, but as a mixture of text and binary. Please find an example attached (I wasn't able to attach the real file, so I attached a screenshot of the file opened in vi).
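For anyone who wants to repeat this check without reading binary files in vi, here is a minimal sketch that works roughly like the Unix `strings` utility followed by a grep. The function names and the index path in the usage note are hypothetical, and Lucene-style indexes may store terms in a compressed or fragmentary form, so the output is only a rough hint about what the index contains.

```python
import re
from pathlib import Path


def printable_runs(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII at least min_len characters long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def grep_bytes(data: bytes, needle: str, min_len: int = 4) -> list[str]:
    """Keep only the printable runs that contain needle."""
    return [run for run in printable_runs(data, min_len) if needle in run]


def grep_index_dir(index_dir: str, needle: str) -> dict[str, list[str]]:
    """Scan every regular file in index_dir; map file name to matching runs."""
    hits: dict[str, list[str]] = {}
    for path in sorted(Path(index_dir).iterdir()):
        if path.is_file():
            matches = grep_bytes(path.read_bytes(), needle)
            if matches:
                hits[path.name] = matches
    return hits
```

Calling `grep_index_dir("/path/to/content/lucene", "application/")` (the path is a placeholder) would list, per index file, the readable fragments that mention a MIME type.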

I was expecting to find an actual list of words from all the files in my DMS, but the cfs and sti files appear to be small. Could the issue be that catdoc can't handle Word 2007 files?

--
Simon

PS: I selected "Fulltext index info" in Admin-Tools and it displayed 149 items. Somewhere in the middle I found this:
Code:
mimetype:application/msword
mimetype:application/octet-stream
mimetype:application/pdf
mimetype:application/rtf
mimetype:application/vnd.ms-excel
mimetype:application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
mimetype:application/vnd.openxmlformats-officedocument.wordprocessingml.documen
Is this a good sign?
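
To make the listing easier to read, here is a small sketch that groups lines copied from "Fulltext index info" by field name and counts the Office 2007 (OOXML) MIME types among them. The helper name `group_terms` is hypothetical, and the sample is the listing above pasted as-is, including the cut-off last line. A `mimetype` term presumably only shows that a document of that type was indexed, not that its text was extracted.

```python
from collections import defaultdict


def group_terms(listing: str) -> dict[str, list[str]]:
    """Group 'field:value' lines by field name, skipping lines without a colon."""
    groups: dict[str, list[str]] = defaultdict(list)
    for line in listing.splitlines():
        line = line.strip()
        if ":" not in line:
            continue
        field, _, value = line.partition(":")
        groups[field].append(value)
    return dict(groups)


SAMPLE = """\
mimetype:application/msword
mimetype:application/octet-stream
mimetype:application/pdf
mimetype:application/rtf
mimetype:application/vnd.ms-excel
mimetype:application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
mimetype:application/vnd.openxmlformats-officedocument.wordprocessingml.documen
"""

terms = group_terms(SAMPLE)
# OOXML types share this prefix; the last sample line is truncated but still matches.
ooxml = [v for v in terms.get("mimetype", [])
         if v.startswith("application/vnd.openxmlformats-officedocument")]
print(len(terms.get("mimetype", [])), "mimetype terms,", len(ooxml), "of them OOXML")
```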

Messages In This Thread
Lucene.php - by caos - 09-25-2011, 06:08 AM