LetoDMS Full Text Search under Windows Server
#1
Brick 
Hello! I'm trying to get LetoDMS working on a Windows server with IIS7.

My full-text index doesn't seem to be working. I've read everything I can find on the forums, but can't find a solution for this one.

It seems you need pdftotext for the index to work fully with searchable PDF files, but I can't "add" the "pdftotext" command to the Windows console. I found something about pdftotext in the file "Lucene/IndexedDocument.php", but I don't really know what to do there.

Does anyone have experience with this? Can I modify the code to call the file by its full path, like 'C:\pdftotex.exe'? (I already tried it, but it doesn't work.)

Thanks so much for your help!

Gustavo from Argentina.
#2
(10-26-2012, 11:24 PM)lobizonxp Wrote: Hello! I'm trying to get LetoDMS working on a Windows server with IIS7.

My full-text index doesn't seem to be working. I've read everything I can find on the forums, but can't find a solution for this one.

It seems you need pdftotext for the index to work fully with searchable PDF files, but I can't "add" the "pdftotext" command to the Windows console. I found something about pdftotext in the file "Lucene/IndexedDocument.php", but I don't really know what to do there.

Does anyone have experience with this? Can I modify the code to call the file by its full path, like 'C:\pdftotex.exe'? (I already tried it, but it doesn't work.)

LetoDMS 3.3.x has no configuration for the commands that turn a document into text for indexing by Lucene. You must modify the code. LetoDMS 3.4.0 will have a configuration option.

Uwe
#3
(10-29-2012, 02:50 AM)steinm Wrote:
(10-26-2012, 11:24 PM)lobizonxp Wrote: Hello! I'm trying to get LetoDMS working on a Windows server with IIS7.

My full-text index doesn't seem to be working. I've read everything I can find on the forums, but can't find a solution for this one.

It seems you need pdftotext for the index to work fully with searchable PDF files, but I can't "add" the "pdftotext" command to the Windows console. I found something about pdftotext in the file "Lucene/IndexedDocument.php", but I don't really know what to do there.

Does anyone have experience with this? Can I modify the code to call the file by its full path, like 'C:\pdftotex.exe'? (I already tried it, but it doesn't work.)

LetoDMS 3.3.x has no configuration for the commands that turn a document into text for indexing by Lucene. You must modify the code. LetoDMS 3.4.0 will have a configuration option.

Uwe
@steinm I think he means indexing PDF files, like I tried on my Synology.
By "configuration" in 3.4, do you mean LetoDMS can OCR files? That would be really perfect!

@lobizonxp Try checking the PHP settings, such as safe_mode_exec_dir, so that LetoDMS is allowed to execute the pdftotext file.
Is your pdftotext command working? Test it on some of the PDF files you uploaded to the data folder in LetoDMS.
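
One way to check this from PHP itself is sketched below. It is not LetoDMS code: `can_run_external_commands` and `run_and_capture` are hypothetical helper names, and the script only reports whether `exec` is enabled, what `safe_mode_exec_dir` is set to, and what the shell prints when asked to run `pdftotext -v`.

```php
<?php
// Hypothetical diagnostic script, not part of LetoDMS.

// Returns true when PHP is allowed to call exec() at all.
function can_run_external_commands()
{
    $disabled = array_map('trim', explode(',', (string) ini_get('disable_functions')));
    return function_exists('exec') && !in_array('exec', $disabled, true);
}

// Runs a command and returns its exit status and combined output.
function run_and_capture($command)
{
    $output = array();
    $status = -1;
    // "2>&1" merges stderr into stdout on both cmd.exe and POSIX shells.
    exec($command . ' 2>&1', $output, $status);
    return array('status' => $status, 'output' => implode("\n", $output));
}

echo 'exec allowed: ', can_run_external_commands() ? 'yes' : 'no', "\n";
// safe_mode_exec_dir only exists on older PHP versions; newer ones report false.
echo 'safe_mode_exec_dir: ', var_export(ini_get('safe_mode_exec_dir'), true), "\n";

if (can_run_external_commands()) {
    // pdftotext -v prints its version banner when the program can be found.
    $result = run_and_capture('pdftotext -v');
    echo 'exit status: ', $result['status'], "\n", $result['output'], "\n";
}
```

If the last part prints a "not recognized" or "not found" message instead of a version banner, the web server's PHP process cannot see pdftotext on its path, even if the command works in your own console.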
#4
And I did it!

Text extraction from PDF files uses two programs, pdftotext and sed. You need to put both of them in the Windows folder and reboot the server. Sed also needs 3 DLLs to work.

The piece of code that wasn't working, in 'Lucene/IndexedDocument.php' line 33, was the sed part. The \' quoting does not work on Windows, so you have to replace it with ".
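
The quoting difference can be illustrated with a minimal sketch. This is not the actual line 33: `quote_for_shell`, `build_pdf_command`, the sed expression, and the file name are hypothetical and only show why a command string written with single quotes for a Unix shell needs double quotes under cmd.exe.

```php
<?php
// Hypothetical helpers, not LetoDMS code.

// Wraps a sed expression in the quote character the local shell understands:
// cmd.exe only treats double quotes as quoting, POSIX shells accept single quotes.
// The expression itself must not contain that quote character.
function quote_for_shell($expression)
{
    $isWindows = (DIRECTORY_SEPARATOR === '\\');
    $quote = $isWindows ? '"' : "'";
    return $quote . $expression . $quote;
}

// Builds a pdftotext | sed pipeline for one PDF file.
function build_pdf_command($pdfPath)
{
    // Illustrative expression that strips digits and dots; an assumption,
    // not a copy of line 33.
    $sedExpression = 's/[0-9.]//g';
    return 'pdftotext -nopgbrk ' . escapeshellarg($pdfPath) . ' - | sed -e '
        . quote_for_shell($sedExpression);
}

echo build_pdf_command('example.pdf'), "\n";
```

On Windows the printed command uses double quotes around the sed expression; on Linux it uses single quotes. Editing the quotes by hand in IndexedDocument.php, as described above, has the same effect for a Windows-only install.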

This is solved! Thanks a lot!
#5
(10-29-2012, 05:39 PM)atarifreak Wrote: @steinm I think he means indexing PDF files, like I tried on my Synology.
By "configuration" in 3.4, do you mean LetoDMS can OCR files? That would be really perfect!

No. pdftotext just extracts the text from the PDF file if it is included as text, which is usually the case unless the PDF pages are bare images.

Uwe
#6
To add OCR to LetoDMS, the developers would have to stop paying attention to what is really important in the system. There is plenty of software that turns an image-only PDF into a searchable PDF that pdftotext can extract text from.
#7
(10-30-2012, 10:02 PM)lobizonxp Wrote: To add OCR to LetoDMS, the developers would have to stop paying attention to what is really important in the system. There is plenty of software that turns an image-only PDF into a searchable PDF that pdftotext can extract text from.

Actually, OCR could even be added very easily. All you need is a program (e.g. a shell script) that runs a document through OCR.

Uwe
#8
(10-29-2012, 06:26 PM)lobizonxp Wrote: And I did it!

Text extraction from PDF files uses two programs, pdftotext and sed. You need to put both of them in the Windows folder and reboot the server. Sed also needs 3 DLLs to work.

The piece of code that wasn't working, in 'Lucene/IndexedDocument.php' line 33, was the sed part. The \' quoting does not work on Windows, so you have to replace it with ".

This is solved! Thanks a lot!

This totally solved the problem on Windows =).
Thank you.
#9
Kafran, just a heads-up: pay attention to execution times... I have about 2500 files stored, and the full-text index takes hours!

To get around this problem, open the Lucene file "IndexedDocument.php" and add this line to it: ini_set('max_execution_time', 3600);

Careful: your server may have its own execution time limit as well. For example, IIS7 has a CGI section where you must change the number. 3600 seconds is the maximum timeout for IIS7.
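
As a small sketch of that change, the snippet below sets the limit and reads it back so you can see whether PHP accepted it. The `$limitSeconds` variable is only illustrative; where exactly the line goes inside IndexedDocument.php is up to you.

```php
<?php
// Raises PHP's own script time limit to one hour, the value from the post.
$limitSeconds = 3600;
ini_set('max_execution_time', (string) $limitSeconds);

// Reading the setting back confirms whether the change was accepted.
echo 'max_execution_time is now ', ini_get('max_execution_time'), " seconds\n";
```

This only changes PHP's limit. The web server's own timeout still applies on top of it, so a lower server value will cut the indexing run short regardless of what PHP allows.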

Hope that helps! Is my English okay? It's hard to think in Spanish and write in English! xD

Regards!
#10
(11-07-2012, 09:02 PM)lobizonxp Wrote: Kafran, just a heads-up: pay attention to execution times... I have about 2500 files stored, and the full-text index takes hours!

To get around this problem, open the Lucene file "IndexedDocument.php" and add this line to it: ini_set('max_execution_time', 3600);

Careful: your server may have its own execution time limit as well. For example, IIS7 has a CGI section where you must change the number. 3600 seconds is the maximum timeout for IIS7.

Hope that helps! Is my English okay? It's hard to think in Spanish and write in English! xD

Regards!

I didn't understand the part about execution time. Sorry, I'm not a coder =/.

Yeah, I can understand your English. I think in Portuguese and write in English xD.

I'm running Leto on XAMPP (Apache + PHP).

I have 33 files stored and growing =).

Wow, 2500 files stored? Which version do you use? I'm using 3.4.0RC3, but it seems to be a little buggy. I need something more stable.

How do you back up your data? Backups are essential for me =x.