LetoDMS Community Forum
Full Text Index - Printable Version


Full Text Index - Daanl - 07-09-2012

Good morning,

My LetoDMS installation works 100%.

I have uploaded PDF files to the server, but when I try to index them it seems that only the file name is indexed. The PDFs I loaded have been OCRed, and I would like to index the entire file and do a full-text search.

Can someone help me with how to create a full-text index of the PDF content, not just of the file name?

Regards


RE: Full Text Index - steinm - 07-09-2012

(07-09-2012, 01:46 PM)Daanl Wrote: Good morning,

My LetoDMS installation works 100%.

I have uploaded PDF files to the server, but when I try to index them it seems that only the file name is indexed. The PDFs I loaded have been OCRed, and I would like to index the entire file and do a full-text search.

Can someone help me with how to create a full-text index of the PDF content, not just of the file name?

You need pdftotext for it. Check if it is installed.

Uwe
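A quick way to check this from a shell, as a minimal sketch assuming a POSIX shell on the server (`check_pdftotext` is only an illustrative name, not part of LetoDMS). Run it as the web server user, because that user's PATH may differ from your own login shell's:

```shell
# Sketch: report whether pdftotext can be found on the current PATH.
# Prints the full path on success; prints an error and returns 1 otherwise.
check_pdftotext() {
  if pdftotext_path=$(command -v pdftotext 2>/dev/null); then
    echo "pdftotext found at $pdftotext_path"
  else
    echo "pdftotext not found in PATH" >&2
    return 1
  fi
}

# Example: check_pdftotext || echo "install pdftotext first"
```

If nothing is found, pdftotext needs to be installed first; it ships with xpdf and with the poppler command-line utilities.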


RE: Full Text Index - Daanl - 07-10-2012

(07-09-2012, 06:48 PM)steinm Wrote:
You need pdftotext for it. Check if it is installed.

Uwe

Hi Uwe,
pdftotext is installed. When I click on "Create index" it shows:
Code:
Recreating index

D DMS
D Testing
1:511.3 BLOCH. 2000. Proofs and fundamentals.pdf (document added)

and when I click on "Fulltext index info" it shows:
Code:
8 Terms

document_id:1
mimetype:application/x-unknown
owner:admin
title:and
title:bloch
title:fundamentals
title:pdf
title:proofs

Regards,
Daan


RE: Full Text Index - steinm - 07-12-2012

(07-10-2012, 03:29 PM)Daanl Wrote:
Hi Uwe,
pdftotext is installed. When I click on Create index it shows

Recreating index

D DMS
D Testing
1:511.3 BLOCH. 2000. Proofs and fundamentals.pdf (document added)

and when I click on Fulltext index info it shows


8 Terms

document_id:1
mimetype:application/x-unknown
owner:admin
title:and
title:bloch
title:fundamentals
title:pdf
title:proofs

Regards,
Daan

The problem is the mimetype of the document. It is application/x-unknown, and documents with that mimetype are not passed to any conversion command. It should be application/pdf.

Uwe
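Since only documents with a known mimetype such as application/pdf are passed to a conversion command, it is worth confirming that the uploaded file really is a PDF. A minimal sketch, assuming a shell where `head -c` is available (GNU and BSD both provide it); `is_pdf` is a hypothetical helper that checks for the `%PDF-` header a PDF file normally starts with:

```shell
# Sketch: succeed only if the file is readable and begins with "%PDF-"
# (the header of a PDF file, e.g. "%PDF-1.4").
is_pdf() {
  [ -r "$1" ] && [ "$(head -c 5 "$1")" = "%PDF-" ]
}

# Example: is_pdf "511.3 BLOCH. 2000. Proofs and fundamentals.pdf" && echo "header looks like a PDF"
```

Where the `file` utility is installed, `file --mime-type -b FILE` should print `application/pdf` for the same file. If the file passes this check but LetoDMS still lists it as application/x-unknown, the problem is likely the mimetype LetoDMS stored rather than the file itself.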



RE: Full Text Index - atarifreak - 09-27-2012

(07-12-2012, 07:04 PM)steinm Wrote: The problem is the mimetype of the document. It's application/x-unknown and that is not run through any command. It should be application/pdf.

Uwe

Well, I have the same problem, but in my case the mimetype is application/pdf:
Code:
document_id:9
mimetype:application/pdf
owner:admin
title:agb

And just for the record: running pdftotext on that PDF produced much more text than I expected, so the problem is not with the extracted text.


RE: Full Text Index - DerMac - 09-28-2012

Hello,

same problem here.

* LetoDMS works without errors
* PDFs can be imported
* Full-text index created
Code:
Recreating index

D DMS
  17:pdf_barrierefrei.pdf (document added)
* Lucene index is built
Code:
ls -la /volume1/letoDMS/lucene/
drwxr-xr-x    2 nobody   root          4096 Sep 28 17:09 .
drwxrwxrwx    5 nobody   root          4096 Sep 21 17:15 ..
-rw-rw-rw-    1 nobody   nobody         398 Sep 28 17:09 _10.cfs
-rw-rw-rw-    1 nobody   nobody           0 Sep 28 17:09 optimization.lock.file
-rw-rw-rw-    1 nobody   nobody           0 Sep 28 17:09 read-lock-processing.lock.file
-rw-rw-rw-    1 nobody   nobody           0 Sep 28 17:09 read.lock.file
-rw-rw-rw-    1 nobody   nobody          20 Sep 28 17:09 segments.gen
-rw-rw-rw-    1 nobody   nobody          42 Sep 28 17:09 segments_1a
-rw-rw-rw-    1 nobody   nobody           0 Sep 28 17:09 write.lock.file
* Fulltext Info
Code:
5 Terms

document_id:17
mimetype:application/pdf
owner:admin
title:barrierefrei
title:pdf
* Running pdftotext as the Apache user works
Code:
/tmp $ pdftotext pdf_barrierefrei.pdf text.txt
/tmp $ head text.txt
Aktion Mensch e.V. Fachartikel ,,PDF-Dokumente ­ lesbar für alle"

PDF-Dokumente ­ lesbar für alle
PDF erfreut sich nicht nur bei Broschüren und Handbüchern, sondern auch bei amtlichen Formularen immer größerer Beliebtheit. In der zunehmend barrierefreien Internetwelt sind zugängliche PDF´s aber noch eine Seltenheit. Was ist zu tun, um dieses - eher für layoutgetreuen Druck bekannte - Format zugänglich zu machen?

Fachartikel für Aktion Mensch e.V. Autor: Roland Heuwinkel Version 1.0 vom 16.10.03

Autor: Roland Heuwinkel

17. Oktober 2003
/tmp $

It seems that pdftotext is not called.
Is it possible to debug the process?

Regards


RE: Full Text Index - steinm - 10-01-2012

(09-28-2012, 09:04 PM)DerMac Wrote: It seems that pdftotext is not called.
Is it possible to debug the process?

Just put some echo statements in LetoDMS_Lucene/Lucene/IndexedDocument.php.

You should var_dump $_convcmd. It contains the conversion programs.

Uwe
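To find where to put the echo or var_dump, one can first list every line in that file that mentions `$_convcmd`. A minimal sketch; `find_convcmd` is a hypothetical helper, and the default path assumes you run it from the directory that contains the LetoDMS_Lucene package, which depends on your installation:

```shell
# Sketch: print every line (with its line number) that mentions _convcmd.
# Pass another path as $1 if LetoDMS_Lucene lives elsewhere.
find_convcmd() {
  grep -n '_convcmd' "${1:-LetoDMS_Lucene/Lucene/IndexedDocument.php}"
}

# Example: find_convcmd /path/to/LetoDMS_Lucene/Lucene/IndexedDocument.php
```

Then add `var_dump($_convcmd);` near one of the reported lines, run "Create index" again and look at the output; presumably a pdftotext command should show up for application/pdf.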


RE: Full Text Index - atarifreak - 10-03-2012

(10-01-2012, 06:03 PM)steinm Wrote:
Just put some echo statements in LetoDMS_Lucene/Lucene/IndexedDocument.php.

You should var_dump $_convcmd. It contains the conversion programs.

Uwe

Sorry, I can't do that. Can you please explain exactly what to do?


RE: Full Text Index - DerMac - 10-03-2012

Thank you! With your advice I found the problem.

I installed the DMS on a Synology DiskStation.

The PHP config variable 'safe_mode_exec_dir' is set to a special subdirectory.

I tried putting a symbolic link to pdftotext in this subdirectory, but got the same result.
So I had to unset this variable (via the web front end of the box).

Now the full-text indexing works.

Regards.
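To see whether a given php.ini sets these directives, one can print them while skipping commented-out lines. A minimal sketch; `show_safe_mode_settings` is a hypothetical helper and you pass it the php.ini path yourself. The file the web server actually loads is shown as "Loaded Configuration File" in `phpinfo()`, while `php --ini` reports the one used by command-line PHP, which may be a different file. Both directives only exist in PHP versions before 5.4.

```shell
# Sketch: print the active safe_mode and safe_mode_exec_dir lines of a php.ini.
# Lines starting with ';' are comments in php.ini and are not matched.
show_safe_mode_settings() {
  grep -E '^[[:space:]]*safe_mode(_exec_dir)?[[:space:]]*=' "$1"
}

# Example (path is only an illustration): show_safe_mode_settings /etc/php.ini
```

On a box like the Synology, where the setting is managed through the web front end, the php.ini location is not obvious, so checking `phpinfo()` first is the safer route.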


RE: Full Text Index - atarifreak - 10-03-2012

(10-03-2012, 09:53 PM)DerMac Wrote: Thank you! With your advice I found the problem.

I installed the DMS on a Synology DiskStation.

The PHP config variable 'safe_mode_exec_dir' is set to a special subdirectory.

I tried putting a symbolic link to pdftotext in this subdirectory, but got the same result.
So I had to unset this variable (via the web front end of the box).

Now the full-text indexing works.

Regards.

Thank you for that information.
Can you tell me how to debug that process? I'm not much of a PHP coder, but I know how to use vi :-)
I will install LetoDMS on my Synology too, but first I want to test it with LAMPP. So I need to unset safe_mode_exec_dir?
Is this done by just setting
Code:
safe_mode_exec_dir =

in php.ini?

But does this do anything if safe_mode = off?