CD Update

Matijs matijs at R2JNET.COM
Mon Nov 13 11:11:11 CET 2000


Maouse wrote:
>
> OK FOLKS;
>
>   Thought you all would like a status on what's what:
> 4) I am going to scan EVERYTHING in.  Initially this is TIF I believe.
> This ORIGINAL will be put on the CD.  Once in TIF I will use some FREE FOR
> 15 DAYS OCR App to convert it/read it.  There are a few good ones out there
> and it won't cost me anything to do it this way.  I will then of course
> have to read it all (while doing this I'd imagine, so pray for me and my
> wife...lol).  Once in DOC or whatever format (Hopefully with images of all
> the grey boxes, etc...) I will strip off the boxes and pictures and get
> just plain text.
> 5) This plain text will be added to the database/app I have already created.
> 6) The DOC file will already be searchable using WORD, but I will have my
> brother convert (for free once again) the DOC to a PDF.  These will be by
> book I imagine (so you may have to wait for them to load!).  But they will
> also be searchable and readable on almost any PC nowadays.

Books 1 and 2 were scanned, OCRed and corrected by us (the group from
Amsterdam). Originally, this was in LaTeX, from which we produced the
PDF files some of you have. However, recently I have converted it all to
XML format. This means that many more formats can be produced. I already
did LaTeX (for printing and PDFs) and HTML, but I guess RTF (for
conversion to Word) and plain text could be done too. So basically, I
think it would be a waste of time to re-OCR all of books 1 and 2. TIF
(or some other, smaller, bitmap format) would still be nice, though, for
reference.
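
To illustrate the single-source idea, here is a minimal Python sketch of
turning one XML file into both plain text and HTML. This is not the actual
conversion code: the element names (book, chapter, para) and the sample
content are made up for the example, and the real markup of the books may
look quite different.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical markup: the real element names used for the books are
# not given in this message, so these are assumptions for illustration.
SAMPLE = """<book title="Book 1">
  <chapter title="Introduction">
    <para>First paragraph.</para>
    <para>Second paragraph with &lt;angle brackets&gt;.</para>
  </chapter>
</book>"""


def clean(text):
    """Collapse runs of whitespace so line breaks in the source don't matter."""
    return " ".join((text or "").split())


def to_text(root):
    """Render the tree as plain text: titles and paragraphs separated by blank lines."""
    lines = [root.get("title", ""), ""]
    for chapter in root.iter("chapter"):
        lines.append(chapter.get("title", ""))
        lines.append("")
        for para in chapter.iter("para"):
            lines.append(clean(para.text))
            lines.append("")
    return "\n".join(lines).rstrip() + "\n"


def to_html(root):
    """Render the same tree as a simple HTML fragment, escaping special characters."""
    parts = ["<h1>%s</h1>" % escape(root.get("title", ""))]
    for chapter in root.iter("chapter"):
        parts.append("<h2>%s</h2>" % escape(chapter.get("title", "")))
        for para in chapter.iter("para"):
            parts.append("<p>%s</p>" % escape(clean(para.text)))
    return "\n".join(parts) + "\n"


root = ET.fromstring(SAMPLE)
print(to_text(root))
print(to_html(root))
```

The point is only that each output format is a separate, small renderer
over the same source, so adding RTF or another format would not require
touching the text itself.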

As Scott pointed out, I usually run Linux and I don't own a copy of
Office, so any Access applications are useless to me. I can read Word 97
files using StarOffice, though, as long as fast save is off. All the
other formats are no problem.

Matijs.


