sup/tg/ - Articles

Optical Character Recognition, or OCR, is a method of recognizing the letters in a scanned PDF so that its text becomes searchable. By stringing letters together to form words, the program (in my case, Adobe Acrobat) turns what was a JPEG or similar image into clickable, highlightable, searchable text. Sounds pretty handy, doesn't it? Why yes, yes it is! OCR makes reading a PDF that much simpler. In addition to making the text interactive, the process can also compress the PDF. Take your average Dark Heresy scan: it's about 200 MB, give or take a few depending on which scan you have. Once run through an OCR converter, that can shrink all the way down to under 50 MB, a reduction of more than 75%. You can repeat this for any book out there.
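To make the "stringing letters together" step a bit more concrete, here is a minimal Python sketch. It is not how Acrobat does it; it assumes a hypothetical earlier stage has already recognized each character on one line of text along with its left and right x-coordinates. A new word starts whenever the gap between neighbouring characters is wider than a threshold (3 units here, an arbitrary choice), and the resulting words can then be searched.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """One recognized character and its horizontal extent (hypothetical structure)."""
    char: str
    left: float   # x-coordinate of the character's left edge
    right: float  # x-coordinate of the character's right edge


def glyphs_to_words(glyphs, gap_threshold=3.0):
    """Group the glyphs of a single text line into words.

    Glyphs are sorted left to right; a new word begins whenever the gap
    between one glyph's right edge and the next glyph's left edge
    exceeds gap_threshold.
    """
    words = []
    current = ""
    prev_right = None
    for g in sorted(glyphs, key=lambda g: g.left):
        if prev_right is not None and g.left - prev_right > gap_threshold:
            words.append(current)
            current = ""
        current += g.char
        prev_right = g.right
    if current:
        words.append(current)
    return words


def find_word(words, query):
    """Return the positions of case-insensitive matches for query."""
    return [i for i, w in enumerate(words) if w.lower() == query.lower()]


# Usage: two words separated by a wider gap between "k" and "H".
line = [
    Glyph("D", 0, 6), Glyph("a", 7, 12), Glyph("r", 13, 17), Glyph("k", 18, 23),
    Glyph("H", 31, 37), Glyph("e", 38, 43), Glyph("r", 44, 48),
    Glyph("e", 49, 54), Glyph("s", 55, 60), Glyph("y", 61, 66),
]
words = glyphs_to_words(line)
print(words)                        # ['Dark', 'Heresy']
print(find_word(words, "heresy"))   # [1]
```

A fixed pixel threshold is the simplest possible rule; real OCR engines have to cope with varying font sizes, kerning and multi-column layouts, so treat this as an illustration of the idea only.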

“But if OCR is so good, why isn’t every book OCR’d?” you might be wondering. Well, there is a slight loss of quality involved in the process. The same Dark Heresy book lost some of its quality: it remained an excellent scan, but it did lose some during OCR. The other obstacle is the software, and not just any program will do. Adobe Acrobat is the only one I’ve found so far that is able to perform the process. On top of that, OCR takes time, usually a few hours per book, and during that period the program uses so many system resources that doing anything else on the machine is nigh on impossible.

However, this brave author has started OCRing books in a venture he calls UNLIMITED OCR WORKS. Through it, he’s OCR’d a fair number of books, and the library grows often. Though the process is intensive, I have personally OCR’d a number of books. The catalog will be updated and posted shortly, along with handy links.

If you have a particular request, please send it my way. It’s kind of fun to OCR books and stuff, so let me know over IRC or something.

~Ishallcallu

OCR, Tutorial