Poster: nick.k9 Date: January 28, 2010 07:06:55am
Forum: texts Subject: Fixing mistakes in OCR'ed texts

I found this book the other day, and started reading the "Full Text," which is chock full of OCR mistakes ("scannos"):

http://www.archive.org/details/sotweedfactor006326mbp

Is there any process for fixing this? Are there any formatting guidelines for these texts? Do we expect them to be e-texts (human-readable), or merely available for the benefit of machines (meaning searching)? E.g., should punctuation mistakes be corrected?

I looked in the FAQ and searched the forums, and found nothing on this. It seems like this should really be added to the FAQ:

http://www.archive.org/about/faqs.php#Texts_and_Books

Lastly, I understand that this is in the Universal Library, so I could have posted in that forum. But it appeared dead, so I opted to put it here instead.

Thanks,
-Nick

Poster: stbalbach Date: January 28, 2010 10:37:28am
Forum: texts Subject: Re: Fixing mistakes in OCR'ed texts

Hmm I was just reading some John Barth last night..

There is no mechanism at the Internet Archive for correcting OCR text, other than re-uploading it to the Open Source Books library as a separate work. But if you want to do OCR-checking work, Project Gutenberg's Distributed Proofreaders does exactly that; check them out, as they need help.

BTW, The Sot-Weed Factor is still in copyright, so this is a "pirate book"; IA just doesn't know it (yet) :)

Poster: garthus Date: January 28, 2010 03:09:43pm
Forum: texts Subject: Re: Sot-Weed Factor in the Public Domain; Fixing mistakes in OCR'ed texts

Stephen,

Yes, its copyright registration was renewed in 1988 and it is still in copyright, unless someone was given the right to place it in the Universal Library.

Gerry

This post was modified by garthus on 2010-01-28 23:08:35

This post was modified by garthus on 2010-01-28 23:09:43
