Poster: nick.k9
Date: January 28, 2010 07:06:55am
Forum: texts
Subject: Fixing mistakes in OCR'ed texts
I found this book the other day, and started reading the "Full Text," which is chock full of OCR mistakes ("scannos"):
http://www.archive.org/details/sotweedfactor006326mbp
Is there any process for fixing this? Are there any formatting guidelines for these texts? Do we expect them to be e-texts (human-readable), or merely available for the benefit of machines (meaning searching)? E.g., should punctuation mistakes be corrected?
I looked in the FAQ and searched the forums, and found nothing on this. It seems like this should really be added to the FAQ:
http://www.archive.org/about/faqs.php#Texts_and_Books
Lastly, I understand that this is in the Universal Library, so I could have posted in that forum. But it appeared dead, so I opted to put it here instead.
Thanks,
-Nick

Poster: stbalbach
Date: January 28, 2010 10:37:28am
Forum: texts
Subject: Re: Fixing mistakes in OCR'ed texts
Hmm I was just reading some John Barth last night..
There is no mechanism at Internet Archive for correcting OCR text, other than re-uploading it to the Open Source Books library as a separate work. But if you want to do OCR-checking work, Project Gutenberg Distributed Proofreaders does exactly that. Check 'em out, as they need help.
BTW..
The Sot-Weed Factor is still in copyright, so this is a "pirate book", IA just doesn't know it (yet) :)

Poster: garthus
Date: January 28, 2010 03:09:43pm
Forum: texts
Subject: Re: Sot-Weed Factor in the Public Domain: Fixing mistakes in OCR'ed texts
Stephen,
Yes, its copyright registration was renewed in 1988, and it is still in copyright, unless someone was given the right to place it in the Universal Library.
Gerry
This post was modified by garthus on 2010-01-28 23:08:35
This post was modified by garthus on 2010-01-28 23:09:43