What is a good OCR software for Japanese?
Thread poster: DavidCanek
DavidCanek
Local time: 15:27
English to Czech
Feb 9, 2012

Hi,

Could anyone recommend a good software product to OCR a bunch of PDFs in Japanese?

Thanks,
David


 
DZiW (X)
Ukraine
English to Russian
FineReader Feb 9, 2012

Hello David.

The problem with kanji is that they usually require scans of 300+ DPI...

So ABBYY FineReader might help. I don't use it for East Asian languages myself, but it has proved to be one of the best OCR packages at the moment.
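
To make the 300 DPI rule of thumb concrete, here is a minimal Python sketch that estimates the effective resolution of a scanned page from its pixel dimensions and the physical paper size. The function names, the A4 default, and the use of 300 DPI as a hard cutoff are assumptions for illustration only; they are not part of FineReader or any other OCR package.

```python
MM_PER_INCH = 25.4

def effective_dpi(pixels_wide, pixels_high, page_mm=(210.0, 297.0)):
    """Estimate scan resolution from pixel size and physical page size.

    page_mm defaults to A4 (an assumption). The result is the lower of
    the horizontal and vertical resolutions, rounded to a whole number.
    """
    width_in = page_mm[0] / MM_PER_INCH
    height_in = page_mm[1] / MM_PER_INCH
    return round(min(pixels_wide / width_in, pixels_high / height_in))

def good_enough_for_ocr(pixels_wide, pixels_high, min_dpi=300,
                        page_mm=(210.0, 297.0)):
    """Return True if the estimated resolution meets the assumed threshold."""
    return effective_dpi(pixels_wide, pixels_high, page_mm) >= min_dpi
```

For example, an A4 page scanned to 2480 x 3508 pixels passes this check, while the same page at 1240 x 1754 pixels (about 150 DPI) does not and would probably be worth rescanning before OCR.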


 
D. B. Slavenskoj
Russian to English
Tesseract Feb 9, 2012

Tesseract has language data files for Japanese, and can be trained as well. See: http://en.wikipedia.org/wiki/Tesseract_(software)
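
As a rough illustration of how the Japanese language data might be used, here is a small Python sketch that builds and runs a Tesseract command line. It assumes the `tesseract` executable is installed and on the PATH, that the `jpn` language data is available, and that the PDF pages have already been converted to image files (Tesseract takes images as input). The helper names and file names are hypothetical.

```python
import subprocess

def build_tesseract_command(image_path, output_base, lang="jpn"):
    """Build the argument list: tesseract <image> <output base> -l <lang>."""
    return ["tesseract", image_path, output_base, "-l", lang]

def run_ocr(image_path, output_base, lang="jpn"):
    """Run Tesseract and return the path of the text file it writes.

    Assumes tesseract is installed; raises CalledProcessError on failure.
    """
    subprocess.run(build_tesseract_command(image_path, output_base, lang),
                   check=True)
    return output_base + ".txt"
```

Calling `run_ocr("page1.png", "page1")` would then be expected to leave the recognized text in `page1.txt`. Accuracy still depends heavily on scan quality.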

 
Kirti Vashee
United States
Local time: 07:27
Free OCR Feb 9, 2012

The project linked below sounded promising, but it is important to remember that source files scanned at less than 300 dpi are likely to give poor results in most OCR packages.

http://www.reviewmylife.co.uk/blog/2010/06/08/free-japanese-ocr-translation/


 
Kirti Vashee
United States
Local time: 07:27
More OCR options for JA Feb 9, 2012

First, there's 読んde!!ココ v.13. (Windows only.) Here's the basic info:

Main Page:
http://ai2you.com/ocr/

More Details:
http://ai2you.com/ocr/product/koko13/workings.asp

Free Trial:
http://ai2you.com/ocr/product/koko13/trial01.asp

Buy it here:
http://ai2you.com/shopai2you/ocr/koko13.asp

Works with TWAIN scanners and WIA scanners, will play nicely all the way to Win 7 64.

It claims to handle smudged kanji and underlined words, and has a learning mode. Plus, it has a bunch of built-in dictionaries to help with recognition. It also claims to be able to handle kana, kanji, and alphanumeric text on the same page, something that ReadIris choked on frequently when I used it.

If it does what it claims to, then it would be a heck of a lot better than anything IRIS puts out, for a lot cheaper. ~13,000 yen for the full download version. 20,000 yen if you want a box. Interface is all Japanese.


The other software I'm looking at is e.Typist (Windows only, supports Mac via Boot Camp... I think. It's vague about Mac support.) :

Main Page-- details along the sidebar links:
http://mediadrive.jp/products/et/index.html

Try the Eval version here:
http://mediadrive.jp/products/et/index8.html

Buy the Download version for cheap here:
http://shop.mediadrive.jp/item_list.htm … p;request=

Looks pretty similar to 読んde!!ココ feature-wise, with a few notable differences. First, you can buy the Neo edition, which only does EN and JP, for 9,800 yen (download), or the standard edition, which does a bunch of languages, for ~13,000 yen (download). If you buy at a store, expect to pay 13,000/20,000 yen. The download discounts are nice here, just like with 読んde!!ココ. The Neo edition is a good option if you don't care about other languages.

Otherwise, it seems to do just about everything that 読んde!!ココ does, with a few exceptions. First, it has a "preview mode" where it superimposes what it thinks it sees over the text it scans, so you can correct it. Also, it doesn't say whether it supports WIA scanners. It's vague about that. It says it supports Win 7 64, but it's kind of sketchy about which scanners it supports. I guess I'll try the eval version first to see if it likes my Brother MFC 7840.

Both handle image files, PDFs, scans, photos, and various input devices, and will output to txt, rtf, excel, word, etc., with some variations between the two. Check the websites to see if your flavors are supported.

Both have large dictionaries, and it looks like both support learning modes for Japanese, which ReadIris does not.

And if you want Free Japanese OCR, there's this thread here:
http://forum.koohii.com/viewtopic.php?id=2608


 
Kirti Vashee
United States
Local time: 07:27
Japanese Options Feb 9, 2012

These are options with a Japanese user interface:

http://search.vector.co.jp/vsearch/vsearch.php?query=OCR


 
DavidCanek
Local time: 15:27
English to Czech
TOPIC STARTER
Thanks! Feb 9, 2012

Thanks for all the tips!

David


 
Best, cheapest OCR software for Japanese Jul 17, 2012

I searched the Internet for several days trying to find good OCR software for Japanese for the Macintosh (I now have OS X version 10.6.8).

It was very frustrating ... many of the software specifications were totally inadequate for deciding whether the software would do what I needed, which was OCR for scanned PDF files with vertically oriented Japanese (150 dpi resolution, B/W contrast not the best). The software that looked most promising was available only for Windows.

Finally, I found that I already owned the best (or at least highly adequate) OCR software: Adobe Acrobat X Pro (my version is 10.1.3). It rapidly converts whole pages of Japanese text images (PDF scans) to copyable and editable text. It works in vertical orientation (and I assume also horizontal orientation). I've only tested it a bit, and in those tests it was 100% accurate. Maybe with further testing it will prove to be somewhat less than 100%, but it was astounding. It does not do translation, but I can copy and paste into translation programs.

I say that this is the cheapest OCR software because I already had Adobe Acrobat X Pro. I needed it for other things, so finding that it had OCR capability made it essentially free in my case. However, Adobe Acrobat X Pro is probably cheaper anyway than equally good OCR conversion software.

Here is an excellent link telling you exactly how to use the OCR converter function in Adobe Acrobat X Pro:

acrobatusers.com/tutorials/how-do-i-edit-text-in-a-scanned-pdf

Here is the translation software I use ("detailed word info" option):

nihongo.j-talk.com/kanji/

I hope I have helped people who want OCR for Japanese.


 

