OCR Tesseract problem

Shubhamautomation · April 23, 2021, 6:07am

Hello,
I am trying to read text from an image using OCR Tesseract, but after running it I am not getting any output in the outputtext field. (See Snapshot)

I don’t know how to use it.
Also, please tell me how to extract data (like the PAN number from a PAN card).

Thank You !

rajanimhanta · April 23, 2021, 6:50am

Hi Shubham,

You have to give the tessdata master folder path in the Data Folder path.
Below is the link for tessdata master.
Link: GitHub - tesseract-ocr/tessdata: Trained models with support for legacy and LSTM OCR engine

Shubhamautomation · April 23, 2021, 10:20am

I downloaded the zip file, which was 230 MB.
When I extracted the zip using WinRAR, it reported the archive as corrupted.
When I repaired it and used it in the OCR Tesseract step, outputtext was blank.

Is there any other way to download the file?
It would be better if you could give some steps.
Thank you for hearing me out.

rajanimhanta · April 23, 2021, 1:32pm

Can you put the attached jar in the PS lib folder?
jai-imageio-core-1.4.0.jar (613.3 KB)

Shubhamautomation · April 23, 2021, 6:40pm

I downloaded the tesseract ZIP file again over Wi-Fi; this time it was 650 MB, and I extracted it.
First I set the tesseract data folder path (see snapshot).
Then I tried with one JPG file and one PDF file and ran it,
but there was no output in the outputtext field.

I also placed the jai-imageio-core.jar file in the lib folder, but the problem is the same.

fbaldin · April 25, 2021, 12:14pm

Please add a Generate Rows step prior to the OCR Tesseract step.

Shubhamautomation · April 25, 2021, 3:16pm

Yes, I added Generate Rows and it is now able to extract data from the image.

One more question:
If I want to extract the PAN number from a PAN card, how can I do this?
Using OCR I am able to extract all the data from the scanned document,
but how do I extract one particular piece of text from a scanned document or image?

Can you help with this step by step?

vijaykumar.naikwade · April 28, 2021, 12:13pm

Please use the following regular expression in the ‘Regex Evaluation’ plugin step to extract the PAN number from the OCRed text.
^[A-Z]{5}[0-9]{4}[A-Z]{1}$

Shubhamautomation · April 28, 2021, 12:45pm

I tried, but it returns N,
which means no match was found.

vijaykumar.naikwade · April 28, 2021, 1:36pm

Shubhamautomation · April 29, 2021, 4:36am

Still not working!
If my input rows contain only PAN numbers, then it works.
But I am extracting text from a PAN card, which consists of multiple lines of text,
and for that it returns N.
Question: Does the regular expression return the matched result into a field, or does it simply match the pattern and return a boolean value?

Shubhamautomation · April 29, 2021, 6:34am

The string that I am getting after OCR is:

e

mmmm

Permanent Account Number Card

ABABB0000M

T/ Name

S ot i

WM AAAAAAAAA BBBBBBBBB

…
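
For what it is worth, the anchored pattern only matches when the whole input is a PAN number, which would explain the N result on the full OCR text. A minimal Python sketch, assuming the OCR output looks like the sample above; the variable names are made up for illustration, and Python's `re` module only stands in for whatever regex engine the Regex Evaluation step uses:

```python
import re

# Sample OCR text from the post above (trimmed; line breaks kept).
ocr_text = "e\n\nmmmm\n\nPermanent Account Number Card\n\nABABB0000M\n\nT/ Name\n"

anchored = re.compile(r"^[A-Z]{5}[0-9]{4}[A-Z]{1}$")
unanchored = re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]")

# The anchored pattern needs the entire string to be a PAN number.
print(bool(anchored.match("ABABB0000M")))  # True
print(bool(anchored.match(ocr_text)))      # False

# Searching without the anchors finds the PAN inside the larger text.
found = unanchored.search(ocr_text)
pan = found.group(0) if found else None
print(pan)  # ABABB0000M
```

So a plain match gives a yes/no answer, while a search (or a capture group) is what actually hands back the matched text.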

fbaldin · May 1, 2021, 12:49pm

Try adding .* like in the picture and see if that works.

Shubhamautomation · May 2, 2021, 7:26am

Yes, it’s working fine. Thank you!
What did I do? I removed all the spaces from the string extracted from the PAN card, to turn it into a single line.
Now it is working absolutely fine.
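
The fix above can be sketched in Python, under the assumption that the step has to match the entire input and that `.` does not cross line breaks by default (true for Java and Python regex, not verified for the plugin itself). The helper name `extract_pan` and the constant `PAN_IN_TEXT` are hypothetical:

```python
import re

# .* on both sides lets the pattern cover the whole string;
# the parentheses capture just the PAN number.
PAN_IN_TEXT = re.compile(r".*([A-Z]{5}[0-9]{4}[A-Z]).*")

def extract_pan(ocr_text):
    """Return the PAN found in OCR text, or None if there is none."""
    # Drop every whitespace character so the text becomes one line.
    single_line = re.sub(r"\s+", "", ocr_text)
    match = PAN_IN_TEXT.fullmatch(single_line)
    return match.group(1) if match else None

sample = "Permanent Account Number Card\n\nABABB0000M\n\nT/ Name\n"
print(extract_pan(sample))                    # ABABB0000M
print(PAN_IN_TEXT.fullmatch(sample) is None)  # True: the line breaks stop .*
```

Stripping every space can also glue neighbouring words together, so on other documents it may be safer to remove only the line breaks.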

Arun9047 · November 11, 2021, 9:25am

Hello, I am also facing the same issue. Without the Generate Rows plugin, the OCR plugin runs but gives no output.

After adding the Generate Rows plugin, the OCR step throws an error.

Please kindly give some suggestions.
