OCR Tesseract problem

Hello,
I am trying to read text from an image using the OCR Tesseract step, but after running it I get no output in the outputtext field (see snapshot).

I don't know how to use it.
Also, please tell me how to extract data (such as the PAN number from a PAN card).

Thank you!

Hi Shubham,

You have to give the path of the tessdata master folder in the Data Folder path field.
Below is the link for tessdata master.
Link: GitHub - tesseract-ocr/tessdata: Trained models with support for legacy and LSTM OCR engine

I downloaded the zip file, which was 230 MB.
When I extracted it with WinRAR, it was reported as corrupted.
After I repaired it and used it in the OCR Tesseract step, outputtext was still blank.

Is there any other way to download the file?
It would be helpful if you could give some steps.
Thank you for listening.
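
Before pointing the step at the extracted folder, it may help to check whether the download itself is intact. The following is a minimal Python sketch, assuming the download is an ordinary zip archive; the function name check_tessdata_zip and the file name in the usage note are hypothetical and not part of the OCR Tesseract step.

```python
import zipfile


def check_tessdata_zip(zip_path):
    """Check a downloaded archive and list the trained-data files in it.

    Returns a tuple (bad_member, trained_files):
      bad_member    - name of the first entry that fails its CRC check,
                      or None if every entry reads back cleanly
      trained_files - entries whose names end in ".traineddata"
    Raises zipfile.BadZipFile if the file is not a readable zip at all
    (for example, a truncated download).
    """
    with zipfile.ZipFile(zip_path) as archive:
        bad_member = archive.testzip()
        trained_files = [
            name for name in archive.namelist()
            if name.endswith(".traineddata")
        ]
    return bad_member, trained_files
```

For example, check_tessdata_zip("tessdata.zip") (a hypothetical file name) should return None as the first value for a good download. If it does not, or if the call raises BadZipFile, downloading the file again is likely safer than repairing the archive.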

Can you put the attached jar in the PS lib folder?
jai-imageio-core-1.4.0.jar (613.3 KB)

I downloaded the Tesseract zip file again over Wi-Fi; this time it was 650 MB, and I extracted it.
First I set the Tesseract data folder path (see snapshot).
Then I tried one JPG file and one PDF file and ran the process,
but there was no output in the outputtext field.

I also placed the jai-imageio-core jar file in the lib folder, but the problem remains.

Please add a Generate Rows step before the OCR Tesseract step.

Yes, I added Generate Rows and it is now able to extract data from the image.

One more question:
If I want to extract the PAN number from a PAN card, how can I do this?
Using OCR I am able to extract all the data from a scanned document,
but how do I extract a particular piece of text from a scanned document or image?

Can you help with this step by step?

Please use the following regular expression in the ‘Regex Evaluation’ plugin step to extract the PAN number from the OCRed text.
^[A-Z]{5}[0-9]{4}[A-Z]{1}$

I tried, but it returns N,
which means no match was found.


Still not working!
If my input rows contain only PAN numbers, it works.
But I am extracting text from a PAN card, which contains multiple pieces of text,
and for that it returns N.
Question: Does the regular expression return the matched result into a field, or does it simply match the pattern and return a boolean value?

The string that I get after OCR is:

e

mmmm

Permanent Account Number Card

ABABB0000M

T/ Name

S ot i

WM AAAAAAAAA BBBBBBBBB
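
On the question of a boolean result versus the matched value: the two are different regex operations. Here is a minimal sketch in plain Python, not the Regex Evaluation step itself; the function names is_pan and find_pan are made up for illustration. A whole-string match only answers yes or no, and it fails on the OCR text above because the string contains much more than the PAN. A search looks for the pattern anywhere in the text and can return the matching part.

```python
import re

# Same pattern as suggested in the thread, without the ^ and $ anchors.
PAN_PATTERN = r"[A-Z]{5}[0-9]{4}[A-Z]"

ocr_text = """e

mmmm

Permanent Account Number Card

ABABB0000M

T/ Name

S ot i

WM AAAAAAAAA BBBBBBBBB
"""


def is_pan(value):
    """Boolean check: True only if the WHOLE string is a PAN."""
    return re.fullmatch(PAN_PATTERN, value) is not None


def find_pan(text):
    """Search anywhere in the text and return the first PAN-like token,
    or None if nothing matches. The word boundaries (\\b) stop the pattern
    from matching inside a longer run of letters and digits."""
    match = re.search(r"\b" + PAN_PATTERN + r"\b", text)
    return match.group(0) if match else None


print(is_pan("ABABB0000M"))   # whole string is a PAN
print(is_pan(ocr_text))       # whole OCR text is not a PAN
print(find_pan(ocr_text))     # the PAN-like token found inside the text
```

This would explain why input rows containing only PAN numbers work while the full OCR text returns N: the anchored pattern is being compared against the entire string.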

Try adding .* as in the picture and see if that works.


Yes, it's working fine. Thank you!
What did I do? I removed all the whitespace from the string extracted from the PAN card, so that it becomes a single line.
Now it works absolutely fine.
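
A likely reason the combination of .* and whitespace removal helps: in most regex engines, including Java's and Python's, the dot does not match a line break by default, so .* cannot reach across the blank lines in the OCR output. A minimal Python sketch of the same idea follows, assuming a whole-string match with a capture group; the name extract_pan_single_line is hypothetical, and the actual picture from the thread is not available.

```python
import re

# .* on both sides lets the pattern sit anywhere in the string;
# the parentheses capture just the PAN. The leading .*? is lazy,
# so the first PAN-like token is the one captured.
WRAPPED_PATTERN = r".*?([A-Z]{5}[0-9]{4}[A-Z]).*"


def extract_pan_single_line(ocr_text):
    """Collapse the OCR text to one line, then match the whole line.

    Returns the captured PAN, or None if no PAN-like token is found.
    """
    # Remove every whitespace character (spaces, tabs, line breaks).
    single_line = re.sub(r"\s+", "", ocr_text)
    match = re.fullmatch(WRAPPED_PATTERN, single_line)
    return match.group(1) if match else None


sample = "Permanent Account Number Card\n\nABABB0000M\n\nT/ Name\n"

# Without collapsing, the whole-string match fails because "." stops
# at each line break.
print(re.fullmatch(WRAPPED_PATTERN, sample))

# After collapsing, the capture group holds the PAN.
print(extract_pan_single_line(sample))
```

One caveat with this approach: removing all whitespace can join unrelated neighbouring text, so a token such as "ABCDE 1234 F" would also be picked up once the spaces are gone. Replacing only the line breaks may be a safer variant.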

Hello, I am facing the same issue. Without the Generate Rows plugin, the OCR plugin runs but produces no output.

After adding the Generate Rows plugin, the OCR step throws an error.

Please give some suggestions.