How to Extract Specific Element from PDF or Scanned Image

For example, how can we extract the PAN number from a PAN card,
or a specific element from a PDF invoice?
How can we do that?
Please explain step by step.

Hi Shubham,

Once you read the PDF file, you will get the extracted text.
You then need to identify the pattern of the specific field you want to pull out of that text.

In your case, the PAN number follows the pattern below:

The PAN (or PAN number) is a ten-character alphanumeric unique identifier.
The first five characters are letters (uppercase by default), followed by four digits, and the last (tenth) character is a letter.
The fourth character identifies the type of holder of the card.
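As a minimal sketch of this advice, assuming the text has already been read from the PDF or image (for a scanned PAN card that means an OCR step first), the pattern above can be written as a regular expression. The sample text and the function names are hypothetical, and real OCR output may need cleanup before matching (for example, removing stray spaces or fixing O/0 mix-ups).

```python
import re
from typing import List

# PAN layout described above: five uppercase letters, four digits, one uppercase letter.
# The \b word boundaries stop the pattern from matching inside a longer token.
PAN_PATTERN = re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b")


def extract_pans(text: str) -> List[str]:
    """Return every PAN-shaped token in the extracted text, in order of appearance."""
    return PAN_PATTERN.findall(text)


def holder_type_code(pan: str) -> str:
    """Return the fourth character, which identifies the type of holder."""
    return pan[3]


if __name__ == "__main__":
    # Hypothetical text standing in for the output of the PDF read or OCR step.
    sample = "INCOME TAX DEPARTMENT\nPermanent Account Number\nABCPE1234F\nName: SAMPLE NAME"
    for pan in extract_pans(sample):
        print(pan, "holder type code:", holder_type_code(pan))
        # prints: ABCPE1234F holder type code: P
```

Matching on the shape of the value, rather than on its position in the text, keeps this working even when the card layout shifts the number to a different line.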

Sagar Singh

OK, thanks.
And what about an invoice PDF? How do we do that?
I used the PDF form steps in a workflow. For some PDFs the fields are visible in the PDF form steps, but for others they are not.

I used the invoice1 PDF, which is editable when opened in a browser. When I use this PDF in the PDF form steps, the fields in the invoice are visible.

But another PDF, which is not editable when opened in a browser, does not show any fields in the PDF form steps.

Hi Shubham,

The “Read PDF Form” plugin has a limitation: it works only with editable PDF forms.
That is why it works fine with your first invoice but cannot get any fields from the other invoice, which is not editable.

Sagar Singh

OK!
But if we need to read such an invoice, how would we do it?

Another plugin, PDF to Text, is available; you can try that.
However, it reads the whole PDF and returns all of its content as plain text.
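Because PDF to Text returns everything as one block of text, the same pattern approach still applies: search for each label and capture the value that follows it. The sketch below assumes the invoices use labels such as "Invoice No", "Date", and "Grand Total"; those labels, the sample text, and `extract_invoice_fields` are hypothetical and must be adjusted to match your actual invoices.

```python
import re
from typing import Dict, Optional

# Hypothetical label patterns: adjust them to the wording your invoices use.
# Group 1 of each pattern captures the value that follows the label.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"\bInvoice\s*(?:No\.?|Number|#)\s*[:\-]?\s*([A-Z0-9][A-Z0-9/\-]*)", re.IGNORECASE
    ),
    "invoice_date": re.compile(
        r"\bDate\s*[:\-]?\s*(\d{1,2}[/.\-]\d{1,2}[/.\-]\d{2,4})", re.IGNORECASE
    ),
    # \b keeps "Subtotal" from matching the "Total" label.
    "total_amount": re.compile(
        r"\b(?:Grand\s+)?Total\s*[:\-]?\s*(?:Rs\.?|INR)?\s*(\d[\d,]*(?:\.\d{1,2})?)", re.IGNORECASE
    ),
}

# Hypothetical PDF to Text output for a simple invoice.
SAMPLE_INVOICE_TEXT = """ACME Supplies Pvt Ltd
Invoice No: INV-2024/0173
Invoice Date: 05/03/2024
Subtotal: 1,000.00
Tax: 180.00
Grand Total: Rs. 1,180.00
"""


def extract_invoice_fields(text: str) -> Dict[str, Optional[str]]:
    """Return the first match for each field, or None when its label is not found."""
    fields: Dict[str, Optional[str]] = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    return fields


if __name__ == "__main__":
    print(extract_invoice_fields(SAMPLE_INVOICE_TEXT))
```

The first match wins for each field, so if a label appears more than once (for example a "Due Date" above the invoice date), make the pattern more specific for that layout.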

Sagar Singh

UiPath provides a feature for indicating elements on a PDF or image, so I was trying to do the same in AE.
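One common way to get a similar effect, once an OCR step has returned each word together with its position on the page, is anchor-based lookup: find a fixed label (the anchor) and take the nearest word to its right on the same line. This is a minimal sketch under that assumption; the `Word` structure, the coordinates, and `value_right_of` are hypothetical and are not part of any AE or UiPath API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    """One OCR token with its bounding box in pixels (hypothetical; adapt to your OCR output)."""
    text: str
    left: int
    top: int
    width: int
    height: int


def value_right_of(words: List[Word], anchor: str) -> Optional[str]:
    """Return the nearest word to the right of the anchor on the same text line."""
    anchors = [w for w in words if w.text.lower() == anchor.lower()]
    if not anchors:
        return None
    a = anchors[0]
    a_mid = a.top + a.height / 2
    candidates = [
        w for w in words
        # Starts after the anchor ends horizontally...
        if w.left >= a.left + a.width
        # ...and its vertical centre lies within the anchor's line.
        and abs((w.top + w.height / 2) - a_mid) <= a.height / 2
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda w: w.left).text


if __name__ == "__main__":
    # Hypothetical OCR words from one region of an invoice image.
    words = [
        Word("Invoice", 10, 100, 60, 12),
        Word("No:", 75, 100, 25, 12),
        Word("INV-0173", 110, 101, 70, 12),
        Word("Date:", 10, 130, 40, 12),
        Word("05/03/2024", 60, 130, 80, 12),
    ]
    print(value_right_of(words, "No:"))
```

Anchoring on a label rather than on fixed coordinates tolerates small shifts between scans, which is the main reason this style of selection is used for invoices that are not editable forms.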