How do I read a PDF and extract values from it?

Hi. I want to read a PDF file from a website and then extract some values from its content based on certain conditions. How can this be done?

If you want to read values from a PDF form, we have a plugin for that, Read PDF Form, which reads the PDF and returns its contents.

However, if the PDF is on a website and you can view it there, you can extract the values using Surface Spy, based on their relative positions.


Can we read any PDF file?
I tried the Read PDF Form plugin on two invoice PDFs.
One invoice PDF was editable when I opened it in a browser; the other was not.

Using the Read PDF Form steps, I am able to read the first invoice (its fields are shown in the WF steps), but
not the second invoice (its fields are not shown in the WF steps).

The “Read PDF Form” plugin has a limitation: it works only with editable PDF forms.
That is why it works with the first invoice in your case but cannot get any fields from the other invoice, which is not editable.
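
If it helps to check up front whether a PDF is likely to be an editable form, here is a rough sketch in Python (outside the plugin, and not how the plugin itself works). It relies on the fact that PDFs with interactive form fields declare an /AcroForm entry. The function names are hypothetical, and the check is only a heuristic: it can miss forms whose catalog is stored in a compressed object stream, and it can report True for a file whose form has no fields.

```python
def looks_like_fillable_form(pdf_bytes: bytes) -> bool:
    """Heuristic: True if the raw PDF bytes mention an /AcroForm entry.

    Assumption: the entry is stored uncompressed. This does not parse the
    PDF, so treat the result as a hint, not a guarantee.
    """
    return b"/AcroForm" in pdf_bytes


def file_looks_like_fillable_form(path: str) -> bool:
    """Same check, reading the PDF from a file path (hypothetical helper)."""
    with open(path, "rb") as handle:
        return looks_like_fillable_form(handle.read())
```

A False result would suggest trying a text-based approach instead of the form-reading steps.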

To answer your question, “Can we read any PDF file?”:
Yes, there is another plugin, PDF to Text, but it reads the PDF and returns all of its content as plain text.
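
Once the whole PDF is available as text, the individual values still have to be picked out of it. A minimal sketch of one way to do that, in Python with regular expressions, is below. The sample text, the field labels and the function name are all made up for illustration; the real patterns depend on how your invoice is worded.

```python
import re

# Hypothetical text, standing in for what a PDF-to-text step might return.
sample_text = """Invoice Number: INV-1042
Invoice Date: 2023-05-14
Total Amount: 1,250.00
"""


def extract_fields(text, patterns):
    """Return {field_name: first captured value, or None if no match}."""
    values = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        values[name] = match.group(1).strip() if match else None
    return values


# Hypothetical labels; adjust them to the wording used in the actual invoice.
patterns = {
    "invoice_number": r"Invoice Number:\s*(\S+)",
    "total": r"Total Amount:\s*([\d,]+\.\d{2})",
}

fields = extract_fields(sample_text, patterns)

# Example condition: flag the invoice if its total exceeds 1000.
total = float(fields["total"].replace(",", ""))
is_large = total > 1000
```

Fields that are not found come back as None, so a missing label can be handled explicitly before any condition is applied.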

Would it be possible to share a sample editable PDF file that can be used with the ‘Read PDF Form’ plugin?