Creating a Pandas DataFrame from data in a PDF file

Reading data from a PDF file requires the tabula-py module to be installed. Because tabula-py is a wrapper around the tabula-java library, a Java runtime must also be available. The module can also save the extracted data to a file in CSV or JSON format.
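
As a rough illustration of the saving feature mentioned above, here is a minimal sketch built on tabula-py's convert_into function. The helper name pdf_tables_to_file and the file names are hypothetical, chosen only for this example, and the import sits inside the function so that tabula-py and Java are loaded only when the helper is actually called.

```python
SUPPORTED_FORMATS = ("csv", "json", "tsv")


def pdf_tables_to_file(pdf_path, out_path, fmt="csv", pages="all"):
    """Write the tables found in pdf_path to out_path (hypothetical helper)."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")

    # Imported here so tabula-py (and its Java runtime) loads only on use.
    import tabula

    # convert_into writes straight to disk instead of returning DataFrames.
    tabula.convert_into(pdf_path, out_path, output_format=fmt, pages=pages)
    return out_path
```

A call such as pdf_tables_to_file('file.pdf', 'file.csv') would then write the extracted tables to file.csv; both paths are placeholders.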

import tabula
df_list = tabula.read_pdf('file.pdf')

By default, the read_pdf function reads only the first page of the PDF file, because the pages parameter defaults to 1. To load all pages, pass pages='all'.
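
The pages parameter can be illustrated with a small sketch. The wrapper name read_tables is hypothetical, and 'file.pdf' in the note below is a placeholder path; only read_pdf and its pages argument come from tabula-py itself.

```python
def read_tables(pdf_path, pages="all"):
    """Return the list of DataFrames extracted from pdf_path (hypothetical helper)."""
    # Imported here so tabula-py (and its Java runtime) loads only on use.
    import tabula

    # pages may be an int (3), a list ([1, 3]), a range string ("1-2,4"),
    # or "all" for every page in the document.
    return tabula.read_pdf(pdf_path, pages=pages)
```

For instance, read_tables('file.pdf') would scan every page, while read_tables('file.pdf', pages=[1, 3]) would restrict extraction to the first and third pages.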

The function returns a list of DataFrame objects, one per extracted table, so an individual table is accessed by its index, for example:

df = df_list[0]        # first DataFrame object
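
When a table continues across several pages, the list often has to be merged back into one DataFrame. The sketch below shows one possible way to do that with pandas.concat. It assumes all tables share the same columns, and it uses small stand-in DataFrames in place of a real read_pdf result so that it runs without a PDF file; the helper name combine_tables is hypothetical.

```python
import pandas as pd


def combine_tables(tables):
    """Stack a read_pdf-style list of DataFrames into one DataFrame."""
    # Skip empty frames, which can appear when a page yields no usable rows.
    frames = [df for df in tables if not df.empty]
    if not frames:
        return pd.DataFrame()
    # ignore_index=True renumbers the rows 0..n-1 across all tables.
    return pd.concat(frames, ignore_index=True)


# Stand-in for the list returned by tabula.read_pdf (assumed identical columns).
sample_list = [
    pd.DataFrame({"name": ["a", "b"], "value": [1, 2]}),
    pd.DataFrame({"name": ["c"], "value": [3]}),
]
combined = combine_tables(sample_list)
```

If the tables have different columns, it is usually safer to keep them separate and work with each element of the list individually.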
