XML file to pandas DataFrame object – minidom

In this article, I will describe how to load data from an XML file into a DataFrame object using the minidom module.

Project files are available for download >>here<<

The same XML file as described in the previous post is used.

In addition to xml.dom.minidom, I also use the pandas module and the defaultdict class from the collections module.

Using the context manager, I open the XML file, which is loaded and from which the DOM document is created. In the created document, I search for a list of elements named person and, in turn, I retrieve the value of the id attribute for each of them, which I save in the persons dictionary. Moreover, for each child tag, i.e. position, first_name, last_name, etc., I save its text value under the key of the same name.
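To make the lookup steps easier to follow, here is a minimal sketch that runs them on an inline XML string instead of a file. The SAMPLE_XML content and the rows list are my assumptions for illustration only; the real persons.xml is not reproduced in this post and may look different.

```python
import xml.dom.minidom

# Hypothetical sample data; the real persons.xml may differ.
SAMPLE_XML = """<persons>
  <person id="1">
    <position>Developer</position>
    <first_name>Anna</first_name>
  </person>
  <person id="2">
    <position>Tester</position>
    <first_name>Jan</first_name>
  </person>
</persons>"""

tree = xml.dom.minidom.parseString(SAMPLE_XML)

rows = []
for person in tree.getElementsByTagName('person'):
    # Attribute values are read with getAttribute()
    person_id = person.getAttribute('id')
    # The text of a child tag lives in the text node below the element
    position = person.getElementsByTagName('position')[0].firstChild.data
    rows.append((person_id, position))

print(rows)
```

Note that both the attribute and the tag text come back as strings, which is why a numeric column has to be converted later.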

The persons dictionary is then passed as an argument when creating a DataFrame object.
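The idea of collecting values in a defaultdict(list) and handing it to pandas can be sketched separately from the XML part. The two hard-coded records below are made-up values, used only to show the shape of the data.

```python
from collections import defaultdict

import pandas as pd

# Each key becomes a column; each list holds one value per row.
persons = defaultdict(list)

# Made-up records standing in for values read from the XML file
for person_id, salary in (('1', '5000'), ('2', '6500.50')):
    persons['id'].append(person_id)
    persons['salary'].append(salary)

# defaultdict is a dict subclass, so DataFrame accepts it directly
df = pd.DataFrame(persons).set_index('id')
df['salary'] = df['salary'].astype(float)  # strings -> floats
print(df)
```

With defaultdict(list) there is no need to create an empty list for every column up front; the first append() call for a new key does that automatically.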

main.py source code:

import xml.dom.minidom
from collections import defaultdict

import pandas as pd

# Column name -> list of values, one entry per <person> element
persons = defaultdict(list)

# The context manager closes the file once parsing is done;
# binary mode lets the parser honor the encoding declared in the XML
with open('persons.xml', 'rb') as xml_file:
    tree = xml.dom.minidom.parse(xml_file)

for person in tree.getElementsByTagName('person'):
    persons['id'].append(person.getAttribute('id'))
    for tag in ('position', 'first_name', 'last_name', 'email', 'salary'):
        # Text content of the first child element with this tag name
        persons[tag].append(person.getElementsByTagName(tag)[0].firstChild.data)

df = pd.DataFrame(persons).set_index('id')
df['salary'] = df['salary'].astype(float)  # values are read as strings
print(df.sort_values(by='salary', ascending=False))
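One thing to keep in mind: the expression getElementsByTagName(tag)[0].firstChild.data raises an exception when a tag is missing (IndexError) or empty (firstChild is None). If the input file may be incomplete, a small helper can return a default value instead. This is only a sketch; get_text is a hypothetical helper name of mine, not part of minidom, and the inline XML is made up.

```python
import xml.dom.minidom


def get_text(parent, tag, default=None):
    """Return the text of the first <tag> below parent, or default."""
    elements = parent.getElementsByTagName(tag)
    # No such tag, or the tag has no text node (e.g. <email/>)
    if not elements or elements[0].firstChild is None:
        return default
    return elements[0].firstChild.data


# Made-up record with an empty <email/> and no <salary> tag at all
doc = xml.dom.minidom.parseString(
    '<person id="1"><first_name>Anna</first_name><email/></person>'
)
person = doc.documentElement

print(get_text(person, 'first_name'))
print(get_text(person, 'email'))
print(get_text(person, 'salary', default='0'))
```

In the loop from main.py, persons[tag].append(get_text(person, tag)) could then replace the direct lookup, at the cost of having to handle missing values in the DataFrame afterwards.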
