XML file to pandas DataFrame object #3

In this article, I will describe how to load data from an XML file into a DataFrIn the created document, I search for a list of elements from the XML file named person and, in turn, I retrieve the value of the id attribute for each of them, which I save in the persons dictionary.ame object using minidom module.

Project files are available for download >>here<<

The same XML file as described in the previous post is used.

In addition to xml.dom.minidom, I also use the pandas module and the defaultdict class from the collections module.

Using the context manager, I open the XML file, which is loaded and from which the DOM document is created. Moreover, for each child tag, ie position, first_name, last_name, etc., I save its value.

The persons dictionary is as an argument when creating a DataFrame object.

main.py source code:

import xml.dom.minidom
from collections import defaultdict
import pandas as pd

persons = defaultdict(list)
with xml.dom.minidom.parse(open('persons.xml')) as tree:
    persons_list = tree.getElementsByTagName('person')
    for person in persons_list:
        persons['id'].append(person.getAttribute('id'))
        for tag in ('position', 'first_name', 'last_name', 'email', 'salary'):
            persons[tag].append(person.getElementsByTagName(tag)[0].firstChild.data)


df = pd.DataFrame(persons, columns=persons.keys()).set_index('id')
df['salary'] = df['salary'].astype(float)
print(df.sort_values(by='salary', ascending=False))

XML file to pandas DataFrame object #2

In this article, I will describe how to load data from an XML file into a DataFrame object using xml.sax module.

Project files are available for download >>here<<

The same XML file as described in the previous post is used.

In addition to xml.sax, I also use the pandas module and the defaultdict class from the collections module.

The first step in reading data from an XML file using SAX is to implement your own class that inherits from the ContentHandler() class. The handler class overrides three methods of the base class: the startElement() method, which is called when reading the next tag starts, characters () method, which reads the values ​​stored for a particular element and the endElement () method, which is called when reading the corresponding tag has finished. In addition, in the init () method, I create an instance of the dictionary in which the data read from the XML file will be saved.

In the startElement () method, I define a class attribute called tag, and if the processed tag is ‘person’ I write the id number for the person.

Then, in the characters() method, I save the values ​​stored in each element as the appropriate class variables.

In the endElement () method, I save the retrieved values ​​to the persons dictionary.

Parsing the XML file with the xml.sax module is done by calling the make_parser() method, which returns an instance of the parser. Then the handler instance is assigned to the created parser as an argument to the setContentHandler function() – in this case the PersonsHandler class.

The parse () method is then called to parse the source XML document.

The persons variable holds the values ​​of the dictionary created by the handler, and that dictionary is passed as a parameter when creating a DataFrame.

main.py souce code:

import xml.sax
from collections import defaultdict
import pandas as pd

class PersonsHandler(xml.sax.ContentHandler):
    def __init__(self):
        self.persons = defaultdict(list)
    def startElement(self, tag, attr):
        self.tag = tag
        if tag == 'person':
            self.persons['id'].append(attr['id'])
            
    def characters(self, content):
        if content.strip():
            if self.tag == 'position': self.position = content
            elif self.tag == 'first_name': self.first_name = content
            elif self.tag == 'last_name': self.last_name = content
            elif self.tag == 'email': self.email = content
            elif self.tag == 'salary': self.salary = content
    
    def endElement(self, tag):
        if tag == 'position': self.persons['position'].append(self.position)
        elif tag == 'first_name': self.persons['first_name'].append(self.first_name)
        elif tag == 'last_name': self.persons['last_name'].append(self.last_name)
        elif tag == 'email': self.persons['email'].append(self.email)
        elif tag == 'salary': self.persons['salary'].append(self.salary)

parser = xml.sax.make_parser()
parser.setContentHandler(PersonsHandler())
parser.parse(open('persons.xml'))
persons = parser.getContentHandler().persons


df = pd.DataFrame(persons, columns=persons.keys()).set_index('id')
df['salary'] = df['salary'].astype(float)
print(df.sort_values(by='salary', ascending=False))

XML file to pandas DataFrame object #1

In this article, I will describe how to load data from an XML file into a DataFrame object using xml.etree.ElementTree.

Project files are available for download >>here<<

The sample XML file will describe people employed in the company and will have the following form:

<persons>
    <person id="">
        <position></position>
        <first_name></first_name>
        <last_name></last_name>
        <email></email>
        <salary></salary>
    </person>
</persons>

The following data is saved for each person with a unique id attribute: position, name, surname, email address and salary.

First, I import the necessary modules, ie xml.etree.ElementTree to parse the XML document. From the collections module I import defaultdict, which will store lists containing first names, surnames, salaries, etc., which I then pass as an argument of the DataFrame class.

import xml.etree.ElementTree as et
from collections import defaultdict
import pandas as pd

In the next line, I create a dictionary that will store employee data obtained from the XML file:

persons = defaultdict(list)

Then I load the XML file and download the parent element – root – persons, and then for each child element – person I get the value of its id attribute, as well as the values ​​of its elements: position, first_name, last_name, etc. Each of these values ​​is added as the next element of the appropriate dictionary list.

tree = et.parse("persons.xml")
root = tree.getroot()
for child in root:
    id = child.attrib.get('id')
    position = child.find('position').text
    first_name = child.find('first_name').text
    last_name = child.find('last_name').text
    email = child.find('email').text
    salary = child.find('salary').text
    
    persons['id'].append(id)
    persons['position'].append(position)
    persons['first_name'].append(first_name)
    persons['last_name'].append(last_name)
    persons['email'].append(email)
    persons['salary'].append(salary)

I pass the created dictionary as an argument to the DataFrame object being created, with the names of the dictionary keys as the names of the columns and the id column as the DataFrame index.

Then I change the data type of the salary column to float to sort the frame in order of decreasing values ​​of that column. Alternatively, the conversion to float could be done when creating the DataFrame object, additionally providing the dtype argument.

Refueling report from xlsx file to Pandas + localized string resources

The source files of the program along with a sample report from a gas station can be found >> here <<

The Reports class has three methods: __init__(), load_resources(), and my_parser().

The __init__() method reads all files in the Data directory that have an xlsx extension. The last added file from this file list is loaded – the pandas frame object is created.

In the load_resources() method, I load localized strings depending on the user’s locale. If there is no appropriate file, the version with messages in English is loaded.

In the my_parser () method, I sum up the collected fuel with the division into vehicles. In case of discrepancies with the source documents, you can view a detailed list of refueling for a specific vehicle giving part of the registration number.

script code: main.py

import os
import glob
import locale
import calendar
import pandas as pd
import importlib_resources


class Reports:
	
    def __init__(self):
        self.load_resources()
        script_dir = os.path.dirname(os.path.abspath(__file__))
        data_dir = os.path.join(script_dir, 'Data')
        list_of_files = glob.glob(os.path.join(data_dir, '*.xlsx'))
        latest_file = max(list_of_files, key=os.path.getctime)
        path_to_file = os.path.join(data_dir, latest_file)
        self.df = pd.read_excel(path_to_file)
        self.my_parser()
    
    def load_resources(self):
        self.locale, encoding = locale.getdefaultlocale()
        r = importlib_resources.files('Resources')
        try:
            r_strings = (r / f'strings_{self.locale}.txt').read_text(encoding='utf-8').splitlines()
        except:
            r_strings = (r / f'strings_en_US.txt').read_text(encoding='utf-8').splitlines()
            self.locale = 'en_US'
        self.r_str = dict(x.split(':') for x in r_strings)


    
    def my_parser(self):
        os.system('cls' if os.name == 'nt' else 'clear')
        print()
        f_date = self.df['Date'].min()
        l_date = self.df['Date'].max()

        with calendar.different_locale(self.locale):
            print(self.r_str['title'], f'({f_date.day} -  {l_date.day} {calendar.month_abbr[int(l_date.month)]} {l_date.year})')
        self.df['Amount'] = self.df['Amount'].astype(str)
        self.df['Amount'] = self.df['Amount'].str.replace(',', '.')
        self.df['Amount'] = self.df['Amount'].astype(float)
        print()
        print(self.r_str['fuel_total'], self.df['Amount'].sum())
        cars_group = self.df.groupby('Registration number')
        print()
        # print(cars_group['Amount'].sum()) #standard output
        for item in zip(cars_group.groups, cars_group['Amount'].sum().values):
            name, value = item
            print(f'{name}    {value}')


        print()
        while True:
            choice = input(self.r_str['choice'] + ': ')
            os.system('cls' if os.name == 'nt' else 'clear')
            if choice == '/q': raise SystemExit
            if choice == '/m': self.my_parser()
            for key, value in cars_group['Amount']:
                if choice.upper() in key:
                    print(self.r_str['refueling'], key)
                    total = 0
                    for i, v in zip(value.index, value.values):
                        print(f"{self.df.loc[i, 'Date']}   {v}")
                        total +=v
                    print(self.r_str['ref_sum'] + f' {key}: {total}')
                    print()


if __name__ == "__main__":
    reports = Reports()

Color maps in matplotlib

To create a scatter chart that uses a selected color palette, enter the name of the selected color scheme as the cmap argument.

The colors are grouped (see documentation). For instance: Sequential – different intensity of one color, Diverging – different intensity of two contrasting colors, Qualitative – different contrasting colors, but matching the palette, e.g Pastel1 – containing different pastel colors.

import matplotlib.pyplot as plt

fig, ax = plt.subplots()

ax.set_title('Chart title')
ax.set_xlabel('X axis label')
ax.set_ylabel('Y axis label')

# x - list with X axis values, e.g. product name
# y - list with Y axis values, e.g. product price
# intensivity - a list with numbers corresponding to the intensity of the attribute
# cm - name of the selected color palette

ax.scatter(x, y, c=intensivity,  cmap=cm)  
  # e.g: plt.scatter(x, y, c=intensivity, s=50,  cmap='plasma')

mappable = ax.collections[0]
cbar = fig.colorbar(mappable=mappable)
cbar.set_label('intensivity')

plt.show()

The s parameter is the size of drawn characters.

To display a colorbar that shows the intensity of a given feature for the displayed point, the mappable object should be specified as a parameter, which for the scatter plot is stored as an element of the AxesSubplot collections list.

As a result, on the chart, which may show, for example, the name of the product and the price, you can add additional information using the color, e.g. popularity among buyers or the quantity of goods in the warehouse (e.g. green – the product is available without any problems, up to red – no goods in stock) )

DataFrame object in Pandas based on data from pdf file

Reading data from a pdf file requires the tabula-py module to be installed. This module also enables saving the read data to a data file in csv or json format.

import tabula
df_list = tabula.read_pdf('file.pdf')

The read_pdf function reads one page from a pdf file by default, if no value is given for the pages parameter (if you want to load all pages: pages = ‘all’).

The above function returns a list object containing successive DataFrame objects, for example:

df = df_list[0]        # first DataFrame object

Loading data from csv file to numpy.ndarray

When there are no missing values ​​in the source data, we can use the numpy.loadtxt() function.

However, if there is no value in the loaded file, instead of the above function, we can use the numpy.genfromtxt() function, i.e.

import os
import numpy as np
script_dir = os.path.dirname(__file__)
path_to_file = os.path.join(script_dir, 'data_file.csv')

data_array = np.genfromtxt(path_to_file, dtype='str')

The genfromtxt() function returns an object of type numpy.ndarray. As additional function parameters, we can add e.g.

  • delimiter - determines which sign separates particular values
  • skip_header - specifies how many lines from the beginning of the file are to be skipped
  • autostrip - a bool parameter specifying whether spaces should be automatically removed

A list of all parameters can be found here.

Decorators with parameters – Fizz Buzz game

First, I will define a numbers function that returns a dictionary, where the keys are integers in the range given as the argument to the function (from start to end inclusive).

numbers() function source code:

def numbers(start, end):
    dict = {}
    for number in range(start, end + 1):
        dict[number] = ''
    return dict

The function can be modified using the decorators with parameters.

My decorator function that modifies dictionary values:

def fizzbuzz(number, message):
    def decor(func):
        def wrapper(*args):
            dict = func(*args)
            for key in dict:
                if not (key % number):
                    dict[key] += message
            return dict
        return wrapper
    return decor

Function numbers() definition with the decorator:

@fizzbuzz(5, 'BUZZ')
@fizzbuzz(3, 'FIZZ')
def numbers(start, end):
    dict = {}
    for number in range(start, end + 1):
        dict[number] = ''
    return dict

Calling the numbers function will not change, e.g. to display dictionary values ​​with keys from 0 to 15:

print(numbers(0, 15))

Modification by using a decorator is done behind the scenes.

Django #4

You can test the application by creating a free account here.
You can get the source code from GitHub here.

In this post I will present views responsible for car management and repairs.

▣ I will start with the main view showing the cars entered by the user.

The maximum number of entries per page is 10 cars.

After clicking on a particular car, the repairs of a given car are displayed.

Both car and repair entries are sorted by date, i.e. the newest entries are displayed first.

The view class displaying cars inherits from the ListView class and the LoginRequiredMixin class (so that access is only possible for logged in users).

First, I define all the class attributes, i.e. the model used – Car, the template used – cars.html, the name of the object under which the data in the context is available – context_object_name – ‘cars’, and the number of entries on the page – 10 vehicles.

In this class, I override two methods – get_queryset (), which is responsible for filtering data, and get_context_data (), in which I complete the context data with data from the search field.

CarListView () class source code:

class CarsListView(LoginRequiredMixin, ListView):
    model = Car
    template_name = 'cars/cars.html'
    context_object_name = 'cars'
    paginate_by = 10

    def get_queryset(self):
        if self.request.GET.get('q'):
            q = self.request.GET.get('q')
            make_results = self.model.objects.filter(
                user=self.request.user, make=q).order_by('-pk')
            model_results = self.model.objects.filter(
                user=self.request.user, model=q).order_by('-pk')
            if make_results.exists():
                return make_results
            elif model_results.exists():
                return model_results
            else:
                return self.model.objects.none()
        return self.model.objects.filter(user=self.request.user).order_by('-pk')

    def get_context_data(self, **kwargs):
        context = super().get_context_data(**kwargs)
        context['q'] = self.request.GET.get('q', '')
        return context

The get_queryset() method lists all of the user’s vehicles. The last added car is visible first (sort ‘-pk’). If the make or model of the vehicle (s) was entered in the search field, only these vehicles will be displayed.

The get_context_data() method adds an entry from the search field to the context, thanks to which the data about car repairs is displayed correctly when dividing cars into individual pages.

▣ Adding a vehicle is done by pressing the Add Car button in the application menu.

The view that handles adding a car is the AddCarView () class, and the template that displays the add car form is car_form.html (the same template also supports updating auto data).

The AddCarView () class inherits the functionality from the CreateView () and LoginRequiredMixin class (only logged in users can create a new vehicle).

I define the following view class attributes: model that the class uses – Car, fields – the form fields to be visible, and success_url – the url that will be loaded after successfully completing the form.

I override the form_valid() method, which adds the user who created the new car (the Car model requires a user attribute to be defined).

The source code for the AddCarView class :

class AddCarView(LoginRequiredMixin, CreateView):
    model = Car
    fields = ['make', 'model', 'vrn', 'year']
    success_url = '/'

    def form_valid(self, form):
        form.instance.user = self.request.user
        return super().form_valid(form)

▣ Removing the car is done using the DeleteCarView () view class, and the template that prompts you to delete the auto is the car_confirm_delete.html file (displays the Delete button that deletes the auto and the Cancel button that returns to the previous page).

DeleteCarView () class inherits functionality from DeleteView, LoginRequiredMixin class (only logged in user can delete auto) and UserPassesTestMixin (user can delete auto created by himself).

I define the model arguments – the used Car model and success_url – the address to which it will be loaded after the successful removal of the car – in this case, the view showing all the user’s cars.

I am creating a test_func () method which checks if the user who wants to delete the car is the person who created the auto (the user can only delete the cars he created).

I override the delete () method, which additionally displays the message about deleting the car.

The DeleteCarView() class source code:

class DeleteCarView(LoginRequiredMixin, UserPassesTestMixin, DeleteView):
    model = Car
    success_url = '/'

    def test_func(self):
        if self.get_object().user == self.request.user:
            return True
        return False

    def delete(self, request, *args, **kwargs):
        success_message = f'Car {self.get_object()} has been deleted'
        messages.success(self.request, success_message)
        return super().delete(request, *args, **kwargs)

▣ The vehicle data is updated using the UpdateCarView () class, while the template displaying the form for changing this data is car_form.html.

The UpdateCarView () class inherits functionality from the UpdateView class as well as the LoginRequiredMixin classes (only the logged in user can update the data) and UserPassesTestMixin (a function is called to check if the user who wants to modify is the one who created the auto).

I define class attributes: model – specifies the model that is used to change the data – in this case the Car model, fields attribute – specifying which form fields are to be available and success_message – a text message about updating the car data.

Two methods are defined: test_func () and get_success_url ().

The test_func () method checks if the user who wants to modify the car’s data is the one who created the car.

The get_success_url() method displays a message about the data change and returns to the page with the name: car_detail – displaying repairs for a given car. As additional parameters, they are sent using the GET method: row, p and q that define the row and page of the car in the table and the text string from the search box.

Code of the UpdateCarView() class:

class UpdateCarView(LoginRequiredMixin, UserPassesTestMixin, UpdateView):
    model = Car
    fields = ['make', 'model', 'vrn', 'year']
    success_message = 'Car info has been updated'

    def get_success_url(self, **kwargs):
        row = self.request.GET.get('row')
        p = self.request.GET.get('p')
        q = self.request.GET.get('q')
        options = '?p=' + p + '&row=' + row
        options += '&q=' + q
        messages.success(self.request, self.success_message)
        return reverse_lazy('car_detail') + options

    def test_func(self):
        if self.get_object().user == self.request.user:
            return True
        return False

▣ Adding a repair note is defined in the AddRepairView() class, and the form is defined in the repair_form.html template.

This class has the following arguments: model – specifying the model used – in this case Repair. The next argument: fields – specifies what the form fields are displayed. The last argument is the success_message string displayed after the repair note was successfully added.

The methods in the AddRepairView() class are: get_context_data(), form_valid(), and get_success_url().

The get_context_data() method adds a Car object to the context, making it visible in the template.

The form_valid () method uses the data from the form to create a new model instance. To properly create an instance of the Repair class, it is necessary to provide a foreign key – a Car object that identifies the vehicle to which a given repair note relates.

The get_success_url () method specifies the address of the page to be displayed when a new note is successfully added. In this case, a page named car_detail as defined in cars / urls.py will be displayed.

class AddRepairView(LoginRequiredMixin, CreateView):
    model = Repair
    fields = ['date', 'description']
    success_message = 'New repair has been added'

    def get_context_data(self, **kwargs):
        context = super().get_context_data(**kwargs)
        context['car'] = Car.objects.get(id=self.kwargs['pk'])
        return context

    def form_valid(self, form, **kwargs):
        form.instance.car = Car.objects.get(id=self.kwargs['pk'])
        return super().form_valid(form)

    def get_success_url(self, **kwargs):
        row = self.request.GET.get('row')
        p = self.request.GET.get('p')
        q = self.request.GET.get('q')
        options = '?p=' + p + '&row=' + row
        options += '&q=' + q
        return reverse_lazy('car_detail') + options

to be continued

Django #3

You can see the application deployed on heroku here.
You can get a source code here.

In previous posts, I described user management in the application, i.e. a new user can register – create an account on the site, a registered user can log in, log out or change the password.

Then I’ll describe the main functionality of the app – car and repair records.

For this purpose, in the project urls.py file, I will attach the mapping of addresses related to cars application.

After the changes, the project urls.py file will look like this:

from django.contrib import admin
from django.urls import path, include
from users import views as users_views
from django.contrib.auth import views as auth_views

urlpatterns = [
    path('admin/', admin.site.urls),
    path('', include('cars.urls')),
    path('register-user/', users_views.RegisterUser.as_view(
        template_name='users/register-user.html'), name='register_user'),
    path('login/', auth_views.LoginView.as_view(template_name='users/login.html'), name='login'),
    path('logout/', auth_views.LogoutView.as_view(template_name='users/logout.html'), name='logout'),
    path('change-password/', users_views.ChangePassword.as_view(
        template_name='users/change-password.html'), name='change_password'),
]

The urls.py file created in the cars application also takes into account mapping addresses to appropriate views. The urls.py file from the cars application looks like this:

from django.urls import path
from .views import (
    CarsListView,
    AddCarView,
    RepairsListView,
    UpdateCarView,
    DeleteCarView,
    AddRepairView
)
urlpatterns = [
    path('', CarsListView.as_view(), name='cars'),
    path('add-car/', AddCarView.as_view(), name='add_car'),
    path('car/', RepairsListView.as_view(), name='car_detail'),
    path('car/<int:pk>/update/', UpdateCarView.as_view(), name='update_car'),
    path('car/<int:pk>/delete/', DeleteCarView.as_view(), name='delete_car'),
    path('car/<int:pk>/new-repair/', AddRepairView.as_view(), name='add_repair'),
]

The main view defined in the CarListView class displays all the user’s cars.

Remember to add the application in the project’s settings.py file in the INSTALLED_APPS list, e.g. for cars it will be an element: ‘cars.apps.CarsConfig’.

I used the pre-installed user model in the users application. In the cars application, I created models – objects mapping database tables.

These models will define: vehicle – Car model and repair – Repair model. The models.py file looks like this:

from django.db import models
from django.contrib.auth.models import User
from django.utils import timezone
from django.core.validators import MaxValueValidator, MinValueValidator


class Car(models.Model):
    make = models.CharField(max_length=10)
    model = models.CharField(max_length=10)
    vrn = models.CharField(max_length=10)
    year = models.IntegerField(default=timezone.now().year,
                               validators=[MinValueValidator(timezone.now().year - 100),
                                           MaxValueValidator(timezone.now().year)])
    user = models.ForeignKey(User, on_delete=models.CASCADE)

    def __str__(self):
        return f'{self.make} {self.model}'


class Repair(models.Model):
    date = models.DateField(default=timezone.now)
    description = models.TextField()
    car = models.ForeignKey(Car, on_delete=models.CASCADE)

    def __str__(self):
        lead = self.description[:10] + '...'
        return lead

The Car model contains make, model, vrn, year, and user attributes that describe the vehicle make, model, vehicle registration number, and year of manufacture, as well as a foreign key containing the user object that created the vehicle entry.

The year field has validators that make it impossible to enter a car with a production year 100 years older than the current date and newer than the current year.

The on_delete option of the user attribute has the default value CASCADE, ie deleting a user will also delete all his cars.

The Repair model includes the attributes: date – repair date, description – repair details and a foreign key car – which car the repair concerns.

The __str__() method prints the header that is the beginning of the repair description (leading 10 characters).

To be able to edit entries of both models from the administration panel, you need to add both models to the admin.py file of the cars application, i.e.

from django.contrib import admin
from .models import Car, Repair

admin.site.register(Car)
admin.site.register(Repair)

In the next post, I will describe the views that define individual functionalities, i.e. displaying cars belonging to the user, adding repairs, and searching for cars using the search box on the website.