Loading data from csv file to numpy.ndarray

When there are no missing values in the source data, we can use the numpy.loadtxt() function.
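Here is a minimal sketch of numpy.loadtxt() on complete numeric data. The CSV content is hypothetical and is read from an in-memory io.StringIO object instead of a file, which keeps the example self-contained:

```python
import io

import numpy as np

# Hypothetical CSV content with no missing values
csv_text = io.StringIO("1,2,3\n4,5,6\n")

# delimiter=',' is needed because loadtxt splits on whitespace by default
arr = np.loadtxt(csv_text, delimiter=",")
print(arr.shape)  # (2, 3)
```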

However, if some values in the file are missing, loadtxt() fails on the empty fields, so we use the numpy.genfromtxt() function instead, for example:

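import os
import numpy as np

# Build the path relative to the directory that contains this script
script_dir = os.path.dirname(__file__)
path_to_file = os.path.join(script_dir, 'data_file.csv')

# genfromtxt splits on whitespace by default, so pass delimiter=',' for a CSV file;
# dtype='str' reads every value as a string
data_array = np.genfromtxt(path_to_file, delimiter=',', dtype='str')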

The genfromtxt() function returns a numpy.ndarray. Useful optional parameters include:

  • delimiter - the string that separates values (by default, any whitespace)
  • skip_header - the number of lines to skip at the beginning of the file
  • autostrip - a bool specifying whether leading and trailing whitespace is automatically stripped from values

A list of all parameters can be found in the numpy.genfromtxt() documentation.
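To show what genfromtxt() does with missing values, here is a sketch on hypothetical in-memory data that uses the delimiter, skip_header and autostrip parameters listed above. With the default float dtype, an empty field becomes nan; the filling_values parameter substitutes a different value instead:

```python
import io

import numpy as np

# Hypothetical CSV content: a header row and one empty field in the second data row
csv_text = "a, b, c\n1, 2, 3\n4, , 6\n"

# Skip the header, strip spaces around values; the empty field becomes nan
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",",
                     skip_header=1, autostrip=True)

# Same file, but missing entries are replaced with 0 instead of nan
filled = np.genfromtxt(io.StringIO(csv_text), delimiter=",",
                       skip_header=1, autostrip=True, filling_values=0)
```

A fresh io.StringIO is created for each call because reading consumes the stream.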

Parsing the CSV file

CSV files are text files in which each line represents one data record, and the individual values in a line are separated by a delimiter, usually a comma.
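Splitting each line on the delimiter with str.split() is not always enough, because a quoted field may itself contain the delimiter. This small sketch with hypothetical data compares a naive split with csv.reader, which handles such quoted fields:

```python
import csv
import io

# Hypothetical CSV content: the first field of row 2 contains a comma inside quotes
sample = io.StringIO('name,city\n"Smith, John",Warsaw\n')

# Naive approach: split every line on the comma
naive = [line.split(',') for line in sample.getvalue().splitlines()]

# csv.reader respects the quotes
rows = list(csv.reader(sample))

print(naive[1])  # ['"Smith', ' John"', 'Warsaw']
print(rows[1])   # ['Smith, John', 'Warsaw']
```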

In the example below, we parse a refueling report from a gas station. The first line is the header and contains the following column names: Contractor’s data; Name; Surname; Correction number; WZ number; Date; Time; Counter; Station; Registration number; Card number; Product name; VAT percentage; Price at the station; Net price; Gross price; Discount value; Quantity; Net; VAT; Gross.

In this case, the delimiter is the semicolon character. Each following line describes one refueling. From the source file we want to extract the refueling date, the car's registration number and the number of liters of fuel taken.

import csv

# The csv module documentation recommends opening files with newline=''
with open('report.csv', newline='') as csv_file:
    csv_reader = csv.DictReader(csv_file, delimiter=';')
    total = 0
    for line in csv_reader:
        # Print the date, registration number and liters for each refueling
        print('{}  {}  {} ltr'.format(
            line['Date'], line['Registration number'], line['Quantity']))

        # Values are read as strings, so convert before adding
        total += float(line['Quantity'])
    print('Total: ', total, 'ltr')

    with open('new-report.csv', 'w', newline='') as new_csv_file:
        field_names = ['Date', 'Auto', 'Refueling']
        csv_writer = csv.DictWriter(
            new_csv_file, fieldnames=field_names, delimiter=';')
        csv_writer.writeheader()
        # Rewind the input file to iterate over it a second time
        csv_file.seek(0)
        # After rewinding, the header line would be returned as a data row, so skip it
        next(csv_reader)
        for line in csv_reader:
            row = {
                'Date': line['Date'],
                'Auto': line['Registration number'],
                'Refueling': line['Quantity'],
            }
            csv_writer.writerow(row)

We parse the file with the csv module. A context manager (the with statement) opens report.csv for reading. We read it with a csv.DictReader object, which uses the header line as keys, so each value can be accessed by its column name.
While iterating, we also add up the Quantity column to get the total amount of fuel taken. Because DictReader returns every value as a string, it is converted with float() first.
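A short sketch of how DictReader maps the header onto each row, using a hypothetical three-column excerpt of the report (the real file has many more columns). Every value comes back as a string, which is why the code converts Quantity with float() before adding it to the total:

```python
import csv
import io

# Hypothetical excerpt: only three of the report's columns, semicolon-delimited
sample = io.StringIO(
    "Date;Registration number;Quantity\n"
    "2021-03-01;ABC 12345;40.5\n"
    "2021-03-04;XYZ 98765;35.25\n"
)

reader = csv.DictReader(sample, delimiter=';')
rows = list(reader)

print(reader.fieldnames)    # ['Date', 'Registration number', 'Quantity']
print(rows[0]['Quantity'])  # 40.5  (a str, not a float)

# Convert each Quantity string before summing
total = sum(float(row['Quantity']) for row in rows)
print(total)  # 75.75
```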

We save the extracted refueling data in the new-report.csv file using a csv.DictWriter object, whose header row is the new field_names list, written with writeheader(). To iterate over the input a second time, we move the file position back to the beginning with csv_file.seek(0). The DictReader already consumed the header on the first pass, so after rewinding it would return the header line as an ordinary data row; calling next(csv_reader) skips it. Each new row is written to the file with the writerow() method of the csv.DictWriter object.
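The seek(0) and next() steps are needed only because the input file is read twice. As an alternative design (not the approach used above), the total and the new file can be produced in a single pass. A minimal sketch, where extract_refuelings() is a hypothetical helper that takes already-open file objects so it can be demonstrated on in-memory data:

```python
import csv
import io


def extract_refuelings(src, dst):
    """Copy Date, Registration number and Quantity from src to dst; return total liters.

    extract_refuelings is a hypothetical helper, not part of the csv module.
    """
    reader = csv.DictReader(src, delimiter=';')
    writer = csv.DictWriter(
        dst, fieldnames=['Date', 'Auto', 'Refueling'], delimiter=';')
    writer.writeheader()
    total = 0.0
    for line in reader:
        # Accumulate the total and write the reduced row in the same loop
        total += float(line['Quantity'])
        writer.writerow({
            'Date': line['Date'],
            'Auto': line['Registration number'],
            'Refueling': line['Quantity'],
        })
    return total


# Usage on hypothetical in-memory data; with real files, open both with newline=''
src = io.StringIO(
    "Date;Registration number;Quantity\n"
    "2021-03-01;ABC 12345;40.5\n"
    "2021-03-04;XYZ 98765;35.25\n"
)
dst = io.StringIO()
total = extract_refuelings(src, dst)
print('Total: ', total, 'ltr')          # Total:  75.75 ltr
print(dst.getvalue().splitlines()[0])   # Date;Auto;Refueling
```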