Cleanest way to read a CSV file with Python

2016-01-10 2 min read

code

Python’s my go-to language for quick tasks and analyses, most of them short scripts that analyze a file or pull some data. I’m constantly looking to improve my code and have lately settled on the following approach. The goal isn’t to make it as short as possible but to make it as expressive and clean as possible. The two are related but not synonymous.

#!/usr/bin/env python3

import csv
from collections import namedtuple

# Can add whatever columns you want to parse here
# Can also generate this via the header (skipped in this example)
Row = namedtuple('Row', ('ymd', 'state', 'size', 'count'))

# newline='' lets the csv module handle line endings itself (Python 3)
with open('file.csv', 'r', newline='') as f:
    r = csv.reader(f, delimiter=',')
    next(r)  # Skip header (r.next() only exists on Python 2)
    rows = [Row(*fields) for fields in r]
    # Do whatever you want with rows
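
The comment about generating Row from the header could be sketched roughly as follows. This is only one way to do it: the read_rows helper and the SAMPLE data are made up for illustration, and an in-memory string stands in for file.csv so the snippet runs on its own.

```python
import csv
import io
from collections import namedtuple

# Hypothetical sample data standing in for file.csv
SAMPLE = "ymd,state,size,count\n2016-01-01,NY,large,3\n2016-01-02,CA,small,5\n"


def read_rows(f):
    """Read CSV rows from an open file object into namedtuples named after the header."""
    r = csv.reader(f, delimiter=',')
    header = next(r)  # The first line supplies the field names
    # rename=True swaps invalid or duplicate column names for positional ones (_0, _1, ...)
    Row = namedtuple('Row', header, rename=True)
    return [Row(*fields) for fields in r]


rows = read_rows(io.StringIO(SAMPLE))
print(rows[0].state, rows[1].count)
```

With a real file, the same helper would be called inside the with block, passing f instead of the StringIO object. The trade-off is that the field names are no longer visible in the script, so a renamed column in the file only shows up as an AttributeError later on.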

The reason I like this approach is that it’s obvious what’s happening and it’s being done in a Pythonic way. There’s no traditional for loop that spans multiple lines, and it’s simple to update the comprehension to manipulate the values while handling each row. This approach also leverages the namedtuple collection, which is one of my favorite types: a class-like structure that’s significantly more memory efficient than a regular class instance but still provides easy named access to the fields (row.ymd, row.state). With this basic structure in place we can add all the bells and whistles that manipulate and tweak the rows. One thing to be aware of is that the type namedtuple generates is immutable, so you either need to manipulate the values before construction or use additional structures to transform the data.
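
As a rough sketch of manipulating values before construction, the snippet below converts the raw strings to typed values in a small helper and then builds the Row. The parse_row helper, the sample rows, and the assumed date format (%Y-%m-%d) are all illustrative guesses, not something the file format guarantees.

```python
from collections import namedtuple
from datetime import datetime

Row = namedtuple('Row', ('ymd', 'state', 'size', 'count'))


def parse_row(fields):
    """Convert raw CSV strings to typed values before building the Row."""
    ymd, state, size, count = fields
    return Row(
        ymd=datetime.strptime(ymd, '%Y-%m-%d').date(),  # assumed date format
        state=state.upper(),
        size=size,
        count=int(count),
    )


# Hypothetical raw rows, shaped like what csv.reader would yield
raw = [['2016-01-01', 'ny', 'large', '3'], ['2016-01-02', 'ca', 'small', '5']]
rows = [parse_row(fields) for fields in raw]

# Rows are immutable, so _replace returns a modified copy and leaves the original alone
doubled = rows[0]._replace(count=rows[0].count * 2)
```

In the reading script this would mean swapping Row(*fields) for parse_row(fields) in the comprehension. When a value does need to change after construction, _replace is the built-in way to get an updated copy.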

Dan Goldin
