I read Thomas Piketty’s Capital in the Twenty-First Century a couple of months ago but have only now organized my notes and thoughts. It’s a simple, enjoyable read that provides an overview of modern Western economies and offers a compelling explanation of how wealth and income inequality arise. I took a variety of economics classes in college but none of them felt as concrete as the book: Piketty does a great job introducing simple mathematical relationships and then simulating the results under different conditions. This gives the reader a feel for the data and makes the ideas much more tangible than an abstract formula would. Piketty couples this with economic data from the past two centuries to craft a persuasive argument for the causes of wealth accumulation.

Countless others have looked through the data, identified issues, and provided counterarguments, so I don’t want to get into that, but I do want to highlight how important having data is for all types of research. If we’re serious about these topics, we should strive to collect as much data as possible while making it as accessible as possible. Piketty spent numerous hours collecting and transcribing data from various paper sources and it’s amazing what came out; I can only imagine how much other valuable research would appear if more data were publicly available.

Governments should be responsible for collecting data and releasing it publicly. Many are starting to do this already, although the data still tends to be buried behind a navigational maze and hidden in esoterically formatted PDFs. Over time it should become more transparent as data formats standardize and we develop better tools to dig through the existing data.

Another issue we need to address is data correctness. On one hand, it’s great that people are going through Piketty’s data and making sure it’s valid; on the other, if that scrutiny is extremely confrontational and used to invalidate his work, it serves as a warning to others who plan on releasing their data. Why would a researcher spend thousands of hours collecting data and making it accessible, only to deal with critics who find a few issues? It’s much easier to keep the data hidden and provide only the high-level numbers, which can’t easily be challenged without doing the hard work. This perverse incentive needs to be resolved if we expect to see high-quality research produced with open data.

I’m hopeful that these larger-scope theories with potential societal impact will become more common as we move further into the 21st century. We have an increasing variety of tools for making sense of this data, and both individuals and institutions are becoming more involved in organizing the world’s data. No theory will ever be perfect or explain every case, but having more data will serve as a guide for governments to hopefully improve life for their citizens. And if data is collected along the way, it will fuel more analysis with actionable insights.

