Crowdsourced data

2018-03-11

    Open source has become a critical part of modern software development that allows small teams to move quickly and do in months what used to take years. This has been driven by massive platforms, such as GitHub, that make it extremely easy to find useful code, contribute back, and provide feedback, comments, and requests.

    Unfortunately, data hasn’t seen as strong an open source trend. There are a few sites, ranging from data.gov for government data to aggregators that offer assorted datasets for download, but the formats are inconsistent and some datasets even come as PDFs. There just hasn’t been a single open data standard that’s been globally adopted. Instead, we have cities offering PDF and CSV files for download and companies offering throttled APIs to their proprietary data.

    Parental leave in tech

    Things are heading in the right direction; I only wish it were happening faster. A recent trend that I’ve been a big fan of is people collaborating on a Google spreadsheet that’s designed to provide transparency on a topic. The most recent example I discovered is “parental leave in tech.” It’s a simple crowdsourced spreadsheet that lists the parental leave policies of various tech companies. If you wanted the information about one company, I’m sure you’d be able to find it on the web, but there was nothing that consolidated the information into a single document.

    While contributing to open source code generally requires some coding ability, none of that is required to add or modify a few cells of a spreadsheet. Because of that low barrier, the formatting may end up inconsistent, but someone else can fix that later. Creating a Google spreadsheet to collect data isn’t very valuable unless others are contributing, and getting others to contribute has become much easier. I discovered the parental leave spreadsheet through Twitter, which massively lowers distribution costs. If something is both valuable and easy to contribute to, it quickly amasses a ton of data.

    So far these spreadsheets have been mostly tech-focused, but inevitably they will move beyond tech and into the mainstream. I can’t wait.