In June 2015, the Internal Revenue Service (IRS) released a flood of information on US nonprofits by making electronically filed Form 990s, the primary disclosure document and main source of information on nonprofit organizations, available to the public. Over 1.4 million tax returns dating back to 2011 are now available on Amazon Web Services, so anyone with a computer and an Internet connection can access the data in a machine-friendly format. With hundreds of thousands of organizations already e-filing, and more every year, this dataset is growing rapidly. This new public dataset has the potential to usher in an era of radical accountability and transparency in the nonprofit sector.
But data availability does not equate to data interoperability.
The colossal task of unpacking the data, of making it truly open, is too complicated and vast for any one individual, or even any one organization, to undertake alone. The Nonprofit Open Data Collective is a collaboration of leading Form 990 players like Charity Navigator, Guidestar, Urban Institute, and Aspen Institute, as well as nonprofit scholars and independent professionals. Its members recognized that this project requires a team effort and worked together to create a standardized dataset that can be used to draw insights about the nonprofit sector writ large.
They organized the Form 990 Datathon, which expanded the work started by Charity Navigator’s Digitized Form 990 Decoder. Network collaboration with Guidestar and ASU will enable Charity Navigator to finalize the tech platform behind the digitized form, and then run another hackathon to best leverage the data. The ultimate goal is to convert 990 data into more accessible spreadsheets and post the resulting files online for anyone to use.
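To make the "filings to spreadsheets" goal concrete, here is a minimal sketch in Python of what flattening XML filings into a CSV file can look like. It is not the Collective's or Charity Navigator's actual pipeline. The sample filings, the element names (`Filer/EIN`, `TotalRevenue`, and so on), and the helper functions are hypothetical and heavily simplified; real e-filed returns follow the much larger, namespaced IRS schema.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified filings. Real e-filed returns use the IRS
# schema, which is namespaced and far larger; these element names are
# illustrative assumptions only.
SAMPLE_FILINGS = [
    """<Return>
         <Filer><EIN>000000001</EIN><Name>Example Food Bank</Name></Filer>
         <TaxYear>2014</TaxYear>
         <TotalRevenue>250000</TotalRevenue>
         <TotalExpenses>240000</TotalExpenses>
       </Return>""",
    """<Return>
         <Filer><EIN>000000002</EIN><Name>Example Arts Council</Name></Filer>
         <TaxYear>2014</TaxYear>
         <TotalRevenue>90000</TotalRevenue>
       </Return>""",
]

# Spreadsheet column -> path of the element that supplies it (all hypothetical).
FIELDS = {
    "ein": "Filer/EIN",
    "name": "Filer/Name",
    "tax_year": "TaxYear",
    "total_revenue": "TotalRevenue",
    "total_expenses": "TotalExpenses",
}


def flatten_filing(xml_text):
    """Turn one XML filing into a flat dict; missing fields become ''."""
    root = ET.fromstring(xml_text)
    return {col: (root.findtext(path) or "").strip() for col, path in FIELDS.items()}


def filings_to_csv(filings):
    """Return CSV text with a header row and one row per filing."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(FIELDS), lineterminator="\n")
    writer.writeheader()
    for xml_text in filings:
        writer.writerow(flatten_filing(xml_text))
    return buffer.getvalue()


if __name__ == "__main__":
    print(filings_to_csv(SAMPLE_FILINGS))
```

Even in this toy version, the second filing omits an expenses figure and ends up with an empty cell. Deciding how to treat gaps like that consistently across many filers is part of what a shared, standardized dataset has to settle.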
On the eve of the first Form 990 Datathon, Dr. David Borenstein, Lead Data Scientist at Charity Navigator, came to Feedback Labs with some broader questions for the group:
Who takes ownership of the data? Once Form 990 data is publicly available and digestible, we can use the data to better understand the nonprofit sector and hold nonprofit organizations accountable. Opening up this dataset can be like the Rosetta Stone, helping us to decipher the inner workings of the nonprofit sector. But it can also be like unlocking Pandora’s Box, leading to accusations and finger-pointing among nonprofits.
While the Nonprofit Open Data Collective is working to clean Form 990 inputs and form a common taxonomy around how we talk about the data, it may not be appropriate for this group to serve as advocates for nonprofit organizations, protecting them from the potential harm that this data can bring. Should the IRS take on this role? If we consider this data a public good, it may make sense for the IRS to assume the role of protector of this data. On the other hand, it is the nonprofits themselves who (through a legal mandate) provide data to the IRS. They too are important stakeholders in this discussion. But bringing US nonprofits to the table is no easy task. Should there be a consortium of nonprofits, working with the government but advocating for nonprofits? There are many possible routes to take, but DataStorm attendees agreed that, in order for this dataset to be truly open, we need to make sure we have all the right voices in the room.
What’s next? Once the 990 data is more open and available, what kinds of tools should be built to allow for higher aggregation and dissemination of knowledge? Before we think about building tools, we have a few critical next steps.
- Determine what questions we are trying to answer. A common theme in previous DataStorms is that tools built to address specific problems offer better user experiences than tools that attempt to serve multiple functions at once.
- Define the audience. Who are we expecting to use this tool? Most importantly, what are the technical capabilities of this audience? A tool targeted to data scientists and analysts is going to look different from a tool designed for nonprofit employees.
- Understand what makes this data unique. Building a tool around a unique feature of the 990 form can allow users to address questions that were previously unanswerable. One member of this DataStorm mentioned that the financial holdings are the least-mined part of the 990s, and that better access to this data can lead to a better understanding of nonprofit financials and where the money goes.
We are excited to see what’s next for David Borenstein and his team at Charity Navigator. Want to learn more? Click here to read what was accomplished at the Form 990 Datathon and check back on the Feedback Labs Blog for future collaboration. Want to stay involved? Reach out to us at [email protected] or contribute your thoughts below.
David’s mission is to make data more human-oriented. By crafting narratives from numbers, David seeks to multiply the value and utility of Charity Navigator’s hard-earned knowledge. Conversely, by automating tedious calculations, he can free up human capital for creative applications. In his previous roles, David built tools to simplify data exploration in science and government. David has taught inside New Jersey prisons and helped build homes for the vulnerable in Maine, Mississippi and Louisiana. He holds a Ph.D. in Quantitative and Computational Biology from Princeton University.
DataStorms, a stream of our popular LabStorms, are collaborative brainstorming sessions designed to inspire open dialogue around collecting, analyzing, storing, and exchanging feedback data. Facilitators relate lessons and challenges that feedback data may bring in return for thoughtful ideas, suggestions, and informal peer-review from our community.