Paul Bradshaw has come up with the following diagram to explain the process of data journalism:
The five stages are:
Compile – Gathering the data. This is the most important stage. Everything else depends on how the data set was created. Sometimes reporters are given data. Sometimes they have to extract it from a database. Sometimes they scrape it from websites or pull it from APIs. Sometimes they collect it themselves through observations, surveys, or crowdsourcing.
Clean – Removing human errors from the data and/or converting it into a consistent format.
Context – Journalists need to ask who gathered the data, when, how and for what purpose. Then they need to analyze it: is anything significant revealed in the data? Where is the story?
Combine – Single-source stories are often flat and one-dimensional. Combining one data set with another, or with additional reporting, can make the story more accurate and vivid.
Communicate – Figure out the best way to convey what is significant and interesting about the data. Then do it.
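To make the Clean and Combine stages more concrete, here is a minimal sketch in Python using only the standard library. It assumes a small, hypothetical spending export whose place names are inconsistently capitalised and padded with spaces, and whose amounts are stored as text with thousands separators. It cleans those values, then combines the totals with a second, equally hypothetical population table to produce a per-head figure. The area names, figures and function names are invented for illustration and do not come from Bradshaw's model.

```python
import csv
import io

# Hypothetical raw export (invented data): inconsistent capitalisation,
# stray whitespace, and amounts stored as text with thousands separators.
RAW_SPENDING = """area,amount
 Northtown ,"1,200"
NORTHTOWN,300
southville,"2,500"
Southville ,n/a
"""

# Hypothetical second data set (invented data) to combine with the first.
POPULATION = {"northtown": 1000, "southville": 5000}


def clean(raw_csv):
    """Normalise area names and parse amounts; skip rows that cannot be parsed."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        area = row["area"].strip().lower()
        amount_text = row["amount"].replace(",", "").strip()
        try:
            amount = float(amount_text)
        except ValueError:
            # Unparseable values (such as "n/a") are dropped here; in practice
            # they should be logged and checked against the original source.
            continue
        cleaned.append({"area": area, "amount": amount})
    return cleaned


def combine(rows, population):
    """Total spending per area and join it to population to get a per-head figure."""
    totals = {}
    for row in rows:
        totals[row["area"]] = totals.get(row["area"], 0.0) + row["amount"]
    return {
        area: {"total": total, "per_head": total / population[area]}
        for area, total in totals.items()
        if area in population  # areas missing from the second data set are left out
    }


if __name__ == "__main__":
    results = combine(clean(RAW_SPENDING), POPULATION)
    for area, figures in sorted(results.items()):
        print(f"{area}: total {figures['total']:.0f}, per head {figures['per_head']:.2f}")
```

Without cleaning, " Northtown " and "NORTHTOWN" would be counted as two different places, and the combined per-head figures would be wrong. Silently dropping unparseable rows is a simplification; in a real story each dropped row is worth questioning at the Context stage.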