Distrusting data: Things to remember from the BuzzFeed debunking

The article “Distrust Your Data: Jacob Harris on six ways to make mistakes with data” was the basis for the BuzzFeed article exercise. (Skim Harris if you were absent.)

Here are some key takeaways:

1. “Fear and paranoia are the best friends a data journalist can have.”

2. Proxies
A proxy is a variable that stands in for something that is impossible to measure directly. To be a good proxy, it must correlate closely with the variable of interest.
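
One rough way to sanity-check a proxy is to compute its correlation with the variable of interest on a small sample where both were measured. The sketch below does that with made-up numbers; the neighborhoods, the variable names, and the two candidate proxies are all hypothetical and only there for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up data for eight hypothetical neighborhoods.
income = [31, 38, 45, 52, 60, 71, 84, 95]                # variable of interest (thousands of dollars)
home_value = [110, 150, 160, 210, 240, 300, 330, 410]    # candidate proxy 1 (thousands of dollars)
coffee_shops = [4, 1, 6, 2, 5, 1, 3, 4]                  # candidate proxy 2 (count)

r_home = pearson_r(income, home_value)
r_coffee = pearson_r(income, coffee_shops)

print(f"income vs. home value:   r = {r_home:.2f}")
print(f"income vs. coffee shops: r = {r_coffee:.2f}")
```

In this invented sample, home value tracks income closely and the coffee shop count barely tracks it at all, so only the first would be a plausible stand-in. A high correlation in one sample is a starting point, not proof that the proxy holds everywhere.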

3. Sample size
A sample is a subset of a larger population. Larger samples generally give more precise estimates, but only if the sample is representative; a big biased sample is still biased.
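
To make the link between sample size and precision concrete, here is a minimal sketch of the textbook margin-of-error formula for a proportion. It assumes a simple random sample, a 95% confidence level (z = 1.96), and the normal approximation; the function name and the sample sizes are chosen just for this example.

```python
from math import sqrt

def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a proportion p estimated from a
    simple random sample of size n (normal approximation)."""
    return z * sqrt(p * (1 - p) / n)

# p = 0.5 is the worst case: it gives the widest margin for a given n.
for n in (100, 400, 1600):
    moe = margin_of_error(0.5, n)
    print(f"n = {n:>4}: about +/- {moe * 100:.1f} percentage points")
```

Because the sample size sits under a square root, quadrupling the sample only halves the margin of error. The formula also says nothing about bias: it does not help if the people sampled differ systematically from the population.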

4. Correlation does not equal causation
Just because two variables appear to be related to each other does not necessarily mean that one causes the other.
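
A small simulation can show how a third variable produces a correlation between two things that do not affect each other. In this sketch, temperature drives both ice cream sales and the number of swimmers; all of the numbers, coefficients, and variable names are invented for illustration.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

random.seed(0)  # fixed seed so the sketch is repeatable

temps, ice_cream, swimmers = [], [], []
for _ in range(20_000):
    t = random.uniform(10, 35)                       # daily temperature
    temps.append(t)
    ice_cream.append(20 * t + random.gauss(0, 50))   # depends only on temperature
    swimmers.append(15 * t + random.gauss(0, 50))    # depends only on temperature

# Across all days the two series move together.
r_all = pearson_r(ice_cream, swimmers)

# Hold temperature nearly constant by keeping only days in a one-degree band.
band = [(i, s) for t, i, s in zip(temps, ice_cream, swimmers) if 24 <= t < 25]
r_band = pearson_r([i for i, _ in band], [s for _, s in band])

print(f"all days:          r = {r_all:.2f}")
print(f"days at 24 to 25:  r = {r_band:.2f}")
```

By construction, neither series influences the other, yet they correlate strongly across all days. Once temperature is held roughly fixed, the relationship largely disappears, which is the signature of a confounding variable.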

5. Ecological inference fallacy
Beware of drawing conclusions about individuals based on aggregate data about the group they belong to. A pattern that holds across groups does not have to hold for the people inside them.
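
One common form of this fallacy is reading a group-level pattern as if it described individuals. The sketch below uses made-up counts for three hypothetical precincts and a hypothetical candidate X to show the two views pointing in opposite directions.

```python
# Made-up counts for illustration only. Each precinct maps to:
# (graduates, graduates voting X, non-graduates, non-graduates voting X)
precincts = {
    "Precinct 1": (20, 0, 80, 32),
    "Precinct 2": (50, 20, 50, 35),
    "Precinct 3": (80, 56, 20, 20),
}

# Aggregate view: one row per precinct.
supports = []
for name, (grads, grads_x, non, non_x) in precincts.items():
    grad_share = grads / (grads + non)
    support = (grads_x + non_x) / (grads + non)
    supports.append(support)
    print(f"{name}: {grad_share:.0%} graduates, {support:.0%} support for X")

# Individual view: pool the voters and compare graduates with non-graduates.
total_grads = sum(p[0] for p in precincts.values())
total_grads_x = sum(p[1] for p in precincts.values())
total_non = sum(p[2] for p in precincts.values())
total_non_x = sum(p[3] for p in precincts.values())

grad_rate = total_grads_x / total_grads
non_rate = total_non_x / total_non
print(f"graduates voting X:     {grad_rate:.0%}")
print(f"non-graduates voting X: {non_rate:.0%}")
```

In these invented numbers, precincts with more graduates show more support for X, yet graduates themselves are less likely to vote for X than non-graduates. Precinct-level data alone could not have revealed that.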

6. Question outliers
An outlier – a number or point that is distant from others – can be something interesting or it can be an error in the methodology or data.
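
A simple way to surface candidates for a closer look is the 1.5 × IQR rule of thumb, sketched below with Python's standard `statistics` module. The salary figures are made up, and the rule is only one convention among several; it flags points to question, it does not decide whether they are errors.

```python
from statistics import quantiles

# Made-up salaries in thousands of dollars. The last value may be a real
# high earner or a misplaced decimal point (52.0 entered as 520).
salaries = [42, 45, 47, 48, 50, 51, 53, 55, 58, 520]

# Quartiles of the data; the interquartile range (IQR) is Q3 minus Q1.
q1, _median, q3 = quantiles(salaries, n=4)
iqr = q3 - q1

# Rule of thumb: flag anything more than 1.5 * IQR outside the quartiles.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [s for s in salaries if s < low_fence or s > high_fence]

print(f"fences: {low_fence:.1f} to {high_fence:.1f}")
print(f"values to question: {outliers}")
```

Whatever gets flagged still needs reporting: check the original record, ask the source, and only then decide whether it is a story or a typo.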

7. Consider the source of the data

8. What is the methodology?

9. Does the data smell?

10. You can do it better
