Tidy Data

Improving Messy Data


Much real-world data isn’t tidy, largely because most scientists are never taught how to structure their data so that it is easy to analyze.

Download a messy version of some of the Portal Project data. Note that there are multiple tabs in this spreadsheet.

Think about what could be improved about this data. In a text file (to be turned in as part of the assignment), answer the following:

  1. Describe five things about this data that are not tidy and how you could fix each of those issues.

  2. Could this data easily be imported into a programming language or a database in its current form?

  3. Do you think it’s a good idea to enter the data like this and clean it up later, or to have a data structure suited to analysis already in place when the data is entered? Why?
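
As a point of reference for what "tidy" can mean in practice, here is a minimal sketch of one common fix: reshaping a table that stores values (years) as column headers into one observation per row. The table contents, the column names, and the helper `wide_to_long` are all made up for illustration. They are not taken from the Portal spreadsheet, and this is only one of many possible problems you might find there.

```python
import csv
import io

# Hypothetical messy layout (NOT the Portal spreadsheet): one column per year.
wide_csv = """species,2013,2014
DM,12,15
PP,7,
"""


def wide_to_long(text, id_column, variable_name, value_name):
    """Turn every non-id column into its own row (one observation per row)."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        for column, value in record.items():
            if column == id_column:
                continue
            rows.append({
                id_column: record[id_column],
                variable_name: column,
                # Keep a missing measurement as an explicit None, not a guess.
                value_name: value if value != "" else None,
            })
    return rows


tidy_rows = wide_to_long(wide_csv, "species", "year", "count")
for row in tidy_rows:
    print(row)
```

In the reshaped rows, each variable (species, year, count) has its own column and each row holds a single observation, which is the structure most programming languages and databases expect on import.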