Over the past month I have had the opportunity to sit down and work on my capstone project uninterrupted. The goal as of last semester was to finish the data cleaning and gathering portion of the project and then dive deep into validating my results. Two weeks into the break, I finally finished gathering the data from approximately 200 files and organizing it into a single file using Python code alone. My previous attempt with Microsoft Excel failed because I was trying to run too many formulas on one spreadsheet. This second attempt instead modified the code I had taken so that it concatenated the files and added the year to each record dynamically. As a result, I now have one CSV file that contains all crimes from 2009 to 2017.

On top of organizing the data, I was also able to simplify the 600 columns of information into approximately 80. Many crimes, such as liquor offenses, were classified not only by offense but also by location and type: on campus, off campus, public property, or hate crime. I made the executive decision to combine these classifications to make the data more human-readable.

This combination does assume that if a college did not report any statistics, it had no crimes to report. There is logic behind this decision. From any given report, we pulled data only for that report's year. (A 2009 report includes 2007 and 2008 statistics, but we took only 2009.) I assumed that a college receiving federal funding would have to report statistics for that year. This was not always the case; in fact, many colleges simply had nothing for the year of the report. I do not know why this data is missing, but I may be able to find out more by contacting the Clery Center, which could provide an explanation. Until I receive a response, I am stuck making this assumption.
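
The two steps described above, tagging each file's rows with its year while concatenating and then collapsing the location and type columns into one column per offense, can be sketched roughly as follows. This is an illustration using only the Python standard library, not the exact script used for the project. The file-name pattern (a four-digit year somewhere in the name), the column names such as `institution` and `liquor_on_campus`, and the function names are all assumptions made for the example.

```python
import csv
import re
from pathlib import Path

# Assumption: each file name contains its report year, e.g. "crime_2009.csv".
# The real files may be named differently.
YEAR_PATTERN = re.compile(r"(20\d{2})")


def year_from_name(path):
    """Pull the report year out of a file name such as 'crime_2009.csv'."""
    match = YEAR_PATTERN.search(Path(path).stem)
    if match is None:
        raise ValueError(f"no year found in file name: {path}")
    return int(match.group(1))


def combine_files(paths):
    """Read every CSV and tag each row with the year taken from its file name."""
    rows = []
    for path in sorted(paths):
        year = year_from_name(path)
        with open(path, newline="") as handle:
            for row in csv.DictReader(handle):
                row["year"] = year
                rows.append(row)
    return rows


def collapse_categories(row, groups):
    """Sum the location/type columns into a single column per offense.

    Blank or missing cells are counted as 0. That encodes the assumption
    discussed above: no reported statistic is treated as no crime.
    """
    combined = {"institution": row["institution"], "year": row["year"]}
    for offense, columns in groups.items():
        combined[offense] = sum(int(row.get(col) or 0) for col in columns)
    return combined
```

A call might look like `rows = combine_files(Path("data").glob("*.csv"))`, followed by `collapse_categories(row, {"liquor": ["liquor_on_campus", "liquor_off_campus"]})` for each row, where the `data` folder and the column list are again hypothetical. Because the zero-fill happens in one place, it would be straightforward to swap it for an explicit "missing" marker if the Clery Center explains the gaps.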