new project

NYC Open Data Portal

Working Through CSV

Source Data

Oil-burning boilers are one of the largest sources of air pollution in NYC. The purpose of this project is to see how complete the dataset is and to map it in a geographic visualization. The original data is small (~22MB, 8K rows) and, appropriately enough, dirty.

Steps

  1. Get the data from the remote URL using the NYC Open Data API, and save it to a local file, rows.csv.
  2. Data cleaning: the imported data has problems with both cleanliness and completeness. First, drop the rows at the end of the dataframe whose values are all NaN. Then parse the ‘Owner Address’ column to get clean latitude and longitude values.
  3. Save the clean data, with latitude and longitude, to a new file, oil.csv.
  4. Load the cleaned CSV data into a new dataset on CartoDB for geographic visualization.
  5. Georeference the map by the cleaned latitude and longitude columns.
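
The cleaning in steps 2 and 3 could be sketched roughly as below. This is only an illustration, not the code used for the project: it uses Python's standard csv module rather than a dataframe, the function names and the sample row are made up, and it assumes the coordinates appear in the ‘Owner Address’ value as a "(lat, lon)" pair, which may not match the real file.

```python
import csv
import io
import re

# Assumption: coordinates appear in the address text as a "(lat, lon)" pair.
COORD_RE = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)")


def parse_lat_lon(address):
    """Return (lat, lon) as floats, or (None, None) if no pair is found."""
    match = COORD_RE.search(address or "")
    if match is None:
        return None, None
    return float(match.group(1)), float(match.group(2))


def clean_rows(rows, address_field="Owner Address"):
    """Drop rows whose fields are all empty and add latitude/longitude."""
    cleaned = []
    for row in rows:
        # An entirely empty row plays the role of an all-NaN dataframe row.
        if not any(str(value or "").strip() for value in row.values()):
            continue
        lat, lon = parse_lat_lon(row.get(address_field))
        cleaned.append({**row, "latitude": lat, "longitude": lon})
    return cleaned


# Made-up sample standing in for rows.csv: one usable row, one empty row.
SAMPLE = (
    "Building ID,Owner Address\n"
    '1,"123 Example St New York NY (40.75, -73.99)"\n'
    ",\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
clean = clean_rows(rows)
print(clean)
```

With a dataframe, the first part corresponds to dropping rows where every value is NaN (in pandas, `dropna(how="all")`). The cleaned rows can then be written out to oil.csv with `csv.DictWriter` before uploading to CartoDB.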