Mapping NYC Taxi Data

This post was inspired by HN user eck's top comment seen here. Earlier this week the New York City Taxi & Limousine Commission officially released yellow and green taxi trip record data for all of 2014 and up to June of 2015. The release contains millions of records with pick-up and drop-off dates and times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts. The data, which was previously only available through submission of a formal Freedom of Information Law (FOIL) request, is now available in CSV format as well as through Google's BigQuery…

NBA Twitter, Emojis, and Word Embeddings

A few weeks ago I read this blog post from the Instagram engineering team on machine learning and emoji trends. The post talked about general emoji usage over time on Instagram and then used word2vec, an algorithm that uses an unsupervised learning process to read through a corpus of text and learn to predict the context around a given word or emoji. The famous example of this word embedding method is vector['king'] - vector['man'] + vector['woman'] ≈ vector['queen']. When I first read this post the NBA playoffs had recently started, so I decided I would collect tweets…
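
The analogy arithmetic mentioned above can be sketched with a handful of toy vectors. This is only an illustration: the 3-dimensional embeddings below are hand-picked and hypothetical, not output from a trained word2vec model, and the `analogy` helper is a name introduced here for the example.

```python
import numpy as np

# Hypothetical, hand-picked 3-d "embeddings" (NOT trained word2vec output).
# Dimensions loosely stand for: [royalty, maleness, femaleness].
vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
    "ball":  np.array([0.0, 0.3, 0.3]),
}


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def analogy(positive, negative, vectors):
    """Return the word closest to sum(positive) - sum(negative).

    The query words themselves are excluded from the candidates.
    """
    target = sum(vectors[w] for w in positive) - sum(vectors[w] for w in negative)
    exclude = set(positive) | set(negative)
    candidates = (w for w in vectors if w not in exclude)
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# king - man + woman, ranked by cosine similarity over the toy vocabulary
result = analogy(positive=["king", "woman"], negative=["man"], vectors=vectors)
print(result)
```

With real embeddings the result is only approximately the expected word, which is why the nearest neighbour is looked up by similarity rather than tested for exact equality.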

Pandas & Burritos - Analyzing Chipotle Order Data

A few months back the New York Times ran an article titled "At Chipotle, How Many Calories Do People Really Eat?" which took a look at the number of calories in a typical Chipotle order. They found that: "The typical order at Chipotle has about 1,070 calories. That’s more than half of the calories that most adults are supposed to eat in an entire day. The recommended range for most adults is between 1,600 and 2,400." Very surprising, Chipotle isn't the healthiest place to eat. What was more interesting to me was the fact…

Exploring NBA Data with Python

After a long weekend of NBA All-Star game festivities I stumbled upon, via Twitter, Greg Reda's excellent blog post about web scraping. In it he goes over how to find and use APIs to scrape data from webpages. The example he uses is the NBA's very own stats website, which to my surprise provides a lot of very interesting data. I decided to dig a little deeper and see what I could find. The shot log API from NBA.com returns data about every shot a player took during a game. These data points include how much time was left…

Catching the bus to class with Python

For the past month I have been studying abroad in London. One of the first things I noticed when arriving here was how much better the public transportation system is than what I am used to in Philadelphia. Whether you are taking the tube or the bus, it is clean, quick, and the easiest way to get around the city. It took a few weeks to become fully acclimated to all the stops and routes, but once everything starts connecting it becomes very easy to get around. Depending on traffic my commute to class is about a fifteen-minute bus…

A quick look at the World Cup final through Instagram

The 2014 World Cup will go down as one of the best in recent history. It featured countless headlines: Germany asserting themselves as the top team in the world by humiliating Brazil 7-1 on their own turf and eventually winning it all, the end of Spanish dominance in world soccer, and Luis Suarez continuing with his animalistic tendencies. And let's not forget James Rodriguez making sure he’s a household name with this gem: I had wanted to do a project using the Instagram API for a while and thought the World Cup final would…
