I’ve been playing with Python to learn how to make maps using Matplotlib. As it turns out, it’s not all that complicated. Using Reddit’s API, I scraped the top 300 posts from the largest city subreddits in the United States and Canada. After some data wrangling, I ended up with a Pandas dataframe holding each city’s coordinates along with the titles of the top 300 posts from that city’s subreddit. Finally, I wrote a program that searches through those titles with a regular expression and plots the results on a map. The program also attaches labels to the top cities.
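To make the search step concrete, here is a minimal sketch of how post titles could be matched against a regular expression, city by city. It is not my actual script: the sample titles, the `match_rate` helper and the pattern are made up for illustration, and it uses plain dictionaries rather than a Pandas dataframe to keep the example short.

```python
import re

# Hypothetical sample data: city -> list of post titles (not the real scrape).
titles_by_city = {
    "Atlanta": ["Y'all ready for the game?", "Traffic on I-85 again", "Best BBQ in town"],
    "Boston": ["Snow emergency declared", "Traffic on the Pike", "Where to get chowder"],
}


def match_rate(titles, pattern):
    """Return the fraction of titles that contain at least one regex match."""
    if not titles:
        return 0.0
    regex = re.compile(pattern, re.IGNORECASE)
    hits = sum(1 for title in titles if regex.search(title))
    return hits / len(titles)


# Assumed pattern for "y'all" (also accepts "yall"); case-insensitive.
rates = {
    city: match_rate(titles, r"\by'?all\b")
    for city, titles in titles_by_city.items()
}
```

With a dataframe, the same idea would probably be a `str.contains` call on the title column followed by a group-by on city, but the counting logic is the same.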
I wrote more about this, with some different examples, for Exploring Geography on the Goodlands blog. I’ll link to that as soon as it’s up online.
The size of each circle represents the relative frequency of the word’s usage in that particular city. So equal-size circles for different words mean that, compared to the average, a city uses those two terms at about the same rate. Here are a few fun examples of the maps I was able to produce.
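As a rough illustration of the circle sizing, the sketch below divides each city’s match rate by the average rate across all cities and scales that ratio into a marker area. The rates, the function names and the `base_area` value are assumptions for the example, and dividing by the plain mean is just one reasonable way to define “relative to the average”.

```python
def relative_frequency(rates):
    """Divide each city's rate by the mean rate across all cities."""
    mean_rate = sum(rates.values()) / len(rates)
    if mean_rate == 0:
        # No city matched the pattern at all, so nothing stands out.
        return {city: 0.0 for city in rates}
    return {city: rate / mean_rate for city, rate in rates.items()}


def marker_areas(relative, base_area=100.0):
    """Scale ratios so a city at exactly the average rate gets base_area."""
    return {city: base_area * value for city, value in relative.items()}


# Hypothetical match rates (fraction of titles containing the keyword).
rates = {"Atlanta": 0.06, "Boston": 0.01, "Dallas": 0.05}

relative = relative_frequency(rates)
areas = marker_areas(relative)
```

The resulting areas could then be passed as the `s` argument of Matplotlib’s `scatter`, which treats `s` as the marker area in points squared. Because each keyword is normalized by its own average, circles for different keywords are comparable even when one word is far more common overall.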
This is sort of a “control” to make sure that we can actually draw some sort of conclusions from this Reddit data. For y’all, it looks like the data matches up very well with established usage patterns. So it works (at least in this instance)! I’m not sure why Albany, NY has such a high rate for a Northern city.
This is another test, and it also passes, matching up with Tornado Alley. The three overlapping dots in north Texas are Dallas, Fort Worth, and the metroplex suburb of Denton, which has a surprisingly large Reddit community.
Huntsville, Alabama stands out here. The Marshall Space Flight Center and the Army’s Aviation and Missile Command are why this town is called Rocket City, USA. Reddit probably exaggerates this effect, since it has a well-known bias towards the engineering professions among active users. I’m surprised that Houston isn’t larger on this map. Its smaller size could simply be because Houston is a bigger city, with more to post about besides NASA.
This query was a mix of no-brainers and some surprises. Los Angeles, Houston and Washington, DC are the top three – no surprise here. LA and Houston are sprawl-afflicted, car-centric cities. Around Washington DC, steady suburban growth since 2000, including all the way through the recession, has clogged up the Beltway and other older infrastructure. I’ve never been to Austin, but my guess is that, as America’s fastest-growing big city, it is also having its fair share of growing pains.
So what about Norfolk and Nashville? Some quick googling fails to produce an explanation. Neither Nashville nor Norfolk has particularly long commutes, nor are they known for congestion. Google Trends shows that the Hampton Roads area leads Virginia in “traffic” searches, which surprises me as a NoVA native, considering our legendary bottlenecks. Nothing special for Nashville, though. Perhaps it is just noise; if you have an explanation, please let me know.
We are all familiar with how Trump played the media against itself, and Reddit is no exception. Trump was far more talked about than Obama or Hillary Clinton in most cities. Interestingly, it was the cities that hated Trump the most that gave him the most publicity. Oakland, California, where Trump got 4.62% of the vote, leads the nation, followed by Chicago, Portland (Oregon) and Philadelphia.
The region where Obama mentions were the most above-average would not have been my first guess – a belt stretching from Michigan through Cincinnati on to Nashville, Memphis and Birmingham. These areas were crucial to Trump’s victory, and it wouldn’t surprise me if Obama was portrayed there in a negative light. Honolulu, Obama’s birthplace, leads unsurprisingly.
This one is a little less straightforward. I was running some algorithms to find interesting word pairings, and this came up as highly symmetric: few cities scored high on both “theft” words and “family”, Bloomington, Indiana being an exception. It seems like the West Coast, Southern Ontario and Tornado Alley have a theft problem! Waterloo stands out as a college town.
These are just a few examples; let me know if you have any ideas for keywords to run. If I come across anything else that’s interesting, I’ll post it.