Compare Prices Everywhere

This is my entry for the Tableau IronViz qualifier contest. It is all about mobile design and Tableau 10’s new Device Designer. I set out to create a simple, functional visualization that resembles a mobile app.

When I played with the app and browsed through the data, the biggest surprise for me was the cost of apartments in Luanda, Angola. It didn’t seem right for a country whose GDP is so much lower than that of the rich European, Asian or North American powers. But apparently the data checks out: the high property prices in Luanda are the result of a limited supply of the luxury housing that expats demand. The economy is booming but construction is lagging behind. There is an interesting article here, if you want to learn more.

Have fun comparing prices and salaries in almost 600 cities around the world!



Bob Dylan – analyzing 55 years of lyrics

Love him and his music or not, Bob Dylan is considered by many to be one of the greatest lyricists of all time. He has recorded 36 studio albums, written hundreds of songs, and is still touring the world, performing at dozens of concerts each year. Not bad for a 74-year-old grandpa who started his career more than 5 decades ago. He’s been recognized so many times that Wikipedia has a separate page to list all of his awards and nominations.

But what is it that draws people to his songs? Do his lyrics have a particular essence? Are they happy or sad? Does Dylan sing primarily about himself and his experiences? What words does he use most frequently? If you’ve ever wondered about any of this, my latest viz, and a submission to Tableau’s IronViz Music Viz Contest, attempts to help answer these questions. In this post I break down how I approached the challenge of finding new ways to understand Dylan through text analysis of his lyrics.

I started this project by determining where to source my primary data set – the lyrics! There are many websites out there that archive song lyrics, but I decided to rely on Dylan’s official website, presumably the most reliable source of information about the artist and his writing. Most of his songs do indeed have listed lyrics; I say most because several songs are missing lyrics and a handful of pieces are instrumental. I used a web-scraping service to scrape the album and song lists, as well as the lyrics. All you need to do this yourself is their free app, available on their website. Be sure to check out the tutorials in the Help section to get started. The program is fairly easy to use, but the video walkthroughs will likely save you time and some potential frustration.

I scraped the lyrics twice. For text analysis in Tableau I needed each line in a separate row and for sentiment analysis (more on this later) I wanted to have the text of the whole song in a single row.

Preparing data for Tableau

After training the crawler to separate each line into its own row, this is what I got:

What I eventually wanted to end up with was each word in a separate row. To achieve this, I took the data to Tableau and used the INDEX() function to number each line, Restarting Every song. I copied the resulting crosstab back to Excel and used Excel’s Text-to-Columns command to split each line into words, using space as a delimiter. I then added numbering to each column of words:


From there, I relied on Tableau’s free data reshape tool to convert my wide data into a long format. This gave me a table like the one below, with each word identified by the line number and the word’s position in the line. The only things lost in translation were the paragraph breaks between verses, but I can live with that.
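For readers who would rather skip the Excel round trip, the same wide-to-long reshaping can be sketched in a few lines of base R. This is only an illustration under assumptions, not the workflow described above: the sample lines and the column names (song, line, line_no, word_no, word) are made up for the example.

```r
# Hypothetical sample data: one row per lyric line (names are assumptions)
lyrics <- data.frame(
  song = c("Song A", "Song A", "Song B"),
  line = c("the quick brown fox", "jumps over", "a lazy dog sleeps"),
  stringsAsFactors = FALSE
)

# Number the lines within each song, similar to INDEX() restarting every song
lyrics$line_no <- ave(seq_along(lyrics$song), lyrics$song, FUN = seq_along)

# Split each line on spaces, then stack the words so each gets its own row
words   <- strsplit(lyrics$line, " ", fixed = TRUE)
n_words <- lengths(words)

long <- data.frame(
  song    = rep(lyrics$song, n_words),
  line_no = rep(lyrics$line_no, n_words),
  word_no = sequence(n_words),   # position of the word within its line
  word    = unlist(words),
  stringsAsFactors = FALSE
)

print(long)
```

Each row of `long` is one word, keyed by song, line number, and position in the line, which is the shape Tableau needs for word-level analysis.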


If you were paying attention, you may have noticed that I added 2 additional columns: pct_before (for punctuation preceding the word) and pct_after (for punctuation following the word). I’d like to credit Robert Rouse for this method of treating punctuation in data prep. He did some fantastic work with his visualization of Bible text and I got many great ideas by digging into his viz.

Separating but keeping punctuation marks is important if you want to analyze raw words while retaining the ability to properly display your text in Tableau, by concatenating raw words with the punctuation marks.
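A rough base R sketch of this idea follows. It is not Robert Rouse's method or the exact prep used for the viz; the sample tokens are invented, and only the column names pct_before and pct_after come from the table above. It peels leading and trailing punctuation off each token and checks that gluing the three pieces back together reproduces the original text.

```r
# Hypothetical tokens, already split on spaces
tokens <- c("Hello,", "world!", "plain", "(wait...)", "don't")

# Punctuation at the start of the token (empty string if there is none)
pct_before <- sub("^([[:punct:]]*).*$", "\\1", tokens)

# What is left after removing the leading punctuation
rest <- substring(tokens, nchar(pct_before) + 1)

# Punctuation at the end of the remainder (empty string if there is none)
pct_after <- sub("^.*?([[:punct:]]*)$", "\\1", rest, perl = TRUE)

# The bare word; punctuation inside a word (e.g. an apostrophe) is kept
raw_word <- substring(rest, 1, nchar(rest) - nchar(pct_after))

parts <- data.frame(pct_before, raw_word, pct_after, stringsAsFactors = FALSE)
print(parts)

# Concatenating the pieces restores the text for display
rebuilt <- paste0(pct_before, raw_word, pct_after)
```

Word counts can then be computed on `raw_word` alone, while `rebuilt` is what you would show in a tooltip or text table.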

Sentiment Analysis

Okay, so we can now analyze our lyric statistics in Tableau, but how can we assess the mood and emotion of a song? R to the rescue! As I mentioned, for sentiment analysis I needed each song as one long string (one song per row), because that is the input format the R package expects. Bora Beran writes more on his blog about running an R sentiment package from within Tableau, but for a Tableau Public viz I had to do my analysis in R and plug the results into Tableau. The R sentiment package (download it here) cross-references the words analyzed against its built-in database of over 6000 words classified as positive, negative, or neutral, compiles the results for the whole string (a song in our case), and uses fancy statistics to fit a single best descriptor to your text. The database of words is just a text file, so you can add your own words and their sentiment to adapt the package to your data set, especially if the text you are analyzing contains uncommon words or word combinations.
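To make the cross-referencing idea concrete, here is a toy sketch in base R. It is not how the sentiment package works internally (the package fits a naive Bayes classifier); it simply counts lexicon hits. The four-word lexicon and the function name score_polarity are invented for this example.

```r
# Hypothetical mini-lexicon; the real word list is far larger
lexicon <- c(love = "positive", happy = "positive",
             sad = "negative", lonely = "negative")

# Label a text by comparing counts of positive and negative lexicon words
score_polarity <- function(text) {
  cleaned <- tolower(gsub("[[:punct:][:digit:]]", "", text))
  words   <- strsplit(cleaned, "\\s+")[[1]]
  hits    <- lexicon[words[words %in% names(lexicon)]]
  pos <- sum(hits == "positive")
  neg <- sum(hits == "negative")
  if (pos > neg) "positive" else if (neg > pos) "negative" else "neutral"
}

score_polarity("Happy days, happy nights, one sad song.")
```

Adding your own words to the lexicon changes the outcome in the obvious way, which is the same reason extending the package's word file helps with unusual vocabulary.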

Below is the R code that reads in the data file (a two-column CSV with the song name and lyrics), strips punctuation marks and numbers, converts all words to lowercase, runs the classification, and writes the results to a new CSV.

# load library
library(sentiment)

# load data
data <- read.csv("words.csv")

# remove numbers
data[,2] = gsub("[[:digit:]]", "", data[,2])
# remove punctuation
data[,2] = gsub("[[:punct:]]", "", data[,2])
# convert to lowercase
data[,2] = tolower(data[,2])

# classify emotion
class_emo = classify_emotion(data, algorithm="bayes", prior=1.0)
# get best fit
emotion = class_emo[,7]
# classify polarity
class_pol = classify_polarity(data, algorithm="bayes")
# get best fit
polarity = class_pol[,4]

sentiment = cbind(data, emotion, polarity)

write.csv(sentiment, file="sentiment.csv")

The rest was Tableau fun and the result is below. I hope you enjoy it.

Maxime K - Impressive, really…

Congrats and best luck for the contest!

Hector Alvaro Rojas - Nice work has been done here!

Text mining and Sentiment analysis in one great expression.

I would like to go farther and try to get the pattern or tendency of some of his great messages. Well, this is part of the Sentiment Analysis idea. Anyway, “the answer my friend is Blowing In The Wind”.


Gregory Lewandowski - Simply put, this is fantastic!! Your attention to detail, the analysis, and the clarity of the story are unparalleled!!

Well done!!

George Gorczynski - Thank you, Gregory. I’m really glad you find the viz engaging.

Dr Andrew Meyenn - Hi,

I love your site here.

I have had similar idea, but looking at visualisation using R.

Ideas was to look at themes and then try and data mine themes – R has a themes package.

I will let you know what I come up with.

You have convinced me on Tableau – but it is charge no?

All the best Andrew Meyenn


Chicago Crime Scene

I lived in Chicago in the early 90s. I loved the city, the hustle and bustle, grandiose architecture, busy urban streets, and its bold location on the shores of Lake Michigan. I also felt quite safe there, living in the Far North Side neighborhood of Jefferson Park. I knew that there were parts of the city that should be avoided so that’s exactly what my friends and I did and we went on with our busy lives.

Well, Chicago has a problem. A very serious problem with violent crime, and especially deadly shootings. It has been dubbed by some as America’s homicide capital. Perhaps this is not entirely fair as there are smaller cities in the US that have fewer homicides but violent crime rates per capita that are much higher. St Louis and Detroit often take this distinction. However, there is no escaping the harsh news of another deadly weekend in Chicago, lives lost, squandered hopes, and people fleeing deadly neighborhoods. Only last week 11 people were shot and killed and 61 wounded on Chicago streets. It’s not difficult to understand why Chicago has recently earned the moniker, Chiraq.

The City of Chicago has a great data portal with a dataset of all reported crimes in the city since 2001. I’ve known about it for a couple of years and always wanted to do a visualization about crime in Chicago. Guess what, Tableau just announced the 10x Data Viz Contest to celebrate the ability of the upgraded Tableau Public to handle up to 10 million rows! Perfect: my Chicago dataset has almost 6 million rows and contains crime reports up until the end of May. Have a look!

Special thanks to Ben Sullins for his great tutorial on embedding a Google News feed into Tableau and for the code he graciously shared. Don’t forget to scroll down to Jim Wahl’s comment, from which I learned how to embed a Twitter feed into my viz.

Alexander Mou - Take Action tab has a link to a non-functional page.

Great Viz!

George Gorczynski - Thanks, Alexander.

Christian Collins - This is a great viz! Now lets see how I can put these data to work.

David Schuler - This is great! Do you have any sort of references to how you did the mapping portion?

George Gorczynski - Thanks for the comment, David. Do you have any specific questions about the mapping?

Bobbi Rowlett - This is very amazing visual. Wondering how much time was invested in it. I’m very much a newbie and you have opened up new ideas to me. Thank you very much for sharing.


What is morally acceptable? It depends where you live.

In 2013, the Pew Research Global Attitudes Project asked over 40,000 respondents in 40 countries about their opinions on eight topics generally considered to be moral issues:

  • Abortion
  • Divorce
  • Drinking alcohol
  • Extramarital affair
  • Gambling
  • Homosexuality
  • Sex between unmarried adults, and
  • Use of contraceptives

Respondents indicated their sentiment about each issue by choosing one of the following responses:

  • Morally unacceptable
  • Morally acceptable
  • Not a moral issue
  • Depends on the situation, or
  • Refused to answer

Unsurprisingly, there was a great deal of variation in responses across countries and topics. The data suggest that Pakistan and Ghana are the most conservative countries in their attitudes toward the eight issues covered, while France and Germany are the most liberal. Around the world, there is general agreement that using contraceptives is morally acceptable and even more consensus that extramarital affairs are morally unacceptable. In some countries, it appears that the most controversial or uncomfortable topic was homosexuality, with up to 11% of respondents in India and Pakistan refusing to even answer the question of whether it is morally acceptable.

For more interesting insights, explore the visualization below. [Confession: I like radar charts. While some may disagree, I find them morally acceptable and quite useful, although they require a bit of getting used to. I decided to show some aspects of the data using a radar chart because it effectively presents multivariate data and provides quick visual cues for commonalities and outliers. Select a couple of countries and watch the difference in shapes to draw your conclusions.]

ANDRES ALVAREZ - Very good investigation and very good showing

pgupta - Hi George, this is an awesome viz. I had a question – did you have to pre-compute the x,y coords that I see in the underlying data? I guess I can do that, based on the known slope of the line corresponding to the “point”. But just wanted to check in with you.

Country  point  Country  Question                       Value  X   Y
France   0      France   Having an extramarital affair  47     21  21.103601885
France   1      France   Sex between unmarried adults   6      0   3.81
France   2      France   Having an abortion             14     -6  6.286179285

Thank you.

George Gorczynski - Hi – Thanks for your comment and a question, pgupta. Yes, I calculated the X and Y of each point in the data source. However, these calculations could also be carried out in Tableau.
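As a rough illustration of that kind of pre-computation, each point on a radar chart can be placed with a polar-to-Cartesian conversion. The sketch below rests on assumptions: eight axes at equal angles starting from zero, values used directly as the radius, and made-up sample values. The scaling and starting angle in the actual data source may differ.

```r
# Hypothetical values for one country across eight questions
radar <- data.frame(point = 0:7,
                    value = c(47, 6, 14, 30, 22, 10, 55, 40))

n <- nrow(radar)

# Spread the axes evenly around the circle (assumed layout)
radar$angle <- 2 * pi * radar$point / n

# Convert (radius, angle) to x/y coordinates that Tableau can plot
radar$x <- radar$value * cos(radar$angle)
radar$y <- radar$value * sin(radar$angle)

print(radar)
```

Plotting x against y and connecting the points in `point` order draws the polygon for one country.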

10 Questions for Tableau Zen Master Kelly Martin - […] Gorczynski’s Morally Acceptable – I’m usually annoyed by radar charts, but this is […]

Angela - Hi may I ask how do you create the background circles to show the percentages?


The Militarization of Police in the US

At the end of August, US president Barack Obama ordered a review of federal programs that enable local and state law enforcement agencies to receive or purchase military equipment. This decision was prompted by the unsettling images of heavily armed police officers during clashes that followed the recent fatal shooting of a black teenager in Ferguson, Missouri.

Within one such federal program, the “1033 program” initiated in 1991, over $5.1 billion in military hardware has been transferred to date to 8000 local and state law enforcement agencies in the United States. Since 2006, police have received tens of thousands of pieces of surplus military equipment from the government, worth almost $2 billion. This includes 80,000 rifles, 50 fixed-wing airplanes, over 400 helicopters, 350 boats, 350 armored vehicles (among them dozens of mine-resistant vehicles), and close to 20,000 pieces of night vision and infrared gear. As of 2014, 184 police departments had been suspended from the program for failing to comply with guidelines and for missing weapons ranging from assault rifles to armored Humvees.

In May, The New York Times requested and received a database of military equipment transfers (2006-2014) from the Pentagon. I obtained this raw data from the GitHub account where the Times graciously posted it. I was inspired by the work of the NYT graphics department on this topic, but the visualization below represents my own interpretation of the data. I present the dollar value and quantities of the military gear transferred to individual states and counties. I also combined the data with population statistics to normalize the acquisition numbers and get a better idea of the size of per capita transfers. Finally, I looked at the relationship between crime statistics and the size of transfers to individual states, and I investigated crime and police homicide trends over the last decade.
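The per capita normalization mentioned above amounts to a join and a division. Here is a minimal base R sketch with invented numbers; the state labels, column names, and values are placeholders, not figures from the Pentagon data.

```r
# Hypothetical transfer totals and populations (placeholder values)
transfers <- data.frame(state = c("A", "B", "C"),
                        value_usd = c(1e6, 5e5, 2e6))
population <- data.frame(state = c("A", "B", "C"),
                         population = c(2e6, 2.5e5, 1e7))

# Join the two tables on state, then divide dollars by residents
merged <- merge(transfers, population, by = "state")
merged$usd_per_capita <- merged$value_usd / merged$population

# Rank states by transfer value per resident
merged[order(-merged$usd_per_capita), ]
```

In this made-up example state B receives the smallest total but the largest amount per resident, which is exactly the kind of reversal that raw totals hide.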

Douglas Galloway - Great Viz ! very relevant and telling of what may or may not be a really dangerous epidemic. Is the 210M is enough to support a small war.

George Gorczynski - Thank you for your comment, Douglas. Agreed, the numbers are staggering. It would be interesting to know how this equipment is being utilized (i.e. 79 helicopters, 8 airplanes and a mine resistant vehicle in Brevard County alone). In some cases the acquisitions may be justified and we all want the police force to be adequately equipped and protected. But the trend is certainly troubling.

AOK ManRay - I’m all about constantly examining policies and their effectiveness, and correcting when we see waste or no improvement or–worst of all–making things worse through unintended consequences. However, I wanted to highlight a major fallacy that you hint at here. By highlighting the areas that have high militarization and low crime, you are suggesting that this is inappropriate. But how do you know that there is no cause and effect here? I have no idea whether that’s true or not, I just feel that it’s something to be considered. If that were true, then it would be more worthwhile to highlight those who have high military spending and high crime–why has the system failed them? Or have they only more recently received equipment? Or maybe it was even higher before? I have no answers to these questions, just wanted to highlight the challenges with correlations…

Beautiful viz, btw! Very clear, concise and easy to read. Thanks for sharing!

George Gorczynski - You raised a very good point, thanks. More granular (state/county level) analysis of crime rate trends would be valuable.
