Setting map classification with parameters

Maps are tricky, especially quantitative choropleth maps. Not because they are hard to make in Tableau. Just the opposite: it takes only a few mouse clicks to make one, but is it right? It depends on the data. When you drop a continuous measure on the Filled Map, Tableau creates a choropleth map and assigns a unique shade of color to each mark – a sequential color palette.

When your data is normally distributed, this default setup might be just what you need…

… but it often isn’t, and the color assignment requires further exploration.

Overview of map classing

Cartographers use several different methods of aggregating features into classes, all with the single purpose of making it easier to spot patterns in the data. The maps above are examples of an Unclassified Scheme, where every mark (polygon) with a unique data value receives a unique shade of grey. Some polygons may look identically colored, but as long as the values they represent differ, so do their shades. An alternative to this approach is reducing the individual quantitative values to a smaller number of categories or classes. Think of it as binning your values; here is a histogram that illustrates it:

So what are these different classification methods? I’m glad you asked! Below is a list of the most important map classifications. See Tableau workbooks further down in the text for illustration.

Equal Interval Scheme

It divides all values (between min and max) into classes of equal width. For example, percent of Latino population by US county: [5%-10%], [11%-15%], [16%-20%], etc., where the width of each class is 5%. The easiest way to create an equal interval scheme in Tableau is to switch to the Stepped Color option in the Edit Colors menu.
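
Outside of Tableau, equal-width breaks are easy to compute directly. Here is a minimal R sketch; the pct_latino vector is made-up sample data, not the real county figures:

# equal interval: split the range from min to max into classes of equal width
pct_latino <- c(2, 4, 5, 7, 9, 12, 18, 25, 40, 63)   # made-up sample values
n_classes  <- 5

breaks  <- seq(min(pct_latino), max(pct_latino), length.out = n_classes + 1)
classes <- cut(pct_latino, breaks = breaks, include.lowest = TRUE)
table(classes)   # how many marks fall into each class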

Pros

  • easy to understand
  • useful as a common classification scheme for comparing multiple maps.

Cons

  • not good for skewed data distribution.

Quantile Scheme

In a quantile scheme, each class contains an equal number of marks. In Tableau, this can be achieved with the Percentile Quick Table Calculation.
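
Numerically, quantile breaks are just sample percentiles of the measure. A minimal R sketch, again with made-up values:

# quantile scheme: each class holds (roughly) the same number of marks
pct_latino <- c(2, 4, 5, 7, 9, 12, 18, 25, 40, 63)   # made-up sample values
n_classes  <- 5

breaks  <- quantile(pct_latino, probs = seq(0, 1, length.out = n_classes + 1))
classes <- cut(pct_latino, breaks = breaks, include.lowest = TRUE)
table(classes)   # counts per class come out approximately equal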

Pros

  • easy to identify marks at the extremes, e.g. top 20% or bottom 20%
  • intervals are usually wider at the extremes, highlighting changes in the middle values.

Cons

  • break points may seem arbitrary and irregular.

Natural Breaks (Jenks) Scheme

Natural Breaks classes are based on, yes, you guessed it, natural breaks inherent in the data. This scheme uses an algorithm that creates breaks where there are relatively big jumps in data values. In other words, with this scheme you should see minimum variation between members of each class and maximum variation between classes. Since the Jenks scheme uses an algorithm similar to k-means (minimizing distances within groups), a similar result can be achieved with Tableau’s clustering, which is based on the k-means algorithm.
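
Tableau’s clustering is the in-product route; purely as a numeric illustration of the idea, a one-dimensional k-means grouping can be sketched in base R (a dedicated Jenks-style classification is also available in packages such as classInt, if you want the real thing). The sample vector is made up:

# Jenks-like classes via one-dimensional k-means (minimizes variance within classes)
set.seed(42)                                         # k-means starts from random centers
pct_latino <- c(2, 4, 5, 7, 9, 12, 18, 25, 40, 63)   # made-up sample values

km <- kmeans(pct_latino, centers = 3, nstart = 25)
split(pct_latino, km$cluster)                        # values grouped into 3 classes (cluster labels are arbitrary)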

Pros

  • maximizes the similarity of values in each class

Cons

  • it’s a bit “mathy” and may need some explanation of the statistical concepts used.

Custom Breaks Scheme

The name says it all: you create your own classes based on the data and what you want to emphasize. You can create these groups with a Tableau calculation.

Pros

  • gives the mapmaker full control over the message of the visualization

Cons

  • see Pros – mapmakers of questionable integrity can easily manipulate the message.

Mean and Standard Deviation Scheme

This scheme places breaks at the mean and at selected standard deviation intervals above and below it.
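
A quick R sketch of where such breaks fall, once more with a made-up vector:

# mean and standard deviation scheme: breaks at the mean and 1 and 2 standard deviations away
pct_latino <- c(2, 4, 5, 7, 9, 12, 18, 25, 40, 63)   # made-up sample values
m <- mean(pct_latino)
s <- sd(pct_latino)

breaks  <- c(-Inf, m - 2 * s, m - s, m, m + s, m + 2 * s, Inf)
classes <- cut(pct_latino, breaks = breaks)
table(classes)   # how far each value sits from the mean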

Pros

  • provides a good idea of variance or how much the data differs from the mean

Cons

  • requires map readers to be familiar with the basic statistical concepts of mean and standard deviation.

These classification schemes were explained and illustrated in detail in Sarah Battersby’s TC16 talk “Mapping Tips from a Cartographer”. Sarah is a research scientist and cartography expert at Tableau. Her workbook from that talk is below:

Credit: Sarah Battersby (TC16 talk)

Automating map classes with parameters

In this part of the article I will introduce another interesting map classification and show how to make exploring different classifications easy with a couple of parameters.

Geometrical Interval Scheme

This scheme needs a bit more explanation than the other schemes but is nevertheless very useful for certain applications. Breaks are based on intervals that form a geometric series. What?? Simple: each class interval is larger than the previous one by an increasing amount. Still confused? Let’s look at an example. Assume that our data has a minimum value of 10, a maximum of 160, and a base class interval of 10. The class boundaries will look as follows:

Class 1:  10 – 20 (10 + 10*1)

Class 2: 20 – 40 (20 + 10*2)

Class 3: 40 – 70 (40 + 10*3)

Class 4: 70 – 110 (70 + 10*4)

Class 5: 110 – 160 (110 + 10*5)

The base class interval is calculated as the N-th root of the range of the data, where N is the number of classes you choose.
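
To make the arithmetic concrete, here is a short R sketch that simply reproduces the worked example above (base interval of 10, five classes), where class k is base_interval * k wide:

# reproduce the worked example: class k is (base_interval * k) wide
min_val       <- 10
base_interval <- 10
n_classes     <- 5

breaks <- min_val + base_interval * cumsum(0:n_classes)   # class widths: 10, 20, 30, 40, 50
breaks                                                    # 10 20 40 70 110 160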

Pros

  • great for skewed distributions, emphasizes differences in dense parts of the data.

Cons

  • uncommon.

Compare the two maps below, accompanied by histograms of the data distribution. They both show the percentage of the population of Latino origin by county. The top one uses 5 equal interval classes and the bottom one uses geometric interval classes. For highly skewed data like this, the geometric scheme has a clear advantage: it breaks apart the values that occur most frequently.

Setting and adjusting classes with parameters

I created this workbook to speed up exploration of different class sizes. It allows for the selection of:

  1. measure to explore
  2. classification scheme (either equal or geometric interval)
  3. desired number of classes, and
  4. number of decimal places to use in the legend.

Note that equal interval classes can be set simply by dropping your measure on Color and switching to the Stepped Color option in the Edit Colors menu. However, this calculated alternative displays the exact class boundaries and lets you highlight a class by clicking in the legend. Both options are quite useful.

Summary

There are plenty of different classification schemes available to color your choropleth map. Know your data, check its distribution (view the histogram) and think of the message you want to convey. Use calculations and parameters to explore different options and/or give the map viewer options to decide how they want to display it.

Additional resources and references

Sarah Battersby’s Tableau Public Profile

Mapping Tips from a Cartographer (Sarah Battersby’s TC16 talk)

Classification Systems (Slideshare deck by John Reiser)

Choropleth Maps – A Guide to Data Classification  (GIS Geography blog)

ArcGIS Data classification methods (ArcGIS Pro Online Help)

Geometric Class Formula (Useless Archaeology blog)

About the Geometrical Interval classification method (ArcGIS blog)


4 methods to import line geography into Tableau

If you’ve played with Tableau’s new spatial data connector, you might have come across an error like this:

or even this:

The first error message will pop up if you try to connect to a spatial file containing lines and the second when your spatial file contains more than one feature type. Geographic features can be polygons, lines or points. At the time of this writing, Tableau can connect to spatial files containing either polygons or points but not both in the same file. I don’t know if/when Tableau is planning to support mixed geometry spatial files but support for lines is a high priority item and is coming soon. Hear Tableau developers addressing one user’s question about line support:

But until then, we can always import line geography to Tableau the “old-fashioned way” like we used to do before Tableau 10.2. There are at least 4 ways to do this:

  1. FME’s Shapefile to TDE online converter
  2. Alteryx Tableau Shapefile to Polygon Converter written by Craig Bloodworth
  3. shapetotab utility developed years ago by Richard Leeke
  4. Buffering the lines in a GIS software like QGIS, as described by Adam Crahen

I’ve found that the easiest method is using FME’s online converter. Go to https://www.safe.com/free-tools/shapefile-to-tableau/

Just drag and drop your files into the conversion window. Shapefiles consist of several files and they are usually available for download as a ZIP archive. You don’t even have to unzip it, just drop it in. Press the Convert Now button and in a few seconds you will get a prompt to download the resulting Tableau extract. And if your archive contains multiple shapefiles, the FME converter will automatically convert all of them!

I used the converter to make a Tableau extract of streets of Vancouver, based on a shapefile downloaded from Vancouver’s Data Catalog.

 

After connecting to the extract you will see that the FME converter created 2 custom measures, spatial_latitude and spatial_longitude, and 2 custom dimensions, spatial_geometry_id and spatial_geometry_order.

  1. Drop spatial_latitude and spatial_longitude onto the canvas.
  2. Switch Marks to Line.
  3. Put spatial_geometry_id on Detail.
  4. Put spatial_geometry_order on Path.
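
If you want to sanity-check the extract outside Tableau, the same logic can be sketched in R: order the point rows within each geometry and draw one path per geometry id. This assumes the extract has been exported to a CSV (the file name below is made up) with the four FME-generated columns named as above.

# rough sketch: rebuild line features from the ordered point rows
library(dplyr)
library(ggplot2)

streets <- read.csv("vancouver_streets.csv")   # hypothetical export of the extract

streets %>%
  arrange(spatial_geometry_id, spatial_geometry_order) %>%   # order the vertices within each line
  ggplot(aes(x = spatial_longitude,
             y = spatial_latitude,
             group = spatial_geometry_id)) +                 # one path per line feature
  geom_path(linewidth = 0.2) +
  coord_quickmap()                                           # rough lat/long aspect ratio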


A little bit of formatting and our map is ready:


How to geocode thousands of addresses and make a Tableau custom polygon + point map, with a little help from FME

I recently had an opportunity to test the new spatial data connector in Tableau. It is a highly anticipated addition to the long list of connectors and allows you to work with ESRI Shapefiles, KML, MapInfo, and GeoJSON files directly in Tableau. The connector can interpret polygon and point entities (no lines as of yet) and is a big step towards making map creation in Tableau a much better experience. This long post is split into 2 parts: the first describes the data preparation (which I did in FME), and the second covers handling multiple shapefiles in Tableau to make the map work just the way we want it to.

PART 1 – data preparation

A little background about the project: I want to plot about 30,000 nonprofit organizations on a map of Washington State and classify them by the groups they belong to. The data I started with contained the ID of each nonprofit and its name, along with its address (including ZIP code) and a code that can be related to the desired grouping.

What I wanted to end up with was a table with all orgs along with their lats and longs and corresponding county, congressional district and legislative district. FME to the rescue!

If you haven’t heard of FME yet, it is a program developed by Safe Software. FME stands for Feature Manipulation Engine and it lets you create visual, drag-and-drop workflows to reshape or translate your data. Similar to Alteryx but focused on spatial data (and a hell of a lot more affordable than Alteryx). FME can be used to transform any type of data but it really shines when applied to tough spatial problems. Our project is not anywhere close to being a tough job for FME, so we’ll just be scratching the surface of what it can do.

I created my FME workflow to geocode all locations, assign county, legislative district, and congressional district values, clean everything up, and output the result to a shapefile:


The workflow has 3 main sections:

Geolocation

  1. Reads in the spreadsheet with org names and addresses
  2. Reads in a ZIP code latitude/longitude lookup table
  3. Runs addresses through Bing geocoding transformer
  4. Passes records that Bing failed to geolocate (usually PO Box addresses) to a ZIP code lookup table
  5. Creates geometric points from lats and longs
  6. Removes unnecessary fields.

Administrative area assignment:

  1. Reads in shapefiles with boundaries of US counties, US congressional districts, and Washington state legislative districts
  2. Extracts WA state entities from the US-wide shapefiles
  3. Renames the “NAME” attribute in each shapefile to the proper name of an administrative area (county, etc.)
  4. Performs an overlay of geolocated points on areas to assign each point the attributes of the areas it is contained in (a scripted sketch of this step follows the Output list below).

Output:

  1. Combines geolocated points and points for which locations could not be found (points without coordinates will not show on the map but we still want them to be counted)
  2. Cleans up the file by removing extraneous fields
  3. Outputs the final table of points with area assignments to a shapefile.
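
FME handles the overlay with a transformer, but the same point-in-polygon step can be scripted as well. Here is a minimal sketch using the R sf package (the file names are made up for illustration) that tags each geocoded point with the attributes of the county polygon containing it:

# rough equivalent of the overlay step, using the sf package
library(sf)

points   <- st_read("nonprofits_geocoded.shp")   # geolocated organization points (hypothetical file)
counties <- st_read("wa_counties.shp")           # county boundary polygons (hypothetical file)

# both layers need to share a coordinate reference system before the overlay
counties <- st_transform(counties, st_crs(points))

# attach the attributes of the containing county to every point
points_with_county <- st_join(points, counties, join = st_within)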

An additional note about geocoding options in FME.

The software comes with prebuilt access to 13 geolocation services. Some are paid, but about half offer at least limited free geocoding. Below I list the free ones with their transaction limits:

Bing (125,000 annually)

FreeGeoIP.net (15,000 per hour)

Google (2,500 per day)

Here (15,000 per month)

IPInfo.io (1,000 per day)

Mapzen (30,000 per day)

OpenCage Data (2,500 per day)

 

Although it’s not implemented in my workflow, FME has Recorder and Player transformers that let you save partial results of your workflow to a file and replay them later in the same or another workflow. This is especially helpful when you geocode a lot of addresses and don’t want to use up your quota by running the full workflow every time you make some changes and want to test them.

PART 2 – Tableau

The hard part is done, now let’s bring it all together in Tableau.

I joined all 4 shapefiles: on the left, the geocoded points; on the right, the boundaries of counties, congressional districts, and legislative districts. The join clause, in all 3 joins, is the name of the admin area. Remember that we have admin area assignments in our point shapefile. I used full outer joins to make sure I could display the boundaries of admin areas even if there are no points within them. Conversely, if some points fell beyond the Washington state boundary, I want to know about it.

Connecting the list of points as a shapefile, as opposed to text or Excel, is intentional. It avoids the significant data preparation that would otherwise be needed to create a combined polygon and point map. Tableau’s Alan Eldridge wrote an excellent post about it; see the links in the resources section at the end of this post.

To give users the ability to switch between different administrative areas, we need a parameter and 2 calculated fields:

Admin boundaries based on user selection.

Level of detail based on user selection.

PAUSE

Let’s pause here for a moment. You have probably been wondering why we are bringing in county and congressional district boundaries; doesn’t Tableau have them built in? Yes, but we also need legislative districts, which are not built in. Tableau interprets built-in geographic fields (State, County, etc.) as strings and the Geometry field from a shapefile as type geometry. It would not let us mix the two types in the Level calculated field. Hence the need to keep everything consistent with shapefiles.

END OF PAUSE

To make the map, drag the Geometry field from the nonprofit points shapefile (2015-501c3.shp) to the canvas – Tableau will draw a map of all the points. Put the Names field (same data source) on Detail to separate the points into individual marks; this allows a tooltip to be displayed for each individual point.

Duplicate the Longitude (generated) field on Columns to create a dual-axis map, and put the Level and Detail calculated fields on Detail.

Lastly, we need to create nonprofit groups based on the Ntee Cd field and place the group on Color.

A few minor adjustments and we have an interactive polygon/point map with the ability to switch between different administrative boundary options!

Summary

With Tableau 10.2 it is easy to create polygon/point maps, including ones with multiple custom geographies. If needed, data preprocessing is a cinch with a tool like FME. I am just scratching the surface of what FME can do, but I have really enjoyed working with it since I discovered the tool a couple of months ago. As I explore mapping in Tableau and dig deeper into FME, more blog posts are bound to come soon.

Additional Resources and References

Points and Polygons (Alan Eldridge, June 23, 2014)

Tackle your geospatial analysis with ease in Tableau 10.2 (Kent Marten, February 14, 2017)

Points and Polygons in Tableau 10.2 (Alan Eldridge, January 18, 2017)

Using GEOMETRY Fields in Calculations (Alan Eldridge, February 21, 2017)

 


Compare Prices Everywhere

This is my entry for the Tableau Ironviz qualifier contest. It is all about mobile design and Tableau 10’s new Device Manager. I intended to create a simple and functional visualization resembling a mobile app.

When I played with the app and browsed through the data, the biggest surprise for me was the cost of apartments in Luanda, Angola. It didn’t seem right for a country whose GDP is so much lower than that of the rich European, Asian, or North American superpowers. But apparently the data checks out, and the high prices of properties in Luanda are the result of a limited supply of the luxury housing demanded by expats. The economy is booming but construction is lagging behind. There is an interesting article here, if you want to learn more.

Have fun comparing prices and salaries in almost 600 cities around the world!



Bob Dylan – analyzing 55 years of lyrics

Love him and his music or not, Bob Dylan is considered by many to be one of the greatest lyricists of all time. He has recorded 36 studio albums, written hundreds of songs, and is still touring the world, performing at dozens of concerts each year. Not bad for a 74-year-old grandpa who started his career more than 5 decades ago. He’s been recognized so many times that Wikipedia has a separate page to list all of his awards and nominations.

But what is it that draws people to his songs? Do his lyrics have a particular essence? Are they happy or sad? Does Dylan sing primarily about himself and his experiences? What words does he use most frequently? If you’ve ever wondered about any of this, my latest viz, and a submission to Tableau’s IronViz Music Viz Contest, attempts to help answer these questions. In this post I break down how I approached the challenge of finding new ways to understand Dylan through text analysis of his lyrics.

I started this project by first determining where to source my primary data set – the lyrics! There are many websites out there that archive song lyrics, but I decided to rely on Dylan’s official website, presumably the most reliable source of information about the artist and his writing. Most of his songs do indeed have listed lyrics; I say most because several songs are missing lyrics and a handful of pieces are instrumental. I used import.io to scrape albums and song lists, as well as lyrics. All you need to do this yourself is use their free app, available on their website. Be sure to check out the tutorials in the Help sections to get started. The program is fairly easy to use but video-walkthroughs will likely save you time and some potential frustration.

I scraped the lyrics twice. For text analysis in Tableau I needed each line in a separate row and for sentiment analysis (more on this later) I wanted to have the text of the whole song in a single row.

Preparing data for Tableau

After training import.io’s crawler to separate each line into its own row, this is what I got:
original_from_importio

What I eventually wanted to end up with was each word in a separate row. To achieve this, I took the data to Tableau and used the INDEX() function to number each line, restarting every song. I copied the resulting crosstab back to Excel and used Excel’s Text-to-Columns command to split each line into words, using a space as the delimiter. I then added numbering to each column of words:

after_splitting_lines

From there, I relied on Tableau’s free data reshape tool to convert my wide data into a long format. This gave me a table like the one below, with each word identified by the line number and the word’s position in the line. The only things lost in translation were the paragraph breaks between verses, but I can live with that.

Tableau_input

If you were paying attention, you may have noticed that I added 2 additional columns: pct_before (for punctuation preceding the word) and pct_after (for punctuation following the word). I’d like to credit Robert Rouse for this method of treating punctuation in data prep. He did some fantastic work with his visualization of Bible text and I got many great ideas by digging into his viz.

Separating but keeping punctuation marks is important if you want to analyze raw words while retaining the ability to properly display your text in Tableau, by concatenating raw words with the punctuation marks.
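
If you would rather script this step than bounce between Tableau and Excel, the same reshape can be sketched in R. The example below assumes a data frame with one lyric line per row (the song, line_no, and text column names and the sample line are made up for illustration) and produces one row per word with its position in the line plus the pct_before / pct_after punctuation columns described above.

# rough sketch: one row per word, with punctuation split off but preserved
library(dplyr)
library(tidyr)
library(stringr)

lines <- data.frame(
  song    = "Blowin' in the Wind",
  line_no = 1,
  text    = "How many roads must a man walk down?"
)

words <- lines %>%
  separate_rows(text, sep = " ") %>%                   # one raw token per row
  group_by(song, line_no) %>%
  mutate(word_no = row_number()) %>%                   # position of the word within the line
  ungroup() %>%
  mutate(
    pct_before = str_extract(text, "^[[:punct:]]+"),                   # punctuation before the word
    pct_after  = str_extract(text, "[[:punct:]]+$"),                   # punctuation after the word
    word       = str_remove_all(text, "^[[:punct:]]+|[[:punct:]]+$")   # the raw word itself
  )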

Sentiment Analysis

Okay, so we can now analyze our lyric statistics in Tableau, but how can we assess the mood and emotion of a song? R to the rescue! As I mentioned, for sentiment analysis I needed each song as one long string (one song per row); that is the input format required by the R package. Bora Beran writes more on his blog about running an R sentiment package from within Tableau, but for a Tableau Public viz I had to do my analysis in R and plug the results into Tableau. What the R sentiment package does (download it here) is cross-reference the analyzed words against its built-in database of over 6000 words classified as positive, negative, or neutral, compile results for the whole string (a song in our case), and use fancy statistics to fit the single best descriptor to your text. The database of words is just a text file, and you can add your own words and their sentiment to the database to adapt the package to your data set, especially if the text you are analyzing contains uncommon words or word combinations.

Below is the R code that reads in the data file (a two-column CSV with the song name and lyrics), strips punctuation marks and numbers, converts all text to lowercase, runs the classification, and outputs the results to a new CSV.

# load library
library(sentiment)

# load data
data <- read.csv("words.csv")

# remove numbers
data[,2] = gsub("[[:digit:]]", "", data[,2])
# remove punctuation
data[,2] = gsub("[[:punct:]]", "", data[,2])
# convert to lowercase
data[,2] = tolower(data[,2])

# classify emotion
class_emo = classify_emotion(data, algorithm="bayes", prior=1.0)
# get best fit
emotion = class_emo[,7]
# classify polarity
class_pol = classify_polarity(data, algorithm="bayes")
# get best fit
polarity = class_pol[,4]

sentiment = cbind(data, emotion, polarity)

write.csv(sentiment, file="sentiment.csv")

The rest was Tableau fun and the result is below. I hope you’ll enjoy it.
