Women still underrepresented in films

It is no secret that women in films are underrepresented and underpaid compared to their male counterparts. But what exactly is the percentage of films with women in lead roles compared to films led by men? I explored movies released so far in 2017 to find out.

The most difficult part of making this visualization was getting the data. The Movie DB website has a very well documented API, but each query returns only 20 results (movies) at a time, so the only way to get thousands of results without going berserk was to automate the API access. Luckily, the good folks from the R community created a TMDb R package that hugely sped up the development process. The package has its own limitations, so the process still took more time than I expected, but I managed to extract about 7,000 films released around the world in 2017.

About half of these 7,000 films did not have credits available, making them useless for my needs. So the analysis covers not all films released so far in 2017 but only the ones that have credits in the TMDb database. Considering the large number of movies with credits available (almost 3,000) and assuming that the people contributing information to the database did not choose films to work on based on cast gender, I think the results are still valuable.
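For readers who want to see what automating a paged API looks like, here is a minimal sketch in Python (I did the real work with the R package, so this is not my code). It assumes each response carries a "results" list and a "total_pages" count; fetch_page and fake_fetch_page are hypothetical stand-ins, and no real endpoint is called.

```python
def fetch_all(fetch_page):
    """Collect results from every page of a paged API.

    `fetch_page` is a hypothetical callable that takes a 1-based page number
    and returns a dict with "results" and "total_pages" keys (an assumption).
    """
    results = []
    page = 1
    while True:
        payload = fetch_page(page)
        results.extend(payload["results"])
        if page >= payload["total_pages"]:
            break
        page += 1
    return results


def fake_fetch_page(page, total_items=45, page_size=20):
    """Stand-in for a real API call: serves 45 dummy items, 20 per page."""
    start = (page - 1) * page_size
    items = list(range(start, min(start + page_size, total_items)))
    total_pages = -(-total_items // page_size)  # ceiling division
    return {"results": items, "total_pages": total_pages}


movies = fetch_all(fake_fetch_page)
print(len(movies))  # 45
```

In practice the stand-in would be replaced by a function that performs the actual request and respects the service's rate limits.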

I found a lot of insights in this data. It comes as no surprise that men dominate starring roles in every country and in all genres. What surprised me is that women are most likely to be cast as a lead in horror movies. Fantasy, Romance and Drama come really close, and the genres where you are least likely to see a female lead are Action, Science Fiction and Western. As for differences among languages, French-language films feature female leads more often than films in any other language.

Click on image below to access the interactive viz.


Back to top|Contact me

Proximity Analysis with Voronoi Diagrams – Mapping US International Airports

What on earth is a Voronoi diagram? Put simply, it’s a graphical representation of proximity analysis that can be easily recognized by its characteristic cell-like structures. Each input point in a Voronoi diagram is surrounded by a polygon (cell) that contains all the locations closer to it than to any other input point.
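The defining property can be sketched in a few lines of Python: a location belongs to the cell of whichever input point is nearest. This is only an illustration of the idea, with made-up planar coordinates, not how FME or any GIS package builds the actual polygons.

```python
import math


def nearest_seed(location, seeds):
    """Return the index of the seed point closest to `location` (planar distance)."""
    return min(range(len(seeds)), key=lambda i: math.dist(location, seeds[i]))


# Three made-up seed points on a plane (not real airports).
seeds = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]

# Each location falls in the Voronoi cell of its nearest seed.
print(nearest_seed((1.0, 1.0), seeds))  # 0
print(nearest_seed((9.0, 1.0), seeds))  # 1
print(nearest_seed((5.0, 6.0), seeds))  # 2
```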



Mathematicians were informally using this type of diagram as early as the 17th century. However, it wasn’t until about a hundred years ago that Ukrainian mathematician Georgy Voronoi defined a solid mathematical theory supporting proximity analysis.

Voronoi diagrams can be used in many different disciplines, such as meteorology, geophysics, epidemiology, biology, mining and city planning: wherever there is a need to analyze spatially distributed data. In aviation, they are used to identify the nearest landing strip in case of diversion.

In this Tableau/FME exercise, I set out to see the distribution of international airports in the US and identify the most remote US airports, i.e. airports surrounded by the largest Voronoi polygon. The following is a breakdown of my process.


Data sources

I found a list of US international airports with passenger statistics on Wikipedia. Unfortunately, it doesn’t include coordinates. Luckily, there is an up-to-date listing of all airports in the world, including lats and longs, on the Our Airports website. Perfect: I would use it to look up coordinates for the airports listed on Wikipedia.

For the map of the US in a shapefile format, I always go to the US Census website. There you can find a comprehensive repository of national and state administrative units in the US. For this project I just needed a US outline.



Reshaping Data and Creating Voronoi Polygons

I used FME to create the Voronoi polygons and reshape my data. I’ve used FME on other projects and I love how easy it is to create reusable, automated workflows and handle geographic data in practically any way imaginable. Here’s the FME workflow I made for this project.


I’ll break it down into several pieces to highlight what FME does.

Workflow – Part 1

  1. Imports both lists of airports (US and world) in Excel format.
  2. Blends them (with FeatureMerger transformer) based on the common IATA code and tags each US airport with lat/long coordinates.
  3. Filters only US mainland international airports. For simplicity, I wanted to only map airports in the contiguous US, so I excluded airports in Alaska, Hawaii and US territories. I also excluded all airports that did not have “International” in their name.
  4. Creates point geometry (VertexCreator) based on each airport’s coordinate.
  5. Supplies these points as an input to the VoronoiDiagrammer transformer.
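Steps 2 and 3 above amount to a key-based lookup followed by a filter. The sketch below mirrors that logic in Python for readers who don't use FME; the field names ("iata", "name", "state", "lat", "lon") are my assumptions for illustration, the coordinates are rounded, and the last airport is a made-up entry.

```python
def tag_with_coordinates(us_airports, world_airports):
    """Attach lat/lon from the world list to each US airport, matched on IATA code."""
    coords_by_iata = {
        a["iata"]: (a["lat"], a["lon"]) for a in world_airports if a.get("iata")
    }
    merged = []
    for airport in us_airports:
        coords = coords_by_iata.get(airport["iata"])
        if coords is None:
            continue  # no match in the world list, so no point can be created
        merged.append({**airport, "lat": coords[0], "lon": coords[1]})
    return merged


def keep_contiguous_international(airports, excluded_states=("AK", "HI")):
    """Keep airports with "International" in the name, outside the excluded states."""
    return [
        a for a in airports
        if "International" in a["name"] and a["state"] not in excluded_states
    ]


# Illustrative rows only; coordinates are rounded and XXA is hypothetical.
us_list = [
    {"iata": "SEA", "name": "Seattle–Tacoma International Airport", "state": "WA"},
    {"iata": "ANC", "name": "Ted Stevens Anchorage International Airport", "state": "AK"},
    {"iata": "XXA", "name": "Example Municipal Airport", "state": "KS"},
]
world_list = [
    {"iata": "SEA", "lat": 47.45, "lon": -122.31},
    {"iata": "ANC", "lat": 61.17, "lon": -149.99},
    {"iata": "XXA", "lat": 38.0, "lon": -98.0},
]

selected = keep_contiguous_international(tag_with_coordinates(us_list, world_list))
print([a["iata"] for a in selected])  # ['SEA']
```

US territories would be excluded the same way, by adding their codes to excluded_states.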

Workflow – Part 2


  1. Imports the shapefile of the US boundary.
  2. Filters out states and territories outside of the 48 contiguous states.
  3. Removes the internal state boundaries (Dissolver).


After I ran the complete workflow, I noticed that I had many more Voronoi polygons than airports. This wasn’t right. As it turned out, FME created separate polygons for each little off-shore island and this required fixing. I used Deaggregator, Filter and Aggregator to clean up the geography and get a single continental US boundary polygon.


Workflow – Part 3

  1. Voronoi diagrams know no boundaries. They will extend way past your area of interest and need to be trimmed. The Clipper transformer takes Voronoi diagrams as a Clippee and the US boundary as a Clipper and creates a nicely trimmed proper map.
  2. The remainder of the workflow reprojects polygon and point data from Mercator to a conical projection (more about that in a bit), calculates the area of each Voronoi polygon, cleans up the attributes, and exports our data to shapefiles.
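The area calculation in the last step only makes sense on planar coordinates, which is one more reason to reproject before measuring. I don't know which formula FME uses internally; the sketch below uses the standard shoelace formula on a simple polygon with made-up coordinates.

```python
def polygon_area(vertices):
    """Area of a simple polygon from its (x, y) vertices, using the shoelace formula.

    Coordinates are assumed to be planar (projected), in any consistent unit.
    """
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# A 4 x 3 rectangle, so the expected area is 12.
print(polygon_area([(0, 0), (4, 0), (4, 3), (0, 3)]))  # 12.0
```

Sorting the cells by this value in descending order would surface the most remote airports.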


Creating an Alternative Map Projection in Tableau

If you have ever tried to create a map in Tableau, you know that it always defaults to a Mercator projection. This is the projection where the western part of the border between Canada and the US is a straight, horizontal line. I, however, wanted to build a map in a different (conic) projection, which is why I reprojected my US boundary shapefile in FME from Mercator to the Albers coordinate system. The problem is that, no matter your shapefile’s projection, Tableau will always display it in Mercator to conform to its background maps.

It looks like Tableau reprojects the geographic file it connects to on the fly. But not all is lost. It turns out that you can trick Tableau into thinking the map is already in the Mercator projection so that it leaves it as is. As Tableau’s background map is in Mercator, our polygons don’t overlay on it properly, but we can always wash the background out completely to mask this issue.



How do we trick Tableau into doing this? The idea comes from Tableau’s researcher Sarah Battersby, who suggested changing the projection definition in QGIS. But I tinkered with the projection file (the .prj file that is part of a shapefile bundle) and discovered that merely changing the name of the projection from Albers to Mercator does the job. Be warned, however: this is an untested hack. It worked in this situation, but I haven’t tested it on other maps or with other projections.

Making the Map Blend with the Dark Background

So we have our map in Tableau but we changed the Washout setting on the background to 100% and now the background is white. And there is no setting to change the color of the map background.



There is a solution to that too! I used Mapbox to create a custom background map. I removed all graphical elements and left only a solid background with a color set to what I wanted it to be in Tableau, and I replaced the default background map in Tableau with the Mapbox map I just created.




Putting it All Together in Tableau

The rest of the work was all Tableau, building a dashboard with a custom legend, icons, and selecting an appropriate color palette.

Hope you enjoyed this lengthy article! Please post a comment and share broadly, and feel free to email me if you have any questions.


Setting map classification with parameters

Maps are tricky, especially quantitative choropleth maps. Not because they are hard to make in Tableau; just the opposite. It takes only a few mouse clicks to make one, but is it right? It depends on the data. When you drop a continuous measure on a Filled Map, Tableau creates a choropleth map with a sequential color palette, assigning a unique shade of color to each distinct value.


When your data is normally distributed, this default setup might be just what you need…










… but it often isn’t and the color assignment requires further exploration.







Overview of map classing

Cartographers use several different methods of aggregating features into classes, all with the single purpose of making patterns in the data easier to spot. The maps above are examples of an unclassed scheme, where every mark (polygon) with a unique data value receives a unique shade of grey. Some polygons may look identically colored but, as long as the values they represent differ, so do their shades. An alternative to this approach is reducing the individual quantitative values to a smaller number of categories or classes. Think of it as binning your values, the way a histogram does.

So what are these different classification methods? I’m glad you asked! Below is a list of the most important map classifications. See Tableau workbooks further down in the text for illustration.

Equal Interval Scheme

It divides all values (between min and max) into classes of equal width. For example, percent of Latino population by US county: 5%–10%, 10%–15%, 15%–20%, etc., where the width of each class is 5%. The easiest way to create an equal interval scheme in Tableau is to switch to the Stepped Color option in the Edit Colors menu.


Pros:

  • easy to understand
  • useful as a common classification scheme for comparing multiple maps.

Cons:

  • not good for skewed data distribution.
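Outside Tableau, the equal interval idea takes only a few lines. The following Python sketch uses made-up values and hypothetical function names; it shows the arithmetic, not what Stepped Color does internally.

```python
def equal_interval_breaks(values, n_classes):
    """Return n_classes + 1 boundaries that split [min, max] into equal widths."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    return [lo + width * k for k in range(n_classes + 1)]


def classify(value, breaks):
    """Return the 0-based class of `value`; the maximum falls in the last class."""
    for k in range(1, len(breaks)):
        if value < breaks[k]:
            return k - 1
    return len(breaks) - 2


values = [0, 5, 12, 20, 50]  # made-up percentages
breaks = equal_interval_breaks(values, 5)
print(breaks)                # [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
print(classify(12, breaks))  # 1
print(classify(50, breaks))  # 4
```

Note how four of the five values land in the first three classes while the top two classes hold a single value: this is the weakness with skewed data.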

Quantile Scheme

In a quantile scheme each class contains an equal number of marks. In Tableau, this can be achieved with Percentile Quick Table Calculation.


Pros:

  • easy to identify marks at the extremes, e.g. top 20% or bottom 20%
  • intervals are usually wider at the extremes, highlighting changes in the middle values.

Cons:

  • break points may seem arbitrary and irregular.
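To see why the break points look irregular, here is a small Python sketch that computes quintile cut points for a deliberately skewed, made-up series. It relies on statistics.quantiles from the standard library, not on Tableau's table calculation, so the exact cut values may differ from what Tableau reports.

```python
import statistics
from bisect import bisect_right
from collections import Counter

# Made-up, right-skewed data: the squares of 1..100.
values = [i * i for i in range(1, 101)]

# Four cut points split the data into five classes of equal size.
cuts = statistics.quantiles(values, n=5, method="inclusive")


def quantile_class(value, cuts):
    """Return the 0-based class: the number of cut points at or below the value."""
    return bisect_right(cuts, value)


counts = Counter(quantile_class(v, cuts) for v in values)
print(cuts)                    # unevenly spaced break points
print(sorted(counts.items()))  # every class holds the same number of marks
```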

Natural Breaks (Jenks) Scheme

Natural Breaks classes are based on, yes, you guessed it, natural breaks inherent in the data. This scheme uses an algorithm that creates breaks where there are relatively big jumps in data values. In other words, with this scheme you should see minimum variation between members of each class and maximum variation in value between classes. Since the Jenks scheme uses an algorithm similar to K-means (minimizing distances within groups), a similar result can be achieved with Tableau’s clustering, which is based on the K-means algorithm.


Pros:

  • maximizes the similarity of values in each class

Cons:

  • it’s a bit “mathy” and may need some explanation of the statistical concepts used.

Custom Breaks Scheme

The name says it all: you create your own classes based on the data and what you want to emphasize. You can create these groups with a Tableau calculation.


Pros:

  • gives the mapmaker full control over the message of the visualization

Cons:

  • see Pros – mapmakers of questionable integrity can easily manipulate the message.
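As an illustration of how little machinery custom breaks need, here is a Python sketch with hand-picked break points. The thresholds and labels are hypothetical; in Tableau the same logic would live in a calculated field.

```python
from bisect import bisect_right

# Hypothetical, hand-picked thresholds (in percent) and their legend labels.
CUSTOM_BREAKS = [5, 15, 30]
LABELS = ["under 5%", "5–15%", "15–30%", "30% and over"]


def custom_class(value):
    """Map a value to the label of the hand-made class it falls into."""
    return LABELS[bisect_right(CUSTOM_BREAKS, value)]


print(custom_class(4.9))  # under 5%
print(custom_class(5))    # 5–15%
print(custom_class(42))   # 30% and over
```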

Mean and Standard Deviation Scheme

Places breaks at the mean and selected standard deviation intervals above and below the mean.


Pros:

  • provides a good idea of variance or how much the data differs from the mean

Cons:

  • requires map readers to be familiar with the basic statistical concepts of mean and standard deviation.

These classification schemes were explained and illustrated in detail in Sarah Battersby’s TC16 talk “Mapping Tips from a Cartographer”. Sarah is Tableau’s research scientist and cartography expert. Her workbook from that talk is below:


Credit: Sarah Battersby (TC16 talk)

Automating map classes with parameters

In this part of the article I will introduce another interesting map classification and show how to make exploring different classifications easy with a couple of parameters.

Geometrical Interval Scheme

This scheme needs a bit more explanation than the other schemes but is nevertheless very useful for certain applications. Breaks are based on intervals that form a geometric series. What?? Simple: each class is wider than the previous one by a constant factor, so the widths grow by an increasing amount. Still confused? Let’s look at an example. Assume that our data has a minimum value of 10, a maximum of 160, and that we want 4 classes. The factor is the 4th root of 160/10 = 16, which is 2, so the boundaries of the classes look as follows:

Class 1: 10 – 20 (10 × 2)

Class 2: 20 – 40 (20 × 2)

Class 3: 40 – 80 (40 × 2)

Class 4: 80 – 160 (80 × 2)

In general, the factor is calculated as the root of degree N of the ratio between the maximum and the minimum of the data, where N is the number of classes you choose.


Pros:

  • great for skewed distributions, emphasizes differences in dense parts of the data.

Cons:

  • uncommon.

Compare the two maps below, accompanied by histograms of the data distribution. They both show % of Population of Latino Origin by County. The top one uses 5 equal interval classes and the bottom one uses geometric interval classes. For data like this, which is highly skewed, the geometric scheme has the clear advantage of breaking apart the high-frequency values.

Setting and adjusting classes with parameters

I created this workbook to speed up exploration of different class sizes. It allows for the selection of:

  1. measure to explore
  2. classification scheme (either equal or geometric interval)
  3. desired number of classes, and
  4. number of decimal places to use in the legend.

Note that the equal interval class can be set just by dropping your measure on Color and switching to Stepped Color option in Edit Colors menu. However, this calculated alternative displays the exact class boundaries and allows for highlighting the class by clicking in the legend. Both options are quite useful.


There are plenty of different classification schemes available to color your choropleth map. Know your data, check its distribution (view the histogram) and think of the message you want to convey. Use calculations and parameters to explore different options and/or give the map viewer options to decide how they want to display it.

Additional resources and references

Sarah Battersby’s Tableau Public Profile

Mapping Tips from a Cartographer (Sarah Battersby’s TC16 talk)

Classification Systems (Slideshare deck by John Reiser)

Choropleth Maps – A Guide to Data Classification  (GIS Geography blog)

ArcGIS Data classification methods (ArcGIS Pro Online Help)

Geometric Class Formula (Useless Archaeology blog)

About the Geometrical Interval classification method (ArcGIS blog)


4 methods to import line geography into Tableau

If you’ve played with Tableau’s new spatial data connector, you might have come across an error like this:

or even this:

The first error message will pop up if you try to connect to a spatial file containing lines and the second when your spatial file contains more than one feature type. Geographic features can be polygons, lines or points. At the time of this writing, Tableau can connect to spatial files containing either polygons or points but not both in the same file. I don’t know if/when Tableau is planning to support mixed geometry spatial files but support for lines is a high priority item and is coming soon. Hear Tableau developers addressing one user’s question about line support:

But until then, we can always import line geography to Tableau the “old-fashioned way” like we used to do before Tableau 10.2. There are at least 4 ways to do this:

  1. FME’s Shapefile to TDE online converter
  2. Alteryx Tableau Shapefile to Polygon Converter written by Craig Bloodworth
  3. shapetotab utility developed years ago by Richard Leeke
  4. Buffering the lines in GIS software such as QGIS, as described by Adam Crahen

I’ve found that the easiest method is using FME’s online converter. Go to https://www.safe.com/free-tools/shapefile-to-tableau/

Just drag and drop your files into the conversion window. Shapefiles consist of several files and they are usually available for download as a ZIP archive. You don’t even have to unzip it, just drop it in. Press the Convert Now button and in a few seconds you will get a prompt to download the resulting Tableau extract. And if your archive contains multiple shapefiles, the FME converter will automatically convert all of them!

I used the converter to make a Tableau extract of streets of Vancouver, based on a shapefile downloaded from Vancouver’s Data Catalog.


After connecting to the extract you will see that FME converter created 2 custom measures: spatial_latitude and spatial_longitude and 2 custom dimensions: spatial_geometry_id and spatial_geometry_order.

  1. Drop spatial_latitude and spatial_longitude onto the canvas.
  2. Switch Marks to Line.
  3. Put spatial_geometry_id on Detail.
  4. Put spatial_geometry_order on Path.
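The four fields the converter creates are enough to rebuild every line: the id field groups vertices into separate lines and the order field sequences the vertices within each line. The Python sketch below shows that logic on a few made-up rows; it illustrates the data layout, not Tableau's rendering code.

```python
from collections import defaultdict


def build_paths(rows):
    """Group vertex rows by line id and sort each group by vertex order."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["spatial_geometry_id"]].append(row)
    return {
        gid: [
            (r["spatial_longitude"], r["spatial_latitude"])
            for r in sorted(pts, key=lambda r: r["spatial_geometry_order"])
        ]
        for gid, pts in grouped.items()
    }


# Made-up rows, deliberately out of order, describing two short lines.
rows = [
    {"spatial_geometry_id": 1, "spatial_geometry_order": 2,
     "spatial_longitude": -123.10, "spatial_latitude": 49.27},
    {"spatial_geometry_id": 2, "spatial_geometry_order": 1,
     "spatial_longitude": -123.05, "spatial_latitude": 49.25},
    {"spatial_geometry_id": 1, "spatial_geometry_order": 1,
     "spatial_longitude": -123.12, "spatial_latitude": 49.28},
    {"spatial_geometry_id": 2, "spatial_geometry_order": 2,
     "spatial_longitude": -123.04, "spatial_latitude": 49.26},
]

paths = build_paths(rows)
print(paths[1])  # [(-123.12, 49.28), (-123.1, 49.27)]
```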





A little bit of formatting and our map is ready:


How to geocode thousands of addresses and make a Tableau custom polygon + point map, with a little help from FME

I recently had an opportunity to test the new spatial data connector in Tableau. It is a highly anticipated addition to the long list of connectors and allows you to work with ESRI Shapefiles, KML, MapInfo, and GeoJSON files directly in Tableau. The connector can interpret polygon and point entities (no lines as of yet) and is a big step towards making the creation of maps in Tableau a much better experience. This long post is split into 2 parts: the first describes the preparation of data (which I did in FME) and the second covers handling multiple shapefiles in Tableau to make the map work just the way we want it to.

PART 1 – data preparation

A little background about the project. I want to plot about 30,000 nonprofit organizations on a map of Washington State and classify them by the groups they belong to. The data I got to start with contained the ID of each nonprofit, its name, along with address, including ZIP code, and a code that can be related to the desired grouping.

What I wanted to end up with was a table with all orgs along with their lats and longs and corresponding county, congressional district and legislative district. FME to the rescue!

If you haven’t heard of FME yet, it is a program developed by Safe Software. FME stands for Feature Manipulation Engine and it lets you create visual, drag-and-drop workflows to reshape or translate your data. Similar to Alteryx but focused on spatial data (and a hell of a lot more affordable than Alteryx). FME can be used to transform any type of data but it really shines when applied to tough spatial problems. Our project is not anywhere close to being a tough job for FME, so we’ll just be scratching the surface of what it can do.

I created my FME workflow to geocode all locations, assign county, legislative and congressional district value, clean it up and output to a shapefile:



The workflow has 3 main sections:


Geocoding:

  1. Reads in the spreadsheet with org names and addresses
  2. Reads in ZIP codes latitude/longitude lookup table
  3. Runs addresses through Bing geocoding transformer
  4. Passes records that Bing failed to geolocate (usually PO Box addresses) to a ZIP code lookup table
  5. Creates geometric points from lats and longs
  6. Removes unnecessary fields.
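The fallback in step 4 is the part worth spelling out: try the geocoder first and, if it returns nothing, fall back to the centre of the ZIP code. Here is a Python sketch of that decision; geocode is a hypothetical stand-in for a real geocoding call, and the addresses and coordinates are made up.

```python
def locate(record, geocode, zip_lookup):
    """Return ((lat, lon), source) for a record, or (None, "unlocated")."""
    coords = geocode(record["address"])
    if coords is not None:
        return coords, "geocoder"
    coords = zip_lookup.get(record["zip"])  # fall back to the ZIP code centre
    if coords is not None:
        return coords, "zip"
    return None, "unlocated"


def fake_geocode(address):
    """Stand-in geocoder that, like many real ones, fails on PO Box addresses."""
    if address.upper().startswith("PO BOX"):
        return None
    return (47.61, -122.33)  # made-up coordinates


zip_lookup = {"98101": (47.61, -122.34)}  # made-up lookup table

print(locate({"address": "123 Example St", "zip": "98101"}, fake_geocode, zip_lookup))
print(locate({"address": "PO Box 42", "zip": "98101"}, fake_geocode, zip_lookup))
print(locate({"address": "PO Box 7", "zip": "00000"}, fake_geocode, zip_lookup))
```

Keeping the source alongside the coordinates makes it easy to tell later which points are exact and which are only ZIP-level approximations.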

Administrative area assignment:

  1. Reads in shapefiles with boundaries of US counties, US congressional districts, and Washington state legislative districts
  2. Extracts WA state entities from US-wide shapefiles
  3. Renames “NAME” attribute in each shapefile to the proper name of an administrative area (county, etc.)
  4. Performs an overlay of geolocated points on areas to assign each point the attributes of the areas it is contained in.
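The overlay in step 4 boils down to a point-in-polygon test repeated for every point and area. Below is a minimal Python sketch using the classic ray casting test on simple polygons with made-up coordinates. Real boundary files have holes and multi-part shapes, which this sketch ignores, and I don't know which algorithm FME uses internally.

```python
def point_in_polygon(point, polygon):
    """Ray casting: count how many edges a ray going right from the point crosses."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_area(point, areas):
    """Return the name of the first area containing the point, or None."""
    for name, polygon in areas.items():
        if point_in_polygon(point, polygon):
            return name
    return None


# Two made-up square "counties" side by side.
areas = {
    "West": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "East": [(10, 0), (20, 0), (20, 10), (10, 10)],
}

print(assign_area((3, 4), areas))   # West
print(assign_area((15, 5), areas))  # East
print(assign_area((25, 5), areas))  # None
```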


Final output:

  1. Combines geolocated points and points for which locations could not be found (points without coordinates will not show on the map but we still want them to be counted)
  2. Cleans up the file by removing extraneous fields
  3. Outputs the final table of points with area assignments to a shapefile.

An additional note about geocoding options in FME.

The software comes with prebuilt access to 13 geolocation services. Some are paid but about half offer at least limited free geocoding. Below I am listing the free ones with their transaction limits:

Bing (125,000 annually)

FreeGeoIP.net (15,000 per hour)

Google (2,500 per day)

Here (15,000 per month)

IPInfo.io (1,000 per day)

Mapzen (30,000 per day)

OpenCage Data (2,500 per day)


Although it’s not implemented in my workflow, FME has Recorder and Player transformers that let you save partial results of your workflow to a file and replay them later in the same or another workflow. This is especially helpful when you geocode a lot of addresses and you don’t want to use up your quota by running the full workflow every time you make some changes and want to test them.
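The same quota-saving idea works outside FME too: store each geocoding result in a file and only call the service for addresses that are not cached yet. A minimal Python sketch follows; the cache format (a JSON file keyed by address) and the stand-in geocoder are my assumptions, not how Recorder and Player store their data.

```python
import json
import os
import tempfile


def geocode_with_cache(addresses, geocode, cache_path):
    """Geocode addresses, reusing results saved in a JSON file by earlier runs."""
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            cache = json.load(f)
    for address in addresses:
        if address not in cache:
            cache[address] = geocode(address)  # only new addresses use up quota
    with open(cache_path, "w") as f:
        json.dump(cache, f)
    return cache


calls = []


def fake_geocode(address):
    """Stand-in geocoder that records every call it receives."""
    calls.append(address)
    return [47.0, -122.0]  # made-up coordinates


cache_file = os.path.join(tempfile.mkdtemp(), "geocode_cache.json")
first = geocode_with_cache(["A St", "B Ave"], fake_geocode, cache_file)
second = geocode_with_cache(["A St", "B Ave"], fake_geocode, cache_file)
print(len(calls))  # 2: the second run was served entirely from the cache
```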





PART 2 – Tableau

The hard part is done, now let’s bring it all together in Tableau.

I joined all 4 shapefiles: on the left, the geocoded points; on the right, the boundaries of counties, congressional districts and legislative districts. The join clause, in all 3 joins, is the name of the admin area. Remember that we have admin area assignments in our point shapefile. I used full outer joins to make sure I could display the boundaries of admin areas even if there are no points within them. Conversely, if some points fall beyond the Washington state boundary, I want to know about it.
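For readers less familiar with join types, this Python sketch shows what the full outer join preserves: matched pairs, points with no matching area, and areas with no points. The records and field names are made up for illustration.

```python
def full_outer_join(points, areas, key):
    """Pair points with areas on `key`, keeping unmatched rows from both sides."""
    area_by_name = {a[key]: a for a in areas}
    matched = set()
    rows = []
    for p in points:
        area = area_by_name.get(p.get(key))
        if area is not None:
            matched.add(p[key])
        rows.append((p, area))  # area is None when the point has no match
    for name, area in area_by_name.items():
        if name not in matched:
            rows.append((None, area))  # area with no points still gets a row
    return rows


points = [
    {"org": "Example Org A", "county": "King"},
    {"org": "Example Org B", "county": None},  # fell outside every county
]
areas = [{"county": "King"}, {"county": "Ferry"}]

for point, area in full_outer_join(points, areas, "county"):
    print(point and point["org"], "|", area and area["county"])
```

An inner join would keep only the first row; a left join would drop the empty county, which would then be missing from the map.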

Connecting a list of points as a shapefile, as opposed to text or Excel, is intentional. It avoids the significant data preparation that would otherwise be needed to create a combined polygon and point map. Tableau’s Alan Eldridge wrote an excellent post about it; see the links in the resources section at the end of this post.

To give users the ability to switch between different administrative areas, we need a parameter and 2 calculated fields:


Admin boundaries based on user selection.









Level of detail based on user selection.









Let’s pause here for a moment. You probably have been wondering why we are bringing county and congressional district boundaries in; doesn’t Tableau have them built in? Yes, but we also need legislative districts, which are not built in. Tableau interprets built-in geographic fields (State, County, etc.) as strings and the Geometry field from a shapefile as type geometry. It would not let us mix the two types in the Level calculated field. Hence the need to keep everything consistent with shapefiles.




To make the map, drag the Geometry field from the nonprofit points shapefile (2015-501c3.shp) to the canvas – Tableau will draw a map of all points. Put the Names field (same data source) on Detail to separate the points into individual marks; this will allow a tooltip to display for each individual point.

Duplicate the Longitude (generated) field on Columns to create a dual axis map and put the Level and Detail calculated fields on a Detail mark.

Lastly, we need to create nonprofit groups based on the Ntee Cd field, and place the group on colour.

A few minor adjustments and we have an interactive polygon/point map with ability to switch between different administrative boundary options!



With Tableau 10.2 it is easy to create polygon/point maps, including multiple custom geographies. If needed, data preprocessing is a cinch with a tool like FME. I am just scratching the surface of what FME can do but I really got to enjoy working with it since I discovered the tool a couple of months ago. As I explore mapping in Tableau and discover FME, more blog posts are bound to come soon.

Additional Resources and References

Points and Polygons (Alan Eldridge, June 23, 2014)

Tackle your geospatial analysis with ease in Tableau 10.2 (Kent Marten, February 14, 2017)

Points and Polygons in Tableau 10.2 (Alan Eldridge, January 18, 2017)

Using GEOMETRY Fields in Calculations (Alan Eldridge, February 21, 2017)

