Improving Data Visualizations

Understanding the Principles of Effective and Clean Visualizations

Principles of Plotting

According to William S. Cleveland, the principles of plotting fall under two main categories: those that improve vision and those that improve understanding.

Improving Vision

  • Reduce clutter
  • Use visually prominent data elements
  • Use proper scale lines and a data rectangle
  • Be careful with reference lines, labels, notes, and keys

Improving Understanding

  • Provide explanations and draw conclusions
  • Use all available space
  • Align juxtaposed plots
  • Use log scales when appropriate
  • Bank to 45 degrees
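"Bank to 45 degrees" means choosing the plot's aspect ratio so that the line segments are drawn at roughly 45 degrees, where changes in slope are easiest to judge. The sketch below uses one common approximation, median-absolute-slope banking, which is not necessarily the exact procedure Cleveland describes. The function name `bankTo45` and the `{x, y}` point shape are assumptions made for illustration.

```javascript
// Median-absolute-slope banking: pick a height/width ratio so that the
// median segment of a line chart is drawn at about 45 degrees.
// `points` is an array of {x, y} objects ordered by x (assumed shape).
function bankTo45(points) {
  const xs = points.map(p => p.x);
  const ys = points.map(p => p.y);
  const xRange = Math.max(...xs) - Math.min(...xs);
  const yRange = Math.max(...ys) - Math.min(...ys);

  // Absolute slope of each segment, in data units.
  const slopes = [];
  for (let i = 1; i < points.length; i++) {
    const dx = points[i].x - points[i - 1].x;
    if (dx !== 0) {
      slopes.push(Math.abs((points[i].y - points[i - 1].y) / dx));
    }
  }
  if (slopes.length === 0) return null;

  slopes.sort((a, b) => a - b);
  const mid = Math.floor(slopes.length / 2);
  const median = slopes.length % 2 === 1
    ? slopes[mid]
    : (slopes[mid - 1] + slopes[mid]) / 2;
  if (median === 0) return null; // flat data: no meaningful ratio

  // On screen, a data slope s becomes s * (xRange / yRange) * (height / width).
  // Setting the median on-screen slope to 1 gives the ratio below.
  return yRange / (xRange * median);
}
```

The returned value is a height-to-width ratio, so a chart 600 pixels wide would be drawn `600 * bankTo45(points)` pixels tall.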

Choosing a Visual that Needs Improvement

While exploring Kaggle, I found a visualization built from FIFA World Cup match statistics covering 1930 to 2014. The data was scraped from the official FIFA World Cup Archive website and uploaded to Kaggle for users to explore. The user who made the visualization shown in Figure 1 also made other plots from the datasets available on Kaggle, mainly to explore the history of the games since 1930 and to see whether anything interesting stood out in the data. The line plot I chose to recreate was meant to show the trend in the number of goals scored at each World Cup. I noticed several issues that make the plot difficult to comprehend at first glance.

According to Principles 1 and 3 of improving the vision of a plot, each axis should have an appropriate number of tick marks and labels, and the scale lines should never interfere with the data. In Figure 1, the x-axis has an excessive number of ticks. Furthermore, some ticks on the y-axis are so close together that their labels overlap. This is because the user did not place the ticks at uniform intervals; instead, a tick is drawn wherever a data point is located. Although this makes it possible to read the exact value of each point in the line plot, the numbers become difficult to read wherever the tick labels overlap.
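The difference between data-positioned ticks and uniformly spaced ticks can be sketched in a few lines. The helper names `smallestGap` and `uniformTicks` are hypothetical, and the goal totals are placeholder values chosen for illustration, not the Kaggle data.

```javascript
// Smallest distance between any two neighbouring tick values.
// A small gap means the tick labels are likely to collide.
// Returns Infinity when there are fewer than two values.
function smallestGap(values) {
  const sorted = [...values].sort((a, b) => a - b);
  let gap = Infinity;
  for (let i = 1; i < sorted.length; i++) {
    gap = Math.min(gap, sorted[i] - sorted[i - 1]);
  }
  return gap;
}

// Evenly spaced ticks that cover [min, max], snapped to multiples of `step`.
function uniformTicks(min, max, step) {
  const first = Math.floor(min / step) * step;
  const last = Math.ceil(max / step) * step;
  const count = Math.round((last - first) / step);
  return Array.from({ length: count + 1 }, (_, i) => first + i * step);
}

// Placeholder goal totals (illustrative only).
const goalTotals = [70, 84, 88, 89, 140];

// One tick per data value: some ticks sit only 1 unit apart.
const dataTickGap = smallestGap(goalTotals);

// Uniform ticks: every tick is a full step from its neighbours.
const evenTicks = uniformTicks(
  Math.min(...goalTotals),
  Math.max(...goalTotals),
  20
);
```

With these placeholder values, the data-positioned ticks come as close as 1 unit apart, while the uniform ticks are always 20 units apart and there are fewer of them to label.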

Another issue with the visualization is the lack of explanation for why the user made the plot (Principle 1 of improving the understanding of a plot). I can only assume that the user graphed the goals scored throughout the history of the FIFA World Cup simply to explore the data and to visualize the rising trend over the years. However, this doesn't tell readers the entire story. It would also be helpful to know how many matches were played at each World Cup, since more matches generally mean more goals.
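One way to account for the number of matches is to plot goals per match instead of, or alongside, the raw totals. The following is a minimal sketch of that normalization. The record shape (`year`, `goals`, `matches`) and the function name are assumptions, and the two records are made-up placeholders rather than real tournament figures.

```javascript
// Convert tournament totals into a goals-per-match rate.
// Records with no matches get null rather than a division by zero.
function goalsPerMatch(tournaments) {
  return tournaments.map(t => ({
    year: t.year,
    goalsPerMatch: t.matches > 0 ? t.goals / t.matches : null,
  }));
}

// Placeholder records (not real World Cup data).
const sampleTournaments = [
  { year: 2000, goals: 60, matches: 20 },
  { year: 2004, goals: 150, matches: 60 },
];

const rates = goalsPerMatch(sampleTournaments);
```

In this made-up example, the total rises from 60 to 150 goals while the rate falls from 3 to 2.5 goals per match, which is the kind of detail a plot of totals alone cannot show.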

Figure 1. Line graph of total number of goals made in FIFA World Cups (1930-2014).

Improving the Visual Using d3.js

To improve on this visualization, I used a uniform scale for both the x-axis and the y-axis, which prevents the ticks from overlapping each other. I also reduced the number of ticks so that the tick labels no longer appear cluttered. These changes address Principles 1 and 3 of improving the vision of a plot. To keep the number of goals and the year available for each vertex of the line plot, I added circles to the graph, along with tooltips that display this information when the mouse hovers over a circle.
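A rough sketch of how such a chart could be put together with D3 follows. It is not the code behind Figure 2: the function names, dimensions, tick counts, colors, and the `{year, goals}` record shape are all assumptions, and it assumes D3 version 4 or later. The tooltip here is the browser's native one, produced by an SVG `<title>` element, which is only one of several ways to build tooltips.

```javascript
// Text shown when the mouse hovers over a data point.
function tooltipText(d) {
  return `${d.year}: ${d.goals} goals`;
}

// Draw a line chart with uniform axes, a reduced number of ticks,
// and a circle with a tooltip at every data point.
// `d3` is passed in so the sketch does not depend on how D3 is loaded;
// `container` is a CSS selector; `data` is an array of {year, goals}.
function drawGoalsChart(d3, container, data) {
  const margin = { top: 40, right: 20, bottom: 40, left: 50 };
  const width = 640 - margin.left - margin.right;   // assumed size
  const height = 400 - margin.top - margin.bottom;  // assumed size

  // Linear scales give uniformly spaced axes.
  const x = d3.scaleLinear()
    .domain(d3.extent(data, d => d.year))
    .range([0, width]);
  const y = d3.scaleLinear()
    .domain([0, d3.max(data, d => d.goals)])
    .nice()
    .range([height, 0]);

  const svg = d3.select(container).append("svg")
    .attr("width", width + margin.left + margin.right)
    .attr("height", height + margin.top + margin.bottom)
    .append("g")
    .attr("transform", `translate(${margin.left},${margin.top})`);

  // ticks(n) is a hint: D3 picks a nearby count that yields round values.
  svg.append("g")
    .attr("transform", `translate(0,${height})`)
    .call(d3.axisBottom(x).ticks(8).tickFormat(d3.format("d")));
  svg.append("g")
    .call(d3.axisLeft(y).ticks(6));

  // The line itself.
  svg.append("path")
    .datum(data)
    .attr("fill", "none")
    .attr("stroke", "steelblue")
    .attr("d", d3.line().x(d => x(d.year)).y(d => y(d.goals)));

  // One circle per data point; the <title> child provides the tooltip.
  svg.selectAll("circle")
    .data(data)
    .enter().append("circle")
    .attr("cx", d => x(d.year))
    .attr("cy", d => y(d.goals))
    .attr("r", 4)
    .attr("fill", "steelblue")
    .append("title")
    .text(tooltipText);
}
```

In a page that has loaded D3 and contains an element with the id `chart`, the function could be called as `drawGoalsChart(d3, "#chart", data)`. Moving the exact values into tooltips is what allows the axes to carry only a handful of evenly spaced ticks.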

To address Principle 1 of improving the understanding of a plot, I added a more detailed title to clarify what the graph is showing. My graph still does not take into account the number of matches played at each World Cup, but it retains the original purpose of visualizing the rising trend in goals scored.

Figure 2. Revised line graph of total number of goals made in FIFA World Cups (1930-2014).