DataViz Makeover 2 - COVID Vaccine Survey

Visualisation makeover for Likert Scale data and uncertainty.

Published

Feb. 11, 2021

Citation

Djojosaputro, 2021

1. Introduction

The COVID-19 pandemic has impacted the world in unimaginable ways: working from home and Zoom meetings are becoming the norm; overseas travel has almost come to a complete halt; we might even start to forget that confirmed COVID cases and deaths are human beings, and not just numbers in a statistic. For the pandemic to be over, we need to achieve herd immunity, either through recovery from infection or through vaccination. However, despite being eager to see the end of this calamity, people might also have worries and beliefs that make them hesitant to take the vaccine.

Imperial College London Big Data Analytical Unit and YouGov conducted a survey that measures people’s behaviours in response to COVID-19. This data visualisation makeover will focus on the willingness of people in various countries to take the COVID vaccine.

2. Original Visualisation Evaluation

To start, we examine the original visualisation from the data (Figure 1) to learn what we can improve. The critiques are given in terms of clarity and aesthetics.

Figure 1. Original Visualisation

2.1 Clarity

To keep:

  1. The countries in both graphs are shown in a sorted order. The left graph is in alphabetical order, while the right graph is in descending order.
  2. The usage of colour is consistent between the two graphs. Both use blue for strongly agree.
  3. The axes and gridlines help the users to compare the values.

To be improved:

  1. Although the Likert Scale data is ordinal, the choice of colour does not show an inherent order. It would be better to use a diverging colour scale, because the data has a meaningful central value, which is the neutral opinion.
  2. Because the survey is conducted on a sample of the population, the actual proportion in the population might not be exactly the same. Not visualising the uncertainty can mislead the users, and thus we need to show the range of possible values.
  3. It is hard to compare the actual proportion of people who picked 2, 3, or 4 in the Likert Scale because they do not have a common baseline.
  4. The order of the countries is inconsistent between the two graphs.
  5. The title of the legend is not informative.

2.2 Aesthetics

To keep:

  1. The chart has a nice font selection that is easy to read and not unnecessarily embellished.
  2. The number of tick marks is just enough to allow comparison without cluttering the visualisation.
  3. The two graphs are properly aligned.
  4. Labels have less colour intensity so they do not distract the users.

To be improved:

  1. The choice of colours is too reliant on hue variation instead of value or chroma, hence increasing the visual clutter. It is better to limit the colour palette to 2 or 3 hues and use variation of colour intensity to make the visualisation more aesthetically pleasing and functional.
  2. Country names are not formatted properly. There is no capitalisation and there are dashes in the names.
  3. Decimal points are inconsistent in the axes. The left graph has no decimal places but the right one has 1 decimal place.
  4. Labels of the colour legend are inconsistent. Value 1 and 5 have a text explanation, while 2, 3, and 4 are just numbers.

3. Alternative Graphical Representation

Figure 2 shows the alternative graphical representation proposed for the makeover.

Figure 2. Alternative Graphical Representation

The survey data uses a 5-point Likert scale for the respondents to rate the statements. There are multiple ways to visualise Likert scale data, such as a 100% stacked bar chart as in the original visualisation, multiple pie charts, a diverging stacked bar chart, and so on.

A diverging stacked bar chart helps to compare the positive and negative sentiment more clearly, but there are multiple views on how to categorise the neutral opinion. Using the middle of the neutral proportion as the centre line makes it hard to compare the values because there is no common baseline. Therefore, the first part of the alternative design is a diverging stacked bar chart in which we split the neutral opinion and put each half at the outermost part of the stacked bar.

The second part of the alternative design is a dot plot with error bars that indicate the confidence interval of the proportion. This shows the underlying variation in the data and avoids misleading the users.

The issues from the original visualisation that the alternative design tries to overcome are colour-coded in Figure 2, with orange numbers corresponding to critiques of clarity and blue numbers to critiques of aesthetics.

Clarity:

  1. To show an order in the survey response using a diverging colour scale.
  2. To show the uncertainty using an error bar in the dot plot.
  3. To allow the users to toggle between different responses of interest. The users may choose to view individual percentages and error bars for ‘Strongly Agree,’ ‘Agree,’ ‘Neutral,’ ‘Disagree,’ and ‘Strongly Disagree.’
  4. To sort the order of countries in both graphs consistently in descending manner according to the selected response of interest in the right graph.

Aesthetics:

  1. To limit the colour palette and utilise a variation of colour intensity on top of differing hue.
  2. To use proper capitalisation and formatting for the country name labels.
  3. To standardise the axes labels to show no decimal points.
  4. To label the colour legends consistently using a textual explanation.

Additional Features:

  1. To allow the users to visualise different survey items. The users can see the response for questions such as whether the respondents are afraid of COVID vaccine side effects, not just restricted to whether or not they are willing to take the vaccine.
  2. To use animation to show the transitions between different selected parameters, so the users can easily notice if the ranks of the countries change.

The final look of the data visualisation makeover is shown in Figure 3. It is also available on Tableau Public.

Figure 3. Data Visualisation Makeover 2 Final Look

4. Step-by-Step Description

In this section, we are going through the steps to recreate the Data Visualisation Makeover shown in Figure 3 using Tableau. Tableau Desktop has a 14-day trial that can be downloaded here.

4.1. Data Preparation

Data Source

The visualisation is based on the publicly available Imperial College London YouGov Covid 19 Behaviour Tracker Data Hub, which aims to provide insights on how people respond to COVID-19. They also publish a dashboard to visualise the data.

Data Cleaning

As the survey was done on a large scale and measures a wide variety of behavioural responses, there are a lot of columns that we do not need. Reducing the size of the dataset is necessary to speed up Tableau, and to ensure we do not drown in the data ;)

Figure 4. Connect to Data Source
Figure 5. Remove Table

Next, we need to join all the files using the Union function from Tableau. Ensure the workspace has no tables to prevent errors when we join the csv files.

Figure 6. Add New Union
Figure 7. Add All csv Files

We can either drag all the tables listed in the Files column on the left or use the Wildcard tab.

Figure 8. Hide Unused Columns

Brace yourself, depending on your machine, the data cleaning part can take a long time for Tableau to process. DO NOT CLICK UPDATE NOW or AUTOMATICALLY UPDATE. Unless you want to stare at your Tableau and wait.

To reduce the dataset, we need to hide all the columns we are not using and export the smaller set. Keep in mind that we can still remove more columns after exporting, but we need to go through this process again if we want to add any back.

We are interested in vac_1, vac2_1, vac2_2, vac2_3, vac2_6, and vac_3 survey items, as well as gender, age, household_size, household_children and employment_status contextual data.

Figure 9. Go To Worksheet

When we combine the files in a union, Tableau will add a column called Path or Table, depending on whether we used Wildcard or Specific method to add the files. We can rename and format this field, but it would be easier to do it later after we export the data subset.
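For readers who prefer to see the same preparation outside Tableau, here is a minimal sketch of the union and column reduction using only the Python standard library. The function name `union_csv`, the in-memory demo files and their values are made up for illustration; only the column names come from the list above, and the `Table` column imitates the field Tableau adds to a union.

```python
import csv
import io

# Columns we want to keep, as listed in the text above.
KEEP = ["vac_1", "vac2_1", "vac2_2", "vac2_3", "vac2_6", "vac_3",
        "gender", "age", "household_size", "household_children",
        "employment_status"]


def union_csv(sources, keep=KEEP):
    """Stack rows from several CSV sources and keep only selected columns.

    `sources` maps a table name (for example a file name) to an open text
    stream. A "Table" column records where each row came from, similar to
    the field Tableau adds when files are combined in a union.
    """
    rows = []
    for name, stream in sources.items():
        for record in csv.DictReader(stream):
            # A column that is missing from one file becomes None.
            row = {col: record.get(col) for col in keep}
            row["Table"] = name
            rows.append(row)
    return rows


# Tiny in-memory stand-ins for two country files (made-up values).
demo = {
    "australia.csv": io.StringIO(
        "vac_1,gender,extra\n1 - Strongly agree,Male,x\n"),
    "japan.csv": io.StringIO("vac_1,gender\n3,Female\n"),
}
combined = union_csv(demo)
```

With real files, each stream would come from `open(path, newline="")` instead of `io.StringIO`.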

Figure 10. View Data
Figure 11. Export Full Data

Now, we have a smaller dataset containing only the columns that we are interested in. Let’s process the file in a new workbook.

Figure 12. Edit Aliases

To improve the country name appearance, we are going to recode it.

The legend in the original visualisation is inconsistent because the survey item responses are stored as strings, and only the values 1 and 5 have a descriptor. We are going to standardise the values as integers, so we need to get rid of the textual description by using a custom split.

Figure 13. Custom Split

Each column has either an ‘Abc’ or a ‘#’ symbol. ‘Abc’ means the column is a string, while ‘#’ means it is numerical. If the symbol starts with an equal sign, such as ‘=Abc’ or ‘=#,’ it means the column is a calculated field. Ensure that the survey items are in a string format before doing the split. If they are not, click on the data type and change it to String.

The value for 1 is ‘1 - Strongly Agree.’ Notice that the numerical value is the first part of the string, before the first space. Therefore, we will split off the first column using a space as the separator.

Figure 14. Describe Field

We can use Describe Field to check whether the split was done correctly.

We have the values from 1 to 5 now in the column, so we can convert the data type to number.
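The split-then-convert step can be summarised in a few lines of Python. This is only an illustrative sketch of the same logic, not something Tableau runs; the function name `likert_to_int` is hypothetical.

```python
def likert_to_int(raw):
    """Return the leading integer of a response such as '1 - Strongly Agree'.

    Missing or non-numeric responses are returned as None so that they can
    be filtered out later, like the Null values in Tableau.
    """
    if raw is None or not raw.strip():
        return None
    # The number is the first part of the string, before the first space.
    first_token = raw.strip().split(" ")[0]
    return int(first_token) if first_token.isdigit() else None
```

For example, `likert_to_int("1 - Strongly Agree")` gives `1`, and a plain `"3"` gives `3`.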

Figure 15. Change Data Type

We can also rename the column and hide the original column to make it cleaner. I added a suffix ‘- ori’ to the original columns and used the original column names for the calculated columns.

Repeat the steps from Figure 13 for all survey items, and we are ready to create the visualisation.

4.2. Diverging Stacked Bar Chart

Creating Visualisation

We will need to make use of parameters to allow the users to dynamically change the survey item being displayed.

Figure 16. Create Parameter for Survey Item
Figure 17. Create Calculated Fields

We will need several calculated fields to enable the visualisation. The steps to create a calculated field are:

  1. Right-click on an empty space in the data pane > Create Calculated Field….
  2. Enter the field name and the formula.

The formulas that we need for the diverging stacked bar chart are:

  1. Number of Records:

    1

  2. Selected_Survey_Item:

    CASE [Select Survey Item]
    WHEN 'vac1' THEN [Vac 1]
    WHEN 'vac2_1' THEN [Vac2 1]
    WHEN 'vac2_2' THEN [Vac2 2]
    WHEN 'vac2_3' THEN [Vac2 3]
    WHEN 'vac2_6' THEN [Vac2 6]
    WHEN 'vac3' THEN [Vac 3]
    END

  3. Count Positive:

    IF [Selected_Survey_Item] < 3 THEN 1
    ELSEIF [Selected_Survey_Item] = 3 THEN 0.5
    ELSE 0
    END

  4. Count Negative:

    IF [Selected_Survey_Item] > 3 THEN -1
    ELSEIF [Selected_Survey_Item] = 3 THEN -0.5
    ELSE 0
    END

  5. Total Count:

    TOTAL(SUM([Number of Records]))

  6. Positive Percentage:

    SUM([Count Positive]) / [Total Count]

  7. Negative Percentage:

    SUM([Count Negative]) / [Total Count]
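To check the logic of these formulas by hand, the sketch below recomputes the positive and negative percentages for a single country in Python. It assumes the responses have already been converted to integers from 1 to 5 and that null responses are excluded from the total, as they are once the Null filter is applied; the function name `diverging_percentages` is hypothetical.

```python
def diverging_percentages(responses):
    """Return (negative, positive) shares for one country.

    `responses` is a list of integers from 1 (Strongly Agree) to
    5 (Strongly Disagree); None entries are skipped. The neutral
    response (3) is split evenly between the two sides.
    """
    valid = [r for r in responses if r is not None]
    total = len(valid)
    if total == 0:
        return 0.0, 0.0
    # Same weights as the Count Positive and Count Negative fields.
    positive = sum(1 if r < 3 else 0.5 if r == 3 else 0 for r in valid)
    negative = sum(-1 if r > 3 else -0.5 if r == 3 else 0 for r in valid)
    return negative / total, positive / total
```

Because each neutral answer contributes 0.5 to one side and -0.5 to the other, the positive share minus the negative share is always 1, which is why the two bars together span 100%.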

We create the visualisation by adding fields to the rows and columns shelves, as well as the filters and marks cards. Figure 18 shows the first cut of the visualisation.

Figure 18. Specify Rows and Columns Fields

To get to this stage, the steps are:

  1. Drag Negative Percentage and Positive Percentage to Columns.
  2. Drag Country to Rows.
  3. We are not interested in the average or sum of the survey response; we are going to treat it as a category that describes the responses. Therefore, convert it to a dimension by right-clicking the Selected_Survey_Item field in the data pane > Convert to Dimension.
  4. Drag Selected_Survey_Item to the Color Marks Card.
  5. Drag Selected_Survey_Item to the Filters Card and select All, then uncheck Null to remove all null responses.

Figure 19. Edit Colour Legend

The visualisation is not complete yet, but I found it confusing to look at when the colours seem randomly chosen and the legend is not informative. We are going to pause for a while to fix the legend before continuing.

Figure 20. Select Colour for Each Response
Figure 21. Compute Using Selected_Survey_Item

Notice that in Figure 18, the negative and positive percentage for each country do not add up to 100% and Israel has the smallest bar compared to other countries. To rectify this issue, ensure the percentages are computed based on the selected survey item.

Figure 22. Use Dual Axis

Although the percentages add up to 100% now, the positive and negative percentages are actually plotted side by side with different axis scales. We will use a dual axis to standardise the two axes.

Figure 23. Reorder Colours

Previously, neutral bars are drawn first for the Positive Percentage. We can rearrange the bars by manually reordering the colour in the colour legend.

This creates an unintuitive order for the colour legend, but it is fine. We will not use this legend in the final dashboard.

Customisation

We can do some final touch-ups to customise how our visualisation looks. For more detailed steps on changing each of the elements, take a look at my first dataviz makeover post.

Axis

Figure 24. Edit Axis Title

For the axis, we are going to edit the title and format the numbers.

Figure 25. Edit Axis Number Format

Title

Remember the parameter we created earlier? We are going to use the questions as the title.

Figure 26. Edit Title

Filters

Having filters allows the users to interact with the visualisation and tailor it according to their needs. We will use the Employment Status, Gender, Age, Household Children and Household Size fields as the filters.

Figure 27. Add String Filters
Figure 28. Add Numeric Filters

For the numeric filter, the interface is slightly different because we can pick a range of values.

Figure 29. Diverging Stacked Bar Chart

One down, one more to go!

4.3. Dot Plot with Error Bars

Creating Visualisation

The dot plot will visualise the selected response (e.g. Strongly Agree) for the selected survey item in the diverging stacked bar chart. We also need to repeat the steps to create parameters and calculated fields, but with some adjustments.

Figure 29. Create Parameter for Response

The formulas we need to add for the dot plot are:

  1. Count Selected Response:

    IF [Selected_Survey_Item] = [Select Response] THEN 1
    ELSEIF [Select Response] = 6 THEN
        IF [Selected_Survey_Item] = 4 OR [Selected_Survey_Item] = 5 THEN 1 ELSE 0 END
    ELSEIF [Select Response] = 0 THEN
        IF [Selected_Survey_Item] = 1 OR [Selected_Survey_Item] = 2 THEN 1 ELSE 0 END
    ELSE 0
    END

  2. Prop:

    TOTAL(SUM([Count Selected Response]))/ [Total Count]

  3. Prop 95% Lower Limit%:

    [Prop] - 1.959964 * sqrt([Prop]*(1-[Prop])/[Total Count])

  4. Prop 95% Upper Limit%:

    [Prop] + 1.959964 * sqrt([Prop]*(1-[Prop])/[Total Count])

  5. Prop 99% Lower Limit%:

    [Prop] - 2.575829 * sqrt([Prop]*(1-[Prop])/[Total Count])

  6. Prop 99% Upper Limit%:

    [Prop] + 2.575829 * sqrt([Prop]*(1-[Prop])/[Total Count])
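The limits above are the usual normal-approximation interval for a proportion: the sample proportion plus or minus a z-value times sqrt(p(1-p)/n). The Python sketch below reproduces the same arithmetic for one country so the numbers in the dot plot can be sanity-checked. The function names are hypothetical, and reading parameter value 0 as "Agree or Strongly Agree" and 6 as "Disagree or Strongly Disagree" is my interpretation of the Count Selected Response formula.

```python
import math

# Same z-values as in the Tableau formulas above.
Z_95 = 1.959964
Z_99 = 2.575829


def matches_response(item, selected):
    """Mirror of the Count Selected Response field.

    `selected` is 1 to 5 for a single response, 0 for the combined
    agree group (1 or 2), or 6 for the combined disagree group (4 or 5).
    """
    if item == selected:
        return 1
    if selected == 6:
        return 1 if item in (4, 5) else 0
    if selected == 0:
        return 1 if item in (1, 2) else 0
    return 0


def proportion_interval(responses, selected, z=Z_95):
    """Return (lower, proportion, upper) for one country's responses."""
    n = len(responses)
    if n == 0:
        raise ValueError("at least one response is required")
    prop = sum(matches_response(r, selected) for r in responses) / n
    # Normal-approximation margin: z * sqrt(p * (1 - p) / n).
    margin = z * math.sqrt(prop * (1 - prop) / n)
    return prop - margin, prop, prop + margin
```

Passing `z=Z_99` gives the wider 99% limits. Note that this approximation becomes unreliable when the proportion is very close to 0 or 1, or when the sample is small.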

Dot Plot

We are going to start by creating the dots for the proportion from the sample.

Figure 30. First Cut of the Dot Plot

If the dots are at the same value for all countries, make sure that Prop is computed using Cell.

A neat trick I found is to select Fit Height above the Columns shelf to automatically adjust the height of the chart.

Figure 31. Edit Colour

You can play around and see how the colour of the dots changes according to the selected response.

Figure 32. Add Labels

We can adjust the circle size again if we need to, but try not to make it very big, otherwise the circles will get cropped.

Error Bars

Figure 33. Add Measure Values

Why is the line going zig-zag like a drunk driver? Because we have not specified the path for the lines.

Figure 34. Add Colour and Path
Figure 35. Dual Axis

Combine the circle and the confidence interval by using a dual axis.

Sort

Tableau does not allow sorting using a blended measure, so we need to use a workaround. We want to sort according to the order of Prop, so we will use it in the Rows shelf and hide it away.

Figure 36. Add Prop to Columns and Convert to Discrete
Figure 37. Sort by Prop
Figure 38. Remove Gridlines

If your visualisation becomes cluttered with gridlines after you add Prop, you can remove them.

Figure 39. Edit Title for Dot Plot with Error Bar
Figure 40. Dot Plot with Error Bars

4.4. Dashboard

Now we are going to combine the two charts together in a dashboard, but we might still need to go back and adjust a few things in the chart.

Figure 41. Dashboard Initial Layout

Does something look weird? Yes, we have not sorted the bar chart according to the order of the countries on the dot plot. Go back to the Diverging Stacked Bar Chart Sheet and repeat the steps to sort the countries according to Prop. Don’t panic if it suddenly becomes an even grid, just ensure the Negative and Positive Percentage are still computed using Selected_Survey_Item.

Figure 42. Apply Filter to All Sheets

Before removing the country column for the dot plot, we need to make sure that the sorting works properly and that the order of the countries in the two charts is identical. But it is not. This is because the diverging stacked bar chart has filters applied to it, but the dot plot with error bars does not. So, we need to apply the filters to all sheets to standardise the sorting order.
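The effect is easy to reproduce on toy data. In the sketch below, the same sort-by-proportion logic gives a different country order depending on whether a filter is applied, which is exactly the mismatch seen between the two sheets. The rows, country names and the function name `country_order` are made up for illustration.

```python
def country_order(rows, selected, keep=lambda row: True):
    """Return countries sorted by the share of `selected` responses.

    `keep` plays the role of a filter: only rows for which it returns
    True are counted.
    """
    counts = {}
    for row in rows:
        if not keep(row):
            continue
        hits, total = counts.get(row["country"], (0, 0))
        counts[row["country"]] = (hits + (row["item"] == selected), total + 1)
    props = {country: hits / total for country, (hits, total) in counts.items()}
    return sorted(props, key=props.get, reverse=True)


# Made-up respondents, not taken from the survey.
rows = [
    {"country": "A", "gender": "Female", "item": 1},
    {"country": "A", "gender": "Male", "item": 5},
    {"country": "A", "gender": "Male", "item": 5},
    {"country": "B", "gender": "Female", "item": 5},
    {"country": "B", "gender": "Male", "item": 1},
    {"country": "B", "gender": "Male", "item": 1},
]

unfiltered = country_order(rows, selected=1)
filtered = country_order(rows, selected=1,
                         keep=lambda row: row["gender"] == "Female")
```

Here `unfiltered` ranks B above A, while `filtered` ranks A above B. Two sheets only agree on the order when they see the same filtered rows.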

Figure 43. Reorder Parameters and Filters
Figure 44. Add Blanks

Tableau does not allow us to easily adjust the size of the objects on the dashboard. As a workaround, we can add blank objects to create some padding.

Create a Customised Legend

It looks almost complete, but the colour legend is still missing. Because we have to manually order the colour legends for the diverging stacked bar chart, now we need to create a customised legend in a new sheet.

Figure 45. Add Colour to Legend

It looks quite nice, except the labels are on the left side. We will need a new calculated field to add the label. Create Calculated Field with the following formula.

  1. Label:

    CASE [Selected_Survey_Item]
    WHEN 1 THEN 'Strongly Agree'
    WHEN 2 THEN 'Agree'
    WHEN 3 THEN 'Neutral'
    WHEN 4 THEN 'Disagree'
    WHEN 5 THEN 'Strongly Disagree'
    END

Figure 46. Add Label to Legend
Figure 47. Edit Axis

We just need to do a few more minor touch ups.

Figure 48. Add Legend to Dashboard
Figure 49. Add Text Object

During the final round of checks, I realised that the age filter had not been shown all this while. It is fine, we are human. We make mistakes. If you also forgot some filters, it is not too late to add them in.

Figure 50. Add Age Filter
Figure 51. Add Animations

For a final touch, since this visualisation is interactive, we can add animation so the users can see the changes of rank between countries when they change a parameter or a filter.

Figure 51. Final Dashboard

Congratulations, we are done!

5. Major Observations

Some major insights were obtained from the alternative design of the COVID Vaccine Survey.

a. People are more willing to be vaccinated a year from now than this week due to worries for COVID vaccine potential side effects

We can clearly see the dominance of positive sentiment in the willingness to be vaccinated a year from now (Figure 52), but the same does not apply to the willingness to be vaccinated this week (Figure 53).

Figure 52. Proportion of People Willing To Be Vaccinated a Year from Now
Figure 53. Proportion of People Willing To Be Vaccinated This Week

This may be due to worries about the potential side effects of the COVID vaccine. Naturally, more time would allow the researchers to examine the side effects of the vaccine more carefully. The top 3 countries with the highest total proportion that agree to be vaccinated this week (Figure 54) are the same as the top 3 countries that are not worried about COVID vaccine side effects (Figure 55).

Figure 54. Top 3 countries with Highest Proportion That Agrees to be Vaccinated This Week
Figure 55. Top 3 Countries with Highest Proportion That Are Not Worried About COVID Vaccine Potential Side Effects

b. Older people are more worried about getting COVID-19 and more willing to be vaccinated this week

Similarly, by comparing the sentiment regarding willingness to be vaccinated this week between older people and the younger generation, we can see that the dominance of positive sentiment is higher among older people (Figure 56) than among younger ones (Figure 57).

Figure 56. Proportion of Older People Willing to be Vaccinated This Week
Figure 57. Proportion of Younger People Willing to be Vaccinated This Week

Although not as prominent, the proportion of people who are not worried about getting COVID-19 is also higher in the younger generation (Figure 58) than the older ones (Figure 59).

Figure 58. Proportion of Younger People Not Worried About Getting COVID-19
Figure 59. Proportion of Older People Not Worried About Getting COVID-19

c. Countries with the highest proportion that worries about getting COVID-19 are also worried about the side effects of the vaccine

Figure 60. Top 3 Countries with Highest Proportion That Worries about Getting COVID-19

The significant difference between Japan, Spain, and South Korea and the other countries in terms of the proportion that agrees to being worried about getting COVID-19 is interesting to observe. There is a very large gap to the next-ranked country, Singapore, and even the 99% confidence intervals do not overlap with those of any of the other countries.
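Checking whether two error bars overlap is a simple comparison of their limits. The sketch below shows the test with made-up limits; the function name `intervals_overlap` and the numbers are for illustration only and are not values read from the dashboard.

```python
def intervals_overlap(first, second):
    """Return True when two (lower, upper) intervals share at least one point."""
    return first[0] <= second[1] and second[0] <= first[1]


# Made-up 99% limits for two hypothetical countries.
country_a = (0.62, 0.68)
country_b = (0.41, 0.47)
separated = not intervals_overlap(country_a, country_b)
```

When `separated` is true, the lower limit of the higher-ranked country sits above the upper limit of the other one, which is the pattern described above.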

Another survey item is phrased in a similar way, and the same 3 countries are in the group that shows a significantly higher worry than the other countries.

Figure 61. Top 5 Countries with Highest Proportion That Worries about the Potential Side Effects of COVID-19 Vaccine

The top 5 countries with the highest proportion that worries about the potential side effects of the COVID-19 vaccine are Japan, Singapore, France, Spain, and South Korea. Again, there is no overlap in the confidence intervals for the proportions.

It is possible that this observation is due to a generally higher anxiety level in these countries. If so, people may be worried about everything, whether it is getting COVID-19, the side effects of the COVID-19 vaccine, or whether they can get a promotion, get married, remembered to feed their fish, locked the door, and turned off the stove.

This must be further investigated, so the common factor between these countries can be determined.

    References

    Charumilind, Sarun, Matt Craven, Jessica Lamb, Adam Sabow, and Matt Wilson. 2021. “When Will the COVID-19 Pandemic End?” McKinsey & Company. https://www.mckinsey.com/industries/healthcare-systems-and-services/our-insights/when-will-the-covid-19-pandemic-end.
    Jones, Sarah P. 2020. “Imperial College London YouGov Covid Data Hub.” GitHub. Imperial College London Big Data Analytical Unit; YouGov Plc. https://github.com/YouGov-Data/covid-19-tracker.
    Pirrone, Alana. 2020. “Seven Different Ways to Display Likert Scale Data.” Medium. https://medium.com/nightingale/seven-different-ways-to-display-likert-scale-data-d0c1c9a9ad59.
    Shaffer, Jeffrey. n.d. “5 Tips on Designing Colorblind-Friendly Visualizations.” Tableau. https://www.tableau.com/about/blog/2016/4/examining-data-viz-rules-dont-use-red-green-together-53463.
    Stone, M. 2006. “Choosing Colors for Data Visualization.” Perceptual Edge. http://www.perceptualedge.com/articles/b-eye/choosing_colors.pdf.
    Tableau. n.d. “Sorting by Field Is Unavailable for Data Blended Measures.” Tableau Software. https://kb.tableau.com/articles/issue/sorting-by-field-unavailable-for-data-blended-measures.
    Torres, Nicole. 2016. “Why It’s so Hard for Us to Visualize Uncertainty.” Harvard Business Review. https://hbr.org/2016/11/why-its-so-hard-for-us-to-visualize-uncertainty.
    Wexler, Steve. 2018. “Showing Uncertainty in Survey Results.” Data Revelations. https://www.datarevelations.com/resources/showing-uncertainty/.
    ———. 2020. “Rethinking the Divergent Stacked Bar Chart - Placing the Stronger Views in the Center.” Data Revelations. https://www.datarevelations.com/resources/rethinkingdivergent/.
    Yi, Michael. 2019. “How to Choose the Colors for Your Data Visualizations.” Medium. https://medium.com/nightingale/how-to-choose-the-colors-for-your-data-visualizations-50b2557fa335.

    Citation

    For attribution, please cite this work as

    Djojosaputro (2021, Feb. 12). Gabriella Pauline: DataViz Makeover 2 - COVID Vaccine Survey. Retrieved from https://gabriellapauline.netlify.app/posts/2021-02-12-dataviz-makeover-2/

    BibTeX citation

    @misc{djojosaputro2021dataviz,
      author = {Djojosaputro, Gabriella Pauline},
      title = {Gabriella Pauline: DataViz Makeover 2 - COVID Vaccine Survey},
      url = {https://gabriellapauline.netlify.app/posts/2021-02-12-dataviz-makeover-2/},
      year = {2021}
    }