Building a Dashboard for Real-time COVID-19 Case Reports and Vaccinations with Dash

Crystal Huang
Published in Nerd For Tech
6 min read · Jul 14, 2021


Snapshot of the dashboard product created by the author

Besides predictive models, a dashboard is another great way to unleash the power of data: it displays and visualizes data so that domain experts can analyze it. Building on that idea, periodically refreshing the dashboard with new data keeps the visualization close to real time and so supports more accurate analysis.

For my last project at Metis, the task was: using a large- to massive-scale dataset obtained by any means, engineer an end-to-end data storage and processing pipeline that provides a useful service in any domain of interest.

Since all my previous projects have been linked to COVID-19 one way or another, I thought:

Why not circle back to the COVID-19 data?

Disclaimer: I am new to machine learning and also to blogging. So, if there are any mistakes, please do let me know. All feedback is appreciated.

Motivations

As the Delta variant of COVID-19 spreads across the US, the CDC and healthcare experts are warning poorly vaccinated regions to prepare for renewed danger. While researching the status of my current state, Virginia, I realized that most of the available dashboards are case surveillance dashboards. Although there are a few vaccination dashboards, they are separate from the case surveillance ones, which is inconvenient for users who want to explore the relationship between cases and vaccinations.

So the goal of this project is to build a pipeline for a dashboard that visualizes Virginia COVID-19 case reports and vaccinations together, using real-time (daily updated) data.

Data

  • The Virginia Vaccine Administered dataset (267,000 data points as of 6/24/2021) and the COVID cases dataset (60,600 data points as of 6/24/2021) from Virginia Open Data. The data are updated every day, so the database is constantly growing.
  • Census data for the population of Virginia counties

Tools

  • SODA API
  • SQL, SQLAlchemy
  • Python (pandas, NumPy)
  • Dash
  • Heroku
  • cron job

Methodology

The main idea of this project is to build an end-to-end pipeline that will produce a dashboard with updated data. Below, I’ll show the overview and breakdown of the pipeline:

Pipeline Overview

Pipeline visualization created by the author
  1. Data acquisition: data is pulled from the API with a Python script
  2. Data storage: the acquired data is stored in a SQL database and accessed with SQLAlchemy
  3. Data preprocessing: data cleaning and wrangling
  4. Data visualization: Plotly graphs are created for visualization and analysis
  5. Web application: the interactive Dash dashboard app is built from the Plotly graphs
  6. Deployment: the app is deployed on Heroku for public use and demos
  7. Automation: a cron job runs the Python script for data acquisition, storage, and preprocessing daily, so the dashboard reflects the latest data
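To make steps 1 and 2 more concrete, here is a minimal sketch, not the project's actual script. It assumes the standard SODA query parameters `$limit` and `$offset` for paging, and it swaps in Python's built-in `sqlite3` for the SQL database and SQLAlchemy so that it runs without extra dependencies. The URL, table name, and column names are hypothetical placeholders.

```python
import json
import sqlite3
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical placeholder: substitute the real Socrata domain and dataset ID.
BASE_URL = "https://example.org/resource/xxxx-xxxx.json"


def build_soda_url(base_url, limit=1000, offset=0):
    """Build a SODA query URL using the $limit/$offset paging parameters."""
    params = {"$limit": limit, "$offset": offset}
    # safe="$" keeps the dollar sign readable instead of encoding it as %24.
    return f"{base_url}?{urlencode(params, safe='$')}"


def fetch_page(url):
    """Download one page of JSON rows (this makes a network call)."""
    with urlopen(url) as response:
        return json.load(response)


def fetch_all(base_url, page_size=1000, fetch=fetch_page):
    """Page through the dataset until a short page signals the end."""
    rows, offset = [], 0
    while True:
        page = fetch(build_soda_url(base_url, limit=page_size, offset=offset))
        rows.extend(page)
        if len(page) < page_size:
            return rows
        offset += page_size


def store_rows(conn, rows):
    """Insert rows into a hypothetical 'cases' table, skipping duplicates."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cases ("
        "report_date TEXT, locality TEXT, total_cases INTEGER, "
        "PRIMARY KEY (report_date, locality))"
    )
    # INSERT OR IGNORE makes a daily re-run safe: existing rows are left alone.
    conn.executemany(
        "INSERT OR IGNORE INTO cases "
        "VALUES (:report_date, :locality, :total_cases)",
        rows,
    )
    conn.commit()
```

A daily update would then be roughly `store_rows(conn, fetch_all(BASE_URL))`. For step 7, a crontab entry along the lines of `0 6 * * * python3 update_data.py` (hypothetical script name and schedule) would run such a script every day at 06:00.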

About Dash

What is Dash?

Dash is an open-source Python library from Plotly for building web-based data visualization applications. It is great for building dashboards, Markdown reports, and other data visualizations.

I won’t dive deep into how Dash works here, as there are great articles and resources that explain it better than I can. Here are a few if you’d like to check them out:

What I learned from using Dash

  • The code is more similar to Flask (since Dash is built on top of Flask) than to Streamlit
  • Having some understanding of HTML and CSS would be helpful (especially if you’d like to make the dashboard look fancy)

I chose Dash out of curiosity, since I’d already built Flask and Streamlit apps in my previous projects. Overall, I like that it is easy to use and makes the dashboard look great.

Results

And voila! Here’s the dashboard…

You can also check out the app here!

Insights

Using the dashboard, I drew some preliminary insights from the data.

Counts of cases and vaccinations in Virginia counties
Rates of cases and vaccinations in Virginia counties
1. Counts vs. Rates per 100,000 (measurement of disease)

Counts of case reports and vaccinations are high in the same areas. These areas are cities, and therefore have larger populations.

Using rates, the pattern changes: areas with higher case rates have lower vaccination rates, and vice versa.
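The difference between counts and rates comes down to dividing by population. Here is a small worked sketch with made-up numbers (the two localities and all figures are hypothetical, chosen only to show the effect):

```python
def rate_per_100k(count, population):
    """Convert a raw count into a rate per 100,000 residents."""
    return count * 100_000 / population


# Hypothetical localities: a large city and a small rural county.
localities = {
    "Big City": {"cases": 5_000, "population": 250_000},
    "Small County": {"cases": 600, "population": 12_000},
}

for name, data in localities.items():
    rate = rate_per_100k(data["cases"], data["population"])
    print(f"{name}: {data['cases']} cases, {rate:.0f} per 100,000")
```

In this toy example the city has far more cases in absolute terms (5,000 vs. 600), but the small county has the higher rate (5,000 vs. 2,000 per 100,000), which is why a map of rates can look so different from a map of counts.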

2. Time series analysis of cases and vaccinations

Time series comparison of cases and vaccinations

Comparing the time series of cases and vaccinations, we cannot infer a causal relationship, but we can see that cases trend downward after vaccinations increase. Case numbers plateaued and then declined after vaccinations peaked.

Dashboard showing Virginia Beach data
Dashboard showing Northampton data

3. County-level comparison in the same region

Virginia Beach is doing worse than the regional average and has a lower percentage of its population vaccinated. Most people there are getting Pfizer at pharmacies, which may be something outreach programs can look into.

On the other hand, Northampton, a county in the same region, is doing better than the regional average and has a higher percentage of its population vaccinated. Most people there are getting Moderna from other community health providers.

Overall, the dashboard shows that some counties have high case rates and low vaccination rates, and that case numbers tend to decrease as vaccinations increase. In addition, the county-level view can help inform outreach strategy.

Future Work

If I had more time, I’d build a predictive model using time series forecasting, add more filters to the app, such as dates and regions, and update the dashboard graphs and visuals based on user feedback, if there is any.

Takeaways

For me, the challenging part of this project was choosing which types of graphs to put on the dashboard so that they add value to the analysis. Without domain knowledge, it is easy to get a bit lost, so I took some time to read up on disease measurement methods for this project. Always learning in every single way!

Overall, I enjoyed doing this project: building the dashboard, designing the layout, and making it more visually appealing. Building it from end to end gave me insight into each stage of the pipeline for a data science project. As my last project at Metis, it was a great way to circle back to SQL, summarize my learnings, and experiment with new tools.

Thanks for reading :) Hope it was interesting and insightful to you.

You can find my project work on my GitHub repo.
