Face Mask Detection with Deep Learning and Computer Vision

Image detection with deep learning is something I’ve been looking forward to learning since I started my journey in data science. My background is in medical diagnostic sonography and echocardiography, and I was amazed by the application of deep learning techniques to detecting diseases and tumors in X-ray images. Although I didn’t choose medical images for the current project, I look forward to such applications in the future, when a sufficiently large database of medical images is available, specifically ultrasound and echocardiogram images.

For the 6th project at Metis, we were tasked with using primarily non-tabular data (images, text, time series, audio, etc.) to build a neural network model that addresses a useful prediction and/or recommendation problem in any domain of interest.

Disclaimer: I am new to machine learning and also to blogging. So, if there are any mistakes, please do let me know. All feedback is appreciated.

Backstory and Project Goal

source: https://www.eamc.org/news-and-media/why-is-wearing-a-mask-important

It’s 2021, and we have all learned the importance of masks in our lives. As life slowly returns to normal, crowds in public spaces are growing. Although some states have lifted mask requirements, masks are still mandated in indoor public places like airports and hospitals. But screening large crowds manually is difficult and inefficient. So, the goal of this project is to build a face mask detection system using deep learning and computer vision to let machines help with the inspection.


Tools and Approaches Used

  • Python (pandas, numpy)
  • Google Colab (cloud computing)
  • Keras/TensorFlow (CNN)
  • OpenCV
  • Matplotlib, seaborn
  • Streamlit
  • Heroku


I used the Face Mask Detection Images Dataset from Kaggle, which contains 11,800 images of faces with and without masks. The dataset already comes in a train/val/test directory format with a ratio of 80:10:10. In addition, the data is well balanced between the two classes (about 50/50).
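A quick way to confirm the split ratio and class balance is to count the files in each folder. The sketch below is only an illustration: it assumes a layout of `<root>/<split>/<class>/<image files>`, and the function name `count_images` and the list of image suffixes are my own choices, not something taken from the dataset's documentation.

```python
from collections import Counter
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed image file types


def count_images(root):
    """Count image files per (split, class) under root/<split>/<class>/."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            parts = path.relative_to(root).parts
            if len(parts) >= 3:  # split / class / file name
                counts[(parts[0], parts[1])] += 1
    return counts
```

Calling `count_images` on the dataset root returns a `Counter` keyed by `(split, class)`, which makes it easy to compare the totals per split and per class.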

Looking at some example images, there’s a wide range of faces without masks, including some where an object blocks the face. The images with masks also show a variety of masks, even masks printed with a fake face. There are some cartoon images too, as well as augmented images. So this dataset provides a lot of variety.


I’ve broken down my workflow into two main sections: training the mask detector and applying the mask detector.


1. Train Mask Detector

To build the classification model, I first extracted a small subset of the full dataset and used it to set up a working pipeline with a CNN.

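from keras.models import Sequential
from keras.layers import Dense, Conv2D, MaxPool2D, Flatten, Dropout
from keras.callbacks import EarlyStopping

model = Sequential()
# Convolution blocks. MaxPool2D, Flatten, and Dropout are imported above but
# missing from this excerpt, so their placement and settings here are assumed.
model.add(Conv2D(32, (3,3), input_shape=(150,150,3), activation='relu'))
model.add(MaxPool2D((2,2)))
model.add(Conv2D(64, (3,3), activation='relu'))
model.add(MaxPool2D((2,2)))
model.add(Conv2D(128, (3,3), activation='relu'))
model.add(MaxPool2D((2,2)))
model.add(Conv2D(128, (3,3), activation='relu'))
model.add(MaxPool2D((2,2)))
# Flatten the feature maps so the dense layers receive a vector
model.add(Flatten())
model.add(Dropout(0.5))
model.add(Dense(512, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# Assumed compile settings; binary cross-entropy pairs with the sigmoid output
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# train_generator and val_generator are placeholder names for the image
# iterators over the train/ and val/ folders; the epoch count is also assumed.
# model.fit accepts generators directly (fit_generator is deprecated).
result = model.fit(
    train_generator,
    validation_data=val_generator,
    epochs=100,
    callbacks=[EarlyStopping(patience=16, verbose=1)])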
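To make the `EarlyStopping(patience=16)` callback less of a black box, here is a simplified pure-Python sketch of the rule it applies. It assumes the monitored value is the validation loss (the Keras default) and ignores options such as `min_delta`; the function name `early_stopping_epoch` is hypothetical, and this is not Keras's actual implementation.

```python
def early_stopping_epoch(val_losses, patience=16):
    """Return the index of the epoch after which training would stop,
    or None if the patience budget is never used up."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            # New best validation loss: reset the counter
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None
```

With `patience=16`, training keeps going until the validation loss has failed to improve for 16 epochs in a row, which protects against stopping on a single noisy epoch.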

Once the model pipeline was established, I fit it on the full training dataset. The baseline binary model’s accuracy, precision, recall, and F1 score are all 0.99, and the confusion matrix confirms the model’s strong performance on the test set.
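For readers new to these metrics, the sketch below shows how accuracy, precision, recall, and F1 follow from the four cells of a binary confusion matrix. The function name `binary_metrics` is my own, and any counts you pass in are up to you; the toy numbers in the note after the code are made up for illustration and are not this project's results.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

For example, `binary_metrics(tp=8, fp=2, fn=1, tn=9)` uses a toy matrix of 20 predictions, 17 of which are correct.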

Since the binary model did so well, I decided to try a 3-class model by adding a third class, “Incorrect mask”. For that model, I added images from two more datasets:

  • MaskedFace-Net dataset that contains images of faces with a correctly or incorrectly worn mask (used 8,990 images of incorrectly worn masks and 3,900 images of correctly worn masks)
  • CelebFaces Attributes Dataset from Kaggle that contains images of celebrity faces (used 3,900 images for the without-mask class)

So the modified dataset now has 24,000 images that are evenly distributed among the classes, with an 80:10:10 train/val/test directory split.
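One way to produce an 80:10:10 split for newly added images is to shuffle each class's file list and slice it. This is only a sketch of the idea, not necessarily how the split was done here; `split_files`, its default ratios, and the fixed seed are illustrative assumptions.

```python
import random


def split_files(files, ratios=(0.8, 0.1, 0.1), seed=42):
    """Shuffle a list of file names and split it into train/val/test parts."""
    files = sorted(files)                # fixed starting order for reproducibility
    random.Random(seed).shuffle(files)   # seeded shuffle, independent of global state
    n_train = int(len(files) * ratios[0])
    n_val = int(len(files) * ratios[1])
    return {
        "train": files[:n_train],
        "val": files[n_train:n_train + n_val],
        "test": files[n_train + n_val:],  # the remainder goes to test
    }
```

Running it once per class keeps the classes evenly distributed within each split.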

For the 3-class model, I followed a pipeline similar to the baseline model’s, with a slight modification to the output layer:

model.add(Dense(3, activation='softmax'))
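To show what the softmax output layer does, here is a small pure-Python sketch that turns three raw scores into class probabilities and picks the most likely class. The class names and their order in `CLASS_NAMES` are assumptions for illustration (the real order depends on how the labels were encoded), and `predict_label` is a hypothetical helper. With a softmax output, the loss typically switches from binary to categorical cross-entropy as well.

```python
import math

# Assumed class order; the real order depends on how the labels were encoded
CLASS_NAMES = ["incorrect_mask", "with_mask", "without_mask"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits, class_names=CLASS_NAMES):
    """Return the most probable class name and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]
```

Unlike the single sigmoid unit in the binary model, the three outputs compete with each other, so exactly one class wins for each image.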


Surprisingly, the 3-class CNN model also did very well. Accuracy, precision, recall, and F1 score are all 0.99, and the confusion matrix shows that the model mislabeled only 17 images in the test set.

Findings and Insights

Both models, binary and 3-class, did very well within the dataset. But I double-checked both with outside images. The binary model performed well both within and outside the dataset, while the 3-class model did well within the dataset but struggled to label “incorrect mask” in new images.

So the binary model is more suitable for real-life images and is used as the mask detector in the application.

2. Apply Mask Detector

But how do we apply this mask detector model to real-life images?

The dataset images are cropped to the face area, but most real-life images are larger than a single face and often contain many faces. So we have to apply some transformations.

So, to apply the mask detector to real-life images, I first need to extract the faces from the image using OpenCV and a Haar Cascade Classifier.

A quick explanation of the Haar Cascade Classifier: it’s an object detection algorithm used to identify faces in an image or a real-time video. The pre-trained models are stored on GitHub, and we can load them with OpenCV methods. [1]
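Under the hood, Haar cascades score small rectangular patterns of light and dark pixels, and an integral image makes those rectangle sums cheap to compute. The pure-Python sketch below illustrates the idea with one two-rectangle feature on a tiny grayscale grid. All function names are my own, and OpenCV's real implementation is far more elaborate, so treat this as a conceptual illustration only.

```python
def integral_image(img):
    """Summed-area table with an extra zero row and column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the rectangle at (x, y) with size w x h, in 4 lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Left half minus right half: a large magnitude suggests a vertical edge."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

On a grid whose left half is dark and right half is bright, the feature has a large magnitude; on a uniform grid it is zero. A cascade combines many such features, rejecting non-face regions early.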

import cv2
import matplotlib.pyplot as plt

# Load the cascade
face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
# Load image (OpenCV reads it in BGR channel order)
face_img = cv2.imread('img.jpeg')
# Convert to grayscale for the cascade
gray_img = cv2.cvtColor(face_img, cv2.COLOR_BGR2GRAY)
# Detect face areas
face_rects = face_cascade.detectMultiScale(gray_img)
# Convert the color image from BGR to RGB layout for plt
out_img = cv2.cvtColor(face_img, cv2.COLOR_BGR2RGB)

# Draw a rectangle around each face area
for (x,y,w,h) in face_rects:
    cv2.rectangle(out_img, (x,y), (x+w,y+h), (255,255,255), 10)
# Show image with face areas drawn
plt.imshow(out_img)
plt.show()
After identifying faces and their locations, I applied the mask detector (binary CNN model) to each face area and showed prediction results.
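The per-face step can be sketched as a small loop. In the sketch below, `score_fn` is a hypothetical callable standing in for "crop the face, resize it to the 150x150 input size, and return the model's sigmoid output"; the label order and the 0.5 threshold are also assumptions, since which class maps to 1 depends on how the training labels were encoded.

```python
def label_faces(face_rects, score_fn, threshold=0.5,
                labels=("with_mask", "without_mask")):
    """Attach a predicted label to every detected face rectangle.

    score_fn is a hypothetical callable taking (x, y, w, h) and returning
    the classifier's sigmoid output for that face crop.
    """
    results = []
    for (x, y, w, h) in face_rects:
        score = score_fn(x, y, w, h)
        # Assumed encoding: scores near 1 mean the second label
        label = labels[1] if score >= threshold else labels[0]
        results.append({"box": (x, y, w, h), "label": label, "score": score})
    return results
```

Each returned entry keeps the bounding box next to its label, so the result can be drawn onto the image in the same way as the face rectangles.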


The model did pretty well on the faces that are detected in the images. Here are some examples of the results.

Notice that some faces didn’t get a prediction. That is because the face detector didn’t pick up those faces, so the mask detector was never applied to them. The face detector, the Haar Cascade Classifier, does have some limitations on manipulated face images because it relies on edge features. But overall, the model still performs well on the faces that are detected!

Application Usage

To put this project into production, I built a Streamlit app that allows users to upload images or use a webcam for face mask detection.

Here’s a snapshot of the face mask detection on image…

Face mask detection on Images

And here’s a demo for Webcam mask detection…

Face mask detection on Webcam

Lastly, I deployed the Streamlit app on Heroku without the webcam feature. You can try out the Heroku app for Mask Patrol here!

Future Work

To extend this project, I would preprocess the images for the 3-class model by cropping them to the face area with the face detector, in order to build a more robust 3-class model that can be used outside of the dataset. I would also attempt feature extraction and data augmentation on “incorrect mask” images to help the model generalize.


1. Mittal, A. 2020. “Haar Cascades, Explained.” Analytics Vidhya. Medium. https://medium.com/analytics-vidhya/haar-cascades-explained-38210e57970d

2. Menon, A. 2019. “Face Detection in 2 Minutes using OpenCV & Python.” Towards Data Science. Medium. https://towardsdatascience.com/face-detection-in-2-minutes-using-opencv-python-90f89d7c0f81

3. Rosebrock, A. 2018. “OpenCV Face Recognition.” Pyimagesearch. https://www.pyimagesearch.com/2018/09/24/opencv-face-recognition/


Deep learning is very powerful! But when dealing with a large dataset, it’s always good to work out the pipeline on a small subset of the data first. It saves a lot of time otherwise spent waiting for the model to train.
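Carving out such a small subset can be as simple as copying a random sample of files per class. The sketch below assumes a `<src_root>/<class>/<files>` layout; `make_subset`, its defaults, and the fixed seed are illustrative choices, not the exact code from this project.

```python
import random
import shutil
from pathlib import Path


def make_subset(src_root, dst_root, per_class=100, seed=0):
    """Copy up to per_class random files from each src_root/<class>/ folder."""
    rng = random.Random(seed)  # seeded so the same subset is picked every run
    copied = {}
    # dst_root should live outside src_root so it is not mistaken for a class
    for class_dir in sorted(Path(src_root).iterdir()):
        if not class_dir.is_dir():
            continue
        files = sorted(p for p in class_dir.iterdir() if p.is_file())
        chosen = rng.sample(files, min(per_class, len(files)))
        target = Path(dst_root) / class_dir.name
        target.mkdir(parents=True, exist_ok=True)
        for f in chosen:
            shutil.copy2(f, target / f.name)
        copied[class_dir.name] = len(chosen)
    return copied
```

The returned dictionary reports how many files were copied per class, which is a handy sanity check before training on the subset.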

Also, if your computer doesn’t have a good GPU, try using cloud computing like Google Colab or Google Cloud. You’ll appreciate how quiet your computer is when the model runs in the cloud. Otherwise, it’ll sound like a NASA launch in your room.

Thanks for reading :) Hope it was interesting and insightful to you.

You can find my project work on my GitHub repo.





Junior Data Scientist @ HealthRhythms. https://www.linkedin.com/in/crystal-huang-ds/

Crystal Huang
