

AI for Good: Fighting COVID-19 with Data Science

Last Updated on July 20, 2023 by Editorial Team

Author(s): Tadeusz Bara-Slupski

Originally published on Towards AI.

This is the first of two articles about our recent participation in the Pandemic Response Hackathon. Stay tuned for technical details of our CoronaRank solution (Markov Chains, R, Shiny, and how to manipulate a dataset of >100GB quickly).

The COVID-19 pandemic is putting an unprecedented strain on communities, healthcare systems, and the economy. Much of the effort to contain the spread of the virus depends on individuals taking responsibility for the benefit of the wider community. Meanwhile, governmental agencies and international organizations are putting policies in place to contain the pandemic and maximize the efficiency of healthcare delivery.

What can a data science company do to assist these efforts?

Our AI for Good initiative aims to bridge the gap between tech expertise and the organizations that need such support while working at the forefront of the fight for a sustainable future for our planet. Committed to this vision, we set out to contribute our data science skills to a project that could reduce the impact of the COVID-19 pandemic.

We recently got this chance at a hackathon centered around finding solutions for the global pandemic. There we developed CoronaRank, an algorithm that provides users with a personal coronavirus risk score and generates heat maps of risky areas.

Pandemic Response Hackathon

Devpost is a platform that provides the tech community with an opportunity to contribute to overcoming various global challenges. Their recent Pandemic Response Hackathon asked the participants to develop technologies to solve what appears to be the most significant public health challenge in decades.

The hackathon launched on the 27th of March. Over the next three days, more than 2,000 participants got involved and submitted upwards of 230 projects across four tracks:

  • Public Health and Information Sharing
  • Epidemiology & Science of the Disease
  • Keeping our Health Workers Safe
  • Second-Order Societal Impacts

Thirty organizations committed resources, including cloud computing from Amazon Web Services (AWS), visualization tools from Mapbox, and datasets from Veraset.

We entered the hackathon in collaboration with Ewa Knitter, an infectious disease epidemiologist who kindly offered to support our efforts.

Problems we set out to tackle

After initial discussions, we identified several problems that are particularly pressing in the current outbreak and that could be addressed using geolocation data. Specifically:

  • COVID-19 tests are a limited resource, and there is no obvious way to decide who should be tested.
  • Because few tests are being done and many infected people are asymptomatic, it is hard to know which people and areas to avoid.
  • Supply chain management in the healthcare sector is going to be extremely difficult moving forward, and policymakers need information on the current potential hotspots where an outbreak might be imminent.
  • Many young, healthy people are ignoring social distancing guidance on the basis that they have a low personal risk. We need a way to illustrate how breaking isolation can affect communities.

Our solution

To address these problems, we decided to create heat maps of pandemic hotspots, that is, areas with high human interaction. Such heat maps would show public officials where the next outbreak is likely to occur and show users the risk that comes with not complying with public health measures.

To achieve this, we took inspiration from Google’s PageRank algorithm, which ranks web pages based partly on the links they receive from other highly ranked pages. We carried this approach over to epidemiology using Markov Chain modeling. The resulting CoronaRank is an algorithm that uses geolocation data, epidemiology data, self-reporting, and Markov Chain modeling to assess the likelihood of coronavirus exposure.
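To make the PageRank analogy concrete, here is a minimal Python sketch of the power iteration behind a PageRank-style Markov chain. It is an illustration only and not the CoronaRank implementation: the `pagerank` function, the damping value of 0.85, and the toy `movements` graph are all assumptions introduced for this example.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Return the stationary distribution of a damped random walk.

    `links` maps each node to the list of nodes it points to.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}  # start from a uniform guess

    for _ in range(max_iter):
        # Every node gets a small baseline from the random "teleport" step.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            # A node with no outgoing links spreads its rank over all nodes.
            targets = links.get(node) or nodes
            share = damping * rank[node] / len(targets)
            for target in targets:
                new_rank[target] += share
        change = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical movement graph: an edge A -> B means "people move from A to B".
movements = {
    "park": ["station"],
    "station": ["park", "office"],
    "office": ["station"],
}
ranks = pagerank(movements)
busiest = max(ranks, key=ranks.get)
print(busiest, round(ranks[busiest], 3))
```

In this toy graph the station collects traffic from both other locations, so it ends up with the highest rank. The same idea, applied to people and places instead of web pages, is what makes well-connected locations stand out.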

To create and implement CoronaRank, we used the Veraset dataset for New York. Veraset provides anonymized phone geolocation data in which each individual is given a unique identifier.

The challenge was to analyze this large dataset (over 100GB of data per day) within a limited timeframe. Building on our previous experience with Big Data, we were able to develop the algorithm quickly. We then embedded it in a web application, Community Shield, designed for use on smartphones. The app displays pandemic hotspots (areas with high activity in a recent period) and gives the user a risk score that depends on how many interactions they had in these hotspot areas.

Heat map of New York. Darker (red) circles indicate locations with higher risk.
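One simple way to turn raw location pings into hotspot counts is to snap each ping to a grid cell and count the distinct devices seen in it. The Python sketch below assumes a hypothetical `(device_id, lat, lon)` record layout, a cell size of 0.01 degrees (roughly 1 km north to south), and a made-up `min_devices` threshold. None of these come from the Veraset schema or from the Community Shield app.

```python
import math
from collections import defaultdict


def grid_cell(lat, lon, cell_deg=0.01):
    """Map a coordinate to the integer index of its grid cell."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def hotspot_counts(pings, cell_deg=0.01):
    """Count distinct devices per grid cell.

    `pings` is an iterable of (device_id, lat, lon) tuples (assumed layout).
    """
    devices = defaultdict(set)
    for device_id, lat, lon in pings:
        devices[grid_cell(lat, lon, cell_deg)].add(device_id)
    return {cell: len(ids) for cell, ids in devices.items()}


def pings_in_hotspots(user_pings, counts, min_devices=2, cell_deg=0.01):
    """Count how many of a user's (lat, lon) pings fall in busy cells."""
    return sum(
        1
        for lat, lon in user_pings
        if counts.get(grid_cell(lat, lon, cell_deg), 0) >= min_devices
    )


# Made-up pings: devices "a" and "b" share a cell, "c" is alone elsewhere.
pings = [
    ("a", 40.7581, -73.9855),
    ("b", 40.7585, -73.9851),
    ("a", 40.7589, -73.9859),
    ("c", 40.7128, -74.0060),
]
counts = hotspot_counts(pings)
user_pings = [(40.7583, -73.9857), (40.7130, -74.0062)]
hotspot_hits = pings_in_hotspots(user_pings, counts)
```

Counting distinct devices rather than raw pings keeps one stationary phone from inflating a cell. A real pipeline over 100GB per day would need a distributed or out-of-core version of the same aggregation.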

An individual’s CoronaRank is an estimate of the likelihood that they are infected with COVID-19. Confirmed cases are assigned a CoronaRank of 1. Everyone else is assigned a CoronaRank x with 0 < x < 1, derived from their actual or possible interactions with others, as inferred from phone geolocation data covering the past two weeks.

Demo of user input.

The more you travel to risky places, the higher your CoronaRank. The more high-rank people visit a place, the riskier it becomes.
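The two rules above reinforce each other, which suggests an iterative computation. The following Python sketch is one possible reading, not the actual CoronaRank formula: it pins confirmed cases at 1, sets each place's risk to the average rank of its visitors, and sets each other person's rank from the average risk of the places they visited. The `base` and `damping` parameters, the averaging scheme, and the toy data are all assumptions chosen so that non-confirmed ranks stay strictly between 0 and 1.

```python
def corona_rank_sketch(visits, confirmed, base=0.01, damping=0.5, iterations=50):
    """Iteratively score people and places.

    `visits` maps each person to the set of places they visited;
    `confirmed` is the set of people with a confirmed infection.
    """
    people = sorted(visits)
    places = sorted({place for visited in visits.values() for place in visited})
    person_rank = {p: 1.0 if p in confirmed else base for p in people}
    place_risk = {}

    for _ in range(iterations):
        # A place is as risky as the average rank of the people who visited it.
        for place in places:
            visitors = [p for p in people if place in visits[p]]
            place_risk[place] = sum(person_rank[p] for p in visitors) / len(visitors)
        # A person's rank grows with the average risk of the places they visited.
        for person in people:
            if person in confirmed:
                continue  # confirmed cases stay at 1
            visited = visits[person]
            exposure = (
                sum(place_risk[pl] for pl in visited) / len(visited) if visited else 0.0
            )
            person_rank[person] = base + (1.0 - base) * damping * exposure
    return person_rank, place_risk


# Made-up example: "ana" is a confirmed case.
visits = {
    "ana": {"cafe", "gym"},
    "ben": {"cafe"},
    "cy": {"gym", "library"},
    "dee": {"library"},
}
person_rank, place_risk = corona_rank_sketch(visits, confirmed={"ana"})
```

In this example, risk decays with distance from the confirmed case: "ben" shares a place with "ana" directly, "cy" splits time between a shared and an unshared place, and "dee" is only linked through "cy". Marking another person as confirmed and rerunning the function shows how a single new report propagates to everyone connected to them.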

A high-risk individual (CoronaRank of 0.9) visited several high-risk areas in Manhattan recently.

You can test out the demo of the app here. For now, it includes three predefined risk profiles to showcase the app’s capabilities.

Our plans for the future

We plan to develop the CoronaRank algorithm further by adding a self-reporting feature, so that users can anonymously provide information about their COVID-related symptoms (if any). A report will affect the user's CoronaRank and, by extension, that of everyone they met in recent weeks. This will be very valuable to public health organizations that do not have the capacity to screen and test each citizen.

We also aim to improve the UI and to integrate Google Takeout, so that users can import their personal location data and make the app fully user-specific.

We hope to partner with governmental and international institutions to get endorsement for the app and deliver it to the public. A long-term collaboration would help to turn the app into a comprehensive tool that educates individuals and informs healthcare delivery policy at public institutions. To make this a reality, we need cloud resources that allow us to offer the app at scale.


Published via Towards AI
