Accelerating Privacy-Preserving Research in the Age of COVID-19
A transparent look under the hood of the COVID Alliance Research Platform (CARP)
Amid the greatest natural disaster to hit the United States in a century, many researchers, academics and technologists have been trying to find the most valuable way to contribute. It’s a question that the COVID Alliance has been grappling with since the early days of the COVID pandemic — and one that we have built a solution to address head-on.
Research in the age of COVID presents a dilemma: the urgent need for innovative epidemiological, public health, and social science research on the one hand; monumental risks for the erosion of data privacy on the other.
It took only weeks for contact-tracing apps in the United States and South America to be flagged by watchdogs for violating their own privacy policies. Privacy International advocacy director Edin Omanovic put the situation bluntly, saying the surveillance industry “understands that this is an opportunity comparable to 9/11 in terms of legitimizing and normalizing surveillance. We’ve seen a huge willingness from people to help them as much as possible. However, helping health authorities fight the virus is different to helping security authorities use this moment as an opportunity for a data grab.”
The Alliance aims to accelerate critical research and generate policy-relevant insights to mitigate the coronavirus pandemic without sacrificing individual privacy. The Alliance does this by democratizing access to data and analytical resources, so that underserved communities stand on equal footing in their ability to make informed, life-saving decisions.
Drawing from our team’s multi-disciplinary background, we share several driving beliefs:
- There is a critical need for epidemiological and public health research to inform effective policy responses and ensure transparent, equitable, and evidence-based policymaking.
- There is an abundance of novel data sources available to drive such research and policy-making, such as data relating to population movements, self-reported symptoms, economic activity, test results, and hospital capacity. Research and insights from these datasets could save lives if enacted into policy.
- Unfortunately, use of these datasets entails massive potential privacy risks, and also requires large up-front costs in terms of purchasing, ingesting, and transforming data.
Without a systemic intervention, these privacy risks and up-front costs obstruct the research and policy response when it is needed most. Because we are committed to privacy-preserving products, we’re sharing the technological underpinnings behind the Alliance’s first live product.
The COVID Alliance Research Platform (CARP) is accelerating research critical to informing the COVID-19 response by providing accessible and scalable data for scientists studying the disease. Built on the Alliance’s secure data platform and hosted on the Snowflake Data Marketplace, CARP utilizes novel datasets provided by data partners (CARP does not collect data directly from individuals). Many of these datasets have rarely been available to researchers and academics, particularly within the scope of public health research.
Our launch partners — including a nationally-renowned team of researchers at the University of Chicago’s Harris School of Public Policy and the RAND Corporation, a leading nonprofit global policy think tank — are tackling timely research questions that have a direct bearing on the policy response to this ongoing pandemic. Some examples include:
- How do communities formulate effective reopening plans that balance growth against safety, while also providing transparency and equity?
- How can public health officials evaluate the effects of COVID-mitigation policies (e.g. social distancing rules, travel bans, quarantine rules, and school closures), so that policymakers can continuously learn from and improve policy going forward?
- How can the Alliance introduce universities and governments to the power of big data and foster collaboration?
In the coming weeks, we plan to open applications to CARP to dozens more qualified research partners and academic institutions around the country (and world). The Alliance will carefully evaluate every research application to examine research credentials, purpose, data access requested, and other criteria to ensure that our platform is prioritized for the most pressing problems.
Prioritizing research is an important step for the Alliance, but it must be done with proper protections and controls. This is why we’ve built CARP by embedding privacy and security into the product design, operations, infrastructure, IT systems, and business practices from the beginning.
By ensuring privacy and security through every phase of the data lifecycle, CARP enables processing of novel and timely data relevant to solving the pandemic while remaining uncompromising on the protection of individual privacy. This will bridge the gap between researchers with the expertise to tackle quantitative questions relevant to stemming the pandemic, and data-driven analytics based on real-world information about citizen mobility and health.
In the spirit of transparency and open science, let’s take a deeper look at how the Alliance has created the CARP with a privacy-first approach towards analytics on geolocation data.
A research environment built for speed, scale & privacy
At the highest level, the CARP is divided into three environments:
- A Data Ingestion Enclave is responsible for sourcing and ingesting information from a variety of proprietary vendors who have made their data accessible to the Alliance, including XMode, SafeGraph, and others.
- A Data Lake powered by Snowflake and compartmentalized by Immuta, our data governance provider, allows for the separation of data by category and use case, along with deeper protections to shield sensitive columns or data attributes.
- The Data Analytics & Research Enclaves combine the latest in interactive analytics and machine learning environments to provide a consolidated experience with strong security controls across all layers of the ecosystem.
Permissioning to ensure an air-tight environment
Before making CARP data available to researchers, the Alliance undertook an extensive data inventory to understand and record data available to researchers. Such data is recorded in the Alliance Knowledge Base. The Knowledge Base houses descriptions of the Alliance’s data and information on other resources or considerations that would be helpful to potential researchers in preparing a full research application.
Potential researchers will fill out a brief application with their contact information, affiliations, and the reason they are seeking access to the Knowledge Base. In addition, the Alliance will complete a Data Privacy Impact Assessment (“DPIA”) to understand and mitigate privacy and information security risks. Once their application is approved, prospective researchers can consult the Alliance Knowledge Base for their research project.
As part of the approval process, external researchers will sign a data use agreement with the Alliance prior to gaining access. Modeled on the NIH’s public access policies, this agreement prohibits re-identification and secondary use of data. Academic researchers targeting a specific study will provide written attestation from their Human Subjects Protection Committee or Institutional Review Board approving the study protocol before they are granted access to notebook research environments.
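The approval steps described above can be read as a simple gate. The following is a minimal sketch of that reading, not the Alliance's actual workflow: the `Application` class, its field names, and `may_access_notebooks` are all hypothetical, and the assumption that every listed step must be complete before access is granted is ours.

```python
from dataclasses import dataclass


@dataclass
class Application:
    """Hypothetical record of one researcher's approval status."""

    application_approved: bool
    dpia_completed: bool
    data_use_agreement_signed: bool
    is_academic_study: bool
    review_board_attestation_received: bool = False


def may_access_notebooks(app: Application) -> bool:
    """Grant access only when every required step is complete."""
    baseline = (
        app.application_approved
        and app.dpia_completed
        and app.data_use_agreement_signed
    )
    # Academic studies additionally need a written review board attestation.
    if app.is_academic_study:
        return baseline and app.review_board_attestation_received
    return baseline


# An academic study without its attestation is held back.
pending = Application(True, True, True, is_academic_study=True)
print(may_access_notebooks(pending))  # False
```

Modeling the attestation as an extra condition layered on a shared baseline keeps the academic and non-academic paths from drifting apart if a new baseline requirement is added later.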
After approval, the user will be given limited access to Alliance-curated data sources through Immuta, the backend for the permission and governance layer. Immuta provides the ability to aggregate, anonymize, and pseudonymize data to create shielded views on arbitrary data tables, selectively applying hashing, k-anonymization, or blanking of sparsely represented values. These data anonymization techniques allow for computation on individual records that contain insights derived from mobility data without risking the exposure of identities or sensitive attributes.
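To make these techniques concrete, here is a minimal pandas sketch of a shielded view. It is not Immuta's API and not the Alliance's implementation. The column names, the salt, the sample rows, and the `pseudonymize` and `shield` helpers are all hypothetical. The k-anonymity step is a simple suppression-based approximation: it blanks quasi-identifiers for any combination seen fewer than `k` times, which does not by itself guarantee that the blanked rows form a group of at least `k`.

```python
from __future__ import annotations

import hashlib

import pandas as pd

# Hypothetical salt; a real deployment would manage this as a secret.
SALT = "example-salt"


def pseudonymize(value: str, salt: str = SALT) -> str:
    """Replace an identifier with a salted SHA-256 hex digest."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def shield(df: pd.DataFrame, id_col: str, quasi_ids: list[str], k: int) -> pd.DataFrame:
    """Return a copy with hashed IDs and sparse quasi-identifier groups blanked."""
    out = df.copy()
    out[id_col] = out[id_col].map(pseudonymize)
    # Count how many rows share each quasi-identifier combination.
    group_size = out.groupby(quasi_ids)[id_col].transform("size")
    # Blank the quasi-identifiers wherever fewer than k rows share them.
    out.loc[group_size < k, quasi_ids] = None
    return out


# Made-up sample rows for illustration only.
records = pd.DataFrame(
    {
        "device_id": ["a1", "b2", "c3", "d4", "e5"],
        "zip_code": ["60601", "60601", "60601", "60637", "60637"],
        "age_band": ["30-39", "30-39", "30-39", "30-39", "60-69"],
    }
)
shielded = shield(records, "device_id", ["zip_code", "age_band"], k=2)
print(shielded)
```

In this sample, the first three rows share one ZIP code and age band and keep both values, while the last two rows each have a unique combination and are blanked.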
Additional controls will limit the scope and use of data based on the user’s level of clearance. Immuta provides a robust set of mechanisms to group related data sources with similar access controls, whether on Snowflake or temporary research artifacts on S3. These controls are used to create several access groups that depend on the researcher’s specific needs, actively maintained by our internal Governance and Permissioning team.
The primary interface for interacting with data will be a curated Jupyter notebook administered through Saturn Cloud. These notebooks come pre-enabled with standard data analysis tools such as pandas and sklearn, along with the ability to publish to internal visualization tools provided by the Alliance. Users can use these to consume existing tables or create their own.
For example, a curated table may contain the inferred likelihood that a given individual is a healthcare or essential worker, based on their mobility behavior (e.g., regular proximity to a hospital in certain time windows, or a pattern of deviating from their home location). The CARP governance tools allow for precise row-level permissioning, so that each researcher sees only the information necessary for their analysis.
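Row-level permissioning of this kind can be pictured as a filter applied before a researcher ever sees a table. The sketch below is an illustration in pandas, not the governance layer's real interface. The `GRANTS` mapping, the `rows_for` helper, the column names, and the sample scores are all hypothetical, and scoping by county is just one assumed way a grant might be expressed.

```python
from __future__ import annotations

import pandas as pd

# Hypothetical grants: the county codes each researcher's approved study covers.
GRANTS: dict[str, set[str]] = {
    "researcher_a": {"17031"},
    "researcher_b": {"17031", "06037"},
}


def rows_for(user: str, table: pd.DataFrame) -> pd.DataFrame:
    """Return only the rows the user is granted; unknown users get nothing."""
    allowed = GRANTS.get(user, set())
    visible = table[table["county_fips"].isin(list(allowed))]
    return visible.reset_index(drop=True)


# Made-up curated table with pseudonymous IDs and an inferred score.
curated = pd.DataFrame(
    {
        "pseudonym": ["p1", "p2", "p3"],
        "county_fips": ["17031", "06037", "36061"],
        "essential_worker_score": [0.82, 0.10, 0.45],
    }
)

print(rows_for("researcher_a", curated))
```

Defaulting to an empty grant means a user missing from the mapping sees no rows at all, which is the safer failure mode for sensitive data.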
Any machine learning analysis that uses shielded attributes can later be reproduced on the full data through a Pachyderm integration, which allows for deployable data science workloads. In the next evolution of the CARP ecosystem, these deployments will be integrated into an Insights Platform consisting of dashboards exposing the most salient research outputs derived from the Research Platform.
With the CARP, the Alliance is enabling high-quality, policy-shifting research while preserving individual privacy. As we move forward into the next phase of this pandemic, we are committed to working with researchers and governments alike to draw out substantive insights that will save lives.
If your research institution is interested in applying for access to the COVID Alliance Research Platform, please visit our website. Please note that a defined scope of research and verifiable credentials are required in order to receive access to CARP and corresponding datasets.
About the Author
Elena Elkina is Privacy and Data Protection Advisor to the COVID Alliance. She is also co-founder and Partner at Aleada Consulting, where she advises clients on privacy, data protection, and information security issues. During her 20-year compliance career, she has worked with healthcare institutions, Internet of Things and software companies, major law firms, and the government sector at both the international and domestic levels. Elena is a co-founder and board member of Women in Security and Privacy (WISP), a non-profit organization that aims to advance women in the privacy and security fields. She also serves on the advisory board of StaySafeOnline, Privacy Day where she educates and empowers our global digital society to use the internet safely and securely.