Paper Title

The Data Airlock: infrastructure for restricted data informatics

Authors

Gregory Rolan, Janis Dalins, Campbell Wilson

Abstract

Data science collaboration is problematic when access to operational data or models from outside the data-holding organisation is prohibited, for a variety of legal, security, ethical, or practical reasons. There are significant data privacy challenges when performing collaborative data science work against such restricted data. In this paper we describe a range of causes and risks associated with restricted data along with the social, environmental, data, and cryptographic measures that may be used to mitigate such issues. We then show how these are generally inadequate for restricted data contexts and introduce the 'Data Airlock' - secure infrastructure that facilitates 'eyes-off' data science workloads. After describing our use-case we detail the architecture and implementation of a first, single-organisation version of the Data Airlock infrastructure. We conclude with outcomes and learning from this implementation, and outline requirements for a second, federated version.
