Paper Title
The Data Station: Combining Data, Compute, and Market Forces
Paper Authors
Paper Abstract
This paper introduces Data Stations, a new data architecture that we are designing to tackle some of the most challenging data problems that we face today: access to sensitive data; data discovery and integration; and governance and compliance. Data Stations depart from modern data lakes in that both data and derived data products, such as machine learning models, are sealed and cannot be directly seen, accessed, or downloaded by anyone. Data Stations do not deliver data to users; instead, users bring questions to data. This inversion of the usual relationship between data and compute mitigates many of the security risks that are otherwise associated with sharing and working with sensitive data. Data Stations are designed following the principle that many data problems require human involvement, and that incentives are the key to obtaining such involvement. To that end, Data Stations implement market designs to create, manage, and coordinate the use of incentives. We explain the motivation for this new kind of platform and its design.