Paper Title

Data-to-Value: An Evaluation-First Methodology for Natural Language Projects

Authors

Leidner, Jochen L.

Abstract

Big data, i.e. the collecting, storing and processing of data at scale, has recently become possible due to the arrival of clusters of commodity computers powered by application-level distributed parallel operating systems like HDFS/Hadoop/Spark, and such infrastructures have revolutionized data mining at scale. For data mining projects to succeed more consistently, some methodologies were developed (e.g. CRISP-DM, SEMMA, KDD), but these do not account for (1) very large scales of processing, (2) dealing with textual (unstructured) data (i.e. Natural Language Processing (NLP), or "text analytics"), and (3) non-technical considerations (e.g. legal, ethical, and project management aspects). To address these shortcomings, a new methodology called "Data to Value" (D2V) is introduced. It is guided by a detailed catalog of questions in order to avoid a disconnect between big data text analytics project teams and the topic when they face the rather abstract box-and-arrow diagrams commonly associated with methodologies.
