Paper title
Data-to-Value: An Evaluation-First Methodology for Natural Language Projects
Paper authors
Paper abstract
Big data, i.e. the collection, storage and processing of data at scale, has recently become possible due to the arrival of clusters of commodity computers powered by application-level distributed parallel operating systems like HDFS/Hadoop/Spark, and such infrastructures have revolutionized data mining at scale. To help data mining projects succeed more consistently, several methodologies were developed (e.g. CRISP-DM, SEMMA, KDD), but these do not account for (1) very large scales of processing, (2) textual (unstructured) data, i.e. Natural Language Processing (NLP, "text analytics"), and (3) non-technical considerations (e.g. legal, ethical and project-management aspects). To address these shortcomings, a new methodology called "Data to Value" (D2V) is introduced. It is guided by a detailed catalog of questions, so that big data text analytics project teams do not lose touch with the topic when faced with the rather abstract box-and-arrow diagrams commonly associated with methodologies.