Paper Title
Extracting Headless MWEs from Dependency Parse Trees: Parsing, Tagging, and Joint Modeling Approaches
Paper Authors
Paper Abstract
An interesting and frequent type of multi-word expression (MWE) is the headless MWE, for which there are no true internal syntactic dominance relations; examples include many named entities ("Wells Fargo") and dates ("July 5, 2020") as well as certain productive constructions ("blow for blow", "day after day"). Despite their special status and prevalence, current dependency-annotation schemes require treating such flat structures as if they had internal syntactic heads, and most current parsers handle them in the same fashion as headed constructions. Meanwhile, outside the context of parsing, taggers are typically used for identifying MWEs, but taggers might benefit from structural information. We empirically compare these two common strategies, parsing and tagging, for predicting flat MWEs. Additionally, we propose an efficient joint decoding algorithm that combines scores from both strategies. Experimental results on the MWE-Aware English Dependency Corpus and on six non-English dependency treebanks with frequent flat structures show that: (1) tagging is more accurate than parsing for identifying flat-structure MWEs, (2) our joint decoder reconciles the two different views and, for non-BERT features, leads to higher accuracies, and (3) most of the gains result from feature sharing between the parsers and taggers.
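The parsing and tagging views of a flat MWE can be related by a simple conversion. The sketch below is an illustration only, not the authors' method or their joint decoder. It assumes Universal Dependencies-style annotation, in which the first token of a flat span is treated as the span's head and every later token attaches to it with the `flat` relation, and it assumes flat spans are contiguous. The function names `flat_arcs_to_bio` and `bio_to_flat_arcs` are hypothetical. The first reads flat MWEs off a parse as BIO tags (the tagging view); the second turns BIO tags back into head-initial flat arcs (the parsing view).

```python
from typing import List, Tuple


def flat_arcs_to_bio(heads: List[int], deprels: List[str],
                     flat_label: str = "flat") -> List[str]:
    """Derive BIO tags for flat MWEs from a dependency tree.

    heads are 1-based token ids (0 = root), one per token.
    Assumption: UD-style head-initial flat spans, i.e. every non-first
    token of a span attaches to the span's first token with `flat_label`,
    and spans are contiguous.
    """
    tags = ["O"] * len(heads)
    for i, rel in enumerate(deprels):        # i is 0-based; token id is i + 1
        if rel == flat_label:
            tags[heads[i] - 1] = "B"         # the span's first token begins it
            tags[i] = "I"                    # the flat dependent continues it
    return tags


def bio_to_flat_arcs(tags: List[str],
                     flat_label: str = "flat") -> List[Tuple[int, int, str]]:
    """Turn BIO tags into (dependent, head, label) arcs, 1-based.

    Each I token is attached to the B token that opened its span,
    reproducing the head-initial convention assumed above.
    """
    arcs: List[Tuple[int, int, str]] = []
    start = None                             # 0-based index of current B token
    for i, tag in enumerate(tags):
        if tag == "B":
            start = i
        elif tag == "I" and start is not None:
            arcs.append((i + 1, start + 1, flat_label))
        else:                                # "O" or an ill-formed "I" ends any span
            start = None
    return arcs


if __name__ == "__main__":
    # "Wells Fargo reported earnings": Fargo attaches to Wells with `flat`.
    heads = [3, 1, 0, 3]
    deprels = ["nsubj", "flat", "root", "obj"]
    tags = flat_arcs_to_bio(heads, deprels)
    print(tags)                     # ['B', 'I', 'O', 'O']
    print(bio_to_flat_arcs(tags))   # [(2, 1, 'flat')]
```

Because the head-initial convention fixes the internal arcs of a flat span, the two representations carry the same span information under these assumptions. This is one reason parser and tagger predictions for flat MWEs can be compared directly.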