Semantic Annotation and Querying Framework based on Semi-structured Ayurvedic Text

Paper title

Semantic Annotation and Querying Framework based on Semi-structured Ayurvedic Text

Authors

Terdalkar, Hrishikesh; Bhattacharya, Arnab; Dubey, Madhulika; S, Ramamurthy; Singh, Bhavna Naneria

Abstract

Knowledge bases (KB) are an important resource in a number of natural language processing (NLP) and information retrieval (IR) tasks, such as semantic search and automated question answering. They are also useful to researchers seeking information from a text. Unfortunately, the state of the art in Sanskrit NLP does not yet allow automated construction of knowledge bases, because the required tools and methods are either unavailable or insufficiently accurate. In this work, we therefore describe our efforts at manually annotating Sanskrit text to create a knowledge graph (KG). For annotation, we choose the Dhanyavarga chapter of Bhavaprakashanighantu, from the Ayurvedic text Bhavaprakasha. The constructed knowledge graph contains 410 entities and 764 relationships. Since Bhavaprakashanighantu is a technical glossary that describes various properties of different substances, we develop an elaborate ontology to capture the semantics of the entity and relationship types present in the text. To query the knowledge graph, we design 31 query templates that cover most common question patterns. For both manual annotation and querying, we customize Sangrahaka, a framework we developed previously. The entire system, including the dataset, is available at https://sanskrit.iitk.ac.in/ayurveda/. We hope that the knowledge graph created through manual annotation and subsequent curation will aid the future development and testing of NLP tools, as well as the study of the Bhavaprakashanighantu text.
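The abstract combines three ideas: typed entities and relations, an ontology that constrains which relation types may link which entity types, and query templates that map common question patterns to fixed graph lookups. As a rough illustration of how these pieces can fit together, the Python sketch below stores (subject, relation, object) triples, rejects triples that violate a one-entry ontology, and answers questions through slot-filled templates. Everything in it (the `KnowledgeGraph` class, the `HAS_PROPERTY` relation, the placeholder entity names and the two templates) is a hypothetical assumption; it is not the paper's ontology, its 31 templates, or the Sangrahaka implementation.

```python
from dataclasses import dataclass, field

# Hypothetical ontology fragment: relation type -> (subject type, object type).
ONTOLOGY = {"HAS_PROPERTY": ("Substance", "Property")}


@dataclass
class KnowledgeGraph:
    """Typed entities plus (subject, relation, object) triples (illustrative only)."""
    entity_types: dict = field(default_factory=dict)
    triples: set = field(default_factory=set)

    def add_entity(self, name: str, etype: str) -> None:
        self.entity_types[name] = etype

    def add_relation(self, subj: str, rel: str, obj: str) -> None:
        # Reject triples whose endpoint types do not match the ontology entry.
        subj_type, obj_type = ONTOLOGY[rel]
        if self.entity_types.get(subj) != subj_type or self.entity_types.get(obj) != obj_type:
            raise ValueError(f"({subj}, {rel}, {obj}) violates the ontology")
        self.triples.add((subj, rel, obj))


# Hypothetical query templates: a question pattern with named slots plus a
# fixed graph lookup that receives the same slot values as keyword arguments.
QUERY_TEMPLATES = {
    "properties_of": (
        "What are the properties of {substance}?",
        lambda kg, substance: sorted(
            o for s, r, o in kg.triples if s == substance and r == "HAS_PROPERTY"
        ),
    ),
    "substances_with": (
        "Which substances have the property {prop}?",
        lambda kg, prop: sorted(
            s for s, r, o in kg.triples if o == prop and r == "HAS_PROPERTY"
        ),
    ),
}


def run_template(kg: KnowledgeGraph, key: str, **slots):
    """Fill a template's question text and run its lookup with the same slots."""
    pattern, lookup = QUERY_TEMPLATES[key]
    return pattern.format(**slots), lookup(kg, **slots)


# Usage with placeholder names (not entities from the text).
kg = KnowledgeGraph()
for name in ("substance_1", "substance_2"):
    kg.add_entity(name, "Substance")
for name in ("property_a", "property_b"):
    kg.add_entity(name, "Property")
kg.add_relation("substance_1", "HAS_PROPERTY", "property_a")
kg.add_relation("substance_1", "HAS_PROPERTY", "property_b")
kg.add_relation("substance_2", "HAS_PROPERTY", "property_a")

question, answer = run_template(kg, "properties_of", substance="substance_1")
print(question)  # What are the properties of substance_1?
print(answer)    # ['property_a', 'property_b']
```

Under this assumed design, a template fixes both the question wording and the shape of the graph lookup, so users only supply slot values, and the ontology check keeps the graph consistent with the question patterns the templates expect.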