Paper Title

Information Retrieval System for Silte Language Using BM25 Weighting

Paper Author

Johar, Abdulmalik

Abstract

The main aim of an information retrieval system is to extract appropriate information from an enormous collection of data based on users' needs. The basic concept is that when a user submits a query, the system generates a list of related documents ranked by their degree of relevance. The volume of digital unstructured Silte text documents keeps increasing, and this growth makes it difficult to find and use the right information. Developing an information retrieval system for the Silte language therefore allows searching for and retrieving relevant documents that satisfy users' information needs. In this research, we design a probabilistic information retrieval system for the Silte language. The system has an indexing part and a searching part. These modules include text operations such as tokenization, stemming, stop word removal, and synonym handling.
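As an illustration of the indexing and searching parts described in the abstract, the following is a minimal sketch of BM25 ranking. It is not the author's implementation. The function names, the placeholder stop-word list, the whitespace tokenizer, the parameter values k1 = 1.2 and b = 0.75, and the non-negative IDF variant are all assumptions made for this example. Stemming and synonym handling are omitted because they would need Silte-specific resources.

```python
import math
from collections import Counter

# Placeholder stop words (assumption); a real system would use a Silte list.
STOP_WORDS = {"the", "a", "of"}

def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]

def build_index(docs):
    """Indexing part: term counts per document, document frequencies, average length."""
    term_counts = [Counter(tokenize(d)) for d in docs]
    doc_freq = Counter()
    for counts in term_counts:
        doc_freq.update(counts.keys())  # each document counts once per term
    avg_len = sum(sum(c.values()) for c in term_counts) / len(docs)
    return term_counts, doc_freq, avg_len

def bm25_scores(query, term_counts, doc_freq, avg_len, k1=1.2, b=0.75):
    """Score every document against the query with BM25 (assumed parameters)."""
    n_docs = len(term_counts)
    query_terms = tokenize(query)
    scores = []
    for counts in term_counts:
        doc_len = sum(counts.values())
        score = 0.0
        for term in query_terms:
            tf = counts.get(term, 0)
            if tf == 0:
                continue
            df = doc_freq[term]
            # Non-negative IDF variant: log(1 + (N - df + 0.5) / (df + 0.5))
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            norm = tf + k1 * (1 - b + b * doc_len / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores

def search(query, docs):
    """Searching part: return (doc_index, score) pairs, best match first."""
    term_counts, doc_freq, avg_len = build_index(docs)
    scores = bm25_scores(query, term_counts, doc_freq, avg_len)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in ranked if scores[i] > 0]
```

Calling search("cat", ["the cat sat", "dogs bark", "a cat and a cat"]) ranks the third document ahead of the first, because the term occurs twice there, and leaves out the second document, which does not contain the query term.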
