Paper Title
Chandojñānam: A Sanskrit Meter Identification and Utilization System
Authors
Abstract
We present Chandojñānam, a web-based Sanskrit meter (Chanda) identification and utilization system. In addition to the core functionality of identifying meters, it sports a friendly user interface to display the scansion, which is a graphical representation of the metrical pattern. The system supports identification of meters from uploaded images by using optical character recognition (OCR) engines in the backend. It is also able to process entire text files at a time. The text can be processed in two modes, either by treating it as a list of individual lines, or as a collection of verses. When a line or a verse does not correspond exactly to a known meter, Chandojñānam is capable of finding fuzzy (i.e., approximate and close) matches based on sequence matching. This opens up the scope of a meter-based correction of erroneous digital corpora. The system is available for use at https://sanskrit.iitk.ac.in/jnanasangraha/chanda/, and the source code in the form of a Python library is made available at https://github.com/hrishikeshrt/chanda/.
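The fuzzy matching mentioned in the abstract can be illustrated with a minimal sketch. It is not the Chandojñānam implementation or the API of its Python library. It assumes that a line has already been scanned into a string of laghu (L) and guru (G) syllable marks, and it uses Python's standard `difflib.SequenceMatcher` as a stand-in for the system's sequence matching. The meter table `KNOWN_METERS` and the function `fuzzy_match` are hypothetical names introduced only for this example.

```python
from difflib import SequenceMatcher

# Hypothetical, deliberately tiny meter table for illustration only.
# Each meter maps to its laghu (L) / guru (G) pattern for one line.
KNOWN_METERS = {
    "Indravajra": "GGLGGLLGLGG",
    "Upendravajra": "LGLGGLLGLGG",
    "Vasantatilaka": "GGLGLLLGLLGLGG",
}


def fuzzy_match(pattern, meters=KNOWN_METERS, top_k=3):
    """Rank known meters by similarity to an observed L/G pattern.

    Returns up to `top_k` (meter_name, similarity) pairs, best first.
    A similarity of 1.0 means the pattern matches the meter exactly.
    """
    scored = [
        (name, SequenceMatcher(None, pattern, reference).ratio())
        for name, reference in meters.items()
    ]
    # Highest similarity first; ties keep the table's insertion order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # An exact line, then the same line with its final syllable missing.
    for observed in ("GGLGGLLGLGG", "GGLGGLLGLG"):
        print(observed, fuzzy_match(observed))
```

In this sketch, a line whose best similarity is below 1.0 but still high is a candidate for meter-based correction, since the closest meter suggests which syllable is likely to be corrupted in the digital text.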