Paper title
Multivariate Powered Dirichlet Hawkes Process
Paper authors
Paper abstract
The publication time of a document carries relevant information about its semantic content. The Dirichlet-Hawkes process has been proposed to jointly model textual information and publication dynamics. This approach has been used successfully in several recent works and extended to tackle specific challenging problems, typically short texts or entangled publication dynamics. However, the prior in its current form does not allow for complex publication dynamics. In particular, inferred topics are independent of each other: a publication about finance is assumed to have no influence on publications about politics, for instance. In this work, we develop the Multivariate Powered Dirichlet-Hawkes Process (MPDHP), which relaxes this assumption. Publications about various topics can now influence each other. We detail and overcome the technical challenges that arise from considering interacting topics. We conduct a systematic evaluation of MPDHP on a range of synthetic datasets to define its application domain and limitations. Finally, we develop a use case of MPDHP on Reddit data. By the end of this article, the interested reader will know how and when to use MPDHP, and when not to.
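To make the idea of interacting topics concrete, the sketch below evaluates a generic multivariate Hawkes intensity with an exponential kernel. It is only an illustration of the independence assumption versus cross-topic influence, not the MPDHP prior itself; the function name `intensities`, the exponential kernel, and all parameter values are assumptions chosen for the example.

```python
import math

def intensities(t, events, mu, alpha, decay=1.0):
    """Intensity of every topic at time t (hypothetical helper).

    events: list of (time, topic) pairs for past publications.
    mu:     base rate of each topic.
    alpha:  alpha[j][k] is the influence of a topic-j event on topic k.
    decay:  rate of the exponential kernel (an assumption of this sketch).
    """
    lam = list(mu)
    for t_i, j in events:
        if t_i < t:  # only past events contribute
            weight = math.exp(-decay * (t - t_i))
            for k in range(len(lam)):
                lam[k] += alpha[j][k] * weight
    return lam

# Topic 0 stands for finance, topic 1 for politics.
events = [(0.0, 0), (1.0, 1)]
mu = [0.1, 0.1]

# Independent topics: only self-excitation (diagonal influence matrix).
independent = [[0.8, 0.0],
               [0.0, 0.8]]
# Interacting topics: finance events also raise the politics intensity.
interacting = [[0.8, 0.5],
               [0.0, 0.8]]

lam_ind = intensities(2.0, events, mu, independent)
lam_int = intensities(2.0, events, mu, interacting)
```

With the diagonal matrix, the finance publication at time 0 leaves the politics intensity untouched. With the off-diagonal entry, it raises the politics intensity while the finance intensity stays the same.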