Paper Title
Page Tables: Keeping them Flat and Hot (Cached)
Paper Authors
Paper Abstract
As memory capacity has outstripped TLB coverage, large data applications suffer from frequent page table walks. We investigate two complementary techniques for addressing this cost: reducing the number of accesses required and reducing the latency of each access. The first approach is accomplished by opportunistically "flattening" the page table: merging two levels of traditional 4KB page table nodes into a single 2MB node, thereby reducing the table's depth and the number of indirections required to search it. The second is accomplished by biasing the cache replacement algorithm to keep page table entries during periods of high TLB miss rates, as these periods also see high data miss rates and are therefore more likely to benefit from having the smaller page table in the cache than to suffer from increased data cache misses. We evaluate these approaches for both native and virtualized systems and across a range of realistic memory fragmentation scenarios, describe the limited changes needed in our kernel implementation and hardware design, identify and address challenges related to self-referencing page tables and kernel memory allocation, and compare results across server and mobile systems using both academic and industrial simulators for robustness. We find that flattening does reduce the number of accesses required on a page walk (to 1.0), but its performance impact (+2.3%) is small due to Page Walker Caches (already 1.5 accesses). Prioritizing caching has a larger effect (+6.8%), and the combination improves performance by +9.2%. Flattening is more effective on virtualized systems (4.4 to 2.8 accesses, +7.1% performance), due to 2D page walks. By combining the two techniques we demonstrate a state-of-the-art +14.0% performance gain and -8.7% dynamic cache energy and -4.7% dynamic DRAM energy for virtualized execution with very simple hardware and software changes.