<テクニカルレポート>
Discovering Frequent Substructures in Large Unordered Trees

作成者
本文言語
出版者
発行日
収録物名
出版タイプ
アクセス権
関連DOI
関連URI
関連情報
概要 In this paper, we study a data mining problem of discovering frequent substructures in a large collection of semi-structured data, where both of the patterns and the data are modeled by labeled unorde...red trees. An unordered tree is a directed acyclic graph with a specified node called the root, and all nodes but the root have at most one parent. Each node is labeled by a symbol drawn from an alphabet. Such unordered trees can be seen as either a generalization of itemsets in relational databases or an efficient specialization of attributed graphs in graph mining. They are also useful in various applications such as analysis of chemical compounds and mining hyperlink structures in Web. Introducing novel definitions of the support and the canonical form for unordered trees, we present an efficient algorithm called Unot that computes all labeled unordered trees appearing in a collection of data trees with frequency above a user-specified threshold. We prove that the algorithm enumerates each frequent pattern T in $ O(kb^2n) $ per pattern, where $ k $ is the size of $ T $, $ b $ is the branching factor of the data tree, and $ n $ is the total number of occurrences of $ T $ in the data trees. The keys of the algorithm are efficient enumerating all unordered trees in canonical form and incrementally computation of the occurrences based on a powerful design technique known as the reverse search.続きを見る

本文ファイル

pdf trcs216 pdf 237 KB 671  

詳細

レコードID
査読有無
タイプ
登録日 2009.04.22
更新日 2018.08.31