<technical report>
Discovering Frequent Substructures in Large Unordered Trees

Creator
Language
Publisher
Date
Source Title
Vol
Publication Type
Access Rights
Related DOI
Related URI
Relation
Abstract In this paper, we study a data mining problem of discovering frequent substructures in a large collection of semi-structured data, where both of the patterns and the data are modeled by labeled unorde...red trees. An unordered tree is a directed acyclic graph with a specified node called the root, and all nodes but the root have at most one parent. Each node is labeled by a symbol drawn from an alphabet. Such unordered trees can be seen as either a generalization of itemsets in relational databases or an efficient specialization of attributed graphs in graph mining. They are also useful in various applications such as analysis of chemical compounds and mining hyperlink structures in Web. Introducing novel definitions of the support and the canonical form for unordered trees, we present an efficient algorithm called Unot that computes all labeled unordered trees appearing in a collection of data trees with frequency above a user-specified threshold. We prove that the algorithm enumerates each frequent pattern T in $ O(kb^2n) $ per pattern, where $ k $ is the size of $ T $, $ b $ is the branching factor of the data tree, and $ n $ is the total number of occurrences of $ T $ in the data trees. The keys of the algorithm are efficient enumerating all unordered trees in canonical form and incrementally computation of the occurrences based on a powerful design technique known as the reverse search.show more

Hide fulltext details.

pdf trcs216 pdf 237 KB 679  

Details

Record ID
Peer-Reviewed
Type
Created Date 2009.04.22
Modified Date 2018.08.31