Discovering Frequent Substructures in Large Unordered Trees - Collections | Kyushu University Library


＜technical report＞
Discovering Frequent Substructures in Large Unordered Trees

Creator	Asai, Tatsuya (浅井, 達哉); Affiliation: Kyushu University
	Arimura, Hiroki (有村, 博紀); Affiliation: Kyushu University
	Uno, Takeaki (宇野, 毅明); Affiliation: National Institute of Informatics
	Nakano, Shin-ichi (中野, 眞一); Affiliation: Gunma University
Language	English
Publisher	Department of Informatics, Faculty of Information Science and Electrical Engineering, Kyushu University
Date	2003-06
Source Title	DOI Technical Report
Vol	216
Publication Type	Accepted Manuscript
Access Rights	open access
Relation	DOI Technical Report, Vol. 216
Relation	http://www.i.kyushu-u.ac.jp/research/report.html
Abstract	In this paper, we study a data mining problem of discovering frequent substructures in a large collection of semi-structured data, where both the patterns and the data are modeled by labeled unordered trees. An unordered tree is a directed acyclic graph with a specified node called the root, in which every node other than the root has exactly one parent. Each node is labeled by a symbol drawn from an alphabet. Such unordered trees can be seen either as a generalization of itemsets in relational databases or as an efficient specialization of attributed graphs in graph mining. They are also useful in various applications such as the analysis of chemical compounds and the mining of hyperlink structures on the Web. Introducing novel definitions of the support and the canonical form for unordered trees, we present an efficient algorithm called Unot that computes all labeled unordered trees appearing in a collection of data trees with frequency above a user-specified threshold. We prove that the algorithm enumerates each frequent pattern $T$ in $O(kb^2n)$ time per pattern, where $k$ is the size of $T$, $b$ is the branching factor of the data tree, and $n$ is the total number of occurrences of $T$ in the data trees. The keys to the algorithm are the efficient enumeration of all unordered trees in canonical form and the incremental computation of their occurrences, both based on a powerful design technique known as reverse search.
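The abstract's notion of a canonical form for unordered trees can be illustrated with a short sketch. Because sibling order carries no meaning in an unordered tree, many ordered trees describe the same pattern, and a canonical form picks one representative per tree. The Python below is a minimal illustration under assumed conventions: it encodes an ordered tree as its preorder sequence of (depth, label) pairs and chooses the sibling ordering that makes this sequence lexicographically largest. The tuple-based tree representation and the names `depth_label_seq`, `canonicalize`, and `canonical_code` are hypothetical. This is not the report's Unot implementation, and it covers neither the report's support definition nor its reverse-search enumeration.

```python
from typing import List, Tuple

# Hypothetical representation: a labeled tree is (label, [child, child, ...]).
Tree = Tuple[str, List["Tree"]]


def depth_label_seq(tree: Tree, depth: int = 0) -> List[Tuple[int, str]]:
    """Preorder list of (depth, label) pairs, reading the tree as an ordered tree."""
    label, children = tree
    seq = [(depth, label)]
    for child in children:
        seq.extend(depth_label_seq(child, depth + 1))
    return seq


def canonicalize(tree: Tree) -> Tree:
    """Reorder siblings so the depth-label sequence is lexicographically largest.

    Each child subtree is canonicalized first; siblings are then sorted in
    descending order of their own sequences. Because every subtree sequence
    starts with its shallowest node, this ordering maximizes the parent's
    concatenated sequence (assumed convention: compare depth, then label).
    """
    label, children = tree
    canon_children = [canonicalize(child) for child in children]
    canon_children.sort(key=depth_label_seq, reverse=True)
    return (label, canon_children)


def canonical_code(tree: Tree) -> Tuple[Tuple[int, str], ...]:
    """Hashable code: equal codes <=> isomorphic labeled unordered trees."""
    return tuple(depth_label_seq(canonicalize(tree)))


t1 = ("A", [("B", []), ("C", [("D", [])])])
t2 = ("A", [("C", [("D", [])]), ("B", [])])  # same unordered tree, children swapped
t3 = ("A", [("B", [("D", [])]), ("C", [])])  # D now hangs under B: a different tree

print(canonical_code(t1) == canonical_code(t2))  # True
print(canonical_code(t1) == canonical_code(t3))  # False
```

A hashable code like this lets a naive miner deduplicate candidate patterns that were generated in different sibling orders. The abstract instead describes enumerating trees directly in canonical form via reverse search, so treat this sketch only as a way to check the idea of a canonical representative.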


File	FileType	Size	Views	Description
trcs216	pdf	237 KB	679

Details

Record ID	3055
Peer-Reviewed	Unrefereed
Type	Technical Report
Created Date	2009.04.22
Modified Date	2018.08.31