＜technical report＞
Optimized Substructure Discovery for Semi-structured Data

Creator	Abe, Kenji (安部, 賢治), Department of Informatics, Kyushu University
	Kawasoe, Shinji (川副, 真治), Department of Informatics, Kyushu University
	Asai, Tatsuya (浅井, 達哉), Department of Informatics, Kyushu University
	Arimura, Hiroki (有村, 博紀), PRESTO, JST; Department of Informatics, Kyushu University
	Arikawa, Setsuo (有川, 節夫), Department of Informatics, Kyushu University
Language	English
Publisher	Department of Informatics, Kyushu University
Date	2002-03
Source Title	DOI Technical Report
Vol	206
Publication Type	Accepted Manuscript
Access Rights	open access
Relation	DOI Technical Report, Vol. 206
Related URI	http://www.i.kyushu-u.ac.jp/research/report.html
Abstract	We address the problem of finding interesting substructures in a collection of semi-structured data such as XML or HTML. Our data mining framework is optimized pattern discovery, introduced by Fukuda et al., where the goal of a mining algorithm is to discover a pattern that optimizes a given statistical measure, such as the information entropy, over a class of simple patterns. In this paper, modeling semi-structured data with labeled ordered trees, we study an efficient algorithm for the optimized pattern discovery problem for this class. In a previous paper, we developed the rightmost expansion technique and the incremental occurrence update technique by generalizing the enumeration technique developed by Bayardo (SIGMOD'98) for discovering long itemsets, in order to implement an efficient frequent pattern miner for the class of labeled ordered trees. By combining these techniques with the pruning technique for optimized patterns of Morishita and Sese (PODS'00), we present an efficient algorithm for finding optimized patterns for labeled ordered trees of bounded size. Experimental results show that our algorithm performs well across a variety of data sizes and parameter ranges. We also show an approximation hardness result for labeled ordered trees of unbounded size.

File	FileType	Size	Views	Description
trcs206	pdf	930 KB	304
trcs206.ps	gz	0.98 MB	172

Details

Record ID	3050
Peer-Reviewed	Unrefereed
Type	Technical Report
Created Date	2009.04.22
Modified Date	2018.08.31
