<Technical Report>
Instance Based Table Integration Algorithm for Multilingual Tables on the Web

Abstract
We present an instance-based table integration algorithm. A table is a set of instances of a record, and a record consists of fields. A field is a pair of an attribute name and a sequence of attribute values of the same type. Given input tables, the algorithm calculates two numerical features for each field from character codes and then finds correspondences between fields across the tables. The novelty of the algorithm is that it uses the character code chart for the language in which the table contents are written, which allows each field to be represented by only two types of features. The algorithm requires neither an attribute value contained in all input tables nor attribute names, so it is suitable for tables obtained from Web data, provided the tables are written in the same language. Applying the algorithm to real Web data written in five languages (Chinese, English, German, Japanese, and Korean), we demonstrate that it yields accurate results and is robust to errors.
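The abstract does not say which two features are computed or how fields are paired, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes (1) the two features are the mean and the population standard deviation of each character's position within a per-language code range, scaled to [0, 1]; (2) the ranges in the hypothetical `CHART_RANGES` dict stand in for the report's character code charts; and (3) fields are paired by a brute-force minimum-total-distance assignment, a technique swapped in here for illustration. Tables are lists of columns, so no attribute names are used. The names `field_features` and `match_fields` are hypothetical.

```python
import itertools
import math
from statistics import mean, pstdev

# Hypothetical per-language code-point ranges standing in for a
# "character code chart"; the report's actual charts are not given here.
CHART_RANGES = {
    "en": (0x20, 0x7E),      # printable ASCII
    "ja": (0x3040, 0x9FFF),  # Hiragana through CJK Unified Ideographs
}


def field_features(values, lang):
    """Return two illustrative features for one field (column): the mean and
    population standard deviation of its characters' chart positions, each
    position scaled to [0, 1] and clamped to that interval."""
    lo, hi = CHART_RANGES[lang]
    positions = [
        min(max((ord(ch) - lo) / (hi - lo), 0.0), 1.0)
        for value in values
        for ch in str(value)
    ]
    if not positions:
        return (0.0, 0.0)
    return (mean(positions), pstdev(positions))


def match_fields(table_a, table_b, lang):
    """Map each column index of table_a to a distinct column index of table_b,
    minimizing the total Euclidean distance between feature pairs.
    Brute force over permutations, so only suitable for a few columns."""
    if len(table_a) > len(table_b):
        raise ValueError("table_a must not have more fields than table_b")
    feats_a = [field_features(col, lang) for col in table_a]
    feats_b = [field_features(col, lang) for col in table_b]
    best, best_cost = {}, math.inf
    for perm in itertools.permutations(range(len(feats_b)), len(feats_a)):
        cost = sum(math.dist(feats_a[i], feats_b[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best, best_cost = dict(enumerate(perm)), cost
    return best


# Two English tables whose columns appear in different orders.
table_a = [["Alice", "Bob", "Carol"], ["12.50", "3.99", "100.00"]]
table_b = [["7.25", "19.99"], ["Dave", "Eve"]]
print(match_fields(table_a, table_b, "en"))  # {0: 1, 1: 0}
```

In this toy example the name columns cluster around letter code points and the price columns around digit code points, so the two columns are matched correctly without any attribute names or shared values. A real implementation would replace the brute-force search with a polynomial-time assignment method for wider tables.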

Full-text file

trcs217 (PDF, 78.2 KB)

Details

Registration date 2009.09.15
Last updated 2018.08.31
