<Technical Report>
Unsupervised Spam Detection based on String Alienness Measures

Abstract: We propose an unsupervised method for detecting spam documents in Web page data, based on equivalence relations on strings. We propose three measures for quantifying the alienness (i.e., how different it is from the others) of substring equivalence classes within a given set of strings. A document is then classified as spam if it contains a characteristic equivalence class as a substring. The proposed method is unsupervised, language-independent, and very efficient. Computational experiments conducted on data collected from Japanese web forums show fairly good results.

Full-text file

trcs229 (PDF, 477 KB)

Details

Date registered: 2009.04.22
Date updated: 2017.01.24
