Experiments on the Use of Machine Learning Classification Methods in Online Crime Text Filtering and Classification

Fadl Mutaher Ba-Alwi *

Faculty of Computer and Information Technology, Sana'a University, Yemen.

Mohammed Albared

Faculty of Computer and Information Technology, Sana'a University, Yemen.

*Author to whom correspondence should be addressed.


Abstract

With the exponential growth of textual information available on the Internet, there is an emerging need to extract relevant and timely knowledge about crimes from this large volume of data. Its sheer size makes retrieving and analyzing texts manually very difficult. Furthermore, domain-specific document classification is a hard task that suffers from low classification efficiency because the subclasses within a domain overlap. This work focuses on finding an appropriate classification model for crime-domain knowledge on the Web. To this end, a two-level classification method for online crime text filtering and classification is used. At each level, three feature selection methods (Gini index, chi-square statistic, and information gain) and three learning methods (k-nearest neighbor, Naive Bayes (NB), and support vector machine (SVM)) are investigated. The experimental results at the first level indicate that information gain performs best for selecting crime terms and that both SVM and NB perform best for crime text filtering. At the second level, the results indicate that the Gini index performs best for selecting crime-type terms and that the SVM classifier performs best at classifying crime documents into their appropriate crime types.
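
The two-level idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a small hand-written multinomial Naive Bayes with add-one smoothing at both levels (the abstract reports SVM as the best second-level classifier), it omits the feature selection step entirely, and the class `TinyNaiveBayes`, the function `two_level_classify`, and the toy training sentences are all hypothetical names and data introduced only for illustration.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.term_counts = defaultdict(Counter)      # term frequencies per class
        for doc, label in zip(docs, labels):
            self.term_counts[label].update(doc.lower().split())
        self.vocab = {t for c in self.term_counts.values() for t in c}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.term_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)    # log prior
            for term in doc.lower().split():
                if term in self.vocab:               # skip out-of-vocabulary terms
                    score += math.log((counts[term] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def two_level_classify(doc, level1, level2):
    """Level 1 filters crime vs. non-crime; level 2 assigns a crime type."""
    if level1.predict(doc) != "crime":
        return "non-crime"
    return level2.predict(doc)


# Hypothetical toy data: (text, level-1 label, level-2 crime type)
train = [
    ("police arrested a suspect after the armed robbery", "crime", "robbery"),
    ("masked men robbed the bank and stole cash", "crime", "robbery"),
    ("the victim was found dead and police suspect murder", "crime", "murder"),
    ("a man was charged with murder after the fatal stabbing", "crime", "murder"),
    ("the team won the football match last night", "non-crime", None),
    ("new phone released with a faster processor", "non-crime", None),
]

# Level 1 is trained on all documents, level 2 only on the crime documents.
level1 = TinyNaiveBayes().fit([t for t, _, _ in train], [l for _, l, _ in train])
crime_rows = [(t, k) for t, l, k in train if l == "crime"]
level2 = TinyNaiveBayes().fit([t for t, _ in crime_rows], [k for _, k in crime_rows])

print(two_level_classify("robbers stole cash from the bank", level1, level2))
print(two_level_classify("the football team won the match", level1, level2))
```

Training the second-level model only on documents labeled as crime keeps the crime-type decision separate from the filtering decision, which is the point of the cascade. In a fuller experiment, a feature selection step such as information gain or the Gini index would be applied before each level, and the classifier at each level could be swapped independently.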

Keywords: Crime data mining, web mining, focused crawling, classification


How to Cite

Ba-Alwi, Fadl Mutaher, and Mohammed Albared. 2015. “Experiments on the Use of Machine Learning Classification Methods in Online Crime Text Filtering and Classification”. Current Journal of Applied Science and Technology 12 (5):1-12. https://doi.org/10.9734/BJAST/2016/21504.
