Machine Learning Approach to Identifying the Dataset Threshold for the Performance Estimators in Supervised Learning
Institute for Educational Development, East Africa
For small-scale machine learning projects, researchers have not yet established a threshold that inexperienced users, such as students, can use to categorise datasets when assessing and comparing the performance of machine learning algorithms. To address this gap, this paper presents a step-by-step guide for identifying the dataset threshold for performance estimators in supervised machine learning experiments. Identifying the threshold involves running experiments on four datasets of different sample sizes from the University of California Irvine (UCI) machine learning repository. The sample sizes are categorised according to the number of attributes and the number of instances in each dataset. The identified dataset threshold will help experimenters unfamiliar with machine learning to categorise datasets correctly and hence select the appropriate performance estimation method.
International Journal for Infonomics (IJI)
Omary, Z., & Mtenzi, F. (2010). Machine learning approach to identifying the dataset threshold for the performance estimators in supervised learning. International Journal for Infonomics (IJI), 3(3), 314-325.