This study examines how feature selection affects the performance of the Naïve Bayes classifier in Educational Data Mining (EDM). Six feature selection methods are compared: Information Gain, Gain Ratio, Symmetric Uncertainty Coefficient, Relief-F, Correlation-Based Feature Selection, and the One R measure. The experiments use the "Higher Education Students Performance Evaluation" dataset from the UCI Machine Learning Repository, which comprises 145 samples and 33 features; 30 of these features were retained for this study on the basis of their presumed relevance to academic performance. With the Naïve Bayes algorithm, the Gain Ratio method performed best, reaching an accuracy of 60%. For every method except Correlation-Based Feature Selection, the semester grade point average was the most significant feature affecting student success. According to the Gain Ratio method, the next most influential variables, in descending order of importance, are gender, the impact of projects/activities on academic success, expected grade point average upon graduation, weekly study hours, type of scholarship received, frequency of reading non-academic literature, mother's educational level, and participation in departmental seminars/conferences as well as class attendance. All of the feature selection methods except One R improved the predictive accuracy of the Naïve Bayes algorithm. These findings support the utility of feature selection in EDM and offer guidance for researchers and educators seeking to advance the methods of the field.
Keywords: Feature Selection, Educational Data Mining, Naïve Bayes, Higher Education.
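
Three of the compared methods (Information Gain, Gain Ratio, and Symmetric Uncertainty) are entropy-based scores of a categorical feature against the class label. The abstract does not state which software was used, so the following is only a minimal sketch built from the standard textbook definitions, not the study's implementation. The function names and the toy data are hypothetical and are not drawn from the UCI dataset.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of categorical values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def conditional_entropy(feature, target):
    """H(target | feature): target entropy within each feature value,
    weighted by how often that feature value occurs."""
    n = len(feature)
    groups = {}
    for x, y in zip(feature, target):
        groups.setdefault(x, []).append(y)
    return sum(len(ys) / n * entropy(ys) for ys in groups.values())


def information_gain(feature, target):
    """IG = H(target) - H(target | feature)."""
    return entropy(target) - conditional_entropy(feature, target)


def gain_ratio(feature, target):
    """IG divided by the entropy of the feature itself (split information)."""
    split_info = entropy(feature)
    return information_gain(feature, target) / split_info if split_info > 0 else 0.0


def symmetric_uncertainty(feature, target):
    """SU = 2 * IG / (H(feature) + H(target)), which lies in [0, 1]."""
    denom = entropy(feature) + entropy(target)
    return 2 * information_gain(feature, target) / denom if denom > 0 else 0.0


def rank_features(columns, target, score):
    """Return feature names sorted from highest to lowest score."""
    return sorted(columns, key=lambda name: score(columns[name], target), reverse=True)


# Hypothetical toy data (illustration only, not from the study).
target = ["pass", "pass", "fail", "fail"]
columns = {
    "informative": ["a", "a", "b", "b"],  # separates the classes with two values
    "student_id": ["1", "2", "3", "4"],   # separates the classes only by being unique
    "noise": ["a", "b", "a", "b"],        # carries no information about the class
}

for name in rank_features(columns, target, gain_ratio):
    print(name, round(gain_ratio(columns[name], target), 3))
```

In this toy example, Information Gain scores the unique identifier as highly as the genuinely informative feature, whereas Gain Ratio halves its score because the identifier's own entropy is larger. Normalizing by the feature's entropy is the reason Gain Ratio is generally less biased toward many-valued attributes.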