Predictive analytics of student performance: Multi-method and code
DOI:
https://doi.org/10.23917/jramathedu.v9i4.4643
Keywords:
Customized Learning, Academic Performance, Educational Data Analysis
Abstract
Maintaining a high level of education in universities can be challenging when academic performance is low. Although a significant amount of diagnostic data is collected, education managers underutilize machine learning methods that could improve the accuracy of academic performance prediction. The authors apply a multi-method approach to data analysis that combines simple logistic regression, linear regression, and k-means clustering, which together produce a synergistic effect. The proposed approach differs from known analogs in four ways. First, the dimensionality of the feature space is increased by normalizing scores to a single scale and deriving new features: the index and rank of each student, as well as the changes in each student's performance across activities. Second, students at academic risk are forecasted, and the statistical significance of the features included in the model is evaluated. Third, the final semester score of each student is forecasted with a linear regression model of academic performance. Fourth, groups of students with similar learning trajectories are identified so that consultations can be customized. On historical training data the models achieved high predictive ability: binary prediction of exam passing was correct in 90% of cases, and prediction of the individual assessment in 70% of cases.
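The feature-engineering step described in the abstract can be illustrated with a minimal sketch. The abstract does not define the index, the target scale, or the data layout, so everything here is an assumption made for illustration: scores are rescaled to 0-100, the index is taken as the mean normalized score, the rank orders students by that index, and the changes are differences between consecutive activities. The activity names, maximum scores, and student IDs are hypothetical.

```python
from statistics import mean

# Hypothetical activities and their maximum raw scores (assumed, not from the paper).
MAX_SCORES = {"quiz": 10, "lab": 20, "midterm": 50}
ACTIVITY_ORDER = ["quiz", "lab", "midterm"]

# Hypothetical raw scores for three students.
raw = {
    "s1": {"quiz": 8, "lab": 18, "midterm": 40},
    "s2": {"quiz": 5, "lab": 10, "midterm": 20},
    "s3": {"quiz": 9, "lab": 12, "midterm": 45},
}


def normalize(scores):
    """Rescale each activity score to a common 0-100 scale."""
    return {a: 100.0 * scores[a] / MAX_SCORES[a] for a in ACTIVITY_ORDER}


def build_features(raw_scores):
    """Add index, rank, and per-activity changes to the normalized scores."""
    features = {}
    for sid, scores in raw_scores.items():
        norm = normalize(scores)
        ordered = [norm[a] for a in ACTIVITY_ORDER]
        features[sid] = {
            **norm,
            # Assumed definition: index = mean of normalized scores.
            "index": mean(ordered),
            # Change in performance between consecutive activities.
            "deltas": [b - a for a, b in zip(ordered, ordered[1:])],
        }
    # Rank 1 is the student with the highest index.
    ranked = sorted(features, key=lambda s: features[s]["index"], reverse=True)
    for position, sid in enumerate(ranked, start=1):
        features[sid]["rank"] = position
    return features


features = build_features(raw)
```

The resulting dictionary holds, for each student, the normalized scores plus the derived columns; a table of this shape could then be passed to a logistic regression, a linear regression, or k-means clustering.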
License
Copyright (c) 2024 Alla Vladova, Katsiaryna M. Borchyk
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.