<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN" "https://jats.nlm.nih.gov/publishing/1.3/JATS-journalpublishing1-3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="1.3" article-type="research-article"><front><journal-meta><journal-id journal-id-type="issn">2541-2590</journal-id><journal-title-group><journal-title>JRAMathEdu (Journal of Research and Advances in Mathematics Education)</journal-title><abbrev-journal-title>J.Res.Adv.Math.Educ</abbrev-journal-title></journal-title-group><issn pub-type="epub">2541-2590</issn><issn pub-type="ppub">2503-3697</issn><publisher><publisher-name>Lembaga Pengembangan Publikasi Ilmiah dan Buku Ajar, Universitas Muhammadiyah Surakarta</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.23917/jramathedu.v9i4.4643</article-id><article-categories/><title-group><article-title>Predictive analytics of student performance: Multi-method and code</article-title></title-group><contrib-group><contrib contrib-type="author"><name><surname>Vladova</surname><given-names>Alla Yu.</given-names></name><address><country>Russian Federation</country><email>alla.vladova@gmail.com</email></address><xref ref-type="aff" rid="AFF-1"/><xref ref-type="corresp" rid="cor-0"/></contrib><contrib contrib-type="author"><name><surname>Borchyk</surname><given-names>Katsiaryna M.</given-names></name><address><country>Belarus</country></address><xref ref-type="aff" rid="AFF-2"/></contrib><aff id="AFF-1">Financial University under the Government of Russian Federation</aff><aff id="AFF-2">Belarusian-Russian University</aff></contrib-group><author-notes><corresp id="cor-0"><bold>Corresponding author: Alla Yu. 
Vladova</bold>, Financial University under the Government of Russian Federation .Email:<email>alla.vladova@gmail.com</email></corresp></author-notes><pub-date date-type="pub" iso-8601-date="2024-10-30" publication-format="electronic"><day>30</day><month>10</month><year>2024</year></pub-date><pub-date date-type="collection" iso-8601-date="2024-10-31" publication-format="electronic"><day>31</day><month>10</month><year>2024</year></pub-date><volume>9</volume><issue>4</issue><issue-title>Volume 9 Issue 4 October 2024</issue-title><fpage>190</fpage><lpage>204</lpage><history><date date-type="received" iso-8601-date="2024-3-28"><day>28</day><month>3</month><year>2024</year></date><date date-type="rev-recd" iso-8601-date="2024-10-16"><day>16</day><month>10</month><year>2024</year></date><date date-type="accepted" iso-8601-date="2024-10-23"><day>23</day><month>10</month><year>2024</year></date></history><permissions><copyright-statement>Copyright (c) 2024 Alla Vladova, Katsiaryna M. Borchyk</copyright-statement><copyright-year>2024</copyright-year><copyright-holder>Alla Vladova, Katsiaryna M. Borchyk</copyright-holder><license><ali:license_ref xmlns:ali="http://www.niso.org/schemas/ali/1.0/">https://creativecommons.org/licenses/by-nc/4.0</ali:license_ref><license-p>This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.</license-p></license></permissions><self-uri xlink:href="https://journals2.ums.ac.id/index.php/jramathedu/article/view/4643" xlink:title="Predictive analytics of student performance: Multi-method and code">Predictive analytics of student performance: Multi-method and code</self-uri><abstract><p>The maintenance of a high level of education in universities can be a challenging task due to low academic performance. Despite the significant amount of collected diagnostic data, education managers underutilize machine learning methods to improve the accuracy of predicting academic performance. 
The authors apply a multi-method approach to data analysis, combining logistic regression, linear regression, and k-means clustering, which together produce a synergistic effect. The proposed approach differs from known analogs in that, firstly, the dimensionality of the feature space increases due to the normalization of scores onto a single scale and the creation of new features: the index and rank of students, as well as the changes in performance across various activities for each student. Secondly, students at academic risk are forecasted, and the statistical significance of the features included in the model is evaluated. Thirdly, for each student, the final score for the semester is forecasted using a linear regression model of academic performance. Fourthly, groups of students with similar learning trajectories are identified for customization of consultations. The authors achieved high predictive ability with models based on historical training data: exam passing is predicted correctly in 90% of cases, and individual scores in 70% of cases.</p></abstract><kwd-group><kwd>Customized Learning</kwd><kwd>Academic Performance</kwd><kwd>Educational Data Analysis</kwd></kwd-group><funding-group><funding-statement>This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors</funding-statement></funding-group><custom-meta-group><custom-meta><meta-name>File created by JATS Editor</meta-name><meta-value><ext-link ext-link-type="uri" xlink:href="https://jatseditor.com" xlink:title="JATS Editor">JATS Editor</ext-link></meta-value></custom-meta><custom-meta><meta-name>issue-created-year</meta-name><meta-value>2024</meta-value></custom-meta></custom-meta-group></article-meta></front><body><sec><title>INTRODUCTION</title><p>The academic performance of students is one of the most important characteristics of the educational activities of an educational institution, by which professors and education managers
can judge the results achieved or the problems that exist. Each university has its own system for assessing academic performance, including various indicators of academic activities <xref ref-type="bibr" rid="BIBR-10">(Elisabeta &amp; Alexandru, 2018)</xref>. The academic performance of students in mathematical disciplines is usually assessed through computer tests, expert evaluation of semester projects, the level of preparation for seminars, and attendance <xref ref-type="bibr" rid="BIBR-39">(Zafar et al., 2020)</xref>. The quality of students' work is then used for effective educational process management, for decisions about awarding state academic and named scholarships, issuing diplomas with honors, and other tasks. Thus, the research covers the following tasks: (1) how to use historical data effectively to predict student performance; (2) which predictive models are the most understandable to education managers; (3) how to reduce the subjective influence of experts on students' final grades.</p></sec><sec><title>Literature review</title><p>The researchers <xref ref-type="bibr" rid="BIBR-41">(Zhang et al., 2021)</xref> state that predicting student performance helps all stakeholders in the educational process. For example, students can choose appropriate courses or exercises and make plans for the semester accordingly <xref ref-type="bibr" rid="BIBR-14">(Ibrahim &amp; Rusli, 2007)</xref>, discovering the relationships between courses. Professors can adjust educational materials and curricula depending on students' abilities and identify students at risk within the group <xref ref-type="bibr" rid="BIBR-16">(Kloft et al., 2014)</xref>. Managers in the education sector can review the curriculum and optimize the set of disciplines. Prediction can guide course selection and provide early warnings about student learning, but identifying the key factors that affect educational behavior is an even more important task.
This is because (1) key factors can point to educational interventions; (2) the reasons for success or failure can reveal patterns of student learning; and (3) understanding these factors can inform study plans, course assignments, and learning sequences. As <xref ref-type="bibr" rid="BIBR-15">(Kahramanoğlu, 2018)</xref> notes, the same characteristics help to indirectly analyze the hard and soft skills of prospective teachers.</p><p>The article <xref ref-type="bibr" rid="BIBR-36">(Yağcı, 2022)</xref> proposes a machine learning model for predicting the final scores of undergraduate students, using their midterm exam scores as the input data. To forecast the exam scores, the performance metrics of random forest, k-nearest neighbor, support vector machine, logistic regression, and naive Bayes algorithms are calculated and compared. The dataset consisted of the academic performance scores of 1854 students at a state university in Turkey during the autumn semester of 2019-2020. Predictions are made using three types of features: midterm exam scores, as well as department and faculty names. The proposed model achieved a classification accuracy of 70–75%. The insufficient accuracy of the model can be explained by the presence of low-variance features.</p><p>Authors of the study <xref ref-type="bibr" rid="BIBR-20">(Oluwadele et al., 2023)</xref> assessed academic performance in the field of medical education through indicators of students' acquisition and perception of knowledge, level of confidence, ease of use of the e-learning platform and willingness to recommend e-learning.
The flaw of this approach lies in the qualitative nature of the analyzed features and their strong dependence on the opinions of different experts.</p><p>Researchers <xref ref-type="bibr" rid="BIBR-18">(Liu &amp; Yu, 2023)</xref> use the online student actions that an e-learning platform collects, namely: the time it takes to answer a question or submit an assignment, the number of missed questions, excessive tardiness, cheating on tests, and derogatory comments in online discussions. The disadvantage of this approach is that the data were used without additional transformations, which affects the performance of the model.</p><p>The study <xref ref-type="bibr" rid="BIBR-38">(Ye et al., 2022)</xref> offers a model for predicting the effectiveness of online learning based on the selection and merging of features. The model uses the relationship between behaviors and examines whether combinations of behaviors are better predictors of academic performance.</p><p>The researchers <xref ref-type="bibr" rid="BIBR-35">(Yadav &amp; Deshmukh, 2023)</xref> emphasize that the most significant factors influencing students' academic performance are low initial scores, family support, living arrangements, gender, previous performance, students' internal assessment, average academic performance, and students' activity in e-learning. They also note that a plan to improve students' academic performance should include additional consultations for students with low performance. This helps both students and teachers to overcome the challenges faced during education.
The idea of selecting students for additional consultations forms the basis of the fourth stage of the proposed method.</p><p>Thus, it is possible to identify the following features that are used to predict academic performance: the attendance coefficient; the ratio of scores for work or campus activities to the total possible certification score; and performance dynamics, the change in scores between the first and second certification. This change in scores is the basis of the proposed method.</p><p>The authors <xref ref-type="bibr" rid="BIBR-37">(Yang et al., 2018)</xref> designed several weekly learning activities, including homework, quizzes, and video-based learning, and applied a multiple linear regression model to predict students’ academic scores. They also reworked the well-known metrics for assessing the accuracy of the models, using data obtained during the cross-validation stage. They believe, however, that the models are applicable only to the courses, learning activities, and data attributes for which they were developed.</p><fig id="figure-roi7bp" ignoredToc=""><label>Figure 1</label><caption><p>Tag cloud of scores</p></caption><graphic xlink:href="https://journals2.ums.ac.id/jramathedu/article/download/4643/3825/45012" mimetype="image" mime-subtype="png"><alt-text>Image</alt-text></graphic></fig><p>Researchers <xref ref-type="bibr" rid="BIBR-28">(Troussas et al., 2013)</xref> note that clustering users into groups with common interests is very useful when learning multiple languages. They used the k-means algorithm because of its simplicity. Authors <xref ref-type="bibr" rid="BIBR-6">(Bayazit et al., 2022)</xref> use the same clustering algorithm to identify students with low engagement. A known drawback of the algorithm is that the number of clusters is set a priori and does not sufficiently reflect students with a satisfactory level of interaction. Two clustering quality indexes are tested and compared in <xref ref-type="bibr" rid="BIBR-21">(Petrovic, 2006)</xref>.
Experimental results comparing the effectiveness of a multiple classifier with the two indexes implemented show that the system using the Silhouette index produces slightly more accurate results than the system that uses the Davies-Bouldin index.</p><p>Based on the literature review, it can be concluded that there is significant interest in predicting students' academic performance using machine learning methods. It has been established that academic performance prediction is carried out either through binary classification, whether a student will pass the exam or not, or through regression to predict the potential score for the exam. It has also been identified that methods grouping students based on similarities in learning trajectories are not commonly applied.</p><sec><title>Quick overview of the initial dataset</title><p>The quality of the data in an e-learning platform has a direct impact on the accuracy of predictive models <xref ref-type="bibr" rid="BIBR-22">(Qiu et al., 2022)</xref>. The initial dataset with intermediate and final scores of students for the first semester in the discipline Data Analysis contains 20 features: two text features with the surnames and first names of students; the group number; eight numerical features with students' scores for homework, self-study, term papers and tests posted in the e-learning platform; and the remaining numerical features contain the scores given by the teacher for programming skills, activity in the classroom, as well as final scores.</p><p>To visualize the numerical features, we applied the WordCloud library of the Python language <xref ref-type="bibr" rid="BIBR-30">(Vladova, 2024)</xref> and built the tag cloud shown in <xref ref-type="fig" rid="figure-roi7bp">Figure 1</xref>.
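</p><p>As a minimal sketch of this step (not the authors' exact code), the tag cloud can be driven by the frequency of each distinct score value; the score list below is a toy illustration.</p>

```python
# Sketch: count how often each score value occurs; the resulting
# frequency dictionary feeds WordCloud.generate_from_frequencies.
from collections import Counter

scores = [4.0, 5.0, 5.0, 3.5, 55.0, 48.0, 60.0, 5.0]  # toy midterm and exam scores
freqs = {str(v): c for v, c in Counter(scores).items()}
assert freqs["5.0"] == 3  # the value 5.0 occurs three times

# Assumed final step (requires the wordcloud package):
# from wordcloud import WordCloud
# WordCloud(width=800, height=400).generate_from_frequencies(freqs).to_file("cloud.png")
```
<p>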
Based on <xref ref-type="fig" rid="figure-roi7bp">Figure 1</xref>, it is evident that the educational institution employs two grading scales: midterm performance is evaluated on a scale from 0 to 5.0, while the final scores are converted to a scale from 0 to 100.0. The majority of students demonstrate good and excellent knowledge.</p><fig id="figure-1" ignoredToc=""><label>Figure 2</label><caption><p>The map of the number of scores for different activities: a) taking into account the students' gender; b) taking into account the students' group</p></caption><graphic xlink:href="https://journals2.ums.ac.id/jramathedu/article/download/4643/3825/45013" mimetype="image" mime-subtype="png"><alt-text>Image</alt-text></graphic></fig></sec><sec><title>Purpose and objectives of the study</title><p>The aim of the study is to enhance students' preparedness by developing models for assessing and predicting students' academic performance based on their current scores.
The objectives of this research are:</p><list list-type="order"><list-item><p>Conduct a statistical analysis of student scores for a particular discipline: assess data imbalance, identify gaps, construct score distribution densities, and explore the correlation matrix of scores.</p></list-item><list-item><p>Increase the dimensionality of the feature space by normalizing scores to a common scale and creating new features such as student indexes, ranks, and the differences between the scores obtained by each student at different time points.</p></list-item><list-item><p>Predict students at academic risk and evaluate the statistical significance of the features included in the model.</p></list-item><list-item><p>Customize consultations for student groups with similar learning trajectories.</p></list-item><list-item><p>Forecast semester final scores for each student. This approach involves comprehensive analysis, modeling, and customization of consultations to effectively improve students' academic performance in universities.</p></list-item></list></sec><sec><title>Statistical analysis of the initial dataset</title><p>It is essential to address the data imbalance highlighted in the primary statistical analysis:</p><list list-type="order"><list-item><p>Gender imbalance, with 30% more women than men</p></list-item><list-item><p>Imbalance in the number and level of scores received by students for various activities. The histograms depicting the number of students assessed for each activity, segmented by gender and group, are presented in <xref ref-type="fig" rid="figure-1">Figure 2</xref>.</p></list-item></list><p>The analysis revealed that male students are more involved in coursework and significantly more active in classes, while a significantly larger number of female students completed the second test. Bonus assignments are challenging for both male and female students.
Not all students completed the full amount of work, so there are missing values in the data, shown in <xref ref-type="fig" rid="figure-2">Figure 3</xref>.</p><fig id="figure-2" ignoredToc=""><label>Figure 3</label><caption><p>Map of data omissions</p></caption><graphic xlink:href="https://journals2.ums.ac.id/jramathedu/article/download/4643/3825/45014" mimetype="image" mime-subtype="png"><alt-text>Image</alt-text></graphic></fig><fig id="figure-3" ignoredToc=""><label>Figure 4</label><caption><p>Correlation matrix</p></caption><graphic xlink:href="https://journals2.ums.ac.id/jramathedu/article/download/4643/3825/45015" mimetype="image" mime-subtype="png"><alt-text>Image</alt-text></graphic></fig><p>These missing values are logically replaced with zero scores. From <xref ref-type="fig" rid="figure-2">Figure 3</xref>, it is evident that there is a positive trend in programming skills but a negative trend in class activity.</p><p>Understanding the relationships between features allows for better preparation for the clustering process, eliminating redundant or highly correlated features (Hafsa et al., 2023).
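</p><p>A minimal sketch of this preparation step (toy data, pure Python; the feature names are illustrative) shows the zero-filling of omissions and a pairwise Pearson correlation of the kind collected in the matrix:</p>

```python
# Sketch: replace missing scores (None) with zeros, then compute the
# Pearson correlation between two score columns (formula-level, toy data).
from math import sqrt

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

attestation2 = [None, 18, 20, 15]        # None: student skipped the activity
final_score = [40, 80, 95, 70]
attestation2 = [0 if v is None else v for v in attestation2]
r = pearson(attestation2, final_score)   # strongly positive on this toy data
```
<p>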
The lower triangular correlation matrix in <xref ref-type="fig" rid="figure-3">Figure 4</xref> shows that the highest coefficients of linear correlation are observed between the second module certification and the final score (0.86), as well as between exam scores and the final score (0.93).</p><fig id="figure-4" ignoredToc=""><label>Figure 5</label><caption><p>Score distribution densities</p></caption><graphic xlink:href="https://journals2.ums.ac.id/jramathedu/article/download/4643/3825/45016" mimetype="image" mime-subtype="png"><alt-text>Image</alt-text></graphic></fig><fig id="figure-5" ignoredToc=""><label>Figure 6</label><caption><p>Stages of the methodology</p></caption><graphic xlink:href="https://journals2.ums.ac.id/jramathedu/article/download/4643/3825/45017" mimetype="image" mime-subtype="png"><alt-text>Image</alt-text></graphic></fig><p>Using the kernel density estimation method (<xref ref-type="bibr" rid="BIBR-13">(Humbert et al., 2022)</xref>; <xref ref-type="bibr" rid="BIBR-34">(Węglarczyk, 2018)</xref>), we obtained score distribution densities across the main control points (tests, exam, final score) that are close to normal in shape, as shown in <xref ref-type="fig" rid="figure-4">Figure 5</xref>. Moreover, the grading scales for tests vary from 0 to 10, for the exam from 0 to 60, and for the final score from 0 to 100. The average score for men for the second test, exam and final score is slightly shifted to the left relative to the average score for women.</p></sec></sec><sec><title>METHODS</title><p>The proposed methodology for predicting the final score includes five stages shown in <xref ref-type="fig" rid="figure-5">Figure 6</xref>. The first stage involves performing a statistical analysis to identify imbalances, data omissions, and high and low correlations. In the second stage, new features are formed, and the dataset is aggregated by student index and group number, forming a performance profile for each student.
In the third stage, with a sufficient number of scores in the aggregate, a binary classification predictive model separating students who passed the exam from those who did not is built, along with an error assessment. Several methods implement binary classification; among the most effective are those derived from linear discriminant analysis, such as Quadratic Discriminant Analysis (QDA) and Regularized Discriminant Analysis (RDA), as well as logistic regression <xref ref-type="bibr" rid="BIBR-4">(Araveeporn, 2023)</xref>. The choice of logistic regression is dictated by the fact that it assumes neither a normal distribution of the independent variables nor homogeneity of the variance-covariance matrices. The quality of separation is evaluated using the F1 metric, the harmonic mean of precision and recall:</p><p><inline-formula><tex-math id="math-1"><![CDATA[ \documentclass{article} \usepackage{amsmath} \begin{document} \displaystyle F1 = \frac{\text{TP}}{TP + \frac{FP + FN}{2}} \end{document} ]]></tex-math></inline-formula> , (1)</p><p>where TP, FP, and FN are the numbers of true positive, false positive, and false negative forecasts.</p><p>In the fourth stage, performance is forecasted using linear regression. As a result, a trend and a predicted performance score are determined for each student.
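</p><p>A minimal sketch of this per-student trend fit (ordinary least squares on toy normalized scores; the week numbers and score values are illustrative):</p>

```python
# Sketch: fit y = slope * x + intercept over one student's successive
# normalized scores and extrapolate one step ahead.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

weeks = [1, 2, 3, 4]
scores = [0.50, 0.55, 0.65, 0.70]     # one student's normalized scores
slope, intercept = fit_line(weeks, scores)
forecast = slope * 5 + intercept      # predicted score for the next point
```
<p>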
The quality of prediction is evaluated through mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE) <xref ref-type="bibr" rid="BIBR-2">(Aissaoui et al., 2020)</xref>:</p><p><inline-formula><tex-math id="math-2"><![CDATA[ \documentclass{article} \usepackage{amsmath} \begin{document} \displaystyle MAE = \frac{\sum_{i}^{n}{|y_{i} - x_{i}|}}{n} \end{document} ]]></tex-math></inline-formula> , <inline-formula><tex-math id="math-3"><![CDATA[ \documentclass{article} \usepackage{amsmath} \begin{document} \displaystyle \text{MSE} = \frac{1}{n}\sum_{i}^{n}{(y_{i} - x_{i})}^{2} \end{document} ]]></tex-math></inline-formula> , <inline-formula><tex-math id="math-4"><![CDATA[ \documentclass{article} \usepackage{amsmath} \begin{document} \displaystyle \text{RMSE} = \sqrt{\text{MSE}} \end{document} ]]></tex-math></inline-formula> (2)</p><p>where yi is the predicted value, xi is the true value, and n is the number of observations.</p><p>Finally, the fifth stage entails clustering students based on the similarity of their score sets. The quality of clustering is determined by the silhouette coefficient. It measures how well an object matches its cluster compared to other clusters: a value close to 1 indicates that the object is well clustered; a value close to 0 indicates that the object is on the border between two clusters; a negative value indicates incorrect clustering.</p><p>The Silhouette Score can be calculated for each object in the cluster and then averaged for an overall assessment of the clustering quality <xref ref-type="bibr" rid="BIBR-8">(Bonaccorso, 2018)</xref>.
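</p><p>The silhouette computation can be sketched directly from these definitions (pure Python; a toy one-dimensional clustering of students' mean normalized scores, with absolute differences as distances):</p>

```python
# Sketch: mean silhouette coefficient s(i) = (b - a) / max(a, b)
# over a toy 1-D clustering; distances are absolute differences.
def mean_silhouette(points, labels):
    result = []
    for i, p in enumerate(points):
        same = [abs(p - q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        a = sum(same) / len(same)                 # mean intra-cluster distance a(i)
        b = min(                                  # nearest-other-cluster distance b(i)
            sum(abs(p - q) for j, q in enumerate(points) if labels[j] == other)
            / sum(1 for l in labels if l == other)
            for other in set(labels) - {labels[i]}
        )
        result.append((b - a) / max(a, b))
    return sum(result) / len(result)

points = [0.10, 0.15, 0.80, 0.90]    # two tight score groups
labels = [0, 0, 1, 1]
S = mean_silhouette(points, labels)  # close to 1: well-separated clusters
```
<p>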
The formula for calculating the silhouette coefficient for the individual object i is as follows:</p><p><inline-formula><tex-math id="math-5"><![CDATA[ \documentclass{article} \usepackage{amsmath} \begin{document} \displaystyle s\left( i \right) = \frac{b\left( i \right) - a(i)}{max(a\left( i \right),b\left( i \right))} \end{document} ]]></tex-math></inline-formula> (3)</p><p>where a(i) is the average distance from object i to all other objects in the same cluster; this value indicates how close the objects within the cluster are to each other. b(i) is the minimum average distance from object i to the objects in the nearest other cluster; this value indicates how close the object is to other clusters.</p><p>The average value of the silhouette coefficients of all objects is calculated according to the formula:</p><p><inline-formula><tex-math id="math-6"><![CDATA[ \documentclass{article} \usepackage{amsmath} \begin{document} \displaystyle S = \frac{1}{n}\sum_{i = 1}^{n}{s(i)} \end{document} ]]></tex-math></inline-formula>, (4)</p><p>where n is the total number of objects.</p><p>The proposed method differs from known analogs in that, firstly, the dimensionality of the feature space increases due to the normalization of scores onto a single scale and the creation of new features: the index and rank of students, as well as the changes in performance across various activities for each student. Secondly, students at academic risk are forecasted, and the statistical significance of the features included in the model is evaluated. Thirdly, for each student, the final score for the semester is forecasted using a linear regression model of academic performance.
Fourthly, groups of students with similar learning trajectories are identified for customization of consultations.</p><p>The practical significance of this method lies in the possibility of obtaining new knowledge about the learning process.</p><sec><title>New features generation</title><p>Creating new features improves models in the following aspects: reducing computation time or the required data volume, enhancing model interpretability, and increasing predictive accuracy <xref ref-type="bibr" rid="BIBR-31">(Vladova &amp; Shek, 2021)</xref>. Based on the names and endings of Russian surnames <xref ref-type="bibr" rid="BIBR-40">(Zahoranský &amp; Polasek, 2015)</xref>, the binary attribute Sex has been added. To anonymize the data <xref ref-type="bibr" rid="BIBR-3">(Alier et al., 2021)</xref>, a feature called Index is introduced, composed of the first letters of the student's last name and first name, the gender, and the group number. It was found that test, project, and homework scores range from 0 to 5 points, midterm assessment scores vary from 0 to 22 points, exam scores range from 0 to 60 points, and final scores range from 0 to 100 points. Therefore, all scores are normalized to a range from 0 to 1 by dividing by their respective maximum values. This transformation results in new normalized features that comprehensively characterize a student's academic performance on a scale from 0 to 1.
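</p><p>A minimal sketch of this normalization step (toy values; the dictionary keys are illustrative, while the scale maxima are those reported above):</p>

```python
# Sketch: normalize each score by its scale maximum, then derive
# "progress" features as second-half minus first-half differences.
SCALE_MAX = {"test": 5, "attestation": 22, "exam": 60, "final": 100}

raw = {"test1": 4, "test2": 5, "attestation1": 15, "attestation2": 20}
# Strip the trailing time-point digit to look up the activity's scale maximum
norm = {k: v / SCALE_MAX[k.rstrip("12")] for k, v in raw.items()}

# Positive progress means improvement in the second half of the semester
test_progress = norm["test2"] - norm["test1"]
attestation_progress = norm["attestation2"] - norm["attestation1"]
```
<p>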
The distributions of these features are shown in <xref ref-type="fig" rid="figure-6">Figure 7</xref>.</p><p>The next feature contains the student's rank relative to other students in the group and is obtained by calculating the weighted sum of the scores and sorting the results from maximum to minimum, as shown in <xref ref-type="table" rid="table-1">Table 1</xref>.</p><fig id="figure-6" ignoredToc=""><label>Figure 7</label><caption><p>Distribution of normalized scores by main types of academic work</p></caption><graphic xlink:href="https://journals2.ums.ac.id/jramathedu/article/download/4643/3825/45018" mimetype="image" mime-subtype="png"><alt-text>Image</alt-text></graphic></fig><table-wrap id="table-1" ignoredToc=""><label>Table 1</label><caption><p>Student’s rank. Fragment</p></caption><table frame="box" rules="all"><thead><tr><th colspan="1" rowspan="1" style="" align="center" valign="top"><p>Index</p></th><th colspan="1" rowspan="1" style="" align="center" valign="top"><p>Rank</p></th></tr></thead><tbody><tr><td colspan="1" rowspan="1" style="" align="center" valign="top"><p>ВАf5</p></td><td colspan="1" rowspan="1" style="" align="center" valign="top"><p>1</p></td></tr><tr><td colspan="1" rowspan="1" style="" align="center" valign="top"><p>МАf5</p></td><td colspan="1" rowspan="1" style="" align="center" valign="top"><p>2</p></td></tr><tr><td colspan="1" rowspan="1" style="" align="center" valign="top"><p/></td><td colspan="1" rowspan="1" style="" align="center" valign="top"><p>…..</p></td></tr><tr><td colspan="1" rowspan="1" style="" align="center" valign="top"><p>НАm4</p></td><td colspan="1" rowspan="1" style="" align="center" valign="top"><p>49</p></td></tr></tbody></table></table-wrap><table-wrap id="table-2" ignoredToc=""><label>Table 2</label><caption><p>Academic performance.
Fragment</p></caption><table frame="box" rules="all"><thead><tr><th colspan="1" rowspan="1" style="" align="center" valign="top"><p>Index</p></th><th colspan="1" rowspan="1" style="" align="center" valign="top"><p>Campus work progress</p></th><th colspan="1" rowspan="1" style="" align="center" valign="top"><p>Programming skills progress</p></th><th colspan="1" rowspan="1" style="" align="center" valign="top"><p>Activity progress</p></th><th colspan="1" rowspan="1" style="" align="center" valign="top"><p>Test progress</p></th><th colspan="1" rowspan="1" style="" align="center" valign="top"><p>Attestation progress</p></th></tr></thead><tbody><tr><td colspan="1" rowspan="1" style="" align="left" valign="top"><p>ВДmf5   </p></td><td colspan="1" rowspan="1" style="" align="left" valign="top"><p>0.11</p></td><td colspan="1" rowspan="1" style="" align="left" valign="top"><p>0.6</p></td><td colspan="1" rowspan="1" style="" align="left" valign="top"><p>0.2</p></td><td colspan="1" rowspan="1" style="" align="left" valign="top"><p>-0.1</p></td><td colspan="1" rowspan="1" style="" align="left" valign="top"><p>0.13</p></td></tr><tr><td colspan="1" rowspan="1" style="" align="left" valign="top"><p>ВКf5</p></td><td colspan="1" rowspan="1" style="" align="left" valign="top"><p>0.23</p></td><td colspan="1" rowspan="1" style="" align="left" valign="top"><p>0.0</p></td><td colspan="1" rowspan="1" style="" align="left" valign="top"><p>0.0</p></td><td colspan="1" rowspan="1" style="" align="left" valign="top"><p>0.9</p></td><td colspan="1" rowspan="1" style="" align="left" valign="top"><p>0.41</p></td></tr><tr><td colspan="1" rowspan="1" style="" align="left" valign="top"><p>ГАf5</p></td><td colspan="1" rowspan="1" style="" align="left" valign="top"><p>-0.06</p></td><td colspan="1" rowspan="1" style="" align="left" valign="top"><p>0.0</p></td><td colspan="1" rowspan="1" style="" align="left" valign="top"><p>0.0</p></td><td colspan="1" rowspan="1" style="" align="left" 
valign="top"><p>0.1</p></td><td colspan="1" rowspan="1" style="" align="left" valign="top"><p>0.01</p></td></tr><tr><td colspan="1" rowspan="1" style="" align="left" valign="top"><p>MАf5</p></td><td colspan="1" rowspan="1" style="" align="left" valign="top"><p>0.12</p></td><td colspan="1" rowspan="1" style="" align="left" valign="top"><p>0.0</p></td><td colspan="1" rowspan="1" style="" align="left" valign="top"><p>0.4</p></td><td colspan="1" rowspan="1" style="" align="left" valign="top"><p>0.0</p></td><td colspan="1" rowspan="1" style="" align="left" valign="top"><p>-0.11</p></td></tr><tr><td colspan="1" rowspan="1" style="" align="left" valign="top"><p/></td><td colspan="1" rowspan="1" style="" align="left" valign="top"><p>….</p></td><td colspan="1" rowspan="1" style="" align="left" valign="top"><p/></td><td colspan="1" rowspan="1" style="" align="left" valign="top"><p>….</p></td><td colspan="1" rowspan="1" style="" align="left" valign="top"><p/></td><td colspan="1" rowspan="1" style="" align="left" valign="top"><p/></td></tr></tbody></table></table-wrap><p>To provide a more comprehensive view of a student's academic progress, we consider changes in academic performance across various activities, shown in <xref ref-type="table" rid="table-2">Table 2</xref>. Negative values indicate a decrease in score in the second part of the semester. This approach can be particularly useful for educators and academic institutions to gain a deeper understanding of student development.</p><p>By calculating the differences between normalized scores at different time points (e.g., Campus work 2 - Campus work 1, Attestation 2 - Attestation 1, etc.), these new features effectively capture the change in academic performance or achievement in specific activities over a period of time, such as a semester.</p><p>The article (Shahiri et al., 2015) reviews the top four methods for predicting academic performance.
The neural network has the highest prediction accuracy (98%), followed by the decision tree (91%). The support vector machine and the k-nearest neighbor method achieved the same accuracy (83%). Finally, the naive Bayes method has the lowest prediction accuracy (76%). Since the number of scores for each student in the existing dataset is small, it is inefficient to use a neural network. For the available data, it is rational to use one of the binary classification methods.</p></sec><sec><title>Binary classification of academic performance</title><p>At the first stage, we need to classify students into those who will pass or fail the exam. Logistic regression is a binary classification method applicable when the dependent variable is dichotomous. Let's assume that passing the exam is the target event (A. Yu. Vladova, 2024). There is a</p><fig id="figure-7" ignoredToc=""><label>Figure 8</label><caption><p>Results of classifying students into those who passed and those who did not pass the exam</p></caption><graphic xlink:href="https://journals2.ums.ac.id/jramathedu/article/download/4643/3825/45019" mimetype="image" mime-subtype="png"><alt-text>Image</alt-text></graphic></fig><table-wrap id="table-3" ignoredToc=""><label>Table 3</label><caption><p>Feature Importance</p></caption><table frame="box" rules="all"><thead><tr><th colspan="1" rowspan="1" style="" align="center" valign="top"><p>Number</p></th><th colspan="1" rowspan="1" style="" align="center" valign="top"><p>Feature</p></th><th colspan="1" rowspan="1" style="" align="center" valign="top"><p>Importance</p></th></tr></thead><tbody><tr><td colspan="1" rowspan="1" style="" align="center" valign="top"><p>0</p></td><td colspan="1" rowspan="1" style="" align="center" valign="top"><p>Sex</p></td><td colspan="1" rowspan="1" style="" align="center" valign="top"><p>0.502</p></td></tr><tr><td colspan="1" rowspan="1" style="" align="center" valign="top"><p>5</p></td><td colspan="1" 
rowspan="1" style="" align="center" valign="top"><p>Test progress</p></td><td colspan="1" rowspan="1" style="" align="center" valign="top"><p>0.475</p></td></tr><tr><td colspan="1" rowspan="1" style="" align="center" valign="top"><p>3</p></td><td colspan="1" rowspan="1" style="" align="center" valign="top"><p>Programming skills progress</p></td><td colspan="1" rowspan="1" style="" align="center" valign="top"><p>0.254</p></td></tr><tr><td colspan="1" rowspan="1" style="" align="center" valign="top"><p>2</p></td><td colspan="1" rowspan="1" style="" align="center" valign="top"><p>Campus work progress</p></td><td colspan="1" rowspan="1" style="" align="center" valign="top"><p>0.189</p></td></tr><tr><td colspan="1" rowspan="1" style="" align="center" valign="top"><p>1</p></td><td colspan="1" rowspan="1" style="" align="center" valign="top"><p>Group</p></td><td colspan="1" rowspan="1" style="" align="center" valign="top"><p>0.169</p></td></tr><tr><td colspan="1" rowspan="1" style="" align="center" valign="top"><p>4</p></td><td colspan="1" rowspan="1" style="" align="center" valign="top"><p>Activity progress</p></td><td colspan="1" rowspan="1" style="" align="center" valign="top"><p>0.125</p></td></tr><tr><td colspan="1" rowspan="1" style="" align="center" valign="top"><p>7</p></td><td colspan="1" rowspan="1" style="" align="center" valign="top"><p>Attestation progress</p></td><td colspan="1" rowspan="1" style="" align="center" valign="top"><p>0.054</p></td></tr><tr><td colspan="1" rowspan="1" style="" align="center" valign="top"><p>6</p></td><td colspan="1" rowspan="1" style="" align="center" valign="top"><p>Bonus task progress</p></td><td colspan="1" rowspan="1" style="" align="center" valign="top"><p>0.009</p></td></tr></tbody></table></table-wrap><p>labeled dataset - students' scores on various tasks (progress in Campus works, Tests, and Attestations) and their score for the exam. 
The training (80%) and test (20%) datasets are separated from it. The logistic regression model is then trained and evaluated for accuracy as follows: Let X be the vector of input features (students' scores on various tasks), and Y be the binary output (pass/fail the exam). The probability that a student passes the exam is assumed to be:</p><p>P{y=1|x} = f(z), (5)</p><p>where z = θ0 + θ1x1 + … + θnxn, x is the column vector of normalized input feature values, θ is the vector of regression coefficients, and f(z) is the logistic function defined as <inline-formula><tex-math id="math-7"><![CDATA[ \documentclass{article} \usepackage{amsmath} \begin{document} \displaystyle f\left( z \right) = \frac{1}{1 + e^{- z}} \end{document} ]]></tex-math></inline-formula>.</p><p>Visualization of classification results uses student indexes, where learning outcomes and predicted test results are indicated in different colors, as shown in <xref ref-type="fig" rid="figure-7">Figure 8</xref>. The output is a trained logistic regression model capable of predicting whether a student will pass an exam based on their input scores. After training the logistic regression model, the importance of features was estimated by the absolute values of their coefficients, as shown in <xref ref-type="table" rid="table-3">Table 3</xref>.</p><p>Features with higher coefficients are more important in predicting the target variable <xref ref-type="bibr" rid="BIBR-9">(Bruce &amp; Bruce, 2017)</xref>. Therefore, the lowest-ranked feature can be excluded from model training. In addition, we checked its statistical significance with Student's t-test (How to Do a T-Test in Python | Built In, n.d.). There is a negative trend, but the effect of project marks on the exam result does not demonstrate statistical significance (t-statistic = -1.99, p &gt; 0.05). 
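The classification stage described in the text — an 80/20 split, a logistic regression fit on normalized scores, a coefficient-based importance ranking, and a t-test significance check — can be sketched as follows. This is a minimal illustration with scikit-learn and SciPy: the score matrix is synthetic, and the column tested for significance is a hypothetical stand-in for the real feature.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((200, 8))  # synthetic normalized scores for 8 features
# synthetic pass/fail label, loosely tied to one feature
y = (X[:, 5] + 0.3 * rng.random(200) > 0.6).astype(int)

# 80/20 train/test split, as in the text
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = LogisticRegression().fit(X_tr, y_tr)
print("accuracy:", model.score(X_te, y_te))

# feature importance estimated by the absolute value of each coefficient
importance = np.abs(model.coef_[0])
for idx in np.argsort(importance)[::-1]:
    print(idx, round(importance[idx], 3))

# t-test: does the chosen feature differ between the pass and fail groups?
t_stat, p_val = stats.ttest_ind(X[y == 1, 5], X[y == 0, 5])
print(f"t = {t_stat:.2f}, p = {p_val:.3f}")
```

On real data, the score matrix and outcome vector would come from the e-learning platform export rather than a random generator.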
In this case, there is no reason to conclude that project scores have a significant impact on exam passing.</p></sec><sec><title>Academic performance prediction</title><p>The normalized set of input and output features is again broken down into training and test parts. An instance of the linear regression model learns from the training part and makes predictions for the test part. The data is visualized using a scatter plot: <xref ref-type="fig" rid="figure-18ek9q">Figure 9</xref> shows the exam results for several student indexes. To evaluate the performance of the linear regression model, error metrics are calculated on test data: MAE = 0.079, MSE = 0.078, RMSE = 0.088, R squared = 0.89.</p><fig id="figure-18ek9q" ignoredToc=""><label>Figure 9</label><caption><p>Academic performance prediction</p></caption><graphic xlink:href="https://journals2.ums.ac.id/jramathedu/article/download/4643/3825/45020" mimetype="image" mime-subtype="png"><alt-text>Image</alt-text></graphic></fig><fig id="figure-8" ignoredToc=""><label>Figure 10</label><caption><p>Results of classifying students by similarity of scores</p></caption><graphic xlink:href="https://journals2.ums.ac.id/jramathedu/article/download/4643/3825/45010" mimetype="image" mime-subtype="png"><alt-text>Image</alt-text></graphic></fig><p>For the available dataset, the model shows high accuracy because the known test and predicted values are close enough, and the errors do not exceed 9%. The t-statistic of -2.83 and a p-value of 0.01 suggest that there may be a significant difference in the Test progress feature between the groups of males and females being compared <xref ref-type="bibr" rid="BIBR-19">(Olatunde-Aiyedun, 2021)</xref>. 
The low p-value indicates evidence to reject the null hypothesis in favor of a significant difference.</p></sec><sec><title>Clustering students by similarity of scores</title><p>When creating personalized learning plans and identifying successful or unsuccessful learning strategies, it is useful to identify groups of students with similar sets of assessments <xref ref-type="bibr" rid="BIBR-23">(Reiser &amp; Joseph’s College, 2017)</xref>. For this purpose, students are clustered using the k-means method <xref ref-type="bibr" rid="BIBR-33">(Wati et al., 2021)</xref> on the same labeled dataset. Let X be the vector of input features (students' scores for various tasks) and Y be the output feature (cluster number). The initialization of the mass centers of the clusters is random. The algorithm seeks to minimize the total squared deviation of the cluster points from the centers of these clusters:</p><p><inline-formula><tex-math id="math-8"><![CDATA[ \documentclass{article} \usepackage{amsmath} \begin{document} \displaystyle V = \sum_{i = 1}^{k}{\sum_{x \in S_{i}}^{}{(x - \mu_{i})}^{2}} \end{document} ]]></tex-math></inline-formula>, (3)</p><p>where k is the number of clusters, Si are the resulting clusters, i = 1, 2, …, k, and μi are the centers of mass of all vectors x from the cluster Si.</p><p>The steps are repeated until convergence, that is, until the centroids stop changing significantly or until the maximum number of iterations is reached <xref ref-type="bibr" rid="BIBR-7">(Boehmke &amp; Greenwell, 2020)</xref>. 
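A minimal sketch of this clustering step with scikit-learn, assuming a synthetic matrix of normalized scores in place of the real dataset; the cluster count and data shape are illustrative, and the PCA projection yields the two-dimensional coordinates used for plotting.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.random((200, 8))  # synthetic normalized scores

# k-means minimizes the within-cluster sum of squared distances V;
# centroids are initialized randomly and refined until convergence
km = KMeans(n_clusters=17, n_init=10, random_state=0).fit(X)
labels = km.labels_  # cluster number for each student

# PCA projects the 8-dimensional feature space onto 2 components
# with minimal information loss, giving scatter-plot coordinates
coords = PCA(n_components=2).fit_transform(X)
print(coords.shape)  # (200, 2)
```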
The results of clustering are presented in a two-dimensional plot in <xref ref-type="fig" rid="figure-8">Figure 10</xref> using principal component analysis <xref ref-type="bibr" rid="BIBR-1">(Ahmad et al., 2019)</xref>, which reduces the dimensionality of the data space by converting a large set of features into a smaller one with minimal information loss.</p><p>To find the optimal number of clusters, three characteristics are calculated: inertia, silhouette index, and Davies-Bouldin index. Inertia is computed as the sum of the squared distances from each data point to its nearest cluster centroid. It shows how grouped the points are within all the clusters in <xref ref-type="fig" rid="figure-afvjof">Figure 11</xref>. The lower the inertia, the better the model, because more compact and dense clusters usually imply a clearer structure in the data (Rykov et al., 2024). The silhouette index is computed for each data item in a cluster by measuring how close it is to the rest of its cluster compared to the elements of other clusters. The closer the silhouette index value is to one, the better the clusters are</p><fig id="figure-afvjof" ignoredToc=""><label>Figure 11</label><caption><p>Selection of the optimal number of clusters</p></caption><graphic xlink:href="https://journals2.ums.ac.id/jramathedu/article/download/4643/3825/45011" mimetype="image" mime-subtype="png"><alt-text>Image</alt-text></graphic></fig><p>separated. The Davies-Bouldin index is computed as the average ratio of the intra-cluster distances to the distances between cluster centroids. The smaller the Davies-Bouldin index, the better the clustering. As a result, 17 clusters of students with similar scores are created with the following characteristics: inertia 890.47, silhouette index 0.28, Davies-Bouldin index 0.83. Analysis of the inertia graph and the silhouette and Davies-Bouldin indices showed that the optimal number of clusters varies from 16 to 20. 
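The cluster-count search can be sketched as a loop over candidate values of k, computing all three characteristics with scikit-learn; synthetic data stands in for the real score matrix, and the range of k from 16 to 20 follows the interval discussed in the text.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, davies_bouldin_score

rng = np.random.default_rng(0)
X = rng.random((200, 8))  # synthetic normalized scores

for k in range(16, 21):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    sil = silhouette_score(X, km.labels_)      # closer to 1 is better
    dbi = davies_bouldin_score(X, km.labels_)  # smaller is better
    # inertia: within-cluster sum of squared distances (lower is better)
    print(k, round(km.inertia_, 2), round(sil, 2), round(dbi, 2))
```

The educational manager would inspect these three curves and pick a single k inside the recommended interval.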
</p></sec></sec><sec><title>FINDINGS</title><p>To identify methods and key factors influencing academic performance, we performed a literature review. To analyze, customize, and predict academic performance based on the data from e-learning platforms, we proposed a multistage methodology. At the first and second stages, it applies statistical methods to form new features and improve the predictive ability of models. Thus, correlation analysis revealed a strong relationship between a number of features. Therefore, dynamic features that take into account the change in academic performance over time were introduced into the feature space. The problems of predicting exam grades, classifying students into those passing and not passing an exam, and clustering students by sets of grades are solved at the third, fourth, and fifth stages, respectively. The results of the classification of exam grades are as follows: the F1 score (the harmonic mean of precision and recall) for the initial data is 82%. The linear regression model demonstrated the following error values: MAE = 0.1, MSE = 0.02, RMSE = 0.1, R2 = 0.7.</p></sec><sec><title>DISCUSSION</title><p>Classifying, clustering, and predicting academic performance can be useful for multiple stakeholders such as teachers, students, and institutions. For teachers, these tools help identify at-risk students, adapt curricula, and design targeted interventions. Students benefit by gaining insights into their performance trends, enabling better planning of study strategies and group work. Institutions can use these models to identify program-wide trends, evaluate curriculum effectiveness, and allocate resources to address systemic issues.</p><p>This article explores the use of machine learning methods to predict student performance, emphasizing several critical steps in the process. To ensure that all features are on a comparable scale, missing data is addressed, and normalization is applied. 
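These preprocessing steps — filling gaps, min-max normalizing to a common scale, and deriving a dynamic feature as the difference between two time points — can be sketched as follows; the column names and toy values are hypothetical, assuming pandas and scikit-learn.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# hypothetical fragment of the score table with a missing value
df = pd.DataFrame({
    "campus_work_1": [60, 80, None, 70],
    "campus_work_2": [65, 75, 50, 90],
})
df = df.fillna(df.mean())  # address missing data with column means

# min-max normalization brings all features to a comparable [0, 1] scale
scaled = pd.DataFrame(MinMaxScaler().fit_transform(df), columns=df.columns)

# dynamic feature: change in performance over the semester
scaled["campus_work_progress"] = scaled["campus_work_2"] - scaled["campus_work_1"]
print(scaled.round(2))
```

Negative values of the derived progress column mark students whose score dropped in the second part of the semester.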
Key academic components, such as homework, projects, midterm scores, and test scores, serve as predictors, while new features are created to enhance the models' predictive power. The study employs clustering to analyze student behavior, multiple linear regression for performance prediction, and logistic regression for binary classification, such as pass/fail outcomes. The models are evaluated using metrics like mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE) to assess their accuracy. Additionally, Principal Component Analysis (PCA) and Kernel Density Estimation (KDE) are used to visualize the data, providing deeper insights into its structure and distribution. Together, these methods offer a robust framework for understanding and predicting academic performance.</p><p>At the same time, the proposed multi-method has a number of limitations. For example, when performing the first two stages, which include statistical analysis and generation of additional features, it is necessary to make some effort to normalize the data, set a threshold for excluding strongly correlated features, and correctly specify the pairs of same-named features from different time points that are converted into dynamic ones. In addition, when using the methodology in humanities-oriented universities, the Programming skills attribute can be replaced, for example, by attendance.</p><p>At the last stage, the k-means clustering method is used to divide students into groups. This is a simple and straightforward method, which, unlike more modern clustering methods (e.g., DBSCAN <xref ref-type="bibr" rid="BIBR-30">(Vladova, 2024)</xref>, DBCLASD <xref ref-type="bibr" rid="BIBR-26">(Sheikholeslami &amp; Zhang, 1998)</xref>, WaveCluster <xref ref-type="bibr" rid="BIBR-26">(Sheikholeslami &amp; Zhang, 1998)</xref>), involves a separate expert study on the number of clusters. 
This study includes an estimate of the inertia, silhouette index, and Davies-Bouldin index and results in a recommended cluster count interval. Within this interval, the educational manager must select one number – the exact number of clusters. Such a choice requires certain expertise from the decision-maker, but at the same time allows them to take into account the administrative restrictions on the number of groups of students studying in different programs.</p><p>In the subsequent study, it is proposed to exclude from further consideration students at academic risk identified at the first stage <xref ref-type="bibr" rid="BIBR-27">(Shou et al., 2024)</xref>, and also to investigate the impact on academic performance of indications of IP address coincidences when doing homework <xref ref-type="bibr" rid="BIBR-17">(Komosny &amp; Rehman, 2022)</xref>, duration of work, and start and end time of work. In addition, it is necessary to develop dashboards that greatly facilitate model settings and decision-making for managers and teachers of educational institutions.</p></sec><sec><title>CONCLUSION</title><p>The literature review highlights various approaches to predicting student academic performance using machine learning and statistical methods. Researchers emphasize the importance of identifying key factors influencing performance, such as prior academic scores and engagement in e-learning platforms. Various models, including regression analysis and classification techniques, have been utilized, demonstrating mixed success rates, while suggesting the need for further feature selection and data transformation to enhance predictive accuracy. The weaknesses of the approaches include insufficient accuracy of the models, the use of qualitative features, and the influence of experts.</p><p>The study carried out a comprehensive statistical analysis of students' scores for a math discipline. 
This involved assessing data imbalance, identifying gaps, constructing score distribution densities, and exploring the correlation matrix of scores. These analyses provide valuable insights into the distribution and relationships of student scores, laying a strong foundation for further modeling and predictions.</p><p>The study changed the dimensionality of the feature space by normalizing scores to a common scale and creating new features such as student indexes, ranks, and differences between scores at different time points. This expansion of the feature space enhances the richness of the dataset and can potentially lead to more robust and accurate predictive models.</p><p>By developing models to predict students at academic risk and evaluating the statistical significance of the included features, the study addresses the crucial issue of identifying and supporting students who may be at risk of underperforming. This proactive approach can help institutions tailor interventions and support to students who need it most.</p><p>The study's plan to customize consultations for student groups with similar learning trajectories reflects a student-centric approach to enhancing academic performance. By recognizing the diverse needs of student cohorts and tailoring support accordingly, the study aims to foster a more personalized and effective learning environment.</p><p>The approach of predicting exam scores for individual students demonstrates a commitment to providing comprehensive support beyond mere assessment. By leveraging analysis, modeling, and customization of consultations, the study aims to proactively improve students' academic performance levels within university settings.</p><p>The proposed multi-method for analyzing data from electronic platforms shows a picture of student engagement that is close to reality. 
Progress tracking, predictive assessment, and clustering allow educational managers and teachers to assign consultations to groups of students at academic risk and with deteriorating academic performance.</p></sec></body><back><sec sec-type="author-contributions"><title>Author Contributions</title><p>Vladova, A., &amp; Borchyk, K. M. (2024). Predictive analytics of student performance: Multi-method and code. JRAMathEdu (Journal of Research and Advances in Mathematics Education), 9(4), 190-204. https://doi.org/10.23917/jramathedu.v9i4.4643</p></sec><sec><title>Availability of data and materials</title><p>All data are available from https://github.com/avladova/Student-performance-prediction .</p></sec><sec><title>Competing interests</title><p>The authors declare that the publishing of this paper does not involve any conflicts of interest. This work has never been published or offered for publication elsewhere, and it is completely original.</p></sec><sec sec-type="how-to-cite"><title>How to Cite</title><p>Vladova, A., &amp; Borchyk, K. M. (2024). Predictive analytics of student performance: Multi-method and code. JRAMathEdu (Journal of Research and Advances in Mathematics Education), 9(4), 190-204. 
https://doi.org/10.23917/jramathedu.v9i4.4643</p></sec><ref-list><title>References</title><ref id="BIBR-1"><element-citation publication-type="article-journal"><article-title>Principal Component Analysis and Self-Organizing Map Clustering for Student Browsing Behaviour Analysis</article-title><source>Procedia Computer Science</source><volume>163</volume><person-group person-group-type="author"><name><surname>Ahmad</surname><given-names>N.B.</given-names></name><name><surname>Alias</surname><given-names>U.F.</given-names></name><name><surname>Mohamad</surname><given-names>N.</given-names></name><name><surname>Yusof</surname><given-names>N.</given-names></name></person-group><year>2019</year><fpage>550</fpage><lpage>559</lpage><page-range>550-559</page-range><pub-id pub-id-type="doi">10.1016/J.PROCS.2019.12.137</pub-id><ext-link xlink:href="10.1016/J.PROCS.2019.12.137" ext-link-type="doi" xlink:title="Principal Component Analysis and Self-Organizing Map Clustering for Student Browsing Behaviour Analysis">10.1016/J.PROCS.2019.12.137</ext-link></element-citation></ref><ref id="BIBR-2"><element-citation publication-type=""><article-title>A Multiple Linear Regression-Based Approach to Predict Student Performance</article-title><person-group person-group-type="author"><name><surname>Aissaoui</surname><given-names>O.</given-names></name><name><surname>Madani</surname><given-names>Y.</given-names></name><name><surname>Oughdir</surname><given-names>L.</given-names></name><name><surname>Dakkak</surname><given-names>A.</given-names></name><name><surname>ALLIOUI</surname><given-names>E.L.</given-names></name><name name-style="given-only"><given-names>Y.</given-names></name></person-group><year>2020</year><fpage>9</fpage><lpage>23</lpage><page-range>9-23</page-range><pub-id pub-id-type="doi">10.1007/978-3-030-36653-7_2</pub-id><ext-link xlink:href="10.1007/978-3-030-36653-7_2" ext-link-type="doi" xlink:title="A Multiple Linear Regression-Based Approach to Predict Student 
Performance">10.1007/978-3-030-36653-7_2</ext-link></element-citation></ref><ref id="BIBR-3"><element-citation publication-type="article-journal"><article-title>Privacy and e-learning: A pending task</article-title><source>Sustainability (Switzerland)</source><volume>13</volume><issue>16</issue><person-group person-group-type="author"><name><surname>Alier</surname><given-names>M.</given-names></name><name><surname>Casañ Guerrero</surname><given-names>M.J.</given-names></name><name><surname>Amo</surname><given-names>D.</given-names></name><name><surname>Severance</surname><given-names>C.</given-names></name><name><surname>Fonseca</surname><given-names>D.</given-names></name></person-group><year>2021</year><pub-id pub-id-type="doi">10.3390/SU13169206</pub-id><ext-link xlink:href="10.3390/SU13169206" ext-link-type="doi" xlink:title="Privacy and e-learning: A pending task">10.3390/SU13169206</ext-link></element-citation></ref><ref id="BIBR-4"><element-citation publication-type="article-journal"><article-title>Comparison of Logistic Regression and Discriminant Analysis for Classification of Multicollinearity Data</article-title><source>WSEAS TRANSACTIONS ON MATHEMATICS</source><volume>22</volume><person-group person-group-type="author"><name><surname>Araveeporn</surname><given-names>A.</given-names></name></person-group><year>2023</year><fpage>120</fpage><lpage>131</lpage><page-range>120-131</page-range><pub-id pub-id-type="doi">10.37394/23206.2023.22.15</pub-id><ext-link xlink:href="10.37394/23206.2023.22.15" ext-link-type="doi" xlink:title="Comparison of Logistic Regression and Discriminant Analysis for Classification of Multicollinearity Data">10.37394/23206.2023.22.15</ext-link></element-citation></ref><ref id="BIBR-5"><element-citation publication-type="article-journal"><article-title>Forecasting Subscriber Churn: Comparison of Machine Learning Methods</article-title><source>Computer Tools in Education</source><volume>5</volume><person-group 
person-group-type="author"><name><surname>Arzamastsev</surname><given-names>S.A.</given-names></name><name><surname>Bgatov</surname><given-names>M.V.</given-names></name><name><surname>Kartysheva</surname><given-names>E.N.</given-names></name><name><surname>Derkunskii</surname><given-names>V.A.</given-names></name><name><surname>Semenchikov</surname><given-names>D.N.</given-names></name></person-group><year>2018</year><fpage>5</fpage><lpage>23</lpage><page-range>5-23</page-range><ext-link xlink:href="http://cte.eltech.ru/ojs/index.php/kio/article/view/1542" ext-link-type="uri" xlink:title="Forecasting Subscriber Churn: Comparison of Machine Learning Methods">Forecasting Subscriber Churn: Comparison of Machine Learning Methods</ext-link></element-citation></ref><ref id="BIBR-6"><element-citation publication-type="article-journal"><article-title>Profiling students via clustering in a flipped clinical skills course using learning analytics</article-title><source>Medical Teacher</source><volume>45</volume><issue>7</issue><person-group person-group-type="author"><name><surname>Bayazit</surname><given-names>A.</given-names></name><name><surname>Ilgaz</surname><given-names>H.</given-names></name><name><surname>Gönüllü</surname><given-names>İ.</given-names></name><name><surname>Erden</surname><given-names>Ş.</given-names></name></person-group><year>2022</year><fpage>724</fpage><lpage>731</lpage><page-range>724-731</page-range><pub-id pub-id-type="doi">10.1080/0142159x.2022.2152663</pub-id><ext-link xlink:href="10.1080/0142159x.2022.2152663" ext-link-type="doi" xlink:title="Profiling students via clustering in a flipped clinical skills course using learning analytics">10.1080/0142159x.2022.2152663</ext-link></element-citation></ref><ref id="BIBR-7"><element-citation publication-type="chapter"><article-title>Hands-on Machine Learning with R</article-title><source>CRC Press</source><person-group 
person-group-type="author"><name><surname>Boehmke</surname><given-names>B.</given-names></name><name><surname>Greenwell</surname><given-names>B.</given-names></name></person-group><year>2020</year><ext-link xlink:href="https://www.routledge.com/Hands-On-Machine-Learning-with-R/Boehmke-Greenwell/p/book/9781138495685" ext-link-type="uri" xlink:title="Hands-on Machine Learning with R">Hands-on Machine Learning with R</ext-link></element-citation></ref><ref id="BIBR-8"><element-citation publication-type="chapter"><article-title>Machine Learning Algorithms</article-title><source>Packt Publishing</source><person-group person-group-type="author"><name><surname>Bonaccorso</surname><given-names>Giuseppe</given-names></name></person-group><year>2018</year><publisher-name>Packt Publishing Ltd</publisher-name><ext-link xlink:href="https://www.oreilly.com/library/view/machine-learning-algorithms/9781789347999/" ext-link-type="uri" xlink:title="Machine Learning Algorithms">Machine Learning Algorithms</ext-link></element-citation></ref><ref id="BIBR-9"><element-citation publication-type=""><article-title>Practical Statistics for Data Scientists</article-title><person-group person-group-type="author"><name><surname>Bruce</surname><given-names>P.</given-names></name><name><surname>Bruce</surname><given-names>A.</given-names></name></person-group><year>2017</year><ext-link xlink:href="https://www.oreilly.com/library/view/practical-statistics-for/9781491952955/ch04.html" ext-link-type="uri" xlink:title="Practical Statistics for Data Scientists">Practical Statistics for Data Scientists</ext-link></element-citation></ref><ref id="BIBR-10"><element-citation publication-type="paper-conference"><article-title>Comparative Analysis of E-Learning Platforms on The Market</article-title><source>2018 10th International Conference on Electronics, Computers and Artificial Intelligence (ECAI)</source><person-group 
person-group-type="author"><name><surname>Elisabeta</surname><given-names>P.M.</given-names></name><name><surname>Alexandru</surname><given-names>M.R.</given-names></name></person-group><year>2018</year><fpage>1</fpage><lpage>4</lpage><page-range>1-4</page-range><pub-id pub-id-type="doi">10.1109/ECAI.2018.8679004</pub-id><ext-link xlink:href="10.1109/ECAI.2018.8679004" ext-link-type="doi" xlink:title="Comparative Analysis of E-Learning Platforms on The Market">10.1109/ECAI.2018.8679004</ext-link></element-citation></ref><ref id="BIBR-11"><element-citation publication-type="article-journal"><article-title>E-learning recommender system dataset</article-title><source>Data in Brief</source><volume>47</volume><person-group person-group-type="author"><name><surname>Hafsa</surname><given-names>M.</given-names></name><name><surname>Wattebled</surname><given-names>P.</given-names></name><name><surname>Jacques</surname><given-names>J.</given-names></name><name><surname>Jourdan</surname><given-names>L.</given-names></name></person-group><year>2023</year><page-range>108942</page-range><pub-id pub-id-type="doi">10.1016/j.dib.2023.108942</pub-id><ext-link xlink:href="10.1016/j.dib.2023.108942" ext-link-type="doi" xlink:title="E-learning recommender system dataset">10.1016/j.dib.2023.108942</ext-link></element-citation></ref><ref id="BIBR-12"><element-citation publication-type=""><article-title>How to Do a T-Test in Python | Built In</article-title><ext-link xlink:href="https://builtin.com/data-science/t-test-python" ext-link-type="uri" xlink:title="How to Do a T-Test in Python | Built In">How to Do a T-Test in Python | Built In</ext-link></element-citation></ref><ref id="BIBR-13"><element-citation publication-type="paper-conference"><article-title>Robust Kernel Density Estimation with Median-of-Means principle</article-title><source>Proceedings of the 39th International Conference on Machine Learning</source><volume>162</volume><person-group 
person-group-type="author"><name><surname>Humbert</surname><given-names>P.</given-names></name><name><surname>Bars</surname><given-names>B.Le</given-names></name><name><surname>Minvielle</surname><given-names>L.</given-names></name></person-group><year>2022</year><fpage>9444</fpage><lpage>9465</lpage><page-range>9444-9465</page-range><ext-link xlink:href="https://proceedings.mlr.press/v162/humbert22a.html" ext-link-type="uri" xlink:title="Robust Kernel Density Estimation with Median-of-Means principle">Robust Kernel Density Estimation with Median-of-Means principle</ext-link></element-citation></ref><ref id="BIBR-14"><element-citation publication-type="chapter"><article-title>Predicting Students</article-title><source>Academic Performance: Comparing Artificial Neural Network, Decision Tree and Linear Regression. 21st Annual SAS Malaysia Forum</source><person-group person-group-type="author"><name><surname>Ibrahim</surname><given-names>Z.</given-names></name><name><surname>Rusli</surname><given-names>D.</given-names></name></person-group><year>2007</year></element-citation></ref><ref id="BIBR-15"><element-citation publication-type="article-journal"><article-title>Analysis of Changes in the Affective Characteristics and Communicational Skills of Prospective Teachers: Longitudinal Study</article-title><source>International Journal of Progressive Education</source><volume>14</volume><issue>6</issue><person-group person-group-type="author"><name><surname>Kahramanoğlu</surname><given-names>R.</given-names></name></person-group><year>2018</year><fpage>177</fpage><lpage>199</lpage><page-range>177-199</page-range><pub-id pub-id-type="doi">10.29329/IJPE.2018.179.14</pub-id><ext-link xlink:href="10.29329/IJPE.2018.179.14" ext-link-type="doi" xlink:title="Analysis of Changes in the Affective Characteristics and Communicational Skills of Prospective Teachers: Longitudinal Study">10.29329/IJPE.2018.179.14</ext-link></element-citation></ref><ref id="BIBR-16"><element-citation 
publication-type="paper-conference"><article-title>Predicting MOOC Dropout over Weeks Using Machine Learning Methods</article-title><source>Proceedings of the EMNLP 2014 Workshop on Analysis of Large Scale Social Interaction in MOOCs</source><person-group person-group-type="author"><name><surname>Kloft</surname><given-names>M.</given-names></name><name><surname>Stiehler</surname><given-names>F.</given-names></name><name><surname>Zheng</surname><given-names>Z.</given-names></name><name><surname>Pinkwart</surname><given-names>N.</given-names></name></person-group><year>2014</year><fpage>60</fpage><lpage>65</lpage><page-range>60-65</page-range><pub-id pub-id-type="doi">10.3115/v1/W14-4111</pub-id><ext-link xlink:href="10.3115/v1/W14-4111" ext-link-type="doi" xlink:title="Predicting MOOC Dropout over Weeks Using Machine Learning Methods">10.3115/v1/W14-4111</ext-link></element-citation></ref><ref id="BIBR-17"><element-citation publication-type="article-journal"><article-title>A Method for Cheating Indication in Unproctored On-Line Exams</article-title><source>Sensors</source><person-group person-group-type="author"><name><surname>Komosny</surname><given-names>D.</given-names></name><name><surname>Rehman</surname><given-names>S.U.</given-names></name></person-group><year>2022</year><publisher-loc>Basel, Switzerland</publisher-loc><pub-id pub-id-type="doi">10.3390/S22020654</pub-id><ext-link xlink:href="10.3390/S22020654" ext-link-type="doi" xlink:title="A Method for Cheating Indication in Unproctored On-Line Exams">10.3390/S22020654</ext-link></element-citation></ref>
<ref id="BIBR-18"><element-citation publication-type="article-journal"><article-title>Towards intelligent E-learning systems</article-title><source>Education and Information Technologies</source><volume>28</volume><issue>7</issue><person-group person-group-type="author"><name><surname>Liu</surname><given-names>M.</given-names></name><name><surname>Yu</surname><given-names>D.</given-names></name></person-group><year>2023</year><fpage>7845</fpage><lpage>7876</lpage><page-range>7845-7876</page-range><pub-id pub-id-type="doi">10.1007/s10639-022-11479-6</pub-id><ext-link xlink:href="10.1007/s10639-022-11479-6" ext-link-type="doi" xlink:title="Towards intelligent E-learning systems">10.1007/s10639-022-11479-6</ext-link></element-citation></ref><ref id="BIBR-19"><element-citation publication-type="article-journal"><article-title>Student Teachers’ Attitude towards Teaching Practice</article-title><source>International Journal of Culture and Modernity</source><volume>8</volume><person-group person-group-type="author"><name><surname>Olatunde-Aiyedun</surname><given-names>T.</given-names></name></person-group><year>2021</year><fpage>6</fpage><lpage>17</lpage><page-range>6-17</page-range><ext-link xlink:href="http://ijcm.academicjournal.io/index.php/ijcm/article/download/59/58" ext-link-type="uri" xlink:title="Student Teachers’ Attitude towards Teaching Practice">Student Teachers’ Attitude towards Teaching Practice</ext-link></element-citation></ref><ref id="BIBR-20"><element-citation publication-type="article-journal"><article-title>E-Learning Performance Evaluation in Medical Education—A Bibliometric and Visualization Analysis</article-title><source>Healthcare</source><volume>11</volume><person-group 
person-group-type="author"><name><surname>Oluwadele</surname><given-names>D.</given-names></name><name><surname>Singh</surname><given-names>Y.</given-names></name><name><surname>Adeliyi</surname><given-names>T.</given-names></name></person-group><year>2023</year><page-range>232</page-range><pub-id pub-id-type="doi">10.3390/healthcare11020232</pub-id><ext-link xlink:href="10.3390/healthcare11020232" ext-link-type="doi" xlink:title="E-Learning Performance Evaluation in Medical Education—A Bibliometric and Visualization Analysis">10.3390/healthcare11020232</ext-link></element-citation></ref><ref id="BIBR-21"><element-citation publication-type=""><article-title>A Comparison Between the Silhouette Index and the Davies-Bouldin Index in Labelling IDS Clusters</article-title><person-group person-group-type="author"><name><surname>Petrovic</surname><given-names>S.V.</given-names></name></person-group><year>2006</year><ext-link xlink:href="https://citeseerx.ist.psu.edu/document?repid=rep1&amp;type=pdf&amp;doi=b2db00f73fc6b97ebe12e97cfdaefbb2fefc253b" ext-link-type="uri" xlink:title="A Comparison Between the Silhouette Index and the Davies-Bouldin Index in Labelling IDS Clusters">A Comparison Between the Silhouette Index and the Davies-Bouldin Index in Labelling IDS Clusters</ext-link></element-citation></ref><ref id="BIBR-22"><element-citation publication-type="article-journal"><article-title>Predicting students’ performance in e-learning using learning process and behaviour data</article-title><source>Scientific Reports</source><volume>12</volume><issue>1</issue><person-group 
person-group-type="author"><name><surname>Qiu</surname><given-names>F.</given-names></name><name><surname>Zhang</surname><given-names>G.</given-names></name><name><surname>Sheng</surname><given-names>X.</given-names></name><name><surname>Jiang</surname><given-names>L.</given-names></name><name><surname>Zhu</surname><given-names>L.</given-names></name><name><surname>Xiang</surname><given-names>Q.</given-names></name><name><surname>Jiang</surname><given-names>B.</given-names></name><name><surname>Chen</surname><given-names>P.</given-names></name></person-group><year>2022</year><page-range>453</page-range><pub-id pub-id-type="doi">10.1038/s41598-021-03867-8</pub-id><ext-link xlink:href="10.1038/s41598-021-03867-8" ext-link-type="doi" xlink:title="Predicting students’ performance in e-learning using learning process and behaviour data">10.1038/s41598-021-03867-8</ext-link></element-citation></ref><ref id="BIBR-23"><element-citation publication-type="article-journal"><article-title>Blending Individual and Group Assessment: A Model for Measuring Student Performance</article-title><source>Journal of the Scholarship of Teaching and Learning</source><volume>17</volume><issue>4</issue><person-group person-group-type="author"><name><surname>Reiser</surname><given-names>E.</given-names></name></person-group><year>2017</year><fpage>83</fpage><lpage>94</lpage><page-range>83-94</page-range><pub-id pub-id-type="doi">10.14434/JOSOTL.V17I4.21938</pub-id><ext-link xlink:href="10.14434/JOSOTL.V17I4.21938" ext-link-type="doi" xlink:title="Blending Individual and Group Assessment: A Model for Measuring Student Performance">10.14434/JOSOTL.V17I4.21938</ext-link></element-citation></ref><ref id="BIBR-24"><element-citation publication-type="article-journal"><article-title>Inertia-Based Indices to Determine the Number of Clusters in K-Means: An Experimental Evaluation</article-title><source>IEEE 
Access</source><volume>12</volume><person-group person-group-type="author"><name><surname>Rykov</surname><given-names>A.</given-names></name><name><surname>Amorim</surname><given-names>R.C.</given-names></name><name><surname>Makarenkov</surname><given-names>V.</given-names></name><name><surname>Mirkin</surname><given-names>B.</given-names></name></person-group><year>2024</year><fpage>11761</fpage><lpage>11773</lpage><page-range>11761-11773</page-range><pub-id pub-id-type="doi">10.1109/ACCESS.2024.3350791</pub-id><ext-link xlink:href="10.1109/ACCESS.2024.3350791" ext-link-type="doi" xlink:title="Inertia-Based Indices to Determine the Number of Clusters in K-Means: An Experimental Evaluation">10.1109/ACCESS.2024.3350791</ext-link></element-citation></ref><ref id="BIBR-25"><element-citation publication-type="article-journal"><article-title>A Review on Predicting Student’s Performance Using Data Mining Techniques</article-title><source>Procedia Computer Science</source><volume>72</volume><person-group person-group-type="author"><name><surname>Shahiri</surname><given-names>A.</given-names></name><name><surname>Husain</surname><given-names>W.</given-names></name><name><surname>Abdul Rashid</surname><given-names>N.</given-names></name></person-group><year>2015</year><fpage>414</fpage><lpage>422</lpage><page-range>414-422</page-range><pub-id pub-id-type="doi">10.1016/j.procs.2015.12.157</pub-id><ext-link xlink:href="10.1016/j.procs.2015.12.157" ext-link-type="doi" xlink:title="A Review on Predicting Student’s Performance Using Data Mining Techniques">10.1016/j.procs.2015.12.157</ext-link></element-citation></ref><ref id="BIBR-26"><element-citation publication-type="paper-conference"><article-title>A Multi-Resolution Clustering Approach for Very Large Spatial Databases</article-title><source>Proceedings of the 24th VLDB Conference</source><person-group 
person-group-type="author"><name><surname>Sheikholeslami</surname><given-names>G.</given-names></name><name><surname>Zhang</surname><given-names>A.</given-names></name></person-group><year>1998</year><ext-link xlink:href="https://www.vldb.org/conf/1998/p428.pdf" ext-link-type="uri" xlink:title="A Multi-Resolution Clustering Approach for Very Large Spatial Databases">A Multi-Resolution Clustering Approach for Very Large Spatial Databases</ext-link></element-citation></ref><ref id="BIBR-27"><element-citation publication-type="article-journal"><article-title>Predicting Student Performance in Online Learning: A Multidimensional Time-Series Data Analysis Approach</article-title><source>Applied Sciences</source><volume>14</volume><issue>6</issue><person-group person-group-type="author"><name><surname>Shou</surname><given-names>Z.</given-names></name><name><surname>Xie</surname><given-names>M.</given-names></name><name><surname>Mo</surname><given-names>J.</given-names></name><name><surname>Zhang</surname><given-names>H.</given-names></name></person-group><year>2024</year><pub-id pub-id-type="doi">10.3390/app14062522</pub-id><ext-link xlink:href="10.3390/app14062522" ext-link-type="doi" xlink:title="Predicting Student Performance in Online Learning: A Multidimensional Time-Series Data Analysis Approach">10.3390/app14062522</ext-link></element-citation></ref><ref id="BIBR-28"><element-citation publication-type="article-journal"><article-title>Comulang: towards a collaborative e-learning system that supports student group modeling</article-title><source>SpringerPlus</source><volume>2</volume><issue>1</issue><person-group person-group-type="author"><name><surname>Troussas</surname><given-names>C.</given-names></name><name><surname>Virvou</surname><given-names>M.</given-names></name><name><surname>Alepis</surname><given-names>E.</given-names></name></person-group><year>2013</year><page-range>387</page-range><pub-id 
pub-id-type="doi">10.1186/2193-1801-2-387</pub-id><ext-link xlink:href="10.1186/2193-1801-2-387" ext-link-type="doi" xlink:title="Comulang: towards a collaborative e-learning system that supports student group modeling">10.1186/2193-1801-2-387</ext-link></element-citation></ref><ref id="BIBR-29"><element-citation publication-type="article-journal"><article-title>Logistic Regression Model for the Academic Performance of First-Year Medical Students in the Biomedical Area</article-title><source>Creative Education</source><volume>07</volume><person-group person-group-type="author"><name><surname>Urrutia-Aguilar</surname><given-names>M.</given-names></name><name><surname>Fuentes-Garcia</surname><given-names>R.</given-names></name><name><surname>Martinez</surname><given-names>D.</given-names></name><name><surname>Beck</surname><given-names>E.</given-names></name><name><surname>Ortiz</surname><given-names>S.</given-names></name><name><surname>Guevara-Guzmán</surname><given-names>R.</given-names></name></person-group><year>2016</year><fpage>2202</fpage><lpage>2211</lpage><page-range>2202-2211</page-range><pub-id pub-id-type="doi">10.4236/ce.2016.715217</pub-id><ext-link xlink:href="10.4236/ce.2016.715217" ext-link-type="doi" xlink:title="Logistic Regression Model for the Academic Performance of First-Year Medical Students in the Biomedical Area">10.4236/ce.2016.715217</ext-link></element-citation></ref><ref id="BIBR-30"><element-citation publication-type="article-journal"><article-title>Developing group and individual performance paths based on e-learning platform data</article-title><source>Large-Scale Systems Control (UBS)</source><volume>111</volume><person-group person-group-type="author"><name><surname>Vladova</surname><given-names>A.Yu</given-names></name></person-group><year>2024</year><fpage>179</fpage><lpage>196</lpage><page-range>179-196</page-range><pub-id pub-id-type="doi">10.25728/ubs.2024.111.7</pub-id><ext-link xlink:href="10.25728/ubs.2024.111.7" 
ext-link-type="doi" xlink:title="Developing group and individual performance paths based on e-learning platform data">10.25728/ubs.2024.111.7</ext-link></element-citation></ref><ref id="BIBR-31"><element-citation publication-type="article-journal"><article-title>Data preprocessing for machine analysis of sales representatives’ key performance indicators</article-title><source>Business Informatics</source><volume>15</volume><issue>3</issue><person-group person-group-type="author"><name><surname>Vladova</surname><given-names>A.</given-names></name><name><surname>Shek</surname><given-names>E.</given-names></name></person-group><year>2021</year><fpage>48</fpage><lpage>59</lpage><page-range>48-59</page-range><pub-id pub-id-type="doi">10.17323/2587-814X.2021.3.48.59</pub-id><ext-link xlink:href="10.17323/2587-814X.2021.3.48.59" ext-link-type="doi" xlink:title="Data preprocessing for machine analysis of sales representatives’ key performance indicators">10.17323/2587-814X.2021.3.48.59</ext-link></element-citation></ref><ref id="BIBR-32"><element-citation publication-type="paper-conference"><article-title>Visualizing Results of Promoting Campaigns</article-title><source>14th International Conference Management of Large-Scale System Development (MLSD)</source><person-group person-group-type="author"><name><surname>Vladova</surname><given-names>A.Yu</given-names></name><name><surname>Vladov</surname><given-names>Yu R.</given-names></name><name><surname>Yakimov</surname><given-names>A.I.</given-names></name></person-group><year>2021</year><fpage>1</fpage><lpage>4</lpage><page-range>1-4</page-range><pub-id pub-id-type="doi">10.1109/MLSD52249.2021.9600205</pub-id><ext-link xlink:href="10.1109/MLSD52249.2021.9600205" ext-link-type="doi" xlink:title="Visualizing Results of Promoting Campaigns">10.1109/MLSD52249.2021.9600205</ext-link></element-citation></ref><ref id="BIBR-33"><element-citation publication-type="article-journal"><article-title>Analysis K-Means Clustering to 
Predicting Student Graduation</article-title><source>Journal of Physics: Conference Series</source><volume>1844</volume><issue>1</issue><person-group person-group-type="author"><name><surname>Wati</surname><given-names>M.</given-names></name><name><surname>Rahmah</surname><given-names>W.H.</given-names></name><name><surname>Novirasari</surname><given-names>N.</given-names></name><name name-style="given-only"><given-names>Haviluddin</given-names></name><name><surname>Budiman</surname><given-names>E.</given-names></name><name name-style="given-only"><given-names>Islamiyah</given-names></name></person-group><year>2021</year><page-range>012028</page-range><pub-id pub-id-type="doi">10.1088/1742-6596/1844/1/012028</pub-id><ext-link xlink:href="10.1088/1742-6596/1844/1/012028" ext-link-type="doi" xlink:title="Analysis K-Means Clustering to Predicting Student Graduation">10.1088/1742-6596/1844/1/012028</ext-link></element-citation></ref><ref id="BIBR-34"><element-citation publication-type="article-journal"><article-title>Kernel density estimation and its application</article-title><source>ITM Web of Conferences</source><volume>23</volume><person-group person-group-type="author"><name><surname>Węglarczyk</surname><given-names>S.</given-names></name></person-group><year>2018</year><page-range>00037</page-range><pub-id pub-id-type="doi">10.1051/ITMCONF/20182300037</pub-id><ext-link xlink:href="10.1051/ITMCONF/20182300037" ext-link-type="doi" xlink:title="Kernel density estimation and its application">10.1051/ITMCONF/20182300037</ext-link></element-citation></ref><ref id="BIBR-35"><element-citation publication-type=""><article-title>Prediction of Student Performance Using Machine Learning Techniques: A Review</article-title><person-group person-group-type="author"><name><surname>Yadav</surname><given-names>N.</given-names></name><name><surname>Deshmukh</surname><given-names>S.</given-names></name></person-group><year>2023</year><fpage>735</fpage><lpage>741</lpage><page-range>735-741</page-range><pub-id 
pub-id-type="doi">10.2991/978-94-6463-136-4_63</pub-id><ext-link xlink:href="10.2991/978-94-6463-136-4_63" ext-link-type="doi" xlink:title="Prediction of Student Performance Using Machine Learning Techniques: A Review">10.2991/978-94-6463-136-4_63</ext-link></element-citation></ref><ref id="BIBR-36"><element-citation publication-type="article-journal"><article-title>Educational data mining: prediction of students’ academic performance using machine learning algorithms</article-title><source>Smart Learning Environments</source><volume>9</volume><issue>1</issue><person-group person-group-type="author"><name><surname>Yağcı</surname><given-names>M.</given-names></name></person-group><year>2022</year><page-range>11</page-range><pub-id pub-id-type="doi">10.1186/s40561-022-00192-z</pub-id><ext-link xlink:href="10.1186/s40561-022-00192-z" ext-link-type="doi" xlink:title="Educational data mining: prediction of students’ academic performance using machine learning algorithms">10.1186/s40561-022-00192-z</ext-link></element-citation></ref><ref id="BIBR-37"><element-citation publication-type="article-journal"><article-title>Predicting Students’ Academic Performance Using Multiple Linear Regression and Principal Component Analysis</article-title><source>Journal of Information Processing</source><volume>26</volume><person-group person-group-type="author"><name><surname>Yang</surname><given-names>S.J.H.</given-names></name><name><surname>Lu</surname><given-names>O.H.T.</given-names></name><name><surname>Huang</surname><given-names>A.Y.Q.</given-names></name><name><surname>Huang</surname><given-names>J.C.H.</given-names></name><name><surname>Ogata</surname><given-names>H.</given-names></name><name><surname>Lin</surname><given-names>A.J.Q.</given-names></name></person-group><year>2018</year><fpage>170</fpage><lpage>176</lpage><page-range>170-176</page-range><pub-id pub-id-type="doi">10.2197/IPSJJIP.26.170</pub-id><ext-link xlink:href="10.2197/IPSJJIP.26.170" ext-link-type="doi" 
xlink:title="Predicting Students’ Academic Performance Using Multiple Linear Regression and Principal Component Analysis">10.2197/IPSJJIP.26.170</ext-link></element-citation></ref><ref id="BIBR-38"><element-citation publication-type="article-journal"><article-title>SA-FEM: Combined Feature Selection and Feature Fusion for Students’ Performance Prediction</article-title><source>Sensors</source><volume>22</volume><issue>22</issue><person-group person-group-type="author"><name><surname>Ye</surname><given-names>M.</given-names></name><name><surname>Sheng</surname><given-names>X.</given-names></name><name><surname>Lu</surname><given-names>Y.</given-names></name><name><surname>Zhang</surname><given-names>G.</given-names></name><name><surname>Chen</surname><given-names>H.</given-names></name><name><surname>Jiang</surname><given-names>B.</given-names></name><name><surname>Zou</surname><given-names>S.</given-names></name><name><surname>Dai</surname><given-names>L.</given-names></name></person-group><year>2022</year><page-range>8838</page-range><pub-id pub-id-type="doi">10.3390/s22228838</pub-id><ext-link xlink:href="10.3390/s22228838" ext-link-type="doi" xlink:title="SA-FEM: Combined Feature Selection and Feature Fusion for Students’ Performance Prediction">10.3390/s22228838</ext-link></element-citation></ref><ref id="BIBR-39"><element-citation publication-type="article-journal"><article-title>Predict Students’ Academic Performance based on their Assessment Grades and Online Activity Data</article-title><source>International Journal of Advanced Computer Science and Applications</source><volume>11</volume><person-group person-group-type="author"><name><surname>Zafar</surname><given-names>B.</given-names></name><name><surname>Alhassan</surname><given-names>A.</given-names></name><name><surname>Mueen</surname><given-names>A.</given-names></name></person-group><year>2020</year><pub-id pub-id-type="doi">10.14569/IJACSA.2020.0110425</pub-id><ext-link 
xlink:href="10.14569/IJACSA.2020.0110425" ext-link-type="doi" xlink:title="Predict Students’ Academic Performance based on their Assessment Grades and Online Activity Data">10.14569/IJACSA.2020.0110425</ext-link></element-citation></ref><ref id="BIBR-40"><element-citation publication-type="article-journal"><article-title>Text search of surnames in some Slavic and other morphologically rich languages using rule based phonetic algorithms</article-title><source>IEEE Transactions on Audio, Speech and Language Processing</source><volume>23</volume><issue>3</issue><person-group person-group-type="author"><name><surname>Zahoranský</surname><given-names>D.</given-names></name><name><surname>Polasek</surname><given-names>I.</given-names></name></person-group><year>2015</year><fpage>553</fpage><lpage>563</lpage><page-range>553-563</page-range><pub-id pub-id-type="doi">10.1109/TASLP.2015.2393393</pub-id><ext-link xlink:href="10.1109/TASLP.2015.2393393" ext-link-type="doi" xlink:title="Text search of surnames in some Slavic and other morphologically rich languages using rule based phonetic algorithms">10.1109/TASLP.2015.2393393</ext-link></element-citation></ref><ref id="BIBR-41"><element-citation publication-type="article-journal"><article-title>Educational Data Mining Techniques for Student Performance Prediction: Method Review and Comparison Analysis</article-title><source>Frontiers in Psychology</source><volume>12</volume><person-group person-group-type="author"><name><surname>Zhang</surname><given-names>Y.</given-names></name><name><surname>Yun</surname><given-names>Y.</given-names></name><name><surname>An</surname><given-names>R.</given-names></name><name><surname>Cui</surname><given-names>J.</given-names></name><name><surname>Dai</surname><given-names>H.</given-names></name><name><surname>Shang</surname><given-names>X.</given-names></name></person-group><year>2021</year><pub-id pub-id-type="doi">10.3389/fpsyg.2021.698490</pub-id><ext-link xlink:href="10.3389/fpsyg.2021.698490" ext-link-type="doi" xlink:title="Educational Data Mining Techniques for Student Performance Prediction: Method Review and Comparison Analysis">10.3389/fpsyg.2021.698490</ext-link></element-citation></ref></ref-list></back></article>
