-Second year

-Group project

The aim of the Big Data module was to teach students the basics of data analysis, including descriptive statistics, inferential statistics and correlations, using MATLAB. The assessed project focused on machine learning for predictive modelling.

In groups, students were required to select a dataset to analyse. Our group chose a dataset containing social, gender and study data on over 600 students taking a Portuguese course at secondary school. The aim was to predict, from a student's attributes, whether their performance was likely to be above or below average. Such a model could then be used to identify students at risk of underperforming and give them extra support to reduce the risk of exam failure.

The final model shows that the most influential attributes for academic performance include a student's desire to continue into higher education, whether their father completed secondary school, and their father's vocation. The model was 70% accurate, i.e. in 70% of cases it correctly predicted from a student's attributes whether they would fail.
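To make the two quantities behind this result concrete, here is a minimal sketch assuming the model was a decision tree that chooses splits by Gini impurity (as the "Gini tree" label suggests). It is written in Python for illustration, not in the MATLAB used on the module. The function names and the toy labels are hypothetical and are not the project's code or data.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(left, right):
    """Size-weighted Gini impurity of the two child nodes produced by a split."""
    n = len(left) + len(right)
    return (len(left) * gini_impurity(left) + len(right) * gini_impurity(right)) / n


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Made-up toy labels (assumption): three above-average and three below-average students.
parent = ["above", "above", "above", "below", "below", "below"]

# A hypothetical split on one attribute, e.g. "wants to continue into higher education".
left = ["above", "above", "above", "below"]
right = ["below", "below"]

parent_gini = gini_impurity(parent)
child_gini = split_impurity(left, right)
impurity_reduction = parent_gini - child_gini  # larger means a more useful split

# Accuracy on a made-up set of four predictions, three of which are correct.
toy_accuracy = accuracy(
    ["above", "below", "below", "above"],
    ["above", "below", "above", "above"],
)

print(parent_gini, child_gini, impurity_reduction, toy_accuracy)
```

An attribute that produces a large impurity reduction near the top of the tree is what the model reports as influential. Accuracy is the share of students whose predicted class matches their actual class.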

The module was useful in highlighting how effective data analysis can be at finding solutions to real-world problems.

Gini tree