Over the last two decades, advances in
information technology have led to innovations such as the Big Data revolution,
which has had increasingly visible effects on societies around the world. We live in an era where
almost everything we do is connected to data. Our job as data scientists is to think critically,
be reflective and minimise biases in our results. Despite the exponential growth of research
on Big Data in various areas, there is only a limited body of research on the use of Big Data in
higher education. To draw insight from the educational
data we have gathered and to solve problems, we divide the data science process into
four main stages: 1. Data collection and processing, 2. Data analysis and algorithms, 3. Data visualisation
and 4. Decision making. So what challenges or biases may
arise during these stages? Let’s take analysing the teaching quality of
universities as an example. First of all, with the aim of understanding
student satisfaction and their evaluation of teaching quality, data scientists might choose to collect
data by emailing questionnaires to students. Since filling in the questionnaire is
optional, some students might ignore it. On the other hand, students who are dissatisfied
with the teaching quality are likely to participate more actively. If the majority of the feedback
is then negative, data scientists may interpret this as evidence of poor quality. In addition,
data scientists often remove blank values when structuring sample data. In this case, the
students who remain neutral would no longer be represented in the sample. In the data analysis and algorithms stage,
the next challenge is to choose a transparent algorithm with clear evaluation rules. For
instance, when data scientists design questionnaires, they should select an algorithm that has
clear and understandable assessment rules. If teachers have any questions about the results,
it will then be easy for them to find the reasons and give feedback.
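As a rough illustration of what clear and understandable assessment rules could look like, here is a minimal Python sketch of a weighted scoring rule that reports how much each question contributed to the overall score. The question names, the weights and the 1 to 5 scale are all hypothetical; they are assumptions made for this example and do not come from any real questionnaire.

```python
# Hypothetical weights for three hypothetical questionnaire items.
# They are illustrative assumptions and sum to 1.
WEIGHTS = {"clarity": 0.4, "feedback": 0.3, "organisation": 0.3}


def score_response(answers: dict[str, int]) -> tuple[float, dict[str, float]]:
    """Return the overall score and the contribution of each question.

    `answers` maps each question name to a rating on an assumed 1-5 scale.
    """
    contributions = {q: WEIGHTS[q] * answers[q] for q in WEIGHTS}
    return sum(contributions.values()), contributions


# One made-up response, used only to show how the rule can be inspected.
total, parts = score_response({"clarity": 4, "feedback": 2, "organisation": 5})

# The weakest contribution shows where a low score is coming from.
weakest = min(parts, key=parts.get)
print(f"overall: {total:.2f}, weakest area: {weakest}")
```

Because every contribution is visible, a teacher who questions the overall score can see which question pulled it down and respond to that specific point.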
Data scientists then need to be reflective: collecting feedback and using errors to train
models, so that the statistical engine does not keep spinning out faulty results. Moving on to data visualisation, data scientists
should visualise the information appropriately. Compare the following charts: the one on the
left shows only the teaching quality score for engineering degrees, instead of displaying
both the overall score and the engineering score, as the chart on the right does. This might
confuse and mislead students who care more about the overall teaching quality of the whole
university into making the wrong decision. Finally, decision making means
working towards key goals by leveraging verified, analysed data rather than merely shooting
in the dark. In fact, the Teaching metric is usually measured by the five performance
indicators shown on the screen. But some data scientists might choose to define teaching
quality based on a university’s Nobel Prize winners or significant inventions and achievements.
Such a decision is biased and unfair to newly established
universities. Bias is sometimes unavoidable because of funding,
politics or resource constraints. However, because of the biases and mistakes that data
scientists may introduce, the result could be significantly different from what it should have been.
As data scientists, we have to be critical and reflective; we can’t just ignore the biases.
Recognising the types of biases and challenges, and understanding their impact on the conclusions,
will improve the quality of those conclusions and make us better data scientists.
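To make the questionnaire example from the data collection stage concrete, here is a minimal Python sketch of how removing blank values can shift a result. The ratings are made-up numbers on an assumed 1 to 5 scale, and treating a blank answer as a neutral score of 3 is purely an assumption for the comparison, not a recommended way to fill in missing data.

```python
from statistics import mean

# Hypothetical ratings on an assumed 1-5 scale; None marks a blank answer.
responses = [1, 2, None, None, 2, 1, None, 5, None, 2]

# Common cleaning step: drop the blank values before analysing.
answered = [r for r in responses if r is not None]
response_rate = len(answered) / len(responses)
mean_after_dropping = mean(answered)

# Sensitivity check under the assumption that blank answers came from
# neutral students, represented here by the midpoint score of 3.
NEUTRAL = 3
mean_if_neutral = mean(NEUTRAL if r is None else r for r in responses)

print(f"response rate: {response_rate:.0%}")
print(f"mean after dropping blanks: {mean_after_dropping:.2f}")
print(f"mean if blanks were neutral: {mean_if_neutral:.2f}")
```

Reporting the response rate next to the average, and checking how far the average moves under a different assumption about the blanks, is one simple way to see how much the conclusion depends on who chose to answer.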