Recommendation @ EdX


I am in the last semester of my diploma thesis, "Recommendation of new questions in online student communities". In short, we recommend new questions to the most suitable users in the educational domain. The recommendation runs in a community question answering (CQA) system, a kind of discussion forum that students use to discuss and solve problems in Massive Open Online Courses (MOOCs: online education similar to university courses but open to many people), for example on the online learning platforms Coursera, Udacity or EdX. The aim of the work is to help new questions get answered. Moreover, it supports interaction among students, which leads to better learning. Our approach is designed specifically for the educational domain; its novelty lies in using data from the online course and in explicitly modelling users’ willingness to answer.

Question recommendation design

The inputs to the question routing framework are new questions and users’ activities in the CQA system and in the MOOC. The output for a new question is a list of recommended answerers, ranked by how likely they are to answer it.

Figure: Question recommendation pipeline.

First, the question title and body are concatenated and preprocessed: tokenization, stop-word removal, and stemming with the Snowball stemmer. After preprocessing, the question profile is created as a bag-of-words model and its latent topics are inferred with LDA.

from gensim import utils

def preprocess_document(self, text):
    # Tokenize, lowercase and strip accents, then stem every token
    words = [self.preprocess_word(word)
             for word in utils.tokenize(text, lowercase=True, deacc=True)]
    # Drop stop words and tokens that are too short
    return [word for word in words if self.is_valid_word(word)]

def preprocess_word(self, word):
    # self.stemmer is the Snowball stemmer
    return self.stemmer.stem(word)

def is_valid_word(self, word):
    return len(word) >= MIN_WORD_LENGTH and word not in self.stop

Text processing with gensim.
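The bag-of-words profile, and a text similarity computed on top of it, can be illustrated with a minimal standard-library sketch. It assumes the tokens are already preprocessed. The function names `bag_of_words` and `cosine_similarity` and the example tokens are hypothetical, and plain term counts stand in here for the LDA topics used in the actual method.

```python
import math
from collections import Counter


def bag_of_words(tokens):
    """Map each token to the number of times it occurs."""
    return Counter(tokens)


def cosine_similarity(bow_a, bow_b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * bow_b.get(term, 0) for term, count in bow_a.items())
    norm_a = math.sqrt(sum(c * c for c in bow_a.values()))
    norm_b = math.sqrt(sum(c * c for c in bow_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical, already stemmed tokens
question = bag_of_words(["quantum", "key", "distribut", "protocol"])
user = bag_of_words(["quantum", "key", "key", "entangl"])
similarity = cosine_similarity(question, user)
```

The score is 0 when the two profiles share no terms and approaches 1 as their term distributions align.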

We model users’ expertise and willingness to answer with the following features, which are updated in real time:

| Expertise features | Willingness features |
|---|---|
| Question-user text similarity | Total answer count |
| Answers count within a week category | Total comments count |
| Answers count within a topic category | Total questions count |
| Votes count within a week category | Total votes earned |
| Votes count within a topic category | Answers count in recent period |
| Total knowledge gap | Last answer time |
| Knowledge gap within a week category | Average CQA activity |
| Knowledge gap within a topic category | Average course activity |
| Portion of seen lectures within a week category | Course registration date |
| Portion of seen lectures within a topic category | Seen questions within a week category |
| Grade | Seen questions within a topic category |
| | Portion of seen lectures within a week category |
| | Portion of seen lectures within a topic category |
| | Lecture freshness |

The Askalot CQA system is developed in the Ruby on Rails web framework. We used this framework to implement the modules that show the recommendations to users. The listeners that listen for new events and update the features in the database are written in Ruby. Askalot uses PostgreSQL as its database system, which we use to persist and load the per-user features consumed by the question routing method. For text processing and classification, we use the Python libraries gensim and scikit-learn.
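The idea of updating features on each new event can be sketched as follows. This is only an illustration in Python (the real listeners are written in Ruby and persist to PostgreSQL). The event names, the feature names and the in-memory `features` store are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical mapping from event type to the counter feature it increments
COUNTER_FEATURES = {
    "answer_created": "total_answers",
    "comment_created": "total_comments",
    "question_created": "total_questions",
    "vote_received": "total_votes_earned",
}

# In-memory stand-in for the per-user feature table
features = defaultdict(lambda: defaultdict(int))


def handle_event(event_type, user_id, created_at):
    """Update the willingness features of one user for a single event."""
    feature = COUNTER_FEATURES.get(event_type)
    if feature is None:
        return  # event type that does not affect any tracked feature
    features[user_id][feature] += 1
    if event_type == "answer_created":
        features[user_id]["last_answer_time"] = created_at


handle_event("answer_created", 42, datetime(2017, 1, 20, 10, 30))
handle_event("comment_created", 42, datetime(2017, 1, 20, 10, 35))
```

Because every event touches only a few counters of a single user, the features stay current without recomputing them from the full activity history.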

import numpy as np

def predict(self, X_exp, X_will):
    # Per-candidate predictions of the expertise and willingness classifiers
    exp_predictions = self.exp_clf.predict(self.baseline, X_exp)
    will_predictions = self.will_clf.predict(self.baseline, X_will)

    # Joint score for having both the knowledge and the willingness to answer
    probabilities = exp_predictions * will_predictions

    # Candidate indices sorted by descending joint score
    indices = np.argsort(probabilities)[::-1]
    return indices, exp_predictions, will_predictions

Combination of the two classifiers into an ensemble classifier that predicts the probability of a student having both the knowledge and the willingness to answer.
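A small worked example may make the combination more concrete. The sketch below uses only the standard library. The function name `rank_candidates` and the probabilities are made up for illustration and are not taken from the experiment.

```python
def rank_candidates(exp_probs, will_probs):
    """Return candidate positions sorted by descending joint probability."""
    joint = [e * w for e, w in zip(exp_probs, will_probs)]
    order = sorted(range(len(joint)), key=lambda i: joint[i], reverse=True)
    return order, joint


# Hypothetical probabilities for three candidate answerers
exp_probs = [0.9, 0.6, 0.8]    # P(user has the knowledge)
will_probs = [0.2, 0.7, 0.5]   # P(user is willing to answer)
order, joint = rank_candidates(exp_probs, will_probs)
```

Here the candidate at position 0 has the highest expertise but ends up last in `order`, because the low willingness estimate pulls the product down. Multiplying the two estimates treats them as independent, which is a simplifying assumption.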

A/B experiment at EdX

We designed and implemented the recommendation in the CQA system Askalot. In cooperation with Harvard University and TU Delft, we deployed it in their course Quantum Cryptography on the EdX platform. More than 4500 students signed up for the course.

| Metric | Count |
|---|---|
| Questions | 473 |
| Answers | 426 |
| Registered students | 4511 |
| Contributing students | 298 |

We divided the users into three groups:

  • Educational-specific group - recommendation by our method
  • Baseline group - question routing without the educational-specific features
  • Control group - no question recommendation

Each new question is recommended to 10 users in the educational-specific group and to 10 users in the baseline group. A user can receive at most 4 recommendations per 7 days.
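The two limits can be sketched as a simple selection step over a ranked candidate list. This is an illustration under stated assumptions, not the deployed code: it assumes a sliding 7-day window and that a user who has hit the limit is skipped in favour of the next-ranked candidate. The names `select_recipients` and `history` are hypothetical.

```python
from datetime import datetime, timedelta

MAX_PER_QUESTION = 10            # users notified per question and group
MAX_PER_USER = 4                 # recommendations per user ...
WINDOW = timedelta(days=7)       # ... within this sliding window (assumed)


def select_recipients(ranked_users, history, now):
    """Pick the top-ranked users who are still under their limit.

    ranked_users: user ids, best candidate first
    history: dict mapping user id to a list of past recommendation times
    """
    recipients = []
    for user_id in ranked_users:
        recent = [t for t in history.get(user_id, []) if now - t < WINDOW]
        if len(recent) < MAX_PER_USER:
            recipients.append(user_id)
            history.setdefault(user_id, []).append(now)
        if len(recipients) == MAX_PER_QUESTION:
            break
    return recipients


now = datetime(2017, 1, 29, 12, 0)
# User 1 already received 4 recommendations in the last 7 days
history = {1: [now - timedelta(days=d) for d in (1, 2, 3, 4)]}
recipients = select_recipients(list(range(1, 15)), history, now)
```

In this run user 1 is skipped and the question goes to users 2 through 11.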

Forms of recommendation

New questions are recommended through the Askalot notification system, and recent recommendations are listed on the Askalot dashboard. In addition, recommended questions are highlighted in the list of all questions.

Figure: Notification view.

Figure: Recommended questions on the dashboard.

Next steps

In the final semester, we are going to evaluate the results of the A/B experiment: the accuracy of the recommendations and their overall impact on the student community.

Further reading

Deployment of Askalot to EdX

Written on January 29, 2017