Recommendation @ EdX
I am in the last semester of my diploma thesis, Recommendation of new questions in online student communities. In short, we recommend new questions to the most suitable users in the educational domain. The recommendation runs in a CQA (community question answering) system, a kind of discussion board or forum that students use to discuss and solve problems in Massive Open Online Courses (MOOCs: online education similar to university courses but open to many people) on learning platforms such as Coursera, Udacity or EdX. The aim of the work is to help get new questions answered. It also supports interaction among students, which leads to better learning. Our approach is designed specifically for the educational domain; its innovation is the use of data from the online course and explicit modelling of users’ willingness to answer.
Question recommendation design
The inputs to the question routing framework are new questions and users’ activities in the CQA system and the MOOC. The output for a new question is a list of recommended answerers, ranked by how likely they are to answer the question.
First, the question title and body are concatenated and preprocessed by tokenization, stop-word removal and stemming with the Snowball stemmer. After preprocessing, the question profile is created as a bag-of-words model and LDA latent topics are inferred.
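    from gensim import utils

    # Methods of the preprocessing class. self.stemmer is the Snowball stemmer,
    # self.stop is the stop-word collection, MIN_WORD_LENGTH is defined elsewhere.
    def preprocess_document(self, text):
        # Tokenize (lowercased, accents removed), stem, then drop invalid words
        words = [self.preprocess_word(word)
                 for word in utils.tokenize(text, lowercase=True, deacc=True)]
        return [word for word in words if self.is_valid_word(word)]

    def preprocess_word(self, word):
        return self.stemmer.stem(word)

    def is_valid_word(self, word):
        # Keep words that are long enough and are not stop words
        return len(word) >= MIN_WORD_LENGTH and word not in self.stop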
We model users’ expertise and willingness to answer with the following features, which are updated in real time:
| Expertise features | Willingness features |
|---|---|
| Question-user text similarity | Total answer count |
| Answers count within a week category | Total comments count |
| Answers count within a topic category | Total questions count |
| Votes count within a week category | Total votes earned |
| Votes count within a topic category | Answers count in recent period |
| Total knowledge gap | Last answer time |
| Knowledge gap within a week category | Average CQA activity |
| Knowledge gap within a topic category | Average course activity |
| Portion of seen lectures within a week category | Course registration date |
| Portion of seen lectures within a topic category | Seen questions within a week category |
| Grade | Seen questions within a topic category |
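To make the first expertise feature more concrete, here is a minimal sketch of a question-user text similarity over bag-of-words profiles. The post does not say which similarity measure is used, so cosine similarity is an assumption here, and the function names and sample tokens are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(tokens):
    """Count how often each (already preprocessed) token occurs."""
    return Counter(tokens)


def cosine_similarity(profile_a, profile_b):
    """Cosine similarity between two sparse term-count vectors."""
    shared = set(profile_a) & set(profile_b)
    dot = sum(profile_a[t] * profile_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in profile_a.values()))
    norm_b = math.sqrt(sum(c * c for c in profile_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical stemmed tokens for a new question and for a user's past posts
question = bag_of_words(["quantum", "key", "distribut", "protocol"])
user = bag_of_words(["quantum", "key", "quantum", "entangl"])
similarity = cosine_similarity(question, user)
```

The user profile here stands in for the text of a user's previous questions and answers; how the real profile is assembled is not described in this post.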
The Askalot CQA system is developed in the Ruby on Rails web framework. We used this framework to implement the modules that show recommendations to users. The listeners that react to new events and update the features in the database are written in Ruby. Askalot uses PostgreSQL as its database system, and we use it to persist and load the per-user features consumed by the question routing method. For text processing and classification, we use the Python libraries gensim and scikit-learn.
    import numpy as np

    def predict(self, X_exp, X_will):
        # Predict expertise and willingness separately for every candidate user
        exp_predictions = self.exp_clf.predict(self.baseline, X_exp)
        will_predictions = self.will_clf.predict(self.baseline, X_will)
        # The combined score is the product of both predictions
        probabilities = np.asarray(exp_predictions) * np.asarray(will_predictions)
        # Sort user indices in descending order of the combined score
        indices = np.argsort(probabilities)[::-1]
        return indices, exp_predictions, will_predictions
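The effect of multiplying the two predictions can be seen on a toy example. The sketch below is not the Askalot code; the user names, the probability values and the `rank_answerers` helper are made up for illustration.

```python
def rank_answerers(expertise, willingness, top_k=10):
    """Return user ids sorted by expertise * willingness, best first."""
    scores = {
        user: expertise[user] * willingness[user]
        for user in expertise
        if user in willingness
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k]


# Hypothetical predictions for three users
expertise = {"alice": 0.9, "bob": 0.6, "carol": 0.7}
willingness = {"alice": 0.2, "bob": 0.8, "carol": 0.5}
ranking = rank_answerers(expertise, willingness)
```

In this example the user with the highest expertise ends up last, because a low willingness score pulls the product down. That is the intended behaviour of combining both models: an expert who is unlikely to answer is not a good recipient.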
A/B experiment at EdX
We designed and implemented the recommendation in the CQA system Askalot (TODO). In cooperation with Harvard University and TU Delft, we deployed it in their course Quantum Cryptography on the EdX platform. More than 4500 students signed up for the course.
We divided the users into three groups:
- Educational-specific group - recommendation by our method
- Baseline group - question routing without the educational-specific features
- Control group - no question recommendation
Each new question is recommended to 10 users in the educational-specific group and to 10 users in the baseline group. A user can receive at most 4 recommendations per 7 days.
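The two limits above (10 recipients per question, at most 4 recommendations per user per 7 days) can be sketched as a simple filter over the ranked list. This is an illustration under assumptions: the post does not say whether the 7 days are a rolling window or whether a capped user is replaced by the next-ranked one, so both are assumed here, and all names are hypothetical.

```python
from datetime import datetime, timedelta

MAX_RECOMMENDATIONS = 4      # per user, from the experiment setup
WINDOW = timedelta(days=7)   # assumed to be a rolling window
USERS_PER_QUESTION = 10      # recipients per question within one group


def select_recipients(ranked_users, history, now):
    """Pick the top-ranked users who are still under the weekly cap.

    ranked_users: user ids, best candidate first
    history: dict mapping user id -> list of past recommendation datetimes
    """
    recipients = []
    for user in ranked_users:
        recent = [t for t in history.get(user, []) if now - t < WINDOW]
        if len(recent) < MAX_RECOMMENDATIONS:
            recipients.append(user)
            history.setdefault(user, []).append(now)
        if len(recipients) == USERS_PER_QUESTION:
            break
    return recipients


# Hypothetical data: u0 is at the cap, u1's recommendations are older than 7 days
now = datetime(2000, 1, 15)  # arbitrary timestamp for the example
ranked = [f"u{i}" for i in range(12)]
history = {
    "u0": [now - timedelta(days=1)] * 4,
    "u1": [now - timedelta(days=8)] * 4,
}
recipients = select_recipients(ranked, history, now)
```

Here `u0` is skipped because of the cap, so the question goes to `u1` through `u10` instead.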
Forms of recommendation
New questions are recommended through the Askalot notification system, and recent recommendations are listed on the Askalot dashboard. In addition, recommended questions are highlighted in the list of all questions.
In the final semester, we are going to evaluate the results of the A/B experiment: the accuracy of the recommendations and the overall impact on the student community.