Summarizing Wikipedia contributions of a user

Supervisors

  • Antoine Amarilli, INFRES
  • Email: antoine.amarilli@telecom-paris.fr
  • Office: N/A

Number of students per project instance:

  • Minimum: 3
  • Maximum: 5

Number of project instances:

1

Course unit codes covered and/or keywords:

Wikipedia, open source, data analysis, data visualization, edit distance

Project description:

Wikipedia [1] is an established collaborative project to write an encyclopedia under an open license. Contributors to Wikipedia have a wide diversity of profiles [2], and include in particular academic researchers specialized in various areas. However, contributing to Wikipedia is not always appealing to researchers because, unlike other academic activities, it typically does not lead to wide recognition. The history of all user edits is publicly available, but the main contributors of an article are not visibly acknowledged on Wikipedia itself. As a result, users cannot easily measure or advertise their impact as Wikipedia contributors: they only have access to the raw list of their edits, which makes it hard to identify the significant ones.

The goal of this project is to let Wikipedia users analyze the history of their contributions and visualize the main areas in which they contribute and the specific pages to which they have contributed the most. Further, by integrating page view statistics, the project will let users understand how many readers have benefited from their contributions. The end goal is to incentivize more people, in particular more academic researchers, to contribute to Wikipedia, by computing a public profile that highlights the impact of their contributions.

Project objectives:

  • Investigate feasibility: study the available Wikipedia APIs to obtain the edit history of a user, the edit history of a given Wikipedia page, and the page view statistics.
  • Investigate existing tools for similar purposes [3] and determine what they can or cannot do.
  • Create a first version: given a user name on the English Wikipedia, retrieve the list of all contributions of the user, and compute the list of pages that the user has created or to which they significantly contributed.
  • Add the possibility to filter based on the Wikipedia categories or based on the Wikiproject, to focus on pages in a specific topic area (e.g., computer science).
  • Use the page views API to compute the number of page views on the edited pages.
  • Using the revision history of the pages edited by the user, align the successive revisions of each page with an edit distance computation, and determine to which pages the user has significantly contributed, in the sense that a large proportion of the current version of the page was originally written by the user.
  • Using historical page view data, compute the precise number of page views on all pages across all versions to which the user had significantly contributed.
  • Track how many links have been added by other users that lead to pages to which the user is a significant contributor.
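
As an illustration of the revision-alignment objective above, here is a minimal sketch of how the share of a page's current text originally written by a user could be estimated. It is only a sketch under stated assumptions: revisions are assumed to be already fetched as (author, text) pairs in chronological order, text is split on whitespace, and the alignment uses Python's difflib.SequenceMatcher (longest-matching-block alignment) as a stand-in for a true edit distance computation. The function names attribute_tokens and authorship_share are hypothetical and not part of any existing tool.

```python
from difflib import SequenceMatcher


def attribute_tokens(revisions):
    """Attribute each token of the latest revision to the author who introduced it.

    `revisions` is a chronological list of (author, text) pairs.
    This input format is an assumption made for this sketch.
    Returns a list of (token, author) pairs for the last revision.
    """
    prev_tokens = []
    prev_authors = []
    for author, text in revisions:
        tokens = text.split()
        # By default, every token is credited to the author of this revision.
        authors = [author] * len(tokens)
        # Tokens aligned with the previous revision keep their earlier author.
        matcher = SequenceMatcher(a=prev_tokens, b=tokens, autojunk=False)
        for block in matcher.get_matching_blocks():
            for k in range(block.size):
                authors[block.b + k] = prev_authors[block.a + k]
        prev_tokens, prev_authors = tokens, authors
    return list(zip(prev_tokens, prev_authors))


def authorship_share(revisions, user):
    """Fraction of tokens in the latest revision attributed to `user`."""
    attributed = attribute_tokens(revisions)
    if not attributed:
        return 0.0
    return sum(1 for _, author in attributed if author == user) / len(attributed)


# Toy history with made-up authors and text.
example_revisions = [
    ("Alice", "edit distance measures how different two strings are"),
    ("Bob", "edit distance measures how different two strings are from each other"),
    ("Alice", "the edit distance measures how different two strings are from each other"),
]
print(authorship_share(example_revisions, "Alice"))
```

A page could then be counted as a significant contribution when this share exceeds some threshold, which the project would need to choose. This simple alignment does not detect text that is moved within the page, or text that is deleted and later restored, so a real implementation would likely need a more careful approach.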

References: