Version-controlled documents, such as Wikipedia articles or program code in Subversion, demand a novel methodology to be analyzed efficiently. In contrast to static documents, they are continually edited by one or more authors. These collaborative processes make traditional methodologies ineffective, yet the need for efficient methodologies is growing rapidly. This paper proposes two new models: Local Space-time Smoothing (LSS) captures important revision patterns, while Cumulative Revision Map (CRM) tracks word insertions and deletions at particular positions of a document. These two methods enable us to understand and visualize revision patterns intuitively and efficiently. Synthetic and real-world data are used to demonstrate their applicability.
Seungyeon Kim, Joshua V. Dillon and Guy Lebanon. Cumulative Revision Map. preprint
Seungyeon Kim. Modeling and Visualizing Version-controlled Documents. Master's Thesis, Georgia Institute of Technology, 2011.
Seungyeon Kim and Guy Lebanon. Local Space-Time Smoothing for Version Controlled Documents. Proceedings of The 23rd International Conference on Computational Linguistics (COLING) 2010.