We propose using structural rank to measure the content similarity of a set of documents.
- The non-zero pattern of the term-document matrix alone is enough to measure the content similarity of a set of documents.
- Term-document matrices are usually very sparse, so the bipartite graph traversal algorithm that computes structural rank is extremely fast on them.
- It is much faster than pairwise similarity metrics because the similarity is computed directly on the whole set rather than on every pair of documents.
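
To make the idea concrete: the structural rank of a sparse matrix equals the size of a maximum matching in the bipartite graph whose two sides are its rows and columns, with an edge for every non-zero entry. The following is a minimal sketch of that computation using a simple augmenting-path search, not the implementation from the preprint. The function name `structural_rank`, the list-of-term-indices input format, and the toy documents are assumptions made for illustration.

```python
def structural_rank(rows, n_cols):
    """Structural rank of a sparse 0/1 pattern.

    rows   -- one entry per document: the column (term) indices that are
              non-zero in that row (hypothetical input format)
    n_cols -- number of columns (vocabulary size)
    """
    # match_col[c] is the row currently matched to column c, or -1 if free.
    match_col = [-1] * n_cols

    def try_augment(r, visited):
        # Look for an augmenting path that starts at row r.
        for c in rows[r]:
            if c in visited:
                continue
            visited.add(c)
            # Take a free column, or re-route the row that holds it.
            if match_col[c] == -1 or try_augment(match_col[c], visited):
                match_col[c] = r
                return True
        return False

    # Each successful augmentation grows the matching by one.
    return sum(try_augment(r, set()) for r in range(len(rows)))


# Illustrative data (assumed): three documents drawing on the same two
# terms, versus three documents with no terms in common.
overlapping = [[0, 1], [0, 1], [0, 1]]
disjoint = [[0], [1], [2]]

rank_overlapping = structural_rank(overlapping, n_cols=4)
rank_disjoint = structural_rank(disjoint, n_cols=3)
```

In this toy case the overlapping set cannot match every document to its own term, so its structural rank falls below the number of documents, while the disjoint set reaches full rank. Only the non-zero pattern is used and no pair of documents is ever compared directly. The recursive search is adequate for small inputs; a large corpus would call for an iterative or library-based matching routine.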
Seungyeon Kim, Haesun Park, and Guy Lebanon. Fast Spammer Detection Using Structural Rank. Preprint, 2014.