Conference Paper (published)

A tale of four metrics

Details

Citation

Connor R (2016) A tale of four metrics. In: Amsaleg L, Houle M & Schubert E (eds.) Similarity Search and Applications. SISAP 2016. Lecture Notes in Computer Science, 9939. 9th International Conference on Similarity Search and Applications, SISAP 2016, Tokyo, Japan, 24.10.2016-26.10.2016. Cham, Switzerland: Springer, pp. 210-217. https://doi.org/10.1007/978-3-319-46759-7_16

Abstract
There are many contexts where the definition of similarity in multivariate space must be based on the correlation, rather than the absolute value, of the variables. Examples include classic IR measurements such as TF-IDF and BM25, client similarity measures based on collaborative filtering, feature analysis of chemical molecules, and biodiversity contexts. In such cases, it is almost standard for Cosine similarity to be used. More recently, Jensen-Shannon divergence has appeared in a proper metric form, and a related metric, Structural Entropic Distance (SED), has been investigated. A fourth metric, based on a little-known divergence function named Triangular Divergence, is also assessed here. For these metrics, we study their properties in the context of similarity and metric search. We compare and contrast their semantics and performance. Our conclusion is that, despite Cosine Distance being an almost automatic choice in this context, Triangular Distance is most likely to be the best choice in terms of a compromise between semantics and performance.

Keywords
Information retrieval; cosine similarity; cosine distance; semantic basis; query threshold

Status: Published
Title of series: Lecture Notes in Computer Science
Number in series: 9939
Publication date: 31/12/2016
Publication date online: 27/09/2016
Publisher: Springer
Place of publication: Cham, Switzerland
ISSN of series: 0302-9743
ISBN: 978-3-319-46758-0
Conference: 9th International Conference on Similarity Search and Applications, SISAP 2016
Conference location: Tokyo, Japan
Dates: 24/10/2016 - 26/10/2016