Conference Paper (published)
Details
Citation
Connor R, Cardillo FA, Moss R & Rabitti F (2013) Evaluation of Jensen-Shannon distance over sparse data. In: Brisaboa N, Pedreira O & Zezula P (eds.) Similarity Search and Applications: 6th International Conference, SISAP 2013, A Coru?a, Spain, October 2-4, 2013, Proceedings. Lecture Notes in Computer Science, 8199. Similarity Search and Applications: 6th International Conference, SISAP 2013, Coruna, Spain, 02.10.2013-04.10.2013. Berlin, Heidelberg: Springer Verlag, pp. 163-168. https://doi.org/10.1007/978-3-642-41062-8_16
Abstract
Jensen-Shannon divergence is a symmetrised, smoothed version of Küllback-Leibler. It has been shown to be the square of a proper distance metric, and has other properties which make it an excellent choice for many high-dimensional spaces in ?*.
The metric as defined is however expensive to evaluate. In sparse spaces over many dimensions the Intrinsic Dimensionality of the metric space is typically very high, making similarity-based indexing ineffectual. Exhaustive searching over large data collections may be infeasible.
Using a property that allows the distance to be evaluated from only those dimensions which are non-zero in both arguments, and through the identification of a threshold function, we show that the cost of the function can be dramatically reduced.
Keywords
Sparse data; inverted index; sparse vector; intrinsic dimensionality; large data collection;
Journal
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Status | Published |
---|---|
Title of series | Lecture Notes in Computer Science |
Number in series | 8199 |
Publication date | 31/12/2013 |
URL | |
Publisher | Springer Verlag |
Place of publication | Berlin, Heidelberg |
ISSN of series | 0302-9743 |
ISBN | 978-3-642-41061-1 |
Conference | Similarity Search and Applications: 6th International Conference, SISAP 2013 |
Conference location | Coruna, Spain |
Dates | – |