Flexible Aggregate Similarity Search

[Overview] [Papers and Talks] [Source Code] [Data set] [Contacts]　

Overview

Aggregate similarity search, a.k.a. aggregate nearest neighbor (Ann) query, finds many useful applications in spatial and multimedia databases. Given a group Q of M query objects, it retrieves the most (or top-k) similar object to Q from a database P, where the similarity is an aggregation (e.g., sum, max) of the distances between the retrieved object p and all the objects in Q. In this paper, we propose an added flexibility to the query definition, where the similarity is an aggregation over the distances between p and any subset of ΦM objects in Q for some support 0 < Φ ≤ 1. We call this new definition flexible aggregate similarity (Fann) search, which generalizes the Ann problem. Next, we present algorithms for answering Fann queries exactly and approximately. Our approximation algorithms are especially appealing, which are simple, highly efficient, and work well in both low and high dimensions. They also return nearoptimal answers with guaranteed constant-factor approximations in any dimensions. Extensive experiments on large real and synthetic datasets from 2 to 74 dimensions have demonstrated their superior efficiency and high quality.

Papers and Talks

1. Flexible Aggregate Similarity Search,

Full version:

Source Code

Important Notice

Please cite our paper if you use this library for your work. Thanks!

If you find any bugs or any suggestions/comments, we are very happy to hear from you!

Library Description

The library is developed in GNU C++. For installation and usage, please refer to the README file.

Download

Flexible Aggregate Similarity Search Library [tar.gz]

Data set

We have generated and experimented with the data sets described in the paper. For real data sets, please follow the description in our paper. We also include the test data set in the library.

Contacts

Bin Yao