Approximate String Search in Spatial Databases

[Overview] [Papers and Talks] [Source Code] [Dataset] [Contacts]　

Overview

This work presents a novel index structure, MHR-tree, for efficiently answering approximate string match queries in large spatial databases. The MHR-tree is based on the R-tree augmented with the min-wise signature and the linear hashing technique. The min-wise signature for an index node u keeps a concise representation of the union of q-grams from strings under the sub-tree of u. We analyze the pruning functionality of such signatures based on set resemblance between the query string and the q-grams from the sub-trees of index nodes. MHR-tree supports a wide range of query predicates efficiently, including range and nearest neighbor queries. We also discuss how to estimate range query selectivity accurately. We present a novel adaptive algorithm for finding balanced partitions using both the spatial and string information stored in the tree. Extensive experiments on large real data sets demonstrate the efficiency and effectiveness of our approach.

Papers and Talks

1. Approximate String Search in Spatial Databases,

Full version: Talk:

Source Code

Important Notice

Please cite our paper if you use this library for your work. Thanks!

If you find any bugs or any suggestions/comments, we are very happy to hear from you!

Library Description

The library is developed in GNU C++. For installation and usage, please refer to the README files in the subfolders.

Download

Approximate String Search in Spatial Databases Library [tar.gz]

Dataset

We have generated and experimented with the datasets described in the paper. For real data sets, please see here: TX data set and CA data set.

Contacts

Bin Yao