Effective and Efficient Similarity Search in Databases


Book Details:

Publisher: Universität Potsdam
Pages: 117 pages
Size: 7.96 MB
License: Pending review


With ever-growing amounts of data and the ability and desire to integrate and query more and more databases, there is a need for efficient processing of this data. Traditional relational database systems are built for fast retrieval of data from a large corpus. With SQL and efficient index structures, such as the B+-tree, retrieval of records with exact matches in their attribute values from even very large databases can be implemented with little effort. However, a query may also be inaccurate, as it may contain typing errors or missing values, and also a database record may contain incorrect or incomplete information. In this case, an index that only finds exact matches cannot be used. A traditional database system neither offers the possibility to define what is a similar record, nor does it perform a fast retrieval of those records.

The field of research that solves this problem is called similarity search: Given a set of records in a database and a query record, similarity search aims to find all records in the database that are sufficiently similar to the query record.
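The definition above can be illustrated with a minimal sketch. It assumes string-valued records, a similarity derived from normalized edit distance, and a fixed threshold for "sufficiently similar"; these choices and the function names are illustrative assumptions, not the measure or index proposed in the thesis. The search is a linear scan, which is exactly the cost that a similarity index is meant to avoid.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Map edit distance to [0, 1]; 1.0 means identical (assumed normalization)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def similarity_search(records, query, threshold):
    """Return every record whose similarity to the query reaches the threshold."""
    return [r for r in records if similarity(r, query) >= threshold]


if __name__ == "__main__":
    cities = ["Potsdam", "Postdam", "Berlin"]
    print(similarity_search(cities, "Potsdam", threshold=0.7))
```

With this threshold the misspelled value "Postdam" is still returned for the query "Potsdam", which an exact-match index would miss.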

This thesis is structured as follows. We begin with an overview of our similarity search system in Chapter 2 before describing the components of the system in detail in the following chapters. Chapter 3 introduces the similarity model used throughout the thesis. We also propose a novel similarity measure for comparing database records that exploits frequencies of values. Chapter 4 contains an introduction to similarity indexes for fast retrieval of similar values given specific similarity measures. We present an index structure for string similarity search, the State Set Index (SSI), and compare the method with previous index structures. For subsequent chapters, we assume that we have created one similarity index for each attribute, and that we have an overall similarity measure composed of attribute-specific measures. In Chapter 5, we then introduce query plans as a means of describing how to access the similarity indexes and how to combine the results. We describe static and query-specific algorithms for selecting query plans based on the criteria of result completeness and execution cost. Chapter 6 adds the BSA method for answering top-k queries with similarity indexes by retrieving bulks of IDs of relevant records and combining results into a priority queue. For Chapters 3 to 6, related work is described at the end of each chapter. We conclude the thesis and give an overview of open research questions for future work in Chapter 7.
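As a rough illustration of two ideas mentioned in this outline, an overall similarity measure composed of attribute-specific measures and a top-k query, consider the following sketch. It is not the BSA method or the thesis's similarity model: the weighted average, the weights, the attribute names, and the use of `difflib.SequenceMatcher` as the per-attribute measure are all assumptions made for illustration, and the top-k step is a brute-force scan rather than an index-based retrieval.

```python
import heapq
from difflib import SequenceMatcher

# Hypothetical attribute weights; they sum to 1 so the overall score stays in [0, 1].
WEIGHTS = {"name": 0.7, "city": 0.3}


def attribute_similarity(a: str, b: str) -> float:
    """Per-attribute string similarity in [0, 1] (stand-in measure)."""
    return SequenceMatcher(None, a, b).ratio()


def record_similarity(record: dict, query: dict) -> float:
    """Overall similarity as a weighted average of attribute similarities."""
    return sum(weight * attribute_similarity(record[attr], query[attr])
               for attr, weight in WEIGHTS.items())


def top_k(records: list, query: dict, k: int) -> list:
    """Return the k most similar records as (score, id) pairs, best first."""
    scored = ((record_similarity(r, query), r["id"]) for r in records)
    # heapq.nlargest keeps only k candidates in a heap while scanning.
    return heapq.nlargest(k, scored)


if __name__ == "__main__":
    people = [
        {"id": 1, "name": "John Smith", "city": "Potsdam"},
        {"id": 2, "name": "Jane Smyth", "city": "Berlin"},
        {"id": 3, "name": "Alice Wong", "city": "Hamburg"},
    ]
    query = {"name": "Jon Smith", "city": "Potsdam"}
    for score, record_id in top_k(people, query, k=2):
        print(record_id, round(score, 3))
```

A scan like this touches every record; the approach outlined above instead retrieves candidate IDs from one similarity index per attribute and merges them.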


