Quantitative Data Cleaning for Large Databases

Book Details:

Publisher: UC Berkeley
Pages: 42
Size: 418 KB
License: Pending review


Data collection has become a ubiquitous function of large organizations, not only for record keeping, but to support a variety of data analysis tasks that are critical to the organizational mission. Data analysis typically drives decision-making processes and efficiency optimizations, and in an increasing number of settings is the raison d'être of entire agencies or firms.

Despite the importance of data collection and analysis, data quality remains a pervasive and thorny problem in almost every large organization. The presence of incorrect or inconsistent data can significantly distort the results of analyses, often negating the potential benefits of information-driven approaches. As a result, there has been a variety of research over the last decades on various aspects of data cleaning: computational procedures to automatically or semi-automatically identify (and, when possible, correct) errors in large data sets. In this report, we survey data cleaning methods that focus on errors in quantitative attributes of large databases, though we also provide references to data cleaning methods for other types of attributes. The discussion is targeted at computer practitioners who manage large databases of quantitative information, and designers developing data entry and auditing tools for end users. Because of our focus on quantitative data, we take a statistical view of data quality, with an emphasis on intuitive outlier detection and exploratory data analysis methods based in robust statistics.

In addition, we stress algorithms and implementations that can be easily and efficiently implemented in very large databases, and which are easy to understand and visualize graphically. The discussion mixes statistical intuitions and methods, algorithmic building blocks, efficient relational database implementation strategies, and user interface considerations. Throughout the discussion, references are provided for deeper reading on all of these issues.


