
Quantitative Data Cleaning for Large Databases


Book details:

Publisher: UC Berkeley
Pages: 42
Size: 418 KB
License: Pending review


Data collection has become a ubiquitous function of large organizations, not only for record keeping but also to support a variety of data analysis tasks that are critical to the organizational mission. Data analysis typically drives decision-making processes and efficiency optimizations, and in an increasing number of settings it is the raison d'être of entire agencies or firms.

Despite the importance of data collection and analysis, data quality remains a pervasive and thorny problem in almost every large organization. The presence of incorrect or inconsistent data can significantly distort the results of analyses, often negating the potential benefits of information-driven approaches. As a result, there has been a variety of research over the last decades on various aspects of data cleaning: computational procedures to automatically or semi-automatically identify (and, when possible, correct) errors in large data sets. In this report, we survey data cleaning methods that focus on errors in quantitative attributes of large databases, though we also provide references to data cleaning methods for other types of attributes. The discussion is targeted at computer practitioners who manage large databases of quantitative information, and at designers developing data entry and auditing tools for end users. Because of our focus on quantitative data, we take a statistical view of data quality, with an emphasis on intuitive outlier detection and exploratory data analysis methods based on robust statistics.

In addition, we stress algorithms that can be implemented easily and efficiently in very large databases, and which are easy to understand and visualize graphically. The discussion mixes statistical intuitions and methods, algorithmic building blocks, efficient relational database implementation strategies, and user interface considerations. Throughout the discussion, references are provided for deeper reading on all of these issues.
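
As a rough illustration of what outlier detection based on robust statistics can look like, the sketch below flags values that lie far from the median, measuring distance in units of the median absolute deviation (MAD). This is one common robust technique chosen for illustration; the abstract does not say which specific methods the report uses. The function name `mad_outliers` and the default threshold of 3.5 are assumptions made here, not taken from the report.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return the values whose distance from the median exceeds
    `threshold` scaled-MAD units. Name and threshold are illustrative."""
    center = median(values)
    # MAD: the median of the absolute deviations from the median.
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        # At least half the values equal the median, so the robust
        # scale is undefined; this sketch simply flags nothing.
        return []
    # 1.4826 makes the MAD comparable to a standard deviation
    # when the data are approximately normal.
    scale = 1.4826 * mad
    return [v for v in values if abs(v - center) / scale > threshold]


readings = [10.1, 9.8, 10.3, 10.0, 9.9, 55.0, 10.2]
print(mad_outliers(readings))
```

Because the median and the MAD are barely affected by a few extreme values, the single bad reading does not inflate the scale estimate the way it would inflate a mean and standard deviation.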


