OpenRefine is a free, open source power tool for working with messy data and improving it
Updated Sep 12, 2025 - Java
Data visualisations in Power BI
A Scalable Data Cleaning Library for PySpark.
Table Enforcer is my attempt to apply a sort of "test-driven development" workflow to data cleaning and validation. It is a Python package that facilitates the iterative process of developing and using schema-like representations of pandas DataFrames for recoding and validating instances of these data.
Examples for Optimus, a data cleansing library for big data.
"Telewire Analytics," an innovative project aimed at optimizing resource utilization within the telecom industry.
GitHub Repo of our Tidyverse workshop organized on Sep 8, 2022
This project targets the textual analysis of Egyptian movie plot summaries that were curated from online sources, covering the four golden decades of Egyptian Cinema.
Implementation of a Neural Network (NN) model for handwriting recognition using the MNIST dataset.
This course by the University of Michigan introduces the basics of the Python programming environment, including fundamental Python programming techniques such as lambdas, reading and manipulating CSV files, and the NumPy library. The course also introduces data manipulation and cleaning techniques using the pandas data science library.
This is an internal project with Intel: a framework for monitoring data quality from disparate sources and automating that monitoring using Python.