Deep Dive Into Power BI | Cleaning Data in Power BI
The previous article in this series covered the prerequisites for becoming a Power BI developer. If you want to read the previous article, click on the link.
In this article, we will learn how to clean data in Power BI. Before going deep into today's topic, let's clarify a few points.
In my experience, when people hear the term 'data cleaning', they assume it means deleting or inserting data, but the picture is somewhat different.
Let's talk about what data cleaning is.
Put simply, data cleaning is the transformation of raw data into worthwhile data: we perform operations on the data set so that useful information can be obtained from it.
Operations to be performed for data cleaning
- First, check each column for null values, because a null value carries no meaning. Then replace the nulls with a meaningful value such as 'zero' or 'not defined', depending on the data type of the column. For example, a null in a numeric column can become zero, while a null in a text column can become 'not defined'.
- Check column values for inappropriate characters and remove them using a DAX function, because otherwise a single value is split into several separate values in your dashboard. For example:
In the above picture, the column value 'United Kingdom' contains some unwanted characters. If we plot this data on a visual with the country on the axis, Power BI treats each variant as a different value (the picture shows what I mean).
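To make the effect concrete outside Power BI, here is a minimal Python sketch of the same idea. It is not the DAX approach mentioned above, only an illustration of the logic. The sample values, the stray characters, and the helper `clean_label` are all assumptions invented for this example.

```python
from collections import Counter

# Hypothetical sample of a Country column; the stray characters are
# invented for illustration and are not taken from the article's data.
countries = ["United Kingdom", "United Kingdom#", " United Kingdom", "France", "France*"]


def clean_label(value: str) -> str:
    """Keep only letters and whitespace, then collapse and trim the whitespace."""
    kept = "".join(ch for ch in value if ch.isalpha() or ch.isspace())
    return " ".join(kept.split())


# Before cleaning, every variant is counted as its own category.
raw_counts = Counter(countries)

# After cleaning, the variants collapse into one category per country.
clean_counts = Counter(clean_label(c) for c in countries)

print(len(raw_counts), "distinct labels before cleaning")
print(dict(clean_counts))
```

A rule that keeps only letters and spaces is deliberately crude; real country or product names may contain hyphens, apostrophes, or digits, so the filter would need to be adapted to the column in question.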
- A most important step: check the data type of every column. We use these columns in DAX functions, and an inappropriate data type causes many problems when those functions execute.
(The above picture shows a column 'Invoice Date' with the text data type; we have to change the data type of this column.)
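The following Python sketch shows, under stated assumptions, why a date stored as text misbehaves. The sample values, the day/month/year format, and the helper `parse_invoice_date` are hypothetical; the actual format of the 'Invoice Date' column is not given in this article.

```python
from datetime import date, datetime

# Hypothetical text values as they might appear in a text-typed
# 'Invoice Date' column. The dd/mm/yyyy format is an assumption.
invoice_dates_text = ["05/01/2021", "17/12/2020", "28/02/2021"]


def parse_invoice_date(text: str, fmt: str = "%d/%m/%Y") -> date:
    """Convert one text value into a real date; raises ValueError on bad input."""
    return datetime.strptime(text.strip(), fmt).date()


invoice_dates = [parse_invoice_date(t) for t in invoice_dates_text]

# Text sorts character by character, so the order is not chronological.
sorted_as_text = sorted(invoice_dates_text)

# Real dates sort chronologically and support date arithmetic.
sorted_as_dates = sorted(invoice_dates)
span_in_days = (max(invoice_dates) - min(invoice_dates)).days

print(sorted_as_text)
print(sorted_as_dates)
print(span_in_days)
```

The same reasoning applies to date functions in general: sorting, filtering by period, and computing differences only behave as expected once the column holds a date type rather than text.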
- Eliminating duplicate columns is also an important step for every Power BI developer, because it reduces the data size, which increases the efficiency of Power BI (efficiency in terms of correctness).
- Make sure the data is statistically sound. Before starting to develop a dashboard, every developer should analyze the data statistically using the mean, standard deviation, range, or clustering algorithms. A column we assume to be meaningful may turn out to be useless once it is checked statistically.
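The null, duplicate, and statistics steps above can be sketched in plain Python as follows. This is an illustration of the checks themselves, not of how Power BI performs them. The table contents and the helpers `drop_duplicate_columns` and `fill_nulls` are assumptions made up for this example.

```python
import statistics

# Hypothetical table stored as column name -> list of values; None marks a null.
table = {
    "Quantity": [2, None, 5, 3],
    "Qty": [2, None, 5, 3],  # exact copy of Quantity
    "Country": ["France", None, "Spain", "France"],
    "Currency": ["EUR", "EUR", "EUR", "EUR"],
}


def drop_duplicate_columns(columns):
    """Keep only the first of any columns whose values are identical."""
    seen = set()
    result = {}
    for name, values in columns.items():
        key = tuple(values)
        if key not in seen:
            seen.add(key)
            result[name] = values
    return result


def fill_nulls(values):
    """Replace None with 0 in numeric columns and 'Not defined' in text columns."""
    present = [v for v in values if v is not None]
    is_numeric = all(isinstance(v, (int, float)) for v in present)
    default = 0 if is_numeric else "Not defined"
    return [default if v is None else v for v in values]


cleaned = {
    name: fill_nulls(values)
    for name, values in drop_duplicate_columns(table).items()
}

# Basic statistical checks on a numeric column.
quantity = cleaned["Quantity"]
quantity_mean = statistics.mean(quantity)
quantity_stdev = statistics.stdev(quantity)
quantity_range = max(quantity) - min(quantity)

# A column with a single distinct value cannot differentiate any rows.
constant_columns = [name for name, values in cleaned.items() if len(set(values)) == 1]

print(cleaned)
print(quantity_mean, quantity_stdev, quantity_range)
print(constant_columns)
```

In this sample the `Currency` column looks meaningful but holds one value throughout, which is the kind of column a statistical check exposes as useless for analysis. Note also that replacing a null with zero changes the mean, so the choice of replacement value should be made with the later calculations in mind.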