Day 4: Power Query Editor – E.T.L (Extract, Transform, Load) & Data Cleaning in Power BI
Power BI is a powerful tool for transforming raw data into meaningful insights. Today, we focus on Power Query Editor, the heart of the ETL (Extract, Transform, Load) process. This is where data cleaning, transformation, and preparation happen, making your datasets ready for analysis.
1. Text Transformation
Cleaning and transforming text is crucial for consistent data. Power Query provides a variety of formatting options:
Format options:
-
lowercase – Converts all text to lowercase
-
UPPERCASE – Converts all text to uppercase
-
Capitalize Each Word – Capitalizes the first letter of every word
-
Trim – Removes leading and trailing spaces
-
Clean – Eliminates non-printable characters
-
Add Prefix / Add Suffix – Adds custom text before or after the column values
Merge columns – Combine multiple columns into one for better readability or aggregation.
Extract operations:
-
Length – Finds the number of characters in a text
-
First characters / Last characters – Extracts characters from the start or end
-
Range – Extracts characters between a specified range
-
Text before / after delimiter – Splits text based on a character (like a comma or dash)
-
Text between delimiters – Extracts the text between two specific characters
2. Statistics
Analyzing numeric data starts with basic statistics:
Basic calculations:
-
Sum – Total of all numbers
-
Min / Max – Smallest and largest values
-
Average (AVG) – Sum of all numbers ÷ Count of numbers
-
Example: For 1, 2, 1 → 1+2+1 = 4 → 4 ÷ 3 = 1.33
-
-
Count – Counts all values, including duplicates
-
Example: 100, 200, 200, 500, 500, 500, 500, 1000 → Count = 8
-
-
Count Distinct – Counts only unique values
-
Example: Same values → Count Distinct = 4
-
3. Standard & Scientific Calculations
Standard operations: Addition, Subtraction, Multiplication, Division
Scientific operations: Absolute values for handling negative numbers
Rounding options:
-
Round Up
-
Round Down
-
Round
Information functions:
-
Is Even / Is Odd – Identify number parity
-
Sign – Returns the sign of a number
4. Date and Time Functions
Managing date and time is critical in analytics:
Age, Date Only – Extract or calculate age from a date
Year functions:
-
Year
-
Start of Year
-
End of Year
Month functions:
-
Month
-
Start of Month
-
End of Month
-
Days in Month
-
Month Name
Quarter functions:
-
Quarter
-
Start of Quarter
-
End of Quarter
Week functions:
-
Week
-
Start of Week
-
End of Week
-
Week of Year
-
Week of Month
Day functions:
-
Day
-
Start of Day / End of Day
-
Day of Week
-
Day of Year
Earliest / Latest records – Identify first or most recent data in a column
Time functions:
-
Time Only
-
Hour
-
Minute
-
Seconds
Duration – Calculate differences between two time or date fields
5. Scripting with R and Python
Power Query also supports R and Python, allowing advanced data transformations and custom analytics scripts.
6. Group By
Grouping is a powerful feature to aggregate similar data in a single step. You can group by one or more columns and perform operations like Sum, Average, or Count while keeping your data organized.
Power Query Editor is truly the backbone of data cleaning in Power BI. Mastering these functions will not only save time but also ensure data accuracy, which is essential for meaningful analytics.