Categories
Code

How to ingest Excel or (zipped!) CSV files properly in Power Query for Power BI or Excel

When you import Excel or CSV (Comma Seperated Value) files in Power BI using the standard buttons, you end up with a large mess of code! Power BI creates a parameter, a sample file and an extra query…for every import!

So let’s not do that.

The query scripts below allow you to import files, without creating those additional bits, by being self contained. Good to know is that they can be used in Power Query both Excel and Power BI.

A debt of gratitude is owed to Ruth at the Youtube channel ‘Curbal’, who in her videos (You are getting files from folders the WRONG way in Power BI!! Here is how to fix it- Part 1 & The BEST way to combine files in Power BI so you don’t lose columns – Part 2) explains the most important import techniques used in the scripts below.

How to Start

Make sure to have a ‘parameter’ set up called ‘SiteURL’ with the base URL of your Sharepoint/Teams-site where you’ve stored all your data.

If not, adapt that part of the code to point at where you do have your excels/csv’s stored.

How to load one excel in Power Query

The following script is useful for importing one excel at a time, and takes the data on each sheet and creates a table from them.

let
    Source = Excel.Workbook(Web.Contents(SiteURL & "/Shared%20Documents/Model/Data/Example.xlsx"), null, true),
    #"Filtered Rows" = Table.SelectRows(Source, each ([Kind] = "Sheet")),
    #"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"Data"}),
    #"Combine into expanded table" = Table.Combine ( #"Removed Other Columns"[Data] ),
    #"Promoted Headers" = Table.PromoteHeaders(#"Combine into expanded table", [PromoteAllScalars=true])
in
     #"Promoted Headers"

How to load a whole folder of excels in Power Query

let
    Source = SharePoint.Files(SiteURL , [ApiVersion = 15]),
    #"Filtered xlsx" = Table.SelectRows(Source, each [Folder Path] = "https://example.com/sites/SiteName/Shared Documents/General/" and [Extension] = ".xlsx"),
    #"Create Data for each" = Table.AddColumn(#"Filtered xlsx", "DataContent", each Excel.Workbook([Content], null, true)),
    #"Add DataSheets" = Table.AddColumn(#"Create Data for each", "DataSheets", each Table.SelectRows([DataContent], each [Item] = "Sheet 1")),
    #"Removed Other Columns" = Table.SelectColumns(#"Add DataSheets",{"DataSheets"}),
    #"Combined Excel Rows" = Table.Combine(#"Removed Other Columns"[DataSheets]),
    #"Promote Headers in Data" = Table.AddColumn(#"Combined Excel Rows", "Custom", each Table.PromoteHeaders([Data],
[PromoteAllScalars=true])),
    #"Select Custom" = Table.SelectColumns(#"Promote Headers in Data",{"Custom"}),
    #"Added Custom" = Table.Combine(#"Select Custom"[Custom])
in
    #"Added Custom"

How to load multiple CSV’s, all in one ZIP in Power Query

let
    Source = SharePoint.Files(SiteURL, [ApiVersion = 15]),
    #"Filtered Rows" = Table.SelectRows(Source, each ([Name] = "History-Revenue.zip")),
    #"Invoked Custom Function" = Table.AddColumn(#"Filtered Rows", "Unzipcontents", each Unzipcontents([Content])),
    #"Expanded Unzipcontents" = Table.ExpandTableColumn(#"Invoked Custom Function", "Unzipcontents", {"Content"}, {"Zip"}),
    Unzip = Table.AddColumn(#"Expanded Unzipcontents", "Unzipped", each Csv.Document([Zip],[Encoding=1252,Delimiter=",",QuoteStyle=QuoteStyle.Csv])),
    #"Add Headers to Unzipped" = Table.AddColumn(#"Unzip", "Custom", each Table.PromoteHeaders([Unzipped],
[PromoteAllScalars=true])),
    #"Removed Other Columns" = Table.SelectColumns(#"Add Headers to Unzipped",{"Custom"}),
    #"Combine Custom" = Table.Combine(#"Removed Other Columns"[Custom])

in
    #"Combine Custom"

Unzipcontents – Unzip script to add as query in Power Query

(ZIPFile) => 
let
    Header = BinaryFormat.Record([
        MiscHeader = BinaryFormat.Binary(14),
        BinarySize = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
        FileSize   = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
        FileNameLen= BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian),
        ExtrasLen  = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian)    
    ]),

    HeaderChoice = BinaryFormat.Choice(
        BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
        each if _ <> 67324752             // not the IsValid number? then return a dummy formatter
            then BinaryFormat.Record([IsValid = false, Filename=null, Content=null])
            else BinaryFormat.Choice(
                    BinaryFormat.Binary(26),      // Header payload - 14+4+4+2+2
                    each BinaryFormat.Record([
                        IsValid  = true,
                        Filename = BinaryFormat.Text(Header(_)[FileNameLen]), 
                        Extras   = BinaryFormat.Text(Header(_)[ExtrasLen]), 
                        Content  = BinaryFormat.Transform(
                            BinaryFormat.Binary(Header(_)[BinarySize]),
                            (x) => try Binary.Buffer(Binary.Decompress(x, Compression.Deflate)) otherwise null
                        )
                        ]),
                        type binary                   // enable streaming
                )
    ),

    ZipFormat = BinaryFormat.List(HeaderChoice, each _[IsValid] = true),

    Entries = List.Transform(
        List.RemoveLastN( ZipFormat(ZIPFile), 1),
        (e) => [FileName = e[Filename], Content = e[Content] ]
    )
in
    Table.FromRecords(Entries)