Parsing a CSV File in Azure Blob Storage

Published by Hao WU
Category: Azure / Event Grid / Logic Apps
29/04/2024

Introduction

Recently, I worked on a project that required reading the contents of CSV files and then processing them. The files are stored in Azure Blob Storage containers. In this article, I’ll show the solution I used for this task.

 

Problem

Let’s assume we have a container named “students” on a Storage Account called “stohwu”, which is of type “StorageV2 (general purpose v2)”.

[Screenshot: storage account]

Within our container:

[Screenshot: blob container]

CSV files containing student information are dropped into this container over time, and we need to read them. Take, for example, a file named Student.csv:

[Screenshot: contents of Student.csv]

Each line describes one student as three tab-separated fields (ID, first name, last name) and ends with “CRLF”. Despite the .csv extension, the delimiter is a tab, not a comma.

This raises two questions. When a file is dropped as a blob, how do we trigger the reading and processing? And since the file content arrives as a single string, how do we turn it into structured data?

 

Solution and Demo

I decided to use an Azure Logic App to handle these files. First, we’ll see how to trigger the Logic App the moment a blob is dropped, and then how to read and transform the blob’s content.

 

Trigger

When a file is dropped into a container, we can detect the event and start the process with either a Blob Trigger or an Event Grid Trigger. However, the Blob Trigger has limitations:

  • It does not support “Blob-only storage accounts”; a “General Purpose Storage Account” is required.
  • It triggers the retrieval of all blobs in the container, which is poorly suited to cases where we only want newly created blobs.
  • It polls, so latency depends on the polling interval. Polling too often needlessly increases the bill; polling too rarely delays processing.

To address these limitations, we will use the Event Grid Trigger. It fires almost immediately after the blob is created, and it makes it easier to target only newly created blobs.

First, we create an Event Grid System Topic in this manner:

[Screenshot: creating the Event Grid System Topic]

The Topic Type is “Storage Account (Blob & GPv2)”. Here is the created topic:

[Screenshot: the created Event Grid System Topic]

I choose the Event Grid Trigger for our Logic App, specifying the Resource Type as “Microsoft.Storage.StorageAccounts” and the Event Type as “Microsoft.Storage.BlobCreated”. I also add a Prefix Filter and a Suffix Filter so that only the blobs we care about fire the trigger. The configuration is as follows:

[Screenshot: Event Grid trigger configuration in the Logic App]
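
To make the filter semantics concrete, here is a minimal C# sketch of what the event type, prefix and suffix filters amount to. The filtering itself is done by Event Grid, not by our code. The sketch assumes the standard Event Grid schema for storage events, where the subject has the form /blobServices/default/containers/{container}/blobs/{name}. The class name, the sample event, and the prefix and suffix values are illustrative assumptions, not the values from the configuration above.

```csharp
using System;
using System.Text.Json;

public static class EventFilterSketch
{
    // Hypothetical filter values; the real ones live in the trigger configuration.
    const string EventType = "Microsoft.Storage.BlobCreated";
    const string Prefix = "/blobServices/default/containers/students/";
    const string Suffix = ".csv";

    // Returns true when an event would pass the event type, prefix and suffix filters.
    public static bool Matches(string eventJson)
    {
        using JsonDocument doc = JsonDocument.Parse(eventJson);
        JsonElement e = doc.RootElement;
        string type = e.GetProperty("eventType").GetString() ?? "";
        string subject = e.GetProperty("subject").GetString() ?? "";
        // Assumption: subject matching is case-insensitive (Event Grid's default setting).
        return type == EventType
            && subject.StartsWith(Prefix, StringComparison.OrdinalIgnoreCase)
            && subject.EndsWith(Suffix, StringComparison.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        // Trimmed-down sample event; real events carry more fields.
        string sample = @"{
          ""eventType"": ""Microsoft.Storage.BlobCreated"",
          ""subject"": ""/blobServices/default/containers/students/blobs/Student.CSV"",
          ""data"": { ""url"": ""https://stohwu.blob.core.windows.net/students/Student.CSV"" }
        }";
        Console.WriteLine(Matches(sample)); // True
    }
}
```

Filtering at the subscription level means the Logic App only runs, and is only billed, for blobs that match.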

 

Get Blob Content

Next, we add the “Get blob content using path” action. Its path is built from the trigger output rather than hard-coded, so the same Logic App works for any file dropped into the container.

[Screenshot: “Get blob content using path” configuration]

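Here is the exact Blob Path expression. Splitting the blob URL on “/” puts the blob name at index 4, which is then appended to the container path: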

concat('/students/',split(triggerBody()?['data']?['url'],'/')?[4])
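
To see what this expression produces, here is a small C# sketch that mirrors it. FromSplit reproduces the split-and-index logic. FromUri is an alternative of my own, not something the Logic App above does: it keeps the whole path after the host, which matters if blobs ever live in virtual folders. The class and method names are hypothetical.

```csharp
using System;

public static class BlobPathSketch
{
    // Mirrors the Logic App expression: concat('/students/', split(url, '/')[4]).
    public static string FromSplit(string blobUrl)
    {
        string[] parts = blobUrl.Split('/');
        return "/students/" + parts[4];
    }

    // Alternative (assumption of this sketch): take everything after the host,
    // so nested blob names and other containers are preserved.
    public static string FromUri(string blobUrl)
    {
        return Uri.UnescapeDataString(new Uri(blobUrl).AbsolutePath);
    }

    public static void Main()
    {
        string flat = "https://stohwu.blob.core.windows.net/students/Student.csv";
        string nested = "https://stohwu.blob.core.windows.net/students/2024/Student.csv";

        Console.WriteLine(FromSplit(flat));   // /students/Student.csv
        Console.WriteLine(FromSplit(nested)); // /students/2024  (the file name is lost)
        Console.WriteLine(FromUri(nested));   // /students/2024/Student.csv
    }
}
```

With blobs dropped directly at the root of the “students” container, as in this article, indexing at position 4 is sufficient.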

After executing the Logic App, I get the following output:

[Screenshot: output of the “Get blob content using path” action]

 

Parsing the File

Let’s look at the body returned by the “Get blob content using path” action:

"body": "S1\tEnzo\tMartin\r\nS2\tLisa\tBernard\r\nS3\tPaul\tDubois\r\nS4\tMarc\tRichard\r\n"

We will use an Azure Function to parse this content and convert it to JSON.

Here’s our “Student” object model:

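using System.Collections.Generic;

namespace MW.Blog.HWU.Model
{
    // Root object returned by the function: a wrapper around the list of students.
    public class Students
    {
        public List<Student> listStudents { get; set; }
        public Students()
        {
            listStudents = new List<Student>();
        }
    }

    // One line of the file: three tab-separated fields.
    public class Student
    {
        public string Id { get; set; }
        public string FirstName { get; set; }
        public string LastName { get; set; }
    }
}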

And our Azure Function:

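using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Extensions.Logging;
using Newtonsoft.Json;
using MW.Blog.HWU.Model;

// The containing class name is arbitrary.
public static class ParserFunctions
{
    [FunctionName("ParserStudentsFileToJson")]
    public static async Task<Students> RunParser(
        [HttpTrigger(AuthorizationLevel.Function, "post", Route = null)] HttpRequest request,
        ILogger log)
    {
        // Read the body asynchronously; synchronous reads on the request stream can be rejected by the host.
        string reqBody;
        using (var reader = new StreamReader(request.Body))
        {
            reqBody = await reader.ReadToEndAsync();
        }
        dynamic data = JsonConvert.DeserializeObject(reqBody);
        string fileContent = data?.fileContent;
        Students studentsRes = new Students();
        if (string.IsNullOrEmpty(fileContent))
        {
            return studentsRes; // nothing to parse
        }

        // Split on CRLF (and bare LF), so a single line without a trailing line break is still parsed.
        string[] lines = fileContent.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var line in lines)
        {
            if (string.IsNullOrWhiteSpace(line)) continue; // ignore blank lines
            string[] items = line.Split('\t');
            if (items.Length < 3)
            {
                log.LogWarning("Skipping malformed line: {Line}", line);
                continue;
            }
            studentsRes.listStudents.Add(new Student
            {
                Id = items[0],
                FirstName = items[1],
                LastName = items[2]
            });
        }
        return studentsRes;
    }
}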
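
To check the parsing rules without deploying anything, the same splitting logic can be exercised as a plain console program. This is a minimal sketch with no dependency on the Functions host. StudentRow and StudentFileParser are hypothetical names used only here, and skipping lines with fewer than three fields is a choice of this sketch.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the Student model, so the sketch is self-contained.
public class StudentRow
{
    public string Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class StudentFileParser
{
    // Splits the raw blob content into lines, then each line into tab-separated fields.
    public static List<StudentRow> Parse(string fileContent)
    {
        var rows = new List<StudentRow>();
        if (string.IsNullOrEmpty(fileContent)) return rows;

        string[] lines = fileContent.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string line in lines)
        {
            string[] items = line.Split('\t');
            if (items.Length < 3) continue; // skip blank or malformed lines
            rows.Add(new StudentRow { Id = items[0], FirstName = items[1], LastName = items[2] });
        }
        return rows;
    }

    public static void Main()
    {
        // The body shown earlier in the "Get blob content" output.
        string body = "S1\tEnzo\tMartin\r\nS2\tLisa\tBernard\r\nS3\tPaul\tDubois\r\nS4\tMarc\tRichard\r\n";
        List<StudentRow> rows = Parse(body);
        Console.WriteLine(rows.Count);        // 4
        Console.WriteLine(rows[0].FirstName); // Enzo
    }
}
```

Keeping the parsing in a pure method like this also makes it easy to unit test separately from the HTTP trigger.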

In summary, our Logic App looks like this:

[Screenshot: Logic App overview]

Let’s run a test. As soon as a Student.csv file is dropped into the container, our process is triggered, and we can see that the Azure Function has converted the content to JSON:

[Screenshot: Azure Function output]

Mission accomplished! Once we’ve transformed the content into JSON, we can do whatever we want with it.

 

Conclusion

In this article, we used an Event Grid Trigger to start reading a CSV file as soon as it is created in our Storage Account, then converted its content to JSON with an Azure Function. To go further, we could consider bypassing the Logic App by handling the Event Grid event directly in our Azure Function.