Importing Complex JSON files using SQL Server Integration Services

Problem

How to import UK petition JSON files using SQL Server Integration Services?

Solution

In tip #1 we covered the UK petition file format and how to import the JSON files using SQL Server Integration Services. We have also learned about importing nested JSON files in tip #2.  These two tips form the basis for this tip. It is assumed that you have practiced those examples and are familiar with the data format.

In this tip, we are going to download the UK petition data (here) as a JSON file. This JSON file is very complex in nature and will be very difficult to understand and define classes to store the value at runtime. Hence initially, I have come up with simple example to explain the complex parts. Then I will showcase the actual working example to load the UK petition file directly using SSIS.

Reviewing JSON File Format

In the below example #1, two petition details have been consolidated and provided with attributes. This example is an advanced version of the example mentioned in tip #2.

In this example, I have an included additional object "attributes" for the data item. Each petition has attributes: "action", "background", "additional_details", "state", "signature_Count" and "created at" petition.

After refreshing the JSON data content using the Notepad++ JSON viewer, the JSON object hierarchy has been displayed on the left-hand side. It is observed that the inner most object attribute has several attributes (like action, background, state, etc.).

In addition to the "links" object, the attributes object defined under the array. Each item in the array represent a petition. Hence a petition object will have type and id as simple attributes.

In addition, the petition object will have the link details and attribute details as objects. At the root level, a JSON object has been defined.

SSIS Output Columns

From the supplied sample petition file, it is assumed that a petition has column details like id, type, link, action, background, additional_details, state, signature_count and created_dt.

As a first step, let’s create the output columns as mentioned in the picture below.

Creating “DataItem” Class

To enable us to deserialize the JSON file content, we need to create a class definition which reflects the hierarchy of the supplied JSON file. The below represents the class definitions.

 using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;
using System.Text;
namespace PetitionNamespace
{
   public class DataItem
   { 
      public Collection<DataSubItem> data { get; set; }
   }
   public class DataSubItem
   {
      public string type { get; set; }
      public string id { get; set; }
      public Selflink links { get; set; }
      public AttributeSubItems attributes { get; set; }
   }
   public class Selflink
   {
      public string self { get; set; }
   }
   public class AttributeSubItems
   {
      public string action { get; set; }
      public string background { get; set; }
      public string additional_details { get; set; }
      public string state { get; set; }
      public string signature_count { get; set; }
      public string created_at { get; set; }
   }
}			

The "AttributeSubItems" class is the inner most class to contain the attributes of a petition. In addition, the SelfLink class is also an inner most class to define the link (self).

The DataSubItem class contains the definition of a petition which defines the property of the petition like type, id as simple properties. It also contains other properties "link" and "Attribute" as objects. The root level class DataItem has the property "data" which returns a collection of DataSubItem.

Deserialization of JSON Data

We need to read the JSON file and store the file content as a string. The JSON file content can be deserialized using the JavaScript serializer. After the successful deserialization, the data content can be stored as a type of “dataitem” object. The DataItem contains a collection of “datasubItem”. Each DataSubItem represents a petition. The petition details like type and id can be directly availed from the “datasubitem”. The link "self" object can be derived from the "links" property. Also, the attributes of the petition can be derived from the attribute property of the “datasubitem”.

public override void CreateNewOutputRows()
{
   /*
     Add rows by calling the AddRow method on the member variable named "<Output Name>Buffer".
     For example, call MyOutputBuffer.AddRow() if your output was named "MyOutput".
   */
   String jsonFileContent = File.ReadAllText(@"E:\WorkArea\Analysis\SSIS_JSON\Petition.JSON");
   JavaScriptSerializer js = new JavaScriptSerializer();
   DataItem dataItem = js.Deserialize<DataItem>(jsonFileContent);
   foreach (DataSubItem dataSubItem in dataItem.data)
   {
      Output0Buffer.AddRow();
      Output0Buffer.type = dataSubItem.type;
      Output0Buffer.id = dataSubItem.id;
      Selflink selflink = dataSubItem.links;
      Output0Buffer.Link = selflink.self;
      AttributeSubItems attributeSubItems = dataSubItem.attributes;
      Output0Buffer.Action = attributeSubItems.action;
      Output0Buffer.Background = attributeSubItems.background;
      Output0Buffer.additionaldetails = attributeSubItems.additional_details;
      Output0Buffer.state = attributeSubItems.state;
      Output0Buffer.signaturecount = attributeSubItems.signature_count;
      Output0Buffer.createdat = attributeSubItems.created_at;
   }
}			

After the successful execution of the package, we can see two records are getting generated by the script component. A data viewer has been setup between the script component and the union all to display the records.

Downloading UK petition JSON file

The UK petition details can be downloaded as a JSON file using this URL. Let’s open the petition JSON file in Notepad++ and refresh the JSON viewer. The below image represents the JSON object hierarchy. The root level JSON object has a links object and an array of petition objects.

After expanding the "Links" object, we can see the links object has properties (self, first, last, next and prep).

Let’s expand the data attributes to understand the hierarchy. It is observed that the data array has multiple petition objects. Each petition object has attributes "type" and "id". Petition also has other properties "links" and "attributes" as objects.

It is evident that the supplied JSON has two datasets. The links dataset has only one record with columns "Self", "First", "Last", "Next" and "Prev". The Petition dataset contains all the petition details as records. Each petition record has the details like "type", "id", "links", "action", "background", "additional_details, "state", and "signature_count". As the "links" and "attributes" are defined as a nested JSON, we need to extract the details and add into the petition details.

As we have two datasets, we need to create two outputs in the script component. The LinksOutput will be crated with the columns "self", "First", "Last", "Next" and "Prev".

The Petitionoutput will have the columns "id", "type", "link", "Action", "Background",  "additional_details", "state", "signature_count" and "created_dt". The below image represents the outputs with the columns.

Once the outputs have been created, we can add two union all components as a “dump” destination and connect the outputs as shown in the picture below.

Creating Petition Object in SSIS

The supplied petition JSON object has many nested JSON objects and arrays. Hence, we need to define multiple classes to store the JSON data contents at runtime using the C# libraries.

The innermost class "Attributes" has the details about the attributes of a petition. This class has the properties like "Action", "Background", "additional_details", "state", "signature_count" and "created_dt".

Another innermost class is "SelfLink" which contains the class definition of the link. The class DataSubItem represents a petition. This class has properties like "type", "id", "SelfLink" and "AttributeSubItems".

The class DataItem has a property data which will return a collection of DataSubItems. The root level class "Petition" has two properties namely "Links" and "Data". The "links" property represents the "Links" dataset and the "data" property represents the Petition dataset. Using these objects, we can navigate and extract the data from the supplied JSON file.

The below represents the petition class definition.

using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;
using System.Text;
namespace PetitionNamespace
{
   public class Petition
   {
      public LinkSubItem links { get; set; }
      public System.Collections.ObjectModel.Collection<DataSubItem> data { get; set; }
   }
   public class DataItem
   {
      public Collection<DataSubItem> data { get; set; }
   }
   public class DataSubItem
   {
      public string type { get; set; }
      public string id { get; set; }
      public Selflink links { get; set; }
      public AttributeSubItems attributes { get; set; }
   }
   public class Selflink
   {
      public string self { get; set; }
   }
   public class AttributeSubItems
   {
      public string action { get; set; }
      public string background { get; set; }
      public string additional_details { get; set; }
      public string state { get; set; }
      public int signature_count { get; set; }
      public string created_at { get; set; }
   }
   class Link
   {
      public LinkSubItem links { get; set; }
   }
   public class LinkSubItem
   {
      public string self { get; set; }
      public string first { get; set; }
      public string last { get; set; }
      public string next { get; set; }
      public string prev { get; set; }
   }
}			

Deserialization of JSON Data

The UK petition file can be deserialized using the JavaScriptSerializer. The deserialized object can be stored as an object of type Petition. This petition object contains linkSubItem object which can be accessed using the Links property of the Petition object. The link attributes "Self", "First", "last", "Next", "Prev" are available as properties of the linksubitem object.

The petition details are available as a collection of DataSubItem object. This can be accessed using the data property of the petition object. Each item in the DataSubItem collection represent a petition.

The petition properties id and type can be accessed directly using the datasubitem object. The "self" property of the petition can be accessed using the "Links" object. The links property of the dataSubItem will return the selflink object.

The self-property of the Selflink object will return the actual link. The petition attributes can be accessed using the attributes property of datasubitem. The datasubitem will return attributesubitems. The actual attributes "action", "background" can be accessed using the attributesubitem object.

The below code will help you to deserialize the JSON file and will return links and Petition outputs.

public override void CreateNewOutputRows()
{
   /*
     Add rows by calling the AddRow method on the member variable named "<Output Name>Buffer".
     For example, call MyOutputBuffer.AddRow() if your output was named "MyOutput".
   */
   String jsonFileContent = File.ReadAllText(@"C:\SSIS_JSON\samples.JSON");
   JavaScriptSerializer js = new JavaScriptSerializer();
   Petition petition = js.Deserialize<Petition>(jsonFileContent);
   
   //Extract links details as a separate output
   LinkSubItem linkSubItem = petition.links;
   LinksOutputBuffer.AddRow();
   LinksOutputBuffer.Self = linkSubItem.self;
   LinksOutputBuffer.First = linkSubItem.first;
   LinksOutputBuffer.Last = linkSubItem.last;
   LinksOutputBuffer.Next = linkSubItem.next;
   LinksOutputBuffer.Prev = linkSubItem.prev;
 
   //Extract petition details in a Petition output
   System.Collections.ObjectModel.Collection<DataSubItem> dataSubItems = petition.data;
   foreach (DataSubItem dataSubItem in dataSubItems)
   {
      PetitionOutputBuffer.AddRow();
      PetitionOutputBuffer.type = dataSubItem.type;
      PetitionOutputBuffer.id = dataSubItem.id;
      Selflink selflink = dataSubItem.links;
      PetitionOutputBuffer.Link = selflink.self;
      AttributeSubItems attributeSubItems = dataSubItem.attributes;
      PetitionOutputBuffer.Action = attributeSubItems.action;
      PetitionOutputBuffer.Background = attributeSubItems.background;
      PetitionOutputBuffer.additionaldetails = attributeSubItems.additional_details;
      PetitionOutputBuffer.state = attributeSubItems.state;
      PetitionOutputBuffer.signaturecount = attributeSubItems.signature_count;
      PetitionOutputBuffer.createdat = attributeSubItems.created_at;
   }
}			

Once the changes have been made, let’s execute the package to see the results. As we have setup the data viewer on both the outputs, we can see the data in the data pipeline.

The LinksOutput has one record with the columns "Self", "First","Last","Next" and "Prev". The Petition output will have all the petition details with the columns "id", "type", "link", "Action", "Background", "additional_details", "state" and "siganture_count".

Summary

In this tip, we have learned how to import a UK petition JSON file using SQL Server Integration Services. We have also learned about analyzing nested JSON data files in a step by step manner.

Next Steps
  • Learn JSON basics with this tip
  • Challenge your JSON knowledge with this tip

Last Update:
2018-04-26

About the author

Nat Sundar is working as an independent SQL BI consultant in the UK with a Bachelors Degree in Engineering.

View all my tips

Bir cevap yazın

E-posta hesabınız yayımlanmayacak. Gerekli alanlar * ile işaretlenmişlerdir

This site uses Akismet to reduce spam. Learn how your comment data is processed.

TOP