Parsing complex JSON structures is usually not a trivial task. When your destination is a database, what you expect naturally is a flattened result set. Things get more complicated when your JSON source is a web service and the result consists of multiple nested objects including lists in lists and so on. Things get even more complicated if the JSON schema changes over time, which is often a real-life scenario.
We have these wonderful Azure Logic Apps, which help us consistently get the JSON results from various sources. However, Logic Apps are not so good at parsing more complex nested structures. And they definitely don’t like even subtle source schema changes.
Enter Databricks!
With Databricks you get:
- An easy way to infer the JSON schema and avoid creating it manually
- Subtle changes in the JSON schema won’t break things
- The ability to explode nested lists into rows in a very easy way (see the Notebook below)
- Speed!
Following is an example Databricks Notebook (Python) demonstrating the above claims. The JSON sample consists of an imaginary JSON result set, which contains a list of car models within a list of car vendors within a list of people. We want to flatten this result into a dataframe. Here you go:
from pyspark.sql.functions import explode, col
source_json = """ { "persons": [ { "name": "John", "age": 30, "cars": [ { "name": "Ford", "models": [ "Fiesta", "Focus", "Mustang" ] }, { "name": "BMW", "models": [ "320", "X3", "X5" ] } ] }, { "name": "Peter", "age": 46, "cars": [ { "name": "Huyndai", "models": [ "i10", "i30" ] }, { "name": "Mercedes", "models": [ "E320", "E63 AMG" ] } ] } ] } """
dbutils.fs.put("/tmp/source.json", source_json, True)
source_df = spark.read.option("multiline", "true").json("/tmp/source.json")
# Explode all persons into different rows persons = source_df.select(explode("persons").alias("persons")) # Explode all car brands into different rows persons_cars = persons.select( col("persons.name").alias("persons_name") , col("persons.age").alias("persons_age") , explode("persons.cars").alias("persons_cars_brands") , col("persons_cars_brands.name").alias("persons_cars_brand") ) # Explode all car models into different rows persons_cars_models = persons_cars.select( col("persons_name") , col("persons_age") , col("persons_cars_brand") , explode("persons_cars_brands.models").alias("persons_cars_model") )
display(persons_cars_models)
We’ve seen here how we can use Databricks to flatten JSON with just a few lines of code.
Keep your eyes open for future Databricks related blogs, which will demonstrate more of the versatility of this great platform.
More on some of the used functions (PySpark 2.3.0 documentation):
Wonderful work! That is the type of info that are
meant to be shared around the internet. Shame
on Google for now not positioning this put up higher!
Come on over and talk over with my web site . Thanks =)
What’s up, the whole thing is going perfectly here and ofcourse every one is sharing data, that’s in fact excellent, keep up writing.
Good day! This post could not be written any better! Reading through this post reminds me of my good old room mate!
He always kept talking about this. I will forward this
page to him. Pretty sure he will have a good read. Many thanks for sharing!
Thanks Man. this is really helpful.
Greetings from Idaho! I’m bored to tears at work so I decided to browse your site on my iphone during lunch break.
I love the info you provide here and can’t wait to
take a look when I get home. I’m surprised at how fast your blog loaded on my cell phone ..
I’m not even using WIFI, just 3G .. Anyways, amazing site!
It’s the best time to make some plans for the future and
it is time to be happy. I have read this post and if I
could I want to suggest you few interesting things or advice.
Maybe you could write next articles referring to this article.
I wish to read more things about it!
Keep this going please, great job!
Hi there! I simply want to give you a big thumbs up for your great info you have got here on this post.
I’ll be returning to your web site for more soon.
Incredible! This blog looks just like my old one!
It’s on a totally different topic but it has pretty much the same layout and
design. Wonderful choice of colors!
You have observed very interesting points! ps nice website.
This site really has all the info I needed about this subject and
didn’t know who to ask.
I am impressed with this site, rattling I am a fan.
Awesome help! Thank you very much.
I just like the helpful info you supply for your articles.
I’ll bookmark your weblog and take a look at once more here frequently.
I’m rather certain I will be informed a lot of new stuff proper
here! Good luck for the next!
Woah! I’m really loving the template/theme of this site.
It’s simple, yet effective. A lot of times it’s very hard to get that “perfect balance” between user friendliness and visual appeal.
I must say that you’ve done a very good job with this.
Also, the blog loads extremely fast for me on Opera.
Exceptional Blog!
Thanks for sharing these tips! I’m always interested in learning more about this.