in Big Data Hadoop & Spark by (120 points)

I have a JSON file named Class.json and want to compute a couple of aggregates over it, subject to some conditions.

Class.json

{
  "class": [
    {
      "class_id": "1",
      "data": {
        "lesson3": {
          "id": 3,
          "schedule": [
            {
              "schedule_id": "1",
              "schedule_date": "2017-07-11",
              "lesson_price": "USD 25",
              "status": "ONGOING"
            },
            {
              "schedule_id": "2",
              "schedule_date": "2016-09-24",
              "lesson_price": "USD 15",
              "status": "OPEN REGISTRATION"
            }
          ]
        },
        "lesson4": {
          "id": 4,
          "schedule": [
            {
              "schedule_id": "1",
              "schedule_date": "2016-12-17",
              "lesson_price": "USD 19",
              "status": "ONGOING"
            },
            {
              "schedule_id": "2",
              "schedule_date": "2015-11-12",
              "lesson_price": "USD 29",
              "status": "ONGOING"
            },
            {
              "schedule_id": "3",
              "schedule_date": "2015-11-10",
              "lesson_price": "USD 14",
              "status": "ON SCHEDULE"
            }
          ]
        }
      }
    },
    {
      "class_id": "2",
      "data": {
        "lesson1": {
          "id": 1,
          "schedule": [
            {
              "schedule_id": "1",
              "schedule_date": "2017-05-21",
              "lesson_price": "USD 50",
              "status": "CANCELLED"
            }
          ]
        },
        "lesson2": {
          "id": 2,
          "schedule": [
            {
              "schedule_id": "1",
              "schedule_date": "2017-06-04",
              "lesson_price": "USD10",
              "status": "FINISHED"
            },
            {
              "schedule_id": "5",
              "schedule_date": "2018-03-01",
              "lesson_price": "USD12",
              "status": "CLOSED"
            }
          ]
        }
      }
    }
  ]
}

I've tried 

df = spark.read.json("class.json", multiLine=True)

df.show()

and it shows:

+--------------------+
|               class|
+--------------------+
|[[1, [,, [3, [[US...|
+--------------------+

Then, to access the array, I tried:

res = df.select("class").map(lambda s: s['data'])

but got AttributeError: 'DataFrame' object has no attribute 'map'.

Doing df['class'][0]['data'] instead just returns Column<b'class[0][data]'>.

Goal:

  • count the schedules with status "ONGOING" whose schedule_date is before 2017-01
  • compute the average lesson_price over schedules before 2017-01

How can I do this with PySpark?
