I am working with a DataFrame that has two columns, mvv and count.

+---+-----+
|mvv|count|
+---+-----+
| 1 |  5  |
| 2 |  9  |
| 3 |  3  |
| 4 |  1  |
+---+-----+


I would like to obtain two lists, one containing the mvv values and one containing the count values. Something like:

mvv = [1,2,3,4]
count = [5,9,3,1]


So I tried the following code. The first line should return a Python list of Row objects, and I wanted to see the first value:

mvv_list = mvv_count_df.select('mvv').collect()
firstvalue = mvv_list[0].getInt(0)


But I get an error message with the second line:

AttributeError: getInt

1 Answer


The problem with your approach is that you are calling getInt on a Row object. getInt belongs to the Scala/Java Row API; the Python Row class has no such method. The output of your collect() looks like this:

>>> mvv_list = mvv_count_df.select('mvv').collect()

>>> mvv_list[0]

Out: Row(mvv=1)

If you instead access the field by name as an attribute:

>>> firstvalue = mvv_list[0].mvv

>>> firstvalue

Out: 1

you get the mvv value.

To get all the values of the column as a list, collect the DataFrame itself (mvv_list is already a plain Python list, so it has no collect method):

>>> mvv_array = [int(row.mvv) for row in mvv_count_df.collect()]

>>> mvv_array

Out: [1,2,3,4]

But if you try the same for the other column:

>>> mvv_count = [int(row.count) for row in mvv_count_df.collect()]

You get an error:

Out: TypeError: int() argument must be a string or a number, not 'builtin_function_or_method'

This happens because Row is a subclass of tuple, so row.count resolves to the built-in tuple method count rather than to the column of the same name. One workaround is to rename the count column to _count:

>>> renamed_df = mvv_count_df.selectExpr("mvv as mvv", "count as _count")

>>> mvv_count = [int(row._count) for row in renamed_df.collect()]
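
The name clash can be reproduced without Spark. The sketch below uses FakeRow, a hypothetical stand-in written only for illustration (it is not the PySpark class). It assumes just that a Row is a tuple subclass that looks up field names through __getattr__, which Python only calls when normal attribute lookup fails:

```python
class FakeRow(tuple):
    """Hypothetical stand-in for a Spark Row: a tuple with named fields."""

    def __new__(cls, **fields):
        row = super().__new__(cls, tuple(fields.values()))
        row._names = list(fields)  # remember the field order
        return row

    def __getitem__(self, key):
        # Allow row["name"] in addition to the usual row[0]
        if isinstance(key, str):
            return super().__getitem__(self._names.index(key))
        return super().__getitem__(key)

    def __getattr__(self, name):
        # Reached only when normal lookup fails, so tuple.count and
        # tuple.index are found first and shadow fields with those names.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self[name]
        except ValueError:
            raise AttributeError(name)


row = FakeRow(mvv=1, count=5)

print(row.mvv)              # 1, resolved through __getattr__
print(callable(row.count))  # True, this is the tuple method, not the field
print(row["count"])         # 5, key access bypasses attribute lookup

try:
    int(row.count)
except TypeError as exc:
    print("TypeError:", exc)
```

The same shadowing would apply to any column whose name matches an existing tuple attribute, such as index.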

Renaming is not actually needed, though, because you can access the columns with dictionary-style indexing:

>>> mvv_array = [int(row['mvv']) for row in mvv_count_df.collect()]

>>> mvv_count = [int(row['count']) for row in mvv_count_df.collect()]

This works without any error.
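
Since each collect() call runs a separate Spark job, it may be worth collecting once and building both lists from the same rows. Here is a minimal sketch of that idea; columns_to_lists is a hypothetical helper name, and the plain dicts are stand-ins for the Row objects that mvv_count_df.collect() would return (both support row[name]):

```python
def columns_to_lists(rows, names):
    """Build {column name: list of values} from row-like objects.

    Each row only needs to support row[name], as Spark Row objects do.
    """
    return {name: [row[name] for row in rows] for name in names}


# Stand-in data; with Spark this would be: rows = mvv_count_df.collect()
rows = [
    {"mvv": 1, "count": 5},
    {"mvv": 2, "count": 9},
    {"mvv": 3, "count": 3},
    {"mvv": 4, "count": 1},
]

result = columns_to_lists(rows, ["mvv", "count"])
mvv = result["mvv"]
count = result["count"]

print(mvv)    # [1, 2, 3, 4]
print(count)  # [5, 9, 3, 1]
```

Key access is used throughout, so a column named count causes no trouble here.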
