any efficient way to count binary values in columns of big data table?

Question

asked Jun 19, 2019 in Big Data Hadoop & Spark by Aarav (11.4k points)

I am using big-query for Chicago crime data in google cloud platform. However, I want to count number of arrest and non arrest per crime type. It is easy to count this in pandas but it's not intuitive for me how to count binary values in dataframe with big query. Can anyone give me possible idea to make this count?

data: because Chicago crime data is big I am not able to give reproducible example here, but it is very easy to preview crime data from here: Chicago crime data

Let me give you a little preview:

my big query:

SELECT
  primary_type,
  count(arrest),
  COUNTIF(year = 2015) AS arrests_2015,
  COUNTIF(year = 2016) AS arrests_2016
FROM
  `bigquery-public-data.chicago_crime.crime`
WHERE
  arrest = TRUE
  AND year IN (2001,
    2018)
  AND primary_type NOT IN ('OTHER OFFENSE', ' all non-criminal types')
GROUP BY
  primary_type,
  arrest

but this query gives me empty output, I don't know how to make it work.

goal:

from Chicago crime data table, I want to extract number of total arrest and non-arrest for each primary type, I want to exclude OTHER OFFENSE and all non-criminal types until end of 2018.

how can I correct my big-query to get my expected output?

1 Answer

Amit Rawat · Answer 1 · 2019-06-19T13:23:25+0000

The problem in your code is in the line:

AND year IN (2001, 2018)

Here you selected only 2001 and 2008, instead, you should have selected all the years between 2001 and 2018. You should replace the above command with:

AND year IN (2001, 2018)

I hope this would work for you.

SELECT
primary_type,
COUNT(arrest) arrest_total,
COUNTIF(year = 2015) AS arrests_2015,
COUNTIF(year = 2016) AS arrests_2016
FROM `bigquery-public-data.chicago_crime.crime`
WHERE arrest = TRUE
AND year BETWEEN 2001 AND 2018
AND primary_type NOT IN ('OTHER OFFENSE', ' all non-criminal types')
GROUP BY primary_type, arrest

Also, to include non-arrests in your data table run the following code:

#standardSQL

SELECT
primary_type,
arrest,
COUNT(arrest) arrest_total,
COUNTIF(year = 2015) AS arrests_2015,
COUNTIF(year = 2016) AS arrests_2016
FROM `bigquery-public-data.chicago_crime.crime`
WHERE year BETWEEN 2001 AND 2018
AND primary_type NOT IN ('OTHER OFFENSE', ' all non-criminal types')
GROUP BY primary_type, arrest

And, for getting one output row per primary type, write the below command:

SELECT
primary_type,
COUNTIF(arrest) arrests,
COUNTIF(NOT arrest) non_arrests,
COUNT(arrest) arrest_total,
COUNTIF(year = 2015) AS arrests_2015,
COUNTIF(year = 2016) AS arrests_2016
FROM `bigquery-public-data.chicago_crime.crime`
WHERE year BETWEEN 2001 AND 2018
AND primary_type NOT IN ('OTHER OFFENSE', ' all non-criminal types')

GROUP BY primary_type

If you want more information regarding Hadoop, refer to the following video:

any efficient way to count binary values in columns of big data table?

1 Answer

Related questions

Browse By Domains

Popular Courses

Popular Tutorials

Popular Resources