0 votes
1 view
in Big Data Hadoop & Spark by (11.5k points)

I am using big-query for Chicago crime data in google cloud platform. However, I want to count number of arrest and non arrest per crime type. It is easy to count this in pandas but it's not intuitive for me how to count binary values in dataframe with big query. Can anyone give me possible idea to make this count?

data: because Chicago crime data is big I am not able to give reproducible example here, but it is very easy to preview crime data from here: Chicago crime data

Let me give you a little preview:

my big query:

SELECT
  primary_type,
  count(arrest),
  COUNTIF(year = 2015) AS arrests_2015,
  COUNTIF(year = 2016) AS arrests_2016
FROM
  `bigquery-public-data.chicago_crime.crime`
WHERE
  arrest = TRUE
  AND year IN (2001,
    2018)
  AND primary_type NOT IN ('OTHER OFFENSE', ' all non-criminal types')
GROUP BY
  primary_type,
  arrest

but this query gives me empty output, I don't know how to make it work.

goal:

from Chicago crime data table, I want to extract number of total arrest and non-arrest for each primary type, I want to exclude OTHER OFFENSE and all non-criminal types until end of 2018.

how can I correct my big-query to get my expected output?

1 Answer

0 votes
by (28.1k points)
edited by

The problem in your code is in the line:

AND year IN (2001, 2018)

Here you selected only 2001 and 2008, instead, you should have selected all the years between 2001 and 2018. You should replace the above command with:

AND year IN (2001, 2018)

I hope this would work for you.

SELECT

 primary_type,

 COUNT(arrest) arrest_total,

 COUNTIF(year = 2015) AS arrests_2015,

 COUNTIF(year = 2016) AS arrests_2016

FROM `bigquery-public-data.chicago_crime.crime`

WHERE arrest = TRUE

AND year BETWEEN 2001 AND 2018

AND primary_type NOT IN ('OTHER OFFENSE', ' all non-criminal types')

GROUP BY primary_type, arrest


 

Also, to include non-arrests in your data table run the following code:

#standardSQL

SELECT

 primary_type,

 arrest,

 COUNT(arrest) arrest_total,

 COUNTIF(year = 2015) AS arrests_2015,

 COUNTIF(year = 2016) AS arrests_2016

FROM `bigquery-public-data.chicago_crime.crime`

WHERE year BETWEEN 2001 AND 2018

AND primary_type NOT IN ('OTHER OFFENSE', ' all non-criminal types')

GROUP BY primary_type, arrest

And, for getting one output row per primary type, write the below command:

SELECT

 primary_type,

 COUNTIF(arrest) arrests,

 COUNTIF(NOT arrest) non_arrests,

 COUNT(arrest) arrest_total,

 COUNTIF(year = 2015) AS arrests_2015,

 COUNTIF(year = 2016) AS arrests_2016

FROM `bigquery-public-data.chicago_crime.crime`

WHERE year BETWEEN 2001 AND 2018

AND primary_type NOT IN ('OTHER OFFENSE', ' all non-criminal types')

GROUP BY primary_type

If you want more information regarding Hadoop, refer to the following video:

Welcome to Intellipaat Community. Get your technical queries answered by top developers !


Categories

...