This is my dataset, a:

   ID    a.1    b.1 a.2    b.2
 1  1  40.00 100.00  NA  88.89
 2  2 100.00 100.00 100 100.00
 3  3  50.00 100.00  75 100.00
 4  4  66.67  59.38  NA  59.38
 5  5  37.50 100.00  NA 100.00
 6  6 100.00 100.00 100 100.00

This is the code I am applying on the data frame:

 temp <- do.call(rbind, strsplit(names(df)[-1], ".", fixed=TRUE))
 dup.temp <- temp[duplicated(temp[,1]),]
 res <- lapply(dup.temp[,1], function(i) {
   breaks <- c(-Inf, quantile(a[,paste(i,1,sep=".")], na.rm=T), Inf)
   cut(a[,paste(i,2,sep=".")], breaks)
 })

but I am getting the below error:

 Error in cut.default(a[, paste(i, 2, sep = ".")], breaks) :
   'breaks' are not unique

The same code works fine on another dataset.

 varnames <- c("ID", "a.1", "b.1", "c.1", "a.2", "b.2", "c.2")
 a <- matrix(c(1, 2, 3, 4, 5, 6, 7), 2, 7)
 colnames(a) <- varnames
 df <- as.data.frame(a)

   ID a.1 b.1 c.1 a.2 b.2 c.2
 1  1   3   5   7   2   4   6
 2  2   4   6   1   3   5   7

 res <- lapply(dup.temp[,1], function(i) {
   breaks <- c(-Inf, quantile(a[,paste(i,1,sep=".")], na.rm=T), Inf)
   cut(a[,paste(i,2,sep=".")], breaks)
 })
 res

[[1]]
[1] (-Inf,3] (-Inf,3]
Levels: (-Inf,3] (3,3.25] (3.25,3.5] (3.5,3.75] (3.75,4] (4, Inf]

[[2]]
[1] (-Inf,5] (-Inf,5]
Levels: (-Inf,5] (5,5.25] (5.25,5.5] (5.5,5.75] (5.75,6] (6, Inf]

[[3]]
[1] (5.5,7] (5.5,7]
Levels: (-Inf,1] (1,2.5] (2.5,4] (4,5.5] (5.5,7] (7, Inf]

Can anyone tell me why I am getting this error and how to fix it?

1 Answer

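The error occurs because several quantiles of b.1, a.2, and b.2 have the same value, and cut() requires its break values to be unique, so these quantiles can't be used directly as breaks. In your code the breaks come from the .1 columns, so b.1 (whose 25% to 100% quantiles are all 100) is the column that triggers the error.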

 apply(a,2,quantile,na.rm=T)

        ID      a.1    b.1   a.2      b.2
 0%   1.00  37.5000  59.38  75.0  59.3800
 25%  2.25  42.5000 100.00  87.5  91.6675
 50%  3.50  58.3350 100.00 100.0 100.0000
 75%  4.75  91.6675 100.00 100.0 100.0000
 100% 6.00 100.0000 100.00 100.0 100.0000
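
To see the failure in isolation, here is a minimal sketch that uses only the b.1 values from the question. The names b1, q and res are illustrative and not part of the original code.

```r
# b.1 column from the question's data
b1 <- c(100, 100, 100, 59.38, 100, 100)

# Default quantiles: 0%, 25%, 50%, 75%, 100%
q <- quantile(b1, na.rm = TRUE)

# TRUE means at least one quantile value is repeated
any(duplicated(q))

# cut() rejects a break vector with repeated values;
# tryCatch() captures the error message instead of stopping
res <- tryCatch(
  cut(b1, c(-Inf, q, Inf)),
  error = function(e) conditionMessage(e)
)
res
```

Here res should hold the text of the error rather than a factor, since four of the five quantiles of b.1 are 100.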

You can solve the problem by wrapping quantile() in unique(). This removes the duplicate quantile values, which also leaves you with fewer break points.

 res <- lapply(dup.temp[,1], function(i) {
   breaks <- c(-Inf, unique(quantile(a[,paste(i,1,sep=".")], na.rm=T)), Inf)
   cut(a[,paste(i,2,sep=".")], breaks)
 })

[[1]]
[1] <NA>        (91.7,100]  (58.3,91.7] <NA>        <NA>        (91.7,100]
Levels: (-Inf,37.5] (37.5,42.5] (42.5,58.3] (58.3,91.7] (91.7,100] (100, Inf]

[[2]]
[1] (59.4,100]  (59.4,100]  (59.4,100]  (-Inf,59.4] (59.4,100]  (59.4,100]
Levels: (-Inf,59.4] (59.4,100] (100, Inf]
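
If you want something to paste into a fresh session, the sketch below rebuilds the question's data by hand and applies the same unique() fix. It makes a few assumptions of its own: the data is held in a data frame called df (the question uses a), and the variable prefixes are found with grep() and sub() rather than the strsplit() approach from the question. The names prefixes, p and q are illustrative.

```r
# Data from the question, rebuilt by hand (assumed to be a data frame)
df <- data.frame(
  ID  = 1:6,
  a.1 = c(40, 100, 50, 66.67, 37.5, 100),
  b.1 = c(100, 100, 100, 59.38, 100, 100),
  a.2 = c(NA, 100, 75, NA, NA, 100),
  b.2 = c(88.89, 100, 100, 59.38, 100, 100)
)

# Prefixes that have both a ".1" and a ".2" column
prefixes <- sub("\\.1$", "", grep("\\.1$", names(df), value = TRUE))
prefixes <- prefixes[paste0(prefixes, ".2") %in% names(df)]

res <- lapply(prefixes, function(p) {
  # Quantiles of the ".1" column, duplicates dropped so cut() accepts them
  q <- unique(quantile(df[[paste0(p, ".1")]], na.rm = TRUE))
  # Bin the ".2" column using those breaks
  cut(df[[paste0(p, ".2")]], breaks = c(-Inf, q, Inf))
})
names(res) <- prefixes

# Number of bins per variable
sapply(res, nlevels)
```

Keep in mind that once duplicates are dropped, the remaining bins no longer correspond to equal quantile ranges. For b, everything between the minimum and 100 ends up in a single bin.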
