
I am a beginner in data science and have started learning from online resources. I downloaded the cancer dataset and started to build a kNN model on top of that dataset. The code is given below:

stringsAsFactors = FALSE 

str(prc) 

prc <- prc[-1]  # remove the first variable (id) from the data set

table(prc$diagnosis_result)  # count the patients in each diagnosis class

prc$diagnosis <- factor(prc$diagnosis_result, levels = c("B", "M"), labels = c("Benign", "Malignant"))  # relabel B/M as Benign/Malignant

round(prop.table(table(prc$diagnosis)) * 100, digits = 1)  # class shares as percentages, rounded to 1 decimal place (hence digits = 1)

normalize <- function(x) {

  # min-max normalization: rescales x to a common [0, 1] scale (very important step)

  return((x - min(x)) / (max(x) - min(x)))

}

prc_n <- as.data.frame(lapply(prc[2:9], normalize))

summary(prc_n$radius)

prc_train <- prc_n[1:65,]

prc_test <- prc_n[66:100,]

prc_train_labels <- prc[1:65, 1]

prc_test_labels <- prc[66:100, 1] 

library(class)

prc_test_pred <- knn(train = prc_train, test = prc_test, cl = prc_train_labels,k=10)

library(gmodels)

CrossTable(x=prc_test_labels, y=prc_test_pred, prop.chisq=FALSE)

I get the following error at the prc_test_pred line:

Error in knn(train = prc_train, test = prc_test, cl = prc_train_labels, : no missing values are allowed

Can anyone help me?

1 Answer


I don't know exactly where your code runs into the issue, but the simple example in this answer builds the entire model, so you can compare it with your own code and correct it. It should also give you an idea of how to start.
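As a first diagnostic, it may help to know that knn() stops with "no missing values are allowed" whenever train, test, or cl contains an NA. Here is a minimal sketch of three ways NAs can creep in with this kind of pipeline. The toy data frame and the names toy, toy_n, na_per_column and rows_past_end are made up for illustration and are not taken from your dataset:

```r
# Hypothetical toy data standing in for prc; not the real dataset.
toy <- data.frame(
  radius  = c(1, 2, NA, 4),    # contains a missing value
  texture = c(5, 5, 5, 5),     # constant column: max(x) - min(x) is 0
  area    = c(10, 20, 30, 40)  # well-behaved column
)

# Same min-max normalization as in the question.
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

toy_n <- as.data.frame(lapply(toy, normalize))

# Count missing values per column; is.na() is TRUE for both NA and NaN.
na_per_column <- colSums(is.na(toy_n))
print(na_per_column)

# Row indices beyond nrow() do not fail; they return all-NA rows.
rows_past_end <- toy[3:6, ]
n_missing_area <- sum(is.na(rows_past_end$area))
print(n_missing_area)
```

On your own data, running colSums(is.na(prc_n)) and nrow(prc) just before the knn() call should show whether one of these cases applies, for example if the file you loaded has fewer than 100 rows.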

Here is the complete example:

prc <- read.csv("https://raw.githubusercontent.com/duttashi/learnr/master/data/misc/Prostate_Cancer.csv", header = TRUE, stringsAsFactors = FALSE)

prc <- prc[-1]  

prc$diagnosis <- factor(prc$diagnosis_result, levels = c("B", "M"), labels = c("Benign", "Malignant"))

normalize <- function(x) {

  return((x - min(x)) / (max(x) - min(x)))

}

prc_n <- as.data.frame(lapply(prc[2:9], normalize))

prc_train <- prc_n[1:65,]

prc_test <- prc_n[66:100,]

prc_train_labels <- prc[1:65, 1]

prc_test_labels <- prc[66:100, 1] 

library(class)

prc_test_pred <- knn(train = prc_train, test = prc_test, cl = prc_train_labels,k=10)

library(gmodels)

CrossTable(x=prc_test_labels, y=prc_test_pred, prop.chisq=FALSE) 

# -------------------------------------------------------------------------

# Cell Contents

#   |-------------------------|

#   |                       N |

#   |           N / Row Total |

#   |           N / Col Total |

#   |         N / Table Total |

#   |-------------------------|

#   

#   

#   Total Observations in Table:  35 

#                   | prc_test_pred 

#   prc_test_labels |         B |         M | Row Total | 

#   ----------------|-----------|-----------|-----------|

#                 B |         6 |        13 |        19 | 

#                   |     0.316 |     0.684 |     0.543 | 

#                   |     0.857 |     0.464 |           | 

#                   |     0.171 |     0.371 |           | 

#   ----------------|-----------|-----------|-----------|

#                 M |         1 |        15 |        16 | 

#                   |     0.062 |     0.938 |     0.457 | 

#                   |     0.143 |     0.536 |           | 

#                   |     0.029 |     0.429 |           | 

#   ----------------|-----------|-----------|-----------|

#      Column Total |         7 |        28 |        35 | 

#                   |     0.200 |     0.800 |           | 

#   ----------------|-----------|-----------|-----------|
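To read the table above, it can help to turn the four cell counts into a few summary numbers. The sketch below re-enters the counts from the CrossTable output by hand; the variable names confusion, accuracy, recall_M and recall_B are only illustrative:

```r
# Counts copied from the CrossTable output above
# (rows = actual labels, columns = predicted labels).
confusion <- matrix(
  c(6, 13,
    1, 15),
  nrow = 2, byrow = TRUE,
  dimnames = list(actual = c("B", "M"), predicted = c("B", "M"))
)

# Share of all test cases that were classified correctly: (6 + 15) / 35.
accuracy <- sum(diag(confusion)) / sum(confusion)

# Share of actual M cases predicted as M: 15 / 16.
recall_M <- confusion["M", "M"] / sum(confusion["M", ])

# Share of actual B cases predicted as B: 6 / 19.
recall_B <- confusion["B", "B"] / sum(confusion["B", ])

print(round(c(accuracy = accuracy, recall_M = recall_M, recall_B = recall_B), 3))
```

With k = 10 and this fixed 65/35 split, the model labels most cases as M, so it catches nearly all M cases but misclassifies most B cases.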

I hope this helps.
