
I have a data frame, imported from a CSV file, in which ID is a numeric value and comment1 and comment2 are strings. After the import, a few rows come out misaligned: for example, the fifth comment, which should be in comment2, ends up in the ID column and replaces the original ID value. This happens seemingly at random, for only a few rows. It also occurs only when I run my R code in Azure ML Studio; in RStudio no data is misplaced. My idea is to delete every row whose first column, ID, is not a numeric value. Because the misplaced string is an arbitrary long sentence, I cannot delete the rows by string matching, and the data frame is too big to delete them manually. Any suggestions?


You will find a sample of the dataframe here,

# Read the sample CSV published from Google Sheets
df <- read.csv(
  "https://docs.google.com/spreadsheets/d/171YXjzm3FsapXSkqgOSos6UGXNRcd1yxmLyvaRnCX5E/pub?output=csv"
)

# Drop the first row and keep only the first 12 columns
df <- df[-1, ]
df <- df[, 1:12]

# Assign descriptive column names
colnames(df) <- c(
  "ID", "Created", "Comments", "Liked_By", "Disliked_By", "Recipient_Number",
  "Sender", "Recipients", "Read_By", "Subject", "Introduction", "Body"
)

1 Answer


Subset to numeric IDs:

subset(df, grepl('^\\d+$', df$ID))

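The pattern ^\d+$ matches only ID values that consist entirely of digits from start to end, so subset() keeps those rows and drops every row where a misplaced comment has landed in the ID column.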
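To see the filter in action without downloading the real file, here is a minimal sketch on a made-up data frame. The values and the two-column layout are assumptions for illustration only; the real data has 12 columns. It also converts ID to numeric after filtering, which may or may not be what you need.

```r
# Toy data frame standing in for the imported CSV (all values are made up).
df <- data.frame(
  ID = c("101", "some long misplaced comment", "102", "12a", "103"),
  Comments = c("first", "second", "third", "fourth", "fifth"),
  stringsAsFactors = FALSE
)

# Keep only the rows whose ID consists entirely of digits.
clean <- subset(df, grepl("^\\d+$", ID))

# Once the bad rows are gone, ID can be converted to a numeric type.
clean$ID <- as.numeric(clean$ID)

print(clean)
```

If ID was read in as a factor, grepl() coerces it to character before matching, so the same filter should still apply; in that case convert with as.numeric(as.character(clean$ID)) rather than as.numeric() directly.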
