0 votes
in Azure by (45.3k points)

I have a data frame like the one below, imported from a CSV, where ID is a numeric value and comment1 and comment2 are strings. In some rows the data is misplaced: for example, the fifth comment should be in comment2, but it ends up in the ID column and replaces the original ID value. This happens randomly, for only a few rows. Moreover, the problem only occurs when I run my R code in Azure ML Studio; in RStudio no data is misplaced. So I was thinking of simply deleting every row where the first column, ID, is not a numeric value. Because the misplaced string is a random long sentence, I cannot delete the rows by string matching, and the data frame is too big to delete them manually. Any suggestions?


You will find a sample of the dataframe here,

 df <-




df <- df[-1,]

df <- df[, 1:12]

colnames(df) <- c(
    "ID","Created","Comments","Liked_By","Disliked_By", "Recipient_Number",
    "Sender","Recipients","Read_By", "Subject","Introduction","Body"
)


1 Answer

0 votes
by (16.8k points)

Subset to numeric IDs:

subset(df, grepl('^\\d+$', df$ID))

The pattern matches only ID values that consist entirely of digits, so any row whose ID holds misplaced text is dropped.
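As a rough illustration, the filter can be tried on a small made-up data frame. The sample values below are hypothetical and not taken from the asker's data. The sketch also assumes ID was read in as character, which is what usually happens once a stray sentence lands in that column, so it converts ID back to numeric after filtering.

```r
# Hypothetical sample data: the third row has a stray comment in the ID column.
df <- data.frame(
  ID = c("101", "102", "this comment landed in the wrong column", "104"),
  comment1 = c("a", "b", "c", "d"),
  comment2 = c("e", "f", NA, "h"),
  stringsAsFactors = FALSE
)

# Keep only rows whose ID consists entirely of digits.
clean <- subset(df, grepl("^\\d+$", ID))

# ID is still a character column at this point; convert it back to numeric.
clean$ID <- as.numeric(clean$ID)
```

Note that subset() returns a new data frame rather than modifying df in place, so the result has to be assigned, as with clean here.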
