
I am working with NCBI Reference Sequence accession numbers like variable a:

a <- c("NM_020506.1","NM_020519.1","NM_001030297.2","NM_010281.2","NM_011419.3", "NM_053155.2")  

To query these with the biomaRt package, I need to remove the version suffix (.1, .2, etc.) from each accession number. I normally do this with this code:

b <- sub("..*", "", a)

# [1] "" "" "" "" "" ""

But as you can see, this isn't the correct way for this variable. Can anyone help me with this?

1 Answer


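The pattern "..*" fails because an unescaped "." in a regular expression matches any character, so the pattern matches the entire string and everything is replaced with "". To match a literal ".", escape it with a double backslash ("\\."). You can then remove the dot and everything after it with gsub (sub works equally well here, since each string needs only one replacement) as follows: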

a <- c("NM_020506.1", "NM_020519.1", "NM_001030297.2",
       "NM_010281.2", "NM_011419.3", "NM_053155.2")

gsub("\\..*", "", a)

# [1] "NM_020506"    "NM_020519"    "NM_001030297" "NM_010281"    "NM_011419"
# [6] "NM_053155"
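For comparison, here is a minimal sketch of a few equivalent ways to write the pattern, using the same vector a from the question. The variable names (unescaped, escaped, bracketed, anchored) are only illustrative, and the anchored variant assumes the version suffix is always a dot followed by digits at the end of the string:

```r
a <- c("NM_020506.1", "NM_020519.1", "NM_001030297.2",
       "NM_010281.2", "NM_011419.3", "NM_053155.2")

# Unescaped: "." matches any character, so "..*" consumes the whole string.
unescaped <- sub("..*", "", a)

# Escaped: "\\." matches a literal dot, then ".*" matches the rest.
escaped <- sub("\\..*", "", a)

# Character class: "[.]" also matches a literal dot, without backslashes.
bracketed <- sub("[.].*", "", a)

# Anchored: remove only a trailing ".<digits>" suffix (assumed format).
anchored <- sub("\\.[0-9]+$", "", a)

print(escaped)
```

For this input, the escaped, bracketed, and anchored variants give the same result. The anchored form is the most conservative choice if some identifiers might contain a dot that is not part of a version suffix.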
