
I am working with NCBI Reference Sequence accession numbers like variable a:

a <- c("NM_020506.1","NM_020519.1","NM_001030297.2","NM_010281.2","NM_011419.3", "NM_053155.2")  

To get information from the biomaRt package I need to remove the version suffix (.1, .2, etc.) after the accession numbers. I normally do this with this code:

b <- sub("..*", "", a)

# [1] "" "" "" "" "" ""

But as you can see, this isn't the correct way for this variable. Can anyone help me with this?

1 Answer


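In a regular expression an unescaped “.” matches any character, so the pattern "..*" matches the whole string and sub replaces all of it with "". To remove everything from the literal “.” onward, escape the dot with a double backslash (\\) in the pattern. You can do this with gsub (sub works equally well here, since only one match per string is needed):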

a <- c("NM_020506.1","NM_020519.1","NM_001030297.2",
       "NM_010281.2","NM_011419.3", "NM_053155.2")

gsub("\\..*","",a)

[1] "NM_020506"    "NM_020519"    "NM_001030297" "NM_010281"    "NM_011419"
[6] "NM_053155"
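
As a rough sketch of why the original attempt returned empty strings, and of a few other ways to strip the suffix, something like the following should work. The variable names (unescaped, stripped, and so on) are made up for illustration, and the anchored pattern assumes the version suffix is always a dot followed only by digits at the end of the accession.

```r
a <- c("NM_020506.1", "NM_020519.1", "NM_001030297.2",
       "NM_010281.2", "NM_011419.3", "NM_053155.2")

# Unescaped "." matches any character, so "..*" consumes the whole string.
unescaped <- sub("..*", "", a)

# Escaped dot, anchored to the end of the string.
# Assumption: the suffix is a dot followed only by digits.
stripped <- sub("\\.[0-9]+$", "", a)

# Same idea with a bracket class instead of backslashes.
stripped_class <- sub("[.].*", "", a)

# No regular expression at all: split on a literal dot and keep the first piece.
stripped_split <- vapply(strsplit(a, ".", fixed = TRUE), `[`, character(1), 1)

print(unescaped)
print(stripped)
```

The anchored version is a little stricter than "\\..*": it only removes a trailing numeric version, so a string with a dot elsewhere would be left alone rather than truncated at the first dot.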
