
My code looks like this:

library(utils)
library(httr)
library(tidyverse)
library(rvest)
library(ggpubr)

# scrapes from wikipedia, xpath is correct
url <- "https://en.wikipedia.org/wiki/COVID-19_testing"
tests <- url %>%
  read_html() %>%
  html_nodes(xpath = '//*[@id="mw-content-text"]/div/table[4]') %>%
  html_table() %>%
  extract2(1) %>% # extracts data table from html list
  rename(country = "Country or region", tests = "Tests", positive = "Positive",
         asof = "As of",
         tests_per_million = "Tests /millionpeople",
         positive_per_thousand_tests = "Positive /thousandtests", ref = "Ref.") %>%
  mutate(tests = as.numeric(gsub(",", "", tests)),
         positive = as.numeric(gsub(",", "", positive)),
         tests_per_million = as.numeric(gsub(",", "", tests_per_million)),
         positive_per_thousand_tests = round(positive_per_thousand_tests, 0)) # removes commas and converts to numeric

When I run the code interactively, it works fine. But when I try to knit the document, I get the following error message:

Error: Can't rename columns that don't exist. The column Tests<U+2009>/millionpeople doesn't exist.

I have tried clearing the cache, loading the image at startup, creating a new object for the rename and mutate steps, and more. Any ideas how to fix this?

1 Answer


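Some of the column headers in the table contain special characters (the error message shows a thin space, U+2009, in "Tests /millionpeople"), so the names typed in rename() may not match the scraped names exactly. Since you want to rename all the columns anyway, you can use rename_all, which assigns the new names by position and does not depend on the original header text.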

library(rvest)
library(dplyr)
library(readr)

url <- "https://en.wikipedia.org/wiki/COVID-19_testing"

tests <- url %>%
  read_html() %>%
  html_nodes(xpath = '//*[@id="mw-content-text"]/div/table[4]') %>%
  html_table() %>%
  .[[1]] %>% # take the data frame out of the list returned by html_table()
  # assign all seven names by position, so the original headers need not match
  rename_all(~c("country", "tests", "positive", "asof",
                "tests_per_million", "positive_per_thousand_tests", "ref")) %>%
  # parse_number() drops the thousands separators and converts to numeric
  mutate(tests = parse_number(tests), positive = parse_number(positive),
         tests_per_million = parse_number(tests_per_million),
         positive_per_thousand_tests = round(positive_per_thousand_tests))
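If you want to see the mismatch for yourself without hitting Wikipedia, here is a minimal offline sketch. The small data frame and its values are made up for illustration (they are not the real table), and it assumes the scraped header contains a thin space (U+2009) as the error message suggests. It also uses rename_with, which is the newer dplyr (1.0.0 or later) spelling of the same by-position renaming idea as rename_all. Why the match succeeds interactively but fails when knitting is not shown here; a difference in how the source file's encoding is read is one plausible cause, but that is a guess.

```r
library(dplyr)

# Hypothetical stand-in for the scraped table: the second header contains
# a thin space (U+2009), like the one reported in the error message.
scraped <- data.frame(a = "Exampleland", b = "1,234", stringsAsFactors = FALSE)
names(scraped) <- c("Country or region", "Tests\u2009/millionpeople")

# A header typed with an ordinary space does not match the scraped one.
"Tests /millionpeople" %in% names(scraped)
# FALSE

# List the code points of any non-ASCII characters in each header.
non_ascii <- lapply(names(scraped), function(nm) {
  cp <- utf8ToInt(nm)
  cp[cp > 127]
})

# Rename by position, so the exact header text no longer matters.
clean <- scraped %>%
  rename_with(~ c("country", "tests_per_million")) %>%
  mutate(tests_per_million = as.numeric(gsub(",", "", tests_per_million)))
```

Printing non_ascii on the real table should tell you which headers carry unusual characters. Renaming by position does assume the column order on the page stays the same, so it is worth checking names(tests) after scraping.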
