0 votes
1 view
in R Programming by (5k points)

I would like to add a prefix to all columns that result from a left join.

left_join() has the possibility to add a suffix when names are identical between the two tables being joined. But, it does not have an option to always add this suffix even when they are not identically named. And it does not have an option to instead add a prefix.

library(dplyr)

library(nycflights13)

flights2 <- flights %>% select(year:day, hour, origin, dest, tailnum, carrier)

airports2 <- airports

result <- flights2 %>% left_join(airports2, c("dest" = "faa")) %>% head()

The result:

Source: local data frame [6 x 14]

year month   day  hour origin  dest tailnum carrier                            name

(int) (int) (int) (dbl)  (chr) (chr)   (chr)   (chr)                           (chr)

1  2013     1     1     5    EWR   IAH  N14228      UA    George Bush Intercontinental

2  2013     1     1     5    LGA   IAH  N24211      UA    George Bush Intercontinental

3  2013     1     1     5    JFK   MIA  N619AA      AA                      Miami Intl

4  2013     1     1     5    JFK   BQN  N804JB      B6                              NA

5  2013     1     1     5    LGA   ATL  N668DN      DL Hartsfield Jackson Atlanta Intl

6  2013     1     1     5    EWR   ORD  N39463      UA              Chicago Ohare Intl

Variables not shown: lat (dbl), lon (dbl), alt (int), tz (dbl), dst (chr)

Here, it is not possible to know, only from the join result, from which original table each column came.

The purpose of adding this prefix is so that column names be reliably calculated from table names and the column names of data loaded in from a relational database. The database structure loaded into and stored in R and naming conventions for the relational database will be used, for example, to identify primary and foreign keys. These will then be used to set up the joins and to later retrieve data from the join results.

1 Answer

0 votes
by (23.2k points)

To prefix all columns after the left_join(), you can add the prefixes to the original table before performing the join().i.e.,

library(dplyr)

library(nycflights13)

flights2 <- flights %>% select(year:day, hour, origin, dest, tailnum, carrier)

airports2 <- airports

names(flights2) <- paste0("flights2.", names(flights2) )

names(airports2) <- paste0("airports2.", names(airports2) )

result <- flights2 %>% left_join(airports2, c("flights2.dest" = "airports2.faa") ) %>% head()

Output:

# A tibble: 6 x 15

  flights2.year flights2.month flights2.day flights2.hour flights2.origin flights2.dest

          <int>          <int>        <int>         <dbl> <chr>           <chr>        

1          2013              1            1             5 EWR             IAH          

2          2013              1            1             5 LGA             IAH          

3          2013              1            1             5 JFK             MIA          

4          2013              1            1             5 JFK             BQN          

5          2013              1            1             6 LGA             ATL          

6          2013              1            1             5 EWR             ORD          

# ... with 9 more variables: flights2.tailnum <chr>, flights2.carrier <chr>,

#   airports2.name <chr>, airports2.lat <dbl>, airports2.lon <dbl>,

#   airports2.alt <int>, airports2.tz <dbl>, airports2.dst <chr>,

#   airports2.tzone <chr>

...