
I am new to data science and am trying to learn it. For a college assignment, I have a dataset that looks like this:

Letter  value
a       1
b       2
c       3
d       4
.
.
.
z       26

I also have a list of words:

Wood
Table
Chair
Desk

The task is to find the vowels in each word, look up the value of each vowel in the dataset above, add those values together, and store the sum in a new column. This is the desired output:

Word    Sum_of_vowel_value
Wood    30  (15+15)
Table   6   (1+5)
Chair   10  (9+1)

Can anyone help me solve it?

1 Answer


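Here is one way to solve it. Split each word in the Word column into individual letters, keep only the vowels, match those vowels against the Letter column of the lookup dataset to get their values, and then add the values. The objects vowel, dataset1 and dataset2 used here are defined at the end of this answer.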

dataset2$Sum_of_vowel_value <- sapply(strsplit(as.character(dataset2$Word), ""),
       function(x) sum(dataset1$value[match(vowel[match(tolower(x), vowel)],
                            dataset1$Letter)], na.rm = TRUE))

dataset2
#   Word Sum_of_vowel_value
#1  Wood                 30
#2 Table                  6
#3 Chair                 10
#4  Desk                  5

For a better understanding, let us break the function down step by step.

First, we split each word into its individual characters:

strsplit(as.character(dataset2$Word), "")
#[[1]]
#[1] "W" "o" "o" "d"
#[[2]]
#[1] "T" "a" "b" "l" "e"
#[[3]]
#[1] "C" "h" "a" "i" "r"
#[[4]]
#[1] "D" "e" "s" "k"

Next, we keep only the vowels. Letters that are not vowels become NA:

sapply(strsplit(as.character(dataset2$Word), ""),
        function(x) vowel[match(tolower(x), vowel)])
#[[1]]
#[1] NA  "o" "o" NA 
#[[2]]
#[1] NA  "a" NA  NA  "e"
#[[3]]
#[1] NA  NA  "a" "i" NA 
#[[4]]
#[1] NA  "e" NA  NA 

Now we match the vowels against dataset1$Letter and pick up the corresponding entries of dataset1$value:

sapply(strsplit(as.character(dataset2$Word), ""),
      function(x) dataset1$value[match(vowel[match(tolower(x), vowel)],
                                        dataset1$Letter)])
#[[1]]
#[1] NA 15 15 NA
#[[2]]
#[1] NA  1 NA NA  5
#[[3]]
#[1] NA NA  1  9 NA
#[[4]]
#[1] NA  5 NA NA
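To see what the nested match() calls are doing, it can help to trace a single word by hand. The snippet below is only a sketch that reuses the vowel and dataset1 definitions from this answer; the intermediate names x, idx, kept and vals are introduced here for illustration.

```r
# Same definitions as in the example data of this answer
vowel <- c("a", "e", "i", "o", "u")
dataset1 <- data.frame(Letter = letters, value = 1:26)

# Trace one word through the three steps
x    <- strsplit("Chair", "")[[1]]       # "C" "h" "a" "i" "r"
idx  <- match(tolower(x), vowel)         # position in vowel, NA for non-vowels
kept <- vowel[idx]                       # NA NA "a" "i" NA
vals <- dataset1$value[match(kept, dataset1$Letter)]  # NA NA 1 9 NA

print(idx)
print(kept)
print(vals)
```

The inner match() turns every consonant into NA, and the outer match() then carries those NA entries through to the value lookup.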

Finally, we sum the values for each word. The na.rm = TRUE argument drops the NA entries left by the non-vowels:

#[1] 30  6 10  5
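The summing step is shown above only as output, so here is a small sketch of the code that could produce it. It assumes the same vowel and dataset1 definitions as the rest of this answer, builds dataset2 as a plain data.frame for brevity, and uses vals and sums as illustrative names.

```r
# Assumed setup, mirroring the example data in this answer
vowel <- c("a", "e", "i", "o", "u")
dataset1 <- data.frame(Letter = letters, value = 1:26)
dataset2 <- data.frame(Word = c("Wood", "Table", "Chair", "Desk"),
                       stringsAsFactors = FALSE)

# Per-word vowel values; non-vowels come back as NA
vals <- lapply(strsplit(as.character(dataset2$Word), ""),
               function(x) dataset1$value[match(vowel[match(tolower(x), vowel)],
                                                dataset1$Letter)])

# sum() with na.rm = TRUE ignores the NA entries in each word
sums <- sapply(vals, sum, na.rm = TRUE)
print(sums)
```

Without na.rm = TRUE, every word containing a consonant would sum to NA.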

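Example data used in the code above (run this first):

# The five vowels to look for
vowel <- c('a', 'e', 'i', 'o', 'u')

# Lookup table: a = 1, b = 2, ..., z = 26
dataset1 <- data.frame(Letter = letters, value = 1:26)

# Word list, stored as a factor column
dataset2 <- structure(list(Word = structure(c(4L, 3L, 1L, 2L),
.Label = c("Chair", "Desk", "Table", "Wood"), class = "factor")),
row.names = c(NA, -4L), class = "data.frame")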
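As an alternative sketch, not the method used above, the same result can be obtained by filtering with %in% and looking values up in a named vector, which avoids the NA handling altogether. The helper name vowel_sum, the lookup vector and the extra test word "Rhythm" are assumptions added for illustration.

```r
vowel <- c("a", "e", "i", "o", "u")
dataset1 <- data.frame(Letter = letters, value = 1:26, stringsAsFactors = FALSE)

# Named lookup vector, so that lookup["o"] is 15
lookup <- setNames(dataset1$value, dataset1$Letter)

# vowel_sum is a hypothetical helper name, not part of the original answer
vowel_sum <- function(word) {
  chars <- tolower(strsplit(as.character(word), "")[[1]])
  # Keep only the vowels, then add up their values
  as.numeric(sum(lookup[chars[chars %in% vowel]]))
}

# "Rhythm" is an added example of a word with none of the five vowels
words <- c("Wood", "Table", "Chair", "Desk", "Rhythm")
result <- data.frame(Word = words,
                     Sum_of_vowel_value = vapply(words, vowel_sum, numeric(1),
                                                 USE.NAMES = FALSE),
                     stringsAsFactors = FALSE)
print(result)
```

A word without any of the five vowels simply sums to 0 here, because summing an empty selection gives 0.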
