xgboost in R: how does xgb.cv pass the optimal parameters into xgb.train

Question

asked Jul 6, 2019 in Machine Learning by Anurag (33.1k points)

I've been exploring the xgboost package in R and went through several demos as well as tutorials but this still confuses me: after using xgb.cv to do cross-validation, how do the optimal parameters get passed to xgb.train? Or should I calculate the ideal parameters (such as nround, max.depth) based on the output of xgb.cv?

param <- list("objective" = "multi:softprob",
"eval_metric" = "mlogloss",
"num_class" = 12)
cv.nround <- 11
cv.nfold <- 5
mdcv <-xgb.cv(data=dtrain,params = param,nthread=6,nfold = cv.nfold,nrounds = cv.nround,verbose = T)
md <-xgb.train(data=dtrain,params = param,nround = 80,watchlist = list(train=dtrain,test=dtest),nthread=6)

1 Answer

Anurag · Answer 1 · 2019-07-06T14:39:11+0000

I think you need more details about the parameters used in your code. I would try to explain these parameters as follow :

start_char: int. This character is used to mark the start of the sequence. This function needs to Set to 1 because 0 is usually the padding character.

oov_char: int. words that were cut out because of the num_words or skip_top limit will be replaced with this character.

index_from: int. Index actual words with this index and higher.

It looks like the word indices in your dictionary starts from 1.

If you noticed that the indices returned by your keras have <START> and <UNKNOWN> as indexes 1 and 2. (And it assumes you will use 0 for <PADDING>).

Code for solution:

import keras
NUM_WORDS=1000 # only use top 1000 words
INDEX_FROM=3 # word index offset
train,test = keras.datasets.imdb.load_data(num_words=NUM_WORDS, index_from=INDEX_FROM)
train_x,train_y = train
test_x,test_y = test
word_to_id = keras.datasets.imdb.get_word_index()
word_to_id = {k:(v+INDEX_FROM) for k,v in word_to_id.items()}
word_to_id["<PAD>"] = 0
word_to_id["<START>"] = 1
word_to_id["<UNK>"] = 2
id_to_word = {value:key for key,value in word_to_id.items()}
print(' '.join(id_to_word[id] for id in train_x[0] ))

Output:

"<START> this film was just brilliant casting <UNK> <UNK> story
direction <UNK> really <UNK> the part they played and you could just
imagine being there robert <UNK> is an amazing actor ..."

Hope this answer helps.

xgboost in R: how does xgb.cv pass the optimal parameters into xgb.train

1 Answer

Related questions

Browse Categories

Browse By Domains

Popular Courses

Popular Tutorials

Popular Resources