in Machine Learning by (33.1k points)

I've been exploring the xgboost package in R and have gone through several demos and tutorials, but one thing still confuses me: after using xgb.cv to do cross-validation, how do the optimal parameters get passed to xgb.train? Or should I calculate the ideal parameters (such as nround and max.depth) myself from the output of xgb.cv?

param <- list("objective" = "multi:softprob",
              "eval_metric" = "mlogloss",
              "num_class" = 12)

cv.nround <- 11
cv.nfold <- 5

mdcv <- xgb.cv(data = dtrain, params = param, nthread = 6, nfold = cv.nfold, nrounds = cv.nround, verbose = T)

md <- xgb.train(data = dtrain, params = param, nround = 80, watchlist = list(train = dtrain, test = dtest), nthread = 6)
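As far as I know, xgb.train does not read anything from the object that xgb.cv returns, so the cross-validation result only helps if you pick the parameters from it yourself. Below is a minimal sketch of that step, written in Python (the other language on this page). It assumes you have exported the per-round mean held-out metric as a plain list (recent versions of the R package keep it in the `test_mlogloss_mean` column of `mdcv$evaluation_log`). The function name `best_nrounds` and the loss values are hypothetical and only for illustration.

```python
def best_nrounds(test_metric_means, lower_is_better=True):
    """Return the 1-based boosting round with the best mean held-out metric.

    test_metric_means: one value per round, e.g. the mean test mlogloss
    exported from the cross-validation log (hypothetical input).
    """
    values = list(test_metric_means)
    if not values:
        raise ValueError("need at least one round of CV results")
    # mlogloss is a loss, so lower is better; flip the flag for metrics like AUC.
    best = min(values) if lower_is_better else max(values)
    # list.index() returns the earliest round that reaches the best value.
    return values.index(best) + 1


# Made-up example: the held-out loss bottoms out at round 3.
cv_test_mlogloss = [0.90, 0.70, 0.65, 0.66, 0.68]
print(best_nrounds(cv_test_mlogloss))  # prints 3
```

You would then pass that number as nrounds to xgb.train instead of a hard-coded 80. Note that with cv.nround <- 11 the cross-validation never looks past round 11, so it cannot tell you anything about 80 rounds. The same comparison extends to max.depth: run xgb.cv once per candidate depth and keep the depth whose best round has the lowest held-out loss.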

1 Answer

by (33.1k points)

I think you need more details about the parameters used in your code. I'll try to explain these parameters as follows:

start_char: int. The start of each sequence is marked with this character. Set it to 1 (the default), because 0 is usually the padding character.

oov_char: int. Words that were cut out because of the num_words or skip_top limit are replaced with this character.

index_from: int. Actual words are indexed starting from this value and upward.
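To make the three parameters concrete, here is a simplified Python sketch of how a single review is turned into a sequence of indices. It assumes the Keras defaults start_char=1, oov_char=2, index_from=3 and skip_top=0; it is a re-implementation for illustration, not Keras's own code, and the function name `encode_review` is hypothetical.

```python
def encode_review(word_ranks, num_words=None, skip_top=0,
                  start_char=1, oov_char=2, index_from=3):
    """Simplified sketch (not Keras code) of building one IMDB sequence.

    word_ranks: word frequency ranks as in get_word_index() (1 = most frequent).
    Defaults mirror the assumed Keras defaults for imdb.load_data.
    """
    # Shift every rank up by index_from so 0..index_from-1 stay reserved,
    # then prepend the start marker.
    seq = [start_char] + [r + index_from for r in word_ranks]
    # With no num_words limit, only skip_top can push an index out of range.
    upper = float("inf") if num_words is None else num_words
    # Indices outside [skip_top, upper) are replaced by the OOV marker.
    return [w if skip_top <= w < upper else oov_char for w in seq]


print(encode_review([1, 2, 999], num_words=10))  # prints [1, 4, 5, 2]
```

A word with rank 1 in get_word_index() ends up as index 4 here, which is why the decoding code in this answer adds INDEX_FROM to every value of word_to_id before looking words up.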

It looks like the word indices in your dictionary start from 1.

Notice that the indices Keras returns reserve 1 for <START> and 2 for unknown words (<UNK>), and Keras assumes you will use 0 for padding (<PAD>).

Code for the solution:

import keras

NUM_WORDS = 1000  # only use the 1000 most frequent words
INDEX_FROM = 3    # word index offset (0, 1 and 2 are reserved)

(train_x, train_y), (test_x, test_y) = keras.datasets.imdb.load_data(
    num_words=NUM_WORDS, index_from=INDEX_FROM)

# get_word_index() maps each word to its raw rank (starting at 1),
# so shift it by the same offset that load_data applied.
word_to_id = keras.datasets.imdb.get_word_index()
word_to_id = {k: (v + INDEX_FROM) for k, v in word_to_id.items()}
word_to_id["<PAD>"] = 0
word_to_id["<START>"] = 1
word_to_id["<UNK>"] = 2

# Invert the mapping and decode the first training review.
id_to_word = {value: key for key, value in word_to_id.items()}
print(' '.join(id_to_word[idx] for idx in train_x[0]))

Output:

"<START> this film was just brilliant casting <UNK> <UNK> story direction <UNK> really <UNK> the part they played and you could just imagine being there robert <UNK> is an amazing actor ..."

Hope this answer helps.
