
I have a dataset called spam which contains 58 columns and approximately 3500 rows of data related to spam messages.

I plan on running some linear regression on this dataset in the future, but I'd like to do some pre-processing beforehand and standardize the columns to have zero mean and unit variance.

I've been told the best way to go about this is with R, so I'd like to ask how can I achieve normalization with R? I've already got the data properly loaded and I'm just looking for some packages or methods to perform this task.

1 Answer


To standardize your data so that each column has a mean of 0 and a standard deviation of 1, use the scale function from base R. It is a generic function whose default method centers and/or scales the columns of a numeric matrix.

The basic syntax of the scale function is:

scale(x, center = TRUE, scale = TRUE)

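Where:

x

a numeric matrix (or matrix-like object, such as a data frame whose columns are all numeric).

center

either a logical value or a numeric-alike vector of length equal to the number of columns of x, where 'numeric-alike' means that as.numeric(.) will be applied successfully if is.numeric(.) is not true. If TRUE, each column's mean is subtracted from that column; if a vector is given, its values are subtracted instead.

scale

either a logical value or a numeric-alike vector of length equal to the number of columns of x. If TRUE (and center is TRUE), each centered column is divided by its standard deviation; if a vector is given, each column is divided by the corresponding value.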
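To make the two arguments concrete, here is a minimal sketch of what the default method computes when both are TRUE: each column has its mean subtracted and is then divided by its sample standard deviation. The matrix m and the names centered and manual are made up for illustration.

```r
# Hypothetical 4 x 2 numeric matrix, used only for illustration
m <- matrix(c(1, 2, 3, 4,
              10, 20, 30, 40),
            ncol = 2, dimnames = list(NULL, c("a", "b")))

# Step 1 (center = TRUE): subtract each column's mean
centered <- sweep(m, 2, colMeans(m), "-")

# Step 2 (scale = TRUE): divide each centered column by its standard deviation
manual <- sweep(centered, 2, apply(m, 2, sd), "/")

# scale() attaches "scaled:center" and "scaled:scale" attributes,
# so compare the values only
same <- all.equal(manual, scale(m), check.attributes = FALSE)
print(same)
```

In practice you would just call scale(m); the manual version is only meant to show where the zero mean and unit standard deviation come from.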

For example:

set.seed(123)  # make the random example reproducible

df <- data.frame(x = rnorm(5, 30, .2),
                 y = runif(5, 3, 5),
                 z = runif(5, 10, 20))

# Note: scale() returns a numeric matrix, not a data frame
scaled.df <- scale(df)

Output:

Columns with zero mean (up to floating-point error):

colMeans(scaled.df)
            x             y             z 
-4.382605e-15 -5.884182e-16  3.330669e-17 

Columns with a standard deviation of 1:

apply(scaled.df, 2, sd)
x y z 
1 1 1 
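For a real dataset such as spam, not every column may be numeric (a class label stored as a factor, for instance), and scale() fails on non-numeric input. The following is a rough sketch of one way to standardize only the numeric columns while keeping the result a data frame. The toy data frame spam_like and its column names are hypothetical stand-ins, not the actual spam data.

```r
# Toy stand-in for the real data; all column names are hypothetical
spam_like <- data.frame(
  freq_free  = c(0.0, 0.5, 1.2, 0.3),
  caps_total = c(10, 250, 1200, 40),
  type       = factor(c("nonspam", "spam", "spam", "nonspam"))
)

# Identify the numeric columns; the factor column is left untouched
num_cols <- vapply(spam_like, is.numeric, logical(1))

# scale() returns a matrix carrying the centers and scales as attributes
s <- scale(spam_like[num_cols])
centers <- attr(s, "scaled:center")
scales  <- attr(s, "scaled:scale")

# Write the standardized values back so the result stays a data frame
scaled <- spam_like
scaled[num_cols] <- as.data.frame(s)

str(scaled)
```

Keeping centers and scales around is useful if the same transformation later has to be applied to new observations, for example via scale(new_data, center = centers, scale = scales).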
