in Machine Learning by (19k points)

In my dataset I have a number of continuous and dummy variables. For analysis with glmnet, I want the continuous variables to be standardized but not the dummy variables.

I currently do this manually: I first define a vector that flags the columns containing only the values 0 and 1, and then run the scale command on all the non-dummy columns. The problem is that this isn't very elegant.
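Roughly, my current approach looks like this (toy data; the column names are just stand-ins for my real variables):

df <- data.frame(age     = rnorm(100, 40, 10),
                 income  = rnorm(100, 50000, 8000),
                 treated = rbinom(100, 1, 0.5))

is_dummy <- sapply(df, function(col) all(col %in% c(0, 1)))  # flag the 0/1 columns
df[!is_dummy] <- scale(df[!is_dummy])   # standardize only the continuous columns
x <- as.matrix(df)                      # glmnet expects a numeric matrix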

But glmnet has a built-in standardize argument. By default, will this standardize the dummies too? If so, is there an elegant way to tell glmnet's standardize argument to skip the dummies?

1 Answer

by (33.1k points)

glmnet takes a matrix, not a data frame, as input for its x parameter, so it makes no distinction for any factor columns you may have had in a data.frame.
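As an aside, this is why any factor columns have to be expanded into numeric dummy columns before glmnet sees them. A minimal sketch using model.matrix on a made-up data frame:

# expand a factor into 0/1 dummy columns; model.matrix adds an intercept
# column, which we drop
df2 <- data.frame(score = rnorm(10), group = factor(rep(c("a", "b"), 5)))
x2 <- model.matrix(~ ., df2)[, -1]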

If you take a look at the R source, glmnet codes the standardize parameter as follows:

isd = as.integer(standardize)

This converts the R boolean into a 0 or 1 integer that is fed to the internal FORTRAN routines.
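You can check the coercion directly at the R prompt (printing glmnet's body also shows it, though the exact line differs between package versions):

as.integer(TRUE)   # 1 -> standardization on
as.integer(FALSE)  # 0 -> standardization off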

Check out the following excerpt from the underlying FORTRAN routine (standard1) for more details:

subroutine standard1 (no,ni,x,y,w,isd,intr,ju,xm,xs,ym,ys,xv,jerr)
      real x(no,ni),y(no),w(no),xm(ni),xs(ni),xv(ni)
      integer ju(ni)
      real, dimension (:), allocatable :: v
      allocate(v(1:no),stat=jerr)
      if(jerr.ne.0) return
      w=w/sum(w)
      v=sqrt(w)
      if(intr .ne. 0)goto 10651
      ym=0.0
      y=v*y
      ys=sqrt(dot_product(y,y)-dot_product(v,y)**2)
      y=y/ys
10660 do 10661 j=1,ni
      if(ju(j).eq.0)goto 10661
      xm(j)=0.0
      x(:,j)=v*x(:,j)
      xv(j)=dot_product(x(:,j),x(:,j))
!     isd carries the standardize flag: when it is 0, the scaling below
!     is skipped for every column
      if(isd .eq. 0)goto 10681
      xbq=dot_product(v,x(:,j))**2
      vc=xv(j)-xbq
      xs(j)=sqrt(vc)
      x(:,j)=x(:,j)/xs(j)
      xv(j)=1.0+xbq/vc
      goto 10691
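Note that the loop above runs over all ni columns and never asks whether a column is a dummy, so with standardize = TRUE (isd = 1) the dummies get standardized along with everything else. A workaround in the spirit of what you are already doing is to pre-scale only the continuous columns yourself and call glmnet with standardize = FALSE (a sketch with made-up data):

library(glmnet)

set.seed(1)
df <- data.frame(age     = rnorm(100, 40, 10),
                 income  = rnorm(100, 50000, 8000),
                 treated = rbinom(100, 1, 0.5))
y <- rnorm(100)                          # illustrative response

is_dummy <- sapply(df, function(col) all(col %in% c(0, 1)))
df[!is_dummy] <- scale(df[!is_dummy])    # standardize the continuous columns only
fit <- glmnet(as.matrix(df), y, standardize = FALSE)  # dummies left untouched

Because the scaling happens up front, the coefficients come back on the scale of the standardized continuous variables and the raw 0/1 dummies.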

Hope this answer helps you!
