0 votes
2 views
in Python by (16.4k points)
closed by

I need to perform exploratory factor analysis and calculate scores for every observation using Python, assuming that there is just 1 underlying factor. It seems that sklearn.decomposition.FactorAnalysis() is the way to go, but unfortunately the documentation and the example (I couldn't find any other examples) are not clear enough for me to figure out how to do this.

I have the following test file (test.csv) with 41 observations of 29 variables:

49.6,34917,24325.4,305,101350,98678,254.8,276.9,47.5,1,3,5.6,3.59,11.9,0,97.5,97.6,8,10,100,0,0,96.93,610.1,100,1718.22,6.7,28,5

275.8,14667,11114.4,775,75002,74677,30,109,9.1,1,0,6.5,3.01,8.2,1,97.5,97.6,8,8,100,0,0,100,1558,100,2063.17,5.5,64,5

2.3,9372.5,8035.4,4.6,8111,8200,8.01,130,1.2,0,5,0,3.33,6.09,1,97.9,97.9,8,8,67.3,342.3,0,99.96,18.3,53,1457.27,4.8,8,4

7.10,13198.0,13266.4,1.1,708,695,6.1,80,0.4,0,4,0,3.1,8.2,1,97.8,97.9,8,8,45,82.7,0,99.68,4.5,80,1718.22,13.8,0,3

1.97,2466.7,2900.6,19.7,5358,5335,10.1,23,0.5,0,2,0,3.14,8.2,0,97.3,97.2,9,9,74.5,98.2,0,99.64,79.8,54,1367.89,6.4,12,4

2.40,2999.4,2218.2,0.80,2045,2100,8.9,10,1.5,1,3,0,2.82,8.6,0,97.4,97.2,8,8,47.2,323.8,0,99.996,13.6,24,1249.67,2.7,12,3

0.59,4120.8,5314.5,0.54,14680,13688,14.9,117,1.1,0,3,0,2.94,3.4,0,97.6,97.7,8,8,11.8,872.6,0,100,9.3,52,1251.67,14,14,2

0.72,2067.7,2364,3,367,298,7.2,60,2.5,0,12,0,2.97,10.5,0,97.5,97.6,8,8,74.7,186.8,0,99.13,12,57,1800.45,2.7,4,2

1.14,2751.9,3066.8,3.5,1429,1498,7.7,9,1.6,0,3,0,2.86,7.7,0,97.6,97.8,8,9,76.7,240.1,0,99.93,13.6,60,1259.97,15,8,3

1.29,4802.6,5026.1,2.7,7859,7789,6.5,45,1.9,0,3,0,2.5,8.2,0,98,98,8,8,34,297.5,0,99.95,10,30,1306.44,8.5,0,4

0.40,639.0,660.3,1.3,23,25,1.5,9,0.1,0,1,0,2.5,8.2,0,97.7,97.8,8,8,94.2,0,0,100,4.3,50,1565.44,19.2,0,4

0.26,430.7,608.1,2,33,28,2.5,7,0.4,0,6,0,2.5,8.2,0,97.4,97.4,8,8,76.5,0,0,98.31,8,54,1490.08,0,0,4

4.99,2141.2,2357.6,3.60,339,320,8.1,7,0.2,0,8,0,2.5,5.9,0,97.3,97.4,8,8,58.1,206.3,0,99.58,13.2,95,1122.92,14.2,8,2

0.36,1453.7,1362.2,3.50,796,785,3.7,9,0.1,0,9,0,2.5,13.6,0,98,98.1,8,8,91.4,214.6,0,99.74,7.5,53,1751.98,11.5,0,2

0.36,1657.5,2421.1,2.8,722,690,8.1,8,0.4,0,1,0,2.5,8.2,0,97.2,97.3,11,12,37.4,404.2,0,99.98,10.9,35,1772.33,10.2,8,3

1.14,5635.2,5649.6,3,2681,2530,5.4,20,0.3,0,1,0,3.1,8.2,0,97.7,97.8,8,11,50.1,384.7,0,99.02,11.6,27,1306.08,16,0,2

0.6,1055.9,1487.9,1.3,69,65,2.5,6,0.4,0,8,0,2.5,8.2,0,97.9,97.7,8,11,63,137.9,0,99.98,5.1,48,1595.06,0,0,4

0.08,795.3,1174.7,1.40,85,76,2.2,7,0.2,0,0,0,2.5,8.2,0,97.4,97.5,8,8,39.3,149.3,0,98.27,5.1,52,1903.9,8.1,0,2

0.90,2514.0,2644.4,2.6,1173,1104,5.5,43,0.8,0,10,0,2.5,13.6,0,97.5,97.5,8,10,58.7,170.5,0,80.29,10,34,1292.72,4,0,2

0.27,870.4,949.7,1.8,252,240,2.2,31,0.2,0,1,0,2.5,8.2,0,97.5,97.6,8,8,64.5,0,0,100,6.6,29,1483.18,9.1,0,3

0.41,1295.1,2052.3,2.60,2248,2135,6.0,12,0.8,0,4,0,2.7,8.2,0,97.7,97.7,8,8,71.1,261.3,0,91.86,4.6,21,1221.71,9.4,0,4

1.10,3544.2,4268.9,2.1,735,730,6.6,10,1.7,0,14,0,2.5,8.2,0,97.7,97.8,8,8,52,317.2,0,99.62,9.8,46,1271.63,14.2,0,3

0.22,899.3,888.2,1.80,220,218,3.6,7,0.5,0,1,0,2.5,8.2,0,97.2,97.5,8,8,22.5,0,0,70.79,10.6,32,1508.02,0,0,4

0.24,1712.8,1735.5,1.30,41,35,5.4,7,0.5,0,1,0,3.28,8.2,0,97.8,97.8,9,10,16.6,720.2,0,99.98,4.3,53,1324.46,0,4,2

0.2,558.4,631.9,1.7,65,64,2.5,7,0.2,0,5,0,2.5,8.2,0,97.7,97.5,8,8,60.7,0,0,99.38,6.1,52,1535.08,0,0,2

0.21,599.9,1029,1.1,69,70,3.7,85.7,0.1,0,12,0,2.5,8.2,0,97.4,97.5,8,8,48.6,221.2,0,100,5.4,40,1381.44,25.6,0,2

0.10,131.3,190.6,1.6,28,25,2.9,7,0.3,0,3,0,2.5,8.2,0,97.7,97.8,8,8,58.9,189.4,0,99.93,6.9,42,1525.58,17.4,0,3

0.44,3881.4,5067.3,0.9,2732,2500,11.2,10,1.5,0,5,0,2.67,8.2,0,97.4,97.3,8,11,14.5,1326.2,0,99.06,3.7,31,1120.54,10.3,10,2

0.18,1024.8,1651.3,1.01,358,345,4.6,35,0.3,0,2,0,2.5,8.2,0,97.8,97.9,8,10,15.9,790.2,0,100,4.3,48,1531.04,10.5,0,3

0.46,682.9,784.2,1.8,103,109,2.2,8,0.4,0,4,0,2.5,8.2,0,97.8,97.9,8,8,82.7,166.3,0,99.96,6.4,44,1373.6,13.5,0,2

0.12,370.4,420.0,1.10,28,25,3.4,10,0.1,0,6,0,2.57,8.2,0,97.6,97.8,8,11,51.6,120,0,99.85,8.1,40,1297.94,0,0,3

0.03,552.4,555.1,0.8,54,49,3.5,10,0.4,0,0,0,2.5,8.2,0,97.4,97.6,8,10,33.6,594.5,0,100,3.2,41,1184.34,6.6,0,3

0.21,1256.5,2434.8,0.9,1265,1138,6.3,20,1.3,0,2,0,2.6,8.2,0,98,97.9,8,9,20.1,881,0,99.1,3.9,31,1265.93,7.8,0,3

0.09,320.6,745.7,1.10,37,25,2.7,8,0.3,0,9,0,2.5,8.2,0,98,97.8,8,8,49.2,376.4,0,99.95,4.3,39,1285.11,0,0,3

0.08,452.7,570.9,1,18,20,4.7,9,0.6,0,2,0,2.45,8.2,0,97.1,97.1,8,8,19.9,1103.8,0,99.996,2.9,22,1562.61,21.9,0,3

0.13,967.9,947.2,1,74,65,4.0,25,1.4,0,6,0,2.5,8.2,0,98,98,9,11,30.1,503.1,0,99.999,3.4,55,1269.33,0,0,2

0.07,495.0,570.3,1.2,27,30,4.3,7,0.5,0,12,0,3.62,8.2,0,98.2,98.2,15,13,29.8,430.5,0,99.7,4.9,40,1461.79,14.6,0,2

0.17,681.9,537.4,1.1,113,120,2.9,12,0.4,0,8,0,2.5,8.2,0,98.2,98.3,8,8,24,74.3,0,100,5,43,1290.16,0,0,3

0.05,639.7,898.2,0.40,9,12,3.0,7,0.1,0,1,0,2.5,8.2,0,97.6,97.8,15,11,11.9,1221.1,0,99.996,1.7,40,1372,7,0,4

0.65,2067.8,2084.2,2.50,414,398,7.3,6,0.7,0,4,0,2.16,8.2,0,97.8,97.9,12,12,60.1,146.3,0,99.96,10.4,44,1059.68,7.4,0,2

0.12,804.4,1416.4,3.30,579,602,4.2,7,1.8,0,1,0,2.5,8.2,0,98.1,98.3,8,10,8.9,2492.3,0,95.4,2.2,34,1345.76,7,0,2

I wrote the following code based on the official example, but I get a strange result.

Code:

from sklearn import decomposition, preprocessing
from sklearn.model_selection import cross_val_score  # formerly sklearn.cross_validation
import numpy as np

data = np.genfromtxt('test.csv', delimiter=',')

def compute_scores(X):
    n_components = np.arange(0, len(X), 1)
    X = preprocessing.scale(X)  # data normalisation attempt
    pca = decomposition.PCA()
    fa = decomposition.FactorAnalysis(n_components=1)
    pca_scores, fa_scores = [], []
    for n in n_components:
        pca.n_components = n
        fa.n_components = n
        # pca_scores.append(np.mean(cross_val_score(pca, X)))  # if I attempt to compute pca_scores I get an error
        fa_scores.append(np.mean(cross_val_score(fa, X)))
    print(pca_scores, fa_scores)

compute_scores(data)

Code output:

[],

 [-947738125363.77405,

  -947738145459.86035,

  -947738159924.70471,

  -947738174662.89746,

  -947738206142.62854,

  -947738179314.44739,

  -947738220921.50684,

  -947738223447.3678,

  -947738277298.33545,

  -947738383772.58606,

  -947738415104.84912,

  -947738406361.44482,

  -947738394379.30359,

  -947738456528.69275,

  -947738501001.14319,

  -947738991338.98291,

  -947739381280.06506,

  -947739389033.33557,

  -947739434992.48047,

  -947739549511.2655,

  -947739355699.70959,

  -947739879828.51514,

  -947739898216.39099,

  -947739905804.71033,

  -947739902618.47791,

  -947738564594.54639,

  -948816122907.87366,

  -947744046601.55029,

  -947738624937.61292,

  -947738625325.73486,

  -947738626111.14441,

  -947738624973.92188,

  -947738625200.06946,

  -947738625568.65027,

  -947738625528.69666,

  -947738625359.41992,

  -947738624906.67529,

  -947738625652.12439,

  -947739509002.01868,

  -947738625426.81946,

  -947738625380.45837]

This result is far from what I expected. Here is the R code for the same task and the same data. Its output is OK (the result is close to the output of an IBM program that can perform FA):

data <- read.csv("test.csv", header=F)
col_names <- names(data)
drops <- c()
for (name in col_names){
  st_dev <- sd(data[,name], na.rm = T)
  if (st_dev == 0){
    drops <- c(drops, name)
  }
}
da_nal <- data[,!(names(data) %in% drops)]
factanal(na.omit(da_nal), factors = 1, scores = 'regression')$scores

The output for this code is:

    Factor1

1   4.89102190

2   3.65004187

3   0.14628700

4  -0.20255897

5  -0.01565570

6  -0.16438863

7   0.40835986

8  -0.25823984

9  -0.20813064

10  0.09390067

11 -0.28891296

12 -0.28882753

13 -0.26624358

14 -0.25202275

15 -0.25181326

16 -0.15653679

17 -0.28702281

18 -0.28865654

19 -0.23251509

20 -0.28066125

21 -0.18714387

22 -0.24969113

23 -0.28302552

24 -0.28712610

25 -0.29196529

26 -0.28659988

27 -0.29502523

28 -0.15802910

29 -0.27440118

30 -0.29083667

31 -0.29548220

32 -0.29461059

33 -0.23594859

34 -0.29654336

35 -0.29759659

36 -0.29085001

37 -0.29539071

38 -0.29234303

39 -0.29702103

40 -0.27595130

41 -0.27184361

So I'm hoping to get a similar result in Python (I realize that I won't get exactly the same numbers), but I don't know how.


4 Answers

0 votes
by (15.4k points)
selected by
 
Best answer
To conduct exploratory factor analysis and calculate scores for each observation in Python, you can use the FactorAnalysis class from the sklearn.decomposition module. Here's a revised version of your code that achieves the desired outcome:

from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler
import numpy as np

data = np.genfromtxt('test.csv', delimiter=',')

def compute_scores(X):
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)
    fa = FactorAnalysis(n_components=1)
    fa_scores = fa.fit_transform(X_scaled)
    print(fa_scores)

compute_scores(data)

In this updated code, the following changes have been made:

The necessary modules, FactorAnalysis from sklearn.decomposition and StandardScaler from sklearn.preprocessing, are imported.

The input data X is standardized using StandardScaler to perform data normalization.

An instance of FactorAnalysis is created with n_components=1 to specify extracting a single factor.

The fit_transform() method of FactorAnalysis is used to compute the factor scores for each observation.

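Running this revised code should give factor scores comparable to the output of the R code you shared: fit_transform() returns the expected value of the latent factor for each observation, which plays the same role as R's regression scores. The exact values may not match, and because the sign of a factor is arbitrary, the scores may come out negated relative to R.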
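The R code in the question also drops zero-variance columns and rows with missing values before fitting, which the Python version above does not do. A minimal sketch of the same preprocessing follows. The function name one_factor_scores and the synthetic data are assumptions made for illustration (they stand in for test.csv), so treat this as a starting point rather than a verified equivalent of factanal().

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler


def one_factor_scores(X, random_state=0):
    """Return one factor score per complete row of X (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    # Like na.omit() in the R code: keep only rows without missing values.
    X = X[~np.isnan(X).any(axis=1)]
    # Like the sd == 0 loop in the R code: drop constant columns.
    X = X[:, X.std(axis=0) > 0]
    X_scaled = StandardScaler().fit_transform(X)
    fa = FactorAnalysis(n_components=1, random_state=random_state)
    return fa.fit_transform(X_scaled).ravel()


# Synthetic stand-in for test.csv: one latent factor drives 5 variables,
# plus one constant column and one row with a missing value.
rng = np.random.default_rng(0)
latent = rng.normal(size=200)
X = np.outer(latent, [1.0, 0.8, 0.6, 0.9, 0.7]) + 0.5 * rng.normal(size=(200, 5))
X = np.column_stack([X, np.zeros(200)])
X[3, 0] = np.nan

scores = one_factor_scores(X)
print(scores.shape)  # one score per row that had no missing values
```

With real data, replace the synthetic X with np.genfromtxt('test.csv', delimiter=',').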
0 votes
by (26.4k points)

Try the code below:

from sklearn import decomposition, preprocessing
import numpy as np

data = np.genfromtxt('rangir_test.csv', delimiter=',')
data = data[~np.isnan(data).any(axis=1)]  # drop rows with missing values
data_normal = preprocessing.scale(data)

fa = decomposition.FactorAnalysis(n_components=1)
fa.fit(data_normal)

for score in fa.score_samples(data_normal):
    print(score)

Unfortunately, the output (see below) is completely different from that of factanal(). Any advice on decomposition.FactorAnalysis() would be appreciated.

Scikit-learn scores output:

-69.8587183816

-116.353511148

-24.1529840248

-36.5366398005

-7.87165586175

-24.9012815104

-23.9148486368

-10.047780535

-4.03376369723

-7.07428842783

-7.44222705099

-6.25705487929

-13.2313513762

-13.3253819521

-9.23993173528

-7.141616656

-5.57915693405

-6.82400483045

-15.0906961724

-3.37447211233

-5.41032267015

-5.75224753811

-19.7230390792

-6.75268922909

-4.04911793705

-10.6062761691

-3.17417070498

-9.95916350005

-3.25893428094

-3.88566777358

-3.30908856716

-3.58141292341

-3.90778368669

-4.01462493538

-11.6683969455

-5.30068548445

-24.3400870389

-7.66035331181

-13.8321672858

-8.93461397086

-17.4068326999
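A likely reason for the mismatch: score_samples() returns the log-likelihood of each sample under the fitted model, not a factor score, and score() (which cross_val_score uses by default) is the average of those log-likelihoods. The factor scores come from transform(). The sketch below shows the difference on synthetic stand-in data; the data and variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 100 observations of 4 variables driven by one factor.
rng = np.random.default_rng(1)
latent = rng.normal(size=100)
X = np.outer(latent, [1.0, 0.7, 0.5, 0.9]) + 0.5 * rng.normal(size=(100, 4))
X_scaled = StandardScaler().fit_transform(X)

fa = FactorAnalysis(n_components=1, random_state=0).fit(X_scaled)

log_likelihoods = fa.score_samples(X_scaled)  # one log-likelihood per row
factor_scores = fa.transform(X_scaled)[:, 0]  # one factor score per row

print(log_likelihoods[:3])
print(factor_scores[:3])
```

Only factor_scores is the quantity comparable to the $scores column that factanal() returns.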


0 votes
by (25.7k points)
To perform exploratory factor analysis and calculate scores for each observation using Python, you can use the FactorAnalysis class from the sklearn.decomposition module. Here's an example of how you can modify your code to achieve the desired outcome:

from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler
import numpy as np

data = np.genfromtxt('test.csv', delimiter=',')

def compute_scores(X):
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)
    fa = FactorAnalysis(n_components=1)
    fa_scores = fa.fit_transform(X_scaled)
    print(fa_scores)

compute_scores(data)

In this code, I've made a few modifications:

I imported the necessary modules, including FactorAnalysis from sklearn.decomposition and StandardScaler from sklearn.preprocessing.

I used StandardScaler to perform data normalization by scaling the input data.

I created an instance of FactorAnalysis with n_components=1 to specify that you want to extract one factor.

I used the fit_transform() method of FactorAnalysis to compute the factor scores for each observation.

This modified code should provide you with factor scores similar to the R code you shared. Note that the specific values may not match exactly, but they should be comparable.
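When comparing Python scores with the R output, keep in mind that the sign of a factor is arbitrary, so a correlation close to -1 is as good a match as one close to +1. A small sketch of one way to align and compare two score vectors follows. The helper align_sign is hypothetical, and python_scores is a made-up vector (a negated, rescaled copy of the first five R scores from the question) used only to illustrate the call.

```python
import numpy as np


def align_sign(scores, reference):
    """Flip scores if they correlate negatively with reference.

    Returns the (possibly negated) scores and the absolute correlation.
    """
    scores = np.asarray(scores, dtype=float)
    reference = np.asarray(reference, dtype=float)
    r = np.corrcoef(scores, reference)[0, 1]
    return (scores if r >= 0 else -scores), abs(r)


# First five regression scores from the R output in the question.
r_scores = np.array([4.89102190, 3.65004187, 0.14628700, -0.20255897, -0.01565570])
# Hypothetical Python result: same pattern, opposite sign, slightly different scale.
python_scores = -0.98 * r_scores

aligned, abs_corr = align_sign(python_scores, r_scores)
print(aligned)
print(abs_corr)
```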
0 votes
by (19k points)
To perform exploratory factor analysis and calculate scores for each observation using Python, you can use the FactorAnalysis class from the sklearn.decomposition module. Here's a concise version of the code:

from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler
import numpy as np

data = np.genfromtxt('test.csv', delimiter=',')

scaler = StandardScaler()
X_scaled = scaler.fit_transform(data)

fa = FactorAnalysis(n_components=1)
fa_scores = fa.fit_transform(X_scaled)
print(fa_scores)

In this shortened version, the input data is standardized using StandardScaler for data normalization. Then, an instance of FactorAnalysis is created with n_components=1 to extract one factor. The fit_transform() method is used to compute the factor scores, which are then printed.
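To see what fit_transform() actually computes, the scores can be rebuilt by hand from the fitted loadings and the model covariance using the regression-style formula Lambda^T Sigma^{-1} (x - mean). The sketch below checks this on synthetic stand-in data (an assumption for illustration). R's factanal() may use a slightly different covariance estimate, so this is not a claim of exact numerical agreement with R.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Synthetic stand-in data: 150 observations of 4 variables driven by one factor.
rng = np.random.default_rng(2)
latent = rng.normal(size=150)
X = np.outer(latent, [0.9, 0.8, 0.7, 0.6]) + 0.4 * rng.normal(size=(150, 4))

fa = FactorAnalysis(n_components=1, random_state=0).fit(X)

loadings = fa.components_.T   # shape (n_features, 1)
sigma = fa.get_covariance()   # model covariance: loadings @ loadings.T + diag(noise)
centered = X - fa.mean_

# Regression-style scores: Lambda^T Sigma^{-1} (x - mean) for every row.
manual_scores = centered @ np.linalg.solve(sigma, loadings)

print(np.allclose(manual_scores, fa.transform(X)))
```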
...