SciPy in Python Tutorial


SciPy is a powerful open-source Python library for scientific and technical computing. Built on top of NumPy, it extends NumPy's capabilities with functions for optimization, integration, interpolation, eigenvalue problems, and statistics. It combines high performance with a simple interface, making complex mathematical tasks easier to address. Its modules are extensive, covering linear algebra, signal processing, statistical analysis, and more. Because it integrates seamlessly with libraries such as Pandas, Matplotlib, and scikit-learn, SciPy is a go-to tool for researchers, engineers, and data scientists working on data analysis, modeling, and machine learning.


Let’s Learn SciPy with an Example

Let’s start this SciPy tutorial with an example. Scientists and researchers gather enormous amounts of scientific and technical data from their exploration, experimentation, and analysis. Computing with such large datasets by hand is impractical, so they rely on supercomputers and data science techniques for faster computation and more accurate results.

A simpler way to handle scientific and technical computing is to use a Python library built specifically for this purpose: SciPy (pronounced ‘sigh pie’).

SciPy is open-source software, so it can be used free of charge, and new features continue to be added to it by its contributors.

How to Install SciPy in Python?

The next section of this SciPy tutorial covers installation. The installation command depends on the operating system and package manager, as shown below.

Installing SciPy with PIP

Pip is a recursive acronym that stands for ‘Pip Installs Packages’. It is the standard package manager for Python and is available on most operating systems.

Note: To install packages with the pip command, make sure that both Python and pip are installed on your system.

python -m pip install --user numpy scipy
This command installs NumPy and SciPy with pip. It works on Windows as well as on macOS and Linux (use python3 instead of python if that is how Python 3 is invoked on your system). The ‘--user’ flag installs the packages for the current user rather than into the system directories.
sudo port install py35-scipy py35-numpy
This command installs SciPy on macOS using MacPorts. The py35 prefix refers to Python 3.5; replace it with the prefix that matches your Python version. Sudo is a command that allows a user to run programs with the security privileges of another user, by default the superuser.
sudo apt-get install python3-scipy python3-numpy
This command installs SciPy on Debian-based Linux distributions such as Ubuntu. The apt-get command is a command-line tool for working with APT (Advanced Package Tool) software packages.


Python SciPy Modules

Scientific computing in Python requires dedicated software tools. SciPy is one such library: it offers many modules that we can use to perform complex operations.

The following table shows some of the modules or sub-packages that can be used for computing:

No. | Area | Sub-package
--- | --- | ---
1 | Interpolation | scipy.interpolate
2 | Integration | scipy.integrate
3 | Optimization | scipy.optimize
4 | Signal processing | scipy.signal
5 | Statistics | scipy.stats
6 | Fast Fourier Transforms | scipy.fftpack (legacy; scipy.fft in newer releases)
7 | Linear algebra | scipy.linalg
8 | Sparse matrices | scipy.sparse
9 | Input/output | scipy.io
10 | Special functions | scipy.special
11 | Multidimensional image processing | scipy.ndimage
12 | Spatial data structures and algorithms | scipy.spatial


import numpy as np
from scipy import signal

This basic snippet imports the signal sub-package. Any other sub-package can be imported the same way by replacing signal with its name. Most sub-packages operate on NumPy arrays, so NumPy is usually imported alongside SciPy.
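As a minimal sketch of importing more than one sub-package at once, the following calls one function each from scipy.linalg and scipy.special. The matrix, vector, and numbers are purely illustrative.

```python
import numpy as np
from scipy import linalg, special

# Solve the linear system A @ x = b with scipy.linalg
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = linalg.solve(A, b)
print(x)  # the solution is x = [2, 3]

# Number of ways to choose 2 items out of 5 with scipy.special
n_ways = special.comb(5, 2, exact=True)
print(n_ways)
```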

 


Integration (scipy.integrate)

Numerical integration in SciPy is carried out with the scipy.integrate sub-package, which provides several integration techniques. Some of its functions are listed below:

  • quad: single integration (general-purpose)
  • dblquad: double integration
  • tplquad: triple integration
  • nquad: integration over any number of variables
  • fixed_quad: Gaussian quadrature of a fixed order
  • trapezoid: trapezoidal rule for sampled data
  • simpson: Simpson's rule for sampled data
  • solve_ivp: solving initial value problems for ordinary differential equations

Single Integrals

Single integration, also called general-purpose integration, is used when the integrand has only one variable and is integrated between two points. The function for this is quad:

quad(func, a, b[, args, …])

Here, func is integrated from the lower limit a to the upper limit b. Let us understand the function with an example.
A researcher is gathering data and wants to integrate a function that describes it:

\int_{0}^{1} 12x \, dx

Here, 12x is the integrand, and 0 and 1 are the limits of integration.
Example for single integration:

import scipy.integrate
f= lambda x: 12*x
i = scipy.integrate.quad(f, 0, 1)
print (i)
Output:
(6.0, 6.661338147750939e-14)

A lambda function is used here to define the integrand inline; a lambda can take any number of arguments but can contain only one expression. Here the expression is 12*x, and scipy.integrate.quad(f, 0, 1) integrates it from 0 to 1. quad returns a tuple: the value of the integral (6.0) and an estimate of the absolute error.
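quad also accepts infinite limits and forwards extra parameters to the integrand through args. A minimal sketch, with an illustrative rate parameter a = 2, integrating e^{-ax} from 0 to infinity (the exact value is 1/a = 0.5):

```python
import numpy as np
from scipy import integrate

# Integrand with an extra parameter a; quad passes args=(a,) after x
f = lambda x, a: np.exp(-a * x)

value, abs_err = integrate.quad(f, 0, np.inf, args=(2.0,))
print(value, abs_err)  # value is close to the exact result 0.5
```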

Double Integral

Double integration applies to a function of two variables. In dblquad, the integrand must take y (the inner variable) as its first argument and x (the outer variable) as its second.

dblquad(func, a, b, gfun, hfun[, args, …])

In dblquad, x runs between the constant limits a and b, and for each x, y runs between gfun(x) and hfun(x). Because gfun and hfun are functions, the inner limits can depend on x.
The researcher now adds another variable to the previous data and turns the problem into a double integral:

\int_{0}^{0.5} \int_{0}^{1} 12y \, dy \, dx

Here, the inner integral is over y from 0 to 1, and the outer integral is over x from 0 to 0.5.

Example for double integrals
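import scipy.integrate
# dblquad calls f(y, x): y is the inner variable, x the outer one
f = lambda y, x: 12*y
g = lambda x: 0  # lower limit of y
h = lambda x: 1  # upper limit of y
# x runs from 0 to 0.5; for each x, y runs from g(x) to h(x)
i = scipy.integrate.dblquad(f, 0, 0.5, g, h)
print(i)
Output: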
(3.0, 6.661338147750939e-14)

Hence, the code calls scipy.integrate.dblquad(f, 0, 0.5, g, h), where f is the integrand, 0 and 0.5 are the limits of the outer variable x, and g and h are functions returning the lower and upper limits of the inner variable y (here the constants 0 and 1).
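Since gfun and hfun are functions, the inner limits can depend on the outer variable. The sketch below uses an illustrative integrand xy over the triangle 0 ≤ y ≤ x ≤ 1. The exact value is 1/8: the inner integral gives x·x²/2 = x³/2, and the integral of x³/2 from 0 to 1 is 1/8.

```python
from scipy import integrate

# dblquad calls func(y, x): y is the inner variable, x the outer one
f = lambda y, x: x * y

# x runs from 0 to 1; for each x, y runs from 0 up to x
value, abs_err = integrate.dblquad(f, 0, 1, lambda x: 0, lambda x: x)
print(value)  # close to 0.125
```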

Triple Integrals

Triple integration applies to a function of three variables, x, y, and z, so the researcher computes three nested integrals.

tplquad(func, a, b, gfun, hfun, qfun, rfun)

The researcher uses tplquad with three sets of limits: x runs from a to b, y runs from gfun(x) to hfun(x), and z runs from qfun(x, y) to rfun(x, y). The integrand is called as func(z, y, x).

Extending the double integral, the researcher adds a third variable z, integrated from 0 to 3:

\int_{0}^{0.5} \int_{0}^{1} \int_{0}^{3} 12x \, dz \, dy \, dx

Example for triple integration:

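from scipy import integrate
# tplquad calls f(z, y, x): z is the innermost variable, x the outermost
f = lambda z, y, x: 12*x
# x from 0 to 0.5, y from 0 to 1, z from 0 to 3
print(integrate.tplquad(f, 0, 0.5, lambda x: 0, lambda x: 1, lambda x, y: 0, lambda x, y: 3))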
Output
(4.5, 1.9940620784844344e-13)

The code above calls integrate.tplquad(f, 0, 0.5, lambda x: 0, lambda x: 1, lambda x, y: 0, lambda x, y: 3), which integrates z between 0 and 3, y between 0 and 1, and x between 0 and 0.5, giving 4.5.
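integrate.nquad generalizes this to any number of variables. In the sketch below, ranges[0] applies to the first argument of the integrand (z), ranges[1] to y, and ranges[2] to x, so the call should reproduce the triple integral above:

```python
from scipy import integrate

# Same integrand as the tplquad example, with arguments ordered (z, y, x)
f = lambda z, y, x: 12 * x

# One [lower, upper] pair per argument, in the same order as f's arguments
value, abs_err = integrate.nquad(f, [[0, 3], [0, 1], [0, 0.5]])
print(value)  # close to 4.5
```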

Reading Images as Arrays (imread)

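Older SciPy versions could read an image file into an array with scipy.misc.imread, which required the Python Imaging Library (PIL). That function was deprecated in SciPy 1.0 and removed in SciPy 1.2; the recommended replacement is imread from the imageio package, which can be installed with pip: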

pip install imageio

import imageio.v3 as iio
image = iio.imread('mario.png')  # NumPy array of shape (height, width, channels) for a color image
print(image)

Output (varies from image to image):

array([[[121, 112, 131],
        [138, 129, 148],
        [153, 144, 165],
        ...,
        [119, 126,  74],
        [131, 136,  82],
        [139, 144,  90]],

       [[ 89,  82, 100],
        [110, 103, 121],
        [130, 122, 143],
        ...,
        [118, 125,  71],
        [134, 141,  87],
        [146, 153,  99]],

       [[ 73,  66,  84],
        [ 94,  87, 105],
        [115, 108, 126],
        ...,
        [117, 126,  71],
        [133, 142,  87],
        [144, 153,  98]],

       ...,

       [[ 87, 106,  76],
        [ 94, 110,  81],
        [107, 124,  92],
        ...,
        [120, 158,  97],
        [119, 157,  96],
        [119, 158,  95]],

       [[ 85, 101,  72],
        [ 95, 111,  82],
        [112, 127,  96],
        ...,
        [121, 157,  96],
        [120, 156,  94],
        [120, 156,  94]],

       [[ 85, 101,  74],
        [ 97, 113,  84],
        [111, 126,  97],
        ...,
        [120, 156,  95],
        [119, 155,  93],
        [118, 154,  92]]], dtype=uint8)

In this example, the image mario.png is read from its file and converted into a NumPy array (here of unsigned 8-bit integers, with one value per color channel for each pixel), which can then be used for image processing.
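Once an image is a NumPy array, scipy.ndimage can process it. Because mario.png is not available here, the minimal sketch below uses a randomly generated 64×64 RGB array as a hypothetical stand-in and smooths each color channel with a Gaussian filter:

```python
import numpy as np
from scipy import ndimage

# Hypothetical stand-in for an RGB image read from disk
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)

# sigma=(2, 2, 0): blur along rows and columns, but not across color channels
smoothed = ndimage.gaussian_filter(image, sigma=(2, 2, 0))
print(image.std(), smoothed.std())  # smoothing reduces pixel-to-pixel variation
```

By default the output keeps the input's dtype, so the smoothed result is still a uint8 image of the same shape.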

Optimizing and Minimizing Functions in Python SciPy

Optimization is the process of finding the inputs that give the best value, usually the minimum or maximum, of an objective function for a given problem. The optimization routines are imported from scipy.optimize as follows:

import numpy as np
from scipy.optimize import minimize

The minimize function in the scipy.optimize sub-package provides a common interface to constrained and unconstrained minimization algorithms for multivariate scalar functions.

The scipy.optimize package provides, among other things:

  • Constrained and unconstrained minimization of multivariate scalar functions (minimize()) using a variety of algorithms (e.g., BFGS, Nelder–Mead simplex)
  • Least-squares minimization (least_squares(), leastsq()) and curve fitting (curve_fit()) algorithms
  • Multivariate equation system solvers (root()) using a variety of algorithms (e.g., hybrid Powell)
  • Scalar univariate function minimizers (minimize_scalar()) and root finders (root_scalar(), newton())
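For a function of a single variable, minimize_scalar is the simplest entry point. A minimal sketch with an illustrative quadratic whose minimum value 1 lies at x = 2:

```python
from scipy.optimize import minimize_scalar

# f(x) = (x - 2)^2 + 1 has its minimum value 1 at x = 2
res = minimize_scalar(lambda x: (x - 2.0) ** 2 + 1.0)
print(res.x, res.fun)
```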

Using Nelder–Mead Simplex Algorithm

The Nelder–Mead simplex algorithm is a direct search method: it uses only function values, so it suits nonlinear optimization problems whose derivatives are unknown. We use it through the minimize() routine by passing method='Nelder-Mead', as in the following example:

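import numpy as np
from scipy.optimize import minimize

def rosen(x):
    # Rosenbrock function: minimum value 0 at x = [1, 1, ..., 1]
    return sum(100.0 * (x[1:] - x[:-1]**2.0)**2.0 + (1 - x[:-1])**2.0)

x0 = np.array([1.3, 0.7, 0.8, 1.9, 1.2])  # starting point
# Nelder-Mead uses only function values, no derivatives
res = minimize(rosen, x0, method='Nelder-Mead', options={'xatol': 1e-8})
print(res.x)  # close to [1, 1, 1, 1, 1]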
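When the gradient is available, a gradient-based method usually needs far fewer function evaluations. SciPy ships the Rosenbrock function and its gradient as scipy.optimize.rosen and scipy.optimize.rosen_der; this sketch passes the gradient through jac and switches to the BFGS method:

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

x0 = np.array([1.3, 0.7, 0.8, 1.9, 1.2])

# jac=rosen_der supplies the analytic gradient, so BFGS needs no finite differences
res = minimize(rosen, x0, method='BFGS', jac=rosen_der)
print(res.x)     # close to the minimum at [1, 1, 1, 1, 1]
print(res.nfev)  # number of function evaluations used
```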

Least Squares

The least_squares function in scipy.optimize solves nonlinear least-squares problems, optionally with bounds on the variables.
The following example uses the Rosenbrock function, written as a vector of residuals, to illustrate a least-squares problem.

import numpy as np
from scipy.optimize import least_squares

def fun_rosenbrock(x):
    # Residuals whose sum of squares is the 2-D Rosenbrock function
    return np.array([10 * (x[1] - x[0]**2), (1 - x[0])])

x0 = np.array([2, 2])  # starting point
res = least_squares(fun_rosenbrock, x0)
print(res)

Output:

message: `gtol` termination condition is satisfied.
     success: True
      status: 1
         fun: [ 4.441e-15  1.110e-16]
           x: [ 1.000e+00  1.000e+00]
        cost: 9.866924291084687e-30
         jac: [[-2.000e+01  1.000e+01]
               [-1.000e+00  0.000e+00]]
        grad: [-8.893e-14  4.441e-14]
  optimality: 8.892886493421953e-14
 active_mask: [ 0.000e+00  0.000e+00]
        nfev: 3
        njev: 3
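Bounds are passed as a (lower, upper) pair. In this sketch, the lower bound of 1.5 on the second variable is an illustrative choice that excludes the unconstrained solution (1, 1), so the solver should stop on the bound:

```python
import numpy as np
from scipy.optimize import least_squares

def fun_rosenbrock(x):
    # Residuals whose sum of squares is the 2-D Rosenbrock function
    return np.array([10 * (x[1] - x[0] ** 2), (1 - x[0])])

x0 = np.array([2.0, 2.0])

# bounds=(lower, upper): x[0] is unbounded, x[1] must be at least 1.5
res = least_squares(fun_rosenbrock, x0, bounds=([-np.inf, 1.5], np.inf))
print(res.x)  # the second component ends up at (or very near) its bound of 1.5
```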

Root Finding

Root finding solves equations of the form f(x) = 0. It is closely related to optimization, and the scipy.optimize sub-package provides the root function for it. In the following example, root finds the value of x for which 3x + 3cos(x) = 0, starting from the initial guess 0.4.

import numpy as np
from scipy.optimize import root

def func(x):
    # Find x such that 3x + 3cos(x) = 0
    return x*3 + 3 * np.cos(x)

sol = root(func, 0.4)  # 0.4 is the initial guess
print(sol)

Output

 message: The solution converged.
 success: True
  status: 1
     fun: [ 0.000e+00]
       x: [-7.391e-01]
  method: hybr
    nfev: 10
    fjac: [[-1.000e+00]]
       r: [-5.021e+00]
     qtf: [-1.270e-10]
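For a scalar equation, root_scalar with a bracket [a, b] on which the function changes sign is a robust alternative. A sketch on the same equation, using Brent's method (method='brentq') and an illustrative bracket [-2, 0]:

```python
import numpy as np
from scipy.optimize import root_scalar

def func(x):
    return x * 3 + 3 * np.cos(x)

# func(-2) < 0 < func(0), so the bracket is guaranteed to contain a root
sol = root_scalar(func, bracket=[-2, 0], method='brentq')
print(sol.root, sol.converged)  # root close to -0.7391
```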

Curve Fit

Curve fitting uses nonlinear least squares to find the parameters of a model function that best fit a set of data points; in SciPy this is done with optimize.curve_fit.
The following code illustrates the curve fit:

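import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize
# Illustrative sample data: a noisy sine wave
np.random.seed(0)
x_data = np.linspace(-5, 5, num=50)
y_data = 2.9 * np.sin(1.5 * x_data) + np.random.normal(size=50)
def test_func(x, a, b):
    return a * np.sin(b * x)
# Fit a and b, starting from the initial guess p0 = [2, 2]
params, params_covariance = optimize.curve_fit(test_func, x_data, y_data, p0=[2, 2])
print(params)
plt.figure(figsize=(6, 4))
plt.scatter(x_data, y_data, label='Data')
plt.plot(x_data, test_func(x_data, params[0], params[1]), label='Fitted function')
plt.legend(loc='best')
plt.show()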


Interpolation

Interpolation is the estimation of values between known data points on a curve or a line. The following code generates 12 sample points of the function cos(x²/3 + 4) and plots them:

import numpy as np
from scipy import interpolate
import matplotlib.pyplot as plt
x = np.linspace(0, 5, 12)
y = np.cos(x**2/3+4)
print (x,y)
plt.plot(x, y,'o')
plt.show()

Output:

[0. 0.45454545 0.90909091 1.36363636 1.81818182 2.27272727 2.72727273 3.18181818 3.63636364 4.09090909 4.54545455 5. ] [-0.65364362 -0.60001388 -0.42313893 -0.09242219 0.37976236 0.84649879 0.9808235 0.46118122 -0.52586509 -0.98820612 -0.10830906 0.97296947]
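The code above only samples and plots the data points. To actually estimate values between them, scipy.interpolate can build an interpolating function from the samples. A minimal sketch using a cubic spline (CubicSpline) and a piecewise-linear spline (make_interp_spline with k=1):

```python
import numpy as np
from scipy import interpolate

x = np.linspace(0, 5, 12)
y = np.cos(x**2 / 3 + 4)

# Both objects are callables that pass through every (x, y) sample
cubic = interpolate.CubicSpline(x, y)
linear = interpolate.make_interp_spline(x, y, k=1)

# Evaluate on a finer grid that includes points between the samples
x_new = np.linspace(0, 5, 100)
y_cubic = cubic(x_new)
y_linear = linear(x_new)
print(y_cubic[:5])
```

The cubic spline gives a smooth curve through the points, while the linear version simply joins neighboring samples with straight segments.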


SciPy Stats (scipy.stats)

This sub-package contains a large number of probability distributions, as well as a growing library of statistical functions.

scipy.stats provides several generic classes for building random variables. They are listed below:

• rv_continuous

• rv_discrete

• rv_histogram

rv_continuous

This is a generic continuous random variable class, mainly meant for subclassing.

rv_continuous([momtype, a, b, xtol, …])

A subclass defines the distribution by overriding methods such as _pdf; the parameters a and b set the lower and upper bounds of its support.

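A simple demonstration of rv_continuous, defining the density 3x² on the interval [0, 1]:

import scipy.stats as st
class my_pdf(st.rv_continuous):
    def _pdf(self, x):
        return 3*x**2
# a and b set the support to [0, 1], where 3x^2 integrates to 1
my_cv = my_pdf(a=0, b=1, name='my_pdf')
print(my_cv.cdf(0.5))  # approximately 0.5**3 = 0.125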
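In practice, most work uses the ready-made distributions that scipy.stats builds on these classes, such as scipy.stats.norm. A short sketch of the common methods on the standard normal distribution:

```python
from scipy import stats

dist = stats.norm(loc=0, scale=1)  # standard normal distribution

print(dist.pdf(0.0))    # density at 0, equal to 1/sqrt(2*pi)
print(dist.cdf(0.0))    # P(X <= 0) = 0.5
print(dist.ppf(0.975))  # 97.5th percentile, about 1.96
samples = dist.rvs(size=5, random_state=0)  # five random draws
print(samples)
```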

rv_discrete

This is a generic discrete random variable class, mainly meant for subclassing.

rv_discrete([a, b, name, badvalue, …])

Besides subclassing, rv_discrete can build a distribution directly from its possible values and their probabilities, passed as values=(xk, pk).

The following example uses scipy.stats.rv_discrete to define a custom distribution over the values 0 to 6 and plot its probability mass function:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
x = np.arange(7)  # possible values 0, 1, ..., 6
y = (0.2, 0.3, 0.1, 0.1, 0.1, 0.0, 0.2)  # their probabilities (sum to 1)
custm = stats.rv_discrete(name='custm', values=(x, y))
fig, ax = plt.subplots(1, 1)
ax.plot(x, custm.pmf(x), 'ro', ms=12, mec='y')
ax.vlines(x, 0, custm.pmf(x), colors='b', lw=4)
plt.show()


In the resulting graph, the probability mass of each of the seven values is plotted as a point, with a vertical line down to the x-axis.
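A distribution built this way gets the usual methods automatically. A sketch reusing the same values and probabilities; the mean is the sum of x·p, 0.3 + 0.2 + 0.3 + 0.4 + 1.2 = 2.4:

```python
import numpy as np
from scipy import stats

x = np.arange(7)
y = (0.2, 0.3, 0.1, 0.1, 0.1, 0.0, 0.2)
custm = stats.rv_discrete(name='custm', values=(x, y))

print(custm.mean())  # sum of x * p = 2.4
print(custm.cdf(2))  # P(X <= 2) = 0.2 + 0.3 + 0.1 = 0.6
print(custm.rvs(size=5, random_state=1))  # random draws from {0, ..., 6}
```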

rv_histogram

This class generates a distribution given by a histogram.

rv_histogram(histogram, *args, **kwargs)

rv_histogram takes the tuple returned by np.histogram (the bin counts and bin edges) and builds a distribution whose PDF is constant within each bin.

The following example builds an rv_histogram distribution from 1,000 normally distributed samples:

import scipy.stats
import numpy as np
import matplotlib.pyplot as plt
data = scipy.stats.norm.rvs(size=1000, loc=0, scale=1.0, random_state=123)
hist = np.histogram(data, bins=100)
hist_dist = scipy.stats.rv_histogram(hist)
X = np.linspace(-5.0, 5.0, 100)
plt.hist(data, density=True, bins=100)
plt.plot(X, hist_dist.pdf(X), label='PDF')
plt.plot(X, hist_dist.cdf(X), label='CDF')
plt.legend()
plt.show()

The resulting graph overlays the histogram of the samples with the probability density function (PDF) of the histogram distribution, which follows the bar heights, and its cumulative distribution function (CDF), which rises gradually from 0 and levels off at 1.

Sparse Matrix

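A sparse matrix stores only its nonzero entries, which saves memory and computation when most entries are zero. Sparse matrices support arithmetic operations such as addition, subtraction, multiplication, division by a scalar, and matrix powers. scipy.sparse implements several sparse matrix formats, including the following: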

  • Compressed Sparse Row (CSR) format
  • Compressed Sparse Column (CSC) format
  • Coordinate (COO) Format
  • Dictionary of Keys (DOK) format
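A minimal sketch of sparse arithmetic on an illustrative 3×3 matrix in CSR format. Addition, matrix-vector multiplication, and matrix multiplication all work on the stored nonzeros; A @ A is used for the matrix square to avoid any ambiguity about what ** means for a given sparse type:

```python
import numpy as np
from scipy.sparse import csr_matrix

A = csr_matrix(np.array([[1, 0, 2],
                         [0, 0, 3],
                         [4, 0, 0]]))
v = np.array([1, 1, 1])

B = A + A      # sparse + sparse stays sparse
w = A @ v      # matrix-vector product returns a dense NumPy array
C = A @ A      # matrix product (A squared)
print(A.nnz)   # number of stored nonzero entries: 4
print(w)
```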

scipy.sparse.csr_matrix

This enables efficient row slicing. Let us see a simple program where we generate an empty 3×3 CSR matrix using scipy.sparse.

import numpy as np
from scipy.sparse import csr_matrix
csr_matrix((3, 3), dtype=np.int8).toarray()

Output:

array([[0, 0, 0],
[0, 0, 0],
[0, 0, 0]], dtype=int8)

Representation of a 3×3 CSR matrix built from row indices, column indices, and values. Entries that share a position are summed, so the two values at position (1, 0) give 2 + 5 = 7:

row = np.array([0, 1, 0, 2, 1, 1])
col = np.array([1, 0, 2, 0, 0, 2])
data = np.array([1, 2, 3, 4, 5, 6])
csr_matrix((data, (row, col)), shape=(3, 3)).toarray()

Output:

array([[0, 1, 3],
[7, 0, 6],
[4, 0, 0]], dtype=int32)

scipy.sparse.csc_matrix

This enables efficient column slicing. Let us see a simple program where we generate an empty 3×3 CSC matrix using scipy.sparse.

import numpy as np
from scipy.sparse import csc_matrix
csc_matrix((3, 3), dtype=np.int8).toarray()

Output:

array([[0, 0, 0],
[0, 0, 0],
[0, 0, 0]], dtype=int8)

Representation of a 3×3 CSC matrix built from row indices, column indices, and values. Duplicate positions are summed: (1, 1) holds 2 + 3 = 5 and (2, 2) holds 4 + 6 = 10:

row = np.array([0, 1, 1, 2, 1, 2])
col = np.array([1, 1, 1, 2, 0, 2])
data = np.array([1, 2, 3, 4, 5, 6])
csc_matrix((data, (row, col)), shape=(3, 3)).toarray()

Output:

array([[ 0, 1, 0],
[ 5, 5, 0],
[ 0, 0, 10]], dtype=int32)

The Compressed Sparse Column (CSC) format stores the matrix column by column: for each column, it keeps the nonzero values together with their row indices. This makes column slicing and column-oriented operations efficient.

The Compressed Sparse Row (CSR) format is the converse: it stores the matrix row by row, keeping the nonzero values of each row with their column indices, so row slicing and row-oriented operations such as matrix-vector products are efficient.

CSR and CSC are efficient for arithmetic and slicing but expensive to build or modify incrementally, while COO and DOK are easier to construct and are usually converted to CSR or CSC before computation.
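A common workflow is therefore to assemble a matrix in COO format and convert it once. A sketch with illustrative entries; tocsr() and tocsc() perform the conversion, summing duplicate entries along the way:

```python
import numpy as np
from scipy.sparse import coo_matrix

row = np.array([0, 1, 1, 2])
col = np.array([0, 2, 2, 1])
data = np.array([4, 1, 2, 7])

A_coo = coo_matrix((data, (row, col)), shape=(3, 3))  # cheap to build
A_csr = A_coo.tocsr()  # efficient row access
A_csc = A_coo.tocsc()  # efficient column access

print(A_csr[1].toarray())     # row 1: [[0, 0, 3]], since 1 + 2 are summed
print(A_csc[:, 1].toarray())  # column 1 as a 3x1 array
```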

scipy.sparse.coo_matrix

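The coordinate (COO) format stores each nonzero entry as a (row, column, value) triplet; it is fast to construct but does not support indexing or slicing. Let us see a simple program where we generate an empty 3×3 COO matrix using scipy.sparse.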

import numpy as np
from scipy.sparse import coo_matrix
coo_matrix((3, 3), dtype=np.int8).toarray()

Output:

array([[0, 0, 0],
[0, 0, 0],
[0, 0, 0]], dtype=int8)

Representation of a 3×3 COO matrix upon specifying the rows and columns through inputs. When the matrix is converted with toarray(), entries that share a position are summed:

row = np.array([1, 1, 1, 2, 1, 2])
col = np.array([0, 1, 1, 2, 0, 2])
data = np.array([0, 2, 3, 4, 5, 6])
coo_matrix((data, (row, col)), shape=(3, 3)).toarray()

Output:

array([[ 0,  0,  0],
[ 5,  5,  0],
[ 0,  0, 10]])

scipy.sparse.dok_matrix

The dictionary of keys (DOK) format in the scipy.sparse sub-package is efficient for constructing a sparse matrix incrementally, one element at a time.

import numpy as np
from scipy.sparse import dok_matrix

The DOK format stores entries in a dictionary keyed by (row, column), so individual values within the matrix can be set and read efficiently.
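A minimal sketch of incremental construction with dok_matrix, using illustrative entries. Elements are assigned one at a time like dictionary items, and the finished matrix is converted to CSR for computation:

```python
import numpy as np
from scipy.sparse import dok_matrix

S = dok_matrix((3, 3), dtype=np.float32)

# Assign individual entries; only these are stored
S[0, 1] = 2.0
S[2, 0] = 5.0
S[1, 1] = 1.5

print(S[2, 0])  # individual values can be read back directly
print(S.nnz)    # 3 stored entries

S_csr = S.tocsr()  # convert once construction is finished
print(S_csr.toarray())
```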

 


Fast Fourier Transforms

Fourier analysis expresses a function as a sum of periodic components and recovers the signal from those components. When both the function and its Fourier transform are replaced with discrete counterparts, the result is the discrete Fourier transform (DFT); the fast Fourier transform (FFT) is an efficient algorithm for computing it.
The examples below use the legacy sub-package scipy.fftpack; newer code should prefer scipy.fft, which provides functions with the same names:

import numpy as np
from scipy.fftpack import fft, ifft
x = np.array([1.0, 2.0, 1.0, -1.0, 1.5])
y = fft(x)
print(y)

Output:

[ 4.5       +0.j          2.08155948-1.65109876j -1.83155948+1.60822041j
 -1.83155948-1.60822041j  2.08155948+1.65109876j]

The FFT of the given array produces the output above. The first element is the sum of the inputs (4.5), and because the input is real, the remaining elements form complex-conjugate pairs.
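The inverse transform recovers the original signal. A sketch using scipy.fft, the recommended successor to scipy.fftpack, on the same input array:

```python
import numpy as np
from scipy.fft import fft, ifft

x = np.array([1.0, 2.0, 1.0, -1.0, 1.5])
y = fft(x)
x_back = ifft(y)  # complex array whose imaginary parts are ~0

print(y[0])             # the zero-frequency term is the sum of x: 4.5
print(np.real(x_back))  # recovers x up to rounding error
```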

One-dimensional Discrete Fourier Transform

We can compute 1-dimensional Fourier transforms with the following functions (signatures as in scipy.fft; the legacy scipy.fftpack versions take overwrite_x instead of norm):

fft(a[, n, axis, norm])
ifft(a[, n, axis, norm])

We make use of ‘fft’ for 1-dimensional discrete Fourier transforms and ‘ifft’ for 1-dimensional inverse discrete Fourier transforms.

Let us explore this sub-package with a simple example: the 1-dimensional discrete Fourier transform of the sum of two cosines.

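import numpy as np
import matplotlib.pyplot as plt
from scipy.fftpack import fft
N = 600          # number of sample points
T = 1.0 / 800.0  # sample spacing (800 samples per second)
x = np.linspace(0.0, N*T, N, endpoint=False)
# Sum of a 70 Hz cosine (amplitude 1) and a 90 Hz cosine (amplitude 0.5)
y = np.cos(70.0 * 2.0*np.pi*x) + 0.5*np.cos(90.0 * 2.0*np.pi*x)
yf = fft(y)
xf = np.linspace(0.0, 1.0/(2.0*T), N//2)  # frequencies from 0 up to the Nyquist limit
plt.plot(xf, 2.0/N * np.abs(yf[0:N//2]))  # one-sided amplitude spectrum
plt.show()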

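The resulting plot is the amplitude spectrum of the summed signal. It shows two peaks, near 70 Hz and 90 Hz (the frequencies of the two cosines, not angles in degrees), and the peak near 70 Hz is taller because that cosine has twice the amplitude of the other.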
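The peak frequencies can also be read off numerically. In this sketch, the sampling rate (800 Hz) and duration (1 s, so N = 800) are illustrative choices that make 70 Hz and 90 Hz fall exactly on frequency bins; rfft and rfftfreq from scipy.fft compute the one-sided spectrum of a real signal and its frequency axis:

```python
import numpy as np
from scipy.fft import rfft, rfftfreq

fs = 800.0        # sampling rate in Hz
N = 800           # 1 second of samples
t = np.arange(N) / fs
y = np.cos(70.0 * 2.0 * np.pi * t) + 0.5 * np.cos(90.0 * 2.0 * np.pi * t)

amplitude = 2.0 / N * np.abs(rfft(y))  # scaled so a unit cosine gives a peak of 1
freqs = rfftfreq(N, d=1.0 / fs)        # frequency of each bin in Hz

top_two = np.sort(freqs[np.argsort(amplitude)[-2:]])
print(top_two)  # the two strongest frequencies: 70 Hz and 90 Hz
```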

Two-dimensional Discrete Fourier Transform and N-dimensional Discrete Fourier Transform

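We can compute 2-dimensional Fourier transforms with the following functions (signatures as in scipy.fft; the legacy scipy.fftpack versions take shape instead of s and overwrite_x instead of norm):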

fft2(a[, s, axes, norm])
ifft2(a[, s, axes, norm])

We use ‘fft2’ for 2-dimensional discrete Fourier transforms and ‘ifft2’ for 2-dimensional inverse discrete Fourier transforms.

fftn(a[, s, axes, norm])
ifftn(a[, s, axes, norm])

We use ‘fftn’ for N-dimensional discrete Fourier transforms and ‘ifftn’ for N-dimensional inverse discrete Fourier transforms.

Let us consider a simple example that places a few nonzero values in 30×30 frequency-domain arrays and uses the inverse transform ifftn to compute the corresponding 2-dimensional signals:

import numpy as np
from scipy.fftpack import ifftn
import matplotlib.pyplot as plt
import matplotlib.cm as cm
N = 30
f, ((ax1, ax2, ax3), (ax4, ax5, ax6)) = plt.subplots(2, 3, sharex='col', sharey='row')
xf = np.zeros((N,N))
xf[0, 5] = 10
xf[0, N-5] =10
Z = ifftn(xf)
ax1.imshow(xf, cmap=cm.Reds)
ax4.imshow(np.real(Z), cmap=cm.gray)
xf = np.zeros((N, N))
xf[5, 0] = 10
xf[N-5, N-10] = 10
Z = ifftn(xf)
ax2.imshow(xf, cmap=cm.Reds)
ax5.imshow(np.real(Z), cmap=cm.gray)
xf = np.zeros((N, N))
xf[5, 10] = 10
xf[N-5, N-10] = 10
Z = ifftn(xf)
ax3.imshow(xf, cmap=cm.Reds)
ax6.imshow(np.real(Z), cmap=cm.gray)
plt.show()

The code produces a 2 × 3 grid of images: the top row shows the three frequency-domain inputs, and the bottom row shows the real part of their inverse FFTs, which are sinusoidal patterns.
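Going the other way, fft2 recovers the frequency-domain spikes from a spatial pattern. In this sketch, a 30×30 array that varies as a cosine with 5 cycles across the columns (the same kind of pattern as the first panel) transforms into exactly two spikes, at positions [0, 5] and [0, N-5]:

```python
import numpy as np
from scipy.fft import fft2

N = 30
n = np.arange(N)
# The same cosine in every row: 5 full cycles across the 30 columns
pattern = np.tile(np.cos(2 * np.pi * 5 * n / N), (N, 1))

spectrum = np.abs(fft2(pattern))
peaks = np.argwhere(spectrum > 1e-6 * spectrum.max())
print(peaks)  # [[0, 5], [0, 25]]
```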

 

Conclusion

Here, we come to the end of this SciPy tutorial. I hope it has given you a solid introduction to SciPy, Python's library for scientific and technical computing. Many calculations are time-consuming and error-prone to perform by hand; a scientific computing library like SciPy carries them out quickly and reliably, which is why it plays a vital role in data science.


About the Author

Technical Research Analyst - Full Stack Development

Kislay is a Technical Research Analyst and Full Stack Developer with expertise in crafting mobile applications from inception to deployment. Proficient in Android development, iOS development, HTML, CSS, JavaScript, React, Angular, MySQL, and MongoDB, he’s committed to enhancing user experiences through intuitive websites and advanced mobile applications.