Command Line Tools for Data Science - A Beginner's Guide

Data Science Command Line Tools

Here, we look at the most common and convenient command-line tools for data science, which are useful for quick analysis of data.

  • alias

It defines or displays aliases. It is a Bash built-in.

$ help alias
$ alias ll='ls -alF'
  • bash

GNU Bourne-Again SHell

$ sudo apt-get install bash
$ man bash
  • bc

It is used to evaluate expressions read from standard input.

$ sudo apt-get install bc
$ man bc
$ echo 'e(1)' | bc -l
2.71828182845904523536
  • bigmler

Access BigML’s prediction API

$ sudo pip install bigmler
$ bigmler --help
  • cat

It concatenates files and standard input, and prints the result on standard output.

$ sudo apt-get install coreutils
$ man cat
$ cat results-01 results-02 results-03 > results-all
  • cd

It changes the shell working directory. It is a Bash built-in.

$ help cd
$ cd ~; pwd; cd ..; pwd
/home/vagrant
/home

  • chmod

It changes file mode bits. We use it to make our command-line tools executable.

$ sudo apt-get install coreutils
$ man chmod
$ chmod u+x experiment.sh
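
As a minimal sketch of what `u+x` does, the snippet below creates a throwaway script and checks whether it is executable before and after the change. The temporary directory and the script contents are made up for illustration.

```shell
# Work in a throwaway directory so nothing real is touched
tmpdir=$(mktemp -d)
script="$tmpdir/experiment.sh"
printf '#!/bin/sh\necho "experiment finished"\n' > "$script"

# Start from a known state: no execute permission for the owner
chmod u-x "$script"
if [ -x "$script" ]; then before=yes; else before=no; fi

# u+x adds execute permission for the file's owner
chmod u+x "$script"
if [ -x "$script" ]; then after=yes; else after=no; fi

echo "executable before: $before, after: $after"
rm -r "$tmpdir"
```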
  • cowsay

It generates an ASCII picture of a cow with a message. Useful for when building up a particular pipeline is starting to frustrate you a bit too much.

$ sudo apt-get install cowsay
$ man cowsay
$ echo 'The command line is awesome!' | cowsay

  • cp

Copy files and directories

$ sudo apt-get install coreutils
$ man cp
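
A short sketch of the two most common uses, copying a single file and copying a whole directory. All file and directory names here are hypothetical and live in a temporary directory.

```shell
tmpdir=$(mktemp -d)
mkdir "$tmpdir/project"
echo 'x,y' > "$tmpdir/project/data.csv"

# Copy a single file to a new name
cp "$tmpdir/project/data.csv" "$tmpdir/project/data.bak"

# -r copies a directory and everything inside it
cp -r "$tmpdir/project" "$tmpdir/project_copy"

backup=$(cat "$tmpdir/project/data.bak")
copied=$(cat "$tmpdir/project_copy/data.csv")
echo "$backup $copied"
rm -r "$tmpdir"
```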
  • csvcut

Extract columns from CSV data

$ sudo pip install csvkit
$ csvcut --help
  • csvgrep

It filters tabular data to only those rows where certain columns contain a given value or match a regular expression.

$ sudo pip install csvkit
$ csvgrep --help
  • csvjoin

Merge two or more CSV tables together using a method analogous to a SQL JOIN operation.

$ sudo pip install csvkit
$ csvjoin --help
  • csvlook

Renders a CSV file to the command line in a readable, fixed-width format

$ sudo pip install csvkit
$ csvlook --help
$ echo -e "a,b\n1,2\n3,4" | csvlook
  • csvsort

Sort CSV files

$ sudo pip install csvkit
$ csvsort --help
  • csvsql

Execute SQL queries directly on CSV data or insert CSV into a database

$ sudo pip install csvkit
$ csvsql --help
  • csvstack

Stack up the rows from multiple CSV files, optionally adding a grouping value to each row.

$ sudo pip install csvkit
$ csvstack --help
  • csvstat

It prints descriptive statistics for all columns in a CSV file.

$ sudo pip install csvkit
$ csvstat --help
  • curl

Download data from a URL.

$ sudo apt-get install curl
$ man curl
  • cut

Print selected sections (bytes, characters, or fields) of each line of files, removing the rest

$ sudo apt-get install coreutils
$ man cut
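
The following sketch shows field selection on comma-separated text. The sample data is invented for illustration. Note that `cut` splits on every delimiter character, so it does not understand quoted CSV fields.

```shell
# Sample data (invented for illustration)
data='name,age,city
alice,30,paris
bob,25,rome'

# -d sets the field delimiter, -f lists the fields to keep
names_and_cities=$(printf '%s\n' "$data" | cut -d, -f1,3)
printf '%s\n' "$names_and_cities"
```

With this input, the command keeps the first and third columns and drops `age`.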
  • display

Display an image or image sequence on any X server. Can read image data from standard input

$ sudo apt-get install imagemagick
$ man display
  • echo

Display a line of text.

$ sudo apt-get install coreutils
$ man echo
  • env

Run a program in a modified environment. It is also commonly used in the first line of a script (the shebang line) to specify which interpreter should run the script, as in the last line below.

$ sudo apt-get install coreutils
$ man env
#!/usr/bin/env python
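
Besides its use in the first line of a script, `env` can set variables for a single command only. A minimal sketch, where `MY_SETTING` is a hypothetical variable name:

```shell
# Make sure the variable is not already set in this shell
unset MY_SETTING

# The variable exists only in the environment of the child command
inside=$(env MY_SETTING=debug sh -c 'echo "$MY_SETTING"')

# The current shell is unaffected
outside=${MY_SETTING:-unset}

echo "inside: $inside, outside: $outside"
```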
  • export

Set the export attribute for shell variables. It is a Bash built-in. It is useful for making shell variables available to other command-line tools.

$ help export
$ export WEKAPATH=$HOME/bin
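
To illustrate what "available to other command-line tools" means, this sketch compares what a child process sees before and after `export`. The variable name `GREETING` is made up for the example.

```shell
unset GREETING

# A plain shell variable is not passed on to child processes
GREETING='hello'
before=$(sh -c 'echo "${GREETING:-unset}"')

# After export, child processes inherit the variable
export GREETING
after=$(sh -c 'echo "${GREETING:-unset}"')

echo "before: $before, after: $after"
```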
  • feedgnuplot

It plots data read from standard input by generating a script for gnuplot.

$ sudo apt-get install feedgnuplot
$ man feedgnuplot
  • find

Search for files in a directory hierarchy

$ sudo apt-get install findutils
$ man find
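
A small sketch of a typical search, finding all CSV files below a directory. The directory layout is hypothetical and is created in a temporary location.

```shell
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/data/raw"
touch "$tmpdir/data/a.csv" "$tmpdir/data/raw/b.csv" "$tmpdir/data/notes.txt"

# -type f restricts the search to regular files;
# -name matches the file name against a quoted glob pattern
csv_files=$(cd "$tmpdir" && find data -type f -name '*.csv' | sort)
printf '%s\n' "$csv_files"
rm -r "$tmpdir"
```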
  • grep

Print lines matching a pattern

$ sudo apt-get install grep
$ man grep
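
For example, assuming some made-up log lines, `grep` can either print the matching lines or count them:

```shell
# Sample log lines (invented for illustration)
log='INFO start
ERROR disk full
INFO done
ERROR timeout'

# Print only the lines that contain ERROR
errors=$(printf '%s\n' "$log" | grep 'ERROR')

# -c counts matching lines instead of printing them
error_count=$(printf '%s\n' "$log" | grep -c 'ERROR')

printf '%s\n' "$errors"
echo "matches: $error_count"
```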
  • head

Output the first part of files

$ sudo apt-get install coreutils
$ man head
$ seq 5 | head -n 3
1
2
3
  • ls

List directory contents

$ sudo apt-get install coreutils
$ man ls
  • man

Read reference manuals of command-line tools

$ sudo apt-get install man
$ man man
$ man grep
  • mkdir

This command is used to make directories.

$ sudo apt-get install coreutils
$ man mkdir
  • mv

It is used to move or rename files and directories

$ sudo apt-get install coreutils
$ man mv
  • paste

It is used to merge lines of files

$ sudo apt-get install coreutils
$ man paste
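
A minimal sketch of merging two files line by line. The file names and contents are hypothetical.

```shell
tmpdir=$(mktemp -d)
printf 'a\nb\nc\n' > "$tmpdir/letters"
printf '1\n2\n3\n' > "$tmpdir/numbers"

# paste joins corresponding lines of the input files;
# -d sets the separator (the default is a tab)
merged=$(paste -d, "$tmpdir/letters" "$tmpdir/numbers")
printf '%s\n' "$merged"
rm -r "$tmpdir"
```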
  • pip

Install and manage Python packages

$ sudo apt-get install python-pip
$ man pip
  • pwd

Print name of current working directory

$ man pwd
$ pwd
/home/vagrant
  • python

It is used to run Python, an interpreted, interactive, object-oriented programming language.

$ sudo apt-get install python
$ man python

  • R

Analyze data and create visualizations with the R programming language.

$ sudo apt-get install r-base-dev
$ man R
  • scp

It is used to copy files securely between hosts.

$ sudo apt-get install openssh-client
$ man scp
  • sort

Sort lines of text files

$ sudo apt-get install coreutils
$ man sort
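
One detail that often surprises beginners is that `sort` compares lines as text unless told otherwise. A short sketch with made-up numbers:

```shell
numbers='10
9
100'

# Default: lines are compared as text, character by character
text_order=$(printf '%s\n' "$numbers" | sort)

# -n compares lines as numbers
numeric_order=$(printf '%s\n' "$numbers" | sort -n)

# -r reverses the order, so -nr puts the largest number first
largest_first=$(printf '%s\n' "$numbers" | sort -nr)

printf '%s\n' "$numeric_order"
```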
  • split

Split a file into pieces

$ sudo apt-get install coreutils
$ man split
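
As an illustration, the sketch below splits a ten-line file into pieces of at most four lines each. The file name and the `part_` prefix are arbitrary choices for this example.

```shell
tmpdir=$(mktemp -d)
seq 10 > "$tmpdir/data.txt"

# -l 4 puts at most 4 lines in each piece;
# the last argument is the prefix for the output files
split -l 4 "$tmpdir/data.txt" "$tmpdir/part_"

# The pieces are named part_aa, part_ab, part_ac
piece_count=$(ls "$tmpdir" | grep -c '^part_')
last_piece=$(cat "$tmpdir/part_ac")

echo "pieces: $piece_count"
rm -r "$tmpdir"
```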
  • sql2csv

It executes arbitrary commands against an SQL database and outputs the results as a CSV

$ sudo pip install csvkit
$ sql2csv --help
  • ssh

Log in to remote machines

$ sudo apt-get install ssh
$ man ssh
  • sudo

Execute a command as another user

$ sudo apt-get install sudo
$ man sudo
  • tar

Create, list, and extract TAR archives

$ sudo apt-get install tar
$ man tar
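
The three operations named above map to the `c`, `t`, and `x` options. A minimal sketch, using a hypothetical `results` directory inside a temporary location:

```shell
tmpdir=$(mktemp -d)
mkdir "$tmpdir/results"
echo 'a,b' > "$tmpdir/results/run1.csv"

# c = create, f = archive file name; -C changes directory before adding files
tar -cf "$tmpdir/results.tar" -C "$tmpdir" results

# t = list the contents of the archive
listing=$(tar -tf "$tmpdir/results.tar")

# x = extract, here into a separate directory
mkdir "$tmpdir/out"
tar -xf "$tmpdir/results.tar" -C "$tmpdir/out"
restored=$(cat "$tmpdir/out/results/run1.csv")

printf '%s\n' "$listing"
rm -r "$tmpdir"
```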
  • tee

Read from standard input and write to standard output and files

$ sudo apt-get install coreutils
$ man tee
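
`tee` is handy for saving an intermediate result without interrupting a pipeline. In this sketch the output file name is hypothetical:

```shell
tmpdir=$(mktemp -d)

# tee writes its input to the file and also passes it on to the next command
line_count=$(seq 5 | tee "$tmpdir/numbers.txt" | wc -l)

# The file holds a copy of everything that flowed through the pipeline
saved=$(cat "$tmpdir/numbers.txt")

echo "lines counted: $line_count"
rm -r "$tmpdir"
```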
  • tree

List contents of directories in a tree-like format

$ sudo apt-get install tree
$ man tree
  • type

Display the type of a command-line tool. It is a Bash built-in.

$ help type
$ type cd
  • unrar

Extract files from RAR archives

$ sudo apt-get install unrar-free
$ man unrar
  • unzip

List, test and extract compressed files in a ZIP archive

$ sudo apt-get install unzip
$ man unzip
  • wc

Print newline, word, and byte counts for each file

$ sudo apt-get install coreutils
$ man wc
$ echo 'hello world' | wc -c
12
  • which

Locate a command-line tool

$ man which
$ which man
/usr/bin/man

About the Author

Principal Data Scientist

Meet Akash, a Principal Data Scientist with expertise in advanced analytics, machine learning, and AI-driven solutions. With a master’s degree from IIT Kanpur, Aakash combines technical knowledge with industry insights to deliver impactful, scalable models for complex business challenges.