Data Science Command Line Tools
Here, we look at the most convenient and commonly used command-line tools for quick data analysis.
Defines or displays aliases. It is a Bash built-in.
$ help alias
$ alias ll='ls -alF'
GNU Bourne-Again SHell
$ sudo apt-get install bash
$ man bash
Evaluates expressions read from standard input.
$ sudo apt-get install bc
$ man bc
$ echo 'e(1)' | bc -l
2.71828182845904523536
Access BigML’s prediction API
$ sudo pip install bigmler
$ bigmler --help
Concatenates files and standard input, and prints the result to standard output.
$ sudo apt-get install coreutils
$ man cat
$ cat results-01 results-02 results-03 > results-all
Changes the shell working directory. It is a Bash built-in.
$ help cd
$ cd ~; pwd; cd ..; pwd
/home/vagrant
/home
Changes file mode bits. We use it to make our command-line tools executable.
$ sudo apt-get install coreutils
$ man chmod
$ chmod u+x experiment.sh
Generates an ASCII picture of a cow with a message. Useful when building a particular pipeline is starting to frustrate you a bit too much.
$ sudo apt-get install cowsay
$ man cowsay
$ echo 'The command line is awesome!' | cowsay
Copy files and directories
$ sudo apt-get install coreutils
$ man cp
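As a minimal sketch of copying a whole directory, the snippet below works in a throwaway directory. The file and directory names are made up for illustration.

```shell
# Work in a temporary directory so nothing real is touched.
workdir=$(mktemp -d)
mkdir "$workdir/data"
echo '1,2' > "$workdir/data/raw.csv"   # hypothetical sample file

# -r copies a directory together with everything below it.
cp -r "$workdir/data" "$workdir/backup"

copied=$(cat "$workdir/backup/raw.csv")
echo "$copied"
rm -r "$workdir"
```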
Extract columns from CSV data
$ sudo pip install csvkit
$ csvcut --help
It filters tabular data to only those rows where certain columns contain a given value or match a regular expression.
$ sudo pip install csvkit
$ csvgrep --help
Merge two or more CSV tables together using a method analogous to a SQL JOIN operation.
$ sudo pip install csvkit
$ csvjoin --help
Renders a CSV file to the command line in a readable, fixed-width format
$ sudo pip install csvkit
$ csvlook --help
$ echo -e "a,b\n1,2\n3,4" | csvlook
Sort CSV files
$ sudo pip install csvkit
$ csvsort --help
Execute SQL queries directly on CSV data or insert CSV into a database
$ sudo pip install csvkit
$ csvsql --help
Stack up the rows from multiple CSV files, optionally adding a grouping value to each row.
$ sudo pip install csvkit
$ csvstack --help
It prints descriptive statistics for all columns in a CSV file.
$ sudo pip install csvkit
$ csvstat --help
Download data from a URL.
$ sudo apt-get install curl
$ man curl
Remove sections from each line of files
$ sudo apt-get install coreutils
$ man cut
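A short sketch of column extraction with cut may help here. The sample data is invented for illustration and assumes simple comma-separated fields with no quoting.

```shell
# Hypothetical sample data: name,score pairs.
data='alice,90
bob,85'

# -d sets the field delimiter, -f selects which fields to keep.
names=$(printf '%s\n' "$data" | cut -d, -f1)
echo "$names"
```

For CSV files that contain quoted commas, a CSV-aware tool such as csvcut is the safer choice.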
Display an image or image sequence on any X server. Can read image data from standard input
$ sudo apt-get install imagemagick
$ man display
Display a line of text.
$ sudo apt-get install coreutils
$ man echo
Run a program in a modified environment. It is commonly used in the first line of a script to specify which interpreter should run it.
$ sudo apt-get install coreutils
$ man env
#!/usr/bin/env python
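The "modified environment" part can also be seen directly at the prompt. In this sketch, DEMO_GREETING is a hypothetical variable name chosen only for the illustration, and it is assumed not to be set already.

```shell
# env sets DEMO_GREETING only for the single command that follows it.
inside=$(env DEMO_GREETING=hello sh -c 'echo "$DEMO_GREETING"')

# The current shell is left untouched, so the variable is still unset here.
outside=${DEMO_GREETING:-unset}

echo "inside: $inside, outside: $outside"
```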
Sets the export attribute for shell variables. It is a Bash built-in, useful for making shell variables available to other command-line tools.
$ help export
$ export WEKAPATH=$HOME/bin
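To illustrate what "available to other command-line tools" means, the sketch below compares a plain variable with an exported one. Both variable names are hypothetical and assumed not to exist in the environment beforehand.

```shell
DEMO_LOCAL=hidden            # plain shell variable, not exported
export DEMO_SHARED=visible   # exported, so child processes inherit it

# sh -c starts a child process and reports which variables it can see.
child_view=$(sh -c 'echo "${DEMO_LOCAL:-unset} ${DEMO_SHARED:-unset}"')
echo "$child_view"
```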
Generates a gnuplot script from data passed to it on standard input.
$ sudo apt-get install feedgnuplot
$ man feedgnuplot
Search for files in a directory hierarchy
$ sudo apt-get install findutils
$ man find
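A minimal sketch of searching a directory tree by file name follows. The directory layout is created on the fly and is purely illustrative.

```shell
# Build a small hypothetical directory tree in a temporary location.
workdir=$(mktemp -d)
mkdir -p "$workdir/raw/2024"
touch "$workdir/raw/a.csv" "$workdir/raw/2024/b.csv" "$workdir/raw/notes.txt"

# -type f keeps regular files only; -name matches the file name against a pattern.
found=$(find "$workdir" -type f -name '*.csv' | wc -l | tr -d ' ')
echo "$found"
rm -r "$workdir"
```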
Print lines matching a pattern
$ sudo apt-get install grep
$ man grep
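As a quick sketch, grep can also count matching lines instead of printing them. The log lines below are made up for illustration.

```shell
# -c prints the number of matching lines; ^error matches lines starting with "error".
matches=$(printf 'error: disk full\ninfo: ok\nerror: timeout\n' | grep -c '^error')
echo "$matches"
```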
Output the first part of files
$ sudo apt-get install coreutils
$ man head
$ seq 5 | head -n 3
1
2
3
List directory contents
$ sudo apt-get install coreutils
$ man ls
Read reference manuals of command-line tools
$ sudo apt-get install man
$ man man
$ man grep
Creates directories.
$ sudo apt-get install coreutils
$ man mkdir
Moves or renames files and directories.
$ sudo apt-get install coreutils
$ man mv
Merges lines of files.
$ sudo apt-get install coreutils
$ man paste
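The following sketch shows how paste lines up two files side by side. The file names and contents are hypothetical.

```shell
workdir=$(mktemp -d)
printf 'a\nb\n' > "$workdir/left.txt"
printf '1\n2\n' > "$workdir/right.txt"

# paste joins corresponding lines; -d, uses a comma instead of the default tab.
merged=$(paste -d, "$workdir/left.txt" "$workdir/right.txt")
echo "$merged"
rm -r "$workdir"
```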
Install and manage Python packages
$ sudo apt-get install python-pip
$ man pip
Print name of current working directory
$ man pwd
$ pwd
/home/vagrant
Runs Python, an interpreted, interactive, object-oriented programming language.
$ sudo apt-get install python
$ man python
Analyze data and create visualizations with the R programming language.
$ sudo apt-get install r-base-dev
$ man R
Copies files securely to and from remote machines.
$ sudo apt-get install openssh-client
$ man scp
Sort lines of text files
$ sudo apt-get install coreutils
$ man sort
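One detail worth a small sketch is numeric versus text ordering. The input numbers are arbitrary.

```shell
# -n compares values numerically, so 9 sorts before 10 and 100.
sorted=$(printf '10\n9\n100\n' | sort -n)
echo "$sorted"
```

Without -n, the lines are compared as text, which usually places 10 and 100 ahead of 9.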
Split a file into pieces
$ sudo apt-get install coreutils
$ man split
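A minimal sketch of splitting a file by line count, using a generated file of ten lines. The file name and the part- prefix are hypothetical.

```shell
workdir=$(mktemp -d)
seq 10 > "$workdir/numbers.txt"

# -l 4 puts at most 4 lines in each piece; the last argument is the output prefix.
split -l 4 "$workdir/numbers.txt" "$workdir/part-"

# Ten lines in pieces of four gives three files: part-aa, part-ab, part-ac.
pieces=$(ls "$workdir" | grep -c '^part-')
echo "$pieces"
rm -r "$workdir"
```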
Executes arbitrary commands against an SQL database and outputs the results as CSV.
$ sudo pip install csvkit
$ sql2csv --help
Login to remote machines
$ sudo apt-get install ssh
$ man ssh
Execute a command as another user
$ sudo apt-get install sudo
$ man sudo
Create, list, and extract TAR archives
$ sudo apt-get install tar
$ man tar
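The create and list operations can be sketched as follows, assuming a tar build with gzip support (the z option). The directory and file names are invented for the example.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/data"
echo 'a,b' > "$workdir/data/sample.csv"

# c = create, z = gzip-compress, f = archive file name; -C changes directory first.
tar -czf "$workdir/data.tar.gz" -C "$workdir" data

# t = list the archive contents without extracting them.
listing=$(tar -tzf "$workdir/data.tar.gz")
echo "$listing"
rm -r "$workdir"
```

Replacing t with x in the second command would extract the files instead of listing them.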
Read from standard input and write to standard output and files
$ sudo apt-get install coreutils
$ man tee
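As a rough sketch, tee is handy for saving an intermediate result in the middle of a pipeline. The file name is hypothetical.

```shell
workdir=$(mktemp -d)

# tee writes its input to the file and also passes it on to the next command.
count=$(seq 5 | tee "$workdir/numbers.txt" | wc -l | tr -d ' ')

# The saved copy holds the same five lines that wc counted.
saved=$(wc -l < "$workdir/numbers.txt" | tr -d ' ')
echo "$count $saved"
rm -r "$workdir"
```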
List contents of directories in a tree-like format
$ sudo apt-get install tree
$ man tree
Displays the type of a command-line tool. It is a Bash built-in.
$ help type
$ type cd
Extract files from RAR archives
$ sudo apt-get install unrar-free
$ man unrar
List, test and extract compressed files in a ZIP archive
$ sudo apt-get install unzip
$ man unzip
Print newline, word, and byte counts for each file
$ sudo apt-get install coreutils
$ man wc
$ echo 'hello world' | wc -c
12
Locate a command-line tool
$ man which
$ which man
/usr/bin/man