
Data visualization

 

After scrubbing the data, you are ready to explore it. You can explore your data from three perspectives:

  • Inspect the data and its properties
  • Derive statistics from your data
  • Create interesting visualizations


Inspect the data and its properties

If you want to examine the raw data, avoid cat, because it prints all the data to the screen in one go. To examine the raw data at your own pace, we recommend using less with the -S option:

$ less -S file.csv

The -S option ensures that long lines are not wrapped when they don’t fit in the terminal; you can scroll horizontally instead. Another advantage of less is that it does not load the entire file into memory, which makes it well suited to viewing large files.

[Figure: iris.csv]

 

  • Feature Names and Data Types

To gain insight into the data set, it is useful to print the feature names and study them, since the names may indicate what each feature means. You can use the following sed expression:

$ < data/iris.csv sed -e 's/,/\n/g;q'
sepal_length
sepal_width
petal_length
petal_width
species
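The sed expression above relies on \n in the replacement being turned into a newline, which GNU sed does but some other sed implementations do not. A minimal sketch of a more portable variant, assuming a simple CSV without quoted commas, uses head and tr instead. The sample file and its contents are hypothetical and exist only to make the example self-contained.

```shell
# Build a tiny sample file (hypothetical data, only for illustration).
sample=$(mktemp)
printf 'sepal_length,sepal_width,species\n5.1,3.5,setosa\n' > "$sample"

# Take only the header line, then turn each comma into a newline.
features=$(head -n 1 "$sample" | tr ',' '\n')
echo "$features"

rm -f "$sample"
```

Like the sed version, this splits on every comma, so a header field that contains a quoted comma would be split incorrectly.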

 

  • Unique Identifiers, Continuous Variables, and Factors

To find out whether a feature should be treated as a unique identifier or categorical variable, count the number of unique values for a particular column:

$ cat data/iris.csv | csvcut -c species | body "sort | uniq | wc -l"
species
3

If the number of unique values is small compared with the number of rows, the feature may be treated as categorical. If it equals the number of rows, the feature may be a unique identifier.
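The same comparison can be sketched with standard tools only, which helps when csvcut or the body helper used above is not installed (body does not appear to be a standard utility). The sample file, its column names, and the count_unique function are assumptions made for illustration, and the sketch assumes no quoted commas in the data.

```shell
# Hypothetical sample data: an id column and a species column.
sample=$(mktemp)
cat > "$sample" <<'EOF'
id,species
1,setosa
2,setosa
3,versicolor
4,virginica
EOF

# Number of data rows (everything except the header line).
rows=$(tail -n +2 "$sample" | wc -l | tr -d ' ')

# Count distinct values in the given column number, skipping the header.
count_unique() {
  tail -n +2 "$sample" | cut -d ',' -f "$1" | sort -u | wc -l | tr -d ' '
}

unique_id=$(count_unique 1)
unique_species=$(count_unique 2)

echo "rows=$rows id=$unique_id species=$unique_species"
rm -f "$sample"
```

Here id has as many unique values as there are rows, which suggests a unique identifier, while species has fewer, which suggests a categorical feature.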

 

Derive statistics from your data

  • Using csvstat

The command-line tool csvstat gives a lot of information. For each feature it shows:

  • The data type in Python terms
  • The number of unique values
  • Whether it has any missing values (Nulls)
  • Descriptive statistics (minimum, maximum, sum, mean, standard deviation, and median) for the features where they are appropriate

 

[Figure: datatypes.csv]

 

$ csvstat data/datatypes.csv
<type 'int'>
Nulls: False
Values: 2, 66, 42
<type 'float'>
Nulls: True
Values: 0.0, 3.1415
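To make a few of these statistics concrete, here is a rough sketch that computes the count, the number of missing values, the minimum, the maximum, and the mean of one numeric column with awk. It is not how csvstat works internally. The sample file and its column names a and b are hypothetical, and the sketch assumes a simple CSV without quoted commas.

```shell
# Hypothetical sample data; column b has one missing value.
sample=$(mktemp)
printf 'a,b\n2,0.0\n66,3.1415\n42,\n' > "$sample"

# Summarize the column selected with col (1 = first column).
stats=$(awk -F ',' -v col=1 '
  NR == 1 { next }                  # skip the header line
  $col == "" { nulls++; next }      # count empty fields as missing
  {
    v = $col + 0                    # force a numeric value
    n++; sum += v
    if (n == 1 || v < min) min = v
    if (n == 1 || v > max) max = v
  }
  END {
    if (n > 0)
      printf "n=%d nulls=%d min=%s max=%s mean=%.2f\n", n, nulls, min, max, sum / n
  }
' "$sample")

echo "$stats"
rm -f "$sample"
```

Changing col=1 to col=2 summarizes the second column, where the empty field is counted as a missing value.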

 

Creating Visualizations

Two software packages are commonly used to create visualizations from the command line: Gnuplot and ggplot2.

Introducing Gnuplot and feedgnuplot

The first software package to create visualizations is Gnuplot. It is different from most command-line tools we’ve been using for two reasons. First, it uses a script instead of command-line arguments. Second, the output is normally written to a file and not printed to standard output. Gnuplot is nevertheless able to produce visualizations for the command line. That is, it can print its output to the terminal without the need for a GUI. Even then, you would still need to set up a script.

[Figure: immigration plot created with Gnuplot]

 

Feedgnuplot, a command-line tool, can help us set up a script for Gnuplot. It is fully configurable through command-line arguments and reads from standard input.
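The script-based workflow can be sketched as follows, assuming gnuplot (and optionally feedgnuplot) is installed. The data file, its values, and the file names are hypothetical. The dumb terminal makes Gnuplot draw the plot as text in the terminal instead of writing an image file.

```shell
workdir=$(mktemp -d)

# Hypothetical two-column data: x value and y value.
printf '1 1\n2 4\n3 9\n4 16\n' > "$workdir/squares.dat"

# Gnuplot reads its instructions from a script rather than from arguments.
cat > "$workdir/plot.gp" <<EOF
set terminal dumb size 60,20
set xlabel "x"
set ylabel "y"
plot "$workdir/squares.dat" using 1:2 with lines title "squares"
EOF

# Run Gnuplot only if it is installed.
if command -v gnuplot >/dev/null 2>&1; then
  gnuplot "$workdir/plot.gp" || echo "gnuplot failed" >&2
fi

# Roughly equivalent feedgnuplot call: no script file, data on standard input.
if command -v feedgnuplot >/dev/null 2>&1; then
  feedgnuplot --domain --lines --terminal 'dumb size 60,20' --exit \
    < "$workdir/squares.dat" || echo "feedgnuplot failed" >&2
fi
```

The feedgnuplot call shows the trade-off: everything is expressed as command-line arguments, and the tool generates the Gnuplot script for you.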

 

Introducing ggplot2

ggplot2 is a more modern software package for creating visualizations. It is an implementation of the grammar of graphics in R. When used through Rio, it offers a very convenient way of creating visualizations from the command line.

Rio stands for R input/output, because it enables you to use R as a filter on the command line. You simply pipe CSV data into Rio and you specify the R commands that you want to run on it. Rio can execute multiple R commands that are separated by semicolons.
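As a hedged illustration of using R as a filter, the sketch below pipes a small CSV file into Rio and runs two R commands separated by a semicolon. It assumes that Rio is installed, that it accepts R commands through an -e option, and that it exposes the input as a data frame named df. The sample data and the ratio column are hypothetical.

```shell
# Hypothetical sample data with two numeric columns.
sample=$(mktemp)
printf 'x,y\n1,2\n2,4\n3,6\n' > "$sample"

if command -v Rio >/dev/null 2>&1; then
  # First command adds a column, second command returns the data frame.
  # The data frame name df and the -e option are assumptions about Rio.
  < "$sample" Rio -e 'df$ratio <- df$y / df$x; df' || echo "Rio failed" >&2
else
  echo "Rio not found; skipping" >&2
fi
```

Because the R commands are passed as a single quoted string, R code that itself needs single quotes requires extra care with shell quoting.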


K-Map

K-Map stands for Karnaugh map, which provides a simple method for minimising Boolean expressions. It offers a graphical way of grouping together terms with common factors, thereby eliminating redundant variables. Karnaugh maps are used to simplify real-world logic requirements so that they can be implemented using a minimum number of physical logic gates.
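As a small worked example (chosen here for illustration), consider F = A'B + AB. On a two-variable Karnaugh map the two cells where F is 1 are adjacent and differ only in A, so A is eliminated and F simplifies to B. The sketch below checks that claim by comparing both expressions over the whole truth table.

```shell
mismatches=0
for a in 0 1; do
  for b in 0 1; do
    # Original expression: (NOT A AND B) OR (A AND B); 1 - a stands for NOT A.
    original=$(( ((1 - a) & b) | (a & b) ))
    # Simplified expression read off the Karnaugh map.
    simplified=$b
    echo "A=$a B=$b original=$original simplified=$simplified"
    [ "$original" -eq "$simplified" ] || mismatches=$((mismatches + 1))
  done
done
echo "mismatches=$mismatches"
```

A mismatch count of zero means the two expressions agree for every input combination.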

 

Graph

A graph is a set of vertices and edges, in which the edges connect pairs of vertices. The difference between a graph and a tree is that a tree contains no cycles, whereas a graph may contain them. In the visualization sense, a graph is a pictorial representation of the relationships between different quantities, parameters, or measurable variables; it shows how one quantity changes when another is changed.
