Data manipulation based on trends value

Question

asked Jul 11, 2019 in Data Science by sourav (17.6k points)

Given a dataset with Date column and Value column, I need to come up with the best solution of segmenting the data by date based on trends in the Value column. My output should be a CSV filewith the columns: StartDate, EndDate,StartValue,EndValue. Start and End date define the bounds of the segment. A short example is presented: input data:

**Date** **Value**
01/01/2014 10
01/02/2014 5
01/03/2014 5
01/04/2014 0
output:
**StartDate** **EndDate** **StartValue** **EndValue**
01/01/2014 01/15/2014 10 5
01/16/2014 02/03/2014 5 5
02/04/2014 03/10/2014 5 4

1 Answer

Shlok Pandey · Answer 1 · 2019-07-20T05:22:56+0000

Here is an approach using pandas.DataFrame.shift which will give you the desired output.Follow the below steps along with code:

1.Create a dataframe with some data:

import pandas as pd
datelist = pd.date_range('1/1/2019', periods=100).tolist()
values = np.random.randint(1, 5, 100)
df = pd.DataFrame({'Date': datelist, 'Value': values})
df = df.set_index('Date')
df.head(10)

Date Value
2019-01-01 1
2019-01-02 4
2019-01-03 2
2019-01-04 2
2019-01-05 2
2019-01-06 3
2019-01-07 2
2019-01-08 2
2019-01-09 3
2019-01-10 2

2.Drop the duplicate rows:

df = df.loc[df.Value.shift() != df.Value]

Date Value
2019-01-01 2
2019-01-02 1
2019-01-04 2
2019-01-05 3
2019-01-06 1

3.Reset the index:

df = df.reset_index()

4.Rename the existing columns to be the start columns.

df.columns = ['Start_Date', 'Start_Value']

5.Create end columns by shifting the start columns back one row.

df['End_Date'] = df.Start_Date.shift(-1)
df['End_Value'] = df.Start_Value.shift(-1)

6.Drop the NaNs.

df = df.dropna()

Set the End_Value type to int (if preferred).

df['End_Value'] = df['End_Value'].astype(int)
df.head(10)

Start_Date Start_Value End_Date End_Value
0 2019-01-01 1 2019-01-02 4
1 2019-01-02 4 2019-01-03 2
2 2019-01-03 2 2019-01-06 3
3 2019-01-06 3 2019-01-07 2
4 2019-01-07 2 2019-01-09 3
5 2019-01-09 3 2019-01-10 2
6 2019-01-10 2 2019-01-11 1
7 2019-01-11 1 2019-01-12 2
8 2019-01-12 2 2019-01-15 1
9 2019-01-15 1 2019-01-16 4

7.Finally, create a CSV file from the dataframe:

df.to_csv('trends.csv')

If you are interested in learning Data Science then watch this video:

If you wish to learn more about how to use python for data science, then go through this data science python course by Intellipaat for more insights.

Data manipulation based on trends value

1 Answer

Related questions

Browse Categories

Browse By Domains

Popular Courses

Popular Tutorials

Popular Resources