Context¶
At work, I helped a coworker with his task where he needed to build a touchpoint date scheduler for a client.
Suppose the client has two different regular service plans for its customers Service A and Service B. Service A is supposed to be operated every two months while Service B is to be done every three months. We are provided the last touchpoint date by the client. It looks something like the following.
import pandas as pd
touchpoints = pd.DataFrame(
{
'customerId': [1, 2, 3],
'Service A': [pd.datetime(2017, 1, 20), pd.datetime(2017, 1, 28), pd.datetime(2017, 2, 1)],
'Service B': [pd.datetime(2017, 1, 10), pd.datetime(2017, 1, 1), pd.datetime(2016, 12, 1)]
},
columns=['customerId', 'Service A', 'Service B']
)
touchpoints
The task is to find the next two services coming up, which heavily depends on the last date when the two services were done. For customer 1, the next service will be Service A on 21.3.2017 followed by Service B on 11.4.2017. In contrast, for customer 3, the next sevice will be Service B on 2.3.2017 followed by Service A on 2.4.2017 one month later.
Pull the up-coming events together for the next half year since the last service the table above would look like the following.
touchpoints.loc[:, 'Service A in two months'] = (touchpoints['Service A'] + pd.Timedelta(2, unit='M')).dt.date
touchpoints.loc[:, 'Service A in four months'] = (touchpoints['Service A'] + pd.Timedelta(4, unit='M')).dt.date
touchpoints.loc[:, 'Service A in six months'] = (touchpoints['Service A'] + pd.Timedelta(6, unit='M')).dt.date
touchpoints.loc[:, 'Service B in three months'] = (touchpoints['Service B'] + pd.Timedelta(3, unit='M')).dt.date
touchpoints.loc[:, 'Service B in six months'] = (touchpoints['Service B'] + pd.Timedelta(6, unit='M')).dt.date
touchpoints
So we need a way to figure out the first two earliest dates from column Service A in two months
to column Service B in six month
and we also have to keep track of which date is for what service. Consequently, four columns are supposed to be added with these information: next service
, next service date
, second service
, second service date
.
Apparently, this is a fairly complicated operation. No expectation can be laid that Pandas dev team would have made a ready idiom for you and you have got to run a function that does that through apply
method. Nevertheless, we can still use Pandas methods like sort_values
. Note that sort_values
is a recent term, which used to be called sort
.
Core function¶
First we define the fucntion that will be passed on to apply
.
def scheduler(row):
# sort values
row = row.sort_values()
# find the first nevents dates
nevents = 2
dates = row.iloc[:nevents].tolist()
# find the events. Note events are stored in the index of the series `row`.
# and we only need to keep `Service X` and remove `in x months`.
events = row.index[:nevents].str.extract(r'(Service \w+) .+', expand=False).tolist()
# now make a series holding the resulting info
index = ['next service', 'second service', 'next service date', 'second service date']
return pd.Series(events+dates, index=index)
Here we go. Let's go ahead and find what we will get. The apply
method should work on row by row by setting axis=1
, which is along the COLUMN axis. I know it's a bit brain-twisting. You will get used to it then it will actually make sense to you when dealing with axis
in other methods like concat
below.
next_services = touchpoints.loc[:, 'Service A in two months':'Service B in six months'].apply(scheduler, axis=1)
next_services
Looks like what we want to have. You might tweak the order of the columns with method reindex
if you like next service date
immediately following next service
. Leave it to you or I'll make another blog on how to do this.
Push back¶
Now we append it to the original dataframe. See how axis=1
makes a lot more sense here as you glue the two dataframe together along y-axis, for which axis=1
.
touchpoints = pd.concat([touchpoints, next_services], axis=1)
touchpoints
Let me know anything still unclear to you in the comments. I'll be more than happy to answer it.
You can also ask me other Pandas questions. Tell me what you want to do with Pandas and I probably will make a blog special for you.
This Jupyter Notebook is available on GitHub
No comments:
Post a Comment