This notebook is available on GitHub.
We often analyse time series data. Pandas ships with a plethora of out-of-the-box tools for that purpose, some of which are also easily customisable. Useful tutorials can be found in its documentation.
Context
In this post, I will show you how to select data in a time window regardless of the date, for instance the data points between 0 am and 1 am every day. To start off, we create some dummy data.
from datetime import datetime
import numpy as np
import pandas as pd
times = pd.date_range('2010-01-01', periods=48, freq='H')
dat = pd.Series(np.random.random(len(times)), index=times)
dat
Data within a time window on a specific date is easy to select. Let's say I want the data between 1 am and 2 am on Jan 1, 2010. Here is how you do it.
dat.loc['2010-01-01 01:00:00':'2010-01-01 01:59:59']
Attempts
But the problem is that there isn't a way to pull out the data between 1 am and 2 am every day that is as straightforward as the above. You can't do something like this.
dat.loc['01:00:00':'01:59:59']
Nor can you do something like this:
dat.loc[(dat.index.time > '01:00:00') & (dat.index.time < '02:00:00')]
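To see why this second attempt fails, it may help to look at a single element. The sketch below assumes that each entry of dat.index.time is a datetime.time object, and it imports the module under the alias dt (my own choice, to keep it apart from the datetime class imported earlier). Python defines no ordering between a time and a string, so the comparison raises a TypeError.

```python
import datetime as dt

# Stand-in for one element of dat.index.time (an assumed sample value)
sample = dt.time(1, 30)

try:
    sample > '01:00:00'        # time vs. str: no ordering is defined
    comparison_failed = False
except TypeError as err:
    comparison_failed = True
    print(err)
```

So the string on the right-hand side is the culprit, not the idea of comparing times of day.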
New Light
The good news is that the second attempt points towards a solution. The time attribute of the index returns an array of datetime.time objects.
type(dat.index.time[0])
So the remaining question is how to create a datetime.time object, so that both sides of the comparison operator hold objects of the same type.
The help info isn't that useful.
help(datetime.time)
I don't know what its arguments can be, so I just try anything reasonable.
datetime.time('01:00:00')
Nope, it doesn't work, but it provides a crucial hint. Because of the import at the top, datetime here is the datetime.datetime class, so time is one of its methods. The error message reveals that time wants a datetime.datetime object as its argument, which is just the instance itself. Knowing this, we can hack it. To get a time, e.g. 1 am, we create a datetime object on an arbitrary date and then call its time method!
one_am = datetime(2000, 1, 1, 1).time()
one_am
two_am = datetime(2000, 1, 1, 2).time()
two_am
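For comparison, here is a sketch of a few other ways that should yield the same datetime.time value. It assumes the module is imported as dt (an alias of my own), so that dt.time refers to the time class rather than to the datetime.datetime.time method used above. The variable names are illustrative only.

```python
import datetime as dt
import pandas as pd

# Call the time class directly with hour and minute
one_am_direct = dt.time(1, 0)

# Parse a clock string, then drop the (default) date part
one_am_parsed = dt.datetime.strptime('01:00:00', '%H:%M:%S').time()

# Go through a pandas Timestamp on an arbitrary date
one_am_pandas = pd.Timestamp('2000-01-01 01:00:00').time()

print(one_am_direct, one_am_parsed, one_am_pandas)
```

Whichever route is used, the result is a plain datetime.time that can be compared with the elements of dat.index.time.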
So after all this struggle, we achieve our goal.
dat.loc[(dat.index.time >= one_am) & (dat.index.time < two_am)]
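If the same selection is needed more than once, the mask can be wrapped in a small function. This is only a sketch: select_daily_window is a hypothetical helper name, the window is assumed to be half-open (start included, end excluded), and the dummy data is rebuilt here so the block runs on its own.

```python
import datetime as dt
import numpy as np
import pandas as pd

def select_daily_window(series, start, end):
    """Return the rows whose time of day lies in [start, end), on any date."""
    times = series.index.time          # array of datetime.time objects
    return series.loc[(times >= start) & (times < end)]

# Dummy hourly data covering two days, as in the example above
idx = pd.date_range('2010-01-01', periods=48, freq='60min')
series = pd.Series(np.random.random(len(idx)), index=idx)

result = select_daily_window(series, dt.time(1), dt.time(2))
print(result)
```

With hourly data this keeps one row per day, the one stamped 01:00.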
Conclusion
You can see that we get the data in the time window on both available dates! But I don't like hacks, which are unreliable in the long term. I will update this post once I find a better way. Or you can leave a comment if you know how to do it.
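One candidate for a less hacky route may be the between_time method that pandas provides on a Series or DataFrame with a DatetimeIndex. The sketch below rebuilds the dummy data so that it runs on its own. Since both endpoints are included by default, the window is assumed to end at 01:59:59 rather than at 2 am.

```python
import numpy as np
import pandas as pd

# Dummy hourly data covering two days
idx = pd.date_range('2010-01-01', periods=48, freq='60min')
dat = pd.Series(np.random.random(len(idx)), index=idx)

# Select by time of day only; the date part of the index is ignored
window = dat.between_time('01:00:00', '01:59:59')
print(window)
```

This accepts plain strings for the two times, so no datetime.time object has to be built by hand.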