We’ve seen a lot of exploratory movement data analysis in the Movement data in GIS series so far. Beyond exploration, predictive analysis is another major topic in movement data analysis. One of the most obvious movement prediction use cases is trajectory prediction, i.e. trying to predict where a moving object will be in the future. I see two main categories of trajectory prediction methods: those that try to predict the actual path a moving object will take, and those that only try to predict its next destination.
Today, I want to focus on prediction methods that predict the path that a moving object is going to take. There are many different approaches from simple linear prediction to very sophisticated application-dependent methods. Regardless of the prediction method though, there is the question of how to evaluate the prediction results when these methods are applied to real-life data.
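To make the evaluation question concrete, here is a minimal sketch of the simplest case mentioned above: linear prediction from the last two observed positions, scored against the actually observed position. The function name `predict_linear`, the plain `(timestamp, x, y)` tuples, and the sample values are all assumptions made for illustration, not part of any library.

```python
from datetime import datetime, timedelta
from math import hypot


def predict_linear(past, horizon):
    """Extrapolate the velocity between the last two fixes over `horizon`.

    `past` is a time-sorted list of (timestamp, x, y) tuples (hypothetical layout).
    """
    (t0, x0, y0), (t1, x1, y1) = past[-2], past[-1]
    dt = (t1 - t0).total_seconds()
    if dt <= 0:
        raise ValueError("timestamps must be strictly increasing")
    factor = horizon.total_seconds() / dt
    return (x1 + (x1 - x0) * factor, y1 + (y1 - y0) * factor)


# Made-up past movement: 5 units north in 5 minutes
past = [
    (datetime(2018, 1, 1, 12, 0), 0.0, 0.0),
    (datetime(2018, 1, 1, 12, 5), 0.0, 5.0),
]
predicted = predict_linear(past, timedelta(minutes=5))

# Made-up observed position five minutes later; the error is the planar distance
observed = (0.0, 9.0)
error = hypot(predicted[0] - observed[0], predicted[1] - observed[1])
print(predicted, error)
```

Whatever the prediction method, the evaluation always needs the same two ingredients: a stretch of past movement and a position that was actually observed at the prediction horizon.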
As long as we work with nice, densely and regularly updated movement data, extracting evaluation samples is rather straightforward. To predict future movement, we need some information about past movement. Based on that past movement, we can then try to predict future positions. For example, given a trajectory that is twenty minutes long, we can extract a sample that provides five minutes of past movement, as well as the actually observed position five minutes into the future.
But what if the trajectory is irregularly updated? Do we interpolate the positions at the desired five-minute timestamps? Do we try to shift the sample until, by chance, we find a section along the trajectory where the updates match our desired pattern? What if location timestamps include seconds or milliseconds and we therefore cannot find exact matches? Should we introduce a tolerance parameter that would allow us to match locations with approximately the same timestamp?
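For regularly updated data, the extraction described above can be sketched in a few lines of plain Python. The helper `extract_sample` and the one-fix-per-minute trajectory are assumptions for illustration only; they only work because every desired timestamp exists exactly in the data.

```python
from datetime import datetime, timedelta

start = datetime(2018, 1, 1, 12, 0)
# Hypothetical trajectory: one (timestamp, x, y) fix per minute for twenty minutes
fixes = [(start + timedelta(minutes=i), 0.0, float(i)) for i in range(21)]


def extract_sample(fixes, sample_start, past, future):
    """Return (past_fixes, future_fix), or None if the exact timestamps are missing."""
    past_end = sample_start + past
    future_time = past_end + future
    # Past movement: every fix between the sample start and the end of the past window
    past_fixes = [f for f in fixes if sample_start <= f[0] <= past_end]
    # Future position: requires a fix at exactly the desired timestamp
    future_fix = next((f for f in fixes if f[0] == future_time), None)
    if not past_fixes or future_fix is None:
        return None
    return past_fixes, future_fix


past_fixes, future_fix = extract_sample(
    fixes, start, timedelta(minutes=5), timedelta(minutes=5)
)
print(len(past_fixes), future_fix)
```

The exact-match lookup for the future position is the part that stops working once updates become irregular.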
Depending on the duration of observation gaps in our trajectory, it might not be a good idea to simply interpolate locations since these interpolated locations could systematically bias our evaluation. Therefore, the safest approach may be to shift the sample pattern along the trajectory until a close match (within the specified tolerance) is found. This approach is now implemented in MovingPandas’ TrajectorySampler.
    def test_sample_irregular_updates(self):
        # Irregularly updated trajectory: gaps vary and timestamps include seconds
        df = pd.DataFrame([
            {'geometry': Point(0, 0), 't': datetime(2018, 1, 1, 12, 0, 1)},
            {'geometry': Point(0, 3), 't': datetime(2018, 1, 1, 12, 3, 2)},
            {'geometry': Point(0, 6), 't': datetime(2018, 1, 1, 12, 6, 1)},
            {'geometry': Point(0, 9), 't': datetime(2018, 1, 1, 12, 9, 2)},
            {'geometry': Point(0, 10), 't': datetime(2018, 1, 1, 12, 10, 2)},
            {'geometry': Point(0, 14), 't': datetime(2018, 1, 1, 12, 14, 3)},
            {'geometry': Point(0, 19), 't': datetime(2018, 1, 1, 12, 19, 4)},
            {'geometry': Point(0, 20), 't': datetime(2018, 1, 1, 12, 20, 0)}
        ]).set_index('t')
        geo_df = GeoDataFrame(df, crs={'init': '4326'})
        traj = Trajectory(1, geo_df)
        # Tolerance of five seconds when matching timestamps
        sampler = TrajectorySampler(traj, timedelta(seconds=5))
        past_timedelta = timedelta(minutes=5)
        future_timedelta = timedelta(minutes=5)
        sample = sampler.get_sample(past_timedelta, future_timedelta)
        # The first matching section starts at 12:09:02: the past movement ends
        # at 12:14:03 and the future position is the one observed at 12:19:04
        result = sample.future_pos.wkt
        expected_result = "POINT (0 19)"
        self.assertEqual(result, expected_result)
        result = sample.past_traj.to_linestring().wkt
        expected_result = "LINESTRING (0 9, 0 10, 0 14)"
        self.assertEqual(result, expected_result)
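To illustrate the idea behind the test above, here is a minimal plain-Python sketch of shifting a sample pattern along a trajectory until a close match is found. This is not MovingPandas' actual implementation: the helper names `find_fix_near` and `first_matching_sample` and the `(timestamp, x, y)` tuple layout are hypothetical. The sketch also assumes that the future timestamp is measured from the matched end of the past movement rather than from the ideal one.

```python
from datetime import datetime, timedelta


def find_fix_near(fixes, target, tolerance):
    """Return the fix closest in time to `target` within `tolerance`, else None."""
    candidates = [f for f in fixes if abs(f[0] - target) <= tolerance]
    if not candidates:
        return None
    return min(candidates, key=lambda f: abs(f[0] - target))


def first_matching_sample(fixes, past, future, tolerance):
    """Shift the sample start from fix to fix until the whole pattern matches."""
    for start_fix in fixes:
        start_time = start_fix[0]
        past_end = find_fix_near(fixes, start_time + past, tolerance)
        if past_end is None:
            continue  # no fix close to the end of the past window, shift on
        future_fix = find_fix_near(fixes, past_end[0] + future, tolerance)
        if future_fix is None:
            continue  # no fix close to the prediction horizon, shift on
        past_fixes = [f for f in fixes if start_time <= f[0] <= past_end[0]]
        return past_fixes, future_fix
    return None


# Same irregular timestamps and coordinates as in the test above
fixes = [
    (datetime(2018, 1, 1, 12, 0, 1), 0, 0),
    (datetime(2018, 1, 1, 12, 3, 2), 0, 3),
    (datetime(2018, 1, 1, 12, 6, 1), 0, 6),
    (datetime(2018, 1, 1, 12, 9, 2), 0, 9),
    (datetime(2018, 1, 1, 12, 10, 2), 0, 10),
    (datetime(2018, 1, 1, 12, 14, 3), 0, 14),
    (datetime(2018, 1, 1, 12, 19, 4), 0, 19),
    (datetime(2018, 1, 1, 12, 20, 0), 0, 20),
]
past_fixes, future_fix = first_matching_sample(
    fixes, timedelta(minutes=5), timedelta(minutes=5), timedelta(seconds=5)
)
print([(x, y) for _, x, y in past_fixes], future_fix[1:])
```

Starting the sample at 12:00:01, 12:03:02, or 12:06:01 fails because no fix lies within five seconds of the five-minute mark; the first start that works is 12:09:02. No positions are interpolated at any point, so the evaluation only ever uses observed locations.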
The repository also includes a demo that illustrates how to split trajectories using a grid and finally extract samples: