Or something… . I often write bad code prior to writing good code. I can’t help myself. I want the solution so that I can, with great satisfaction, replace all the bits of bad code until it is good enough. So, today I needed to separate a bunch of drone data into individual flights so that I can process each flight separately. So, the first thing I need to do is get a nice list of all the collection times from the image metadata, as the time/date stamps on the files have been modified:
exiftool -csv -datetimeoriginal -d '%Y-%m-%d %H:%M:%S' . | awk 'BEGIN { FS = "," } ; {print $2}' | sort | tee dates_alone.txt
Above, first we pull out datetimeoriginal tag, formatted as we like it, and then extract just the date itself. We also should keep around a CSV with both columns, so let’s generate that now:
exiftool -csv -datetimeoriginal -d ‘%Y-%m-%d %H:%M:%S’ . | sort | tee filename_dates.csv
From our old friend stackoverflow, we borrow code meant to identify gaps in time series with heavy use of pandas, and then modify to include our dates as well as look for gaps larger than 10 minutes:
from datetime import datetime, timedelta
import pandas as pd
# Construct dummy dataframe
dates = pd.to_datetime([
'2024-08-26 16:16:06',
'2024-08-26 16:16:27',
'2024-08-26 16:16:33',
'2024-08-26 16:16:36',
'2024-08-26 16:16:39',
'2024-08-26 16:16:41',
'2024-08-26 16:17:10',
'2024-08-26 16:17:16',
'2024-08-26 16:17:19',
'2024-08-26 16:17:21',
'2024-08-26 16:17:24',
...
# Filter diffs (here days > 1, but could be seconds, hours, etc)
gaps = deltas[deltas > timedelta(minutes=10)]
...
Thus generating the final time for each of the flights. Next we’ll separate the data and process independently:
python3 dt.py
19 gaps with average gap duration: 0 days 08:37:33.052631578
Start: 2024-08-26 16:26:46 | Duration: 17:09:49
Start: 2024-08-27 09:47:35 | Duration: 0:17:01
Start: 2024-08-27 10:17:25 | Duration: 0:19:22
Start: 2024-08-27 10:48:23 | Duration: 0:18:51
Start: 2024-08-27 11:24:21 | Duration: 1:51:40
Start: 2024-08-27 13:17:09 | Duration: 0:14:46
Start: 2024-08-27 13:49:55 | Duration: 20:06:14
Start: 2024-08-28 10:25:59 | Duration: 1:18:40
Start: 2024-08-28 11:53:28 | Duration: 21:59:50
Start: 2024-08-29 09:53:18 | Duration: 0:39:34
Start: 2024-08-29 10:32:52 | Duration: 0:17:10
Start: 2024-08-29 11:02:09 | Duration: 3:33:26
Start: 2024-08-29 14:46:07 | Duration: 18:46:36
Start: 2024-08-30 09:47:22 | Duration: 0:33:17
Start: 2024-08-30 10:31:42 | Duration: 1:15:51
Start: 2024-08-30 11:59:56 | Duration: 0:28:14
Start: 2024-08-30 13:04:27 | Duration: 3 days, 2:10:44
Start: 2024-09-02 15:25:58 | Duration: 0:18:45
Start: 2024-09-02 16:09:35 | Duration: 0:13:38
Full modified code from stackoverflow, should that link rot as below (minus a lot of my dates, as that would be 1300 additional lines):
from datetime import datetime, timedelta
import pandas as pd
# Construct dummy dataframe
dates = pd.to_datetime([
'2024-08-26 16:16:06',
'2024-08-26 16:16:27',
'2024-08-26 16:16:33',
'2024-09-02 16:42:25',
'2024-09-02 16:43:03',
'2024-09-02 16:43:23',
'2024-09-02 16:43:43'
])
df = pd.DataFrame(dates, columns=['date'])
# Take the diff of the first column (drop 1st row since it's undefined)
deltas = df['date'].diff()[1:]
# Filter diffs (here days > 1, but could be seconds, hours, etc)
gaps = deltas[deltas > timedelta(minutes=10)]
# Print results
print(f'{len(gaps)} gaps with average gap duration: {gaps.mean()}')
for i, g in gaps.iteritems():
gap_start = df['date'][i - 1]
print(f'Start: {datetime.strftime(gap_start, "%Y-%m-%d %H:%M:%S")} | '
f'Duration: {str(g.to_pytimedelta())}')