Python Pandas: flattening a calendar with overlapping meetings to get actual time in meetings - python, datetime, pandas, dataframe

I have my weekly calendar details (obviously with the Subjects changed to protect the innocent) read into a pandas dataframe. One of my goals is to get the total time spent in meetings. I would like a dataframe indexed by a date_range with hourly frequency for the week, showing how many total minutes I was in meetings during each hour. My first challenge is that meetings overlap and, as much as I would like to be in two places at once, I certainly am not. I do hop out of one and into another, though. So, for example, the rows at indices 8 and 9 should give a total meeting time of 90 minutes, not the 120 minutes I would get if I just summed the column with df["Duration"].sum(). How do I flatten the time periods in the dataframe so that the overlap is counted only once? It seems like there is an answer somewhere using date_range and periods, but I can't wrap my head around it. Below is my dataframe df.

               Start                End Duration    Subject
0   07/04/16 10:30:00  07/04/16 11:00:00    30      Inspirational Poster Design Session
1   07/04/16 15:00:00  07/04/16 15:30:00    30      Corporate Speak Do's and Don'ts
2   07/04/16 09:00:00  07/04/16 12:00:00    180     Metrics or Matrix -Panel Discussion
3   07/04/16 13:30:00  07/04/16 15:00:00    90      "Do More with Less" kickoff party
4   07/05/16 09:00:00  07/05/16 10:00:00    60      Fiscal or Physical -Panel Discussion
5   07/05/16 14:00:00  07/05/16 14:30:00    30      "Why we can't have nice thing" training video
6   07/06/16 15:00:00  07/06/16 16:00:00    60      One-on-One with manager -Panel Discussion
7   07/06/16 09:00:00  07/06/16 10:00:00    60      Fireing for Performance leadership session
8   07/06/16 13:00:00  07/06/16 14:00:00    60      Birthday Cake in the conference room *MANDATORY*
9   07/06/16 12:30:00  07/06/16 13:30:00    60      Obligatory lunchtime meeting because it was the only time everyone had avaiable

Any help would be greatly appreciated.

EDIT: This is the output I was hoping for with the data set above.

2016-07-04 00:00:00 0
2016-07-04 01:00:00 0
2016-07-04 02:00:00 0
2016-07-04 03:00:00 0
2016-07-04 04:00:00 0
2016-07-04 05:00:00 0
2016-07-04 06:00:00 0
2016-07-04 07:00:00 0
2016-07-04 08:00:00 0
2016-07-04 09:00:00 60
2016-07-04 10:00:00 60
2016-07-04 11:00:00 60
2016-07-04 12:00:00 0
2016-07-04 13:00:00 30
2016-07-04 14:00:00 60
2016-07-04 15:00:00 30
2016-07-04 16:00:00 0
2016-07-04 17:00:00 0
2016-07-04 18:00:00 0
2016-07-04 19:00:00 0
2016-07-04 20:00:00 0
2016-07-04 21:00:00 0
2016-07-04 22:00:00 0
2016-07-04 23:00:00 0
2016-07-05 00:00:00 0
2016-07-05 01:00:00 0
2016-07-05 02:00:00 0
2016-07-05 03:00:00 0
2016-07-05 04:00:00 0
2016-07-05 05:00:00 0
2016-07-05 06:00:00 0
2016-07-05 07:00:00 0
2016-07-05 08:00:00 0
2016-07-05 09:00:00 60
2016-07-05 10:00:00 0
2016-07-05 11:00:00 0
2016-07-05 12:00:00 0
2016-07-05 13:00:00 0
2016-07-05 14:00:00 30
2016-07-05 15:00:00 0
2016-07-05 16:00:00 0
2016-07-05 17:00:00 0
2016-07-05 18:00:00 0
2016-07-05 19:00:00 0
2016-07-05 20:00:00 0
2016-07-05 21:00:00 0
2016-07-05 22:00:00 0
2016-07-05 23:00:00 0
2016-07-06 00:00:00 0
2016-07-06 01:00:00 0
2016-07-06 02:00:00 0
2016-07-06 03:00:00 0
2016-07-06 04:00:00 0
2016-07-06 05:00:00 0
2016-07-06 06:00:00 0
2016-07-06 07:00:00 0
2016-07-06 08:00:00 0
2016-07-06 09:00:00 60
2016-07-06 10:00:00 0
2016-07-06 11:00:00 0
2016-07-06 12:00:00 30
2016-07-06 13:00:00 60
2016-07-06 14:00:00 0
2016-07-06 15:00:00 60
2016-07-06 16:00:00 0
2016-07-06 17:00:00 0
2016-07-06 18:00:00 0
2016-07-06 19:00:00 0
2016-07-06 20:00:00 0
2016-07-06 21:00:00 0
2016-07-06 22:00:00 0
2016-07-06 23:00:00 0
2016-07-07 00:00:00 0

Answers:

Answer № 1 (score: 1)

One possibility is to create a time series (s below) indexed by minute that keeps track of whether or not you are in a meeting during that minute, and then resample it by hour. Because each minute is flagged at most once, overlapping meetings are not counted twice. To match your desired output, you can adjust the start and end times of the index of s.
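To see why a per-minute flag counts an overlap only once, here is a minimal sketch restricted to the two overlapping meetings from rows 8 and 9 of the question. The names meetings, busy, naive_total and flattened_total are made up for this illustration and are not part of the answer's code; busy plays the same role as s.

```python
import pandas as pd

# Only the two overlapping meetings (rows 8 and 9 of the question's table).
meetings = pd.DataFrame({
    "Start": pd.to_datetime(["2016-07-06 13:00", "2016-07-06 12:30"]),
    "End": pd.to_datetime(["2016-07-06 14:00", "2016-07-06 13:30"]),
})

# One flag per minute, from the earliest start up to (not including) the latest end.
one_minute = pd.Timedelta(minutes=1)
minutes = pd.date_range(meetings["Start"].min(),
                        meetings["End"].max() - one_minute,
                        freq="min")
busy = pd.Series(False, index=minutes)

# Label slices on a DatetimeIndex include both endpoints, so stop one minute early.
for start, end in zip(meetings["Start"], meetings["End"]):
    busy.loc[start:end - one_minute] = True

# Adding up durations double counts 13:00-13:30; counting flagged minutes does not.
naive_total = int((meetings["End"] - meetings["Start"]).dt.total_seconds().sum() // 60)
flattened_total = int(busy.sum())
print(naive_total, flattened_total)  # 120 90
```

The code that follows applies the same idea to the whole calendar.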

import io
import pandas as pd

# Each data row starts with an extra, unnamed field, which pandas uses as the row index.
data = io.StringIO("""
Start,End,Duration,Subject
0,07/04/16 10:30:00,07/04/16 11:00:00,30,Inspirational Poster Design Session
1,07/04/16 15:00:00,07/04/16 15:30:00,30,Corporate Speak Do's and Don'ts
2,07/04/16 09:00:00,07/04/16 12:00:00,180,Metrics or Matrix -Panel Discussion
3,07/04/16 13:30:00,07/04/16 15:00:00,90,"Do More with Less" kickoff party
4,07/05/16 09:00:00,07/05/16 10:00:00,60,Fiscal or Physical -Panel Discussion
5,07/05/16 14:00:00,07/05/16 14:30:00,30,"Why we can't have nice thing" training video
6,07/06/16 15:00:00,07/06/16 16:00:00,60,One-on-One with manager -Panel Discussion
7,07/06/16 09:00:00,07/06/16 10:00:00,60,Fireing for Performance leadership session
8,07/06/16 13:00:00,07/06/16 14:00:00,60,Birthday Cake in the conference room *MANDATORY*
9,07/06/16 12:30:00,07/06/16 13:30:00,60,Obligatory lunchtime meeting because it was the only time everyone
""")
df = pd.read_csv(data, usecols=["Start", "End", "Subject"])
# The timestamps are month/day/two-digit year, so state the format explicitly.
date_format = "%m/%d/%y %H:%M:%S"
df["Start"] = pd.to_datetime(df["Start"], format=date_format)
df["End"] = pd.to_datetime(df["End"], format=date_format)

# Label slices on a DatetimeIndex include the right endpoint, so every
# range below stops one minute before the meeting's end time.
tdel = pd.Timedelta("1min")

# One boolean per minute: True while at least one meeting is in progress.
s = pd.Series(False, index=pd.date_range(start=df["Start"].min(),
                                         end=df["End"].max() - tdel,
                                         freq="min"))
for _, meeting in df.iterrows():
    s.loc[meeting["Start"]:meeting["End"] - tdel] = True

# Summing the flags per hour gives the minutes spent in meetings during that hour.
# "h" is the hourly alias; older pandas versions also spell it "H".
result = s.resample("h").sum().astype(int)
print(result)

Output:

2016-07-04 09:00:00    60
2016-07-04 10:00:00    60
2016-07-04 11:00:00    60
2016-07-04 12:00:00     0
2016-07-04 13:00:00    30
2016-07-04 14:00:00    60
2016-07-04 15:00:00    30
2016-07-04 16:00:00     0
2016-07-04 17:00:00     0
2016-07-04 18:00:00     0
2016-07-04 19:00:00     0
2016-07-04 20:00:00     0
2016-07-04 21:00:00     0
2016-07-04 22:00:00     0
2016-07-04 23:00:00     0
2016-07-05 00:00:00     0
2016-07-05 01:00:00     0
2016-07-05 02:00:00     0
2016-07-05 03:00:00     0
2016-07-05 04:00:00     0
2016-07-05 05:00:00     0
2016-07-05 06:00:00     0
2016-07-05 07:00:00     0
2016-07-05 08:00:00     0
2016-07-05 09:00:00    60
2016-07-05 10:00:00     0
2016-07-05 11:00:00     0
2016-07-05 12:00:00     0
2016-07-05 13:00:00     0
2016-07-05 14:00:00    30
2016-07-05 15:00:00     0
2016-07-05 16:00:00     0
2016-07-05 17:00:00     0
2016-07-05 18:00:00     0
2016-07-05 19:00:00     0
2016-07-05 20:00:00     0
2016-07-05 21:00:00     0
2016-07-05 22:00:00     0
2016-07-05 23:00:00     0
2016-07-06 00:00:00     0
2016-07-06 01:00:00     0
2016-07-06 02:00:00     0
2016-07-06 03:00:00     0
2016-07-06 04:00:00     0
2016-07-06 05:00:00     0
2016-07-06 06:00:00     0
2016-07-06 07:00:00     0
2016-07-06 08:00:00     0
2016-07-06 09:00:00    60
2016-07-06 10:00:00     0
2016-07-06 11:00:00     0
2016-07-06 12:00:00    30
2016-07-06 13:00:00    60
2016-07-06 14:00:00     0
2016-07-06 15:00:00    60
Freq: H, dtype: int64
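
The output above starts at the first meeting and stops at the last one, whereas the desired output in the question runs from midnight on 2016-07-04 to midnight on 2016-07-07. The answer only says that the index of s can be adjusted; one possible way to get the same effect, sketched here as an assumption rather than as the answer's own method, is to reindex the hourly result onto a full range of hours and fill the gaps with 0. The helper name pad_to_full_days and the small stand-in series are hypothetical.

```python
import pandas as pd


def pad_to_full_days(hourly: pd.Series) -> pd.Series:
    """Extend an hourly series from midnight of its first day to midnight after its last day."""
    first = hourly.index.min().normalize()
    last = hourly.index.max().normalize() + pd.Timedelta(days=1)
    # "h" is the hourly alias; older pandas versions also spell it "H".
    full_index = pd.date_range(first, last, freq="h")
    # Hours with no data in the original series are filled with 0 minutes.
    return hourly.reindex(full_index, fill_value=0)


# Tiny stand-in for the answer's `result` (three hours on one day).
hourly = pd.Series(
    [60, 0, 30],
    index=pd.date_range("2016-07-04 09:00", periods=3, freq="h"),
)
padded = pad_to_full_days(hourly)
print(padded.index[0], padded.index[-1], len(padded))
# 2016-07-04 00:00:00 2016-07-05 00:00:00 25
```

Applied to result from the code above, pad_to_full_days(result) should cover 2016-07-04 00:00:00 through 2016-07-07 00:00:00, the same span as the desired output.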