I have the details of my weekly calendar (obviously with the Subjects altered to protect the innocent) read into a pandas dataframe. One of my goals is to get the total time spent in meetings. I would like a dataframe indexed by a date_range with hourly frequency for the week, showing how many total minutes I was in meetings during each hour. My first challenge is that the meetings overlap and, as much as I would like to be in two places at once, I certainly am not. I do hop from one to another, though. So, for example, the rows at indices 8 and 9 should have a total meeting time of 90 minutes, not 120 minutes as would be the case if I just called df["Duration"].sum() on the column. How do I flatten the periods in the dataframe so that an overlap is counted only once? It seems there is an answer somewhere using date_range and period, but I can't wrap my head around it. Below is my dataframe df.
   Start              End                Duration  Subject
0  07/04/16 10:30:00  07/04/16 11:00:00  30        Inspirational Poster Design Session
1  07/04/16 15:00:00  07/04/16 15:30:00  30        Corporate Speak Do's and Don'ts
2  07/04/16 09:00:00  07/04/16 12:00:00  180       Metrics or Matrix -Panel Discussion
3  07/04/16 13:30:00  07/04/16 15:00:00  90        "Do More with Less" kickoff party
4  07/05/16 09:00:00  07/05/16 10:00:00  60        Fiscal or Physical -Panel Discussion
5  07/05/16 14:00:00  07/05/16 14:30:00  30        "Why we can't have nice thing" training video
6  07/06/16 15:00:00  07/06/16 16:00:00  60        One-on-One with manager -Panel Discussion
7  07/06/16 09:00:00  07/06/16 10:00:00  60        Fireing for Performance leadership session
8  07/06/16 13:00:00  07/06/16 14:00:00  60        Birthday Cake in the conference room *MANDATORY*
9  07/06/16 12:30:00  07/06/16 13:30:00  60        Obligatory lunchtime meeting because it was the only time everyone had avaiable
Any help would be greatly appreciated.
EDIT: This is the output I was hoping for with the dataset above.
2016-07-04 00:00:00 0
2016-07-04 01:00:00 0
2016-07-04 02:00:00 0
2016-07-04 03:00:00 0
2016-07-04 04:00:00 0
2016-07-04 05:00:00 0
2016-07-04 06:00:00 0
2016-07-04 07:00:00 0
2016-07-04 08:00:00 0
2016-07-04 09:00:00 60
2016-07-04 10:00:00 60
2016-07-04 11:00:00 60
2016-07-04 12:00:00 0
2016-07-04 13:00:00 30
2016-07-04 14:00:00 60
2016-07-04 15:00:00 30
2016-07-04 16:00:00 0
2016-07-04 17:00:00 0
2016-07-04 18:00:00 0
2016-07-04 19:00:00 0
2016-07-04 20:00:00 0
2016-07-04 21:00:00 0
2016-07-04 22:00:00 0
2016-07-04 23:00:00 0
2016-07-05 00:00:00 0
2016-07-05 01:00:00 0
2016-07-05 02:00:00 0
2016-07-05 03:00:00 0
2016-07-05 04:00:00 0
2016-07-05 05:00:00 0
2016-07-05 06:00:00 0
2016-07-05 07:00:00 0
2016-07-05 08:00:00 0
2016-07-05 09:00:00 60
2016-07-05 10:00:00 0
2016-07-05 11:00:00 0
2016-07-05 12:00:00 0
2016-07-05 13:00:00 0
2016-07-05 14:00:00 30
2016-07-05 15:00:00 0
2016-07-05 16:00:00 0
2016-07-05 17:00:00 0
2016-07-05 18:00:00 0
2016-07-05 19:00:00 0
2016-07-05 20:00:00 0
2016-07-05 21:00:00 0
2016-07-05 22:00:00 0
2016-07-05 23:00:00 0
2016-07-06 00:00:00 0
2016-07-06 01:00:00 0
2016-07-06 02:00:00 0
2016-07-06 03:00:00 0
2016-07-06 04:00:00 0
2016-07-06 05:00:00 0
2016-07-06 06:00:00 0
2016-07-06 07:00:00 0
2016-07-06 08:00:00 0
2016-07-06 09:00:00 60
2016-07-06 10:00:00 0
2016-07-06 11:00:00 0
2016-07-06 12:00:00 30
2016-07-06 13:00:00 60
2016-07-06 14:00:00 0
2016-07-06 15:00:00 60
2016-07-06 16:00:00 0
2016-07-06 17:00:00 0
2016-07-06 18:00:00 0
2016-07-06 19:00:00 0
2016-07-06 20:00:00 0
2016-07-06 21:00:00 0
2016-07-06 22:00:00 0
2016-07-06 23:00:00 0
2016-07-07 00:00:00 0
Answers:
One possibility is to create a time series (s below), indexed by minute, that keeps track of whether or not you are in a meeting during that minute, and then resample it by hour. To match the desired output, you can adjust the start and end times of the index of s.
import io
import pandas as pd

# Sample data from the question; the unnamed leading column becomes the index
data = io.StringIO("""
Start,End,Duration,Subject
0,07/04/16 10:30:00,07/04/16 11:00:00,30,Inspirational Poster Design Session
1,07/04/16 15:00:00,07/04/16 15:30:00,30,Corporate Speak Do's and Don'ts
2,07/04/16 09:00:00,07/04/16 12:00:00,180,Metrics or Matrix -Panel Discussion
3,07/04/16 13:30:00,07/04/16 15:00:00,90,"Do More with Less" kickoff party
4,07/05/16 09:00:00,07/05/16 10:00:00,60,Fiscal or Physical -Panel Discussion
5,07/05/16 14:00:00,07/05/16 14:30:00,30,"Why we can't have nice thing" training video
6,07/06/16 15:00:00,07/06/16 16:00:00,60,One-on-One with manager -Panel Discussion
7,07/06/16 09:00:00,07/06/16 10:00:00,60,Fireing for Performance leadership session
8,07/06/16 13:00:00,07/06/16 14:00:00,60,Birthday Cake in the conference room *MANDATORY*
9,07/06/16 12:30:00,07/06/16 13:30:00,60,Obligatory lunchtime meeting because it was the only time everyone
""")
df = pd.read_csv(data, usecols=["Start", "End", "Subject"])
df["Start"] = pd.to_datetime(df["Start"])
df["End"] = pd.to_datetime(df["End"])

# Ranges in datetime indices include the right endpoint,
# so subtract one minute from each end time
tdel = pd.Timedelta("1min")

# One boolean per minute: True while some meeting is in progress
s = pd.Series(False, index=pd.date_range(start=df["Start"].min(),
                                         end=df["End"].max() - tdel,
                                         freq="min"))
for _, meeting in df.iterrows():
    # Overlapping meetings set the same minutes to True, so they count once
    s.loc[meeting["Start"]:meeting["End"] - tdel] = True

# Count the True minutes in each hour.
# "1H" matches the output shown; pandas 2.2+ prefers the lowercase alias "1h".
result = s.resample("1H").sum().astype(int)
print(result)
Output:
2016-07-04 09:00:00 60
2016-07-04 10:00:00 60
2016-07-04 11:00:00 60
2016-07-04 12:00:00 0
2016-07-04 13:00:00 30
2016-07-04 14:00:00 60
2016-07-04 15:00:00 30
2016-07-04 16:00:00 0
2016-07-04 17:00:00 0
2016-07-04 18:00:00 0
2016-07-04 19:00:00 0
2016-07-04 20:00:00 0
2016-07-04 21:00:00 0
2016-07-04 22:00:00 0
2016-07-04 23:00:00 0
2016-07-05 00:00:00 0
2016-07-05 01:00:00 0
2016-07-05 02:00:00 0
2016-07-05 03:00:00 0
2016-07-05 04:00:00 0
2016-07-05 05:00:00 0
2016-07-05 06:00:00 0
2016-07-05 07:00:00 0
2016-07-05 08:00:00 0
2016-07-05 09:00:00 60
2016-07-05 10:00:00 0
2016-07-05 11:00:00 0
2016-07-05 12:00:00 0
2016-07-05 13:00:00 0
2016-07-05 14:00:00 30
2016-07-05 15:00:00 0
2016-07-05 16:00:00 0
2016-07-05 17:00:00 0
2016-07-05 18:00:00 0
2016-07-05 19:00:00 0
2016-07-05 20:00:00 0
2016-07-05 21:00:00 0
2016-07-05 22:00:00 0
2016-07-05 23:00:00 0
2016-07-06 00:00:00 0
2016-07-06 01:00:00 0
2016-07-06 02:00:00 0
2016-07-06 03:00:00 0
2016-07-06 04:00:00 0
2016-07-06 05:00:00 0
2016-07-06 06:00:00 0
2016-07-06 07:00:00 0
2016-07-06 08:00:00 0
2016-07-06 09:00:00 60
2016-07-06 10:00:00 0
2016-07-06 11:00:00 0
2016-07-06 12:00:00 30
2016-07-06 13:00:00 60
2016-07-06 14:00:00 0
2016-07-06 15:00:00 60
Freq: H, dtype: int64
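As for adjusting the index of s to match the desired output: a minimal sketch, assuming that the expected output runs from midnight of the first day through midnight after the last day. The names first and last are only illustrative, and to keep it short the dataframe here holds just the three afternoon meetings from 2016-07-06 (rows 6, 8 and 9 of the question).

```python
import pandas as pd

# Assumption: only rows 6, 8 and 9 of the question's data, to keep the sketch short
df = pd.DataFrame({
    "Start": pd.to_datetime(["2016-07-06 15:00", "2016-07-06 13:00", "2016-07-06 12:30"]),
    "End": pd.to_datetime(["2016-07-06 16:00", "2016-07-06 14:00", "2016-07-06 13:30"]),
})

tdel = pd.Timedelta("1min")

# Assumption: the index should span whole days, from midnight of the
# first day to midnight after the last day (inclusive)
first = df["Start"].min().normalize()
last = df["End"].max().normalize() + pd.Timedelta(days=1)
s = pd.Series(False, index=pd.date_range(start=first, end=last, freq="min"))

for _, meeting in df.iterrows():
    # Label slices include the right endpoint, hence the one-minute offset
    s.loc[meeting["Start"]:meeting["End"] - tdel] = True

# Minutes in meetings per hour; "60min" is used here instead of "1H"
result = s.resample("60min").sum().astype(int)

# Rows 8 and 9 overlap between 13:00 and 13:30; together they cover 12:30 to 14:00
overlap_total = result.loc[pd.Timestamp("2016-07-06 12:00"):pd.Timestamp("2016-07-06 13:00")].sum()
print(result)
print(overlap_total)
```

With this index the result has one row per hour of the day plus a final row for midnight of the following day, and the 12:00 and 13:00 bins together add up to the 90 minutes asked for in the question rather than 120.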