# Introducing Python Workshop #
### Session III - Date and Time in Pandas ###

__Importing Libraries and Data__

Let us begin by importing the modules we need for data analysis and visualization, along with the data itself. For this example, we will use pandas' __read_html()__ function to read the HTML table directly from the webpage https://www.fdic.gov/bank/individual/failed/banklist.html. The function returns a list of DataFrames, one per table found on the page, so we keep the first element.

In [None]:
import pandas as pd
import numpy as np

import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

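# read_html() returns a list with one DataFrame per HTML table on the page
data = pd.read_html('https://www.fdic.gov/bank/individual/failed/banklist.html')
data = data[0]   # the failed bank list is the first table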
data.head()

Let's explore the dataset.

In [None]:
print( data.shape )
print('\n')
data.info()

<br>

Let's convert the "Closing Date" and "Updated Date" columns to pandas datetime objects using the __to_datetime()__ function.

In [None]:
data['Closing Date'] = pd.to_datetime( data['Closing Date'], format='%B %d, %Y')
data['Updated Date'] = pd.to_datetime( data['Updated Date'], format='%B %d, %Y')
data.info()
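
To see the conversion in isolation, here is a minimal sketch on a hand-made frame. The bank names and dates are invented for illustration (they are not rows from the FDIC table), and the format string assumes dates written like "May 31, 2019".

```python
import pandas as pd

# Hypothetical sample rows; not taken from the FDIC table
sample = pd.DataFrame({
    'Bank Name': ['Bank A', 'Bank B'],
    'Closing Date': ['May 31, 2019', 'October 25, 2019'],
})

# %B = full month name, %d = day of month, %Y = four-digit year
sample['Closing Date'] = pd.to_datetime(sample['Closing Date'], format='%B %d, %Y')

print(sample.dtypes)                            # 'Closing Date' is now a datetime column
print(sample['Closing Date'].dt.year.tolist())  # the .dt accessor exposes date parts
```

If some rows might not match the format, passing `errors='coerce'` to __to_datetime()__ turns them into `NaT` instead of raising an error.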

<br>

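Now we can use ordinary arithmetic operators to compute the difference between two dates. Subtracting one datetime column from another yields a timedelta column, and __.dt.days__ extracts the whole number of days from it. In our case, we calculate the number of days between the closing date and the update date.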

In [None]:
data['time_to_update'] = (data['Updated Date'] - data['Closing Date']).dt.days
data.head()
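
The same arithmetic can be sketched on two small Series. The dates below are made up for illustration; the point is only to show what the subtraction returns and how to express it in another unit.

```python
import pandas as pd

# Hypothetical closing and update dates
closing = pd.Series(pd.to_datetime(['2019-05-31', '2019-10-25']))
updated = pd.Series(pd.to_datetime(['2019-06-10', '2019-11-01']))

delta = updated - closing                # a timedelta Series
days = delta.dt.days                     # whole days as integers
weeks = delta / pd.Timedelta(weeks=1)    # dividing by a Timedelta gives floats

print(delta.dtype)
print(days.tolist())
print(weeks.tolist())
```

Dividing by a `pd.Timedelta` is one way to get the gap in a unit other than days without losing the fractional part.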

<br>

To work with time series data in pandas, we use a DatetimeIndex as the index for our DataFrame (or Series).

In [None]:
data = data.set_index('Closing Date')
data.head()
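
One practical benefit of a DatetimeIndex is that rows can be selected with partial date strings. The sketch below uses an invented four-row frame rather than the bank data, and sorts the index first because label slicing is most predictable on a sorted index.

```python
import pandas as pd

# Hypothetical values indexed by date
ts = pd.DataFrame(
    {'value': [1, 2, 3, 4]},
    index=pd.to_datetime(['2018-03-01', '2018-11-15', '2019-01-20', '2019-06-05']),
)
ts = ts.sort_index()

rows_2018 = ts.loc['2018']                     # every row in calendar year 2018
first_half_2019 = ts.loc['2019-01':'2019-06']  # January through the end of June 2019

print(rows_2018)
print(first_half_2019)
```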

In [None]:
sns.set(rc={'figure.figsize':(10, 4)})

data['time_to_update'].plot(marker='.', alpha=0.5, linestyle='None', figsize=(11, 9), subplots=True)

<br>

We can extract additional features, such as the year, month, and weekday name, from the DatetimeIndex.

In [None]:
data['Year'] = data.index.year
data['Month'] = data.index.month
# weekday_name was removed in pandas 1.0; day_name() returns the same labels
data['Weekday Name'] = data.index.day_name()
data.head()
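
As a quick check of what these attributes return, here is a small sketch on three invented dates. The `Quarter` column is an extra illustration and is not part of the workshop data.

```python
import pandas as pd

# Hypothetical dates used only to show the attribute values
idx = pd.to_datetime(['2019-05-31', '2019-10-25', '2020-02-14'])

features = pd.DataFrame(index=idx)
features['Year'] = idx.year                # integer year
features['Month'] = idx.month              # 1 through 12
features['Weekday Name'] = idx.day_name()  # e.g. 'Friday'
features['Quarter'] = idx.quarter          # 1 through 4

print(features)
```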

<br>

We can also aggregate by time period and plot the results to show trends over time. The __.resample()__ method in pandas works much like groupby, except that the groups are consecutive time spans (for example, weeks or months) taken from the DatetimeIndex. After choosing the time span, we apply an aggregation such as __.mean()__ to each group.

In [None]:
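# Average only the numeric columns; text columns such as the bank name have no mean
numeric_data = data.select_dtypes(include='number')
data_monthly_mean = numeric_data.resample('M').mean()   # on pandas 2.2 and later, use 'ME' for month-end
data_weekly_mean = numeric_data.resample('W').mean()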

start, end = '2008-01', '2019-06'         # Start and end of the date range to extract
fig, ax = plt.subplots(figsize=(14, 5))   # Plot weekly and monthly resampled time series together
ax.plot(data_weekly_mean.loc[start:end, 'time_to_update'], marker='.', linestyle='-', linewidth=0.5, label='Weekly Mean')
ax.plot(data_monthly_mean.loc[start:end, 'time_to_update'], marker='o', markersize=8, linestyle='-', label='Monthly Mean')
ax.set_ylabel('time to update bank closures')
ax.legend()
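
To make the comparison between resampling and groupby concrete, here is a minimal sketch on an invented four-point Series. It uses the `'MS'` (month start) alias, which is a choice made for this example, so the bins are labelled by the first day of each month rather than the last.

```python
import pandas as pd

# Hypothetical observations: two in January, two in February
idx = pd.to_datetime(['2019-01-04', '2019-01-18', '2019-02-01', '2019-02-22'])
s = pd.Series([10.0, 20.0, 30.0, 50.0], index=idx)

monthly_mean = s.resample('MS').mean()    # one value per calendar month
monthly_count = s.resample('MS').count()  # a different aggregation over the same bins

# The equivalent grouping written with groupby and a time-based Grouper
grouped_mean = s.groupby(pd.Grouper(freq='MS')).mean()

print(monthly_mean)
print(monthly_count)
print(grouped_mean)
```

Checking the count alongside the mean is useful with sparse data such as bank closures, since a monthly mean may rest on only one or two rows.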