In [1]:
import pandas as pd
import numpy as np

import os
import requests

import warnings
warnings.filterwarnings("ignore")

Acquire Exercises

This is the API review notebook that demonstrates acquiring data using a REST API.


Explore Site and API

I'm going to investigate the documentation provided by the API and explore a couple of responses before I dig into the exercises.

In [2]:
# I can make a request to the URL below and use the `.json()` method on the response to return a dictionary.

base_url = 'https://python.zach.lol'

type(requests.get(base_url).json())
Out[2]:
dict
In [3]:
# I have two choices of paths I can add to my base url, '/api/v1' and '/documentation'.

requests.get(base_url).json()
Out[3]:
{'api': '/api/v1', 'help': '/documentation'}
In [4]:
# I'll create a doc_url to request the documentation for this API.

doc_url = base_url + '/documentation'
In [5]:
# I have two keys in the dictionary returned from my request.

requests.get(doc_url).json().keys()
Out[5]:
dict_keys(['payload', 'status'])
In [6]:
# I can print the value for the status key.

print(requests.get(doc_url).json()['status'])
ok
In [7]:
# I can print the value for the payload key.

print(requests.get(doc_url).json()['payload'])
The API accepts GET requests for all endpoints, where endpoints are prefixed
with

    /api/{version}

Where version is "v1"

Valid endpoints:

- /stores[/{store_id}]
- /items[/{item_id}]
- /sales[/{sale_id}]

All endpoints accept a `page` parameter that can be used to navigate through
the results.

This tells me that I can use 3 different endpoints to access data by appending stores, items, or sales to my base_url + '/api/v1/', like below:

'https://python.zach.lol/api/v1/items'
'https://python.zach.lol/api/v1/stores'
'https://python.zach.lol/api/v1/sales'

There is also a page parameter that I can add to each of these endpoints to navigate through multiple pages of results.

'?page=n'

For example:

'https://python.zach.lol/api/v1/items?page=1'
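
As a quick sanity check before diving in, a short loop like the sketch below (my own addition, assuming base_url is defined as above) can confirm that all three endpoints respond:

    for endpoint in ['items', 'stores', 'sales']:
        res = requests.get(base_url + '/api/v1/' + endpoint)
        print(endpoint, res.ok)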
In [18]:
# I will create my api url.

api_url = base_url + '/api/v1/'

1. Items Pages

Using the code from the lesson as a guide, create a dataframe named items that has all of the data for items.

  • I want to explore the items endpoint first to see how the information returned by this API is structured.
In [9]:
# This submits the request for the first page of results and stores the results in response.
# My request was successful.

response = requests.get(api_url + 'items')
response.ok
Out[9]:
True
In [10]:
# Use the .json() method on my response to get a dictionary object; I'll store it in the `data` variable.

data = response.json()

print(type(data))
data
<class 'dict'>
Out[10]:
{'payload': {'items': [{'item_brand': 'Riceland',
    'item_id': 1,
    'item_name': 'Riceland American Jazmine Rice',
    'item_price': 0.84,
    'item_upc12': '35200264013',
    'item_upc14': '35200264013'},
   {'item_brand': 'Caress',
    'item_id': 2,
    'item_name': 'Caress Velvet Bliss Ultra Silkening Beauty Bar - 6 Ct',
    'item_price': 6.44,
    'item_upc12': '11111065925',
    'item_upc14': '11111065925'},
   {'item_brand': 'Earths Best',
    'item_id': 3,
    'item_name': 'Earths Best Organic Fruit Yogurt Smoothie Mixed Berry',
    'item_price': 2.43,
    'item_upc12': '23923330139',
    'item_upc14': '23923330139'},
   {'item_brand': 'Boars Head',
    'item_id': 4,
    'item_name': 'Boars Head Sliced White American Cheese - 120 Ct',
    'item_price': 3.14,
    'item_upc12': '208528800007',
    'item_upc14': '208528800007'},
   {'item_brand': 'Back To Nature',
    'item_id': 5,
    'item_name': 'Back To Nature Gluten Free White Cheddar Rice Thin Crackers',
    'item_price': 2.61,
    'item_upc12': '759283100036',
    'item_upc14': '759283100036'},
   {'item_brand': 'Sally Hansen',
    'item_id': 6,
    'item_name': 'Sally Hansen Nail Color Magnetic 903 Silver Elements',
    'item_price': 6.93,
    'item_upc12': '74170388732',
    'item_upc14': '74170388732'},
   {'item_brand': 'Twinings Of London',
    'item_id': 7,
    'item_name': 'Twinings Of London Classics Lady Grey Tea - 20 Ct',
    'item_price': 9.64,
    'item_upc12': '70177154004',
    'item_upc14': '70177154004'},
   {'item_brand': 'Lea & Perrins',
    'item_id': 8,
    'item_name': 'Lea & Perrins Marinade In-a-bag Cracked Peppercorn',
    'item_price': 1.68,
    'item_upc12': '51600080015',
    'item_upc14': '51600080015'},
   {'item_brand': 'Van De Kamps',
    'item_id': 9,
    'item_name': 'Van De Kamps Fillets Beer Battered - 10 Ct',
    'item_price': 1.79,
    'item_upc12': '19600923015',
    'item_upc14': '19600923015'},
   {'item_brand': 'Ahold',
    'item_id': 10,
    'item_name': 'Ahold Cocoa Almonds',
    'item_price': 3.17,
    'item_upc12': '688267141676',
    'item_upc14': '688267141676'},
   {'item_brand': 'Honest Tea',
    'item_id': 11,
    'item_name': 'Honest Tea Peach White Tea',
    'item_price': 3.93,
    'item_upc12': '657622604842',
    'item_upc14': '657622604842'},
   {'item_brand': 'Mueller',
    'item_id': 12,
    'item_name': 'Mueller Sport Care Basic Support Level Medium Elastic Knee Support',
    'item_price': 8.4,
    'item_upc12': '74676640211',
    'item_upc14': '74676640211'},
   {'item_brand': 'Garnier Nutritioniste',
    'item_id': 13,
    'item_name': 'Garnier Nutritioniste Moisture Rescue Fresh Cleansing Foam',
    'item_price': 6.47,
    'item_upc12': '603084234561',
    'item_upc14': '603084234561'},
   {'item_brand': 'Pamprin',
    'item_id': 14,
    'item_name': 'Pamprin Maximum Strength Multi-symptom Menstrual Pain Relief',
    'item_price': 7.54,
    'item_upc12': '41167300121',
    'item_upc14': '41167300121'},
   {'item_brand': 'Suave',
    'item_id': 15,
    'item_name': 'Suave Naturals Moisturizing Body Wash Creamy Tropical Coconut',
    'item_price': 9.11,
    'item_upc12': '79400847201',
    'item_upc14': '79400847201'},
   {'item_brand': 'Burts Bees',
    'item_id': 16,
    'item_name': 'Burts Bees Daily Moisturizing Cream Sensitive',
    'item_price': 5.17,
    'item_upc12': '792850014008',
    'item_upc14': '792850014008'},
   {'item_brand': 'Ducal',
    'item_id': 17,
    'item_name': 'Ducal Refried Red Beans',
    'item_price': 1.16,
    'item_upc12': '88313590791',
    'item_upc14': '88313590791'},
   {'item_brand': 'Scotch',
    'item_id': 18,
    'item_name': 'Scotch Removable Clear Mounting Squares - 35 Ct',
    'item_price': 4.39,
    'item_upc12': '21200725340',
    'item_upc14': '21200725340'},
   {'item_brand': 'Careone',
    'item_id': 19,
    'item_name': 'Careone Family Comb Set - 8 Ct',
    'item_price': 0.74,
    'item_upc12': '41520035646',
    'item_upc14': '41520035646'},
   {'item_brand': 'Usda Produce',
    'item_id': 20,
    'item_name': 'Plums Black',
    'item_price': 5.62,
    'item_upc12': '204040000000',
    'item_upc14': '204040000000'}],
  'max_page': 3,
  'next_page': '/api/v1/items?page=2',
  'page': 1,
  'previous_page': None},
 'status': 'ok'}
In [11]:
# List the keys in my dictionary object; I see payload and status.

data.keys()
Out[11]:
dict_keys(['payload', 'status'])

I can see above that 'payload' is also a dictionary object; I also see that the first key, items, has a value that is a list of dictionaries. I can check out all of the key:value pairs in payload to see what is of use to me.

In [12]:
# Look at the keys in the payload dictionary.

data['payload'].keys()
Out[12]:
dict_keys(['items', 'max_page', 'next_page', 'page', 'previous_page'])
In [13]:
# I see that the `items` list holds 20 dictionaries (items).

len(data['payload']['items'])
Out[13]:
20
In [14]:
# I'll check out just the first 2 dictionaries (items) in the list.

data['payload']['items'][:2]
Out[14]:
[{'item_brand': 'Riceland',
  'item_id': 1,
  'item_name': 'Riceland American Jazmine Rice',
  'item_price': 0.84,
  'item_upc12': '35200264013',
  'item_upc14': '35200264013'},
 {'item_brand': 'Caress',
  'item_id': 2,
  'item_name': 'Caress Velvet Bliss Ultra Silkening Beauty Bar - 6 Ct',
  'item_price': 6.44,
  'item_upc12': '11111065925',
  'item_upc14': '11111065925'}]
In [15]:
# Look at the values of the other keys in the 'payload' dictionary.

print(f"The current page of the results from my request is {data['payload']['page']}.")
print(f"The next page of the results from my request is {data['payload']['next_page']}.")
print(f"The total number of pages in the results from my request is {data['payload']['max_page']}.")
print(f"The previous page in the results from my request is {data['payload']['previous_page']}.")
The current page of the results from my request is 1.
The next page of the results from my request is /api/v1/items?page=2.
The total number of pages in the results from my request is 3.
The previous page in the results from my request is None.
In [16]:
# Create a list variable to hold the 20 items from page one.

items = data['payload']['items']
print(len(items))
type(items)
20
Out[16]:
list
In [17]:
# 'next_page' returns the path and page param for the second page of results.

data['payload']['next_page']
Out[17]:
'/api/v1/items?page=2'
In [19]:
# Submit a request for the next page and store it in the `response` variable.

response = requests.get(base_url + data['payload']['next_page'])
In [20]:
# Use the `.json()` method to return a dictionary object like I did above for page 1.

data = response.json()
In [21]:
# Add items from the second page to our list using `.extend()`

items.extend(data['payload']['items'])
In [22]:
# The `items` list now contains 40 items (dictionaries).

len(items)
Out[22]:
40
In [23]:
# The next page is page 3 of 3 for items.

data['payload']['next_page']
Out[23]:
'/api/v1/items?page=3'
In [24]:
# Grab the next page in the same way and add items to my `items` list.
# I see there are only 10 items on this last page.

response = requests.get(base_url + data['payload']['next_page'])

data = response.json()
len(data['payload']['items'])
Out[24]:
10
In [25]:
# Add the last 10 items to my `items` list, which now contains a total of 50 items.

items.extend(data['payload']['items'])
len(items)
Out[25]:
50

There is no next page, so data['payload']['next_page'] returns None. This could come in handy when we write our function later to automate the above process.

In [26]:
data['payload']['next_page'] is None
Out[26]:
True
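
As a hedged sketch of the pattern the helper function below will automate (assuming the items endpoint still has 3 pages), following next_page until it comes back None looks like this:

    data = requests.get(base_url + '/api/v1/items').json()
    pages_seen = [data['payload']['page']]
    while data['payload']['next_page'] is not None:
        data = requests.get(base_url + data['payload']['next_page']).json()
        pages_seen.append(data['payload']['page'])
    print(pages_seen)  # expect [1, 2, 3] for items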
In [27]:
# Use our items, our list of dictionaries, to create a DataFrame

items_df = pd.DataFrame(items)
print(f'The items_df has the shape {items_df.shape}.\n')
items_df.head(2)
The items_df has the shape (50, 6).

Out[27]:
item_brand item_id item_name item_price item_upc12 item_upc14
0 Riceland 1 Riceland American Jazmine Rice 0.84 35200264013 35200264013
1 Caress 2 Caress Velvet Bliss Ultra Silkening Beauty Bar... 6.44 11111065925 11111065925

2. Stores Pages

Do the same thing, but for stores.

  • There is only 1 page of stores data to request.
In [28]:
# I want to see how many pages of stores I have to request.

api_url = base_url + '/api/v1/'
response = requests.get(api_url + 'stores')
data = response.json()
In [29]:
data['payload']['max_page']
Out[29]:
1
In [30]:
# This time I want to grab stores instead of items.

data['payload'].keys()
Out[30]:
dict_keys(['max_page', 'next_page', 'page', 'previous_page', 'stores'])
In [31]:
# Again, I have a list of dictionaries; I can convert this into a pandas DataFrame now.

stores = data['payload']['stores']
stores_df = pd.DataFrame(stores)
In [32]:
print(f"My stores_df has the shape {stores_df.shape}")
stores_df.head()
My stores_df has the shape (10, 5)
Out[32]:
store_address store_city store_id store_state store_zipcode
0 12125 Alamo Ranch Pkwy San Antonio 1 TX 78253
1 9255 FM 471 West San Antonio 2 TX 78251
2 2118 Fredericksburg Rdj San Antonio 3 TX 78201
3 516 S Flores St San Antonio 4 TX 78204
4 1520 Austin Hwy San Antonio 5 TX 78218

3. Sales Pages

Extract the data for sales. Your code should continue fetching data from the next page until all of the data is extracted.

  • There are 183 pages of data here, so I'm going to build a function to automate the above process.
In [33]:
api_url = base_url + '/api/v1/'
response = requests.get(api_url + 'sales')
data = response.json()
data['payload']['max_page']
Out[33]:
183

Build Helper Function

This function will request all pages of data from the API for whichever endpoint name I pass in and save the resulting dataframe to a csv file for future use.

In [34]:
def get_df(name):
    """
    This function takes in the string
    'items', 'stores', or 'sales' and
    returns a df containing all pages and
    creates a .csv file for future use.
    """
    base_url = 'https://python.zach.lol'
    api_url = base_url + '/api/v1/'
    response = requests.get(api_url + name)
    data = response.json()
    
    # create list from 1st page
    my_list = data['payload'][name]
    
    # loop through the pages and add to list
    while data['payload']['next_page'] is not None:
        response = requests.get(base_url + data['payload']['next_page'])
        data = response.json()
        my_list.extend(data['payload'][name])
    
    # Create DataFrame from list
    df = pd.DataFrame(my_list)
    
    # Write DataFrame to csv file for future use
    df.to_csv(name + '.csv')
    return df
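
One optional hardening step for the helper (my own assumption, not something this exercise requires): raise on a bad HTTP status before parsing, so a failed request fails loudly instead of surfacing later as a confusing KeyError:

    response = requests.get(api_url + 'items')
    response.raise_for_status()  # raises requests.exceptions.HTTPError on 4xx/5xx
    data = response.json()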
In [35]:
items_df = get_df('items')
print(items_df.shape)
items_df.head()
(50, 6)
Out[35]:
item_brand item_id item_name item_price item_upc12 item_upc14
0 Riceland 1 Riceland American Jazmine Rice 0.84 35200264013 35200264013
1 Caress 2 Caress Velvet Bliss Ultra Silkening Beauty Bar... 6.44 11111065925 11111065925
2 Earths Best 3 Earths Best Organic Fruit Yogurt Smoothie Mixe... 2.43 23923330139 23923330139
3 Boars Head 4 Boars Head Sliced White American Cheese - 120 Ct 3.14 208528800007 208528800007
4 Back To Nature 5 Back To Nature Gluten Free White Cheddar Rice ... 2.61 759283100036 759283100036
In [36]:
stores_df = get_df('stores')
print(stores_df.shape)
stores_df.head()
(10, 5)
Out[36]:
store_address store_city store_id store_state store_zipcode
0 12125 Alamo Ranch Pkwy San Antonio 1 TX 78253
1 9255 FM 471 West San Antonio 2 TX 78251
2 2118 Fredericksburg Rdj San Antonio 3 TX 78201
3 516 S Flores St San Antonio 4 TX 78204
4 1520 Austin Hwy San Antonio 5 TX 78218
In [37]:
sales_df = get_df('sales')
print(sales_df.shape)
sales_df.head()
(913000, 5)
Out[37]:
item sale_amount sale_date sale_id store
0 1 13.0 Tue, 01 Jan 2013 00:00:00 GMT 1 1
1 1 11.0 Wed, 02 Jan 2013 00:00:00 GMT 2 1
2 1 14.0 Thu, 03 Jan 2013 00:00:00 GMT 3 1
3 1 13.0 Fri, 04 Jan 2013 00:00:00 GMT 4 1
4 1 10.0 Sat, 05 Jan 2013 00:00:00 GMT 5 1

5. Merge DataFrames

  • Combine the data from your three separate dataframes into one large dataframe.
In [38]:
# I can see all of my dataframes above, so I know how to join and what to drop.

df = pd.merge(sales_df, stores_df, left_on='store', right_on='store_id').drop(columns='store')
df.head(2)
Out[38]:
item sale_amount sale_date sale_id store_address store_city store_id store_state store_zipcode
0 1 13.0 Tue, 01 Jan 2013 00:00:00 GMT 1 12125 Alamo Ranch Pkwy San Antonio 1 TX 78253
1 1 11.0 Wed, 02 Jan 2013 00:00:00 GMT 2 12125 Alamo Ranch Pkwy San Antonio 1 TX 78253
In [39]:
df = pd.merge(df, items_df, left_on='item', right_on='item_id').drop(columns='item')
df.head(2)
Out[39]:
sale_amount sale_date sale_id store_address store_city store_id store_state store_zipcode item_brand item_id item_name item_price item_upc12 item_upc14
0 13.0 Tue, 01 Jan 2013 00:00:00 GMT 1 12125 Alamo Ranch Pkwy San Antonio 1 TX 78253 Riceland 1 Riceland American Jazmine Rice 0.84 35200264013 35200264013
1 11.0 Wed, 02 Jan 2013 00:00:00 GMT 2 12125 Alamo Ranch Pkwy San Antonio 1 TX 78253 Riceland 1 Riceland American Jazmine Rice 0.84 35200264013 35200264013
In [40]:
df.shape
Out[40]:
(913000, 14)
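
A quick sanity check on the merges (my own habit, not part of the exercise): since every sale should match exactly one store and one item, the merged frame should keep one row per sale:

    assert len(df) == len(sales_df)  # 913000 rows before and after merging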

Pagination Using Params

There is another way that I can approach pagination of APIs using the params parameter with the .get() method. The documentation for the API informed me that "All endpoints accept a page parameter that can be used to navigate through the results."

Above we used the value of data['payload']['next_page'] to provide the path and query parameter, '/api/v1/items?page=n', that we concatenated to our base_url, https://python.zach.lol, to access each page.

Below, I will instead pass a dictionary to params to 'turn the pages' so to speak. This is just a different way to access the data and may come in handy when you work with different APIs. If it's TMI right now, skip it; the above method works fine for this API.


requests.get(url, params={key: value}, args)

The requests documentation describes more parameters that can be used with the .get() method.
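
To confirm that params builds the same query string we concatenated by hand above, we can inspect the final URL that requests constructed (response.url is a standard attribute of the response object):

    response = requests.get('https://python.zach.lol/api/v1/items', params={'page': 2})
    print(response.url)  # https://python.zach.lol/api/v1/items?page=2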

In [5]:
# Create endpoints for use below.

items_url = 'https://python.zach.lol/api/v1/items'
stores_url = 'https://python.zach.lol/api/v1/stores'
sales_url = 'https://python.zach.lol/api/v1/sales'
In [10]:
# Create an empty list named `results`.
results = []

# Loop through the 3 pages of the items endpoint; break early if a response comes back empty.
for i in range(3):
    response = requests.get(items_url, params={"page": i + 1})
    
    # We have reached the end of the results if the response length is 0.
    if len(response.json()) == 0:   
        break
    else:
        
        # Convert my response to a dictionary and store as variable `data`.
        data = response.json()
        
        # Add the list of dictionaries to my list
        results.extend(data['payload']['items'])
        
print(results[:2])
len(results)
[{'item_brand': 'Riceland', 'item_id': 1, 'item_name': 'Riceland American Jazmine Rice', 'item_price': 0.84, 'item_upc12': '35200264013', 'item_upc14': '35200264013'}, {'item_brand': 'Caress', 'item_id': 2, 'item_name': 'Caress Velvet Bliss Ultra Silkening Beauty Bar - 6 Ct', 'item_price': 6.44, 'item_upc12': '11111065925', 'item_upc14': '11111065925'}]
Out[10]:
50
In [19]:
def get_df_params(name):
    """
    This function takes in the string
    'items', 'stores', or 'sales' and
    returns a df containing all pages and
    creates a .csv file for future use.
    """
    # Create an empty list named `results`.
    results = []
    
    # Create api_url variable
    api_url = 'https://python.zach.lol/api/v1/'
    
    # Request the first page to learn how many pages there are.
    response = requests.get(api_url + name, params={"page": 1})
    data = response.json()
    max_page = data['payload']['max_page']
    
    # Add the first page of results to my list.
    results.extend(data['payload'][name])
    
    # Loop through the remaining page parameters.
    for page in range(2, max_page + 1):
        response = requests.get(api_url + name, params={"page": page})
        data = response.json()
        
        # Add the list of dictionaries to my list
        results.extend(data['payload'][name])
    
    # Create DataFrame from list
    df = pd.DataFrame(results)
    
    # Write DataFrame to csv file for future use
    df.to_csv(name + '.csv')
    
    return df
In [20]:
get_df_params('items').head()
Out[20]:
item_brand item_id item_name item_price item_upc12 item_upc14
0 Riceland 1 Riceland American Jazmine Rice 0.84 35200264013 35200264013
1 Caress 2 Caress Velvet Bliss Ultra Silkening Beauty Bar... 6.44 11111065925 11111065925
2 Earths Best 3 Earths Best Organic Fruit Yogurt Smoothie Mixe... 2.43 23923330139 23923330139
3 Boars Head 4 Boars Head Sliced White American Cheese - 120 Ct 3.14 208528800007 208528800007
4 Back To Nature 5 Back To Nature Gluten Free White Cheddar Rice ... 2.61 759283100036 759283100036
In [21]:
# This helper function returns the same data as my other function.

get_df_params('items').shape
Out[21]:
(50, 6)
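
If you want to verify that claim programmatically, a hedged check (assuming items_df from get_df is still in memory) is to compare the two frames directly:

    print(items_df.equals(get_df_params('items')))  # expect True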

7a. get_store_data() Function

  • Create a function that checks for a csv file, and if one doesn't exist it creates one.

  • The function should also create one large df using all three dfs.

  • Create this function using either of our helper functions above; your choice.

In [18]:
def get_store_data():
    """
    This function checks for csv files
    for items, sales, stores, and big_df 
    if there are none, it creates them.
    It returns one big_df of merged dfs.
    """
    # check for csv files or create them
    if os.path.isfile('items.csv'):
        items_df = pd.read_csv('items.csv', index_col=0)
    else:
        items_df = get_df('items')
        
    if os.path.isfile('stores.csv'):
        stores_df = pd.read_csv('stores.csv', index_col=0)
    else:
        stores_df = get_df('stores')
        
    if os.path.isfile('sales.csv'):
        sales_df = pd.read_csv('sales.csv', index_col=0)
    else:
        sales_df = get_df('sales')
        
    if os.path.isfile('big_df.csv'):
        df = pd.read_csv('big_df.csv', index_col=0)
        return df
    else:
        # merge all of the DataFrames into one
        df = pd.merge(sales_df, stores_df, left_on='store', right_on='store_id').drop(columns='store')
        df = pd.merge(df, items_df, left_on='item', right_on='item_id').drop(columns='item')

        # write the merged df with all data to a csv file for future use
        df.to_csv('big_df.csv')
        return df
In [19]:
df = get_store_data()
df.head(2)
Out[19]:
sale_amount sale_date sale_id store_address store_city store_id store_state store_zipcode item_brand item_id item_name item_price item_upc12 item_upc14
0 13.0 Tue, 01 Jan 2013 00:00:00 GMT 1 12125 Alamo Ranch Pkwy San Antonio 1 TX 78253 Riceland 1 Riceland American Jazmine Rice 0.84 35200264013 35200264013
1 11.0 Wed, 02 Jan 2013 00:00:00 GMT 2 12125 Alamo Ranch Pkwy San Antonio 1 TX 78253 Riceland 1 Riceland American Jazmine Rice 0.84 35200264013 35200264013
In [9]:
df.shape
Out[9]:
(913000, 14)
In [10]:
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 913000 entries, 0 to 912999
Data columns (total 14 columns):
 #   Column         Non-Null Count   Dtype  
---  ------         --------------   -----  
 0   sale_amount    913000 non-null  float64
 1   sale_date      913000 non-null  object 
 2   sale_id        913000 non-null  int64  
 3   store_address  913000 non-null  object 
 4   store_city     913000 non-null  object 
 5   store_id       913000 non-null  int64  
 6   store_state    913000 non-null  object 
 7   store_zipcode  913000 non-null  int64  
 8   item_brand     913000 non-null  object 
 9   item_id        913000 non-null  int64  
 10  item_name      913000 non-null  object 
 11  item_price     913000 non-null  float64
 12  item_upc12     913000 non-null  int64  
 13  item_upc14     913000 non-null  int64  
dtypes: float64(2), int64(6), object(6)
memory usage: 104.5+ MB

6. German Energy Data

In [4]:
url = 'https://raw.githubusercontent.com/jenfly/opsd/master/opsd_germany_daily.csv'
df = pd.read_csv(url)
df.head()
Out[4]:
Date Consumption Wind Solar Wind+Solar
0 2006-01-01 1069.184 NaN NaN NaN
1 2006-01-02 1380.521 NaN NaN NaN
2 2006-01-03 1442.533 NaN NaN NaN
3 2006-01-04 1457.217 NaN NaN NaN
4 2006-01-05 1477.131 NaN NaN NaN
In [5]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4383 entries, 0 to 4382
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Date         4383 non-null   object 
 1   Consumption  4383 non-null   float64
 2   Wind         2920 non-null   float64
 3   Solar        2188 non-null   float64
 4   Wind+Solar   2187 non-null   float64
dtypes: float64(4), object(1)
memory usage: 171.3+ KB

7b. opsd_germany_daily() Function

  • Create a function that retrieves German Energy data and reads/writes csv.
In [6]:
def opsd_germany_daily():
    """
    This function uses or creates the 
    opsd_germany_daily csv and returns a df.
    """
    if os.path.isfile('opsd_germany_daily.csv'):
        df = pd.read_csv('opsd_germany_daily.csv', index_col=0)
    else:
        url = 'https://raw.githubusercontent.com/jenfly/opsd/master/opsd_germany_daily.csv'
        df = pd.read_csv(url)
        df.to_csv('opsd_germany_daily.csv')
    return df
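
Note that when the cached csv is re-read, the Date column comes back as plain strings (dtype object, as gdf.info() below confirms). If you want dates parsed on read, an optional tweak (my assumption about downstream use) is pd.read_csv's parse_dates argument:

    df = pd.read_csv('opsd_germany_daily.csv', index_col=0, parse_dates=['Date'])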
In [7]:
gdf = opsd_germany_daily()
gdf.head(2)
Out[7]:
Date Consumption Wind Solar Wind+Solar
0 2006-01-01 1069.184 NaN NaN NaN
1 2006-01-02 1380.521 NaN NaN NaN
In [8]:
gdf.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4383 entries, 0 to 4382
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Date         4383 non-null   object 
 1   Consumption  4383 non-null   float64
 2   Wind         2920 non-null   float64
 3   Solar        2188 non-null   float64
 4   Wind+Solar   2187 non-null   float64
dtypes: float64(4), object(1)
memory usage: 171.3+ KB