Bitcoin Data

Bitcoin is often described as "virtual gold" — as well as everything from a "bubble[1]" to a "Ponzi scheme[2]." But what excites most people, is Bitcoin's potential to drive innovation and reform within the financial sector — and to challenge state-run banks.

bitcoin-10k

With the sudden rise of Bitcoin prices in 2017, its usage and public knowledge increases proportionately. As such, many questions arise with its popularity. How do Bitcoin markets behave? What are the causes of the sudden spikes and dips in cryptocurrency values? How can we predict what will happen next?

The answers begin with the retrieval of Bitcoin Data.


1. Retrieve Bitcoin Data

To start things off, we'll need Quandl's free Bitcoin API, which can be downloaded via conda install quandl or pip install quandl.

Next, we'll define a helper function to download and cache datasets from Quandl. Here's a brief explanation of what it does:

  • We'll use pickle to serialize and save the downloaded data as a file, which will prevent our script from re-downloading the same data each time we run it.
  • The function will return the data as a Pandas Dataframe.
import quandl
import pickle

# Define a helper function to download and cache datasets from Quandl
def get_quandl_data(quandl_id):
    cache_path = '{}.pkl'.format(quandl_id).replace('/','-')
    try:
        with open(cache_path, 'rb') as f:
            df = pickle.load(f)   
            print('Loaded {} from cache'.format(quandl_id))
    except (OSError, IOError) as e:
        print('Downloading {} from Quandl'.format(quandl_id))
        df = quandl.get(quandl_id, returns='pandas')
        df.to_pickle(cache_path)
        print('Cached {} at {}'.format(quandl_id, cache_path))
    return df

Here's how you pull the historical Bitcoin exchange data for the Coinbase Bitcoin exchange:

quandl_id = 'BCHARTS/{}USD'.format('COINBASE')
df = get_quandl_data(quandl_id)

Inspect the first 5 rows[3]:

df.head()
Open High Low Close Volume (BTC) Volume (Currency) Weighted Price
Date
2015-02-04 228.77 230.25 221.85 227.11 3344.120783 7.578076e+05 226.608934
2015-05-11 241.59 245.49 239.50 242.86 6176.481242 1.498052e+06 242.541273
2015-05-12 242.74 243.40 240.37 242.03 6322.571412 1.531046e+06 242.155569
2015-05-13 242.04 244.13 235.54 236.32 10028.946883 2.424123e+06 241.712609
2015-05-14 236.93 240.00 232.61 237.46 10365.968041 2.462002e+06 237.508205

Inspect the last 5 rows (note the difference in dates):

df.tail()
Open High Low Close Volume (BTC) Volume (Currency) Weighted Price
Date
2017-12-01 9903.01 10937.0 9517.00 10869.84 32352.839866 3.330137e+08 10293.184141
2017-12-02 10869.85 11100.0 10701.00 10930.24 19500.134341 2.134496e+08 10946.056300
2017-12-03 10930.24 11891.0 10601.00 11290.00 30111.537346 3.407965e+08 11317.806361
2017-12-04 11290.00 11650.0 10950.00 11643.98 19535.803182 2.224975e+08 11389.214464
2017-12-05 11643.99 11680.0 11624.41 11626.00 214.890171 2.504668e+06 11655.572330

2. Plot the Data

It's easier to visualize the data in the form of a chart. Let's do this with good ol' Plotly.

# Start by importing Plotly and enable offline mode
import plotly.offline as py
import plotly.graph_objs as go
py.init_notebook_mode(connected=True)
# Retrieve Bitcoin prices from Coinbase
btc_usd_price_coinbase = get_quandl_data('BCHARTS/{}USD'.format('COINBASE'))
# Plot with Plotly
btc_trace = go.Scatter(x=btc_usd_price_coinbase.index, 
                       y=btc_usd_price_coinbase['Weighted Price'])

layout = go.Layout(
    title='Bitcoin Price (USD) from Coinbase',
    xaxis=dict(title='Dates'),
    yaxis=dict(title='Weighted Price (USD)'))

fig = go.Figure(data=[btc_trace], layout=layout)

py.iplot(fig)

3. Retrieve Bitcoin Data From Other Exchanges

Let's proceed further by retrieving more Bitcoin data from other exchanges, namely:

  1. Coinbase (https://www.coinbase.com/)
  2. Kraken (https://www.kraken.com/)
  3. Bitstamp (https://www.bitstamp.net/)
  4. itBit (https://www.itbit.com/)
exchanges = ['COINBASE', 'KRAKEN', 'BITSTAMP','ITBIT']

exchange_data = {}
for exchange in exchanges:
    exchange_code = 'BCHARTS/{}USD'.format(exchange)
    df = get_quandl_data(exchange_code)
    exchange_data[exchange] = df

Merge all of the pricing data into a single Dataframe that we call btc_usd_datasets.

Note that among all the columns in the Bitcoin datasets (i.e. "Open", "High", "Low", "Close", etc.), we'll only use the "Weighted Price" column.

from pandas import DataFrame

# Helper function to merge a single column of 
# each dataframe into a new combined dataframe
def merge_dfs_on_column(dataframes, labels, col):
    d = {labels[i]:dataframes[i][col] for i in range(len(dataframes))}       
    return DataFrame(d)
    
# Merge the BTC prices into a single dataframe
btc_usd_datasets = merge_dfs_on_column(dataframes=list(exchange_data.values()), 
                                       labels=list(exchange_data.keys()), 
                                       col='Weighted Price')

# Show the last 10 rows of Bitcoin prices
# from various Bitcoin exchanges
btc_usd_datasets.tail(10)
BITSTAMP COINBASE ITBIT KRAKEN
Date
2017-11-26 9027.857089 9206.919540 9184.919496 9023.042380
2017-11-27 9527.863617 9651.360612 9592.928385 9494.206005
2017-11-28 9852.647901 9892.840318 9836.883769 9727.223970
2017-11-29 10377.663143 10368.578491 10405.182638 10275.273668
2017-11-30 9715.306791 9940.329483 9704.881677 9894.191618
2017-12-01 10287.373047 10293.184141 10364.119476 10296.380966
2017-12-02 10927.843358 10946.056300 10929.867866 10926.747343
2017-12-03 11249.499550 11317.806361 11241.321426 11274.703356
2017-12-04 11314.695288 11389.214464 11322.744359 11387.470641
2017-12-05 11616.819714 11655.572330 11625.090000 11640.176467

4. Plot the Data from All Bitcoin Exchanges

Finally, let's plot the data from the four exchanges.

# Generate a scatter plot of the entire DataFrame
def df_scatter(df, title, seperate_y_axis=False, y_axis_label='', scale='linear', initial_hide=False):
    label_arr = list(df)
    series_arr = list(map(lambda col: df[col], label_arr))
    
    layout = go.Layout(
        title=title,
        legend=dict(orientation="h"),
        xaxis=dict(type='date'),
        yaxis=dict(
            title=y_axis_label,
            showticklabels= not seperate_y_axis,
            type=scale
        )
    )
    
    y_axis_config = dict(
        overlaying='y',
        showticklabels=False,
        type=scale )
    
    visibility = 'visible'
    if initial_hide:
        visibility = 'legendonly'
        
    # Form Trace For Each Series
    trace_arr = []
    for index, series in enumerate(series_arr):
        trace = go.Scatter(
            x=series.index, 
            y=series, 
            name=label_arr[index],
            visible=visibility
        )
        
        # Add seperate axis for the series
        if seperate_y_axis:
            trace['yaxis'] = 'y{}'.format(index + 1)
            layout['yaxis{}'.format(index + 1)] = y_axis_config    
        trace_arr.append(trace)

    fig = go.Figure(data=trace_arr, layout=layout)
    py.iplot(fig)
df_scatter(btc_usd_datasets, 'Bitcoin Price (USD) By Exchange')

5. Summary

And that's how you retrieve Bitcoin prices 😎

Stay tuned to the next post as we explore the retrieval of non-Bitcoin cryptocurrencies, such as Ethereum (ETH) and Litecoin (LTC).


6. Codes, In a Nutshell

import quandl
import pickle
from pandas import DataFrame


# Define a helper function to download and cache datasets from Quandl
def get_quandl_data(quandl_id):
    cache_path = '{}.pkl'.format(quandl_id).replace('/','-')
    try:
        with open(cache_path, 'rb') as f:
            df = pickle.load(f)   
            print('Loaded {} from cache'.format(quandl_id))
    except (OSError, IOError) as e:
        print('Downloading {} from Quandl'.format(quandl_id))
        df = quandl.get(quandl_id, returns='pandas')
        df.to_pickle(cache_path)
        print('Cached {} at {}'.format(quandl_id, cache_path))
    return df

# Retrieve Bitcoin data from several exchanges
exchanges = ['COINBASE', 'KRAKEN', 'BITSTAMP','ITBIT']
exchange_data = {}
for exchange in exchanges:
    exchange_code = 'BCHARTS/{}USD'.format(exchange)
    df = get_quandl_data(exchange_code)
    exchange_data[exchange] = df
    

# Helper function to merge a single column of each DF into a new combined DF
def merge_dfs_on_column(dataframes, labels, col):
    d = {labels[i]:dataframes[i][col] for i in range(len(dataframes))}       
    return DataFrame(d)
    
# Merge the BTC prices into a single DF
# Note that we'll only use the "Weighted Price" column
btc_usd_datasets = merge_dfs_on_column(dataframes=list(exchange_data.values()), 
                                       labels=list(exchange_data.keys()), 
                                       col='Weighted Price')

# Show the last 10 rows of Bitcoin prices
btc_usd_datasets.tail(10)

If you enjoyed this post and want to buy me a cup of coffee...

The thing is, I'll always accept a cup of coffee. So feel free to buy me one.

Cheers! ☕️


  1. A bubble is an economic cycle characterized by rapid escalation of asset prices followed by a contraction. It is created by a surge in asset prices unwarranted by the fundamentals of the asset and driven by exuberant market behavior. ↩︎

  2. A Ponzi scheme is a fraudulent investment operation where the operator generates returns for older investors through revenue paid by new investors, rather than from legitimate business activities or profit of financial trading. ↩︎

  3. Made with this awesome HTML Table to Markdown Converter. ↩︎