Bitcoin has been in the news quite a bit lately, with its price soaring. It was named the top-performing currency in four of the last five years, and its price could hit over $100,000 within 10 years, which would mark a 3,483 percent rise from its recent record high. In this post, we model Bitcoin’s market capitalization.

Bitcoin is a decentralized network that allows users to transact directly, peer to peer, without a middleman to manage the exchange of funds. The data used here is Bitcoin data compiled by Quandl, a magnificent platform for finding financial and economic data.

Understanding the data for Modeling Bitcoin’s Market Capitalization

The Bitcoin data spans 14 attributes, many of them described in very technical lingo. I researched these terms online to frame the analysis and work out which attributes matter. The attributes are as follows:

  1. Date: The data is a daily time series recorded from August 28, 2013 to June 30, 2017.
  2. Total BTC — The total number of bitcoins mined to date, i.e. the current supply in circulation. It is a running total: each day’s value is the previous day’s value plus the newly mined coins.
  3. Market Cap — The total USD value of bitcoin supply in circulation, as calculated by the daily average market price across major exchanges.
  4. Transactions last 24h — The aggregate number of confirmed Bitcoin transactions in the past 24 hours. It varies from day to day and is not a cumulative value.
  5. Transactions avg. per hour — The average number of confirmed Bitcoin transactions per hour
  6. Bitcoins sent last 24h — The number of bitcoins sent in the last 24 hours
  7. Bitcoins sent avg. per hour — The average number of bitcoins sent per hour
  8. Count: Current block count
  9. Blocks last 24h — The number of blocks added to the block chain in the past 24 hours. In Bitcoin, a block is a storage unit where transaction data is permanently recorded; blocks are linked into a linear, time-ordered sequence known as the block chain. https://blockchain.info/charts
  10. Blocks avg. per hour — The average number of blocks added per hour
  11. Difficulty — A relative measure of how difficult it is to find a new block. The difficulty is adjusted periodically as a function of how much hashing power has been deployed by the network of miners. The Bitcoin network has a global block difficulty. Valid blocks must have a hash below this target. Mining pools also have a pool-specific share difficulty setting a lower limit for shares. https://en.bitcoin.it/wiki/Difficulty
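  12. Next Difficulty — The estimated difficulty after the next periodic adjustment. https://bitcoinwisdom.com/bitcoin/difficulty
  13. Network Hash-rate Terahashs — The estimated number of terahashes per second (trillions of hashes per second) the Bitcoin network is performing.
  14. Network Hash-rate PetaFLOPS — The network’s estimated computing power expressed in PetaFLOPS, converted from the hash rate. http://bitcoin.sipa.be/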
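Difficulty, hash rate and block production are tied together. According to the Bitcoin wiki’s Difficulty page, finding a block at difficulty D takes roughly D × 2^32 hashes on average, so Difficulty and the network hash rate together imply how many blocks per hour to expect. Below is a minimal sketch of that relationship; the function names and the sample difficulty are hypothetical and not part of the dataset.

```python
# Expected number of hashes to find one block at difficulty D is roughly
# D * 2**32 (approximation from https://en.bitcoin.it/wiki/Difficulty).
HASHES_PER_DIFFICULTY = 2**32

def expected_blocks_per_hour(difficulty: float, hashrate_terahashes: float) -> float:
    """Approximate number of blocks the network finds per hour (hypothetical helper)."""
    hashes_per_second = hashrate_terahashes * 1e12  # 1 TH/s = 10**12 hashes per second
    hashes_per_block = difficulty * HASHES_PER_DIFFICULTY
    return 3600 * hashes_per_second / hashes_per_block

def implied_hashrate_terahashes(difficulty: float, block_interval_s: float = 600.0) -> float:
    """Hash rate in TH/s that yields one block every block_interval_s seconds (hypothetical helper)."""
    return difficulty * HASHES_PER_DIFFICULTY / block_interval_s / 1e12

# Hypothetical difficulty value, chosen only for illustration.
d = 1_000_000_000_000.0
h = implied_hashrate_terahashes(d)
print(round(h, 1), round(expected_blocks_per_hour(d, h), 6))
```

With the hash rate implied by a 10-minute block interval, the sketch recovers about six blocks per hour, which is the scale of the Blocks avg. per hour attribute.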

EDA

Data Types:
Date                            object
Total BTC                      float64
Market Cap                     float64
Transactions last 24h          float64
Transactions avg. per hour     float64
Bitcoins sent last 24h         float64
Bitcoins sent avg. per hour    float64
Count                          float64
Blocks last 24h                float64
Blocks avg. per hour           float64
Difficulty                     float64
Next Difficulty                float64
Network Hashrate Terahashs     float64
Network Hashrate PetaFLOPS     float64
dtype: object
There are 101 missing values in each of Transactions last 24h, Transactions avg. per hour, Bitcoins sent last 24h and Bitcoins sent avg. per hour.
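A data-type listing and missing-value counts like these are typically produced with pandas. The sketch below assumes the data is loaded into a pandas DataFrame; the toy frame and its values are made up and stand in for the Quandl data.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the Quandl Bitcoin data (values are made up).
df = pd.DataFrame({
    "Market Cap": [1.0e9, 1.1e9, 1.2e9, 1.3e9],
    "Transactions last 24h": [60000.0, np.nan, 65000.0, np.nan],
    "Transactions avg. per hour": [2500.0, np.nan, 2708.3, np.nan],
})

# Column types, in the same form as the dtypes listing above.
print(df.dtypes)

# Count missing values per column and keep only the columns that have gaps.
missing = df.isna().sum()
print(missing[missing > 0])
```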

Further, the summary statistics reveal the following date ranges:

  1. 11th November 2016 to 28th November 2016
  2. 15th September 2016 to 29th October 2016
  3. 12th May 2016 to 26th May 2016
  4. 14th July 2015, 16th July 2015 and 18th July 2015
  5. 15th April 2014 and 16th April 2014
  6. 22nd February 2014 and 23rd February 2014
  7. 2nd January 2014 and 2nd February 2014

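During these periods, Total BTC, Count, Blocks last 24h, Blocks avg. per hour, Difficulty, Next Difficulty, Network Hashrate Terahashs and Network Hashrate PetaFLOPS stay the same from one day to the next, and the block values are zero, even though Market Cap keeps changing. This suggests the network statistics were not updated on those days.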
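Stale stretches like these can be flagged programmatically. Here is a minimal sketch, assuming the rows are in date order and using only a subset of the network columns; the toy values are made up, and the rule (counters unchanged, zero blocks, Market Cap moving) is our own heuristic rather than the post’s exact procedure.

```python
import pandas as pd

# Toy data (made up): from row 2 on, the network counters stop changing and
# Blocks last 24h drops to zero, while Market Cap keeps moving.
df = pd.DataFrame({
    "Market Cap": [5.0e9, 5.2e9, 5.5e9, 5.1e9],
    "Total BTC": [12_600_000.0, 12_604_000.0, 12_604_000.0, 12_604_000.0],
    "Count": [295_000.0, 295_150.0, 295_150.0, 295_150.0],
    "Blocks last 24h": [150.0, 150.0, 0.0, 0.0],
})

network_cols = ["Total BTC", "Count"]

# A row looks stale when the cumulative counters did not move since the
# previous row, no blocks were reported, yet Market Cap still changed.
unchanged = df[network_cols].diff().eq(0).all(axis=1)
no_blocks = df["Blocks last 24h"].eq(0)
cap_moved = df["Market Cap"].diff().ne(0)
stale = unchanged & no_blocks & cap_moved

print(df.index[stale].tolist())
```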

Feature Engineering

We add three more features to see whether they help in modeling Bitcoin’s market capitalization. These features are:

  1. Total transactions per day
  2. Total Bitcoins sent in a day
  3. Total Blocks in a day
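The post does not say how these three features were computed. One plausible construction, shown here purely as an assumption, is to scale each hourly average to a full day. The two column names other than ‘Total transactions per day’ are hypothetical.

```python
import pandas as pd

def add_daily_totals(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with three per-day total columns.

    ASSUMPTION: each daily total is the hourly average multiplied by 24;
    the original post does not state how these features were derived.
    """
    out = df.copy()
    out["Total transactions per day"] = out["Transactions avg. per hour"] * 24
    out["Total bitcoins sent per day"] = out["Bitcoins sent avg. per hour"] * 24  # hypothetical name
    out["Total blocks per day"] = out["Blocks avg. per hour"] * 24  # hypothetical name
    return out

# Made-up values for illustration only.
sample = pd.DataFrame({
    "Transactions avg. per hour": [2500.0, 2600.0],
    "Bitcoins sent avg. per hour": [10000.0, 12000.0],
    "Blocks avg. per hour": [6.0, 6.5],
})
result = add_daily_totals(sample)
print(result)
```

Returning a copy keeps the raw columns untouched, so the engineered features can be compared against the originals.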

After analyzing the data, we found that ‘Total BTC’, ‘Transactions last 24h’, ‘Transactions avg. per hour’, ‘Count’, ‘Difficulty’, ‘Next Difficulty’, ‘Network Hashrate Terahashs’, ‘Network Hashrate PetaFLOPS’ and ‘Total transactions per day’ are related to ‘Market Cap’.
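The specific analysis is not shown. A common first screen is to rank features by their Pearson correlation with the target. The sketch below does that on synthetic data (a feature that tracks Market Cap plus a pure-noise column); it illustrates the idea and is not necessarily the method used here.

```python
import numpy as np
import pandas as pd

def correlations_with_target(df: pd.DataFrame, target: str = "Market Cap") -> pd.Series:
    """Pearson correlation of every numeric column with the target, strongest first."""
    corr = df.select_dtypes("number").corr()[target].drop(target)
    return corr.reindex(corr.abs().sort_values(ascending=False).index)

# Synthetic data (made up): one feature tracks the target, one is noise.
rng = np.random.default_rng(0)
n = 200
total_btc = np.linspace(11e6, 16e6, n)
df = pd.DataFrame({
    "Total BTC": total_btc,
    "Noise": rng.normal(size=n),
    "Market Cap": 300.0 * total_btc + rng.normal(scale=1e8, size=n),
})
print(correlations_with_target(df))
```

Sorting by absolute value keeps strongly negative relationships near the top as well.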

We keep these columns as the model’s inputs. Market Cap is our target variable, since it gives a clear picture of how Bitcoin is doing each day.

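We fill any missing (NaN, Not a Number) values with -99999 so that the algorithm sees them as extreme outliers instead of failing on them. Keep in mind, though, that linear regression does not discard outliers on its own, so these sentinel values can still pull on the fitted coefficients; dropping or interpolating the affected rows is a safer alternative. We also need to scale the features, because their magnitudes differ widely from column to column.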
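Here is a minimal sketch of both steps, assuming a pandas DataFrame of features. It standardizes each column to mean 0 and standard deviation 1 (the same transformation scikit-learn’s StandardScaler applies). The helper name and sample values are hypothetical.

```python
import numpy as np
import pandas as pd

def prepare_features(df: pd.DataFrame, fill_value: float = -99999.0) -> pd.DataFrame:
    """Fill NaNs with a sentinel, then standardize each column (hypothetical helper)."""
    filled = df.fillna(fill_value)
    # Population standard deviation; constant columns get 1 to avoid dividing by zero.
    std = filled.std(ddof=0).replace(0, 1.0)
    return (filled - filled.mean()) / std

# Made-up values for illustration only.
sample = pd.DataFrame({
    "Difficulty": [5.0e11, 6.0e11, 7.0e11],
    "Transactions avg. per hour": [2500.0, np.nan, 2700.0],
})
scaled = prepare_features(sample)
print(scaled.round(3))
```

In the output, the sentinel row dominates the scaled Transactions column, a concrete view of how -99999 distorts the data a linear model sees.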

Compile The Model, Fit The Data

We use linear regression to model Bitcoin’s market capitalization, fitting it on all of the selected features (‘Total BTC’, ‘Transactions last 24h’, ‘Transactions avg. per hour’, ‘Count’, ‘Difficulty’, ‘Next Difficulty’, ‘Network Hashrate Terahashs’, ‘Network Hashrate PetaFLOPS’, ‘Total transactions per day’).
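The library used for the fit is not shown. The sketch below fits an ordinary least-squares model with an intercept using numpy.linalg.lstsq, so it needs only NumPy and pandas; scikit-learn’s LinearRegression (with its intercept_ and coef_ attributes) performs the same kind of fit. The helper names and the synthetic check are hypothetical.

```python
import numpy as np
import pandas as pd

def fit_linear_regression(X: pd.DataFrame, y: pd.Series):
    """Ordinary least squares with an intercept, solved via numpy.linalg.lstsq.

    Returns the intercept and a data frame of features and estimated coefficients.
    """
    design = np.column_stack([np.ones(len(X)), X.to_numpy(dtype=float)])
    params, *_ = np.linalg.lstsq(design, y.to_numpy(dtype=float), rcond=None)
    coefs = pd.DataFrame({"feature": X.columns, "estimated_coefficient": params[1:]})
    return params[0], coefs

def predict(X: pd.DataFrame, intercept: float, coefs: pd.DataFrame) -> np.ndarray:
    """Apply the fitted linear model to a feature matrix."""
    return intercept + X.to_numpy(dtype=float) @ coefs["estimated_coefficient"].to_numpy()

# Synthetic check (made up): y = 10 + 2*a - 3*b exactly.
X = pd.DataFrame({"a": [0.0, 1.0, 2.0, 3.0, 4.0], "b": [1.0, 0.0, 2.0, 1.0, 3.0]})
y = 10 + 2 * X["a"] - 3 * X["b"]
intercept, coefs = fit_linear_regression(X, y)
print(round(intercept, 6))
print(coefs.round(6))
```

On this noiseless example, the fit recovers the intercept 10 and the coefficients 2 and -3.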

Estimated intercept coefficient:  14493862.65

Data frame with features and estimated coefficients

As the data frame shows, Total BTC (coefficient 4.5), Transactions avg. per hour and Network Hashrate PetaFLOPS carry the largest coefficients, so they are the features most strongly tied to Market Cap in this model. Next, let’s make a scatter plot of the actual Market Cap against the actual Total BTC.

Below, we make a scatter plot comparing the actual Market Cap with the predicted Market Cap for the first 200 data points.

Notice that the predictions become less accurate as market capitalization decreases.
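To check this kind of observation numerically rather than by eye, one option is to compute R² and RMSE, and then look at the relative error within bands of the true value. The sketch below does that; the quantile binning and the made-up predictions (a constant absolute error) are our own choices for illustration, not the post’s evaluation.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return R^2 and RMSE for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot, float(np.sqrt(np.mean(residuals ** 2)))

def relative_error_by_level(y_true, y_pred, n_bins=4):
    """Median absolute percentage error within quantile bins of the true value, lowest bin first."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rel_err = np.abs(y_true - y_pred) / np.abs(y_true)
    edges = np.quantile(y_true, np.linspace(0, 1, n_bins + 1))
    bins = np.clip(np.searchsorted(edges, y_true, side="right") - 1, 0, n_bins - 1)
    return [float(np.median(rel_err[bins == b])) for b in range(n_bins)]

# Made-up example: the same absolute error everywhere.
y_true = np.arange(1, 9) * 1e9
y_pred = y_true + 0.5e9
r2, rmse = regression_metrics(y_true, y_pred)
print(round(r2, 3), rmse)
print(relative_error_by_level(y_true, y_pred))
```

Even with a constant absolute error, the relative error is largest in the lowest band, which is one reason a linear fit can look worse at smaller market capitalizations.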