Installing XGBoost for Windows

Installing XGBoost for Windows – walk-through

Posted in Data Analysis Resources, Machine Learning, scikit-learn

My computer has the following setup: Windows 10 (64-bit), Python 3.5, and Anaconda3. I tried many times to install XGBoost, but somehow it never worked for me. Today I decided to make it happen, and I am sharing this post to help anyone else who is struggling to install XGBoost on Windows.

XGBoost is short for “Extreme Gradient Boosting”. It is an optimized, distributed gradient boosting system designed to be highly efficient, flexible, and portable. It implements machine learning algorithms under the gradient boosting framework.

XGBoost provides parallel tree boosting (also known as GBDT or GBM) that solves many data science problems quickly and accurately. It is one of the packages most frequently used to win machine learning challenges.

Reference Paper

First, I followed the steps from this discussion on Stack Overflow:

  1. Download and install MinGW-64:
  2. On the first screen of the install prompt, make sure you set the Architecture to x86_64 and the Threads to win32
  3. I installed to C:\mingw64 (to avoid spaces in the file path), so I added this to my PATH environment variable: C:\mingw64\mingw64\bin
  4. I also noticed that the make utility included in mingw64\bin is called mingw32-make, so to simplify things I renamed it to make
  5. Open a Windows command prompt and type gcc. You should see something like “fatal error: no input file”
  6. Next type make. You should see something like “No targets specified and no makefile found”
  7. Type git. If you don’t have git, install it and add it to your PATH.
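
Steps 5 to 7 above can also be checked from Python. This is a minimal sketch using only the standard library; the helper name `find_tools` is hypothetical, and it only reports whether each executable is found on PATH, not whether it actually works. If you did not rename mingw32-make, look for that name instead of make.

```python
import shutil


def find_tools(names=("gcc", "make", "git")):
    """Map each tool name to its full path on PATH, or None if it is missing."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for tool, path in find_tools().items():
        print("%-5s %s" % (tool, path or "NOT FOUND - check your PATH"))
```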

To get the source code run these lines:

  1. cd c:\
  2. git clone --recursive
  3. cd xgboost
  4. git submodule init
  5. git submodule update
  6. cp make/
  7. make -j4

To install the Python package, do the following:

  1. cd python-package
  2. python install

This worked in my standard Python interpreter. However, when I tried it in Anaconda, it failed to import the xgboost module.
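
One likely explanation (an assumption, since the cause was not investigated here) is that the plain Python interpreter and Anaconda are separate installations, each with its own site-packages directory. The sketch below uses only the standard library to show which interpreter is running and whether it can locate a module; the helper name `module_available` is hypothetical.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the running interpreter can locate the named top-level module."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("xgboost importable:", module_available("xgboost"))
```

Running it once from the regular command prompt and once from the Anaconda Prompt should show two different interpreter paths if the installations are indeed separate.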

Here is another excellent post which I followed to finish the installation.

For Anaconda, simply open the Anaconda Prompt and go to C:\xgboost\python-package, the python-package directory of XGBoost. Then type C:\xgboost\python-package>python install

Next, we open a Jupyter notebook and add the path to the g++ runtime libraries to the PATH environment variable with:

import os

# raw string, so backslash sequences such as \x86 are not read as escape codes
mingw_path = r'C:\Program Files\mingw-w64\x86_64-5.3.0-posix-seh-rt_v4-rev0\mingw64\bin'

os.environ['PATH'] = mingw_path + ';' + os.environ['PATH']
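
Re-running the notebook cell above prepends the same directory again each time. Here is a minimal sketch of a variant that skips the directory if it is already listed; `prepend_to_path` is a hypothetical helper, and it assumes the Windows separator `;` by default, as in the snippet above. The demo uses a plain dictionary instead of the real environment.

```python
import os


def prepend_to_path(directory, env=None, sep=";"):
    """Put `directory` at the front of env['PATH'] unless it is already listed."""
    env = os.environ if env is None else env
    # exact string comparison; Windows paths that differ only in case are not detected
    entries = [p for p in env.get("PATH", "").split(sep) if p]
    if directory not in entries:
        entries.insert(0, directory)
    env["PATH"] = sep.join(entries)
    return env["PATH"]


demo_env = {"PATH": r"C:\Windows"}
prepend_to_path(r"C:\mingw64\mingw64\bin", demo_env)
prepend_to_path(r"C:\mingw64\mingw64\bin", demo_env)  # second call changes nothing
print(demo_env["PATH"])
```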

Now let us try XGBoost for a simple tutorial.

I followed the tutorial.

Download this dataset and place it into your current working directory with the file name “pima-indians-diabetes.csv”.

Importing the classes and functions

In [1]:
import numpy
import xgboost
# sklearn.cross_validation was removed in scikit-learn 0.20; train_test_split now lives in model_selection
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

Load the CSV file as a NumPy array

In [2]:
# load data
dataset = numpy.loadtxt('pima-indians-diabetes.csv', delimiter=",")

Separate the columns (attributes or features) of the dataset into input patterns (X) and output patterns (Y)

In [3]:
# split data into X (first 8 columns) and Y (last column)
X = dataset[:,0:8]
Y = dataset[:,8]

Split the X and Y data into a training and test dataset

In [4]:
# split data into train and test sets
seed = 7
test_size = 0.33
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=test_size, random_state=seed)
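
To make the split concrete, here is a standard-library sketch of a seeded hold-out split. It is not scikit-learn's implementation and will not select the same rows; the helper `holdout_split` and the 100-row toy data are hypothetical. Rounding the test count up is a choice made for this sketch.

```python
import math
import random


def holdout_split(rows, test_size=0.33, seed=7):
    """Shuffle row indices with a fixed seed and split them into train and test lists."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # same seed gives the same shuffle every run
    n_test = math.ceil(len(rows) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


toy_rows = list(range(100))  # stand-in for the rows of a dataset
train, test = holdout_split(toy_rows)
print(len(train), len(test))
```

Fixing the seed is what makes the experiment repeatable: calling the function again with the same arguments returns the same two lists.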

Train the XGBoost Model

XGBoost provides a wrapper class that allows models to be treated like classifiers or regressors in the scikit-learn framework. The XGBoost model for classification is called XGBClassifier.

In [7]:
# fit model on training data
model = xgboost.XGBClassifier()
model.fit(X_train, y_train)
XGBClassifier(base_score=0.5, colsample_bylevel=1, colsample_bytree=1,
       gamma=0, learning_rate=0.1, max_delta_step=0, max_depth=3,
       min_child_weight=1, missing=None, n_estimators=100, nthread=-1,
       objective='binary:logistic', reg_alpha=0, reg_lambda=1,
       scale_pos_weight=1, seed=0, silent=True, subsample=1)

You can learn more about the defaults for the XGBClassifier and XGBRegressor classes in the XGBoost Python scikit-learn API.

You can learn more about the meaning of each parameter and how to configure them on the XGBoost parameters page.
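
As a rough illustration of overriding those defaults, the sketch below passes a few of the parameters that appear in the XGBClassifier printout above. The values are arbitrary examples, not tuned recommendations, and the import is guarded so the snippet still runs where xgboost is not installed.

```python
# Example values only; the parameter names come from the XGBClassifier printout above.
params = {
    "max_depth": 4,
    "learning_rate": 0.05,
    "n_estimators": 200,
    "subsample": 0.8,
}

try:
    import xgboost
except ImportError:
    xgboost = None  # lets the sketch run without the package

# Build the classifier only when the package is available.
model = xgboost.XGBClassifier(**params) if xgboost is not None else None
if model is not None:
    print(model.get_params()["max_depth"])
```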

Make Predictions with XGBoost Model

In XGBoost's native API, a model trained with the binary:logistic objective outputs probabilities: each prediction is the probability that the input pattern belongs to the positive class (label 1). The scikit-learn wrapper used here, XGBClassifier, already converts these to class labels in predict(), so rounding the values to 0 or 1 below simply guarantees plain binary class values.

In [8]:
# make predictions for test data
y_pred = model.predict(X_test)
predictions = [round(value) for value in y_pred]
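
If you work with raw probabilities (for example from `predict_proba`), an explicit threshold states the intent more clearly than `round`. One detail worth knowing is that Python 3's built-in `round` rounds halves to the nearest even integer, so `round(0.5)` is 0. Here is a minimal sketch with made-up probabilities; `to_labels` is a hypothetical helper.

```python
def to_labels(probabilities, threshold=0.5):
    """Convert positive-class probabilities into 0/1 labels using an explicit cutoff."""
    return [1 if p >= threshold else 0 for p in probabilities]


probs = [0.1, 0.5, 0.9]             # made-up values for illustration
print(to_labels(probs))             # [0, 1, 1]
print([round(p) for p in probs])    # [0, 0, 1] because round(0.5) == 0
```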

Evaluate the performance

In [9]:
# evaluate predictions
accuracy = accuracy_score(y_test, predictions)
print("Accuracy: %.2f%%" % (accuracy * 100.0))
Accuracy: 77.95%
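
For reference, the accuracy reported here is simply the fraction of test rows whose predicted label matches the true label. Here is a plain-Python sketch of that calculation (not scikit-learn's code; the labels below are made up):

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


# 3 of 4 made-up labels match
print("Accuracy: %.2f%%" % (accuracy([1, 0, 1, 1], [1, 0, 0, 1]) * 100.0))  # Accuracy: 75.00%
```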

I hope this blog post helps other Windows users. I plan to use XGBoost in my future machine learning endeavors.

Do you have any questions about XGBoost or about this post? Ask your questions in the comments and I will do my best to answer.
