My setup is Windows 10 (64-bit), Python 3.5, and Anaconda3. I tried many times to install XGBoost, but it never worked for me. Today I decided to make it happen, and I am sharing this post to help anyone else who is struggling to install XGBoost on Windows.
XGBoost is short for “Extreme Gradient Boosting”. It is an optimized distributed gradient boosting system designed to be highly efficient, flexible, and portable. It implements machine learning algorithms under the Gradient Boosting framework.
XGBoost provides parallel tree boosting (also known as GBDT or GBM) that solves many data science problems quickly and accurately. It is one of the packages most frequently used to win machine learning challenges.
First, I followed the steps from this discussion on Stack Overflow:
- Download and install MinGW-64: http://sourceforge.net/projects/mingw-w64/
- On the first screen of the install prompt, make sure you set the Architecture to x86_64 and the Threads to win32
- I installed to C:\mingw64 (to avoid spaces in the file path) so I added this to my PATH environment variable: C:\mingw64\mingw64\bin
- I also noticed that the make utility included in mingw64\bin is called mingw32-make, so to simplify things I renamed it to make
- Open a Windows command prompt and type gcc. You should see something like “fatal error: no input file”
- Next type make. You should see something like “No targets specified and no makefile found”
- Type git. If you don’t have git, install it and add it to your PATH.
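The three checks above can also be scripted. The following is a minimal sketch, assuming Python is already installed; the helper name `find_tools` is hypothetical, and it only reports whether each executable is on PATH, not whether it runs correctly.

```python
import shutil


def find_tools(names=("gcc", "make", "git")):
    """Map each tool name to its resolved location on PATH, or None if missing."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for tool, location in find_tools().items():
        print("%-4s -> %s" % (tool, location or "NOT FOUND on PATH"))
```

Any tool reported as missing needs to be installed or added to PATH before continuing.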
To get the source code run these lines:
- cd c:\
- git clone --recursive https://github.com/dmlc/xgboost
- cd xgboost
- git submodule init
- git submodule update
- cp make/mingw64.mk config.mk
- make -j4
To install the Python package, do the following:
- cd python-package
- python setup.py install
This worked in my standard Python interpreter. However, when I tried it in Anaconda, importing the xgboost module failed.
For Anaconda, open the Anaconda Prompt and go to C:\xgboost\python-package, the python-package directory of XGBoost. Then run: C:\xgboost\python-package>python setup.py install
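When an import works in one interpreter but not in another, it helps to confirm which Python is actually running. The following is a minimal sketch using only the standard library; `describe_install` is a hypothetical helper name. Running it from both the plain Python interpreter and the Anaconda Prompt lets you compare the results.

```python
import importlib.util
import sys


def describe_install(module_name="xgboost"):
    """Report the running interpreter and where (if anywhere) a module is found."""
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,
        "module": module_name,
        "location": spec.origin if spec is not None else None,
    }


info = describe_install()
print("Interpreter:", info["interpreter"])
print("xgboost found at:", info["location"] or "not installed for this interpreter")
```

This only locates the package without importing it, so it will not catch a package that is present but fails to load its compiled library.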
Next, open a Jupyter notebook and add the path to the g++ runtime libraries to the PATH environment variable:
import os
# raw string so the backslashes in the Windows path are not read as escape sequences
mingw_path = r'C:\Program Files\mingw-w64\x86_64-5.3.0-posix-seh-rt_v4-rev0\mingw64\bin'
os.environ['PATH'] = mingw_path + ';' + os.environ['PATH']
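If the directory does not exist, the PATH change has no useful effect and the later import fails with a less obvious error, so a small guard can help. This is a sketch that assumes the MinGW bin directory is wherever it was installed on your machine; `prepend_to_path` is a hypothetical helper, not part of XGBoost.

```python
import os


def prepend_to_path(directory, environ=os.environ):
    """Put directory at the front of PATH if it exists and is not already listed."""
    if not os.path.isdir(directory):
        raise FileNotFoundError("No such directory: %s" % directory)
    entries = environ.get("PATH", "").split(os.pathsep)
    if directory not in entries:
        # Drop empty entries so a blank PATH does not leave a trailing separator.
        environ["PATH"] = os.pathsep.join([directory] + [e for e in entries if e])
    return environ["PATH"]
```

In the notebook this would be called as `prepend_to_path(mingw_path)` before importing xgboost.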
Importing the classes and functions
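import numpy
import xgboost
# sklearn.cross_validation was removed in scikit-learn 0.20; newer versions provide train_test_split in sklearn.model_selection
from sklearn import cross_validation
from sklearn.metrics import accuracy_score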
Load the CSV file as a NumPy array
# load data
dataset = numpy.loadtxt('pima-indians-diabetes.csv', delimiter=",")
Separate the columns (attributes or features) of the dataset into input patterns (X) and output patterns (Y)
# split data into X and y
X = dataset[:,0:8]
Y = dataset[:,8]
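To make the slicing concrete: each row holds nine comma-separated numbers, the first eight being inputs and the last the class label. The sketch below performs the same split with only the standard library on two made-up rows (the values are illustrative and not taken from the dataset); `split_features_labels` is a hypothetical helper.

```python
import csv
import io

# Two made-up rows in the same nine-column layout; not real dataset values.
SAMPLE = "1,2,3,4,5,6,7,8,0\n9,8,7,6,5,4,3,2,1\n"


def split_features_labels(text, n_features=8):
    """Parse CSV text and split each row into input values and a label."""
    rows = [[float(v) for v in row] for row in csv.reader(io.StringIO(text)) if row]
    X = [row[:n_features] for row in rows]  # counterpart of dataset[:, 0:8]
    Y = [row[n_features] for row in rows]   # counterpart of dataset[:, 8]
    return X, Y


X, Y = split_features_labels(SAMPLE)
print(len(X), "rows,", len(X[0]), "features each; labels:", Y)
```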
Split the X and Y data into a training and test dataset
# split data into train and test sets
seed = 7
test_size = 0.33
X_train, X_test, y_train, y_test = cross_validation.train_test_split(X, Y, test_size=test_size, random_state=seed)
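The idea behind a seeded split can be sketched in a few lines: shuffle the row indices with a fixed seed, then hold out a fraction of them. This only illustrates the concept. It is not scikit-learn's implementation and will not select the same rows as train_test_split; the function name and the rounding rule are assumptions.

```python
import random


def seeded_split(n_rows, test_size=0.33, seed=7):
    """Return (train_indices, test_indices) from a reproducible shuffle."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # same seed -> same order on every run
    n_test = int(round(n_rows * test_size))
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = seeded_split(100)
print(len(train_idx), "training rows,", len(test_idx), "test rows")
```

Fixing the seed is what makes the accuracy figure reproducible from one run to the next.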
Train the XGBoost Model
XGBoost provides a wrapper class to allow models to be treated like classifiers or regressors in the scikit-learn framework. The XGBoost model for classification is called XGBClassifier.
# fit model on training data
model = xgboost.XGBClassifier()
model.fit(X_train, y_train)
print(model)
XGBClassifier(base_score=0.5, colsample_bylevel=1, colsample_bytree=1, gamma=0, learning_rate=0.1, max_delta_step=0, max_depth=3, min_child_weight=1, missing=None, n_estimators=100, nthread=-1, objective='binary:logistic', reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=0, silent=True, subsample=1)
Make Predictions with XGBoost Model
With the binary:logistic objective, the native XGBoost API returns probabilities, each being the probability that the input pattern belongs to the positive class (label 1). The scikit-learn wrapper's predict method already returns class labels, so rounding the values to 0 or 1 leaves them unchanged and acts only as a safeguard.
# make predictions for test data
y_pred = model.predict(X_test)
predictions = [round(value) for value in y_pred]
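When predictions really are probabilities, an explicit threshold is clearer than round, because Python 3 rounds exact halves to the nearest even integer, so round(0.5) is 0. The following is a minimal sketch; `to_labels` and the 0.5 cut-off are illustrative choices, not part of XGBoost.

```python
def to_labels(probabilities, threshold=0.5):
    """Convert probabilities of the positive class into 0/1 labels."""
    return [1 if p >= threshold else 0 for p in probabilities]


print(to_labels([0.1, 0.5, 0.9]))  # [0, 1, 1]
print(round(0.5))                  # 0 in Python 3
```

Raising the threshold trades fewer false positives for more false negatives, which round cannot express.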
Evaluate the performance
# evaluate predictions
accuracy = accuracy_score(y_test, predictions)
print("Accuracy: %.2f%%" % (accuracy * 100.0))
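For classification, accuracy is the fraction of predictions that match the true labels. The sketch below computes it by hand on made-up labels to show what the printed percentage means; `accuracy` here is a hypothetical helper, not the scikit-learn function.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy of an empty list")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Made-up labels for illustration: 3 of 4 predictions match.
print("Accuracy: %.2f%%" % (accuracy([0, 1, 1, 0], [0, 1, 0, 0]) * 100.0))  # Accuracy: 75.00%
```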
I hope this blog post helps other Windows users. I plan to use XGBoost in my future machine learning endeavors.
Do you have any questions about XGBoost or about this post? Ask your questions in the comments and I will do my best to answer.