AdaBoost

The AdaBoost (adaptive boosting) algorithm was proposed in 1995 by Yoav Freund and Robert Schapire as a general method for generating a strong classifier out of a set of weak classifiers. AdaBoost works even when the classifiers come from a continuum of potential classifiers (such as neural networks, linear discriminants, etc.).

AdaBoost

Pros: Low generalization error, easy to code, works with most classifiers, no parameters to adjust

Cons: Sensitive to outliers

Works with: Numeric values, nominal values

General approach to AdaBoost

  1. Collect: Any method.
  2. Prepare: It depends on which type of weak learner you’re going to use. In this chapter, we’ll use decision stumps, which can take any type of data. You could use any classifier, so any of the classifiers from chapters 2–6 would work. Simple classifiers work better for a weak learner.
  3. Analyze: Any method.
  4. Train: The majority of the time will be spent here. The classifier will train the weak learner multiple times over the same dataset.
  5. Test: Calculate the error rate.
  6. Use: Like support vector machines, AdaBoost predicts one of two classes.

Creating a weak learner with a decision stump

A decision stump is a simple decision tree. It's a tree with only one split, so it's a stump.
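
In code, a stump boils down to a single threshold test on one feature. A tiny made-up example (the feature index and threshold here are just for illustration; the real stumps are built further down):

def toyStump(x):
    # hypothetical stump: one split on feature 0 at a threshold of 1.5
    return 1.0 if x[0] > 1.5 else -1.0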

In [1]:
from numpy import *

def loadSimpData():
    datMat = matrix([[ 1. ,  2.1],
        [ 2. ,  1.1],
        [ 1.3,  1. ],
        [ 1. ,  1. ],
        [ 2. ,  1. ]])
    classLabels = [1.0, 1.0, -1.0, -1.0, 1.0]
    return datMat,classLabels
In [2]:
datMat,classLabels = loadSimpData()
In [3]:
def loadDataSet(fileName):      #general function to parse tab-delimited floats
    numFeat = len(open(fileName).readline().split('\t')) #get number of fields 
    dataMat = []; labelMat = []
    fr = open(fileName)
    for line in fr.readlines():
        lineArr =[]
        curLine = line.strip().split('\t')
        for i in range(numFeat-1):
            lineArr.append(float(curLine[i]))
        dataMat.append(lineArr)
        labelMat.append(float(curLine[-1]))
    return dataMat,labelMat

stumpClassify() tests whether a value in the given dimension is below or above the threshold and assigns the class label accordingly.

In [4]:
def stumpClassify(dataMatrix,dimen,threshVal,threshIneq):#just classify the data
    retArray = ones((shape(dataMatrix)[0],1))
    if threshIneq == 'lt':
        retArray[dataMatrix[:,dimen] <= threshVal] = -1.0
    else:
        retArray[dataMatrix[:,dimen] > threshVal] = -1.0
    return retArray
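
A quick sanity check on the toy data (not part of the original listing), classifying on feature 0 with a threshold of 1.5 and the 'lt' rule:

stumpClassify(datMat, 0, 1.5, 'lt')
# feature 0 is [1.0, 2.0, 1.3, 1.0, 2.0], so this returns the column vector [-1, 1, -1, -1, 1]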

buildStump() loops over every feature and threshold on a weighted version of the dataset and finds the stump that yields the lowest weighted error.

In [12]:
def buildStump(dataArr,classLabels,D):
    dataMatrix = mat(dataArr); labelMat = mat(classLabels).T
    m,n = shape(dataMatrix)
    numSteps = 10.0; bestStump = {}; bestClasEst = mat(zeros((m,1)))
    minError = inf #init error sum, to +infinity
    for i in range(n):#loop over all dimensions
        rangeMin = dataMatrix[:,i].min(); rangeMax = dataMatrix[:,i].max();
        stepSize = (rangeMax-rangeMin)/numSteps
        for j in range(-1,int(numSteps)+1):#loop over all range in current dimension
            for inequal in ['lt', 'gt']: #go over less than and greater than
                threshVal = (rangeMin + float(j) * stepSize)
                predictedVals = stumpClassify(dataMatrix,i,threshVal,inequal)#call stump classify with i, j, lessThan
                errArr = mat(ones((m,1)))
                errArr[predictedVals == labelMat] = 0
                #Line where AdaBoost interacts with the classifier.
                weightedError = D.T*errArr  #calc total error multiplied by D
                #print ("split: dim %d, thresh %.2f, thresh ineqal: %s, the weighted error is %.3f" % (i, threshVal, inequal, weightedError))
                if weightedError < minError:
                    minError = weightedError
                    bestClasEst = predictedVals.copy()
                    bestStump['dim'] = i
                    bestStump['thresh'] = threshVal
                    bestStump['ineq'] = inequal
    return bestStump,minError,bestClasEst
In [13]:
D = mat(ones((5,1))/5)
In [14]:
buildStump(datMat,classLabels,D)
Out[14]:
({'dim': 0, 'ineq': 'lt', 'thresh': 1.3}, matrix([[ 0.2]]), array([[-1.],
        [ 1.],
        [-1.],
        [-1.],
        [ 1.]]))

Implementing the full AdaBoost algorithm

For each iteration:

  1. Find the best stump using buildStump().
  2. Add the best stump to the stump array.
  3. Calculate alpha (see the formulas below).
  4. Calculate the new weight vector D.
  5. Update the aggregate class estimate.
  6. If the error rate == 0.0, break out of the for loop.
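
For reference, the alpha and weight-update steps above can be written in the notation of the code below, where epsilon is the stump's weighted error, y the column of true labels, and h_x the stump's predictions:

alpha = 0.5 * log((1.0 - epsilon) / epsilon)      # larger alpha for more accurate stumps
D = multiply(D, exp(-alpha * multiply(y, h_x)))   # up-weight misclassified points
D = D / D.sum()                                   # renormalize so the weights sum to 1

A misclassified point has y * h_x = -1, so its weight is multiplied by exp(alpha); correctly classified points are multiplied by exp(-alpha). This is exactly what the expon and D lines in the listing below compute.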
In [67]:
def adaBoostTrainDS(dataArr,classLabels,numIt=40):
    weakClassArr = []
    #m is the number of data points in the dataset
    m = shape(dataArr)[0]
    #D holds the weight of each piece of data
    D = mat(ones((m,1))/m)   #init D to all equal
    #aggregate estimate of the class for every data point
    aggClassEst = mat(zeros((m,1)))
    for i in range(numIt):
        bestStump,error,classEst = buildStump(dataArr,classLabels,D)#build Stump
        print("D:",D.T)
        alpha = float(0.5*log((1.0-error)/max(error,1e-16)))#calc alpha, throw in max(error,eps) to account for error=0
        bestStump['alpha'] = alpha
        weakClassArr.append(bestStump)                  #store Stump Params in Array
        print ("classEst: ",classEst.T)
        expon = multiply(-1*alpha*mat(classLabels).T,classEst) #exponent for D calc, getting messy
        D = multiply(D,exp(expon))                              #Calc New D for next iteration
        D = D/D.sum()
        #calc training error of all classifiers, if this is 0 quit for loop early (use break)
        aggClassEst += alpha*classEst
        print ("aggClassEst: ",aggClassEst.T)
        aggErrors = multiply(sign(aggClassEst) != mat(classLabels).T,ones((m,1)))
        errorRate = aggErrors.sum()/m
        print ("total error: ",errorRate)
        if errorRate == 0.0: break
    return weakClassArr,aggClassEst
In [60]:
classifierArray = adaBoostTrainDS(datMat,classLabels,9)
D: [[ 0.2  0.2  0.2  0.2  0.2]]
classEst:  [[-1.  1. -1. -1.  1.]]
aggClassEst:  [[-0.69314718  0.69314718 -0.69314718 -0.69314718  0.69314718]]
total error:  0.2
D: [[ 0.5    0.125  0.125  0.125  0.125]]
classEst:  [[ 1.  1. -1. -1. -1.]]
aggClassEst:  [[ 0.27980789  1.66610226 -1.66610226 -1.66610226 -0.27980789]]
total error:  0.2
D: [[ 0.28571429  0.07142857  0.07142857  0.07142857  0.5       ]]
classEst:  [[ 1.  1.  1.  1.  1.]]
aggClassEst:  [[ 1.17568763  2.56198199 -0.77022252 -0.77022252  0.61607184]]
total error:  0.0

Remember, our class labels were [1.0, 1.0, -1.0, -1.0, 1.0]. In the first iteration, all the D values were equal, and only one example, the first data point, was misclassified. So, in the next iteration, the D vector puts 0.5 weight on the first data point because it was misclassified previously. You can read off the overall class estimate by taking the sign of aggClassEst. After the second iteration, the first data point is correctly classified, but the last data point is now wrong. The D value becomes 0.5 for the last element, and the other values in the D vector are much smaller. Finally, in the third iteration the sign of every value in aggClassEst matches the class labels, the training error becomes 0, and the loop exits early.

In [61]:
classifierArray
Out[61]:
[{'alpha': 0.6931471805599453, 'dim': 0, 'ineq': 'lt', 'thresh': 1.3},
 {'alpha': 0.9729550745276565, 'dim': 1, 'ineq': 'lt', 'thresh': 1.0},
 {'alpha': 0.8958797346140273,
  'dim': 0,
  'ineq': 'lt',
  'thresh': 0.90000000000000002}]

Test: Classifying with AdaBoost

All you need to do is take the array of weak classifiers from the training function and apply them to an instance. The result of each weak classifier is weighted by its alpha, the weighted results are added together, and you take the sign of the final weighted sum to get the final answer.

In [62]:
def adaClassify(datToClass,classifierArr):
    dataMatrix = mat(datToClass)#do stuff similar to last aggClassEst in adaBoostTrainDS
    m = shape(dataMatrix)[0]
    aggClassEst = mat(zeros((m,1)))
    for i in range(len(classifierArr)):
        classEst =stumpClassify(dataMatrix,classifierArr[i]['dim'],
                                 classifierArr[i]['thresh'],
                                 classifierArr[i]['ineq'])#call stump classify
        aggClassEst += classifierArr[i]['alpha']*classEst
        print (aggClassEst)
    return sign(aggClassEst)

Note: adaBoostTrainDS() as defined above returns a tuple (weakClassArr, aggClassEst). For the adaClassify() call below to work with classifierArray, either unpack both return values when training or comment out the ,aggClassEst in the return statement.

In [63]:
adaClassify([[5, 5],[0,0]],classifierArray)
[[ 0.69314718]
 [-0.69314718]]
[[ 1.66610226]
 [-1.66610226]]
[[ 2.56198199]
 [-2.56198199]]
Out[63]:
matrix([[ 1.],
        [-1.]])

AdaBoost on a difficult dataset

In [64]:
datArr,labelArr = loadDataSet('horseColicTraining2.txt')
classifierArray = adaBoostTrainDS(datArr,labelArr,10)
D: [[ 0.00334448  0.00334448  0.00334448  0.00334448  0.00334448  0.00334448
   0.00334448  0.00334448  0.00334448  0.00334448  0.00334448  0.00334448
   0.00334448  0.00334448  0.00334448  0.00334448  0.00334448  0.00334448
....
....
....
0.00334448  0.00334448  0.00334448  0.00334448  0.00334448  0.00334448
   0.00334448  0.00334448  0.00334448  0.00334448  0.00334448  0.00334448
   0.00334448  0.00334448  0.00334448  0.00334448  0.00334448]]
classEst:  [[-1.  1.  1.  1.  1.  1.  1.  1. -1. -1.  1.  1.  1.  1.  1. -1. -1.  1.
   1.  1. -1.  1.  1.  1.  1.  1.  1.  1.  1.  1. -1.  1.  1.  1. -1. -1.
   1. -1.  1.  1. -1.  1.  1. -1. -1. -1. -1.  1.  1. -1.  1.  1.  1.  1.
....
....
....
-1. -1. -1.  1. -1. -1.  1.  1.  1.  1.  1.  1.  1.  1.  1. -1.  1.  1.
   1.  1.  1. -1.  1.  1.  1. -1. -1.  1.  1.]]
aggClassEst:  [[-0.46166238  0.46166238  0.46166238  0.46166238  0.46166238  0.46166238
   0.46166238  0.46166238 -0.46166238 -0.46166238  0.46166238  0.46166238
   0.46166238  0.46166238  0.46166238 -0.46166238 -0.46166238  0.46166238
....
....
0.46166238  0.46166238  0.46166238 -0.46166238  0.46166238  0.46166238
   0.46166238 -0.46166238 -0.46166238  0.46166238  0.46166238]]
total error:  0.284280936455
D: [[ 0.00233645  0.00588235  0.00233645  0.00588235  0.00588235  0.00233645
   0.00233645  0.00588235  0.00233645  0.00588235  0.00233645  0.00233645
   0.00233645  0.00588235  0.00233645  0.00233645  0.00233645  0.00233645
....
....
1.20719077  0.91726555 -0.10329743 -0.57967867  0.27123572  1.69342306
   0.05809528 -0.65208904 -1.02662219  0.27123572  0.5606821 ]]
total error:  0.230769230769
In [65]:
testArr,testLabelArr = loadDataSet('horseColicTest2.txt')
prediction10 = adaClassify(testArr,classifierArray)
[[ 0.46166238]
 [ 0.46166238]
 [-0.46166238]
...
...
...
[ 1.47791472]
 [ 0.80958618]
 [ 0.54030781]
 [ 0.5273375 ]]
In [66]:
errArr=mat(ones((67,1)))
errArr[prediction10!=mat(testLabelArr).T].sum()
Out[66]:
16.0
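
Out of the 67 test cases, 16 were misclassified, so the test error rate is 16/67, roughly 0.24. A minimal sketch of the same check, reusing prediction10 and testLabelArr from the cells above:

errRate = (prediction10 != mat(testLabelArr).T).sum() / float(len(testLabelArr))
print(errRate)   # about 0.239

The plotROC() function below draws the ROC curve for the classifier's prediction strengths (aggClassEst) and reports the area under the curve.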
In [54]:
def plotROC(predStrengths, classLabels):
    import matplotlib.pyplot as plt
    %matplotlib inline
    cur = (1.0,1.0) #cursor
    ySum = 0.0 #variable to calculate AUC
    numPosClas = sum(array(classLabels)==1.0)
    yStep = 1/float(numPosClas); xStep = 1/float(len(classLabels)-numPosClas)
    sortedIndicies = predStrengths.argsort()#get sorted index, it's reverse
    fig = plt.figure()
    fig.clf()
    ax = plt.subplot(111)
    #loop through all the values, drawing a line segment at each point
    for index in sortedIndicies.tolist()[0]:
        if classLabels[index] == 1.0:
            delX = 0; delY = yStep;
        else:
            delX = xStep; delY = 0;
            ySum += cur[1]
        #draw line from cur to (cur[0]-delX,cur[1]-delY)
        ax.plot([cur[0],cur[0]-delX],[cur[1],cur[1]-delY], c='b')
        cur = (cur[0]-delX,cur[1]-delY)
    ax.plot([0,1],[0,1],'b--')
    plt.xlabel('False positive rate'); plt.ylabel('True positive rate')
    plt.title('ROC curve for AdaBoost horse colic detection system')
    ax.axis([0,1,0,1])
    plt.show()
    print ("the Area Under the Curve is: ",ySum*xStep)
In [55]:
datArr,labelArr = loadDataSet('horseColicTraining2.txt')
In [57]:
classifierArray,aggClassEst = adaBoostTrainDS(datArr,labelArr,10)
D: [[ 0.00334448  0.00334448  0.00334448  0.00334448  0.00334448  0.00334448
   0.00334448  0.00334448  0.00334448  0.00334448  0.00334448  0.00334448
   0.00334448  0.00334448  0.00334448  0.00334448  0.00334448  0.00334448
   0.00334448  0.00334448  0.00334448  0.00334448  0.00334448  0.00334448
   0.00334448  0.00334448  0.00334448  0.00334448  0.00334448  0.00334448
....
....
....
 -0.01293737  1.53203035  0.95841088 -1.04249592  0.23749438  0.5606821
   1.20719077  0.91726555 -0.10329743 -0.57967867  0.27123572  1.69342306
   0.05809528 -0.65208904 -1.02662219  0.27123572  0.5606821 ]]
total error:  0.230769230769
In [58]:
plotROC(aggClassEst.T,labelArr)
the Area Under the Curve is:  0.8582969635063604

There are many ways to include cost information in classification algorithms. In AdaBoost, you can adjust the error weight vector D based on the cost function. In naïve Bayes, you could predict the class with the lowest expected cost instead of the class with the highest probability. In SVMs, you can use different C parameters in the cost function for the different classes. This gives more weight to the smaller class, so the classifier tolerates fewer errors on that class during training.
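
As a sketch of the first idea (this is not code from the chapter; costFN and costFP are made-up cost parameters), D can be initialized in proportion to a per-example misclassification cost, so that the expensive class dominates the weighted error that buildStump() minimizes:

def costWeightedD(classLabels, costFN=5.0, costFP=1.0):
    # weight each example by the cost of misclassifying it,
    # then normalize so the weights sum to 1, as AdaBoost expects
    labels = array(classLabels)
    costs = where(labels == 1.0, costFN, costFP)
    D = mat(costs.reshape(-1, 1))
    return D / D.sum()

buildStump() already accepts D directly; to use this weighting throughout training, adaBoostTrainDS() would have to start from this D instead of the uniform one, making errors on the positive class count costFN/costFP times as much as errors on the negative class.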
