First Kaggle Competition


This tutorial is from:

![](/galleries/Kaggle1st/1.jpg?raw=true)

Recently, I entered a Kaggle data science competition and ranked 342nd out of almost 800 competitors. Pretty impressive, eh? Here's how I got there.

First, we build the learning algorithm and write the submission file itself.

In [4]:
from sklearn.ensemble import RandomForestClassifier
from numpy import genfromtxt, savetxt

def main():
    # create the training & test sets; skip_header=1 drops the CSV header row
    dataset = genfromtxt('../output/files/Data/train.csv', delimiter=',', dtype='f8', skip_header=1)
    target = dataset[:, 0]   # first column: the label to predict
    train = dataset[:, 1:]   # remaining columns: the features

    test = genfromtxt('../output/files/Data/test.csv', delimiter=',', dtype='f8', skip_header=1)

    # create and train the random forest
    # multi-core CPUs can use: rf = RandomForestClassifier(n_estimators=100, n_jobs=2)
    rf = RandomForestClassifier(n_estimators=100)
    rf.fit(train, target)

    # column 1 of predict_proba is the probability of class 1; ids start at 1
    predicted_probs = [[index + 1, x[1]] for index, x in enumerate(rf.predict_proba(test))]

    savetxt('../output/files/Data/submission.csv', predicted_probs, delimiter=',', fmt='%d,%f',
            header='MoleculeId,PredictedProbability', comments='')


if __name__ == "__main__":
    main()
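To make the data handling in that cell concrete without needing the competition CSVs, here is a minimal stdlib-only sketch of the same two bookkeeping steps: splitting each training row into a label (column 0) and features, and turning per-class probabilities into `MoleculeId,PredictedProbability` lines with 1-based ids. The helper names `split_rows`, `submission_rows` and `write_submission` are hypothetical, and the numbers are made up; the random forest itself is not reproduced here.

```python
import csv
import io


def split_rows(rows):
    """Split parsed training rows into (target, features).

    Mirrors the layout used above: column 0 is the label and the
    remaining columns are the features.
    """
    target = [row[0] for row in rows]
    features = [row[1:] for row in rows]
    return target, features


def submission_rows(class_probs):
    """Pair each test row's 1-based position with P(class == 1).

    class_probs is a list of [p_class0, p_class1] pairs, the same
    shape predict_proba returns for a two-class problem.
    """
    return [(index + 1, probs[1]) for index, probs in enumerate(class_probs)]


def write_submission(rows, handle):
    """Write the header, then one 'id,probability' line per test row."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["MoleculeId", "PredictedProbability"])
    for molecule_id, prob in rows:
        # '%f'-style formatting, matching fmt='%d,%f' in the savetxt call
        writer.writerow([molecule_id, f"{prob:f}"])


# Usage with made-up numbers (not real competition data):
train_rows = [[1.0, 0.2, 0.5], [0.0, 0.7, 0.1]]
target, features = split_rows(train_rows)

fake_probs = [[0.9, 0.1], [0.25, 0.75]]
buffer = io.StringIO()
write_submission(submission_rows(fake_probs), buffer)
print(buffer.getvalue())
```

Only the second probability column is written because the submission asks for the probability of the positive class. Writing to an `io.StringIO` buffer keeps the sketch self-contained; passing an open file handle instead would produce the same CSV on disk.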

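Running the cell calls `main()` through the `if __name__ == "__main__":` guard, which writes `submission.csv`. Next, a quick check that inline plotting works in the notebook: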

In [6]:
#Testing pylab
%pylab inline
Populating the interactive namespace from numpy and matplotlib
In [14]:
import numpy as np
import matplotlib.pyplot as plt
In [15]:
x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x))
Out[15]:
[<matplotlib.lines.Line2D at 0x105dc9510>]