First Kaggle Competition
This Tutorial is from:
![](/galleries/Kaggle1st/1.jpg)
Recently, I entered a Kaggle data science competition and ranked 342nd out of almost 800 other competitors. Pretty impressive, eh? Here's how I got there.
First, we train the learning algorithm and write the submission file.
In [4]:
from sklearn.ensemble import RandomForestClassifier
from numpy import genfromtxt, savetxt
def main():
    # Create the training & test sets, skipping the header row
    dataset = genfromtxt('../output/files/Data/train.csv', delimiter=',', dtype='f8', skip_header=1)
    # The first column is the target; the remaining columns are the features
    target = [x[0] for x in dataset]
    train = [x[1:] for x in dataset]
    test = genfromtxt('../output/files/Data/test.csv', delimiter=',', dtype='f8', skip_header=1)
    # Create and train the random forest
    # Multi-core CPUs can use: rf = RandomForestClassifier(n_estimators=100, n_jobs=2)
    rf = RandomForestClassifier(n_estimators=100)
    rf.fit(train, target)
    # Keep the probability of class 1 (second column of predict_proba), numbering rows from 1
    predicted_probs = [[index + 1, x[1]] for index, x in enumerate(rf.predict_proba(test))]
    savetxt('../output/files/Data/submission.csv', predicted_probs, delimiter=',', fmt='%d,%f',
            header='MoleculeId,PredictedProbability', comments='')
if __name__ == "__main__":
    main()
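To see what each step of `main()` produces without the Kaggle CSV files, here is a minimal sketch on synthetic data. The random arrays and the in-memory `io.StringIO` buffer are assumptions standing in for `train.csv`, `test.csv` and `submission.csv`. It also shows why the code above takes `x[1]`: `predict_proba` orders its columns like `rf.classes_`, so with 0/1 labels the second column is the probability of class 1.

```python
import io

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data standing in for train.csv / test.csv:
# column 0 is the 0/1 target, the remaining columns are features.
rng = np.random.default_rng(0)
features = rng.random((200, 5))
labels = (features[:, 0] > 0.5).astype(float)
dataset = np.column_stack([labels, features])
test = rng.random((10, 5))

# Same split as in main(): first column -> target, the rest -> features
target = dataset[:, 0]
train = dataset[:, 1:]

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(train, target)

# predict_proba returns one column per class, ordered like rf.classes_;
# look up the column for label 1.0 instead of hard-coding it.
probs = rf.predict_proba(test)
positive_col = list(rf.classes_).index(1.0)
rows = [[i + 1, p] for i, p in enumerate(probs[:, positive_col])]

# Write the submission to an in-memory buffer instead of a file,
# using the same fmt/header/comments arguments as main()
buffer = io.StringIO()
np.savetxt(buffer, rows, fmt='%d,%f',
           header='MoleculeId,PredictedProbability', comments='')
submission = buffer.getvalue()
print(submission.splitlines()[:3])
```

Looking the column up through `rf.classes_` is a small design choice: it gives the same result as `x[1]` for 0/1 targets but does not depend on the label order.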
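Running the cell above calls `main()` through the `if __name__ == "__main__":` guard and writes the submission file. Next, let's check that plotting works in the notebook: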
In [6]:
# Testing pylab: %pylab inline imports NumPy (as np) and matplotlib and shows plots inline
%pylab inline
In [14]:
import matplotlib.pyplot as plt
In [15]:
x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x))