Titanic Survival Prediction (Data Loading, Processing, and Modeling)

  • Introduction:

This article predicts whether a passenger survived the sinking of the Titanic, based on a classic Kaggle competition.

Dataset:

1. Download the data from the Kaggle Titanic competition page: https://www.kaggle.com/c/titanic

2. Baidu Netdisk mirror: https://pan.baidu.com/s/1BfRZdCz6Z1XR6aDXxiHmHA (extraction code: jzb3)

  • Code

Loading the data:

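#%%
import tensorflow as tf
import keras
import pandas as pd
import numpy as np

# Load the Kaggle training set and take a first look at it.
data = pd.read_csv("titanic/train.csv")
print(data.head())      # first five rows
print(data.describe())  # summary statistics for the numeric columns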

 

Data processing:

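#%%
# Columns kept for modeling; "Survived" comes first so it can be split off as the label.
strs = "Survived Pclass Sex Age SibSp Parch Fare Embarked"
clos = strs.split(" ")
print(clos)
#%%
# Take an explicit copy so the assignments below modify x_datas rather than a view of data.
x_datas = data[clos].copy()
print(x_datas.head())
#%%
# Count missing values per column (Age and Embarked contain NaNs).
print(x_datas.isnull().sum())

#%%
# Impute Age with the column mean and Embarked with its most frequent value.
x_datas["Age"] = x_datas["Age"].fillna(x_datas["Age"].mean())
x_datas["Embarked"] = x_datas["Embarked"].fillna(x_datas["Embarked"].mode()[0])

# One-hot encode the categorical columns; dtype=float keeps every feature numeric.
x_datas = pd.get_dummies(x_datas, columns=["Pclass", "Sex", "Embarked"], dtype=float)
# Rough scaling: divide Age and Fare by 100 to shrink their ranges.
x_datas["Age"] /= 100
x_datas["Fare"] /= 100

print(x_datas.isnull().sum())
print(x_datas.head())

#%%
# Sequential 75/25 split (no shuffling): the first 75% of rows train, the rest test.
seq = int(0.75 * len(x_datas))

# Column 0 is the label (Survived); every other column is a feature.
X, Y = x_datas.iloc[:, 1:], x_datas.iloc[:, 0]
X_train, Y_train, X_test, Y_test = X[:seq], Y[:seq], X[seq:], Y[seq:]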
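
To make the effect of the preprocessing concrete, here is a minimal sketch that runs the same steps on a hand-made four-row frame. The rows are invented for illustration and are not taken from train.csv; they are chosen only so that every category of Pclass, Sex and Embarked appears at least once. Under that assumption the one-hot encoding yields 12 feature columns.

```python
import pandas as pd

# Toy rows invented for illustration; they cover every category at least once.
toy = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Pclass": [3, 1, 2, 3],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, None, 35.0],
    "SibSp": [1, 1, 0, 0],
    "Parch": [0, 0, 0, 0],
    "Fare": [7.25, 71.28, 13.0, 8.05],
    "Embarked": ["S", "C", None, "Q"],
})

# Same imputation as above: mean for Age, most frequent value for Embarked.
toy["Age"] = toy["Age"].fillna(toy["Age"].mean())
toy["Embarked"] = toy["Embarked"].fillna(toy["Embarked"].mode()[0])

# One-hot encode the three categorical columns as floats.
toy = pd.get_dummies(toy, columns=["Pclass", "Sex", "Embarked"], dtype=float)

# Drop the label column; what remains is the feature matrix.
features = toy.iloc[:, 1:]
print(list(features.columns))
print(features.shape[1])
```

The four numeric columns (Age, SibSp, Parch, Fare) come first, followed by three Pclass, two Sex and three Embarked indicator columns.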

 

Building the model:
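
The model summary in the output section lists three Dense layers (64, 16 and 2 units) with a Dropout layer after the first. A minimal sketch that reproduces those shapes and the 1,906-parameter total could look like the following. Only the layer sizes come from the summary; the activations, the dropout rate, the 12-feature input width and the helper name `build_model` are assumptions.

```python
from tensorflow import keras


def build_model(n_features=12, dropout_rate=0.5):
    """Hypothetical helper: a small fully connected classifier.

    Layer widths follow the printed summary; the activations and
    dropout rate are guesses, not values taken from the source.
    """
    return keras.Sequential([
        keras.Input(shape=(n_features,)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dropout(dropout_rate),
        keras.layers.Dense(16, activation="relu"),
        # Two output units: one probability per class (died / survived).
        keras.layers.Dense(2, activation="softmax"),
    ])


model = build_model()
model.summary()
```

A single sigmoid output unit would be an equally reasonable choice for a binary label; two softmax units are used here only because the summary shows an output shape of (None, 2).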

Training and evaluating the model:
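
The training log in the output section shows 534 training samples per epoch, 100 epochs, a validation loss and accuracy, and a final evaluation on 223 test samples. A rough sketch of a compile / fit / evaluate sequence consistent with that log is given below. It runs on random stand-in arrays with the same shapes so that it is self-contained; the optimizer, loss, batch size, activations and `validation_split=0.2` are assumptions, and the epoch count is cut to 5 to keep the run short.

```python
import numpy as np
from tensorflow import keras

# Random stand-in data with the shapes of the 75/25 split (668 train, 223 test).
rng = np.random.default_rng(0)
X_train = rng.random((668, 12)).astype("float32")
Y_train = rng.integers(0, 2, size=668)
X_test = rng.random((223, 12)).astype("float32")
Y_test = rng.integers(0, 2, size=223)

# Same layer sizes as the printed summary; activations are assumed.
model = keras.Sequential([
    keras.Input(shape=(12,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(2, activation="softmax"),
])

# Integer labels 0/1 with a two-unit softmax call for the sparse loss.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

EPOCHS = 5  # the original log ran 100 epochs
history = model.fit(X_train, Y_train,
                    epochs=EPOCHS,
                    batch_size=32,
                    validation_split=0.2,  # holds out the last 20% of the training rows
                    verbose=0)

loss, acc = model.evaluate(X_test, Y_test, verbose=0)
print("test loss is %f, acc %f" % (loss, acc))
```

With the real data, the DataFrames from the processing step can be passed as arrays, for example `X_train.to_numpy()` and `Y_train.to_numpy()`.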

  • Output:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_1 (Dense)              (None, 64)                832
_________________________________________________________________
dropout_1 (Dropout)          (None, 64)                0
_________________________________________________________________
dense_2 (Dense)              (None, 16)                1040
_________________________________________________________________
dense_3 (Dense)              (None, 2)                 34
=================================================================
Total params: 1,906
Trainable params: 1,906
Non-trainable params: 0
_________________________________________________________________
...
Epoch 96/100
534/534 [==============================] - 0s 80us/step - loss: 0.3870 - acc: 0.8277 - val_loss: 0.5083 - val_acc: 0.7612
Epoch 97/100
534/534 [==============================] - 0s 80us/step - loss: 0.3921 - acc: 0.8352 - val_loss: 0.5070 - val_acc: 0.7687
Epoch 98/100
534/534 [==============================] - 0s 82us/step - loss: 0.3940 - acc: 0.8371 - val_loss: 0.5102 - val_acc: 0.7687
Epoch 99/100
534/534 [==============================] - 0s 78us/step - loss: 0.3996 - acc: 0.8277 - val_loss: 0.5106 - val_acc: 0.7687
Epoch 100/100
534/534 [==============================] - 0s 80us/step - loss: 0.3892 - acc: 0.8352 - val_loss: 0.5082 - val_acc: 0.7612
223/223 [==============================] - 0s 63us/step
test loss is 0.389338, acc 0.829596
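
The numbers in this output can be cross-checked with a little arithmetic. The sketch below recomputes the Dense layer parameter counts and the sample counts in the log. The input width of 12 is inferred from the first layer's 832 parameters, and the 20% validation hold-out is an assumption that happens to match the 534 training samples; neither is stated explicitly in the output.

```python
def dense_params(n_in, n_out):
    """Parameters of a fully connected layer: weights plus biases."""
    return n_in * n_out + n_out


# 12 inputs (inferred), then the 64, 16 and 2 units from the summary.
layer_sizes = [12, 64, 16, 2]
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(counts, sum(counts))  # [832, 1040, 34] 1906

n_rows = 891                   # rows in Kaggle's train.csv
seq = int(0.75 * n_rows)       # rows in the training part of the 75/25 split
n_test = n_rows - seq          # rows left for the test part
n_fit = int(seq * (1 - 0.2))   # rows actually fitted, assuming a 20% validation hold-out
n_val = seq - n_fit            # rows used for val_loss / val_acc
print(seq, n_test, n_fit, n_val)  # 668 223 534 134
```

Dropout layers add no parameters, which is why `dropout_1` shows 0 in the summary.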