Banks play a crucial role in market economies. They decide who can get finance and on what terms and can make or break investment decisions. For markets and society to function, individuals and companies need access to credit.
Credit scoring algorithms, which estimate the probability of default, are the method banks use to decide whether a loan should be granted. This competition asks participants to improve on the state of the art in credit scoring by predicting the probability that somebody will experience financial distress in the next two years.
Dataset
Attribute Information:
| Variable Name | Description | Type |
| --- | --- | --- |
| SeriousDlqin2yrs | Person experienced 90 days past due delinquency or worse | Y/N |
| RevolvingUtilizationOfUnsecuredLines | Total balance on credit divided by the sum of credit limits | percentage |
| age | Age of borrower in years | integer |
| NumberOfTime30-59DaysPastDueNotWorse | Number of times borrower has been 30-59 days past due | integer |
| DebtRatio | Monthly debt payments | percentage |
| MonthlyIncome | Monthly income | real |
| NumberOfOpenCreditLinesAndLoans | Number of open loans | integer |
| NumberOfTimes90DaysLate | Number of times borrower has been 90 days or more past due | integer |
| NumberRealEstateLoansOrLines | Number of mortgage and real estate loans | integer |
| NumberOfTime60-89DaysPastDueNotWorse | Number of times borrower has been 60-89 days past due | integer |
| NumberOfDependents | Number of dependents in family | integer |
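To make the ratio-type attributes concrete, here is a small worked example of the RevolvingUtilizationOfUnsecuredLines definition (total balance divided by the sum of credit limits). The borrower, the balances, the limits, and the helper name `revolving_utilization` are all made up for illustration; the dataset already ships this value precomputed.

```python
# Hypothetical borrower: these balances and limits are made-up numbers.
balances = [1200.0, 300.0]        # current balance on each unsecured revolving line
credit_limits = [5000.0, 1000.0]  # credit limit on each of those lines


def revolving_utilization(balances, credit_limits):
    """Total balance on credit divided by the sum of credit limits."""
    total_limit = sum(credit_limits)
    if total_limit == 0:
        return float("nan")  # undefined when there is no available credit
    return sum(balances) / total_limit


utilization = revolving_utilization(balances, credit_limits)
print(f"{utilization:.2%}")  # prints 25.00%
```

A value above 1 means the borrower owes more than the combined limits, so it is worth checking the column for values far outside the 0 to 1 range before modelling.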
Read the data into Pandas
```python
# Import pandas and let it display up to 500 columns
import zipfile

import pandas as pd

pd.set_option('display.max_columns', 500)

# Open the 'KaggleCredit2.csv.zip' archive and read 'KaggleCredit2.csv' from it
with zipfile.ZipFile('KaggleCredit2.csv.zip', 'r') as z:
    with z.open('KaggleCredit2.csv') as f:
        data = pd.read_csv(f, index_col=0)

# Show the first few rows of the DataFrame
data.head()
```
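The loading step above opens the archive member by name. As a self-contained sketch of the same idea, the snippet below builds a tiny stand-in archive (the file name `example.csv.zip` and its two rows are invented, not taken from the real dataset) and reads it two ways: explicitly through `zipfile`, and directly with `pd.read_csv`, which infers zip compression from the file suffix when the archive contains a single file.

```python
import os
import tempfile
import zipfile

import pandas as pd

# Build a tiny stand-in archive; the two rows are made up, not real data.
csv_text = "id,SeriousDlqin2yrs,age\n1,0,45\n2,1,38\n"
tmp_dir = tempfile.mkdtemp()
zip_path = os.path.join(tmp_dir, "example.csv.zip")
with zipfile.ZipFile(zip_path, "w") as z:
    z.writestr("example.csv", csv_text)

# Explicit route: open the member by name, then parse it.
with zipfile.ZipFile(zip_path, "r") as z:
    with z.open("example.csv") as f:
        explicit = pd.read_csv(f, index_col=0)

# Shortcut: read_csv infers zip compression from the '.zip' suffix,
# provided the archive holds exactly one file.
direct = pd.read_csv(zip_path, index_col=0)

print(explicit.equals(direct))   # both routes should give the same frame
print(direct.isnull().sum())     # missing values per column
```

The explicit route is the safer choice when an archive might hold several files; the shortcut is just less code for the single-file case. Counting missing values right after loading is a cheap check before any modelling.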
0.9325801329989185
d:\Python\lib\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
warnings.warn(
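The ConvergenceWarning above is raised by scikit-learn's SAG solver when it stops at `max_iter` before meeting its tolerance. The model code is not shown here, so the following is only a sketch under assumptions: it supposes a `LogisticRegression` with `solver="sag"`, and it uses synthetic data (not the credit dataset) with two features on very different scales. SAG tends to converge much faster when features are on a similar scale, so the sketch compares an unscaled fit with a pipeline that standardizes first and allows more iterations.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: two features on very different scales.
rng = np.random.default_rng(0)
n = 2000
ratio = rng.uniform(0.0, 1.0, n)          # a utilization-like ratio
income = rng.normal(6000.0, 2000.0, n)    # an income-like amount
X = np.column_stack([ratio, income])
logit = 3.0 * (ratio - 0.5) - (income - 6000.0) / 2000.0
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)


def fit_and_report(model):
    """Fit the model; return (training accuracy, whether max_iter was hit)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        model.fit(X, y)
    hit_limit = any(issubclass(w.category, ConvergenceWarning) for w in caught)
    return model.score(X, y), hit_limit


# Assumed setup: SAG on raw features with the default iteration budget.
raw = LogisticRegression(solver="sag", max_iter=100)
# Alternative: standardize first and give the solver a larger budget.
scaled = make_pipeline(
    StandardScaler(),
    LogisticRegression(solver="sag", max_iter=1000),
)

raw_acc, raw_hit = fit_and_report(raw)
scaled_acc, scaled_hit = fit_and_report(scaled)
print("raw:    accuracy=%.3f, hit max_iter=%s" % (raw_acc, raw_hit))
print("scaled: accuracy=%.3f, hit max_iter=%s" % (scaled_acc, scaled_hit))
```

If the warning persists after scaling, raising `max_iter` further or switching to another solver such as `lbfgs` are the usual next things to try. Note that coefficients from a run that did not converge may not be reliable even when the accuracy looks acceptable.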