2023-05-21 Source: 飞速影视
for col in numeric_cols:
    missing = df[col].isnull()
    num_missing = np.sum(missing)

    if num_missing > 0:  # only do the imputation for the columns that have missing values.
        print("imputing missing values for: {}".format(col))
        df["{}_ismissing".format(col)] = missing
        med = df[col].median()
        df[col] = df[col].fillna(med)

Fortunately, the categorical features in the dataset used in this article have no missing values. If they did, we could apply a mode (most frequent value) imputation strategy to all categorical features in one pass, in the same way.

# impute the missing values and create the missing value indicator variables for each non-numeric column.
df_non_numeric = df.select_dtypes(exclude=[np.number])
non_numeric_cols = df_non_numeric.columns.values

for col in non_numeric_cols:
    missing = df[col].isnull()
    num_missing = np.sum(missing)

    if num_missing > 0:  # only do the imputation for the columns that have missing values.
        print("imputing missing values for: {}".format(col))
        df["{}_ismissing".format(col)] = missing

        top = df[col].describe()["top"]  # impute with the most frequent value.
        df[col] = df[col].fillna(top)
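
To see both strategies side by side, here is a minimal, self-contained sketch on a toy table. The data, the column names (`area`, `district`) and the helper `impute_with_indicator` are hypothetical and not part of the article's dataset. The sketch uses `Series.mode()` rather than `describe()["top"]` to pick the most frequent value, which is a swapped-in but equivalent choice for this purpose.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names are illustrative only.
toy = pd.DataFrame({
    "area": [30.0, np.nan, 50.0, 70.0],
    "district": ["north", None, "north", "south"],
})


def impute_with_indicator(frame):
    """Return a copy with median/mode imputation plus <col>_ismissing flags."""
    out = frame.copy()
    for col in frame.columns:
        missing = frame[col].isnull()
        if not missing.any():
            continue  # nothing to impute for this column
        # Keep a record of which rows were originally missing.
        out["{}_ismissing".format(col)] = missing
        if pd.api.types.is_numeric_dtype(frame[col]):
            fill = frame[col].median()        # numeric: median
        else:
            fill = frame[col].mode().iloc[0]  # non-numeric: most frequent value
        out[col] = frame[col].fillna(fill)
    return out


imputed = impute_with_indicator(toy)
print(imputed)
```

Returning a copy instead of modifying the frame in place makes it easy to compare the table before and after imputation. The `_ismissing` columns preserve the information that a value was filled in, which a downstream model may find useful.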