[Keywords]
[Abstract]
Five safety risk levels were defined from sampling inspection data on meat products collected from 2013 to 2017. Feature construction and one-hot encoding were used to link the factors relevant to meat product safety, and an extreme gradient boosting (XGBOOST) model was built to study how strongly the various factors in the food production process influence the safety risk level. The model was evaluated with multiple indices. Upsampling was used to address sample imbalance, and Bayesian optimization was used to tune the hyperparameters, improving model performance and classification results. Compared with the decision tree (DT) and random forest (RF) models, XGBOOST performed best in classifying the safety risk levels of meat products. The results show that, although food production processes are complex, a model trained on one-hot-encoded data can effectively identify the degree of influence of each factor on the food safety risk level. Of the ensemble models, RF learned relatively stably, while XGBOOST improved markedly in accuracy and other indices after parameter tuning and outperformed RF. Under different sampling schemes, XGBOOST reached an average precision of 89.14% and an average F1 score of 88.59%, indicating that it is applicable to early warning of meat product safety risk levels and can provide technical guidance for routine sampling inspection.
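The preprocessing and evaluation steps named in the abstract (one-hot encoding, upsampling, precision and F1) can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: the function names, the toy field names, the choice of random oversampling with replacement, and the use of macro averaging are all assumptions made for illustration, and the classifier itself (XGBOOST, DT, or RF) is left out.

```python
import random
from collections import Counter


def one_hot(rows, column):
    """Replace one categorical column with 0/1 indicator columns."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new_row[f"{column}={cat}"] = 1 if row[column] == cat else 0
        encoded.append(new_row)
    return encoded


def upsample(samples, labels, seed=0):
    """Random oversampling (assumed variant): draw from each smaller class
    with replacement until every class is as large as the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


def macro_precision_f1(y_true, y_pred):
    """Per-class precision and F1, averaged with equal class weights
    (macro averaging is an assumption, not stated in the source)."""
    classes = sorted(set(y_true) | set(y_pred))
    precisions, f1s = [], []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        f1s.append(f1)
    return sum(precisions) / len(classes), sum(f1s) / len(classes)
```

In a pipeline of this kind, oversampling would normally be applied to the training split only, so that duplicated records do not leak into the data used to compute precision and F1.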
[CLC number]
[Funding]
National Key Research and Development Program of China (2018YFC1603602)