Prediction of Trypsin-Catalyzed Protein Cleavage Specificity Based on Deep Learning
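Affiliations:

1. School of Pharmacy and Food Engineering, Wuyi University; 2. School of Biomedical and Pharmaceutical Sciences, Guangdong University of Technology; 3. College of Horticulture, Hunan Agricultural University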

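Fund projects:

Guangdong Provincial Special Project for Science and Technology Innovation Strategy, Municipal and County Science and Technology Innovation Support (Major Project + Task List), also the 2023 Jiangmen Key Core Technology "Open Competition" Project (2023780200060009632); Guangdong Basic and Applied Basic Research Foundation, Joint Fund Youth Project (2022A1515110711); Guangdong "Hundred-Thousand-Ten-Thousand Project" (BQW2024001).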


Abstract:

Predicting the specific cleavage of proteins by trypsin is important for the theoretical (in silico) digestion and structural analysis of proteins. In this study, a deep learning model combining convolutional neural networks (CNN) and long short-term memory (LSTM) networks was constructed to predict trypsin-catalyzed protein cleavage, and the effects of different hyperparameters on model performance were examined. The results showed that a lower learning rate, a larger batch size, and fewer convolutional layers favored stable training and convergence and thus better predictive performance. The best-performing parameters were a learning rate of 0.001, a batch size of 512, and one convolutional layer. Under these conditions, the model achieved an accuracy of 0.950, a specificity of 0.987, a precision of 0.986, a recall of 0.961, and an F1 score of 0.973 on the PXD010627 dataset, demonstrating good predictive capability and stability. When further applied to public datasets from different species, the model maintained an accuracy above 0.920, with AUC, precision, recall, and F1 scores all between 0.900 and 0.989. These results indicate that the model generalizes well when predicting trypsin-specific cleavage sites and can improve the accuracy and reliability of cleavage-site prediction. This work offers a new approach to protein identification and spatial structure analysis in proteomics and may help advance proteolysis research and bioinformatics.
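Trypsin is conventionally described as cleaving on the C-terminal side of lysine (K) and arginine (R), usually not when the next residue is proline (P). The abstract does not say how candidate sites or their sequence context were presented to the network, so the sketch below assumes a common setup: every internal K/R is a candidate, and the residues flanking its scissile bond are cut into a fixed window and one-hot encoded. The window width (`WINDOW`), the padding symbol `X`, and all helper names are hypothetical; `simple_trypsin_rule` is included only as the fixed rule that a learned model could be compared against.

```python
"""Enumerate candidate trypsin sites and encode the residues around them.

Assumptions (not taken from the study): a symmetric window of WINDOW residues
on each side of the scissile bond, padding with "X" beyond the termini, and
one-hot encoding over the 20 standard amino acids plus the padding symbol.
"""

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "X"
ALPHABET = AMINO_ACIDS + PAD
WINDOW = 4  # hypothetical: residues kept on each side of the bond


def candidate_sites(sequence: str) -> list[int]:
    """0-based indices i of K/R with a following residue; the bond is i | i+1."""
    return [i for i, aa in enumerate(sequence) if aa in "KR" and i < len(sequence) - 1]


def simple_trypsin_rule(sequence: str, i: int) -> bool:
    """Rule of thumb: cleave after K/R unless the next residue is P."""
    return sequence[i + 1] != "P"


def window(sequence: str, i: int, width: int = WINDOW) -> str:
    """Residues P_width..P1 | P1'..P_width' around the bond after residue i."""
    padded = PAD * width + sequence + PAD * width
    centre = i + width  # position of residue i (P1) inside the padded string
    return padded[centre - width + 1 : centre + width + 1]


def one_hot(segment: str) -> list[list[int]]:
    """Encode each residue as a 0/1 row vector over ALPHABET."""
    return [[int(aa == symbol) for symbol in ALPHABET] for aa in segment]
```

For the toy sequence `MKPLRAGK`, the candidate sites are at indices 1 and 4 (the C-terminal K has no bond after it); the rule rejects the first site because K is followed by P, accepts the second, and `window("MKPLRAGK", 4)` returns `KPLRAGKX`.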
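The abstract reports the model family (CNN and LSTM) and the selected hyperparameters: a learning rate of 0.001, a batch size of 512, and one convolutional layer. The PyTorch sketch below wires those values into a minimal CNN-then-LSTM binary classifier over one-hot sequence windows. The layer order, channel and hidden sizes, kernel size, window length, Adam optimizer, and binary cross-entropy loss are all assumptions for illustration, not the authors' configuration.

```python
import torch
from torch import nn

N_SYMBOLS = 21        # assumed: 20 amino acids + one padding symbol
WINDOW_LEN = 8        # assumed: residues around the scissile bond
LEARNING_RATE = 1e-3  # value reported in the abstract
BATCH_SIZE = 512      # value reported in the abstract


class CnnLstmCleavage(nn.Module):
    """Hypothetical CNN -> LSTM -> linear classifier for one candidate site."""

    def __init__(self, n_conv_layers: int = 1, channels: int = 64,
                 kernel_size: int = 3, hidden: int = 64) -> None:
        super().__init__()
        layers: list[nn.Module] = []
        in_ch = N_SYMBOLS
        for _ in range(n_conv_layers):
            # "Same" padding keeps the window length unchanged (odd kernel_size).
            layers += [nn.Conv1d(in_ch, channels, kernel_size, padding=kernel_size // 2),
                       nn.ReLU()]
            in_ch = channels
        self.conv = nn.Sequential(*layers)  # empty Sequential acts as identity
        self.lstm = nn.LSTM(in_ch, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, WINDOW_LEN, N_SYMBOLS); Conv1d expects (batch, channels, length).
        h = self.conv(x.transpose(1, 2))
        out, _ = self.lstm(h.transpose(1, 2))     # (batch, WINDOW_LEN, hidden)
        return self.head(out[:, -1]).squeeze(-1)  # one logit per window


model = CnnLstmCleavage(n_conv_layers=1)  # one convolutional layer, as reported
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)  # optimizer assumed
loss_fn = nn.BCEWithLogitsLoss()  # assumed labels: cleaved (1) vs. not cleaved (0)


def train_step(x: torch.Tensor, y: torch.Tensor) -> float:
    """One gradient update on a batch of windows x with float 0/1 labels y."""
    model.train()
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Exposing `n_conv_layers`, the learning rate, and the batch size as plain parameters makes a hyperparameter sweep like the one described in the abstract straightforward to script; the sketch itself does not reproduce the reported results.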
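The abstract evaluates the model with accuracy, specificity, precision, recall, F1 score, and AUC. Treating "cleaved" as the positive class, the first five follow from a binary confusion matrix (TP, FP, TN, FN), and AUC follows from how predicted scores rank positives against negatives. The plain-Python sketch below computes them; the `Confusion` class, the `auc` helper, and the counts and scores in the usage example are hypothetical and are not data from the study.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion matrix; positive class = "site is cleaved"."""
    tp: int
    fp: int
    tn: int
    fn: int

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    @property
    def specificity(self) -> float:  # true-negative rate
        return self.tn / (self.tn + self.fp)

    @property
    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def recall(self) -> float:  # sensitivity, true-positive rate
        return self.tp / (self.tp + self.fn)

    @property
    def f1(self) -> float:  # harmonic mean of precision and recall
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r)


def auc(scores: list[float], labels: list[int]) -> float:
    """ROC AUC as P(random positive outscores random negative), ties count 1/2.

    Pairwise and quadratic in size: fine for small examples only.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, `Confusion(tp=90, fp=5, tn=95, fn=10)` gives an accuracy of 0.925, a specificity of 0.95, a recall of 0.90, and an F1 score of 180/195 ≈ 0.923. Applying the same F1 formula to the reported precision (0.986) and recall (0.961) gives 2 × 0.986 × 0.961 / (0.986 + 0.961) ≈ 0.973, which matches the F1 score reported for PXD010627.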

History
  • Received: 2024-08-04
  • Last revised: 2024-08-17
  • Accepted: 2024-08-21