A Multi-Task Graph Convolutional Network for Molecular Toxicity Prediction Using the Tox21 Dataset
DOI: https://doi.org/10.63282/3050-922X.ICAILLMBA-115
Keywords: Graph Convolutional Network, Tox21, Multi-task Learning, Molecular Toxicity Prediction, Cheminformatics, ROC-AUC, PR-AUC
Abstract
Accurate prediction of molecular toxicity is a critical step in drug discovery and environmental safety assessment. Traditional computational models often struggle with multi-task toxicity datasets such as Tox21 because labels are sparse and classes are highly imbalanced across targets. In this study, we implement a simple multi-task Graph Convolutional Network (GCN) to predict the activity of compounds against the twelve toxicity-related targets of the Tox21 dataset. Molecular graphs were generated from SMILES representations, and atomic features together with bond connectivity were used as input to the network. The model consists of two GCN layers followed by global mean pooling and a linear classifier, so that all twelve targets are predicted simultaneously. To handle missing labels, training used a masked binary cross-entropy loss in which only observed compound-target labels contribute. Evaluation metrics included ROC-AUC, PR-AUC, and confusion matrices. The GCN achieved ROC-AUC values ranging from 0.46 to 0.68 and PR-AUC values from 0.03 to 0.16 across the twelve targets, indicating modest and strongly target-dependent predictive performance, with the weakest target falling below chance level (ROC-AUC = 0.5). Training and validation loss curves indicated stable convergence with no evident overfitting. Confusion matrix analysis revealed the impact of class imbalance, highlighting the need for weighted loss functions or data augmentation in future work. Overall, the study suggests that even a simple GCN can capture molecular structural information relevant to multi-target toxicity prediction, providing a baseline for more advanced graph-based architectures in cheminformatics applications.
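The architecture and loss described in the abstract can be illustrated with a minimal, dependency-free sketch. It assumes the common symmetric GCN normalization D^-1/2 (A + I) D^-1/2, ReLU activations, a hidden size of 8, and random Gaussian weights; none of these details are given in the abstract. The helper names (`normalized_adjacency`, `gcn_layer`, `forward`, `masked_bce`, `init_params`) are hypothetical. A hand-written three-atom graph (the heavy atoms of ethanol, SMILES `CCO`) with one-hot atom features stands in for real SMILES featurization, and `None` marks a missing Tox21 label. This is a forward pass and loss only, not the authors' implementation or training loop.

```python
import math
import random

NUM_TASKS = 12  # the twelve Tox21 targets predicted jointly


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalized_adjacency(num_atoms, bonds):
    """Assumed GCN normalization: D^-1/2 (A + I) D^-1/2 with self-loops."""
    a = [[1.0 if i == j else 0.0 for j in range(num_atoms)] for i in range(num_atoms)]
    for i, j in bonds:  # bonds are undirected edges
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_atoms)]
            for i in range(num_atoms)]


def gcn_layer(a_hat, h, w):
    """One GCN layer: ReLU(A_hat @ H @ W)."""
    return [[max(0.0, v) for v in row] for row in matmul(a_hat, matmul(h, w))]


def forward(x, bonds, params):
    """Two GCN layers -> global mean pooling -> linear classifier (12 logits)."""
    a_hat = normalized_adjacency(len(x), bonds)
    h = gcn_layer(a_hat, x, params["w1"])
    h = gcn_layer(a_hat, h, params["w2"])
    pooled = [sum(col) / len(h) for col in zip(*h)]  # mean over atoms
    return [sum(p * w for p, w in zip(pooled, col)) + b
            for col, b in zip(zip(*params["w_out"]), params["b_out"])]


def masked_bce(logits, labels):
    """Binary cross-entropy averaged over observed labels only (None = missing)."""
    losses = []
    for z, y in zip(logits, labels):
        if y is None:
            continue  # missing label: contributes nothing to the loss
        # Numerically stable BCE with logits: max(z, 0) - z*y + log(1 + exp(-|z|))
        losses.append(max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z))))
    return sum(losses) / len(losses) if losses else 0.0


def init_params(in_dim, hidden, seed=0):
    """Hypothetical random initialization for the sketch."""
    rng = random.Random(seed)

    def rand(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {"w1": rand(in_dim, hidden), "w2": rand(hidden, hidden),
            "w_out": rand(hidden, NUM_TASKS), "b_out": [0.0] * NUM_TASKS}


if __name__ == "__main__":
    x = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # toy one-hot features: C, C, O
    bonds = [(0, 1), (1, 2)]                   # C-C and C-O bonds
    params = init_params(in_dim=2, hidden=8)
    logits = forward(x, bonds, params)
    labels = [0, None, 1] + [None] * 9         # hypothetical, mostly missing
    print(len(logits), masked_bce(logits, labels))
```

Masking, rather than imputing missing labels as inactive, keeps unmeasured compound-target pairs from being treated as negatives. It does not by itself correct class imbalance, which is why the abstract points to weighted losses as future work.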