Vollmers, Luis (2023)
Prediction of Cytotoxicity Related PubChem Assays Using High-Content-Imaging Descriptors derived from Cell-Painting.
Technische Universität Darmstadt
doi: 10.26083/tuprints-00020236
Master Thesis, Primary publication, Publisher's Version
Full text: master2A.pdf (PDF, 12 MB). Copyright information: CC BY 4.0 International, Creative Commons Attribution.
| Field | Value |
|---|---|
| Item Type | Master Thesis |
| Type of entry | Primary publication |
| Title | Prediction of Cytotoxicity Related PubChem Assays Using High-Content-Imaging Descriptors derived from Cell-Painting |
| Language | English |
| Referees | Schmitz, Prof. Dr. Katja; Bender, Dr. Andreas |
| Date | 2023 |
| Place of Publication | Darmstadt |
| Collation | 79 pages |
| DOI | 10.26083/tuprints-00020236 |
Abstract:

The pharmaceutical industry is centred around small molecules and their effects. Apart from the curative effect, the absence of adverse or toxicological effects is cardinal. However, toxicity is at least as elusive as it is important. A simple definition is: "toxicology is the science of adverse effects of chemicals on living organisms".[1] This definition, however, comes with several caveats. What is the organism? Where do therapeutic and adverse effects begin and end? Even for the simplest form of toxicity, cytotoxicity (toxicity to cells), the mechanisms are manifold and difficult to unravel. Hence, it remains unclear which combination of characteristics makes a compound toxic. One approach to illuminating these characteristics is the novel cell-painting (CP) assay. In a CP assay, cells are perturbed with libraries of small compounds that may alter cellular morphology, and images are then acquired by automated fluorescence microscopy. Imaging uses five fluorescence channels, each corresponding to particular cell organelles.[2] CP data therefore contain information about the variations in cell structure caused by each compound. Which parts of these morphological fingerprints actually carry useful information remains unclear. A significant part of the project presented here is therefore dedicated to exploring the CP data and comparing their predictive capability with that of other descriptors across a variety of bioassays. The CP data used in this project cover roughly 30,000 compounds and 1,800 features.[3] In chemistry, the structure of a compound or substance determines its properties. Therefore, structural fingerprints are used alongside CP as a benchmark descriptor set for comparison. In this project, extended-connectivity fingerprints (ECFPs) were used to encode the compounds' structures as numerical features. This work is concerned with morphological changes that correspond to toxicity.
Thus, the CP data were combined with toxicological endpoints from specific assays selected from the PubChem database. Assays were selected on three criteria: a minimum number of active compounds, a size criterion, and the presence of toxicologically relevant targets. After each selected assay had been combined with each descriptor set, machine learning models were trained and their predictive power was evaluated with a set of performance metrics. The predictions were carried out in four cycles. The first cycle used the CP data as descriptors, the second used the structural fingerprints, and the third used a subset of both, selected by a rigorous feature-engineering process. The fourth cycle skipped feature engineering and combined all CP and ECFP descriptors into one large input set. Evaluating the prediction metrics reveals the strengths and shortcomings of the morphological fingerprints relative to the structural fingerprints. Two groups of assays emerged: PubChem assays that are generally better predicted with CP features, and those with higher predictive potential when ECFPs are used. Additionally, ECFP-based models showed higher specificity, whereas CP-based models showed higher sensitivity. High sensitivity means that few truly positive samples (e.g. toxic compounds) are mislabelled as negative (e.g. non-toxic) relative to the number of positive samples that are correctly labelled. Based on these results, CP is better suited for toxicity prediction and drug-safety evaluation, because a positive compound mislabelled as negative (for example, a toxic compound predicted to be non-toxic) can lead to costs or even damage to health. Furthermore, an enrichment measure based on the data from the fluorescence channels was introduced and calculated for these two groups of PubChem assays. This enrichment links predictive performance to the activity of cell organelles.
The hypothesis was that PubChem assays that can be reliably predicted from CP data should exhibit increased enrichment, which was the case for four of the five fluorescence microscopy channels. As a next step, phenotypic terms were manually generated to categorize the different PubChem assays. These terms corresponded to cellular mechanisms or morphological processes and were generated in an unbiased manner, although they remain subject to human error. Phenotypic annotations found to be enriched among successfully modelled assays might guide the preselection of bioassays in future projects. The enrichment analysis of phenotypic annotations showed that PubChem assays well predicted from CP data are related to immune response, genotoxicity and genome regulation, and cell death. Finally, the assays were assigned Gene Ontology (GO) terms obtained from the GO database. These terms form a controlled, structured vocabulary that explicitly describes the molecular function and biological processes of a given gene product. GO terms were collected for the PubChem assays associated with a protein target. If an assay is particularly well predicted from CP descriptors, the associated GO terms can relate this finding to cellular function. Although the GO-term analysis suffers from a very small sample size, CP-related assays were found to correspond usually to processes involving deoxyribonucleic acid (DNA) and other macromolecules. This finding agrees well with both the channel enrichment and the phenotypic enrichment analyses.
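The abstract mentions that ECFPs encode each structure as numerical features. The final step of building such a fingerprint, folding hashed circular atom-environment identifiers into a fixed-length bit vector, can be sketched as below. This is a minimal illustration under stated assumptions: the identifiers are taken as already computed (the iterative neighbourhood hashing that produces them is omitted), and the helper name `fold_fingerprint` and its default length of 2048 bits are hypothetical choices, not settings reported in the thesis.

```python
from typing import Iterable, List


def fold_fingerprint(environment_ids: Iterable[int], n_bits: int = 2048) -> List[int]:
    """Fold hashed atom-environment identifiers into an n_bits-long 0/1 vector.

    Hypothetical helper (assumption): each identifier sets the bit at
    position id % n_bits. Repeated identifiers, and distinct identifiers
    that collide after the modulo, set the same bit, so the vector records
    presence rather than counts.
    """
    if n_bits <= 0:
        raise ValueError("n_bits must be positive")
    bits = [0] * n_bits
    for env_id in environment_ids:
        bits[env_id % n_bits] = 1
    return bits


if __name__ == "__main__":
    # Made-up identifiers; 1 and 17 collide when folded to 16 bits.
    print(fold_fingerprint([1, 17, 3], n_bits=16))
```

Folding is also why a single ECFP bit cannot always be traced back to one substructure: collisions merge different atom environments into the same position.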
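The four prediction cycles described in the abstract differ only in which descriptor matrix is passed to the model. A minimal sketch of assembling the four inputs for one assay follows, assuming that CP profiles and ECFP bit vectors are already aligned row by row to the same compounds. The abstract does not specify the feature-engineering procedure used in the third cycle, so a simple variance filter stands in for it here; that filter, its threshold, and every name in the code are hypothetical placeholders rather than the method used in the thesis.

```python
from statistics import pvariance
from typing import Dict, List

Matrix = List[List[float]]


def select_by_variance(matrix: Matrix, threshold: float) -> List[int]:
    """Hypothetical stand-in for feature engineering: keep the indices of
    columns whose population variance across compounds exceeds `threshold`."""
    if not matrix:
        return []
    n_cols = len(matrix[0])
    return [j for j in range(n_cols)
            if pvariance([row[j] for row in matrix]) > threshold]


def build_cycle_inputs(cp: Matrix, ecfp: Matrix, threshold: float = 0.0) -> Dict[str, Matrix]:
    """Assemble the descriptor matrix for each of the four cycles.

    Rows are compounds; row i of `cp` and row i of `ecfp` must describe
    the same compound (assumption).
    """
    if len(cp) != len(ecfp):
        raise ValueError("CP and ECFP matrices must cover the same compounds")
    cp_keep = select_by_variance(cp, threshold)
    ecfp_keep = select_by_variance(ecfp, threshold)
    return {
        # Cycle 1: morphological (CP) descriptors only.
        "cycle1_cp": [list(row) for row in cp],
        # Cycle 2: structural (ECFP) descriptors only.
        "cycle2_ecfp": [list(row) for row in ecfp],
        # Cycle 3: selected subset of both, concatenated per compound.
        "cycle3_subset": [[c[j] for j in cp_keep] + [e[j] for j in ecfp_keep]
                          for c, e in zip(cp, ecfp)],
        # Cycle 4: no feature selection; all CP and ECFP columns side by side.
        "cycle4_all": [list(c) + list(e) for c, e in zip(cp, ecfp)],
    }


if __name__ == "__main__":
    cp = [[0.1, 5.0], [0.3, 5.0], [0.2, 5.0]]  # toy CP profiles (second column constant)
    ecfp = [[1, 0, 1], [0, 0, 1], [1, 0, 0]]  # toy ECFP bits (second bit never set)
    for name, matrix in build_cycle_inputs(cp, ecfp).items():
        print(name, matrix)
```

In this toy example the constant CP column and the never-set ECFP bit are dropped in cycle 3 but retained in cycle 4, which mirrors the contrast between the third and fourth cycles.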
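The contrast between sensitivity and specificity drawn in the abstract follows directly from the confusion matrix. Sensitivity is TP / (TP + FN), the share of truly toxic compounds that are predicted toxic. Specificity is TN / (TN + FP), the share of truly non-toxic compounds that are predicted non-toxic. The sketch below computes both from binary labels; the function name and the toy labels are hypothetical and are not taken from the thesis.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels where 1 = active/toxic."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    # Sensitivity: share of actual positives that were caught (few false negatives).
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    # Specificity: share of actual negatives correctly rejected (few false positives).
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    # Toy labels: 4 toxic (1) and 4 non-toxic (0) compounds.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 1, 1, 0, 0]
    print(sensitivity_specificity(y_true, y_pred))  # → (0.75, 0.5)
```

A model with this kind of profile, higher sensitivity than specificity, flags some non-toxic compounds as toxic but misses fewer toxic ones, which is the trade-off the abstract uses to argue for CP descriptors in safety screening.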
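The abstract does not state the exact statistic behind the phenotypic and GO enrichment analyses. One common way to ask whether an annotation is over-represented among the assays that CP predicts well is a one-sided hypergeometric test together with a fold-enrichment ratio, sketched below with the standard library only. This is a swapped-in, generic technique offered as an assumption, not necessarily the measure used in the thesis; the function names and example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_total: int, n_annotated: int,
                           n_selected: int, k_overlap: int) -> float:
    """One-sided p-value P(X >= k_overlap), X ~ Hypergeometric(n_total, n_annotated, n_selected).

    n_total:     all assays in the analysis
    n_annotated: assays carrying the annotation (e.g. one phenotypic or GO term)
    n_selected:  assays in the group of interest (e.g. better predicted with CP)
    k_overlap:   assays that are both annotated and in the group of interest
    """
    if not (0 <= n_annotated <= n_total and 0 <= n_selected <= n_total
            and 0 <= k_overlap <= min(n_annotated, n_selected)):
        raise ValueError("inconsistent counts")
    upper = min(n_annotated, n_selected)
    # Sum the probabilities of drawing k_overlap or more annotated assays.
    tail = sum(comb(n_annotated, i) * comb(n_total - n_annotated, n_selected - i)
               for i in range(k_overlap, upper + 1))
    return tail / comb(n_total, n_selected)


def fold_enrichment(n_total: int, n_annotated: int,
                    n_selected: int, k_overlap: int) -> float:
    """Share of annotated assays within the group divided by their share overall."""
    if n_selected == 0 or n_annotated == 0:
        raise ValueError("n_selected and n_annotated must be positive")
    return (k_overlap * n_total) / (n_selected * n_annotated)


if __name__ == "__main__":
    # Hypothetical counts: 10 assays, 4 carry the term, 3 are CP-favoured, all 3 carry it.
    print(hypergeom_enrichment_p(10, 4, 3, 3))  # 4 / 120
    print(fold_enrichment(10, 4, 3, 3))         # 2.5
```

When many terms are tested at once, such p-values would normally be corrected for multiple testing, and with the very small number of GO-annotated assays noted in the abstract, a test of this kind has limited statistical power.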
| Field | Value |
|---|---|
| Status | Publisher's Version |
| URN | urn:nbn:de:tuda-tuprints-202360 |
| Classification DDC | 000 Generalities, computers, information > 004 Computer science; 500 Science and mathematics > 540 Chemistry; 500 Science and mathematics > 570 Life sciences, biology |
| Divisions | 07 Department of Chemistry > Clemens-Schöpf-Institut > Biochemistry > Biological Chemistry |
| Date Deposited | 02 Jun 2023 12:06 |
| Last Modified | 18 Aug 2023 08:50 |
| URI | https://tuprints.ulb.tu-darmstadt.de/id/eprint/20236 |
| PPN | 508293987 |