Learning diagnostic signatures from microarray data using L1-regularized logistic regression

Nandy, Preetam ; Unger, Michael ; Zechner, Christoph ; Dey, Kushal K. ; Koeppl, Heinz (2024)
Learning diagnostic signatures from microarray data using L1-regularized logistic regression.
In: Systems Biomedicine, 2013, 1 (4)
doi: 10.26083/tuprints-00027017
Article, Secondary publication, Publisher's Version

Learning diagnostic signatures from microarray data using L1-regularized logistic regression.pdf
Copyright Information: CC BY-NC 3.0 Unported - Creative Commons, Attribution, NonCommercial.

Item Type: Article
Type of entry: Secondary publication
Title: Learning diagnostic signatures from microarray data using L1-regularized logistic regression
Language: English
Date: 22 April 2024
Place of Publication: Darmstadt
Year of primary publication: 2013
Place of primary publication: Austin, Tx.
Publisher: Taylor & Francis
Journal or Publication Title: Systems Biomedicine
Volume of the journal: 1
Issue Number: 4
DOI: 10.26083/tuprints-00027017
URL / URN: https://www.tandfonline.com/doi/full/10.4161/sysb....
Origin: Secondary publication service

Making reliable diagnoses and predictions based on high-throughput transcriptional data has attracted immense attention in the past few years. While experimental gene profiling techniques, such as microarray platforms, are advancing rapidly, there is an increasing demand for computational methods that can handle such data efficiently.

In this work we propose a computational workflow for extracting diagnostic gene signatures from high-throughput transcriptional profiling data. The research was performed within the scope of the first IMPROVER challenge, whose goal was to extract and verify diagnostic signatures from microarray gene expression data in four disease areas: psoriasis, multiple sclerosis, chronic obstructive pulmonary disease and lung cancer. All four disease areas are handled with the same three-stage algorithm. First, the data are normalized with the robust multi-array average (RMA) procedure to account for variability among samples and data sets. Second, because of the very high dimensionality of the profiling data, features are pre-selected using the Wilcoxon rank sum statistic. Third, the remaining features are used to train an L1-regularized logistic regression model, which acts as the primary classifier. Using the four data sets, we analyze the proposed method and demonstrate its use in extracting diagnostic signatures from microarray gene expression data.
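The three stages can be illustrated with a short Python sketch. This is a minimal sketch under stated assumptions, not the authors' implementation. It depends only on NumPy and implements just the quantile-normalization step of RMA. Background correction and median-polish probe summarization are omitted; in practice RMA is usually computed with the Bioconductor `affy` package's `rma()`. Genes are ranked by the normal approximation of the Wilcoxon rank sum statistic, without tie correction. The L1-penalized model is fitted by proximal gradient descent (ISTA), a solver chosen here only to keep the example self-contained. All function names, the synthetic data, `top_k` and `lam` are hypothetical.

```python
import numpy as np


def quantile_normalize(X):
    """Quantile step of RMA only (sketch): give every array (row) the same
    value distribution. X has shape (n_samples, n_genes). Background
    correction and median-polish summarization are NOT implemented,
    and tied values are not averaged."""
    order = np.argsort(X, axis=1)
    reference = np.sort(X, axis=1).mean(axis=0)  # mean value at each rank position
    Xn = np.empty(X.shape, dtype=float)
    for i in range(X.shape[0]):
        Xn[i, order[i]] = reference
    return Xn


def average_ranks(v):
    """1-based ranks of v; tied values share their average rank."""
    order = np.argsort(v, kind="mergesort")
    sorted_v = v[order]
    ranks = np.empty(len(v), dtype=float)
    i = 0
    while i < len(v):
        j = i
        while j + 1 < len(v) and sorted_v[j + 1] == sorted_v[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def ranksum_z(x, y):
    """Wilcoxon rank sum statistic of x vs. y as a z-score
    (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    w = average_ranks(np.concatenate([x, y]))[:n1].sum()
    mu = n1 * (n1 + n2 + 1) / 2.0
    sigma = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return (w - mu) / sigma


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))


def fit_l1_logreg(X, y, lam, n_iter=2000):
    """Minimize mean logistic loss + lam * ||w||_1 by proximal gradient
    descent (ISTA); the intercept b is not penalized. y holds 0/1 labels."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    # Step = 1/L, with L = (||X||_2^2 + n) / (4n) bounding the gradient's Lipschitz constant
    step = 4.0 * n / (np.linalg.norm(X, 2) ** 2 + n)
    for _ in range(n_iter):
        r = sigmoid(X @ w + b) - y                                 # residuals at current (w, b)
        w = w - step * (X.T @ r) / n                               # gradient step on w
        w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)  # soft-threshold (L1 prox)
        b = b - step * r.mean()                                    # unpenalized intercept step
    return w, b


def run_demo(seed=0, n_per_class=30, n_genes=500, top_k=50, lam=0.05):
    """Hypothetical end-to-end run on synthetic data."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(2 * n_per_class, n_genes))
    y = np.repeat([0.0, 1.0], n_per_class)
    X[y == 1, :5] += 2.0  # synthetic: genes 0-4 are "differentially expressed"
    Xn = quantile_normalize(X)                                  # stage 1
    z = np.array([ranksum_z(Xn[y == 1, g], Xn[y == 0, g]) for g in range(n_genes)])
    keep = np.argsort(-np.abs(z))[:top_k]                       # stage 2: pre-selection
    Xs = Xn[:, keep]
    Xs = (Xs - Xs.mean(axis=0)) / Xs.std(axis=0)                # standardize features
    w, b = fit_l1_logreg(Xs, y, lam)                            # stage 3
    accuracy = np.mean(((Xs @ w + b) > 0) == (y == 1))
    return keep[w != 0], accuracy
```

Calling `run_demo()` returns the indices of genes with nonzero weight (the hypothetical signature) and the training accuracy. The soft-thresholding step is what drives most coefficients exactly to zero, which is why the surviving genes can be read off directly as a signature. In a real analysis, `top_k` and `lam` would typically be tuned by cross-validation, and accuracy would be reported on held-out samples rather than on the training data.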

Uncontrolled Keywords: classification, gene expression, L1-regularization, LASSO, logistic regression, microarray data, RMA normalization, Wilcoxon rank sum test
Status: Publisher's Version
URN: urn:nbn:de:tuda-tuprints-270174
Classification DDC: 600 Technology, medicine, applied sciences > 610 Medicine and health
600 Technology, medicine, applied sciences > 621.3 Electrical engineering, electronics
Date Deposited: 22 Apr 2024 09:49
Last Modified: 23 Apr 2024 04:49
URI: https://tuprints.ulb.tu-darmstadt.de/id/eprint/27017