DExTER

Identification of long regulatory elements in the genome of Plasmodium falciparum and other eukaryotes

Christophe Menichelli, Vincent Guitard, Rafael M. Martins, Sophie Lèbre, Jose-Juan Lopez-Rubio, Charles-Henri Lecellier, Laurent Bréhélin

https://doi.org/10.1371/journal.pcbi.1008909

Overview

DExTER (Domain Exploration To Explain gene Regulation) is a bioinformatics tool designed to automatically identify genomic regions whose nucleotide composition correlates with gene expression levels.

Unlike traditional approaches focusing on short transcription factor binding sites (6-12 bp), DExTER detects Long Regulatory Elements (LREs) that can span tens to hundreds of nucleotides.

This makes it possible to explore a poorly characterized regulatory mechanism: gene regulation driven by the global composition of genomic regions.

The method has shown strong predictive power: it explains a large fraction of gene expression variability, with prediction accuracy above 70% in Plasmodium falciparum.

Method principle

DExTER combines DNA sequences and gene expression data to identify pairs:

(k-mer motif, gene region)

such that the frequency of the k-mer within the region, computed for each gene, is correlated with the expression level.

Main steps

Segmentation of regions around genes
Iterative search for k-mers correlated with expression
Informative variable selection (LASSO / machine learning)
Construction of a predictive gene expression model

The final model allows:

identification of candidate regulatory regions
explanation of expression variability
prediction of expression of new genes
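
The idea of scoring a (k-mer, region) pair can be illustrated with a minimal Python sketch. It is not DExTER's implementation: the function names, the toy sequences and expression values, and the choice of a Pearson correlation are assumptions made for illustration, and the variable selection and model-building steps are not shown.

```python
from math import sqrt

def kmer_frequency(seq, kmer, start, end):
    """Fraction of positions in seq[start:end] where kmer starts (overlaps counted)."""
    region = seq[start:end]
    n_pos = len(region) - len(kmer) + 1
    if n_pos <= 0:
        return 0.0
    hits = sum(1 for i in range(n_pos) if region[i:i + len(kmer)] == kmer)
    return hits / n_pos

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def score_pair(sequences, expression, kmer, start, end):
    """Correlate the per-gene frequency of kmer in [start, end) with expression."""
    freqs = [kmer_frequency(s, kmer, start, end) for s in sequences]
    return pearson(freqs, expression)

# Hypothetical toy data: four aligned sequences and one expression value per gene.
sequences = ["AAAAAAGGCC", "AAAAGCGGCC", "AAGCGCGGCC", "GCGCGCGGCC"]
expression = [4.0, 3.0, 2.0, 1.0]

# Score the pair ("A", positions 0-5): the A content of the first six bases.
rho = score_pair(sequences, expression, "A", 0, 6)
print(f"correlation for ('A', [0, 6)): {rho:.2f}")
```

In this toy example the A content of the first six positions decreases in step with expression, so the pair receives a high score; a real analysis would evaluate many k-mers and regions.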

Biological applications

DExTER enables the study of regulatory mechanisms and contexts that classical motif discovery methods do not detect:

Genomes with few transcription factors
Post-transcriptional regulation
Cell-cycle dependent regulation
Epigenetic regulation linked to DNA composition

Key observations from the study:

Highly dynamic regulation in Apicomplexa
More stable regulation in multicellular organisms
Distinct roles of upstream (transcriptional) vs downstream (post-transcriptional) regions

Input data

Nucleotide sequences aligned per gene (e.g. ±2 kb around the transcription start site (TSS) or the start codon)
Gene expression matrix (RNA-seq or microarray)

Output data

List of candidate long regulatory elements (cLREs)
Enriched motifs and associated regions
Variable importance
Predictive expression model
Correlation scores between sequence and expression

You can explore an example of the results generated by DExTER here:
https://api.atgc-montpellier.fr/results/71cb194e-4dc4-4b4d-aca7-f153659cd340/

Typical use cases

Study of non-canonical gene regulation
Comparative genomics
Transcriptome interpretation
Atypical genomes (parasites, plants, protists)

Associated publication

Menichelli C. et al., 2021
Identification of long regulatory elements in the genome of Plasmodium falciparum and other eukaryotes
PLOS Computational Biology
https://doi.org/10.1371/journal.pcbi.1008909

Source code

https://gite.lirmm.fr/menichelli/DExTER

DExTER online execution

Account Information

Name

Name your analysis for easy identification

Email address for job notifications

Confirm email

Please confirm your email address

Private

Protect your job with a randomly generated password. You will receive it by email.

Input data

FASTA file

Upload a FASTA file containing sequences of identical length.

Expression data file

Expression table with sequence identifiers in the first column and conditions in following columns.

Target condition name

Name of the condition column to analyse in the expression file.
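
Before submitting a job, it can help to check the two input files locally. The sketch below is only an illustration under stated assumptions: the expression table is assumed to be tab-separated with a header row, and the gene identifiers, condition names ("ring", "troph") and helper functions are hypothetical.

```python
import csv
import io

def read_fasta(handle):
    """Return {identifier: sequence} from a FASTA text stream."""
    records, name, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records

def read_target(handle, condition, delimiter="\t"):
    """Return {identifier: value} for one condition column (tab-separated: an assumption)."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    if condition not in header[1:]:
        raise ValueError(f"condition {condition!r} not found in header")
    col = header.index(condition, 1)
    return {row[0]: float(row[col]) for row in reader if row}

def check_inputs(sequences, target):
    """Check that all sequences share one length; return identifiers present in both inputs."""
    lengths = {len(s) for s in sequences.values()}
    if len(lengths) != 1:
        raise ValueError(f"sequences have different lengths: {sorted(lengths)}")
    shared = sorted(set(sequences) & set(target))
    if not shared:
        raise ValueError("no identifier is shared between the FASTA and the table")
    return shared

# Hypothetical inputs, embedded as strings so that the example is self-contained.
fasta_text = ">geneA\nACGTAC\nGT\n>geneB\nTTGACGTA\n"
table_text = "id\tring\ttroph\ngeneA\t5.0\t1.5\ngeneB\t2.0\t7.25\n"

sequences = read_fasta(io.StringIO(fasta_text))
target = read_target(io.StringIO(table_text), "troph")
genes = check_inputs(sequences, target)
print(genes, [target[g] for g in genes])
```

With real files, the `io.StringIO` objects would be replaced by `open(path)` handles.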

Exploration settings

Number of bins

Number of bins used to segment the sequences.

Alignment point index

Reference position in sequences (0-based) used for alignment.

Maximum k-mer length

Maximum k-mer length to explore (0 disables the limit).

Custom bin boundaries

Optional comma-separated bin boundaries (e.g. -50,-10,0,10,50).

Use uniform bin lengths

Use bins of equal size instead of the polynomial sizing strategy.
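
The relation between bin boundaries and the alignment point can be sketched as follows. This is a hedged illustration, not the tool's code: it assumes that boundaries are expressed relative to the alignment point, that they must be strictly increasing, and that each bin is a half-open interval; the function names are hypothetical, and the polynomial sizing strategy is not reproduced here.

```python
def parse_boundaries(text):
    """Parse a comma-separated boundary string into a strictly increasing list of ints."""
    values = [int(v) for v in text.split(",")]
    if any(b <= a for a, b in zip(values, values[1:])):
        raise ValueError("boundaries must be strictly increasing")
    return values

def boundaries_to_slices(boundaries, align_index, seq_length):
    """Convert boundaries relative to the alignment point into 0-based [start, end) bins."""
    slices = []
    for left, right in zip(boundaries, boundaries[1:]):
        start, end = align_index + left, align_index + right
        if start < 0 or end > seq_length:
            raise ValueError("a bin falls outside the sequence")
        slices.append((start, end))
    return slices

def uniform_boundaries(n_bins, align_index, seq_length):
    """Evenly spaced boundaries over the whole sequence, relative to the alignment point."""
    return [round(i * seq_length / n_bins) - align_index for i in range(n_bins + 1)]

# Sequences of length 100 with the alignment point at index 50 (hypothetical values).
custom = parse_boundaries("-50,-10,0,10,50")
custom_bins = boundaries_to_slices(custom, align_index=50, seq_length=100)
uniform = uniform_boundaries(n_bins=4, align_index=50, seq_length=100)
print(custom_bins)
print(uniform)
```

Each `(start, end)` pair can be used directly as a Python slice on an input sequence.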

Correlation thresholds

Correlation increase threshold

Minimum absolute correlation gain required to continue exploration.

Correlation ratio threshold

Minimum relative correlation gain ((rho_new-rho_old)/rho_old).

Minimum correlation

Stop exploration when correlation drops below this value.
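
One possible reading of how the three thresholds interact is sketched below. The way DExTER combines them is not described here, so this is an assumption: the sketch requires all three conditions to hold and compares correlations by absolute value. The function name and the threshold values are hypothetical.

```python
def keep_exploring(rho_old, rho_new, min_increase, min_ratio, min_correlation):
    """Return True if an extension from rho_old to rho_new passes all three thresholds."""
    old, new = abs(rho_old), abs(rho_new)
    # Minimum correlation: stop when the new correlation is too weak.
    if new < min_correlation:
        return False
    # Correlation increase threshold: gain measured as a difference.
    gain = new - old
    if gain < min_increase:
        return False
    # Correlation ratio threshold: (rho_new - rho_old) / rho_old.
    if old > 0 and gain / old < min_ratio:
        return False
    return True

# Hypothetical thresholds, chosen only to exercise each rule.
thresholds = dict(min_increase=0.02, min_ratio=0.1, min_correlation=0.2)

print(keep_exploring(0.30, 0.36, **thresholds))  # passes all three checks
print(keep_exploring(0.30, 0.31, **thresholds))  # gain of 0.01 is too small
print(keep_exploring(0.50, 0.53, **thresholds))  # relative gain of 6% is too small
print(keep_exploring(0.10, 0.15, **thresholds))  # correlation stays below 0.2
```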

Execution options

Random seed

Seed for training/testing split reproducibility.

Log-transform expression data

Apply log transformation to expression values.
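
The effect of these two options can be illustrated with a short sketch. The details are assumptions rather than DExTER's actual behaviour: the logarithm base (2), the pseudocount (1), the test fraction (20%) and the function names are all hypothetical.

```python
import math
import random

def log_transform(values, pseudocount=1.0):
    """log2(value + pseudocount); the base and the pseudocount are assumptions."""
    return [math.log2(v + pseudocount) for v in values]

def split_train_test(ids, test_fraction=0.2, seed=0):
    """Shuffle identifiers with a seeded generator and split them into (train, test)."""
    rng = random.Random(seed)
    shuffled = sorted(ids)  # sort first so the result depends only on the seed
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical gene identifiers and expression values.
gene_ids = [f"gene{i}" for i in range(10)]
train, test = split_train_test(gene_ids, seed=42)
again_train, again_test = split_train_test(gene_ids, seed=42)

print(test)                        # same test set on every run with seed=42
print(log_transform([0, 1, 3]))
```

Reusing the same seed reproduces the same split, which makes results from repeated runs comparable.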
