DExTER




Overview

DExTER (Domain Exploration To Explain gene Regulation) is a bioinformatics tool designed to automatically identify genomic regions whose nucleotide composition correlates with gene expression levels.

Unlike traditional approaches focusing on short transcription factor binding sites (6-12 bp), DExTER detects Long Regulatory Elements (LREs) that can span tens to hundreds of nucleotides.

This makes it possible to explore a poorly characterized regulatory mechanism: gene regulation driven by the global composition of genomic regions.

The method has shown strong predictive power: it explains a large fraction of gene expression variability and reaches more than 70% prediction accuracy in Plasmodium falciparum.


Method principle

DExTER combines DNA sequences and gene expression data to identify pairs:

(k-mer motif, gene region)

such that the frequency of the k-mer within the region is correlated with the gene's expression level.
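
To make the idea of scoring one (k-mer, region) pair concrete, here is a minimal sketch in Python. It is not DExTER's code: the function names, the use of Pearson correlation, the exact-match counting, and the toy data are all assumptions made for illustration.

```python
from math import sqrt

def kmer_frequency(seq, kmer, start, end):
    """Fraction of windows in seq[start:end] that match kmer exactly."""
    region = seq[start:end].upper()
    n_windows = len(region) - len(kmer) + 1
    if n_windows <= 0:
        return 0.0
    hits = sum(region[i:i + len(kmer)] == kmer for i in range(n_windows))
    return hits / n_windows

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

# Toy data (invented): four aligned sequences and one expression value per gene.
sequences = ["ATATATGC", "ATATGCGC", "GCATGCGC", "GCGCGCGC"]
expression = [4.0, 3.0, 2.0, 1.0]

# Score the pair ("AT", positions 0 to 6): one frequency per gene, then one correlation.
freqs = [kmer_frequency(s, "AT", 0, 6) for s in sequences]
rho = pearson(freqs, expression)
print(freqs, rho)
```

In this toy case the "AT" frequency decreases in step with expression, so the pair would rank as a strong candidate.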

Main steps

  1. Segmentation of regions around genes
  2. Iterative search for k-mers correlated with expression
  3. Informative variable selection (LASSO / machine learning)
  4. Construction of a predictive gene expression model
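
Step 2 can be pictured with a simplified greedy search over a single fixed region. This is only a sketch under stated assumptions: the function greedy_kmer_search, its parameters max_k and min_gain, the one-base extension rule, and the toy data are hypothetical, and the real exploration also varies the region, which is omitted here.

```python
from math import sqrt

def kmer_frequency(seq, kmer):
    """Fraction of windows in seq that match kmer exactly."""
    n_windows = len(seq) - len(kmer) + 1
    if n_windows <= 0:
        return 0.0
    return sum(seq[i:i + len(kmer)] == kmer for i in range(n_windows)) / n_windows

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def greedy_kmer_search(seqs, expr, max_k=3, min_gain=0.01, alphabet="ACGT"):
    """Extend the best k-mer one base at a time while |correlation| improves."""
    best_kmer, best_rho = "", 0.0
    candidates = list(alphabet)
    while candidates:
        scored = [
            (abs(pearson([kmer_frequency(s, k) for s in seqs], expr)), k)
            for k in candidates
        ]
        rho, kmer = max(scored)
        if rho - best_rho < min_gain:
            break  # no candidate improves enough: stop exploring
        best_kmer, best_rho = kmer, rho
        if len(kmer) >= max_k:
            break  # length limit reached
        # Next round: every one-base extension on either side of the winner.
        candidates = sorted({kmer + b for b in alphabet} | {b + kmer for b in alphabet})
    return best_kmer, best_rho

# Toy data (invented for illustration).
seqs = ["ATATATGC", "ATATGCGC", "GCATGCGC", "GCGCGCGC"]
expr = [4.0, 3.0, 2.0, 1.0]
print(greedy_kmer_search(seqs, expr))
```

The k-mers retained by such a search would then feed the variable selection of step 3.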

The final model allows:

  • identification of candidate regulatory regions
  • explanation of expression variability
  • prediction of expression of new genes

Biological applications

DExTER enables the study of regulatory mechanisms not detectable with classical motif discovery methods:

  • Genomes with few transcription factors
  • Post-transcriptional regulation
  • Cell-cycle dependent regulation
  • Epigenetic regulation linked to DNA composition

Key observations from the study:

  • Highly dynamic regulation in Apicomplexa
  • More stable regulation in multicellular organisms
  • Distinct roles of upstream (transcriptional) vs downstream (post-transcriptional) regions

Input data

  • Nucleotide sequences aligned per gene (e.g. ±2 kb around TSS or start codon)
  • Gene expression matrix (RNA-seq or microarray)

Output data

  • List of candidate regulatory elements (cLREs)
  • Enriched motifs and associated regions
  • Variable importance
  • Predictive expression model
  • Correlation scores between sequence and expression

You can explore an example of the results generated by DExTER here:
https://api.atgc-montpellier.fr/results/71cb194e-4dc4-4b4d-aca7-f153659cd340/


Typical use cases

  • Study of non-canonical gene regulation
  • Comparative genomics
  • Transcriptome interpretation
  • Atypical genomes (parasites, plants, protists)

Associated publication

Menichelli C. et al., 2021
Identification of long regulatory elements in the genome of Plasmodium falciparum and other eukaryotes
PLOS Computational Biology
https://doi.org/10.1371/journal.pcbi.1008909


Source code

https://gite.lirmm.fr/menichelli/DExTER



DExTER online execution

Input data
Upload a FASTA file containing sequences of identical length.
Expression table with sequence identifiers in the first column and conditions in following columns.
Name of the condition column to analyse in the expression file.
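
Before uploading, it can help to check both files locally. The sketch below assumes a tab-separated expression table with a header row, which the form does not state explicitly; the function names and toy file contents are hypothetical.

```python
import csv
import io

def read_fasta(handle):
    """Parse FASTA into {identifier: sequence}; the identifier is the first word after '>'."""
    parts = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {key: "".join(chunks) for key, chunks in parts.items()}

def check_same_length(seqs):
    """Return the common sequence length, or raise if lengths differ."""
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) > 1:
        raise ValueError(f"sequences have different lengths: {sorted(lengths)}")
    return lengths.pop() if lengths else 0

def read_expression(handle, condition, delimiter="\t"):
    """Return {identifier: value} for one condition column (assumed tab-separated)."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    col = header.index(condition)  # raises ValueError if the column is missing
    return {row[0]: float(row[col]) for row in reader if row}

# Toy inputs (invented) standing in for the two uploaded files.
fasta = io.StringIO(">g1 some description\nACGT\nAC\n>g2\nTTTTTT\n")
table = io.StringIO("gene\tcond1\tcond2\ng1\t1.5\t2\ng2\t0\t4\n")

seqs = read_fasta(fasta)
length = check_same_length(seqs)
expr = read_expression(table, "cond2")
missing = sorted(set(seqs) - set(expr))  # sequences without an expression value
print(length, expr, missing)
```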
Exploration settings
Number of bins used to segment the sequences.
Reference position in sequences (0-based) used for alignment.
Maximum k-mer length to explore (0 disables the limit).
Optional comma-separated bin boundaries (e.g. -50,-10,0,10,50).
Use bins of equal size instead of the polynomial sizing strategy.
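
The two simpler binning options can be sketched as follows. The polynomial sizing strategy is not described on this page, so it is not reproduced. The function names are hypothetical, and the sketch assumes that custom boundaries are increasing and expressed relative to the reference position.

```python
def equal_bins(seq_len, n_bins, ref_pos=0):
    """Split [0, seq_len) into n_bins near-equal bins.

    Returns (start, end) pairs relative to ref_pos, with end exclusive.
    """
    if not 1 <= n_bins <= seq_len:
        raise ValueError("n_bins must be between 1 and seq_len")
    edges = [i * seq_len // n_bins for i in range(n_bins + 1)]
    return [(edges[i] - ref_pos, edges[i + 1] - ref_pos) for i in range(n_bins)]

def bins_from_boundaries(text, ref_pos, seq_len):
    """Turn 'b0,b1,...' (relative to ref_pos) into absolute (start, end) slices."""
    bounds = [int(tok) for tok in text.split(",")]
    if any(a >= b for a, b in zip(bounds, bounds[1:])):
        raise ValueError("boundaries must be strictly increasing")
    absolute = [ref_pos + b for b in bounds]
    if absolute[0] < 0 or absolute[-1] > seq_len:
        raise ValueError("boundaries fall outside the sequence")
    return list(zip(absolute, absolute[1:]))

# A 10 bp sequence whose reference position (0-based) is index 5.
print(equal_bins(10, 3, ref_pos=5))
print(bins_from_boundaries("-2,0,2", ref_pos=5, seq_len=10))
```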
Correlation thresholds
Minimum absolute correlation gain required to continue exploration.
Minimum relative correlation gain ((rho_new-rho_old)/rho_old).
Stop exploration when correlation drops below this value.
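
One way the three thresholds could combine is sketched below. The page does not say how they interact, so requiring all three to pass is an assumption, as are the parameter names and the use of absolute correlation values so that negative correlations are handled symmetrically.

```python
def keep_exploring(rho_old, rho_new, min_abs_gain, min_rel_gain, min_rho):
    """Return True if exploration should continue from rho_old to rho_new.

    Hypothetical rule: every threshold must be satisfied.
    """
    old, new = abs(rho_old), abs(rho_new)
    if new < min_rho:
        return False  # correlation dropped below the floor
    abs_gain = new - old
    if abs_gain < min_abs_gain:
        return False  # absolute gain too small
    if old > 0 and abs_gain / old < min_rel_gain:
        return False  # relative gain (rho_new - rho_old) / rho_old too small
    return True

# Going from 0.40 to 0.46 is a gain of about 0.06 absolute and 15% relative.
print(keep_exploring(0.40, 0.46, min_abs_gain=0.05, min_rel_gain=0.10, min_rho=0.2))
# Going from 0.40 to 0.42 is only about 0.02 absolute, so exploration stops.
print(keep_exploring(0.40, 0.42, min_abs_gain=0.05, min_rel_gain=0.10, min_rho=0.2))
```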
Execution options
Seed for training/testing split reproducibility.
Apply log transformation to expression values.
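
The effect of the two execution options can be illustrated with a short sketch. The pseudocount, the natural logarithm, the 20% test fraction, and the function names are assumptions; the page does not specify the values DExTER uses.

```python
import math
import random

def log_transform(values, pseudocount=1.0):
    """Natural log of (value + pseudocount); the pseudocount keeps zeros finite."""
    return [math.log(v + pseudocount) for v in values]

def split_ids(ids, test_fraction=0.2, seed=0):
    """Shuffle identifiers with a fixed seed and split them into (train, test)."""
    rng = random.Random(seed)   # private generator: same seed gives the same split
    shuffled = sorted(ids)      # sort first so the input order does not matter
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

genes = [f"gene{i}" for i in range(10)]
train, test = split_ids(genes, seed=42)
print(train, test)
print(log_transform([0.0, 9.0]))
```

Rerunning with the same seed yields the same training and test sets, which makes results comparable across runs.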