Genome-wide detection of structural variants using deep learning

1205 commits | Last update: March 05, 2021


What sv-channels can do for you

  • calls structural variants (SVs) in short-read alignments (BAM files) using one-dimensional convolutional neural networks (1D CNNs)
  • supports detection of major SV types: deletions (DEL), insertions (INS), inversions (INV), tandem duplications (DUP) and inter-chromosomal translocations (CTX)

The workflow includes the following key steps:

1) Transform read alignments into channels. First, split read positions are extracted from the BAM files as candidate regions for SV breakpoints. For each position in a pair of split read positions (the rightmost position of the first split part and the leftmost position of the second split part), a 2D NumPy array called a window is constructed. The shape of a window is [window_size, number_of_channels], and the genomic interval it covers is centered on the split read position with a context of [-100 bp, +100 bp) for a window_size of 200 bp. From all the reads overlapping this genomic interval, and from the corresponding segment of the reference sequence, 79 channels (number_of_channels) are constructed, where each channel encodes a signal that can be used for SV calling. The list of channels can be found here.

The two windows are joined as linked-windows, with a zero-padding 2D array of shape [10, number_of_channels] in between to avoid CNN kernel artifacts at the interface between the two windows. The linked-windows are labelled as SV when the split read positions overlap the SV callset used as the ground truth and as noSV otherwise, where SV is one of DEL, INS, INV, DUP or CTX according to the SV type.
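The window and linked-window shapes described above can be illustrated with a minimal NumPy sketch. It is not the sv-channels implementation: the per-base signal matrix `channel_signal` and the helper names `window_interval`, `make_window` and `link_windows` are hypothetical, and the random data only stands in for real channel values. Only the sizes (200 bp window, 79 channels, 10 rows of zero padding) come from the description.

```python
import numpy as np

WINDOW_SIZE = 200  # bp, as stated in the description
N_CHANNELS = 79    # number_of_channels, as stated in the description
PADDING = 10       # rows of zero padding between the two windows


def window_interval(position, window_size=WINDOW_SIZE):
    """Half-open interval [position - w/2, position + w/2) around a position."""
    half = window_size // 2
    return position - half, position + half


def make_window(channel_signal, position, window_size=WINDOW_SIZE):
    """Cut a [window_size, n_channels] window from a per-base signal matrix.

    channel_signal is a hypothetical [sequence_length, n_channels] array.
    Rows that would fall outside the sequence are left as zeros.
    """
    start, end = window_interval(position, window_size)
    window = np.zeros((window_size, channel_signal.shape[1]),
                      dtype=channel_signal.dtype)
    lo, hi = max(start, 0), min(end, channel_signal.shape[0])
    if lo < hi:
        window[lo - start:hi - start] = channel_signal[lo:hi]
    return window


def link_windows(window_a, window_b, padding=PADDING):
    """Join two windows with a zero block of shape [padding, n_channels]."""
    pad = np.zeros((padding, window_a.shape[1]), dtype=window_a.dtype)
    return np.concatenate([window_a, pad, window_b], axis=0)


# Toy data standing in for real channel values (assumption).
rng = np.random.default_rng(0)
signal = rng.random((1000, N_CHANNELS)).astype(np.float32)

linked = link_windows(make_window(signal, 300), make_window(signal, 650))
print(linked.shape)  # (410, 79)
```

The zero block occupies rows 200 to 209 of the linked-windows array, so a convolution kernel sliding along the first axis crosses padding rather than mixing signal from the two breakpoints directly.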

2) Model training. The labelled linked-windows are used to train a 1D CNN to classify them as either SV or noSV. Two cross-validation strategies are possible: 10-fold cross-validation and cross-validation by chromosome, where one chromosome is used as the test set and the other chromosomes as the training set.
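The two cross-validation strategies can be sketched as index splits in plain Python. This is an illustration under stated assumptions, not the project's training code: `chromosome_folds` and `k_folds` are hypothetical helpers, each linked-window is assumed to carry the name of its chromosome, and the 10-fold split here is an unshuffled, interleaved assignment chosen only for simplicity.

```python
from collections import defaultdict


def chromosome_folds(chromosomes):
    """Yield (test_chromosome, train_idx, test_idx), one fold per chromosome.

    chromosomes[i] is the chromosome name of linked-window i (assumption).
    """
    by_chrom = defaultdict(list)
    for idx, chrom in enumerate(chromosomes):
        by_chrom[chrom].append(idx)
    for test_chrom, test_idx in by_chrom.items():
        # Every window on any other chromosome goes into the training set.
        train_idx = [i for chrom, ids in by_chrom.items()
                     if chrom != test_chrom for i in ids]
        yield test_chrom, train_idx, test_idx


def k_folds(n_samples, k=10):
    """Yield (train_idx, test_idx) for k interleaved, unshuffled folds."""
    indices = list(range(n_samples))
    for fold in range(k):
        test_idx = indices[fold::k]
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        yield train_idx, test_idx


# Toy example: six linked-windows spread over three chromosomes.
chroms = ["1", "1", "2", "3", "2", "3"]
folds = list(chromosome_folds(chroms))
tenfold = list(k_folds(25, k=10))
```

Splitting by chromosome keeps all windows from the held-out chromosome out of training, which is the property the second strategy relies on.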

3) SV calling with a trained model. Once a trained model is generated and the BAM file for the test set is converted into linked-windows, SV calling is performed using the script.
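How classified linked-windows could be turned back into calls can be sketched as follows. This does not reproduce the project's calling script: the function `call_svs`, the record layout, and the 0.5 probability threshold are all assumptions made for illustration. The sketch assumes the trained model has already produced one SV probability per linked-window.

```python
def call_svs(breakpoint_pairs, sv_probabilities, sv_type, threshold=0.5):
    """Keep the breakpoint pairs whose linked-window is classified as SV.

    breakpoint_pairs: list of (chrom1, pos1, chrom2, pos2) tuples, one per
        linked-window (hypothetical layout).
    sv_probabilities: model output, probability of the SV class per window.
    threshold: decision cut-off; 0.5 is an assumption, not a project default.
    """
    calls = []
    for (chrom1, pos1, chrom2, pos2), prob in zip(breakpoint_pairs,
                                                  sv_probabilities):
        if prob >= threshold:
            calls.append({
                "type": sv_type,
                "chrom1": chrom1, "pos1": pos1,
                "chrom2": chrom2, "pos2": pos2,
                "probability": prob,
            })
    return calls


# Toy example: two candidate deletions, one above and one below the cut-off.
pairs = [("1", 10_000, "1", 12_500), ("2", 40_000, "2", 40_300)]
calls = call_svs(pairs, [0.93, 0.12], sv_type="DEL")
print(len(calls))  # 1
```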

  • Machine learning
Programming Language
  • Python
  • Java
  • Shell scripts
License
  • Apache-2.0
Source code

Participating organizations


  • Luca Santuari
    University Medical Center Utrecht
  • Arnold Kuzniar
    Netherlands eScience Center
  • Sonja Georgievska
    Netherlands eScience Center
  • Carl Shneider
    University Medical Center Utrecht
Contact person
Luca Santuari
University Medical Center Utrecht
