BeeTLe#

A deep learning framework for linear B-cell epitope prediction and antibody type-specific epitope classification using Transformer and LSTM encoders.

arXiv | ECML PKDD 2023

Usage#

Command Line#

After installation, run a command like the one below. Predicting 10,000 peptides takes a few seconds.

python cli.py -i input.fasta -o output.csv
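
If the peptides live in Python rather than in a file, a short script can produce the FASTA input first. The following is a minimal sketch and not part of this repo: the helper name `write_fasta` and the example identifiers and sequences are made up for illustration.

```python
from pathlib import Path


def write_fasta(peptides, path):
    """Write (identifier, sequence) pairs as FASTA records, one sequence per line."""
    lines = []
    for identifier, sequence in peptides:
        lines.append(f">{identifier}")   # FASTA header line
        lines.append(sequence.strip())   # peptide sequence on the next line
    Path(path).write_text("\n".join(lines) + "\n")


if __name__ == "__main__":
    # Hypothetical peptides, for illustration only.
    example = [("pep1", "ACDEFGHIKL"), ("pep2", "MNPQRSTVWY")]
    write_fasta(example, "input.fasta")
```

The resulting input.fasta can then be passed to cli.py with -i as shown above.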

To show help, run python cli.py -h. The input is a FASTA file of peptides. The output is a table with the following columns:

  • identifier: FASTA header.

  • sequence: FASTA sequence.

  • score: Predicted probability that the peptide is an epitope.

  • epitope: {0, 1}. 1 for epitope (score > 0.5).

  • Ig: {A, E, M}. Of these three antibody types, the one the peptide most probably binds to.
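
The output table can be post-processed with the Python standard library. The sketch below assumes that output.csv is comma-separated with a header row whose names match the columns listed above; that layout is an assumption, so check the actual file first. The helper name `load_epitopes` is hypothetical, and the filter re-applies the score > 0.5 rule rather than relying on how the epitope column is formatted.

```python
import csv


def load_epitopes(path, ig=None, threshold=0.5):
    """Return rows predicted as epitopes, optionally restricted to one Ig type."""
    selected = []
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            score = float(row["score"])
            if score <= threshold:          # same rule as the epitope column
                continue
            if ig is not None and row["Ig"].strip() != ig:
                continue
            row["score"] = score            # keep the parsed float for sorting
            selected.append(row)
    # Highest-scoring peptides first.
    selected.sort(key=lambda r: r["score"], reverse=True)
    return selected
```

For example, `load_epitopes("output.csv", ig="E")` would return only the predicted epitopes whose Ig column is E, sorted by descending score.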

Web App#

To use BeeTLe without installing it, go to the Streamlit web app.

Installation#

Linux is preferred. A GPU is not required.

  • Clone this repo and navigate to the repo folder.

  • Install with pip, preferably in a virtual environment:

    pip install -r requirements.txt
    
  • Alternatively, for a more precisely specified environment, use mamba on Linux:

    mamba env create -p ./envs -f environment.yml
    mamba activate ./envs
    

Data#

Follow the notebook data/dataset.py to generate the datasets, in which redundancy and false negatives are reduced. The raw data is available on figshare.

Development#

The code is designed to be reusable and extensible, and it can be adapted to other peptide classification tasks. Some useful components are:

  • Loss functions: logit-adjusted, focal; sigmoid, softmax.

  • LSTM (packed variable length input), Transformer encoder, attention.

  • Amino acid encoder.
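
For readers unfamiliar with the loss functions listed above, here is a plain-Python sketch of the standard textbook definitions of a sigmoid focal loss and a logit-adjusted softmax cross-entropy, each for a single example. It is not the repo's implementation: the function names, the default values of gamma, alpha and tau, and the class priors are assumptions chosen for illustration, and a training version would operate on batched tensors.

```python
import math


def sigmoid_focal_loss(logit, target, gamma=2.0, alpha=0.25):
    """Focal loss for one binary example: -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    # Flip the sign so that x is the logit of the true class.
    x = logit if target == 1 else -logit
    # log(sigmoid(x)) computed stably as -softplus(-x).
    log_p_t = -(max(-x, 0.0) + math.log1p(math.exp(-abs(x))))
    p_t = math.exp(log_p_t)
    alpha_t = alpha if target == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * log_p_t


def logit_adjusted_softmax_loss(logits, target, priors, tau=1.0):
    """Cross-entropy on logits shifted by tau * log(class prior)."""
    adjusted = [z + tau * math.log(pi) for z, pi in zip(logits, priors)]
    m = max(adjusted)  # subtract the max for numerical stability
    log_norm = m + math.log(sum(math.exp(a - m) for a in adjusted))
    return log_norm - adjusted[target]
```

With gamma set to 0 the focal loss reduces to an alpha-weighted binary cross-entropy, and with tau set to 0 the logit-adjusted loss reduces to the ordinary softmax cross-entropy, which makes both functions easy to sanity-check against a reference.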