Installation Instructions
SQUID
To install SQUID, first create a new Anaconda environment via the command line:
$ conda create --name squid python==3.7.2
Next, activate this environment:
$ conda activate squid
Then install the SQUID package:
$ pip install squid-nn
Alternatively, you can clone SQUID from GitHub using the command line:
$ cd appropriate_directory
$ git clone https://github.com/evanseitz/squid-nn.git
where appropriate_directory is the absolute path to the directory in which you would like SQUID to reside. Then add the following to the top of any Python file in which you use SQUID:
# Insert local path to SQUID at beginning of Python's path
import sys
sys.path.insert(0, 'appropriate_directory/squid-nn')
# Load squid
import squid
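As a slightly more defensive variant of the snippet above, the following sketch resolves the clone location to an absolute path and checks that it exists before modifying sys.path. The helper name add_repo_to_path is hypothetical and not part of SQUID; appropriate_directory remains a placeholder for your own path.

```python
import sys
from pathlib import Path


def add_repo_to_path(repo_dir):
    """Prepend repo_dir to sys.path (hypothetical helper, not part of SQUID)."""
    repo = Path(repo_dir).expanduser().resolve()
    if not repo.is_dir():
        raise FileNotFoundError(f"SQUID clone not found at {repo}")
    repo_str = str(repo)
    # Avoid duplicate entries if the helper is called more than once
    if repo_str in sys.path:
        sys.path.remove(repo_str)
    sys.path.insert(0, repo_str)
    return repo_str
```

With this helper, calling add_repo_to_path('appropriate_directory/squid-nn') before import squid would fail early with a clear message if the path is wrong, rather than with an import error.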
MAVE-NN
MAVE-NN models and related visualization tools require additional dependencies:
$ pip install logomaker
$ pip install mavenn
$ pip install mavenn --upgrade
Please see the MAVE-NN documentation for more information.
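Before moving on, it can help to confirm that the packages installed above are importable from the active environment. The sketch below uses only the standard library. The helper name missing_modules is hypothetical, and the module names are assumed to match the import statements used in the example script in this document (squid, logomaker, mavenn).

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["squid", "logomaker", "mavenn"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All modules found.")
```

Because find_spec only locates a module without importing it, this check is quick and does not load TensorFlow.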
Older DNNs that require inference via TensorFlow 1.x also require Python 2.x, which MAVE-NN does not support. In this case, users will need to create two separate environments:
TensorFlow 1.x and Python 2.x environment for generating in silico MAVE data
TensorFlow 2.x and Python 3.x environment for training MAVE-NN models
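One way to pass data between the two environments is to write the in silico MAVE data to disk in the first environment and read it back in the second. The following is a minimal sketch using only the standard library's json module, assuming the mutagenized sequences are held as strings and the predictions as floats. The function names, dictionary keys, and file layout are hypothetical and not part of SQUID. For large one-hot arrays, a binary format such as NumPy's .npz would likely be more practical.

```python
import json


def save_mave(path, sequences, scores):
    """Write sequences and their predicted scores to a JSON file."""
    if len(sequences) != len(scores):
        raise ValueError("sequences and scores must have the same length")
    payload = {"x_mut": list(sequences), "y_mut": [float(s) for s in scores]}
    with open(path, "w") as handle:
        json.dump(payload, handle)


def load_mave(path):
    """Read sequences and scores back from a JSON file."""
    with open(path) as handle:
        data = json.load(handle)
    return data["x_mut"], data["y_mut"]
```

In this arrangement, save_mave would run in the TensorFlow 1.x environment after the predictions are generated, and load_mave in the TensorFlow 2.x environment before the surrogate model is trained.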
Example
The following example requires an activated squid environment loaded with two additional dependencies:
$ pip install kipoi --upgrade
$ pip install kipoi-seq --upgrade
Next, copy the following code into a new script named squid_example.py and run it via python squid_example.py:
import squid
import logomaker
import mavenn
import kipoi
task_idx = 0  # DeepSTARR task index (0: Dev, 1: HK)
alphabet = ['A','C','G','T']
# define sequence of interest
seq = 'GGCTCTGTCTCAGTTTCTGATTCAGTTTCGGATCCACTTCGAGAGGCAGAAGTCGGGGTCCGAGAGGCATTAGCTTGTTAGTTCTACAACCTGCTGGCAAATGTGCCAATATGTTTGCACGCTGATAAGGCCTACATGGCACCGAATTGAAAACCGCTTACATAATGAAGTGAATAGTCAGCGAATCGGCAGAGCAACCGCAATGCATTGCATTCACCATCGCGAATAATCAGATTCAAGGCAACGATC'
# convert to one-hot
x = squid.utils.seq2oh(seq, alphabet)
# instantiate kipoi model for deepstarr
model = kipoi.get_model('DeepSTARR')
# define how to go from kipoi prediction to scalar
kipoi_predictor = squid.predictor.ScalarPredictor(model.predict_on_batch,
                                                  task_idx=task_idx, batch_size=512)
# set up mutagenizer class for in silico MAVE
mut_generator = squid.mutagenizer.RandomMutagenesis(mut_rate=0.1, uniform=False)
# generate in silico MAVE
seq_length = len(x)
mut_window = [0, seq_length]  # interval in sequence to mutagenize
num_sim = 20000  # number of sequences to simulate
mave = squid.mave.InSilicoMAVE(mut_generator, mut_predictor=kipoi_predictor,
                               seq_length=seq_length, mut_window=mut_window)
x_mut, y_mut = mave.generate(x, num_sim=num_sim)
# choose surrogate model type
gpmap = 'additive'
# MAVE-NN model with GE nonlinearity
surrogate_model = squid.surrogate_zoo.SurrogateMAVENN(x_mut.shape, num_tasks=y_mut.shape[1],
                                                      gpmap=gpmap, regression_type='GE',
                                                      linearity='nonlinear', noise='SkewedT',
                                                      noise_order=2, reg_strength=0.1,
                                                      alphabet=alphabet, deduplicate=True,
                                                      gpu=True)
# train surrogate model
surrogate, mave_df = surrogate_model.train(x_mut, y_mut, learning_rate=5e-4, epochs=500, batch_size=100,
                                           early_stopping=True, patience=25, restore_best_weights=True,
                                           save_dir=None, verbose=1)
# retrieve model parameters
params = surrogate_model.get_params(gauge='empirical')
# generate sequence logo
logo = surrogate_model.get_logo(mut_window=mut_window, full_length=seq_length)
# fix gauge for variant effect prediction
variant_effect = squid.utils.fix_gauge(logo, gauge='wildtype', wt=x_mut[0])
# save variant effects to pandas
variant_effect_df = squid.utils.arr2pd(variant_effect, alphabet)
# plot additive logo in wildtype gauge
fig = squid.impress.plot_additive_logo(variant_effect, center=False, view_window=mut_window,
                                       alphabet=alphabet, fig_size=[20,2.5], save_dir=None)
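To make the main steps of the script above more concrete, the following dependency-free toy sketch illustrates three of the ideas it relies on: one-hot encoding, random mutagenesis within a window, and shifting an additive parameter matrix into the wildtype gauge. It is not SQUID's implementation. All function names are hypothetical, the per-position independent mutation scheme is a simplification chosen here, and the parameter matrix is filled with random numbers rather than fitted values.

```python
import random

ALPHABET = ['A', 'C', 'G', 'T']


def one_hot(seq, alphabet=ALPHABET):
    """Encode a sequence as a list of rows, one row per position."""
    return [[1 if char == letter else 0 for letter in alphabet] for char in seq]


def mutate(seq, mut_rate, window, rng, alphabet=ALPHABET):
    """Mutate each position in [start, stop) independently with probability mut_rate."""
    start, stop = window
    out = list(seq)
    for i in range(start, stop):
        if rng.random() < mut_rate:
            # Always substitute a character different from the current one
            out[i] = rng.choice([c for c in alphabet if c != seq[i]])
    return ''.join(out)


def wildtype_gauge(theta, wt_seq, alphabet=ALPHABET):
    """Shift each row of an additive matrix so the wildtype entry is zero."""
    fixed = []
    for row, char in zip(theta, wt_seq):
        wt_value = row[alphabet.index(char)]
        fixed.append([value - wt_value for value in row])
    return fixed


rng = random.Random(0)
wt = 'GGCTCTGTCT'  # short toy sequence, not the DeepSTARR input above

# Wildtype first, followed by mutagenized variants
library = [wt] + [mutate(wt, 0.1, (0, len(wt)), rng) for _ in range(5)]

# Toy additive parameters (random placeholders, not a trained surrogate)
theta = [[rng.gauss(0, 1) for _ in ALPHABET] for _ in wt]
effects = wildtype_gauge(theta, wt)
```

In the wildtype gauge, every entry of effects is read relative to the wildtype character at that position, so the wildtype entries are exactly zero and the remaining entries can be interpreted as single-variant effects.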