squid.mave
Module Contents
Classes
InSilicoMAVE | Module for performing in silico MAVE.
Functions
random_shuffle | Creates random shuffles with equiprobability of characters at each position.
dinuc_shuffle | Creates shuffles of the given sequence, in which dinucleotide frequencies are preserved.
- class squid.mave.InSilicoMAVE(mut_generator, mut_predictor, seq_length, mut_window=None, context_agnostic=False, inter_window=None, save_window=None, alphabet=['A', 'C', 'G', 'T'])
Module for performing in silico MAVE.
- Parameters:
mut_generator (class) – Module for performing random mutagenesis.
mut_predictor (class) – Module for inferring model predictions.
seq_length (int) – Full length L of input sequence.
mut_window ([int, int]) – Index of start and stop position along sequence to probe; i.e., [start, stop], where start < stop and both entries satisfy 0 <= int <= L.
context_agnostic (bool) – Option for generating global neighborhoods, such that the sequence surrounding a conserved pattern of interest is randomly mutated across the in silico MAVE dataset.
inter_window ([int, int] or [[int, int], [int, int], ...]) – Start and stop indices of each inter-site window, where each window defines the boundaries of the sequence between two sites of interest (optional; used with ‘context_agnostic’).
save_window ([int <= mut_window[0], int >= mut_window[1]]) – Window delimiting the sequences that are exported in the ‘x_mut’ array; if used, the ‘save_window’ interval must be equal to or larger than ‘mut_window’, and if larger, ‘save_window’ must contain the interval ‘mut_window’ entirely.
alphabet (list) – The alphabet used to determine the C characters in the logo, such that each entry is a string; e.g., ['A', 'C', 'G', 'T'] for DNA.
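The window constraints stated in the parameter list can be checked before the class is constructed. The helper below is a minimal sketch: `check_windows` is a hypothetical name, not part of squid, and it encodes only the conditions written above (start < stop, both entries within [0, L], and ‘save_window’ containing ‘mut_window’).

```python
def check_windows(seq_length, mut_window, save_window=None):
    """Hypothetical helper (not part of squid): validate window bounds."""
    start, stop = mut_window
    # mut_window: start < stop, and both entries satisfy 0 <= int <= L
    if not (0 <= start < stop <= seq_length):
        raise ValueError(f"invalid mut_window {mut_window} for L={seq_length}")
    if save_window is not None:
        save_start, save_stop = save_window
        # save_window must contain mut_window entirely (equal bounds allowed)
        if not (save_start <= start and save_stop >= stop):
            raise ValueError(
                f"save_window {save_window} does not contain mut_window {mut_window}"
            )
    return True


# Example: a 20-position probe window exported with 5 extra positions per side
check_windows(seq_length=100, mut_window=[40, 60], save_window=[35, 65])
```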
- generate(x, num_sim, seed=None, verbose=1)
Randomly mutate segments in a set of one-hot DNA sequences.
- Parameters:
- Returns:
x_mut (numpy.ndarray) – Sequences simulated by mut_generator.
y_mut (numpy.ndarray) – Inferred predictions for sequences (shape: (N,1)).
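To make the shapes of the two return values concrete, here is a toy mutate-then-predict loop written with NumPy only. It is a sketch under stated assumptions, not squid's implementation: `toy_generate`, the `mut_rate` parameter, and the stand-in "predictor" (fraction of positions that still match the input) are all hypothetical.

```python
import numpy as np


def toy_generate(x, num_sim, mut_rate=0.1, seed=None):
    """Hypothetical stand-in for generate(); illustrates output shapes only."""
    rng = np.random.default_rng(seed)
    L, C = x.shape
    # Start from num_sim copies of the input sequence: shape (N, L, C)
    x_mut = np.repeat(x[np.newaxis], num_sim, axis=0)
    # Pick positions to resample in each copy
    mask = rng.random((num_sim, L)) < mut_rate
    # Draw a replacement character per position (may equal the original)
    new_chars = rng.integers(0, C, size=(num_sim, L))
    eye = np.eye(C, dtype=x.dtype)
    x_mut[mask] = eye[new_chars[mask]]
    # Toy predictor: fraction of positions identical to the input, shape (N, 1)
    y_mut = (x_mut * x).sum(axis=(1, 2)).reshape(-1, 1) / L
    return x_mut, y_mut


x = np.eye(4)[np.array([0, 1, 2, 3, 0, 1, 2, 3])]  # one-hot "ACGTACGT"
x_mut, y_mut = toy_generate(x, num_sim=5, seed=0)
print(x_mut.shape, y_mut.shape)
```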
- pad_seq(x_mut, x, start_position, stop_position, save_window)
Function to pad mutated sequences on both sides with the original unmutated context.
- Parameters:
x_mut (numpy.ndarray) – Sequences with randomly mutated segments with length l < L defined by l = stop_position - start_position (shape: (N,l,C)).
x (torch.Tensor) – Batch of one-hot sequences (shape: (L,C)).
start_position (int) – Index of start position along sequence to probe.
stop_position (int) – Index of stop position along sequence to probe.
- Returns:
Sequences with randomly mutated segments, padded to correct shape with the original unmutated context (shape: (N,L,C)).
- Return type:
numpy.ndarray
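The padding step described for pad_seq can be pictured as writing each mutated segment back into a copy of the original sequence. The sketch below assumes NumPy inputs and ignores ‘save_window’; `pad_with_context` is a hypothetical name, and this is not the library's code.

```python
import numpy as np


def pad_with_context(x_mut, x, start_position, stop_position):
    """Hypothetical sketch: embed (N, l, C) segments into copies of x (L, C)."""
    N = x_mut.shape[0]
    # N copies of the unmutated sequence: shape (N, L, C)
    x_padded = np.tile(x, (N, 1, 1))
    # Overwrite the probed window with the mutated segments
    x_padded[:, start_position:stop_position, :] = x_mut
    return x_padded


x = np.eye(4)[np.array([0, 1, 2, 3, 0, 1, 2, 3])]  # (L=8, C=4)
segments = np.zeros((3, 4, 4))
segments[:, :, 0] = 1                               # three all-'A' segments
padded = pad_with_context(segments, x, start_position=2, stop_position=6)
print(padded.shape)
```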
- pad_seq_random(x_mut, x, start_position, stop_position, dinuc=False, inter=False)
Function to pad mutated sequences on both sides with random DNA.
- Parameters:
x_mut (numpy.ndarray) – Sequences with randomly mutated segments with length l < L defined by l = stop_position - start_position (shape: (N,l,C)).
x (torch.Tensor) – Batch of one-hot sequences (shape: (L,C)).
start_position (int) – Index of start position along sequence to probe.
stop_position (int) – Index of stop position along sequence to probe.
dinuc (bool) – Perform mutagenesis by random shuffle (False) or dinucleotide shuffle (True).
inter (bool) – Pad sequence to the left and right of ‘mut_window’ (False) or within ‘inter_window’ (True).
- Returns:
Sequences with randomly mutated segments, padded to correct shape with random DNA (shape: (N,L,C)).
- Return type:
numpy.ndarray
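For the simplest case (dinuc=False, inter=False), padding with random DNA amounts to drawing a fresh random background per sequence and writing the mutated segment into the probed window. The sketch below is an assumption-laden illustration: `pad_with_random` is a hypothetical name, and it takes the full length directly instead of the original sequence `x`.

```python
import numpy as np


def pad_with_random(x_mut, seq_length, start_position, stop_position, seed=None):
    """Hypothetical sketch: surround (N, l, C) segments with uniform random DNA."""
    rng = np.random.default_rng(seed)
    N, _, C = x_mut.shape
    # Independent, uniformly random character index at every position
    idx = rng.integers(0, C, size=(N, seq_length))
    x_padded = np.eye(C, dtype=x_mut.dtype)[idx]  # one-hot, shape (N, L, C)
    # Keep the mutated segments inside the probed window
    x_padded[:, start_position:stop_position, :] = x_mut
    return x_padded


segments = np.zeros((3, 4, 4))
segments[:, :, 1] = 1  # three all-'C' segments
padded = pad_with_random(segments, seq_length=10, start_position=3,
                         stop_position=7, seed=0)
print(padded.shape)
```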
- squid.mave.random_shuffle(seq, alphabet=['A', 'C', 'G', 'T'], num_shufs=None, rng=None)
Creates random shuffles with equiprobability of characters at each position.
- Parameters:
seq (ndarray) – one-hot encoding of sequence
num_shufs (int) – the number of shuffles to create; if unspecified, only one shuffle will be created
- Returns:
ndarray of shuffled versions of ‘seq’ (shape=(N,L,D)), also one-hot encoded. If ‘num_shufs’ is not specified, then the first dimension of N will not be present (i.e., a single LxD array will be returned).
- Return type:
ndarray
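Reading "equiprobability of characters at each position" as an independent uniform draw per position, the behaviour and return shapes can be sketched as follows. `uniform_random_onehot` is a hypothetical name, and whether the library draws characters exactly this way is an assumption.

```python
import numpy as np


def uniform_random_onehot(seq, num_shufs=None, rng=None):
    """Hypothetical sketch: uniform random one-hot sequences shaped like seq."""
    rng = np.random.default_rng() if rng is None else rng
    L, D = seq.shape
    n = 1 if num_shufs is None else num_shufs
    # Each position gets one of D characters with probability 1/D
    idx = rng.integers(0, D, size=(n, L))
    out = np.eye(D, dtype=seq.dtype)[idx]  # shape (n, L, D)
    # Drop the leading dimension when num_shufs is not specified
    return out[0] if num_shufs is None else out


seq = np.eye(4)[np.array([0, 1, 2, 3, 0, 1])]
print(uniform_random_onehot(seq, rng=np.random.default_rng(0)).shape)
print(uniform_random_onehot(seq, num_shufs=5, rng=np.random.default_rng(0)).shape)
```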
- squid.mave.dinuc_shuffle(seq, num_shufs=None, rng=None)
Creates shuffles of the given sequence, in which dinucleotide frequencies are preserved.
- Parameters:
- Returns:
list (if ‘seq’ is string) – List of N strings of length L, each one being a shuffled version of ‘seq’.
ndarray (if ‘seq’ is ndarray) – ndarray of shuffled versions of ‘seq’ (shape=(N,L,D)), also one-hot encoded. If ‘num_shufs’ is not specified, then the first dimension of N will not be present (i.e., a single string will be returned, or an LxD array).
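One common way to preserve dinucleotide frequencies is to treat the sequence as a walk over characters and shuffle, for each character, the order of its successors while keeping the last successor fixed. The string-only sketch below illustrates that idea; `dinuc_shuffle_str` and `dinuc_counts` are hypothetical names, and it is an assumption that squid uses the same procedure.

```python
import random
from collections import Counter


def dinuc_counts(s):
    """Count overlapping dinucleotides in a string."""
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def dinuc_shuffle_str(seq, rng=None):
    """Hypothetical sketch: shuffle a string while preserving dinucleotide counts."""
    rng = random.Random() if rng is None else rng
    chars = sorted(set(seq))
    tokens = [chars.index(c) for c in seq]
    # For each character, the positions that directly follow its occurrences
    next_inds = [
        [i + 1 for i in range(len(seq) - 1) if tokens[i] == t]
        for t in range(len(chars))
    ]
    # Shuffle each successor list but keep its last entry in place, so the
    # walk below can always consume every transition
    for t, inds in enumerate(next_inds):
        if len(inds) > 1:
            head = inds[:-1]
            rng.shuffle(head)
            next_inds[t] = head + inds[-1:]
    # Walk from the first position, consuming successors in shuffled order
    counters = [0] * len(chars)
    ind = 0
    result = [tokens[0]]
    for _ in range(1, len(seq)):
        t = tokens[ind]
        ind = next_inds[t][counters[t]]
        counters[t] += 1
        result.append(tokens[ind])
    return "".join(chars[t] for t in result)


original = "ACGTACGGTTACGATCGA"
shuffled = dinuc_shuffle_str(original, rng=random.Random(0))
print(dinuc_counts(shuffled) == dinuc_counts(original))
```

The first character is always kept, and the dinucleotide counts of the output match those of the input, which distinguishes this from a plain position-wise shuffle.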