squid.mutagenizer
Module Contents
Classes
BaseMutagenesis | Base class for in silico MAVE data generation for a given sequence.
RandomMutagenesis | Module for performing random mutagenesis.
CombinatorialMutagenesis | Module for performing combinatorial mutagenesis.
TwoHotMutagenesis | Module to perform random mutagenesis using two-hot encoding.
Functions
Function to perform random mutagenesis.
Function to convert two-hot encoding to a DNA sequence.
Function to convert heterozygous DNA sequence to two-hot encoding.
- class squid.mutagenizer.BaseMutagenesis[source]
Base class for in silico MAVE data generation for a given sequence.
- abstract __call__(x, num_sim)[source]
Return an in silico MAVE based on mutagenesis of ‘x’.
- Parameters:
x (torch.Tensor) – one-hot sequence (shape: (L, A)).
num_sim (int) – Number of sequences to mutagenize.
- Returns:
Batch of one-hot sequences with random augmentation applied.
- Return type:
torch.Tensor
- class squid.mutagenizer.RandomMutagenesis(mut_rate, uniform=False)[source]
Bases: BaseMutagenesis
Module for performing random mutagenesis.
- Parameters:
- Returns:
Batch of one-hot sequences with random mutagenesis applied.
- Return type:
numpy.ndarray
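The following is a minimal NumPy sketch of what random mutagenesis of a one-hot sequence can look like. It is not squid's implementation. The function name `random_mutagenesis_sketch`, the `seed` argument, and the interpretation of `uniform` (True: every sequence receives the same number of mutations; False: the number is drawn from a Poisson distribution with mean `mut_rate * L`) are assumptions made for illustration, since this page does not document the parameters.

```python
import numpy as np


def random_mutagenesis_sketch(x, mut_rate, num_sim, uniform=False, seed=None):
    """Hypothetical helper, not part of squid: mutate a one-hot sequence x of shape (L, A)."""
    rng = np.random.default_rng(seed)
    L, A = x.shape
    wt_index = np.argmax(x, axis=1)  # wild-type character index at each position
    avg_muts = mut_rate * L

    # Assumed meaning of `uniform`: fixed count per sequence vs. Poisson-distributed count.
    if uniform:
        num_muts = np.full(num_sim, int(round(avg_muts)))
    else:
        num_muts = rng.poisson(avg_muts, size=num_sim)
    num_muts = np.clip(num_muts, 0, L)  # cannot mutate more positions than exist

    batch = np.zeros((num_sim, L, A), dtype=x.dtype)
    for i in range(num_sim):
        seq_index = wt_index.copy()
        positions = rng.choice(L, size=num_muts[i], replace=False)
        # A shift of 1..A-1 (mod A) always lands on a different character.
        shifts = rng.integers(1, A, size=num_muts[i])
        seq_index[positions] = (seq_index[positions] + shifts) % A
        batch[i, np.arange(L), seq_index] = 1
    return batch
```

Calling `random_mutagenesis_sketch(x, mut_rate=0.25, num_sim=5, uniform=True)` on an 8-position sequence returns an array of shape (5, 8, 4) in which each sequence differs from `x` at exactly two positions.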
- class squid.mutagenizer.CombinatorialMutagenesis[source]
Module for performing combinatorial mutagenesis.
- Returns:
Batch of one-hot sequences with combinatorial mutagenesis applied. The number of sequences produced is A^L, where A is the number of characters in the alphabet and L is the length of ‘mut_window’.
- Return type:
numpy.ndarray
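To make the A^L count concrete, here is a small sketch that enumerates every combination of characters inside a window. It is an illustration rather than squid's code: the function name `combinatorial_mutagenesis_sketch` and the representation of `mut_window` as a half-open `(start, stop)` pair are assumptions.

```python
import itertools

import numpy as np


def combinatorial_mutagenesis_sketch(x, mut_window=None):
    """Hypothetical helper, not part of squid: enumerate all variants of x (L, A) in a window."""
    L_full, A = x.shape
    # Assumption: mut_window is a half-open (start, stop) interval; default is the whole sequence.
    start, stop = (0, L_full) if mut_window is None else mut_window
    L = stop - start

    # Every length-L tuple of character indices: A**L rows in total.
    combos = np.array(list(itertools.product(range(A), repeat=L)), dtype=int)

    eye = np.eye(A, dtype=x.dtype)
    batch = np.tile(x, (A**L, 1, 1))       # copy the original sequence A**L times
    batch[:, start:stop, :] = eye[combos]  # overwrite the window with each combination
    return batch
```

With a four-letter alphabet and a window of length 2, the sketch returns 4^2 = 16 sequences. The count grows exponentially with window length, so a window of length 10 already yields more than a million sequences.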
- class squid.mutagenizer.TwoHotMutagenesis(mut_rate, uniform=False)[source]
Bases: BaseMutagenesis
Module to perform random mutagenesis using two-hot encoding. Each individual nucleotide at a given position is encoded using a one-hot scheme, and the unphased diploid sequence is represented as the sum of the two one-hot encoded nucleotides at each position. The sequence “AYCR”, for example, would be encoded as: [[2, 0, 0, 0], [0, 1, 0, 1], [0, 2, 0, 0], [1, 0, 1, 0]].
- Returns:
Batch of two-hot sequences with random mutagenesis applied, with alphabet: {A, C, G, T, R (A/G), Y (C/T), S (C/G), W (A/T), K (G/T), M (A/C)}, such that heterozygous positions are represented using the IUPAC ambiguity codes.
- Return type:
numpy.ndarray
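The two-hot scheme described above can be sketched in a few lines. The helpers `seq_to_twohot_sketch` and `twohot_to_seq_sketch` are hypothetical names introduced here for illustration and are not squid's own conversion functions; the column order A, C, G, T is assumed from the “AYCR” example.

```python
import numpy as np

# Each code maps to the two alleles it stands for (homozygous codes repeat the base).
IUPAC_TO_PAIR = {
    "A": "AA", "C": "CC", "G": "GG", "T": "TT",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
}


def seq_to_twohot_sketch(seq, alphabet="ACGT"):
    """Hypothetical helper: sum the one-hot vectors of both alleles at each position."""
    index = {char: i for i, char in enumerate(alphabet)}
    twohot = np.zeros((len(seq), len(alphabet)), dtype=int)
    for pos, code in enumerate(seq):
        for nucleotide in IUPAC_TO_PAIR[code]:
            twohot[pos, index[nucleotide]] += 1
    return twohot


def twohot_to_seq_sketch(twohot, alphabet="ACGT"):
    """Hypothetical helper: map each two-hot row back to its IUPAC code."""
    pair_to_code = {"".join(sorted(pair)): code for code, pair in IUPAC_TO_PAIR.items()}
    codes = []
    for row in twohot:
        alleles = "".join(alphabet[i] * int(count) for i, count in enumerate(row))
        codes.append(pair_to_code["".join(sorted(alleles))])
    return "".join(codes)
```

Encoding “AYCR” with this sketch reproduces the array given in the class description, and decoding that array returns “AYCR”. Because the alleles are summed, the encoding is unphased: it records which two nucleotides occur at a position but not which chromosome copy carries each one.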
- squid.mutagenizer.apply_mut_by_seq_index(x_index, shape, num_muts)[source]
Function to perform random mutagenesis.