:py:mod:`squid.mave`
====================

.. py:module:: squid.mave


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   squid.mave.InSilicoMAVE


Functions
~~~~~~~~~

.. autoapisummary::

   squid.mave.random_shuffle
   squid.mave.dinuc_shuffle


Attributes
~~~~~~~~~~

.. autoapisummary::

   squid.mave.seq_length


.. py:class:: InSilicoMAVE(mut_generator, mut_predictor, seq_length, mut_window=None, context_agnostic=False, inter_window=None, save_window=None, alphabet=['A', 'C', 'G', 'T'])

   Module for performing in silico MAVE.

   :param mut_generator: Module for performing random mutagenesis.
   :type mut_generator: class
   :param mut_predictor: Module for inferring model predictions.
   :type mut_predictor: class
   :param seq_length: Full length L of input sequence.
   :type seq_length: int
   :param mut_window: Index of start and stop position along sequence to probe;
                      i.e., [start, stop], where start < stop and both entries
                      satisfy 0 <= int <= L.
   :type mut_window: [int, int]
   :param context_agnostic: Option for generating global neighborhoods, such that
                            the sequence surrounding a conserved pattern of interest
                            is randomly mutated across the in silico MAVE dataset.
   :type context_agnostic: bool
   :param inter_window: Index of start and stop position of each inter-site window,
                        where each window defines the boundaries of the sequence in
                        between two sites of interest (optional, for 'context_agnostic').
   :type inter_window: [int, int] or [[int, int], [int, int], ...]
   :param save_window: Window used for delimiting sequences that are exported in the
                       'x_mut' array; if used, the 'save_window' interval must be equal
                       to or larger than 'mut_window', and if larger, 'save_window' must
                       contain the interval 'mut_window' entirely.
   :type save_window: [int <= mut_window[0], int >= mut_window[1]]
   :param alphabet: The alphabet used to determine the C characters in the logo,
                    such that each entry is a string; e.g., ['A','C','G','T'] for DNA.
   :type alphabet: list

   .. !! processed by numpydoc !!

   .. py:method:: generate(x, num_sim, seed=None, verbose=1)

      Randomly mutate segments in a set of one-hot DNA sequences.

      :param x: One-hot sequence (shape: (L,A)).
      :type x: torch.Tensor
      :param num_sim: Number of sequences to mutagenize for in silico MAVE.
      :type num_sim: int
      :param seed: Sets the random number seed.
      :type seed: int, optional

      :returns: * **x_mut** (*numpy.ndarray*) -- Sequences simulated by mut_predictor.
                * **y_mut** (*numpy.ndarray*) -- Inferred predictions for sequences (shape: (N,1)).

      .. !! processed by numpydoc !!

   .. py:method:: pad_seq(x_mut, x, start_position, stop_position, save_window)

      Function to pad mutated sequences on both sides with the original unmutated context.

      :param x_mut: Sequences with randomly mutated segments with length l < L
                    defined by l = stop_position - start_position (shape: (N,l,C)).
      :type x_mut: numpy.ndarray
      :param x: Batch of one-hot sequences (shape: (L,A)).
      :type x: torch.Tensor
      :param start_position: Index of start position along sequence to probe.
      :type start_position: int
      :param stop_position: Index of stop position along sequence to probe.
      :type stop_position: int

      :returns: Sequences with randomly mutated segments, padded to correct shape
                with the original unmutated context (shape: (N,L,C)).
      :rtype: numpy.ndarray

      .. !! processed by numpydoc !!

   .. py:method:: pad_seq_random(x_mut, x, start_position, stop_position, dinuc=False, inter=False)

      Function to pad mutated sequences on both sides with random DNA.

      :param x_mut: Sequences with randomly mutated segments with length l < L
                    defined by l = stop_position - start_position (shape: (N,l,C)).
      :type x_mut: numpy.ndarray
      :param x: Batch of one-hot sequences (shape: (L,A)).
      :type x: torch.Tensor
      :param start_position: Index of start position along sequence to probe.
      :type start_position: int
      :param stop_position: Index of stop position along sequence to probe.
      :type stop_position: int
      :param dinuc: Perform mutagenesis by random shuffle (False) or dinucleotide shuffle (True).
      :type dinuc: bool
      :param inter: Pad sequence to the left and right of 'mut_window' (False)
                    or within 'inter_window' (True).
      :type inter: bool

      :returns: Sequences with randomly mutated segments, padded to correct shape
                with random DNA (shape: (N,L,C)).
      :rtype: numpy.ndarray

      .. !! processed by numpydoc !!

   .. py:method:: delimit_range(x, start_position, stop_position)

      Function to delimit sequence to a specific region.

      :param x: Batch of one-hot sequences (shape: (L,A)).
      :type x: torch.Tensor
      :param start_position: Index of start position along sequence to probe.
      :type start_position: int
      :param stop_position: Index of stop position along sequence to probe.
      :type stop_position: int

      :returns: Delimited sequences with length l < L defined by
                l = stop_position - start_position (shape: (N,l,C)).
      :rtype: numpy.ndarray

      .. !! processed by numpydoc !!


.. py:function:: random_shuffle(seq, alphabet=['A', 'C', 'G', 'T'], num_shufs=None, rng=None)

   Creates random shuffles with equiprobability of characters at each position.

   :param seq: One-hot encoding of sequence.
   :type seq: ndarray
   :param num_shufs: The number of shuffles to create; if unspecified, only one
                     shuffle will be created.
   :type num_shufs: int

   :returns: ndarray of shuffled versions of 'seq' (shape=(N,L,D)), also one-hot encoded.
             If 'num_shufs' is not specified, then the first dimension of N will not
             be present (i.e. a single string will be returned, or an LxD array).
   :rtype: ndarray

   .. !! processed by numpydoc !!

.. py:function:: dinuc_shuffle(seq, num_shufs=None, rng=None)

   Creates shuffles of the given sequence, in which dinucleotide frequencies are preserved.

   :param seq: Either a string of length L, or an L x D NumPy array of one-hot encodings.
   :type seq: str or ndarray
   :param num_shufs: The number of shuffles to create, N; if unspecified, only one
                     shuffle will be created.
   :type num_shufs: int
   :param rng: A NumPy RandomState object, to use for performing shuffles.
   :type rng: numpy.random.RandomState

   :returns: * *list (if 'seq' is string)* -- List of N strings of length L, each one
               being a shuffled version of 'seq'.
             * *ndarray (if 'seq' is ndarray)* -- ndarray of shuffled versions of 'seq'
               (shape=(N,L,D)), also one-hot encoded.
               If 'num_shufs' is not specified, then the first dimension of N will not
               be present (i.e. a single string will be returned, or an LxD array).

   .. !! processed by numpydoc !!

.. py:data:: seq_length
   :value: 249
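
The window arithmetic described for ``delimit_range`` and ``pad_seq`` (a segment of length l = stop_position - start_position is cut out, mutated, and placed back into the unmutated flanks to restore length L) can be illustrated with a minimal, dependency-free sketch. This is not the ``squid`` implementation: the helper names ``one_hot``, ``delimit``, ``mutate_segment`` and ``pad_with_context`` are hypothetical, plain Python lists stand in for the NumPy arrays and tensors the module works with, and the mutation rule is an assumed stand-in for whatever ``mut_generator`` does.

```python
import random

ALPHABET = ['A', 'C', 'G', 'T']


def one_hot(seq, alphabet=ALPHABET):
    """Encode a string as L rows, each a one-hot list of length A."""
    return [[1 if ch == a else 0 for a in alphabet] for ch in seq]


def delimit(x, start, stop):
    """Keep rows start..stop-1, i.e. a segment of length l = stop - start."""
    return x[start:stop]


def mutate_segment(segment, mut_rate, rng):
    """Assumed stand-in for a mutation generator: each position is swapped
    to a different character with probability mut_rate."""
    mutated = []
    for row in segment:
        new_row = list(row)
        if rng.random() < mut_rate:
            current = row.index(1)
            choices = [i for i in range(len(row)) if i != current]
            new = rng.choice(choices)
            new_row = [1 if i == new else 0 for i in range(len(row))]
        mutated.append(new_row)
    return mutated


def pad_with_context(x_mut, x, start, stop):
    """Place each mutated segment back between the unmutated flanks of x."""
    return [x[:start] + seg + x[stop:] for seg in x_mut]


rng = random.Random(0)
x = one_hot("ACGTACGTAC")            # shape (L, A) = (10, 4)
start, stop = 3, 7                   # plays the role of mut_window = [3, 7]
segment = delimit(x, start, stop)    # shape (l, A) = (4, 4)
x_mut = [mutate_segment(segment, 0.5, rng) for _ in range(5)]  # (N, l, A)
x_full = pad_with_context(x_mut, x, start, stop)               # (N, L, A)
```

In this sketch every row of ``x_full`` outside positions 3 to 6 is identical to the corresponding row of ``x``, which is the property the ``pad_seq`` description relies on; ``pad_seq_random`` differs in that the flanks would be filled with random DNA instead.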