:py:mod:`squid.mave`
====================

.. py:module:: squid.mave


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   squid.mave.InSilicoMAVE


Functions
~~~~~~~~~

.. autoapisummary::

   squid.mave.random_shuffle
   squid.mave.dinuc_shuffle


Attributes
~~~~~~~~~~

.. autoapisummary::

   squid.mave.seq_length


.. py:class:: InSilicoMAVE(mut_generator, mut_predictor, seq_length, mut_window=None, context_agnostic=False, inter_window=None, save_window=None, alphabet=['A', 'C', 'G', 'T'])


   Module for performing in silico MAVE.

   :param mut_generator: Module for performing random mutagenesis.
   :type mut_generator: class
   :param mut_predictor: Module for inferring model predictions.
   :type mut_predictor: class
   :param seq_length: Full length L of input sequence.
   :type seq_length: int
   :param mut_window: Index of start and stop position along sequence to probe;
                      i.e., [start, stop], where start < stop and both entries
                      satisfy 0 <= int <= L.
   :type mut_window: [int, int]
   :param context_agnostic: Option for generating global neighborhoods, such that the
                            sequence surrounding a conserved pattern of interest is
                            randomly mutated across the in silico MAVE dataset.
   :type context_agnostic: bool
   :param inter_window: Index of start and stop position of each inter-site window,
                        where each window defines the boundaries of the sequence
                        in between two sites of interest (optional; used with 'context_agnostic').
   :type inter_window: [int, int] or [[int, int], [int, int], ...]
   :param save_window: Window used for delimiting sequences that are exported in the 'x_mut' array;
                       if used, the 'save_window' interval must be equal to or larger than 'mut_window',
                       and if larger, 'save_window' must contain the interval 'mut_window' entirely.
   :type save_window: [int <= mut_window[0], int >= mut_window[1]]
   :param alphabet: The alphabet used to determine the C characters in the logo such that
                    each entry is a string; e.g., ['A','C','G','T'] for DNA.
   :type alphabet: list
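
   The window constraints listed above can be checked before constructing the
   class. The following is a minimal sketch, not part of ``squid.mave``:
   ``check_windows`` is a hypothetical helper, and bounding ``save_window`` by
   ``[0, L]`` is an assumption.

   .. code-block:: python

      def check_windows(seq_length, mut_window, save_window=None):
          """Hypothetical helper: validate the documented window constraints."""
          start, stop = mut_window
          # mut_window = [start, stop] with start < stop, both within [0, L].
          if not (0 <= start < stop <= seq_length):
              raise ValueError("mut_window must satisfy 0 <= start < stop <= L")
          if save_window is not None:
              save_start, save_stop = save_window
              # save_window must contain mut_window entirely (assumed to lie in [0, L]).
              if not (0 <= save_start <= start and stop <= save_stop <= seq_length):
                  raise ValueError("save_window must contain mut_window entirely")
          return True

      check_windows(249, [100, 120], save_window=[90, 130])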


   ..
       !! processed by numpydoc !!
   .. py:method:: generate(x, num_sim, seed=None, verbose=1)

      
      Randomly mutate segments in a set of one-hot DNA sequences.

      :param x: One-hot sequence (shape: (L,A)).
      :type x: torch.Tensor
      :param num_sim: Number of sequences to mutagenize for in silico MAVE.
      :type num_sim: int
      :param seed: Sets the random number seed.
      :type seed: int, optional

      :returns: * **x_mut** (*numpy.ndarray*) -- Sequences simulated by mut_generator.
                * **y_mut** (*numpy.ndarray*) -- Inferred predictions for sequences (shape: (N,1)).
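
      The input ``x`` is a one-hot array of shape (L,A). As a minimal sketch
      of that layout, the hypothetical helper ``one_hot`` below (not part of
      ``squid.mave``) encodes a DNA string with NumPy. Converting the result
      to the tensor type the model expects is left to the caller.

      .. code-block:: python

         import numpy as np

         def one_hot(seq, alphabet=("A", "C", "G", "T")):
             """Hypothetical helper: encode a string as an (L, A) one-hot array."""
             index = {char: i for i, char in enumerate(alphabet)}
             x = np.zeros((len(seq), len(alphabet)), dtype=np.float32)
             # Set a single 1 per row, in the column of that row's character.
             x[np.arange(len(seq)), [index[c] for c in seq]] = 1.0
             return x

         x = one_hot("ACGTAC")
         print(x.shape)  # (6, 4)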


      ..
          !! processed by numpydoc !!

   .. py:method:: pad_seq(x_mut, x, start_position, stop_position, save_window)

      
      Function to pad mutated sequences on both sides with the original unmutated context.

      :param x_mut: Sequences with randomly mutated segments with length l < L
                    defined by l = stop_position - start_position (shape: (N,l,C)).
      :type x_mut: numpy.ndarray
      :param x: One-hot sequence (shape: (L,A)).
      :type x: torch.Tensor
      :param start_position: Index of start position along sequence to probe.
      :type start_position: int
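      :param stop_position: Index of stop position along sequence to probe.
      :type stop_position: int
      :param save_window: Window used for delimiting sequences that are exported in the 'x_mut' array
                          (see the 'save_window' class parameter).
      :type save_window: [int, int]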

      :returns: Sequences with randomly mutated segments, padded to correct shape
                with the original unmutated context (shape: (N,L,C)).
      :rtype: numpy.ndarray
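
      The padding step can be illustrated with plain NumPy. This is a minimal
      sketch under stated assumptions, not the method's actual implementation:
      ``pad_with_context`` is a hypothetical name, and the ``save_window``
      handling is omitted.

      .. code-block:: python

         import numpy as np

         def pad_with_context(x_mut, x, start_position, stop_position):
             """Hypothetical sketch: embed mutated windows in the original sequence."""
             x = np.asarray(x)
             num_seqs = x_mut.shape[0]
             # Repeat the unmutated (L, A) sequence once per mutated window.
             x_padded = np.tile(x[np.newaxis], (num_seqs, 1, 1))
             # Overwrite only the probed region with the mutated segments.
             x_padded[:, start_position:stop_position, :] = x_mut
             return x_padded

         rng = np.random.default_rng(0)
         x = np.eye(4)[rng.integers(0, 4, size=10)]           # (L, A) = (10, 4)
         x_mut = np.eye(4)[rng.integers(0, 4, size=(5, 4))]   # (N, l, A) = (5, 4, 4)
         x_full = pad_with_context(x_mut, x, 3, 7)            # (N, L, A) = (5, 10, 4)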


      ..
          !! processed by numpydoc !!

   .. py:method:: pad_seq_random(x_mut, x, start_position, stop_position, dinuc=False, inter=False)

      
      Function to pad mutated sequences on both sides with random DNA.

      :param x_mut: Sequences with randomly mutated segments with length l < L
                    defined by l = stop_position - start_position (shape: (N,l,C)).
      :type x_mut: numpy.ndarray
      :param x: One-hot sequence (shape: (L,A)).
      :type x: torch.Tensor
      :param start_position: Index of start position along sequence to probe.
      :type start_position: int
      :param stop_position: Index of stop position along sequence to probe.
      :type stop_position: int
      :param dinuc: Perform mutagenesis by random shuffle (False) or dinucleotide shuffle (True).
      :type dinuc: bool
      :param inter: Pad sequence to the left and right of 'mut_window' (False) or within 'inter_window' (True).
      :type inter: bool

      :returns: Sequences with randomly mutated segments, padded to correct shape
                with random DNA (shape: (N,L,C)).
      :rtype: numpy.ndarray
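
      A minimal sketch of random padding for the simplest case
      (``dinuc=False``, ``inter=False``) is shown here. It assumes the flanks
      are drawn uniformly over the alphabet and uses a NumPy ``Generator``;
      ``pad_with_random`` is a hypothetical name, not the method's actual
      implementation.

      .. code-block:: python

         import numpy as np

         def pad_with_random(x_mut, seq_length, start_position, stop_position, rng=None):
             """Hypothetical sketch: embed mutated windows in uniformly random DNA."""
             rng = np.random.default_rng(rng)
             num_seqs, _, alphabet_size = x_mut.shape
             # Draw a random character index for every position of every sequence.
             idx = rng.integers(0, alphabet_size, size=(num_seqs, seq_length))
             x_padded = np.eye(alphabet_size)[idx]
             # Restore the mutated window so only the flanks remain random.
             x_padded[:, start_position:stop_position, :] = x_mut
             return x_padded

         x_mut = np.eye(4)[np.random.default_rng(1).integers(0, 4, size=(5, 4))]
         x_full = pad_with_random(x_mut, seq_length=10, start_position=3,
                                  stop_position=7, rng=0)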


      ..
          !! processed by numpydoc !!

   .. py:method:: delimit_range(x, start_position, stop_position)

      
      Function to delimit sequence to a specific region.

      :param x: One-hot sequence (shape: (L,A)).
      :type x: torch.Tensor
      :param start_position: Index of start position along sequence to probe.
      :type start_position: int
      :param stop_position: Index of stop position along sequence to probe.
      :type stop_position: int

      :returns: Delimited sequences with length l < L defined by
                l = stop_position - start_position (shape: (N,l,C)).
      :rtype: numpy.ndarray
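
      Delimiting amounts to slicing along the position axis. The sketch below
      assumes the position axis is second to last, so it applies to both a
      single (L,A) sequence and an (N,L,A) batch; ``delimit`` is a
      hypothetical name.

      .. code-block:: python

         import numpy as np

         def delimit(x, start_position, stop_position):
             """Hypothetical sketch: keep positions start_position..stop_position-1."""
             x = np.asarray(x)
             # The ellipsis leaves any leading batch axis untouched.
             return x[..., start_position:stop_position, :]

         single = np.zeros((10, 4))
         batch = np.zeros((5, 10, 4))
         print(delimit(single, 3, 7).shape)  # (4, 4)
         print(delimit(batch, 3, 7).shape)   # (5, 4, 4)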


      ..
          !! processed by numpydoc !!


.. py:function:: random_shuffle(seq, alphabet=['A', 'C', 'G', 'T'], num_shufs=None, rng=None)

   
   Creates random shuffles in which every character is equally probable at each position.

   :param seq: one-hot encoding of sequence
   :type seq: ndarray
   :param num_shufs: the number of shuffles to create; if unspecified, only one shuffle will be created
   :type num_shufs: int

   :returns: ndarray of shuffled versions of 'seq' (shape=(N,L,D)), also one-hot encoded.
             If 'num_shufs' is not specified, then the first dimension of N will not be present
             (i.e., a single L x D array will be returned).
   :rtype: ndarray
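
   A minimal sketch of equiprobable sampling with the documented return
   shapes follows. It is not the function's actual implementation:
   ``random_onehot`` is a hypothetical name, and it uses a NumPy
   ``Generator`` for the random draws.

   .. code-block:: python

      import numpy as np

      def random_onehot(seq, num_shufs=None, rng=None):
          """Hypothetical sketch: one-hot sequences with uniform characters."""
          rng = np.random.default_rng(rng)
          length, depth = seq.shape
          n = 1 if num_shufs is None else num_shufs
          # Every position draws its character index uniformly from 0..depth-1.
          idx = rng.integers(0, depth, size=(n, length))
          out = np.eye(depth, dtype=seq.dtype)[idx]
          # Without num_shufs, drop the leading N axis and return an (L, D) array.
          return out[0] if num_shufs is None else out

      seq = np.eye(4)[[0, 1, 2, 3, 0, 1]]
      print(random_onehot(seq, rng=0).shape)               # (6, 4)
      print(random_onehot(seq, num_shufs=3, rng=0).shape)  # (3, 6, 4)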


   ..
       !! processed by numpydoc !!

.. py:function:: dinuc_shuffle(seq, num_shufs=None, rng=None)

   
   Creates shuffles of the given sequence, in which dinucleotide frequencies
   are preserved.

   :param seq: either a string of length L, or an L x D NumPy array of one-hot encodings
   :type seq: str or ndarray
   :param num_shufs: the number of shuffles to create, N; if unspecified, only one shuffle will be created
   :type num_shufs: int
   :param rng: a NumPy RandomState object, to use for performing shuffles
   :type rng: numpy.random.RandomState

   :returns: * *list (if 'seq' is string)* -- List of N strings of length L, each one being a shuffled version of 'seq'
             * *ndarray (if 'seq' is ndarray)* -- ndarray of shuffled versions of 'seq' (shape=(N,L,D)), also one-hot encoded.
               If 'num_shufs' is not specified, then the first dimension of N will not be present
               (i.e. a single string will be returned, or an LxD array).
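
   The property this function preserves can be checked by counting
   overlapping dinucleotides. The sketch below does not perform the shuffle
   itself: ``dinuc_counts`` is a hypothetical helper, and the example
   strings are hand-picked for illustration.

   .. code-block:: python

      from collections import Counter

      def dinuc_counts(seq):
          """Hypothetical helper: count overlapping dinucleotides in a string."""
          return Counter(seq[i:i + 2] for i in range(len(seq) - 1))

      original = "ACAGA"
      dinuc_preserving = "AGACA"  # same dinucleotides: AC, CA, AG, GA
      mono_only = "AACGA"         # same letters, but different dinucleotides

      print(dinuc_counts(original) == dinuc_counts(dinuc_preserving))  # True
      print(dinuc_counts(original) == dinuc_counts(mono_only))         # False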


   ..
       !! processed by numpydoc !!

.. py:data:: seq_length
   :value: 249