:py:mod:`squid.utils`
=====================

.. py:module:: squid.utils


Module Contents
---------------


Functions
~~~~~~~~~

.. autoapisummary::

   squid.utils.arr2pd
   squid.utils.oh2seq
   squid.utils.seq2oh
   squid.utils.fix_gauge


.. py:function:: arr2pd(x, alphabet=['A', 'C', 'G', 'T'])

   
   Function to convert a NumPy array to a pandas DataFrame whose columns are labeled by the alphabet characters.

   :param x: One-hot encoding or attribution map (shape : (L,C)).
   :type x: numpy.ndarray
   :param alphabet: Alphabet of C characters used as the column headings, where
                    each entry is a string; e.g., ['A','C','G','T'] for DNA.
   :type alphabet: list

   :returns: **x** -- DataFrame corresponding to the input array.
   :rtype: pandas.DataFrame
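
   The conversion described above can be sketched directly with pandas. This is
   an illustrative stand-in, not the implementation of ``arr2pd``; the toy array
   and the default integer row index (one row per position) are assumptions.

   .. code-block:: python

      import numpy as np
      import pandas as pd

      alphabet = ['A', 'C', 'G', 'T']

      # Toy attribution map for a length-3 sequence (shape: (L, C) = (3, 4)).
      x = np.array([[0.1, -0.2, 0.0, 0.3],
                    [0.5, 0.0, -0.1, 0.2],
                    [-0.3, 0.4, 0.1, 0.0]])

      # Rows are positions 0..L-1; columns are labeled by the alphabet.
      df = pd.DataFrame(x, columns=alphabet)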


   ..
       !! processed by numpydoc !!

.. py:function:: oh2seq(one_hot, alphabet=['A', 'C', 'G', 'T'])

   
   Function to convert one-hot encoding to a sequence.

   :param one_hot: Input one-hot encoding of sequence (shape : (L,C)).
   :type one_hot: numpy.ndarray
   :param alphabet: Alphabet defining the C characters of the encoding, where
                    each entry is a string; e.g., ['A','C','G','T'] for DNA.
   :type alphabet: list

   :returns: **seq** -- Sequence of length L decoded from the one-hot encoding.
   :rtype: str
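
   A minimal sketch of this kind of decoding, assuming each row of the input
   contains exactly one nonzero entry and that column order follows
   ``alphabet``. It is not the implementation of ``oh2seq``; the toy input is
   made up for illustration.

   .. code-block:: python

      import numpy as np

      alphabet = ['A', 'C', 'G', 'T']

      # One-hot encoding of a length-3 sequence (shape: (L, C) = (3, 4)).
      one_hot = np.array([[1, 0, 0, 0],
                          [0, 0, 1, 0],
                          [0, 0, 0, 1]])

      # The column index of the largest entry in each row selects the character.
      # argmax returns column 0 for an all-zero row, so this sketch assumes a
      # valid one-hot input.
      seq = ''.join(alphabet[i] for i in one_hot.argmax(axis=1))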


   ..
       !! processed by numpydoc !!

.. py:function:: seq2oh(seq, alphabet=['A', 'C', 'G', 'T'])

   
   Function to convert a sequence to one-hot encoding.

   :param seq: Input sequence with length L.
   :type seq: str
   :param alphabet: Alphabet defining the C characters of the encoding, where
                    each entry is a string; e.g., ['A','C','G','T'] for DNA.
   :type alphabet: list

   :returns: **one_hot** -- One-hot encoding corresponding to input sequence (shape : (L,C)).
   :rtype: numpy.ndarray
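
   A minimal sketch of this kind of encoding, assuming every character of the
   sequence appears in ``alphabet`` and that columns follow the order of
   ``alphabet``. It is not the implementation of ``seq2oh``; the helper mapping
   ``char_to_col`` is a hypothetical name used only here.

   .. code-block:: python

      import numpy as np

      alphabet = ['A', 'C', 'G', 'T']
      seq = 'AGT'

      # Map each character to its column index in the encoding.
      char_to_col = {char: col for col, char in enumerate(alphabet)}

      # Start from zeros and set one entry per position (shape: (L, C)).
      one_hot = np.zeros((len(seq), len(alphabet)))
      for pos, char in enumerate(seq):
          one_hot[pos, char_to_col[char]] = 1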


   ..
       !! processed by numpydoc !!

.. py:function:: fix_gauge(x, gauge, wt=None, r=0.1)

   
   Function to fix the gauge for an attribution matrix.

   :param x: Attribution scores for a sequence-of-interest (shape : (L,C)).
   :type x: numpy.ndarray
   :param gauge: Gauge mode used to fix model parameters.
                 See https://mavenn.readthedocs.io/en/latest/math.html for more info.

                 - ``'uniform'``: hierarchical gauge using a uniform sequence distribution over
                   the characters at each position observed in the training set
                   (unobserved characters are assigned probability 0).
                 - ``'empirical'``: uses an empirical distribution computed from the training data.
                 - ``'consensus'``: wild-type gauge using the training data consensus sequence.
                 - ``'default'``: default gauge (no change).

   :type gauge: str
   :param wt: Wild-type sequence (one-hot encoding) for the 'wildtype' or 'empirical' gauge (shape : (L,C)).
   :type wt: numpy.ndarray
   :param r: For the 'empirical' gauge, the probability of mutation used during generation of
             the in silico MAVE dataset (should match the user-defined 'mut_rate').
   :type r: float

   :returns: **OH** -- Gauge-fixed attribution matrix for the input sequence (shape : (L,C)).
   :rtype: numpy.ndarray
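
   To illustrate what gauge fixing does to an (L,C) matrix, the sketch below
   shows two common choices for additive models: centering each position to
   zero mean across characters, and shifting each position so that the
   wild-type character scores zero. The helper names ``mean_center`` and
   ``wildtype_reference`` are hypothetical, all C characters are assumed to be
   observed at every position, and no claim is made about how ``fix_gauge``
   maps its ``gauge`` strings onto these operations.

   .. code-block:: python

      import numpy as np

      def mean_center(x):
          """Subtract the per-position mean so each row sums to zero."""
          return x - x.mean(axis=1, keepdims=True)

      def wildtype_reference(x, wt):
          """Subtract the wild-type score at each position (wt is one-hot)."""
          wt_scores = (x * wt).sum(axis=1, keepdims=True)
          return x - wt_scores

      # Toy attribution matrix and wild-type one-hot (shape: (L, C) = (2, 4)).
      x = np.array([[1.0, 2.0, 3.0, 6.0],
                    [0.0, 4.0, 0.0, 0.0]])
      wt = np.array([[1, 0, 0, 0],
                     [0, 1, 0, 0]])

      x_centered = mean_center(x)
      x_wt = wildtype_reference(x, wt)

   Both transformations add a constant to each row, so differences between
   characters at a given position are unchanged.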


   ..
       !! processed by numpydoc !!