:py:mod:`squid.utils`
=====================

.. py:module:: squid.utils


Module Contents
---------------


Functions
~~~~~~~~~

.. autoapisummary::

   squid.utils.arr2pd
   squid.utils.oh2seq
   squid.utils.seq2oh
   squid.utils.fix_gauge



.. py:function:: arr2pd(x, alphabet=['A', 'C', 'G', 'T'])

   Convert a NumPy array to a pandas DataFrame with proper column headings.

   :param x: One-hot encoding or attribution map (shape : (L,C)).
   :type x: numpy.ndarray
   :param alphabet: The alphabet of C characters, given as a list of strings; e.g., ['A','C','G','T'] for DNA.
   :type alphabet: list

   :returns: **x** -- DataFrame corresponding to the input array.
   :rtype: pandas.DataFrame

   .. !! processed by numpydoc !!

.. py:function:: oh2seq(one_hot, alphabet=['A', 'C', 'G', 'T'])

   Convert a one-hot encoding to a sequence.

   :param one_hot: Input one-hot encoding of the sequence (shape : (L,C)).
   :type one_hot: numpy.ndarray
   :param alphabet: The alphabet of C characters, given as a list of strings; e.g., ['A','C','G','T'] for DNA.
   :type alphabet: list

   :returns: **seq** -- Sequence of length L decoded from the input.
   :rtype: str

   .. !! processed by numpydoc !!

.. py:function:: seq2oh(seq, alphabet=['A', 'C', 'G', 'T'])

   Convert a sequence to a one-hot encoding.

   :param seq: Input sequence of length L.
   :type seq: str
   :param alphabet: The alphabet of C characters, given as a list of strings; e.g., ['A','C','G','T'] for DNA.
   :type alphabet: list

   :returns: **one_hot** -- One-hot encoding corresponding to the input sequence (shape : (L,C)).
   :rtype: numpy.ndarray

   .. !! processed by numpydoc !!

.. py:function:: fix_gauge(x, gauge, wt=None, r=0.1)

   Fix the gauge for an attribution matrix.

   :param x: Attribution scores for a sequence-of-interest (shape : (L,C)).
   :type x: numpy.ndarray
   :param gauge: Gauge mode used to fix model parameters.
       See https://mavenn.readthedocs.io/en/latest/math.html for more info.

       * ``'uniform'`` : hierarchical gauge using a uniform sequence distribution over the characters at each position observed in the training set (unobserved characters are assigned probability 0).
       * ``'empirical'`` : uses an empirical distribution computed from the training data.
       * ``'consensus'`` : wild-type gauge using the training data consensus sequence.
       * ``'default'`` : default gauge (no change).
   :type gauge: str
   :param wt: Wild-type sequence (one-hot encoding) for the 'wildtype' or 'empirical' gauge (shape : (L,C)).
   :type wt: numpy.ndarray
   :param r: For the 'empirical' gauge, the probability of mutation used when generating the in silico MAVE dataset (should match the user-defined 'mut_rate').
   :type r: float

   :returns: **OH** -- Gauge-fixed attribution matrix corresponding to the input ``x`` (shape : (L,C)).
   :rtype: numpy.ndarray

   .. !! processed by numpydoc !!
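
The (L,C) conventions above can be illustrated with a minimal NumPy sketch. It does not call ``squid.utils``; every function name ending in ``_sketch`` is a hypothetical stand-in written for illustration, and the two gauge helpers only approximate the idea behind the 'uniform' and wild-type gauges for an additive (L,C) matrix, assuming every character is observed at every position. The actual behavior of ``fix_gauge`` may differ.

```python
import numpy as np

ALPHABET = ['A', 'C', 'G', 'T']


def seq2oh_sketch(seq, alphabet=ALPHABET):
    """Hypothetical stand-in for seq2oh: string of length L -> (L, C) array."""
    index = {char: i for i, char in enumerate(alphabet)}
    one_hot = np.zeros((len(seq), len(alphabet)))
    for pos, char in enumerate(seq):
        one_hot[pos, index[char]] = 1.0
    return one_hot


def oh2seq_sketch(one_hot, alphabet=ALPHABET):
    """Hypothetical stand-in for oh2seq: (L, C) array -> string of length L."""
    return ''.join(alphabet[i] for i in np.argmax(one_hot, axis=1))


def uniform_gauge_sketch(x):
    """Assumed form of a uniform gauge: subtract each position's mean over
    characters, so every row of the result sums to zero."""
    return x - x.mean(axis=1, keepdims=True)


def wildtype_gauge_sketch(x, wt):
    """Assumed form of a wild-type gauge: subtract the wild-type character's
    score at each position, so wild-type entries become zero."""
    wt_scores = (x * wt).sum(axis=1, keepdims=True)
    return x - wt_scores


# Round trip between a sequence and its (L, C) one-hot encoding.
one_hot = seq2oh_sketch('ACGT')
decoded = oh2seq_sketch(one_hot)

# Toy attribution scores with shape (L, C) = (4, 4).
scores = np.arange(16, dtype=float).reshape(4, 4)
uniform = uniform_gauge_sketch(scores)
wildtype = wildtype_gauge_sketch(scores, one_hot)
```

In this sketch, rows index sequence positions and columns follow the order of ``alphabet``, which is the same layout the functions above document as (L,C).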