got.taxonomies package¶
Submodules¶
got.taxonomies.ete3_functions module¶
Functions for dealing with ete3 for taxonomy representations
-
got.taxonomies.ete3_functions.make_ete3_lifted(taxonomy_tree: Union[got.taxonomies.taxonomy.Node, got.taxonomies.taxonomy.Taxonomy], print_all: bool = True) → str[source]¶ Returns the ete3 representation of a taxonomy tree after the lifting procedure has completed
Parameters: - taxonomy_tree (Union[Node, Taxonomy]) – the root of the taxonomy tree / sub-tree or taxonomy
- print_all (bool, default=True)
Returns: resulting ete3 representation
Return type: str
-
got.taxonomies.ete3_functions.make_ete3_raw(taxonomy_tree: Union[got.taxonomies.taxonomy.Node, got.taxonomies.taxonomy.Taxonomy]) → str[source]¶ Returns the ete3 representation of a taxonomy tree for the raw taxonomy
Parameters: taxonomy_tree (Union[Node, Taxonomy]) – the root of the taxonomy tree / sub-tree or taxonomy
Returns: resulting ete3 representation
Return type: str
-
got.taxonomies.ete3_functions.save_ete3(ete3_desc: str, filename: str = 'taxonomy_tree_lifted.ete') → None[source]¶ Writes the resulting ete3 representation to a file
Parameters: - ete3_desc (str) – ete3 representation in a string
- filename (str, default="taxonomy_tree_lifted.ete") – name of the file for writing
Returns: Return type: None
got.taxonomies.pargenfs module¶
ParGenFS algorithm with accessory functions
-
got.taxonomies.pargenfs.annotate_with_sum(node: got.taxonomies.taxonomy.Node, cluster: Dict[str, float]) → float[source]¶ Annotates a tree with the cluster weights
Parameters: - node (Node) – the root of the taxonomy tree / sub-tree
- cluster (Dict[str, float]) – the cluster
Returns: a not-normalized sum of squared weights
Return type: float
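The annotation step can be illustrated with a minimal sketch. `SketchNode` and `annotate_with_sum_sketch` are hypothetical stand-ins written for this example, not the library's `Node` or its implementation; storing the weight in `score` and giving weight 0 to leaves that are missing from the cluster are both assumptions.

```python
from typing import Dict, List, Optional


class SketchNode:
    """Hypothetical minimal stand-in for a taxonomy node (not got's Node)."""

    def __init__(self, name: str, children: Optional[List["SketchNode"]] = None):
        self.name = name
        self.children = children or []
        self.score = 0.0  # non-normalized membership value


def annotate_with_sum_sketch(node: SketchNode, cluster: Dict[str, float]) -> float:
    """Write cluster weights into the leaves; return the sum of squared weights."""
    if not node.children:
        # Assumption: a leaf that is absent from the cluster gets zero weight.
        node.score = cluster.get(node.name, 0.0)
        return node.score ** 2
    return sum(annotate_with_sum_sketch(child, cluster) for child in node.children)


leaf_a, leaf_b, leaf_c = SketchNode("a"), SketchNode("b"), SketchNode("c")
root = SketchNode("root", [SketchNode("inner", [leaf_a, leaf_b]), leaf_c])
total = annotate_with_sum_sketch(root, {"a": 3.0, "b": 4.0})
```

Here `total` is the un-normalized sum of squares over the leaves, which a later normalization step could consume.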
-
got.taxonomies.pargenfs.enumerate_tree_layers(node: got.taxonomies.taxonomy.Node, current_layer: int = 0) → None[source]¶ Assigns the corresponding layer number to every node of the taxonomy
Parameters: - node (Node) – the root of the taxonomy tree / sub-tree
- current_layer (int, default=0) – a layer number (nodes’ level) to assign
Returns: Return type: None
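A recursive layer assignment of this kind might look like the sketch below. The `SketchNode` class and the attribute name `layer` are hypothetical; the source does not say under which attribute the library stores the layer number.

```python
from typing import List, Optional


class SketchNode:
    """Hypothetical minimal stand-in for a taxonomy node (not got's Node)."""

    def __init__(self, name: str, children: Optional[List["SketchNode"]] = None):
        self.name = name
        self.children = children or []
        self.layer: Optional[int] = None  # assumed attribute name


def enumerate_layers_sketch(node: SketchNode, current_layer: int = 0) -> None:
    """Label the node with current_layer and its descendants with deeper layers."""
    node.layer = current_layer
    for child in node.children:
        enumerate_layers_sketch(child, current_layer + 1)


leaf = SketchNode("leaf")
inner = SketchNode("inner", [leaf])
root = SketchNode("root", [inner])
enumerate_layers_sketch(root)
```

After the call the root sits on layer 0 and each child is one layer below its parent.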
-
got.taxonomies.pargenfs.get_cluster_k(tree_leaves: List[got.taxonomies.taxonomy.Node], node_names: List[str], membership_matrix: List[List[float]], k: int) → Dict[str, float][source]¶ Returns the membership vector corresponding to the k-th cluster
Parameters: - tree_leaves (List[Node]) – all the leaves of the taxonomy
- node_names (List[str]) – string names of nodes
- membership_matrix (List[List[float]]) – membership matrix, size: (number_of_clusters x number_of_node_names)
- k (int) – index of a cluster
Returns: membership dictionary corresponding to the k-th cluster
Return type: Dict[str, float]
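The sketch below shows one plausible way to turn row k of the membership matrix into a dictionary. It is not the library's code: `get_cluster_k_sketch` is a hypothetical name, leaves are passed as plain name strings rather than `Node` objects, and dropping names that are not taxonomy leaves is an assumption about how `tree_leaves` is used.

```python
from typing import Dict, List


def get_cluster_k_sketch(
    leaf_names: List[str],
    node_names: List[str],
    membership_matrix: List[List[float]],
    k: int,
) -> Dict[str, float]:
    """Pair row k of the matrix with node_names, keeping only taxonomy leaves."""
    row = membership_matrix[k]  # one membership value per entry of node_names
    allowed = set(leaf_names)
    # Assumption: names that are not leaves of the taxonomy are ignored.
    return {name: value for name, value in zip(node_names, row) if name in allowed}


# Two clusters over three names; "x" is not a leaf of the taxonomy.
matrix = [[0.9, 0.1, 0.0], [0.2, 0.7, 0.5]]
cluster = get_cluster_k_sketch(["a", "b"], ["a", "b", "x"], matrix, k=1)
```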
-
got.taxonomies.pargenfs.indicate_offshoots(node: got.taxonomies.taxonomy.Node) → None[source]¶ Indicates all the offshoots in the tree / sub-tree
Parameters: node (Node) – the root of the taxonomy tree / sub-tree Returns: Return type: None
-
got.taxonomies.pargenfs.make_init_step(node: got.taxonomies.taxonomy.Node, gamma_v: float) → None[source]¶ Init step of ParGenFS algorithm
Parameters: - node (Node) – the root of the taxonomy tree / sub-tree
- gamma_v (float) – gamma value
Returns: Return type: None
-
got.taxonomies.pargenfs.make_recursive_step(node: got.taxonomies.taxonomy.Node, gamma_v: float, lambda_v: float) → None[source]¶ Recursive step of ParGenFS algorithm
Parameters: - node (Node) – the root of the taxonomy tree / sub-tree
- gamma_v (float) – gamma value
- lambda_v (float) – lambda value
Returns: Return type: None
-
got.taxonomies.pargenfs.make_result_table(node: got.taxonomies.taxonomy.Node) → List[List[str]][source]¶ Makes the resulting table for the tree / sub-tree
Parameters: node (Node) – the root of the taxonomy tree / sub-tree Returns: resulting table for printing / saving in a file Return type: List[List[str]]
-
got.taxonomies.pargenfs.normalize_and_return_leaf_weights(node: got.taxonomies.taxonomy.Node, summ: float) → List[List[Union[str, float]]][source]¶ Normalizes leaves’ weights (annotations)
Parameters: - node (Node) – the root of the taxonomy tree / sub-tree
- summ (float) – sum of weights value
Returns: a list of weights normalized
Return type: List[List[Union[str, float]]]
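One way to read this normalization is sketched below, under two assumptions that the source does not confirm: `summ` is the sum of squared leaf weights (as returned by the annotation step), and each weight is divided by its square root so that the squared normalized weights sum to 1. `SketchNode` and `normalize_leaves_sketch` are hypothetical names.

```python
import math
from typing import List, Optional, Union


class SketchNode:
    """Hypothetical minimal stand-in for a taxonomy node (not got's Node)."""

    def __init__(
        self,
        name: str,
        children: Optional[List["SketchNode"]] = None,
        score: float = 0.0,
    ):
        self.name = name
        self.children = children or []
        self.score = score  # non-normalized membership value
        self.u = 0.0        # normalized membership value


def normalize_leaves_sketch(
    node: SketchNode, summ: float
) -> List[List[Union[str, float]]]:
    """Normalize leaf weights; assumes summ is a positive sum of squared scores."""
    if not node.children:
        node.u = node.score / math.sqrt(summ)
        return [[node.name, node.u]]
    weights: List[List[Union[str, float]]] = []
    for child in node.children:
        weights.extend(normalize_leaves_sketch(child, summ))
    return weights


root = SketchNode("root", [SketchNode("a", score=3.0), SketchNode("b", score=4.0)])
weights = normalize_leaves_sketch(root, summ=25.0)
```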
-
got.taxonomies.pargenfs.pargenfs(cluster: Dict[str, float], taxonomy_tree: got.taxonomies.taxonomy.Taxonomy, gamma_v: float = 0.2, lambda_v: float = 0.2) → None[source]¶ Runs ParGenFS algorithm over a taxonomy tree
Parameters: - cluster (Dict[str, float]) – the cluster to generalize
- taxonomy_tree (Taxonomy) – the taxonomy tree
- gamma_v (float, default=.2) – gamma penalty value
- lambda_v (float, default=.2) – lambda penalty value
Returns: Return type: None
-
got.taxonomies.pargenfs.prune_tree(node: got.taxonomies.taxonomy.Node) → None[source]¶ Prunes the tree / sub-tree
Parameters: node (Node) – the root of the taxonomy tree / sub-tree Returns: Return type: None
-
got.taxonomies.pargenfs.reduce_edges(node: got.taxonomies.taxonomy.Node) → None[source]¶ Reduces tree edges for the tree / sub-tree
Parameters: node (Node) – the root of the taxonomy tree / sub-tree Returns: Return type: None
-
got.taxonomies.pargenfs.run(taxonomy_file: str, taxonomy_leaves: str, clusters: str, cluster_number: int) → None[source]¶ Obtains cluster and runs ParGenFS algorithm over a taxonomy tree
Parameters: Returns: Return type: None
-
got.taxonomies.pargenfs.save_result_table(result_table: List[List[str]], filename: str = 'table.csv') → None[source]¶ Writes the resulting table to a file
Parameters: - result_table (List[List[str]]) – table for saving
- filename (str, default="table.csv") – name of the file for writing
Returns: Return type: None
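Saving a table of this shape can be sketched with the standard `csv` module. The function name `save_result_table_sketch`, the comma delimiter, and the sample rows are assumptions made for illustration; the source does not state the file layout the library uses.

```python
import csv
import os
import tempfile
from typing import List


def save_result_table_sketch(
    result_table: List[List[str]], filename: str = "table.csv"
) -> None:
    """Write each inner list as one CSV row (comma delimiter is an assumption)."""
    with open(filename, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(result_table)


# Demo in a temporary directory so that no file is left behind.
table = [["index", "name", "u"], ["1.1", "machine learning", "0.6"]]
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "table.csv")
    save_result_table_sketch(table, path)
    with open(path, newline="", encoding="utf-8") as handle:
        rows = list(csv.reader(handle))
```

Reading the file back with `csv.reader` gives the same rows that were written.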
-
got.taxonomies.pargenfs.set_gaps_for_tree(node: got.taxonomies.taxonomy.Node) → None[source]¶ Sets gaps for the tree / sub-tree
Parameters: node (Node) – the root of the taxonomy tree / sub-tree Returns: Return type: None
-
got.taxonomies.pargenfs.set_internal_weights(node: got.taxonomies.taxonomy.Node) → float[source]¶ Sets weights for internal nodes
Parameters: node (Node) – the root of the taxonomy tree / sub-tree Returns: sum of the resulting squared weights Return type: float
-
got.taxonomies.pargenfs.set_parameters(node: got.taxonomies.taxonomy.Node) → None[source]¶ Sets parameters G, v, V for the tree / sub-tree
Parameters: node (Node) – the root of the taxonomy tree / sub-tree Returns: Return type: None
-
got.taxonomies.pargenfs.truncate_weights(node: got.taxonomies.taxonomy.Node, threshold: float) → float[source]¶ Truncates (sets to zero) leaves’ weights (annotations) that are less than the threshold
Parameters: - node (Node) – the root of the taxonomy tree / sub-tree
- threshold (float) – the threshold value
Returns: sum of the resulting squared weights
Return type: float
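The truncation described above can be sketched as follows. `SketchNode` and `truncate_weights_sketch` are hypothetical, and keeping the weight in the attribute `u` is an assumption; the sketch only mirrors the documented behavior (zero out leaf weights below the threshold, return the sum of the remaining squared weights).

```python
from typing import List, Optional


class SketchNode:
    """Hypothetical minimal stand-in for a taxonomy node (not got's Node)."""

    def __init__(
        self,
        name: str,
        children: Optional[List["SketchNode"]] = None,
        u: float = 0.0,
    ):
        self.name = name
        self.children = children or []
        self.u = u  # leaf weight (annotation)


def truncate_weights_sketch(node: SketchNode, threshold: float) -> float:
    """Zero out leaf weights below threshold; return the sum of squared weights."""
    if not node.children:
        if node.u < threshold:
            node.u = 0.0
        return node.u ** 2
    return sum(truncate_weights_sketch(child, threshold) for child in node.children)


small = SketchNode("b", u=0.05)
root = SketchNode("root", [SketchNode("a", u=0.6), small, SketchNode("c", u=0.8)])
remaining = truncate_weights_sketch(root, threshold=0.1)
```

Because the surviving weights no longer have the same sum of squares as before, the returned value can be used to normalize them again.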
got.taxonomies.taxonomy module¶
A class for representing a taxonomy
-
class got.taxonomies.taxonomy.Node(index: str, name: str, parent: Optional[Node], children: List[Node] = None)[source]¶
Bases: collections.abc.Collection
A class used to represent a tree node with all its descendants. This is the basic data structure for representing a taxonomy.
- index : str
- a string representing the node index, for example 1.2.3.
- name : str
- the name of the node
- parent : Node or None
- the parent of the node
- children : List[‘Node’]
- a list of all direct descendants (children) of the node
- u : float
- membership value (normalized)
- score : float
- membership value (non-normalized)
- v : float
- node’s gap importance
- V : float
- node’s cumulative gap importance
- G : List[‘Node’]
- node’s set of gaps
- L : List[‘Node’]
- node’s set of losses
- p : float
- node’s ParGenFS penalty
- H : List[‘Node’]
- node’s head subjects
- __init__(index, name, parent, children)
- constructor
- __contains__(item)
- checks whether the item is a direct descendant of the node; the “in” operator can be used for this check
- __iter__()
- iterates over the direct descendants of the node; this is syntactic sugar for iterating over “node.children”
- __len__()
- returns the out-degree of the node, i.e., the number of the node’s children
- __setattr__(name, value)
- allows setting any custom attribute; this is useful for the ParGenFS algorithm
- __getattr__(name)
- allows getting a custom attribute; returns “None” if there is no such attribute
- is_leaf() (property)
- checks whether the node is a leaf node
- is_internal() (property)
- checks whether the node is an internal node (i.e., is not a leaf)
- is_root() (property)
- checks whether the node is a root of the tree
-
is_internal¶ Checks whether the node is an internal node (i.e., is not a leaf)
Returns: “True” if the node is an internal node, else “False” Return type: bool
-
is_leaf¶ Checks whether the node is a leaf node
Returns: “True” if the node is a leaf node, else “False” Return type: bool
-
is_root¶ Checks whether the node is a root of the tree
Returns: “True” if the node is a root node, else “False” Return type: bool
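The container behavior listed for `Node` can be illustrated with a small stand-in class. `SketchNode` below is a hypothetical re-creation based only on the method descriptions above; it is not the library's source, and details such as how `__setattr__` is implemented are left out.

```python
from collections.abc import Collection
from typing import Any, Iterator, List, Optional


class SketchNode(Collection):
    """Hypothetical stand-in that mimics the documented Node behavior."""

    def __init__(
        self,
        index: str,
        name: str,
        parent: Optional["SketchNode"] = None,
        children: Optional[List["SketchNode"]] = None,
    ):
        self.index = index
        self.name = name
        self.parent = parent
        self.children = children if children is not None else []

    def __contains__(self, item: object) -> bool:
        return item in self.children  # direct children only

    def __iter__(self) -> Iterator["SketchNode"]:
        return iter(self.children)

    def __len__(self) -> int:
        return len(self.children)

    def __getattr__(self, name: str) -> Any:
        # Called only when normal lookup fails: unknown attributes read as None.
        return None

    @property
    def is_leaf(self) -> bool:
        return len(self.children) == 0

    @property
    def is_internal(self) -> bool:
        return not self.is_leaf

    @property
    def is_root(self) -> bool:
        return self.parent is None


root = SketchNode("1", "root")
child = SketchNode("1.1", "child", parent=root)
root.children.append(child)

missing_before = root.p  # no such attribute yet, so this reads as None
root.p = 0.5             # custom attributes can be attached freely
```

Returning `None` for unknown attributes lets an algorithm probe a node for values such as `p` or `V` before they have been computed, at the cost of hiding attribute-name typos.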
-
class got.taxonomies.taxonomy.Taxonomy(filename: str)[source]¶
Bases: object
A class for representing a taxonomy
- built_from : str
- a string representing the name of the file used for building the taxonomy
- _root : Node
- a root of the taxonomy tree
- leaves_extracted : bool
- flag: whether leaves were extracted for the taxonomy or not
- _leaves : List[Node]
- contains all the leaves of the taxonomy
- __init__(filename)
- constructor
- __repr__()
- represents basic info about the taxonomy
- get_taxonomy_tree(filename)
- builds the taxonomy from the file
- leaves() (property)
- returns all the leaves of the taxonomy
- root() (property)
- returns the root of the taxonomy
- get_index_and_name(node_repr) (staticmethod)
- returns str representations for index and name of node
-
static get_index_and_name(node_repr: Tuple[re.Match, re.Match]) → Tuple[str, str][source]¶ Returns str representations of index and name
Parameters: node_repr (Tuple[re.Match, re.Match]) – index and name found by regexp Returns: node index and name Return type: Tuple[str, str]
-
get_taxonomy_tree(filename: str) → got.taxonomies.taxonomy.Node[source]¶ Builds the taxonomy from its description in the file
Parameters: filename (str) – the file with the taxonomy representation in flat-view taxonomy representation (FVTR) format Returns: the root of the taxonomy built Return type: Node
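The FVTR file layout is not described here, so the sketch below does not parse it. It only illustrates how a tree could be assembled from already extracted (index, name) pairs, assuming that a node's parent index is its own index with the last dot-separated component removed (for example, 1.2 is the parent of 1.2.3). The functions `parent_index` and `build_children_map` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def parent_index(index: str) -> Optional[str]:
    """Return the index with its last component dropped, or None for a top-level index."""
    head, sep, _ = index.rpartition(".")
    return head if sep else None


def build_children_map(entries: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map every index to the indices of its direct children."""
    children: Dict[str, List[str]] = {index: [] for index, _ in entries}
    for index, _ in entries:
        parent = parent_index(index)
        if parent in children:  # top-level entries have no parent in the map
            children[parent].append(index)
    return children


# Illustrative entries; the names are made up for this example.
entries = [
    ("1", "root topic"),
    ("1.1", "first subtopic"),
    ("1.1.1", "leaf topic"),
    ("1.2", "second subtopic"),
]
children_map = build_children_map(entries)
```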
got.taxonomies.visualize module¶
Taxonomy visualization
-
got.taxonomies.visualize.draw_lifting_tree(filename: str) → None[source]¶ Draws a tree from ete3 representation stored in a file
Parameters: filename (str) – the name of the file Returns: Return type: None
-
got.taxonomies.visualize.draw_raw_tree(filename: str) → None[source]¶ Draws a raw tree from ete3 representation stored in a file
Parameters: filename (str) – the name of the file Returns: Return type: None
-
got.taxonomies.visualize.layout_lift(node: ete3.coretype.tree.TreeNode, levels: int = 3) → None[source]¶ Layout implementation for a tree node
Parameters: - node (TreeNode) – the root of the taxonomy tree / sub-tree
- levels (int, default=3) – the number of tree levels to draw
Returns: Return type: None
-
got.taxonomies.visualize.layout_raw(node: ete3.coretype.tree.TreeNode, tight_mode: bool = True) → None[source]¶ Layout implementation for a tree node
Parameters: - node (TreeNode) – the root of the taxonomy tree / sub-tree
- tight_mode (bool, default=True) – a mode to print node names more tightly
Returns: Return type: None