got.taxonomies package

Submodules

got.taxonomies.ete3_functions module

Functions for dealing with ete3 for taxonomy representations

got.taxonomies.ete3_functions.make_ete3_lifted(taxonomy_tree: Union[got.taxonomies.taxonomy.Node, got.taxonomies.taxonomy.Taxonomy], print_all: bool = True) → str[source]
Returns the ete3 representation of a taxonomy tree
after the lifting procedure has completed
Parameters:
  • taxonomy_tree (Union[Node, Taxonomy]) – the root of the taxonomy tree / sub-tree or taxonomy
  • print_all (bool, default=True) – flag for printing all the parameters
Returns:

resulting ete3 representation

Return type:

str

got.taxonomies.ete3_functions.make_ete3_raw(taxonomy_tree: Union[got.taxonomies.taxonomy.Node, got.taxonomies.taxonomy.Taxonomy]) → str[source]
Returns the ete3 representation of a raw taxonomy tree
(before lifting)
Parameters:taxonomy_tree (Union[Node, Taxonomy]) – the root of the taxonomy tree / sub-tree or taxonomy
Returns:resulting ete3 representation
Return type:str
got.taxonomies.ete3_functions.save_ete3(ete3_desc: str, filename: str = 'taxonomy_tree_lifted.ete') → None[source]

Writes the resulting ete3 representation to a file

Parameters:
  • ete3_desc (str) – ete3 representation in a string
  • filename (str, default="taxonomy_tree_lifted.ete") – name of the file for writing
Returns:

Return type:

None

got.taxonomies.pargenfs module

ParGenFS algorithm with accessory functions

got.taxonomies.pargenfs.annotate_with_sum(node: got.taxonomies.taxonomy.Node, cluster: Dict[str, float]) → float[source]

Annotates a tree with the cluster weights

Parameters:
  • node (Node) – the root of the taxonomy tree / sub-tree
  • cluster (Dict[str, float]) – the cluster
Returns:

a not-normalized sum of squared weights

Return type:

float

got.taxonomies.pargenfs.enumerate_tree_layers(node: got.taxonomies.taxonomy.Node, current_layer: int = 0) → None[source]

Assigns the corresponding layer number to every node of the taxonomy

Parameters:
  • node (Node) – the root of the taxonomy tree / sub-tree
  • current_layer (int, default=0) – a layer number (nodes’ level) to assign
Returns:

Return type:

None
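
The recursive layer assignment described above can be sketched as follows. This is a minimal illustration, not the package's code: `SketchNode`, `enumerate_layers_sketch` and the attribute name `layer` are hypothetical stand-ins, since the source does not show how the real `Node` stores its layer number.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SketchNode:
    """Hypothetical stand-in for a taxonomy node (not got's Node class)."""
    name: str
    children: List["SketchNode"] = field(default_factory=list)
    layer: Optional[int] = None  # assumed attribute name for the layer number


def enumerate_layers_sketch(node: SketchNode, current_layer: int = 0) -> None:
    """Give `node` the current layer, then its children the next one, recursively."""
    node.layer = current_layer
    for child in node.children:
        enumerate_layers_sketch(child, current_layer + 1)


# Small three-layer tree: root -> mid -> leaf, plus a second child of the root.
leaf = SketchNode("leaf")
mid = SketchNode("mid", [leaf])
other = SketchNode("other")
root = SketchNode("root", [mid, other])
enumerate_layers_sketch(root)
```

Like the documented function, the sketch returns nothing and records the result on the nodes themselves.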

got.taxonomies.pargenfs.get_cluster_k(tree_leaves: List[got.taxonomies.taxonomy.Node], node_names: List[str], membership_matrix: List[List[float]], k: int) → Dict[str, float][source]

Returns the membership vector corresponding to the k-th cluster

Parameters:
  • tree_leaves (List[Node]) – all the leaves of the taxonomy
  • node_names (List[str]) – string names of nodes
  • membership_matrix (List[List[float]]) – membership matrix, size: (number_of_clusters x number_of_node_names)
  • k (int) – index of a cluster
Returns:

membership dictionary corresponding to the k-th cluster

Return type:

Dict[str, float]
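
To illustrate the documented matrix shape (one row per cluster, one column per node name), here is a simplified sketch of the row lookup. It is not the library's implementation: `cluster_k_sketch` is a hypothetical name, and the sketch leaves out the `tree_leaves` argument because the source does not say how that argument is used.

```python
from typing import Dict, List


def cluster_k_sketch(node_names: List[str],
                     membership_matrix: List[List[float]],
                     k: int) -> Dict[str, float]:
    """Hypothetical helper: pair each node name with its membership in cluster k."""
    row = membership_matrix[k]  # row k holds the memberships for cluster k
    if len(row) != len(node_names):
        raise ValueError("row length must match the number of node names")
    return dict(zip(node_names, row))


# Two clusters over three (made-up) node names.
names = ["leaf a", "leaf b", "leaf c"]
matrix = [[0.9, 0.1, 0.0],
          [0.0, 0.2, 0.8]]
cluster = cluster_k_sketch(names, matrix, 1)
```

Here `k` is taken as a zero-based row index, which is an assumption; the source only calls it the "index of a cluster".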

got.taxonomies.pargenfs.indicate_offshoots(node: got.taxonomies.taxonomy.Node) → None[source]

Indicates all the offshoots in the tree / sub-tree

Parameters:node (Node) – the root of the taxonomy tree / sub-tree
Returns:
Return type:None
got.taxonomies.pargenfs.make_init_step(node: got.taxonomies.taxonomy.Node, gamma_v: float) → None[source]

Init step of ParGenFS algorithm

Parameters:
  • node (Node) – the root of the taxonomy tree / sub-tree
  • gamma_v (float) – gamma value
Returns:

Return type:

None

got.taxonomies.pargenfs.make_recursive_step(node: got.taxonomies.taxonomy.Node, gamma_v: float, lambda_v: float) → None[source]

Recursive step of ParGenFS algorithm

Parameters:
  • node (Node) – the root of the taxonomy tree / sub-tree
  • gamma_v (float) – gamma value
  • lambda_v (float) – lambda value
Returns:

Return type:

None

got.taxonomies.pargenfs.make_result_table(node: got.taxonomies.taxonomy.Node) → List[List[str]][source]

Builds the resulting table for the tree / sub-tree

Parameters:node (Node) – the root of the taxonomy tree / sub-tree
Returns:resulting table for printing / saving in a file
Return type:List[List[str]]
got.taxonomies.pargenfs.normalize_and_return_leaf_weights(node: got.taxonomies.taxonomy.Node, summ: float) → List[List[Union[str, float]]][source]

Normalizes leaves’ weights (annotations)

Parameters:
  • node (Node) – the root of the taxonomy tree / sub-tree
  • summ (float) – sum of weights value
Returns:

a list of normalized weights

Return type:

List[List[Union[str, float]]]
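
The relationship between the not-normalized sum of squared weights and the normalization step can be sketched on a plain dictionary. This sketch assumes Euclidean normalization (each weight divided by the square root of the sum of squares), which the source does not confirm; the function names and the `[name, weight]` pair layout are hypothetical as well.

```python
import math
from typing import Dict, List, Union


def sum_of_squares_sketch(weights: Dict[str, float]) -> float:
    """Not-normalized sum of squared weights."""
    return sum(w * w for w in weights.values())


def normalize_sketch(weights: Dict[str, float],
                     summ: float) -> List[List[Union[str, float]]]:
    """Assumed rule: divide each weight by sqrt(summ); all zeros if summ is 0."""
    norm = math.sqrt(summ)
    if norm == 0.0:
        return [[name, 0.0] for name in weights]
    return [[name, w / norm] for name, w in weights.items()]


raw = {"a": 3.0, "b": 4.0}
summ = sum_of_squares_sketch(raw)        # 3^2 + 4^2
normalized = normalize_sketch(raw, summ)  # weights divided by 5
```

Under this assumption the normalized weights have unit Euclidean norm.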

got.taxonomies.pargenfs.pargenfs(cluster: Dict[str, float], taxonomy_tree: got.taxonomies.taxonomy.Taxonomy, gamma_v: float = 0.2, lambda_v: float = 0.2) → None[source]

Runs ParGenFS algorithm over a taxonomy tree

Parameters:
  • cluster (Dict[str, float]) – the cluster to generalize
  • taxonomy_tree (Taxonomy) – the taxonomy tree
  • gamma_v (float, default=.2) – gamma penalty value
  • lambda_v (float, default=.2) – lambda penalty value
Returns:

Return type:

None

got.taxonomies.pargenfs.prune_tree(node: got.taxonomies.taxonomy.Node) → None[source]

Prunes the tree / sub-tree

Parameters:node (Node) – the root of the taxonomy tree / sub-tree
Returns:
Return type:None
got.taxonomies.pargenfs.reduce_edges(node: got.taxonomies.taxonomy.Node) → None[source]

Reduces tree edges for the tree / sub-tree

Parameters:node (Node) – the root of the taxonomy tree / sub-tree
Returns:
Return type:None
got.taxonomies.pargenfs.run(taxonomy_file: str, taxonomy_leaves: str, clusters: str, cluster_number: int) → None[source]

Obtains cluster and runs ParGenFS algorithm over a taxonomy tree

Parameters:
  • taxonomy_file (str) – taxonomy description in *.fvtr format
  • taxonomy_leaves (str) – taxonomy leaves in *.txt format
  • clusters (str) – clusters’ membership table in *.dat format
  • cluster_number (int) – number of the cluster to lift
Returns:

Return type:

None

got.taxonomies.pargenfs.save_result_table(result_table: List[List[str]], filename: str = 'table.csv') → None[source]

Writes the resulting table to a file

Parameters:
  • result_table (List[List[str]]) – table for saving
  • filename (str, default="table.csv") – name of the file for writing
Returns:

Return type:

None

got.taxonomies.pargenfs.set_gaps_for_tree(node: got.taxonomies.taxonomy.Node) → None[source]

Sets gaps for the tree / sub-tree

Parameters:node (Node) – the root of the taxonomy tree / sub-tree
Returns:
Return type:None
got.taxonomies.pargenfs.set_internal_weights(node: got.taxonomies.taxonomy.Node) → float[source]

Sets weights for internal nodes

Parameters:node (Node) – the root of the taxonomy tree / sub-tree
Returns:sum of the resulting squared weights
Return type:float
got.taxonomies.pargenfs.set_parameters(node: got.taxonomies.taxonomy.Node) → None[source]

Sets parameters G, v, V for the tree / sub-tree

Parameters:node (Node) – the root of the taxonomy tree / sub-tree
Returns:
Return type:None
got.taxonomies.pargenfs.truncate_weights(node: got.taxonomies.taxonomy.Node, threshold: float) → float[source]

Truncates (sets to zero) leaves’ weights (annotations) that are less than the threshold

Parameters:
  • node (Node) – the root of the taxonomy tree / sub-tree
  • threshold (float) – the threshold value
Returns:

sum of the resulting squared weights

Return type:

float
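
The truncation rule can be sketched on a flat dictionary of leaf weights. This is a simplified stand-in that does not walk a tree as the real function does; `truncate_weights_sketch` is a hypothetical name.

```python
from typing import Dict


def truncate_weights_sketch(weights: Dict[str, float], threshold: float) -> float:
    """Zero every weight below `threshold` in place; return the remaining sum of squares."""
    for name, weight in weights.items():
        if weight < threshold:
            weights[name] = 0.0  # reassigning an existing key is safe while iterating
    return sum(w * w for w in weights.values())


leaf_weights = {"a": 0.5, "b": 0.05, "c": 1.0}
remaining = truncate_weights_sketch(leaf_weights, threshold=0.1)
```

Only weights strictly below the threshold are zeroed here, following the wording "less than the threshold".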

got.taxonomies.taxonomy module

Classes for representing a taxonomy

class got.taxonomies.taxonomy.Node(index: str, name: str, parent: Optional[Node], children: List[Node] = None)[source]

Bases: collections.abc.Collection

A class used to represent a tree node together with all its descendants. This is the basic data structure for representing a taxonomy.

index : str
a string representing the node index, for example 1.2.3.
name : str
the name of the node
parent : Node or None
the parent of the node
children : List[‘Node’]
a list of all direct descendants (children) of the node
u : float
membership value (normalized)
score : float
membership value (non-normalized)
v : float
node’s gap importance
V : float
node’s cumulative gap importance
G : List[‘Node’]
node’s set of gaps
L : List[‘Node’]
node’s set of losses
p : float
node’s ParGenFS penalty
H : List[‘Node’]
node’s head subjects
__init__(index, name, parent, children)
constructor
__contains__(item)
checks whether the item is a direct descendant of the node; the “in” operator can be used for this check
__iter__()
iterates over the direct descendants of the node; this is syntactic sugar for iterating over “node.children”
__len__()
returns the outgoing degree of the node, i.e., the number of node’s children
__setattr__(name, value)
allows setting any custom attribute; this is useful for the ParGenFS algorithm
__getattr__(name)
allows getting a custom attribute; returns “None” if there is no such attribute
is_leaf() (property)
checks whether the node is a leaf node
is_internal() (property)
checks whether the node is an internal node (i.e., is not a leaf)
is_root() (property)
checks whether the node is a root of the tree
is_internal

Checks whether the node is an internal node (i.e., is not a leaf)

Returns:“True” if the node is an internal node, else “False”
Return type:bool
is_leaf

Checks whether the node is a leaf node

Returns:“True” if the node is a leaf node, else “False”
Return type:bool
is_root

Checks whether the node is a root of the tree

Returns:“True” if the node is a root node, else “False”
Return type:bool
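
The container behaviour listed above (membership test, iteration, length, and “None” for unknown attributes) can be sketched with a small stand-in class. `SketchNode` is hypothetical and only mirrors the documented behaviour; in particular, treating a node without a parent as the root is an assumption.

```python
from collections.abc import Collection
from typing import Any, Iterator, List, Optional


class SketchNode(Collection):
    """Hypothetical stand-in mirroring the documented behaviour (not got's Node)."""

    def __init__(self, index: str, name: str,
                 parent: Optional["SketchNode"] = None,
                 children: Optional[List["SketchNode"]] = None) -> None:
        self.index = index
        self.name = name
        self.parent = parent
        self.children = children if children is not None else []

    def __contains__(self, item: object) -> bool:
        return item in self.children   # direct children only

    def __iter__(self) -> Iterator["SketchNode"]:
        return iter(self.children)

    def __len__(self) -> int:
        return len(self.children)      # outgoing degree

    def __getattr__(self, name: str) -> Any:
        return None                    # reached only when normal lookup fails

    @property
    def is_leaf(self) -> bool:
        return len(self.children) == 0

    @property
    def is_root(self) -> bool:
        return self.parent is None     # assumption: the root has no parent


root = SketchNode("1.", "root")
child = SketchNode("1.1.", "child", parent=root)
root.children.append(child)
```

With this sketch, `child in root` holds, `len(root)` is 1, and reading an attribute that was never set, such as `root.p`, yields `None` instead of raising `AttributeError`.
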
class got.taxonomies.taxonomy.Taxonomy(filename: str)[source]

Bases: object

A class for representing a taxonomy

built_from : str
a string representing the filename used for building the taxonomy
_root : Node
a root of the taxonomy tree
leaves_extracted : bool
flag indicating whether the leaves have been extracted for the taxonomy
_leaves : List[Node]
contains all the leaves of the taxonomy
__init__(filename)
constructor
__repr__()
represents basic info about the taxonomy
get_taxonomy_tree(filename)
builds the taxonomy from the file
leaves() (property)
returns all the leaves of the taxonomy
root() (property)
returns the root of the taxonomy
get_index_and_name(node_repr) (staticmethod)
returns str representations for index and name of node
static get_index_and_name(node_repr: Tuple[re.Match, re.Match]) → Tuple[str, str][source]

Returns str representations of the index and name of a node

Parameters:node_repr (Tuple[re.Match, re.Match]) – index and name found by regexp
Returns:node index and name
Return type:Tuple[str, str]
get_taxonomy_tree(filename: str) → got.taxonomies.taxonomy.Node[source]

Builds the taxonomy from its description in the file

Parameters:filename (str) – the file with the taxonomy representation in flat-view taxonomy representation (FVTR) format
Returns:the root of the taxonomy built
Return type:Node
leaves

Returns all the leaves of the taxonomy

Parameters:tree (Node) – the root of the taxonomy
Returns:a list of the taxonomy leaves
Return type:List[Node]
root

Returns the root of the taxonomy

Returns:the root of the tree
Return type:Node
got.taxonomies.taxonomy.extract_leaves(tree: got.taxonomies.taxonomy.Node) → List[got.taxonomies.taxonomy.Node][source]

Returns all the leaves of the tree / sub-tree

Parameters:tree (Node) – the root of the tree / sub-tree
Returns:a list of the tree / sub-tree leaves
Return type:List[Node]
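
Leaf extraction can be sketched as a depth-first traversal that collects childless nodes. The names `SketchNode` and `extract_leaves_sketch` are hypothetical, and the left-to-right ordering of the result is an assumption, since the source does not state the order in which leaves are returned.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SketchNode:
    """Hypothetical stand-in for a taxonomy node (not got's Node class)."""
    name: str
    children: List["SketchNode"] = field(default_factory=list)


def extract_leaves_sketch(tree: SketchNode) -> List[SketchNode]:
    """Collect childless nodes left to right by depth-first traversal."""
    if not tree.children:
        return [tree]  # a node without children is itself a leaf
    leaves: List[SketchNode] = []
    for child in tree.children:
        leaves.extend(extract_leaves_sketch(child))
    return leaves


tree = SketchNode("root", [
    SketchNode("a", [SketchNode("a1"), SketchNode("a2")]),
    SketchNode("b"),
])
leaf_names = [leaf.name for leaf in extract_leaves_sketch(tree)]
```

The same function works on any sub-tree, because a call on a leaf simply returns that leaf in a one-element list.
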
got.taxonomies.taxonomy.save_leaves(leaves: List[got.taxonomies.taxonomy.Node], filename: str = 'taxonomy_leaves.txt') → None[source]

Saves all the leaves of the tree / sub-tree to a file

Parameters:
  • leaves (List[Node]) – the list of leaves
  • filename (str, default="taxonomy_leaves.txt") – name of the file for writing
Returns:
Return type:None

got.taxonomies.visualize module

Taxonomy visualization

got.taxonomies.visualize.draw_lifting_tree(filename: str) → None[source]

Draws a tree from ete3 representation stored in a file

Parameters:filename (str) – the name of the file
Returns:
Return type:None
got.taxonomies.visualize.draw_raw_tree(filename: str) → None[source]

Draws a raw tree from ete3 representation stored in a file

Parameters:filename (str) – the name of the file
Returns:
Return type:None
got.taxonomies.visualize.layout_lift(node: ete3.coretype.tree.TreeNode, levels: int = 3) → None[source]

Layout implementation for a tree node

Parameters:
  • node (TreeNode) – the root of the taxonomy tree / sub-tree
  • levels (int, default=3) – the number of tree levels to draw
Returns:

Return type:

None

got.taxonomies.visualize.layout_raw(node: ete3.coretype.tree.TreeNode, tight_mode: bool = True) → None[source]

Layout implementation for a tree node

Parameters:
  • node (TreeNode) – the root of the taxonomy tree / sub-tree
  • tight_mode (bool, default=True) – a mode to print node names more tightly
Returns:

Return type:

None

got.taxonomies.visualize.read_ete3_from_file(filename: str) → str[source]

Reads ete3 representation from the file

Parameters:filename (str) – the name of the file
Returns:content of the file
Return type:str

Module contents