NORDic UTILS module

NORDic.UTILS.DISGENET_utils module

NORDic.UTILS.DISGENET_utils.get_genes_evidences_from_DISGENET(gene_list, disease, limit=3000, source='CURATED', min_score=0, chunksize=100, user_key=None, quiet=False)

Retrieves the references for the association between each gene in the list and the disease

…

Parameters

gene_listPython character string list: list of associated genes
diseasePython character string: Concept ID (CID) from MedGen
limitPython integer: [default=3000] : limit of the number of references
sourcePython character string: [default=”CURATED”] : DisGeNET data sources [“CURATED”,”ANIMAL MODELS”,”INFERRED”,”ALL”] (see https://www.disgenet.org/dbinfo)
min_scorePython float: [default=0] : minimum evidence score
chunksizePython integer: [default=100] : size of chunks (1 chunk per request)
user_keyPython character string or None: [default=None] : API key from DisGeNET
quietPython bool: [default=False] : prints out verbose

Returns

res_dfPandas DataFrame: rows/[row number] x columns/[“gene_symbol”, “sentence”, “associationtype”, “pmid”, “year”, “score”]
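
The chunksize parameter means the gene list is processed in consecutive chunks, with one request per chunk. The splitting alone can be sketched as follows; chunk_list is a hypothetical helper (not part of NORDic), the gene symbols are placeholders, and no request is sent.

```python
def chunk_list(items, chunksize=100):
    """Split items into consecutive chunks of at most chunksize elements."""
    if chunksize < 1:
        raise ValueError("chunksize must be a positive integer")
    return [items[i:i + chunksize] for i in range(0, len(items), chunksize)]


genes = [f"GENE{i}" for i in range(250)]  # placeholder gene symbols
chunks = chunk_list(genes, chunksize=100)
print([len(c) for c in chunks])  # 3 chunks, hence 3 requests
```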

NORDic.UTILS.DISGENET_utils.get_genes_proteins_from_DISGENET(disease_list, limit=3000, source='CURATED', min_score=0, min_ei=0, min_dsi=0.25, min_dpi=0, chunksize=100, user_key=None, quiet=False)

Retrieves a list of protein names (and associated gene names) related to the input disease CIDs

…

Parameters

disease_listPython character string list: list of Concept IDs (CID) from Medgen for each disease
limitPython integer: [default=3000] : max. number of proteins
sourcePython character string: [default=”CURATED”] : DisGeNET data sources [“CURATED”,”ANIMAL MODELS”,”INFERRED”,”ALL”] (see https://www.disgenet.org/dbinfo)
min_scorePython float: [default=0] : minimum global score
min_eiPython float: [default=0] : minimum Evidence Index
min_dsiPython float: [default=0.25] : minimum Disease Specificity Index
min_dpiPython float: [default=0] : minimum Disease Pleiotropy Index
chunksizePython integer: [default=100] : size of chunks (1 chunk per request)
user_keyPython character string or None: [default=None] : API key from DisGeNET
quietPython bool: [default=False] : prints out verbose

Returns

res_dfPandas DataFrame: rows/[Disease CID] x columns/[“Protein”, “Gene Name”] or None if not found.

NORDic.UTILS.DISGENET_utils.get_user_key_DISGENET(fname)

Retrieves the user key from DisGeNET to call the API

…

Parameters

fnamePython character string: path to a text file containing the email on the first line and the password on the second line

Returns

user_keyPython character string: from DisGeNET
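
The expected layout of the credentials file (email on the first line, password on the second) can be checked before calling the API. The sketch below only reads the file; read_credentials is a hypothetical helper, the credentials are placeholders, and how NORDic then exchanges them for a key is not shown here.

```python
import os
import tempfile


def read_credentials(fname):
    """Return (email, password) read from the first two lines of fname."""
    with open(fname, "r") as f:
        lines = [line.strip() for line in f.read().splitlines()]
    if len(lines) < 2:
        raise ValueError("expected the email on line 1 and the password on line 2")
    return lines[0], lines[1]


# Demo with a temporary file and placeholder credentials
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("user@example.org\nnot-a-real-password\n")
email, password = read_credentials(tmp.name)
os.remove(tmp.name)
```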

NORDic.UTILS.LINCS_utils module

NORDic.UTILS.LINCS_utils.binarize_via_CD(df, samples, binarize=1, nperm=10000, quiet=False)

Run a differential expression analysis on a dataframe using Characteristic Direction (CD) [1] (implementation: www.maayanlab.net/CD/) [1] doi.org/10.1186/1471-2105-15-79

…

Parameters

dfPandas DataFrame: one transcriptional profile per column (warning: if #genes>25,000, only the 25,000 genes with highest variance are considered)
samplesPython integer list: indicates which columns correspond to control (=1) / treated (=2) samples
binarizePython integer: [default=1] : whether to return a binary signature or a real-valued column ~magnitude of change in expression
npermPython integer: [default=10000] : number of iterations to build the null distribution on which p-values will be computed
quietPython bool: [default=False] : prints out verbose

Returns

signaturePandas DataFrame: rows/[gene index] x columns/[“aggregated”]: 0=down-regulated (DR), 1=up-regulated (UR) (if binarize=1) else <0=DR, >0=UR

NORDic.UTILS.LINCS_utils.build_url(endpoint, method, params, user_key=None)

Builds the request to CLUE API

…

Parameters

endpointPython character string: in [“sigs”, “cells”, “genes”, “perts”, “plates”, “profiles”, “rep_drugs”, “rep_drug_indications”, “pcls”]
methodPython character string: in [“count”, “filter”, “distinct”]
paramsPython dictionary: additional arguments for the request
user_keyPython character string: [default=None] : API key for LINCS CLUE.io

Returns

urlPython character string: URL of request
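
A minimal sketch of such a builder is given below, restricted to the endpoints and methods listed above. The base address, the query layout, and the example filter are assumptions made for illustration; they are not taken from NORDic or from the CLUE documentation.

```python
import json
from urllib.parse import urlencode

ENDPOINTS = ["sigs", "cells", "genes", "perts", "plates", "profiles",
             "rep_drugs", "rep_drug_indications", "pcls"]
METHODS = ["count", "filter", "distinct"]
BASE_URL = "https://api.example.org"  # hypothetical placeholder, not the CLUE address


def sketch_build_url(endpoint, method, params, user_key=None):
    """Validate the arguments and assemble a request URL (illustrative layout only)."""
    if endpoint not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {endpoint}")
    if method not in METHODS:
        raise ValueError(f"unknown method: {method}")
    query = {method: json.dumps(params)}  # params serialized as JSON (assumption)
    if user_key is not None:
        query["user_key"] = user_key
    return f"{BASE_URL}/{endpoint}?{urlencode(query)}"


url = sketch_build_url("genes", "filter",
                       {"where": {"gene_symbol": "TP53"}}, user_key="KEY")
```

Validating the endpoint and method before sending anything avoids spending a request on a malformed call.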

NORDic.UTILS.LINCS_utils.compute_interference_scale(sigs, samples, entrez_id, is_oe, taxon_id, lincs_specific_ctl_genes, quiet=True, eps=2e-07)

Computes the interference scale [1] which determines whether a genetic perturbation was successful [1] doi.org/10.1002/psp4.12107

…

Parameters

sigsPandas DataFrame: rows/[genes] x columns/[control and treated samples]
samplesPython integer list: contains 1 for control samples, 2 for treated ones for each column of @sigs
entrez_idPython integer: EntrezID of the perturbed gene
is_oePython bool: is the experiment an overexpression of the perturbed gene (is_oe=True) or a knockdown
taxon_idPython integer: NCBI taxonomy ID
lincs_specific_ctl_genesPython string list: list of HGNC gene symbols for housekeeping genes
quietPython bool: [default=True] : prints out verbose
epsPython float: [default=2e-7] : avoids numerical errors for low-expression housekeeping genes

Returns

iscalePython float: interference scale for the input experiment

NORDic.UTILS.LINCS_utils.convert_ctrlgenes_EntrezGene(taxon_id)

Retrieves EntrezID from control genes in LINCS L1000 [1] [1] doi.org/10.1002/psp4.12107

…

Parameters

taxon_idPython integer: NCBI taxonomy ID

Returns

lincs_specific_ctl_genesPython character string list: list of EntrezGene IDs for all genes in input list

NORDic.UTILS.LINCS_utils.create_restricted_drug_signatures(sig_ids, entrezid_list, path_to_lincs, which_lvl=[3], strict=True, quiet=False)

Create dataframe of drug signatures from LINCS L1000 from a subset of signature and gene IDs

…

Parameters

sig_idsPython character string list: list of signature IDs from LINCS L1000 (Level 3: “distil_id”, Level 5: “sig_id”)
entrezid_listPython character string list: list of EntrezIDs
path_to_lincsPython character string: folder in which LINCS L1000-related files are stored
which_lvlPython integer list: [3] for Level 3, [5] for Level 5
strictPython bool: [default=True] : if set to True, if not all signatures are retrieved, then return None. If set to False, return the (sub)set of retrievable signatures
quietPython bool: [default=False] : prints out verbose

Returns

sigs Pandas DataFrame: rows/[genes] x columns/[drugs]

NORDic.UTILS.LINCS_utils.download_file(path_to_lincs, file_name, base_url, file_sha, check_SHA=True, quiet=False)

Automatically downloads LINCS L1000-related files from Gene Expression Omnibus (GEO) (warning: can be time-consuming; expect waiting times of up to 20 min with a good Internet connection)

…

Parameters

path_to_lincsPython character string: path to local LINCS L1000 folder in which the files will be downloaded
file_namePython character string: file name to download on GEO
base_urlPython character string: path to GEO repository
file_shaPython character string: file name of corresponding SHA hash to check file integrity
check_SHAPython bool: [default=True] : whether to check the file integrity
quietPython bool: [default=False] : prints out verbose

Returns

0Python integer: 0 means that the download was successful
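
The integrity check enabled by check_SHA can be sketched with the standard library. The hash algorithm (SHA-512 here) and the helper names are assumptions; the sketch compares a local file against an expected hexadecimal digest and downloads nothing.

```python
import hashlib
import os
import tempfile


def file_digest(path, algorithm="sha512", block_size=1 << 20):
    """Hash a file block by block so that large files are never fully loaded."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)
    return h.hexdigest()


def check_integrity(path, expected_hex, algorithm="sha512"):
    """True if the digest of the file matches the expected hexadecimal digest."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()


# Demo on a small temporary file with placeholder content
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"placeholder file content")
expected = hashlib.sha512(b"placeholder file content").hexdigest()
ok = check_integrity(tmp.name, expected)
tampered = check_integrity(tmp.name, "0" * 128)
os.remove(tmp.name)
```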

NORDic.UTILS.LINCS_utils.download_lincs_files(path_to_lincs, which_lvl)

Returns and downloads the proper LINCS L1000 files from Gene Expression Omnibus (GEO)

…

Parameters

path_to_lincsPython character string: path to folder in which LINCS L1000-related files will be locally stored
which_lvlPython integer list: LINCS L1000 Level to download (either [3] -normalized gene expression-, [5] -binary experimental signatures-, [3,5])

Returns

file_listPython list of 4 Python character string lists: gene_files, sig_files, lvl3_files, lvl5_files

NORDic.UTILS.LINCS_utils.get_treated_control_dataset(treatment, pert_type, cell, filters, entrez_ids, taxon_id, user_key, path_to_lincs, entrez_id=None, selection='distil_ss', dose=None, iunit=None, itime=None, which_lvl=[5], nsigs=2, same_plate=True, quiet=False, trim_w_interference_scale=True, return_metrics=[])

Retrieve a set of experimental profiles, with at least nsigs treated and nsigs control samples

…

Parameters

treatmentPython character string: HUGO gene symbol
pert_typePython character string: type of perturbation as accepted by LINCS L1000
cellPython character string: cell line existing in LINCS L1000
filtersPython dictionary: additional parameters for the LINCS L1000 requests
entrez_idsPython integer list: EntrezID genes
taxon_idPython integer: NCBI taxonomy ID
user_keyPython character string: LINCS L1000 user API key
path_to_lincsPython character string: path where LINCS L1000 files are locally stored
entrez_idPython integer: EntrezID identifier for HUGO gene symbol treatment
selectionPython character string: [default=”distil_ss”] : LINCS L1000 metric which is maximized by a given experiment
dosePython character string or None: [default=None] : filter by dose (if not None)
iunitPython character string or None: [default=None] : unit of dose
itimePython character string or None: [default=None] : filter by exposure time (if not None)
which_lvlPython integer list: [default=[5]] : LINCS L1000 data level to consider (either 3 or 5)
nsigsPython integer: [default=2] : minimal number of samples of each condition in each experiment
same_platePython bool: [default=True] : select samples from the same plate for each experiment and condition
quietPython bool: [default=False] : prints out verbose
trim_w_interference_scalePython bool: [default=True] : computes the interference scale criteria for further trimming
return_metricsPython character string list: [default=[]] : list of LINCS L1000 metrics to return at the same time as the profiles

Returns

sigsPandas DataFrame: rows/[genes+”annotation”+”signame”+”sigid”] x columns/[profiles] or None

NORDic.UTILS.LINCS_utils.get_user_key(fname)

Retrieves user key for interacting with LINCS L1000 CLUE API

…

Parameters

fnamePython character string: path to file containing credentials for LINCS L1000 (first line: username, second line: password, third line: user key)

Returns

user_keyPython character string: identifier for the LINCS L1000 CLUE API

NORDic.UTILS.LINCS_utils.post_request(url, quiet=True, pause_time=1)

Post request to API

…

Parameters

urlPython character string: URL formatted as in build_url
quietPython bool: [default=True] : prints out verbose
pause_timePython integer: [default=1] : minimum time in seconds between each request

Returns

dataPython dictionary: (JSON) or Python character string list (if request was method=”distinct”)
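
The pause_time argument enforces a minimum delay between consecutive requests. A minimal sketch of that pacing, independent of any HTTP library, is shown below; the Throttle class is hypothetical and does not reproduce the NORDic implementation.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive calls to wait()."""

    def __init__(self, pause_time=1.0, clock=time.monotonic, sleep=time.sleep):
        self.pause_time = pause_time
        self.clock = clock
        self.sleep = sleep
        self.last_call = None

    def wait(self):
        """Sleep just long enough for calls to be at least pause_time apart."""
        waited = 0.0
        if self.last_call is not None:
            elapsed = self.clock() - self.last_call
            waited = max(0.0, self.pause_time - elapsed)
            if waited > 0:
                self.sleep(waited)
        self.last_call = self.clock()
        return waited


throttle = Throttle(pause_time=0.05)
first = throttle.wait()   # no previous request, so no waiting
second = throttle.wait()  # called right after the first one, so it sleeps
```

Calling wait() before each request keeps the pacing logic separate from the request code.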

NORDic.UTILS.LINCS_utils.select_best_sig(params, filters, user_key, selection='distil_ss', nsigs=2, same_plate=True, iunit=None, quiet=False)

Select “best” set of profiles (“experiment”) (in terms of quality, or criterion “selection”) according to filters

…

Parameters

paramsPython dictionary: additional arguments for the request
filtersPython dictionary: additional arguments for filtering the results of the request (defined with params)
user_keyPython character string: LINCS L1000 user API key
selectionPython character string: [default=”distil_ss”] : name of the metric in LINCS L1000 to define the best signature
nsigsPython integer: [default=2] : minimum number of signatures to retrieve
same_platePython bool: [default=True] : whether to retrieve signatures from the same plate or not
iunitPython character string or None: [default=None] : unit of dose (if None, any)
quietPython bool: [default=False] : prints out verbose

Returns

dataPython dictionary list: the list of profile IDs to retrieve from LINCS L1000

NORDic.UTILS.STRING_utils module

NORDic.UTILS.STRING_utils.get_app_name_STRING(fname)

Retrieves app name from STRING to interact with the API

…

Parameters

fnamePython character string: path to a file containing a single line = email address

Returns

app_namePython character string: identifier for the STRING API

NORDic.UTILS.STRING_utils.get_image_from_STRING(my_genes, taxon_id, file_name='network.png', min_score=0, network_flavor='evidence', network_type='functional', app_name=None, version='11.5', quiet=False)

Retrieves the network image from STRING for the input genes in the given species and writes it to a file

…

Parameters

my_genesPython character string list: list of gene symbols
taxon_idPython integer: taxon ID from NCBI
file_namePython character string: [default=”network.png”] : image file name
min_scorePython float: [default=0] : confidence lower threshold (in [0,1])
network_flavorPython character string: [default=”evidence”] : show links related to [“confidence”, “action”, “evidence”]
network_typePython character string: [default=”functional”] : show “functional” or “physical” network
app_namePython character string: [default=None] : identifier for STRING
quietPython bool: [default=False] : prints out verbose

Returns

None: writes the network image to a file file_name

NORDic.UTILS.STRING_utils.get_interactions_from_STRING(gene_list, taxon_id, min_score=0, app_name=None, file_folder=None, version='11.0', strict=False, quiet=False)

Retrieves (un)directed and (un)signed physical interactions from the STRING database

…

Parameters

gene_listPython character string list: list of genes
taxon_idPython integer: NCBI taxonomy ID
min_scorePython float: [default=0] : minimum STRING combined score in [0,1]
app_namePython character string: [default=None] : identifier for STRING
file_folderPython character string: [default=None]: where to save the file from STRING (if None, the file is not saved)
versionPython character string: [default=”11.0”] : STRING database version
strictPython bool: [default=False] : if set to True, only keep interactions involving genes BOTH in @gene_list
quietPython bool: [default=False] : prints out verbose

Returns

res_dfPandas DataFrame: rows/[interaction number] x columns/[“preferredName_A”, “preferredName_B”, “sign”, “directed”, “score”]

NORDic.UTILS.STRING_utils.get_interactions_partners_from_STRING(gene_list, taxon_id, min_score=0, network_type='functional', add_nodes=0, limit=5, app_name=None, version='11.5', quiet=False)

Retrieves undirected and unsigned interactions from the STRING database

…

Parameters

gene_listPython character string list: list of gene symbols
taxon_idPython integer: NCBI taxonomy ID
min_scorePython float: [default=0] : minimum STRING combined edge score in [0,1]
network_typePython character string: [default=”functional”] : returns “functional” or “physical” network
limitPython integer: [default=5] : limits the number of interaction partners retrieved per protein (most confident interactions come first)
app_namePython character string: [default=None] : identifier for STRING
versionPython character string: [default=”11.5”] : STRING version
quietPython bool: [default=False] : prints out verbose

Returns

networkPandas DataFrame: rows/[row number] x columns/[“preferredName_A”,”preferredName_B”,”score”,”directed”]

NORDic.UTILS.STRING_utils.get_network_from_STRING(gene_list, taxon_id, min_score=0, network_type='functional', add_nodes=0, app_name=None, version='11.5', quiet=False)

Retrieves undirected and unsigned interactions from the STRING database

…

Parameters

gene_listPython character string list: list of gene symbols
taxon_idPython integer: NCBI taxonomy ID
min_scorePython float: [default=0] : minimum STRING combined edge score in [0,1]
network_typePython character string: [default=”functional”] : returns “functional” or “physical” network
add_nodesPython integer: [default=0] : add nodes in the closest interaction neighborhood involved with the genes in @gene_list if set to 1
app_namePython character string: [default=None] : identifier for STRING
versionPython character string: [default=”11.5”] : STRING version
quietPython bool: [default=False] : prints out verbose

Returns

networkPandas DataFrame: rows/[row number] x columns/[“preferredName_A”,”preferredName_B”,”score”,”directed”]

NORDic.UTILS.STRING_utils.get_protein_names_from_STRING(gene_list, taxon_id, app_name=None, version='11.5', quiet=False)

Retrieves protein IDs in STRING associated with input genes in the correct species

…

Parameters

gene_listPython character string list: list of gene symbols
taxon_idPython integer: taxon ID from NCBI
versionPython character string: [default=”11.5”] : STRING version
app_namePython character string: [default=None] : identifier to access STRING
quietPython bool: [default=False] : prints out verbose

Returns

res_dfPandas DataFrame: rows/[row number] x columns/[“queryItem”, “stringId”, “preferredName”, “annotation”]

NORDic.UTILS.STRING_utils.string_api_url(v)

NORDic.UTILS.utils_data module

NORDic.UTILS.utils_data.convert_EntrezGene_LINCSL1000(file_folder, EntrezGenes, user_key, quiet=False)

Converts EntrezIDs to Gene Symbols present in LINCS L1000

…

Parameters

file_folderPython character string: path to folder of intermediate results
EntrezGenesPython character string list: list of EntrezGene IDs
user_keyPython character string: LINCS L1000 user key
quietPython bool: [default=False] : prints out verbose

Returns

PandasPandas DataFrame: rows/[EntrezID] x columns/[“Gene Symbol”,”Entrez ID”] (“-” if they do not exist)

NORDic.UTILS.utils_data.convert_genes_EntrezGene(gene_list, taxon_id, app_name, chunksize=100, missing_genes={'C11ORF74': 'IFTAP', 'ENSP00000451560': 'TPPP2', 'RP11-566K11.2': 'TUBB4'}, quiet=False)

Convert gene symbols into EntrezGene CID

…

Parameters

gene_listPython character string list: list of genes
taxon_idPython character string: NCBI taxonomy ID
app_namePython character string: STRING identifier
missing_genesPython dictionary of character string x character string: known conversions
chunksizePython integer: [default=100] : 1 chunk per request
quietPython bool: [default=False] : prints out verbose

Returns

res_dfPandas DataFrame: rows/[“InputValue”] x columns/[“Gene ID”/might be separated by “; “] (“-” if they do not exist) or None if no identifier has been found

NORDic.UTILS.utils_data.get_all_celllines(pert_inames, user_key, quiet=False)

Get all cell lines in which one gene in the input list has been specifically perturbed (genetic perturbation)

…

Parameters

pert_inamesPython character string list: list of genes (symbols from LINCS L1000)
user_keyPython character string: user key from LINCS L1000 CLUE API
quietPython bool: [default=False] : prints out verbose

Returns

cell_linesPython character string list: list of cell lines in which at least one gene from pert_inames has been perturbed

NORDic.UTILS.utils_data.request_biodbnet(probe_list, from_, to_, taxon_id, chunksize=500, quiet=False)

Converts gene identifiers from type from_ to type to_ in a given species

…

Parameters

probe_listPython character string list: list of probes to convert (of type from_)
from_Python character string: an identifier type as recognized by BioDBnet
to_Python character string: an identifier type as recognized by BioDBnet
taxon_idPython integer: NCBI taxonomy ID
chunksizePython integer: [default=500] : 1 chunk per request

Returns

res_dfPandas DataFrame: rows/[“InputValue”/from_] x columns/[to_] (“-” if the identifier has not been found)

NORDic.UTILS.utils_exp module

NORDic.UTILS.utils_exp.get_experimental_constraints(file_folder, cell_lines, pert_types, pert_di, taxon_id, selection, user_key, path_to_lincs, thres_iscale=None, nsigs=2, quiet=False)

Retrieve experimental profiles from the provided cell lines, perturbation types, list of genes, in the given species (taxon ID)

…

Parameters

file_folderPython character string: folder where to store intermediary results
cell_linesPython character string list: cell lines present in LINCS L1000
pert_typesPython character string list: types of perturbations as supported by LINCS L1000
pert_diPython dictionary: (keys=Python character string, values=Python integer) associates HUGO gene symbols to their EntrezGene IDs
taxon_idPython integer: NCBI taxonomy ID
selectionPython character string: LINCS L1000 metric to maximize
user_keyPython character string: LINCS L1000 user API key
path_to_lincsPython character string: path to local LINCS L1000 files
thres_iscalePython float or None: [default=None] : lower threshold on the interference scale which quantifies the success of a genetic experiment
nsigsPython integer: [default=2] : minimal number of profiles per experiment and condition
quietPython bool: [default=False] : prints out verbose

Returns

signaturesPandas DataFrame: rows/[genes+annotations] x columns/[profile/signature IDs]

NORDic.UTILS.utils_exp.profiles2signatures(profiles_df, user_key, path_to_lincs, save_fname, backgroundfile=False, selection='distil_ss', thres=0.5, bin_method='binary', nbackground_limits=(4, 30), quiet=False)

Convert experimental profiles into signatures (1 for control samples, 1 for treated ones)

…

Parameters

profiles_dfPandas DataFrame: rows/[genes+annotations] x columns/[samples]
user_keyPython character string: LINCS L1000 user API key
path_to_lincsPython character string: path to local LINCS L1000 files
save_fnamePython character string: path to save normalized expression profiles per cell line
backgroundfilePython bool: [default=False] : if set to True, retrieves supplementary expression values from LINCS L1000 to compute more precise basal gene expression levels
selectionPython character string: [default=”distil_ss”] : LINCS L1000 metric to maximize for the “background” data
thresPython float: [default=0.5] : cutoff threshold on normalized gene expression values (in [0,0.5])
bin_methodPython character string: [default=”binary”] : binarization approach
nbackground_limitsPython integer tuple: [default=(4,30)] : lower and upper bounds on the number of profiles for the background expression data
quietPython bool: [default=False] : prints out verbose

Returns

signatures_df Pandas DataFrame: rows/[genes] x columns/[signature ID]

NORDic.UTILS.utils_grn module

NORDic.UTILS.utils_grn.CL(influences)

Computes the average of node-wise clustering coefficients. The clustering coefficient of a node is the ratio of the degree of the considered node and the maximum possible number of connections such that this node and its current neighbors form a clique

…

Parameters

influencesPandas DataFrame: rows/[genes] x columns/[genes]

Returns

CLPython float: network clustering coefficient

NORDic.UTILS.utils_grn.Centr(influences)

Computes the network centralization, which is correlated with the similarity of the network to a graph with a star topology

…

Parameters

influencesPandas DataFrame: rows/[genes] x columns/[genes]

Returns

CentrPython float: network centralization

NORDic.UTILS.utils_grn.DS(influences)

Computes the number of edges over the maximum number of possible connections between the nodes in the network

…

Parameters

influencesPandas DataFrame: rows/[genes] x columns/[genes]

Returns

DSPython float: network density
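
As an illustration of this ratio, the sketch below computes a density on a small adjacency matrix. It uses plain nested lists in place of the Pandas DataFrame and assumes directed edges without self-loops, so that the maximum number of connections is n*(n-1); the convention used by NORDic may differ.

```python
def density(adjacency):
    """Number of edges over the maximum number of possible connections.

    Assumptions: adjacency is a square matrix, any nonzero entry is an edge,
    edges are directed and self-loops are ignored, so the maximum is n*(n-1).
    """
    n = len(adjacency)
    if n < 2:
        return 0.0
    n_edges = sum(1 for i in range(n) for j in range(n)
                  if i != j and adjacency[i][j] != 0)
    return n_edges / (n * (n - 1))


influences = [
    [0, 1, 0],
    [0, 0, -1],
    [1, 0, 0],
]
print(density(influences))  # 3 edges out of 6 possible -> 0.5
```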

NORDic.UTILS.utils_grn.GT(influences)

Computes the network heterogeneity, which quantifies the non-uniformity of the node degrees across the network

…

Parameters

influencesPandas DataFrame: rows/[genes] x columns/[genes]

Returns

GTPython float: network heterogeneity

NORDic.UTILS.utils_grn.build_influences(network_df, tau, beta=1, cor_method='pearson', expr_df=None, accept_nonRNA=False, quiet=False)

Filters out edges (and assigns signs to unsigned ones) based on gene expression

…

Parameters

network_dfPandas DataFrame: rows/[index] x columns/[[“Input”, “Output”, “SSign”]] interactions
tauPython float: threshold on pairwise gene expression correlation
betaPython integer: [default=1] : power applied to the adjacency matrix
cor_methodPython character string: [default=”pearson”] : type of correlation
expr_dfPandas DataFrame: [default=None] : rows/[genes] x columns/[samples] gene expression data
accept_nonRNAPython bool: [default=False] : if set to False, ignores gene names which are not present in expr_df
quietPython bool: [default=False] : prints out verbose

Returns

influencesPandas DataFrame: rows/[genes] x columns/[genes] signed adjacency matrix with only interactions s.t. corr^beta>=tau
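
The filtering rule corr^beta>=tau can be sketched on a small edge list. The sketch makes several assumptions that are not stated above: the absolute value of the Pearson correlation is compared with tau, the value 2 marks an unsigned edge, and an unsigned edge receives the sign of the correlation. Plain dictionaries stand in for the DataFrames.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def filter_edges(edges, expr, tau, beta=1):
    """Keep edges with |corr|**beta >= tau and sign the unsigned ones.

    edges: list of (input, output, sign), sign in {-1, 1, 2}; 2 = unsigned (assumption).
    expr: dict gene -> list of expression values across samples.
    """
    kept = []
    for src, tgt, sign in edges:
        corr = pearson(expr[src], expr[tgt])
        if abs(corr) ** beta < tau:
            continue  # correlation too weak: the edge is filtered out
        if sign == 2:
            sign = 1 if corr > 0 else -1  # positive correlation -> 1, otherwise -1
        kept.append((src, tgt, sign))
    return kept


expr = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.1],
    "C": [4.0, 3.0, 2.0, 1.0],
    "D": [1.0, 3.0, 1.0, 3.0],
}
edges = [("A", "B", 2), ("A", "C", 2), ("A", "D", 1)]
kept = filter_edges(edges, expr, tau=0.8)
```

In this toy data, A and D are only weakly correlated, so the signed edge between them is dropped, while the two unsigned edges are kept and signed.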

NORDic.UTILS.utils_grn.build_observations(grn, signatures, quiet=False)

Implement experimental constraints from perturbation experiments in signatures. Experimental constraints are of the form: the initial state (“Initial”), masked by a single-gene perturbation, can lead to a steady attractor state (“Final”)

…

Parameters

grnInfluenceGraph (from BoneSiS): contains topological constraints
signaturesPandas DataFrame: rows/[genes] x columns/[experiment IDs]. Experiment IDs is of the form “<pert. gene>_<pert. type>_<…>_<cell line>” (treated) or “initial_<cell line>” (control)
quietPython bool: [default=False] : prints out verbose

Returns

BOBoNesis object (from BoneSiS): BoNesis object which can be evaluated

NORDic.UTILS.utils_grn.create_grn(influences, exact=False, max_maxclause=3, quiet=False)

Create a BoneSiS InfluenceGraph

…

Parameters

influencesPandas DataFrame: rows/[genes] x columns/[genes] of interactions, values in {-1,1,0} -1:negative,1:positive,0:absent
exactPython bool: [default=False] : should all interactions be preserved?
max_maxclausePython integer: [default=3] : upper bound on the number of clauses in DNF form
quietPython bool: [default=False] : prints out verbose

Returns

grnBoneSiS InfluenceGraph class object: BoneSiS GRN object

NORDic.UTILS.utils_grn.desirability(x, f_weight_di, A=0, B=1)

Harrington’s desirability function, as used in [1]. Converts a list of functions to maximize into a single scalar function to maximize, with values in [@A,@B]

[1] http://ceur-ws.org/Vol-2488/paper17.pdf (see also https://cran.r-project.org/web/packages/desirability/vignettes/desirability.pdf)

…

Parameters

xdatapoint: any input to functions in f_weight_di
f_weight_diPython dictionary: maps each function (taking arguments of the same type as x) to its associated weight
APython float: [default=0] : lower bound of the function interval
BPython float: [default=1] : upper bound of the function interval

Returns

des(x)Python float: value of the desirability function at point x
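
One common way to combine several criteria into a single desirability value is a weighted geometric mean. The sketch below follows that idea and is only an assumption about the general shape of such a function: it expects each function to return a value in [0,1] and rescales the result to [A,B]. The exact transformation used by NORDic is not stated here.

```python
import math


def sketch_desirability(x, f_weight_di, A=0.0, B=1.0):
    """Weighted geometric mean of the f(x) values, rescaled to [A, B].

    Assumption: every function in f_weight_di returns a value in [0, 1].
    """
    total_weight = sum(f_weight_di.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    log_sum = 0.0
    for f, weight in f_weight_di.items():
        value = f(x)
        if value <= 0:
            return A  # one fully undesirable criterion zeroes the geometric mean
        log_sum += weight * math.log(min(value, 1.0))
    score = math.exp(log_sum / total_weight)
    return A + (B - A) * score


# Placeholder criteria evaluated on a placeholder datapoint
profile = {"coverage": 0.25, "quality": 1.0}
f_weight_di = {
    (lambda d: d["coverage"]): 1.0,
    (lambda d: d["quality"]): 1.0,
}
score = sketch_desirability(profile, f_weight_di)
```

A geometric mean penalizes a datapoint that scores poorly on any single criterion, which an arithmetic mean would not.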

NORDic.UTILS.utils_grn.general_topological_parameter(influences, weights)

Computes the general topological parameter (GTP) associated with the input network

…

Parameters

influencesPandas DataFrame: rows/[genes] x columns/[genes]
weightsPython dictionary of (Python character string x Python float): all keys must be in [“DS”,”CL”,”Centr”,”GT”]

Returns

scorePython float: score using the Harrington’s desirability function

NORDic.UTILS.utils_grn.get_genes_downstream(network_fname, gene, n=-1)

Get the list of genes downstream of a gene in a network

…

Parameters

network_fnamePython character string: path to the .BNET file associated with the network
genePython character string: gene name in the network
nPython integer: [default=-1] : number of recursions (if<0, recursively get all downstream genes)

Returns

lst_downstreamPython character string list: list of nodes downstream of @gene
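
The role of n can be illustrated with a breadth-first expansion on a small network. The sketch assumes the network is already available as a dictionary mapping each gene to the genes it regulates (parsing the .BNET file is not shown) and that the starting gene is excluded from the result; both are assumptions.

```python
def genes_downstream(targets, gene, n=-1):
    """List the genes reachable from gene in at most n steps (all if n < 0).

    targets: dict gene -> list of genes it regulates (hypothetical structure).
    """
    visited = set()
    frontier = {gene}
    rounds = 0
    while frontier and (n < 0 or rounds < n):
        next_frontier = set()
        for g in frontier:
            for t in targets.get(g, []):
                if t not in visited and t != gene:
                    visited.add(t)
                    next_frontier.add(t)
        frontier = next_frontier
        rounds += 1
    return sorted(visited)


targets = {"A": ["B"], "B": ["C"], "C": ["A", "D"]}
one_step = genes_downstream(targets, "A", n=1)
two_steps = genes_downstream(targets, "A", n=2)
all_downstream = genes_downstream(targets, "A")
```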

NORDic.UTILS.utils_grn.get_genes_interactions_from_PPI(ppi, connected=False, score=0, filtering=True, quiet=False)

Filtering edges to decrease computational cost while preserving network connectivity (if needed)

…

Parameters

ppiPandas DataFrame: rows/[index] x columns/[[“preferredName_A”, “preferredName_B”, “sign”, “directed”, “score”]]; sign in {-1,1,2}, directed in {0,1}, score in [0,1]
connectedPython bool: [default=False] : if set to True, preserve/enforce connectivity on the final network
scorePython float: [default=0] : Lower bound on the edge-associated score
filteringPython bool: [default=True] : Whether to filter out edges by a correlation threshold
quietPython bool: [default=False] : prints out verbose

Returns

ppi_acceptedPandas DataFrame: rows/[index] x columns/[[“Input”, “Output”]]

NORDic.UTILS.utils_grn.get_genes_most_variable(control_profiles, treated_profiles, p=0.8)

Get the list of genes which contribute most to the variation between two conditions (in the @pth percentile of change)

…

Parameters

control_profilesPandas DataFrame: rows/[genes] x columns/[samples] profiles from condition 1
treated_profilesPandas DataFrame: rows/[genes] x columns/[samples] profiles from condition 2
pPython float: [default=0.8] : 100*p th percentile to consider

Returns

lst_genesPython character string list: list of nodes which contribute most to the variation between conditions
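
A minimal sketch of a percentile-based selection is given below. It assumes that the change of a gene is the absolute difference between its mean expression in the two conditions and that the genes ranked at or above the 100*p th percentile are kept; the measure of change used by NORDic is not specified here. Dictionaries stand in for the DataFrames.

```python
def most_variable_genes(control, treated, p=0.8):
    """Genes whose change between conditions ranks at or above the 100*p th percentile.

    control, treated: dict gene -> list of expression values (stand-ins for DataFrames).
    """
    change = {}
    for gene in control:
        if gene in treated:
            mean_c = sum(control[gene]) / len(control[gene])
            mean_t = sum(treated[gene]) / len(treated[gene])
            change[gene] = abs(mean_t - mean_c)
    ranked = sorted(change, key=change.get)  # ascending change
    cutoff = int(p * len(ranked))            # rank of the first gene to keep
    return ranked[cutoff:]


control = {"G1": [1.0, 1.0], "G2": [2.0, 2.0], "G3": [3.0, 3.0],
           "G4": [4.0, 4.0], "G5": [5.0, 5.0]}
treated = {"G1": [1.0, 1.2], "G2": [2.0, 2.4], "G3": [3.5, 3.5],
           "G4": [4.0, 4.0], "G5": [9.0, 9.0]}
top = most_variable_genes(control, treated, p=0.8)
```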

NORDic.UTILS.utils_grn.get_grfs_from_solution(solution)

Retrieve all gene regulatory functions (GRFs) from a given solution

…

Parameters

solutionPandas Series: rows/[genes]

Returns

grfsPython dictionary: {gene: {regulator: sign, …}, …} where sign in {-1,1} -1: inhibitor, 1: activator
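
The output structure can be illustrated on a toy solution. The sketch assumes that each GRF is a Boolean expression using the operators &, | and !, with negations applied only to gene names, and it marks a regulator that appears with both signs as 2 (non monotonic); these conventions are assumptions. A dictionary stands in for the Pandas Series.

```python
import re


def regulators_from_grf(expression):
    """Map each regulator in a Boolean expression to 1 (activator) or -1 (inhibitor)."""
    signs = {}
    for negation, name in re.findall(r"(!?)\s*(\w+)", expression):
        if name in ("0", "1"):
            continue  # Boolean constants, not regulators
        sign = -1 if negation else 1
        if name in signs and signs[name] != sign:
            signs[name] = 2  # appears with both signs (assumed convention)
        else:
            signs[name] = sign
    return signs


solution = {"A": "B&!C", "B": "A", "C": "1"}  # toy GRFs, one per gene
grfs = {gene: regulators_from_grf(expr) for gene, expr in solution.items()}
```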

NORDic.UTILS.utils_grn.get_maxdegree(influences, activatory=True, quiet=False)

Computes the maximum ingoing degree (or the maximum number of potential activatory regulators) in a graph

…

Parameters

influencesPandas DataFrame: rows/[genes] x columns/[genes] of interactions: -1:negative,1:positive,0:absent
activatoryPython bool: [default=True] : computes the maximum number of potential activatory regulators instead
quietPython bool: [default=False] : prints out verbose

Returns

maxindegreePython integer: maximum ingoing degree (or the maximum number of potential activatory regulators)

NORDic.UTILS.utils_grn.get_minimal_edges(R, maximal=False)

Return one of the solutions with the smallest (or greatest) number of edges

…

Parameters

RPandas DataFrame: rows/[genes] x columns/[solution IDs]
connectedPython bool: [default=False] : if set to True, return the CONNECTED solution which satisfies those constraints
maximalPython bool: [default=False] : if set to True, return the solution with the greatest number of edges

Returns

solution, nedgesPython integer x Python integer: solution and corresponding number of edges

NORDic.UTILS.utils_grn.get_weakly_connected(network_df, gene_list, index_col='preferredName_A', column_col='preferredName_B', score_col='sscore')

Depth-first search (DFS) on undirected network

…

Parameters

network_dfPandas DataFrame: rows/[index] x columns/[[“Input”,”Output”]]
gene_listPython character string list: list of genes (needed to take into account isolated genes in the network)
index_colPython character string: [default=”preferredName_A”] : column in network_df (input gene)
column_colPython character string: [default=”preferredName_B”] : column in network_df (output gene)
score_colPython character string: [default=”sscore”] : column in network_df (edge weight)

Returns

componentsType of @network_df.loc[network_df.index[0]][“Input”] Python list of Python list: list of weakly connected components in the network, ordered by decreasing size
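
The depth-first search can be sketched on a plain edge list. The sketch treats every edge as undirected, adds the genes of gene_list as isolated nodes when they have no edge, and orders the components by decreasing size; the edge list stands in for the DataFrame and scores are ignored.

```python
def weakly_connected_components(edges, gene_list):
    """Connected components of the undirected graph, largest first."""
    neighbors = {g: set() for g in gene_list}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    seen, components = set(), []
    for start in neighbors:
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:  # iterative depth-first search
            node = stack.pop()
            component.append(node)
            for nb in neighbors[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(sorted(component))
    return sorted(components, key=len, reverse=True)


edges = [("A", "B"), ("B", "C"), ("D", "E")]
components = weakly_connected_components(edges, ["A", "B", "C", "D", "E", "F"])
```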

NORDic.UTILS.utils_grn.infer_network(BO, njobs=1, fname='solutions', use_diverse=True, limit=50, niterations=1)

Infer solutions matching topological & experimental constraints

…

Parameters

BOBonesis object (from BoneSiS): contains topological & experimental constraints
fnamePython character string: [default=”solutions”] : path to solution files
use_diversePython bool: [default=True] : use the “diverse” procedure in BoneSiS
limitPython integer: [default=50] : maximum number of solutions to generate per iteration
niterationsPython integer: [default=1] : maximum number of iterations

Returns

nsolutionsPython integer list: number of solutions per iteration

NORDic.UTILS.utils_grn.load_grn(fname)

Loads GRN as MPBN class element

…

Parameters

fnamePython character string: BNET file

Returns

BNmpbn.MPBooleanNetwork object: Boolean network with Most Permissive semantics

NORDic.UTILS.utils_grn.reconnect_network(network_fname)

Write the network with all isolated nodes (no ingoing/outgoing edges) filtered out

…

Parameters

network_fnamePython character string: path to the .BNET associated with the network

Returns

fnamePython character string: path to the .BNET associated with the reconnected network

NORDic.UTILS.utils_grn.save_grn(solution, fname, sep=', ', quiet=False, max_show=5, write=True)

Write and/or print .bnet file

…

Parameters

solutionPandas Series: rows/[genes] contains gene regulatory functions (GRF)
fnamePython character string: where to write the file (w/o .bnet extension)
sepPython character string: what separates regulators from regulated genes
quietPython bool: [default=False] : prints out verbose
max_showPython integer: [default=5] : maximum number of printed GRFs
writePython bool: [default=True] : if set to True, write to a .bnet file

Returns

None: writes the GRN to a file fname
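
The sketch below shows what a serialisation of this kind could look like, assuming one line per gene of the form "gene, rule" with sep between the two. The helper `format_bnet` and the dictionary input are hypothetical; the real function takes a Pandas Series and also handles printing and file writing.

```python
def format_bnet(grfs, sep=", "):
    """Render a {gene: rule} mapping as BNET-style text (one line per gene).

    Assumption: each line is '<gene><sep><rule>'; this mirrors the `sep`
    parameter documented above but is not taken from the NORDic source.
    """
    lines = [f"{gene}{sep}{rule}" for gene, rule in grfs.items()]
    return "\n".join(lines) + "\n"


toy_grfs = {"A": "B & !C", "B": "A", "C": "0"}
bnet_text = format_bnet(toy_grfs)
```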

NORDic.UTILS.utils_grn.save_solutions(bnetworks, fname, limit)

Enumerate and save solutions

…

Parameters

bnetworksBonesis object: Output of the inference
fnamePython character string: ZIP filename to store the solutions
limitPython integer: maximum number of solutions to enumerate

Returns

nPython integer: number of enumerated solutions

NORDic.UTILS.utils_grn.solution2influences(solution)

Converts a solution object into an influences object

…

Parameters

solutionPandas Series: rows/[genes]

Returns

influencesPandas DataFrame: rows/[genes] x columns/[genes] contains values {-1,1,0,2} -1: negative, 1: positive, 0: absent, 2: non monotonic

NORDic.UTILS.utils_grn.zip2df(fname)

Extract solutions in ZIP file as DataFrames

…

Parameters

fnamePython character string: zip file which contains BNET solutions

Returns

solutionsPandas DataFrame: rows/[genes] x columns/[solutions] the GRFs for each gene in each solution
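
To make the extraction step concrete, the following sketch reads every file of a ZIP archive and parses it as "gene, rule" lines. It is only an approximation under stated assumptions: the helper `read_bnet_zip` is hypothetical, it returns nested dictionaries rather than a DataFrame, and the archive layout (one BNET file per solution) is assumed.

```python
import io
import zipfile


def read_bnet_zip(zip_source, sep=", "):
    """Return {file name: {gene: rule}} for every file in the archive."""
    solutions = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in sorted(archive.namelist()):
            text = archive.read(name).decode("utf-8")
            rules = {}
            for line in text.splitlines():
                if not line.strip():
                    continue  # skip blank lines
                gene, rule = line.split(sep, 1)
                rules[gene] = rule
            solutions[name] = rules
    return solutions


# Build a small in-memory archive so the example does not touch the disk
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("bn0.bnet", "A, B\nB, !A\n")
    archive.writestr("bn1.bnet", "A, B\nB, A\n")
buffer.seek(0)
solutions = read_bnet_zip(buffer)
```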

NORDic.UTILS.utils_network module

NORDic.UTILS.utils_network.aggregate_networks(file_folder, gene_list, taxon_id, min_score, network_type, app_name, version_net='11.5', version_act='11.0', quiet=0)

This function performs the following pipeline to build a prior knowledge network based on a subset of genes:
- Retrieve protein actions and predicted PPIs from STRING
- Merge the two networks while solving all inconsistencies (duplicates, paradoxes, etc.) in signs, directions, scores
- Determine the greatest threshold on the edge score which allows all of the core gene set to be connected (binary search)
- Trim out edges whose scores are below the threshold, and remove all isolated nodes

…

Parameters

file_folderPython character string: relative path where to store files
gene_listPython character string list: list of core gene symbols to preserve in the network
taxon_idPython integer: NCBI taxonomy ID
min_scorePython integer: minimum score on edges retrieved from the STRING database
app_namePython character string: Identifier for STRING requests
version_netPython character string: [default=”11.5”] : Number of version for interaction data in the STRING database. To avoid compatibility issues, it is strongly advised not to change this parameter
version_actPython character string: [default=”11.0”] : Number of version for protein action data in the STRING database. To avoid compatibility issues, it is strongly advised not to change this parameter
quietPython bool: [default=0] : prints out verbose

Returns

final_networkPandas DataFrame: rows/[interactions] x columns/[[“preferredName_A”, “preferredName_B”, “sign”, “directed”, “score”]]

NORDic.UTILS.utils_network.capture()

NORDic.UTILS.utils_network.determine_edge_threshold(network, core_gene_set, quiet=True)

Determine the greatest threshold on the edge score which allows all of the core gene set to be connected (binary search)

…

Parameters

networkPandas DataFrame: rows/[interactions] x at least three columns “preferredName_A” (input node), “preferredName_B” (output node), “score” (edge score)
core_gene_setPython character string list: list of genes that should remain connected
quietPython bool: [default=True] : prints out verbose

Returns

tPython float: maximum threshold which allows the connection of all genes in the core set
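
The binary search mentioned above can be sketched as follows. Raising the threshold can only remove edges, so connectivity of the core set is monotone in the threshold, which is what makes a binary search over the sorted distinct scores valid. This is not the NORDic code: edges are plain (input, output, score) triples, edges with score >= threshold are assumed to be kept, and both function names are hypothetical.

```python
def core_is_connected(edges, core, threshold):
    """True if all core genes share one component using edges with score >= threshold."""
    adjacency = {}
    for a, b, score in edges:
        if score >= threshold:
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)
    core = set(core)
    if not core:
        return True
    start = next(iter(core))
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return core <= seen


def greatest_threshold(edges, core):
    """Largest edge score t such that the core set stays connected, or None."""
    scores = sorted({score for _, _, score in edges})
    if not scores or not core_is_connected(edges, core, scores[0]):
        return None
    lo, hi = 0, len(scores) - 1  # invariant: scores[lo] keeps the core connected
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if core_is_connected(edges, core, scores[mid]):
            lo = mid
        else:
            hi = mid - 1
    return scores[lo]


toy_edges = [("A", "B", 0.9), ("B", "C", 0.6), ("A", "C", 0.4), ("C", "D", 0.2)]
t_abc = greatest_threshold(toy_edges, ["A", "B", "C"])
t_ad = greatest_threshold(toy_edges, ["A", "D"])
```

In the toy network, genes A, B and C stay connected up to a threshold of 0.6, whereas keeping D connected forces the threshold down to 0.2.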

NORDic.UTILS.utils_network.get_network_from_OmniPath(gene_list=None, disease_name=None, species='human', sources_int='omnipath', domains_int=None, types_int=None, min_curation_effort=-1, domains_annot='HPA_tissue', quiet=False)

Retrieve a network from OmniPath

…

Parameters

gene_listPython character string list: [default=None] : list of genes to consider (interactions from OmniPath are not filtered if =None)
disease_namePython character string: [default=None] : Disease name (in letters) to consider
speciesPython character string: [default=”human”] : Species to consider (either “human”, “mouse”, or “rat”)
sources_intPython character string: [default=”omnipath”] : Which databases for interactions to consider (if =None, consider them all)
domains_intPython character string: [default=None] : source of interactions in OmniPath
types_intPython character string: [default=None] : Types of interactions, e.g., “post_translational”, “transcriptional”, “post_transcriptional”, “mirna_transcriptional”
min_curation_effortPython integer: [default=-1] : if positive, select edges based on that criterion (the higher, the better). Counts the unique database-citation pairs, i.e. how many times an interaction was described in a paper and mentioned in a database
domains_annotPython character string: [default=’HPA_tissue’] : source of annotations in OmniPath
quietPython bool: [default=False] : prints out verbose

Returns

final_networkPandas DataFrame: rows/[interactions] x columns/[[“preferredName_A”, “preferredName_B”, “sign”, “directed”, “score”]]
annot_widePandas DataFrame: rows/[gene symbols] x columns/[annotations from the database @domains_annot]

NORDic.UTILS.utils_network.merge_network_PPI(network, PPI, quiet=True)

Merge two networks while solving all inconsistencies (duplicates, paradoxes, etc.) in signs, directions, scores

…

Parameters

networkPandas DataFrame: rows/[interactions] x at least three columns “preferredName_A” (input node), “preferredName_B” (output node), “score” (edge score)
PPIPandas DataFrame: rows/[interactions] x at least three columns “preferredName_A” (input node), “preferredName_B” (output node), “score” (edge score)
quietPython bool: [default=True] : prints out verbose

Returns

final_networkPandas DataFrame: rows/[interactions] x columns/[[“preferredName_A”, “preferredName_B”, “sign”, “directed”, “score”]]

NORDic.UTILS.utils_network.remove_isolated(network, quiet=False)

Remove all nodes which do not belong to the largest connected component from the network

…

Parameters

networkPandas DataFrame: rows/[interactions] x columns/[[“preferredName_A”, “preferredName_B”, “sign”, “directed”, “score”]]
quietPython bool: [default=False] : prints out verbose

Returns

trimmed_networkPandas DataFrame: rows/[interactions] x columns/[[“preferredName_A”, “preferredName_B”, “sign”, “directed”, “score”]]

NORDic.UTILS.utils_plot module

NORDic.UTILS.utils_plot.influences2graph(influences, fname, optional=False, compile2png=True, engine='sfdp')

Plots a network by conversion to a DOT file and then to PNG

…

Parameters

influencesPandas DataFrame: rows/[genes] x columns/[genes], contains {-1,1,2}
fnamePython character string: filename of png file
optionalPython bool: [default=False] : should interactions be drawn as optional (dashed lines)?

Returns

None: writes a DOT file which can be converted to PNG image (if compile2png=True)

NORDic.UTILS.utils_plot.plot_boxplots(scores, patient_scores, ground_truth=None, fsize=12, msize=5, fname='boxplots.pdf')

Plots one boxplot per treatment (all values obtained on patient profiles)

…

Parameters

scoresPandas DataFrame: rows/[drug names] x column/[value]
patient_scoresPandas DataFrame: rows/[drug names] x columns/[patient samples]
ground_truthPandas DataFrame: [default=None] : rows/[drug names] x column/[class] Values in 1: treatment, 0: unknown, -1: aggravating. If not provided: does not color boxplots according to the class
fsizePython integer: [default=12] : font size
msizePython integer: [default=5] : marker size
fnamePython character string: [default=”boxplots.pdf”] : file name for the plot

Returns

None: create boxplots of reward scores across patients for each drug

NORDic.UTILS.utils_plot.plot_discrete_distributions(signatures, fname='signature_expression_distribution.png')

Plots the distributions (histograms) of genes with determined status across signatures

…

Parameters

signaturesPandas DataFrame: rows/[genes] x columns/[samples] with values in {0,NaN,1}. Determined status is either 0 or 1.
fnamePython character string: [default=”signature_expression_distribution.png”] : file name

Returns

None: plots the number of genes with expression values 0, 1 or NaN in each signature

NORDic.UTILS.utils_plot.plot_distributions(profiles, fname='gene_expression_distribution.png', thres=None)

Plots the distributions (boxplots) of gene expression across samples for each gene, and the selected threshold for binarization

…

Parameters

profilesPandas DataFrame: rows/[genes+annotations] x columns/[samples]
fnamePython character string: [default=”gene_expression_distribution.png”] : file name
thresPython float or None: [default=None] : binarization threshold (if there is any)

Returns

None: plots boxplots of expression for each gene in profiles

NORDic.UTILS.utils_plot.plot_heatmap(X, ground_truth=None, fname='heatmap.pdf', w=20, h=20, bfsize=20, fsize=20, rot=75)

Plots a heatmap of the signatures, with the potential ground truth

…

Parameters

XPandas DataFrame: rows/[features] x columns/[drug names]
ground_truthPandas DataFrame: [default=None] : rows/[drug names] x column/[class] Values in 1: treatment, 0: unknown, -1: aggravating. If not provided: does not color boxplots according to the class
fnamePython character string: [default=”heatmap.pdf”] : file name for the plot
wPython integer: [default=20] : figure width
hPython integer: [default=20] : figure height
bfsizePython integer: [default=20] : font size in the color bar
rotPython integer: [default=75] : rotation angle of labels

Returns

None: plots a heatmap of similarity across drugs based on the Pearson correlation

NORDic.UTILS.utils_plot.plot_influence_graph(network_df, input_col, output_col, sign_col, direction_col=None, fname='graph.png', optional=True)

Converts a network into a PNG picture

…

Parameters

network_dfPandas DataFrame: rows/[index] x columns/[input_col,output_col,sign_col]
input_col,output_col,sign_col,direction_colPython character string: columns of network_df
fnamePython character string: [default=”graph.png”] : file name for PNG picture
optionalPython bool: [default=True] : should edges be plotted as dashed lines?

Returns

None: Creates an image of the graph in file fname

NORDic.UTILS.utils_plot.plot_precision_recall(pr, prs, tr, beta=1, thres=0.5, fname='PRC.pdf', method_name='predictor', fsize=18)

Plots a Precision-Recall curve (with variations across samples)

…

Parameters

prPandas DataFrame: rows/[drug names] x column/[value]
prsPandas DataFrame: rows/[drug names] x columns/[patient samples]
trPandas DataFrame: rows/[drug names] x column/[class]
betaPython float: [default=1] : value of coefficient beta for the F-measure
thresPython float: [default=0.5] : decision threshold
fnamePython character string: [default=”PRC.pdf”] : file name for the plot
method_namePython character string: [default=”predictor”] : name of the predictor
fsizePython integer: [default=18] : font size

Returns

None: Plots a Precision-Recall curve based on the drug repurposing predictions

NORDic.UTILS.utils_plot.plot_roc_curve(pr, prs, tr, fname='ROC.pdf', method_name='predictor', fsize=18)

Plots a ROC curve (with variations across samples)

…

Parameters

prPandas DataFrame: rows/[drug names] x column/[value]
prsPandas DataFrame: rows/[drug names] x columns/[patient samples]
trPandas DataFrame: rows/[drug names] x column/[class]
fnamePython character string: [default=”ROC.pdf”] : file name for the plot
method_namePython character string: [default=”predictor”] : name of the predictor
fsizePython integer: [default=18] : font size

Returns

None: Plots a ROC curve based on the drug repurposing predictions

NORDic.UTILS.utils_plot.plot_signatures(signatures, perturbed_genes=None, width=10, height=10, max_show=50, fname='signatures')

Plots signatures as heatmaps

…

Parameters

signaturesPandas DataFrame: rows/[genes] x columns/[signature IDs]
perturbed_genesPython character string list: [default=None] : list of gene names perturbed in the signatures
width, heightPython integer: [default=10] : dimensions of image
max_showPython integer: [default=50] : maximum number of genes shown (as only the @max_show genes with highest variance across signatures are plotted)
fnamePython character string: [default=”signatures”] : path of resulting PNG image

Returns

None: plots the signatures as heatmaps in file fname

NORDic.UTILS.utils_sim module

class NORDic.UTILS.utils_sim.BN_SIM(seednb=0, njobs=None)

Bases: object

add_initial_states(initial, final=None)

add_permanent_mutation(mutation)

add_transient_mutation(mutation)

attrs_similarity(attrs1, attrs2, gene_outputs=None)

boxplot()

enumerate_attractors(verbose=False)

generate_trajectories(params={}, outputs=[])

initialize_network(network_fname)

up_to_attractors(network_fname, initial, final, mutation_permanent={}, mutation_transient={}, verbose=True)

update_network(network_fname, initial, final=None, mutation_permanent={}, mutation_transient={}, verbose=True)

class NORDic.UTILS.utils_sim.BONESIS_SIM(seednb=0, njobs=None)

Bases: BN_SIM

add_initial_states(initial, final)

add_permanent_mutation(mutation)

add_transient_mutation(mutation)

enumerate_attractors(verbose=True)

generate_trajectories(params={}, outputs=[])

initialize_network(network_fname)

class NORDic.UTILS.utils_sim.MABOSS_SIM(seednb=0, njobs=None)

Bases: BN_SIM

add_initial_states(initial, final=None)

add_permanent_mutation(mutation)

add_transient_mutation(mutation)

enumerate_attractors(verbose=True)

generate_trajectories(params={}, outputs=[])

initialize_network(network_fname)

class NORDic.UTILS.utils_sim.MPBN_SIM(seednb=0, njobs=None)

Bases: BN_SIM

add_initial_states(initial, final=None)

add_permanent_mutation(mutation)

add_transient_mutation(mutation)

enumerate_attractors(max_attrs=-1, verbose=True)

generate_trajectories(params={}, outputs=[], show_plot=True)

initialize_network(network_fname)

NORDic.UTILS.utils_sim.capture()

NORDic.UTILS.utils_sim.choice(a, size=None, replace=True, p=None)

Generates a random sample from a given 1-D array

New in version 1.7.0.

Note

New code should use the ~numpy.random.Generator.choice method of a ~numpy.random.Generator instance instead; please see the random-quick-start.

Parameters

a1-D array-like or int: If an ndarray, a random sample is generated from its elements. If an int, the random sample is generated as if it were np.arange(a)
sizeint or tuple of ints, optional: Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
replaceboolean, optional: Whether the sample is with or without replacement. Default is True, meaning that a value of a can be selected multiple times.
p1-D array-like, optional: The probabilities associated with each entry in a. If not given, the sample assumes a uniform distribution over all entries in a.

Returns

samplessingle item or ndarray: The generated random samples

Raises

ValueError: If a is an int and less than zero, if a or p are not 1-dimensional, if a is an array-like of size 0, if p is not a vector of probabilities, if a and p have different lengths, or if replace=False and the sample size is greater than the population size

Notes

Setting user-specified probabilities through p uses a more general but less efficient sampler than the default. The general sampler produces a different sample than the optimized sampler even if each element of p is 1 / len(a).

Sampling random rows from a 2-D array is not possible with this function, but is possible with Generator.choice through its axis keyword.

Examples

Generate a uniform random sample from np.arange(5) of size 3:

>>> np.random.choice(5, 3)
array([0, 3, 4]) # random
>>> #This is equivalent to np.random.randint(0,5,3)

Generate a non-uniform random sample from np.arange(5) of size 3:

>>> np.random.choice(5, 3, p=[0.1, 0, 0.3, 0.6, 0])
array([3, 3, 0]) # random

Generate a uniform random sample from np.arange(5) of size 3 without replacement:

>>> np.random.choice(5, 3, replace=False)
array([3,1,0]) # random
>>> #This is equivalent to np.random.permutation(np.arange(5))[:3]

Generate a non-uniform random sample from np.arange(5) of size 3 without replacement:

>>> np.random.choice(5, 3, replace=False, p=[0.1, 0, 0.3, 0.6, 0])
array([2, 3, 0]) # random

Any of the above can be repeated with an arbitrary array-like instead of just integers. For instance:

>>> aa_milne_arr = ['pooh', 'rabbit', 'piglet', 'Christopher']
>>> np.random.choice(aa_milne_arr, 5, p=[0.5, 0.1, 0.1, 0.3])
array(['pooh', 'pooh', 'pooh', 'Christopher', 'piglet'], # random
      dtype='<U11')

NORDic.UTILS.utils_sim.test(enumerator, seednb, njobs, network_fname, control_profile, treated_profiles, compare_to, mutation_permanent={}, mutation_transient={}, gene_outputs=None, print_boxplot=False, verbose=True)

NORDic.UTILS.utils_state module

NORDic.UTILS.utils_state.binarize_experiments(data, thres=0.5, method='binary', strict=True, njobs=1)

Binarize experimental profiles

…

Parameters

dataPandas DataFrame: rows/[genes] x columns/[samples]
thresPython float: [default=0.5] : threshold for @method=”binary” (in [0,0.5])
methodPython character string: [default=”binary”] : binarization method in [“binary”,”probin”]
strictPython bool: [default=True] : takes into account equalities (if set to True, value=thres will lead to undefined for the corresponding gene)
njobsPython integer: [default=1] : parallelism if needed

Returns

signaturesPandas DataFrame: rows/[genes] x columns/[samples] with values in [0,1,NaN]
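
The following sketch illustrates one plausible reading of the "binary" method; it is an assumption, not a description of the NORDic code. It supposes that values are already scaled to [0,1], that values below thres map to 0, values above 1-thres map to 1, and everything in between is undefined (NaN). With strict=False, values equal to a bound are assigned instead of left undefined. The function name `binarize_values` is hypothetical.

```python
import math


def binarize_values(values, thres=0.5, strict=True):
    """Map values in [0, 1] to 0, 1 or NaN (assumed symmetric-threshold rule)."""
    low, high = thres, 1.0 - thres
    result = []
    for value in values:
        if value < low or (not strict and value == low):
            result.append(0)
        elif value > high or (not strict and value == high):
            result.append(1)
        else:
            result.append(math.nan)  # undetermined status
    return result


toy_values = [0.0, 0.25, 0.5, 0.75, 1.0]
strict_signature = binarize_values(toy_values, thres=0.25, strict=True)
loose_signature = binarize_values(toy_values, thres=0.25, strict=False)
```

Under this reading, lowering thres widens the undetermined band, which would explain why the documented range for thres is [0,0.5].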

NORDic.UTILS.utils_state.compare_states(x, y, genes=None)

Computes the similarity between two sets of Boolean states

…

Parameters

xPandas DataFrame: rows/[genes] x columns/[state IDs] contains (0, 1, NaN)
yPandas DataFrame: rows/[genes] x columns/[state IDs] contains (0, 1, NaN)
genesPython character string list: list of gene symbols

Returns

simsNumPy array: similarities between each column of x and each column of y, on the list of N present genes in genes (if provided) otherwise on the union of N genes in x and y
NPython integer: number of genes on which the similarity is computed

NORDic.UTILS.utils_state.finetune_binthres(df, samples, network_fname, mutation, step=0.005, maxt=0.5, mint=0, score_binthres=<function <lambda>>, njobs=1, verbose=True)

Selects the binarization threshold (used in function @binarize_experiments) which maximizes the dissimilarity between conditions and the similarity within each condition …

Parameters

dfPandas DataFrame: rows/[genes] x columns/[samples]: profiles
samplesPython character string list: annotations of conditions for each sample in df
network_fnamePython character string: file name containing the network
mutationPython dictionary: dictionary (key=gene, value=perturbation type) gene perturbations which are considered
stepPython float: [default=0.005] step in the interval to look for the threshold value
maxtPython float: [default=0.5] maximum threshold value
mintPython float: [default=0.] minimum threshold value
score_binthresPython lambda function: [default=lambda itc,ita_c,ita_t:(1-itc)*ita_c*ita_t] fitness function for the threshold value
njobsPython integer: [default=1] number of parallel jobs
verbosePython bool: [default=True] prints out verbose

Returns

max_thresPython float: threshold value maximizing the fitness function
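
The search over thresholds can be sketched as a simple grid scan between mint and maxt, scored with the default fitness function shown above. This is a hedged illustration only: the callback `similarities`, which returns the triple (itc, ita_c, ita_t) for a given threshold, is hypothetical and stands in for the binarization and network simulation performed by the real function. The names presumably denote the inter-condition similarity and the two intra-condition similarities, but the source does not define them.

```python
def pick_threshold(similarities, step=0.005, maxt=0.5, mint=0.0,
                   score=lambda itc, ita_c, ita_t: (1 - itc) * ita_c * ita_t):
    """Return the grid threshold in [mint, maxt] that maximizes the fitness."""
    n_steps = int(round((maxt - mint) / step))
    best_thres, best_score = None, float("-inf")
    for i in range(n_steps + 1):
        thres = mint + i * step  # index-based to limit float drift
        value = score(*similarities(thres))
        if value > best_score:  # the first maximum wins in case of ties
            best_thres, best_score = thres, value
    return best_thres


def toy_similarities(thres):
    # Hypothetical stand-in: inter-condition similarity is lowest at 0.25
    return abs(thres - 0.25), 1.0, 1.0


best = pick_threshold(toy_similarities, step=0.125)
```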

NORDic.UTILS.utils_state.quantile_normalize(df, njobs=1)