NORDic UTILS module

NORDic.UTILS.DISGENET_utils module

NORDic.UTILS.DISGENET_utils.get_genes_evidences_from_DISGENET(gene_list, disease, limit=3000, source='CURATED', min_score=0, chunksize=100, user_key=None, quiet=False)

Retrieves the references for the association between each gene in the list and the disease

Parameters

gene_listPython character string list

list of associated genes

diseasePython character string

Concept ID (CID) from MedGen

limitPython integer

[default=3000] : limit of the number of references

sourcePython character string

[default=“CURATED”] : DisGeNET data sources [“CURATED”, “ANIMAL MODELS”, “INFERRED”, “ALL”] (see https://www.disgenet.org/dbinfo)

min_scorePython float

[default=0] : minimum evidence score

chunksizePython integer

[default=100] : size of chunks (1 chunk per request)

user_keyPython character string or None

[default=None] : API key from DisGeNET

quietPython bool

[default=False] : prints out verbose

Returns

res_dfPandas DataFrame

rows/[row number] x columns/[“gene_symbol”, “sentence”, “associationtype”, “pmid”, “year”, “score”]
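The chunksize parameter indicates that the gene list is split into batches, with one request per batch. The batching itself can be sketched as follows; `iter_chunks` is a hypothetical helper written for illustration and is not part of NORDic.

```python
def iter_chunks(items, chunksize=100):
    """Yield consecutive slices of `items` holding at most `chunksize` elements."""
    if chunksize < 1:
        raise ValueError("chunksize must be a positive integer")
    for start in range(0, len(items), chunksize):
        yield items[start:start + chunksize]


# Placeholder gene names, only used to show the batch sizes
genes = [f"GENE{i}" for i in range(250)]
chunks = list(iter_chunks(genes, chunksize=100))
chunk_sizes = [len(chunk) for chunk in chunks]
print(chunk_sizes)
```

With 250 genes and chunksize=100, this yields three batches, hence three requests under the "1 chunk per request" rule.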

NORDic.UTILS.DISGENET_utils.get_genes_proteins_from_DISGENET(disease_list, limit=3000, source='CURATED', min_score=0, min_ei=0, min_dsi=0.25, min_dpi=0, chunksize=100, user_key=None, quiet=False)

Retrieves a list of protein names (and associated gene names) related to the input disease CIDs

Parameters

disease_listPython character string list

list of Concept IDs (CID) from MedGen for each disease

limitPython integer

[default=3000] : max. number of proteins

sourcePython character string

[default=“CURATED”] : DisGeNET data sources [“CURATED”, “ANIMAL MODELS”, “INFERRED”, “ALL”] (see https://www.disgenet.org/dbinfo)

min_scorePython float

[default=0] : minimum global score

min_eiPython float

[default=0] : minimum Evidence Index

min_dsiPython float

[default=0.25] : minimum Disease Specificity Index

min_dpiPython float

[default=0] : minimum Disease Pleiotropy Index

chunksizePython integer

[default=100] : size of chunks (1 chunk per request)

user_keyPython character string or None

[default=None] : API key from DisGeNET

quietPython bool

[default=False] : prints out verbose

Returns

res_dfPandas DataFrame

rows/[Disease CID] x columns/[“Protein”, “Gene Name”] or None if Not found.

NORDic.UTILS.DISGENET_utils.get_user_key_DISGENET(fname)

Retrieves the user key from DisGeNET to call the API

Parameters

fnamePython character string

path of text file containing on the first line the email, the second the password

Returns

user_keyPython character string

from DisGeNET
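The credentials file described above holds the email on its first line and the password on its second. A minimal sketch of reading such a file is given below; `read_credentials` is a hypothetical helper, the file content is a placeholder, and the call to the DisGeNET API that exchanges these credentials for a key is deliberately left out.

```python
import os
import tempfile


def read_credentials(fname):
    """Return (email, password) read from the first two lines of `fname`."""
    with open(fname, "r") as handle:
        lines = [line.strip() for line in handle.read().splitlines()]
    if len(lines) < 2:
        raise ValueError("expected the email on line 1 and the password on line 2")
    return lines[0], lines[1]


# Illustration with a temporary file and placeholder values
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("user@example.org\nnot-a-real-password\n")
email, password = read_credentials(tmp.name)
os.remove(tmp.name)
```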

NORDic.UTILS.LINCS_utils module

NORDic.UTILS.LINCS_utils.binarize_via_CD(df, samples, binarize=1, nperm=10000, quiet=False)

Runs a differential expression analysis on a dataframe using Characteristic Direction (CD) [1] (implementation: www.maayanlab.net/CD/)

[1] doi.org/10.1186/1471-2105-15-79

Parameters

dfPandas DataFrame

one transcriptional profile per column (warning: if there are more than 25,000 genes, only the 25,000 genes with the highest variance are considered)

samplesPython integer list

indicates which columns correspond to control (=1) / treated (=2) samples

binarizePython integer

[default=1] : whether to return a binary signature or a real-valued column ~magnitude of change in expression

npermPython integer

[default=10000] : number of iterations to build the null distribution on which p-values will be computed

quietPython bool

[default=False] : prints out verbose

Returns

signaturePandas DataFrame

rows/[gene index] x columns/[“aggregated”]: 0=down-regulated (DR), 1=up-regulated (UR) (if binarize=1) else <0=DR, >0=UR
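The relation between the real-valued output and the binary signature can be illustrated with a small sketch. It only shows the sign convention stated above (0 for down-regulated, 1 for up-regulated) and does not implement Characteristic Direction; `binarize_signature` is a hypothetical name, and leaving genes with a change of exactly 0 undetermined is an assumption of this sketch.

```python
def binarize_signature(changes):
    """Map real-valued changes to 1 (up-regulated) or 0 (down-regulated).

    Genes whose change is exactly 0 are mapped to None (assumption of this sketch).
    """
    binary = {}
    for gene, value in changes.items():
        if value > 0:
            binary[gene] = 1
        elif value < 0:
            binary[gene] = 0
        else:
            binary[gene] = None
    return binary


# Placeholder magnitudes of change, not real data
changes = {"TP53": 1.7, "MYC": -0.4, "GAPDH": 0.0}
binary_signature = binarize_signature(changes)
```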

NORDic.UTILS.LINCS_utils.build_url(endpoint, method, params, user_key=None)

Builds the request to CLUE API

Parameters

endpointPython character string

in [“sigs”, “cells”, “genes”, “perts”, “plates”, “profiles”, “rep_drugs”, “rep_drug_indications”, “pcls”]

methodPython character string

in [“count”, “filter”, “distinct”]

paramsPython dictionary

additional arguments for the request

user_keyPython character string

[default=None] : API key for LINCS CLUE.io

Returns

urlPython character string

URL of request

NORDic.UTILS.LINCS_utils.compute_interference_scale(sigs, samples, entrez_id, is_oe, taxon_id, lincs_specific_ctl_genes, quiet=True, eps=2e-07)

Computes the interference scale [1], which determines whether a genetic perturbation was successful

[1] doi.org/10.1002/psp4.12107

Parameters

sigsPandas DataFrame

rows/[genes] x columns/[control and treated samples]

samplesPython integer list

contains 1 for control samples, 2 for treated ones for each column of @sigs

entrez_idPython integer

EntrezID of the perturbed gene

is_oePython bool

is the experiment an overexpression of the perturbed gene (is_oe=True) or a knockdown

taxon_idPython integer

NCBI taxonomy ID

lincs_specific_ctl_genesPython string list

list of HGNC gene symbols for housekeeping genes

quietPython bool

[default=True] : prints out verbose

epsPython float

[default=2e-7] : avoids numerical errors for low-expression housekeeping genes

Returns

iscalePython float

interference scale for the input experiment

NORDic.UTILS.LINCS_utils.convert_ctrlgenes_EntrezGene(taxon_id)

Retrieves the EntrezIDs of the control genes in LINCS L1000 [1]

[1] doi.org/10.1002/psp4.12107

Parameters

taxon_idPython integer

NCBI taxonomy ID

Returns

lincs_specific_ctl_genesPython character string list

list of EntrezGene IDs for all genes in input list

NORDic.UTILS.LINCS_utils.create_restricted_drug_signatures(sig_ids, entrezid_list, path_to_lincs, which_lvl=[3], strict=True, quiet=False)

Create dataframe of drug signatures from LINCS L1000 from a subset of signature and gene IDs

Parameters

sig_idsPython character string list

list of signature IDs from LINCS L1000 (Level 3: “distil_id”, Level 5: “sig_id”)

entrezid_listPython character string list

list of EntrezIDs

path_to_lincsPython character string

folder in which LINCS L1000-related files are stored

which_lvlPython integer list

[3] for Level 3, [5] for Level 5

strictPython bool

[default=True] : if set to True, if not all signatures are retrieved, then return None. If set to False, return the (sub)set of retrievable signatures

quietPython bool

[default=False] : prints out verbose

Returns

sigs Pandas DataFrame

rows/[genes] x columns/[drugs]

NORDic.UTILS.LINCS_utils.download_file(path_to_lincs, file_name, base_url, file_sha, check_SHA=True, quiet=False)

Automatically downloads LINCS L1000-related files from Gene Expression Omnibus (GEO) (warning: this can be time-consuming; expect waiting times of up to 20 min with a good Internet connection)

Parameters

path_to_lincsPython character string

path to local LINCS L1000 folder in which the files will be downloaded

file_namePython character string

file name to download on GEO

base_urlPython character string

path to GEO repository

file_shaPython character string

file name of corresponding SHA hash to check file integrity

check_SHAPython bool

[default=True] : whether to check the file integrity

quietPython bool

[default=False] : prints out verbose

Returns

0Python integer

0 means that the download was successful

NORDic.UTILS.LINCS_utils.download_lincs_files(path_to_lincs, which_lvl)

Downloads the proper LINCS L1000 files from Gene Expression Omnibus (GEO) and returns the corresponding lists of file names

Parameters

path_to_lincsPython character string

path to folder in which LINCS L1000-related files will be locally stored

which_lvlPython integer list

LINCS L1000 Level to download (either [3] -normalized gene expression-, [5] -binary experimental signatures-, [3,5])

Returns

file_listPython list of 4 Python character string lists

gene_files, sig_files, lvl3_files, lvl5_files Python lists of character strings

NORDic.UTILS.LINCS_utils.get_treated_control_dataset(treatment, pert_type, cell, filters, entrez_ids, taxon_id, user_key, path_to_lincs, entrez_id=None, selection='distil_ss', dose=None, iunit=None, itime=None, which_lvl=[5], nsigs=2, same_plate=True, quiet=False, trim_w_interference_scale=True, return_metrics=[])

Retrieves a set of experimental profiles, with at least nsigs treated and nsigs control samples

Parameters

treatmentPython character string

HUGO gene symbol

pert_typePython character string

type of perturbation as accepted by LINCS L1000

cellPython character string

cell line existing in LINCS L1000

filtersPython dictionary

additional parameters for the LINCS L1000 requests

entrez_idsPython integer list

EntrezID genes

taxon_idPython integer

NCBI taxonomy ID

user_keyPython character string

LINCS L1000 user API key

path_to_lincsPython character string

path where LINCS L1000 files are locally stored

entrez_idPython integer

EntrezID identifier for HUGO gene symbol treatment

selectionPython character string

[default=”distil_ss”] : LINCS L1000 metric which is maximized by a given experiment

dosePython character string or None

[default=None] : filter by dose (if not None)

iunitPython character string or None

[default=None] : unit of dose

itimePython character string or None

[default=None] : filter by exposure time (if not None)

which_lvlPython integer list

[default=[5]] : LINCS L1000 data level to consider (either 3 or 5)

nsigsPython integer

[default=2] : minimal number of samples of each condition in each experiment

same_platePython bool

[default=True] : select samples from the same plate for each experiment and condition

quietPython bool

[default=False] : prints out verbose

trim_w_interference_scalePython bool

[default=True] : computes the interference scale criteria for further trimming

return_metricsPython character string list

[default=[]] : list of LINCS L1000 metrics to return at the same time as the profiles

Returns

sigsPandas DataFrame

rows/[genes+”annotation”+”signame”+”sigid”] x columns/[profiles] or None

NORDic.UTILS.LINCS_utils.get_user_key(fname)

Retrieves user key for interacting with LINCS L1000 CLUE API

Parameters

fnamePython character string

path to file containing credentials for LINCS L1000 (first line: username, second line: password, third line: user key)

Returns

user_keyPython character string

identifier for the LINCS L1000 CLUE API

NORDic.UTILS.LINCS_utils.post_request(url, quiet=True, pause_time=1)

Post request to API

Parameters

urlPython character string

URL formatted as in build_url

quietPython bool

[default=True] : prints out verbose

pause_timePython integer

[default=1] : minimum time in seconds between each request

Returns

dataPython dictionary

(JSON) or Python character string list (if request was method=”distinct”)

NORDic.UTILS.LINCS_utils.select_best_sig(params, filters, user_key, selection='distil_ss', nsigs=2, same_plate=True, iunit=None, quiet=False)

Select “best” set of profiles (“experiment”) (in terms of quality, or criterion “selection”) according to filters

Parameters

paramsPython dictionary

additional arguments for the request

filtersPython dictionary

additional arguments for filtering the results of the request (defined with params)

selectionPython character string

[default=”distil_ss”] : name of the metric in LINCS L1000 to define the best signature

nsigsPython integer

[default=2] : minimum number of signatures to retrieve

same_platePython bool

[default=True] : whether to retrieve signatures from the same plate or not

iunitPython character string or None

[default=None] : unit of dose (if None, any)

quietPython bool

[default=False] : prints out verbose

Returns

dataPython dictionary list

the list of profile IDs to retrieve from LINCS L1000

NORDic.UTILS.STRING_utils module

NORDic.UTILS.STRING_utils.get_app_name_STRING(fname)

Retrieves app name from STRING to interact with the API

Parameters

fnamePython character string

path to a file with a single line containing the email address

Returns

app_namePython character string

identifier for the STRING API

NORDic.UTILS.STRING_utils.get_image_from_STRING(my_genes, taxon_id, file_name='network.png', min_score=0, network_flavor='evidence', network_type='functional', app_name=None, version='11.5', quiet=False)

Retrieves the image of the STRING network associated with the input genes in the correct species

Parameters

my_genesPython character string list

list of gene symbols

taxon_idPython integer

taxon ID from NCBI

file_namePython character string

[default=”network.png”] : image file name

min_scorePython float

[default=0] : confidence lower threshold (in [0,1])

network_flavorPython character string

[default=”evidence”] : show links related to [“confidence”, “action”, “evidence”]

network_typePython character string

[default=”functional”] : show “functional” or “physical” network

app_namePython character string

[default=None] : identifier for STRING

quietPython bool

[default=False] : prints out verbose

Returns

None

writes the network image to a file file_name

NORDic.UTILS.STRING_utils.get_interactions_from_STRING(gene_list, taxon_id, min_score=0, app_name=None, file_folder=None, version='11.0', strict=False, quiet=False)

Retrieves (un)directed and (un)signed physical interactions from the STRING database

Parameters

gene_listPython character string list

list of genes

taxon_idPython integer

NCBI taxonomy ID

min_scorePython integer

[default=0] : in [0,1] STRING combined score

app_namePython character string

[default=None] : identifier for STRING

file_folderPython character string

[default=None]: where to save the file from STRING (if None, the file is not saved)

versionPython character string

[default=“11.0”] : STRING database version

strictPython bool

[default=False] : if set to True, only keep interactions in which BOTH genes are in @gene_list

quietPython bool

[default=False] : prints out verbose

Returns

res_dfPandas Dataframe

rows/[interaction number] x columns/[“preferredName_A”, “preferredName_B”, “sign”, “directed”, “score”]

NORDic.UTILS.STRING_utils.get_interactions_partners_from_STRING(gene_list, taxon_id, min_score=0, network_type='functional', add_nodes=0, limit=5, app_name=None, version='11.5', quiet=False)

Retrieves undirected and unsigned interactions from the STRING database

Parameters

gene_listPython character string list

list of gene symbols

taxon_idPython integer

NCBI taxonomy ID

min_scorePython integer

[default=0] : minimum STRING combined edge score in [0,1]

network_typePython character string

[default=”functional”] : returns “functional” or “physical” network

limitPython integer

[default=5] : limits the number of interaction partners retrieved per protein (most confident interactions come first)

app_namePython character string

[default=None] : identifier for STRING

versionPython character string

[default=”11.5”] : STRING version

quietPython bool

[default=False] : prints out verbose

Returns

networkPandas DataFrame

rows/[row number] x columns/[“preferredName_A”,”preferredName_B”,”score”,”directed”]

NORDic.UTILS.STRING_utils.get_network_from_STRING(gene_list, taxon_id, min_score=0, network_type='functional', add_nodes=0, app_name=None, version='11.5', quiet=False)

Retrieves undirected and unsigned interactions from the STRING database

Parameters

gene_listPython character string list

list of gene symbols

taxon_idPython integer

NCBI taxonomy ID

min_scorePython integer

[default=0] : minimum STRING combined edge score in [0,1]

network_typePython character string

[default=”functional”] : returns “functional” or “physical” network

add_nodesPython integer

[default=0] : add nodes in the closest interaction neighborhood involved with the genes in @gene_list if set to 1

app_namePython character string

[default=None] : identifier for STRING

versionPython character string

[default=”11.5”] : STRING version

quietPython bool

[default=False] : prints out verbose

Returns

networkPandas DataFrame

rows/[row number] x columns/[“preferredName_A”,”preferredName_B”,”score”,”directed”]

NORDic.UTILS.STRING_utils.get_protein_names_from_STRING(gene_list, taxon_id, app_name=None, version='11.5', quiet=False)

Retrieves protein IDs in STRING associated with input genes in the correct species

Parameters

gene_listPython character string list

list of gene symbols

taxon_idPython integer

taxon ID from NCBI

versionPython character string

[default=”11.5”] : STRING version

app_namePython character string

[default=None] : identifier to access STRING

quietPython bool

[default=False] : prints out verbose

Returns

res_dfPandas DataFrame

rows/[row number] x columns/[“queryItem”, “stringId”, “preferredName”, “annotation”]

NORDic.UTILS.STRING_utils.string_api_url(v)

NORDic.UTILS.utils_data module

NORDic.UTILS.utils_data.convert_EntrezGene_LINCSL1000(file_folder, EntrezGenes, user_key, quiet=False)

Converts EntrezIDs to Gene Symbols present in LINCS L1000

Parameters

file_folderPython character string

path to folder of intermediate results

EntrezGenesPython character string list

list of EntrezGene IDs

user_keyPython character string

LINCS L1000 user key

quietPython bool

[default=False] : prints out verbose

Returns

PandasPandas DataFrame

rows/[EntrezID] x columns/[“Gene Symbol”,”Entrez ID”] (“-” if they do not exist)

NORDic.UTILS.utils_data.convert_genes_EntrezGene(gene_list, taxon_id, app_name, chunksize=100, missing_genes={'C11ORF74': 'IFTAP', 'ENSP00000451560': 'TPPP2', 'RP11-566K11.2': 'TUBB4'}, quiet=False)

Convert gene symbols into EntrezGene CID

Parameters

gene_listPython character string list

list of genes

taxon_idPython character string

NCBI taxonomy ID

app_namePython character string

STRING identifier

missing_genesPython dictionary of character string x character string

known conversions

chunksizePython integer

[default=100] : 1 chunk per request

quietPython bool

[default=False] : prints out verbose

Returns

res_dfPandas DataFrame

rows/[“InputValue”] x columns/[“Gene ID”/might be separated by “; “] (“-” if they do not exist) or None if no identifier has been found

NORDic.UTILS.utils_data.get_all_celllines(pert_inames, user_key, quiet=False)

Get all cell lines in which one gene in the input list has been specifically perturbed (genetic perturbation)

Parameters

pert_inamesPython character string list

List of genes (symbols from LINCS L1000)

user_keyPython character string

user key from LINCS L1000 CLUE API

quietPython bool

[default=False] : prints out verbose

Returns

cell_linesPython character string list

list of cell lines in which at least one gene from pert_inames has been perturbed

NORDic.UTILS.utils_data.request_biodbnet(probe_list, from_, to_, taxon_id, chunksize=500, quiet=False)

Converts gene identifiers from type from_ to type to_ in a given species

Parameters

probe_listPython character string list

list of probes to convert (of type from_)

from_Python character string

an identifier type as recognized by BioDBnet

to_Python character string

an identifier type as recognized by BioDBnet

taxon_idPython integer

NCBI taxonomy ID

chunksizePython integer

[default=500] : 1 chunk per request

Returns

res_dfPandas DataFrame

rows/[“InputValue”/from_] x columns/[to_] (“-” if the identifier has not been found)

NORDic.UTILS.utils_exp module

NORDic.UTILS.utils_exp.get_experimental_constraints(file_folder, cell_lines, pert_types, pert_di, taxon_id, selection, user_key, path_to_lincs, thres_iscale=None, nsigs=2, quiet=False)

Retrieve experimental profiles from the provided cell lines, perturbation types, list of genes, in the given species (taxon ID)

Parameters

file_folderPython character string

folder where to store intermediary results

cell_linesPython character string list

cell lines present in LINCS L1000

pert_typesPython character string list

types of perturbations as supported by LINCS L1000

pert_diPython dictionary

(keys=Python character string, values=Python integer) associates HUGO gene symbols to their EntrezGene IDs

taxon_idPython integer

NCBI taxonomy ID

selectionPython character string

LINCS L1000 metric to maximize

user_keyPython character string

LINCS L1000 user API key

path_to_lincsPython character string

path to local LINCS L1000 files

thres_iscalePython float or None

[default=None] : lower threshold on the interference scale which quantifies the success of a genetic experiment

nsigsPython integer

[default=2] : minimal number of profiles per experiment and condition

quietPython bool

[default=False] : prints out verbose

Returns

signaturesPandas DataFrame

rows/[genes+annotations] x columns/[profile/signature IDs]

NORDic.UTILS.utils_exp.profiles2signatures(profiles_df, user_key, path_to_lincs, save_fname, backgroundfile=False, selection='distil_ss', thres=0.5, bin_method='binary', nbackground_limits=(4, 30), quiet=False)

Convert experimental profiles into signatures (1 for control samples, 1 for treated ones)

Parameters

profiles_dfPandas DataFrame

rows/[genes+annotations] x columns/[samples]

user_keyPython character string

LINCS L1000 user API key

path_to_lincsPython character string

path to local LINCS L1000 files

save_fnamePython character string

path to save normalized expression profiles per cell line

backgroundfilePython bool

[default=False] : retrieves from LINCS L1000 supplementary expression values if set to True to compute more precise basal gene expression levels

selectionPython character string

[default=”distil_ss”] : LINCS L1000 metric to maximize for the “background” data

thresPython float

[default=0.5] : threshold for cutoff normalized gene expression values (in [0,0.5])

bin_methodPython character string

[default=”binary”] : binarization approach

nbackground_limitsPython integer tuple

[default=(4,30)] : lower and upper bounds on the number of profiles for the background expression data

quietPython bool

[default=False] : prints out verbose

Returns

signatures_df Pandas DataFrame

rows/[genes] x columns/[signature ID]

NORDic.UTILS.utils_grn module

NORDic.UTILS.utils_grn.CL(influences)

Computes the average of node-wise clustering coefficients. The clustering coefficient of a node is the ratio of the degree of the considered node and the maximum possible number of connections such that this node and its current neighbors form a clique

Parameters

influencesPandas DataFrame

rows/[genes] x columns/[genes]

Returns

CLPython float

network clustering coefficient

NORDic.UTILS.utils_grn.Centr(influences)

Computes the network centralization, which is correlated with the similarity of the network to a graph with a star topology

Parameters

influencesPandas DataFrame

rows/[genes] x columns/[genes]

Returns

CentrPython float

network centralization

NORDic.UTILS.utils_grn.DS(influences)

Computes the number of edges over the maximum number of possible connections between the nodes in the network

Parameters

influencesPandas DataFrame

rows/[genes] x columns/[genes]

Returns

DSPython float

network density
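The density definition above (number of edges over the maximum number of possible connections) can be sketched on a plain adjacency matrix. The sketch assumes a directed network without self-loops, so that the maximum is n*(n-1); whether NORDic uses this or the undirected convention is not stated here. `network_density` is a hypothetical name.

```python
def network_density(adjacency):
    """Edges divided by the n*(n-1) possible directed edges (self-loops ignored)."""
    n = len(adjacency)
    if n < 2:
        return 0.0
    n_edges = sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and adjacency[i][j] != 0
    )
    return n_edges / (n * (n - 1))


# Signed adjacency matrix with values in {-1, 0, 1}; 3 edges out of 6 possible
adjacency = [
    [0, 1, 0],
    [0, 0, -1],
    [1, 0, 0],
]
density = network_density(adjacency)
```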

NORDic.UTILS.utils_grn.GT(influences)

Computes the network heterogeneity, which quantifies the non-uniformity of the node degrees across the network

Parameters

influencesPandas DataFrame

rows/[genes] x columns/[genes]

Returns

GTPython float

network heterogeneity

NORDic.UTILS.utils_grn.build_influences(network_df, tau, beta=1, cor_method='pearson', expr_df=None, accept_nonRNA=False, quiet=False)

Filters out edges (and assigns signs to unsigned edges) based on gene expression

Parameters

network_dfPandas DataFrame

rows/[index] x columns/[[“Input”, “Output”, “SSign”]] interactions

tauPython float

threshold on pairwise gene expression correlation

betaPython integer

[default=1] : power applied to the adjacency matrix

cor_methodPython character string

[default=”pearson”] : type of correlation

expr_dfPandas DataFrame

[default=None] : rows/[genes] x columns/[samples] gene expression data

accept_nonRNAPython bool

[default=False] : if set to False, ignores gene names which are not present in expr_df

quietPython bool

[default=False] : prints out verbose

Returns

influencesPandas DataFrame

rows/[genes] x columns/[genes] signed adjacency matrix with only interactions s.t. corr^beta>=tau
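The filtering rule can be sketched on plain Python structures. The sketch makes several assumptions that are not confirmed by this reference: the test is applied to the absolute Pearson correlation, unsigned edges are encoded with sign 2 and receive the sign of the correlation, and edges involving genes absent from the expression data are dropped. `pearson` and `filter_edges` are hypothetical helpers.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long sequences (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)


def filter_edges(edges, expr, tau, beta=1):
    """Keep edges (input, output, sign) such that |corr|**beta >= tau.

    Unsigned edges (sign == 2, an assumption) take the sign of the correlation.
    """
    kept = []
    for source, target, sign in edges:
        if source not in expr or target not in expr:
            continue  # gene without expression data
        corr = pearson(expr[source], expr[target])
        if abs(corr) ** beta < tau:
            continue
        if sign == 2:
            sign = 1 if corr > 0 else -1
        kept.append((source, target, sign))
    return kept


# Placeholder expression values (genes x samples), not real data
expr = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 3, 2, 1], "D": [1, 3, 2, 2]}
edges = [("A", "B", 2), ("A", "C", 2), ("A", "D", 1)]
kept_edges = filter_edges(edges, expr, tau=0.5)
```

Here A and B are positively correlated, A and C negatively correlated, and the correlation between A and D (about 0.32) falls below tau, so that edge is removed.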

NORDic.UTILS.utils_grn.build_observations(grn, signatures, quiet=False)

Implements experimental constraints from the perturbation experiments in signatures. Experimental constraints are of the form: the state “Initial”, masked by a single-gene perturbation, can lead to the steady attractor state “Final”

Parameters

grnInfluenceGraph (from BoneSiS)

contains topological constraints

signaturesPandas DataFrame

rows/[genes] x columns/[experiment IDs]. Experiment IDs is of the form “<pert. gene>_<pert. type>_<…>_<cell line>” (treated) or “initial_<cell line>” (control)

quietPython bool

[default=False] : prints out verbose

Returns

BOBoNesis object (from BoneSiS)

BoNesis object which can be evaluated

NORDic.UTILS.utils_grn.create_grn(influences, exact=False, max_maxclause=3, quiet=False)

Create a BoneSiS InfluenceGraph

Parameters

influencesPandas DataFrame

rows/[genes] x columns/[genes] of interactions, values in {-1,1,0} -1:negative,1:positive,0:absent

exactPython bool

[default=False] : should all interactions be preserved?

max_maxclausePython integer

[default=3] : upper bound on the number of clauses in DNF form

quietPython bool

[default=False] : prints out verbose

Returns

grnBoneSiS InfluenceGraph class object

BoneSiS GRN object

NORDic.UTILS.utils_grn.desirability(x, f_weight_di, A=0, B=1)

Harrington’s desirability function, used by [1] Converts a list of functions to maximize into a single scalar function to maximize with values in [@A,@B]

[1] http://ceur-ws.org/Vol-2488/paper17.pdf

https://cran.r-project.org/web/packages/desirability/vignettes/desirability.pdf

Parameters

xdatapoint

any input to functions in f_weight_di

f_weight_diPython dictionary

maps each function (taking arguments of the same type as x) to its associated weight

APython float

[default=0] : lower bound of the function interval

BPython float

[default=1] : upper bound of the function interval

Returns

des(x)Python float

value of the desirability function at point x
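One common way to combine several criteria with Harrington's approach is sketched below. It assumes the one-sided transform d = exp(-exp(-y)) for each function value y, a weighted geometric mean of the individual desirabilities, and a linear rescaling to [A, B]. These choices are assumptions of this sketch and may differ from what NORDic computes; `desirability_sketch` is a hypothetical name.

```python
from math import exp


def desirability_sketch(x, f_weight_di, A=0.0, B=1.0):
    """Weighted geometric mean of one-sided Harrington desirabilities, rescaled to [A, B]."""
    total_weight = sum(f_weight_di.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # log(exp(-exp(-y))) = -exp(-y), so the weighted mean is taken in log space
    weighted_log = sum(-weight * exp(-f(x)) for f, weight in f_weight_di.items())
    overall = exp(weighted_log / total_weight)
    return A + (B - A) * overall


# Two illustrative criteria to maximize, with weights 1 and 3
criteria = {(lambda value: value): 1.0, (lambda value: 2 * value): 3.0}
score_at_zero = desirability_sketch(0.0, criteria)
score_at_one = desirability_sketch(1.0, criteria)
```

Because each transform is increasing in its function value, a point that improves every criterion receives a higher overall score.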

NORDic.UTILS.utils_grn.general_topological_parameter(influences, weights)

Computes the general topological parameter (GTP) associated with the input network

Parameters

influencesPandas DataFrame

rows/[genes] x columns/[genes]

weightsPython dictionary of (Python character string x Python float)

all keys must be in [“DS”,”CL”,”Centr”,”GT”]

Returns

scorePython float

score using the Harrington’s desirability function

NORDic.UTILS.utils_grn.get_genes_downstream(network_fname, gene, n=-1)

Get the list of genes downstream of a gene in a network

Parameters

network_fnamePython character string

path to the .BNET file associated with the network

genePython character string

gene name in the network

nPython integer

[default=-1] : number of recursions (if<0, recursively get all downstream genes)

Returns

lst_downstreamPython character string list

list of nodes downstream of @gene
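The role of n can be illustrated with a breadth-first traversal over a regulator-to-targets mapping. The sketch below does not parse a .BNET file; it takes a dictionary directly, excludes the starting gene from the result, and uses the hypothetical name `genes_downstream_sketch`.

```python
def genes_downstream_sketch(targets_of, gene, n=-1):
    """Return genes reachable from `gene` in at most n steps (all of them if n < 0)."""
    seen = set()
    frontier = [gene]
    depth = 0
    while frontier and (n < 0 or depth < n):
        next_frontier = []
        for current in frontier:
            for target in targets_of.get(current, []):
                if target != gene and target not in seen:
                    seen.add(target)
                    next_frontier.append(target)
        frontier = next_frontier
        depth += 1
    return sorted(seen)


# Toy network: A -> B -> C -> {A, D}
targets_of = {"A": ["B"], "B": ["C"], "C": ["A", "D"]}
all_downstream = genes_downstream_sketch(targets_of, "A")
one_step = genes_downstream_sketch(targets_of, "A", n=1)
```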

NORDic.UTILS.utils_grn.get_genes_interactions_from_PPI(ppi, connected=False, score=0, filtering=True, quiet=False)

Filtering edges to decrease computational cost while preserving network connectivity (if needed)

Parameters

ppiPandas DataFrame

rows/[index] x columns/[[“preferredName_A”, “preferredName_B”, “sign”, “directed”, “score”]]; sign in {-1,1,2}, directed in {0,1}, score in [0,1]

connectedPython bool

[default=False] : if set to True, preserve/enforce connectivity on the final network

scorePython float

[default=0] : Lower bound on the edge-associated score

filteringPython bool

[default=True] : Whether to filter out edges by a correlation threshold

quietPython bool

[default=False] : prints out verbose

Returns

ppi_acceptedPandas DataFrame

rows/[index] x columns/[[“Input”, “Output”]]

NORDic.UTILS.utils_grn.get_genes_most_variable(control_profiles, treated_profiles, p=0.8)

Get the list of genes which contribute most to the variation between two conditions (in the @pth percentile of change)

Parameters

control_profilesPandas DataFrame

rows/[genes] x columns/[samples] profiles from condition 1

treated_profilesPandas DataFrame

rows/[genes] x columns/[samples] profiles from condition 2

pPython float

100*p th percentile to consider

Returns

lst_genesPython character string list

list of nodes which contribute most to the variation between conditions
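A possible reading of this selection is sketched below: each gene is scored by the absolute difference between its mean expression in the two conditions, and genes at or above the 100*p th percentile of that score are kept. Both the score and the nearest-rank percentile are assumptions of this sketch, not a description of the NORDic implementation; `most_variable_genes_sketch` is a hypothetical name.

```python
from math import ceil
from statistics import mean


def most_variable_genes_sketch(control, treated, p=0.8):
    """Genes whose absolute change in mean expression reaches the 100*p th percentile."""
    change = {
        gene: abs(mean(treated[gene]) - mean(control[gene]))
        for gene in control
        if gene in treated
    }
    if not change:
        return []
    ranked = sorted(change.values())
    # Nearest-rank percentile, clipped to valid indices
    rank = max(0, min(len(ranked) - 1, ceil(p * len(ranked)) - 1))
    threshold = ranked[rank]
    return sorted(gene for gene, value in change.items() if value >= threshold)


# Placeholder profiles (gene -> expression values per sample), not real data
control = {"G1": [1, 1], "G2": [2, 2], "G3": [3, 3], "G4": [4, 4], "G5": [5, 5]}
treated = {"G1": [1, 1], "G2": [3, 3], "G3": [5, 5], "G4": [7, 7], "G5": [15, 15]}
top_genes = most_variable_genes_sketch(control, treated, p=0.8)
```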

NORDic.UTILS.utils_grn.get_grfs_from_solution(solution)

Retrieve all gene regulatory functions (GRFs) from a given solution

Parameters

solutionPandas Series

rows/[genes]

Returns

grfsPython dictionary

{gene: {regulator: sign, …}, …} where sign in {-1,1} -1: inhibitor, 1: activator
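The returned structure can be illustrated by extracting signed regulators from Boolean expressions. The sketch assumes that each GRF is a string in disjunctive normal form using `&`, `|` and `!`, that negations apply directly to gene names, and that each regulator appears with a single sign; `regulators_from_expression` and `grfs_sketch` are hypothetical helpers.

```python
import re

# An optional "!" followed by a gene name that starts with a letter or underscore
REGULATOR_PATTERN = re.compile(r"(!?)\s*([A-Za-z_][A-Za-z0-9_.\-]*)")


def regulators_from_expression(expression):
    """Return {regulator: sign} with sign -1 for negated names and 1 otherwise."""
    signs = {}
    for negation, name in REGULATOR_PATTERN.findall(expression):
        signs[name] = -1 if negation else 1
    return signs


def grfs_sketch(solution):
    """Apply the extraction to every gene of a {gene: expression} mapping."""
    return {gene: regulators_from_expression(expr) for gene, expr in solution.items()}


# Toy solution: constants such as "1" have no regulators
solution = {"A": "B&!C", "B": "A", "C": "1"}
grfs = grfs_sketch(solution)
```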

NORDic.UTILS.utils_grn.get_maxdegree(influences, activatory=True, quiet=False)

Computes the maximum ingoing degree (or the maximum number of potential activatory regulators) in a graph

Parameters

influencesPandas DataFrame

rows/[genes] x columns/[genes] of interactions: -1:negative,1:positive,0:absent

activatoryPython bool

[default=True] : computes the maximum number of potential activatory regulators instead

quietPython bool

[default=False] : prints out verbose

Returns

maxindegreePython integer

maximum ingoing degree (or the maximum number of potential activatory regulators)

NORDic.UTILS.utils_grn.get_minimal_edges(R, maximal=False)

Return one of the solutions with the smallest (or greatest) number of edges

Parameters

RPandas DataFrame

rows/[genes] x columns/[solution IDs]

connectedPython bool

[default=False] : if set to True, return the CONNECTED solution which satisfies those constraints

maximalPython bool

[default=False] : if set to True, return the solution with the greatest number of edges

Returns

solution, nedgesPython integer x Python integer

solution and corresponding number of edges

NORDic.UTILS.utils_grn.get_weakly_connected(network_df, gene_list, index_col='preferredName_A', column_col='preferredName_B', score_col='sscore')

Depth-first search (DFS) on undirected network

Parameters

network_dfPandas DataFrame

rows/[index] x columns/[[“Input”,”Output”]]

gene_listPython character string list

list of genes (needed to take into account isolated genes in the network)

index_colPython character string

[default=”preferredName_A”] : column in network_df (input gene)

column_colPython character string

[default=”preferredName_B”] : column in network_df (output gene)

score_colPython character string

[default=”sscore”] : column in network_df (edge weight)

Returns

componentsType of @network_df.loc[network_df.index[0]][“Input”] Python list of Python list

list of weakly connected components in the network, ordered by decreasing size
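The depth-first search can be sketched on a list of gene pairs. Edge directions are ignored, genes from the input list that have no edge form singleton components, and the result is ordered by decreasing size. `weakly_connected_components` is a hypothetical helper that works on tuples rather than on a DataFrame.

```python
def weakly_connected_components(edges, gene_list):
    """Return the components of the undirected version of the network, largest first."""
    neighbours = {gene: set() for gene in gene_list}
    for source, target in edges:
        neighbours.setdefault(source, set()).add(target)
        neighbours.setdefault(target, set()).add(source)
    seen, components = set(), []
    for start in neighbours:
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:  # iterative depth-first search
            node = stack.pop()
            component.append(node)
            for neighbour in neighbours[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(sorted(component))
    components.sort(key=len, reverse=True)
    return components


edges = [("A", "B"), ("B", "C"), ("D", "E")]
gene_list = ["A", "B", "C", "D", "E", "F"]
components = weakly_connected_components(edges, gene_list)
```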

NORDic.UTILS.utils_grn.infer_network(BO, njobs=1, fname='solutions', use_diverse=True, limit=50, niterations=1)

Infer solutions matching topological & experimental constraints

Parameters

BOBonesis object (from BoneSiS)

contains topological & experimental constraints

fnamePython character string

[default=”solutions”] : path to solution files

use_diversePython bool

[default=True] : use the “diverse” procedure in BoneSiS

limitPython integer

[default=50] : maximum number of solutions to generate per iteration

niterationsPython integer

[default=1] : maximum number of iterations

Returns

nsolutionsPython integer list

list of # solutions per iteration

NORDic.UTILS.utils_grn.load_grn(fname)

Loads GRN as MPBN class element

Parameters

fnamePython character string

BNET file

Returns

BNmpbn.MPBooleanNetwork object

Boolean network with Most Permissive semantics

NORDic.UTILS.utils_grn.reconnect_network(network_fname)

Write the network with all isolated nodes (no ingoing/outgoing edges) filtered out

Parameters

network_fnamePython character string

path to the .BNET associated with the network

Returns

fnamePython character string

path to the .BNET associated with the reconnected network

NORDic.UTILS.utils_grn.save_grn(solution, fname, sep=', ', quiet=False, max_show=5, write=True)

Write and/or print .bnet file

Parameters

solutionPandas Series

rows/[genes] contains gene regulatory functions (GRF)

fnamePython character string

where to write the file (w/o .bnet extension)

sepPython character string

what separates regulators from regulated genes

quietPython bool

[default=False] : prints out verbose

max_showPython integer

[default=5] : maximum number of printed GRFs

writePython bool

[default=True] : if set to True, write to a .bnet file

Returns

None

writes the GRN to a file fname
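
As a rough picture of what such a file contains, the sketch below formats a mapping from genes to regulatory functions as one line per gene and writes it to disk. It assumes that each gene maps to a rule string and that sep sits between the gene and its rule; format_bnet and the toy rules are hypothetical and do not reproduce NORDic's code.

```python
import os
import tempfile

def format_bnet(rules, sep=", "):
    """One line per gene: <gene><sep><regulatory function>."""
    return "\n".join(f"{gene}{sep}{rule}" for gene, rule in rules.items())

# Toy gene regulatory functions (hypothetical)
rules = {"A": "B & !C", "B": "A | C", "C": "1"}
text = format_bnet(rules)

# The file name is given without extension and ".bnet" is appended,
# mirroring the "w/o .bnet extension" convention described above
fname = os.path.join(tempfile.mkdtemp(), "toy_network")
with open(fname + ".bnet", "w") as handle:
    handle.write(text + "\n")

print(text)
```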

NORDic.UTILS.utils_grn.save_solutions(bnetworks, fname, limit)

Enumerate and save solutions

Parameters

bnetworksBonesis object

Output of the inference

fnamePython character string

ZIP filename to store the solutions

limitPython integer

maximum number of solutions to enumerate

Returns

nPython integer

number of enumerated solutions

NORDic.UTILS.utils_grn.solution2influences(solution)

Converts a solution object into an influences object

Parameters

solutionPandas Series

rows/[genes]

Returns

influencesPandas DataFrame

rows/[genes] x columns/[genes], with values in {-1,1,0,2} (-1: negative, 1: positive, 0: absent, 2: non-monotonic)
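
The encoding can be illustrated with a purely syntactic sketch that reads rule strings and records, for each regulator, whether it appears negated, non-negated, or both. This is an approximation under stated assumptions: rules use "!" for negation directly in front of a gene name, a negation applied to a parenthesised group is not handled, and the table is indexed as influences[regulator][target]. The function name rules_to_influences is hypothetical, and NORDic may derive signs differently.

```python
import re

# An optional "!" followed by a gene identifier
TOKEN = re.compile(r"(!?)\s*([A-Za-z_]\w*)")

def rules_to_influences(rules):
    """Map {target: rule string} to influences[regulator][target] in {-1, 0, 1, 2}."""
    genes = list(rules)
    influences = {reg: {tgt: 0 for tgt in genes} for reg in genes}
    for target, rule in rules.items():
        signs = {}
        for negation, regulator in TOKEN.findall(rule):
            signs.setdefault(regulator, set()).add(-1 if negation else 1)
        for regulator, seen in signs.items():
            if regulator in influences:
                # Both signs seen: non-monotonic (2); otherwise the single sign
                influences[regulator][target] = 2 if len(seen) == 2 else seen.pop()
    return influences

rules = {"A": "B & !C", "B": "A", "C": "(A & !B) | (!A & B)"}
influences = rules_to_influences(rules)
for regulator, row in influences.items():
    print(regulator, row)
```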

NORDic.UTILS.utils_grn.zip2df(fname)

Extract solutions in ZIP file as DataFrames

Parameters

fnamePython character string

zip file which contains BNET solutions

Returns

solutionsPandas DataFrame

rows/[genes] x columns/[solutions] the GRFs for each gene in each solution
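
The genes-by-solutions layout can be pictured with a small standard-library sketch that reads every BNET file in a ZIP archive and lines up the rules per gene. The archive member names, the ", " separator and the helper read_bnet_solutions are assumptions made for illustration; a plain dictionary stands in for the DataFrame.

```python
import io
import zipfile

def read_bnet_solutions(zip_bytes, sep=", "):
    """Return {member name: {gene: rule}} for each file in the archive."""
    solutions = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in sorted(archive.namelist()):
            text = archive.read(name).decode("utf-8")
            rules = {}
            for line in text.splitlines():
                if line.strip():
                    gene, rule = line.split(sep, 1)
                    rules[gene] = rule
            solutions[name] = rules
    return solutions

# Build a toy archive in memory (hypothetical contents)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("bn0.bnet", "A, B\nB, !A\n")
    archive.writestr("bn1.bnet", "A, B\nB, A\n")

solutions = read_bnet_solutions(buffer.getvalue())
genes = sorted({gene for rules in solutions.values() for gene in rules})
# One row per gene, one entry per solution
table = {gene: [solutions[name].get(gene) for name in solutions] for gene in genes}
print(table)
```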

NORDic.UTILS.utils_network module

NORDic.UTILS.utils_network.aggregate_networks(file_folder, gene_list, taxon_id, min_score, network_type, app_name, version_net='11.5', version_act='11.0', quiet=0)

This function performs the following pipeline to build a prior knowledge network based on a subset of genes:

- Retrieve protein actions and predicted PPIs from STRING
- Merge the two networks while solving all inconsistencies (duplicates, paradoxes, etc.) in signs, directions, scores
- Determine the greatest threshold on the edge score which allows all of the core gene set to be connected (binary search)
- Trim out edges whose scores are below the threshold, and remove all isolated nodes

Parameters

file_folderPython character string

relative path where to store files

gene_listPython character string list

list of core gene symbols to preserve in the network

taxon_idPython integer

NCBI taxonomy ID

min_scorePython integer

minimum score on edges retrieved from the STRING database

app_namePython character string

Identifier for STRING requests

version_netPython character string

[default=”11.5”] : Number of version for interaction data in the STRING database. To avoid compatibility issues, it is strongly advised not to change this parameter

version_actPython character string

[default=”11.0”] : Number of version for protein action data in the STRING database. To avoid compatibility issues, it is strongly advised not to change this parameter

quietPython bool

[default=0] : prints out verbose

Returns

final_networkPandas DataFrame

rows/[interactions] x columns/[[“preferredName_A”, “preferredName_B”, “sign”, “directed”, “score”]]

NORDic.UTILS.utils_network.capture()
NORDic.UTILS.utils_network.determine_edge_threshold(network, core_gene_set, quiet=True)

Determine the greatest threshold on the edge score which allows all of the core gene set to be connected (binary search)

Parameters

networkPandas DataFrame

rows/[interactions] x at least three columns “preferredName_A” (input node), “preferredName_B” (output node), “score” (edge score)

core_gene_setPython character string list

list of genes that should remain connected

quietPython bool

[default=True] : prints out verbose

Returns

tPython float

maximum threshold which allows the connection of all genes in the core set
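
The binary search can be sketched as follows. Raising the threshold only removes edges, so connectivity of the core set is monotone in the threshold and a bisection over the distinct edge scores finds the greatest feasible value. The sketch assumes that an edge is kept when its score is greater than or equal to the threshold and that connectivity ignores edge direction; the helper names and toy edges are hypothetical, and this is not NORDic's code.

```python
def connected(edges, core):
    """True if all core genes lie in one (undirected) connected component."""
    core = list(core)
    if not core:
        return True
    neighbours = {}
    for a, b, _ in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen, stack = {core[0]}, [core[0]]
    while stack:
        for nxt in neighbours.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return all(gene in seen for gene in core)

def greatest_threshold(edges, core):
    """Largest score t such that edges with score >= t keep the core connected."""
    scores = sorted({score for _, _, score in edges})
    lo, hi, best = 0, len(scores) - 1, None
    while lo <= hi:
        mid = (lo + hi) // 2
        kept = [edge for edge in edges if edge[2] >= scores[mid]]
        if connected(kept, core):
            best, lo = scores[mid], mid + 1  # feasible: try a higher threshold
        else:
            hi = mid - 1  # infeasible: lower the threshold
    return best  # None if the core set is never connected

edges = [("A", "B", 0.9), ("B", "C", 0.6), ("A", "C", 0.4), ("C", "D", 0.2)]
t = greatest_threshold(edges, ["A", "B", "C"])
print(t)
```

In this toy network a threshold of 0.9 leaves only the A-B edge, which disconnects C, so the search settles on 0.6.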

NORDic.UTILS.utils_network.get_network_from_OmniPath(gene_list=None, disease_name=None, species='human', sources_int='omnipath', domains_int=None, types_int=None, min_curation_effort=-1, domains_annot='HPA_tissue', quiet=False)

Retrieve a network from OmniPath

Parameters

gene_listPython character string list or None

[default=None] : List of genes to consider (or do not filter the interactions from Omnipath if =None)

disease_namePython character string

[default=None] : Disease name (in letters) to consider

speciesPython character string

[default=”human”] : Species to consider (either “human”, “mouse”, or “rat”)

sources_intPython character string

[default=”omnipath”] : Which databases for interactions to consider (if =None, consider them all)

domains_intPython character string

[default=None] : source of interactions in OmniPath

types_intPython character string

[default=None] : Types of interactions, e.g., “post_translational”, “transcriptional”, “post_transcriptional”, “mirna_transcriptional”

min_curation_effortPython integer

[default=-1] : if positive, select edges based on that criterion (the higher, the better). Counts the unique database-citation pairs, i.e. how many times an interaction was described in a paper and mentioned in a database

domains_annotPython character string

[default=’HPA_tissue’] : source of annotations in OmniPath

quietPython bool

[default=False] : prints out verbose

Returns

final_networkPandas DataFrame

rows/[interactions] x columns/[[“preferredName_A”, “preferredName_B”, “sign”, “directed”, “score”]]

annot_widePandas DataFrame

rows/[gene symbols] x columns/[annotations from the database @domains_annot]

NORDic.UTILS.utils_network.merge_network_PPI(network, PPI, quiet=True)

Merge two networks while solving all inconsistencies (duplicates, paradoxes, etc.) in signs, directions, scores

Parameters

networkPandas DataFrame

rows/[interactions] x at least three columns “preferredName_A” (input node), “preferredName_B” (output node), “score” (edge score)

PPIPandas DataFrame

rows/[interactions] x at least three columns “preferredName_A” (input node), “preferredName_B” (output node), “score” (edge score)

quietPython bool

[default=True] : prints out verbose

Returns

final_networkPandas DataFrame

rows/[interactions] x columns/[[“preferredName_A”, “preferredName_B”, “sign”, “directed”, “score”]]

NORDic.UTILS.utils_network.remove_isolated(network, quiet=False)

Remove all nodes which do not belong to the largest connected component from the network

Parameters

networkPandas DataFrame

rows/[interactions] x columns/[[“preferredName_A”, “preferredName_B”, “sign”, “directed”, “score”]]

quietPython bool

[default=False] : prints out verbose

Returns

trimmed_networkPandas DataFrame

rows/[interactions] x columns/[[“preferredName_A”, “preferredName_B”, “sign”, “directed”, “score”]]

NORDic.UTILS.utils_plot module

NORDic.UTILS.utils_plot.influences2graph(influences, fname, optional=False, compile2png=True, engine='sfdp')

Plots a network by conversion to a DOT file and then to PNG

Parameters

influencesPandas DataFrame

rows/[genes] x columns/[genes], contains {-1,1,2}

fnamePython character string

filename of png file

optionalPython bool

[default=False] : should interactions be drawn as optional (dashed lines)?

Returns

None

writes a DOT file which can be converted to PNG image (if compile2png=True)
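
A DOT description of an influence graph can be assembled with plain string formatting, as in the sketch below. The colour and arrowhead choices, the influences[regulator][target] orientation and the helper influences_to_dot are assumptions made for illustration, not the styling NORDic applies.

```python
def influences_to_dot(influences, optional=False):
    """Build a DOT digraph from influences[regulator][target] in {-1, 0, 1, 2}."""
    style = "dashed" if optional else "solid"
    color = {1: "green", -1: "red", 2: "black"}
    arrow = {1: "normal", -1: "tee", 2: "diamond"}
    lines = ["digraph influences {"]
    for regulator, targets in influences.items():
        for target, sign in targets.items():
            if sign == 0:
                continue  # absent interaction: no edge
            lines.append(
                f'  "{regulator}" -> "{target}" '
                f"[color={color[sign]}, arrowhead={arrow[sign]}, style={style}];"
            )
    lines.append("}")
    return "\n".join(lines)

dot = influences_to_dot({"A": {"B": 1, "C": -1}, "B": {"A": 0}})
print(dot)
```

Saved to a file such as graph.dot, the text can be rendered with Graphviz, for example with sfdp -Tpng graph.dot -o graph.png.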

NORDic.UTILS.utils_plot.plot_boxplots(scores, patient_scores, ground_truth=None, fsize=12, msize=5, fname='boxplots.pdf')

Plots one boxplot per treatment (all values obtained on patient profiles)

Parameters

scoresPandas DataFrame

rows/[drug names] x column/[value]

patient_scoresPandas DataFrame

rows/[drug names] x columns/[patient samples]

ground_truthPandas DataFrame

[default=None] : rows/[drug names] x column/[class] Values in 1: treatment, 0: unknown, -1: aggravating. If not provided: does not color boxplots according to the class

fsizePython integer

[default=12] : font size

msizePython integer

[default=5] : marker size

fnamePython character string

[default=”boxplots.pdf”] : file name for the plot

Returns

None

create boxplots of reward scores across patients for each drug

NORDic.UTILS.utils_plot.plot_discrete_distributions(signatures, fname='signature_expression_distribution.png')

Plots the distributions (histograms) of genes with determined status across signatures

Parameters

signaturesPandas DataFrame

rows/[genes] x columns/[samples] with values in {0,NaN,1}. Determined status is either 0 or 1.

fnamePython character string

[default=”signature_expression_distribution.png”] : file name

Returns

None

plots the number of genes with expression values 0, 1 or NaN in each signature

NORDic.UTILS.utils_plot.plot_distributions(profiles, fname='gene_expression_distribution.png', thres=None)

Plots the distributions (boxplots) of gene expression across samples for each gene, and the selected threshold for binarization

Parameters

profilesPandas DataFrame

rows/[genes+annotations] x columns/[samples]

fnamePython character string

[default=”gene_expression_distribution.png”] : file name

thresPython float or None

[default=None] : binarization threshold (if there is any)

Returns

None

plots boxplots of expression for each gene in profiles

NORDic.UTILS.utils_plot.plot_heatmap(X, ground_truth=None, fname='heatmap.pdf', w=20, h=20, bfsize=20, fsize=20, rot=75)

Plots a heatmap of the signatures, with the potential ground truth

Parameters

XPandas DataFrame

rows/[features] x columns/[drug names]

ground_truthPandas DataFrame

[default=None] : rows/[drug names] x column/[class] Values in 1: treatment, 0: unknown, -1: aggravating. If not provided: does not color boxplots according to the class

fnamePython character string

[default=”heatmap.pdf”] : file name for the plot

wPython integer

[default=20] : figure width

hPython integer

[default=20] : figure height

bfsizePython integer

[default=20] : font size in the color bar

rotPython integer

[default=75] : rotation angle of labels

Returns

None

plots a heatmap of similarity across drugs based on the Pearson correlation

NORDic.UTILS.utils_plot.plot_influence_graph(network_df, input_col, output_col, sign_col, direction_col=None, fname='graph.png', optional=True)

Converts a network into a PNG picture

Parameters

network_dfPandas DataFrame

rows/[index] x columns/[input_col,output_col,sign_col]

input_col,output_col,sign_col,direction_colPython character string

columns of network_df

fnamePython character string

[default=”graph.png”] : file name for PNG picture

optionalPython bool

[default=True] : should edges be plotted as dashed lines?

Returns

None

Creates an image of the graph in file fname

NORDic.UTILS.utils_plot.plot_precision_recall(pr, prs, tr, beta=1, thres=0.5, fname='PRC.pdf', method_name='predictor', fsize=18)

Plots a Precision-Recall curve (with variations across samples)

Parameters

prPandas DataFrame

rows/[drug names] x column/[value]

prsPandas DataFrame

rows/[drug names] x columns/[patient samples]

trPandas DataFrame

rows/[drug names] x column/[class]

betaPython float

[default=1] : value of coefficient beta for the F-measure

thresPython float

[default=0.5] : decision threshold

fnamePython character string

[default=”PRC.pdf”] : file name for the plot

method_namePython character string

[default=”predictor”] : name of the predictor

fsizePython integer

[default=18] : font size

Returns

None

Plots a Precision-Recall curve based on the drug repurposing predictions
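
For reference, the quantities behind one point of such a curve can be computed as in the sketch below, which evaluates precision, recall and the F-measure with coefficient beta at a single decision threshold. It assumes that a drug is predicted positive when its score is greater than or equal to thres and that only class 1 counts as positive; the helper name and toy data are hypothetical.

```python
def precision_recall_fbeta(scores, truth, thres=0.5, beta=1.0):
    """scores: {drug: value}; truth: {drug: class in {1, 0, -1}}."""
    predicted = {drug for drug, value in scores.items() if value >= thres}
    positives = {drug for drug, label in truth.items() if label == 1}
    tp = len(predicted & positives)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(positives) if positives else 0.0
    denom = beta**2 * precision + recall
    fbeta = (1 + beta**2) * precision * recall / denom if denom else 0.0
    return precision, recall, fbeta

scores = {"d1": 0.9, "d2": 0.7, "d3": 0.4, "d4": 0.1}
truth = {"d1": 1, "d2": 0, "d3": 1, "d4": -1}
result = precision_recall_fbeta(scores, truth, thres=0.5, beta=1.0)
print(result)
```

Sweeping thres over the observed score values and collecting the (recall, precision) pairs gives the points of the curve.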

NORDic.UTILS.utils_plot.plot_roc_curve(pr, prs, tr, fname='ROC.pdf', method_name='predictor', fsize=18)

Plots a ROC curve (with variations across samples)

Parameters

prPandas DataFrame

rows/[drug names] x column/[value]

prsPandas DataFrame

rows/[drug names] x columns/[patient samples]

trPandas DataFrame

rows/[drug names] x column/[class]

fnamePython character string

[default=”ROC.pdf”] : file name for the plot

method_namePython character string

[default=”predictor”] : name of the predictor

fsizePython integer

[default=18] : font size

Returns

None

Plots a ROC curve based on the drug repurposing predictions

NORDic.UTILS.utils_plot.plot_signatures(signatures, perturbed_genes=None, width=10, height=10, max_show=50, fname='signatures')

Print signatures

Parameters

signaturesPandas DataFrame

rows/[genes] x columns/[signature IDs]

perturbed_genesPython character string list

[default=None] : list of gene names perturbed in the signatures

width, heightPython integer

[default=10] : dimensions of image

max_showPython integer

[default=50] : maximum number of genes shown (as only the @max_show genes with highest variance across signatures are plotted)

fnamePython character string

[default=”signatures”] : path of resulting PNG image

Returns

None

plots the signatures as heatmaps in file fname
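
The max_show filter described above keeps the genes whose values vary most across signatures. A minimal sketch of that selection, using population variance from the standard library, is given below; the choice of variance estimator, the helper top_variance_genes and the toy values are assumptions.

```python
from statistics import pvariance

def top_variance_genes(signatures, max_show=50):
    """signatures: {gene: [value per signature]}; returns genes ranked by variance."""
    variances = {gene: pvariance(values) for gene, values in signatures.items()}
    ranked = sorted(variances, key=variances.get, reverse=True)
    return ranked[:max_show]

signatures = {"G1": [0, 0, 0], "G2": [1, -1, 1], "G3": [0, 1, 0]}
shown = top_variance_genes(signatures, max_show=2)
print(shown)
```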

NORDic.UTILS.utils_sim module

class NORDic.UTILS.utils_sim.BN_SIM(seednb=0, njobs=None)

Bases: object

add_initial_states(initial, final=None)
add_permanent_mutation(mutation)
add_transient_mutation(mutation)
attrs_similarity(attrs1, attrs2, gene_outputs=None)
boxplot()
enumerate_attractors(verbose=False)
generate_trajectories(params={}, outputs=[])
initialize_network(network_fname)
up_to_attractors(network_fname, initial, final, mutation_permanent={}, mutation_transient={}, verbose=True)
update_network(network_fname, initial, final=None, mutation_permanent={}, mutation_transient={}, verbose=True)
class NORDic.UTILS.utils_sim.BONESIS_SIM(seednb=0, njobs=None)

Bases: BN_SIM

add_initial_states(initial, final)
add_permanent_mutation(mutation)
add_transient_mutation(mutation)
enumerate_attractors(verbose=True)
generate_trajectories(params={}, outputs=[])
initialize_network(network_fname)
class NORDic.UTILS.utils_sim.MABOSS_SIM(seednb=0, njobs=None)

Bases: BN_SIM

add_initial_states(initial, final=None)
add_permanent_mutation(mutation)
add_transient_mutation(mutation)
enumerate_attractors(verbose=True)
generate_trajectories(params={}, outputs=[])
initialize_network(network_fname)
class NORDic.UTILS.utils_sim.MPBN_SIM(seednb=0, njobs=None)

Bases: BN_SIM

add_initial_states(initial, final=None)
add_permanent_mutation(mutation)
add_transient_mutation(mutation)
enumerate_attractors(max_attrs=-1, verbose=True)
generate_trajectories(params={}, outputs=[], show_plot=True)
initialize_network(network_fname)
NORDic.UTILS.utils_sim.capture()
NORDic.UTILS.utils_sim.choice(a, size=None, replace=True, p=None)

Generates a random sample from a given 1-D array

New in version 1.7.0.

Note

New code should use the numpy.random.Generator.choice method of a numpy.random.Generator instance instead; please see the NumPy random quick start.

Parameters

a1-D array-like or int

If an ndarray, a random sample is generated from its elements. If an int, the random sample is generated as if it were np.arange(a)

sizeint or tuple of ints, optional

Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.

replaceboolean, optional

Whether the sample is with or without replacement. Default is True, meaning that a value of a can be selected multiple times.

p1-D array-like, optional

The probabilities associated with each entry in a. If not given, the sample assumes a uniform distribution over all entries in a.

Returns

samplessingle item or ndarray

The generated random samples

Raises

ValueError

If a is an int and less than zero, if a or p are not 1-dimensional, if a is an array-like of size 0, if p is not a vector of probabilities, if a and p have different lengths, or if replace=False and the sample size is greater than the population size

See Also

randint, shuffle, permutation

random.Generator.choice : which should be used in new code

Notes

Setting user-specified probabilities through p uses a more general but less efficient sampler than the default. The general sampler produces a different sample than the optimized sampler even if each element of p is 1 / len(a).

Sampling random rows from a 2-D array is not possible with this function, but is possible with Generator.choice through its axis keyword.

Examples

Generate a uniform random sample from np.arange(5) of size 3:

>>> np.random.choice(5, 3)
array([0, 3, 4]) # random
>>> #This is equivalent to np.random.randint(0,5,3)

Generate a non-uniform random sample from np.arange(5) of size 3:

>>> np.random.choice(5, 3, p=[0.1, 0, 0.3, 0.6, 0])
array([3, 3, 0]) # random

Generate a uniform random sample from np.arange(5) of size 3 without replacement:

>>> np.random.choice(5, 3, replace=False)
array([3,1,0]) # random
>>> #This is equivalent to np.random.permutation(np.arange(5))[:3]

Generate a non-uniform random sample from np.arange(5) of size 3 without replacement:

>>> np.random.choice(5, 3, replace=False, p=[0.1, 0, 0.3, 0.6, 0])
array([2, 3, 0]) # random

Any of the above can be repeated with an arbitrary array-like instead of just integers. For instance:

>>> aa_milne_arr = ['pooh', 'rabbit', 'piglet', 'Christopher']
>>> np.random.choice(aa_milne_arr, 5, p=[0.5, 0.1, 0.1, 0.3])
array(['pooh', 'pooh', 'pooh', 'Christopher', 'piglet'], # random
      dtype='<U11')
NORDic.UTILS.utils_sim.test(enumerator, seednb, njobs, network_fname, control_profile, treated_profiles, compare_to, mutation_permanent={}, mutation_transient={}, gene_outputs=None, print_boxplot=False, verbose=True)

NORDic.UTILS.utils_state module

NORDic.UTILS.utils_state.binarize_experiments(data, thres=0.5, method='binary', strict=True, njobs=1)

Binarize experimental profiles

Parameters

dataPandas DataFrame

rows/[genes] x columns/[samples]

thresPython float

[default=0.5] : threshold for @method=”binary” (in [0,0.5])

methodPython character string

[default=”binary”] : binarization method in [“binary”,”probin”]

strictPython bool

[default=True] : takes into account equalities (if set to True, value=thres will lead to undefined for the corresponding gene)

njobsPython integer

[default=1] : parallelism if needed

Returns

signaturesPandas DataFrame

rows/[genes] x columns/[samples] with values in [0,1,NaN]
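
One plausible reading of the "binary" method is sketched below. It assumes that values are already scaled to [0,1], that values below thres become 0, that values above 1-thres become 1, and that everything in between is undefined (NaN). With strict=True a value equal to a cutoff stays undefined. These rules, the tie handling for strict=False and the helper binarize are assumptions for illustration and may differ from what binarize_experiments actually does.

```python
import math

def binarize(value, thres=0.5, strict=True):
    """Map a value in [0, 1] to 0, 1 or NaN (undefined)."""
    low, high = thres, 1.0 - thres
    if value < low or (not strict and value == low):
        return 0
    if value > high or (not strict and value == high):
        return 1
    return math.nan

# Toy normalized profiles: gene -> one value per sample (hypothetical)
data = {"G1": [0.1, 0.9, 0.5], "G2": [0.25, 0.75, 0.6]}
signatures = {gene: [binarize(v, thres=0.3) for v in values] for gene, values in data.items()}
print(signatures)
```

With thres=0.5 the two cutoffs coincide, so only values exactly equal to 0.5 remain undefined when strict=True.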

NORDic.UTILS.utils_state.compare_states(x, y, genes=None)

Computes the similarity between two sets of Boolean states

Parameters

xPandas DataFrame

rows/[genes] x columns/[state IDs] contains (0, 1, NaN)

yPandas DataFrame

rows/[genes] x columns/[state IDs] contains (0, 1, NaN)

genesPython character string list

list of gene symbols

Returns

simsNumPy array

similarities between each column of x and each column of y, on the list of N present genes in genes (if provided) otherwise on the union of N genes in x and y

NPython integer

number of genes on which the similarity is computed
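
The pair of return values (a similarity and the number N of genes it is computed on) can be illustrated for a single pair of states. The sketch assumes that similarity is the share of the N genes on which both states are defined and equal; the actual measure used by compare_states is not stated here and may differ. The helper state_similarity and the toy states are hypothetical.

```python
def state_similarity(x, y, genes=None):
    """x, y: {gene: 0, 1 or None}; returns (similarity, N)."""
    if genes is None:
        genes = sorted(set(x) | set(y))  # union of genes in x and y
    else:
        genes = [g for g in genes if g in x or g in y]  # only genes that are present
    n = len(genes)
    matches = sum(1 for g in genes if x.get(g) is not None and x.get(g) == y.get(g))
    return (matches / n if n else 0.0), n

x = {"A": 1, "B": 0, "C": None}
y = {"A": 1, "B": 1, "D": 0}
print(state_similarity(x, y))
print(state_similarity(x, y, genes=["A", "B"]))
```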

NORDic.UTILS.utils_state.finetune_binthres(df, samples, network_fname, mutation, step=0.005, maxt=0.5, mint=0, score_binthres=<function <lambda>>, njobs=1, verbose=True)

Select the binarization threshold (in function @binarize_experiments) which maximizes the dissimilarity between conditions and the similarity within each condition …

Parameters

dfPandas DataFrame

rows/[genes] x columns/[samples]: profiles

samplesPython character string list

annotations of conditions for each sample in df

network_fnamePython character string

file name containing the network

mutationPython dictionary

dictionary (key=gene, value=perturbation type) gene perturbations which are considered

stepPython float

[default=0.005] step in the interval to look for the threshold value

maxtPython float

[default=0.5] maximum threshold value

mintPython float

[default=0.] minimum threshold value

score_binthresPython lambda function

[default=lambda itc,ita_c,ita_t:(1-itc)*ita_c*ita_t] fitness function for the threshold value

njobsPython integer

[default=1] number of parallel jobs

verbosePython bool

[default=True] prints out verbose

Returns

max_thresPython float

threshold value maximizing the fitness function
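
The search over thresholds can be pictured as a grid scan from mint to maxt in increments of step, scoring each candidate with the default fitness function quoted above. In the sketch below, the evaluator that returns (itc, ita_c, ita_t) for a threshold is a toy stand-in; in NORDic these values presumably come from comparing simulated and observed states, which is not reproduced here. The names grid_search and toy_evaluate are hypothetical.

```python
# Default fitness function, as given in the parameter description
score_binthres = lambda itc, ita_c, ita_t: (1 - itc) * ita_c * ita_t

def grid_search(evaluate, step=0.005, maxt=0.5, mint=0.0, score=score_binthres):
    """Return (best threshold, best score) over the grid mint, mint+step, ..., maxt."""
    n_steps = int(round((maxt - mint) / step))
    best_t, best_score = None, float("-inf")
    for i in range(n_steps + 1):
        t = mint + i * step  # multiply rather than accumulate to limit float drift
        s = score(*evaluate(t))
        if s > best_score:
            best_t, best_score = t, s
    return best_t, best_score

def toy_evaluate(t):
    """Toy stand-in: inter-condition similarity is lowest at t = 0.25."""
    itc = abs(t - 0.25) * 2
    return itc, 1.0, 1.0

max_thres, max_score = grid_search(toy_evaluate, step=0.05)
print(max_thres, max_score)
```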

NORDic.UTILS.utils_state.quantile_normalize(df, njobs=1)