NORDic UTILS module
NORDic.UTILS.DISGENET_utils module
- NORDic.UTILS.DISGENET_utils.get_genes_evidences_from_DISGENET(gene_list, disease, limit=3000, source='CURATED', min_score=0, chunksize=100, user_key=None, quiet=False)
Retrieves the references for the association between each gene in the list and the disease
…
Parameters
- gene_listPython character string list
list of associated genes
- diseasePython character string
Concept ID (CID) from MedGen
- limitPython integer
[default=3000] : limit of the number of references
- sourcePython character string
[default=”CURATED”] : DisGeNET data sources [“CURATED”,”ANIMAL MODELS”,”INFERRED”,”ALL”] (see https://www.disgenet.org/dbinfo)
- min_scorePython float
[default=0] : minimum evidence score
- chunksizePython integer
[default=100] : size of chunks (1 chunk per request)
- user_keyPython character string or None
[default=None] : API key from DisGeNET
- quietPython bool
[default=False] : prints out verbose
Returns
- res_dfPandas DataFrame
rows/[row number] x columns/[“gene_symbol”, “sentence”, “associationtype”, “pmid”, “year”, “score”]
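The chunksize parameter above implies that the gene list is split into batches, with one request per batch. A minimal sketch of that batching, assuming plain consecutive slicing; the helper name chunk_list is hypothetical and is not part of NORDic:

```python
def chunk_list(items, chunksize=100):
    """Split items into consecutive chunks of at most chunksize elements."""
    if chunksize < 1:
        raise ValueError("chunksize must be a positive integer")
    return [items[i:i + chunksize] for i in range(0, len(items), chunksize)]

# Placeholder gene symbols, only used to illustrate the chunk sizes
genes = ["GENE%d" % i for i in range(250)]
chunks = chunk_list(genes, chunksize=100)
print([len(c) for c in chunks])  # [100, 100, 50]
```

With chunksize=100, a list of 250 genes would therefore lead to three requests.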
- NORDic.UTILS.DISGENET_utils.get_genes_proteins_from_DISGENET(disease_list, limit=3000, source='CURATED', min_score=0, min_ei=0, min_dsi=0.25, min_dpi=0, chunksize=100, user_key=None, quiet=False)
Retrieves a list of protein names (and associated gene names) related to the input disease CIDs
…
Parameters
- disease_listPython character string list
list of Concept IDs (CID) from Medgen for each disease
- limitPython integer
[default=3000] : max. number of proteins
- sourcePython character string
[default=”CURATED”] : DisGeNET data sources [“CURATED”,”ANIMAL MODELS”,”INFERRED”,”ALL”] (see https://www.disgenet.org/dbinfo)
- min_scorePython float
[default=0] : minimum global score
- min_eiPython float
[default=0] : minimum Evidence Index
- min_dsiPython float
[default=0.25] : minimum Disease Specificity Index
- min_dpiPython float
[default=0] : minimum Disease Pleiotropy Index
- chunksizePython integer
[default=100] : size of chunks (1 chunk per request)
- user_keyPython character string or None
[default=None] : API key from DisGeNET
- quietPython bool
[default=False] : prints out verbose
Returns
- res_dfPandas DataFrame
rows/[Disease CID] x columns/[“Protein”, “Gene Name”] or None if Not found.
NORDic.UTILS.LINCS_utils module
- NORDic.UTILS.LINCS_utils.binarize_via_CD(df, samples, binarize=1, nperm=10000, quiet=False)
Run a differential expression analysis on a dataframe using Characteristic Direction (CD) [1] (implementation: www.maayanlab.net/CD/) [1] doi.org/10.1186/1471-2105-15-79
…
Parameters
- dfPandas DataFrame
one transcriptional profile per column (warning: if there are more than 25,000 genes, only the 25,000 genes with the highest variance are considered)
- samplesPython integer list
indicates which columns correspond to control (=1) / treated (=2) samples
- binarizePython integer
[default=1] : whether to return a binary signature or a real-valued column ~magnitude of change in expression
- npermPython integer
[default=10000] : number of iterations to build the null distribution on which p-values will be computed
- quietPython bool
[default=False] : prints out verbose
Returns
- signaturePandas DataFrame
rows/[gene index] x columns/[“aggregated”]: 0=down-regulated (DR), 1=up-regulated (UR) (if binarize=1) else <0=DR, >0=UR
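The two output conventions described above (real-valued versus binary) can be related with a small sketch. It only illustrates the encoding 0=DR, 1=UR from the sign of a real-valued column; it does not reproduce the Characteristic Direction computation. The helper name binarize_signature and the choice to leave zero-valued genes undetermined (NaN) are assumptions:

```python
import numpy as np
import pandas as pd

def binarize_signature(sig):
    """Map a real-valued "aggregated" column to 0 (down) / 1 (up).

    Assumption: genes with a value of exactly 0 are left as NaN.
    """
    vals = sig["aggregated"]
    binary = pd.Series(np.nan, index=sig.index, name="aggregated")
    binary.loc[vals > 0] = 1.0  # up-regulated (UR)
    binary.loc[vals < 0] = 0.0  # down-regulated (DR)
    return binary.to_frame()

# Toy real-valued signature with made-up values
signature = pd.DataFrame({"aggregated": [-0.8, 0.3, 0.0, 1.2]},
                         index=["g1", "g2", "g3", "g4"])
binary_sig = binarize_signature(signature)
print(binary_sig)
```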
- NORDic.UTILS.LINCS_utils.build_url(endpoint, method, params, user_key=None)
Builds the request to CLUE API
…
Parameters
- endpointPython character string
in [“sigs”, “cells”, “genes”, “perts”, “plates”, “profiles”, “rep_drugs”, “rep_drug_indications”, “pcls”]
- methodPython character string
in [“count”, “filter”, “distinct”]
- paramsPython dictionary
additional arguments for the request
- user_keyPython character string
[default=None] : API key for LINCS CLUE.io
Returns
- urlPython character string
URL of request
- NORDic.UTILS.LINCS_utils.compute_interference_scale(sigs, samples, entrez_id, is_oe, taxon_id, lincs_specific_ctl_genes, quiet=True, eps=2e-07)
Computes the interference scale [1] which determines whether a genetic perturbation was successful [1] doi.org/10.1002/psp4.12107
…
Parameters
- sigsPandas DataFrame
rows/[genes] x columns/[control and treated samples]
- samplesPython integer list
contains 1 for control samples, 2 for treated ones for each column of @sigs
- entrez_idPython integer
EntrezID of the perturbed gene
- is_oePython bool
is the experiment an overexpression of the perturbed gene (is_oe=True) or a knockdown
- taxon_idPython integer
NCBI taxonomy ID
- lincs_specific_ctl_genesPython string list
list of HGNC gene symbols for housekeeping genes
- quietPython bool
[default=True] : prints out verbose
- epsPython float
[default=2e-7] : avoids numerical errors for low-expression housekeeping genes
Returns
- iscalePython float
interference scale for the input experiment
- NORDic.UTILS.LINCS_utils.convert_ctrlgenes_EntrezGene(taxon_id)
Retrieves EntrezID from control genes in LINCS L1000 [1] [1] doi.org/10.1002/psp4.12107
…
Parameters
- taxon_idPython integer
NCBI taxonomy ID
Returns
- lincs_specific_ctl_genesPython character string list
list of EntrezGene IDs for the control genes in LINCS L1000
- NORDic.UTILS.LINCS_utils.create_restricted_drug_signatures(sig_ids, entrezid_list, path_to_lincs, which_lvl=[3], strict=True, quiet=False)
Create dataframe of drug signatures from LINCS L1000 from a subset of signature and gene IDs
…
Parameters
- sig_idsPython character string list
list of signature IDs from LINCS L1000 (Level 3: “distil_id”, Level 5: “sig_id”)
- entrezid_listPython character string list
list of EntrezIDs
- path_to_lincsPython character string
folder in which LINCS L1000-related files are stored
- which_lvlPython integer list
[3] for Level 3, [5] for Level 5
- strictPython bool
[default=True] : if set to True, if not all signatures are retrieved, then return None. If set to False, return the (sub)set of retrievable signatures
- quietPython bool
[default=False] : prints out verbose
Returns
- sigs Pandas DataFrame
rows/[genes] x columns/[drugs]
- NORDic.UTILS.LINCS_utils.download_file(path_to_lincs, file_name, base_url, file_sha, check_SHA=True, quiet=False)
Automatically downloads LINCS L1000-related files from Gene Expression Omnibus (GEO) (warning: this can be time-consuming; expect waiting times of up to 20 min with a good Internet connection)
…
Parameters
- path_to_lincsPython character string
path to local LINCS L1000 folder in which the files will be downloaded
- file_namePython character string
file name to download on GEO
- base_urlPython character string
path to GEO repository
- file_shaPython character string
file name of corresponding SHA hash to check file integrity
- check_SHAPython bool
[default=True] : whether to check the file integrity
- quietPython bool
[default=False] : prints out verbose
Returns
- 0Python integer
0 means that the download was successful
- NORDic.UTILS.LINCS_utils.download_lincs_files(path_to_lincs, which_lvl)
Downloads the proper LINCS L1000 files from Gene Expression Omnibus (GEO) and returns their names
…
Parameters
- path_to_lincsPython character string
path to folder in which LINCS L1000-related files will be locally stored
- which_lvlPython integer list
LINCS L1000 Level to download (either [3] (normalized gene expression), [5] (binary experimental signatures), or [3,5])
Returns
- file_listPython list of 4 Python character string lists
gene_files, sig_files, lvl3_files, lvl5_files Python lists of character strings
- NORDic.UTILS.LINCS_utils.get_treated_control_dataset(treatment, pert_type, cell, filters, entrez_ids, taxon_id, user_key, path_to_lincs, entrez_id=None, selection='distil_ss', dose=None, iunit=None, itime=None, which_lvl=[5], nsigs=2, same_plate=True, quiet=False, trim_w_interference_scale=True, return_metrics=[])
Retrieves a set of experimental profiles, with at least nsigs treated and nsigs control samples
…
Parameters
- treatmentPython character string
HUGO gene symbol
- pert_typePython character string
type of perturbation as accepted by LINCS L1000
- cellPython character string
cell line existing in LINCS L1000
- filtersPython dictionary
additional parameters for the LINCS L1000 requests
- entrez_idsPython integer list
EntrezID genes
- taxon_idPython integer
NCBI taxonomy ID
- user_keyPython character string
LINCS L1000 user API key
- path_to_lincsPython character string
path where LINCS L1000 files are locally stored
- entrez_idPython integer
EntrezID identifier for HUGO gene symbol treatment
- selectionPython character string
[default=”distil_ss”] : LINCS L1000 metric which is maximized by a given experiment
- dosePython character string or None
[default=None] : filter by dose (if not None)
- iunitPython character string or None
[default=None] : unit of dose
- itimePython character string or None
[default=None] : filter by exposure time (if not None)
- which_lvlPython integer list
[default=[5]] : LINCS L1000 data level to consider (either [3] or [5])
- nsigsPython integer
[default=2] : minimal number of samples of each condition in each experiment
- same_platePython bool
[default=True] : select samples from the same plate for each experiment and condition
- quietPython bool
[default=False] : prints out verbose
- trim_w_interference_scalePython bool
[default=True] : computes the interference scale criteria for further trimming
- return_metricsPython character string list
[default=[]] : list of LINCS L1000 metrics to return at the same time as the profiles
Returns
- sigsPandas DataFrame
rows/[genes+”annotation”+”signame”+”sigid”] x columns/[profiles] or None
- NORDic.UTILS.LINCS_utils.get_user_key(fname)
Retrieves user key for interacting with LINCS L1000 CLUE API
…
Parameters
- fnamePython character string
path to file containing credentials for LINCS L1000 (first line: username, second line: password, third line: user key)
Returns
- user_keyPython character string
identifier for the LINCS L1000 CLUE API
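The credentials file layout described above (username, password, user key on three successive lines) can be read as sketched below. This is an illustration under that stated layout; the helper name read_user_key and the placeholder credentials are hypothetical:

```python
import os
import tempfile

def read_user_key(fname):
    """Return the third line of a credentials file (username, password, user key)."""
    with open(fname, "r") as f:
        lines = f.read().splitlines()
    if len(lines) < 3:
        raise ValueError("expected three lines: username, password, user key")
    return lines[2].strip()

# Demonstration with a throwaway file and placeholder credentials
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("my_username\nmy_password\nplaceholder-user-key\n")
user_key = read_user_key(tmp.name)
os.remove(tmp.name)
print(user_key)
```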
- NORDic.UTILS.LINCS_utils.post_request(url, quiet=True, pause_time=1)
Post request to API
…
Parameters
- urlPython character string
URL formatted as in build_url
- quietPython bool
[default=True] : prints out verbose
- pause_timePython integer
[default=1] : minimum time in seconds between each request
Returns
- dataPython dictionary
(JSON) or Python character string list (if request was method=”distinct”)
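The pause_time parameter above enforces a minimum delay between successive requests. A minimal sketch of such a throttle, written independently of the actual implementation; the class name Throttle and its injectable clock/sleep arguments are hypothetical and only there to make the logic easy to check:

```python
import time

class Throttle:
    """Enforce a minimum delay (in seconds) between successive calls to wait()."""

    def __init__(self, pause_time=1.0, clock=time.monotonic, sleep=time.sleep):
        self.pause_time = pause_time
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first one

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.pause_time - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)  # sleep only for the missing time
                now = self._clock()
        self._last = now

throttle = Throttle(pause_time=0.05)
for _ in range(3):
    throttle.wait()  # a request would be sent right after this call
```

Calling wait() before each request keeps consecutive requests at least pause_time seconds apart without sleeping longer than needed.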
- NORDic.UTILS.LINCS_utils.select_best_sig(params, filters, user_key, selection='distil_ss', nsigs=2, same_plate=True, iunit=None, quiet=False)
Select “best” set of profiles (“experiment”) (in terms of quality, or criterion “selection”) according to filters
…
Parameters
- paramsPython dictionary
additional arguments for the request
- filtersPython dictionary
additional arguments for filtering the results of the request (defined with params)
- selectionPython character string
[default=”distil_ss”] : name of the metric in LINCS L1000 to define the best signature
- nsigsPython integer
[default=2] : minimum number of signatures to retrieve
- same_platePython bool
[default=True] : whether to retrieve signatures from the same plate or not
- iunitPython character string or None
[default=None] : unit of dose (if None, any)
- quietPython bool
[default=False] : prints out verbose
Returns
- dataPython dictionary list
the list of profile IDs to retrieve from LINCS L1000
NORDic.UTILS.STRING_utils module
- NORDic.UTILS.STRING_utils.get_app_name_STRING(fname)
Retrieves app name from STRING to interact with the API
…
Parameters
- fnamePython character string
path to a file with a single line = email address
Returns
- app_namePython character string
identifier for the STRING API
- NORDic.UTILS.STRING_utils.get_image_from_STRING(my_genes, taxon_id, file_name='network.png', min_score=0, network_flavor='evidence', network_type='functional', app_name=None, version='11.5', quiet=False)
Retrieves the image of the STRING network associated with the input genes in the correct species
…
Parameters
- my_genesPython character string list
list of gene symbols
- taxon_idPython integer
taxon ID from NCBI
- file_namePython character string
[default=”network.png”] : image file name
- min_scorePython float
[default=0] : confidence lower threshold (in [0,1])
- network_flavorPython character string
[default=”evidence”] : show links related to [“confidence”, “action”, “evidence”]
- network_typePython character string
[default=”functional”] : show “functional” or “physical” network
- app_namePython character string
[default=None] : identifier for STRING
- quietPython bool
[default=False] : prints out verbose
Returns
- None
writes the network image to a file file_name
- NORDic.UTILS.STRING_utils.get_interactions_from_STRING(gene_list, taxon_id, min_score=0, app_name=None, file_folder=None, version='11.0', strict=False, quiet=False)
Retrieves (un)directed and (un)signed physical interactions from the STRING database
…
Parameters
- gene_listPython character string list
list of genes
- taxon_idPython integer
NCBI taxonomy ID
- min_scorePython integer
[default=0] : in [0,1] STRING combined score
- app_namePython character string
[default=None] : identifier for STRING
- file_folderPython character string
[default=None]: where to save the file from STRING (if None, the file is not saved)
- versionPython character string
[default=“11.0”] : STRING database version
- strictPython bool
[default=False] : if set to True, only keep interactions involving genes BOTH in @gene_list
- quietPython bool
[default=False] : prints out verbose
Returns
- res_dfPandas Dataframe
rows/[interaction number] x columns/[“preferredName_A”, “preferredName_B”, “sign”, “directed”, “score”]
- NORDic.UTILS.STRING_utils.get_interactions_partners_from_STRING(gene_list, taxon_id, min_score=0, network_type='functional', add_nodes=0, limit=5, app_name=None, version='11.5', quiet=False)
Retrieves undirected and unsigned interactions from the STRING database
…
Parameters
- gene_listPython character string list
list of gene symbols
- taxon_idPython integer
NCBI taxonomy ID
- min_scorePython integer
[default=0] : minimum STRING combined edge score in [0,1]
- network_typePython character string
[default=”functional”] : returns “functional” or “physical” network
- limitPython integer
[default=5] : limits the number of interaction partners retrieved per protein (most confident interactions come first)
- app_namePython character string
[default=None] : identifier for STRING
- versionPython character string
[default=”11.5”] : STRING version
- quietPython bool
[default=False] : prints out verbose
Returns
- networkPandas DataFrame
rows/[row number] x columns/[“preferredName_A”,”preferredName_B”,”score”,”directed”]
- NORDic.UTILS.STRING_utils.get_network_from_STRING(gene_list, taxon_id, min_score=0, network_type='functional', add_nodes=0, app_name=None, version='11.5', quiet=False)
Retrieves undirected and unsigned interactions from the STRING database
…
Parameters
- gene_listPython character string list
list of gene symbols
- taxon_idPython integer
NCBI taxonomy ID
- min_scorePython integer
[default=0] : minimum STRING combined edge score in [0,1]
- network_typePython character string
[default=”functional”] : returns “functional” or “physical” network
- add_nodesPython integer
[default=0] : add nodes in the closest interaction neighborhood involved with the genes in @gene_list if set to 1
- app_namePython character string
[default=None] : identifier for STRING
- versionPython character string
[default=”11.5”] : STRING version
- quietPython bool
[default=False] : prints out verbose
Returns
- networkPandas DataFrame
rows/[row number] x columns/[“preferredName_A”,”preferredName_B”,”score”,”directed”]
- NORDic.UTILS.STRING_utils.get_protein_names_from_STRING(gene_list, taxon_id, app_name=None, version='11.5', quiet=False)
Retrieves protein IDs in STRING associated with input genes in the correct species
…
Parameters
- gene_listPython character string list
list of gene symbols
- taxon_idPython integer
taxon ID from NCBI
- versionPython character string
[default=”11.5”] : STRING version
- app_namePython character string
[default=None] : identifier to access STRING
- quietPython bool
[default=False] : prints out verbose
Returns
- res_dfPandas DataFrame
rows/[row number] x columns/[“queryItem”, “stringId”, “preferredName”, “annotation”]
- NORDic.UTILS.STRING_utils.string_api_url(v)
NORDic.UTILS.utils_data module
- NORDic.UTILS.utils_data.convert_EntrezGene_LINCSL1000(file_folder, EntrezGenes, user_key, quiet=False)
Converts EntrezIDs to Gene Symbols present in LINCS L1000
…
Parameters
- file_folderPython character string
path to folder of intermediate results
- EntrezGenesPython character string list
list of EntrezGene IDs
- user_keyPython character string
LINCS L1000 user key
- quietPython bool
[default=False] : prints out verbose
Returns
- PandasPandas DataFrame
rows/[EntrezID] x columns/[“Gene Symbol”,”Entrez ID”] (“-” if they do not exist)
- NORDic.UTILS.utils_data.convert_genes_EntrezGene(gene_list, taxon_id, app_name, chunksize=100, missing_genes={'C11ORF74': 'IFTAP', 'ENSP00000451560': 'TPPP2', 'RP11-566K11.2': 'TUBB4'}, quiet=False)
Convert gene symbols into EntrezGene CID
…
Parameters
- gene_listPython character string list
list of genes
- taxon_idPython character string
NCBI taxonomy ID
- app_namePython character string
STRING identifier
- missing_genesPython dictionary of character string x character string
known conversions
- chunksizePython integer
[default=100] : 1 chunk per request
- quietPython bool
[default=False] : prints out verbose
Returns
- res_dfPandas DataFrame
rows/[“InputValue”] x columns/[“Gene ID”/might be separated by “; “] (“-” if they do not exist) or None if no identifier has been found
- NORDic.UTILS.utils_data.get_all_celllines(pert_inames, user_key, quiet=False)
Get all cell lines in which one gene in the input list has been specifically perturbed (genetic perturbation)
…
Parameters
- pert_inamesPython character string list
List of genes (symbols from LINCS L1000)
- user_keyPython character string
user key from LINCS L1000 CLUE API
- quietPython bool
[default=False] : prints out verbose
Returns
- cell_linesPython character string list
list of cell lines in which at least one gene from pert_inames has been perturbed
- NORDic.UTILS.utils_data.request_biodbnet(probe_list, from_, to_, taxon_id, chunksize=500, quiet=False)
Converts gene identifiers from type from_ to type to_ in a given species
…
Parameters
- probe_listPython character string list
list of probes to convert (of type from_)
- from_Python character string
an identifier type as recognized by BioDBnet
- to_Python character string
an identifier type as recognized by BioDBnet
- taxon_idPython integer
NCBI taxonomy ID
- chunksizePython integer
[default=500] : 1 chunk per request
Returns
NORDic.UTILS.utils_exp module
- NORDic.UTILS.utils_exp.get_experimental_constraints(file_folder, cell_lines, pert_types, pert_di, taxon_id, selection, user_key, path_to_lincs, thres_iscale=None, nsigs=2, quiet=False)
Retrieve experimental profiles from the provided cell lines, perturbation types, list of genes, in the given species (taxon ID)
…
Parameters
- file_folderPython character string
folder where to store intermediary results
- cell_linesPython character string list
cell lines present in LINCS L1000
- pert_typesPython character string list
types of perturbations as supported by LINCS L1000
- pert_diPython dictionary
(keys=Python character string, values=Python integer) associates HUGO gene symbols to their EntrezGene IDs
- taxon_idPython integer
NCBI taxonomy ID
- selectionPython character string
LINCS L1000 metric to maximize
- user_keyPython character string
LINCS L1000 user API key
- path_to_lincsPython character string
path to local LINCS L1000 files
- thres_iscalePython float or None
[default=None] : lower threshold on the interference scale which quantifies the success of a genetic experiment
- nsigsPython integer
[default=2] : minimal number of profiles per experiment and condition
- quietPython bool
[default=False] : prints out verbose
Returns
- signaturesPandas DataFrame
rows/[genes+annotations] x columns/[profile/signature IDs]
- NORDic.UTILS.utils_exp.profiles2signatures(profiles_df, user_key, path_to_lincs, save_fname, backgroundfile=False, selection='distil_ss', thres=0.5, bin_method='binary', nbackground_limits=(4, 30), quiet=False)
Convert experimental profiles into signatures (1 for control samples, 1 for treated ones)
…
Parameters
- profiles_dfPandas DataFrame
rows/[genes+annotations] x columns/[samples]
- user_keyPython character string
LINCS L1000 user API key
- path_to_lincsPython character string
path to local LINCS L1000 files
- save_fnamePython character string
path to save normalized expression profiles per cell line
- backgroundfilePython bool
[default=False] : retrieves from LINCS L1000 supplementary expression values if set to True to compute more precise basal gene expression levels
- selectionPython character string
[default=”distil_ss”] : LINCS L1000 metric to maximize for the “background” data
- thresPython float
[default=0.5] : threshold for cutoff normalized gene expression values (in [0,0.5])
- bin_methodPython character string
[default=”binary”] : binarization approach
- nbackground_limitsPython integer tuple
[default=(4,30)] : lower and upper bounds on the number of profiles for the background expression data
- quietPython bool
[default=False] : prints out verbose
Returns
- signatures_df Pandas DataFrame
rows/[genes] x columns/[signature ID]
NORDic.UTILS.utils_grn module
- NORDic.UTILS.utils_grn.CL(influences)
Computes the average of node-wise clustering coefficients. The clustering coefficient of a node is the ratio of the degree of the considered node and the maximum possible number of connections such that this node and its current neighbors form a clique
…
Parameters
- influencesPandas DataFrame
rows/[genes] x columns/[genes]
Returns
- CLPython float
network clustering coefficient
- NORDic.UTILS.utils_grn.Centr(influences)
Computes the network centralization, which is correlated with the similarity of the network to a graph with a star topology
…
Parameters
- influencesPandas DataFrame
rows/[genes] x columns/[genes]
Returns
- CentrPython float
network centralization
- NORDic.UTILS.utils_grn.DS(influences)
Computes the number of edges over the maximum number of possible connections between the nodes in the network
…
Parameters
- influencesPandas DataFrame
rows/[genes] x columns/[genes]
Returns
- DSPython float
network density
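The density definition above (number of edges over the maximum number of possible connections) can be sketched as follows. The convention used here is an assumption: the network is treated as undirected, self-loops are ignored, and the maximum is n(n-1)/2. The function name density is hypothetical:

```python
import numpy as np
import pandas as pd

def density(influences):
    """Number of undirected edges divided by n*(n-1)/2 (assumed convention)."""
    A = (influences.values != 0)
    A = A | A.T                 # ignore edge direction
    np.fill_diagonal(A, False)  # ignore self-loops
    n = A.shape[0]
    if n < 2:
        return 0.0
    n_edges = A.sum() / 2       # each undirected edge is counted twice
    return float(n_edges / (n * (n - 1) / 2))

genes = ["A", "B", "C"]
# Toy network: A -> B (positive), B -> C (negative)
influences = pd.DataFrame([[0, 1, 0], [0, 0, -1], [0, 0, 0]],
                          index=genes, columns=genes)
ds = density(influences)
print(ds)  # 2 edges out of 3 possible pairs
```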
- NORDic.UTILS.utils_grn.GT(influences)
Computes the network heterogeneity, which quantifies the non-uniformity of the node degrees across the network
…
Parameters
- influencesPandas DataFrame
rows/[genes] x columns/[genes]
Returns
- GTPython float
network heterogeneity
- NORDic.UTILS.utils_grn.build_influences(network_df, tau, beta=1, cor_method='pearson', expr_df=None, accept_nonRNA=False, quiet=False)
Filters out edges (and assigns signs to unsigned edges) based on gene expression
…
Parameters
- network_dfPandas DataFrame
rows/[index] x columns/[[“Input”, “Output”, “SSign”]] interactions
- tauPython float
threshold on pairwise gene expression correlation
- betaPython integer
[default=1] : power applied to the adjacency matrix
- cor_methodPython character string
[default=”pearson”] : type of correlation
- expr_dfPandas DataFrame
[default=None] : rows/[genes] x columns/[samples] gene expression data
- accept_nonRNAPython bool
[default=False] : if set to False, ignores gene names which are not present in expr_df
- quietPython bool
[default=False] : prints out verbose
Returns
- influencesPandas DataFrame
rows/[genes] x columns/[genes] signed adjacency matrix with only interactions s.t. corr^beta>=tau
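The filtering rule stated above (keep interactions such that corr^beta>=tau) can be illustrated with a short sketch. Several points are assumptions rather than documented behaviour: the absolute value of the correlation is thresholded, a sign code of 2 marks an unsigned edge whose sign is then taken from the correlation, and rows of the result are regulators while columns are targets. The function name filter_edges_by_correlation is hypothetical:

```python
import numpy as np
import pandas as pd

def filter_edges_by_correlation(network_df, expr_df, tau, beta=1, cor_method="pearson"):
    """Keep edges with |corr|^beta >= tau and sign the unsigned ones (code 2)."""
    corr = expr_df.T.corr(method=cor_method)  # genes x genes correlation matrix
    genes = list(expr_df.index)
    influences = pd.DataFrame(0, index=genes, columns=genes)
    for _, row in network_df.iterrows():
        g_in, g_out, sign = row["Input"], row["Output"], row["SSign"]
        if g_in not in corr.index or g_out not in corr.index:
            continue  # gene without expression data
        c = corr.loc[g_in, g_out]
        if abs(c) ** beta < tau:
            continue  # correlation too weak: edge filtered out
        influences.loc[g_in, g_out] = int(np.sign(c)) if sign == 2 else int(sign)
    return influences

# Toy expression data (rows: genes, columns: samples)
expr_df = pd.DataFrame({"s1": [1.0, 2.0, 4.0, 1.0], "s2": [2.0, 4.0, 3.0, -1.0],
                        "s3": [3.0, 6.0, 2.0, 1.0], "s4": [4.0, 8.5, 1.0, -1.0]},
                       index=["a", "b", "c", "d"])
network_df = pd.DataFrame({"Input": ["a", "a", "a"], "Output": ["b", "c", "d"],
                           "SSign": [1, 2, 1]})
influences = filter_edges_by_correlation(network_df, expr_df, tau=0.8)
print(influences)
```

In this toy example, the edge a-d is dropped because the two genes are weakly correlated, and the unsigned edge a-c becomes negative because the two genes are anti-correlated.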
- NORDic.UTILS.utils_grn.build_observations(grn, signatures, quiet=False)
Implement experimental constraints from perturbation experiments in signatures. Experimental constraints are of the form Initial state masked by single-gene perturbation can lead to a steady attractor state Final
…
Parameters
- grnInfluenceGraph (from BoneSiS)
contains topological constraints
- signaturesPandas DataFrame
rows/[genes] x columns/[experiment IDs]. Experiment IDs is of the form “<pert. gene>_<pert. type>_<…>_<cell line>” (treated) or “initial_<cell line>” (control)
- quietPython bool
[default=False] : prints out verbose
Returns
- BOBoNesis object (from BoneSiS)
BoNesis object which can be evaluated
- NORDic.UTILS.utils_grn.create_grn(influences, exact=False, max_maxclause=3, quiet=False)
Create a BoneSiS InfluenceGraph
…
Parameters
- influencesPandas DataFrame
rows/[genes] x columns/[genes] of interactions, values in {-1,1,0} -1:negative,1:positive,0:absent
- exactPython bool
[default=False] : should all interactions be preserved?
- max_maxclausePython integer
[default=3] : upper bound on the number of clauses in DNF form
- quietPython bool
[default=False] : prints out verbose
Returns
- grnBoneSiS InfluenceGraph class object
BoneSiS GRN object
- NORDic.UTILS.utils_grn.desirability(x, f_weight_di, A=0, B=1)
Harrington’s desirability function, used by [1]. Converts a list of functions to maximize into a single scalar function to maximize, with values in [@A,@B]
- [1] http://ceur-ws.org/Vol-2488/paper17.pdf
https://cran.r-project.org/web/packages/desirability/vignettes/desirability.pdf
…
Parameters
- xdatapoint
any input to functions in f_weight_di
- f_weight_diPython dictionary
function with arguments as the same type as x, and associated weight
- APython float
[default=0] : lower bound of the function interval
- BPython float
[default=1] : upper bound of the function interval
Returns
- des(x)Python float
value of the desirability function at point x
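As an illustration of how several functions to maximize can be combined into a single score, here is a minimal sketch based on the classic one-sided Harrington transform d = exp(-exp(-y)), combined by a weighted geometric mean and rescaled to [A,B]. Whether NORDic uses exactly this transform, this combination rule and this rescaling is an assumption; the function name harrington_desirability is hypothetical:

```python
import math

def harrington_desirability(x, f_weight_di, A=0.0, B=1.0):
    """Weighted geometric mean of one-sided Harrington desirabilities, in [A, B]."""
    total_weight = sum(f_weight_di.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    log_d = 0.0
    for f, w in f_weight_di.items():
        # d = exp(-exp(-f(x))) lies in (0, 1); we accumulate w * log(d)
        log_d += w * (-math.exp(-f(x)))
    D = math.exp(log_d / total_weight)  # weighted geometric mean
    return A + (B - A) * D              # rescale to [A, B]

# Two toy objectives with equal weights
objectives = {(lambda x: x): 1.0, (lambda x: 2 * x): 1.0}
score_low = harrington_desirability(0.0, objectives)
score_high = harrington_desirability(1.0, objectives)
print(score_low, score_high)
```

Because each individual desirability increases with the value of its function, improving any objective increases the combined score.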
- NORDic.UTILS.utils_grn.general_topological_parameter(influences, weights)
Computes the general topological parameter (GTP) associated with the input network
…
Parameters
- influencesPandas DataFrame
rows/[genes] x columns/[genes]
- weightsPython dictionary of (Python character string x Python float)
all keys must be in [“DS”,”CL”,”Centr”,”GT”]
Returns
- scorePython float
score using the Harrington’s desirability function
- NORDic.UTILS.utils_grn.get_genes_downstream(network_fname, gene, n=-1)
Get the list of genes downstream of a gene in a network
…
Parameters
- network_fnamePython character string
path to the .BNET file associated with the network
- genePython character string
gene name in the network
- nPython integer
[default=-1] : number of recursions (if<0, recursively get all downstream genes)
Returns
- lst_downstreamPython character string list
list of nodes downstream of @gene
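The traversal described above can be sketched on an in-memory graph instead of a .BNET file. In this sketch, the dictionary targets (regulator to list of regulated genes) is a hypothetical stand-in for the parsed network, n is interpreted as a depth limit, and the starting gene itself is never reported; these are assumptions:

```python
def genes_downstream(targets, gene, n=-1):
    """Breadth-first listing of genes downstream of gene, up to depth n (all if n<0)."""
    visited = []
    seen = {gene}
    frontier = [gene]
    depth = 0
    while frontier and (n < 0 or depth < n):
        next_frontier = []
        for g in frontier:
            for t in targets.get(g, []):
                if t not in seen:
                    seen.add(t)
                    visited.append(t)
                    next_frontier.append(t)
        frontier = next_frontier
        depth += 1
    return visited

# Toy network: A regulates B and C, B regulates D, D regulates A (cycle)
targets = {"A": ["B", "C"], "B": ["D"], "C": [], "D": ["A"]}
print(genes_downstream(targets, "A"))       # ['B', 'C', 'D']
print(genes_downstream(targets, "A", n=1))  # ['B', 'C']
```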
- NORDic.UTILS.utils_grn.get_genes_interactions_from_PPI(ppi, connected=False, score=0, filtering=True, quiet=False)
Filtering edges to decrease computational cost while preserving network connectivity (if needed)
…
Parameters
- ppiPandas DataFrame
rows/[index] x columns/[[“preferredName_A”, “preferredName_B”, “sign”, “directed”, “score”]]; sign in {-1,1,2}, directed in {0,1}, score in [0,1]
- connectedPython bool
[default=False] : if set to True, preserve/enforce connectivity on the final network
- scorePython float
[default=0] : Lower bound on the edge-associated score
- filteringPython bool
[default=True] : Whether to filter out edges by a correlation threshold
- quietPython bool
[default=False] : prints out verbose
Returns
- ppi_acceptedPandas DataFrame
rows/[index] x columns/[[“Input”, “Output”]]
- NORDic.UTILS.utils_grn.get_genes_most_variable(control_profiles, treated_profiles, p=0.8)
Get the list of genes which contribute most to the variation between two conditions (in the @pth percentile of change)
…
Parameters
- control_profilesPandas DataFrame
rows/[genes] x columns/[samples] profiles from condition 1
- treated_profilesPandas DataFrame
rows/[genes] x columns/[samples] profiles from condition 2
- pPython float
100*p th percentile to consider
Returns
- lst_genesPython character string list
list of nodes which contribute most to the variation between conditions
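A possible reading of the percentile-based selection above is sketched below. The measure of change (absolute difference between the mean expression in each condition) is an assumption, as the documentation does not state which one is used; the function name most_variable_genes is hypothetical:

```python
import numpy as np
import pandas as pd

def most_variable_genes(control_profiles, treated_profiles, p=0.8):
    """Genes whose absolute change in mean expression is in the 100*p th percentile."""
    change = (treated_profiles.mean(axis=1) - control_profiles.mean(axis=1)).abs()
    threshold = np.quantile(change.values, p)
    return list(change[change >= threshold].index)

genes = ["g1", "g2", "g3", "g4", "g5"]
# Toy profiles: only g5 changes strongly between the two conditions
control = pd.DataFrame(1.0, index=genes, columns=["c1", "c2"])
treated = pd.DataFrame({"t1": [1.1, 1.2, 1.3, 1.4, 6.0],
                        "t2": [1.1, 1.2, 1.3, 1.4, 6.0]}, index=genes)
lst_genes = most_variable_genes(control, treated, p=0.8)
print(lst_genes)  # ['g5']
```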
- NORDic.UTILS.utils_grn.get_grfs_from_solution(solution)
Retrieve all gene regulatory functions (GRFs) from a given solution
…
Parameters
- solutionPandas Series
rows/[genes]
Returns
- grfsPython dictionary
{gene: {regulator: sign, …}, …} where sign in {-1,1} -1: inhibitor, 1: activator
- NORDic.UTILS.utils_grn.get_maxdegree(influences, activatory=True, quiet=False)
Computes the maximum ingoing degree (or the maximum number of potential activatory regulators) in a graph
…
Parameters
- influencesPandas DataFrame
rows/[genes] x columns/[genes] of interactions: -1:negative,1:positive,0:absent
- activatoryPython bool
[default=True] : computes the maximum number of potential activatory regulators instead
- quietPython bool
[default=False] : prints out verbose
Returns
- maxindegreePython integer
maximum ingoing degree (or the maximum number of potential activatory regulators)
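The two quantities above can be computed directly from the signed adjacency matrix. The sketch assumes that rows are regulators and columns are regulated genes, which the documentation does not state explicitly; the function name max_indegree is hypothetical:

```python
import pandas as pd

def max_indegree(influences, activatory=True):
    """Maximum number of (activatory) regulators over all genes.

    Assumption: influences.loc[regulator, target] holds the edge sign.
    """
    mask = (influences == 1) if activatory else (influences != 0)
    return int(mask.sum(axis=0).max())  # column-wise count = ingoing edges

genes = ["A", "B", "C"]
# A activates B and C, B inhibits C, C inhibits B
influences = pd.DataFrame([[0, 1, 1], [0, 0, -1], [0, -1, 0]],
                          index=genes, columns=genes)
print(max_indegree(influences, activatory=True))   # 1
print(max_indegree(influences, activatory=False))  # 2
```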
- NORDic.UTILS.utils_grn.get_minimal_edges(R, maximal=False)
Return one of the solutions with the smallest (or greatest) number of edges
…
Parameters
- RPandas DataFrame
rows/[genes] x columns/[solution IDs]
- connectedPython bool
[default=False] : if set to True, return the CONNECTED solution which satisfies those constraints
- maximalPython bool
[default=False] : if set to True, return the solution with the greatest number of edges
Returns
- solution, nedgesPython integer x Python integer
solution and corresponding number of edges
- NORDic.UTILS.utils_grn.get_weakly_connected(network_df, gene_list, index_col='preferredName_A', column_col='preferredName_B', score_col='sscore')
Depth-first search (DFS) on undirected network
…
Parameters
- network_dfPandas DataFrame
rows/[index] x columns/[[“Input”,”Output”]]
- gene_listPython character string list
list of genes (needed to take into account isolated genes in the network)
- index_colPython character string
[default=”preferredName_A”] : column in network_df (input gene)
- column_colPython character string
[default=”preferredName_B”] : column in network_df (output gene)
- score_colPython character string
[default=”sscore”] : column in network_df (edge weight)
Returns
- componentsType of @network_df.loc[network_df.index[0]][“Input”] Python list of Python list
list of weakly connected components in the network, ordered by decreasing size
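The depth-first search above can be sketched as follows, ignoring edge direction and edge scores. This is an independent illustration, not the library's code; the function name weakly_connected_components is hypothetical, and each component is returned sorted alphabetically for readability:

```python
import pandas as pd

def weakly_connected_components(network_df, gene_list,
                                index_col="preferredName_A",
                                column_col="preferredName_B"):
    """Iterative DFS returning components ordered by decreasing size."""
    # Isolated genes from gene_list get an empty neighbor set
    neighbors = {g: set() for g in gene_list}
    for a, b in zip(network_df[index_col], network_df[column_col]):
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    seen, components = set(), []
    for start in neighbors:
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nb in neighbors[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(sorted(component))
    return sorted(components, key=len, reverse=True)

network_df = pd.DataFrame({"preferredName_A": ["A", "B", "D"],
                           "preferredName_B": ["B", "C", "E"]})
components = weakly_connected_components(network_df, ["A", "B", "C", "D", "E", "F"])
print(components)  # [['A', 'B', 'C'], ['D', 'E'], ['F']]
```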
- NORDic.UTILS.utils_grn.infer_network(BO, njobs=1, fname='solutions', use_diverse=True, limit=50, niterations=1)
Infer solutions matching topological & experimental constraints
…
Parameters
- BOBonesis object (from BoneSiS)
contains topological & experimental constraints
- fnamePython character string
[default=”solutions”] : path to solution files
- use_diversePython bool
[default=True] : use the “diverse” procedure in BoneSiS
- limitPython integer
[default=50] : maximum number of solutions to generate per iteration
- niterationsPython integer
[default=1] : maximum number of iterations
Returns
- nsolutionsPython integer list
list of # solutions per iteration
- NORDic.UTILS.utils_grn.load_grn(fname)
Loads GRN as MPBN class element
…
Parameters
- fnamePython character string
BNET file
Returns
- BNmpbn.MPBooleanNetwork object
Boolean network with Most Permissive semantics
- NORDic.UTILS.utils_grn.reconnect_network(network_fname)
Write the network with all isolated nodes (no ingoing/outgoing edges) filtered out
…
Parameters
- network_fnamePython character string
path to the .BNET associated with the network
Returns
- fnamePython character string
path to the .BNET associated with the reconnected network
- NORDic.UTILS.utils_grn.save_grn(solution, fname, sep=', ', quiet=False, max_show=5, write=True)
Write and/or print .bnet file
…
Parameters
- solutionPandas Series
rows/[genes] contains gene regulatory functions (GRF)
- fnamePython character string
where to write the file (w/o .bnet extension)
- sepPython character string
what separates regulators from regulated genes
- quietPython bool
[default=False] : prints out verbose
- max_showPython integer
[default=5] : maximum number of printed GRFs
- writePython bool
[default=True] : if set to True, write to a .bnet file
Returns
- None
writes the GRN to a file fname
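To illustrate the kind of file this produces, the sketch below writes one line per gene in the usual .bnet layout (regulated gene, separator, regulatory function). The exact string format of the gene regulatory functions stored in the solution is an assumption, and the helper name write_bnet is hypothetical:

```python
import os
import tempfile
import pandas as pd

def write_bnet(solution, fname, sep=", "):
    """Write one 'gene<sep>function' line per gene to fname + '.bnet'."""
    path = fname + ".bnet"
    with open(path, "w") as f:
        for gene, grf in solution.items():
            f.write(f"{gene}{sep}{grf}\n")
    return path

# Toy solution: each gene is associated with a Boolean function given as a string
solution = pd.Series({"A": "B&!C", "B": "A", "C": "1"})
tmp_dir = tempfile.mkdtemp()
bnet_path = write_bnet(solution, os.path.join(tmp_dir, "example"))
with open(bnet_path) as f:
    bnet_text = f.read()
print(bnet_text)
```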
- NORDic.UTILS.utils_grn.save_solutions(bnetworks, fname, limit)
Enumerate and save solutions
…
Parameters
- bnetworksBonesis object
Output of the inference
- fnamePython character string
ZIP filename to store the solutions
- limitPython integer
maximum number of solutions to enumerate
Returns
- nPython integer
number of enumerated solutions
- NORDic.UTILS.utils_grn.solution2influences(solution)
Converts a solution object into an influences object
…
Parameters
- solutionPandas Series
rows/[genes]
Returns
- influencesPandas DataFrame
rows/[genes] x columns/[genes], with values in {-1,1,0,2} (-1: negative, 1: positive, 0: absent, 2: non-monotonic)
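To make the {-1,1,0,2} encoding concrete, here is a minimal sketch that derives such a matrix from Boolean rules given as strings. It is not NORDic's implementation. The helper name rules_to_influences is hypothetical, rules are assumed to use "!" for negation applied directly to a gene name (negated parentheses are not handled), and rows are assumed to be regulators while columns are targets.

```python
import re

def rules_to_influences(rules):
    """rules: dict {target gene: rule string using &, |, !}.

    Returns a nested dict influences[regulator][target] with values
    1 (only non-negated), -1 (only negated), 2 (both), 0 (absent).
    The sign is read syntactically, which is an assumption of this sketch.
    """
    genes = sorted(rules)
    influences = {g: {t: 0 for t in genes} for g in genes}
    for target, rule in rules.items():
        # Each match is (optional "!", gene name)
        for negation, regulator in re.findall(r"(!?)\s*([A-Za-z_]\w*)", rule):
            if regulator not in influences:
                continue  # ignore symbols that are not genes of the network
            sign = -1 if negation else 1
            current = influences[regulator][target]
            if current == 0:
                influences[regulator][target] = sign
            elif current != sign:
                influences[regulator][target] = 2  # appears with both signs
    return influences

example_influences = rules_to_influences(
    {"A": "B & !C", "B": "A | (!A & C)", "C": "C"}
)
```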
NORDic.UTILS.utils_network module
- NORDic.UTILS.utils_network.aggregate_networks(file_folder, gene_list, taxon_id, min_score, network_type, app_name, version_net='11.5', version_act='11.0', quiet=0)
Builds a prior knowledge network from a subset of genes through the following pipeline:
  - Retrieve protein actions and predicted PPIs from STRING
  - Merge the two networks while solving all inconsistencies (duplicates, paradoxes, etc.) in signs, directions, scores
  - Determine the greatest threshold on the edge score which allows all of the core gene set to be connected (binary search)
  - Trim out edges whose scores are below the threshold, and remove all isolated nodes
…
Parameters
- file_folderPython character string
relative path where to store files
- gene_listPython character string list
list of core gene symbols to preserve in the network
- taxon_idPython integer
NCBI taxonomy ID
- min_scorePython integer
minimum score on edges retrieved from the STRING database
- app_namePython character string
Identifier for STRING requests
- version_netPython character string
[default=”11.5”] : Number of version for interaction data in the STRING database. To avoid compatibility issues, it is strongly advised not to change this parameter
- version_actPython character string
[default=”11.0”] : Number of version for protein action data in the STRING database. To avoid compatibility issues, it is strongly advised not to change this parameter
- quietPython bool
[default=0] : prints out verbose
Returns
- final_networkPandas DataFrame
rows/[interactions] x columns/[[“preferredName_A”, “preferredName_B”, “sign”, “directed”, “score”]]
- NORDic.UTILS.utils_network.capture()
- NORDic.UTILS.utils_network.determine_edge_threshold(network, core_gene_set, quiet=True)
Determine the greatest threshold on the edge score which allows all of the core gene set to be connected (binary search)
…
Parameters
- networkPandas DataFrame
rows/[interactions] x at least three columns “preferredName_A” (input node), “preferredName_B” (output node), “score” (edge score)
- core_gene_setPython character string list
list of genes that should remain connected
- quietPython bool
[default=True] : prints out verbose
Returns
- tPython float
maximum threshold which allows the connection of all genes in the core set
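The binary search described above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: edges are given as (input node, output node, score) tuples rather than a DataFrame, they are treated as undirected when testing connectivity, and the names is_connected and greatest_threshold are hypothetical. The search relies on the fact that raising the threshold can only remove edges, so connectivity is monotone in the threshold.

```python
def is_connected(edges, core_genes, threshold):
    """True if all core genes lie in one component of the graph
    restricted to edges with score >= threshold (union-find)."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for source, target, score in edges:
        if score >= threshold:
            parent[find(source)] = find(target)
    return len({find(gene) for gene in core_genes}) == 1

def greatest_threshold(edges, core_genes):
    """Largest edge score t such that the core genes stay connected
    when only edges with score >= t are kept (None if impossible)."""
    scores = sorted({score for _, _, score in edges})
    if not scores or not is_connected(edges, core_genes, scores[0]):
        return None
    low, high = 0, len(scores) - 1
    while low < high:
        middle = (low + high + 1) // 2
        if is_connected(edges, core_genes, scores[middle]):
            low = middle       # still connected: try a higher threshold
        else:
            high = middle - 1  # disconnected: lower the threshold
    return scores[low]

toy_edges = [("A", "B", 0.9), ("B", "C", 0.5), ("A", "C", 0.3), ("C", "D", 0.7)]
toy_threshold = greatest_threshold(toy_edges, ["A", "B", "C"])
```

In the toy network, keeping only edges scoring at least 0.7 separates C from A and B, so the search settles on 0.5.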
- NORDic.UTILS.utils_network.get_network_from_OmniPath(gene_list=None, disease_name=None, species='human', sources_int='omnipath', domains_int=None, types_int=None, min_curation_effort=-1, domains_annot='HPA_tissue', quiet=False)
Retrieve a network from OmniPath
…
Parameters
- gene_listPython character string list
[default=None] : List of genes to consider (or do not filter the interactions from Omnipath if =None)
- disease_namePython character string
[default=None] : Disease name (in letters) to consider
- speciesPython character string
[default=”human”] : Species to consider (either “human”, “mouse”, or “rat”)
- sources_intPython character string
[default=”omnipath”] : Which databases for interactions to consider (if =None, consider them all)
- domains_intPython character string
[default=None] : source of interactions in OmniPath
- types_intPython character string
[default=None] : Types of interactions, e.g., “post_translational”, “transcriptional”, “post_transcriptional”, “mirna_transcriptional”
- min_curation_effortPython integer
[default=-1] : if positive, select edges based on that criterion (the higher, the better). Counts the unique database-citation pairs, i.e., how many times an interaction was described in a paper and mentioned in a database
- domains_annotPython character string
[default=’HPA_tissue’] : source of annotations in OmniPath
- quietPython bool
[default=False] : prints out verbose
Returns
- final_networkPandas DataFrame
rows/[interactions] x columns/[[“preferredName_A”, “preferredName_B”, “sign”, “directed”, “score”]]
- annot_widePandas DataFrame
rows/[gene symbols] x columns/[annotations from the database @domains_annot]
- NORDic.UTILS.utils_network.merge_network_PPI(network, PPI, quiet=True)
Merge two networks while solving all inconsistencies (duplicates, paradoxes, etc.) in signs, directions, scores
…
Parameters
- networkPandas DataFrame
rows/[interactions] x at least three columns “preferredName_A” (input node), “preferredName_B” (output node), “score” (edge score)
- PPIPandas DataFrame
rows/[interactions] x at least three columns “preferredName_A” (input node), “preferredName_B” (output node), “score” (edge score)
- quietPython bool
[default=True] : prints out verbose
Returns
- final_networkPandas DataFrame
rows/[interactions] x columns/[[“preferredName_A”, “preferredName_B”, “sign”, “directed”, “score”]]
- NORDic.UTILS.utils_network.remove_isolated(network, quiet=False)
Remove all nodes which do not belong to the largest connected component from the network
…
Parameters
- networkPandas DataFrame
rows/[interactions] x columns/[[“preferredName_A”, “preferredName_B”, “sign”, “directed”, “score”]]
- quietPython bool
[default=False] : prints out verbose
Returns
- trimmed_networkPandas DataFrame
rows/[interactions] x columns/[[“preferredName_A”, “preferredName_B”, “sign”, “directed”, “score”]]
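A minimal sketch of the filtering idea, assuming edges are given as (preferredName_A, preferredName_B) pairs and treated as undirected; the helper name largest_component_edges is hypothetical and the actual function operates on the DataFrame described above.

```python
from collections import defaultdict

def largest_component_edges(edges):
    """Keep only the edges whose endpoints belong to the largest
    connected component (edges are treated as undirected)."""
    adjacency = defaultdict(set)
    for source, target in edges:
        adjacency[source].add(target)
        adjacency[target].add(source)
    seen, best = set(), set()
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first traversal of the component containing start
        component, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for neighbour in adjacency[node]:
                if neighbour not in component:
                    component.add(neighbour)
                    stack.append(neighbour)
        seen |= component
        if len(component) > len(best):
            best = component
    return [(s, t) for s, t in edges if s in best and t in best]

trimmed_edges = largest_component_edges([("A", "B"), ("B", "C"), ("D", "E")])
```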
NORDic.UTILS.utils_plot module
- NORDic.UTILS.utils_plot.influences2graph(influences, fname, optional=False, compile2png=True, engine='sfdp')
Plots a network by conversion to a DOT file and then to PNG
…
Parameters
- influencesPandas DataFrame
rows/[genes] x columns/[genes], contains {-1,1,2}
- fnamePython character string
filename of png file
- optionalPython bool
[default=False] : should interactions be drawn as optional (dashed lines)?
Returns
- None
writes a DOT file which can be converted to PNG image (if compile2png=True)
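The conversion to DOT can be pictured with the sketch below. The helper name influences_to_dot, the colors and the arrowheads are illustrative choices, not necessarily those used by NORDic. The matrix is passed as a nested dict influences[regulator][target], assuming rows are regulators and columns are targets.

```python
def influences_to_dot(influences, optional=False):
    """Return DOT source for a signed influence graph.

    influences[regulator][target] is 1, -1 or 2; a value of 0 means no edge.
    Styling is an illustrative assumption of this sketch.
    """
    styles = {
        1: ("green", "normal"),  # positive influence
        -1: ("red", "tee"),      # negative influence
        2: ("black", "dot"),     # non-monotonic influence
    }
    line_style = "dashed" if optional else "solid"
    lines = ["digraph influences {"]
    for regulator, targets in influences.items():
        for target, sign in targets.items():
            if sign == 0:
                continue
            color, head = styles[sign]
            lines.append(
                f'  "{regulator}" -> "{target}" '
                f"[color={color}, arrowhead={head}, style={line_style}];"
            )
    lines.append("}")
    return "\n".join(lines)

dot_source = influences_to_dot({"A": {"B": 1, "C": -1}, "B": {"A": 0}}, optional=True)
```

Saved to a file, the resulting text can be rendered with a Graphviz layout program, for instance `sfdp -Tpng graph.dot -o graph.png`.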
- NORDic.UTILS.utils_plot.plot_boxplots(scores, patient_scores, ground_truth=None, fsize=12, msize=5, fname='boxplots.pdf')
Plots one boxplot per treatment (all values obtained on patient profiles)
…
Parameters
- scoresPandas DataFrame
rows/[drug names] x column/[value]
- patient_scoresPandas DataFrame
rows/[drug names] x columns/[patient samples]
- ground_truthPandas DataFrame
[default=None] : rows/[drug names] x column/[class] Values in 1: treatment, 0: unknown, -1: aggravating. If not provided: does not color boxplots according to the class
- fsizePython integer
[default=12] : font size
- msizePython integer
[default=5] : marker size
- fnamePython character string
[default=”boxplots.pdf”] : file name for the plot
Returns
- None
create boxplots of reward scores across patients for each drug
- NORDic.UTILS.utils_plot.plot_discrete_distributions(signatures, fname='signature_expression_distribution.png')
Plots the distributions (histograms) of genes with determined status across signatures
…
Parameters
- signaturesPandas DataFrame
rows/[genes] x columns/[samples] with values in {0,NaN,1}. Determined status is either 0 or 1.
- fnamePython character string
[default=”signature_expression_distribution.png”] : file name
Returns
- None
plots the number of genes with expression values 0, 1 or NaN in each signature
- NORDic.UTILS.utils_plot.plot_distributions(profiles, fname='gene_expression_distribution.png', thres=None)
Plots the distributions (boxplots) of gene expression across samples for each gene, and the selected threshold for binarization
…
Parameters
- profilesPandas DataFrame
rows/[genes+annotations] x columns/[samples]
- fnamePython character string
[default=”gene_expression_distribution.png”] : file name
- thresPython float or None
[default=None] : binarization threshold (if there is any)
Returns
- None
plots boxplots of expression for each gene in profiles
- NORDic.UTILS.utils_plot.plot_heatmap(X, ground_truth=None, fname='heatmap.pdf', w=20, h=20, bfsize=20, fsize=20, rot=75)
Plots a heatmap of the signatures, with the potential ground truth
…
Parameters
- XPandas DataFrame
rows/[features] x columns/[drug names]
- ground_truthPandas DataFrame
[default=None] : rows/[drug names] x column/[class] Values in 1: treatment, 0: unknown, -1: aggravating. If not provided: does not color boxplots according to the class
- fnamePython character string
[default=”heatmap.pdf”] : file name for the plot
- wPython integer
[default=20] : figure width
- hPython integer
[default=20] : figure height
- bfsizePython integer
[default=20] : font size in the color bar
- rotPython integer
[default=75] : rotation angle of labels
Returns
- None
plots a heatmap of similarity across drugs based on the Pearson correlation
- NORDic.UTILS.utils_plot.plot_influence_graph(network_df, input_col, output_col, sign_col, direction_col=None, fname='graph.png', optional=True)
Converts a network into a PNG picture
…
Parameters
- network_dfPandas DataFrame
rows/[index] x columns/[input_col,output_col,sign_col]
- input_col,output_col,sign_col,direction_colPython character string
columns of network_df
- fnamePython character string
[default=”graph.png”] : file name for PNG picture
- optionalPython bool
[default=True] : should edges be plotted as dashed lines?
Returns
- None
Creates an image of the graph in file fname
- NORDic.UTILS.utils_plot.plot_precision_recall(pr, prs, tr, beta=1, thres=0.5, fname='PRC.pdf', method_name='predictor', fsize=18)
Plots a Precision-Recall curve (with variations across samples)
…
Parameters
- prPandas DataFrame
rows/[drug names] x column/[value]
- prsPandas DataFrame
rows/[drug names] x columns/[patient samples]
- trPandas DataFrame
rows/[drug names] x column/[class]
- betaPython float
[default=1] : value of coefficient beta for the F-measure
- thresPython float
[default=0.5] : decision threshold
- fnamePython character string
[default=”PRC.pdf”] : file name for the plot
- method_namePython character string
[default=”predictor”] : name of the predictor
- fsizePython integer
[default=18] : font size
Returns
- None
Plots a Precision-Recall curve based on the drug repurposing predictions
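For reference, the role of beta and thres can be illustrated with the usual definitions of precision, recall and the F-measure. The sketch assumes that a drug is predicted positive when its score is at least thres and that class 1 marks the positives; both are assumptions, since this reference does not spell them out. The helper name and the dict-based inputs are hypothetical.

```python
def precision_recall_fbeta(scores, classes, thres=0.5, beta=1.0):
    """scores: {drug: predicted value}; classes: {drug: class label}.

    Returns (precision, recall, F-beta) for the decision rule score >= thres,
    counting class 1 as positive (assumption of this sketch).
    """
    predicted = {drug for drug, value in scores.items() if value >= thres}
    positives = {drug for drug, label in classes.items() if label == 1}
    true_positives = len(predicted & positives)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(positives) if positives else 0.0
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return precision, recall, 0.0
    fbeta = (1 + beta ** 2) * precision * recall / denominator
    return precision, recall, fbeta

toy_scores = {"d1": 0.9, "d2": 0.6, "d3": 0.2, "d4": 0.7}
toy_classes = {"d1": 1, "d2": 0, "d3": 1, "d4": -1}
toy_precision, toy_recall, toy_f1 = precision_recall_fbeta(toy_scores, toy_classes)
```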
- NORDic.UTILS.utils_plot.plot_roc_curve(pr, prs, tr, fname='ROC.pdf', method_name='predictor', fsize=18)
Plots a ROC curve (with variations across samples)
…
Parameters
- prPandas DataFrame
rows/[drug names] x column/[value]
- prsPandas DataFrame
rows/[drug names] x columns/[patient samples]
- trPandas DataFrame
rows/[drug names] x column/[class]
- fnamePython character string
[default=”ROC.pdf”] : file name for the plot
- method_namePython character string
[default=”predictor”] : name of the predictor
- fsizePython integer
[default=18] : font size
Returns
- None
Plots a ROC curve based on the drug repurposing predictions
- NORDic.UTILS.utils_plot.plot_signatures(signatures, perturbed_genes=None, width=10, height=10, max_show=50, fname='signatures')
Print signatures
…
Parameters
- signaturesPandas DataFrame
rows/[genes] x columns/[signature IDs]
- perturbed_genesPython character string list
[default=None] : list of gene names perturbed in the signatures
- width, heightPython integer
[default=10] : dimensions of image
- max_showPython integer
[default=50] : maximum number of genes shown (as only the @max_show genes with highest variance across signatures are plotted)
- fnamePython character string
[default=”signatures”] : path of resulting PNG image
Returns
- None
plots the signatures as heatmaps in file fname
NORDic.UTILS.utils_sim module
- class NORDic.UTILS.utils_sim.BN_SIM(seednb=0, njobs=None)
Bases: object
- add_initial_states(initial, final=None)
- add_permanent_mutation(mutation)
- add_transient_mutation(mutation)
- attrs_similarity(attrs1, attrs2, gene_outputs=None)
- boxplot()
- enumerate_attractors(verbose=False)
- generate_trajectories(params={}, outputs=[])
- initialize_network(network_fname)
- up_to_attractors(network_fname, initial, final, mutation_permanent={}, mutation_transient={}, verbose=True)
- update_network(network_fname, initial, final=None, mutation_permanent={}, mutation_transient={}, verbose=True)
- class NORDic.UTILS.utils_sim.BONESIS_SIM(seednb=0, njobs=None)
Bases: BN_SIM
- add_initial_states(initial, final)
- add_permanent_mutation(mutation)
- add_transient_mutation(mutation)
- enumerate_attractors(verbose=True)
- generate_trajectories(params={}, outputs=[])
- initialize_network(network_fname)
- class NORDic.UTILS.utils_sim.MABOSS_SIM(seednb=0, njobs=None)
Bases: BN_SIM
- add_initial_states(initial, final=None)
- add_permanent_mutation(mutation)
- add_transient_mutation(mutation)
- enumerate_attractors(verbose=True)
- generate_trajectories(params={}, outputs=[])
- initialize_network(network_fname)
- class NORDic.UTILS.utils_sim.MPBN_SIM(seednb=0, njobs=None)
Bases: BN_SIM
- add_initial_states(initial, final=None)
- add_permanent_mutation(mutation)
- add_transient_mutation(mutation)
- enumerate_attractors(max_attrs=-1, verbose=True)
- generate_trajectories(params={}, outputs=[], show_plot=True)
- initialize_network(network_fname)
- NORDic.UTILS.utils_sim.capture()
- NORDic.UTILS.utils_sim.choice(a, size=None, replace=True, p=None)
Generates a random sample from a given 1-D array
New in version 1.7.0.
Note
New code should use the numpy.random.Generator.choice method of a numpy.random.Generator instance instead; please see the NumPy random quick start.
Parameters
- a1-D array-like or int
If an ndarray, a random sample is generated from its elements. If an int, the random sample is generated as if it were np.arange(a)
- sizeint or tuple of ints, optional
Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
- replaceboolean, optional
Whether the sample is with or without replacement. Default is True, meaning that a value of a can be selected multiple times.
- p1-D array-like, optional
The probabilities associated with each entry in a. If not given, the sample assumes a uniform distribution over all entries in a.
Returns
- samplessingle item or ndarray
The generated random samples
Raises
- ValueError
If a is an int and less than zero, if a or p are not 1-dimensional, if a is an array-like of size 0, if p is not a vector of probabilities, if a and p have different lengths, or if replace=False and the sample size is greater than the population size
See Also
randint, shuffle, permutation
random.Generator.choice: which should be used in new code
Notes
Setting user-specified probabilities through p uses a more general but less efficient sampler than the default. The general sampler produces a different sample than the optimized sampler even if each element of p is 1 / len(a).
Sampling random rows from a 2-D array is not possible with this function, but is possible with Generator.choice through its axis keyword.
Examples
Generate a uniform random sample from np.arange(5) of size 3:
>>> np.random.choice(5, 3)
array([0, 3, 4]) # random
>>> #This is equivalent to np.random.randint(0,5,3)
Generate a non-uniform random sample from np.arange(5) of size 3:
>>> np.random.choice(5, 3, p=[0.1, 0, 0.3, 0.6, 0])
array([3, 3, 0]) # random
Generate a uniform random sample from np.arange(5) of size 3 without replacement:
>>> np.random.choice(5, 3, replace=False)
array([3,1,0]) # random
>>> #This is equivalent to np.random.permutation(np.arange(5))[:3]
Generate a non-uniform random sample from np.arange(5) of size 3 without replacement:
>>> np.random.choice(5, 3, replace=False, p=[0.1, 0, 0.3, 0.6, 0])
array([2, 3, 0]) # random
Any of the above can be repeated with an arbitrary array-like instead of just integers. For instance:
>>> aa_milne_arr = ['pooh', 'rabbit', 'piglet', 'Christopher']
>>> np.random.choice(aa_milne_arr, 5, p=[0.5, 0.1, 0.1, 0.3])
array(['pooh', 'pooh', 'pooh', 'Christopher', 'piglet'], # random
      dtype='<U11')
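Following the note above that new code should prefer Generator.choice, here is a short sketch of the equivalent calls. It assumes a NumPy version that provides numpy.random.default_rng; the seed value is arbitrary and the variable names are illustrative.

```python
import numpy as np

# A Generator instance replaces the legacy global np.random state
rng = np.random.default_rng(seed=0)

# Uniform sample of size 3 from np.arange(5), without replacement
sample = rng.choice(5, size=3, replace=False)

# Unlike np.random.choice, Generator.choice can sample rows of a 2-D array
matrix = np.arange(12).reshape(4, 3)
rows = rng.choice(matrix, size=2, axis=0, replace=False)
```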
- NORDic.UTILS.utils_sim.test(enumerator, seednb, njobs, network_fname, control_profile, treated_profiles, compare_to, mutation_permanent={}, mutation_transient={}, gene_outputs=None, print_boxplot=False, verbose=True)
NORDic.UTILS.utils_state module
- NORDic.UTILS.utils_state.binarize_experiments(data, thres=0.5, method='binary', strict=True, njobs=1)
Binarize experimental profiles
…
Parameters
- dataPandas DataFrame
rows/[genes] x columns/[samples]
- thresPython float
[default=0.5] : threshold for @method=”binary” (in [0,0.5])
- methodPython character string
[default=”binary”] : binarization method in [“binary”,”probin”]
- strictPython bool
[default=True] : takes into account equalities (if set to True, value=thres will lead to undefined for the corresponding gene)
- njobsPython integer
[default=1] : parallelism if needed
Returns
- signaturesPandas DataFrame
rows/[genes] x columns/[samples] with values in [0,1,NaN]
- NORDic.UTILS.utils_state.compare_states(x, y, genes=None)
Computes the similarity between two sets of Boolean states
…
Parameters
- xPandas DataFrame
rows/[genes] x columns/[state IDs] contains (0, 1, NaN)
- yPandas DataFrame
rows/[genes] x columns/[state IDs] contains (0, 1, NaN)
- genesPython character string list
list of gene symbols
Returns
- simsNumPy array
similarities between each column of x and each column of y, computed on the N genes of genes that are present (if genes is provided), otherwise on the union of the N genes in x and y
- NPython integer
number of genes on which the similarity is computed
- NORDic.UTILS.utils_state.finetune_binthres(df, samples, network_fname, mutation, step=0.005, maxt=0.5, mint=0, score_binthres=<function <lambda>>, njobs=1, verbose=True)
Select the binarization threshold (in function @binarize_experiments) which maximizes the dissimilarity between conditions and the similarity within each condition
…
Parameters
- dfPandas DataFrame
rows/[genes] x columns/[samples]: profiles
- samplesPython character string list
annotations of conditions for each sample in df
- network_fnamePython character string
file name containing the network
- mutationPython dictionary
dictionary (key=gene, value=perturbation type) gene perturbations which are considered
- stepPython float
[default=0.005] step in the interval to look for the threshold value
- maxtPython float
[default=0.5] maximum threshold value
- mintPython float
[default=0.] minimum threshold value
- score_binthresPython lambda function
[default=lambda itc,ita_c,ita_t:(1-itc)*ita_c*ita_t] fitness function for the threshold value
- njobsPython integer
[default=1] number of parallel jobs
- verbosePython bool
[default=True] prints out verbose
Returns
- max_thresPython float
threshold value maximizing the fitness function
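The search over thresholds can be pictured as a simple grid search from mint to maxt in increments of step, scored with the default fitness function quoted above. The sketch below is an assumption about the overall shape of the procedure, not the library's code: the callable evaluate, which returns the three similarity values (itc, ita_c, ita_t) for a given threshold, is hypothetical and stands in for the binarization and simulation steps.

```python
def score_binthres(itc, ita_c, ita_t):
    # Default fitness function given in the parameter list above
    return (1 - itc) * ita_c * ita_t

def grid_search_threshold(evaluate, step=0.005, maxt=0.5, mint=0.0,
                          score=score_binthres):
    """Try every threshold mint, mint+step, ..., maxt and return the
    (threshold, fitness) pair with the highest fitness.

    evaluate(t) is a hypothetical callable returning (itc, ita_c, ita_t).
    """
    n_steps = int(round((maxt - mint) / step))
    best_threshold, best_fitness = None, float("-inf")
    for k in range(n_steps + 1):
        threshold = mint + k * step
        fitness = score(*evaluate(threshold))
        if fitness > best_fitness:
            best_threshold, best_fitness = threshold, fitness
    return best_threshold, best_fitness

# Toy evaluator whose fitness peaks at a threshold of 0.2
toy_evaluate = lambda t: (abs(t - 0.2), 1.0, 1.0)
toy_best, toy_fitness = grid_search_threshold(toy_evaluate, step=0.1)
```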
- NORDic.UTILS.utils_state.quantile_normalize(df, njobs=1)