enhlink
import "gitlab.com/Grouumf/enhlinktools/enhlink"
Library that compiles the enhlink executable
enhlink infers enhancer/promoter co-accessibilities (links) using random forests of ID3 trees and information gain.
enhlink's main inputs are:
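To make the splitting criterion concrete, here is a minimal sketch of the standard Shannon information gain between a boolean peak vector and a boolean promoter vector (one value per cell). It is only an illustration: the function names are hypothetical, and enhlink itself uses a modified information gain that is not reproduced here.

```go
package main

import (
	"fmt"
	"math"
)

// entropy returns the Shannon entropy (in bits) of a boolean label vector.
func entropy(labels []bool) float64 {
	if len(labels) == 0 {
		return 0
	}
	pos := 0
	for _, l := range labels {
		if l {
			pos++
		}
	}
	p := float64(pos) / float64(len(labels))
	h := 0.0
	for _, q := range []float64{p, 1 - p} {
		if q > 0 {
			h -= q * math.Log2(q)
		}
	}
	return h
}

// informationGain returns H(target) - H(target | feature).
// Both vectors must have the same length (one entry per cell).
func informationGain(feature, target []bool) float64 {
	if len(target) == 0 {
		return 0
	}
	var on, off []bool
	for i, f := range feature {
		if f {
			on = append(on, target[i])
		} else {
			off = append(off, target[i])
		}
	}
	n := float64(len(target))
	cond := float64(len(on))/n*entropy(on) + float64(len(off))/n*entropy(off)
	return entropy(target) - cond
}

func main() {
	// Toy data: the peak is open in exactly the cells where the promoter is open.
	peak := []bool{true, true, false, false}
	promoter := []bool{true, true, false, false}
	fmt.Printf("%.3f\n", informationGain(peak, promoter)) // prints 1.000
}
```

A peak that perfectly predicts promoter accessibility yields the full entropy of the target (1 bit here), while an independent peak yields a gain of 0.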
a) a (cell x peak) sparse matrix,
b) a 4-column promoter TSV file <chrID, start, stop, geneID>,
c) an optional (cell x gene) sparse matrix if the gene activity cannot be inferred from the peaks of the first matrix and the promoter regions. This matrix can either be interpreted as boolean (e.g. the promoter of a given gene is either accessible or not for a given cell), or as a float matrix using the -isExpr option, which reflects the gene expression (for example in the context of a scATAC-seq/RNA-seq multi-omic study).
In addition, covariates (cell x covariates) and clusters (cell x clusterID) TSV files can be provided. Finally, multiple optional parameters can be set to fine-tune the speed, accuracy, and range of the models.
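As an illustration of the 4-column promoter file layout described in b), the sketch below parses tab-separated <chrID, start, stop, geneID> lines. The `Promoter` type, the `parsePromoterLine` helper, and the coordinates are hypothetical and only meant to show the expected shape of the file; they are not part of the enhlink package.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// Promoter is a hypothetical record for one line of the promoter TSV file.
type Promoter struct {
	Chr    string
	Start  int
	Stop   int
	GeneID string
}

// parsePromoterLine parses one "<chrID>\t<start>\t<stop>\t<geneID>" line.
func parsePromoterLine(line string) (Promoter, error) {
	fields := strings.Split(strings.TrimSpace(line), "\t")
	if len(fields) != 4 {
		return Promoter{}, fmt.Errorf("expected 4 columns, got %d", len(fields))
	}
	start, err := strconv.Atoi(fields[1])
	if err != nil {
		return Promoter{}, fmt.Errorf("invalid start %q: %w", fields[1], err)
	}
	stop, err := strconv.Atoi(fields[2])
	if err != nil {
		return Promoter{}, fmt.Errorf("invalid stop %q: %w", fields[2], err)
	}
	if stop < start {
		return Promoter{}, fmt.Errorf("stop %d is smaller than start %d", stop, start)
	}
	return Promoter{Chr: fields[0], Start: start, Stop: stop, GeneID: fields[3]}, nil
}

func main() {
	// Made-up coordinates, for illustration only.
	input := "chr1\t1000\t2000\tGENE_A\nchr2\t500\t900\tGENE_B\n"
	scanner := bufio.NewScanner(strings.NewReader(input))
	for scanner.Scan() {
		p, err := parsePromoterLine(scanner.Text())
		if err != nil {
			fmt.Println("skipping line:", err)
			continue
		}
		fmt.Printf("%s:%d-%d -> %s\n", p.Chr, p.Start, p.Stop, p.GeneID)
	}
}
```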
<<<<<<<<<<<<<<<<<<<< WARNING >>>>>>>>>>>>>>>>>>>>
As of March 20 2024 (Enhlink v0.21.0), some of Enhlink's parameter names were changed for clarity and consistency.
Below is the list of changes: (version < 0.21.0) -> (version >= 0.21.0)
cluster -> clusters
promoter -> gtf
genes -> targets
gene -> target
isGeneExpr -> isExpr
rmPeaksInPromoter -> rmPeaksInTargets
onlyPositiveLink -> linkType
<<<<<<<<<<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>>>>>>>>>>>
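For scripts written against older versions, the renames listed above can be applied mechanically. The following is a minimal sketch with a hypothetical `migrateArgs` helper (not part of enhlink). It deliberately leaves `-onlyPositiveLink` untouched, since its replacement `-linkType` takes a value and the right value has to be chosen by hand.

```go
package main

import "fmt"

// renamed maps pre-0.21.0 option names to their 0.21.0+ equivalents.
// onlyPositiveLink is omitted on purpose: -linkType expects a value.
var renamed = map[string]string{
	"cluster":           "clusters",
	"promoter":          "gtf",
	"genes":             "targets",
	"gene":              "target",
	"isGeneExpr":        "isExpr",
	"rmPeaksInPromoter": "rmPeaksInTargets",
}

// migrateArgs returns a copy of args in which old option names are
// replaced by the new ones; option values are left unchanged.
func migrateArgs(args []string) []string {
	out := make([]string, len(args))
	for i, a := range args {
		out[i] = a
		if len(a) > 1 && a[0] == '-' {
			if newName, ok := renamed[a[1:]]; ok {
				out[i] = "-" + newName
			}
		}
	}
	return out
}

func main() {
	old := []string{"-promoter", "promoters.tsv", "-gene", "GENE_A"}
	fmt.Println(migrateArgs(old)) // prints [-gtf promoters.tsv -target GENE_A]
}
```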
USAGE:
enhlink -mat <file> -xgi <file> -ygi <file> -gtf <file> -out <path> -tag <string>
-mat2 <file> -xgi2 <file> -ygi2 <file> # IF PASSING A GENE MATRIX FILE
-target <string> # IF FOCUSING ON ONE TARGET
-targets <file> # IF FOCUSING ON A LIST OF TARGETS
-isExpr # IF MATRIX 2 IS AN EXPRESSION MATRIX
-covariates <file> -xgi_subset <file> -ygi_subset <file> -clusters <file> # OPTIONAL
-downsample <int> -threads <int> -n_boot <int> -depth <int> -max_features <int> # OPTIONAL
-threshold <float> -min_matsize <int> -min_leafsize <int> -merging_cutoff <int> # OPTIONAL
-format {coo, mtx, cellRanger} -keep_sparse -maxFeatType <string/int/float> # OPTIONAL
-rmPeaksInTargets -linkType {"all", "positive", "negative"} -secondOrder -ignoreEnhancerWeight # OPTIONAL
-neighborhood <int> -secondOrderMaxFeat <int> -uniformSampling # OPTIONAL
Please check enhlink -h and the tutorial and introduction sections for a more precise description of the input parameters.
Index
Variables
CLUSTERFILE cluster file
var CLUSTERFILE utils.Filename
DOWNSAMPLE Downsample the number of samples to use
var DOWNSAMPLE int
GENE gene
var GENE string
IGNOREENHANCERWEIGHT Ignore the enhancer weight (the ratio of accessibility) in the computation of the modified information gain
var IGNOREENHANCERWEIGHT bool
INPUTFORMAT input matrix format
var INPUTFORMAT string
INPUTGENEMAT input matrix name for the gene matrix (input)
var INPUTGENEMAT utils.Filename
INPUTMAT input matrix name (input)
var INPUTMAT utils.Filename
ISGENEEXPR using gene expression for the gene mat
var ISGENEEXPR bool
KEEPSPARSE Keep the main ColMat matrix sparse. Useful for memory reasons if the background is very large
var KEEPSPARSE bool
LAMBDA1 Lambda parameter of a Poisson distribution that controls the amount of dropouts in the simulated variables
var LAMBDA1 float64
LAMBDA2 Lambda parameter of a Poisson distribution that controls the amount of false positives in the simulated variables
var LAMBDA2 float64
LINKTYPE Which link to keep {"all", "positive", "negative"}
var LINKTYPE string
MAXFEATURES Maximum number of explanatory features per bootstrap model.
var MAXFEATURES int
MAXFEATURESTYPE Maximum number of features to be considered for a given tree. {"all", "sqrt", "log"}
var MAXFEATURESTYPE enhlinkobject.MaxFeaturesType
MERGINGCUTOFF merging cutoff for nearby promoters
var MERGINGCUTOFF int
METADATA optional covariate matrix
var METADATA utils.Filename
MINLEAFSIZE Min size of leaf
var MINLEAFSIZE int
MINMATSIZE Min matrix size (int)
var MINMATSIZE int
NBBOOT Number of bootstraps
var NBBOOT int
NBSIMFEATURES Number of simulated features to use
var NBSIMFEATURES int
NBTHREADS number of internal threads
var NBTHREADS int
NEIGHBORHOOD number of internal threads
var NEIGHBORHOOD int
ONLYSIM only perform simulation
var ONLYSIM bool
OUTDIR output directory
var OUTDIR string
OUTTAG output files tag
var OUTTAG string
PROMOTERFILE promoter file
var PROMOTERFILE utils.Filename
RMPEAKSINPROMOTERS Remove peaks within promoter boundaries
var RMPEAKSINPROMOTERS bool
SECONDORDER compute second order links - covar correlation
var SECONDORDER bool
SECONDORDERMAXFEATURES Maximum number of explanatory features per bootstrap model for second order models
var SECONDORDERMAXFEATURES int
SHOWVERSION show version and quit
var SHOWVERSION bool
THRESHOLD Significance level
var THRESHOLD float64
TREEDEPTH Max tree level
var TREEDEPTH int
UNIFORMSAMPLING Randomly sample the cells to have a uniform covariate distribution for each bootstrap. Needs a covariate matrix
var UNIFORMSAMPLING bool
XGI row index for input mat
var XGI utils.Filename
XGIGENE row index for input gene mat
var XGIGENE utils.Filename
XGISUBSET row index subset for input mat
var XGISUBSET utils.Filename
YGI column index for input mat
var YGI utils.Filename
YGIGENE column index for input gene mat
var YGIGENE utils.Filename
YGIGENESUBSET column index subset for input gene mat
var YGIGENESUBSET utils.Filename
YGISUBSET column index subset for input mat
var YGISUBSET utils.Filename
var maxfeaturestypeStr string
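The usage lists -maxFeatType as <string/int/float>, and MAXFEATURESTYPE documents the keywords "all", "sqrt" and "log". As a rough sketch of how such a value could be turned into a feature count, the hypothetical `resolveMaxFeatures` below makes several assumptions that are not confirmed by this documentation: an integer is read as an absolute count, a float in (0, 1] as a fraction of the available features, and "log" as the natural logarithm. The actual enhlink behaviour may differ.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// resolveMaxFeatures converts a -maxFeatType style value into a number of
// features to sample out of nbFeatures candidates.
// Assumptions (not taken from enhlink): int = absolute count,
// float in (0, 1] = fraction, "log" = natural logarithm.
func resolveMaxFeatures(value string, nbFeatures int) (int, error) {
	if nbFeatures < 1 {
		return 0, fmt.Errorf("nbFeatures must be >= 1, got %d", nbFeatures)
	}
	var n float64
	switch value {
	case "all":
		n = float64(nbFeatures)
	case "sqrt":
		n = math.Sqrt(float64(nbFeatures))
	case "log":
		n = math.Log(float64(nbFeatures))
	default:
		if i, err := strconv.Atoi(value); err == nil && i > 0 {
			n = float64(i)
		} else if f, err := strconv.ParseFloat(value, 64); err == nil && f > 0 && f <= 1 {
			n = f * float64(nbFeatures)
		} else {
			return 0, fmt.Errorf("unsupported max features value %q", value)
		}
	}
	// Round up and clamp the result to [1, nbFeatures].
	k := int(math.Ceil(n))
	if k < 1 {
		k = 1
	}
	if k > nbFeatures {
		k = nbFeatures
	}
	return k, nil
}

func main() {
	for _, v := range []string{"all", "sqrt", "log", "25", "0.5"} {
		k, err := resolveMaxFeatures(v, 100)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %d features\n", v, k)
	}
}
```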
func main()
func testIfRequiredFilesExist()
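The body of testIfRequiredFilesExist is not shown in this index. As a hedged illustration of what a check of this kind typically looks like, the sketch below verifies that each required input has a non-empty path that exists on disk. The `requiredFile` type, the `checkRequiredFiles` helper, and the file names are hypothetical; only the option names -mat, -xgi and -ygi come from the usage above.

```go
package main

import (
	"fmt"
	"os"
)

// requiredFile pairs a command-line option name with the path given for it.
type requiredFile struct {
	Flag string
	Path string
}

// checkRequiredFiles returns an error for the first required input that is
// either unset (empty path) or not accessible on disk.
func checkRequiredFiles(files []requiredFile) error {
	for _, f := range files {
		if f.Path == "" {
			return fmt.Errorf("option -%s is required", f.Flag)
		}
		if _, err := os.Stat(f.Path); err != nil {
			return fmt.Errorf("file for option -%s: %w", f.Flag, err)
		}
	}
	return nil
}

func main() {
	// Made-up file names, for illustration only.
	files := []requiredFile{
		{Flag: "mat", Path: "matrix.mtx"},
		{Flag: "xgi", Path: "cells.xgi"},
		{Flag: "ygi", Path: "peaks.ygi"},
	}
	if err := checkRequiredFiles(files); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("all required files found")
}
```

Failing early on a missing input avoids starting a long bootstrap run that would only abort once the file is first opened.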
Generated by gomarkdoc