enhlink

command module
v0.0.0-...-972a77a Latest
Warning

This package is not in the latest version of its module.

Published: Sep 5, 2024 License: GPL-3.0 Imports: 7 Imported by: 0

README

import "gitlab.com/Grouumf/enhlinktools/enhlink"

enhlink infers enhancer / promoter co-accessibilities (links) using random forests of ID3 trees and information gain.
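To make the splitting criterion concrete, the sketch below computes the textbook information gain of a boolean feature (for instance, an enhancer peak being accessible) with respect to a boolean target (for instance, a promoter being accessible). It is only an illustration of the standard ID3 criterion: the function names `entropy` and `infoGain` are hypothetical, and enhlink itself may use a modified score (the IGNOREENHANCERWEIGHT variable mentions a modified information gain).

```go
package main

import (
	"fmt"
	"math"
)

// entropy returns the Shannon entropy (in bits) of a boolean variable
// that is true in `pos` cases out of `total`.
func entropy(pos, total int) float64 {
	if total == 0 || pos == 0 || pos == total {
		return 0
	}
	p := float64(pos) / float64(total)
	return -p*math.Log2(p) - (1-p)*math.Log2(1-p)
}

// infoGain returns the standard information gain obtained when the cells
// are split on `feature`, measured against `target`.
// Both slices are assumed to have the same length (one entry per cell).
func infoGain(feature, target []bool) float64 {
	n := len(target)
	if n == 0 {
		return 0
	}
	var nOn, posOn, posAll int
	for i, t := range target {
		if t {
			posAll++
		}
		if feature[i] {
			nOn++
			if t {
				posOn++
			}
		}
	}
	nOff := n - nOn
	posOff := posAll - posOn
	// Entropy of the target after the split, weighted by branch size.
	conditional := float64(nOn)/float64(n)*entropy(posOn, nOn) +
		float64(nOff)/float64(n)*entropy(posOff, nOff)
	return entropy(posAll, n) - conditional
}

func main() {
	target := []bool{true, true, false, false}
	informative := []bool{true, true, false, false}
	uninformative := []bool{true, false, true, false}
	fmt.Println("informative feature:  ", infoGain(informative, target))
	fmt.Println("uninformative feature:", infoGain(uninformative, target))
}
```

A feature that separates the target perfectly yields the full entropy of the target as gain, while a feature independent of the target yields a gain of zero.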

The main inputs of enhlink are:

a) a (cell x peak) sparse matrix,
b) a 4-column promoter TSV file <chrID, start, stop, geneID>,
c) an optional (cell x gene) sparse matrix, needed if the gene activity cannot be inferred from the peaks of the first matrix and the promoter regions. This matrix can be interpreted either as boolean (e.g. the promoter of a given gene is either accessible or not in a given cell) or, with the -isExpr option, as a float matrix reflecting gene expression (for example in a scATAC-seq/RNA-seq multi-omic study).

In addition, covariates (cell x covariates) and clusters (cell x clusterID) TSV files can be provided. Finally, multiple optional parameters can be set to fine-tune the speed, accuracy, and range of the models.
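As an illustration of the 4-column promoter TSV layout <chrID, start, stop, geneID> listed above, the following sketch reads such a file into a slice of records. The `promoter` type and the `parsePromoterLine` and `parsePromoters` helpers are hypothetical names introduced here; they are not part of enhlink, and the sketch makes no assumption about the coordinate convention beyond start and stop being integers.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// promoter is a hypothetical record for one line of the promoter TSV file.
type promoter struct {
	Chr    string
	Start  int
	Stop   int
	GeneID string
}

// parsePromoterLine parses one tab-separated line: chrID, start, stop, geneID.
func parsePromoterLine(line string) (promoter, error) {
	fields := strings.Split(strings.TrimSpace(line), "\t")
	if len(fields) != 4 {
		return promoter{}, fmt.Errorf("expected 4 tab-separated columns, got %d", len(fields))
	}
	start, err := strconv.Atoi(fields[1])
	if err != nil {
		return promoter{}, fmt.Errorf("invalid start %q: %v", fields[1], err)
	}
	stop, err := strconv.Atoi(fields[2])
	if err != nil {
		return promoter{}, fmt.Errorf("invalid stop %q: %v", fields[2], err)
	}
	return promoter{Chr: fields[0], Start: start, Stop: stop, GeneID: fields[3]}, nil
}

// parsePromoters reads every non-empty line of r as a promoter record.
func parsePromoters(r io.Reader) ([]promoter, error) {
	var out []promoter
	sc := bufio.NewScanner(r)
	lineNb := 0
	for sc.Scan() {
		lineNb++
		if strings.TrimSpace(sc.Text()) == "" {
			continue // skip blank lines
		}
		p, err := parsePromoterLine(sc.Text())
		if err != nil {
			return nil, fmt.Errorf("line %d: %v", lineNb, err)
		}
		out = append(out, p)
	}
	return out, sc.Err()
}

func main() {
	// Made-up example content, for illustration only.
	data := "chr1\t1000\t2000\tGeneA\nchr2\t500\t1500\tGeneB\n"
	promoters, err := parsePromoters(strings.NewReader(data))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, p := range promoters {
		fmt.Printf("%s:%d-%d -> %s\n", p.Chr, p.Start, p.Stop, p.GeneID)
	}
}
```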

<<<<<<<<<<<<<<<<<<<< WARNING >>>>>>>>>>>>>>>>>>>>

As of March 20, 2024 (Enhlink v0.21.0), some of Enhlink's parameter names were changed for clarity and consistency.

The changes are listed below as (version < 0.21.0) -> (version >= 0.21.0):

cluster -> clusters
promoter -> gtf
genes -> targets
gene -> target
isGeneExpr -> isExpr
rmPeaksInPromoter -> rmPeaksInTargets
onlyPositiveLink -> linkType

<<<<<<<<<<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>>>>>>>>>>>

USAGE:

enhlink -mat <file> -xgi <file> -ygi <file> -promoter <file> -out <path> -tag <string>
        -mat2 <file> -xgi2 <file> -ygi2 <file>   # IF PASSING A GENE MATRIX FILE
        -target <string>  # IF FOCUSING ON ONE TARGET
        -targets <file>  # IF FOCUSING ON A LIST OF TARGETS
        -isExpr # IF MATRIX 2 IS AN EXPRESSION MATRIX
        -covariates <file> -xgi_subset <file>  -ygi_subset <file> -cluster <file>  # OPTIONAL
        -downsample <int> -threads <int> -n_boot <int> -depth <int> -max_features <int>  # OPTIONAL
        -threshold <float> -min_matsize <int> -min_leafsize <int> -merging_cutoff <int>   # OPTIONAL
        -format {coo, mtx, cellRanger} -keep_sparse -maxFeatType <string/int/float>  # OPTIONAL
        -rmPeaksInTargets -linkType {"all", "positive", "negative"} -secondOrder -ignoreEnhancerWeight  # OPTIONAL
        -neighborhood <int> -secondOrderMaxFeat <int> -uniformSampling # OPTIONAL

Please check enhlink -h and the tutorial and introduction sections for a more precise description of the input parameters.
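For readers who want to see how options of this kind are typically handled in Go, here is a minimal sketch built on the standard flag package. The flag names -threads, -n_boot, -depth, -threshold and -linkType are taken from the usage text above; everything else is an assumption, including the `options` struct, the `parseOptions` helper and all default values. It is not enhlink's actual implementation.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// options is a hypothetical container for a few of the optional parameters.
type options struct {
	Threads   int
	NBoot     int
	Depth     int
	Threshold float64
	LinkType  string
}

// parseOptions parses a subset of the flags shown in the usage text.
// All default values below are placeholders, not enhlink's real defaults.
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("enhlink-sketch", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep the sketch quiet on parse errors
	fs.IntVar(&o.Threads, "threads", 1, "number of internal threads")
	fs.IntVar(&o.NBoot, "n_boot", 10, "number of bootstraps")
	fs.IntVar(&o.Depth, "depth", 2, "max tree level")
	fs.Float64Var(&o.Threshold, "threshold", 0.05, "significance level")
	fs.StringVar(&o.LinkType, "linkType", "all", `which links to keep {"all", "positive", "negative"}`)
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	// Reject values outside the set documented for -linkType.
	switch o.LinkType {
	case "all", "positive", "negative":
	default:
		return o, fmt.Errorf("invalid -linkType %q", o.LinkType)
	}
	return o, nil
}

func main() {
	o, err := parseOptions([]string{"-threads", "4", "-linkType", "positive"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", o)
}
```

Validating the enumerated value right after parsing keeps an invalid -linkType from reaching the model code.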

Index

Variables

CLUSTERFILE cluster file

var CLUSTERFILE utils.Filename

DOWNSAMPLE Downsample the number of samples to use

var DOWNSAMPLE int

GENE gene

var GENE string

IGNOREENHANCERWEIGHT Ignore Enhancers weight (the ratio of accessibility) in the computation of the modified Information Gain

var IGNOREENHANCERWEIGHT bool

INPUTFORMAT input matrix format

var INPUTFORMAT string

INPUTGENEMAT input matrix name for the gene matrix (input)

var INPUTGENEMAT utils.Filename

INPUTMAT input matrix name (input)

var INPUTMAT utils.Filename

ISGENEEXPR using gene expression for the gene mat

var ISGENEEXPR bool

KEEPSPARSE Keep the main ColMat matrix sparse. Useful for saving memory if the background is very large

var KEEPSPARSE bool

LAMBDA1 Lambda parameter of a Poisson distribution that controls the amount of dropouts in the simulated variables

var LAMBDA1 float64

LAMBDA2 Lambda parameter of a Poisson distribution that controls the amount of false positives in the simulated variables

var LAMBDA2 float64

LINKTYPE Which link to keep {"all", "positive", "negative"}

var LINKTYPE string

MAXFEATURES Maximum number of explanatory features per bootstrap model.

var MAXFEATURES int

MAXFEATURESTYPE Maximum number of features to be considered for a given tree. {"all", "sqrt", "log"}

var MAXFEATURESTYPE enhlinkobject.MaxFeaturesType

MERGINGCUTOFF merging cutoff for nearby promoters

var MERGINGCUTOFF int

METADATA optional covariate matrix

var METADATA utils.Filename

MINLEAFSIZE Min size of leaf

var MINLEAFSIZE int

MINMATSIZE Min matrix size (int)

var MINMATSIZE int

NBBOOT Number of bootstraps

var NBBOOT int

NBSIMFEATURES Number of simulated features to use

var NBSIMFEATURES int

NBTHREADS number of internal threads

var NBTHREADS int

NEIGHBORHOOD number of internal threads

var NEIGHBORHOOD int

ONLYSIM only perform simulation

var ONLYSIM bool

OUTDIR output directory

var OUTDIR string

OUTTAG output files tag

var OUTTAG string

PROMOTERFILE promoter file

var PROMOTERFILE utils.Filename

RMPEAKSINPROMOTERS Remove peaks within promoter boundaries

var RMPEAKSINPROMOTERS bool

SECONDORDER compute second order links - covar correlation

var SECONDORDER bool

SECONDORDERMAXFEATURES Maximum number of explanatory features per bootstrap model for second order models

var SECONDORDERMAXFEATURES int

SHOWVERSION show version and quit

var SHOWVERSION bool

THRESHOLD Significance level

var THRESHOLD float64

TREEDEPTH Max tree level

var TREEDEPTH int

UNIFORMSAMPLING Randomly sample the cells to obtain a uniform covariate distribution for each bootstrap. Requires a covariate matrix

var UNIFORMSAMPLING bool
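
One plausible reading of uniform sampling is to draw the same number of cells from each covariate group when building a bootstrap. The sketch below illustrates that idea only; the `uniformBootstrap` function, its parameters and the with-replacement sampling are assumptions made for this example and may differ from what enhlink does.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// uniformBootstrap returns cell indices drawn with replacement so that each
// distinct covariate label contributes exactly perGroup cells.
// labels[i] is the (hypothetical) covariate label of cell i.
func uniformBootstrap(labels []string, perGroup int, seed int64) []int {
	rng := rand.New(rand.NewSource(seed))

	// Group cell indices by covariate label.
	groups := make(map[string][]int)
	for i, l := range labels {
		groups[l] = append(groups[l], i)
	}

	// Sort the labels so the result is reproducible for a given seed.
	keys := make([]string, 0, len(groups))
	for k := range groups {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	sample := make([]int, 0, perGroup*len(keys))
	for _, k := range keys {
		cells := groups[k]
		for j := 0; j < perGroup; j++ {
			sample = append(sample, cells[rng.Intn(len(cells))])
		}
	}
	return sample
}

func main() {
	// Unbalanced toy example: four cells in batch "a", one in batch "b".
	labels := []string{"a", "a", "a", "a", "b"}
	sample := uniformBootstrap(labels, 3, 42)
	counts := make(map[string]int)
	for _, i := range sample {
		counts[labels[i]]++
	}
	fmt.Println("cells drawn per covariate value:", counts)
}
```

In this toy case the rare group "b" is represented as often as the common group "a" in the bootstrap, which is the balancing effect the option describes.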

XGI row index for input mat

var XGI utils.Filename

XGIGENE row index for input gene mat

var XGIGENE utils.Filename

XGISUBSET row index subset for input mat

var XGISUBSET utils.Filename

YGI column index for input mat

var YGI utils.Filename

YGIGENE column index for input gene mat

var YGIGENE utils.Filename

YGIGENESUBSET column index subset for input gene mat

var YGIGENESUBSET utils.Filename

YGISUBSET column index subset for input mat

var YGISUBSET utils.Filename
var maxfeaturestypeStr string

func main

func main()

func testIfRequiredFilesExist

func testIfRequiredFilesExist()

Generated by gomarkdoc

Documentation

Overview

Library that compiles the enhlink executable

