scReadSim.Utility.scATAC_CreateFeatureSets

scReadSim.Utility.scATAC_CreateFeatureSets(INPUT_bamfile, samtools_directory, bedtools_directory, outdirectory, genome_size_file, peak_mode='macs3', macs3_directory=None, INPUT_peakfile=None, INPUT_nonpeakfile=None, OUTPUT_peakfile=None, superset_peakfile=None)

Create the foreground and background feature sets for the input scATAC-seq BAM file.

Parameters:
  • INPUT_bamfile (str) – Path to the input BAM file.

  • samtools_directory (str) – Directory of the samtools software.

  • bedtools_directory (str) – Directory of the bedtools software.

  • outdirectory (str) – Output directory.

  • genome_size_file (str) – Path to the genome sizes file. The file should be a tab-delimited text file with two columns: the first column gives the chromosome name and the second column gives its size.

  • peak_mode (str (default: macs3)) – Mode used to generate trustworthy peaks and non-peaks. Must be one of “macs3”, “user”, or “superset”.

  • macs3_directory (str (default: None)) – Path to the MACS3 software. Must be specified under peak_mode “macs3” or “superset”, where INPUT_peakfile and INPUT_nonpeakfile are None.

  • INPUT_peakfile (str (default: None)) – Path to the user-specified input peak file. Must be specified under peak_mode “user”.

  • INPUT_nonpeakfile (str (default: None)) – Path to the user-specified input non-peak file. Must be specified under peak_mode “user”.

  • superset_peakfile (str (default: None)) – Path to a file containing a superset of potential open chromatin regions, drawn from sources such as the ENCODE cCRE (Candidate Cis-Regulatory Elements) collection. Must be specified under peak_mode “superset”.

  • OUTPUT_peakfile (str (default: None)) – Path to the user-specified output peak file. Synthetic scATAC-seq reads will be generated using the peaks in OUTPUT_peakfile as ground truth. Note that OUTPUT_peakfile does not determine the names of the feature files generated by scATAC_CreateFeatureSets.
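
The mode-dependent requirements listed above can be checked before calling the function. The following is a minimal sketch, not part of scReadSim: the helper missing_peak_mode_args and the table REQUIRED_BY_MODE are hypothetical names introduced here, the table is transcribed from the parameter descriptions above, and all paths are placeholders.

```python
# Required arguments for each peak_mode, transcribed from the parameter list above.
# REQUIRED_BY_MODE and missing_peak_mode_args are hypothetical, not scReadSim API.
REQUIRED_BY_MODE = {
    "macs3": ("macs3_directory",),
    "user": ("INPUT_peakfile", "INPUT_nonpeakfile"),
    "superset": ("macs3_directory", "superset_peakfile"),
}


def missing_peak_mode_args(peak_mode="macs3", **kwargs):
    """Return the names of required arguments that are absent or None."""
    if peak_mode not in REQUIRED_BY_MODE:
        raise ValueError(
            f"peak_mode must be one of {sorted(REQUIRED_BY_MODE)}, got {peak_mode!r}"
        )
    return [name for name in REQUIRED_BY_MODE[peak_mode] if kwargs.get(name) is None]


# Placeholder arguments for peak_mode "user"; every path here is illustrative only.
example_args = dict(
    INPUT_bamfile="example.bam",
    samtools_directory="/path/to/samtools",
    bedtools_directory="/path/to/bedtools",
    outdirectory="output",
    genome_size_file="genome.sizes",
    peak_mode="user",
    INPUT_peakfile="peaks.bed",
    INPUT_nonpeakfile="nonpeaks.bed",
)

missing = missing_peak_mode_args(**example_args)
print(missing)  # [] because both files required by "user" mode are supplied

# If nothing is missing, the same arguments could then be passed on, e.g.
# scReadSim.Utility.scATAC_CreateFeatureSets(**example_args)
```

Keeping the requirements in a single table makes it easy to see which arguments each mode needs; the actual function may perform its own checks, which this sketch does not attempt to reproduce.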