1. cfDNAPro Functions and Parameters

1.1. readBam

The readBam() function reads a BAM file and returns a curated GRanges object. It processes the BAM file according to various user-defined parameters, such as genome label, strand mode, and sequence names to keep.

Parameters:

  • genome_label (str): Specifies the genome used in the alignment. Accepted values are “hg19”, “hg38”, or “hg38-NCBI”. Default is “hg19”. It loads corresponding genome packages for Homo sapiens sequences.

  • bamfile (str): The path to the BAM file.

  • curate_start_and_end (bool): If TRUE, the start and end coordinates of alignments are curated. Default is TRUE.

  • outdir (str or NA): Path to save the RDS file. If NA, no file is saved.

  • strand_mode (int): Defines strand mode; 1 means the strand of the pair is taken from the first alignment. Default is 1.

  • chromosome_to_keep (vector of str or bool): A character vector containing seqnames to retain in the GRanges object. Default is paste0(“chr”, 1:22). If FALSE, no filtering is applied.

  • use_names (bool): Whether to assign read names to the GRanges object. Default is TRUE.

  • galp_flag (Rsamtools ScanBamFlag): Specifies the flags for scanning the BAM file.

  • galp_what (vector of str): The fields to return from the BAM file, such as “cigar”, “mapq”, “isize”, “seq”, and “qual”.

  • galp_tag (vector of str): Specifies optional fields (tags) to retrieve from the BAM file.

  • galp_mapqFilter (int): The minimum mapping quality to include a read. Default is 40.

  • galp_bqFilter (int): The minimum base quality at the mutation locus. Default is 20.

  • mutation_file (str or NULL): An optional file containing mutation loci for mutational annotation.

  • mut_fragments_only (bool): If TRUE, only retrieves alignments overlapping mutation loci. Default is FALSE.

  • : Additional arguments passed to or from other methods.

Returns:

A curated GRanges object containing the genomic alignments.

1.2. callTrinucleotide

The callTrinucleotide() function processes a GRanges object, summarizing cfDNA fragment information for each target mutation locus. It annotates each mutation locus with the number and type of supporting fragments according to their read-pair overlap status. The median fragment length is also annotated for each read-pair overlap type. Additionally, the function calculates the locus-based consensus mutation by selecting the most frequent mismatch type, with priority given to concordant read-pair mutations (CO_MUT), followed by single-read mutations (SO_MUT). The consensus mutations are used to derive the trinucleotide substitution types (SBS96).

Parameters:

  • frag_obj_mut (GRanges): A GRanges object containing fragment and mutation data.

Returns:

A dataframe with summarized mutational and trinucleotide data.

1.3. plotTrinucleotide

The plotTrinucleotide() function processes and plots trinucleotide data. It first applies specified filters and transformations to the data, then generates a visual representation of the results. The function handles data normalization, exclusion, and retention based on provided column names, and it creates detailed plots with options for customization of plot aesthetics.

Parameters:

  • trinuc_df (DataFrame): The dataframe containing trinucleotide data.

  • exclude_if_type_present (vector): Mutation locus read-pair overlap types (e.g., CO_MUT, SO_MUT, CO_REF, SO_REF, DO, SO_OTHER, CO_OTHER) whose non-zero presence triggers exclusion of loci. For example, c(“DO”) will exclude any loci that contain even a single discordant read-pair overlap.

  • retain_if_type_present (vector): Mutation locus read-pair overlap types that must be present to retain those loci. For example, c(“CO”) will retain loci that contain even a single concordant read-pair overlap.

  • remove_type (vector): Mutation locus read-pair overlap types (e.g., SO_MUT, CO_REF, DO) to set to 0 across all loci in the dataframe.

  • normalize_counts (bool): If TRUE, normalizes SBS counts so they sum to 1. Default is TRUE.

  • show_overlap_type (bool): If TRUE, displays read-pair overlap types. Default is TRUE.

  • ylim (numeric): Limits for the y-axis in the plot. Default is c(0, 0.5).

  • plot_title (str): The title for the plot. Default is “Trinucleotide Profile”.

  • y_axis_title (str): The title for the y-axis. Default is “Percentage of Single Base Substitutions”.

  • draw_x_axis_labels (bool): Whether to draw x-axis labels. Default is TRUE.

  • draw_y_axis_labels (bool): Whether to draw y-axis labels. Default is TRUE.

  • draw_y_axis_title (bool): Whether to display a title for the y-axis. Default is TRUE.

  • output_file (str): The path to the output PDF file. Default is “./trinucleotide_profile.pdf”.

  • ggsave_params (list): A list of parameters to be passed to ggplot2::ggsave(). This list can include any arguments accepted by ggsave(). Example: list(width = 17, height = 6, units = “cm”, device = “pdf”).

Returns:

A trinucleotide SBS plot object and an optional PDF file.