.. _cfdnapro_functions: cfDNAPro Functions and Parameters ================================= readBam -------- The ``readBam()`` function reads a BAM file and returns a curated ``GRanges`` object. It processes the BAM file according to various user-defined parameters, such as genome label, strand mode, and sequence names to keep. **Parameters**: - **genome_label** (str): Specifies the genome used in the alignment. Accepted values are `"hg19"`, `"hg38"`, or `"hg38-NCBI"`. Default is `"hg19"`. It loads corresponding genome packages for Homo sapiens sequences. - **bamfile** (str): The path to the BAM file. - **curate_start_and_end** (bool): If `TRUE`, the start and end coordinates of alignments are curated. Default is `TRUE`. - **outdir** (str or NA): Path to save the RDS file. If `NA`, no file is saved. - **strand_mode** (int): Defines strand mode; 1 means the strand of the pair is taken from the first alignment. Default is `1`. - **chromosome_to_keep** (vector of str or bool): A character vector containing seqnames to retain in the ``GRanges`` object. Default is `paste0("chr", 1:22)`. If `FALSE`, no filtering is applied. - **use_names** (bool): Whether to assign read names to the ``GRanges`` object. Default is `TRUE`. - **galp_flag** (Rsamtools ``ScanBamFlag``): Specifies the flags for scanning the BAM file. - **galp_what** (vector of str): The fields to return from the BAM file, such as "cigar", "mapq", "isize", "seq", and "qual". - **galp_tag** (vector of str): Specifies optional fields (tags) to retrieve from the BAM file. - **galp_mapqFilter** (int): The minimum mapping quality to include a read. Default is `40`. - **galp_bqFilter** (int): The minimum base quality at the mutation locus. Default is `20`. - **mutation_file** (str or NULL): An optional file containing mutation loci for mutational annotation. - **mut_fragments_only** (bool): If `TRUE`, only retrieves alignments overlapping mutation loci. Default is `FALSE`. - **...**: Additional arguments passed to or from other methods. **Returns**: A curated ``GRanges`` object containing the genomic alignments. callTrinucleotide ----------------- The ``callTrinucleotide()`` function processes a GRanges object, summarizing cfDNA fragment information for each target mutation locus. It annotates each mutation locus with the number and type of supporting fragments according to their read-pair overlap status. The median fragment length is also annotated for each read-pair overlap type. Additionally, the function calculates the locus-based consensus mutation by selecting the most frequent mismatch type, with priority given to concordant read-pair mutations (CO_MUT), followed by single-read mutations (SO_MUT). The consensus mutations are used to derive the trinucleotide substitution types (SBS96). **Parameters**: - **frag_obj_mut** (GRanges): A ``GRanges`` object containing fragment and mutation data. **Returns**: A dataframe with summarized mutational and trinucleotide data. plotTrinucleotide ----------------- The ``plotTrinucleotide()`` function processes and plots trinucleotide data. It first applies specified filters and transformations to the data, then generates a visual representation of the results. The function handles data normalization, exclusion, and retention based on provided column names, and it creates detailed plots with options for customization of plot aesthetics. **Parameters**: - **trinuc_df** (DataFrame): The dataframe containing trinucleotide data. - **exclude_if_type_present** (vector): Mutation locus read-pair overlap types (e.g., CO_MUT, SO_MUT, CO_REF, SO_REF, DO, SO_OTHER, CO_OTHER) whose non-zero presence triggers exclusion of loci. For example, `c("DO")` will exclude any loci that contain even a single discordant read-pair overlap. - **retain_if_type_present** (vector): Mutation locus read-pair overlap types that must be present to retain those loci. For example, `c("CO")` will retain loci that contain even a single concordant read-pair overlap. - **remove_type** (vector): Mutation locus read-pair overlap types (e.g., SO_MUT, CO_REF, DO) to set to 0 across all loci in the dataframe. - **normalize_counts** (bool): If `TRUE`, normalizes SBS counts so they sum to 1. Default is `TRUE`. - **show_overlap_type** (bool): If `TRUE`, displays read-pair overlap types. Default is `TRUE`. - **ylim** (numeric): Limits for the y-axis in the plot. Default is `c(0, 0.5)`. - **plot_title** (str): The title for the plot. Default is `"Trinucleotide Profile"`. - **y_axis_title** (str): The title for the y-axis. Default is `"Percentage of Single Base Substitutions"`. - **draw_x_axis_labels** (bool): Whether to draw x-axis labels. Default is `TRUE`. - **draw_y_axis_labels** (bool): Whether to draw y-axis labels. Default is `TRUE`. - **draw_y_axis_title** (bool): Whether to display a title for the y-axis. Default is `TRUE`. - **output_file** (str): The path to the output PDF file. Default is `"./trinucleotide_profile.pdf"`. - **ggsave_params** (list): A list of parameters to be passed to ``ggplot2::ggsave()``. This list can include any arguments accepted by ``ggsave()``. Example: `list(width = 17, height = 6, units = "cm", device = "pdf")`. **Returns**: A trinucleotide SBS plot object and an optional PDF file.