# deepTools Complete Tool Reference

This document provides a comprehensive reference for all deepTools command-line utilities organized by category.

## BAM and bigWig File Processing Tools

### multiBamSummary

Computes read coverages for genomic regions across multiple BAM files, outputting compressed numpy arrays for downstream correlation and PCA analysis.

**Modes:**

- **bins**: Genome-wide analysis using consecutive equal-sized windows (default 10kb)
- **BED-file**: Restricts analysis to user-specified genomic regions

**Key Parameters:**

- `--bamfiles, -b`: Indexed BAM files (space-separated, required)
- `--outFileName, -o`: Output coverage matrix file (required)
- `--BED`: Region specification file (BED-file mode only)
- `--binSize`: Window size in bases (default: 10,000)
- `--labels`: Custom sample identifiers
- `--minMappingQuality`: Quality threshold for read inclusion
- `--numberOfProcessors, -p`: Parallel processing cores
- `--extendReads`: Fragment size extension
- `--ignoreDuplicates`: Remove PCR duplicates
- `--outRawCounts`: Export tab-delimited file with coordinate columns and per-sample counts

**Output:** Compressed numpy array (.npz) for plotCorrelation and plotPCA

**Common Usage:**

```bash
# Genome-wide comparison
multiBamSummary bins --bamfiles sample1.bam sample2.bam -o results.npz

# Peak region comparison
multiBamSummary BED-file --BED peaks.bed --bamfiles sample1.bam sample2.bam -o results.npz
```

---

### multiBigwigSummary

Similar to multiBamSummary but operates on bigWig files instead of BAM files. Used for comparing coverage tracks across samples.

**Modes:**

- **bins**: Genome-wide analysis
- **BED-file**: Region-specific analysis

**Key Parameters:** Similar to multiBamSummary but accepts bigWig files

---

### bamCoverage

Converts BAM alignment files into normalized coverage tracks in bigWig or bedGraph formats. Calculates coverage as number of reads per bin.

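As a rough illustration of what "number of reads per bin" means, the sketch below counts read start positions in consecutive fixed-size windows and prints bedGraph-style rows. It is a deliberate simplification and not how bamCoverage is implemented: it ignores read length, read extension, and all filtering. The function name `count_per_bin` and the input positions are hypothetical.

```bash
# count_per_bin BIN_SIZE
# Reads one 0-based read start position per line from stdin and prints
# "bin_start<TAB>bin_end<TAB>count" for every bin up to the last occupied one.
# Simplified illustration only; real coverage counts overlapping reads.
count_per_bin() {
  awk -v bs="$1" '
    {
      bin = int($1 / bs)          # index of the window containing this position
      counts[bin]++
      if (bin > max) max = bin
    }
    END {
      for (b = 0; b <= max; b++)
        printf "%d\t%d\t%d\n", b * bs, (b + 1) * bs, counts[b] + 0
    }'
}

# Hypothetical read start positions on one chromosome, 50 bp bins
printf '%s\n' 3 17 49 50 120 | count_per_bin 50
```

With these positions, three reads fall in the first 50 bp bin and one each in the second and third. Smaller bins give finer resolution at the cost of larger output files, which is the trade-off controlled by `--binSize`.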
**Key Parameters:**

- `--bam, -b`: Input BAM file (required)
- `--outFileName, -o`: Output filename (required)
- `--outFileFormat, -of`: Output type (bigwig or bedgraph)
- `--normalizeUsing`: Normalization method
  - **RPKM**: Reads Per Kilobase per Million mapped reads
  - **CPM**: Counts Per Million mapped reads
  - **BPM**: Bins Per Million mapped reads
  - **RPGC**: Reads per genomic content (requires --effectiveGenomeSize)
  - **None**: No normalization (default)
- `--effectiveGenomeSize`: Mappable genome size (required for RPGC)
- `--binSize`: Resolution in base pairs (default: 50)
- `--extendReads, -e`: Extend reads to fragment length (recommended for ChIP-seq, NOT for RNA-seq)
- `--centerReads`: Center reads at fragment length for sharper signals
- `--ignoreDuplicates`: Count identical reads only once
- `--minMappingQuality`: Filter reads below quality threshold
- `--minFragmentLength / --maxFragmentLength`: Fragment length filtering
- `--smoothLength`: Window averaging for noise reduction
- `--MNase`: Analyze MNase-seq data for nucleosome positioning
- `--Offset`: Position-specific offsets (useful for RiboSeq, GROseq)
- `--filterRNAstrand`: Separate forward/reverse strand reads
- `--ignoreForNormalization`: Exclude chromosomes from normalization (e.g., sex chromosomes)
- `--numberOfProcessors, -p`: Parallel processing

**Important Notes:**

- For RNA-seq: Do NOT use --extendReads (would extend over splice junctions)
- For ChIP-seq: Use --extendReads with smaller bin sizes
- Never apply --ignoreDuplicates after GC bias correction

**Common Usage:**

```bash
# Basic coverage with RPKM normalization
bamCoverage --bam input.bam --outFileName coverage.bw --normalizeUsing RPKM

# ChIP-seq with extension
bamCoverage --bam chip.bam --outFileName chip_coverage.bw --binSize 10 --extendReads 200 --ignoreDuplicates

# Strand-specific RNA-seq
bamCoverage --bam rnaseq.bam --outFileName forward.bw --filterRNAstrand forward
```

---

### bamCompare

Compares two BAM files by generating
bigWig or bedGraph files, normalizing for sequencing depth differences. Processes genome in equal-sized bins and performs per-bin calculations.

**Comparison Methods:**

- **log2** (default): Log2 ratio of samples
- **ratio**: Direct ratio calculation
- **subtract**: Difference between files
- **add**: Sum of samples
- **mean**: Average across samples
- **reciprocal_ratio**: Negative inverse for ratios […]

- […] 1000 columns)
- `--dpi`: Figure resolution

**Clustering:**

- `--kmeans`: k-means clustering
- `--hclust`: Hierarchical clustering (slower for >1000 regions)
- `--silhouette`: Calculate cluster quality metrics

**Visual Customization:**

- `--heatmapHeight / --heatmapWidth`: Dimensions (3-100 cm)
- `--whatToShow`: plot, heatmap, colorbar (combinations)
- `--alpha`: Transparency (0-1)
- `--colorMap`: 50+ color schemes
- `--colorList`: Custom gradient colors
- `--zMin / --zMax`: Intensity scale limits
- `--boxAroundHeatmaps`: yes/no (default: yes)

**Labels:**

- `--xAxisLabel / --yAxisLabel`: Axis labels
- `--regionsLabel`: Region set identifiers
- `--samplesLabel`: Sample names
- `--refPointLabel`: Reference point label
- `--startLabel / --endLabel`: Region boundary labels

**Common Usage:**

```bash
# Basic heatmap
plotHeatmap -m matrix.gz -o heatmap.png

# With clustering and custom colors
plotHeatmap -m matrix.gz -o heatmap.png --kmeans 3 --colorMap RdBu --zMin -3 --zMax 3
```

---

### plotProfile

Generates profile plots showing scores across genomic regions using computeMatrix output.

**Key Parameters:**

- `--matrixFile, -m`: Matrix from computeMatrix (required)
- `--outFileName, -o`: Output image (png, eps, pdf, svg) (required)
- `--plotType`: lines, fill, se, std, overlapped_lines, heatmap
- `--colors`: Color palette (names or hex codes)
- `--plotHeight / --plotWidth`: Dimensions in centimeters
- `--yMin / --yMax`: Y-axis range
- `--averageType`: mean, median, min, max, std, sum

**Clustering:**

- `--kmeans`: k-means clustering
- `--hclust`: Hierarchical clustering
- `--silhouette`: Cluster quality metrics

**Labels:**

- `--plotTitle`: Main heading
- `--regionsLabel`: Region set identifiers
- `--samplesLabel`: Sample names
- `--startLabel / --endLabel`: Region boundary labels (scale-regions mode)

**Output Options:**

- `--outFileNameData`: Export data as tab-separated values
- `--outFileSortedRegions`: Save filtered/sorted regions as BED

**Common Usage:**

```bash
# Line plot
plotProfile -m matrix.gz -o profile.png --plotType lines

# With standard error shading
plotProfile -m matrix.gz -o profile.png --plotType se --colors blue red green
```

---

### plotEnrichment

Calculates and visualizes signal enrichment across genomic regions. Measures the percentage of alignments overlapping each group of regions. Useful for FRiP (Fraction of Reads in Peaks) scores.

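The percentage that plotEnrichment reports can be understood through simple arithmetic: reads overlapping the regions divided by all counted reads, times 100. The sketch below shows only that arithmetic; plotEnrichment computes it directly from the BAM and BED files, so this helper is not needed in practice. The function name `frip_percent` and the read counts are hypothetical.

```bash
# frip_percent READS_IN_REGIONS TOTAL_READS
# Prints the percentage of reads that overlap the regions, to two decimals.
# Illustrative arithmetic only; the counts would normally come from
# plotEnrichment itself (for example via --outRawCounts).
frip_percent() {
  if [ "$2" -le 0 ]; then
    echo "total reads must be positive" >&2
    return 1
  fi
  awk -v n="$1" -v t="$2" 'BEGIN { printf "%.2f\n", 100 * n / t }'
}

# Hypothetical counts: 1,250,000 of 10,000,000 reads fall in peaks
frip_percent 1250000 10000000
```

Which reads enter the denominator depends on the read-processing options in use (mapping quality, duplicates, SAM flags), so percentages are only comparable between samples processed with the same filters.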
**Key Parameters:**

- `--bamfiles, -b`: Indexed BAM files (required)
- `--BED`: Region files in BED/GTF format (required)
- `--plotFile, -o`: Output visualization (png, pdf, eps, svg)
- `--labels, -l`: Custom sample identifiers
- `--outRawCounts`: Export numerical data
- `--perSample`: Group by sample instead of by feature (grouping by feature is the default)
- `--regionLabels`: Custom region names

**Read Processing:**

- `--minFragmentLength / --maxFragmentLength`: Fragment filters
- `--minMappingQuality`: Quality threshold
- `--samFlagInclude / --samFlagExclude`: SAM flag filters
- `--ignoreDuplicates`: Remove duplicates
- `--centerReads`: Center reads for sharper signal

**Common Usage:**

```bash
plotEnrichment -b Input.bam H3K4me3.bam --BED peaks_up.bed peaks_down.bed --regionLabels "Up regulated" "Down regulated" -o enrichment.png
```

---

## Miscellaneous Tools

### computeMatrixOperations

Advanced matrix manipulation tool for combining or subsetting matrices from computeMatrix. Enables complex multi-sample, multi-region analyses.

**Operations:**

- `cbind`: Combine matrices column-wise
- `rbind`: Combine matrices row-wise
- `subset`: Extract specific samples or regions
- `filterStrand`: Keep only regions on specific strand
- `filterValues`: Apply signal intensity filters
- `sort`: Order regions by various criteria
- `dataRange`: Report min/max values

**Common Usage:**

```bash
# Combine matrices
computeMatrixOperations cbind -m matrix1.gz matrix2.gz -o combined.gz

# Extract specific samples (--samples takes sample names as stored in the
# matrix, not numeric indices; sample1 and sample3 are placeholder names)
computeMatrixOperations subset -m matrix.gz --samples sample1 sample3 -o subset.gz
```

---

### estimateReadFiltering

Predicts the impact of various filtering parameters without actually filtering. Helps optimize filtering strategies before running full analyses.

**Key Parameters:**

- `--bamfiles, -b`: BAM files to analyze
- `--sampleSize`: Number of reads to sample (default: 100,000)
- `--binSize`: Bin size for analysis
- `--distanceBetweenBins`: Spacing between sampled bins

**Filtration Options to Test:**

- `--minMappingQuality`: Test quality thresholds
- `--ignoreDuplicates`: Assess duplicate impact
- `--minFragmentLength / --maxFragmentLength`: Test fragment filters

---

## Common Parameters Across Tools

Many deepTools commands share these filtering and performance options:

**Read Filtering:**

- `--ignoreDuplicates`: Remove PCR duplicates (identical reads are counted only once)
- `--minMappingQuality`: Filter by alignment confidence
- `--samFlagInclude / --samFlagExclude`: SAM format filtering
- `--minFragmentLength / --maxFragmentLength`: Fragment length bounds

**Performance:**

- `--numberOfProcessors, -p`: Enable parallel processing
- `--region`: Process specific genomic regions (chr:start-end)

**Read Processing:**

- `--extendReads`: Extend to fragment length
- `--centerReads`: Center at fragment midpoint
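Because these options are shared, one way to keep a pipeline consistent is to define them once and append them to every command. The sketch below is a minimal illustration of that idea, not a deepTools feature: the wrapper name `with_common_opts`, the threshold of 30, and the 4 processors are assumptions chosen for the example. The commands are prefixed with `echo` so the sketch prints them instead of running them.

```bash
# with_common_opts COMMAND [ARGS...]
# Runs COMMAND with a fixed set of shared filtering and performance options
# appended. The values (30, 4) are hypothetical; adjust them to the experiment.
with_common_opts() {
  "$@" --minMappingQuality 30 --ignoreDuplicates --numberOfProcessors 4
}

# Dry run: "echo" prints the assembled command lines instead of executing them.
with_common_opts echo bamCoverage --bam sample1.bam --outFileName sample1.bw
with_common_opts echo multiBamSummary bins --bamfiles sample1.bam sample2.bam -o results.npz
```

Dropping the `echo` prefix executes the tools with identical filters, which keeps coverage tracks and summary matrices comparable across samples.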

Source: claude-code-templates (MIT). See About Us for full credits.
