# deepTools Complete Tool Reference
This document provides a comprehensive reference for all deepTools command-line utilities organized by category.
## BAM and bigWig File Processing Tools
### multiBamSummary
Computes read coverages for genomic regions across multiple BAM files, outputting compressed numpy arrays for downstream correlation and PCA analysis.
**Modes:**
- **bins**: Genome-wide analysis using consecutive equal-sized windows (default 10kb)
- **BED-file**: Restricts analysis to user-specified genomic regions
**Key Parameters:**
- `--bamfiles, -b`: Indexed BAM files (space-separated, required)
- `--outFileName, -o`: Output coverage matrix file (required)
- `--BED`: Region specification file (BED-file mode only)
- `--binSize`: Window size in bases (default: 10,000)
- `--labels`: Custom sample identifiers
- `--minMappingQuality`: Quality threshold for read inclusion
- `--numberOfProcessors, -p`: Parallel processing cores
- `--extendReads`: Extend reads to the fragment length
- `--ignoreDuplicates`: Remove PCR duplicates
- `--outRawCounts`: Export tab-delimited file with coordinate columns and per-sample counts
**Output:** Compressed numpy array (.npz) for plotCorrelation and plotPCA
**Common Usage:**
```bash
# Genome-wide comparison
multiBamSummary bins --bamfiles sample1.bam sample2.bam -o results.npz
# Peak region comparison
multiBamSummary BED-file --BED peaks.bed --bamfiles sample1.bam sample2.bam -o results.npz
```
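The table written by `--outRawCounts` can be checked with standard shell tools before moving on to correlation or PCA. The sketch below assumes the layout described above (three coordinate columns followed by one count column per sample); the header text and all counts are made up for illustration and stand in for real output.

```bash
# Stand-in for a file written by --outRawCounts (all values are made up)
counts=$(mktemp)
{
  printf "#'chr'\t'start'\t'end'\t'sample1.bam'\t'sample2.bam'\n"
  printf 'chr1\t0\t10000\t120\t95\n'
  printf 'chr1\t10000\t20000\t80\t110\n'
  printf 'chr2\t0\t10000\t40\t45\n'
} > "$counts"

# Sum the counts per sample, skipping the header line
totals=$(awk -F'\t' '!/^#/ { s1 += $4; s2 += $5 } END { printf "%d %d", s1, s2 }' "$counts")
echo "per-sample totals: $totals"
rm -f "$counts"
```

Large differences between the per-sample totals usually point to sequencing depth differences that should be kept in mind when reading the correlation results.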
---
### multiBigwigSummary
Similar to multiBamSummary but operates on bigWig files instead of BAM files. Used for comparing coverage tracks across samples.
**Modes:**
- **bins**: Genome-wide analysis
- **BED-file**: Region-specific analysis
**Key Parameters:** Similar to multiBamSummary, but the input files are bigWig tracks passed with `--bwfiles, -b`
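A minimal sketch of both modes, mirroring the multiBamSummary examples. The file names are placeholders, and the `run` helper is a hypothetical dry-run wrapper (not part of deepTools) that prints each command unless `DRY_RUN=0` is set.

```bash
# Hypothetical dry-run helper: prints the command unless DRY_RUN=0 is set
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi; }

# Genome-wide comparison of two coverage tracks
run multiBigwigSummary bins --bwfiles sample1.bw sample2.bw --outFileName results.npz

# Comparison restricted to peak regions
run multiBigwigSummary BED-file --BED peaks.bed --bwfiles sample1.bw sample2.bw --outFileName results.npz
```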
---
### bamCoverage
Converts BAM alignment files into normalized coverage tracks in bigWig or bedGraph formats. Calculates coverage as number of reads per bin.
**Key Parameters:**
- `--bam, -b`: Input BAM file (required)
- `--outFileName, -o`: Output filename (required)
- `--outFileFormat, -of`: Output type (bigwig or bedgraph)
- `--normalizeUsing`: Normalization method
- **RPKM**: Reads Per Kilobase per Million mapped reads
- **CPM**: Counts Per Million mapped reads
- **BPM**: Bins Per Million mapped reads
- **RPGC**: Reads per genomic content (requires --effectiveGenomeSize)
- **None**: No normalization (default)
- `--effectiveGenomeSize`: Mappable genome size (required for RPGC)
- `--binSize`: Resolution in base pairs (default: 50)
- `--extendReads, -e`: Extend reads to fragment length (recommended for ChIP-seq, NOT for RNA-seq)
- `--centerReads`: Center reads at fragment length for sharper signals
- `--ignoreDuplicates`: Count identical reads only once
- `--minMappingQuality`: Filter reads below quality threshold
- `--minFragmentLength / --maxFragmentLength`: Fragment length filtering
- `--smoothLength`: Window averaging for noise reduction
- `--MNase`: Analyze MNase-seq data for nucleosome positioning
- `--Offset`: Position-specific offsets (useful for RiboSeq, GROseq)
- `--filterRNAstrand`: Separate forward/reverse strand reads
- `--ignoreForNormalization`: Exclude chromosomes from normalization (e.g., sex chromosomes)
- `--numberOfProcessors, -p`: Parallel processing
**Important Notes:**
- For RNA-seq: Do NOT use --extendReads (would extend over splice junctions)
- For ChIP-seq: Use --extendReads with smaller bin sizes
- Never apply --ignoreDuplicates after GC bias correction
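As an illustration of the RPGC requirement noted above, the following sketch pairs `--normalizeUsing RPGC` with `--effectiveGenomeSize`. The genome size shown is the value commonly quoted for GRCh38 and should be treated as an assumption to verify for your assembly; the file names and the `run` dry-run helper are hypothetical.

```bash
# Hypothetical dry-run helper: prints the command unless DRY_RUN=0 is set
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi; }

# Assumed effective genome size for GRCh38; check the value for your assembly
genome_size=2913022398

# 1x-normalized ChIP-seq track, leaving sex chromosomes out of the scaling
run bamCoverage --bam chip.bam --outFileName chip_rpgc.bw \
    --normalizeUsing RPGC --effectiveGenomeSize "$genome_size" \
    --ignoreForNormalization chrX chrY \
    --binSize 10 --extendReads 200
```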
**Common Usage:**
```bash
# Basic coverage with RPKM normalization
bamCoverage --bam input.bam --outFileName coverage.bw --normalizeUsing RPKM
# ChIP-seq with extension
bamCoverage --bam chip.bam --outFileName chip_coverage.bw \
    --binSize 10 --extendReads 200 --ignoreDuplicates
# Strand-specific RNA-seq
bamCoverage --bam rnaseq.bam --outFileName forward.bw \
    --filterRNAstrand forward
```
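To make the normalization options more concrete, here is a small worked example of the per-bin arithmetic behind CPM and RPKM, using the usual definitions (reads per bin divided by mapped reads in millions, and for RPKM additionally by the bin length in kilobases). All input numbers are made up; this is a sketch of the formulas, not a reproduction of bamCoverage internals.

```bash
reads_in_bin=25        # reads counted in one bin (made-up value)
mapped_reads=20000000  # total mapped reads (made-up value)
bin_size=50            # bin size in bp (the bamCoverage default)

# CPM: reads per bin / mapped reads in millions
cpm=$(awk -v r="$reads_in_bin" -v m="$mapped_reads" \
    'BEGIN { printf "%.3f", r / (m / 1e6) }')

# RPKM: reads per bin / (mapped reads in millions * bin length in kb)
rpkm=$(awk -v r="$reads_in_bin" -v m="$mapped_reads" -v b="$bin_size" \
    'BEGIN { printf "%.3f", r / ((m / 1e6) * (b / 1000)) }')

echo "CPM=$cpm RPKM=$rpkm"
```

With these inputs the RPKM value is 20 times the CPM value, because a 50 bp bin is one twentieth of a kilobase.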
---
### bamCompare
Compares two BAM files and writes the result as a bigWig or bedGraph file, normalizing for differences in sequencing depth. The genome is processed in equal-sized bins, and the chosen comparison is calculated per bin.
**Comparison Methods:**
- **log2** (default): Log2 ratio of samples
- **ratio**: Direct ratio calculation
- **subtract**: Difference between files
- **add**: Sum of samples
- **mean**: Average across samples
- **reciprocal_ratio**: Negative inverse for ratios below 1
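The comparison methods listed above can be illustrated with plain arithmetic on a single pair of bin values. This is a simplified sketch: it ignores the depth scaling and any pseudocounts that bamCompare applies, and the function name `compare_bins` is hypothetical.

```bash
# Apply each comparison method to one pair of per-bin values (a = first file, b = second file)
compare_bins() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    ratio = a / b
    # reciprocal_ratio keeps ratios >= 1 and maps smaller ratios to -1/ratio
    recip = (ratio >= 1) ? ratio : -1 / ratio
    printf "log2=%.2f ratio=%.2f subtract=%.2f add=%.2f mean=%.2f reciprocal_ratio=%.2f\n",
      log(ratio) / log(2), ratio, a - b, a + b, (a + b) / 2, recip
  }'
}

compare_bins 8 2   # signal enriched in the first file
compare_bins 2 8   # signal depleted in the first file
```

The second call shows why reciprocal_ratio can be easier to read than a plain ratio: a fourfold depletion is reported as -4 rather than 0.25.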
[...]
---
### plotHeatmap
[...]
- `--dpi`: Figure resolution
**Clustering:**
- `--kmeans`: k-means clustering
- `--hclust`: Hierarchical clustering (slower for >1000 regions)
- `--silhouette`: Calculate cluster quality metrics
**Visual Customization:**
- `--heatmapHeight / --heatmapWidth`: Dimensions (3-100 cm)
- `--whatToShow`: plot, heatmap, colorbar (combinations)
- `--alpha`: Transparency (0-1)
- `--colorMap`: 50+ color schemes
- `--colorList`: Custom gradient colors
- `--zMin / --zMax`: Intensity scale limits
- `--boxAroundHeatmaps`: yes/no (default: yes)
**Labels:**
- `--xAxisLabel / --yAxisLabel`: Axis labels
- `--regionsLabel`: Region set identifiers
- `--samplesLabel`: Sample names
- `--refPointLabel`: Reference point label
- `--startLabel / --endLabel`: Region boundary labels
**Common Usage:**
```bash
# Basic heatmap
plotHeatmap -m matrix.gz -o heatmap.png
# With clustering and custom colors
plotHeatmap -m matrix.gz -o heatmap.png \
    --kmeans 3 --colorMap RdBu --zMin -3 --zMax 3
```
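The label and scale options can be combined in a single call. The sketch below is one possible combination, assuming a matrix with two samples and two region groups; the label values, output name, and the `run` dry-run helper are hypothetical.

```bash
# Hypothetical dry-run helper: prints the command unless DRY_RUN=0 is set
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi; }

# Labeled heatmap with a fixed intensity scale and print-quality resolution
run plotHeatmap -m matrix.gz -o heatmap_labeled.png \
    --samplesLabel Input H3K4me3 \
    --regionsLabel up down \
    --zMin 0 --zMax 10 --dpi 300
```

Fixing `--zMin` and `--zMax` keeps the color scale comparable across figures generated from different matrices.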
---
### plotProfile
Generates profile plots showing scores across genomic regions using computeMatrix output.
**Key Parameters:**
- `--matrixFile, -m`: Matrix from computeMatrix (required)
- `--outFileName, -o`: Output image (png, eps, pdf, svg) (required)
- `--plotType`: lines, fill, se, std, overlapped_lines, heatmap
- `--colors`: Color palette (names or hex codes)
- `--plotHeight / --plotWidth`: Dimensions in centimeters
- `--yMin / --yMax`: Y-axis range
- `--averageType`: mean, median, min, max, std, sum
**Clustering:**
- `--kmeans`: k-means clustering
- `--hclust`: Hierarchical clustering
- `--silhouette`: Cluster quality metrics
**Labels:**
- `--plotTitle`: Main heading
- `--regionsLabel`: Region set identifiers
- `--samplesLabel`: Sample names
- `--startLabel / --endLabel`: Region boundary labels (scale-regions mode)
**Output Options:**
- `--outFileNameData`: Export data as tab-separated values
- `--outFileSortedRegions`: Save filtered/sorted regions as BED
**Common Usage:**
```bash
# Line plot
plotProfile -m matrix.gz -o profile.png --plotType lines
# With standard error shading
plotProfile -m matrix.gz -o profile.png --plotType se \
    --colors blue red green
```
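Clustering and the output options can be used together to keep the data behind a figure. The following sketch uses only the parameters listed above; the file names and the `run` dry-run helper are hypothetical.

```bash
# Hypothetical dry-run helper: prints the command unless DRY_RUN=0 is set
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi; }

# Cluster regions into two groups, plot the median with a std band,
# and save both the plotted values and the clustered regions
run plotProfile -m matrix.gz -o profile_clusters.pdf \
    --kmeans 2 --plotType std --averageType median \
    --outFileNameData profile_data.tab \
    --outFileSortedRegions clustered_regions.bed
```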
---
### plotEnrichment
Calculates and visualizes signal enrichment across genomic regions by measuring the percentage of alignments that overlap each region group. Useful for computing FRiP (Fraction of Reads in Peaks) scores.
**Key Parameters:**
- `--bamfiles, -b`: Indexed BAM files (required)
- `--BED`: Region files in BED/GTF format (required)
- `--plotFile, -o`: Output visualization (png, pdf, eps, svg)
- `--labels, -l`: Custom sample identifiers
- `--outRawCounts`: Export numerical data
- `--perSample`: Group plots by sample rather than by feature (grouping by feature is the default)
- `--regionLabels`: Custom region names
**Read Processing:**
- `--minFragmentLength / --maxFragmentLength`: Fragment filters
- `--minMappingQuality`: Quality threshold
- `--samFlagInclude / --samFlagExclude`: SAM flag filters
- `--ignoreDuplicates`: Remove duplicates
- `--centerReads`: Center reads for sharper signal
**Common Usage:**
```bash
plotEnrichment -b Input.bam H3K4me3.bam \
    --BED peaks_up.bed peaks_down.bed \
    --regionLabels "Up regulated" "Down regulated" \
    -o enrichment.png
```
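The FRiP-style percentage itself is simple arithmetic: reads overlapping the regions divided by total reads, times 100. The sketch below works on a hypothetical three-column table (sample, reads in peaks, total reads) with made-up counts; the actual `--outRawCounts` layout may differ and should be checked before adapting the column numbers.

```bash
# Hypothetical per-sample counts (sample, reads in peaks, total reads); values are made up
table=$(mktemp)
{
  printf 'sample\treads_in_peaks\ttotal_reads\n'
  printf 'Input.bam\t50000\t10000000\n'
  printf 'H3K4me3.bam\t3000000\t12000000\n'
} > "$table"

# Percentage of reads in peaks per sample, skipping the header line
frip=$(awk -F'\t' 'NR > 1 { printf "%s\t%.2f\n", $1, 100 * $2 / $3 }' "$table")
echo "$frip"
rm -f "$table"
```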
---
## Miscellaneous Tools
### computeMatrixOperations
Advanced matrix manipulation tool for combining or subsetting matrices from computeMatrix. Enables complex multi-sample, multi-region analyses.
**Operations:**
- `cbind`: Combine matrices column-wise
- `rbind`: Combine matrices row-wise
- `subset`: Extract specific samples or regions
- `filterStrand`: Keep only regions on specific strand
- `filterValues`: Apply signal intensity filters
- `sort`: Order regions by various criteria
- `dataRange`: Report min/max values
**Common Usage:**
```bash
# Combine matrices
computeMatrixOperations cbind -m matrix1.gz matrix2.gz -o combined.gz
# Extract specific samples (selected by sample label; the names here are placeholders)
computeMatrixOperations subset -m matrix.gz --samples sample1 sample3 -o subset.gz
```
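Several operations can be chained by feeding the output of one into the next. The following is a sketch under stated assumptions: the matrix file names are placeholders, the `--strand` option of `filterStrand` is assumed rather than taken from the list above, and `run` is a hypothetical dry-run helper.

```bash
# Hypothetical dry-run helper: prints the command unless DRY_RUN=0 is set
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi; }

# Stack two matrices that share the same samples but cover different region sets
run computeMatrixOperations rbind -m genes.gz enhancers.gz -o stacked.gz

# Keep only forward-strand regions (--strand is an assumed option name)
run computeMatrixOperations filterStrand -m stacked.gz --strand + -o forward.gz

# Report the value range of the result, for example to choose heatmap scale limits
run computeMatrixOperations dataRange -m forward.gz
```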
---
### estimateReadFiltering
Predicts the impact of various filtering parameters without actually filtering. Helps optimize filtering strategies before running full analyses.
**Key Parameters:**
- `--bamfiles, -b`: BAM files to analyze
- `--sampleSize`: Number of reads to sample (default: 100,000)
- `--binSize`: Bin size for analysis
- `--distanceBetweenBins`: Spacing between sampled bins
**Filtration Options to Test:**
- `--minMappingQuality`: Test quality thresholds
- `--ignoreDuplicates`: Assess duplicate impact
- `--minFragmentLength / --maxFragmentLength`: Test fragment filters
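A minimal sketch of a filtering estimate, using only options listed above. The BAM file names and the quality threshold are placeholders, and `run` is a hypothetical dry-run helper.

```bash
# Hypothetical dry-run helper: prints the command unless DRY_RUN=0 is set
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi; }

# Estimate how many reads a quality threshold plus duplicate removal would discard
run estimateReadFiltering --bamfiles sample1.bam sample2.bam \
    --minMappingQuality 30 --ignoreDuplicates
```

Running the estimate with a few different thresholds before committing to one is cheaper than regenerating coverage tracks for each setting.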
---
## Common Parameters Across Tools
Many deepTools commands share these filtering and performance options:
**Read Filtering:**
- `--ignoreDuplicates`: Remove PCR duplicates
- `--minMappingQuality`: Filter by alignment confidence
- `--samFlagInclude / --samFlagExclude`: SAM format filtering
- `--minFragmentLength / --maxFragmentLength`: Fragment length bounds
**Performance:**
- `--numberOfProcessors, -p`: Enable parallel processing
- `--region`: Process specific genomic regions (chr:start-end)
**Read Processing:**
- `--extendReads`: Extend to fragment length
- `--centerReads`: Center at fragment midpoint
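Because these options are shared, one way to keep a pipeline consistent is to define them once and reuse them across tools. The sketch below assumes illustrative values and file names; `common_opts` and the `run` dry-run helper are hypothetical names.

```bash
# Hypothetical dry-run helper: prints the command unless DRY_RUN=0 is set
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi; }

# Shared filtering and performance options (values are illustrative)
common_opts="--minMappingQuality 30 --ignoreDuplicates --numberOfProcessors 4"

# $common_opts is left unquoted on purpose so it splits into separate arguments
run bamCoverage --bam chip.bam --outFileName chip.bw $common_opts
run multiBamSummary bins --bamfiles chip.bam input.bam --outFileName results.npz $common_opts
```

Applying identical filters in every step avoids comparing tracks and summaries that were built from different subsets of reads.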
Source: claude-code-templates (MIT). See About Us for full credits.