Create a new SeqOps pipeline
Input sequences (async iterable)
Static from: Create SeqOps pipeline from delimiter-separated file
Supports auto-detection of delimiter and format. Files can be compressed (.gz, .zst) and will be automatically decompressed during streaming.
Path to DSV file (TSV, CSV, or custom delimiter)
Optional options: Parsing options (delimiter auto-detected if not specified)
New SeqOps pipeline for sequence processing
Static from: Create SeqOps pipeline from TSV (tab-separated) file
Convenience method for TSV files with tab delimiter pre-configured.
Path to TSV file
Optional options: Parsing options (delimiter forced to tab)
New SeqOps pipeline
Static from: Create SeqOps pipeline from CSV (comma-separated) file
Convenience method for CSV files with comma delimiter pre-configured. Handles Excel-exported CSV files with proper quote escaping.
Path to CSV file
Optional options: Parsing options (delimiter forced to comma)
New SeqOps pipeline
Static from: Create SeqOps pipeline from JSON file
Parses JSON files containing sequence arrays. Supports both simple array format and wrapped format with metadata. Suitable for datasets under 100K sequences (loads entire file into memory).
Path to JSON file
Optional options: JSONParseOptions - Parsing options (format, quality encoding)
New SeqOps pipeline
Static from: Create SeqOps pipeline from JSONL (JSON Lines) file
Parses JSONL files where each line is a separate JSON object. Provides streaming with O(1) memory usage, suitable for datasets with millions of sequences.
Path to JSONL file
Optional options: JSONParseOptions - Parsing options (format, quality encoding)
New SeqOps pipeline
Static from: Create SeqOps pipeline from array of sequences
Convenience method for converting arrays to SeqOps pipelines. The most common entry point for examples and small datasets.
Array of sequences
New SeqOps instance
Filter sequences based on criteria
Remove sequences that don't meet specified criteria. All criteria within a single filter call are combined with AND logic.
After calling .enumerate(), the index parameter becomes available in
predicate functions, enabling position-based filtering.
Filter criteria or custom predicate (with index after enumerate)
New SeqOps instance for chaining
// Filter by length and GC content
seqops(sequences)
.filter({ minLength: 100, maxGC: 60 })
.filter({ hasAmbiguous: false });
// Custom filter function
seqops(sequences)
.filter((seq) => seq.id.startsWith('chr'));
// With index (after enumerate) - keep even positions
seqops(sequences)
.enumerate()
.filter((seq, idx) => idx % 2 === 0);
// Async predicate with index
seqops(sequences)
.enumerate()
.filter(async (seq, idx) => {
const valid = await validateSequence(seq);
return valid && idx < 1000;
});
// Type preservation with FastqSequence
seqops<FastqSequence>(reads)
.filter((seq) => seq.quality !== undefined);
// Type: SeqOps<FastqSequence> ✅
Filter sequences based on membership in a SequenceSet
Efficiently filters the stream based on whether sequences are present in the provided set. Useful for contamination removal, whitelist/blacklist filtering, and set-based operations.
SequenceSet to filter against
Optional options: { exclude?: boolean; by?: "sequence" | "id" } - Filtering options
Filtered SeqOps instance
// Remove contamination sequences
const contaminants = await seqops("contaminants.fasta").collectSet();
await seqops("reads.fastq")
.filterBySet(contaminants, { exclude: true })
.writeFastq("clean_reads.fastq");
Transform sequence content
Apply transformations that modify the sequence string itself.
Transform options
New SeqOps instance for chaining
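A typical content transformation is reverse complementation. The sketch below is illustrative only; `reverseComplement` is a hypothetical standalone helper, not the library's internal implementation, and the exact transform option names are not shown in this section.

```typescript
// Minimal sketch of a reverse-complement transform (hypothetical helper).
const COMPLEMENT: Record<string, string> = {
  A: "T", T: "A", C: "G", G: "C",
  a: "t", t: "a", c: "g", g: "c",
  N: "N", n: "n",
};

function reverseComplement(seq: string): string {
  let out = "";
  // Walk the sequence backwards, complementing each base.
  for (let i = seq.length - 1; i >= 0; i--) {
    out += COMPLEMENT[seq[i]] ?? seq[i]; // pass unknown symbols through
  }
  return out;
}
```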
Extract amplicons via primer sequences
Finds primer pairs within sequences and extracts the amplified regions. Supports mismatch tolerance, degenerate bases (IUPAC codes), windowed search for long-read performance, canonical matching for BED-extracted primers, and flexible region extraction. Provides complete seqkit amplicon parity with enhanced biological validation and type safety.
// Simple amplicon extraction (90% use case)
seqops(sequences)
.amplicon('ATCGATCG', 'CGATCGAT')
.writeFasta('amplicons.fasta');
// With mismatch tolerance (common case)
seqops(sequences)
.amplicon('ATCGATCG', 'CGATCGAT', 2)
.filter({ minLength: 50 });
// Single primer (auto-canonical matching)
seqops(sequences)
.amplicon('UNIVERSAL_PRIMER')
.stats();
// Real-world COVID-19 diagnostics
seqops(samples)
.quality({ minScore: 20 })
.amplicon(
primer`ACCAGGAACTAATCAGACAAG`, // N gene forward
primer`CAAAGACCAATCCTACCATGAG`, // N gene reverse
2 // Allow sequencing errors
)
.validate({ mode: 'strict' });
// Long reads with windowed search (massive performance boost)
seqops(nanoporeReads)
.amplicon('FORWARD', 'REVERSE', {
searchWindow: { forward: 200, reverse: 200 } // 100x+ speedup
});
// Advanced features (10% use case)
seqops(sequences)
.amplicon({
forwardPrimer: primer`ACCAGGAACTAATCAGACAAG`,
reversePrimer: primer`CAAAGACCAATCCTACCATGAG`,
maxMismatches: 3, // Long-read tolerance
canonical: true, // BED-extracted primers
flanking: true, // Include primer context
region: '-100:100', // Biological context
searchWindow: { forward: 200, reverse: 200 }, // Performance optimization
outputMismatches: true // Debug information
})
.rmdup('sequence')
.writeFasta('advanced_amplicons.fasta');
Clean and sanitize sequences
Fix common issues in sequence data such as gaps, ambiguous bases, and whitespace.
Clean options
New SeqOps instance for chaining
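The cleaning step amounts to simple string sanitation. A minimal sketch, assuming a `removeGaps` option (which does appear in the split examples later in this reference) and a hypothetical `removeWhitespace` option:

```typescript
// Illustrative sketch of sequence cleaning; option names are assumptions
// except removeGaps, which appears in this reference's pipeline examples.
interface CleanOpts {
  removeGaps?: boolean;       // strip '-' and '.' gap characters
  removeWhitespace?: boolean; // strip spaces, tabs, newlines (default true here)
}

function cleanSequence(seq: string, opts: CleanOpts = {}): string {
  let s = seq;
  if (opts.removeWhitespace ?? true) s = s.replace(/\s+/g, "");
  if (opts.removeGaps) s = s.replace(/[-.]/g, "");
  return s;
}
```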
FASTQ quality operations
Filter, trim, and bin sequences based on quality scores. All operations are optional and can be combined. Only affects FASTQ sequences; FASTA sequences pass through unchanged.
Quality filtering, trimming, and binning options
New SeqOps instance for chaining
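The pipeline examples later in this reference pass `{ minScore: 20 }` to this method. A sketch of what such a mean-quality filter computes, assuming Phred+33 encoding and that `minScore` is compared against the mean Phred score (an assumption; the library may use a different aggregate):

```typescript
// Phred+33: subtracting the ASCII offset 33 from each quality character
// yields the Phred score for that base.
function meanQuality(qual: string, offset = 33): number {
  let total = 0;
  for (const ch of qual) total += ch.charCodeAt(0) - offset;
  return total / qual.length;
}

// Hypothetical predicate matching a quality({ minScore }) style filter.
function passesMinScore(qual: string, minScore: number): boolean {
  return meanQuality(qual) >= minScore;
}
```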
Convert FASTQ quality score encodings
Convert quality scores between different encoding schemes (Phred+33, Phred+64, Solexa). Essential for legacy data processing and tool compatibility. Only affects FASTQ sequences; FASTA sequences pass through unchanged.
New SeqOps instance for chaining
// Primary workflow: Auto-detect source encoding (matches seqkit)
seqops(legacyData)
.convert({ targetEncoding: 'phred33' })
.writeFastq('modernized.fastq');
// Legacy Illumina 1.3-1.7 to modern standard
seqops(illumina15Data)
.convert({
sourceEncoding: 'phred64', // Skip detection for known encoding
targetEncoding: 'phred33' // Modern standard
})
// Real-world pipeline: QC → standardize encoding → analysis
const results = await seqops(mixedEncodingFiles)
.quality({ minScore: 20 }) // Filter first
.convert({ targetEncoding: 'phred33' }) // Standardize
.stats({ detailed: true });
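Under the hood, Phred+64 and Phred+33 differ only in their ASCII offsets (64 vs 33), so converting a quality string is a fixed character-code shift. A standalone sketch of that transform (not the library's implementation, which also handles detection and Solexa scores):

```typescript
// Shift each quality character from the Phred+64 range down to Phred+33.
// 64 - 33 = 31, so 'h' (Phred+64 score 40) becomes 'I' (Phred+33 score 40).
function phred64to33(qual: string): string {
  let out = "";
  for (const ch of qual) {
    out += String.fromCharCode(ch.charCodeAt(0) - 31);
  }
  return out;
}
```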
Convert FASTA sequences to FASTQ format
Converts FASTA sequences to FASTQ by adding uniform quality scores. This method is only available when working with FASTA sequences and will cause a compile-time error if called on FASTQ sequences.
New SeqOps instance with FASTQ sequences
// Convert with default quality (Phred+33 score 40)
await seqops(fastaSeqs)
.toFastqSequence()
.writeFastq('output.fastq');
// Convert with custom quality character
await seqops(fastaSeqs)
.toFastqSequence({ quality: 'I' }) // Valid
.writeFastq('output.fastq');
// These will cause compile-time errors:
// seqops(fastaSeqs).toFastqSequence({ quality: '€' }); // Invalid character
// seqops(fastqSeqs).toFastqSequence(); // Cannot convert FASTQ to FASTQ
Convert FASTQ sequences to FASTA format
Converts FASTQ sequences to FASTA by removing quality scores. This method is only available when working with FASTQ sequences and will cause a compile-time error if called on FASTA sequences.
New SeqOps instance with FASTA sequences
// Convert FASTQ to FASTA for BLAST database
await seqops(fastqSeqs)
.toFastaSequence()
.writeFasta('blast_db.fasta');
// Preserve quality metrics for QC tracking
await seqops(fastqSeqs)
.toFastaSequence({ includeQualityStats: true })
.writeFasta('assembly_input.fasta');
// This will cause a compile-time error:
// seqops(fastaSeqs).toFastaSequence(); // Cannot convert FASTA to FASTA
Validate sequences
Check sequences for validity and optionally fix or reject invalid ones.
Validation options
New SeqOps instance for chaining
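For nucleotide data, validity typically means every character is one of the 15 IUPAC DNA codes (including ambiguity codes and N). A minimal sketch of such a check; the library's validation modes and fix strategies are richer than this:

```typescript
// The full IUPAC nucleotide alphabet: ACGT plus ambiguity codes.
const IUPAC_DNA = /^[ACGTRYSWKMBDHVN]+$/i;

function isValidDna(seq: string): boolean {
  return seq.length > 0 && IUPAC_DNA.test(seq);
}
```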
Search sequences by pattern
Pattern matching and filtering similar to Unix grep. Supports both simple string patterns and complex options for advanced use cases.
// Simple sequence search (most common case)
seqops(sequences)
.grep('ATCG') // Search sequences for 'ATCG'
.grep(/^chr\d+/, 'id') // Search IDs with regex
// Advanced options for complex scenarios
seqops(sequences)
.grep({
pattern: 'ATCGATCG',
target: 'sequence',
allowMismatches: 2,
searchBothStrands: true
})
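The `allowMismatches` option shown above presumably corresponds to Hamming-distance matching over a sliding window. A naive sketch of that idea (the library likely uses something more optimized):

```typescript
// Slide the pattern across the sequence; report a hit if any alignment
// has at most maxMismatches substitutions (Hamming distance).
function matchesWithMismatches(
  seq: string,
  pattern: string,
  maxMismatches: number
): boolean {
  if (pattern.length > seq.length) return false;
  for (let i = 0; i + pattern.length <= seq.length; i++) {
    let mm = 0;
    for (let j = 0; j < pattern.length && mm <= maxMismatches; j++) {
      if (seq[i + j] !== pattern[j]) mm++;
    }
    if (mm <= maxMismatches) return true;
  }
  return false;
}
```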
Static concat: Concatenate multiple sequence files into a single pipeline
Static factory function that creates a SeqOps pipeline from multiple files. Elegant API for combining sequence sources with simple duplicate handling.
Array of file paths to concatenate
How to handle duplicate IDs: 'suffix' | 'ignore' (default: 'ignore')
New SeqOps instance for chaining
Concatenate sequences from multiple sources
Combines sequences from multiple file paths and/or AsyncIterables with sophisticated ID conflict resolution. Maintains streaming behavior for memory efficiency with large datasets.
Array of file paths and/or AsyncIterables to concatenate
Optional options: Omit<ConcatOptions, "sources"> - Concatenation options
New SeqOps instance for chaining
// Simple concatenation from files
seqops(sequences)
.concat(['file1.fasta', 'file2.fasta'])
.concat([anotherAsyncIterable])
// Advanced options for complex scenarios
seqops(sequences)
.concat(['file1.fasta', 'file2.fasta'], {
idConflictResolution: 'suffix',
validateFormats: true,
sourceLabels: ['batch1', 'batch2'],
onProgress: (processed, total, source) =>
console.log(`Processed ${processed} from ${source}`)
})
Extract subsequences
Mirrors seqkit subseq functionality for region extraction.
Extraction options
New SeqOps instance for chaining
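seqkit subseq uses 1-based inclusive coordinates, with negative values counted from the 3' end (-1 is the last base). A sketch of that coordinate convention, assuming SeqOps mirrors it:

```typescript
// 1-based inclusive extraction; negative coordinates count from the end,
// seqkit-style (start=-3, end=-1 means the last three bases).
function subseq(seq: string, start: number, end: number): string {
  const len = seq.length;
  const s = start < 0 ? len + start + 1 : start;
  const e = end < 0 ? len + end + 1 : end;
  return seq.slice(s - 1, e);
}
```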
Generate sliding windows (k-mers) from sequences
Extracts overlapping or non-overlapping windows from sequences with compile-time k-mer size tracking. Essential for k-mer analysis, motif discovery, and sequence decomposition.
Window size (k-mer size)
New SeqOps instance with KmerSequence
// Simple usage - just specify size
const kmers = await seqops(sequences).windows(21).toArray();
// With options - step, circular, greedy modes
seqops(sequences).windows(21, { step: 3, circular: true })
// Non-overlapping tiles
seqops(sequences).windows(100, { step: 100 })
// Greedy mode - include short final window
seqops(sequences).windows(50, { greedy: true })
Generate sliding windows (k-mers) from sequences with options
Window size (k-mer size)
Additional window options (step, circular, greedy, etc.)
New SeqOps instance with KmerSequence
Generate sliding windows (k-mers) from sequences (legacy object form)
Window generation options with k-mer size
New SeqOps instance with KmerSequence
Alias for .windows() - emphasizes sliding window concept
Window size
SeqOps yielding KmerSequence objects
Alias for .windows() - emphasizes sliding window concept
Window size
SeqOps yielding KmerSequence objects
Alias for .windows() - emphasizes sliding window concept
SeqOps yielding KmerSequence objects
Alias for .windows() - emphasizes k-mer generation
K-mer size
SeqOps yielding KmerSequence objects
Alias for .windows() - emphasizes k-mer generation
K-mer size
SeqOps yielding KmerSequence objects
Alias for .windows() - emphasizes k-mer generation
SeqOps yielding KmerSequence objects
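The window semantics described above (fixed size, configurable step, optional greedy trailing window) can be sketched as a standalone generator. This is illustrative only and ignores the circular mode and KmerSequence typing:

```typescript
// Yield fixed-size windows at the given step; in greedy mode also emit a
// short final window covering the remaining tail of the sequence.
function* slidingWindows(
  seq: string,
  size: number,
  step = 1,
  greedy = false
): Generator<string> {
  let i = 0;
  for (; i + size <= seq.length; i += step) {
    yield seq.slice(i, i + size);
  }
  if (greedy && i < seq.length) yield seq.slice(i);
}
```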
Take first N sequences (alias for head)
Returns the first N sequences from the stream. This is an alias for
head() provided for developers familiar with this naming convention.
Mirrors seqkit head functionality.
Number of sequences to take
New SeqOps instance for chaining
Sample sequences from the stream
Supports two modes: exact count sampling with strategy selection, or fraction-based streaming sampling for large datasets.
Number of sequences to sample
Sample sequences from the stream
Supports two modes: exact count sampling with strategy selection, or fraction-based streaming sampling for large datasets.
Number of sequences to sample
Sampling strategy ('reservoir', 'systematic', or 'random')
Sample sequences from the stream
Supports two modes: exact count sampling with strategy selection, or fraction-based streaming sampling for large datasets.
Detailed sampling options
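The 'reservoir' strategy named above refers to reservoir sampling (Algorithm R), which draws an exact-size uniform sample from a stream of unknown length in one pass. A generic sketch, independent of the library:

```typescript
// Algorithm R: keep the first k items, then replace a random slot with
// probability k / n for the n-th item seen, yielding a uniform sample.
function reservoirSample<T>(items: Iterable<T>, k: number): T[] {
  const reservoir: T[] = [];
  let seen = 0;
  for (const item of items) {
    seen++;
    if (reservoir.length < k) {
      reservoir.push(item);
    } else {
      const j = Math.floor(Math.random() * seen);
      if (j < k) reservoir[j] = item;
    }
  }
  return reservoir;
}
```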
Sort sequences by specified criteria
High-performance sorting optimized for genomic data compression. Automatically switches between in-memory and external sorting based on dataset size. Proper sequence ordering dramatically improves compression ratios for genomic datasets.
Sort criteria and options
New SeqOps instance for chaining
// Sort by length for compression optimization
seqops(sequences)
.sort({ by: 'length', order: 'desc' })
// Sort by GC content for clustering similar sequences
seqops(sequences)
.sort({ by: 'gc', order: 'asc' })
// Custom sorting for specialized genomic criteria
seqops(sequences)
.sort({
custom: (a, b) => a.sequence.localeCompare(b.sequence)
})
Remove duplicate sequences with configurable deduplication strategies
Streaming deduplication with multiple key extraction methods and conflict resolution strategies. Memory-efficient for large datasets when using the default "first" strategy.
Optional options: UniqueOptions - Deduplication options
New SeqOps with deduplicated sequences
// Remove duplicate sequences (most common)
seqops(sequences).unique();
// Remove sequences with duplicate IDs
seqops(sequences).unique({ by: "id" });
// Case-insensitive sequence deduplication
seqops(sequences).unique({ by: "sequence", caseSensitive: false });
Replace sequence names/content by regular expression
Performs pattern-based substitution on sequence IDs (default) or sequence content (FASTA only). Supports capture variables, special placeholders ({nr}, {kv}, {fn}), and grep-style filtering.
Replace options with pattern and replacement string
New SeqOps instance for chaining
// Remove descriptions from sequence IDs
seqops(sequences).replace({ pattern: '\\s.+', replacement: '' })
// Add prefix to all sequence IDs
seqops(sequences).replace({ pattern: '^', replacement: 'PREFIX_' })
// Use capture variables to restructure IDs
seqops(sequences).replace({
pattern: '^(\\w+)_(\\w+)',
replacement: '$2_$1'
})
// Key-value lookup from file
seqops(sequences).replace({
pattern: '^(\\w+)',
replacement: '$1_{kv}',
kvFile: 'aliases.txt'
})
Translate DNA/RNA sequences to proteins
High-performance protein translation supporting all 31 NCBI genetic codes with progressive disclosure for optimal developer experience.
Optional geneticCode: number | TranslateOptions - Genetic code number (1-33) or full options object
New SeqOps instance for chaining
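Translation itself is a codon-table lookup. The sketch below hard-codes NCBI transl_table=1 (the standard code) using the conventional TCAG index ordering; it is a simplified standalone illustration, not the library's implementation, and emits 'X' for codons containing non-ACGT characters:

```typescript
const BASES = "TCAG";
// NCBI standard genetic code (transl_table=1), codons ordered TTT, TTC, TTA, ...
const AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

function translate(dna: string): string {
  let protein = "";
  for (let i = 0; i + 3 <= dna.length; i += 3) {
    // Accept RNA input by mapping U -> T before lookup.
    const codon = dna.slice(i, i + 3).toUpperCase().replace(/U/g, "T");
    const a = BASES.indexOf(codon[0]);
    const b = BASES.indexOf(codon[1]);
    const c = BASES.indexOf(codon[2]);
    protein += a < 0 || b < 0 || c < 0 ? "X" : AA[a * 16 + b * 4 + c];
  }
  return protein;
}
```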
Split sequences into multiple files
Terminal operation that writes pipeline sequences to separate files with comprehensive seqkit split/split2 compatibility. Integrates seamlessly with all SeqOps pipeline operations for sophisticated genomic workflows.
Split configuration options
Promise resolving to split results summary
// Basic usage - split after processing
const result = await seqops(sequences)
.filter({ minLength: 100 })
.clean({ removeGaps: true })
.split({ mode: 'by-size', sequencesPerFile: 1000 });
// Real-world genomics: Quality control → split for parallel processing
const qcResults = await seqops(rawReads)
.quality({ minScore: 20, trim: true }) // Quality filter
.filter({ minLength: 50, maxLength: 150 }) // Length filter
.clean({ removeAmbiguous: true }) // Clean sequences
.split({ mode: 'by-length', basesPerFile: 1000000 }); // 1MB chunks
// Genome assembly: Split chromosomes for parallel analysis
const chrResults = await seqops(genome)
.grep({ pattern: /^chr[1-9]/, target: 'id' }) // Autosomal only
.transform({ upperCase: true }) // Normalize case
.split({ mode: 'by-id', idRegex: 'chr(\\d+)' }); // Group by chromosome
// Amplicon sequencing: Process primers → split by target
const amplicons = await seqops(sequences)
.grep({ pattern: forwardPrimer, target: 'sequence' }) // Has forward primer
.grep({ pattern: reversePrimer, target: 'sequence' }) // Has reverse primer
.subseq({ region: '20:-20' }) // Trim primers
.split({ mode: 'by-parts', numParts: 8 }); // Parallel processing
console.log(`Created ${result.filesCreated.length} files`);
Split sequences with streaming results for advanced processing
Returns AsyncIterable of split results following the locate() pattern. Enables sophisticated post-processing workflows where each split result needs individual handling during the splitting process.
Split configuration options
AsyncIterable of split results for processing
// Basic streaming - process each split file as it's created
for await (const result of seqops(sequences).splitToStream(options)) {
await compressFile(result.outputFile);
console.log(`Split ${result.sequenceCount} sequences to ${result.outputFile}`);
}
// Large genome processing: Split → compress → upload pipeline
for await (const chunk of seqops(largeGenome).splitToStream({
mode: 'by-length',
basesPerFile: 50_000_000 // 50MB chunks
})) {
// Process each chunk immediately to manage memory
await compressWithBgzip(chunk.outputFile);
await uploadToCloud(chunk.outputFile + '.gz');
await deleteLocalFile(chunk.outputFile); // Clean up
console.log(`Processed chunk ${chunk.partId}: ${chunk.sequenceCount} sequences`);
}
// Quality control: Split → validate → report pipeline
const qualityReports = [];
for await (const batch of seqops(sequencingRun).splitToStream({
mode: 'by-size',
sequencesPerFile: 10000
})) {
const qc = await runQualityControl(batch.outputFile);
qualityReports.push({
file: batch.outputFile,
sequences: batch.sequenceCount,
qcScore: qc.overallScore
});
}
Split by sequence count (convenience method)
Most common splitting mode - divide sequences into files with N sequences each. Ideal for creating manageable chunks for parallel processing.
Number of sequences per output file
Output directory (default: './split')
Promise resolving to split results
// Simple case - just split
await seqops(sequences).splitBySize(1000);
// Common workflow: Filter → process → split for downstream analysis
await seqops(rawSequences)
.filter({ minLength: 100 })
.clean({ removeGaps: true })
.splitBySize(5000, './chunks');
// RNA-seq: Quality filter → deduplicate → split for differential expression
await seqops(rnaseqReads)
.quality({ minScore: 20 })
.rmdup({ by: 'sequence' })
.splitBySize(100000, './de-analysis');
Split into equal parts (convenience method)
Number of output files to create
Output directory (default: './split')
Promise resolving to split results
Split by base count (convenience method)
Implements seqkit split2's key functionality for splitting by total sequence bases rather than sequence count. Essential for genome processing where you need consistent data sizes regardless of sequence count.
Number of bases per output file
Output directory (default: './split')
Promise resolving to split results
// Genome assembly: Split into 10MB chunks for parallel processing
await seqops(scaffolds).splitByLength(10_000_000);
// Metagenomics: Process → bin → split by data size
await seqops(contigs)
.filter({ minLength: 1000 })
.sort({ by: 'length', order: 'desc' }) // Longest first
.splitByLength(5_000_000, './metagenome-bins');
// Long-read sequencing: Quality control → split for analysis
await seqops(nanoporeReads)
.quality({ minScore: 7 }) // Nanopore quality threshold
.filter({ minLength: 5000, maxLength: 100000 })
.splitByLength(50_000_000, './nanopore-chunks');
Split by sequence ID pattern (convenience method)
Groups sequences by ID patterns for organized analysis. String patterns are automatically converted to RegExp for better developer experience.
String pattern or RegExp to group sequences by ID
Output directory (default: './split')
Promise resolving to split results
// Genome assembly: Split by chromosome
await seqops(scaffolds).splitById('chr(\\d+)'); // chr1, chr2, chr3...
// Multi-species analysis: Group by organism
await seqops(sequences)
.splitById('(\\w+)_gene'); // Groups: human_gene, mouse_gene, etc.
// Transcriptome: Split by gene families
await seqops(transcripts)
.filter({ minLength: 200 })
.transform({ upperCase: true })
.splitById('(HOX\\w+)_transcript', './gene-families');
// Advanced: Use RegExp for complex patterns
await seqops(sequences)
.splitById(/^(chr[XY]|chrM)_/, './sex-chromosomes');
Split by genomic region with compile-time validation (convenience method)
Uses advanced TypeScript template literal types to parse and validate genomic regions at compile time, preventing coordinate errors.
Promise resolving to split results
// ✅ Type-safe region parsing - validated at compile time
await seqops(sequences).splitByRegion('chr1:1000-2000');
await seqops(sequences).splitByRegion('scaffold_1:500-1500');
await seqops(sequences).splitByRegion('chrX:0-1000'); // 0-based OK
// ❌ These cause TypeScript compilation errors:
// await seqops(sequences).splitByRegion('chr1:2000-1000'); // end < start
// await seqops(sequences).splitByRegion('chr1:1000-1000'); // end = start
// await seqops(sequences).splitByRegion('invalid-format'); // bad format
// 🔥 Compile-time coordinate extraction available:
type Coords = ExtractCoordinates<'chr1:1000-2000'>;
// → { chr: 'chr1'; start: 1000; end: 2000; length: 1000 }
Calculate sequence statistics
Terminal operation that processes all sequences to compute statistics.
Mirrors seqkit stats functionality.
Statistics options
Promise resolving to statistics
Write sequences to FASTA file
Terminal operation that writes all sequences in FASTA format.
Output file path
Writer options
Promise resolving when write is complete
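FASTA output is a header line followed by the sequence wrapped at a fixed column width. A sketch of the per-record formatting; the `lineWidth` default of 60 is an assumption, not a documented option of this writer:

```typescript
// Format one FASTA record: '>id' header plus sequence lines wrapped
// at lineWidth columns (60 is a common convention, assumed here).
function formatFasta(id: string, sequence: string, lineWidth = 60): string {
  const lines: string[] = [`>${id}`];
  for (let i = 0; i < sequence.length; i += lineWidth) {
    lines.push(sequence.slice(i, i + lineWidth));
  }
  return lines.join("\n") + "\n";
}
```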
Write sequences to FASTQ file
Terminal operation that writes all sequences in FASTQ format. If input sequences don't have quality scores, uses default quality.
Output file path
Default quality string for FASTA sequences
Promise resolving when write is complete
Write sequences to JSON file
Convenience method that converts sequences to tabular format and writes as JSON. Supports both simple array format and wrapped format with metadata. Loads entire dataset into memory before writing.
Output file path
Optional options: Fx2TabOptions<readonly string[]> & JSONWriteOptions - Combined column selection and JSON formatting options
Promise resolving when write is complete
// Simple JSON array
await SeqOps.fromFasta('input.fa')
.writeJSON('output.json');
// With selected columns
await SeqOps.fromFasta('input.fa')
.writeJSON('output.json', {
columns: ['id', 'sequence', 'length', 'gc']
});
// Pretty-printed with metadata
await SeqOps.fromFasta('input.fa')
.writeJSON('output.json', {
columns: ['id', 'sequence', 'length'],
pretty: true,
includeMetadata: true
});
Write sequences to JSONL (JSON Lines) file
Convenience method that converts sequences to tabular format and writes as JSONL (one JSON object per line). Provides streaming with O(1) memory usage, ideal for large datasets.
Note: JSONL format does not support metadata or pretty-printing. Each line is a separate, compact JSON object.
Output file path
Optional options: Fx2TabOptions<readonly string[]> (column selection options; JSON formatting options not applicable)
Promise resolving when write is complete
// Basic JSONL output
await SeqOps.fromFasta('input.fa')
.writeJSONL('output.jsonl');
// With selected columns
await SeqOps.fromFasta('input.fa')
.writeJSONL('output.jsonl', {
columns: ['id', 'sequence', 'length', 'gc']
});
// Large dataset streaming
await SeqOps.fromFasta('huge-dataset.fa')
.filter({ minLength: 100 })
.writeJSONL('filtered.jsonl'); // O(1) memory
Convert sequences to tabular format
Transform sequences into a tabular representation with configurable columns. This is the primary method for tabular conversion, providing a more intuitive name than the seqkit-inspired fx2tab.
Optional options: Fx2TabOptions<Columns> (column selection and formatting options)
TabularOps instance for further processing or writing
// Basic conversion to tabular format
await seqops(sequences)
.toTabular({ columns: ['id', 'seq', 'length', 'gc'] })
.writeTSV('output.tsv');
// With custom columns
await seqops(sequences)
.toTabular({
columns: ['id', 'seq', 'gc'],
customColumns: {
high_gc: (seq) => seq.gc > 60 ? 'HIGH' : 'NORMAL'
}
})
.writeCSV('analysis.csv');
Convert sequences to tabular format (SeqKit compatibility)
Alias for .toTabular() maintained for SeqKit parity and backward compatibility.
New code should prefer .toTabular() for better clarity.
Optional options: Fx2TabOptions<Columns> (column selection and formatting options)
TabularOps instance for further processing or writing
toTabular - Primary method for tabular conversion
Convert sequences to row-based format
Clearer alias for .toTabular() that emphasizes the row-based structure
used for output to various formats (TSV, CSV, JSON, JSONL).
This method converts sequences into a structured row format that can be written to tabular formats (TSV/CSV) or object formats (JSON/JSONL). Use this when the term "tabular" feels semantically incorrect for your output format (e.g., JSON).
Optional options: Fx2TabOptions<Columns> (column selection and formatting options)
TabularOps instance for further processing or writing
toTabular - Original method name
// Writing to JSON - "rows" is clearer than "tabular"
await seqops(sequences)
.asRows({ columns: ['id', 'sequence', 'length'] })
.writeJSON('output.json');
// Writing to JSONL
await seqops(sequences)
.asRows({ columns: ['id', 'seq', 'gc'] })
.writeJSONL('output.jsonl');
// Also works for tabular formats
await seqops(sequences)
.asRows({ columns: ['id', 'seq', 'length'] })
.writeTSV('output.tsv');
Write sequences as TSV (tab-separated values)
Terminal operation that writes sequences as tab-separated values.
Output file path
Conversion options (delimiter will be set to tab)
Write sequences as CSV (comma-separated values)
Terminal operation that writes sequences as comma-separated values. Excel protection is recommended for CSV files.
Output file path
Conversion options (delimiter will be set to comma)
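The quote escaping needed for Excel-compatible CSV follows the usual RFC 4180 convention: fields containing the delimiter, a quote, or a newline are wrapped in double quotes, and embedded quotes are doubled. A minimal sketch (these helpers are illustrative, not the library's API):

```typescript
// RFC 4180-style CSV field escaping (illustrative sketch).
function escapeCsvField(field: string, delimiter = ','): string {
  if (
    field.includes(delimiter) ||
    field.includes('"') ||
    field.includes('\n')
  ) {
    return `"${field.replace(/"/g, '""')}"`;
  }
  return field;
}

function toCsvRow(fields: string[], delimiter = ','): string {
  return fields.map((f) => escapeCsvField(f, delimiter)).join(delimiter);
}
```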
Write sequences as DSV with custom delimiter
Terminal operation for any delimiter-separated format.
Output file path
Custom delimiter character(s)
Conversion options
Collect all sequences into an array
Terminal operation that materializes all sequences in memory. Use with caution on large datasets.
Promise resolving to array of sequences
Collect k-mer sequences into a KmerSet with K preservation
Terminal operation: when the stream contains KmerSequence objects, collects them into a KmerSet that preserves the k-mer size K.
Promise resolving to a KmerSet
Transform sequences with a mapping function
Transforms each sequence in the stream using the provided function. Type parameter U is inferred from the return type of the mapping function, allowing type transformations while preserving specific sequence types when the mapping function returns the same type.
After calling .enumerate(), the index parameter becomes available in
the mapping function signature.
Output sequence type (defaults to T for type preservation)
New SeqOps with transformed sequences
// Transform without index
seqops<FastqSequence>(reads)
.map((seq) => ({ ...seq, id: `sample1_${seq.id}` }));
// Type preserved: SeqOps<FastqSequence>
// Transform with index (after enumerate)
seqops(sequences)
.enumerate()
.map((seq, idx) => ({
...seq,
description: `position=${idx} ${seq.description || ""}`,
}));
// Async transformation
seqops(sequences)
.map(async (seq) => {
const annotation = await fetchAnnotation(seq.id);
return { ...seq, description: annotation };
});
Attach index to each sequence
Adds a zero-based index property to each sequence in the stream.
After calling this method, downstream operations like .map() and .filter()
can access the index parameter in their callback functions.
The index represents the position of the sequence in the stream (0-based).
New SeqOps with sequences that have an index property
// Enable index parameter in downstream operations
const results = await seqops<FastqSequence>(reads)
.enumerate()
.filter((seq, idx) => idx < 10000) // Index available
.map((seq, idx) => ({
...seq,
description: `${seq.description} pos=${idx}`,
}))
.collect();
// Type: Array<FastqSequence & { index: number }> ✅
results[0].quality; // ✅ Exists (FastqSequence preserved)
results[0].index; // ✅ Exists (from enumerate)
Apply side effects without consuming the stream
Executes a function for each sequence but yields the original sequence unchanged. Useful for logging, progress tracking, or other side effects that shouldn't modify the sequence data.
After calling .enumerate(), the index parameter becomes available.
Side effect function (with index after enumerate)
Same SeqOps for continued chaining
// Progress logging without index
let count = 0;
seqops(sequences)
.tap((seq) => {
count++;
if (count % 1000 === 0) console.log(`Processed ${count}`);
})
.filter({ minLength: 100 })
.writeFasta('output.fasta');
// Progress tracking with index
seqops(sequences)
.enumerate()
.tap((seq, idx) => {
if (idx % 1000 === 0) console.log(`Processed ${idx}`);
})
.filter({ minLength: 100 });
// Collect statistics without modifying stream
const stats = { totalLength: 0, count: 0 };
seqops(sequences)
.tap((seq) => {
stats.totalLength += seq.length;
stats.count++;
})
.filter({ minLength: 100 })
.writeFasta('filtered.fasta');
console.log(`Average length: ${stats.totalLength / stats.count}`);
Map each sequence to multiple sequences and flatten the result
Transforms each sequence into zero or more sequences, then flattens all results into a single stream. The mapping function can return an array or an async iterable.
After calling .enumerate(), the index parameter becomes available.
Output sequence type (defaults to T for type preservation)
New SeqOps with flattened results
// Expand each sequence to multiple variants
seqops(sequences)
.flatMap((seq) => [
{ ...seq, id: `${seq.id}_variant1`, sequence: variant1(seq.sequence) },
{ ...seq, id: `${seq.id}_variant2`, sequence: variant2(seq.sequence) },
])
.writeFasta('variants.fasta');
// Generate k-mers from each sequence
seqops(sequences)
.flatMap((seq) => generateKmers(seq, 21))
.unique({ by: 'sequence' })
.writeFasta('unique_kmers.fasta');
Process each sequence with a callback (terminal operation)
Applies a function to each sequence in the stream. This is a terminal operation that consumes the stream and returns when all sequences have been processed.
After calling .enumerate(), the index parameter becomes available in the callback.
Callback function to execute for each sequence
Promise that resolves when all sequences have been processed
// Type-safe with FastqSequence
await seqops<FastqSequence>(reads)
.forEach((seq) => {
console.log(seq.quality); // ✅ TypeScript knows quality exists
});
Reduce sequences to a single value using first element as accumulator
Terminal operation that reduces the stream to a single value by applying a function that combines the accumulator with each sequence. The first sequence in the stream becomes the initial accumulator value.
Returns undefined if the stream is empty.
After calling .enumerate(), the index parameter becomes available.
Promise resolving to the final accumulated value, or undefined if empty
// Find longest sequence
const longest = await seqops<FastqSequence>(reads)
.reduce((acc, seq) => seq.length > acc.length ? seq : acc);
// Type: FastqSequence | undefined ✅
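The first-element-as-accumulator semantics can be sketched over a plain iterable (synchronous for brevity; the library operates on async streams, and this helper is illustrative):

```typescript
// Sketch of reduce() semantics: the first element seeds the accumulator,
// and an empty input yields undefined.
function reduceSeqs<T>(
  items: Iterable<T>,
  fn: (acc: T, item: T) => T,
): T | undefined {
  let acc: T | undefined;
  let first = true;
  for (const item of items) {
    if (first) {
      acc = item;
      first = false;
    } else {
      acc = fn(acc as T, item);
    }
  }
  return acc;
}
```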
Fold sequences to a single value with explicit initial value
Terminal operation that reduces the stream to a single value by applying a function that combines the accumulator with each sequence. Unlike reduce(), fold() requires an explicit initial value and can transform to any type.
Never returns undefined - always returns at least the initial value.
After calling .enumerate(), the index parameter becomes available.
Promise resolving to the final accumulated value
// Calculate total length
const totalLength = await seqops(sequences)
.fold((sum, seq) => sum + seq.length, 0);
// Type: number ✅
// Build index mapping
const index = await seqops<FastqSequence>(reads)
.fold(
(map, seq) => map.set(seq.id, seq),
new Map<string, FastqSequence>(),
);
// Type: Map<string, FastqSequence> ✅
// Collect statistics with position tracking
const stats = await seqops(sequences)
.enumerate()
.fold(
(acc, seq, idx) => {
const gc = calculateGC(seq.sequence);
return {
min: Math.min(acc.min, gc),
max: Math.max(acc.max, gc),
sum: acc.sum + gc,
count: acc.count + 1,
positions: [...acc.positions, { idx, gc }],
};
},
{ min: Infinity, max: -Infinity, sum: 0, count: 0, positions: [] },
);
Combine two streams element-by-element with a combining function
Zips two streams together, applying a function to each pair of elements. Index parameters appear in the signature only when the corresponding stream has been enumerated. Stops when either stream ends (shortest-wins behavior).
New SeqOps with combined elements
// Neither enumerated
const forward = seqops<FastqSequence>("reads_R1.fastq");
const reverse = seqops<FastqSequence>("reads_R2.fastq");
forward.zipWith(reverse, (fwd, rev) => ({
id: `${fwd.id}_merged`,
sequence: fwd.sequence + "NNNN" + reverseComplement(rev.sequence),
}));
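The shortest-wins behavior can be sketched with synchronous generators (the library operates on async streams; this helper is illustrative): both iterators are advanced in lockstep, and the zip stops as soon as either is exhausted.

```typescript
// Sketch of shortest-wins zipping over two iterables.
function* zipWith<A, B, C>(
  left: Iterable<A>,
  right: Iterable<B>,
  fn: (a: A, b: B) => C,
): Generator<C> {
  const li = left[Symbol.iterator]();
  const ri = right[Symbol.iterator]();
  while (true) {
    const l = li.next();
    const r = ri.next();
    if (l.done || r.done) return; // stop at the shorter stream
    yield fn(l.value, r.value);
  }
}
```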
Interleave with another stream in alternating order
Combines two streams by alternating elements: left, right, left, right, etc. Both streams must contain sequences of the same type for type safety. Commonly used for Illumina paired-end reads.
Stops when either stream ends (shortest-wins behavior).
Interleaved SeqOps stream
// Basic interleaving
const forward = seqops<FastqSequence>('reads_R1.fastq');
const reverse = seqops<FastqSequence>('reads_R2.fastq');
forward
.interleave(reverse)
.writeFastq('interleaved.fastq');
// Output: F1, R1, F2, R2, F3, R3, ...
// With ID validation for paired-end reads
forward
.interleave(reverse, { validateIds: true })
.writeFastq('interleaved.fastq');
// Throws error if IDs don't match
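The alternating order with shortest-wins termination can be sketched with synchronous generators (illustrative only; the library streams asynchronously): a pair is pulled from both sides before either element is yielded, so a trailing unmatched read is dropped when the other stream ends.

```typescript
// Sketch of alternating interleave: left, right, left, right, ...
function* interleave<T>(left: Iterable<T>, right: Iterable<T>): Generator<T> {
  const li = left[Symbol.iterator]();
  const ri = right[Symbol.iterator]();
  while (true) {
    const l = li.next();
    const r = ri.next();
    if (l.done || r.done) return; // stop when either stream ends
    yield l.value;
    yield r.value;
  }
}
```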
Repair paired-end read ordering through buffered ID matching
Matches paired-end reads (R1 and R2) from shuffled or out-of-order streams, then outputs them in correctly interleaved order. Supports two modes: dual-stream mode (separate R1 and R2 inputs) and single-stream mode (repairing pairing within one mixed stream).
Uses hash-based buffering to handle out-of-order data, making it suitable for sequences that have been sorted, filtered, or otherwise reordered after initial sequencing.
Output Order: Always yields R1, R2, R1, R2, R1, R2... (interleaved)
Memory Management: Unpaired reads are buffered until their mates arrive; maxBufferSize caps the buffer to prevent unbounded memory growth.
Second stream for dual-stream mode (R2 reads)
Optional options: PairOptions (pairing options: ID extraction, buffer limits, unpaired handling)
Paired SeqOps stream in interleaved order
// Dual-stream mode: Match reads from separate R1 and R2 files
const r1 = seqops<FastqSequence>('sample_R1.fastq.gz');
const r2 = seqops<FastqSequence>('sample_R2.fastq.gz');
r1.pair(r2).writeFastq('paired.fastq');
// Output: R1_001, R2_001, R1_002, R2_002, ...
// Single-stream mode: Repair pairing within mixed stream
seqops<FastqSequence>('shuffled.fastq')
.pair()
.writeFastq('repaired.fastq');
// Reads with /1 suffix → R1, /2 suffix → R2
// Custom ID extraction for non-standard naming
r1.pair(r2, {
extractPairId: (id) => id.split('_')[0] // Custom base ID
}).writeFastq('paired.fastq');
// Strict mode: error on unpaired reads
r1.pair(r2, {
onUnpaired: 'error', // Throw on unpaired (default: 'warn')
maxBufferSize: 50000 // Smaller buffer limit
}).writeFastq('paired.fastq');
// Skip unpaired reads silently
seqops<FastqSequence>('mixed.fastq')
.pair({ onUnpaired: 'skip' })
.writeFastq('paired_only.fastq');
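The hash-based buffering idea can be sketched as follows: reads are keyed by a base ID (here, the ID with a `/1` or `/2` suffix stripped, one of the conventions the examples above use), buffered until their mate arrives, and then emitted in R1, R2 order. Everything below is illustrative, not the library's implementation:

```typescript
// Sketch of buffered ID matching for paired-end repair.
interface Read {
  id: string; // e.g. "read7/1" or "read7/2"
  sequence: string;
}

function pairReads(reads: Read[]): Read[] {
  const pending = new Map<string, Read>();
  const out: Read[] = [];
  for (const read of reads) {
    const base = read.id.replace(/\/[12]$/, '');
    const mate = pending.get(base);
    if (mate === undefined) {
      pending.set(base, read); // buffer until the mate arrives
    } else {
      pending.delete(base);
      // Emit in R1, R2 order regardless of arrival order.
      const [r1, r2] = read.id.endsWith('/1') ? [read, mate] : [mate, read];
      out.push(r1, r2);
    }
  }
  // `pending` now holds unpaired reads (handled per onUnpaired in the library).
  return out;
}
```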
Find pattern locations in sequences
Terminal operation that finds all occurrences of patterns within sequences
with support for fuzzy matching, strand searching, and various output formats.
Mirrors seqkit locate functionality.
// Simple cases (most common)
const exact = seqops(sequences).locate('ATCG'); // Exact string match
const regex = seqops(sequences).locate(/ATG...TAA/); // Regex pattern
const fuzzy = seqops(sequences).locate('ATCG', 2); // Allow 2 mismatches
// Advanced options for complex scenarios
const locations = seqops(sequences).locate({
pattern: 'ATCG',
allowMismatches: 1,
searchBothStrands: true,
outputFormat: 'bed'
});
for await (const location of locations) {
console.log(`Found at ${location.start}-${location.end} on ${location.strand}`);
}
Enable direct iteration over the pipeline
Async iterator for sequences
Main SeqOps class providing fluent interface for sequence operations
Enables Unix pipeline-style method chaining for processing genomic sequences. All operations are lazy-evaluated and maintain streaming behavior for memory efficiency with large datasets.
Example