Genotype API Documentation - v0.1.0

    Class SeqOps<T>

    Main SeqOps class providing a fluent interface for sequence operations

    Enables Unix pipeline-style method chaining for processing genomic sequences. All operations are lazy-evaluated and maintain streaming behavior for memory efficiency with large datasets.

    // Basic pipeline
    await seqops(sequences)
    .filter({ minLength: 100 })
    .transform({ reverseComplement: true })
    .subseq({ region: "100:500" })
    .writeFasta('output.fasta');

    // Complex filtering and analysis
    const stats = await seqops(sequences)
    .quality({ minScore: 20, trim: true })
    .filter({ minLength: 50 })
    .stats({ detailed: true });
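
The lazy, streaming behavior described above can be pictured with async generators. The following is a minimal sketch, not the library's implementation: `LazyPipeline`, `Seq`, `map`, and `collect` are hypothetical names introduced only for illustration. Each chained call wraps the previous stream in a new generator, and nothing is read from the source until a terminal step starts iterating.

```typescript
// Hypothetical record type; the library's own sequence types are not shown here.
interface Seq {
  id: string;
  sequence: string;
}

// Minimal fluent wrapper: every method returns a new wrapper around a new async generator.
class LazyPipeline {
  constructor(private readonly source: AsyncIterable<Seq>) {}

  filter(keep: (s: Seq) => boolean): LazyPipeline {
    const src = this.source;
    return new LazyPipeline(
      (async function* () {
        for await (const s of src) if (keep(s)) yield s;
      })()
    );
  }

  map(fn: (s: Seq) => Seq): LazyPipeline {
    const src = this.source;
    return new LazyPipeline(
      (async function* () {
        for await (const s of src) yield fn(s);
      })()
    );
  }

  // Terminal step: only here are records actually pulled through the chain.
  async collect(): Promise<Seq[]> {
    const out: Seq[] = [];
    for await (const s of this.source) out.push(s);
    return out;
  }
}
```

Because each stage holds one record at a time, memory use in such a design stays flat regardless of input size, as long as no stage buffers the whole stream.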

    Methods

    • Filter sequences based on criteria

      Remove sequences that don't meet specified criteria. All criteria within a single filter call are combined with AND logic.

      Parameters

      • options: FilterOptions | ((seq: T) => boolean)

        Filter criteria or custom predicate

      Returns SeqOps<T>

      New SeqOps instance for chaining

      // Filter by length and GC content
      seqops(sequences)
      .filter({ minLength: 100, maxGC: 60 })
      .filter({ hasAmbiguous: false })

      // Custom filter function
      seqops(sequences)
      .filter({ custom: seq => seq.id.startsWith('chr') })
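
To make the AND semantics concrete, here is a small sketch of how several criteria might be folded into one predicate. It is an illustration under assumptions: `Criteria`, `gcPercent`, and `matchesAll` are hypothetical names, only three criteria are modeled, and `maxGC` is taken to be a percentage because the example above passes 60.

```typescript
interface Seq {
  id: string;
  sequence: string;
}

// Hypothetical subset of filter criteria; GC values are on a 0-100 scale.
interface Criteria {
  minLength?: number;
  maxGC?: number;
  hasAmbiguous?: boolean;
}

// Percentage of G and C bases in the sequence (0 for an empty sequence).
function gcPercent(sequence: string): number {
  if (sequence.length === 0) return 0;
  const gc = (sequence.match(/[GCgc]/g) ?? []).length;
  return (100 * gc) / sequence.length;
}

// A record passes only if every criterion that was supplied holds (AND logic).
function matchesAll(seq: Seq, c: Criteria): boolean {
  const checks: boolean[] = [];
  if (c.minLength !== undefined) checks.push(seq.sequence.length >= c.minLength);
  if (c.maxGC !== undefined) checks.push(gcPercent(seq.sequence) <= c.maxGC);
  if (c.hasAmbiguous !== undefined) {
    const ambiguous = /[^ACGTUacgtu]/.test(seq.sequence);
    checks.push(ambiguous === c.hasAmbiguous);
  }
  return checks.every(Boolean);
}
```

Chaining two `filter` calls, as in the first example, has the same effect as merging their criteria into one call.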
    • Transform sequence content

      Apply transformations that modify the sequence string itself.

      Parameters

      Returns SeqOps<T>

      New SeqOps instance for chaining

      seqops(sequences)
      .transform({ reverseComplement: true })
      .transform({ upperCase: true })
      .transform({ toRNA: true })
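
For readers unfamiliar with the transformations named above, this sketch shows what reverse-complementing and DNA-to-RNA conversion do to a sequence string. The function names and the handling of unknown characters (passed through unchanged) are assumptions for illustration, not the library's behavior.

```typescript
// Complement for unambiguous bases and N, in both cases.
const COMPLEMENT: Record<string, string> = {
  A: 'T', T: 'A', C: 'G', G: 'C', N: 'N',
  a: 't', t: 'a', c: 'g', g: 'c', n: 'n',
};

// Read the strand backwards and swap each base for its complement.
function reverseComplement(dna: string): string {
  let out = '';
  for (let i = dna.length - 1; i >= 0; i--) {
    const base = dna.charAt(i);
    out += COMPLEMENT[base] ?? base; // unknown characters pass through unchanged
  }
  return out;
}

// RNA uses uracil where DNA has thymine.
function toRNA(dna: string): string {
  return dna.replace(/T/g, 'U').replace(/t/g, 'u');
}
```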
    • Extract amplicons via primer sequences

      Finds primer pairs within sequences and extracts the amplified regions. Supports mismatch tolerance, degenerate bases (IUPAC codes), windowed search for long-read performance, canonical matching for BED-extracted primers, and flexible region extraction. Provides complete seqkit amplicon parity with enhanced biological validation and type safety.

      Parameters

      • forwardPrimer: string

      Returns SeqOps<T>

      // Simple amplicon extraction (90% use case)
      seqops(sequences)
      .amplicon('ATCGATCG', 'CGATCGAT')
      .writeFasta('amplicons.fasta');

      // With mismatch tolerance (common case)
      seqops(sequences)
      .amplicon('ATCGATCG', 'CGATCGAT', 2)
      .filter({ minLength: 50 });

      // Single primer (auto-canonical matching)
      seqops(sequences)
      .amplicon('UNIVERSAL_PRIMER')
      .stats();

      // Real-world COVID-19 diagnostics
      seqops(samples)
      .quality({ minScore: 20 })
      .amplicon(
      primer`ACCAGGAACTAATCAGACAAG`, // N gene forward
      primer`CAAAGACCAATCCTACCATGAG`, // N gene reverse
      2 // Allow sequencing errors
      )
      .validate({ mode: 'strict' });

      // Long reads with windowed search (massive performance boost)
      seqops(nanoporeReads)
      .amplicon('FORWARD', 'REVERSE', {
      searchWindow: { forward: 200, reverse: 200 } // 100x+ speedup
      });

      // Advanced features (10% use case)
      seqops(sequences)
      .amplicon({
      forwardPrimer: primer`ACCAGGAACTAATCAGACAAG`,
      reversePrimer: primer`CAAAGACCAATCCTACCATGAG`,
      maxMismatches: 3, // Long-read tolerance
      canonical: true, // BED-extracted primers
      flanking: true, // Include primer context
      region: '-100:100', // Biological context
      searchWindow: { forward: 200, reverse: 200 }, // Performance optimization
      outputMismatches: true // Debug information
      })
      .rmdup('sequence')
      .writeFasta('advanced_amplicons.fasta');
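
The matching rules described above (mismatch tolerance plus IUPAC degenerate bases) can be sketched as follows. This is a simplified illustration, not the library's algorithm: `findPrimer` and `extractAmplicon` are hypothetical helpers, the search is a plain linear scan, and the returned amplicon is assumed to include both primer sites.

```typescript
// IUPAC nucleotide codes and the bases each one stands for.
const IUPAC: Record<string, string> = {
  A: 'A', C: 'C', G: 'G', T: 'T',
  R: 'AG', Y: 'CT', S: 'CG', W: 'AT', K: 'GT', M: 'AC',
  B: 'CGT', D: 'AGT', H: 'ACT', V: 'ACG', N: 'ACGT',
};

// Complement of every IUPAC code (R pairs with Y, K with M, B with V, D with H).
const COMPLEMENT: Record<string, string> = {
  A: 'T', T: 'A', C: 'G', G: 'C',
  R: 'Y', Y: 'R', S: 'S', W: 'W', K: 'M', M: 'K',
  B: 'V', V: 'B', D: 'H', H: 'D', N: 'N',
};

function reverseComplement(seq: string): string {
  let out = '';
  for (let i = seq.length - 1; i >= 0; i--) {
    const base = seq.charAt(i).toUpperCase();
    out += COMPLEMENT[base] ?? base;
  }
  return out;
}

// First position at or after `from` where `primer` matches `target` with at most
// `maxMismatches` mismatches; -1 if there is none.
function findPrimer(primer: string, target: string, maxMismatches: number, from = 0): number {
  const p = primer.toUpperCase();
  const t = target.toUpperCase();
  for (let start = from; start + p.length <= t.length; start++) {
    let mismatches = 0;
    for (let i = 0; i < p.length && mismatches <= maxMismatches; i++) {
      const allowed = IUPAC[p.charAt(i)] ?? p.charAt(i);
      if (!allowed.includes(t.charAt(start + i))) mismatches++;
    }
    if (mismatches <= maxMismatches) return start;
  }
  return -1;
}

// The reverse primer anneals to the opposite strand, so its reverse complement
// is what appears downstream on the forward strand.
function extractAmplicon(target: string, forward: string, reverse: string, maxMismatches = 0): string | null {
  const start = findPrimer(forward, target, maxMismatches);
  if (start < 0) return null;
  const site = reverseComplement(reverse);
  const revStart = findPrimer(site, target, maxMismatches, start + forward.length);
  if (revStart < 0) return null;
  return target.slice(start, revStart + site.length);
}
```

A windowed search, as offered by the `searchWindow` option, would restrict the range scanned by each `findPrimer` call instead of covering the whole read.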
    • Extract amplicons via primer sequences

      Parameters

      • forwardPrimer: string
      • reversePrimer: string

      Returns SeqOps<T>
    • Extract amplicons via primer sequences

      Parameters

      • forwardPrimer: string
      • reversePrimer: string
      • maxMismatches: number

      Returns SeqOps<T>
    • Extract amplicons via primer sequences

      Parameters

      • forwardPrimer: string
      • reversePrimer: string
      • options: Partial<AmpliconOptions>

      Returns SeqOps<T>
    • Extract amplicons via primer sequences

      Parameters

      • options: AmpliconOptions

      Returns SeqOps<T>
    • Clean and sanitize sequences

      Fix common issues in sequence data such as gaps, ambiguous bases, and whitespace.

      Parameters

      Returns SeqOps<T>

      New SeqOps instance for chaining

      seqops(sequences)
      .clean({ removeGaps: true })
      .clean({ replaceAmbiguous: true, replaceChar: 'N' })
      .clean({ trimWhitespace: true, removeEmpty: true })
    • Convert FASTQ quality score encodings

      Convert quality scores between different encoding schemes (Phred+33, Phred+64, Solexa). Essential for legacy data processing and tool compatibility. Only affects FASTQ sequences; FASTA sequences pass through unchanged.

      Parameters

      • this: SeqOps<U>
      • options: ConvertOptions

        Conversion options

      Returns SeqOps<U>

      New SeqOps instance for chaining

      // Primary workflow: Auto-detect source encoding (matches seqkit)
      seqops(legacyData)
      .convert({ targetEncoding: 'phred33' })
      .writeFastq('modernized.fastq');

      // Legacy Illumina 1.3-1.7 to modern standard
      seqops(illumina15Data)
      .convert({
      sourceEncoding: 'phred64', // Skip detection for known encoding
      targetEncoding: 'phred33' // Modern standard
      })

      // Real-world pipeline: QC → standardize encoding → analysis
      const results = await seqops(mixedEncodingFiles)
      .quality({ minScore: 20 }) // Filter first
      .convert({ targetEncoding: 'phred33' }) // Standardize
      .stats({ detailed: true });
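
As background for the Phred+33 and Phred+64 schemes: both store the same quality score as an ASCII character and differ only in the offset added to it, so converting between them is a fixed shift of each character code. The sketch below illustrates that idea under stated assumptions: `convertQuality` is a hypothetical helper, not the library's function, and Solexa is left out because it uses a different score formula rather than a different offset.

```typescript
type Encoding = 'phred33' | 'phred64';

const OFFSET: Record<Encoding, number> = { phred33: 33, phred64: 64 };

// Shift every quality character by the difference between the two ASCII offsets.
function convertQuality(quality: string, source: Encoding, target: Encoding): string {
  const shift = OFFSET[target] - OFFSET[source];
  let out = '';
  for (let i = 0; i < quality.length; i++) {
    const code = quality.charCodeAt(i) + shift;
    // A result below the target offset or above '~' means the declared source encoding was wrong.
    if (code < OFFSET[target] || code > 126) {
      throw new RangeError(`Quality character at position ${i} is out of range for ${source}`);
    }
    out += String.fromCharCode(code);
  }
  return out;
}
```

The range check is also the basis for auto-detection heuristics: characters below `@` (ASCII 64) cannot occur in Phred+64 data.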
    • Validate sequences

      Check sequences for validity and optionally fix or reject invalid ones.

      Parameters

      Returns SeqOps<T>

      New SeqOps instance for chaining

      seqops(sequences)
      .validate({ mode: 'strict', action: 'reject' })
      .validate({ allowAmbiguous: true, action: 'fix', fixChar: 'N' })
    • Search sequences by pattern

      Pattern matching and filtering similar to Unix grep. Supports both simple string patterns and complex options for advanced use cases.

      Parameters

      • pattern: string

      Returns SeqOps<T>

      // Simple sequence search (most common case)
      seqops(sequences)
      .grep('ATCG') // Search sequences for 'ATCG'
      .grep(/^chr\d+/, 'id') // Search IDs with regex

      // Advanced options for complex scenarios
      seqops(sequences)
      .grep({
      pattern: 'ATCGATCG',
      target: 'sequence',
      allowMismatches: 2,
      searchBothStrands: true
      })
    • Search sequences by pattern

      Parameters

      • pattern: RegExp

      Returns SeqOps<T>
    • Search sequences by pattern

      Parameters

      • pattern: string
      • target: "sequence" | "description" | "id"

      Returns SeqOps<T>
    • Search sequences by pattern

      Parameters

      • pattern: RegExp
      • target: "sequence" | "description" | "id"

      Returns SeqOps<T>
    • Search sequences by pattern

      Parameters

      • options: GrepOptions

      Returns SeqOps<T>
    • Concatenate sequences from multiple sources

      Combines sequences from multiple file paths and/or AsyncIterables with sophisticated ID conflict resolution. Maintains streaming behavior for memory efficiency with large datasets.

      Parameters

      • sources: (string | AsyncIterable<AbstractSequence, any, any>)[]

        Array of file paths and/or AsyncIterables to concatenate

      • Optional options: Omit<ConcatOptions, "sources">

        Concatenation options (optional)

      Returns SeqOps<T>

      New SeqOps instance for chaining

      // Simple concatenation from files
      seqops(sequences)
      .concat(['file1.fasta', 'file2.fasta'])
      .concat([anotherAsyncIterable])

      // Advanced options for complex scenarios
      seqops(sequences)
      .concat(['file1.fasta', 'file2.fasta'], {
      idConflictResolution: 'suffix',
      validateFormats: true,
      sourceLabels: ['batch1', 'batch2'],
      onProgress: (processed, total, source) =>
      console.log(`Processed ${processed} from ${source}`)
      })
    • Extract subsequences

      Mirrors seqkit subseq functionality for region extraction.

      Parameters

      Returns SeqOps<T>

      New SeqOps instance for chaining

      seqops(sequences)
      .subseq({
      region: "100:500",
      upstream: 50,
      downstream: 50
      })
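
Region strings such as "100:500" or "20:-20" can be read as follows, assuming the seqkit subseq convention of 1-based inclusive coordinates in which negative values count back from the end (-1 is the last base). The helper names are hypothetical, and the `upstream`/`downstream` options are not modeled in this sketch.

```typescript
// Parse "start:end" into two non-zero integers.
function parseRegion(region: string): { start: number; end: number } {
  const m = /^(-?\d+):(-?\d+)$/.exec(region);
  if (m === null) throw new Error(`Invalid region: ${region}`);
  const start = Number(m[1]);
  const end = Number(m[2]);
  if (start === 0 || end === 0) throw new Error('Coordinates are 1-based; 0 is not valid');
  return { start, end };
}

function subseq(sequence: string, region: string): string {
  const { start, end } = parseRegion(region);
  const len = sequence.length;
  // Negative coordinates count back from the end: -1 is the last base.
  const s = start > 0 ? start : len + start + 1;
  const e = end > 0 ? end : len + end + 1;
  if (e < 1 || s > len || s > e) return '';
  // Convert 1-based inclusive coordinates to the 0-based half-open range slice() expects.
  return sequence.slice(Math.max(s, 1) - 1, Math.min(e, len));
}
```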
    • Take first n sequences

      Mirrors seqkit head functionality.

      Parameters

      • n: number

        Number of sequences to take

      Returns SeqOps<T>

      New SeqOps instance for chaining

      seqops(sequences).head(1000)
      
    • Sample sequences statistically

      Apply statistical sampling to select a subset of sequences. Supports both simple count-based sampling and advanced options.

      Parameters

      • count: number

      Returns SeqOps<T>

      // Simple sampling (most common case)
      seqops(sequences)
      .sample(1000) // Sample 1000 sequences
      .sample(500, 'systematic') // Systematic sampling

      // Advanced options for complex scenarios
      seqops(sequences)
      .sample({
      n: 1000,
      seed: 42,
      strategy: 'reservoir'
      })
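
The 'reservoir' strategy named above refers to a family of single-pass algorithms that pick n items uniformly from a stream of unknown length. Below is a sketch of the classic variant (Algorithm R) with a seeded generator so that results are reproducible. It is an illustration only: the function names are hypothetical, the small `mulberry32` generator is a stand-in for whatever the library uses, and a synchronous `Iterable` is used for brevity (the same loop works with `for await` over an AsyncIterable).

```typescript
// Small seeded pseudo-random generator (mulberry32) returning values in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Algorithm R: keep the first n items, then replace a random slot with
// probability n / seen for every later item.
function reservoirSample<T>(items: Iterable<T>, n: number, seed: number): T[] {
  const random = mulberry32(seed);
  const reservoir: T[] = [];
  let seen = 0;
  for (const item of items) {
    seen++;
    if (reservoir.length < n) {
      reservoir.push(item);
    } else {
      const j = Math.floor(random() * seen); // uniform integer in [0, seen)
      if (j < n) reservoir[j] = item;
    }
  }
  return reservoir;
}
```

Memory use is proportional to n rather than to the stream length, which is why this approach suits streaming pipelines.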
    • Sample sequences statistically

      Parameters

      • count: number
      • strategy: "random" | "systematic" | "reservoir"

      Returns SeqOps<T>
    • Sample sequences statistically

      Parameters

      • options: SampleOptions

      Returns SeqOps<T>
    • Sort sequences by specified criteria

      High-performance sorting optimized for genomic data compression. Automatically switches between in-memory and external sorting based on dataset size. Placing similar sequences next to each other can substantially improve compression ratios for genomic datasets.

      Parameters

      • options: SortOptions

        Sort criteria and options

      Returns SeqOps<T>

      New SeqOps instance for chaining

      // Sort by length for compression optimization
      seqops(sequences)
      .sort({ by: 'length', order: 'desc' })

      // Sort by GC content for clustering similar sequences
      seqops(sequences)
      .sort({ by: 'gc', order: 'asc' })

      // Custom sorting for specialized genomic criteria
      seqops(sequences)
      .sort({
      custom: (a, b) => a.sequence.localeCompare(b.sequence)
      })
    • Sort sequences by length (convenience method)

      Parameters

      • order: "asc" | "desc" = "asc"

        Sort order: 'asc' or 'desc' (default: 'asc')

      Returns SeqOps<T>

      New SeqOps instance for chaining

      seqops(sequences)
      .sortByLength('desc') // Longest first for compression
      .sortByLength() // Shortest first (default)
    • Sort sequences by ID (convenience method)

      Parameters

      • order: "asc" | "desc" = "asc"

        Sort order: 'asc' or 'desc' (default: 'asc')

      Returns SeqOps<T>

      New SeqOps instance for chaining

    • Sort sequences by GC content (convenience method)

      Parameters

      • order: "asc" | "desc" = "asc"

        Sort order: 'asc' or 'desc' (default: 'asc')

      Returns SeqOps<T>

      New SeqOps instance for chaining

    • Remove duplicate sequences

      High-performance deduplication using probabilistic Bloom filters or exact Set-based approaches. Supports both simple deduplication and advanced configuration for large datasets.

      Parameters

      • by: "sequence" | "id" | "both"

      Returns SeqOps<T>

      // Simple deduplication (most common cases)
      seqops(sequences)
      .rmdup('sequence') // Remove sequence duplicates
      .rmdup('id', true) // Remove ID duplicates (exact)

      // Advanced options for large datasets
      seqops(sequences)
      .rmdup({
      by: 'both',
      expectedUnique: 5_000_000,
      falsePositiveRate: 0.0001
      })
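
The `expectedUnique` and `falsePositiveRate` options correspond to the two inputs of the standard Bloom filter sizing formulas: m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. The sketch below only evaluates those textbook formulas; `bloomParameters` is a hypothetical helper, and how the library actually sizes its filter is not stated in this section.

```typescript
// Textbook Bloom filter sizing for n expected items and false-positive rate p.
function bloomParameters(expectedUnique: number, falsePositiveRate: number) {
  if (expectedUnique <= 0) throw new RangeError('expectedUnique must be positive');
  if (falsePositiveRate <= 0 || falsePositiveRate >= 1) {
    throw new RangeError('falsePositiveRate must be strictly between 0 and 1');
  }
  const bits = Math.ceil(
    (-expectedUnique * Math.log(falsePositiveRate)) / (Math.LN2 * Math.LN2)
  );
  const hashes = Math.max(1, Math.round((bits / expectedUnique) * Math.LN2));
  return { bits, hashes, megabytes: bits / 8 / 1e6 };
}

// Sizing for the values used in the example above.
const params = bloomParameters(5_000_000, 0.0001);
console.log(params);
```

In a Bloom-based deduplicator a false positive means a sequence that was never seen is treated as a duplicate and dropped, so the rate bounds the fraction of unique records that may be lost. Exact Set-based matching avoids this at the cost of memory proportional to the data.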
    • Remove duplicate sequences

      Parameters

      • by: "sequence" | "id" | "both"
      • exact: boolean

      Returns SeqOps<T>
    • Remove duplicate sequences

      Parameters

      • options: RmdupOptions

      Returns SeqOps<T>
    • Remove sequence duplicates (convenience method)

      Most common deduplication use case - remove sequences with identical content.

      Parameters

      • caseSensitive: boolean = true

        Whether to consider case (default: true)

      Returns SeqOps<T>

      New SeqOps instance for chaining

    • Remove ID duplicates (convenience method)

      Remove sequences with duplicate IDs, keeping first occurrence.

      Parameters

      • exact: boolean = true

        Use exact matching (default: true for IDs)

      Returns SeqOps<T>

      New SeqOps instance for chaining

    • Translate DNA/RNA sequences to proteins

      High-performance protein translation supporting all 31 NCBI genetic codes. The common cases need no arguments or just a genetic code number; a TranslateOptions object exposes the remaining settings.

      Parameters

      • Optional geneticCode: number | TranslateOptions

        Genetic code number (an NCBI translation table ID in the range 1-33; IDs in that range are not contiguous) or full options object

      Returns SeqOps<T>

      New SeqOps instance for chaining

      // Simple cases (90% of usage)
      seqops(sequences)
      .translate() // Standard genetic code, frame +1
      .translate(2) // Vertebrate mitochondrial code

      // Advanced options (10% of usage)
      seqops(sequences)
      .translate({
      geneticCode: 1,
      orfsOnly: true,
      minOrfLength: 30
      })
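
To show what selecting a genetic code changes, the sketch below translates the same DNA with NCBI table 1 (Standard) and table 2 (Vertebrate Mitochondrial). It is an illustration, not the library's implementation: `translate` here is a standalone hypothetical function, only two tables are included, and codons containing anything other than A, C, G, T/U become `X`. The 64-character strings are the NCBI amino-acid rows, indexed with bases in T, C, A, G order.

```typescript
const BASES = 'TCAG';

// NCBI translation tables 1 and 2, one amino acid per codon in TCAG order.
const TABLES: Record<number, string> = {
  1: 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG',
  2: 'FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG',
};

// Translate one forward reading frame (frame = 0, 1 or 2); '*' marks a stop codon.
function translate(seq: string, geneticCode = 1, frame = 0): string {
  const table = TABLES[geneticCode];
  if (table === undefined) {
    throw new Error(`Genetic code ${geneticCode} is not included in this sketch`);
  }
  const dna = seq.toUpperCase().replace(/U/g, 'T');
  let protein = '';
  for (let i = frame; i + 3 <= dna.length; i += 3) {
    const i1 = BASES.indexOf(dna.charAt(i));
    const i2 = BASES.indexOf(dna.charAt(i + 1));
    const i3 = BASES.indexOf(dna.charAt(i + 2));
    protein += i1 < 0 || i2 < 0 || i3 < 0 ? 'X' : table.charAt(16 * i1 + 4 * i2 + i3);
  }
  return protein;
}
```

Under table 2, TGA encodes tryptophan instead of a stop, ATA encodes methionine, and AGA/AGG are stops, which is why mitochondrial genes translated with the standard code appear truncated.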
    • Translate using mitochondrial genetic code (convenience method)

      Returns SeqOps<T>

      New SeqOps instance for chaining

      seqops(sequences)
      .translateMito() // Genetic code 2 - Vertebrate Mitochondrial
    • Translate all 6 reading frames (convenience method)

      Parameters

      • geneticCode: number = 1

        Genetic code to use (default: 1 = Standard)

      Returns SeqOps<T>

      New SeqOps instance for chaining

      seqops(sequences)
      .translateAllFrames() // All frames with standard code
      .translateAllFrames(2) // All frames with mito code
    • Find and translate open reading frames (convenience method)

      Parameters

      • minLength: number = 30

        Minimum ORF length in amino acids (default: 30)

      • geneticCode: number = 1

        Genetic code to use (default: 1 = Standard)

      Returns SeqOps<T>

      New SeqOps instance for chaining

      seqops(sequences)
      .translateOrf() // Default: 30 aa minimum
      .translateOrf(100) // 100 aa minimum
      .translateOrf(50, 2) // 50 aa minimum, mito code
    • Split sequences into multiple files

      Terminal operation that writes pipeline sequences to separate files with comprehensive seqkit split/split2 compatibility. Integrates seamlessly with all SeqOps pipeline operations for sophisticated genomic workflows.

      Parameters

      • options: SplitOptions

        Split configuration options

      Returns Promise<SplitSummary>

      Promise resolving to split results summary

      // Basic usage - split after processing
      const result = await seqops(sequences)
      .filter({ minLength: 100 })
      .clean({ removeGaps: true })
      .split({ mode: 'by-size', sequencesPerFile: 1000 });

      // Real-world genomics: Quality control → split for parallel processing
      const qcResults = await seqops(rawReads)
      .quality({ minScore: 20, trim: true }) // Quality filter
      .filter({ minLength: 50, maxLength: 150 }) // Length filter
      .clean({ removeAmbiguous: true }) // Clean sequences
      .split({ mode: 'by-length', basesPerFile: 1000000 }); // 1 Mbp per file

      // Genome assembly: Split chromosomes for parallel analysis
      const chrResults = await seqops(genome)
      .grep({ pattern: /^chr[1-9]/, target: 'id' }) // Autosomal only
      .transform({ upperCase: true }) // Normalize case
      .split({ mode: 'by-id', idRegex: 'chr(\\d+)' }); // Group by chromosome

      // Amplicon sequencing: Process primers → split by target
      const amplicons = await seqops(sequences)
      .grep({ pattern: forwardPrimer, target: 'sequence' }) // Has forward primer
      .grep({ pattern: reversePrimer, target: 'sequence' }) // Has reverse primer
      .subseq({ region: '20:-20' }) // Trim primers
      .split({ mode: 'by-parts', numParts: 8 }); // Parallel processing

      console.log(`Created ${result.filesCreated.length} files`);
    • Split sequences with streaming results for advanced processing

      Returns AsyncIterable of split results following the locate() pattern. Enables sophisticated post-processing workflows where each split result needs individual handling during the splitting process.

      Parameters

      • options: SplitOptions

        Split configuration options

      Returns AsyncIterable<SplitResult>

      AsyncIterable of split results for processing

      // Basic streaming - process each split file as it's created
      for await (const result of seqops(sequences).splitToStream(options)) {
      await compressFile(result.outputFile);
      console.log(`Split ${result.sequenceCount} sequences to ${result.outputFile}`);
      }

      // Large genome processing: Split → compress → upload pipeline
      for await (const chunk of seqops(largeGenome).splitToStream({
      mode: 'by-length',
      basesPerFile: 50_000_000 // 50 Mbp per file
      })) {
      // Process each chunk immediately to manage memory
      await compressWithBgzip(chunk.outputFile);
      await uploadToCloud(chunk.outputFile + '.gz');
      await deleteLocalFile(chunk.outputFile); // Clean up
      console.log(`Processed chunk ${chunk.partId}: ${chunk.sequenceCount} sequences`);
      }

      // Quality control: Split → validate → report pipeline
      const qualityReports = [];
      for await (const batch of seqops(sequencingRun).splitToStream({
      mode: 'by-size',
      sequencesPerFile: 10000
      })) {
      const qc = await runQualityControl(batch.outputFile);
      qualityReports.push({
      file: batch.outputFile,
      sequences: batch.sequenceCount,
      qcScore: qc.overallScore
      });
      }
    • Split by sequence count (convenience method)

      Most common splitting mode - divide sequences into files with N sequences each. Ideal for creating manageable chunks for parallel processing.

      Parameters

      • sequencesPerFile: number

        Number of sequences per output file

      • outputDir: string = "./split"

        Output directory (default: './split')

      Returns Promise<SplitSummary>

      Promise resolving to split results

      // Simple case - just split
      await seqops(sequences).splitBySize(1000);

      // Common workflow: Filter → process → split for downstream analysis
      await seqops(rawSequences)
      .filter({ minLength: 100 })
      .clean({ removeGaps: true })
      .splitBySize(5000, './chunks');

      // RNA-seq: Quality filter → deduplicate → split for differential expression
      await seqops(rnaseqReads)
      .quality({ minScore: 20 })
      .rmdup({ by: 'sequence' })
      .splitBySize(100000, './de-analysis');
    • Split into equal parts (convenience method)

      Parameters

      • numParts: number

        Number of output files to create

      • outputDir: string = "./split"

        Output directory (default: './split')

      Returns Promise<SplitSummary>

      Promise resolving to split results

    • Split by base count (convenience method)

      Implements seqkit split2's key functionality for splitting by total sequence bases rather than sequence count. Essential for genome processing where you need consistent data sizes regardless of sequence count.

      Parameters

      • basesPerFile: number

        Number of bases per output file

      • outputDir: string = "./split"

        Output directory (default: './split')

      Returns Promise<SplitSummary>

      Promise resolving to split results

      // Genome assembly: Split into 10 Mbp chunks for parallel processing
      await seqops(scaffolds).splitByLength(10_000_000);

      // Metagenomics: Process → bin → split by data size
      await seqops(contigs)
        .filter({ minLength: 1000 })
        .sort({ by: 'length', order: 'desc' }) // Longest first
        .splitByLength(5_000_000, './metagenome-bins');

      // Long-read sequencing: Quality control → split for analysis
      await seqops(nanoporeReads)
        .quality({ minScore: 7 }) // Nanopore quality threshold
        .filter({ minLength: 5000, maxLength: 100000 })
        .splitByLength(50_000_000, './nanopore-chunks');
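
      Splitting by base count differs from splitting by sequence count in that the quota is measured in accumulated sequence length. The sketch below shows one way this could work, under the assumption that a chunk is closed as soon as its running total reaches the limit (so a chunk may slightly exceed it). `SeqRecord` and `chunkByBases` are hypothetical names used for illustration, and the sketch works on an in-memory array for brevity rather than on a stream.

```typescript
// Hypothetical record shape; the library's sequence type may differ.
interface SeqRecord {
  id: string;
  sequence: string;
}

// Hypothetical helper: closes a chunk once its total bases reach the limit.
function chunkByBases(records: SeqRecord[], basesPerChunk: number): SeqRecord[][] {
  if (basesPerChunk < 1) {
    throw new RangeError("basesPerChunk must be at least 1");
  }
  const chunks: SeqRecord[][] = [];
  let current: SeqRecord[] = [];
  let bases = 0;
  for (const record of records) {
    current.push(record);
    bases += record.sequence.length;
    if (bases >= basesPerChunk) {
      chunks.push(current);
      current = [];
      bases = 0;
    }
  }
  // Flush whatever is left, even if it is below the limit.
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

      With this policy a single sequence longer than the limit ends up in a chunk of its own, so chunk sizes are consistent only to within the length of the longest sequence.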
    • Split by sequence ID pattern (convenience method)

      Groups sequences by ID patterns for organized analysis. String patterns are automatically converted to RegExp, so a plain string can be passed instead of constructing a RegExp by hand.

      Parameters

      • pattern: string | RegExp

        String pattern or RegExp to group sequences by ID

      • outputDir: string = "./split"

        Output directory (default: './split')

      Returns Promise<SplitSummary>

      Promise resolving to split results

      // Genome assembly: Split by chromosome
      await seqops(scaffolds).splitById('chr(\\d+)'); // chr1, chr2, chr3...

      // Multi-species analysis: Group by organism
      await seqops(sequences)
        .splitById('(\\w+)_gene'); // Groups: human_gene, mouse_gene, etc.

      // Transcriptome: Split by gene families
      await seqops(transcripts)
        .filter({ minLength: 200 })
        .transform({ upperCase: true })
        .splitById('(HOX\\w+)_transcript', './gene-families');

      // Advanced: Use RegExp for complex patterns
      await seqops(sequences)
        .splitById(/^(chr[XY]|chrM)_/, './sex-chromosomes');
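
      The string-to-RegExp conversion and the grouping step can be sketched as follows. `groupKey` and `groupIds` are hypothetical helpers, not library functions. The sketch keys each group by the full matched text, which is what the comments in the examples above suggest (chr1, human_gene); whether the library uses the full match or the first capture group is not stated here.

```typescript
// Hypothetical helper: derives a group key from a sequence ID.
// Returns null when the ID does not match the pattern.
function groupKey(id: string, pattern: string | RegExp): string | null {
  // Strings are compiled to RegExp; existing RegExp objects are used as-is.
  const re = typeof pattern === "string" ? new RegExp(pattern) : pattern;
  // Reset state in case the caller passed a RegExp with the g or y flag.
  re.lastIndex = 0;
  const match = re.exec(id);
  return match === null ? null : match[0];
}

// Hypothetical helper: buckets IDs by key, skipping IDs that do not match.
function groupIds(ids: string[], pattern: string | RegExp): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const id of ids) {
    const key = groupKey(id, pattern);
    if (key === null) continue;
    const bucket = groups.get(key) ?? [];
    bucket.push(id);
    groups.set(key, bucket);
  }
  return groups;
}
```

      Note that in a string pattern the backslash must be doubled ('chr(\\d+)'), whereas a RegExp literal takes it as written (/chr(\d+)/).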
    • Split by genomic region with compile-time validation (convenience method)

      Uses TypeScript template literal types to parse and validate genomic regions at compile time, so that malformed region strings and invalid coordinates (end less than or equal to start) fail to compile.

      Type Parameters

      • T extends string

      Parameters

      • region: T extends ValidGenomicRegion<T> ? T : never

        Genomic region string with compile-time validation

      • outputDir: string = "./split"

        Output directory (default: './split')

      Returns Promise<SplitSummary>

      Promise resolving to split results

      // ✅ Type-safe region parsing - validated at compile time
      await seqops(sequences).splitByRegion('chr1:1000-2000');
      await seqops(sequences).splitByRegion('scaffold_1:500-1500');
      await seqops(sequences).splitByRegion('chrX:0-1000'); // 0-based OK

      // ❌ These cause TypeScript compilation errors:
      // await seqops(sequences).splitByRegion('chr1:2000-1000'); // end < start
      // await seqops(sequences).splitByRegion('chr1:1000-1000'); // end = start
      // await seqops(sequences).splitByRegion('invalid-format'); // bad format

      // 🔥 Compile-time coordinate extraction available:
      type Coords = ExtractCoordinates<'chr1:1000-2000'>;
      // → { chr: 'chr1'; start: 1000; end: 2000; length: 1000 }
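
      For region strings that are only known at run time (for example, read from a file), the same rules can be checked dynamically. The sketch below is a runtime counterpart to the rules listed in the comments above: `name:start-end` format, end strictly greater than start, and length computed as end minus start. `Region` and `parseRegion` are hypothetical names, not part of the library.

```typescript
// Hypothetical result shape, mirroring the fields shown for ExtractCoordinates.
interface Region {
  chr: string;
  start: number;
  end: number;
  length: number;
}

// Hypothetical helper: parses "name:start-end" and enforces end > start.
function parseRegion(region: string): Region {
  const match = /^([^:\s]+):(\d+)-(\d+)$/.exec(region);
  if (match === null) {
    throw new Error(`Invalid region format: ${region}`);
  }
  const chr = String(match[1]);
  const start = Number(match[2]);
  const end = Number(match[3]);
  if (end <= start) {
    throw new Error(`Region end must be greater than start: ${region}`);
  }
  return { chr, start, end, length: end - start };
}

console.log(parseRegion("chr1:1000-2000").length); // 1000
```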
    • Calculate sequence statistics

      Terminal operation that processes all sequences to compute statistics. Mirrors seqkit stats functionality.

      Parameters

      Returns Promise<SequenceStats>

      Promise resolving to statistics

      const stats = await seqops(sequences)
        .seq({ minLength: 100 })
        .stats({ detailed: true });
      console.log(`N50: ${stats.n50}`);
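
      The `n50` field printed above refers to a standard assembly metric: the length L such that sequences of length L or longer contain at least half of all bases. The sketch below computes it from a list of lengths using the usual definition. The function `n50` here is a standalone illustration, not the library's implementation.

```typescript
// Standalone illustration of the N50 metric (not the library's code).
function n50(lengths: number[]): number {
  if (lengths.length === 0) return 0;
  // Longest first, without mutating the caller's array.
  const sorted = [...lengths].sort((a, b) => b - a);
  const total = sorted.reduce((sum, len) => sum + len, 0);
  let running = 0;
  for (const len of sorted) {
    running += len;
    // Stop at the first length where the running sum covers half the total.
    if (running * 2 >= total) return len;
  }
  return 0;
}

// Total is 54; 10 + 9 + 8 = 27 reaches half, so N50 is 8.
console.log(n50([2, 3, 4, 5, 6, 7, 8, 9, 10])); // 8
```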
    • Write sequences to FASTA file

      Terminal operation that writes all sequences in FASTA format.

      Parameters

      • path: string

        Output file path

      • options: { wrapWidth?: number } = {}

        Writer options

      Returns Promise<void>

      Promise resolving when write is complete

      await seqops(sequences)
        .seq({ reverseComplement: true })
        .writeFasta('output.fasta');
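
      The `wrapWidth` option controls how many bases appear on each sequence line. The sketch below shows the effect on a single record. `formatFasta` is a hypothetical helper, and treating a width of 0 as "no wrapping" is an assumption, since the default behavior is not stated here.

```typescript
// Hypothetical helper: renders one FASTA record with optional line wrapping.
// Assumption: wrapWidth <= 0 means the sequence is written on a single line.
function formatFasta(id: string, sequence: string, wrapWidth = 0): string {
  const lines = [`>${id}`];
  if (wrapWidth > 0) {
    for (let i = 0; i < sequence.length; i += wrapWidth) {
      lines.push(sequence.slice(i, i + wrapWidth));
    }
  } else {
    lines.push(sequence);
  }
  return lines.join("\n") + "\n";
}

// Prints ">s1", then "ACGT", "ACGT", "AC" on separate lines.
console.log(formatFasta("s1", "ACGTACGTAC", 4));
```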
    • Write sequences to FASTQ file

      Terminal operation that writes all sequences in FASTQ format. If an input sequence has no quality scores, the default quality is used instead.

      Parameters

      • path: string

        Output file path

      • defaultQuality: string = "I"

        Default quality string for FASTA sequences

      Returns Promise<void>

      Promise resolving when write is complete

      await seqops(sequences)
        .seq({ minQuality: 20 })
        .writeFastq('output.fastq', 'IIIIIIIIII');
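
      A FASTQ record needs one quality character per base, so a default has to be stretched to the sequence length. The sketch below shows one plausible way to do that; how the library actually expands `defaultQuality` is not documented here, and `Read` and `toFastq` are hypothetical names. In the common Phred+33 encoding, the default character 'I' corresponds to a quality score of 40.

```typescript
// Hypothetical record shape: quality is absent for FASTA-derived input.
interface Read {
  id: string;
  sequence: string;
  quality?: string;
}

// Hypothetical helper: renders one FASTQ record.
// Assumption: the default is repeated and truncated to match the sequence length.
function toFastq(read: Read, defaultQuality = "I"): string {
  if (defaultQuality.length === 0) {
    throw new RangeError("defaultQuality must not be empty");
  }
  const n = read.sequence.length;
  const quality =
    read.quality ??
    defaultQuality.repeat(Math.ceil(n / defaultQuality.length)).slice(0, n);
  return `@${read.id}\n${read.sequence}\n+\n${quality}\n`;
}
```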
    • Collect all sequences into an array

      Terminal operation that materializes all sequences in memory. Use with caution on large datasets.

      Returns Promise<AbstractSequence[]>

      Promise resolving to array of sequences

      const sequences = await seqops(input)
        .seq({ minLength: 100 })
        .collect();
      console.log(`Collected ${sequences.length} sequences`);
    • Count sequences

      Terminal operation that counts sequences as they stream through the pipeline, without materializing them in memory.

      Returns Promise<number>

      Promise resolving to sequence count

      const count = await seqops(sequences)
        .filter(seq => seq.length > 100)
        .count();
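
      The memory difference between counting and collecting can be seen in a minimal sketch over a generic async stream. `countStream`, `collectStream`, and `demoLengths` are hypothetical helpers written for illustration; they are not the library's implementation.

```typescript
// Hypothetical helper: counts items, keeping only a running total in memory.
async function countStream<T>(source: AsyncIterable<T>): Promise<number> {
  let n = 0;
  for await (const _item of source) n++;
  return n;
}

// Hypothetical helper: keeps every item, so memory grows with the input.
async function collectStream<T>(source: AsyncIterable<T>): Promise<T[]> {
  const items: T[] = [];
  for await (const item of source) items.push(item);
  return items;
}

// Small async source used only for demonstration.
async function* demoLengths(): AsyncGenerator<number> {
  yield 120;
  yield 80;
  yield 300;
}
```

      Both helpers consume the stream exactly once, so each call needs a fresh source.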
    • Process each sequence with a callback

      Terminal operation that applies a function to each sequence.

      Parameters

      Returns Promise<void>

      Promise resolving when processing is complete

      await seqops(sequences)
        .forEach(seq => console.log(seq.id, seq.length));
    • Find pattern locations in sequences

      Terminal operation that finds all occurrences of patterns within sequences with support for fuzzy matching, strand searching, and various output formats. Mirrors seqkit locate functionality.

      Parameters

      • pattern: string

      Returns AsyncIterable<MotifLocation>

      // Simple cases (most common); each call returns its own AsyncIterable
      const exact = seqops(sequences).locate('ATCG'); // Exact string match
      const byRegex = seqops(sequences).locate(/ATG...TAA/); // Regex pattern
      const fuzzy = seqops(sequences).locate('ATCG', 2); // Allow 2 mismatches

      // Advanced options for complex scenarios
      const locations = seqops(sequences).locate({
        pattern: 'ATCG',
        allowMismatches: 1,
        searchBothStrands: true,
        outputFormat: 'bed'
      });

      for await (const location of locations) {
        console.log(`Found at ${location.start}-${location.end} on ${location.strand}`);
      }
    • Find pattern locations in sequences

      Terminal operation that finds all occurrences of patterns within sequences with support for fuzzy matching, strand searching, and various output formats. Mirrors seqkit locate functionality.

      Parameters

      • pattern: RegExp

      Returns AsyncIterable<MotifLocation>

      // Regex pattern
      const locations = seqops(sequences).locate(/ATG...TAA/);

      for await (const location of locations) {
        console.log(`Found at ${location.start}-${location.end} on ${location.strand}`);
      }
    • Find pattern locations in sequences

      Terminal operation that finds all occurrences of patterns within sequences with support for fuzzy matching, strand searching, and various output formats. Mirrors seqkit locate functionality.

      Parameters

      • pattern: string
      • mismatches: number

      Returns AsyncIterable<MotifLocation>

      // String pattern, allowing up to 2 mismatches
      const locations = seqops(sequences).locate('ATCG', 2);

      for await (const location of locations) {
        console.log(`Found at ${location.start}-${location.end} on ${location.strand}`);
      }
    • Find pattern locations in sequences

      Terminal operation that finds all occurrences of patterns within sequences with support for fuzzy matching, strand searching, and various output formats. Mirrors seqkit locate functionality.

      Parameters

      • pattern: RegExp
      • mismatches: number

      Returns AsyncIterable<MotifLocation>

      // Regex pattern with a mismatch allowance
      const locations = seqops(sequences).locate(/ATG...TAA/, 1);

      for await (const location of locations) {
        console.log(`Found at ${location.start}-${location.end} on ${location.strand}`);
      }
    • Find pattern locations in sequences

      Terminal operation that finds all occurrences of patterns within sequences with support for fuzzy matching, strand searching, and various output formats. Mirrors seqkit locate functionality.

      Parameters

      • options: LocateOptions

      Returns AsyncIterable<MotifLocation>

      // Advanced options for complex scenarios
      const locations = seqops(sequences).locate({
        pattern: 'ATCG',
        allowMismatches: 1,
        searchBothStrands: true,
        outputFormat: 'bed'
      });

      for await (const location of locations) {
        console.log(`Found at ${location.start}-${location.end} on ${location.strand}`);
      }
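
      Fuzzy matching with a mismatch allowance can be understood as a sliding-window comparison. The sketch below is a naive illustration that counts substitutions only (no insertions or deletions) and searches a single strand. `Hit` and `findWithMismatches` are hypothetical names, the 0-based half-open coordinates are an assumption, and the library's algorithm is likely more sophisticated.

```typescript
// Hypothetical result shape; assumes 0-based start and exclusive end.
interface Hit {
  start: number;
  end: number;
  mismatches: number;
}

// Hypothetical helper: reports every window that differs from `pattern`
// in at most `maxMismatches` positions (substitutions only).
function findWithMismatches(
  sequence: string,
  pattern: string,
  maxMismatches = 0
): Hit[] {
  const hits: Hit[] = [];
  if (pattern.length === 0) return hits;
  for (let start = 0; start + pattern.length <= sequence.length; start++) {
    let mismatches = 0;
    // Stop comparing this window as soon as the allowance is exceeded.
    for (let j = 0; j < pattern.length && mismatches <= maxMismatches; j++) {
      if (sequence[start + j] !== pattern[j]) mismatches++;
    }
    if (mismatches <= maxMismatches) {
      hits.push({ start, end: start + pattern.length, mismatches });
    }
  }
  return hits;
}
```

      This runs in time proportional to sequence length times pattern length, which is fine for short motifs but would be slow for long patterns on large genomes.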
    • Enable direct iteration over the pipeline

      Returns AsyncIterator<AbstractSequence>

      Async iterator for sequences

      for await (const seq of seqops(sequences).seq({ minLength: 100 })) {
        console.log(seq.id);
      }
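
      Direct iteration works because an object that implements `Symbol.asyncIterator` can be used in a `for await` loop. The sketch below shows the mechanism with a toy class; `Pipeline` is hypothetical and much simpler than SeqOps, but it illustrates how a lazy, chainable wrapper can expose its underlying stream.

```typescript
// Toy stand-in for a chainable pipeline (hypothetical, not the SeqOps class).
class Pipeline<T> implements AsyncIterable<T> {
  private readonly source: AsyncIterable<T>;

  constructor(source: AsyncIterable<T>) {
    this.source = source;
  }

  // Lazy: nothing is read from the source until iteration starts.
  filter(predicate: (item: T) => boolean): Pipeline<T> {
    const source = this.source;
    async function* filtered(): AsyncGenerator<T> {
      for await (const item of source) {
        if (predicate(item)) yield item;
      }
    }
    return new Pipeline(filtered());
  }

  // Makes instances usable directly in `for await` loops.
  [Symbol.asyncIterator](): AsyncIterator<T> {
    return this.source[Symbol.asyncIterator]();
  }
}
```

      Each `filter` call wraps the previous stream in a new generator, so items flow through one at a time and no intermediate arrays are built.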