Genotype API Documentation - v0.1.0

    Class SeqOps<T>

    Main SeqOps class providing a fluent interface for sequence operations

    Enables Unix pipeline-style method chaining for processing genomic sequences. All operations are lazy-evaluated and maintain streaming behavior for memory efficiency with large datasets.

    // Basic pipeline
    await seqops(sequences)
      .filter({ minLength: 100 })
      .transform({ reverseComplement: true })
      .subseq({ region: "100:500" })
      .writeFasta('output.fasta');

    // Complex filtering and analysis
    const stats = await seqops(sequences)
      .quality({ minScore: 20, trim: true })
      .filter({ minLength: 50 })
      .stats({ detailed: true });

    Methods

    • Create SeqOps pipeline from delimiter-separated file

      Supports auto-detection of delimiter and format. Files can be compressed (.gz, .zst) and will be automatically decompressed during streaming.

      Parameters

      • path: string

        Path to DSV file (TSV, CSV, or custom delimiter)

      • Optional options: {
            delimiter?: string;
            hasHeader?: boolean;
            columns?: string[];
            format?: "fasta" | "fastq";
            qualityEncoding?: "phred33" | "phred64" | "solexa";
        }

        Parsing options (delimiter auto-detected if not specified)

      Returns SeqOps<AbstractSequence>

      New SeqOps pipeline for sequence processing

      // Auto-detect delimiter
      const sequences = await SeqOps.fromDSV('data.txt').collect();

      // Explicit delimiter with FASTQ format
      const genes = await SeqOps.fromDSV('genes.psv', {
        delimiter: '|',
        format: 'fastq'
      }).filter({ minLength: 100 });

      v0.1.0
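      Delimiter auto-detection can be illustrated with a small plain-TypeScript sketch — a hypothetical heuristic (count candidate delimiters in the first line), not the library's actual implementation:

```typescript
// Hypothetical heuristic: pick the candidate delimiter that occurs most
// often in the first line of the file. Illustrative only.
function detectDelimiter(
  firstLine: string,
  candidates: string[] = ["\t", ",", "|", ";"],
): string {
  let best = candidates[0];
  let bestCount = -1;
  for (const d of candidates) {
    const count = firstLine.split(d).length - 1; // occurrences of d
    if (count > bestCount) {
      best = d;
      bestCount = count;
    }
  }
  return best;
}
```

      A real detector would typically also sample several lines and check that the column count stays consistent.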

    • Create SeqOps pipeline from TSV (tab-separated) file

      Convenience method for TSV files with tab delimiter pre-configured.

      Parameters

      • path: string

        Path to TSV file

      • Optional options: Omit<
            {
                delimiter?: string;
                hasHeader?: boolean;
                columns?: string[];
                format?: "fasta" | "fastq";
                qualityEncoding?: "phred33" | "phred64" | "solexa";
            },
            "delimiter"
        >

        Parsing options (delimiter forced to tab)

      Returns SeqOps<AbstractSequence>

      New SeqOps pipeline

      await SeqOps.fromTSV('sequences.tsv')
        .filter({ minLength: 50 })
        .writeFasta('filtered.fa');

      v0.1.0

    • Create SeqOps pipeline from CSV (comma-separated) file

      Convenience method for CSV files with comma delimiter pre-configured. Handles Excel-exported CSV files with proper quote escaping.

      Parameters

      • path: string

        Path to CSV file

      • Optional options: Omit<
            {
                delimiter?: string;
                hasHeader?: boolean;
                columns?: string[];
                format?: "fasta" | "fastq";
                qualityEncoding?: "phred33" | "phred64" | "solexa";
            },
            "delimiter"
        >

        Parsing options (delimiter forced to comma)

      Returns SeqOps<AbstractSequence>

      New SeqOps pipeline

      await SeqOps.fromCSV('excel_export.csv')
        .clean()
        .writeFastq('processed.fq');

      v0.1.0

    • Create SeqOps pipeline from JSON file

      Parses JSON files containing sequence arrays. Supports both simple array format and wrapped format with metadata. Suitable for datasets under 100K sequences (loads entire file into memory).

      Parameters

      • path: string

        Path to JSON file

      • Optional options: JSONParseOptions

        Parsing options (format, quality encoding)

      Returns SeqOps<AbstractSequence>

      New SeqOps pipeline

      // Parse JSON array of sequences
      await SeqOps.fromJSON('sequences.json')
        .filter({ minLength: 100 })
        .writeFasta('filtered.fa');

      // Parse FASTQ sequences with quality encoding
      await SeqOps.fromJSON('reads.json', {
        format: 'fastq',
        qualityEncoding: 'phred33'
      }).quality({ minScore: 20 });

      O(n) memory - loads entire file. Use fromJSONL() for large datasets.

      v0.1.0

    • Create SeqOps pipeline from JSONL (JSON Lines) file

      Parses JSONL files where each line is a separate JSON object. Provides streaming with O(1) memory usage, suitable for datasets with millions of sequences.

      Parameters

      • path: string

        Path to JSONL file

      • Optional options: JSONParseOptions

        Parsing options (format, quality encoding)

      Returns SeqOps<AbstractSequence>

      New SeqOps pipeline

      // Stream large JSONL dataset
      await SeqOps.fromJSONL('huge-dataset.jsonl')
        .filter({ minLength: 100 })
        .sample(1000)
        .writeFasta('sampled.fa');

      // Process FASTQ from JSONL
      await SeqOps.fromJSONL('reads.jsonl', { format: 'fastq' })
        .quality({ minScore: 30 })
        .clean()
        .writeFastq('clean.fq');

      O(1) memory - streams line-by-line. Ideal for large files.

      v0.1.0
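      The streaming idea behind JSONL parsing can be sketched independently of SeqOps: each non-empty line is a standalone JSON record, so memory stays bounded by one record at a time. The `id`/`sequence` field names below are assumptions for illustration:

```typescript
// Assumed record shape for illustration.
interface SeqRecord {
  id: string;
  sequence: string;
}

// Yield one parsed record per non-empty line; never holds more than
// one record in memory at a time.
function* parseJSONL(lines: Iterable<string>): Generator<SeqRecord> {
  for (const line of lines) {
    const trimmed = line.trim();
    if (trimmed.length === 0) continue; // skip blank lines
    yield JSON.parse(trimmed) as SeqRecord;
  }
}
```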

    • Create SeqOps pipeline from array of sequences

      Convenient method to convert arrays to SeqOps pipelines. Most common use case for examples and small datasets.

      Type Parameters

      Parameters

      • sequences: T[]

        Array of sequences

      Returns SeqOps<T>

      New SeqOps instance

      const sequences = [
        { id: 'seq1', sequence: 'ATCG', length: 4 },
        { id: 'seq2', sequence: 'GCTA', length: 4 }
      ];

      const result = await SeqOps.from(sequences)
        .translate()
        .writeFasta('proteins.fasta');

      v0.1.0

    • Filter sequences based on criteria

      Remove sequences that don't meet specified criteria. All criteria within a single filter call are combined with AND logic.

      After calling .enumerate(), the index parameter becomes available in predicate functions, enabling position-based filtering.

      Parameters

      • this: SeqOps<T & { index: number }>
      • options: FilterOptions | ((seq: T, index: number) => boolean | Promise<boolean>)

        Filter criteria or custom predicate (with index after enumerate)

      Returns SeqOps<T>

      New SeqOps instance for chaining

      // Filter by length and GC content
      seqops(sequences)
        .filter({ minLength: 100, maxGC: 60 })
        .filter({ hasAmbiguous: false });

      // Custom filter function
      seqops(sequences)
        .filter((seq) => seq.id.startsWith('chr'));

      // With index (after enumerate) - keep even positions
      seqops(sequences)
        .enumerate()
        .filter((seq, idx) => idx % 2 === 0);

      // Async predicate with index
      seqops(sequences)
        .enumerate()
        .filter(async (seq, idx) => {
          const valid = await validateSequence(seq);
          return valid && idx < 1000;
        });

      // Type preservation with FastqSequence
      seqops<FastqSequence>(reads)
        .filter((seq) => seq.quality !== undefined);
      // Type: SeqOps<FastqSequence> ✅
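      The `maxGC` criterion relies on GC content; a minimal sketch of that computation, assuming GC is expressed as a percentage of G/C bases over sequence length (not the library's code):

```typescript
// Percentage of G and C bases in a sequence; empty sequences score 0.
function gcContent(seq: string): number {
  let gc = 0;
  for (const base of seq.toUpperCase()) {
    if (base === "G" || base === "C") gc++;
  }
  return seq.length === 0 ? 0 : (gc / seq.length) * 100;
}
```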
    • Filter sequences based on criteria

      Parameters

      • options: FilterOptions | ((seq: T) => boolean | Promise<boolean>)

        Filter criteria or custom predicate

      Returns SeqOps<T>

      New SeqOps instance for chaining
    • Filter sequences based on membership in a SequenceSet

      Efficiently filters the stream based on whether sequences are present in the provided set. Useful for contamination removal, whitelist/blacklist filtering, and set-based operations.

      Type Parameters

      Parameters

      • set: SequenceSet<U>

        SequenceSet to filter against

      • Optional options: { exclude?: boolean; by?: "sequence" | "id" }

        Filtering options

      Returns SeqOps<T>

      Filtered SeqOps instance

      // Remove contamination sequences
      const contaminants = await seqops("contaminants.fasta").collectSet();
      await seqops("reads.fastq")
        .filterBySet(contaminants, { exclude: true })
        .writeFastq("clean_reads.fastq");

      // Keep only sequences in whitelist
      const whitelist = await seqops("approved.fasta").collectSet();
      await seqops("candidates.fasta")
        .filterBySet(whitelist, { exclude: false })
        .writeFasta("approved_candidates.fasta");

      // Filter by sequence ID instead of sequence content
      const idSet = await seqops("targets.fasta").collectSet();
      seqops("all_sequences.fasta")
        .filterBySet(idSet, { by: "id" });
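      The membership logic can be sketched in plain TypeScript with an assumed record shape: `by` selects which key to compare and `exclude` flips membership, mirroring the options described above (illustrative only, not the library's code):

```typescript
interface Rec {
  id: string;
  sequence: string;
}

// Keep records whose selected key is (or, with exclude, is not) in the set.
function filterBySet(
  records: Rec[],
  set: Set<string>,
  opts: { exclude?: boolean; by?: "sequence" | "id" } = {},
): Rec[] {
  const by = opts.by ?? "sequence";
  const exclude = opts.exclude ?? false;
  return records.filter((r) => set.has(r[by]) !== exclude);
}
```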
    • Transform sequence content

      Apply transformations that modify the sequence string itself.

      Parameters

      Returns SeqOps<T>

      New SeqOps instance for chaining

      seqops(sequences)
        .transform({ reverseComplement: true })
        .transform({ upperCase: true })
        .transform({ toRNA: true });
    • Extract amplicons via primer sequences

      Finds primer pairs within sequences and extracts the amplified regions. Supports mismatch tolerance, degenerate bases (IUPAC codes), windowed search for long-read performance, canonical matching for BED-extracted primers, and flexible region extraction. Provides complete seqkit amplicon parity with enhanced biological validation and type safety.

      Parameters

      • forwardPrimer: string

      Returns SeqOps<T>

      // Simple amplicon extraction (90% use case)
      seqops(sequences)
        .amplicon('ATCGATCG', 'CGATCGAT')
        .writeFasta('amplicons.fasta');

      // With mismatch tolerance (common case)
      seqops(sequences)
        .amplicon('ATCGATCG', 'CGATCGAT', 2)
        .filter({ minLength: 50 });

      // Single primer (auto-canonical matching)
      seqops(sequences)
        .amplicon('UNIVERSAL_PRIMER')
        .stats();

      // Real-world COVID-19 diagnostics
      seqops(samples)
        .quality({ minScore: 20 })
        .amplicon(
          primer`ACCAGGAACTAATCAGACAAG`,  // N gene forward
          primer`CAAAGACCAATCCTACCATGAG`, // N gene reverse
          2                               // Allow sequencing errors
        )
        .validate({ mode: 'strict' });

      // Long reads with windowed search (massive performance boost)
      seqops(nanoporeReads)
        .amplicon('FORWARD', 'REVERSE', {
          searchWindow: { forward: 200, reverse: 200 } // 100x+ speedup
        });

      // Advanced features (10% use case)
      seqops(sequences)
        .amplicon({
          forwardPrimer: primer`ACCAGGAACTAATCAGACAAG`,
          reversePrimer: primer`CAAAGACCAATCCTACCATGAG`,
          maxMismatches: 3,      // Long-read tolerance
          canonical: true,       // BED-extracted primers
          flanking: true,        // Include primer context
          region: '-100:100',    // Biological context
          searchWindow: { forward: 200, reverse: 200 }, // Performance optimization
          outputMismatches: true // Debug information
        })
        .rmdup('sequence')
        .writeFasta('advanced_amplicons.fasta');
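      The core primer-scanning idea can be sketched with literal Hamming-distance matching. This is a simplification: the real implementation also handles IUPAC degenerate bases, windowed search, and matches the reverse primer's reverse complement, which is omitted here for brevity:

```typescript
// True if `pattern` matches `text` at offset `at` with at most
// `maxMismatches` substitutions.
function hammingMatch(
  text: string,
  pattern: string,
  at: number,
  maxMismatches: number,
): boolean {
  let mismatches = 0;
  for (let i = 0; i < pattern.length; i++) {
    if (text[at + i] !== pattern[i] && ++mismatches > maxMismatches) return false;
  }
  return true;
}

// Return the region from the first forward-primer hit through the first
// reverse-primer hit downstream of it (both primers included), or null.
function findAmplicon(
  seq: string,
  fwd: string,
  rev: string,
  maxMismatches = 0,
): string | null {
  for (let f = 0; f + fwd.length <= seq.length; f++) {
    if (!hammingMatch(seq, fwd, f, maxMismatches)) continue;
    for (let r = f + fwd.length; r + rev.length <= seq.length; r++) {
      if (hammingMatch(seq, rev, r, maxMismatches)) {
        return seq.slice(f, r + rev.length);
      }
    }
    // No reverse hit downstream of the first forward hit means none
    // downstream of any later forward hit either.
    return null;
  }
  return null;
}
```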
    • Extract amplicons via primer sequences

      Parameters

      • forwardPrimer: string
      • reversePrimer: string

      Returns SeqOps<T>
    • Extract amplicons via primer sequences

      Parameters

      • forwardPrimer: string
      • reversePrimer: string
      • maxMismatches: number

      Returns SeqOps<T>
    • Extract amplicons via primer sequences

      Parameters

      • forwardPrimer: string
      • reversePrimer: string
      • options: Partial<AmpliconOptions>

      Returns SeqOps<T>
    • Extract amplicons via primer sequences

      Parameters

      • options: AmpliconOptions

      Returns SeqOps<T>
    • Clean and sanitize sequences

      Fix common issues in sequence data such as gaps, ambiguous bases, and whitespace.

      Parameters

      Returns SeqOps<T>

      New SeqOps instance for chaining

      seqops(sequences)
        .clean({ removeGaps: true })
        .clean({ replaceAmbiguous: true, replaceChar: 'N' })
        .clean({ trimWhitespace: true, removeEmpty: true });
    • FASTQ quality operations

      Filter, trim, and bin sequences based on quality scores. Supports filtering, trimming, and binning operations - all operations are optional and can be combined. Only affects FASTQ sequences; FASTA sequences pass through unchanged.

      Type Parameters

      Parameters

      Returns SeqOps<U>

      New SeqOps instance for chaining

      seqops(sequences)
        .quality({ minScore: 20 });

      seqops(sequences)
        .quality({ trim: true, trimThreshold: 20, trimWindow: 4 });

      seqops(sequences)
        .quality({ bins: 3, preset: 'illumina' });

      seqops(sequences)
        .quality({
          minScore: 20,      // 1. Filter low quality
          trim: true,        // 2. Trim ends
          bins: 3,           // 3. Bin for compression
          preset: 'illumina'
        });

      seqops(sequences)
        .quality({ bins: 2, boundaries: [25] });
    • Convert FASTQ quality score encodings

      Convert quality scores between different encoding schemes (Phred+33, Phred+64, Solexa). Essential for legacy data processing and tool compatibility. Only affects FASTQ sequences; FASTA sequences pass through unchanged.

      Type Parameters

      Parameters

      • this: SeqOps<U>
      • options: ConvertOptions

        Conversion options

      Returns SeqOps<U>

      New SeqOps instance for chaining

      // Primary workflow: auto-detect source encoding (matches seqkit)
      seqops(legacyData)
        .convert({ targetEncoding: 'phred33' })
        .writeFastq('modernized.fastq');

      // Legacy Illumina 1.3-1.7 to modern standard
      seqops(illumina15Data)
        .convert({
          sourceEncoding: 'phred64', // Skip detection for known encoding
          targetEncoding: 'phred33'  // Modern standard
        });

      // Real-world pipeline: QC → standardize encoding → analysis
      const results = await seqops(mixedEncodingFiles)
        .quality({ minScore: 20 })              // Filter first
        .convert({ targetEncoding: 'phred33' }) // Standardize
        .stats({ detailed: true });
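      The arithmetic behind these conversions is a fixed character-code offset: Phred+33 stores quality Q as character code Q+33 and Phred+64 as Q+64, so converting Phred+64 → Phred+33 shifts every quality character down by 31. A minimal sketch (not the library's code, which also validates ranges and handles Solexa's different scale):

```typescript
// Shift each Phred+64 quality character down by 31 to get Phred+33.
function phred64ToPhred33(quality: string): string {
  let out = "";
  for (const ch of quality) {
    out += String.fromCharCode(ch.charCodeAt(0) - 31);
  }
  return out;
}
```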
    • Convert FASTA sequences to FASTQ format

      Converts FASTA sequences to FASTQ by adding uniform quality scores. This method is only available when working with FASTA sequences and will cause a compile-time error if called on FASTQ sequences.

      Type Parameters

      Parameters

      • this: SeqOps<U>
      • Optional options: Fa2FqOptions

        Conversion options with compile-time validation for literal values

      Returns SeqOps<FastqSequence>

      New SeqOps instance with FASTQ sequences

      // Convert with default quality (Phred+33 score 40)
      await seqops(fastaSeqs)
        .toFastqSequence()
        .writeFastq('output.fastq');

      // Convert with custom quality character
      await seqops(fastaSeqs)
        .toFastqSequence({ quality: 'I' }) // Valid
        .writeFastq('output.fastq');

      // These will cause compile-time errors:
      // seqops(fastaSeqs).toFastqSequence({ quality: '€' }); // Invalid character
      // seqops(fastqSeqs).toFastqSequence(); // Cannot convert FASTQ to FASTQ
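      The conversion itself amounts to attaching a uniform quality string. A sketch with assumed record shapes — the default 'I' is Phred+33 score 40 (33 + 40 = 73 = 'I'), matching the note above:

```typescript
// Assumed minimal record shapes for illustration.
interface Fasta {
  id: string;
  sequence: string;
}
interface Fastq extends Fasta {
  quality: string;
}

// Attach a uniform quality string matching the sequence length.
function fa2fq(seq: Fasta, qualityChar = "I"): Fastq {
  return { ...seq, quality: qualityChar.repeat(seq.sequence.length) };
}
```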
    • Convert FASTQ sequences to FASTA format

      Converts FASTQ sequences to FASTA by removing quality scores. This method is only available when working with FASTQ sequences and will cause a compile-time error if called on FASTA sequences.

      Type Parameters

      Parameters

      • this: SeqOps<U>
      • Optional options: Record<string, never>

        Conversion options

      Returns SeqOps<FastaSequence>

      New SeqOps instance with FASTA sequences

      // Convert FASTQ to FASTA for BLAST database
      await seqops(fastqSeqs)
        .toFastaSequence()
        .writeFasta('blast_db.fasta');

      // This will cause a compile-time error:
      // seqops(fastaSeqs).toFastaSequence(); // Cannot convert FASTA to FASTA
    • Validate sequences

      Check sequences for validity and optionally fix or reject invalid ones.

      Parameters

      Returns SeqOps<T>

      New SeqOps instance for chaining

      seqops(sequences)
        .validate({ mode: 'strict', action: 'reject' })
        .validate({ allowAmbiguous: true, action: 'fix', fixChar: 'N' });
    • Search sequences by pattern

      Pattern matching and filtering similar to Unix grep. Supports both simple string patterns and complex options for advanced use cases.

      Parameters

      • pattern: string

      Returns SeqOps<T>

      // Simple sequence search (most common case)
      seqops(sequences)
        .grep('ATCG')          // Search sequences for 'ATCG'
        .grep(/^chr\d+/, 'id') // Search IDs with regex

      // Advanced options for complex scenarios
      seqops(sequences)
        .grep({
          pattern: 'ATCGATCG',
          target: 'sequence',
          allowMismatches: 2,
          searchBothStrands: true
        })
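      The `searchBothStrands` option implies matching against the reverse complement as well. A minimal sketch of that idea, exact-match only and ignoring `allowMismatches` (not the library's implementation):

```typescript
// DNA complement table; unknown bases map to N.
const COMPLEMENT: Record<string, string> = { A: "T", T: "A", C: "G", G: "C", N: "N" };

// Reverse the sequence and complement each base.
function reverseComplement(seq: string): string {
  let out = "";
  for (let i = seq.length - 1; i >= 0; i--) out += COMPLEMENT[seq[i]] ?? "N";
  return out;
}

// Match the pattern on the given strand or its reverse complement.
function grepBothStrands(seq: string, pattern: string): boolean {
  return seq.includes(pattern) || reverseComplement(seq).includes(pattern);
}
```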
    • Search sequences by pattern

      Parameters

      • pattern: RegExp

      Returns SeqOps<T>
    • Search sequences by pattern

      Parameters

      • pattern: string
      • target: "sequence" | "description" | "id"

      Returns SeqOps<T>
    • Search sequences by pattern

      Parameters

      • pattern: RegExp
      • target: "sequence" | "description" | "id"

      Returns SeqOps<T>
    • Search sequences by pattern

      Parameters

      • options: GrepOptions

      Returns SeqOps<T>
    • Concatenate multiple sequence files into a single pipeline

      Static factory function that creates a SeqOps pipeline from multiple files. Elegant API for combining sequence sources with simple duplicate handling.

      Parameters

      • filePaths: string[]

        Array of file paths to concatenate

      • handleDuplicateIds: "ignore" | "suffix" = "ignore"

        How to handle duplicate IDs: 'suffix' | 'ignore' (default: 'ignore')

      Returns SeqOps<FastaSequence>

      New SeqOps instance for chaining

      // Simple concatenation
      const combined = SeqOps.concat(['file1.fasta', 'file2.fasta']);

      // With duplicate ID suffixing
      await SeqOps.concat(['db1.fa', 'db2.fa'], 'suffix')
        .filter({ minLength: 100 })
        .writeFasta('combined.fa');

      v0.1.0
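      The 'suffix' strategy can be sketched as follows. The exact suffix format (id_2, id_3, …) is an assumption for illustration, not the library's documented output:

```typescript
// Append _2, _3, ... to the second and later occurrences of each ID.
function suffixDuplicates(ids: string[]): string[] {
  const seen = new Map<string, number>();
  return ids.map((id) => {
    const n = (seen.get(id) ?? 0) + 1;
    seen.set(id, n);
    return n === 1 ? id : `${id}_${n}`;
  });
}
```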

    • Concatenate sequences from multiple sources

      Combines sequences from multiple file paths and/or AsyncIterables with sophisticated ID conflict resolution. Maintains streaming behavior for memory efficiency with large datasets.

      Parameters

      • sources: (string | AsyncIterable<AbstractSequence, any, any>)[]

        Array of file paths and/or AsyncIterables to concatenate

      • Optional options: Omit<ConcatOptions, "sources">

        Concatenation options (optional)

      Returns SeqOps<T>

      New SeqOps instance for chaining

      // Simple concatenation from files
      seqops(sequences)
        .concat(['file1.fasta', 'file2.fasta'])
        .concat([anotherAsyncIterable]);

      // Advanced options for complex scenarios
      seqops(sequences)
        .concat(['file1.fasta', 'file2.fasta'], {
          idConflictResolution: 'suffix',
          validateFormats: true,
          sourceLabels: ['batch1', 'batch2'],
          onProgress: (processed, total, source) =>
            console.log(`Processed ${processed} from ${source}`)
        });
    • Extract subsequences

      Mirrors seqkit subseq functionality for region extraction.

      Parameters

      Returns SeqOps<T>

      New SeqOps instance for chaining

      seqops(sequences)
        .subseq({
          region: "100:500",
          upstream: 50,
          downstream: 50
        });
    • Generate sliding windows (k-mers) from sequences

      Extracts overlapping or non-overlapping windows from sequences with compile-time k-mer size tracking. Essential for k-mer analysis, motif discovery, and sequence decomposition.

      Type Parameters

      • K extends number

      Parameters

      • size: K

        Window size (k-mer size)

      Returns SeqOps<KmerSequence<K>>

      New SeqOps instance with KmerSequence type

      // Simple usage - just specify size
      const kmers = await seqops(sequences).windows(21).toArray();

      // With options - step, circular, greedy modes
      seqops(sequences).windows(21, { step: 3, circular: true });

      // Non-overlapping tiles
      seqops(sequences).windows(100, { step: 100 });

      // Greedy mode - include short final window
      seqops(sequences).windows(50, { greedy: true });
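      The `step` and `greedy` semantics can be illustrated independently of SeqOps with a plain-TypeScript sketch (an assumption about the behavior, not the library's implementation; the `circular` mode is omitted for brevity):

```typescript
// Extract windows of `size` every `step` positions; with `greedy`,
// keep the short final window instead of dropping it.
function slidingWindows(
  seq: string,
  size: number,
  step = 1,
  greedy = false,
): string[] {
  const out: string[] = [];
  let i = 0;
  for (; i + size <= seq.length; i += step) out.push(seq.slice(i, i + size));
  if (greedy && i < seq.length) out.push(seq.slice(i)); // short final window
  return out;
}
```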
    • Generate sliding windows (k-mers) from sequences with options

      Type Parameters

      • K extends number

      Parameters

      • size: K

        Window size (k-mer size)

      • options: Omit<WindowOptions<K>, "size">

        Additional window options (step, circular, greedy, etc.)

      Returns SeqOps<KmerSequence<K>>

      New SeqOps instance with KmerSequence type

    • Generate sliding windows (k-mers) from sequences (legacy object form)

      Type Parameters

      • K extends number

      Parameters

      Returns SeqOps<KmerSequence<K>>

      New SeqOps instance with KmerSequence type

    • Take first n sequences

      Mirrors seqkit head functionality.

      Parameters

      • n: number

        Number of sequences to take

      Returns SeqOps<T>

      New SeqOps instance for chaining

      seqops(sequences).head(1000)
      
    • Take first N sequences (alias for head)

      Returns the first N sequences from the stream. This is an alias for head() provided for developers familiar with this naming convention.

      Mirrors seqkit head functionality.

      Parameters

      • n: number

        Number of sequences to take

      Returns SeqOps<T>

      New SeqOps instance for chaining

      seqops(sequences).take(1000)
      
    • Sample sequences from the stream

      Supports two modes: exact count sampling with strategy selection, or fraction-based streaming sampling for large datasets.

      Parameters

      • count: number

        Number of sequences to sample

      Returns SeqOps<T>

      seqops('input.fastq').sample(1000)  // Exactly 1000 sequences
      
      seqops('huge.fastq').sample({ fraction: 0.1 })  // ~10% of sequences
      
      const seed = 42;
      seqops('R1.fastq').sample({ fraction: 0.05, seed })
      seqops('R2.fastq').sample({ fraction: 0.05, seed })
    • Sample sequences from the stream

      Supports two modes: exact count sampling with strategy selection, or fraction-based streaming sampling for large datasets.

      Parameters

      • count: number

        Number of sequences to sample

      • strategy: "random" | "systematic" | "reservoir"

        Sampling strategy ('reservoir', 'systematic', or 'random')

      Returns SeqOps<T>

      seqops('input.fastq').sample(1000)  // Exactly 1000 sequences
      
      seqops('huge.fastq').sample({ fraction: 0.1 })  // ~10% of sequences
      
      const seed = 42;
      seqops('R1.fastq').sample({ fraction: 0.05, seed })
      seqops('R2.fastq').sample({ fraction: 0.05, seed })
    • Sample sequences from the stream

      Supports two modes: exact count sampling with strategy selection, or fraction-based streaming sampling for large datasets.

      Parameters

      • options: SampleOptions

        Detailed sampling options

      Returns SeqOps<T>

      seqops('input.fastq').sample(1000)  // Exactly 1000 sequences
      
      seqops('huge.fastq').sample({ fraction: 0.1 })  // ~10% of sequences
      
      const seed = 42;
      seqops('R1.fastq').sample({ fraction: 0.05, seed })
      seqops('R2.fastq').sample({ fraction: 0.05, seed })
    • Sort sequences by specified criteria

      High-performance sorting optimized for genomic data compression. Automatically switches between in-memory and external sorting based on dataset size. Proper sequence ordering dramatically improves compression ratios for genomic datasets.

      Parameters

      • options: SortOptions

        Sort criteria and options

      Returns SeqOps<T>

      New SeqOps instance for chaining

      // Sort by length for compression optimization
      seqops(sequences)
      .sort({ by: 'length', order: 'desc' })

      // Sort by GC content for clustering similar sequences
      seqops(sequences)
      .sort({ by: 'gc', order: 'asc' })

      // Custom sorting for specialized genomic criteria
      seqops(sequences)
      .sort({
      custom: (a, b) => a.sequence.localeCompare(b.sequence)
      })
    • Sort sequences by length (convenience method)

      Parameters

      • order: "asc" | "desc" = "asc"

        Sort order: 'asc' or 'desc' (default: 'asc')

      Returns SeqOps<T>

      New SeqOps instance for chaining

      seqops(sequences)
      .sortByLength('desc') // Longest first for compression
      .sortByLength() // Shortest first (default)
    • Sort sequences by ID (convenience method)

      Parameters

      • order: "asc" | "desc" = "asc"

        Sort order: 'asc' or 'desc' (default: 'asc')

      Returns SeqOps<T>

      New SeqOps instance for chaining
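      Usage mirrors sortByLength above, e.g. seqops(sequences).sortById('desc'). The ordering this implies can be sketched with a standalone comparator — lexicographic comparison here is an assumption for illustration; the library may use natural (numeric-aware) ID ordering:

```typescript
// Hypothetical sketch of the ID ordering behind sortById. Lexicographic
// compare via localeCompare is an assumption, not the library's code.
interface HasId {
  id: string;
}

function byId(order: "asc" | "desc" = "asc") {
  return (a: HasId, b: HasId): number => {
    const cmp = a.id.localeCompare(b.id);
    return order === "asc" ? cmp : -cmp;
  };
}

const seqs = [{ id: "chr2" }, { id: "chr1" }, { id: "chr3" }];
seqs.sort(byId()); // chr1, chr2, chr3
seqs.sort(byId("desc")); // chr3, chr2, chr1
```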

    • Sort sequences by GC content (convenience method)

      Parameters

      • order: "asc" | "desc" = "asc"

        Sort order: 'asc' or 'desc' (default: 'asc')

      Returns SeqOps<T>

      New SeqOps instance for chaining
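      Usage mirrors sortByLength above, e.g. seqops(sequences).sortByGC('asc'). The GC criterion being sorted on can be sketched with a standalone helper — gcFraction is written for illustration and is not the library's internal implementation:

```typescript
// Sketch of the GC criterion behind sortByGC (hypothetical helper).
// GC content = fraction of bases that are G or C.
function gcFraction(sequence: string): number {
  if (sequence.length === 0) return 0;
  let gc = 0;
  for (const base of sequence.toUpperCase()) {
    if (base === "G" || base === "C") gc++;
  }
  return gc / sequence.length;
}

const seqs = [
  { id: "a", sequence: "GGCC" }, // GC = 1.0
  { id: "b", sequence: "ATAT" }, // GC = 0.0
  { id: "c", sequence: "ATGC" }, // GC = 0.5
];
// Ascending GC order: b (0.0), c (0.5), a (1.0)
seqs.sort((x, y) => gcFraction(x.sequence) - gcFraction(y.sequence));
```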

    • Remove duplicate sequences

      High-performance deduplication using probabilistic Bloom filters or exact Set-based approaches. Supports both simple deduplication and advanced configuration for large datasets.

      Parameters

      • by: "both" | "sequence" | "id"

      Returns SeqOps<T>

      // Simple deduplication (most common cases)
      seqops(sequences)
      .rmdup('sequence') // Remove sequence duplicates
      .rmdup('id', true) // Remove ID duplicates (exact)

      // Advanced options for large datasets
      seqops(sequences)
      .rmdup({
      by: 'both',
      expectedUnique: 5_000_000,
      falsePositiveRate: 0.0001
      })
    • Remove duplicate sequences

      High-performance deduplication using probabilistic Bloom filters or exact Set-based approaches. Supports both simple deduplication and advanced configuration for large datasets.

      Parameters

      • by: "both" | "sequence" | "id"
      • exact: boolean

      Returns SeqOps<T>

      // Simple deduplication (most common cases)
      seqops(sequences)
      .rmdup('sequence') // Remove sequence duplicates
      .rmdup('id', true) // Remove ID duplicates (exact)

      // Advanced options for large datasets
      seqops(sequences)
      .rmdup({
      by: 'both',
      expectedUnique: 5_000_000,
      falsePositiveRate: 0.0001
      })
    • Remove duplicate sequences

      High-performance deduplication using probabilistic Bloom filters or exact Set-based approaches. Supports both simple deduplication and advanced configuration for large datasets.

      Parameters

      • options: RmdupOptions

      Returns SeqOps<T>

      // Simple deduplication (most common cases)
      seqops(sequences)
      .rmdup('sequence') // Remove sequence duplicates
      .rmdup('id', true) // Remove ID duplicates (exact)

      // Advanced options for large datasets
      seqops(sequences)
      .rmdup({
      by: 'both',
      expectedUnique: 5_000_000,
      falsePositiveRate: 0.0001
      })
    • Rename duplicated sequence IDs

      Appends numeric suffixes to duplicate IDs to ensure uniqueness. Useful after merging datasets or processing PCR replicates.

      Parameters

      • Optionaloptions: RenameOptions

        Rename options

      Returns SeqOps<T>

      New SeqOps with unique IDs

      // Basic usage - duplicates get "_2", "_3" suffixes
      seqops(sequences).rename();

      // Rename all occurrences including first
      seqops(sequences).rename({ renameFirst: true, startNum: 1 });
      // Result: "id_1", "id_2", "id_3"
    • Remove duplicate sequences with configurable deduplication strategies

      Streaming deduplication with multiple key extraction methods and conflict resolution strategies. Memory-efficient for large datasets when using the default "first" strategy.

      Parameters

      • Optionaloptions: UniqueOptions

        Deduplication options

      Returns SeqOps<T>

      New SeqOps with deduplicated sequences

      // Remove duplicate sequences (most common)
      seqops(sequences).unique();

      // Remove sequences with duplicate IDs
      seqops(sequences).unique({ by: "id" });

      // Case-insensitive sequence deduplication
      seqops(sequences).unique({ by: "sequence", caseSensitive: false });

      // Keep longest when duplicates found
      seqops(sequences).unique({
      by: "sequence",
      conflictResolution: "longest"
      });

      // Keep highest quality reads (FASTQ only)
      seqops(reads).unique({
      by: "sequence",
      conflictResolution: "highest-quality"
      });

      // Custom deduplication key
      seqops(sequences).unique({
      by: (seq) => seq.id.split("_")[0] // Group by ID prefix
      });
    • Replace sequence names/content by regular expression

      Performs pattern-based substitution on sequence IDs (default) or sequence content (FASTA only). Supports capture variables, special placeholders ({nr}, {kv}, {fn}), and grep-style filtering.

      Parameters

      • options: ReplaceOptions

        Replace options with pattern and replacement string

      Returns SeqOps<T>

      New SeqOps instance for chaining

      // Remove descriptions from sequence IDs
      seqops(sequences).replace({ pattern: '\\s.+', replacement: '' })

      // Add prefix to all sequence IDs
      seqops(sequences).replace({ pattern: '^', replacement: 'PREFIX_' })

      // Use capture variables to restructure IDs
      seqops(sequences).replace({
      pattern: '^(\\w+)_(\\w+)',
      replacement: '$2_$1'
      })

      // Key-value lookup from file
      seqops(sequences).replace({
      pattern: '^(\\w+)',
      replacement: '$1_{kv}',
      kvFile: 'aliases.txt'
      })
    • Translate DNA/RNA sequences to proteins

      High-performance protein translation supporting all 31 NCBI genetic codes with progressive disclosure for optimal developer experience.

      Parameters

      • OptionalgeneticCode: number | TranslateOptions

        Genetic code number (1-33) or full options object

      Returns SeqOps<T>

      New SeqOps instance for chaining

      // Simple cases (90% of usage)
      seqops(sequences)
      .translate() // Standard genetic code, frame +1
      .translate(2) // Vertebrate mitochondrial code

      // Advanced options (10% of usage)
      seqops(sequences)
      .translate({
      geneticCode: 1,
      orfsOnly: true,
      minOrfLength: 30
      })
    • Translate using mitochondrial genetic code (convenience method)

      Returns SeqOps<T>

      New SeqOps instance for chaining

      seqops(sequences)
      .translateMito() // Genetic code 2 - Vertebrate Mitochondrial
    • Translate all 6 reading frames (convenience method)

      Parameters

      • geneticCode: number = 1

        Genetic code to use (default: 1 = Standard)

      Returns SeqOps<T>

      New SeqOps instance for chaining

      seqops(sequences)
      .translateAllFrames() // All frames with standard code
      .translateAllFrames(2) // All frames with mito code
    • Find and translate open reading frames (convenience method)

      Parameters

      • minLength: number = 30

        Minimum ORF length in amino acids (default: 30)

      • geneticCode: number = 1

        Genetic code to use (default: 1 = Standard)

      Returns SeqOps<T>

      New SeqOps instance for chaining

      seqops(sequences)
      .translateOrf() // Default: 30 aa minimum
      .translateOrf(100) // 100 aa minimum
      .translateOrf(50, 2) // 50 aa minimum, mito code
    • Split sequences into multiple files

      Terminal operation that writes pipeline sequences to separate files with comprehensive seqkit split/split2 compatibility. Integrates seamlessly with all SeqOps pipeline operations for sophisticated genomic workflows.

      Parameters

      • options: SplitOptions

        Split configuration options

      Returns Promise<SplitSummary>

      Promise resolving to split results summary

      // Basic usage - split after processing
      const result = await seqops(sequences)
      .filter({ minLength: 100 })
      .clean({ removeGaps: true })
      .split({ mode: 'by-size', sequencesPerFile: 1000 });

      // Real-world genomics: Quality control → split for parallel processing
      const qcResults = await seqops(rawReads)
      .quality({ minScore: 20, trim: true }) // Quality filter
      .filter({ minLength: 50, maxLength: 150 }) // Length filter
      .clean({ removeAmbiguous: true }) // Clean sequences
      .split({ mode: 'by-length', basesPerFile: 1000000 }); // 1MB chunks

      // Genome assembly: Split chromosomes for parallel analysis
      const chrResults = await seqops(genome)
      .grep({ pattern: /^chr[1-9]/, target: 'id' }) // Autosomal only
      .transform({ upperCase: true }) // Normalize case
      .split({ mode: 'by-id', idRegex: 'chr(\\d+)' }); // Group by chromosome

      // Amplicon sequencing: Process primers → split by target
      const amplicons = await seqops(sequences)
      .grep({ pattern: forwardPrimer, target: 'sequence' }) // Has forward primer
      .grep({ pattern: reversePrimer, target: 'sequence' }) // Has reverse primer
      .subseq({ region: '20:-20' }) // Trim primers
      .split({ mode: 'by-parts', numParts: 8 }); // Parallel processing

      console.log(`Created ${amplicons.filesCreated.length} files`);
    • Split sequences with streaming results for advanced processing

      Returns AsyncIterable of split results following the locate() pattern. Enables sophisticated post-processing workflows where each split result needs individual handling during the splitting process.

      Parameters

      • options: SplitOptions

        Split configuration options

      Returns AsyncIterable<SplitResult>

      AsyncIterable of split results for processing

      // Basic streaming - process each split file as it's created
      for await (const result of seqops(sequences).splitToStream(options)) {
      await compressFile(result.outputFile);
      console.log(`Split ${result.sequenceCount} sequences to ${result.outputFile}`);
      }

      // Large genome processing: Split → compress → upload pipeline
      for await (const chunk of seqops(largeGenome).splitToStream({
      mode: 'by-length',
      basesPerFile: 50_000_000 // 50MB chunks
      })) {
      // Process each chunk immediately to manage memory
      await compressWithBgzip(chunk.outputFile);
      await uploadToCloud(chunk.outputFile + '.gz');
      await deleteLocalFile(chunk.outputFile); // Clean up
      console.log(`Processed chunk ${chunk.partId}: ${chunk.sequenceCount} sequences`);
      }

      // Quality control: Split → validate → report pipeline
      const qualityReports = [];
      for await (const batch of seqops(sequencingRun).splitToStream({
      mode: 'by-size',
      sequencesPerFile: 10000
      })) {
      const qc = await runQualityControl(batch.outputFile);
      qualityReports.push({
      file: batch.outputFile,
      sequences: batch.sequenceCount,
      qcScore: qc.overallScore
      });
      }
    • Split by sequence count (convenience method)

      Most common splitting mode - divide sequences into files with N sequences each. Ideal for creating manageable chunks for parallel processing.

      Parameters

      • sequencesPerFile: number

        Number of sequences per output file

      • outputDir: string = "./split"

        Output directory (default: './split')

      Returns Promise<SplitSummary>

      Promise resolving to split results

      // Simple case - just split
      await seqops(sequences).splitBySize(1000);

      // Common workflow: Filter → process → split for downstream analysis
      await seqops(rawSequences)
      .filter({ minLength: 100 })
      .clean({ removeGaps: true })
      .splitBySize(5000, './chunks');

      // RNA-seq: Quality filter → deduplicate → split for differential expression
      await seqops(rnaseqReads)
      .quality({ minScore: 20 })
      .rmdup({ by: 'sequence' })
      .splitBySize(100000, './de-analysis');
    • Split into equal parts (convenience method)

      Parameters

      • numParts: number

        Number of output files to create

      • outputDir: string = "./split"

        Output directory (default: './split')

      Returns Promise<SplitSummary>

      Promise resolving to split results
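      No example is shown for this method; the call shape mirrors splitBySize. A plausible distribution of sequences across parts (as even as possible, with the remainder spread over the earliest parts) can be sketched as follows — partSizes is a hypothetical helper, and the library's exact distribution may differ:

```typescript
// Hypothetical sketch of how totalSequences might distribute across
// numParts output files: as even as possible, remainder going to the
// earliest parts. Not the library's actual implementation.
function partSizes(totalSequences: number, numParts: number): number[] {
  const base = Math.floor(totalSequences / numParts);
  const remainder = totalSequences % numParts;
  return Array.from({ length: numParts }, (_, i) =>
    base + (i < remainder ? 1 : 0)
  );
}

partSizes(10, 3); // [4, 3, 3]
partSizes(8, 4);  // [2, 2, 2, 2]
```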

    • Split by base count (convenience method)

      Implements seqkit split2's key functionality for splitting by total sequence bases rather than sequence count. Essential for genome processing where you need consistent data sizes regardless of sequence count.

      Parameters

      • basesPerFile: number

        Number of bases per output file

      • outputDir: string = "./split"

        Output directory (default: './split')

      Returns Promise<SplitSummary>

      Promise resolving to split results

      // Genome assembly: Split into 10MB chunks for parallel processing
      await seqops(scaffolds).splitByLength(10_000_000);

      // Metagenomics: Process → bin → split by data size
      await seqops(contigs)
      .filter({ minLength: 1000 })
      .sort({ by: 'length', order: 'desc' }) // Longest first
      .splitByLength(5_000_000, './metagenome-bins');

      // Long-read sequencing: Quality control → split for analysis
      await seqops(nanoporeReads)
      .quality({ minScore: 7 }) // Nanopore quality threshold
      .filter({ minLength: 5000, maxLength: 100000 })
      .splitByLength(50_000_000, './nanopore-chunks');
    • Split by sequence ID pattern (convenience method)

      Groups sequences by ID patterns for organized analysis. String patterns are automatically converted to RegExp for better developer experience.

      Parameters

      • pattern: string | RegExp

        String pattern or RegExp to group sequences by ID

      • outputDir: string = "./split"

        Output directory (default: './split')

      Returns Promise<SplitSummary>

      Promise resolving to split results

      // Genome assembly: Split by chromosome
      await seqops(scaffolds).splitById('chr(\\d+)'); // chr1, chr2, chr3...

      // Multi-species analysis: Group by organism
      await seqops(sequences)
      .splitById('(\\w+)_gene'); // Groups: human_gene, mouse_gene, etc.

      // Transcriptome: Split by gene families
      await seqops(transcripts)
      .filter({ minLength: 200 })
      .transform({ upperCase: true })
      .splitById('(HOX\\w+)_transcript', './gene-families');

      // Advanced: Use RegExp for complex patterns
      await seqops(sequences)
      .splitById(/^(chr[XY]|chrM)_/, './sex-chromosomes');
    • Split by genomic region with compile-time validation (convenience method)

      Uses advanced TypeScript template literal types to parse and validate genomic regions at compile time, preventing coordinate errors.

      Type Parameters

      • T extends string

      Parameters

      • region: T extends ValidGenomicRegion<T> ? T : never

        Genomic region string with compile-time validation

      • outputDir: string = "./split"

        Output directory (default: './split')

      Returns Promise<SplitSummary>

      Promise resolving to split results

      // ✅ Type-safe region parsing - validated at compile time
      await seqops(sequences).splitByRegion('chr1:1000-2000');
      await seqops(sequences).splitByRegion('scaffold_1:500-1500');
      await seqops(sequences).splitByRegion('chrX:0-1000'); // 0-based OK

      // ❌ These cause TypeScript compilation errors:
      // await seqops(sequences).splitByRegion('chr1:2000-1000'); // end < start
      // await seqops(sequences).splitByRegion('chr1:1000-1000'); // end = start
      // await seqops(sequences).splitByRegion('invalid-format'); // bad format

      // 🔥 Compile-time coordinate extraction available:
      type Coords = ExtractCoordinates<'chr1:1000-2000'>;
      // → { chr: 'chr1'; start: 1000; end: 2000; length: 1000 }
    • Calculate sequence statistics

      Terminal operation that processes all sequences to compute statistics. Mirrors seqkit stats functionality.

      Parameters

      Returns Promise<SequenceStats>

      Promise resolving to statistics

      const stats = await seqops(sequences)
      .seq({ minLength: 100 })
      .stats({ detailed: true });
      console.log(`N50: ${stats.n50}`);
    • Write sequences to FASTA file

      Terminal operation that writes all sequences in FASTA format.

      Parameters

      • path: string

        Output file path

      • options: { wrapWidth?: number } = {}

        Writer options

      Returns Promise<void>

      Promise resolving when write is complete

      await seqops(sequences)
      .seq({ reverseComplement: true })
      .writeFasta('output.fasta');
    • Write sequences to FASTQ file

      Terminal operation that writes all sequences in FASTQ format. If input sequences don't have quality scores, uses default quality.

      Parameters

      • path: string

        Output file path

      • defaultQuality: string = "I"

        Default quality string for FASTA sequences

      Returns Promise<void>

      Promise resolving when write is complete

      await seqops(sequences)
      .seq({ minQuality: 20 })
      .writeFastq('output.fastq', 'IIIIIIIIII');
    • Write sequences to JSON file

      Convenience method that converts sequences to tabular format and writes as JSON. Supports both simple array format and wrapped format with metadata. Loads entire dataset into memory before writing.

      Parameters

      • path: string

        Output file path

      • Optionaloptions: Fx2TabOptions<readonly string[]> & JSONWriteOptions

        Combined column selection and JSON formatting options

      Returns Promise<void>

      Promise resolving when write is complete

      // Simple JSON array
      await SeqOps.fromFasta('input.fa')
      .writeJSON('output.json');

      // With selected columns
      await SeqOps.fromFasta('input.fa')
      .writeJSON('output.json', {
      columns: ['id', 'sequence', 'length', 'gc']
      });

      // Pretty-printed with metadata
      await SeqOps.fromFasta('input.fa')
      .writeJSON('output.json', {
      columns: ['id', 'sequence', 'length'],
      pretty: true,
      includeMetadata: true
      });

      O(n) memory - loads all sequences. Use writeJSONL() for large datasets.

      v0.1.0

    • Write sequences to JSONL (JSON Lines) file

      Convenience method that converts sequences to tabular format and writes as JSONL (one JSON object per line). Provides streaming with O(1) memory usage, ideal for large datasets.

      Note: JSONL format does not support metadata or pretty-printing. Each line is a separate, compact JSON object.

      Parameters

      • path: string

        Output file path

      • Optionaloptions: Fx2TabOptions<readonly string[]>

        Column selection options (JSON formatting options not applicable)

      Returns Promise<void>

      Promise resolving when write is complete

      // Basic JSONL output
      await SeqOps.fromFasta('input.fa')
      .writeJSONL('output.jsonl');

      // With selected columns
      await SeqOps.fromFasta('input.fa')
      .writeJSONL('output.jsonl', {
      columns: ['id', 'sequence', 'length', 'gc']
      });

      // Large dataset streaming
      await SeqOps.fromFasta('huge-dataset.fa')
      .filter({ minLength: 100 })
      .writeJSONL('filtered.jsonl'); // O(1) memory

      O(1) memory - streams line-by-line. Use for large datasets.

      v0.1.0

    • Convert sequences to tabular format

      Transform sequences into a tabular representation with configurable columns. This is the primary method for tabular conversion, providing a more intuitive name than the seqkit-inspired fx2tab.

      Type Parameters

      • Columns extends readonly string[] = readonly ["id", "seq", "length"]

      Parameters

      • Optionaloptions: Fx2TabOptions<Columns>

        Column selection and formatting options

      Returns TabularOps<Columns>

      TabularOps instance for further processing or writing

      // Basic conversion to tabular format
      await seqops(sequences)
      .toTabular({ columns: ['id', 'seq', 'length', 'gc'] })
      .writeTSV('output.tsv');

      // With custom columns
      await seqops(sequences)
      .toTabular({
      columns: ['id', 'seq', 'gc'],
      customColumns: {
      high_gc: (seq) => seq.gc > 60 ? 'HIGH' : 'NORMAL'
      }
      })
      .writeCSV('analysis.csv');
    • Convert sequences to tabular format (SeqKit compatibility)

      Alias for .toTabular() maintained for SeqKit parity and backward compatibility. New code should prefer .toTabular() for better clarity.

      Type Parameters

      • Columns extends readonly string[] = readonly ["id", "seq", "length"]

      Parameters

      • Optionaloptions: Fx2TabOptions<Columns>

        Column selection and formatting options

      Returns TabularOps<Columns>

      TabularOps instance for further processing or writing

      toTabular - Primary method for tabular conversion

      // Legacy name for SeqKit users
      await seqops(sequences)
      .fx2tab({ columns: ['id', 'seq', 'gc'] })
      .writeTSV('output.tsv');
    • Convert sequences to row-based format

      Clearer alias for .toTabular() that emphasizes the row-based structure used for output to various formats (TSV, CSV, JSON, JSONL).

      This method converts sequences into a structured row format that can be written to tabular formats (TSV/CSV) or object formats (JSON/JSONL). Use this when the term "tabular" feels semantically incorrect for your output format (e.g., JSON).

      Type Parameters

      • Columns extends readonly string[] = readonly ["id", "seq", "length"]

      Parameters

      • Optionaloptions: Fx2TabOptions<Columns>

        Column selection and formatting options

      Returns TabularOps<Columns>

      TabularOps instance for further processing or writing

      toTabular - Original method name

      // Writing to JSON - "rows" is clearer than "tabular"
      await seqops(sequences)
      .asRows({ columns: ['id', 'sequence', 'length'] })
      .writeJSON('output.json');

      // Writing to JSONL
      await seqops(sequences)
      .asRows({ columns: ['id', 'seq', 'gc'] })
      .writeJSONL('output.jsonl');

      // Also works for tabular formats
      await seqops(sequences)
      .asRows({ columns: ['id', 'seq', 'length'] })
      .writeTSV('output.tsv');

      v0.1.0

    • Write sequences as TSV (tab-separated values)

      Terminal operation that writes sequences as tab-separated values.

      Parameters

      • path: string

        Output file path

      • options: Omit<Fx2TabOptions, "delimiter"> = {}

        Conversion options (delimiter will be set to tab)

      Returns Promise<void>

      // Simple TSV output
      await seqops(sequences).writeTSV('output.tsv');

      // With column selection
      await seqops(sequences).writeTSV('output.tsv', {
      columns: ['id', 'seq', 'length', 'gc']
      });
    • Write sequences as CSV (comma-separated values)

      Terminal operation that writes sequences as comma-separated values. Enabling the excelSafe option is recommended for CSV files that may be opened in spreadsheet applications.

      Parameters

      • path: string

        Output file path

      • options: Omit<Fx2TabOptions, "delimiter"> = {}

        Conversion options (delimiter will be set to comma)

      Returns Promise<void>

      // CSV with Excel protection
      await seqops(sequences).writeCSV('output.csv', {
      excelSafe: true
      });
    • Write sequences as DSV with custom delimiter

      Terminal operation for any delimiter-separated format.

      Parameters

      • path: string

        Output file path

      • delimiter: string

        Custom delimiter character(s)

      • options: Omit<Fx2TabOptions, "delimiter"> = {}

        Conversion options

      Returns Promise<void>

      // Pipe-delimited output
      await seqops(sequences).writeDSV('output.psv', '|', {
      columns: ['id', 'seq', 'length']
      });

      // Semicolon for European Excel
      await seqops(sequences).writeDSV('output.csv', ';', {
      excelSafe: true
      });
    • Collect all sequences into an array

      Terminal operation that materializes all sequences in memory. Use with caution on large datasets.

      Returns Promise<T[]>

      Promise resolving to array of sequences

      const sequences = await seqops(input)
      .seq({ minLength: 100 })
      .collect();
      console.log(`Collected ${sequences.length} sequences`);
    • Collect k-mer sequences into KmerSet with K preservation

      When the stream contains KmerSequence objects, returns KmerSet which enforces compile-time k-mer size matching for set operations.

      Type Parameters

      • K extends number

      Parameters

      Returns Promise<KmerSet<K>>

      Promise<KmerSet> for k-mer sequences
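      The compile-time size matching that KmerSet enforces can be illustrated with a phantom type parameter. TaggedKmerSet below is a simplified re-implementation written for illustration, not the library's KmerSet:

```typescript
// Illustrative sketch of K-tagged set operations (not the library's
// KmerSet). The phantom K parameter makes combining sets of different
// k-mer sizes a compile-time error.
class TaggedKmerSet<K extends number> {
  constructor(
    readonly k: K,
    private readonly kmers: Set<string> = new Set()
  ) {}

  add(kmer: string): this {
    this.kmers.add(kmer);
    return this;
  }

  has(kmer: string): boolean {
    return this.kmers.has(kmer);
  }

  // Only accepts another set with the same K
  intersection(other: TaggedKmerSet<K>): TaggedKmerSet<K> {
    const result = new TaggedKmerSet(this.k);
    for (const kmer of this.kmers) {
      if (other.has(kmer)) result.add(kmer);
    }
    return result;
  }

  get size(): number {
    return this.kmers.size;
  }
}

const a = new TaggedKmerSet(3).add("ACG").add("CGT");
const b = new TaggedKmerSet(3).add("CGT").add("GTA");
a.intersection(b); // contains only "CGT"
// const c = new TaggedKmerSet(4);
// a.intersection(c); // ❌ compile error: K mismatch (3 vs 4)
```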

    • Collect generic sequences into SequenceSet

      For non-k-mer sequences, returns generic SequenceSet which allows flexible set operations across sequence types.

      Parameters

      Returns Promise<SequenceSet<T>>

      Promise<SequenceSet> for generic sequences

    • Count sequences

      Terminal operation that counts sequences without loading them in memory.

      Returns Promise<number>

      Promise resolving to sequence count

      const count = await seqops(sequences)
      .filter(seq => seq.length > 100)
      .count();
    • Transform sequences with a mapping function

      Transforms each sequence in the stream using the provided function. Type parameter U is inferred from the return type of the mapping function, allowing type transformations while preserving specific sequence types when the mapping function returns the same type.

      After calling .enumerate(), the index parameter becomes available in the mapping function signature.

      Type Parameters

      • U extends AbstractSequence = T

        Output sequence type (defaults to T for type preservation)

      Parameters

      • this: SeqOps<T & { index: number }>
      • fn: (seq: T, index: number) => U | Promise<U>

        Mapping function (with index after enumerate)

      Returns SeqOps<U>

      New SeqOps with transformed sequences

      // Transform without index
      seqops<FastqSequence>(reads)
      .map((seq) => ({ ...seq, id: `sample1_${seq.id}` }));
      // Type preserved: SeqOps<FastqSequence>

      // Transform with index (after enumerate)
      seqops(sequences)
      .enumerate()
      .map((seq, idx) => ({
      ...seq,
      description: `position=${idx} ${seq.description || ""}`,
      }));

      // Async transformation
      seqops(sequences)
      .map(async (seq) => {
      const annotation = await fetchAnnotation(seq.id);
      return { ...seq, description: annotation };
      });
    • Transform sequences with a mapping function

      Transforms each sequence in the stream using the provided function. Type parameter U is inferred from the return type of the mapping function, allowing type transformations while preserving specific sequence types when the mapping function returns the same type.

      After calling .enumerate(), the index parameter becomes available in the mapping function signature.

      Type Parameters

      • U extends AbstractSequence = T

        Output sequence type (defaults to T for type preservation)

      Parameters

      • fn: (seq: T) => U | Promise<U>

        Mapping function (with index after enumerate)

      Returns SeqOps<U>

      New SeqOps with transformed sequences

      // Transform without index
      seqops<FastqSequence>(reads)
      .map((seq) => ({ ...seq, id: `sample1_${seq.id}` }));
      // Type preserved: SeqOps<FastqSequence>

      // Transform with index (after enumerate)
      seqops(sequences)
      .enumerate()
      .map((seq, idx) => ({
      ...seq,
      description: `position=${idx} ${seq.description || ""}`,
      }));

      // Async transformation
      seqops(sequences)
      .map(async (seq) => {
      const annotation = await fetchAnnotation(seq.id);
      return { ...seq, description: annotation };
      });
    • Attach index to each sequence

      Adds a zero-based index property to each sequence in the stream. After calling this method, downstream operations like .map() and .filter() can access the index parameter in their callback functions.

      The index represents the position of the sequence in the stream (0-based).

      Returns SeqOps<T & { index: number }>

      New SeqOps with sequences that have an index property

      // Enable index parameter in downstream operations
      const results = await seqops<FastqSequence>(reads)
      .enumerate()
      .filter((seq, idx) => idx < 10000) // Index available
      .map((seq, idx) => ({
      ...seq,
      description: `${seq.description} pos=${idx}`,
      }))
      .collect();

      // Type: Array<FastqSequence & { index: number }> ✅
      results[0].quality; // ✅ Exists (FastqSequence preserved)
      results[0].index; // ✅ Exists (from enumerate)

      // Position-based filtering
      seqops(sequences)
      .enumerate()
      .filter((seq, idx) => idx % 2 === 0) // Keep even positions
      .writeFasta('even_positions.fasta');

      // Progress tracking
      seqops(sequences)
      .enumerate()
      .tap((seq, idx) => {
      if (idx % 1000 === 0) console.log(`Processed ${idx}`);
      })
      .filter({ minLength: 100 });
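      Conceptually, enumerate() behaves like the following sketch (an illustration only, not the library's actual implementation); enumerate and toStream here are hypothetical standalone helpers:

      ```typescript
      // Attach a zero-based index to each record as it streams past.
      async function* enumerate<T extends object>(
        source: AsyncIterable<T>
      ): AsyncIterable<T & { index: number }> {
        let index = 0;
        for await (const item of source) {
          yield { ...item, index: index++ };
        }
      }

      // Helper to turn an array into an async stream for the sketch
      async function* toStream<T>(items: T[]): AsyncIterable<T> {
        yield* items;
      }
      ```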
    • Apply side effects without consuming the stream

      Executes a function for each sequence but yields the original sequence unchanged. Useful for logging, progress tracking, or other side effects that shouldn't modify the sequence data.

      After calling .enumerate(), the index parameter becomes available.

      Parameters

      • this: SeqOps<T & { index: number }>
      • fn: (seq: T, index: number) => void | Promise<void>

        Side effect function (with index after enumerate)

      Returns SeqOps<T>

      Same SeqOps for continued chaining

      // Progress logging without index
      let count = 0;
      seqops(sequences)
      .tap((seq) => {
      count++;
      if (count % 1000 === 0) console.log(`Processed ${count}`);
      })
      .filter({ minLength: 100 })
      .writeFasta('output.fasta');

      // Progress tracking with index
      seqops(sequences)
      .enumerate()
      .tap((seq, idx) => {
      if (idx % 1000 === 0) console.log(`Processed ${idx}`);
      })
      .filter({ minLength: 100 });

      // Collect statistics without modifying stream
      const stats = { totalLength: 0, count: 0 };
      seqops(sequences)
      .tap((seq) => {
      stats.totalLength += seq.length;
      stats.count++;
      })
      .filter({ minLength: 100 })
      .writeFasta('filtered.fasta');
      console.log(`Average length: ${stats.totalLength / stats.count}`);

      // Async side effects (e.g., logging to database)
      seqops(sequences)
      .enumerate()
      .tap(async (seq, idx) => {
      await logToDatabase({ id: seq.id, position: idx });
      })
      .filter({ minLength: 100 });
    • Apply side effects without consuming the stream

      Executes a function for each sequence but yields the original sequence unchanged. Useful for logging, progress tracking, or other side effects that shouldn't modify the sequence data.

      After calling .enumerate(), the index parameter becomes available.

      Parameters

      • fn: (seq: T) => void | Promise<void>

        Side effect function

      Returns SeqOps<T>

      Same SeqOps for continued chaining

      // Progress logging without index
      let count = 0;
      seqops(sequences)
      .tap((seq) => {
      count++;
      if (count % 1000 === 0) console.log(`Processed ${count}`);
      })
      .filter({ minLength: 100 })
      .writeFasta('output.fasta');

      // Progress tracking with index
      seqops(sequences)
      .enumerate()
      .tap((seq, idx) => {
      if (idx % 1000 === 0) console.log(`Processed ${idx}`);
      })
      .filter({ minLength: 100 });

      // Collect statistics without modifying stream
      const stats = { totalLength: 0, count: 0 };
      seqops(sequences)
      .tap((seq) => {
      stats.totalLength += seq.length;
      stats.count++;
      })
      .filter({ minLength: 100 })
      .writeFasta('filtered.fasta');
      console.log(`Average length: ${stats.totalLength / stats.count}`);

      // Async side effects (e.g., logging to database)
      seqops(sequences)
      .enumerate()
      .tap(async (seq, idx) => {
      await logToDatabase({ id: seq.id, position: idx });
      })
      .filter({ minLength: 100 });
    • Map each sequence to multiple sequences and flatten the result

      Transforms each sequence into zero or more sequences, then flattens all results into a single stream. The mapping function can return an array or an async iterable.

      After calling .enumerate(), the index parameter becomes available.

      Type Parameters

      • U extends AbstractSequence = T

        Output sequence type (defaults to T for type preservation)

      Parameters

      • this: SeqOps<T & { index: number }>
      • fn: (seq: T, index: number) => U[] | AsyncIterable<U, any, any> | Promise<U[]>

        Mapping function that returns array or async iterable (with index after enumerate)

      Returns SeqOps<U>

      New SeqOps with flattened results

      // Expand each sequence to multiple variants
      seqops(sequences)
      .flatMap((seq) => [
      { ...seq, id: `${seq.id}_variant1`, sequence: variant1(seq.sequence) },
      { ...seq, id: `${seq.id}_variant2`, sequence: variant2(seq.sequence) },
      ])
      .writeFasta('variants.fasta');

      // Generate k-mers from each sequence
      seqops(sequences)
      .flatMap((seq) => generateKmers(seq, 21))
      .unique({ by: 'sequence' })
      .writeFasta('unique_kmers.fasta');

      // With index - expand based on position
      seqops(sequences)
      .enumerate()
      .flatMap((seq, idx) => {
      const count = idx < 10 ? 3 : 1; // More variants for first 10
      return Array.from({ length: count }, (_, i) => ({
      ...seq,
      id: `${seq.id}_copy${i}`,
      }));
      });

      // Async iterable result
      seqops(sequences)
      .flatMap(async function* (seq) {
      for (const frame of [1, 2, 3, -1, -2, -3]) {
      yield translateFrame(seq, frame);
      }
      });
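      The generateKmers helper in the k-mer example above is assumed, not part of the SeqOps API; a minimal sketch that yields overlapping k-mers as new records:

      ```typescript
      interface SimpleSeq {
        id: string;
        sequence: string;
      }

      // Yield every overlapping window of length k as its own record
      function* generateKmers(seq: SimpleSeq, k: number): Generator<SimpleSeq> {
        for (let i = 0; i + k <= seq.sequence.length; i++) {
          yield { id: `${seq.id}_kmer${i}`, sequence: seq.sequence.slice(i, i + k) };
        }
      }
      ```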
    • Map each sequence to multiple sequences and flatten the result

      Transforms each sequence into zero or more sequences, then flattens all results into a single stream. The mapping function can return an array or an async iterable.

      After calling .enumerate(), the index parameter becomes available.

      Type Parameters

      • U extends AbstractSequence = T

        Output sequence type (defaults to T for type preservation)

      Parameters

      • fn: (seq: T) => U[] | AsyncIterable<U, any, any> | Promise<U[]>

        Mapping function that returns an array or async iterable

      Returns SeqOps<U>

      New SeqOps with flattened results

      // Expand each sequence to multiple variants
      seqops(sequences)
      .flatMap((seq) => [
      { ...seq, id: `${seq.id}_variant1`, sequence: variant1(seq.sequence) },
      { ...seq, id: `${seq.id}_variant2`, sequence: variant2(seq.sequence) },
      ])
      .writeFasta('variants.fasta');

      // Generate k-mers from each sequence
      seqops(sequences)
      .flatMap((seq) => generateKmers(seq, 21))
      .unique({ by: 'sequence' })
      .writeFasta('unique_kmers.fasta');

      // With index - expand based on position
      seqops(sequences)
      .enumerate()
      .flatMap((seq, idx) => {
      const count = idx < 10 ? 3 : 1; // More variants for first 10
      return Array.from({ length: count }, (_, i) => ({
      ...seq,
      id: `${seq.id}_copy${i}`,
      }));
      });

      // Async iterable result
      seqops(sequences)
      .flatMap(async function* (seq) {
      for (const frame of [1, 2, 3, -1, -2, -3]) {
      yield translateFrame(seq, frame);
      }
      });
    • Process each sequence with a callback (terminal operation)

      Applies a function to each sequence in the stream. This is a terminal operation that consumes the stream and returns when all sequences have been processed.

      After calling .enumerate(), the index parameter becomes available in the callback.

      Parameters

      • this: SeqOps<T & { index: number }>
      • fn: (seq: T, index: number) => void | Promise<void>

        Callback function to execute for each sequence

      Returns Promise<void>

      Promise that resolves when all sequences have been processed

      // Type-safe with FastqSequence
      await seqops<FastqSequence>(reads)
      .forEach((seq) => {
      console.log(seq.quality); // ✅ TypeScript knows quality exists
      });

      // With progress tracking after enumerate
      await seqops(sequences)
      .enumerate()
      .forEach((seq, idx) => {
      if (idx % 1000 === 0) console.log(`Progress: ${idx}`);
      });

      // Async callback support
      await seqops(sequences)
      .forEach(async (seq) => {
      await writeToDatabase(seq);
      });
    • Process each sequence with a callback (terminal operation)

      Applies a function to each sequence in the stream. This is a terminal operation that consumes the stream and returns when all sequences have been processed.

      After calling .enumerate(), the index parameter becomes available in the callback.

      Parameters

      • fn: (seq: T) => void | Promise<void>

        Callback function to execute for each sequence

      Returns Promise<void>

      Promise that resolves when all sequences have been processed

      // Type-safe with FastqSequence
      await seqops<FastqSequence>(reads)
      .forEach((seq) => {
      console.log(seq.quality); // ✅ TypeScript knows quality exists
      });

      // With progress tracking after enumerate
      await seqops(sequences)
      .enumerate()
      .forEach((seq, idx) => {
      if (idx % 1000 === 0) console.log(`Progress: ${idx}`);
      });

      // Async callback support
      await seqops(sequences)
      .forEach(async (seq) => {
      await writeToDatabase(seq);
      });
    • Reduce sequences to a single value using first element as accumulator

      Terminal operation that reduces the stream to a single value by applying a function that combines the accumulator with each sequence. The first sequence in the stream becomes the initial accumulator value.

      Returns undefined if the stream is empty.

      After calling .enumerate(), the index parameter becomes available.

      Parameters

      • this: SeqOps<T & { index: number }>
      • fn: (accumulator: T, seq: T, index: number) => T | Promise<T>

        Reducer function that combines accumulator with each sequence

      Returns Promise<T | undefined>

      Promise resolving to the final accumulated value, or undefined if empty

      // Find longest sequence
      const longest = await seqops<FastqSequence>(reads)
      .reduce((acc, seq) => seq.length > acc.length ? seq : acc);
      // Type: FastqSequence | undefined ✅

      // With index tracking
      const result = await seqops(sequences)
      .enumerate()
      .reduce((acc, seq, idx) => {
      console.log(`Comparing at index ${idx}`);
      return acc.length > seq.length ? acc : seq;
      });

      // Find sequence with highest GC content
      const highestGC = await seqops(sequences)
      .reduce((acc, seq) => {
      const accGC = calculateGC(acc.sequence);
      const seqGC = calculateGC(seq.sequence);
      return seqGC > accGC ? seq : acc;
      });
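      The calculateGC helper used in the GC-content example above is assumed, not provided by SeqOps; a minimal sketch returning the GC fraction of a sequence string:

      ```typescript
      // Fraction of G/C bases in a sequence (0 for an empty string)
      function calculateGC(sequence: string): number {
        if (sequence.length === 0) return 0;
        let gc = 0;
        for (const base of sequence.toUpperCase()) {
          if (base === "G" || base === "C") gc++;
        }
        return gc / sequence.length;
      }
      ```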
    • Reduce sequences to a single value using first element as accumulator

      Terminal operation that reduces the stream to a single value by applying a function that combines the accumulator with each sequence. The first sequence in the stream becomes the initial accumulator value.

      Returns undefined if the stream is empty.

      After calling .enumerate(), the index parameter becomes available.

      Parameters

      • fn: (accumulator: T, seq: T) => T | Promise<T>

        Reducer function that combines accumulator with each sequence

      Returns Promise<T | undefined>

      Promise resolving to the final accumulated value, or undefined if empty

      // Find longest sequence
      const longest = await seqops<FastqSequence>(reads)
      .reduce((acc, seq) => seq.length > acc.length ? seq : acc);
      // Type: FastqSequence | undefined ✅

      // With index tracking
      const result = await seqops(sequences)
      .enumerate()
      .reduce((acc, seq, idx) => {
      console.log(`Comparing at index ${idx}`);
      return acc.length > seq.length ? acc : seq;
      });

      // Find sequence with highest GC content
      const highestGC = await seqops(sequences)
      .reduce((acc, seq) => {
      const accGC = calculateGC(acc.sequence);
      const seqGC = calculateGC(seq.sequence);
      return seqGC > accGC ? seq : acc;
      });
    • Fold sequences to a single value with explicit initial value

      Terminal operation that reduces the stream to a single value by applying a function that combines the accumulator with each sequence. Unlike reduce(), fold() requires an explicit initial value and can transform to any type.

      Never returns undefined; it always returns at least the initial value.

      After calling .enumerate(), the index parameter becomes available.

      Type Parameters

      • U

      Parameters

      • this: SeqOps<T & { index: number }>
      • fn: (accumulator: U, seq: T, index: number) => U | Promise<U>

        Folder function that combines accumulator with each sequence

      • initialValue: U

        The initial accumulator value

      Returns Promise<U>

      Promise resolving to the final accumulated value

      // Calculate total length
      const totalLength = await seqops(sequences)
      .fold((sum, seq) => sum + seq.length, 0);
      // Type: number ✅

      // Build index mapping
      const index = await seqops<FastqSequence>(reads)
      .fold(
      (map, seq) => map.set(seq.id, seq),
      new Map<string, FastqSequence>(),
      );
      // Type: Map<string, FastqSequence> ✅

      // Collect statistics with position tracking
      const stats = await seqops(sequences)
      .enumerate()
      .fold(
      (acc, seq, idx) => {
      const gc = calculateGC(seq.sequence);
      return {
      min: Math.min(acc.min, gc),
      max: Math.max(acc.max, gc),
      sum: acc.sum + gc,
      count: acc.count + 1,
      positions: [...acc.positions, { idx, gc }],
      };
      },
      { min: Infinity, max: -Infinity, sum: 0, count: 0, positions: [] },
      );
    • Fold sequences to a single value with explicit initial value

      Terminal operation that reduces the stream to a single value by applying a function that combines the accumulator with each sequence. Unlike reduce(), fold() requires an explicit initial value and can transform to any type.

      Never returns undefined; it always returns at least the initial value.

      After calling .enumerate(), the index parameter becomes available.

      Type Parameters

      • U

      Parameters

      • fn: (accumulator: U, seq: T) => U | Promise<U>

        Folder function that combines accumulator with each sequence

      • initialValue: U

        The initial accumulator value

      Returns Promise<U>

      Promise resolving to the final accumulated value

      // Calculate total length
      const totalLength = await seqops(sequences)
      .fold((sum, seq) => sum + seq.length, 0);
      // Type: number ✅

      // Build index mapping
      const index = await seqops<FastqSequence>(reads)
      .fold(
      (map, seq) => map.set(seq.id, seq),
      new Map<string, FastqSequence>(),
      );
      // Type: Map<string, FastqSequence> ✅

      // Collect statistics with position tracking
      const stats = await seqops(sequences)
      .enumerate()
      .fold(
      (acc, seq, idx) => {
      const gc = calculateGC(seq.sequence);
      return {
      min: Math.min(acc.min, gc),
      max: Math.max(acc.max, gc),
      sum: acc.sum + gc,
      count: acc.count + 1,
      positions: [...acc.positions, { idx, gc }],
      };
      },
      { min: Infinity, max: -Infinity, sum: 0, count: 0, positions: [] },
      );
    • Combine two streams element-by-element with a combining function

      Zips two streams together, applying a function to each pair of elements. Index parameters appear in the signature only when the corresponding stream has been enumerated. Stops when either stream ends (shortest-wins behavior).

      Type Parameters

      Parameters

      • this: SeqOps<T & { index: number }>
      • other: SeqOps<U & { index: number }>

        The second stream to zip with (SeqOps or AsyncIterable)

      • fn: (a: T, b: U, indexA: number, indexB: number) => V | Promise<V>

        Combining function that merges elements from both streams

      Returns SeqOps<V>

      New SeqOps with combined elements

      // Neither enumerated
      const forward = seqops<FastqSequence>("reads_R1.fastq");
      const reverse = seqops<FastqSequence>("reads_R2.fastq");
      forward.zipWith(reverse, (fwd, rev) => ({
      id: `${fwd.id}_merged`,
      sequence: fwd.sequence + "NNNN" + reverseComplement(rev.sequence),
      }));

      // Left enumerated only
      forward.enumerate().zipWith(reverse, (fwd, rev, idxFwd) => {
      if (idxFwd % 1000 === 0) console.log(`Processed ${idxFwd} pairs`);
      return mergePair(fwd, rev);
      });

      // Both enumerated - verify alignment
      forward.enumerate().zipWith(reverse.enumerate(), (fwd, rev, idxFwd, idxRev) => {
      if (idxFwd !== idxRev) throw new Error(`Alignment mismatch`);
      return mergePair(fwd, rev);
      });
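      The shortest-wins semantics described above can be sketched over plain arrays (illustration only; the real zipWith operates on async streams); zipWithArrays is a hypothetical helper:

      ```typescript
      // Combine two arrays pairwise; stop at the shorter one's length
      function zipWithArrays<A, B, V>(as: A[], bs: B[], fn: (a: A, b: B) => V): V[] {
        const n = Math.min(as.length, bs.length);
        const out: V[] = [];
        for (let i = 0; i < n; i++) out.push(fn(as[i], bs[i]));
        return out;
      }
      ```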
    • Combine two streams element-by-element with a combining function

      Zips two streams together, applying a function to each pair of elements. Index parameters appear in the signature only when the corresponding stream has been enumerated. Stops when either stream ends (shortest-wins behavior).

      Type Parameters

      Parameters

      • this: SeqOps<T & { index: number }>
      • other: SeqOps<U>

        The second stream to zip with (SeqOps or AsyncIterable)

      • fn: (a: T, b: U, indexA: number) => V | Promise<V>

        Combining function that merges elements from both streams

      Returns SeqOps<V>

      New SeqOps with combined elements

      // Neither enumerated
      const forward = seqops<FastqSequence>("reads_R1.fastq");
      const reverse = seqops<FastqSequence>("reads_R2.fastq");
      forward.zipWith(reverse, (fwd, rev) => ({
      id: `${fwd.id}_merged`,
      sequence: fwd.sequence + "NNNN" + reverseComplement(rev.sequence),
      }));

      // Left enumerated only
      forward.enumerate().zipWith(reverse, (fwd, rev, idxFwd) => {
      if (idxFwd % 1000 === 0) console.log(`Processed ${idxFwd} pairs`);
      return mergePair(fwd, rev);
      });

      // Both enumerated - verify alignment
      forward.enumerate().zipWith(reverse.enumerate(), (fwd, rev, idxFwd, idxRev) => {
      if (idxFwd !== idxRev) throw new Error(`Alignment mismatch`);
      return mergePair(fwd, rev);
      });
    • Combine two streams element-by-element with a combining function

      Zips two streams together, applying a function to each pair of elements. Index parameters appear in the signature only when the corresponding stream has been enumerated. Stops when either stream ends (shortest-wins behavior).

      Type Parameters

      Parameters

      • other: SeqOps<U & { index: number }>

        The second stream to zip with (SeqOps or AsyncIterable)

      • fn: (a: T, b: U, indexB: number) => V | Promise<V>

        Combining function that merges elements from both streams

      Returns SeqOps<V>

      New SeqOps with combined elements

      // Neither enumerated
      const forward = seqops<FastqSequence>("reads_R1.fastq");
      const reverse = seqops<FastqSequence>("reads_R2.fastq");
      forward.zipWith(reverse, (fwd, rev) => ({
      id: `${fwd.id}_merged`,
      sequence: fwd.sequence + "NNNN" + reverseComplement(rev.sequence),
      }));

      // Left enumerated only
      forward.enumerate().zipWith(reverse, (fwd, rev, idxFwd) => {
      if (idxFwd % 1000 === 0) console.log(`Processed ${idxFwd} pairs`);
      return mergePair(fwd, rev);
      });

      // Both enumerated - verify alignment
      forward.enumerate().zipWith(reverse.enumerate(), (fwd, rev, idxFwd, idxRev) => {
      if (idxFwd !== idxRev) throw new Error(`Alignment mismatch`);
      return mergePair(fwd, rev);
      });
    • Combine two streams element-by-element with a combining function

      Zips two streams together, applying a function to each pair of elements. Index parameters appear in the signature only when the corresponding stream has been enumerated. Stops when either stream ends (shortest-wins behavior).

      Type Parameters

      Parameters

      • other: SeqOps<U>

        The second stream to zip with (SeqOps or AsyncIterable)

      • fn: (a: T, b: U) => V | Promise<V>

        Combining function that merges elements from both streams

      Returns SeqOps<V>

      New SeqOps with combined elements

      // Neither enumerated
      const forward = seqops<FastqSequence>("reads_R1.fastq");
      const reverse = seqops<FastqSequence>("reads_R2.fastq");
      forward.zipWith(reverse, (fwd, rev) => ({
      id: `${fwd.id}_merged`,
      sequence: fwd.sequence + "NNNN" + reverseComplement(rev.sequence),
      }));

      // Left enumerated only
      forward.enumerate().zipWith(reverse, (fwd, rev, idxFwd) => {
      if (idxFwd % 1000 === 0) console.log(`Processed ${idxFwd} pairs`);
      return mergePair(fwd, rev);
      });

      // Both enumerated - verify alignment
      forward.enumerate().zipWith(reverse.enumerate(), (fwd, rev, idxFwd, idxRev) => {
      if (idxFwd !== idxRev) throw new Error(`Alignment mismatch`);
      return mergePair(fwd, rev);
      });
    • Combine two streams element-by-element with a combining function

      Zips two streams together, applying a function to each pair of elements. Index parameters appear in the signature only when the corresponding stream has been enumerated. Stops when either stream ends (shortest-wins behavior).

      Type Parameters

      Parameters

      • this: SeqOps<T & { index: number }>
      • other: AsyncIterable<U>

        The second stream to zip with (SeqOps or AsyncIterable)

      • fn: (a: T, b: U, indexA: number) => V | Promise<V>

        Combining function that merges elements from both streams

      Returns SeqOps<V>

      New SeqOps with combined elements

      // Neither enumerated
      const forward = seqops<FastqSequence>("reads_R1.fastq");
      const reverse = seqops<FastqSequence>("reads_R2.fastq");
      forward.zipWith(reverse, (fwd, rev) => ({
      id: `${fwd.id}_merged`,
      sequence: fwd.sequence + "NNNN" + reverseComplement(rev.sequence),
      }));

      // Left enumerated only
      forward.enumerate().zipWith(reverse, (fwd, rev, idxFwd) => {
      if (idxFwd % 1000 === 0) console.log(`Processed ${idxFwd} pairs`);
      return mergePair(fwd, rev);
      });

      // Both enumerated - verify alignment
      forward.enumerate().zipWith(reverse.enumerate(), (fwd, rev, idxFwd, idxRev) => {
      if (idxFwd !== idxRev) throw new Error(`Alignment mismatch`);
      return mergePair(fwd, rev);
      });
    • Combine two streams element-by-element with a combining function

      Zips two streams together, applying a function to each pair of elements. Index parameters appear in the signature only when the corresponding stream has been enumerated. Stops when either stream ends (shortest-wins behavior).

      Type Parameters

      Parameters

      • other: AsyncIterable<U>

        The second stream to zip with (SeqOps or AsyncIterable)

      • fn: (a: T, b: U) => V | Promise<V>

        Combining function that merges elements from both streams

      Returns SeqOps<V>

      New SeqOps with combined elements

      // Neither enumerated
      const forward = seqops<FastqSequence>("reads_R1.fastq");
      const reverse = seqops<FastqSequence>("reads_R2.fastq");
      forward.zipWith(reverse, (fwd, rev) => ({
      id: `${fwd.id}_merged`,
      sequence: fwd.sequence + "NNNN" + reverseComplement(rev.sequence),
      }));

      // Left enumerated only
      forward.enumerate().zipWith(reverse, (fwd, rev, idxFwd) => {
      if (idxFwd % 1000 === 0) console.log(`Processed ${idxFwd} pairs`);
      return mergePair(fwd, rev);
      });

      // Both enumerated - verify alignment
      forward.enumerate().zipWith(reverse.enumerate(), (fwd, rev, idxFwd, idxRev) => {
      if (idxFwd !== idxRev) throw new Error(`Alignment mismatch`);
      return mergePair(fwd, rev);
      });
    • Interleave with another stream in alternating order

      Combines two streams by alternating elements: left, right, left, right, etc. Both streams must contain sequences of the same type for type safety. Commonly used for Illumina paired-end reads.

      Stops when either stream ends (shortest-wins behavior).

      Parameters

      • other: SeqOps<T> | AsyncIterable<T, any, any>

        Stream to interleave with (SeqOps or AsyncIterable)

      • Optional options: InterleaveOptions

        Interleaving options

      Returns SeqOps<T>

      Interleaved SeqOps stream

      // Basic interleaving
      const forward = seqops<FastqSequence>('reads_R1.fastq');
      const reverse = seqops<FastqSequence>('reads_R2.fastq');

      forward
      .interleave(reverse)
      .writeFastq('interleaved.fastq');
      // Output: F1, R1, F2, R2, F3, R3, ...

      // With ID validation for paired-end reads
      forward
      .interleave(reverse, { validateIds: true })
      .writeFastq('interleaved.fastq');
      // Throws error if IDs don't match

      // Custom ID comparison (ignore /1 /2 suffix)
      forward
      .interleave(reverse, {
      validateIds: true,
      idComparator: (a, b) => {
      const stripSuffix = (id: string) => id.replace(/\/[12]$/, '');
      return stripSuffix(a) === stripSuffix(b);
      }
      })
      .writeFastq('interleaved.fastq');

      // Type safety - only same types can be interleaved
      const fasta = seqops<FastaSequence>('seqs.fasta');
      const fastq = seqops<FastqSequence>('seqs.fastq');

      fasta.interleave(fasta); // ✅ Both FastaSequence
      fasta.interleave(fastq); // ❌ Type error - FastaSequence vs FastqSequence
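      The suffix-stripping comparator from the example above as a standalone sketch; stripPairSuffix and idsMatch are illustrative helpers, not part of the SeqOps API:

      ```typescript
      // Remove a trailing /1 or /2 mate suffix from a read ID
      function stripPairSuffix(id: string): string {
        return id.replace(/\/[12]$/, "");
      }

      // Two reads belong to the same pair if their base IDs match
      function idsMatch(a: string, b: string): boolean {
        return stripPairSuffix(a) === stripPairSuffix(b);
      }
      ```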
    • Repair paired-end read ordering through buffered ID matching

      Matches paired-end reads (R1 and R2) from shuffled or out-of-order streams, then outputs them in correctly interleaved order. Supports two modes:

      • Dual-stream: Match reads from two separate files (R1.fastq + R2.fastq)
      • Single-stream: Repair pairing within one mixed stream

      Uses hash-based buffering to handle out-of-order data, making it suitable for sequences that have been sorted, filtered, or otherwise reordered after initial sequencing.

      Output Order: Always yields R1, R2, R1, R2, R1, R2... (interleaved)

      Memory Management:

      • Buffers reads until match found
      • Default limit: 100,000 reads (configurable)
      • Warns at 80% capacity
      • Throws MemoryError if limit exceeded

      Parameters

      • other: SeqOps<T> | AsyncIterable<T, any, any>

        Second stream for dual-stream mode (R2 reads)

      • Optional options: PairOptions

        Pairing options (ID extraction, buffer limits, unpaired handling)

      Returns SeqOps<T>

      Paired SeqOps stream in interleaved order

      Throws MemoryError when buffer size exceeds maxBufferSize

      Throws when onUnpaired: 'error' is set and unpaired reads are found

      // Dual-stream mode: Match reads from separate R1 and R2 files
      const r1 = seqops<FastqSequence>('sample_R1.fastq.gz');
      const r2 = seqops<FastqSequence>('sample_R2.fastq.gz');

      r1.pair(r2).writeFastq('paired.fastq');
      // Output: R1_001, R2_001, R1_002, R2_002, ...

      // Single-stream mode: Repair pairing within mixed stream
      seqops<FastqSequence>('shuffled.fastq')
      .pair()
      .writeFastq('repaired.fastq');
      // Reads with /1 suffix → R1, /2 suffix → R2

      // Custom ID extraction for non-standard naming
      r1.pair(r2, {
      extractPairId: (id) => id.split('_')[0] // Custom base ID
      }).writeFastq('paired.fastq');

      // Strict mode: error on unpaired reads
      r1.pair(r2, {
      onUnpaired: 'error', // Throw on unpaired (default: 'warn')
      maxBufferSize: 50000 // Smaller buffer limit
      }).writeFastq('paired.fastq');

      // Skip unpaired reads silently
      seqops<FastqSequence>('mixed.fastq')
      .pair({ onUnpaired: 'skip' })
      .writeFastq('paired_only.fastq');

      • Best case (synchronized): O(1) memory; minimal buffering
      • Average case (partially shuffled): O(k), where k = shuffle distance
      • Worst case (fully shuffled): O(n); all reads buffered

      v0.1.0
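      The hash-based buffering described above can be sketched synchronously (an assumption for illustration, not the library's implementation); pairReads is a hypothetical helper that buffers each read by its base ID until its mate arrives, then emits R1 before R2:

      ```typescript
      interface Read {
        id: string;
      }

      function pairReads(reads: Read[]): Read[] {
        const baseId = (id: string) => id.replace(/\/[12]$/, "");
        const pending = new Map<string, Read>();
        const out: Read[] = [];
        for (const read of reads) {
          const key = baseId(read.id);
          const mate = pending.get(key);
          if (mate === undefined) {
            pending.set(key, read); // buffer until the mate shows up
          } else {
            pending.delete(key);
            // Always emit R1 before R2 regardless of arrival order
            const [r1, r2] = mate.id.endsWith("/1") ? [mate, read] : [read, mate];
            out.push(r1, r2);
          }
        }
        return out; // pending still holds any unpaired reads
      }
      ```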

    • Repair paired-end read ordering through buffered ID matching

      Matches paired-end reads (R1 and R2) from shuffled or out-of-order streams, then outputs them in correctly interleaved order. Supports two modes:

      • Dual-stream: Match reads from two separate files (R1.fastq + R2.fastq)
      • Single-stream: Repair pairing within one mixed stream

      Uses hash-based buffering to handle out-of-order data, making it suitable for sequences that have been sorted, filtered, or otherwise reordered after initial sequencing.

      Output Order: Always yields R1, R2, R1, R2, R1, R2... (interleaved)

      Memory Management:

      • Buffers reads until match found
      • Default limit: 100,000 reads (configurable)
      • Warns at 80% capacity
      • Throws MemoryError if limit exceeded

      Parameters

      • Optional options: PairOptions

        Pairing options (ID extraction, buffer limits, unpaired handling)

      Returns SeqOps<T>

      Paired SeqOps stream in interleaved order

      Throws MemoryError when buffer size exceeds maxBufferSize

      Throws when onUnpaired: 'error' is set and unpaired reads are found

      // Dual-stream mode: Match reads from separate R1 and R2 files
      const r1 = seqops<FastqSequence>('sample_R1.fastq.gz');
      const r2 = seqops<FastqSequence>('sample_R2.fastq.gz');

      r1.pair(r2).writeFastq('paired.fastq');
      // Output: R1_001, R2_001, R1_002, R2_002, ...

      // Single-stream mode: Repair pairing within mixed stream
      seqops<FastqSequence>('shuffled.fastq')
      .pair()
      .writeFastq('repaired.fastq');
      // Reads with /1 suffix → R1, /2 suffix → R2

      // Custom ID extraction for non-standard naming
      r1.pair(r2, {
      extractPairId: (id) => id.split('_')[0] // Custom base ID
      }).writeFastq('paired.fastq');

      // Strict mode: error on unpaired reads
      r1.pair(r2, {
      onUnpaired: 'error', // Throw on unpaired (default: 'warn')
      maxBufferSize: 50000 // Smaller buffer limit
      }).writeFastq('paired.fastq');

      // Skip unpaired reads silently
      seqops<FastqSequence>('mixed.fastq')
      .pair({ onUnpaired: 'skip' })
      .writeFastq('paired_only.fastq');

      • Best case (synchronized): O(1) memory; minimal buffering
      • Average case (partially shuffled): O(k), where k = shuffle distance
      • Worst case (fully shuffled): O(n); all reads buffered

      v0.1.0

    • Find pattern locations in sequences

      Terminal operation that finds all occurrences of patterns within sequences with support for fuzzy matching, strand searching, and various output formats. Mirrors seqkit locate functionality.

      Parameters

      • pattern: string

      Returns AsyncIterable<MotifLocation>

      // Simple cases (most common)
      const exact = seqops(sequences).locate('ATCG'); // Exact string match
      const byRegex = seqops(sequences).locate(/ATG...TAA/); // Regex pattern
      const fuzzy = seqops(sequences).locate('ATCG', 2); // Allow 2 mismatches

      // Advanced options for complex scenarios
      const locations = seqops(sequences).locate({
      pattern: 'ATCG',
      allowMismatches: 1,
      searchBothStrands: true,
      outputFormat: 'bed'
      });

      for await (const location of locations) {
      console.log(`Found at ${location.start}-${location.end} on ${location.strand}`);
      }
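      The mismatch-tolerant search behind locate(pattern, mismatches) can be sketched as a naive scan (illustration only, not the library's algorithm); fuzzyFind is a hypothetical helper returning 0-based match start positions:

      ```typescript
      // Report every window whose Hamming distance to the pattern
      // is at most maxMismatches
      function fuzzyFind(sequence: string, pattern: string, maxMismatches: number): number[] {
        const hits: number[] = [];
        for (let i = 0; i + pattern.length <= sequence.length; i++) {
          let mismatches = 0;
          for (let j = 0; j < pattern.length && mismatches <= maxMismatches; j++) {
            if (sequence[i + j] !== pattern[j]) mismatches++;
          }
          if (mismatches <= maxMismatches) hits.push(i);
        }
        return hits;
      }
      ```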
    • Find pattern locations in sequences

      Terminal operation that finds all occurrences of patterns within sequences with support for fuzzy matching, strand searching, and various output formats. Mirrors seqkit locate functionality.

      Parameters

      • pattern: RegExp

      Returns AsyncIterable<MotifLocation>

      // Simple cases (most common)
      const exact = seqops(sequences).locate('ATCG'); // Exact string match
      const byRegex = seqops(sequences).locate(/ATG...TAA/); // Regex pattern
      const fuzzy = seqops(sequences).locate('ATCG', 2); // Allow 2 mismatches

      // Advanced options for complex scenarios
      const locations = seqops(sequences).locate({
      pattern: 'ATCG',
      allowMismatches: 1,
      searchBothStrands: true,
      outputFormat: 'bed'
      });

      for await (const location of locations) {
      console.log(`Found at ${location.start}-${location.end} on ${location.strand}`);
      }
    • Find pattern locations in sequences

      Terminal operation that finds all occurrences of patterns within sequences with support for fuzzy matching, strand searching, and various output formats. Mirrors seqkit locate functionality.

      Parameters

      • pattern: string

        Literal pattern string to search for

      • mismatches: number

        Maximum number of mismatches allowed per occurrence

      Returns AsyncIterable<MotifLocation>

      // Fuzzy string search allowing up to 2 mismatches
      const locations = seqops(sequences).locate('ATCG', 2);

      for await (const location of locations) {
        console.log(`Found at ${location.start}-${location.end} on ${location.strand}`);
      }
    • Find pattern locations in sequences

      Terminal operation that finds all occurrences of patterns within sequences with support for fuzzy matching, strand searching, and various output formats. Mirrors seqkit locate functionality.

      Parameters

      • pattern: RegExp

        Regular expression to match against each sequence

      • mismatches: number

        Maximum number of mismatches allowed per occurrence

      Returns AsyncIterable<MotifLocation>

      // Regex search with mismatch tolerance
      const locations = seqops(sequences).locate(/ATG...TAA/, 1);

      for await (const location of locations) {
        console.log(`Found at ${location.start}-${location.end} on ${location.strand}`);
      }
    • Find pattern locations in sequences

      Terminal operation that finds all occurrences of patterns within sequences with support for fuzzy matching, strand searching, and various output formats. Mirrors seqkit locate functionality.

      Parameters

      • options: LocateOptions

        Full search configuration (pattern, mismatch tolerance, strand, output format)

      Returns AsyncIterable<MotifLocation>

      // Advanced options for complex scenarios
      const locations = seqops(sequences).locate({
        pattern: 'ATCG',
        allowMismatches: 1,
        searchBothStrands: true,
        outputFormat: 'bed'
      });

      for await (const location of locations) {
        console.log(`Found at ${location.start}-${location.end} on ${location.strand}`);
      }
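For intuition about mismatch-tolerant location, here is a minimal sketch using a naive sliding window and Hamming distance. It is illustrative only — `locateWithMismatches` is a hypothetical helper, and the library may use a faster matching algorithm internally:

```typescript
// Naive mismatch-tolerant pattern location: slide the pattern across the
// sequence and count Hamming mismatches at each offset (illustrative only).
interface Hit {
  start: number;      // 0-based inclusive
  end: number;        // 0-based exclusive
  mismatches: number; // mismatch count at this offset
}

function locateWithMismatches(seq: string, pattern: string, maxMismatches = 0): Hit[] {
  const hits: Hit[] = [];
  for (let i = 0; i + pattern.length <= seq.length; i++) {
    let mm = 0;
    for (let j = 0; j < pattern.length && mm <= maxMismatches; j++) {
      if (seq[i + j] !== pattern[j]) mm++;
    }
    if (mm <= maxMismatches) {
      hits.push({ start: i, end: i + pattern.length, mismatches: mm });
    }
  }
  return hits;
}
```

Searching both strands amounts to running the same scan over the reverse complement and mapping the coordinates back to the forward strand.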
    • Enable direct iteration over the pipeline

      Returns AsyncIterator<AbstractSequence>

      Async iterator for sequences

      for await (const seq of seqops(sequences).seq({ minLength: 100 })) {
        console.log(seq.id);
      }
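Direct `for await` support comes from implementing `Symbol.asyncIterator`. A minimal sketch of the mechanism — the `Pipeline` class below is hypothetical, not the SeqOps source:

```typescript
// Minimal sketch: exposing [Symbol.asyncIterator] makes a class consumable
// with `for await...of`, which is how a pipeline yields sequences lazily.
class Pipeline<T> {
  constructor(private readonly source: Iterable<T>) {}

  async *[Symbol.asyncIterator](): AsyncGenerator<T> {
    for (const item of this.source) {
      yield item; // a real pipeline would apply its queued operations here
    }
  }
}
```

Because the generator yields one item at a time, consumers pull sequences on demand and nothing is materialized up front.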