2 Sequence preprocessing and quality control
2.1 Read Preprocessing
Raw paired-end sequencing reads generated from Deroceras laeve samples were subjected to a multi-step quality control workflow prior to downstream taxonomic analyses.
2.2 Raw Sequencing Data
Illumina paired-end FASTQ files were used as the starting input for the analysis. Each read entry contained a sequence identifier, the nucleotide sequence, and ASCII-encoded quality scores.
Example input files:
dlaeve1_R1.fastq
dlaeve1_R2.fastq
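Because each FASTQ record normally spans four lines, a quick check that both mate files hold the same number of reads can catch truncated or mismatched inputs before quality control. The sketch below is not part of the original workflow. The helper names count_reads and check_pairs are hypothetical, and it assumes uncompressed files with unwrapped sequence lines.

```shell
# Hypothetical helpers, not part of the original workflow.
# Assumes the standard 4-line FASTQ layout (no wrapped sequence lines).

# Print the number of records in an uncompressed FASTQ file.
count_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Report read counts for both mates; returns non-zero if they differ.
check_pairs() {
    r1=$(count_reads "$1")
    r2=$(count_reads "$2")
    echo "R1=$r1 R2=$r2"
    [ "$r1" -eq "$r2" ]
}

# Usage (paths as in this section):
#   check_pairs ../1seq_data/dlaeve1_R1.fastq ../1seq_data/dlaeve1_R2.fastq
```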
The raw sequencing data generated for this study are publicly available in the NCBI Sequence Read Archive (SRA) under BioProject PRJNA1035784 (samples BC59–BC63).
2.3 Initial Quality Assessment
Raw reads were first evaluated using FastQC to inspect sequence quality, GC content, adapter contamination, and other quality metrics.
module load fastqc/0.11.3
fastqc ../1seq_data/dlaeve1_R1.fastq -o . -t 4
fastqc ../1seq_data/dlaeve1_R2.fastq -o . -t 4
The resulting HTML reports were visually inspected before applying quality filtering.
2.4 Adapter Removal and Quality Trimming
Reads were processed with Trimmomatic to remove adapter sequences and low-quality bases. Paired and unpaired output files were generated for each sample.
module load trimmomatic/0.39
trimmomatic PE -threads 4 \
../1seq_data/dlaeve1_R1.fastq \
../1seq_data/dlaeve1_R2.fastq \
trimm_dlaeve1_R1.paired.fq.gz \
trimm_dlaeve1_R1.unpaired.fq.gz \
trimm_dlaeve1_R2.paired.fq.gz \
trimm_dlaeve1_R2.unpaired.fq.gz \
ILLUMINACLIP:Next_Illumina.fa:2:28:10 \
LEADING:28 TRAILING:28 \
SLIDINGWINDOW:4:28 \
MINLEN:50
2.5 Post-trimming Quality Review
Filtered paired reads were evaluated again using FastQC to confirm improvement in sequence quality and identify any remaining issues.
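The exact command for this step is not shown here. A minimal sketch, assuming the same FastQC options as in the initial assessment and that the trimmed paired files sit in ../3trimmomatic, might look like the following. The wrapper fastqc_trimmed and the RUN variable are hypothetical. By default the function only prints the commands so they can be reviewed first.

```shell
# Hypothetical wrapper, not the original command for this step.
# Assumes the same FastQC options (-o . -t 4) as the initial assessment.
# Prints each command by default; set RUN=1 to execute them.
fastqc_trimmed() {
    for fq in "$@"; do
        if [ "${RUN:-0}" = "1" ]; then
            fastqc "$fq" -o . -t 4
        else
            echo "fastqc $fq -o . -t 4"
        fi
    done
}

# Usage (after: module load fastqc/0.11.3):
#   RUN=1 fastqc_trimmed ../3trimmomatic/trimm_dlaeve1_R1.paired.fq.gz \
#                        ../3trimmomatic/trimm_dlaeve1_R2.paired.fq.gz
```

FastQC reads gzip-compressed FASTQ directly, so the paired outputs do not need to be decompressed first.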
2.6 Additional 5’ Trimming
Based on the quality inspection of the trimmed reads, the first 9 nucleotides at the 5’ end of each read were removed using Cutadapt (option -u 9).
module load anaconda3/2025.06
source activate cutadapt-5.1
cutadapt -u 9 \
-o Dlaeve_R1_cut.fastq \
../3trimmomatic/trimm_dlaeve_R1.paired.fq.gz
cutadapt -u 9 \
-o Dlaeve_R2_cut.fastq \
../3trimmomatic/trimm_dlaeve_R2.paired.fq.gz
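To make the effect of -u 9 concrete, the sketch below mimics it on a plain four-line FASTQ stream by dropping the first N characters from both the sequence line and the quality line of every record. It is an illustration only, not a replacement for Cutadapt. The function name trim5 is hypothetical, and uncompressed, unwrapped input is assumed.

```shell
# Illustration only: mimic the effect of `cutadapt -u N` on 4-line FASTQ.
# Hypothetical helper; reads FASTQ on stdin and writes trimmed FASTQ to stdout.
trim5() {
    n="$1"
    # Even-numbered lines are the sequence (line 2) and quality (line 4)
    # of each record; headers and '+' separator lines pass through unchanged.
    awk -v n="$n" 'NR % 2 == 0 { print substr($0, n + 1); next } { print }'
}

# Usage:
#   trim5 9 < reads.fastq > reads_cut.fastq
```

Because a fixed number of bases is removed from every read and no reads are discarded, processing R1 and R2 in separate runs keeps the two files in the same read order.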
2.7 Final Output
The resulting *_cut.fastq files were used as high-quality input reads for subsequent taxonomic profiling and downstream microbiota analyses.