EFSA’s Requirements for Whole Genome Sequencing of Microorganisms

Whole genome sequencing is required by the European Food Safety Authority

Microorganism strains used as such or as production organisms in food or feed require safety assessment and pre-market authorisation. Whole genome sequencing (WGS) provides tools for strain characterisation, including taxonomic identification and characterisation of potential traits of concern.

WGS has been a requirement for strain characterisation since 2018, when it was introduced by the European Food Safety Authority's (EFSA) Panel on Additives and Products or Substances used in Animal Feed (FEEDAP). The requirement has since been extended to all microorganisms used in the food chain: characterisation should generally be performed by WGS-based analyses, and analysis of complete WGS data is required for bacteria, yeasts, filamentous fungi and viruses.

EFSA’s August 2024 statement on Whole Genome Sequencing

The EFSA WGS statement provides more detailed requirements for microorganisms intentionally used in the food chain and extends the use of WGS to all areas under food safety regulation.

The statement acknowledges that the field of WGS is rapidly evolving, so applicants may choose suitable methods and databases for the analysis. The methodologies used must, however, be reported in enough detail to allow scientific assessment of the results. The statement is not limited to bacteria, yeasts and filamentous fungi; it also covers other organisms such as viruses and microalgae.

A summary of the current EFSA requirements for Whole Genome Sequencing

Biosafe always follows the latest guidance on bioinformatic analysis. If you have questions regarding the analysis or our summary, feel free to contact us via the details below.

Please note that desk-based work, such as literature research, bioinformatics studies (including WGS analysis) and other studies not involving laboratories or testing facilities, is not subject to pre-submission phase consultations or public consultations.

Step	Description
Microorganism and nucleic acid extraction	The strain analysed should be exactly the one that is the subject of the application for authorisation. The microorganism should be a pure culture before DNA extraction, and the DNA isolation method must be described. For whole genome analysis, both chromosomal and extra-chromosomal DNA, such as plasmids, must be isolated.
Sequencing	Long-read or hybrid sequencing is required for bacterial strains and for viruses with a genome of 20 kb or larger, and is strongly recommended for yeasts and filamentous fungi. Combining short-read and long-read data sets gives the best results in terms of genome completeness (including extra-chromosomal elements) and reliable genome assembly.
Library construction	The library construction method must be described; if size selection is used, it must be ensured that small fragments are not lost. The software, version and parameters used for quality control and filtering of the sequencing reads, and the resulting values, should be reported. Trimming of short reads is recommended. The number of reads and the total base pairs of sequence data before and after trimming should be reported.
Coverage	The applicant should describe the sequencing instrumentation and any base-calling method used. Sequencing should target 100-fold coverage of the genome, although 30-fold coverage may be acceptable.
Contamination	Contamination in the sequence reads should be assessed, and contamination of ≥5% must be explained.
Genome assembly	The genome can be assembled either de novo or by reference-based read mapping. For de novo assembly, the software and parameters used must be reported, and the total number of contigs should be <500 for bacteria and <1000 for yeasts and filamentous fungi. A justification should be provided if the assembly size deviates by more than 20% from the expected genome size. For bacteria, a complete genome sequence should be pursued, although a draft genome may be accepted; completeness can be assessed by mapping the reads to a reference genome or by aligning the assembly with the reference. For eukaryotic genomes, assembly quality must be assessed, e.g. using BUSCO gene sets; ideally, the assembly should contain >90% complete matches to the BUSCO gene set of the most closely related group of yeasts or filamentous fungi. If genome annotation is performed, the method should be reported.
Identification of the microorganism	Identification is the basis of the safety assessment and should be provided, where possible, at species level. For bacteria, yeasts, filamentous fungi and viruses, taxonomic identification should be established through analysis of WGS data. For bacteria, identification should be based on digital DNA-DNA hybridisation (dDDH) or average nucleotide identity (ANI); a phylogenomic analysis is recommended when ANI or dDDH does not unequivocally assign the strain to a single species. The genome of the strain under assessment should be compared with the genome of the type strain of the expected species and with the genomes of several type strains of closely related species. For yeasts and filamentous fungi, identification should be done by phylogenomic analysis (e.g. using a concatenation of several conserved sequences to build a phylogeny against available related genomes), by alignment to a complete reference genome of the same species, or by ANI analysis. For viruses, identification should be done by complete genome analysis and comparison of the sequence against maintained, up-to-date databases. For microalgae and other protists, taxonomic identification should combine morphological information with DNA sequence data from selected genetic markers, i.e. the complete 18S rRNA gene or a large portion of it, together with loci variable enough to provide a robust species identification.
Identification of genes of concern	Genes of concern (those conferring resistance to clinically relevant antimicrobials, virulence, pathogenicity or toxicity) may be searched for with a search/comparison-based approach against maintained databases or with a mapping-based approach. The strategy, software and all relevant parameters used to identify genes of concern should be reported, and the results presented in a table. The search should have been performed no more than 2 years before the date of submission of the application. When searching for antimicrobial resistance (AMR) genes, at least two maintained databases should be used. When the comparison identifies a hit to a known AMR gene, it should be determined whether the gene is acquired or intrinsic in the bacterial species of the strain under assessment. Hits with at least 80% identity over at least 70% of the subject sequence length should be reported; the same thresholds apply to virulence factors. If two or more fragments, each covering less than 70% of the subject sequence length with at least 80% identity to the same gene, are detected, they should be reported and it should be checked whether the full gene is present. For bacteriophages, the absence of lysogenic activity and of the ability to transduce (mobilise) DNA should be assessed.
Production of antimicrobial substances	For species not on the Qualified Presumption of Safety (QPS) list, for species known to produce relevant antimicrobials, and for QPS species with a qualification concerning antimicrobial production, the production of antimicrobial substances should be assessed using both WGS-based analysis and phenotypic tests. The WGS data of the strain should be interrogated against an up-to-date curated database for genes or gene clusters involved in the biosynthesis of antimicrobials.
Genetic modifications	Genetic modifications can be characterised by comparing the WGS data of the genetically modified microorganism (GMM) with the genome of the non-modified reference (parental or recipient) strain. Based on the alignment between the GMM and the reference, the actual genetic modification should be characterised and presented graphically. All modifications, both coding and non-coding, should be described; this requires the applicant to inform whoever performs the WGS analysis of the intended modifications.
Data submission	Where relevant, the data submission includes the sequencing reads (e.g. Bacillus_subtilis_XXX12345_sequence.fastq.gz), the assembled sequences (e.g. Bacillus_subtilis_XXX12345_assembled_genome.fasta.gz), for GMMs the assembled sequence of the non-genetically modified reference strain (e.g. Bacillus_subtilis_XXX12345_reference_genome.fasta.gz), and annotations, e.g. in GFF or GBK format.
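The read-statistics and coverage rows in the table above come down to simple arithmetic: count reads and bases, then divide total sequenced bases by the expected genome size. The sketch below shows one way to do this in Python; it assumes plain four-line FASTQ records, and the function names and verdict labels are illustrative, not part of any EFSA tooling.

```python
import gzip
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ReadStats:
    reads: int  # number of reads
    bases: int  # total base pairs


def fastq_stats(path) -> ReadStats:
    """Count reads and total base pairs in a FASTQ file (plain or .gz).

    Assumes standard four-line records (header, sequence, '+', quality);
    multi-line FASTQ is not handled.
    """
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    reads = bases = 0
    with opener(str(path), "rt") as handle:
        for line_no, line in enumerate(handle):
            if line_no % 4 == 1:  # the second line of each record is the sequence
                reads += 1
                bases += len(line.strip())
    return ReadStats(reads, bases)


def fold_coverage(total_bases: int, genome_size_bp: int) -> float:
    """Mean depth: total sequenced bases divided by the expected genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return total_bases / genome_size_bp


def coverage_verdict(coverage: float, target: float = 100.0, minimum: float = 30.0) -> str:
    """Compare coverage with the 100-fold target and the 30-fold lower bound."""
    if coverage >= target:
        return "meets 100-fold target"
    if coverage >= minimum:
        return "below target, may be acceptable (>= 30-fold)"
    return "insufficient (< 30-fold)"
```

Running `fastq_stats` on both the raw and the trimmed read files gives the before/after numbers that should be reported. For example, 420 million bases against a hypothetical 4.2 Mb genome gives `fold_coverage(420_000_000, 4_200_000) == 100.0`.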
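The contamination and genome assembly rows list several numeric checks that can be screened together before an assembly is reported. The following is a minimal sketch that encodes only the thresholds quoted in the table (≥5% contamination, contig limits, more than 20% size deviation, >90% BUSCO completeness). The `AssemblyQC` record, the "bacteria"/"fungi" group labels and the way the input numbers are obtained are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Contig-count limits for de novo assemblies, as listed in the table above.
CONTIG_LIMITS = {"bacteria": 500, "fungi": 1000}  # "fungi" = yeasts and filamentous fungi


def contig_lengths(fasta_lines: Iterable[str]) -> List[int]:
    """Return the length of each contig from FASTA-formatted text lines."""
    lengths: List[int] = []
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            lengths.append(0)          # header line starts a new contig
        elif line and lengths:
            lengths[-1] += len(line)   # sequence line of the current contig
    return lengths


@dataclass
class AssemblyQC:
    group: str                                  # "bacteria" or "fungi" (hypothetical labels)
    contigs: int
    assembly_size_bp: int
    expected_size_bp: int
    contamination_pct: float                    # contamination estimated in the reads, in %
    busco_complete_pct: Optional[float] = None  # eukaryotic genomes only


def assembly_flags(qc: AssemblyQC) -> List[str]:
    """List the points that would need an explanation or justification."""
    flags = []
    if qc.contamination_pct >= 5.0:
        flags.append(f"contamination {qc.contamination_pct:.1f}% (>= 5%): explanation required")
    limit = CONTIG_LIMITS[qc.group]
    if qc.contigs >= limit:
        flags.append(f"{qc.contigs} contigs (should be < {limit})")
    deviation = abs(qc.assembly_size_bp - qc.expected_size_bp) / qc.expected_size_bp * 100
    if deviation > 20.0:
        flags.append(f"size deviates {deviation:.1f}% from expected: justification required")
    if qc.group == "fungi":
        if qc.busco_complete_pct is None:
            flags.append("BUSCO completeness not assessed")
        elif qc.busco_complete_pct <= 90.0:
            flags.append(f"BUSCO complete {qc.busco_complete_pct:.1f}% (ideally > 90%)")
    return flags
```

With an assembly file, `lengths = contig_lengths(open("assembly.fasta"))` gives `len(lengths)` as the contig count and `sum(lengths)` as the assembly size. An empty list from `assembly_flags` means none of the quoted thresholds was triggered.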
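The identification row requires a comparison against the type strain of the expected species and several closely related type strains, with a phylogenomic analysis recommended when ANI or dDDH is not unequivocal. The decision step could look like the sketch below. It assumes the ANI (or dDDH) values have already been computed with a tool of the applicant's choice. The 95% ANI and 70% dDDH cut-offs are commonly used species boundaries from the literature, not values taken from the EFSA statement, and should be checked against current guidance.

```python
from typing import Dict, Tuple

# ASSUMED species-delineation cut-offs commonly used in the literature;
# they are not quoted from the EFSA statement.
ANI_SPECIES_CUTOFF = 95.0
DDDH_SPECIES_CUTOFF = 70.0


def assign_species(similarity_to_type_strains: Dict[str, float],
                   cutoff: float = ANI_SPECIES_CUTOFF) -> Tuple[str, str]:
    """Return (species or "", verdict) from ANI or dDDH values (%) against type strains.

    The assignment is treated as unequivocal only when exactly one type strain
    reaches the cut-off; otherwise a phylogenomic analysis is suggested.
    """
    above = [sp for sp, value in similarity_to_type_strains.items() if value >= cutoff]
    if len(above) == 1:
        return above[0], "assigned"
    if not above:
        return "", "no type strain above cut-off: phylogenomic analysis recommended"
    return "", "ambiguous (several species above cut-off): phylogenomic analysis recommended"
```

For dDDH values, pass `cutoff=DDDH_SPECIES_CUTOFF`. The similarity values in a call such as `assign_species({"Bacillus subtilis": 98.7, "Bacillus velezensis": 77.1})` would come from the applicant's own ANI or dDDH analysis; the numbers here are illustrative only.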
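The reporting rule for genes of concern combines two thresholds (at least 80% identity, at least 70% of the subject length) with a special case for fragmented hits. The sketch below applies that rule to already-computed alignment hits. The `Hit` fields correspond to values a BLAST+ tabular output can provide when custom columns such as `pident sstart send slen` are requested, but the parsing step is left out and the field names are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Hit:
    gene: str        # database (subject) gene identifier
    identity: float  # percent identity of the alignment
    s_start: int     # alignment start on the subject, 1-based
    s_end: int       # alignment end on the subject, 1-based (may be < s_start)
    s_len: int       # full length of the subject gene

    def subject_coverage(self) -> float:
        """Fraction of the subject gene length covered by this alignment."""
        return (abs(self.s_end - self.s_start) + 1) / self.s_len


def merged_coverage(fragments: List[Hit]) -> float:
    """Fraction of the subject covered by the union of several fragment alignments."""
    intervals = sorted((min(h.s_start, h.s_end), max(h.s_start, h.s_end)) for h in fragments)
    covered = 0
    cur_start, cur_end = intervals[0]
    for start, end in intervals[1:]:
        if start > cur_end + 1:          # gap: close the current interval
            covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:                            # overlap or adjacency: extend it
            cur_end = max(cur_end, end)
    covered += cur_end - cur_start + 1
    return covered / fragments[0].s_len


def classify_hits(hits: List[Hit], min_identity: float = 80.0,
                  min_cov: float = 0.70) -> Tuple[List[str], Dict[str, float]]:
    """Split hits into reportable genes and fragmented genes needing a follow-up check."""
    reportable = set()
    partial: Dict[str, List[Hit]] = defaultdict(list)
    for h in hits:
        if h.identity < min_identity:
            continue                     # below the identity threshold
        if h.subject_coverage() >= min_cov:
            reportable.add(h.gene)
        else:
            partial[h.gene].append(h)
    # Two or more short fragments of the same gene: report them and check for the full gene.
    fragments = {g: merged_coverage(fs) for g, fs in partial.items() if len(fs) >= 2}
    return sorted(reportable), fragments
```

The merged coverage returned for each fragmented gene is only a first indication of whether the full gene may be present. Confirming that the gene is actually present still requires inspecting the region itself.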
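The example file names in the data submission row follow a visible pattern: genus, species, strain identifier, then the content type. The helper below builds a consistent file set from that pattern. The pattern is inferred from the examples and is an assumption, not a formal naming rule, and the annotation file name (`_annotation.gff`) is hypothetical because no example is given for it.

```python
from typing import Dict


def submission_filenames(genus: str, species: str, strain_id: str,
                         is_gmm: bool = False) -> Dict[str, str]:
    """Build file names following the example pattern <Genus>_<species>_<strain>_<content>."""
    stem = f"{genus.capitalize()}_{species.lower()}_{strain_id}"
    files = {
        "reads": f"{stem}_sequence.fastq.gz",
        "assembly": f"{stem}_assembled_genome.fasta.gz",
        "annotation": f"{stem}_annotation.gff",  # hypothetical name, no example given
    }
    if is_gmm:
        # For GMMs, the assembled genome of the non-modified reference strain is also needed.
        files["reference_assembly"] = f"{stem}_reference_genome.fasta.gz"
    return files
```

For example, `submission_filenames("Bacillus", "subtilis", "XXX12345", is_gmm=True)["reads"]` gives `Bacillus_subtilis_XXX12345_sequence.fastq.gz`, which matches the example in the table.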