
EFSA’s Requirements for Whole Genome Sequencing of Microorganisms

Whole genome sequencing is required by the European Food Safety Authority

Microorganism strains used as such or as production organisms in food or feed require safety assessment and pre-market authorisation. Whole genome sequencing (WGS) provides tools for strain characterisation, including taxonomic identification and characterisation of potential traits of concern.

WGS has been a requirement for strain characterisation since 2018, when it was introduced in guidance from the European Food Safety Authority’s (EFSA) Panel on Additives and Products or Substances used in Animal Feed (FEEDAP). The requirement to sequence whole genomes was subsequently extended to food enzymes.


The update: EFSA’s March 2021 statement on Whole Genome Sequencing

However, the guidance documents from 2018 gave only limited detail on how WGS results should be analysed and reported. The latest EFSA statement, published on March 10, 2021, provides more detail on the WGS requirements for microorganisms intentionally used in the food chain and extends the use of WGS to all areas under food safety regulation.

The statement acknowledges that the field of WGS is rapidly evolving, so applicants may choose suitable methods and databases to complete the analysis. The methodologies used must be reported in detail to allow scientific assessment of the obtained results. The statement is not limited to bacteria, yeasts, and filamentous fungi; it also extends the methods to organisms such as viruses or microalgae.

The final statement is a clear improvement on the draft version and is considerably more practical. In our view, the current requirements are also more reasonable, especially the threshold values for antimicrobial resistance and virulence factor screening. Biosafe had already adopted many of the details from the statement during the public consultation phase, so the remaining details could be adopted as soon as the final statement was published.


A summary of the current EFSA requirements for Whole Genome Sequencing

The current requirements are summarised below. Biosafe always follows the latest instructions in bioinformatic analysis. If you have questions regarding the analyses or our summary, feel free to contact us via the details below.

Questions have been raised regarding how the new EU Transparency Regulation (2019/1381) will be applied to WGS and bioinformatic analysis. According to EFSA, desk-oriented work such as literature research, bioinformatics studies (which include WGS) and other studies not involving laboratories and testing facilities is not subject to pre-submission phase consultations or public consultations.


Step

Description

Sequencing and data quality control

The organism under analysis should be exactly the one that is subject to the application for authorisation. Before DNA extraction, the microorganism should be a pure culture, and the DNA isolation method must be described. For the whole genome analysis, both chromosomes and extra-chromosomal elements, such as plasmids, must be isolated.

Library construction

The library construction method must be described, and if any selection is used, it must be ensured that small fragments are not lost. The applicant should describe the sequencing instrumentation used, and any base-calling method applied. The sequencing should target 100-fold coverage of the genome, but 30-fold coverage may be acceptable. Trimming of short reads is recommended. The number of reads and total base pairs of sequence data before and after trimming should be reported.
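As a rough illustration of the coverage figures above, average fold coverage can be estimated as the total number of sequenced bases divided by the genome size. The sketch below is only a minimal example: the function names, the status labels and the run figures are hypothetical, and the 100-fold and 30-fold values are simply the ones quoted in this summary.

```python
def fold_coverage(total_bases: int, genome_size: int) -> float:
    """Average fold coverage = sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def coverage_status(coverage: float, target: float = 100.0, minimum: float = 30.0) -> str:
    """Compare coverage with the 100-fold target and the 30-fold lower value.

    The wording of the labels is an assumption made for this example.
    """
    if coverage >= target:
        return "meets target"
    if coverage >= minimum:
        return "may be acceptable"
    return "below minimum"


# Hypothetical run: 2.1 million trimmed reads of 150 bp on a 5 Mb genome.
reads_after_trimming = 2_100_000
mean_read_length = 150
genome_size = 5_000_000

coverage = fold_coverage(reads_after_trimming * mean_read_length, genome_size)
print(f"{coverage:.0f}x: {coverage_status(coverage)}")  # prints "63x: may be acceptable"
```

Because read counts and total base pairs are to be reported both before and after trimming, the same calculation can be run on each set of figures.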

Contamination

Contamination in the sequence reads should be assessed, and contamination of ≥5% must be explained. For bacteria, a complete genome sequence should be pursued, but a draft genome may be accepted. The completeness can be assessed by mapping the reads to a reference genome or aligning the genome assembly with the reference.

Genome assembly

Two different approaches can be used for the genome assembly: de novo assembly or reference-based read mapping. For the de novo approach, details must be provided on the software and parameters used, and the total number of contigs should be <500 for bacteria and <1000 for yeasts and filamentous fungi. Assembly parameters must be reported, and justification should be provided if the assembly size deviates by more than 20% from the expected genome size. For eukaryotic genomes, the assembly quality must be assessed using, e.g., BUSCO gene sets. If genome annotation is performed, the method should be reported.
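The contig and size criteria for de novo assemblies lend themselves to a simple automated check. The following is a minimal sketch, not an official EFSA tool: the function name, the organism keys and the example figures are hypothetical, and the ±20% criterion is read here as a deviation of more than 20% from the expected genome size, which is our interpretation.

```python
# Contig limits as quoted in this summary (assembly should stay below them).
CONTIG_LIMITS = {"bacteria": 500, "yeast": 1000, "filamentous fungus": 1000}


def assembly_flags(organism: str, n_contigs: int,
                   assembly_size: int, expected_size: int) -> list[str]:
    """Return the points that would need attention or justification."""
    flags = []
    limit = CONTIG_LIMITS[organism]
    if n_contigs >= limit:
        flags.append(f"{n_contigs} contigs (should be <{limit})")
    # Relative deviation of the assembly from the expected genome size.
    deviation = abs(assembly_size - expected_size) / expected_size
    if deviation > 0.20:
        flags.append(f"assembly size deviates {deviation:.0%} from expected")
    return flags


# Hypothetical bacterial assemblies against an expected genome size of 5.0 Mb.
print(assembly_flags("bacteria", 620, 4_100_000, 5_000_000))
print(assembly_flags("bacteria", 120, 6_200_000, 5_000_000))
print(assembly_flags("bacteria", 85, 5_050_000, 5_000_000))
```

An empty list means that neither criterion was triggered; any flag would be accompanied by a justification in the report.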

Identification of the microorganism

Identification is the basis of safety assessment and should be provided, where possible, at species level. For bacteria, 16S rRNA is not sufficient. The identification should be based on digital DNA-DNA hybridisation, average nucleotide identity (ANI) or phylogenomic methods. For yeasts and filamentous fungi, phylogenomic analysis is preferred. A phylogenetic tree is recommended, especially where there is a high level of identity between related species.

Genetic modifications

The characterisation of the genetic modifications can be done by comparing the WGS data of the genetically modified microorganism (GMM) with that of the non-modified reference genome (parental or recipient strain). Often the reference genome is available from genome databases.

Based on the alignment between the GMM and the reference, the actual genetic modification should be characterised, and a graphic presentation of the modification should be provided. All modifications, both coding and non-coding, should be described. This requires the applicant to inform the person doing the WGS analysis about the intended modifications.

Identification of genes of concern

Genes of concern (antimicrobial resistance genes, genes conferring virulence, pathogenicity or toxicity) may be searched using a search/comparison-based approach against maintained databases or a mapping-based approach. The strategy, software and all relevant parameters used to identify genes of concern should be reported, and the results should be presented in a table.

When searching for antimicrobial resistance (AMR) genes, at least two databases should be used. Query sequence hits with at least 80% identity and covering at least 70% of the length of the subject sequence should be reported. For virulence factors, the same thresholds are applied. For microorganisms for which few or no AMR genes are present in databases, searches with hidden Markov model tools are recommended.

If two or more fragments are detected that each cover less than 70% of the length of the subject sequence and have at least 80% identity to the same gene, these should be reported, and it should be checked whether the full gene is present. This is an important addition to the previous instructions, as some genes may otherwise be lost in the filtering step because of truncated reading frames or erroneous annotation.
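The reporting thresholds above can be sketched as a simple filter over search hits. This is only an illustration under stated assumptions: the `Hit` structure and the `classify_hits` function are hypothetical, the hit values are made up, and in practice the input would be parsed from the output of whichever search tool and databases the analysis uses.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    gene: str
    identity: float       # percent identity to the subject sequence
    aligned_length: int   # subject positions covered by the alignment
    subject_length: int   # full length of the database (subject) sequence

    @property
    def coverage(self) -> float:
        """Percentage of the subject sequence length covered by the hit."""
        return 100.0 * self.aligned_length / self.subject_length


def classify_hits(hits, min_identity=80.0, min_coverage=70.0):
    """Split hits into reportable hits and genes with two or more fragments."""
    reportable = []
    fragments = defaultdict(list)
    for hit in hits:
        if hit.identity < min_identity:
            continue  # below the identity threshold
        if hit.coverage >= min_coverage:
            reportable.append(hit)
        else:
            fragments[hit.gene].append(hit)
    # Two or more short fragments of the same gene: report them and check
    # manually whether the full gene is present.
    fragmented = {gene: fs for gene, fs in fragments.items() if len(fs) >= 2}
    return reportable, fragmented


# Made-up hits for illustration only.
hits = [
    Hit("tetM", 98.5, 1900, 1920),
    Hit("blaTEM", 92.0, 400, 861),
    Hit("blaTEM", 90.1, 380, 861),
    Hit("vanA", 75.0, 1000, 1032),
    Hit("ermB", 85.0, 300, 738),
]
reportable, fragmented = classify_hits(hits)
print([h.gene for h in reportable])
print({gene: len(fs) for gene, fs in fragmented.items()})
```

In this sketch a single short fragment is neither reported nor flagged; whether such hits deserve a closer look is left to the analyst.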

Data submission

The WGS data must be provided to EFSA in standard formats. The data include raw reads, assembly, annotation and, in the case of genetic modifications, the alignment files. The statement also provides a checklist of the required information and data (Appendix A). The applicant should complete and sign the document at the time of submission. To support this, Biosafe provides the completed table to the applicant once the WGS analysis is finalised.


Published: 12.04.2021
