The Development of Bioinformatics Requirements in Genome Analysis

The past decade has seen a significant transformation in the requirements and practices surrounding genome analysis, particularly in the context of bioinformatics. This evolution has been driven by advancements in technology, increased regulatory scrutiny, and the growing importance of comprehensive safety assessments for microorganisms used in various sectors, including food production.

A pivotal moment came in 2017, when the EFSA Panel on Additives and Products or Substances used in Animal Feed (FEEDAP) published draft guidance on the characterisation of microorganisms used as feed additives or production organisms. This draft was the first document to require whole genome sequencing (WGS) for bacterial and yeast strains in the food chain; for filamentous fungi, WGS was only recommended. The guidance, implemented on September 1, 2018, marked a turning point and set the stage for rapid development of bioinformatics requirements for microbial safety assessment. Since then, WGS has become a standard requirement for assessing food enzymes, novel foods, and microorganisms used in plant protection.

The Early Days: Establishing Foundational Requirements

Initially, EFSA's guidance outlined basic requirements for WGS: detailed descriptions of the sequencing and assembly methods, quality parameters, the genome size, and the provision of FASTA files. WGS was to be used primarily for species identification and for detecting antimicrobial resistance (AMR) genes and virulence factors, with searches run against up-to-date databases. Additionally, any genetic modifications present in the microorganisms had to be thoroughly described, with graphical representations provided. While the guidance offered a framework for bioinformatics analysis, it left room for flexibility in reporting, leading to a preference over time for more standardised outputs.
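To make the basic reporting items concrete, the sketch below derives genome size, contig count, and N50 from an assembly in FASTA format. It is only an illustration: the function names and the tiny example sequences are hypothetical, and N50 is a commonly reported assembly statistic chosen here as an example of a quality parameter, not one the guidance is known to name.

```python
from io import StringIO


def read_fasta_lengths(handle):
    """Return a list of sequence lengths, one per FASTA record."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def assembly_summary(lengths):
    """Summarise an assembly: contig count, total size, and N50."""
    total = sum(lengths)
    running = 0
    n50 = 0
    # N50: length of the contig at which the cumulative sum of
    # contig lengths (largest first) reaches half the assembly size.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return {"contigs": len(lengths), "genome_size": total, "n50": n50}


# Hypothetical three-contig assembly, used only to exercise the functions.
example = StringIO(">contig_1\nACGTACGTAC\nGGCC\n>contig_2\nACGT\n>contig_3\nAC\n")
summary = assembly_summary(read_fasta_lengths(example))
print(summary)
```

In practice the handle would be an opened FASTA file rather than an in-memory string, and a real report would add whatever further parameters the applicable guidance asks for.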

The Move Toward Standardisation: EFSA's 2021 Statement

In response to the growing need for clarity and consistency, EFSA's Scientific Committee published a Statement in 2021 after a public consultation process that took place between 2019 and 2020. This Statement introduced more detailed requirements for conducting and reporting WGS and bioinformatics analyses. It specified that DNA sequencing must be performed on pure cultures of the microorganism under assessment, with both chromosomal and plasmid DNA analysed. Criteria for sequence quality, including read depth, contamination levels, and genome completeness, were clearly defined, and the submission of raw WGS data was added to the requirements.
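As a rough illustration of how such quality criteria can be checked, the sketch below estimates mean read depth as total sequenced bases divided by genome size and compares the result, together with contamination and completeness figures, against a set of limits. The limits shown are hypothetical placeholders, not the values defined in the Statement, and the function names are invented for this example.

```python
def mean_read_depth(read_lengths, genome_size):
    """Estimate mean read depth as total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(read_lengths) / genome_size


def failed_criteria(depth, contamination_pct, completeness_pct, limits):
    """Return the names of the quality criteria that are not met."""
    failures = []
    if depth < limits["min_depth"]:
        failures.append("read depth")
    if contamination_pct > limits["max_contamination_pct"]:
        failures.append("contamination")
    if completeness_pct < limits["min_completeness_pct"]:
        failures.append("completeness")
    return failures


# Placeholder limits for illustration only; consult the Statement for real values.
LIMITS = {"min_depth": 30.0, "max_contamination_pct": 5.0, "min_completeness_pct": 95.0}

# Hypothetical run: 3,000 reads of 150 bases against a 10,000-base genome.
depth = mean_read_depth([150] * 3000, genome_size=10_000)
failures = failed_criteria(depth, contamination_pct=6.0, completeness_pct=98.0, limits=LIMITS)
print(depth, failures)
```

Keeping the limits in one dictionary means they can be swapped for the values in whichever guidance version applies without touching the checks themselves.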

For bacterial identification, the Statement recommended either digital DNA-DNA hybridisation (dDDH) or average nucleotide identity (ANI) analysis, with threshold values of 70% and 95%, respectively. Genetic modifications were to be characterised using the parental strain as a reference, and alignments were required. The Statement also gave detailed instructions on reporting genes of concern, such as AMR genes and virulence factors: sequence hits that met specific identity and length criteria had to be highlighted. For less well-known species, the use of hidden Markov model (HMM) tools was recommended.
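The idea of highlighting hits that meet identity and length criteria can be sketched as a simple filter. This is a minimal illustration rather than the Statement's procedure: the `Hit` structure, the gene names, and the 80% identity and 70% coverage cut-offs are all hypothetical, since the actual criteria are not reproduced in this article.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One database match for a gene of concern (hypothetical structure)."""
    gene: str
    identity_pct: float    # percent identity of the alignment
    aligned_length: int    # aligned positions in the reference gene
    reference_length: int  # full length of the reference gene

    @property
    def coverage_pct(self):
        """Percentage of the reference gene covered by the alignment."""
        return 100.0 * self.aligned_length / self.reference_length


def hits_to_report(hits, min_identity_pct, min_coverage_pct):
    """Keep hits that meet both the identity and the length criterion."""
    return [
        h for h in hits
        if h.identity_pct >= min_identity_pct and h.coverage_pct >= min_coverage_pct
    ]


# Invented example hits; gene names are placeholders.
hits = [
    Hit("gene_A", identity_pct=99.1, aligned_length=900, reference_length=900),
    Hit("gene_B", identity_pct=85.0, aligned_length=300, reference_length=1000),
    Hit("gene_C", identity_pct=62.0, aligned_length=800, reference_length=800),
]

# Placeholder cut-offs, not the values from the Statement.
reported = [h.gene for h in hits_to_report(hits, min_identity_pct=80.0, min_coverage_pct=70.0)]
print(reported)
```

Here `gene_B` is dropped for covering too little of the reference and `gene_C` for low identity, which is the kind of distinction the reporting criteria are meant to make explicit.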

The Latest Developments: 2024 Update and Beyond

In August 2024, EFSA issued another update to its requirements, reflecting the latest technological advancements. This update introduced instructions for viruses and mandated long-read sequencing, or a combination of long- and short-read sequencing, for bacterial strains, signalling that short-read methods such as Illumina are no longer sufficient on their own. The average nucleotide identity (ANI) threshold for bacterial identification was lowered to 94%, while yeasts and filamentous fungi now require a 99% identification threshold. Additionally, the search for genes of concern was expanded to include pathways involved in the production of clinically relevant antimicrobials. The hidden Markov model (HMM) tools recommended in the 2021 Statement are no longer mentioned.
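The practical effect of the lower bacterial ANI threshold can be shown with a short sketch. It encodes only the two bacterial thresholds quoted in this article (95% in 2021, 94% in 2024); the dictionary keys, the function name, the use of an inclusive comparison, and the example ANI value are assumptions made for illustration. The 99% threshold for yeasts and filamentous fungi is not modelled here.

```python
# Bacterial ANI thresholds as quoted in this article, keyed by guidance year.
ANI_THRESHOLD_PCT = {"2021": 95.0, "2024": 94.0}


def meets_species_threshold(ani_pct, guidance):
    """Return True if the ANI value reaches the threshold for that guidance.

    The inclusive comparison (>=) is an assumption of this sketch.
    """
    return ani_pct >= ANI_THRESHOLD_PCT[guidance]


# Hypothetical strain with 94.5% ANI to the reference genome of its claimed species.
ani = 94.5
results = {year: meets_species_threshold(ani, year) for year in ANI_THRESHOLD_PCT}
print(results)
```

A strain sitting between the two thresholds, as in this example, would not have met the 2021 criterion but does meet the 2024 one.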

A significant new development in 2024 was the introduction of the Microorganisms Pipelines Service (MoPS), an EFSA platform designed for analysing WGS data from bacteria, yeasts, filamentous fungi, and viruses. Since May 1, 2024, EFSA has requested raw WGS data from applicants during the intake phase, using MoPS to cross-check microbial genome information on a case-by-case basis. The platform aims to streamline risk assessments, standardise analyses, and potentially offer applicants direct access to these tools. However, it remains to be seen whether this will speed up the risk assessment process or lead to additional questions and potential delays.

Bioinformatics in Action: Case Studies and Challenges

The practical application of these bioinformatics requirements was highlighted in two articles published by EFSA in the summer of 2024. The first article used WGS data to re-examine the identity of Bacillus licheniformis strains used in food enzyme production, and revealed that only 12 out of 27 strains had been correctly identified by the traditional 16S rRNA gene method. This underscores the superior accuracy of WGS in bacterial identification. The second article focused on AMR genes in various Bacillus species, demonstrating the power of bioinformatics in identifying genes of concern. However, challenges remain in interpreting these findings, particularly in determining whether an AMR gene is intrinsic or acquired when it is present in a significant proportion of strains but not in all of them.

The Road Ahead: Balancing Innovation with Regulation

Bioinformatic analysis of microbial genomes offers a detailed and nuanced view of microorganisms, but it requires expertise in both bioinformatics and the biology of microorganisms. While the ongoing development of guidance and standardisation improves the quality and structure of reports, it also presents challenges. Applicants must navigate increasingly detailed requirements. Adherence to these guidelines reduces the likelihood of additional questions, but the need to comply with the latest guidance, even for applications submitted before its publication, can be burdensome.

Another concern is that the submission of raw sequence data, which is proprietary to the applicant, is now required as standard practice. This raises questions about data security, confidentiality, and liability. At Biosafe, our bioinformatics team is closely monitoring these developments, continuously adapting our analysis pipelines to meet new requirements and ensuring that we remain at the forefront of this rapidly evolving field.

As we look to the future, it is clear that bioinformatics will continue to play a crucial role in the safety assessment of microorganisms. By staying informed and agile, Biosafe is committed to helping our clients navigate the complexities of genome analysis, ensuring that their products meet the highest standards of safety and efficacy in a rapidly changing regulatory landscape.

Stay informed about how Biosafe is navigating these changes and leading in food safety by visiting our blog and participating in our upcoming webinars. Together, we are shaping the next decade of food safety innovations.

Biosafe is your guide to successful food and feed product approval, providing extensive expertise in food safety assessment, research and legislation. By helping to bring new and more sustainable food solutions to market, we are working with our customers to create a safer food future.