Generating genetic engineering linked indicator datasets for machine learning classifier training in biosecurity

Authors

Painter, Christopher
Bastian, Nathaniel D.

Issue Date

2021-04-12

Type

proceedings-article

Language

en_US

Abstract

As methods for gene synthesis and genetic engineering have become more advanced and more accessible, the fear that malicious viruses and bacteria will be designed with the express intention of causing harm to humans has received increased attention. In the event that such biological weapons are deployed, the security community needs tools to rapidly recognize the threat and identify responsible parties. Therefore, a key question is whether or not a biological threat is man-made. Currently, experts are capable of qualitatively assessing whether specific genetic sequences are natural or man-made, but few objective criteria exist for characterizing the degree to which a sequence has been engineered. Additionally, progress has recently been made on the task of attributing an engineered gene sequence to a lab of origin using machine learning. However, the task of analyzing naturally occurring genetic sequences so as to automatically detect outliers that may have been genetically engineered has received comparatively little attention. This work proposes a method for generating a dataset of natural and engineered sequences that can be used as input for training machine learning classifiers to automatically detect human engineering in gene sequence data.

Citation

Christopher Painter and Nathaniel D. Bastian "Generating genetic engineering linked indicator datasets for machine learning classifier training in biosecurity", Proc. SPIE 11746, Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications III, 1174624 (12 April 2021); https://doi.org/10.1117/12.2587844
