Distributions / Disclaimer / Suggested Citation / Quick Start / Contact / Versions
GenomeGraphR is populated with distributions downloaded from the
NCBI FTP site.
See the right side of this page for the available distributions.
The U.S. Food and Drug Administration (FDA) has taken all reasonable
precautions in creating the GenomeGraphR application. FDA is not
responsible for errors, omissions or deficiencies regarding the
application. The GenomeGraphR application is being made available “as
is” and without warranties of any kind, either expressed or implied,
including, but not limited to, warranties of performance,
merchantability, and fitness for a particular purpose. FDA is not
making a commitment in any way to regularly update the system.
Responsibility for the interpretation and use of the application lies
solely with the user. In no event shall FDA be liable for direct,
indirect, special, incidental, or consequential damages resulting from
the use, misuse, or inability to use the application. Third parties’
use of or acknowledgment of the application and its accompanying
documentation, including through the suggested citation, does not in
any way represent that FDA endorses such third parties or expresses any
opinion with respect to their statements.
Where GenomeGraphR is used, reference to the system should be made as follows:
Food and Drug Administration Center for Food Safety and Applied Nutrition (FDA/CFSAN). 2018. GenomeGraphR. FDA CFSAN.
College Park, Maryland, USA. Available at
https://fda-riskmodels.foodrisk.org/genomegraphr/; or
Sanaa M, Pouillot R, Vega FG, Strain E, Van Doren JM (2019) GenomeGraphR:
A user-friendly open-source web application for foodborne pathogen whole genome sequencing data integration, analysis, and visualization.
PLoS ONE 14(2): e0213039.
https://doi.org/10.1371/journal.pone.0213039
On this “Home” tab
Choose a distribution: the latest distribution is pre-loaded. If you
pick another distribution, expect some time for the data to be loaded;
Choose the SNP threshold: this value determines whether two
strains are “connected” (SNP distance less than or equal to this threshold)
or not (SNP distance greater than this threshold);
Choose which date to consider: choose whether the “date” used by the
application for a strain should be the date of collection as reported
by the strain submitters (default; some data are missing), the
date of creation of the strain in the NCBI Pathogen Detection system
(no missing data) or the date of collection replaced, if missing, by
the date of target creation. Note that using the date of target
creation could be misleading when analyzing relatedness between strains
if strains were sent for WGS analysis long after the date of sample
collection;
Choose which location to consider: choose whether the “location” used by the
application for a strain should be the location of collection as
reported by the strain submitters (default; some data are missing),
the location of the submitting center (no missing data) or the
location of collection replaced, if missing, by the location of the
center. Note that selecting the center location could be misleading
because a laboratory may isolate strains from samples collected from
all over the world;
Choose the format for graph export: .png, .jpeg or .pdf;
Go to the “Select” Tab
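The threshold rule and the three date options above can be sketched in a few lines of Python. This is only an illustration of the logic as described on this page, not the application's own code: the function names and the mode labels ("collection", "target_creation", "collection_or_target") are hypothetical.

```python
from datetime import date
from typing import Optional


def is_connected(snp_distance: int, threshold: int) -> bool:
    """Two strains are 'connected' when their SNP distance is <= threshold."""
    return snp_distance <= threshold


def effective_date(collection_date: Optional[date],
                   target_creation_date: date,
                   mode: str = "collection") -> Optional[date]:
    """Return the date used for a strain under one of the three options.

    The mode labels are hypothetical names for the three choices
    described in the text.
    """
    if mode == "collection":
        # Default option: may be missing (None).
        return collection_date
    if mode == "target_creation":
        # Never missing, but may be long after sampling.
        return target_creation_date
    if mode == "collection_or_target":
        # Collection date, replaced by target creation date when missing.
        if collection_date is not None:
            return collection_date
        return target_creation_date
    raise ValueError(f"unknown mode: {mode}")
```

The same fallback pattern would apply to the location options (collection location, center location, or collection location replaced by the center location when missing).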
On the “Select” tab
There are three ways to select the strains to explore:
Select targets strains from the isolation source
Click on one of the nodes of the isolation source tree. The application will indicate which food is selected and will indicate how many clinical strains are closer than the SNP threshold from the strains from the selected food;
Or, Select targets strains from Table
The NCBI data will be provided in a table. Select the columns to show in addition to the default ones, if needed. Search for the strains of interest either through a global search over all columns (field “Search”) or more specifically in one column (field at the top of each column). Click on the selected strains (Click again to unselect). The application indicates the selected target accession number (“target_acc”) of the selected strains and (top of the page) how many clinical strains are closer than the SNP threshold from the selected strains.
Or, Select targets strains from File
Choose an Excel file on your computer. The file should contain a column of strain identifiers. The file will be uploaded. Once it is uploaded, the application will try to match the variables of your file to the variables of the NCBI database (you can choose to match the data using only one variable with the “Match on which variable” field). The application will provide two tables: the first one shows your file and, for each line of your file, whether there is a matching record in the NCBI database; the second one provides the selected strains from the NCBI database. The application also indicates (top of the page) how many clinical strains are closer than the SNP threshold from the selected strains.
Once you have selected your strains, click on the “Select” button.
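For the file-based selection, the matching step on a single variable can be pictured as a lookup of each line of your file in an index of the NCBI records. The sketch below is an assumption about the general idea, not the application's implementation: the function name is hypothetical, rows are represented as plain dictionaries, and the accession values in the usage example are made up. Only the column name "target_acc" comes from this page.

```python
def match_on_variable(user_rows, ncbi_records, variable):
    """Match user rows to NCBI records on one shared variable.

    Returns two tables, mirroring the description above:
      1. each user row with a flag telling whether a match was found;
      2. the matching NCBI records (the selected strains).
    """
    # Index the NCBI records by the chosen variable (first record wins).
    index = {}
    for record in ncbi_records:
        value = record.get(variable)
        if value is not None:
            index.setdefault(value, record)

    report = [(row, row.get(variable) in index) for row in user_rows]
    selected = [index[row[variable]] for row, found in report if found]
    return report, selected


# Usage with made-up accession values.
ncbi = [{"target_acc": "PDT000000001.1", "serovar": "4b"},
        {"target_acc": "PDT000000002.1", "serovar": "1/2a"}]
mine = [{"target_acc": "PDT000000002.1"}, {"target_acc": "PDT999999999.1"}]
report, selected = match_on_variable(mine, ncbi, "target_acc")
```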
On the “Selection” tab
This tab presents data from the strains you selected.
On the “Pivot Table” subtab
This tab provides a pivot table for data description. Drag and drop the variables
into rows or columns to obtain a cross-tabulation. By clicking on a variable,
you can filter out some of its values. You can choose the statistic
and the type of output you want;
On the “Epicurve” subtab
A dynamic graph illustrates the isolate occurrence over time. You can choose to group the strains by month, year or week (See which date). You can color code a variable. Hovering over the bars shows the characteristics of the strains;
On the “Map” subtab
A map of the location of the strains is provided. Warnings: the dots are placed at random within the limits of the state (United States) or the country (other countries, including Canada). The position of each dot doesn't represent the actual location of sampling. Strains from the US not assigned to a specific State are placed in the blue square. Strains not assigned to a specific country are placed in the red square.
You can choose the size of the dots. Clicking on a dot provides the characteristics of the strains.
You can choose the date of isolation. By clicking on the arrow below the slider, you obtain a dynamic graph (location of the strains as a function of the date of isolation).
On the “Network” tab
On the “Graph” subtab
This graph shows the connected components (CC) associating the selected strains and the clinical strains. Each node is a strain. An edge is drawn only if i) the distance between two strains is less than or equal to the chosen SNP threshold; and ii) it links a selected strain and a clinical strain. Nodes are shown only if they are connected to at least one other strain.
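The two edge rules above can be illustrated with a short Python sketch. It is an assumption about how such a graph could be built, not the application's code: the function name and the representation of distances as (strain, strain, SNP distance) tuples are hypothetical.

```python
from collections import defaultdict


def selected_clinical_components(pairs, selected, clinical, threshold):
    """Connected components of the selected-clinical graph.

    pairs: iterable of (strain_a, strain_b, snp_distance) tuples.
    An edge is kept only if the distance is <= threshold AND it links
    a selected strain to a clinical strain. Strains without any edge
    never enter the adjacency map, so they are not shown.
    """
    adjacency = defaultdict(set)
    for a, b, dist in pairs:
        links_both_groups = ((a in selected and b in clinical)
                             or (b in selected and a in clinical))
        if dist <= threshold and links_both_groups:
            adjacency[a].add(b)
            adjacency[b].add(a)

    components, seen = [], set()
    for start in list(adjacency):
        if start in seen:
            continue
        # Depth-first traversal collecting every strain reachable from start.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return sorted(components, key=len)  # smallest CC first
```

With a threshold of 20, a selected strain at distance 3 from a clinical strain is linked to it, while a pair of clinical strains at distance 1 is not, because that edge does not join a selected strain to a clinical one.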
On the left, you can change the layout of the graph (the default tries to provide a readable layout, but it can be worth testing others), choose whether the color palette for years is qualitative (colors varying from year to year in no particular order) or quantitative (colors ranging from white for older dates to black for recent years), and choose a selection criterion. By default, the selection criterion is the country, but it can be changed to the year, the project center, the serovar, the SNP cluster, the source or the connected component. When you change this selection criterion, you will be able to identify the strains according to it by clicking on the selector “Select by …”.
When many strains are to be drawn, the graph is split into subgraphs of about 1,000 strains each, starting with the smallest connected components. Use the selection box to show other subgraphs, from smallest to largest CCs.
Click on any strain of a CC of interest and click on the “Update Selection Sub Network” Button.
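The splitting of a large graph into subgraphs of about 1,000 strains, smallest connected components first, could work along the following lines. This page does not describe the exact rule, so the greedy packing below is purely an assumption, and the function name is hypothetical.

```python
def batch_components(components, max_strains=1000):
    """Group connected components into batches of roughly max_strains.

    Assumed rule (not confirmed by the source): take components from
    smallest to largest and start a new batch whenever adding the next
    component would exceed max_strains. A component is never split, so
    a single component larger than max_strains forms its own batch.
    """
    batches, current, current_size = [], [], 0
    for component in sorted(components, key=len):
        if current and current_size + len(component) > max_strains:
            batches.append(current)
            current, current_size = [], 0
        current.append(component)
        current_size += len(component)
    if current:
        batches.append(current)
    return batches
```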
On the “Characteristics” subtab
This page provides some information on the CCs illustrated by the graph.
The first table (“Connected-component characteristics”) shows the distribution of the sizes of the CCs shown in the graph.
The second table (“Connected-component characteristics per CC”) shows, for each CC numbered from 1 to n, the number of clinical strains, the number of selected strains and the total number of strains. The identifier of the CC is the same as in the following tables. A CC can be identified on the graph by selecting “CC” in the selection criteria field and using this identifier.
The graph (“Connected-component timeline”) shows, for each CC, the date of isolation (See which date) of the various clinical and selected strains. Note that it is a dynamic graph that you can zoom, save, …
The table “Connected-components” provides the NCBI table for these CCs. A specific CC can be selected from its identifier using the “CC_number” field.
On the “CC as a function of SNP distance” subtab
Choose the maximum threshold you want to test and click on the “create/update” graph button.
A dynamic graph will show box plots of the logarithm (base 10) of the CC size as a function of the SNP threshold. The x-axis is the SNP threshold (from 0 to the chosen limit); the y-axis shows the connected component size (in log base 10).
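The data behind such a plot can be pictured as one list of log10 CC sizes per tested threshold. The sketch below uses a small union-find over all pairs within the threshold; it is a simplified assumption (it ignores the selected/clinical edge rule, for instance), and the function names are hypothetical.

```python
import math


def component_sizes(nodes, pairs, threshold):
    """Sizes of connected components when edges are pairs with dist <= threshold."""
    parent = {n: n for n in nodes}

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, dist in pairs:
        if dist <= threshold:
            parent[find(a)] = find(b)

    sizes = {}
    for n in nodes:
        root = find(n)
        sizes[root] = sizes.get(root, 0) + 1
    # Isolated strains are not components of interest here.
    return sorted(s for s in sizes.values() if s > 1)


def log10_sizes_by_threshold(nodes, pairs, max_threshold):
    """For each threshold 0..max_threshold, the log10 of every CC size."""
    return {t: [math.log10(s) for s in component_sizes(nodes, pairs, t)]
            for t in range(max_threshold + 1)}
```

Each list in the returned dictionary would feed one box of the box plot, with the dictionary key on the x-axis.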
On the “Sub Network” tab
On the “Graph” subtab
The application selects all the non-clinical strains linked to the clinical strains belonging to the previously selected CC and adds them to the CC to construct a sub-network. In this subnetwork all the links with SNP distance less than or equal to the threshold are drawn. Contrary to the previous graph (showing only clinical-selected edges), edges are drawn between clinical, selected, and other strains.
You can change the layout of the graph, choose whether the color palette for years is qualitative (colors varying from year to year in no particular order) or quantitative (colors ranging from white for older dates to black for recent years), and choose a selection criterion. By default, the selection criterion is the country, but it can be changed to the year, the project center, the serovar or the source. When you change this selection criterion, you will be able to identify the strains according to it by clicking on the selector “Select by …”. You can choose to show or hide (clinical – clinical) edges, (non-clinical – clinical) edges and/or (non-clinical – non-clinical) edges.
You can reduce the SNP threshold to cut the subnetwork.
On the “Minimum Spanning Tree” subtab
A minimum spanning tree (MST) is a subgraph that connects all the nodes together, without any cycle and with the minimum possible total link weight (here, the SNP distance).
The same options as in the preceding graph are offered.
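As an illustration of the MST definition above, here is a minimal sketch of Kruskal's algorithm on SNP distances. Which algorithm the application actually uses is not stated on this page, so this is only one standard way to obtain such a tree; the function name and the tuple representation of distances are hypothetical.

```python
def minimum_spanning_tree(nodes, pairs):
    """Kruskal's algorithm: pairs are (strain_a, strain_b, snp_distance).

    Returns the list of edges kept in the tree. If the input graph is
    not connected, the result is a spanning forest (one tree per part).
    """
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    # Consider the shortest links first; keep a link only if it joins
    # two strains that are not yet connected (so no cycle is created).
    for a, b, dist in sorted(pairs, key=lambda p: p[2]):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            tree.append((a, b, dist))
    return tree
```

For three strains with distances A-B = 1, B-C = 2 and A-C = 5, the tree keeps A-B and B-C (total weight 3) and drops A-C, which would close a cycle.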
On the “Tree” subtab
A SNP distance-based tree is provided. You can choose the method of clustering (Complete, Single (close to MST), Ward, Ward (squared), Average (UPGMA), McQuitty (WPGMA), Median (WPGMC) or Centroid (UPGMC)).
On the “Epicurve” subtab
A dynamic graph illustrates the isolate occurrence over time, by isolation source. You can choose to group the strains by month, year or week (See which date). Hovering over the bars shows the characteristics of the strains.
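The grouping behind an epicurve amounts to counting strains per time bin and per isolation source. The sketch below shows one way to do this; it is not the application's code, the function names are hypothetical, and the use of ISO week numbering for the weekly grouping is an assumption.

```python
from collections import Counter
from datetime import date


def bin_label(d: date, unit: str) -> str:
    """Label of the time bin containing d, for unit 'year', 'month' or 'week'."""
    if unit == "year":
        return f"{d.year}"
    if unit == "month":
        return f"{d.year}-{d.month:02d}"
    if unit == "week":
        iso = d.isocalendar()  # (ISO year, ISO week, ISO weekday)
        return f"{iso[0]}-W{iso[1]:02d}"
    raise ValueError(f"unknown unit: {unit}")


def epicurve_counts(records, unit="month"):
    """Count strains per (time bin, isolation source).

    records: iterable of (date, source) tuples; strains with a missing
    date should be filtered out beforehand.
    """
    counts = Counter()
    for d, source in records:
        counts[(bin_label(d, unit), source)] += 1
    return counts
```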
On the “Circular plot” subtab
This dynamic plot visualizes the overall number of links between categories of isolates. Hovering the mouse over a category shows the distribution of links from this category to the other ones.
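The quantity such a plot displays is a count of links for each pair of categories. A minimal sketch of that tally follows; the function name and the category labels used in the usage note are hypothetical.

```python
from collections import Counter


def link_counts(edges, category_of):
    """Count links between categories of isolates.

    edges: iterable of (strain_a, strain_b) pairs already within the
    SNP threshold. category_of: dict mapping a strain to its category.
    Links are undirected, so each category pair is stored in sorted order.
    """
    counts = Counter()
    for a, b in edges:
        pair = tuple(sorted((category_of[a], category_of[b])))
        counts[pair] += 1
    return counts
```

For example, with two food-clinical links and one clinical-clinical link, the result contains `("clinical", "food"): 2` and `("clinical", "clinical"): 1`.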
On the “Sankey” subtab
This dynamic plot visualizes similarly the overall number of links between categories of isolates. You can choose the “source” category of strains you want to see the number of links to.
On the “Map” subtab
A map of the location of the strains is provided (See which location). Warnings: the dots are placed at random within the limits of the state (United States) or the country (other countries, including Canada). The position of each dot doesn't represent the actual location of sampling. Strains from the US not assigned to a specific State are placed in the blue square. Strains not assigned to a specific country are placed in the red square.
You can choose the size of the dots. Clicking on a dot provides the characteristics of the strains.
You can choose the date of isolation (See which date). By clicking on the arrow below the slider, you obtain a dynamic graph (location of the strains as a function of the date of isolation).
On the “Sub Network Characteristics” subtab
A table provides the characteristics of the strains.
On the “Download” subtab
You can download the strain data (metadata) and the SNP distance data from the selected CC.
Questions and comments:
FDAFoodSafetyRiskModel@fda.hhs.gov
Version Beta 2.9: Separated Pages for Listeria and Salmonella (published 2/4/2021).
Version Beta 2.8: Automatic updates of the distribution for Listeria and Salmonella. New tab for strain description.
Version Beta 2.7: Updated Distribution for Listeria (published 4/11/2019).
Version Beta 2.6: Updated Distributions (published 2/25/2019).
Version Beta 2.5: first released version on December 2nd, 2018.