ARGA User-guide and Help

User-guide last updated: 26 March, 2024
User-guide first published: 2 February, 2024

The Australian Reference Genome Atlas (ARGA) indexes a range of genomic and genetic datasets generated from biological species relevant to Australia. The index includes publicly available data from several genetic data repositories and focuses on animals, plants, fungi, algae and protozoans; currently, viruses and bacteria are not included in the ARGA index. New data sources are added regularly, and already included data sources are refreshed via a scheduler. A full list of data sources, with hyperlinks to their original online location, is provided on the Data Sources page; this page also includes metadata about the last date of indexation and number of records included. The Data Sources page can be accessed from the global menu bar at the top of the page in the ARGA app.

ARGA has a defined scope which covers only eukaryote species, and includes those species known to occur in Australia and external territories, as well as additional species which may be of relevance to Australia’s interests in biosecurity, agriculture, commerce and trade, and the wider evolutionary origins for Australia’s unique biodiversity. The taxonomic backbone that supports the ARGA search engine is built using a mixture of primary sources to create a single unified names index. The ARGA names index relies on authoritative sources to provide currently accepted names for species, and is drawn chiefly from the Australian National Species Lists, which include the Australian Plant Census and the Australian Faunal Directory. Additional sources have been incorporated for species not contained within these lists. A full list of sources can be accessed on the Data Sources page. Sources are indicated for each taxonomic name on its relevant page within the ARGA app.

Attributions of data to individual taxonomic species as displayed in the ARGA index are inherited from source datasets. They are not cross-checked by ARGA for the underlying scientific validity of the taxonomic identification, and as such, users of ARGA should be aware that the aggregation of a specific datum within a species page depends on the identification of that material in the source data (i.e. if a sequence is ascribed to an “incorrect” (as in misidentified) taxonomic name in the source data, it will remain in the “wrong” species). Similarly, data for a particular species may not appear in a search if it has been deposited under a name which is no longer in scientific usage. ARGA has built into its name index a list of known synonyms from reputable sources (and as reflected on the National Species Lists and the Atlas of Living Australia), and while every effort has been taken to ensure currency of this list, it should be acknowledged that these lists are built from taxonomic opinion and precedent, and may therefore represent a given taxon under an unexpected name, and also that no list can be exhaustive. Users are encouraged to explore the nomenclatures used by the National Species Lists and Atlas of Living Australia, as well as those provided at the Global Biodiversity Information Facility (GBIF) and Catalogue of Life. If you believe that a name is inaccurate, please reach out to these data providers directly (with peer-reviewed documentation), or you may also contact us, and we will work with you to access the data providers so that they may assist with having any updated names reflected in the various lists and catalogues of species.

ARGA makes no claims about the quality of any of the genomic data that it indexes. ARGA provides users with metadata to assist them to assess the usefulness of any given data for their own specific context. Users are encouraged to explore all of the data for any given taxon and view the metadata to help them in their own assessments of the applicability of a datum for inclusion and reuse in any downstream studies. Attributions for source data are provided with each piece of data included in the index. Users should respect the data licensing agreements as indicated for each datum, and cite all used data accordingly in any derivative works. Users are free to share (under a CC-BY 4.0 licence) any of the aggregated metrics presented by ARGA on the higher taxon summary pages, and are requested to cite ARGA as a source for these metrics. Please refer to the Data sources and citations section below for instructions on citing ARGA and source materials.

ARGA supports searching for species using both common names and scientific names, and at any taxonomic level. The simplest way to commence a search is to type in the name of your species of interest. For example, you could type {koala} to obtain information about the genomic data discoverable for koalas within the ARGA index. If you know the Latin name of your species, you could achieve the same search results by typing {Phascolarctos cinereus} into the search bar and pressing the {Enter} or {Return} key.

Tip: Make sure any scientific names follow the correct capitalisation, e.g. {Aves} or {Thalassarche bulleri}.

The ARGA names index enables searching for animals and protists using any of the following ranks: kingdom, phylum, class, order, family and genus. For plants, algae and fungi, searching has been enabled for the ranks: regnum, division, classis, ordo, familia and genus. A search using a higher taxon will yield results for all of the indexed members of that group, so a search for {Acacia} will return all of the species reports, genomic data and specimen records that include the term {Acacia} somewhere in the data record. Searching using a species-rich group name, like {Acacia} or {Lutjanus}, may yield a large number of results. If you have searched at a high taxonomic level (like phylum or division), you may like to apply additional filters or search terms to limit your query. Alternatively, you may like to build more complex queries, like {genus:Acacia} or {genus:Lutjanus}, to restrict the returned list to the species reports within the ARGA index (i.e. only the aggregated species reports will be returned, not individual data items). See the query syntax below for additional constructions, strategies and methods for limiting searches.

Hint: If you are searching with a species name (or other scientific name) that doesn’t yield any results, check whether it has been described under a different name in an authoritative taxonomic resource.

If you do not know the scientific name of your target species, you can still search using a common name, like {frog} or {kangaroo}, but please be aware that these results will be less specific. For instance, the search {kangaroo} will bring back results for any species with {kangaroo} in the data, such as Amblyomma triguttatum Koch, 1844 (the kangaroo hard-tick) or Requena kangaroo Rentz, 2001 (a grasshopper from Kangaroo Island).

Tip: If you don’t know the scientific name of the species you are looking for, try searching on the ALA, or browsing within one of the higher taxon groups on the ARGA app.

Accession numbers (or equivalent repository-specific sequence identifiers) can be used to search for specific sequences. Repository names, such as {NCBI GenBank} or {BOLD}, can be used as search terms to return data exclusively from specified sources. You can also specify the type of data you are searching for in the search bar to return only genome assemblies {data_type:Genome}, or single loci {data_type:Locus}.

Hint: Use the versioning decimal suffix when searching for INSDC accessions, e.g. {GCA_XXXXXXXXX.2}.
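The versioning hint above can be checked before a search is submitted. The following is a minimal Python sketch, assuming the accession is a genome assembly accession of the form GCA_ or GCF_ followed by nine digits and a version suffix; other repository identifiers follow different patterns and are not covered. The function names are illustrative only and are not part of ARGA.

```python
import re

# Assembly accessions: "GCA_" (INSDC) or "GCF_" (RefSeq), nine digits,
# then a decimal version suffix such as ".1" or ".2".
ASSEMBLY_ACCESSION = re.compile(r"^(GCA|GCF)_\d{9}\.\d+$")


def has_version_suffix(accession: str) -> bool:
    """Return True if the text looks like a versioned assembly accession."""
    return ASSEMBLY_ACCESSION.match(accession.strip()) is not None


def accession_query(accession: str) -> str:
    """Build an accession search term, rejecting unversioned accessions."""
    if not has_version_suffix(accession):
        raise ValueError(f"missing or malformed version suffix: {accession!r}")
    return f"accession:{accession.strip()}"


print(accession_query("GCA_900500725.1"))
```

An accession typed without its suffix, such as GCA_900500725, is rejected by this check so that it can be corrected before searching.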

ARGA provides specimen records as a separate data type, and these can be queried using relevant data attributes, including the institution/institution code and collection in which the specimen was deposited and the recorded collector and identifier. You can search for specimens explicitly {data_type:Specimen}, and for specimens indexed from a specified repository.

Tip: ARGA indexes Australian museum specimen records provided to the Atlas of Living Australia by the Online Zoological Collections of Australian Museums (OZCAM). To search for museum-accessioned specimens, use {data_source:OZCAM}.

ARGA’s search accepts specifications on a range of data attribute fields. These can be defined with the colon operator, according to the syntax {attribute}:{value}. Table I below lists the fields which can be used to formulate queries.

Data attribute field name | Description | Example

Taxonomic attributes
canonical_name | Accepted scientific name, including genus and/or species names | Pteropus alecto
common_names | Common or vernacular names for a species | Black flying-fox
kingdom | Taxonomic ranks (note: this is currently case-sensitive) | Animalia

Genomic data attributes
accession | Accession number assigned to a sequence datum by the source repository | GCA_900500725.1
level | The assembly level for a genome: contig, scaffold, chromosome, complete genome | Scaffold

Specimen data attributes
institution_code | The code for the institution, such as a museum, where a specimen has been deposited. Most institution codes will follow the convention recorded in the Global Registry of Scientific Collections (GRSciColl) | AM
collection_code | The code for a specific collection within an institution | Ichthyology
recorded_by | The group or individual who originally collected the specimen | Sandy Desert Survey
identified_by | The group or individual who taxonomically identified the specimen | John Smith
collected_by | The group or individual who collected the specimen | John Smith

Other data attributes
data_source | Repository or database providing sequence or specimen data | NCBI Genbank
data_type | The type of data as defined in ARGA | Genome

Table I. A list of attributes (data field names) that can be incorporated into searches. Complex and limiting searches can be created using combinations of these attributes with taxonomic names and other search terms, as well as Boolean operators and other query parsers.
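The {attribute}:{value} syntax can be illustrated with a short Python sketch that assembles a single search term. The field names are those listed in Table I, plus genus, which appears in the earlier example {genus:Acacia}. Wrapping multi-word values in double quotes follows general phrase-query syntax and is an assumption here; it has not been confirmed against the ARGA search bar. The function name is hypothetical.

```python
# Field names taken from Table I, plus "genus" from the earlier example.
KNOWN_FIELDS = {
    "canonical_name", "common_names", "kingdom", "genus",
    "accession", "level",
    "institution_code", "collection_code",
    "recorded_by", "identified_by", "collected_by",
    "data_source", "data_type",
}


def field_term(field: str, value: str) -> str:
    """Return an attribute:value search term for a known field."""
    if field not in KNOWN_FIELDS:
        raise ValueError(f"unknown attribute field: {field!r}")
    value = value.strip()
    if not value:
        raise ValueError("value must not be empty")
    # Assumption: multi-word values are quoted as a phrase.
    if " " in value:
        value = f'"{value}"'
    return f"{field}:{value}"


print(field_term("data_type", "Genome"))
print(field_term("canonical_name", "Pteropus alecto"))
```

Rejecting unknown field names early helps catch typing errors such as {datatype:Genome} before a search returns no results.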

Complex searches can be built by combining search terms with standard query syntax. Examples of the most commonly used operators and expressions are provided below. A more comprehensive description of the syntax accepted by the Tantivy query parser used by ARGA can be found in the Tantivy query parser documentation.

Hint: Searches using specified attribute fields are case sensitive. All attribute terms should be written in lowercase, with words separated by underscores, and search value terms should be in proper case (e.g. Macropus, Genome).
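As a rough illustration of the hint above, the Python sketch below normalises attribute names to lowercase with underscores and joins terms with the Boolean operators AND and OR, grouping alternatives in parentheses. These operators are part of the Tantivy query syntax; whether every combination behaves as expected in ARGA is an assumption that should be checked against real searches. All function names are hypothetical.

```python
def normalise_field(name: str) -> str:
    """Write an attribute name in lowercase with underscores."""
    return "_".join(name.strip().lower().split())


def all_of(*terms: str) -> str:
    """Require every term to match (Boolean AND)."""
    if not terms:
        raise ValueError("at least one term is required")
    return " AND ".join(terms)


def any_of(*terms: str) -> str:
    """Accept any of the terms (Boolean OR), grouped in parentheses."""
    if not terms:
        raise ValueError("at least one term is required")
    return "(" + " OR ".join(terms) + ")"


field = normalise_field("Data Type")
query = all_of("genus:Acacia", any_of(f"{field}:Genome", f"{field}:Locus"))
print(query)
```

The value terms (Acacia, Genome, Locus) are left untouched because, as noted in the hint, they are expected in proper case.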

After pushing the {Enter} key (or {Return} key), the ARGA app will proceed to complete a search based on your input terms. The app will then direct you to a results page which lists the results of the search. Figure 1 shows a screenshot of the output in the results view page after searching for the term {dugong}. On any given results view page, there are four types of records which can be displayed: species reports, whole genome assemblies, single genetic loci, and specimens. You can use the filter icon at the top right of the results window to refine the displayed results according to a specific data type (see Figure 2).

Figure 1. Screenshot of the results view as presented on the ARGA app interface. The search was created using the search term {dugong}.

The species report listing includes information about taxonomic classification and an overview of the number of genome assemblies and other genetic data records indexed for that species. Clicking the view button will take you to the complete species report, where you can explore the available data in more detail. The genome and locus listings include key metadata and the source repository from which the data were indexed. The view button opens the full data view page, where more metadata and provenance information are available. Specimen listings provide information about physical specimens which have been formally accessioned into registered collections and/or sampled to produce a genomic datum. You can go to the complete specimen view page to find out more about a specimen and any associated genomic data by clicking the view button.

Figure 2. Screenshot of the results view, filtered for data type, as presented on the ARGA app interface. The search was created using the search term {dugong} and then selecting the filter for {Genomes} in the Data types menu.

The primary aggregator used by ARGA to index genomics data, other genetic data and specimen data is the scientific species name. All species included in ARGA have a species report page, regardless of whether any genomic data are associated with that record. Species reports can be accessed from search results or browse pages. Different data types are presented in different tabs.

The header on the species report page includes the scientific name of the species, followed by icons indicating a higher taxonomic grouping (such as mammals or birds) and/or membership of a curated functional or ecological species group. You can hover over these icons for more information, or click on them to be directed to the higher taxon report for that grouping. At the right of the header block, you can see at a glance whether a reference genome is available for that species (i.e. whether it has a genome assembly on NCBI RefSeq), as indicated by a green tick or red cross.

The genome assemblies tab (Figure 3) lists the nuclear genomes indexed for a given species. If a reference genome is available from NCBI RefSeq, this is designated at the top of the page as the representative genome, and the assembly statistics are provided. Other genome assemblies are listed below; these can be filtered according to assembly level, genome representation, and release type by using the filter icon at the top right. Where applicable, locations are plotted on the map according to the collection location of the specimen or sample from which the assembly data were generated. Clicking on the dark blue view icon beside a listing will bring you to the full data view page for that accession.

Figure 3. The genome assemblies tab, as presented on the ARGA app interface, which aggregates whole genome data records for specimens attributed to Melopsittacus undulatus.

The full data view page for each of the indexed genomic assemblies (Figure 4) contains a selection of metadata indexed from the source repository. These are provided in the blocks titled genome details, assembly statistics and data provenance. The data availability block provides a rapid-glance overview of the types of metadata available for a datum within the ARGA index; green ticks indicate data presence and red crosses indicate data absence.

Figure 4. Screenshot of the genome assemblies: full data view within the genome assemblies tab, as presented on the ARGA app interface, which shows the detailed listing and metadata for the reference sequence (NCBI GenBank Accession number GCF_018873765.1) for Telopea speciosissima, the Sydney waratah.

Within the original data box inside the genome details container, there are buttons to assist users to access original data. The first of these is the add to list button. This button saves a link to the datum for later access. Saved data can be viewed at any time by clicking on the white list icon at the top right of the global header (Figure 5). The number of items in the list is designated in a small orange circle beside the list icon (Figure 5). The links provided in the list will send you directly to the source repository to access the data. The get data button opens a new tab where you can download the sequence datum directly, whereas the go to source button directs you to the corresponding page for that accession in the source repository (where available). The send to Galaxy button is designed to send data directly to the online analysis platform, Galaxy Australia; this feature is in development and will be available in future ARGA implementations.

Figure 5. Screenshots of the list icon from the global header bar on the ARGA app interface. The first image (left) shows the icon as it appears for an empty list; the second image (right) shows the number of items saved to list within an orange circle. Clicking on the list icon opens the saved list.

You can find more information about a genomic datum, including additional metadata relating to the collection of a specimen or sample, as well as its sampling, sequencing, and assembly, by clicking on the go to specimen button in the specimen information block. This button directs to the complete specimen view, which provides details about the events between the collection of a specimen and the deposition of a sequence datum and its reuse.

The genomic components tab (Figure 6) lists raw component data which are used by bioinformaticians in the assembly of whole genomes. The tab also lists other data types which are not captured on the genome assemblies and single loci tabs. Data found here may include raw reads, targeted enrichment sequence reads and reduced representation sequence reads (e.g. DArT data). The go to source button alongside each record directs to the relevant page at the source repository, where you can access more information about the data.

Figure 6. Screenshot of the genomic components tab, as presented on the ARGA app interface. This tab lists a variety of datasets that are components which can be used to assemble whole genomes.

Locus and marker sequences are listed under the single loci tab (Figure 7) for a species, along with the accession number (or alternative identifier), the locus (or gene) name, the sequence length, and the source molecule (e.g. DNA, RNA). Data are plotted on the map according to any geolocation co-ordinates provided for the collection of a specimen or sample. More details for each datum are available by clicking the corresponding ‘view’ button, which will take you to the single locus data view page.

Figure 7. Screenshot of the single loci tab, as presented on the ARGA app interface. This tab lists genetic locus and marker sequences for a given taxon.

The full data view page for a single locus (Figure 8) contains a selection of metadata indexed from the source repository for single genetic locus data records. These metadata are organised into containers for molecule data, data provenance, and amplification methods. An overview of the total metadata available for a datum is provided in the data availability block, as indicated for rapid glancing by green ticks and red crosses.

Within the molecule data box, there are buttons to enable access to the raw data. The get FASTA button navigates to an external site for direct download of the sequence data. The go to source button directs to the corresponding page for that accession in the source repository. There is another button, send to Galaxy, to send data directly to the online analysis platform, Galaxy Australia; please note that this feature is in development and is not currently available. The page features a trace data viewer block which can be used to visualise the sequence electropherogram extracted from the relevant ABI file, if this is available. Where multiple files are attached to a single sequence, multiple electropherograms are shown and can be compared vertically.

You can find more information about a sequence datum, including additional metadata relating to the collection of a specimen or sample, as well as its sampling, sequencing, and assembly, by clicking on the go to specimen button in the specimen information block. This will take you to the complete specimen view page, with details describing the events between the collection of a specimen and the deposition of a sequence datum and its reuse.

Figure 8. Screenshot of the single locus full data view tab, as shown on the ARGA app interface.

The data distribution tab (Figure 9) features a map for visualising the geographic distribution of data indexed in ARGA for a given species. Points are plotted according to the coordinates given for the collection location of a specimen or field sample. You can click on any of these points to find a link to the relevant specimen or datum page, where further details are available. Some data have collection locations outside of Australia, and you may need to zoom in (or out) to navigate around the map to find them.

You can adjust the data types (whole genomes, loci, other data, and specimens) displayed on the map using the toggles on the right. The numbers beside each data type indicate the number of data points recorded with geo-coordinates over the total number of data points available for that data type within the ARGA index. Data without collection location geo co-ordinates are not displayed on the map.

Data are displayed as per the coordinates provided in the indexed metadata. However, it is important to note that coordinates may have been intentionally generalised (e.g. for sensitive species), or, in some instances, may have been incorrectly reported.

Overland regions highlighted in yellow and aquatic regions highlighted in blue designate the known range for that species in Australia, as sourced from the Australian Faunal Directory. These regions are defined according to the interim biogeographic regionalisation of Australia (IBRA) and integrated marine and coastal regionalisation of Australia (IMCRA), respectively. The relevant region names are listed under the Distribution heading.

Figure 9. Screenshot of the data distribution tab from the ARGA app interface. Users can use the toggles in the side panel to turn off (and on) location data for different data types. All data types are shown by default. The map also indicates distribution data (based on IBRA and IMCRA regions), where this is available from authoritative sources.

Records of physical specimens for a given species are listed under the specimens tab (Figure 10). Specimen data are sourced both from sequence data repositories and from records provided to the Atlas of Living Australia by registered collections, including museums and herbaria. Where programmatically feasible, data are aggregated according to specimen registration number, so that a specimen record pulls in any collection or accession information described in collection records as well as any sequencing information which has been attributed to that specimen. For genomic data indexed without a specimen registration number, a sample identifier (such as a biosample number) is used. For data without a sample identifier, a nominal registration number is generated by ARGA, unique to that datum.
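The fallback order described above (specimen registration number, then sample identifier, then a generated nominal number) can be sketched in Python as follows. This is an illustration of the logic only, not ARGA's implementation: the record fields, the function names and the format of the generated nominal number are all hypothetical.

```python
import uuid
from collections import defaultdict
from typing import Dict, List, Optional


def aggregation_key(registration_number: Optional[str],
                    sample_id: Optional[str]) -> str:
    """Pick the identifier under which a datum is aggregated."""
    if registration_number:
        return registration_number
    if sample_id:
        return sample_id
    # Hypothetical format: a fresh value, unique to this datum.
    return f"nominal-{uuid.uuid4()}"


def group_by_specimen(records: List[dict]) -> Dict[str, List[dict]]:
    """Group data records that share the same aggregation key."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for record in records:
        key = aggregation_key(record.get("registration_number"),
                              record.get("sample_id"))
        groups[key].append(record)
    return dict(groups)


records = [
    {"registration_number": "REG-1", "sample_id": "SAMPLE-9", "kind": "genome"},
    {"registration_number": "REG-1", "sample_id": None, "kind": "locus"},
    {"registration_number": None, "sample_id": "SAMPLE-7", "kind": "locus"},
    {"registration_number": None, "sample_id": None, "kind": "locus"},
]
groups = group_by_specimen(records)
print(len(groups))
```

In this sketch the two records sharing REG-1 fall into one group, while the record with no identifiers receives its own generated key.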

In addition to the registration number, specimen records are listed alongside collection location information, type status (designating whether a specimen is considered a holotype, paratype, or other form of type), and whether derived genomic data are available. For records with geo co-ordinate data, specimen collection locations are shown on the map. Clicking the view button will direct you to the complete specimen view for that record.

Figure 10. Screenshot of the specimens tab from the ARGA app interface, showing the listing of specimens attributed to a species, and which have been indexed by ARGA.

The complete specimen view page (Figure 11) presents information relating to each event between the collection of a specimen and deposition of a genomic datum and its reuse. Data on this page may have been sourced from metadata attached to indexed genomic data and/or from records supplied by formal collections. The data source is provided separately for each event. The fields listed under each event have been chosen to accommodate a wide range of information and detailed processes. However, sequence data are rarely deposited with a full complement of relevant metadata. As a result, many of the specimens listed in ARGA will have little data available, and some event stages will have no data at all. 

Sometimes, multiple locus sequences and/or assemblies will have been derived from a single specimen. In these cases, there may be more than one event recorded at each event stage (e.g. amplification and sequencing event). You can navigate between these different events by clicking on the arrows at the bottom left and bottom right of that event block.

Figure 11. Screenshot of a complete specimen view for an individual specimen within the ARGA index, as shown on the ARGA interface. Data provenance can be revealed by following the chain of nested events from initial specimen collection through data generation to final data deposition.

A full taxonomic classification of a species is provided on the taxonomy tab (Figure 12). This view includes the scientific name, the taxonomic status of that name, and common names. There is a higher classification block providing details of the placement of the species within each level of a taxonomic hierarchy; the source for the classification shown is indicated. Clicking on the buttons for any of the higher taxa directs to the higher taxon summary and browse page for that specific higher taxon. 

It should be noted that different repositories and databases may employ different classification structures (in terms of both ranks and attributions of groups within those ranks), and that individual genomic data may not always be updated to account for taxonomic revisions or re-identifications. ARGA indexes data according to the taxonomic names defined in the source repository. These names are matched to an ARGA names index, which is derived largely from the taxonomy as defined in the Australian National Species List and Australian Faunal Directory, and as reflected in the taxonomy of the Atlas of Living Australia; the ARGA names index is supplemented with additional species external to Australia (such as those with biosecurity implications) and other species-like taxonomic units (like molecular operational taxonomic units (= MOTUs)). For each taxon, the source used to assign taxonomy is provided on the taxonomy tab, and can be accessed by clicking on the hyperlink.

Figure 12. Screenshot of a typical taxonomy view tab, as presented on the ARGA interface.

Users can scroll through the data indexed on ARGA by clicking on the browse icons on the homepage. Browsing categories are divided into data types, taxonomic groups, and functional or ecological groups. Clicking on any one of the icons on the app home page launches the browse page for that chosen group.

The browse page feature creates a card for every species, including an image (if available), the scientific name, and an indication of whether one or more genome or locus sequences have been indexed for that species. The presence of data for the species is designated by a green tick, and the absence of data in the index is shown by a red cross. Clicking on a card opens the species report page. Users can return to browsing the set of cards by using the back arrow, which is located on the species report above the taxonomic name.

On any browse page, the selection of species displayed can be adjusted by clicking on the filter icon at the top right, and turning on and off various filter combinations which are provided under the filter submenus. Please note that applying the filters does not adjust the statistics in the data summary provided on the top half of the page (currently implemented for taxonomic groupings only).

The first set of icons on the app home page enables browsing by the type of data contained in the index. Users can browse sets of species for which there are genome assemblies, single genetic loci and registered specimen records (see Figure 13). The number of data records is shown below each data type. Each set of species can be browsed by clicking on the selected icon.

Figure 13. Screenshot of the browsing icons for each available data type, as presented on the ARGA app interface.

The ARGA interface presents the opportunity for browsing species by higher-level taxonomic groups (e.g. plants, animals, or fungi) to see overviews of the data in the index, and for scrolling through the species contained within that higher taxon. Higher taxon reports (Figure 14) can be accessed from the ARGA app home page by clicking on the icon relevant to the group of interest. They can also be reached by clicking on a taxon of interest from within the taxonomy tab of any species report.

Hint: You can also jump to data summary pages directly by typing in the URL using the formulation: {{} + {rank/} + {Taxon}} (making sure to capitalise the taxon name). For example, to go straight to the summary page for beetles (Order Coleoptera), type in {}.
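The formulation in the hint above (base address, then rank, then capitalised taxon name) can be sketched in Python as shown below. The base address is not given in the text, so https://example.org is used purely as a placeholder, and the function name is hypothetical; substitute the address of the ARGA app when using this pattern.

```python
from urllib.parse import quote

# Placeholder only: the real base address of the ARGA app is not given here.
PLACEHOLDER_BASE = "https://example.org"


def summary_url(base_url: str, rank: str, taxon: str) -> str:
    """Join base address, rank and taxon name into a summary page address."""
    taxon = taxon.strip()
    # Capitalise the first letter of the taxon name, as the hint requires.
    taxon = taxon[:1].upper() + taxon[1:]
    return f"{base_url.rstrip('/')}/{rank.strip()}/{quote(taxon)}"


print(summary_url(PLACEHOLDER_BASE, "order", "coleoptera"))
```

The rank is passed through unchanged because the text does not state how it should be capitalised.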

The data summary block provides graphs and statistics describing the availability of indexed genomic data for the species within a designated taxonomic rank. The total number of species included in the taxon is given in the taxonomic breakdown block to the right. These species are listed on the browse cards below. Rank-specific data summaries are available for all eukaryotic taxa at the major Linnaean ranks. From the data summary page for a given taxon, you can navigate to summaries for higher ranks in the classification hierarchy by clicking on the scientific names in the higher classification block above the graphs.

Figure 14. Screenshot of a higher taxon report as it appears on the ARGA interface. The higher taxon report provides summary statistics about the percentage of species within the group with genomes, and with any genetic data (as indexed on ARGA). The summary statistics also provide a breakdown of the relative data availability for subgroups within that grouping (e.g. percentage coverage by classes within a phylum). Users can also browse through all of the species in that group using the cards in the bottom container.

The functional and ecological groups each contain lists of species curated by ARGA to reflect a particular theme (e.g. poisonous and venomous species, or species vulnerable to bushfires). These species lists have been built from authoritative sources and are fully referenced on the Data Sources page.

ARGA indexes and uses data from multiple online sources, which are listed on the Data Sources page. Users can navigate to this page by clicking on the tab marked Data sources on the global header at the top of the ARGA app. Data sources are provided with hyperlinks and, where relevant, citations. If users use any data found on ARGA in any published work (online or in print), they must reference the primary source work, including the author and/or data provider and literature citations (where applicable).

Please cite ARGA by including the following statement when publishing your research:

This research was supported by the use of the Australian Reference Genome Atlas (ARGA), an NCRIS-enabled platform powered by the Atlas of Living Australia (ALA), in collaboration with Bioplatforms Australia and the Australian BioCommons, and supported by the Australian Research Data Commons (ARDC).
