ARGA User-guide and Help | Australian Reference Genome Atlas

User-guide last updated: 16 January, 2025
User-guide first published: 2 February, 2024

About the ARGA index

The Australian Reference Genome Atlas (ARGA) indexes a range of genomic and genetic datasets generated from biological species relevant to Australia. The index includes publicly available data from several genetic data repositories and focuses on animals, plants, fungi, algae and protozoans; currently, viruses and bacteria are not included in the ARGA index. New data sources are added regularly, and already included data sources are refreshed via a scheduler. A full list of data sources, with hyperlinks to their original online location, is provided on the Data Sources page; this page also includes metadata about the last date of indexation and number of records included. The Data Sources page can be accessed from the global menu bar at the top of the page in the ARGA app.

ARGA has a defined scope which covers only eukaryote species, and includes those species known to occur in Australia and external territories, as well as additional species which may be of relevance to Australia’s interests in biosecurity, agriculture, commerce and trade, and the wider evolutionary origins for Australia’s unique biodiversity. The taxonomic backbone that supports the ARGA search engine is built using a mixture of primary sources to create a single unified names index. The ARGA names index relies on authoritative sources to provide currently accepted names for species, and is drawn chiefly from the Australian National Species Lists, which include the Australian Plant Census and the Australian Faunal Directory. Additional sources have been incorporated for species not contained within these lists. A full list of sources can be accessed on the Data Sources page. Sources are indicated for each taxonomic name on its relevant page within the ARGA app.

Important note about data usage and limitations

Attributions of data to individual taxonomic species as displayed in the ARGA index are inherited from source datasets. They are not cross-checked by ARGA for the underlying scientific validity of the taxonomic identification, and as such, users of ARGA should be aware that the aggregation of a specific datum within a species page depends on the identification of that material in the source data (i.e. if a sequence is ascribed to an “incorrect” (as in misidentified) taxonomic name in the source data, it will remain in the “wrong” species). Similarly, data for a particular species may not appear in a search if it has been deposited under a name which is no longer in scientific usage. ARGA has built into its name index a list of known synonyms from reputable sources (and as reflected on the National Species Lists and the Atlas of Living Australia), and while every effort has been taken to ensure currency of this list, it should be acknowledged that these lists are built from taxonomic opinion and precedent, and may therefore represent a given taxon under an unexpected name, and also that no list can be exhaustive. Users are encouraged to explore the nomenclatures used by the National Species Lists and Atlas of Living Australia, as well as those provided at the Global Biodiversity Information Facility (GBIF) and Catalogue of Life. If you believe that a name is inaccurate, please reach out to these data providers directly (with peer-reviewed documentation), or you may also contact us, and we will work with you to access the data providers so that they may assist with having any updated names reflected in the various lists and catalogues of species.

ARGA makes no claims about the quality of any of the genomic data that it indexes. ARGA provides users with metadata to assist them to assess the usefulness of any given data for their own specific context. Users are encouraged to explore all of the data for any given taxon and view the metadata to help them in their own assessments of the applicability of a datum for inclusion and reuse in any downstream studies. Attributions for source data are provided with each piece of data included in the index. Users should respect the data licensing agreements as indicated for each datum, and cite all used data accordingly in any derivative works. Users are free to share (under a CC-BY 4.0 licence) any of the aggregated metrics presented by ARGA on the higher taxon summary pages, and are requested to cite ARGA as a source for these metrics. Please refer to the Data sources and citations section below for instructions on citing ARGA and source materials.

Searching in ARGA

Searches can be made from the search bar on the homepage. Search terms can relate to taxonomic, genomic, or specimen data attributes. You can combine multiple search terms to craft more specific searches. See query syntax for more information on how to build a search around defined parameters.

Searching by scientific or common names for species and higher groups

ARGA supports searching for species using both common names and scientific names, and at any taxonomic level. The simplest way to commence a search is to type in the name of your species of interest. For example, you could type {koala} to obtain information about the genomic data discoverable for koalas within the ARGA index. If you know the Latin name of your species, you could achieve the same search results by typing {Phascolarctos cinereus} into the search bar and pressing the {Enter} or {Return} key.

Tip: Make sure any scientific names follow the correct capitalisation, e.g. {Aves} or {Thalassarche bulleri}.

The ARGA names index enables searching for animals and protists using any of the following ranks: kingdom, phylum, class, order, family and genus. For plants, algae and fungi, searching has been enabled for the ranks: regnum, division, classis, ordo, familia and genus. A search using a higher taxon will yield results for all of the indexed members of that group, so a search for {Acacia} will return all of the species reports, genomic data and specimen records that include the term {Acacia} somewhere in the data record. Searching using a species-rich group name, like {Acacia} or {Lutjanus}, may yield a large number of results. If you have searched at a high taxonomic level (like phylum or division), you may like to apply additional filters or search terms to limit your query. Alternatively, you may like to build more complex queries, like {genus:Acacia} or {genus:Lutjanus}, which restrict the results to the species reports within the ARGA index (i.e. only the aggregated species reports will be returned, not individual data items). See query syntax below for additional constructions and strategies for limiting a search.
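The two sets of rank names described above can be kept in a small lookup table. The following Python sketch is illustrative only and is not part of ARGA: the helper names `rank_field` and `rank_query` and the `botanical` flag are hypothetical, and the mapping assumes that the two rank lists correspond one-to-one in the order given (using the spelling `classis` from Table I).

```python
# Zoological-style rank names (animals, protists) mapped to the
# botanical-style names (plants, algae, fungi) listed in this guide.
# Assumption: the two lists correspond one-to-one in the order given.
ZOOLOGICAL_TO_BOTANICAL = {
    "kingdom": "regnum",
    "phylum": "division",
    "class": "classis",
    "order": "ordo",
    "family": "familia",
    "genus": "genus",
}


def rank_field(rank: str, botanical: bool = False) -> str:
    """Return the search field name for a rank (hypothetical helper)."""
    key = rank.lower()
    if key not in ZOOLOGICAL_TO_BOTANICAL:
        raise ValueError(f"unsupported rank: {rank!r}")
    return ZOOLOGICAL_TO_BOTANICAL[key] if botanical else key


def rank_query(rank: str, taxon: str, botanical: bool = False) -> str:
    """Build an attribute:value search term such as genus:Acacia."""
    return f"{rank_field(rank, botanical)}:{taxon}"
```

For instance, `rank_query("family", "Casuarinaceae", botanical=True)` produces the term {familia:Casuarinaceae}, which could then be pasted into the search bar.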

Hint: If you are searching with a species name (or other scientific name) that doesn’t yield any results, check whether it has been described under a different name in an authoritative taxonomic resource.

If you do not know the scientific name of your target species, you can still search using a common name, like {frog} or {kangaroo}, but please be aware that these results will be less specific. For instance, the search {kangaroo} will bring back results for any species with {kangaroo} in the data, such as Amblyomma triguttatum Koch, 1844 (the kangaroo hard-tick) or Requena kangaroo Rentz, 2001 (a grasshopper from Kangaroo Island).

Tip: If you don’t know the scientific name of the species you are looking for, try searching on the ALA, or browsing within one of the higher taxon groups on the ARGA app.

Searching by genomic data identifier

Accession numbers (or equivalent repository-specific sequence identifiers) can be used to search for specific sequences. Repository names, such as {NCBI GenBank} or {BOLD}, can be used as search terms to return data exclusively from specified sources. You can also specify the type of data you are searching for in the search bar to return only genome assemblies {data_type:Genome}, or single loci {data_type:Locus}.

Hint: Use the versioning decimal suffix when searching for INSDC accessions, e.g. {GCA_XXXXXXXXX.2}.
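If you are preparing a list of accessions to search, it can help to check for the version suffix beforehand. The sketch below is one way to do this in Python; it is not an ARGA tool. The function name `has_version_suffix` is hypothetical, and the pattern assumes assembly accessions of the form `GCA_` or `GCF_` followed by nine digits, as in the example `GCA_900500725.1` from Table I. Other accession formats are not covered.

```python
import re

# Assumption: assembly accessions look like GCA_/GCF_ plus nine digits,
# optionally followed by ".<version>", as in GCA_900500725.1.
ASSEMBLY_ACCESSION = re.compile(r"^(GC[AF]_\d{9})(?:\.(\d+))?$")


def has_version_suffix(accession: str) -> bool:
    """Return True only if the accession matches and carries a version."""
    match = ASSEMBLY_ACCESSION.match(accession.strip())
    return bool(match and match.group(2))
```

Here `has_version_suffix("GCA_900500725.1")` is true, while the same accession without `.1` is flagged as missing its version.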

Searching with specimen data attributes

ARGA provides specimen records as a separate data type, and these can be queried using relevant data attributes, including the institution/institution code and collection in which the specimen was deposited and the recorded collector and identifier. You can search for specimens explicitly {data_type:Specimen}, and for specimens indexed from a specified repository.

Tip: ARGA indexes Australian museum specimen records provided to the Atlas of Living Australia by the Online Zoological Collections of Australian Museums (OZCAM). To search for museum-accessioned specimens, use {data_source:OZCAM}.

Query syntax

ARGA’s search accepts specifications on a range of data attribute fields. These can be defined with the colon operator, according to the syntax {attribute}:{value}. Table I below lists the fields which can be used to formulate queries.

Data attribute field name	Description	Example
Taxonomic attributes
canonical_name	Accepted scientific name, including genus and/or species names	`Pteropus alecto`
common_names	Common or vernacular names for a species	`Black flying-fox`
kingdom	Taxonomic ranks (note: this is currently case-sensitive)	`Animalia`
phylum		`Chordata`
class		`Mammalia`
order		`Chiroptera`
family		`Pteropodidae`
genus		`Pteropus`
regnum		`Plantae`
division		`Charophyta`
classis		`Equisetopsida`
ordo		`Fagales`
familia		`Casuarinaceae`
Genomic data attributes
accession	Accession number assigned to a sequence datum by the source repository	`GCA_900500725.1`
level	The assembly level for a genome: contig, scaffold, chromosome, complete genome	`Scaffold`
Specimen data attributes
institution_code	The code for the institution, such as a museum, where a specimen has been deposited. Most institution codes will follow the convention recorded in the Global Registry of Scientific Collections (GRSciColl)	`AM`
collection_code	The code for a specific collection within an institution	`Ichthyology`
recorded_by	The group or individual who originally collected the specimen	`Sandy Desert Survey`
identified_by	The group or individual who taxonomically identified the specimen	`John Smith`
collected_by	The group or individual who collected the specimen	`John Smith`
Other data attributes
data_source	Repository or database providing sequence or specimen data	`NCBI Genbank`
data_type	The type of data as defined in ARGA	`Genome`

Table I. A list of attributes (data field names) that can be incorporated into searches. Complex and limiting searches can be created using combinations of these attributes with taxonomic names and other search terms, as well as Boolean operators and other query parsers.

Complex searches can be built by combining search terms with standard query syntax. Examples of the most commonly used operators and expressions are provided below. A more comprehensive description of the syntax accepted by the Tantivy query parser used by ARGA can be found here.

{John Smith} will return results where John Smith’s whole name remains intact as John Smith; searching for {J Smith} will broaden the results, returning results containing the terms {Smith} and {J}, e.g. “J Smith”, “James Smith”, “Smith, J.” and “Katie Smith and John Silvs”.
{Macropus} will return any result containing the term Macropus, but {genus:Macropus} will return a result set of only species summary reports for species in the genus Macropus.
{family:Macropodidae -Macropus} will return species summary reports for all macropod species, excluding any reports for species classified within Macropus (i.e. all non-Macropus macropod species).
{Macropus AND institution_code:AM} will return only specimens from AM (Australian Museum) with Macropus in the scientific name.

Hint: Searches using specified attribute fields are case sensitive. All attribute terms should be written in lowercase, with words separated by underscores, and search value terms should be in proper case (e.g. Macropus, Genome).
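The query patterns above can also be assembled programmatically before being pasted into the search bar. The following Python sketch is illustrative and not part of ARGA: the helper names `field_term`, `all_of` and `exclude` are hypothetical, and the set of field names is simply copied from Table I. It only builds query strings; it does not attempt to quote or escape multi-word values, which are not covered here.

```python
# Field names copied from Table I of this guide (all lowercase).
KNOWN_FIELDS = {
    "canonical_name", "common_names",
    "kingdom", "phylum", "class", "order", "family", "genus",
    "regnum", "division", "classis", "ordo", "familia",
    "accession", "level",
    "institution_code", "collection_code", "recorded_by",
    "identified_by", "collected_by",
    "data_source", "data_type",
}


def field_term(field: str, value: str) -> str:
    """Return a single attribute:value term, e.g. genus:Macropus."""
    if field not in KNOWN_FIELDS:
        raise ValueError(f"unknown field (use lowercase names from Table I): {field!r}")
    return f"{field}:{value}"


def all_of(*terms: str) -> str:
    """Join terms with AND, as in 'Macropus AND institution_code:AM'."""
    return " AND ".join(terms)


def exclude(query: str, term: str) -> str:
    """Append an excluded term with a leading minus sign."""
    return f"{query} -{term}"
```

For example, `exclude(field_term("family", "Macropodidae"), "Macropus")` reproduces the query {family:Macropodidae -Macropus} shown above, and passing `"Genus"` instead of `"genus"` raises an error, mirroring the case sensitivity of attribute names.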

Search results

After pushing the {Enter} key (or {Return} key), the ARGA app will proceed to complete a search based on your input terms. The app will then direct you to a results page which lists the results of the search. Figure 1 shows a screenshot of the output in the results view page after searching for the term {dugong}. On any given results view page, there are four types of records which can be displayed: species reports, whole genome assemblies, single genetic loci, and specimens. You can use the filter icon at the top right of the results window to refine the displayed results according to a specific data type (see Figure 2).

**Figure 1.** Screenshot of the results view as presented on the ARGA app interface. The search was created using the search term {`dugong`}.

The species report listing includes information about taxonomic classification and an overview of the number of genome assemblies and other genetic data records indexed for that species. Clicking the view button will take you to the complete species report, where you can explore the available data in more detail. The genome and locus listings include key metadata and the source repository from which the data were indexed. The view button opens the full data view page, where more metadata and provenance information are available. Specimen listings provide information about physical specimens which have been formally accessioned into registered collections and/or sampled to produce a genomic datum. You can go to the complete specimen view page to find out more about a specimen and any associated genomic data by clicking the view button.

**Figure 2.** Screenshot of the results view, filtered for data type, as presented on the ARGA app interface. The search was created using the search term {`dugong`} and then selecting the filter for {`Genomes`} in the **Data types** menu.

Species reports

The primary aggregator used by ARGA to index genomics data, other genetic data and specimen data is the scientific species name. All species included in ARGA have a species report page, regardless of whether any genomic data are associated with that record. Species reports can be accessed from search results or browse pages. Different data types are presented in different tabs.

The header on the species report page includes the scientific name of the species, followed by icons indicating a higher taxonomic grouping (such as mammals or birds) and/or membership of a curated functional or ecological species group. You can hover over these icons for more information, or click on them to be directed to the higher taxon report for that grouping. At the right of the header block, you can see at a glance whether a reference genome is available for that species (on NCBI RefSeq), as indicated by a green tick or red cross.

Summary

The summary tab (Figure 3) provides an overview of the total data indexed for a species in the data summary block. This includes the total numbers of records found for a range of genomic and non-genomic data types. At present, ARGA is only indexing genomic- and specimen-related data types; protocols and publications are not yet available. The external resources block provides links to resources which can provide further information for the species, such as the Atlas of Living Australia (ALA). The higher classification describes the species’ taxonomy according to the source listed in the top right-hand corner of the box. Clicking on the buttons for any of the higher taxa directs to the higher taxon summary and browse page for that specific higher taxon.

**Figure 3.** The **summary** tab of the *Podargus strigoides* species report, including total counts of data records indexed for the species.

Data distribution

The data distribution tab (Figure 4) features a map for visualising the geographic distribution of data indexed in ARGA for a given species. Points are plotted according to the coordinates given for the collection location of a specimen or field sample. You can click on any of these points to find a link to the relevant specimen or datum page, where further details are available. Some data have collection locations outside of Australia, and you may need to zoom in (or out) to navigate around the map to find them.

You can adjust the data types (whole genomes, loci, genomic components, and specimens) displayed on the map using the toggles on the right. The numbers beside each data type indicate the number of data points recorded with geo-coordinates over the total number of data points available for that data type within the ARGA index. Data without collection location geo co-ordinates are not displayed on the map.

Data are displayed as per the coordinates provided in the indexed metadata. However, it is important to note that coordinates may have been intentionally generalised (e.g. for sensitive species), or, in some instances, may have been incorrectly reported.

Overland regions highlighted in yellow and aquatic regions highlighted in blue designate the known range for that species in Australia, as sourced from the Australian Faunal Directory. These regions are defined according to the interim biogeographic regionalisation of Australia (IBRA) and integrated marine and coastal regionalisation of Australia (IMCRA), respectively. The relevant region names are listed under the Distribution heading.

**Figure 4.** Screenshot of the **data distribution** tab from the ARGA app interface. Users can use the toggles in the side panel to turn off (and on) location data for different data types. All data types are shown by default. The map also indicates distribution data (based on **IBRA** and IMCRA regions), where this is available from authoritative sources.

Genome assemblies

The genome assemblies tab (Figure 5) lists the nuclear genomes indexed for a given species. If a reference genome is available from NCBI RefSeq, this is designated at the top of the page as the representative genome, and the assembly statistics are provided. Other genome assemblies are listed below; these can be filtered according to assembly level, genome representation, and release type by using the filter icon at the top right. Where applicable, locations are plotted on the map according to the collection location of the specimen or sample from which the assembly data was generated. Clicking on the dark blue view icon beside a listing will bring you to the full data view page for that accession.

**Figure 5.** The **genome assemblies** tab, as presented on the ARGA app interface, which aggregates whole genome data records for specimens attributed to *Melopsittacus undulatus*.

Genome assemblies: full data view

The full data view page for each of the indexed genomic assemblies (Figure 6) contains a selection of metadata indexed from the source repository. These are provided in the blocks titled genome details, assembly statistics and data provenance. The data availability block provides a rapid-glance overview of the types of metadata available for a datum within the ARGA index; green ticks indicate data presence and red crosses indicate data absence.

**Figure 6.** Screenshot of the **genome assemblies: full data view** within the **genome assemblies** tab, as presented on the ARGA app interface, which shows the detailed listing and metadata for the reference sequence (NCBI GenBank accession number GCF_018873765.1) for *Telopea speciosissima*, the Sydney waratah.

Within the original data box inside the genome details container, there are buttons to assist users to access original data. The first of these is the add to list button. This button saves the record to your saved data list (accessible at the top right of the global header; Figure 7), from where additional download options are available. The get data button directs you to the source FTP server or a page of the source repository from where you can download the sequence datum directly. The go to source button directs you to the corresponding page for that accession in the source repository (where available). The send to Galaxy button is designed to send data directly to the online analysis platform, Galaxy Australia; this feature is in development and will be available in future ARGA implementations.

Figure 7. Screenshots of the list icon from the global header bar on the ARGA app interface. The first image (left) shows the icon as it appears for an empty list; the second image (right) shows the number of items saved to list within an orange circle. Clicking on the list icon opens the saved list.

You can find more information about a genomic datum, including additional metadata relating to the collection of a specimen or sample, as well as its sampling, sequencing, and assembly, by clicking on the go to specimen button in the specimen information block. This button directs to the complete specimen view, which provides details about the events between the collection of a specimen and the deposition of a sequence datum and its reuse.

Genomic components

The genomic components tab (Figure 8) lists raw component data which are used by bioinformaticians in the assembly of whole genomes. The tab also lists other data types which are not captured on the genome assemblies and single loci tabs. Data found here may include raw reads, targeted enrichment sequence reads and reduced representation sequence reads (e.g. DArT data). The go to source button alongside each record directs to the relevant page at the source repository, where you can access more information about the data.

**Figure 8.** Screenshot of the **genomic components** tab, as presented on the ARGA app interface. This tab lists a variety of datasets that are components which can be used to assemble whole genomes.

Single loci

Locus and marker sequences are listed under the single loci tab (Figure 9) for a species, along with the accession number (or alternative identifier), the locus (or gene) name, the sequence length, and the source molecule (e.g. DNA, RNA). Data are plotted on the map according to any geolocation co-ordinates provided for the collection of a specimen or sample. More details for each datum are available by clicking the corresponding ‘view’ button, which will take you to the single locus data view page.

**Figure 9.** Screenshot of the **single loci** tab, as presented on the ARGA app interface. This tab lists genetic locus and marker sequences for a given taxon.

Single genetic loci: full data view

The full data view page for a single locus (Figure 10) contains a selection of metadata indexed from the source repository for single genetic locus data records. These metadata are organised into containers for molecule data, data provenance, and amplification methods. An overview of the total metadata available for a datum is provided in the data availability block, as indicated for rapid glancing by green ticks and red crosses.

Within the molecule data box, there are buttons to enable access to the raw data. The get FASTA button navigates to an external site for direct download of the sequence data. The go to source button directs to the corresponding page for that accession in the source repository. There is another button, send to Galaxy, to send data directly to the online analysis platform, Galaxy Australia; please note that this feature is in development and is not currently available. The page features a trace data viewer block which can be used to visualise the sequence electropherogram extracted from the relevant ABI file, if this is available. Where multiple files are attached to a single sequence, multiple electropherograms are shown and can be compared vertically.

You can find more information about a sequence datum, including additional metadata relating to the collection of a specimen or sample, as well as its sampling, sequencing, and assembly, by clicking on the go to specimen button in the specimen information block. This will take you to the complete specimen view page, with details describing the events between the collection of a specimen and the deposition of a sequence datum and its reuse.

**Figure 10.** Screenshot of the single locus full data view tab, as shown on the ARGA app interface.

Specimens

Records of physical specimens for a given species are listed under the specimens tab (Figure 11). Specimen data are sourced both from sequence data repositories and from records provided to the Atlas of Living Australia by registered collections, including museums and herbaria. Where programmatically feasible, data are aggregated according to specimen registration number, so that a specimen record pulls in any collection or accession information described in collection records as well as any sequencing information which has been attributed to that specimen. For genomic data indexed without a specimen registration number, a sample identifier (such as a biosample number) is used. For data without a sample identifier, a nominal registration number is generated by ARGA, unique to that datum.
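The order of preference described above (registration number, then sample identifier, then a generated number) can be sketched as a simple fallback. This Python example is an illustration under stated assumptions, not ARGA's implementation: the function name `specimen_key` and its parameters are hypothetical, and a random UUID stands in for the nominal registration number, since this guide does not describe how ARGA generates it.

```python
import uuid
from typing import Optional


def specimen_key(registration_number: Optional[str],
                 sample_id: Optional[str]) -> str:
    """Pick the identifier used to group data under a specimen record."""
    # First preference: the specimen registration number.
    if registration_number:
        return registration_number
    # Second preference: a sample identifier such as a biosample number.
    if sample_id:
        return sample_id
    # Assumption: a random UUID stands in for the nominal number,
    # making the key unique to this datum.
    return f"nominal-{uuid.uuid4()}"
```

Called with both identifiers, the function returns the registration number; called with neither, it returns a fresh key each time, so such data are never grouped together by accident.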

In addition to the registration number, specimen records are listed alongside collection location information, type status (designating whether a specimen is considered a holotype, paratype, or other form of type), and whether derived genomic data are available. For records with geo co-ordinate data, specimen collection locations are shown on the map. Clicking the view button will direct you to the complete specimen view for that record.

**Figure 11.** Screenshot of the **specimens** tab from the ARGA app interface, showing the listing of specimens attributed to a species, and which have been indexed by ARGA.

Complete specimen view

The complete specimen view page (Figure 12) presents information relating to each event between the collection of a specimen and deposition of a genomic datum and its reuse. Data on this page may have been sourced from metadata attached to indexed genomic data and/or from records supplied by formal collections. The data source is provided separately for each event. The fields listed under each event have been chosen to accommodate a wide range of information and detailed processes. However, sequence data are rarely deposited with a full complement of relevant metadata. As a result, many of the specimens listed in ARGA will have little data available, and some event stages will have no data at all.

Sometimes, multiple locus sequences and/or assemblies will have been derived from a single specimen. In these cases, there may be more than one event recorded at each event stage (e.g. amplification and sequencing event). You can navigate between these different events by clicking on the arrows at the bottom left and bottom right of that event block.

**Figure 12.** Screenshot of a **complete specimen view** for an individual specimen within the ARGA index, as shown on the ARGA interface. Data provenance can be revealed by following the chain of nested events from initial specimen collection through data generation to final data deposition.

Taxonomy

A full taxonomic classification of a species is provided on the taxonomy tab. This view includes the scientific name, its taxonomic and nomenclatural status, and, where the information is available, its protonym or basionym, the publication containing the species’ original description, and information about the identity and source location of the type material. Taxonomic synonyms, common names and any subspecies are also listed on this page. The source of the indexed taxonomic information is provided in the block’s top-right corner.

Below the taxonomy block is an interactive taxonomic backbone browser. The default taxonomic classification is listed first. Clicking between the different sources will allow you to inspect and compare full taxonomic classifications between resources. Classifications which differ from the default are marked as ‘different’ beside the taxonomic source name. Any non-identical names or ranks are highlighted in white pills. Hovering over these pills will reveal tool-tips describing how this classification differs from the default. The record history button provides a timeline charting when a taxon was incorporated and described within a taxonomic resource, including any changes to the classification since it was first indexed by ARGA. This feature can be useful in revealing alternative names that have previously been used for a species under different taxonomic systems.

The interactive higher classification contains a taxonomic tree which can be used to explore the availability of genomic data for related taxa. Each node represents a taxonomic rank which can be hovered over to reveal the number of single loci, genome assemblies, specimens and other genomic data being indexed for that taxon. Clicking on the node will expand (or collapse) any child taxa. The icon button in the top-right corner can be used to shift between vertical and horizontal orientations.

For species with adequate data available, the taxon history block contains a graphic representing the timeline of taxonomic and nomenclatural events including original descriptions, taxonomic revisions, and name usages. You can hover over or click on the white circles with orange borders to find out more about the taxonomic event. More detailed descriptions of the events are provided in the nomenclatural timeline block below, including the relevant scientific name, the type of nomenclatural act, links to the relevant publications and additional information. To the right is the record history window, indicating when the event was first described in the source dataset and whether there have been any subsequent updates in the original resource.

It should be noted that different repositories and databases may employ different classification structures (in terms of both ranks and attributions of groups within those ranks), and that individual genomic data may not always be updated to account for taxonomic revisions or re-identifications. ARGA indexes data according to the taxonomic names defined in the source repository. These names are matched to an ARGA names index, which is derived largely from the taxonomy as defined in the Australian National Species List and Australian Faunal Directory, and as reflected in the taxonomy of the Atlas of Living Australia; the ARGA names index is supplemented with additional species external to Australia (such as those with biosecurity implications) and other species-like taxonomic units (like molecular operational taxonomic units (= MOTUs)). For each taxon, the source used to assign taxonomy is provided on the taxonomy tab, and can be accessed by clicking on the hyperlink.

Browsing

Users can scroll through the data indexed on ARGA by clicking on the browse icons on the homepage. Browsing categories are divided into data types, taxonomic groups, and functional or ecological groups. Clicking on any one of the icons on the app home page launches the browse page for that chosen group.

The browse page feature creates a card for every species, including an image (if available), the scientific name, and an indication of whether one or more genome or locus sequences have been indexed for that species. The presence of data for the species is designated by a green tick, and the absence of data in the index is shown by a red cross. Clicking on a card opens the species report page. Users can return to browsing the set of cards by using the back arrow, which is located on the species report above the taxonomic name.

On any browse page, the selection of species displayed can be adjusted by clicking on the filter icon at the top right, and turning on and off various filter combinations which are provided under the filter submenus. Please note that applying the filters does not adjust the statistics in the data summary provided on the top half of the page (currently implemented for taxonomic groupings only).

Browsing by data type

The first set of icons on the app home page enables browsing by the type of data contained in the index. Users can browse sets of species for which there are genome assemblies, single genetic loci and registered specimen records (see Figure 14). The number of data records is displayed below each data type. Each set of species can be browsed by clicking on the relevant icon.

**Figure 14.** Screenshot of the browsing icons for each available data type, as presented on the ARGA app interface.

Browsing by taxonomic groups (and higher taxon reports)

The ARGA interface lets you browse species by higher-level taxonomic groups (e.g. plants, animals, or fungi) to see overviews of the data in the index, and to scroll through the species contained within that higher taxon. Higher taxon reports (Figure 15) can be accessed from the ARGA app home page by clicking on the icon relevant to the group of interest. They can also be reached by clicking on a taxon of interest from within the taxonomy tab of any species report.

Hint: You can also jump to data summary pages directly by typing in a URL of the form app.arga.org.au/{rank}/{Taxon}, making sure to capitalise the taxon name. For example, to go straight to the summary page for beetles (Order Coleoptera), type in app.arga.org.au/order/Coleoptera.
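The URL pattern in the hint above can be sketched in a few lines of Python. This is an illustrative helper only: the function name taxon_summary_url is hypothetical, the https:// scheme is an assumption (the hint gives only the host and path), and lower-casing the rank simply follows the beetle example.

```python
from urllib.parse import quote

ARGA_APP_HOST = "app.arga.org.au"  # host as given in the hint


def taxon_summary_url(rank: str, taxon: str) -> str:
    """Build a data summary URL of the form app.arga.org.au/<rank>/<Taxon>.

    Hypothetical helper: the https scheme and the lower-cased rank are
    assumptions based on the example in the hint.
    """
    rank_part = rank.strip().lower()
    name = taxon.strip()
    # Capitalise only the first letter of the taxon name, as the hint advises.
    taxon_part = name[:1].upper() + name[1:]
    return f"https://{ARGA_APP_HOST}/{quote(rank_part)}/{quote(taxon_part)}"


print(taxon_summary_url("order", "coleoptera"))
# prints https://app.arga.org.au/order/Coleoptera
```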

The data summary block provides graphs and statistics describing the availability of indexed genomic data for the species within a designated taxonomic rank. The total number of species included in the taxon is given in the taxonomic breakdown block to the right. These species are listed on the browse cards below. Rank-specific data summaries are available for all eukaryotic taxa at the major Linnaean ranks. From the data summary page for a given taxon, you can navigate to summaries for higher ranks in the classification hierarchy by clicking on the scientific names in the higher classification block above the graphs.

**Figure 15.** Screenshot of a **higher taxon report** as it appears on the ARGA interface. The higher taxon report provides summary statistics about the percentage of species within the group with genomes, and with any genetic data (as indexed on ARGA). The summary statistics also provide a breakdown of the relative data availability for subgroups within that grouping (*e.g.* percentage coverage by classes within a phylum). Users can also browse through all of the species in that group using the cards in the bottom container.

Browsing by functional and ecological groups

The functional and ecological groups each contain data collections curated by ARGA to reflect a particular theme (e.g. poisonous and venomous species, or species vulnerable to bushfires). You can browse the records associated with these groups using the shortcuts on the ARGA homepage (Figure 16). This will take you to the data collection page (Figure 17), from where you can browse through the included species. This page also lists the component datasets curated from external resources. Each dataset is accompanied by basic governance information and a link to its original source.

**Figure 16.** Screenshot of the browsing icons for functional and ecological groups, as presented on the homepage of the ARGA app interface.

**Figure 17.** Screenshot of the **data collection page** for the ARGA venomous and poisonous species list. The individual sources used to compile the list are given as component datasets, and species cards representing the listed species are displayed underneath.

Accessing data

ARGA provides several data access routes for users seeking to download sequence data to personal systems, HPC environments or online analysis platforms. Because ARGA does not store genomic data itself, these data access routes are underpinned by the existing infrastructure at source repositories. Users have the option of navigating to the source repository directly from the full data view of any datum. The get data button will redirect users to the relevant page of the source repository to select and download the desired files. For data sourced from NCBI GenBank, this button will direct users to the appropriate file directory on the FTP server. Alternatively, the add to list button on the full data view will save the datum to a list cached in your internet browser.

Please note that ARGA is currently building its data download capacity. At present, ARGA supports genome assembly downloads only.

Downloading data from the saved list

You can access your list of saved data at any time by clicking on the icon at the top right of the global menu, labelled saved data. The number in the orange circle beside the label indicates the total number of records currently saved to your list. To close the list window and return to the normal ARGA interface, click the saved data icon again.

The list window contains one card for each saved datum (Figure 18). Each card is titled with the accession number and genomic data type, and includes information about the organism, the source repository and the date the datum was originally released in the source repository. Records can be downloaded individually using the green buttons at the bottom of the card.

**Figure 18.** Screenshot of the **saved list** window. Data saved to the list are represented in card format. Individual records can be downloaded using the green buttons at the bottom of each card. Bulk downloads can be executed using the buttons listed in the menu bar at the right-hand side of the list window.

Downloading individual records

The green button on the far right of the data record card provides a drop-down to select the desired file format based on the data available in the source repository (note that ARGA currently supports downloads in FASTA format only). The download button on the left will trigger a local download of the datum from the source repository. The copy URL button will copy the file address in the source repository (e.g. the FTP link) to your clipboard. You can paste this address into a script or a terminal command to download the file to a specific location. You can also remove records from your list by clicking the red bin icon at the top right of the card. Records can be selected and deselected by clicking on the green tick at the top left of the card.

Local bulk downloads

To run a bulk download of all data in your list, first check that all records are selected (each card is showing a green tick), then click the download selected button at the right of the list window. This will initiate a local download of a zip file containing all sequence files and a master CSV file. The CSV file compiles the metadata associated with all records, including specimen details, sequencing and assembly metadata, and provenance information relating to the original repository from which the data were sourced. The metadata CSV file can also be downloaded separately using the download metadata button.
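Once the zip file has been downloaded, the metadata can be read without unpacking the archive by hand. The following Python sketch is an illustration only: the function name read_metadata_rows is hypothetical, and because the file names inside the archive and the column headers of the metadata file are not documented here, the code simply takes the first file ending in .csv and returns whatever columns it finds.

```python
import csv
import io
import zipfile


def read_metadata_rows(zip_path: str) -> list[dict[str, str]]:
    """Return the rows of the first CSV file found inside the archive.

    Assumption: the bulk download contains a single metadata CSV file.
    No particular file name or column header is assumed.
    """
    with zipfile.ZipFile(zip_path) as archive:
        csv_names = [n for n in archive.namelist() if n.lower().endswith(".csv")]
        if not csv_names:
            raise FileNotFoundError("no CSV file found in the archive")
        with archive.open(csv_names[0]) as raw:
            # utf-8-sig tolerates a byte-order mark if one is present.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return list(csv.DictReader(text))
```

Calling read_metadata_rows with the path to the downloaded zip file returns one dictionary per record, keyed by the column headers in the CSV file, which can then be filtered or joined against the sequence files in the same archive.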

Programmatic access for bulk downloads

ARGA provides basic support for programmatic bulk data retrieval under the tools for programmatic access header in the right-hand box of the list window. The copy URLs to clipboard button will save all selected file addresses (such as FTP addresses) to your clipboard in a line-separated format. You can also download the file addresses in a txt file by selecting the download manifest button. The direct URLs or manifest file can be incorporated into custom scripts or used on the command line to customise downloading to specific directories or locations, such as HPC platforms.
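As one example of such a custom script, the Python sketch below reads a manifest and saves each file into a chosen directory. It is a minimal illustration, not the script ARGA provides: the function names are hypothetical, and it assumes the manifest holds one file address per line (the line-separated format described for the clipboard option) and that each address ends in a file name.

```python
import os
import shutil
import urllib.request
from urllib.parse import urlparse


def read_manifest(manifest_path: str) -> list[str]:
    """Return the non-empty lines of a line-separated manifest file."""
    with open(manifest_path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def destination_for(url: str, out_dir: str) -> str:
    """Map a file address to a local path named after the remote file."""
    filename = os.path.basename(urlparse(url).path)
    if not filename:
        raise ValueError(f"cannot derive a file name from {url!r}")
    return os.path.join(out_dir, filename)


def download_all(manifest_path: str, out_dir: str) -> list[str]:
    """Download every address in the manifest into out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for url in read_manifest(manifest_path):
        target = destination_for(url, out_dir)
        # The default urllib opener supports http(s) and ftp addresses.
        with urllib.request.urlopen(url) as response, open(target, "wb") as out:
            shutil.copyfileobj(response, out)
        saved.append(target)
    return saved
```

For example, download_all("manifest.txt", "genomes") would save each listed file under a genomes directory; both names here are placeholders. On an HPC platform, out_dir would typically point at scratch or project storage.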

Alternatively, the download script button will download a pre-written curl-based shell script which can be executed to download the selected accessions. You can use a text editor to inspect or adapt the script as desired.

Tip: You can upload data selected on ARGA to Galaxy Australia with the copy URLs to clipboard button. Open Galaxy Australia and select upload (at the top of the left-hand menu bar). In the pop-up window, choose paste/fetch data from the buttons along the bottom, paste the URLs copied from ARGA into the dialog box, and click start. This will initiate the download procedure directly from the source repository. A separate download job will run for each accession in the list.

Data sources and citations

ARGA indexes and uses data from multiple online sources, which are listed on the Data Sources page. Users can navigate to this page by clicking on the tab marked Data sources on the global header at the top of the ARGA app. Data sources are provided with hyperlinks and, where relevant, citations. If users reuse any data found on ARGA in any published work (online or in print), they must reference the primary source work, including the author and/or data provider with literature citations (where applicable). Please refer to the ARGA Data Use Policy for more information.

Citing ARGA

Please cite ARGA by including the following statement when publishing your research:

This research was supported by the use of the Australian Reference Genome Atlas (ARGA), an NCRIS-enabled platform powered by the Atlas of Living Australia (ALA), in collaboration with Bioplatforms Australia and the Australian BioCommons, and supported by the Australian Research Data Commons (ARDC).

Contact us

For technical assistance with using the ARGA app, please send any questions to . We also welcome feedback and reports of issues or bugs via this address.

For other general enquiries about the ARGA Project, please direct any questions or feedback to .

Page attributions

Hint icon by Icons8

Quill With Ink icon by Icons8