Help

Virxicon is a knowledge base of RNA and DNA viruses. It gathers information about viruses and viral sequences in one place. The complete virus taxonomy data are consistent with the International Committee on Taxonomy of Viruses (ICTV); the sequences are collected from the NCBI Viral Genome database and GenBank and annotated according to the Baltimore classification (groups I-VII).

Authors
Sequence search
Search result
Genome browser
Downloading data
News
Statistics
Database API
System requirements
Funding

1. Authors

Mateusz Kudla^1,2, Kaja Gutowska^1,3, Jaroslaw Synak^1, Mirko Weber^2, Katrin Sophie Bohnsack^2, Piotr Lukasiak^1,3, Thomas Villmann^2, Jacek Blazewicz^1,3, Marta Szachniuk^1,3

^1 Institute of Computing Science & European Centre for Bioinformatics and Genomics, Poznan University of Technology, 60-965 Poznan, Poland
^2 University of Applied Sciences Mittweida, Mittweida, Germany
^3 Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704 Poznan, Poland

2. Sequence search

Users can enter the search criteria in two ways:

directly, by entering taxonomy level or species;
indirectly, by selecting group, molecular type, topology, resource.

Direct input of data using levels of the taxonomic hierarchy. Users can filter the data by manually entering a taxonomy level (Search by levels of taxonomy). The available taxonomy levels are realm, kingdom, order, suborder, family, subfamily, genus, and subgenus. Alternatively, users can enter a species as the filter (Search by species).

If users click the Search by levels of taxonomy field, an alphabetical list of names is displayed. Users can choose one of these names or type a name manually, for example, the name of a specific virus family such as Coronaviridae. The results page then lists the sequences of all viruses belonging to the selected family.

Indirect input of data using checkboxes. Users have the option of searching the database by selecting different filters: virus groups, molecular type, topology, and resources. Users can choose one or more filters for a single search.

If the user selects one filter, for example, a virus group (e.g., ssRNA(+)), the search engine finds all viruses from this group and provides their complete list on the result page. If more filters are selected, the search engine returns sequences that meet all query conditions simultaneously.
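The rule that several filters must all hold at once can be illustrated with a short Python sketch. It is not Virxicon's implementation: the records, the field names (group, mol_type, topology, resource) and the helper functions are made up for the illustration.

```python
# Illustrative records; the IDs, field names and values are hypothetical.
records = [
    {"id": "A", "group": "ssRNA(+)", "mol_type": "RNA", "topology": "linear", "resource": "NCBI"},
    {"id": "B", "group": "ssRNA(+)", "mol_type": "RNA", "topology": "linear", "resource": "GenBank"},
    {"id": "C", "group": "dsDNA", "mol_type": "DNA", "topology": "circular", "resource": "NCBI"},
]


def matches(record, filters):
    """Return True only if the record satisfies every selected filter."""
    return all(record[field] == value for field, value in filters.items())


def search(records, filters):
    """Return the IDs of the records that meet all query conditions."""
    return [record["id"] for record in records if matches(record, filters)]


# One filter: every record from the selected virus group.
one_filter = search(records, {"group": "ssRNA(+)"})

# Two filters: only records that satisfy both conditions simultaneously.
two_filters = search(records, {"group": "ssRNA(+)", "resource": "NCBI"})
```

Adding a filter can only narrow the result, which is why the second search returns a subset of the first.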


3. Search result

The search returns a list of viral sequences that match the given criteria. Clicking a row of the result table displays additional information for that record. Users obtain detailed data about each sequence: original database ID linked to the reference repository; definition; molecular type; species name; classification; modification date; taxonomy; information about data sources; information about genes, CDS, or related and reference sequences.

Information about genes and coding DNA sequences (CDS) is given in separate tables, which contain details of each gene/CDS: the location, the gene ID or protein ID (linked to the reference repository), and the UniProt ID as a reference to the functional information deposited in the UniProt database. Additionally, for both genes and CDS, it is possible to download the nucleotide sequence (and, for CDS, the amino acid sequence).

Sequences with information about genes and CDS are visualized using a genome browser (see Genome browser).

4. Genome browser

The virus genome is visualized using the IGV genome browser. Users can zoom in to see single base pairs or zoom out to browse entire genes (green bars) and CDS (grey bars). Clicking a bar reveals all information about the corresponding gene/CDS. It is also possible to visualize any range in the sequence by entering it manually (see Figure 1). The range description has to begin with the ID of the virus, followed by a colon and two numbers (the range endpoints) connected by a hyphen, e.g., NC_045512:1-100 for virus ID NC_045512 and the range 1-100 bp.

AJ866554:1-100
Figure 1. Visualization of any range in the sequence.

5. Downloading data

Users can download sequences separately (one at a time) or in packages (all selected sequences displayed on a single page). Virxicon limits each download to 16,000,000 base pairs, which restricts the size of the downloaded file to about 16 MB. Users who want to download more data, such as a whole group or family of viruses, can do so via the API (see Database API). Additionally, users can save tabular data (the list of results containing the rows selected by the user) as a comma-separated CSV file.
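If a selection exceeds the 16,000,000 base-pair limit, it has to be split into several downloads. The sketch below shows one possible way to plan such packages on the client side, given the sequence lengths. The function name and the greedy strategy are assumptions made for illustration; they do not describe how Virxicon itself enforces the limit.

```python
MAX_BASE_PAIRS = 16_000_000  # per-download limit stated above


def split_into_packages(lengths, limit=MAX_BASE_PAIRS):
    """Greedily group (sequence_id, length) pairs into packages within the limit.

    Sequences keep their input order; a new package is started whenever
    adding the next sequence would push the running total over the limit.
    """
    packages, current, current_total = [], [], 0
    for seq_id, length in lengths:
        if length > limit:
            raise ValueError(f"{seq_id} alone exceeds the limit of {limit} bp")
        if current and current_total + length > limit:
            packages.append(current)
            current, current_total = [], 0
        current.append(seq_id)
        current_total += length
    if current:
        packages.append(current)
    return packages


# Hypothetical sequence IDs and lengths, in base pairs.
plan = split_into_packages([("a", 10_000_000), ("b", 7_000_000), ("c", 5_000_000)])
```

In this example the first two sequences together would exceed the limit, so "a" goes into its own package and "b" and "c" share the second one.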

6. News

The Virxicon database is updated weekly. Detailed information about the updates appears on the News page. News contains the total number of sequences in the database and the number of sequences in each virus group.

7. Statistics

The Statistics page shows a graphical representation of current holdings in the Virxicon database. The following charts are available:

Data distribution (updated weekly along with the database contents):

by general molecular type and virus group
by topology and general molecular type
by topology and virus group
by resource and general molecular type
by resource and virus group

Growth of sequences per month (updated monthly):

overall
by DNA-only
by RNA-only
by GenBank-only
by NCBI-only
by circular-only
by linear-only

8. Database API

If you want to send your query directly to the database, you can use our GraphQL API. If you are not familiar with the GraphQL query language, you can learn how to use it on the official GraphQL website.

8.1. Documentation

Complete documentation of our API is available in the playground, where you can also test your queries.

8.2. External access

You can access our API through the playground or over HTTP using GET or POST requests. Many third-party applications, such as Postman, can also send GraphQL requests. All requests must be sent to our public endpoint.
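As a rough illustration of access over HTTP, the Python sketch below prepares a POST request with the query in a JSON body and a GET URL with the query in the query parameter, following the common GraphQL-over-HTTP convention. The endpoint URL is a placeholder, not the real address, and the function names are hypothetical; whether the server expects exactly these headers is an assumption.

```python
import json
import urllib.parse
import urllib.request

# Placeholder only: replace with the public endpoint given on the Virxicon site.
ENDPOINT = "https://example.org/graphql"


def build_post_request(query, endpoint=ENDPOINT):
    """Build a POST request carrying the query as a JSON body."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def build_get_url(query, endpoint=ENDPOINT):
    """Build a GET URL carrying the query in the `query` parameter."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def run_query(query, endpoint=ENDPOINT):
    """Send the query and return the decoded JSON response (needs network access)."""
    with urllib.request.urlopen(build_post_request(query, endpoint)) as response:
        return json.load(response)


# Preparing a request does not contact the server; only run_query does.
request = build_post_request("{ topologies { name } }")
url = build_get_url("{ topologies { name } }")
```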

8.3. Examples

Through the API, you can query not only information about sequences but also the parameters used to filter them: classification groups, molecular types, taxonomies, and topologies.

Taxonomies
Using the taxonomies query, you can fetch all taxonomies stored in the database together with their types.

Example query:

{
  taxonomies {
    name
    type
  }
}

Example JSON Result:

{
  "data": {
    "taxonomies": [
      {
        "name": "Ackermannviridae",
        "type": "FAMILY"
      },
      {
        "name": "Betatorquevirus",
        "type": "GENUS"
      },
      …
    ]
  }
}
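Once the response has been decoded from JSON, the taxonomy names can be grouped by their type. The Python sketch below works on an abbreviated copy of the result above; the helper name names_by_type is hypothetical.

```python
from collections import defaultdict

# Abbreviated response in the shape of the example result.
response = {
    "data": {
        "taxonomies": [
            {"name": "Ackermannviridae", "type": "FAMILY"},
            {"name": "Betatorquevirus", "type": "GENUS"},
        ]
    }
}


def names_by_type(response):
    """Map each taxonomy type (e.g. "FAMILY") to the list of its names."""
    grouped = defaultdict(list)
    for entry in response["data"]["taxonomies"]:
        grouped[entry["type"]].append(entry["name"])
    return dict(grouped)


grouped = names_by_type(response)
```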

Topologies
Using the topologies query, you can fetch all topologies stored in the database.

Example query:

{
  topologies {
    name
  }
}

Example JSON Result:

{
  "data": {
    "topologies": [
      {
        "name": "linear"
      },
      {
        "name": "circular"
      }
    ]
  }
}

Molecular types
Using the molTypes query, you can fetch all molecular types stored in the database.

Example query:

{
  molTypes {
    name
  }
}

Example JSON Result:

{
  "data": {
    "molTypes": [
      {
        "name": "RNA"
      },
      {
        "name": "DNA"
      },
      {
        "name": "mRNA"
      },
      {
        "name": "cRNA"
      }
    ]
  }
}

Classification groups
Using the classificationGroups query, you can fetch all classification groups stored in the database, together with information on whether each group belongs to the original Baltimore classification.

Example query:

{
  classificationGroups {
    name
    isBaltimore
  }
}

Example JSON Result:

{
  "data": {
    "classificationGroups": [
      {
        "name": "dsDNA",
        "isBaltimore": true
      },
      {
        "name": "ssDNA/dsDNA",
        "isBaltimore": false
      },
      …
    ]
  }
}

Sequences
Using the sequences query, you can fetch the sequences stored in the database. You can also filter the result using parameters, including those shown above. Because the database holds a large number of sequences, a single query returns at most 1000 sequences; however, all the required data can be downloaded with several successive queries.

Below is an example of downloading sequences whose data source is NCBI, whose taxonomy is Flaviviridae, and whose molecular type is RNA, using cursor-based pagination with a page size of 1.

Example query:

{
  sequences(first:1, dataSources:NCBI, taxonomies:["Flaviviridae"], molTypes:["RNA"]) {
    nodes{
      id
      organism
      sequence
    }
    
    pageInfo{
      endCursor
      hasNextPage
    }
  }
}

Example JSON Result:

{
  "data": {
    "sequences": {
      "nodes": [
        {
          "id": "NC_040815",
          "organism": "Hepacivirus P",
          "sequence": "ACATGGGGGGGGGCTGACAGTGAGTACACTGTGCCAAGCAGGTGCTA…"
        }
      ],
      "pageInfo": {
        "endCursor": "eyJfX3RvdGFsQ291bnQiOjE2NiwiX19wb3NpdGlvbiI6MH0=",
        "hasNextPage": true
      }
    }
  }
}

The result presented above contains two elements: the nodes array, which holds the list of sequences, and the pageInfo object, which provides information about further sequences. To retrieve the next sequences, pass the value of the endCursor field as the additional after parameter of the sequences query, as shown below.

Example query:

{
  sequences(first:1, after:"eyJfX3RvdGFsQ291bnQiOjE2NiwiX19wb3NpdGlvbiI6MH0=",
      dataSources:NCBI, taxonomies:["Flaviviridae"], molTypes:["RNA"]) {

    nodes{
      id
      organism
      sequence
    }
    
    pageInfo{
      endCursor
      hasNextPage
    }
  }
}

Example JSON Result:

{
  "data": {
    "sequences": {
      "nodes": [
        {
          "id": "NC_040645",
          "organism": "Culex Flavi-like virus",
          "sequence": "CGCTGTTTTCGAATCAGTTTATCGAAGGCGTTCTATAGCGCAAGCTC…"
        }
      ],
      "pageInfo": {
        "endCursor": "eyJfX3RvdGFsQ291bnQiOjE2NiwiX19wb3NpdGlvbiI6MX0=",
        "hasNextPage": true
      }
    }
  }
}
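The two queries above can be generalized into a loop that keeps requesting pages until hasNextPage is false. The following Python sketch is one way to do this under stated assumptions: build_sequences_query, iter_sequences and the send callback are hypothetical names, and fake_send merely returns canned pages so that the loop can be tried without network access. A real send function would submit the query string to the public endpoint and return the decoded JSON response.

```python
import json

PAGE_LIMIT = 1000  # maximum number of sequences per query, as stated above


def build_sequences_query(first, after=None, data_source=None, taxonomies=(), mol_types=()):
    """Assemble a sequences query string from a page size, a cursor and filters."""
    if not 1 <= first <= PAGE_LIMIT:
        raise ValueError(f"first must be between 1 and {PAGE_LIMIT}")
    args = [f"first: {first}"]
    if after is not None:
        args.append(f"after: {json.dumps(after)}")
    if data_source is not None:
        args.append(f"dataSources: {data_source}")  # written unquoted, as in the examples
    if taxonomies:
        args.append(f"taxonomies: {json.dumps(list(taxonomies))}")
    if mol_types:
        args.append(f"molTypes: {json.dumps(list(mol_types))}")
    fields = "nodes { id organism sequence } pageInfo { endCursor hasNextPage }"
    return "{ sequences(" + ", ".join(args) + ") { " + fields + " } }"


def iter_sequences(send, first=PAGE_LIMIT, **filters):
    """Yield nodes from successive pages until hasNextPage is false.

    `send(query)` is supplied by the caller; it must submit the query string
    to the API and return the decoded JSON response.
    """
    after = None
    while True:
        page = send(build_sequences_query(first, after, **filters))["data"]["sequences"]
        yield from page["nodes"]
        if not page["pageInfo"]["hasNextPage"]:
            return
        after = page["pageInfo"]["endCursor"]


def fake_send(query):
    """Offline stand-in for the API that serves two canned one-item pages."""
    if 'after: "cursor-1"' in query:
        nodes, info = [{"id": "NC_040645"}], {"endCursor": "cursor-2", "hasNextPage": False}
    else:
        nodes, info = [{"id": "NC_040815"}], {"endCursor": "cursor-1", "hasNextPage": True}
    return {"data": {"sequences": {"nodes": nodes, "pageInfo": info}}}


ids = [node["id"] for node in iter_sequences(
    fake_send, first=1, data_source="NCBI", taxonomies=["Flaviviridae"], mol_types=["RNA"])]
```

The cursor is passed back unchanged on every iteration, so the loop does not need to know what it encodes. With the real API, a page size closer to the limit of 1000 keeps the number of requests low.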

9. System requirements

Virxicon is designed to work with most of the available web browsers. The latest versions of browsers are strongly recommended.

Minimum browser requirements

Chrome 58
Edge 14
Firefox 54
Safari 10
Opera 55

10. Funding

This project has been supported by the National Science Centre, Poland [2016/23/B/ST6/03931, 2019/35/B/ST6/03074], the statutory funds of Poznan University of Technology, and the grant of the European Social Fund (ESF).
