Poznan University of Technology
Institute of Bioorganic Chemistry, Polish Academy of Sciences
Mittweida University of Applied Sciences

Help

Virxicon is a knowledge base of RNA and DNA viruses. It gathers information about viruses and viral sequences in one place. The complete virus taxonomy data are consistent with the International Committee on Taxonomy of Viruses (ICTV); the sequences are collected from the NCBI Viral Genome database and GenBank and annotated according to the Baltimore classification (I-VII).


Table of contents

  1. Authors
  2. Sequence search
  3. Search result
  4. Genome browser
  5. Downloading data
  6. News
  7. Statistics
  8. Database API
    1. Documentation
    2. External access
    3. Examples
  9. System requirements
  10. Funding


1. Authors

Mateusz Kudla (1,2), Kaja Gutowska (1,3), Jaroslaw Synak (1), Mirko Weber (2), Katrin Sophie Bohnsack (2), Piotr Lukasiak (1,3), Thomas Villmann (2), Jacek Blazewicz (1,3), Marta Szachniuk (1,3)

  1. Institute of Computing Science & European Centre for Bioinformatics and Genomics, Poznan University of Technology, 60-965 Poznan, Poland
  2. University of Applied Sciences Mittweida, Mittweida, Germany
  3. Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704 Poznan, Poland

2. Sequence search

Users can specify the search criteria:

  • directly, by entering taxonomy level or species;
  • indirectly, by selecting group, molecular type, topology, resource.

Direct input of data using levels of the taxonomic hierarchy. Users can filter the data by manually entering a taxonomy level (Search by levels of taxonomy); the available taxonomy levels are realm, kingdom, order, suborder, family, subfamily, genus, and subgenus. The other option is to enter a species as the filter (Search by species).

If users click the Search by levels of taxonomy field, an alphabetical list of names is displayed. Users can choose one of these names or manually enter the level of taxonomy - for example, the name of a specific virus family, e.g., Coronaviridae. Then, on the results page, users receive sequences of all viruses belonging to the selected family.

Indirect input of data using checkboxes. Users have the option of searching the database by selecting different filters: virus groups, molecular type, topology, and resources. Users can choose one or more filters for a single search.

If the user selects one filter, for example, a virus group (e.g., ssRNA(+)), the search engine finds all viruses from this group and provides their complete list on the result page. If more filters are selected, the search engine returns sequences that meet all query conditions simultaneously.
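
The way several filters combine can be illustrated with a small sketch. It is not Virxicon's implementation: the `Record` class, its field names, and the sample records are hypothetical, and treating several values within one filter as alternatives is an assumption. Only the rule that a result must meet all selected conditions at once comes from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical, simplified view of one sequence record."""
    seq_id: str
    group: str      # virus group, e.g. "ssRNA(+)"
    mol_type: str   # molecular type, e.g. "RNA"
    topology: str   # "linear" or "circular"
    resource: str   # data source, e.g. "NCBI" or "GenBank"


def matches(record, **filters):
    """Return True if the record satisfies every supplied filter.

    Each filter maps a field name to a set of accepted values; a filter
    that is omitted places no restriction on that field.
    """
    return all(getattr(record, name) in accepted
               for name, accepted in filters.items())


def search(records, **filters):
    """Keep only the records that meet all query conditions at once."""
    return [r for r in records if matches(r, **filters)]


# Made-up records for illustration only; not real database content.
records = [
    Record("SEQ_A", "ssRNA(+)", "RNA", "linear", "NCBI"),
    Record("SEQ_B", "ssRNA(+)", "RNA", "linear", "GenBank"),
    Record("SEQ_C", "dsDNA", "DNA", "circular", "NCBI"),
]

one_filter = search(records, group={"ssRNA(+)"})
two_filters = search(records, group={"ssRNA(+)"}, resource={"NCBI"})
print([r.seq_id for r in one_filter])
print([r.seq_id for r in two_filters])
```

With one filter the sketch keeps every record of the chosen group; adding a second filter can only narrow that list, never widen it.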

Different examples of sequence search:


3. Search result

As the search result, users obtain a list of viral sequences that match the given search criteria. Clicking a row of the result table displays additional information about that record. Users obtain detailed data about each sequence: original database ID linked to the reference repository; definition; molecular type; species name; classification; modification date; taxonomy; information about data sources; information about genes, CDS, or related and reference sequences.

Information about genes and coding DNA sequences (CDS) is given in separate tables, which contain details of each gene/CDS: the location, the gene ID or protein ID (linked to the reference repository), and the UniProt ID as a reference to the functional information deposited in the UniProt database. Additionally, for both genes and CDS, users can download the nucleotide sequence (and, for CDS, the amino acid sequence); see the example below:


Sequences with information about genes and CDS are visualized using a genome browser (see Genome browser).

4. Genome browser

The virus genome is visualized using the IGV genome browser. Users can zoom in to see single base pairs or zoom out to browse entire genes (green bars) and CDS (grey bars). Clicking a bar reveals all information about the corresponding gene/CDS. It is also possible to visualize any range in the sequence by entering it manually (see Figure 1). The range description has to begin with the ID of the virus, followed by a colon and two numbers (the range endpoints) connected by a hyphen. For example, for virus ID NC_045512 and the range 1-100 bp, the description is NC_045512:1-100 (see the animation example).

AJ866554:1-100
Figure 1. Visualization of any range in the sequence.
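
The range format described in this section (virus ID, colon, two endpoints joined by a hyphen) can be checked before it is typed into the browser. The sketch below is only an illustration: `parse_range` is a hypothetical helper, and it assumes 1-based positions with the first endpoint not larger than the second, which this page does not state explicitly.

```python
import re

# ID, colon, start, hyphen, end; e.g. "NC_045512:1-100".
_RANGE_RE = re.compile(r"^(?P<seq_id>[^:\s]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_range(text):
    """Split a range description into (sequence ID, start, end).

    Raises ValueError when the text does not follow the ID:start-end
    pattern or when the endpoints are not in increasing order
    (the ordering rule is an assumption of this sketch).
    """
    match = _RANGE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"expected ID:start-end, got {text!r}")
    start, end = int(match["start"]), int(match["end"])
    if start < 1 or end < start:
        raise ValueError(f"invalid range endpoints in {text!r}")
    return match["seq_id"], start, end


print(parse_range("NC_045512:1-100"))
```

The helper does not check that the ID exists in the database or that the endpoints fit within the sequence length; those checks would need the sequence record itself.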



5. Downloading data

Users can download sequences separately (one at a time) or in packages (all selected sequences displayed on a single page). Virxicon limits each download to 16,000,000 base pairs, which restricts the size of the downloaded file to about 16 MB. Users who want to download more data, such as a whole group or family of viruses, can do so via the API (see Database API). Additionally, users can save tabular data (the list of results that includes the rows selected by the users) in a CSV file.
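
For a selection that exceeds the 16,000,000 base pair limit, one option is to split it into several smaller downloads. The sketch below shows a simple greedy split by sequence length. It is an illustration only: `split_into_batches` is a hypothetical helper, the sequence lengths are assumed to be known in advance, and how the web interface itself counts base pairs is not described on this page.

```python
MAX_BP_PER_DOWNLOAD = 16_000_000  # limit stated in this section


def split_into_batches(lengths, limit=MAX_BP_PER_DOWNLOAD):
    """Group (sequence ID, length) pairs into batches of at most `limit` bp.

    Sequences are taken in the given order; a new batch is started
    whenever adding the next sequence would exceed the limit.
    """
    batches, current, current_bp = [], [], 0
    for seq_id, length in lengths:
        if length > limit:
            raise ValueError(f"{seq_id} alone exceeds the limit of {limit} bp")
        if current and current_bp + length > limit:
            batches.append(current)
            current, current_bp = [], 0
        current.append(seq_id)
        current_bp += length
    if current:
        batches.append(current)
    return batches


# Made-up IDs and lengths, with a small limit so the split is easy to follow.
print(split_into_batches([("a", 6), ("b", 5), ("c", 4)], limit=10))
```

A greedy split keeps the original order of the sequences but does not minimize the number of batches; for large selections the API route described in this section is likely the simpler choice.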


6. News

The Virxicon database is updated weekly. Detailed information about the updates appears on the News page. News contains the total number of sequences in the database and the number of sequences in each virus group.


7. Statistics

The Statistics page shows a graphical representation of current holdings in the Virxicon database. The following charts are available:

Data distribution (updated weekly along with the database contents):

  • by general molecular type and virus group
  • by topology and general molecular type
  • by topology and virus group
  • by resource and general molecular type
  • by resource and virus group

Growth of sequences per month (updated monthly):

  • overall
  • DNA only
  • RNA only
  • GenBank only
  • NCBI only
  • circular only
  • linear only


8. Database API

If you want to send queries directly to the database, you can use our API. If you are not familiar with the GraphQL query language, you can learn how to use it on the official website.

8.1. Documentation

Complete documentation of our API is available in the playground, where you can also test your queries.

8.2. External access

You can access our API by using the playground or over HTTP, by sending a GET or POST request. Many third-party applications, such as Postman, also allow you to make GraphQL requests. All requests have to be sent to our public endpoint.
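
As a rough sketch of access over HTTP, the Python code below prepares a POST request and a GET URL using only the standard library. Several things here are assumptions: the `ENDPOINT` value is a placeholder and must be replaced with the public endpoint linked from this page, the helper names are hypothetical, and the request shapes (a JSON body with a `query` key for POST, a `query` URL parameter for GET) follow the common GraphQL-over-HTTP convention rather than anything stated here.

```python
import json
import urllib.parse
import urllib.request

# Placeholder only: substitute the public endpoint linked from this page.
ENDPOINT = "https://example.org/graphql"


def build_post_request(query, endpoint=ENDPOINT):
    """Prepare a POST request carrying the query as a JSON body."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json",
                 "Accept": "application/json"},
        method="POST",
    )


def build_get_url(query, endpoint=ENDPOINT):
    """Prepare a GET URL carrying the query as a URL parameter."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def run_query(query, endpoint=ENDPOINT, timeout=30):
    """Send the query over the network and return the decoded JSON reply."""
    request = build_post_request(query, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `run_query("{ topologies { name } }")` against the real endpoint would then return the decoded response as a Python dictionary. Building the request separately from sending it makes the payload easy to inspect without any network traffic.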

8.3. Examples

With the API, you can create queries to obtain not only information about sequences but also the available values of the sequence filtering parameters, such as classification groups, molecular types, taxonomies, and topologies.

  • Taxonomies
    Using the taxonomies query, you can fetch all taxonomies stored in the database, together with information about their types.

    Example query:
    {
      taxonomies {
        name
        type
      }
    }
    
    Example JSON Result:
    {
      "data": {
        "taxonomies": [
          {
            "name": "Ackermannviridae",
            "type": "FAMILY"
          },
          {
            "name": "Betatorquevirus",
            "type": "GENUS"
          },
          …
        ]
      }
    }
    

  • Topologies
    Using the topologies query, you can fetch all topologies stored in the database.

    Example query:
    {
      topologies {
        name
      }
    }
    
    Example JSON Result:
    {
      "data": {
        "topologies": [
          {
            "name": "linear"
          },
          {
            "name": "circular"
          }
        ]
      }
    }
    

  • Molecular types
    Using the molTypes query, you can fetch all molecular types stored in the database.

    Example query:
    {
      molTypes {
        name
      }
    }
    
    Example JSON Result:
    {
      "data": {
        "molTypes": [
          {
            "name": "RNA"
          },
          {
            "name": "DNA"
          },
          {
            "name": "mRNA"
          },
          {
            "name": "cRNA"
          }
        ]
      }
    }
    

  • Classification groups
    Using the classificationGroups query, you can fetch all classification groups stored in the database, together with information on whether each group belongs to the original Baltimore classification.

    Example query:
    {
      classificationGroups {
        name
        isBaltimore
      }
    }
    
    Example JSON Result:
    {
      "data": {
        "classificationGroups": [
          {
            "name": "dsDNA",
            "isBaltimore": true
          },
          {
            "name": "ssDNA/dsDNA",
            "isBaltimore": false
          },
          …
        ]
      }
    }
    

  • Sequences
    Using the sequences query, you can fetch all sequences stored in the database. You can also filter the result by using parameters, including those shown above. Due to the large number of sequences in the database, a single query can return at most 1000 sequences; however, all the required data can be downloaded by making several queries.

    Below is an example of downloading sequences whose data source is NCBI, whose taxonomy includes Flaviviridae, and whose molecular type is RNA, using cursor-based pagination (with a page size of 1).

    Example query:
    {
      sequences(first:1, dataSources:NCBI, taxonomies:["Flaviviridae"], molTypes:["RNA"]) {
        nodes{
          id
          organism
          sequence
        }
        
        pageInfo{
          endCursor
          hasNextPage
        }
      }
    } 
    
    Example JSON Result:
    {
      "data": {
        "sequences": {
          "nodes": [
            {
              "id": "NC_040815",
              "organism": "Hepacivirus P",
              "sequence": "ACATGGGGGGGGGCTGACAGTGAGTACACTGTGCCAAGCAGGTGCTA…"
            }
          ],
          "pageInfo": {
            "endCursor": "eyJfX3RvdGFsQ291bnQiOjE2NiwiX19wb3NpdGlvbiI6MH0=",
            "hasNextPage": true
          }
        }
      }
    }
    
    The result presented above contains two elements: the nodes array, which holds the list of sequences, and the pageInfo object, which holds information about further sequences. To retrieve the next sequences, pass the value of the endCursor field as an additional after parameter to the sequences query, as shown below.

    Example query:
    {
      sequences(first:1, after:"eyJfX3RvdGFsQ291bnQiOjE2NiwiX19wb3NpdGlvbiI6MH0=",
          dataSources:NCBI, taxonomies:["Flaviviridae"], molTypes:["RNA"]) {
    
        nodes{
          id
          organism
          sequence
        }
        
        pageInfo{
          endCursor
          hasNextPage
        }
      }
    } 
    
    Example JSON Result:
    {
      "data": {
        "sequences": {
          "nodes": [
            {
              "id": "NC_040645",
              "organism": "Culex Flavi-like virus",
              "sequence": "CGCTGTTTTCGAATCAGTTTATCGAAGGCGTTCTATAGCGCAAGCTC…"
            }
          ],
          "pageInfo": {
            "endCursor": "eyJfX3RvdGFsQ291bnQiOjE2NiwiX19wb3NpdGlvbiI6MX0=",
            "hasNextPage": true
          }
        }
      }
    }
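
The cursor-based pagination shown above can be automated with a short loop. The sketch below is one possible client-side approach, not part of the API: `build_query` and `fetch_all` are hypothetical helpers, and `run_query` stands for any function of your own that sends a query string to the public endpoint and returns the decoded JSON response. The query fields, the filter values, and the page size limit of 1000 are taken from the examples in this section.

```python
import json

QUERY_TEMPLATE = """
{
  sequences(first: %(first)d%(after)s, dataSources: NCBI,
            taxonomies: ["Flaviviridae"], molTypes: ["RNA"]) {
    nodes { id organism sequence }
    pageInfo { endCursor hasNextPage }
  }
}
"""


def build_query(first, after=None):
    """Fill in the page size and, from the second page on, the cursor."""
    after_arg = "" if after is None else ", after: " + json.dumps(after)
    return QUERY_TEMPLATE % {"first": first, "after": after_arg}


def fetch_all(run_query, page_size=1000):
    """Yield every node, following endCursor until hasNextPage is false.

    `run_query` is any callable that takes a query string and returns
    the decoded JSON response as a dictionary.
    """
    cursor = None
    while True:
        page = run_query(build_query(page_size, cursor))["data"]["sequences"]
        yield from page["nodes"]
        if not page["pageInfo"]["hasNextPage"]:
            break
        cursor = page["pageInfo"]["endCursor"]
```

Because `fetch_all` is a generator, sequences can be processed or written to disk page by page instead of being held in memory all at once.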
    

9. System requirements

Virxicon is designed to work with most of the available web browsers. The latest versions of browsers are strongly recommended.

Minimum browser requirements

Browser            Minimum version
Google Chrome      Chrome 58
Microsoft Edge     Edge 14
Mozilla Firefox    Firefox 54
Safari             Safari 10
Opera              Opera 55

10. Funding

This project has been supported by the National Science Centre, Poland [2016/23/B/ST6/03931, 2019/35/B/ST6/03074], the statutory funds of Poznan University of Technology, and the grant of the European Social Fund (ESF).
