Bioinformatic analysis

Standard Illumina QC

Initial sequencing QC is performed with FastQC on the raw reads. FastQC reports the number of reads, per-base read quality and adapter content, and these results inform the trimming parameters. Typically, we trim reads with fastp to remove sequencing adapters and low-quality bases (<Q22) from the 3' end of the reads, discarding reads shorter than 75 bp after trimming. These settings vary according to the nature of the library and the results of the initial QC.
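The 3' quality trimming and length filter described above can be illustrated with a minimal sketch. This is not fastp's algorithm: fastp has its own trimming options (for example sliding-window cutting), which behave differently. The sketch assumes Phred+33 quality encoding and simply removes bases below Q22 one at a time from the 3' end, then applies the 75 bp minimum. The function name is hypothetical.

```python
PHRED_OFFSET = 33  # Sanger / Illumina 1.8+ FASTQ quality encoding


def trim_3prime(seq: str, qual: str, min_q: int = 22, min_len: int = 75):
    """Trim low-quality bases from the 3' end and drop reads that become too short.

    Returns the trimmed (seq, qual) pair, or None if fewer than min_len bases remain.
    """
    end = len(seq)
    # Walk back from the 3' end while the base quality is below the threshold
    while end > 0 and ord(qual[end - 1]) - PHRED_OFFSET < min_q:
        end -= 1
    if end < min_len:
        return None  # read is discarded
    return seq[:end], qual[:end]
```

For example, an 85 bp read whose last five bases are Q2 is trimmed to 80 bp and kept, while a 70 bp read is discarded even if every base is high quality.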

After trimming, the following QC is performed.

FastQC is run again on the trimmed reads to check that trimming has been effective.
FastQ Screen is run to check for common contaminants, e.g. E. coli, H. sapiens.
A subset of reads for each sample is BLASTed against a nucleotide (nt) or rRNA-based database to check the species composition.
A MultiQC report is generated that collates the outputs from FastQC, FastQ Screen and the trimming process.
A Composition Report is created using KronaTools that collates the BLAST results and visualises them in a pie-chart format.
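The subsampling step above can be sketched as follows. The document does not say how the subset is drawn. Here, reservoir sampling (Algorithm R) stands in as one way to draw a uniform random subset in a single pass over a FASTQ file. The function names, subset size and seed are hypothetical, and the parser assumes well-formed 4-line FASTQ records. The FASTA text it produces could then be used as a BLAST query.

```python
import random
from typing import Iterable, Iterator, List, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, sequence) from 4-line FASTQ records (a file handle works)."""
    it = iter(lines)
    for header in it:
        seq = next(it).strip()
        next(it)  # '+' separator line
        next(it)  # quality line, not needed for BLAST
        yield header[1:].split()[0], seq


def reservoir_sample(records, k: int, seed: int = 1) -> List[Tuple[str, str]]:
    """Uniformly sample up to k records in one pass (Algorithm R)."""
    rng = random.Random(seed)
    sample: List[Tuple[str, str]] = []
    for i, rec in enumerate(records):
        if i < k:
            sample.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = rec
    return sample


def to_fasta(records) -> str:
    """Format (read_id, sequence) pairs as FASTA text."""
    return "".join(f">{rid}\n{seq}\n" for rid, seq in records)
```

Because the reservoir holds only k records, memory use does not grow with the size of the run.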

Standard Nanopore QC

QC of the whole run is performed with pycoQC; both this report and the run report generated by the Nanopore platform are provided.

For each sample, pycoQC and NanoPlot reports are generated.

In-house scripts summarise the NanoPlot outputs in a CSV file and an HTML report to help compare samples.
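Below is a minimal sketch of the kind of per-sample summary table described above. It is not the in-house script, and it does not parse NanoPlot files. It assumes the read lengths for each sample have already been collected (a hypothetical input) and writes one CSV row of statistics per sample.

```python
import csv
import io
import statistics
from typing import Dict, List


def summarise_samples(lengths_by_sample: Dict[str, List[int]]) -> str:
    """Return a CSV table with one row of read-length statistics per sample."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["sample", "reads", "total_bases", "mean_length",
                     "median_length", "max_length"])
    for sample, lengths in sorted(lengths_by_sample.items()):
        writer.writerow([
            sample,
            len(lengths),
            sum(lengths),
            round(float(statistics.mean(lengths)), 1),
            float(statistics.median(lengths)),
            max(lengths),
        ])
    return out.getvalue()
```

Putting every sample in one table makes it easy to spot a sample with unusually low yield or short reads.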

A subset of reads for each sample is BLASTed against a nucleotide (nt) or rRNA-based database to check the species composition.

A Composition Report is created using KronaTools that collates the BLAST results and visualises them in a pie-chart format.
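One possible route into KronaTools is `ktImportText`. It reads tab-separated text in which each line gives a count followed by the levels of the hierarchy. The sketch below builds that input from the lineage of each read's top BLAST hit. It assumes the hits have already been mapped to lineages (a hypothetical upstream step). KronaTools also provides other importers, and the actual Composition Report may be built differently.

```python
from collections import Counter
from typing import Iterable, List


def krona_text(top_hit_lineages: Iterable[List[str]]) -> str:
    """Count identical lineages and emit lines of 'count<TAB>level1<TAB>level2...'."""
    counts = Counter(tuple(lineage) for lineage in top_hit_lineages)
    lines = ["\t".join([str(n), *lineage]) for lineage, n in counts.most_common()]
    return "\n".join(lines) + "\n"
```

The resulting text file could then be passed to `ktImportText` to produce the interactive chart.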

Standard QC for Chromium 10X

10x data is processed with the 10x Genomics software suite (i.e. Cell Ranger). FASTQ files are first generated and run through the Illumina QC steps detailed above. cellranger count (or cellranger multi) is then run to align the reads and quantify expression. The key outputs are Loupe Browser (.cloupe) files, expression matrices and a web summary.

Bioinformatics Analysis for RNA-seq

In addition to the standard Illumina/Nanopore QC, for RNA-seq run on the Illumina platform we can offer trimming to remove adapters, low-quality bases and very short reads. Trimming can be omitted on request.

Differential gene expression analysis can be performed at extra cost. For this we require the NCBI link (or equivalent details) for the reference to be used, and the specific comparisons to be performed. The pipeline aligns the reads to the reference and quantifies expression. The alignment rate is assessed before proceeding further and is shown in a MultiQC report.

Differential expression is tested with DESeq2 on the unnormalised counts, because DESeq2 applies its own normalisation. We deliver the scripts used at each stage of the analysis.
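DESeq2 expects raw counts because it estimates a size factor for each sample with the median-of-ratios method. The sketch below is a plain-Python illustration of that calculation only. The analysis itself is run with DESeq2 in R, and the function name is hypothetical. Genes with a zero count in any sample are skipped because their geometric mean is zero.

```python
import math
from statistics import median
from typing import Dict, List


def size_factors(counts: Dict[str, List[int]]) -> List[float]:
    """Median-of-ratios size factors, one per sample.

    counts maps gene -> raw counts across samples (same sample order for every gene).
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios: List[List[float]] = [[] for _ in range(n_samples)]
    for row in counts.values():
        if any(c == 0 for c in row):
            continue  # geometric mean would be zero
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Each sample's factor is the median ratio of its counts to the per-gene geometric means
    return [math.exp(median(r)) for r in log_ratios]
```

If one sample was sequenced twice as deeply as another, its size factor comes out twice as large. Pre-normalised values would therefore be rescaled a second time, which is why raw counts are supplied.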

Log2 fold changes and lists of upregulated and downregulated genes are supplied as CSV files.
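The sketch below shows one way results could be split into upregulated and downregulated gene lists. It uses the `log2FoldChange` and `padj` column names from a DESeq2 results table. The thresholds (adjusted p < 0.05, |log2 fold change| >= 1) and the function names are assumptions for illustration. The cut-offs actually used would be agreed per project.

```python
import csv
import io
from typing import Dict, List, Tuple

Row = Dict[str, object]


def split_de_genes(rows: List[Row], alpha: float = 0.05,
                   min_lfc: float = 1.0) -> Tuple[List[Row], List[Row]]:
    """Split DESeq2-style result rows into (upregulated, downregulated) lists."""
    up, down = [], []
    for r in rows:
        padj = r["padj"]
        if padj is None or padj >= alpha:
            continue  # NA adjusted p-value or not significant
        lfc = r["log2FoldChange"]
        if lfc >= min_lfc:
            up.append(r)
        elif lfc <= -min_lfc:
            down.append(r)
    # Strongest changes first
    up.sort(key=lambda r: r["log2FoldChange"], reverse=True)
    down.sort(key=lambda r: r["log2FoldChange"])
    return up, down


def to_csv(rows: List[Row]) -> str:
    """Write gene, log2FoldChange and padj columns as CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["gene", "log2FoldChange", "padj"],
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```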

Enrichment analysis can be performed if functional annotation is available for the reference. If it is not, the reference can be annotated, which takes extra time, and the annotation file will be supplied.
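The document does not name an enrichment method. A common approach is over-representation analysis, which asks whether a term's genes are over-represented among the differentially expressed genes. It uses a one-sided hypergeometric test. The sketch below shows that test for a single term. The function names are hypothetical, and correction for testing many terms is not shown.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N genes, K annotated to the term, n drawn)."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / comb(N, n)


def term_enrichment(de_genes: set, term_genes: set, universe: set) -> float:
    """One-sided p-value that term_genes are over-represented among de_genes."""
    de = de_genes & universe
    term = term_genes & universe
    overlap = len(de & term)
    return hypergeom_sf(overlap, len(universe), len(term), len(de))
```

The gene universe should be the set of genes that could have been detected, since the choice of background affects the p-values.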

More advanced analyses can be performed upon request and discussion.

Transcriptome generation

In addition to the standard Illumina/Nanopore QC, for transcriptome assembly we can offer trimming of Illumina-based data to remove adapters, low-quality bases and very short reads. The specific parameters are set out in the project_options.sh file. Trimming can be omitted on request.

At extra cost we can trim long-read data and perform de novo or reference-based assembly. Assemblies are assessed with tools such as BUSCO to estimate their completeness.
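BUSCO classifies each near-universal single-copy ortholog from a lineage dataset as complete single-copy (S), complete duplicated (D), fragmented (F) or missing (M). It summarises these categories as percentages, with complete (C) being S + D and n being the number of orthologs searched. The sketch below only shows how those percentages relate to the category counts, which come from BUSCO itself. The function name and the one-decimal rounding are assumptions.

```python
def busco_summary(single: int, duplicated: int, fragmented: int, missing: int) -> str:
    """Format BUSCO-style completeness percentages from category counts."""
    n = single + duplicated + fragmented + missing

    def pct(x: int) -> str:
        return f"{100 * x / n:.1f}%"

    complete = single + duplicated
    return (f"C:{pct(complete)}[S:{pct(single)},D:{pct(duplicated)}],"
            f"F:{pct(fragmented)},M:{pct(missing)},n:{n}")
```

A high duplicated fraction in a transcriptome assembly often reflects isoforms or redundant transcripts rather than true gene duplication, so C is usually read together with D.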

Bioinformatics Analysis for DNA

In addition to the standard Illumina/Nanopore QC, for DNA-based analysis we can offer trimming of Illumina-based data to remove adapters, low-quality bases and very short reads. The specific parameters are set out in the project_options.sh file. Trimming can be omitted on request.

At extra cost we can trim long-read data and perform reference-based or de novo assemblies, using long reads, short reads or both. For Nanopore-based assemblies we can offer several rounds of polishing. If both long-read and short-read data are available, we can polish with both to improve the accuracy of the genome. The deliverable is an unannotated assembly in FASTA format. At additional cost we can annotate the assembly with in silico gene predictions, supplied in CSV or GFF format.
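Contiguity statistics are a common way to summarise a FASTA assembly alongside completeness. The most widely used is N50: the contig length such that contigs of that length or longer contain at least half of the assembled bases. The sketch below computes it from a FASTA file's text. The function names are hypothetical, and the document does not say which statistics are reported.

```python
from typing import Dict, List


def parse_fasta_lengths(text: str) -> List[int]:
    """Return the sequence length of each FASTA record (multi-line records allowed)."""
    lengths: List[int] = []
    current = None
    for line in text.splitlines():
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line.strip())
    if current is not None:
        lengths.append(current)
    return lengths


def assembly_stats(lengths: List[int]) -> Dict[str, int]:
    """Contig count, total length, largest contig and N50."""
    total = sum(lengths)
    running, n50 = 0, 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # reached half of the assembled bases
            n50 = length
            break
    return {"contigs": len(lengths), "total_length": total,
            "largest": max(lengths), "n50": n50}
```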

Upon request we can assist with uploading data to the NCBI archive or other preferred databases.

At additional cost we can call SNPs and other variants in your data with the GATK pipeline, against a high-quality reference of your choice. By agreement a different pipeline can be used, but specifically requested pipelines may incur additional optimisation costs. Alignment BAM files are delivered in the same way as the raw and trimmed reads, and a text file listing variant sites and SNPs is sent with a brief explanation.
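GATK writes variant calls in VCF format. The sketch below tallies SNPs and indels from the data lines of an uncompressed VCF, as a simple illustration of how a text summary of variant sites might be derived. It is not the delivered report format, and the function name is hypothetical. Multi-allelic sites are counted once per ALT allele. Multi-nucleotide substitutions and symbolic or `*` alleles are grouped as "other".

```python
from typing import Dict, List

BASES = {"A", "C", "G", "T"}


def summarise_vcf(lines: List[str]) -> Dict[str, int]:
    """Count SNPs and indels in the data lines of an uncompressed VCF."""
    counts = {"snp": 0, "indel": 0, "other": 0}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4].split(",")  # REF and ALT columns
        for alt in alts:
            if ref in BASES and alt in BASES:
                counts["snp"] += 1
            elif ref.isalpha() and alt.isalpha() and len(ref) != len(alt):
                counts["indel"] += 1
            else:
                counts["other"] += 1
    return counts
```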

More advanced analyses can be performed upon request and discussion.

Bioinformatics Analysis for Amplicons

In addition to the standard Illumina/Nanopore QC, for amplicon projects run on Illumina we can offer primer trimming with an in-house script, based on the primer sequences used, if requested at the quotation stage.
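The in-house primer-trimming script is not shown here. The sketch below illustrates the general idea under stated assumptions. The primer is expected at the 5' end of the read, may contain IUPAC degenerate codes, and a small number of mismatches is tolerated (one by default, an arbitrary choice). The function name is hypothetical. Dedicated tools such as cutadapt handle this more robustly, for example with indels and anchoring options.

```python
from typing import Optional

# IUPAC nucleotide codes and the bases each one matches
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def trim_primer(read: str, primer: str, max_mismatches: int = 1) -> Optional[str]:
    """Remove a 5' primer (IUPAC codes allowed); return None if it is not found."""
    if len(read) < len(primer):
        return None
    mismatches = sum(base not in IUPAC[p] for base, p in zip(read, primer.upper()))
    if mismatches > max_mismatches:
        return None  # primer not found at the 5' end
    return read[len(primer):]
```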

16S/ITS/18S analysis can also be performed at extra cost. The cost depends on the complexity of the project, and the details would be agreed in a meeting with one of the team's bioinformaticians. The pipeline varies with the demands of the project, but we have experience with tools such as QIIME 2, QIIME, DESeq2 and DADA2. Specific packages and pipelines can be used on request.
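Relative abundance and alpha diversity are common outputs of 16S/ITS/18S analyses. QIIME 2 and similar tools compute these themselves. The sketch below only shows the underlying arithmetic for one sample's taxon (or ASV/OTU) count table, which is a hypothetical input. It uses the natural logarithm for the Shannon index; some tools default to a different log base, so values are only comparable within one convention.

```python
import math
from typing import Dict


def relative_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Fraction of the sample's reads assigned to each taxon."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def shannon(counts: Dict[str, int]) -> float:
    """Shannon diversity H' = -sum(p_i * ln p_i), ignoring zero counts."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)
```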

More advanced analyses can be performed upon request and discussion.

Bioinformatics Analysis for Epigenomics

Epigenomic analysis of Illumina-based reads can be performed at additional cost. As we are still establishing and improving our pipeline, we are happy to analyse your data and develop a suitable pipeline for your dataset at no extra optimisation cost.

Trimming of Nanopore-based data can be performed at additional cost. Epigenomic analysis of Nanopore data uses the modified-base calling algorithms built into the Guppy basecaller. The reads are then phased and the methylation calls aggregated to identify methylated regions, with all intermediate and final outputs delivered. As we are still establishing and improving our pipeline, we are happy to analyse your data and develop a suitable pipeline for your dataset at no extra optimisation cost.
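The aggregation step described above can be sketched as follows. Per-read modification probabilities at each site are collapsed into a per-site methylation frequency. How the probabilities are extracted from Guppy's output is not shown, and the input tuples (chromosome, position, probability) are a hypothetical format. The 0.8/0.2 cut-offs for calling a read methylated or unmethylated are assumptions; calls in between are treated as ambiguous and skipped. Phasing and merging sites into regions are not shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Site = Tuple[str, int]


def aggregate_calls(
    calls: Iterable[Tuple[str, int, float]],
    hi: float = 0.8,
    lo: float = 0.2,
) -> Dict[Site, Tuple[int, int, float]]:
    """Collapse per-read modification probabilities into per-site frequencies.

    Returns {(chrom, pos): (n_called, n_methylated, fraction_methylated)}.
    """
    tallies: Dict[Site, List[int]] = defaultdict(lambda: [0, 0])
    for chrom, pos, p in calls:
        if p >= hi:  # confidently methylated read
            tallies[(chrom, pos)][0] += 1
            tallies[(chrom, pos)][1] += 1
        elif p <= lo:  # confidently unmethylated read
            tallies[(chrom, pos)][0] += 1
        # lo < p < hi: ambiguous, not counted
    return {site: (n, m, m / n) for site, (n, m) in sorted(tallies.items())}
```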

More advanced analyses can be performed upon request and discussion.

Bioinformatics Analysis for Metagenomics

For Illumina-based data, the cost depends on the complexity of the project, and the details would be agreed in a meeting with one of the team's bioinformaticians. The pipeline varies with the demands of the project, but we have experience with tools such as Kraken, phyloseq and metaSPAdes. We are happy to use other pipelines and tools upon request and discussion.
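Kraken 2 writes a per-sample report whose standard form has six tab-separated columns. These are: the percentage of reads in the clade, the number of reads in the clade, the number of reads assigned directly to the taxon, a rank code (for example S for species), the NCBI taxonomy ID, and the indented scientific name. The sketch below pulls the species-level rows into a table that could be taken into downstream tools such as phyloseq. The function name and the read-count cut-off are hypothetical, and only the standard six-column report is handled.

```python
from typing import List, Tuple


def species_from_kraken_report(lines: List[str],
                               min_reads: int = 1) -> List[Tuple[str, int, float]]:
    """Return (name, clade_reads, percent) for species-level ('S') rows, most reads first."""
    species = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 6:
            continue  # not a standard six-column report line
        percent, clade_reads, rank, name = fields[0], fields[1], fields[3], fields[5]
        if rank == "S" and int(clade_reads) >= min_reads:
            species.append((name.strip(), int(clade_reads), float(percent)))
    species.sort(key=lambda s: s[1], reverse=True)
    return species
```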