Geneious Prime 2022.1
User Manual

Biomatters Ltd

March 15, 2022


Contents

1 Getting Started
1.1 Downloading & Installing Geneious Prime
1.2 Geneious Prime setup
1.3 Upgrading to new versions
1.4 Licensing
1.5 Troubleshooting

2 The Geneious Prime Main Window
2.1 The Sources Panel
2.2 The Document Table
2.3 The Document Viewer Panel
2.4 The Toolbar
2.5 The Help Panel
2.6 Geneious Prime menu bar options

3 Importing and Exporting Data
3.1 Importing data from the hard drive to your Local folders
3.2 Data input formats
3.3 Importing files from public databases
3.4 Agents
3.5 Exporting files
3.6 Printing and Saving Images

4 Managing Your Local Documents
4.1 Organizing your local documents
4.2 Searching and filtering local documents
4.3 Find Duplicates
4.4 Batch Rename
4.5 Backing up your local documents
4.6 Document History

5 Creating, Viewing and Editing Sequences
5.1 Creating new sequences
5.2 The Sequence Viewer
5.3 Customizable text view for sequences
5.4 Editing sequences
5.5 Complement and Reverse Complement
5.6 Translating sequences
5.7 Viewing chromatograms
5.8 Virtual Gel
5.9 Meta-data

6 Parent / Descendant Tracking
6.1 Editing Linked Parent Documents
6.2 Editing Linked Child Documents
6.3 The Lineage View

7 RNA, DNA and Protein Structure Viewer
7.1 RNA/DNA secondary structure fold viewer
7.2 3D protein structure viewer

8 Working with Annotations
8.1 Viewing, editing and extracting annotations
8.2 Adding annotations
8.3 Compare Annotations

9 Sequence Alignments
9.1 Dotplots
9.2 Sequence Alignments
9.3 Alignment viewing and editing
9.4 Alignment masking
9.5 Consensus sequences

10 Assembly and Mapping
10.1 Supported sequencing platforms
10.2 Read processing
10.3 De novo assembly
10.4 Map to reference
10.5 The Contig Viewer
10.6 Editing Contigs
10.7 Extracting the Consensus

11 Analysis of Assemblies and Alignments
11.1 Finding polymorphisms
11.2 Analyzing Expression Levels

12 Building Phylogenetic Trees
12.1 Phylogenetic tree representation
12.2 Tree building in Geneious Prime
12.3 Tree building methods and models
12.4 Resampling – Bootstrapping and jackknifing
12.5 Viewing and formatting trees

13 Primers
13.1 Design New Primers
13.2 Manual primer design
13.3 Importing primers from a spreadsheet
13.4 Primer Database
13.5 Test with Saved Primers
13.6 Add Primers to Sequence
13.7 Characteristics for Selection
13.8 Convert to Oligo
13.9 Primer Extensions
13.10 Extract PCR Product
13.11 More Information

14 Cloning
14.1 Find Restriction Sites
14.2 Digest into fragments
14.3 Creating a custom enzyme set
14.4 Introduction to the cloning interface
14.5 Restriction Cloning
14.6 Gibson Assembly and In-Fusion Cloning
14.7 Parts Cloning
14.8 Gateway® Cloning
14.9 Golden Gate / Type IIS Cloning
14.10 TOPO® Cloning
14.11 Copy-paste cloning
14.12 Analyze Silent Mutations
14.13 Optimize Codons

15 CRISPR
15.1 CRISPR site finder
15.2 Analyze CRISPR Editing Results

16 BLAST
16.1 Setting up a BLAST search
16.2 BLAST results
16.3 NCBI BLAST
16.4 Custom BLAST

17 Workflows
17.1 Managing Workflows
17.2 Creating and editing Workflows
17.3 Custom code in Workflows

18 Geneious Education
18.1 Creating a tutorial
18.2 Answering a tutorial

19 Saving Operation Settings (Option Profiles)

20 Shared Databases
20.1 Using a Shared Database
20.2 Direct SQL Connection
20.3 Geneious Server Database

21 Command Line Interface
21.1 Running Geneious Prime operations from the command line
21.2 Running operations in the CLI
21.3 Configuring an options profile to automate setting options
21.4 Running workflows from the CLI
21.5 Limitations

22 Advanced Administration
22.1 Default data location
22.2 Change default preferences
22.3 Pre-configuring Shared Database connections
22.4 Pre-configuring license information
22.5 Adding custom plugins to the Plugins menu in Geneious Prime
22.6 Deleting built-in plugins
22.7 Max memory
22.8 Web Linking to Data in Geneious Prime

Chapter 1

Getting Started

The best way to get started with Geneious Prime is to try out some of our tutorials. The Tutorial
option under the Help menu provides an inbuilt tutorial with a basic introduction to the major
features of Geneious Prime. Additional tutorials on specialized functions can be downloaded
from our website https://www.geneious.com/tutorials.

For additional information and help with troubleshooting, please visit the Geneious support
website at https://help.geneious.com.

1.1 Downloading & Installing Geneious Prime

Geneious Prime is free to download from https://www.geneious.com/download. If you
are a first-time user you will be offered a free trial. If you have already purchased a license you
can enter it when Geneious Prime starts up.

To download the latest version of Geneious Prime, go to https://www.geneious.com/download
(or type the address into your internet browser), choose the version you want to download
and click Download.

Geneious Prime can be run on Windows, Mac, or Linux. The following OS versions are supported:

Operating System    Version
Windows             7/8/10/11
Mac OS              10.11 - current
Linux               Ubuntu Desktop LTS, last 2 supported versions

Note: From Geneious Prime 2020 onwards, only 64-bit Windows and Linux operating systems
are supported.

We recommend at least the following specifications for running Geneious Prime (note that
these are minimum requirements - for working with large datasets such as NGS sequences you
will need a higher-spec machine):

• Processor: Intel x86/x86_64

• Memory: 2048MB or more

• Hard-disk: 2GB or more free space

• Video: 1024x768 resolution or higher

Installing on Windows

Download the installer for Windows, then double-click on it to run it and follow the prompts.
The default installation location is in Program Files.

Installing on macOS

Download the installer for macOS. If the disk image does not automatically open, double-click
on it to open it. Drag the Geneious icon to Applications to complete the installation.

Installing on Linux

The installer for Linux is an executable script. To install Geneious Prime, open the Terminal and
use the “cd” command to navigate to the location where you downloaded the installer. Then
to run the installer, type sudo sh ./installer_name.sh

e.g. sudo sh ./Geneious_Prime_linux64_2022_1_1_with_jre.sh

Note that Geneious can be installed without sudo privileges, in which case it is installed into
the user’s home directory. However, in order to activate a license, FLEXnet must be installed
with sudo, using the following command (all on one line):
sudo /home/<username>/Geneious_Prime/licensing_service/install_fnp.sh
"/home/<username>/Geneious_Prime/licensing_service/linux64/FNPLicensingService"

1.2 Geneious Prime setup

1.2.1 User preferences

User preferences can be changed by going to Tools → Preferences. This window can also
be opened using the shortcut keys Ctrl+Shift+P (Windows/Linux) or Command+Shift+P (macOS).
In the user preferences you can change data storage, memory and connection settings,
install plugins, customize the appearance and behaviour, define shortcut keys, and set up
sequencing profiles. Many of these options are explained in more detail in the next sections. To
export all of your user preferences in XML format (e.g. if you wish to transfer the same
preferences to another database), go to File → Export → Geneious Preferences or click the Export
button in the toolbar and choose Export Other → Geneious Preferences.

The tabs in the Preferences window are as follows:

General

This tab contains general setup options:

• Data storage location - shows where your data is stored on your drive (see section 1.2.2)

• Search history - allows you to clear your search history

• Check for new versions - Geneious Prime can check for the release of new versions every
time it is started. If a new version has been released you will be notified with a link
to download it. Geneious Prime will also check for beta versions if you have enabled
this option. A beta version is a version that is released before the official release for the
purposes of testing. It may therefore be less stable than official releases.

• Max memory available to Geneious Prime - allows you to change the RAM allocated to
Geneious Prime (see section 1.2.4)

• Advanced - allows editing of advanced preferences. You should not alter these unless
you know what you are doing.

• Connection settings - set how Geneious Prime connects to the internet (see section 1.2.5)

Plugins and Features

Install plugins and customize features in this tab, see section 1.2.6. This tab can also be accessed
via Tools → Plugins.

Appearance and Behavior

In the Appearance panel you can change the way the main toolbar and the document table
look, show or hide the memory usage bar and folder path (navigation) bar, and turn dark
mode on or off.

In the Behavior panel you can change the way newly created documents are handled, such as
where they should be saved to and whether they are selected straight away. You can choose
whether to store document history, and create active parent/descendant links (see section 6).

Reset questions allows you to reset the questions where you have previously told Geneious
to remember your preference. If you have checked “remember my preference” in a dialog
window, that window will no longer appear. You can click the Reset questions button to get
these windows to appear again.

The Statistics panel (Prime 2021.0 onwards) allows you to set the sequence length threshold
for calculating document table statistics (see section 5.2.9). The Recalculate statistics now but-
ton allows you to add statistics on documents created in earlier versions of Geneious, in the
currently selected folder.

In the Viewer panel you can set whether the same view settings are saved across documents
of the same type, and configure when to show the Dotplot and DNA/RNA fold tabs in the
sequence viewer.

Keyboard

This section contains a list of Geneious Prime functions and allows you to define keyboard
shortcuts for them. Shortcuts that are already defined are highlighted in blue. Setting short-
cuts can help you quickly navigate without using the mouse and also allows you to redefine
shortcuts to ones you may be familiar with from other programs.

Double-click a function to open a window where you can enter a new keyboard shortcut. If
the shortcut is already assigned, Geneious will tell you which function currently uses it.

Keyboard shortcuts can also be added from individual operation dialogs. When the setup
options for a particular operation are opened, click the settings cog in the bottom left of the
window and choose Edit keyboard shortcut to assign or edit a shortcut key for that function.

NCBI

Here you can set the URL for the NCBI BLAST database, enter your NCBI API key if you have
one, and specify which field of the GenBank document should be copied to the “Name” field
in the Geneious document.

Sequencing

This tab has options for the management of trace files and assemblies, allowing you to set
thresholds that classify sequences as low, medium or high quality. To change the default
parameters or set up a new binning profile, select the Default profile, then click Edit/View. The
following options are available:

• Confidence: Set the threshold values of base call confidence used to determine if a base
call is low, medium or high quality. This affects the binning parameters described below
as well as the Base Call Quality color scheme in the Sequence View.

• Sequence binning options: Specifies the requirements for individual traces to be binned
as medium or high quality overall. To see the Bin for a trace, turn on the Bin column
under Table Columns in the View menu.

• Assembly binning options: Specifies the requirements for assembly documents to be
binned as medium or high quality overall. To see the Bin for an assembly, turn on the Bin
column under Table Columns in the View menu.
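To make the idea of confidence thresholds concrete, the following minimal Python sketch classifies individual base call confidence scores as low, medium or high and summarizes a trace. The threshold values (20 and 40) and the function names are assumptions chosen purely for illustration; they are not taken from Geneious Prime's defaults or its internal code.

```python
from collections import Counter

# Hypothetical thresholds for illustration only. The real values are whatever
# you set in the Sequencing tab of the Preferences window.
MEDIUM_THRESHOLD = 20
HIGH_THRESHOLD = 40


def classify_base_call(confidence, medium=MEDIUM_THRESHOLD, high=HIGH_THRESHOLD):
    """Return 'low', 'medium' or 'high' for one base call confidence score."""
    if confidence >= high:
        return "high"
    if confidence >= medium:
        return "medium"
    return "low"


def quality_summary(confidences):
    """Return the fraction of base calls that fall in each confidence class."""
    if not confidences:
        raise ValueError("at least one confidence score is required")
    counts = Counter(classify_base_call(c) for c in confidences)
    total = len(confidences)
    return {label: counts[label] / total for label in ("low", "medium", "high")}
```

A binning rule for a whole trace could then be expressed in terms of these fractions, for example requiring a minimum fraction of high quality base calls before a trace is binned as high quality.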

To create a new profile, adjust the parameters as required, then enter a new Profile name
and click OK.

Other options for managing quality bins are:

• Track binning history in meta-data: When turned on, meta-data will be added to traces
when they are trimmed (see the Properties view tab). This meta-data will then be updated
every time the trace is re-trimmed, maintaining a history of the trimming.

• Enable per folder/document binning: When turned on, the Set Binning Parameters
option is added to the Sequence menu. This allows you to select an individual folder or
set of documents and set the binning parameters to use on those documents instead of
the global ones set in the Preferences.

1.2.2 Choosing where to store your data

Geneious Prime stores your data in a folder called Geneious X.Y Data (where X and Y are
the version of Geneious you are using), which is stored separately from the application itself.
When Geneious Prime first starts up you will be asked to choose a location for this folder. The
default location in the user’s home directory is normally the best option. Although it’s possible
to store your data on a network or USB drive so you can access it from other computers, this
is not recommended because it can have adverse effects on performance. Please do not use
Dropbox or other cloud storage applications to store your data. This may corrupt your data.

To store your data somewhere different to the default, simply click the ‘Select’ button in the
welcome window and choose an empty folder on your drive where you would like to store
your data.

To change the location of your Geneious Prime database at a later date, go to Tools → Prefer-
ences → General. The Data Storage Location shows the current location of your database (see
Figure 1.1). Click the Browse button to select a new location. Geneious will offer to either copy
your existing data across to a new location, or open an empty database at that location. The
new location will be remembered when you exit.

Figure 1.1: Setting the location of your local documents

Note that if you uninstall Geneious Prime your data will not be wiped as the application itself
is stored in a different location from the data folder.

1.2.3 Sharing files or the local database

The best solution for sharing files with other users is to set up a shared database (see Chapter
20) in a central location which all users can connect to. You can also export documents in
.geneious format and transfer these files via a USB or network drive for others to use.

We do not recommend storing databases on Dropbox or network drives as a way of sharing
files.

1.2.4 Changing the memory available to Geneious Prime

Geneious Prime runs in a Java Virtual Machine. When this JVM starts, it will be allocated a
certain amount of RAM and the program can use less than that but never more. An appropriate
amount of RAM based on your total RAM will automatically be allocated initially. To change
the amount of memory allocated, go to Tools→Preferences, and in the General tab increase the
Max memory available to Geneious Prime setting. On Windows you need to run Geneious
Prime as an administrator to change this setting (to do this, right-click on the Geneious icon
and choose “Run as administrator”).

You should never allocate the total memory of your computer to Geneious, as you need to
leave some RAM available for your operating system. As a general rule of thumb, on a 64-
bit machine with up to 8GB of RAM in total you can allocate half the RAM to Geneious. On
machines with more than 8GB RAM you should leave 3-4GB spare for your operating system
and allocate the rest to Geneious.
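The rule of thumb above can be written as a small calculation. This Python sketch is only an illustration: the function name is hypothetical, and the default reserve of 4GB is an assumption taken from the upper end of the 3-4GB range mentioned above.

```python
def recommended_max_memory_gb(total_ram_gb, os_reserve_gb=4.0):
    """Suggest a Max memory setting (in GB) from the rule of thumb above.

    Up to 8GB of total RAM: allocate half to Geneious.
    More than 8GB: leave os_reserve_gb (3-4GB) for the operating system.
    """
    if total_ram_gb <= 0:
        raise ValueError("total_ram_gb must be positive")
    if total_ram_gb <= 8:
        return total_ram_gb / 2
    return total_ram_gb - os_reserve_gb
```

For example, a machine with 16GB of RAM would get a suggestion of 12GB with the default reserve, or 13GB if `os_reserve_gb=3.0` is passed.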

To see how much memory Geneious Prime is using, check the Memory Usage Bar under the
Sources panel. You may need to turn this on under Tools → Preferences → Appearance and
Behavior. Clicking this bar will force a garbage collection and free up memory within the JVM.

1.2.5 Connecting to the internet from within Geneious Prime

In order to activate a license, download plugins and search external databases like NCBI,
Geneious Prime needs to be able to connect to the internet. If you have a firewall preventing
direct access, or are behind a proxy server, you may need to manually configure your connection
settings.

To find your system proxy settings, open your default internet browser and go to the connection
settings:

In Google Chrome: Settings → Advanced → System → Open Proxy Settings
In Safari: Preferences → Advanced → Proxies (Change Settings...)
In Firefox: Preferences → Network Settings
In Edge: Settings → System → Open computer proxy settings

To enter your proxy settings into Geneious go to Tools → Preferences and click on the General
tab.

The options are as follows:

• Use direct connection. Use this setting when no proxy settings are required.

• Use browser connection settings. This will automatically import the proxy settings. This
may not work with all web browsers.

• Use HTTP proxy server. Enter the proxy host and port as specified in your system proxy
settings. Use this if your proxy server is an HTTP proxy server.

• Use SOCKS proxy server. Enter the proxy host and port as specified in your system
proxy settings. Use this if your proxy server is a SOCKS proxy server.

• Use auto config file. If your proxy settings are specified in an automatic proxy configu-
ration (.pac) file, enter the location of the file here.

Note that if your proxy server requires a username and password you can specify these by
clicking the Proxy Password... button.

If there are no proxy settings specified in your browser and you are having trouble connecting,
we suggest you consult your IT department regarding the appropriate settings.

Figure 1.2: Proxy settings in Preferences

1.2.6 Installing plugins and customizing features

You can extend the functionality of Geneious Prime with a variety of plugins. These can be
downloaded from our website, or managed via the plugins and features preferences (Figure
1.3) in Geneious Prime. To access the plugins preferences, go to Tools → Plugins.

The top section of this window lists the plugins that are available for download and are not
already installed. To install a plugin, click the Install button, and for more information
about the plugin, click Info.

If you have downloaded a plugin file from our website (or obtained one from another source
in .gplugin format) you can install it by clicking Install plugin from a gplugin file and brows-
ing to the location of the file. You can also install plugins by dragging the gplugin file into
Geneious.

If you are running Geneious as an administrator then any plugin you install will be installed
for all users on the same computer. If you are not running as an administrator then plugins will
only be installed for the current user account. When upgrading plugins, Geneious may display
a message indicating that it needs to be restarted in order to complete the plugin upgrade. If
Geneious was being run as an administrator, it needs to be restarted as an administrator to
complete the plugin upgrade.

Note that on Windows you cannot drag and drop plugins into Geneious while running as an
administrator.

Installed Plugins lists all the plugins you currently have installed. Click the uninstall button
next to a plugin to remove it.

Figure 1.3: The plugins preferences in Geneious Prime

Other options for managing plugins are:

• Check for plugin updates now: Checks if there are any new versions available for the
plugins you have installed.

• Automatically check for updates to installed plugins: If checked, Geneious will check
for new versions of your installed plugins each time the program is started.

• Tell me when new plugins are released: Changes the way the program notifies you
about new plugin releases.

• Also check for beta releases of plugins: Plugins are sometimes initially released as a beta
for the purposes of testing before the official release. Check this to be notified about the
release of beta plugins.

Customizing features

To see a list of all the features in Geneious Prime, click Customize feature set. Features can
be turned on or off by checking or un-checking the Enabled box next to each feature. For
example, you might like to turn off the Tree Builder and Tree Viewer plugins if you don’t do
phylogenetics.

1.3 Upgrading to new versions

To upgrade existing Geneious installations, simply download and install the new version to the
same location. Your existing data and license will be automatically loaded in the new version.
If you are upgrading to a new major version (e.g. 11.1 to Prime 2019.0, or Prime 2019.0 to
2019.1), Geneious will update your data folder to the new format (creating a new folder with
the upgrade’s version number in the name), and will offer to keep a copy of your old data
folder in case you wish to downgrade.

From Geneious Prime 2021 onwards, updates are automatically downloaded to your data
directory when they become available, and you will be prompted to install the newest version
of Geneious Prime.

1.3.1 Downgrading to an earlier version

If you chose to keep a copy of your old data folder when you upgraded, you can easily
downgrade if you prefer to use the earlier version, or if your license isn’t valid for the latest version.
Before downgrading, uninstall the new version of Geneious so that no remnants of that
installation are left in place. The old version can then be reinstalled, and Geneious will start up
and find the old data folder. For instructions on how to import data from
a newer version into an old version see the section below on File compatibility.

1.3.2 File compatibility

Geneious data files are backwards compatible to version 6.0. Thus, files that were created in
version 6.1 or higher can be exported in .geneious format and opened in any version back
to 6.0. If you are using an earlier version than version 6.0 you won’t be able to open files
in .geneious format that were created in a newer version. Entire Geneious databases are not
backwards compatible, so when you upgrade you should accept the offer to keep a backup of
your existing database. If you then need to downgrade to an earlier version you can swap back
to your old database, and if you are on Geneious 6.0 or above, export any files you changed in
the new version in .geneious format and import them into your old database. If you wish to
export data back into a version earlier than 6.0, you will need to export the files in a common
format such as fasta or genbank.
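The compatibility rule above amounts to a simple version comparison. The Python sketch below illustrates it; the function names and the returned strings are hypothetical and are not part of Geneious Prime.

```python
def parse_version(text):
    """Turn a version string such as '6.0' or '2022.1' into (major, minor)."""
    parts = [int(p) for p in text.split(".")]
    while len(parts) < 2:
        parts.append(0)  # treat '6' as '6.0'
    return tuple(parts[:2])


def export_format_for(target_version):
    """Suggest how to export files so that the target version can open them.

    Versions 6.0 and above can open .geneious files created in newer
    versions; earlier versions need a common format instead.
    """
    if parse_version(target_version) >= (6, 0):
        return ".geneious"
    return "common format such as fasta or genbank"
```

Because Prime releases are numbered by year (e.g. 2019.1), they compare as newer than 6.0 under this scheme, which matches the rule described above.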

1.4 Licensing

Licenses can be managed under the Help menu in Geneious Prime, using the following op-
tions:

1.4.1 Activate License

In this window you can activate a personal or group license or choose to connect to a li-
cense server. An email address must also be entered here for account management purposes
(Geneious Prime 2020.1 and later). Licensing options are as follows:

• License Key. If you have purchased a personal or group license you can enter the details
here to activate it. Make sure you enter the license key exactly as it appears in the email
you received when you purchased the license. An internet connection is required to ac-
tivate personal or group licenses, and you may need to configure your firewall/proxy
settings to enable access to https://licensing.biomatters.com on port 443.

• License Server. If your organization has purchased a floating license administered through
the Geneious Floating License Manager or your organization’s FLEXnet license server,
this is where you enter the details required to connect to the license server. Ask your
system administrator for the host name and port of the license server.

• Use Sassafras KeyServer. If your organization has purchased a floating license
administered through Sassafras KeyServer, select this option. Your system administrator needs
to configure KeyAccess to point to the KeyServer license server.

• Log in to Geneious Account. This option is only available to Geneious Biologics customers.
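
The License Key option above needs outbound access to licensing.biomatters.com on port 443. As a rough way to check this from the same machine before activating, the following Python sketch tries to open a TCP connection. The function name can_reach is hypothetical and is not part of Geneious. It is only a minimal sketch: it does not go through an HTTP proxy, so a failure on a proxied network does not necessarily mean activation will fail.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a direct TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


# Example call (host and port taken from the License Key description):
# can_reach("licensing.biomatters.com", 443)
```

If the call returns False, the firewall or proxy settings mentioned above are a likely place to look.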

Figure 1.4: License activation options in Geneious Prime

1.4.2 Install FLEXnet

This installs the FLEXnet libraries which are necessary for activating a personal or group
license. This is normally installed automatically, but Geneious will tell you if you need to run
this when you activate your license. Only an administrator on your computer can do this but it
only needs to be done once from one user account. Once this has been done, any non-admin
user can activate their license on the machine. The admin should not activate licenses for users
as this may prevent the user from activating the license in their own account.

1.4.3 Borrow Floating License

This item is only available to users of a floating license administered through a FLEXnet
license server. Borrowing a license allows you to take one of the seats of a floating license
so you can use it even when disconnected from the network. Since this decreases the number
of seats available for other users, borrowing can only occur with the authorization of the
system administrator. If your borrowing is approved, the system administrator will provide you
with a “borrow file” authorizing the borrow. To borrow a license, check “Borrow” in the menu,
and navigate to this file when prompted by Geneious. Borrowed licenses have an expiry date,
when they will automatically be returned to the server. If you are finished with the license
before the expiry date, please uncheck “Borrow” in the menu while connected to the network
in which the license server resides, so that the license is returned to the server and is available
to other users again.

1.4.4 Release License

If you no longer need to have Geneious available on a computer where you have activated it,
use this option to release the license so it is available for use on another computer. Licenses can
only be released twice in a 6 month period so do not do it unnecessarily.

If you’re using a floating license, you can release it allowing another user to access it without
you having to shut Geneious down. Once you’ve released the license, Geneious will enter
restricted use mode.

1.4.5 Buy Online

This item will open the Geneious store in your browser, where you can upgrade licenses,
purchase new licenses or contact our sales team.

1.5 Troubleshooting

For help with troubleshooting or to request a feature, please contact the Geneious support team.
You can do this either from within Geneious Prime by going to Help → Contact Support, or by
going to our support website at https://help.geneious.com and submitting a request.
Using the Contact Support option from Geneious will automatically send through some system
information to help us assist you. If you are submitting a request through our website,
please include details of your operating system and the version of Geneious you are using and
as much information as possible about the nature of your problem (including screenshots to
illustrate the issue if appropriate).

The Geneious support website also contains a comprehensive Knowledge Base with solutions
to common problems and tips for getting the most out of Geneious, as well as a User Forum
where you can post questions to the Geneious community.

You can access the support website and download user manuals, license agreements and
release notes from the Help menu in Geneious.
Chapter 2

The Geneious Prime Main Window

Figure 2.1 shows the main Geneious Prime window. This has five important areas or ‘panels’.

Figure 2.1: The main window in Geneious Prime


2.1 The Sources Panel

The Sources Panel displays your stored documents and contains the services Geneious Prime
offers for storing and retrieving data. The arrow (>) symbol indicates that a folder contains
sub-folders. Click this symbol to expand or contract folders.

The Sources Panel allows you to access:

• Your Local Documents.

• NCBI databases - Gene, Genome, Nucleotide, PopSet, Protein, Pubmed, SNP, Structure
and Taxonomy.

• An EMBL database - UniProt.

• Shared databases, if set up.

All these services will be described in detail later in the manual. You can view options for any
selected service with the right mouse button, or by clicking the Options button at the bottom
of the Sources Panel in Mac OS X.

For more information on managing folders in the Sources panel, see section 4.1.

2.2 The Document Table

The Document Table displays summaries of each document in a selected folder or folders,
presented in table form. A local folder may contain any mixture of documents, such as DNA
sequences, protein sequences, journal articles, sequence alignments, and trees (Figure 2.2). The
document types available are listed in Figure 2.3.

For information on how to search and filter documents in the Document Table, see section 4.2.

Selecting a document in the Document Table will display its details in the Document Viewer
Panel. Selecting multiple documents will show a view of all the selected documents if they are
of similar types, e.g. selecting two sequences will show both of them in the sequence view.

The easiest way to select multiple documents is by clicking on the checkboxes down the
left-hand side of the table. Standard keyboard controls can also be used (Shift-click and
Ctrl/command-click).

Above the Document Table you can optionally display a navigation bar which shows the folder
path to whichever document or folder is currently selected. This is off by default but can be
enabled by going to Tools → Preferences → Appearance and Behavior and checking Show
folder path.

Figure 2.2: The document table, when browsing the local folders

Figure 2.3: Standard file types and icons



To view the functions available for any particular document or group of documents, right-click
(Ctrl+click on Mac OS X) on a selection of them. These options vary depending on the type of
document.

Document Table features

Editing. Values can be typed into the columns of the table. This is a useful way of editing the
information in a document. To edit a particular value, first click on the document and then
click on the column which you want to edit. Enter the appropriate new information and press
Enter. Certain columns cannot be edited, however, e.g. the NCBI accession number.

Copying. Column values can be copied. This is a quick method of extracting searchable
information such as an accession number. To copy a value, right-click (Ctrl+click on Mac OS X) on
it, and choose the “Copy name” option, where name is the column name.

Sorting. All columns can be alphabetically, numerically or chronologically sorted, depending
on the data type. To sort by a given column click on its header. If you have different types of
documents in the same folder, click on the “Icon” column to sort them according to their type.

Export. The contents of the Document Table can be exported in .csv or .tsv format, enabling
you to export sequences and their statistics for a set of documents. For further information see
section 3.5.1.

Managing Columns. You can reorder the columns to suit. Click on the column header and
drag it to the desired horizontal position.

You can also choose which columns you want to be visible by right-clicking (Ctrl-click on Mac
OS X) on any column header or by clicking the small header button above the top right
corner of the table. This gives a popup menu with a list of all the available columns. Use the
checkboxes to show or hide the columns you wish to display. Your preference is remembered
so if you turn off a column it will remain hidden in all areas of the program until you show it
again.

This menu also contains the following options to help you manage columns:

• Lock Column Order locks the state of the columns in the current table so that Geneious
will never modify the way the columns are set up. You can still change the columns
yourself however.

• Save Current State... allows you to save the current state of the columns so you can
easily apply it to other tables. You can give the state a name and it will then appear in the
Load Column State... menu.

• Load Column State... contains all of the column states you have saved. Selecting a
column state from here will immediately apply that state to the current table and lock
the columns to maintain the new state. Use Delete Column State... to remove unwanted
column states from this menu.

Note. New columns can be added to the document table by adding Meta-Data to documents
(see section 5.9 - Meta-Data).

2.3 The Document Viewer Panel

The Document Viewer Panel shows the contents of any document clicked on in the Document
Table, allowing you to view sequences, alignments, trees, 3D structures, journal article abstracts
and other types of documents in a graphical or plain text view (Figure 2.4). Options for
controlling the look and layout of a given document are displayed in the right-hand panel. These
options vary depending on what type of document you are viewing. For detailed information
on specific types of viewers, please refer to the sections below:

Sequence/Alignment Viewer - section 5.2
3D Structure Viewer - section 7.2
Dotplot Viewer - section 9.1
Tree Viewer - section 12.5
Journal Article Viewer - section 3.3.3

(a) Nucleotide sequence (b) Phylogenetic tree

Figure 2.4: Document viewers

To view large documents, you can open them in a new window by double clicking.

In the document viewer panel there are three tabs that are common to most types of documents:
Text view, Lineage and Info. Text view shows the document’s information in text format
(see section 5.3). This tab is not available for PDF documents; instead, click the View
Document button or double-click the document to view it. Under the Info tab, you can view
document Properties (meta-data, section 5.9) and History (section 4.6). Further details on the
Lineage tab are given in section 6.3.

2.3.1 General viewer controls

There are several general options which are available on all viewers, which are shown in the
toolbar at the top right of the viewer. Some of these can also be accessed through the View
menu.

Split View: Provides several options for splitting the view so that multiple views are
shown simultaneously for one document. When the view is split, selection of annotations and
regions of the sequence are synchronized across the viewers. To close split views click the
button which is also on the right of the toolbar.

Expand View: Expands the document view panel to fill the main window by hiding the
Sources Panel on the left and the Document Table above. Clicking this again will return the
layout to its original state.

New Window: Opens another view of the current document in a separate window. This
allows you to have several documents open at once and gives more space for viewing. This
can also be achieved by double clicking in the document table.

2.4 The Toolbar

The toolbar contains buttons that provide shortcuts to common functions in Geneious Prime,
including BLAST, Workflows, Align/Assemble, Tree building, Primer Design and Cloning.
The toolbar also provides shortcuts for Adding and Exporting files (see chapter 3), and access-
ing the Help menu options (see section 2.6.7).

The Back and Forward options help you move between previous views and are analogous to
the back and forward buttons in a web browser. To go back and forward between folders you
have recently accessed, use Shift-click.

The toolbar can be customized by right-clicking on it (CTRL-click on Windows/Linux,
command-click on Mac OS X). This gives a popup menu with the following options:

• Show Labels: Turns the text labels on or off.

• Large Icons: Switches between large and small icons.

• Customize: Lists all functions that can be added to the toolbar. Selecting/deselecting
buttons will show/hide the buttons in the toolbar (see Figure 2.5).

Figure 2.5: The Toolbar, including options for customization

Toolbar shortcuts for a particular operation can also be added from within that operation’s
setup dialog. When the setup options for a particular operation are opened, click the settings
cog in the bottom left of the window and choose Show toolbar shortcut to add a button to the
toolbar for that option.

2.4.1 Status bar

Below the Toolbar, there is a grey status bar. This bar displays the status of the currently selected
service. For example, when you are running a search, it displays the number of matches, and
the time remaining for the search to finish.

2.5 The Help Panel

The Help Panel is off by default, and can be opened by clicking the Help icon in the toolbar
and going to Quick Help. The Help Panel has a Help tab and a Tutorial tab (Figure 2.6).

The Help tab provides you with information about the service you are currently using or the
viewer you are currently viewing. The help displayed in the help tab changes as you click on
different services and choose different viewers.

The Tutorial is aimed at first-time users of Geneious Prime. It is highly recommended that you
work through the tutorial if you haven’t used Geneious Prime before.

The Help panel can be closed at any time by clicking the button in its top corner, or by toggling
the Quick Help option in the Help menu.

2.6 Geneious Prime menu bar options

All of the functions in Geneious Prime can be accessed from the menu bar above the Toolbar.
This is split into seven main menus: File, Edit, View, Tools, Sequence, Annotate & Predict and
Help.

Figure 2.6: The Help Panel



2.6.1 File Menu

This contains standard options for managing files, including creating new folders, sequences,
enzyme lists and text documents, renaming and moving folders, import/export (see chapter 3),
deleting, saving and backing up files (see 4.5). It also contains options for printing and saving
image files (see 3.6.2).

2.6.2 Edit Menu

This menu contains the standard editing functions for transferring information from within
documents to other locations, both within and outside of Geneious Prime: Cut, Copy, Paste,
Delete and Select All.

This menu also contains options for finding and renaming documents and their contents:

• Find in Document can be used to find text or numbers in a selected document, see section
4.2.4.

• Find Next and Find Previous finds the next or previous match for the text you specified
in the Find in Document dialog.

• Find Duplicates, see section 4.3.

• Batch Rename, see section 4.4.

• Go to base/residue, see section 5.2.2.

2.6.3 View Menu

This contains several options and commands for changing the way you view data in Geneious:

• Back, Forwards and History allow you to return to documents you had selected previously.

• Search is discussed in section 4.2.

• Agents are discussed in section 3.4.

• Next unread document selects the next document in the current folder which is unread.

• Table Columns contains the same functionality as the popup menu for the document
table header. See section 2.2 for more details.

• Open document in new window opens a new window with a view of the currently
selected document(s).

• Expand document view expands the document viewer panel in the main window out to
fill the entire main window. Select this again to return to normal.

• Split Viewer Left/Right creates a second copy of the document viewer with the two
views laid out side by side.

• Split Viewer Top/Bottom creates a second copy of the document viewer with one on top
of the other.

• Document Windows Lists the currently open document windows. Selecting one from
this menu will bring that document window to the front.

2.6.4 Tools Menu

• Align/Assemble - see section 9.2 and section 10 respectively

• Tree - see section 12

• Primers - see section 13

• Cloning - see section 14

• BLAST - Perform a BLAST search (such as NCBI Blast) to find sequences that are similar
to the currently selected sequence(s). See section 16

• Add/Remove Databases - options for setting up and configuring NCBI and custom BLAST,
see section 16.4.3.

• Extract Annotations - see section 8.1.4.

• Mask Alignment - see section 9.4.

• Concatenate Sequences or Alignments - see section 5.4.1.

• Generate Consensus Sequence - see section 9.5.

• Workflows - access built-in Workflows and create new ones, see chapter 17.

• Plugins - Takes you to the Plugins menu where you can install or uninstall plugins.

• Preferences - see section 1.2.1

Many plugin options will also appear in this menu when installed, such as Classify Sequences
(Sequence Classifier plugin), and Submit to Genbank (Genbank submission plugin)

2.6.5 Sequence Menu

This contains several operations for manipulating nucleotide and protein sequences, including
processing NGS reads prior to assembly.

• New Sequence: Create a new nucleotide or protein sequence (including oligos) from
residues that you can paste or type in. See section 5.1.
• Extract Region: Extract the selected part of a sequence or alignment into a new document.
• Reverse Complement: Reverse sequence direction and replace each base by its
complement. See section 5.5.
• Translate: Creates a new protein document from the translated DNA, see section 5.6.
• Back Translate: Creates nucleotide version of the selected protein document, see section
5.6.2.
• Circular Sequences: Sets whether the currently selected sequences are circular. This
affects the way the sequence view displays them as well as how certain operations deal
with the sequences (e.g. digestion). See section 5.2.3.
• Free End Gaps Alignment: Sets whether the currently selected alignment has free end
gaps. This affects calculation of the consensus sequences and statistics.
• Change Residue Numbering...: Changes the base numbering of the selected sequence.
On a circular sequence this function can be used to shift the origin of the sequence to a
different location. On a linear sequence this can be used to indicate that the sequence is
a subsequence of a larger sequence; to number a sequence with respect to a particular
location (i.e. make the start of a gene base 1); or to reverse the numbering of a sequence.
This will introduce two numbering systems into your sequence: the original numbering
(Standard), and the numbering that you have specified (Source).
• Convert between DNA and RNA: Changes all T’s in a sequence to U’s or vice versa,
depending on the type of the selected sequence. Once this is performed, click “Save” in
the Sequence View to make the change permanent.
• Set Read Direction: Marks sequences as forward or reverse reads so the correct reads are
reverse complemented by assembly.
• Set Read Technology: Specifies the sequencing platform used to generate sequence reads.
See section 10.1.
• Set Paired Reads: Sets up paired reads for assembly. See section 10.2.1.
• Merge Paired Reads: Merges paired reads using BBMerge, see section 10.2.3.
• Remove Duplicate Reads: Uses Dedupe to remove duplicate sequences from NGS datasets,
see section 10.2.4.

• Error Correct and Normalize: Uses BBNorm to error correct and normalize NGS reads,
see section 10.2.6.

• Separate Reads by Barcode separates multiplex or barcode data (e.g. 454 MID data). See
section 10.2.7

• Group Sequences into a List creates a sequence list containing copies of all of the selected
sequences. See section 5.1.1.

• Extract Sequence from List copies each sequence out of a sequence list into a separate
sequence document.
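
To make a few of the Sequence menu operations above concrete, here is a minimal Python sketch of what Reverse Complement, Convert between DNA and RNA, and an origin shift on a circular sequence do to the underlying string. The function names are hypothetical and this is not how Geneious implements them. The sketch assumes unambiguous bases only (no IUPAC ambiguity codes) and ignores annotations and qualities, which Geneious also has to carry along.

```python
# Complement table for unambiguous DNA bases only (assumption of this sketch).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Complement every base, then reverse the direction of the sequence."""
    return dna.translate(_COMPLEMENT)[::-1]


def dna_to_rna(seq: str) -> str:
    """Change all T's to U's."""
    return seq.replace("T", "U").replace("t", "u")


def rna_to_dna(seq: str) -> str:
    """Change all U's to T's."""
    return seq.replace("U", "T").replace("u", "t")


def shift_origin(circular_seq: str, new_first_base: int) -> str:
    """Rotate a circular sequence so that the given 1-based position becomes base 1."""
    i = (new_first_base - 1) % len(circular_seq)
    return circular_seq[i:] + circular_seq[:i]
```

For example, reverse_complement("ATGC") gives "GCAT", and shift_origin("ACGTT", 3) gives "GTTAC".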

2.6.6 Annotate & Predict Menu

This menu contains many tools for finding, predicting and annotating regions of interest in
sequences and alignments. Plugins that involve sequence prediction and annotation will also
appear in this menu when installed.

• Trim Ends: Trims vectors, primers and/or poor quality sequence. See section 10.2.2.

• Transfer Annotations: Copies annotations to the reference and/or consensus sequence
of an alignment or assembly. See section 8.2.3.

• Annotate from Database: Annotates sequences with similar annotations from your database.
See section 8.2.4.

• Compare Annotations: Compares annotations across up to 3 annotation tracks or
documents. See section 8.3.

• Find ORFs: Finds all open reading frames in a sequence and annotates them.

• Find Motifs: Searches for motifs in PROSITE format. Uses “fuzznuc” and “fuzzpro” from
EMBOSS.

• Find CRISPR sites: Searches for gRNA (CRISPR) sites and scores them based on on-target
sequence features and off-target interactions. See section 15.1.

• Analyze CRISPR editing results: Measures frequencies of common variants from your
CRISPR experiment by mapping reads to the target sequence. See section 15.2.

• Find Variations/SNPs: Finds variable positions in assemblies and alignments. See section
11.1.1.

• Find Low/High Coverage: Finds regions with low or high read coverage in assemblies.
See section 10.5.

• Calculate Expression Levels: Calculates expression levels (RPKM, FPKM, TPM) for a
single sample. See section 11.2.

• Compare Expression Levels: Compares expression levels between two samples. See
section 11.2.
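
The expression measures named above (RPKM and TPM) can be illustrated with their commonly used textbook definitions. The Python sketch below is an assumption-laden illustration, not Geneious code: the function names are hypothetical, and the exact way Geneious counts reads (for example, reads that map ambiguously) is described in section 11.2 and may differ in detail. FPKM follows the same formula as RPKM with fragments counted instead of reads.

```python
def rpkm(read_count: float, gene_length_bp: float, total_mapped_reads: float) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def tpm(read_counts: list, gene_lengths_bp: list) -> list:
    """Transcripts per million: length-normalized rates scaled to sum to 1e6."""
    rates = [count / length for count, length in zip(read_counts, gene_lengths_bp)]
    total = sum(rates)
    return [rate / total * 1e6 for rate in rates]
```

With 100 reads on a 1,000 bp gene and 1,000,000 mapped reads in total, rpkm returns 100.0. TPM values always sum to one million within a sample, which is why they are often easier to compare between samples than RPKM.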

2.6.7 Help Menu

This consists of the standard Help options offered by Geneious Prime. This menu is also
accessible via the Help button in the Toolbar.

• Quick Help shows and hides the Help panel

• Contact Support allows you to contact the Geneious Support team through Geneious
Prime

• Tutorial shows and hides the Tutorial panel

• User Manual opens the online user manual

• Support Website takes you to help.geneious.com

• Other Resources takes you to videos, tutorials, license agreement, release notes etc

• Check for Updates checks for new versions of Geneious Prime

• Activate License lets you activate a license or connect to a license server

• Install FLEXnet installs the FLEXnet licensing service which is necessary to use FLEXnet
licenses

• Borrow Floating License lets you borrow a license from a FLEXnet server, if the
maintainer of the server has provided you with a Borrow File

• Release Licenses releases any floating license you are currently holding and returns any
local FLEXnet licenses to our server so they can be activated on a different machine

• Buy Online sends you to our online store

• About Geneious Prime gives details about the version of Geneious Prime you are
running, and licensing information
Chapter 3

Importing and Exporting Data

Geneious Prime is able to import raw data from different applications and export the results
in a range of formats. All import and export options can be accessed via the Add and
Export buttons in the Toolbar, or via the File menu.

3.1 Importing data from the hard drive to your Local folders

To import files from local disks or network drives, click the Add button in the Toolbar and
select Import Files, or go to File → Import → Files.... This will open up a window where you
can either select the file format or let Geneious autodetect the format. The different file formats
that Geneious can import are described in detail in the next section.

Files can also be dragged and dropped from your hard drive directly into Geneious and the file
type will automatically be determined.

Files imported from disk are imported directly into the currently selected local folder within
Geneious. If no folder is selected, you will be prompted to choose a folder during the import.

3.1.1 Bulk import of multiple files

To import an entire folder and all its subfolders and files into Geneious Prime in one step, click
the Add button and choose Import Folder, or go to File → Import → Folder.... If the folder has
subfolders, the folder structure will be retained when it is imported into Geneious. In version
10.1 and above zip files containing multiple files and subfolders can also be imported.

In version 11.1 onwards, Geneious supports bulk import of a mixture of SAM, BAM, GFF, BED,
VCF and FASTA formatted files, allowing sequence, annotation and assembly information to be
imported in a single step. Any combination of these files can be selected and then dragged and
dropped into Geneious. The reference sequence will be loaded first, followed by the annotation
and assembly files. Sequence IDs in the files must match for the import to proceed correctly. If
no reference sequence is present in the imported documents, you will be prompted to select a
reference from existing documents in your database, or to load the data onto a blank sequence.
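
Because the sequence IDs must match across the files, it can be worth checking them before a bulk import. The Python sketch below lists IDs that appear in a tab-delimited annotation file (GFF, BED or VCF) but not in the FASTA reference. It is only a sketch under stated assumptions: the function names are hypothetical, the FASTA ID is taken to be the first whitespace-delimited word of the header line, and the annotation ID is taken to be the first tab-separated column. The exact matching rule Geneious applies is not stated here.

```python
def fasta_ids(fasta_text: str) -> set:
    """Collect the first word of every '>' header line."""
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def annotation_ids(tabular_text: str) -> set:
    """Collect column 1 of every data line in a GFF, BED or VCF file."""
    ids = set()
    for line in tabular_text.splitlines():
        # Skip blank lines, comment lines and BED track/browser lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        ids.add(line.split("\t")[0])
    return ids


def unmatched_ids(fasta_text: str, tabular_text: str) -> set:
    """IDs used by the annotations that have no sequence in the FASTA file."""
    return annotation_ids(tabular_text) - fasta_ids(fasta_text)
```

An empty result suggests the names line up. Any ID returned is one that would need to be renamed in one of the files before importing them together.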

3.2 Data input formats

Geneious Prime version 2022.1 can import the following file formats:

Format | Extensions | Data types | Common sources

BED | *.bed | Annotations | UCSC
Common Assembly Format | *.caf | Contigs | Sequencher
Clone Manager molecule | *.cm5 | Sequences and annotations | Clone Manager
Clustal | *.aln | Alignments | ClustalX
CSFASTA | *.csfasta | Color space FASTA | ABI SOLiD
Comma/Tab Separated Values | *.csv, *.tsv | Spreadsheet files | Microsoft Excel
DNAStar | *.seq, *.pro | Nucleotide & protein sequences | DNAStar
DNA Strider | *.str | Sequences | DNA Strider (Mac program), ApE
Embl/UniProt | *.embl, *.swp | Sequences | Embl, UniProt
EMBOSS codon usage table | *.cusp, *.cut | Codon usage table | EMBOSS cusp tool
Endnote (8.0 or 9.0) XML | *.xml | Journal article references | Endnote, Journal article websites
Excel spreadsheet | *.xlsx, *.xls | Spreadsheet files | Microsoft Excel
FASTA | *.fasta, *.fas, *fasta.gz etc. | Sequences, alignments | PAUP*, ClustalX, BLAST, FASTA
FASTQ | *.fastq, *.fq, *fastq.gz etc. | Sequences with quality | Illumina and other NGS sequencers
GCG | *.seq | Sequences | GCG
GCG codon usage table | *.cod | Codon usage table | GCG CodonFrequency tool, https://www.kazusa.or.jp/codon/
GenBank | *.gb, *.xml | Nucleotide & protein sequences | GenBank
Geneious | *.xml, *.geneious | Preferences, databases | Geneious
Geneious Education | *.tutorial.zip | Tutorial, assignment etc. | Geneious
GFF, GFF3, GTF | *.gff, *.gff3, *.gtf | Annotations | NCBI, Ensembl and other genome browsers
MEGA | *.meg | Alignments | MEGA
Molecular structure | *.pdb, *.mol, *.xyz, *.cml, *.gpr, *.hin, *.nwo | 3D molecular structures | 3D structure databases and programs
Newick | *.tre, *.tree, etc. | Phylogenetic trees | PHYLIP, Tree-Puzzle, PAUP*, ClustalX
Nexus | *.nxs, *.nex | Trees, Alignments | PAUP*, Mesquite, MrBayes & MacClade
PDB | *.pdb | 3D Protein structures | SP3, SP2, SPARKS, Protein Data Bank
PDF | *.pdf | Documents, presentations | Adobe Writer, LaTeX, MiKTeX
Phrap ACE | *.ace | Contig assemblies | Phrap/Consed
PileUp | *.msf | Alignments | pileup (gcg)
PIR/NBRF | *.pir | Sequences, alignments | NBRF PIR
Qual | *.qual | Quality file | Associated with a FASTA file
Raw sequence text | *.seq | Sequences | Any file that contains only a sequence
Rich Sequence Format | *.rsf | Sequences, alignments | GCG’s NetFetch
SAM/BAM | *.sam, *.bam | Contigs | SAMtools
Sequence Chromatograms | *.ab1, *.scf | Raw sequencing trace & sequence | Sequencing machines
SnapGene sequence | *.dna, *.prot | Sequences and annotations | SnapGene
Text/html | *.txt, *.rtf, *.html | Any text | Simple text editors
VCF | *.vcf | Annotations | 1000 Genomes Project
Vector NTI sequence | *.gb, *.gp | Nucleotide & protein sequences | Vector NTI
Vector NTI/AlignX alignment | *.apr | Alignments | Vector NTI, AlignX
Vector NTI Archive | *.ma4, *.pa4, *.oa4, *.ea4, *.ca6 | Nucleotide & protein sequences, enzyme sets and publications | Vector NTI
Vector NTI/ContigExpress | *.cep | Nucleotide sequence assemblies | Vector NTI
Vector NTI database | VNTI Database | Nucleotide & protein sequences, enzyme sets and publications | Vector NTI

BED annotations

The BED format contains sequence annotation information. You can use a BED file to
annotate existing sequences in your local database, import entirely new sequences, or import the
annotations onto blank sequences.

Clone Manager

Geneious can import annotated sequence files in the standard Clone Manager molecule format
.cm5. This will import name, description, topology, sequence and annotations. Currently it
does not import other fields, restriction cut sites or primer binding sites.

Other Clone Manager formats such as .cx5 and .pd4 are not currently supported for import.

CLUSTAL alignment

The Clustal format is used by the well known multiple sequence alignment programs ClustalW,
ClustalX and Clustal Omega.

Clustal format files are used to store multiple sequence alignments and contain the word
CLUSTAL at the beginning. An example Clustal file:

CLUSTAL W (1.74) multiple sequence alignment

HQ625570 MRVMGMWRNYPQWWIWGILGLWM--ICSVVGKLWVTVYYGVPVWTDAKATLFCASDAKAY
HQ625589 MRVKGRSRNYPQWWVWGILGFWMFMICNGVGNRWVTVYYGVPVWKEAKATLFCASDAKAY
HQ625572 MRVKGILKNYQQWWIWVILGFWMLMICNVVGNQWVTVYYGVPVWREAKATLFCASDAKAY
HQ625588 MRVMGKWRNCQQWWIWGILGFWIILICN-AEQLWVTVYYGVPVWKEAKTTLFCASDAKAY
HQ625568 MRVRGTQRNWPQWWIWTSLGFWIILMCR--GNLWVTVYYGVPVWTDAKTTLFCASDAKAY
HQ625581 MRVMGIPRNWPQWWIWGILGFWIMLMCRVEENSWVTVYYGVPVWKEATTTLFCASDAKAY
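
The structure of a Clustal file like the one above (a header line starting with the word CLUSTAL, followed by blocks of name and sequence pairs) can be read with a few lines of Python. This is a minimal sketch, not the importer Geneious uses. The function name parse_clustal is hypothetical, and the sketch assumes sequence names contain no spaces and ignores the optional conservation lines and trailing residue counts beyond the second column.

```python
def parse_clustal(text: str) -> dict:
    """Return {sequence name: aligned sequence} from Clustal-format text."""
    lines = text.splitlines()
    if not lines or not lines[0].upper().startswith("CLUSTAL"):
        raise ValueError("not a Clustal file: first line must start with 'CLUSTAL'")
    alignment = {}
    for line in lines[1:]:
        # Blank lines separate blocks; conservation lines start with whitespace.
        if not line.strip() or line[0].isspace():
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        name, block = parts[0], parts[1]
        # The same name reappears in each block, so append to what we have.
        alignment[name] = alignment.get(name, "") + block
    return alignment
```

Every value in the returned dictionary should have the same length, since all rows of an alignment span the same columns.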

CSFASTA format

ABI .csfasta files represent the color calls generated by the SOLiD sequencing system.

CSV/TSV (Comma/Tab Separated Values) and Excel spreadsheet files

Sequences, primers and metadata information stored in spreadsheets can be imported into
Geneious from .csv, .tsv, .xlsx or .xls files. For files containing sequences, including nucleotides,
proteins, primers or probes, Geneious will create a new document containing the sequence and
any additional fields chosen for import. For more information on importing primers from a
spreadsheet, see the PCR Primers section. Files containing only metadata can be imported onto
existing sequences in Geneious; see section 3.2.1 for details.

DNAStar sequences

DNAStar .seq and .pro files are used in Lasergene, a sequence analysis tool produced by
DNAStar.

DNA Strider sequences

Sequence files generated by the Mac program DNA Strider, containing one Nucleotide or
Protein sequence.

EMBL/Swiss-Prot sequences

Nucleotide sequences from the EMBL Nucleotide Sequence Database, and protein sequences
from UniProt (the Universal Protein Resource).

EndNote 8.0/9.0 XML

EndNote is a popular reference and bibliography manager. EndNote lets you search for journal
articles online, import citations, perform searches on your own notes, and insert references into
documents. It also generates a bibliography in different styles. Geneious can interoperate with
EndNote using Endnote’s XML (Extensible Markup Language) file format to export and import
its files.

FASTA sequences

The FASTA file format is commonly used by many programs and tools, including BLAST,
T-Coffee and ClustalX. Each sequence in a FASTA file has a header line beginning with a “>”
followed by a number of lines containing the raw protein or DNA sequence data. The sequence
data may span multiple lines and these sequences may contain gap characters. An empty line
may or may not separate consecutive sequences. Here is an example of three sequences in
FASTA format (DNA, Protein, Aligned DNA):

>Orangutan
ATGGCTTGTGGTCTGGTCGCCAGCAACCTGAATCTCAAACCTGGAGAGTGCCTTCGAGTG

>gi|532319|pir|TVFV2E|TVFV2E envelope protein


ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLMNTTVTTGLLLNGSYSENRT
QIWQK
3.2. DATA INPUT FORMATS 39

>Chicken
CTACCCCCCTAAAACACTTTGAAGCCTGATCCTCACTA------------------CTGT
CATCTTAA
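
As an illustration of the layout described above, the following minimal Python sketch reads FASTA text into header and sequence pairs. It is not part of Geneious; the function name parse_fasta and the shortened example sequences are assumptions made for this example.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # an empty line between sequences is optional
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence data may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Shortened, made-up example in the same style as the sequences above.
example = """>Orangutan
ATGGCTTGTGGTCTGG

>Chicken
CTACCC----CTGT
CATCTTAA
"""

records = parse_fasta(example)
```

Blank lines are skipped and multi-line sequence data is joined, so a file gives the same result with or without empty lines between sequences. Gap characters are kept as part of the sequence.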

FASTQ sequences

FASTQ format stores sequences and Phred qualities in a single file. This format should typically be
used to import NGS sequence data, e.g. from Illumina, Ion Torrent, 454 and PacBio sequencers.
From R11 onwards, you can set read technology and pair reads as part of the FASTQ import pro-
cess. Note that the native HDF5 file format from PacBio and Oxford Nanopore is not supported
and must be converted to FASTQ for import into Geneious.
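
To show how sequences and Phred qualities sit together in one file, here is a minimal Python sketch that reads four-line FASTQ records and decodes the quality characters. It assumes the common four-line record layout and a quality offset of 33 (some older files use a different offset); the function name parse_fastq and the example read are made up for illustration and do not describe how Geneious reads these files.

```python
def parse_fastq(text, offset=33):
    """Yield (name, sequence, qualities) from four-line FASTQ records."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    for i in range(0, len(lines) - 3, 4):
        name_line, seq, plus, qual = lines[i:i + 4]
        if not name_line.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality strings differ in length")
        # Each quality character encodes one Phred score as ASCII code minus offset.
        yield name_line[1:], seq, [ord(c) - offset for c in qual]


example = "@read1\nACGT\n+\nII5!\n"
reads = list(parse_fastq(example))
```

With an offset of 33, the character "I" decodes to a Phred score of 40 and "!" to a score of 0.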

GenBank sequences

Records retrieved from the NCBI website (http://www.ncbi.nlm.nih.gov) can be saved
in a number of formats. Records saved in GenBank or INSDSeq XML formats can be imported
into Geneious.

Geneious format

The Geneious format can be used to store all your local documents, meta-data types and pro-
gram preferences. A file in Geneious format will usually have a .geneious extension or a
.xml extension. This format is useful for sharing documents with other Geneious users and
backing up your Geneious data.

Geneious tutorial

This is an archive containing a whole bundle of files which together comprise a Geneious ed-
ucation document. This format can be used to create assignments for your students, bioinfor-
matics tutorials, and much more. See chapter 18 for information on how to create such files.

GFF annotations

The GFF format contains sequence annotation information (and optional sequences). You can
use a GFF file to annotate existing sequences in your local database, import entirely new se-
quences, or import the annotations onto blank sequences. Geneious also supports GFF3 and
GTF formats.
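
For readers who want to check a GFF file before import, the sketch below splits one annotation line into its fields. It assumes the nine tab-separated columns of the GFF3 specification; the names GffFeature and parse_gff3_line and the example line are hypothetical and are not part of Geneious.

```python
from collections import namedtuple

GffFeature = namedtuple(
    "GffFeature",
    "seqid source type start end score strand phase attributes",
)


def parse_gff3_line(line):
    """Split one GFF3 feature line into its nine columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, found {len(fields)}")
    attributes = {}
    for pair in fields[8].split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[key.strip()] = value.strip()
    return GffFeature(
        fields[0], fields[1], fields[2],
        int(fields[3]), int(fields[4]),  # start and end are 1-based coordinates
        fields[5], fields[6], fields[7], attributes,
    )


feature = parse_gff3_line("chr1\texample\tgene\t100\t900\t.\t+\t.\tID=gene1;Name=cytB")
```

The seqid column is the value that ties an annotation to a sequence, so it is the first thing to check if annotations do not appear on the expected sequence.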

MEGA alignment

The MEGA format is used by MEGA (Molecular Evolutionary Genetics Analysis).

Molecular structure

Geneious imports a range of molecular structure formats. These formats support showing the
locations of the atoms in a molecule in 3D:

• PDB format files from the Research Collaboratory for Structural Bioinformatics (RCSB)
Protein Data Bank
• *.mol format files produced by MDL Information Systems Inc
• *.xyz format files produced by XMol
• *.cml format files in Chemical Markup Language
• *.gpr format Ghemical files
• *.hin format files produced by HyperChem
• *.nwo format files produced by NWChem

Newick tree

The Newick format is commonly used to represent phylogenetic trees (such as those inferred
from multiple sequence alignments). Newick trees use pairs of parentheses to group related
taxa, separated by a comma (,). Some trees include numbers (branch lengths) that indicate the
distance on the evolutionary tree from that taxon to its most recent ancestor. If these branch
lengths are present they are prefixed with a colon (:). The Newick format is produced by phy-
logeny programs such as PHYLIP, PAUP*, Tree-Puzzle and PHYML. Geneious can import and
export trees (including bootstrap values and branch lengths) in Newick format.
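
The parentheses, commas and colons described above can be seen in a small example. The following Python sketch lists the taxa and branch lengths found in a made-up Newick string. It is a simplified illustration only: it does not handle quoted labels or comments, and the name newick_leaves is hypothetical.

```python
import re

# A leaf label directly follows "(" or ","; an optional ":length" may follow it.
LEAF = re.compile(r"(?<=[(,])\s*([^(),:;\s]+)(?::([0-9.eE+-]+))?")


def newick_leaves(newick):
    """Return (taxon, branch_length) pairs for the leaves of a Newick string."""
    return [
        (name, float(length) if length else None)
        for name, length in LEAF.findall(newick)
    ]


# Made-up tree: C and D are grouped together, and that group has length 0.5.
tree = "(A:0.1,B:0.2,(C:0.3,D:0.4):0.5);"
leaves = newick_leaves(tree)
```

Values that follow a closing parenthesis, such as the 0.5 above, belong to internal nodes and are not reported by this sketch.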

Nexus tree

The Nexus format was designed to standardize the exchange of phylogenetic data, including
sequences, trees, distance matrices and so on. The format is composed of a number of blocks
such as TAXA, TREES and CHARACTERS. Each block contains pre-defined fields. Geneious
imports and exports files in Nexus format, and can process the information stored in them for
analysis.

If you want to export a tree in a format that preserves bootstrap values make sure you export
with metacomments enabled, otherwise the bootstraps will be lost.

PDB structure

Protein Data Bank files contain a list of XYZ co-ordinates that describe the position of atoms in
a protein. These are then used to generate a 3D model which is usually viewed with RasMol
or SPDB viewer. Geneious can read PDB format files and display an interactive 3D view of the
protein structure, including support for displaying the protein’s secondary structure when the
appropriate information is available.

PDF

PDF stands for Portable Document Format and is developed and distributed by Adobe Sys-
tems. It contains the entire description of a document including text, fonts, graphics, colors,
links and images. The advantage of PDF files is that they look the same regardless of the
software used to create them. Some word processors are able to export a document into PDF
format. Alternatively, Adobe Writer can be used. You can use Geneious to read, store and open
PDF files.

ACE/PHRAP assembly

Ace is the format used by the Phrap/Consed package, created by the University of Washington
Genome Center. This package is used mainly to assemble sequences.

GCG PileUp alignment

The PileUp format is used by the pileup program, a part of the Genetics Computer Group
(GCG) Wisconsin Package.

PIR/NBRF sequences

Format used by the Protein Information Resource, a database established by the National
Biomedical Research Foundation.

Qual quality/Phred scores

A quality file, which must be in the same folder as the sequence file (FASTA format) for the
quality scores to be used.

RSF rich sequences

RSF (Rich Sequence Format) files contain one or more sequences that may or may not be related.
In addition to the sequence data, each sequence can be annotated with descriptive sequence
information.

SAM/BAM alignment

SAM and BAM formats are produced and used by SAMtools. SAM/BAM files contain the
results of an assembly in the form of reads and their mappings to reference sequences.
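
To illustrate how a read and its mapping are stored, the sketch below reads one alignment line of a text SAM file. It assumes the eleven mandatory tab-separated fields and the flag bits 0x4 (unmapped) and 0x10 (reverse strand) defined in the SAM specification; the function name parse_sam_line and the example read are made up and are not part of Geneious.

```python
def parse_sam_line(line):
    """Parse the mandatory fields of one SAM alignment line into a dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    flag = int(fields[1])
    return {
        "read": fields[0],
        "reference": fields[2],
        "position": int(fields[3]),        # 1-based leftmost mapping position
        "mapping_quality": int(fields[4]),
        "cigar": fields[5],
        "sequence": fields[9],
        "unmapped": bool(flag & 0x4),      # flag bit 0x4: read is unmapped
        "reverse": bool(flag & 0x10),      # flag bit 0x10: read maps to the reverse strand
    }


record = parse_sam_line("read1\t16\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII")
```

BAM files hold the same information in a compressed binary form and cannot be read line by line in this way.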

Sequence Chromatograms

Sequence chromatogram documents contain the results of a sequencing run (the trace) and a
guess at the sequence data (base calling).

Informally, the trace is a graph showing the concentration of each nucleotide against sequence
positions. Base calling software detects peaks in the four traces and assigns the most probable
base at more or less even intervals.

SnapGene

Geneious can import annotated DNA sequence files in .dna format and protein sequences in
.prot format from SnapGene. Note that for nucleotide sequences longer than 65,536 bases,
restriction sites are not imported automatically, but you will be asked if you wish to import
them as an enzyme set. You can then re-annotate the sites onto the sequence using “Find
restriction sites”.

Text/HTML

Plain text files and simple HTML can be imported and displayed. HTML is a widely used
markup language that can apply format and structure to text, and will be interpreted by the
sequence viewer. In Geneious R10 and above, text files can also be created and edited in
Geneious; see section 4.1.4.

Unformatted sequence

A file containing only a sequence.


VCF variant calls

The VCF format contains sequence annotation information. You can use a VCF file to anno-
tate existing sequences in your local database, import entirely new sequences, or import the
annotations onto blank sequences.

Vector NTI®

In addition to the import of whole VNTI databases (section 3.2.2), Geneious supports the im-
port of several Vector NTI file formats:

• *.gb and *.gp formats These formats are used in Vector NTI for saving single nucleotide
and protein sequence documents. They are very similar to the GenBank formats with the
same extensions, although they contain some extra information.
• *.apr format This format is used for storing alignments and trees made with AlignX,
Vector NTI’s alignment module.
• *.ma4, *.pa4, *.oa4, *.ea4 and *.ca6 formats These are the archive formats which Vector
NTI uses to export whole databases.
• *.cep format This format is produced by the ContigExpress module and Geneious will
import sequences (including the positions of the base calls), traces, qualities, trimmed
regions, annotations and editing history for individual reads and contigs.

3.2.1 Importing metadata from a spreadsheet onto existing documents

In Geneious Prime 2020.1 onwards you can add metadata to your Geneious documents by
importing it from a .csv or .tsv file. Importing metadata from an Excel spreadsheet file in .xlsx
format is supported in Geneious Prime 2022.1 onwards. Your file of metadata needs to have
one field in common with the documents you want to import the data onto, such as the name
or sequence ID field, so that Geneious can match the data to the correct document.

Prior to importing the file, the metadata fields you plan to add from your spreadsheet need
to be set up in Geneious. To check which fields are present and to add new fields, select any
document and go to the Info tab. Click Edit Metadata types to see the list of fields currently
available in Geneious. For a step by step guide to adding new metadata fields, see How do I
add custom metadata fields to my sequence?

Once the appropriate metadata fields are set up, you can proceed with importing your data.

To do this, first select the folder containing the documents you want to add the metadata to.
Then go to File → Import → Files..., select CSV/TSV or as the format, and select your file of
metadata. Alternatively, drag and drop the file of metadata directly into Geneious.

In the Import Documents window, click on Metadata as the type of document to import.

Matching Geneious document fields to your spreadsheet

Under Match document, select the Geneious document field that should be used to match
the spreadsheet data in the first dropdown menu. In the second dropdown menu, select the
spreadsheet column that contains the matching data.

For example, if you have a “Sample ID” column in your spreadsheet that corresponds to the
appropriate document names in Geneious, you should select to Match document [Name] with
spreadsheet column [Sample ID] (see figure 3.1).

Figure 3.1: Matching document name in Geneious with Sample ID column in spreadsheet

Notes:

1. To match successfully, values must (a) be identical in the document and spreadsheet,
including matched case (leading and trailing white space on spreadsheet data is trimmed)
and (b) have a single match between the document and spreadsheet.
2. Some Geneious document fields cannot be used for matching, such as dates, percentages
and true-or-false values. These will be excluded from the dropdown menu. Additionally,
only those fields present on documents in the current folder will be available to select
here.

3. The spreadsheet column used for matching can be selected based on column number
(counting left to right) or by an optional column header row.
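
The matching rules in the notes above can be pictured with a short Python sketch. This is only an illustration under stated assumptions, not the Geneious implementation: documents and spreadsheet rows are represented as plain dictionaries, and the function name match_rows_to_documents and the sample values are hypothetical.

```python
def match_rows_to_documents(documents, rows, doc_field, column):
    """Map each spreadsheet row index to the indexes of documents it matches."""
    matches = {}
    for r, row in enumerate(rows):
        key = row[column].strip()  # leading and trailing white space is trimmed
        # Comparison is exact and case sensitive.
        matches[r] = [
            d for d, doc in enumerate(documents) if doc.get(doc_field) == key
        ]
    return matches


documents = [{"Name": "Sample_01"}, {"Name": "Sample_02"}]
rows = [
    {"Sample ID": "  Sample_01 ", "Organism": "Pongo"},
    {"Sample ID": "sample_02", "Organism": "Gallus"},
]
matches = match_rows_to_documents(documents, rows, "Name", "Sample ID")
```

The first row matches after trimming, while the second does not because the case differs. A row is usable only when it maps to exactly one document.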

Preview window

The Spreadsheet tab in the preview window shows the data in the spreadsheet file. The Result
Preview tab shows the matching fields in the Geneious document and spreadsheet, and shows
the metadata that will be added (Metadata mapping needs to be set up before this is populated).

Colors are used to highlight information about the import (see Figure 3.2). Hover your mouse
over a colored value for more information about the notification or error.

Blue: The spreadsheet row does not match any documents in the selected folder. Check that
the correct Geneious field and spreadsheet column have been selected for matching, and ensure
that corresponding match values are identical in the Geneious field and spreadsheet cell. Note
that match values are case sensitive.

Yellow with Asterisk: The metadata field in Geneious already contains data, which will be
overwritten on import. The preview shows the new value that will be imported from the
spreadsheet. To see the old value, hover your mouse over the cell. Once metadata has been
imported it cannot be undone and any overwritten data will be lost.

Red with Warning Icon: There is an error which will prevent the import from proceeding.
Errors will occur when:

1. Multiple Geneious documents in the selected folder match one row in the spreadsheet
based on the columns selected for matching. Each row in the spreadsheet should match
one and only one document in Geneious.

2. Multiple spreadsheet rows contain an identical value in the column being used to match
by.

3. Some fields in Geneious may have value constraints, such as requiring a number between
1 and 10, a true or false value, or a date, to name a few. If the spreadsheet data does not
adhere to these constraints an error will occur. Metadata constraints can be viewed or
edited via Edit Metadata Types in the Info tab of any document.

All errors must be resolved before commencing the import operation.
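
If an import reports many errors, it can help to check the match column outside Geneious first. The following Python sketch labels each spreadsheet value in the way the preview colors are described above, covering unmatched rows and the two duplicate-related errors only (value constraints are not checked). The function name classify_rows, the labels and the sample values are assumptions for illustration.

```python
from collections import Counter


def classify_rows(doc_values, sheet_values):
    """Label each spreadsheet match value as 'ok', 'no match' or 'error'."""
    trimmed = [value.strip() for value in sheet_values]
    sheet_counts = Counter(trimmed)
    doc_counts = Counter(doc_values)
    labels = []
    for value in trimmed:
        if sheet_counts[value] > 1 or doc_counts[value] > 1:
            labels.append("error")      # duplicate in the spreadsheet or in the folder
        elif doc_counts[value] == 0:
            labels.append("no match")   # no document has this value
        else:
            labels.append("ok")         # exactly one row and one document
    return labels


doc_values = ["S1", "S2", "S2", "S3"]
sheet_values = ["S1", "S2", "S4", "S3", "S3"]
labels = classify_rows(doc_values, sheet_values)
```

Here S2 is flagged because two documents share that value, and S3 is flagged because it appears in two spreadsheet rows.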

The checkboxes below the preview table allow you to filter which rows are displayed. These
filters do not affect the import operation. Geneious will apply metadata changes on all matched
rows once there are no errors.

Figure 3.2: Preview window showing a spreadsheet row that does not have a matching docu-
ment in blue, and duplicated rows in red.

Metadata mapping

In this section, specify which columns from the spreadsheet should be imported, by assigning
spreadsheet columns to metadata fields on the Geneious document.

In the Metadata dropdown menu, choose the Geneious metadata field you want to add spread-
sheet data to. All metadata types currently available in your database should show up here,
including those you added yourself. If you wish to add a new type, you can do this on any
document under Info → Edit Metadata types. The Standard fields type includes generic fields
such as Name, Description, Accession, Organism, Taxonomy and Notes. Click the + button to
add fields from more than one metadata type. Then click the Fields button to specify which
columns in the spreadsheet should be mapped to which field in Geneious.

For example, in the screenshot below the spreadsheet columns for Sampling Location, Sam-
pling date, and Freezer location have been added under the Sampling Information metadata
type, and the Organism column has been added under the Standard Fields type.

Figure 3.3: Mapping spreadsheet columns to metadata fields

The Result Preview table will be updated automatically when spreadsheet columns are mapped
to metadata fields.

Once you are happy with how the import looks, click OK to load the metadata. You should
now see the metadata in the document table (you may need to scroll to the right hand side of
the table to see the new columns).

3.2.2 Importing Vector NTI Databases

Geneious can import whole databases from Vector NTI Advance or Express. Metadata, struc-
ture and lineage information from Vector NTI will be preserved so your files will be organized
in Geneious the same way as they were in Vector NTI.

To import your VNTI database, go to File → Import → Vector NTI Database. Browse to the
location of your VNTI Database folder and click OK to load it into Geneious. For more infor-
mation on VNTI database import, please see this post on our Knowledge Base.

3.3 Importing files from public databases

Geneious Prime is able to communicate with a number of public databases hosted by the Na-
tional Center for Biotechnology Information (NCBI), as well as the UniProt database. You
can access these databases through the web at http://www.ncbi.nlm.nih.gov and http:
//www.uniprot.org/ respectively. These are all well known and widely used storehouses
of molecular biology data - more information on each is given in the sections below.

You can search these databases through Geneious Prime by selecting the NCBI or UniProt
database folders at the bottom of the Sources tree, and entering your search term. Press En-
ter or the Search button to initiate the search. If you get a connection error, you may need to
configure your network connections manually, as described in Section 1.2.5.

For advanced search options, click the More Options button. This allows you to search for spe-
cific terms in specific fields of the GenBank or UniProt documents, such as specific organisms
or author names. By clicking the ’+’ icon you can search in multiple fields at once, and choose
to match either “Any” of the fields (if only one of the fields needs to match), or “All” of the
fields (if all of the fields must match). For more on advanced search options, see section 4.2.

If you have a list of known accession numbers that you wish to download, you can enter these
in the Search box separated by a comma. For consecutive accessions, enter the first and last
numbers separated by a colon, and append [accn] to this. For example, entering “AB000001:AB000009[accn]”
will download all accessions between AB000001 and AB000009.
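
When accession numbers come from a script or spreadsheet, the search text can be assembled automatically. The sketch below builds the two forms described above: a comma-separated list, and a first:last range with [accn] appended. The function name accession_query is hypothetical; the resulting text is pasted into the Search box by hand.

```python
def accession_query(accessions=(), first=None, last=None):
    """Build search text for a list of accessions or a consecutive range."""
    if first and last:
        # Consecutive accessions: first and last separated by a colon, then [accn].
        return f"{first}:{last}[accn]"
    # Individual accessions: separated by commas.
    return ",".join(accessions)


range_query = accession_query(first="AB000001", last="AB000009")
list_query = accession_query(["AB000001", "AB000005"])
```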

The results will appear in the Document table as they are found. The Search button changes to
a Stop button while the search is running, and this can be clicked at any time to terminate the
search. As the results are downloaded, you will see a small padlock icon in the status bar above
the Document Table, which indicates that these items cannot be modified in any way. You must
drag the file into a folder in your local database if you wish to retain the file and/or modify it.
3.3. IMPORTING FILES FROM PUBLIC DATABASES 49

If you don’t drag the documents from a database search into your local folders the results will
be lost when Geneious is closed. For more information on how to move files between folders
in your database, see 4.1.1.

Note: When searching the Genome, Gene or PopSet databases, the documents returned are
only summaries. To download the whole genome, select the summary(s) of the genome(s)
you would like to download and then click the Download button inside the document view or
just above it. Alternatively, you can choose Download Documents in the File menu or in
the popup menu shown when a document summary is right-clicked (Ctrl+click on Mac OS X). The size
of these files is not displayed in the Documents Table. Be aware that whole genomes can be
very large and can take a long time to download. You can cancel the download of document
summaries by selecting Cancel Downloads from any of the locations mentioned above.

3.3.1 UniProt

This database is a comprehensive catalogue of protein data. It includes protein sequences and
functions from Swiss-Prot, TrEMBL, and PIR.

3.3.2 NCBI (Entrez) databases

NCBI was established in 1988 as a public resource for information on molecular biology. Geneious
allows you to directly download information from nine important NCBI databases and perform
NCBI BLAST searches (Table 3.1).

Table 3.1: NCBI databases accessible via Geneious

Database     Coverage
Gene         Genes
Genome       Whole genome sequences
Nucleotide   DNA sequences
PopSet       Sets of DNA sequences from population studies
Protein      Protein sequences
PubMed       Biomedical literature citations and abstracts
SNP          Single Nucleotide Polymorphisms
Structure    3D structural data
Taxonomy     Names and taxonomy of organisms

Entrez Gene. Entrez Gene is NCBI’s database for gene-specific information. It does not include
all known or predicted genes; instead Entrez Gene focuses on the genomes that have been
completely sequenced, that have an active research community to contribute gene-specific in-
formation, or that are scheduled for intense sequence analysis.

The Entrez Genome database. The Entrez genome database has been retired. For backwards
compatibility Geneious simulates searching of the old genome database by searching the Entrez
Nucleotide database and filtering the results to include only genome results.

The Entrez Nucleotide database. This database in GenBank contains 3 separate components that
are also searchable databases: “EST”, “GSS” and “CoreNucleotide”. The core nucleotide database
brings together information from three other databases: GenBank, EMBL, and DDBJ. These are
part of the International Nucleotide Sequence Database Collaboration. This database also contains Ref-
Seq records, which are NCBI-curated, non-redundant sets of sequences.

The Entrez Popset database. This database contains sets of aligned sequences that are the result
of population, phylogenetic, or mutation studies. These alignments usually describe evolution
and population variation. The PopSet database contains both nucleotide and protein sequence
data, and can be used to analyze the evolutionary relatedness of a population.

The Entrez Protein database. This database contains sequence data from the translated coding
regions from DNA sequences in GenBank, EMBL, and DDBJ as well as protein sequences sub-
mitted to the Protein Information Resource (PIR), SWISS-PROT, Protein Research Foundation
(PRF), and Protein Data Bank (PDB) (sequences from solved structures).

The PubMed database. This is a service of the U.S. National Library of Medicine that includes
over 16 million citations from MEDLINE and other life science journals. This archive of biomed-
ical articles dates back to the 1950s. PubMed includes links to full text articles and other related
resources, with the exception of those journals that need licenses to access their most recent
issues.

Entrez SNP. In collaboration with the National Human Genome Research Institute, the Na-
tional Center for Biotechnology Information has established the dbSNP database to serve as a
central repository for both single base nucleotide substitutions and short deletion and insertion
polymorphisms.

The Entrez Structure database. This is NCBI’s structure database and is also called MMDB
(Molecular Modeling Database). It contains three-dimensional, biomolecular, experimentally
or programmatically determined structures obtained from the Protein Data Bank.

Entrez Taxonomy. This database contains the names of all organisms that are represented in
the NCBI genetic database. Each organism must be represented by at least one nucleotide or
protein sequence.

3.3.3 Literature searching

Geneious Prime allows you to search for relevant literature in NCBI’s PubMed database. The
results of this search are summarized in columns in the Document Table and include the
PubMed ID (PMID), first and last authors, URL (if available) and the name of the Journal.

When a document is selected, the abstract of the article is displayed in the Document Viewer
(Text View tab) along with a link to the full text of the document if available, and a link to
Google Scholar, both below the author(s) name(s) (Figure 3.4).

As well as the abstract and links, Geneious also shows the summary of the journal article in
BibTex format in a separate tab of the Document Viewer (BibTex tab). BibTex is the standard
LaTeX bibliography reference and publication management data format and the information in
the BibTex screen can be imported directly into a LaTeX document when creating a bibliography.
Alternatively, a set of articles in Geneious can be directly exported to an EndNote 8.0 compati-
ble format. This is usually done when creating a bibliography for Microsoft Word documents.

Note: If the full text of the article is available for download in PDF format, it can also be stored
in Geneious by saving it to your hard drive and then importing it. This will allow full-text
searches to be performed on the article. To view a .pdf document either double click on the
document in the Documents Table or click on the View Document button. This opens the
document in an external PDF viewer such as Adobe Acrobat Reader or Preview (Mac OS X).
On Linux, you can set an environment variable named “PDFViewer” to the name of your
external PDF viewer. The default viewers on Linux are kpdf and evince.

Figure 3.4: Viewing bibliographic information in Geneious


3.4 Agents

Database searches can be automated using an Agent, allowing you to continuously receive the
latest information on genomes, sequences, and protein structures. Each agent is a user-defined,
automated search. You can instruct an agent to search any Geneious accessible database at
regular intervals (e.g. weekly). This simple but powerful feature ensures that you never miss
that critical article or DNA sequence. To manage agents go to View → Agents. An agent has
to be set up before it can be used.

3.4.1 Creating agents

To set up an Agent go to View → Agents and click the Create button. You now need to specify a
set of search criteria including the database to search, key words to search on, search frequency
and the folder you wish the agent to deliver its results to.

The search frequency may be specified in minutes, hours, days or weeks. You can only use
whole numbers.

Selecting Only get documents created after today will cause the agent to check what docu-
ments are currently available when the agent is created. Then when the agent searches it will
only get documents that are new since it was created. This is useful if, for example, you have
already read all publications by a particular author and you want the agent to only get new
publications.

The easiest way to organize your search results is to create a new folder and name it appro-
priately. You can do that by navigating to the parent folder in the Deliver to box and clicking
New Folder, or by creating a new folder beforehand as follows:

1. Right-click (Ctrl+click on Mac OS X) on the Sample Documents or Local folders. This
brings up a popup menu with a New Folder... option.

2. Create a new folder and name it according to the contents of the search. (For example,
type “CytB” if searching for cytochrome b complex.)

3. Once created, select the new folder. You can now click Create or Create and Run.
The agent will then be added to the list in the agent dialog and it will perform its first
search if you clicked Create and Run. Otherwise it will wait until its next scheduled
search.

All downloaded files are stored in the destination folder and are marked “unread” until viewed
for the first time.
3.4. AGENTS 53

Figure 3.5: The Create Agent Dialog

3.4.2 Checking agents

Once you have created one or more agents, Geneious allows you to quickly view their status
in the agents window. Your agents’ details are presented in several columns: Enable, Action,
Status and Deliver To.

Enable: This column contains a check box showing whether the agent is enabled.

Action: This summarizes the user-defined search criteria. It contains:

1. Details of the database accessed. For example, Nucleotide and Genome under NCBI.

2. The search type the Agent performed, e.g. “keyword”.

3. The words the user entered in the search field for the Agent to match against.

Status: This indicates what the Agent is currently doing. The status will be one of the following:

• “Next search in x time” e.g. 18 hours. The agent is waiting until its next scheduled search
and it will search when this time is reached.

• “Searching.” These are shown in bold. The agent is currently searching.


• “Disabled.” The agent will not perform any searches.

• “Service unavailable.” The agent cannot find the database it is scheduled to search. This
will happen if the database plugin has been uninstalled.

• “No search scheduled” The agent is enabled but doesn’t have a search scheduled. To
correct this click the “Run now” button in the agent dialog to have it search immediately
and schedule a new search.

Deliver To: This names the destination folder for the downloaded documents. This is usually
your Local Documents or one of your local folders.

Note. If you close Geneious while an agent is running, it will stop in mid-search. It will resume
searching when Geneious is restarted.

3.4.3 Manipulating an agent

Once an agent has been set up, it can be disabled, enabled, edited, deleted and run. All these
options are available from within the Agents dialog.

• Enable or disable an agent by clicking the check box in the Enable column.

• Run Now Cause the agent to search immediately

• Cancel If the agent is currently searching, click this to stop the search.

• Edit Click this to change an agent’s database, search criteria, destination or search inter-
val.

• Delete Delete the agent permanently. Any documents retrieved by the agent will remain
in your local documents.

3.5 Exporting files

To export files from Geneious Prime, select the file or files you want to export and click the
Export button in the Toolbar and choose Export Documents, or go to File → Export →
Documents.... Each data type has several export options, as detailed in the table below. Any set
of documents may be exported in Geneious native format, and these files are backward compatible
to Geneious version 6.0 (e.g. files exported in .geneious format from Geneious R7 and above
can be imported to Geneious R6 or later).

If you wish to export the parents and/or descendants of a particular file at the same time,
choose File → Export → with Parents/Descendants.
3.5. EXPORTING FILES 55

Data type                Export format options
DNA sequence             FASTA, Genbank (XML or flat), CSV/TSV, Geneious
Amino acid sequence      FASTA, Genbank (XML or flat), CSV/TSV, Geneious
Chromatogram sequence    ABI, Geneious
Sequence with quality    As above, also FastQ, Qual
Annotation               GFF, BED, Genbank, Geneious
Alignment or Assembly    Phylip, FASTA, NEXUS, MEGA, Phrap ACE, SAM/BAM, Geneious
Variant calls            VCF (single sample only)
Phylogenetic tree        Phylip, FASTA, NEXUS, Newick, MEGA, Geneious
PDF document             PDF, Geneious
Publication              EndNote 8.0, Geneious
Graphs                   CSV, WIG
Document Properties      CSV, TSV, Geneious
Text files               .txt, CSV, TSV, Geneious
Assembly report          .html

Documents imported in any chromatogram or molecular structure format can be re-exported
in that format as long as no changes have been made to the document.

Both fasta and fastq files can be exported in compressed (fasta.gz and fastq.gz) format for
smaller file size. If exporting paired reads in fasta format, an option to export the forward
and reverse reads to separate files is available by choosing ’Fasta Paired Files’ as the file type
option.

3.5.1 Export to comma-separated (CSV) or tab-separated (TSV) files

The values displayed in the document table can be exported to a csv file which can be loaded
by most spreadsheet programs. When choosing to export in csv format Geneious will also
present a list of the available columns in the table (including hidden ones) so you can choose
which to export.

Sequences can also be exported in .csv files either from individual sequence documents, or
sequence lists and alignments. If a sequence list or alignment is chosen for export, you will
have the choice of exporting fields from each sequence in the file as a separate row in the
output file, or exporting fields from the whole document (one row per document in output
file).
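
Exported files can also be read by scripts. As a minimal sketch, the Python below reads CSV text using the standard csv module. The column names “Name” and “Sequence Length” and the values shown are assumptions standing in for whichever columns you chose to export; an in-memory string is used here in place of a real exported file.

```python
import csv
import io

# Stand-in for an exported file; with a real file, use open(path, newline="").
exported = io.StringIO("Name,Sequence Length\nOrangutan,60\nChicken,68\n")

# Each row becomes a dictionary keyed by the column headers in the first line.
rows = list(csv.DictReader(exported))
lengths = {row["Name"]: int(row["Sequence Length"]) for row in rows}
```

For a tab-separated (TSV) export, pass delimiter="\t" to csv.DictReader.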

3.5.2 Exporting multiple files

There are several options for export of multiple files from Geneious Prime:

• Export to a single file: Multiple files can be exported to a single file by selecting all the
files you wish to export and going to File → Export → Documents. This will combine all
the files you selected into a single file for export.

• Batch Export: The option File → Export → To Multiple Files exports each selected file
as a different document. E.g. you can select several sequence documents and use this
option to export each sequence as an individual fasta file. The options for batch export
let you specify the format and folder to export to as well as the extension to use. Each file
will be named according to the Name column in Geneious.

• Export Folder: To export an entire folder to a single file, click on the folder in the Sources
panel and go to File → Export → Folder. Note that folders can only be exported in
.geneious format.

3.5.3 Drag to desktop Genbank export

Sequence files smaller than 1 MB can be exported in Genbank format simply by dragging them
out of the Geneious window to the desktop or to your operating system’s file manager. The
default options for Genbank export are used, except that annotations on the document are
exported as written, and not converted to strict Genbank format. If multiple documents are
dragged out at once, each is exported to a separate file. If multiple sequences are combined
into a list prior to export, they will be exported to a single Genbank document. This option is
not available for alignment documents.

3.6 Printing and Saving Images

Geneious Prime allows you to print (or save as an image) the current display for any document
viewer. This includes the sequence viewer, tree view, dotplot, and text view.

3.6.1 Printing

Choose Print from the File menu. The view is printed without the options panel. It is recommended
to turn on Wrap sequence and deselect Colors before printing. Wrapping prints the
sequence as seen in the sequence viewer and the font size is chosen to fill the horizontal width
of the page. The following options are available:

Portrait or landscape. Controls the orientation of the page.

Scale. Can be used to decrease or increase the size of everything in the view, while still printing
within the same region of the page. For many types of document views, this will cause it to
wrap to the following line earlier, usually requiring more pages.

Size. Controls the size of the printed region on the paper. Effectively, increasing the size reduces
the margins on the page.

3.6.2 Saving Images

Choose Save as image file from the File menu, or click the Export button in the toolbar and
choose Export to Image. The following options are available:

Size. Controls the size of the image to be saved. Depending on the document view being saved,
these may be fixed or configurable. For example, with the sequence viewer, if wrapping is on,
you are able to choose the width at which the sequence is wrapped, but if wrapping is off, both
the width and height will be fixed.

Format. Controls image format. Vector formats (PDF, SVG and EMF) are ideal for publication
because they won’t become pixelated. Raster formats (PNG and JPG) are easier to share, great
for emailing and posting on the web. If you wish to edit the file outside of Geneious, SVG
or EMF format should be used. SVG files can be edited in tools such as Adobe Illustrator or
Inkscape, and EMF files can be edited on Windows using PowerPoint, or LibreOffice Draw
on Mac or Linux (the Mac version of PowerPoint can’t modify EMF files). With SVG or EMF
it is possible to ungroup components of the graphic for editing, and because they are vector
graphics they will scale without becoming pixelated.

Resolution. Only applies to raster formats (PNG and JPG) and is used to increase the number of
pixels in the saved image. We recommend increasing the resolution to at least 300% for printing
PNG or JPG files.
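To get a feel for what a given Resolution setting means in pixels, the arithmetic can be sketched in Python. The sketch assumes that the percentage scales width and height by the same factor, which is an assumption about the setting rather than something stated above.

```python
def scaled_pixels(width_px, height_px, resolution_percent):
    """Return the pixel dimensions after applying a resolution percentage.

    Assumes the percentage scales both dimensions linearly.
    """
    factor = resolution_percent / 100
    return round(width_px * factor), round(height_px * factor)
```

Under that assumption, an 800 x 600 pixel view saved at 300% would produce a 2400 x 1800 pixel image, that is, nine times as many pixels.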

3.6.3 Exporting sequences and alignments as rich text (Prime 2019.1 onwards)

If you wish to export your sequence or alignment as a rich text file rather than an image, click
the Text view tab and format your sequence the way you wish. For further details on the
options available, see this post on our knowledge base. To export your formatted text, use the
Copy full sequence button to copy the displayed text to the clipboard, or use the Save as *.txt
button to export the displayed text as a text file.
Chapter 4

Managing Your Local Documents

4.1 Organizing your local documents

Geneious documents are stored in a hierarchical arrangement of folders under the Local folder
in the Sources Panel. Clicking on a folder will display its contents in the Document Table.
Next to each folder name in the hierarchy is the number of documents it contains. When the
Local folder or a sub-folder is collapsed (minimized), the number next to the folder shows how
many files are contained in that folder as well as all of its sub-folders. In addition, if some of
the documents in a folder are unread, the number of unread documents will also appear in the
brackets.
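The number shown next to a collapsed folder is a recursive total. A minimal Python sketch of that count is given below; the nested dictionary used to represent folders is a hypothetical structure chosen for the example.

```python
def total_documents(folder):
    """Count documents in a folder and, recursively, in all of its sub-folders."""
    own = folder["documents"]
    nested = sum(total_documents(sub) for sub in folder.get("subfolders", []))
    return own + nested
```

A folder holding 2 documents with sub-folders holding 3 and 1 would therefore show 6 when collapsed.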

To create a new folder, select the Local folder or a sub-folder icon in the Sources panel and
either right-click (Ctrl+click on Mac OS X) and select New folder from the popup menu, or go
to File → New → Folder or Add → New Folder (see Figure 4.1). This will open a dialog where
you can name your new folder. This folder will then be created within the folder you originally
selected.

You can also delete, rename, move, export or change the color of a folder by right-clicking on
the folder (or Ctrl+click on Mac OS X) and selecting the option you require from the menu.
These options are also available under the File menu.

4.1.1 Moving files around

Files can be moved between folders in a number of ways:

Drag and drop. This is quickest and easiest. Select the documents that you want to move. Then,
while holding the mouse button down, drag them over to the desired folder and release. If you
dragged documents from one local folder to another, this action will move the documents – so
that a copy of the document is not left in the original location. In external databases such as
NCBI the documents will be copied, leaving one in its original location.

Figure 4.1: Creating a new local folder in the Sources panel

Drag and copy. While dragging a document over to your folder, hold the Ctrl key (Alt/Option
key on Mac OS X) down. This places a copy of the document in the target folder while leaving
a copy in the original location. This is useful if you want copies in different folders. Folders
themselves can also be dragged and dropped to move them or instead copied by holding down
Ctrl (Alt/Option on Mac OS X).

The Edit menu. Select the document and then open the Edit menu on the menu bar. Click on
Cut (Ctrl+X/command+X), or Copy (Ctrl+C/command+C). Select the destination folder and
Paste (Ctrl+V/command+V) the document into it.

4.1.2 Aliases

An Alias (AKA shortcut or symlink) is a lightweight document that references another document.
Copying a document and using Edit → Paste Alias effectively lets you have the same document in
multiple locations. Changes made to either the alias or the original (including setting the name
of the alias) will modify the original version of the document.

You can also create an alias by dragging and dropping a document to another folder while
holding down the Ctrl and Shift (Cmd and Alt on Mac OS X) keys.

An alias appears in the document table with a little curved arrow on top of the normal document
icon. To view the original that an alias was created from, right click and choose Go to Alias
Source. If you wish to turn an alias document into a full document and break the link to the
original, select the alias and go to File → Save As. This creates a new document from the alias.

Aliases have the following limitations:

• When an alignment is built from an alias to a sequence, the sequences in the alignment
will always refer back to the original sequence rather than the alias.

• Aliases cannot be made between documents in different databases. (e.g. not between a
shared database and your local documents)

4.1.3 Deleting Data and the Deleted Items folder

The Deleted Items folder is located underneath the local document folders in the Sources
panel. When a folder or document is deleted, it is moved to the Deleted Items folder rather
than erased immediately. This means the data can be recovered if it was deleted by mistake.
Pressing the Delete key is the easiest way to move the selected folder or documents to the
Deleted Items folder.

To recover documents or folders from Deleted Items you can either drag and drop them to
another folder or use Restore from Deleted Items (Put Back from Deleted Items on Mac OS)
in the File menu to automatically move them to the folder they were deleted from.

The Deleted Items folder should be cleared periodically to keep hard drive space free. This can
be done by selecting Erase All Deleted Items from the File menu. Geneious will warn you if
the Deleted Items folder contains a large amount of data.

To erase a document immediately without moving it to Deleted Items, use Erase Document
Permanently in the File menu (or press Shift+Delete).

Many of these actions can also be accessed by right clicking on a folder or document.

4.1.4 Adding note documents to folders

To create a notes file in your folder, click the Add button on the toolbar and select New Text
Document or go to File → New → Text Document. This creates an empty text file in the
document table, and the viewer below acts as a simple text editor where information can be
typed in. To save the content you have entered, click Save.

Note documents can also be added by importing text or html files, using File → Import →
Files.

4.1.5 Searching folders

To find a folder by name, focus the Sources panel by clicking on a folder, then type any part of
the folder name you are looking for. This will cause a search panel to appear at the top of the
Sources panel. Press Enter or click the search button to find the next folder that has the text you
entered in its name. Press Shift+Enter or hold Shift while clicking on the search button to find
the previous folder. This search is case insensitive.
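The next/previous behavior of the folder search can be sketched in Python as a case-insensitive substring match over a list of folder names. The function name and the wrap-around at the ends of the list are assumptions for illustration; the manual does not state whether the search wraps.

```python
def find_folder(names, query, start, backwards=False):
    """Return the index of the next (or previous) folder whose name contains `query`.

    Matching ignores case. The search starts from the folder after (or before)
    index `start` and wraps around the list, which is an assumption.
    """
    needle = query.lower()
    step = -1 if backwards else 1
    count = len(names)
    for offset in range(1, count + 1):
        index = (start + step * offset) % count
        if needle in names[index].lower():
            return index
    return None  # No folder name contains the query.
```

Here `backwards=False` plays the role of Enter and `backwards=True` the role of Shift+Enter.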

To close the search panel click the X button or press Escape.

4.1.6 Calculating folder size on disk

To calculate the size in MB of selected folders in your Sources panel, right-click on the folder
and go to Show folder size. The size of the selected folder and all subfolders will be shown as a
prefix to the folder name, and folders will automatically be sorted by decreasing size. Note that
the size may include documents not visible in the folder. For example, referenced documents
(e.g. sequences in alignments, or documents with aliases) are not removed from the database
when deleted, until the documents which reference them are also deleted. The folder size will
not automatically update with changes to folder contents and can be removed by unchecking
Show Folder Size, or by restarting Geneious Prime.

4.1.7 Displaying the folder path

A navigation bar showing the file path to a selected document or folder can be optionally
displayed above the Document Table. To turn on this feature, go to Tools → Preferences →
Appearance and Behavior and select Show folder path. Elements in the path can be clicked to jump
to that folder.

4.2 Searching and filtering local documents

4.2.1 Search

The Geneious Prime database can be searched by entering the term you are looking for in the
Search box at the top right of the tool bar.

The search field can be used to search for documents, folders and operations within Geneious.

When the search field is selected, the search popup initially provides a list of recently browsed
documents and folders for quick access. A timestamp shown to the right of each recent item
indicates when that item was last opened. The search field can be opened at any time using the
shortcut Cmd/Ctrl+Shift+F.

As you enter the search query, matching results will be updated in real time. By default, all
folders in both local and Shared databases will be searched. To restrict the document search
results to the current folder, change the Search setting from “Everywhere” to “This folder”. The
Match option enables you to choose between searching all fields or just the document name.
These settings will be remembered until either the search query is cleared or a highlighted item
has been selected.

Figure 4.2: Searching for documents in Geneious Prime

To filter the current folder according to your search query, press Enter after entering the
query. This automatically opens the advanced search in the currently selected folder, and
shows only matching documents in the table. For other filtering options, see section 4.2.3.

If unexpected errors occur during the search process, an error status will be shown in red on the
bottom of the popup. Click on Details for more information on the errors encountered. These
should not be a common occurrence. If you experience errors consistently, consider contacting
Support with the information in the error details dialog attached.

4.2.2 Advanced Search options

To search the selected folder with more filters, use the Search In “Selected Folder” search
option, then click the More Options button.

Figure 4.3: Accessing the Advanced Search options

The advanced search allows you to search for specific terms in specific fields of your docu-
ments. The fields available for a search can be found in the left-most drop-down box; all fields
potentially available on your local documents are listed here. If you have defined a new type
of meta-data in Geneious, and that meta-data field has been added to a document, then this
field will also be available to search.

Advanced Search also provides you with a number of options for restricting the search on a
field depending on the field you are searching against. For example, if you are using numbers
to search for “Sequence length” or “No. of nodes” you can further restrict your search with the
second drop-down box:

• “is greater than” (>)

• “is less than” (<)

• “is greater than or equal to” (≥)

• “is less than or equal to” (≤)

Likewise, if you are searching on the “Creation Date” search field you have the following options:

• “is before or on”

• “is after or on”

• “is between”

When searching your local folders you have the option of searching by “Document type”. The
second drop-down list provides the options “is” and “is not”. The third drop-down lists the
various types of documents that can be stored in Geneious such as “3D-Structure”, “Nucleotide
sequence”, and “Oligonucleotide” (see Figure 4.4).

Figure 4.4: Document type search options

And/Or searches

The advanced options let you search using multiple criteria. By clicking the “+” button on
the right of the search term you can add another search criterion (see Figure 4.5). You can remove
search criteria by clicking on the appropriate “-” button. The “Match all/any of the following”
option at the top of the search terms determines how these criteria are combined:

Match “Any” requires a match of one or more of your search criteria. This is a broad search and
results in more matches.

Match “All” requires a match of all of your search criteria. This is a narrow search and results in
fewer matches.
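The difference between Match “Any” and Match “All”, combined with the numeric comparison options listed earlier, can be summarized in a short Python sketch. The document dictionary and the `(field, comparison, value)` tuples are a hypothetical representation chosen for the example, not the way Geneious stores search criteria.

```python
import operator

# Comparison options offered for numeric fields, mapped to Python operators.
COMPARATORS = {
    "is greater than": operator.gt,
    "is less than": operator.lt,
    "is greater than or equal to": operator.ge,
    "is less than or equal to": operator.le,
}


def matches(document, criteria, match="all"):
    """Test a document against a list of (field, comparison, value) criteria."""
    results = (
        COMPARATORS[comparison](document[field], value)
        for field, comparison, value in criteria
    )
    # "all" is the narrow search, "any" is the broad search.
    return all(results) if match == "all" else any(results)
```

A document with a sequence length of 1200 and 5 nodes satisfies "Sequence length is greater than 1000" but not "No. of nodes is less than 3", so it is returned by an “Any” search and excluded by an “All” search.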

Figure 4.5: Advanced Search using multiple criteria

4.2.3 Filter

Geneious Prime 2021.1 onwards includes a quick filter option above the document table. To
use it, click the filter icon and type in the text you are searching for. Geneious will display
all documents in the table that match this text and hide all other documents. To go back to
viewing all documents, clear or close the filter box.

Figure 4.6: Filtering the document table using the quick filter option

This option can also be used for filtering on-the-fly while searching public databases such as
NCBI. Type in the appropriate text in the filter box and only those documents that match both
the original criteria (as specified by the search terms) and the filter text will be displayed.

4.2.4 Find in Document

The Find in Document option under the Edit menu allows you to search for a particular motif
in your sequences, annotations or document names. For example, you can search for a partic-
ular string of nucleotides in sequences or alignments, or search for annotations and sequences
by name or number. The search can optionally be made case sensitive. Matching regions are
selected in the document at the end of the search.
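As an illustration of searching a sequence for a motif with optional case sensitivity, the following Python sketch returns the 1-based start and end positions of every occurrence. Reporting overlapping occurrences is a design choice of this sketch and an assumption; it is not a description of how Find in Document itself reports matches.

```python
import re


def find_motif(sequence, motif, case_sensitive=False):
    """Return 1-based (start, end) positions of every occurrence of `motif`."""
    flags = 0 if case_sensitive else re.IGNORECASE
    # A lookahead lets overlapping occurrences be reported as well.
    pattern = re.compile(f"(?={re.escape(motif)})", flags)
    return [
        (match.start() + 1, match.start() + len(motif))
        for match in pattern.finditer(sequence)
    ]
```

For instance, `find_motif("ACGTACGTAC", "acgt")` finds the motif at positions 1 to 4 and 5 to 8, while the same call with `case_sensitive=True` finds nothing.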

The shortcut for this function is Ctrl+F. To find the next match for the text specified in the dialog
you can use F3 or Ctrl+G, and to find the previous match use Ctrl+Shift+G or Shift+F3.

4.2.5 Nucleotide similarity searching and sorting

It is possible to search individual sequence documents not only for text occurrences but by
similarity to sequence fragments. Open the Advanced Search (see Figure 4.3), click the small
arrow at the bottom of the large T to the left of the search dialog, select Nucleotide similarity
search or Protein similarity search and enter the sequence text. Geneious will try to guess the
type of search based on the text, so that simply entering or pasting a sequence fragment may
change the search type automatically.

The search locates documents containing a similar string of residues, and orders them in de-
creasing order of similarity to the string. The ordering is based on calculating an E-value for
each match. You can read more about the E-value in chapter 16. The search does not work on
sequence lists, alignment/assembly or tree documents.

For the search to be successful, you need to specify a minimum of 11 nucleotides or 3 amino
acids. Note that search times depend on the number and size of your sequence documents,
and so may take a long time to complete.
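The minimum query lengths above, together with the idea of guessing the search type from the entered text, can be sketched in Python. The guessing rule used here (a query made up only of A, C, G, T, U or N is treated as nucleotide) is a simple heuristic assumed for the example; the rule Geneious actually uses is not documented in this section.

```python
NUCLEOTIDE_CODES = set("ACGTUN")  # Assumed alphabet for the heuristic.
MIN_NUCLEOTIDES = 11
MIN_AMINO_ACIDS = 3


def check_query(query):
    """Guess the search type and report whether the query is long enough."""
    residues = "".join(query.split()).upper()  # Drop whitespace, ignore case.
    looks_nucleotide = bool(residues) and set(residues) <= NUCLEOTIDE_CODES
    if looks_nucleotide:
        return "nucleotide", len(residues) >= MIN_NUCLEOTIDES
    return "protein", len(residues) >= MIN_AMINO_ACIDS
```

With this heuristic, an 11-base fragment is accepted as a nucleotide query, a 10-base fragment is too short, and a three-letter fragment such as `MKV` is accepted as a protein query.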

Similarity sorting

It is also possible to sort individual sequence documents in a given folder by their similarity.
To use this function, select a single sequence in the document table and right-click, then
choose Sort. Sort by similarity will rank all other sequences by their similarity to the selected
sequence. The most similar sequence is placed at the top and the least similar sequence at the
bottom. This also produces an E-value column describing how similar the sequences are to
the selected one. The Remove Sort by Similarity option will remove the E-value column and
return the table to its previous sorting.

4.3 Find Duplicates

Find Duplicates, under the Edit menu, is used to identify duplicate copies of sequences and
other documents. Duplicates can be identified by sequence name, database ID (e.g. accession)
or by the residues/bases, and the Search Scope can be set so that it checks within either a
selected set of documents, all documents in a folder or in the sequences of a single alignment
or sequence list.

When searching for duplicates within sequences of a single alignment or sequence list, two
options are available for displaying results once the search has run:

• Select earlier duplicates in list: This will select all but one copy of a duplicated docu-
ment, allowing the duplicates to easily be deleted or moved to another folder leaving one
copy behind.

• Extract unique sequences: Unique sequences will be extracted to a new sequence list,
and the sequence names modified to show the duplicate count for that sequence. For
large datasets, or removing duplicates in paired reads, or removing non-exact duplicates,
see Remove Duplicate Reads using BBTools.
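The idea behind Extract unique sequences (keep one copy of each sequence and record how many times it occurred) can be sketched in a few lines of Python. The `(x2)` name suffix and the case-insensitive comparison of residues are assumptions of this sketch; the exact naming scheme Geneious uses may differ.

```python
def extract_unique(sequences):
    """Collapse exact duplicates, appending the duplicate count to the name.

    `sequences` is a list of (name, residues) pairs. Residues are compared
    ignoring case, and the first name seen for each sequence is kept.
    """
    counts = {}
    first_name = {}
    for name, residues in sequences:
        key = residues.upper()
        counts[key] = counts.get(key, 0) + 1
        first_name.setdefault(key, name)
    # Dictionaries keep insertion order, so the output follows first appearance.
    return [(f"{first_name[key]} (x{counts[key]})", key) for key in counts]
```

Given three sequences of which two have identical residues, the result contains two entries, one of them marked `(x2)`.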

If you are searching for duplicates within a folder or multiple selected documents, you can choose
to select either the most recently or least recently modified copy.

Remove Duplicate Reads

For identifying non-exact duplicates, removing exact duplicates from large datasets, or remov-
ing duplicates on paired read datasets, use Remove Duplicate Reads... from the Sequence
menu. This tool runs Dedupe from the BBTools suite.

For a detailed explanation of any Dedupe setting, hover the mouse over the setting, or click the
help (question mark) button next to the custom options under More Options.

4.4 Batch Rename

Batch rename is located under the Edit menu and can be used to edit any field in multiple
documents at once. It can also be used to batch edit any property of sequences within an
alignment or sequence list.

Existing fields can be replaced with a combination of values from other fields (e.g. Name re-
placed with Organism and/or Accession), and fixed text can be added to the beginning or end
of existing fields.

The advanced options (under the More Options button) enable the use of regular expressions
to replace a specific part of one field with another property or text string. Click the Help
button in this section for more information on formatting expressions. It is also possible to
batch rename from an Annotation name or property by selecting Add Property and choosing
Annotation Name or Property.
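For readers unfamiliar with regular expressions, the following Python sketch shows the kind of replacement that a regular-expression rename performs on a set of names. It uses Python's `re` syntax, which may differ in details from the syntax accepted by the Batch Rename dialog, and the example names are invented.

```python
import re


def batch_rename(names, pattern, replacement):
    """Apply one regular-expression replacement to every name in a list."""
    regex = re.compile(pattern)
    return [regex.sub(replacement, name) for name in names]
```

For example, `batch_rename(["sample_01_raw", "sample_02_raw"], r"_raw$", "_trimmed")` replaces the `_raw` suffix on each name, and a pattern with groups such as `r"^(\w+) (\w+)"` with replacement `r"\2_\1"` swaps the first two words of a name.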

4.5 Backing up your local documents

It is important to keep frequent backups of your data because computers can fail suddenly and
unexpectedly. A computer can be replaced, but your data is much harder to replace. The best
way to back up all of your data and settings in Geneious Prime is to use the Geneious backup
tool under File → Back Up Data (Figure 4.7).

Note: Due to the way the local database works, it is important that Geneious is not accessing
the database when a backup is taken. For example, Mac users with Time Machine will have
backups taken during the day but if Geneious is running operations when those backups are
taken, they will not be suitable for restoring from. However, backups taken overnight when
Geneious isn’t running should be fine.

Backing up your data directory manually is not recommended because the Geneious database
structure is complex and many programs will fail to back it up properly.

The backup command has two options:

• Export selected folder: This will export the selected folder (including all subfolders) to
a Geneious format file. This allows you to back up an individual project within your
database. The backup can also be imported into an existing database by drag and drop.
If you have finished working on a project it is a good idea to back it up in this way then
delete it from inside Geneious to keep the size of your database down and improve the
performance of Geneious. You should keep archive backups in addition to these because
this backup will miss your settings and data outside the selected folder.

• Archive all data and settings: This is equivalent to creating a zip archive of your entire
Geneious data directory which includes all your data, preferences, searches and agents.
This option will cause Geneious to cease working on the local database while it creates
the archive. This type of backup cannot be directly imported into an existing database;
when it is loaded, everything in Geneious will revert to how it was when you took the
backup.

Backups should be stored on another drive, or can be left to general system backups safely
since they are made when Geneious is in a non-running state. These backups can also be safely
moved around including to other machines.

4.5.1 Restoring a backup

• Geneious format file: Files with the suffix .geneious can be imported like any other file
type, either by dragging and dropping into your Geneious database or using the Import
Files options in the Add or File menus. Alternatively you can use Restore Backup in the
File menu and the file will be added under the Local folder in your current database.

• Backup.zip file: This is an archive of all your data and settings and is an entire database
in itself. These files cannot be imported into an existing Geneious database. To restore a
backup.zip archive, go to File → Restore Backup and choose the backup.zip file you want
to restore (do not unzip the file manually first). At the prompt, choose a new location to
restore the backup to. This folder must either be empty, or must not currently exist on
your drive (Geneious will create the folder as it restores the backup). After the backup
has been extracted, Geneious will offer to load the restored data. If you choose not to load
it immediately you can switch to the restored data directory by going to Preferences in the
Tools menu and changing the Data Storage Location on the General tab.

Figure 4.7: Using the backup tool
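Before restoring a backup.zip archive it can be useful to confirm that the chosen location meets the requirement above, namely that the folder is either empty or does not yet exist. A minimal Python sketch of that check follows; the function name is hypothetical and the check is independent of Geneious itself.

```python
from pathlib import Path


def is_valid_restore_target(path):
    """Return True if `path` does not exist yet or is an empty folder."""
    target = Path(path)
    if not target.exists():
        return True  # The folder can be created during the restore.
    return target.is_dir() and not any(target.iterdir())
```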

4.6 Document History

The history of a document can be viewed by going to the Info tab above the sequence viewer,
and choosing History. This displays information on how the document was created, plus a
record of each time it has been modified. The exact information displayed is flexible, but
the entries will always include the time and user responsible for the edit. An entry may also
reference other documents via hyperlinks, and has the ability to display a re-creation of the
options used.

Saving of history can be disabled for performance or privacy reasons by going to the Appearance
and Behavior tab in Preferences, see section 1.2.1.
Chapter 5

Creating, Viewing and Editing Sequences

5.1 Creating new sequences

New sequences can be imported from existing files as described in chapter 3, or they can be
created manually by going to Sequence → New Sequence, or File → New → Sequence. Here
you can paste or type in the residues for your new sequence, then enter the Name, Description
and Organism for your sequence if required (see figure 5.1). Geneious will automatically deter-
mine whether your sequence is nucleotide or protein based on the composition of the bases you
enter. You can change this by clicking the Type option. If your sequences are oligonucleotides,
choose Primer or Probe as the type. If your primer contains a 5′ extension, you can specify
this by setting the length of the binding region. Bases not in the (green) binding region will be
included as a 5′ extension.
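The relationship between the binding region length and the extension can be shown with a small Python sketch. It assumes that the binding region sits at the 3′ end of the primer, so that all remaining bases at the start of the sequence form the extension; the function name and the example primer are invented for illustration.

```python
def split_primer(primer, binding_length):
    """Split a primer into (extension, binding_region).

    Assumes the binding region is the last `binding_length` bases of the
    primer and everything before it is the extension.
    """
    if not 0 < binding_length <= len(primer):
        raise ValueError("binding_length must be between 1 and the primer length")
    cut = len(primer) - binding_length
    return primer[:cut], primer[cut:]
```

For a 16-base primer `GAATTCACGTACGTAC` with a binding region of 10 bases, the first six bases (`GAATTC`) are treated as the extension.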

To create a new sequence from an existing sequence, select the region of sequence that you
want then click the Extract button above the sequence viewer, or go to Sequence → Extract
Regions. This will create a new sequence document containing the selected sequence.

5.1.1 Sequence lists

Sequence lists make it easier to manage large numbers of sequences by grouping related se-
quences into a single document. When you import files containing multiple sequences you
will be asked if you want to store those sequences in a list. To group existing sequences in your
database into a list, or combine two lists into one, select the sequences or sequence lists you
want to group and go to Sequence → Group Sequences into a List. Note that this copies your
sequences into a list and retains the original sequence documents.


Figure 5.1: Entering a new sequence in Geneious Prime



To extract sequences from a list, select the sequence(s) you want to extract and go to Sequence
→ Extract Sequences from List. This will copy each sequence out of the list into a separate
document, while retaining the original sequence within the list. To remove a sequence from a
list entirely, select it and press the Delete key on your keyboard, or go to Delete under the
Edit menu.

5.2 The Sequence Viewer

Sequences are displayed in the viewer below the document table. Annotations (Chapter 8.1),
translations (section 5.6) and analysis graphs (section 5.2.7) are also displayed in this viewer.

Figure 5.2: A view of an annotated nucleotide sequence in Geneious Prime

5.2.1 Zoom level

Controls for zooming in and out on sequences are located at the top of the side panel, to the
right of the sequence viewer. The plus and minus buttons increase and decrease the magnifi-
cation of the sequence by 50%, or by 30% if the magnification is already above 50%. To zoom
in or out by a smaller amount, hold down the alt and/or shift key while clicking the plus or
minus button.

The remaining zoom buttons do the following:

• Zoom in to fit the selected region in the available viewing area.

• Zoom to 100%. The 100% zoom level allows for comfortable reading of the sequence.

• Zoom out so as to fit the entire sequence in the available viewing area.

Zooming can also be quickly achieved by holding down the zoom modifier key, which is the
Ctrl key on Windows/Linux or the Alt/Option key on Mac OS X, and clicking as described
below. When the zoom key is pressed a magnifying glass mouse cursor will be displayed.

• Hold the zoom key and left click on the sequence to zoom in.

• Hold the zoom key and Shift key and left click on the sequence to zoom out.

• Hold the zoom key and turn the scroll wheel on your mouse (if you have one) to zoom in
and out.

• Hold the zoom key and click on an annotation to zoom to that annotation.

You can also pan in the Sequence View by holding Ctrl+Alt (command+Alt on Mac OS X) and
clicking on the sequence and dragging.

5.2.2 Selecting part of a sequence

Within the sequence viewer, there are several ways to select part of a sequence, or select a
subset of sequences from a list, alignment or assembly:

• Mouse dragging: Click and hold down the left mouse button at the start position, and
drag to the end position. You can modify the boundaries of a selected sequence by holding
your mouse over the end of the selected region until the <-> arrow appears, then clicking
and dragging again to select more or fewer bases. To select multiple regions of a sequence
or alignment, use the Ctrl (Windows/Linux) or command (Mac) keys.

• Select from annotations: When annotations are available, click on any annotation to se-
lect the annotated residues. As with mouse dragging, multiple selections are supported.

• Click on sequence name: This will select the whole sequence.

• Select all: Use the keyboard shortcut Ctrl+A (command+A on Mac) to select everything in the
panel.

Keyboard shortcuts for selection of sequences:

• To quickly select a single residue, double-click on it.

• To select a block of residues within a single sequence, triple click.


• To select a block of residues across multiple sequences, quadruple click.

• To select a block of 10 residues, hold down Shift and Ctrl (alt/option on a Mac) and
press the keyboard arrow.

• To select a specific region of sequence, click at the beginning of the region you want to select,
hold down Shift and then click the end.

• To modify the right-hand end of a selection, hold down Shift/alt (command on a Mac)
and use the right/left arrows to select more or fewer bases. Holding down Shift and Ctrl
(alt/option on a Mac) modifies the selection by 10 bases at a time.

• To select the same region across multiple sequences in an alignment, select the region
you want in the first sequence, then hold down Shift/alt (command on a Mac) and press
the down arrow to apply the selection to the sequences underneath. Holding down Shift/Ctrl
(alt/option on a Mac) while pressing the down arrow will select the sequences in
batches of 10.

Go to position

To jump to a particular base in a sequence you can use Go to base under the Edit menu (for
amino acid sequences, this appears as Go to Residue). This allows for the instant navigation
to a particular nucleotide or amino acid coordinate for any sequence in the current document
selection. It also allows the selection of a particular region of sequence, either for individual
sequences or across sequence lists and alignments, or the selection of particular sequences out
of a sequence list. Formatting examples are given in the setup dialog. Go to Position also
appears next to the sequence viewer when in genome view (see section 5.2.4).

5.2.3 Circular sequences

When a circular sequence is selected, the default view is to display the sequence as circular.
The view can be rotated by using the scrollbar at the bottom or by turning the mouse wheel.
Even though a sequence is circular, you can display it as a linear sequence using the Linear
view checkbox under the General section.

To change a linear sequence into a circular sequence, select the sequence then go to Sequence
→ Circular Sequence, then click Save. This will join the ends of your sequence up to create
a circular sequence, but does not check for overlapping ends. Circularization will affect how
some operations (such as restriction digest and map to reference) deal with the sequences.
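Because circularization does not check for overlapping ends, it may be worth checking a linear sequence yourself before converting it. The Python sketch below looks for the longest stretch of bases repeated exactly at both ends of a sequence. The function name and the default minimum overlap of 4 bases are arbitrary choices for the example, and only exact matches are detected.

```python
def overlapping_ends(sequence, min_overlap=4):
    """Return the length of the longest exact overlap between the two ends.

    Checks whether the first N bases equal the last N bases, for N from half
    the sequence length down to `min_overlap`. Returns 0 if none is found.
    """
    seq = sequence.upper()
    for length in range(len(seq) // 2, min_overlap - 1, -1):
        if seq[:length] == seq[-length:]:
            return length
    return 0
```

A non-zero result suggests that the repeated bases at one end should be trimmed before circularizing, otherwise they would appear twice in the circular sequence.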

To show a fully zoomed out circular overview next to the zoomed in view of a sequence, use
the Circular Overview checkbox under the General section.

The circular overview displays a green box to indicate the region currently visible in the zoomed
in view on the right. All settings in the controls (apart from the types of annotations to show)
have no effect on the circular overview.
76 CHAPTER 5. CREATING, VIEWING AND EDITING SEQUENCES

5.2.4 Genome View

The genome view (Figure 5.3) is displayed when sequences larger than 100,000 bp are selected
(either as individual sequences or within a sequence list).

Figure 5.3: The minimap and sequence view of an annotated chromosome under the genome
viewer configuration

The genome viewer contains additional controls which allow for the efficient navigation of
large sequences:

1. The Go to Position button allows for the instant navigation to a particular nucleotide
coordinate for any sequence in the current document selection. It also allows the selection of a
particular region of sequence, or the selection of particular sequences out of a sequence list.

2. A minimap is shown above the sequence viewer which shows a representation of the entire
sequence plus its underlying annotations. The portion of the sequence currently visible in the
viewing window is highlighted with a green box on the minimap, showing the relative position
of the visible section to the overall sequence.

The minimap can also be used to quickly navigate around the visible sequence. Clicking on
a section of the minimap will jump the sequence viewer to center on that position. Double-
clicking the minimap will zoom further in on the clicked section. Finally, highlighting a section
of the minimap using a click-drag-release action will display the highlighted region in the
sequence viewer.

5.2.5 The Side Panel Controls

The panel to the right of the sequence viewer allows you to control what is displayed in the
sequence viewer (e.g. translations, consensus sequences, graphs and annotations), displays
sequence statistics, and provides functions for finding annotations, ORFs, and restriction sites
on your sequences. A brief description of each tab is given below:

General Options
Contains the color options (see section 5.2.6), check-boxes to turn on and off main aspects
of the sequence view and options for what to display as the name of each sequence.

Display
Contains options for displaying the translation and/or complement of a sequence, and
turning off the original nucleotide sequence. See sections 5.5 and 5.6 for more information.
This tab is not displayed for protein sequences.

Graphs
This option is visible when viewing nucleotide or protein sequences, chromatogram traces,
sequence alignments or assemblies, and includes graphs for GC content, Identity, Coverage,
and Quality. The graphs available for display depend on the type of sequence
you are viewing. More detail on graphs for nucleotide and protein sequences is given in
section 5.2.7 and for alignment and contig graphs is given in section 9.3.2.

Annotations
On sequences containing annotations this tab will show a yellow arrow. It contains controls
for turning on and off annotations of each type, customising the way each type is
displayed, and filtering based on annotation name or type. See section 8.1 for more
information on working with annotations.

Live Annotate and Predict


Contains real-time annotation generators such as Annotate from Database, Find ORFs
and Transfer Annotations. To use one of these, turn on the check-box at the top of the
generator you want to use and annotations will immediately be added to the sequence.
You can then change settings for the generator and the annotations will change on the
sequence in real-time as you do. If you want to save the annotations permanently on the
sequence click Apply.

Restriction Analysis
This behaves similarly to the Live Annotate and Predict section above. Please refer to
chapter 14 for full details.

Advanced
Contains advanced options for controlling the look of sequences and alignments, including
wrapping, numbering, annotation placement and font sizes. See section 5.2.8 for more
information.

Statistics

Displays statistics about the sequence or alignment currently being viewed, such as length,
molecular weight and nucleotide, codon and amino acid frequencies. See section 5.2.9 for
more information.

5.2.6 Sequence Colors

The colors of nucleotide and amino acid sequences can be set under the General Options tab
by clicking the drop down menu next to Colors.

Coloring schemes differ depending on the type of sequence. For example, the Polarity and
Hydrophobicity color schemes are available only for protein sequences. Alignments and
assemblies can be colored by similarity, read direction and paired distance (if paired reads are
used), in addition to standard options.

To change the colors that a particular color scheme uses, click the Edit button then click each
base and select the new color you wish to use.

Similarity Color Scheme

The similarity scheme is used for quickly identifying regions of high similarity in an alignment.

In order for a column to be rendered black (100% similar), all pairs of sites in the column must
have a score (according to the specified score matrix) equal to or exceeding the specified
threshold.

For example, if a column consists only of K (Lysine) and R (Arginine) and you are
using the Blosum62 score matrix with a threshold of 1, the column will be colored entirely
black because the Blosum62 score matrix has a value of 2 for K vs R.

If you raised the threshold to 3, this column would no longer be considered 100% similar.
If the column consisted of 9 K’s and 1 R, then with the threshold still at 3, the 9 K’s,
which make up 90% of the column, would be colored dark grey (the 80%→100% range),
while the single R would remain uncolored.

If instead the column consisted of 7 K’s and 3 R’s (still with threshold 3), then 70% of the column
is similar, so those 7 K’s would be colored the lighter grey (the 60%→80% range).

Alternatively, going back to the default threshold value of 1, consider a column consisting of 7
K’s, 2 R’s and 1 Y. The 7 K’s and 2 R’s have similarity exceeding the threshold, whereas
the Y is not similar enough to K or R, so the K’s and R’s will be colored dark grey since they make
up 90% of the column.
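
The worked examples above can be reproduced with a short Python sketch. This is an illustration only, not the Geneious implementation: it assumes the colored group is the largest set of residue types whose pairwise scores all meet the threshold, it assumes the range boundaries are inclusive at their lower end, and it uses a small excerpt of Blosum62 covering only K, R and Y. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations, combinations_with_replacement

# Excerpt of the Blosum62 matrix covering only the residues in the examples.
BLOSUM62_EXCERPT = {
    ("K", "K"): 5, ("R", "R"): 5, ("Y", "Y"): 7,
    ("K", "R"): 2, ("K", "Y"): -2, ("R", "Y"): -2,
}

def score(a, b):
    """Look up a symmetric pair score in the excerpt."""
    return BLOSUM62_EXCERPT.get((a, b), BLOSUM62_EXCERPT.get((b, a)))

def similar_fraction(column, threshold):
    """Return (fraction, residue types) for the largest mutually similar group.

    Assumption: a group qualifies when every pair of residue types in it
    (including a type paired with itself) scores at or above the threshold.
    """
    counts = Counter(column)
    types = sorted(counts)
    best_size, best_group = 0, frozenset()
    for k in range(1, len(types) + 1):
        for group in combinations(types, k):
            pairs = combinations_with_replacement(group, 2)
            if all(score(a, b) >= threshold for a, b in pairs):
                size = sum(counts[r] for r in group)
                if size > best_size:
                    best_size, best_group = size, frozenset(group)
    return best_size / len(column), best_group

def shade(fraction):
    """Map a similar fraction to the grey ranges named in the text."""
    if fraction == 1.0:
        return "black"
    if fraction >= 0.8:
        return "dark grey"
    if fraction >= 0.6:
        return "lighter grey"
    return "uncolored"  # assumption: lower ranges are not described in the text

for column, threshold in [("KKKKKKKKKR", 1), ("KKKKKKKKKR", 3),
                          ("KKKKKKKRRR", 3), ("KKKKKKKRRY", 1)]:
    fraction, group = similar_fraction(column, threshold)
    print(column, threshold, sorted(group), shade(fraction))
```

The four columns printed correspond, in order, to the four cases discussed above.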

Hydrophobicity color scheme

This colors amino acids from red through to blue according to their hydrophobicity value,
where red is the most hydrophobic and blue is the most hydrophilic. The values the color scale
is based on are given in Figure 5.4. These values are taken from
https://web.expasy.org/protscale/pscale/Hphob.Black.html

Figure 5.4: Hydrophobicity values for amino acids and corresponding color scale

Polarity color scheme

This colors amino acids according to their polarity as follows:

Yellow: Non-polar (G, A, V, L, I, F, W, M, P)
Green: Polar, uncharged (S, T, C, Y, N, Q)
Red: Polar, acidic (D, E)
Blue: Polar, basic (K, R, H)
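
As an illustration, the grouping above can be written as a simple lookup table in Python. The names below are hypothetical and the handling of symbols outside the four groups (returning None) is an assumption.

```python
# Polarity groups exactly as listed above, keyed by display color.
POLARITY_GROUPS = {
    "Yellow": "GAVLIFWMP",  # non-polar
    "Green": "STCYNQ",      # polar, uncharged
    "Red": "DE",            # polar, acidic
    "Blue": "KRH",          # polar, basic
}

# Invert the table so each residue maps to its color.
RESIDUE_TO_COLOR = {
    residue: color
    for color, residues in POLARITY_GROUPS.items()
    for residue in residues
}

def polarity_colors(protein):
    """Return the color name for each residue, or None for unlisted symbols."""
    return [RESIDUE_TO_COLOR.get(residue) for residue in protein.upper()]

print(polarity_colors("MKDS"))
```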

5.2.7 Graphs

The Graphs tab enables you to display a range of additional metrics on your sequences.
The types of graph available depend on the type of sequence you are viewing, and
are listed in the sections below. For information on alignment graphs, see section 9.3.2. The
number control to the right of each graph controls the height of that graph (in pixels).

Sliding window size. Many types of graph use a sliding window when calculating values. This
calculates the value of the graph at each position by averaging across a number of surrounding
positions. When the value is 1, no averaging is performed. When the value is 3, the value of
the graph is the average of the residue value at that position and the values on either side.
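
A minimal Python sketch of this kind of sliding window average is given below. It assumes an odd window size centred on each position and assumes that windows are simply truncated at the ends of the sequence; how Geneious treats the ends is not stated here.

```python
def sliding_window_average(values, window):
    """Average each position with its neighbours in a centred window.

    Assumptions: the window size is odd, and windows are truncated at the
    sequence ends rather than padded.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - half)
        end = min(len(values), i + half + 1)
        chunk = values[start:end]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

print(sliding_window_average([1, 2, 3, 4], 1))  # no averaging
print(sliding_window_average([1, 2, 3, 4], 3))  # each value averaged with both neighbours
```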

The numerical values of the graphs can be exported in csv or wig format by clicking the Export
button in the Graphs tab, or by going to File → Export → Graphs.

Nucleotide sequence graphs

GC content. This plots a graph of the GC content of the sequence within a window of specified
length as the window is moved along the sequence. If Frame Plot is checked, the graph shows
the GC content in the 3rd codon position only for each frame, where frame 1 = red, frame 2 =
green and frame 3 = blue.
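
The two calculations described for the GC content graph can be sketched in Python as follows. This is a simplified illustration: the function names are hypothetical, only full windows are plotted, and the third-codon-position value is computed over the whole sequence rather than within a moving window.

```python
def gc_fraction(bases):
    """Fraction of G and C among the bases given (0.0 for an empty string)."""
    bases = bases.upper()
    return sum(b in "GC" for b in bases) / len(bases) if bases else 0.0

def gc_window_plot(seq, window):
    """GC fraction for each full window as it slides along the sequence."""
    return [gc_fraction(seq[i:i + window]) for i in range(len(seq) - window + 1)]

def gc3_by_frame(seq):
    """GC fraction at third codon positions for frames 1, 2 and 3.

    In frame f (1-based) the third codon positions sit at 0-based
    indices f + 1, f + 4, f + 7 and so on.
    """
    return {frame: gc_fraction(seq[frame + 1::3]) for frame in (1, 2, 3)}

print(gc_window_plot("GGCCAATT", 4))
print(gc3_by_frame("ATGATCATA"))
```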

Chromatogram. This is available with chromatogram traces, and displays the chromatogram
trace above the sequence. If Qual is checked the quality scores are displayed as a blue graph
overlaid on the chromatogram. For more information on viewing chromatograms, see section
5.7.

Stylized DNA Helix. Shows the bases in your sequence as a rotating DNA helix. To turn off the
rotation, uncheck the “Animated” option.

Protein sequence graphs

Amino Acid Charge. This runs the EMBOSS charge tool to plot a graph of the charges of the
amino acids within a window of specified length as the window is moved along the sequence.

Hydrophobicity. This displays the hydrophobicity of the residue at every position, or the
average hydrophobicity when there are multiple sequences.

pI. pI stands for isoelectric point and refers to the pH at which a molecule carries no net
electrical charge. The pI plot displays the pI of the protein at every position along the sequence,
or the average pI when multiple sequences are being viewed. The values in this graph are
normalised such that the amino acid with the lowest pI has a value of 0, and the amino acid
with the highest pI has a value of 1, and all other amino acids’ values are interpolated linearly
according to their pI.
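
The normalisation described above is a linear rescaling, which can be sketched in Python as below. The input values in the example are placeholders chosen for easy arithmetic, not real pI data, and the function name is hypothetical.

```python
def normalise_pi(pi_by_residue):
    """Linearly rescale per-residue pI values so the lowest maps to 0 and the highest to 1."""
    lowest = min(pi_by_residue.values())
    highest = max(pi_by_residue.values())
    span = highest - lowest
    return {aa: (pi - lowest) / span for aa, pi in pi_by_residue.items()}

# Placeholder values for illustration only (not real amino acid pI values).
example = {"X1": 3.0, "X2": 6.0, "X3": 11.0}
print(normalise_pi(example))
```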

5.2.8 Advanced sequence view options

Advanced options for controlling the look of sequences and alignments are under the
Advanced tab. These options are as follows:

Layout

• Wrap sequence. This wraps the sequences in the viewing area.

• Linear view on circular sequences. This forces circular sequences to be shown linearly.

• Spaces every 10 bases. If you are zoomed in far enough to be able to see individual residues,
then an extra white space can be seen every 10 (or whatever number you choose) residues
when this option is selected.

Properties

• Numbering. Enables the display of base position number above the sequence residues.
For alignments and assemblies, options are available for displaying the numbering of
consensus, reference, alignment and/or all original sequences.

• Mini-map. Enables the display of a mini-map at the top of the sequence viewer which
highlights the currently displayed location in the entire sequence.

• Outline residues when zoomed out. This adds a fine line around the sequence which can
help with clarity and printing.

Annotations

• Labels. This option changes where the labels are displayed on the annotation: “Inside”,
“Outside”, “Inside or Outside” and “None”.

• Overlay when zoomed out. When only a single annotation covers a region, it will be placed
on top of the sequence.

• Compress annotations. This option reduces the vertical height of the annotations on
display. This reduces the space occupied by annotations by allowing them to overlap and
increases the amount of the sequence displayed on the screen.

• Show arrow tips. Displays the directional indicator for an annotation as a large arrow tip.

• Hide excessive labels. This will reduce screen clutter by removing annotation labels which
are too frequent.

Sizes

Here you can set the font size for bases, labels, names and numbering.

5.2.9 Statistics

The Statistics tab displays statistics about the sequence(s) being viewed. If only part of the
sequence/alignment or assembly is selected then the statistics displayed will correspond to the
highlighted part. The length and number of sequences currently selected are shown at the top
of the Statistics tab.

Several of the metrics displayed in the Statistics tab can also be displayed as columns in the
document table. These include sequence length, % pairwise identity, % identical sites, mean
coverage, molecular weight and several protein statistics such as extinction coefficient and
isoelectric point. The value in the document table will be for the entire document, not the currently
selected region.

Molecular weight and protein statistics were added to the document table in Prime 2021, and
will not appear in the table for documents created in an earlier version of Geneious unless that
document is edited in Prime 2021. If you wish to add these statistics to the document table for
documents created in earlier versions of Geneious, select the folder containing the document
and go to Tools → Preferences → Appearance and Behavior and select Recalculate statistics
now. Note that for performance reasons these statistics are only calculated on documents
comprising fewer than 10,000 bp or aa (this threshold applies to the total number of residues and
gaps over all sequences in a document).

General statistics

• Residue frequencies: This section lists the residues and their frequencies for both DNA and
amino acid sequences, for both single sequences and alignments/assemblies. It gives the
frequency of each nucleotide or amino acid over the entire length of the sequence, including
gaps. If there are gaps, then a second percentage frequency is calculated ignoring gap
characters. The G+C content for nucleotide sequences is shown as well for easy reference
(see GC content, below).

• Amino acid and codon frequencies: These are listed for nucleotide sequences based on the
current translation options. Click Options to change the translation options. For codon
usage statistics, the frequency of all 64 codons (with their associated amino acid) will
be displayed. If any CDS contains non-standard start codons then some of the 64 codons
may be split into 2 entries based on whether they translate to methionine or their standard
translation.

• Amino acid group frequencies: This section lists the frequencies of certain types of amino
acids as groups. Total frequencies and percentage frequencies of non-gap, non-ambiguous
amino acids are given for Acidic (DE), Basic (RHK), Charged (DERHK), Polar Uncharged
(NCQSTYW), Hydrophobic (AGILMPVFW), GC-rich (GARP), and AT-rich (FINKY) groups.
These groupings were taken from Biochemistry 8th Edition (Berg, Tymoczko, Gatto, and
Stryer).

• Rough Tm: A rough calculation of the melting point for a nucleotide sequence using the
following calculations:
If the sequence is less than 14 bp in length (Marmur and Doty 1962):

$$\text{Rough } T_m = 4 \times \text{GC count} + 2 \times \text{AT count} \tag{5.1}$$

If the sequence is greater than 13 bp in length (Chester and Marshak 1993):

$$\text{Rough } T_m = 69.3 + (0.41 \times \%\text{GC}) - \frac{650}{\text{length}} \tag{5.2}$$

• Molecular Weight: For protein sequences, the following values are used for the amino
acids:
A=71.0788 R=156.1875 N=114.1038 D=115.0886 C=103.1388 E=129.1155 Q=128.1307 G=57.0519
H=137.1411 I=113.1594 L=113.1594 K=128.1741 M=131.1926 F=147.1766 P=97.1167 S=87.0782
T=101.1051 W=186.2132 Y=163.1760 V=99.1326 U=150.0388 O=237.3018
For DNA sequences, the following values are used:
A=313.21 T=304.2 G=329.21 C=289.18
The DNA molecular weight assumes no modification of the terminal groups of the sequence.
If the sequence is a single-stranded, synthesised oligonucleotide (e.g. by primer extension),
the value is adjusted for the removed phosphate group by using:
Molecular Weight = calculated molecular weight - 61.96
If the sequence is a single-stranded sequence cut by a restriction enzyme, the value is
adjusted for the extra 5′-monophosphate left by most restriction enzymes by using:
Molecular Weight = calculated molecular weight - 61.96 + 79.0
For dsDNA, these values are adjusted for both strands.
For RNA sequences, the following values are used:
A=329.21 U=306.2 G=345.21 C=305.18
The RNA molecular weight assumes no modification of the terminal groups of the sequence.
For a 5′-triphosphate group, weights are adjusted using
Molecular Weight = calculated molecular weight + 159.0

• Isoelectric Point: Calculates the isoelectric point of a protein using the bisection method
described at isoelectric.org. Amino acid pKa values were taken from the CRC Handbook
of Chemistry and Physics 90th Edition, with general pKa values for terminal amino and
carboxy groups taken from Biochemistry 8th Edition (Berg, Tymoczko, Gatto, and Stryer).

• Charge at pH 7: Estimates the overall charge of a protein at pH 7.0 using methods de-
scribed at isoelectric.org. Amino acid pKa values were taken from the CRC Handbook
of Chemistry and Physics 90th Edition, with general pKa values for terminal amino and
carboxy groups taken from Biochemistry 8th Edition (Berg, Tymoczko, Gatto, and Stryer).

• Extinction Coefficient: Calculates the extinction coefficient of a protein as per Gill and Hippel,
1989, using the following values for the amino acids and assuming all cysteines are
paired in a disulfide bridge (making cystine): C=62.5 (only counting up to an even number)
W=5500 Y=1490

• A[280] of 1 mg/ml: The correction factor at 280nm, calculated by dividing the extinction
coefficient by the molecular weight.
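
Several of the calculations in the list above (Rough Tm, single-stranded DNA molecular weight, extinction coefficient and A[280]) can be sketched in Python as follows. This is an illustrative sketch and not the Geneious code: the function and parameter names are hypothetical, %GC is assumed to be expressed on a 0 to 100 scale, ambiguous bases are not handled, and the protein molecular weight is passed in by the caller rather than computed.

```python
# Per-base weights for DNA as listed above.
DNA_WEIGHTS = {"A": 313.21, "T": 304.2, "G": 329.21, "C": 289.18}

def rough_tm(seq):
    """Rough melting temperature using equations 5.1 and 5.2."""
    seq = seq.upper()
    gc = sum(b in "GC" for b in seq)
    at = sum(b in "AT" for b in seq)
    if len(seq) < 14:
        return 4 * gc + 2 * at
    percent_gc = 100 * gc / len(seq)  # assumption: %GC on a 0-100 scale
    return 69.3 + 0.41 * percent_gc - 650 / len(seq)

def ssdna_weight(seq, synthesised=False, restriction_cut=False):
    """Single-stranded DNA weight with the optional adjustments listed above."""
    weight = sum(DNA_WEIGHTS[b] for b in seq.upper())
    if synthesised or restriction_cut:
        weight -= 61.96  # removed phosphate group
    if restriction_cut:
        weight += 79.0   # 5'-monophosphate left by the enzyme
    return weight

def extinction_coefficient(protein):
    """Extinction coefficient assuming all cysteines are paired as cystine."""
    protein = protein.upper()
    paired_cys = protein.count("C") // 2 * 2  # only count up to an even number
    return 5500 * protein.count("W") + 1490 * protein.count("Y") + 62.5 * paired_cys

def a280_of_1_mg_per_ml(protein, molecular_weight):
    """Extinction coefficient divided by a caller-supplied molecular weight."""
    return extinction_coefficient(protein) / molecular_weight

print(rough_tm("ATGC"))
print(rough_tm("GC" * 10))
print(extinction_coefficient("WYCCC"))
```

In the last call, only two of the three cysteines contribute, matching the "only counting up to an even number" rule.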

Statistics for multiple sequences (lists, alignments, assemblies)

• Sequences: The number of sequences in the document, or in the currently selected region.

• Identical sites. When viewing alignments or assemblies this considers only those columns
in the alignment that have at least 2 nucleotides/amino acids/gaps that are not free end
gaps and are not columns consisting entirely of gaps. A column not meeting this requirement
is not even counted as non-identical for the percentage calculation. A column
meeting this requirement is considered identical if it contains no internal gaps and all
the nucleotides/amino acids are identical. Ambiguity characters are not interpreted, so a
nucleotide column of A and R is not considered identical.

• Pairwise % Identity. When viewing alignments or assemblies this gives the average percent
identity over the alignment. This is computed by looking at all pairs of bases at the
same column and scoring a hit (one) when they are identical, divided by the total number
of pairs. Ambiguity characters are interpreted, meaning a nucleotide A vs a nucleotide R
is considered to have 50% identity.
For both Identical sites and Pairwise % Identity, the statistics are calculated from the subset
of sequences and nucleotides/amino acids selected. If just a single sequence is selected,
the statistics are calculated as if all sequences are selected over the selected columns. The
consensus sequence is always excluded from calculation of both of these values.

• Coverage of Bases. When viewing a contig assembly this gives the mean, standard deviation,
minimum and maximum of the coverage of each base in the consensus sequence.
For small contigs the coverage is further broken down into coverage by reads mapped
to the forward and reverse strands. For large contigs, separate forward/reverse coverage
can’t be efficiently calculated, so is displayed as ?. If your contig has a reference sequence,
then the percentage of the ungapped reference sequence that is covered by at least 1 read
is also displayed.
Selecting a sub-region of your contig will display statistics for just that region, including
calculation of separate forward/reverse coverage on large contigs.
For contigs where reads extend outside the bounds of the reference sequence, the document
table mean coverage is calculated excluding regions outside the reference sequence.
The mean coverage displayed in the contig viewer statistics in this same situation when
nothing is selected includes regions outside the reference sequence. Click on the name
of the reference sequence to select just that region in order to display detailed coverage
statistics over just the region spanned by the reference sequence.

• [Ungapped] Lengths of Sequences. Displays the mean, standard deviation, minimum and
maximum of the lengths of the sequences.

• Confidence (mean). When viewing sequences containing quality scores (e.g. chromatograms
or NGS reads) this gives the mean of the confidence scores for the currently selected base
calls. Confidence scores are provided by the base calling program (not Geneious) and
give a measure of quality (higher means a base call is more likely to be correct). An
untrimmed value is also displayed if the selected region contains trims.

• Expected Errors. When viewing sequences containing quality scores, this gives the approximate
number of errors that are statistically expected in the currently selected region. This
is calculated by converting the confidence score Q for each base call to an error probability
using the formula $10^{-Q/10}$. For example, a base with a quality score of 30 will have
an error probability of 0.001. The expected errors value is then calculated by summing
the error probabilities for each base. An untrimmed value is also displayed if the selected
region contains trims.
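
Two of the statistics above, Pairwise % Identity and Expected Errors, can be sketched in Python as follows. This is a simplified illustration with hypothetical function names. It assumes gap-free rows of equal length, and it assumes that ambiguity codes are scored as the chance that two bases drawn from each code agree, which reproduces the 50% value given for A vs R but may differ from Geneious in other cases.

```python
from itertools import combinations

# Standard IUPAC nucleotide codes and the bases each one represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def pair_identity(a, b):
    """Chance that two (possibly ambiguous) bases agree, e.g. A vs R gives 0.5."""
    set_a, set_b = set(IUPAC[a]), set(IUPAC[b])
    return len(set_a & set_b) / (len(set_a) * len(set_b))

def pairwise_percent_identity(rows):
    """Mean identity over every pair of bases sharing a column (gap-free rows)."""
    scores = [
        pair_identity(a, b)
        for column in zip(*rows)
        for a, b in combinations(column, 2)
    ]
    return 100 * sum(scores) / len(scores)

def expected_errors(qualities):
    """Sum of per-base error probabilities, each computed as 10^(-Q/10)."""
    return sum(10 ** (-q / 10) for q in qualities)

print(pairwise_percent_identity(["A", "R"]))
print(pairwise_percent_identity(["ACGT", "ACGA", "ACGT"]))
print(expected_errors([30, 30, 20]))
```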

GC content

For documents that are created or modified in Geneious 8.1 or later, the GC content can also be
viewed in the %GC column in the document table.

The %GC column shows the percentage of A, C, G, T, U, S, W nucleotides that are either G,
C, or S. Ambiguous bases that contain a mixture of GC and non-GC bases (e.g. R, Y, M, K) are
excluded from the calculation. This field is available on all nucleotide sequences, contigs,
alignments, and sequence lists that were created or had their sequences last modified in Geneious
8.1 or later. For contigs and alignments, the consensus sequence and reference sequence (if any)
are excluded from the calculation.
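
The %GC rule described above can be sketched in Python as follows. The function name is hypothetical, and returning 0.0 when a sequence contains no countable bases is an assumption.

```python
def percent_gc(seq):
    """%GC as described above: G, C and S counted over A, C, G, T, U, S and W."""
    seq = seq.upper()
    counted = sum(b in "ACGTUSW" for b in seq)  # other symbols are excluded
    gc = sum(b in "GCS" for b in seq)
    return 100 * gc / counted if counted else 0.0

print(percent_gc("GGCCAATT"))
print(percent_gc("GSWARN-"))  # R, N and the gap are ignored
```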

For sequences within an alignment, contig or list, the %GC column only shows the overall
value for the alignment. To see a table of GC percentages for all individual sequences within
an alignment or contig, the sequences need to be extracted to stand-alone sequences. Alternatively,
individual values can be viewed in the statistics panel by clicking on the name of the
sequence to select it.

Sequences in a list or alignment can be sorted by GC content by right clicking in the sequence
viewer and choosing Sort → %GC.

5.3 Customizable text view for sequences

In Geneious Prime 2019.1 onwards, the Text View tab is able to display sequences or alignments
in a customisable rich text or plain text document, which can then be exported or copied and
pasted into a text or Microsoft Word document. Sequences can be displayed in either Genbank
format, or in a custom format where the user can choose the layout and add coloring and
highlighting. For further information on the options available, please see this post on our
knowledge base.

5.4 Editing sequences

To edit sequence(s) or an alignment click the Allow Editing toolbar button.

You can manually enter or delete sequence, or use any of the standard editing operations, such
as Copy (Ctrl/command+C), Cut (Ctrl/command+X), Paste (Ctrl/command+V), Paste Without
Annotations (Shift+Ctrl/command+V), Paste Reverse Complement and Undo (Ctrl/command+Z).
All operations are under the main Edit menu, or can be accessed by right-clicking in
the sequence view and selecting the option from the popup menu.

To insert sequence, click at the position where you want to insert the sequence and type or
paste it in. Normally existing residues will be shifted to the right when you insert sequence; to
get them to shift to the left hold down the Shift key as you insert the sequence. To overwrite
sequence, select the sequence you wish to overwrite and type or paste the new sequence in.

To select sequences or regions of sequence, there are several options:

• To quickly select a single residue, double-click on it.

• To select a block of residues within a single sequence, triple click.

• To select a block of residues across multiple sequences, quadruple click.

• To select a block of 10 residues, hold down Shift and Ctrl (Alt/Option on a Mac) and
press the keyboard arrow.

• To select a specific region of sequence, click at the beginning of the region you want to
select, hold down Shift and then click the end.

• To select the same region across multiple sequences in an alignment, select the region
you want in the first sequence, then hold down Shift/Alt (Command on a Mac) and press
the down arrow to apply the selection to the sequences underneath. Holding down
Shift/Ctrl (Alt/Option on a Mac) while pressing the down arrow will select the sequences in
batches of 10.

Sequences can be reordered within an alignment by clicking the sequence name and dragging.

Sequences can be removed from an alignment by right-clicking (Ctrl+click on Mac OS X) on
the sequence name and choosing the remove sequence option. Alternatively, select the entire
sequence (by clicking on the sequence name) and press the delete key.

To delete a region of a sequence or alignment, select the region and press the delete or backspace
key. Normally this will move residues on the right into the deleted area. To move the residues
on the left into the deleted area, hold down the Shift key while deleting.

To drag a sequence to the left or right, select the region you want to move, then click it again
and drag it to the position you want. Dragging will either move residues over existing gaps
or open new gaps when necessary. Dragging a selection consisting entirely of gaps moves the
gaps to the new location.

After editing is complete, click Save to permanently save the new contents.

5.4.1 Concatenating sequences

To join several sequences end-on-end, select all the sequences and go to Tools → Concatenate
Sequences or Alignments. This creates a single sequence document from the input sequences.
The order in which sequences are concatenated can be chosen in the setup dialog box, and the
resulting sequence can be circularized if required by checking Circularize sequences. If one or
more of the component sequences was an extraction from over the origin of a circular sequence,
you can choose to use the numbering from that sequence, thus producing a circular sequence
with its origin in the same place as the original circular sequence. Overhangs will be taken into
account when concatenating.

You can also concatenate sequence list or alignment documents. When you concatenate mul-
tiple sequence lists or alignments, sequences from each input document will be matched by
either name or index and concatenated.

Concatenating by name allows you to match sequences in different alignments or sequence
lists that aren’t in the same order. To concatenate by name, sequences to be concatenated must
have exactly the same name, including any spaces or punctuation. Note that names are case
sensitive: H. sapiens and H. Sapiens are considered to be different. The one exception to this rule
is that the special suffixes “extraction” and “(reversed)” are ignored.

Concatenating by index allows you to match sequences based on their order in lists or
alignments, even if they don’t have the same names. The first sequence across all lists will be
concatenated together, as will the second and so on. This can be very useful when you have
additional information appended to your sequence names, such as sequencing read direction or
gene names or accession numbers. You can change the sort order for a list of sequences prior to
concatenating by right clicking on the sequence names and selecting one of the Sort submenu
options.

The number of sequences in the set of alignments or sequence lists you wish to concatenate can
be different; however, if you concatenate by index and sequences from the middle of the list
are missing in some documents, later sequences will be concatenated with the wrong partners.

If you have...                                   You should concatenate by:

Sequences in arbitrary order, but matching       Name
sequences have the same names
Sequences in fixed order, but matching           Index
sequences have different names
Sequences in fixed order, matching sequences     Name or Index
have the same names
Sequences in arbitrary order, matching           Sort or rename sequences before
sequences have different names                   concatenating. The Batch Rename
                                                 operation in the Edit menu may be useful.

Examples: Concatenating by Name vs. Index

Input names                  Result names
Doc 1        Doc 2           Concatenate by name    Concatenate by index
A/1          A/2             A/1                    A/1 - A/2
B/1          B/2             B/1                    B/1 - B/2
C/1          C/2             C/1                    C/1 - C/2
                             A/2
                             B/2
                             C/2
Geobacter    Vibrio          Geobacter              Geobacter - Vibrio
Hippea       Hippea          Hippea                 Hippea - Hippea
Pelobacter   Geobacter       Pelobacter             Pelobacter - Geobacter
Vibrio       Pelobacter      Vibrio                 Vibrio - Pelobacter
Corallococcus Corallococcus Corallococcus
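
The difference between the two matching modes can be sketched in Python for two input documents, each represented as a list of (name, sequence) pairs. This is an illustration only: the function names are hypothetical, the " - " joined names simply mirror the notation used in the table, and the special handling of the “extraction” and “(reversed)” suffixes is not modelled.

```python
from itertools import zip_longest

def concatenate_by_index(doc1, doc2):
    """Join the first sequences of each list, then the second, and so on."""
    result = []
    for a, b in zip_longest(doc1, doc2):
        parts = [entry for entry in (a, b) if entry is not None]
        name = " - ".join(n for n, _ in parts)
        sequence = "".join(s for _, s in parts)
        result.append((name, sequence))
    return result

def concatenate_by_name(doc1, doc2):
    """Join sequences whose names match exactly; unmatched ones are kept alone."""
    second = dict(doc2)
    result = [(name, seq + second.pop(name, "")) for name, seq in doc1]
    result.extend(second.items())  # leftovers from doc2, in their original order
    return result

doc1 = [("Geobacter", "AA"), ("Hippea", "CC")]
doc2 = [("Hippea", "GG"), ("Geobacter", "TT")]
print(concatenate_by_name(doc1, doc2))
print(concatenate_by_index(doc1, doc2))
```

With these inputs, matching by name pairs each organism with itself regardless of order, whereas matching by index pairs Geobacter with Hippea because they share a position.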

5.5 Complement and Reverse Complement

To display the complement (3′ to 5′) of a sequence (displayed 5′ to 3′), check the Complement
box in the Display tab. Note that the complement option is for display purposes only; you
cannot select or edit the complement sequence. If you wish to create a separate document to
work with the complement (not reverse complement) of a sequence, you will need to install the
Complement or Reverse plugin by going to Tools → Plugins. Once the plugin is installed, go
to Sequence → Complement only to create a new document containing the complement.

To reverse complement a nucleotide sequence (i.e. reverse the sequence direction and replace
each base by its complement), click the R.C button above the sequence viewer, or go to Reverse
Complement under the Sequence menu. You can also access this option by right-clicking in the
sequence viewer and selecting it from the popup menu. When you click Save after reverse
complementing, the tag (reversed) will be added to the sequence name.
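
For readers who want to see the distinction between complement and reverse complement spelled out, here is a minimal Python sketch. It handles only unambiguous DNA bases (other characters are passed through unchanged), and the function names are hypothetical.

```python
# Translation table for the four unambiguous DNA bases, preserving case.
COMPLEMENT_TABLE = str.maketrans("ACGTacgt", "TGCAtgca")

def complement(seq):
    """Replace each base by its complement without changing the order."""
    return seq.translate(COMPLEMENT_TABLE)

def reverse_complement(seq):
    """Complement each base and reverse the sequence direction."""
    return complement(seq)[::-1]

print(complement("ATGC"))
print(reverse_complement("ATGC"))
```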

When only part of a sequence is selected, you can choose to either reverse-complement only
the selected region and extract it to a new sequence document, or reverse complement the
entire sequence. On alignment or contig documents you can reverse complement individual
sequences within the alignment or assembly by selecting that sequence, and choosing reverse
complement selected sequence.

5.6 Translating sequences

The protein translation can be viewed alongside the nucleotide sequence by checking the
Translation option in the Display tab. Select the genetic code and reading frame(s) you require.
You can also choose to translate relative to selection or annotations such as CDS (Figure 5.5).
In an alignment, the sequence frame can be calculated relative to the individual sequences, the
alignment, the consensus or a specific reference sequence. On a contig or alignment, the
translation can be displayed on the consensus and reference sequence only, or it can be displayed on
all sequences.

To display the amino acid numbering on the sequence, check the box Show amino acid
numbering. The amino acid numbering is displayed below the sequence.

To show three letter rather than single letter amino acids, check the box Three letter amino
acids. Translations which have two possibilities will always show as single letter amino acids,
e.g. V/M. For stand-alone protein sequences, there is no option to show or hide three letter
codes. Instead, three letter codes are automatically displayed when the zoom level is
200% or higher and when not in editing mode.

If you wish to view only the translation and turn off the nucleotide sequence, uncheck
Nucleotides. However, this is only for display purposes: if you wish to work with the translation
in downstream analysis you must extract it to a separate document using the Translate button
above the sequence viewer. The Translate button will create a new protein document from
the translated DNA, using your choice of reading frame and genetic code. This option can also
be accessed from the Sequence menu.
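
To illustrate how the choice of reading frame affects a translation, here is a minimal Python sketch using the standard genetic code. It is not the Geneious implementation: the function name is hypothetical, only forward frames 1 to 3 are handled, and codons containing anything other than A, C, G or T are assumed to translate to X.

```python
# Standard genetic code in the conventional compact form (bases ordered T, C, A, G).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(seq, frame=1):
    """Translate one forward frame (1, 2 or 3); unknown codons become X."""
    seq = seq.upper().replace("U", "T")
    start = frame - 1
    codons = (seq[i:i + 3] for i in range(start, len(seq) - 2, 3))
    return "".join(STANDARD_CODE.get(codon, "X") for codon in codons)

for frame in (1, 2, 3):
    print(frame, translate("AATGGCCTAA", frame))
```

Stop codons are shown as an asterisk, matching the convention used in the translation view.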

To copy the translation of a selected region of a nucleotide sequence, right click on the selected
sequence and choose Copy Translation from the menu. The copied translation can then be
pasted into the New Sequence box or into a document outside of Geneious.

Figure 5.5: Translating a CDS

Note: If a CDS annotation contains an internal stop codon, it will be assigned as a premature
stop codon and will be represented by an asterisk (*) in the translation view. The CDS after the
premature stop codon will appear faded yellow, to represent the truncated protein sequence
(see Figure 5.6).

Figure 5.6: An internal stop codon within a CDS annotation
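The effect of reading frame and internal stop codons described above can be sketched outside Geneious. The following Python snippet is only an illustration, assuming the standard genetic code (NCBI translation table 1); the function names translate and first_internal_stop are hypothetical and are not part of Geneious Prime.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(nucleotides, frame=1):
    """Translate a DNA string in forward frame 1, 2 or 3; unknown codons become 'X'."""
    seq = nucleotides.upper().replace("U", "T")
    start = frame - 1
    codons = (seq[i:i + 3] for i in range(start, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def first_internal_stop(protein):
    """Return the 0-based index of the first stop (*) before the final residue, or None."""
    position = protein.find("*", 0, len(protein) - 1)
    return position if position >= 0 else None


cds = "ATGAAATAGGGCTAA"  # hypothetical CDS with a premature TAG stop
print(translate(cds))                       # frame 1
print(first_internal_stop(translate(cds)))  # position of the premature stop
print(translate(cds, frame=2))              # same sequence, frame 2
```

Residues after the index returned by first_internal_stop correspond to the part of the CDS that would be shown faded in the viewer.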



5.6.1 Genetic Codes

Geneious Prime supports a range of genetic codes which can be chosen from the drop-down
menu in the Translation options. To set a default genetic code which will apply to all documents
in your database, click the Settings cog next to the drop-down menu. You can also enter a
custom genetic code here by clicking Add and editing an existing genetic code template.

5.6.2 Back Translating

To create a nucleotide sequence from a protein document, go to Sequence → Back Translate.


Ambiguous back-translation uses a specific genetic code to produce a nucleotide sequence
with ambiguous bases, so that every possible codon is represented for each amino acid. Un-
ambiguous back-translation uses codon usage tables to produce a nucleotide sequence where
the most frequently used codon for that organism is used for each amino acid. Codon usage
tables for some organisms are provided, or you can import custom codon usage tables
in GCG CodonFrequency and EMBOSS cusp formats. For further instructions, see How do I
create a custom codon usage table?.
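The difference between the two back-translation modes can be illustrated with a short Python sketch. It assumes the standard genetic code and builds each ambiguous codon from the per-position union of bases, which for amino acids with split codon families (R, L, S) also covers a few codons of other amino acids. Geneious may resolve these cases differently. The codon usage values in the example are invented for illustration only.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# IUPAC ambiguity codes keyed by the sorted set of bases they represent
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "AG": "R", "CT": "Y", "CG": "S",
         "AT": "W", "GT": "K", "AC": "M", "CGT": "B", "AGT": "D", "ACT": "H",
         "ACG": "V", "ACGT": "N"}


def codons_for(amino_acid):
    return [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]


def back_translate_ambiguous(protein):
    """Per-position union of all codons for each residue, written with IUPAC codes."""
    result = []
    for aa in protein.upper():
        codons = codons_for(aa)
        result.append("".join(IUPAC["".join(sorted({c[i] for c in codons}))]
                              for i in range(3)))
    return "".join(result)


def back_translate_most_frequent(protein, codon_usage):
    """Pick the codon with the highest frequency in codon_usage for each residue."""
    return "".join(max(codons_for(aa), key=lambda c: codon_usage.get(c, 0.0))
                   for aa in protein.upper())


# Hypothetical usage frequencies, not taken from any real organism
usage = {"AAA": 0.7, "AAG": 0.3, "TTT": 0.4, "TTC": 0.6}
print(back_translate_ambiguous("MKF"))
print(back_translate_most_frequent("MKF", usage))
```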

5.7 Viewing chromatograms

Geneious Prime can view chromatogram information from files imported in .ab1 or .scf format.
If the chromatograms are not visible, check Chromatograms under the Graphs tab (see Figure
5.7).

Chromatogram files are produced by sequencing machines such as the Applied Biosystems
3730 DNA analyzer. The raw output of a sequencing machine is known as a trace, a graph
showing the concentration of each nucleotide against sequence positions. The raw trace is
processed by base-calling software which detects peaks in the four traces and assigns the most
probable base at more or less even intervals. Base calling may also assign a quality measure
for each such call, typically in terms of the expected probability of making an erroneous call.
Geneious does not perform base-calling itself: this information is already contained in the .ab1
or .scf file.

Chromatogram peaks for individual bases can be turned on or off using the A/G/C/T checkboxes
in the Graphs tab. Note that since the distance between bases as inferred from the trace varies,
the trace may be either contracted or expanded compared with the raw data. The vertical scale
of the chromatogram can be adjusted by clicking and dragging on the graph itself. The total
height of the graph can be adjusted by increasing the number displayed next to the graph on
the right of the Sequence View.

Figure 5.7: A chromatogram file with quality scores enabled, and poor quality regions marked
with pink trim annotations

Quality. The quality scores associated with a chromatogram can be viewed by checking the
Qual box under the Chromatogram graph options. This displays a quality measure (typically
Phred quality scores) for each base as assessed by the base calling program. The quality is
shown as a shaded blue bar graph overlaid on top of the chromatogram. Note that these
scores represent an estimate of error probability and are on a logarithmic scale: the highest
bar represents a one in a million (10^-6) probability of a calling error, while the middle
represents a probability of one in a thousand (10^-3).
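For reference, the standard Phred definition relates a quality score Q to the error probability P by Q = -10 log10(P). The short Python sketch below converts between the two, assuming the scores in your file are Phred scores; the function names are illustrative only.

```python
import math


def phred_to_error_probability(q):
    """Error probability for a Phred quality score, P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Phred quality score for an error probability, Q = -10 * log10(P)."""
    return -10 * math.log10(p)


for q in (20, 30, 60):
    print(q, phred_to_error_probability(q))
```

Under this definition a score of 30 corresponds to an error probability of one in a thousand, and a score of 60 to one in a million.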

Poor quality regions of the chromatogram can be trimmed using Annotate and Predict → Trim
Ends (see section 10.2.2).

Highlighting. When this is turned on, chromatogram peaks that do not match their base call
are colored, and all other peaks are greyed out.

To view the raw chromatogram traces, click the Chromatograms tab above the sequence viewer.
In this view, the exact location of the base call can be viewed by checking Mark calls. To view
sequence logos indicating base quality in this view, check Scale by confidence. The Trace op-
tions for X and Y scales allow you to zoom in on the X or Y axes, respectively.

5.7.1 Binning by quality

Chromatograms can be binned on the basis of their quality scores into Low, Medium or High
quality bins. The parameters for each of these bins are set under Tools → Preferences → Se-
quencing (see section 1.2.1 for more information). To see the bin for a trace, add the Bin column
to the document table by going to View → Table Columns. You can also view the percentage
of bases that are low, medium or high quality by adding LQ%, MQ%, and HQ% columns to
the document table.
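As a rough illustration of what the LQ%, MQ% and HQ% columns summarise, the Python sketch below classifies per-base quality scores into three classes and reports the percentage in each. The thresholds (below 20 is low, above 40 is high) are assumptions chosen for the example. The values Geneious actually uses are those configured under Tools → Preferences → Sequencing.

```python
def quality_percentages(quals, low_below=20, high_above=40):
    """Percentage of bases in each quality class (threshold values are assumed)."""
    total = len(quals)
    if total == 0:
        return {"LQ%": 0.0, "MQ%": 0.0, "HQ%": 0.0}
    low = sum(1 for q in quals if q < low_below)
    high = sum(1 for q in quals if q > high_above)
    medium = total - low - high
    return {"LQ%": 100.0 * low / total,
            "MQ%": 100.0 * medium / total,
            "HQ%": 100.0 * high / total}


# Hypothetical per-base quality scores for a short read
print(quality_percentages([10, 25, 45, 50]))
```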

5.8 Virtual Gel

The Virtual Gel tab will appear above the sequence viewer when nucleotide documents con-
taining fewer than 50 sequences or 1000 restriction sites are selected.

If a single sequence annotated with restriction enzyme sites is selected, the gel displays the
fragments that would result from that restriction digest. If the restriction sites are all from a
single enzyme, the digest pattern is shown in one lane of the gel. If multiple different restriction
enzymes are annotated, the results of digestion with each enzyme will be shown in different
lanes if Digest by single enzymes is checked. If this option is unchecked, the result will be
shown as a multiple enzyme digest in a single lane (see Figure 5.8).
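The difference between single enzyme lanes and a combined digest can be sketched in Python as follows. This is a simplified model, not the Geneious implementation: cut positions are given as the number of bases before the cut, and the positions in the example are made up.

```python
def digest_fragments(length, cut_positions, circular=False):
    """Fragment sizes (largest first) after cutting at the given positions."""
    cuts = sorted({p for p in cut_positions if 0 < p < length})
    if not cuts:
        return [length]  # uncut DNA is reported at its full length
    if circular:
        fragments = [b - a for a, b in zip(cuts, cuts[1:])]
        fragments.append(length - cuts[-1] + cuts[0])  # fragment spanning the origin
    else:
        bounds = [0] + cuts + [length]
        fragments = [b - a for a, b in zip(bounds, bounds[1:])]
    return sorted(fragments, reverse=True)


def virtual_gel_lanes(length, sites_by_enzyme, circular=False, single_enzymes=True):
    """One lane per enzyme, or a single lane for the combined digest."""
    if single_enzymes:
        return {enzyme: digest_fragments(length, sites, circular)
                for enzyme, sites in sites_by_enzyme.items()}
    all_cuts = [p for sites in sites_by_enzyme.values() for p in sites]
    return {"+".join(sorted(sites_by_enzyme)): digest_fragments(length, all_cuts, circular)}


sites = {"PvuII": [200, 700], "DraI": [500]}  # hypothetical cut positions
print(virtual_gel_lanes(1000, sites, single_enzymes=True))
print(virtual_gel_lanes(1000, sites, single_enzymes=False))
```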

If multiple sequence documents containing restriction sites are selected (either by bulk-selecting
individual sequence documents, or selecting a sequence list), digests for each sequence are
shown in separate lanes on the gel (a maximum of 50 sequences can be shown).

The fragment sizes of all bands shown on the gel can be viewed in the Fragments table or by
clicking Show/Export Fragments table. This table is only available for sequences with anno-
tated restriction sites.

Sequences which do not have annotated restriction sites are shown in individual lanes on the
gel if they are in separate documents, or in one lane of the gel if they are contained in a
list (for example the results of the Digest into Fragments operation).

By default, sequences will appear on the gel in the order they are displayed in the document
table or sequence list. To change the order of display, simply drag and drop the sequences on
the gel to the desired position.

Note that if uncut circular DNA is selected (for example a plasmid sequence with no restriction
sites), the band on the gel will show the size of the linear fragment, rather than the supercoiled
DNA size. Depending on the size of your DNA and the buffer you use, uncut circular DNA
may migrate differently to linear DNA.

5.8.1 Ladders

Geneious contains a built-in list of common ladders which can be selected from the Ladders
drop down menu to the right of the viewer. To add another ladder, click the ? next to the
Ladders menu, then Edit Ladders File. This brings up a text editor where you can add your
ladder to the existing ladders file.

Figure 5.8: Virtual gel showing the digest of a single sequence annotated with PvuII and DraI
restriction sites. In A the results are shown as though the sequence is digested with both en-
zymes in one reaction; B shows the results of digesting the sequence with each enzyme sepa-
rately.

Ladders must be added in the following format:

Name: BP or KBP: fragment sizes separated by commas

For example:
10bp DNA Ladder : BP : 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 125, 150, 175, 200, 250, 330

Note that fragment sizes must always be listed in base pairs. The designation BP or KBP refers
only to the display (KBP looks better for large numbers over 100 kbp). Once you have added
the ladder you want, click Save to record the changes.
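If you maintain many custom ladders, it can help to check each line before pasting it into the ladders file. The Python sketch below parses one line in the format described above. It is an illustration based only on that description, and the function name parse_ladder_line is hypothetical.

```python
def parse_ladder_line(line):
    """Split 'Name : BP|KBP : size, size, ...' into its three parts."""
    parts = line.rsplit(":", 2)  # split from the right so only the last two colons count
    if len(parts) != 3:
        raise ValueError("expected 'Name: BP or KBP: sizes'")
    name, unit, sizes = (part.strip() for part in parts)
    unit = unit.upper()
    if unit not in ("BP", "KBP"):
        raise ValueError(f"display unit must be BP or KBP, got {unit!r}")
    # Sizes are always given in base pairs, whatever the display unit
    fragment_sizes = [int(size) for size in sizes.split(",") if size.strip()]
    if not name or not fragment_sizes:
        raise ValueError("ladder needs a name and at least one fragment size")
    return {"name": name, "unit": unit, "sizes_bp": fragment_sizes}


example = ("10bp DNA Ladder : BP : 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, "
           "125, 150, 175, 200, 250, 330")
print(parse_ladder_line(example))
```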

5.9 Meta-data

Meta-data is additional information you can add to any of your local documents, for example
sample collection history, organism identification, primers used etc. Meta-data is added in
the Properties view under the Info tab (see Figure 5.9). This tab displays standard properties
of documents such as the name and description, plus any meta-data you have added. Any
meta-data that you add can appear in a column in the document table, and can be treated as a
user-defined field for use in sorting, searching and filtering your documents.

When multiple documents are selected, the Properties view displays all of the fields and meta-
data belonging to the selected documents. When all documents have the same value for a field,
it is displayed in the viewer. If the documents have different values, or some of the selected
documents do not have a value, then the field will show that it represents multiple values.
Changes made to the fields will apply to all selected documents.

You can add meta-data to any of your local documents, including molecular sequences, phy-
logenetic trees and journal articles. You cannot add meta-data to search results from NCBI,
EMBL, etc. until the documents are copied into one of your local folders.

5.9.1 Adding meta-data

To add meta-data to your document, select the Add Meta-Data button on the toolbar and then
choose from the available types. Selecting a meta-data type will create an empty instance of that
type. To fill meta-data values just start typing into the fields. See section 5.9.3 for information
on how to create a new meta-data type.

Figure 5.9: The Properties View



5.9.2 Editing Meta-Data

To edit existing meta-data fields, simply click on the field and enter your data. Some fields may
have constraints (which you can edit in the Edit Meta-Data Types dialog, see section 5.9.4). If the data
you have entered does not conform to the constraints of the field, it will be displayed in red
and you will be shown the field’s constraints in a tooltip.

Tip: To enter a new line in a text field, press Shift+Enter or Ctrl+Enter.

5.9.3 Creating a new Meta-Data type

Geneious Prime does not restrict you to the meta-data types that it comes with. You can create
your own types to store any information you want.

To create a new type, click Edit meta-data types, then click the Create button at the bottom left
of the panel. This creates a new type, with one empty field, and displays it in the panel to the
right.

Note. The Name and Description fields distinguish your meta-data type from other user-
defined types. They do not have any constraints.

Next, you need to decide what values your Meta-Data Type will store by specifying its fields:

Field name. This defines what the field will be called. It will be displayed alongside columns
such as Description and Creation Date in the Documents Table. You can have more than one
Field in a single Meta-Data Type - to add or remove a field from the type, click the + or - buttons
to the right of the field.

Field type. This describes the kind of information that the column contains such as Text, Integer,
and True/False. The full list of choices is shown in figure 5.10.

Constraints. These are limiting factors on the data and are specific to each field type. For ex-
ample, numbers have numerical constraints – is greater than, is less than, is greater or equal
to, and is less or equal to. These can be changed to suit. The constraints for each field can be
viewed by clicking the “View Constraints” button next to the field. This will show a pop-up
menu with the constraints you have chosen (see figure 5.10).
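Conceptually, a field combines a name, a type and a list of constraints that each entered value is checked against. The Python sketch below models this idea. The class MetaDataField and its behaviour are hypothetical and only mirror the numerical constraints named above; it does not describe how Geneious stores meta-data internally.

```python
import operator
from dataclasses import dataclass, field

# The numerical constraints named in the text, mapped to comparison functions
CONSTRAINT_OPS = {
    "is greater than": operator.gt,
    "is less than": operator.lt,
    "is greater or equal to": operator.ge,
    "is less or equal to": operator.le,
}


@dataclass
class MetaDataField:
    name: str
    field_type: type                        # e.g. int for an Integer field
    constraints: list = field(default_factory=list)

    def violations(self, raw_value):
        """Return a description of every constraint the entered value breaks."""
        try:
            value = self.field_type(raw_value)
        except (TypeError, ValueError):
            return [f"not a valid {self.field_type.__name__}"]
        return [f"{op} {limit}" for op, limit in self.constraints
                if not CONSTRAINT_OPS[op](value, limit)]


# Hypothetical field: an integer that must lie in the range 0 to 99
depth = MetaDataField("Sample depth (m)", int,
                      [("is greater or equal to", 0), ("is less than", 100)])
print(depth.violations("42"))
print(depth.violations("150"))
```

An empty list means the value conforms; a non-empty list corresponds to the case where the value would be shown in red with the constraints in a tooltip.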

5.9.4 Editing Meta-Data Types

To edit meta-data types, e.g. by adding and removing fields, click Edit Meta-Data Types. Select
the meta-data type you want to edit, and then add, remove or edit the fields as described in
section 5.9.3.

Figure 5.10: The Edit Meta-Data Types and Constraints windows

5.9.5 Using Meta-Data

The main purpose of meta-data is to add user defined information to Geneious documents.
However, meta-data can be searched for and filtered as well. Also, documents can be sorted
according to meta-data values.

Searching - Once meta-data is added to a document, it is automatically added to the standard
search fields. These are listed under the Advanced Search options in the Document Table (see
section 4.2.2). From then on, you can use them to search your Local Documents. If you have
more than one Field in a meta-data type, they will all appear as searchable fields in the search
criteria.

Filtering - Meta-data values can be used to filter the documents being viewed. To do so, type a
value into the Filter Box in the right hand side of the Toolbar (see section 4.2.3). Only matching
documents will be shown.

Sorting - Any meta-data fields added to documents will also appear as columns in the Docu-
ment Table. These new columns can be used to order the table.
Chapter 6

Parent / Descendant Tracking

Many documents in Geneious Prime are the output of an operation run on a set of input docu-
ments. The input documents of the operation are known as the parents of the output, and the
output documents the descendants (or children) of the input. Those parent documents may
themselves be the descendants of other documents, each with their own parents, and so on. In
many situations it is useful to preserve this hierarchy, so that future alterations, for example the
re-calling of a base, or the addition of a new annotation, can be transferred downstream to the
molecules affected by this change in a parent.

An active link between a child and its parents means that when you modify any of the parent
documents, you will be given the choice of propagating these changes to the child. When
this modification affects a part of the parent involved in creating the child, the change will be
immediately visible in the child. Modifications include things like editing the residues of a
sequence, adding new annotations, or changing the meta-data associated with the document.

Propagating a change to a parent document causes Geneious to rerun every operation that
links that parent actively to one or more child documents, with the altered parent document
(and any other parents) as input. Geneious stores the options that the operation was originally
run with so that it can reproduce the original operation’s conditions exactly, and afterwards
matches up the newly regenerated child documents with any former children, and replaces
their contents where possible.
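The propagation behaviour described above can be pictured with a small conceptual model in Python. This is only a sketch of the idea (stored options, rerunning operations over active links, replacing the contents of existing children). The classes Doc and Operation and the function propagate are hypothetical and do not correspond to any Geneious API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Doc:
    name: str
    content: str


@dataclass
class Operation:
    name: str
    run: Callable[[List[str], dict], List[str]]  # (parent contents, options) -> child contents
    options: dict                                # stored so the operation can be rerun exactly
    parents: List[Doc]
    children: List[Doc] = field(default_factory=list)
    active: bool = True


def propagate(changed, operations):
    """Rerun every operation actively linked to the changed parent, then recurse."""
    for op in operations:
        if op.active and any(parent is changed for parent in op.parents):
            outputs = op.run([p.content for p in op.parents], op.options)
            for child, new_content in zip(op.children, outputs):
                child.content = new_content  # replace contents of the existing child
                propagate(child, operations)


plasmid = Doc("plasmid", "ACGTACGT")
insert = Doc("insert", "GTAC")
flipped = Doc("flipped", "CATG")
ops = [
    Operation("Extract", lambda c, o: [c[0][o["start"]:o["end"]]],
              {"start": 2, "end": 6}, [plasmid], [insert]),
    Operation("Reverse", lambda c, o: [c[0][::-1]], {}, [insert], [flipped]),
]

plasmid.content = "AAAAACGT"  # edit the parent, then propagate downstream
propagate(plasmid, ops)
print(insert.content, flipped.content)
```

Setting active to False on an operation in this model leaves its children untouched, which mirrors how inactive links behave.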

Occasionally, one or more of the parent documents has been altered to a point where an op-
eration can no longer be rerun, or a necessary parent document is inaccessible. In this case,
Geneious will inform you of the failure, and attempt to be as specific as possible about the
cause of the failure (Figure 6.1).

Inactive links do not propagate changes from parent to child. Inactive links are created in two
different ways. Firstly, when you choose not to propagate changes, that active link becomes
temporarily inactive. Secondly, if an operation does not support creation of active links, or was
told not to create them, all links between its parents and children will be permanently inactive.


Figure 6.1: Failure to propagate an Extract PCR product operation due to a missing forward
primer

All operations in Geneious Prime at least create inactive links.

The following operations in Geneious Prime can produce actively linked documents:

• Cloning: Digest into Fragments, Ligate Sequences, Restriction Cloning, Gibson Assembly,
Gateway Cloning, Golden Gate, TOPO Cloning

• Primers: Extract PCR product

• Sequence Viewer: Paste with Active Link from the right click menu (see copy-paste
cloning, section 14.11)

• Sequence Viewer: Extract and Translate

Note: Extract and Translate will not create active links by default. To do so, you must select
the Actively link source and extracted documents checkbox in the relevant dialog (see Figure
6.2), otherwise they will be created with permanently inactive links.

Figure 6.2: Extract dialog with active link checkbox



6.1 Editing Linked Parent Documents

When you make changes to a document that is the parent of another document, you will be
given the opportunity to either propagate the changes to the descendants, deactivate the link
(which can later be reactivated, see Lineage View, Section 6.3), or save the changed document
as a new copy (Figure 6.3). You may also simply back out of this process by choosing to cancel,
which will return you to your unsaved changes. Note that if you choose to deactivate the link,
this dialog will not be displayed upon subsequent saves of the parent document, unless the
link is reactivated again at some future time.

Figure 6.3: Actively Linked Descendants dialog

In order to aid with your decision making, the dialog allows you to view the document’s de-
scendants in a smaller, cut down version of the Lineage View. Pressing the View Descendants
button will bring up this view (Figure 6.4).

Figure 6.4: Descendants view



6.2 Editing Linked Child Documents

Only annotation modifications can be propagated from descendant (child) documents back to
their parents. If you make a non-annotation edit, such as an edit to the sequence itself, you
will immediately be warned that in order to save your changes you will need to deactivate
this link. As with the Actively Linked Descendants view, you will be given the opportunity
to view the document’s lineage. Editing a document that is a descendant of other documents
is usually unintentional; however, in some circumstances you may simply be interested in the
output documents of an operation (not the parent-descendant relationship), and as such you
may hide this dialog (Figure 6.5).

Figure 6.5: Actively linked parents dialog when making a non-annotation modification to a
descendant document

6.2.1 Editing annotations on linked child documents

Annotations on actively linked child documents can be added, deleted or edited without break-
ing links to the parent document. However, there are some limitations to this:

• If residues in the parent document are modified in a way that makes it unclear where the
annotations belong on the child document they might disappear (e.g. the residues in the
parent document are deleted and the changes are propagated to the child).

• If an annotation is deleted from a child document then the same annotation is modified
on the parent document, the annotation will reappear on the child document.

• An annotation which is deleted from a child will remain deleted on the child if the parent
annotation is deleted and an identical annotation recreated on the parent.

• An annotation which is edited on a child and then edited on a parent, will diverge into
two separate annotations.

If you try to add, delete or edit an annotation on a child document, you will be warned that
you must choose whether to keep links, deactivate links or save a copy to keep your changes
(Figure 6.6).

Figure 6.6: Making annotation modifications to a descendant document

If you choose to Continue Annotation Editing and then save your changes to the child doc-
ument you will be asked whether you want to keep the links active, deactivate the links or
save a copy without the parents (Figure 6.7).

Figure 6.7: Saving annotation modifications to a descendant document

Keep active links to parents will allow you to maintain the lineage, but you might lose the
annotations on your child document if the parent document is edited in a way that causes the
annotation on the child to become unclear, as explained above. An active link note (as shown
below) will be added to any annotation added or edited on a child document that retains the
active link to its parent.

Deactivate links to parents will save the annotation changes to your child document, but will
deactivate the links to the parent so that any changes you make to parent documents will no
longer be propagated to the child.

Save as copy without parents will preserve the original file with the links to the parents. The
original linked file will not contain your edited annotations if you choose this option, but the
saved copy will bear your annotation modifications and will no longer be linked to the parent
document.

6.3 The Lineage View

Every document that is linked (actively or otherwise) to another document has a tab called
Lineage above the sequence viewer. The lineage view allows you to view parent-descendant
relationships, manage links, and navigate between documents (Figure 6.8).

Figure 6.8: The Lineage View

All active links appear as green text, whilst inactive links appear as black text and the docu-
ment currently being viewed (and which is the root of the parents tree and the descendants
tree) appears in blue. Each document’s name is displayed along with an icon (similar to the
document table) denoting what type of sequence it is.

Also displayed in the viewer are the operations that generated each set of children, along with
the time at which the operation was run and the type of operation. For supported cloning
operations, the options used when an operation was run can be viewed by clicking the Show
Options button. If preferred, operations can be hidden by unchecking the Show Operations
checkbox, providing a layout which is akin to Vector NTI®. You can also choose to view only
active links by unchecking the Show Inactive Links checkbox. This will hide all inactively
linked documents, as well as those documents’ parents or descendants. This means that you
will only be viewing documents that are directly affected by the one currently being viewed.

You can reactivate temporarily deactivated links from the view by right-clicking (Windows,
Linux) or control-clicking (macOS) on a document and choosing Activate link to parent/child
from the context menu (Figure 6.9). Alternatively you can reactivate links to all children at once
by choosing Show Operations and right- or control-clicking on the operation, then selecting
Reactivate all links for this operation. You may also manually deactivate links in this fashion.

Figure 6.9: Context Menu

Note: reactivating links immediately reruns the operation; depending on the size and type of
the operation, this can be time consuming. Also note that reactivating will cause any unsaved
changes to any direct or indirect descendants to be overwritten, since this involves a complete
recompute from the parent documents. You will be warned about this before Geneious allows
you to reactivate.

To export the currently selected document (highlighted in blue in the view) directly from the
lineage view, use the Export Documents button. Doing so will bring up a dialog (Figure 6.10).
From here you can choose to export parents or descendants only, or both, as well as choose
to export only those documents that are actively linked in the hierarchy. As with unchecking
the “Show Inactive Links” checkbox, unchecking “Inactively linked documents” here means that
the export stops following a branch of the lineage as soon as it finds an inactively linked
parent or descendant (depending on the relevant direction).
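The stopping rule for active-only export can be illustrated with a small Python sketch that walks a lineage in one direction and does not follow inactive links. The data layout (a list of parent, child, active tuples) and the function name collect_lineage are assumptions made for this example, not a description of how Geneious stores lineage.

```python
def collect_lineage(start, links, direction="descendants", include_inactive=True):
    """Walk the lineage from start; optionally stop each branch at an inactive link."""
    found, seen, stack = [], {start}, [start]
    while stack:
        current = stack.pop()
        for parent, child, active in links:
            source, target = (parent, child) if direction == "descendants" else (child, parent)
            if source != current or target in seen:
                continue
            if not active and not include_inactive:
                continue  # nothing beyond an inactive link is collected on this branch
            seen.add(target)
            found.append(target)
            stack.append(target)
    return found


# Hypothetical lineage: plasmid -> digest -> ligation -> pcr, with one inactive link
links = [("plasmid", "digest", True),
         ("digest", "ligation", False),
         ("ligation", "pcr", True)]
print(collect_lineage("plasmid", links))
print(collect_lineage("plasmid", links, include_inactive=False))
print(collect_lineage("pcr", links, direction="parents", include_inactive=False))
```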

The Save as HTML option saves the parent and descendant trees as a bulleted list in html
format, allowing you to open the contents of the lineage view in a web browser.

Figure 6.10: Export Dialog


Chapter 7

RNA, DNA and Protein Structure Viewer

7.1 RNA/DNA secondary structure fold viewer

By default this viewer is only shown when an oligo sequence is selected. If you wish to use
RNA fold on a non-oligo sequence, go to Tools → Preferences → Appearance and Behavior
and enable the option Show DNA/RNA fold view on all sequence. This will show the tab for
any sequence less than 3000 bp. If the selected sequence is DNA, the tab will be labelled DNA
Fold and if it is RNA it will be labelled RNA Fold (Figure 7.1).

The fold prediction is performed by the Vienna package RNAfold tool. Information on the
options for this tool can be found at the following web page:
http://www.tbi.univie.ac.at/~ivo/RNA/RNAfold.html.

The View Options allow you to turn off/on and color the bases, flip the coordinates, highlight
the start (blue) and end (red) of the sequence and rotate the model. As with other viewers, you
can zoom in on the model and drag the view around, or use the scrollwheel using the same
keyboard modifiers as the sequence viewer. Selection is synchronized between the sequence
view and the fold view. In addition, when in split view mode, the fold viewer will scroll to the
selected area when zoomed in.

By default, Color by probability is used. Red bases have the highest probability of being
paired with each other in paired regions, or of being unpaired in unpaired regions. Green
represents an intermediate probability and blue the lowest. Color by probability is
only available when using the Partition Function.

Compute Options will rerun RNAfold when you change their settings, so depending on the
size of the sequence there may be a noticeable recompute time.


Figure 7.1: A view of a 16S SSU secondary structure prediction in Geneious Prime

7.2 3D protein structure viewer

For molecular structure documents, such as PDB documents, this displays an interactive three
dimensional view of the structure.

7.2.1 Structure View Manipulation

• Click and drag the mouse to rotate the structure.

• Hold the Alt or Shift key then click and drag to zoom in/out.

• Hold the Ctrl key then right-click and drag to pan, or, if you are using a Mac, click and
hold, press Ctrl and Alt/Option then drag to pan.

7.2.2 Selection Controls

To the right of the structure are controls that let you control the selected part of the structure.

• If the structure you are viewing contains more than one model, the model combo box will
let you choose between them.

• The select button lets you select all, none or the nonselected region of the structure, as
well as by element, group type or secondary structure.

• The highlight selected checkbox lets you select whether to highlight the selected atoms
in the structure view.

• The structure tree shows the atoms in the structure in a tree format. Click on regions in
the tree to select those regions. You can also Shift-click and Ctrl-click to select multiple
regions at once.

• The command box lets you type in arbitrary jmol scripting commands. To see some examples,
select one of the pre-populated options in the box’s drop-down. For a complete description
of the commands you can use, see
http://www.stolaf.edu/academics/chemapps/jmol/docs.

Figure 7.2: A view of a 3D protein structure in Geneious Prime

7.2.3 Display Menu

At the top of the viewer is the display menu. Here you can modify the appearance of the
structure.

• Reset lets you reset the position of the structure, reset the appearance of the structure to
the default, or reset the appearance of the structure to its appearance when it was last
saved.

• Color lets you change the color scheme of the selected region of the molecule.

• Style lets you change the style of the selected region of the molecule, e.g. to spacefill or
cartoon view.

• Atoms lets you hide atoms or change their size in the selected region of the molecule. You
can also choose whether to show hydrogen atoms and atom symbols.

• Bonds lets you hide bonds or change their size in the selected region of the molecule.
Covalent/ionic bonds, hydrogen bonds and disulfide bonds can be affected separately.

• Effects lets you toggle spin, antialiasing, stereo and slabbing effects for the whole molecule.

• Save saves the current appearance of the molecule.


Chapter 8

Working with Annotations

8.1 Viewing, editing and extracting annotations

Annotations are used to describe and visualize features, such as coding regions, restriction sites
and repetitive elements, on sequences and alignments. Annotations can either be annotated
directly on a sequence in the sequence viewer, or they can be grouped logically into tracks. A
track is a collection of one or more annotation types. Tracks are stacked vertically underneath
the sequence in question, with a separate line for each track and its annotations.

An annotation may have one or more properties or qualifiers associated with it. These can
be added at the time an annotation is created, or at a later date by editing the annotation. To
view the properties of a given annotation, mouse over it in the sequence viewer. This will
display a tooltip, listing the Name, Type, Length, Interval and Sequence for that annotation
plus any additional qualifiers (see Figure 8.1).

Figure 8.1: Annotation properties and qualifiers, displayed by mousing over an annotation


8.1.1 Viewing and Customizing Annotations: The Annotations and Tracks tab

If a sequence contains annotations, the annotation types present on the sequence will be
listed in the Annotations and Tracks tab to the right of the sequence viewer denoted by the
arrow icon (see figure 8.2). Annotations that are directly on the sequence are listed first,
followed by annotations on tracks underneath. Tracks with only one annotation type will show
a single listing, whilst tracks with multiple annotations will show a list of the annotation types.

Figure 8.2: The annotations options in the sequence viewer

Individual annotation types can be turned on or off using the checkboxes to the left of the
annotation type, or all annotations can be turned off by unchecking Show Annotations at the
top of the panel. Note that turning an annotation type off does not remove it from the sequence,
it only hides it from view.

Directly beneath the Show Annotations box is a filter text field. Typing a term in this field will
highlight any annotations that contain the entered text in their name or qualifiers. To filter for
a term in a specific field, click the down arrow to the right of the box and choose the field and
term you wish to search for.

To quickly find and move between instances of a particular annotation type on a sequence,
click the small left/right buttons to the right of each annotation type. This will move the
selection in the sequence view to the next or previous instance of that annotation type. This is
useful for navigating large genomes or assemblies.

To customize the way an annotation type is displayed, click on the preview of the annotation
arrow in the Annotations and Tracks tab. This will bring up a popup menu containing the
following options:

• Show only [type] annotations: Turns off all other annotation types and shows only this type.

• Show [type] labels: When this is unchecked, only the annotation arrow is shown on the
sequence and not the annotation name.

• [type] labels >: Allows you to customize which property of the annotation is displayed as
the name.

• Show above/outside sequence: Moves the annotation so that it is sitting above or outside the
sequence, rather than on or below it.

• Edit Color: Allows you to change the color of the annotation arrow.

• Edit all [type] annotations: Allows you to change the name, type or properties of a given
annotation type, or move the annotation to or from a track. This applies the change to all
annotations of this type.

• Delete all [type] annotations: Deletes all annotations of a particular type.

The popup menu for individual tracks has an additional option Color by / Heatmap. This
will color annotations on that track according to the contents of a qualifier field, enabling the
creation of annotation heatmaps by using a score value (or some other metric) stored in the
qualifier of an annotation.

The way annotations are drawn on the sequence can be further customized in the Advanced
tab of the sequence viewer (see section 5.2.8).

8.1.2 The Annotations Table

The Annotations tab appears above the sequence viewer whenever sequences containing an-
notations are selected. It displays each annotation as a row in a table, with columns corre-
sponding to the qualifiers for the annotations. Selection of annotations is synchronised with
other viewers, such as the sequence viewer and dotplot.

To change what is displayed in the annotations table, use the buttons above the table:

• Types allows you to specify what annotation types are displayed in the table.
The Select One button in the menu is a quick way to view just one type while also select-
ing the relevant columns for that type. Relevant columns are deemed to be ones where at
least one annotation of that type has a value for the column.

• Tracks allows you to specify what tracks are displayed in the table.

• Columns allows control over which columns are visible in the table.

To further filter what is visible in the table, use the filter box at the top right of the table.
Filtering is only done against the currently visible columns for each annotation.

To export the visible rows and columns of your annotations table, click Export table. This
exports the table to a CSV (comma-separated values) file.

The Extract and Translate buttons will create a new document from the selected annotation(s).
Extract extracts the region of the selected annotation to a new document. Translate translates
the nucleotides in the region of the selected annotation into amino acids, using your choice of
reading frame and genetic code, and saves it to a new protein document.

Annotations table functions can also be accessed via a popup menu when right-clicking on
one or more selected annotations in the table. This menu contains options for copying the
selected value, extracting, translating, showing on sequence, editing and deleting the selected
annotations. The show on sequence function in this list will show the selected annotations in
the sequence viewer.

8.1.3 Editing annotations

Annotations can be edited by selecting them either on the sequence or from the annotations
table, and clicking Edit Annotations. This brings up a window where the annotation name,
type, location, properties and intervals can be edited (see Figure 8.3).

To move the annotation onto a track, click the Track option and either choose an existing track,
or type in the name of a new track you want to create. To move the annotation from a track to
the sequence itself, choose No track in this setting.

In the Properties section, properties can be added, edited, removed or moved up and down in
the list by clicking the buttons to the right. The annotation color can also be set in this section
by clicking the color boxes. The Override color sets the color for that particular annotation
only but does not change the color of other annotations of that type. To change the color of all
annotations of that type, click the Color box.

The Intervals section shows where the annotation is located on the sequence. To change the
location, the direction, or mark an annotation as partial, select the interval and click Edit. The
direction of the annotation can be changed by clicking the colored arrow. Check truncated left
end to mark an annotation as partial at the left end (e.g. if a gene is incomplete at the left end
when viewed on the sequence), or truncated right end to denote that the feature is incomplete
at the right end.

Figure 8.3: The Edit Annotations Window

Batch editing annotations

Multiple annotations can be edited at once by selecting the annotations while holding down
the Ctrl/command key, and clicking Edit Annotations. Only properties of the annotations can
be edited in batch mode, not intervals.

Where properties contain different values in different annotations, Multiple values will be shown
next to that property. To edit the property so that it is the same for all annotations, just type in
the new value that you want to apply to all the annotations. To apply different values for dif-
ferent annotations you can use the copy icon to the right of the annotation name or qualifier.
This allows you to copy values from another annotation or annotation property according to
the options below:

• Sequence property allows you to copy a property from the sequence containing the anno-
tation.

• Same annotation property allows you to copy another property from the same annotation.

• Another annotation (in intersecting location) allows you to copy the annotation property
from another annotation on the same sequence with intersecting intervals (for example
copying a property from a CDS annotation to the Gene annotation for that gene on the
same sequence).

• Another annotation (in the same location) allows you to copy the annotation property from
another annotation on the same sequence with the same intervals and direction.

Figure 8.4 shows an example where a new "product" property is added to all selected annotations,
and the value is taken from the annotation name.

Figure 8.4: Batch editing annotations

8.1.4 Extracting Annotations

To extract an annotation to a separate document, select it either on the sequence or in the an-
notations table and click Extract. If you want to actively link the extracted annotation to
the source document (so that changes on the source document are propagated to the extracted
document), check Actively link source and extracted documents.

If the annotation you are extracting contains multiple intervals, the intervals can be concate-
nated into a single sequence. If this option is not selected, then each interval in the annotation
will be extracted to a separate sequence and grouped into a list.
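
The two outcomes described above for a multi-interval annotation can be illustrated with a minimal Python sketch. This is not Geneious code: the function name `extract_intervals` is hypothetical, intervals are assumed to be 1-based and inclusive, and annotation direction is ignored for simplicity.

```python
def extract_intervals(sequence, intervals, concatenate=True):
    """Extract 1-based inclusive (start, end) intervals from a sequence string.

    Hypothetical helper: returns a single concatenated sequence when
    concatenate is True, otherwise one sequence per interval.
    """
    pieces = [sequence[start - 1:end] for start, end in intervals]
    if concatenate:
        return ["".join(pieces)]
    return pieces


seq = "AAACCCGGGTTT"
joined = extract_intervals(seq, [(1, 3), (7, 9)])                       # one sequence
separate = extract_intervals(seq, [(1, 3), (7, 9)], concatenate=False)  # a list of two
```

With concatenation the two intervals become one sequence; without it, each interval is returned separately, mirroring the sequence list described above.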

Bulk extraction

Bulk extraction of annotations can be done in two ways:

1. Select all the annotations you want to extract, either on the sequence or in the Annotations
table and click Extract. As with multi-interval annotations, you are given the option to
concatenate all the annotations into a single sequence.

2. Go to Extract Annotations under the Tools menu. Using this interface, all annotations
on the selected sequences which match certain criteria (e.g. a particular annotation type
or gene name) can be extracted in bulk, without needing to select the annotations on the
sequence first. To define what annotations to extract, select the value of the annotation
type or property (qualifier) that you want to extract by in the chooser (see Figure 8.5). To
set more than one criterion, click the + button to add an additional row of options, and
choose to either Match all... or Match any... of the criteria.

Figure 8.5: The Extract Annotations Interface

What to Extract allows you to set which part of the sequence to extract (e.g. just the
annotated region, or the entire sequence) based on the criteria you have set. To extract
regions of sequence upstream or downstream of the annotated sequence, enter the length
of the additional sequence you want to extract under Extraction Context. Intersecting
Annotations allows you to set what to do with other annotations that don’t match the
criteria, but which overlap with the matched region.
If there are multiple annotations on one sequence which match the criteria, these can be
concatenated into one sequence by checking Concatenate regions within each sequence.

8.2 Adding annotations

Geneious Prime has many functions for adding annotations to sequences. They can be added
manually, imported from external sources, transferred from other sequences, or added as part
of structure or gene prediction steps. Each of these options is described in the sections below.
For more information on viewing and editing annotations, see section 8.1.

8.2.1 Manual creation of annotations

To create an annotation on your sequence or alignment, select the region of the sequence where
you wish to place the annotation and click the Add Annotation button. In the Add Annotation
dialog, enter an annotation name and select an existing type or type a new one. If you wish
to put the annotation on a track rather than directly on the sequence, either choose an existing
track from the drop-down menu, or type in a new track. Expand the Properties section to enter
additional properties for that annotation. In the Interval section you can adjust the position of
the annotation, add an additional interval, or mark the annotation as truncated at the 5′ or 3′
end.

8.2.2 Importing annotations from external sources

BED files, GFF files and VCF files contain annotation information which can be imported into
Geneious Prime. These files often do not contain the sequence itself, so when you import the
file you will be prompted for the reference sequence as shown in Figure 8.6.

Note that if you choose “use a sequence in the selected folder”, or you have a sequence list
or more than one sequence selected and choose “use a sequence in the selected documents”,
then your sequence’s name in Geneious must match the sequence name in the first column of
the BED, GFF or VCF file. If you select a single sequence and choose “use a sequence in the
selected documents” then your annotations will be imported onto that sequence regardless of
whether it matches the sequence name in the file.

For more information on these file formats, see:

• BED format: http://genome.ucsc.edu/FAQ/FAQformat.html#format1

• GFF format: http://www.sanger.ac.uk/resources/software/gff/spec.html

• VCF format: http://samtools.github.io/hts-specs/VCFv4.2.pdf

Figure 8.6: Selecting a reference sequence when importing BED, GFF or VCF files

Note that in BED format the first base is numbered 0 rather than 1, and Geneious accounts for
this when it imports the file so your annotations will be shifted 1 bp to the right compared to
their positions listed in the BED file.
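
As an illustration of the coordinate difference, the sketch below converts a BED line to 1-based inclusive coordinates. It relies on the UCSC BED convention (0-based start, end position not included in the feature) and is not a description of how Geneious performs the import; `bed_to_one_based` and `parse_bed_line` are hypothetical helper names.

```python
def bed_to_one_based(chrom_start, chrom_end):
    """Convert a BED interval (0-based start, exclusive end) to 1-based inclusive."""
    # Only the start moves: the exclusive BED end equals the inclusive 1-based end.
    return chrom_start + 1, chrom_end


def parse_bed_line(line):
    """Parse the first columns of a tab-separated BED line (the name column is optional)."""
    fields = line.rstrip("\n").split("\t")
    name = fields[3] if len(fields) > 3 else ""
    start, end = bed_to_one_based(int(fields[1]), int(fields[2]))
    return {"sequence": fields[0], "start": start, "end": end, "name": name}


record = parse_bed_line("chr1\t0\t100\tfeatureA")
```

Here a BED feature listed as 0 to 100 covers bases 1 to 100 of the sequence named in the first column.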

8.2.3 Copying or transferring annotations from other sequences

Copy to...

You can copy annotations from one sequence to other sequences in the same alignment or
assembly document by right-clicking on the annotation and choosing Annotation → Copy to.
This will give you the option of either transferring the annotation to the consensus sequence,
reference sequence (if there is one), or any of the other sequences in the alignment or assembly.
You can also use the associated options: Copy all x to... to copy all annotations of the currently
selected annotation type on the selected sequence; and Copy all in selected region to... to copy
all annotations in the selected region. In each case, the annotation(s) will be copied across
regardless of the similarity between the sequences.

Transfer Annotations

Using Transfer Annotations, you can copy annotations to the reference and/or consensus se-
quence of an alignment or assembly. This function can be accessed from the Annotate and
Predict menu, or in the Live Annotate and Predict tab in the sequence viewer. To set a se-
quence as a reference sequence Ctrl-click on it and choose Set as reference sequence.

Annotations will only be transferred where the annotated sequence and the reference/consen-
sus sequence have at least the specified minimum similarity. All transferred annotations will
be annotated with a Transferred From qualifier indicating the names of the sequences the an-
notation came from (sorted in order of decreasing similarity), and a Transferred Similarity
qualifier which indicates the percentage similarity of the most similar sequence the annotation
was transferred from.

Annotations of the same type and covering the same interval that would be transferred from
multiple sequences are merged together such that the name of the transferred annotation will
consist of the names of all contributing annotations sorted in order of decreasing similarity.
Similarly, if contributing annotations have different qualifier values, the resulting qualifier
value will consist of all contributing qualifier values sorted in order of decreasing similarity.

The percentage similarity is the sum of the similarity values for each position as a fraction of
the sum of the maximum similarity values of the bases/residues in each position. For example
if one sequence is LLK and the other is LIK using the Blosum62 matrix, L/L scores 4, I/I scores
4, L/I scores 2, K/K scores 5. Therefore the total score is 4+2+5=11 out of a maximum of
4+maximum(4,4)+5=13 for a percentage similarity of 85%. Gaps (if allowed) are scored as the
lowest value from the score matrix (e.g. -4 for Blosum62).
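
The worked example above can be reproduced with a short Python sketch. Only the four Blosum62 entries quoted in the text are included, gaps are not handled, and the function names are illustrative rather than part of Geneious.

```python
# The Blosum62 entries quoted in the example above (not the full matrix).
SCORES = {("L", "L"): 4, ("I", "I"): 4, ("L", "I"): 2, ("K", "K"): 5}


def score(a, b):
    """Look up a symmetric substitution score."""
    return SCORES.get((a, b), SCORES.get((b, a)))


def percent_similarity(seq1, seq2):
    """Sum of per-position scores as a percentage of the maximum achievable sum."""
    total = sum(score(a, b) for a, b in zip(seq1, seq2))
    maximum = sum(max(score(a, a), score(b, b)) for a, b in zip(seq1, seq2))
    return 100.0 * total / maximum


similarity = percent_similarity("LLK", "LIK")  # 11 / 13 of the maximum score
```

Rounded to the nearest whole number, this gives the 85% quoted in the example.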

Translation properties on CDS annotations are not transferred to the destination sequence.
Instead these are renamed to Transferred Translation. The Sequence View calculates and
displays an Automatic Translation property on any CDS annotation without a translation.
However, this is not a real annotation property that is visible when exported. To add a real
Translation property to annotations, use Tools → Workflows → Set CDS Translation Property.

8.2.4 Annotate from Database

Annotate from Database allows you to annotate your sequence with particular genes or motifs
from a custom annotation database. This function uses a BLAST-like algorithm to search for
annotations in the Source folder that match your sequence, by aligning it against the full length
of each annotation. Annotations which match with the given similarity are copied to your
sequence.

To use Annotate from Database, click the Live Annotate and Predict tab to the right of the
viewer, and check the Annotate From... box. Then set the Source folder you wish to use to
annotate your sequences.

The Source folder

The default Source annotation folder is the Reference Features folder which can be found
at the bottom of your Local folder (see Figure 8.7). Within this is a locked Geneious Plasmid
Features folder which contains a curated list of common plasmid features including promoters,
terminators, tags, rep origins and marker genes.

You can also create a customized annotation database from other sequences in your database.
These sequences can be annotated or unannotated nucleotide or protein sequences, such as
reference genomes downloaded from GenBank, lists of peptides, BLAST hits, or your own
previously annotated sequences. These sequences should be placed in their own folder, which
can then be selected as the Source to annotate from. You can set any folder within your database
as the Source folder, but we recommend you place your personal annotation databases in the
Reference Features folder so they are easy to find and access.

Figure 8.7: The Reference Features folder and Annotate from Database settings

Finding and applying matches

You may need to adjust the Similarity slider in order to find matches between your sequence
and the source annotations. This sets the minimum percentage of sites covered by an annota-
tion that must be identical in order to transfer the annotation. Insertions and deletions count
as mismatched sites. Ambiguous matches are counted as partial mismatches. For example, for
nucleotides, N versus A is 0.75 of a mismatch. Similarity is calculated along the full length of
the annotation. For example, if your sequence is only half the length of the annotation, it can
have a maximum similarity of 50%.
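
The scoring rules above can be sketched in Python as follows. Only the "N versus A is 0.75 of a mismatch" value comes from the text; the general rule used here for other ambiguity codes (one minus the shared bases over the combined bases), the use of the number of alignment columns as the denominator, and the function names are assumptions made for illustration.

```python
# IUPAC nucleotide codes and the bases each one represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def mismatch_fraction(a, b):
    """Fraction of a mismatch contributed by one aligned column ('-' is a gap)."""
    if a == "-" or b == "-":
        return 1.0  # insertions and deletions count as mismatched sites
    set_a, set_b = set(IUPAC[a]), set(IUPAC[b])
    # Assumed generalisation: N vs A shares 1 of 4 bases, giving 0.75.
    return 1.0 - len(set_a & set_b) / len(set_a | set_b)


def percent_identity(annotation, target):
    """Percent identity over two equal-length rows of a pairwise alignment."""
    if len(annotation) != len(target):
        raise ValueError("aligned rows must have the same length")
    mismatches = sum(mismatch_fraction(a, b) for a, b in zip(annotation, target))
    return 100.0 * (1.0 - mismatches / len(annotation))


half_covered = percent_identity("ACGTACGT", "ACGT----")
```

In this sketch a target that covers only half of the annotation cannot score above 50%, matching the example in the paragraph above.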

By default, only the Best match annotation will be shown in cases where multiple annotations
of the same type overlap with each other in the same region on the target sequence. The best
match is considered as the annotation with the highest similarity at a position, when comparing
annotations of the same type that overlap in length by the % threshold set in the Advanced
options under Best Match Criteria (default is 75% length overlap). The exception to this is
primer annotations, where all annotations are always annotated.

To turn off this behaviour and annotate all annotations, choose All matching annotations.

Once you are happy with how your annotations look, click Apply to add them to the sequence.
If you only want to add a selected few of the annotations that are previewed on the sequence,
select those annotations you want to add before clicking Apply.

The transferred annotations contain the annotation qualifiers from the original sequence (note
that some of these, such as database references and transferred translation may not be correct
for the target sequence), plus qualifiers detailing the source of the transferred annotation and
the match percentage (See Figure 8.8). The Open Alignment hyperlink allows you to view an
alignment of the target region and matching annotation.

Figure 8.8: Annotation qualifiers on a newly transferred annotation

Advanced options

Figure 8.9: Advanced options for Annotate from Database

Annotating nucleotide from protein sequences

Nucleotide query sequences can be annotated from protein or nucleotide source sequences. If
annotating from protein, the nucleotide query will be translated in all 6 frames for comparison
to the protein sequences. You can disable this option by unchecking Protein sequences under
the Source database advanced options.
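
For readers unfamiliar with six-frame translation, the following Python sketch shows the idea: three reading frames on the forward strand and three on the reverse complement. It assumes the standard genetic code and unambiguous bases, and the function names are illustrative; it is not the Geneious implementation.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq):
    """Return translations keyed by strand and frame, e.g. '+1' or '-3'."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


frames = six_frame_translations("ATGGCCTAA")
```

Each of the six translations can then be compared against the protein sequences in the Source folder.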

Using unannotated sequences in the Source folder

You can optionally annotate from unannotated sequences in your Source folder. Geneious will
treat sequences without any annotations as though they have an annotation of type "misc feature"
which covers the full length of the sequence and is named the same as the sequence name.
If a match is found, this "misc feature" annotation will be transferred to the query sequence.
To turn on this option, check Unannotated sequences (transferred as Misc Feature type
annotations) under the Source database advanced options. When this option is off, unannotated
sequences will be ignored.

Adjusting the search parameters

• Adjust CDS boundaries up to [x] bp to match nearest ORF. Single interval CDS annota-
tions will be adjusted to match open reading frames (ORFs) on the query sequence, if one
is available within the specified distance. A valid ORF begins with a start and ends with
a stop codon, using the start and stop codons defined in the genetic code for the specified
destination sequence (or if none is selected, using the genetic code specified on the anno-
tation, or the source sequence). Gene annotations will also be adjusted when they exactly
match an adjusted CDS annotation. If adjusting a CDS annotation causes its similarity to
drop below the transfer threshold, the unadjusted annotation will be transferred instead.
This option is on by default. Uncheck this option to transfer CDS annotations exactly as
they match, regardless of whether they span a complete ORF.

• Best match criteria. This option sets the percent length overlap threshold for applying
best match annotations.

• Index length. Specifies the size of words (number of consecutive nucleotides or amino
acids) in each sequence to put into an index for quickly finding all other sequences con-
taining the same word. A higher index sequence length is faster but uses more memory
and is less sensitive. An annotation must have a sub-sequence of at least the index se-
quence length which matches perfectly with the target sequence in order to produce a
result. For annotating longer (i.e. gene length) annotations onto a genome we recom-
mend increasing the nucleotide Index Length to the maximum value of 15.
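
The effect of the index length on sensitivity can be seen in a small Python sketch of word indexing. This is a generic illustration of the technique, not the Geneious implementation; `build_index`, `find_seeds` and the example sequences are made up for this purpose.

```python
from collections import defaultdict


def build_index(sequence, word_size):
    """Map every word of length word_size to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(sequence) - word_size + 1):
        index[sequence[i:i + word_size]].append(i)
    return index


def find_seeds(annotation, target, word_size):
    """Return (annotation_pos, target_pos) pairs where a word matches exactly."""
    index = build_index(target, word_size)
    seeds = []
    for i in range(len(annotation) - word_size + 1):
        for j in index.get(annotation[i:i + word_size], []):
            seeds.append((i, j))
    return seeds


target = "TTTTACGTACGGTTTT"
annotation = "ACGTTCGG"  # differs from target[4:12] ("ACGTACGG") at one position

seeds_short = find_seeds(annotation, target, 4)  # one perfect 4-base word is shared
seeds_long = find_seeds(annotation, target, 5)   # no perfect 5-base word is shared
```

With a word size of 4 the annotation is found through a single seed, while a word size of 5 finds nothing because no 5-base stretch matches perfectly. Longer words give fewer candidate positions to examine but can miss more divergent matches.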

8.2.5 Annotate by BLAST

Annotate by BLAST allows you to annotate your nucleotide sequences by running a BLAST
search on ORF, CDS or mRNA annotations on your sequence. This function will find and
extract all the annotations of the type selected, translate them and run blastp against a BLAST
database of your choice. Annotations from BLAST hits which match with the selected similarity
are back translated and transferred onto your sequence.

To use Annotate by BLAST, you must firstly annotate your sequences with ORF, CDS or mRNA
annotations. ORF annotations can be added using Find ORFs under the Annotate and Predict
menu, or by using the Glimmer plugin for predicting bacterial genes. CDS and/or mRNA
annotations can be added with a gene prediction tool, such as Augustus (available as a plugin).

Then select your sequence and go to Annotate and Predict → Annotate by BLAST.

Figure 8.10: Annotate by BLAST options

In the top panel of the Annotate by BLAST options, select the genetic code for your sequence,
and the type of annotations on your sequence (ORF, CDS or mRNA) (see Figure 8.10).

In the BLAST Options panel, select the database you wish to BLAST against. Note that as blastp
is used, only amino acid databases can be selected.

If you have a large number of annotations we suggest using a custom BLAST database rather
than blasting to NCBI, as large searches to NCBI can be extremely slow. See section 16.4 for in-
structions on setting up and using custom BLAST. Be aware that the larger the BLAST database,
the slower the search will be.

You may need to adjust the Similarity slider in order to find matches between your translated
annotations and the BLAST hits. This sets the minimum percentage of sites covered by an anno-
tation that must be identical in order to transfer the annotation. Insertions and deletions count
as mismatched sites. Ambiguous matches are counted as partial mismatches. For example, for
nucleotides, N versus A is 0.75 of a mismatch. Similarity is calculated along the full length of
the annotation. For example, if your sequence is only half the length of the annotation, it can
have a maximum similarity of 50%.

Under More Options you can set other BLAST parameters, such as E-value thresholds, or the
number of CPUs for custom BLAST.

Once the BLAST searches have finished, annotations from the BLAST hits will be back-translated
and transferred to your original sequence. Note that if you have chosen to return multiple hits,
and the hits cover the same region of sequence, only the closest match is annotated. The trans-
ferred annotations will contain the annotation qualifiers from the original nucleotide sequence
plus qualifiers detailing the source of the transferred annotation and the match percentage (see
Figure 8.11).

Figure 8.11: Results of Annotate by BLAST, showing the original ORF annotations used for
BLAST in orange, and annotations added by BLAST.

8.2.6 Annotation of sequence features using EMBOSS tools

The EMBOSS tools which were available under the Annotate and Predict menu in previous
Geneious versions are now available as separate plugins. EMBOSS protein analysis includes
antigenic to predict antigenic regions, sigcleave to predict signal cleavage sites, and garnier
to predict secondary structures. The EMBOSS Nucleotide Analysis plugin includes tfscan to
search for transcription factors, and tcode which provides a protein coding prediction graph.

When these plugins are installed, the protein coding prediction graph will be available under
the Graphs tab, and the other options will show up under the Annotate and Predict menu.
Further information on these applications is available at http://emboss.open-bio.org/

8.3 Compare Annotations

Geneious Prime can compare annotations across up to 3 annotation tracks or documents, high-
lighting annotations which are common or unique depending on which criteria you choose.

To use this function, select the sequence or sequences containing annotations you wish to com-
pare, and go to Annotate and Predict → Compare Annotations. In the Annotation Types panel,
select the annotations you wish to compare. The default setting is for pairwise comparison; if
you wish to do three-way comparison select Set C. Choose the annotation type and location
for each set, and the type of comparison.

Comparison options are:

• Names must match: The names of the annotations must be the same. When allowing
partial matches for polymorphism annotations, just the name of the matching region of
the annotation must match.
• All properties must match: All properties in addition to the name must be the same to
be considered a match.
• Allow intervals to partially match: When one annotation partially overlaps another, you
can choose to return either Partial Annotations or Full Annotations. Partial Annotations
will return annotations spanning or excluding only the matching region. Full Annota-
tions will return annotations spanning the entire length of the applicable source annota-
tion. Uncheck this option if you only want to return annotations which match across the
full length of both sets.

Results

A new annotation track will be created showing the results of the comparison. The results
panel in the compare annotations set up allows you to choose which comparison to return:

• A-B-C returns annotations found only in set A
• B-A-C returns annotations found only in set B
• C-A-B returns annotations found only in set C
• A&B&C returns annotations common to all sets. For a pairwise comparison, when the
annotation is common to both sets the result will have the name and type of annotation
from set A but share properties from both sets if selected. For a three-way comparison
properties from set A will be used.

More than one of these options can be selected at once, either by checking the box next to
the option, or clicking the appropriate section of the Venn diagram to the right. Each result
comparison is displayed on a separate track on the original sequence, and a preview of these
tracks is given in the Example panel.
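
The result options correspond to ordinary set operations, as the Python sketch below illustrates. It assumes annotations match only when both name and interval are identical (no partial matches), represents each annotation as a (name, start, end) tuple, and uses made-up polymorphism data; `compare_sets` is a hypothetical helper, not a Geneious function.

```python
def compare_sets(a, b, c=None):
    """Return the comparison results as sets, keyed by the option names above."""
    a, b = set(a), set(b)
    if c is None:
        # Pairwise comparison
        return {"A-B": a - b, "B-A": b - a, "A&B": a & b}
    c = set(c)
    return {
        "A-B-C": a - b - c,
        "B-A-C": b - a - c,
        "C-A-B": c - a - b,
        "A&B&C": a & b & c,
    }


# Hypothetical polymorphism annotations: (name, start, end)
parent_a = {("SNP1", 10, 10), ("SNP2", 25, 25)}
parent_b = {("SNP2", 25, 25), ("SNP3", 40, 40)}
child = {("SNP2", 25, 25), ("SNP4", 77, 77)}

results = compare_sets(parent_a, parent_b, child)
```

Here `results["C-A-B"]` holds the annotation found only in the third set, and `results["A&B&C"]` the one shared by all three.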

Example 1 - finding polymorphisms within a gene or feature (e.g. CDS, restriction site)

To return a track containing polymorphism annotations that are within another gene or fea-
ture, such as a coding sequence, select the Polymorphism annotation type for Set A, and choose
which track it is on (or select anywhere if there are no tracks, or you want to include polymor-
phisms on any track). Then choose the other annotation type (e.g. CDS) for Set B. Uncheck
Names must match as in general polymorphism annotation names do not match those of other
annotation types, and check Allow intervals to partially match and produce partial annota-
tions. Under Results, choose to return A & B (annotations common to both sets). This will
return a new track containing annotations of the type in Set A (polymorphisms) that are con-
tained within Set B (CDS) annotations. See Figure 8.12.

Example 2 - finding polymorphisms in a child which are not present in either parent

To return polymorphisms which are unique to the offspring, set Annotation Type as Polymor-
phism for Sets A, B and C, and set A and B as the parent tracks, and C as the child track, as
in Figure 8.13. Check Names must match and Allow intervals to partially match and pro-
duce partial annotations. Choose C-A-B in the results display. This will return a new track
containing annotations found in Set C (child), but not in Sets A and B (parents).

Figure 8.12: Example 1: Compare annotations setup (above) and results (below) for finding
polymorphisms within a CDS

Figure 8.13: Example 2: Compare annotations setup (above) and results (below) for finding
polymorphisms in a child which are not present in either parent
Chapter 9

Sequence Alignments

9.1 Dotplots

A dotplot compares two sequences against each other and helps identify similar regions. Using
this tool, it can be determined whether a similarity between the two sequences is global (present
from start to end) or local (present in patches).

To view a dotplot, select two nucleotide or protein sequences in the Document Table and
select Dotplot in the tab above the sequence viewer (Figure 9.1). If a single nucleotide or protein
sequence is selected then the dotplot tab will not be shown. If you wish to view a dotplot
showing a comparison of the sequence to itself, go to Tools → Appearance and Behavior
and check the option to Show Dotplot view on single sequences (compare to self). A Dotplot
(Self) Tab will then be visible in the Document viewer pane.

The Geneious dotplot offers two different comparison engines based on the EMBOSS dottup
and dotmatcher programs. You can choose which program to use by setting the sensitivity
under Data Source, the panel to the right of the dot plot. The Low Sensitivity/Fast setting uses
dottup, and the High Sensitivity/Slow setting uses dotmatcher. More information on these
programs can be found by going to http://emboss.sourceforge.net.
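
To give a feel for how a word-match dotplot is built, the Python sketch below marks every position where a word from one sequence exactly matches a word from the other. It is a simplified illustration of exact word matching in general, not the EMBOSS or Geneious code, and the word size of 3 is an arbitrary choice.

```python
def dotplot_rows(seq_x, seq_y, word_size):
    """Return a text dotplot: one row per position in seq_y, '*' marks a word match."""
    rows = []
    for j in range(len(seq_y) - word_size + 1):
        row = ""
        for i in range(len(seq_x) - word_size + 1):
            matches = seq_x[i:i + word_size] == seq_y[j:j + word_size]
            row += "*" if matches else "."
        rows.append(row)
    return rows


rows = dotplot_rows("GATTACA", "GATTACA", 3)
for row in rows:
    print(row)
```

Comparing a sequence with itself produces an unbroken diagonal from top-left to bottom-right.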

The dotplot is drawn from top-left to bottom-right. The Minimap in the panel to the right
of the viewer aids navigation of large dotplots by showing the overall comparison and a box
indicating where the dotplot window sits.

The Colors for the Dotplot can be selected at the top of the settings panel. The Classic scheme
will color the dot plot lines according to the length of the match, from blue for short matches,
to red for matches over 100 bp long.

For nucleotide comparisons, the reverse complement can also be viewed, where matches with
one of the sequences reverse complemented are displayed. These matches are shown by lines
running from the bottom left to top right. When a pairwise alignment is selected, the path that
the alignment takes through the dot plot can be displayed by checking Pairwise alignment
path. This is shown as a light blue line running through the dot plot. Both of these options can
be found under the Display section.

Figure 9.1: A view of dotplot of two sequences in Geneious Prime

Interpreting a Dotplot

• Each axis of the plot represents a sequence.

• A long, largely continuous, diagonal indicates that the sequences are related along their
entire length.

• Sequences with some limited regions of similarity will display short stretches of diagonal
lines.

• Diagonals on either side of the main diagonal indicate repeat regions caused by duplica-
tion.

• A random scattering of dots reflects a lack of significant similarity. These dots are caused
by short sub-sequences that match by chance alone.

For more information on dotplots, refer to the paper by Maizel and Lenk (1981).

9.2 Sequence Alignments

Over evolutionary time, related DNA or amino acid sequences diverge through the accumula-
tion of mutation events such as nucleotide or amino acid substitutions, insertions and deletions.

A sequence alignment is an attempt to determine regions of homology in a set of sequences. It
consists of a table with one sequence per row, and with each column containing homologous
residues from the different sequences, e.g. residues that are thought to have evolved from a
common ancestral nucleotide/amino acid. If it is thought that the ancestral nucleotide/amino
acid got lost on the evolutionary path to one descendant sequence, this sequence will show a
special gap character “–” in that alignment column.

9.2.1 Pairwise sequence alignments

There are two types of pairwise alignments: local and global alignments.

A local alignment is an alignment of two sub-regions of a pair of sequences. This type of
alignment is appropriate when aligning two segments of genomic DNA that may have local
regions of similarity embedded in a background of non-homologous sequence.

A global alignment is a sequence alignment over the entire length of two or more nucleic acid
or protein sequences. In a global alignment, the sequences are assumed to be homologous
along their entire length.

Scoring systems in pairwise alignments

In order to align a pair of sequences, a scoring system is required to score matches and
mismatches. The scoring system can be as simple as “+1” for a match and “-1” for a mismatch
between the pair of sequences at any given site of comparison. However, substitutions,
insertions and deletions occur at different rates over evolutionary time. This variation in rates
is the result of a large number of factors, including the mutation process, genetic drift and
natural selection. For protein sequences, the relative rates of different substitutions can be
empirically determined by comparing a large number of related sequences. These empirical
measurements can then form the basis of a scoring system for aligning subsequent sequences.
Many scoring systems have been developed in this way. These matrices incorporate the
evolutionary preferences for certain substitutions over other kinds of substitutions in the form
of log-odds scores. Popular matrices used for protein alignments are BLOSUM and PAM¹ matrices.

Note: The BLOSUM and PAM matrices are substitution matrices. The number of a BLOSUM
matrix indicates the threshold (%) similarity between the sequences originally used to create the
matrix. BLOSUM matrices with higher numbers are more suitable for aligning closely related
sequences. For PAM, the lower numbered tables are for closely related sequences and higher
numbered PAMs are for more distant groups.

When aligning protein sequences in Geneious, a number of BLOSUM and PAM matrices are
available.

Algorithms for pairwise alignments

Once a scoring system has been chosen, we need an algorithm to find the optimal alignment of
two sequences. This is done by inserting gaps in order to maximize the alignment score. If the
sequences are related along their entire sequence, a global alignment is appropriate. However,
if the relatedness of the sequences is unknown or they are expected to share only small regions
of similarity (such as a common domain), then a local alignment is more appropriate.

An efficient algorithm for global alignment was described by Needleman and Wunsch 1970,
and their algorithm was later extended by Gotoh 1982 to model gaps more accurately. For local
alignments, the Smith-Waterman algorithm is the most commonly used. See the references at
the links provided for further information on these algorithms.
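As an illustration of the dynamic programming idea behind global alignment, the following is a minimal sketch in the style of Needleman and Wunsch. It is not the Geneious aligner: it assumes a simple +1/-1 scoring system and a single linear gap cost (the Gotoh extension for separate gap open and gap extension costs is not included), and the function name and default values are chosen for this example only.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap cost (illustrative defaults).

    Returns (score, aligned_a, aligned_b).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Leading gaps: the first row and column cost one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Fill the table: each cell is the best of diagonal, up and left moves.
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))


best, row_a, row_b = needleman_wunsch("ACGT", "AGT")
print(best)
print(row_a)
print(row_b)
```

A local (Smith-Waterman style) variant would additionally clamp each cell at zero and start the traceback from the highest-scoring cell rather than the corner.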

Pairwise alignment with the Geneious aligner

Like a dot plot, a pairwise alignment is a comparison between two sequences with the aim of
identifying which regions of two sequences are related by common ancestry and which regions
of the sequences have been subjected to insertions, deletions, and substitutions.

To run a pairwise alignment using the Geneious aligner, select the two sequences you wish to
align and choose Align/Assemble → Pairwise align.... The options available for the alignment
cost matrix will depend on the kind of sequence.

• Protein sequences have a choice of PAM and BLOSUM matrices.

• Nucleotide sequences have choices for a pair of match/mismatch costs. Some scores
distinguish between two types of mismatches: transition and transversion. Transitions
(A ↔ G, C ↔ T) generally occur more frequently than transversions. Differences in the
ratio of transitions to transversions result in various models of substitution. When
applicable, Geneious indicates the target sequence similarity for the alignment scores, i.e.
the amount of similarity between the sequences for which those scores are optimal.

• Both protein and nucleotide pairwise alignments have choices for gap open / gap extension
penalties/costs. Unlike many alignment programs, these values are not restricted to
integers in Geneious.

Figure 9.2: Options for nucleotide pairwise alignment

¹ M.O. Dayhoff (ed.), Atlas of protein sequence and structure, vol. 5, National Biomedical
Research Foundation, Washington DC, 1978

The score of a pairwise alignment is:

matchCount × matchCost + mismatchCount × mismatchCost

For each gap of length n, a score of gapOpenPenalty + (n − 1) × gapExtensionPenalty is
subtracted from this.

Where

• gapOpenPenalty = The “gap open penalty” setting in Geneious.

• gapExtensionPenalty = The “gap extension penalty” setting in Geneious.

• matchCost = The first number in the Geneious cost matrix.

• mismatchCost = The second number in the Geneious cost matrix.

• matchCount = The number of matching residues in the alignment.

• mismatchCount = The number of mismatched residues in the alignment.
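The score calculation described above can be sketched as follows. The function name alignment_score and the example cost values are assumptions for illustration; the sketch penalizes every gap, so it does not model the free end gaps option, and it assumes the mismatch cost is entered as a negative number.

```python
def alignment_score(row_a, row_b, match_cost, mismatch_cost,
                    gap_open_penalty, gap_extension_penalty):
    """Score two aligned rows (equal length, '-' for gaps).

    Each gap of length n subtracts
    gap_open_penalty + (n - 1) * gap_extension_penalty.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    score = 0.0
    previous_gap = None  # which row the previous column had a gap in, if any
    for x, y in zip(row_a, row_b):
        gap_in = "a" if x == "-" else "b" if y == "-" else None
        if gap_in is None:
            score += match_cost if x == y else mismatch_cost
        elif gap_in == previous_gap:
            score -= gap_extension_penalty  # continuing an existing gap
        else:
            score -= gap_open_penalty       # first position of a new gap
        previous_gap = gap_in
    return score


# 4 matches and one gap of length 2, with non-integer gap penalties:
# 4 * 5.0 - (12.5 + 1 * 3.25) = 4.25
print(alignment_score("ACGTTA", "AC--TA", 5.0, -4.0, 12.5, 3.25))
```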

When doing a Global alignment with free end gaps, gaps at either end of the alignment are not
penalized when determining the optimal alignment. This is especially useful if you are aligning
sequence fragments that overlap slightly in their starting and ending positions, e.g. when
using two slightly different primer pairs to extract related sequence fragments from different
samples. You can also do a Local Alignment if you want to allow free end overlaps, rather
than just free end gaps in one alignment.

9.2.2 Multiple sequence alignments

A multiple sequence alignment is a comparison of multiple related DNA or amino acid se-
quences. A multiple sequence alignment can be used for many purposes including inferring
the presence of ancestral relationships between the sequences. It should be noted that protein
sequences that are structurally very similar can be evolutionarily distant. This is referred to
as distant homology. While handling protein sequences, it is important to be able to tell what
a multiple sequence alignment means – both structurally and evolutionarily. It is not always
possible to clearly identify structurally or evolutionarily homologous positions and create a
single “correct” multiple sequence alignment (Durbin et al. 1998).

Multiple sequence alignments can be done by hand, but this requires expert knowledge of
molecular sequence evolution and experience in the field. Hence the need for automatic
multiple sequence alignments based on objective criteria. One way to score such an alignment
would be to use a probabilistic model of sequence evolution and select the alignment that is
most probable given the model of evolution. While this is an attractive option, no efficient
algorithms for doing this are currently available. However, a number of useful heuristic
algorithms for multiple sequence alignment do exist.

Progressive pairwise alignment methods

The most popular and time-efficient method of multiple sequence alignment is progressive
pairwise alignment. The idea is very simple. At each step, a pairwise alignment is performed.
In the first step, two sequences are selected and aligned. The pairwise alignment is added to the
mix and the two sequences are removed. In subsequent steps, one of three things can happen:

• Another pair of sequences is aligned

• A sequence is aligned with one of the intermediate alignments

• A pair of intermediate alignments is aligned

This process is repeated until a single alignment containing all of the sequences remains. Feng
& Doolittle were the first to describe progressive pairwise alignment. Their algorithm used a
guide tree to choose which pair of sequences/alignments to align at each step. Many variations
of the progressive pairwise alignment algorithm exist, including the one used in the popular
alignment software ClustalX.

Multiple sequence alignment using the Geneious aligner

To run a multiple alignment using the Geneious aligner, select all the sequences you wish
to align and click Align/Assemble → Multiple align.... Select Geneious as the alignment
algorithm (Figure 9.3). The Geneious multiple alignment algorithm uses progressive pairwise
alignment. The neighbor-joining method of tree building is used to create the guide tree.

Figure 9.3: The multiple alignment window

As progressive pairwise alignment proceeds via a series of pairwise alignments, this function
has all the standard pairwise alignment options, plus the option of refining the multiple se-
quence alignment once it is done. “Refining” an alignment involves removing sequences from
the alignment one at a time, and then realigning the removed sequence to a “profile” of the
remaining sequences. The number of times each sequence is re-aligned is determined by the
refinement iterations option in the multiple alignment window. The resulting alignment is
placed in the folder containing the original sequences.

A profile is a matrix of numbers representing the proportion of symbols (nucleotide or amino
acid) at each position in an alignment. This can then be pairwise aligned to another sequence
or alignment profile. When pairwise aligning profiles, mismatch costs are weighted in proportion
to the fraction of mismatching bases, and gap introduction and gap extension costs are
proportionally reduced at sites where the other profile contains some gaps.
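As a rough illustration of what a profile contains, the sketch below computes the proportion of each symbol in every column of a small nucleotide alignment. The function name alignment_profile, the alphabet and the choice to track gaps as a symbol are assumptions for this example; the internal representation used by Geneious may differ.

```python
from collections import Counter


def alignment_profile(rows, alphabet="ACGT-"):
    """Return one dict per column mapping each symbol to its proportion."""
    profile = []
    for column in zip(*rows):
        counts = Counter(column)
        profile.append({symbol: counts[symbol] / len(column) for symbol in alphabet})
    return profile


profile = alignment_profile(["ACG", "A-G", "ATG", "ACG"])
for position, proportions in enumerate(profile, start=1):
    print(position, proportions)
```

In this example the second column is half C, a quarter T and a quarter gap, so a residue aligned against it would be charged only part of the mismatch cost, and gap costs at that site would be reduced.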

In some cases building a guide tree can take a long time since it requires making a pairwise
alignment between each pair of sequences. The build guide tree via alignment option may
speed up this step by taking a different route: it first makes a progressive multiple alignment
using a random ordering, and then uses that alignment to build the guide tree. Note that while
this usually speeds up the process, it may not if the sequences are very distant genetically.

You can also do a multiple alignment via translation and back, as with pairwise alignment (see
section 9.2.6).

9.2.3 Sequence alignment using Clustal Omega

Note: Clustal Omega replaces ClustalW in Geneious Prime 2020 onwards

Clustal Omega is a general purpose multiple sequence alignment (MSA) program for protein
and DNA/RNA. It produces high quality MSAs and uses multiple execution threads, so is
capable of handling datasets of hundreds of thousands of sequences in reasonable time. Clustal
Omega can be run from within Geneious Prime without having to export your sequences and
can also be used with translation alignment (see section 9.2.6).

To perform an alignment using Clustal Omega, select the sequences or alignment you wish to
align, then select Align/Assemble → Multiple Align.... Select Clustal Omega as the alignment
type, and the options available for a Clustal Omega alignment will be displayed (Figure 9.4).

Figure 9.4: Alignment options for Clustal Omega

To have Clustal Omega automatically select the speed and accuracy settings based on the size
of your dataset, open the More Options tab and select Automatically adjust settings based on
number of sequences.

For further information on the Clustal Omega settings and additional command line options,
see the Clustal Omega documentation. The command line input and output for your alignment
can be viewed in the Info tab on the alignment result document in Geneious.

9.2.4 Sequence alignment using MUSCLE

MUSCLE is public domain multiple alignment software for protein and nucleotide sequences.
MUSCLE stands for multiple sequence comparison by log-expectation.

To perform an alignment using MUSCLE, select the sequences or alignment you wish to align
and select Align/Assemble → Multiple Align.... Select MUSCLE as the alignment type, and
the options available for a MUSCLE alignment will be displayed.

For more information on MUSCLE and its options, please refer to the original documentation for
the program: http://www.drive5.com/muscle/muscle.html.

9.2.5 Other sequence alignment plugins for Geneious Prime

MAFFT

MAFFT (Multiple Alignment using Fast Fourier Transform) is a fast multiple alignment pro-
gram suitable for large alignments. To use MAFFT, you must first download the plugin by
going to Plugins under the Tools menu and selecting MAFFT Multiple Alignment from the
list of available plugins. Click the Install button to install it, and then click OK to close the
Plugins window. To run MAFFT, select the sequences or alignment you wish to align, select
the Align/Assemble button from the Toolbar and choose Multiple Alignment. MAFFT should
now be showing as an option for the type of alignment.

For more information on MAFFT and its options, please refer to the original documentation for
the program: http://mafft.cbrc.jp/alignment/software/.

Mauve

The Mauve aligner allows you to construct whole genome multiple alignments in the presence
of large-scale evolutionary events such as rearrangement and inversion. To use Mauve, you
must first download the plugin by going to Plugins under the Tools menu and selecting Mauve
from the list of available plugins. Click the Install button to install it, and then click OK to close
the Plugins window. To run Mauve, select the sequences or alignment you wish to align, select
the Align/Assemble button from the Toolbar and choose Align Whole Genomes.

An alignment produced by Mauve is displayed in the Mauve genome alignment viewer, which
allows you to easily see aligned blocks of sequence and genome rearrangements. Note that this is
not a regular Geneious alignment document and you cannot run downstream analyses such as
tree building from this document. To run downstream analyses you must first extract the local
alignment blocks. To do this, switch to the Alignment View tab above the sequence viewer and
if you have more than one local alignment block, choose which one you wish to extract in the
General tab to the right of the sequence viewer. Then select all the sequences in that alignment
and click Extract. Choose the option to extract the sequences as an alignment, and a separate
alignment document will be created in the document table.

For more information on Mauve and its options, please refer to the original documentation for
the program: http://darlinglab.org/mauve/mauve.html.

LASTZ

LASTZ is designed for pairwise alignments of whole genomes and can efficiently align chro-
mosomal or genomic sequences millions of nucleotides in length. To use LASTZ, you must
first download the plugin by going to Plugins under the Tools menu and selecting LASTZ
from the list of available plugins. Click the Install button to install it, and then click OK to close
the Plugins window. To run LASTZ, select the sequences or alignment you wish to align, select
the Align/Assemble button from the Toolbar and choose Align Whole Genomes.

For more information on LASTZ and its options, please refer to the original documentation for
the program: http://www.bx.psu.edu/~rsharris/lastz/

9.2.6 Translation alignment

In a translation alignment, nucleotide sequences are translated into protein, the alignment is
performed on the protein sequence, and then translated back to nucleotide sequence. Trans-
lation alignments can be performed with any of the alignment algorithms in Geneious Prime,
such as the Geneious aligner, MUSCLE or Clustal Omega.

In the translation alignment options you can set the genetic code and translation frame for the
translation as well as the alignment algorithm you wish to use. All sequences in the alignment
must use the same translation frame. Under More Options you can set the parameters such as
cost matrix, gap open penalty and gap extension penalty for the alignment.
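The final step of a translation alignment, mapping the aligned protein back onto the nucleotides, can be sketched as below. This is an illustrative assumption of how that step might work, not the Geneious implementation: the function name back_translate is hypothetical, translation is assumed to start at the first base (frame 1), and the aligned protein is assumed to have been produced from exactly these codons.

```python
def back_translate(aligned_protein, nucleotides):
    """Thread the original codons onto a gapped protein row.

    Each residue takes the next codon; each protein gap becomes '---'.
    """
    usable = len(nucleotides) - len(nucleotides) % 3  # drop a trailing partial codon
    codons = [nucleotides[i:i + 3] for i in range(0, usable, 3)]
    if len(aligned_protein.replace("-", "")) != len(codons):
        raise ValueError("protein residues and codons do not correspond")
    remaining = iter(codons)
    return "".join("---" if residue == "-" else next(remaining)
                   for residue in aligned_protein)


# Protein rows M-K and MAK, threaded back onto their coding sequences.
print(back_translate("M-K", "ATGAAA"))
print(back_translate("MAK", "ATGGCTAAA"))
```

Because gaps are only ever introduced as whole codons, the reading frame of every sequence is preserved in the resulting nucleotide alignment.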

Figure 9.5: Options for nucleotide translation alignment

9.2.7 Combining alignments and adding sequences to alignments

Consensus Alignment allows you to align two or more alignments together (creating a single
alignment) and to align a new sequence into an existing alignment. Select the sequences or
alignment you wish to align and select the Align/Assemble button from the Toolbar and choose
Multiple Alignment. Consensus alignment allows you to choose which alignment algorithm
to use for aligning the consensus sequences. All of the pairwise and multiple alignment
algorithms are available. The consensus sequence used for each alignment is a 100% consensus
with gaps ignored.

For information on concatenating alignments, see section 5.4.1.

9.3 Alignment viewing and editing

Alignments are displayed in the viewer below the document table, in the same way as indi-
vidual sequences. See section 5.2 for details on the sequence viewer, including basic controls
such as zooming in and out, wrapping sequences, setting colors, and selecting individual or
multiple sequences from an alignment. For a description of alignment statistics available in the
Statistics tab, see section 5.2.9.

To edit an alignment, you must first click the Allow Editing button on the toolbar above the
sequence viewer. Alignments are edited in the same way as for individual sequences - for
details on editing operations and shortcut keys, see section 5.4. If the consensus sequence of
an alignment or assembly is edited, the changes are applied to all sequences in the alignment,
with the exception of the reference sequence.

9.3.1 Highlighting

Identical or variable sites in an alignment can be highlighted by checking the Highlighting
option under the Display tab (Figure 9.6). The drop down menus allow you to choose
what to highlight, and whether the consensus or reference sequence should be used for the
comparison. The options for what to highlight are:

• Agreements: Greys out residues that are not identical to the consensus or reference al-
lowing you to quickly locate conserved sites in the alignments.

• Disagreements: This greys out residues that are identical to the consensus or reference in
that column, allowing you to quickly locate variable sites in the alignment. Ambiguous
disagreements (e.g. G vs N for nucleotides) are also greyed out and not highlighted.

• All Disagreements: This greys out residues that are identical to the consensus or ref-
erence in that column, allowing you to quickly locate variable sites in the alignment.
Ambiguous disagreements (e.g. G vs N for nucleotides) will not be greyed out, so will
appear highlighted.

• Ambiguities: Greys out non-ambiguous residues.

• Gaps: Greys out non-gap positions.

• Transitions/transversions: Greys out residues that are not transitions/transversions
compared to the consensus or reference sequence. When highlighting transitions/transversions,
it is recommended you turn on the ignore gaps consensus option or some residues
may be wrongly highlighted due to the consensus displaying N for sites that contain gaps
and non-gaps.

The Go < > arrows allow you to navigate forwards and backwards between highlighted fea-
tures. Where there are more than two consecutive highlights within a sequence the selection
will only stop on the first and last of these. Using the drop down menu, you can choose to
navigate between highlighted features in any sequence, in the current sequence, in a different
column, or the reference/consensus sequence.

When clicking the arrows:

• hold down alt to go to highlighted features in the consensus or reference sequence only.

• hold down ctrl (command) to go to highlighted features that aren’t in the consensus or
reference sequence.

• hold down ctrl (command) and alt to go to highlighted features in the currently selected
sequence.

Figure 9.6: A multiple alignment with disagreements to the consensus highlighted

These actions may also be accessed directly using specific shortcuts which can be configured
from Tools → Preferences → Keyboard. Using a keyboard shortcut or modifier key will over-
ride what is set in the drop down menu.

Default Shortcuts (mac keys in brackets):

• ctrl+D (command+D): Go to next highlighted feature


• ctrl+shift+D (command+shift+D): Go to previous highlighted feature
• ctrl+alt+D (ctrl+option+command+D): Go to next highlighted feature in consensus or
reference sequence only
• ctrl+alt+shift+D (ctrl+option+shift+command+D): Go to previous highlighted feature in
consensus or reference sequence only

If Use dots is checked, non-highlighted residues are displayed as dots instead of being greyed
out.

9.3.2 Alignment viewer graphs

Alignment graphs can be toggled on or off in the Graphs tab to the right of the viewer.
In addition to the basic graphs available for individual sequences, the following graphs are
available for alignments and assemblies:

Coverage: The height of the graph at each position represents the number of sequences which
have a non-gap character at that position. The coverage graph is made up of three bar graphs
overlaid on each other: a blue graph shows the minimum coverage, a black graph shows the
mean coverage and a yellow graph (underneath the blue and black graphs) shows the max-
imum coverage. The minimum graph is drawn over the top of the mean color graph, but if
necessary the minimum color graph will be reduced in height so that a single pixel of the mean
color graph is always visible at each position. Thus, for sequences which are zoomed in so that
the horizontal width of each site is one pixel or more, then the graph will be shown in blue
with a black line across the top, denoting the coverage at that position. For large alignments
which are zoomed out so that the horizontal width of each site is less than one pixel (i.e. each
pixel represents more than one site in the alignment), all three bars are visible, showing the
minimum, mean and maximum coverage of bases within that pixel (see Figure 9.7).

Figure 9.7: The coverage graph for an assembly, shown zoomed out in the top panel, and
zoomed in below

To highlight regions above or below a particular coverage level, open the Graphs tab and check
Highlight above... or Highlight below... and a bar will appear below the coverage graph
across regions which fit these criteria. The “Highlight above” bar is blue, and the “Highlight
below” bar is yellow. Regions where the alignment or assembly is made up of sequences in
a single direction (e.g. forward or reverse sequences only) can be highlighted by checking
Highlight single strand.

The scale bar to the left of the graph shows minimum and maximum coverage for the entire
alignment or assembly, as well as a tick somewhere in between for the mean coverage.
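The coverage values described above can be sketched in a few lines. The function names coverage and bin_coverage are hypothetical, and the fixed number of sites per pixel is a simplifying assumption; the sketch only shows how per-site coverage and the minimum, mean and maximum within each pixel could be derived.

```python
def coverage(rows):
    """Number of sequences with a non-gap character at each alignment column."""
    return [sum(symbol != "-" for symbol in column) for column in zip(*rows)]


def bin_coverage(per_site, sites_per_pixel):
    """Summarize coverage as (minimum, mean, maximum) for each pixel-wide bin."""
    bins = []
    for start in range(0, len(per_site), sites_per_pixel):
        chunk = per_site[start:start + sites_per_pixel]
        bins.append((min(chunk), sum(chunk) / len(chunk), max(chunk)))
    return bins


per_site = coverage(["ACGT--",
                     "-CGTA-",
                     "--GTAC"])
print(per_site)                   # coverage at each of the 6 columns
print(bin_coverage(per_site, 3))  # zoomed out so that 3 sites share one pixel
```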

Sequence Logo: This displays a sequence logo, where the height of the logo at each site is equal
to the total information at that site, and the height of each symbol in the logo is proportional
to its contribution to the information content. When weight by quality is enabled, the height of
the sequence logo is proportional to the quality score at that site, which is useful for identifying
low quality regions and resolving conflicts. When the sequence is zoomed out far enough such
that the horizontal width of each base is less than one pixel, then the height is the average of the
information over multiple sites. When gaps occur at some sites, the height is scaled down
further to be proportional in height to the number of non-gap residues.

On large contigs (over 100,000 bp long), the sequence logo can’t be efficiently calculated in
regions of over 1000 fold coverage, in which case the sequence logo will display ?.
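For readers unfamiliar with sequence logos, the sketch below shows one common way to compute the total height of a logo column: the maximum possible information for the alphabet minus the observed Shannon entropy, in bits. Whether Geneious applies exactly this formula (for example, with or without a small-sample correction) is not stated here, so treat it as an assumption; the scaling by the fraction of non-gap residues follows the description above. The function name column_information is hypothetical.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=4):
    """Information content (bits) of one alignment column, scaled for gaps."""
    residues = [symbol for symbol in column if symbol != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    entropy = -sum((n / len(residues)) * math.log2(n / len(residues))
                   for n in counts.values())
    information = math.log2(alphabet_size) - entropy
    # Scale down in proportion to the number of non-gap residues.
    return information * len(residues) / len(column)


for column in ("AAAA", "AACC", "ACGT", "AA--"):
    print(column, column_information(column))
```

A fully conserved nucleotide column carries 2 bits, a column split evenly between two bases carries 1 bit, and a column with all four bases equally represented carries none.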

Identity: This displays the identity across all sequences for every position. Green means that
the residue at the position is the same across all sequences. Sites with 30% to under 100%
identity are yellow, and sites with less than 30% identity are red (Figure 9.8).

Figure 9.8: The identity graph for an alignment of nucleotide sequences

9.4 Alignment masking

To mask or strip columns from your alignment, go to Tools → Mask Alignment... (Figure 9.9).
This tool can be used to add “Masked” annotations to sites or to create a copy of your alignment
with some sites removed. Sites can be removed or masked by sequence content (e.g., you can
remove all gaps, or all constant sites), codon position, existing Masked annotations, or using
a NEXUS-style CHARSET. For the purposes of phylogenetic analyses, you can use this tool to
create alignments from which any sites with Masked-type annotations will be removed (see
section 12.2.1).

Figure 9.9: Mask alignment options

Masking sites by sequence content:

Sites can be masked or stripped based on the following sequence content:

• Identical residues: Masks or strips all sites that are constant (those sites containing the
same residues or sites consisting entirely of gaps).

• All gaps: Masks or strips all sites containing only gaps. Sites containing gaps as well as
residues will be preserved.

• Gaps (%): Masks or strips all sites containing at least the specified percentage of gaps
(inclusive). Sites containing less than this percentage of gaps will be preserved.

• Any gaps: Masks or strips all sites containing at least one gap.

• Ambiguities: Masks or strips all sites containing at least one ambiguity.

When deciding whether or not to remove a site, regions annotated as Trimmed will not be
considered.
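The content-based rules above can be sketched as a simple column filter. This is an illustration under stated assumptions rather than the Geneious implementation: the function name sites_matching and the rule names are hypothetical, a column mixing one residue with gaps is treated as not constant, ambiguities are taken to be any symbol other than A, C, G, T, U or a gap, and Trimmed annotations are not modelled.

```python
def sites_matching(rows, rule, gap_percent=50.0):
    """Return the 1-based alignment columns selected by a masking rule."""
    selected = []
    for site, column in enumerate(zip(*rows), start=1):
        gaps = column.count("-")
        if rule == "identical":
            hit = len(set(column)) == 1          # same residue, or all gaps
        elif rule == "all_gaps":
            hit = gaps == len(column)
        elif rule == "gap_percent":
            hit = 100.0 * gaps / len(column) >= gap_percent  # inclusive
        elif rule == "any_gaps":
            hit = gaps > 0
        elif rule == "ambiguities":
            hit = any(symbol not in "ACGTU-" for symbol in column)
        else:
            raise ValueError("unknown rule: " + rule)
        if hit:
            selected.append(site)
    return selected


rows = ["AC-GN",
        "AC-T-",
        "AC-GA"]
for rule in ("identical", "all_gaps", "any_gaps", "ambiguities"):
    print(rule, sites_matching(rows, rule))
```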

Mask sites by codon position:

1st, 2nd and/or 3rd codon positions can be masked or stripped by checking the codon positions
that you do not want to analyze. Note that this option always takes the first base in your
alignment to be codon position one, regardless of any ORF or CDS annotations that may be on
your sequences. Available for nucleotide sequences only.

Remove masked sites:

If you already have an annotation track on your sequence containing Masked annotations, then
you can strip these regions by choosing the “masked sites” option and the track you wish to
use. Only masks applied to the consensus sequence of an alignment are used for stripping.

Mask sites defined by a NEXUS-style CHARSET:

Sites can be masked or stripped based on a NEXUS-style CHARSET entered into the text area.
This must conform to the NEXUS CHARSET specifications (see Maddison, Swofford &
Maddison 1997). Both standard and vector style CHARSETs can be used.

For example, both of the following would mask or strip sites 2, 4, 5, and 6 from an alignment
with 10 sites.

Standard style: 2 4-6; Vector style: 0101110000
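The relationship between the two styles in this example can be checked with a small converter. This is a minimal sketch with a hypothetical function name; it handles only single sites and simple ranges, not the full NEXUS CHARSET syntax.

```python
def charset_to_vector(charset, n_sites):
    """Convert a standard style CHARSET such as '2 4-6' to vector style."""
    selected = [False] * n_sites
    for token in charset.split():
        start, _, end = token.partition("-")
        first = int(start)
        last = int(end) if end else first
        for site in range(first, last + 1):
            selected[site - 1] = True  # CHARSET sites are numbered from 1
    return "".join("1" if chosen else "0" for chosen in selected)


print(charset_to_vector("2 4-6", 10))
```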

9.5 Consensus sequences

To display a consensus sequence on your alignment, check the Consensus option under the
Display tab.

The consensus sequence is displayed above the alignment or assembly, and shows which
residues are conserved (are always the same), and which residues are variable. A consensus is
constructed from the most frequent residues at each site (alignment column), so that the total
fraction of rows represented by the selected residues in that column reaches at least a specified
threshold.

To work with the consensus sequence in a downstream analysis, you must first Extract it from
your alignment. To do this, click on Consensus to select the entire sequence, then click Extract
to extract it to a new sequence document. Alternatively, go to Tools → Generate Consensus
Sequence. This operation allows you to choose the options for how your consensus sequence
is called (as described below), and then saves it to a separate document. If your consensus
sequence contains ‘?’ characters where there are regions with no or low coverage in your
assembly, you can split the consensus sequence at these bases to generate multiple sequences by
checking the option to Split into separate sequences around ‘?’ calls.

Threshold settings

The Threshold determines which base is called in the consensus, and can be set to a percentage,
or by using the quality scores on the reads. IUPAC ambiguity codes (such as R for an A or G
nucleotide) are counted as fractional support for each nucleotide in the ambiguity set (A and G,
in this case), thus two rows with R are counted the same as one row with A and one row with G.
When more than one nucleotide is necessary to reach the desired threshold, this is represented
by the best-fit ambiguity symbol in the consensus; for protein sequences, this will always be an
X.

For example, assume a column contains 6 A’s, 3 G’s and 1 T. If the consensus threshold is set
to 60% or below, then the consensus will be A. If the consensus threshold is set to between 60%
and 90%, then the consensus will be R. If the consensus threshold is set to over 90%, then the
consensus will be D.

In the case of ties, either all or none of the involved residues will be selected. For example,
if the above case instead had 6 A’s, 2 G’s and 2 T’s, then for a consensus threshold of 60% or
below, an A will be called. Above a threshold of 60%, a D will be called.
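The two examples above can be reproduced with a minimal sketch of percentage-based consensus calling. The function name consensus_call is hypothetical, and the sketch assumes a column of unambiguous nucleotides only (no gaps and no ambiguity codes in the input, so the fractional counting described earlier is not modelled).

```python
from collections import Counter
from itertools import groupby

# IUPAC nucleotide codes, keyed by the sorted set of bases they represent.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
         "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N"}


def consensus_call(column, threshold_percent):
    """Call a consensus symbol for one column of A/C/G/T characters."""
    counts = Counter(column)
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: -item[1])
    chosen, covered = [], 0
    # Add residues from most to least frequent; tied residues are added together.
    for count, group in groupby(ranked, key=lambda item: item[1]):
        tied = [base for base, _ in group]
        chosen.extend(tied)
        covered += count * len(tied)
        if covered * 100 >= threshold_percent * total:
            break
    return IUPAC["".join(sorted(chosen))]


column = "AAAAAAGGGT"   # 6 A's, 3 G's and 1 T
for threshold in (60, 75, 95):
    print(threshold, consensus_call(column, threshold))

tied_column = "AAAAAAGGTT"   # 6 A's, 2 G's and 2 T's
for threshold in (60, 75):
    print(threshold, consensus_call(tied_column, threshold))
```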

When the aligned sequences contain quality information in the form of chromatograms or fastq
data, you can select Highest Quality to calculate a majority consensus that takes the relative
residue quality into account. This sums the total quality for each potential base call, and if the
total for a base exceeds 60% of the total quality for all bases, then that base is called. Highest
Quality (Raw) uses the raw chromatogram scores.

When Highest Quality (50% or 60% or 75%) consensus calling is selected, in order to improve
consensus accuracy, Geneious will adjust the value of the raw quality scores that contribute to
consensus calling such that

1. Homopolymer regions are made symmetric using the lower of the two quality scores.
For example if there are 3 consecutive G’s with quality scores of 14, 12, 10, these will be
replaced with quality scores of 10, 12, 10 to reflect that the confidence that the first position
is a G is the same as the confidence that the last position is a G. This is important for an
alignment against another sequence where there are only two G’s where it is equally valid
to position the gap before the first G or after the last G.

2. Low quality gaps (under quality 30) have their quality scores reduced by up to half so that
consensus calling is not biased towards the sequence with a shorter version of a homopolymer.
For example if NA-N (the A has quality 20) is aligned to NAAN (the 2 As have quality 10
because of the lower confidence that this homopolymer is 2 bp long), then the gap which
would normally be assigned a quality score of 20 (assuming the nearby N has a higher
quality), will have its quality score halved so that it will be comparable to the quality
scores in the other sequence.

3. For Sanger traces, gap quality scores take on the minimum quality score of the two
previous and two next homopolymers. For example, in G--AAA---CCT the quality of the 3 gaps
is the minimum quality of the G, the two outer A’s, the C’s and the T. This is because in
an equally good alignment the C’s or A’s could be shifted into this gap, so the gap could
be between the G/A or C/T instead.

4. For Sanger traces, traces are analyzed and the quality score is reduced if the called peak is
not an obvious peak or if there is an alternative peak over 50% of the height of the called
peak.

Assigning quality to the consensus

The Assign Quality setting enables you to map the quality of the sequences onto the consen-
sus. Choose Highest to map the quality of the highest quality base at each column onto the
consensus. Select Total to map the sum of the contributing bases, minus the sum of the non-
contributing bases.

For example: if there are two G’s and three A’s in a column, with the G’s having qualities of 16
and 24, and the A’s having qualities of 40, 42, and 50 respectively, then because (40 + 42 + 50) >
60% of (40 + 42 + 50 + 16 + 24), then an A will be called for the consensus. This consensus A
will have a quality of (40 + 42 + 50) − (16 + 24) = 92 if using Total or 50 if using Highest.

A more complicated example for Highest Quality consensus calling using Total: Assume a
column contains 2 A’s with qualities of 30 and 25, 1 G with quality 30 and 1 T with quality 15.
Because the total qualities of the A’s is 55 out of 100 for the column, this is not higher than the
60% threshold to call an A. With the G included, the total quality is 30 + 25 + 30 = 85, which
is higher than the 60% threshold, so a consensus call of R will be made. The quality assigned
to this R will be the sum of the bases that agree with the consensus call minus the bases that
disagree, which is 30 + 25 + 30 − 15 = 70.
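
The two worked examples above can be reproduced with a short Python sketch. It is illustrative only and not the Geneious implementation: the function name `call_consensus`, the partial IUPAC table, and the rule that bases join the call in descending order of summed quality are assumptions made for this example.

```python
from collections import defaultdict

# Partial IUPAC ambiguity table (assumption: enough for one- and two-base calls;
# any larger set falls back to N in this sketch).
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("AC"): "M",
    frozenset("GT"): "K", frozenset("CG"): "S", frozenset("AT"): "W",
}


def call_consensus(column, threshold=0.6, assign="Total"):
    """column is a list of (base, quality) pairs for one alignment column."""
    totals = defaultdict(int)
    for base, quality in column:
        totals[base] += quality
    column_total = sum(totals.values())

    # Assumption: bases are added in descending order of summed quality
    # until their combined quality exceeds the threshold fraction.
    called, called_quality = set(), 0
    for base, total in sorted(totals.items(), key=lambda item: -item[1]):
        called.add(base)
        called_quality += total
        if called_quality > threshold * column_total:
            break

    symbol = IUPAC.get(frozenset(called), "N")
    if assign == "Total":
        # Contributing bases minus non-contributing bases.
        quality = called_quality - (column_total - called_quality)
    else:  # "Highest"
        quality = max(q for b, q in column if b in called)
    return symbol, quality
```

Calling `call_consensus` on the first column (G 16, G 24, A 40, A 42, A 50) gives an A with quality 92 under Total and 50 under Highest; the second column (A 30, A 25, G 30, T 15) gives an R with quality 70 under Total.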

When reads have mapping qualities (confidence that the entire read is mapped to the correct
location), the mapping quality is combined with the base pair quality to form the quality used
during consensus calling. The log scale qualities are combined as probabilities so a very rough
rule of thumb is the combined quality will be approximately equal to the minimum of the
mapping quality and base call quality, except in cases where the two values are very close in
which case the combined quality will be slightly smaller.
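
The rule of thumb above can be checked numerically. The sketch below assumes that the base call and the read placement fail independently, so the combined call is correct only when both are correct; the manual does not state the exact formula Geneious uses, and the function names are hypothetical.

```python
import math


def phred_to_error(quality):
    """Convert a log-scale (Phred) quality to an error probability."""
    return 10 ** (-quality / 10)


def error_to_phred(probability):
    """Convert an error probability back to a log-scale quality."""
    return -10 * math.log10(probability)


def combined_quality(base_quality, mapping_quality):
    # Assumption: the two error sources are independent.
    p_base = phred_to_error(base_quality)
    p_map = phred_to_error(mapping_quality)
    p_combined = 1 - (1 - p_base) * (1 - p_map)
    return error_to_phred(p_combined)
```

Under this assumption, a base quality of 41 with a mapping quality of 20 combines to just under 20, while two equal values of 20 combine to roughly 17, which matches the "slightly smaller" behaviour described for values that are close together.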

Gaps are treated just like a standard base and are assigned a quality score equal to half the
minimum base call quality on either side of the gap.

For example, suppose a column has a C with quality 41 and mapping quality 3, a gap with adjacent
base calls of quality 41 and 30 with mapping quality 240, and a C with quality 41 and mapping
quality 20. The two C’s will have combined quality scores of approximately 3 and 20 respectively. The gap will have an effective combined quality score of 30/2 = 15. So the consensus
call will be a C with quality 20 + 3 − 15 = 8.

Other settings

Ignore Gaps (alignment documents only): When this is checked, the consensus is calculated
as if each alignment column consisted only of the non-gap characters; otherwise, the gap
character is treated like a normal residue, but mixing a gap with any other residue in
the consensus always produces the total ambiguity symbol (N and X for nucleotides and
amino acids, respectively).

If no coverage call: For alignments or contigs with a reference sequence, this setting can be
used to control what character the consensus sequence should use when the reference
sequence has no coverage. Options available are -, X/N, ? or Ref. A ‘?’ represents an
unknown character, potentially a gap. If Ref is selected, then the consensus is assigned
whatever character the reference sequence has at that position. Note that if any sequence
in the alignment/contig has an internal gap in it, that is still considered valid coverage at
that position, and this setting will not apply.

Call N if Quality below: Enables you to change consensus bases to N’s if the quality is be-
low the threshold that you set. This is particularly useful for exporting sequences to file
formats which do not preserve quality (for example FASTA).

Call Sanger Heterozygotes (chromatogram assemblies only): When a Sanger trace has an alternative peak that is at least as high as the specified percentage of the best peak, that trace
will contribute a heterozygous call to the consensus calculation. Base calls with a quality
score of at least 63 (i.e. those manually edited) will not be analyzed for heterozygotes.
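
As an illustration of the Call N if Quality below setting described above, the following minimal sketch masks low quality consensus bases before export to a format such as FASTA. The function name and the use of a plain list of quality scores are assumptions for this example, not part of Geneious.

```python
def mask_low_quality(consensus, qualities, threshold):
    """Replace every consensus base whose quality is below threshold with N."""
    if len(consensus) != len(qualities):
        raise ValueError("consensus and qualities must be the same length")
    return "".join(
        "N" if quality < threshold else base
        for base, quality in zip(consensus, qualities)
    )
```

With a threshold of 20, a base with quality exactly 20 is kept, because only qualities strictly below the threshold are masked in this sketch.
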
Chapter 10

Assembly and Mapping

Assembly is normally used to align and merge overlapping fragments of a DNA sequence
(typically produced by Sanger or next-generation sequencing (NGS) platforms) to
reconstruct the original sequence. The assembly essentially appears as a multiple sequence
alignment of reads (called the contig document) and the consensus sequence of the contig can
be used for the reconstruction of the original sequence. Where positional information such as
paired-end and mate-pair data is available, contigs can be joined into longer sequences called
scaffolds.

Sequence assembly can refer either to de novo assembly or map to reference. De novo assembly
focuses on the reconstruction of the original sequence by aligning and merging shorter reads,
while map to reference consists of mapping reads to a reference sequence. The first approach is
usually applied to genomes that have not been characterised yet, while the second one usually
focuses on identifying differences from a well-characterised reference sequence.

10.1 Supported sequencing platforms

The Geneious assembler can handle data from Sanger and high-throughput (NGS) sequence
platforms. Because different sequencing platforms have different error profiles and rates, they
may require different assembly parameters. To ensure the optimal parameters are used for
your dataset you can set the sequencing platform under Sequence → Set Read Technology.
This can also be set during the file import process if you are importing fastq files.

The Geneious assembler works well with Sanger, Illumina, Ion Torrent, 454, and PacBio CCS
data. PacBio CLR and Oxford Nanopore data are more problematic for Geneious due to the
very high error rates. Depending on the data quality, you may be able to use these with map
to reference, but you are unlikely to be able to de novo assemble them successfully. When de novo
assembling only Illumina sequencing data, turning off ’Allow gaps’ (in the advanced options)
can sometimes improve performance and results. If any of the input sequences are set as PacBio
or Oxford Nanopore, Geneious will automatically use a maximum gap size of 2 to 3 times the
specified value.

Both the Geneious de novo assembler and mapper will also work with arbitrarily long reads.
For example, you can use high quality contig consensus sequences as input to either of these.

10.2 Read processing

10.2.1 Setting paired reads

To assemble paired read (or mate pair) data, prior to assembly you first need to tell Geneious
the reads are paired. The assembler will then automatically use the paired data unless you
turn off the advanced option to Use paired distances.

Paired reads can be set up either during the fastq import process, or by selecting the document(s) containing the paired reads and going to Set Paired Reads from the Sequence menu.
Depending on your data source, reads could be in parallel sets of sequences, or interlaced, so
you need to select the appropriate format. Geneious will guess and select the appropriate op-
tion based on the data you have selected, so most of the time you can just use the default value
for this. However, you must make sure you select the correct Relative Orientation for your
data. Different sequencing technologies orientate their paired reads differently. All paired read
data will have a known expected distance between each pair. It is important you set this to the
correct value to achieve good results when assembling. If you don’t know what the relative
orientation or expected distance is between the reads you should ask your sequencing data
provider.

When you click ‘OK’, if you chose to pair by parallel lists of sequences a new document contain-
ing the paired reads will be created. If you do this during the fastq import process, Geneious
will only import the paired read file and not the original unpaired files. If you chose to pair an
interlaced list of sequences (or modify settings for some already paired data), the existing list
of sequences will be modified to mark it as paired.
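
The difference between the two layouts mentioned above (parallel sets of sequences versus an interlaced list) can be sketched as follows. This is an illustration only; the function names are hypothetical and reads are represented as plain Python values.

```python
def pair_parallel(forward_reads, reverse_reads):
    """Pair two parallel lists: the i-th forward read goes with the i-th reverse read."""
    if len(forward_reads) != len(reverse_reads):
        raise ValueError("parallel lists must be the same length")
    return list(zip(forward_reads, reverse_reads))


def pair_interlaced(reads):
    """Pair one interlaced list: reads 1 and 2 form a pair, then 3 and 4, and so on."""
    if len(reads) % 2 != 0:
        raise ValueError("an interlaced list must contain an even number of reads")
    return list(zip(reads[0::2], reads[1::2]))
```

Both functions produce the same list of pairs when given the same reads in the corresponding layout, which is why choosing the wrong format option would mate the wrong sequences.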

If you choose to split reads based on the presence of a linker sequence (e.g. for 454 data) the
original sequences will be unmodified and the split reads will be created in a new document.
The default behaviour is to ignore sequences shorter than 4 bp either side of the linker, but this
can be customized from the Edit Linkers option in the paired reads options.

Polonator sequencing machine reads can be split using the Split each read in half option.

10.2.2 Trim Ends

Trimming low quality ends of sequences is normally performed before assembling a contig.
This is because the noise introduced by low quality regions and vector contamination can produce incorrect assemblies.

To trim and filter using the Geneious tools, select the sequences you wish to trim and choose
Annotate and Predict → Trim Ends. This option allows you to trim vectors, primers and poor
quality bases and filter out reads by length. It can also be performed at the assembly step, by
checking the trim sequences option in the assembly set-up. Geneious R9 and above also have a
plugin for trimming using the BBDuk algorithm from the BBTools suite. This can be installed
by going to Tools → Plugins.

Trim Ends can soft or hard trim your sequences. If you wish to soft trim, choose to Annotate
new trimmed regions in the Trim Ends set up. The trimmed sequence will then remain vis-
ible but will be annotated with “Trimmed” annotations. Sequence annotated with a trimmed
annotation is ignored by the assembler when constructing a contig and will not be included in
the consensus sequence calculation (refer to section 10.2.2, Operations that respect soft trims
and Operations that do not respect soft trims for lists of operations handling or ignoring Trim
Ends). So although the trimmed regions are visible, they do not affect the results of the assem-
bly at all. Soft trims can be adjusted as needed, or deleted completely. Dragging the ends of
the trim annotation will make the newly untrimmed sequence visible and part of the consen-
sus (Figure 10.1). If you wish to remove the trimmed sequence completely (hard trim), choose
Remove new trimmed regions from sequences.

Figure 10.1: Click and drag the trims to adjust

If you choose to trim your sequences at the assembly step, the sequences are trimmed and
assembled in one operation and you will not be able to view the trimming before assembly is
performed. However, the trimmed regions will still be available and adjustable after assembly
is complete. If you choose to trim your sequences prior to assembly, select Use Existing Trim
Regions when you set up the assembly.

Trimmed annotations can also be created manually using the annotation editing in the sequence
viewer. If you create annotations of type Trimmed and save them, then Geneious will treat them
the same as ones generated automatically and they will be ignored during assembly. Trimmed
annotations can also be modified in this way before or after assembly.

Trim Ends options

Figure 10.2: Trimming options

• Annotate new trimmed regions: Calculate new trimmed regions and annotate them - the
trimmed regions will be ignored when performing assembly and calculating the consen-
sus sequence.
• Remove new trimmed regions from sequences: Calculate new trimmed regions and
remove them from the sequence(s) completely. This can be undone in the Sequence View
before the sequences are saved.
• Remove existing trimmed regions from sequences: This is only available when there
are already trimmed regions on some of the sequences. This will remove the existing
trimmed regions from the sequences permanently; no new trimmed regions are calcu-
lated.
• Trim vectors: Screens the sequences against UniVec or your own custom BLAST database
to locate any vector contamination and trim it. This uses an implementation similar to
NCBI’s VecScreen to detect contamination (http://www.ncbi.nlm.nih.gov/projects/VecScreen/). Multiple databases can be selected to trim from by clicking the + sign.

• Trim primers: Screens the sequences against primers in your local database.

• Error Probability Limit: Available for chromatogram documents which have quality
(confidence) values. The ends are trimmed using the modified-Mott algorithm (see be-
low) based on these quality values (Richard Mott personal communication).

• Maximum low quality bases: Specifies the maximum number of low quality bases that
can be in the untrimmed region. Low Quality is normally defined as confidence of 20 or
less. This can be adjusted on the Sequencing and Assembly tab of Preferences.

• Maximum Ambiguities: Finds the longest region in the sequence with no more N’s than
the maximum ambiguous bases value and trims what is not in this region. This should
be used when sequences have no quality information attached.

• Trim 5′ End and Trim 3′ End: These can be set to specify trimming of only the 3′ or 5′ end
of the sequence. A minimum amount that must be trimmed from each end can also be
specified.

• Minimum length after trim: If the length of the sequence after trimming is less than the
number specified here then the sequence is discarded. This option is useful for filtering
out sequences that are too short to be useful after trimming.

• Maximum length after trim: If the untrimmed region is longer than the specified limit
then the remainder will be trimmed from the 3′ end of the sequence until it is this length.
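
The Maximum Ambiguities option in the list above amounts to finding the longest stretch of sequence containing at most a given number of N's. One way to do this is a sliding window, sketched below. This is an assumption about a reasonable approach, not a description of the Geneious code, and the function name is hypothetical.

```python
def longest_region_with_max_ambiguities(sequence, max_ambiguities):
    """Return (start, end) of the longest slice with at most max_ambiguities N's.

    Everything outside sequence[start:end] would be trimmed.
    """
    best_start = best_end = 0
    window_start = 0
    n_count = 0
    for i, base in enumerate(sequence):
        if base.upper() == "N":
            n_count += 1
        # Shrink the window from the left until it is valid again.
        while n_count > max_ambiguities:
            if sequence[window_start].upper() == "N":
                n_count -= 1
            window_start += 1
        # Keep the first window found for any given length.
        if i + 1 - window_start > best_end - best_start:
            best_start, best_end = window_start, i + 1
    return best_start, best_end
```

For the sequence NNACGTNACGTACGNN with a maximum of 1 ambiguity, the retained region is ACGTNACGTACG; with a maximum of 0 it is ACGTACG.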

The Modified Mott algorithm

The modified-Mott algorithm for trimming ends based on quality operates as follows:

For each base, it subtracts the base error probability from an error probability cutoff value
(default 0.05) to form the base score. The base error probability is calculated from the quality
score (Q), such that $P(\text{error}) = 10^{-Q/10}$. This means that low quality bases have high error
probabilities and thus may have a negative base score.

E.g. For Q10, P(error)= 0.1, For Q30, P(error)=0.001

So with an error probability cutoff of 0.05, a base with Q10 has a base score of 0.05-0.1= -0.05,
and a base with Q30 would have a base score of 0.05-0.001=0.049.

The trimming algorithm then calculates the running sum of the base score across the sequence.
If the sum drops below zero it is set to zero. The part of the sequence not trimmed is the region
between the first positive value of the running sum and the highest value of the running sum
(i.e. the highest scoring segment of the sequence). Everything before and after this region is
trimmed.
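
The steps above can be sketched in Python as follows. This is a minimal illustration rather than the Geneious source: the function names are hypothetical, and restarting the candidate region after each reset of the running sum to zero is this sketch's reading of "the first positive value of the running sum".

```python
def base_error_probability(quality):
    """P(error) = 10^(-Q/10) for a Phred-style quality score Q."""
    return 10 ** (-quality / 10)


def mott_trim(qualities, cutoff=0.05):
    """Return (start, end) of the region to keep; bases outside it are trimmed."""
    best_sum = 0.0
    best_start = best_end = 0
    running = 0.0
    segment_start = 0
    for i, quality in enumerate(qualities):
        running += cutoff - base_error_probability(quality)
        if running <= 0:
            # The running sum never goes below zero; a new segment can start
            # at the next base.
            running = 0.0
            segment_start = i + 1
        elif running > best_sum:
            # New highest value of the running sum: extend the kept region.
            best_sum = running
            best_start, best_end = segment_start, i + 1
    return best_start, best_end
```

For the quality values 10, 10, 30, 30, 30, 10, 30, 10, 10 and the default cutoff, the kept region is the run of three Q30 bases (positions 2 to 4, counting from 0). The later Q30 base is not included because the Q10 base before it lowers the running sum by more than the Q30 base restores.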

Operations that respect soft trims

The following operations will exclude sequence that has been soft trimmed:

• De novo assembly*
• Map to Reference*
• Multiple and pairwise alignment (all algorithms)
• Find Variants/SNPs
• Remove Chimeric reads
• Calculation of consensus sequence
• Calculation of sequence identity (displayed in statistics tab and identity graph)
• Calculation of coverage (displayed in statistics tab and coverage graph)
• Calculation of confidence mean / quality score outputs. This includes the outputs shown
in the statistics tab, the calculation of HQ%, MQ% and LQ%, and corresponding sequence
bins based on those figures.
• Document fields: Ambiguities (chromatograms), Post-Trim (length of sequence after trim-
ming)

* Only the Geneious assembler supports the use of trimmed annotations. Sequences should be
hard trimmed if using other assembly algorithms, such as SPAdes, Tadpole, Bowtie etc.

Operations that do not respect soft trims

The following operations will include sequence that has been soft trimmed:

• Calculation of nucleotide/amino acid frequencies and molecular weights
• Calculation of sequence lengths (including lengths graph)
• Dotplot
• BBTools operations, including “Trim with BBDuk”, “Merge Paired Reads”, “Remove Du-
plicate Reads”, and “Error correct and normalize”
• Mask Alignments
• Export formats fasta/fastq, Mega, Nexus, Phylip, CSV/TSV

Export formats SAM/BAM, Genbank, GFF, and ACE incorporate the trim information in the
export.

10.2.3 Merging paired reads

If paired reads were sequenced with an insert size shorter than twice the read length then pairs
may overlap with each other, in which case it can be useful to merge each pair into a single
longer read. After setting up paired reads (see section 10.2.1), use Merge Paired Reads... from
the Sequence menu to merge them.
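
The overlap condition above can be expressed as simple arithmetic. The sketch below assumes that both reads of a pair have the same length and that "insert size" means the full fragment length including both reads; the function name is hypothetical.

```python
def expected_overlap(read_length, insert_size):
    """Number of bases shared by the two reads of a pair (0 if they do not overlap)."""
    return max(0, 2 * read_length - insert_size)
```

Under these assumptions, 150 bp reads from a 250 bp fragment overlap by 50 bp, so the merged read would span the full 250 bp, whereas the same reads from a 400 bp fragment do not overlap and cannot be merged.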

This function uses BBMerge from the BBTools suite. For a detailed explanation of any BBMerge
setting, hover the mouse over the setting, or click the help (question mark) button next to the
custom options under More Options.

Alternatively, an experimental plugin for merging paired reads using FLASH is available from
the Geneious website plugins page.

10.2.4 Removing duplicate reads

To remove duplicate reads from NGS datasets, use Remove Duplicate Reads... under the Se-
quence menu. This function runs Dedupe, and will remove duplicate sequences that are either
exact matches, subsequences, or sequences within some percent identity. It can also find over-
lapping sequences and group them into clusters. For a detailed explanation of any Dedupe
setting, hover the mouse over the setting, or click the help (question mark) button next to the
custom options under More Options.

10.2.5 Removing chimeric reads

To remove chimeric reads from NGS datasets, select the sequence list containing your reads
and go to Sequence → Remove Chimeric Reads. This runs UCHIME by Robert Edgar and is
typically used to remove PCR chimeras from amplicon sequencing (e.g. 16S, ITS). The public
domain version of UCHIME is provided with Geneious. If you would prefer to detect chimeric
sequences using USEARCH, which contains a much faster version of the UCHIME algorithm,
you can optionally specify a USEARCH executable instead.

Geneious supports reference mode only, and you must supply the reference database your-
self. This may be either a nucleotide sequence list or a nucleotide alignment in your Geneious
database. Information about common reference databases for 16S rRNA or fungal ITS se-
quences is available, along with links to download locations, on the Geneious knowledge base.
When you have imported your preferred database into Geneious, choose this document as the
Reference Database.

If your query sequences are paired, you may need to run Merge paired reads before chimera
detection. When a query sequence list with reads set as paired is selected, Geneious will al-
ways consider both members of a pair to be chimeric if either is identified as such. Note
that UCHIME does not recognize paired reads, therefore by default Geneious will concatenate
paired reads and submit each pair to UCHIME as a single sequence. This should generally be
appropriate for reads that are separated by small gaps. To override this setting, you can check
Run paired reads separately under More Options.

The following options are available for configuration within Geneious should you wish to
optimize the settings for your data. The default settings in Geneious are consistent with the
UCHIME defaults.

• Include reverse complement: UCHIME looks only at the sequences provided in the refer-
ence database. You should check the Include reverse complement box if you would like
Geneious to submit both the reference database sequences and the reverse complement
of each to UCHIME.

• Save chimeric reads: This will save the chimeras that are removed as a separate list.
Ordinarily only those reads identified as non-chimeric would be saved, so choose this
option if you want the chimeric sequences for any subsequent steps or analysis.

• Use USEARCH executable: The USEARCH implementation of UCHIME is also supported. To use it you must first navigate to the USEARCH download page, register for
a licence, and then download USEARCH. Currently Geneious supports USEARCH v8.x.
Once downloaded, check the Use USEARCH executable instead box and specify the location of the file you downloaded.

• Minimum score to report chimera: The minimum score at which a sequence is consid-
ered a chimera. Values from 0.1 to 5.0 are considered reasonable. Lower values increase
sensitivity but may result in more false positives. This may need to be changed as the
weight of a no vote and minimum divergence ratios are changed.

• Weight of a no vote: The UCHIME algorithm uses a voting system when determining the
score of each read. This option specifies the weight of each no vote. Increasing this option
tends to result in lower scores. Decreasing to around 3 or 4 may give better performance
on denoised data.

• Minimum divergence ratio: This option is used to allow some flexibility in what is con-
sidered chimeric, by allowing you to specify the allowed percent divergence between the
query and the closest reference database sequence. The default (0.5%) allows chimeras
that are up to 99.5% similar to a reference sequence. This is useful when you are not
concerned with chimeras that are similar to the parent sequences.

• Run paired reads separately: Tells Geneious not to concatenate paired reads prior to
running UCHIME. This is useful when there is a long insert between members of a pair
and running them as a pair may lead to increased false negatives. Note that Geneious will
consider both members of a pair as chimeric if either is classified as such by UCHIME,
irrespective of whether this option is selected.

• Number of chunks: This option specifies the number of non-overlapping segments (chunks)
that the query sequence is divided into. Each chunk is used to search the reference
database.

• Sequence length: By default UCHIME is designed to operate on sequences between 10
bp and 10,000 bp. This can be altered by changing the Minimum sequence length and
Maximum sequence length under More Options. Altering the valid sequence lengths
may be necessary when reads are paired and concatenated because the new read length
is the sum of both pairs. Similarly, the minimum sequence length needs to be considered
when trimmed reads are present, as Geneious will perform a hard trim before running
UCHIME.

• Custom UCHIME options: Geneious supports sending additional options to UCHIME.
This is done by entering the desired options into the Custom UCHIME options field
found under More Options. You can use any of the options that UCHIME would normally support as long as they are not input/output options and do not overlap with the
options provided by Geneious. It is up to you to ensure these are valid. When using a
custom USEARCH executable, refer to the appropriate user guide for available command
line options. The following options are provided by Geneious:
UCHIME: --input, --db, --uchimeout, --uchimealns, --minh, --xn, --mindiv,
--chunks, --minlen, --maxlen
USEARCH: --uchime_ref, --strand, --minseqlength, --maxseqlength,
--uchimeout, --db, --uchimealns, --minh, --xn, --mindiv, --chunks

10.2.6 Error correction and normalization of reads

Prior to de novo assembly, it can sometimes be useful to error correct the data or to normalize
coverage by discarding reads in regions of high coverage. This functionality is available using
Error Correct & Normalize Reads... from the Sequence menu.

This function uses BBNorm from the BBTools suite, and it requires Java 7 or later in order to
run. For a detailed explanation of any BBNorm setting, hover the mouse over the setting, or
click the help (question mark) button next to the custom options under More Options. For
more information, see the BBNorm page on SeqAnswers.

10.2.7 Splitting multiplex/barcode data

Multiplex or barcode data (e.g. 454 MID data) can be separated using Separate Reads by
Barcode from the Sequence menu. This function copies all sequences matching a given barcode
to a correspondingly named sequence list document.

Default settings are provided for 454 standard and Titanium MID barcodes (with or without
Adaptor B trimming), and Rapid MID barcodes. These settings recognise standard MID sequences provided by 454 and use their names when appropriate.

Figure 10.3: Options for separating reads by barcodes, using a custom barcode set, a fixed
adaptor and a fixed end primer

To enter a custom barcode set, select Custom settings, then under Barcode set choose Edit
barcode sets. Then click Add to add your list of barcodes. To specify fixed sequences either
side of the barcode, enter these in the Adaptor and Linker sections.

If you only want to extract sequences with a single, specific barcode sequence (e.g. a primer),
check Specific barcode and enter the sequence. Alternatively if you do not know your barcode
sequences, you can just enter the length of your barcodes and Geneious will automatically
identify what the barcodes are.

Separate reads by barcodes only sorts by barcodes at the 5′ end of the sequence, but primers,
adapters or barcodes on the 3′ end of the sequence can be trimmed off by checking Trim End
Adaptor/Primer/Barcode. Either enter a specific sequence to trim, or add your 3′ barcode to
your custom barcode set and use the text [END_BARCODE] in the sequence box. Primer
trimming can also be performed after separating by barcodes using Trim Ends (see section
10.2.2).

For further information on splitting barcode data, hover the mouse over any of the settings in
the Separate Reads by Barcode options window.
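
The basic idea of separating reads by a 5′ barcode can be sketched as below. This is a simplified illustration, not the Geneious implementation: it assumes exact matches only, ignores adaptors and linkers, leaves the barcode on the copied read, and uses hypothetical names throughout (including the "unmatched" bin).

```python
from collections import defaultdict


def separate_by_barcode(reads, barcodes):
    """Group reads by the barcode found at their 5' end.

    reads: dict of read name -> sequence
    barcodes: dict of barcode name -> barcode sequence
    Returns a dict of barcode name -> list of (read name, sequence).
    """
    bins = defaultdict(list)
    for name, sequence in reads.items():
        for barcode_name, barcode in barcodes.items():
            if sequence.upper().startswith(barcode.upper()):
                bins[barcode_name].append((name, sequence))
                break
        else:
            # No barcode matched the start of this read.
            bins["unmatched"].append((name, sequence))
    return dict(bins)
```

Each key of the returned dictionary corresponds to one of the "correspondingly named" sequence lists described at the start of this section.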

10.3 De novo assembly

This can be used to assemble a small number of Sanger sequencing reads (i.e. forward and reverse reads of the same sequence), or millions of reads generated by NGS platforms such as Illumina, 454, Ion Torrent and PacBio CCS. To assemble a contig, first select all of the sequences
and/or contigs you wish to assemble in the document table, then click Align/Assemble in the
toolbar and choose De Novo Assemble. The basic options for de novo assembly will then be
displayed.

The options available here are as follows:

• Assemble by (aka Assemble by Name): If you have selected several groups of fragments
which are to be assembled separately, you can specify a delimiter and an index at which
the identifier can be found in all of the names. Sequences are grouped according to the
identifier and each group is assembled separately. If a reference sequence is specified, it
is used for all groups. E.g. for the names A03.1.ab1, A03.2.ab1, B05.1.ab1, B05.2.ab1 etc.,
where “A03” and “B05” are the identifiers, you would choose “Assemble by 1st part of
name, separated by . (full stop)”.

• Use % of data: This option is shown for large datasets and enables you to assemble a
subset of your data, rather than the full dataset. For example, if you enter 20% here, then
the first 20% of reads in a sequence list will be assembled and the rest will be ignored.
This is useful in situations where the full dataset is too large for the size of genome being
assembled.

Figure 10.4: Basic de novo assembly options

• Assembly method: In this section you can choose from the built-in Geneious assembler,
or Tadpole, SPAdes, Velvet, MIRA and CAP3 assemblers if you have these plugins in-
stalled. Click the question mark button next to the method to see a list of the advantages
and disadvantages of each assembler. The Sensitivity setting (Geneious assembler only)
specifies a trade off between the time it takes to assemble and the accuracy of the assem-
bly. Higher sensitivity is likely to result in more reads being assembled.

• Trim Sequences: Select how to trim the ends of the sequences being assembled. See
section 10.2.2.

• Results: Allows you to choose an assembly name and what to return in your results. By
default, only the assembled contigs are saved, but you can also choose to return an assembly report, lists of used or unused reads and the consensus sequences. The assembly
report summarises the assembly statistics and lists which fragments were successfully
assembled and which contig they went in to along with a list of unassembled fragments.
If Save in Subfolder is selected all the results of the assembly will be saved to a new subfolder inside the one containing the fragments. Each subfolder only ever contains the
results of a single assembly, because a new subfolder is created each time the assembly
is run.

• More Options: Under the advanced options you can change the parameters used by
Geneious when aligning fragments together. These are fully documented if you hover
the mouse over them in Geneious. To edit these settings, you must first choose Custom
Sensitivity in the assembly method panel. For sequences which are lower quality or
contain many errors, or are expected to be divergent from one another, you may need to
decrease the minimum overlap identity and maximum mismatches per read, and increase
the maximum gaps allowed per read.
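
The Assemble by option in the list above groups sequences by one delimited part of their names. The grouping step can be sketched as follows; the function name and its parameters are hypothetical and only illustrate the idea.

```python
from collections import defaultdict


def group_by_name_part(names, delimiter=".", part=1):
    """Group names by their part-th field (1-based) when split on delimiter."""
    groups = defaultdict(list)
    for name in names:
        pieces = name.split(delimiter)
        if part > len(pieces):
            raise ValueError(f"{name!r} has no part {part}")
        groups[pieces[part - 1]].append(name)
    return dict(groups)
```

For the names A03.1.ab1, A03.2.ab1, B05.1.ab1 and B05.2.ab1, grouping by the 1st part separated by a full stop yields two groups, A03 and B05, each of which would then be assembled separately.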

Choose the options you require and click ‘OK’ to begin assembling the contig. Once complete,
one or more contigs may be generated. If you get more contigs than you expect for the selected sequences, try adjusting the options for assembly. It is also possible that
no contigs will be generated if no two of the selected sequences meet the overlap requirements.

Note: The orientation of fragments will be determined automatically, and they will be reverse
complemented where necessary.

If you already have a contig and you want to add a sequence to it or join it to another contig
then just select the contig and the contig/sequence and click de novo assembly as normal.

Scaffolding

Scaffolds are contigs which are linked together, with the missing regions between them filled by
Ns. The size of the missing region is based on paired read distances. The Geneious assembler
will produce scaffolds if this option is turned on under More options. If this setting is disabled
it is because your data does not have paired reads or you haven’t marked the data as paired
using Set Paired Reads from the Sequence menu.

Unlike some assemblers where scaffolding is performed after contig formation, Geneious scaf-
folding is integrated into the contig assembly process. When there is strong support for scaf-
folding, it may take precedence over potentially conflicting standard contig formation. For this
reason, Geneious can’t be configured to produce both scaffolds and non-scaffolds from a single
run.

De novo assembly of circular genomes

The Geneious de novo assembler can produce a circular contig if you are working with a cir-
cular genome. To enable this option, click the More Options button and check Circularize
contigs of [x] or more sequences, if ends match. Circularization requires that the ends of the
contig match, and that the contig contains at least the number of specified sequences.

A circular contig will contain reads at either end marked with arrows, which denotes that these
reads span the origin and link back around to the other end of the assembly. The consensus se-
quence produced from this contig will also be circular. The Topology column in the Document
table lists whether a given contig is circular or linear.

10.3.1 The de novo assembly algorithm

The sequence assembler in Geneious is flexible enough to handle read errors consisting of either
incorrect bases or short indels. It can handle reads of any length, including paired-reads and
mixtures of reads from different sequencing machines (hybrid assemblies).

De novo assemblers are generally either overlap or k-mer (De Bruijn graph) based. The Geneious
de novo assembler is an overlap assembler which uses a greedy algorithm similar to that used
in multiple sequence alignment.

1. For each sequence a blast-like algorithm is used to find the closest matching sequence
among all other sequences.

2. The highest scoring sequence and its closest matching sequence are merged together into
a contig (reverse complementing if necessary). This process is repeated, appending se-
quences to contigs and joining contigs where necessary.

3. For paired read de novo assembly, 2 sequences with similar expected mate distances are
given a higher matching score if their mates also score well against each other. Similarly
a sequence and its mate will be given a higher score if they both align at approximately
their expected distance apart to an already formed contig. The effect of this heuristic
is that paired read de novo assembly starts out by finding 2 sets of paired reads and
forming 2 contigs. Each of these 2 contigs will contain 1 sequence from each pair and the
2 contigs are expected to be separated by the expected mate distance. Assembly proceeds
from there either adding new paired reads to the contigs or forming new pairs of contigs
which eventually merge together. Due to the nature of this algorithm, paired read de
novo assembly in Geneious only works well if you have high coverage of paired reads -
a hybrid assembly of mostly unpaired data with a few paired reads will not make good
use of the paired read data, but this is expected to improve in future versions.

4. Each contig generated by a gapped de novo assembly has some minor fine tuning per-
formed on it both during assembly and upon completion. For each gapped position in a
sequence, a base adjacent to the gap is shuffled along into the gap if it is the same base as
the most common base in other sequences in the contig at that position. After doing this,
if any column now consists entirely of gaps, that column is removed from the contig.

5. Other heuristics are applied throughout the assembly to improve the results, such as
identifying repeat regions.

6. Both the Geneious de novo and reference assemblers use a deterministic method (even
when spreading the work across multiple CPUs) such that if you rerun the assembler using
the same settings and same input data it will always produce the same results.
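
The greedy merging idea in steps 1 and 2 can be illustrated with a toy sketch. This is not the Geneious assembler: it assumes error-free reads, uses exact suffix/prefix overlaps in place of a BLAST-like score, and ignores reverse complements, paired reads and gaps. The function names and the min_overlap parameter are hypothetical.

```python
from __future__ import annotations


def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the two sequences that share the longest exact overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best = (0, -1, -1)  # (overlap length, left index, right index)
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    size = overlap_length(left, right, min_overlap)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:
            break  # no remaining pair overlaps by at least min_overlap
        merged = contigs[i] + contigs[j][size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]
    return contigs


print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTTA"]))  # ['ATGGCGTACCTTA']
```

Because the best pair is always merged first and ties are broken by input order, rerunning the sketch on the same input gives the same result, which mirrors the deterministic behaviour described in step 6.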

10.4 Map to reference

Map to Reference is used when you wish to assemble sequences to a known sequence, for
example to locate differences or SNPs. To perform assembly to a reference sequence select your
sequences and click Align/Assemble and choose Map to Reference. Choose the name of the
sequence you wish to use as the reference in the Reference Sequence chooser and click OK. One
contig will be produced per reference and this will display the reference sequence at the top of
the alignment view with all other sequences below it.

The options available in the Map to Reference setup dialog are similar to those for de novo
assembly (see section 10.3), with a few differences detailed in the subsequent sections. In
the Methods panel, you can choose between the standard Geneious assembler, Geneious for
RNAseq (R9+), or the BBMap, Bowtie, Tophat, or Minimap2 (Geneious Prime 2020+) mappers
if you have these plugins installed.

See chapter 11 for details on identifying differences or SNPs in your assembly.

10.4.1 Choosing reference sequences

In version 8.1 and above, it is not necessary to select the reference sequence prior to choosing
Map to Reference. Instead, the reference sequence can be chosen from within the Map to Refer-
ence setup options by clicking Choose.... This brings up a document chooser from which you
can select single or multiple reference sequences from any folder in your database.

Multiple reference sequences may be selected either by choosing a sequence list containing
your reference sequences, or by selecting multiple sequence documents using the Choose...
button. Single reference sequences may be pre-selected prior to choosing the Map to reference...
option, however when using the Choose... button, only those documents selected within the
Choose... dialog will be used as reference sequences (all other sequences will be mapped).

If multiple reference sequences are selected, each read will be mapped to the sequence with
the best match only, and will produce one contig per reference. Batch assemblies where each
read gets mapped to each reference sequence can be done by using Workflows → Map reads
to each reference sequence.

10.4.2 Fine tuning

When aligning to a reference, the sequences are not aligned to each other; instead, each is
aligned to the reference sequence independently and the pairwise alignments are combined
into a contig. However, an iterative fine tuning step can be enabled, which makes reads that
overlap from the initial assembly stage align better to each other. Fine tuning causes reads to
align better to each other around indels which improves the accuracy of consensus and variant
calling. For more information, click the help (question mark) button next to the fine tuning
options in the Map to Reference setup dialog.

If you only wish to use a reference sequence to guide construction of a contig whose reads
extend beyond the ends of the reference, you have two options. With iterative fine tuning,
reads can extend a bit further past the ends of the reference sequence on each iteration, so
make sure the number of iterations is set high enough. Alternatively, select all sequences,
including the reference, and use the de novo assembler.

10.4.3 Deletion, insertion and structural variant discovery (DNA mapping)

Geneious can discover structural rearrangements, short insertions, and arbitrarily large dele-
tions from paired or unpaired reads by analyzing how fragments of each read align to different
regions of the reference sequence(s). To enable this option, check Find structural variants, short
insertions and deletions of any size. If you only want to find deletions up to a specified size,
check Find short insertions and large deletions up to...

For this operation, Geneious makes two passes during mapping. On the first pass each read
mapped will generate candidate junctions (sites for structural variants) based on where frag-
ments of the read align to different regions of the reference sequence(s). The more reads that
support a candidate junction, the more likely it will be used during the second pass. The second
pass involves mapping reads using the discovered junctions.

Insertions, where the ends of a read map to nearby locations but the center of the read doesn’t
map, are also detected. Since discovered insertions must be less than the read length, only short
insertions are generally discovered. Only the most common insertion at each position will be
annotated and have reads correctly aligned with it.

By default, at least 2 reads must support the discovery of a junction in order for it to be used
during the next pass. This threshold can be adjusted under More Options by changing the
Minimum support for structural variant discovery setting. Insertion discovery can also be
disabled here by unchecking Include insertions in structural variants.
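
The role of the minimum support threshold can be sketched as a simple count over the candidate junctions proposed during the first pass. This is only an illustration: the (source, destination) tuples and the function name are hypothetical, and the sketch does not model how Geneious treats slightly offset versions of the same junction.

```python
from collections import Counter


def select_junctions(candidates, min_support=2):
    """Keep candidate junctions proposed by at least `min_support` reads."""
    support = Counter(candidates)
    return {junction: count for junction, count in support.items()
            if count >= min_support}


# Each tuple is a hypothetical (source position, destination position) pair
# proposed by one read during the first mapping pass.
first_pass = [(120, 480), (120, 480), (300, 9000), (120, 480), (75, 77)]

print(select_junctions(first_pass))     # {(120, 480): 3}
print(select_junctions(first_pass, 1))  # all three distinct candidates are kept
```

Lowering the threshold keeps more junctions for the second pass, at the cost of admitting junctions supported by a single, possibly erroneous, read.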

Junctions used during the second mapping pass are annotated on the reference sequence under
a track named after the reads. Annotations are only created for variants which are at least 3 bp
in size. Each junction annotation has the following properties:

• Junction Type: This will be Insertion for short insertions and Deletion for deletions up
to 1000 bp. For longer deletions or structural variants, this will be Rearrangement, with
(inversion) potentially appended.

• Intervals: Junctions of type Insertion are shown as one single interval annotation covering
the gapped region in the contig, or when viewed on the unaligned reference sequence, po-
sitioned between the nucleotides on either side of the insertion. Junctions of type Deletion
and Rearrangement are each represented with two 1-bp annotation intervals positioned
on the last nucleotide before the read jumps and continues elsewhere. For Deletions, this
is a single annotation with two linked intervals. For Rearrangements, the junction site is
split into two separate annotations, each with a jagged edge on one side of the interval to
indicate the side which jumps elsewhere.

• Deletion Size: This is present when Junction Type is Deletion.

• Rearrangement Distance: This is present when Junction Type is Rearrangement to indicate
the distance between the two junction sites.

• Insertion Size: This is present when Junction Type is Insertion to indicate the number of
nucleotides in the insertion.

• Insertion: This is present when Junction Type is Insertion to indicate the nucleotides in-
serted.

• Reads supporting discovery: Indicates the number of reads that supported discovery
of this junction during the first pass. This may be lower than the advanced minimum
support setting in cases where other reads supported discovery of a slightly offset version
of this junction, which allows this junction to be retained on the next pass.

• Reads using: Indicates the number of reads that used this junction as part of their map-
ping during the second pass.

• Junction Source & Junction Destination: Clickable links to the junction positions in the
reference sequence. When the destination is a different reference sequence, this is prefixed
with the sequence name followed by a colon.

• Color: Annotations are colored from blue to green based on increasing values of Reads
supporting discovery. At 5 and above the color is fully green.

Reads spanning junctions may be represented in one of three possible ways:

• For insertions, the insertion is represented as a gap in the reference sequence.

• For deletions under 1,000 bp, the deletion is represented as a gap in the read. This gap
contributes towards calling a gap in the consensus sequence.

• For longer deletions or for structural variants, two copies of the read appear in the con-
tig where the fragment of the read extending past the junction is marked as trimmed.
Trimmed regions do not contribute to consensus sequence calling. These trim regions
will only be visible when in editing mode. When not in editing mode, the trimmed re-
gions will appear as 3 gaps fading to light grey. Clicking on these fading gaps will jump
to the read at the other end of the junction. Faded gaps depend on the presence of the
junction track created at the time of mapping. If these annotations are deleted, the faded
gaps will appear as trimmed regions instead.

10.4.4 RNAseq mapping

To map RNA sequence reads to a genome with introns, choose Geneious RNA as the Mapper
in the Map to Reference setup dialog. This function can map reads that span existing annotated
introns, or discover insertions, novel introns and fusion genes.

This function works in the same way as deletion and structural variant discovery (section
10.4.3) for DNA mapping, by analyzing how fragments of each read align to different regions
of the reference sequence(s), and creating a junction annotation at the point where the read is
split. By default, at least 2 reads must support the discovery of a junction in order for it to be
annotated. This threshold can be adjusted under More Options by changing the Minimum
support for intron/fusion gene discovery setting.

If Span annotated mRNA introns is checked, junctions will be created from existing annota-
tions on the reference sequence. Reads are still allowed to map anywhere, but will be allowed
to freely span these junctions if that produces the best mapping.

To only find introns up to a certain size, check Find novel introns up to...; to find introns of any
size, insertions, or structural rearrangements that may indicate a fusion gene, use Find fusion
genes and novel introns.

As for deletion and structural variant discovery, junctions are annotated on the reference se-
quence under a track named after the reads (see Figure 10.5). Each junction has the following
properties:

• Junction Type: This will be Insertion for short insertions. For introns under 2,000,000
bp, this will be Intron. For longer introns or structural variants, this will be Fusion, with
(inversion) potentially appended.

• Intervals: Junctions of type Insertion are shown as one single interval annotation covering
the gapped region in the contig, or when viewed on the unaligned reference sequence,
positioned between the nucleotides on either side of the insertion. Junctions of type In-
tron and Fusion are each represented with two 1-bp annotation intervals positioned on
the last nucleotide before the read jumps and continues elsewhere. For Introns, this is a
single annotation with two linked intervals. Introns that have common start and finish
nucleotides will be assigned an appropriate direction. For Fusions, the junction site is
split into two separate annotations, each with a jagged edge on one side of the interval to
indicate the side which jumps elsewhere.

• Intron Size: This is present when Junction Type is Intron.

• Fusion Distance: This is present when Junction Type is Fusion to indicate the distance
between the two junction sites.

• Insertion Size: This is present when Junction Type is Insertion to indicate the number of
nucleotides in the insertion.

• Insertion: This is present when Junction Type is Insertion to indicate the nucleotides in-
serted.

• Reads supporting discovery: Indicates the number of reads that supported discovery
of this junction during the first pass. This may be lower than the advanced minimum
support setting in cases where other reads supported discovery of a slightly offset version
of this junction, which allows this junction to be retained on the next pass.

• Reads using: Indicates the number of reads that used this junction as part of their map-
ping during the second pass.

• Junction Source & Junction Destination: Clickable links to the junction positions in the
reference sequence. When the destination is a different reference sequence, this is prefixed
with the sequence name followed by a colon.

• Color: Annotations are colored from blue to green based on increasing values of Reads
supporting discovery.

Reads spanning junctions may be represented in one of three possible ways:

• For insertions, the insertion is represented as a gap in the reference sequence.

• For introns under 15 bp, the deletion is represented as a gap in the read. This gap con-
tributes towards calling a gap in the consensus sequence.

• For longer introns or for fusion genes, two copies of the read appear in the contig where
the fragment of the read extending past the junction is marked as trimmed. Trimmed
regions do not contribute to consensus sequence calling. These trim regions will only
be visible when in editing mode. When not in editing mode, the trimmed regions will
appear as 3 gaps fading to light grey. Clicking on these fading gaps will jump to the
read at the other end of the junction. Faded gaps depend on the presence of the junction
track created at the time of mapping. If these annotations are deleted, the faded gaps will
appear as trimmed regions instead. (see Figure 10.5).

10.4.5 The map to reference algorithm

The reference assembly algorithm used is a seed and expand style mapper followed by an op-
tional fine tuning step to better align reads around indels to each other rather than the reference
sequence. Various optimizations and heuristics are applied at each stage, but a general outline
of the algorithm is:

1. First the reference sequence(s) is indexed to create a table making a record of all locations
in the reference sequence that every possible word (series of bases of a specified length)
occurs.

Figure 10.5: A single cDNA sequence mapped to a genomic sequence using the Geneious RNA
algorithm. In the zoomed out view above, the coverage graph and junction annotation track
provides a quick view of where the cDNA maps to the genomic sequence. Five copies of the
cDNA sequence appear in the contig, as it maps across 5 exons. The inset shows a zoomed in
view of a junction, with the junction annotation properties shown.

2. Each read is processed one at a time. Each word within that read is located in the reference
sequence and that is used as a seed point where the matching range is later expanded
outwards to the end of the read.

3. If a read does not find a perfectly matching seed, the assembler can optionally look for all
seeds that differ by a single nucleotide.

4. Before the seed expansion step, all seeds for a single read that lie on the same diagonal
are filtered down to a single seed.

5. During seed expansion, when a mismatch occurs a look-ahead is used to decide whether to
accept it as a mismatch or to introduce a gap (in either the reference sequence or read).

6. The mapper handles circular reference sequences by indexing reference sequence words
spanning the origin and allowing the expansion step to wrap past the ends.

7. All results are given a score based on the number of mismatches and gaps introduced.
Normally the best scoring match (or a random one of the equally best scoring matches) is
saved, although there is an option to map the read to all best scoring locations.

8. Paired reads are given an additional score penalty based on their distance from their
expected distance so that they prefer mapping close to their expected distance with as
few mismatches as possible, but they can also map any distance apart if an ideal location
is not found.

9. The final optional fine tuning step shuffles the gaps around so that the reads
align better to each other rather than to the reference sequence.

10. For details on how mapping qualities are calculated, see section ??.

For further details and for a comparison of the Geneious reference assembler to other software,
see the Geneious Mapper white paper.
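
The indexing, seeding and diagonal filtering steps of the outline above can be sketched in a few lines. This is a simplified illustration and not the Geneious mapper: the expansion step here only counts mismatches along an ungapped diagonal, and single-mismatch seeds, gaps, circular references, paired reads and mapping qualities are all left out. The function names and the word_length default are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def index_reference(reference: str, word_length: int) -> dict[str, list[int]]:
    """Record every position at which each word of `word_length` bases occurs."""
    index = defaultdict(list)
    for pos in range(len(reference) - word_length + 1):
        index[reference[pos:pos + word_length]].append(pos)
    return index


def find_seeds(read: str, index: dict[str, list[int]], word_length: int) -> dict[int, int]:
    """Return one seed per diagonal, as {diagonal: offset of the seed in the read}."""
    seeds: dict[int, int] = {}
    for offset in range(len(read) - word_length + 1):
        for ref_pos in index.get(read[offset:offset + word_length], []):
            diagonal = ref_pos - offset  # reference position of the read's first base
            seeds.setdefault(diagonal, offset)  # keep only the first seed per diagonal
    return seeds


def map_read(read: str, reference: str, word_length: int = 4):
    """Return (start, mismatches) for the best ungapped placement, or None."""
    index = index_reference(reference, word_length)
    best = None
    for diagonal in sorted(find_seeds(read, index, word_length)):
        if diagonal < 0 or diagonal + len(read) > len(reference):
            continue  # this sketch does not let reads overhang the reference
        window = reference[diagonal:diagonal + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if best is None or mismatches < best[1]:
            best = (diagonal, mismatches)
    return best


reference = "ACGTTGCAAGGCTTACCGATAGG"
print(map_read("GGCTAACCG", reference))  # (9, 1)
```

Filtering seeds down to one per diagonal means a read with many matching words is still expanded only once per candidate location.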

10.5 The Contig Viewer

Contigs in Geneious Prime are viewed and edited in exactly the same way as alignments (see
section 9.3). Features particularly relevant to contig assemblies are highlighted on Figure 10.6
and described below.

1. Consensus sequence
The consensus sequence is displayed at the top of the assembly. This is the consensus of
the reads only and does not include the reference sequence if one is present. The consensus
settings can be found under the Display tab, and information about these settings
can be found in section 9.5. If the sequences in the contig have quality information attached
we recommend selecting the Highest Quality consensus type. In most cases this
removes the need for manually editing the contig because the consensus will be the base
with the highest total quality at each position.

Figure 10.6: The Contig Viewer

2. Coverage graph
The coverage graph shows how many reads map at each position and can be useful for
assessing the quality of your mapping. The Graphs tab enables you to show or hide the
Coverage graph as well as enable other graphs such as the Identity or Sequence Logo
graphs. See section 9.3.2 for a detailed description of these graphs. The data underlying
these graphs can be exported in CSV format by clicking the Export option under the
Graphs tab.

3. Reference sequence
If you have run Map to Reference, the reference sequence is shown at the top of the
assembly and is shaded yellow.

4. Color schemes
The coloring of bases in the assembly can be selected from the dropdown Colors menu
at the top of the General tab. Color schemes particularly useful for contig assemblies
include the following:

• For assemblies of Sanger chromatograms, the default color scheme is Base call quality.
This assigns a shade of blue to each base based on its quality: dark blue for
confidence < 20, blue for 20 - 40 and light blue for > 40.
• For assemblies of paired reads, the default color scheme is Paired Distance. With
this color scheme paired reads are colored according to how close the actual sepa-
ration between the reads is to their expected separation. Green indicates they are
correct, yellow and blue indicate under or over their expected separation and red
indicates the reads are incorrectly orientated. To configure the colors used, and the
sensitivity for deciding if reads are close enough to their expected distance, click the
Options link next to the color chooser. The status bar below the viewer will show the
actual and expected separation between a particular pair of reads when you hover
the mouse over one of the reads in the pair.
• The Mapping Quality color scheme can be selected for reads mapped to a reference
sequence. A mapping quality represents the confidence that the read has been
mapped to the correct location. For a read with mapping quality Q, the probability
that it has been incorrectly mapped is $10^{-Q/10}$. For example, a read with a
mapping quality score of 20 has a 1% chance of having been incorrectly mapped.
Reads that could be mapped to multiple locations will have a maximum mapping
quality score of 3, which indicates at least a 50% probability of the read mapping
elsewhere. Mapping qualities have a maximum value of 254 for consistency with
the SAM/BAM format. If a sequence has no mapping quality (i.e. the document was
produced in a version of Geneious prior to 8.1 or imported from a SAM/BAM file
that didn’t have mapping quality) then it will be colored gray. Mapping quality for
the sequence under the mouse is also displayed in the status bar. All mappers use
heuristics to calculate mapping qualities. For unpaired reads, the Geneious mapper
assigns a mapping quality of 20*(the number of additional mismatches in the second
best location the read maps to). For paired reads the individual unpaired mapping
qualities are calculated, but these are increased by up to 20 depending on how close
the best pair is to the expected insert distance compared with the second best pair.
• The By Direction color scheme allows you to quickly see whether reads are mapped
in the forward or reverse orientation.

5. Advanced Layout settings
The settings under the Advanced tab contain options for adjusting the layout of the
contig.

Figure 10.7: Advanced layout settings for the contig view

Assemblies of short read data are normally viewed with the option Vertically compress
contig enabled. This option puts the reads side by side where possible so that multiple
reads can be viewed in the same row. Vertically compress contig cannot be enabled if
Wrap contig is switched on. Wrap Contig will wrap the assembly to the screen width and
is not recommended for large assemblies.
Hide sites over x % gaps hides sites that have over the set percentage of gaps, so that in-
dels introduced by sequencing errors do not interfere with the viewing of the alignment.
This setting does not hide sites where the reference is a non-gap, and does not hide scaf-
folding gaps in de novo contigs. This setting can only be enabled when wrap contigs is
switched off, and editing mode is switched off.
Link Paired Reads: With this option on, pairs of reads will be laid out in the same row
with a horizontal line connecting them. This option is only visible on contigs contain-
ing reads that have been paired before assembling - see section 10.2.1. Reads separated
by more than 3 times their expected distance are not linked by default unless the Link
distant reads setting is turned on.
The horizontal line between paired reads is colored according to how close the separation
between the reads is to their expected separation. The coloring is the same as for Paired
Distance coloring described above, where Green indicates they are correct, yellow and
blue indicate under or over their expected separation and red indicates the reads are
incorrectly orientated.
If Link Paired Reads is off, it is still possible to view the connecting line between any pair
of reads by mousing over one of the reads. To find the pair of a particular read, right click
on the read and choose Go to paired mate.
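
The relation between a mapping quality Q and the probability of an incorrect mapping, given under the Mapping Quality color scheme above, can be checked with a short calculation. The function names are hypothetical, and the sketch covers only the conversion itself, not the heuristics a mapper uses to assign Q.

```python
import math


def mapping_error_probability(quality: float) -> float:
    """Probability that a read with mapping quality Q is mapped incorrectly."""
    return 10 ** (-quality / 10)


def quality_from_error_probability(probability: float) -> float:
    """Inverse relation: Q = -10 * log10(p), for 0 < p <= 1."""
    return -10 * math.log10(probability)


for q in (3, 20, 40):
    print(q, mapping_error_probability(q))
```

A quality of 20 corresponds to an error probability of 0.01, and a quality of 3 to just over 0.5, which matches the figures quoted for reads that map to multiple locations.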

Finding regions of low/high coverage

In addition to the coverage graph, which gives you a quick overview of coverage, the Find
Low/High Coverage feature is available under Annotate & Predict in the toolbar. This feature
annotates all regions of low/high coverage, which you can then navigate through using the
small left and right arrows next to the coverage annotations in the controls on the right. You
can set the low/high coverage threshold by specifying either an absolute number of sequences
or a number of standard deviations from the mean coverage.

The find low/high coverage tool can also be used to record the minimum, mean, and maximum
coverage of each annotation of a particular type on the reference sequence. To do this, in the
Only Find In section of the options, turn on Annotations in reference sequence of type and
choose Create annotations of same type on reference sequence.
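
The two ways of setting the threshold (an absolute number of sequences, or a number of standard deviations from the mean coverage) can be sketched as follows. This is an illustration under stated assumptions rather than the Geneious implementation: positions are 0-based, the comparisons are strict, the population standard deviation is used, and all names are hypothetical.

```python
from statistics import mean, pstdev


def thresholds_from_std(coverage, deviations=2.0):
    """Low/high thresholds set `deviations` standard deviations from the mean."""
    centre, spread = mean(coverage), pstdev(coverage)
    return centre - deviations * spread, centre + deviations * spread


def coverage_regions(coverage, low, high):
    """Return (start, end, label) runs where coverage is below low or above high."""
    regions = []
    start, current = 0, None
    for pos, depth in enumerate(list(coverage) + [None]):  # None closes the last run
        if depth is None:
            label = None
        elif depth < low:
            label = "low"
        elif depth > high:
            label = "high"
        else:
            label = None
        if label != current:
            if current is not None:
                regions.append((start, pos - 1, current))
            start, current = pos, label
    return regions


coverage = [10, 10, 2, 1, 10, 10, 30, 31, 10]
print(coverage_regions(coverage, low=5, high=20))
# [(2, 3, 'low'), (6, 7, 'high')]
```

With thresholds_from_std the same region finder can be driven by the spread of the data instead of fixed read counts.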

10.6 Editing Contigs

Editing a contig is exactly the same as editing an alignment in Geneious Prime. After selecting
the contig, click the Allow Editing button in the sequence viewer and you can modify, insert
and delete characters like in a standard text editor.

Editing of contigs is done to resolve conflicts between fragments before saving the final consen-
sus. Highlighting can be enabled under the Display tab as described in section 9.3.1, to enable
you to quickly scan for and navigate to sites containing disagreements. With editing enabled,
you can change bases which you believe are bad calls to be the base which you believe is the
correct call. This is often decided by looking at the quality for each of the bases and choos-
ing the higher quality one. Geneious can do this automatically for you if you use the Highest
Quality consensus.

Bases in the consensus sequence can also be edited which will update every sequence at the
corresponding position to match what is set in the consensus.

You can also manually move a read mapped to a reference sequence to a specific position in
the contig. To do this, select the read and right-click, then choose Move read to position.., and
enter the position where you want the left-most base in the read to sit.

Figure 10.8: Highlight disagreements and edit to resolve them

10.7 Extracting the Consensus

Once you are satisfied with a contig you can save the consensus as a new sequence by clicking
on the name of the consensus sequence in your contig and clicking the Extract button. You
can also generate consensus sequences for single or multiple contig documents by selecting the
documents and going to Tools → Generate Consensus Sequence.
Chapter 11

Analysis of Assemblies and Alignments

11.1 Finding polymorphisms

To easily identify bases which do not match the consensus or reference sequence, turn on High-
lighting in the consensus section of the sequence viewer options (see figure 11.1). Select the
options Disagreements to Consensus or Reference depending on your needs. When this is
on, matching bases are grayed out and bases not matching are left colored.

Figure 11.1: Options for highlighting disagreements in an alignment

With this on you can quickly jump to each disagreement by pressing Ctrl+D (command+D on
Mac OS X) or by clicking the Next Disagreement button in the sequence viewer option panel
to the right. Each disagreement can then be examined or resolved.

11.1.1 Find Variations/SNPs

Manually investigating every little disagreement can be time consuming on larger contigs. The
Find Variations/SNPs feature from the Annotate & Predict menu will annotate regions of dis-
agreement and can be configured to only find disagreements above a minimum threshold to
screen out disagreements due to read errors. This feature can also be configured to only find
disagreements in coding regions (if the reference sequence has CDS annotations present) and
can analyze the effects of variations on the protein translation to allow you to quickly identify
silent or non-silent mutations. It can also calculate p-values for variations and filter only for
variations with a specified maximum P-Value.

For full details of how the various settings work in the Variation/SNP finder, hover the mouse
over them to read the tooltips or click one of the ‘?’ buttons.

Figure 11.2: The Find Variations/SNPs dialog

P-values

The p-value represents the probability of a sequencing error resulting in observing bases with
at least the given sum of qualities. The lower the p-value, the more likely the variation at the
given position represents a real variant. Click the down arrow next to the exponent of the
Maximum Variant P-Value setting to increase the number of variants found.

When calculating P-Values:

• The contig is assumed to have been fine tuned around indels

• Ambiguity characters are ignored (other characters in the column are still used)

• Homopolymer region qualities are reduced to be symmetrical across the homopolymer.
For example if a series of 6 G’s have quality values 37, 31, 23, 15, 7, 2 then these are treated
as though they are 2, 7, 15, 15, 7, 2. This is done because variations may be called at either
end of the homopolymer and because reads may be from different strands.

• Gaps are assumed to have a quality equal to the minimum quality on either side of them
(after adjusting for homopolymers)

• When finding variations relative to a reference sequence, the p-value calculated is for the
variant, not the change. In other words the p-values calculated are independent of the
reference sequence data.

• The approximate p-value method calculates the p-value by first averaging the qualities of
each base equal to the proposed SNP and averaging the qualities of each base not equal
to the proposed SNP.

• Example: Assume you have a column where the reference sequence is an A and there are
3 reads covering that position.
1 read contains an A in the column and the other 2 reads contain a G. All 3 reads have
quality 20 ( = 99% confidence) at this position. We want to calculate the p-value for calling
a G SNP in this column.
Since the quality values are all equal, the p-value is the probability of seeing at least 2
G’s if there isn’t really a variant here. In other words, the probability of seeing 2 G’s by
chance due to a sequencing error plus the probability of seeing 3 G’s by chance due to a
sequencing error, which is calculated using the binomial distribution:
$\binom{3}{2} \cdot 0.01^{2} \cdot 0.99 + \binom{3}{3} \cdot 0.01^{3} \approx 0.0003$ (where $\binom{N}{K}$ is a binomial coefficient)
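
The worked example and the homopolymer adjustment above can be reproduced with a short calculation. This sketch covers only the equal-quality case from the example; it is not the approximate p-value method itself, whose handling of unequal qualities is not fully specified here. The function names are hypothetical, and the symmetrization rule (taking the minimum of each quality and its mirror position) is an assumption that reproduces the quoted example.

```python
from math import comb


def phred_to_error(quality: float) -> float:
    """Error probability implied by a quality score (quality 20 gives 0.01)."""
    return 10 ** (-quality / 10)


def binomial_tail(n: int, k: int, p: float) -> float:
    """Probability of at least k errors among n reads, each with error rate p."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def symmetrize_homopolymer(qualities):
    """Replace each quality with the minimum of itself and its mirror position."""
    return [min(q, r) for q, r in zip(qualities, reversed(qualities))]


# 3 reads of quality 20 cover the column and 2 of them show a G.
p_value = binomial_tail(3, 2, phred_to_error(20))
print(p_value)  # about 0.000298, which rounds to 0.0003

print(symmetrize_homopolymer([37, 31, 23, 15, 7, 2]))  # [2, 7, 15, 15, 7, 2]
```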

False SNPs due to strand-bias (when sequencing errors tend to occur only on reads in a single
direction) can be eliminated by specifying a value for the Minimum Strand-Bias P-value set-
ting. A Strand-Bias P-Value property is added to each SNP to indicate the probability of seeing
a strand bias at least this extreme assuming that there is no strand bias. SNPs with a smaller
strand bias p-value will be excluded from the results when using this setting.

Strand-Bias >50% P-value example: Assume you have a column covered by 9 reads containing
an A, 8 of which are on the forward strand. We calculate the probability of seeing bias at least
this extreme, assuming there is no strand-bias, which is the probability of seeing either 0, 1, 8,
or 9 reads on the forward strand. Using the binomial distribution, this is
$\binom{9}{0} \cdot 0.5^{9} + \binom{9}{1} \cdot 0.5^{9} + \binom{9}{8} \cdot 0.5^{9} + \binom{9}{9} \cdot 0.5^{9} \approx 0.039$ (where $\binom{N}{K}$ is a binomial coefficient)
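
The strand-bias example can be reproduced with a two-sided binomial calculation. This is a minimal sketch that assumes each read is equally likely to come from either strand; the function name is hypothetical.

```python
from math import comb


def strand_bias_p_value(forward: int, total: int) -> float:
    """Probability of a forward/reverse split at least as uneven as the one seen."""
    extreme = max(forward, total - forward)
    outcomes = sum(comb(total, i) for i in range(total + 1)
                   if max(i, total - i) >= extreme)
    return outcomes / 2 ** total


print(strand_bias_p_value(8, 9))  # 0.0390625
```

For 9 reads with 8 on the forward strand, the outcomes counted are 0, 1, 8 and 9 forward reads, giving 20 of the 512 equally likely splits.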

Click the up arrow next to the exponent of the Minimum Strand-Bias P-Value setting to in-
crease the number of variants found. If there are any forward/reverse or reverse/forward
style paired reads, then variants with strand bias which are less than 1.5 times the insert size
from either end of the contig will not be filtered out.

Results display

The results of the Variant/SNP finder are added to the reference sequence in the assembly or
alignment as an annotation track. Clicking Save and clicking “Yes” when prompted to apply
the changes to the original sequences will add this annotation track onto the original reference
sequence file. If there is no reference sequence for the alignment or assembly the annotations
are added to the consensus sequence.

Figure 11.3: Results of SNP calling, shown as an annotation track on the sequence (above) and
the annotations table (below)

The results are also displayed in the annotations table and the following columns can be dis-
played:

• Change: Indicates the reference sequence nucleotides followed by the variant nucleotides.
For example ‘C → A’
• Coverage: The number of reads that cover the SNP region in the contig. The coverage
includes both the reads containing the SNP and other reads at that position.
• Reference Frequency: The percentage of reads that agree with the reference sequence at
that position. This field will only be present if at least 1 read agrees with the reference
sequence.
• Variant Frequency: The percentage of reads that have the variation at that position. For
variations that span more than a single nucleotide, the variant frequency may appear as
a range (e.g. 47.8% – 51.7%) to indicate the minimum/maximum variant frequency over
that range.
• Polymorphism Type: This may be one of the following:
  • SNP (Transition): A single nucleotide transition change from the reference sequence
  • SNP (Transversion): A single nucleotide transversion change from the reference sequence
  • SNP: At a single position, there are multiple variations from the reference sequence
  • Substitution: A change of 2 or more adjacent nucleotides from the reference sequence
  • Insertion: 1 or more nucleotides inserted relative to the reference sequence
  • Deletion: 1 or more nucleotides deleted relative to the reference sequence
  • Mixture: Multiple variations from the reference sequence which are not all the same length

For variations inside coding regions (CDS annotations) the following fields can be displayed:

• Codon Change: indicates the change in codon. Essentially this is the same as the ‘Change’
field, but extended to include the full codon(s). For example ‘TTC → TTA’
• Amino Acid Change: indicates the change (if any) in the amino acid(s) by translating the
codon change. For example ‘F → L’
• Protein Effect: summarizes the change on the protein as either a substitution, frame shift,
truncation (stop codon introduced) or extension (stop codon lost)
• Average Quality: is the average of the quality score of all base-calls in reads that have the
variation at that position.

11.2 Analyzing Expression Levels

11.2.1 Calculating Expression Levels

The Calculate Expression Levels feature from the Annotate & Predict menu calculates nor-
malised expression measures from mapped RNA-seq data (see figure 11.4). RPKM, FPKM and
TPM are calculated for each transcript annotation on the reference sequence of a contig and
the results are displayed as a heat map annotation track. Transcript annotations can be of type
CDS, Gene, mRNA, miRNA, ncRNA or ORF, and you must specify the annotation type you
wish to use when you run Calculate Expression Levels. For simplicity we refer to transcript
annotations as CDS annotations for the rest of this section.

Figure 11.4: Options for Calculate Expression Levels

If you have multiple reference sequences for each sample (e.g. reads mapped to multiple chro-
mosomes), all contigs from a single sample should be selected and run in a single step.

To calculate differential expression between samples you need to run Calculate Expression
Levels on the mapped reads for each sample separately, save the expression level track to the
reference sequence, and then compare the results on the reference sequence using Compare
Expression Levels.

Counting

The three metrics are calculated by normalizing the count of reads that map to each CDS anno-
tation. If a read at least partially intersects at least one interval from a CDS annotation, then it
will be treated as though that read mapped to that CDS annotation.

For reads that map to multiple locations, or reads that map to a location that intersects multiple
CDS annotations, these may either be counted as partial matches, excluded from the
calculations, or counted as full matches to each location they map to. For example, when
counting partial matches, a read that maps to two locations is counted as if 0.5 reads mapped
to each of the two locations.

When calculating statistics, reads that don’t map, or that map outside of a CDS annotation,
are ignored.
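
The three counting options can be sketched as follows. This is a minimal illustration rather than the Geneious implementation: the input format (one list of CDS names per read) and the mode names are assumptions made for the example.

```python
from collections import defaultdict

def count_reads(read_hits, mode="partial"):
    """Tally reads per CDS annotation.

    read_hits: one list per read naming every CDS the read was assigned to
               (hypothetical input format; an empty list means no CDS was hit).
    mode: "partial" splits an ambiguous read evenly across its hits,
          "exclude" drops ambiguous reads,
          "full" counts one whole read for every hit.
    """
    if mode not in ("partial", "exclude", "full"):
        raise ValueError(f"unknown mode: {mode}")
    counts = defaultdict(float)
    for hits in read_hits:
        if not hits:
            continue  # unmapped reads and reads outside every CDS are ignored
        if len(hits) == 1:
            counts[hits[0]] += 1.0
        elif mode == "partial":
            for cds in hits:
                counts[cds] += 1.0 / len(hits)
        elif mode == "full":
            for cds in hits:
                counts[cds] += 1.0
        # mode == "exclude": ambiguous reads contribute nothing
    return dict(counts)

reads = [["geneA"], ["geneA", "geneB"], []]
print(count_reads(reads, "partial"))
```

With partial matching, the read shared between geneA and geneB adds 0.5 to each, as in the example in the text.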

RPKM

Reads per kilobase per million normalizes the raw count by transcript length and sequencing
depth.

$$\text{RPKM} = \frac{\text{CDS read count} \times 10^{9}}{\text{CDS length} \times \text{total mapped read count}}$$

FPKM

Same as RPKM except if the data is paired then only one of the mates is counted, i.e., fragments
are counted rather than reads.

TPM

Transcripts per million (as proposed by Wagner et al 2012) is a modification of RPKM designed
to be consistent across samples. It is normalized by total transcript count instead of total read
count, and also takes the mean read length into account.

$$\text{TPM} = \frac{\text{CDS read count} \times \text{mean read length} \times 10^{6}}{\text{CDS length} \times \text{total transcript count}}$$
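
The RPKM and TPM formulas can be written out as a short sketch. The function and variable names are illustrative. The manual does not define the total transcript count, so the definition used here (the sum over all CDS annotations of read count × mean read length / CDS length, following Wagner et al 2012) is an assumption. FPKM would use the same function as RPKM, with fragment counts in place of read counts.

```python
def rpkm(cds_reads, cds_length, total_mapped_reads):
    """Reads per kilobase of CDS per million mapped reads."""
    return cds_reads * 1e9 / (cds_length * total_mapped_reads)

def total_transcript_count(reads_by_cds, length_by_cds, mean_read_length):
    """Assumed definition: sum of reads * read length / CDS length over all CDS."""
    return sum(
        reads_by_cds[cds] * mean_read_length / length_by_cds[cds]
        for cds in reads_by_cds
    )

def tpm(cds_reads, cds_length, mean_read_length, total_transcripts):
    """Transcripts per million."""
    return cds_reads * mean_read_length * 1e6 / (cds_length * total_transcripts)

# Made-up example values
reads = {"geneA": 100, "geneB": 300}
lengths = {"geneA": 1000, "geneB": 2000}
read_length = 100

total = total_transcript_count(reads, lengths, read_length)
for gene in reads:
    print(gene,
          rpkm(reads[gene], lengths[gene], sum(reads.values())),
          tpm(reads[gene], lengths[gene], read_length, total))
```

Under this definition the TPM values of all CDS annotations in a sample sum to one million, which is what makes the measure comparable across samples.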

Results

Results are displayed as an annotation track on the reference sequence. By default, annotations
are colored based on the TPM property, ranging from blue for 0, through to white for the mean
TPM, up to red for the highest TPM for any gene in the sample. In the results view, by clicking
on the little down arrow to the left of the track’s name, you can choose to color by a different
property.

The values for RPKM, FPKM and TPM, as well as the raw read counts, are entered as proper-
ties on the annotation and can be displayed by mousing over an annotation. To export these
values as a table, switch to the Annotations tab above the sequence viewer then click the Track
button and choose the Expression track to display. Then click the Columns button and add
the columns for FPKM, RPKM, TPM and/or the raw counts. Once you have the columns you
need, you can export the table in .csv format by clicking Export table.

11.2.2 Comparing Expression Levels

Geneious Prime is able to find differentially expressed genes between two or more sample
conditions. This can be done through one of two methods: the built-in Geneious method, or
the DESeq2 method. The Geneious method should be used when there are two samples only.
For more than two samples, the DESeq2 method should be used and requires at least two
replicates per condition.

To compare expression levels, you must first assemble the reads for each sample to the same
reference sequence(s), then run Calculate Expression Levels from the Annotate & Predict
menu on each contig assembly. This will save an Expression Level track for each contig on
the reference sequence(s). To compare these tracks, select the reference sequence document(s)
and go to Annotate & Predict → Compare Expression Levels.

Figure 11.5: Results from calculating expression levels of 4 samples, displayed as heat-map
annotation tracks on the reference sequence

If your samples map to multiple reference sequences, you should run both Calculate Expres-
sion Levels and Compare Expression Levels on all contigs from a single sample at once to
produce correct results in all cases. However, when using the Geneious method, Compare Ex-
pression Levels can optionally be run on just one reference sequence at a time when using a
normalization system other than Median of Gene Expression Ratios.

11.2.3 Geneious Method for Comparing Expression Levels

The Geneious method should be used to compare expression between two single-sample con-
ditions. Either read counts, fragment counts or transcript counts from each annotation can be
compared.

Since a single transcript can produce multiple reads and fragments, the number of reads and
fragments produced aren’t independent events so the confidence values produced by compar-
ing these are unlikely to be accurate. For this reason we recommend comparing samples using
transcript counts.

Figure 11.6: Geneious method for comparing expression levels between two samples with no
replicates

Normalization

Different samples produce different quantities of transcripts. Therefore, in order to compare
values between samples, the counts need to be normalized using one of the following methods.

• Total Count: The counts in each gene are scaled according to the total number of tran-
scripts mapped to all genes. For example, if one sample has twice as many transcripts
mapped as the other sample, then the counts for each gene need to be halved to make
them comparable with the other sample.
• Median Expression: The expression level of all expressed genes from the sample are
calculated and the median values of these from each sample are used to normalize. For
example, if one sample has a median twice as high as the other sample, then the counts
for each gene need to be halved to make them comparable with the other sample.
• Total Count Excluding Upper Quartile: The expression level of all expressed genes from
the sample are calculated and the total number of reads, fragments, or transcripts from
the lowest 75% of those are totaled. Values are normalized between samples based on
this total.
• Median of Gene Expression Ratios: For each gene the ratio of the expression level be-
tween samples is calculated. Then the median ratio across all expressed genes is used as
the normalization scale. This normalization method is the same as that implemented by
DESeq2.
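
To illustrate how the choice of method changes the scale, the sketch below computes a scaling factor for two of the methods, Total Count and Median of Gene Expression Ratios. It is not the Geneious code: the function names and example counts are made up, and "expressed" is assumed to mean a count above zero in both samples.

```python
from statistics import median

def total_count_scale(sample1, sample2):
    """Factor to multiply sample-2 counts by so that the totals match sample 1."""
    return sum(sample1.values()) / sum(sample2.values())

def median_ratio_scale(sample1, sample2):
    """Median of the per-gene ratios sample1 / sample2 over expressed genes."""
    ratios = [
        sample1[gene] / sample2[gene]
        for gene in sample1
        if sample1[gene] > 0 and sample2.get(gene, 0) > 0
    ]
    return median(ratios)

# geneC is very highly expressed in sample 1 only
sample1 = {"geneA": 10, "geneB": 20, "geneC": 1000}
sample2 = {"geneA": 20, "geneB": 40, "geneC": 500}

print(total_count_scale(sample1, sample2))   # dominated by geneC
print(median_ratio_scale(sample1, sample2))  # reflects the typical gene
```

In this made-up data the single highly expressed gene pulls the Total Count factor well above 1, while the median ratio stays at the value shared by the other genes.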

All of these normalization methods (and more) are described and compared by Dillies et al
2012, who recommend using Median of Gene Expression Ratios. One reason for this is that a few
highly expressed genes can greatly affect the total number of transcripts produced, so this can
distort the fraction of the total reads that contribute to genes with lower expression. The choice
of normalization method determines the Differential Expression Ratio for each gene.

P-Value Calculation

In addition to calculating the differential expression ratio, it is useful to know whether or not
that differential expression is statistically significant. This is represented by a p-value. A num-
ber of advanced methods have been published for the calculation of p-values based on a range
of assumptions. Many of these are compared by Soneson & Delorenzi 2013, who conclude that
no single method is optimal under all circumstances and that very small sample sizes impose
problems for all evaluated methods.

In this basic differential expression plugin in Geneious we have implemented a simple statisti-
cal test based on the assumption that the gene which each observed transcript came from is an
independent event.

For a given gene, the probability that a randomly selected transcript would come from that gene
is calculated as number of transcripts mapped to that gene/total number of transcripts from
that sample. This probability is normalized, the mean probability between the two samples cal-
culated, and this mean un-normalized for each sample. This produces an expected probability
that a randomly selected transcript from this sample comes from that gene, assuming that this
gene is not differentially expressed.

The Binomial Distribution is used to calculate the probability that an observed count at least as
extreme as the observed one would be seen, assuming this non-differentially expressed mean
probability. The probabilities from each sample are multiplied together to form the p-value.
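
One possible reading of this procedure is sketched below. It rests on several assumptions that the manual does not spell out: normalization is represented by a per-sample factor applied to the probabilities (`f1` and `f2`, both hypothetical and equal to 1 when no rescaling is needed), and "at least as extreme" is taken as the one-sided tail on the side of the expected count where the observation falls. Treat it as an illustration of the idea, not as the Geneious implementation.

```python
from math import exp, lgamma, log

def binom_pmf(k, n, p):
    """Binomial probability of exactly k successes, computed in log space."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log(1 - p))

def tail(count, n, p):
    """Probability of a count at least as far from n * p, on the same side."""
    ks = range(count, n + 1) if count >= n * p else range(0, count + 1)
    return min(1.0, sum(binom_pmf(k, n, p) for k in ks))

def sketch_p_value(c1, n1, c2, n2, f1=1.0, f2=1.0):
    """c = transcripts mapped to the gene, n = total transcripts in the sample."""
    # Mean of the normalized per-sample probabilities
    mean = (c1 / n1 * f1 + c2 / n2 * f2) / 2
    # Un-normalize the mean for each sample and multiply the two tails
    return tail(c1, n1, mean / f1) * tail(c2, n2, mean / f2)

print(sketch_p_value(1, 10, 9, 10))  # strongly different counts
print(sketch_p_value(5, 10, 5, 10))  # identical counts
```

The summation over the tail is fine for small totals like these; for realistic transcript totals a statistics library would be a better choice.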

11.2.4 DESeq2 Method for Comparing Expression Levels

The DESeq2 method should be used when you have multiple replicates for each of your con-
ditions. At least two replicates are required. Geneious Prime compares expression levels using
the DESeq2 package within R (see the R project website), which is automatically downloaded
and installed the first time the DESeq2 method is run on Windows and macOS. On Linux
systems, you will need to install R manually; refer to section 11.2.8.

Fit Type

The Fit Type defines the model that will be used by DESeq2 to explain the observed dispersion
of read counts:

• Parametric: This is the default model. Use when decreasing gene-wise dispersion estimates
over the mean are observed. Fits a dispersion-mean relation using the following equation:

$$\text{dispersion} = \text{asymptoticDispersion} + \frac{\text{extraPoisson}}{\text{mean}} \tag{11.1}$$

• Local: Typically used if the parametric model fails, such as when there is an insufficient
number of data points. The locfit package is used to fit a local regression of log dispersions
over log base mean:

$$\log(\text{dispersions}) \sim \log(\text{basemean}) \tag{11.2}$$

The points used in this model are weighted by the normalized mean count.

• Mean: Use when the gene-wise dispersion estimates show no apparent trend over the mean.
This model uses the mean of the gene-wise dispersion estimates.

Figure 11.7: DESeq2 method for comparing expression levels between 2 sample conditions with
replicates

For more information, refer to the documentation for the estimateDispersions function in the
DESeq2 package.
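
As a rough illustration of what the Parametric and Mean fit types describe, the sketch below evaluates the parametric relation in the form used by DESeq2 (dispersion = asymptoticDispersion + extraPoisson / mean) and the simple mean of gene-wise estimates. The coefficient values and gene-wise estimates are invented for the example; in practice DESeq2 estimates them from your data.

```python
from statistics import mean

def parametric_dispersion(mean_count, asymptotic_dispersion, extra_poisson):
    """Fitted dispersion under the parametric dispersion-mean relation."""
    return asymptotic_dispersion + extra_poisson / mean_count

def mean_fit_dispersion(gene_wise_dispersions):
    """The 'Mean' fit type: one value, the mean of the gene-wise estimates."""
    return mean(gene_wise_dispersions)

# Invented coefficients: dispersion falls towards 0.1 as the mean count grows
for mean_count in (1, 10, 100, 1000):
    print(mean_count, parametric_dispersion(mean_count, 0.1, 2.0))

print(mean_fit_dispersion([0.2, 0.4, 0.3]))
```

The printed values show why the parametric model suits data where dispersion decreases with the mean: the extra-Poisson term dominates at low counts and vanishes at high counts.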

Assigning Samples to Conditions

Samples can either be assigned to conditions manually, by selecting either A or B in the Assign
Conditions table for each expression track (sample), or automatically based on a field on the
expression tracks.

You can select any properties on Expression Level tracks with exactly two values for Geneious
to use for automatically assigning conditions. To do this, you first need to set up a property
for Geneious to use. This can be done by adding a Meta-data field to the sample reads before
assembly or to the sample contigs prior to running Calculate Expression Levels (Geneious will
then propagate the information to create a property on the Expression Level track), or by adding
a property directly to the Expression Level tracks.

To add a Meta-data field to your samples, go to the Info tab and Create a new Meta-data type
in Edit Meta-Data Types, then add a field of this type with Add Meta-Data. For example, you
could add a Meta-data field of type ‘Cancer’ with values ‘Yes’ and ‘No’. Meta-data fields must
have exactly two values to be used for automatically assigning conditions.

To add a property directly to the expression level tracks, click on the arrow to the left of the
track name, choose Edit Properties and then Add a new property.

11.2.5 Results

Results are displayed as a Diff Expression annotation track on the reference sequence, colored
by the property Differential Expression Confidence (see figure 11.8). To change the annotation
coloring, click the arrow to the left of the track name and choose Color by / Heatmap. For
example, you may prefer to color by Differential Expression Log2 Ratio once you have filtered at
an appropriate confidence level (e.g. using an abs > 6 filter).

Figure 11.8: Results from comparing expression levels with DESeq2, colored by Differential
Expression Confidence

The results may be exported as a table in *.csv format from the Annotations tab above the
sequence viewer. Click the Track button and choose the Expression track to display. Then click
the Columns button and add the results columns you wish to display. Use the Export table
button to export the displayed values.

For both the DESeq2 and Geneious methods, each Expression Difference annotation produced
will have the following properties:

• Differential Expression p-value: A p-value indicating the confidence that this gene is
differentially expressed.

• Differential Expression adjusted p-value (DESeq2 results only): The p-value adjusted
for multiple tests.

• Differential Expression Absolute Confidence: The negative base 10 logarithm of the
p-value (adjusted p-value for DESeq2 results). This field is useful for filtering on. For
example in the Filter box on the results, you could type “absolute confidence>6” (without
quotes) to only show genes that we are fairly confident have differential expression. Since
filtering works on partial property names, in most cases just a shorter filter like abs > 6 is
sufficient unless your genes have existing numeric properties containing the text “abs”.

• Differential Expression Confidence: The same as Absolute Confidence, but adjusted to
be negative for genes that are under expressed in sample 2 compared to sample 1, or
positive for over expressed genes. The results are colored based on this property, from
blue for under expressed genes, through to white for genes that are not differentially
expressed, through to red for genes that are over expressed. Confidence coloring reaches
maximum intensity at ± 8.

• Differential Expression Ratio: (Geneious method only) The ratio of the normalized values
between the two samples, but ratios less than one are replaced with $-\text{value}^{-1}$ (that is,
$-1/\text{value}$). This results in a value greater than 1 for over expressed genes and less than −1
for under expressed genes. When one sample has no or very low expression, the ratio is
capped at ± 1,000,000. Ratio coloring reaches maximum intensity at ± 5.

• Differential Expression Log2 Ratio: The base 2 log of the ratio of the normalized values
between the two samples. When one sample has no or very low expression, the log2 ratio
is capped at ± 1,000,000.
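
The relationships between these properties can be sketched as follows. The function names are illustrative, and the sketch assumes the ratio is sample 2 divided by sample 1 (so that a ratio below one means under expression in sample 2) and that it is greater than zero. A p-value of exactly zero is not handled here.

```python
from math import log2, log10

RATIO_CAP = 1_000_000

def absolute_confidence(p_value):
    """Negative base 10 logarithm of the (adjusted) p-value."""
    return -log10(p_value)

def confidence(p_value, ratio):
    """Absolute confidence, made negative for under expressed genes."""
    sign = -1.0 if ratio < 1 else 1.0
    return sign * absolute_confidence(p_value)

def differential_expression_ratio(ratio):
    """Ratios below one are shown as -1/ratio, then capped."""
    value = ratio if ratio >= 1 else -1.0 / ratio
    return max(-RATIO_CAP, min(RATIO_CAP, value))

def differential_expression_log2_ratio(ratio):
    """Base 2 log of the ratio, capped in the same way."""
    return max(-RATIO_CAP, min(RATIO_CAP, log2(ratio)))

# A gene expressed at half the level in sample 2, with p = 1e-6
print(differential_expression_ratio(0.5))
print(differential_expression_log2_ratio(0.5))
print(confidence(1e-6, 0.5))
```

A gene with these values would pass an abs > 5 filter, because its absolute confidence is 6, and would be colored towards blue, because its confidence is negative.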

11.2.6 Principal Component Analysis for DESeq2 results

Principal component analysis (PCA) can be used to visualize variation between expression
analysis samples. This method is especially useful for quality control, for example in identify-
ing problems with your experimental design, mislabeled samples, or other problems.

When you perform a PCA, the normalized differences in expression patterns are used to com-
pute a distance matrix. The X- and Y-axes in a PCA plot correspond to a mathematical trans-
formation of these distances so that data can be displayed in two dimensions. This can make
interpreting PCA plots challenging, as their meaning is fairly abstract from a biological per-
spective.

Creating a PCA Plot

A PCA plot will automatically be generated when you compare expression levels using DESeq2.
This plot will be available to view in the PCA Plot viewer (Figure 11.9) once you have saved the
newly-generated differential expression sequence track to your document. If you have multiple
differential expression tracks from running DESeq2 more than once, you will have the option
to select which track you’d like to show in the PCA Plot viewer.

Interpreting PCA Plots

PCA is used primarily as a quality control or exploratory tool. In general, if your
samples were produced under two experimental conditions (e.g. treated vs. untreated), the
PCA plot should normally show that a) samples subjected to the same condition cluster to-
gether, and b) the clusters should be reasonably well-separated along the X-axis (“PC1”: the
first principal component).

Figure 11.9: PCA plot viewer for RNA-Seq data from Vibrio fischeri ES114 collected under two
conditions with three samples per condition (Thompson et al, Env Microbiol 2017). This plot
shows that samples cluster with other samples grown in the same type of medium, and the
first component explains most of the variance.

An Example of Using PCA As An Exploratory Tool

The plot in Figure 11.10 shows data from a fictitious bacterial strain that could potentially
be useful for bioremediation, cultured in the presence or absence of a halogenated industrial
solvent (“Halogen”). The halogen is toxic to the two mutant strains, but not to the wild type.
In this case, samples were compared according to the presence (blue) or absence (orange) of
the halogen in the culture medium. The mutants contain a deletion in a transcriptional element
thought to affect metabolism of the halogen, so the expected result is that expression levels in
mutants would be similar to those of wild-type samples grown in the absence of the halogen.

[Plot: PCA for Diff Expression: Halogen, no vs. yes. X-axis: PC1 (64.06% variance); Y-axis: PC2 (31.26% variance). Labeled points: wt samples 1 and 2, mutant A and mutant B, each with (+Halogen) and without (-Halogen) the solvent.]

Figure 11.10: PCA plot of expression data from wild-type and mutant strains grown in the
presence and absence of a halogenated solvent.

On inspection of the PCA plot in Figure 11.10, two things are apparent:

1. The variance explained by the first principal component (X-axis) is not consistent with the
expected result of this experiment, which is a strong indication that further investigation
is required. Some possible explanations for this result are:

• Perhaps two of the wild type samples were mislabeled and the sample labeled “wt
-Halogen sample 1” was actually grown in the presence of the halogen and “wt
+Halogen sample 2” without the halogen.
• There could be some other explanation that should be investigated (e.g. contami-
nated medium, a malfunctioning incubator, etc.).
• These samples might be outliers that you choose to discard.

2. The second principal component (Y-axis) explains quite a lot of variance, particularly the
variance between the two mutant strains. While this result might be expected, it could
also be interesting!

11.2.7 Volcano Plots

The Volcano Plot allows you to see the most highly differentially expressed loci. This is au-
tomatically generated when you compare expression levels using either Geneious or DESeq2.
This plot will be available to view in the Volcano Plot viewer (Figure 11.11) once you have
saved the newly-generated differential expression sequence track to your document. If you
have multiple differential expression tracks from running more than one analysis, you will
have the option to select which track you’d like to show in the Volcano Plot viewer.

The Volcano Plot shows the fold change (log2 Ratio) plotted against the Absolute Confidence
(-log10 adjusted p value). Each dot on the plot is one gene, and the “outliers” on this graph
represent the most highly differentially expressed genes. Red asterisks are genes where the
ratio is more than 10,000 or for which the p value is so tiny it is recorded as zero. As it’s not
possible to take the log of 0, the value is clipped so it is displayed at the higher ends but with
an asterisk. To see the gene represented by each dot, mouse over the dot. Click the dot to select
it and display the gene name on the graph. The dots on the Volcano plot are linked with both
the annotations table and the annotations in the sequence viewer. Thus, if a locus is selected on
the Volcano Plot, it will also be selected in the annotations table and the sequence viewer.

The Volcano Plot can be configured by clicking the Advanced button. Here you will find the
following options:

• Show zero line: Shows a vertical line at log2 ratio = 0 (fold change = 1)

• Show vertical lines at expression ratio: Shows vertical lines at ± log2 ratio values.

• Show horizontal line at p value: Shows a horizontal line at the -log10 p value. “Signifi-
cant” values will be above this line.

• Highlight values outside threshold lines: Highlights outliers defined by the specified
p value and expression ratio thresholds. Loci with p values less than the threshold and
log2 ratios outside the inter-quantile interval will be indicated by this color.

• Selected color: Dot color for selected loci.
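
The plot coordinates and the highlighting rule can be expressed as a short sketch. The function names and default thresholds are hypothetical and are not the defaults used by Geneious. The sketch assumes that a point is highlighted when its p-value is below the threshold and its log2 ratio lies outside the two vertical threshold lines.

```python
from math import log10

def volcano_point(log2_ratio, p_value):
    """(x, y) position of a gene on the plot: fold change against -log10 p."""
    return log2_ratio, -log10(p_value)

def is_highlighted(log2_ratio, p_value, p_threshold=0.05, log2_ratio_threshold=1.0):
    """True when the gene lies above the horizontal line and outside the vertical lines."""
    return p_value < p_threshold and abs(log2_ratio) > log2_ratio_threshold

# Made-up genes: (log2 ratio, p-value)
genes = {"geneA": (2.5, 1e-4), "geneB": (0.5, 1e-4), "geneC": (-3.0, 0.5)}
for name, (ratio, p) in genes.items():
    print(name, volcano_point(ratio, p), is_highlighted(ratio, p))
```

Only geneA is highlighted here: geneB has too small a fold change and geneC is not significant.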

11.2.8 Comparing Expression Levels Using the DESeq2 Method on Linux Systems

Before you can analyze RNA-Seq data using DESeq2 within Geneious on Linux, you will first
need to install R and the DESeq2 package, following the instructions below. This is only
necessary for Linux systems, as Geneious will automatically install R on Windows and macOS
systems the first time DESeq2 is run.

Figure 11.11: Volcano Plot viewer showing the Advanced options along with some selected
outliers.

Once R has been installed, you will be able to run the DESeq2 method to Compare Expression
Levels within Geneious like any other Geneious operation. You may need to specify the R
executable location in the DESeq2 options, for example if R is not on your PATH.
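
If you are unsure whether R is on your PATH, a quick check such as the following can help. This is a generic sketch using the Python standard library and is not part of Geneious; the function name is illustrative.

```python
import shutil

def find_r_executable(name="R"):
    """Return the full path of the R executable if it is on PATH, else None."""
    return shutil.which(name)

r_path = find_r_executable()
if r_path is None:
    print("R was not found on PATH; specify the R executable location in the DESeq2 options.")
else:
    print(f"R found at {r_path}")
```

If a path is printed, that is the location to enter in the DESeq2 options should Geneious fail to find R on its own.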

Note that the installation instructions for R on Ubuntu, CentOS or Red Hat Enterprise Linux
require sudo or root access. If you do not have root access, you can install R from source in
your home directory.

Installing R

Install R version 3.3.0 or greater. For more information, or more detailed instructions for select
Linux distributions, refer to the R Project website.

R installation on Ubuntu:

1. On the terminal, enter the following commands to install R and its dependencies:

$ sudo apt-get install libcurl4-openssl-dev libxml2-dev


$ sudo apt-get install r-base r-base-dev

2. You can now proceed with DESeq2 installation

R installation on CentOS or Red Hat Enterprise Linux:

1. On the terminal, enter the following commands to install R's dependencies:


$ sudo yum install curl
$ sudo yum install libcurl libcurl-devel
$ sudo yum install libxml2 libxml2-devel

2. Now install R with the following command:


$ sudo yum install R

3. You can now proceed with DESeq2 installation

R installation from source in your home directory


You should only install R from source in your home directory if you do not have root access.

1. Download the R source archive from the R project website

2. On the terminal, go to the directory where you downloaded the source

3. Extract the archive:


$ tar -xvzf R-3.X.Y.tar.gz

(Replace 3.X.Y with the version you are installing)

4. Enter the source directory:


$ cd R-3.X.Y

5. Run the following commands to build and install R:


$ ./configure --prefix=$HOME/R
$ make && make install

Note that if the first command fails because of missing libraries, it may still be possible to install
these dependencies without root access.

This will install R in your home directory under $HOME/R/bin.


You can add this directory to your PATH variable or run R from the terminal with the command
~/R/bin/R.

You can now proceed with DESeq2 installation.

Installing DESeq2

Once you have installed R, you must install Bioconductor, then use Bioconductor to install
DESeq2. Only DESeq2 version 1.14.1 is currently supported.

1. From the terminal, launch R normally using the command R

2. Install Bioconductor:
> source("https://bioconductor.org/biocLite.R")

3. Install DESeq2:
> biocLite("DESeq2")

Note that you will be asked to install any dependent packages that are not already in-
stalled. This step can take some time: wait for it to finish before proceeding

4. Quit R by entering CTRL-D. You will be asked whether you’d like to save your workspace
image; you may answer ‘n’ (answer ‘y’ if you’d like to use R from the terminal in the
future and wish to save the history of the commands you used).
Chapter 12

Building Phylogenetic Trees

Geneious Prime provides inbuilt algorithms for Neighbour-joining (Saitou & Nei 1987) and
UPGMA (Michener & Sokal 1957) methods of tree reconstruction, which are suitable for prelimi-
nary investigation of relationships between newly acquired sequences. For more sophisticated
methods of phylogenetic reconstruction such as Maximum Likelihood and Bayesian MCMC,
external plugins for specialist software are available (see section 12.3.5 for a full list). These can
be downloaded from the plugins page on our website or within Geneious by going to Plugins
under the Tools menu.

12.1 Phylogenetic tree representation

A phylogenetic tree describes the evolutionary relationships amongst a set of sequences. Several
terms are commonly used to describe trees; these are depicted in Figure 12.1 and described below.

Branch length. A measure of the amount of divergence between two nodes in the tree. Branch
lengths are usually expressed in units of substitutions per site of the sequence alignment.

Nodes or internal nodes of a tree represent the inferred common ancestors of the sequences
that are grouped under them.

Tips or leaves of a tree represent the sequences used to construct the tree.

Taxonomic units. These can be species, genes or individuals associated with the tips of the
tree.

A phylogenetic tree can be rooted or unrooted. A rooted tree has a root, which represents the
common ancestor of all the taxonomic units of the tree. An unrooted tree is one that does not
show the position of the root. An unrooted tree can be rooted by adding an outgroup (a species
that is distantly related to all the taxonomic units in the tree).


Figure 12.1: Phylogenetic tree terms

For information on viewing and formatting trees in Geneious, see Section 12.5.

12.2 Tree building in Geneious Prime

To build a tree, select an alignment or a set of related sequences (all DNA or all protein) in the
Document table and click the Tree icon or choose this option from the Tools menu.

If you are building a simple tree (Neighbour joining or UPGMA) using the Geneious tree builder,
the tree can be built directly from a set of unaligned sequences, as the alignment will be built
as part of the tree-building process. For more advanced trees, or if you wish to bootstrap your
trees, you must build an alignment first and use that as input for your tree. You can also select
an existing tree document (which contains an alignment) and build another tree from that, as
the alignment will simply be extracted from the existing tree and used to build the new tree.

The following options are available in the tree-building dialog for the Geneious tree builder.
For more information on these options see sections 12.3 to 12.4.

• Exclude masked sites. Excludes sites containing Masked annotations from the analysis
without permanently removing them from your alignment. See section 12.2.1.

• Genetic distance model. This lets the user choose the kind of substitution model used
to estimate branch lengths. If you are building a tree from DNA sequences you have the
choices “Jukes Cantor”, “HKY” and “Tamura Nei”. If you are building a tree from amino
acid sequences you only have the option of “Jukes Cantor” distance correction.

• Tree building method. There are two methods under this option – Neighbor joining and
UPGMA.

Figure 12.2: Tree building options in Geneious Prime


• Outgroup. Choose which sample to use as an outgroup, or leave it as “no outgroup” to
build an unrooted tree.

• Resample tree. Check this to turn on resampling options (bootstrapping or jackknifing)
to generate support values for your tree. See section 12.4 for further information.

• Resampling method. Either bootstrapping or jackknifing can be performed when resampling
columns of the sequence alignment.

• Number of samples. The number of alignments and trees to generate while resampling.
A value of at least 100 is recommended.

• Create Consensus Tree. Choose this to create a consensus tree from the resampled data.

• Sort Topologies. Produce trees which summarise the topologies resulting from resam-
pling.

• Support threshold. This is used to decide which monophyletic clades to include in the
consensus tree, after comparing all the trees in the original set. For example setting this
on 50

• Topology Threshold. The percentage of topologies in the original trees which must be
represented by the summarizing topologies.

• Save raw trees. If this is turned on then all of the trees created during resampling will
be saved in the resulting tree document. The number of raw trees saved will therefore be
equal to the number of samples.

12.2.1 Using alignment masks

To exclude certain sites or regions of your alignment before tree building, you can apply Masked-
type annotations to an alignment’s consensus sequence. Masked-type annotations can be ap-
plied manually, or using the Mask Alignment option under the Tools menu (see section 9.4).
This tool enables you to either annotate masked sites, or make a copy of the alignment with
masked sites removed. If you choose to annotate masked sites, these can be removed for tree
building by checking Exclude masked sites in the Tree building options. With this option, any
sites covered by Masked-type annotations will not be used when the tree is inferred but will be
retained on the alignment.

Masked-type annotations can be either directly on the consensus sequence of the alignment, or
on one or more tracks. If you have multiple tracks containing Masked annotations, you need
to select the track you want to use in the Exclude masked sites option. In this way, you can
use multiple masks to compare trees inferred from different subsets of your alignments (e.g.,
excluding different codon positions, or excluding fast-evolving sites). Only the track used to
exclude masked sites when inferring the tree will be included in the Alignment View once the
tree is built (see section 12.5).

12.3 Tree building methods and models

12.3.1 Neighbor-joining

In this method, neighbors are defined as a pair of leaves with one node connecting them. The
principle of this method is to find pairs of leaves that minimize the total branch length at each
stage of clustering, starting with a star-like tree. The branch lengths and an unrooted tree
topology can quickly be obtained by using this method without assuming a molecular clock
(see Saitou & Nei 1987).
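
As a rough illustration of the pair-selection step, the sketch below computes the commonly used Q criterion for neighbor-joining from a pairwise distance matrix and returns the pair of leaves that would be joined first. The function name and the example matrix are illustrative assumptions; this is not the code used by the Geneious tree builder.

```python
def nj_pick_pair(dist):
    """Return the pair (i, j) that minimises the neighbor-joining Q criterion.

    dist is a symmetric matrix (list of lists) of pairwise distances.
    Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)
    """
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    best_pair, best_q = None, float("inf")
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if q < best_q:
                best_pair, best_q = (i, j), q
    return best_pair


# Hypothetical distances between five taxa (indices 0..4).
example = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(nj_pick_pair(example))  # (0, 1)
```

Note that the pair joined first is not necessarily the pair with the smallest raw distance (here taxa 3 and 4), because the criterion also accounts for how far each leaf is from all the others.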

12.3.2 UPGMA

This clustering method is based on the assumption of a molecular clock. It is appropriate only
for a quick and dirty analysis when a rooted tree is needed and the rate of evolution does not
vary much across the branches of the tree (see Michener & Sokal 1957).

12.3.3 Distance models or molecular evolution models for DNA sequences

The evolutionary distance between two DNA sequences can be determined under the assump-
tion of a particular model of nucleotide substitution. The parameters of the substitution model
define a rate matrix that can be used to calculate the probability of evolving from one base to
another in a given period of time. This section briefly discusses some of the substitution models
available for the Geneious tree builder. Most models are variations of two sets of parameters –
the equilibrium frequencies and relative substitution rates.

Equilibrium frequencies refer to the background probability of each of the four bases A, C,
G, T in the DNA sequences. This is represented as a vector of four probabilities
$\pi_A, \pi_C, \pi_G, \pi_T$ that sum to 1.

Relative substitution rates define the rate at which each of the transitions (A ↔ G, C ↔ T) and
transversions (A ↔ C, A ↔ T, C ↔ G, G ↔ T) occur in an evolving sequence. It is represented
as a 4x4 matrix with rates for substitutions from every base to every other base.

Additionally, gaps are not penalized when using the Geneious Tree Builder. Sites with gaps
are ignored when calculating pairwise distances (i.e., gaps are not treated as a fifth nucleotide
state). Similarly, sites with ambiguous nucleotides are always ignored in distance calculations.
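
The handling of gaps and ambiguous sites described above can be sketched as follows. This is a minimal illustration, assuming two aligned DNA sequences of equal length given as plain strings; the function name is hypothetical and the code is not taken from Geneious.

```python
UNAMBIGUOUS = set("ACGT")


def p_distance(seq1, seq2):
    """Proportion of mismatched sites, counting only sites where both
    sequences have an unambiguous base (A, C, G or T).

    Sites with a gap or an ambiguity code in either sequence are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a in UNAMBIGUOUS and b in UNAMBIGUOUS:
            compared += 1
            if a != b:
                mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatches / compared


# Four comparable sites (the gap and the N are skipped), one mismatch.
print(p_distance("ACGT-N", "ACTTAA"))  # 0.25
```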

Jukes-Cantor

This is the simplest substitution model. It assumes that all bases have the same equilibrium
base frequency, i.e., each nucleotide base occurs with a frequency of 0.25 in DNA sequences.

This model also assumes that all nucleotide substitutions occur at equal rates (see Jukes and
Cantor 1969).

If the proportion of non-gap, non-ambiguous sites that are mismatched between the sequences
is given as p, the formula for computing the distance between the sequences is:

$$d = -\frac{3}{4}\,\ln\left(1 - \frac{4}{3}\,p\right)$$

Under Jukes-Cantor, the number of substitutions is assumed to be Poisson distributed with a
rate of $\frac{4}{3}u$, i.e., the probability of no substitutions at a given site over a branch
of length $ut$ is $e^{-\frac{4}{3}ut}$.
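
For readers who want to check a distance by hand, the Jukes-Cantor correction for DNA can be evaluated with a few lines of Python. This is a sketch of the formula given above, using the natural logarithm; the function name is an assumption made for illustration.

```python
import math


def jukes_cantor_dna_distance(p):
    """Jukes-Cantor distance for DNA from the proportion p of mismatched sites.

    The correction is undefined for p >= 0.75, where the sequences are as
    different as random sequences would be.
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - (4.0 / 3.0) * p)


# 25% observed mismatches correspond to about 0.304 substitutions per site.
print(round(jukes_cantor_dna_distance(0.25), 4))
```

The corrected distance is always at least as large as p, because some sites will have changed more than once.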

HKY

The HKY model assumes every base has a different equilibrium base frequency, and also assumes
that transitions evolve at a different rate from transversions (see Hasegawa et al. 1985).

Tamura-Nei

This model also assumes different equilibrium base frequencies. In addition to distinguishing
between transitions and transversions, it also allows the two types of transitions (A ↔ G and
C ↔ T) to have different rates (see Tamura & Nei 1993).

12.3.4 Distance models or molecular evolution models for Amino Acid sequences

The evolutionary distance between two amino acid sequences can be determined under the
assumptions of a particular model of amino acid substitution. The substitution model defines
a rate matrix that can be used to calculate the probability of evolving from one amino acid to
another over a given time.

As with nucleotides, gaps are not penalized when using the Geneious Tree Builder. Sites with
gaps are ignored when calculating pairwise distances (i.e., gaps are not treated as a 21st amino
acid state).

Jukes-Cantor

This is the simplest substitution model. It assumes that all amino acids have the same
equilibrium frequency, i.e., each amino acid occurs with a frequency of 0.05 in protein sequences.
This model also assumes that all amino acid substitutions occur at equal rates.

If the proportion of non-gap, non-ambiguous sites that are mismatched between the sequences
is given as p, the formula for computing the distance between the sequences is:

$$d = -\frac{19}{20}\,\ln\left(1 - \frac{20}{19}\,p\right)$$

Under Jukes-Cantor the number of substitutions is assumed to be Poisson distributed with a
rate of $\frac{20}{19}u$, i.e., the probability of no substitutions at a given site over a branch
of length $ut$ is $e^{-\frac{20}{19}ut}$.

Technically, Jukes-Cantor for amino acid sequences is the Neyman model (Neyman 1971) with
20 states.
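
The nucleotide and amino acid formulas are two instances of the same expression for an alphabet of equally frequent states. Writing the number of states as $k$ (a symbol introduced here for illustration only, not used elsewhere in this manual), the standard form is:

```latex
d = -\frac{k-1}{k}\,\ln\!\left(1 - \frac{k}{k-1}\,p\right)
```

Setting $k = 4$ gives the DNA formula in section 12.3.3, and $k = 20$ gives the amino acid formula above.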

12.3.5 Advanced Tree Building methods

Other plugins are available for running maximum likelihood or Bayesian phylogenetic anal-
yses in Geneious, including MrBayes, PhyML, RAxML, FastTree, and PAUP*. These can be
downloaded from the plugins page on our website or within Geneious by going to Plugins
under the Tools menu. For more information on running these programs, please consult the
user manual for the source software.

12.4 Resampling – Bootstrapping and jackknifing

Resampling is a statistical technique where a procedure (such as phylogenetic tree building) is
repeated on a series of datasets generated by sampling from one original dataset. The results
of analyzing the sampled datasets are then combined to generate summary information about
the original dataset.

In the context of tree building, resampling involves generating a series of sequence alignments
by sampling columns from the original sequence alignment. Each of these alignments (known
as pseudoreplicates) is then used to build an individual phylogenetic tree. A consensus tree can
then be constructed by combining information from the set of generated trees or the topologies
that occur can be sorted by their frequency (see below).

Bootstrapping is the statistical method of resampling with replacement. To apply bootstrapping
in the context of tree building, each pseudo-replicate is constructed by randomly sampling
columns of the original alignment with replacement until an alignment of the same size is
obtained (see Felsenstein 1985).
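
A minimal sketch of how one bootstrap pseudo-replicate could be generated is shown below. It assumes the alignment is held as a list of equal-length strings; the function name and the `rng` parameter are illustrative, and this is not the Geneious implementation.

```python
import random


def bootstrap_alignment(alignment, rng=None):
    """Build one bootstrap pseudo-replicate of an alignment.

    Column indices are drawn with replacement until the replicate has the
    same number of columns as the original, so some columns appear several
    times and others not at all.
    """
    rng = rng or random.Random()
    length = len(alignment[0])
    columns = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[c] for c in columns) for seq in alignment]


alignment = ["ACGT", "ACGA", "TCGA"]
print(bootstrap_alignment(alignment, random.Random(1)))
```

The same column indices are applied to every sequence, so each column of the replicate is an intact column of the original alignment.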

Jackknifing is a statistical method of numerical resampling based on deleting a portion of the
original observations for each pseudo-replicate. A 50% jackknife randomly deletes half of the
columns from the alignment to create each pseudo-replicate.
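
The jackknife differs from the bootstrap only in how columns are chosen. The sketch below, again assuming an alignment stored as a list of equal-length strings and using hypothetical names, deletes a fixed fraction of columns without replacement and keeps the rest in their original order.

```python
import random


def jackknife_alignment(alignment, fraction_deleted=0.5, rng=None):
    """Build one jackknife pseudo-replicate by deleting a fraction of columns.

    With fraction_deleted=0.5 this is a 50% jackknife: half of the columns
    (rounded down) are removed at random.
    """
    rng = rng or random.Random()
    length = len(alignment[0])
    n_delete = int(length * fraction_deleted)
    kept = sorted(rng.sample(range(length), length - n_delete))
    return ["".join(seq[c] for c in kept) for seq in alignment]


alignment = ["ACGTAC", "ACGAAC"]
print(jackknife_alignment(alignment, 0.5, random.Random(0)))
```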

12.4.1 Consensus trees

A consensus tree provides an estimate for the level of support for each clade in the final tree.
It is built by combining clades which occurred in at least a certain percentage of the resampled
trees. This percentage is called the consensus support threshold. A 100% support threshold
results in a Strict consensus tree which is a tree where the included clades are those that are
present in all the trees of the original set. A 50% threshold results in a Majority rule consensus
tree that includes only those clades that are present in the majority of the trees in the original
set. A threshold less than 50% gives rise to a Greedy consensus tree. In constructing a Greedy
consensus tree clades are first ordered according to the number of times they appear (i.e. the
amount of support they have), then the consensus tree is constructed progressively to include
all those clades whose support is above the threshold and that are compatible with the tree
constructed so far.
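
The clade selection described above can be sketched in a few lines. In this illustration each tree is reduced to the set of clades it contains, with a clade represented as a frozenset of tip names; the representation, the function name and the tie-breaking order are assumptions made for the example, not a description of the Geneious implementation.

```python
from collections import Counter


def consensus_clades(trees, threshold):
    """Select clades for a consensus tree.

    trees: list of sets of clades, each clade a frozenset of tip names.
    threshold: minimum support as a fraction (1.0 = strict, 0.5 = majority
    rule, below 0.5 = greedy).
    Returns a list of (clade, support) in order of decreasing support.
    """
    counts = Counter(clade for tree in trees for clade in tree)
    n_trees = len(trees)
    # Most frequent clades first; ties are broken by tip names for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
    chosen = []
    for clade, count in ranked:
        support = count / n_trees
        if support < threshold:
            break
        # Two clades are compatible if one contains the other or they are disjoint.
        compatible = all(
            clade <= other or other <= clade or not (clade & other)
            for other, _ in chosen
        )
        if compatible:
            chosen.append((clade, support))
    return chosen


trees = [
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AC"), frozenset("ABC")},
]
for clade, support in consensus_clades(trees, 0.5):
    print(sorted(clade), round(support, 2))
```

With a threshold of 0.5 the clade {A, B} (present in two of three trees) is kept, while {A, C} is not; with a threshold of 1.0 only {A, B, C} remains.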

The length of the consensus tree branches is computed from the average over all trees contain-
ing the clade. The lengths of tip branches are computed by averaging over all trees.

Note: The above definitions apply to rooted trees. The same principles can be applied to un-
rooted trees by replacing “clades” with “splits”. Each branch (edge) in an unrooted tree corre-
sponds to a different split of the taxa that label the leaves of this tree.

12.4.2 Creating a consensus tree of existing trees

Select a tree set document (e.g. a set of bootstrap replicate trees) and choose Tree then Con-
sensus Tree Builder at the top of the setup dialog. Check Create Consensus Tree and choose
the Support Threshold % you wish to use. This will create a consensus tree using the trees
already in the document (no resampling will be performed) and it will either be added to the
tree document or saved as a separate tree document.

12.4.3 Sort topologies

To produce one or more trees sorted by topology, summarizing the results of resampling,
check Sort topologies under the Consensus Tree Builder options. The frequency of each topology
in the set of original trees is calculated and the topologies are sorted by their frequency. A
number of these topologies, based on the topology threshold, will be output as summary trees.
The summary trees have branch lengths that are the average of the lengths of the same branch
from trees with the same topology.

The topology threshold determines what percentage of the original tree topologies must be
represented by the summarizing topologies. The most common topology will always be output
as the first summary tree. If the frequency (%) of this does not meet the threshold then the next
most frequent topology will be added, and so on until the total frequency of the topologies
reaches the threshold value.

A topology threshold of 0 will result in only the most common topology being output, a thresh-
old of 100 will result in all topologies being output.
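
The effect of the topology threshold can be illustrated with a short sketch. Here each resampled tree is represented only by a label for its topology; the labels and the function name are hypothetical, and the code illustrates the rule described above rather than the Geneious implementation.

```python
from collections import Counter


def summary_topologies(topologies, threshold_percent):
    """Choose which topologies to output as summary trees.

    topologies: one topology label per resampled tree.
    threshold_percent: topology threshold from 0 to 100.
    Topologies are added in order of decreasing frequency until their
    combined frequency reaches the threshold. The most common topology is
    always included.
    """
    counts = Counter(topologies)
    total = len(topologies)
    selected = []
    cumulative = 0
    for topology, count in counts.most_common():
        selected.append((topology, 100.0 * count / total))
        cumulative += count
        if cumulative * 100 >= threshold_percent * total:
            break
    return selected


# Ten resampled trees: six with topology T1, three with T2, one with T3.
trees = ["T1"] * 6 + ["T2"] * 3 + ["T3"]
print(summary_topologies(trees, 0))    # only T1
print(summary_topologies(trees, 80))   # T1 and T2 (60% + 30%)
print(summary_topologies(trees, 100))  # all three topologies
```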

12.5 Viewing and formatting trees

Once the tree is built it will appear in the Document Viewer window (Figure 12.3). When
viewing a tree a number of other view tabs may be available depending on the information at
hand. The Alignment View tab will be visible if the tree was built from a sequence alignment
using Geneious. The Text View shows the tree in text format (Newick).

Figure 12.3: A view of a phylogenetic tree in Geneious Prime

The tabs to the right of the tree viewer contain options for controlling the look of the tree, and
the information displayed on it. The toolbar above the tree provides additional formatting
options, and allows you to change the root, or rearrange the tree. The subsequent sections
provide more detail on these options.

Current Tree

If you are viewing a tree set, this option will be displayed. Select the tree you want to view
from the list.

General

The General tab has 3 buttons showing the different possible tree views: rooted, circular, and
unrooted. The Zoom slider controls the zoom level of the tree while the Expansion slider
expands the tree vertically (in the rooted layout).

Layout

This has different options depending on the layout that you select above:

• Root Length Sets the length of the visible root of the tree (Rooted and Circular trees)

• Curvature Adds curvature to the tree branches (Rooted view only)

• Align Taxon Labels Aligns the tip labels to make viewing a large tree easier (Rooted view
only)

• Root Angle Rotates the tree in the viewer (Circular and Unrooted views)

• Angle Range Compresses the branches into an arc (Circular view only)

Formatting

The following options are available for formatting branches:

• Flip the tree horizontally flips the tree so that branches go from right to left, rather than
left to right.

• Transform branches allows the branches to be equal like a cladogram, or proportional.
Leaving it unselected leaves the tree in its original form.

• Ordering orders branches in increasing or decreasing order of length, but within each
clade or cluster.

• Show root branch displays the position of the root of the tree (has no effect in the un-
rooted layout).

• Line weight can be increased or decreased to change the thickness of the lines represent-
ing the branches.

• Show selected subtree only shows only the part of the tree that is selected (or the entire
tree if there is no selection).

Show Tips, Node and Branch Labels

If you are unfamiliar with tree structures, please refer to Figure 12.1 for a diagram of tips, nodes
and labels.

Show tip labels: This refers to labels on the tips of the branches of the tree. Tip labels can be
any of the fields on your document, and can be set in the Display option. To select multiple
fields to display at the tips, hold down the command/control key while selecting.

Show node labels: This refers to labels on the internal nodes of the tree. If you are viewing a
consensus tree, you can display consensus support % here, or you can display the node heights.

Show branch labels: This refers to labels on the branches of the tree. You can display
substitutions per site (branch lengths) here, or for a consensus or bootstrapped tree you can
display consensus support % or bootstrap support %. Checking “show next to node” will move the
labels from the middle of the branch to adjacent to the node.

For node and branch labels, the font can be set using the Font Size options in the tab. The tree
viewer will shrink the font size of some labels if they cannot all fit in the available space. The
lower end of the range specifies the minimum size that the tree viewer is allowed to shrink
the label font to. The font sizes for the tip labels are set using the Font button in the toolbar
above the tree viewer. Significant Digits sets how many digits to display if the value the node
is displaying is numeric.

Automatically collapse subtrees

This option enables groups of similar nodes to be collapsed into a single node that represents
that subtree. The maximum distance within the subtrees is determined by the Subtree Distance
slider. Use this option to help navigate trees with many nodes and tips.

Collapsed nodes are labelled with the name of one of the tips, a count of how many tips the
subtree contains, and the maximum distance between the top of the subtree and any of the tips
within it. Double-clicking a node in a tree will force it to expand or contract. Automatically
Collapse Subtrees will not override this state. To reset the state of double-clicked nodes in
the tree, click Reset state of X nodes. X is the number of nodes with a manually expanded or
collapsed state.

Show scale bar

This displays a scale bar at the bottom of the tree view to indicate the length of the branches of
the tree. It has three options: Scale range, font size and line weight. Setting the scale range to
0.0 allows the scale bar to choose its own length, otherwise it will be the length that you specify.

Statistics

Displays information on the number of nodes and number of tips in the tree.

The Toolbar

The buttons on the toolbar along the top of the viewer allow you to edit the tree.

Click on a node in the tree viewer to select the node and its clade. Double-click the node to
collapse/un-collapse the clade in the view. Once you have selected a clade in the view, you can
edit it using the following toolbar buttons:

• Color Nodes: allows you to choose a new color for the selected clade.

• Font: allows you to change the font for the tip labels.

• Root: allows you to re-root the tree on the selected node.

• Swap Siblings: allows you to swap the position of the sibling clades of the selected node.

In version 9 onwards, the toolbar also contains a Search box that allows you to search for
particular tip labels. If a match is found, this tip is displayed on the tree and all other tips are
greyed out. If you wish to search by a field that is not currently displayed on a tip label, you
need to change the field under Show Tip Labels first.
Chapter 13

Primers

Geneious Prime provides several operations for designing and working with PCR Primers and
DNA or hybridisation probes. PCR Primers and DNA or hybridisation probes can be designed
for or tested on existing nucleotide sequences or alignments. A PCR product can be extracted
from a sequence that has been annotated with both a forward and a reverse primer. 5′ extensions
consisting of restriction enzymes or arbitrary sequence may also be added to primer
documents.

In addition, Geneious Prime can determine the primer characteristics for a primer-sized
sequence and convert it into a primer. Characteristics can also be determined for any number of
primer-sized selections made in the Sequence View.

To use any one of these primer operations simply select the appropriate nucleotide sequences
and either select Primers from the Tools menu or right-click (Ctrl+click on Mac OS X) on the
document(s) and select Primers. A popup menu will appear showing the operations valid for
your current selection.

13.1 Design New Primers

Geneious Prime uses Primer3 to design PCR primers. The Primer Design dialog allows you to
set options for where your PCR primers should sit, what size product to return and character-
istics such as primer length and melting temperature.

Three options are available for primer design: Design New, Design with Existing, or Design
for Sequencing.

Design New designs a pair of forward and reverse primers. You can specify if you wish to
design with or without a matching probe. Design with Existing can design a partner primer to
match an existing one, for example a reverse primer for a forward or vice versa. It also allows
you to design a probe to match a pair of primers. For both of these options, Generic (section
13.1.1) or Cloning (section 13.1.2) primers can be designed. Design for Sequencing (section
13.1.10) allows you to tile primers across a sequence in one or both directions to facilitate Sanger
sequencing of the selected molecule or region.

Figure 13.1: The primer design dialog

13.1.1 Generic primers

This option will design standard PCR primers according to the region input options you select.
These options allow you to specify what part of a sequence you wish to amplify. Most options
are optional and can be enabled or disabled with the associated check boxes beside them. If
you have selected a region in the sequence before opening the primer dialog then this region
will automatically be used for Included Region and Target Region. All of these are expressed
in base pairs from the beginning of the sequence and are as follows:

• Included Region: Specifies the region of the sequence within which primers are allowed
to fall. This must surround the target region and allows you to choose a small region on
either side of the target in which primers must lie.

• Target Region: Specifies which region of the sequence you wish to amplify and unless the
advanced options allow otherwise, the forward and reverse primers must fall somewhere
outside this region.

• Product Size: Specifies the range of sizes which the product of a primer pair can have.
The product size is the distance in bp between the beginning of the forward primer to the
end of the reverse primer.

• Optimal Product Size: Specifies the preferred size of the product. Setting this will mean
primer pairs that have a product size close to this will be chosen over those that do not.
Warning: Setting these options can cause the primer design process to take considerably
longer to complete.

The final option in this section is Number of Pairs to Generate which specifies how many
candidate pairs of primers and DNA probes to generate and is compulsory. Setting this to 1
will give you only the primer pair which was considered best by the set parameters.

If you have chosen to return more than one pair of primers and you do not want the same
primer used in different pairs, open the Characteristics panel and check Minimum Primer
Distance. The number of bases specified here is the minimum distance between the 3′ ends of
primers of the same direction in different pairs. If this is on but set to zero, Geneious will not
limit primer distances but will try not to reuse the same primer in different pairs.

13.1.2 Cloning primers

This option allows you to design primers to amplify a specific region. Only the included region
can be set, and the primers will be designed to the very ends of this region so that the entire
region is included in the PCR product. This option is useful for amplifying an entire CDS for
creating an insert for cloning.

13.1.3 Tm calculation

Tm estimates generated by Primer3

This section provides references for formulas used by Primer3 to calculate melting tempera-
tures for the binding region of a primer. Under Formula you can choose between two different
tables of thermodynamic parameters and methods for melting temperature calculation:

• Breslauer et al. 1986. This is used by old versions of Primer3 (until version 1.0.1), and
uses the formula for melting temperature calculation suggested by Rychlik et al. 1990.

• SantaLucia 1998. This is the recommended value.

Three different Salt Correction Formula options are available:

• Schildkraut and Lifson 1965. This is used by old versions of Primer3 (until version 1.0.1).

• SantaLucia 1998. This is the recommended value.

• Owczarzy et al. 2004.

Rough Tm

The rough Tm of a selected region of sequence is dynamically calculated and displayed in the
Statistics Tab in the Viewer side panel, and is calculated as per formulas 5.1 or 5.2. The rough
Tm is usually within 1-3° of that calculated by Primer3.

If you select a region of sequence less than 100 bp in length, then the Rough Tm will be dis-
played as a tooltip as shown in Figure 13.2.

13.1.4 Characteristics

The Characteristics section allows you to set absolute limits on properties of primers and probes
such as melting point and GC content. Optimum values can also be specified. For details on
individual options, hover your mouse over them and a popup box will describe the function of
the option.

Figure 13.2: Display of rough Tm of selected sequence

Characteristics can be set for either Primers or DNA Probes, depending on the task you have
chosen. The Primer section is available if one of Forward Primer or Reverse Primer is being
designed or tested and DNA Probe is available if a DNA Probe is being designed or tested.
These two sections are quite similar; the DNA probe section has a subset of the options avail-
able in the primer section. This is because primers are usually chosen in pairs and so several
options can be set for how pairs are chosen.

13.1.5 Primer Picking Weights

At the bottom of the Characteristics panel there is a Primer Picking Weights button. Clicking
this brings up a second dialog containing many more options. The purpose of all of these
options is to allow you to assign penalty weights to each of the parameters you can set in the
options. The weight specified here determines how much of a penalty primers and probes get
when they do not match the optimal options. The higher the value the less likely a primer or
probe will be chosen if it does not meet the optimal value.

Some of the weights allow you to specify a “Less Than” and “Greater Than”. This is for options
which allow you to specify an optimum score such as GC content. These weights are used when
looking at primers whose value for this option falls below and above the optimum respectively.
The other weights are applied no matter in which direction they vary.
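
The role of the “Less Than” and “Greater Than” weights can be illustrated with a small sketch. It assumes that the penalty for a single parameter grows linearly with the distance from the optimum, which is how Primer3 describes its penalty terms; the function name and the example values are hypothetical and do not reflect Geneious defaults.

```python
def parameter_penalty(value, optimum, weight_lt, weight_gt):
    """Penalty contributed by one parameter of a candidate primer.

    weight_lt applies when the value is below the optimum,
    weight_gt applies when it is above. A value equal to the
    optimum contributes no penalty.
    """
    if value < optimum:
        return weight_lt * (optimum - value)
    return weight_gt * (value - optimum)


# Example: optimum GC content of 50%, deviations above the optimum
# penalised twice as heavily as deviations below it.
print(parameter_penalty(45, 50, weight_lt=0.5, weight_gt=1.0))  # 2.5
print(parameter_penalty(55, 50, weight_lt=0.5, weight_gt=1.0))  # 5.0
```

Under this assumption, candidates with a lower total penalty across all weighted parameters are preferred.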

For details on individual options in the Primer Picking Weights dialog, again hover your mouse
over the option to see a short description.

Figure 13.3: Primer characteristics options



13.1.6 Degenerate Primer Design

A degenerate primer contains a mix of bases at one or more sites. They are useful when you
only have the protein sequence of your gene of interest so want to allow for the degeneracy
in the genetic code, or when you want to isolate similar genes from a variety of species where
the primer binding sites may not be identical. You can design degenerate primers by using
either a sequence containing ambiguous bases or an alignment as the template and checking
the Allow degeneracy box. The degeneracy value that you specify is the maximum number
of primers that any primer sequence is allowed to represent. For example, a primer which
contains the nucleotide character N once (and no other ambiguities) has a degeneracy of 4
because N represents the four bases A,C,G and T. A primer that contains an N and an R has
degeneracy 4 × 2 = 8 because R represents the two bases A and G.
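
The degeneracy calculation in the example above can be reproduced with the following sketch, which multiplies together the number of bases represented by each IUPAC nucleotide code in the primer. The function name is an illustrative assumption.

```python
# Number of bases represented by each IUPAC nucleotide code.
IUPAC_BASE_COUNTS = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def primer_degeneracy(primer):
    """Number of distinct primer sequences a degenerate primer represents."""
    degeneracy = 1
    for base in primer.upper():
        degeneracy *= IUPAC_BASE_COUNTS[base]
    return degeneracy


print(primer_degeneracy("ACGTN"))  # 4
print(primer_degeneracy("ANRT"))   # 8
```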

13.1.7 Advanced Options

In the Advanced panel there are options to add 5′ extensions to primers and to specify a
mispriming library.

A 5′ extension can be your own sequence, a restriction enzyme or Gateway site, or a combination
of these. For more information see section 13.9.

A mispriming library is a set of sequences (usually repeats) which the primers should not bind
to. Four inbuilt libraries are available for selection, or you can upload a custom library of
sequences in fasta format. For more information on the inbuilt libraries, see the Primer3 help
page.

13.1.8 Alignment primer design

If you are designing primers off an alignment the primer will be designed on the consensus
sequence by default. To design primers for every sequence in the alignment and have the
primers annotated separately on each sequence or on a few selected sequences, choose Design
primers on “Every Sequence”, or “Selected Sequences” in the alignment options at the bottom
of the Design New Primers window.

13.1.9 Batch Primer Design

Multiple primer pairs can be designed at once by selecting multiple regions within a single
sequence and opening Design New Primers, selecting Design New or Design with Existing.
When multiple regions are selected, a checkbox Use all selections will appear next to Target
and Included regions (for Precise primer design, only Included region can be selected). Check
this option for either Target or Included regions to design primers to all selected regions in one
step.

13.1.10 Sequencing Primer Design

This option will design primers for Sanger sequencing, placing primers at a specified interval
across a whole sequence or selected region. Choose Unidirectional if you wish to sequence
with forward primers only. Choose Bidirectional to design forward and reverse primers that
will enable recovery of double stranded sequence.

Figure 13.4: The Design for Sequencing dialog

The specified interval is the distance from the 3′ end of one primer to the 3′ end of the next
primer on the same strand. Primer3 will attempt to place primers within 20 bp of the specified
distance, although this may not always be possible.

If a selected region is chosen for sequencing primer design, the first primer will be placed
approximately 50 bp 5′ to the selection if there is enough flanking sequence to do so. If there is
no sequence upstream of the selection the first primer is placed as close to the beginning of the
selection as possible.

If designing bidirectional primers, the minimum distance from the 3′ end of the fwd primer to
the 3′ end of the next primer on the reverse strand is 20 bp.

13.1.11 Output from Primer Design

Once the task and options have been set, click the OK button to design the primers. A progress
bar may appear for a short time while the process completes. When complete, primers and
probes will be added as annotations on the sequences. The annotations will be labelled with
the base number the primer starts at, followed by either F (forward primer), R (reverse primer),
or P (probe). Primers will be coloured green and probes red.

Geneious Primer Characteristics   Primer3 Web Interface                  Primer3 Command Line
%GC                               Primer GC%                             PRIMER_{LEFT,RIGHT}_GC_PERCENT
Tm                                Primer Tm                              PRIMER_{LEFT,RIGHT}_TM
Hairpin                           Max Self Complementarity (Any)         PRIMER_{LEFT,RIGHT,INTERNAL_OLIGO}_SELF_ANY
Primer-Dimer                      Max 3′ Self Complementarity            PRIMER_{LEFT,RIGHT,INTERNAL_OLIGO}_SELF_END
Monovalent Salt Concentration     Concentration of monovalent cations    PRIMER_SALT_CONC
Divalent Salt Concentration       Concentration of divalent cations      PRIMER_DIVALENT_CONC
DNTP Concentration                Concentration of dNTPs                 PRIMER_DNTP_CONC
Sequence                          Seq                                    PRIMER_{LEFT,RIGHT}_SEQUENCE
Product Size                      Product Size Ranges                    PRIMER_PRODUCT_SIZE
Pair Hairpin                      PAIR ANY COMPL                         PRIMER_PAIR_COMPL_ANY
Pair Primer-Dimer                 PAIR 3′ COMPL                          PRIMER_PAIR_COMPL_END
Pair Tm Diff                      Max Tm Difference                      PRIMER_PRODUCT_TM_OLIGO_TM_DIFF

Table 13.1: Geneious primer characteristics and their Primer3 counterparts

Detailed information such as melting point, tendency to form primer-dimers and GC content
can be seen by hovering the mouse over the primer annotation. The information will be
presented in a popup box. Alternatively, double clicking on an annotation will display its details
in the annotation editing dialog. Table 13.1 shows how the values in the Geneious primer
annotation map to the original Primer3 values. Note that in Geneious Prime 2020 onwards, for
primers with 5′ extensions the primer length, Tm and %GC are calculated both with and
without the extension. All other values, including hairpin and self dimer Tm, are calculated with
the extension included.

In Geneious Prime 2019.1 onwards, the primer annotation includes a list of Off-target sites
for that primer, including their location and sequence. These are putative non-specific primer
binding sites identified on the sequence that was used for primer design. The entire sequence
will be searched for off-target sites, even if only a selected region is chosen for primer design.
An off-target site will be listed if it has no mismatches to the first four 3′ bases of the primer,
and fewer than 10% mismatches with the primer overall. Mismatches between the primer and
off-target will be shown in red.
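
The off-target rule stated above can be expressed as a short check. This sketch assumes the primer and the candidate site are equal-length strings written 5′ to 3′ on the same strand and compares them character by character; the function name is hypothetical, and details such as degenerate bases are not handled.

```python
def is_off_target(primer, site):
    """Apply the off-target rule to one candidate binding site.

    A site qualifies if the four bases at the 3' end match the primer
    exactly and fewer than 10% of all positions are mismatched.
    """
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    if primer[-4:] != site[-4:]:
        return False
    mismatches = sum(1 for a, b in zip(primer, site) if a != b)
    return mismatches < 0.1 * len(primer)


primer = "ACGTACGTACGTACGTACGT"             # 20 bases
print(is_off_target(primer, "TCGTACGTACGTACGTACGT"))  # True: 1 mismatch (5%)
print(is_off_target(primer, "TGGTACGTACGTACGTACGT"))  # False: 2 mismatches (10%)
print(is_off_target(primer, "ACGTACGTACGTACGTACGA"))  # False: 3' end mismatch
```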

The best way to save a primer or DNA probe for further testing or use is to select the annotation
for that primer and click the Extract button in the sequence viewer. This will generate a sepa-
rate, short sequence document in oligo format which just contains the primer sequence and the
annotation (which contains the primer characteristics). In the case of the reverse primer it will
automatically be reverse complemented.
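
For reference, reverse complementing a primer sequence amounts to complementing each base and reversing the order. A minimal sketch for unambiguous DNA bases (the function name is illustrative, and ambiguity codes are left unchanged):

```python
# Translation table for the four unambiguous DNA bases, in both cases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGC"))  # GCAT
```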

To delete primers that you don’t want, just select the primer annotation and click the Delete
button. You will then be given the option to delete the pair of that primer at the same time.

Figure 13.5: Primer design output

13.1.12 When no primers can be found

If no primers or DNA probes that match the specified criteria can be found in one or more of the
sequences then a dialog is shown describing how many had no matches and for what reasons.

To see why no primers or DNA probes were found for particular sequences, click the ‘Details’
button at the bottom of the dialog. The dialog will then open out to display a list of all the
sequences for which no primers or DNA probes were found. For each of the sequences the
following information is listed:

• Which of Forward Primer, Reverse Primer, Primer Pair and/or DNA Probe could not be
found in the sequence

• For each of these, specific reasons for rejection are listed (eg. “Tm too high” or “Unaccept-
able product size”) along with a percentage which expresses how many of the candidate
primers or probes were rejected for this reason.

After examining the details you can choose to take no action, or continue and annotate the primers
and/or DNA probes on the sequences for which design was successful.

13.2 Manual primer design

It is possible to create PCR primers by adding a primer annotation directly onto a sequence.
This is especially useful for cloning applications as generally the primers must bind to a speci-
fied set of bases at the beginning and end of the gene to be cloned. To manually add a primer,
select the region of sequence where you wish the primer to bind. You will see a selection hint
with the length and rough Tm of the selection, and an Add Primer button.

Once satisfied with the position and Tm of the selection then you can click the Add Primer
button to open the Add Annotation dialog. The Add Annotation dialog will open with settings
appropriate for creation of a new primer as shown in Figure 13.6. You can then give the primer a
name, set the primer direction and, if required, add an extension. Changing the primer binding
site position in the Add annotation window will automatically update the primer sequence and
characteristics. A 5′ extension can also be added directly onto a primer in this step by clicking
the button next to “Extension”. See section 13.9 for more information on adding 5′ extensions.

13.2.1 Manually pairing primers

If you have added your primers to a sequence using the manual selection method described
above, or you have run Add Primers to Sequence (section 13.6) or Test with Saved Primers
(section 13.5) without checking “Pairs Only”, then your primers will be annotated on the
sequence as individual primers, rather than associated with their pair. To pair individual primers
after they are annotated, select both primers using Ctrl/command-click, then right-click on
one of the primers and select Manually pair primers.

You should then see a line linking the F and R primer, and pair characteristics (Pair dimer Tm
and product size) will be added to both primer annotations.

13.3 Importing primers from a spreadsheet

You can import primers and probes directly into Geneious Prime from spreadsheet documents
in either comma-separated (.csv), tab-separated (.tsv) or native Excel (.xlsx or .xls) format. You
can either import them from the Import → From File menu, drag and drop the file in, or simply
paste the contents of the document into Geneious Prime.

When Geneious Prime has successfully recognized the file format, you will see the following
dialogue (Figure 13.8).

You will be asked which type of sequence you are importing. When you choose to import
primers or probes, you will receive some options that allow you to determine characteristics
for them as an extra step.

Figure 13.6: Create a primer by adding a primer annotation



Figure 13.7: Manually pairing primers

Immediately below this is a preview of the first few rows of data, and a checkbox that allows
you to specify that the top row is a heading row and should be ignored.

Below the preview is a list of common and additional fields, along with dropdown boxes. These
boxes allow you to specify which column contains which piece of data – often, one or more of
these won’t be applicable and can be left as None. Note that at minimum, you must specify a
Sequence field.

Lastly, you can add any additional data in the form of meta-data. Clicking the dropdown box next
to Meta Data at the bottom of the dialog will allow you to import values to meta-data, and
clicking the + or − will allow you to insert or remove additional meta-data types. Next, click
the Fields... button to bring up a dialog.

An additional set of dropdown boxes will allow you to specify again which columns of data
contain the fields which comprise this meta-data type. This includes custom meta-data types
that you have created and saved in the past.

When you’re ready, hit OK to begin importing. When the import is complete, you may be
presented with the option of grouping the sequences you imported into a sequence list. This
option is recommended if you’re importing very large sets of sequences.
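
Before importing, it can be useful to check that a spreadsheet has a usable Sequence column. The sketch below reads comma- or tab-separated text with Python's standard csv module. The column names "Name" and "Sequence" and the function name are assumptions made for this example; Geneious Prime lets you map any column to any field in the import dialog.

```python
import csv
import io


def read_primers(text, name_column="Name", sequence_column="Sequence"):
    """Parse spreadsheet text into (name, sequence) pairs.

    The first row is treated as a heading row. Rows without a sequence
    are skipped, since Sequence is the one mandatory field.
    """
    # Crude delimiter guess: tab-separated if a tab appears near the top.
    delimiter = "\t" if "\t" in text[:1024] else ","
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    primers = []
    for row in reader:
        sequence = (row.get(sequence_column) or "").strip().upper()
        if not sequence:
            continue
        primers.append(((row.get(name_column) or "").strip(), sequence))
    return primers
```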

13.4 Primer Database

The Primer Database consists of all the oligonucleotide documents that exist in your Local
or Shared Databases. The oligonucleotide document type is a short nucleotide sequence
representing either a primer or a probe. The text view lists the primer characteristics (Tm, GC
etc.). These properties can be shown in the document table. Tm is shown by default, but you
can turn on others by right-clicking on the table header.

Figure 13.8: Importing primers from a spreadsheet

Oligo documents are created via one of the following methods:

• Extract a primer/probe annotation from a sequence

• Select Sequence → New Sequence from the menu and choose Primer or Probe as the
type of the new sequence

• Select one or more existing primer sequences that are in nucleotide format (maybe ones
imported from a file) then click Primers → Convert to Oligo to transform them into oligo
type sequences

• Import primer sequences from a spreadsheet file (e.g. .csv) and choose Primer or Probe
as the sequence type (see section 13.3).

• Use Add Primers to Sequence to test primers not currently in your database against a
sequence. If a match is found and Extract primers to folder is selected, the oligo document
will be added to the database.

If you select a target sequence and go to Test with Saved Primers or Design Primers → De-
sign With Existing, Geneious will find all oligo sequences in your database and offer them as
options in the list of oligo sequences. There is no need to select them along with the target
sequence before starting the operation.

The meta-data type Primer Info can be used to note the fridge location etc of a particular primer.

13.5 Test with Saved Primers

Primers and probes can also be quickly tested against large numbers of sequences to see which
ones the primers will bind to. Primers can be tested against a single selection on a sequence,
against the whole sequence or against multiple entire sequences at once. To test primers, select
one or more target sequences then choose Primers → Test with Saved Primers.

The Source option specifies which primers will be considered for testing on the selected se-
quence. To test all primers in your local database, choose All Folders. If you select Current Folder,
all primers in the same folder(s) as the selected sequence will be tested. Click the Choose button
to select any other folder in your local or shared database that contains primers or probes to
test.

With Annotate you can limit the primers to a certain direction (relative to the direction of the
currently selected sequence). If Pairs Only is selected, inward-directed forward and reverse
pairs of primers will be annotated, and the option to limit the Product Size (in the Constraints
section) will be enabled. The Pairs Only option normally produces many more annotations, as
most primers will be annotated as part of multiple pairs, with all valid partners. If you select
the Probe checkbox, oligonucleotide sequences of type “Probe” will be tested (you can create
probes when you add a new sequence manually, section 5.1, or by running “Convert to Oligo”,
section 13.8).

Figure 13.9: Testing saved primers on a selected sequence

The Region allows you to choose whether to find primers that bind either anywhere on the se-
quence or inside the selected region(s) (only enabled if you have selected one or more regions
on the sequence you’re testing primers with). Alternatively, if you want to test whether your
primers will amplify a selected region by binding within a specified number of bases upstream
or downstream of the selection, you can select Amplify/target selected region. This is only avail-
able if both Forward and Reverse primers are selected in the Annotate options. Click the “?”
button for more explanation on using the Amplify/target selected region option.

By default, only primers that match the target sequence exactly will be found. If you wish to
allow a limited number of mismatches between the primer and target sequence you can specify
this under Mismatches, where you can also limit the proximity of mismatches to the 3′ end of
the binding site.
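
The idea of allowing a limited number of mismatches, while keeping them away from the 3′ end, can be sketched as a simple sliding-window search. This is an illustrative assumption about how such a test might work, not the Geneious implementation: it searches the forward strand only, ignores ambiguity codes, and the parameter names max_mismatches and min_3prime_exact are hypothetical.

```python
def find_binding_sites(primer, target, max_mismatches=0, min_3prime_exact=0):
    """Return (start, mismatch_count) for each acceptable forward-strand site.

    `start` is the 0-based position of the primer's 5' base on the target.
    Sites with a mismatch in the last `min_3prime_exact` bases are rejected.
    """
    primer, target = primer.upper(), target.upper()
    n = len(primer)
    hits = []
    for start in range(len(target) - n + 1):
        window = target[start:start + n]
        positions = [i for i in range(n) if primer[i] != window[i]]
        if len(positions) > max_mismatches:
            continue
        # A mismatch at index i lies within the 3' zone if i >= n - zone size.
        if any(i >= n - min_3prime_exact for i in positions):
            continue
        hits.append((start, len(positions)))
    return hits
```

With the default arguments only exact matches are returned, which mirrors the default behaviour described above.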

Expand the Constraints section in order to limit the Product size range (available if both For-
ward and Reverse are checked in the Annotate options) or to specify the characteristics used for
binding site Tm calculation (by clicking the “Tm Options” button).

Click OK to begin testing primers. Once complete, any primers matching your specified cri-
teria will be annotated on the sequence. If you chose to test the primers as pairs, they will be
annotated as pairs on the sequence, with a connecting line between the pairs. The number of
matching primers found will be displayed briefly in the status bar below the sequence view.
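
The reason Pairs Only can produce many annotations is that every forward site is combined with every compatible reverse site. A minimal sketch of that pairing, under the assumption that forward sites are given by the 0-based start of their binding region and reverse sites by the end-exclusive end of theirs (both names and conventions are hypothetical):

```python
def valid_pairs(forward_starts, reverse_ends, min_size, max_size):
    """Return (forward_start, reverse_end, product_size) for every pair
    whose product size falls within [min_size, max_size]."""
    pairs = []
    for f in forward_starts:
        for r in reverse_ends:
            size = r - f  # product spans both binding regions
            if size > 0 and min_size <= size <= max_size:
                pairs.append((f, r, size))
    return pairs
```

In the example `valid_pairs([10, 50], [200, 300, 600], 100, 400)`, each forward primer is paired with two reverse sites, giving four pairs, while the site ending at 600 is excluded by the product size limit.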

When testing primers Geneious will automatically check the entire template for other binding
sites, even if only a selected region of the sequence is chosen for testing primers on. If a
non-specific binding site is found, the details will be listed on the primer annotation under
Off-target sites. As with Design New Primers, an off-target site will be listed if it has no mismatches
to the first four 3′ bases of the primer, and fewer than 10% mismatches with the primer overall.
Mismatches between the primer and off-target will be shown in red.

13.6 Add Primers to Sequence

If you have primer sequences you’d like to test against a sequence but you don’t have the
primer documents in your database already, you can use Add Primers to Sequence. This lets
you enter a Name and Sequence for one or more primers, and gives you some basic options
for testing. You can test multiple primers by adding more rows via the + buttons on the side.

Figure 13.10: Test new primers against a sequence using “Add Primers to Sequence”

Up to ten previously entered primer names and sequences are remembered in the corresponding
fields to make it easier to use the same primer again.

All specified primers that bind the selected sequence will be annotated on the sequence, along
with mismatch annotations, as appropriate. Geneious will automatically check the entire
template for other binding sites; if any are found they will be annotated on the primer under
Off-target sites. If Extract primers to folder is selected, all annotated primers will also be saved
as separate primer documents to the folder containing the selected sequence.

13.7 Characteristics for Selection

The Characteristics for Selection option will determine the primer characteristics of a selection
of sequence within a larger sequence. Select a region of 60 bp or less in the Sequence View
and choose Primers → Characteristics for Selection. The primer characteristics will then be
added as an annotation over the exact region that was selected. This will also work on multiple
selected regions in the Sequence View. Hold the Ctrl key while clicking and dragging to select
multiple regions simultaneously.

13.8 Convert to Oligo

Geneious Prime can convert nucleotide sequences into primers. This is necessary for sequences
to show up in the oligo database. To do this, select your sequences and choose Primers →
Convert to Oligo from the popup menu that appears. Convert to Oligo can only be performed
on sequences less than 400 bp long, and full characteristics (such as primer dimer and hairpin
Tms) can only be calculated for primers of 60 bp or less.

13.9 Primer Extensions

You can add an extension to a primer annotation using the “Edit Annotation” button in the
sequence view or by double-clicking the annotation. You can also add a primer extension to
an existing oligonucleotide document by selecting Primers → Add 5′ Extension. You can add
an arbitrary sequence, a restriction site, and/or a Gateway cloning site. Multiple components
can be added to an extension, and the preview window in the 5′ extension dialog (figure 13.11)
shows how the extension components will be arranged on the primer. The components can
be rearranged by dragging and dropping them in this window. Primer extensions can also be
added at the time the primer is designed from within the “Design New Primers” options.

The 5′ extension sequence and annotations are visible on primer annotations in the sequence
view. The extension sequence is also shown in the list of the annotation’s properties in the
Annotation Table. Tm, size and GC content values for the primer are shown with and without
the extension, and all other values such as hairpin Tm and primer dimer Tm are calculated with
the extension included. These values can be viewed by mousing over the primer annotation,
or, for primers annotated on a sequence, by clicking Edit Annotation (see figure 13.12).
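
To illustrate what "with and without the extension" means for the simpler characteristics, the sketch below computes length and GC content for the binding region alone and for the full primer. The function names are hypothetical, and Tm is deliberately left out because it depends on the Primer3 thermodynamic settings.

```python
def gc_percent(seq):
    """Percentage of G and C bases in a non-empty sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def summarize_primer(binding, extension=""):
    """Report length and %GC with and without a 5' extension."""
    full = extension + binding  # the 5' extension sits upstream of the binding region
    return {
        "length_without_extension": len(binding),
        "length_with_extension": len(full),
        "gc_without_extension": gc_percent(binding),
        "gc_with_extension": gc_percent(full),
    }
```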

If a primer annotation is extracted to a separate document, the extension will be included. It
is not covered by the binding region annotation, but may have its own annotations, as shown
in figure 13.12. If a PCR product is extracted using this annotation, the result will include the
extension. Extensions will be ignored when primer testing is conducted against potential target
sequences.

Figure 13.11: Primer extension editing dialog

Figure 13.12: Primer with 5′ extension, shown as an extracted primer above, and mapped onto
a sequence below

13.10 Extract PCR Product

Once primers are annotated on a sequence, the resulting PCR product can be extracted by
selecting Primers → Extract PCR Product. If only a single pair of primers is annotated on
a sequence then these will automatically be chosen as the Forward and Reverse primer. If
multiple primers are annotated on a sequence, then the drop down menus allow you to choose
which primers to use for extracting a single PCR product, or alternatively you can choose to
Extract PCR products from all primers.

When you click OK a new sequence document is produced containing only the sequence
spanning (and including) the PCR primers. Any 5′ extensions on the primers will also be included.
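
The composition of the extracted product can be sketched as follows. This is an illustration under stated assumptions rather than the Geneious implementation: the template is linear, coordinates are 0-based and end-exclusive, each extension is written 5′ to 3′ on its own primer, and all names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def pcr_product(template, fwd_start, rev_end, fwd_extension="", rev_extension=""):
    """Return the top strand of the PCR product.

    template[fwd_start:rev_end] spans both primer binding sites.
    The reverse primer's extension appears reverse complemented
    at the 3' end of the top strand.
    """
    core = template[fwd_start:rev_end]
    return fwd_extension + core + reverse_complement(rev_extension)
```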

PCR products can also be extracted directly from the annotated sequence by selecting a forward
and reverse pair of primers while holding down the shift key to select the sequence bounded
by the primers. Then click the Extract button. You will be given the option to either extract
the PCR product, or the Selected region. If PCR product is selected, the resulting product will
include any 5′ extensions on the primers.

13.11 More Information

The Primer features in Geneious Prime are based on a modified version of the program Primer3
(http://bioinfo.ut.ee/primer3/).

Copyright (c) 1996,1997,1998,1999,2000,2001,2004 Whitehead Institute for Biomedical Research.


All rights reserved.

A copy of the modified Primer3 source that Geneious Prime runs is distributed with the plugin.

If you use the primer design feature of Geneious Prime for publication we request that you cite
Primer3 as:

Steve Rozen and Helen J. Skaletsky (2000) Primer3 on the WWW for general users and for
biologist programmers. In: Krawetz S, Misener S (eds) Bioinformatics Methods and Protocols:
Methods in Molecular Biology. Humana Press, Totowa, NJ, pp 365-386. Source code available
at http://sourceforge.net/projects/primer3/.

Further information can be found in the primer3 documentation available here:
http://primer3.ut.ee/primer3web_help.htm. Please note that some controls have been changed,
renamed or removed from Geneious, but most of the primer3 functionality is available.
Chapter 14

Cloning

The cloning features in Geneious Prime allow you to simulate several different types of cloning,
including Restriction cloning, Golden Gate cloning, Gibson Assembly, Topo cloning and Gate-
way cloning. You can also create enzyme lists, find restriction sites on your sequence of interest,
and simulate digestion and ligation reactions.

The following sections give more detail on each option.

14.1 Find Restriction Sites

Restriction enzymes¹ cut a nucleotide sequence at specific positions relative to the occurrences
of the enzyme’s recognition sequence in the sequence. For example, the enzyme EcoRI has the
recognition sequence GAATTC and cuts both the strand and the antistrand sequence after the
G inside the recognition sequence², leaving a single-stranded overhang (sticky end).

To find and annotate restriction sites on a nucleotide sequence, go to Find Restriction Sites...
under the Tools → Cloning menu or open the Restriction Analysis tab to the right of the se-
quence viewer.

You can configure the following options:


¹ The restriction enzyme information included in Geneious Prime was obtained from Rebase, available for free
at http://rebase.neb.com.
² Like many restriction enzymes, EcoRI is methylation dependent and cuts only if the second A in the recognition
sequence is not methylated to N6-methyladenosine.


• Candidate Enzymes lets you select a set of restriction enzymes from which you want to
draw the ones to use in the analysis. This includes the options to use commonly used,
or all known commercially available restriction enzymes. If you have created your own
restriction enzyme set from your local database then this will also be listed (see section
14.3 for how to create such a document). To select specific enzymes from a particular
enzyme set, click the Advanced button (see below).
• Enzymes must match X to Y times: only returns restriction enzymes which cut the
sequence X to Y times. Results for enzymes that cut the sequence more or fewer times than
this will be discarded. If you set X to be 0, when this operation is complete, it will report
which candidate enzymes do not cut the sequence.
• Specifying cut regions: To specify a region of sequence where you want the enzyme to
cut or not cut, choose one of the options below, and use the base counters to specify a
sequence range that the options apply to. If you have selected a region of sequence in the
sequence viewer, clicking the refresh arrow next to the base number counters will copy
the selected region to this setting. The following options are available:
– Cut Anywhere: Returns enzymes which cut anywhere in the entire sequence. It is
not possible to select a subregion with this setting.
– Must only cut between: Returns enzymes which only cut between the specified
bases.
– Must cut between (may cut outside): Returns enzymes which cut between the spec-
ified bases, and may also cut outside that region.
– Must not cut between: Returns enzymes which only cut outside the specified bases.
• Advanced: This displays a table of all enzymes in your candidate set, including their
recognition site, overhang, effective length and methylation information (Figure 14.1).
Only the enzymes selected in this table will be considered in the analysis; initially, all
rows are selected. You can click on the column headers to sort the table ascending or
descending by that column, and you can Shift+click and Ctrl+click to select a range of
rows and to toggle the selection of a row, respectively.
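
The hit-count and cut-region options above can be read as filters applied to each enzyme's list of cut positions. The sketch below is one possible interpretation, written for illustration only: the mode names are hypothetical labels for the four options, the region is treated as inclusive, and "Must only cut between" is assumed to require at least one cut.

```python
def passes_filters(cut_positions, min_cuts, max_cuts, mode="cut_anywhere", region=None):
    """Decide whether an enzyme with the given cut positions is returned."""
    # "Enzymes must match X to Y times"
    if not min_cuts <= len(cut_positions) <= max_cuts:
        return False
    if mode == "cut_anywhere":
        return True
    if region is None:
        raise ValueError("a (start, end) region is required for this mode")
    start, end = region
    inside = [p for p in cut_positions if start <= p <= end]
    if mode == "must_only_cut_between":
        return bool(inside) and len(inside) == len(cut_positions)
    if mode == "must_cut_between":
        return bool(inside)
    if mode == "must_not_cut_between":
        return not inside
    raise ValueError(f"unknown mode: {mode}")
```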

After configuring your options, click Apply to record the restriction enzyme site annotations
on the sequence. The annotation shows the enzyme’s recognition site, cut site and methyla-
tion sensitivity. If Highlight Methylation Sites is selected, restriction sites which would not be
cleaved due to the presence of methylation by Dam, Dcm or EcoKI are greyed out and marked
with an asterisk (for further information on Highlight Methylation Sites see below).

Once the document is saved, two new tabs will appear above the sequence view: Enzymes
displays the list of enzymes and their cut positions; Fragments displays a list of fragments that
would be produced from the restriction digests. These tables can be exported as .csv files for
subsequent processing with other software such as Microsoft Excel®.

To select the region between two cut sites on a sequence, Shift+click on the two restriction site
annotations in the sequence view.
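
As a rough picture of what the Fragments table contains, the following sketch locates EcoRI sites on a linear sequence and lists the resulting fragments. It is an assumption-laden illustration: only the top strand is searched, overhangs are not modelled, the function name is hypothetical, and the coordinates (1-based, inclusive) are chosen for readability rather than to match the Geneious table exactly.

```python
def fragment_table(sequence, recognition="GAATTC", cut_offset=1):
    """Return (start, end, length) rows for fragments of a linear digest.

    `cut_offset` is the number of recognition-site bases left of the cut
    (1 for EcoRI, which cuts after the G of GAATTC).
    """
    sequence = sequence.upper()
    cuts = []
    start = sequence.find(recognition)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(recognition, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [(a + 1, b, b - a) for a, b in zip(bounds, bounds[1:])]
```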

Figure 14.1: Find Restriction Sites restriction enzymes table accessible under the Advanced
option.

To find enzymes that do not cut a particular sequence, use Find non-cutting enzymes under
the Cloning menu. See section 14.3 for further details.

Highlight Methylation Sites

In Geneious Prime 2021 onwards, Geneious will flag restriction sites where restriction enzyme
cleavage ability may be affected due to the presence of methyl groups added by methyltrans-
ferases Dam, Dcm, and EcoKI.

This is determined by experimental validation according to rebase.neb.com, which uses the
following terms to describe the effect of methylation:

Cut            Not sensitive to methylation at the overlapping site
Impaired       Rate of cleavage is lower than for unmethylated overlapping site
Blocked        Will not cleave when overlapping site is methylated
Some blocked   Blocked by methylation of the overlapping site in some but not all flanking DNA contexts
Variable       Conflicting reports of sensitivity to methylation at the overlapping site
Untested       Effects of methylation at the overlapping site have not been tested

Restriction sites are greyed out and marked with an asterisk if the ability of the restriction
enzyme to cleave at the restriction site is variable, impaired or blocked (see Figure 14.2).
Restriction sites where the effect of methylation has not been experimentally validated, or which
might be methylated but have ambiguities in the target sequence, are marked with an asterisk
but not greyed out.

The highlighted restriction sites and indicated methylation effect take into consideration the
neighboring sequence. For example, cleavage by XbaI is only blocked by Dam methylation if the
recognition site (TCTAGA) is preceded by GA or followed by TC, either of which creates the
Dam target sequence GATC. If these flanking bases are not
present, the site will not be marked as affected by methylation, but a note on the restriction site
annotation will say what the effect of methylation would be if the correct bases were present.
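
The XbaI flanking-base rule can be checked with a short script. Either flank (GA before or TC after TCTAGA) creates the motif GATC, the recognition sequence of Dam methyltransferase, overlapping the site. The sketch below is illustrative only, with a hypothetical function name, and looks at the top strand of an unambiguous sequence.

```python
def xbai_sites_with_flank_check(sequence):
    """Return (site_start, affected) for every XbaI site (TCTAGA).

    `affected` is True when the two bases before the site are GA
    or the two bases after it are TC, so that GATC overlaps the site.
    """
    sequence = sequence.upper()
    results = []
    start = sequence.find("TCTAGA")
    while start != -1:
        upstream = sequence[max(0, start - 2):start]
        downstream = sequence[start + 6:start + 8]
        results.append((start, upstream == "GA" or downstream == "TC"))
        start = sequence.find("TCTAGA", start + 1)
    return results
```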

Restriction Enzyme effective length

Effective length for restriction enzymes is displayed in both the Advanced table of enzymes,
and in the Enzymes tab on the sequence viewer.

Effective length is a measure of how frequently an enzyme will cut, taking into account both
sequence length and ambiguities. In other words, lower effective length means an enzyme is
expected to cut more frequently. Because ambiguous bases are more likely to match a sequence
by chance, they contribute less than 1 to the effective length.

Effective length is calculated as the sum of the following formula across all symbols in the
recognition sequence, where n is the number of nucleotides each symbol represents:

1 - log(n) / log(4)

That is, 1 for an unambiguous base (A, C, G or T), 0.5 for a 2-fold ambiguity (M, R, W, S, Y, K),
approximately 0.208 for a 3-fold ambiguity (V, H, D, B) and 0 for N.

Note: The sum displayed in Geneious is rounded down to the nearest .0 or .5.

Figure 14.2: Restriction site affected by methylation. The EcoRII site is blocked by dcm
methylation and is colored grey, while the BstNI site is not affected and is colored blue.

See http://search.cpan.org/dist/BioPerl/Bio/Restriction/Enzyme.pm#cutter
for a little explanation of why you would use this.
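
The effective length formula is easy to reproduce. The sketch below sums 1 - log(n) / log(4) over the IUPAC symbols of a recognition sequence and then rounds down to the nearest 0.5, as described in the note. The function and table names are hypothetical, and the small epsilon is only there to guard against floating-point error before rounding down.

```python
import math

# Number of nucleotides each IUPAC symbol can represent.
IUPAC_DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "M": 2, "R": 2, "W": 2, "S": 2, "Y": 2, "K": 2,
    "V": 3, "H": 3, "D": 3, "B": 3,
    "N": 4,
}


def raw_effective_length(recognition):
    """Sum of 1 - log(n)/log(4) over all symbols, without rounding."""
    return sum(
        1 - math.log(IUPAC_DEGENERACY[symbol]) / math.log(4)
        for symbol in recognition.upper()
    )


def effective_length(recognition):
    """Effective length rounded down to the nearest 0.5."""
    return math.floor(raw_effective_length(recognition) * 2 + 1e-9) / 2
```

For example, EcoRI (GAATTC) has an effective length of 6, while EcoRII (CCWGG) has 4.5 because the W contributes only 0.5.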

14.2 Digest into fragments

The option Digest into fragments... from the Tools → Cloning menu or the context menu
allows you to generate the nucleotide sequences that would result from a digestion experiment.
You can digest multiple nucleotide sequences at a time. If the digestion results in overhangs,
these will be recorded as annotations on the fragments.

• If you have selected only one nucleotide sequence document and it has annotated re-
striction sites, you can select Annotated cut positions to cut the document on these sites.
When this option is selected, the options to filter the enzymes by their effective recog-
nition sequence length or number of hits are disabled. You can select a subset of the
annotated enzyme sites under More Options.
• If you do not have annotated cut sites already on your document, you can choose Enzyme
set and select which enzyme(s) you wish to use. This runs Find Restriction Sites first, but
does not generate restriction site annotations on your original sequence. See the section
on Find Restriction Sites (14.1) for more detail.

Figure 14.3: Digest into fragments options dialog, with extended options showing.

• Where multiple enzymes are selected, you can either digest the original sequence by Each
enzyme separately, which returns a separate sequence list of the fragments produced for
each enzyme, or by All enzymes at once, which digests by all selected enzymes in one
operation.
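
The difference between digesting by each enzyme separately and by all enzymes at once can be sketched as follows. This is a simplified illustration on a linear, top-strand sequence with no overhang annotations; the function names and the enzyme dictionary format (recognition sequence plus cut offset) are assumptions made for the example.

```python
def cut_positions(sequence, recognition, cut_offset):
    """0-based indices of the first base after each top-strand cut."""
    positions = []
    start = sequence.find(recognition)
    while start != -1:
        positions.append(start + cut_offset)
        start = sequence.find(recognition, start + 1)
    return positions


def digest(sequence, cuts):
    """Split a linear sequence at the given cut positions."""
    bounds = [0] + sorted(set(cuts)) + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


def digest_each_separately(sequence, enzymes):
    """One fragment list per enzyme."""
    return {
        name: digest(sequence, cut_positions(sequence, site, offset))
        for name, (site, offset) in enzymes.items()
    }


def digest_all_at_once(sequence, enzymes):
    """A single fragment list using the cuts of every enzyme together."""
    cuts = []
    for site, offset in enzymes.values():
        cuts.extend(cut_positions(sequence, site, offset))
    return digest(sequence, cuts)
```

With EcoRI (G^AATTC) and BamHI (G^GATCC) on a sequence containing one site each, the separate digests give two fragments per enzyme, and the combined digest gives three.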

14.2.1 Direct extraction of restriction fragment

To extract the restriction digest product between two cut sites, click on the first site, then hold
down the shift key and click on the second site. This will select the region between the sites.
Then click the Extract button. You will be given the option to either extract the Digested frag-
ment, or the Selected region (see Figure 14.4). If Digested fragment is selected, the resulting
product will contain the restriction site overhangs produced by the digestion.

Figure 14.4: Direct extraction of a restriction site product, showing (A) selection between two
restriction sites, (B) the Extract dialog for extraction of the digested product, and (C) restriction
site overhangs on the resulting product

14.3 Creating a custom enzyme set

If there is a specific set of enzymes that you want to use repeatedly, you can create a custom
enzyme set. There are two ways to do this:

1. In the Find Restriction Sites interface, click Advanced and select the enzymes you want
in your set. Then click Save Selected Enzymes and give the set a name. This set will then
be available in the Candidate Enzymes lists.

2. Go to New Enzyme Set under Cloning or File → New. Give your set a name and click
OK. This will create a new, empty enzyme set document. To add enzymes to the set, click
Add Enzymes and select the ones you want from the list.

New enzyme sets are created as enzyme set documents. These are created in whatever
folder you are working in, but all enzyme sets in your database are available in the Candidate
Enzymes drop down list, in the Find Restriction Sites interface.

To create an enzyme set containing restriction enzymes which do not cut a particular sequence,
select your sequence and go to Cloning → Find non-cutting enzymes. Select the criteria and
the candidate enzymes you wish to test, and click OK. A new enzyme set document will then
be created containing the enzymes which do not cut your sequence.

14.4 Introduction to the cloning interface

Overview

Tools for Restriction Cloning, Golden Gate and Gibson Assembly can be found under the
Cloning menu. These tools share a common interface which differs only in options specific to
each cloning tool. Input sequences may be selected prior to launching the operation, or may be
selected or removed from within the cloning options.

Options

Backbone: Use this option to choose a sequence that will serve as the vector in your cloning
reaction. When the cloning dialog is first opened, this will show Use Leftmost, which
refers to the sequence represented by the leftmost tag in the Construct Layout panel.
Geneious will automatically choose the longest (preferably circular) sequence as the back-
bone. The backbone can be manually changed either by using the dropdown to choose
from selected sequences or recently used backbones, by selecting a backbone with the
Choose. . . button, or by dragging and dropping the tags within the Construct Layout
panel to the desired position. The sequence chosen as the backbone is always in the
leftmost position in the Construct Layout panel and has blue background shading. If a
backbone is selected it will always result in a Circular Product. Nucleotides in the final
construct will be numbered beginning with the origin of the backbone sequence.

Enzyme or Exonuclease: Set the enzyme(s) to consider for your reaction. Geneious will au-
tomatically try to use the best reaction setup with the specified enzyme(s), and you can
manually adjust the reaction setup in the Construct Layout Panel. The enzyme options
differ between the individual cloning operations - information specific to each cloning
tool can be found in the relevant sections of this manual.

Circular Product: Select this option to create a circular product when no Backbone is specified.

Construct Layout Panel: Each sequence or sequence list (where available, these are shown in
gold), is represented as a sequence tag in the upper Construct Layout Panel, in the or-
der in which they will be ligated. You can rearrange inserts by dragging and dropping
the sequence tags, remove them by clicking on the × in the top corner, or select addi-
tional inserts using the Add Inserts. . . button. If you have designated a sequence as the
Backbone (vector), it will appear in green to the left of any other sequences.

Tags show the following information:

• Sequence Name (highlighted in blue if reverse complemented)
• Reaction Type
• Additional Information such as extracted region, length, or number of sequences
• Warnings for sequences in an error state, shown in red. Additional details about these
warnings can be found below the Detailed View Panel.
• Dropdown Arrow to adjust individual reactions or modify the sequence

Figure 14.5: A Restriction Cloning options dialog. The Construct Layout Panel shows the
backbone sequence shaded blue to the left and an insert to the right. The Detailed View Panel shows
the selected annotated sequence.

Detailed View Panel: Click on a sequence tag in the Construct Layout Panel to display the
annotated sequence for that tag in the Detailed View Panel. The region of your sequence
that will be included in the final construct will be highlighted by blue shading, and en-
zymes used will be written in purple. You can adjust the zoom level in the detail view
using the keyboard and mouse, as described in section 5.2.1. It is not possible to edit
sequences in the Detailed View Panel – if you wish to edit your sequence, you will need
to close the cloning dialog and edit the sequence directly in the sequence viewer.

Save used primers (where applicable): Creates a primer document for each of the primers
used in the operation. For batch cloning reactions, the set of primers for each ligation
product will be saved in a sequence list.

Save intermediate products: Creates a document for each of the intermediate products
generated in the operation.

Save in sub-folder: Create a new sub-folder within the current directory to save result docu-
ments to. If another folder with the specified name already exists, a new folder with that
name and an incrementing number will be created.

14.5 Restriction Cloning

Use the Restriction Cloning operation to ligate two or more linear or circular nucleotide se-
quences. For an overview of the restriction cloning interface, see section 14.4.

To run a Restriction Cloning operation, optionally select a Backbone sequence, along with one
or more Insert sequences, and then specify the enzymes you wish to consider for the reaction.
You can adjust the order in which the sequences will be ligated, and the cut sites to be used for
the ligation in the Construct Layout Panel. See Figure 14.4 for an example.

Candidate Enzymes: Select Annotated Enzymes if the restriction sites for your reaction are al-
ready annotated on your sequences. Enzymes annotated on any of the selected sequences
will be available for all sequences in your reaction. To also use restriction enzymes which
have not been annotated on your sequences, select Enzyme Set. If None is selected, only
blunt end ligation or annotated single-stranded overhangs (‘sticky ends’) will be avail-
able. Annotated overhangs and blunt end ligation will remain available if either Anno-
tated Enzymes or Enzyme Set are selected. Restriction enzymes with recognition sites
shorter than 5 bp will only be considered if they have been annotated. Contact Geneious
Support if you want to change this minimum recognition site length.

Enzyme Set: This option is enabled when Enzyme Set is selected under the Candidate En-
zymes option. All enzyme sets in your database will be available here. You can create
new enzyme sets from the Enzymes tab in the Sequence Viewer. If you wish to only
use specific enzymes from within your chosen Enzyme Set, click the Choose button and
select the enzymes you require.
Construct Layout: Sequences will be displayed as Tags in the Construct Layout Panel, in the
order in which they will be ligated. Overhangs for each sequence, corresponding to the
selected restriction enzymes, will be displayed between sequence tags. Pairs of overhangs
will be green if they are compatible or red if they are not. Unused overhangs that will be
generated at either end of a linear construct will be shown in grey.
To change ligation reactions or reverse complement a sequence, click on the ▼ at the right
of the sequence tag:

5′ Reaction / 3′ Reaction: Enzymes from the specified Candidate Enzymes will be shown
in the appropriate section, if they have cut sites that are compatible with a neighbor.
If more than 30 restriction sites are available, click on Choose enzyme cut site. . . to
select from the available restriction sites. Sites shown in grey italics can be selected,
but will require further changes to create a valid reaction. Predigested overhangs
and blunt ends are also shown.
Reverse Complement: Select this option to reverse complement the current sequence,
and recalculate possible compatible cut sites on the current sequence and its neigh-
bours. The original sequence will not be altered. Names of reverse complemented
sequences appear in blue.
Modify Overhangs: Select this option to modify overhangs after choosing restriction
sites. This allows you to simulate end repair by polymerase-mediated backfilling
or nuclease-mediated overhang removal (blunting). Modified overhangs will be re-
tained until you select a new restriction enzyme or use the Reverse Complement
option.
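
The green/red overhang colouring described above comes down to a compatibility test between two fragment ends. The following is a minimal sketch of such a test, not the Geneious implementation. It assumes each end is described by a hypothetical `(kind, bases)` pair, where `kind` is `"5'"`, `"3'"` or `"blunt"` and `bases` is the single-stranded overhang read 5′ to 3′ on the strand that carries it.

```python
# Illustrative sketch only: the (kind, bases) representation is an assumption,
# not a Geneious data structure.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def overhangs_compatible(end_a, end_b):
    """Two ends can anneal if they are the same kind of overhang and their
    single-stranded bases are reverse complements of each other."""
    kind_a, bases_a = end_a
    kind_b, bases_b = end_b
    if kind_a != kind_b:
        return False
    return bases_a.upper() == reverse_complement(bases_b)


# EcoRI leaves a palindromic 5' AATT overhang, so two EcoRI ends are compatible.
print(overhangs_compatible(("5'", "AATT"), ("5'", "AATT")))
# An EcoRI end (AATT) and a BamHI end (GATC) are not.
print(overhangs_compatible(("5'", "AATT"), ("5'", "GATC")))
```

Two blunt ends are treated as compatible here because both carry an empty overhang.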

14.6 Gibson Assembly and In-Fusion Cloning

Gibson Assembly or Gibson Cloning is a method for seamless ligation of multiple sequences in
a single reaction, without the need for restriction sites. The Gibson Assembly operation allows
you to simulate cloning reactions that use an exonuclease to generate overlapping fragments
for ligation, including Gibson Assembly, GeneArt® Seamless Cloning (5′ exonuclease), SLIC
and In-Fusion® Cloning (3′ exonuclease). It can also be used to simulate cloning operations
that do not use exonuclease such as CPEC and SLiCE. For an overview of the cloning interface,
see section 14.4.

Note that In-Fusion cloning can also be run from the In-Fusion cloning option under the
Cloning menu. With this option, 3’ exonuclease will automatically be used and the primer
overhang length is automatically set to 15 bp.

To run a Gibson Assembly operation, optionally select a Backbone sequence, along with one
or more Insert sequences, and then specify the type of exonuclease you wish to use for the
reaction. For an example, see figure 14.6. You can adjust the order in which the sequences will
be ligated in the Construct Layout Panel. Currently the Gibson Assembly operation can only
be run using linear sequences.

Batch cloning with alternate sequences can be performed using sequence lists. Each of the
individual nucleotide sequences in a list will be inserted into a separate product at the same
insert position.

Figure 14.6: The Gibson assembly cloning window, showing batch assembly with a list of pro-
moter sequences

Exonuclease: The exonuclease activity defines which strand (if any) gets digested to expose
complementary overhangs:

• 5′ exonuclease (Gibson & GeneArt® Seamless Cloning): The enzyme chews back bases
from the 5′ end to expose complementary overhangs. GeneArt® Seamless Cloning rec-
ommends a 15 bp overhang, while many Gibson assembly protocols use a longer overhang
of 25 to 40 bp.

• 3′ exonuclease (SLIC & In-Fusion® Cloning): The enzyme chews back bases from the
3′ end to expose complementary overhangs. A 15 bp overhang is recommended for In-
Fusion® Cloning, while for SLIC a 25 bp overhang is recommended.

• None (CPEC & SLiCE): These two methods don’t use a specific exonuclease. Instead,
CPEC amplifies the whole strand with a polymerase, and SLiCE uses a cell extract to
recombine DNA molecules using short-end homologies.

If there are existing, compatible, overhangs annotated on the selected sequences (e.g. those
derived from restriction digest), they will be used for the ligation reaction. If existing over-
hang annotations are not compatible, or do not meet the specified requirements, the overhangs
will be removed or filled in (depending on the overhang and the exonuclease chosen) prior to
primer design.

Primer Options: Here you can adjust parameters that influence creation of primers and over-
hangs:

• Min Overhang Length: The minimum length of the complementary overhangs between
two adjacent sequences. Usually half of this length is added via primer extension to each
sequence. For sequences flanking the backbone, the full Min Overhang Length will be
added to the insert sequence.

• Min Overhang Tm: The minimum melting temperature allowed for the annealing region
between two adjacent sequences. The overhang length will be increased until the melting
temperature satisfies this condition.

• Tm Calculation: Additional settings required by Primer3 to calculate the melting temper-
ature. These settings are applied to both the overhang Tm calculation and the
primer Tm calculation.
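
The interaction between Min Overhang Length and Min Overhang Tm can be pictured as a loop that lengthens the overlap until the Tm condition is met. The sketch below is illustrative only. It uses the simple Wallace rule (2 °C per A/T, 4 °C per G/C) as a stand-in for the Primer3 calculation that Geneious actually uses, and the function names and the way the overlap is split across the junction are assumptions.

```python
# Illustrative sketch: the Wallace rule is a stand-in for Primer3's Tm model,
# and the growth strategy across the junction is assumed, not documented.
def wallace_tm(seq):
    """Rough melting temperature estimate: 2*(A+T) + 4*(G+C)."""
    s = seq.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


def grow_overlap(left, right, min_len, min_tm):
    """Return the shortest junction-spanning overlap that is at least min_len
    long and reaches min_tm, taking bases from the end of `left` and the
    start of `right`."""
    max_len = 2 * min(len(left), len(right))
    for length in range(min_len, max_len + 1):
        from_left = length // 2
        from_right = length - from_left
        overlap = left[len(left) - from_left:] + right[:from_right]
        if wallace_tm(overlap) >= min_tm:
            return overlap
    raise ValueError("no overlap within the available sequence reaches min_tm")


overlap = grow_overlap("ATATATATATGCGCGCGCGC", "ATATATATATATATATATAT", 10, 36)
print(overlap, len(overlap), wallace_tm(overlap))
```

In this example the 10 bp starting overlap is too AT-rich to reach the requested Tm, so it is extended to 12 bp.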

Extension primers for creation of complementary overlaps

For sequences without existing complementary overlaps on both ends, a pair of primers will
be created to generate complementary overlaps via Primer Extension PCR, as required. If both
ends are complementary, no primers will be generated. Primer design uses non-stringent con-
ditions, which may result in poor quality primers. We recommend that you check the generated
primers before ordering them.

Extensions will be added to the primer corresponding to the neighboring sequence, to generate
complementary overhangs. Primers are generated only for insert sequences, as it is assumed
that the vector should stay unmodified. For this reason the extension length of primers ex-
tending to the vector will be the full specified minimal overlap length, whereas extensions on
primers between two inserts, will each be half of the total overlap length. If you wish to manu-
ally add modifications to primer extensions, these must be annotated onto the insert sequence
as type ‘Gibson Primer Extension’, otherwise they will be included within the binding region.
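
The rule for extension lengths (full overlap on the insert next to the vector, half on each side between two inserts) can be sketched as follows. This is an illustration under stated assumptions: parts are hypothetical `(name, is_backbone)` pairs in ligation order, and for an odd overlap length the extra base is arbitrarily given to the left-hand insert.

```python
# Illustrative sketch: part representation and rounding of odd lengths are
# assumptions made for this example.
def extension_lengths(parts, min_overlap, circular=False):
    """Return (left_name, left_extension, right_name, right_extension)
    for every junction between adjacent parts."""
    junctions = list(zip(parts, parts[1:]))
    if circular and len(parts) > 1:
        junctions.append((parts[-1], parts[0]))
    result = []
    for (left_name, left_is_vector), (right_name, right_is_vector) in junctions:
        if left_is_vector:
            # The vector stays unmodified: the insert carries the full overlap.
            left_ext, right_ext = 0, min_overlap
        elif right_is_vector:
            left_ext, right_ext = min_overlap, 0
        else:
            # Two inserts share the overlap between their primers.
            left_ext = (min_overlap + 1) // 2
            right_ext = min_overlap - left_ext
        result.append((left_name, left_ext, right_name, right_ext))
    return result


parts = [("vector", True), ("insert A", False), ("insert B", False)]
for junction in extension_lengths(parts, 30, circular=True):
    print(junction)
```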

The melting temperature (Tm) for the annealing region between the neighbouring sequences is
calculated using the Tm characteristics setting in Primer Options. In many cases the Phusion
DNA polymerase is used, for which it is recommended to use the Tm formula of Breslauer et
al. 1986 (See section 13.1.3).

For very short or long extensions, Primer3 might be unable to calculate a Tm. If Primer3 fails,
Geneious will calculate the Tm using formula 5.1 (for short sequences) or 5.2.

The primers generated in your Gibson Assembly will be listed in the Report Document, along
with the calculated characteristics and any errors that occurred during the primer generation
process. Furthermore any modifications (recession or maintaining overhangs, adding exten-
sions to primers) are shown at the beginning of the document.

14.7 Parts Cloning

Parts, or combinatorial, cloning refers to the generation of multi-part constructs from libraries
of genetic elements, such as regulators, gene coding regions and terminators.

The Parts Cloning tool in Geneious will concatenate libraries of genetic elements outputting
all possible combinations. Each library of genetic elements must be contained in a separate
sequence list. To simulate Parts cloning in Geneious, select all sequence lists and individual
sequences you wish to include, then select Parts Cloning under the Cloning menu.

Each part is represented by a tag in the construct layout panel - drag and drop these into the
order you wish to ligate them. Sequence lists are shown as brown boxes, and individual
sequences are grey.

If your assembly includes a vector, set this under Backbone and tick the Circular Product box
if your final product will be circular. Note that your vector must be linearised at the insertion
point prior to running Parts Cloning. If you are not including a vector, set Backbone to None.

Figure 14.7: Parts cloning example

The bottom panel shows the number of combinations that will be output from the assembly of
parts. If Order results by is set to None, all combinations will be output into a single sequence
list. If Order results by is set to a particular part, a separate sequence list will be created per
specified part, with all constructs containing that part.
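
The number of constructs produced by Parts Cloning is the product of the library sizes. The sketch below shows the idea with `itertools.product`. It is not the Geneious implementation: the part names and sequences are made up, and grouping by the members of one chosen library is one reading of the Order results by option.

```python
# Illustrative sketch with made-up part names and sequences.
from collections import defaultdict
from itertools import product


def combine_parts(libraries, order_by=None):
    """Concatenate one member of each library, in ligation order.

    libraries: list of lists of (name, sequence) pairs.
    order_by:  index of the library to group results by, or None for a
               single list holding every combination.
    """
    groups = defaultdict(list)
    for combo in product(*libraries):
        name = "-".join(part_name for part_name, _ in combo)
        sequence = "".join(part_seq for _, part_seq in combo)
        key = "all" if order_by is None else combo[order_by][0]
        groups[key].append((name, sequence))
    return dict(groups)


promoters = [("pA", "TTGACA"), ("pB", "TATAAT")]
coding = [("gfp", "ATGGTG")]
terminators = [("t1", "TAA"), ("t2", "TGA")]

everything = combine_parts([promoters, coding, terminators])
print(len(everything["all"]))  # 2 x 1 x 2 combinations

by_promoter = combine_parts([promoters, coding, terminators], order_by=0)
print(sorted(by_promoter))
```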

14.8 Gateway® Cloning

Geneious Prime contains three operations to assist with Gateway® cloning. Gateway is a regis-
tered trademark of Invitrogen Corporation. The Gateway option under the Cloning menu will
perform a BP reaction and/or an LR reaction on the selected documents. If there is a mixture
of AttB/AttP and AttL/AttR sites on the input documents, it will perform a BP reaction on all
documents with AttB/AttP sites, followed by an LR reaction on the results of the BP reaction
and any input documents with AttL/AttR sites.

For example, to insert a PCR product with attB sites directly into a destination vector, select
the PCR product, a donor vector, and a destination vector. Geneious will first produce an entry
clone from the PCR product and donor vector, then react this entry clone with the destination
vector to produce an expression clone.

Annotate att sites...

This operation searches for AttB, AttP, AttL and AttR sites and annotates them on your se-
quence.

Add AttB Sites to PCR product

This operation allows you to add AttB sites to a PCR product. It will work on the following
types of document:

• A PCR product. AttB sites will be appended to the PCR product.

• A document with primer binding sites annotated. If there is more than one pair, Geneious
will ask you which pair to use. The PCR product will be extracted and AttB sites ap-
pended.

14.9 Golden Gate / Type IIS Cloning

Golden Gate is a method to conveniently digest and ligate multiple sequences with the same
Type IIS enzyme in a single reaction. For an overview of the cloning interface, see section 14.4.

To run a Golden Gate cloning operation, optionally select a Backbone sequence, along with
one or more Insert sequences, and then specify the enzyme you wish to use for the reaction.
You can adjust the order in which the sequences will be ligated and optionally select primers
for use in the ligation reaction in the Construct Layout Panel. See Figure 14.9 for an example.

Enzyme: Select a single Type IIS restriction enzyme to use in the ligation reaction. The set
of available enzymes includes those commercially available Type IIS enzymes with an
effective recognition site of at least 4 bases which produce an overhang of at least 3 bases
on either strand. Only enzymes with 2 or fewer cut sites in each of the selected sequences
may be used.

Construct Layout: Sequences will be displayed as Tags in the Construct Layout Panel, in the
order in which they will be ligated. Overhangs for each sequence, corresponding to the
selected restriction enzymes, will be displayed between sequence tags. Pairs of overhangs
will be green if they are compatible or red if they are not. Unused overhangs that will be
generated at either end of a linear construct will be shown in grey.
To select primers or reverse complement a sequence, click on the ▼ at the right of the
sequence tag.
Ligation reactions will be shown on the sequence tags as follows:

Pre-digested if the sequence has existing overhangs on both ends


Cut by [enzyme] if the sequence will be cut with the selected enzyme
PCR Product if restriction sites for the chosen enzyme will be introduced via PCR and at
least one of the existing primers on the sequence is used
Full-length PCR if newly generated primers will amplify the full sequence and append
restriction sites at both ends
Where different reaction types are used at either end (Overhang, Enzyme or Primer),
this will also be indicated

14.9.1 Sequence ordering, rules and assumptions

When the Golden Gate options are launched, or Auto-arrange is clicked, Geneious will detect
existing type IIS restriction sites, overhang annotations, primer annotations and blunt ends on
the selected sequences, and will try to produce the optimal arrangement based on the available
overhangs. Geneious will only consider cut sites where the recognition site will be cut out, or
primers that point towards each other. The ligation order and selected primers may be changed
manually in the Construct Layout Panel.

Figure 14.8: The Golden Gate cloning setup window

Geneious will automatically arrange sequences using the following rules, in order of
precedence:

1. Existing type IIS cut site(s): If Geneious detects a pair of appropriately orientated type IIS
sites with unique overhangs, these will be selected and no PCR primers will be designed.
If only one type IIS cut site is detected, then Geneious will design a PCR primer that
incorporates the site, and will design a second primer to introduce a compatible opposite
orientation restriction site based on rules 3-6.

2. Existing Overhang(s): If Geneious detects a pair of valid overhangs compatible with the
specified type IIS site, then these will be selected. No PCR primers will be designed.

3. Existing Primer Bind annotation(s) with valid type IIS cut site(s) on the extension: If
Geneious detects a pair of inward facing Primer Bind annotations with extensions that
produce valid, compatible, type IIS restriction sites, these will be used. New primers will
not be designed. If Geneious detects a single Primer Bind annotation with a suitable type
IIS site Geneious will use this, and will design a second primer to introduce a compatible
opposite orientation restriction site based on rules 4-6.

4. Existing Primer Bind annotations with other extensions: If Geneious detects a Primer
Bind annotation with an extension which does not contain a valid type IIS site, this primer
will be used with an addition to the primer extension to introduce a cut site for the se-
lected type IIS restriction enzyme. This will be done by further extension to the 5′ termi-
nus of the existing primer.

5. Existing Primer Bind annotation(s) without extension(s): If a Primer Bind annotation
without an extension is found, then an extension will be appended to introduce a cut site
for the selected type IIS restriction enzyme, resulting in a new primer sequence.

6. Blunt ends: If Geneious finds a blunt end, and no suitable type IIS sites or Primer Bind
annotations are present, then a primer with an appropriate type IIS site extension will be
designed. The fusion point will be the terminus (or termini) of the blunt end fragment.

Important:

• Removal of unwanted internal Type IIS sites: If one or more sequences have cut sites
for the specified type IIS restriction enzyme, then Geneious will assume that these sites
should be used in the assembly process and design a strategy accordingly. If any of the
sequences contain type IIS restriction sites that should not be used in the assembly then
these will first need to be engineered out of the fragment using PCR.
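
Because every Type IIS site in a sequence is assumed to take part in the assembly, it can help to count sites on both strands before starting. The following is a minimal sketch of such a scan. BsaI (recognition sequence GGTCTC) is used as the example enzyme, positions are 0-based, and the function names are hypothetical.

```python
# Illustrative sketch: scans for a recognition sequence on both strands.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_recognition_sites(seq, recognition):
    """Return sorted (position, strand) pairs for every occurrence of the
    recognition sequence, with positions given on the forward strand."""
    seq = seq.upper()
    recognition = recognition.upper()
    motifs = [(recognition, "+")]
    reverse = reverse_complement(recognition)
    if reverse != recognition:  # Type IIS sites are not palindromic
        motifs.append((reverse, "-"))
    hits = []
    for motif, strand in motifs:
        start = seq.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(motif, start + 1)
    return sorted(hits)


sites = find_recognition_sites("AAGGTCTCAACGTTTGAGACCAA", "GGTCTC")
print(sites)
# Sequences with more than two sites would need internal sites removed first.
print(len(sites) <= 2)
```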

14.9.2 Batch cloning

Batch Golden Gate cloning can be performed where one or more parts to be cloned is a list
of sequences. Each of the individual nucleotide sequences in the list will be inserted into a
separate product at the same insert position. If multiple sequence lists are used, all possible
combinations of products will be output.

Note that the sequences within each list must be in the same orientation and must use the same
overhangs.

Sequence lists are represented by the brown tags in the construct layout panel, and the number
of products that will be generated is listed below the viewer window. Products can optionally
be saved in a sequence list, by ticking the option save results as list.

14.10 TOPO® Cloning

TOPO Cloning lets you ligate a single fragment into a vector within only 5 minutes using the
natural activity of Topoisomerase I, which recognizes a specific motif, 5′-(C/T)CCTT-3′, on the
DNA. TOPO is a registered trademark of Invitrogen Corporation.
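
As an illustration, the (C/T)CCTT motif can be located with a short regular expression. This sketch only searches the strand it is given and is not how Geneious detects TOPO vectors. The function name is an assumption.

```python
# Illustrative sketch: forward-strand search for the Topoisomerase I motif.
import re

# A lookahead is used so that overlapping matches are also reported.
TOPO_MOTIF = re.compile(r"(?=([CT]CCTT))")


def find_topo_sites(seq):
    """Return (0-based position, matched bases) for each (C/T)CCTT motif."""
    return [(m.start(), m.group(1)) for m in TOPO_MOTIF.finditer(seq.upper())]


print(find_topo_sites("GGCCCTTAAGTCCTTGG"))
```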

The TOPO cloning function under the Cloning menu allows you to insert linear fragments
into either linear TOPO vectors (when a TOPO-site is present at the extremities) or into circular
TOPO vectors. You can select as many sequences at once as you like; they will be ligated into
each other in a batch operation.

• Three different options (TA, Blunt or Directional cloning) are shown at the top. If Direc-
tional is selected, you can define an overlap sequence. If this field is blank, it has the
same effect as Blunt cloning.

• The field below shows which of the selected sequences have been detected as vectors, all
other sequences are inserts.

• If any complications occur, e.g. when more than one TOPO site is detected or when a
linear sequence with a TOPO site is selected, a message will be shown in this box, also showing
how the corresponding sequence will be processed if you click OK.

• The resulting sequences will be optionally saved in a sub-folder.

14.11 Copy-paste cloning

Cloning reactions can also be simulated using basic copy and paste functions in Geneious
Prime. To use this, select the region you wish to insert into your vector and use Ctrl-C
(Command-C on Mac) or Edit → Copy to copy it. Then select the region in the vector to be replaced, right-click and
choose Paste with Active Link or use the shortcut key Ctrl-Alt-Shift-V (Command-Alt-Shift-V
on Mac) (see Figure 14.11).

A new document containing the product of the copy-paste will then be created, and this docu-
ment will be actively linked to the two parent documents (see Chapter 6).

Figure 14.9: Batch golden gate cloning example with a list of codon-optimized CDS sequences

Figure 14.10: TOPO Cloning options dialog.

Figure 14.11: Right-click menu on the destination sequence, showing Paste with Active Links
option.

Note: Paste with Active Link does not check for compatible overhangs. It is up to you to ensure
these are correct.

14.12 Analyze Silent Mutations

Analyze Silent Mutations under the Cloning menu finds restriction sites that could be intro-
duced by one or more synonymous mutations to a coding sequence.

To specify the set of restriction enzyme sites you want to introduce, first create a new enzyme
set containing the restriction enzymes you require. See section 14.3 for how to create a new
enzyme set. Then open the Analyze Silent Mutations options and select this list in the Restric-
tion Enzymes drop down.

To limit the number of substitutions required to generate a restriction site, check Limit maxi-
mum substitutions to and set it to the number you require.

Analyze Silent Mutations can be run on either the entire sequence, or a selected region. If
no region is selected, Geneious will use the first reading frame starting from base 1 of the
sequence to determine the synonymous substitution sites. If you use a selected region of se-
quence, Geneious will use frame 1 of the selected region, unless you select an annotation with a
different frame specified in the annotation properties (e.g. a CDS annotation with the qualifier
codon_start=2 will use frame 2).
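
The underlying idea of the analysis can be sketched in a few lines: try the recognition sequence at every position, count the substitutions needed, and keep only positions where the translation is unchanged. This is an illustration, not the Geneious algorithm. It assumes the standard genetic code, an unambiguous recognition sequence, and a coding sequence read in frame 1 from its first base.

```python
# Illustrative sketch: standard genetic code, frame 1, unambiguous sites only.
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq):
    """Translate complete codons in frame 1; a trailing partial codon is ignored."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def potential_sites(cds, site, max_subs=1):
    """Return (0-based position, substitutions needed) for every placement of
    `site` that needs 1..max_subs changes and leaves the protein unchanged."""
    cds, site = cds.upper(), site.upper()
    protein = translate(cds)
    hits = []
    for start in range(len(cds) - len(site) + 1):
        window = cds[start:start + len(site)]
        subs = sum(a != b for a, b in zip(window, site))
        if not 1 <= subs <= max_subs:
            continue
        mutated = cds[:start] + site + cds[start + len(site):]
        if translate(mutated) == protein:
            hits.append((start, subs))
    return hits


# Changing TTT (Phe) to TTC (Phe) creates an EcoRI site (GAATTC).
print(potential_sites("ATGGAATTTTAA", "GAATTC"))
```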

Once the analysis has run, green Potential Restriction Site annotations will be generated on
the sequence. To see the modification required to create a particular restriction site, mouse
over the annotation. To apply the modification to the sequence, first click Allow Editing. Then
click on the potential restriction site annotation to select it, right click and choose Apply po-
tential restriction site from the drop down menu. The sequence will then be edited with the
modification required, and the potential restriction site annotation will be changed to a regular
restriction site annotation. Figure 14.12 shows an example site before and after applying the
change to create the new site.

Note that if an intact restriction site is present in the selected region then a blue Restriction Site
annotation for the site will be added to the sequence.

To remove existing restriction sites from a sequence using synonymous substitutions, use Op-
timize Codons (see Section 14.13).

14.13 Optimize Codons

The Optimize Codons. . . operation is accessed via the Cloning button on the Toolbar, or via
Tools → Cloning in the main menu. This tool allows you to adapt a nucleotide sequence to the
genetic code and “preferred” synonymous codon usage of a particular expression host.

Figure 14.12: A. Potential restriction site annotated on a sequence after running Analyze silent
mutations. B. Shows the same site after applying the modification.

The resulting sequence is optimized to avoid or reduce the use of codons that rarely occur in the
highly expressed genes of the expression host, increasing the likelihood that the gene product
will be expressed at a higher level if the optimized sequence is synthesized and recombinantly
expressed in the expression host.

In addition, the tool can introduce synonymous codon changes to eliminate “forbidden” se-
quence motifs, such as homopolymers, recognition sites for a specific set of restriction enzymes,
or other undesirable sequences. Simultaneous sequence optimization while avoiding including
or introducing forbidden motifs uses the algorithm described by Condon and Thachuk 2012.

“Preferred” codons are specified using an appropriate codon usage table (CUT) that reflects
synonymous codon frequencies for coding sequences known (or predicted) to express at high
levels in the expression host.

Geneious Prime provides a number of CUTs you can use, or you can import and use your
own custom CUT, see How do I import a custom codon usage table?. We recommend users
consult the literature for advice on appropriate CUTs to use for a particular expression host.
In most cases you should not use CUTs compiled from whole genome data, such as those ob-
tained from https://www.kazusa.or.jp/codon/. In general, “whole genome” CUTs are biased
towards codons used by poorly expressed proteins and will be less likely to yield a CDS that
will give optimal high-level expression.

This tool outputs either the Fraction (frequency) or relative adaptiveness (w) of each optimized
codon, calculated based on the selected CUT (see Sharp and Li, 1987). In Geneious Prime 2020
and onwards, optimized codons are selected in proportion to their relative frequencies among
synonymous codons (i.e., fraction entries in the chosen CUT).
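
The two values can be illustrated for a single amino acid. In the sketch below, Fraction is a codon's share of all synonymous codons, and relative adaptiveness (w) is its frequency divided by that of the most used synonymous codon, following Sharp and Li (1987). The counts are invented for the example and do not come from any real codon usage table.

```python
# Illustrative sketch with made-up codon counts for one amino acid.
def codon_metrics(counts):
    """Return {codon: (fraction, relative_adaptiveness)} for a set of
    synonymous codons given their observed counts."""
    total = sum(counts.values())
    highest = max(counts.values())
    return {
        codon: (count / total, count / highest)
        for codon, count in counts.items()
    }


lysine_counts = {"AAA": 75, "AAG": 25}  # hypothetical counts
metrics = codon_metrics(lysine_counts)
for codon, (fraction, w) in metrics.items():
    print(codon, round(fraction, 2), round(w, 2))
```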

You can configure the following options (Figure 14.13):

Optimize Selected Region or Full Sequence: Select Full sequence to optimize the full length
sequence. If you wish to optimize a portion of a sequence, for example a CDS annotated
as part of a larger sequence, then select the CDS annotation prior to running Optimize
Codons. . . . If you manually select a region, you must ensure the selection is in-frame
with the coding sequence.
Source Genetic Code: Lets you select the genetic code to be used when translating the source
sequence/s. If you have selected multiple source sequence documents with different
genetic codes, the choice “Multiple Values” will be available to indicate that the genetic
code associated with each document should be used. You can select a genetic code other
than the one that is shown as the default for the selected input documents if you want to
override the default.
Codon Usage Table: Lets you select a CUT for the target expression host. You can import
custom codon usage tables in GCG CodonFrequency and EMBOSS cusp formats. See
How do I import a custom codon usage table?. CUT formats supported by Geneious
include amino acid translations for each codon. If this information does not correspond
to the genetic code of the expression host, you can specify an Override Genetic Code in the
Advanced options.
Optimize All codons in Sequence/Selected Region: Select this option to generate a new se-
quence by randomly choosing among synonymous codons according to the specified
codon usage table. If you choose to Eliminate Rare Codons, codons with relative adaptive-
ness or fraction values above the threshold will be randomly sampled according to their
relative usage fraction. The rare codon threshold can be set in the Advanced options.
Optimize Rare Codons Only: Select this option to optimize only rare codons with relative
adaptiveness or fraction values below the threshold by randomly sampling among syn-
onymous codons with values above the threshold according to their relative usage frac-
tion.
Forbidden Motifs: Lets you specify sequences to avoid including or introducing in the result.
If you choose to Forbid Restriction Sites, the result will not include any sites that match
recognition sequences of the selected enzymes. Select Forbid Custom Motifs to specify
arbitrary sequence motifs to forbid in the result.
Save Result As: Creates a new sequence containing the optimized bases plus an annotation
track detailing the change at each optimized codon. If you choose to create two or more
co-optimized copies, a sequence list containing multiple different optimized sequences
will be generated.

Annotate Sequence Without Changing Nucleotides: Does not change the sequence, but adds
an annotation track on the selected sequence with an annotation on each codon that
would be changed by optimization.

Advanced Options:

Override Genetic Code: Lets you specify the genetic code of the target organism, if it differs
from the genetic code implied in the CUT. Ensure you select the correct genetic code for
your target expression host.

Rare Codon Threshold: A number between 0 and 1. Set whether you wish to use the fre-
quency (Fraction) or Relative Adaptiveness of a codon as a threshold; codons with values
less than this threshold are candidates to be replaced by higher-value codons that trans-
late to the same amino acid. If a fraction threshold is specified and no codons with high
enough values translate to the correct amino acid, the highest fraction synonymous codon
will be used even though it falls below the threshold.

Restrict Maximum Length of Homopolymer Repeats: Specify the maximum allowable length
of repeats of the same nucleotide.

Maximize Distance Between Co-optimized Copies: When generating two or more co-optimized
result sequences, this option attempts to use a different codon in each result whenever
possible. Rare codons will not be introduced, but this option will cause codon usage
frequencies to deviate from the target distribution.

Specify Random Number Seed: By default, Optimize Codons will produce a different
sequence each time it is used on the same input sequence. Use this option to override
this behavior: if the same seed is used, Optimize Codons will generate the same result
each time for the same input sequence (providing all the same options are used). The seed
used for Optimize Codons can be found in the annotation track properties (visible when
mousing over the track name) or, if saving a new document, in the Document History, in
the Info tab above the Sequence View.
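
The combined effect of the Rare Codon Threshold and the random number seed can be sketched as weighted sampling. This is a simplified illustration, not the Geneious algorithm: it ignores forbidden motifs, uses invented fraction values, and the function names are assumptions. Codons at or above the threshold are sampled in proportion to their fraction. If none qualify, the highest-fraction codon is used.

```python
# Illustrative sketch with made-up fractions; forbidden motifs are ignored.
import random


def pick_codon(fractions, threshold, rng):
    """Sample one codon in proportion to its fraction, skipping rare codons."""
    allowed = [codon for codon, f in fractions.items() if f >= threshold]
    if not allowed:
        # No codon reaches the threshold: fall back to the most frequent one.
        return max(fractions, key=fractions.get)
    weights = [fractions[codon] for codon in allowed]
    return rng.choices(allowed, weights=weights, k=1)[0]


def optimize(protein, usage, threshold, seed):
    """Back-translate a protein; the same seed gives the same sequence."""
    rng = random.Random(seed)
    return "".join(pick_codon(usage[aa], threshold, rng) for aa in protein)


usage = {  # hypothetical fraction values
    "M": {"ATG": 1.0},
    "K": {"AAA": 0.75, "AAG": 0.25},
    "S": {"TCT": 0.18, "TCC": 0.17, "AGC": 0.19,
          "TCA": 0.16, "TCG": 0.15, "AGT": 0.15},
}

first = optimize("MKS", usage, threshold=0.2, seed=1)
second = optimize("MKS", usage, threshold=0.2, seed=1)
print(first == second)
print(optimize("MKS", usage, threshold=0.3, seed=1))
```

With a threshold of 0.3 only AAA qualifies for lysine, and no serine codon qualifies, so serine falls back to its highest-fraction codon.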

Results display

After the analysis has finished, either a new document will be created containing the opti-
mized bases (if you choose to save the result as a new document), or optimized codons will be
annotated on the original sequence (if you chose to annotate sequence without changing the
nucleotides). With either option, an annotation track on the sequence contains the details about
each optimized codon, including the codon change, synonymous codons and the Fraction or
Relative Adaptiveness values for those codons (depending on what was set in the Rare Codon
Threshold option), see Figure 14.14.

Figure 14.13: Codon Optimization options dialog.

If you have chosen to annotate the sequence but not change the nucleotides, you can change the
sequence to the optimized codon at a later time by right clicking on the annotation and choos-
ing Apply Optimized Codon. Optimized codons applied to the sequence can also be reverted
to the original by right-clicking the annotation and selecting Revert Optimized Codon.

Figure 14.14: Codon Optimization results, showing annotation track of optimized codons
Chapter 15

CRISPR

15.1 CRISPR site finder

The CRISPR system is an RNA-guided endonuclease technology for gene editing. This system
uses a guide RNA (gRNA) of around 20 bp, next to an enzyme-specific PAM (Protospacer
Adjacent Motif), to direct the enzyme complex to the cleavage site. The Find CRISPR Sites
tool searches for gRNA (“CRISPR”) sites in your selected sequences, and scores them based on
on-target sequence features and off-target interactions.

Find CRISPR Sites can be run on any number of sequences and sequence list documents,
including on selections within the documents. For best performance the target sequence for
each document should be limited to 1000 bp.

To use the tool, select your target sequences and click Find CRISPR Sites in the Annotate and
Predict menu. Check Anywhere in sequence or Selected region depending on whether you
are using a selection or not, and choose the PAM Site Location (see below). Enter the Target
and PAM Site motifs you want to search for in the CRISPR site panel (for syntax help, hit the
help button to the top right of the panel). You can then select your scoring and pairing options
and hit OK. Any CRISPR sites found will be annotated back on to your original sequences, with
the associated scores listed in the label on the CRISPR annotation.

PAM Site Location

Different CRISPR enzymes recognize different PAM site motifs. The CRISPR-Cas9 family rec-
ognizes PAM Sites on the 3′ end of the guide sequence, but the more recently discovered
CRISPR-Cpf1 enzymes (Zhang et al. (2016)) recognize a 5′ PAM site. Change the PAM Site
location option to match the orientation of the enzyme you are using. Note that activity (on-
target) scoring methods for Cpf1 sites are not currently available. If you have a particular
scoring method you would like to use for Cpf1, please contact Support.
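
As a rough illustration of what the PAM site location changes, the sketch below scans the forward
strand of a sequence for targets with either a 3′ PAM (for example NGG, as used by Cas9) or a
5′ PAM (for example TTTN, as used by Cpf1). The function name, the default target length of 20
and the regular-expression approach are assumptions made for this example. It is not the search
that Geneious Prime performs, and it ignores the reverse strand and all scoring.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_sites(seq, pam, pam_side="3prime", target_len=20):
    """Return (target_start, target, pam) tuples found on the forward strand.

    target_start is 0-based. Hypothetical helper, for illustration only.
    """
    seq = seq.upper()
    pam_re = "".join(IUPAC[base] for base in pam.upper())
    target_re = "[ACGT]{" + str(target_len) + "}"  # targets with ambiguous bases are skipped
    # A lookahead is used so that overlapping sites are all reported.
    if pam_side == "3prime":
        pattern = "(?=(" + target_re + ")(" + pam_re + "))"
        target_group, pam_group = 1, 2
    else:
        pattern = "(?=(" + pam_re + ")(" + target_re + "))"
        target_group, pam_group = 2, 1
    return [
        (m.start(target_group), m.group(target_group), m.group(pam_group))
        for m in re.finditer(pattern, seq)
    ]


cas9_like = find_sites("ACGTACGTACGTACGTACGT" + "TGG", "NGG", pam_side="3prime")
cpf1_like = find_sites("TTTA" + "ACGTACGTACGTACGTACGT", "TTTN", pam_side="5prime")
```

With a 3′ PAM the motif is matched immediately after the target, and with a 5′ PAM immediately
before it, which is the only difference between the two calls.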


Figure 15.1: The Find CRISPR Sites setup options

Activity Scoring

Activity, or on-target, scoring looks at the sequence features of the CRISPR site itself, and
compares them to an experimentally determined model. The model then scores the site based on
its predicted level of activity.

Geneious Prime offers activity scoring for Cas9 CRISPR sites using the methods from Doench
et al. (2014) and Doench et al. (2016). The Doench et al. (2014) method analyzes the one- and
two-base features of the gRNA, as well as the GC content, to generate the score. Scores are
between 0 and 1, with a higher score denoting higher expected activity. Note that CRISPR sites
with ambiguous bases will have an undefined score and be colored blue.

Doench et al. (2016) also accounts for the sgRNA location within a protein sequence, melting
temperatures of features of the sgRNA and the sgRNA sequence as a whole, and counts of
single nucleotides and dinucleotides in a position-independent manner. Scores are between 0 and 1,
with a higher score denoting higher expected activity. CRISPR sites with ambiguous bases will have
an undefined score and be colored blue. The Doench et al. (2016) algorithm requires Python to
run; on Windows this will install automatically the first time that the scoring algorithm is used,
and on macOS and Linux the system installation of Python will be used (Python 2.7 or 3.6-3.9 is
required).

Specificity Scoring

Use these options to score your CRISPR sites based on how unique they are and how likely
they are to cause off-target effects. Scores are between 0 and 100, with a higher score denoting
better specificity and less off-target activity.

When finding CRISPR sites within a Selected region of your sequence, Geneious will check
for off-target interactions in the unselected regions of your sequence. If Score against an off-
target database is selected, Geneious will also search sequence documents in the selected folder
for off-target interactions. While doing so, Geneious will skip any sequences in the off-target
database that are exact duplicates of the target sequence, and report the intervals it skipped at
the end of the operation.

Use Maximum mismatches allowed against off-targets and Maximum mismatches allowed
to be indels to tailor how similar off-target sites must be to the CRISPR site before they con-
tribute to the score.

If multiple sequences are given as the targets, the option to Score each sequence against all
other selected sequences is available. This checks for off-target interactions between the target
sequences and incorporates them into the off-target scores.

The scoring algorithm for CRISPR sites with 3′ PAMs was proposed by Zhang et al. (2013).
Mismatched bases between the CRISPR site and off-target sites have different weightings in the
final score, which are experimentally determined and based on their position. Geneious also
recognizes a 10 bp seed region immediately next to the PAM, which can tolerate a maximum
of 2 mismatches (Cho et al. (2014)).

For CRISPR sites with 5′ PAMs, a modified version of the Zhang et al. (2013) algorithm is used.
Geneious recognizes a 6 bp seed region next to the PAM, which tolerates up to 2 mismatches
and has a high mismatch weighting in the score. From position 7 to 18, mismatches have an
average weighting, and after position 18, mismatches do not contribute to the Specificity score.
These weightings were not experimentally determined and were chosen to give a similar score
spread to the Zhang et al. (2013) method. See Kim et al. (2016) and Kleinstiver et al. (2016) for
details on the seed, trunk, and non-weighted regions used.
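
The three regions described for 5′ PAM sites can be sketched as a position-dependent mismatch
penalty. The region boundaries (positions 1 to 6, 7 to 18, and beyond 18, counted from the PAM)
and the limit of 2 seed mismatches come from the text above. The numeric weights, the function
name and the choice to return None for a site that exceeds the seed limit are assumptions made
for this example. The way Geneious Prime converts mismatches into the final 0 to 100 score is
not described here and is not reproduced.

```python
SEED_END = 6             # positions 1-6 next to the PAM form the seed region
TRUNK_END = 18           # positions 7-18 carry an average weighting
MAX_SEED_MISMATCHES = 2  # the seed tolerates up to 2 mismatches
SEED_WEIGHT = 1.0        # hypothetical value standing in for a "high" weighting
TRUNK_WEIGHT = 0.5       # hypothetical value standing in for an "average" weighting


def mismatch_penalty(guide, off_target):
    """Weighted mismatch penalty between a guide and a candidate off-target site.

    Both sequences are written starting at the base next to the 5' PAM, so
    index 0 is position 1. Returns None when the seed has too many mismatches,
    which this sketch interprets as "not counted as an off-target".
    """
    if len(guide) != len(off_target):
        raise ValueError("guide and off-target must be the same length")
    seed_mismatches = 0
    penalty = 0.0
    pairs = zip(guide.upper(), off_target.upper())
    for position, (a, b) in enumerate(pairs, start=1):
        if a == b:
            continue
        if position <= SEED_END:
            seed_mismatches += 1
            penalty += SEED_WEIGHT
        elif position <= TRUNK_END:
            penalty += TRUNK_WEIGHT
        # Mismatches after position 18 do not contribute.
    if seed_mismatches > MAX_SEED_MISMATCHES:
        return None
    return penalty


guide = "A" * 20
seed_hit = mismatch_penalty(guide, "C" + "A" * 19)
trunk_hit = mismatch_penalty(guide, "A" * 6 + "C" + "A" * 13)
tail_hit = mismatch_penalty(guide, "A" * 19 + "C")
```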

Scoring against an off-target database will significantly increase the time taken for the operation to
complete.

Pair CRISPR sites

The Cas9 D10 Nickase enzyme induces a single stranded break at the target instead of the usual
double stranded break. This enzyme can be used with a pair of CRISPR sites on complementary
strands to induce a double stranded break with sticky ends. This method also minimises off-
target effects because any that occur are single-stranded, and therefore repaired by the cell with
high fidelity. This process is described by Zhang et al. (2013).

Select Pair CRISPR sites to only return guides that are within range of another guide on the
opposite strand. You can specify the Maximum overlap of paired sites and the Maximum
allowed space between the paired sites, which are measured from the 5′ end of the CRISPR
sites. Optimal CRISPR pairs will be linked when they are annotated onto your target sequence.

Results output

CRISPR sites are returned as annotations on your original sequences (see Figure 15.2). Hover-
ing the mouse over one of these annotations will bring up a tooltip about the CRISPR site: its
Target Sequence, PAM (Protospacer Adjacent Motif), and any scores that were calculated for
the CRISPR site.

If the CRISPR sites were scored through off-target analysis, the five most similar off-target sites
are also included in the annotation. Mismatches between the CRISPR site and its off-target sites
are highlighted in red, and insertions in the off-target site are red and underlined.

CRISPR site annotations can also be colored by their scores. Choose the score to color by using
the Color CRISPR sites by option. You can change which score is used for coloring later by
selecting Color by / heatmap in the annotation Track options. The colors move from green, for
good scores, through yellow, and down into red, for poor scores.

Figure 15.2: Putative CRISPR gRNA sites, colored according to their Doench et al. 2016 Activity
Score

15.2 Analyze CRISPR Editing Results

To analyze the results of your CRISPR editing experiment, go to Annotate and Predict → An-
alyze CRISPR Editing Results. This operation measures the frequency of variants around the
CRISPR editing site by mapping reads to the target sequence. Mapped reads are collapsed
into clusters based on their difference from the reference and the frequency of each cluster is
reported in the alignment (for a description of the algorithm, see 15.2.3).

This operation is designed for amplicon sequences produced from either Sanger or NGS
sequencing. Sequences must be in a sequence list, and paired reads should be merged using
Sequence → Merge Paired Reads prior to running this operation.

15.2.1 Operation options

Figure 15.3: Analyze CRISPR editing results options dialog.



Reference Sequence

The reference sequence should be a short sequence spanning the CRISPR editing site, of similar
length to the reads. This sequence is normally the unedited, or target sequence for calling vari-
ants against. The reference sequence can be selected together with the reads prior to opening
the operation, or can be set from the operation dialog.

Workflows: The reference sequence option is not available from workflows. If this operation is
included in a workflow, the reference sequence must be provided as input to the workflow.
Alternatively, you can insert it into the workflow using the ’Add document chosen when running
workflow’ option.

Variants of Interest

Only the portion of each read which spans the specified region of interest will be used for
variant calling. This region can be either a specified number of bases around the probable cut
site (default 50bp), the region currently selected in the sequence viewer, or the entire range
covered by the reads.

Reads will be entirely excluded from variant calling if they match poorly on the ends of the
reference sequence range matched by 99% of reads. See the algorithm overview for details.

Minimum Variant Frequency

The minimum variant frequency setting is used to exclude low frequency variants from the
results displayed. Note that this setting does not change the reported frequencies of variants,
i.e. the frequencies will be a percentage of both included and excluded variants.
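
The point that filtering does not rescale the reported frequencies can be shown with a short
sketch. The function name and the example read counts are invented for illustration, and this is
not the code used by the operation.

```python
def reported_variants(cluster_counts, min_frequency_percent):
    """Return {variant: frequency in percent} for variants at or above the threshold.

    Frequencies are computed from ALL reads, including those in variants that
    are later hidden, so the reported values need not add up to 100.
    """
    total = sum(cluster_counts.values())
    frequencies = {name: 100.0 * n / total for name, n in cluster_counts.items()}
    return {
        name: freq
        for name, freq in frequencies.items()
        if freq >= min_frequency_percent
    }


# Hypothetical read counts per variant cluster.
counts = {"no variant": 90, "1 bp deletion": 8, "substitution": 2}
shown = reported_variants(counts, min_frequency_percent=5)
```

In this example the substitution is hidden, and the two remaining variants are still reported at
90% and 8% rather than being rescaled to fill 100%.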

Translation Frame

The translation frame is used for calculating variant effects on the protein. The genetic code is
obtained from the reference sequence properties which can be set in the Info tab or Sequence
View.

Sequencing Error Handling

Most of the time we can have reasonable confidence whether or not a rare variant is likely due
to sequencing error and either correctly collapse it into the cluster it belongs to or correctly
keep it separate. The setting Collapse sequencing errors with confidence controls what to do
in borderline cases.

• If this value is 0, then reads will be collapsed (combined into a single cluster) if there is
greater than a 50% chance the variant is due to a sequencing error.

• A positive value for this setting will err on the side of not collapsing reads so that true
rare variants are unlikely to be excluded from the final results.

• A negative value for this setting will err on the side of collapsing reads so that variants
which are not real are unlikely to appear in the final results.

The value is on a log scale, so a value of +10 (or -10) means reads are collapsed (or not collapsed)
with 90% confidence it is correct to do so, ±20 means 99% confidence, and ±30 means 99.9%
confidence.

Turning off this setting is equivalent to using a large positive value. For sequencing reads
without Phred quality scores, each base is assumed to have a quality score of 20 (99% confidence).
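
The 90%, 99% and 99.9% figures follow the same base-10 log scale as Phred quality scores. A
minimal sketch of that conversion is given below, assuming the standard Phred definition in which
a quality Q corresponds to an error probability of 10^(-Q/10). The function name is made up for
this example. How Geneious Prime maps setting values between the listed points (for instance
near 0, where the threshold is described as 50%) is not stated here.

```python
def phred_to_confidence(quality):
    """Probability that a base call with the given Phred quality is correct."""
    error_probability = 10.0 ** (-quality / 10.0)
    return 1.0 - error_probability


# Quality 20 is the value assumed for reads without quality scores.
default_confidence = phred_to_confidence(20)
```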

15.2.2 Results display

The CRISPR editing analysis tool produces an assembly document showing one representative
of each cluster mapped to the reference sequence. The frequency of the variant, and its effect
(either No nucleotide variant, Frame shift, In frame stop codon, In frame protein variant, or In
frame silent variant) are shown in the sequence name. A summary of these results can also be
viewed in the assembly description and the Info tab.

Each variant is also annotated on the mapped read, but annotations are turned off by default.
To view the annotations, go to the Annotations and Tracks tab to the right of the viewer and
turn on the variant annotations. The annotations contain the same information as the Info
tab, such as variant effect and frequency. Using the annotations table, this information can be
exported in tabular format.

15.2.3 Algorithm Overview

1. A range within the reference sequence is identified by mapping all reads to the refer-
ence sequence and trimming them such that no more than 2 out of 10 bp at either end
mismatch. They are then further trimmed to ensure that no more than 50% of bases of
shorter lengths mismatch. This is to filter out primers on the ends of sequences. The
range of the reference sequence used is that which is covered by 99% of the trimmed
reads which map.

2. Untrimmed reads which have more than 2 mismatches out of 10 bp at either end of this
reference sequence range are discarded. This is to ensure that reads which are very poor
quality on either end (and therefore not true variants from the reference sequence) do not
interfere with variant calling.

Figure 15.4: Assembly output from analyzing CRISPR editing results

3. If the Variants of Interest option is set to a specified number of bp around the cut site,
the region of interest is calculated as follows:

(a) Reads are sampled randomly, taking the intersection of the range of variants within
those reads until it converges to a small range.
(b) Repeat the above step 100 times.
(c) The ranges converged on are sorted by decreasing frequency.
(d) Take the union of the most frequent ranges until that covers 1/6 of the region of in-
terest size, but also include ranges with frequency over 1/5 of the highest frequency
as long as the combined range does not exceed 3/4 of the region of interest size.
(e) The full region of interest is centered on this range.

4. Read alignments are trimmed to the region of interest, and collapsed into identical clus-
ters. This means that reads which only differ from each other outside of the region of
interest will be considered identical.

5. Clusters are sorted by decreasing frequency, and lower frequency clusters are merged
into higher frequency clusters if it is likely the lower frequency cluster came from the
higher frequency cluster due to sequencing error. Only substitution errors are considered
during this merging process. Indel sequencing errors are never collapsed.
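
Step 4 and the sorting at the start of step 5 can be sketched as follows. The sketch assumes a
simplified representation that is not taken from Geneious Prime: each read is a string already
aligned to the reference and of the same length, with deletions written as "-", so insertions
cannot be represented. The function name is hypothetical, and the merging of clusters that
differ only by likely sequencing errors is not attempted.

```python
from collections import Counter


def collapse_reads(aligned_reads, roi_start, roi_end):
    """Trim aligned reads to the region of interest and collapse identical ones.

    roi_start and roi_end are 0-based, end-exclusive reference coordinates.
    Returns (trimmed_sequence, read_count, percent) tuples sorted by
    decreasing frequency.
    """
    trimmed = (read[roi_start:roi_end] for read in aligned_reads)
    counts = Counter(trimmed)
    total = sum(counts.values())
    return [(seq, n, 100.0 * n / total) for seq, n in counts.most_common()]


reads = ["AACGTT", "AACGTA", "AAC-TT", "TACGTT"]
clusters = collapse_reads(reads, roi_start=1, roi_end=5)
```

The first, second and fourth reads differ only outside the region of interest, so they fall into
the same cluster, while the read with the deletion forms its own cluster.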
Chapter 16

BLAST

BLAST stands for Basic Local Alignment Search Tool (Altschul et al. 1990). It allows you to
query a sequence database with a sequence in order to find entries in the database that contain
similar sequences. When “BLAST-ing”, you can specify either nucleotide or protein
sequences, and nucleotide sequences can be either DNA or RNA. Sequences can be
BLAST-ed against databases held at NCBI (see NCBI BLAST), or contained within your local
Geneious database (Custom BLAST).

16.1 Setting up a BLAST search

To run a BLAST search in Geneious Prime, select your query sequence or sequences and click
the BLAST button in the toolbar. This operation can also be accessed by going to the Tools
menu or by right-clicking (Ctrl+click on Mac OS X) on a sequence document and choosing
BLAST. You can choose to BLAST either your currently selected sequence documents or a
sequence you enter manually. If you choose to enter your sequence manually, then Geneious
will display a large text box in which you can enter your query sequence as either unformatted
text or FASTA format.

Select your database using the first drop-down box. Databases are grouped together under
their respective services. Then choose which kind of BLAST search you wish to run under
Program. The available programs will depend on the database you have chosen.

Geneious Prime can perform seven different kinds of BLAST search:

• blastn: Compares a nucleotide query sequence against a nucleotide sequence database.


• Megablast: A variation on blastn that is faster but only finds matches with high similarity.
• Discontiguous Megablast: A variation on blastn that is slower but more sensitive. It will
find more dissimilar matches so it is ideal for cross-species comparison.


Figure 16.1: BLAST Options

• blastp: Compares an amino acid query sequence against a protein sequence database.

• blastx: Compares a nucleotide query sequence translated in all reading frames against a
protein sequence database. You could use this option to find potential translation prod-
ucts of an unknown nucleotide sequence.

• tblastn: Compares a protein query sequence against a nucleotide sequence database dy-
namically translated in all reading frames.

• tblastx: Compares the six-frame translations of a nucleotide query sequence against the
six-frame translations of a nucleotide sequence database.
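
The seven programs differ mainly in the molecule type of the query and of the database. As a
compact restatement of the list above, the lookup below maps each combination to the programs
that accept it. The dictionary and function names are invented for this example, and the set of
programs actually offered in the dialog also depends on the database you have chosen.

```python
# (query type, database type) -> programs from the list above
PROGRAMS = {
    ("nucleotide", "nucleotide"): [
        "blastn", "Megablast", "Discontiguous Megablast", "tblastx",
    ],
    ("protein", "protein"): ["blastp"],
    ("nucleotide", "protein"): ["blastx"],
    ("protein", "nucleotide"): ["tblastn"],
}


def programs_for(query_type, database_type):
    """Return the programs that compare this query type against this database type."""
    return PROGRAMS.get((query_type, database_type), [])


translated_search = programs_for("nucleotide", "protein")
```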

Three options are available for displaying your results:

• Hit table: Returns one alignment for every hit against the database and displays them in
a hit table. Each query displays a separate table and is also viewable as a query-centric
alignment. This is suitable for fewer than 100 queries.

• Query-centric alignment: Returns one alignment for each query, showing the hits aligned
against the query sequence. This is well suited for large batch searches but it doesn’t dis-
play a hit table.

• Bin into ’has hit’ vs. ’no hit’: Returns two sequence lists, one containing queries which
get a hit in the database, the other containing queries which don’t. Details about the hits
and alignments are discarded. This can be used to filter contamination (e.g. human) from
sequencing reads.

You can also specify how much of each matching sequence to retrieve from your database:

• Matching region: Just the region of the database sequence which matches the query.

• Matching region with annotations: The region of the database sequence which matches
the query, plus any annotations on that sequence.

• Extended region with annotations: The matching region plus additional flanking regions
upstream and downstream.

• Full sequence with annotations: The entire database sequence (this could be large and
slow).

Geneious also allows you to specify most of the advanced options that are available in BLAST.
To access the advanced options click the More Options button which is in the bottom left of
the BLAST options. The available options vary depending on the kind of BLAST search you
have selected. For details on each of the options you can hover your mouse over the option to
see a short description or refer to the BLAST documentation from NCBI.

16.2 BLAST results

Once a search has started, a results subfolder will be created in the same folder as your query
sequence. Search progress is shown in the document table. The search can be cancelled by
clicking on the red square labelled Stop.

16.2.1 BLAST hit table

If you chose to return your results in a hit table, each search hit is displayed separately in the
document table sorted by bit score. The bit score gives an indication of how good the alignment
is; the higher the score, the better the alignment. In general terms, this score is calculated from
a formula that takes into account the alignment of similar or identical residues, as well as any
gaps introduced to align the sequences.

Search hits can also be sorted by other columns by clicking on the column header. Columns that
may be useful to sort by include E-value, Percent Identity, Query Coverage or Grade. The E-value,
or “Expect value”, represents the number of hits with at least this score that you would expect
purely by chance, given the size of the database and query sequence. The lower the E-value,
the more likely that the hit is real. The Grade column is a percentage calculated by Geneious by
combining the query coverage, E-value and identity values for each hit with weights 0.5, 0.25
and 0.25 respectively. This allows you to sort hits such that the longest, highest identity hits are
at the top.

Specifically,

\[
\mathrm{Grade} = 50 \cdot \mathrm{fractionCoverage}
  + 25 \cdot \max\left(0,\; 1 - \frac{\mathrm{eValue}}{10^{-20}}\right)
  + 25 \cdot \max\left(0,\; \frac{\mathrm{percentIdentity} - \mathrm{minGradedIdentity}}{100 - \mathrm{minGradedIdentity}}\right),
\]

where minGradedIdentity is 50 for nucleotide and 25 for protein sequences.
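
The Grade formula can be transcribed directly into a small function, which is convenient for
checking how each term contributes. The function and parameter names are chosen for this
example. Only the arithmetic comes from the formula above.

```python
def blast_grade(fraction_coverage, e_value, percent_identity, is_protein=False):
    """Grade (0-100) from query coverage (0-1), E-value and percent identity (0-100)."""
    min_graded_identity = 25.0 if is_protein else 50.0
    coverage_term = 50.0 * fraction_coverage
    # Only E-values below 1e-20 contribute; anything larger is clamped to 0.
    e_value_term = 25.0 * max(0.0, 1.0 - e_value / 1e-20)
    identity_term = 25.0 * max(
        0.0,
        (percent_identity - min_graded_identity) / (100.0 - min_graded_identity),
    )
    return coverage_term + e_value_term + identity_term


perfect_hit = blast_grade(1.0, 0.0, 100.0)
partial_hit = blast_grade(0.5, 1e-5, 75.0)
```

In the second call, a nucleotide hit covering half of the query at 75% identity with an E-value
of 1e-5 receives 25 points for coverage, nothing for the E-value and 12.5 points for identity.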

Figure 16.2: BLAST Hit table results

You can also download the full database sequence that corresponds to a BLAST hit. To retrieve
the full sequence select a BLAST alignment and go to File → Download Documents or click
the Download Full Sequence(s) button located above the viewer tabs. The full sequence will
be available in the Sequence View tab once the download has completed and the region that
matches the query sequence will be annotated as BLAST Hit (see Figure 16.3). In addition the
annotations from the full sequence will be transferred over to the BLAST alignment and can be
viewed in Alignment view.

Figure 16.3: Document after full sequence download

16.2.2 Query-centric view

This view displays all of the hits to your query in a single alignment. Results of single BLAST
searches can be viewed in query-centric view instead of a hit table by clicking the Query Centric
View tab at the top of the document table. Alternatively, you can choose to only return a
query-centric alignment when you set up the BLAST search. This option is particularly useful for
batch BLAST, as only one alignment per query is returned and all the results are displayed in
a single folder. In this view each hit sequence in each alignment is annotated with a Search hit
annotation. If you mouse over the annotation you can bring up the values for E-value, pairwise
identity, Grade etc. To display these values in a table, switch to the Annotations tab in the
sequence viewer and add these columns to the table by clicking the Columns button.

16.3 NCBI BLAST

Geneious Prime is able to BLAST to many different databases held at NCBI. These databases are
listed in Tables 16.1 and 16.2, and can be selected in the Databases drop down menu in the
BLAST set up dialog. You must be able to connect to the internet from within Geneious Prime
to BLAST to NCBI, and if you are behind a proxy server you may need to enter your proxy
server settings under Tools → Preferences → Connection Settings, as described in Section
1.2.5.

Table 16.1: Nucleotide BLAST databases

| Database | Nucleotide searches |
|---|---|
| Nucleotide collection (nr) | All non-redundant GenBank+EMBL+DDBJ+PDB sequences (no EST, STS, GSS or HTGS sequences) |
| 16S ribosomal RNA | 16S rRNA sequences from bacteria and archaea |
| 18S ribosomal RNA | 18S rRNA sequences (Fungal) |
| 28S ribosomal RNA | 28S rRNA sequences (Fungal) |
| Environmental samples (env_nt) | Nucleotide sequences from large environmental sequence projects |
| Expressed sequence tags (est) | Database of GenBank + EMBL + DDBJ sequences from EST Divisions |
| EST human | Human subset of est |
| EST mouse | Mouse subset of est |
| EST others | Non-human, non-mouse subset of est |
| Genomic Survey Sequences (gss) | Genome Survey Sequence, includes single-pass genomic data, exon-trapped sequences, and Alu PCR sequences |
| High Throughput Genomic Sequences (htgs) | Unfinished HTGS: phases 0, 1 and 2 (finished, phase 3 HTG sequences are in nr) |
| Human ALU repeat elements (alu_repeats) | A small database of Human ALU repeat elements |
| Human RefSeqGene (RefSeq Gene) | NCBI transcript reference sequences from human |
| Internal transcribed spacer region (ITS) | ITS region from fungal type and reference material |
| NCBI Genomes (chromosome) | Complete genomes and chromosomes from the NCBI Reference Sequence project |
| NCBI Reference Genomic Sequences (refseq_genomic) | Genomic Reference sequences |
| Patented Protein Sequences (pat) | Nucleotide sequences derived from the Patent division of GenBank |
| Protein Data Bank (PDB) | Sequences derived from the 3D-structures of proteins from PDB |
| Reference RNA (refseq_rna) | NCBI Transcript Reference Sequences |
| RefSeq Representative genomes | Best quality and minimum redundancy genomes from NCBI Refseq Genomes |
| Sequence Tagged Sites (dbsts) | Database of GenBank+EMBL+DDBJ sequences from STS Divisions |
| WGS Human | Whole-genome shotgun contigs for Homo sapiens |

Table 16.2: Protein BLAST databases

| Database | Protein searches |
|---|---|
| Nucleotide collection (nr) | All non-redundant GenBank coding region (CDS) translations+PDB+SwissProt+PIR+PRF |
| Metagenomic proteins (env_nr) | Translations of sequences in env_nt |
| Patented Protein Sequences (pat) | Protein sequences derived from the Patent division of GenBank |
| Protein Data Bank (PDB) | Sequences derived from 3D structure Brookhaven PDB |
| Reference Proteins (refseq_protein) | NCBI protein reference sequences |
| UniProtKB/SwissProt | Non-redundant protein sequences information from EMBL |

16.3.1 Edit BLAST Databases

You can edit display settings for NCBI BLAST databases, and change which BLAST databases
are available via Geneious Prime by clicking on Edit Databases in the Tools → Add/Remove
Databases → Set Up BLAST Services window. The actual databases on the BLAST server will
not be changed by any edits made via this window. The following fields are available and may
be edited:

• Database Name: A unique, case-sensitive name for the database which is specified by
the NCBI or other database server. This must be correct for Geneious to be able to find
and search the database. The database name may be composed of multiple parts, e.g.
‘wgs:9606’ to access WGS sequences for Homo sapiens.

• Display Name: The name that is displayed in Geneious for that database. This can be
any unique and non-empty value.

• Description: Additional information to describe the database.

• Nucleotide / Protein: This option specifies the molecule type of the sequences contained
in the database. Either Nucleotide, Protein, or both options must be selected.

16.4 Custom BLAST

Custom BLAST allows you to create your own custom database from either FASTA files or
sequences in your local folders, and BLAST against it. The Custom BLAST plugin requires
access to NCBI BLAST+ binary files.

16.4.1 Setting up the Custom BLAST files through Geneious Prime

Geneious Prime provides a download manager to help you download and extract the Custom
BLAST files. To use it, go to Tools → Add/Remove Databases → Set Up BLAST Services
and select Custom BLAST from the Service drop-down box (see Figure 16.4). Make sure Let
Geneious do the setup is checked. Then click ‘OK’. After a few seconds the compressed file
containing all the files needed to run Custom BLAST will start downloading. You can click
‘Pause’ to pause the download. You can add and search Custom BLAST databases as soon as
it has finished downloading and extracting. If you shut down Geneious with the file partially
downloaded, you will need to start downloading it again from the beginning.

16.4.2 Setting up the Custom BLAST files yourself

It is also possible to manually install the NCBI BLAST+ binary files. You can download the
latest version of BLAST+ from here:

https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/

Choose the appropriate installer for your operating system, download and extract it, and install
BLAST+ in an appropriate location on your computer.

You will then need to let Geneious know the location of the BLAST+ installation. To do this, go to
Tools → Add/Remove Databases → Set Up BLAST Services and set Service to Custom BLAST.
Enter your data location or click Browse to point Geneious to the location of the BLAST+
folder. Uncheck the option to Let Geneious do the setup and click OK. Geneious will now use
your manually installed BLAST+ executables.

(a) Setup Options

(b) Downloading

Figure 16.4: Setting Up Custom BLAST

16.4.3 Adding Databases

Now that you have set up the executables, you can add databases to Custom BLAST.

Creating a database from a FASTA file

To create a database from the sequences in a FASTA file, go to Tools → Add/Remove Databases
→ Add BLAST Database and select Custom BLAST from the Service drop-down box
(Figure 16.5). Choose to Create from file on disk and then click Browse to navigate to the
FASTA file that contains the sequences you want to BLAST. Enter a name for the database and
click ‘OK’. A FASTA file must meet two requirements to be used for creating a database:

Figure 16.5: Adding a FASTA database

• The FASTA file must contain only one type of sequence (i.e. nucleotide or amino
acid)

• The sequences in the FASTA file must all have unique names

If the file meets these requirements it will be added as a database, otherwise you will be in-
formed of the problem.
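
If you want to check a FASTA file against these two requirements before adding it, a rough
pre-check could look like the sketch below. It is not the validation Geneious Prime performs.
The function name is hypothetical, a sequence name is assumed to be the first word after the
">" character, and the sequence type is guessed with a simple heuristic (a sequence made up only
of IUPAC nucleotide letters is treated as nucleotide), which can misclassify short protein
sequences.

```python
NUCLEOTIDE_LETTERS = set("ACGTUNRYKMSWBDHV-")


def check_fasta(lines):
    """Return a list of problems found; an empty list means both checks passed."""
    records = {}        # sequence name -> concatenated sequence
    duplicates = set()
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the name is the first whitespace-delimited word.
            name = line[1:].split()[0] if len(line) > 1 else ""
            if name in records:
                duplicates.add(name)
            records.setdefault(name, "")
        elif name is not None:
            records[name] += line.upper()
    # Heuristic type guess per sequence.
    types = {
        "nucleotide" if set(seq) <= NUCLEOTIDE_LETTERS else "protein"
        for seq in records.values()
        if seq
    }
    problems = []
    if duplicates:
        problems.append("duplicate names: " + ", ".join(sorted(duplicates)))
    if len(types) > 1:
        problems.append("mixed nucleotide and protein sequences")
    return problems


good = check_fasta([">seq1", "ACGTACGT", ">seq2 second record", "GGGTTTAA"])
bad = check_fasta([">seq1", "ACGT", ">seq1", "ACGT", ">prot", "MKLVQE"])
```

To check a file on disk, pass an open file object, for example check_fasta(open(path)), where
path is the location of your own FASTA file.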

Creating a database from local documents

To create a BLAST database from sequences in your local documents folders, first select the doc-
uments that you want. Then go to Tools → Add/Remove Databases → Add BLAST Database
and select Custom BLAST from the Service drop-down box. Enter a name for the database,
and click ‘OK’.

16.4.4 Using Custom BLAST

Once you have added one or more databases, they will appear under Custom BLAST in the
BLAST database drop down (Figure 16.6). These can be used in exactly the same way as the
NCBI BLAST ones.

Figure 16.6: Searching a Custom BLAST database


Chapter 17

Workflows

Workflows allow you to group Operations together to reduce the number of steps required to
perform an often-used combination of Operations. Options for each Operation may be preconfigured,
or some or all options ’exposed’ for configuration when the Workflow is run. Geneious
Prime provides a number of example Workflows for a variety of tasks that you can try. Workflows
can be run or managed via the Tools menu or the Geneious Toolbar (see section 17.1).
Workflows can be shared with other people either by exporting and importing them, or if you
are connected to a shared database, by ticking the option to share them (see Figure 17.1). If
you have any programming knowledge you can even add customised code in Java (see section
17.3).

17.1 Managing Workflows

Workflows can be accessed from the Workflows menu under Tools, or from the Workflows
icon in the Geneious Toolbar. If the Workflows icon does not appear in your toolbar, you can
add it by right-clicking on the main Geneious Toolbar, choosing Customize... and checking
Workflows. The Workflows menu allows you to select and run saved Workflows, manage
your Workflows and create new ones.

The Manage Workflows... window lists all available workflows and contains options for viewing,
editing, copying, deleting, exporting and importing workflows. Each workflow listed in
the Manage Workflows window with an icon next to it will be shown in the drop down under
the main Workflows menu (see Figure 17.1). The order in which workflows appear in the
drop down menu can be altered either by dragging and dropping workflows in the list to their
desired position, or by using the Move Up and Move Down buttons (see Figure 17.1). Multiple
workflows can be selected using Shift- or Ctrl-click, and they can then be exported or
added/removed from the drop-down list in one go.

Workflows shared by other users in a server database are indicated by an icon. These Workflows
will not appear in the drop down menu unless you choose to show them. Workflows
shared by other users in a shared database are only editable by the creator. However, you can
use the Copy button to create your own personal editable copy of any Workflow.

Figure 17.1: The Manage Workflows window

17.2 Creating and editing Workflows

To create or edit a workflow, go to Manage Workflows... under the Workflows menu. Here you
have the option to create a New Workflow, View/Edit an existing Workflow, or Copy and edit
an existing Workflow. Each of these options opens an Edit Workflow window (see example
in Figure 17.2) where you can name your Workflow, describe its function, specify an icon for
display in the Workflows menu, share via a Shared database, and build/edit your Workflow
using the Add Step/Delete Step buttons.

Each Workflow is made up of one or more Steps. A Step may be an Operation (for example,
perform Muscle alignment), or a special Step (for example, Group Documents). Each Step
accepts one or more documents as input and produces one or more documents as output which
are then used as input to the next step in the Workflow. All documents selected when the
Workflow is run are provided as input to the first Step (unless specified otherwise, see section
17.2.2 for further information). The output from the final Step of a Workflow is saved in your
Geneious database. Outputs from intermediate steps are not saved, unless you include the
Save Documents / Branch option after the Step. A document is a single entry that can be
selected from your Geneious database. For example a single document can be an alignment, a
sequence list, a tree, or a stand-alone sequence.

Figure 17.2: The Edit Workflow window

Figure 17.3: Add step Menu options for building and editing Workflows

The Add Step button (Figure 17.3) provides a dropdown menu with a range of Steps that can
be added to your Workflow. The purpose/function of each type of Step is summarised in Table
17.1.

Table 17.1: Add Step options for Workflows

Step: Purpose/Function

Add Operation: Runs one of the standard Operations available in Geneious. This can
include other Workflows.

For Each Document: Runs the next Step in the Workflow independently on each document.

For Each Sequence/Extract Sequences From List: Runs the next Step in the Workflow
independently on each sequence, extracting sequences from a list if necessary.

Group Documents: Groups results from multiple operations together so that they are all
used in a single invocation of the next operation.

Group Sequences: Groups sequences into a sequence list document.

Add document chosen when running workflow: Prompts the user to choose a document when
they first start the Workflow. This document may either be chosen from anywhere in
their database, or from one of the documents selected when the workflow is run. In the
latter case, the selected document will be excluded from the list of documents provided
to the first Step of the Workflow. If only a single document from the selected documents
matches the specified document type, then it will be automatically selected instead of
asking the user. In both cases, the selected document is combined with each of the
results from the previous Step in the Workflow. See section 17.2.2 for more details.

Combine with Earlier Document(s): For each result from the previous Operation, combines
with the corresponding input document(s) of an earlier operation in the Workflow. The
documents from the earlier Operation are added to the end of the list of documents from
the previous operation.

Save Documents / Branch: Optionally saves the current results. Also, optionally starts a
branch beginning from an earlier Step in the Workflow.

Rename Documents: Renames the document(s).

Copy Property Between Documents: Works on a pair of documents. Copies a property from
one document to another document and outputs just the destination document. If only a
single document is provided, outputs that document unmodified.

Filter Documents: Filters some documents based on the content of their fields.

Sort Documents: Sorts some documents based on the contents of their fields.

Custom Java Code: Write some custom Java code to do whatever you want. See section
17.3 for more details.

17.2.1 Configuring options for Operations and Steps

For each Operation added to your Workflow, you can edit and specify values for the
configurable options available for that Operation. To do this, select the Step you wish to
configure, and click View/Edit Options. To set the Workflow up so that the options are
preconfigured and cannot be changed when the workflow is run, select Expose no options, then
select the Operation options you want the workflow to use. To allow some or all of the options
to be configured each time the workflow is run, select Expose all options or Expose some options.
Exposed options can be presented in a number of ways as described below.

• Optionally label exposed options as: Use this to group and label all the exposed options
for this Workflow Step under a labelled section rather than mixing the options for this
Step in with options from other Steps in the workflow.

• Access exposed options via button: Rather than displaying all the options for this work-
flow Step in the top level options dialog, instead provide a button for the user to click on
before showing the options for this Workflow Step.

• Exposing options with dependencies: Some options have dependencies on other op-
tions. For example when a checkbox is off, another option may become disabled. If you
choose to expose a subset of the options rather than all options, the dependencies be-
tween options will be discarded. For checkbox options that have an associated value,
you will probably want to expose the associated value too. Often this associated value
doesn’t have a label in the user interface so it will appear with its programmatic name
immediately following the checkbox option in the drop down list of available options.

Note that for some Operations, not all options may be available when run in a Workflow. Some
special Steps also have configurable options, which in some cases can also be exposed when
the Workflow is running. For example, the Filter Documents Step can be exposed so that the
user can set the Filter criteria when the Workflow is running.

17.2.2 Advanced document management

Grouping and separating documents

In simple Workflows all documents provided to the Workflow are grouped as a single set of
documents which are used as input to a single invocation of the first Step in the Workflow. Each
Workflow Step will produce one or more output documents, all of which are grouped together
and used as input into the next Step in the Workflow.

However, it is possible to create Workflows where each Step in the Workflow may be invoked
in parallel on different sets of documents. For example, if the first Step in the Workflow uses
the For Each Sequence / Extract Sequences From List Step, then each input sequence is placed
into a separate document and the following Step will be invoked independently on each
sequence. Each call to the following Step may produce one or more documents. Each of these
sets of output documents is used independently as input to a separate invocation of the next
Workflow Step. Alternatively, you could group these results together again, using Group
Sequences or Group Documents, to use as input to a single call to the next Workflow Step. No
matter what document grouping or separation (For Each...) Operations are used, each Step in
the Workflow is always run to completion on all datasets before starting on the next Step in the
Workflow.
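The grouping behaviour described above can be illustrated with a small model. The following Python sketch is not Geneious code and does not use the Geneious API; the names for_each, group, operation, run_workflow and mark are hypothetical, and documents are represented as plain strings. It only shows how sets of documents are split, processed independently, and regrouped, with each Step finishing on every set before the next Step starts.

```python
from typing import Callable, List

Batch = List[str]  # one set of documents handed to a single invocation of a Step
Step = Callable[[List[Batch]], List[Batch]]


def for_each(batches: List[Batch]) -> List[Batch]:
    """Separate documents so the next Step is invoked once per document."""
    return [[doc] for batch in batches for doc in batch]


def group(batches: List[Batch]) -> List[Batch]:
    """Merge every set so the next Step is invoked once on all documents."""
    return [[doc for batch in batches for doc in batch]]


def operation(fn: Callable[[Batch], Batch]) -> Step:
    """Turn a per-set function into a Step that runs once for each set."""
    def step(batches: List[Batch]) -> List[Batch]:
        return [fn(batch) for batch in batches]
    return step


def run_workflow(selected: Batch, steps: List[Step]) -> List[Batch]:
    batches = [list(selected)]  # all selected documents start as a single set
    for step in steps:          # a Step finishes on every set before the next begins
        batches = step(batches)
    return batches


invocation_sizes: List[int] = []


def mark(batch: Batch) -> Batch:
    """Stand-in for a real Operation: records its input size and tags each document."""
    invocation_sizes.append(len(batch))
    return [doc + "*" for doc in batch]


result = run_workflow(
    ["a", "b", "c"],
    [for_each, operation(mark), group, operation(mark)],
)
```

In this model the first mark Step is invoked three times on one document each, and after group the second mark Step is invoked once on all three documents.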

Inputting documents into later stages of a Workflow

It is also possible to insert documents into later stages of the workflow, either from additional
documents the user selects in the options when they first start the workflow (using the Add
document chosen when running workflow Step) or to use documents generated from earlier
stages in the workflow (using the Combine with Earlier Document(s) Step). You can create
branches in your workflows by using the Save Documents / Branch Step, often in conjunction
with a Filter Documents Step as the first Step in each branch. For an example of branching
and filtering, see the sample ‘Identify Organism’ Workflow.

17.3 Custom code in Workflows

Custom code allows you to create Geneious operations that do almost anything. The Work-
flow custom code automatically inserts the surrounding import statements for the complete
Geneious API and a class framework around the methods you implement here. Additional
import statements can be provided prior to the first method. Documentation for the API is
available at Geneious API. For more advanced programmatic access to Geneious (for example
creating importers, exporters or viewers), please download and refer to the Geneious Plugin
Development Kit.
Chapter 18

Geneious Education

This feature allows a teacher to create interactive tutorials and exercises for their students. A
tutorial consists of a number of HTML pages and Geneious documents. The student edits the
pages and documents to answer the tutorial questions, and then exports the tutorial to submit
for marking.

18.1 Creating a tutorial

Geneious Tutorials consist of HTML documents with linked images and Geneious files.
Simply create your HTML documents, and place them together in a folder. The first page of the
tutorial should be called “index.html”, and this will be loaded as the main page. Geneious will
follow all hyperlinks between the pages, and external hyperlinks (beginning with http://)
will be opened in the user’s browser. If you want to include figures and diagrams in the pages,
just put the image files in the same folder and reference them with <img> tags like a normal
HTML document (supported image formats are GIF, JPG, and PNG).

If you want to include Geneious documents in your tutorial, simply place them in the same
folder as the HTML documents and they will automatically be imported into Geneious with
the tutorial. If you want to link to them from the tutorial pages, create a hyperlink pointing
to the file in the HTML document. For example, to create a link to the file sequence.fasta in
your tutorial folder, use the HTML <a href="sequence.fasta">click here</a>. To
open more than one document from a link, separate the filenames with the pipe (|) character,
for example <a href="sequence.fasta|sequence2.fasta">click here</a>. Note
that Geneious files must contain only one document to be imported automatically with the
tutorial.
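If you generate tutorial pages with a script, the link syntax above is easy to produce. The following Python sketch is one possible approach; tutorial_link is a hypothetical helper and is not part of Geneious.

```python
from html import escape
from typing import Sequence


def tutorial_link(filenames: Sequence[str], text: str = "click here") -> str:
    """Build a tutorial hyperlink that opens one or more documents.

    Multiple filenames are joined with the pipe (|) character, as described above.
    """
    if not filenames:
        raise ValueError("at least one filename is required")
    href = "|".join(filenames)
    return f'<a href="{escape(href, quote=True)}">{escape(text)}</a>'


single = tutorial_link(["sequence.fasta"])
multiple = tutorial_link(["sequence.fasta", "sequence2.fasta"])
```

The two values reproduce the single-document and multi-document examples given in the text.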

You can add a short one-line summary by writing your summary in a file called “summary.txt”
(case sensitive) and putting it in the tutorial folder. Make sure that the entire summary is on
the first line of the file, as all other lines will be ignored.


Once you have all your files together, put the contents of the folder in a zip file with the exten-
sion .tutorial.zip. Subfolders within the zip file are supported in Geneious R8 and higher.
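Packaging can be done with any zip tool. As an illustration, the following Python sketch uses the standard zipfile module to place the contents of a tutorial folder at the top level of a .tutorial.zip file. The function name build_tutorial and the choice to write the archive next to the source folder are assumptions made for this example.

```python
import zipfile
from pathlib import Path


def build_tutorial(folder, name: str) -> Path:
    """Zip the contents of a tutorial folder into <name>.tutorial.zip.

    The archive is written beside the folder, so it is never included in itself.
    """
    folder = Path(folder)
    if not (folder / "index.html").is_file():
        # The first page of a tutorial must be called index.html.
        raise FileNotFoundError("tutorial folder must contain index.html")

    archive = folder.parent / f"{name}.tutorial.zip"
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(folder.rglob("*")):
            if path.is_file():
                # Store paths relative to the folder so its contents sit at the zip root.
                zf.write(path, path.relative_to(folder).as_posix())
    return archive
```

For example, build_tutorial("my_tutorial", "alignment_basics") would write alignment_basics.tutorial.zip beside the my_tutorial folder; both names are placeholders.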

18.2 Answering a tutorial

Import the tutorial document into Geneious (use File → Import → From file, or drag it in). The
tutorial document and any associated Geneious documents will be imported into the currently
selected folder. The tutorial itself will be displayed in the help pane on the right-hand side of
the Geneious window. If you accidentally close the help pane, you can display it by choosing
Help from the Help menu.

If the tutorial requires you to enter answers, click the edit button at the top of the tutorial
window and type your answer into the space provided. Click the Save button when you are
done.

If the tutorial has a link to a Geneious document, when you click the link the document will be
opened in the document viewer. Any changes you make to this document will be preserved
when you export the tutorial.

When you have finished the tutorial, export it by selecting the tutorial document and choosing
File → Export → Selected Documents from the main menu. Make sure that Geneious Tutorial
File is selected as the filetype, and then give it a name and click Export.
Chapter 19

Saving Operation Settings (Option Profiles)

Profiles allow you to save the settings for almost any analysis operation in Geneious Prime so
they can be loaded later or shared with others. For example, the recommended trimming
parameters for your organization can be saved as a profile and then shared on the Shared
Database for everyone to use.

The operation settings button appears in the bottom-left corner of any options window.
Click on this button to reset to defaults, load a profile, save a new profile or manage your
existing profiles, as described below.

Saving a profile

To create a profile, set the options up the way you want, click the operation settings button then
choose Save Current Settings. You can then enter a name for your profile and choose whether
it is shared. For a description of shared profiles see the section on sharing profiles.

When you save a profile it is attached to the particular analysis window that you have open.
For example, if you save a profile for Alignment it can only be loaded for Alignment, not for
Assembly.

Loading a profile

To load a profile, click the operation settings button and choose Load Profile and click on the
name of the profile you want to load. The settings for the operation will immediately update
to reflect the profile.

Note: Sometimes when you load a profile the settings may not exactly match what was saved.
This is because the available settings can change depending on what type of documents you
have selected.

Managing profiles

Click on Manage Profiles under Load Profile to see a list of profiles with options for
deleting, editing, importing and exporting profiles. See the Sharing profiles section below for
more on import and export.

Sharing profiles

There are two ways to share option profiles:

• Import and export from the Manage Profiles window allows you to save a file containing
a particular profile. These files can be emailed to other Geneious Prime users and imported
for use with their data. The easiest way to import a profile is by dragging the file directly
into Geneious Prime.

• If a profile is marked as Shared (when it was created or by editing it) then the profile
will be copied across to any Shared Database that you connect to. This means anyone
else who connects to the same Shared Database will automatically have the profile under
their Load Profile menu. Note: Once a profile is shared it cannot be un-shared, but it can
be deleted. Also, other users can edit or delete a shared profile at any time.
Chapter 20

Shared Databases

A Shared Database provides a synchronized storage location accessible by multiple concurrent
users, which behaves just like the local folders with additional Group-based access controls
on folders. Once logged in, folders in the shared database are available under the Shared
Databases icon in the Sources panel.

Two types of Shared Databases are available in Geneious Prime. Your Geneious license will
provide access to set up a basic Shared Database using Direct SQL Connection. A Shared
Database with additional features and increased security is available to customers who have
purchased a license for Geneious Server Database in addition to Geneious Prime.

20.1 Using a Shared Database

A Shared Database can be used in the same way as your local database, but also allows multiple
users to access the same database at the same time. The folders in your Shared Database can be
private to an individual, shared with a subset of users, or shared with all database users, and
users can be given read or write access to documents as required.

This section provides an overview of what you need to know to connect to and use an existing
Shared Database. More information on setting up a Shared Database and administering access
to folders is available in sections 20.2.1 and 20.2.2.

20.1.1 Overview of Direct SQL Connection

This section provides an overview of key points for using a Shared Database with Direct SQL
Connection. For more details, including how to set up a new Shared Database, refer to sec-
tion 20.2.


Connecting to your database:

• Before you can use a Shared Database with Geneious Prime, an SQL database needs to be
created and set up on your server. This requires knowledge of creating and administering
SQL databases.

• You can connect to your Shared Database from different computers or Geneious Prime
installations.

Sharing documents:

• Access to documents is controlled by creating Groups and Roles which specify the per-
mission level for members of the group (for more information on Groups and Roles, refer
to section 20.2.2).

• When new folders are added on the root folder, by default everybody has access to them.
This can be changed by a Database Admin.

• Subfolders will be added to the same group as their parent folder.

• Only Database Admins can create and remove groups and add users to new groups.

• If you want to change the group of a folder, you need to be a Group Admin.

Read only folders:

• If you have VIEW access to a folder, the documents will be read only.

• Read only folders are indicated by a grey padlock on the folder icon.

• If you try to edit a document in a read only folder, you will be asked to select a different
location to save a copy of the document with your changes.

20.1.2 Overview of Geneious Server Database

This section provides an overview of key points for using a Geneious Server Database. For
more details, refer to section 20.3.

Connecting to your database:

• Geneious Server Database is available to customers who have purchased a license for
Geneious Server Database in addition to their Geneious Prime license.

• Once the database has been set up, you can connect to your shared database from differ-
ent computers or Geneious Prime installations.

Sharing documents:

• Access to documents is controlled by creating Groups and Roles which specify the per-
mission level for members of the group (for more information on Groups and Roles, refer
to section 20.3.2).

• When new folders are added on the root folder, by default they are private to the user
who created them. This can be changed by a Database Admin.

• Subfolders will be added to the same group as their parent folder.

• All users can create and remove groups and add users to new groups by default. This
can be changed by a Database Admin.

• By default, you can change the group of a folder if you have EDIT access. This can be
restricted to Admins by a Database Admin.

Read only folders:

• If you have VIEW access to a folder, the documents will be read only.

• Read only folders are indicated by a grey padlock on the folder icon.

• If you try to edit a document in a read only folder, you will be asked to select a different
location to save a copy of the document with your changes.

20.1.3 Connecting to a Shared Database

This section explains how to connect to an existing Shared Database in Geneious. You may need
to speak to your system administrator for connection details, including the database type and
your user account details. System administrators can also preconfigure the connection settings
for users via the geneious.properties file (see section 22.3). If you are looking for information
on how to set up a new Shared Database to use with Geneious Prime, refer to section 20.2.1.

To connect to a Shared Database, select the Shared Database service in the Sources panel and
click on New database connection. Select the Connection Method tab that corresponds to
your shared database – either Direct SQL Connection or Geneious Server Database – and enter
the connection details provided by your system administrator. You can optionally choose to
Connect account on start up (selected by default) if you would like to automatically log in to
your shared database every time you use Geneious.

Note that from Geneious Prime 2019.2.1 onwards, the drivers required to use Windows Au-
thentication with Microsoft SQL Server are bundled with Geneious. To use this option, select
Microsoft SQL Server as your database type and check Use Windows Authentication. If you
are on an older version of Geneious and wish to use Windows authentication, contact support
for further instructions.

You can log in manually by right clicking on the root folder for your shared database, and
selecting Connect. You can also Edit Account Details via this menu, if required.

20.1.4 Removing a Shared Database

To remove your Shared Database from Geneious Prime, simply right click on its root folder and
choose Remove database. This will remove your connection to the Shared Database, but will
not delete the database or any of the data contained within it, nor will it remove your account.
You can add the database again by following the instructions in section 20.1.3.

20.1.5 Storing Documents in a Shared Database

Document Compatibility Between Versions

By default the Shared Database stores documents in the latest document format, the same way
the Local Database does. This can cause compatibility problems if different sets of users are
using different versions of Geneious. For example, if a user on Geneious R11 saves a new
alignment document, users on Geneious R9 will be unable to read that document.

To get around this issue the Shared Database can be configured to save documents in a format
specified by an older version of Geneious if that version is still supported. In most cases this
will allow users of Geneious 8.1 or later to seamlessly share documents with users of any
version back to Geneious 6.0. However, be aware that if the document format has changed since
the compatibility version, then users of the newer version may find their documents missing
properties or bug fixes that have been added since the format changed.

The Document Compatibility Version setting will apply to all users using the database with
Geneious 8.1 or higher and can only be set by a Database Admin. You will find this setting by
choosing Administration in the shared database context menu (visible when you right-click on
the database root folder) and then Set Document Compatibility Version.

Document Size Limitations

For Direct SQL Connection Shared Databases, document size is limited by the underlying
database system. Talk to your system administrator if you work with very large files such
as contig assemblies and complete genomes.

Geneious Server Databases have no limits on document size, and performance with large doc-
uments is significantly better than with Direct SQL Connection Shared Databases.

20.1.6 Backing up your shared database

It is important that you back up your shared database on a regular basis following the
recommendations for the SQL system you are using. For example, the PostgreSQL manual (see
link below) provides instructions on how to create and restore ‘dump’ backups of a shared
database. https://www.postgresql.org/docs/9.1/static/backup-dump.html

Note that for Geneious Server Database, data is also stored on the file system in the
directory /home/HOSTNAME/.geneiousServerX.Ydata (where X and Y are version numbers for
Geneious), so this must also be backed up. See this article for further information.
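For a PostgreSQL-backed Shared Database, a scheduled backup might call the pg_dump utility described in the PostgreSQL manual. The Python sketch below only assembles the command line and does not run it; the function name pg_dump_command and the example host, user and database names are placeholders. Adapt it to the backup recommendations for your own database system.

```python
import shlex
from typing import List, Optional


def pg_dump_command(
    database: str,
    outfile: str,
    host: Optional[str] = None,
    user: Optional[str] = None,
) -> List[str]:
    """Assemble a pg_dump command that writes a dump of `database` to `outfile`."""
    command = ["pg_dump"]
    if host:
        command += ["--host", host]
    if user:
        command += ["--username", user]
    command += ["--file", outfile, database]
    return command


# Placeholder values for illustration only.
command = pg_dump_command(
    "geneious_shared", "backup.sql", host="db.example.org", user="admin"
)
printable = shlex.join(command)  # shell-quoted form, e.g. for a cron job or log entry
```

The resulting list can be passed to subprocess.run on a machine where pg_dump is installed. For a Geneious Server Database, remember that the data directory on the file system needs to be backed up separately.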

20.2 Direct SQL Connection

A basic shared database can be set up using Direct SQL Connection, to provide a synchronized
storage location for Geneious data, accessible by multiple concurrent users. To use a Direct SQL
Connection Shared Database, an empty SQL database first needs to be set up using a supported
database management program (see section 20.2.1). Once the database has been set up, users
can connect to the Shared Database from Geneious Prime, add files to it and use it in the same
way as the local database.

Folders added to the shared database will be shared with all users by default, but this can be
configured by the Database Admin(s). Database Admins can also add Groups and set user
Roles for each group, which control access to folders by granting users permission to view, edit
or administrate a group.

20.2.1 Setting up a Direct SQL Connection Shared Database

This section has instructions for setting up a new Shared Database. For instructions on how to
connect to a Shared Database from Geneious Prime refer to section 20.1.3.

Supported Database Systems

To use a database as a Shared Database, Geneious requires that it support transactions with
an isolation level set to READ COMMITTED. Supported database vendors include Microsoft
SQL Server, PostgreSQL, Oracle and MySQL. It is possible to use other database vendors if you
provide the database driver (see Supplying Your Own Database Driver in section 20.2.1).

The following SQL database versions (with the default configurations) are currently tested as
Shared Databases:

• Microsoft SQL Server 2019

• PostgreSQL 13.2

• Oracle Database 19c

• MySQL 8.0

Document Size Limitations

For Direct SQL Connection Shared Databases, document size is limited by the database’s
maximum binary large object (BLOB) size. This ranges from 1 to 4 GB depending on the
database system. Refer to the documentation of your database system for the maximum BLOB size.

Recommended SQL configurations for the Shared Database

In most cases, the default settings for whatever SQL database system you are using are
sufficient. The character set encoding for MS SQL should be set to UCS-2; all others should use
UTF-8 to avoid any potential problems indexing documents containing unusual characters.

To avoid connection failures when many users access the Shared Database, the default connec-
tion pool should be adjusted to accommodate 6 connections for each user.
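As a worked example of this sizing rule, the Python sketch below multiplies the expected number of concurrent users by the six connections per user stated above. The function name required_connections and the optional reserved parameter (extra connections kept aside for maintenance sessions) are assumptions added for illustration. Which server setting the result maps to depends on your database system.

```python
CONNECTIONS_PER_USER = 6  # figure given in the recommendation above


def required_connections(concurrent_users: int, reserved: int = 0) -> int:
    """Estimate how many database connections the server should allow.

    `reserved` is a hypothetical allowance for administrative sessions.
    """
    if concurrent_users < 0 or reserved < 0:
        raise ValueError("counts must not be negative")
    return concurrent_users * CONNECTIONS_PER_USER + reserved


estimate = required_connections(25)               # 25 users * 6 connections
with_headroom = required_connections(25, reserved=10)
```

For 25 concurrent users this gives 150 connections, or 160 with ten held in reserve.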

There are a few exceptions where we recommend changes to the default configuration, listed
below:

MySQL

Some changes you may need to make to the MySQL configuration file (my.cnf) to improve
performance are as follows:

• innodb_buffer_pool_size=[On a dedicated database server you may set this parameter up
to 80% of the machine’s physical memory size]

• innodb_flush_log_at_trx_commit=2

• query_cache_limit=2M

• query_cache_size=128M

• max_allowed_packet=1073741824 (Note this is the maximum size for max_allowed_packet
and can prevent errors when handling large files)

Further optional improvements:

• innodb_log_file_size=256M

• innodb_log_buffer_size=16M

Note: If you change the innodb_log_file_size then you will need to delete the current log files
before the server will start up again.

Microsoft SQL Server

The character encoding for MS SQL should be changed to UCS-2. Other default settings for
MS SQL server are generally sufficient. However, we strongly recommend using Snapshot
Isolation to avoid deadlocks. These can occur when multiple users use the shared database
at once and may lead to failure of some actions. This setting becomes necessary if there will
be more than a handful of users using the database concurrently. Read more about Snapshot
Isolation here.

Setting up an SQL database to use with Geneious Prime

Follow these steps to set up a Direct SQL Connection shared database to use with Geneious
Prime. The instructions in this section require knowledge of how to create and administer an
SQL database and are intended for your IT department or system administrator. For instruc-
tions on connecting to your shared database once it has been set up, refer to section 20.1.3.

1. Create database

• Install a supported database management system if you do not already have one
(see section 20.2.1).
• Create a new database with your desired database name. Make sure that you have
a user that has rights to create tables.
• Make sure to create the database with a suitable character set and collation - UTF8
(or UCS-2 for Microsoft SQL Server) is recommended.
• You should consider possible limitations to document size due to the database’s
maximum BLOB size when selecting a database system.

2. Add users to your database using the database management system

• Make sure all users of the database have SELECT, INSERT, UPDATE and DELETE
rights, otherwise they will not be able to use the Shared Database as intended.

3. Set up database with Geneious Prime



• Run Geneious Prime and click on the Shared Database icon in the Sources panel,
then click the New database connection button to connect to your database. Geneious
will automatically set up the database when you connect, if it has not yet been set
up.
• This will only succeed if you have permission to create tables on the database.

4. Set the first Database Admin

• To create your first Database Admin, right click on the root folder of your database
and select Make the current user a Database Admin. This option is only available
when your shared database does not have any Database Admins.
• Additional Database Admins can be added by setting the user’s role in the group
Everybody to ADMIN.

5. Set up sharing permissions for your database

• By default, all folders in your database will be shared with all users. This will allow
you to use your Shared Database like a shared local database. If this is what you
want, no further setup is required.
• If you want to restrict access to particular folders you can do so using Groups and
Roles – refer to section 20.2.2 for more details. Groups can only be created by a
Database Admin.

Your database should now be ready to use with Geneious Prime. Users can connect to the
database by clicking on Shared Databases in the Sources panel and then clicking New database
connection. This will bring up a dialog for the user to enter in the database details. If you wish
to preconfigure the connection settings for users, you can do so via the geneious.properties file
(see section 22.3).
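Steps 1 and 2 above are carried out in your database management system rather than in Geneious. As a rough illustration for PostgreSQL only, the Python sketch below generates the corresponding SQL statements as text. The function names, the example database and user names, and the use of the public schema are assumptions; other database systems use different syntax. Note that in PostgreSQL, GRANT ... ON ALL TABLES applies only to tables that already exist, so the grants would need to be run after Geneious has created its tables in step 3.

```python
import re
from typing import Iterable, List

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
REQUIRED_RIGHTS = ("SELECT", "INSERT", "UPDATE", "DELETE")  # rights listed in step 2


def _check(name: str) -> str:
    """Accept only plain identifiers so names can be embedded in SQL text safely."""
    if not _IDENTIFIER.match(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name


def create_database_sql(database: str) -> str:
    """Step 1: create an empty database with a UTF-8 character set (PostgreSQL syntax)."""
    return f"CREATE DATABASE {_check(database)} ENCODING 'UTF8';"


def grant_sql(users: Iterable[str], schema: str = "public") -> List[str]:
    """Step 2: one GRANT statement per user covering the four required rights."""
    rights = ", ".join(REQUIRED_RIGHTS)
    return [
        f"GRANT {rights} ON ALL TABLES IN SCHEMA {_check(schema)} TO {_check(user)};"
        for user in users
    ]


# Placeholder names for illustration only.
statements = [create_database_sql("geneious_shared")] + grant_sql(["alice", "bob"])
```

The generated statements are intended to be reviewed and run by a database administrator using their usual SQL client.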

Supplying Your Own Database Driver

Shared Databases were designed with the supported databases in mind and are packaged with
database drivers for them. However, Geneious allows you to supply your own JDBC database
driver if you want to, for example if you have an updated driver, or want to use a driver
for an unsupported database. It is not guaranteed that Shared Databases will work with an
unsupported database system, but it is likely that they will if you provide the correct driver.

You can supply your own driver under More Options in the New database connection dialog,
or the Edit Account Details dialog (available by right clicking on the root folder of your shared
database) for an existing shared database.

20.2.2 Administration of Direct SQL Connection Shared Databases

Administration options are available in the Administration sub-menu (accessed by right-clicking
any folder in the shared database), which allows Database Admins to manage database
settings, add and remove users and groups, and assign groups and roles. The Administration
sub-menu and options are only available to users with admin privileges for the database.

Admin Users

There are two types of Admin users in a shared database: Database Admins and Group Ad-
mins.

Database Admins
A Shared Database can have one or more Database Administrators. Database Admins are users
who have the ADMIN role in the Everybody group. This role should be assigned with great
care, as Database Admins always have access to all folders and documents in the database,
irrespective of any group roles that have been set for a particular group.

Database Admins are the only users of the shared database who can add or remove groups or
edit all user settings.

If there are no Database Admins in your shared database, any user can set themselves as
Database Admin with the option Make the current user a Database Admin. This option is
accessed by right clicking on any folder in the Shared Database, and is only available when
there are no Database Admins.

Group Admins
Group Admins are users with the ADMIN role for a particular group. The first Group Admin
for each group needs to be added by a Database Admin, after which Group Admins may add
additional Group Admins to the group if they wish.

The admin permissions of Group Admins are limited to changing the group of folders and
editing user roles for the group (or groups) that they administer. Group Admins cannot add or
remove groups – a Database Admin is required to do this.

User Administration

User administration can only be performed by Database Admins.



Adding and Editing Users


Users must be added to your database via an appropriate SQL administration tool – you may
need to get your system administrator to do this. Users need to be created in the database and
granted SELECT, UPDATE, INSERT and DELETE permissions. For information on how to do
this, refer to the relevant database documentation.

Users will not be automatically added to the underlying database if you add them in Geneious
via Administration → Add New User. This option is useful if you want to set up Group roles
for a user before they have logged in for the first time, or before your system administrator has
created the user in the underlying database.

Once a user has logged in for the first time, you can edit their group roles and primary group
via Administration → Edit User. For more information on Groups and Roles, refer to sec-
tion 20.2.2.

Removing Users
Users can be removed from the Geneious Shared Database by going to Administration → Re-
move Users. This will delete the user’s Geneious Shared Database account, but will not remove
the user from the database (the user can log in again after being removed, which will register a
new Geneious Shared Database account for them in the Shared Database).

When a user is removed, any folders or data that they have added to the shared database will
remain. Any groups that the user is the sole member of will also remain, and will be accessible
by Database Admins.

Groups and Roles

Groups and Roles are used to manage the sharing of documents in your Shared Database.
Documents can be shared with all database users, private to an individual, or shared with a
subset of users, and users can be given read or write access to documents as required. This is
achieved by creating a Group in which users have the appropriate Roles, and assigning folders
to that group.

By default there is only one group, the Everybody group, and all folders are assigned to this
group, for which all users have edit permission. If you want to manage access to documents,
then additional groups with specified users can be added by a Database Admin. Once a group
is created, Group Admins can be added who can manage the group and edit roles for users
already in the group.

Adding and Removing Groups


Groups can be set up in the Geneious Prime interface as follows (see Figure 20.1).

• Right-click on the Shared Database root folder, go to Administration → Add Group



• Enter a name for your Group and add the users you want to assign to that group. Group
members must be assigned ADMIN, EDIT, or VIEW permissions.

Note that only a Database Admin can create new groups.

Figure 20.1: Adding a new group (“Group B”) to the Shared Database, with “User b” assigned
EDIT privileges

Once a group has been created, folders can be added to that group by right clicking on the
folder and selecting Change Group of Folder (see Figure 20.2). To do so, a user must have the
ADMIN role for both the current and new group.

To remove a group, you must first ensure the group has no folders associated with it, then right
click on any folder in the shared database, click on Administration → Remove Groups and
select the group/s you wish to remove. Groups can only be removed by Database Admins.

User Roles for Groups


User Roles for a group specify the access level that each user has for the folders (and thus
documents) in that group. Users can belong to any number of groups and can have a different
role within each group.

The three roles are:

VIEW allows the user to view the contents of folders.

EDIT allows the user to view and edit the contents of folders.

ADMIN allows the user to view and edit the contents of folders, and to manage folders and
user roles for that group.

Figure 20.2: Assigning the folder “User B data” to the Group “Group B”

If a user is not in the group of a folder, they will not be able to access the documents in that
folder. Inaccessible folders will either be hidden or will display a red circle with a white bar
on the folder icon, depending on whether the Show Inaccessible Folders option (accessed by
right clicking on the root folder) is selected.

Everybody Group
The Everybody Group is a group to which all users have at least EDIT access. This group is
automatically created when the Shared Database is initially set up.

The Everybody Group behaves differently from user-created groups in the following ways:

• Group Admins of the Everybody Group are always Database Admins.

• The Everybody Group can never be deleted

• All users must have at least EDIT access

• The root folder always belongs to this group and cannot be changed

User’s Primary Group


The user’s Primary Group specifies the group for new folders that are created on the root folder
of the shared database. By default, a user’s Primary Group is the Everybody group. A user’s
Primary Group can be changed by a Database Admin, by right clicking on any folder, clicking
on Administration → Edit Users and selecting the user to configure.

To be able to move folders out of their Primary Group, a user must have ADMIN permission
for their Primary Group.

Assigning Folders to Groups


Each folder in a Shared Database belongs to a Group, which defines the users who can access
the documents within that folder, and their access permissions. Any number of users can be
added to a group.

When a new folder is created in the shared database it will be added to a group as follows:

• If it is created in the root folder of the Shared Database, the new folder will be added to
the current user’s Primary Group (by default this is the Everybody group).

• If it is a subfolder (i.e. the new folder’s parent is not the root folder of the shared database),
the new folder will be added to the same group as the parent folder.

Once a folder has been created, its group can be changed by right clicking on the folder and
selecting Change Group of Folder. Only Admins can change the group of a folder.

20.3 Geneious Server Database

Geneious Server Database provides a synchronized storage location for Geneious data
accessible by multiple concurrent users, with a greater level of control and increased security for
data access for your organisation. This feature is available to customers who have purchased a
license for Geneious Server Database in addition to Geneious Prime.

Geneious Server Database is installed on a dedicated server and can be configured to utilise
existing authentication infrastructure such as LDAP to secure your data and simplify account
management. It provides increased security over the regular Shared Database by restricting
communication and preventing access to the underlying SQL database.

Once set up, users can connect to Geneious Server Database from Geneious Prime and use it in
the same way as the local database. Folders added to Geneious Server Database will be private
to the current user by default, but this can be configured by the Database Admin/s. Sharing
of folders can be controlled by setting user Roles, which grant users permission to View, Edit
or Administer a Group. Database Admins can choose whether Groups may be created by all
users, or only Admin users.

20.3.1 Setting up a Geneious Server Database

Instructions for setting up a Geneious Server Database are available in the Geneious Server
Installation Manual, which is provided with Geneious Server. Instructions for connecting to an
existing Geneious Server Database are available in section 20.1.3.

20.3.2 Administration of Geneious Server Databases

This section relates to Administration of Geneious Server Database from within Geneious,
including managing user access to folders and some database settings. For administration of
licensing and user accounts via the Geneious Server Admin interface, refer to the Geneious
Server Installation Manual.

Administration options within Geneious Prime are available in the Administration sub-menu
(accessed by right-clicking any folder in the database), including options for managing database
settings, adding and removing users and groups, and assigning groups and roles. The
Administration options for Group management are available to all database users by default, but this
can be restricted to Admins only via Administration → Set Who Can Manage Folders if
preferred.

Admin Users

There are two types of Admin users in a Geneious Server Database: Database Admins and
Group Admins.

Database Admins
Geneious Server Database can have one or more Database Admins, who are users with the
ADMIN role in the Everybody group. This role should be assigned with great care, as Database
Admins always have access to all folders and documents in the database (including documents
in folders assigned to a user’s Private Group).

A Database Admin can also manage the following database settings available via the
Administration sub-menu (accessed by right clicking on the root folder of your Geneious Server
Database):

• Change Group for New Folders: This setting determines whether new folders created in
the database root will be added to a user’s Private Group (default) or to a user’s Primary
Group. A user’s Primary Group is Everybody by default, which can be changed via
Administration → Edit User.
• Set Who Can Manage Groups: This setting determines whether groups can be created
and removed by non-admin users (default), or only by Admins.
• Root Folder Access: This setting determines whether documents can be saved directly in
the root folder of the database, or must be saved in a subfolder (default). Folders can be
added to the root folder of the database by all users irrespective of this setting.

Database Admins can also create new groups or users, and can modify and delete any existing
users or groups. These options are also available to Group Admins, and depending
on the database settings may also be available to regular database users.

If there are no Database Admins in your Geneious Server Database, any user can make
themselves a Database Admin with the option Make the current user a Database Admin. This
option is only available when there are no Database Admins. As Database Admins can access
all folders in the database, it is important that the first Database Admin is set to the appropriate
person as soon as possible.

Group Admins
Group Admins are users with the ADMIN role for a particular group, who can manage folders
and user roles for that group. Group Admins can be added to a group when the Group is
created, or later by anyone who is an Admin for that Group.

For a Geneious Server Database which has been set to only allow Groups to be managed by
Admins, the first Group Admin for each group needs to be added by a Database Admin. This
may be changed via Administration → Set Who Can Manage Groups, which by default allows
all users to create groups. Note that irrespective of this setting Database Admins will always
be able to access all data and reassign roles for all groups.

User Administration

The Administration sub-menu provides tools for adding, editing, and removing users. However,
the Geneious Server system administrator first needs to add users to, or remove them from, the
underlying database system. For example, if LDAP is being used for authentication then the
user must be added to or removed from the LDAP directory. Refer to the Geneious Server
Installation Manual for information about the Geneious Server Admin interface and adding
users.

Adding Users
If the server is configured to use database authentication, users can be added to your Geneious
Server Database either by going to Administration → Add New User, or via the Geneious
Server Admin interface.

For all other methods of authentication you must add users in the underlying system. In this
case, it is also possible to register a user in Geneious Prime using Add New User before they
are added to the underlying database system. This might be useful if you want to set up
Group roles for a user before your system administrator has created the user in the
underlying database. Note that if you do this, the user will only be able to log in to Geneious once
they have been added to the underlying database system.

Editing Users
Database Admins can change the Group roles for a user, or set the user’s Primary Group, in
Administration → Edit User. Group Admins may access Edit User for users who are members
of the Groups that they administer, and may edit the user’s Roles for those groups. For more
information on Groups and Roles, refer to section 20.3.2.

Removing Users
If the server is configured to use database authentication, users can be removed from your
Geneious Server Database either by going to Administration → Remove Users, or via the
Geneious Server Admin interface. For all other methods of authentication you must remove
users in the underlying system first.

To remove a user from a Geneious Server Database, folders associated with their Private Group
must first be removed or moved to a different group, as the Private Group will be removed
along with the user. Since groups cannot be removed when they have folders associated with
them, attempts to remove the user will fail if any folders remain in their Private Group.
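
The removal rule above can be read as a precondition check. The following is a minimal sketch of that rule only; the Folder type and both function names are hypothetical and are not part of Geneious or Geneious Server.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Folder:
    """Hypothetical stand-in for a database folder and the group it is assigned to."""
    name: str
    group: str


def blocking_folders(private_group: str, folders: List[Folder]) -> List[str]:
    """Names of folders still assigned to the user's Private Group."""
    return [f.name for f in folders if f.group == private_group]


def can_remove_user(private_group: str, folders: List[Folder]) -> bool:
    """Removal is only expected to succeed once the Private Group has no folders."""
    return not blocking_folders(private_group, folders)
```

In practice this means checking for, and then moving or deleting, any folders listed by such a check before using Remove Users.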

Any user-created groups that the user is the sole member of will remain in the database and
will be accessible by Database Admins. Additionally, any folders or data that a user has added
to your Geneious Server Database will remain after the user has been removed.

Groups and Roles

Groups and Roles are used to manage the sharing of documents in your shared database.
Documents can be shared with all database users, private to an individual, or shared with a subset of
users, and users can be given VIEW or EDIT access to documents as required. This is achieved
by creating a Group in which users have the appropriate Roles, and assigning a folder to that
Group.

The default groups in Geneious Server Database are the Everybody group, for which all users
have EDIT permission, and a Private Group for each user. If you want to manage access to
documents, then additional groups with specified users can be added.

Adding and Removing Groups


Groups can be set up in the Geneious Prime interface as follows:

• Right-click on any Shared Database folder, go to Administration → Add Group

• Enter a name for the Group and add the users you want to assign to that group. Group
members must be assigned ADMIN, EDIT, or VIEW rights.

Who can add new groups to your database depends on the setting of Administration → Set
Who Can Manage Folders:

• If all users may manage groups (default), then any user may create a new group.

• If managing groups is restricted to Admin users, then only Database Admins may create
new groups.

Once a group has been created, folders can be added to that group by right clicking on the folder
and selecting Change Group of Folder. This option is only available if you have permission to
change the group of a particular folder.

To remove a group, you must first ensure the group has no folders associated with it, then right
click on any folder in the shared database, click on Administration → Remove Groups and
select the group(s) you wish to remove. You must be a Group Admin to remove a group.

User Roles for Groups


User Roles for a group specify the access level that each user has for the folders (and thus
documents) in that group. Users can belong to any number of groups and can have different
roles in each group.

The three roles are:

VIEW allows the user to view the contents of folders.

EDIT allows the user to view and edit the contents of folders.

ADMIN allows the user to view and edit the contents of folders, and to manage folders and
user roles for that group.

If a user is not in the group assigned to a folder, they will not be able to access the documents
in that folder and any folders in that group will not be visible for that user.
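
The three roles form a simple hierarchy, which can be sketched as follows. This is an illustrative model of the descriptions above, not Geneious code; the Role enum and function names are assumptions, and a role of None stands for a user who is not in the folder's group. The unrestricted access of Database Admins is not modelled.

```python
from enum import IntEnum
from typing import Optional


class Role(IntEnum):
    """The three group roles, ordered so that a higher value includes the lower ones."""
    VIEW = 1
    EDIT = 2
    ADMIN = 3


def can_view(role: Optional[Role]) -> bool:
    # A user with no role in the folder's group cannot access the folder at all.
    return role is not None


def can_edit(role: Optional[Role]) -> bool:
    # EDIT and ADMIN both allow editing folder contents.
    return role is not None and role >= Role.EDIT


def can_manage(role: Optional[Role]) -> bool:
    # Managing folders and user roles for the group requires ADMIN.
    return role == Role.ADMIN
```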

Everybody Group
The Everybody Group is a group to which all users have at least EDIT access. This group is
automatically created when the Shared Database is initially set up.

The Everybody Group behaves differently from user-created groups in the following ways:

• Group Admins of the Everybody Group are always Database Admins.

• The Everybody Group can never be deleted

• All users must have at least EDIT access

• The root folder always belongs to this group and cannot be changed

User’s Private Group


A Private Group is created automatically for each user when the user is added to the Geneious
Server Database. Each user is the sole member of their private group, and Group Admin for it.
Note that Database Admins have access to the Private Groups of all users.

Private Groups are distinct from user-created groups in the following ways:

• No other users can be added to a user’s Private Group, such that documents and folders
in a user’s Private Group are never accessible to other regular users for the purpose of
either viewing or editing (a folder in a Private Group can be moved to a different group
if you wish to share it).

• A user’s role in their Private Group cannot be edited

• Private Groups are created and deleted when the user is added or removed, and cannot
be created or deleted any other way.

By default, new folders added to the root folder of a Shared Database will be added to the
user’s Private Group. This can be changed by a Database Admin by right clicking on any
folder, clicking on Administration → Change Group for New Folders and selecting the option
to use the user’s Primary Group.

Note that a Private Group is distinct from a user-created group which contains a single user.
Folders added to a user-created group with only one user in it will also be private to that user,
until such time as other users are added, but adding additional users is possible and the group
will behave like any other user-created group.

User’s Primary Group


The user’s Primary Group can be used to specify the group for new folders that are created on
the root folder of the shared database. The alternative (default) option is to assign these folders
to the user’s Private Group.

To assign new folders created under the root to the user’s Primary Group, go to
Administration → Change Group for New Folders, and set Assign folders created under the root to
“User’s primary group”.

Each user can be assigned a Primary Group in Administration → Edit User. By default this
will be the Everybody group but it can be set to any group, including the user’s Private Group,
by a Database Admin. To be able to move folders out of their Primary Group:

• If all users can manage groups, a user must have the EDIT role for their Primary Group.
• If only Admins can manage groups, a user must have ADMIN permission for their
Primary Group.

Assigning Folders to Groups


Each folder in a Geneious Server Database belongs to a group, which defines the users who can
access the documents within that folder, and their access permissions. User-created groups can
include any number of users. Folders in a group that a user does not have permission to access
will not be visible to that user.

When a new folder is created in the shared database it will be added to a group as follows:

• If it is created in the root folder of the shared database, the new folder will be added to
the current user’s Private Group (default) or their Primary Group. This is configurable
via Administration → Change Group for New Folders.
• If it is a subfolder (i.e. the new folder’s parent is not the root folder of the shared database),
the new folder will be added to the same group as the parent folder.
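
The two rules above amount to a small decision function. The sketch below is a model of the documented behaviour only; the function name, parameter names and example group names are hypothetical.

```python
def group_for_new_folder(parent_is_root: bool,
                         parent_group: str,
                         private_group: str,
                         primary_group: str,
                         use_primary_group: bool = False) -> str:
    """Return the group a newly created folder would be assigned to.

    use_primary_group models the Change Group for New Folders setting:
    False (the default) assigns root-level folders to the Private Group.
    """
    if not parent_is_root:
        # Subfolders always inherit the group of their parent folder.
        return parent_group
    return primary_group if use_primary_group else private_group
```

For example, with the default setting a folder created in the root by a user goes to that user's Private Group, while a folder created inside a "Group B" folder stays in "Group B".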

Once a folder has been created, its group can be changed by right-clicking on the folder and
selecting Change Group of Folder. This option is only available for a folder if you have the
correct permissions for the folder’s current group, which will depend on the Administration
sub-menu option to Set Who Can Manage Groups.

• If all users may manage groups (default), then any user with EDIT access to the folder’s
current and new groups may change the group of a folder.

• If managing groups is restricted to Admin users, then only users with the ADMIN role
for the folder’s current and new groups may change the group of a folder.
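
As a rough illustration of these two cases, the check can be written as a single function. This is a sketch of the rule as described here, with hypothetical names; roles are given as the strings "VIEW", "EDIT" or "ADMIN", or None when the user is not a member of the group.

```python
from typing import Optional

# Higher numbers include the permissions of lower ones.
_RANK = {"VIEW": 1, "EDIT": 2, "ADMIN": 3}


def can_change_group(role_in_current: Optional[str],
                     role_in_new: Optional[str],
                     admins_only: bool = False) -> bool:
    """True if the user may move a folder from its current group to a new group.

    admins_only models Set Who Can Manage Groups being restricted to Admins.
    """
    required = _RANK["ADMIN" if admins_only else "EDIT"]
    return all(
        role is not None and _RANK[role] >= required
        for role in (role_in_current, role_in_new)
    )
```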

20.3.3 Backing up a Geneious Server Database

There are two parts to backing up a Geneious Server Database:

1. Back up the SQL database using the standard backup procedure for your choice of SQL
database. The default SQL database name is “geneiousserver” but the actual name used
is specified in the database.properties configuration file found in
/home/HOSTNAME/.geneiousServerConfig/.

2. Back up the following directory on the Geneious Server:

/home/HOSTNAME/.geneiousServerNN.Ndata, where “HOSTNAME” is the actual host
name of the computer, and “NN.N” is the major and minor version number of Geneious
Server, for example “11.1” if running version 11.1.5. This directory contains the contents
of Geneious documents that are too large to store completely in the SQL database.
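
If you script your backups, the directory names can be derived from the host name and version as described above. The helper names below are hypothetical, and the sketch only builds the paths; it does not perform the backup itself.

```python
def config_directory(host_name: str) -> str:
    """Directory containing database.properties, following the pattern above."""
    return f"/home/{host_name}/.geneiousServerConfig/"


def data_directory(host_name: str, version: str) -> str:
    """Data directory for a Geneious Server version such as '11.1.5'.

    Only the major and minor parts of the version are used in the name.
    """
    parts = version.split(".")
    if len(parts) < 2:
        raise ValueError("version must include major and minor numbers, e.g. '11.1.5'")
    major, minor = parts[0], parts[1]
    return f"/home/{host_name}/.geneiousServer{major}.{minor}data"


print(data_directory("myhost", "11.1.5"))
```

The resulting path can then be passed to whichever file backup tool your organisation already uses.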

20.3.4 Audit Logging

Geneious Server Database empowers users to choose the level of access to grant to other
members of their organization. In order to enable system administrators to monitor any changes
made that can cause users to gain or lose access to data, Geneious Server Database records an
audit log of relevant events:

• Users or groups are added or deleted

• Users are added to or removed from a group

• Users’ roles within a group change

• Documents or folders are added, deleted, or moved

• A folder is assigned to a different group

The log file (GeneiousServerAccess.log) is saved to the Tomcat log directory.


Chapter 21

Command Line Interface

21.1 Running Geneious Prime operations from the command line

In Geneious Prime 2022 onwards, you can run Geneious operations from the command line.

To use the command line interface (CLI), you must have a licensed copy of Geneious installed
on the machine. Your license must have been activated either in the GUI, or by
configuring the geneious.properties file (see section 22.2.2) prior to using the CLI. You must also
install any plugins you wish to use via the Plugins menu in the GUI.

On Windows and Linux, no further configuration is required to run the CLI after installation
of Geneious. Open the Command Prompt or Terminal and type geneious to run it.

On macOS you must create symlinks yourself so that the geneious command can be run from
any directory in the Terminal.

To do this, use the following command:


sudo ln -sf /Applications/Geneious\ Prime.app/Contents/Resources/app/resources/ge
/usr/local/bin

(this assumes Geneious is installed in the default location in Applications)

Note that sudo (admin) privileges are required to create the symlink. However, once the symlink
is created, the geneious CLI can be run by non-admin users.

21.2 Running operations in the CLI

Geneious operations in the CLI follow the basic format:

geneious -i <input file name> --operation <operation name> <-options> -o <output file name>
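
When calling the CLI from a script, it can help to assemble the command as an argument list that follows the basic format above. The sketch below uses only the -i, --operation and -o flags shown in that format; the function name is hypothetical and "some_operation" is a placeholder, not a real operation name (use geneious --list to find the names available in your installation).

```python
import shlex
from typing import List, Sequence


def build_command(input_file: str,
                  operation: str,
                  output_file: str,
                  options: Sequence[str] = ()) -> List[str]:
    """Assemble a geneious CLI call in the documented order of arguments."""
    return ["geneious", "-i", input_file,
            "--operation", operation,
            *options,
            "-o", output_file]


# "some_operation" is a placeholder for an operation name taken from `geneious --list`.
cmd = build_command("reads.fastq", "some_operation", "result.geneious")
print(shlex.join(cmd))
```

The list can be passed directly to subprocess.run(cmd, check=True) on a machine where the geneious command is available.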

To see all available operations, type


geneious --list (add --filter <operation type> to filter the list on specific types
of operations)

To see all options for a given operation, type


geneious --options-for <operation> (add --advanced to see advanced options)

Note that the command line interface cannot currently access files directly from your Geneious
data folder. If you have files currently in Geneious you want to use as input to operations in
the CLI, you must export them from Geneious so that they are stored as files on your drive.
They can be exported in either .geneious format or any of the common sequence formats. You
can also use raw sequence files (such as fastq files from your sequencing provider) that have
not been imported into Geneious.

To see some examples of how to use the CLI, type


geneious --examples

21.3 Configuring an options profile to automate setting options

Using an options profile file allows you to preset the options you want to use in Geneious, and
then apply them when running an operation in the command line interface.

For example, if you want to specify particular parameters for de novo assembly but do not
want to enter each one in the command line, you can set them up in the GUI and generate
an options profile file to use in the command line.

To do this, open the operation settings in the Geneious GUI, and set any parameters
that you wish to change from the defaults. Click the settings cog at the bottom left of the
window and select Save current settings. Give your profile a name.

Then click the settings cog again and select Load Profile → Manage Profiles. Select your profile
and click Export to create an options profile file. Save this to your drive.

You can then call this file in the command line using the -x option. Note that you must include
the full path to the file if you have saved it in a subfolder in your home directory. It is not
necessary to specify the operation when using an options profile file, as this is set in the profile.

For example:
geneious -i inputfile.geneious -x myoptions.optionprofile -o outputfile.geneious

Figure 21.1: Saving an options profile in the Geneious GUI



21.4 Running workflows from the CLI

Workflows configured in the GUI can be exported and run in the command line in a
similar way to the options profile file.

First, set up your workflow within Geneious using the Manage Workflows tool. Then click the
Export option to export it to a file.

Figure 21.2: Exporting a Workflow

You can then call this file in the command line using the option -w workflow name.geneiousWorkflow.

For example:
geneious -i inputfile.geneious -w myworkflow.geneiousWorkflow -o outputfile.gene

Note that if there are spaces in the workflow name, it should be enclosed in quotes ("my
workflow name.geneiousWorkflow").
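
If you launch a workflow from a script rather than typing the command, passing the arguments as a list avoids quoting problems with spaces in the workflow file name. This is a minimal sketch using the -i, -w and -o options described above; the function name and file names are illustrative only.

```python
import shlex
from typing import List


def workflow_command(input_file: str, workflow_file: str, output_file: str) -> List[str]:
    """Build a geneious call that runs an exported workflow file via -w."""
    return ["geneious", "-i", input_file, "-w", workflow_file, "-o", output_file]


cmd = workflow_command("inputfile.geneious",
                       "my workflow name.geneiousWorkflow",
                       "outputfile.geneious")

# shlex.join shows the equivalent command as it would be typed in a POSIX shell,
# with the workflow name quoted because it contains spaces.
print(shlex.join(cmd))
```

Note that shlex produces POSIX-style quoting; in the Windows Command Prompt, double quotes should be used instead.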

21.5 Limitations

The command line interface currently has the following limitations:

• Data must be exported from your current Geneious database to a file on your drive in
order to be accessed by the CLI.

• The content of dialog windows that would normally show in Geneious may not be visible
in the command line (this may include operations that return no result)

• The name of the file output from the CLI may not necessarily be the same as the name of
the sequence document within the file that is displayed when imported into Geneious.

The command line interface is under active development. If you would like to see additional
functionality or have any comments regarding it, please contact support.
Chapter 22

Advanced Administration

22.1 Default data location

By default, the data location will be in the user’s home directory. You can change this by setting
an environment variable for the Geneious Prime launcher to use, for example a $HOME$
variable that points to where you want a user to store their data.

On Windows and Linux, edit the Geneious.in.use.vmoptions file in the installation
directory, and add -DdataDirectoryRoot=$HOME$/Geneious on a new line after the other
settings.

On macOS, edit /Applications/Geneious\ Prime.app/Contents/Info.plist and change the
<key>Arguments</key> section to match the following:

<key>Arguments</key>
<string>-distributionVersion
-DdataDirectoryRoot=$HOME$/Geneious</string>

A special $JAVA_USER_HOME$ variable is normally used which resolves to user.home and is
what Geneious uses by default. The program will create a Geneious 2022.1 Data folder
inside the directory you specify.

If you wish to hide the local folders and force users to use only the Shared Database, edit the
geneious.properties file by removing the # before the line show-local-database=false.
See section 22.2.2 for further details on editing the geneious.properties file.


22.2 Change default preferences

22.2.1 Change preferences within Geneious Prime

Start a fresh copy of Geneious Prime and set it up the way you want. Shut it down and then copy the
file Geneious 2022.1 Data/user_preferences.xml.

On Windows and Linux, copy the file to the Geneious installation directory and rename it to
default_user_preferences.xml.

On macOS, rename the file to default_user_preferences.xml and copy it to either
/Users/(username)/Library/Application Support/Geneious/ (to set preferences for a single
user account)
or
/Library/Application Support/Geneious/ (to set preferences for all users of a computer)

The default preferences set for all users of a computer will take precedence over default
preferences set for individual users.

After copying the default user preferences file to the appropriate location, when users start
Geneious Prime for the first time they will get the configuration you set rather than the normal
default.

Examples of features you can change:

• Turn off automatic updates

• Set default custom BLAST location

• Set up a shared Database

• Set up a proxy server default

• Turn off particular plugins

Any users who have already run Geneious should click the “Reset All Preferences” button in
the Geneious Preferences to load these defaults.

22.2.2 geneious.properties file

Any preferences which can be set within Geneious Prime can also be set from the geneious.properties
file, which can be found in the Geneious Prime installation directory. On macOS this file is
located in Geneious Prime.app/Contents/Resources/app (Prime 2021.1 and later) or Geneious.app/Contents/J
(Prime 2021.0 and earlier).

Note that on macOS this file should be copied to an external location before it is edited.
Modifying this file directly in the application bundle will break its app signature, which may
have unintended side effects such as intermittent application crashes. The file should be copied
to either
/Users/(username)/Library/Application Support/Geneious/geneious.properties (to set
properties for a single user account)
or
/Library/Application Support/Geneious/geneious.properties (to set properties for all users
of a computer)

Properties defined for all users take priority over properties defined for individual users.

Some examples of how to modify geneious.properties are present in the file already - remove
the hashes from the start of the lines and modify the values to use them. If you need to find out
how to set other preferences using this file, please contact Geneious support.

It is also possible to turn off access to all external services by editing this file (this is not possible
via the preferences inside Geneious). This includes access to NCBI, Uniprot, Geneious plugin
download, submission of error reports, support requests and usage tracking.

22.3 Pre-configuring Shared Database connections

The Shared Database connection settings can be set in the geneious.properties file so that users
do not have to enter these settings in Geneious (see 22.2.2). The properties to set depend on the
connection method used.

22.3.1 Create Direct SQL Connection through geneious.properties

Open the geneious.properties file and scroll down to ##database configuration, and
enter the details for database-host (the host running the database server), database-dbname
(the database name) and database-type (one of “MYSQL”, “POSTGRESQL”, “SQL SERVER”
or “ORACLE”). These three fields are required for the connection to work. You must also
specify the location of the database-driver if using MySQL; this field is optional with the other
database servers. The optional property database-user will be used if provided. Remove
the # at the start of these lines for these settings to be used.
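
If you need to prepare this file for many machines, the "remove the # and set a value" step can be scripted. The following is a minimal sketch that works on the text of a copy of geneious.properties; the function name is hypothetical, the sample text is an assumed layout rather than the real file contents, and db.example.org is a made-up host name.

```python
import re


def enable_property(text: str, key: str, value: str) -> str:
    """Uncomment (or overwrite) the first `key=` line and set its value.

    If the key is not present at all, a new `key=value` line is appended.
    """
    pattern = re.compile(rf"^#?[ \t]*{re.escape(key)}[ \t]*=.*$", re.MULTILINE)
    line = f"{key}={value}"
    if pattern.search(text):
        # A function is used as the replacement so the value is inserted literally.
        return pattern.sub(lambda match: line, text, count=1)
    if text and not text.endswith("\n"):
        text += "\n"
    return text + line + "\n"


# Assumed layout of the relevant part of the file, for illustration only.
sample = "##database configuration\n#database-host=\n#database-dbname=\n#database-type=\n"
sample = enable_property(sample, "database-host", "db.example.org")
sample = enable_property(sample, "database-type", "POSTGRESQL")
print(sample)
```

On macOS, apply any such edits to the copy of the file outside the application bundle, as described in section 22.2.2.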

22.3.2 Create Geneious Server Database connection through geneious.properties

Open the geneious.properties file and scroll down to ##Geneious server connection
and enter the URL for gserver-host (the host on which Geneious Server is running). Other
optional properties that can be provided are gserver-ssl (“true” to use SSL), gserver-context
(Geneious Server context to use) and gserver-port (port to connect). If any of these optional
properties are not provided then the default values will be used. Remove the # at the start of
these lines for the settings to be used.

22.4 Pre-configuring license information

License information can be preset in the geneious.properties file so that users do not need to
enter this themselves (see 22.2.2).

22.4.1 Personal or Group licenses

Open the geneious.properties file in a text editor and scroll down to ##provide a flexnet-local
license key (except trial). Remove the # next to license-key= and add your
license key.

22.4.2 Floating licenses

Scroll down to ##license server settings, and change override-property-flexnet server.hos
and override-property-flexnet server.port to the settings you require. Remove the
# at the start of these lines for the setting to be used. This setting cannot be used to configure
Sassafras KeyServer licenses.

22.5 Adding custom plugins to the Plugins menu in Geneious Prime

Administrators can add custom, in-house plugins to the Plugins menu in Geneious Prime, and
restrict access to publicly available plugins by doing the following:

1. Set up an XML file that contains information for each plugin you want to add, including
its name, description, version information, and a URL from which Geneious can
download it. For an example of the format to use, see
https://desktop-links.geneious.com/assets/plugins/DownloadablePlugins.xml

2. Go to Tools → Preferences → General, and click “Advanced...”. Look for the advanced
preference CustomPluginXmlUrl. Select it, click Edit and enter the address of the XML
file you created. This can be a link to a web server hosting the XML file, or a link in the
format file://<host>/path/to/file to a file stored at a known, accessible location on a
local or shared disk. You must restart Geneious to apply the changes.
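
The file://<host>/path/to/file form in step 2 can be produced with a few lines of Python. This is
an illustrative sketch only: the file_url helper, the host name and the paths are hypothetical, and
it assumes that Geneious accepts standard percent-encoding (for example %20 for a space) in the
address.

```python
from urllib.parse import quote


def file_url(path, host=""):
    """Build a file://<host>/path/to/file URL from an absolute, forward-slash path."""
    if not path.startswith("/"):
        raise ValueError("path must be absolute and start with '/'")
    # quote() leaves "/" alone and percent-encodes characters such as spaces.
    return f"file://{host}{quote(path)}"


# File on a local disk: the host part is left empty.
print(file_url("/shared/geneious/Custom Plugins.xml"))
# File served from a named host (hypothetical host name).
print(file_url("/plugins/CustomPlugins.xml", host="fileserver"))
```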

Geneious uses the contents of the XML file to add the custom plugins to the Plugins menu in
Geneious. All available plugins from Biomatters will still be displayed, along with the
additional plugins from the custom XML file. The XML file is also used to determine when to
notify the user that a new version of a plugin is available to download.

If you wish to turn off access to the Biomatters-provided plugins, so that only the custom
plugins are visible, set the advanced preference DisableCheckForBiomattersPluginUpdates
to “true”, then restart Geneious. If you wish to restrict network access to the
outside or control which Biomatters-provided plugins are available, you can locally mirror the
Biomatters plugins you want to provide, and use a custom XML file to specify the details.

22.6 Deleting built-in plugins

Features of Geneious Prime can be turned off in preferences as described in the section above
on changing default preferences. To remove a feature completely, so that your users
cannot reinstate it, shut down Geneious Prime, go to the bundledPlugins directory inside
the installation directory, and delete the jar files or folders of the plugins you want to remove.

22.7 Max memory

On Windows and Linux, edit the files Geneious.default.64bit.vmoptions and Geneious.in.use.vm
in the installation directory and change the -Xmx value to your preferred setting.

On Mac OS X, edit the /Applications/Geneious.app/Contents/Info.plist file and
find the VMOptions section and modify the -Xmx setting.
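
The -Xmx change can also be applied with a script. The following is a minimal Python sketch that
rewrites the value in a block of JVM options text; it assumes the standard Java form of the option
(-Xmx followed by a number and an optional k, m or g suffix), and the sample text is hypothetical
rather than copied from a Geneious installation.

```python
import re


def set_max_heap(vmoptions_text, size):
    """Replace the -Xmx value (for example '4096m' or '8g') in JVM options text."""
    if not re.fullmatch(r"\d+[kKmMgG]?", size):
        raise ValueError(f"not a valid JVM heap size: {size!r}")
    new_text, count = re.subn(r"-Xmx\d+[kKmMgG]?", f"-Xmx{size}", vmoptions_text)
    if count == 0:
        raise ValueError("no -Xmx option found")
    return new_text


# Hypothetical options text; only the -Xmx line is changed.
print(set_max_heap("-Xms256m\n-Xmx2048m\n", "8g"))
```

Read the file, pass its contents through set_max_heap, and write the result back while Geneious
Prime is not running.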

22.8 Web Linking to Data in Geneious Prime

It is possible to create web links which will open data in Geneious Prime when they are clicked
in another program. This only works on Windows and macOS, and Geneious Prime has to be
installed on the machine where the link is clicked. Note that some browsers (including Safari
on macOS) may not support this type of link.

There are two types of links supported:

1. A link which will download and import a file from a given location into Geneious Prime.
To do this, use the following form of URL:
geneious://file=<PathToFile>
For example, to download and import the pET15-MHL vector from the Addgene Plasmid
Repository, use
geneious://file=https://media.addgene.org/snapgene-media/v1.6.2-0-g4b4ed87/sequences/22/50/12250/addgene-plasmid-26092-sequence-12250.dna
To open a file in a local folder, use the following format:
geneious://file=file:///C:/Users/Your_name/Desktop/file_name
The file can be of any format which Geneious Prime is able to import (see 3.2), including
plugin format. Only one file can be linked to in this way.

2. A link which will select documents which are already stored in Geneious Prime, either in
your local folders or a Shared Database. To do this, use the following form of URL:
geneious://prime/documents?urn=<URNofDocument>
or to select several documents:
geneious://prime/documents?urn=<URNofDocument1>&urn=<URNofDocument2>
To find the URN of a document, click on the small column selector button at the top-right
corner of the document table and enable the URN column. You can then right-click on
the URN of a document in the table and choose Copy URN.
For documents in the Shared Database, you can automatically create a web link to a document
by right-clicking on the document and choosing Copy link to Document. This can
also be performed for multiple selected documents. The full web link (not just the URN)
will then be copied to the clipboard. If the link is opened or clicked in another program,
such as a web browser or chat program, Geneious Prime will automatically open with
these documents selected. The link can also be pasted into the search box in Geneious to
quickly find the document. Anyone who has access to the same Shared Database can use
the link.
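
Both link forms described above are simple enough to generate from a script, for example when
building a web page that links to many files or documents. The following Python sketch just
assembles the strings in the formats shown; the helper names and the URN values are hypothetical,
and it assumes the URN is inserted exactly as copied from the URN column, without further encoding.

```python
def file_link(location):
    """Link that downloads and imports one file into Geneious Prime."""
    return f"geneious://file={location}"


def documents_link(urns):
    """Link that selects one or more documents already stored in Geneious Prime."""
    urns = list(urns)
    if not urns:
        raise ValueError("at least one URN is required")
    return "geneious://prime/documents?" + "&".join(f"urn={urn}" for urn in urns)


# A local file, using the path format from the example above.
print(file_link("file:///C:/Users/Your_name/Desktop/file_name"))
# Two documents, using placeholder URNs (copy real ones from the URN column).
print(documents_link(["URN-of-document-1", "URN-of-document-2"]))
```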
