Roslin Bioinformatics - VIPER

VIPER Pedigree Visualisation

VIPER incorporates our novel Sandwich Visualisation for animal pedigrees, which provides a space-efficient, family-centric representation of the generations in the pedigree hierarchy. Multi-dimensional data (genotypes, error metrics, etc.) can be overlaid onto individuals and families in the pedigree structure, giving a clear visual localisation of both reported inheritance inconsistencies and missing genotype data. The development of this visualisation has been published (Graham et al., 2011; Paterson et al., 2011).


The Pedigree Sandwich View Visualisation

Figure 1: The Sandwich View

Views of a pedigree with two child generations. A. The two generations stack as two sandwiches. In each generation, offspring are grouped into family icons sandwiched between their sire and dam. Note the duplication of female node G717, which allows the specified row ordering (by sire name, see details). B. The visualisation modified to show individual offspring icons. C. The offspring partitioned by sex. D. Rows resized and Generation 1 collapsed so that the names of individual male offspring in Generation 2 are legible.
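The family grouping and sire-ordered rows of Figure 1 can be sketched in Python. This is a minimal illustration under assumed inputs, not VIPER's layout code: the `Animal` record, the helper names and the tie-break on dam name are all hypothetical. Family columns are sorted by sire name; adjacent columns sharing a parent collapse into one spanning node, and a dam who still needs more than one node is the kind of duplicate noted for G717.

```python
from collections import Counter, defaultdict
from itertools import groupby
from typing import NamedTuple


class Animal(NamedTuple):
    # Hypothetical offspring record; VIPER's input format is not shown here.
    name: str
    sire: str
    dam: str
    sex: str  # "M" or "F"


def runs(names):
    """Collapse adjacent repeats into (name, span) nodes, so a parent shared
    by neighbouring family columns is drawn once across those columns."""
    return [(name, len(list(group))) for name, group in groupby(names)]


def sandwich_layout(offspring, by_sex=False):
    """Lay out one generation as a sire row, a family row and a dam row.
    Columns are ordered by sire name, then (assumed tie-break) dam name."""
    families = defaultdict(list)
    for animal in offspring:
        # by_sex=True splits each family by offspring sex (as in Figure 1C).
        key = (animal.sire, animal.dam, animal.sex if by_sex else "")
        families[key].append(animal.name)
    columns = sorted(families)
    sire_nodes = runs([sire for sire, _, _ in columns])
    dam_nodes = runs([dam for _, dam, _ in columns])
    family_row = [sorted(families[key]) for key in columns]
    # A dam needing more than one node was duplicated to keep the sire order.
    counts = Counter(name for name, _ in dam_nodes)
    duplicated = sorted(name for name, n in counts.items() if n > 1)
    return sire_nodes, family_row, dam_nodes, duplicated


pedigree = [
    Animal("A1", "S1", "D1", "M"),
    Animal("A2", "S1", "D2", "F"),
    Animal("A3", "S2", "D1", "F"),
    Animal("A4", "S2", "D3", "M"),
]
sires, families, dams, duplicated = sandwich_layout(pedigree)
print(sires)       # [('S1', 2), ('S2', 2)]
print(dams)        # [('D1', 1), ('D2', 1), ('D1', 1), ('D3', 1)]
print(duplicated)  # ['D1']
```

Dam D1 has families with both S1 and S2; because sires are kept contiguous, her two families cannot sit side by side, so she is drawn twice.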

Overlaying information about error rates, missing data and masking

Separate colour renderers are used to report the various inheritance metrics and interface actions onto the individual and family icons in the Sandwich View.

A: INHERITANCE INCONSISTENCIES

Three types of inheritance inconsistency are reported in the user's preferred colour (red here) by colouring portions of the individual or family icon:

  • nil-inherited-from-sire inconsistency (triangle pointing to sire)
  • nil-inherited-from-dam inconsistency (triangle pointing to dam)
  • novel allele found (central rectangle)
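The three inconsistency types can be illustrated for a single marker with a short Python sketch. The rules below are assumed textbook Mendelian checks, not VIPER's published definitions, and the function and flag names are hypothetical: a child sharing no allele with a parent triggers the corresponding "nil-inherited" flag, and any child allele carried by neither parent triggers "novel allele".

```python
from itertools import combinations_with_replacement
from typing import Optional, Tuple

Genotype = Optional[Tuple[str, str]]  # None = missing genotype


def inconsistencies(child: Genotype, sire: Genotype, dam: Genotype) -> set:
    """Flags for one marker, using assumed textbook Mendelian rules."""
    flags = set()
    if child is None:
        return flags  # nothing observed, nothing to contradict
    alleles = set(child)
    if sire is not None and not alleles & set(sire):
        flags.add("nil_from_sire")  # triangle pointing to the sire
    if dam is not None and not alleles & set(dam):
        flags.add("nil_from_dam")  # triangle pointing to the dam
    if sire is not None and dam is not None and alleles - set(sire) - set(dam):
        flags.add("novel_allele")  # allele carried by neither parent
    return flags


print(inconsistencies(("A", "B"), ("C", "D"), ("A", "B")))  # {'nil_from_sire'}

# Enumerate every trio over a six-allele alphabet to list reachable flag sets.
genotypes = list(combinations_with_replacement("ABCDEF", 2))
reachable = {
    frozenset(inconsistencies(c, s, d))
    for c in genotypes for s in genotypes for d in genotypes
}
print(len(reachable - {frozenset()}))  # 6
```

Under these assumed rules, "nil from sire" plus "nil from dam" without "novel allele" can never occur: a child sharing no allele with either parent necessarily carries an allele found in neither. That leaves six reachable combinations, which may correspond to the asterisked region in Figure 2, although the figure itself is not reproduced here.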

In the aggregate view of genotype data (i.e. across all markers), the red colouration reflects the individual's error rate on a grey-to-deep-red colour scale, and the sensitivity of the colour map can be adjusted using the Individual Errorgram filter control. When the pedigree reports on a single focused marker, the error values are Boolean.
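The two colouring modes can be sketched as a linear colour ramp in Python. The RGB endpoints, the `saturation` parameter (a hypothetical stand-in for the Errorgram filter) and the choice of typed markers as the denominator of the error rate are all assumptions for illustration.

```python
GREY = (0.8, 0.8, 0.8)      # assumed RGB for "no errors"
DEEP_RED = (0.6, 0.0, 0.0)  # assumed RGB for the top of the scale


def blend(t):
    """Linear interpolation from GREY (t=0) to DEEP_RED (t=1)."""
    return tuple(round(g + (r - g) * t, 3) for g, r in zip(GREY, DEEP_RED))


def aggregate_colour(errors, typed, saturation=0.2):
    """Aggregate view: colour by error rate over typed markers.
    Rates at or above `saturation` are drawn at full deep red, so
    lowering it makes the colour map more sensitive."""
    rate = errors / typed if typed else 0.0
    return blend(min(rate / saturation, 1.0))


def focused_colour(has_error):
    """Single-marker view: the error value is Boolean."""
    return blend(1.0 if has_error else 0.0)


print(aggregate_colour(10, 100))  # (0.7, 0.4, 0.4)
print(focused_colour(True))       # (0.6, 0.0, 0.0)
```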

Figure 2: Combination of Error Glyphs used in VIPER.

Six different combinations of possible errors for a genotype are shown as a Venn diagram (the asterisked combination is impossible).

B: INCOMPLETE DATA

A common feature of many pedigree genotype datasets is incomplete data (i.e. missing genotypes for some markers in some individuals). This may result from systematic choices in study design (e.g. when only founders and terminal-generation offspring are typed) or from sporadic assay failure and sample or data loss. Knowing where data are incomplete is critical for understanding the inheritance pattern of errors, because the inheritance-checking algorithm must infer across missing genotypes. As a result, an error can be propagated through missing data points and reported in near or more distant relatives of the individual carrying the actual bad data point.
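The propagation effect can be illustrated with a small half-sib family in Python. This is a hedged sketch of one generic way to infer over a missing parent, not VIPER's checking algorithm, and the function names are hypothetical. With the sire untyped, each child constrains which allele he must have transmitted; because he carries only two alleles, a single bad genotype can make the whole family inconsistent without revealing which child is at fault.

```python
from itertools import combinations_with_replacement


def paternal_options(child, dam):
    """Alleles the untyped sire could have passed on, given the dam."""
    a, b = child
    options = set()
    if b in dam:
        options.add(a)  # b came from the dam, so a came from the sire
    if a in dam:
        options.add(b)  # a came from the dam, so b came from the sire
    return options


def sire_can_exist(children, dam):
    """Is there any two-allele genotype for the missing sire that explains
    all of these children? The sire's genotype is inferred, not observed."""
    needs = [paternal_options(child, dam) for child in children]
    if not needs:
        return True
    if not all(needs):
        return False  # some child shares no allele with the dam
    pool = sorted(set().union(*needs))
    return any(
        all(need & {p, q} for need in needs)
        for p, q in combinations_with_replacement(pool, 2)
    )


def suspects(family, dam):
    """Children whose removal alone would make the family consistent."""
    return [
        name for name in family
        if sire_can_exist([g for n, g in family.items() if n != name], dam)
    ]


dam = ("A", "A")
family = {"c1": ("A", "B"), "c2": ("A", "C"), "c3": ("A", "D")}
print(sire_can_exist(family.values(), dam))  # False: needs B, C and D
print(suspects(family, dam))                 # ['c1', 'c2', 'c3']
```

Removing any one of the three children restores consistency, so the inconsistency can only be attributed to the family as a whole: the source of the bad data point is hidden behind the missing sire genotype.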

The user can choose an appropriate contrasting colour (blue here) to report missing-data frequencies by adding a border to individuals or families with missing genotype data. Again, in the aggregate view of genotype data this is reported as a frequency (via the intensity of the blue border), whereas in the single-marker focus view the border is applied in a Boolean fashion (see Figure 3).
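A minimal Python sketch of the border rule, assuming genotype calls are stored per marker with `None` for a missing call. The function names are hypothetical, and pooling all offspring calls to score a family icon is an assumed aggregation rather than VIPER's documented one.

```python
def missing_fraction(calls):
    """Fraction of genotype calls that are missing (None)."""
    return sum(call is None for call in calls) / len(calls) if calls else 0.0


def border_strength(calls, focus=None):
    """Opacity of the missing-data border for one individual.
    focus=None gives the aggregate view (graded by frequency); an integer
    focus is a marker index, giving the Boolean single-marker view."""
    if focus is None:
        return missing_fraction(calls)
    return 1.0 if calls[focus] is None else 0.0


def family_border_strength(offspring_calls, focus=None):
    """Family icon: pool every offspring's calls (assumed aggregation)."""
    if focus is None:
        pooled = [call for calls in offspring_calls for call in calls]
        return missing_fraction(pooled)
    return 1.0 if any(calls[focus] is None for calls in offspring_calls) else 0.0


ind = [("A", "B"), None, ("A", "A"), None]
print(border_strength(ind))                             # 0.5
print(border_strength(ind, focus=0))                    # 0.0
print(family_border_strength([ind, [("A", "B")] * 4]))  # 0.25
```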


C: MASKED DATA

The 'Masking' operation, which is used to remove suspect data (genotypes and pedigree relationships), forms the basis of exploratory data cleaning in VIPER (described in detail elsewhere). Individuals whose genotype data have been masked are hatched over in the chosen 'missing data' colour (blue here). Where a pedigree link has been broken, a (blue) triangle is added to the individual, pointing at the broken relationship (up for paternity, down for maternity) (see Figure 3).
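One way to model masking, sketched in Python under stated assumptions: the `Individual` record, `mask()`, `effective()` and the glyph names are hypothetical, not VIPER's API. Masks are stored as flags instead of deleting data, and the inheritance check works from an "effective" view in which masked genotypes read as missing and masked links as unknown parents.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

Genotype = Optional[Tuple[str, str]]


@dataclass
class Individual:
    # Hypothetical record; VIPER's internal data model is not described here.
    name: str
    sire: Optional[str]
    dam: Optional[str]
    genotypes: Dict[str, Genotype]
    masks: Set[str] = field(default_factory=set)


def mask(ind: Individual, what: str) -> None:
    """Mask 'genotypes', or break the 'sire' or 'dam' link.
    Only a flag is recorded, so the original data stay recoverable."""
    if what not in ("genotypes", "sire", "dam"):
        raise ValueError(f"cannot mask {what!r}")
    ind.masks.add(what)


def effective(ind: Individual):
    """What the inheritance check sees once masks are applied."""
    hidden = "genotypes" in ind.masks
    genotypes = {m: None if hidden else g for m, g in ind.genotypes.items()}
    sire = None if "sire" in ind.masks else ind.sire
    dam = None if "dam" in ind.masks else ind.dam
    return sire, dam, genotypes


def masking_glyphs(ind: Individual):
    """Glyphs drawn in the 'missing data' colour (blue here)."""
    glyphs = []
    if "genotypes" in ind.masks:
        glyphs.append("hatching")
    if "sire" in ind.masks:
        glyphs.append("triangle up")  # broken paternity
    if "dam" in ind.masks:
        glyphs.append("triangle down")  # broken maternity
    return glyphs


ind = Individual("x1", "s1", "d1", {"m1": ("A", "B"), "m2": ("A", "A")})
mask(ind, "genotypes")
print(effective(ind))       # ('s1', 'd1', {'m1': None, 'm2': None})
print(masking_glyphs(ind))  # ['hatching']
```

Keeping the original values and applying masks only when building the effective view makes each masking step reversible, which suits exploratory cleaning; whether VIPER stores masks this way is not stated here. Because masked genotypes become missing, a per-marker re-check of such an individual has nothing observed to contradict.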

Figure 3: Examples of Error, Missing Data and Masking Glyphs in VIPER.

A. Two families are shown, with the offspring of each grouped as a single hexagon. The generation 1 (F1) family (00Sx00F) has 3 individuals, whose data are incomplete for some markers (blue border). The generation 2 (F2) family (001x002) has 13 offspring and reports both data incompleteness and inheritance errors of all 3 types (see panel B). B. Expanded view of the pedigree, showing individual offspring as hexagonal glyphs. In F1 only individual '00X' reports missing genotype data, whilst most F2 offspring report incompleteness (blue border). The three types of inheritance inconsistency are shown by combinations of (red) glyphs: 'nil from dam', a downward triangle (003,5,6,9); 'nil from sire', an upward triangle (004,5,7,9); 'novel allele', a central rectangle (005,6,7,8). The masking operation for data cleaning is also shown: the blue hatching on '007' and '015' flags masking of all their genotype data; a blue upward triangle indicates a broken paternity relationship ('002' and '013'), and a blue downward triangle a broken maternity relationship ('014' and '002'). Where '002' appears as a parent, a 'broken-link' icon flags that this individual has been 'orphaned'. C. The data in B after recalculation of errors. Note that the pedigree has altered because of the masked relationships, and new glyphs are introduced for 'unknown' parents. The masked individual '007' now reports no inconsistencies (because inference cannot introduce errors).