nearest-methods {GenomicRanges}    R Documentation

Finding the nearest genomic range neighbor

Description

The nearest, precede, follow, distance and distanceToNearest methods for GenomicRanges objects and subclasses.

Usage

## S4 method for signature 'GenomicRanges,GenomicRanges'
precede(x, subject, select=c("arbitrary", "all"), ignore.strand=FALSE)
## S4 method for signature 'GenomicRanges,missing'
precede(x, subject, select=c("arbitrary", "all"), ignore.strand=FALSE)

## S4 method for signature 'GenomicRanges,GenomicRanges'
follow(x, subject, select=c("arbitrary", "all"), ignore.strand=FALSE)
## S4 method for signature 'GenomicRanges,missing'
follow(x, subject, select=c("arbitrary", "all"), ignore.strand=FALSE)

## S4 method for signature 'GenomicRanges,GenomicRanges'
nearest(x, subject, select=c("arbitrary", "all"),
        algorithm=c("nclist", "intervaltree"), ignore.strand=FALSE)
## S4 method for signature 'GenomicRanges,missing'
nearest(x, subject, select=c("arbitrary", "all"),
        algorithm=c("nclist", "intervaltree"), ignore.strand=FALSE)

## S4 method for signature 'GenomicRanges,GenomicRanges'
distanceToNearest(x, subject, algorithm=c("nclist", "intervaltree"),
                  ignore.strand=FALSE, ...)
## S4 method for signature 'GenomicRanges,missing'
distanceToNearest(x, subject, algorithm=c("nclist", "intervaltree"),
                  ignore.strand=FALSE, ...)

## S4 method for signature 'GenomicRanges,GenomicRanges'
distance(x, y, ignore.strand=FALSE, ...)

Arguments

x

The query GenomicRanges instance.

subject

The subject GenomicRanges instance within which the nearest neighbors are found. Can be missing, in which case x is also the subject.

y

For the distance method, a GRanges instance. Cannot be missing. If x and y are not the same length, the shorter will be recycled to match the length of the longer.

select

Logic for handling ties. By default, all methods select a single interval (arbitrary for nearest, the first by order in subject for precede, and the last for follow).

When select="all", a Hits object is returned with all matches for x. If an element of x has no match in subject, that element is not included in the Hits object.

ignore.strand

A logical indicating whether the strand of the input ranges should be ignored. When TRUE, the strand of all ranges is set to '+'.

algorithm

This argument is passed to findOverlaps, which nearest and distanceToNearest use internally. See ?findOverlaps for more information. Note that it will be removed in BioC 3.3 so please don't use it unless you have a good reason to do so (e.g. troubleshooting).

...

Additional arguments for methods.
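
As a minimal sketch of how select and ignore.strand behave, the snippet below uses toy ranges that are not part of the original examples (the object names x, subject and opp are illustrative). It assumes the semantics described above: the default select returns one index per element of x, with NA where nothing matches, and select="all" returns a Hits object holding every tie.

```r
library(GenomicRanges)

## Toy data (illustrative only): two '+' queries and three '+' subjects,
## two of which tie at position 10.
x <- GRanges("A", IRanges(c(5, 30), width=1), strand="+")
subject <- GRanges("A", IRanges(c(10, 10, 20), width=1), strand="+")

## Default select: one subject index per element of 'x'.
## The second query has no subject range after it, so its entry is NA.
first <- precede(x, subject)

## select="all": a Hits object holding every tied match;
## queries without a match are dropped from the result.
all_hits <- precede(x, subject, select="all")

## Ranges on opposite strands are not matched unless ignore.strand=TRUE.
opp <- GRanges("A", IRanges(10, width=1), strand="-")
stranded <- precede(x[1], opp)
unstranded <- precede(x[1], opp, ignore.strand=TRUE)
```

Because unmatched queries are absent from the Hits object, length(all_hits) need not equal length(x); the integer vector from the default select keeps one slot per query instead.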

Details

Value

For nearest, precede and follow, an integer vector of indices in subject, or a Hits object if select="all".

For distanceToNearest, a Hits object with a column for the query index (queryHits), subject index (subjectHits) and the distance between the pair.

For distance, an integer vector of distances between the ranges in x and y.
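
The return values can be unpacked as in the sketch below. The ranges are toy data chosen for illustration, and the snippet assumes that distanceToNearest stores the distance in a metadata column named distance, retrievable with mcols().

```r
library(GenomicRanges)

## Toy data (illustrative only).
x <- GRanges("A", IRanges(c(1, 20), width=5))   # ranges 1-5 and 20-24
y <- GRanges("A", IRanges(c(8, 40), width=5))   # ranges 8-12 and 40-44

## Element-wise distances: x[1] vs y[1] and x[2] vs y[2].
## The distance counts the positions in the gap between two ranges.
d <- distance(x, y)

## For each query, the nearest subject and the distance to it.
hits <- distanceToNearest(x, y)
q <- queryHits(hits)         # indices into x
s <- subjectHits(hits)       # indices into y
dn <- mcols(hits)$distance   # assumed name of the distance column
```

Note the difference in shape: distance() pairs x and y element by element, while distanceToNearest() searches all of y for each element of x, so both queries here resolve to the first subject range.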

Author(s)

P. Aboyoun and V. Obenchain <vobencha@fhcrc.org>

See Also

Examples

  ## -----------------------------------------------------------
  ## precede() and follow()
  ## -----------------------------------------------------------
  query <- GRanges("A", IRanges(c(5, 20), width=1), strand="+")
  subject <- GRanges("A", IRanges(rep(c(10, 15), 2), width=1),
                          strand=c("+", "+", "-", "-"))
  precede(query, subject)
  follow(query, subject)
 
  strand(query) <- "-"
  precede(query, subject)
  follow(query, subject)
 
  ## ties choose first in order
  query <- GRanges("A", IRanges(10, width=1), c("+", "-", "*"))
  subject <- GRanges("A", IRanges(c(5, 5, 5, 15, 15, 15), width=1),
                          rep(c("+", "-", "*"), 2))
  precede(query, subject)
  precede(query, rev(subject))
 
  ## ignore.strand=TRUE treats all ranges as '+'
  precede(query[1], subject[4:6], select="all", ignore.strand=FALSE)
  precede(query[1], subject[4:6], select="all", ignore.strand=TRUE)


  ## -----------------------------------------------------------
  ## nearest()
  ## -----------------------------------------------------------
  ## When multiple ranges overlap, an "arbitrary" range is chosen
  query <- GRanges("A", IRanges(5, 15))
  subject <- GRanges("A", IRanges(c(1, 15), c(5, 19)))
  nearest(query, subject)
 
  ## select="all" returns all hits
  nearest(query, subject, select="all")
 
  ## Ranges in 'x' will self-select when 'subject' is present
  query <- GRanges("A", IRanges(c(1, 10), width=5))
  nearest(query, query)
 
  ## Ranges in 'x' will not self-select when 'subject' is missing
  nearest(query)

  ## -----------------------------------------------------------
  ## distance(), distanceToNearest()
  ## -----------------------------------------------------------
  ## Adjacent, overlap, separated by 1
  query <- GRanges("A", IRanges(c(1, 2, 10), c(5, 8, 11)))
  subject <- GRanges("A", IRanges(c(6, 5, 13), c(10, 10, 15)))
  distance(query, subject)

  ## recycling
  distance(query[1], subject)

  ## zero-width ranges
  zw <- GRanges("A", IRanges(4,3))
  stopifnot(distance(zw, GRanges("A", IRanges(3,4))) == 0L)
  sapply(-3:3, function(i) 
      distance(shift(zw, i), GRanges("A", IRanges(4,3))))

  query <- GRanges(c("A", "B"), IRanges(c(1, 5), width=1))
  distanceToNearest(query, subject)

  ## For distance() with GRanges and TxDb objects, see the
  ## ?'distance,GenomicRanges,TxDb-method' man
  ## page in the GenomicFeatures package.

[Package GenomicRanges version 1.22.1 Index]