makeGRangesFromDataFrame {GenomicRanges}    R Documentation

Make a GRanges object from a data.frame or DataFrame

Description

makeGRangesFromDataFrame finds the fields in the input that describe genomic ranges and returns them as a GRanges object.

For convenience, coercing a data.frame or DataFrame df into a GRanges object (e.g. with as(df, "GRanges")) is also supported; it is equivalent to makeGRangesFromDataFrame(df, keep.extra.columns=TRUE).

Usage

makeGRangesFromDataFrame(df,
    keep.extra.columns=FALSE,
    ignore.strand=FALSE,
    seqinfo=NULL,
    seqnames.field=c("seqnames", "chr", "chrom"),
    start.field=c("start", "chromStart"),
    end.field=c("end", "chromEnd", "stop", "chromStop"),
    strand.field="strand",
    starts.in.df.are.0based=FALSE)

Arguments

df

A data.frame or DataFrame object.

keep.extra.columns

TRUE or FALSE (the default). If TRUE, then the columns in df that are not used to form the genomic ranges are stored as metadata columns in the returned GRanges object. Otherwise, they are ignored.

ignore.strand

TRUE or FALSE (the default). If TRUE, then the strand of the returned GRanges object will be set to "*".

seqinfo

Either NULL, or a Seqinfo object, or a character vector of seqlevels, or a named numeric vector of sequence lengths. When not NULL, it must be compatible with the genomic ranges in df, i.e. it must include at least the sequence levels represented in df.

seqnames.field

A character vector of recognized names for the column in df that contains the chromosome name (a.k.a. sequence name) associated with each genomic range. Only the first name in seqnames.field that is found in colnames(df) will be used. If none is found, an error is raised.

start.field

A character vector of recognized names for the column in df that contains the start positions of the genomic ranges. Only the first name in start.field that is found in colnames(df) will be used. If none is found, an error is raised.

end.field

A character vector of recognized names for the column in df that contains the end positions of the genomic ranges. Only the first name in end.field that is found in colnames(df) will be used. If none is found, an error is raised.

strand.field

A character vector of recognized names for the column in df that contains the strand associated with each genomic range. Only the first name in strand.field that is found in colnames(df) will be used. If none is found, or if ignore.strand is TRUE, the strand of the returned GRanges object will be set to "*".

starts.in.df.are.0based

TRUE or FALSE (the default). If TRUE, then the start positions of the genomic ranges in df are considered to be 0-based and are converted to 1-based in the returned GRanges object. This feature is intended to make it more convenient to handle input that contains data obtained from resources using the "0-based start" convention. A notorious example of such a resource is the UCSC Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables).
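The column lookup performed for seqnames.field, start.field and end.field, together with the 0-based start conversion, can be approximated in base R. This is a minimal sketch, not the package's implementation: the helpers find_field() and df_to_ranges() are hypothetical, strand handling is left out, and the result is a plain data.frame rather than a GRanges object so that it runs without Bioconductor.

```r
## Hypothetical helper (not part of GenomicRanges): return the first name
## in `candidates` (in the order given) that is found in colnames(df),
## or raise an error if none is found.
find_field <- function(df, candidates, what) {
    hits <- candidates[candidates %in% colnames(df)]
    if (length(hits) == 0L)
        stop("cannot find a ", what, " column; tried: ",
             paste(candidates, collapse=", "))
    hits[1L]
}

## Hypothetical sketch of the range-extraction step. It returns a plain
## data.frame of 1-based closed ranges instead of a GRanges object.
df_to_ranges <- function(df,
                         seqnames.field=c("seqnames", "chr", "chrom"),
                         start.field=c("start", "chromStart"),
                         end.field=c("end", "chromEnd", "stop", "chromStop"),
                         starts.in.df.are.0based=FALSE) {
    seqnames_col <- find_field(df, seqnames.field, "seqnames")
    start_col <- find_field(df, start.field, "start")
    end_col <- find_field(df, end.field, "end")
    start <- df[[start_col]]
    ## A 0-based start becomes 1-based by adding 1; the end position
    ## is the same under both conventions.
    if (starts.in.df.are.0based)
        start <- start + 1L
    data.frame(seqnames=as.character(df[[seqnames_col]]),
               start=start, end=df[[end_col]],
               stringsAsFactors=FALSE)
}

## UCSC-style column names with 0-based starts (made-up values)
ucsc <- data.frame(chrom="chr1", chromStart=c(10L, 20L), chromEnd=c(15L, 30L))
res <- df_to_ranges(ucsc, starts.in.df.are.0based=TRUE)
res$start  # 11 21
```

Here "chrom", "chromStart" and "chromEnd" are picked up because they are the first recognized names present in the input.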

Value

A GRanges object with one element per row in the input.

If the seqinfo argument was supplied, the returned object will have exactly the seqlevels specified in seqinfo and in the same order.
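By analogy with the levels of an R factor (an analogy only, not how Seqinfo objects are implemented), the compatibility requirement on seqinfo and the ordering guarantee can be sketched in base R. The helper check_seqlevels() is hypothetical.

```r
## Hypothetical helper: encode chromosome names as a factor whose levels
## are exactly `seqlevels`, in the order given. Names absent from
## `seqlevels` trigger an error, mirroring the requirement that seqinfo
## include at least the sequence levels represented in df.
check_seqlevels <- function(chroms, seqlevels) {
    missing <- setdiff(unique(as.character(chroms)), seqlevels)
    if (length(missing) > 0L)
        stop("seqlevels not in seqinfo: ", paste(missing, collapse=", "))
    factor(chroms, levels=seqlevels)
}

f <- check_seqlevels(c("chr1", "chr1", "chr1"), paste0("chr", 4:1))
levels(f)  # "chr4" "chr3" "chr2" "chr1": given order, unused levels kept
```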

If df has non-automatic row names (i.e. rownames(df) is neither NULL nor seq_len(nrow(df))), then they will be used to set the names of the returned GRanges object.
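The row-name rule can be written as a small predicate. The sketch below uses a hypothetical has_custom_rownames() that follows the wording above literally; since rownames() on a data.frame returns a character vector, the automatic sequence is compared as character. It is an illustration, not the test the package itself performs.

```r
## Hypothetical predicate: TRUE when df has row names other than NULL
## or the automatic "1", "2", ..., "n" sequence.
has_custom_rownames <- function(df) {
    rn <- rownames(df)
    !is.null(rn) && !identical(rn, as.character(seq_len(nrow(df))))
}

df1 <- data.frame(score=1:3)
df2 <- data.frame(score=1:3, row.names=c("a", "b", "c"))
has_custom_rownames(df1)  # FALSE: automatic row names
has_custom_rownames(df2)  # TRUE: "a", "b", "c" would become the names
```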

Author(s)

H. Pages, based on a proposal by Kasper Daniel Hansen

See Also

Examples

df <- data.frame(chr="chr1", start=11:13, end=12:14,
                 strand=c("+","-","+"), score=1:3)

makeGRangesFromDataFrame(df)
gr <- makeGRangesFromDataFrame(df, keep.extra.columns=TRUE)
gr2 <- as(df, "GRanges")  # equivalent to the above
stopifnot(identical(gr, gr2))

makeGRangesFromDataFrame(df, ignore.strand=TRUE)
makeGRangesFromDataFrame(df, keep.extra.columns=TRUE,
                             ignore.strand=TRUE)

makeGRangesFromDataFrame(df, seqinfo=paste0("chr", 4:1))
makeGRangesFromDataFrame(df, seqinfo=c(chrM=NA, chr1=500, chrX=100))
makeGRangesFromDataFrame(df, seqinfo=Seqinfo(paste0("chr", 4:1)))

if (require(rtracklayer)) {
  session <- browserSession()
  genome(session) <- "sacCer2"
  query <- ucscTableQuery(session, "Most Conserved")
  df <- getTable(query)

  ## A common pitfall is to forget that the UCSC Table Browser uses the
  ## "0-based start" convention:
  gr0 <- makeGRangesFromDataFrame(df, keep.extra.columns=TRUE)
  head(gr0)
  min(start(gr0))

  ## The start positions need to be converted into 1-based positions,
  ## to adhere to the convention used in Bioconductor:
  gr1 <- makeGRangesFromDataFrame(df, keep.extra.columns=TRUE,
                                  starts.in.df.are.0based=TRUE)
  head(gr1)
}

[Package GenomicRanges version 1.16.3 Index]