Global Variables {GeneR} | R Documentation
There are two ways to store sequences in GeneR:
- In a C-implemented class (buffers) that also stores some global variables, such as the working strand, the size of the original sequence, and so on. This is useful when, for example, we have to work on a subset of a whole chromosome (e.g. a gene). In that case it is worthwhile to load only the gene into R, while it remains easy to map positions on the gene to positions on the chromosome.
- As a character string, the more natural way to store short sequences such as "ATGTCGTG". This concerns all the "strxxx" functions (strComp, strReadFasta, etc.).
When GeneR loads a subset of a larger sequence stored in a bank file, it stores the following information in the C buffers (100 buffers by default, which can be extended if necessary):
- the subsequence itself (i.e. the succession of A, T, G, C);
- the positions of the extremities of the subsequence in the master sequence;
- the size of the whole sequence in the bank file;
- the name of the sequence.
For specific purposes such as renaming a sequence, all these variables can be viewed and carefully changed at any time (for renaming, with the functions getAccn and setAccn).
Several sequences can be stored simultaneously and accessed by their buffer number.
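The information kept for each buffer can be pictured as a plain R list. This is a minimal sketch under our own assumptions: `make_buffer()` and its field names are hypothetical and do not describe GeneR's internal C structure or the exact output of getParam().

```r
## Hypothetical sketch: a GeneR-like buffer modelled as a plain R list.
## make_buffer() and the field names are illustrative, not GeneR's API.
make_buffer <- function(seq, begin, size, accn) {
  list(
    seq   = seq,                     # the subsequence (succession of A, T, G, C)
    begin = begin,                   # first position in the master sequence
    end   = begin + nchar(seq) - 1,  # last position in the master sequence
    size  = size,                    # size of the whole sequence in the bank file
    accn  = accn                     # name of the sequence
  )
}

## Several buffers kept side by side and addressed by number
## (100 slots, mirroring the default number of buffers).
buffers <- vector("list", 100)
buffers[[1]] <- make_buffer("ATGTGTCGTA", begin = 11, size = 40, accn = "seq1")

## Renaming changes only the name; positions and size are untouched.
buffers[[1]]$accn <- "gene1"
str(buffers[[1]])
```

Keeping the limits next to the subsequence is what lets positions on the loaded piece be mapped back to the master sequence.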
The strand is another global variable, which can be viewed and set with the functions getStrand and setStrand. Many functions use it as an input parameter to analyze the complementary strand. It was designed to avoid explicitly computing the complement of the loaded strand and storing it in a buffer, which would lose the information linked to the master sequence.
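To see what the strand variable saves, here is the explicit alternative in base R: complementing and reversing the loaded subsequence. `rev_comp()` is a hypothetical helper, not a GeneR function; the point is that the resulting string no longer carries its position in the master sequence, which then has to be tracked by hand.

```r
## Hypothetical helper (not part of GeneR): reverse complement in base R.
rev_comp <- function(s) {
  comp <- chartr("ACGT", "TGCA", s)                  # complement each base
  paste(rev(strsplit(comp, "")[[1]]), collapse = "") # then reverse the order
}

subseq <- "ATGTGTCGTA"    # positions 11..20 of the master sequence (forward)
rc     <- rev_comp(subseq) # "TACGACACAT"

## Stored alone, rc forgets where it came from. Recovering the master
## position of each base requires carrying the original end along by hand:
end <- 20
master_pos <- end - seq_len(nchar(rc)) + 1  # 20, 19, ..., 11
```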
We have defined three types of addresses on a subsequence extracted from a master sequence:
- Absolute addresses (A): addresses on the master sequence, counted from the 5' end of the input strand, referred to as forward.
- Real addresses (R): addresses on the master sequence, counted from the 5' end of the working strand.
- Relative addresses (T): addresses on the working subsequence, counted from the 5' end of the working strand.
For example, if we read positions 11 to 20 of a gene of size 40, the three types of addresses are:
```
Strand 0 (forward strand)
1         11       20                  40   Absolute (A)
1         11       20                  40   Real (R)
          1        10                       Relative (T)
xxxxxxxxxxATGTGTCGTAxxxxxxxxxxxxxxxxxxxx
          10       1                        Relative (T)
40        30       21                  1    Real (R)
1         11       20                  40   Absolute (A)
Strand 1 (reverse strand)
```
Obviously, when an entire sequence is stored, real and relative addresses will be the same.
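Reading the positions off the diagram gives the relationships below. This is our reading of the diagram rather than a formula stated by GeneR; $b$ and $e$ (first and last absolute positions of the loaded subsequence), $n$ (size of the master sequence) and $s$ (strand, 0 or 1) are our own notation.

```latex
R = \begin{cases} A & s = 0 \\ n - A + 1 & s = 1 \end{cases}
\qquad
T = \begin{cases} A - b + 1 & s = 0 \\ e - A + 1 & s = 1 \end{cases}

% Check with b = 11, e = 20, n = 40, s = 1:
%   A = 11:  R = 40 - 11 + 1 = 30,  T = 20 - 11 + 1 = 10
%   A = 20:  R = 40 - 20 + 1 = 21,  T = 20 - 20 + 1 = 1
% Whole sequence stored (b = 1, e = n):
%   s = 0:  T = A - 1 + 1 = A = R
%   s = 1:  T = n - A + 1 = R
```

The last two lines show why real and relative addresses coincide when the entire sequence is loaded.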
All functions that use positions take and return absolute addresses; six functions convert between the R, A and T types (RtoA, RtoT, AtoR, AtoT, TtoR, TtoA).
The conversions depend on the global variable strand (see setStrand and getStrand).
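The six conversions can be sketched in base R as follows, assuming the relationships shown in the diagram above (strand 0: R = A and T = A - begin + 1; strand 1: R = size - A + 1 and T = end - A + 1). The lowercase function names and the explicit begin/end/size/strand arguments are hypothetical; GeneR's own RtoA, AtoT, etc. presumably take this context from the buffers and the global strand instead.

```r
## Hypothetical base-R sketch of the six conversions, following the diagram.
## Names and arguments are illustrative, not GeneR's API.
a_to_r <- function(x, size, strand) if (strand == 0) x else size - x + 1
r_to_a <- function(x, size, strand) if (strand == 0) x else size - x + 1
a_to_t <- function(x, begin, end, strand) if (strand == 0) x - begin + 1 else end - x + 1
t_to_a <- function(x, begin, end, strand) if (strand == 0) x + begin - 1 else end - x + 1

## R <-> T go through absolute addresses.
r_to_t <- function(x, begin, end, size, strand) a_to_t(r_to_a(x, size, strand), begin, end, strand)
t_to_r <- function(x, begin, end, size, strand) a_to_r(t_to_a(x, begin, end, strand), size, strand)

## Values from the diagram: positions 11..20 of a sequence of size 40, strand 1.
a_to_r(c(1, 11, 20, 40), size = 40, strand = 1)                  # returns 40 30 21 1
a_to_t(c(11, 20), begin = 11, end = 20, strand = 1)              # returns 10 1
r_to_t(c(30, 21), begin = 11, end = 20, size = 40, strand = 1)   # returns 10 1
```

Routing R to T through absolute addresses keeps only two pairs of formulas to maintain.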
See Also: AtoT, AtoR, RtoA, RtoT, TtoA, TtoR, setStrand, getStrand, getParam, setParam, getAccn, setAccn
Examples:

```r
## Make a dummy sequence
s <- "xxxxxxxxxxATGTGTCGTAxxxxxxxxxxxxxxxxxxxx"
## Put it in a buffer and save it as an indexed FASTA file
placeString(s)
writeFasta(file="toto.fa")
indexFasta("toto.fa")
## Load only positions 11 to 20
readFasta("toto.fa",from=11,to=20)
getParam()
## $begin
## [1] 11
## $size
## [1] 40
## $strand
## [1] 0
## [...]
## With strand = 0
TtoA(c(1,10))
##[1] 10 19
TtoR(c(1,10))
##[1] 10 19
```