arXiv:math/0505527v1 [math.ST] 25 May 2005
zoo: An S3 Class and Methods for Indexed
Totally Ordered Observations
Achim Zeileis
Wirtschaftsuniversität Wien
Gabor Grothendieck
Abstract
A previous version of this introduction to the R package zoo has been published as ? in
the Journal of Statistical Software.
zoo is an R package providing an S3 class with methods for indexed totally ordered ob-
servations, such as discrete irregular time series. Its key design goals are independence of a
particular index/time/date class and consistency with base R and the "ts" class for regular
time series. This paper describes how these are achieved within zoo and provides several
illustrations of the available methods for "zoo" objects which include plotting, merging and
binding, several mathematical operations, extracting and replacing data and index, coercion
and NA handling. A subclass "zooreg" embeds regular time series into the "zoo" framework
and thus bridges the gap between regular and irregular time series classes in R.
Keywords: totally ordered observations, irregular time series, regular time series, S3, R.
1. Introduction
The R system for statistical computing (R Development Core Team 2005, http://www.R-project.org/)
ships with a class for regularly spaced time series, "ts" in package stats, but has no native class
for irregularly spaced time series. With the increased interest in computational finance with R
over the last years, several implementations of classes for irregular time series emerged which are
aimed particularly at finance applications. These include the S3 classes "timeSeries" in package
fCalendar from the Rmetrics bundle (Wuertz 2005) and "irts" in package tseries (Trapletti 2005)
and the S4 class "its" in package its (Heywood 2004). With these packages available, why would
anybody want yet another package providing infrastructure for irregular time series? The above
mentioned implementations have in common that they are restricted to a particular class for the
time scale: the first implementation comes with its own time class "timeDate" built on top
of the "POSIXt" classes available in base R whereas the latter two use "POSIXct" directly. And
this was the starting point for the zoo project: the first author of the present paper needed more
general support for ordered observations, independent of a particular index class, for the package
strucchange (Zeileis, Leisch, Hornik, and Kleiber 2002). Hence, the package was called zoo which
stands for Z's ordered observations. Since the first release, a major part of the additions to zoo
was provided by the second author of this paper, so that the name of the package does not really
reflect the authorship anymore. Nevertheless, independence of a particular index class remained
the most important design goal. While the package evolved to its current status, a second key
design goal became more and more clear: to provide methods to standard generic functions for the
"zoo" class that are similar to those for the "ts" class (and base R in general) such that the usage
of zoo is very intuitive because few additional commands have to be learned. This paper describes
how these design goals are implemented in zoo. The resulting package provides the "zoo" class
which offers an extensive (and still growing) set of standard and new methods for working with
indexed observations and 'talks' to the classes "ts", "its", "irts" and "timeSeries". It also
bridges the gap between regular and irregular time series by providing coercion with (virtually) no
loss of information between "ts" and "zoo". With these tools zoo provides the basic infrastructure
for working with indexed totally ordered observations and the package can be either employed by
users directly or can be a basic ingredient on top of which other more specialized applications can
be built.
The remainder of the paper is organized as follows: Section 2 explains how "zoo" objects are
created and illustrates how the corresponding methods for plotting, merging and binding, several
mathematical operations, extracting and replacing data and index, coercion and NA handling can
be used. Section 3 outlines how other packages can build on this basic infrastructure. Section 4
gives a few summarizing remarks and an outlook on future developments. Finally, an appendix
provides a reference card that gives an overview of the functionality contained in zoo.
2. The class "zoo" and its methods
This section describes how "zoo" series can be created and subsequently manipulated, visualized,
combined or coerced to other classes. In Section 2.1, the general class "zoo" for totally ordered
series is described. Subsequently, in Section 2.2, the subclass "zooreg" for regular "zoo" series,
i.e., series which have an index with a specified frequency, is discussed. The methods illustrated in
the remainder of the section are mostly the same for both "zoo" and "zooreg" objects and hence
do not have to be discussed separately. The few differences in merging and binding are briefly
highlighted in Section 2.4.
2.1. Creation of "zoo" objects
The simple idea for the creation of "zoo" objects is to have some vector or matrix of observations
x which are totally ordered by some index vector. In time series applications, this index is a
measure of time but every other numeric, character or even more abstract vector that provides
a total ordering of the observations is also suitable. Objects of class "zoo" are created by the
function
zoo(x, order.by)
where x is the vector or matrix of observations¹ and order.by is the index by which the observations
should be ordered. It has to be of the same length as NROW(x), i.e., either the same length
as x for vectors or the same number of rows for matrices.² The "zoo" object created is essentially
the vector/matrix as before but has an additional "index" attribute in which the index is stored.³
Both the observations in the vector/matrix x and the index order.by can, in principle, be of
arbitrary classes. However, most of the following methods (plotting, aggregating, mathematical
operations) for "zoo" objects are typically only useful for numeric observations x. Special effort
in the design was put into independence from a particular class for the index vector. In zoo, it is
assumed that combination c(), querying the length(), value matching MATCH(), subsetting [,
and, of course, ordering ORDER() work when applied to the index. In addition, an as.character()
method might improve printed output⁴ and as.numeric() could be used for computing distances
between indexes, e.g., in interpolation. Both methods are not necessary for working with "zoo"
objects but could be used if available. All these methods are available, e.g., for standard numeric
and character vectors and for vectors of classes "Date", "POSIXct" or "times" from package
chron, but not for the class "timeDate" in fCalendar. In the last case, the solution is to provide
methods for the above mentioned functions so that indexing "zoo" objects with "timeDate"
vectors works (see Section 3.3 for an example). To achieve this independence of the index class,
new generic functions for ordering (ORDER()) and value matching (MATCH()) are introduced as the
corresponding base functions order() and match() are non-generic. The default methods simply
call the corresponding base functions, i.e., no new method needs to be introduced for a particular
index class if the non-generic functions order() and match() work for this class.
¹ In principle, more general objects can be indexed, but currently zoo does not support this. Development plans
are that zoo should eventually support indexed factors, data frames and lists.
² The only case where this restriction is not imposed is for zero-length vectors, i.e., vectors that only have an
index but no data.
³ There is some limited support for indexed factors available in which case the "zoo" object also has an attribute
"oclass" with the original class of x. This feature is still under development and might change in future versions.
⁴ If an as.character() method is already defined, but does not give the desired output for printing, then an
index2char() method can be defined. This is a generic convenience function used for creating character representations
of the index vector and it defaults to using as.character().
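As a rough illustration of this index independence, the following sketch builds "zoo" series on a plain numeric and on a character index. The object names (obs, z_num, z_chr) and the values are made up for this example; only zoo(), index() and coredata() from the package are used. In both cases the observations end up sorted by the index.

```r
library(zoo)

# Three observations supplied in arbitrary order (illustrative values)
obs <- c(10, 20, 30)

# Numeric index: the series is ordered by the index, not by input position
z_num <- zoo(obs, order.by = c(3, 1, 2))

# Character index: a class for which ordering and matching work is sufficient
z_chr <- zoo(obs, order.by = c("c", "a", "b"))

index(z_num)     # sorted index
coredata(z_num)  # observations rearranged accordingly
index(z_chr)
```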
To illustrate the usage of zoo(), we first load the package and set the random seed to make the
examples in this paper exactly reproducible.
R> library(zoo)
R> set.seed(1071)
Then, we create two vectors z1 and z2 with "POSIXct" indexes, one with random observations
R> z1.index <- ISOdatetime(2004, rep(1:2, 5), sample(28, 10), 0,
+ 0, 0)
R> z1.data <- rnorm(10)
R> z1 <- zoo(z1.data, z1.index)
and one with a sine wave
R> z2.index <- as.POSIXct(paste(2004, rep(1:2, 5), sample(1:28,
+ 10), sep = "-"))
R> z2.data <- sin(2 * 1:10/pi)
R> z2 <- zoo(z2.data, z2.index)
Furthermore, we create a matrix Z with random observations and a "Date" index
R> Z.index <- as.Date(sample(12450:12500, 10))
R> Z.data <- matrix(rnorm(30), ncol = 3)
R> colnames(Z.data) <- c("Aa", "Bb", "Cc")
R> Z <- zoo(Z.data, Z.index)
In the examples above, the generation of indexes looks a bit awkward due to the fact that the indexes
need to be randomly generated (and there are no special functions for random indexes because
these are rarely needed in practice). In "real world" applications, the indexes are typically part of
the raw data set read into R so the code would be even simpler. See Section 3 for such examples.⁵
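A minimal sketch of that "real world" case, assuming a hypothetical data frame raw with a character date column and a price column (both names and all values are invented here): the index is simply taken from the data.

```r
library(zoo)

# Hypothetical raw data, e.g. as returned by read.csv(); rows need not be sorted
raw <- data.frame(
  date  = c("2004-01-05", "2004-01-03", "2004-01-04"),
  price = c(10.5, 10.1, 10.3),
  stringsAsFactors = FALSE
)

# Convert the date column to class "Date" and use it as the index
prices <- zoo(raw$price, order.by = as.Date(raw$date))
prices
```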
Methods to several standard generic functions are available for "zoo" objects, such as print,
summary, str, head, tail and [ (subsetting), a few of which are illustrated in the following.
There are three printing styles for "zoo" objects: vectors are by default printed in "horizontal"
style
R> z1
2004-01-05 2004-01-14 2004-01-19 2004-01-25 2004-01-27 2004-02-07
0.74675994 0.02107873 -0.29823529 0.68625772 1.94078850 1.27384445
2004-02-12 2004-02-16 2004-02-20 2004-02-24
0.22170438 -2.07607585 -1.78439244 -0.19533304
R> z1[3:7]
2004-01-19 2004-01-25 2004-01-27 2004-02-07 2004-02-12
-0.2982353 0.6862577 1.9407885 1.2738445 0.2217044
⁵ Note that in the code above a new as.Date method, provided in zoo, is used to convert days since 1970-01-01
to class "Date". See the respective help page for more details.
and matrices in "vertical" style
R> Z
Aa Bb Cc
2004-02-02 1.25543390 0.68157316 -0.63292049
2004-02-08 -1.49458326 1.32341223 -1.49442269
2004-02-09 -1.87462247 -0.87329289 0.62733971
2004-02-21 -0.14538608 0.45234903 -0.14597401
2004-02-22 0.22542418 0.53838938 0.23136133
2004-02-29 1.20695518 0.31814222 -0.01129202
2004-03-05 -1.20861025 1.42379785 -0.81614483
2004-03-10 -0.11039563 1.34774254 0.95522468
2004-03-14 0.84202385 -2.73842019 0.23150695
2004-03-20 -0.19019104 0.12308872 -1.51862157
R> Z[1:3, 2:3]
Bb Cc
2004-02-02 0.6815732 -0.6329205
2004-02-08 1.3234122 -1.4944227
2004-02-09 -0.8732929 0.6273397
Additionally, there is a "plain" style which simply first prints the data and then the index.
Above, we have illustrated that "zoo" series can be indexed like vectors or matrices respectively,
i.e., with integers corresponding to their observation number (and column number). But for indexed
observations, one would obviously also like to be able to index with the index class. This is
also available in [ which only uses vector/matrix-type subsetting if its first argument is of class
"numeric", "integer" or "logical".
R> z1[ISOdatetime(2004, 1, c(14, 25), 0, 0, 0)]
2004-01-14 2004-01-25
0.02107873 0.68625772
If the index class happens to be "numeric", the index has to be either insulated in I() like z[I(i)]
or the window() method can be used (see Section 2.6).
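The difference between positional subsetting and subsetting by a numeric index can be sketched as follows. The series z is a made-up example whose index values deliberately differ from the observation numbers.

```r
library(zoo)

# Numeric index 2, 4, 6 (illustrative), so position and index value differ
z <- zoo(c(5, 6, 7), order.by = c(2, 4, 6))

by_position <- z[2]     # second observation (index value 4)
by_index    <- z[I(2)]  # observation whose index equals 2
by_window   <- window(z, start = 2, end = 2)  # same selection via window()
```

Without I(), a plain number is always read as an observation number, which is why the index value has to be protected explicitly.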
Summaries and most other methods for "zoo" objects are carried out column-wise, reflecting the
rectangular structure. In addition, a summary of the index is provided.
R> summary(z1)
Index z1
Min. :2004-01-05 00:00:00 Min. :-2.07608
1st Qu.:2004-01-20 12:00:00 1st Qu.:-0.27251
Median :2004-02-01 12:00:00 Median : 0.12139
Mean :2004-02-01 09:36:00 Mean : 0.05364
3rd Qu.:2004-02-15 00:00:00 3rd Qu.: 0.73163
Max. :2004-02-24 00:00:00 Max. : 1.94079
R> summary(Z)
Index Aa Bb Cc
Min. :2004-02-02 Min. :-1.8746 Min. :-2.7384 Min. :-1.51862
1st Qu.:2004-02-12 1st Qu.:-0.9540 1st Qu.: 0.1719 1st Qu.:-0.77034
Median :2004-02-25 Median :-0.1279 Median : 0.4954 Median :-0.07863
Mean :2004-02-25 Mean :-0.1494 Mean : 0.2597 Mean :-0.25739
3rd Qu.:2004-03-08 3rd Qu.: 0.6879 3rd Qu.: 1.1630 3rd Qu.: 0.23147
Max. :2004-03-20 Max. : 1.2554 Max. : 1.4238 Max. : 0.95522
2.2. Creation of "zooreg" objects
Strictly regular series are series in which the distance between the indexes of every
two adjacent observations is the same. Such series can also be described by their frequency, i.e.,
the reciprocal value of the distance between two observations. As "zoo" can be used to store series
with arbitrary type of index, it can, of course, also be used to store series with regular indexes.
So why should this case be given special attention, in particular as there is already the "ts" class
devoted entirely to regular series? There are two reasons: First, to be able to convert back and
forth between "ts" and "zoo", the frequency of a certain series needs to be stored on the "zoo"
side. Second, "ts" is limited to strictly regular series and the regularity is lost if some internal
observations are omitted. Series that can be created by omitting some internal observations from
strictly regular series will in the following be referred to as being (weakly) regular. Therefore, a
class that bridges the gap between irregular and strictly regular series is needed and "zooreg"
fills this gap. Objects of class "zooreg" inherit from class "zoo" but have an additional attribute
"frequency" in which the frequency of the series is stored. Therefore, they can be employed to
represent both strictly and weakly regular series.
To create a "zooreg" object, either the command zoo() can be used or the command zooreg().
zoo(x, order.by, frequency)
zooreg(data, start, end, frequency, deltat, ts.eps, order.by)
If zoo() is called as in the previous section but with an additional frequency argument, it
is checked whether frequency complies with the index order.by: if it does, an object of class
"zooreg" inheriting from "zoo" is returned. The command zooreg() takes mostly the same
arguments as ts().⁶ In both cases, the index class is more restricted than in the plain "zoo" case.
The index must be of a class which can be coerced to "numeric" (for checking its regularity) and
when converted to numeric the index must be expressible as multiples of 1/frequency. Furthermore,
adding/subtracting a numeric to/from an observation of the index class should return the
correct value of the index class again, i.e., group generic functions Ops should be defined.⁷
The following calls yield equivalent series
R> zr1 <- zooreg(sin(1:9), start = 2000, frequency = 4)
R> zr2 <- zoo(sin(1:9), seq(2000, 2002, by = 1/4), 4)
R> zr1
2000(1) 2000(2) 2000(3) 2000(4) 2001(1) 2001(2) 2001(3)
0.8414710 0.9092974 0.1411200 -0.7568025 -0.9589243 -0.2794155 0.6569866
2001(4) 2002(1)
0.9893582 0.4121185
R> zr2
2000(1) 2000(2) 2000(3) 2000(4) 2001(1) 2001(2) 2001(3)
0.8414710 0.9092974 0.1411200 -0.7568025 -0.9589243 -0.2794155 0.6569866
2001(4) 2002(1)
0.9893582 0.4121185
to which methods to standard generic functions for regular series can be applied, such as
frequency, deltat, cycle.
⁶ zoo(x, order.by, frequency) is called only if order.by is specified in the zooreg() call.
⁷ An application of non-numeric indexes for regular series are the classes "yearmon" and "yearqtr" which are
designed for monthly and quarterly series respectively and are discussed in Section 3.4.
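A short sketch of these three generics, applied to a quarterly series constructed in the same way as zr1 (the name zq is chosen here only to avoid clashing with the objects used in the text):

```r
library(zoo)

# Quarterly series starting in 2000, constructed like zr1 in the text
zq <- zooreg(sin(1:9), start = 2000, frequency = 4)

frequency(zq)        # number of observations per unit of time
deltat(zq)           # spacing between observations, i.e. 1/frequency
coredata(cycle(zq))  # position of each observation within its cycle
```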
As stated above, the advantage of "zooreg" series is that they remain regular even if an internal
observation is dropped:
R> zr1 <- zr1[-c(3, 5)]
R> zr1
2000(1) 2000(2) 2000(4) 2001(2) 2001(3) 2001(4) 2002(1)
0.8414710 0.9092974 -0.7568025 -0.2794155 0.6569866 0.9893582 0.4121185
R> class(zr1)
[1] "zooreg" "zoo"
R> frequency(zr1)
[1] 4
This facilitates NA handling significantly compared to "ts" and makes "zooreg" a much more
attractive data type, e.g., for time series regression.
zooreg() can also deal with non-numeric indexes provided that adding "numeric" observations
to the index class preserves the class and does not coerce to "numeric".
R> zooreg(1:5, start = as.Date("2005-01-01"))
2005-01-01 2005-01-02 2005-01-03 2005-01-04 2005-01-05
1 2 3 4 5
To check whether a certain series is (strictly) regular, the new generic function is.regular(x,
strict = FALSE) can be used:
R> is.regular(zr1)
[1] TRUE
R> is.regular(zr1, strict = TRUE)
[1] FALSE
This function (and also frequency, deltat and cycle) also works for "zoo" objects if the
regularity can still be inferred from the data:
R> zr1 <- as.zoo(zr1)
R> zr1
2000 2000.25 2000.75 2001.25 2001.5 2001.75 2002
0.8414710 0.9092974 -0.7568025 -0.2794155 0.6569866 0.9893582 0.4121185
R> class(zr1)
[1] "zoo"
R> is.regular(zr1)
[1] TRUE
R> frequency(zr1)
[1] 4
Of course, inferring the underlying regularity is not always reliable and it is safer to store a regular
series as a "zooreg" object if it is intended to be a regular series.
If a weakly regular series is coerced to "ts" the missing observations are filled with NAs (see also
Section 2.8). For strictly regular series with numeric index, the class can be switched between
"zoo" and "ts" without loss of information.
R> as.ts(zr1)
Qtr1 Qtr2 Qtr3 Qtr4
2000 0.8414710 0.9092974 NA -0.7568025
2001 NA -0.2794155 0.6569866 0.9893582
2002 0.4121185
R> identical(zr2, as.zoo(as.ts(zr2)))
[1] TRUE
This enables direct use of functions such as acf, arima, stl etc. on "zooreg" objects as these
methods coerce to "ts" first. The result only has to be coerced back to "zoo", if appropriate.
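The coercion step can also be written out explicitly, as in the following sketch. The series zq and the derived objects are illustrative; acf() is called with plot = FALSE so that only the estimated autocorrelations are returned.

```r
library(zoo)

# Strictly regular quarterly series (illustrative data)
zq <- zooreg(sin(1:12), start = 2000, frequency = 4)

# Explicit coercion to "ts" before calling a function written for "ts"
acf_q <- acf(as.ts(zq), plot = FALSE)

# Dropping an internal observation keeps the series weakly regular;
# on coercion to "ts" the gap reappears as NA
zq_gap <- zq[-3]
ts_gap <- as.ts(zq_gap)
```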
2.3. Plotting
The plot method for "zoo" objects, in particular for multivariate "zoo" series, is based on the
corresponding method for (multivariate) regular time series. It relies on plot and lines methods
being available for the index class which can plot the index against the observations.
By default the plot method creates a panel for each series
R> plot(Z)
but can also display all series in a single panel
R> plot(Z, plot.type = "single", col = 2:4)
In both cases additional graphical parameters like color col, plotting character pch and line type
lty can be expanded to the number of series. But the plot method for "zoo" objects offers some
more flexibility in specification of graphical parameters as in
R> plot(Z, type = "b", lty = 1:3, pch = list(Aa = 1:5, Bb = 2, Cc = 4),
+ col = list(Bb = 2, 4))
The argument lty behaves as before and sets every series in another line type. The pch argument
is a named list that assigns to each series a different vector of plotting characters each of which
is expanded to the number of observations. Such a list does not necessarily have to include the
names of all series, but can also specify a subset. For the remaining series the default parameter
is then used which can again be changed: e.g., in the above example the col argument is set to
display the series "Bb" in red and all remaining series in blue. The results of the multiple panel
plots are depicted in Figure 2 and the single panel plot in Figure 1.
2.4. Merging and binding
As for many rectangular data formats in R, there are methods for combining the rows and
the columns of "zoo" objects, respectively. For the rbind method the number of columns of the
combined objects has to be identical and the indexes may not overlap.
R> rbind(z1[5:10], z1[2:3])
2004-01-14 2004-01-19 2004-01-27 2004-02-07 2004-02-12 2004-02-16
0.02107873 -0.29823529 1.94078850 1.27384445 0.22170438 -2.07607585
2004-02-20 2004-02-24
-1.78439244 -0.19533304
The c method simply calls rbind and hence behaves in the same way.
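A small sketch of row binding with made-up data: the two pieces have non-overlapping numeric indexes and are passed in reverse order to show that the result is sorted by the index.

```r
library(zoo)

# Two pieces of one series with non-overlapping numeric indexes (made-up data)
early <- zoo(c(1, 2), order.by = c(1, 2))
late  <- zoo(c(3, 4), order.by = c(3, 4))

# The order of the arguments does not matter: the result is sorted by index
combined   <- rbind(late, early)
combined_c <- c(late, early)
```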
The cbind method by default combines the columns by the union of the indexes and fills the
created gaps by NAs.
R> cbind(z1, z2)
z1 z2
2004-01-03 NA 0.94306673
2004-01-05 0.74675994 -0.04149429
2004-01-14 0.02107873 NA
2004-01-17 NA 0.59448077
2004-01-19 -0.29823529 -0.52575918
2004-01-24 NA -0.96739776
2004-01-25 0.68625772 NA
2004-01-27 1.94078850 NA
2004-02-07 1.27384445 NA
2004-02-08 NA 0.95605566
2004-02-12 0.22170438 -0.62733473
2004-02-13 NA -0.92845336
2004-02-16 -2.07607585 NA
2004-02-20 -1.78439244 NA
2004-02-24 -0.19533304 NA
2004-02-25 NA 0.56060280
2004-02-26 NA 0.08291711
[Figure 1: Example of a single panel plot (x-axis: Index, y-axis: Z)]
[Figure 2: Examples of multiple panel plots (x-axis: Index, y-axis: Z)]
In fact, the cbind method is synonymous with the merge method⁸ except that the latter provides
additional arguments which allow for combining the columns by the intersection of the indexes
using the argument all = FALSE
R> merge(z1, z2, all = FALSE)
z1 z2
2004-01-05 0.74675994 -0.04149429
2004-01-19 -0.29823529 -0.52575918
2004-02-12 0.22170438 -0.62733473
Additionally, the filling pattern can be changed in merge, the naming of the columns can be
modified and the return class of the result can be specified. In the case of merging of objects with
different index classes, R gives a warning and tries to coerce the indexes. Merging objects with
different index classes is generally discouraged; if it is used nevertheless, it is the responsibility
of the user to ensure that the result is as intended. If at least one of the merged/bound objects
was a "zooreg" object, then merge tries to return a "zooreg" object. This is done by assessing
whether there is a common maximal frequency and by checking whether the resulting index is still
(weakly) regular.
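The effect of the all and fill arguments can be sketched with two short daily series. The series a and b and their values are made up for this example.

```r
library(zoo)

# Two short daily series with partially overlapping dates (made-up data)
a <- zoo(c(1, 2, 3), as.Date("2004-01-01") + 0:2)
b <- zoo(c(10, 20, 30), as.Date("2004-01-01") + 1:3)

m_union  <- merge(a, b)               # union of indexes, gaps filled with NA
m_filled <- merge(a, b, fill = 0)     # same index, gaps filled with 0 instead
m_inner  <- merge(a, b, all = FALSE)  # intersection of indexes only
```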
If non-"zoo" objects are included in merging, then merge gives plain vectors/factors/matrices the
index of the first argument (if it is of the same length). Scalars are always added for the full index
without missing values.
R> merge(z1, pi, 1:10)
z1 pi 1:10
2004-01-05 0.74675994 3.14159265 1.00000000
2004-01-14 0.02107873 3.14159265 2.00000000
2004-01-19 -0.29823529 3.14159265 3.00000000
2004-01-25 0.68625772 3.14159265 4.00000000
2004-01-27 1.94078850 3.14159265 5.00000000
2004-02-07 1.27384445 3.14159265 6.00000000
2004-02-12 0.22170438 3.14159265 7.00000000
2004-02-16 -2.07607585 3.14159265 8.00000000
2004-02-20 -1.78439244 3.14159265 9.00000000
2004-02-24 -0.19533304 3.14159265 10.00000000
⁸ Note that in some situations the column naming in the resulting object is somewhat problematic in the cbind
method and the merge method might provide better formatting of the column names.
Another function which performs operations along a subset of indexes is aggregate, which is
discussed in this section although it does not combine several objects. Using the aggregate
method, "zoo" objects are split into subsets along a coarser index grid, summary statistics are
computed for each and then the reduced object is returned. In the following example, first a
function is set up which returns for a given "Date" value the corresponding first of the month.
This function is then used to compute the coarser grid for the aggregate call: in the first example,
the grouping is computed explicitly by firstofmonth(Z.index) and the mean of the observations
in the month is returned; in the second example, only the function that computes the grouping
(when applied to index(Z)) is supplied and the first observation is used for aggregation.
R> firstofmonth <- function(x) as.Date(sub("..$", "01", format(x)))
R> aggregate(Z, firstofmonth(Z.index), mean)
Aa Bb Cc
2004-02-01 0.53820841 0.04508597 -0.12412352
2004-03-01 -1.18080051 0.58156655 -0.45730045
R> aggregate(Z, firstofmonth, head, 1)
Aa Bb Cc
2004-02-01 1.2554339 0.6815732 -0.6329205
2004-03-01 -1.4945833 1.3234122 -1.4944227
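The two ways of specifying the grouping can be compared on a small series where the monthly means are easy to check by hand. The series z and the helper first_of_month are hypothetical names for this sketch; the helper is a variant of firstofmonth based on format().

```r
library(zoo)

# Four observations in two months (made-up data)
z <- zoo(c(1, 3, 10, 20),
         as.Date(c("2004-02-02", "2004-02-20", "2004-03-05", "2004-03-10")))

# Map each date to the first day of its month
first_of_month <- function(x) as.Date(format(x, "%Y-%m-01"))

# Grouping given explicitly as a vector ...
by_vector <- aggregate(z, first_of_month(index(z)), mean)

# ... or as a function that is applied to index(z)
by_function <- aggregate(z, first_of_month, mean)
```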
2.5. Mathematical operations
To allow for standard mathematical operations among "zoo" objects, zoo extends group generic
functions Ops. These perform the operations only for the intersection of the indexes of the objects.
As an example, the summation and logical comparison with < of z1 and z2 yield
R> z1 + z2
2004-01-05 2004-01-19 2004-02-12
0.7052657 -0.8239945 -0.4056304
R> z1 < z2
2004-01-05 2004-01-19 2004-02-12
FALSE FALSE FALSE
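The same behaviour is easy to verify on a toy example with numeric indexes, where only the index values 2 and 3 are shared. The series a and b are made up for this sketch.

```r
library(zoo)

# Two series whose indexes overlap only at 2 and 3 (made-up data)
a <- zoo(c(1, 2, 3), order.by = c(1, 2, 3))
b <- zoo(c(10, 20, 30), order.by = c(2, 3, 4))

s   <- a + b  # defined only where both series have an observation
cmp <- a < b  # logical series on the same intersection
```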
Additionally, methods are available for transposing "zoo" objects with t (which first coerces
to a matrix) and for computing cumulative quantities such as cumsum, cumprod, cummin and
cummax, all of which are applied column-wise.
R> cumsum(Z)
Aa Bb Cc
2004-02-02 1.2554339 0.6815732 -0.6329205
2004-02-08 -0.2391494 2.0049854 -2.1273432
2004-02-09 -2.1137718 1.1316925 -1.5000035
2004-02-21 -2.2591579 1.5840415 -1.6459775
2004-02-22 -2.0337337 2.1224309 -1.4146162
2004-02-29 -0.8267785 2.4405731 -1.4259082
2004-03-05 -2.0353888 3.8643710 -2.2420530
2004-03-10 -2.1457844 5.2121135 -1.2868283
2004-03-14 -1.3037606 2.4736933 -1.0553214
2004-03-20 -1.4939516 2.5967820 -2.5739429
2.6. Extracting and replacing the data and the index
zoo provides several generic functions and methods to work on the data contained in a "zoo"
object, the index (or time) attribute associated to it, and on both data and index.
The data stored in "zoo" objects can be extracted by coredata which strips off all "zoo"-specific
attributes and it can be replaced using coredata<-. Both are new generic functions⁹ with methods
for "zoo" objects as illustrated in the following example.
R> coredata(z1)
[1] 0.74675994 0.02107873 -0.29823529 0.68625772 1.94078850 1.27384445
[7] 0.22170438 -2.07607585 -1.78439244 -0.19533304
R> coredata(z1) <- 1:10
R> z1
2004-01-05 2004-01-14 2004-01-19 2004-01-25 2004-01-27 2004-02-07 2004-02-12
1 2 3 4 5 6 7
2004-02-16 2004-02-20 2004-02-24
8 9 10
The index associated with a "zoo" object can be extracted by index and modified by index<-.
As the interpretation of the index as "time" in time series applications is natural, there are also
synonymous methods time and time<-. Hence, the commands index(z2) and time(z2) return
equivalent results.
R> index(z2)
[1] "2004-01-03 Eastern Standard Time" "2004-01-05 Eastern Standard Time"
[3] "2004-01-17 Eastern Standard Time" "2004-01-19 Eastern Standard Time"
[5] "2004-01-24 Eastern Standard Time" "2004-02-08 Eastern Standard Time"
[7] "2004-02-12 Eastern Standard Time" "2004-02-13 Eastern Standard Time"
[9] "2004-02-25 Eastern Standard Time" "2004-02-26 Eastern Standard Time"
The index scale of z2 can be changed to that of z1 by
R> index(z2) <- index(z1)
R> z2
2004-01-05 2004-01-14 2004-01-19 2004-01-25 2004-01-27 2004-02-07
0.94306673 -0.04149429 0.59448077 -0.52575918 -0.96739776 0.95605566
2004-02-12 2004-02-16 2004-02-20 2004-02-24
-0.62733473 -0.92845336 0.56060280 0.08291711
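Replacing the index is also a convenient way to shift a whole series in time, as the following sketch with made-up data suggests. It additionally checks that index and time return the same object.

```r
library(zoo)

# Three daily observations (made-up data)
z <- zoo(c(1.5, 2.5, 3.5), as.Date("2004-01-01") + 0:2)

# index() and time() are synonymous accessors
same_accessor <- identical(index(z), time(z))

# Shift the whole series by one week by replacing the index
index(z) <- index(z) + 7

# Replace the data while keeping the (shifted) index
coredata(z) <- c(10, 20, 30)
```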
The start and the end of the index/time vector can be queried by start and end:
R> start(z1)
[1] "2004-01-05 Eastern Standard Time"
R> end(z1)
[1] "2004-02-24 Eastern Standard Time"
⁹ The coredata functionality is similar in spirit to the core function in its and value in tseries. However, the
focus of those functions is somewhat narrower and we try to provide more general purpose generic functions. See
the respective manual page for more details.
To work on both data and index/time, zoo provides window and window<- methods for "zoo"
objects. In both cases the window is specified by
window(x, index, start, end)
where x is the "zoo" object, index is a set of indexes to be selected (by default the full index of
x) and start and end can be used to restrict the index set.
R> window(Z, start = as.Date("2004-03-01"))
Aa Bb Cc
2004-03-05 -1.2086102 1.4237978 -0.8161448
2004-03-10 -0.1103956 1.3477425 0.9552247
2004-03-14 0.8420238 -2.7384202 0.2315069
2004-03-20 -0.1901910 0.1230887 -1.5186216
R> window(Z, index = index(Z)[5:8], end = as.Date("2004-03-01"))
Aa Bb Cc
2004-02-22 0.22542418 0.53838938 0.23136133
2004-02-29 1.20695518 0.31814222 -0.01129202
The first example selects all observations starting from 2004-03-01 whereas the second selects,
from the 5th to 8th observations, those up to 2004-03-01.
The same syntax can be used for the corresponding replacement function.
R> window(z1, end = as.POSIXct("2004-02-01")) <- 9:5
R> z1
2004-01-05 2004-01-14 2004-01-19 2004-01-25 2004-01-27 2004-02-07 2004-02-12
9 8 7 6 5 6 7
2004-02-16 2004-02-20 2004-02-24
8 9 10
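Extraction and replacement through window can be sketched side by side on a short daily series. The data are made up and the name tail_part is chosen only for this example.

```r
library(zoo)

# Five daily observations (made-up data)
z <- zoo(c(1, 2, 3, 4, 5), as.Date("2004-01-01") + 0:4)

# Extraction: everything from 3 January onwards
tail_part <- window(z, start = as.Date("2004-01-03"))

# Replacement: overwrite everything up to 2 January
window(z, end = as.Date("2004-01-02")) <- c(0, 0)
```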
Two methods that are standard in time series applications are lag and diff. These are available
with the same arguments as the "ts" methods.¹⁰
R> lag(z1, k = -1)
2004-01-14 2004-01-19 2004-01-25 2004-01-27 2004-02-07 2004-02-12 2004-02-16
9 8 7 6 5 6 7
2004-02-20 2004-02-24
8 9
R> merge(z1, lag(z1, k = 1))
z1 lag(z1, k = 1)
2004-01-05 9 8
2004-01-14 8 7
2004-01-19 7 6
2004-01-25 6 5
2004-01-27 5 6
2004-02-07 6 7
2004-02-12 7 8
2004-02-16 8 9
2004-02-20 9 10
2004-02-24 10 NA
R> diff(z1)
2004-01-14 2004-01-19 2004-01-25 2004-01-27 2004-02-07 2004-02-12 2004-02-16
-1 -1 -1 -1 1 1 1
2004-02-20 2004-02-24
1 1
¹⁰ diff also has an additional argument that allows for geometric, and not only arithmetic, differences.
Furthermore, note the sign of the lag in lag: by default it is positive and shifts the observations forward; to obtain
the more standard backward shift, the lag has to be negative.
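The sign convention of lag and the two kinds of differences are perhaps easiest to see on a doubling sequence. The series z is made up; the argument name arithmetic is taken from the diff method for "zoo" objects in current versions of the package and is an assumption as far as the version described here is concerned.

```r
library(zoo)

# Doubling sequence on an integer index (made-up data)
z <- zoo(c(1, 2, 4, 8), order.by = 1:4)

lag_back <- lag(z, k = -1)  # previous observation at each index
lag_fwd  <- lag(z, k = 1)   # next observation at each index

d_arith <- diff(z)                      # arithmetic differences
d_geom  <- diff(z, arithmetic = FALSE)  # geometric differences (ratios)
```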
2.7. Coercion to and from "zoo"
Coercion to and from "zoo" objects is available for objects of various classes, in particular "ts",
"irts" and "its" objects can be coerced to "zoo" and back if the index is of the appropriate
class.¹¹ Coercion between "zooreg" and "zoo" is also available and is essentially dropping the
"frequency" attribute or trying to add one, respectively.
Furthermore, "zoo" objects can be coerced to vectors, matrices, lists and data frames (the latter
dropping the index/time attribute). A simple example is
R> as.data.frame(Z)
Aa Bb Cc
2004-02-02 1.2554339 0.6815732 -0.63292049
2004-02-08 -1.4945833 1.3234122 -1.49442269
2004-02-09 -1.8746225 -0.8732929 0.62733971
2004-02-21 -0.1453861 0.4523490 -0.14597401
2004-02-22 0.2254242 0.5383894 0.23136133
2004-02-29 1.2069552 0.3181422 -0.01129202
2004-03-05 -1.2086102 1.4237978 -0.81614483
2004-03-10 -0.1103956 1.3477425 0.95522468
2004-03-14 0.8420238 -2.7384202 0.23150695
2004-03-20 -0.1901910 0.1230887 -1.51862157
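The coercions described above can be tried on small objects. The sketch below is illustrative only: the objects Z and x_ts are hypothetical and much smaller than the Z used in the paper, and the behavior shown is that of current zoo releases.

```r
library(zoo)

# Hypothetical two-column series with a Date index
Z <- zoo(cbind(Aa = c(1.5, -0.5, 2), Bb = c(10, 20, 30)),
         as.Date(c("2004-02-02", "2004-02-08", "2004-02-09")))

df <- as.data.frame(Z)   # plain data frame; the index survives only as row names
m  <- as.matrix(Z)       # plain numeric matrix

# Regular "ts" series to "zoo" and back
x_ts   <- ts(c(1, 2, 3, 4), start = c(2000, 1), frequency = 4)
x_zoo  <- as.zoo(x_ts)   # the frequency of the "ts" object is retained
x_back <- as.ts(x_zoo)
```

After as.data.frame the dates are no longer an index that zoo methods can use, which is the sense in which the index attribute is dropped.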
2.8. NA handling
Four methods for dealing with NAs (missing observations) are applicable to "zoo" objects:
na.omit, na.contiguous, na.approx and na.locf. na.omit (or its default method, to be more
precise) returns a "zoo" object with incomplete observations removed. na.contiguous extracts
the longest consecutive stretch of non-missing values. Furthermore, new generic functions
na.approx and na.locf and corresponding default methods are introduced in zoo. The former
replaces NAs by linear interpolation (using the function approx) and the name of the latter
stands for last observation carried forward. It replaces missing observations by the most
recent non-NA prior to it. Leading NAs, which cannot be replaced by previous observations, are
removed in both functions by default.
Footnote 11: Coercion from "zoo" to "irts" is contained in the tseries package.
R> z1[sample(1:10, 3)] <- NA
R> z1
2004-01-05 2004-01-14 2004-01-19 2004-01-25 2004-01-27 2004-02-07 2004-02-12
9 NA 7 6 5 6 NA
2004-02-16 2004-02-20 2004-02-24
8 9 NA
R> na.omit(z1)
2004-01-05 2004-01-19 2004-01-25 2004-01-27 2004-02-07 2004-02-16 2004-02-20
9 7 6 5 6 8 9
R> na.contiguous(z1)
2004-01-19 2004-01-25 2004-01-27 2004-02-07
7 6 5 6
R> na.approx(z1)
2004-01-05 2004-01-14 2004-01-19 2004-01-25 2004-01-27 2004-02-07 2004-02-12
9.000000 7.714286 7.000000 6.000000 5.000000 6.000000 7.111111
2004-02-16 2004-02-20
8.000000 9.000000
R> na.approx(z1, 1:NROW(z1))
2004-01-05 2004-01-14 2004-01-19 2004-01-25 2004-01-27 2004-02-07 2004-02-12
9 8 7 6 5 6 7
2004-02-16 2004-02-20
8 9
R> na.locf(z1)
2004-01-05 2004-01-14 2004-01-19 2004-01-25 2004-01-27 2004-02-07 2004-02-12
9 9 7 6 5 6 6
2004-02-16 2004-02-20 2004-02-24
8 9 9
As the above example illustrates, na.approx by default uses the underlying time scale for
interpolation. This can be changed, e.g., to an equidistant spacing, by setting the second argument of
na.approx.
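The difference between interpolating on the time scale and on an equidistant scale is easiest to see with one interior gap. This is a minimal sketch with a made-up series z. It assumes the argument names of current zoo releases, where the second argument of na.approx is called x.

```r
library(zoo)

# Hypothetical irregular series with an interior and a trailing NA
idx <- as.Date("2004-01-05") + c(0, 9, 14, 20, 22)
z <- zoo(c(9, NA, 7, 6, NA), idx)

omitted  <- na.omit(z)                        # drops both incomplete observations
by_time  <- na.approx(z)                      # interpolates along the Date index
by_order <- na.approx(z, x = seq_along(idx))  # treats observations as equidistant
carried  <- na.locf(z)                        # last observation carried forward
```

On the time scale the gap lies 9 of 14 days after the first observation, so the interpolated value is 9 + (7 - 9) * 9/14. On the equidistant scale it is the midpoint, 8. The trailing NA cannot be interpolated and is dropped by na.approx, whereas na.locf fills it with the last available value.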
2.9. Rolling functions
A typical task to be performed on ordered observations is to evaluate some function, e.g., computing
the mean, in a window of observations that is moved over the full sample period. The resulting
statistics are usually referred to interchangeably as rolling, running or moving statistics. In zoo, the
generic function rapply is provided along with a "zoo" and a "ts" method. The most important
arguments are
rapply(data, width, FUN)
where the function FUN is applied to a rolling window of size width of the observations data.
The function rapply currently only evaluates the function for windows of full size width, hence
the result has width - 1 fewer observations than the original series. However, it can be specified
whether the 'lost' observations should be padded with NAs and whether the result should be
left-aligned, right-aligned or centered (default) with respect to the original index.
R> rapply(Z, 5, sd)
Aa Bb Cc
2004-02-09 1.2814876 0.8018950 0.8218959
2004-02-21 1.2658555 0.7891358 0.8025043
2004-02-22 1.2102011 0.8206819 0.5319727
2004-02-29 0.8662296 0.5266261 0.6411751
2004-03-05 0.9363400 1.7011273 0.6356144
2004-03-10 0.9508642 1.6892246 0.9578196
R> rapply(Z, 5, sd, na.pad = TRUE, align = "left")
Aa Bb Cc
2004-02-02 1.2814876 0.8018950 0.8218959
2004-02-08 1.2658555 0.7891358 0.8025043
2004-02-09 1.2102011 0.8206819 0.5319727
2004-02-21 0.8662296 0.5266261 0.6411751
2004-02-22 0.9363400 1.7011273 0.6356144
2004-02-29 0.9508642 1.6892246 0.9578196
2004-03-05 NA NA NA
2004-03-10 NA NA NA
2004-03-14 NA NA NA
2004-03-20 NA NA NA
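The effect of alignment and padding can be reproduced on a series whose rolling statistic is known in advance. Note that this sketch does not use the function name from the text: in zoo releases later than the one described here, the rolling function is exported as rollapply (base R gained an unrelated function called rapply), and padding is requested with fill = NA. The series z is hypothetical.

```r
library(zoo)

# Hypothetical series 1, 2, ..., 10 on consecutive days
z <- zoo(1:10, as.Date("2004-02-01") + 0:9)

# Centered window of width 5: the result is width - 1 = 4 observations shorter
r_center <- rollapply(z, width = 5, FUN = sd)

# Left-aligned window, padded with NA so that the original index is kept
r_left <- rollapply(z, width = 5, FUN = sd, fill = NA, align = "left")
```

Every window contains five consecutive integers, so each rolling standard deviation equals sqrt(2.5). Only the dates attached to the values, and the position of the padding, differ between the two calls.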
To improve the performance of rapply(x, k, foo) for some frequently used functions foo, more
efficient implementations rollfoo(x, k) are available (and are also called by rapply). Currently,
these are the generic functions rollmean, rollmedian and rollmax, which have methods for "zoo"
and "ts" series and a default method for plain vectors.
R> rollmean(z2, 5, na.pad = TRUE)
2004-01-05 2004-01-14 2004-01-19 2004-01-25 2004-01-27
NA NA 0.0005792538 0.0031770388 -0.1139910497
2004-02-07 2004-02-12 2004-02-16 2004-02-20 2004-02-24
-0.4185778750 -0.2013054791 0.0087574946 NA NA
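As a quick check, the specialised functions can be compared against a hand-computed rolling mean. This is only a sketch on made-up data; the vector x and the helper computation via sapply are illustrative and not part of zoo.

```r
library(zoo)

# Hypothetical data with a plain integer index
x <- c(2, 4, 6, 8, 10, 12, 14)
z <- zoo(x, 1:7)

m_fast <- rollmean(z, 3)     # rolling mean, centered window of width 3
med    <- rollmedian(z, 3)   # rolling median (odd width)
mx     <- rollmax(z, 3)      # rolling maximum

# Hand-computed rolling mean over the same five full windows
m_hand <- sapply(1:5, function(i) mean(x[i:(i + 2)]))
```

Without padding, each result has 7 - (3 - 1) = 5 observations, indexed by the centers 2 to 6 of the windows.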
3. Combining zoo with other packages
The main purpose of the package zoo is to provide basic infrastructure for working with indexed
totally ordered observations that can be either employed by users directly or can be a basic
ingredient on top of which other packages can build. The latter is illustrated with a few brief
examples involving the packages strucchange, tseries and fCalendar in this section. Finally, the
classes "yearmon" and "yearqtr" (provided in zoo) are used for illustrating how zoo can be
extended by creating a new index class.
3.1. strucchange: Empirical fluctuation processes
The package strucchange provides a collection of methods for testing, monitoring and dating
structural changes, in particular in linear regression models. Tests for structural change assess
whether the parameters of a model remain constant over an ordering with respect to a specified
variable, usually time. To adequately store and visualize empirical fluctuation processes which
capture instabilities over this ordering, a data type for indexed ordered observations is required.
This was the motivation for starting the zoo project.
A simple example for the need of "zoo" objects in strucchange which cannot be (easily) implemented
by other irregular time series classes available in R is described in the following. We assess
the constancy of the electrical resistance over the apparent juice content of kiwi fruits (see
footnote 12). The data set fruitohms is contained in the DAAG package (Maindonald and Braun 2004).
The fitted ocus object contains the OLS-based CUSUM process for the mean of the electrical
resistance (variable ohms) indexed by the juice content (variable juice).
R> library(strucchange)
R> library(DAAG)
R> data(fruitohms)
R> ocus <- gefp(ohms ~ 1, order.by = ~juice, data = fruitohms)
R> plot(ocus)
Figure 3: Empirical M-fluctuation process for fruitohms data (plot title: M-fluctuation test; x-axis: juice; y-axis: empirical fluctuation process)
This OLS-based CUSUM process can be visualized using the plot method for "gefp" objects,
which builds on the "zoo" method and yields in this case the plot in Figure 3. The process
crosses its 5% critical value and thus signals a significant decrease in the mean electrical
resistance over the juice content. For more information on the package strucchange and the
function gefp see Zeileis et al. (2002) and Zeileis (2004).
Footnote 12: A different approach would be to test whether the slope of a regression of electrical resistance on
juice content changes with increasing juice content, i.e., to test for instabilities in ohms ~ juice instead of
ohms ~ 1. Both lead to similar results.
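To see why a non-time index is needed here, the idea of a CUSUM process indexed by an ordering variable can be sketched by hand. The code below is a simplified illustration and not the implementation used by gefp: the data are simulated, the variable names only mimic fruitohms, and the scaling by sd(res) * sqrt(n) is an assumed convention.

```r
library(zoo)

set.seed(1)
n <- 60
juice <- sort(runif(n, min = 10, max = 60))       # hypothetical ordering variable
ohms  <- 7000 - 60 * juice + rnorm(n, sd = 500)   # hypothetical response

# Residuals of the constant-mean model ohms ~ 1, taken in the order of juice
res <- ohms - mean(ohms)

# Scaled cumulative sums, indexed by juice rather than by time
cusum <- zoo(cumsum(res) / (sd(res) * sqrt(n)), order.by = juice)
```

Calling plot(cusum) would draw the process against juice. Because the simulated mean decreases with juice, the path rises well above zero before returning to zero at the end, which is the kind of excursion a fluctuation test compares with a critical value.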
3.2. tseries: Historical financial data
A typical application for irregular time series, which has become increasingly important in recent
years in computational statistics and finance, is daily (or higher-frequency) financial data. The
package tseries provides the function get.hist.quote for obtaining historical financial data by
querying Yahoo! Finance at http://finance.yahoo.com/, an online portal quoting data provided
by Reuters. The following code queries the quotes of Lucent Technologies starting from 2001-01-01
until 2004-09-30:
R> library(tseries)
R> LU <- get.hist.quote(instrument = "LU", start = "2001-01-01",
+ end = "2004-09-30", origin = "1970-01-01")
In the returned LU object the irregular data is stored by extending it to a regular grid and filling
the gaps with NAs. The time is stored in days starting from an origin, in this case specified to
be 1970-01-01, the origin used by the Date class. This series can be transformed easily into an
irregular "zoo" series using a "Date" index. The log-difference returns for Lucent Technologies are
depicted in Figure 4.
R> LU <- as.zoo(LU)
R> index(LU) <- as.Date(index(LU))
R> LU <- na.omit(LU)
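The same three steps can be tried without a network query. The sketch below replaces the downloaded quotes with a made-up price series on a regular grid of day numbers; the object price and its values are hypothetical.

```r
library(zoo)

# Hypothetical prices on a regular daily grid; NA marks days without a quote
days  <- as.numeric(as.Date("2004-09-27")) + 0:4   # days since 1970-01-01
price <- zoo(c(3.10, NA, 3.20, 3.15, NA), days)

# Replace the numeric index by a "Date" index and drop the padded gaps
index(price) <- as.Date(index(price), origin = "1970-01-01")
price <- na.omit(price)

# Log-difference returns between consecutive available quotes
ret <- diff(log(price))
```

After na.omit the series is irregular, and each return is attached to the date of the later of the two quotes it compares.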
3.3. fCalendar: Indexes of class "timeDate"
Although the methods in zoo work out of the box for many index classes, it might be necessary
for some index classes to provide c, length, ORDER and MATCH methods such that the methods
in zoo work properly. An example of such an index class which requires a bit more attention is
"timeDate" from the fCalendar package.
But after the necessary methods have been defined
R> length.timeDate <- function(x) prod(x@Dim)
R> ORDER.timeDate <- function(x, ...) order(as.POSIXct(x), ...)
R> MATCH.timeDate <- function(x, table, nomatch = NA, ...) match(as.POSIXct(x),
+ as.POSIXct(table), nomatch = nomatch, ...)
the class "timeDate" can be used for indexing "zoo" objects. The following example illustrates
how z2 can be transformed to use the "timeDate" class.
R> library(fCalendar)
R> z2td <- zoo(coredata(z2), timeDate(index(z2), FinCenter = "GMT"))
R> z2td
2004-01-05 2004-01-14 2004-01-19 2004-01-25 2004-01-27 2004-02-07
0.94306673 -0.04149429 0.59448077 -0.52575918 -0.96739776 0.95605566
2004-02-12 2004-02-16 2004-02-20 2004-02-24
-0.62733473 -0.92845336 0.56060280 0.08291711
3.4. The classes "yearmon" and "yearqtr": Roll your own index
One of the strengths of the zoo package is its independence of the index class, such that the
index can be easily customized. The previous section already explained how an existing class
("timeDate") can be used as the index if the necessary methods are created. This section has a
similar but slightly different focus: it describes how new index classes can be created addressing
a certain type of indexes. These classes are "yearmon" and "yearqtr" (already contained in
zoo), which provide indexes for monthly and quarterly data, respectively. As the code is virtually
identical for both classes (except that one has the frequency 12 and the other 4), we will only
discuss "yearmon" explicitly.
R> plot(diff(log(LU)))
Figure 4: Log-difference returns for Lucent Technologies (x-axis: Index, 2001 to 2004; y-axis: diff(log(LU)))
Of course, monthly data can simply be stored using a numeric index just as the class "ts" does.
The problem is that such an index carries no meta-information stating that it really specifies
monthly data; in "yearmon" this is simply added by a class attribute. Hence, the class creator
is simply defined as
yearmon <- function(x) structure(floor(12*x + .0001)/12, class = "yearmon")
which is very similar to the as.yearmon coercion functions provided.
As "yearmon" data is now explicitly declared to describe monthly data, this can be exploited for
coercion to other time classes: either to coarser time scales such as "yearqtr" or to finer time
scales such as "Date", "POSIXct" or "POSIXlt", which by default associate the first day within
a month with a "yearmon" observation. Adding a format and as.character method produces
human readable character representations of "yearmon" data, and Ops and MATCH methods complete
the methods needed for conveniently working with monthly data in zoo. Note that all of these
methods are very simple and rather obvious (as can be seen in the zoo sources), but prove very
helpful in the following examples.
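The rounding inside the class creator and the coercions to coarser and finer scales can be inspected directly. This is a small sketch: the vector x is made up, and as.yearmon is used in place of the creator on the assumption that it applies the same rounding to numeric input.

```r
library(zoo)

# Year plus fraction of a year for January, February and December 2000
x <- 2000 + c(0, 1, 11) / 12

# The rounding from the creator, reproduced with base R: the small offset
# guards against values that land just below a month boundary
months_num <- floor(12 * x + 1e-4) / 12

ym        <- as.yearmon(x)   # numeric value plus the "yearmon" class attribute
first_day <- as.Date(ym)     # finer scale: first day of each month
qtr       <- as.yearqtr(ym)  # coarser scale: the quarter containing each month
```

Underneath, a "yearmon" value is still the plain number year + (month - 1)/12, which is why arithmetic and comparison on such indexes stay cheap.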
First, we create a regular series zr3 with "yearmon" index which leads to improved printing
compared to the regular series zr1 and zr2 from Section 2.2.
R> zr3 <- zooreg(rnorm(9), start = yearmon(2000), frequency = 12)
R> zr3
Jan 2000 Feb 2000 Mar 2000 Apr 2000 May 2000 Jun 2000
-0.30969096 0.08699142 -0.64837101 -0.62786277 -0.61932674 -0.95506154
Jul 2000 Aug 2000 Sep 2000
-1.91736406 0.38108885 1.51405511
This could be aggregated to quarterly data via
R> aggregate(zr3, as.yearqtr, mean)
2000 Q1 2000 Q2 2000 Q3
-0.2903569 -0.7340837 -0.0074067
The index can easily be transformed to "Date". The default is the first day of the month, but
this can also be changed to the last day of the month.
R> as.Date(index(zr3))
[1] "2000-01-01" "2000-02-01" "2000-03-01" "2000-04-01" "2000-05-01"
[6] "2000-06-01" "2000-07-01" "2000-08-01" "2000-09-01"
R> as.Date(index(zr3), frac = 1)
[1] "2000-01-31" "2000-02-29" "2000-03-31" "2000-04-30" "2000-05-31"
[6] "2000-06-30" "2000-07-31" "2000-08-31" "2000-09-30"
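With deterministic data, the quarterly aggregation and both Date conversions can be verified by hand. This is a minimal sketch; the series z with values 1 to 9 is made up and stands in for zr3.

```r
library(zoo)

# Hypothetical monthly series for January to September 2000 with values 1..9
z <- zoo(1:9, as.yearmon(2000 + 0:8 / 12))

q_mean <- aggregate(z, as.yearqtr, mean)   # one value per quarter
first  <- as.Date(index(z))                # first day of each month
last   <- as.Date(index(z), frac = 1)      # last day of each month
```

The quarterly means are 2, 5 and 8, i.e. the averages of (1, 2, 3), (4, 5, 6) and (7, 8, 9). With frac = 1, February 2000 maps to the 29th because 2000 is a leap year.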
Furthermore, "yearmon" indexes can easily be coerced to "POSIXct" such that the series could be
exported as an "its" or "irts" series.
R> index(zr3) <- as.POSIXct(index(zr3))
R> as.irts(zr3)
2000-01-01 00:00:00 GMT -0.3097
2000-02-01 00:00:00 GMT 0.08699
2000-03-01 00:00:00 GMT -0.6484
2000-04-01 00:00:00 GMT -0.6279
2000-05-01 00:00:00 GMT -0.6193
2000-06-01 00:00:00 GMT -0.9551
2000-07-01 00:00:00 GMT -1.917
2000-08-01 00:00:00 GMT 0.3811
2000-09-01 00:00:00 GMT 1.514
Again, this functionality makes switching between different time scales or index representations
particularly easy and zoo provides the user with the flexibility to adjust a certain index to his/her
problem of interest.
4. Summary and outlook
The package zoo provides an S3 class and methods for indexed totally ordered observations, such
as both regular and irregular time series. Its key design goals are independence of a particular
index class and compatibility with standard generics similar to the behaviour of the corresponding
"ts" methods. This paper describes how these are implemented in zoo and illustrates the usage of
the methods for plotting, merging and binding, several mathematical operations, extracting and
replacing data and index, coercion and NA handling.
An indexed object of class "zoo" can be thought of as data plus index where the data are essentially
vectors or matrices and the index can be a vector of (in principle) arbitrary class. For (weakly)
regular "zooreg" series, a "frequency" attribute is stored in addition. Therefore, objects of
classes "ts", "its", "irts" and "timeSeries" can easily be transformed into "zoo" objects; the
reverse transformation is also possible provided that the index fulfills the restrictions of the
respective class. Hence, the "zoo" class can also be used as the basis for other classes of indexed
observations and more specific functionality can be built on top of it. Furthermore, it bridges the
gap between irregular and regular series, facilitating operations such as NA handling compared to
"ts".
Whereas a lot of effort was put into achieving independence of a particular index class, the types
of data that can be indexed with "zoo" are currently limited to vectors and matrices, typically
containing numeric values. Although there is some limited support available for indexed factors,
one important direction for future development of zoo is to add better support for other objects
that can also naturally be indexed including specifically factors, data frames and lists.
Computational details
The results in this paper were obtained using R 2.1.0 with the packages zoo 1.0–0, strucchange
1.2–10, fCalendar 201.10060, tseries 0.9–27 and DAAG 0.46. R itself and all packages used are
available from CRAN at http://CRAN.R-project.org/.
References
Giles Heywood. its: Irregular Time Series. Portfolio & Risk Advisory Group and Commerzbank
Securities, 2004. R package version 1.0.4.
John Maindonald and W. John Braun. DAAG: Data Analysis and Graphics, 2004. URL
http://www.stats.uwo.ca/DAAG/. R package version 0.46.
R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation
for Statistical Computing, Vienna, Austria, 2005. URL http://www.R-project.org/.
ISBN 3-900051-00-3.
Adrian Trapletti. tseries: Time Series Analysis and Computational Finance, 2005. R package
version 0.9-25.
Diethelm Wuertz. Rmetrics: An Environment and Software Collection for Teaching Financial
Engineering and Computational Finance, 2005. URL http://www.Rmetrics.org/. R package
fCalendar, version 201.10059.
Achim Zeileis. Implementing a class of structural change tests: An econometric computing
approach. Report 7, Department of Statistics and Mathematics, Wirtschaftsuniversität Wien,
Research Report Series, July 2004. URL http://epub.wu-wien.ac.at/.
Achim Zeileis, Friedrich Leisch, Kurt Hornik, and Christian Kleiber. strucchange: An R package
for testing for structural change in linear regression models. Journal of Statistical Software,
7(2):1–38, 2002. URL http://www.jstatsoft.org/v07/i02/.
A. Reference card
Creation
zoo(x, order.by)             creation of a "zoo" object from the observations x (a vector
                             or a matrix) and an index order.by by which the observations
                             are ordered.
                             For computations on arbitrary index classes, methods to
                             the following generic functions are assumed to work:
                             combining c(), querying length length(), subsetting [,
                             ordering ORDER() and value matching MATCH(). For pretty
                             printing an as.character and/or index2char method might
                             be helpful.
Creation of regular series
zoo(x, order.by, freq)       works as above but creates a "zooreg" object which inherits
                             from "zoo" if the frequency freq complies with the index
                             order.by. An as.numeric method has to be available for
                             the index class.
zooreg(x, start, end, freq)  creates a "zooreg" series with a numeric index as above
                             and has (almost) the same interface as ts().
Standard methods
plot                         plotting
lines                        adding a "zoo" series to a plot
print                        printing
summary                      summarizing (column-wise)
str                          displaying structure of "zoo" objects
head, tail                   head and tail of "zoo" objects
Coercion
as.zoo                       coercion to "zoo" is available for objects of class "ts",
                             "its", "irts" (plus a default method).
as.class.zoo                 coercion from "zoo" to other classes. Currently available
                             for class in "matrix", "vector", "data.frame", "list",
                             "irts", "its" and "ts".
is.zoo                       querying whether an object is of class "zoo"
Merging and binding
merge                        union, intersection, left join, right join along indexes
cbind                        column binding along the intersection of the index
c, rbind                     combining/row binding (indexes may not overlap)
aggregate                    compute summary statistics along a coarser grid of indexes
Mathematical operations
Ops                          group generic functions performed along the intersection of
                             indexes
t                            transposing (coerces to "matrix" before)
cumsum                       compute (columnwise) cumulative quantities: sums cumsum(),
                             products cumprod(), maximum cummax(), minimum cummin().
Extracting and replacing data and index
index, time                  extract the index of a series
index<-, time<-              replace the index of a series
coredata, coredata<-         extract and replace the data associated with a "zoo" object
lag                          lagged observations
diff                         arithmetic and geometric differences
start, end                   querying start and end of a series
window, window<-             subsetting of "zoo" objects using their index
NA handling
na.omit                      omit NAs
na.contiguous                compute longest sequence of non-NA observations
na.locf                      impute NAs by carrying forward the last observation
na.approx                    impute NAs by interpolation
Rolling functions
rapply                       apply a function to rolling margins of an array
rollmean                     more efficient functions for computing the rolling mean,
                             median and maximum are rollmean(), rollmedian() and
                             rollmax(), respectively
Methods for regular series
is.regular                   checks whether a series is weakly (or strictly if strict =
                             TRUE) regular
frequency, deltat            extracts the frequency or its reciprocal value respectively
                             from a series, for "zoo" series the functions try to determine
                             the regularity and frequency in a data-driven way
cycle                        gives the position in the cycle of a regular series
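The distinction between weak and strict regularity listed above can be illustrated by removing one observation from a regular series. This is a small sketch with a hypothetical quarterly series zr, based on the behavior of current zoo releases.

```r
library(zoo)

zr     <- zooreg(1:8, start = 2000, frequency = 4)   # hypothetical quarterly series
zr_gap <- zr[-3]                                     # drop one observation

strict_full <- is.regular(zr, strict = TRUE)       # equidistant index
weak_gap    <- is.regular(zr_gap)                  # gaps are allowed for weak regularity
strict_gap  <- is.regular(zr_gap, strict = TRUE)   # the gap breaks strict regularity

freq <- frequency(zr_gap)   # the "frequency" attribute is kept
dt   <- deltat(zr_gap)      # its reciprocal
```

A "zooreg" series therefore stays usable as a regular series after observations are dropped.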
... The LAI values for the six vines in lysimeters were averaged to receive one weekly value to represent the time series. Linear temporal interpolation was used to generate a daily time-series, using the "zoo" package in R (Zeileis and Grothendieck, 2005). ...
... Finally, a rolling RMSE analysis was performed, using a seven-day bandwidth consecutively calculating the errors for the previous seven days, to assess the weekly bias of the forecast values. Rolling functions are typically applied on ordered observations using a predefined window that is moved over the full sample period (Zeileis and Grothendieck, 2005). The rolling RMSE statistic was applied on a seven-day window since this is a commonly used time gap for irrigation decision making. ...
... The mean weekly RMSE values were calculated based on rolling RMSE performed for the forecast values of all seasons and for all models. This analysis was conducted by R using package "zoo" (Zeileis and Grothendieck, 2005). ...
Vineyard irrigation management relies on accurate assessment of crop evapotranspiration (ETc). ETc is affected by the by type of plant, its physiological properties, and meteorological parameters. Rapid measurement of these factors facilitates quantification of ETc and enables skilled decision-making for data-driven irrigation. Our main objective was to quantify the performance of different modeling approaches for forecasting seasonal ETc using meteorological and vegetative data (e.g., leaf area) from five consecutive growing seasons (2013–2017) of Vitis vinifera 'Cabernet Sauvignon' vines. Time series of ETc was acquired from water balance from vines grown in drainage lysimeters within the vineyard. ETc forecasts were generated for each season using twelve regression models: six linear and six non-linear multivariate adaptive regression spline (MARS) models. Each regression model constituted a unique combination of variables, some relying on crop coefficient (Kc) and others based on direct ETc forecasting. The models were trained using data from four growing seasons and compared via measures of coefficient of determination (R2), residual standard deviation, and coefficient of variation. Each model was then tested using ETc forecasts for a fifth growing season, and compared to the measured ETc values using correlation, root mean squared error (RMSE), and normalized RMSE. Finally, a mean-seasonal rolling RMSE with a window of 7 days was used to assess the accuracy of the different models. The results show a clear advantage to using non-linear modeling for ETc forecasting (average RMSE range of 0.81–1.05 vs. 0.64–0.71 mm day−1, respectively). Furthermore, direct forecasting and Kc-based methods yielded similar results, and all models benefited from the incorporation of leaf area data. Similar outcomes were found for the rolling RMSE analysis, with improved model accuracy credited to the inclusion of leaf area, especially early in the season. 
Our findings confirm that advanced algorithms promote site-specific and location-oriented irrigation management.
... (R Core Team, 2019), we were then able to extract three-dimensional positions from the list of XYZ coordinates. Missing coordinates between two known coordinates were interpolated using the package zoo (Zeileis & Grothendieck, 2005). ...
Research on diel vertical migration (DVM) is generally conducted at the population level, whereas few studies have focused on how individual animals behaviorally respond to threats when also having access to foraging opportunities. We utilized a 3D tracking platform to record the swimming behavior of Daphnia magna exposed to ultraviolet radiation (UVR) in the presence or absence of a food patch. We analyzed the vertical position of individuals before and during UVR exposure and found that the presence of food reduced the average swimming depth during both sections of the trial. Since UVR is a strong driver of zooplankton behavior, our results highlight that biotic factors, such as food patches, have profound effects on both the amplitude and the frequency of avoidance behavior. In a broader context, the trade-off between threats and food adds to our understanding of the strength and variance of behavioral responses to threats, including DVM.
... We use R for the data analysis (R Core Team, 2020a). The main packages are tidyverse (Wickham et al., 2019), ncdf4 (Pierce, 2019), ggplot2 15 (Wickham, 2016), raster (Hijmans, 2020), zoo, (Zeileis and Grothendieck, 2005), plyr (Wickham, 2011), and (Wickham et al., 2021). We use the nest R package (https://github.com/krehfeld/nest ...
The incorporation of water isotopologues into the hydrology of general circulation models (GCMs) facilitates the comparison between modelled and measured proxy data in paleoclimate archives. However, the variability and drivers of measured and modelled water isotopologues, and indeed the diversity of their representation in different models are not well constrained. Improving our understanding of this variability in past and present climates will help to better constrain future climate change projections and decrease their range of uncertainty. Speleothems are a precisely datable paleoclimate archive and provide well preserved (semi-)continuous multivariate isotope time series in the lower and mid-latitudes, and are, therefore, well suited to assess climate and isotope variability on decadal and longer timescales. However, the relationship between speleothem oxygen and carbon isotopes to climate variables also depends on site-specific parameters, and their comparison to GCMs is not always straightforward. Here we compare speleothem oxygen and carbon isotopic signatures from the Speleothem Isotopes Synthesis and AnaLysis database version 2 (SISALv2) to the output of five different water-isotope-enabled GCMs (ECHAM5-wiso, GISS-E2-R, iCESM, iHadCM3, and isoGSM) over the last millennium (850-1850 common era, CE). We systematically evaluate differences and commonalities between the standardized model simulation outputs. The goal is to distinguish climatic drivers of variability for both modelled and measured isotopes. We find strong regional differences in the oxygen isotope signatures between models that can partly be attributed to differences in modelled temperatures. At low latitudes, precipitation amount is the dominant driver for water isotope variability, however, at cave locations the agreement between modelled temperature variability is higher than for precipitation variability. 
While modelled isotopic signatures at cave locations exhibited extreme events coinciding with changes in volcanic and solar forcing, such fingerprints are not apparent in the speleothem isotopes, and may be attributed to the lower temporal resolution of speleothem records compared to the events that are to be detected. Using spectral analysis, we can show that all models underestimate decadal and longer variability compared to speleothems, although to varying extent. We found that no model excels in all analyzed comparisons, although some perform better than the others in either mean or variability. Therefore, we advise a multi-model approach, whenever comparing proxy data to modelled data. Considering karst and cave internal processes through e.g. isotope-enabled karst models may alter the variability in speleothem isotopes and play an important role in determining the most appropriate model. By exploring new ways of analyzing the relationship between the oxygen and carbon isotopes, their variability, and co-variability across timescales, we provide methods that may serve as a baseline for future studies with different models using e.g. different isotopes, different climate archives, or time periods.
... Analysis was undertaken using R [15]. Rolling averages over time were calculated using the 'rollmean' function of the zoo (v1.8.9) package in R [16]. A centred rolling window of 14 days was used for daily deaths and daily medical admissions and a window of 28 days for all other plots. ...
Background To better understand the impact of the COVID-19 pandemic on hospital healthcare, we studied activity in the emergency department (ED) and acute medicine department of a major UK hospital. Methods Electronic patient records for all adult patients attending ED ( n = 243,667) or acute medicine ( n = 82,899) during the pandemic (2020–2021) and prior year (2019) were analysed and compared. We studied parameters including severity, primary diagnoses, co-morbidity, admission rate, length of stay, bed occupancy, and mortality, with a focus on non-COVID-19 diseases. Results During the first wave of the pandemic, daily ED attendance fell by 37%, medical admissions by 30% and medical bed occupancy by 27%, but all returned to normal within a year. ED attendances and medical admissions fell across all age ranges; the greatest reductions were seen for younger adults in ED attendances, but in older adults for medical admissions. Compared to non-COVID-19 pandemic admissions, COVID-19 admissions were enriched for minority ethnic groups, for dementia, obesity and diabetes, but had lower rates of malignancy. Compared to the pre-pandemic period, non-COVID-19 pandemic admissions had more hypertension, cerebrovascular disease, liver disease, and obesity. There were fewer low severity ED attendances during the pandemic and fewer medical admissions across all severity categories. There were fewer ED attendances with common non-respiratory illnesses including cardiac diagnoses, but no change in cardiac arrests. COVID-19 was the commonest diagnosis amongst medical admissions during the first wave and there were fewer diagnoses of pneumonia, myocardial infarction, heart failure, cellulitis, chronic obstructive pulmonary disease, urinary tract infection and other sepsis, but not stroke. Levels had rebounded by a year later with a trend to higher levels of stroke than before the pandemic. 
During the pandemic first wave, 7-day mortality was increased for ED attendances, but not for non-COVID-19 medical admissions. Conclusions Reduced ED attendances in the first wave of the pandemic suggest opportunities for reducing low severity presentations to ED in the future, but also raise the possibility of harm from delayed or missed care. Reassuringly, recent rises in attendance and admissions indicate that any deterrent effect of the pandemic on attendance is diminishing.
The FtsLB complex is a key regulator of bacterial cell division, existing in either an off or on state which supports the activation of septal peptidoglycan synthesis. In Escherichia coli, residues known to be critical for this activation are located in a region near the C-terminal end of the periplasmic coiled-coil domain of FtsLB, raising questions about the precise role of this conserved domain in the activation mechanism. Here, we investigate an unusual cluster of polar amino acids found within the core of the FtsLB coiled coil. We hypothesized that these amino acids likely reduce the structural stability of the domain and thus may be important for governing conformational changes. We found that mutating these positions to hydrophobic residues increased the thermal stability of FtsLB but caused cell division defects, suggesting that the coiled-coil domain is a "detuned" structural element. In addition, we identified suppressor mutations within the polar cluster, indicating that the precise identity of the polar amino acids is important for fine-tuning the structural balance between the off and on states. We propose a revised structural model of the tetrameric FtsLB (named the "Y-model") in which the periplasmic domain splits into a pair of coiled-coil branches. In this configuration, the hydrophilic terminal moieties of the polar amino acids remain more favorably exposed to water than in the original four-helix bundle model ("I-model"). We propose that a shift in this architecture, dependent on its marginal stability, is involved in activating the FtsLB complex and triggering septal cell wall reconstruction.
A comprehensive understanding of the behaviours of the various geophysical processes requires, among others, detailed investigations across temporal scales. In this work, we propose a new time series feature compilation for advancing and enriching such investigations in a hydroclimatic context. This specific compilation can facilitate largely interpretable feature investigations and comparisons in terms of temporal dependence, temporal variation, "forecastability", lumpiness, stability, nonlinearity (and linearity), trends, spikiness, curvature and seasonality. Detailed quantifications and multifaceted characterizations are herein obtained by computing the values of the proposed feature compilation across nine temporal resolutions (i.e., the 1-day, 2-day, 3-day, 7-day, 0.5-month, 1-month, 2-month, 3-month and 6-month ones) and three hydroclimatic time series types (i.e., temperature, precipitation and streamflow) for 34-year-long time series records originating from 511 geographical locations across the continental United States. Based on the acquired information and knowledge, similarities and differences between the examined time series types with respect to the evolution patterns characterizing their feature values with increasing (or decreasing) temporal resolution are identified. To our view, the similarities in these patterns are rather surprising. We also find that the spatial patterns emerging from feature-based time series clustering are largely analogous across temporal scales, and compare the features with respect to their usefulness in clustering the time series at the various temporal resolutions. For most of the features, this usefulness can vary to a notable degree across temporal resolutions and time series types, thereby pointing out the need for conducting multifaceted time series characterizations for the study of hydroclimatic similarity.
- Camilla Ugolini
- Logan Mulroney
- Adrien Leger
- Tommaso Leonardi
The SARS-CoV-2 virus has a complex transcriptome characterised by multiple, nested subgenomic RNAs used to express structural and accessory proteins. Long-read sequencing technologies such as nanopore direct RNA sequencing can recover full-length transcripts, greatly simplifying the assembly of structurally complex RNAs. However, these techniques do not detect the 5′ cap, thus preventing reliable identification and quantification of full-length, coding transcript models. Here we used Nanopore ReCappable Sequencing (NRCeq), a new technique that can identify capped full-length RNAs, to assemble a complete annotation of SARS-CoV-2 sgRNAs and annotate the location of capping sites across the viral genome. We obtained robust estimates of sgRNA expression across cell lines and viral isolates and identified novel canonical and non-canonical sgRNAs, including one that uses a previously un-annotated leader-to-body junction site. The data generated in this work constitute a useful resource for the scientific community and provide important insights into the mechanisms that regulate the transcription of SARS-CoV-2 sgRNAs.
- Kyle Newton
- Dovi Kacev
- Simon RO Nilsson
- Lavinia Sheets
The zebrafish lateral line is an established model for hair cell organ damage, yet few studies link mechanistic disruptions to changes in biologically relevant behavior. We used larval zebrafish to determine how damage via ototoxic chemicals impacts rheotaxis. Larvae were treated with CuSO4 or neomycin to disrupt lateral line function, then exposed to water flow stimuli. Their swimming behavior was recorded, and DeepLabCut and SimBA software were used to track movements and classify rheotaxis behavior. Lateral line-disrupted fish performed rheotaxis, but they swam greater distances, for shorter durations, and with greater angular variance than controls. Further, spectral decomposition analyses demonstrated that lesioned fish exhibited toxin-specific behavioral profiles with distinct fluctuations in the magnitude, timing, and cross-correlation between changes in linear and angular movements. Our observations support that lateral-line input is needed for fish to perform rheotaxis efficiently in flow and reveal that commonly used lesion methods have unique effects on behavior.
- Kyle Beattie
Policy makers and mainstream news anchors have promised the public that the COVID-19 vaccine rollout worldwide would reduce symptoms, and thereby cases and deaths associated with COVID-19. While this vaccine rollout is still in progress, there is a large amount of public data available that permits an analysis of the effect of the vaccine rollout on COVID-19 related cases and deaths. Has this public policy treatment produced the desired effect? One manner to respond to this question can begin by implementing a Bayesian causal analysis comparing both pre- and post-treatment periods. This study analyzed publicly available COVID-19 data from OWID utilizing the R package CausalImpact to determine the causal effect of the administration of vaccines on two dependent variables that have been measured cumulatively throughout the pandemic: total deaths per million (y1) and total cases per million (y2). After eliminating all results from countries with p > 0.05, there were 128 countries for y1 and 103 countries for y2 to analyze in this fashion, comprising 145 unique countries in total (avg. p < 0.004). Results indicate that the treatment (vaccine administration) has a strong and statistically significant propensity to causally increase the values in either y1 or y2 over and above what would have been expected with no treatment. y1 showed an increase/decrease ratio of (+115/-13), which means 89.84% of statistically significant countries showed an increase in total deaths per million associated with COVID-19 due directly to the causal impact of treatment initiation. y2 showed an increase/decrease ratio of (+105/-16) which means 86.78% of statistically significant countries showed an increase in total cases per million of COVID-19 due directly to the causal impact of treatment initiation. Causal impacts of the treatment on y1 range from -19% to +19015% with an average causal impact of +463.13%.
Causal impacts of the treatment on y2 range from -46% to +12240% with an average causal impact of +260.88%. Hypothesis 1 Null can be rejected for a large majority of countries. This study subsequently performed correlational analyses on the causal impact results, whose effect variables can be represented as y1.E and y2.E respectively, with the independent numeric variables of: days elapsed since vaccine rollout began (n1), total vaccination doses per hundred (n2), total vaccine brands/types in use (n3) and the independent categorical variables continent (c1), country (c2), vaccine variety (c3). All categorical variables showed statistically significant (avg. p: < 0.001) positive Wilcoxon signed rank values (y1.E V:[c1 3.04; c2: 8.35; c3: 7.22] and y2.E V:[c1 3.04; c2: 8.33; c3: 7.19]). This demonstrates that the distribution of y1.E and y2.E was non-uniform among categories. The Spearman correlation between n2 and y2.E was the only numerical variable that showed statistically significant results (y2.E ~ n2: rho: 0.34 CI95%[0.14, 0.51], p: 4.91e-04). This low positive correlation signifies that countries with higher vaccination rates do not have lower values for y2.E, slightly the opposite in fact. Still, the specifics of the reasons behind these differences between countries, continents, and vaccine types are inconclusive and should be studied further as more data become available. Hypothesis 2 Null can be rejected for c1, c2, c3 and n2 and cannot be rejected for n1 and n3. The statistically significant and overwhelmingly positive causal impact after vaccine deployment on the dependent variables total deaths and total cases per million should be highly worrisome for policy makers. They indicate a marked increase in both COVID-19 related cases and death due directly to a vaccine deployment that was originally sold to the public as the "key to gain back our freedoms."
The effect of vaccines on total cases per million and its low positive association with total vaccinations per hundred signifies a limited impact of vaccines on lowering COVID-19 associated cases. These results should encourage local policy makers to make policy decisions based on data, not narrative, and based on local conditions, not global or national mandates. These results should also encourage policy makers to begin looking for other avenues out of the pandemic aside from mass vaccination campaigns. Some variables that could be included in future analyses might include vaccine lot by country, the degree of prevalence of previous antibodies against SARS-CoV or SARS-CoV-2 in the population before vaccine administration begins, and the Causal Impact of ivermectin on the same variables used in this study.
- Achim Zeileis
- Friedrich Leisch
- Kurt Hornik
- Christian Kleiber
This paper introduces ideas and methods for testing for structural change in linear regression models and presents how these have been realized in an R package called strucchange. It features tests from the generalized fluctuation test framework as well as from the F test (Chow test) framework. Extending standard significance tests, it contains methods to fit, plot and test empirical fluctuation processes (like CUSUM, MOSUM and estimates-based processes) on the one hand and to compute, plot and test sequences of F statistics with the supF, aveF and expF test on the other. Thus, it makes powerful tools available to display information about structural changes in regression relationships and to assess their significance. Furthermore, it is described how incoming data can be monitored online. Keywords: structural change, CUSUM, MOSUM, recursive estimates, moving estimates, online monitoring, R, S.
- Achim Zeileis
The implementation of a recently suggested class of structural change tests, which test for parameter instability in general parametric models, in the R language for statistical computing is described: Focus is given to the question how the conceptual tools can be translated into computational tools that reflect the properties and flexibility of the underlying econometric methodology while being numerically reliable and easy to use. More precisely, the class of generalized M-fluctuation tests is implemented in the package strucchange providing easily extensible functions for computing empirical fluctuation processes and automatic tabulation of critical values for a functional capturing excessive fluctuations. Traditional significance tests are supplemented by graphical methods which do not only visualize the result of the testing procedure but also convey information about the nature and timing of the structural change and which component of the parametric model is affected by it.
Furthermore, "yearmon" indexes can easily be coerced to "POSIXct" such that the series could be exported as a "its" or "irts" series.
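The coercion mentioned above can be sketched in a few lines of R. This is a minimal illustration under stated assumptions, not the authors' own code: the series values are made up, the index is converted via "Date" to "POSIXct", and UTC is assumed when formatting to avoid time zone shifts. The export to "its" or "irts" itself is not shown.

```r
library(zoo)

# Hypothetical monthly series with a "yearmon" index (values are made up).
z <- zoo(c(1.2, 0.7, 1.9), as.yearmon(2004 + 0:2 / 12))

# Coerce the "yearmon" index to "POSIXct" by going through "Date";
# each month maps to the first day of that month.
idx <- as.POSIXct(as.Date(index(z)))

# Rebuild the series on the new index; the data are unchanged.
z_ct <- zoo(coredata(z), idx)

# Inspect the converted index (formatted in UTC by assumption).
print(format(index(z_ct), "%Y-%m-%d", tz = "UTC"))
```

A series indexed this way carries a time index class that packages built around "POSIXct" can work with.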
Giles Heywood. its: Irregular Time Series. Portfolio & Risk Advisory Group and Commerzbank Securities, 2004. R package version 1.0.4.
R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2005. URL http://www.R-project.org/. ISBN 3-900051-00-3.
Posted by: bagsnearme.blogspot.com
Source: https://www.researchgate.net/publication/5142903_zoo_S3_Infrastructure_for_Regular_and_Irregular_Time_Series