
arXiv:math/0505527v1 [math.ST] 25 May 2005

zoo: An S3 Class and Methods for Indexed Totally Ordered Observations

Achim Zeileis
Wirtschaftsuniversität Wien

Gabor Grothendieck

Abstract

A previous version of this introduction to the R package zoo has been published as ? in the Journal of Statistical Software.

zoo is an R package providing an S3 class with methods for indexed totally ordered observations, such as discrete irregular time series. Its key design goals are independence of a particular index/time/date class and consistency with base R and the "ts" class for regular time series. This paper describes how these are achieved within zoo and provides several illustrations of the available methods for "zoo" objects which include plotting, merging and binding, several mathematical operations, extracting and replacing data and index, coercion and NA handling. A subclass "zooreg" embeds regular time series into the "zoo" framework and thus bridges the gap between regular and irregular time series classes in R.

Keywords: totally ordered observations, irregular time series, regular time series, S3, R.

1. Introduction

The R system for statistical computing (R Development Core Team 2005, http://www.R-project.org/) ships with a class for regularly spaced time series, "ts" in package stats, but has no native class for irregularly spaced time series. With the increased interest in computational finance with R over the last years, several implementations of classes for irregular time series have emerged which are aimed particularly at finance applications. These include the S3 classes "timeSeries" in package fCalendar from the Rmetrics bundle (Wuertz 2005) and "irts" in package tseries (Trapletti 2005) and the S4 class "its" in package its (Heywood 2004). With these packages available, why would anybody want yet another package providing infrastructure for irregular time series? The above mentioned implementations have in common that they are restricted to a particular class for the time scale: the former implementation comes with its own time class "timeDate" built on top of the "POSIXt" classes available in base R whereas the latter two use "POSIXct" directly. And this was the starting point for the zoo project: the first author of the present paper needed more general support for ordered observations, independent of a particular index class, for the package strucchange (Zeileis, Leisch, Hornik, and Kleiber 2002). Hence, the package was called zoo which stands for Z's ordered observations. Since the first release, a major part of the additions to zoo were provided by the second author of this paper, so that the name of the package does not really reflect the authorship anymore. Nevertheless, independence of a particular index class remained the most important design goal. While the package evolved to its current status, a second key design goal became more and more clear: to provide methods to standard generic functions for the "zoo" class that are similar to those for the "ts" class (and base R in general) such that the usage of zoo is very intuitive because few additional commands have to be learned. This paper describes how these design goals are implemented in zoo. The resulting package provides the "zoo" class which offers an extensive (and still growing) set of standard and new methods for working with indexed observations and 'talks' to the classes "ts", "its", "irts" and "timeSeries". It also bridges the gap between regular and irregular time series by providing coercion with (virtually) no loss of information between "ts" and "zoo". With these tools zoo provides the basic infrastructure for working with indexed totally ordered observations and the package can be either employed by users directly or can be a basic ingredient on top of which other more specialized applications can be built.

The remainder of the paper is organized as follows: Section 2 explains how "zoo" objects are created and illustrates how the corresponding methods for plotting, merging and binding, several mathematical operations, extracting and replacing data and index, coercion and NA handling can be used. Section 3 outlines how other packages can build on this basic infrastructure. Section 4 gives a few summarizing remarks and an outlook on future developments. Finally, an appendix provides a reference card that gives an overview of the functionality contained in zoo.

2. The class "zoo" and its methods

This section describes how "zoo" series can be created and subsequently manipulated, visualized, combined or coerced to other classes. In Section 2.1, the general class "zoo" for totally ordered series is described. Subsequently, in Section 2.2, the subclass "zooreg" for regular "zoo" series, i.e., series which have an index with a specified frequency, is discussed. The methods illustrated in the remainder of the section are mostly the same for both "zoo" and "zooreg" objects and hence do not have to be discussed separately. The few differences in merging and binding are briefly highlighted in Section 2.4.

2.1. Creation of "zoo" objects

The simple idea for the creation of "zoo" objects is to have some vector or matrix of observations

x which are totally ordered by some index vector. In time series applications, this index is a

measure of time but every other numeric, character or even more abstract vector that provides

a total ordering of the observations is also suitable. Objects of class "zoo" are created by the

function

zoo(x, order.by)

where x is the vector or matrix of observations¹ and order.by is the index by which the observations should be ordered. It has to be of the same length as NROW(x), i.e., either the same length as x for vectors or the same number of rows for matrices.² The "zoo" object created is essentially the vector/matrix as before but has an additional "index" attribute in which the index is stored.³

Both the observations in the vector/matrix x and the index order.by can, in principle, be of arbitrary classes. However, most of the following methods (plotting, aggregating, mathematical operations) for "zoo" objects are typically only useful for numeric observations x. Special effort in the design was put into independence from a particular class for the index vector. In zoo, it is assumed that combination c(), querying the length(), value matching MATCH(), subsetting [, and, of course, ordering ORDER() work when applied to the index. In addition, an as.character() method might improve printed output⁴ and as.numeric() could be used for computing distances between indexes, e.g., in interpolation. Both methods are not necessary for working with "zoo" objects but could be used if available. All these methods are available, e.g., for standard numeric and character vectors and for vectors of classes "Date", "POSIXct" or "times" from package chron, but not for the class "timeDate" in fCalendar. In the last case, the solution is to provide methods for the above mentioned functions so that indexing "zoo" objects with "timeDate" vectors works (see Section 3.3 for an example). To achieve this independence of the index class, new generic functions for ordering (ORDER()) and value matching (MATCH()) are introduced as the corresponding base functions order() and match() are non-generic. The default methods simply call the corresponding base functions, i.e., no new method needs to be introduced for a particular index class if the non-generic functions order() and match() work for this class.

¹ In principle, more general objects can be indexed, but currently zoo does not support this. Development plans are that zoo should eventually support indexed factors, data frames and lists.

² The only case where this restriction is not imposed is for zero-length vectors, i.e., vectors that only have an index but no data.

³ There is some limited support for indexed factors available in which case the "zoo" object also has an attribute "oclass" with the original class of x. This feature is still under development and might change in future versions.

⁴ If an as.character() method is already defined, but does not give the desired output for printing, then an index2char() method can be defined. This is a generic convenience function used for creating character representations of the index vector and it defaults to using as.character().
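The independence of the index class described above can be sketched with a few toy series. This is a minimal illustration, not taken from the paper: the object names obs, z_chr, z_num and z_date are made up here, and the same three observations are indexed by a character, a numeric and a "Date" vector.

```r
library(zoo)

obs <- c(3, 1, 2)

# Character index: the observations are sorted by the index on creation
z_chr <- zoo(obs, order.by = c("c", "a", "b"))

# Numeric index
z_num <- zoo(obs, order.by = c(30, 10, 20))

# "Date" index
z_date <- zoo(obs, order.by = as.Date(c("2004-03-01", "2004-01-01", "2004-02-01")))

coredata(z_chr)  # data reordered to follow the sorted index
index(z_date)    # the stored index, still of class "Date"
```

In all three cases the default ORDER() and MATCH() methods suffice, because order() and match() already work for these index classes.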

To illustrate the usage of zoo(), we first load the package and set the random seed to make the examples in this paper exactly reproducible.

R> library(zoo)

R> set.seed(1071)

Then, we create two vectors z1 and z2 with "POSIXct" indexes, one with random observations

R> z1.index <- ISOdatetime(2004, rep(1:2, 5), sample(28, 10), 0,

+ 0, 0)

R> z1.data <- rnorm(10)

R> z1 <- zoo(z1.data, z1.index)

and one with a sine wave

R> z2.index <- as.POSIXct(paste(2004, rep(1:2, 5), sample(1:28,

+ 10), sep = "-"))

R> z2.data <- sin(2 * 1:10/pi)

R> z2 <- zoo(z2.data, z2.index)

Furthermore, we create a matrix Z with random observations and a "Date" index

R> Z.index <- as.Date(sample(12450:12500, 10))

R> Z.data <- matrix(rnorm(30), ncol = 3)

R> colnames(Z.data) <- c("Aa", "Bb", "Cc")

R> Z <- zoo(Z.data, Z.index)

In the examples above, the generation of indexes looks a bit awkward due to the fact that the indexes need to be randomly generated (and there are no special functions for random indexes because these are rarely needed in practice). In "real world" applications, the indexes are typically part of the raw data set read into R so the code would be even simpler. See Section 3 for such examples.⁵

⁵ Note that in the code above a new as.Date method, provided in zoo, is used to convert days since 1970-01-01 to class "Date". See the respective help page for more details.

Methods to several standard generic functions are available for "zoo" objects, such as print, summary, str, head, tail and [ (subsetting), a few of which are illustrated in the following.

There are three printing code styles for "zoo" objects: vectors are by default printed in "horizontal" style

R> z1

2004-01-05 2004-01-14 2004-01-19 2004-01-25 2004-01-27 2004-02-07

0.74675994 0.02107873 -0.29823529 0.68625772 1.94078850 1.27384445

2004-02-12 2004-02-16 2004-02-20 2004-02-24

0.22170438 -2.07607585 -1.78439244 -0.19533304

R> z1[3:7]

2004-01-19 2004-01-25 2004-01-27 2004-02-07 2004-02-12

-0.2982353 0.6862577 1.9407885 1.2738445 0.2217044

and matrices in "vertical" style

R> Z

Aa Bb Cc

2004-02-02 1.25543390 0.68157316 -0.63292049

2004-02-08 -1.49458326 1.32341223 -1.49442269

2004-02-09 -1.87462247 -0.87329289 0.62733971

2004-02-21 -0.14538608 0.45234903 -0.14597401

2004-02-22 0.22542418 0.53838938 0.23136133

2004-02-29 1.20695518 0.31814222 -0.01129202

2004-03-05 -1.20861025 1.42379785 -0.81614483

2004-03-10 -0.11039563 1.34774254 0.95522468

2004-03-14 0.84202385 -2.73842019 0.23150695

2004-03-20 -0.19019104 0.12308872 -1.51862157

R> Z[1:3, 2:3]

Bb Cc

2004-02-02 0.6815732 -0.6329205

2004-02-08 1.3234122 -1.4944227

2004-02-09 -0.8732929 0.6273397

Additionally, there is a "plain" style which simply first prints the data and then the index.

Above, we have illustrated that "zoo" series can be indexed like vectors or matrices respectively, i.e., with integers corresponding to their observation number (and column number). But for indexed observations, one would obviously also like to be able to index with the index class. This is also available in [ which only uses vector/matrix-type subsetting if its first argument is of class "numeric", "integer" or "logical".

R> z1[ISOdatetime(2004, 1, c(14, 25), 0, 0, 0)]

2004-01-14 2004-01-25

0.02107873 0.68625772

If the index class happens to be "numeric", the index has to be either insulated in I() like z[I(i)] or the window() method can be used (see Section 2.6).
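The difference between positional subsetting and subsetting by a numeric index can be sketched as follows. This is a toy example under the assumption that the series z has the numeric index 2, 3, 5; the names z, by_position, by_index and by_window are illustrative only.

```r
library(zoo)

# Three observations with numeric index 2, 3, 5
z <- zoo(c(10, 20, 30), order.by = c(2, 3, 5))

by_position <- z[2]     # plain numeric: the second observation (index 3)
by_index    <- z[I(2)]  # insulated in I(): the observation whose index equals 2
by_window   <- window(z, start = 2, end = 3)  # observations with index in [2, 3]
```

Here z[2] and z[I(2)] select different observations, which is why the insulation (or window()) is needed when the index itself is numeric.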

Summaries and most other methods for "zoo" objects are carried out column wise, reflecting the rectangular structure. In addition, a summary of the index is provided.

R> summary(z1)

Index z1

Min. :2004-01-05 00:00:00 Min. :-2.07608

1st Qu.:2004-01-20 12:00:00 1st Qu.:-0.27251

Median :2004-02-01 12:00:00 Median : 0.12139

Mean :2004-02-01 09:36:00 Mean : 0.05364

3rd Qu.:2004-02-15 00:00:00 3rd Qu.: 0.73163

Max. :2004-02-24 00:00:00 Max. : 1.94079

R> summary(Z)

Index Aa Bb Cc

Min. :2004-02-02 Min. :-1.8746 Min. :-2.7384 Min. :-1.51862

1st Qu.:2004-02-12 1st Qu.:-0.9540 1st Qu.: 0.1719 1st Qu.:-0.77034

Median :2004-02-25 Median :-0.1279 Median : 0.4954 Median :-0.07863

Mean :2004-02-25 Mean :-0.1494 Mean : 0.2597 Mean :-0.25739

3rd Qu.:2004-03-08 3rd Qu.: 0.6879 3rd Qu.: 1.1630 3rd Qu.: 0.23147

Max. :2004-03-20 Max. : 1.2554 Max. : 1.4238 Max. : 0.95522

2.2. Creation of "zooreg" objects

Strictly regular series are series in which the distance between the indexes of every two adjacent observations is the same. Such series can also be described by their frequency, i.e., the reciprocal value of the distance between two observations. As "zoo" can be used to store series with arbitrary type of index, it can, of course, also be used to store series with regular indexes. So why should this case be given special attention, in particular as there is already the "ts" class devoted entirely to regular series? There are two reasons: First, to be able to convert back and forth between "ts" and "zoo", the frequency of a certain series needs to be stored on the "zoo" side. Second, "ts" is limited to strictly regular series and the regularity is lost if some internal observations are omitted. Series that can be created by omitting some internal observations from strictly regular series will in the following be referred to as being (weakly) regular. Therefore, a class that bridges the gap between irregular and strictly regular series is needed and "zooreg" fills this gap. Objects of class "zooreg" inherit from class "zoo" but have an additional attribute "frequency" in which the frequency of the series is stored. Therefore, they can be employed to represent both strictly and weakly regular series.

To create a "zooreg" object, either the command zoo() can be used or the command zooreg().

zoo(x, order.by, frequency)

zooreg(data, start, end, frequency, deltat, ts.eps, order.by)

If zoo() is called as in the previous section but with an additional frequency argument, it is checked whether frequency complies with the index order.by: if it does, an object of class "zooreg" inheriting from "zoo" is returned. The command zooreg() takes mostly the same arguments as ts().⁶ In both cases, the index class is more restricted than in the plain "zoo" case. The index must be of a class which can be coerced to "numeric" (for checking its regularity) and when converted to numeric the index must be expressible as multiples of 1/frequency. Furthermore, adding/subtracting a numeric to/from an observation of the index class should return the correct value of the index class again, i.e., group generic functions Ops should be defined.⁷

The following calls yield equivalent series

R> zr1 <- zooreg(sin(1:9), start = 2000, frequency = 4)

R> zr2 <- zoo(sin(1:9), seq(2000, 2002, by = 1/4), 4)

R> zr1

2000(1) 2000(2) 2000(3) 2000(4) 2001(1) 2001(2) 2001(3)

0.8414710 0.9092974 0.1411200 -0.7568025 -0.9589243 -0.2794155 0.6569866

2001(4) 2002(1)

0.9893582 0.4121185

R> zr2

2000(1) 2000(2) 2000(3) 2000(4) 2001(1) 2001(2) 2001(3)

0.8414710 0.9092974 0.1411200 -0.7568025 -0.9589243 -0.2794155 0.6569866

2001(4) 2002(1)

0.9893582 0.4121185

to which methods to standard generic functions for regular series can be applied, such as frequency, deltat, cycle.

⁶ zoo(x, order.by, frequency) is called only if order.by is specified in the zooreg() call.

⁷ An application of non-numeric indexes for regular series are the classes "yearmon" and "yearqtr" which are designed for monthly and quarterly series respectively and are discussed in Section 3.4.

As stated above, the advantage of "zooreg" series is that they remain regular even if an internal

observation is dropped:

R> zr1 <- zr1[-c(3, 5)]

R> zr1

2000(1) 2000(2) 2000(4) 2001(2) 2001(3) 2001(4) 2002(1)

0.8414710 0.9092974 -0.7568025 -0.2794155 0.6569866 0.9893582 0.4121185

R> class(zr1)

[1] "zooreg" "zoo"

R> frequency(zr1)

[1] 4

This facilitates NA handling significantly compared to "ts" and makes "zooreg" a much more attractive data type, e.g., for time series regression.

zooreg() can also deal with non-numeric indexes provided that adding "numeric" observations to the index class preserves the class and does not coerce to "numeric".

R> zooreg(1:5, start = as.Date("2005-01-01"))

2005-01-01 2005-01-02 2005-01-03 2005-01-04 2005-01-05

1 2 3 4 5

To check whether a certain series is (strictly) regular, the new generic function is.regular(x, strict = FALSE) can be used:

R> is.regular(zr1)

[1] TRUE

R> is.regular(zr1, strict = TRUE)

[1] FALSE

This function (and also frequency, deltat and cycle) also works for "zoo" objects if the regularity can still be inferred from the data:

R> zr1 <- as.zoo(zr1)

R> zr1

2000 2000.25 2000.75 2001.25 2001.5 2001.75 2002

0.8414710 0.9092974 -0.7568025 -0.2794155 0.6569866 0.9893582 0.4121185

R> class(zr1)

[1] "zoo"

R> is.regular(zr1)

[1] TRUE

R> frequency(zr1)

[1] 4

Of course, inferring the underlying regularity is not always reliable and it is safer to store a regular series as a "zooreg" object if it is intended to be a regular series.

If a weakly regular series is coerced to "ts" the missing observations are filled with NAs (see also Section 2.8). For strictly regular series with numeric index, the class can be switched between "zoo" and "ts" without loss of information.

R> as.ts(zr1)

Qtr1 Qtr2 Qtr3 Qtr4

2000 0.8414710 0.9092974 NA -0.7568025

2001 NA -0.2794155 0.6569866 0.9893582

2002 0.4121185

R> identical(zr2, as.zoo(as.ts(zr2)))

[1] TRUE

This enables direct use of functions such as acf, arima, stl etc. on "zooreg" objects as these

methods coerce to "ts" first. The result only has to be coerced back to "zoo", if appropriate.
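The round trip through "ts" can be sketched as follows. This is a minimal illustration with made-up object names (zr, zr_acf, back); the coercion is written out explicitly with as.ts() rather than relying on the called function to do it.

```r
library(zoo)

# A strictly regular quarterly series
zr <- zooreg(sin(1:20), start = 2000, frequency = 4)

# Explicit coercion to "ts" before calling a "ts"-based function
zr_acf <- acf(as.ts(zr), plot = FALSE)

# Coercing a "ts" object back yields a regular "zoo" series again
back <- as.zoo(as.ts(zr))
class(back)
frequency(back)
```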

2.3. Plotting

The plot method for "zoo" objects, in particular for multivariate "zoo" series, is based on the corresponding method for (multivariate) regular time series. It relies on plot and lines methods being available for the index class which can plot the index against the observations.

By default the plot method creates a panel for each series

R> plot(Z)

but can also display all series in a single panel

R> plot(Z, plot.type = "single", col = 2:4)

In both cases additional graphical parameters like color col, plotting character pch and line type lty can be expanded to the number of series. But the plot method for "zoo" objects offers some more flexibility in the specification of graphical parameters as in

R> plot(Z, type = "b", lty = 1:3, pch = list(Aa = 1:5, Bb = 2, Cc = 4),

+ col = list(Bb = 2, 4))

The argument lty behaves as before and sets every series in another line type. The pch argument is a named list that assigns to each series a different vector of plotting characters each of which is expanded to the number of observations. Such a list does not necessarily have to include the names of all series, but can also specify a subset. For the remaining series the default parameter is then used which can again be changed: e.g., in the above example the col argument is set to display the series "Bb" in red and all remaining series in blue. The results of the multiple panel plots are depicted in Figure 2 and the single panel plot in Figure 1.

2.4. Merging and binding

As for many rectangular data formats in R, there are both methods for combining the rows and columns of "zoo" objects respectively. For the rbind method the number of columns of the combined objects has to be identical and the indexes may not overlap.

R> rbind(z1[5:10], z1[2:3])

2004-01-14 2004-01-19 2004-01-27 2004-02-07 2004-02-12 2004-02-16

0.02107873 -0.29823529 1.94078850 1.27384445 0.22170438 -2.07607585

2004-02-20 2004-02-24

-1.78439244 -0.19533304

The c method simply calls rbind and hence behaves in the same way.

The cbind method by default combines the columns by the union of the indexes and fills the created gaps by NAs.

R> cbind(z1, z2)

z1 z2

2004-01-03 NA 0.94306673

2004-01-05 0.74675994 -0.04149429

2004-01-14 0.02107873 NA

2004-01-17 NA 0.59448077

2004-01-19 -0.29823529 -0.52575918

2004-01-24 NA -0.96739776

2004-01-25 0.68625772 NA

2004-01-27 1.94078850 NA

2004-02-07 1.27384445 NA

2004-02-08 NA 0.95605566

2004-02-12 0.22170438 -0.62733473

2004-02-13 NA -0.92845336

2004-02-16 -2.07607585 NA

2004-02-20 -1.78439244 NA

2004-02-24 -0.19533304 NA

2004-02-25 NA 0.56060280

2004-02-26 NA 0.08291711

Figure 1: Example of a single panel plot (series Z plotted against Index)

Figure 2: Examples of multiple panel plots (series Z plotted against Index)

In fact, the cbind method is synonymous with the merge method⁸ except that the latter provides additional arguments which allow for combining the columns by the intersection of the indexes using the argument all = FALSE

R> merge(z1, z2, all = FALSE)

z1 z2

2004-01-05 0.74675994 -0.04149429

2004-01-19 -0.29823529 -0.52575918

2004-02-12 0.22170438 -0.62733473

Additionally, the filling pattern can be changed in merge, the naming of the columns can be modified and the return class of the result can be specified. In the case of merging of objects with different index classes, R gives a warning and tries to coerce the indexes. Merging objects with different index classes is generally discouraged; if it is used nevertheless, it is the responsibility of the user to ensure that the result is as intended. If at least one of the merged/bound objects was a "zooreg" object, then merge tries to return a "zooreg" object. This is done by assessing whether there is a common maximal frequency and by checking whether the resulting index is still (weakly) regular.

If non-"zoo" objects are included in merging, then merge gives plain vectors/factors/matrices the index of the first argument (if it is of the same length). Scalars are always added for the full index without missing values.

R> merge(z1, pi, 1:10)

z1 pi 1:10

2004-01-05 0.74675994 3.14159265 1.00000000

2004-01-14 0.02107873 3.14159265 2.00000000

2004-01-19 -0.29823529 3.14159265 3.00000000

2004-01-25 0.68625772 3.14159265 4.00000000

2004-01-27 1.94078850 3.14159265 5.00000000

2004-02-07 1.27384445 3.14159265 6.00000000

2004-02-12 0.22170438 3.14159265 7.00000000

2004-02-16 -2.07607585 3.14159265 8.00000000

2004-02-20 -1.78439244 3.14159265 9.00000000

2004-02-24 -0.19533304 3.14159265 10.00000000
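The merge options discussed in this section (union versus intersection of the indexes and the filling pattern) can be sketched on two short series. The series a and b and the result names are made up for this illustration; the arguments all and fill are those of the merge method for "zoo" objects.

```r
library(zoo)

# Two short series whose "Date" indexes overlap on two days
a <- zoo(1:3, as.Date("2004-01-01") + 0:2)
b <- zoo(4:6, as.Date("2004-01-02") + 0:2)

m_union <- merge(a, b)                        # union of indexes, gaps are NA
m_inner <- merge(a, b, all = FALSE)           # intersection of indexes
m_fill  <- merge(a, b, fill = 0)              # union, gaps filled with 0
m_left  <- merge(a, b, all = c(TRUE, FALSE))  # keep every index of a only
```

The vector form of all acts per object, so the last call behaves like a left join on the index of a.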

⁸ Note that in some situations the column naming in the resulting object is somewhat problematic in the cbind method and the merge method might provide better formatting of the column names.

Another function which performs operations along a subset of indexes is aggregate, which is discussed in this section although it does not combine several objects. Using the aggregate method, "zoo" objects are split into subsets along a coarser index grid, summary statistics are computed for each and then the reduced object is returned. In the following example, first a function is set up which returns for a given "Date" value the corresponding first of the month. This function is then used to compute the coarser grid for the aggregate call: in the first example, the grouping is computed explicitly by firstofmonth(Z.index) and the mean of the observations in the month is returned; in the second example, only the function that computes the grouping (when applied to index(Z)) is supplied and the first observation is used for aggregation.

R> firstofmonth <- function(x) as.Date(sub("..$", "01", format(x)))

R> aggregate(Z, firstofmonth(Z.index), mean)

Aa Bb Cc

2004-02-01 0.53820841 0.04508597 -0.12412352

2004-03-01 -1.18080051 0.58156655 -0.45730045

R> aggregate(Z, firstofmonth, head, 1)

Aa Bb Cc

2004-02-01 1.2554339 0.6815732 -0.6329205

2004-03-01 -1.4945833 1.3234122 -1.4944227
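The two ways of specifying the grouping can also be sketched on a small series with known values. This is a minimal illustration, assuming zoo's as.yearmon() as the grouping function in place of the firstofmonth() helper above; the series z and the result names are made up.

```r
library(zoo)

z <- zoo(1:6, as.Date(c("2004-01-05", "2004-01-20", "2004-02-03",
                        "2004-02-10", "2004-02-25", "2004-03-01")))

# Grouping computed explicitly: one group label per observation
by_month <- aggregate(z, as.yearmon(index(z)), mean)

# Grouping given as a function that is applied to index(z)
by_month_fun <- aggregate(z, as.yearmon, mean)

coredata(by_month)  # monthly means: 1.5, 4, 6
```

Both calls give the same result; the second form avoids repeating the index.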

2.5. Mathematical operations

To allow for standard mathematical operations among "zoo" objects, zoo extends group generic functions Ops. These perform the operations only for the intersection of the indexes of the objects. As an example, the summation and logical comparison with < of z1 and z2 yield

R> z1 + z2

2004-01-05 2004-01-19 2004-02-12

0.7052657 -0.8239945 -0.4056304

R> z1 < z2

2004-01-05 2004-01-19 2004-02-12

FALSE FALSE FALSE
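The intersection rule can be checked by hand on two short series. In this sketch the series a and b and the result names are made up; the last two lines show one possible way, not taken from the paper, of operating on the union of the indexes instead by merging first.

```r
library(zoo)

a <- zoo(c(1, 2, 3), as.Date("2004-01-01") + 0:2)
b <- zoo(c(4, 5, 6), as.Date("2004-01-02") + 0:2)

s   <- a + b  # defined only on 2004-01-02 and 2004-01-03: 6, 8
cmp <- a < b  # TRUE, TRUE on the same two dates

# Union of the indexes: merge with a fill value, then add the columns
m <- merge(a, b, fill = 0)
s_union <- m[, "a"] + m[, "b"]  # 1, 6, 8, 6
```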

Additionally, methods are available for transposing "zoo" objects with t (which coerces to a matrix first) and for computing cumulative quantities such as cumsum, cumprod, cummin and cummax, which are all applied column wise.

R> cumsum(Z)

Aa Bb Cc

2004-02-02 1.2554339 0.6815732 -0.6329205

2004-02-08 -0.2391494 2.0049854 -2.1273432

2004-02-09 -2.1137718 1.1316925 -1.5000035

2004-02-21 -2.2591579 1.5840415 -1.6459775

2004-02-22 -2.0337337 2.1224309 -1.4146162

2004-02-29 -0.8267785 2.4405731 -1.4259082

2004-03-05 -2.0353888 3.8643710 -2.2420530

2004-03-10 -2.1457844 5.2121135 -1.2868283

2004-03-14 -1.3037606 2.4736933 -1.0553214

2004-03-20 -1.4939516 2.5967820 -2.5739429

2.6. Extracting and replacing the data and the index

zoo provides several generic functions and methods to work on the data contained in a "zoo" object, the index (or time) attribute associated to it, and on both data and index.

The data stored in "zoo" objects can be extracted by coredata which strips off all "zoo"-specific attributes and it can be replaced using coredata<-. Both are new generic functions⁹ with methods for "zoo" objects as illustrated in the following example.

R> coredata(z1)

[1] 0.74675994 0.02107873 -0.29823529 0.68625772 1.94078850 1.27384445

[7] 0.22170438 -2.07607585 -1.78439244 -0.19533304

R> coredata(z1) <- 1:10

R> z1

2004-01-05 2004-01-14 2004-01-19 2004-01-25 2004-01-27 2004-02-07 2004-02-12

1 2 3 4 5 6 7

2004-02-16 2004-02-20 2004-02-24

8 9 10

The index associated with a "zoo" object can be extracted by index and modified by index<-. As the interpretation of the index as "time" in time series applications is natural, there are also synonymous methods time and time<-. Hence, the commands index(z2) and time(z2) return equivalent results.

R> index(z2)

[1] "2004-01-03 Eastern Standard Time" "2004-01-05 Eastern Standard Time"

[3] "2004-01-17 Eastern Standard Time" "2004-01-19 Eastern Standard Time"

[5] "2004-01-24 Eastern Standard Time" "2004-02-08 Eastern Standard Time"

[7] "2004-02-12 Eastern Standard Time" "2004-02-13 Eastern Standard Time"

[9] "2004-02-25 Eastern Standard Time" "2004-02-26 Eastern Standard Time"

The index scale of z2 can be changed to that of z1 by

R> index(z2) <- index(z1)

R> z2

2004-01-05 2004-01-14 2004-01-19 2004-01-25 2004-01-27 2004-02-07

0.94306673 -0.04149429 0.59448077 -0.52575918 -0.96739776 0.95605566

2004-02-12 2004-02-16 2004-02-20 2004-02-24

-0.62733473 -0.92845336 0.56060280 0.08291711

The start and the end of the index/time vector can be queried by start and end:

R> start(z1)

[1] "2004-01-05 Eastern Standard Time"

R> end(z1)

[1] "2004-02-24 Eastern Standard Time"

⁹ The coredata functionality is similar in spirit to the core function in its and value in tseries. However, the focus of those functions is somewhat narrower and we try to provide more general purpose generic functions. See the respective manual page for more details.

To work on both data and index/time, zoo provides window and window<- methods for "zoo"

objects. In both cases the window is specified by

window(x, index, start, end)

where x is the "zoo" object, index is a set of indexes to be selected (by default the full index of x) and start and end can be used to restrict the index set.

R> window(Z, start = as.Date("2004-03-01"))

Aa Bb Cc

2004-03-05 -1.2086102 1.4237978 -0.8161448

2004-03-10 -0.1103956 1.3477425 0.9552247

2004-03-14 0.8420238 -2.7384202 0.2315069

2004-03-20 -0.1901910 0.1230887 -1.5186216

R> window(Z, index = index(Z)[5:8], end = as.Date("2004-03-01"))

Aa Bb Cc

2004-02-22 0.22542418 0.53838938 0.23136133

2004-02-29 1.20695518 0.31814222 -0.01129202

The first example selects all observations starting from 2004-03-01 whereas the second selects, from the 5th to 8th observations, those up to 2004-03-01.

The same syntax can be used for the corresponding replacement function.

R> window(z1, end = as.POSIXct("2004-02-01")) <- 9:5

R> z1

2004-01-05 2004-01-14 2004-01-19 2004-01-25 2004-01-27 2004-02-07 2004-02-12

9 8 7 6 5 6 7

2004-02-16 2004-02-20 2004-02-24

8 9 10

Two methods that are standard in time series applications are lag and diff. These are available with the same arguments as the "ts" methods.¹⁰

R> lag(z1, k = -1)

2004-01-14 2004-01-19 2004-01-25 2004-01-27 2004-02-07 2004-02-12 2004-02-16

9 8 7 6 5 6 7

2004-02-20 2004-02-24

8 9

R> merge(z1, lag(z1, k = 1))

z1 lag(z1, k = 1)

2004-01-05 9 8

2004-01-14 8 7

2004-01-19 7 6

2004-01-25 6 5

2004-01-27 5 6

2004-02-07 6 7

2004-02-12 7 8

2004-02-16 8 9

2004-02-20 9 10

2004-02-24 10 NA

¹⁰ diff also has an additional argument that allows for geometric and not only arithmetic differences. Furthermore, note the sign of the lag in lag: by default it is positive and shifts the observations forward; to obtain the more standard backward shift the lag has to be negative.

R> diff(z1)

2004-01-14 2004-01-19 2004-01-25 2004-01-27 2004-02-07 2004-02-12 2004-02-16

-1 -1 -1 -1 1 1 1

2004-02-20 2004-02-24

1 1
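The sign convention of lag and the geometric variant of diff can be verified on a series with known values. This is a small sketch with made-up object names; arithmetic and na.pad are arguments of the diff method for "zoo" objects.

```r
library(zoo)

z <- zoo(c(1, 2, 4, 8), order.by = 1:4)

lagged  <- lag(z, k = -1)               # value at t is the observation from t - 1
d_arith <- diff(z)                      # arithmetic differences: 1, 2, 4
d_geom  <- diff(z, arithmetic = FALSE)  # geometric differences (ratios): 2, 2, 2
d_pad   <- diff(z, na.pad = TRUE)       # keeps the full index, first value is NA
```

As in the paper's example, the lagged series loses the first index value because there is no earlier observation to shift into it.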

2.7. Coercion to and from "zoo"

Coercion to and from "zoo" objects is available for objects of various classes, in particular "ts", "irts" and "its" objects can be coerced to "zoo" and back if the index is of the appropriate class.¹¹ Coercion between "zooreg" and "zoo" is also available and is essentially dropping the "frequency" attribute or trying to add one, respectively.

Furthermore, "zoo" objects can be coerced to vectors, matrices, lists and data frames (the latter

dropping the index/time attribute). A simple example is

R> as.data.frame(Z)

Aa Bb Cc

2004-02-02 1.2554339 0.6815732 -0.63292049

2004-02-08 -1.4945833 1.3234122 -1.49442269

2004-02-09 -1.8746225 -0.8732929 0.62733971

2004-02-21 -0.1453861 0.4523490 -0.14597401

2004-02-22 0.2254242 0.5383894 0.23136133

2004-02-29 1.2069552 0.3181422 -0.01129202

2004-03-05 -1.2086102 1.4237978 -0.81614483

2004-03-10 -0.1103956 1.3477425 0.95522468

2004-03-14 0.8420238 -2.7384202 0.23150695

2004-03-20 -0.1901910 0.1230887 -1.51862157
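A round trip between "ts" and "zoo", together with the data frame coercion, might look as follows. This is a minimal sketch, assuming the zoo package is installed; x_ts, z_irreg and the other names are hypothetical, and the data are made up.

```r
library(zoo)

# A regular annual "ts" series (hypothetical values).
x_ts <- ts(c(10, 20, 30, 40), start = 2000)

# Coerce to "zoo"; a regular series keeps its frequency information.
z <- as.zoo(x_ts)
print(class(z))

# Coerce back to "ts".
back <- as.ts(z)

# An irregular two-column series coerced to a data frame.
z_irreg <- zoo(matrix(1:6, ncol = 2, dimnames = list(NULL, c("Aa", "Bb"))),
               as.Date("2004-02-01") + c(1, 7, 8))
df <- as.data.frame(z_irreg)
print(df)
```

The coercion back to "ts" works here because the index is numeric and regular; for a "Date" index on irregular dates the result would not be a meaningful regular series.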

2.8. NA handling

Four methods for dealing with NAs (missing observations) are applicable to "zoo" objects: na.omit, na.contiguous, na.approx and na.locf. na.omit (or, to be more precise, its default method) returns a "zoo" object with incomplete observations removed. na.contiguous extracts the longest consecutive stretch of non-missing values. Furthermore, new generic functions na.approx and na.locf and corresponding default methods are introduced in zoo. The former replaces NAs by linear interpolation (using the function approx), and the name of the latter stands for last observation carried forward: it replaces missing observations by the most recent non-NA prior to it. Leading NAs, which cannot be replaced by previous observations, are removed in both functions by default.

Footnote 11: Coercion from "zoo" to "irts" is contained in the tseries package.

R> z1[sample(1:10, 3)] <- NA

R> z1

2004-01-05 2004-01-14 2004-01-19 2004-01-25 2004-01-27 2004-02-07 2004-02-12

9 NA 7 6 5 6 NA

2004-02-16 2004-02-20 2004-02-24

8 9 NA

R> na.omit(z1)

2004-01-05 2004-01-19 2004-01-25 2004-01-27 2004-02-07 2004-02-16 2004-02-20

9 7 6 5 6 8 9

R> na.contiguous(z1)

2004-01-19 2004-01-25 2004-01-27 2004-02-07

7 6 5 6

R> na.approx(z1)

2004-01-05 2004-01-14 2004-01-19 2004-01-25 2004-01-27 2004-02-07 2004-02-12

9.000000 7.714286 7.000000 6.000000 5.000000 6.000000 7.111111

2004-02-16 2004-02-20

8.000000 9.000000

R> na.approx(z1, 1:NROW(z1))

2004-01-05 2004-01-14 2004-01-19 2004-01-25 2004-01-27 2004-02-07 2004-02-12

9 8 7 6 5 6 7

2004-02-16 2004-02-20

8 9

R> na.locf(z1)

2004-01-05 2004-01-14 2004-01-19 2004-01-25 2004-01-27 2004-02-07 2004-02-12

9 9 7 6 5 6 6

2004-02-16 2004-02-20 2004-02-24

8 9 9

As the above example illustrates, na.approx uses by default the underlying time scale for interpolation. This can be changed, e.g., to an equidistant spacing, by setting the second argument of na.approx.
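The difference between interpolating on the time scale and on an equidistant scale is easiest to see with deterministic data. The following is a minimal sketch, assuming the zoo package is installed; the series z and the object names are made up for illustration.

```r
library(zoo)

# Hypothetical series with NAs on unevenly spaced dates (day offsets 0, 2, 4, 6, 8, 12, 14).
idx <- as.Date("2004-01-01") + c(0, 2, 4, 6, 8, 12, 14)
z <- zoo(c(2, NA, 6, 8, NA, 12, NA), idx)

dropped <- na.omit(z)          # incomplete observations removed
longest <- na.contiguous(z)    # longest stretch without NAs
by_time <- na.approx(z)        # interpolation weighted by the gaps between dates
by_pos  <- na.approx(z, 1:NROW(z))  # interpolation treating observations as equally spaced
carried <- na.locf(z)          # last observation carried forward

print(by_time)
print(by_pos)
print(carried)
```

The NA on day 8 lies two days after an 8 and four days before a 12, so the time-based interpolation gives 8 + 4 * 2/6, whereas the equidistant version gives the midpoint 10.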

2.9. Rolling functions

A typical task to be performed on ordered observations is to evaluate some function, e.g., computing the mean, in a window of observations that is moved over the full sample period. The resulting statistics are usually referred to interchangeably as rolling, running or moving statistics. In zoo, the generic function rapply is provided along with a "zoo" and a "ts" method. The most important arguments are

rapply(data, width, FUN)

where the function FUN is applied to a rolling window of size width of the observations data. The function rapply currently only evaluates the function for windows of full size width, hence the result has width - 1 fewer observations than the original series. However, it can be specified whether the 'lost' observations should be padded with NAs and whether the result should be left- or right-aligned or centered (default) with respect to the original index.

R> rapply(Z, 5, sd)

Aa Bb Cc

2004-02-09 1.2814876 0.8018950 0.8218959

2004-02-21 1.2658555 0.7891358 0.8025043

2004-02-22 1.2102011 0.8206819 0.5319727

2004-02-29 0.8662296 0.5266261 0.6411751

2004-03-05 0.9363400 1.7011273 0.6356144

2004-03-10 0.9508642 1.6892246 0.9578196

R> rapply(Z, 5, sd, na.pad = TRUE, align = "left")

Aa Bb Cc

2004-02-02 1.2814876 0.8018950 0.8218959

2004-02-08 1.2658555 0.7891358 0.8025043

2004-02-09 1.2102011 0.8206819 0.5319727

2004-02-21 0.8662296 0.5266261 0.6411751

2004-02-22 0.9363400 1.7011273 0.6356144

2004-02-29 0.9508642 1.6892246 0.9578196

2004-03-05 NA NA NA

2004-03-10 NA NA NA

2004-03-14 NA NA NA

2004-03-20 NA NA NA
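The effect of alignment and padding can be reproduced on a small series. The sketch below assumes a current zoo release, in which the rolling generic is named rollapply rather than rapply and padding is requested via fill = NA instead of na.pad = TRUE; the data and object names are made up for illustration.

```r
library(zoo)

# Hypothetical series of six observations on irregular dates.
z <- zoo(c(1, 2, 4, 8, 16, 32), as.Date("2004-02-01") + c(1, 7, 8, 20, 21, 28))

# Centered windows of width 3: the first and last index are lost.
centered <- rollapply(z, width = 3, FUN = mean)

# Left-aligned windows, padded with NA so that the original index is kept.
left_pad <- rollapply(z, width = 3, FUN = mean, fill = NA, align = "left")

print(centered)
print(left_pad)
```

The computed values are the same in both calls; only the index positions to which they are attached differ.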

To improve the performance of rapply(x, k, foo) for some frequently used functions foo, more efficient implementations rollfoo(x, k) are available (and also called by rapply). Currently, these are the generic functions rollmean, rollmedian and rollmax which have methods for "zoo" and "ts" series and a default method for plain vectors.

R> rollmean(z2, 5, na.pad = TRUE)

2004-01-05 2004-01-14 2004-01-19 2004-01-25 2004-01-27

NA NA 0.0005792538 0.0031770388 -0.1139910497

2004-02-07 2004-02-12 2004-02-16 2004-02-20 2004-02-24

-0.4185778750 -0.2013054791 0.0087574946 NA NA
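That the specialized functions agree with the general rolling function can be verified directly. This is a minimal sketch, assuming a current zoo release in which the general function is called rollapply; the series z is made up and uses a plain integer index.

```r
library(zoo)

# Hypothetical series with an integer index.
z <- zoo(c(3, 1, 4, 1, 5, 9, 2, 6), 1:8)

# Specialized rolling functions with window width 3 (centered by default).
r_mean   <- rollmean(z, 3)
r_median <- rollmedian(z, 3)
r_max    <- rollmax(z, 3)

# The same statistics via the general rolling function.
g_mean   <- rollapply(z, 3, mean)
g_median <- rollapply(z, 3, median)
g_max    <- rollapply(z, 3, max)

print(merge(r_mean, r_median, r_max))
```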

3. Combining zoo with other packages

The main purpose of the package zoo is to provide basic infrastructure for working with indexed

totally ordered observations that can be either employed by users directly or can be a basic

ingredient on top of which other packages can build. The latter is illustrated with a few brief

examples involving the packages strucchange, tseries and fCalendar in this section. Finally, the

classes "yearmon" and "yearqtr" (provided in zoo) are used for illustrating how zoo can be

extended by creating a new index class.


3.1. strucchange: Empirical fluctuation processes

The package strucchange provides a collection of methods for testing, monitoring and dating structural changes, in particular in linear regression models. Tests for structural change assess whether the parameters of a model remain constant over an ordering with respect to a specified variable, usually time. To adequately store and visualize empirical fluctuation processes which capture instabilities over this ordering, a data type for indexed ordered observations is required. This was the motivation for starting the zoo project.

A simple example for the need of "zoo" objects in strucchange which cannot be (easily) implemented by other irregular time series classes available in R is described in the following. We assess the constancy of the electrical resistance over the apparent juice content of kiwi fruits (see footnote 12). The data set fruitohms is contained in the DAAG package (Maindonald and Braun 2004). The fitted ocus object contains the OLS-based CUSUM process for the mean of the electrical resistance (variable ohms) indexed by the juice content (variable juice).

R> library(strucchange)

R> library(DAAG)

R> data(fruitohms)

R> ocus <- gefp(ohms ~ 1, order.by = ~juice, data = fruitohms)

R> plot(ocus)

Figure 3: Empirical M-fluctuation process for fruitohms data (panel title: M-fluctuation test; x-axis: juice; y-axis: empirical fluctuation process)

This OLS-based CUSUM process can be visualized using the plot method for "gefp" objects, which builds on the "zoo" method and yields in this case the plot in Figure 3. It shows the process crossing its 5% critical value and thus signals a significant decrease in the mean electrical resistance over the juice content. For more information on the package strucchange and the function gefp see Zeileis et al. (2002) and Zeileis (2004).
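To make the role of the index concrete, an OLS-based CUSUM process for a constant-mean model can be computed by hand and stored as a "zoo" series indexed by a non-time variable. This is a simplified sketch and not the implementation used by gefp: the data are simulated stand-ins for fruitohms, the variance estimate is a plain residual variance, and all object names are hypothetical.

```r
library(zoo)

set.seed(1)
n <- 100

# Simulated stand-ins: an ordering variable and a response whose mean drops at 30.
juice <- sort(runif(n, 5, 60))
ohms <- ifelse(juice < 30, 7, 4) + rnorm(n, sd = 0.5)

# Residuals of the constant-mean model ohms ~ 1.
e <- ohms - mean(ohms)
sigma <- sqrt(sum(e^2) / n)

# Scaled cumulative sums of the residuals, indexed by the ordering variable.
process <- zoo(cumsum(e) / (sigma * sqrt(n)), juice)

# Largest absolute excursion; under a constant mean it would typically stay
# below about 1.36, the 5% critical value of the supremum of a Brownian bridge.
print(max(abs(coredata(process))))
```

Because the index is the numeric juice content rather than a date or time, a class that is independent of the index type is needed to hold such a process; plotting it with plot(process) would put juice on the horizontal axis.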

Footnote 12: A different approach would be to test whether the slope of a regression of electrical resistance on juice content changes with increasing juice content, i.e., to test for instabilities in ohms ~ juice instead of ohms ~ 1. Both lead to similar results.

3.2. tseries: Historical financial data

A typical application for irregular time series, which has become increasingly important in computational statistics and finance in recent years, is daily (or higher-frequency) financial data. The package tseries provides the function get.hist.quote for obtaining historical financial data by querying Yahoo! Finance at http://finance.yahoo.com/, an online portal quoting data provided by Reuters. The following code queries the quotes of Lucent Technologies starting from 2001-01-01 until 2004-09-30:

R> library(tseries)

R> LU <- get.hist.quote(instrument = "LU", start = "2001-01-01",

+ end = "2004-09-30", origin = "1970-01-01")

In the returned LU object the irregular data is stored by extending it to a regular grid and filling the gaps with NAs. The time is stored in days starting from an origin, in this case specified to be 1970-01-01, the origin used by the Date class. This series can be transformed easily into an irregular "zoo" series using a "Date" index. The log-difference returns for Lucent Technologies are depicted in Figure 4.

R> LU <- as.zoo(LU)

R> index(LU) <- as.Date(index(LU))

R> LU <- na.omit(LU)
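The same three steps can be tried without a network connection on a simulated grid. This is a minimal sketch, assuming the zoo package is installed; the prices are made up, grid stands in for the object returned by get.hist.quote, and the origin is passed explicitly to as.Date.

```r
library(zoo)

# Hypothetical daily grid: time in days since 1970-01-01, gaps filled with NA.
price <- c(10, 11, NA, NA, 12.1, 13.31, NA, NA, NA, 14.641)
grid <- ts(price, start = 12418)

# Step 1: coerce the regular grid to "zoo".
z <- as.zoo(grid)

# Step 2: replace the numeric index by a "Date" index.
index(z) <- as.Date(as.numeric(index(z)), origin = "1970-01-01")

# Step 3: drop the NA-filled gaps to obtain an irregular series.
z <- na.omit(z)

# Log-difference returns between consecutive available observations.
ret <- diff(log(z))
print(ret)
```

Each made-up price is 10% above the previous one, so every log return equals log(1.1) regardless of how many calendar days lie between two observations.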

3.3. fCalendar: Indexes of class "timeDate"

Although the methods in zoo work out of the box for many index classes, it might be necessary for some index classes to provide c, length, ORDER and MATCH methods such that the methods in zoo work properly. An example for such an index class which requires a bit more attention is "timeDate" from the fCalendar package.

But after the necessary methods have been defined

R> length.timeDate <- function(x) prod(x@Dim)

R> ORDER.timeDate <- function(x, ...) order(as.POSIXct(x), ...)

R> MATCH.timeDate <- function(x, table, nomatch = NA, ...) match(as.POSIXct(x),

+ as.POSIXct(table), nomatch = nomatch, ...)

the class "timeDate" can be used for indexing "zoo" objects. The following example illustrates how z2 can be transformed to use the "timeDate" class.

R> library(fCalendar)

R> z2td <- zoo(coredata(z2), timeDate(index(z2), FinCenter = "GMT"))

R> z2td

2004-01-05 2004-01-14 2004-01-19 2004-01-25 2004-01-27 2004-02-07

0.94306673 -0.04149429 0.59448077 -0.52575918 -0.96739776 0.95605566

2004-02-12 2004-02-16 2004-02-20 2004-02-24

-0.62733473 -0.92845336 0.56060280 0.08291711

3.4. The classes "yearmon" and "yearqtr": Roll your own index

One of the strengths of the zoo package is its independence of the index class, such that the index can be easily customized. The previous section already explained how an existing class ("timeDate") can be used as the index if the necessary methods are created. This section has a similar but slightly different focus: it describes how new index classes can be created addressing a certain type of indexes. These classes are "yearmon" and "yearqtr" (already contained in zoo) which provide indexes for monthly and quarterly data respectively. As the code is virtually identical for both classes (except that one has the frequency 12 and the other 4), we will only discuss "yearmon" explicitly.

R> plot(diff(log(LU)))

Figure 4: Log-difference returns for Lucent Technologies (x-axis: Index, 2001 to 2004; y-axis: diff(log(LU)))

Of course, monthly data can simply be stored using a numeric index just as the class "ts" does. The problem is that this does not have the meta-information attached that this is really specifying monthly data, which in "yearmon" is simply added by a class attribute. Hence, the class creator is simply defined as

yearmon <- function(x) structure(floor(12*x + .0001)/12, class = "yearmon")

which is very similar to the as.yearmon coercion functions provided.
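How the creator snaps a numeric year to the start of a month, and how a format method can then exploit the class attribute, can be sketched in base R. The class name my_yearmon and its format method below are hypothetical and only mimic the idea; they are not the code contained in zoo.

```r
# Hypothetical class, named differently to avoid masking anything in zoo.
my_yearmon <- function(x) {
  # Round down to the nearest multiple of 1/12; the small offset guards
  # against floating point error for values that already are multiples.
  structure(floor(12 * x + 0.0001) / 12, class = "my_yearmon")
}

# Human readable representation such as "Jan 2000".
format.my_yearmon <- function(x, ...) {
  x <- unclass(x)
  year <- floor(x + 0.001)
  month <- floor(12 * (x - year) + 1 + 0.5)
  paste(month.abb[month], year)
}

ym <- my_yearmon(c(2000, 2000.09, 2000.5))
print(unclass(ym))
print(format(ym))
```

The value 2000.09 lies inside February 2000 and is therefore mapped to 2000 + 1/12, the numeric representation of that month.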

As "yearmon" data is now explicitly declared to describe monthly data, this can be exploited for coercion to other time classes: either to coarser time scales such as "yearqtr" or to finer time scales such as "Date", "POSIXct" or "POSIXlt" which by default associate the first day within a month with a "yearmon" observation. Adding a format and as.character method produces human readable character representations of "yearmon" data, and Ops and MATCH methods complete the methods needed for conveniently working with monthly data in zoo. Note that all of these methods are very simple and rather obvious (as can be seen in the zoo sources), but prove very helpful in the following examples.

First, we create a regular series zr3 with "yearmon" index which leads to improved printing

compared to the regular series zr1 and zr2 from Section 2.2.

R> zr3 <- zooreg(rnorm(9), start = yearmon(2000), frequency = 12)

R> zr3

Jan 2000 Feb 2000 Mar 2000 Apr 2000 May 2000 Jun 2000

-0.30969096 0.08699142 -0.64837101 -0.62786277 -0.61932674 -0.95506154

Jul 2000 Aug 2000 Sep 2000

-1.91736406 0.38108885 1.51405511

This could be aggregated to quarterly data via

R> aggregate(zr3, as.yearqtr, mean)

2000 Q1 2000 Q2 2000 Q3

-0.2903569 -0.7340837 -0.0074067

The index can easily be transformed to "Date", the default being the first day of the month but

which can also be changed to the last day of the month.

R> as.Date(index(zr3))

[1] "2000-01-01" "2000-02-01" "2000-03-01" "2000-04-01" "2000-05-01"

[6] "2000-06-01" "2000-07-01" "2000-08-01" "2000-09-01"

R> as.Date(index(zr3), frac = 1)

[1] "2000-01-31" "2000-02-29" "2000-03-31" "2000-04-30" "2000-05-31"

[6] "2000-06-30" "2000-07-31" "2000-08-31" "2000-09-30"
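With deterministic values the aggregation and the two date conversions can be checked by hand. This is a minimal sketch, assuming a current zoo release in which the index is created with as.yearmon; the series zr and the other names are made up.

```r
library(zoo)

# Hypothetical monthly series: the values 1..9 for January to September 2000.
zr <- zooreg(1:9, start = as.yearmon(2000), frequency = 12)

# Quarterly means: the index is mapped to "yearqtr" and values are averaged per quarter.
q <- aggregate(zr, as.yearqtr, mean)

# First and last day of each month.
first_days <- as.Date(index(zr))
last_days <- as.Date(index(zr), frac = 1)

print(q)
print(first_days)
print(last_days)
```

The quarterly means are 2, 5 and 8, and the last day of February is the 29th because 2000 is a leap year.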

Furthermore, "yearmon" indexes can easily be coerced to "POSIXct" such that the series could be exported as an "its" or "irts" series.

R> index(zr3) <- as.POSIXct(index(zr3))

R> as.irts(zr3)

2000-01-01 00:00:00 GMT -0.3097

2000-02-01 00:00:00 GMT 0.08699

2000-03-01 00:00:00 GMT -0.6484

2000-04-01 00:00:00 GMT -0.6279

2000-05-01 00:00:00 GMT -0.6193

2000-06-01 00:00:00 GMT -0.9551

2000-07-01 00:00:00 GMT -1.917

2000-08-01 00:00:00 GMT 0.3811

2000-09-01 00:00:00 GMT 1.514


Again, this functionality makes switching between different time scales or index representations

particularly easy and zoo provides the user with the flexibility to adjust a certain index to his/her

problem of interest.


4. Summary and outlook

The package zoo provides an S3 class and methods for indexed totally ordered observations, such as both regular and irregular time series. Its key design goals are independence of a particular index class and compatibility with standard generics similar to the behaviour of the corresponding "ts" methods. This paper describes how these are implemented in zoo and illustrates the usage of the methods for plotting, merging and binding, several mathematical operations, extracting and replacing data and index, coercion and NA handling.
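The independence of the index class can be illustrated by building series on three different kinds of index. This is a minimal sketch, assuming the zoo package is installed; all data and object names are made up.

```r
library(zoo)

# "Date" index supplied out of order: zoo sorts the observations by the index.
z_date <- zoo(c(3.1, 1.4, 1.5),
              as.Date(c("2004-01-19", "2004-01-05", "2004-01-14")))

# Character index: any totally ordered class can serve as the index.
z_char <- zoo(c(3.1, 1.4, 1.5), c("b", "a", "c"))

# Regular series: a "frequency" attribute is stored in addition.
z_reg <- zooreg(1:4, start = 2000, frequency = 4)

print(index(z_date))
print(coredata(z_date))
print(z_char)
print(frequency(z_reg))
```

In each case the object is data plus index: coredata returns the observations and index returns the ordering values, whatever their class.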

An indexed object of class "zoo" can be thought of as data plus index where the data are essentially vectors or matrices and the index can be a vector of (in principle) arbitrary class. For (weakly) regular "zooreg" series, a "frequency" attribute is stored in addition. Therefore, objects of classes "ts", "its", "irts" and "timeSeries" can easily be transformed into "zoo" objects; the reverse transformation is also possible provided that the index fulfills the restrictions of the respective class. Hence, the "zoo" class can also be used as the basis for other classes of indexed observations and more specific functionality can be built on top of it. Furthermore, it bridges the gap between irregular and regular series, facilitating operations such as NA handling compared to "ts".

Whereas a lot of effort was put into achieving independence of a particular index class, the types of data that can be indexed with "zoo" are currently limited to vectors and matrices, typically containing numeric values. Although there is some limited support available for indexed factors, one important direction for future development of zoo is to add better support for other objects that can also naturally be indexed, including specifically factors, data frames and lists.

Computational details

The results in this paper were obtained using R 2.1.0 with the packages zoo 1.0-0, strucchange 1.2-10, fCalendar 201.10060, tseries 0.9-27 and DAAG 0.46. R itself and all packages used are available from CRAN at http://CRAN.R-project.org/.

References

Giles Heywood. its: Irregular Time Series. Portfolio & Risk Advisory Group and Commerzbank Securities, 2004. R package version 1.0.4.

John Maindonald and W. John Braun. DAAG: Data Analysis and Graphics, 2004. URL http://www.stats.uwo.ca/DAAG/. R package version 0.46.

R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2005. URL http://www.R-project.org/. ISBN 3-900051-00-3.

Adrian Trapletti. tseries: Time Series Analysis and Computational Finance, 2005. R package version 0.9-25.

Diethelm Wuertz. Rmetrics: An Environment and Software Collection for Teaching Financial Engineering and Computational Finance, 2005. URL http://www.Rmetrics.org/. R package fCalendar, version 201.10059.

Achim Zeileis. Implementing a class of structural change tests: An econometric computing approach. Report 7, Department of Statistics and Mathematics, Wirtschaftsuniversität Wien, Research Report Series, July 2004. URL http://epub.wu-wien.ac.at/.

Achim Zeileis, Friedrich Leisch, Kurt Hornik, and Christian Kleiber. strucchange: An R package for testing for structural change in linear regression models. Journal of Statistical Software, 7(2):1–38, 2002. URL http://www.jstatsoft.org/v07/i02/.


A. Reference card

Creation

zoo(x, order.by) creation of a "zoo" object from the observations x (a vector or a matrix) and an index order.by by which the observations are ordered.

For computations on arbitrary index classes, methods to the following generic functions are assumed to work: combining c(), querying length length(), subsetting [, ordering ORDER() and value matching MATCH(). For pretty printing an as.character and/or index2char method might be helpful.

Creation of regular series

zoo(x, order.by, freq) works as above but creates a "zooreg" object which inherits from "zoo" if the frequency freq complies with the index order.by. An as.numeric method has to be available for the index class.

zooreg(x, start, end, freq) creates a "zooreg" series with a numeric index as above and has (almost) the same interface as ts().

Standard methods

plot plotting

lines adding a "zoo" series to a plot

print printing

summary summarizing (column-wise)

str displaying structure of "zoo" objects

head, tail head and tail of "zoo" objects

Coercion

as.zoo coercion to "zoo" is available for objects of class "ts", "its", "irts" (plus a default method).

as.class.zoo coercion from "zoo" to other classes. Currently available for class in "matrix", "vector", "data.frame", "list", "irts", "its" and "ts".

is.zoo querying whether an object is of class "zoo"

Merging and binding

merge union, intersection, left join, right join along indexes

cbind column binding along the intersection of the index

c, rbind combining/row binding (indexes may not overlap)

aggregate compute summary statistics along a coarser grid of indexes

Mathematical operations

Ops group generic functions performed along the intersection of indexes

t transposing (coerces to "matrix" before)

cumsum compute (columnwise) cumulative quantities: sums cumsum(), products cumprod(), maximum cummax(), minimum cummin().

Extracting and replacing data and index

index, time extract the index of a series

index<-, time<- replace the index of a series

coredata, coredata<- extract and replace the data associated with a "zoo" object

lag lagged observations

diff arithmetic and geometric differences

start, end querying start and end of a series

window, window<- subsetting of "zoo" objects using their index

NA handling

na.omit omit NAs

na.contiguous compute longest sequence of non-NA observations

na.locf impute NAs by carrying forward the last observation

na.approx impute NAs by interpolation

Rolling functions

rapply apply a function to rolling margin of an array

rollmean more efficient functions for computing the rolling mean, median and maximum are rollmean(), rollmedian() and rollmax(), respectively

Methods for regular series

is.regular checks whether a series is weakly (or strictly if strict = TRUE) regular

frequency, deltat extracts the frequency or its reciprocal value respectively from a series, for "zoo" series the functions try to determine the regularity and frequency in a data-driven way

cycle gives the position in the cycle of a regular series

... The LAI values for the six vines in lysimeters were averaged to receive one weekly value to represent the time series. Linear temporal interpolation was used to generate a daily time-series, using the "zoo" package in R (Zeileis and Grothendieck, 2005). ...

... Finally, a rolling RMSE analysis was performed, using a seven-day bandwidth consecutively calculating the errors for the previous seven days, to assess the weekly bias of the forecast values. Rolling functions are typically applied on ordered observations using a predefined window that is moved over the full sample period (Zeileis and Grothendieck, 2005). The rolling RMSE statistic was applied on a seven-day window since this is a commonly used time gap for irrigation decision making. ...

... The mean weekly RMSE values were calculated based on rolling RMSE performed for the forecast values of all seasons and for all models. This analysis was conducted by R using package "zoo" (Zeileis and Grothendieck, 2005). ...

Vineyard irrigation management relies on accurate assessment of crop evapotranspiration (ETc). ETc is affected by the by type of plant, its physiological properties, and meteorological parameters. Rapid measurement of these factors facilitates quantification of ETc and enables skilled decision-making for data-driven irrigation. Our main objective was to quantify the performance of different modeling approaches for forecasting seasonal ETc using meteorological and vegetative data (e.g., leaf area) from five consecutive growing seasons (2013–2017) of Vitis vinifera 'Cabernet Sauvignon' vines. Time series of ETc was acquired from water balance from vines grown in drainage lysimeters within the vineyard. ETc forecasts were generated for each season using twelve regression models: six linear and six non-linear multivariate adaptive regression spline (MARS) models. Each regression model constituted a unique combination of variables, some relying on crop coefficient (Kc) and others based on direct ETc forecasting. The models were trained using data from four growing seasons and compared via measures of coefficient of determination (R2), residual standard deviation, and coefficient of variation. Each model was then tested using ETc forecasts for a fifth growing season, and compared to the measured ETc values using correlation, root mean squared error (RMSE), and normalized RMSE. Finally, a mean-seasonal rolling RMSE with a window of 7 days was used to assess the accuracy of the different models. The results show a clear advantage to using non-linear modeling for ETc forecasting (average RMSE range of 0.81–1.05 vs. 0.64–0.71 mm day−1, respectively). Furthermore, direct forecasting and Kc-based methods yielded similar results, and all models benefited from the incorporation of leaf area data. Similar outcomes were found for the rolling RMSE analysis, with improved model accuracy credited to the inclusion of leaf area, especially early in the season. 
Our findings confirm that advanced algorithms promote site-specific and location-oriented irrigation management.

... (R Core Team, 2019), we were then able to extract three-dimensional positions from the list of XYZ coordinates. Missing coordinates between two known coordinates were interpolated using the package zoo (Zeileis & Grothendieck, 2005). ...

Research on diel vertical migration (DVM) is generally conducted at the population level, whereas few studies have focused on how individual animals behaviorally respond to threats when also having access to foraging opportunities. We utilized a 3D tracking platform to record the swimming behavior of Daphnia magna exposed to ultraviolet radiation (UVR) in the presence or absence of a food patch. We analyzed the vertical position of individuals before and during UVR exposure and found that the presence of food reduced the average swimming depth during both sections of the trial. Since UVR is a strong driver of zooplankton behavior, our results highlight that biotic factors, such as food patches, have profound effects on both the amplitude and the frequency of avoidance behavior. In a broader context, the trade-off between threats and food adds to our understanding of the strength and variance of behavioral responses to threats, including DVM.

... We use R for the data analysis (R Core Team, 2020a). The main packages are tidyverse (Wickham et al., 2019), ncdf4 (Pierce, 2019), ggplot2 15 (Wickham, 2016), raster (Hijmans, 2020), zoo, (Zeileis and Grothendieck, 2005), plyr (Wickham, 2011), and (Wickham et al., 2021). We use the nest R package (https://github.com/krehfeld/nest ...

The incorporation of water isotopologues into the hydrology of general circulation models (GCMs) facilitates the comparison between modelled and measured proxy data in paleoclimate archives. However, the variability and drivers of measured and modelled water isotopologues, and indeed the diversity of their representation in different models are not well constrained. Improving our understanding of this variability in past and present climates will help to better constrain future climate change projections and decrease their range of uncertainty. Speleothems are a precisely datable paleoclimate archive and provide well preserved (semi-)continuous multivariate isotope time series in the lower and mid-latitudes, and are, therefore, well suited to assess climate and isotope variability on decadal and longer timescales. However, the relationship between speleothem oxygen and carbon isotopes to climate variables also depends on site-specific parameters, and their comparison to GCMs is not always straightforward. Here we compare speleothem oxygen and carbon isotopic signatures from the Speleothem Isotopes Synthesis and AnaLysis database version 2 (SISALv2) to the output of five different water-isotope-enabled GCMs (ECHAM5-wiso, GISS-E2-R, iCESM, iHadCM3, and isoGSM) over the last millennium (850-1850 common era, CE). We systematically evaluate differences and commonalities between the standardized model simulation outputs. The goal is to distinguish climatic drivers of variability for both modelled and measured isotopes. We find strong regional differences in the oxygen isotope signatures between models that can partly be attributed to differences in modelled temperatures. At low latitudes, precipitation amount is the dominant driver for water isotope variability, however, at cave locations the agreement between modelled temperature variability is higher than for precipitation variability. 
While modelled isotopic signatures at cave locations exhibited extreme events coinciding with changes in volcanic and solar forcing, such fingerprints are not apparent in the speleothem isotopes, and may be attributed to the lower temporal resolution of speleothem records compared to the events that are to be detected. Using spectral analysis, we can show that all models underestimate decadal and longer variability compared to speleothems, although to varying extent. We found that no model excels in all analyzed comparisons, although some perform better than the others in either mean or variability. Therefore, we advise a multi-model approach, whenever comparing proxy data to modelled data. Considering karst and cave internal processes through e.g. isotope-enabled karst models may alter the variability in speleothem isotopes and play an important role in determining the most appropriate model. By exploring new ways of analyzing the relationship between the oxygen and carbon isotopes, their variability, and co-variability across timescales, we provide methods that may serve as a baseline for future studies with different models using e.g. different isotopes, different climate archives, or time periods.

... Analysis was undertaken using R [15]. Rolling averages over time were calculated using the 'rollmean' function of the zoo (v1.8.9) package in R [16]. A centred rolling window of 14 days was used for daily deaths and daily medical admissions and a window of 28 days for all other plots. ...

Background To better understand the impact of the COVID-19 pandemic on hospital healthcare, we studied activity in the emergency department (ED) and acute medicine department of a major UK hospital. Methods Electronic patient records for all adult patients attending ED ( n = 243,667) or acute medicine ( n = 82,899) during the pandemic (2020–2021) and prior year (2019) were analysed and compared. We studied parameters including severity, primary diagnoses, co-morbidity, admission rate, length of stay, bed occupancy, and mortality, with a focus on non-COVID-19 diseases. Results During the first wave of the pandemic, daily ED attendance fell by 37%, medical admissions by 30% and medical bed occupancy by 27%, but all returned to normal within a year. ED attendances and medical admissions fell across all age ranges; the greatest reductions were seen for younger adults in ED attendances, but in older adults for medical admissions. Compared to non-COVID-19 pandemic admissions, COVID-19 admissions were enriched for minority ethnic groups, for dementia, obesity and diabetes, but had lower rates of malignancy. Compared to the pre-pandemic period, non-COVID-19 pandemic admissions had more hypertension, cerebrovascular disease, liver disease, and obesity. There were fewer low severity ED attendances during the pandemic and fewer medical admissions across all severity categories. There were fewer ED attendances with common non-respiratory illnesses including cardiac diagnoses, but no change in cardiac arrests. COVID-19 was the commonest diagnosis amongst medical admissions during the first wave and there were fewer diagnoses of pneumonia, myocardial infarction, heart failure, cellulitis, chronic obstructive pulmonary disease, urinary tract infection and other sepsis, but not stroke. Levels had rebounded by a year later with a trend to higher levels of stroke than before the pandemic. 
During the pandemic first wave, 7-day mortality was increased for ED attendances, but not for non-COVID-19 medical admissions. Conclusions Reduced ED attendances in the first wave of the pandemic suggest opportunities for reducing low severity presentations to ED in the future, but also raise the possibility of harm from delayed or missed care. Reassuringly, recent rises in attendance and admissions indicate that any deterrent effect of the pandemic on attendance is diminishing.

The FtsLB complex is a key regulator of bacterial cell division, existing in either an off or on state which supports the activation of septal peptidoglycan synthesis. In Escherichia coli, residues known to be critical for this activation are located in a region near the C-terminal end of the periplasmic coiled-coil domain of FtsLB, raising questions about the precise role of this conserved domain in the activation mechanism. Here, we investigate an unusual cluster of polar amino acids found within the core of the FtsLB coiled coil. We hypothesized that these amino acids likely reduce the structural stability of the domain and thus may be important for governing conformational changes. We found that mutating these positions to hydrophobic residues increased the thermal stability of FtsLB but caused cell division defects, suggesting that the coiled-coil domain is a "detuned" structural element. In addition, we identified suppressor mutations within the polar cluster, indicating that the precise identity of the polar amino acids is important for fine-tuning the structural balance between the off and on states. We propose a revised structural model of the tetrameric FtsLB (named the "Y-model") in which the periplasmic domain splits into a pair of coiled-coil branches. In this configuration, the hydrophilic terminal moieties of the polar amino acids remain more favorably exposed to water than in the original four-helix bundle model ("I-model"). We propose that a shift in this architecture, dependent on its marginal stability, is involved in activating the FtsLB complex and triggering septal cell wall reconstruction.

A comprehensive understanding of the behaviours of the various geophysical processes requires, among others, detailed investigations across temporal scales. In this work, we propose a new time series feature compilation for advancing and enriching such investigations in a hydroclimatic context. This specific compilation can facilitate largely interpretable feature investigations and comparisons in terms of temporal dependence, temporal variation, "forecastability", lumpiness, stability, nonlinearity (and linearity), trends, spikiness, curvature and seasonality. Detailed quantifications and multifaceted characterizations are herein obtained by computing the values of the proposed feature compilation across nine temporal resolutions (i.e., the 1-day, 2-day, 3-day, 7-day, 0.5-month, 1-month, 2-month, 3-month and 6-month ones) and three hydroclimatic time series types (i.e., temperature, precipitation and streamflow) for 34-year-long time series records originating from 511 geographical locations across the continental United States. Based on the acquired information and knowledge, similarities and differences between the examined time series types with respect to the evolution patterns characterizing their feature values with increasing (or decreasing) temporal resolution are identified. In our view, the similarities in these patterns are rather surprising. We also find that the spatial patterns emerging from feature-based time series clustering are largely analogous across temporal scales, and compare the features with respect to their usefulness in clustering the time series at the various temporal resolutions. For most of the features, this usefulness can vary to a notable degree across temporal resolutions and time series types, thereby pointing out the need for conducting multifaceted time series characterizations for the study of hydroclimatic similarity.

  • Camilla Ugolini
  • Logan Mulroney
  • Adrien Leger
  • Tommaso Leonardi

The SARS-CoV-2 virus has a complex transcriptome characterised by multiple, nested subgenomic RNAs used to express structural and accessory proteins. Long-read sequencing technologies such as nanopore direct RNA sequencing can recover full-length transcripts, greatly simplifying the assembly of structurally complex RNAs. However, these techniques do not detect the 5′ cap, thus preventing reliable identification and quantification of full-length, coding transcript models. Here we used Nanopore ReCappable Sequencing (NRCeq), a new technique that can identify capped full-length RNAs, to assemble a complete annotation of SARS-CoV-2 sgRNAs and annotate the location of capping sites across the viral genome. We obtained robust estimates of sgRNA expression across cell lines and viral isolates and identified novel canonical and non-canonical sgRNAs, including one that uses a previously un-annotated leader-to-body junction site. The data generated in this work constitute a useful resource for the scientific community and provide important insights into the mechanisms that regulate the transcription of SARS-CoV-2 sgRNAs.

  • Kyle Newton
  • Dovi Kacev
  • Simon RO Nilsson
  • Lavinia Sheets

The zebrafish lateral line is an established model for hair cell organ damage, yet few studies link mechanistic disruptions to changes in biologically relevant behavior. We used larval zebrafish to determine how damage via ototoxic chemicals impacts rheotaxis. Larvae were treated with CuSO4 or neomycin to disrupt lateral line function, then exposed to water flow stimuli. Their swimming behavior was recorded, and DeepLabCut and SimBA software were used to track movements and classify rheotaxis behavior. Lateral line-disrupted fish performed rheotaxis, but they swam greater distances, for shorter durations, and with greater angular variance than controls. Further, spectral decomposition analyses demonstrated that lesioned fish exhibited toxin-specific behavioral profiles with distinct fluctuations in the magnitude, timing, and cross-correlation between changes in linear and angular movements. Our observations support that lateral-line input is needed for fish to perform rheotaxis efficiently in flow and reveal that commonly used lesion methods have unique effects on behavior.

  • Kyle Beattie

Policy makers and mainstream news anchors have promised the public that the COVID-19 vaccine rollout worldwide would reduce symptoms, and thereby cases and deaths associated with COVID-19. While this vaccine rollout is still in progress, there is a large amount of public data available that permits an analysis of the effect of the vaccine rollout on COVID-19 related cases and deaths. Has this public policy treatment produced the desired effect? One manner to respond to this question can begin by implementing a Bayesian causal analysis comparing both pre- and post-treatment periods. This study analyzed publicly available COVID-19 data from OWID utilizing the R package CausalImpact to determine the causal effect of the administration of vaccines on two dependent variables that have been measured cumulatively throughout the pandemic: total deaths per million (y1) and total cases per million (y2). After eliminating all results from countries with p > 0.05, there were 128 countries for y1 and 103 countries for y2 to analyze in this fashion, comprising 145 unique countries in total (avg. p < 0.004). Results indicate that the treatment (vaccine administration) has a strong and statistically significant propensity to causally increase the values in either y1 or y2 over and above what would have been expected with no treatment. y1 showed an increase/decrease ratio of (+115/-13), which means 89.84% of statistically significant countries showed an increase in total deaths per million associated with COVID-19 due directly to the causal impact of treatment initiation. y2 showed an increase/decrease ratio of (+105/-16) which means 86.78% of statistically significant countries showed an increase in total cases per million of COVID-19 due directly to the causal impact of treatment initiation. Causal impacts of the treatment on y1 range from -19% to +19015% with an average causal impact of +463.13%.
Causal impacts of the treatment on y2 range from -46% to +12240% with an average causal impact of +260.88%. Hypothesis 1 Null can be rejected for a large majority of countries. This study subsequently performed correlational analyses on the causal impact results, whose effect variables can be represented as y1.E and y2.E respectively, with the independent numeric variables of: days elapsed since vaccine rollout began (n1), total vaccination doses per hundred (n2), total vaccine brands/types in use (n3) and the independent categorical variables continent (c1), country (c2), vaccine variety (c3). All categorical variables showed statistically significant (avg. p: < 0.001) positive Wilcoxon signed rank values (y1.E V:[c1 3.04; c2: 8.35; c3: 7.22] and y2.E V:[c1 3.04; c2: 8.33; c3: 7.19]). This demonstrates that the distribution of y1.E and y2.E was non-uniform among categories. The Spearman correlation between n2 and y2.E was the only numerical variable that showed statistically significant results (y2.E ~ n2: rho: 0.34 CI95%[0.14, 0.51], p: 4.91e-04). This low positive correlation signifies that countries with higher vaccination rates do not have lower values for y2.E, slightly the opposite in fact. Still, the specifics of the reasons behind these differences between countries, continents, and vaccine types are inconclusive and should be studied further as more data become available. Hypothesis 2 Null can be rejected for c1, c2, c3 and n2 and cannot be rejected for n1 and n3. The statistically significant and overwhelmingly positive causal impact after vaccine deployment on the dependent variables total deaths and total cases per million should be highly worrisome for policy makers. They indicate a marked increase in both COVID-19 related cases and deaths due directly to a vaccine deployment that was originally sold to the public as the "key to gain back our freedoms."
The effect of vaccines on total cases per million and its low positive association with total vaccinations per hundred signifies a limited impact of vaccines on lowering COVID-19 associated cases. These results should encourage local policy makers to make policy decisions based on data, not narrative, and based on local conditions, not global or national mandates. These results should also encourage policy makers to begin looking for other avenues out of the pandemic aside from mass vaccination campaigns. Some variables that could be included in future analyses might include vaccine lot by country, the degree of prevalence of previous antibodies against SARS-CoV or SARS-CoV-2 in the population before vaccine administration begins, and the Causal Impact of ivermectin on the same variables used in this study.

  • Achim Zeileis
  • Friedrich Leisch
  • Kurt Hornik
  • Christian Kleiber

This paper introduces ideas and methods for testing for structural change in linear regression models and presents how these have been realized in an R package called strucchange. It features tests from the generalized fluctuation test framework as well as from the F test (Chow test) framework. Extending standard significance tests, it contains methods to fit, plot and test empirical fluctuation processes (like CUSUM, MOSUM and estimates-based processes) on the one hand and to compute, plot and test sequences of F statistics with the supF, aveF and expF test on the other. Thus, it makes powerful tools available to display information about structural changes in regression relationships and to assess their significance. Furthermore it is described how incoming data can be monitored online. Keywords: structural change, CUSUM, MOSUM, recursive estimates, moving estimates, online monitoring, R, S.

  • Achim Zeileis

The implementation of a recently suggested class of structural change tests, which test for parameter instability in general parametric models, in the R language for statistical computing is described: Focus is given to the question how the conceptual tools can be translated into computational tools that reflect the properties and flexibility of the underlying econometric methodology while being numerically reliable and easy to use. More precisely, the class of generalized M-fluctuation tests is implemented in the package strucchange providing easily extensible functions for computing empirical fluctuation processes and automatic tabulation of critical values for a functional capturing excessive fluctuations. Traditional significance tests are supplemented by graphical methods which do not only visualize the result of the testing procedure but also convey information about the nature and timing of the structural change and which component of the parametric model is affected by it.

Furthermore, "yearmon" indexes can easily be coerced to "POSIXct" such that the series could be exported as a "its" or "irts" series.
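
As a minimal sketch of the coercion mentioned above, a "zoo" series with a "yearmon" index can be re-indexed by "POSIXct". The data values and the object names `z` and `z_ct` are made up for illustration and are not taken from the paper:

```r
library(zoo)

# Hypothetical monthly series indexed by "yearmon" (values are made up)
z <- zoo(c(1.2, 3.4, 2.1), as.yearmon(2004 + 0:2 / 12))

# Coerce the index to "POSIXct"; the observations themselves are unchanged
z_ct <- zoo(coredata(z), as.POSIXct(index(z)))

class(index(z))     # index class before coercion
class(index(z_ct))  # index class after coercion
```

The re-indexed series could then be passed to the constructor of another irregular time series class. That export step is left out here because it depends on packages outside zoo.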

Giles Heywood. its: Irregular Time Series. Portfolio & Risk Advisory Group and Commerzbank Securities, 2004. R package version 1.0.4.

R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2005. URL http://www.R-project.org/. ISBN 3-900051-00-3.

Posted by: bagsnearme.blogspot.com

Source: https://www.researchgate.net/publication/5142903_zoo_S3_Infrastructure_for_Regular_and_Irregular_Time_Series