If you’re a user of R then check out this package that interfaces to the X-13-ARIMA-SEATS executable.
This is the last in a series of posts which looked at presenting seasonal information in a different way. Part 1 (Standard) generated a standard seasonal and irregular chart, Part 2 (Wheel) took the same information using the polar coordinate approach, Part 3 (Clock) switched the polar coordinate approach around, and now this final post, Part 4 (Flower) looks at something slightly different.
For this plot I wanted to compartmentalise the periods around the circle rather than have them overlapping, in the hope that this would make things a bit clearer. After trying for a while I couldn't replicate this using ggplot2, so instead I wrote a function built around the radial.plot function in the plotrix package. The full code is rather clunky, so rather than post it in all its glory: it builds up the components (e.g. the period information) in a few loops using commands like:
radial.plot(years, radial.pos=rangeofmean, labels="", rp.type="r",
            line.col=8, lwd=3, add=TRUE)
radial.plot(years, radial.pos=rangeofseasonalfactor, labels="", rp.type="s",
            point.col=colorinput[i], point.symbols=16, lwd=1, add=TRUE)
Using the same dataset as previously, the following plot can be generated. The straight grey line is the average seasonal factor for the month; within each month (compartment) the solid circles are the seasonal factors for each year, with the most recent year towards the edge of the graph, and the open circles are the seasonal factor multiplied by the irregular component for each year:
At least to me(!) it looks like a flower with the different compartments representing the petals… but maybe you need a good imagination.
Anyway, at the end of this exercise there was one dataset, with (at least) four different ways to present the information. Having used the standard approach for a long time, I think it is probably the easiest for seeing what is happening (which is probably why it is the standard approach!). I had thought the circular representations might be useful, but they can obviously distort the estimates, which can make it difficult to determine precisely the movements in the seasonal factors. They may still have some use to pretty things up in certain situations.
These series of posts are looking at displaying seasonal related information derived from a seasonal adjustment. Part 1 covered how to generate a standard seasonal and irregular chart, while part 2 took the same information and looked at using the polar coordinate approach.
This post builds on the polar coordinate approach and uses the theta="y" parameter of coord_polar to switch the coordinates around.
Using the previous dataset and commands, we can now pass this different parameter of theta="y" to coord_polar.
pout <- p1 + facet_wrap(~period, ncol=3) + coord_polar(theta="y") +
  scale_color_manual(values=c("#666666", "#FF3300", "#0033FF"))
Typing pout in R then gives a different polar coordinate version of the data:
I’ve called this a Clock plot, as the seasonal factors look like hands of a clock! In this example, going around the perimeter of the circle in a clockwise direction from 12 o’clock (lowest value) represents increases in the magnitude of the seasonal factor. Going out from the centre of the circle along the radius represents an increase in the years, e.g. most recent years are near the edge of the circle.
A different representation can now put all of this data on one plot. The code for the single clock plot is given by:
p1 <- ggplot(bigdfsi, aes(year, dat, color=period)) +
  geom_point(data=bigdfsi[bigdfsi[,"label"]=="SI",],
             size=I(2), alpha=I(alphavec)) +
  geom_line(data=bigdfsi[bigdfsi[,"label"]=="Seasonal",], aes(group=period),
            size=I(1.5), alpha=I(alphavec)) +
  geom_line(data=bigdfsi[bigdfsi[,"label"]=="Mean",], aes(group=period),
            size=I(1.5), alpha=I(alphavec)) +
  theme(legend.title=element_blank(), axis.title.y=element_blank(),
        axis.text.x=element_blank())
where a key difference here compared to the earlier version is the use of the aes(group=period) option. Setting
pout <- p1 + coord_polar(theta="y") + scale_color_manual(values=colorit)
and then typing pout in R gives a single polar coordinate version of the data:
This uses the default color palette within ggplot2 to distinguish between the different periods. We could also choose to plot only particular periods by limiting the dataset.
To help distinguish between the different months, the scale_color_manual option can also be used to give an appropriate choice of colors.
colorit <- c("#8DD3C7", "#FFFFB3", "#BEBADA", "#FB8072", "#80B1D3", "#FDB462",
             "#B3DE69", "#FCCDE5", "#D9D9D9", "#BC80BD", "#CCEBC5", "#FFED6F")
# or, using a sequential palette (brewer.pal is from the RColorBrewer package;
# each assignment overrides the previous one, so pick whichever you prefer):
colorit <- colorRampPalette(brewer.pal(9,"Blues"))(20)[9:20]
colorit <- colorRampPalette(brewer.pal(9,"Reds"))(20)[9:20]
colorit <- colorRampPalette(brewer.pal(9,"Greens"))(20)[9:20]
pout <- p1 + coord_polar(theta="y") + scale_color_manual(values=colorit)
A previous post focused on using ggplot2 within R to generate a standard seasonal and irregular chart.
The good thing about ggplot2 is that it is built to be flexible and can be modified by adding layers to the plots. While experimenting with this, I decided to try an alternative representation of the previous data using the options of facet_grid and coord_polar.
In general, there seems to be a view that polar coordinates are bad and should be avoided. I guess it depends on how you want to interpret the data. I've ignored this advice and decided that polar coordinates may have some value in helping visualise how the seasonality changes between periods, and also over time.
Using the previous dataset and commands we can now build on this with the following commands:
pout <- p1 + facet_wrap(~period, ncol=3) + coord_polar() +
  xlab(paste("Years:", min(bigdfsi[,4]), "to", max(bigdfsi[,4]))) +
  scale_color_manual(values=c("#666666", "#FF3300", "#0033FF"))
Here, facet_wrap has split the data by period into 3 columns, and coord_polar has turned all the data into polar coordinates. Typing pout in R then gives the following plot:
where going around the perimeter of the circle represents the years, and going out from the centre of the circle along the radius represents an increase in the magnitude of the seasonal factor. So in this case, when all periods are plotted on the same graph, the size of the circle indicates how big or small the seasonal factor is, and by comparing the seasonal factor (red line) to the mean this can give an indication of how the seasonal factor has changed over time.
Looking at two months separately provides a bit more detail:
So in February the seasonal effect is quite large but has not changed much over time. In December, the seasonal effect is smaller but it has changed much more over time, as shown by the misshapen circle.
Given this looks like a wheel (in the well-behaved cases), the best name for this is a wheel plot.
When analysing data one of the best and most obvious things to do first is to plot it. This is simple and easy advice and you can plot data in many different ways. In recent years there has been what seems like an explosion in the philosophy of data visualisation (e.g. datavis, infographics, or any other fancy name). Suddenly anyone can turn themselves into a data visualisation wizard by taking some data, making some trendy little circles or boxes with curved edges, adding a few random colors, and then choosing a hand-written font and bingo! They (and everyone else) now think they are experts in finding patterns and analysing information.
To get on the band-wagon I thought it would be useful to illustrate one of the standard time series plots that can be used to help assess the properties of seasonality in a time series. These may not be as eye-candied as a typical data visualisation but at least they might serve a useful purpose.
To illustrate this I've downloaded some data and used R with the ggplot2 package. In doing this I found two frustrating things about using ggplot2. The first is that you need your data in a good format, which is typically an R dataset with all the variables you will need. In reality this can take the longest amount of time to sort out! The second is the logic of the whole ggplot2 thing and the terminology that has been used for the names and functions. It has taken me a lot of playing around to get to grips with it, but once the basics are sorted it can produce nice looking graphics relatively easily.
For this illustration I've used the Australian Labour Force unemployed persons from the Australian Bureau of Statistics (ABS). You can get this data from here. One of the good things the ABS do is publish the original, seasonally adjusted and trend estimates. Having these components immediately available makes things easier, although there is a small data derivation we need to do. In this example the irregular component has been derived by taking the seasonally adjusted data (which by definition is the trend multiplied by the irregular component) and dividing by the published trend estimate.
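As a minimal sketch of that derivation (the variable names and numbers below are illustrative stand-ins for the downloaded ABS series, not the original script), assuming a multiplicative decomposition:

```r
# Multiplicative decomposition: original = trend * seasonal * irregular,
# and seasonally adjusted = trend * irregular.
original <- c(105.0, 98.0, 101.0)   # original estimates (illustrative)
seasadj  <- c(100.0, 100.5, 99.5)   # published seasonally adjusted
trend    <- c(100.0, 100.0, 100.0)  # published trend

irregular <- seasadj / trend    # the irregular component
si        <- original / trend   # seasonal * irregular (the "SI" label below)
```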
Using this dataset, and after some intense data manipulation, I finally got it into this form within R:
     label       dat period year
  Seasonal 1.1241824      2 1978
  Seasonal 1.0386693      3 1978
  Seasonal 0.9935950      4 1978
  Seasonal 0.9956933      5 1978
  Seasonal 0.9683221      6 1978
  Seasonal 0.9499924      7 1978
  Seasonal 0.9494997      8 1978
  Seasonal 0.9769546      9 1978
  Seasonal 0.9321097     10 1978
  Seasonal 0.9095337     11 1978
  Seasonal 1.0728631     12 1978
  ...
  Seasonal 1.0088955      5 2012
  Seasonal 0.9573871      6 2012
  Seasonal 0.9326108      7 2012
 Irregular 1.0342590      2 1978
  ...
 Irregular 1.0053110      7 2012
        SI 1.1626958      2 1978
  ...
        SI 0.9375638      7 2012
      Mean 1.1168443      2 1978
  ...
      Mean 0.9378250      7 2012
Note: The dataframe goes from February 1978 up to and including July 2012. The SI label reflects the seasonal multiplied by the irregular component.
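The "intense data manipulation" can be sketched roughly like this (a hedged reconstruction, not the original code; the short component vectors here are illustrative stand-ins for the full monthly series):

```r
# Stack the component series into the long label/dat/period/year format.
seasonal  <- c(1.12, 0.91, 1.10, 0.93)   # illustrative values
irregular <- c(1.03, 0.99, 0.98, 1.01)
period    <- c(2, 12, 2, 12)
year      <- c(2011, 2011, 2012, 2012)

make_block <- function(label, dat) {
  data.frame(label = label, dat = dat, period = period, year = year)
}

bigdfsi <- rbind(
  make_block("Seasonal",  seasonal),
  make_block("Irregular", irregular),
  make_block("SI",        seasonal * irregular),   # seasonal times irregular
  make_block("Mean",      ave(seasonal, period))   # per-period mean factor
)
```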
Now that we have this, what can we use it for?
The best way to assess the seasonality in a monthly or quarterly economic time series is to use a seasonal and irregular chart. This groups the data according to periods, and plots the data within each period based on the yearly information. This will become clearer with the example below.
The benefit of this approach is that it highlights a point that is little known, or at least rarely acknowledged, by the hack analysts: seasonality evolves over time, and any estimation method for seasonal adjustment should capture this. A plot like the seasonal and irregular chart shows clearly how the estimates of the seasonality evolve over time. Note: this is why it is a very, very bad idea to use annual differences of the original estimates if you have any type of seasonality, particularly if it is evolving a lot.
Now for some R code. The following is probably not the most optimal code, but it does the job. Feel free to leave a comment if you can spot any obvious improvements! In this example the dataset is the data from above.
p1 <- ggplot(dataset, aes(year, dat, color=label)) +
  geom_point(data=dataset[dataset[,"label"]=="SI",],
             size=I(2), alpha=I(0.6)) +
  geom_line(data=dataset[dataset[,"label"]=="Seasonal",],
            size=I(1.5), alpha=I(0.9)) +
  geom_line(data=dataset[dataset[,"label"]=="Mean",],
            size=I(1.5), alpha=I(0.4)) +
  theme(legend.title=element_blank(), axis.title.y=element_blank(),
        axis.text.x=element_blank())

pout <- p1 + facet_grid(. ~ period) +
  xlab(paste("Years:", min(dataset[,4]), "to", max(dataset[,4]))) +
  scale_color_manual(values=c("#666666", "#FF3300", "#0033FF"))
Typing “pout” in R then gives the following standard seasonal and irregular chart:
By grouping according to the periods (e.g. months or quarters), each year can be displayed for each type of period. In this case, each blue dot represents a yearly estimate of the seasonal multiplied by the irregular, the red line shows how the seasonal factor changes over time, and the grey line gives the mean of the seasonal component over the whole time period.
Limiting it to particular periods is easy: just modify the dataset, e.g.
dataset <- dataset[(dataset[,3]==2)|(dataset[,3]==12),]  # period 2 = Feb, 12 = Dec
And re-running the code above to re-generate p1, and then pout gives:
For an analyst, this chart allows you to assess how well the seasonal factor is coping. What you would expect to see is the seasonal factor (red line) wandering nicely through the seasonal and irregular time points (blue dots). If the seasonal factor is inadequate, the blue dots may show some sort of pattern, e.g. all sitting above the seasonal factor, or a systematic occurrence such as leap years or the timing of Easter. If the seasonal and irregular time points are consistently above the seasonal factor, this indicates residual seasonality in the seasonally adjusted estimates, which is a bad thing. These types of patterns are more appropriately tested using regression methods, but some of these impacts can show up visually if you know what you're looking for.
In this example, I'm not too worried about the actual data or giving an interpretation. Because some of the data has been indirectly derived, different estimates for the seasonal and irregular components could be obtained if the time series were directly seasonally adjusted.
This example was more about showing how R and ggplot2 can be used to assess the outputs. In some coming posts I’ll build on this example and show some different representations of the same outputs using ggplot2.
The United States reported their Retail trade statistics the other day (July 16, 2012). Quite rightly, this picked up a bit of press, where it was reported that sales fell for the third month in a row. As is usually the case, these falls refer to changes in the seasonally adjusted estimates. And three falls in a row is starting to look like something bad for all those retailers (and perhaps the wider economy).
But given that the seasonally adjusted estimates still, by definition, contain a degree of volatility and also an underlying trend, we can go one better and derive our own smoothed estimate of the seasonally adjusted estimates. This will help us cut through the volatility and check out the underlying direction of the data.
We derived a trend estimate in the following way.
- Downloaded the data from here: http://www.census.gov/retail/marts/www/timeseries.html
- Plugged them into R (statistical package)
- Applied a 13 term Henderson filter to the full seasonally adjusted data to generate a trend estimate
- Used the ggplot2 package in R, which produces very nice plots (but can take some effort to get the data into the right format, e.g. a dataframe with all the right bits)
- And we get the following picture with a trend line…
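The filtering step above can be sketched like this (a sketch only: the weights are the standard 13-term Henderson values rounded to three decimals, and a real implementation would also apply asymmetric end-weights so the most recent months are not lost):

```r
# 13-term Henderson moving average via stats::filter.
# The symmetric weights sum to 1; the first and last 6 points come
# out as NA because no asymmetric end-weights are applied here.
h13 <- c(-0.019, -0.028, 0.000, 0.066, 0.147, 0.214, 0.240,
         0.214, 0.147, 0.066, 0.000, -0.028, -0.019)

henderson13 <- function(seasadj) {
  stats::filter(seasadj, h13, sides = 2)
}
```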
I'll leave the interpretation to the so-called experts, but it could be that, with these three falls in a row in the seasonally adjusted estimates, retail activity in the US has started to reach a turning point. But. And this is the big but. We'll need more data to be sure, because it is difficult to tell whether what we are seeing is due to random variation or a change in direction of the underlying trend.
Just for completeness, the following table gives the one month percentage change in the different estimates. You can see that the one month change in the seasonally adjusted estimates can jump around, but the one month change in the trend is cutting through this noise and indicating a possible turning point.
| Nov 2011 | Dec 2011 | Jan 2012 | Feb 2012 | Mar 2012 | Apr 2012 | May 2012 | Jun 2012 |
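The one-month percentage changes in the table can be computed with a small helper like this (a generic sketch, not the original code; the numbers shown are made up for illustration):

```r
# One-month percentage change: 100 * (x_t / x_{t-1} - 1).
# The first value has no prior month, so it comes back as NA.
pct_change <- function(x) 100 * (x / c(NA, head(x, -1)) - 1)

pct_change(c(400.0, 408.0, 406.0))   # NA, then +2.0%, then about -0.49%
```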
Other approaches could of course be used, with different filters being applied, and these would give slightly different results depending on the type and length of the filter. It would also have been useful if there were an official estimate of the trend, as it would have saved some time and can be produced as a by-product of the seasonal adjustment process. One good thing about the data that can be downloaded from the census site is the availability of the seasonal factors, and also of the sampling variability of the estimates. This is something you don't often see published, so it is a big plus to have.
Also note that there are a few different estimates floating around in the dataset, particularly: advance estimates, preliminary estimates, revised estimates, and then suppressed and not-available values. This can potentially be a bit confusing, as each of these estimates will have different characteristics. Something to keep in mind when grabbing the latest information from any data source is that it can often be revised as new data becomes available. Actually, this is really a good thing, as it means we at least have the best, latest and most up-to-date information.