| 1 | | |
| 2 | | {{{ |
| 3 | | #!html |
| 4 | | <h1 style="background-color: #D3D3D3; font-weight: bold; font-size: 16pt; padding: 4px; border: 1px solid black;"> |
| 5 | | TimeSeries SciKit |
| 6 | | </h1> |
| 7 | | }}} |
| 8 | | |
| 9 | | [[PageOutline]] |
| 10 | | |
| 11 | | = Introduction = |
| 12 | | |
| 13 | | '''Please note that this package is still experimental and subject to API changes.''' |
| 14 | | |
| 15 | | The `TimeSeries` scikits module provides classes and functions for manipulating, reporting, and plotting time series of various frequencies. |
| 16 | | |
| 17 | | = License = |
| 18 | | |
| 19 | | The `TimeSeries` scikit is free for both commercial and non-commercial use, under the BSD terms. [http://svn.scipy.org/svn/scikits/trunk/timeseries/license.txt Click here for license details]. |
| 20 | | |
| 21 | | = Requirements = |
| 22 | | In order to use the `TimeSeries` package, you need to have the following packages installed: |
| 23 | | |
| 24 | | * [http://www.python.org/download/ Python 2.4 or later] |
| 25 | | * [http://pypi.python.org/pypi/setuptools setuptools] : scikits is a [http://peak.telecommunity.com/DevCenter/setuptools#namespace-packages namespace package], and as a result every scikit requires setuptools to be installed to function properly. |
| 26 | | * [http://sourceforge.net/project/showfiles.php?group_id=1369&package_id=175103 numpy 1.1.0 or later] |
| 27 | | |
| 28 | | == Optional (but recommended) == |
| 29 | | * [http://sourceforge.net/project/showfiles.php?group_id=27747 scipy] : Some of the lib sub-modules (interpolate, moving_funcs) make use of scipy functions. |
| 30 | | * [http://matplotlib.sourceforge.net matplotlib] : matplotlib is required for time series plotting. |
| 31 | | |
| 32 | | = Setup = |
| 33 | | The module includes a setup script which you can use in the standard python manner to compile the C code. If you have difficulty installing, please ask for assistance on the [http://projects.scipy.org/mailman/listinfo/scipy-user scipy-user mailing list]. |
| 34 | | |
| 35 | | The timeseries module itself is currently only available through Subversion (http://svn.scipy.org/svn/scikits/trunk/timeseries). To install it, run {{{python setup.py install}}} in the directory where you checked out the source code. |
| 36 | | |
| 37 | | If you are using Windows and are having trouble compiling the module, please see the following page in the cookbook: [http://www.scipy.org/Cookbook/CompilingExtensionsOnWindowsWithMinGW Compiling Extensions on Windows] |
| 38 | | |
| 39 | | The current plan is to begin making official releases and distributing Windows binaries once an official release of numpy that includes the new masked array implementation has been made. In the meantime, please bear with us. |
| 40 | | |
| 41 | | = Dates = |
| 42 | | Even if you have no use for time series in general, you may still find the `Date` class contained in the module quite useful. A `Date` object combines some date and/or time related information with a given frequency. You can picture the frequency as the unit in which the date is expressed. For example, we can create dates in the following manner: |
| 43 | | |
| 44 | | {{{ |
| 45 | | #!python numbers=disable |
| 46 | | >>> # The following imports are assumed throughout the documentation |
| 47 | | >>> import numpy as np |
| 48 | | >>> import numpy.ma as ma |
| 49 | | >>> import datetime |
| 50 | | >>> import scikits.timeseries as ts |
| 51 | | >>> |
| 52 | | >>> D = ts.Date(freq='D', year=2007, month=1, day=1) |
| 53 | | >>> M = ts.Date(freq='M', year=2007, month=1) |
| 54 | | >>> Y = ts.Date(freq='A', year=2007) |
| 55 | | }}} |
| 56 | | Observe that you only need to specify as much information as is relevant to the frequency. The importance of the frequency will become clearer later on. |
| 57 | | |
| 58 | | ~-A more technical note: `Date` objects are internally stored as integers. The conversion to integers and back is controlled by the frequency. In the example above, the internal representations of the three objects `D`, `M` and `Y` are 732677, 24073 and 2007, respectively. -~ |
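The three integers quoted in the note can be reproduced with the standard library alone. The sketch below is only an illustration: the helper names `daily_value`, `monthly_value` and `annual_value` are hypothetical, and the formulas are assumptions that happen to match the values above (a proleptic Gregorian ordinal for days, a month count starting at January of year 1, and the plain year). It does not show how the package computes them internally.

```python
import datetime


def daily_value(year, month, day):
    # Assumption: a daily date maps to its proleptic Gregorian ordinal.
    return datetime.date(year, month, day).toordinal()


def monthly_value(year, month):
    # Assumption: months are counted from January of year 1 (value 1).
    return (year - 1) * 12 + month


def annual_value(year):
    # Assumption: an annual date is simply its year.
    return year


print(daily_value(2007, 1, 1), monthly_value(2007, 1), annual_value(2007))
# prints: 732677 24073 2007
```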
| 59 | | |
| 60 | | == Construction of a `Date` object == |
| 61 | | Several options are available to construct a `Date` object explicitly. In each case, the frequency must be given, either with the `freq` keyword or as the first positional argument. Valid frequency specifications are given in the section [#Frequencies Frequencies] below. |
| 62 | | |
| 63 | | * Give appropriate values to any of the `year`, `month`, `day`, `quarter`, `hour`, `minute`, `second` arguments. |
| 64 | | {{{ |
| 65 | | #!python numbers=disable |
| 66 | | >>> ts.Date(freq='Q',year=2004,quarter=3) |
| 67 | | <Q : 2004Q3> |
| 68 | | >>> ts.Date(freq='D',year=2001,month=1,day=1) |
| 69 | | <D : 01-Jan-2001> |
| 70 | | }}} |
| 71 | | * Use the `string` keyword. |
| 72 | | {{{ |
| 73 | | #!python numbers=disable |
| 74 | | >>> ts.Date('D', string='2007-01-01') |
| 75 | | <D : 01-Jan-2007> |
| 76 | | }}} |
| 77 | | * Use the `datetime` keyword with an existing `datetime.datetime` object. |
| 78 | | {{{ |
| 79 | | #!python numbers=disable |
| 80 | | >>> ts.Date('D', datetime=datetime.datetime.now()) |
| 81 | | }}} |
| 82 | | * Use the `value` keyword and provide an integer representation of the date. |
| 83 | | {{{ |
| 84 | | #!python numbers=disable |
| 85 | | >>> ts.Date('D', value=732677) |
| 86 | | <D : 01-Jan-2007> |
| 87 | | }}} |
| 88 | | |
| 89 | | == Frequencies == |
| 90 | | For any functions or class constructors taking a frequency argument, the frequency can be specified in one of two ways: using a valid string representation of the frequency, or using the integer frequency constants. The constants can be found in the timeseries.const sub-module. The following table lists the frequency constants and their valid string aliases. |
| 91 | | || '''CONSTANT''' || '''String aliases (case insensitive)''' || |
| 92 | | '''For annual frequencies, "Year" is determined by where the last month of the year falls.''' |
| 93 | | || '''''FR_ANN''''' || 'A', 'Y', 'ANNUAL', 'ANNUALLY', 'YEAR', 'YEARLY' || |
| 94 | | || '''''FR_ANNDEC''''' || 'A-DEC', 'A-December', 'Y-DEC', 'ANNUAL-DEC', etc... (annual frequency with December year end, equivalent to FR_ANN) || |
| 95 | | || '''''FR_ANNNOV''''' || 'A-NOV', 'A-NOVEMBER', 'Y-NOVEMBER', 'ANNUAL-NOV', etc... (annual frequency with November year end) || |
| 96 | | || '''''FR_ANNOCT''''' || 'A-OCT', 'A-OCTOBER', 'Y-OCTOBER', 'ANNUAL-OCT', etc... (annual frequency with October year end) || |
| 97 | | || '''''FR_ANNSEP''''' || 'A-SEP', 'A-SEPTEMBER', 'Y-SEPTEMBER', 'ANNUAL-SEP', etc... (annual frequency with September year end) || |
| 98 | | || '''''FR_ANNAUG''''' || 'A-AUG', 'A-AUGUST', 'Y-AUGUST', 'ANNUAL-AUG', etc... (annual frequency with August year end) || |
| 99 | | || '''''FR_ANNJUL''''' || 'A-JUL', 'A-JULY', 'Y-JULY', 'ANNUAL-JUL', etc... (annual frequency with July year end) || |
| 100 | | || '''''FR_ANNJUN''''' || 'A-JUN', 'A-JUNE', 'Y-JUNE', 'ANNUAL-JUN', etc... (annual frequency with June year end) || |
| 101 | | || '''''FR_ANNMAY''''' || 'A-MAY', 'Y-MAY', 'YEARLY-MAY', 'ANNUAL-MAY', etc... (annual frequency with May year end) || |
| 102 | | || '''''FR_ANNAPR''''' || 'A-APR', 'A-APRIL', 'Y-APRIL', 'ANNUAL-APR', etc... (annual frequency with April year end) || |
| 103 | | || '''''FR_ANNMAR''''' || 'A-MAR', 'A-MARCH', 'Y-MARCH', 'ANNUAL-MAR', etc... (annual frequency with March year end) || |
| 104 | | || '''''FR_ANNFEB''''' || 'A-FEB', 'A-FEBRUARY', 'Y-FEBRUARY', 'ANNUAL-FEB', etc... (annual frequency with February year end) || |
| 105 | | || '''''FR_ANNJAN''''' || 'A-JAN', 'A-JANUARY', 'Y-JANUARY', 'ANNUAL-JAN', etc... (annual frequency with January year end) || |
| 106 | | '''For the following quarterly frequencies, "Year" is determined by where the last quarter of the current group of quarters ENDS''' |
| 107 | | || '''''FR_QTR''''' || 'Q', 'QUARTER', 'QUARTERLY' || |
| 108 | | || '''''FR_QTREDEC''''' || 'Q-DEC', 'QTR-December', 'QUARTERLY-DEC', etc... (quarterly frequency with December year end, equivalent to FR_QTR) || |
| 109 | | || '''''FR_QTRENOV''''' || 'Q-NOV', 'QTR-NOVEMBER', 'QUARTERLY-NOV', etc... (quarterly frequency with November year end) || |
| 110 | | || '''''FR_QTREOCT''''' || 'Q-OCT', 'QTR-OCTOBER', 'QUARTERLY-OCT', etc... (quarterly frequency with October year end) || |
| 111 | | || '''''FR_QTRESEP''''' || 'Q-SEP', 'QTR-SEPTEMBER', 'QUARTERLY-SEP', etc... (quarterly frequency with September year end) || |
| 112 | | || '''''FR_QTREAUG''''' || 'Q-AUG', 'QTR-AUGUST', 'QUARTERLY-AUG', etc... (quarterly frequency with August year end) || |
| 113 | | || '''''FR_QTREJUL''''' || 'Q-JUL', 'QTR-JULY', 'QUARTERLY-JUL', etc... (quarterly frequency with July year end) || |
| 114 | | || '''''FR_QTREJUN''''' || 'Q-JUN', 'QTR-JUNE', 'QUARTERLY-JUN', etc... (quarterly frequency with June year end) || |
| 115 | | || '''''FR_QTREMAY''''' || 'Q-MAY', 'QTR-MAY', 'QUARTERLY-MAY', etc... (quarterly frequency with May year end) || |
| 116 | | || '''''FR_QTREAPR''''' || 'Q-APR', 'QTR-APRIL', 'QUARTERLY-APR', etc... (quarterly frequency with April year end) || |
| 117 | | || '''''FR_QTREMAR''''' || 'Q-MAR', 'QTR-MARCH', 'QUARTERLY-MAR', etc... (quarterly frequency with March year end) || |
| 118 | | || '''''FR_QTREFEB''''' || 'Q-FEB', 'QTR-FEBRUARY', 'QUARTERLY-FEB', etc... (quarterly frequency with February year end) || |
| 119 | | || '''''FR_QTREJAN''''' || 'Q-JAN', 'QTR-JANUARY', 'QUARTERLY-JAN', etc... (quarterly frequency with January year end) || |
| 120 | | '''For the following quarterly frequencies, "Year" is determined by where the first quarter of the current group of quarters STARTS''' |
| 121 | | || '''''FR_QTRSDEC''''' || 'Q-S-DEC', 'QTR-S-December', etc... (quarterly frequency with December year end) || |
| 122 | | || '''''FR_QTRSNOV''''' || 'Q-S-NOV', 'QTR-S-NOVEMBER', etc... (quarterly frequency with November year end) || |
| 123 | | || '''''FR_QTRSOCT''''' || 'Q-S-OCT', 'QTR-S-OCTOBER', etc... (quarterly frequency with October year end) || |
| 124 | | || '''''FR_QTRSSEP''''' || 'Q-S-SEP', 'QTR-S-SEPTEMBER', etc... (quarterly frequency with September year end) || |
| 125 | | || '''''FR_QTRSAUG''''' || 'Q-S-AUG', 'QTR-S-AUGUST', etc... (quarterly frequency with August year end) || |
| 126 | | || '''''FR_QTRSJUL''''' || 'Q-S-JUL', 'QTR-S-JULY', etc... (quarterly frequency with July year end) || |
| 127 | | || '''''FR_QTRSJUN''''' || 'Q-S-JUN', 'QTR-S-JUNE', etc... (quarterly frequency with June year end) || |
| 128 | | || '''''FR_QTRSMAY''''' || 'Q-S-MAY', 'QTR-S-MAY', etc... (quarterly frequency with May year end) || |
| 129 | | || '''''FR_QTRSAPR''''' || 'Q-S-APR', 'QTR-S-APRIL', etc... (quarterly frequency with April year end) || |
| 130 | | || '''''FR_QTRSMAR''''' || 'Q-S-MAR', 'QTR-S-MARCH', etc... (quarterly frequency with March year end) || |
| 131 | | || '''''FR_QTRSFEB''''' || 'Q-S-FEB', 'QTR-S-FEBRUARY', etc... (quarterly frequency with February year end) || |
| 132 | | || '''''FR_QTRSJAN''''' || 'Q-S-JAN', 'QTR-S-JANUARY', etc... (quarterly frequency with January year end) || |
| 133 | | |
| 134 | | || '''''FR_MTH''''' || 'M', 'MONTH', 'MONTHLY' || |
| 135 | | || '''''FR_WK''''' || 'W', 'WEEK', 'WEEKLY' || |
| 136 | | || '''''FR_WKSUN''''' || 'W-SUN', 'WEEK-SUNDAY', 'WEEKLY-SUN', etc... (weekly frequency with Sunday being the last day of the week, equivalent to FR_WK) || |
| 137 | | || '''''FR_WKSAT''''' || 'W-SAT', 'WEEK-SATURDAY', 'WEEKLY-SAT', etc... (weekly frequency with Saturday being the last day of the week) || |
| 138 | | || '''''FR_WKFRI''''' || 'W-FRI', 'WEEK-FRIDAY', 'WEEKLY-FRI', etc... (weekly frequency with Friday being the last day of the week) || |
| 139 | | || '''''FR_WKTHU''''' || 'W-THU', 'WEEK-THURSDAY', 'WEEKLY-THU', etc... (weekly frequency with Thursday being the last day of the week) || |
| 140 | | || '''''FR_WKWED''''' || 'W-WED', 'WEEK-WEDNESDAY', 'WEEKLY-WED', etc... (weekly frequency with Wednesday being the last day of the week) || |
| 141 | | || '''''FR_WKTUE''''' || 'W-TUE', 'WEEK-TUESDAY', 'WEEKLY-TUE', etc... (weekly frequency with Tuesday being the last day of the week) || |
| 142 | | || '''''FR_WKMON''''' || 'W-MON', 'WEEK-MONDAY', 'WEEKLY-MON', etc... (weekly frequency with Monday being the last day of the week) || |
| 143 | | || '''''FR_BUS''''' || 'B', 'BUSINESS', 'BUSINESSLY' || |
| 144 | | || '''''FR_DAY''''' || 'D', 'DAY', 'DAILY' || |
| 145 | | || '''''FR_HR''''' || 'H', 'HOUR', 'HOURLY' || |
| 146 | | || '''''FR_MIN''''' || 'T', 'MINUTE', 'MINUTELY' || |
| 147 | | || '''''FR_SEC''''' || 'S', 'SECOND', 'SECONDLY' || |
| 148 | | || '''''FR_UND''''' || 'U', 'UNDEF', 'UNDEFINED' || |
| 149 | | |
| 150 | | == Convenience functions == |
| 151 | | |
| 152 | | * '''now''' : get the current Date at a specified frequency |
| 153 | | * '''prevbusday''' : get the previous business day, determined by a specified cut off time. See the function's doc string for more details. |
| 154 | | |
| 155 | | == Manipulating dates == |
| 156 | | |
| 157 | | You can convert a `Date` object from one frequency to another with the `asfreq` method. When converting to a higher frequency (for example, from monthly to daily), you may optionally specify the "relation" parameter with the value "START" or "END" (default is "END"). Note that if you convert a daily `Date` to a monthly frequency and back to a daily one, you will lose your day information in the process (similarly for converting any higher frequency to a lower one): |
| 158 | | |
| 159 | | {{{ |
| 160 | | #!python numbers=disable |
| 161 | | >>> D = ts.Date('D', year=2007, month=12, day=31) |
| 162 | | >>> D.asfreq('M') |
| 163 | | <M : Dec-2007> |
| 164 | | >>> D.asfreq('M').asfreq('D', relation="START") |
| 165 | | <D : 01-Dec-2007> |
| 166 | | >>> D.asfreq('M').asfreq('D', relation="END") |
| 167 | | <D : 31-Dec-2007> |
| 168 | | }}} |
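To make the loss of day information concrete, here is a small standard-library sketch of the same round trip. The helper `month_to_day` is hypothetical and only mimics the "START"/"END" choice described above; it is not the `asfreq` implementation.

```python
import calendar
import datetime


def month_to_day(year, month, relation="END"):
    """Map a (year, month) period to its first or last calendar day."""
    relation = relation.upper()
    if relation == "START":
        return datetime.date(year, month, 1)
    if relation == "END":
        last_day = calendar.monthrange(year, month)[1]
        return datetime.date(year, month, last_day)
    raise ValueError("relation must be 'START' or 'END'")


day = datetime.date(2007, 12, 15)
month = (day.year, day.month)  # going to monthly drops the day (15)

print(month_to_day(*month, relation="START"))  # 2007-12-01
print(month_to_day(*month, relation="END"))    # 2007-12-31
```

Whatever day the original date carried, only the first or last day of the month can be recovered afterwards.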
| 169 | | |
| 170 | | You can add and subtract integers from a `Date` object to get a new `Date` object. The frequency of the new object is the same as the original one. For example: |
| 171 | | |
| 172 | | {{{ |
| 173 | | #!python numbers=disable |
| 174 | | >>> yesterday = ts.now('D') - 1 |
| 175 | | >>> infivemonths = ts.now('M') + 5 |
| 176 | | }}} |
| 177 | | |
| 178 | | You can also subtract a Date from another Date of the same frequency to determine the number of periods between the two dates. |
| 179 | | |
| 180 | | {{{ |
| 181 | | #!python numbers=disable |
| 182 | | >>> Y = ts.Date('A', year=2007) |
| 183 | | >>> days_in_year = Y.asfreq('D', relation='END') - Y.asfreq('D', relation='START') + 1 |
| 184 | | >>> days_in_year |
| 185 | | 365 |
| 186 | | }}} |
| 187 | | |
| 188 | | Some other methods worth mentioning are: |
| 189 | | |
| 190 | | * `toordinal` : converts an object to the equivalent proleptic Gregorian ordinal. |
| 191 | | * `tostring` : converts an object to the corresponding string. |
| 192 | | |
| 193 | | == Formatting Dates as Strings == |
| 194 | | |
| 195 | | To output a date as a string, you can simply cast it to a string (call str on it) and a default output format for that frequency will be used, or you can use the strfmt method for explicit control. The strfmt method of the Date class takes one argument: a format string. This behaves in essentially the same manner as the ''strftime'' function in the standard python time module and accepts the same directives, plus several additional directives outlined below. |
| 196 | | |
| 197 | | || '''Directive''' || '''Meaning''' || |
| 198 | | || %q || the ''quarter'' of the date || |
| 199 | | || %f || Year without century as a decimal number [00,99]. The ''year'' in this case is the year of the date determined by the year for the current quarter. This is the same as %y unless the Date is one of the quarterly frequencies. In financial terms, this is the 'fiscal year'. || |
| 200 | | || %F || Year with century as a decimal number. The ''year'' in this case is the year of the date determined by the year for the current quarter. This is the same as %Y unless the Date is one of the quarterly frequencies. In financial terms, this is the 'fiscal year'. || |
| 201 | | |
| 202 | | '''Examples''' |
| 203 | | {{{ |
| 204 | | #!python numbers=disable |
| 205 | | >>> a = ts.Date(freq='q-jul', year=2006, quarter=1) |
| 206 | | >>> a.strfmt("%F-Q%q") |
| 207 | | '2006-Q1' |
| 208 | | >>> a.strfmt("%b-%Y") # this will output the last month in the quarter for this date |
| 209 | | 'Oct-2005' |
| 210 | | >>> b = ts.Date(freq='d', year=2006, month=4, day=25) |
| 211 | | >>> b.strfmt("%d-%b-%Y") |
| 212 | | '25-Apr-2006' |
| 213 | | }}} |
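The first example may look surprising: quarter 1 of fiscal year 2006 ends in October 2005. The arithmetic behind it can be sketched as follows, assuming that the fiscal year is named after the calendar year in which it ends (here July 2006) and that quarters are counted back from that month in steps of three. The function `quarter_end_month` is a hypothetical helper written for this illustration, not part of the package.

```python
import calendar


def quarter_end_month(fiscal_year, quarter, year_end_month):
    """Return (calendar_year, month) of the last month of a fiscal quarter."""
    # Count months from year 0, then step back three months for every
    # quarter that separates this one from the fourth quarter.
    months = fiscal_year * 12 + (year_end_month - 1) - 3 * (4 - quarter)
    year, month_index = divmod(months, 12)
    return year, month_index + 1


year, month = quarter_end_month(2006, 1, 7)  # 'q-jul', fiscal 2006, Q1
print("%s-%d" % (calendar.month_abbr[month], year))  # Oct-2005
```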
| 214 | | |
| 215 | | = !DateArray objects = |
| 216 | | `DateArrays` are simply ndarrays of `Date` objects. They accept the same methods as a `Date` object, with the addition of: |
| 217 | | |
| 218 | | * `tovalue` : converts the array to an array of integers. Each integer is the internal representation of the corresponding date. |
| 219 | | * `has_missing_dates` : outputs a boolean on whether some dates are missing or not. |
| 220 | | * `has_duplicated_dates` : outputs a boolean on whether some dates are duplicated or not. |
| 221 | | |
| 222 | | == Construction == |
| 223 | | To construct a `DateArray` object, you can use the factory function `date_array` (preferred), or call the class directly. See the `__doc__` strings of date_array and `DateArray` for parameter details. |
| 224 | | |
| 225 | | = TimeSeries = |
| 226 | | A `TimeSeries` object is the combination of three ndarrays: |
| 227 | | |
| 228 | | * `dates`: `DateArray` object. |
| 229 | | * `data` : ndarray. |
| 230 | | * `mask` : Boolean ndarray, indicating missing or invalid data. |
| 231 | | These three arrays can be accessed as attributes of a `TimeSeries` object. Another very useful attribute is `series`, which gives you direct access to `data` and `mask` as a masked array. |
| 232 | | |
| 233 | | == Construction == |
| 234 | | To construct a `TimeSeries`, you can use the factory function `time_series` (preferred) or call the class directly. See the `__doc__` strings of time_series and TimeSeries for parameter details. |
| 235 | | |
| 236 | | Use the class constructor when you want to bypass some of the overhead associated with the additional flexibility in the factory function.[[BR]] |
| 237 | | |
| 238 | | Let us construct a series of 600 random elements, starting 600 business days ago, at a business daily frequency: |
| 239 | | |
| 240 | | {{{ |
| 241 | | #!python numbers=disable |
| 242 | | >>> data = np.random.uniform(-100,100,600) |
| 243 | | >>> today = ts.now('B') |
| 244 | | >>> series = ts.time_series(data, dtype=np.float_, freq='B', start_date=today-600) |
| 245 | | }}} |
| 246 | | We can check that `series.dates` is a `DateArray` object and that `series.series` is a `MaskedArray` object. |
| 247 | | |
| 248 | | {{{ |
| 249 | | #!python numbers=disable |
| 250 | | >>> isinstance(series.dates, ts.DateArray) |
| 251 | | True |
| 252 | | >>> isinstance(series.series, ma.MaskedArray) |
| 253 | | True |
| 254 | | }}} |
| 255 | | So, if you are already familiar with `MaskedArray`, using `TimeSeries` should be straightforward. Just keep in mind that another attribute is always present, `dates`. |
| 256 | | |
| 257 | | == Dates and Data compatibility == |
| 258 | | The example we just introduced corresponds to the simplest case of only one variable indexed in time. In that case, the `DateArray` object should have the same size as the `data` part. In our example, the length of the `DateArray` was automatically adjusted to match the data length, and we have {{{DateArray.size == series.size}}}. |
| 259 | | |
| 260 | | However, it is often convenient to use series with multiple variables. A simple representation of this kind of data is a matrix, with as many rows as actual observations and as many columns as variables. In that case, the `DateArray` object should have the same length as the number of rows. More generally, {{{DateArray.size}}} should be equal to {{{series.shape[0]}}}. |
| 261 | | |
| 262 | | When a `TimeSeries` is created from a multi-dimensional `data` and a single starting date, it is assumed that the data consists of several variables: the length of the `DateArray` is then adjusted to match {{{len(data)}}}. However, you can force the length of the `DateArray` with the {{{length}}} optional parameter. |
| 263 | | |
| 264 | | For example, let us consider the case of an array of (50 x 12) points, corresponding to 50 years of monthly data. |
| 265 | | |
| 266 | | {{{ |
| 267 | | #!python numbers=disable |
| 268 | | >>> data = np.random.uniform(-1,1,50*12).reshape(50,12) |
| 269 | | }}} |
| 270 | | We may want to consider each month independently from the others: in that case, we want an annual series of 50 observations, each observation consisting of 12 variables. We define the time series as: |
| 271 | | |
| 272 | | {{{ |
| 273 | | #!python numbers=disable |
| 274 | | >>> newseries = ts.time_series(data, start_date=ts.now('Y')-50) |
| 275 | | >>> newseries._dates.size |
| 276 | | 50 |
| 277 | | }}} |
| 278 | | But we can also consider the series as monthly data. We could even ravel the initial data, or force the length of the `DateArray`: |
| 279 | | |
| 280 | | {{{ |
| 281 | | #!python numbers=disable |
| 282 | | >>> newseries = ts.time_series(data, start_date=ts.now('M')-600, length=600) |
| 283 | | >>> newseries._dates.size |
| 284 | | 600 |
| 285 | | }}} |
| 286 | | Now, let us consider the case of a (5x10x10) array. For example, each (10x10) slice could be a raster map, or a picture. The following code defines a daily series of 5 maps: |
| 287 | | |
| 288 | | {{{ |
| 289 | | #!python numbers=disable |
| 290 | | >>> data = np.random.uniform(-1,1,5*10*10).reshape(5,10,10) |
| 291 | | >>> newseries = ts.time_series(data, start_date=ts.now('D')) |
| 292 | | }}} |
| 293 | | == Indexing == |
| 294 | | Elements of a `TimeSeries` can be accessed just like with regular ndarrays. Thus, |
| 295 | | |
| 296 | | {{{ |
| 297 | | #!python numbers=disable |
| 298 | | >>> series[0] |
| 299 | | }}} |
| 300 | | outputs the first element, while |
| 301 | | |
| 302 | | {{{ |
| 303 | | #!python numbers=disable |
| 304 | | >>> series[-30:] |
| 305 | | }}} |
| 306 | | outputs the last 30 elements. |
| 307 | | |
| 308 | | But you can also use a date: |
| 309 | | |
| 310 | | {{{ |
| 311 | | #!python numbers=disable |
| 312 | | >>> thirtydaysago = today - 30 |
| 313 | | >>> series[thirtydaysago:] |
| 314 | | }}} |
| 315 | | or even a string... |
| 316 | | |
| 317 | | {{{ |
| 318 | | #!python numbers=disable |
| 319 | | >>> series[thirtydaysago.tostring():] |
| 320 | | }}} |
| 321 | | or a sequence/ndarray of integers... |
| 322 | | |
| 323 | | {{{ |
| 324 | | #!python numbers=disable |
| 325 | | >>> series[[0,-1]] |
| 326 | | }}} |
| 327 | | ~-The latter is quite useful: it gives you the first and last data points of your series.-~ |
| 328 | | |
| 329 | | In a similar way, setting elements of a `TimeSeries` works seamlessly. Let us set negative values to zero... |
| 330 | | |
| 331 | | {{{ |
| 332 | | #!python numbers=disable |
| 333 | | >>> series[series<0] = 0 |
| 334 | | }}} |
| 335 | | ... and the values falling on Fridays to 100 |
| 336 | | |
| 337 | | {{{ |
| 338 | | #!python numbers=disable |
| 339 | | >>> series[series.weekday == 4] = 100 |
| 340 | | }}} |
| 341 | | |
| 342 | | We can also index on multiple criteria. We will create a temporary array of 'weekdays' to avoid recomputing the weekdays multiple times. Here we will set all Wednesdays and Fridays to 100. |
| 343 | | |
| 344 | | {{{ |
| 345 | | #!python numbers=disable |
| 346 | | >>> weekdays = series.weekday |
| 347 | | >>> series[(weekdays == 2) | (weekdays == 4)] = 100 |
| 348 | | }}} |
| 349 | | You should keep in mind that `TimeSeries` are basically `MaskedArrays`. If some data of an array are masked, you will not be able to use this array as an index; you will have to fill it first. |
| 350 | | |
| 351 | | == Missing Observations (aka masked values) == |
| 352 | | |
| 353 | | Missing observations are handled in exactly the same way as with masked arrays. If you are familiar with masked arrays, then there is nothing new to learn. Please see the main numpy documentation for additional info on masked arrays. |
| 354 | | |
| 355 | | == Operations on TimeSeries == |
| 356 | | If you work with only one `TimeSeries`, you can use the `maskedarray` commands to process the data. For example: |
| 357 | | |
| 358 | | {{{ |
| 359 | | #!python numbers=disable |
| 360 | | >>> series_log = ma.log(series) |
| 361 | | }}} |
| 362 | | Note that invalid values (negative, in that case) are automatically masked. Note also that you could use the standard numpy version of the function instead; however, the `reduce` and `accumulate` methods of some ufuncs (such as add or multiply) will only function properly with the `maskedarray` versions. ~-The reason is that the methods of the numpy.ufuncs will not know how to properly ignore masked values for such operations.-~[[BR]][[BR]] |
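The masking behaviour can be checked on a plain masked array, without any time series involved. This is a minimal sketch using only `numpy.ma`; the sample values are made up for the illustration.

```python
import numpy as np
import numpy.ma as ma

values = ma.array([1.0, -1.0, 100.0])

# The logarithm of a negative number is invalid, so that entry is masked.
logs = ma.log(values)
print(logs.mask.tolist())  # [False, True, False]

# The masked-array version of the ufunc skips the masked entry when reducing.
total = ma.add.reduce(logs)
print(abs(float(total) - np.log(100.0)) < 1e-12)  # True
```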
| 363 | | |
| 364 | | When working with multiple series, only series of the same frequency, size and starting date can be used in basic operations. The function `align_series` ~-(or its alias `aligned`)-~ forces series to have matching starting and ending dates. By default, the starting date will be set to the smallest starting date of the series, and the ending date to the largest.[[BR]][[BR]] |
| 365 | | |
| 366 | | Let's construct a list of months, starting on Jan 2005 and ending on Dec 2006, with a gap from Oct 2005 to Jan 2006. |
| 367 | | |
| 368 | | {{{ |
| 369 | | #!python numbers=disable |
| 370 | | >>> mlist_1 = ['2005-%02i' % i for i in range(1,10)] |
| 371 | | >>> mlist_1 += ['2006-%02i' % i for i in range(2,13)] |
| 372 | | >>> mdata_1 = np.arange(len(mlist_1)) |
| 373 | | >>> mser_1 = ts.time_series(mdata_1, mlist_1, freq='M') |
| 374 | | }}} |
| 375 | | Let us check whether there are some duplicated dates (no): |
| 376 | | |
| 377 | | {{{ |
| 378 | | #!python numbers=disable |
| 379 | | >>> mser_1.has_duplicated_dates() |
| 380 | | False |
| 381 | | }}} |
| 382 | | ...or missing dates (yes): |
| 383 | | |
| 384 | | {{{ |
| 385 | | #!python numbers=disable |
| 386 | | >>> mser_1.has_missing_dates() |
| 387 | | True |
| 388 | | }}} |
| 389 | | Let us construct a second monthly series, this time without missing dates: |
| 390 | | |
| 391 | | {{{ |
| 392 | | #!python numbers=disable |
| 393 | | >>> mlist_2 = ['2004-%02i' % i for i in range(1,13)] |
| 394 | | >>> mlist_2 += ['2005-%02i' % i for i in range(1,13)] |
| 395 | | >>> mser_2 = ts.time_series(np.arange(len(mlist_2)), mlist_2, freq='M') |
| 396 | | }}} |
| 397 | | |
| 398 | | We cannot perform binary operations on these two series (such as adding them together) because the dates of the series do not line up. Thus, we need to align them first. |
| 399 | | {{{ |
| 400 | | #!python numbers=disable |
| 401 | | >>> (malg_1, malg_2) = ts.align_series(mser_1, mser_2) |
| 402 | | }}} |
| 403 | | Now we can add the two series. Only the data that fall on dates common to the original, non-aligned series will actually be added; the others will be masked. After all, we are adding masked arrays. |
| 404 | | {{{ |
| 405 | | #!python numbers=disable |
| 406 | | >>> mser_3 = malg_1 + malg_2 |
| 407 | | }}} |
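The effect of aligning and then adding can be pictured with plain Python containers. In the sketch below, each series is a dictionary keyed by `(year, month)` and `None` stands in for a masked value. The helpers `month_range`, `align` and `add_masked` are hypothetical names for this illustration and do not reproduce how `align_series` is implemented.

```python
def month_range(start, end):
    """Yield (year, month) tuples from start to end, inclusive."""
    year, month = start
    while (year, month) <= end:
        yield (year, month)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def align(*series):
    """Spread each {(year, month): value} dict over the common date span."""
    start = min(min(s) for s in series)
    end = max(max(s) for s in series)
    span = list(month_range(start, end))
    # None marks a date that is absent from a series (a masked slot).
    return span, [[s.get(date) for date in span] for s in series]


def add_masked(a, b):
    """Add element-wise; the result is masked wherever either input is."""
    return [None if x is None or y is None else x + y for x, y in zip(a, b)]


# Same dates as mser_1 and mser_2 in the text.
dates_1 = [(2005, m) for m in range(1, 10)] + [(2006, m) for m in range(2, 13)]
dates_2 = [(2004, m) for m in range(1, 13)] + [(2005, m) for m in range(1, 13)]
ser_1 = {date: i for i, date in enumerate(dates_1)}
ser_2 = {date: i for i, date in enumerate(dates_2)}

span, (alg_1, alg_2) = align(ser_1, ser_2)
total = add_masked(alg_1, alg_2)

print(len(span))                              # 36 months, Jan 2004 to Dec 2006
print(sum(v is not None for v in total))      # 9 dates are common to both
```

Only January to September 2005 appear in both inputs, so only those nine positions of the sum carry a value.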
| 408 | | |
| 409 | | We could have filled the initial series first (replace masked values with a specified value): |
| 410 | | {{{ |
| 411 | | #!python numbers=disable |
| 412 | | >>> mser_3 = malg_1.filled(0) + malg_2.filled(0) |
| 413 | | }}} |
| 414 | | |
| 415 | | When aligning the series, we could have forced the series to start/end at some given dates: |
| 416 | | {{{ |
| 417 | | #!python numbers=disable |
| 418 | | >>> (malg_1, malg_2) = ts.align_series(mser_1, mser_2, start_date='2004-06', end_date='2006-06') |
| 419 | | }}} |
| 420 | | |
| 421 | | == Time Shifting Operations == |
| 422 | | Calculating things like rate of change, or difference in a `TimeSeries` can be done most easily using a special method called tshift. |
| 423 | | |
| 424 | | Suppose we want to calculate a Year over Year rate of return for a monthly time series. One might initially try to do something along the lines of... |
| 425 | | |
| 426 | | {{{ |
| 427 | | #!python numbers=disable |
| 428 | | >>> YoY_change = 100*(mser[12:]/mser[:-12] - 1) |
| 429 | | }}} |
| 430 | | This will give you the correct numerical result, but since mser[12:] and mser[:-12] have different start and end dates, the result will be forced to a plain `MaskedArray`. Also, it will not be the same shape as your original input series, which may also be inconvenient. To get around these issues, use the tshift method instead. |
| 431 | | |
| 432 | | {{{ |
| 433 | | #!python numbers=disable |
| 434 | | >>> YoY_change = 100*(mser/mser.tshift(-12, copy=False) - 1) |
| 435 | | }}} |
| 436 | | mser.tshift(-12, copy=False) returns a series with the same start_date and end_date as mser, but values shifted to the right by 12 periods. Note that this will result in 12 masked values at the start of the resulting series. By default tshift copies any data it uses from the original series, but for situations like the example above you may want to avoid that. |
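The shifting described above can be sketched on a plain list, with `None` standing in for masked values. The function `shift_periods` is a hypothetical stand-in written for this illustration; it follows the behaviour described in the text (a negative shift moves values to the right and leaves masked slots at the start) but is not the `tshift` implementation.

```python
def shift_periods(values, nper):
    """Shift values by nper periods, keeping the length; gaps become None."""
    n = len(values)
    if nper == 0:
        return list(values)
    if nper < 0:
        k = min(-nper, n)
        return [None] * k + list(values[:n - k])
    k = min(nper, n)
    return list(values[k:]) + [None] * k


# Two years of made-up monthly values: 100.0 in year one, 110.0 in year two.
monthly = [100.0] * 12 + [110.0] * 12
previous_year = shift_periods(monthly, -12)

# Year-over-year change in percent; undefined (None) for the first 12 months.
yoy = [None if p is None else 100 * (c / p - 1)
       for c, p in zip(monthly, previous_year)]

print(len(yoy) == len(monthly))   # True: same shape as the input
print(yoy[:12].count(None))       # 12
```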
| 437 | | |
| 438 | | == TimeSeries Frequency Conversion == |
| 439 | | To convert a `TimeSeries` to another frequency, use the `convert` method or function. The optional argument `func` must be a function that acts on a 1D masked array and returns a scalar. |
| 440 | | |
| 441 | | {{{ |
| 442 | | #!python numbers=disable |
| 443 | | >>> mseries = series.convert('M',func=ma.average) |
| 444 | | }}} |
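Conceptually, converting to a lower frequency with a `func` groups the observations by target period and reduces each group to a scalar. The sketch below shows that idea with standard-library dates; `convert_to_monthly` and `mean` are hypothetical helpers, and the real `convert` works on masked arrays rather than lists.

```python
import datetime


def convert_to_monthly(dates, values, func):
    """Group values by (year, month) and reduce each group with func."""
    groups = {}
    for date, value in zip(dates, values):
        groups.setdefault((date.year, date.month), []).append(value)
    return {period: func(group) for period, group in sorted(groups.items())}


def mean(xs):
    return sum(xs) / len(xs)


dates = [datetime.date(2007, 1, 30),
         datetime.date(2007, 1, 31),
         datetime.date(2007, 2, 1)]
values = [1.0, 3.0, 10.0]

monthly = convert_to_monthly(dates, values, mean)
print(monthly)  # {(2007, 1): 2.0, (2007, 2): 10.0}
```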
| 445 | | |
| 446 | | If `func` is None (the default value), the convert method/function returns a 2D array, where each row corresponds to the new frequency, and the columns to the original data. In our example, `convert` will return a 2D array with 23 columns, as there are at most 23 business days per month. |
| 447 | | {{{ |
| 448 | | #!python numbers=disable |
| 449 | | >>> mseries_default = series.convert('M') |
| 450 | | }}} |
| 451 | | |
| 452 | | When converting from a lower frequency to a higher frequency, an extra argument `position` is used to determine the placement of values in the resulting series. The value of the argument is either 'START' or 'END' ('END' by default). This will yield a series with a lot of masked values. To fill in these masked values, see the section [#InterpolatingMaskedValues Interpolating Masked Values] below. |
| 453 | | |
| 454 | | [[BR]] |
| 455 | | '''"asfreq" vs "convert"''': Be careful not to confuse these two methods. "asfreq" simply takes every date in the .dates attribute of the `TimeSeries` and changes it to the specified frequency, so the resulting series will have the same shape as the original series. "convert" is a more complicated function that takes a series with no missing or duplicated dates and creates a series at the new frequency with no missing or duplicated dates and intelligently places the data from the original series into appropriate points in the new series. |
| 456 | | |
| 457 | | == Interpolating Masked Values == |
| 458 | | The timeseries.interpolate sub-module contains several functions for filling in masked values in an array. Currently this includes: |
| 459 | | |
| 460 | | * interp_masked1d |
| 461 | | * forward_fill |
| 462 | | * backward_fill |
| 463 | | Let us take a monthly `TimeSeries` , convert it to business frequency, and then interpolate the resulting masked values. |
| 464 | | |
| 465 | | {{{ |
| 466 | | #!python numbers=disable |
| 467 | | >>> import scikits.timeseries.lib.interpolate as itp |
| 468 | | >>> mser = ts.time_series(np.arange(12, dtype=np.float_), start_date=ts.now('M')) |
| 469 | | >>> bser = mser.convert("B", position='END') |
| 470 | | >>> bser_ffill = itp.forward_fill(bser, maxgap=30) |
| 471 | | >>> bser_bfill = itp.backward_fill(bser) |
| 472 | | >>> bser_linear = itp.interp_masked1d(bser, kind='linear') |
| 473 | | }}} |
| 474 | | The optional `maxgap` parameter of `forward_fill` and `backward_fill` ensures that any run of more than `maxgap` consecutive masked values is left unfilled. Using `maxgap=30`, as in the example above, ensures that months missing from the original monthly series are not filled in.
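The `maxgap` rule can be illustrated without the scikit itself. The following is only a sketch in plain Python, not the library's implementation: the name `forward_fill_sketch` is hypothetical, and `None` stands in for a masked value.

```python
def forward_fill_sketch(values, maxgap=None):
    """Fill each run of None with the last valid value before it.

    A run longer than maxgap is left untouched, following the rule
    described in the text. Illustration only, not the library code.
    """
    filled = list(values)
    i, n = 0, len(filled)
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        # locate the end of this run of missing values
        j = i
        while j < n and filled[j] is None:
            j += 1
        gap = j - i
        # a leading run has no previous value to copy, so it stays missing
        if i > 0 and (maxgap is None or gap <= maxgap):
            for k in range(i, j):
                filled[k] = filled[i - 1]
        i = j
    return filled


data = [1.0, None, None, 2.0, None, None, None, None, 3.0]
print(forward_fill_sketch(data, maxgap=2))
# prints [1.0, 1.0, 1.0, 2.0, None, None, None, None, 3.0]
```

The run of two missing values is filled, while the run of four exceeds `maxgap=2` and is left as it was.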
| 475 | | |
| 476 | | = Reports = |
| 477 | | == Report Class == |
| 478 | | The Report class allows you to generate tabular reports of `TimeSeries` objects with dates in the leftmost column. An instance of the Report class is essentially a template for generating reports. All parameters to the `__init__` method of the class are optional; any options you specify simply serve as the defaults for this instance. [[BR]][[BR]] When you call your Report instance (by invoking the `__call__` method), you may specify any of the options that are valid for creation of the Report instance. These options affect only the current call; they do not modify the defaults for that instance.
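The relationship between instance defaults and per-call options can be sketched with a small stand-in class. `ReportSketch` below is hypothetical and does not reproduce the real Report class; it only shows the pattern of merging call-time options over stored defaults without modifying them.

```python
class ReportSketch:
    """Hypothetical stand-in: instance defaults vs. per-call overrides."""

    def __init__(self, **defaults):
        # options given at creation time become the instance defaults
        self.options = dict(defaults)

    def set_options(self, **changes):
        # permanently change the defaults of this instance
        self.options.update(changes)

    def __call__(self, **overrides):
        # merge into a copy so that self.options is left untouched
        effective = dict(self.options)
        effective.update(overrides)
        return effective


report = ReportSketch(mask_rep="--", datefmt=None)
once = report(datefmt="%d")   # affects this call only
again = report()              # the defaults are unchanged
```

Here `once` carries `datefmt="%d"`, while `again` still has the default `datefmt=None`.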
| 479 | | |
| 480 | | === Parameters === |
| 481 | | Both the `__init__` and `__call__` methods accept all of the following parameters: |
| 482 | | |
| 483 | | * '''*tseries''' : time series objects. Must all be at the same frequency, but do not need to be aligned. |
| 484 | | * '''dates''' (`DateArray`, ''None'') : dates at which values of all the series will be output. If not specified, data will be output from the minimum start_date to the maximum end_date of all the time series objects. |
| 485 | | * '''header_row''' (list, ''None'') : List of column headers. Specifying the header for the date column is optional. |
| 486 | | * '''header_char''' (str, ''`'-'`''): Character to be used for the row separator line between the header and first row of data. None for no separator. This is ignored if `header_row` is None. |
| 487 | | * '''header_justify''' (List of strings or single string, ''None'') : Determines how headers are justified. If not specified, all headers are left justified. If a string is specified, it must be one of 'left', 'right', or 'center' and all headers will be justified the same way. If a list is specified, each header will be justified according to the specification for that header in the list. Specifying the justification for the date column header is optional.
| 488 | | * '''row_char''' (str, ''None''): Character to be used for the row separator line between each row of data. None for no separator. |
| 489 | | * '''footer_func''' (List of functions or single function, ''None'') : A function or list of functions for summarizing each data column in the report. For example, ma.sum to get the sum of the column. If a list of functions is provided there must be exactly one function for each column. Do not specify a function for the Date column. |
| 490 | | * '''footer_char''' (str, ''`'-'`''): Character to be used for the row separator line between the last row of data and the footer. None for no separator. This is ignored if `footer_func` is None. |
| 491 | | * '''footer_label''' (str, ''None'') : label for the footer row. This goes at the end of the date column. This is ignored if footer_func is None. |
| 492 | | * '''justify''' (List of strings or single string, ''None'') : Determines how data are justified in their column. If not specified, the date column and string columns are left justified, and everything else is right justified. If a string is specified, it must be one of 'left', 'right', or 'center' and all columns will be justified the same way. If a list is specified, each column will be justified according to the specification for that column in the list. Specifying the justification for the date column is optional.
| 493 | | * '''prefix''' (str, ''`''`'') : A string prepended to each printed row. |
| 494 | | * '''postfix''' (str, ''`''`'') : A string appended to each printed row. |
| 495 | | * '''mask_rep''' (str, ''`'--'`''): String used to represent masked values in output. |
| 496 | | * '''datefmt''' (str, ''None'') : Formatting string used for displaying the dates in the date column. If None, str() is simply called on the dates. |
| 497 | | * '''fmt_func''' (List of functions or single function, ''None'') : A function or list of functions for formatting each data column in the report. If not specified, str() is simply called on each item. If a list of functions is provided, there must be exactly one function for each column. Do not specify a function for the Date column; that is handled by the datefmt argument.
| 498 | | * '''wrap_func''' (List of functions or single function, ''lambda x:x''): A function f(text) for wrapping text; each element in the column is first wrapped by this function. Instances of wrap_onspace, wrap_onspace_strict, and wrap_always (which are part of this module) work well for this, e.g. wrap_func=wrap_onspace(10). If a list is specified, each column will be wrapped according to the specification for that column in the list. Specifying a function for the Date column is optional.
| 499 | | * '''col_width''' (list of integers or single integer, ''None''): use this to specify a width for all columns (single integer), or each column individually (list of integers). The column will be at least as wide as col_width, but may be larger if cell contents exceed col_width. If specifying a list, you may optionally specify the width for the Date column as the first entry. |
| 500 | | * '''output''' (buffer, ''sys.stdout''): `output` must have a write method. |
| 501 | | * '''fixed_width''' (boolean, ''True''): If True, columns are fixed width (i.e. cells will be padded with spaces to ensure all cells in a given column are the same width). If False, `col_width` will be ignored and cells will not be padded.
| 502 | | == Examples == |
| 503 | | {{{ |
| 504 | | #!python numbers=disable |
| 505 | | # the following variables will be used throughout the examples |
| 506 | | import scikits.timeseries.lib.reportlib as rl |
| 507 | | ser1 = ts.time_series(np.random.uniform(-100,100,10), start_date=ts.now('b')-5) |
| 508 | | ser2 = ts.time_series(np.random.uniform(-100,100,10), start_date=ts.now('b')) |
| 509 | | strings = ['some string', 'another string', 'yet another, string', 'final string'] |
| 510 | | ser3 = ts.time_series(strings, start_date=ts.now('b'), dtype=np.string_) |
| 511 | | dArray = ts.date_array(start_date=ts.now('b'), length=3) |
| 512 | | }}} |
| 513 | | === Example 1: Basic report === |
| 514 | | {{{ |
| 515 | | #!python numbers=disable |
| 516 | | >>> basicReport = rl.Report(ser1, ser2, ser3) |
| 517 | | >>> basicReport() |
| 518 | | """ |
| 519 | | 29-Jan-2007 | -95.4554568525 | -- | -- |
| 520 | | 30-Jan-2007 | 8.58356086571 | -- | -- |
| 521 | | 31-Jan-2007 | 41.6353000447 | -- | -- |
| 522 | | 01-Feb-2007 | -70.4674570816 | -- | -- |
| 523 | | 02-Feb-2007 | 2.98803489327 | -- | -- |
| 524 | | 05-Feb-2007 | -21.6474414786 | -77.750560056 | some string |
| 525 | | 06-Feb-2007 | 84.3212422071 | 56.2238118715 | another string |
| 526 | | 07-Feb-2007 | 23.5664556686 | 64.2491772743 | yet another, string |
| 527 | | 08-Feb-2007 | 34.8778728662 | -39.4734173695 | final string |
| 528 | | 09-Feb-2007 | -64.0545308092 | -83.7175337221 | -- |
| 529 | | 12-Feb-2007 | -- | 52.4958419122 | -- |
| 530 | | 13-Feb-2007 | -- | 7.1396171176 | -- |
| 531 | | 14-Feb-2007 | -- | -57.7688749366 | -- |
| 532 | | 15-Feb-2007 | -- | 71.2844695721 | -- |
| 533 | | 16-Feb-2007 | -- | 87.1665936067 | -- |
| 534 | | """ |
| 535 | | }}} |
| 536 | | === Example 2: csv report for excel === |
| 537 | | {{{ |
| 538 | | #!python numbers=disable |
| 539 | | >>> mycsv = open('mycsv.csv', 'w') |
| 540 | | >>> strfmt = lambda x: '"'+str(x)+'"' |
| 541 | | >>> fmt_func = [None, None, strfmt] |
| 542 | | >>> csvReport = rl.Report(ser1, ser2, ser3, fmt_func=fmt_func, mask_rep='#N/A', delim=',', fixed_width=False) |
| 543 | | >>> csvReport() # output to sys.stdout |
| 544 | | """ |
| 545 | | 29-Jan-2007,67.4086881661,#N/A,#N/A |
| 546 | | 30-Jan-2007,-78.8405461996,#N/A,#N/A |
| 547 | | 31-Jan-2007,10.0559754743,#N/A,#N/A |
| 548 | | 01-Feb-2007,-71.149716374,#N/A,#N/A |
| 549 | | 02-Feb-2007,-46.055865283,#N/A,#N/A |
| 550 | | 05-Feb-2007,35.9105419931,85.1744316431,"some string" |
| 551 | | 06-Feb-2007,2.93015788615,-87.0634270731,"another string" |
| 552 | | 07-Feb-2007,-49.0774248826,-91.4854233865,"yet another, string" |
| 553 | | 08-Feb-2007,94.8175754225,36.587114053,"final string" |
| 554 | | 09-Feb-2007,-88.9474880802,37.3563788938,#N/A |
| 555 | | 12-Feb-2007,#N/A,21.1325367724,#N/A |
| 556 | | 13-Feb-2007,#N/A,72.2437957896,#N/A |
| 557 | | 14-Feb-2007,#N/A,37.2619438419,#N/A |
| 558 | | 15-Feb-2007,#N/A,-87.1465826319,#N/A |
| 559 | | 16-Feb-2007,#N/A,63.5556895555,#N/A |
| 560 | | """ |
| 561 | | >>> csvReport(output=mycsv) # output to file |
| 562 | | }}} |
| 563 | | === Example 3: HTML report === |
| 564 | | {{{ |
| 565 | | #!python numbers=disable |
| 566 | | >>> numfmt = lambda x: '%.2f' % x |
| 567 | | >>> fmt_func = [numfmt, numfmt, None] |
| 568 | | >>> footer_func = [ma.sum, ma.sum, None] |
| 569 | | >>> footer_label = "Total" |
| 570 | | >>> htmlReport = rl.Report(ser1, ser2, ser3) |
| 571 | | >>> htmlReport.set_options(prefix='<tr><td>', delim='</td><td>', postfix='</td></tr>') |
| 572 | | >>> htmlReport.set_options(wrap_func=rl.wrap_onspace(10,nls='<BR>')) |
| 573 | | >>> htmlReport.set_options(fmt_func=fmt_func) |
| 574 | | >>> htmlReport.set_options(footer_label=footer_label, footer_func=footer_func, footer_char='') |
| 575 | | >>> htmlReport.set_options(dates=dArray) |
| 576 | | >>> htmlReport() # output to sys.stdout |
| 577 | | """ |
| 578 | | <tr><td>05-Feb-2007</td><td> 91.66</td><td>-99.21</td><td>some<BR>string </td></tr> |
| 579 | | <tr><td>06-Feb-2007</td><td>-68.84</td><td> 30.50</td><td>another<BR>string </td></tr> |
| 580 | | <tr><td>07-Feb-2007</td><td> 93.53</td><td> 90.46</td><td>yet<BR>another,<BR>string</td></tr> |
| 581 | | <tr><td>Total </td><td>116.36</td><td> 21.75</td><td> </td></tr> |
| 582 | | """ |
| 583 | | }}} |
| 584 | | === Example 4: Extra Options === |
| 585 | | {{{ |
| 586 | | #!python numbers=disable |
| 587 | | >>> basicReport = rl.Report(ser1, ser2, ser3, dates=dArray) |
| 588 | | #............................................................................. |
| 589 | | """Output report with a header. By default, a line of dashes will separate the
| 590 | | header and the first row of data. Optionally, you can specify a label for the
| 591 | | Date column as well (so a list with four entries instead of three like this
| 592 | | example). If you wish to get rid of the separator line, or use a different
| 593 | | character, specify: header_char=''"""
| 594 | | >>> basicReport(header_row=['col 1', 'col 2', 'col 3']) |
| 595 | | """ |
| 596 | | | col 1 | col 2 | col 3 |
| 597 | | ------------------------------------------------------------------- |
| 598 | | 06-Feb-2007 | 2.59583929443 | -96.2110139217 | some string |
| 599 | | 07-Feb-2007 | -24.1064434097 | 86.0387977626 | another string |
| 600 | | 08-Feb-2007 | -21.6432010416 | 4.83754030508 | yet another, string |
| 601 | | """ |
| 602 | | #............................................................................. |
| 603 | | """Change column justification for the report. You can specify a single |
| 604 | | string ('right', 'left', or 'center') and this will impact all columns, or you |
| 605 | | can specify a list of strings (optionally including the Date column, which is |
| 606 | | 'left' by default)""" |
| 607 | | >>> basicReport(justify=['left', 'left', 'right']) |
| 608 | | """ |
| 609 | | 06-Feb-2007 | 2.59583929443 | -96.2110139217 | some string |
| 610 | | 07-Feb-2007 | -24.1064434097 | 86.0387977626 | another string |
| 611 | | 08-Feb-2007 | -21.6432010416 | 4.83754030508 | yet another, string |
| 612 | | """ |
| 613 | | #............................................................................. |
| 614 | | """Change formatting of Date column""" |
| 615 | | >>> basicReport(datefmt='%d') |
| 616 | | """ |
| 617 | | 06 | 2.59583929443 | -96.2110139217 | some string |
| 618 | | 07 | -24.1064434097 | 86.0387977626 | another string |
| 619 | | 08 | -21.6432010416 | 4.83754030508 | yet another, string |
| 620 | | """ |
| 621 | | #............................................................................. |
| 622 | | """Add a separator line between each row"""
| 623 | | >>> basicReport(row_char='-') |
| 624 | | """ |
| 625 | | 06-Feb-2007 | 2.59583929443 | -96.2110139217 | some string |
| 626 | | ------------------------------------------------------------------- |
| 627 | | 07-Feb-2007 | -24.1064434097 | 86.0387977626 | another string |
| 628 | | ------------------------------------------------------------------- |
| 629 | | 08-Feb-2007 | -21.6432010416 | 4.83754030508 | yet another, string |
| 630 | | """ |
| 631 | | #............................................................................. |
| 632 | | """Report different series. Notice that the other options set remain intact
| 633 | | (i.e. dates=dArray)"""
| 634 | | >>> basicReport(ser1) |
| 635 | | """ |
| 636 | | 06-Feb-2007 | 2.59583929443 |
| 637 | | 07-Feb-2007 | -24.1064434097 |
| 638 | | 08-Feb-2007 | -21.6432010416 |
| 639 | | """ |
| 640 | | #............................................................................. |
| 641 | | """Specify column widths. Just as in the header and justify options, you can |
| 642 | | specify a single value to affect all columns, or a list which optionally |
| 643 | | includes a specification for the Date column. Specify -1 to auto-size a |
| 644 | | column""" |
| 645 | | >>> basicReport(col_width=[20, 20, -1]) |
| 646 | | """ |
| 647 | | 06-Feb-2007 | 2.59583929443 | -96.2110139217 | some string |
| 648 | | 07-Feb-2007 | -24.1064434097 | 86.0387977626 | another string |
| 649 | | 08-Feb-2007 | -21.6432010416 | 4.83754030508 | yet another, string |
| 650 | | """ |
| 651 | | }}} |
| 652 | | = Plotting = |
| 653 | | |
| 654 | | == Introduction == |
| 655 | | The timeseries.plotlib submodule makes it relatively simple to produce time series plots using matplotlib. It relieves the user of the burden of having to set up appropriately spaced and formatted tick labels.
| 656 | | |
| 657 | | If you have never used matplotlib, you should first go through the tutorial on the matplotlib web-site before following the examples below. |
| 658 | | |
| 659 | | == Examples == |
| 660 | | === Adaptation of date_demo2.py in matplotlib tutorial === |
| 661 | | {{{ |
| 662 | | #!python numbers=disable |
| 663 | | import matplotlib.pyplot as plt |
| 664 | | from matplotlib.finance import quotes_historical_yahoo |
| 665 | | import scikits.timeseries as ts |
| 666 | | import scikits.timeseries.lib.plotlib as tpl |
| 667 | | # retrieve data from yahoo. The standard datetime python module is needed here |
| 668 | | import datetime |
| 669 | | date1 = datetime.date(2002, 1, 5) |
| 670 | | date2 = datetime.date(2003, 12, 1) |
| 671 | | quotes = quotes_historical_yahoo('INTC', date1, date2) |
| 672 | | """the dates from the yahoo quotes module get returned as integers, which |
| 673 | | happen to correspond to the integer representation of 'DAILY' frequency dates |
| 674 | | in the timeseries module. So create a DateArray of daily dates, then convert |
| 675 | | this to business day frequency afterwards.""" |
| 676 | | dates = ts.date_array([q[0] for q in quotes], freq='DAILY').asfreq('BUSINESS') |
| 677 | | opens = [q[1] for q in quotes] |
| 678 | | raw_series = ts.time_series(opens, dates) |
| 679 | | """fill_missing_dates will insert masked values for any missing data points. |
| 680 | | Note that you could plot the series without doing this, but it would cause |
| 681 | | missing values to be linearly interpolated rather than left empty in the plot""" |
| 682 | | series = ts.fill_missing_dates(raw_series) |
| 683 | | fig = tpl.tsfigure() |
| 684 | | fsp = fig.add_tsplot(111) |
| 685 | | fsp.tsplot(series, '-') |
| 686 | | """add grid lines at start of each quarter. Grid lines appear at the major tick |
| 687 | | marks by default (which, due to the dynamic nature of the ticks for time series |
| 688 | | plots, cannot be guaranteed to be at quarter start). So if you want grid lines |
| 689 | | to appear at specific intervals, you must first specify xticks explicitly""" |
| 690 | | dates = series.dates |
| 691 | | quarter_starts = dates[dates.quarter != (dates-1).quarter] |
| 692 | | fsp.set_xticks(quarter_starts.tovalue()) |
| 693 | | fsp.grid() |
| 694 | | plt.show() |
| 695 | | }}} |
| 696 | | The above code produces the following plot: |
| 697 | | |
| 698 | | [[Image(example1.png)]] |
| 699 | | |
| 700 | | === Monthly Data along with an exponential moving average === |
| 701 | | {{{ |
| 702 | | #!python numbers=disable |
| 703 | | import matplotlib.pyplot as plt |
| 704 | | import numpy as np |
| 705 | | import scikits.timeseries as ts |
| 706 | | import scikits.timeseries.lib.plotlib as tpl |
| 707 | | from scikits.timeseries.lib.moving_funcs import mov_average_expw |
| 708 | | # generate some random data |
| 709 | | data = np.cumprod(1 + np.random.normal(0, 1, 300)/100) |
| 710 | | series = ts.time_series(data, |
| 711 | | start_date=ts.Date(freq='M', year=1982, month=1)) |
| 712 | | fig = tpl.tsfigure() |
| 713 | | fsp = fig.add_tsplot(111) |
| 714 | | fsp.tsplot(series, '-', mov_average_expw(series, 40), 'r--') |
| 715 | | plt.show() |
| 716 | | }}} |
| 717 | | The above code produces the following plot: |
| 718 | | |
| 719 | | [[Image(example2.png)]] |
| 720 | | |
| 721 | | === Separate scales for left and right axis === |
| 722 | | {{{ |
| 723 | | #!python numbers=disable |
| 724 | | import matplotlib.pyplot as plt |
| 725 | | import numpy as np |
| 726 | | import numpy.ma as ma |
| 727 | | import scikits.timeseries as ts |
| 728 | | import scikits.timeseries.lib.plotlib as tpl |
| 729 | | # generate some random data |
| 730 | | data1 = np.cumprod(1 + np.random.normal(0, 1, 300)/100) |
| 731 | | data2 = np.cumprod(1 + np.random.normal(0, 1, 300)/100)*100 |
| 732 | | series1 = ts.time_series(data1, |
| 733 | | start_date=ts.Date(freq='M', year=1982, month=1)-50) |
| 734 | | series2 = ts.time_series(data2, |
| 735 | | start_date=ts.Date(freq='M', year=1982, month=1)) |
| 736 | | fig = tpl.tsfigure() |
| 737 | | fsp = fig.add_tsplot(111) |
| 738 | | # plot series on left axis |
| 739 | | fsp.tsplot(series1, 'b-', label='<- left series') |
| 740 | | fsp.set_ylim(ma.min(series1.series), ma.max(series1.series)) |
| 741 | | # create right axis |
| 742 | | fsp_right = fsp.add_yaxis(position='right', yscale='log') |
| 743 | | # plot series on right axis |
| 744 | | fsp_right.tsplot(series2, 'r-', label='-> right series') |
| 745 | | fsp_right.set_ylim(ma.min(series2.series), ma.max(series2.series)) |
| 746 | | # setup legend |
| 747 | | fsp.legend( |
| 748 | | (fsp.lines[-1], fsp_right.lines[-1]), |
| 749 | | (fsp.lines[-1].get_label(), fsp_right.lines[-1].get_label()), |
| 750 | | ) |
| 751 | | plt.show() |
| 752 | | }}} |
| 753 | | The above code produces the following plot: |
| 754 | | |
| 755 | | [[Image(example3.png)]] |
| 756 | | |
| 757 | | |
| 758 | | === Sample plots at various levels of zoom === |
| 759 | | The following charts show daily data plotted over date ranges of varying length. This demonstrates the dynamic nature of the axis labels. With interactive plotting, labels will be updated dynamically as you scroll and zoom.
| 760 | | |
| 761 | | --------- |
| 762 | | . '''15 days'''[[BR]] [[Image(zoom1.png)]] |
| 763 | | --------- |
| 764 | | . '''45 days'''[[BR]] [[Image(zoom2.png)]] |
| 765 | | --------- |
| 766 | | . '''250 days'''[[BR]] [[Image(zoom3.png)]] |
| 767 | | --------- |
| 768 | | . '''3750 days'''[[BR]] [[Image(zoom4.png)]] |
| 769 | | --------- |
| 770 | | |
| 771 | | = Databases = |
| 772 | | |
| 773 | | Storing and retrieving time series from standard relational databases is very |
| 774 | | simple once you know a few tricks. For these examples, I use the ceODBC |
| 775 | | database module (http://ceodbc.sourceforge.net/) which I have found to be more |
| 776 | | reliable and faster than the pyodbc module. However, I *think* these examples |
| 777 | | should work with the pyodbc module as well.[[BR]][[BR]] |
| 778 | | |
| 779 | | SQL Server 2005 Express edition is the database used in the examples. Other |
| 780 | | standard relational databases should also work, but I have not personally |
| 781 | | verified it.[[BR]][[BR]] |
| 782 | | |
| 783 | | A database called "test" is assumed to have been created already along with a |
| 784 | | table called "test_table" described by the following query:[[BR]][[BR]] |
| 785 | | |
| 786 | | {{{ |
| 787 | | #!sql numbers=disable |
| 788 | | CREATE TABLE test_table ( |
| 789 | | [date] [datetime] NULL, |
| 790 | | [value] [decimal](18, 6) NULL |
| 791 | | ) |
| 792 | | }}} |
| 793 | | |
| 794 | | If you have verified these examples to work with other databases and python |
| 795 | | db modules, it would be greatly appreciated if you could add a note to the |
| 796 | | wiki.[[BR]][[BR]] |
| 797 | | |
| 798 | | == Example == |
| 799 | | {{{ |
| 800 | | #!python numbers=disable |
| 801 | | import ceODBC as odbc |
| 802 | | import scikits.timeseries as ts |
| 803 | | |
| 804 | | test_series = ts.time_series(range(50), start_date=ts.now('b')) |
| 805 | | |
| 806 | | # let's mask one value just to make things interesting
| 807 | | test_series[5] = ts.masked |
| 808 | | |
| 809 | | conn = odbc.Connection( |
| 810 | | "Driver={SQL Native Client};Server=localhost;Database=test;Uid=userid;Pwd=password;") |
| 811 | | crs = conn.cursor() |
| 812 | | |
| 813 | | # start with an empty table for these examples |
| 814 | | crs.execute("DELETE FROM test_table") |
| 815 | | |
| 816 | | # convert series to list of (datetime, value) tuples which can be interpreted |
| 817 | | # by the database module. Note that masked values will get converted to None |
| 818 | | # with the tolist method. None gets translated to NULL when inserted into the |
| 819 | | # database. |
| 820 | | _tslist = test_series.tolist() |
| 821 | | |
| 822 | | # insert time series data |
| 823 | | crs.executemany(""" |
| 824 | | INSERT INTO test_table |
| 825 | | ([date], [value]) VALUES (?, ?) |
| 826 | | """, |
| 827 | | _tslist |
| 828 | | ) |
| 829 | | |
| 830 | | # Read the data back out of the database. |
| 831 | | # Explicitly cast data of type decimal to float for reading purposes, |
| 832 | | # otherwise you will get decimal objects for your result. |
| 833 | | crs.execute(""" |
| 834 | | SELECT |
| 835 | | [date], |
| 836 | | CAST(ISNULL([value], 999) AS float) as vals, -- convert NULLs to 999
| 837 | | (CASE |
| 838 | | WHEN [value] is NULL THEN 1 |
| 839 | | ELSE 0 |
| 840 | | END) AS mask -- retrieve a mask column |
| 841 | | FROM test_table |
| 842 | | ORDER BY [date] ASC |
| 843 | | """) |
| 844 | | |
| 845 | | # zip(*arg) converts row based results to column based results. This is the |
| 846 | | # crucial trick needed for easily reading time series data from a relational |
| 847 | | # database with Python |
| 848 | | _dates, _values, _mask = zip(*crs.fetchall()) |
| 849 | | |
| 850 | | _series = ts.time_series(_values, dates=_dates, mask=_mask, freq='B') |
| 851 | | |
| 852 | | # commit changes to the database |
| 853 | | conn.commit() |
| 854 | | conn.close() |
| 855 | | }}} |
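The `zip(*rows)` step can be tried without an ODBC driver or SQL Server. The sketch below is an assumption-laden stand-in for the example above: it uses the standard library `sqlite3` module with an in-memory database, stores dates as text, and uses `COALESCE` in place of `ISNULL`. It only shows how row-based results are transposed into date, value, and mask columns.

```python
import sqlite3

# in-memory database standing in for the "test" database (assumption)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_table (date TEXT, value REAL)")

# None is stored as NULL, just as with the masked value in the example above
conn.executemany(
    "INSERT INTO test_table (date, value) VALUES (?, ?)",
    [("2007-02-05", 1.5), ("2007-02-06", None), ("2007-02-07", 3.0)],
)

rows = conn.execute(
    """
    SELECT date,
           COALESCE(value, 999.0) AS vals,
           CASE WHEN value IS NULL THEN 1 ELSE 0 END AS mask
    FROM test_table
    ORDER BY date ASC
    """
).fetchall()
conn.close()

# transpose the list of (date, value, mask) rows into three columns
dates, values, mask = zip(*rows)
print(dates)   # ('2007-02-05', '2007-02-06', '2007-02-07')
print(values)  # (1.5, 999.0, 3.0)
print(mask)    # (0, 1, 0)
```

Note that unpacking `zip(*rows)` into three names fails when the query returns no rows, so an empty result needs to be handled separately.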
| 856 | | |
| 857 | | |
| 858 | | = Support / Feedback = |
| 859 | | |
| 860 | | * For help using the timeseries scikit, please post questions to the [http://projects.scipy.org/mailman/listinfo/scipy-user scipy-user mailing list] |
| 861 | | * For development related inquiries (enhancements, bugs, etc.), please post questions to the [http://projects.scipy.org/mailman/listinfo/scipy-dev scipy-dev mailing list]
| 862 | | * Please file bug reports on trac under the [http://scipy.org/scipy/scikits/query?component=timeseries timeseries component] |
| | 1 | This page has moved. Go to [http://pytseries.sourceforge.net] for the current documentation of the scikits.timeseries module.