Large Data Bases: Advantages, Problems and Puzzles: Some naive observations from an economist Alan Kirman, GREQAM Marseille Jerusalem September 2008
Some Basic Points • Economic data bases may be large in two ways • Firstly, they may simply contain a very large number of observations, the best example being tick-by-tick data. • Secondly, as with some panel data, each observation may have many dimensions.
The Advantages and Problems • From a statistical point of view, at least, high-frequency data might seem unambiguously advantageous. However, the very nature of the data has to be examined carefully, and certain stylised facts emerge which are not present at lower frequencies. • In the case of multidimensional data, the « curse of dimensionality » may arise.
FX: A classic example of high frequency data • Usually Reuters indicative quotes are used for the analysis. What do they consist of? • Banks enter bids and asks for a particular currency pair, such as the euro-dollar. They attach a time stamp indicating the exact time of posting. • These quotes are « indicative » and the banks are not legally obliged to honour them. • For euro-dollar there are between 10,000 and 20,000 updates per day.
Brief Reminder of the Characteristics of this sort of data • Returns are given by r_t = ln p_t - ln p_(t-1) • We know that there is no autocorrelation between successive returns, but that |r_t| and r_t² are positively autocorrelated (except at very small time intervals), and that this autocorrelation decays slowly • Volatility exhibits spikes, referred to as volatility clustering
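To make these stylised facts concrete, here is a minimal sketch in Python. The price series is a simulated random walk standing in for a series of FX mid-quotes, and all variable names are illustrative assumptions rather than anything in the talk: raw returns show essentially no autocorrelation, while on real FX data absolute and squared returns are positively autocorrelated with slow decay.

```python
import numpy as np
import pandas as pd

# Illustrative price series (a random walk), standing in for FX mid-quotes.
rng = np.random.default_rng(0)
mid = pd.Series(100 * np.exp(np.cumsum(0.0001 * rng.standard_normal(10_000))))

# Log returns: r_t = ln p_t - ln p_(t-1)
r = np.log(mid).diff().dropna()

# Stylised facts: raw returns are close to uncorrelated; on real FX data
# |r_t| and r_t^2 show positive, slowly decaying autocorrelation (the
# simulated i.i.d. series here will not reproduce that second part).
for lag in (1, 5, 10):
    print(lag,
          round(r.autocorr(lag), 3),
          round(r.abs().autocorr(lag), 3),
          round((r ** 2).autocorr(lag), 3))
```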
A Problem • The idea of using such data, as Brousseau (2007) points out, is to track the « true value » of the exchange rate through a period. • But not all the data are of the same « quality » • Although the quotes hold, at least briefly, between major banks, they may not do so for other customers, and they may also depend on the amounts involved. • There may be mistakes, quotes placed as « advertising », and quotes with spreads so large that they encompass the spread between the best bid and ask and thus convey no information
Cleaning the Data • Brousseau and other authors propose various filtering methods, from simple to sophisticated. If the jump between two successive mid-points exceeds a certain threshold, for example, the observation is eliminated (a primitive first pass). • However, how can one judge whether the filtering is successful? • One idea is to test against quotes which are binding, such as those on EBS. But this is not a guarantee.
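A minimal sketch of the primitive first-pass filter described above, assuming a pandas DataFrame with 'bid' and 'ask' columns; the function name, threshold value and data layout are my assumptions, not Brousseau's procedure.

```python
import numpy as np
import pandas as pd

def filter_quotes(quotes: pd.DataFrame, threshold: float = 0.001) -> pd.DataFrame:
    """Drop quotes whose log mid-point jumps by more than `threshold`
    relative to the previously retained observation. Assumes columns
    'bid' and 'ask'; a primitive first pass, as described above."""
    mid = np.log((quotes["bid"] + quotes["ask"]) / 2.0)
    keep = [0]                       # always keep the first observation
    last = mid.iloc[0]
    for i in range(1, len(mid)):
        if abs(mid.iloc[i] - last) <= threshold:
            keep.append(i)
            last = mid.iloc[i]       # only retained quotes move the reference point
    return quotes.iloc[keep]
```

One design choice worth noting: comparing each quote with the previously retained mid-point, rather than with the immediately preceding raw quote, prevents a single erroneous quote from knocking out its valid successor. Whether that is the right convention is exactly the kind of judgement the slide says is hard to validate.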
Microstructure Noise • In principle, the higher the sampling frequency, the more precise the estimates of integrated volatility become • However, the presence of so-called market microstructure features at very high sampling frequencies may create important complications. • Financial transactions, and hence price changes and non-zero returns, arrive discretely rather than continuously over time • Other features include negative serial correlation of returns to successive transactions (including the so-called bid-ask bounce) and the price impact of trades. • For a discussion see Hasbrouck (2006), O’Hara (1998), and Campbell et al. (1997, Ch. 3)
Microstructure Noise • Why should we treat this as « noise » rather than integrate it into our models? • One argument is that it overstates volatility: sampling too frequently gives a spuriously high value. • On the other hand, Hansen and Lunde (2006) assert that, empirically, market microstructure noise is negatively correlated with returns, and hence biases the estimated volatility downward. However, this empirical stylised fact, based on their analysis of high-frequency stock returns, does not seem to carry over to the FX market
Microstructure Noise • « For example, if an organized stock exchange has designated market makers and specialists, and if these participants are slow in adjusting prices in response to shocks (possibly because the exchange's rules explicitly prohibit them from adjusting prices by larger amounts all at once), it may be the case that realized volatility could drop if it is computed at those sampling frequencies for which this behavior is thought to be relevant. • In any case, it is widely recognized that market microstructure issues can contaminate estimates of integrated volatility in important ways, especially if the data are sampled at ultra-high frequencies, as is becoming more and more common. » Chaboud et al. (2007)
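A small simulated illustration of the point made in these slides; the volatility, noise level and sampling grid are arbitrary assumptions, not estimates from FX data. Realized variance computed from noisy prices at very fine sampling intervals is dominated by the noise and biased upward, and the bias shrinks as the sampling interval lengthens.

```python
import numpy as np

# Simulated day of 1-second observations: an efficient log price plus
# additive i.i.d. microstructure noise (e.g. bid-ask bounce).
rng = np.random.default_rng(1)
n = 86_400
sigma = 0.10 / np.sqrt(252 * n)             # per-second volatility (10% annual)
efficient = np.cumsum(sigma * rng.standard_normal(n))
observed = efficient + 1e-4 * rng.standard_normal(n)

true_iv = sigma ** 2 * n                    # integrated variance over the day
for step in (1, 15, 60, 300):               # sample every 1s, 15s, 1min, 5min
    r = np.diff(observed[::step])
    # Realized variance = sum of squared sampled returns; compare with true IV.
    print(f"{step:4d}s  RV = {np.sum(r**2):.3e}   true IV = {true_iv:.3e}")
```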
What do we claim to explain? • Let’s look briefly at a standard model and see how we determine prices. • What we claim for this model is that it is the switching between chartist and fundamentalist behaviour that leads to • Fat tails • Long memory • Volatility clustering • What does high-frequency data have to do with this?
Stopping the process from exploding • Bound the probability that an individual can become a chartist • If we do not do this, the process may simply explode • We do not, however, put arbitrary limits on the prices that can be attained
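The slides refer to a standard chartist-fundamentalist model without reproducing it. The following is a minimal sketch of one common variant; all functional forms and parameter values are my own illustrative assumptions, not Kirman's specification, and the clip on the chartist fraction plays the role of the bound described above (applied here to the aggregate fraction rather than an individual switching probability).

```python
import numpy as np

# Fundamentalists expect reversion to a fundamental value, chartists
# extrapolate the last price change, and the weight on chartism responds
# to recent moves but is bounded so the process cannot explode.
rng = np.random.default_rng(2)
T, log_fund = 5_000, 0.0
p = [0.0, 0.0]                 # log prices
w = 0.5                        # fraction of chartists, kept in [0.05, 0.95]
for t in range(T):
    e_fund = 0.10 * (log_fund - p[-1])      # mean reversion toward the fundamental
    e_chart = 0.90 * (p[-1] - p[-2])        # trend extrapolation
    ret = w * e_chart + (1 - w) * e_fund + 0.005 * rng.standard_normal()
    p.append(p[-1] + ret)
    # Large recent moves make chartism more attractive; the clip bounds
    # the chartist fraction away from 0 and 1.
    w = np.clip(w + 0.5 * (abs(ret) - 0.005), 0.05, 0.95)

r = np.diff(p)
print("excess kurtosis of returns:", np.mean((r - r.mean())**4) / np.var(r)**2 - 3.0)
```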
The Real Problem • We have a market-clearing equilibrium, but this is not the way these markets function • They function on the basis of an order book, and that is what we should model. • Each price in very high frequency data corresponds to an individual transaction • The mechanics of the order book will influence the structure of the time series • How often do our agents revise their prices? • They infer information from the actions of others, revealed by the transactions
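To make the order-book point concrete, here is a toy limit order book in which every match produces one transaction price, i.e. one observation in a tick-by-tick series. The class design and price levels are illustrative assumptions, not a model from the talk.

```python
import heapq

class OrderBook:
    """Toy limit order book: market buys match the lowest ask, market sells
    match the highest bid. Each match is one transaction price."""
    def __init__(self):
        self.asks, self.bids = [], []       # heaps of (price, qty) / (-price, qty)
        self.trades = []                    # transaction prices, in order

    def add_limit(self, side, price, qty):
        if side == "ask":
            heapq.heappush(self.asks, (price, qty))
        else:
            heapq.heappush(self.bids, (-price, qty))

    def market_order(self, side, qty):
        book = self.asks if side == "buy" else self.bids
        while qty > 0 and book:
            best, avail = heapq.heappop(book)
            price = best if side == "buy" else -best
            traded = min(qty, avail)
            self.trades.append(price)
            qty -= traded
            if avail > traded:              # put back the unfilled remainder
                heapq.heappush(book, (best, avail - traded))

book = OrderBook()
book.add_limit("ask", 1.3512, 5)
book.add_limit("bid", 1.3510, 5)
book.add_limit("ask", 1.3515, 10)
book.market_order("buy", 8)                 # walks the book: 1.3512 then 1.3515
print(book.trades)
```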
How to solve this? • This is the subject of a project with Ulrich Horst • We will model an arrival process for orders, and the distribution from which these orders are drawn will be determined by the movements of prices • In this way we model directly what is too often dismissed as « microstructure noise » and remove one of the problems with using high-frequency data.
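One plausible reading of such a model, sketched entirely under my own assumptions (this is not the Horst-Kirman specification): orders arrive in discrete steps and the distribution they are drawn from shifts with recent price movements, so the "microstructure" features are generated inside the model rather than treated as external noise.

```python
import numpy as np

# Orders arrive one at a time; the probability of a buy (versus a sell)
# depends on the recent trend, and each order moves the price by a random
# amount, so the fine structure of the series comes from the order flow.
rng = np.random.default_rng(3)
prices = [100.0]
for t in range(1_000):
    trend = prices[-1] - prices[-min(len(prices), 10)]
    buy_prob = 1.0 / (1.0 + np.exp(-5.0 * trend))   # rising prices attract buyers
    side = 1 if rng.random() < buy_prob else -1
    prices.append(prices[-1] + side * 0.01 * rng.exponential())

r = np.diff(prices)
print("mean return:", r.mean(), " std:", r.std())
```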
A Challenge « In deep and liquid markets, market microstructure noise should pose less of a concern for volatility estimation. It should be possible to sample returns on such assets more frequently than returns on individual stocks, before estimates of integrated volatility encounter significant bias caused by the market microstructure features. It is possible to sample the FX data as often as once every 15 to 20 seconds without the standard estimator of integrated volatility showing discernible effects stemming from market microstructure noise. This interval is shorter than the sampling intervals of several minutes, usually five or more minutes, often recommended in the empirical literature. This shorter sampling interval and associated larger sample size affords a considerable gain in estimation precision. In very deep and liquid markets, microstructure-induced frictions may be much less of an issue for volatility estimation than was previously thought. » Chaboud et al. (2007)
Why does this matter? • We collect more and more data on individuals and, in particular, on consumers and the unemployed • If we have D observations on each of N individuals, the relationship between D and N is important if we wish to estimate some functional relation between the variables • There is now a whole battery of approaches for reducing the dimensionality of the problem, and these represent a major challenge for econometrics
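As a small illustration of the D-versus-N issue and of one standard dimension-reduction tool, here is a principal-components example on simulated data; the factor structure, sample sizes and noise level are assumptions made purely for the sketch.

```python
import numpy as np

# N individuals, D characteristics, with the variation concentrated in a
# few latent factors. Principal components (via the SVD of the centred
# data) recover a low-dimensional representation.
rng = np.random.default_rng(4)
N, D, k = 200, 500, 3                       # fewer observations than dimensions
factors = rng.standard_normal((N, k))
loadings = rng.standard_normal((k, D))
X = factors @ loadings + 0.1 * rng.standard_normal((N, D))

Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / np.sum(s**2)
print("share of variance in first 3 components:", round(explained[:3].sum(), 3))
scores = Xc @ Vt[:3].T                      # the reduced, 3-dimensional data
print("reduced data shape:", scores.shape)
```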
A blessing? • Mathematicians assert that such high dimensionality leads to a « concentration of measure » • Someone here can no doubt explain how this might help economists!
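A quick numerical illustration of what concentration of measure looks like in practice (the sample size and dimensions are chosen arbitrarily): as the dimension grows, pairwise distances between random points cluster tightly around their mean, so the relative contrast between "near" and "far" points shrinks.

```python
import numpy as np

rng = np.random.default_rng(5)
for d in (2, 10, 100, 1000):
    X = rng.standard_normal((100, d))                    # 100 random points in R^d
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)[np.triu_indices(100, k=1)]
    # Coefficient of variation of pairwise distances falls as d grows:
    # the distances concentrate around their mean.
    print(d, round(float(dists.std() / dists.mean()), 3))
```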