 |
Hammock
Christos Faloutsos, Christos.Faloutsos@cs.cmu.edu
Michalis Faloutsos, michalis@cs.ucr.edu
Christophe Tournery, ct3@andrew.cmu.edu
July 2nd 2001
The Hammock project focuses on developing tools for mining large datasets,
using tools from fractals and self-similarity.
Network data are of special interest (traffic, topology)
Hammock provides a Java/SWING GUI interface to the modules we have been
developing. Users can drag and connect datasets and operators, such as
an operator to estimate the fractal dimension of the arrivals
(seen as 1-d points). See Figure 1.
Figure 1:
Fractal Dimension
|
|
- 'FRACDIM':
Estimates the fractal dimension. Given a set of points (eg.,
arrival time-stamps of network packets), this module will plot
the 'correlation integral' and estimate the intrinsic dimensionality
of the cloud of points.
In our network arrivals example, fractal dimension =1 means
that the arrivals are uniformly distributed (= Poisson arrivals);
a smaller number means they are clustered. If the fractal dimension
is 0, then all the packets arrived at exactly the same instance
- 'ANF':
It estimates the hop-plot exponent, using a fast,
approximate method for the neighborhood function
The input is a graph, that is, a set of edges,
and the output is a plot of the neighborhood function,
that is, the number of neighbors within hops, as a function of .
The slope of this function in log-log scales is exactly the
'hop-plot exponent' [Faloutsos, Faloutsos, Faloutsos, 1999]
Our algorithms for the approximate neighborhood function are described in
Chris Palmer, Georgios Siganos, Michalis Faloutsos,
Christos Faloutsos and Phil Gibbons
The connectivity and fault-tolerance of the Internet topology
Workshop on Network Related Data Management
(NRDM 2001), Santa Barbara, CA, May 25, 2001.
http://www.cs.cmu.edu/~christos/PUBLICATIONS/anf-nrdm.ps.gz
- 'HURST':
a module to estimate the Hurst exponent for the burstiness
of the input traffic. The module plots the 'rescaled range'
R/S, as a function of the window size (see the books by [Mandelbrot],
or [Schroeder], for the exact definition)
If the input time sequence is self-similar, then the above plot
will be linear, and its slope is the Hurst exponent by definition.
Hammock will collect all these tools under the same GUI interface.
Users, with, say, a graph that represents an instance of the internet,
could invoke the GUI, apply the appropriate operator (eg, ANF),
and just read-off the value of the hop exponent, while seeing
the plot in postscript.
Similarly, when studying network traffic data, a user could feed in
a set of time-stamps of packet arrivals, and check their fractal dimension
(the closer to '1', the more 'uniform/Poisson' the traffic is).
Christophe Tournery
2001-07-02
|