Hammock

Christos Faloutsos, Christos.Faloutsos@cs.cmu.edu
Michalis Faloutsos, michalis@cs.ucr.edu
Christophe Tournery, ct3@andrew.cmu.edu

July 2nd 2001

GOAL:

The Hammock project focuses on developing tools for mining large datasets, using tools from fractals and self-similarity.

Network data are of special interest (traffic, topology)

DETAILS:

Hammock provides a Java/SWING GUI interface to the modules we have been developing. Users can drag and connect datasets and operators, such as an operator to estimate the fractal dimension of the arrivals (seen as 1-d points). See Figure 1.

Figure 1: Fractal Dimension

SPECIFIC MODULES

completed already

  • 'FRACDIM':

    Estimates the fractal dimension. Given a set of points (eg., arrival time-stamps of network packets), this module will plot the 'correlation integral' and estimate the intrinsic dimensionality of the cloud of points.

    In our network arrivals example, fractal dimension =1 means that the arrivals are uniformly distributed (= Poisson arrivals); a smaller number means they are clustered. If the fractal dimension is 0, then all the packets arrived at exactly the same instance

under construction

  • 'ANF':

    It estimates the hop-plot exponent, using a fast, approximate method for the neighborhood function The input is a graph, that is, a set of edges, and the output is a plot of the neighborhood function, that is, the number of neighbors within $h$ hops, as a function of $h$. The slope of this function in log-log scales is exactly the 'hop-plot exponent' [Faloutsos, Faloutsos, Faloutsos, 1999] Our algorithms for the approximate neighborhood function are described in

    Chris Palmer, Georgios Siganos, Michalis Faloutsos,
    Christos Faloutsos and Phil Gibbons
    The connectivity and fault-tolerance of the Internet topology
    Workshop on Network Related Data Management
    (NRDM 2001), Santa Barbara, CA, May 25, 2001.
    http://www.cs.cmu.edu/~christos/PUBLICATIONS/anf-nrdm.ps.gz

  • 'HURST':

    a module to estimate the Hurst exponent for the burstiness of the input traffic. The module plots the 'rescaled range' R/S, as a function of the window size (see the books by [Mandelbrot], or [Schroeder], for the exact definition)

    If the input time sequence is self-similar, then the above plot will be linear, and its slope is the Hurst exponent by definition.

PRACTICAL SIGNIFICANCE:

Hammock will collect all these tools under the same GUI interface. Users, with, say, a graph that represents an instance of the internet, could invoke the GUI, apply the appropriate operator (eg, ANF), and just read-off the value of the hop exponent, while seeing the plot in postscript.

Similarly, when studying network traffic data, a user could feed in a set of time-stamps of packet arrivals, and check their fractal dimension (the closer to '1', the more 'uniform/Poisson' the traffic is).



Christophe Tournery 2001-07-02