Database Microbenchmarking: Creating a Small but Real World

Faculty: Anastassia Ailamaki

Student: Minglong Shao

    Performance evaluation of database systems from an architectural perspective has become a hot topic in database research. It aims to characterize database behavior on modern computer architectures and to map the observed bottlenecks to the underlying hardware components. Database benchmarks are synthetic database workloads, consisting of datasets and queries, that provide rich environments representative of typical database applications. Although current database benchmarks are well designed to simulate real-world applications, they are poorly suited to this kind of performance evaluation for three reasons.

    First, the experimental environment is hard to set up. The large hardware configurations required are usually beyond a research budget, executing a complex query on a simulator may take years, and researchers must set hundreds of parameters correctly to match the requirements of the intended experiments. Second, the results are difficult to analyze. Full-scale benchmarks exercise all aspects of the system, and the intensive interactions between components make it difficult to map each bottleneck to a specific component. Third, full-scale benchmarks introduce uncertainty. For instance, the query plan may vary dramatically across system configurations, which complicates the analysis unnecessarily.

    For these reasons, studies of database performance evaluation have mostly been restricted to small-scale benchmarks and a subset of simple queries. Although previous studies have successfully characterized database behavior at small scale and predicted some trends in database system behavior, they lack rigorous analysis backed by sufficient experiments on database systems of different scales. Several questions therefore remain open: Is the behavior observed on small-scale benchmarks representative? When does scaling down matter, in what way, and to what extent? What is the best way to scale down database workloads for analysis purposes?

    This project aims to answer these questions. We want to provide a methodology for correctly shrinking and simplifying database workloads, which can be used to evaluate computer architecture innovations and to help make design trade-offs quickly.