DATABASE ENGINE


The SOLID Database Engine has been designed and implemented to provide the best possible performance by using operating system services and resources efficiently. This lean and mean Database Engine is the core of SOLID Server. It serves the data requests coming through the SA Interface from the SQL Parser and Optimizer. The Database Engine stores data in the database files and retrieves it from them.

SOLID Database Engine offers scalability from small mobile devices to heavy-weight multiprocessing environments. The unique Bonsai Tree technology offers care-free transaction processing power and reliability within an exceptionally small footprint. These features allow easy embedding and large scale deployment.

Innovative Bonsai Technology

In SOLID Server, the active new data is separated from older, more stable data. The data storage is implemented internally as two separate indexing systems: the Bonsai Tree and the storage server.

The unique Bonsai Tree is a small active index that efficiently stores new data in main memory and maintains multiversion information. The Bonsai Tree performs concurrency control, easily detecting whether any operations conflict with each other. This minimizes the effort needed for validating transactions.

More stable data is maintained in the storage server. Data is transferred to the storage server as a highly-optimized batch insert, thus minimizing the hard disk load.

This division is invisible to the SOLID SQL API.

Storage Server

The storage server uses a B-tree variation to store all permanent indices in the database file. It stores both secondary keys and primary keys. The data rows themselves are stored as primary key values that contain all the columns of the row; there is no separate storage method for data rows, except for BLObs and other long column values.

Indices are separated from each other by a system-defined index-identification inserted in front of every key value. This mechanism divides the index tree into several logical index subtrees, where the key values of one index are clustered close to each other.

Each key value in the index has a time stamp. The time stamp is the start number of the transaction that inserted the key value.
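The key layout described above can be sketched as follows. This is only an illustration, assuming keys are modelled as Python tuples of (index id, key value, time stamp) in one sorted list; the names insert_key and scan_index are hypothetical and are not part of any SOLID interface.

```python
import bisect

# One sorted list stands in for the single B-tree of the storage server.
# Each entry is (index_id, key_value, timestamp); the layout is illustrative.
tree = []

def insert_key(index_id, key_value, timestamp):
    """Insert a key; the leading index_id keeps each index's keys adjacent."""
    bisect.insort(tree, (index_id, key_value, timestamp))

def scan_index(index_id):
    """Return all entries of one logical index subtree, in key order."""
    lo = bisect.bisect_left(tree, (index_id,))
    hi = bisect.bisect_left(tree, (index_id + 1,))
    return tree[lo:hi]

insert_key(2, "smith", 10)
insert_key(1, 1001, 10)
insert_key(2, "jones", 12)
insert_key(1, 1000, 11)
```

Because the index id is the leading component, scan_index(1) and scan_index(2) each return a contiguous slice of the same structure, which is the clustering effect the text describes.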

Main Memory Bonsai Tree with a Consistent View of Data

The Bonsai Tree is a small index tree that is kept in the main memory. All delete, insert, and update operations are written into the Bonsai Tree. The key values in the Bonsai Tree nodes are both prefix and suffix compressed.

The Bonsai Tree offers a full time dimension and multiversioning to all data and key values; thus the old versions of recently updated rows and their related key values remain available. This information is used both for concurrency control and for ensuring consistent read levels for all transactions without any locking overhead.

When a transaction is started, it is given a transaction start number (TSN). The TSN is used as the read level of the transaction: key values inserted into the index later are not visible to its searches. This offers consistent index read levels. It looks as if the read operation was performed atomically at the time the transaction was started. This guarantees that read operations always see a consistent view of the data, and no locks are needed.
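The read-level rule can be illustrated with a small sketch. It is a simplification under stated assumptions: a single counter hands out both transaction start numbers and version stamps, and each write is stamped at write time rather than going through a commit step. The class VersionedIndex and its methods are hypothetical names.

```python
class VersionedIndex:
    """Toy multiversion index: every key keeps all of its versions."""

    def __init__(self):
        self.counter = 0
        self.versions = {}  # key -> list of (stamp, value), oldest first

    def begin(self):
        """Start a transaction; the returned TSN is its read level."""
        self.counter += 1
        return self.counter

    def write(self, key, value):
        """Record a new version stamped after every TSN handed out so far."""
        self.counter += 1
        self.versions.setdefault(key, []).append((self.counter, value))

    def read(self, key, tsn):
        """Return the newest version whose stamp is not later than tsn."""
        visible = [v for stamp, v in self.versions.get(key, []) if stamp <= tsn]
        return visible[-1] if visible else None

idx = VersionedIndex()
idx.write("a", "v1")
t1 = idx.begin()       # t1 sees "v1"
idx.write("a", "v2")   # inserted after t1 started
t2 = idx.begin()       # t2 sees "v2"
```

Here t1 keeps reading "v1" no matter how many later versions are written, without taking any lock.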

Merging of Bonsai Tree to the Storage Server

Later, the newly committed data is merged into the storage server in a batch operation and removed from the Bonsai Tree. The parameter MergeInterval can be used to control this operation. The presorted key values are merged as a background operation concurrently with normal database operations. This offers significant I/O optimization and load balancing. Deleted key values are physically removed during the merge.
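A minimal sketch of such a batch merge is shown below. It assumes both structures are plain sorted Python lists, that the Bonsai side holds at most one committed entry per key, and that a value of None marks a deleted key. The function name merge_into_storage is hypothetical, and the background scheduling controlled by MergeInterval is not modelled.

```python
def merge_into_storage(storage, bonsai):
    """Merge presorted Bonsai entries into the sorted storage list.

    storage: sorted list of (key, value)
    bonsai:  sorted list of (key, value_or_None); None marks a deleted key
    Returns the new storage list and empties the Bonsai list.
    """
    merged = []
    i = j = 0
    while i < len(storage) or j < len(bonsai):
        take_storage = j == len(bonsai) or (
            i < len(storage) and storage[i][0] < bonsai[j][0]
        )
        if take_storage:
            merged.append(storage[i])
            i += 1
        else:
            key, value = bonsai[j]
            if i < len(storage) and storage[i][0] == key:
                i += 1                      # newer version replaces the stored one
            if value is not None:
                merged.append((key, value)) # deleted keys are simply dropped
            j += 1
    bonsai.clear()
    return merged

storage = [(1, "a"), (3, "c"), (5, "e")]
bonsai = [(2, "b"), (3, None), (5, "E")]
result = merge_into_storage(storage, bonsai)
```

Since both inputs are already sorted, the merge is a single sequential pass, which is why it suits batched disk writes.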

Bonsai Tree Benefits

The Bonsai Tree offers the following benefits compared to traditional storage structures:

Data Clustering

SOLID Server’s indexing system stores both secondary keys and primary keys, and the primary keys also contain the actual data values. There is no separate storage method for data rows, except for long columns such as binary large objects (BLObs).

SOLID Server is capable of clustering data easily, automatically, and efficiently. Clustering is determined by defining a primary key for a table. The primary key can also be called the clustering key because it physically clusters the data rows to the order given by the index.

The set of columns used for clustering is called the row reference. The row reference uniquely identifies the data row. If the user-defined columns for the clustering key are not unique, the system ensures that the reference is unique by adding a unique row number to the reference columns. The row reference can also be called the row identifier.

The row reference can be any combination of one or more columns. Each table has a different set of columns that are used for the unique row reference.

Secondary key values refer to the data row using the row reference. This is also called primary key referencing. The data row is searched from the clustering key using the row reference as the search argument. However, if all the requested data is found from the secondary key, no search on the clustering key is performed.
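The lookup path described above can be sketched as follows. The table, its columns, and the function lookup_by_name are made up for illustration; dictionaries stand in for the index trees.

```python
# Clustering key: row reference -> full row (all columns live in the key).
clustering = {
    (1,): {"id": 1, "name": "Ann", "city": "Oslo"},
    (2,): {"id": 2, "name": "Bob", "city": "Turku"},
}
# Secondary key on "name": key value -> row reference.
by_name = {"Ann": (1,), "Bob": (2,)}

def lookup_by_name(name, columns):
    """Fetch columns for a name, visiting the clustering key only if needed.

    Returns (result, used_clustering_key).
    """
    ref = by_name.get(name)
    if ref is None:
        return None, False
    # Columns that can be answered from the secondary key entry alone.
    covered = {"name": name, "id": ref[0]}
    if all(c in covered for c in columns):
        return {c: covered[c] for c in columns}, False
    row = clustering[ref]  # primary key referencing
    return {c: row[c] for c in columns}, True
```

A request for only the id of "Ann" is answered from the secondary key, while a request for the city has to follow the row reference into the clustering key.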

Index Compression Techniques

To save space in the index tree, two methods are used when storing key values. First, only the information that differentiates a key value from the previous key value is saved; the key values are said to be prefix-compressed. Second, in the higher levels of the index tree, the key value borders are truncated from the end; that is, they are suffix-compressed.
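Both techniques can be illustrated on string keys. This is a generic sketch of prefix compression and separator truncation, not the on-disk format used by SOLID; all function names are hypothetical.

```python
import os

def prefix_compress(sorted_keys):
    """Encode each key as (shared_prefix_length, remaining_suffix)."""
    out, prev = [], ""
    for key in sorted_keys:
        shared = len(os.path.commonprefix([prev, key]))
        out.append((shared, key[shared:]))
        prev = key
    return out

def prefix_decompress(entries):
    """Rebuild the full keys from (shared_prefix_length, suffix) pairs."""
    out, prev = [], ""
    for shared, suffix in entries:
        prev = prev[:shared] + suffix
        out.append(prev)
    return out

def shortest_separator(left, right):
    """Shortest prefix of `right` that still sorts after `left`.

    Assumes left < right. Upper tree levels only need such a border value.
    """
    for n in range(1, len(right) + 1):
        if right[:n] > left:
            return right[:n]
    return right

keys = ["apple", "applesauce", "apply", "banana"]
packed = prefix_compress(keys)
```

For the keys above, "applesauce" is stored as (5, "sauce"), and the border between "apply" and "banana" in an upper level can be truncated to just "b".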

Unlimited Architecture

In designing SOLID Server, hard coded limits have been avoided right from the beginning. Thus the server can have any number of tables, rows, and indices.

Character strings and binary data in SOLID Server are stored in variable length format. This saves disk space because values are not padded to a fixed length. Variable length storage also eases the task of the program developer, since the length of strings or binary fields need not be fixed. The maximum size of a single attribute is 2 GB, and the maximum size of the database is 32 TB.

BLOb Support

Images, video, voice, graphics, and intelligent documents test the capabilities of LANs for moving large data objects quickly. Client/server applications, as they increasingly become multimedia applications, will be called upon to move these BLObs over LANs. The clients will capture and display BLObs and then send them to the servers for storage.

SOLID Server is capable of handling BLObs efficiently and automatically. BLObs, or binary fields larger than a configured limit, can be stored in special file areas that have block sizes optimized for large files. Large files are detected when they arrive at the server, and they are transferred directly to the file area allocated for BLOb storage. This is all done automatically and does not require any action from the programmer or the administrator.

Concurrency Control

The primary concurrency model of SOLID Server is a multiversioning and optimistic concurrency control method. In a multiversioning scenario, each transaction has a consistent, unchanging view of the database precisely as it was when the transaction began. If any data in that view is updated by another transaction, a new version of the row is generated while the old version of the data is visible to the older transactions.

Optimistic Method

The general advantage of the multiversioning model is that read transactions never need to restrict other transactions’ access to the data. This radically improves parallelism in typical mixed-load application environments.

The optimistic concurrency control method provides the following benefits, especially in modern interactive GUI-based application environments:

SOLID Server offers fully serializable transactions. Serializability is achieved through a read-set validation scheme that prevents lost updates and phantom rows, for example.

Because of the time dimension of the Bonsai Tree, each transaction has its own consistent view of the database, which makes locking unnecessary. When the transaction commits, SOLID Server checks that no conflicting operations were made to the small and efficient main memory Bonsai Tree by simultaneous transactions. Optimistic multiversion concurrency control never causes operations to wait for locks to be released. It offers better performance for the majority of applications. No effort is wasted on maintaining locks and running deadlock resolution algorithms.
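The commit-time check can be sketched as a read-set validation. This is a simplified, generic model of optimistic concurrency control: it validates at key granularity only (so it does not detect phantom rows), and reads return the latest committed value instead of a multiversion snapshot. The class OptimisticStore and its methods are hypothetical names.

```python
class OptimisticStore:
    """Toy store that validates each transaction's read set at commit."""

    def __init__(self):
        self.data = {}           # key -> (value, commit_number)
        self.commit_counter = 0

    def begin(self):
        return {"start": self.commit_counter, "reads": set(), "writes": {}}

    def read(self, txn, key):
        txn["reads"].add(key)
        if key in txn["writes"]:
            return txn["writes"][key]
        value, _ = self.data.get(key, (None, 0))
        return value

    def write(self, txn, key, value):
        txn["writes"][key] = value   # buffered until commit

    def commit(self, txn):
        """Validate the read set, then publish writes; no locks are taken."""
        for key in txn["reads"]:
            _, committed_at = self.data.get(key, (None, 0))
            if committed_at > txn["start"]:
                return False         # a newer version was committed: abort
        self.commit_counter += 1
        for key, value in txn["writes"].items():
            self.data[key] = (value, self.commit_counter)
        return True

store = OptimisticStore()
t0 = store.begin()
store.write(t0, "x", 1)
first = store.commit(t0)

t1, t2 = store.begin(), store.begin()
store.read(t1, "x")
store.read(t2, "x")
store.write(t1, "x", 2)
store.write(t2, "x", 3)
t1_ok = store.commit(t1)   # first committer wins
t2_ok = store.commit(t2)   # read set is stale, so the lost update is prevented
```

Neither transaction ever waits for the other; the conflict is discovered only when the second one tries to commit.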

Locking

When necessary, SOLID Server can also use pessimistic (row-level locking) or mixed concurrency control methods.

Individual tables can be set to optimistic or pessimistic with an SQL command. By default, optimistic concurrency control is used for all tables.

Programmers can use the following locks: SHARED, INTENT, and EXCLUSIVE.

Transaction Isolation Levels

Applications have different requirements when it comes to concurrency control: some need to execute as if they had the database all to themselves, while others can tolerate some degree of interference from other applications running simultaneously. To meet the needs of different applications, the SQL2 standard defines four alternative isolation levels: Read Uncommitted, Read Committed, Repeatable Read, and Serializable.

The isolation level can be set to Serializable, for example, with the following SQL2 command, which affects all subsequent transactions: SET TRANSACTION ISOLATION LEVEL SERIALIZABLE

Processes and Threads

A process is a program that has been loaded into memory and prepared for execution. A process consists of code, data, and other resources such as open files and open queues. Creating a new process is relatively slow and causes a substantial amount of overhead, since the program must be read from disk and loaded into memory. Communication between processes is done through mechanisms such as Named Pipes and Shared Memory. Many conventional DBMSs use a multi-process architecture.

SOLID Server is designed to take full advantage of multi-thread architecture. It provides an efficient way of sharing the processor within an application, as opposed to between applications. A thread is a dispatchable piece of code that merely owns a stack, registers, and its priority. It shares everything else with all the other active threads in a process. Creating a thread requires much less system overhead than creating a process. Threads are loaded into memory as part of the calling program; no disk access is therefore necessary when a thread is invoked by another thread. Threads can communicate using global variables, events, and semaphores.

If the operating system supports symmetric multi-threading between different processors, SOLID Server can automatically take advantage of multiple processors.

When different threads are executing simultaneously in the server, they interact with each other using shared server objects. These shared objects are the most critical for the proper synchronization between different threads. Conflicts between different threads can exist only when they are using shared objects.

The threading system of SOLID Server can be divided into two separate classes: general purpose threads and dedicated threads.

The number of SOLID Server threads can be set in the configuration file.

General Purpose Threads

General purpose threads execute tasks from the server's tasking system. They can execute any of the following tasks:

The most effective number of threads depends on the number of processors installed in the system. Usually it is most efficient to have between two and eight threads per processor. If there were a thread for every user, the performance of the system would actually degrade when hundreds of users are connected to the system.

General purpose threads take a task from the tasking system, execute the task step to completion and then switch to another task from the tasking system. The task steps are designed to be small because they are used to simulate multi-threading in non-multi-threaded environments. The tasking system works in a round-robin fashion distributing the client operations evenly between different threads.
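The round-robin execution of small task steps can be sketched with Python generators, where each yield marks the end of one step. The sketch runs on a single thread, and the names client_task and run_round_robin are hypothetical.

```python
from collections import deque

def client_task(name, steps, log):
    """A task split into small steps; each `yield` ends one step."""
    for n in range(steps):
        log.append((name, n))
        yield

def run_round_robin(tasks):
    """Run one step of each task in turn until all tasks are finished."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
            queue.append(task)   # not finished: back to the end of the queue
        except StopIteration:
            pass                 # task completed, drop it

log = []
run_round_robin([client_task("A", 2, log), client_task("B", 3, log)])
```

The resulting log interleaves the steps of A and B, so no client can monopolize the executing thread even though only one step runs at a time.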

Dedicated Threads

Dedicated threads are dedicated to a specific operation. The following dedicated threads may exist in the server:

The communication threads are described in the chapter Network Services.

I/O Manager Thread

The I/O manager thread is used for intelligent disk I/O optimization and load balancing. All I/O requests go through the I/O manager. Depending on the mode it is run in, it may pass the I/O request directly to the cache, or it may try to schedule it among other I/O requests.

The I/O manager has three basic functions: prefetching, preflushing, and I/O ordering.

Prefetching

When the I/O manager is handling a long sequential search, it enters a read-ahead operation mode. This ensures that the next file blocks of the search in question are read into the cache in advance, which improves the overall performance of sequential searches.
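A read-ahead policy of this kind might look like the sketch below. The trigger of three consecutive block reads and the window of four blocks are arbitrary illustrative values, and the class ReadAhead is a hypothetical name; the actual heuristics of the I/O manager are not described in this chapter.

```python
class ReadAhead:
    """Toy block reader that prefetches once a sequential run is detected."""

    def __init__(self, disk, trigger=3, window=4):
        self.disk = disk          # block number -> data
        self.cache = {}
        self.trigger = trigger    # sequential reads needed to enter read-ahead
        self.window = window      # number of blocks fetched in advance
        self.last = None
        self.run = 0

    def _load(self, block):
        if block in self.disk and block not in self.cache:
            self.cache[block] = self.disk[block]

    def read(self, block):
        """Return (data, was_cache_hit) for a block that exists on disk."""
        if self.last is not None and block == self.last + 1:
            self.run += 1
        else:
            self.run = 1
        self.last = block
        hit = block in self.cache
        self._load(block)
        if self.run >= self.trigger:
            for ahead in range(block + 1, block + 1 + self.window):
                self._load(ahead)
        return self.cache[block], hit

disk = {n: f"block-{n}" for n in range(10)}
reader = ReadAhead(disk)
hits = [reader.read(n)[1] for n in range(4)]
```

The first three reads miss the cache; from the fourth block onward the sequential scan finds its blocks already loaded.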

Preflushing

The preflush operations prepare the cache for the allocation of new blocks. Blocks are written to disk from the tail of the cache based on a Least Recently Used (LRU) algorithm. Therefore, when new cache blocks are needed, they can be taken immediately without first writing the old contents to disk.
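The effect of preflushing can be illustrated with a small LRU cache model. The class Cache, its clean_target parameter, and the dictionary standing in for the disk are assumptions made for this sketch.

```python
from collections import OrderedDict

class Cache:
    """Toy LRU block cache with an explicit preflush step."""

    def __init__(self, capacity, clean_target):
        self.capacity = capacity
        self.clean_target = clean_target  # blocks to keep clean at the LRU tail
        self.blocks = OrderedDict()       # address -> (data, dirty); LRU first
        self.disk = {}
        self.writes_at_allocation = 0

    def put(self, address, data):
        """Write a block into the cache, evicting the LRU block if full."""
        if address not in self.blocks and len(self.blocks) >= self.capacity:
            victim, (old, dirty) = self.blocks.popitem(last=False)
            if dirty:
                self.disk[victim] = old       # slow path: write before reuse
                self.writes_at_allocation += 1
        self.blocks[address] = (data, True)
        self.blocks.move_to_end(address)

    def preflush(self):
        """Write dirty blocks at the LRU tail to disk ahead of time."""
        flushed = 0
        for address in list(self.blocks)[: self.clean_target]:
            data, dirty = self.blocks[address]
            if dirty:
                self.disk[address] = data
                self.blocks[address] = (data, False)
                flushed += 1
        return flushed

cache = Cache(capacity=3, clean_target=2)
for addr, data in [(1, "a"), (2, "b"), (3, "c")]:
    cache.put(addr, data)
flushed = cache.preflush()
cache.put(4, "d")   # evicts block 1, which is already clean
```

Because block 1 was flushed in advance, allocating room for block 4 needs no disk write at allocation time.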

I/O Ordering

This function orders I/O requests by their logical file address. The ordering optimizes the file I/O since the file addresses accessed on the disk are in close range. This improves performance by minimizing the disk read head movement.
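The benefit of ordering can be shown with a simple model in which head movement is approximated by the difference between consecutive file addresses. This is an assumption made for illustration; the function names are hypothetical.

```python
def order_requests(requests):
    """Sort pending (file_address, operation) requests by file address."""
    return sorted(requests, key=lambda r: r[0])

def head_travel(requests, start=0):
    """Total distance moved when serving the requests in the given order."""
    distance, position = 0, start
    for address, _ in requests:
        distance += abs(address - position)
        position = address
    return distance

pending = [(90, "read"), (10, "write"), (70, "read"), (20, "read")]
ordered = order_requests(pending)
```

Served in arrival order, the four requests above require 280 units of movement in this model; served in address order, they require 90.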

Buffer Management

SOLID Database Engine is designed to:

The basic element of the memory management system is a pool of main memory buffers of equal size. The number and size of memory buffers can be configured to meet the demands of different application environments.

Log Manager

The task of a log manager is to ensure that the effects of a transaction are written to permanent storage immediately at commit time. The SOLID log manager has been designed to ensure robustness with optimal performance.

The log manager of SOLID Server can run in three different operation modes. The choice of logging method depends on the log file media and the level of security needed. All pending transactions are written to the log file as a single unit of work (i.e., the group commit method is used automatically).

Ping-pong Method

This default method uses two separate disk blocks at the end of the log file to write the transaction commit records. The ping-pong method toggles between these two blocks until one block becomes full. This double block method offers a practical combination of high performance and security. It ensures that no previously written data is lost even if the server loses power in the most critical section of the log write process.
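One way to picture the ping-pong idea is the simplified model below. It is an interpretation of the description above, not the actual log file format: the incomplete tail block is written alternately to two slots, so the slot holding the previous good copy is never the one being overwritten. The class PingPongLog and the block size of four records are hypothetical.

```python
BLOCK_SIZE = 4  # records per block; tiny value chosen for illustration

class PingPongLog:
    """Toy log whose incomplete tail block alternates between two slots."""

    def __init__(self):
        self.full_blocks = []      # completed blocks, never rewritten
        self.slots = [None, None]  # two tail blocks used alternately
        self.active = 0            # slot holding the latest good copy
        self.pending = []          # records of the incomplete block

    def commit(self, record):
        self.pending.append(record)
        if len(self.pending) == BLOCK_SIZE:
            # The block is full: store it permanently and start a new tail.
            self.full_blocks.append(list(self.pending))
            self.pending = []
            self.slots = [None, None]
            return
        target = 1 - self.active   # never overwrite the last good copy
        self.slots[target] = list(self.pending)
        self.active = target

    def recover(self):
        """Records readable after a restart: full blocks plus latest tail."""
        tail = self.slots[self.active] or []
        return [r for block in self.full_blocks for r in block] + tail

log = PingPongLog()
log.commit("t1")
log.commit("t2")
previous_copy = log.slots[1 - log.active]
```

After the second commit, one slot holds both records while the other still holds the earlier copy with only "t1", so an interrupted write to one slot leaves the other intact.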

Write-once Method

This method will write all pending log records immediately to the disk. An incomplete disk block is always padded with blanks.

This is the method of choice when the log file storage media is, for example, a magnetic tape drive or a WORM (write once, read many) device. If the server runs on a single thread, this method of logging is not recommended.

Overwriting Method

This method rewrites incomplete blocks at each commit until the blocks become full. It may be used when losing the data in the last log-file disk block is acceptable.

Hot Standby Replication

In the ‘hot standby’ option, SOLID Servers can have two different roles: either the primary role or the backup role. The different roles can be changed dynamically after a failure. Typically, after a failure in the primary server, the backup server becomes the new primary server, and the old primary server becomes the new backup server. This change is automatic and dynamic. Only the initial roles of servers at start-up have to be configured manually.

There can be only one primary server, but there may be multiple backup servers. All update transactions are executed on the primary server and copied to the backup server. Logically, copying transactions means that the transaction log writes from the primary server are copied to the backup server. The backup server runs a continuous roll-forward process updating the database.
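The log copying and roll-forward described above can be sketched as follows. The classes Primary and Backup are hypothetical, log records are reduced to (key, value) pairs, and the sketch ships each record synchronously at commit, which is only one possible way to copy the log.

```python
class Backup:
    """Backup server: applies shipped log records as they arrive."""

    def __init__(self):
        self.db = {}
        self.applied = 0

    def receive(self, log_record):
        """Roll forward by applying one log record; return an acknowledgement."""
        key, value = log_record
        self.db[key] = value
        self.applied += 1
        return True

class Primary:
    """Primary server: executes updates and copies its log to the backups."""

    def __init__(self, backups):
        self.db = {}
        self.log = []
        self.backups = backups

    def commit(self, key, value):
        record = (key, value)
        self.log.append(record)                            # local log write
        acks = [b.receive(record) for b in self.backups]   # copy to backups
        self.db[key] = value
        return all(acks)

b1, b2 = Backup(), Backup()
primary = Primary([b1, b2])
ok = primary.commit("x", 1)
```

After the commit, every backup holds the same data as the primary, so any of them could take over the primary role.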

There are three alternative approaches for copying the log:

The SOLID Server ‘hot standby’ option uses the 2-safe replication design.

Copyright © 1992-1997 Solid Information Technology Ltd All rights reserved.