The SOLID Database Engine has been designed and implemented to provide the best possible performance by utilizing operating system services and resources efficiently. This lean and mean Database Engine is the core of SOLID Server. It serves the data requests coming through the SA Interface from the SQL Parser and Optimizer, and it stores the data into and retrieves it from the database files. The key features of the Database Engine include:
- true multi-thread SMP architecture and parallel processing
- intelligent row-level transaction management
- unique combination of pessimistic and optimistic concurrency control
- multiversioning to offer a consistent view of data with no locks
- persistent identity for efficient post-relational object references
- variable length columns and powerful BLOb support
- reduced memory usage by prefix and suffix compressing of index leaves
- intelligent transactions for mobile data synchronization
- automatic roll-forward recovery
- optional hot standby replication
- scalability from small mobile devices to SMP RISC environments
- small footprint starting from 300 kB RAM and disk space
SOLID Database Engine offers scalability from small mobile devices to heavy-weight multiprocessing environments. The unique Bonsai Tree technology offers care-free transaction processing power and reliability within an exceptionally small footprint. These features allow easy embedding and large-scale deployment.
In SOLID Server, the active new data is separated from older, more stable data. The data storage is implemented internally as two separate indexing systems: the Bonsai Tree and the storage server.
The unique Bonsai Tree is the small active index efficiently storing new data in the central memory and maintaining multiversion information. The Bonsai Tree performs concurrency control, easily detecting if any operations conflict with each other. This minimizes the effort needed for validating transactions.
More stable data is maintained in the storage server. Data is transferred to the storage server as a highly-optimized batch insert, thus minimizing the hard disk load.
This division is invisible to the SOLID SQL API.
The storage server uses a B-tree variation to store all permanent indices in the database file. It stores both secondary keys and primary keys. The data rows themselves are stored as the primary key values, which actually contain all the columns of the rows. There is no separate storage method for data rows, except for BLObs and other long column values.
Indices are separated from each other by a system-defined index-identification inserted in front of every key value. This mechanism divides the index tree into several logical index subtrees, where the key values of one index are clustered close to each other.
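The effect of the index-identification prefix can be sketched in a few lines of Python. This is an illustrative model only, not SOLID's actual on-disk format: a single sorted key space stands in for the B-tree, and prefixing each key value with its index id clusters one index's keys into a contiguous logical subtree.

```python
# Illustrative sketch: one sorted key space holds all indices; the
# system-defined index id in front of each key value clusters the
# key values of one index close to each other.
import bisect

tree = []  # single sorted list standing in for the B-tree key space

def insert(index_id, key_value):
    bisect.insort(tree, (index_id, key_value))

def scan(index_id):
    """Range-scan one logical index subtree via its id prefix."""
    lo = bisect.bisect_left(tree, (index_id,))
    hi = bisect.bisect_left(tree, (index_id + 1,))
    return [kv for _, kv in tree[lo:hi]]

insert(1, "alice"); insert(2, 42); insert(1, "bob"); insert(2, 7)
print(scan(1))   # only index 1's key values, stored contiguously
```

A scan of one index thus becomes a single contiguous range scan, regardless of how many other indices share the tree.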
Each key value in the index has a time stamp. The time stamp is the start number of the transaction that inserted the key value.
When a transaction is started, it is given a transaction start number (TSN). The TSN is used as the read level of the transaction; all key values inserted later in the index are not visible to searches. This offers consistent index read levels. It looks as if the read operation was performed atomically at the time the transaction was started. This guarantees that the read operations always see a consistent view of the data and no locks are needed.
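The read-level mechanism can be sketched as follows. This is a deliberately simplified model (a flat version list rather than a tree, and invented function names): each key value carries the TSN of the inserting transaction, and a reader sees only versions whose TSN does not exceed its own read level.

```python
# Simplified sketch of TSN-based visibility: a transaction's read
# level hides all key values inserted by later transactions.
versions = []       # (tsn, key, value) version records
tsn_counter = 0

def begin():
    """Start a transaction; its TSN doubles as its read level."""
    global tsn_counter
    tsn_counter += 1
    return tsn_counter

def insert(tsn, key, value):
    versions.append((tsn, key, value))

def read(read_level, key):
    """Latest version of `key` visible at this read level."""
    visible = [(t, v) for t, k, v in versions
               if k == key and t <= read_level]
    return max(visible)[1] if visible else None

t1 = begin(); insert(t1, "x", "old")
t2 = begin()                 # read level fixed at start
t3 = begin(); insert(t3, "x", "new")
print(read(t2, "x"))         # "old": t3's insert is invisible to t2
```

Transaction t2 keeps seeing the value as of its start, even though t3 updated the row afterwards, which is exactly the lock-free consistent view described above.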
Merging of Bonsai Tree to the Storage Server
Later the new committed data is merged to the storage server in a batch operation and removed from the Bonsai Tree. The parameter MergeInterval can be used to control this operation. The presorted key values are merged as a background operation concurrently with normal database operations. This offers significant I/O optimization and load balancing. The deleted key values are physically removed during the merge.
The Bonsai Tree is a small index tree that is kept in the main memory. All delete, insert, and update operations are written into the Bonsai Tree. The Bonsai Tree offers a full time dimension and multiversioning to all data and key values; thus the old versions of recently updated rows and related key values remain available.
The Bonsai Tree offers the following benefits compared to traditional storage structures:
- All write (e.g., delete, insert, and update) operations are very fast and access only the small Bonsai Tree in the main memory. There is no need to access the massive disk based storage server at all.
- All read operations have a consistent view of the data without any extra validation or locking.
- All transaction concurrency control operations can be limited to the Bonsai Tree: conflicts between transactions can occur only with simultaneous write (e.g., delete, insert, and update) operations that are all stored in the small and efficient Bonsai Tree.
- The time dimension within the Bonsai Tree offers simple and efficient tools for full optimistic predicate transaction validation. It means serializable transactions without locking, even avoiding the so-called phantom problem.
- When the Bonsai Tree is merged to the larger storage tree, the key values can be inserted in a sorted order. When the storage tree is very large, this feature is especially important because it radically minimizes disk I/O.
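The I/O advantage of the last point can be illustrated with a short sketch. Assuming both structures are already in key order (the data below is invented), merging presorted Bonsai key values into the storage tree is a single sequential pass over the large tree, so disk pages are touched in order rather than at random.

```python
# Illustrative sketch: presorted Bonsai key values are merged into
# the sorted storage tree in one sequential pass.
import heapq

storage = [(10, "a"), (30, "c"), (50, "e")]   # large on-disk tree, sorted
bonsai = [(20, "b"), (40, "d")]               # committed in-memory keys, sorted

merged = list(heapq.merge(storage, bonsai))   # single ordered pass
print(merged)
```

Because every storage-tree page is visited at most once and in address order, the merge minimizes both the number of disk reads and the head movement between them.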
SOLID Server's indexing system is used to store both secondary keys and primary keys, the latter also containing the actual data values. There is no separate storage method for data rows except for long columns, for example, binary large objects (BLObs).
SOLID Server is capable of clustering data easily, automatically, and efficiently. Clustering is determined by defining a primary key for a table. The primary key can also be called the clustering key because it physically clusters the data rows to the order given by the index.
The set of columns used for clustering is called the row reference. The row reference uniquely identifies the data row. If the user-defined columns for the clustering key are not unique, the system ensures that the reference is unique by adding a unique row number to the reference columns. The row reference can also be called the row identifier.
The row reference can be any combination of one or more columns. Each table has a different set of columns that are used for the unique row reference.
Secondary key values refer to the data row using the row reference. This is also called primary key referencing. The data row is searched from the clustering key using the row reference as the search argument. However, if all the requested data is found from the secondary key, no search on the clustering key is performed.
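Primary key referencing can be sketched with two dictionaries. The structures and names below are illustrative, not SOLID's storage format: a secondary key entry carries the row reference, fetching further columns means a second lookup in the clustering key, and that lookup is skipped when the secondary key already covers the query.

```python
# Illustrative sketch of primary key referencing.
clustering_key = {          # primary key value -> full data row
    ("smith", 1): {"name": "smith", "city": "oslo", "age": 40},
}
secondary_key = {           # secondary key value -> row reference
    "oslo": [("smith", 1)],
}

def find_by_city(city, columns):
    rows = []
    for row_ref in secondary_key.get(city, []):
        if set(columns) <= {"city"}:
            # all requested data is in the secondary key:
            # no search on the clustering key is performed
            rows.append({"city": city})
        else:
            row = clustering_key[row_ref]   # primary key referencing
            rows.append({c: row[c] for c in columns})
    return rows

print(find_by_city("oslo", ["name", "age"]))
```

The second branch is the row-reference lookup described above; the first branch shows the case where the secondary key alone answers the query.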
To save space in the index tree two methods are used when storing key values. First, only the information that differentiates the key value from the previous key value is saved. The key values are said to be prefix-compressed. Second, in the higher levels of the index tree, the key value borders are truncated from the end, i.e., they are suffix-compressed.
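Prefix compression can be sketched as follows. This is an illustrative encoding, not SOLID's exact key format: each sorted key is stored as the length of the prefix it shares with its predecessor plus only the differing tail.

```python
# Illustrative sketch of prefix compression on sorted key values.
def prefix_compress(sorted_keys):
    out, prev = [], ""
    for key in sorted_keys:
        shared = 0
        while (shared < min(len(prev), len(key))
               and prev[shared] == key[shared]):
            shared += 1
        out.append((shared, key[shared:]))   # (shared-prefix length, tail)
        prev = key
    return out

def decompress(entries):
    keys, prev = [], ""
    for shared, tail in entries:
        prev = prev[:shared] + tail          # rebuild from the predecessor
        keys.append(prev)
    return keys

keys = ["database", "datatype", "datum", "index"]
packed = prefix_compress(keys)
print(packed)   # [(0, 'database'), (4, 'type'), (3, 'um'), (0, 'index')]
```

Because neighboring key values in a sorted index leaf tend to share long prefixes, only the short tails need to be stored, which is where the space saving comes from.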
In designing SOLID Server, hard coded limits have been avoided right from the beginning. Thus the server can have any number of tables, rows, and indices.
Character strings and binary data in SOLID Server are stored in variable-length format. This feature saves disk space because no extra data is stored in the database. Variable-length storage also eases the tasks of a program developer, since the length of string or binary fields need not be fixed. The maximum size of a single attribute is 2 GB, and the maximum size of the database is 32 TB.
Images, video, voice, graphics, and intelligent documents test the capabilities of LANs for moving large data objects quickly. Client/server applications, as they increasingly become multimedia applications, will be called upon to move these BLObs over LANs. The clients will capture and display BLObs and then send them to the servers for storage.
SOLID Server is capable of handling BLObs efficiently and automatically. BLObs, or binary fields larger than a configured limit, can be stored to special file areas that have optimized block sizes for large files. Large files are detected when they arrive to the server, and they are transferred directly to the file area allocated for BLOb storage. This is all done automatically and it does not require any action from the programmer or the administrator.
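The size-based routing can be sketched in a few lines. The threshold name and storage lists below are assumptions for illustration, not SOLID parameters: values above a configured limit are diverted to a separate area on arrival, and the row keeps only a reference.

```python
# Illustrative sketch: large values are detected on arrival and
# diverted to a storage area optimized for large objects.
BLOB_LIMIT = 1024            # hypothetical configured limit, in bytes

row_storage, blob_storage = [], []

def store_value(value: bytes):
    if len(value) > BLOB_LIMIT:
        blob_storage.append(value)
        return ("blob", len(blob_storage) - 1)   # row holds a reference only
    row_storage.append(value)
    return ("row", len(row_storage) - 1)

print(store_value(b"short"))        # ('row', 0)
print(store_value(b"x" * 4096))     # ('blob', 0)
```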
The primary concurrency model of SOLID Server is a multiversioning and optimistic concurrency control method. In a multiversioning scenario, each transaction has a consistent, unchanging view of the database precisely as it was when the transaction began. If any data in that view is updated by another transaction, a new version of the row is generated while the old version of the data is visible to the older transactions.
The general advantage of the multiversioning model is that read transactions never need to restrict other transactions' access to the data. This radically improves parallelism in typical mixed-load application environments.
The optimistic concurrency control method provides the following benefits, especially in modern interactive GUI-based application environments:
- Data is always available to the users because locking is not used.
- Users can browse through the data displayed in various lists and menus and choose to update any row at will. When the updating transaction is committed, the system checks whether someone else has already changed that row. SOLID Server does this automatically; no extra checking code is needed in the application.
- The database access is improved since deadlocks are not possible.
SOLID Server offers fully serializable transactions. Serializability is achieved through a read-set validation scheme that prevents lost updates and phantom rows, for example.
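Commit-time validation can be sketched as follows. This is a simplified model with invented names, not SOLID's validation algorithm: each transaction records what it read and wrote, and at commit it is checked against the write sets of transactions that committed after it started. A conflict rolls the transaction back; no locks are taken while it runs.

```python
# Simplified sketch of optimistic read-set/write-set validation.
committed = []   # (commit_seq, write_set) of committed transactions
seq = 0

class Txn:
    def __init__(self):
        self.start_seq = seq
        self.read_set, self.write_set = set(), set()

def commit(txn):
    """Validate against transactions committed since txn started."""
    global seq
    for commit_seq, wset in committed:
        if (commit_seq > txn.start_seq
                and wset & (txn.read_set | txn.write_set)):
            return False            # conflict detected: roll back
    seq += 1
    committed.append((seq, txn.write_set))
    return True

t1, t2 = Txn(), Txn()
t1.read_set.add("row7"); t1.write_set.add("row7")
t2.read_set.add("row7"); t2.write_set.add("row7")
print(commit(t1), commit(t2))   # True False: t2 read a row t1 changed
```

Validating the read set as well as the write set is what rules out lost updates and, with predicate-level read sets, phantom rows.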
When necessary, SOLID Server can also use pessimistic (row-level locking) or mixed concurrency control methods.
Individual tables can be set as optimistic or pessimistic with the SQL command
ALTER TABLE base-table-name SET {OPTIMISTIC | PESSIMISTIC}
By default, optimistic concurrency control is used for all tables.
Programmers can use the following locks: SHARED, INTENT, and EXCLUSIVE.
Applications have different requirements when it comes to concurrency control: some need to execute as if they had the database all to themselves, others can tolerate some degree of interference from other applications running simultaneously. To meet the needs of different applications, the SQL2 standard defines the following four alternative isolation levels:
- Read Uncommitted:
Allows read-only transactions to read data modified by transactions that have not yet committed. This dirty read mode of operation is not supported by SOLID Server. Its purpose has been to enhance concurrency in DBMSs that use locking, but it sacrifices the consistent view and potentially also database integrity.
- Read Committed:
Allows a transaction to read only committed data. Still, the view of the database may change in the middle of a transaction when other transactions commit their changes. Also the phantom problem may occur. However, SOLID Server ensures that the results set returned by a single query is consistent by setting the read level to the latest committed transaction when the query is started.
- Repeatable Read:
Allows a transaction to read only committed data and guarantees that read data will not change until the transaction terminates. SOLID Server additionally ensures that the transaction sees a consistent view of the database. This is the default isolation level provided by SOLID Server. Conflicts between transactions are detected by using transaction write-set validation. Still, the phantom problem may occur.
- Serializable:
Allows a transaction to read only committed data with a consistent view of the database. Additionally, no other transaction may change the values read by the transaction before it is committed because otherwise the execution of transactions cannot be serialized in the general case. SOLID Server can provide serializable transactions by detecting conflicts between transactions. It does this by using both write-set and read-set validations. This way, SOLID Server avoids all concurrency control anomalies, including the phantom problem, without any locks!
The isolation level can be set to Serializable, for example, with the SQL2 command that affects all subsequent transactions:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
A process is a program that has been loaded into memory and prepared for execution. A process consists of code, data, and other resources such as open files and open queues. Creating a new process is relatively slow and causes a substantial amount of overhead, since the program must be read from a disk and loaded into memory. Communication between processes is done through protocols such as Named Pipes and Shared Memory. Many conventional DBMSs use a multi-process architecture.
SOLID Server is designed to take full advantage of multi-thread architecture. It provides an efficient way of sharing the processor within an application, as opposed to between applications. A thread is a dispatchable piece of code that merely owns a stack, registers, and its priority. It shares everything else with all the other active threads in a process. Creating a thread requires much less system overhead than creating a process. Threads are loaded into memory as part of the calling program; no disk access is therefore necessary when a thread is invoked by another thread. Threads can communicate using global variables, events, and semaphores.
If the operating system supports symmetric multi-threading between different processors, SOLID Server can automatically take advantage of multiple processors.
When different threads are executing simultaneously in the server, they interact with each other using shared server objects. These shared objects are the most critical for the proper synchronization between different threads. Conflicts between different threads can exist only when they are using shared objects.
The threading system of SOLID Server can be divided into two separate classes:
- general purpose threads
- dedicated threads
The number of SOLID Server threads can be set in the configuration file.
General purpose threads execute tasks from the server's tasking system. They can execute any of the following tasks:
- serving user requests
- making backups
- making checkpoints
- making timed commands
- index merging
The most effective number of threads depends on the number of processors the system has installed. Usually it is most efficient to have between two and eight threads per processor. If there were a thread for every user, the performance of the system would actually degrade when hundreds of users are connected to the system.
General purpose threads take a task from the tasking system, execute the task step to completion and then switch to another task from the tasking system. The task steps are designed to be small because they are used to simulate multi-threading in non-multi-threaded environments. The tasking system works in a round-robin fashion distributing the client operations evenly between different threads.
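The tasking system can be sketched with a shared queue and a small, fixed pool of worker threads. The code below is an illustrative model, not SOLID's implementation: each thread runs one small task step to completion and then takes the next task, so a handful of threads serve many clients.

```python
# Illustrative sketch of general purpose threads pulling task steps
# from a shared tasking system.
import queue
import threading

tasks = queue.Queue()
results = []
lock = threading.Lock()

def worker():
    while True:
        step = tasks.get()
        if step is None:                # shutdown signal
            break
        with lock:
            results.append(step())      # one task step, run to completion

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()

for client in range(8):                 # task steps from many clients
    tasks.put(lambda c=client: f"served client {c}")

for _ in threads:                       # one shutdown signal per thread
    tasks.put(None)
for t in threads:
    t.join()

print(sorted(results))
```

Eight client requests are served by four threads; because each step is short, no client can monopolize a thread, which is the point of the round-robin design.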
Dedicated threads are dedicated to a specific operation. The following dedicated threads may exist in the server:
- I/O manager thread
- communication read threads
- one communication select thread per protocol, i.e., selector thread
- communication server thread, i.e., RPC server main thread
The communication threads are described in the chapter Network Services.
The I/O manager thread is used for intelligent disk I/O optimization and load balancing. All I/O requests go through the I/O manager. Depending on the mode it is run in, it may pass the I/O request directly to the cache, or it may try to schedule it among other I/O requests.
The I/O manager has three basic functions:
Prefetching
When the I/O manager is handling a long sequential search, it enters a read-ahead operation mode. This happens in order to ensure that the next file blocks of the search in question will be read into the cache in advance. This naturally improves the overall performance of sequential searches.
Preflushing
The preflush operations prepare the cache for the allocation of new blocks. The blocks are written onto the disk from the tail of the cache based on a Least Recently Used (LRU) algorithm. Therefore, when new cache blocks are needed, they can be taken immediately without writing the old contents onto the disk.
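The preflush idea can be sketched with a small LRU model. This is an illustrative simplification (no real eviction or disk I/O, and the class is invented): dirty blocks near the LRU tail are written out ahead of demand, so a newly needed buffer can be reused immediately.

```python
# Illustrative sketch of LRU-based preflushing.
from collections import OrderedDict

class Cache:
    def __init__(self, capacity, preflush=2):
        self.blocks = OrderedDict()    # iteration order = LRU order
        self.capacity, self.preflush = capacity, preflush
        self.flushed = []              # blocks written out in advance

    def touch(self, addr):
        if addr in self.blocks:
            self.blocks.move_to_end(addr)    # most recently used
        else:
            self.blocks[addr] = "dirty"
        # preflush: once the cache is full, write the blocks at the
        # LRU tail so their buffers can be reused without waiting
        for a in list(self.blocks)[: self.preflush]:
            if self.blocks[a] == "dirty" and len(self.blocks) >= self.capacity:
                self.blocks[a] = "clean"
                self.flushed.append(a)

cache = Cache(capacity=3)
for addr in [1, 2, 3, 4]:
    cache.touch(addr)
print(cache.flushed)    # tail blocks written before their buffers are needed
```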
I/O ordering
This function orders I/O requests by their logical file address. The ordering optimizes the file I/O since the file addresses accessed on the disk are in close range. This improves performance by minimizing the disk read head movement.
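The ordering step itself is simple and can be sketched directly. The request list below is invented for illustration: pending requests are sorted by logical file address before being issued, so the disk head sweeps in one direction instead of seeking back and forth.

```python
# Illustrative sketch of I/O ordering by logical file address.
pending = [(7042, "read"), (12, "write"), (880, "read"), (15, "write")]

def issue_in_order(requests):
    ordered = sorted(requests)           # sort by logical file address
    return [addr for addr, _ in ordered]

print(issue_in_order(pending))   # [12, 15, 880, 7042]
```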
SOLID Database Engine is designed to:
- minimize mass storage I/O operations by keeping as much information as possible resident in the central memory.
- provide dynamically extendible and shrinkable work areas for variable and unlimited size of column values and BLObs.
- provide a practical way of handling the Bonsai technology.
The basic element of the memory management system is a pool of central memory buffers of equal size. The amount and size of memory buffers can be configured to meet the demands of different application environments.
The task of a log manager is to ensure that the effects of a transaction are written to permanent storage immediately at commit time. The SOLID log manager has been designed to ensure robustness with optimal performance.
The log manager of SOLID Server can run in three different operation modes. The choice of logging method depends on the log file media and the level of security needed. All pending transactions are written to the log file as a single unit of work (i.e., the group commit method is used automatically).
Ping-pong Method
This default method uses two separate disk blocks at the end of the log file to write the transaction commit records. The ping-pong method toggles between these two blocks until one block becomes full. This double block method offers a practical combination of high performance and security. It ensures that no previously written data is lost even if the server loses power in the most critical section of the log write process.
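The toggling can be sketched as follows. This is a simplified in-memory model, not SOLID's log format: the growing commit block is always written to the block that does not hold the last good copy, so a torn write during a power failure can damage at most the block being written, never the previous one.

```python
# Illustrative sketch of the ping-pong method: commit records toggle
# between two disk blocks at the end of the log file.
log_blocks = {"A": None, "B": None}
current = "A"                # block holding the last good copy

def write_commit_block(records):
    global current
    target = "B" if current == "A" else "A"
    log_blocks[target] = list(records)   # previous copy stays intact
    current = target                     # switch only after the write

records = []
for commit in ["t1", "t2", "t3"]:
    records.append(commit)
    write_commit_block(records)

print(current, log_blocks[current])      # B ['t1', 't2', 't3']
```

At every instant one of the two blocks holds a complete, untouched copy of the committed records, which is the safety property the method relies on.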
Write-once Method
This method will write all pending log records immediately to the disk. An incomplete disk block is always padded with blanks.
This is the method of choice when the log file storage media is, for example, a magnetic tape drive or a WORM. If the server runs on a single thread, this method of logging is not recommended.
Overwriting Method
This method rewrites incomplete blocks at each commit until the blocks become full. It may be used when data loss from the last log-file disk block is affordable.
In the hot standby option, SOLID Servers can have two different roles: either the primary role or the backup role. The different roles can be changed dynamically after a failure. Typically, after a failure in the primary server, the backup server becomes the new primary server, and the old primary server becomes the new backup server. This change is automatic and dynamic. Only the initial roles of servers at start-up have to be configured manually.
There can be only one primary server, but there may be multiple backup servers. All update transactions are executed on the primary server and copied to the backup server. Logically, copying transactions means that the transaction log writes from the primary server are copied to the backup server. The backup server runs a continuous roll-forward process updating the database.
There are three alternative approaches for copying the log:
- 1-safe. In a 1-safe design, the primary transaction manager goes through the standard commit logic and declares completion when the commit record is written to the local log. In this design, throughput and response time are the same as in a single-system design. The log is synchronously spooled to the backup system. This design risks lost transactions in the case of primary system failure immediately after transaction commit. This alternative can also be called an asynchronous replication configuration.
- 2-safe. When possible, the 2-safe design involves the backup system in commit. If the backup system is up, it receives the transaction log at the end of commit phase 1. The primary transaction manager will not commit until the backup responds or until it is declared down. The backup transaction manager has the option of responding immediately after the log arrives or responding after the log has been forced into durable storage. The 2-safe design avoids lost transactions if there is only a single failure, but adds some delay to the transaction commit and consequently to the response time. This alternative can also be referred to as a hot backup configuration.
- Very-safe. The very-safe design takes an even more conservative approach: it commits transactions only if both the primary and the backup agree to commit. If one of the two nodes is down, no transaction can commit. The availability of such a system is not as good as the availability of a single system. However, the very-safe approach avoids lost transactions unless there are two simultaneous site disasters.
The SOLID Server hot standby option uses the 2-safe replication design.
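The 2-safe commit rule can be sketched as follows. The function and parameter names are illustrative, not SOLID's API: the primary does not declare the transaction committed until the backup has acknowledged the log record, or the backup has been declared down (in which case operation degrades to 1-safe behavior).

```python
# Illustrative sketch of the 2-safe commit rule.
def two_safe_commit(log_record, backup_up, backup_ack):
    """backup_ack(record) -> True once the backup has the log record."""
    if backup_up:
        # backup is up: it must acknowledge the log before we commit
        if not backup_ack(log_record):
            return "abort"       # backup reachable but did not acknowledge
    # backup acknowledged, or has been declared down
    return "committed"

print(two_safe_commit("commit t1", backup_up=True,
                      backup_ack=lambda r: True))    # committed
print(two_safe_commit("commit t1", backup_up=False,
                      backup_ack=lambda r: True))    # committed (1-safe mode)
```

Waiting for the acknowledgement is what rules out lost transactions on a single failure, at the cost of the extra round-trip in the commit path noted above.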
Copyright © 1992-1997 Solid Information Technology Ltd All rights reserved.