Architecture of a database management system
There’s no common blueprint for database management system design. Every database is built slightly differently, and component boundaries are somewhat hard to see and define. Even if these boundaries exist on paper (e.g., in project documentation), in code seemingly independent components may be coupled because of performance optimizations, handling edge cases, or architectural decisions.
Hence in this article I will be defining a common architecture (component + interfaces between them) of DBMS, that I believe every DB vendor thrive for.
Quick Links
DBMS Architecture
Database management systems use a client/server model, where database system instances (nodes) take the role of servers, and application instances take the role of clients.
DBMS Components and their Interaction
So, it is not the one man job but various component/s that work together for a common goal and that is storage and retrieval of your data in a robust and consistent manner.
You can think of it like a car, who has various part/s or you can say component/s (Engine, Clutch, Gear, Tyre etc.) that work together for a common goal and that is keep you moving.
Transport Subsystem
Client requests arrive through the transport subsystem. Requests come in the form of queries, most often expressed in some query language. The transport subsystem is also responsible for communication with other nodes in the database cluster.
Query Processor
Upon receipt, the transport subsystem hands the query over to a query processor, which parses, interprets, and validates it. Later, access control checks are performed, as they can be done fully only after the query is interpreted.
The parsed query is passed to the query optimizer, which first eliminates impossible and redundant parts of the query, and then attempts to find the most efficient way to execute it based on internal statistics (index cardinality, approximate intersection size, etc.) and data placement (which nodes in the cluster hold the data and the costs associated with its transfer). The optimizer handles both relational operations required for query resolution, usually presented as a dependency tree, and optimizations, such as index ordering, cardinality estimation, and choosing access methods.
Execution Engine
The query is usually presented in the form of an execution plan (or query plan): a sequence of operations that have to be carried out for its results to be considered complete. Since the same query can be satisfied using different execution plans that can vary in efficiency, the optimizer picks the best available plan.
The execution plan is handled by the execution engine, which collects the results of the execution of local and remote operations. Remote execution can involve writing and reading data to and from other nodes in the cluster, and replication.
Storage Engine
Local queries (coming directly from clients or from other nodes) are executed by the storage engine. The storage engine has several components with dedicated responsibilities:
Transaction manager
- This manager schedules transactions and ensures they cannot leave the database in a logically inconsistent state. So it basically handles stuff like start transaction, commit/rollback/save points.
Lock manager
- This manager locks on the database objects for the running transactions, ensuring that concurrent operations do not violate physical data integrity. There is basically the decision of granularity like table or row level, that give a picture of how more the database will be handling simultaneous transactions at a given point of time.
Access methods (storage structures)
- These manage access and organizing data on disk. Access methods include heap files and storage structures such as B-Trees.
Buffer manager
- This manager caches data and/or index pages in memory.
Recovery manager
- This manager maintains the operation log and restoring the system state in case of a failure.
Together, transaction and lock managers are responsible for concurrency control. They guarantee logical and physical data integrity while ensuring that concurrent operations are executed as efficiently as possible.
Comments
Post a Comment