Database design. Library database development

30.03.17

By following the principles described in this article, you can create a database that works as expected and can be adapted to new requirements in the future. We will look at the basic principles of database design, as well as ways to optimize it.

Database Design Process

A properly structured database:

  • Helps you save disk space by eliminating unnecessary data;
  • Maintains data accuracy and integrity;
  • Provides convenient access to data.

Database development includes the following stages:

  1. Requirements analysis or database purpose determination;
  2. Organizing data in tables;
  3. Identifying primary keys and analyzing relationships;
  4. Normalization of tables.

Let's consider each database design stage in more detail. Please note that this tutorial covers Edgar Codd's relational database model, implemented with SQL (rather than hierarchical, network or object models).

Requirements Analysis: Determining the Purpose of the Database

For example, if you are creating a database for a public library, you need to consider how both readers and librarians should access the database.

Here are some ways to collect information before creating a database:

  • Interviewing people who will use it;
  • Analysis of business forms such as invoices, schedules, surveys;
  • Review of all existing data systems (including physical and digital files).

Start by collecting existing data that will be included in the database. Next, determine the types of data you want to store, as well as the objects that this data describes. For example:

Clients

  • Address;
  • City, State, Zip Code;
  • Email address.

Goods

  • Name;
  • Price;
  • Quantity in stock;
  • Quantity to order.

Orders

  • Order number;
  • Sales Representative;
  • Date;
  • Product;
  • Quantity;
  • Price;
  • Total.

When designing a relational database, this information will later become part of the data dictionary, which describes the tables and fields of the database. Break the information down into the smallest parts possible. For example, consider splitting the street address and state fields so you can filter people by the state in which they live.

Once you have decided what data will be included in the database, where the data will come from, and how it will be used, you can begin planning the actual database.

Database structure: building blocks

The next step is to visually represent the database. To do this, you need to know exactly how relational databases are structured. Within a database, related data is grouped into tables, each of which consists of rows and columns.

To convert lists of data into tables, start by creating a table for each type of object, such as products, sales, customers, and orders. Here's an example:

Each row in a table is called a record. Records include information about something or someone, such as a specific customer. Columns (also called fields or attributes) contain the same type of information that is displayed for each record, for example, the addresses of all customers listed in the table.

To ensure consistency across records, assign the appropriate data type to each column. Common data types include:

  • CHAR - text of a fixed length;
  • VARCHAR - text of variable length;
  • TEXT - large amounts of text;
  • INT - a positive or negative integer;
  • FLOAT, DOUBLE - floating point numbers;
  • BLOB - binary data.

Some DBMSs also offer an Autonumber data type, which automatically generates a unique number on each row.

In a visual representation of the database, each table is represented by a block in the diagram. The header of each block should state what the data in that table describes, and the attributes should be listed below:

When designing the database, you need to decide which attributes, if any, will serve as the primary key for each table. A primary key (PK) is a unique identifier for a given object. With it, you can select a specific customer's data, even if you only know that value.

Attributes chosen as primary keys must be unique and immutable, and they cannot be NULL (they can't be empty). For this reason, order numbers and usernames are suitable primary keys, but phone numbers or addresses are not. You can also use several fields together as a primary key (this is called a composite key).
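As a minimal sketch of the points above, the following Python snippet uses the standard sqlite3 module to declare a table with typed columns and a primary key. The table name customers, its columns and the sample rows are assumptions made for illustration, and SQLite treats declared types more loosely than most DBMSs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical customers table: every column gets a data type,
# and customer_id is the primary key (unique, never empty).
conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        VARCHAR(100) NOT NULL,
        email       VARCHAR(255),
        state       CHAR(2)
    )
""")
conn.execute("INSERT INTO customers VALUES (1, 'Ann Lee', 'ann@example.com', 'NY')")
conn.execute("INSERT INTO customers VALUES (2, 'Bob Ray', 'bob@example.com', 'CA')")

# Select one specific customer knowing only the primary key value.
row = conn.execute(
    "SELECT name, state FROM customers WHERE customer_id = ?", (2,)
).fetchone()
print(row)

# A second row with an existing primary key value is rejected.
try:
    conn.execute("INSERT INTO customers VALUES (1, 'Duplicate', NULL, NULL)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```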

When it comes time to create the actual database, you implement both the logical and physical structure through the data definition language supported by your DBMS.

You also need to evaluate the size of your database to ensure that you can get the required level of performance and that you have enough space to store the data.

Creating relationships between entities

Now that the data has been converted into tables, we need to analyze the relationships between them. The cardinality of a relationship is the number of elements that interact between two related tables. Determining cardinality helps ensure that you divide your data into tables in the most efficient way.

Each object can be interconnected with another using one of three types of relationships:

One-to-one relationship

When there is only one instance of object A for each instance of object B, they are said to have a one-to-one relationship (often denoted 1:1). You can indicate this type of relationship in an ER diagram with a line with a dash at each end:

Unless you have a reason to keep this data separate, a 1:1 relationship usually indicates that it is better to combine these tables into one.

But under certain circumstances, it makes more sense to create tables with 1:1 relationships. If you have an optional data field, such as "description", that is left blank for many records, you can move all the descriptions to a separate table, excluding empty fields and improving database performance.

To ensure that the data is correlated correctly, you will need to include at least one identical column in each table. Most likely this will be the primary key.

One-to-many relationship

These relationships occur when a record in one table is related to multiple records in another. For example, one customer might place many orders, or a reader might have several books borrowed from the library. One-to-many (1:M) relationships are indicated by the so-called crow's foot mark, as in this example:

To implement a 1:M relationship, add the primary key from the "one" table as an attribute to the other table. A primary key that appears in another table in this way is called a foreign key. The table on the "1" side of the relationship is the parent table of the child table on the other side.
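A minimal sketch of a 1:M relationship, assuming hypothetical customers (parent) and orders (child) tables. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON; other DBMSs typically enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK enforcement

conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    -- The child table carries the parent's primary key as a foreign key.
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        order_date  TEXT NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ann');
    INSERT INTO orders VALUES (10, 1, '2017-03-01'), (11, 1, '2017-03-15');
""")

# One customer, many orders.
order_count = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE customer_id = 1"
).fetchone()[0]
print(order_count)

# An order that points at a non-existent customer is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (12, 99, '2017-03-20')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)
```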

Many-to-many relationship

When several objects in one table can be related to several objects in another, they are said to have a many-to-many (M:N) relationship. This is the case for students and courses, for example, since a student can take many courses, and each course can have many students.

In an ER diagram, these relationships are represented using the following lines:

When designing a database structure, it is impossible to implement this kind of relationship directly. Instead, you need to break it into two one-to-many relationships.

To do this, you need to create a new entity between these two tables. If there is an M:N relationship between sales and products, you can call this new object "sold_products", as it will contain data for each sale. Both the sales table and the products table will have a 1:M relationship with sold_products. In various models, this kind of intermediate object is called a link table, associative entity, or junction table.

Each entry in the link table will correspond to two entities from the neighboring tables. For example, a link table between students and courses might look like this:
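The following sketch shows one possible link table for students and courses. The table name enrollments, the column names and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
    CREATE TABLE students (student_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
    CREATE TABLE courses  (course_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);

    -- Link table: each row pairs one student with one course.
    -- The composite primary key prevents enrolling twice in the same course.
    CREATE TABLE enrollments (
        student_id INTEGER NOT NULL REFERENCES students(student_id),
        course_id  INTEGER NOT NULL REFERENCES courses(course_id),
        PRIMARY KEY (student_id, course_id)
    );

    INSERT INTO students VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO courses  VALUES (100, 'Algebra'), (200, 'Biology');
    INSERT INTO enrollments VALUES (1, 100), (1, 200), (2, 100);
""")

# A student can take many courses...
courses_for_ann = [r[0] for r in conn.execute("""
    SELECT c.title
    FROM enrollments e JOIN courses c ON c.course_id = e.course_id
    WHERE e.student_id = 1
    ORDER BY c.title
""")]

# ...and a course can have many students.
students_in_algebra = [r[0] for r in conn.execute("""
    SELECT s.name
    FROM enrollments e JOIN students s ON s.student_id = e.student_id
    WHERE e.course_id = 100
    ORDER BY s.name
""")]
print(courses_for_ann, students_in_algebra)
```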

Mandatory or not?

Another way to analyze connections is to consider which side of the relationship must exist for the other to exist. The optional side may be marked with a circle on the line. For example, a country must exist in order to have a representative at the United Nations, and not vice versa:

Two objects can also be interdependent (one cannot exist without the other).

Recursive relationships

Sometimes a table points to itself. For example, an employee table might have an attribute "manager" that refers to another person in the same table. This is called a recursive relationship.

Redundant relationships

Redundant relationships are those that are expressed more than once. Typically, you can delete one of these relationships without losing any important information. For example, if the object "students" has a direct relationship with another object called "teachers", but also has an indirect relationship with teachers through "subjects", you should remove the relationship between "students" and "teachers", because subjects are the only way in which students are assigned to teachers.
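A recursive relationship can be sketched as a foreign key that references its own table. The employees table and the names below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        -- Refers to another row of the same table; NULL for the top manager.
        manager_id  INTEGER REFERENCES employees(employee_id)
    );
    INSERT INTO employees VALUES (1, 'Dana', NULL), (2, 'Eli', 1), (3, 'Fay', 1);
""")

# Join the table to itself to list each employee with their manager.
pairs = conn.execute("""
    SELECT e.name, m.name
    FROM employees e JOIN employees m ON m.employee_id = e.manager_id
    ORDER BY e.name
""").fetchall()
print(pairs)
```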

Database normalization

After preliminary database design, you can apply normalization rules to ensure that the tables are structured correctly.

At the same time, not all databases need to be normalized. In general, databases used for online transaction processing (OLTP) should be normalized.

Databases used for online analytical processing (OLAP), which allow for easier and faster data analysis, can be more effective with a certain degree of denormalization. The main criterion here is the speed of calculations. Each form or level of normalization includes the rules of the lower forms.

First form of normalization

The first form of normalization (abbreviated 1NF) states that each cell in a table can hold only one value, not a list of values. Therefore, a table like the one below does not conform to 1NF:

You may want to get around this limitation by splitting the data into additional columns. But this is also against the rules: a table with groups of repeating or closely related attributes does not comply with the first form of normalization. For example, the table below does not conform to 1NF:

Instead, divide the data into multiple tables or records until each cell contains only one value and there are no additional columns. Such data is considered to be broken down to the smallest usable size. For the table above, you can create an additional table, "Sales details", which matches specific products with sales. "Sales" will have a 1:M relationship with "Sales details".
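As a sketch of this split, the snippet below starts from hypothetical rows in which one cell holds a list of products and distributes them over a sales table and a sales_details table. All names and sample values are assumptions.

```python
import sqlite3

# Unnormalized input: the third field holds a list of products (violates 1NF).
raw_sales = [
    (1, "2017-03-01", "Pen, Notebook"),
    (2, "2017-03-02", "Pen"),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (
        sale_id   INTEGER PRIMARY KEY,
        sale_date TEXT NOT NULL
    );
    CREATE TABLE sales_details (
        sale_id INTEGER NOT NULL REFERENCES sales(sale_id),
        product TEXT NOT NULL,
        PRIMARY KEY (sale_id, product)
    );
""")

for sale_id, sale_date, products in raw_sales:
    conn.execute("INSERT INTO sales VALUES (?, ?)", (sale_id, sale_date))
    # One row per product, so every cell holds exactly one value.
    for product in products.split(","):
        conn.execute(
            "INSERT INTO sales_details VALUES (?, ?)", (sale_id, product.strip())
        )

details = conn.execute(
    "SELECT sale_id, product FROM sales_details ORDER BY sale_id, product"
).fetchall()
print(details)
```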

Second form of normalization

The second form of normalization (2NF) stipulates that each attribute must depend entirely on the primary key. Each attribute must depend directly on the entire primary key, and not indirectly through another attribute.

For example, an attribute "age" that depends on "birthday", which, in turn, depends on "student ID", is said to have a partial functional dependency. A table containing these attributes will not conform to the second form of normalization.

In addition, a table with a primary key consisting of several fields violates the second form of normalization if one or more of the other fields depend on only part of the key.

Thus, a table with the following fields does not conform to the second form of normalization, since the "product name" attribute depends on the product ID, but not on the order number:

  • Order number (primary key);
  • Product ID (primary key);
  • Product name.
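One way to bring such a table into 2NF is to move the product name into its own table keyed by the product ID. In the sketch below, the table names and the quantity column (which depends on the whole composite key) are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- product_name depends only on product_id, so it lives here.
    CREATE TABLE products (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT NOT NULL
    );
    -- Only attributes that depend on the whole key stay in this table.
    CREATE TABLE order_items (
        order_number INTEGER NOT NULL,
        product_id   INTEGER NOT NULL REFERENCES products(product_id),
        quantity     INTEGER NOT NULL,
        PRIMARY KEY (order_number, product_id)
    );
    INSERT INTO products VALUES (7, 'Pen');
    INSERT INTO order_items VALUES (1, 7, 3), (2, 7, 1);
""")

# Renaming the product now touches exactly one row...
renamed = conn.execute(
    "UPDATE products SET product_name = 'Gel pen' WHERE product_id = 7"
).rowcount

# ...and every order sees the new name through the join.
names = [r[0] for r in conn.execute("""
    SELECT p.product_name
    FROM order_items i JOIN products p ON p.product_id = i.product_id
    ORDER BY i.order_number
""")]
print(renamed, names)
```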

Third form of normalization

The third form of normalization (3NF) requires that every non-key column be independent of every other non-key column. If changing a value in one non-key column causes another value to change, the table does not comply with the third form of normalization.

According to 3NF, you cannot store any derived data in a table, such as the "Tax" column, which in the example below directly depends on the total cost of the order:
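Instead of storing a derived value, you can compute it when the data is read. The sketch below assumes a hypothetical orders table and a made-up flat tax rate of 25%; neither comes from the original example.

```python
import sqlite3

TAX_RATE = 0.25  # hypothetical flat rate, for illustration only

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- No tax column: tax is derived from total, so it is not stored.
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        total    REAL NOT NULL
    );
    INSERT INTO orders VALUES (1, 100.0), (2, 40.0);
""")

# The derived value is calculated in the query.
rows = conn.execute(
    "SELECT order_id, total, total * ? AS tax FROM orders ORDER BY order_id",
    (TAX_RATE,),
).fetchall()
print(rows)
```

If the total of an order changes, the tax returned by this query changes with it, so the two values can never disagree.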

Additional forms of normalization have also been proposed, including Boyce-Codd normal form, the fourth through sixth normal forms, and domain-key normal form, but the first three are the most common.

Multidimensional data

Some users may need to access multiple views of the same data type, especially in OLAP databases. For example, they might want to know sales by customer, country, and month. In this situation, it is better to create a central table that can be referenced by the customer, country, and month tables. For example:
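A common way to arrange this is a central table of sales figures linked by keys to the customer, country and month tables. The sketch below is one such layout under assumed names (sales_facts and so on), not necessarily the one from the original figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
    CREATE TABLE countries (country_id  INTEGER PRIMARY KEY, name  TEXT NOT NULL);
    CREATE TABLE months    (month_id    INTEGER PRIMARY KEY, label TEXT NOT NULL);

    -- Central table: one row per sale, linked to each dimension by key.
    CREATE TABLE sales_facts (
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        country_id  INTEGER NOT NULL REFERENCES countries(country_id),
        month_id    INTEGER NOT NULL REFERENCES months(month_id),
        amount      REAL NOT NULL
    );

    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO countries VALUES (1, 'France'), (2, 'Japan');
    INSERT INTO months    VALUES (1, '2017-01'), (2, '2017-02');
    INSERT INTO sales_facts VALUES (1, 1, 1, 10.0), (1, 1, 2, 20.0), (2, 2, 1, 5.0);
""")

# The same facts viewed by country...
by_country = conn.execute("""
    SELECT c.name, SUM(f.amount)
    FROM sales_facts f JOIN countries c ON c.country_id = f.country_id
    GROUP BY c.name ORDER BY c.name
""").fetchall()

# ...and by month.
by_month = conn.execute("""
    SELECT m.label, SUM(f.amount)
    FROM sales_facts f JOIN months m ON m.month_id = f.month_id
    GROUP BY m.label ORDER BY m.label
""").fetchall()
print(by_country, by_month)
```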

Data Integrity Rules

You should also configure the database so that data can be checked for compliance with certain rules. Many DBMSs, such as Microsoft Access, automatically apply some of these rules.

The entity integrity rule states that a primary key can never be NULL. If a key consists of multiple columns, none of them can be NULL. Otherwise, it may fail to identify the record unambiguously.

The referential integrity rule requires that every foreign key specified in one table be mapped to one primary key in the table it references. If a primary key is changed or deleted, those changes must be applied to all objects in the database that reference that key.

Business logic integrity rules ensure that data conforms to certain logical parameters. For example, the meeting time must be within standard business hours.
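The sketch below illustrates referential integrity and a business rule using constraints. The rooms and meetings tables, and the 9 to 17 definition of business hours, are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK enforcement

conn.executescript("""
    CREATE TABLE rooms (
        room_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    CREATE TABLE meetings (
        meeting_id INTEGER PRIMARY KEY,
        -- Referential integrity: key changes in rooms propagate to meetings.
        room_id    INTEGER NOT NULL
                   REFERENCES rooms(room_id) ON UPDATE CASCADE ON DELETE CASCADE,
        -- Business rule: meetings start within (assumed) business hours.
        start_hour INTEGER NOT NULL CHECK (start_hour BETWEEN 9 AND 17)
    );
    INSERT INTO rooms VALUES (1, 'Blue');
    INSERT INTO meetings VALUES (1, 1, 10);
""")


def rejected(sql):
    """Return True if the statement violates an integrity rule."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


outside_hours = rejected("INSERT INTO meetings VALUES (2, 1, 22)")
missing_room = rejected("INSERT INTO meetings VALUES (3, 99, 10)")

# Changing the primary key updates the rows that reference it.
conn.execute("UPDATE rooms SET room_id = 5 WHERE room_id = 1")
cascaded = conn.execute(
    "SELECT room_id FROM meetings WHERE meeting_id = 1"
).fetchone()[0]
print(outside_hours, missing_room, cascaded)
```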

Adding Indexes and Views

An index is a sorted copy of one or more columns with values in ascending or descending order. Adding an index allows you to find records faster. Instead of re-sorting for each query, the system can access the records in the order specified by the index.

Although indexes speed up data retrieval, they can slow down adding, updating, and deleting data because the index must be updated whenever a record changes.

A view is a saved query. Views can include data from multiple tables or display part of a table.
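A minimal sketch of both features, assuming a hypothetical customers table. The EXPLAIN QUERY PLAN statement is SQLite-specific, and its output format varies between versions, so it is printed here only for inspection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        state       TEXT NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ann', 'NY'), (2, 'Bob', 'CA'), (3, 'Cy', 'NY');

    -- Index: a sorted copy of the state column for faster lookups.
    CREATE INDEX idx_customers_state ON customers(state);

    -- View: a saved query that shows part of the table.
    CREATE VIEW ny_customers AS
        SELECT customer_id, name FROM customers WHERE state = 'NY';
""")

# A view is queried like a table.
ny_names = [r[0] for r in conn.execute("SELECT name FROM ny_customers ORDER BY name")]
print(ny_names)

# List the indexes defined on the table.
index_names = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'customers'"
)]
print(index_names)

# Ask SQLite how it would run a query that filters on the indexed column.
for plan_row in conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM customers WHERE state = 'NY'"
):
    print(plan_row)
```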

Advanced Properties

After designing the database model, you can refine your database using advanced properties such as help text, input masks, and formatting rules that apply to a specific schema, view, or column. The advantage of this method is that since these rules are stored in the database itself, the presentation of the data will be consistent across multiple programs that access the data.

SQL and UML

Unified Modeling Language (UML) is another visual way of expressing complex systems created in an object-oriented language. Some of the concepts mentioned in this tutorial are known in UML by different names. For example, an entity is known in UML as a class.

UML is not used that often these days. It is mostly used academically and in communication between software developers and their clients.

Database Management Systems

The structure of the designed database depends on which DBMS you are using. Some of the most common:

A suitable database management system can be selected based on cost, the operating system installed, the availability of various features, and so on.

Translation of the article "Database Structure and Design Tutorial".

Translation of a series of 15 articles on database design.
The information is intended for beginners.
It helped me. Perhaps it will help someone else fill in the gaps.

Database Design Guide.

1. Introduction.
If you are going to create your own databases, it is a good idea to adhere to database design guidelines, as this will ensure the long-term integrity and ease of maintenance of your data. This guide will tell you what databases are and how to design a database that follows the rules of relational database design.

Databases are programs that allow you to store and retrieve large amounts of related information. Databases consist of tables, which contain information. When you create a database, you need to think about what tables you need to create and what relationships exist between the information in the tables. In other words, you need to think about the design of your database. A good database design, as mentioned earlier, will ensure data integrity and ease of maintenance.
A database is created to store information and to retrieve that information when necessary. This means that we must be able to insert (INSERT) information into the database and to retrieve (SELECT) information from it.
A database query language was invented for these purposes and was called Structured Query Language, or SQL. The operations of inserting data (INSERT) and selecting it (SELECT) are parts of this language. Below is an example of a data retrieval request and its result.

SQL is a big topic and is beyond the scope of this tutorial. This article is strictly focused on presenting the database design process. I'll cover the basics of SQL later in a separate tutorial.
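As a rough sketch of the two operations, the Python snippet below runs an INSERT and a SELECT against an in-memory SQLite database. The tutorials table and its columns are assumptions for illustration, not the original example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tutorials (
        tutorial_id INTEGER PRIMARY KEY,
        title       TEXT NOT NULL,
        category    TEXT NOT NULL
    )
""")

# INSERT places information into the database.
conn.execute(
    "INSERT INTO tutorials (title, category) VALUES (?, ?)",
    ("Database Design Tutorial", "Software"),
)

# SELECT retrieves it again.
rows = conn.execute("SELECT tutorial_id, title, category FROM tutorials").fetchall()
print(rows)
```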

Relational model.
In this tutorial, I will show you how to create a relational data model. The relational model is a model that describes how to organize data in tables and how to define relationships between those tables.

The rules of the relational model dictate how information should be organized in tables and how tables relate to each other. Ultimately, the result can be presented in the form of a database diagram or, more precisely, an entity-relationship diagram, as in the figure (example taken from MySQL Workbench).

Examples.
I used a number of applications as examples in the guide.

RDBMS.

The RDBMS I used to create the example tables was MySQL. MySQL is one of the most popular RDBMSs, and it is free.

Database administration utility.

After installing MySQL, you only get a command-line interface to interact with it. Personally, I prefer a GUI to manage my databases. I often use SQLyog, a free utility with a graphical interface. The images of tables in this guide are taken from it.

Visual modeling.

There is a great free application called MySQL Workbench. It allows you to design your database graphically. The diagram images in this guide were made in this program.

Design independent of RDBMS.
It's important to know that although this tutorial provides examples for MySQL, database design is independent of the RDBMS. This means that the information applies to relational databases in general, not just MySQL. You can apply the knowledge from this tutorial to any relational database, such as MySQL, PostgreSQL, Microsoft Access, Microsoft SQL Server or Oracle.

In the next part I will briefly talk about the evolution of databases. You will learn where databases and the relational data model come from.

2. History.
In the 70s and 80s, when computer scientists still wore brown tuxedos and glasses with large, square frames, data was stored unstructured in files: text documents with data separated by (usually) commas or tabs.

This is what information technology professionals looked like in the 70s. (Bottom left is Bill Gates.)

Text files are still used today to store small amounts of simple information. Comma-separated values (CSV) files are very popular and are widely supported by various software and operating systems. Microsoft Excel is one example of a program that can work with CSV files. Data stored in such a file can be read by a computer program.

Above is an example of what such a file might look like. A program reading this file must be told that the data is separated by commas. If the program wants to select and display the category of the lesson "Database Design Tutorial", it must read line by line until it finds the words "Database Design Tutorial", and then read the word following the comma to obtain the category, Software.

Database tables.
Reading a file line by line is not very efficient. In a relational database, data is stored in tables. The table below contains the same data as the file. Each row or "record" contains one lesson. Each column contains some property of the lesson. In this case, these are the title and its category.

A computer program could search the tutorial_id column of a given table for a specific tutorial_id to quickly find its corresponding title and category. This is much faster than searching line by line, as a program has to do in a text file.
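The difference between the two approaches can be sketched as follows. The CSV content, the second lesson and the function names are made up for illustration; only tutorial_id and the "Database Design Tutorial" lesson come from the text.

```python
import csv
import io
import sqlite3

# Hypothetical file content: title,category on each line.
CSV_TEXT = "Database Design Tutorial,Software\nKnitting Basics,Crafts\n"


def category_from_csv(text, title):
    """Scan the file line by line until the title is found."""
    for row_title, category in csv.reader(io.StringIO(text)):
        if row_title == title:
            return category
    return None


# Load the same data into a table with a tutorial_id primary key.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tutorials (
        tutorial_id INTEGER PRIMARY KEY,
        title       TEXT,
        category    TEXT
    )
""")
conn.executemany(
    "INSERT INTO tutorials (title, category) VALUES (?, ?)",
    csv.reader(io.StringIO(CSV_TEXT)),
)


def category_from_table(tutorial_id):
    """Look the row up directly by its primary key."""
    row = conn.execute(
        "SELECT category FROM tutorials WHERE tutorial_id = ?", (tutorial_id,)
    ).fetchone()
    return row[0] if row else None


print(category_from_csv(CSV_TEXT, "Database Design Tutorial"))
print(category_from_table(1))
```

With two rows the difference is invisible, but the file scan grows with the size of the file, while the primary key lookup does not need to read every row.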

Modern relational databases are designed to allow data to be retrieved from specific rows, columns, and multiple tables at a time, very quickly.

History of the relational model.
The relational database model was invented in the 70s by Edgar Codd, a British scientist. He wanted to overcome the shortcomings of the network and hierarchical database models, and he was very successful in this. The relational database model is now widely accepted and considered a powerful model for organizing data effectively.

Today there is a wide selection of database management systems available: from small desktop applications to multifunctional server systems with highly optimized search methods. Here are some of the most famous relational database management systems (RDBMS):

- Oracle - used primarily for large, professional applications.
- Microsoft SQL Server - an RDBMS from Microsoft. Originally available only for the Windows operating system; since SQL Server 2017 it also runs on Linux.
- MySQL - a very popular open-source RDBMS. Widely used by both professionals and beginners. What else do you need?! It's free.
- IBM - has a number of RDBMSs, the most famous being DB2.
- Microsoft Access - an RDBMS that is used in the office and at home. In fact, it is more than just a database. MS Access allows you to create databases with a user interface.
In the next part I will tell you something about the characteristics of relational databases.

3. Characteristics of relational databases.
Relational databases are designed for quickly storing and retrieving large amounts of information. Below are some characteristics of relational databases and the relational data model.
Using keys.
Each row of data in a table is identified by a unique "key" called a primary key. Often, the primary key is an automatically increasing (auto-incrementing) number (1, 2, 3, 4, etc.). Data in different tables can be linked together using keys. The primary key values of one table can be added to the rows (records) of another table, thereby linking those records together.

Using Structured Query Language (SQL), data from different tables that are related by a key can be selected at one time. For example, you can create a query that will select all orders from the orders table that belong to the user with id 3 (Mike) in the users table. We will talk about keys further in the following parts.
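A sketch of such a query is shown below. The table layouts, the user_id and item columns and the sample rows are assumptions; only the users and orders tables, the id and usergroup columns and user 3 (Mike) come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE usergroups (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE users (
        id        INTEGER PRIMARY KEY,                -- primary key
        name      TEXT NOT NULL,
        usergroup INTEGER REFERENCES usergroups(id)   -- foreign key
    );
    CREATE TABLE orders (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        item    TEXT NOT NULL
    );
    INSERT INTO usergroups VALUES (1, 'customers');
    INSERT INTO users  VALUES (1, 'Ann', 1), (3, 'Mike', 1);
    INSERT INTO orders VALUES (1, 3, 'Keyboard'), (2, 1, 'Mouse'), (3, 3, 'Monitor');
""")

# Select all orders that belong to user id 3, together with the user's name.
mike_orders = conn.execute("""
    SELECT u.name, o.item
    FROM orders o JOIN users u ON u.id = o.user_id
    WHERE u.id = 3
    ORDER BY o.id
""").fetchall()
print(mike_orders)
```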


The id column in this table is the primary key. Each record has a unique primary key, often a number. The usergroup column is a foreign key. Judging by its name, it apparently refers to a table that contains user groups.

No data redundancy.
In a database design that follows the rules of the relational data model, each piece of information, such as a user's name, is stored in only one place. This eliminates the need to work with data in multiple places. Duplicate data is called data redundancy and should be avoided in a good database design.
Input limitation.
Using a relational database, you can define what type of data is allowed to be stored in a column. You can create a field that contains integers, decimals, small pieces of text, large pieces of text, dates, etc.


When you create a database table, you provide a data type for each column. For example, varchar is a data type for small pieces of text with a maximum length of 255 characters, and int is a data type for whole numbers.

In addition to data types, RDBMS allows you to further limit the data you can enter. For example, limit the length or force the uniqueness of the value of records in this column. The last restriction is often used for fields that contain usernames or email addresses.

These restrictions give you control over the integrity of your data and prevent situations like the following:

- entering an address (text) in a field where you expect to see a number
- entering a postal code that is a hundred characters long
- creating users with the same name
- creating users with the same email address
- entering a weight (number) in the birthday field (date)
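Some of these restrictions can be sketched with UNIQUE and CHECK constraints. The table and column names, and the limit of 10 characters, are assumptions. SQLite itself is lenient about declared column types and lengths, so the sketch uses explicit CHECK constraints where other RDBMSs would rely on the column type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        id       INTEGER PRIMARY KEY,
        username TEXT NOT NULL UNIQUE,
        email    TEXT NOT NULL UNIQUE,
        zip_code TEXT CHECK (length(zip_code) <= 10),
        age      INTEGER CHECK (typeof(age) IN ('integer', 'null'))
    )
""")


def rejected(sql, params=()):
    """Return True if the database refuses the statement."""
    try:
        conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True


conn.execute(
    "INSERT INTO users (username, email, zip_code, age) "
    "VALUES ('mike', 'mike@example.com', '10001', 30)"
)

same_name = rejected(
    "INSERT INTO users (username, email) VALUES ('mike', 'other@example.com')"
)
same_email = rejected(
    "INSERT INTO users (username, email) VALUES ('mike2', 'mike@example.com')"
)
long_zip = rejected(
    "INSERT INTO users (username, email, zip_code) VALUES ('ann', 'ann@example.com', ?)",
    ("9" * 100,),
)
text_in_number = rejected(
    "INSERT INTO users (username, email, age) "
    "VALUES ('bob', 'bob@example.com', 'Main Street 5')"
)
print(same_name, same_email, long_zip, text_in_number)
```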

Maintaining data integrity.
By adjusting field properties, linking tables, and configuring constraints, you can increase the reliability of your data.
Assignment of rights.
Most RDBMSs offer access rights settings that allow you to assign specific rights to certain users. Some actions that can be allowed or denied to a user: SELECT, INSERT, DELETE, ALTER, CREATE, etc. These are operations that can be performed using Structured Query Language (SQL).
Structured Query Language (SQL).
In order to perform certain operations on the database, such as storing data, retrieving it, changing it, a structured query language (SQL) is used. SQL is relatively easy to understand and allows... and stacked selects, such as retrieving related data from multiple tables using the SQL JOIN statement. As mentioned earlier, SQL will not be discussed in this tutorial. I will focus on database design.

The way you design your database will have a direct impact on the queries you will need to run to retrieve data from it. This is another reason why you need to think about how your database should be designed. With a well-designed database, your queries can be cleaner and simpler.

Portability.
The relational data model is standard. By following the rules of the relational data model, you can be sure that your data can be transferred to another RDBMS with relative ease.

As stated earlier, database design is a matter of identifying data, relating it, and putting the resulting decisions on paper (or into a computer program). Design the database independently of the RDBMS you plan to use to create it.

In the next part we will take a closer look at primary keys.

Database Design

Basic concepts about databases and DBMS

Information system (IS) is a system built on the basis of computer technology, designed for storing, searching, processing and transmitting significant amounts of information, having a certain practical scope of application.

Database - an IS that is stored electronically.

Database (DB) - an organized collection of data intended for long-term storage in a computer's external memory, and for regular updating and use.

Databases are used for storing and retrieving large volumes of information. Examples of databases: notebooks, dictionaries, reference books, encyclopedias, etc.

Database classification:

1. According to the nature of the stored information:

- Factual – contain brief information about the described objects, presented in a strictly defined format (card files, for example: database of the library’s book collection, database of the institution’s personnel),

- Documentary – contain documents (information) of various types: text, graphic, audio, multimedia (archives, for example: reference books, dictionaries, databases of legislative acts in the field of criminal law, etc.)

2. By data storage method:

- Centralized (stored on one computer),

- Distributed (used in local and global computer networks).

3. According to the data organization structure:

- Relational (tabular),

- Non-relational.

The term "relational" (from the Latin relatio - relationship) indicates that such a data storage model is built on the relationships between its constituent parts. A relational database is essentially a two-dimensional table. Each row of such a table is called a record. The columns of the table are called fields: each field is characterized by its name and data type. A database field is a table column containing the values of a specific property.

Properties of the relational data model:

Each table element is one data element;

All table fields are homogeneous, i.e. have one type;

There are no identical entries in the table;

The order of records in the table can be arbitrary; a table is characterized by the number of its fields and their data types.

A hierarchical database is one in which information is ordered as follows: one element is considered the main one, and the rest are subordinate to it. Records in a hierarchical database are arranged in a certain sequence, like the steps of a ladder, and data can be found by descending step by step. This model is characterized by parameters such as levels, nodes, and connections. The model works in such a way that several lower-level nodes are connected to one higher-level node.

Node is an information model of an element located at a given hierarchy level.

Properties of the hierarchical data model:

Multiple lower-level nodes are connected to only one higher-level node;

A hierarchical tree has only one root vertex, which is not subordinate to any other vertex;

Each node has its own name (identifier);

There is only one path from the root record to any more specific data record.

An example of a hierarchical database is the Windows folder tree, which you can browse by launching Explorer. The top level is occupied by the Desktop folder. On the second level are the folders My Computer, My Documents, Network Neighborhood and Recycle Bin, which are descendants of the Desktop folder and siblings of one another. In turn, the My Computer folder is the ancestor of the third-level folders: disk folders (Disk 3.5 (A:), C:, D:, E:, F:) and system folders (Printers, Control Panel, etc.).

A network database is one in which horizontal links are added to the vertical hierarchical relationships. Any object can be both a master and a subordinate.

An example of a network database is the World Wide Web on the global Internet. Hyperlinks link hundreds of millions of documents together into a single distributed network database.

Software designed to work with databases is called a database management system (DBMS). DBMSs are used for the orderly storage and processing of large volumes of information.

Database management system (DBMS) - a system that provides searching, storage and correction of data, and generates responses to queries. The system ensures data safety, confidentiality, movement, and communication with other software.

The main actions that a user can perform using the DBMS:

Creating a database structure;

Filling the database with information;

Changing (editing) the structure and content of the database;

Searching for information in the database;

Data sorting;

Database protection;

Checking the integrity of the database.

Modern DBMSs make it possible to store not only text and graphic information, but also sound fragments and even video clips.

The ease of use of a DBMS allows you to create new databases without resorting to programming, using only built-in functions. DBMSs ensure the correctness, completeness and consistency of data, as well as convenient access to it.

Popular DBMS - FoxPro, Access for Windows, Paradox.

Thus, it is necessary to distinguish between databases themselves (DBs), which are ordered sets of data, and database management systems (DBMSs), which are programs that manage the storage and processing of data. For example, the Access application included in the Microsoft Office suite is a DBMS that allows the user to create and process tabular databases.

The principles of designing database management systems follow from the requirements that a database organization must satisfy:

- Productivity and availability. The database satisfies user requests at the speed required to use the data. The user quickly receives data whenever he needs it.

- Minimum costs. Low cost of storing and using data, and minimal cost of making changes.

- Simplicity and ease of use. Users can easily find out and understand what data is available to them. Access to data should be simple and should rule out possible user errors.

- Easy to make changes. The database can grow and change without disrupting existing uses of the data.



- Possibility of search. A database user can make a variety of queries regarding the data stored in it. To implement this, a so-called query language is used.

- Integrity. Modern databases can contain data shared by many users. It is very important that the data elements and the connections between them are not corrupted during operation. In addition, hardware errors and various types of random failures should not lead to irreversible data loss. This means that the data management system must contain a data recovery mechanism.

- Security and privacy. Data security means the protection of data from accidental or intentional access to it by unauthorized persons, from unauthorized modification (change) of data or its destruction. Privacy is defined as the right of individuals or organizations to decide when, how, and how much information can be shared with other individuals or organizations.

Below, using one of the most common database management systems as an example (Microsoft Access, which is part of the popular Microsoft Office package), we will learn about basic data types, how to create databases, and how to work with them.

Database Design

Like any software product, a database has its own life cycle. The main component of the database life cycle is the creation of a unified database and the programs necessary for its operation.

The database life cycle includes the following main stages:

1. Planning for database development;

2. Determination of system requirements;

3. Collection and analysis of user requirements;

4. Database design:

Conceptual database design: creating a conceptual data model, that is, an information model. Such a model is created without reference to any specific DBMS or data model. Most often, the conceptual database model includes a description of the information objects or concepts of the subject area and the connections between them, and a description of integrity constraints, i.e. requirements for the acceptable values of the data and for the connections between them;

Logical database design: creating a logical data model, that is, a database schema based on a specific data model, such as the relational data model. For the relational data model, the logical model is a set of relation schemas, usually specifying primary keys, together with the "links" between relations, which are foreign keys.

The transformation of a conceptual model into a logical model is usually carried out according to formal rules. This stage can be largely automated.

At the logical design stage, the specifics of a specific data model are taken into account, but the specifics of a specific DBMS may not be taken into account.

Physical database design: creating a database schema for a specific DBMS, that is, a description of the database in terms of that DBMS. The specifics of a particular DBMS may include restrictions on the naming of database objects, restrictions on supported data types, etc. In addition, the specifics of a particular DBMS during physical design include the choice of solutions related to the physical data storage environment (selection of disk memory management methods, division of the database into files and devices, data access methods, development of data protection tools), creation of indexes, etc.;

5. Application development:

Transaction design (a transaction is a group of SQL statements executed as a whole);

User interface design;

6. Implementation;

8. Testing;

9. Operation and maintenance:

Functional analysis and support of the original version of the database;

Adaptation, modernization and support of redesigned versions.
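The idea of a transaction from stage 5 above (a group of SQL statements executed as a whole) can be illustrated with a short sketch. It assumes Python's built-in sqlite3 module and a hypothetical library schema with "books" and "loans" tables; the lend function is likewise invented for the example. Lending a book inserts a loan row and decrements the number of available copies, and either both statements take effect or neither does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT NOT NULL, "
    "copies INTEGER NOT NULL CHECK (copies >= 0))"
)
conn.execute("CREATE TABLE loans (book_id INTEGER NOT NULL, reader TEXT NOT NULL)")
conn.execute("INSERT INTO books VALUES (1, 'Sample title', 1)")
conn.commit()


def lend(conn, book_id, reader):
    """Record a loan and take one copy off the shelf as a single transaction."""
    try:
        # "with conn" commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute("INSERT INTO loans VALUES (?, ?)", (book_id, reader))
            conn.execute(
                "UPDATE books SET copies = copies - 1 WHERE id = ?", (book_id,)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative number of copies;
        # the loan row inserted just before has been rolled back too.
        return False


first = lend(conn, 1, "reader A")   # succeeds: one copy was available
second = lend(conn, 1, "reader B")  # fails: no copies left, nothing is stored
loan_count = conn.execute("SELECT COUNT(*) FROM loans").fetchone()[0]
copies_left = conn.execute("SELECT copies FROM books WHERE id = 1").fetchone()[0]
print(first, second, loan_count, copies_left)  # True False 1 0
```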

Database design is the process of creating a database schema and determining the necessary integrity constraints (rules ensuring that the information in the database complies with its internal logic, structure and all explicitly specified rules).

Main tasks of database design:

Ensuring that all necessary information is stored in the database.

Ensuring the ability to obtain data for all necessary requests.

Reducing data redundancy and duplication.

Ensuring database integrity.
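Integrity constraints, mentioned in the definition above, are declared in the schema and then enforced by the DBMS. The sketch below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module, and the "readers" and "loans" tables, their columns and the 1 to 30 day limit are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE readers (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE              -- no missing or duplicate emails
    );
    CREATE TABLE loans (
        id        INTEGER PRIMARY KEY,
        reader_id INTEGER NOT NULL REFERENCES readers(id),  -- reader must exist
        days      INTEGER NOT NULL CHECK (days BETWEEN 1 AND 30)
    );
    """
)


def accepted(sql):
    """Return True if the DBMS accepts the statement, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    accepted("INSERT INTO readers (id, email) VALUES (1, 'a@example.org')"),  # valid
    accepted("INSERT INTO readers (id, email) VALUES (2, 'a@example.org')"),  # duplicate email
    accepted("INSERT INTO loans (reader_id, days) VALUES (1, 14)"),           # valid
    accepted("INSERT INTO loans (reader_id, days) VALUES (99, 14)"),          # unknown reader
    accepted("INSERT INTO loans (reader_id, days) VALUES (1, 0)"),            # days out of range
]
print(results)  # [True, False, True, False, False]
```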

The essence of database design, like any other design process, is to create a description of a new system that has not previously existed in this form and that, once implemented, will function as expected under appropriate conditions. It follows that the stages of database design must consistently and logically reflect the essence of this process.

Contents of database design and phasing

The design intent is based on some formulated social need. This need arises in a particular environment and has a target audience of consumers who will use the design result. Consequently, the database design process begins with studying this need from the point of view of the consumers and of the functional environment in which the database will be placed. That is, the first stage is collecting information and defining a model of the system's subject area, as well as looking at it from the point of view of the target audience. In general, to determine system requirements, the scope of activities and the boundaries of the database applications are determined.

Next, the designer, who already has a certain idea of what needs to be created, clarifies the tasks the application is expected to solve, draws up a list of them (especially if the project is a large and complex database), clarifies the order in which the tasks are solved, and performs data analysis. This process is also a stage of design work, but in the design structure these steps are usually absorbed by the conceptual design stage: the stage of identifying objects, attributes and connections.

Creating a conceptual (infological) model involves first forming the conceptual requirements of users, including requirements for applications that may not be implemented immediately but that, if taken into account, will improve the functionality of the system in the future. Because it deals with abstract representations of sets of objects (without specifying physical storage methods) and their relationships, the conceptual model essentially corresponds to the domain model. For this reason, the first stage of database design is called infological design in the literature.

Next, as a separate stage (or in addition to the previous one), comes the stage of forming requirements for the operating environment, where the requirements for the computing resources capable of ensuring the functioning of the system are assessed. Accordingly, the larger the designed database and the higher the user activity and intensity of requests, the higher the requirements for resources: for the computer configuration and for the type and version of the operating system. For example, multi-user operation of a future database requires a network connection and an operating system capable of multitasking.

The next step is for the designer to select a database management system (DBMS), as well as software tools. After this, the conceptual model must be transferred to a data model compatible with the selected management system. But this often involves making amendments and changes to the conceptual model, since the interconnections between objects reflected in the conceptual model cannot always be implemented using the means of a given DBMS.

This circumstance gives rise to the next stage: producing a conceptual model expressed with the means of a specific DBMS. This step corresponds to the stage of logical design (creating a logical model).

The final stage of database design is physical design: the stage of linking the logical structure to the physical storage environment.

Thus, in detailed form, the main stages of design are as follows:

  • infological design,
  • formation of requirements for the operating environment,
  • selection of the database management system and database software,
  • logical design,
  • physical design.

The key ones will be discussed in more detail below.

Infological design

Identification of entities forms the semantic basis of infological design. An entity here is an object (abstract or concrete) about which information will be accumulated in the system. In the infological model of the subject area, the structure and dynamic properties of the subject area are described in user-friendly terms that do not depend on the specific implementation of the database. These terms are taken at the level of types rather than instances. That is, the description is expressed not through individual objects of the subject area and their relationships, but through:

  • description of object types,
  • integrity constraints associated with the described type,
  • processes leading to the evolution of a subject area - its transition to another state.

An infological model can be created using several methods and approaches:

  1. The functional approach is based on the assigned tasks. It is called functional because it is used when the functions and tasks of the people who will meet their information needs with the help of the designed database are known.
  2. The subject approach focuses on the information that will be contained in the database, even though the query structure may not be defined. In this case, the study of the subject area aims to represent it in the database as adequately as possible, in the context of the full range of expected information requests.
  3. An integrated approach using the "entity-relationship" method combines the advantages of the previous two. The method comes down to dividing the entire subject area into local parts, which are modeled separately and then recombined into a whole.

Because the "entity-relationship" method combines both design approaches at this stage, it is preferred more often than the others.

When the subject area is divided in this way, each local representation should, if possible, include enough information to solve a separate problem or to meet the requests of a certain group of potential users. Each of these areas contains about 6-7 entities and corresponds to a separate external application.

The dependence of entities is reflected in their division into strong (base, parent) and weak (child). A strong entity (for example, a reader in a library) can exist in the database on its own, but a weak entity (for example, this reader’s subscription) is “attached” to a strong one and does not exist separately.
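In a relational schema, the dependence of a weak entity on a strong one is commonly expressed with a foreign key. The following sketch is one possible reading of the reader and subscription example, not the only way to model it; it assumes Python's built-in sqlite3 module, and the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    -- Strong entity: a reader exists on its own.
    CREATE TABLE readers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- Weak entity: a subscription is identified through its reader
    -- and disappears together with that reader.
    CREATE TABLE subscriptions (
        reader_id   INTEGER NOT NULL REFERENCES readers(id) ON DELETE CASCADE,
        valid_until TEXT NOT NULL,
        PRIMARY KEY (reader_id, valid_until)
    );

    INSERT INTO readers VALUES (1, 'Reader One');
    INSERT INTO subscriptions VALUES (1, '2030-12-31');
    """
)

# A subscription cannot exist without its reader.
try:
    conn.execute("INSERT INTO subscriptions VALUES (42, '2030-12-31')")
    orphan_allowed = True
except sqlite3.IntegrityError:
    orphan_allowed = False

# Deleting the strong entity removes the weak entities attached to it.
conn.execute("DELETE FROM readers WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM subscriptions").fetchone()[0]
print(orphan_allowed, remaining)  # False 0
```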

It is necessary to distinguish between an "entity instance" (an object characterized by specific property values) and an "entity type" (an object characterized by a common name and a list of properties).

For each individual entity, attributes (a set of properties) are selected, which, depending on the criterion, can be:

  • identifying (with a unique value for entities of that type, making them potential keys) or descriptive;
  • single-valued or multi-valued (with the appropriate number of values for an entity instance);
  • basic (independent of other attributes) or derived (calculated based on the values of other attributes);
  • simple (indivisible one-component) or composite (combined from several components).
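The attribute kinds listed above can be pictured on a single entity type. The sketch below uses plain Python dataclasses; the Reader and Address types and all of their fields are hypothetical and serve only to label each kind of attribute.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass(frozen=True)
class Address:
    """A composite attribute: combined from several components."""
    street: str
    city: str


@dataclass
class Reader:
    card_number: int        # identifying: unique per reader, a potential key
    name: str               # descriptive, simple, single-valued
    birth_date: date        # basic: does not depend on other attributes
    address: Address        # composite
    phones: List[str] = field(default_factory=list)  # multi-valued

    def age_on(self, day: date) -> int:
        """A derived attribute: calculated from birth_date, not stored."""
        years = day.year - self.birth_date.year
        if (day.month, day.day) < (self.birth_date.month, self.birth_date.day):
            years -= 1  # birthday has not yet occurred this year
        return years


reader = Reader(
    card_number=1,
    name="Sample Reader",
    birth_date=date(2000, 6, 15),
    address=Address("1 Main St", "Springfield"),
    phones=["555-0100", "555-0101"],
)
print(reader.age_on(date(2024, 6, 14)))  # 23
```

Here Reader as a class plays the role of the entity type, while the reader object is an entity instance with specific property values.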

After this, the attributes are specified, the relationships in each local view are specified (divided into optional and mandatory), and the local views are merged. If there are no more than 4-5 local areas, they can be combined in one step. If there are more, the areas are merged pairwise (binary merging) in several stages.

This and other intermediate stages reflect the iterative nature of design: to eliminate contradictions, it is necessary to return to the stage of modeling local representations for clarification and change (for example, to rename semantically different objects that share the same name, or to reconcile integrity constraints on the same attributes in different applications).

Selecting a database management system and database software

The practical implementation of the information system depends on the choice of the database management system. The most significant criteria in the selection process are the following:

  • type of data model and its compliance with the needs of the subject area,
  • reserve of possibilities in case of expansion of the information system,
  • performance characteristics of the selected system,
  • operational reliability and convenience of the DBMS,
  • tools aimed at data administration personnel,
  • the cost of the DBMS itself and additional software.

Errors in choosing a DBMS will almost certainly make it necessary to adjust the conceptual and logical models later.

Logical database design

The logical structure of the database must correspond to the logical model of the subject area and take into account the connection of the data model with the supported DBMS. Therefore, the stage begins with choosing a data model, where it is important to take into account its simplicity and clarity.

It is preferable when the natural data structure coincides with the model representing it. So, for example, if the data is presented in the form of a hierarchical structure, then it is better to choose a hierarchical model. However, in practice, such a choice is often determined by the database management system rather than by the data model. Therefore, the conceptual model is actually translated into a data model that is compatible with the selected database management system.

This also reflects the iterative nature of design, which allows for the possibility (or necessity) of returning to the conceptual model and changing it if the relationships between objects (or object attributes) reflected there cannot be implemented using the chosen DBMS.

Upon completion of the stage, database schemas of both levels of architecture (conceptual and external) should be generated, created in the data definition language supported by the selected DBMS.
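As a rough illustration of schemas written in a data definition language, the sketch below defines a small conceptual-level schema (relations with primary and foreign keys) and one external schema (a view for a single group of users). It assumes Python's built-in sqlite3 module and SQLite's SQL dialect; the library tables and the librarian_due_list view are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Conceptual-level schema: relations with primary and foreign keys.
    CREATE TABLE readers (id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
    CREATE TABLE books   (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE loans (
        reader_id INTEGER NOT NULL REFERENCES readers(id),
        book_id   INTEGER NOT NULL REFERENCES books(id),
        due_date  TEXT NOT NULL,
        PRIMARY KEY (reader_id, book_id)
    );

    -- External schema: the part of the data a librarian's application sees.
    CREATE VIEW librarian_due_list AS
        SELECT r.name AS reader, b.title AS book, l.due_date
        FROM loans AS l
        JOIN readers AS r ON r.id = l.reader_id
        JOIN books   AS b ON b.id = l.book_id;
    """
)

# The DBMS keeps the generated schema in its catalog, where it can be inspected.
schema_objects = sorted(
    (kind, name)
    for kind, name in conn.execute(
        "SELECT type, name FROM sqlite_master WHERE type IN ('table', 'view')"
    )
)
# Third column of foreign_key_list is the referenced (parent) table.
referenced_tables = sorted(
    row[2] for row in conn.execute("PRAGMA foreign_key_list(loans)")
)
print(schema_objects)
print(referenced_tables)  # ['books', 'readers']
```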

Database schemas are formed using one of two approaches:

  • a bottom-up approach, in which work starts at the lowest level by defining attributes, which are then grouped into relations representing objects on the basis of the dependencies between the attributes;
  • the reverse, top-down approach, used when the number of attributes is very large (hundreds or thousands).

The second approach involves identifying a number of high-level entities and their relationships with subsequent detailing to the required level, which is reflected, for example, in a model created based on the “entity-relationship” method. But in practice, both approaches are usually combined.

Physical database design

At the next stage, physical database design, the logical structure is mapped onto a database storage structure, that is, it is linked to the physical storage environment where the data will be placed as efficiently as possible. Here the data schema is described in detail, indicating all types, fields, sizes and restrictions. In addition to developing indexes and tables, the basic queries are defined.

Building a physical model involves solving two largely contradictory tasks:

  1. minimizing data storage space,
  2. achieving integrity, security and maximum performance.

The second task conflicts with the first because, for example:

  • for transactions to function effectively, disk space must be reserved for temporary objects,
  • to increase search speed, indexes must be created, and their number is determined by the number of possible combinations of fields involved in searches,
  • for data recovery, database backups must be created and a log of all changes must be kept.
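The trade-off between search speed and storage space can be observed directly for indexes. The sketch below is a small experiment under stated assumptions: it uses Python's built-in sqlite3 module, a hypothetical "loans" table filled with generated rows, and SQLite's EXPLAIN QUERY PLAN and PRAGMA page_count to compare the query plan and the database size before and after an index is created. The exact plan wording differs between SQLite versions, so no expected output is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE loans (id INTEGER PRIMARY KEY, reader_id INTEGER, book_id INTEGER)"
)
# Generated sample data: 20000 loans spread over 500 hypothetical readers.
conn.executemany(
    "INSERT INTO loans (reader_id, book_id) VALUES (?, ?)",
    ((i % 500, i) for i in range(20000)),
)
conn.commit()

QUERY = "SELECT * FROM loans WHERE reader_id = 7"


def plan(sql):
    """Return SQLite's description of how it would execute the query."""
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


def page_count():
    """Return the number of storage pages the database currently occupies."""
    return conn.execute("PRAGMA page_count").fetchone()[0]


plan_before, pages_before = plan(QUERY), page_count()

conn.execute("CREATE INDEX idx_loans_reader ON loans (reader_id)")
conn.commit()

plan_after, pages_after = plan(QUERY), page_count()

print("before:", plan_before, "| pages:", pages_before)
print("after: ", plan_after, "| pages:", pages_after)
```

Before the index exists, the plan describes a scan of the whole table; afterwards it names idx_loans_reader, while the page count grows because the index itself has to be stored.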

All this increases the size of the database, so the designer looks for a reasonable balance in which both tasks are solved as well as possible by placing data sensibly in storage, but not at the expense of database security, which includes both protection from unauthorized access and protection from failures.

To complete the creation of a physical model, its operational characteristics are assessed (search speed, efficiency of query execution and resource consumption, correctness of operations). Sometimes this stage, like the stages of database implementation, testing and optimization, as well as maintenance and operation, is taken outside the immediate design of the database.
