Main stages of database development

Database Design

Database design stages:

1. System analysis and verbal description of the information objects of the subject area and the relationships between them.

2. Semantic modeling of a subject area – a partially formalized description of objects of a subject area in terms of some semantic model, for example, an ER model.

3. Selecting a standard DBMS.

4. Logical design of the database, that is, a description of the database in terms of the chosen data model. At this stage, the number and structure of tables are determined, database queries are formulated, the types of reporting documents are defined, information processing algorithms are developed, forms for entering and editing data are created, and so on.

5. Physical design of the database, that is, the choice of effective placement of the database on external media to ensure maximum performance when processing data.

4.2 Entity-relationship model (ER model)

An entity is a real-world object that can exist independently. An entity has instances, which differ from one another in their attribute values and can be identified unambiguously. An attribute is a named characteristic of an entity. For example, the entity Book is characterized by attributes such as author, title, and publisher. Specific books are instances of the entity Book. They differ in the values of these attributes and, in this example, are uniquely identified by the title attribute. An attribute that uniquely identifies the instances of an entity is called a key. A key may be composite, that is, a combination of several attributes.
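As a minimal sketch of these terms, the entity Book can be modeled in Python as a type whose instances differ in attribute values. The class name, the sample data, and the helper `index_by_key` are illustrative assumptions; the choice of title as the key follows the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Book:
    """Entity type: a common name plus a list of attributes."""
    title: str       # key attribute in this example (assumed unique)
    author: str
    publisher: str


def index_by_key(books):
    """Build a key -> instance mapping, rejecting duplicate key values."""
    index = {}
    for book in books:
        if book.title in index:
            raise ValueError(f"duplicate key value: {book.title!r}")
        index[book.title] = book
    return index


# Two instances of the same entity; they differ in attribute values.
books = [
    Book(title="Database Basics", author="A. Author", publisher="Example Press"),
    Book(title="ER Modeling", author="B. Writer", publisher="Example Press"),
]
catalog = index_by_key(books)

# A composite key combines several attributes, here (title, author).
composite_index = {(b.title, b.author): b for b in books}
```

If two books could share a title, the simple key would no longer identify instances unambiguously, and a composite key such as (title, author) would be one possible replacement.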

Suppose a database is being designed for the BANK subject area. The bank has branches, each run by a manager. Clients hold accounts of different types (current, term, demand, and so on), which are processed at the branches. Four entities can be distinguished in this subject area: Branch, Manager, Account, and Client.

In an ER diagram, an entity is represented by a rectangle containing its name.

A relationship represents an interaction between entities. It is characterized by its degree, which shows how many entities take part in the relationship. A relationship between two entities is called binary. On an ER diagram, a relationship is depicted as a diamond.

In the BANK subject area, three relationships can be distinguished:

1. Manager Manages Branch

2. Branch Processes Account

3. Client Has Account

An important characteristic of a relationship is its type (multiplicity). Consider the types of the relationships above. Since one manager manages only one branch, and each branch has only one manager, the 1st relationship is one-to-one (1:1).

Since one branch processes multiple accounts, and each account is processed by only one branch, the 2nd relationship is one-to-many (1:M).

Since one account can be shared by multiple clients and one client can have multiple accounts, the 3rd relationship is a many-to-many (M:N) relationship.
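One possible way to express these three relationship types in a relational schema is sketched below using Python's built-in sqlite3 module. The table and column names and the sample rows are assumptions made for illustration; the source does not prescribe a particular schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE branch (
    branch_id INTEGER PRIMARY KEY,
    city      TEXT NOT NULL
);
-- 1:1: the UNIQUE foreign key lets each branch have at most one manager.
CREATE TABLE manager (
    manager_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    branch_id  INTEGER NOT NULL UNIQUE REFERENCES branch(branch_id)
);
-- 1:M: many accounts may reference the same branch.
CREATE TABLE account (
    account_id INTEGER PRIMARY KEY,
    kind       TEXT NOT NULL,
    branch_id  INTEGER NOT NULL REFERENCES branch(branch_id)
);
CREATE TABLE client (
    client_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
-- M:N: a junction table pairs clients with accounts.
CREATE TABLE client_account (
    client_id  INTEGER NOT NULL REFERENCES client(client_id),
    account_id INTEGER NOT NULL REFERENCES account(account_id),
    PRIMARY KEY (client_id, account_id)
);
""")

conn.execute("INSERT INTO branch VALUES (1, 'Springfield')")
conn.execute("INSERT INTO manager VALUES (1, 'Alice', 1)")
conn.executemany("INSERT INTO account VALUES (?, ?, ?)",
                 [(10, "current", 1), (11, "term", 1)])
conn.executemany("INSERT INTO client VALUES (?, ?)",
                 [(100, "Bob"), (101, "Carol")])
conn.executemany("INSERT INTO client_account VALUES (?, ?)",
                 [(100, 10), (101, 10), (100, 11)])

# A second manager for the same branch violates the 1:1 rule.
try:
    conn.execute("INSERT INTO manager VALUES (2, 'Dave', 1)")
    second_manager_allowed = True
except sqlite3.IntegrityError:
    second_manager_allowed = False

# One account shared by several clients (the M:N relationship).
holders_of_account_10 = conn.execute(
    "SELECT COUNT(*) FROM client_account WHERE account_id = 10"
).fetchone()[0]
```

The junction table is a common way to represent M:N relationships because a single foreign key column can point to only one row.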

The degree of participation determines whether all or only some instances of an entity take part in a relationship. It may be mandatory or optional.

If not every instance of entity A is associated with some instance of entity B, the participation of entity A is optional. On the ER diagram this is shown by a black circle placed on the relationship line near entity A.

If every instance of entity A is associated with some instance of entity B, the participation of entity A is mandatory. In this case, the black circle on the relationship line is placed in a rectangle next to entity A. For example, the relationship Employee Registers Client is of type 1:M. Not every employee registers clients (optional participation), but every client is registered by an employee (mandatory participation).
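The difference between mandatory and optional participation can be illustrated with a small sqlite3 sketch of the Employee Registers Client relationship. The table and column names are assumptions. Mandatory participation of the client is modeled here with a NOT NULL foreign key, which is one common technique rather than the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE client (
    client_id     INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    -- NOT NULL: every client must be registered by some employee.
    registered_by INTEGER NOT NULL REFERENCES employee(employee_id)
);
""")
conn.executemany("INSERT INTO employee VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
conn.execute("INSERT INTO client VALUES (10, 'Carol', 1)")

# Optional participation: Bob exists although he has registered no clients.
idle_employees = [row[0] for row in conn.execute("""
    SELECT e.name FROM employee e
    LEFT JOIN client c ON c.registered_by = e.employee_id
    WHERE c.client_id IS NULL
""")]

# Mandatory participation: a client without a registering employee is rejected.
try:
    conn.execute("INSERT INTO client (client_id, name) VALUES (11, 'Dan')")
    unregistered_client_allowed = True
except sqlite3.IntegrityError:
    unregistered_client_allowed = False
```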

Let us assume that in the BANK subject area under consideration, the degree of participation of all four entities is mandatory. Then the ER model will look like:

Each of the four model entities can be described by its own set of attributes.

An ER model, together with sets of entity attributes, can serve as an example of a semantic (conceptual) model of a domain or a conceptual database schema.

When developing a database, the following stages of work can be distinguished.

Stage I. Statement of the problem.

At this stage, the task of creating the database is formulated. It describes in detail the composition of the database and the purpose of its creation, and lists the kinds of work to be carried out in the database (selecting, adding, and changing data, printing or outputting reports, and so on).

Stage II. Object analysis.

At this stage, we consider which objects the database may consist of and what properties these objects have. After dividing the database into separate objects, we establish which parameters describe each object. All this information can be arranged as separate records and tables. Next, the data type of each field of a record is determined. Information about data types should also be included in the table being compiled.
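The outcome of object analysis can be written down as a simple structure that lists each object, its properties, and their data types. The sketch below uses hypothetical objects (`student`, `course`) and a hypothetical helper `create_table_sql` to show how such a description could later be turned into tables. It is an illustration under these assumptions, not a prescribed procedure.

```python
import sqlite3

# Result of object analysis: each object with its properties and data types
# (hypothetical objects and fields).
objects = {
    "student": {"student_id": "INTEGER", "full_name": "TEXT", "birth_date": "TEXT"},
    "course": {"course_id": "INTEGER", "title": "TEXT", "credits": "INTEGER"},
}


def create_table_sql(name, fields):
    """Render a CREATE TABLE statement from a field-name -> type mapping."""
    columns = ", ".join(f"{column} {sql_type}" for column, sql_type in fields.items())
    return f"CREATE TABLE {name} ({columns})"


conn = sqlite3.connect(":memory:")
for name, fields in objects.items():
    conn.execute(create_table_sql(name, fields))

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
)]
```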

Stage III. Model synthesis.

At this stage, based on the above analysis, a specific database model is selected. To do so, the advantages and disadvantages of each model are considered and compared with the requirements and objectives of the database being created, and the model that best supports the task is chosen. After choosing a model, you need to draw its diagram, indicating the relationships between tables or nodes.

Stage IV. Selecting methods for presenting information and software tools.

After creating the model, it is necessary, depending on the selected software product, to determine the form of information presentation.

In most DBMSs, data can be entered in two ways:

using forms;

without using forms.

A form is a user-created graphical interface for entering data into a database.

Stage V. Synthesis of a computer model of the object.

In the process of creating a computer model, there are some stages that are typical for any DBMS.

Stage 1. Launching the DBMS, creating a new database file or opening a previously created database.

Stage 2. Creating the initial table or tables.

When creating the initial table, you must specify the name and type of each field. Field names must not repeat within a table. New fields can be added to the table while working with the database. The created table must be saved under a name that is unique within the database.
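These constraints can be observed directly in a DBMS. The following sketch uses SQLite through Python's sqlite3 module; the `customer` table and its fields are assumed for illustration, and other DBMSs may report the same violations differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, company_name TEXT)"
)

# A new field can be added while the database is already in use.
conn.execute("ALTER TABLE customer ADD COLUMN phone TEXT")

# Repeating an existing field name within the table is rejected.
try:
    conn.execute("ALTER TABLE customer ADD COLUMN phone TEXT")
    duplicate_field_allowed = True
except sqlite3.OperationalError:
    duplicate_field_allowed = False

# Table names must be unique within the database as well.
try:
    conn.execute("CREATE TABLE customer (id INTEGER)")
    duplicate_table_allowed = True
except sqlite3.OperationalError:
    duplicate_table_allowed = False

# PRAGMA table_info returns one row per field; index 1 is the field name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]
```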

When designing tables, the following rules should be observed.

1. Information in a table should not be duplicated, and there should be no repetition between tables. When a piece of information is stored in only one table, it has to be changed in only one place. This makes work more efficient and eliminates the possibility of mismatched information in different tables. For example, customer addresses and phone numbers should be kept in one table only.
2. Each table should contain information on only one topic. Information on each topic is much easier to process if it is kept in tables that are independent of each other. For example, it is better to store customer addresses and orders in different tables, so that when an order is deleted, information about the customer remains in the database.
3. Each table must contain the required fields. Each field should hold a separate piece of information about the topic of the table. For example, a customer table might contain fields for company name, address, city, country, and phone number. When designing the fields of a table, remember that each field must relate to the topic of the table. It is not recommended to store in a table data that is the result of an expression. The table should contain all the necessary information. Information should be broken down into the smallest logical units (for example, separate First Name and Last Name fields rather than a single Name field).
4. Each table must have a primary key. This is necessary so that the DBMS can link data from different tables, for example, a customer's data and that customer's orders.
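The rules above can be sketched with two tables, one for customers and one for orders. The schema and sample data are assumptions chosen to match the customer and order example in the text; the order total is deliberately not stored because it is the result of an expression.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- One topic per table; name split into the smallest logical units.
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    first_name  TEXT NOT NULL,
    last_name   TEXT NOT NULL,
    phone       TEXT
);
-- Orders refer to the customer by primary key instead of repeating the data.
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    quantity    INTEGER NOT NULL,
    unit_price  REAL NOT NULL
);
""")
conn.execute("INSERT INTO customer VALUES (1, 'Ann', 'Lee', '555-0100')")
conn.executemany("INSERT INTO customer_order VALUES (?, ?, ?, ?)",
                 [(1, 1, 2, 10.0), (2, 1, 1, 5.0)])

# The order total is the result of an expression, so it is computed on demand.
totals = dict(conn.execute(
    "SELECT order_id, quantity * unit_price FROM customer_order"
))

# Deleting an order leaves the customer record intact.
conn.execute("DELETE FROM customer_order WHERE order_id = 1")
customers_left = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
```

Because the phone number lives only in the customer table, changing it requires a single update regardless of how many orders the customer has.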

Stage 3. Creation of screen forms.

First, you specify the table on which the form will be based. The form can be created with the form wizard, by choosing the type of form, or built manually. A form need not include all the fields of the table; it may contain only some of them. The form may have the same name as the table on which it is based. Several forms can be created from one table, differing in type or in the number of fields they use. After creating the form, you must save it. A saved form can be edited by changing the position, size, and format of its fields.

Stage 4. Filling the database.

The database can be filled in two ways: in table view and in form view. Numeric and text fields can be filled in table view, while MEMO and OLE fields can be filled in form view.

Stage VI. Working with the created database.

Working with the database includes the following actions:

searching for the necessary information;

data sorting;

data selection;

printing;

changing and adding data.
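The actions listed above map naturally onto queries. The sketch below shows one assumed example of each against a small `book` table in SQLite; the table, its data, and the report format are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (book_id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO book (title, year) VALUES (?, ?)",
    [("ER Modeling", 2001), ("SQL Primer", 1995), ("Data Models", 2010)],
)

# Searching for the necessary information.
found = conn.execute(
    "SELECT title FROM book WHERE title LIKE ?", ("%SQL%",)
).fetchall()

# Sorting.
by_year = [row[0] for row in conn.execute("SELECT title FROM book ORDER BY year")]

# Selection by a condition.
recent = [row[0] for row in conn.execute(
    "SELECT title FROM book WHERE year >= 2000 ORDER BY title"
)]

# Changing and adding data.
conn.execute("UPDATE book SET year = 1996 WHERE title = 'SQL Primer'")
conn.execute("INSERT INTO book (title, year) VALUES ('Indexing', 2015)")

# Preparing lines for printing a report.
report_lines = [
    f"{title} ({year})"
    for title, year in conn.execute("SELECT title, year FROM book ORDER BY title")
]
```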

Building and deploying modern information systems based on automated databases poses new design problems that cannot be solved with traditional techniques and methods. Much attention must be paid to database design. The effectiveness of the system as a whole, its viability, and the possibility of its expansion and further development depend on how successfully the database is designed. For this reason, database design is treated as a separate, independent area of work in the development of information systems.

Database design is an iterative, multi-stage process of making informed decisions while analyzing the information model of the subject area and the data requirements of application programmers and users, synthesizing logical and physical data structures, and analyzing and justifying the choice of software and hardware. The stages of database design are related to the multi-level organization of data. In discussing database design, we will adhere to the following multi-level representation of data: external, infological, logical (datalogical), and internal.

This representation of data levels is not the only one. There are other options for multi-level data representation. Thus, following the proposals of the ANSI/X3/SPARC study group on database management systems of the American National Standards Institute, as well as CODASYL (Conference on Data Systems Languages), three levels of data representation are usually distinguished:

· external level (from the point of view of the end user and application programmer),
· conceptual level (from the DBMS point of view),
· internal level (from the point of view of a system programmer).

According to this concept, the external level is a part (subset) of the conceptual model needed to implement a request or an application program. That is, if the conceptual model acts as a schema supported by a specific DBMS, then the external level is a set of subschemas needed to implement a specific application program or user request.
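In relational DBMSs, one common way to realize such subschemas is through views defined over the conceptual schema. The sketch below assumes a hypothetical `employee` table and two hypothetical applications (a directory and a payroll report); it illustrates the idea and is not a description of any particular standard's mechanism.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conceptual schema: the full description of the stored data.
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    department  TEXT NOT NULL,
    salary      REAL NOT NULL
);
-- External subschema for a directory application: no salary data.
CREATE VIEW directory AS
    SELECT name, department FROM employee;
-- External subschema for payroll reporting.
CREATE VIEW payroll AS
    SELECT department, SUM(salary) AS total_salary
    FROM employee
    GROUP BY department;
""")
conn.executemany(
    "INSERT INTO employee (name, department, salary) VALUES (?, ?, ?)",
    [("Ann", "Sales", 3000.0), ("Bob", "Sales", 2500.0), ("Eve", "IT", 4000.0)],
)

# Each application sees only the subset of the schema that it needs.
directory_columns = [d[0] for d in conn.execute("SELECT * FROM directory").description]
payroll_totals = dict(conn.execute("SELECT department, total_salary FROM payroll"))
```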

There is also another point of view, according to which the external level is understood as more general concepts related to the study and analysis of information flows of the subject area and their structuring. Some authors introduce an auxiliary level (intermediate between the external and datalogical levels), which is called infological. It can act as an independent one or be an integral part of the external level.

This concept is more appropriate for understanding the database design process. We will therefore consider the infological level as an independent level of data representation. The external level in this case acts as a separate design stage, at which all non-machine information support is studied, that is, the forms of documentation and data presentation, as well as the external environment in which the data bank will operate, in terms of the methods for recording, collecting, and transmitting information to the database.

When designing a database at the external level, it is necessary to study the functioning of the management object for which the database is being designed, together with all primary and output documentation, in order to determine what data needs to be stored in the database. The external level is, as a rule, a verbal description of input and output messages and of the data that should be stored in the database. The description of the external level may contain duplication, redundancy, and inconsistency of data. Infological design is performed to eliminate these anomalies and contradictions in the external description of the data.

An infological model is a means of structuring a subject area and understanding the semantics of its data. It can be viewed mainly as a means of documenting and structuring the representation of information needs in a form that ensures consistent communication between users and system developers.

All external representations are integrated at the infological level, where an infological (canonical) data model is formed, which is not a simple sum of external data representations.

The infological level is an information-logical model (ILM) of the subject area, from which data redundancy is excluded and in which the information features of the management object are represented without regard to the features and specifics of a particular DBMS. That is, the infological representation of data is oriented primarily toward the person who designs or uses the database.

The logical (conceptual) level is built taking into account the specifics and features of a particular DBMS. This level of data representation is oriented more toward computer processing and toward the programmers who develop it. At this level, a conceptual data model is formed, that is, a specially structured model of the subject area that conforms to the features and limitations of the selected DBMS. The logical-level model, supported by the means of a specific DBMS, is also called datalogical.

Infological and datalogical models, which reflect the model of one subject area, are dependent on each other. The infological model can easily be transformed into a datalogical model.

The internal level is associated with the physical placement of data in computer memory. At this level, a physical model of the database is formed, which includes the structures for storing data in computer memory, including a description of record formats, the order of their logical or physical arrangement, placement by device type, and the characteristics of and paths for accessing data. The following characteristics of database operation depend on the parameters of the physical model: the amount of memory required and the system response time. The physical parameters of the database can be changed during operation to increase the efficiency of the system. Changing the physical parameters does not require changing the infological and datalogical models. The relationship between the levels of data representation in the database is shown in Fig. 1.1. The database is designed in accordance with these levels.

Database design is a complex and time-consuming process that requires the involvement of many highly qualified specialists. The performance of the information system and the extent to which the functional needs of users and application programs are met depend on how well the database is designed. A poorly designed database can complicate the development of application software and necessitate more complex logic, which in turn increases the system response time and may later require the logical database model to be redesigned. Restructuring or changing the logical database model is highly undesirable, since it requires modifying or even reprogramming individual tasks. All work performed at each design stage must be integrated with the data dictionary. Each design stage is considered as a sequence of iterative procedures that results in a particular database model.

Fig. 1.1.

The external level is the preparatory stage of infological design.

The goal of design at the external level is to develop the non-machine information support, which includes a system of input (primary) documentation characterizing the subject area, a system for classifying and coding technical and economic information, and a list of the output messages that must be generated using the data bank.

There are two approaches to designing databases at the external level: “domain-based” and “query-driven”. The domain-based approach consists of forming the external information support for the entire subject area without taking into account the needs of users and application programs. This approach is sometimes also called object-based or non-process. Its advantages are objectivity, a systematic representation of the subject area, stability of the information model, and the ability to support a large number of application programs and queries, including ones not planned when the database was created. Its disadvantage is the significant amount of work required to determine the information to be stored in the database, which complicates the project and lengthens its development.

In the query-driven approach, the main source of information about the subject area is the study of user requests and the needs of application programs. This approach is also called process-based or functional. It is focused on the current requirements of users and application programs: the database is designed to perform current management tasks, without taking into account possible system expansion or the emergence of new management tasks. Difficulties may arise in aggregating the requirements of different users and application programs. On the other hand, the design effort is significantly reduced, and it is possible to create a system with high performance characteristics.

Taken separately, neither method provides enough information to design a rational database structure. When designing a database, it is therefore advisable to use the two approaches together. Schematically, the database design process at the external level consists of the following steps.

1. Determining the functional tasks of the subject area that are to be automated. Since the main purpose of creating a database is to supply data processing functions with information, it is first necessary to study all the functions of the subject area (the management object) for which the database is being developed and to analyze their features. The functions and functional features of a management object must be studied in close connection with the functional data requirements of the future users of the information system. Study and analysis involve identifying information needs and determining information flows. This work can be done by surveying the subject area and questioning its employees. The result is a list of functional tasks to be solved in an automated way using the database.
2. Study and analysis of operational primary documents. Once the functions have been studied and the list of functional tasks to be automated has been determined, we proceed to study the operational documents used as input to each task or group of tasks. Having studied and analyzed all operational documents (both external and internal) used as input to each task, we determine which details (attributes) of these documents need to be stored in the database.
3. Study of normative and reference documents. At the third step, all regulatory and reference documentation is studied and analyzed. Such documentation includes various classifiers, estimates, contracts, regulations, legislative acts on tax policy, planning documentation, and so on. Operational and regulatory information are separated and analyzed separately for technological reasons: the technologies for creating and maintaining files of conditionally permanent information, which comes from normative and reference documentation, differ from those for files of operational information.
4. Study of the processes that convert input messages into output messages.

First of all, all output messages that are printed, displayed, or stored as output arrays on magnetic disk (MD) are studied. This is necessary to determine which attributes of the input messages must be stored in the database in order to produce the output messages. In addition, at this stage the indicators obtained by calculation in the course of solving the task are identified. For each calculated indicator, the algorithm for computing it should be defined, and it should be verified that the indicator can be obtained from the attributes of operational and regulatory information determined in the second and third steps. If the data is insufficient to perform the calculations, it is necessary to go back, conduct additional research, and determine where and how the missing attributes can be obtained. It must also be decided which calculated indicators should be saved in the database. As a rule, indicators obtained by calculation are not saved in the database. The exception is when a calculated indicator is needed to solve other tasks, or the same task in subsequent calendar periods.
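The check that every calculated indicator can be obtained from the attributes identified in steps 2 and 3 can be sketched as a simple set comparison. All attribute and indicator names below, and the helper `missing_inputs`, are hypothetical and serve only to illustrate the reasoning.

```python
# Attributes identified in steps 2 and 3 (hypothetical names).
operational_attributes = {"quantity", "unit_price", "order_date"}
reference_attributes = {"tax_rate"}

# Each calculated indicator lists the attributes its algorithm needs.
indicator_inputs = {
    "order_total": {"quantity", "unit_price"},
    "tax_amount": {"quantity", "unit_price", "tax_rate"},
    "discounted_total": {"quantity", "unit_price", "discount_rate"},
}


def missing_inputs(inputs_by_indicator, available):
    """Return, for each indicator, the required attributes that are not available."""
    return {
        name: sorted(required - available)
        for name, required in inputs_by_indicator.items()
        if required - available
    }


available = operational_attributes | reference_attributes
gaps = missing_inputs(indicator_inputs, available)
```

A non-empty result signals that the designer needs to go back and find a source for the missing attributes, as described above.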

When carrying out design work at an external level, it is necessary to take into account the fact that in order to perform certain functions in the database it is necessary to save additional data that is not displayed in documents (calendar data, statistical data, etc.). A generalized diagram of the process of studying documents and data during design at the external level is shown in Fig. 1.2.

Fig. 1.2.

Such a study must be carried out for each functional task or complex of tasks that will be solved using the database.

The result of the design at the external level will be a list of attributes (details) of operational and conditionally permanent information that must be stored in the database, indicating the sources of their receipt and the form of presentation.

However, this list may still contain redundancy, duplication, inconsistency, and other shortcomings. The process therefore does not end here, but moves on to the stage of infological design.

The essence of database design, like that of any other design process, is to create a description of a new system that has not previously existed in this form and that, once implemented, can function as expected under the appropriate conditions. It follows that the stages of database design must reflect the essence of this process consistently and logically.

Content and phases of database design

A design intent arises from some formulated social need. This need has an environment in which it arises and a target audience of consumers who will use the result of the design. Consequently, the database design process begins with studying this need from the point of view of consumers and of the functional environment in which the database is to be placed. That is, the first stage is collecting information and defining a model of the system's subject area, including how the target audience views it. In general, to determine the system requirements, the scope of activities and the boundaries of the database applications are established.

Next, the designer, who already has some idea of what is to be created, clarifies the tasks the application is expected to solve, compiles a list of them (especially if the project is a large and complex database), clarifies the order in which the tasks are solved, and performs data analysis. This too is a design stage, but in the overall structure these steps are usually absorbed into the conceptual design stage, in which objects, attributes, and relationships are identified.

Creating a conceptual (infological) model involves first formulating the users' conceptual requirements, including requirements for applications that may not be implemented immediately but whose consideration will improve the functionality of the system in the future. Because it deals with representations of a set of abstract objects (without specifying physical storage methods) and their relationships, the conceptual model essentially corresponds to the domain model. For this reason, the first stage of database design is called infological design in the literature.

Next comes, as a separate stage or as an addition to the previous one, the formation of requirements for the operating environment, in which the computing resources needed to run the system are assessed. The larger the designed database and the higher the user activity and query intensity, the higher the resource requirements: for the computer configuration and for the type and version of the operating system. For example, multi-user operation of the future database requires a network connection and an operating system that supports multitasking.

The next step is for the designer to select a database management system (DBMS) and software tools. After this, the conceptual model must be mapped to a data model compatible with the selected DBMS. This often requires amendments to the conceptual model, since the relationships between objects that it reflects cannot always be implemented by means of the given DBMS.

This circumstance gives rise to the next stage: producing a version of the conceptual model that is supported by the means of the specific DBMS. This step corresponds to logical design (creating the logical model).

The final stage of database design is physical design, which links the logical structure to the physical storage environment.

Thus, in detailed form, the main stages of design are the following:

infological design;
formation of requirements for the operating environment;
selection of the DBMS and database software tools;
logical design;
physical design.

The key ones will be discussed in more detail below.

Infological design

Identifying entities forms the semantic basis of infological design. An entity here is an object (abstract or concrete) about which information will be accumulated in the system. The infological model describes the structure and dynamic properties of the subject area in user-friendly terms that do not depend on a specific database implementation. These terms, however, are type-level: the description is expressed not through individual objects of the subject area and their relationships, but through:

description of object types,
integrity constraints associated with the described type,
processes leading to the evolution of a subject area - its transition to another state.

An infological model can be created using several approaches:

The functional approach starts from the assigned tasks. It is called functional because it is used when the functions and tasks of the people who will use the designed database to meet their information needs are known.
The subject approach focuses on the information that the database will contain, even though the structure of queries may not yet be defined. In this case, the study of the subject area aims at representing it as adequately as possible in the database, in view of the full range of expected information requests.
The integrated approach, based on the “entity-relationship” method, combines the advantages of the previous two. The method consists of dividing the subject area into local parts, which are modeled separately and then combined again into a whole.

Because the “entity-relationship” method combines both approaches, it is most often preferred at this stage.

When the subject area is divided, each local representation should, if possible, include enough information to solve a separate problem or to serve the requests of a particular group of potential users. Each such area contains about 6-7 entities and corresponds to a separate external application.

The dependence of entities is reflected in their division into strong (base, parent) and weak (child). A strong entity (for example, a reader in a library) can exist in the database on its own, but a weak entity (for example, this reader’s subscription) is “attached” to a strong one and does not exist separately.
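The strong/weak distinction can be sketched with a foreign key, here in Python with the standard sqlite3 module. The table and column names (reader, subscription, seq_no, valid_to) are illustrative assumptions, not part of the original model. The weak entity borrows part of its key from the strong one, cannot be inserted without it, and disappears together with it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE reader (               -- strong entity: exists on its own
    reader_id INTEGER PRIMARY KEY,
    full_name TEXT NOT NULL
);
CREATE TABLE subscription (         -- weak entity: identified through its reader
    reader_id INTEGER NOT NULL
              REFERENCES reader(reader_id) ON DELETE CASCADE,
    seq_no    INTEGER NOT NULL,
    valid_to  TEXT    NOT NULL,
    PRIMARY KEY (reader_id, seq_no)
);
""")

conn.execute("INSERT INTO reader VALUES (1, 'A. Reader')")
conn.execute("INSERT INTO subscription VALUES (1, 1, '2025-12-31')")


def try_insert_orphan(connection):
    """Try to store a subscription for a reader that does not exist."""
    try:
        connection.execute(
            "INSERT INTO subscription VALUES (99, 1, '2025-12-31')"
        )
        return True
    except sqlite3.IntegrityError:
        return False


orphan_accepted = try_insert_orphan(conn)

# Deleting the strong entity removes the weak entities attached to it.
conn.execute("DELETE FROM reader WHERE reader_id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM subscription").fetchone()[0]
```

Whether deleting a parent should cascade or be refused is a design decision; ON DELETE CASCADE is used here only because it mirrors the statement that a weak entity does not exist separately.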

It is necessary to distinguish an “entity instance” (an object characterized by specific property values) from an “entity type” (an object characterized by a common name and a list of properties).

For each individual entity, attributes (a set of properties) are selected, which, depending on the criterion, can be:

identifying (with a value that is unique among instances of that type, which makes them candidate keys) or descriptive;
single-valued or multi-valued (holding one value or several values per entity instance);
basic (independent of other attributes) or derived (calculated based on the values of other attributes);
simple (indivisible one-component) or composite (combined from several components).
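The attribute kinds listed above can be illustrated with a small Python sketch. The Reader and Address classes and all field names are hypothetical, chosen to continue the library example. The class plays the role of an entity type, and the object created at the end is an entity instance.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Address:
    """Composite attribute: combined from several components."""
    city: str
    street: str


@dataclass
class Reader:
    """Entity type: a common name plus a list of properties."""
    card_number: int          # identifying attribute (candidate key)
    full_name: str            # descriptive, single-valued, simple
    birth_date: date          # basic: independent of other attributes
    address: Address          # composite
    phones: list[str] = field(default_factory=list)  # multi-valued

    def age_on(self, day: date) -> int:
        """Derived attribute: calculated from birth_date."""
        had_birthday = (day.month, day.day) >= (
            self.birth_date.month,
            self.birth_date.day,
        )
        return day.year - self.birth_date.year - (0 if had_birthday else 1)


# Entity instance: the same properties with specific values.
reader = Reader(
    card_number=1001,
    full_name="A. Reader",
    birth_date=date(1990, 5, 20),
    address=Address(city="Riga", street="Main St 1"),
    phones=["555-0100", "555-0101"],
)
```

Derived attributes such as the age are usually computed on demand rather than stored, so that they cannot drift out of step with the attributes they depend on.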

After this, the attributes are specified, the relationships within each local view are specified (and classified as optional or mandatory), and the local views are merged. If there are no more than 4-5 local areas, they can be merged in one step. If there are more, the areas are merged pairwise in several stages.

The iterative nature of design shows at this and other intermediate stages: to eliminate contradictions, it is necessary to return to the stage of modeling local representations and refine or change them (for example, to rename semantically different objects that share the same name, or to reconcile integrity constraints on the same attributes in different applications).
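Pairwise merging with detection of name clashes can be sketched as follows. This is a simplified illustration, not a standard algorithm: a local view is assumed to be a dictionary mapping entity names to sets of attribute names, and the function names merge_views and merge_all are hypothetical. An entity that appears in two views with different attribute sets is only flagged for review, since it may be either one object described from two sides or two different objects sharing a name.

```python
def merge_views(left, right):
    """Merge two local views; return (merged_view, names_needing_review)."""
    merged = {name: set(attrs) for name, attrs in left.items()}
    conflicts = []
    for name, attrs in right.items():
        attrs = set(attrs)
        if name in merged and merged[name] != attrs:
            # Same name, different description: the designer must decide
            # whether to unify the entities or rename one of them.
            conflicts.append(name)
            merged[name] = merged[name] | attrs
        elif name not in merged:
            merged[name] = attrs
    return merged, conflicts


def merge_all(views):
    """Binary merging in several stages: fold the views in one at a time."""
    merged, all_conflicts = {}, []
    for view in views:
        merged, conflicts = merge_views(merged, view)
        all_conflicts.extend(conflicts)
    return merged, all_conflicts


lending = {"Reader": {"card_number", "full_name"}, "Book": {"isbn", "title"}}
accounting = {"Reader": {"card_number", "full_name"}, "Fee": {"fee_id", "amount"}}
catalog = {"Book": {"inventory_no", "shelf"}}

global_view, to_review = merge_all([lending, accounting, catalog])
```

Here "Book" ends up in to_review: the catalog view may actually describe a physical copy rather than a title, which is the kind of contradiction that sends the designer back to the local models.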

Selecting a database management system and software

The practical implementation of the information system depends on the choice of the database management system. The most significant criteria in the selection process are the following parameters:

type of data model and its compliance with the needs of the subject area,
headroom for expanding the information system,
performance characteristics of the selected system,
reliability and ease of use of the DBMS,
tools provided for database administrators,
the cost of the DBMS itself and additional software.

A mistake in choosing the DBMS will almost certainly make it necessary to revise the conceptual and logical models later.

Logical database design

The logical structure of the database must correspond to the logical model of the subject area and take into account the data model supported by the DBMS. The stage therefore begins with choosing a data model, where simplicity and clarity are important considerations.

It is preferable for the natural structure of the data to coincide with the model that represents it. For example, if the data form a hierarchy, a hierarchical model is the better choice. In practice, however, the choice is often dictated by the database management system rather than by the structure of the data. The conceptual model is therefore actually translated into a data model that is compatible with the selected database management system.
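As an illustration of such a translation, the sketch below stores hierarchical data in a relational DBMS (SQLite, through Python's sqlite3 module) using a self-referencing foreign key, and walks the hierarchy with a recursive query. The department table and its contents are hypothetical, and this adjacency-list layout is only one of several ways to map a hierarchy onto tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (
    dept_id   INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    parent_id INTEGER REFERENCES department(dept_id)  -- NULL for the root
);
INSERT INTO department VALUES
    (1, 'Head office',  NULL),
    (2, 'Branch A',     1),
    (3, 'Branch B',     1),
    (4, 'Cash desk A1', 2);
""")

# A recursive common table expression collects a node and all its descendants.
subtree = conn.execute(
    """
    WITH RECURSIVE sub(dept_id, name, depth) AS (
        SELECT dept_id, name, 0 FROM department WHERE dept_id = ?
        UNION ALL
        SELECT d.dept_id, d.name, sub.depth + 1
        FROM department AS d
        JOIN sub ON d.parent_id = sub.dept_id
    )
    SELECT name, depth FROM sub ORDER BY depth, name
    """,
    (2,),
).fetchall()
# subtree == [('Branch A', 0), ('Cash desk A1', 1)]
```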

This also reflects the iterative nature of design: it may be possible (or necessary) to return to the conceptual model and change it if the relationships between objects (or object attributes) captured there cannot be implemented in the chosen DBMS.

Upon completion of the stage, database schemas for both levels of the architecture (conceptual and external) should be produced, written in the data definition language supported by the selected DBMS.
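A minimal sketch of the two schema levels, assuming SQLite as the DBMS and reusing the Client and Account entities of the BANK example. The column names and the client_balance view are illustrative assumptions. Base tables stand for the conceptual schema, and a view stands for an external schema seen by one group of users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conceptual level: base tables for the whole subject area.
CREATE TABLE client (
    client_id   INTEGER PRIMARY KEY,
    full_name   TEXT NOT NULL,
    passport_no TEXT NOT NULL UNIQUE
);
CREATE TABLE account (
    account_id INTEGER PRIMARY KEY,
    client_id  INTEGER NOT NULL REFERENCES client(client_id),
    kind       TEXT    NOT NULL,
    balance    NUMERIC NOT NULL DEFAULT 0
);

-- External level: what one application sees, with passport data hidden.
CREATE VIEW client_balance AS
SELECT c.full_name,
       COUNT(a.account_id)         AS accounts,
       COALESCE(SUM(a.balance), 0) AS total
FROM client AS c
LEFT JOIN account AS a ON a.client_id = c.client_id
GROUP BY c.client_id;

INSERT INTO client VALUES (1, 'A. Client', 'X123');
INSERT INTO account VALUES (10, 1, 'current', 100), (11, 1, 'demand', 250);
""")

rows = conn.execute(
    "SELECT full_name, accounts, total FROM client_balance"
).fetchall()
# rows == [('A. Client', 2, 350)]
```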

Database schemas are formed using one of two approaches:

the bottom-up approach, in which work starts at the lowest level by defining attributes, which are then grouped into relations representing objects on the basis of the dependencies existing between the attributes;
the reverse, top-down approach, used when the number of attributes grows significantly (into the hundreds or thousands).

The second approach involves identifying a number of high-level entities and their relationships with subsequent detailing to the required level, which is reflected, for example, in a model created based on the “entity-relationship” method. But in practice, both approaches are usually combined.
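For contrast, the bottom-up idea of grouping attributes by the dependencies between them can be sketched in a few lines. This is a deliberately simplified illustration, not a full normalization algorithm (it does not compute a minimal cover, for instance). The dependency list and the function name group_into_relations are assumptions made for the example.

```python
from collections import defaultdict


def group_into_relations(dependencies):
    """Group attributes into relations keyed by their determinant.

    `dependencies` is a list of (determinant, dependent) pairs, where the
    determinant is a tuple of attribute names that fixes the dependent one.
    """
    relations = defaultdict(set)
    for determinant, dependent in dependencies:
        key = tuple(sorted(determinant))
        relations[key].update(key)      # the key belongs to its relation
        relations[key].add(dependent)
    return {key: sorted(attrs) for key, attrs in relations.items()}


dependencies = [
    (("client_id",), "full_name"),
    (("client_id",), "phone"),
    (("account_id",), "balance"),
    (("account_id",), "client_id"),   # each account belongs to one client
    (("account_id",), "branch_id"),
]

relations = group_into_relations(dependencies)
# relations[("client_id",)]  == ['client_id', 'full_name', 'phone']
# relations[("account_id",)] == ['account_id', 'balance', 'branch_id', 'client_id']
```

With hundreds of attributes, listing every dependency by hand becomes impractical, which is why the top-down approach takes over at that scale.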

Physical database design

At the next stage, physical design, the logical structure is mapped to a database storage structure, that is, it is tied to the physical storage medium on which the data will be placed as efficiently as possible. Here the data schema is described in detail, specifying all types, fields, sizes and constraints. In addition to tables and indexes, the basic queries are defined.
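The sketch below shows what such a detailed description might look like for one table, again assuming SQLite and hypothetical column names. It declares types, sizes and constraints, adds an index, and checks that a basic query actually uses that index. Note that SQLite treats declared sizes such as VARCHAR(10) only as hints, whereas many other DBMSs enforce them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (
    account_id INTEGER PRIMARY KEY,
    branch_id  INTEGER NOT NULL,
    kind       VARCHAR(10) NOT NULL
               CHECK (kind IN ('current', 'urgent', 'demand')),
    balance    DECIMAL(12, 2) NOT NULL DEFAULT 0
               CHECK (balance >= 0)
);
-- Index supporting the basic query "accounts of a given branch".
CREATE INDEX idx_account_branch ON account(branch_id);
""")


def rejects_invalid_kind(connection):
    """Return True if the CHECK constraint blocks an unknown account kind."""
    try:
        connection.execute("INSERT INTO account VALUES (1, 1, 'savings', 0)")
        return False
    except sqlite3.IntegrityError:
        return True


constraint_enforced = rejects_invalid_kind(conn)

# Ask the DBMS how it intends to execute the basic query.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT account_id, balance FROM account WHERE branch_id = ?",
    (1,),
).fetchall()
plan_text = " ".join(row[-1] for row in plan)  # last column is the description
```

The exact wording of the plan differs between SQLite versions, so the example only relies on the index name appearing in it.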

Building the physical model involves solving largely conflicting tasks:

minimizing data storage space,
achieving integrity, security and maximum performance.

The second task conflicts with the first because, for example:

for transactions to function effectively, you need to reserve disk space for temporary objects,
to increase search speed, you need to create indexes, the number of which is determined by the number of all possible combinations of fields involved in the search,
to make data recovery possible, database backups must be created and a log of all changes kept.
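To see how quickly the number of possible indexes grows, the following sketch enumerates every non-empty combination of search fields. The field names are hypothetical. For n fields there are 2^n - 1 such combinations, and the count is higher still if the order of columns inside a composite index is taken into account.

```python
from itertools import combinations


def index_candidates(fields):
    """List every non-empty combination of fields that could be indexed."""
    return [
        combo
        for size in range(1, len(fields) + 1)
        for combo in combinations(fields, size)
    ]


candidates = index_candidates(["branch_id", "kind", "opened_on"])
# Three fields give 2**3 - 1 = 7 combinations.
```

In practice only the combinations that match frequent search patterns are indexed, since every index takes space and slows down writes.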

All this increases the size of the database, so the designer looks for a reasonable balance: storage space is used sparingly through careful placement of data, but not at the expense of database security, which includes protection both from unauthorized access and from failures.

To complete the physical model, its operational characteristics are assessed (search speed, efficiency of query execution, resource consumption, correctness of operations). Sometimes this stage, like database implementation, testing and optimization, and maintenance and operation, is considered to lie outside database design proper.