Structural elements of a relational database. Relational database - basic concepts

DBMS functions.

DBMS functions are of high and low level.

High-level functions:

1. Data definition - using this function, it is determined what information will be stored in the database (type, data properties and how they will be related to each other).

2. Data processing. Information can be processed in different ways: selection, filtering, sorting, combining one information with another, calculating totals.

3. Data management. This function specifies who is allowed to view the data, correct it or add new information, as well as define the rules for shared access.

Low-level functions:

1. Data management in external memory;

2. RAM buffers management;

3. Transaction management;

4. Introduction of a log of changes to the database;

5. Ensuring the integrity and security of the database.

By transaction is called an indivisible sequence of operations, which is monitored by the DBMS from beginning to completion, and in which if one operation fails, the entire sequence is canceled.

DBMS log - a special database or part of the main database, inaccessible to the user and used to record information about all changes to the database.

Introducing the DBMS Log is designed to ensure the reliability of storage in the database in the presence of hardware failures and failures, as well as errors in the software.

Database integrity - this is a property of the database, which means that it contains complete, consistent and adequately reflecting the subject area information.

DBMS classification.

DBMS can be classified:

1. By types of programs:

a. Database servers (for example, MS SQL Server, InterBase (Borland)) - designed to organize data centers in computer networks and implement database management functions requested by client programs using SQL statements (i.e. programs that respond to queries);

b. DB clients - programs that request data. PFSDBMS, spreadsheets, word processors, e-mail programs can be used as client programs;

c. Fully functional databases (MS Access, MS Fox Pro) - a program with a developed interface that allows you to create and modify tables, enter data, create and format queries, develop reports and print them.

2. According to the data model of the DBMS (as well as the DB):

a. Hierarchical - are based on a tree structure of information storage and resemble the file system of a computer; the main disadvantage is the inability to implement the many-to-many relationship;

b. Network - which replaced the hierarchical ones and did not last long, since the main drawback was the complexity of developing serious applications. The main difference between the network and the hierarchical one is that in the hierarchical structure "record - descendant" has only one ancestor, and in the network descendant it can have any number of ancestors;

c. Relational - whose data is located in tables, between which there are certain links;

d. Object oriented - they store data in the form of objects and the main advantage when working with them is that you can apply an object-oriented approach to them;

e. Hybrid, i.e. object - relational - combine the capabilities of relational and object - oriented databases. An example of such a database is Oracle (previously it was relational).

3. Depending on the location of the individual parts of the DBMS, they are distinguished:

a. local - all parts of which are located on one computer;

b. network.

Network includes:

- with organization file - server;

With such an organization, all data is located on one computer, which is called a file - server, and which is connected to the network. When finding the necessary information, the entire file is transferred, including a lot of redundant information. And only when creating a local copy, the required record is found.

- with a client-server organization;

The database server receives a request from the client, looks for the required record in the data and transfers it to the client. The query to the server is formed in the structured query language SQL, therefore the database servers are called SQL servers.

- distributed DBMS contain several tens and hundreds of servers located on a large territory.

Basic provisions of the relational database model.

Relational database is called a database in which all data is organized in the form of tables, and all operations on this data are reduced to operations on tables.

Features of relational databases:

1. Data is stored in tables consisting of columns and rows;

2. There is one value at the intersection of each column and row;

3. Each column - field has its own name, which serves as its name - an attribute, and all values in one column are of the same type;

4. Columns are arranged in a certain order, which is specified when creating a table, as opposed to rows, which are arranged in an arbitrary order. There may not be a single row in the table, but there must be at least one column.

Relational database terminology:

Relational database element	Presentation form
1. Database	Set of tables
2. Database schema	A set of table headers
3. Attitude	table
4. Relationship diagram	Table Column Headers Row
5. Essence	Description of object properties
6. Attribute	Column heading
7. Domain	Many valid attribute values
8. Primary key	A unique identifier that uniquely identifies each record in the table
9. Data type	The type of the values of the elements in the table
10. Tuple	String (write)
11. Cardinality	Number of rows in the table
12. Degree of attitude	Number of fields
13. Body relationship	Multiple relation tuples

When designing a relational database, data is placed in several tables. Relationships are established between tables using keys. When linking tables, the main and additional (subordinate) tables are selected.

There are the following types of relationships between tables:

1. Relationship of the form 1: 1 (one to one) means that each record in the main table corresponds to one record in the additional table and, conversely, each record in the additional table corresponds to one record in the main table.

2. Relationship type 1: M (one to many) means that each record in the main table corresponds to several records in the additional table and, conversely, each record in the additional table corresponds to only one record in the main table.

3. Relationship like M: 1 (many to one) means that one or more records in the main table correspond to only one record in the secondary table.

4. Relationship of the form M: M (many to many) - this is when several records of the additional table correspond to several records of the main table and vice versa.

5. The main components of MS Access.

The main components (objects) of MS Access are:

1. Tables;

3. Forms;

4. Reports;

5. Macros:

Modules.

table Is an object designed to store data in the form of records (rows) and fields (columns). Each field contains a separate part of the record, and each table is used to store information about one specific question.

Inquiry - a question about the data stored in tables, or an instruction for selecting the records to be changed.

The form Is an object in which you can place controls for entering, displaying and changing data in the fields of tables.

Report Is an object that allows you to present user-defined information in a certain way, view and print it.

Macro - one or more macros that can be used to automate a specific task. Macro is the main building block of a macro; a stand-alone instruction that can be combined with other macros to automate a task.

Module - a set of descriptions, instructions and procedures stored under one name. There are three types of modules in MS Access: form module, report module and general module. Form and report modules contain a local program for forms and reports.

6. Tables in MS Access.

There are the following methods for creating tables in MS Access:

1. Table mode;

2. Constructor;

3. Table Wizard;

4. Import of tables;

5. Relationship with tables.

V table mode the data is entered into an empty table. A table with 30 fields is provided for data entry. After saving it, MS Access decides for itself which data type to assign to each field.

Constructor provides the ability to independently create fields, select data types for fields, field sizes and set field properties.

To define a field in the mode Constructor are set:

1. Field name , which in each table must have a unique name that is a combination of letters, numbers, spaces, and special characters, with the exception of " .!” “ ". The maximum name length is 64 characters.

2. Data type defines the type and range of valid values, as well as the amount of memory allocated for this field.

MS Access data types

Data type	Description
Text	Text and numbers, such as names and addresses, phone numbers, postal codes (up to 255 characters).
Memo field	Long text and numbers, such as comments and explanations (up to 64,000 characters).
Numerical	General data type for numerical data that allows mathematical calculations, with the exception of monetary calculations.
Date Time	Date and time values. The user can choose standard shapes or create a custom format.
Monetary	Monetary values. For monetary calculations, it is not recommended to use numeric data types, since they can be rounded off in calculations. Currency values are always displayed with the specified number of decimal places after the decimal point.
Counter	Automatically exposed sequential numbers. The numbering starts from 1. The counter field is convenient for creating a key. This field is compatible with a numeric field that has the Size property set to Long.
Logical	Values are Yes / No, True / False, On / Off, one of two possible values.
OLE Object Field	Objects created in other programs that support the OLE protocol.

3. The most important field properties:

- Field size sets the maximum size of data stored in the field.

- Field format is a display format for a given data type and sets the rules for presenting data when displaying it on the screen or printing.

- Field signature sets the text that is displayed in tables, forms, reports.

- Condition on value allows you to control input, sets restrictions on input values, in case of violation of conditions, prohibits input and displays the text specified by the Error message property;

- Error message specifies the text of the message displayed on the screen when the constraints specified by the Condition on the value are violated.

Control type- a property that is set on the Substitution tab in the table designer window. This property determines whether the field will be displayed in the table and in what form - as a field or as a combo box.

Unique (primary) key tables can be simple or complex with several fields.

To define a key, the fields that make up the key are highlighted, and the button on the toolbar is pressed key field or the command is executed Edit / Key Field.

© 2015-2019 site
All rights belong to their authors. This site does not claim authorship, but provides free use.
Date the page was created: 2016-02-16

Level 1: The level of external models is the topmost level where each model has its own view of the data. This level defines the database point of view of individual applications.

Conceptual level: The central control link, where the database is presented here in the most general form, which unites the data used by all applications. In fact, the conceptual level reflects a generalized domain model.

Physical layer (Database): These are the data themselves located in files or in page structures located on external storage media.

Data models

The following data models are distinguished:

1. Infological

2. Date logical

3. Physical

The database design process begins with the design of an infological model. An infological data model is a generalized informal description of the database being created, made using natural language, mathematical formulas, tables, graphs, and other means that are understandable to all people working on database design.

Domain tuple

The infological model displays the real world in a human-readable concept that is completely independent of the data storage environment. Therefore, the Infological model should not change until some changes in the real world require changes outside the definition, so that this model continues to display the subject area.

There are many approaches to building this model: graph models, semantic networks, entity-relationship, and others.

Datalogical model

The infological model should be displayed in a datalogical model that the DBMS can understand. Datalogical model is a formal description of the infological model in the DBMS language.

Hierarchical model

This model is a collection of related elements that form a hierarchical structure. The basic concepts of a hierarchy are level, node, and relationship.

communication level

A node is a collection of data attributes describing an object. Each node is associated with one higher-level node and with any number of lower-level nodes. The exception is the highest level node. The number of trees in the database is determined by the number of tree roots. Each database record has a single path from the root record. A simple example is the Internet Domain Name System \ address. On the first level (the root of the tree) lies our planet earth, on the second - the Country, on the third - the Region, on the fourth - the settlement, street, house, apartment. A typical representative is a DBMS from IBM - IMS.

All instances of a given descendant type with a common ancestor type instance are called twins. A complete traversal order is defined for the database. From top to bottom and from right to left.

Physical model

A physical model is built on the basis of the datalogical model. The physical organization of the data has a major impact on the performance of the database. DBMS developers are trying to create the most efficient physical data models, offering users one or another toolkit for customizing the model for a specific database.

Example: In particular for a relational database, it already takes into account:

1. Physical aspects of storing tables in specific files.

2. Creation of indexes that optimize the speed of data operations using the application.

3. Performing various actions on data at certain events, defined by users using triggers and stored procedures.

Infological models X

Physical Models

For all levels and for any method of representing the subject area, the coding of concepts of relations between concepts lies. A key step in the development of any information system is to conduct a system analysis:

Formalization of the subject area and representation of the system as a set of components.

Composition as the basis of systems analysis can be functional (building a hierarchy).

However, in most systems, when it comes to databases, data types are more static than the way they are processed. Therefore, such methods of system analysis as the data flown diagram have been intensively developed. Development of relational databases. Stimulated the development of building data development techniques in particular ER diagrams ER. The relational data model directly uses the concept of a relationship as a mapping. It is closest to the conceptual data presentation model. And it often underlies it.

Unlike the graph model theorist, in the relational model, relationships between relationships are implemented in an implicit way, for which the keys of relationships are used. For example, relationships of a hierarchical type are implemented by the mechanism of primary and foreign keys, when the attribute fact must be present in the subordinate relationship.

Such an attribute of relations in the main relations will be called the primary key, and in the subordinate, the secondary key.

Progress in the development of programming languages associated primarily with data typing and the emergence of object-oriented languages has made it possible to approach the analysis of complex systems from the point of view of hierarchical representations, that is, using object classes with polymorphism, inheritance, and encapsulation properties.

THE RATIO IS A TABLE.

Editing tables, records ...

Deleting what was created and

Editing.

Relational database model

Relational data models are currently the most popular because of this kind of data presentation.

A relational model can be thought of as a special method of presenting data, containing its own data (in the form of tables) and ways of working and manipulating with it (in the form of links). The relational model assumes three conceptual elements: Structure, Integrity, and Data Processing. These elements have their own obligatory concepts that need to be explained for further presentation.

The table is considered as a direct data store. Traditionally, in relational systems, a table is called attitude. The table row is called by a tuple and the column attribute... In this case, the attributes have unique names (within the relationship).

The number of tuples in the table is called cardinal number... Number of attributes degree. A unique identifier is set for the relationship, that is, one or more attributes whose values are not the same at the same time - the identifier is called primary key.Domain it is the set of admissible homogeneous values for this or that attribute. Thus, a domain can be considered as a named set of data, and the constituent parts of this set are logically indivisible units (for example, a list of the names of employees of an institution can act as a domain, but not all surnames can be present in the table).

SUMM Kireeva 25.50 Motyleva 17.05 … …. …

Attitude

attributes

Fields KOD, NAME, SUMM are table attributes contained in the header.

Pairs KOD 5216, NAME Kireev, SUMM 25.50 are elements of the relationship body.

In relational databases, unlike other models, the user specifies what data is needed for him and not how to do it. For this reason, the process of moving and navigating the database in relational systems is automatic, and this task in the DBMS is performed by optimizer. Its job is to fetch data from the database on demand in the most efficient way. Thus, the optimizer should at least be able to determine from which tables the data is fetched, how much information in these tables and what is the physical order of records in the tables and how they are grouped.

In addition, the relational database also performs the functions of a directory. The catalog contains a description of all the objects that make up the database: tables, indexes, triggers, etc. Obviously, a component such as an optimizer is vital for the proper operation of the entire system. The optimizer uses the information stored in the catalog. An interesting fact is that the catalog itself is a set of tables, so the DBMS can manipulate it in traditional ways, without resorting to any special tricks and methods.

Domains and relationships

Basic definitions: Domains, types of relations, predicates.

A relationship has a number of basic properties:

1. In the most general case, there are no common tuples in relations - this follows from the very definition of relations. However, for some DBMS, in some cases, a deviation from this property is allowed. As long as the relationship has a primary key, identical tuples are excluded.

2. Tuples are not ordered from top to bottom - there is simply no concept of a positional number in a relation. In a relationship without losing information, you can successfully arrange the tuples in any order.

3. Attributes are not ordered from left to right. Attributes in the header of relations can be placed in any order, while the integrity of the data is not violated. Therefore, the notion of a positional number in relation to an attribute does not exist either.

4. The value of attributes consists of logically indivisible units - this follows from the fact that the values are taken from domains, otherwise we can say that relations do not contain groups of repetitions. That is, they are normalized.

Relational systems support several types of relationships:

1. Named variables are relations variables defined in the DBMS by means of creation statements and, as a rule, are necessary for a more convenient presentation of information for the user.

2. Basic relationships are directly an important part of the database, so they are given their own name during design.

3. A derived relation is one that was defined through other, usually basic, relations by using the means of the DBMS.

4. This view is actually a named derived relation, and the view is expressed exclusively through DBMS operators applied to named relations, so they do not physically exist in the database.

5. The result of queries is an unnamed derived relation containing data (the result of a specific query). The result is not stored in the database but exists as long as the user needs it.

6. A stored relationship is one that is physically maintained in the memory of a relationship. Stored relationships most often include a relationship base. Based on the above, you can define a relational database as a set of relationships related to each other.

A connection in this case is the association of two or more relationships.

KOD	ADRES

1 1 A one-to-many relationship is that at each moment of time, each element (tuple A) corresponds to several elements of tuple B
∞ Binary Link

Students

Teachers

Timetable of classes

∞

Students

Ternary connections

Data integrity

In relational models, the issue of data integrity is given a special place. Recall that a key or potential key is the minimum set of attributes, by the values of which you can uniquely find the required tuple, minimality means that excluding any attribute from the set does not allow identifying a tuple by the remaining attributes.

Every relationship has at least one possible key. One of them is taken as the primary key.

When choosing a primary key, preference should be given to non-composite keys or keys composed of a minimal set of attributes. It is also undesirable to use keys with long text values (It is preferable to use integer attributes as keys). So to identify an employee, you can use either a unique personnel number, or a passport number, or a set of surnames, patronymic name and department number. It is not allowed that the primary key of the relationship, that is, any attribute participating in the primary key takes undefined values. In this case, a contradictory situation will arise ( collision): The non-unique element of the primary key appears. Therefore, this should be carefully considered when designing a database.

About foreign keys. It is worth noting that since relationship C connects relationship B and A, it must include foreign keys corresponding to the primary keys of relationship A and B.

The foreign key of a table is formed using multiple primary keys of other tables.

Thus, when considering the problem of choosing a way to link a relationship in a database, the question arises of what should be the foreign keys. At the same time, for each foreign key, it is necessary to solve the problem associated with the possibility (or impossibility) of the appearance of undefined values in foreign keys (NULL is the value of the attribute for the missing information). In other words, can there be some tuple in a relationship for which no tuple is known in the related relationship?

On the other hand, you need to think in advance about what happens when you remove tuples from the relationship that the foreign key refers to. However, there are the following possible possibilities:

· Operation cascades- that is, deleting tuples in a relationship leads to the deletion of tuples related by the relationship. For example, deleting information about the last name of the first name, etc. an employee in one respect results in a deletion about his wages in another respect;

· Operation limited to - that is, only those tuples for which there is no related information in another respect are deleted. Not all information is deleted (not in all respects) as it can be used in other ways, the deletion of information in which leads to a breach of data integrity. If such information is available, then the deletion cannot be carried out, for example, deletion of information about the name, surname, etc. an employee is possible only if there is no information in the related relation about his salary.

It is necessary to provide for the technology of what will happen when an attempt is made to update the primary key of a relationship referenced by some foreign key. Here you have the same options as for deleting:

· The operation is cascaded, that is, when the primary key is updated, the foreign key in the related relationship is updated. For example, updating the primary key in a relationship where employee information is stored results in an update of the foreign key in the relationship with payroll information.

· The operation is limited, that is, only those primary keys for which there is no other related information are updated. If such information is available, then the update cannot be done. For example, updating the primary key in a relationship where information about an employee is stored is possible only if there is no information about his salary in the related relationship.

Relational algebra

The formal basis of the base of the relational database model is relational algebra, based on set theory and considering a special operator over relations, and relational calculus based on mathematical logic.

Work

A A A B C C

D D E

D D

A B C

D D E

F F G

It should be noted that relational algebra is very powerful - complex database queries can be expressed using a single expression. It is for this reason that these mechanisms are included in the relational data model. Any query expressed using a single relational algebra expression, or a single relational calculus formula, can be expressed using a single operator of this language.

Relational algebra has an important property - it is closed with respect to the concept of a relation. This means that the expression of relational algebra is performed on the relations of relational databases and the results of their calculation are also relations.

The main idea of relational algebra is that the means of manipulating relations, considered as a set, are based on traditional multiple operations, supplemented by some specific operations for the database.

Let us describe a variant of algebra that was proposed by KODDOM. The operation consists of 8 main operators:

Fetching a relation (unary operation)

Relationship projection (unary operation)

Union relations

Intersection of relations (binary operation)

Subtraction of relations

Product relationship

Connecting relationships

Division of relations

These operations can be explained as follows:

· The result of selecting a relation for some condition is a relation that includes only those tuples of the original relation that satisfy this condition.

· When a relation is projected onto a given set of its attributes, a relation whose tuples are taken from the corresponding tuples of the first relation will be obtained.

· When performing the operation of joining two relations, a relation will be obtained that includes all tuples included in at least one of the relations participating in the operation.

· When performing the operation of intersection of two relations, a relation will be obtained that includes all tuples included in both initial relations.

· When performing the operation of subtracting two relations, a relation will be obtained that includes all tuples included in the first relation, except for those that are also included in the second relation.

· When performing a direct product of two relations, a relation whose tuples are a combination of the tuples of the first and second relation is obtained.

· When two relations are joined by some condition, the resulting relation of tuples is formed as a combination of tuples of the first and second relations that satisfy this condition.

· The operation of relational division has two operands - binary (consisting of two attributes) and unary (consisting of one attribute) relations. The result of the operation is a relation consisting of tuples including the relation of the first attribute of the tuples of the first relation, such that the set of values of the second attribute coincides with the set of values of the second relation.

In addition to the above, there are a number of special operations typical for working with databases:

· As a result of the renaming operation, the relation is a set of tuples, which coincides with the body of the original relation, but the names of the attributes have been changed.

It follows that the result of a relational operation is some relation, then it is possible to form relational expressions in which, instead of the initial relation (operand), a nested relational expression will be used. This is due to the fact that the operations of relational algebra are indeed closed to the concept of a relation. Let's start with the operation uniting relations, however, this applies equally to the operations of intersection and combination, that is, in relational algebra, the result of a union operation is a relation. Assuming in relational algebra the possibility amalgamations arbitrary two relations with different sets of attributes, then the result of such an operation will be many, but many different types of tuples, that is, generally speaking, not a relation. If we proceed from the requirement that relational algebra is closed with respect to the concept of a relation, then such an operation amalgamations is meaningless. This leads to the emergence of the concept relationship compatibility on amalgamation: two relations are compatible only if they have the same headers, that is, they have the same set of attribute names, and the same attributes are defined in the same domain.

Provided that two relations are compatible in terms of union, when the operation of union of intersection of subtraction is usually performed on them, the result of the operation is a relationship with a correctly defined title that coincides with the title of each of the relations - operands. If two relations are not completely compatible in terms of union, that is, they are compatible in everything except attribute names, then before performing an operation such as a connection, these relations can be made completely compatible in terms of union by applying the rename operation.

The operation of the direct product of two relations raises new problems. In Set Theory, the direct product can be obtained for any sets. The elements of the result set will be pairs made up of the elements of the first and second sets. Since relations are sets, it is possible to obtain a direct product for any two relations. However, the result will not be an attitude. The elements of the result will not be tuples, but pairs of tuples. Therefore, in relational algebra, a special form of the operation of taking a direct product is used - an extended direct product of relations. When taking the extended direct product of two relations, the element of the resulting relation is a tuple formed by merging one tuple of the first relation and one tuple of the second relation. Immediately, a second problem arises, associated with obtaining a correctly formed header of the resulting relationship, this leads to the need to introduce the concept of compatibility of relationships by taking an extended direct product.

Two relations are compatible in taking a direct product only if the set of attribute names of these relations do not overlap. Any two relations can be converted to a compatible form by taking the direct product by applying the rename operation to one of these relations.

The fetch operation requires two relations: the original relation, the operand, and a simple constraint condition. As a result of the selection operation, a title relation is produced which matches the title of the operand relation, and the body contains those tuples of the operand relation that satisfy the values of the constraint condition.

Let's introduce a number of operators.

Let union denote the union operation, intersect the intersection operation, minus the subtraction operation. To denote a selection operation, we will use the construction A where B, where A is the operand relation, and B is a simple comparison condition. Let C1 and C2 be two simple sampling conditions

A where C1 AND C2 is identical to (A where C1) intersect (A where C2)

A where C1 OR C2 is identical to (A where C1) union (A where C2)

A where C1 not C2 is identical to (A where C1) minus (A where C2)

Using these definitions, it is possible to implement selection operations in which the selection condition is an arbitrary logical expression composed of simple conditions using logical connections (and, or, not). The operation of taking projections, the relation A op to the list of attributes a1, a2,…, an will be a relation whose title is the set of attributes, a1, a2,…, an. The body of the result will consist of tuples for which in relation A there is a tuple, the a1 attribute has the b1 value, the a2 attribute the b2 value< и так далее атрибут an – bn. По сути при выполнении операции проекции определяется «Вертикальная» вырезка отношения - операнда с удалением возникающих кортежей –дубликатов.

The join operation, sometimes called conditional join, requires two operands, a joinable relationship, and a third operand, a simple condition. Let the relation A and B be connected. As in the case of the selection operation, the condition of the join C has the form, (a comp –op b) or (a comp –op const) where A and B are the names of the attributes of the relations A and B, const- literally given constant. Comp-op is a valid comparison operation in this context. Then, by definition, the result of the join operation is the relation obtained by performing the constraint operation, by condition C of the direct product of the ratio A and B.

There is an important special case of connection, natural connection. A join operation is called a natural join operation if the join conditions are (a = b) where a and b are attributes of different join operands. This case is important because it is especially common in practice and there are efficient implementation algorithms for it in a DBMS. The natural join operation is applied to a pair of relations A and B that share a common attribute P, that is, an attribute with the same name and defined on the same domain. Let ab denote the union of the headings of relations A and B. Then the natural union is the result of the union of A and B projected onto a. The operation of a natural join is not included directly in the set of operations of relational algebra, but it has very important practical significance.

The operation of dividing a relationship needs a more detailed explanation because it is difficult to understand. Let two relations A (a1, a2, .., an, b1, b2, ..., bm) be given

B (b1, b2, ..., bn) We will assume that attribute b1 of relationship A and attribute b1 of relationship B are defined on the same domain. Let's call the set of attributes (aj) a composite attribute a, and the set (bj) with a composite attribute b. After that, we will talk about the relational division of the binary relation A (a, b) by the unary relation B (b).

The result of dividing A by B is a unary relation C (a), consisting of such tuples v that in relation A there are tuples which in the set of values (w) include the set of values of b in relation to B.

Since division is the most difficult operation, let us explain it with an example. Let there be two relations in the students' database: STUDENTS (FULL NAME, NUMBER) and NAMES (FULL NAME), and the unary relation NAMES contains all the surnames that the students of the institute have. Then, after performing the operation of relational division of the relation STUDENTS by the relation NAMES, a unary relation will be obtained containing the numbers of student cards belonging to students with all surnames possible in this institute.

Relational calculus

Suppose there is a database with the structure STUDENTS (number, name, scholarship, group code), and the relation GROUP (gr_nom, gr_col, gr old). Suppose that you need to find out the names and numbers of students. tickets for students who are heads of groups with more than 25 people. In relational algebra, you need to take the following steps for such a query:

1. Perform connection of relations STUDENTS and GROUPS, according to the condition "student_number = gr_star";

2. Limit the resulting ratio by the condition gr_col> 25.

3. Project the result of the previous operation on the attribute student_name, student_number.

Here, step by step, the sequence of execution of a query in the database is formulated, each of which corresponds to one relational operation. If we formulate the same query using the relational calculus, then we would get a formula that can be read: Give STUD_NAME and STUD_NUMBER for such students so that such a group GR_STAR and the value GR_COL> 25 coexist. In the second formulation, we indicated only the characteristics of the resulting relationship, but did not say anything about the method of its formation. In this case, the DBMS must decide for itself what kind of operations and in what order it is necessary to perform on the relations STUDENTS and GROUPS. Both methods considered in the example are actually equivalent and there are not very complex conversions from one to another.

The basic concepts of relational calculus are the concepts of a variable with a certain range of its value, and the concept of a well-formed formula based on variables and specials. Functions. What is the scope of the variable? The tuple calculus and the domain calculus, that is, along or across, differ. In tuple calculus, the domains of variable definition are database relations, that is, the admissible value of each variable is a tuple of some relation. In the calculus of domains, the domains of definition of variables are the domains on which the attributes of database relations are defined, that is, the valid value of each variable is the value of each variable.

Byte

Integer

String

Char

The RANGE command is used to define tuples. For example, to define a variable STUDENT whose scope is STUDENTS, you need to use the construction RANGE STUDENT IS STUDENTS. From this definition it follows that at any moment of time the variable student represents a tuple of the relation STUDENTS. When using tuple variables in formulas, you can reference the attribute values of the variables. For example, to refer to the value of the STUD_NAME attribute of the STUDENT variable, use the STUDENT.STUD_NAME construction.

Correctly constructed formulas serve to express conditions imposed on tuple variables. Such formulas are based on simple comparisons, which are operations of comparing the values of attributes of variables and literally specified constants. For example, the construction STUDENT.STUD_NOM = 123456. Is a simple comparison. A more complex version of compound formulas is with the help of logical connections AND, OR, NOT, IF… THEN. Finally, it is allowed to build well-formed formulas using quantifiers. If F is a well-formed formula in which the variable var participates, then the EXIST (existential quantifier) var (F) and FORALL (for all tuples) var (F) constructions are correct.

Variables included in well-formed formulas can be free or bound. All variables included in their composition during the construction of which did not use quantifiers are free. This means that if for some set of values of free tuple variables when calculating formulas, the value "true" is obtained, then these values can be included in the resulting relation. If the quantifier is used in the construction of formulas, then the variables are bound. When calculating the value of such a well-formed formula, not a single value of the associated variable is used, but its entire domain of definition.

1) EXISTS STUD2 (STUD.1STUD_STIP> STUD2.STUD_STIP)

2) FORALL STUD2 (STUD.1STUD_STYP> STUD2.STUD_STIP)

Let STUD1 and STUD2 be two tuple variables defined for the student relation, then the formula for the current tuple of the STUD1 variable takes on the value true only if, in the whole relation, students there is such a tuple associated with the STUD2 variable that the value of its STUD_STIP attribute satisfies the internal comparison condition. Correctly constructed formula No. 2 for the constructed tuple STUD 1 takes on the value true if for all tuples the relation STUDENTS associated with the variable STUD 2 the value of the STUD.STYP attribute satisfies the internal condition.

Thus, well-formed formulas provide a means of expressing a selection condition from a database relationship. In order to be able to use the relational calculus for real work with the database, one more component is required that defines the set and column names of the resulting relation. This component is called target list.

Target list looks like:

· Var.attr - the name of a free variable, attr is the name of the relationship attribute on which the variable var is defined.

· Var which is equivalent to a relation from a list, Var.attr1, Var.attr1… Var.attr№ includes the names of all attributes of the defining relation.

· New_name = var.attr; the new name of the corresponding attribute of the resulting relationship.

The latter option is required in cases where the code in the formula uses several free variables with the same scope. In domain calculus, domains are not domains but domains. With regard to the GROUP STUDENTS database, we can talk about the domain variables NAME (Domain values are valid names or NOM STUD). (Domain values are valid student numbers).

The main difference between the calculus of domains and the calculus of tuples is the presence of an additional set of predicates that allow one to express the so-called membership conditions. If R is an n-ary relation with attributes (a1, a2,… an), then the membership condition has the form R (ai1: Vi1, ai2: Vi2,… aim: Vim) where (m<=n). Где в Vij это либо литерально заданная константа либо имя кортежной переменной. Условие членства принимает значение истина, только в том случае если в отношении R существует кортеж, содержащий следующие значения указанных атрибутов. Если от Vij константа то на атрибут aij накладывается жёсткое условие независящее от текущих доменных переменных. Если же Vij имя доменной переменной то условие членства может принимать различные значения при разных значениях этой переменной.

A predicate is a logical function that returns true or false for some argument. A relationship can be viewed as a predicate with arguments that are attributes of the relationship in question. If a given specific set of tuples is present in the relation, then the predicate will return true; otherwise, false.

In all other respects, formulas and expressions for domain calculus look similar to formulas and expressions for tuple calculus. Relational domain notation is at the heart of most form-based language queries.

Similar information.

A data model is a collection of data structures and operations for their processing. Using the data model, you can visualize the structure of objects and the relationships established between them. The terminology of data models is characterized by the concepts of "data element" and "binding rules". A data item describes any set of data, and the binding rules determine the algorithms for the relationship of data items. To date, many different data models have been developed, but in practice, three main ones are used. Allocate hierarchical, network and relational data models. Accordingly, they talk about hierarchical, network and relational DBMS.

О Hierarchical data model. Hierarchically organized data is very common in everyday life. For example, the structure of a higher education institution is a multilevel hierarchical structure. A hierarchical (tree-like) database consists of an ordered set of elements. In this model, the original elements give rise to other elements, and these elements in turn give rise to the next elements. Each child has only one parent.

Organizational charts, lists of materials, table of contents in books, project plans, and many other collections of data can be presented in a hierarchical manner. Integrity of links between ancestors and descendants is automatically maintained. Basic rule: no descendant can exist without its parent.

The main disadvantage of this model is the need to use the hierarchy that was laid in the basis of the database during the design. The need for constant data reorganization (and often the impossibility of this reorganization) led to the creation of a more general model - network.

О Network data model. The networked approach to organizing data is an extension of the hierarchical approach. This model differs from the hierachic one in that each generated element can have more than one parent element. ■

Since the network database can directly represent all types of relationships inherent in the data of the corresponding organization, this data can be navigated, explored and queried in all sorts of ways, that is, the network model is not connected by just one hierarchy. However, in order to compose a query to a network database, it is necessary to delve deeply into its structure (to have a diagram of this database at hand) and develop a mechanism for navigating the database, which is a significant drawback of this database model.

About the relational data model. The basic idea behind a relational data model is to represent any dataset as a two-dimensional table. In its simplest case, a relational model describes a single two-dimensional table, but most often this model describes the structure and relationships between several different tables.

Relational data model

So, the purpose of the information system is to process data about objects the real world, taking into account connections between objects. In DB theory, data is often called attributes, and objects - entities. Object, attribute and connection are fundamental concepts of I.S.

An object(or entity) is something that exists and discernible, that is, an object can be called that "something" for which there is a name and a way to distinguish one similar object from another. For example, every school is an object. Objects are also a person, a class at a school, a firm, an alloy, a chemical compound, etc. Objects can be not only material objects, but also more abstract concepts that reflect the real world. For example, events, regions, works of art; books (not as printed products, but as works), theatrical performances, films; legal norms, philosophical theories, etc.

Attribute(or given)- this is some indicator that characterizes a certain object and takes a certain numerical, textual or other value for a specific instance of the object. The information system operates with sets of objects designed in relation to a given subject area, using specific attribute values(data) of certain objects. For example, let's take classes in a school as a collection of objects. The number of students in a class is a given, which takes a numerical value (one class has 28, another has 32). The name of the class is a given one that takes a textual meaning (one has 10A, the other has 9B, etc.).

The development of relational databases began in the late 1960s, when the first papers appeared discussing; the possibility of using in the design of databases the usual and natural ways of presenting data - the so-called tabular datalogical models.

The founder of the theory of relational databases is considered to be an employee of IBM, Dr. E. Codd, who published the article A Relational Model of Data for Large-Shared Data Banks(Relational data model for large collective data banks). This article first used the term "relational data model." The theory of relational databases, developed in the 70s in the United States by Dr. E. Codd, has a powerful mathematical foundation describing the rules for efficiently organizing data. The theoretical basis developed by E. Codd became the basis for the development of the theory of database design.

E. Codd, being a mathematician by education, suggested using the apparatus of set theory (union, intersection, difference, Cartesian product) for data processing. He proved that any dataset can be represented as two-dimensional tables of a special kind, known in mathematics as "relations."

Relational such a database is considered in which all data is presented to the user in the form of rectangular tables of data values, and all operations on the database are reduced to manipulating tables.

The table consists of columns (fields) and lines (records); has a name that is unique within the database. table reflects Object type the real world (essence), and each of her string is a specific object. Each column in a table is a collection of values for a particular attribute of an object. The values are selected from the set of all possible values of the object attribute, which is called domain.

In its most general form, a domain is defined by specifying some basic data type, to which the elements of the domain belong, and an arbitrary logical expression applied to the data elements. If, when evaluating a logical condition on a data item, the result is "true", then this item belongs to the domain. In the simplest case, a domain is defined as a valid potential set of values of the same type. For example, the set of dates of birth of all employees constitutes the “date of birth domain”, and the names of all employees constitute the “employee name domain”. The date of birth domain has a data type that stores information about points in time, and the employee name domain must have a character data type.

If two values come from the same domain, you can compare the two values. For example, if two values are from the date of birth domain, you can compare them to determine which employee is older. If the values are taken from different domains, then their comparison is not allowed, since, in all likelihood, it does not make sense. For example, nothing definite will come out of comparing the name and date of birth of an employee.

Each column (field) has a name, which is usually written at the top of the table. When designing tables within a specific DBMS, it is possible to select for each field its type of, that is, define a set of rules for displaying it, as well as define those operations that can be performed on the data stored in this field. The sets of types may differ for different DBMSs.

The field name must be unique in the table, however different tables can have fields with the same name. Any table must have at least one field; the fields are arranged in the table in accordance with the order of their names when it was created. Unlike fields, strings do not have names; their order in the table is not defined, and the number is not logically limited.

Since the rows in the table are not ordered, it is impossible to select a row by its position - there is no “first”, “second”, “last” among them. Any table has one or more columns, the values in which uniquely identify each of its rows. Such a column (or combination of columns) is called primary key... An artificial field is often introduced to number records in a table. Such a field, for example, can be its ordinal, which can ensure the uniqueness of each record in the table. The key must have the following properties.

Uniqueness. At any moment in time, no two different tuples of the relation have the same value for the combination of the attributes included in the key. That is, there cannot be two lines in the table that have the same identification number or passport number.

Minimality. None of the attributes included in the key can be excluded from the key without violating the uniqueness. This means that it is not necessary to create a key that includes both the passport number and the identification number. It is sufficient to use any of these attributes to uniquely identify the tuple. You should also not include a non-unique attribute in the key, that is, it is forbidden to use a combination of an identification number and an employee's name as a key. By excluding the employee name from the key, you can still uniquely identify each row.

Each relation has at least one possible key, since the totality of all its attributes satisfies the uniqueness condition - this follows from the very definition of the relation.

One of the possible keys is randomly selected in as the primary key. The remaining possible keys, if any, are taken as alternative keys. For example, if you select an identification number as the primary key, then the passport number will be an alternate key.

The relationship of tables is an essential element of the relational data model. She is supported foreign keys.

When describing a relational database model, different terms are often used for the same concept, depending on the level of description (theory or practice) and the system (Access, SQL Server, dBase). Table 2.3 summarizes the terms used.

Table 2.3. Database terminology

Database Theory ____________ Relational Databases _________ SQL Server __________

Relation Table Table

Tuple Record Row

Attribute Field _______________ Column

Relational databases

Relational database is a set of relations containing all the information that must be stored in the database. That is, the database represents the set of tables required to store all of the data. Relational database tables are logically related to each other. Design requirements for a relational database can be summarized in a few rules.

О Each table has a unique name in the database and consists of rows of the same type.

О Each table consists of a fixed number of columns and values. More than one value cannot be stored in one column of a row. For example, if there is a table with information about the author, publication date, circulation, etc., then the column with the author's name cannot contain more than one surname. If the book is written by two or more authors, you will have to use additional tables.

О At no point in time will there be two rows in the table that duplicate each other. Rows must differ by at least one value in order to be able to uniquely identify any row in the table.

О Each column is assigned a name that is unique within the table; a specific data type is set for it so that homogeneous values (dates, surnames, phone numbers, sums of money, etc.) are placed in this column.

О The complete informational content of the database is represented in the form of explicit values of the data itself, and this method of presentation is the only one. For example, the relationship between tables is based on the data stored in the corresponding columns, and not on the basis of any pointers that artificially define the relationship.

О When processing data, you can freely refer to any row or column of the table. The values stored in the table do not impose any restrictions on the order in which the data is accessed. Column description,

Relational model

The relational database model was proposed in 1969 by E.F. Codd (E.F. Codd). For some introductory information on relational databases, see the overview article “ DB and DBMS"2. Since at present it is relational databases that are dominant, in this article (as well as in articles" Description of data”, “Data processing" and " Database design”2) the most essential concepts of the relational model are examined in detail.

Immediately, we note that the theory of relational databases was originally formulated in a rigorous mathematical language, and it is rigorous, formally defined mathematical concepts that best describe the essence of things. At the same time, in most cases, it is possible, without much damage, to sacrifice the rigor of terminology in favor of the transparency of the presentation, which we will try to do.

The basic idea behind the relational model is as follows. The database consists of a number of unordered tables(in the simplest case - from one table). Tables can be manipulated through non-procedural (declarative) operations - requests, the results of which are also tables.

Often the word "relational" ( relational) in the term "relational model" is interpreted based on the fact that links are established in a relational database ( relate) between tables. This explanation is convenient, but not precise. In Codd's original system of terms, the communication terms ( relations), attributes ( attributes) and tuples ( tuples) were used where most of us use the more familiar terms tables, columns (fields), and rows (records).

When building an infological model of the subject area (see “ DB and DBMS”, “Database design”2) stand out entities(objects), they are described properties a (characteristics, attributes) that are significant for modeling purposes, and relationships are established between entities. At the stage of transition from an infological to a datalogical relational model, tables appear. Typically, each entity is represented by one table. Each row of the table (one record) corresponds to one instance of the entity, and each field describes some property (attribute).

For example, if we need to store information about people, including the last name of each, first name, patronymic, TIN, country of residence and date of birth, then the entity is exactly the person, and the specified data are attributes. The entity itself naturally becomes the name of the table.

Table "Person"

The relational model requires each row in the table to be unique, i.e. so that any two lines differ in the value of at least one attribute.

The traditional tabular form is useful when you need to present the data itself. If, as in the above example, you are only interested in structure- field names, then from the point of view of clarity, ease of use in diagrams and saving space, it is more convenient to depict it as follows:

Keys

Key tablesis a field or group of fields containing values that are unique within this table... The key uniquely identifies the corresponding row in the table. If the key consists of one field, it is often called simple if out of several - composite... In the above example, the key is the TIN field (we consider it known that TINs are unique within the country).

Let's look at an example of a table with a composite key. It is not uncommon for weather forecast sites to present information in the following way: for each date, the forecast temperature is indicated at night, in the morning, in the afternoon and in the evening. To store the specified information, you can use a table of the following form:

In this table, neither the Date field, nor the Time of day, nor the Temperature are keys - values can be repeated in each of these fields. But the combination of Date + Time of day fields is unique and uniquely defines a table row. This is the composite key.

Often there is a situation in which the choice of a key is not unambiguous. Let's go back to the first example. Suppose, in addition to the surname, first name, patronymic, TIN, date of birth, it is required to store the series and number of the civil passport and the series and number of the foreign passport. The table will look like this.

There are as many as three keys to choose from in this table. One of them is simple (TIN), the other two are composite (Series + Civil passport number and Series + Foreign passport number). In such a situation, the developer chooses the most convenient key from the point of view of organizing the database (in general, the key, the search for the value of which takes the least time). The chosen key in this case is often called the master, or primary, a key, and other combinations of columns from which a key can be made are possible, or alternative, keys. Note that there is always at least one possible key in the table, since rows cannot be repeated and, therefore, the combination of all columns is guaranteed to be a possible key.

When displaying tables, it is customary to highlight the primary keys of the tables. For example, relevant fields are often underlined. And Microsoft Access bolds key fields.

Even more often than with ambiguity in the choice of a key, developers are faced with the absence of a key among the data that needs to be stored. A similar fact can be established in the process of analyzing the subject area. For example, if you need to store a simple list of people - first names, surnames, patronymics and dates of birth, then there is no key in this set of attributes at all - a situation is conceivable when two different people have the same data completely. In this case, you have to artificially enter an additional field, for example, a unique person's number. Such a key is sometimes called in the literature surrogate... Quite often, a surrogate key is also introduced for reasons of efficiency. If, for example, a table has a long composite key, then developers often introduce an additional short numeric surrogate key and make it the primary one. Often this is done even if there is a simple key that has an “inconvenient” (ineffective for searching) data type, for example, a string. Such operations are no longer related to theory, but they are quite common in practice.

The attentive reader will probably notice that the key can almost always be expanded (unless it includes all the fields of the table) by including redundant fields. Formally, such a key will remain a key, but from a practical point of view, it is just a game of concepts. Such keys are not considered possible either, since it is always necessary to strive to minimize the length (complexity) of the key.

Normal forms, normalization

Not every table that we can draw on paper or in Word can be a relational database table. And not every table that can be used in a relational database is correct in terms of the relational model requirement.

At first, requires all data within one column to be of the same type(for types, see“Description of data”2). From this point of view, the example below does not even make sense to discuss:

Secondly, requires the table to be assigned a primary key.

These requirements are necessary, but not sufficient. The theory of relational databases introduces the concept of so-called "normal forms" - requirements for organizing data in tables. Normal forms are numbered sequentially as requirements become more stringent. In a properly designed database, tables are in at least third normal form. Accordingly, we will consider the first three normal forms. Recall that we are dealing with tables that satisfy the two basic requirements formulated above.

First normal form (1NF)

The first normal form dictates that all data contained in the table must be atomic(indivisible). The list of corresponding atomic data types is determined by the DBMS. The 1NF requirement is completely natural. It means that each field of each record should contain only one value, not an array or any other data structure. Let's give a meaningful example of a table that is not in 1NF. Suppose we have lists of grades of students in a certain subject.

Since the value of the Evaluation field is not atomic, the table does not meet the requirements of 1NF.

A possible way of presenting the list of grades is written in the guidelines for the article "DB design" 2.

Second normal form (2NF)

A table is said to be in second normal form if it is in 1NF and each non-key column is completely dependent on the primary key. In other words, the value of each field must be completely determined by the value of the primary key. It is important to note that dependence on the primary key is understood precisely as dependence on the key as a whole, and not on its individual component (in the case of a composite key). Here is an example of a table that is not in 2NF. To do this, let's go back to the weather forecast example and add another column to the table - the time of sunrise (this is a completely plausible example, this kind of information is often given on weather forecast sites).

As we remember, this table has a composite key Date + Time of day. The Temperature field is completely dependent on the primary key - there is no problem with it. But the Sunrise field depends only on the Date field. The time of day does not naturally affect the sunrise time.

Here it is appropriate to ask the question: what is the practical meaning of 2NF? What is the use of these restrictions? It turns out - big. Let's say that in the example above, the developer ignores the 2NF requirements. First, the so-called redundancy- storage of unnecessary data. After all, if the sunrise time is already stored for one record with a given date, then for all other records with a given date it should be the same and, generally speaking, there is no need to store it.

Let's pay attention to the words “should be”. And if not? Indeed, at the database level, this is not controlled in any way - the key in the table is composite, the dates can be the same (and most likely will be in the sense). And no formal restrictions (and our understanding that “this cannot be” does not apply to such) do not prohibit specifying different sunrise times for the same date.

Third normal form (3NF)

A table is said to be in 3NF if it matches 2NF and all non-key columns are mutually independent.

It is convenient to understand the interdependence of columns as follows: columns are mutually dependent if you cannot change one of them without changing the other.

Let's give an example of a table that is not in 3NF. Consider an example of a simple address book for storing the home phone numbers of people living, perhaps, in different regions of the country.

In this table, there is a relationship between non-key columns City and City code, therefore, the table is not in 3NF.

Note that the developer determines the presence of the above dependence by analyzing the subject area - such a collision cannot be seen by any formal methods. When you change the properties of the subject area, the dependence between the columns may disappear. For example, if different codes are entered within the same city (like 495 and 499 in Moscow), the corresponding columns cease to be related in terms of violation of the requirements of 3NF.

In the theory of relational databases, forms of higher orders are also considered - the normal form of Boyes - Codd, 4NF, 5NF and even higher. These forms are not of great practical importance, and developers, as a rule, always stop at 3NF.

Database normalization

Normalization is the process of converting database tables to a chosen normal form. Normalization to 2NF, as a rule, comes down to decomposition - splitting one table into several. Normalization to 3NF can usually be done by removing dependent (calculated) columns. In some cases, when normalizing to 3NF, you also have to perform decomposition.

Multi-table databases, relationships between tables, foreign keys

In practice, single-table databases are quite rare, since from the point of view of modeling a domain database, the presence of one table means the presence of one entity. In turn, the presence of several entities usually means the presence of relationships between them.

Without intending to design a complete database, consider an example that allows you to demonstrate relationships in multi-table databases.

Suppose we are dealing with a school that has students grouped by grade and teachers teaching certain subjects. We immediately distinguish four entities: students, teachers, classes and subjects. These entities already give us four tables.

Next, we need to solve the issue of entity attributes - what kind of information we will store. Since our example is for demonstration purposes only, we will try to minimize the amount of stored information. We agree to store the surname and first name for each student, for the class - the parallel number and the letter identifying the class inside the parallel, for the teacher - the last name, first name and patronymic, for the subject - only its name.

Now we have to deal with the primary key issue. The tables of students and teachers, in principle, do not have a key, so we will enter a surrogate numeric key in them - a number. Class and item tables generally have keys. In the table of classes, the key is composite, it is formed by the attributes Parallel Number + Letter, and in the table of objects, a simple key consists of a single field - the name of the object. Recall that when we talk about keys, we mentioned that surrogate keys are often added for efficiency reasons, in an effort to get rid of composite keys or key fields of inconvenient types, such as strings. This is what we will do. Let's add a surrogate numeric key to each of the tables.

As a result, we will get the following set of tables corresponding to the described entities.

Understanding which subject area we are dealing with, we know that our entities do not exist by themselves - they are connected by some of the relationships that we outlined above. But how to connect them technically? You can't do without introducing additional fields and even additional tables. Let's deal with the relationships between entities in order.

To assign a student to a certain class, we will add an additional field Class number in the “Student” table. (It is clear that its type must completely coincide with the type of the field Class number in the table "Class".) Now we can link the tables "Student" and "Class" by the coinciding values of the fields Class number (we did not accidentally name these fields the same, in practice this is often done to easily navigate the binding fields). Note that one record in the “Class” table can correspond to many records in the “Student” table (and in practice it most likely corresponds - it is difficult to imagine a class with one student). Such tables are said to be related by the relationship “ one to many”. And the Class Number field in the "Student" table is called foreign key... As you can see, the purpose of foreign keys is to link tables. Note that the foreign key always refers to the primary key of the related table (ie, the foreign key is on the “many” side). The associated primary key is called parent although this term is less commonly used.

Let's illustrate what has been said with a schema in the style of Microsoft Access (for more information about the Access Data Scheme, see the article "Description of data" 2).

Now let's think about teachers and subjects. Analyzing the subject area (this is the only way - after all, the true state of affairs cannot be extracted from the formal model itself), we notice that the type of connection between the entities “teacher” and “subject” is different from that discussed above. After all, not only one subject can be taught by many teachers, but one teacher can teach many subjects. Thus, there is a connection between these entities “ many to many”. Introducing additional fields is no longer enough (try it!). Many-to-many relationships are always resolved by introducing an additional table. Namely, we will organize the “Teacher-Subject” table, which has the following structure:

Table "Teacher-Subject"

This table has a composite key formed from two of its fields. Both the Teacher table and the Subject table are related to this table in a one-to-many relationship (of course, in both cases, “many” are on the Teacher-Subject side). Accordingly, there are two foreign keys in the “Teacher-Subject” table (both are part of a composite primary key, which is not prohibited) that serve to link with the corresponding tables.

In practice, in addition to the considered "one-to-many" and "many-to-many" relationships, there is also the relationship " one to one”. From the point of view of theory, such a relationship is not of interest, since two tables, connected by a one-to-one relationship, can always be simply combined into one. However, in real databases, a one-to-one relationship is used to optimize data processing. Let us illustrate what has been said with an example.

Let's say we store a lot of various information about people - the data of their various documents, phone numbers, addresses, etc. Most likely, most of this data will be used very rarely. And often we only need a surname, name, patronymic and telephone number. Then it makes sense to organize the two tables and link them in a one-to-one relationship. Store frequently used information in one (small) table, and the rest in another. Naturally, tables in a one-to-one relationship have the same primary key.

Integrity rules

The relational model defines two general rules for database integrity: object integrity and referential integrity.

Integrity rule objects very simple. It requires primary keys of tables not to contain null values.

Referential integrity rule requires foreign keys not to contain values inconsistent with parent keys... Returning to the above example, we must require, for example, that students belong only to the class whose number is indicated in the table “Classes”.

Most DBMSs know how to monitor data integrity (of course, this requires appropriate efforts from the developer at the stage of describing data structures). In particular, mechanisms are used to maintain referential integrity cascading operations. Cascading means, in particular, that when a record is deleted from a “parent” table, linked to another table in a one-to-many relationship, all related records are automatically deleted from the “many” table (by the DBMS itself, without user intervention). And this is natural, because such records "hang in the air", they are no longer connected with anything.

Indexing

Indexing is extremely important from the point of view of practical application, but optional from the point of view of pure theory. The main purpose of indexing is to optimize (speed up) search (and, accordingly, some other operations with the database). Indexing in any case requires additional resources (at the physical level, special index files are most often created). Operations related to data modification, indexing can even slow down, therefore, indexes are usually rarely modified tables, which are often searched.

The index file is very similar to the index of a regular book. For each index value, a list of table rows that contains the given value is stored. Accordingly, you do not need to look through the entire table for searching - you just need to look into the index. However, modifying records may require rebuilding the index. And this takes extra time.

Of course, there is no question of teaching relational database theory as part of a basic computer science course! Nevertheless, this article is very important for our encyclopedia, since in this case we are dealing with material that cannot be fully presented in the lessons, but the teacher must own it. Why?

Firstly, because a number of concepts are studied just within the framework of the basic course. This is both a tabular view of data and table keys. And we all know that it is very difficult to correctly and accurately state only some concepts without presenting the general picture.

Secondly, performing simple database queries with children (the relevant material is presented in the article "Data processing" 2), it is necessary to deal with tables that are correct from the point of view of relational theory. There is no need to explain to the students that these tables are correct, but “if…, then the table would be wrong,” but it is unacceptable to use bad examples.

In a specialized computer science course, the situation may be fundamentally different. The most important and extremely productive form of work in specialized classes is design. Within the framework of educational projects, it is possible and necessary to develop simple databases, and here one cannot do without the foundations of the stated theory. However, the following must be taken into account:

The domains to be modeled should not be too large;

Students should be very familiar with them (in this sense, the “School” project, which is pretty boring for everyone, is not the worst choice!);

It is naive to expect that after listening to the fundamentals of the theory, students will be able to design something themselves. Each step must be taken with them, giving detailed reasons for their actions.

Database management systems and expert systems. Basic concepts of relational databases. Working with queries. Forms. Reports. Database creation.

Database management systems and their functions

In modern database technology for the creation of databases, their support and maintenance, specialized software is used - database management systems. DBMS is a set of software and language tools necessary for the creation and operation of databases.

At the stage of developing databases, the DBMS is used to describe the structure of the database: defining tables; determining the number of fields; the type of data displayed in them; the size of the fields; defining relationships between tables. In addition to tables, most DBMS provide for the creation of special tools for working with data - forms, queries.

During the operation of databases, the DBMS provides editing of the database structure, filling it with data, searching, sorting, selecting data according to specified criteria, and generating reports.

In information systems that run on IBM-compatible personal computers, the so-called dBASE-like database management systems, for example, dBASE, FoxPro and Clipper, have become widespread. For users, it is essential that, differing from each other in command languages and the format of index files, all these DBMS use the same database files with the .DBF extension, the format of which has for some time become a kind of database standard.

In dBASE-like databases, a relational approach to data organization is actually used, i.e. each DBF file is a two-dimensional table that consists of a fixed number of columns and a variable number of rows (records). In terms of technical documentation, each column corresponds to a field of one of five types (N - numeric, C - character, D - date, L - logical, M - note), and each row - a fixed length record consisting of a fixed number fields. Using the command languages of these DBMSs, layouts of DBF files (table descriptions) are created and corrected, index files are created, procedures for working with databases are described (reading, searching, modifying data, generating reports, and much more). A characteristic feature of the DBF file is its simplicity and clarity: the physical representation of data on disk exactly corresponds to the representation of a table on paper. In general, however, systems built from DBF files should be considered obsolete.

Other DBMSs (with a different file format) are also very popular - Paradox, Clarion, etc. It should be emphasized that the listed systems are descended from MS-DOS, but now almost all of them have been improved and have versions for Windows.

Among modern relational systems, the most popular DBMS for Windows is Access by Microsoft, Approach by Lotus, Paradox by Borland. Many of these systems support OLE technology and can manipulate not only numerical and textual information, but also graphic images (pictures, photographs) and even sound fragments and video clips.

The listed DBMSs are often called desktop, meaning the relatively small amount of data served by these systems. However, not only individual users but also entire teams (especially in local area networks) often work with them.

At the same time, more powerful relational DBMS with so-called SQL access are gradually moving to the center of modern information technology. These DBMSs are based on the client-server technology. Among the leading manufacturers of such systems are Oracle, Centura (Gupta), Sybase, Informix, Microsoft and others.

Data types in databases

Information systems work with the following basic types of data.

Text data... The meaning of each text (character) data is represented by a set of arbitrary alphanumeric characters, the length of which most often does not exceed 255 (for example, 5, 10, 140). The names and positions of people, the names of firms, products, devices, etc., are represented in the IS as text data. In a particular case, the value of a textual data can be the name of a file that contains unstructured information of arbitrary length (for example, a biography or a photograph of an object). In fact, it is a structured link that allows you to dramatically expand the information content of your table.

Numeric data... Data of this type is usually used to represent attributes, with the values of which you need to perform arithmetic operations (weights, prices, coefficients, etc.). A numeric data, as a rule, has additional characteristics, for example: an integer 2 bytes long, a floating point number (4 bytes) in a fixed format, etc. The separator of the integer and fractional parts is usually a period.

Date and / or time data... Date type data are specified in some format known to the machine, for example, - DD.MM.YY (day, month, year). At first glance, this is a special case of text data. However, the use of a special type for date in IS has the following advantages. First, the system gets the ability to maintain tight control (for example, the value of the month can only be discrete in the range 01-12). Secondly, it becomes possible to automatically represent the date format depending on the traditions of a particular country (for example, in the United States, the MM-DD-GT format is adopted). Third, when programming, arithmetic operations with dates are greatly simplified (try, for example, manually calculating the date 57 days after a given number). The use of this type of time has the same advantages.

Logical data... This type of data (sometimes called boolean) can take only one of two mutually exclusive values - True or False (conventionally: 1 or 0). In fact, it is a switch, the value of which can be interpreted as "Yes" and "No" or as "True" and "False". The boolean type is convenient to use for those attributes that can take one of two mutually exclusive values, for example, having a driver's license (yes-no), liable for military service (yes-no), etc.

OLE Object Fields... The value of such data can be any OLE object that is available on the computer (graphics, sound, video). In particular, the list of students can include not only a static photo of the student, but also his voice.

Custom types... In many systems, users are given the opportunity to create their own data types, for example: "Day of the week" (Monday, Tuesday, etc.), "Address" (postal code - city - ...), etc.

In a particular case, the value of the text data can be a collection of spaces, and the value of the numeric data can be zero. If no information is entered into the table at all, the value will be Null. Null (no data) should not be confused with zero or spaces. In many systems, it is important for the user to record the absence of data for some object instances (for example, the absence of an address, "Address is Null"). If you accidentally enter a space in such a row of the table, the system will consider that the address has been specified, and this instance will not be included in the list of objects with missing addresses.

Relational databases

The most convenient for both the user and the computer is the presentation of data in the form of a two-dimensional table - most modern information systems work with just such tables. Databases that are composed of two-dimensional tables are called relational , (in English "relation" - relation). The basic idea behind the relational approach is to represent an arbitrary data structure as a simple two-dimensional table.

An example of implementing a relational data model would be a table with information about students.

As you can see from the above example, a relational table has the following properties:

· Each row of the table - one data element (information about one student);

· All columns in the table are homogeneous, i.e. All items in a column are of the same type and length (for example, the Name column displays character-type student names with a maximum of 17 characters);

· Each column has a unique name (for example, there are no two Name columns in the table);

· Identical lines in the table are not allowed (each student is recorded only once);

· The order of the rows and columns in the table can be arbitrary (an entry about the student in the table is made upon admission to school, and the order of the columns does not matter).

Building blocks of a relational database

Let's take a look at the main structural elements of a database using a relational table as an example.

1. In relational databases, any set of data is represented in the form of two-dimensional tables (relations), similar to the list of students described above. Moreover, each table consists of a fixed number of columns and a certain (variable) number of rows. Column descriptions are commonly referred to as table layouts.

2. Each column of the table represents a field - an elementary unit of the logical organization of data, which corresponds to an indivisible unit of information - an attribute of the data object (for example, the student's surname, address).

The following characteristics are used to describe the field:

· Field name (for example, personal file number, surname);

· Field type (for example, character, date);

· Additional characteristics (field length, format, precision).

For example, the field Date of Birth can be of type "date" and length 8 (6 digits and 2 dots separating the day, month and year in the date record).

3. Each row of the table is called a record. The record logically unites all the fields that describe one data object, for example, all the fields in the first row of the above table describe the data about the student Petrov Ivan Vasilievich born 03/12/89, living at ul. Gorky, 12-34, studying in grade 4A, personal file number - P-69. The system numbers the records in order: 1,2, ..., n, where n is the total number of records (rows) in the table at the moment. In contrast to the number of fields (columns) in the table, the number of records during the operation of the database can change as desired (from zero to millions). The number of fields, their names and types can also be changed, but this is already a special operation called changing the table layout .

4. In the structure of the file record, fields are specified, the values of which are a simple key, that identify the record instance. An example of such a simple key in the Students table is the personal file number field, the value of which uniquely identifies one object in the table - one student, since the table does not contain two students with the same personal file number.

5. Each field can be included in several tables (for example, a field Surname may be included in the table List of those engaged in the theater group).