What the standard part of the ASCII encoding table contains. Encoding text information

EVE (End of Blockette) - the end of a nested block. Today this code, separating elements of the same record, would be called “field end”.

EOF (End of File) - end of message (end of transmission, end of data file).

ASCII encoding scheme

The first attempt to standardize character codes for computers took place in 1963 in the USA, when the first version of the ASCII standard (pronounced “aski”) was created. That coding system was not entirely successful and drew many complaints, so a second, more successful version was soon prepared and adopted in 1968. It is still in use today. The name of the standard stands for American Standard Code for Information Interchange. It was put into effect by the American National Standards Institute (ANSI).

The ASCII table is designed for seven-bit encoding of 128 different characters (2^7 = 128). This is enough to represent the lowercase and uppercase letters of the English alphabet, punctuation marks, digits, signs of mathematical operations, and some special signs such as @, # and $.

The first 32 codes of the ASCII table (0 to 31) do not represent printable characters. This area is reserved for special characters:

control codes (used to control remote devices, for example printers);

formatting codes (used for special formatting of messages);

delimiter codes (used to structure transmitted data sets).
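The split between control codes and printable characters can be checked with a minimal Python sketch. It relies on one fact the text does not mention: code 127 (DEL) is also a control code, so the seven-bit table holds 33 control codes and 95 printable characters. The variable names are illustrative only.

```python
# Classify the 128 seven-bit ASCII codes.
# Codes 0-31 and 127 (DEL) are control codes; 32-126 are printable.
control = [c for c in range(128) if c < 32 or c == 127]
printable = [c for c in range(128) if 32 <= c <= 126]

print(len(control), len(printable))   # 33 95
print(repr(chr(10)), repr(chr(65)))   # '\n' 'A'
```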

Domestic 8-bit text encoding schemes

The active introduction of national standards for encoding text characters dates back to the 1970s. These processes affected the whole of Europe, and the Soviet Union did not stand aside: the first national 8-bit encoding schemes were approved.

With 8-bit encoding, each character is written in one byte, which has 256 distinguishable states. This allows bilingual character sets, such as English and Russian, to be encoded. The English part is placed in the lower half of the table (codes 0 to 127), and the national part in the upper half (codes 128 to 255).

ISO-8859 encoding scheme

Formally, this encoding scheme has the highest priority for Russia, because it is approved by the International Organization for Standardization (ISO). In the ISO-8859 standard, Cyrillic characters (Cyrillic is the writing system of many Slavic languages) are assigned the so-called “fifth code page”, so this standard is also called ISO 8859-5.

In practice, documents using this scheme are rare, especially on IBM PC computers. This encoding is more often found in documents created on Sun computers. Despite its low prevalence, this coding system has


To use ASCII correctly, it helps to know more about it and about what the encoding can do.

What is it?

ASCII is a set of printable characters (see screenshot No. 1), typed on a computer keyboard and used to transmit information, together with some control codes. In other words, letters of the alphabet and decimal digits are assigned numeric codes that represent them and carry the necessary information.

ASCII was developed in America, so the standard table contains the English alphabet, digits and other characters, 128 codes in total. But then a fair question arises: what to do if a national alphabet needs to be encoded?

Other versions of the ASCII table were developed to address this. For languages with a different alphabet, letters of the English alphabet were either removed, or additional characters of the national alphabet were added. Thus, an ASCII-based encoding may contain Russian letters for national use (see screenshot No. 2).

Where is the ASCII coding system used?

This coding system is needed not only for typing text on the keyboard. It is also used in graphics. For example, in the ASCII Art Maker program, images in various formats are rendered as arrangements of ASCII characters (see screenshot No. 3).

As a rule, such programs fall into two groups: those that work as graphic editors, turning an image into text, and those that convert an image into ASCII graphics. The well-known emoticon (the “smiley”) is also an example of a picture made of encoded characters.

This encoding can also be used when writing an HTML document. For example, you enter a specific set of characters that stands for a code, and when the page is viewed, the symbol corresponding to that code is displayed on the screen.

Among other things, this type of encoding is needed when creating a multilingual website, because characters that are not included in a particular national table have to be replaced with their codes. Readers who work in information and communication technologies (ICT) will also find it useful to get acquainted with the following:
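The mechanism described here matches HTML numeric character references (`&#65;` for the character with code 65). The sketch below uses `html.unescape` from the Python standard library to show what a browser would display. The helper `numeric_reference` is a hypothetical name introduced only for this illustration.

```python
import html

def numeric_reference(ch: str) -> str:
    """Build a decimal HTML character reference for one character (illustrative helper)."""
    return f"&#{ord(ch)};"

print(html.unescape("&#65;"))    # A  (decimal code 65)
print(html.unescape("&#x41;"))   # A  (the same code in hexadecimal)
print(numeric_reference("@"))    # &#64;
```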

  1. Portable character set;
  2. Control characters;
  3. EBCDIC;
  4. VISCII;
  5. YUSCII;
  6. Unicode;
  7. ASCII art;
  8. KOI-8.

ASCII Table Properties

Like any systematic scheme, ASCII has its own characteristic properties. For example, the decimal digits 0 to 9 are encoded so that the low four bits of each code are the binary value of the digit (the digit 8 has the code 0111000, whose low four bits are 1000).

Uppercase and lowercase forms of the same letter differ by a single bit, which greatly simplifies checking and changing case.
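Both properties are easy to verify. The following sketch assumes its input is a single ASCII digit or letter; the function names are made up for the example. In ASCII the case bit has the value 0x20 (32), which is why D is 68 and d is 100.

```python
def digit_value(ch: str) -> int:
    """Value of an ASCII digit: the low four bits of its code."""
    return ord(ch) & 0x0F

def toggle_case(ch: str) -> str:
    """Flip the single bit (0x20) that separates upper and lower case in ASCII."""
    return chr(ord(ch) ^ 0x20)

print(format(ord("8"), "07b"), digit_value("8"))   # 0111000 8
print(toggle_case("D"), toggle_case("d"))          # d D
```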

ASCII is usually handled as an eight-bit encoding, one character per byte, although it was originally designed as a seven-bit code.

Using ASCII in Microsoft Office programs

If necessary, this encoding can be used in Microsoft Notepad and Microsoft Office Word. In these applications a document can be saved in ASCII format, but some features will then be unavailable when typing text.

In particular, bold and other font styles will not be available, because the encoding preserves only the content of the typed text, not its appearance. You can add such codes to a document using the following software applications.

8-bit encodings: ASCII, KOI-8R and CP1251

The first encoding tables created in the United States did not use the eighth bit of a byte. Text was represented as a sequence of bytes, but the eighth bit was not taken into account (it was used for service purposes).

This table became the generally accepted standard, ASCII (American Standard Code for Information Interchange). The first 32 codes of the ASCII table (00 to 1F hexadecimal) were used for non-printing characters, designed to control a printing device and similar equipment. The rest, from 20 to 7F, are regular (printable) characters, except for 7F (DEL).

Table 1 - ASCII encoding

Dec Hex Oct Char Description
0 0 000 null
1 1 001 start of heading
2 2 002 start of text
3 3 003 end of text
4 4 004 end of transmission
5 5 005 inquiry
6 6 006 acknowledge
7 7 007 bell
8 8 010 backspace
9 9 011 horizontal tab
10 A 012 new line
11 B 013 vertical tab
12 C 014 new page
13 D 015 carriage return
14 E 016 shift out
15 F 017 shift in
16 10 020 data link escape
17 11 021 device control 1
18 12 022 device control 2
19 13 023 device control 3
20 14 024 device control 4
21 15 025 negative acknowledge
22 16 026 synchronous idle
23 17 027 end of trans. block
24 18 030 cancel
25 19 031 end of medium
26 1A 032 substitute
27 1B 033 escape
28 1C 034 file separator
29 1D 035 group separator
30 1E 036 record separator
31 1F 037 unit separator
32 20 040 space
33 21 041 !
34 22 042 "
35 23 043 #
36 24 044 $
37 25 045 %
38 26 046 &
39 27 047 '
40 28 050 (
41 29 051 )
42 2A 052 *
43 2B 053 +
44 2C 054 ,
45 2D 055 -
46 2E 056 .
47 2F 057 /
48 30 060 0
49 31 061 1
50 32 062 2
51 33 063 3
52 34 064 4
53 35 065 5
54 36 066 6
55 37 067 7
56 38 070 8
57 39 071 9
58 3A 072 :
59 3B 073 ;
60 3C 074 <
61 3D 075 =
62 3E 076 >
63 3F 077 ?
64 40 100 @
65 41 101 A
66 42 102 B
67 43 103 C
68 44 104 D
69 45 105 E
70 46 106 F
71 47 107 G
72 48 110 H
73 49 111 I
74 4A 112 J
75 4B 113 K
76 4C 114 L
77 4D 115 M
78 4E 116 N
79 4F 117 O
80 50 120 P
81 51 121 Q
82 52 122 R
83 53 123 S
84 54 124 T
85 55 125 U
86 56 126 V
87 57 127 W
88 58 130 X
89 59 131 Y
90 5A 132 Z
91 5B 133 [
92 5C 134 \
93 5D 135 ]
94 5E 136 ^
95 5F 137 _
96 60 140 `
97 61 141 a
98 62 142 b
99 63 143 c
100 64 144 d
101 65 145 e
102 66 146 f
103 67 147 g
104 68 150 h
105 69 151 i
106 6A 152 j
107 6B 153 k
108 6C 154 l
109 6D 155 m
110 6E 156 n
111 6F 157 o
112 70 160 p
113 71 161 q
114 72 162 r
115 73 163 s
116 74 164 t
117 75 165 u
118 76 166 v
119 77 167 w
120 78 170 x
121 79 171 y
122 7A 172 z
123 7B 173 {
124 7C 174 |
125 7D 175 }
126 7E 176 ~
127 7F 177 DEL
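Rows of Table 1 can be regenerated rather than copied by hand. The sketch below formats a code in the same decimal, hexadecimal and octal columns; `ascii_row` is a hypothetical helper name, and the function is meant for the printable range 32 to 126.

```python
def ascii_row(code: int) -> str:
    """Format one table row: decimal, hex, three-digit octal, character."""
    return f"{code} {code:X} {code:03o} {chr(code)}"

for code in range(65, 70):
    print(ascii_row(code))
# 65 41 101 A
# 66 42 102 B
# 67 43 103 C
# 68 44 104 D
# 69 45 105 E
```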

As is easy to see, this encoding contains only Latin letters, and only those used in English. There are also arithmetic and other service symbols. But there are no Russian letters, nor even the special Latin letters of German or French. This is easy to explain: the encoding was developed specifically as an American standard. As computers came into use throughout the world, other characters needed to be encoded.

To do this, it was decided to use the eighth bit of each byte. This made 128 more values available (from 80 to FF) for encoding characters. The first of the eight-bit tables, “extended ASCII” (Extended ASCII), included the variants of Latin characters used in some Western European languages. It also contained other additional symbols, including pseudographics.

Pseudographic characters make it possible to produce a semblance of graphics while displaying only text characters. The FAR Manager file manager, for example, draws its interface with pseudographics.

There were no Russian letters in the Extended ASCII table. Russia (formerly the USSR) and other countries created their own encodings that made it possible to represent specific “national” characters in 8-bit text files - Latin letters of the Polish and Czech languages, Cyrillic (including Russian letters) and other alphabets.

In all encodings that have become widespread, the first 128 characters (that is, byte values with the eighth bit equal to 0) are the same as in ASCII. So an ASCII file reads correctly in any of these encodings; the letters of the English language are represented identically in all of them.

The ISO (International Organization for Standardization) adopted the ISO 8859 group of standards, which defines 8-bit encodings for different groups of languages. Thus ISO 8859-1 is an Extended ASCII table for the USA and Western Europe, and ISO 8859-5 is a table for the Cyrillic alphabet (including Russian).
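A short check of this compatibility, using the codec names that Python's standard library gives to four Cyrillic encodings (`cp866`, `koi8_r`, `cp1251`, `iso8859_5`). The sample strings are arbitrary. The lower half decodes identically everywhere, while a Russian letter gets a different byte in each table.

```python
CODECS = ("cp866", "koi8_r", "cp1251", "iso8859_5")

# Bytes 0-127 decode to the same characters in every table.
lower_half = bytes(range(128))
ascii_text = lower_half.decode("ascii")
same_everywhere = all(lower_half.decode(name) == ascii_text for name in CODECS)
print(same_everywhere)   # True

# A Russian letter, by contrast, has a different code in each encoding.
codes_for_ya = {name: "Я".encode(name)[0] for name in CODECS}
for name, code in codes_for_ya.items():
    print(name, hex(code))
```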

However, for historical reasons, the ISO 8859-5 encoding did not take root. In reality, the following encodings are used for the Russian language:

- Code Page 866 (CP866), also known as “DOS” or the “alternative GOST encoding”. Widely used until the mid-1990s; now used to a limited extent. Practically never used for distributing texts on the Internet.
- KOI-8. Developed in the 1970s and 1980s. It is the generally accepted standard for mail messages on the Russian Internet and is also widely used in operating systems of the Unix family, including Linux. The version of KOI-8 designed for Russian is called KOI-8R; there are versions for other Cyrillic languages (for example, KOI8-U for Ukrainian).
- Code Page 1251 (CP1251, Windows-1251). Developed by Microsoft to support the Russian language in Windows.

The main advantage of CP866 was that it kept the pseudographics characters in the same positions as in Extended ASCII; foreign text-mode programs, such as the famous Norton Commander, could therefore work without changes. CP866 is now used by Windows programs that run in text windows or in full-screen text mode, including FAR Manager.

In recent years texts in CP866 have become quite rare (although it is used to encode Russian file names in Windows). We will therefore look more closely at the other two encodings, KOI-8R and CP1251.



As you can see, in the CP1251 encoding table the Russian letters are arranged in alphabetical order (with the exception, however, of the letter Ё). Thanks to this arrangement, it is very easy for computer programs to sort text alphabetically.
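Sorting by raw CP1251 byte values can be sketched as follows. The word list is invented for the example. It also shows the exception: in CP1251 the letter ё has code B8, below the main alphabetical block, so a word starting with it lands first instead of between “д” and “ж”.

```python
words = ["яблоко", "ёж", "арбуз", "груша"]

# Compare words by their CP1251 byte values, as a simple program would.
by_code = sorted(words, key=lambda w: w.encode("cp1251"))
print(by_code)   # ['ёж', 'арбуз', 'груша', 'яблоко']
```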

In KOI-8R, on the other hand, the order of Russian letters seems random. In reality it is not.

In many older programs, the 8th bit was lost when text was processed or transmitted. (Such programs are now practically “extinct”, but in the late 1980s and early 1990s they were widespread.) To get the 7-bit value from an 8-bit value, just subtract 8 from the most significant hexadecimal digit; for example, E1 becomes 61.

Now compare KOI-8R with the ASCII table (Table 1). You will find that Russian letters are placed in clear correspondence with Latin ones. If the eighth bit disappears, lowercase Russian letters turn into uppercase Latin letters, and uppercase Russian letters turn into lowercase Latin letters. So, E1 in KOI-8 is the Russian “А”, while 61 in ASCII is the Latin “a”.

So KOI-8 keeps Russian text readable when the 8th bit is lost. “Привет всем” (“Hello everyone”) becomes “pRIWET WSEM”.
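The loss of the eighth bit can be simulated directly. This sketch uses Python's `koi8_r` codec and clears the top bit of every byte; the function name is an assumption made for the illustration, and the Russian phrase is the one behind the translated example above.

```python
def strip_eighth_bit(text: str) -> str:
    """Encode as KOI8-R, clear bit 7 of every byte, read the result as ASCII."""
    raw = text.encode("koi8_r")
    return bytes(b & 0x7F for b in raw).decode("ascii")

print(strip_eighth_bit("Привет всем"))   # pRIWET WSEM
```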

Recently, both the alphabetical arrangement of characters in the encoding table and readability after loss of the 8th bit have lost their decisive importance. The eighth bit is not lost during transmission or processing on modern computers, and alphabetical sorting is done with the encoding taken into account rather than by simply comparing codes. (Incidentally, the CP1251 codes are not completely alphabetical: the letter Ё is out of place.)

Because two encodings are in common use, when working on the Internet (mail, browsing Web sites) you can sometimes see a meaningless set of letters instead of Russian text. For example, “Я СБЮФЕМХЕЛ”. These are just the words “с уважением” (“with respect”), but they were encoded in CP1251 and the computer decoded the text using the KOI-8 table. If the same words were instead encoded in KOI-8 and decoded using the CP1251 table, the result would be “У ХЧБЦЕОЙЕН”.

Sometimes a computer decodes Russian-language text using a table that is not intended for the Russian language at all. Then, instead of Russian letters, a meaningless set of symbols appears (for example, Latin letters of Eastern European languages); these are often called “krokozyabry”.

In most cases modern programs determine the encoding of Internet documents (emails and Web pages) on their own. But sometimes they “misfire”, and then you see strange sequences of Russian letters or “krokozyabry”. As a rule, to display the real text in such a situation it is enough to select the encoding manually in the program menu.
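The mix-up is easy to reproduce. The sketch below assumes the original Russian phrase was “с уважением” (“with respect”) and uses Python's `cp1251` and `koi8_r` codecs; `misdecode` is a hypothetical helper that writes text with one table and reads it back with another.

```python
def misdecode(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one table and decode the same bytes with another."""
    return text.encode(written_as).decode(read_as)

print(misdecode("с уважением", "cp1251", "koi8_r"))   # Я СБЮФЕМХЕЛ
print(misdecode("с уважением", "koi8_r", "cp1251"))   # У ХЧБЦЕОЙЕН
```

Applying the reverse pair of codecs to the garbled string restores the original text, which is what choosing the encoding manually in a program menu amounts to.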

Information from the page http://open-office.edusite.ru/TextProcessor/p5aa1.html was used for this article.


Using binary code, you can encode text information if each character of the alphabet is associated with a specific integer. Eight binary digits are enough to encode 256 different characters. That is enough to express, as different combinations of eight bits, all the letters of the English and Russian alphabets, both lowercase and uppercase, as well as punctuation marks, the symbols of the basic arithmetic operations and some generally accepted special characters.

In order for the whole world to encode text data in the same way, unified encoding tables are needed, but this is not yet possible due to contradictions between the characters of national alphabets.

The US standards institute (ANSI) introduced the ASCII coding system, which has two encoding tables: basic and extended. The basic table assigns code values from 0 to 127, and the extended table covers characters numbered from 128 to 255.

The base table of the ASCII system contains 128 codes. The first 32 codes of the base table, starting with zero, are given to hardware manufacturers. This area contains control codes that do not correspond to any language symbols. From the 32nd to the 127th code there are codes for characters of the English alphabet, punctuation marks, arithmetic operations and some auxiliary symbols.

The Russian character encoding known as Windows-1251 was introduced by Microsoft. Given the wide distribution of this company's operating systems and other products in Russia, it has become deeply entrenched and widely used. In this encoding the Russian letters occupy codes 192 to 255 of the extended table.
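The range 192 to 255 can be listed with a few lines of Python, using the standard `cp1251` codec. One detail not stated above but visible in the output: the letters Ё and ё are not in this block (they have codes 168 and 184).

```python
# Decode codes 192-255 of Windows-1251: 32 capital and 32 small letters.
upper_half = bytes(range(192, 256)).decode("cp1251")

print(upper_half[:32])   # capital letters, А to Я
print(upper_half[32:])   # small letters, а to я
print("Ё".encode("cp1251")[0], "ё".encode("cp1251")[0])   # 168 184
```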

Most systems recognize 256 codes: 128 standard and 128 additional from the extended character set.

Since one byte corresponds to one character, 4 bytes are needed to represent a string of four characters. This is what, for example, a group of characters AI2B, consisting of letters and numbers, looks like in ASCII encoding:

And this is the binary representation of the six characters of the word “BINARY”:

01000010 01001001 01001110 01000001 01010010 01011001
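A representation like this can be produced for any ASCII string. The helper below is an illustrative sketch (the name `to_bits` is not from the source): it writes each byte as eight binary digits.

```python
def to_bits(text: str) -> str:
    """Return the ASCII bytes of text as space-separated 8-bit binary groups."""
    return " ".join(f"{b:08b}" for b in text.encode("ascii"))

print(to_bits("BINARY"))
# 01000010 01001001 01001110 01000001 01010010 01011001
```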

In computer text, unlike text typed on a typewriter, a space is a meaningful character and, like any other character, has its own binary representation. In automated processing, the difference between the absence of information and the presence of a space plays an important role and sometimes confuses new users.

Uppercase and lowercase letters have different ASCII codes. For example, the capital letter D corresponds to code 68, and the lowercase d to code 100.

To encode the letters of the Russian alphabet, the Windows-1251 encoding is most often used in practice, but there are other encoding systems. Another common encoding is KOI-8 (an eight-bit information exchange code). It dates back to the times of the Council for Mutual Economic Assistance of Eastern European states. Today KOI-8 is widespread in computer networks in Russia.

The international standard that provides for the encoding of Russian characters is called ISO, after the International Organization for Standardization. In practice, this encoding is rarely used.

You should always remember that computers are only machines: they do not understand ones and zeros as such, but they can detect electrical voltage, treating its presence as 1 and its absence as 0. This is what allows computers to process information.

Coding of graphic data. A black-and-white image printed on paper consists of tiny dots, or pixels (picture elements), that form a characteristic pattern called a raster.

Raster coding allows binary code to be used to represent graphical data, since the linear coordinates and individual properties of each point (brightness) can be expressed as integers. It is generally accepted today to represent black-and-white illustrations as a combination of dots with 256 gradations of gray. Consequently, an 8-bit binary number is usually sufficient to encode the brightness of any point.

Color images are formed in accordance with the binary color code of each point stored in video memory. Color images can have different color depths, determined by the number of bits used to encode the color of a dot. Thus, for a color depth of 8 bits, the number of displayed colors is 2^8 = 256.

Encoding color graphics with 16-bit binary numbers is called High Color mode.

The mode of representing color graphics using 24 binary bits is called True Color.
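The number of colors for each depth follows from the same rule, 2 raised to the number of bits. A minimal sketch, with a made-up function name:

```python
def colors_for_depth(bits: int) -> int:
    """Number of distinct colors representable with the given bits per pixel."""
    return 2 ** bits

for name, bits in (("8-bit", 8), ("High Color", 16), ("True Color", 24)):
    print(name, colors_for_depth(bits))
# 8-bit 256
# High Color 65536
# True Color 16777216
```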

To encode color images, an arbitrary color is decomposed into its main components. It is assumed that any color visible to the human eye can be obtained by mixing three primary colors: red (Red), green (Green) and blue (Blue). This coding system is called RGB (after the first letters of the primary colors).

The RGB color representation model is given in Table 1.1.

Each of the primary colors can be associated with a complementary color, i.e. a color that complements the primary color to white. As follows from Table 1.1, for any of the primary colors the complementary color is the one formed by the sum of the other two primary colors. Accordingly, the complementary colors are cyan (Cyan), magenta (Magenta) and yellow (Yellow).

The principle of decomposing an arbitrary color into its components can be applied not only to the primary colors but also to the complementary ones, i.e. any color can be represented as the sum of cyan, magenta and yellow components.
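The relation between primary and complementary colors can be sketched numerically. The example assumes 8 bits per channel (values 0 to 255), which the text does not specify, and all names are illustrative: a color's complement is what must be added to it to reach white.

```python
WHITE = (255, 255, 255)
RED, GREEN, BLUE = (255, 0, 0), (0, 255, 0), (0, 0, 255)

def add(a, b):
    """Add two RGB colors channel by channel, capped at 255."""
    return tuple(min(255, x + y) for x, y in zip(a, b))

def complement(rgb):
    """The color that, added to rgb, gives white."""
    return tuple(255 - c for c in rgb)

cyan, magenta, yellow = complement(RED), complement(GREEN), complement(BLUE)
print(cyan, magenta, yellow)   # (0, 255, 255) (255, 0, 255) (255, 255, 0)
print(cyan == add(GREEN, BLUE), add(RED, cyan) == WHITE)   # True True
```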


This color coding method is accepted in printing, but printing also uses a fourth ink, black. The system is therefore denoted by the four letters CMYK (black is denoted by K, the last letter of “blacK”, because B already denotes blue). To represent color graphics in this system, 32 binary bits are needed. This mode is also called full color.

If you reduce the number of binary bits used to encode the color of each point, you can reduce the amount of data, but the range of encoded colors is noticeably reduced.

The image quality is determined by the resolution of the monitor, i.e. the number of points per line and the number of raster lines. Typically, monitors use a screen resolution of 800x600, 1024x768 or 1280x960. Let's calculate the amount of video memory required for one of the graphics modes, for example a resolution of 1024x768 with a color depth of 32 bits per pixel. The required amount of video memory is:

32 x 1024 x 768 = 25,165,824 bits = 3,145,728 bytes = 3072 KB = 3 MB.
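The same calculation can be repeated for any mode with a small helper. This is only a sketch: the function name is invented, and it takes 1 KB as 1024 bytes, as the figures above do.

```python
def video_memory_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Bytes needed to hold one full screen at the given resolution and depth."""
    return width * height * bits_per_pixel // 8

size = video_memory_bytes(1024, 768, 32)
print(size, size // 1024, size // 1024 ** 2)   # 3145728 3072 3
```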