Search engines and directories

Search engines

Search engines allow you to find WWW documents related to given topics or containing given keywords or combinations of keywords. There are two search methods used on search servers:

· According to the hierarchy of concepts;

· By keywords.

Search servers are populated automatically or manually. A search server usually has links to other search servers, and forwards the user's query to them on request.

There are two types of search engines.

1. "Full-text" search engines that index every word on a web page, excluding stop words.

2. "Abstract" search engines that create an abstract of each page.

For webmasters, full-text engines are more useful, because any word found on a web page is analyzed to determine its relevance to user queries. However, abstract engines can sometimes index pages better than full-text ones; this depends on the algorithm for extracting information, for example by the frequency of use of the same words.
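To make the distinction concrete, here is a minimal Python sketch (the stop-word list is invented for illustration) of the difference between full-text indexing and a frequency-based "abstract":

    import re
    from collections import Counter

    # Toy stop-word list; real engines maintain much larger ones.
    STOP_WORDS = {"the", "a", "an", "and", "of", "in", "on", "to", "is"}

    def full_text_terms(page_text):
        """Full-text indexing: keep every word except stop words."""
        words = re.findall(r"\w+", page_text.lower())
        return [w for w in words if w not in STOP_WORDS]

    def abstract_terms(page_text, top=5):
        """An 'abstract' engine might keep only the most frequent words."""
        return [w for w, _ in Counter(full_text_terms(page_text)).most_common(top)]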

Main characteristics of search engines.

1. The size of a search engine is determined by the number of pages indexed. However, at any given moment the links returned in response to user queries may be of different ages. Reasons why this happens:

· some search engines immediately index a page at the user's request, and then continue indexing pages that have not yet been indexed;

· others most often index the most popular pages on the network.

2. Indexing date. Some search engines show the date a document was indexed. This helps the user determine when a document appeared online.

3. Indexing depth shows how many levels of pages below the specified one the search engine will index. Most engines have no restrictions on indexing depth. Reasons why not all pages may be indexed:

· incorrect use of frame structures;

· use of a site map without duplicating regular links.

4. Working with frames. If the search robot does not know how to work with frame structures, many framed pages will be missed during indexing.

5. Link frequency. Major search engines can determine the popularity of a document by how often other pages link to it. Some engines use such data to "conclude" whether or not a document is worth indexing.

6. Server update frequency. If a server is updated frequently, the search engine will re-index it more often.

7. Indexing control. Shows what tools can be used to control the search engine's indexing.

8. Redirection. Some sites redirect visitors from one server to another; this option shows how redirects are related to the documents found.

9. Stop words. Some search engines do not include certain words in their indexes or may exclude them from user queries. Such words are usually prepositions or other very frequently used words.

10. Spam penalties. The ability to penalize spam.

11. Deleting old data. A parameter that determines the webmaster's actions when shutting down a server or moving it to another address.

Examples of search engines.

1. AltaVista. The system opened in December 1995 and is owned by DEC. Since 1996 it has collaborated with Yahoo!. AltaVista is the best option for customized search. However, the results are not sorted by category, and you have to review the returned information manually. AltaVista does not provide any means of retrieving lists of active sites, news, or other content search capabilities.

2. Excite Search. Launched at the end of 1995; in September 1996 it acquired WebCrawler. The engine has a powerful search mechanism, the possibility of automatic individual tuning of the information provided, and descriptions of numerous nodes compiled by qualified personnel. Excite differs from other search nodes in that it allows searching news services and publishes reviews of Web pages. The engine combines standard keyword search with heuristic content-search methods; thanks to this combination, you can find relevant Web pages even if they do not contain the user-specified keywords. A disadvantage of Excite is its somewhat chaotic interface.

3. HotBot. Launched in May 1996. Owned by Wired. Based on the Berkeley Inktomi search engine technology. HotBot is a database of documents indexed by full text and one of the most comprehensive search engines on the Web. Its Boolean search capabilities and its ability to limit a search to any domain or Web site help the user find the necessary information while weeding out the unnecessary. HotBot lets you select the desired search parameters from drop-down lists.

4. InfoSeek. Launched before 1995; easily accessible. Currently contains about 50 million URLs. Infoseek has a well-designed interface and excellent search tools. Most responses to queries are accompanied by "related topics" links, and each response is followed by "similar pages" links. The search engine's database contains pages indexed by full text. Answers are ordered by two indicators: the frequency of occurrence of the word or phrase on the page, and the location of the words or phrases on the page. There is a Web Directory divided into 12 categories with hundreds of subcategories that can be searched. Each catalog page contains a list of recommended nodes.

5. Lycos. Operating since May 1994. Widely known and used. It includes a directory with a huge number of URLs and the Point search engine, whose technology relies on statistical analysis of page content rather than full-text indexing. Lycos contains news, site reviews, links to popular sites, city maps, and tools for finding addresses, images, and sound and video clips. Lycos orders answers by how well they satisfy the request, based on several criteria: the number of search terms found in the annotation to the document, the intervals between words in a specific phrase of the document, and the location of the terms in the document.

6. WebCrawler. Opened on April 20, 1994, as a project of the University of Washington. WebCrawler provides a rich syntax for specifying queries, as well as a large selection of node annotations with a simple interface.


Following each response, WebCrawler displays a small icon with an approximate assessment of how well the request was matched. It also displays a page with a short summary for each answer, its full URL, and an exact match score, and uses that answer as a sample query with its keywords. There is no graphical interface for configuring queries in WebCrawler. Wildcards are not allowed, and it is impossible to assign weights to keywords. There is no way to limit the search to a certain area.

7. Yahoo. The oldest directory, Yahoo, was launched in early 1994. It is widely known, frequently used, and the most respected. In March 1996 the Yahooligans catalog for children was launched, and regional and top Yahoo directories keep appearing. Yahoo is based on user subscriptions. It can serve as a starting point for any search on the Web, because its classification system helps the user find a site with well-organized information. Web content is divided into 14 general categories listed on the Yahoo! home page. Depending on the specifics of the request, the user can either work through these categories, getting acquainted with subcategories and lists of nodes, or search for specific words and terms throughout the database. The user can also limit the search to any section or subsection of Yahoo!. Because nodes are classified by people rather than by a computer, the quality of the links is usually very high. However, refining a search after a failure is a difficult task. The AltaVista search engine is connected to Yahoo!, so if a search on Yahoo! fails, it is automatically repeated using AltaVista, and the results are then sent to Yahoo!. Yahoo! also lets you send search queries to Usenet and to Four11 to find e-mail addresses.

Russian search engines include:

1. Rambler. This is a Russian-language search engine. The sections listed on the Rambler home page cover Russian-language Web resources. There is an information classifier. A convenient feature is the list of the most visited nodes provided for each proposed topic.

2. Aport Search. Aport ranks among the leading search engines certified by Microsoft as local search engines for the Russian version of Microsoft Internet Explorer. One of the advantages of Aport is English-Russian and Russian-English translation of queries and search results in online mode, thanks to which you can search Russian Internet resources even without knowing Russian. Moreover, you can search for information using expressions, even whole sentences. Among the main properties of the Aport search system, the following can be singled out:

translation of queries and search results from Russian into English and vice versa;

automatic checking of spelling errors in the query;

informative display of search results for the sites found;

the ability to search words in any grammatical form;

an advanced query language for professional users.

Other search properties include: support for the five main code pages (of different operating systems) for the Russian language; a search technology with no restrictions on the URL or date of documents; search by headlines, comments, and captions to pictures; saving of search parameters and of a set number of the user's previous queries; and merging of copies of a document located on different servers.

3. List.ru (http://www.list.ru). In its implementation this server has much in common with the English-language Yahoo! system. The server's home page contains links to the most popular search categories.


The list of links to the main categories of the catalog occupies the central part of the page. Search in the catalog is implemented in such a way that a query can return both individual sites and categories. If the search is successful, the URL, title, description, and keywords are displayed. The Yandex query language may be used. The "Directory structure" link opens the full catalog rubricator in a separate window, and from the rubricator you can move to any selected subcategory. A more detailed thematic division of the current section is represented by a list of links. The catalog is organized so that all sites contained at lower levels of the structure are also presented in the higher sections. The displayed list of resources is sorted alphabetically, but you can choose another order: by time added, by number of transitions, by order of addition to the catalog, or by popularity among catalog visitors.

4. Yandex. The Yandex series of software products is a set of tools for full-text indexing and search in text data that takes into account the morphology of the Russian language. Yandex includes modules for morphological analysis and synthesis, indexing, and search, as well as a set of auxiliary modules such as a document parser, markup-language support, format converters, and a spider.

The morphological analysis and synthesis algorithms, based on a base dictionary, can normalize words, that is, find their initial form, and can also build hypotheses for words not contained in the base dictionary. The full-text indexing system creates a compact index and allows fast searching with logical operators.
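For illustration only (this is a sketch of the idea, not Yandex's actual algorithm), dictionary-based normalization with a hypothesis for unknown words might look like this; the dictionary and suffix list are toy examples:

    # Toy base dictionary: word form -> initial form (lemma).
    LEMMAS = {"книги": "книга", "книгу": "книга", "поиска": "поиск"}
    # Toy inflectional suffixes, longest tried first.
    SUFFIXES = sorted(["ами", "ях", "ов", "ой", "ом", "и", "у", "а", "е"], key=len, reverse=True)

    def normalize(word):
        """Return the initial form: dictionary lookup first, then a suffix-stripping hypothesis."""
        word = word.lower()
        if word in LEMMAS:
            return LEMMAS[word]
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) - len(suffix) >= 3:
                return word[:-len(suffix)]  # hypothesis for a word outside the dictionary
        return word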

Yandex is designed to work with texts both locally and on the global network, and it can also be connected as a module to other systems.

Of course, the list of popular search engines does not end there – their number runs into the hundreds. However, I am sure that these will be more than enough for you to work with English-language sites.

It should be noted that almost all of the search engines presented above can work with the Cyrillic alphabet. But to search for information in Russian, I still recommend domestic search engines.

There are other Russian-language search engines, but these are the most popular, especially the first two.

Fig. 4.1. The Google search engine


It is known that users arriving at a site from search engines provide up to forty percent of its traffic. Therefore, taking care of the correct indexing of your site in search engines is very useful. By "correct indexing" I mean that the relevance of the query and the content of the site must be observed, i.e., in simple and accessible language: the content of the site must correspond to the request. (Some "masters" abuse sets of keywords that do not correspond to reality. For example, when my sister was preparing to release a CD with local copies of the first levels of Web pages, the word "x#y" and others like it were found on the servers of very reputable companies that had nothing in common with this kind of vocabulary :-).)

  • Altavista
  • Aport-Search
  • Medialingua
  • Rambler
  • RusInfOil
  • Russian Express
  • TELA-Search
  • HotBot
  • Yandex

Why did I list these particular search engines? Because, according to my observations, these are the ones Russian-speaking netizens use. What are "my observations"? An analysis of the access logs of my server http://citforum.ru/, more precisely of the part of the logs where HTTP_REFERER information is collected, i.e., the addresses (URLs) from which clients followed a link to some page on my server.
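Such an analysis can be sketched in a few lines of Python. The engine hostname fragments and the assumption that referrer URLs appear in full in the log lines are mine, for illustration:

    import re
    from collections import Counter
    from urllib.parse import urlparse

    # Hypothetical mapping of referrer hostname fragments to engine names.
    ENGINES = {"altavista": "AltaVista", "yandex": "Yandex", "rambler": "Rambler"}

    def count_engine_referrals(log_lines):
        """Tally visits by search engine, using HTTP_REFERER URLs found in log lines."""
        counts = Counter()
        for line in log_lines:
            for url in re.findall(r'https?://[^\s"]+', line):
                host = urlparse(url).netloc.lower()
                for fragment, name in ENGINES.items():
                    if fragment in host:
                        counts[name] += 1
        return counts

    # Example (hypothetical log file): count_engine_referrals(open("access.log"))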

So what is the practical rating of the engines I have listed – which are used more, and which less?

Altavista is in first place by a huge margin over the rest. This search engine was in the lead even before search in various languages (including Russian-language documents) appeared there. Well, that is understandable: an excellent, easily accessible server, it has been running for a long time (since the beginning of 1996) and has a huge database of documents (over 50 million addresses). It should also be taken into account that Russian-speaking users are located not only in Russia, but also in Bulgaria, the Czech Republic and Slovakia, Poland, Israel, Germany, not to mention the former republics of the USSR – Ukraine, Belarus... (I would especially like to mention the Balts: when they meet on the streets of some Kaunas or Tallinn they do not know Russian, but in front of the monitor, especially if it's really necessary, they know it very well :-)) So for all these users it is more convenient to use Altavista rather than our domestic engines – closer, you know...

The next most popular search engine, oddly enough, is the youngest in Russia – Yandex. As Aleksey Amilyushchenko (of the Comptek company) told me, it currently averages 72,000 queries per day, with a trend of +10% per week (data as of 04/07/98). It seems to me that Yandex is the most promising Russian search engine. With Comptek's system for parsing the "great and mighty" Russian language, Yandex may well emerge victorious in the competition with the second whale in this area – Rambler.

Rambler is the third serious search engine for Russian-speaking users. The main thing I don't like about it is that it ignores the contents of the META containers (I didn't come up with this – it was said by Dmitry Kryukov from Stack Ltd). Probably it is precisely because of this refusal to take keywords into account that such a strange set of links appears in the query results. The second drawback is purely an interface matter: the results are always given in KOI encoding, regardless of what the user selected before. The third drawback: the Rambler spider works over HTTP 0.9, which leads to indexing errors – if several virtual servers live on the same IP address, Rambler sees only the first one and considers all the others simply synonyms. Oh well, let's hope this gets fixed soon.

Well, in last place in my rating are Aport-Search, which indexes servers very strangely, RusInfOil, which regularly closes for reconstruction, and TELA-Search - a beautiful and almost useless gadget for the www.dux.ru server.

You may ask: what about HotBot and the Pathfinder metasearch engine from Medialingua? I haven't forgotten them. It's just that HotBot, for some unknown reason, leaves a crowd of entries in my logs that cannot be accidental visits by foreigners who don't understand Russian (there are far fewer such visits from other imported engines), and I haven't yet studied Pathfinder seriously enough.

Why does a website need promotion in search engines?

It's very simple: as I already said, search engines can provide up to forty percent of a site's traffic. And for that to happen, your site must be correctly indexed – and for that you need to know how indexing is done.

And it is done in the following way: either the search engine robot reaches your site by itself, or you submit the site yourself through the appropriate interface (AddUrl), which any self-respecting search engine has. The first option suffers from delays (the robot will get there eventually – maybe in a day, maybe in a year: the Internet is big). The second requires spending some time (the variety of software for automatically registering your site in a cloud of search engines gives us nothing – those engines are imported).

For everything to happen in the best possible way, the following is required:

  • there should be at least some text on the site. Search engines ignore pictures and the text on them; true, you can duplicate that text in the alt attribute of the img tag;
  • each document on the site MUST have a meaningful title, keywords, and a short description. Whatever people write about search engines being full-text, in reality this is not quite so;
  • create a robots.txt file (especially if you have your own server like www.name.ru);
  • register manually in each search engine you are interested in, and then monitor the indexing of your site.

So, you have already registered the first page of your website in various search engines.

Do you think everything is settled now? Not quite. If the link to your site in a search engine's response is displayed on the second screen, "it's as bad as if there was no link at all" (Danny Sullivan, searchenginewatch.com).

In other words, simply submitting a page via AddURL is not enough. The document must be prepared in advance so that, in response to the appropriate queries, the link to your document appears in the engine's answer, if not first, then at least in the top ten (better still if several links to your documents land in that top ten :-). What does "prepare" mean? It is a purely technical question, nothing supernatural. In the HEAD section of each document on your site you should simply provide a "talking" Title, KeyWords, Description, and Robots.
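Put together, the HEAD of a prepared document might look as follows (every title and word here is invented for illustration):

    <html>
    <head>
    <title>Server indexing guide: how to prepare site documents</title>
    <meta name="keywords" content="search engines, indexing, robots, meta tags, AddURL">
    <meta name="description" content="How to prepare the documents of a site so that search engines index them correctly.">
    <meta name="robots" content="index,follow">
    </head>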

Title: the document title. A good, meaningful title can make a user choose your link among many others. You often see titles like "Contents" – what it is and why is unclear, and there is no desire to check. Another case: every page of a site titled "Welcome to the company..." – checking all documents titled this way is not very attractive either. Imagine that you have selected search mode by titles only, without a description of each document.

KeyWords: keywords. It is the content of this container that affects the relevance of the document to the search query.

No matter how much people say that search engines are full-text, this is not entirely true, and the contents of this container will definitely end up in the search engine's index. Unfortunately, the creators of one of the largest domestic search engines, Rambler, do not want to handle this container. In vain.

  • the content field should not contain end-of-line marks, quotation marks, or other special characters; character case does not matter;
  • it is not recommended to repeat the same keywords several times: this may be perceived as spam, and the page risks being removed from the search engine index;
  • you should not use the same keywords for different pages of your site. That is simpler, of course, but the contents of the documents themselves differ. If you really want to automate this process, you can write a program that writes all the highlighted blocks of the document into this field, for example whatever appears between the H, I, and B tags (see the sketch after this list);
  • if the line in content turns out too long, it is not forbidden to add several more similar constructions;
  • generally speaking, the total volume of keywords in one document can reach up to 50% of the volume of that document.
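A possible sketch of such a program in Python, using only the standard library (the chosen tag set and the output format are my assumptions):

    from html.parser import HTMLParser

    class KeywordCollector(HTMLParser):
        """Collects the text inside H*, I, and B tags as keyword candidates."""
        TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "i", "b"}

        def __init__(self):
            super().__init__()
            self.depth = 0
            self.words = []

        def handle_starttag(self, tag, attrs):
            if tag in self.TAGS:
                self.depth += 1

        def handle_endtag(self, tag):
            if tag in self.TAGS and self.depth:
                self.depth -= 1

        def handle_data(self, data):
            if self.depth:
                self.words += data.split()

    def suggest_keywords(page_html):
        collector = KeywordCollector()
        collector.feed(page_html)
        # Quotes and commas are stripped, per the rules above.
        return " ".join(w.strip('",') for w in collector.words)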

Description: a brief description of the document. Quite a useful container: its contents are used as the short description of relevant documents in the responses of modern search engines. If this container is absent, a certain number of lines from the beginning of the document is returned instead. Accordingly, it is not uncommon, when JavaScript sits at the very beginning of a document, for gibberish in the form of a piece of script to be shown instead of a normal description.

  • the content field must not contain end-of-line marks, quotation marks, or other special characters;
  • it is desirable that it contain a meaningful summary of the document in a couple of human sentences, so that a search engine user can grasp the meaning of the document beyond its title;
  • unfortunately, domestic search engines do not yet know how to work with this container, although they promise to learn soon.

Is it possible to control the actions of search engines?

It is possible, and even necessary! The first step is to write a robots.txt file and put it in the root of your server. This file plainly explains to the search robot what should be indexed and what should not. For example, why index service files such as statistical reports, or the output of scripts? Moreover, many "smart" engines simply will not index a server if they do not find robots.txt. By the way, in this file you can specify different indexing masks for different search engines.
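A minimal robots.txt sketch (the paths and the robot name are invented for illustration):

    # Applies to all robots: do not index service directories.
    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /stats/

    # A separate mask for one particular robot (hypothetical name).
    User-agent: SomeBot
    Disallow: /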

You can read more about this in my translation of the "Standard for Robots Exclusion". The second step: provide the site's pages with Robots META tags. This is a more flexible indexing-control tool than robots.txt. In particular, with this tag you can instruct a search robot not to follow links to other servers, for example in documents containing lists of links. The format is as follows:

<META NAME="ROBOTS" CONTENT="robot_terms">

robot_terms is a comma-separated list of the following keywords (uppercase or lowercase characters do not matter): ALL, NONE, INDEX, NOINDEX, FOLLOW, NOFOLLOW.

NONE tells all robots to ignore this page when indexing (equivalent to the simultaneous use of the keywords NOINDEX, NOFOLLOW).
ALL allows this page and all links from it to be indexed (equivalent to the simultaneous use of the keywords INDEX, FOLLOW).
INDEX allows this page to be indexed.
NOINDEX does not allow this page to be indexed.
FOLLOW allows all links from this page to be indexed.
NOFOLLOW does not allow links from this page to be indexed.

If this meta tag is omitted or no robot_terms are specified, the search robot acts by default as if robot_terms=INDEX, FOLLOW (i.e., ALL) had been specified. If the keyword ALL is found in CONTENT, the robot acts accordingly, ignoring any other keywords that may be specified. If CONTENT contains keywords with opposite meanings, such as FOLLOW and NOFOLLOW, the robot resolves the conflict at its own discretion (in this case, in favor of FOLLOW).

If robot_terms contains only NOINDEX, the page itself is not indexed, but links from it are. If robot_terms contains only NOFOLLOW, the page is indexed, but links from it are ignored.
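The precedence rules just described can be written out directly. Here is a small Python sketch (an illustration, not any engine's actual code) of how a robot might interpret robot_terms:

    def parse_robot_terms(content):
        """Interpret the CONTENT of a Robots META tag.
        Returns (index, follow) flags according to the rules above."""
        terms = {t.strip().upper() for t in content.split(",") if t.strip()}
        index, follow = True, True        # default: INDEX, FOLLOW (i.e. ALL)
        if "ALL" in terms:
            return True, True             # ALL wins; other keywords are ignored
        if "NONE" in terms:
            return False, False           # NONE == NOINDEX, NOFOLLOW
        if "NOINDEX" in terms:
            index = False
        if "NOFOLLOW" in terms and "FOLLOW" not in terms:
            follow = False                # on a FOLLOW/NOFOLLOW conflict, FOLLOW wins
        return index, follow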

Monitoring the current status of your documents in the search engine index.

Well, okay: you read everything above and did it all. What next? Then comes a long, tedious and, most importantly, regular check of how things are going. Sad as it is, you will have to pay attention to this, if only because documents sometimes disappear from search engines. Why? I wish I knew... So, in good search engines you can see which documents, and how many of them, are currently in the index. Here's how it's done:

AltaVista
In this search engine, checking the URL status is quite simple - just type in the query line:

url: citforum.ru
url:citforum.ru/win/
url:citforum.ru/win/internet/index.shtml

In the first case, all indexed pages of the server will be returned. In the second, only the pages in Windows encoding. In the third, the query checks whether the file index.shtml from the specified directory is in the AltaVista index.

Excite
Checking the status of a URL in the Excite search engine is just as easy as in AltaVista: just type the URL. For example:

HotBot
The URL status is checked in the HotBot search engine in a slightly different way. This is done like this:

  • Enter the URL in the request field
  • Change the "all of the words" option to "links to this URL"

Infoseek
In the Infoseek search engine, there is a separate interface with a whole set of settings for checking the status of a URL:

WebCrawler
WebCrawler provides the ability to check the status of a URL on a page:

Rambler
In this search engine, the URL status can be checked in two ways.

  • In the "Advanced Search" section by specifying the server name as a mask in one of the options Top 100 words on Rambler

INTERNET SEARCH ENGINES

There is a huge amount of useful information stored on the Internet, but finding what you need can take a lot of time. This is one of the main problems that gave rise to search engines. Internet search engines are connected to databases that catalog much of the information available on the Internet. Search engines use programs to index the databases, while human librarians categorize and sort sites, turning the Web into a searchable environment. Although there are more than 100 search engines and browsing tools, users often experience frustration caused by the difficulty of finding the information they need. And the main question today is not whether this or that information exists on the Internet, but where to look for it.

Search engines consist of three main elements. The first element is the indexer, or, as it is also called, the "spider". The indexer reads information from a web page and follows links to other pages of the same website. Websites are revisited regularly, once a month or once every two months, to keep track of changes. All data about the information found goes into the second part of the search engine, the index, or, as it is sometimes called, the catalog. It is something like a huge book storing the table of contents of every web page the indexer has found. When a web page changes, the information about it in the index is updated as well. Sometimes new pages or changes do not appear in the catalog immediately, and until the data about a web page is included in the catalog, the page is inaccessible to the search engine. The search engine's software is its third component. This program sifts through millions of catalogued pages to find information matching the query and then ranks the pages according to their relevance to the specified goal. Search engines designed to analyze websites are based on the use of queries: the user types words or phrases relevant to the topic of interest.
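These three elements can be caricatured in a short Python sketch (standard library only; a real engine is incomparably more complex):

    import urllib.request
    from html.parser import HTMLParser
    from urllib.parse import urljoin

    class LinkExtractor(HTMLParser):
        """Pulls href links out of a page, as a spider must."""
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                self.links += [v for k, v in attrs if k == "href" and v]

    def spider(start_url, max_pages=5):
        """Element 1, the indexer ('spider'): reads pages and follows their links."""
        seen, queue, catalog = set(), [start_url], {}
        while queue and len(catalog) < max_pages:
            url = queue.pop(0)
            if url in seen:
                continue
            seen.add(url)
            try:
                page = urllib.request.urlopen(url, timeout=5).read().decode("utf-8", "replace")
            except Exception:
                continue
            catalog[url] = page                       # element 2: the index/catalog
            extractor = LinkExtractor()
            extractor.feed(page)
            queue += [urljoin(url, link) for link in extractor.links]
        return catalog

    def search(catalog, word):
        """Element 3, the query program: rank catalogued pages by word frequency."""
        hits = sorted(((page.lower().count(word.lower()), url)
                       for url, page in catalog.items()), reverse=True)
        return [url for count, url in hits if count > 0]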

A special program (the spider) "crawls" the Web and then, using special search algorithms, finds the required data in a few seconds. Responding to a search query, the search engine goes through millions of sources and finds the addresses of relevant documents. Search engines provide annotated lists of hyperlinks to relevant Internet pages. If you click on a hyperlink, the corresponding URL will be used to retrieve text, images, and links from another computer. Internet search engines, with their huge catalogs of web pages, are constantly improving their search algorithms and expanding their functionality. Each search engine has its own personality (its own special characteristics) and works differently. The work of many search engines is considered quite successful. However, all modern systems suffer from some serious disadvantages:



1. Keyword searches yield too many links, and many of them are useless.

2. A huge number of search engines with different user interfaces creates the problem of cognitive overload.

3. Database indexing methods, as a rule, are not semantically related to the information content.

4. Inadequate directory maintenance strategies often result in links to information that is no longer available on the Internet.

5. Search engines are not yet advanced enough to understand natural language.

6. With the level of access that modern search engines provide, it is almost impossible to draw a reasoned conclusion about the usefulness of a source.

Recently the need for intelligent assistance has been growing rapidly: help is needed to search productively for information and to navigate the vast Internet or a corporate network for specialized information. This led to the emergence of intelligent agents. Typically, intelligent agents are an integral part of a search engine. Some particularly advanced programs act like living assistants. Artificial intelligence technologies are used to search for and sort information. Such a search engine "thinks" and acts on its own. The user trains the agent; the agent then goes off searching the Internet to select the necessary documents from the millions available and evaluate them. The user can "recall" the intelligent agent at any time to see how the work is progressing, or continue training it on the information found, which makes the search even more accurate. Table 3 shows examples of intelligent agents and their characteristics.

Intelligent agents carry out a series of instructions on behalf of a user or another program, can work independently, and have some degree of autonomy on the network. There are some differences between intelligent agents and Java applets. Java applets are downloaded from the Internet and run on the user's machine, whereas intelligent agents actually go out onto the network, look for applications that help complete a task, and carry out their mission remotely, freeing the user's computer for other tasks. When the goal is achieved, they notify the user that the work is complete and present the results.

Intelligent agents are able to “understand” what information the user needs. Agents can be programmed to change behavior based on experience and interactions with other agents. Generalized characteristics of intelligent agents can be presented as follows:

Intelligence - learning based on feedback, on examples and errors, and through interaction with other agents.

Ease of use - agents can be "trained" using natural language.

Individual approach - agents adapt to user preferences.

Integration - continuous learning, applying existing knowledge to new situations, developing a mental model.

Autonomy - agents are able to "sense" the environment, react to its changes, and draw conclusions.

Table 3

Examples of intelligent agents and their characteristics.

The scale and number of information resources on the Internet are constantly expanding. It is becoming clear that the centralized database typical of search engines is not a satisfactory solution. Intelligent agents are a completely new field that underpins the next generation of search engines, which will be able to filter information and achieve more accurate results. Consider, for example, the Hyperlink-Induced Topic Search engine developed by Jon Kleinberg of Cornell University. This search engine does not hunt for keywords. The system analyzes the natural structure of the Web, looking for "communities" of pages related to a particular subject, and then finds out which of these pages the page authors themselves consider significant. This idea is similar to the citation metrics long used in the academic community. The approach is more efficient and reliable than traditional keyword search.
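The core of Kleinberg's idea - pages linked to by good hubs become authorities, and pages linking to good authorities become hubs - can be sketched as a simple power iteration (a simplification for illustration, not the production system):

    def hits(links, iterations=20):
        """links: {page: set of pages it links to} within one 'community'.
        Returns (authority, hub) scores after iterative refinement."""
        pages = set(links) | {q for targets in links.values() for q in targets}
        auth = dict.fromkeys(pages, 1.0)
        hub = dict.fromkeys(pages, 1.0)
        for _ in range(iterations):
            # A page's authority is the summed hub weight of the pages pointing at it.
            auth = {p: sum(hub[q] for q in links if p in links[q]) for p in pages}
            # A page's hub weight is the summed authority of the pages it points at.
            hub = {p: sum(auth[q] for q in links.get(p, ())) for p in pages}
            for scores in (auth, hub):    # normalize so the scores converge
                norm = sum(v * v for v in scores.values()) ** 0.5 or 1.0
                for p in scores:
                    scores[p] /= norm
        return auth, hub

    # Example: hits({"a": {"b", "c"}, "b": {"c"}, "c": set()})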

Hello, dear readers of the blog. When the Internet had only just appeared, its few users could get by with their own bookmarks. However, as you remember, the network grew exponentially, and very soon it became much harder to navigate all its diversity.

Then directories appeared (Yahoo, Dmoz, and others), in which their authors added various sites and sorted them into categories. This immediately made life easier for the then still not very numerous users of the global network. Many of these catalogs are still alive today.

But after some time the size of their databases became so large that the developers first thought about creating a search within them, and then about creating an automated system for indexing all Internet content, to make it accessible to everyone.

The main search engines of the Russian-speaking segment of the Internet

As you understand, this idea was implemented with stunning success, but everything turned out well only for a handful of chosen companies that managed not to get lost on the Internet. Almost all search engines that appeared in the first wave have by now either disappeared, faded away, or been bought by more successful competitors.

A search engine is a very complex and, importantly, very resource-intensive mechanism (meaning not only material resources but human ones too). Behind the seemingly simple Yandex start page, or its ascetic analogue from Google, stand thousands of employees, hundreds of thousands of servers, and many billions in investments, all needed for this colossus to keep operating and remain competitive.

Entering this market now and starting from scratch is more a utopia than a real business project. For example, one of the world's richest corporations, Microsoft, has been trying to gain a foothold in the search market for decades, and only now is its search engine Bing slowly beginning to meet expectations. And before that there was a whole series of failures and setbacks.

And what can we say about entering this market without serious financial infusions? For example, our domestic search engine Nigma has a lot of useful and innovative things in its arsenal, but its traffic is thousands of times lower than that of the Russian market leaders. For comparison, take a look at the daily Yandex audience:

In this regard, we can assume that the list of the main (best and luckiest) search engines of the RuNet and of the entire Internet has already been formed, and the only remaining intrigue is who will eventually devour whom, or how the percentage shares will be distributed if they all survive and stay afloat.

The Russian search engine market is quite clearly delineated: here we can probably distinguish two or three main players and a couple of minor ones. In general, a rather unique situation has developed in the RuNet, one that, as I understand it, has been repeated in only two other countries in the world.

I'm talking about the fact that the Google search engine, having come to Russia in 2004, has still not been able to take the lead. Actually, Google tried to buy Yandex around that period, but something didn't work out, and now "our Russia", along with the Czech Republic and China, is among the places where almighty Google, if not defeated, has at least met serious resistance.

In fact, anyone can see the current state of affairs among the best search engines of the RuNet. It is enough to paste this URL into the address bar of your browser:

http://www.liveinternet.ru/stat/ru/searches.html?period=month;total=yes

The point is that most RuNet sites use the LiveInternet counter on their pages, and this URL allows you to see statistics of visitors arriving from various search engines at all websites that belong to the RU domain zone.

After entering the given URL you will see a picture that is not very attractive or presentable, but one that reflects the essence of the matter well. Pay attention to the top five search engines from which sites in Russian receive traffic:

Yes, of course, not all resources with Russian-language content are located in this zone. There are also the SU and RF zones, and general zones like COM or NET are full of Internet projects focused on the RuNet. Still, the sample is quite representative.

This dependence can be presented in a more colorful way, as someone, for example, did for an online presentation:

This doesn't change the essence. There are a couple of leaders, and the remaining search engines lag very, very far behind. By the way, I have already written about many of them; sometimes it can be quite interesting to plunge into a success story or, on the contrary, to dig into the reasons for the failure of once-promising search engines.

So, in order of importance for Russia and the Runet as a whole, I will list them and give them brief characteristics:

    Google search has already become a household name for many people on the planet. In this search engine I liked the "translation of results" option, when you received answers from all over the world in your native language; now, unfortunately, it is not available (at least on google.ru).

    Lately I have also been puzzled by the quality of their output (Search Engine Result Page). Personally, I always search first in the RuNet's leading engine (well, I'm used to it) and only turn to Google if I don't find an intelligible answer there.

    Usually their results made me happy, but lately they have only puzzled me: sometimes such nonsense comes out. It is possible that their struggle to increase income from contextual advertising and the constant shuffling of search results in order to discredit SEO promotion will lead to the opposite result. In any case, this search engine has a competitor on the RuNet, and what a competitor.

    I think it is unlikely that anyone deliberately goes to Go.mail.ru in order to search the RuNet. Therefore, traffic from this search engine to entertainment projects can be significantly more than ten percent. Owners of such projects should pay attention to this system.

However, in addition to the clear leaders of the search market in the Russian-language segment of the Internet, there are several more players whose share is quite low, and yet the very fact of their existence makes it worth saying a few words about them.

Runet search engines from the second echelon


Internet-wide search engines

By and large, on the scale of the entire Internet there is only one serious player - Google. This is the undisputed leader, but it still has some competition.

First of all, it's still the same Bing, which, for example, has a very good position in the American market, especially considering that its engine is also used on all Yahoo services (almost a third of the entire US search market).

And secondly, owing to the huge share that users from China make up in the total number of Internet users, their main search engine, Baidu, wedges itself into the distribution of places on the world Olympus. It was launched in 2000, and its share is now about 80% of the entire national audience in China.

It is difficult to say anything more definite about Baidu, but there are opinions on the Internet that the top positions in its results are occupied not only by the sites most relevant to the request, but also by those who paid for the placement (directly to the search engine, not to an SEO office). Of course, this applies primarily to commercial results.

In general, looking at the statistics, it becomes clear why Google so easily agrees to worsen its search results in exchange for higher profits from contextual advertising: in most cases they are not afraid of user churn, because users have nowhere else to go. This situation is somewhat sad, but we'll see what happens next.

By the way, to make life even harder for optimizers (and perhaps to preserve the peace of mind of its users), Google has recently started encrypting queries transmitted from users' browsers to the search engine. Soon it will no longer be possible to see in visitor-counter statistics which queries brought Google users to you.

Of course, besides the search engines mentioned in this publication, there are thousands of others: regional, specialized, exotic, and so on. Trying to list and describe them all in one article would be impossible, and probably unnecessary. Let's instead say a few words about how hard it is to create a search engine and how complex and costly it is to keep it up to date.

The vast majority of systems work on similar principles and pursue the same goal: to give users an answer to their question. Moreover, this answer must be relevant (corresponding to the question), comprehensive and, no less important, fresh.

Solving this problem is not easy, especially considering that the search engine must analyze the contents of billions of Internet pages on the fly, weed out the unnecessary ones, and from the rest form a list (the results page) in which the most appropriate answers to the user's question appear first.

This extremely complex task is solved by collecting information from these pages in advance, using various indexing robots. They harvest links from pages they have already visited and load information from them into the search engine's database. There are bots that index text: a regular one, and a fast one that lives on news and frequently updated resources so that the latest data is always present in the results.

In addition, there are robots that index images (for subsequent output in image search), favicons, and site mirrors (for comparing them and possibly gluing them together), as well as bots that check the functionality of Internet pages reported by users or submitted through webmaster tools.

The indexing process itself, and the subsequent updating of the index databases, are quite time-consuming, although Google does this much faster than its competitors – at any rate faster than Yandex, which takes a week or two.

Typically, a search engine splits the text content of an Internet page into individual words and reduces them to their basic forms, so that correct answers can later be given to questions asked in different morphological forms. All the extra trimmings in the form of HTML tags, spaces, and the like are deleted, and the remaining words are sorted alphabetically, with their positions in the document recorded next to them.

This kind of structure is called an inverted (reverse) index; it allows searching not over the web pages themselves but over structured data located on the search engine's servers.
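A minimal sketch of such an index in Python (each word maps to the documents and positions where it occurs):

    import re
    from collections import defaultdict

    def build_inverted_index(documents):
        """documents: {doc_id: text}. Returns {word: [(doc_id, position), ...]},
        with the word list kept in alphabetical order, as described above."""
        index = defaultdict(list)
        for doc_id, text in documents.items():
            # In a real engine, HTML tags would already have been stripped here.
            for position, word in enumerate(re.findall(r"\w+", text.lower())):
                index[word].append((doc_id, position))
        return dict(sorted(index.items()))

    # Example: build_inverted_index({"page1": "search engines index words"})["index"]
    # -> [("page1", 2)]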

The number of such servers at Yandex (which searches mainly Russian-language sites, plus a little Ukrainian and Turkish) runs into the tens or even hundreds of thousands; at Google (which searches in hundreds of languages) it runs into the millions.

Many servers have copies, which both increase the safety of the documents and help increase the speed of request processing (by distributing the load). Just try to estimate the costs of maintaining this entire economy.

A user's request is sent by the load balancer to the server segment that is currently least loaded. Then the region from which the user sent the request is analyzed, and a morphological analysis of the query is performed. If a similar query was recently entered into the search bar, the user is given data from the cache, so as not to overload the servers.

If the request has not yet been cached, it is passed to the area where the search engine's index database is located. In response comes a list of all Internet pages that are at least somewhat related to the request. Not only direct occurrences are taken into account, but also other morphological forms and the like.

These results then need to be ranked, and at this stage the ranking algorithm (artificial intelligence) comes into play. In fact, the user's request is multiplied into all possible variants of its interpretation, and answers to many queries are sought simultaneously (through the use of query-language operators, some of which are available to ordinary users).

As a rule, the search results contain one page from each site (sometimes more). Ranking algorithms are now very complex and take into account many factors. In addition, human assessors are used to correct them: they manually evaluate reference sites, which makes it possible to adjust the operation of the algorithm as a whole.

In general, it is clear that this is a murky business. One could talk about it for a long time, but it is already clear how hard it is to achieve user satisfaction with a search system. And there will always be those who dislike something, like you and me, dear readers.

Good luck to you! See you soon on the pages of the blog.

