Document markup languages: general information. HTML markup language and basic tags

Hello, dear reader. It's time to talk about HTML, the markup language used to build the pages of every website, whether Russian, foreign or even Chinese. It is not a programming language, as some people think, but a hypertext markup language.

Let me remind you that hypertext is text that contains links to other pages and documents. A markup language shows where and how a text element, such as a paragraph, a heading or a list, should be placed. CSS, which is closely tied to HTML, is responsible for the appearance of these elements. It makes pages attractive and readable, and keeps them lightweight by moving presentation rules out of the page code.

In addition to CSS, HTML pages can be supplemented with the PHP and JavaScript programming languages, which make pages interactive, that is, able to respond to user actions.
With all these tools together you can build a site of any complexity and functionality. HTML itself is responsible only for markup.

Web page from inside

<html>
<head>
<title>This is my website</title>
</head>
<body>
This is my text
</body>
</html>

In the code above you see tag commands, sometimes called descriptors. They are enclosed in angle brackets. Most tags come in pairs: an opening tag and a closing tag, the latter with a slash before its name. HTML elements are nested inside one another like a matryoshka doll, where one container sits inside another.
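To make the nesting visible, here is a minimal sketch in Python using the standard html.parser module. It assumes the sample page consists of the html, head, title and body tags that the text describes. The class name NestingRecorder and the function outline are names invented for this illustration.

```python
from html.parser import HTMLParser

# The sample page, assumed to use the html, head, title and body tags.
PAGE = """<html>
<head>
<title>This is my website</title>
</head>
<body>
This is my text
</body>
</html>"""


class NestingRecorder(HTMLParser):
    """Records each opening tag together with its nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []  # list of (depth, tag) pairs

    def handle_starttag(self, tag, attrs):
        self.outline.append((self.depth, tag))
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1


def outline(markup):
    """Return the (depth, tag) pairs for every opening tag in the markup."""
    recorder = NestingRecorder()
    recorder.feed(markup)
    recorder.close()
    return recorder.outline


# Print each tag indented by its depth, like dolls nested in one another.
for depth, tag in outline(PAGE):
    print("  " * depth + tag)
```

The printed outline shows title sitting inside head, and both head and body sitting inside html.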

The figure below shows the decoding of this code:

And here is how the same page looks in the Mozilla Firefox browser. I marked where the text of the title tag is displayed (in the window header) and where the text of the body tag is displayed (in the page area).

How to create an html page

For clarity, copy the text with the tags above into any text editor, such as Notepad, and save it to your desktop. Right-click the file and select "Rename". Instead of the .txt extension of a regular text file, specify the .html or .htm extension. The Notepad icon will change to a browser icon, and double-clicking it will display your first web page.

If the extension is not displayed, do the following.

In the Windows Control Panel, open Appearance and Personalization - Folder Options - View and uncheck "Hide extensions for known file types".

Showing file extensions is useful in any case, so that attackers cannot trick you into opening an infected file that looks like "gift.jpg" but is actually "gift.jpg.exe". When Windows hides the extension, as it does by default, it is very easy to mistake a malware launcher with a hidden .exe extension for an ordinary image.

Programs for creating HTML pages

Writing HTML by hand without experience can seem like a severe test of attentiveness and endurance. But believe me, only by building your skills this way will you be able to proudly call yourself a webmaster.

To keep hand-written HTML under control, there are many programs with syntax highlighting. Among amateur developers the most popular are Notepad++, PHP Designer and Dreamweaver.

The last two programs are paid, but older versions, which are no worse than the new ones, can be found on the net for free and legally used for your needs. Dreamweaver is a visual editor: it converts text that you lay out visually into code. In any case, you will not regret getting to know this tool.

Why learn to write code by hand, then? Because visual editors, even the one built into WordPress, sometimes generate so much junk code of their own that pages weigh many times more than those written by hand. And if you consider that search engines now pay attention to clean code, you will inevitably think about learning HTML in order to control the whole process.

In general, Dreamweaver can stand in for a good teacher at first. Use the program and watch how an HTML page is written. Most importantly, do not be too lazy to look at the top pane of the program, where the code is generated, and note where the program adds something superfluous.

What is a browser really

Many people believe that a browser is designed to search for sites on the Internet and that this is its purpose. This is a misconception. In fact, a browser is a program that interprets HTML, CSS, JavaScript and other code. In other words, it is an application for displaying web pages and other documents.

The capabilities of modern browsers are truly great. Web pages contain graphics, video and text in various formats. The browser reads the HTML code, finds the video, graphics and text embedded in it, and displays all of it correctly on the device screen. Tags, those ordinary English words in angle brackets, help it do this.

From the tags, the browser learns which part of the text is the title of the site, which is a heading, what should be presented as a paragraph and where to place a picture, and along the way it handles the other languages embedded in plain HTML.

HTML markup language and main tags

<html> - tells the browser that this is an HTML document
<head> - here is information for search engines
<body> - content is displayed in the browser window
<title> - page title
<h1> ... <h6> - headings, from largest to smallest
<b>, <i> - bold and italic text
<a href="URL">link text</a> - tells the browser that this is a link with the text "link text"
<p> - creates a new paragraph
<p align="center"> - paragraph alignment (left, right, justify or center)
<form> - tells the browser to create a form

This table is provided to show only the main descriptors.

In the modern version of HTML5, along with new tags, a huge number of new features have appeared that website developers did not even dream of 10 years ago.

Styles in html document

When the browser displays the content of a web page, it shows headings in one style and paragraph text in another, with different font sizes for each. Every browser does this by default. But we want web pages with their own design, and here CSS, the cascading style sheet language, comes to the rescue. With CSS you can set the appearance of any element and create any design for a web document.

CSS is a styling complement to HTML: it describes only presentation and needs a marked-up document to apply to.

Styles are embedded in HTML like this:

<head>
<style type="text/css">
hr { color: sienna; }
p { margin-left: 20px; }
body { background-image: url("images/back40.gif"); }
</style>
</head>

If an external styles.css file is used, it is connected to the HTML document like this:

<link rel="stylesheet" type="text/css" href="styles.css">

An example of a CSS rule:

p { color: black; font-size: x-small; }

It tells the browser that the paragraph color is black and the font size is x-small (small).

Here's how, for example, I style the content at the beginning of each article on this blog.

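/* assuming anons is a class name */
.anons {
border: 2px solid green; /* a border style is needed for the border to show; solid is assumed */
border-radius: 10px;
width: 360px;
font-family: "Yeseva One";
font-size: 16px;
line-height: 1.2em;
padding: 10px 10px 10px 20px;
margin: 10px auto 20px;
text-align: left;
background-color: #a7cece;
}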

The last line has an interesting fragment: background-color: #a7cece;

#a7cece is an HTML color code. With hexadecimal notation, using the digits 0 to 9 and the letters A to F, you can specify any of roughly 16.7 million colors. Here it sets a pretty aquamarine.
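To show what the six hexadecimal digits mean, here is a small sketch in Python that splits a color code into its red, green and blue parts. The function name hex_to_rgb is made up for this illustration.

```python
def hex_to_rgb(color):
    """Convert a '#rrggbb' string into a (red, green, blue) tuple of 0-255 integers."""
    digits = color.lstrip("#")
    if len(digits) != 6:
        raise ValueError("expected six hexadecimal digits, got %r" % color)
    # Each pair of hexadecimal digits encodes one channel: red, green, blue.
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


print(hex_to_rgb("#a7cece"))  # (167, 206, 206)
```

Each channel takes one of 256 values, which gives 256 x 256 x 256 = 16,777,216 possible colors.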

I will return to the topic of CSS in separate publications.

How to learn HTML markup language
  • The web is full of HTML reference sites. I like http://htmlbook.ru and often go there for reference material. I recommend it to save time.
  • Andrey Bernatsky. Check it out for sure!
  • I like the book from American authors. This is a fascinating self-guided HTML/CSS tutorial with such a cool presentation of the material that you will read without stopping. Everything is explained simply and clearly. It can be downloaded for free on the net, but it is better to buy and work with it like a book.

The best way to master the HTML markup language is to download the best-known training courses on the Runet; some of them are completely free. Visit Evgeny Popov's website and download plenty of useful training material. For professional training, read the information.

Markup languages

In computer terminology, a (text) markup language is a set of characters or sequences inserted into a text to convey information about its output or structure. It belongs to the class of computer languages. A text document written in a markup language contains not only the text itself (as a sequence of words and punctuation marks) but also additional information about its various parts, for example headings, highlighted passages and lists. In more complex cases, a markup language lets you insert interactive elements and content from other documents.

Note that most markup languages are not Turing complete and are not usually considered programming languages, although in the strict sense they are computer languages.

HTML (HyperText Markup Language) was developed by the British scientist Tim Berners-Lee between about 1989 and 1991 at the European Organization for Nuclear Research (CERN) in Geneva, Switzerland. HTML was created as a language for exchanging scientific and technical documentation, suitable for people who are not layout specialists. HTML successfully dealt with the complexity of SGML by defining a small set of structural and semantic elements called descriptors. Descriptors are also often referred to as "tags". With HTML you can easily create a relatively simple yet well-designed document. Besides simplifying document structure, HTML added support for hypertext. Multimedia features were added later.

Initially, HTML was conceived and created as a means of structuring and formatting documents without tying them to a particular means of reproduction (display). Ideally, text with HTML markup should be reproduced without stylistic or structural distortion on devices with different capabilities (the color screen of a modern computer, the monochrome screen of an organizer, the small screen of a mobile phone, or devices and programs that read text aloud). However, the modern use of HTML is very far from its original purpose. For example, the <table> tag, often used for page layout, was designed for creating ordinary tables in documents. Over time, the core idea of HTML's platform independence has been sacrificed to modern needs for multimedia and graphic design.

XML (eXtensible Markup Language, pronounced [ex-em-el]) is a markup language recommended by the World Wide Web Consortium (W3C). The XML specification describes XML documents and partially describes the behavior of XML processors (programs that read XML documents and provide access to their content). XML was designed as a language with a simple, formal syntax, convenient for programs to create and process documents and at the same time easy for humans to read and write, with an emphasis on use on the web. The language is called extensible because it does not fix the markup used in documents: the developer is free to create markup according to the needs of a particular field, limited only by the syntax rules of the language. The combination of simple formal syntax, human-friendliness, extensibility and reliance on Unicode for representing document content has led to the widespread use of both XML itself and many specialized XML-derived languages in a wide variety of software.
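As a rough illustration of extensibility, the sketch below uses Python's standard xml.etree.ElementTree module to build a document in a made-up vocabulary and read it back. The element names library, book, title and year, and the lang attribute, are hypothetical and exist only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary invented for this example: library, book, title, year.
library = ET.Element("library")
book = ET.SubElement(library, "book", {"lang": "en"})
ET.SubElement(book, "title").text = "Example title"
ET.SubElement(book, "year").text = "2001"

# Serialize the tree to XML text.
xml_text = ET.tostring(library, encoding="unicode")
print(xml_text)

# An XML processor can read the document back without knowing the
# vocabulary in advance, because only the syntax rules are fixed.
parsed = ET.fromstring(xml_text)
titles = [b.findtext("title") for b in parsed.findall("book")]
print(titles)
```

The parser needs no list of allowed tags: any names are accepted as long as the document follows the XML syntax rules.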

XHTML (eXtensible HyperText Markup Language) is a family of XML-based web page markup languages that reproduce and extend the capabilities of HTML 4. The XHTML 1.0 and XHTML 1.1 specifications are World Wide Web Consortium recommendations, but development of XHTML has been stopped and the use of HTML is recommended instead. New versions of XHTML are not being released.

The main difference between XHTML and HTML is how the document is processed. XHTML documents are processed by an XML parser in the same way as any XML document, and errors made by developers are not corrected during processing.
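The difference in error handling can be sketched with two parsers from the Python standard library. This is only an analogy: html.parser is a tolerant tokenizer rather than a browser engine, and the helper names html_tags and parses_as_xml are invented for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

BROKEN = "<p>first paragraph<p>second paragraph"  # neither <p> is closed


class TagCollector(HTMLParser):
    """Collects the names of all opening tags it encounters."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def html_tags(markup):
    """Tokenize markup leniently and return the opening tags that were found."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def parses_as_xml(markup):
    """Return True only if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


print(html_tags(BROKEN))      # the lenient parser still reports both tags
print(parses_as_xml(BROKEN))  # the strict XML parser rejects the input
```

The lenient parser carries on past the missing end tags, while the XML parser stops at the first well-formedness error, which is how an XHTML document served as XML is treated.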

XHTML conforms to the SGML specification, because XML is a subset of SGML. HTML has many processing peculiarities and has in effect ceased to belong to the SGML family, which is stated in the draft HTML 5 specification.

The browser chooses the parser to process the document based on the Content-Type header received from the server:

  • HTML: text/html
  • XHTML: application/xhtml+xml

For local viewing on the client, the selection is based on the file extension. Internet Explorer up to version 8 has no parser for processing XHTML documents.
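The selection rule can be sketched as a small Python function. This is a simplified illustration, not how any particular browser is implemented: the function name choose_parser is invented, and the mapping of the .xhtml, .xht, .html and .htm extensions is an assumption made for this example.

```python
def choose_parser(content_type=None, filename=None):
    """Return 'html', 'xml' or None, following the selection rules described above."""
    if content_type:
        # Drop parameters such as "; charset=utf-8" and normalize case.
        media_type = content_type.split(";")[0].strip().lower()
        if media_type == "application/xhtml+xml":
            return "xml"
        if media_type == "text/html":
            return "html"
    if filename:
        lowered = filename.lower()
        # Extension mapping is an assumption made for this sketch.
        if lowered.endswith((".xhtml", ".xht")):
            return "xml"
        if lowered.endswith((".html", ".htm")):
            return "html"
    return None  # nothing recognizable to base the decision on


print(choose_parser("text/html; charset=utf-8"))
print(choose_parser("application/xhtml+xml"))
print(choose_parser(filename="page.xhtml"))
```

The header is checked first because it is what the server declares; the file name is consulted only when no usable header is present, as in local viewing.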

WML (Wireless Markup Language) is a document markup language for use in cell phones and other mobile devices according to the WAP standard.

Its structure resembles a somewhat simplified HTML, but there are key differences, because WML targets devices that lack the capabilities of personal computers (small screens, limited or no graphics, little memory and so on). All information in WML is contained in so-called decks. A deck is the smallest unit of data that the server can transmit. Decks contain cards, each delimited by <card> and </card> tags. A deck must always contain at least one card and may contain several. Only one card is displayed on the device screen at a time, and the user moves between cards by following links, which reduces the number of requests to the server. The size of a WML page should not exceed 1-4 kilobytes.

VML (Vector Markup Language) was developed by Microsoft to describe vector graphics. It was submitted to the W3C in 1998 by Microsoft, Macromedia and others. Around the same time Adobe, Sun and several other companies submitted PGML for consideration. Both languages later became the basis for SVG.

PGML (Precision Graphics Markup Language) is an XML-based markup language for describing vector graphics on a web page (diagrams, individual interface elements) as XML text. It uses an imaging model similar to that of PDF and PostScript. It was submitted to the W3C by Adobe Systems, IBM, Netscape Communications and Sun Microsystems in 1998 but was not adopted as a recommendation. Almost simultaneously Microsoft submitted its VML project, and a year later the more advanced SVG language was developed, drawing on ideas from both technologies. SVG received W3C recommendation status and has become the main format for describing vector graphics on web pages.

SVG (Scalable Vector Graphics) is a markup language for scalable vector graphics created by the World Wide Web Consortium (W3C). It is an application of XML designed to describe two-dimensional vector and mixed vector/bitmap graphics. It supports still, animated and interactive graphics, or, in other terms, declarative and scripted graphics. It does not support the description of three-dimensional objects. It is an open standard and a recommendation of the W3C, the organization behind standards such as HTML and XHTML. SVG is based on the VML and PGML markup languages and has been in development since 1999.

XBRL (eXtensible Business Reporting Language) is an open standard for electronic financial reporting. The format is based on XML and uses XML syntax together with related technologies such as XML Namespaces, XML Schema, XLink and XPath. One of the purposes of XBRL is to represent and exchange financial information, such as company financial statements. The XBRL specification is developed and published by XBRL International, Inc., an independent international organization.

To improve the visual presentation of the web, CSS has come into wide use; it lets you define uniform design styles for many web pages. Another innovation worth noting is the URN (Uniform Resource Name) resource naming system.

A popular concept for the development of the World Wide Web is the Semantic Web. The Semantic Web is an extension of the existing World Wide Web that is designed to make information posted on the network more understandable to computers. It is the concept of a network in which every resource written in human language is accompanied by a description that a computer can understand. The Semantic Web gives any application access to clearly structured information, regardless of platform and programming language. Programs will be able to find the resources they need, process information, classify data, identify logical relationships, draw conclusions and even make decisions based on those conclusions. If widely adopted and well implemented, the Semantic Web has the potential to revolutionize the Internet. To create a computer-readable description of a resource, the Semantic Web uses RDF (Resource Description Framework), which is based on XML syntax and uses URIs to identify resources. Newer additions in this area are RDFS (RDF Schema) and SPARQL (SPARQL Protocol and RDF Query Language), a query language for fast access to RDF data.

SGML (Standard Generalized Markup Language) is defined in the ISO 8879 standard. It is accepted as the main language for preparing technical documentation, including interactive electronic technical manuals for products created with CALS technologies.

SGML defines the structure of documents as a sequence of data objects. Data objects representing parts of a document can be stored in different files. The SGML standard establishes sets of symbols and rules for representing information that allow different systems to recognize and identify this information correctly. These sets are described in a separate part of the document called the DTD (Document Type Definition), which is transmitted along with the main SGML document. The DTD specifies the correspondence between characters and their codes, the maximum lengths of identifiers, how tag delimiters are represented, other possible conventions, the DTD syntax, and the document type and version. SGML can therefore be called a metalanguage for a family of specific markup languages. In particular, the XML and HTML markup languages can be considered subsets of SGML.

The technical description in the form of an SGML document includes:

  • the main file with the technical manual marked up with SGML tags;
  • a description of entities, if the document belongs to a group in which the same entities are used and are assumed to be known;
  • a dictionary explaining the SGML tags.

However, SGML is difficult to learn and use. Therefore, for the widespread use of markup in documents published on the WWW, a simplified language, HTML (HyperText Markup Language), was developed on the basis of SGML in 1991, followed in 1996 by XML (eXtensible Markup Language), which together with HTML is becoming the main language for representing documents in various applications.

HTML was developed so that markup could be used widely in documents published on the WWW.

An HTML description is ASCII text together with a sequence of commands (control codes) embedded in it, also called descriptors or tags. This text is called an HTML document or an HTML page or, once it has been placed on a web server, a web page. Tags are placed at the appropriate points in the source text; they define fonts, hyphenation, the appearance of graphic images, links and so on. In WWW editors, commands are inserted simply by pressing the corresponding keys.

XML, like HTML, is considered a subset of SGML. XML is now positioned as the main language for representing documents in information technology; it can be regarded as a metalanguage that serves as the basis for special-purpose markup languages in various applications. At the same time, XML is more convenient than SGML, because some minor features of SGML were dropped from it. Descriptions in XML are easier to understand and are adapted for use in modern browsers, while the core features of SGML are retained.

For specific applications, dedicated variants of XML are created, called XML dictionaries or XML applications. For example, MathML exists for describing texts with mathematical notation, and the XML application OSD (Open Software Description) was developed for describing software packages. For CALS, the Product Definition eXchange (PDX) data exchange variant is of interest. There are also dictionaries for chemistry (CML, Chemical Markup Language), biology (BSML, Bioinformatic Sequence Markup Language) and other fields.

Lightweight markup languages

Languages designed for easy and fast writing of text in a simple text editor are called lightweight markup languages. Features of such languages:

  • Minimum features.
  • Small set of supported tags.
  • Easy to learn.
  • The source text in such a language is read with the same ease as the finished document.

They are used where a person has to prepare text in a regular text editor (blogs, forums, wikis), or where it is important that a user with a regular text editor can also read the text. Here are some widely used lightweight markup languages:

  • Wiki markup (see Wikipedia:How to edit articles)
  • Various auto-documentation systems (eg Javadoc).

History

The term "markup" comes from the phrase "marking up", taken from the traditional publishing practice of putting special conventional marks in the margins and text of a manuscript or proof before sending it to print. In this way the "markup men" indicated the typeface, style and font size for each part of the text. Nowadays text markup is done by editors, proofreaders, graphic designers and, of course, the authors themselves.

GenCode

The idea of using markup languages in computer text processing was probably first presented by William W. Tunnicliffe at a conference in 1967. He called his proposal "generic coding". During the 1970s Tunnicliffe led the development of the GenCode standard for the publishing industry and later became chairman of the International Organization for Standardization (ISO) committee that created SGML, the first descriptive markup language. Brian Reid, in his dissertation defended in 1980 at Carnegie Mellon University, developed the proposed concept and carried out a practical implementation of descriptive markup.

However, IBM researcher Charles Goldfarb is now more commonly called the "father" of markup languages. The basic concept came to him in 1969 while he was working on a primitive document management system for law firms. In the same year he took part in creating the IBM GML language, which was first publicly presented in 1973.

Some early implementations of computer markup languages can be found in UNIX typesetting utilities such as troff and nroff. They let you insert formatting commands into the text of a document to format it according to the editor's requirements.

The availability of publishing software with WYSIWYG ("what you see is what you get") functionality has pushed most of these languages out of use among ordinary users, although serious publishing work still uses markup to describe specific non-visual text structures, and WYSIWYG editors now most often save documents in formats based on markup languages.

TeX

Another important publishing standard is TeX, created and subsequently improved by Donald Knuth in the 1970s and 1980s. TeX brought together powerful text formatting and font description capabilities, especially for typesetting mathematical books at professional quality. This required Knuth to spend considerable time studying the art of typesetting. TeX is now used mainly in the academic world, where it is the de facto standard in many scientific disciplines. In addition to TeX there is LaTeX, a widely used descriptive markup system built on top of TeX.

Scribe, GML and SGML

The first language with a clear and distinct separation between the structure of a document and its presentation was Scribe, created by Brian Reid and described in his doctoral dissertation in 1980. Scribe was revolutionary in several ways, not least because it introduced the idea of styles separated from the marked-up text, and of a grammar controlling the use of descriptive elements. Scribe influenced the development of the GML language (later SGML) and is also a direct ancestor of HTML and LaTeX.

In the early 1980s, the idea that markup should focus on the structural aspects of a document and leave its visual presentation to the interpreter led to the creation of SGML. The language was developed by a committee chaired by Goldfarb. It combined ideas from many sources, including Tunnicliffe's project, GenCode. Sharon Adler, Anders Berglund and James A. Marke were also key members of the SGML committee.

SGML precisely defined the syntax for including markup in text, and separately described which tags are allowed and where (the DTD, Document Type Definition). This let authors create and use any markup they wanted, choosing the tags that made sense to them and naming them in natural language. Thus SGML should be regarded as a metalanguage, and many special-purpose markup languages are derived from it. From the late 1980s onward, most significant new markup languages, such as TEI and DocBook, were based on SGML.

In 1986, SGML was published as the international standard ISO 8879. SGML found wide acceptance and was used in very large projects. However, it was generally found to be cumbersome and difficult to learn, a side effect of trying to do too much and be too flexible. For example, SGML made end tags (or start tags, or even both) optional in certain contexts, because its designers expected markup to be added manually by project support staff, who would appreciate the savings in keystrokes.

HTML

By 1991, the use of SGML was limited to business programs and databases, while WYSIWYG tools (which saved documents in proprietary binary formats) were used for other document processing. The situation changed when Sir Tim Berners-Lee, who had learned about SGML from his colleague Anders Berglund and others at CERN, used SGML syntax to create HTML. The language resembled other markup languages based on SGML syntax, but it was much easier to get started with, even for developers who had never used markup before. Steven DeRose argued that HTML's use of descriptive markup (and of SGML in particular) was a major factor in the development of the Web, because it made HTML flexible and extensible (other factors included the notion of URLs and the free distribution of browsers). HTML is probably the most widely used markup language in the world today.

However, some computer scientists dispute HTML's status as a markup language. Their main argument is that HTML restricts the placement of tags by requiring that both tags of a pair be completely nested within other tags or within the document's root tags. As a result, these researchers consider HTML a container language that follows a hierarchical model.

XML

XML (Extensible Markup Language) is a meta markup language that is widely used today. XML was developed by the World Wide Web Consortium in a committee chaired by Jon Bosak. Its main purpose was to be simpler than SGML by focusing on a specific problem: documents on the web. Like SGML, XML is a metalanguage: users may create any tags they need (hence "extensible"). The rise of XML was helped by the fact that every XML document can be written so that it is also an SGML document, so programs and users of SGML could migrate to XML fairly easily.

However, XML dropped many of the more complex, human-oriented features of SGML in order to simplify implementation, at the cost of larger markup and reduced readability and editability. Other improvements fixed some SGML problems in international settings and made it possible to parse and interpret the document hierarchy even when no DTD is available.

XML was designed primarily for semi-structured environments such as documents and publications. However, it turned out to hit a sweet spot between flexibility and simplicity, and it was quickly adopted for many other uses. Today XML is widely used for passing data between programs. Like HTML, it can be described as a "container" language.

XHTML

Since January 2000, all W3C recommendations for HTML have been based on XML rather than SGML, and the abbreviation XHTML (Extensible HyperText Markup Language) was introduced. The language specification requires that XHTML documents be well-formed XML documents, which allows XHTML to be used for clearer and more precise documents while using tags familiar from HTML.

One of the most noticeable differences between HTML and XHTML is the rule that all tags must be closed: empty tags such as <br> must either be closed with a regular end tag or be written in a special form, <br /> (the space before the "/" is optional, but it is often used because some pre-XML browsers and SGML parsers accept the tag in that form). Another difference is that all attribute values in tags must be quoted. Finally, all tag and attribute names must be written in lowercase to be valid, whereas HTML is case insensitive.
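These rules can be tried out with a strict XML parser. The sketch below uses Python's xml.etree.ElementTree; the function name is_well_formed is invented for this example. Note that the check covers XML well-formedness only: it catches unclosed tags, unquoted attribute values and tag names whose case does not match, but it does not verify that names are lowercase, which is a requirement of the XHTML vocabulary itself.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment is well-formed XML, False otherwise."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


checks = [
    "<p>line one<br>line two</p>",     # empty tag left open
    "<p>line one<br />line two</p>",   # empty tag in the special closed form
    "<a href=page.html>link</a>",      # attribute value without quotes
    '<a href="page.html">link</a>',    # attribute value in quotes
    "<P>text</p>",                     # start and end tag differ in case
    "<p>text</p>",                     # matching lowercase tags
]

for fragment in checks:
    print(is_well_formed(fragment), fragment)
```

Each pair of fragments differs in exactly one rule, so the first of each pair is rejected and the second is accepted.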

Other XML based developments

Many XML-based developments are now in use, such as RDF (Resource Description Framework), XForms, DocBook, SOAP and OWL (Web Ontology Language).

Features

A common feature of many markup languages is that they intermix the text of a document with markup instructions in the same data stream or file. This is not a necessity: markup can be isolated from the text using pointers, labels, identifiers or other coordination methods. Such separated ("standoff") markup is typical for the internal representations that programs use to work with marked-up documents. However, embedded or "inline" markup is much more common elsewhere. For example, here is a small piece of text marked up with HTML:

<h1>Anatidae</h1>
<p>
The family <i>Anatidae</i> includes ducks, geese, and swans, but <em>not</em> the closely-related screamers.
</p>

The markup instructions (known as tags) are enclosed in angle brackets. The text between them is the actual text of the document. The codes h1, p and em are examples of structural markup: they describe the position, purpose or meaning of the text they enclose.

More precisely, h1 means "this is a first-level heading", p means "this is a paragraph", and em means "this is an emphasized word or phrase". The interpreter can apply its own rules or styles to display the different parts of the text, using different typefaces, font sizes, indentation, color or other styles as needed. A tag such as h1 may, for example, be rendered in a large, bold typeface; in a document with monospaced text (like a typewriter) it may be underlined, or it may not change the appearance at all.

By contrast, the i tag in HTML is an example of visual markup: it is usually used to specify a particular characteristic of the text (use an italic typeface for this block) without stating the reason.

The TEI (Text Encoding Initiative) has published extensive guidelines on how to encode texts of interest to the humanities and social sciences. These guidelines have been used to encode historical documents, the works of particular scholars, periodicals and so on.

Alternative uses

As the idea of using markup languages with text documents developed, markup languages also came into use in other areas, where they represent various types of information, including playlists, vector graphics, web services, and user interfaces. Most of these applications are based on XML because it is a highly structured and extensible language.

XHTML also shows that several markup languages can be combined into a single profile, such as XHTML+SMIL or XHTML+MathML+SVG.
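A combined document of this kind can be sketched in Python. The small XHTML+SVG document below is a made-up illustration, and the helper name vocabulary is hypothetical. The two vocabularies are kept apart by their XML namespaces, which the standard xml.etree.ElementTree module exposes as a prefix on each tag name.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
SVG = "http://www.w3.org/2000/svg"

# A made-up document that mixes two markup vocabularies
document = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p>A circle:</p>
    <svg xmlns="http://www.w3.org/2000/svg" width="40" height="40">
      <circle cx="20" cy="20" r="15"/>
    </svg>
  </body>
</html>"""


def vocabulary(elem):
    """Return (namespace, local name) for a namespaced element."""
    # ElementTree stores namespaced tags as "{namespace}localname"
    namespace, _, local = elem.tag[1:].partition("}")
    return namespace, local


root = ET.fromstring(document)

# Group element names by the markup language they belong to
by_language = {}
for elem in root.iter():
    namespace, local = vocabulary(elem)
    by_language.setdefault(namespace, []).append(local)

for namespace, names in by_language.items():
    print(namespace, names)
```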

Any document has three components:

· structure;

· content;

· style.

Content is the information that is displayed in the document. The content of a document on paper can be purely textual or can also include images. If the document is presented in electronic form, it may contain multimedia data, as well as links to other documents. Although the content of different documents varies, documents can be classified by type, such as a book or a train ticket.

The style of a document determines the form in which its content is displayed on a particular device (for example, a printer or display). The concept of style includes the font characteristics (name, size, color) of the entire output document or its individual blocks, the order of pagination, the arrangement of blocks on pages, and other parameters. The same document can be displayed in different styles both on different media and on the same medium.

Document markup languages are artificial languages designed to describe the structure of a document and the relationships between the objects in that structure. Markup data is also called metadata.

The first markup language was GML (Generalized Markup Language), developed by IBM employees in the 1960s. Its immediate successor was SGML (Standard Generalized Markup Language), which defines the rules for writing document markup elements. A document that follows these rules is called an SGML document.

The SGML language is defined in the ISO 8879 standard, which specifies the following basic requirements for a document markup language:

· The language must be human-readable.

· Marked-up document files must be text files encoded with ASCII (American Standard Code for Information Interchange) characters. However, the content of the document does not have to be ASCII-encoded or textual.

SGML and similar languages use special document markup tools:

· elements and their attributes;

· entities;

· comments.

The structural unit of an SGML document is the element. In the marked-up text, each element must be delimited by inserting a start tag at the beginning of the element and an end tag at its end (the English word "tag" means a label). The start and end tags have the same name. To distinguish tags from plain text, each tag must begin with a character that marks the start of a tag and end with a character that marks the end of a tag. In addition, an end tag contains a character that identifies it as an end tag. In SGML, any characters can be assigned to these roles, but the most common choice is "<" (left angle bracket) to begin a tag, ">" (right angle bracket) to end it, and "/" (slash) to mark an end tag. Elements in an SGML document may contain other elements, so the document can be represented graphically as a hierarchical (tree) structure.


Example 4.3.1. An SGML document containing a list of students with the results of their examination session can be written as follows:

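<student-list>
  <title>List of student grades in session</title>
  <student>
    <full-name>Ivanov Ivan Ivanovich</full-name>
    <group-number>TS-61</group-number>
    <mark-list>
      <mark>A</mark>
      <mark>B</mark>
      <mark>B</mark>
      <mark>B</mark>
    </mark-list>
  </student>
  <student>
    <full-name>Petrov Petr Petrovich</full-name>
    <group-number>TS-62</group-number>
    <mark-list>
      <mark>C</mark>
      <mark>C</mark>
      <mark>D</mark>
      <mark>C</mark>
    </mark-list>
  </student>
</student-list>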

In this document, the first element is the student-list element. This element contains one title element (the title) and several student elements (student data). In turn, each student element contains one full-name element (the student's last name, first name, and patronymic), one group-number element (the group number), and one mark-list element (the list of the student's grades in the session). Finally, the mark-list element contains several mark (grade) elements.

The graphical representation of this list, shown in Fig. 4.3.1, has a tree structure:

Fig. 4.3.1. Graphical representation of the SGML document structure
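The tree structure of the student list can also be sketched in code. Python's standard library has no SGML parser, so the example below treats the document as XML, which is an assumption that works here only because this particular document happens to be well-formed XML. The element names are taken from the description of Example 4.3.1, and the helper name outline is hypothetical.

```python
import xml.etree.ElementTree as ET

# The student list from Example 4.3.1, written with explicit tags
document = """
<student-list>
  <title>List of student grades in session</title>
  <student>
    <full-name>Ivanov Ivan Ivanovich</full-name>
    <group-number>TS-61</group-number>
    <mark-list>
      <mark>A</mark><mark>B</mark><mark>B</mark><mark>B</mark>
    </mark-list>
  </student>
  <student>
    <full-name>Petrov Petr Petrovich</full-name>
    <group-number>TS-62</group-number>
    <mark-list>
      <mark>C</mark><mark>C</mark><mark>D</mark><mark>C</mark>
    </mark-list>
  </student>
</student-list>
"""


def outline(elem, depth=0):
    """Return the element hierarchy as a list of indented tag names."""
    lines = ["  " * depth + elem.tag]
    for child in elem:
        # Each nested element is one level deeper in the tree
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(document.strip())
print("\n".join(outline(root)))
```

The printed outline has student-list at the root, with title and the student elements one level below it, and the mark elements at the deepest level.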

You can use attributes to refine SGML elements. Attributes are written in the start tag of an element in the following form:

attribute-name="attribute-value".

An element can have multiple attributes. Attributes are separated from each other and the element name by at least one space.

Example 4.3.2. For the mark elements in Example 4.3.1, you can specify a subject attribute whose value is the name of the discipline in which the exam was taken. The elements for the first student then take the following form:

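<mark subject="...">A</mark>

<mark subject="...">B</mark>

<mark subject="...">B</mark>

<mark subject="...">B</mark>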
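Reading attribute values back out of the markup can be sketched as follows. The subject names used here (Mathematics, Physics, Chemistry, History) are hypothetical placeholders chosen for the illustration, not the disciplines of the original example, and the fragment is parsed as XML on the assumption that it is well formed.

```python
import xml.etree.ElementTree as ET

# The subject values are hypothetical placeholders for this illustration
fragment = """
<mark-list>
  <mark subject="Mathematics">A</mark>
  <mark subject="Physics">B</mark>
  <mark subject="Chemistry">B</mark>
  <mark subject="History">B</mark>
</mark-list>
"""

mark_list = ET.fromstring(fragment.strip())

# Map each discipline (attribute value) to the grade (element content)
grades = {
    mark.get("subject"): mark.text
    for mark in mark_list.findall("mark")
}

for subject, grade in grades.items():
    print(f"{subject}: {grade}")
```

The attribute refines the element without changing its content: the grade stays in the element text, while the discipline travels in the start tag.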

Languages like SGML use entities to work with groups of data. An entity is any named piece of data, either textual or non-textual. When the document is viewed, the name of the entity is replaced by its value. For example, the name of the text entity kpi is replaced by its value, Kiev Polytechnic Institute, and the non-text entity image1 is replaced by the image named image1.
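The replacement of a text entity by its value can be sketched with an XML document, since XML inherited this mechanism from SGML. The note element and the sentence around the entity are made up for the illustration. Only the entity name kpi and its value come from the text. The entity is declared in the internal DTD subset and referenced as &kpi; in the content.

```python
import xml.etree.ElementTree as ET

# The entity kpi is declared once and referenced by name in the content
document = """<!DOCTYPE note [
  <!ENTITY kpi "Kiev Polytechnic Institute">
]>
<note>Graduate of &kpi;.</note>"""

note = ET.fromstring(document)

# The parser has already substituted the entity's value for its name
print(note.text)
```

Non-text entities such as image1 are not covered by this sketch, because they refer to external data that the parser does not load.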