Annotated Line Breaking Algorithm

10~~/4.0.1~~/5~~/5.1~~/10

{10.0.0: 147-A79}

This annex~~report~~ presents the Unicode~~specification of~~ line breaking algorithm along with detailed descriptions of each of the character classes established by the Unicode line breaking property. The line breaking algorithm produces a set of "break opportunities", or positions that would be suitable~~properties~~ for wrapping lines when preparing text~~Unicode characters as well as a default algorithm~~ for display~~determining line break opportunities~~. ~~A model implementation using pair tables is also provided.algorithm for determining line break opportunities.~~

10.a

{3.0.0: 175-A67}

Discussion: This Annotated Line Breaking Algorithm contains the entire text of Unicode Standard Annex #14, Unicode Line Breaking Algorithm, plus certain annotations. The annotations give a more in-depth analysis of the algorithm. They describe the reason for each nonobvious rule, and point out interesting ramifications of the rules and interactions among the rules (interesting to Unicode maintainers, that is). (The text you are reading now is an annotation.)

10.b

{3.0.0: 175-A67}

The structure of this document is heavily inspired by that of the Annotated Ada Reference Manual. For a description of the various kinds of annotations, see paragraphs 1(2.dd) through 1(2.ll) in that document.

10.c

{3.0.0: 175-A67}

A version number of the form /v[.v[.v]] follows the paragraph number for any paragraph that has been modified from the original Unicode Line Breaking Algorithm (Unicode Version 3.0.0). Paragraph numbers are of the form pp{.nn}, where pp is a sequential numbering of the paragraphs of Version 3.0.0, and the nn are insertion numbers. For instance, the first paragraph inserted after paragraph 3 is numbered 3.1, the second is numbered 3.2, etc. A paragraph inserted between paragraphs 3.1 and 3.2 is numbered 3.1.1, a paragraph inserted between paragraphs 3 and 3.1 is numbered 3.0.1, a paragraph inserted between paragraphs 3 and 3.0.1 is numbered 3.-1.1. Inserted text is indicated by highlighting, and deleted text is indicated by strikethroughs. Colour is used to indicate the version of the change. Deleted paragraphs are indicated by the text “This paragraph was deleted.”, or by a description of the new location of any text retained. Compare the Annotated Ada 2012 Reference Manual, Introduction (77.5).

10.d

{3.0.0: 175-A67}

Annotations are numbered similarly, except that the first insertion number is alphabetic rather than numeric.
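The ordering implied by this numbering scheme can be sketched in a few lines of Python. This is only an illustration, not part of the annex: it assumes that purely numeric paragraph numbers compare component by component (so that a shorter number sorts before its own insertions), which matches the examples given above. The function name `paragraph_key` is hypothetical, and annotation numbers with alphabetic components are deliberately not handled.

```python
def paragraph_key(number: str) -> tuple:
    """Turn a purely numeric paragraph number such as '3.-1.1' into a sortable tuple.

    Assumption: components compare as integers, left to right, and a number
    that is a prefix of another sorts first (Python tuple ordering).
    """
    return tuple(int(part) for part in number.split("."))


# The paragraph numbers used as examples in the text, in scrambled order.
numbers = ["3.2", "3.1.1", "3", "3.0.1", "3.1", "3.-1.1"]
ordered = sorted(numbers, key=paragraph_key)
print(ordered)
```

Under that assumption, 3.-1.1 lands between 3 and 3.0.1, and 3.1.1 lands between 3.1 and 3.2, as described.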

10.e

{3.0.0: 175-A67}

Discussion: This document is available as an interactive web page; the bar on the right-hand side of the document allows for the selection of the base version from which changes are shown and the “head” version which determines the most recent changes shown. Paragraphs deleted by the base version or by earlier versions may be suppressed. Clicking on a version number sets the head to that version and the base to the preceding version, thus showing the changes from that version. These settings are reflected as URL parameters.

12~~/3.0.1~~/4/4.1

{3.0.1: 83-C6; L2/00-118}

This document has been reviewed by Unicode members and other interested parties, and has been approved for publication by the Unicode Consortium. This is a stable document and may be used~~Technical Committee~~ as reference material or cited as a normative~~Unicode Standard Annex. This~~ ~~It~~ ~~is a stable document and may be used as~~ reference~~contains informative~~ ~~material or cited as aand~~ ~~normative~~ ~~specifications which have been considered and approved by the Unicode Technical Committee for publication as a Technical Report and as part of the Unicode Standard, Version 3.0. Any~~ ~~reference~~ by other specifications.~~from another document.to version 3.0 of the Unicode Standard automatically includes this technical report. Please mail corrigenda and other comments to the author.~~

12.a

{3.0.0: 175-A67}

To be honest: The document that has been reviewed by the UTC is the actual UAX #14, available at https://www.unicode.org/unicode/reports/tr14/. While the text outside of the annotations comes from that UAX, it has been processed in a way that is not stable and may alter its meaning; in particular, most formatting is lost.

12.b

{3.0.0: 175-A67}

The annotations have not been considered, reviewed, or approved by the UTC or by any other Technical Committee.

12.c

{3.0.0: 175-A67}

This Annotated UAX is not a stable document. It has not been approved by any of the Unicode Technical Committees, nor is it part of the Unicode Standard or any other Unicode specification.

12.1~~/3.0.1/3.2~~/4/5

{3.0.1: 83-C6; L2/00-118}

A Unicode Standard Annex (UAX) forms an integral part of the Unicode Standard, but is published online as a separate document. The Unicode Standard may require~~Note that~~ conformance to normative content in a~~carrying the same~~ Unicode Standard Annex, if so specified in the Conformance chapter of that version of the Unicode Standard. The version number of a UAX document corresponds ~~includes conformance~~ to the version ~~number~~ of the~~its~~ Unicode Standard of which it forms a part.~~at the last point that theAnnexes. The version number of, but is published as~~ ~~a~~ ~~UAXseparate~~ ~~document~~ ~~corresponds. Note that conformance~~ ~~to the~~ ~~a~~ ~~version number of the Unicode Standard at the last point that the UAX document~~ ~~was updated.includes conformance to its Unicode Standard Annexes.~~

13~~/3.0.1~~/4/5/5.2

{3.0.1: 83-C6; L2/00-118}

Please submit corrigenda and other comments with the online reporting form [Feedback]. Related information that is useful in understanding this annex~~document~~ is found in Unicode Standard Annex #41, “Common~~the~~ References for Unicode Standard Annexes.”~~section.~~ For the latest version of the Unicode Standard, see [Unicode]. For a list of current Unicode Technical Reports, see~~See~~ [Reports]. For more information about versions of the ~~for a~~ ~~A~~ ~~list of currentThe content of all technical reports must be understood in the context of the appropriate version of the~~ Unicode ~~Technical Reports~~ ~~is found onStandard. References in this technical report to sections of the Unicode Standard refer to the Unicode Standard, Version 3.0. See~~ ~~http://www.unicode.org/unicode/reports/. For more information about versions of the Unicode~~ Standard, see [Versions]. For any errata which may apply to this annex, see [Errata].~~http://www.unicode.org/unicode/standard/versions/.~~ ~~for more information.~~

13.1~~/3.0.1/3.1~~/4

{3.0.1: 83-C6; L2/00-118}

This paragraph was deleted. ~~The References provide related information that is useful in understanding this document. Please mail corrigenda and other comments to the author(s).~~

15~~/4.0.1~~/4.1

• 1. Overview and Scope

16~~/4.0.1~~/4.1

• 2. Definitions

17/4~~/4.0.1~~/4.1

• 3. Introduction~~Description~~

17.1/4.1

3.1 Determining Line Break Opportunities

18~~/4.0.1~~/4.1

• 4. Conformance

18.1/4/5

This paragraph was deleted. ~~4.1 Line Breaking Properties~~

18.2/4/5/5.1

This paragraph was deleted. ~~4.1~~ ~~2~~ ~~Line Breaking Algorithm~~

18.3/5/5.1

This paragraph was deleted. ~~4.2 Line Breaking Properties~~

18.4/5/5.1

This paragraph was deleted. ~~4.3 Higher-Level Protocols~~

18.5/5.1

4.1 Conformance Requirements

19~~/4.0.1~~/4.1

• 5. Line Breaking Properties

20/3.2

• 5.1 Description of Line Breaking Properties

21~~/3.2~~/4.1

• 5.2 ~~Additional Details on~~ Dictionary Usage

21.0.1/5.1

5.3 Use of Hyphen

21.1/4~~/4.1~~/5.1

5.4~~3~~ ~~Additional Details on the~~ Use of Soft Hyphen

21.2~~/4.0.1/4.1~~/5.1

5.5~~4~~ ~~Additional Details on the~~ Use of Double Hyphen

21.3~~/4.1~~/5.1

5.6~~5~~ Tibetan Line Breaking

21.4/5.1

5.7 Word Separator Characters

22~~/4.0.1~~/4.1

• 6. Line Breaking Algorithm

22.1~~/4.1~~/5

This paragraph was deleted. ~~6.1 Line Breaking Rules~~

22.2/5

6.1 Non-tailorable Line Breaking Rules

22.3/5

6.2 Tailorable Line Breaking Rules

23/4~~/4.0.1/4.1~~/5/10

{10.0.0: 147-A79}

• 7. Deleted. (Formerly was: Pair Table-Based~~basedTabletable~~ ~~Based~~ Implementation)

24~~/3.2~~/10

{10.0.0: 147-A79}

This paragraph was deleted. • ~~7.1 Minimal Table~~

25~~/3.2~~/10

{10.0.0: 147-A79}

This paragraph was deleted. • ~~7.2 Extended Context~~

26~~/3.2/4.0.1~~/10

{10.0.0: 147-A79}

This paragraph was deleted. • ~~7.3 Example Pair Table~~

27~~/3.2~~/10

{10.0.0: 147-A79}

This paragraph was deleted. • ~~7.4 Sample Code~~

28~~/3.2~~/10

{10.0.0: 147-A79}

This paragraph was deleted. • ~~7.5 Combining Marks~~

28.1/4/6.3

This paragraph was deleted. ~~7.6 Conjoining Jamos~~

28.2~~/5.2/6.3~~/10

{10.0.0: 147-A79}

This paragraph was deleted. ~~7.6~~ ~~7~~ ~~Explicit Breaks~~

29~~/3.2~~/4~~/4.0.1~~/4.1

• 8. ~~• 7.6~~ Customization

29.1/4

8.1 Types of Tailoring

30~~/3.2~~/4

8.2 ~~• 7.7~~ Examples of Customization

30.0.1/5

9 Implementation Notes

30.0.2/5

9.1 Combining Marks in Regular Expression-Based Implementations

30.1~~/4.1~~/5

9.2~~8.3~~ Legacy Support for Space Character as Base for Combining Marks

30.2/5.1

10 Testing

30.3~~/5.2~~/16

{16.0.0: 175-A67}

11 History~~Rule Numbering Across Versions~~

31~~/3.1~~/4.0.1

• 8 References

32~~/3.1~~/4.0.1

• 9 Acknowledgments

33~~/3.1~~/4.0.1

• Modifications~~10 Changes from Previous Revisions~~

1 Overview and Scope

34.1/5.2

Line breaking, also known as word wrapping, is the process of breaking a section of text into lines such that it will fit in the available width of a page, window or other display area. The Unicode Line Breaking Algorithm performs part of this process. Given an input text, it produces a set of positions called "break opportunities" that are appropriate points to begin a new line. The selection of actual line break positions from the set of break opportunities is not covered by the Unicode Line Breaking Algorithm, but is in the domain of higher level software with knowledge of the available width and the display size of the text.
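The division of labour described here can be illustrated with a minimal Python sketch of the higher-level step, that is, choosing actual line breaks from a given set of break opportunities. It is not part of the Unicode Line Breaking Algorithm. The function name `fit_lines`, the greedy strategy, and the simplifying assumptions (monospaced text, width counted in characters, trailing spaces not counted against the width) are all illustrative; the list of offsets merely stands in for the algorithm's output.

```python
def fit_lines(text: str, opportunities: list, width: int) -> list:
    """Greedy line fitting: end each line at the last opportunity that still fits.

    `opportunities` holds offsets into `text` where a new line may begin.
    Assumption: every character is one unit wide and trailing spaces are free.
    """
    # The end of the text always ends the last line.
    candidates = sorted(set(opportunities) | {len(text)})
    lines = []
    start = 0
    while start < len(text):
        later = [p for p in candidates if p > start]
        fitting = [p for p in later if len(text[start:p].rstrip()) <= width]
        # If nothing fits, accept an overfull line ending at the first opportunity.
        end = fitting[-1] if fitting else later[0]
        lines.append(text[start:end].rstrip())
        start = end
    return lines


# Hypothetical input: opportunities after each space.
print(fit_lines("The quick brown fox", [4, 10, 16], 10))
```

A different display width yields different line breaks from the same set of opportunities, which is why the selection step lies outside the scope of the algorithm.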

35~~/3.1~~/4~~/4.0.1/4.1~~/5/5.2

The~~Although the~~ text of the~~The~~ Unicode Standard [Unicode~~U3.0~~] presents a limited description of some of the characters with specific functions~~function~~ in~~a summary of basic~~ line ~~-~~breaking~~behavior~~, but~~it~~ ~~but~~ does not give a complete specification of line breaking behavior. This annex~~Unicode Standard Annextechnical report~~ provides more detailed~~the needed~~ information about default line breaking behavior, reflecting~~in a way that reflects~~ best practices~~. The Unicode Standard assigns normativeNormative~~ ~~line-breaking properties to those characters that are intendedassigned~~ ~~to explicitly influencethose characters whose line breaking behavior must be identical across all implementations. For all other classes of characters informative~~ ~~,~~ ~~line-breaking and~~ for the support of multilingual texts.~~which the line-breaking behavior is therefore expected to be identical across all implementations.properties are provided.~~

35.1/4~~/4.0.1/4.1~~/5/5.1

For most Unicode characters, considerable variation in line breaking behavior can be expected, including variation based on local or stylistic preferences. For that reason~~Therefore~~, the line breaking properties provided for these characters~~that~~ are informative. Some characters are intended to explicitly influence line breaking. Their line breaking behavior is therefore expected to be identical across all implementations. As described in this annex, the Unicode Standard assigns normative line breaking properties to those characters. The Unicode Line Breaking Algorithm is a tailorable set of rules that uses these line break~~Standard assigns normative line break~~ing properties ~~to those characters. The Unicode Line Breaking Algorithm is a tailorable set of rules that usesprovided for~~ ~~these line breaking properties~~ in context~~characters are informative. Some characters are intended~~ to determine~~explicitly influence~~ line break opportunities.~~ing. Their line breaking behavior is therefore expected to be identical across all implementations. The Unicode Standard assigns normative line breaking properties to thoseother~~ ~~characters.~~ ~~informative line-breaking properties are provided. For these characters, considerable variation in line-breaking behavior can be expected, including variation based on local or stylistic preferences.~~

35.2~~/4.0.1~~/4.1

This paragraph was deleted. ~~The Unicode Line Breaking Algorithm is a tailorable set of rules that uses these line breaking properties in context to determine line break opportunities.~~

36~~/3.1~~/4~~/4.0.1/4.1~~/5~~/5.1/5.2~~/10

{10.0.0: 147-A79}

This annex~~document~~ opens with~~Following the~~ formal definitions, a ~~and~~ summary of the line ~~-~~breaking task and the context in which it occurs in overall text layout, followed by a brief section on conformance requirements. Two~~ThreeFourproperties, there are fourthree~~ main sections follow:

37~~/3.1~~/4~~/4.1~~/5/5.2

• ~~1.~~ Section 5, Line Breaking Properties, contains a narrative~~textual~~ description of the line breaking behavior of the characters ~~in~~of the Unicode Standard, ~~and their~~ grouping them in alphabetical order by line breaking class.~~property. These descriptions do not take account of the order of precedence.~~ ~~Section 6 provides a set of rules listed in order of precedence that constitute a line breaking algorithm. Section 7 provides the detailed description of an efficient pair table based implementation of the algorithm.~~

37.1~~/3.1~~/4~~/4.1~~/5

• ~~2.~~ Section 6, Line Breaking Algorithm, provides a set of rules listed in order of precedence that constitute a line breaking algorithm.

37.2~~/3.1~~/4~~/4.0.1/4.1~~/5/10

{10.0.0: 147-A79}

This paragraph was deleted. •3. ~~Section 7, Pair Table-BasedTable~~ ~~based~~ ~~Implementation, describesprovides the detailed description of~~ ~~an efficient pair table-~~ ~~based implementation of the algorithm.~~

37.2.1/5/5.2

The next~~final two~~ sections discuss issues of customization and implementation.

37.3/4~~/4.1~~/5

• Section 8, Customization, provides a discussion of how~~on ways~~ to ~~customize or~~ tailor the algorithm.

37.4/5

• Section 9, Implementation Notes, provides additional information to implementers using regular expression-based techniques or requiring legacy support for combining marks.

37.5/5.2

• Section 10, Testing, describes the test data file that is available for checking implementations of the line breaking algorithm.

37.6~~/5.2~~/16

{16.0.0: 175-A67}

• Section 11, History, provides references to additional documentation for investigating~~Rule Numbering Across Versions, documents~~ changes to the algorithm~~in the numbering of the line breaking rules~~ across Unicode versions.

39/4~~/4.1~~/5~~/5.2~~/6/8

The notation~~All terms not~~ defined in this annex differs somewhat from the notation~~here shall be as~~ defined elsewhere in the Unicode Standard. ~~[UnicodeUnicode5.20]. The notation defined in this annex differs somewhat from the notation defined elsewhere in the Unicode Standard. All other]. The~~ ~~notation used here without an explicit definition shall be as defined elsewhere in~~ ~~this technical report differs somewhat from the notation defined elsewhere in~~ ~~the Unicode Standard.~~ ~~All other notation used here without an explicit definition shall be as defined in the Unicode Standard.~~

39.1/8

All other notation used here without an explicit definition shall be as defined elsewhere in the Unicode Standard [Unicode].

40/4~~/4.0.1~~/5/8

LD1. Line Fitting: The~~fitting —- the~~ process of determining ~~the~~ how much text will fit on a line of text, given the available space between the margins and the actual display width of the text.

41/3.1

This paragraph was deleted. ~~Overfull - a line that contains so much text that it does not fit in the space allotted, or only after unacceptable compression of the text.~~

42/3.1

This paragraph was deleted. ~~Underfull - a line that contains so little text that it ends too far from the margin, or one that would require unacceptable expansion when lines are justified.~~

43~~/4.0.1~~/5/8

LD2. Line Break: The ~~—- the~~ position in the text where one line ends and the next one starts.

44~~/4.0.1/4.1~~/5/8

LD3. Line Break Opportunity: A place where ~~—-~~ a ~~place where a~~ line is allowed to end. ~~Whether a given position in the text is a valid line break opportunity depends on context as well as the line breaking rules in force., as well as on context.~~

44.1/5

• Whether a given position in the text is a valid line break opportunity depends on the context as well as the line breaking rules in force.

45/4~~/4.0.1/4.1~~/5/8

LD4. Line Breaking: The ~~—- the~~ process of selecting one among several line break opportunities such that the resulting line is optimal or ends at~~that part of~~ a ~~text that can be displayed on a line. In other words, selecting one among several line breaking~~ ~~opportunities such that the resulting line is optimal or ends at a(unless the~~ user-requested ~~an~~ explicit line break.~~)~~

46/4~~/4.0.1/4.1~~/5~~/5.1~~/8

LD5. Line Breaking Property: ~~—-~~ A character property with enumerated~~mutually exclusive~~ values, as listed~~set out~~ in Table 1, and separated into normative and informative values. ~~Line breaking property values are used to classify characters, and takenarranged~~ ~~in context, determine the type ofapproximate order of precedence. Line~~ ~~break.ing properties are used to determine the type of break.~~

46.0.1/5/5.1

• Line breaking property values are used to classify characters and, taken in context, determine the type of line break opportunity.

46.0.1.a/10

{10.0.0: 147-A79}

Discussion: While many rules depend only on the code points on either side of the break (those of the form A # B or simply A # or # B, where # is one of ÷, ×, or !), others depend on context further away. Such rules are said to require extended context.

46.0.1.b/10

{10.0.0: 147-A79}

Rules requiring extended context used to be a concern for pair table-based implementations, and were listed in Section 7. As more rules of this kind have been added, pair table-based implementations have become intractable, and this section has been removed.

46.0.1.c/10

{10.0.0: 147-A79}

However, extended context can lead to unexpected interactions between the rules, so rules that require it are called out in this annotated version with the annotation “Ramification: This rule requires extended context.” in order to facilitate the analysis of the algorithm.

46.0.1.c.1/15.1

{15.1.0: 175-C23, 175-A71; L2/23-063}

Implementations based on state machines may require special treatment for rules of the form A # B C or similar that require more than one character of lookahead beyond the break. These are annotated with “Ramification: This rule requires extended context after the break.”.
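The distinction drawn in these annotations can be made concrete with a small Python sketch. It is a deliberate simplification and not how the rules of the algorithm are actually specified: each rule is modelled as a sequence of class patterns before the break, an operator, and a sequence of class patterns after the break. The class name `Rule` and its two methods are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A simplified rule: patterns before the break, an operator, patterns after."""
    before: tuple  # patterns left of the break, nearest one last
    op: str        # "÷", "×" or "!"
    after: tuple   # patterns right of the break, nearest one first

    def requires_extended_context(self) -> bool:
        # More than one pattern on either side means looking past the
        # code points immediately adjacent to the break.
        return len(self.before) > 1 or len(self.after) > 1

    def requires_extended_context_after(self) -> bool:
        # More than one pattern of lookahead beyond the break.
        return len(self.after) > 1


pair_rule = Rule(("B",), "×", ("A",))             # form A # B
one_sided = Rule(("BK",), "!", ())                # form A #
with_spaces = Rule(("B", "SP+"), "÷", ("A",))     # context before the break
lookahead = Rule(("A",), "×", ("B", "C"))         # form A # B C

for rule in (pair_rule, one_sided, with_spaces, lookahead):
    print(rule, rule.requires_extended_context(), rule.requires_extended_context_after())
```

In this model only the last rule needs special treatment in a state machine with a single character of lookahead, while the last two both count as requiring extended context.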

46.1/4~~/4.0.1~~/5/8

LD6. Line Breaking Class: A ~~—- a~~ class of characters with the same line breaking property value.

46.2/5/5.2

• The line breaking classes are described in Section 5.1, Description of Line Breaking ~~Classes are described in Section 5.1, Description of Line Breaking~~ Properties.

47~~/3.1~~/4~~/4.0.1~~/5/8

LD7. Mandatory Break: A line must break following ~~-—~~ a character that has the mandatory~~line must~~ break ~~following a character that has the mandatory break~~ property. ~~Such a break is alsoAlso~~ ~~known as a forced break and. This~~ ~~is indicated in the rules as B !, where B is the character with the mandatory break property.~~ ~~(In the notation of the Unicode Standard, Version 3.0 [U3.0], this would be: B ×, although the standard doesn't specify whether or not a break is forced or just an opportunity.)~~

47.1/5/5.2

• Such a break is also known as a forced break and is indicated in the rules as B !, where B is the character with the mandatory break property.

48~~/3.1~~/4~~/4.0.1~~/5/8

LD8. Direct Break: A ~~—- a~~ line break~~ing~~ opportunity exists between two adjacent characters of the given line breaking classes~~properties~~. ~~This is indicated in the rules below as B ÷ A, where B is the character class of the character before and A is the character class of the character after the break.~~ ~~(If they are separated by one or more space characters, a break opportunity also exists after the last space. In) This indicated in~~ ~~the pair table,~~ ~~below as B ÷ A, where B is the character class of the character before and A is the character class of the character after the break and~~ ~~the optional space characters are not shown.~~

48.1/5~~/5.1/5.2~~/10

{10.0.0: 147-A79}

• A direct break is indicated in the rules below as B ÷ A, where B is the character class of the character before and A is the character class of the character after the break. If they are separated by one or more space characters, a break opportunity ~~also~~ exists instead after the last space. ~~In the pair table, the optional space characters are not shown.~~

49~~/3.1~~/4~~/4.0.1~~/5/8

LD9. Indirect Break: A ~~—- a~~ line break~~ing~~ opportunity exists between two characters of the given line breaking classes~~properties~~ only if they are separated by one or more spaces. ~~In this case, a break opportunity exists after the last space. No break opportunity exists if the characters are immediately adjacent. This is indicated in the pair table below as B % A, where B is the character class of the character before and A is the character class of the character after the break. Even though no~~ ~~and the optional~~ ~~space characters are not shown in the pair table, an indirect break can only occur if one or more spaces follow B. In the notation of the rules in Section 6, Line Breaking AlgorithmUnicode Standard,~~ ~~this would be represented as two rules: B × A and B SP+ ÷ A.~~

49.1/5

• For an indirect break, a break opportunity exists after the last space. No break opportunity exists if the characters are immediately adjacent.

49.2/5~~/5.1/5.2~~/10

{10.0.0: 147-A79}

• In the notation of the rules in Section 6, Line Breaking Algorithm, an~~An~~ indirect break is represented~~indicated in the pair table in Table 2~~ as two rules: B × A and ~~B % A, where B is the character class of the character before and A is the character class of the character after the break. Even though space characters are not shown in the pair table, an indirect break can occur only if one or more spaces follow B. In the notation of the rules in Section 6, Line Breaking Algorithm, this would be represented as two rules: B × A and~~ B SP+ ÷ A where the “+” sign means one or more occurrences.

49.2.a/10

{10.0.0: 147-A79}

Discussion: Indirect breaks can be represented with such a rule requiring extended context, but within the algorithm, they are instead expressed as × SP, SP ÷, B × A, which does not require extended context.

50~~/3.1~~/4~~/4.0.1~~/5/8

LD10. Prohibited Break: No ~~—- no~~ line break~~ing~~ opportunity exists between two characters of the given line breaking classes~~properties~~, even if they are separated by one or more space characters. ~~This is indicated in the pair table below as B ^ A, where B is the character class of the character before and A is the character class of the character after the break and the optional space characters are not shown. In the notation of the~~ ~~the~~ ~~rules in Section 6, Line Breaking AlgorithmUnicode Standard,~~ ~~this would be expressed as athe~~ ~~rule of the form: B SP* × A.~~

50.1/5~~/5.1/5.2~~/10

{10.0.0: 147-A79}

• In the notation of the rules in Section 6, Line Breaking Algorithm, a~~A~~ prohibited~~direct~~ break is expressed~~indicated in the pair table in Table 2~~ as a rule of the form: ~~B ^ A, where B is the character class of the character before and A is the character class of the character after the break, and the optional space characters are not shown. In the notation of the rules in Section 6, Line Breaking Algorithm, this would be expressed as a rule of the form:~~ B SP* × A.

50.1.a/10

{10.0.0: 147-A79}

Discussion: Not all prohibited breaks involve a rule requiring extended context: a rule × A before the rule SP ÷ is a prohibited break before A. However, prohibited breaks with context before spaces require extended context.
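The three kinds of break defined in LD8 through LD10 differ only in how they react to intervening spaces, which a short Python sketch can make explicit. This is an illustration under stated assumptions rather than an implementation of the algorithm: the enumeration `Action` and the function `break_allowed` are hypothetical names, and the sketch only answers whether a break opportunity exists between a character of class B and a character of class A (after the last space, if any spaces intervene). Mandatory breaks (LD7) are not modelled.

```python
from enum import Enum


class Action(Enum):
    DIRECT = "B ÷ A"               # opportunity with or without spaces
    INDIRECT = "B × A, B SP+ ÷ A"  # opportunity only across spaces
    PROHIBITED = "B SP* × A"       # never an opportunity


def break_allowed(action: Action, spaces_between: int) -> bool:
    """Return True if a break opportunity exists between B and A.

    `spaces_between` is the number of space characters separating B and A;
    when it is positive, the opportunity (if any) is after the last space.
    """
    if action is Action.DIRECT:
        return True
    if action is Action.INDIRECT:
        return spaces_between > 0
    return False


for action in Action:
    print(action.name, [break_allowed(action, n) for n in (0, 1, 3)])
```

The indirect case is the only one whose result depends on the number of spaces, which is why it is written as two rules.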

51~~/4.0.1/4.1~~/5/8

LD11. Hyphenation: Hyphenation uses language-specific rules to provide additional line break opportunities within a word. ~~Hyphenation improves the layout of narrow columns, especially for languages with many longer words, such as German or Finnish. For the purpose of this document, it is assumed that hyphenation is equivalent to insertion of soft hyphen characters. All other aspects of hyphenation are outside the scope of this document.~~

51.1/5

• Hyphenation improves the layout of narrow columns, especially for languages with many longer words, such as German or Finnish. For the purpose of this annex, it is assumed that hyphenation is equivalent to inserting soft hyphen characters. All other aspects of hyphenation are outside the scope of this annex.
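The assumption that hyphenation is equivalent to inserting soft hyphen characters can be sketched as follows. The function name `insert_soft_hyphens` and the hyphenation points in the example are hypothetical; in practice the points would come from a language-specific hyphenation dictionary or pattern set, which is outside the scope of this annex.

```python
SOFT_HYPHEN = "\u00AD"  # U+00AD SOFT HYPHEN


def insert_soft_hyphens(word: str, points: list) -> str:
    """Insert U+00AD at each offset in `points`.

    Offsets refer to positions in the original word and are assumed
    to lie strictly inside it.
    """
    pieces = []
    previous = 0
    for point in sorted(points):
        pieces.append(word[previous:point])
        previous = point
    pieces.append(word[previous:])
    return SOFT_HYPHEN.join(pieces)


# Hypothetical hyphenation points for illustration only.
marked = insert_soft_hyphens("hyphenation", [2, 6])
print(marked.replace(SOFT_HYPHEN, "|"))
```

The resulting string can then be passed to a line breaking implementation, which treats each soft hyphen according to its line breaking class.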

51.2/5/5.1

Table 1 lists all of the line breaking classes by name, also giving their class abbreviation and their status as tailorable or not. The examples and brief indication of line breaking behavior in this table are merely typical, not exhaustive. Section 5.1, Description of Line Breaking Properties, provides a detailed description of each~~summary listing of all~~ line breaking class, including~~classes while Section 5.1, Description of Line Breaking Properties, provides a~~ detailed overview of the~~description of each~~ line breaking behavior for characters of that class.~~, including detailed overview of the line breaking behavior for characters of that class.~~

52/4~~/4.1~~/5/5.2

Table 1. Line Breaking Classes~~Properties~~ ~~(* = non-tailorablenormative)~~

53~~/3.1~~/4~~/4.0.1~~/5/5.2

Class~~Value~~

Descriptive Name~~Line Breaking Property~~

Examples

Behavior

~~Characters with This Property...this property...~~

53.1~~/3.1~~/4/5

Non-tailorable~~Normative~~ Line Breaking Classes~~Properties~~

54~~/3.1/4.0.1~~/5~~/5.2~~/11

{11.0.0: 155-A26; PRI-376#ID20180414084252}

BK *

Mandatory Break

NL, PARAGRAPH SEPARATOR~~PS~~

Cause~~cause~~ a line break (after).

55~~/3.1/4.0.1~~/5/5.2

CR *

Carriage Return

Cause~~cause~~ a line break (after), except between CR and LF

56~~/3.1/4.0.1~~/5/5.2

LF *

Line Feed

Cause~~cause~~ a line break (after)~~, except between CR and LF~~

57~~/3.1~~/4~~/4.0.1~~/5/5.2

CM *

~~Attached Characters and Combining Marks~~

Combining Mark

Combining marks~~Marks~~, control codes~~, Conjoining Jamo (non-initial)~~

Prohibit~~prohibit~~ a line break between the character and the preceding character

57.1/4~~/4.0.1~~/5/5.2

NL *

Next Line

NEL

Cause~~cause~~ a line break (after)

58~~/3.1~~/4~~/4.0.1~~/5~~/5.1~~/5.2

SG *

Surrogate~~Surrogates~~

~~High~~ Surrogates

Do~~Should~~ ~~should~~ not occur in well-formed text~~prohibit a break betweenfrom~~ ~~a high and a following low surrogate~~

58.1/4~~/4.0.1~~/5~~/5.1~~/5.2

WJ *

Word Joiner

Prohibit~~prohibit~~ line breaks before and~~or~~ after

59~~/3.1~~/4~~/4.0.1~~/5/5.2

ZW *

Zero Width Space

ZWSP

Provide~~provide~~ a~~optional~~ break opportunity

60/3.1

This paragraph was deleted. IN

~~Inseparable~~

~~Leaders~~

~~allow only indirect line breaks between pairs.~~

61~~/3.1/3.2~~/4~~/4.0.1~~/5~~/5.1~~/5.2

{3.2.0: 81-M6, 85-M7; L2/00-258} {3.2.0: 83-AI43, 84-M10, 85-M13; L2/00-156}

GL *

Non-breaking (“Glue”)

CGJ,

NBSP, ZWNBSP~~, WJ, CGJZWNSP~~

Prohibit~~prohibit~~ line breaks before and~~or~~ after.

62~~/3.1/4.0.1~~/5

This paragraph was deleted. CB *

~~Contingent Break Opportunity~~

~~Inline Objects~~

~~provide a line break opportunity contingent on additional information.~~

63~~/3.1~~/4~~/4.0.1~~/5~~/5.1~~/5.2

SP *

Space

SPACE

~~Space~~

Enable indirect~~Generally~~ ~~generally~~ ~~provide a~~ line ~~break opportunity after the character, enableenables~~ ~~indirect~~ breaks

63.0.1/9

ZWJ

Zero Width Joiner

Prohibit line breaks within joiner sequences

63.1/3.1

Break Opportunities

63.2/4~~/4.0.1~~/5

B2

Break Opportunity Before and After

Em dash

~~EM Dash~~

Provide~~provide~~ a line break opportunity before and after the character

64~~/3.1/4.0.1~~/5/5.2

BA

Break ~~Opportunity~~ After

Spaces, hyphens~~Hyphens~~

Generally~~generally~~ provide a line break opportunity after the character

65~~/3.1/4.0.1/4.1~~/5/5.2

BB

Break ~~Opportunity~~ Before

Punctuation used in dictionaries

Generally~~generally~~ provide a line break opportunity before the character.

66~~/3.1~~/4

This paragraph was deleted. B2

~~Break Opportunity Before and After~~

~~EM Dash~~

~~provide a line break opportunity before and after the character~~

67~~/3.1/4.0.1~~/5

HY

Hyphen

HYPHEN~~Hyphen~~-MINUS~~Minus~~

Provide~~provide~~ a line break opportunity after the character, except in numeric context

67.0.1/5

CB

Contingent Break Opportunity

Inline objects

Provide a line break opportunity contingent on additional information

67.1/3.1

Characters Prohibiting Certain Breaks

67.1.1/4~~/4.0.1~~/5/5.2

CL

Close~~Closing~~ Punctuation

“~~)”, “]”, “~~}”, “❳”, “⟫” etc.

Prohibit~~prohibit a~~ line breaks~~break~~ before

67.1.1.1/5.2

CP

Close Parenthesis

“)”, “]”

Prohibit line breaks before

67.1.2/4~~/4.0.1~~/5

EX

Exclamation/Interrogation

“!”, “?”, etc.

Prohibit~~prohibit~~ line breaks~~break~~ before

67.2~~/3.1/4.0.1~~/5

IN

Inseparable

Leaders

Allow~~allow~~ only indirect line breaks between pairs.

68~~/3.1/4.0.1~~/5/6.1

NS

Nonstarter~~Non Starter~~

“‼”, “‽”, “⁇”, “⁉”, etc.

~~small kana~~

Allow~~allow~~ only indirect line breaks~~break~~ before

69~~/3.1/4.0.1~~/5/5.2

OP

Open~~ing~~ Punctuation

“(”, “[”, “{”, etc.

Prohibit~~prohibit a~~ line breaks~~break~~ after

70~~/3.1~~/4

This paragraph was deleted. CL

~~Closing Punctuation~~

~~“)”, “]”, “}”, etc.~~

~~prohibit a line break before~~

71~~/3.1/4.0.1~~/5~~/5.2~~/16

{16.0.0: 175-C23, 175-A71; L2/23-063}

QU

~~Ambiguous~~ Quotation

Quotation marks

Act~~act~~ like they are opening, closing, or both ~~opening and closing~~

72~~/3.1~~/4

This paragraph was deleted. EX

~~Exclamation/Interrogation~~

~~“!”, “?” etc.~~

~~prohibit line break before~~

73/3.1

Numeric Context~~ID~~

~~Ideographic~~

~~Ideographs~~

~~break before or after~~

73.1/4~~/4.0.1~~/5/5.2

IS

Infix Numeric Separator ~~(Numeric)~~

. ,

~~. ,~~

Prevent~~prevent~~ breaks after any and before numeric

74~~/3.1/4.0.1~~/5

NU

Numeric

Digits

Form~~form~~ numeric expressions for line breaking purposes

75~~/3.1~~/4

This paragraph was deleted. IS

~~Infix Separator (Numeric)~~

~~. ,~~

~~prevent breaks after any and before numeric~~

75.1/4~~/4.0.1~~/5/5.2

PO

Postfix (Numeric)

%, ¢

Do not break following a numeric expression

75.2/4~~/4.0.1/4.1~~/5/5.2

PR

Prefix (Numeric)

$, £, ¥, etc.

Do~~do~~ not~~don't~~ break in front of a numeric expression

76~~/3.1/4.0.1~~/5

SY

Symbols Allowing Break After~~Breaks~~

Prevent~~prevent~~ a break before, and allow a break after

77/3.1

This paragraph was deleted. AL

~~Ordinary Alphabetic and Symbol Characters~~

~~Alphabets and regular symbols~~

~~are alphabetic characters or symbols that are used with alphabetic characters~~

78~~/3.1~~/4

This paragraph was deleted. PR

~~Prefix (Numeric)~~

~~$, £, ¥, etc.~~

~~don't break in front of a numeric expression~~

79~~/3.1~~/4

This paragraph was deleted. PO

~~Postfix (Numeric)~~

~~%, ¢, ‰, º~~

~~don’t break following a numeric expression~~

79.1/3.1

Other Characters

79.1.1/4~~/4.0.1~~/5

Ambiguous (Alphabetic or Ideographic)

Characters with Ambiguous East Asian Width

Act like AL when the resolved EAW is N; otherwise, act ~~like AL when the resolved EAW is N otherwise act~~ as ID

79.1.2/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

Aksara

Consonants

Form orthographic syllables in Brahmic scripts

79.2~~/3.1/4.0.1~~/5/5.2

~~Ordinary~~ Alphabetic ~~and Symbol Characters~~

Alphabets and regular symbols

Are alphabetic characters or symbols that are used with alphabetic characters ~~or symbols that are used with alphabetic characters~~

79.3~~/3.1~~/4~~/4.0.1~~/4.1

This paragraph was deleted. ID

~~Ideographic~~

~~Ideographs, Hangul, conjoining Jamo~~

~~break before or after, except in some numeric context~~

79.3.1/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

Aksara Pre-Base

Pre-base repha

Form orthographic syllables in Brahmic scripts

79.3.2/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

Aksara Start

Independent vowels

Form orthographic syllables in Brahmic scripts

79.4~~/3.1~~/4

This paragraph was deleted. AI

~~Ambiguous (Alphabetic or Ideographic)~~

~~Characters with Ambiguous East Asian Width~~

~~act like AL when the resolved EAW is N otherwise act as ID~~

79.4.1/6.1

Conditional Japanese Starter

Small kana

Treat as NS or ID for strict or normal breaking.

79.4.2/9

Emoji Base

All emoji allowing modifiers

Do not break from following Emoji Modifier

79.4.3/9

Emoji Modifier

Skin tone modifiers

Do not break from preceding Emoji Base

79.5~~/4.1~~/5

Hangul LV Syllable

Hangul

Form~~form~~ Korean syllable blocks

79.6~~/4.1~~/5

Hangul LVT Syllable

Hangul

Form~~form~~ Korean syllable blocks

79.6.1/6.1

Hebrew Letter

Hebrew

Do not break around a following hyphen; otherwise act as Alphabetic

79.7~~/4.1~~/5

Ideographic

Ideographs

Break~~break~~ before or after, except in some numeric context

79.8~~/4.1~~/5

Hangul L Jamo

Conjoining jamo~~Jamo~~

Form~~form~~ Korean syllable blocks

79.9~~/4.1~~/5

Hangul V Jamo

Conjoining jamo~~Jamo~~

Form~~form~~ Korean syllable blocks

79.10~~/4.1~~/5

Hangul T Jamo

Conjoining jamo~~Jamo~~

Form~~form~~ Korean syllable blocks

79.11~~/6.2~~/9

Regional Indicator

REGIONAL INDICATOR SYMBOL LETTER A .. Z

Keep pairs together. For pairs, break before and after other classes~~from others~~

80~~/3.1/4.0.1~~/5

Complex Context Dependent (South East Asian)

South East Asian: Thai, Lao, Khmer

Provide~~provide~~ a line break opportunity contingent on additional, language-specific context analysis

81/3.1

This paragraph was deleted. AI

~~Ambiguous (Alphabetic or Ideographic)~~

~~Characters with Ambiguous East Asian Width~~

~~act like AL when the resolved EAW is N otherwise act as ID~~

81.1/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

Virama Final

Viramas for final consonants

Form orthographic syllables in Brahmic scripts

81.2/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

Virama

Conjoining viramas

Form orthographic syllables in Brahmic scripts

82~~/3.1/4.0.1/4.1~~/5/5.2

Unknown

Most unassigned~~Unassigned~~, private-use~~Private Use~~

Have~~have~~ ~~are all characters with (~~as yet~~)~~ unknown line breaking behavior or unassigned code positions

83/4

3 Introduction~~Description~~

84~~/4.0.1/4.1/5.1~~/5.2

Lines are broken as the result of one~~either~~ of two conditions. The first is the presence of a mandatory line breaking character. The second condition results from~~is the presence of~~ a formatting algorithm having selected among available~~mandatoryan explicit~~ line break opportunities; ideally the chosen line break~~ing character. The second condition~~ results ~~from a formatting algorithm having selected among available line breaking~~ ~~opportunities; ideally the chosen line break~~ ~~the particular one that~~ ~~results~~ in the optimal layout of the text.

85/4~~/4.0.1/4.1~~/5/5.1

Different formatting algorithms may use different methods to determine~~of determining~~ an~~The definition of~~ optimal line break. For example, simple implementations ~~just~~ consider a single line at a time, trying to find a locally ~~is outside the scope of this document. Different formatting algorithms may use different methods of determining an~~ optimal line break. A basic, yet widely used approach is~~For example, simple implementations just consider a line at a time, trying~~ to ~~find a locally optimal line break. A common approach is to~~ allow no compression or expansion of the intercharacter and interword~~inter-character and inter-word~~ spaces and consider the longest line that fits. More complex formatting algorithms often take into account the interaction of line breaking decisions for the whole paragraph. The well-known text layout system [TEX] implements an example of such a globally~~When compression or expansion is allowed, a locally~~ optimal strategy that may make complex tradeoffs across an entire paragraph to avoid unnecessary hyphenation and other legal, but inferior breaks. For a description of this strategy, see [Knuth78].~~line break seeks to balance the relative merits of the resulting amounts of compression and expansion for different line break candidates.~~
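The "longest line that fits" approach described above can be sketched as follows. This is a minimal illustration, not part of the annex: the function name `greedy_breaks` is hypothetical, the break opportunities are assumed to be supplied by some other process as character offsets, and width is measured in characters (trailing spaces included) purely for simplicity.

```python
def greedy_breaks(text, opportunities, max_width):
    """Choose break offsets so that each line is the longest one that fits.

    `opportunities` holds offsets into `text` where a break is permitted
    (assumed input). No compression or expansion of spaces is modeled.
    """
    breaks = []
    line_start = 0
    last_fit = None  # latest opportunity that still fits on the current line
    for pos in sorted(set(opportunities) | {len(text)}):
        if pos <= line_start:
            continue
        if pos - line_start > max_width and last_fit is not None:
            # The line would overflow: break at the last opportunity that fit.
            breaks.append(last_fit)
            line_start = last_fit
            last_fit = None
        if pos - line_start <= max_width:
            last_fit = pos
        else:
            # No opportunity fits at all; this sketch lets the line overflow.
            if pos < len(text):
                breaks.append(pos)
            line_start = pos
            last_fit = None
    return breaks


def split_lines(text, breaks):
    """Cut `text` at the chosen break offsets."""
    starts = [0] + list(breaks)
    ends = list(breaks) + [len(text)]
    return [text[a:b] for a, b in zip(starts, ends)]
```

For instance, `split_lines(s, greedy_breaks(s, [4, 10, 16], 10))` with `s = "the quick brown fox"` cuts the text once, after "the quick ". A paragraph-level optimizer of the kind attributed to [TEX] would instead evaluate all candidate breaks together.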

86~~/3.1~~/4/4.0.1

This paragraph was deleted. ~~More complex algorithms may take into account the interaction of line breaking decisions for the whole paragraph. The well known text layout system~~, ~~[TEX] implements a~~ ~~well known~~ example of such a globally optimal strategy that may make complex tradeoffs across an entire paragraph to avoid unnecessary hyphenation and other legal, but inferior breaks. For a description of this strategy~~a globally optimizingthe purpose of this document, what is important is not so much what defines the optimal amount of text on the~~ ~~line fitting algorithm, see [Knuth78]., but how line breaking opportunities are defined.~~

86.0.1/4~~/4.0.1/4.1~~/5~~/5.1~~/6

When compression or expansion is allowed, a locally optimal line break seeks to balance the relative merits of the resulting amounts of compression and expansion for different line break candidates. When expanding or compressing interword~~inter-word~~ space according to common typographical practice, only the spaces~~space~~ marked by U+0020 SPACE and U+00A0 NO-BREAK SPACE~~, and U+3000 IDEOGRAPHIC SPACE~~ are ~~normally~~ subject to compression, and only spaces marked by U+0020 SPACE, U+00A0 NO-BREAK SPACE, and occasionally spaces marked by U+2009 THIN SPACE are subject to expansion. All other space characters normally have fixed width. When expanding or compressing intercharacter~~inter-character~~ space, the presence of U+200B ZERO WIDTH SPACE or U+2060 WORD JOINER is~~are~~ always ignored.
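As an illustration of the paragraph above, a justification routine might classify space characters along these lines. This is only a sketch: the names `adjustable_spaces` and `intercharacter_gap_count` and the `expand_thin_space` switch are hypothetical, and the sets contain just the characters named in the text.

```python
# Space characters named in the paragraph above.
COMPRESSIBLE = {"\u0020", "\u00A0"}                 # SPACE, NO-BREAK SPACE
EXPANDABLE = {"\u0020", "\u00A0"}                   # SPACE, NO-BREAK SPACE
SOMETIMES_EXPANDABLE = {"\u2009"}                   # THIN SPACE (occasionally)
IGNORED_BETWEEN_CHARACTERS = {"\u200B", "\u2060"}   # ZERO WIDTH SPACE, WORD JOINER


def adjustable_spaces(text, expand_thin_space=False):
    """Return the indices of spaces that may be compressed or expanded."""
    expandable = set(EXPANDABLE)
    if expand_thin_space:
        expandable |= SOMETIMES_EXPANDABLE
    return {
        "compress": [i for i, ch in enumerate(text) if ch in COMPRESSIBLE],
        "expand": [i for i, ch in enumerate(text) if ch in expandable],
    }


def intercharacter_gap_count(text):
    """Count intercharacter gaps, ignoring ZERO WIDTH SPACE and WORD JOINER."""
    visible = [ch for ch in text if ch not in IGNORED_BETWEEN_CHARACTERS]
    return max(len(visible) - 1, 0)
```

All other space characters, such as U+3000 IDEOGRAPHIC SPACE, fall outside both sets and so keep a fixed width in this sketch.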

86.0.2/4~~/4.0.1/4.1~~/5/5.1

Local custom or document style determines whether and~~Whether~~ to what degree~~allow~~ expansion of intercharacter~~inter-character~~ space is allowed in justifying a line~~, and how much, depends on local custom~~. In ~~thosesome~~ languages, such as~~for example,~~ German, where intercharacter~~inter-character~~ space is commonly used to mark e m p h a s i s (like this)~~. In such languages~~, allowing variable intercharacter~~inter-character~~ spacing would have the unintended effect of adding random emphasis, and is~~should~~ therefore best~~be~~ avoided. In table headings that use Han ideographs, even extreme amounts of intercharacter space commonly occur as short texts are spread out across the entire available space to distribute the characters evenly from end to end. ~~altogether.~~

86.0.3/4/5/5.1

This paragraph was deleted. ~~In table headings that use Han ideographs,~~ ~~on the other hand,~~ ~~even extreme amounts of intercharacterinter-character~~ ~~space commonly occur as short texts are spread out across the entire available space to distribute the characters evenly from end to end.~~

86.0.4~~/4.0.1~~/5/5.1

This paragraph was deleted. ~~More complex formatting algorithms may take into account the interaction of line breaking decisions for the whole paragraph. The well-~~ known text layout system [TEX] implements an example of such a globally optimal strategy that may make complex tradeoffs across an entire paragraph to avoid unnecessary hyphenation and other legal, but inferior breaks. For a description of this strategy, see [Knuth78].

86.0.5~~/5.1/5.2~~/6

In line breaking~~linebreaking~~ it is necessary to distinguish between three~~two~~ related tasks. The first is the determination of all legal line break opportunities, given a string of text. This is the scope of the Unicode Line Breaking Algorithm. The second task is the selection of the actual location for breaking a given line of text. This selection not only takes into account the width of the line compared to the width of the text, but may also apply an additional prioritization of line breaks based on aesthetic and other criteria. What defines an optimal choice for a given line break is outside the scope of this annex, as are methods for its selection. The third is the possible justification of lines, once actual locations for line breaking have been determined, and is also out of scope for the Unicode Line Breaking Algorithm.

86.1~~/3.1~~/4~~/4.0.1/4.1~~/5/5.1

This paragraph was deleted. ~~The definition of optimal line breaks is outside the scope of this annexdocument, as arebreak or~~ ~~methods for their selection. For the purposeselecting it are outside the scope~~ ~~of this annexdocument. For the purpose of this document, what is important is not so much what defines the optimal amount of text on the line, but how to determine all legal line breakpossible line breaking~~ ~~opportunities~~ ~~are determined. Whether and how any given line breaka line break~~ opportunity is actually used is up to the full layout system. Some layout systems will further evaluate the raw line break opportunities returned from the line breaking algorithm and apply additional rules. [~~TEX,~~] ~~for example, uses line break opportunities based on hyphens only as a last resort.~~

86.1.1~~/4.1~~/5/5.1

Finally, ~~most~~ text layout systems may~~will~~ support an emergency mode that~~which~~ handles the case of an unusual line that contains no otherwise permitted line break opportunities. In such line layout emergencies, line breaks may be placed with no regard to the ordinary line breaking behavior of the characters involved. The details of ~~opportunities. In~~ such an emergency mode~~line layout emergencies, line breaks~~ are outside the scope of this annex; however, it is recommended that grapheme clusters be kept together.~~placed with no regard to the ordinary line breaking behavior of the characters involved.~~

86.2/4~~/4.0.1~~/4.1

3.1 Determining Line Break Opportunities~~line breaking opportunities~~

87~~/4.0.1~~/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

Four~~Three~~ principal styles of context analysis determine line break~~ing~~ opportunities.

88/4/5

1. Western: spaces and hyphens are used to determine breaks

89/4/5

2. East Asian: lines can break anywhere, unless prohibited

90/4/5

3. South East Asian: line breaks require morphological analysis

90.1/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

4. Brahmic: line breaks can occur at the boundaries of any orthographic syllable

91~~/3.1/4.0.1/4.1~~/5

The ~~first, or~~ Western style is commonly used for scripts employing the space character. Hyphenation is often~~The second is~~ used with space~~East Asian ideographic scripts. The third is used for scripts such as Thai, which do not use spaces, but which restrict word~~-based line breaking~~breaks~~ to provide additional line break opportunities; however, it~~syllable boundaries, the determination of which~~ requires knowledge of the language and ~~in addition,~~ it may need~~potentially~~ user interaction or overrides.~~comparable to that required by a hyphenation algorithm.~~

91.1~~/3.1~~/4/4.1

The second style of context analysis is used with East Asian ideographic and syllabic scripts. In these scripts, lines can break anywhere, except before or after certain characters. The precise set of prohibited line breaks may depend on user preference or local custom and is commonly tailorable.

92~~/3.1~~/4/4.1

Korean makes use of both styles of line break. ~~NOTE:~~ When~~Note:~~ Korean text is ~~laid out~~ justified, the second style is commonly used, even for interspersed Latin letters. But when ragged margins are used, the Western~~first~~ style~~may alternately use a space-based~~ (relying on spaces~~style 1~~) is commonly used instead, even for ideographs. ~~of the style 2 context analysis.~~

93/3.1

This paragraph was deleted. Space-based line breaking is often augmented by hyphenation. Some Unicode characters have explicit line breaking properties assigned to them. These can be used for the first and second type context analysis for line break opportunities. For multilingual text, styles one and two can be unified into a single set of specifications.

93.1~~/3.1~~/4

This paragraph was deleted. For multilingual text, styles one and two can be unified into a single set of specifications, based on the information provided in this report. Some Unicode characters have explicit line breaking properties assigned to them. These can be utilized with these two styles of context analysis for line break opportunities.

94~~/3.1~~/4

This paragraph was deleted. ~~NOTENote~~: Interpretation of line breaking properties in bidirectional text takes place before applying rule L1 of the Unicode Bidirectional Algorithm. However, it is strictly independent of directional properties of the characters or of any auxiliary information determined by the application of rules of that algorithm.

94.1~~/3.1~~/4/5~~/5.2~~/15.1

{15.1.0: 173-A8; L2/22-244; PRI-446#ID20220410201211}

The third style is used for scripts such as Thai, which allow line breaks only at word boundaries, but do not mark~~use spaces, but which restrict~~ word ~~breaks to syllable~~ boundaries in any way, so that the~~, whosethe~~ determination of line break opportunities ~~of which~~ requires ~~knowledge of the~~ language-dependent text analysis. Algorithms and data for such analysis are~~comparable to that required by a hyphenation algorithm. Such anand~~ ~~algorithm is~~ beyond the scope of the Unicode Standard.~~this report.~~

94.1.1/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

The fourth style is used in some Brahmic scripts, such as Brahmi, Balinese, or Javanese, which allow line breaks to occur at the boundaries of any orthographic syllable, without restricting them to word boundaries. This style is only supported for scripts that encode orthographic syllables in primarily phonetic order.

94.2/4~~/4.1~~/5~~/5.1~~/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

For multilingual text, the Western, ~~and~~ East Asian, and Brahmic styles can be unified into a single set of specifications, based on the information in this annex~~report~~. Unicode characters have explicit line breaking properties assigned to them. These properties can be utilized to implement the effect of both of~~with~~ these~~one and~~ two styles of context analysis for line break opportunities. Customization for user preferences or document style can then be achieved by tailoring that specification.

94.3/4~~/4.1/5.1~~/9

In bidirectional text,~~Determining the~~ line breaks ~~in bidirectional text~~ ~~takes~~ are determined~~place~~ before applying rule L1 of the Unicode Bidirectional Algorithm [UAX9~~Bidi~~]. However, line breaking~~it~~ is strictly independent of directional properties of the characters or of any auxiliary information determined by the application of rules of that algorithm.

96~~/3.1~~/4

This paragraph was deleted. •· The line breaking behavior of characters with normative line breaking properties is described in the Unicode Standard. (See The Unicode Standard, Version 3.0, Chapters 6 and 13). Unless otherwise stated, the information in this technical report is not intended to supersede the normative specifications found in the Unicode Standard, but to organize the description in a different context and provide additional informative detail.

96.1/4~~/4.0.1~~/5~~/5.1~~/5.2

There is no single method for determining line breaks; the rules may differ~~change~~ based on user preference and document layout. The~~Therefore the~~ information in this annex, including the specification of the line breaking algorithm, allows~~must allow~~ for the necessary flexibility in determining line breaks according to different conventions~~is informative, rather than normative~~. However, some characters have been encoded explicitly for their effect~~fact, the rules may change based~~ on ~~user preference and document layout. Therefore the information in this annex, including the specification of the~~ line breaking. Because users~~Users~~ adding such characters to a text expect that they will have the desired effect~~. For that reason~~, these characters have been given ~~algorithm, is informative, rather than~~ required~~non-tailorable~~ line~~normative. However, there are some characters which have been encoded explicitly for the purpose of their effect on~~ ~~line~~ breaking~~. Users adding such characters to a text must be able to expect that they will have the desired effect. For that reason, these characters have been given normative line breaking~~ behavior. ~~The conformance requirements are spelled out in the following subsections.~~

96.1.1/5/5.1

This paragraph was deleted. At times, this specification recommends best practice. These recommendations are not normative and conformance with this specification does not depend on their realization. These recommendations contain the expression “This specification recommends ...”, or some similar wording.

96.1.2/5/5.1

This paragraph was deleted. 4.1 Line Breaking Algorithm

96.1.2.1~~/5.1~~/5.2

To handle certain situations, some line breaking implementations use techniques that cannot be expressed within the framework of the Unicode Line Breaking Algorithm. Examples include using dictionaries of words for languages that do not~~the~~ use spaces, such as Thai; recognition of the language of the text in order to choose among different punctuation conventions; using dictionaries of common abbreviations or contractions to resolve ambiguities with periods or apostrophes; or a deeper analysis of common syntaxes~~words~~ for numbers~~languages that do not use spaces, such as Thai; recognition of the language of the text in order to choose among different punctuation conventions; the use of dictionaries of common abbreviations~~ or ~~contractions to resolve ambiguities with periods or apostrophes; or a deeper analysis of common syntaxes for numbers or~~ dates, and so on. The conformance requirements permit variations of this kind.

96.1.2.2~~/5.1/5.2~~/6

Processes which support multiple modes for determining line breaks are also accommodated. This situation can arise with marked-up text, rich text, style sheets, or other environments in which a higher-level protocol can carry formatting instructions that prevent or force line breaks in positions that differ from those specified by the Unicode Line Breaking Algorithm. The approach taken here requires that such processes have a conforming default line break behavior, and that they disclose that they also include overrides or optional behaviors that are invoked via~~require that such processes have~~ a ~~conforming default line break behavior, and to disclose that they also include overrides or optional behaviors that are invoked via a~~ higher-level protocol.

96.1.2.3/5.1

The methods by which a line layout process chooses optimal line breaks from among the available break opportunities are outside the scope of this specification. The behavior of a line layout process in situations where there are no suitable break opportunities is also outside of the scope of this specification.

96.1.2.3.1/12

{12.0.0: 173-A128}

Note: Locale-sensitive line break specifications can be expressed in LDML [UTS35]. Tailorings are available in the Common Locale Data Repository [CLDR].

96.1.2.4/5.1

4.1 Conformance Requirements

96.1.3/5/5.1

UAX14-C1. A process that determines line breaks in Unicode text, and that purports to implement the Unicode Line Breaking Algorithm, shall do so in accordance with the specifications in this annex. In particular, the following three subconditions~~the absence of a permissible higher-level protocol, a process that determines line breaks in Unicode text, and that purports to implement the Unicode Line Breaking Algorithm,~~ shall be met:~~do so in accordance with the specifications in this annex.~~

96.1.4/5/5.1

This paragraph was deleted. • As is the case for all other Unicode algorithms, this specification is a logical description—particular implementations can have more efficient mechanisms as long as they produce the same results. See C18 in Chapter 3, Conformance, of [Unicode], and the notes following.

96.1.5/5/5.1

This paragraph was deleted. • The line breaking algorithm specifies part of the intrinsic semantics of characters specifically encoded for their line breaking behavior and, therefore, is required for conformance to the Unicode Standard where text containing such characters is broken into lines.

96.1.5.1/5.1

1. The sets of mandatory break positions and of break opportunities which the implementation produces include all of those specified by the rules in Section 6.1, Non-tailorable Line Breaking Rules.

96.1.5.2/5.1

2. There exist no break opportunities or mandatory breaks produced by the implementation that fall on a "non-break" position specified by the rules in Section 6.1, Non-tailorable Line Breaking Rules.

96.1.5.3/5.1

3. If the implementation tailors the behavior of Section 6.2, Tailorable Line Breaking Rules, that fact must be disclosed.
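The three subconditions above lend themselves to a mechanical check. The sketch below is an illustration under stated assumptions, not a normative test: the function name `check_uax14_c1` and its parameters are hypothetical, and the sets of positions required or forbidden by Section 6.1 are assumed to have been computed elsewhere for the text under test.

```python
def check_uax14_c1(required, prohibited, produced,
                   tailors_section_6_2=False, tailoring_disclosed=False):
    """Return a list of problems found against the three subconditions.

    required   -- positions that the Section 6.1 rules specify as breaks
    prohibited -- positions that the Section 6.1 rules mark as "non-break"
    produced   -- positions the implementation under test actually reports
    """
    required, prohibited, produced = set(required), set(prohibited), set(produced)
    problems = []

    # Subcondition 1: everything specified by Section 6.1 must be produced.
    missing = sorted(required - produced)
    if missing:
        problems.append(f"subcondition 1: missing positions {missing}")

    # Subcondition 2: nothing produced may fall on a non-break position.
    forbidden = sorted(produced & prohibited)
    if forbidden:
        problems.append(f"subcondition 2: breaks at non-break positions {forbidden}")

    # Subcondition 3: tailoring of Section 6.2 must be disclosed.
    if tailors_section_6_2 and not tailoring_disclosed:
        problems.append("subcondition 3: tailoring of Section 6.2 is not disclosed")

    return problems
```

An empty result means that none of the three subconditions was violated for the positions supplied; it says nothing about inputs that were not tested.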

96.1.6/5/5.1

UAX14-C2. If an implementation has a default line breaking operation which conforms to UAX14-C1, but also has overrides based on a~~The permissible~~ higher-level protocol, that fact must be disclosed and any behavior that differs from that specified by the rules of~~protocols are described in~~ Section 6.1, Non~~4.3, Higher~~-tailorable Line Breaking Rules, must be documented.~~Level Protocols.~~

96.2/4/5/5.1

This paragraph was deleted. 4.21 Line Breaking Properties

97~~/3.1~~/4/5/5.1

This paragraph was deleted. •· ~~All line breaking classesproperties~~ ~~are normative, but overridableinformative, except for thosethe~~ ~~line breaking classesproperties~~ ~~marked with~~ a * in Table 1, which are not overridable. ~~Line Breaking Properties. The interpretation ofbehavior for~~ ~~characters with normative line breaking classes by all conforming implementationsproperties~~ ~~must be consistent with the specification of the normative property.the same for all conformant implementations.~~

98~~/3.1~~/4/5

This paragraph was deleted. •· ~~Conformant implementations must not tailor characters with normative line breaking classesproperties~~ ~~to any of the informative line breaking classesproperties, but may tailor characters with informative line breaking classesproperties~~ ~~to one of the normative line breaking classes.properties.~~

99~~/3.1~~/4~~/4.0.1~~/5

This paragraph was deleted. •· ~~Higher-~~ ~~level protocols may further restrict, override, or extend the line breaking classesproperties~~ ~~of certain characters in some contexts.~~

99.1/4/5

This paragraph was deleted. 4.2 Line Breaking Algorithm

99.2/4~~/4.0.1/4.1~~/5

This paragraph was deleted. The specification of the Line Breaking Algorithm in this annex is informative. As stated in [Unicode] Section 3.2, Conformance Requirements, conformant implementations are not required to implement the Unicode Line Breaking Algorithm. The relationship between conformance to the Unicode Standard, and conformance to an individual Unicode Standard Annex (UAX) is described in more detail in the Unicode Standard in Section 3.2 Conformance Requirements.

99.3~~/4.0.1~~/5

This paragraph was deleted. There are many different ways to break lines of text, and the Unicode Standard does not restrict the ways in which implementations can do this. However, any Unicode-conformant implementation that purports to implement this specification must do so as described in the following clause. Implementations are free to deviate from this, as long as they do not purport to conform to this specification.

99.4~~/4.0.1/4.1~~/5

This paragraph was deleted. C1

An implementation that claims conformance to the default Unicode Line Breaking Algorithm shall produce the same results as the algorithm published in this specification. • As specified in Section 3.2, Conformance Requirements of [Unicode], Unicode specifications are generally described as an algorithm or process, producing a result from a given input. However, these are simply logical specifications; particular implementations can change or optimize the internal processing as long as they provide the same results from the same input.

99.5~~/4.0.1~~/5

This paragraph was deleted. C2

This specification defines default behavior, which is to be used in the absence of tailoring for particular languages and environments. • Where a particular environment requires tailoring, such modifications to this specification can be done without affecting conformance.

99.6~~/4.0.1/4.1~~/5

This paragraph was deleted. C3

If tailoring is used by an implementation that claims conformance to the default Unicode Line Breaking Algorithm, the existence of such tailoring must be documented. • This does not require that the tailoring be described in a reproducible manner; for example, a statement "'~~tailored to language X"~~' ~~is sufficient.~~

99.7~~/4.0.1/4.1~~/5

This paragraph was deleted. At times, this specification recommends best practice. These recommendations are not normative and conformance with this specification does not depend on their realization. These recommendations contain the expression ~~"We recommend ...",~~ ~~"This specification recommends ...", or some similar wording.~~

99.8/5/5.1

This paragraph was deleted. 4.3 Higher-Level Protocols

99.9/5/5.1

This paragraph was deleted. There are many different ways to break lines of text, and the Unicode Standard does not intend to unnecessarily restrict the ways in which implementations can do this. However, for characters that are encoded solely or primarily for their line breaking behavior, interpretation of these characters must be consistent with their semantics as defined by their normative line breaking behavior. This leads to the following permissible higher-level protocols:

99.10/5/5.1

This paragraph was deleted. ~~UAX14-HL1. Override rule 2 and report a break at the start of text.~~

99.11/5/5.1

This paragraph was deleted. • A higher-level protocol may report a break at the start of text (sot). As written, the rule is intended to ensure that the line breaking algorithm always produces lines that have at least one character in them. However, an analysis in terms of text boundaries would more naturally report a boundary at the sot, leaving it to any client to skip past that boundary in breaking lines.

99.12/5/5.1

This paragraph was deleted. ~~UAX14-HL2. Tailor any tailorable line break class.~~

99.13/5/5.1

This paragraph was deleted. ~~• A higher-level protocol may change the algorithm to produce results as if the membership of any tailorable line break class had been changed.~~

99.14/5/5.1

This paragraph was deleted. ~~UAX14-HL3. Override any rule in Section 6.2,Tailorable Line Breaking Rules, or add new rules to that section.~~

99.15/5/5.1

This paragraph was deleted. • A higher-level protocol may change the algorithm to produce results as if any of the rules in Section 6.2, Tailorable Line Breaking Rules, had been deleted or amended, or as if new rules had been added.

99.16/5/5.1

This paragraph was deleted. Because of the way the specification is set up, HL2 and HL3 have no effect on the results for text containing only characters of the non-tailorable line breaking classes. However, they allow for unrestricted tailoring of the results for texts containing only characters from the tailorable line breaking classes as well as wide latitude in defining the behavior of mixed texts.

99.17~~/5.1~~/5.2

Example: An XML~~xml~~ format provides markup which disables all line breaking over some span of text. When the markup is not in place, the default behavior is in conformance according to UAX14-C1. As long as the existence of the option is disclosed, that format can be said to conform to the Unicode Line Breaking Algorithm according to UAX14-C2.
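The situation in this example might look as follows in code. This is a sketch only: `apply_no_break_spans` is a hypothetical helper, the default break opportunities are assumed to come from a conforming process, and each marked-up span is assumed to be given as a pair of (start, end) character offsets.

```python
def apply_no_break_spans(opportunities, no_break_spans):
    """Drop default break opportunities that fall strictly inside a span
    for which markup has disabled line breaking.

    opportunities  -- offsets reported by the default (conforming) process
    no_break_spans -- (start, end) offset pairs taken from the markup
    """
    return [
        pos for pos in opportunities
        if not any(start < pos < end for start, end in no_break_spans)
    ]
```

With no spans supplied, the function returns the default opportunities unchanged, which corresponds to the "markup is not in place" case; the override itself is the optional behavior that UAX14-C2 asks to have disclosed.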

99.18/5.1

As is the case for all other Unicode algorithms, this specification is a logical description—particular implementations can have more efficient mechanisms as long as they produce the same results. See C18 in Chapter 3, Conformance, of [Unicode]. While only disclosure of tailorings is required in the conformance clauses, documentation of the differences in behaviors is strongly encouraged.

100

5 Line Breaking Properties

101~~/3.1~~/4~~/4.0.1/4.1/5.1~~/5.2

This~~The main emphasis in this~~ section provides~~is to provide~~ detailed narrative descriptions of the line breaking behavior of many Unicode characters. Many descriptions in this section provide additional informative detail about handling a given character at the end of a line, or during line layout, which goes beyond the simple determination of line breaks. In some cases, the text also gives guidance as to preferred characters for achieving a particular effect~~many instances, the descriptions~~ in ~~this section provide additional informative detail about handling a given character at the end of a~~ line~~, or during line layout, which goes beyond the simple determination of line breaks. In some cases, the text also gives guidance as to preferred characters for achieving a particular effectfew instances, the descriptions~~ in ~~this section provide additional detail about handling a given character at the end of adescription of the~~ ~~line~~ breaking.~~, which goes beyond the simple determination of~~ ~~breaking behavior and summarizesto summarize~~ ~~the membership of character classes for each value of the~~ ~~line breaks.breaking property.~~ The full classification of all Unicode characters by their line breaking properties is available as the file LineBreaking.txt in the Unicode Character Database. This is a tab-delimited, three column plain text file, with code position, line breaking class and character name (for reference purpose only). The abbreviated way of listing the Ideographic, Hangul, Surrogate, and Private Use ranges is the same as in UnicodeData.txt.
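A reader for the data file as described in this paragraph could be sketched as below. The layout is taken from the description above (tab-delimited; code position, line breaking class, character name; First/Last range pairs as in UnicodeData.txt), and that is an assumption: published versions of the data file may use a different layout. The function names and the sample lines used with them are illustrative only.

```python
def parse_line_break_data(lines):
    """Parse tab-delimited lines into (first, last, line_break_class) tuples.

    Assumes three columns: hexadecimal code position, class, character name.
    Ranges are assumed to be given as a '<..., First>' line followed by a
    '<..., Last>' line, in the manner of UnicodeData.txt.
    """
    entries = []
    range_start = None  # (code point, class) of a pending First line
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue
        code_field, line_break_class, name = line.split("\t", 2)
        code_point = int(code_field, 16)
        if name.endswith(", First>"):
            range_start = (code_point, line_break_class)
        elif name.endswith(", Last>") and range_start is not None:
            entries.append((range_start[0], code_point, range_start[1]))
            range_start = None
        else:
            entries.append((code_point, code_point, line_break_class))
    return entries


def lookup(entries, code_point, default="XX"):
    """Return the class for a code point; XX (Unknown) if it is not listed."""
    for first, last, line_break_class in entries:
        if first <= code_point <= last:
            return line_break_class
    return default
```

A linear scan is used in `lookup` for brevity; a real implementation would more likely use a sorted table with binary search or a trie.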

101.1~~/4.1~~/5~~/5.2~~/13

This section also summarizes the membership of character classes corresponding to each value of the line breaking property. Note that the mnemonic names for ~~each value of~~ the line break classes are intended neither as exhaustive descriptions of their membership nor as indicators of their entire range of behaviors in the line breaking process. Instead, their main purpose is to serve as unique, yet broadly~~ing property. Note that the~~ mnemonic labels. In other words, as long as their~~names for the~~ line breaking behavior is identical, otherwise unrelated characters will be grouped together ~~classes are intended neither as exhaustive descriptions of their membership nor as indicators of their entire range of behaviors~~ in the same line break~~ing process. Instead, their main purpose is to serve as unique, yet broadly mnemonic labels. In other words, as long as their line break behavior is identical, otherwise unrelated characters will be~~ ~~found~~ ~~grouped together in the same line break~~ class.

102~~/3.1/4.1~~/5~~/5.2/10~~/13

{10.0.0: 147-A79}

The classification by property values~~properties~~ defined in this section and in the data file~~here~~ is used as input into the algorithm~~two algorithms~~ defined in Section 6, Line Breaking Algorithm. That~~This~~ section describes a workable default line breaking method.~~, and~~ Section 8, Customization, discusses how the~~7, Pair~~ ~~-Table-Basedbased~~ ~~Implementation. These sections describebelow that implement~~ ~~workable~~ default line breaking ~~methods. Section 8, Customization, discusses how the defaultIn a few instances, the descriptions in this section provide additional detail about handling a given character at the end of a~~ ~~line breaking~~ behavior can be tailored to the needs of specific languages or for particular ~~languages for particular~~ document styles and user preferences. Permitted customizations can include changing the classification of characters for certain classes. ~~and~~ ~~which goes beyond the simple determination of line breaks.~~

102.0.1~~/13~~/16

{16.0.0: 180-C18, 180-A57; L2/24-162}

In addition to the line breaking properties defined in this section, the algorithm defined in Section 6, Line Breaking Algorithm also makes use of East_Asian_Width property values, defined in Unicode Standard Annex #11, East Asian Width [UAX11], as well as the General_Category and Extended_Pictographic properties. Note that for purposes of the line breaking algorithm, those ~~East_Asian_Width~~ property values are tailorable, as are the rules of the line breaking algorithm which use them. (See rules LB15a, LB15b, LB19, LB19a, LB21a,~~rule~~ LB30, and LB30b.)

102.1/3.1

Data File

102.2~~/3.1/3.2/4.1~~/5/11

{11.0.0: 154-A128; L2/18-009#ID20171110171601}

The full classification of all Unicode characters by their line breaking properties~~, as of the time of publication of this document,~~ is available in the ~~current version of the~~ file LineBreak.txt [Data14~~Data~~] in the Unicode Character Database [UCD]. This is a semicolon~~tab~~-delimited, two-column, plain text file, with code position and line breaking class. A comment at the end of each line indicates the character name. ~~Ideographic, Hangul, Surrogate, and Private Use ranges are collapsed by giving a range in the first column.~~
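The file layout described above (a code position field, a semicolon, a line breaking class, and a trailing comment) can be read with a few lines of code. The following is a minimal sketch, not a reference parser: the function name `parse_line_break_entry` is hypothetical, and the `first..last` range notation is assumed from the way ranges are written elsewhere in this annex. The sample lines in the usage note are illustrative, not copied from the data file.

```python
def parse_line_break_entry(line):
    """Parse one data line into (first_cp, last_cp, line_break_class).

    Returns None for blank lines and for lines that contain only a comment.
    Assumes the semicolon-delimited, two-column layout described above,
    with an optional '#' comment at the end of the line.
    """
    data = line.split("#", 1)[0].strip()   # drop the trailing comment
    if not data:
        return None
    fields = [field.strip() for field in data.split(";")]
    code_points, line_break_class = fields[0], fields[1]
    # A code position is either a single value or a 'first..last' range
    # (range notation is an assumption of this sketch).
    first, _, last = code_points.partition("..")
    first_cp = int(first, 16)
    last_cp = int(last, 16) if last else first_cp
    return first_cp, last_cp, line_break_class
```

For instance, `parse_line_break_entry("1680;BA # OGHAM SPACE MARK")` yields `(0x1680, 0x1680, "BA")`, and a comment-only line yields `None`.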

102.2.-1.1/16

{16.0.0: 180-A59; L2/24-162}

The same data, but with a more explicit listing of code point ranges with complex default values, is available in the file DerivedLineBreak.txt [Data14Derived].

102.2.0.1/5.2

The line break property assignments from the data file are normative. The descriptions of the line break classes in this UAX include examples of representative or interesting characters for each class, but for the complete list always refer to the data file.

102.2.1/5

Future Updates

102.3~~/3.1/3.2~~/4~~/4.1~~/5~~/5.1~~/5.2

As scripts are added to the Unicode Standard and become more widely implemented, line breaking classes may be~~scripts are~~ added or the assignment of line breaking class may be changed for some characters. Implementers must not make any assumptions to the contrary. Any future updates will~~Unicode Standard and become, and~~ ~~more widely implemented and used on computers,scripts become~~ ~~more~~ ~~widely implemented and used on computers, more~~ ~~line breaking classes may~~ be ~~added,~~ ~~or the assignment of line breakinglinebreak~~ ~~class may be changed for some characters. Implementers mustImplementations should~~ ~~not make any assumptions to the contrary. Any future updates will be~~ reflected in the latest version of the data file. (See the Unicode Character Database [UCD] for any specific version of the data file.~~datafile~~)

103

5.1 Description of Line Breaking Properties

104~~/3.1~~/4~~/4.1~~/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853} {16.0.0: 180-A59; L2/24-162}

Line breaking classes~~properties~~ are listed alphabetically. For each~~Each~~ line breaking class, the rules that explicitly reference that class are listed in italics above the description of the class. Note that characters in these classes may be involved in other rules; for instance, rule LB31 can apply to characters with almost any line breaking class, but it does not list any line breaking class explicitly.~~property~~ ~~is marked with an annotation in parentheses with the following meanings:parenthesis for easy reference showing that...~~

104.1/8/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853} {16.0.0: 180-A59; L2/24-162}

This paragraph was deleted. ~~Label~~

~~Meaning for the Class~~

105~~/3.1~~/4~~/4.0.1~~/5/8/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853} {16.0.0: 180-A59; L2/24-162}

This paragraph was deleted. (A)

It —- ~~the classproperty~~ ~~allows a break opportunity after in specified contexts.~~

106~~/3.1~~/4~~/4.0.1~~/5/8/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853} {16.0.0: 180-A59; L2/24-162}

This paragraph was deleted. (~~XA)~~

It —- ~~the classproperty~~ ~~prevents a break opportunity after in specified contexts.~~

107~~/3.1~~/4~~/4.0.1~~/5/8/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853} {16.0.0: 180-A59; L2/24-162}

This paragraph was deleted. (B)

It —- ~~the classproperty~~ ~~allows a break opportunity before in specified contexts.~~

108~~/3.1~~/4~~/4.0.1~~/5/8/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853} {16.0.0: 180-A59; L2/24-162}

This paragraph was deleted. (~~XB)~~

It —- ~~the classproperty~~ ~~prevents a break opportunity before in specified contexts.~~

109~~/3.1~~/4~~/4.0.1~~/5/8/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853} {16.0.0: 180-A59; L2/24-162}

This paragraph was deleted. (P)

It —- ~~the classproperty~~ ~~allows a break opportunity for a pair of same characters.~~

110~~/3.1~~/4~~/4.0.1~~/5/8/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853} {16.0.0: 180-A59; L2/24-162}

This paragraph was deleted. (~~XP)~~

It —- ~~the classproperty~~ ~~prevents a break opportunity for a pair of same characters.~~

110.1~~/3.1~~/5/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853} {16.0.0: 180-A59; L2/24-162}

This paragraph was deleted. ~~NoteNOTE~~: The use of the letters B and A in these annotations marks the position of the break opportunity relative to the character. It is not to be confused with the use of the same letters in the other parts of this annex~~document, where they indicate the positionsposition~~ ~~of the characters relative to the break opportunity.~~

111~~/4.0.1~~/5

AI: Ambiguous (Alphabetic or Ideograph)

111.0.1/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB1

111.1~~/4.0.1/4.1~~/5~~/5.1/5.2~~/6.2

Some~~Ambiguous~~ characters that ordinarily act ~~either~~ like alphabetic ~~or symbol~~ characters ~~(which havethat is,are~~ ~~those~~ ~~that can act either like alphabetic characters (i.e. those~~ ~~with~~ ~~the AL line breaking class)~~ are treated like ideographs (~~that is~~ ~~characters with~~ line breaking class ID) in certain East Asian legacy contexts. Their line breaking behavior therefore depends~~, depending~~ on the context. In the absence of appropriate context information, they are treated as class AL; ~~but~~ see the note at the end of this description.

112~~/3.1~~/4~~/4.0.1~~/5~~/5.1~~/7/16

{16.0.0: 180-C17, 180-A55, 180-A56; L2/24-162}

As originally defined until Unicode Version 3.1.0, the line break~~this~~ class AI contained all characters~~Characters~~ with East_Asian_Width value~~East Asian Width property~~ A (ambiguous width) that~~, and which~~ would otherwise be AL in this classification. ~~They~~ ~~take~~ ~~on~~ ~~the AL line breaking class only when their resolved width is N (narrow) and take the line breaking class ID~~ ~~line breaking class~~ ~~when their resolved width is W (wide).~~ For more information on East_Asian_Width~~East Asian Width~~, and how to resolve it, see Unicode Standard Annex~~Technical Report~~ #11, East Asian Width [UAX11~~EAW~~]. ~~In the absence of information needed to resolve their East Asian Width, they are treated as class AL.~~

112.0.1~~/4.0.1~~/5/16

{16.0.0: 180-C17, 180-A55, 180-A56; L2/24-162}

The original definition included many Latin, Greek, and Cyrillic characters. Since Unicode Version 4.0.1, these~~These~~ characters are ~~now~~ classified by ~~for which a~~ default as~~assignment of the~~ AL because use of the AL line breaking class better corresponds to modern practice. Where strict compatibility with older legacy implementations is desired, some of these~~At the same time, the set of ambiguous~~ characters need~~has been extended~~ to be treated as ID in certain contexts. This can be done by always tailoring them to ID or by continuing to classify them as AI and resolving them to ID where required.~~completely encompass the enclosed alphanumeric characters used for numbering of bullets.~~

112.0.1.1/5

As part of the same revision, the set of ambiguous characters has been extended to completely encompass the enclosed alphanumeric characters used for numbering of bullets.

112.0.2~~/4.0.1/4.1~~/5/16

{16.0.0: 180-C17, 180-A55, 180-A56; L2/24-162}

In Unicode Version 4.0.1~~As updated~~, the AI line breaking class therefore included~~includes~~ all characters with East Asian Width A that are outside~~the exception of~~ ~~characters~~ ~~with East Asian Width W, except those in~~ the range U+0000..U+1FFF, plus the following~~this line breaking class includes all~~ characters: ~~with East Asian Width A, plus the following characters:~~

112.0.3/4.0.1

24EA	CIRCLED DIGIT ZERO

112.0.4/4.0.1

2780..2793

DINGBAT CIRCLED SANS-SERIF DIGIT ONE..DINGBAT NEGATIVE CIRCLED SANS-SERIF NUMBER TEN

112.1/4/5/5.1

This paragraph was deleted. ~~The line breaking rules in Section 6, Line Breaking Algorithm, and the pair table in Section 7, Pair Table-Basedbased~~ ~~Implementation, assume that all ambiguous characters have been resolved appropriately as part of assigning line breaking classes to the input characters.~~

112.2/5/5.1

This paragraph was deleted. Note: Normally characters with class AI are resolved to either ID or AL. However, the following two characters are used as punctuation marks in Spanish, where they would behave more like a character of class OP. Implementations might therefore wish to tailor these characters to class OP for use in Spanish.

112.3/5/5.1

This paragraph was deleted. ~~00A1~~

~~INVERTED EXCLAMATION MARK~~

112.4/5/5.1

This paragraph was deleted. ~~00BF~~

~~INVERTED QUESTION MARK~~

112.4.1/16

{16.0.0: 180-C17, 180-A55, 180-A56; L2/24-162}

Since that time, the East_Asian_Width and Line_Break properties have been maintained independently, with the latter being based on the need for language-specific line-breaking behavior rather than compatibility with legacy encodings. In particular, all vulgar fractions have Line_Break=AI.

112.5~~/5.1/5.2~~/10

{10.0.0: 147-A79}

Characters with the line break class AI with East_Asian_Width value A typically take the AL line breaking class when their resolved East_Asian_Width is N (narrow) and take the line breaking class ID when their resolved width is W (wide). The remaining characters are then resolved to AL or ID in a consistent fashion. The details of this resolution are not specified in this annex. The line breaking rules in Section 6, Line Breaking Algorithm~~, and the pair table in Section 7, Pair Table-Based Implementation,~~ merely require that all ambiguous characters be resolved~~have been resolved~~ appropriately as part of assigning line breaking classes to the input characters.
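The typical resolution described above can be sketched as a small function. This is only an illustration under stated assumptions: the annex does not specify the details of the resolution, the function name `resolve_ai` is hypothetical, and the resolved width is assumed to be supplied by the caller as `"W"`, `"N"`, or `None` when no context information is available.

```python
def resolve_ai(line_break_class, resolved_width=None):
    """Resolve class AI to AL or ID; other classes pass through unchanged.

    resolved_width is the caller-supplied resolved East_Asian_Width:
    "W" (wide), "N" (narrow), or None when the context is unknown.
    """
    if line_break_class != "AI":
        return line_break_class
    # Wide contexts resolve to ID. Narrow contexts, and the absence of
    # context information, resolve to AL.
    return "ID" if resolved_width == "W" else "AL"
```

Calling `resolve_ai("AI")` without a width returns `"AL"`, which matches the default treatment in the absence of context information.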

112.6~~/5.1~~/5.2

Note: The canonical decompositions of characters of class AI are not necessarily of class AI themselves. The East_Asian_Width property A, on which the definition of AI is largely based, does not preserve canonical equivalence. In the context of line breaking, the fact that a character has been assigned class AI means that the line break implementation must resolve it to either AL or ID, in the absence of further tailoring. If preserving canonical equivalence is desired, an implementation is free to make sure that the resolved line break classes preserve canonical equivalence. Unless compatibility with particular legacy behavior is important, it may be sufficient to map all such characters to AL. This achieves a canonically equivalent resolution of line breaking classes, and is compatible with emerging modern practice that treats these characters increasingly like regular alphabetic characters.

112.7~~/15.1~~/16

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535} {16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

AK: Aksara (XB/XA)

112.7.1/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB28a

112.8/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

The AK line break class is used for scripts that use the Brahmic style of context analysis and have a virama of Indic syllabic category Virama or Invisible_Stacker. It contains characters that can occur as the bases of orthographic syllables and can also follow a virama of Indic syllabic category Virama or Invisible_Stacker within the same orthographic syllable. Depending on the script, this may include characters with the Indic syllabic categories Consonant, Vowel_Independent, or Number.

112.9/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

1B05..1B33

BALINESE LETTER AKARA..BALINESE LETTER HA

112.10/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

1B45..1B4C

BALINESE LETTER KAF SASAK..BALINESE LETTER ARCHAIC JNYA

112.11/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

A984..A9B2

JAVANESE LETTER A..JAVANESE LETTER HA

112.12/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

11005..11037

BRAHMI LETTER A..BRAHMI LETTER OLD TAMIL NNNA

112.13/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

11071..11072

BRAHMI LETTER OLD TAMIL SHORT E..BRAHMI LETTER OLD TAMIL SHORT O

112.14/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

11075

BRAHMI LETTER OLD TAMIL LLA

112.15/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

11305..1130C

GRANTHA LETTER A..GRANTHA LETTER VOCALIC L

112.16/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

1130F..11310

GRANTHA LETTER EE..GRANTHA LETTER AI

112.17/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

11313..11328

GRANTHA LETTER OO..GRANTHA LETTER NA

112.18/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

1132A..11330

GRANTHA LETTER PA..GRANTHA LETTER RA

112.19/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

11332..11333

GRANTHA LETTER LA..GRANTHA LETTER LLA

112.20/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

11335..11339

GRANTHA LETTER VA..GRANTHA LETTER HA

112.21/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

11360..11361

GRANTHA LETTER VOCALIC RR..GRANTHA LETTER VOCALIC LL

112.22/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

11F04..11F10

KAWI LETTER A..KAWI LETTER O

112.23/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

11F12..11F33

KAWI LETTER KA..KAWI LETTER JNYA

113~~/4.0.1~~/5/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

AL: Ordinary Alphabetic and Symbol Characters (XP)

113.1/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB1, LB10, LB20a, LB23, LB24, LB28, LB29, LB30

114~~/4.0.1/4.1~~/5

Ordinary characters require~~Require~~ other characters to provide break opportunities; otherwise, no line breaks are allowed between pairs of them. However, this behavior is tailorable. In some Far Eastern documents, it may be desirable to allow breaking between pairs of ordinary characters, particularly Latin~~. However, this is tailorable. In some Far Eastern documents it may be desirable to allow breaking between pairs of ordinary~~ characters~~, particularly Latin characters~~ and symbols.

115~~/4.1~~/5

Note~~NOTE~~: Use~~use~~ ZWSP as a manual override to provide break opportunities around alphabetic or symbol characters.
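As an illustration of the manual override mentioned in the note, the sketch below inserts U+200B ZERO WIDTH SPACE after selected characters in a run of text that would otherwise offer no break opportunities. The helper name `allow_breaks_after` and the choice of which characters to break after are assumptions made for this example only.

```python
ZWSP = "\u200B"  # ZERO WIDTH SPACE

def allow_breaks_after(text, chars):
    """Return text with a ZWSP inserted after every character found in chars."""
    return "".join(ch + ZWSP if ch in chars else ch for ch in text)
```

For example, `allow_breaks_after("very_long_identifier", "_")` returns the same visible text with an invisible break opportunity after each underscore.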

115.1~~/4.0.1~~/6.2

This class contains alphabetic or symbolic characters not~~Except as listed~~ explicitly assigned to~~below as part of~~ another line breaking class. These are primarily characters of~~, and except as assigned class AI or ID based on East Asian Width, this class contains~~ the following categories:~~characters:~~

115.2/8

AP: Aksara Pre-Base (B/XA)

116.9.1/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB28a

116.10/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

The AP line break class is only used for scripts that use the Brahmic style of context analysis. It contains the characters of such scripts that are part of an orthographic syllable but in logical order precede the base or any half-forms. This includes characters with the Indic syllabic categories Consonant_Preceding_Repha, Consonant_With_Stacker, and Consonant_Prefixed.

116.11/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

11003..11004

BRAHMI SIGN JIHVAMULIYA..BRAHMI SIGN UPADHMANIYA

116.12/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

11F02

KAWI SIGN REPHA

116.13~~/15.1~~/16

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535} {16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

AS: Aksara Start (XB/XA)

116.13.1/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB28a

116.14~~/15.1~~/16

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535} {16.0.0: 177-C46, 177-A113; L2/23-234}

The AS line break class is only used for scripts that use the Brahmic style of context analysis. It contains characters that can occur as the bases of orthographic syllables, but cannot follow a virama of Indic syllabic category Virama or Invisible_Stacker within the same orthographic syllable. Depending on the script, this may include characters with the Indic syllabic categories Consonant, Vowel_Independent,~~ Number,~~ and several others. This class also contains all digits of scripts that use the Brahmic style of line breaking; in some of these scripts, such as Brahmi or Kawi, digits can occur as bases of orthographic syllables.

116.14.1/16

{16.0.0: 177-C46, 177-A113; L2/23-234}

1B50..1B59

BALINESE DIGIT ZERO..BALINESE DIGIT NINE

116.15/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

1BC0..1BE5

BATAK LETTER A..BATAK LETTER U

116.15.1/16

{16.0.0: 177-C46, 177-A113; L2/23-234}

A9D0..A9D9

JAVANESE DIGIT ZERO..JAVANESE DIGIT NINE

116.16/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

AA00..AA28

CHAM LETTER A..CHAM LETTER HA

116.16.1/16

{16.0.0: 177-C46, 177-A113; L2/23-234}

AA50..AA59

CHAM DIGIT ZERO..CHAM DIGIT NINE

116.17/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

11066..1106F

BRAHMI DIGIT ZERO..BRAHMI DIGIT NINE

116.18/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

11350

GRANTHA OM

116.19/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

1135E..1135F

GRANTHA LETTER VEDIC ANUSVARA..GRANTHA LETTER VEDIC DOUBLE ANUSVARA

116.19.1/16

{16.0.0: 177-C46, 177-A113; L2/23-234}

11950..11959

DIVES AKURU DIGIT ZERO..DIVES AKURU DIGIT NINE

116.20/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

11EE0..11EF1

MAKASAR LETTER KA..MAKASAR LETTER A

116.21/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

11F50..11F59

KAWI DIGIT ZERO..KAWI DIGIT NINE

117~~/4.0.1~~/5~~/5.2~~/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

BA: Break Opportunity After (A)

117.1/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB12a, LB21, LB21a

118/4/5

Like ~~the~~ SPACE, the characters in this class provide a break opportunity;~~SP, but~~ unlike SPACE, they do~~are~~ not take part in determining indirect breaks. They can be subdivided into several categories.

118.1/3.1

Breaking Spaces

119~~/3.1~~/4~~/4.1~~/5/5.2

Breaking spaces are a~~theThe~~ ~~following~~ subset of~~These~~ characters with General_Category~~General Categorycategory~~ Zs. Examples include:

119.1/4.0.1

1680	OGHAM SPACE MARK

122/4

2002	EN SPACE~~QUAD~~

123/4

2003	EM SPACE~~QUAD~~

124

2004	THREE-PER-EM SPACE

125

2005	FOUR-PER-EM SPACE

126

2006	SIX-PER-EM SPACE

127

2008	PUNCTUATION SPACE

128

2009	THIN SPACE

129

200A	HAIR SPACE

129.1/4

205F	MEDIUM MATHEMATICAL SPACE

129.2/6.3

3000	IDEOGRAPHIC SPACE

130~~/3.1~~/4~~/4.1~~/5

All of these~~The preceding list of~~ space characters ~~all~~ have a specific width, but otherwise behave ~~otherwise~~ as breaking spaces. In setting a justified line, none of these spaces normally changes in width~~none of these spaces~~, except for THIN SPACE when used in mathematical notation~~, will change in width~~. See also the SP property.

130.1/4/4.1

This paragraph was deleted. ~~See the ID property for U+3000 IDEOGRAPHIC SPACE. For a list of all space characters in the Unicode Standard, see Section 6.2 in [Unicode].~~

130.2~~/4.0.1/5.1~~/5.2

The OGHAM SPACE MARK~~Ogham space mark~~ may be~~is~~ rendered visibly between words but it is recommended that it~~should~~ be elided at the end of a line. For more information, see Section 5.7, Word Separator Characters.

130.3~~/4.1~~/5~~/5.2~~/6/6.3

~~See the ID property for U+3000 IDEOGRAPHIC SPACE.~~ For a list of all space characters in the Unicode Standard, see Section 6.2, General Punctuation, in [Unicode~~Unicode5.2~~].

132/4

This paragraph was deleted. ~~Except for the effect of the location of the tabstops, the tab character acts similarly to a space for the purpose of line breaking.~~

133.1/4

Except for the effect of the location of the tab stops, the tab character acts similarly to a space for the purpose of line breaking.

133.2/4

Conditional Hyphens

133.3/4

00AD	SOFT HYPHEN (SHY)

133.4/4~~/4.1~~/5~~/5.1~~/6

SHY is an invisible format character with no width. It marks the place where an optional line break may occur inside a word. It can be used with all scripts. ~~SHY is rendered invisibly and has no width: it merely indicates an optional line break. The rendering of the optional line break depends on the script. For the Latin script, rendering the line break typically means displaying a hyphen at the end of the line; however, some languages require a change in spelling surrounding a line break. For examples, see~~ If a line is broken at an optional line break position marked by a SHY, the text at that line break position often has a modified appearance as described in Section 5.4, Use~~Additional Details on use~~ of Soft Hyphen.
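The effect of taking a break at a SHY can be sketched as follows. This is a simplified illustration that assumes the common Latin-script convention of showing a visible hyphen at the end of the line; it does not model the spelling changes that some languages require. The function name `break_at_soft_hyphen` is hypothetical, and unused SHY characters are simply dropped from the returned display strings.

```python
SHY = "\u00AD"  # SOFT HYPHEN

def break_at_soft_hyphen(word, shy_index):
    """Split word at the SHY found at shy_index.

    Returns (end_of_line_text, start_of_next_line_text). The first part
    ends in a visible hyphen; SHY characters that were not used for the
    break are removed from both parts, since they have no visible form.
    """
    if word[shy_index] != SHY:
        raise ValueError("no SOFT HYPHEN at the requested position")
    head = word[:shy_index].replace(SHY, "") + "-"
    tail = word[shy_index + 1:].replace(SHY, "")
    return head, tail
```

With the input `"hy\u00ADphen\u00ADation"` and `shy_index=2`, the sketch returns `("hy-", "phenation")`.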

134

Breaking Hyphens

135

Breaking hyphens establish explicit break opportunities immediately after each occurrence.

136~~/3.1~~/4

This paragraph was deleted. There are three types of hyphens: Explicit hyphens, conditional hyphens, and dictionary-inserted hyphens (as a result of a hyphenation process). There is no character code for the third kind of hyphen; therefore if it is desired to make the distinction, the fact that a hyphen is dictionary-inserted ~~hyphens~~ ~~must be represented out of band, or by usingwith~~ ~~a privately assigned control code instead of SHY.~~.

136.1/3.1

058A	ARMENIAN HYPHEN

138/3.1

This paragraph was deleted. ~~058A~~

~~ARMENIAN HYPHEN~~

138.1/3.1

2012	FIGURE DASH

138.2~~/3.1~~/4

2013	EN DASH

139/4~~/4.1~~/5.1

Hyphens are graphic characters with width. Because~~Since~~, unlike spaces, they are visible~~print~~, they are included in the measured part of the preceding line, except where the layout style allows hyphens to hang into the margins. For additional information about how to format line breaks resulting from the presence of hyphens, see Section 5.3, Use of Hyphen.
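The statement that breaking hyphens establish break opportunities immediately after each occurrence can be illustrated with a deliberately reduced sketch. It looks only at the three breaking hyphens listed above and ignores every other rule of the algorithm in Section 6, some of which can suppress or add break opportunities. The names `SAMPLE_BREAKING_HYPHENS` and `break_offsets_after_hyphens` are hypothetical.

```python
# The three breaking hyphens listed above; not the full membership of class BA.
SAMPLE_BREAKING_HYPHENS = {0x058A, 0x2012, 0x2013}

def break_offsets_after_hyphens(text):
    """Return string offsets that directly follow a sample breaking hyphen.

    The end of the text is not reported, because it is not a break
    opportunity created by the hyphen itself.
    """
    return [
        index + 1
        for index, ch in enumerate(text[:-1])
        if ord(ch) in SAMPLE_BREAKING_HYPHENS
    ]
```

For the string `"pre\u2013war"`, the sketch reports a single offset, 4, which is the position just after the EN DASH. Because the hyphen stays on the preceding line, it counts toward that line's measured width.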

140/4

This paragraph was deleted. ~~00AD~~

~~SOFT HYPHEN (SHY)~~

141/4

This paragraph was deleted. SHY is rendered invisibly and has no width, except at a line break. The rendering of the soft hyphen depends on the script. For the Latin script it is rendered as a hyphen, however, some languages require a change in spelling surrounding an optional hyphen, if it occurs at a line break. For example in Swedish the word “tuggummi” changes to “tugg-gummi” when hyphenated.

141.1/4.1

Visible Word Dividers

142/4

This paragraph was deleted. The action of a hyphenation algorithm is equivalent to the insertion of a SHY. However, when a word contains an explicit SHY it is customarily treated as overriding the action of the hyphenator for that word.

142.1~~/4.0.1/4.1~~/5.2

The following are examples of other~~Other~~ forms of visible word dividers that provide break opportunities:

142.2/5

05BE	HEBREW PUNCTUATION MAQAF

143

0F0B	TIBETAN MARK INTERSYLLABIC TSHEG

144

1361	ETHIOPIC WORDSPACE

145/4.0.1

This paragraph was deleted. ~~1680~~

~~OGHAM SPACE MARK~~

146/5.1

This paragraph was deleted. ~~17D5~~

~~KHMER SIGN BARIYOOSAN~~

146.1~~/4.0.1~~/5.1

This paragraph was deleted. ~~10100~~

~~AEGEAN WORD SEPARATOR LINE~~

146.2~~/4.0.1~~/5.1

This paragraph was deleted. ~~10101~~

~~AEGEAN WORD SEPARATOR DOT~~

146.3~~/4.0.1~~/5.1

This paragraph was deleted. ~~10102~~

~~AEGEAN CHECK MARK~~

146.4~~/4.0.1~~/5.1

This paragraph was deleted. ~~1039F~~

~~UGARITIC WORD DIVIDER~~

146.5/5/5.1

This paragraph was deleted. ~~103D0~~

~~OLD PERSIAN WORD DIVIDER~~

146.6/5/5.1

This paragraph was deleted. ~~12470~~

~~CUNEIFORM PUNCTUATION SIGN OLD ASSYRIAN WORD DIVIDER~~

146.7/5.1

17D8	KHMER SIGN BEYYAL

146.8/5.1

17DA	KHMER SIGN KOOMUUT

147~~/4.1~~/5/5.1

The Tibetan tsheg~~thseg~~ is a visible mark, but it functions effectively like a space to separate words (or other units) in Tibetan. It provides a break opportunity after itself~~, like space~~. For additional information, see Section 5.6, Tibetan Line Breaking.

148~~/3.1/4.1~~/5~~/5.1~~/5.2

The ETHIOPIC WORDSPACE is a visible~~EthiopicethiopicEthiopian~~ word ~~space is a visible word~~ delimiter and is kept on the previous line. In contrast, U+1360 ETHIOPIC SECTION MARK is typically used in a sequence of several such marks on a separate line, and separated by spaces. As such lines are typically marked with separate hard line breaks (BK), the section mark is treated like an ordinary symbol and given line break class AL. ~~before.~~

149/4.0.1

This paragraph was deleted. ~~The Ogham space mark is rendered visibly between words but should be elided at the end of a line.~~

150

2027	HYPHENATION POINT

151/4~~/4.0.1/4.1~~/5/5.1

A hyphenation~~Hyphenation~~ point is a raised dot, which is mainly~~primarily~~ used in dictionaries and similar works ~~primarily~~ to visibly indicate syllabification of words. Syllable breaks frequently also are potential line break~~ing~~ opportunities in the middle of words. ~~It is mainly used in dictionaries and similar works.~~ When an actual~~to visibly indicate syllabification of words. Syllable breaks are potential~~ line break falls inside a word containing~~ing opportunities in the middle of words. The~~ hyphenation point characters, the hyphenation point is usually rendered as a regular hyphen at the end of the~~It is mainly used in dictionaries and similar works. When an actual~~ line. ~~break falls inside a word containing hyphenation point characters, the hyphenation point is rendered as a regular hyphen at the end of the line.~~

152

007C	VERTICAL LINE

153~~/5.1~~/5.2

In some dictionaries, a vertical bar is used instead of a hyphenation point. In this usage, U+0323 COMBINING DOT BELOW is used to mark stressed syllables, so all breaks are marked by the vertical bar. For an actual line break ~~opportunity,~~ the vertical bar is rendered as a hyphen at the end of the line.~~in such usage.~~

153.1~~/4.1~~/5.1

Historic Word Separators

153.2~~/4.1~~/5/5.1

Historic texts, especially ancient ones, often do not use spaces, even for scripts where modern use of spaces is standard. Special punctuation was used to mark word boundaries in such texts. For modern text processing it is recommended to treat these as line break opportunities by default. WJ can~~should~~ be used to override this~~treated as line breaklinebreak~~ ~~opportunities by~~ default~~. WJ can be used to override this default~~, where necessary.

153.2.1/5.2

Examples of Historic Word Separators include:

153.3~~/4.1~~/11

{11.0.0: 155-A26; PRI-376#ID20180414084252}

16EB	RUNIC SINGLE ~~DOT~~ PUNCTUATION

153.4~~/4.1~~/11

{11.0.0: 155-A26; PRI-376#ID20180414084252}

16EC	RUNIC MULTIPLE ~~DOT~~ PUNCTUATION

153.5/4.1

16ED	RUNIC CROSS PUNCTUATION

153.6/4.1

2056	THREE DOT PUNCTUATION

153.7/4.1

2058	FOUR DOT PUNCTUATION

153.8/4.1

2059	FIVE DOT PUNCTUATION

153.9/4.1

205A	TWO DOT PUNCTUATION

153.10/4.1

205B	FOUR DOT MARK

153.11/4.1

205D	TRICOLON

153.12/4.1

205E	VERTICAL FOUR DOTS

153.12.1/5.1

2E19	PALM BRANCH

153.12.2/5.1

2E2A	TWO DOTS OVER ONE DOT PUNCTUATION

153.12.3/5.1

2E2B	ONE DOT OVER TWO DOTS PUNCTUATION

153.12.4/5.1

2E2C	SQUARED FOUR DOT PUNCTUATION

153.12.5~~/5.1~~/11

{11.0.0: 155-A26; PRI-376#ID20180414084252}

2E2D	FIVE DOT MARK~~PUNCTUATION~~

153.12.6/5.1

2E30	RING POINT

153.12.7/5.1

10100	AEGEAN WORD SEPARATOR LINE

153.12.8/5.1

10101	AEGEAN WORD SEPARATOR DOT

153.12.9/5.1

10102	AEGEAN CHECK MARK

153.12.10/5.1

1039F	UGARITIC WORD DIVIDER

153.12.11/5.1

103D0	OLD PERSIAN WORD DIVIDER

153.12.12~~/5.1~~/11

{11.0.0: 155-A26; PRI-376#ID20180414084252}

1091F	PHOENICIAN WORD SEPARATOR~~DIVIDER~~

153.12.13/5.1

12470	CUNEIFORM PUNCTUATION SIGN OLD ASSYRIAN WORD DIVIDER

153.14/4.1

DEVANAGARI DANDA is similar to a full stop. The danda or historically related symbols are used with several other Indic scripts. Unlike a full stop, the danda is not used in number formatting. DEVANAGARI DOUBLE DANDA marks the end of a verse. It also has analogues in other scripts.

153.14.1~~/5.2~~/6

Examples of dandas~~Dandas~~ include:

153.15/4.1

0964	DEVANAGARI DANDA

153.16/4.1

0965	DEVANAGARI DOUBLE DANDA

153.17/4.1

0E5A	THAI CHARACTER ANGKHANKHU

153.17.1/5.1

0E5B	THAI CHARACTER KHOMUT

153.18/4.1

104A	MYANMAR SIGN LITTLE SECTION

153.19/4.1

104B	MYANMAR SIGN SECTION

153.20/4.1

1735	PHILIPPINE SINGLE PUNCTUATION

153.21/4.1

1736	PHILIPPINE DOUBLE PUNCTUATION

153.22/4.1

17D4	KHMER SIGN KHAN

153.23/4.1

17D5	KHMER SIGN BARIYOOSAN

153.24~~/4.1~~/5.1

This paragraph was deleted. ~~17D8~~

~~KHMER SIGN BEYYAL~~

153.25~~/4.1~~/5.1

This paragraph was deleted. ~~17DA~~

~~KHMER SIGN KOOMUUT~~

153.25.1/5.1

1B5E	BALINESE CARIK SIKI

153.25.2/5.1

1B5F	BALINESE CARIK PAREREN

153.25.3/5.1

A8CE	SAURASHTRA DANDA

153.25.4/5.1

A8CF	SAURASHTRA DOUBLE DANDA

153.25.5/5.1

AA5D	CHAM PUNCTUATION DANDA

153.25.6/5.1

AA5E	CHAM PUNCTUATION DOUBLE DANDA

153.25.7/5.1

AA5F	CHAM PUNCTUATION TRIPLE DANDA

153.26/4.1

10A56	KHAROSHTHI PUNCTUATION DANDA

153.27/4.1

10A57	KHAROSHTHI PUNCTUATION DOUBLE DANDA

153.29~~/4.1~~/5.1

This paragraph was deleted. ~~0F85~~

~~TIBETAN MARK PALUTA~~

153.30/4.1

0F34	TIBETAN MARK BSDUS RTAGS

153.31/4.1

0F7F	TIBETAN SIGN RNAM BCAD

153.31.1/5.1

0F85	TIBETAN MARK PALUTA

153.32/4.1

0FBE	TIBETAN KU RU KHA

153.33/4.1

0FBF	TIBETAN KU RU KHA BZHI MIG CAN

153.33.1/5.1

0FD2	TIBETAN MARK NYIS TSHEG

153.34~~/4.1~~/5/5.1

For additional information, see Section 5.6~~5~~, Tibetan Line Breaking.

153.35/4.1

Other Terminating Punctuation

153.36~~/4.1~~/5.2

Termination punctuation stays with the line, but otherwise allows a break after it. This is similar to EX, except that the latter may be separated by a space from the preceding word without allowing a break, whereas these marks are used without spaces. Terminating punctuation includes:

153.37~~/4.1~~/5.1

This paragraph was deleted. ~~1802~~

~~MONGOLIAN COMMA~~

153.38~~/4.1~~/5.1

This paragraph was deleted. ~~1803~~

~~MONGOLIAN FULL STOP~~

153.39/4.1

1804	MONGOLIAN COLON

153.40/4.1

1805	MONGOLIAN FOUR DOTS

153.41~~/4.1~~/5.1

This paragraph was deleted. ~~1808~~

~~MONGOLIAN MANCHU COMMA~~

153.42~~/4.1~~/5.1

This paragraph was deleted. ~~1809~~

~~MONGOLIAN MANCHU FULL STOP~~

153.43~~/4.1~~/5.1

This paragraph was deleted. ~~1A1E~~

~~BUGINESE PALLAWA~~

153.44~~/4.1~~/5.1

This paragraph was deleted. ~~2CF9~~

~~COPTIC OLD NUBIAN FULL STOP~~

153.44.1/5.1

1B5A	BALINESE PANTI

153.44.2/5.1

1B5B	BALINESE PAMADA

153.44.3~~/5.1~~/5.2

This paragraph was deleted. ~~1B5C~~

~~BALINESE WINDU~~

153.44.4/5.1

1B5D	BALINESE CARIK PAMUNGKAH

153.44.5/5.1

1B60	BALINESE PAMENENG

153.44.6/5.1

1C3B	LEPCHA PUNCTUATION TA-ROL

153.44.7/5.1

1C3C	LEPCHA PUNCTUATION NYET THYOOM TA-ROL

153.44.8/5.1

1C3D	LEPCHA PUNCTUATION CER-WA

153.44.9/5.1

1C3E	LEPCHA PUNCTUATION TSHOOK CER-WA

153.44.10/5.1

1C3F	LEPCHA PUNCTUATION TSHOOK

153.44.11/5.1

1C7E	OL CHIKI PUNCTUATION MUCAAD

153.44.12/5.1

1C7F	OL CHIKI PUNCTUATION DOUBLE MUCAAD

153.45/4.1

2CFA	COPTIC OLD NUBIAN DIRECT QUESTION MARK

153.46/4.1

2CFB	COPTIC OLD NUBIAN INDIRECT QUESTION MARK

153.47/4.1

2CFC	COPTIC OLD NUBIAN VERSE DIVIDER

153.48~~/4.1~~/5.1

This paragraph was deleted. ~~2CFE~~

~~COPTIC FULL STOP~~

153.49/4.1

2CFF	COPTIC MORPHOLOGICAL DIVIDER

153.49.1/5.1

2E0E..2E15	EDITORIAL CORONIS..UPWARDS ANCORA

153.49.2~~/5.1~~/11

{11.0.0: 155-A26; PRI-376#ID20180414084252}

2E17	DOUBLE OBLIQUE ~~DOUBLE~~ HYPHEN

153.49.3/5.1

A60D	VAI COMMA

153.49.4/5.1

A60F	VAI QUESTION MARK

153.49.5/5.1

A92E	KAYAH LI SIGN CWI

153.49.6/5.1

A92F	KAYAH LI SIGN SHYA

153.50/4.1

10A50	KHAROSHTHI PUNCTUATION DOT

153.51/4.1

10A51	KHAROSHTHI PUNCTUATION SMALL CIRCLE

153.52/4.1

10A52	KHAROSHTHI PUNCTUATION CIRCLE

153.53/4.1

10A53	KHAROSHTHI PUNCTUATION CRESCENT BAR

153.54/4.1

10A54	KHAROSHTHI PUNCTUATION MANGALAM

153.55/4.1

10A55	KHAROSHTHI PUNCTUATION LOTUS

153.56/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

11EF7..11EF8	MAKASAR PASSIMBANG..MAKASAR END OF SECTION

153.57/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

Letters Attached to Orthographic Syllables

153.58/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

In scripts that use the Brahmic style of line breaking, most characters that attach to the initial consonant cluster of an orthographic syllable and are part of that syllable are encoded as combining marks. These have line break class CM. Sometimes, however, additional characters with general category Lo or Lm, such as final consonants or vowel lengtheners, should remain attached to the preceding orthographic syllable. They are then assigned line break class BA.

153.59/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

A9CF	JAVANESE PANGRANGKEP

153.60/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

AA40..AA42	CHAM LETTER FINAL K..CHAM LETTER FINAL NG

153.61/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

AA44..AA4B	CHAM LETTER FINAL CH..CHAM LETTER FINAL SS

153.62/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

1133D	GRANTHA SIGN AVAGRAHA

153.63/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

1135D	GRANTHA SIGN PLUTA

153.64/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

11EF2	MAKASAR ANGKA

154~~/4.0.1~~/5~~/5.2~~/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

BB: ~~—-~~ Break Opportunities Before~~opportunities before characters~~ (B)

154.0.1/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB21

154.1/5

Characters of this line break class move to the next line at a line break and thus provide a line break opportunity before.

154.1.1/5.2

Examples of BB characters are described in the following sections.

154.2/5

Dictionary Use

155/3.1

00B4	ACUTE~~ACCUTE~~ ACCENT

155.1/5.1

1FFD	GREEK OXIA

156~~/4.1~~/5.1

In some dictionaries, stressed syllables are indicated with a spacing acute accent instead of the hyphenation point. In this case the accent moves~~would move~~ to the next line, and the preceding line ends~~ended~~ with a hyphen. The oxia is canonically equivalent to the acute accent.

156.1/5.1

02DF	MODIFIER LETTER CROSS ACCENT

156.2/5.1

A cross accent also appears in some dictionaries to mark the stress of the following syllable, and should be handled in the same way as the other stress marking characters in this section. The accent should not be separated from the syllable it marks by a break.

157

02C8	MODIFIER LETTER VERTICAL LINE

158

02CC	MODIFIER LETTER LOW VERTICAL LINE

159~~/3.1~~/4.1

These characters are used in dictionaries to indicate stress and secondary stress when IPA is used. Both are prefixes to the stressed syllable in IPA. Breaking before~~Therefore, the only sensible way to break~~ them keeps~~is to keep~~ them with the syllable.~~;, that is to break before them.~~

160~~/3.1/4.1~~/5

Note~~NOTE~~: It is hard to find actual examples in most dictionaries because~~, since~~ the pronunciation fields usually occur right after the headword~~head word~~, and the columns are wide enough to prevent line breaks in most~~the~~ pronunciations.

160.0.1/5/5.1

Tibetan and Phags-Pa Head Letters

160.1/4.1

0F01	TIBETAN MARK GTER YIG MGO TRUNCATED A

160.2/4.1

0F02	TIBETAN MARK GTER YIG MGO -UM RNAM BCAD MA

160.3/4.1

0F03	TIBETAN MARK GTER YIG MGO -UM GTER TSHEG MA

160.4/4.1

0F04	TIBETAN MARK INITIAL YIG MGO MDUN MA

160.5/4.1

0F06	TIBETAN MARK CARET YIG MGO PHUR SHAD MA

160.6/4.1

0F07	TIBETAN MARK YIG MGO TSHEG SHAD MA

160.7/4.1

0F09	TIBETAN MARK BSKUR YIG MGO

160.8/4.1

0F0A	TIBETAN MARK BKA- SHOG YIG MGO

160.9/4.1

0FD0	TIBETAN MARK BSKA- SHOG GI MGO RGYAN

160.10/4.1

0FD1	TIBETAN MARK MNYAM YIG GI MGO RGYAN

160.10.1/5.1

0FD3	TIBETAN MARK INITIAL BRDA RNYING YIG MGO MDUN MA

160.10.2/5.1

A874	PHAGS-PA SINGLE HEAD MARK

160.10.3/5.1

A875	PHAGS-PA DOUBLE HEAD MARK

160.11~~/4.1~~/5/5.1

~~These characters are~~ Tibetan head letters ~~which~~ allow a break before. For more information, see Section 5.6~~5~~, Tibetan Line Breaking.

161

1806	MONGOLIAN TODO SOFT HYPHEN

162/4~~/4.1~~/5.1

Despite its name, this~~theThe~~ Mongolian character is not an invisible control like SOFT HYPHEN, but rather a visible character like a regular~~Todo~~ ~~soft~~ hyphen. Unlike the hyphen, MONGOLIAN TODO ~~is notindicates~~ ~~an invisible control like~~ SOFT HYPHEN~~, but rather a visible character like a regular hyphen. Unlike the hyphen it~~ stays with the following line. Whenever optional line breaks are to be marked invisibly, SOFT HYPHEN should be used instead.~~whenever optional line breaks are to be marked in any script.break opportunity with hyphen, but unlike the soft-hyphen it stays with the following line.~~

163~~/4.0.1~~/5/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

B2: ~~—-~~ Break Opportunity Before and After (B/A/XP)

163.1/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB17

165~~/3.1~~/4~~/4.0.1/4.1~~/5/5.1

The EM DASH is used to set off parenthetical text. Normally, it is used~~, normally~~ without spaces. However~~, however~~, this is language dependent. For example, in Swedish, spaces are used around the EM DASH~~,~~. Line breaks can occur before and after an EM DASH. Because EM DASHes are sometimes used in pairs instead of a single quotation dash, the default behavior is~~, but~~ not to break~~between a pair of them. Such pairstwo em dashes. Pairs of em dashes~~ ~~are sometimes used instead of a single quotation dash. For that reason,~~ the line between even though~~should~~ not all fonts use connecting glyphs for the~~be broken between~~ EM ~~DASHesem dashes~~ ~~even though not all fonts use connecting glyphs, for example, in Swedish, spaces are used around the EM~~ DASH. ~~Line breaks can occur before and after an EM DASH, but not between two em dashes. Pairs of em dashes are sometimes used instead of a single quotation dash. For that reason, the line should not be broken between em dashes event though not all fonts use connecting glyphs character is used to set off parenthetical text, normally without spaces, however, this is language dependent, for the EM DASH. example, in Swedish, spaces are used. Line breaks can occur before and after an em dash, but not between two em dashes. Pairs of em dashes are sometimes used instead of a single quotation dash. For that reason, the line should not be broken between em dashes.~~

166/3.1

This paragraph was deleted. ~~not be broken between em dashes.~~

166.1/6.1

Some languages, including Spanish, use EM DASH to set off a parenthetical, and the surrounding dashes should not be broken from the contained text. In this usage there is space on the side where it can be broken. This does not conflict with symmetrical usages, either with spaces on both sides of the em-dash or with no spaces.

167~~/4.0.1~~/5/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

BK: ~~—-~~ Mandatory Break (A) ~~—-~~ (Non-tailorable~~normative~~)

167.1/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB4, LB6, LB9, LB15a, LB15b, LB20a

168/5

Explicit breaks act independently of the surrounding characters. No characters can be added to the BK class as part of tailoring, but implementations are not required to support the VT character.

168.1/5/15.1

{15.1.0: 173-C29, 173-A128; L2/22-229R,L2/22-234R2}

Moved from §169.1. 000B	LINE TABULATION (VT)

169/5

000C	FORM FEED (FF)

169.1/5/15.1

{15.1.0: 173-C29, 173-A128; L2/22-229R,L2/22-234R2}

This paragraph was moved to §168.1. ~~000B~~

~~LINE TABULATION (VT)~~

170~~/4.1~~/5.1

FORM FEED~~Form Feed~~ separates pages. The text on the new page starts at the beginning of the line. In some layout modes there may be no visible advance to a new “page”.~~No paragraph formatting is applied.~~

171/5/11

{11.0.0: 155-A26; PRI-376#ID20180414084252}

2028	LINE SEPARATOR ~~(LS)~~

172/5~~/5.1~~/8

The text after the LINE SEPARATOR~~Line Separator~~ starts at the beginning of the line. ~~No paragraph formatting is applied.~~ This is similar to HTML <BR>.

173/5

This paragraph was deleted. ~~This is similar to HTML <BR>~~

174/5/11

{11.0.0: 155-A26; PRI-376#ID20180414084252}

2029	PARAGRAPH SEPARATOR ~~(PS)~~

175~~/5.1~~/11

{11.0.0: 155-A26; PRI-376#ID20180414084252}

The text of the new paragraph starts at the beginning of the line. This character defines a paragraph break, causing suitable~~Paragraph~~ formatting to be~~is~~ applied, for example, interparagraph spacing or first line indentation. LINE SEPARATOR~~LS~~, FF, VT as well as CR, LF and NL do not define a paragraph break.

176/4/5

Newline Function~~“"NEW LINE FUNCTION~~ (NLF)~~”"~~

177/4~~/4.1~~/5~~/5.1/5.2~~/6

Newline Functions are defined in the Unicode Standard as providing~~New line functions provide~~ additional mandatory~~explicit~~ breaks. They are not individual characters, but are encoded~~expressed~~ as sequences of the control characters NEL, LF, and CR. If a character sequence for a Newline Function contains more than one character, it is kept together. The~~What~~ particular sequences that~~sequence(s)~~ form an~~a~~ NLF depend~~depends~~ on the implementation and other circumstances as described in Section 5.8, Newline Guidelines, of [Unicode~~Unicode5.20Unicode~~]. ~~Section 5.8,~~ ~~Technical Report 13, Unicode~~ ~~Newline Guidelines.~~

178~~/3.1~~/4/5

This specification defines the NLF implicitly. It defines the three~~If a~~ ~~the~~ character classes CR, LF, and NL. Their~~sequence for a new~~ line break~~function contains more than one character, it is kept together. The default~~ behavior, defined in rule LB5 in Section 6.1, Non-tailorable Line Breaking Rules, is to break after NL, LF, or CR, but not between CR and LF. ~~Two additional line breaking classes have been added for convenience in this operation.~~
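The default behavior described above can be sketched as follows. This is a minimal illustration, not the normative algorithm: it covers only the mandatory breaks after BK, CR, LF, and NL, and the names `MANDATORY_CLASSES` and `mandatory_break_offsets` are hypothetical. VT is included in the BK sample even though implementations are not required to support it.

```python
# Hypothetical sample mapping of code points to the classes discussed above.
MANDATORY_CLASSES = {
    "\u000B": "BK",  # LINE TABULATION (VT), optional for implementations
    "\u000C": "BK",  # FORM FEED (FF)
    "\u2028": "BK",  # LINE SEPARATOR
    "\u2029": "BK",  # PARAGRAPH SEPARATOR
    "\u000D": "CR",  # CARRIAGE RETURN
    "\u000A": "LF",  # LINE FEED
    "\u0085": "NL",  # NEXT LINE (NEL)
}


def mandatory_break_offsets(text):
    """Return the offsets at which a mandatory break occurs.

    An offset i means "break between text[i - 1] and text[i]".
    """
    offsets = []
    for i, ch in enumerate(text):
        cls = MANDATORY_CLASSES.get(ch)
        if cls is None:
            continue
        # Do not break between CR and LF; the break falls after the LF.
        if cls == "CR" and i + 1 < len(text) and text[i + 1] == "\u000A":
            continue
        offsets.append(i + 1)
    return offsets
```

In this sketch a lone CR still produces a break, while a <CR, LF> pair produces a single break after the LF, so the sequence is kept together.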

179~~/3.1/4.0.1~~/5/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

CB: ~~—-~~ Contingent Break Opportunity (B/A) ~~—- (normative)~~

179.1/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB1, LB20, LB20a

180/5

By default, there is a break opportunity both before and after any inline object. Object-specific line breaking behavior is implemented in the associated object itself, and where available can override the default to prevent either or both of the default break opportunities. Using U+FFFC OBJECT REPLACEMENT CHARACTER allows the object anchor to take a character position in the string.~~Contingent Break Opportunity Before and After~~

181

FFFC	OBJECT REPLACEMENT CHARACTER

182/4~~/4.0.1/4.1~~/5

~~By default there is a break opportunity both before and after the object.~~ Object-specific line break~~ing~~ behavior is best implemented by querying the~~in the associated~~ object itself, not by replacing the CB line~~and where available can override the default to prevent either or both of the~~ breaking class ~~opportunities. Note~~, ~~that this is best implemented~~ by another~~querying the object itself, not by replacing the CB line breaking~~ class. ~~by another class.~~

182.1/6.1

CJ: Conditional Japanese Starter

182.1.1/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB1

182.2/6.1

This character class contains Japanese small hiragana and katakana. Characters of this class may be treated as either NS or ID.

182.3/6.1

CSS Text Level 3 (which supports Japanese line layout) defines three distinct values for its line-break behavior:

182.4/6.1

• strict, typically used for long lines

182.5~~/6.1~~/16

• normal ~~(CSS default)~~, the behavior typically used for books and documents

182.6/6.1

• loose, typically used for short lines such as in newspapers

182.7/6.1

These have different sets of “kinsoku” characters which cannot be at the beginning or end of a line; strict has the largest set, while loose has the smallest. The motivation for the smaller number of kinsoku characters is to avoid triggering justification that puts characters off the grid position.

182.8/6.1

Treating characters of class CJ as class NS will give CSS strict line breaking; treating them as class ID will give CSS normal breaking.
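The choice between the two treatments can be sketched as a small class-resolution step applied before the pair rules run. The function name `resolve_cj` and the boolean `strict` parameter are assumptions made for illustration; the text above does not state how the loose value is obtained, so it is not modelled here.

```python
def resolve_cj(line_break_class, strict):
    """Map class CJ to NS (CSS strict) or ID (CSS normal).

    All other classes are returned unchanged.
    """
    if line_break_class == "CJ":
        return "NS" if strict else "ID"
    return line_break_class


def resolve_all(classes, strict):
    """Apply resolve_cj to a sequence of line break classes."""
    return [resolve_cj(cls, strict) for cls in classes]
```

For example, `resolve_all(["ID", "CJ"], strict=True)` yields `["ID", "NS"]`, so the small kana cannot start a line, while `strict=False` yields `["ID", "ID"]`, which allows a break before it.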

182.9/6.1

The CJ line break class includes

182.10/6.1

3041, 3043, 3045, etc.	Small hiragana

182.11/6.1

30A1, 30A3, 30A5, etc.	Small katakana

182.12/6.1

30FC	KATAKANA-HIRAGANA PROLONGED SOUND MARK

182.13/6.1

FF67..FF70	Halfwidth variants

183~~/4.0.1~~/5~~/5.2~~/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

CL: ~~—-~~ Close~~Closing~~ Punctuation (XB)

183.1/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB13, LB15b, LB16, LB25

184~~/3.2/4.1~~/5~~/5.1~~/5.2

The closing character of any set of paired punctuation should~~must~~ be kept with the preceding character, and the same applies to all forms of wide comma and full stop. This is desirable, even when there are intervening space characters, ~~so as~~ to prevent the appearance of a bare closing punctuation mark at the head of a line. ~~The CL line break class contains the following characters plus any characters of General_Category Pe in the Unicode Character Database.:~~

184.1/5.2

The class CL is closely related to the class CP (Close Parenthesis). They differ only in that CP will not introduce a break when followed by a letter or number, which prevents breaks within constructs like “(s)he”.

184.2/5.2

The CL line break class contains characters of General_Category Pe in the Unicode Character Database, but excludes any characters included in the class CP. It also contains certain non-paired punctuation characters, including:

185

3001..3002	IDEOGRAPHIC COMMA..IDEOGRAPHIC FULL STOP

185.-1.1/16

{16.0.0: 179-C30, 179-A107}

FE10	PRESENTATION FORM FOR VERTICAL COMMA

185.0.1/4.1

FE11	PRESENTATION FORM FOR VERTICAL IDEOGRAPHIC COMMA

185.0.2/4.1

FE12	PRESENTATION FORM FOR VERTICAL IDEOGRAPHIC FULL STOP

185.1/3.1

FE50	SMALL COMMA

185.2/3.1

FE52	SMALL FULL STOP

186

FF0C	FULLWIDTH COMMA

187

FF0E	FULLWIDTH FULL STOP

188/3.1

This paragraph was deleted. ~~FE50~~

~~SMALL COMMA~~

189/3.1

This paragraph was deleted. ~~FE52~~

~~SMALL FULL STOP~~

190

FF61	HALFWIDTH IDEOGRAPHIC FULL STOP

191

FF64	HALFWIDTH IDEOGRAPHIC COMMA

192~~/3.1~~/5

This paragraph was deleted. ~~plus any characters of General Categorygeneral category~~ ~~Pe in the Unicode Character Database.~~

193~~/4.0.1~~/5~~/5.1/5.2~~/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

CM: ~~—-~~ Attached Characters and Combining Mark~~Marks~~ (XB) (Non-tailorable) ~~—- (normative)~~

193.1/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB1, LB9, LB10

194/4.1

Combining Characters~~characters~~

195~~/4.0.1~~/4.1

Combining character sequences are treated as units for the purpose~~purposes~~ of line breaking. The line ~~-~~breaking behavior of the sequence is that of the base character. ~~If U+0020 SPACE is used as a base character, it is treated as IDAL instead of SP.~~

195.1~~/4.1~~/8

The preferred base character for showing combining marks in isolation is U+00A0 NO~~No~~-BREAK~~Break~~ SPACE. If a line break before or after the combining sequence is desired, U+200B ZERO WIDTH SPACE can be used. The use of U+0020 SPACE as a base character is deprecated.

195.2/5~~/5.1~~/15

For most purposes, combining characters take on the properties of their base characters, and that is how the CM class is treated in rule LB9 of this specification. As a result, if the sequence <0021, 20E4> is used to represent a triangle enclosing an exclamation point, it is effectively treated as EX, the line break class of the exclamation mark. If U+26A0 WARNING~~2061 CAUTION~~ SIGN had been used, which also looks like an exclamation point inside a triangle, it would have the line break class of AL. Only the latter corresponds to the line breaking behavior expected by users for this symbol. To avoid surprising behavior, always use a base character that is a symbol or letter (Line Break AL) when using enclosing combining marks (General_Category Me).
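The way a combining mark inherits the class of its base can be sketched as a resolution pass over a sequence of line break classes. This is an illustration under stated assumptions rather than the normative rule text: the names `NO_BASE` and `resolve_combining` are hypothetical, only class CM is handled, and marks that have no usable base (at the start of text, or after BK, CR, LF, NL, SP, or ZW) are treated as AL.

```python
# Classes that a following combining mark does not attach to in this sketch.
NO_BASE = {"BK", "CR", "LF", "NL", "SP", "ZW"}


def resolve_combining(classes):
    """Replace each CM with the class of its base character.

    A CM without a usable base is resolved to AL, and any further
    CMs in the same run then attach to that AL.
    """
    resolved = []
    for cls in classes:
        if cls == "CM":
            if resolved and resolved[-1] not in NO_BASE:
                resolved.append(resolved[-1])
            else:
                resolved.append("AL")
        else:
            resolved.append(cls)
    return resolved
```

With this sketch, the sequence <0021, 20E4> discussed above corresponds to `["EX", "CM"]` and resolves to `["EX", "EX"]`, which is why the enclosed exclamation mark behaves like EX rather than like a symbol of class AL.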

196~~/3.1/4.0.1/4.1~~/5/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

The CM line break class includes all combining~~All~~ characters with General_Category~~General Category~~ Mc, Me, and~~general category~~ Mn, unless listed explicitly elsewhere. This includes viramas that don’t have line break class VI or VF.~~, Mc, and Me.~~

197~~/3.1~~/4

This paragraph was deleted. ~~Conjoining Jamos (non-initial)~~

198/4

This paragraph was deleted. ~~1160..11F9~~

~~Conjoining Jamos~~

199/4

This paragraph was deleted. ~~A sequence of conjoining Jamos is used to make up a Hangul syllable. Breaks are only allowed around the entire Hangul syllable, and then the line break properties are the same for precomposed Hangul syllables as for conjoined sequence of Jamos.~~

200~~/3.1~~/4

This paragraph was deleted. ~~NOTE: for the purpose of determining line break opportunities, non-initial conjoining Jamos~~ ~~thus~~ ~~behave like combining marks, while the initial combining Jamos have the same property as Hangul Syllables.~~

201/4.1

Control and Formatting Characters~~formatting characters~~

202~~/3.1~~/4.1

Most control~~controls~~ and formatting characters are ignored in line breaking and do not contribute to the line width. By giving them class CM, the line breaking behavior of the last preceding character that is not of class CM affects the line breaking behavior.~~. All characters of General Category Cc and Cf, unless listed explicitly elsewhere.~~

202.1~~/3.1~~/5/5.1

Note~~NOTE~~: When control codes and format characters are rendered visibly during editing, more graceful layout might be achieved by treat~~assign~~ing them as if they had the line break class of the visible symbols instead, that is~~the~~ AL or ID. Such visible modes do not violate the constraint on tailorability, because they are logically equivalent to having temporarily substituted symbol characters, such as the characters from the Control Pictures block, or in some cases, character sequences, for the actual control characters. ~~class instead.~~

202.2~~/3.1/4.1~~/5

The CM line break class includes all~~All~~ characters of General_Category~~General Category~~ Cc and Cf, unless listed explicitly elsewhere.

202.2.1/6.3

The CM class also includes U+3035 VERTICAL KANA REPEAT MARK LOWER HALF. This character is normally preceded by either U+3033 VERTICAL KANA REPEAT MARK UPPER HALF or U+3034 VERTICAL KANA REPEAT WITH VOICED SOUND MARK UPPER HALF, and should not be separated from them.

202.3~~/5.2~~/6/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

CP: Closing Parenthesis (XB)

202.3.1/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB13, LB15b, LB16, LB25, LB30

202.4~~/5.2~~/16

{16.0.0: 172-A100; PRI-446#ID20220603194905}

This class contains ~~just~~ two common characters, U+0029 RIGHT PARENTHESIS and U+005D RIGHT SQUARE BRACKET. It also contains closing brackets used in phonetic notations. Characters of class CP differ from those of the CL (Close Punctuation) class in that they will not cause a break opportunity when appearing in contexts like “(s)he.” In all other respects the breaking behavior of CP and CL are the same.

202.5/5.2

0029	RIGHT PARENTHESIS

202.6/5.2

005D	RIGHT SQUARE BRACKET

202.7/16

{16.0.0: 172-A100; PRI-446#ID20220603194905}

2E56	RIGHT SQUARE BRACKET WITH STROKE

202.8/16

{16.0.0: 172-A100; PRI-446#ID20220603194905}

2E58	RIGHT SQUARE BRACKET WITH DOUBLE STROKE

202.9/16

{16.0.0: 172-A100; PRI-446#ID20220603194905}

2E5A	TOP HALF RIGHT PARENTHESIS

202.10/16

{16.0.0: 172-A100; PRI-446#ID20220603194905}

2E5C	BOTTOM HALF RIGHT PARENTHESIS
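The single difference between CP and CL described above can be sketched as a check on the class that directly follows the closing mark. The function name `break_after_close` is hypothetical, and the sketch deliberately covers only the case where a letter or number (AL, HL, or NU) follows with no intervening spaces; every other context is outside its scope.

```python
LETTERS_AND_NUMBERS = {"AL", "HL", "NU"}


def break_after_close(close_class, next_class):
    """Return True if a break opportunity follows the closing mark.

    Only CL or CP directly followed by a letter or number is modelled.
    """
    if close_class not in {"CL", "CP"}:
        raise ValueError("expected CL or CP")
    if next_class not in LETTERS_AND_NUMBERS:
        raise ValueError("this sketch only handles AL, HL, and NU")
    # CP stays attached to a following letter or number; CL does not.
    return close_class != "CP"
```

In “(s)he” the right parenthesis has class CP and is followed by a letter, so `break_after_close("CP", "AL")` is `False` and the construct stays on one line.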

203~~/4.0.1~~/5/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

CR: ~~—-~~ Carriage Return (A) ~~—-~~ (Non-tailorable~~normative~~)

203.1/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB5, LB6, LB9, LB15a, LB15b, LB20a

204

000D	CARRIAGE RETURN (CR)

205/4

A CR indicates a mandatory~~Do not~~ break after, unless~~if~~ followed by a LF. See also the discussion under BK.~~, mandatory break after otherwise~~

205.1/4~~/4.0.1~~/5/5.1

Note~~NOTE~~: On some platforms the character sequence <CR, CR, LF>~~CR, CR, LF~~ is used to indicate the location of actual line breaks, whereas <CR, LF>~~CR LF~~ is treated like a hard line break. As soon as a user edits the text, the location of all the <CR, CR, LF> sequences~~CR CR LF~~ may change as the new text breaks differently, while the relative position of any <CR, LF> to~~the CR LF to~~ the surrounding text stays~~stay~~ the same. This convention allows an editor to return a buffer and the client to~~is able to~~ tell which text is displayed on which line, by counting the number of <CR, CR, LF> and <CR, LF> sequences. This convention is essentially equivalent to markup that captures the result of applying the line break algorithm, not a tailoring of the CR character. The <CR, CR, LF> sequences are thus not considered part of the plain text content.~~CR CR LFs and CR LFs.~~
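A client that receives such a buffer could recover both the displayed lines and the plain text content along the following lines. This is a sketch only: the function name `split_displayed_lines` is hypothetical, and it assumes that <CR, CR, LF> always marks a display-only break and <CR, LF> always marks a hard line break.

```python
import re

# Try the three-character sequence first so <CR, CR, LF> is not
# misread as a stray CR followed by <CR, LF>.
_BREAK = re.compile(r"\r\r\n|\r\n")


def split_displayed_lines(buffer):
    """Return (displayed_lines, plain_text) for an editor buffer.

    Display-only <CR, CR, LF> sequences are dropped from the plain
    text; hard <CR, LF> breaks are kept.
    """
    lines = []
    plain_parts = []
    pos = 0
    for match in _BREAK.finditer(buffer):
        segment = buffer[pos:match.start()]
        lines.append(segment)
        plain_parts.append(segment)
        if match.group() == "\r\n":
            plain_parts.append("\r\n")
        pos = match.end()
    tail = buffer[pos:]
    lines.append(tail)
    plain_parts.append(tail)
    return lines, "".join(plain_parts)
```

Counting the entries of the first result tells the client which text is displayed on which line, while the second result is the text without the layout markup.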

205.2/9/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

EB: Emoji Base (B/A)

205.2.1/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB23a, LB30b

205.3/9/10

{10.0.0: 150-A58, 150-C22; L2/16-315R}

This class includes characters whose appearance can be modified by a subsequent emoji modifier in an emoji modifier sequence. This class directly corresponds to the Emoji_Modifier_Base property as defined in Section 1.4.4 Emoji Modifiers of [UTS51~~UTR51~~].

205.4/9

Examples include:

205.6/9

1F478	PRINCESS

205.7/9

1F6B4	BICYCLIST

205.8/9

Breaks within emoji modifier sequences are prevented by rule LB30b. In other contexts, characters of class EB behave similarly to ideographs of class ID, with break opportunities before and after.

205.9/9/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

EM: Emoji Modifier (A)

205.9.1/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB23a, LB30b

205.10/9/10

{10.0.0: 150-A58, 150-C22; L2/16-315R}

This class includes characters that can be used to modify the appearance of a preceding emoji in an emoji modifier sequence. This class directly corresponds to the Emoji_Modifier property as defined in Section 1.4.4 Emoji Modifiers of [UTS51~~UTR51~~].

205.11/9

Breaks within emoji modifier sequences are prevented by rule LB30b.

205.12/9

Emoji modifiers include:

205.13/9

1F3FB..1F3FF	EMOJI MODIFIER FITZPATRICK TYPE-1-2..EMOJI MODIFIER FITZPATRICK TYPE-6
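The prohibition on breaking inside an emoji modifier sequence can be sketched with the code points listed above. The names `SAMPLE_EMOJI_BASES` and `no_break_offsets` are hypothetical, and the base set contains only the two examples given for class EB; a real implementation would take both classes from the Unicode Character Database.

```python
# U+1F3FB..U+1F3FF, the emoji modifiers of class EM.
EMOJI_MODIFIERS = range(0x1F3FB, 0x1F3FF + 1)

# Only the two EB examples listed in this document, for illustration.
SAMPLE_EMOJI_BASES = {0x1F478, 0x1F6B4}


def no_break_offsets(text):
    """Return offsets where a break is prohibited between EB and EM.

    An offset i means "no break between text[i - 1] and text[i]".
    """
    offsets = []
    for i in range(1, len(text)):
        if (ord(text[i - 1]) in SAMPLE_EMOJI_BASES
                and ord(text[i]) in EMOJI_MODIFIERS):
            offsets.append(i)
    return offsets
```

A modifier that does not directly follow an emoji base is not covered by this check and is left to the other rules.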

206~~/4.0.1~~/5/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

EX: ~~—-~~ Exclamation / Interrogation (XB)

206.1/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB13, LB15b

207~~/4.1~~/5/5.2

Characters in this line break class~~These~~ behave like closing characters, except in relation to postfix (PO) and ~~and ‘~~non-starter~~’~~ characters (NS). Examples include:~~. They include:~~

208

0021	EXCLAMATION MARK

209

003F	QUESTION MARK

209.-1.1/4.1

05C6	HEBREW PUNCTUATION NUN HAFUKHA

209.-1.2~~/4.1~~/5.1

This paragraph was deleted. ~~060C~~

~~ARABIC COMMA~~

209.-1.3/4.1

061B	ARABIC SEMICOLON

209.-1.4/4.1

061E	ARABIC TRIPLE DOT PUNCTUATION MARK

209.-1.5/4.1

061F	ARABIC QUESTION MARK

209.-1.6~~/4.1~~/5.1

This paragraph was deleted. ~~066A~~

~~ARABIC PERCENT SIGN~~

209.-1.7/4.1

06D4	ARABIC FULL STOP

209.-1.7.1/5

07F9	NKO EXCLAMATION MARK

209.-1.8/4.1

0F0D	TIBETAN MARK SHAD

209.-1.9~~/4.1~~/5.2

This paragraph was deleted. ~~0F0E~~

~~TIBETAN MARK NYIS SHAD~~

209.-1.10~~/4.1~~/5.2

This paragraph was deleted. ~~0F0F~~

~~TIBETAN MARK TSHEG SHAD~~

209.-1.11~~/4.1~~/5.2

This paragraph was deleted. ~~0F10~~

~~TIBETAN MARK NYIS TSHEG SHAD~~

209.-1.12~~/4.1~~/5.2

This paragraph was deleted. ~~0F11~~

~~TIBETAN MARK RIN CHEN SPUNGS SHAD~~

209.-1.13~~/4.1~~/5.2

This paragraph was deleted. ~~0F14~~

~~TIBETAN MARK GTER TSHEG~~

209.-1.13.1~~/5.1~~/5.2

This paragraph was deleted. ~~1802~~

~~MONGOLIAN COMMA~~

209.-1.13.2~~/5.1~~/5.2

This paragraph was deleted. ~~1803~~

~~MONGOLIAN FULL STOP~~

209.-1.13.3~~/5.1~~/5.2

This paragraph was deleted. ~~1808~~

~~MONGOLIAN MANCHU COMMA~~

209.-1.13.4~~/5.1~~/5.2

This paragraph was deleted. ~~1809~~

~~MONGOLIAN MANCHU FULL STOP~~

209.0.1~~/4.0.1~~/5.2

This paragraph was deleted. ~~1944~~

~~LIMBU EXCLAMATION MARK~~

209.0.2~~/4.0.1~~/5.2

This paragraph was deleted. ~~1945~~

~~LIMBU QUESTION MARK~~

209.1~~/3.2~~/5.2

This paragraph was deleted. ~~2762~~

~~HEAVY EXCLAMATION MARK ORNAMENT~~

209.2~~/3.2~~/5.2

This paragraph was deleted. ~~2763~~

~~HEAVY HEART EXCLAMATION MARK ORNAMENT~~

209.2.1~~/5.1~~/5.2

This paragraph was deleted. ~~2CF9~~

~~COPTIC OLD NUBIAN FULL STOP~~

209.2.2~~/5.1~~/5.2

This paragraph was deleted. ~~2CFE~~

~~COPTIC FULL STOP~~

209.2.3~~/5.1~~/5.2

This paragraph was deleted. ~~2E2E~~

~~REVERSED QUESTION MARK~~

209.2.4~~/5.1~~/5.2

This paragraph was deleted. ~~A60C~~

~~VAI SYLLABLE LENGTHENER~~

209.2.5~~/5.1~~/5.2

This paragraph was deleted. ~~A60E~~

~~VAI FULL STOP~~

209.3/5/5.2

This paragraph was deleted. ~~A876~~

~~PHAGS-PA MARK SHAD~~

209.4/5/5.2

This paragraph was deleted. ~~A877~~

~~PHAGS-PA MARK DOUBLE SHAD~~

209.5/5/5.2

This paragraph was deleted. ~~FE15~~

~~PRESENTATION FORM FOR VERTICAL EXCLAMATION MARK~~

209.6/5/5.2

This paragraph was deleted. ~~FE16~~

~~PRESENTATION FORM FOR VERTICAL QUESTION MARK~~

210/5.2

This paragraph was deleted. ~~FE56..FE57~~

~~SMALL QUESTION MARK..SMALL EXCLAMATION MARK~~

211

FF01	FULLWIDTH EXCLAMATION MARK

212

FF1F	FULLWIDTH QUESTION MARK

212.1~~/4.1~~/5

This paragraph was deleted. ~~FE15~~

~~PRESENTATION FORM FOR VERTICAL EXCLAMATION MARK~~

212.2~~/4.1~~/5

This paragraph was deleted. ~~FE16~~

~~PRESENTATION FORM FOR VERTICAL QUESTION MARK~~

213/4~~/4.0.1~~/5/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

GL: ~~—-~~ Non-breaking (“~~"~~Glue”~~"~~) (XB/XA) ~~—-~~ (Non-tailorable~~normative~~)

214/4

This paragraph was deleted. ~~The action of these characters is to glue together both left and right neighbor character such that they are kept on the same line. If they follow a space character, they still allow a break.~~

214.1~~/3.2~~/4

{3.2.0: 81-M6, 85-M7; L2/00-258}

This paragraph was deleted. ~~2060~~

~~WORD JOINER (WJ)~~

215/4

This paragraph was deleted. ~~FEFF~~

~~ZERO WIDTH NO-BREAK SPACE (ZWNBSP)~~

215.1/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB12, LB12a, LB15a, LB15b, LB20a

216~~/3.2~~/4

{3.2.0: 81-M6, 85-M7; L2/00-258}

This paragraph was deleted. ~~The word joinerSince this~~ ~~character is the preferred choice for an invisible character to keep other characters together that would otherwise be split across the line at a direct break. The character FEFF has the same effect, but since~~ ~~not visible,~~ ~~it is also used in an unrelated way as a byte order mark the use of the WJ as the preferred interword glue will simplify the handling of FEFF.choice for keeping characters together that would otherwise be split across the line at a direct break.~~

216.1/4~~/4.1/5.2~~/6.1

Non-breaking characters prohibit breaks on either side, but that prohibition can be overridden by SP or ZW. In particular, when NO-BREAK SPACE~~NBSP~~ follows SPACE, there is a break opportunity after the SPACE and the NO-BREAK SPACE~~NBSP~~ will go as visible space onto the next line. See also WJ. The following are examples of~~lists the~~ characters of line break class GL: ~~with additional description.~~

217

00A0	NO-BREAK SPACE (NBSP)

218/4

202F	NARROW NO-BREAK SPACE (NNBSP)

218.1/4

180E	MONGOLIAN VOWEL SEPARATOR (MVS)

219/4~~/4.1~~/5~~/5.1/5.2~~/6/7~~/11~~/12

{12.0.0: 155-A31}

NO-BREAK SPACE is the preferred character to use where two words are to~~should~~ be visually separated but kept on the same line, as in the case of a title and a name “~~"~~Dr.<NBSP>Joseph Becker”~~"~~. When SPACE follows NO-BREAK SPACE~~NBSP~~, there is no break, because there never is a break in front of SPACE. ~~NARROW NO-BREAK SPACE is used in Mongolian. The MONGOLIAN VOWEL SEPARATOR acts like a NARROW NO-BREAK SPACE in its line breaking behavior. It additionally affects the shaping of certainmongolian~~ ~~vowel~~ ~~separator acts like a NNBSP in its line breaking behavior. It additionally affects the shaping of certain vowel~~ ~~characters as described in Section 13.5~~42~~, Mongolian, of [UnicodeUnicode5.2~~0~~Unicode~~]. ~~Section 12.3, Mongolian.~~
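The interaction between SPACE and class GL described above can be sketched in a few lines of Python. This is a minimal illustration and not the full algorithm: it knows only SPACE, a small sample of GL characters, and U+2010 HYPHEN (class BA, included for contrast), and it treats every other pair of characters as unbreakable. The names `GL_SAMPLE` and `break_opportunities` are hypothetical.

```python
SP = "\u0020"
HYPHEN = "\u2010"  # class BA: allows a break after it (for contrast with U+2011)
# Small sample of class GL taken from the list in this section (assumption: not exhaustive).
GL_SAMPLE = {"\u00A0", "\u202F", "\u2007", "\u2011"}


def break_opportunities(text):
    """Return indices i where a line may break between text[i-1] and text[i]."""
    breaks = []
    for i in range(1, len(text)):
        before, after = text[i - 1], text[i]
        if after == SP:
            continue  # there never is a break in front of SPACE
        if before == SP:
            breaks.append(i)  # SPACE overrides the prohibition, even before GL
            continue
        if before in GL_SAMPLE:
            continue  # no break after GL (LB12)
        if after in GL_SAMPLE and before != HYPHEN:
            continue  # no break before GL (LB12a)
        if before == HYPHEN:
            breaks.append(i)  # ordinary hyphen: break allowed after it
    return breaks


print(break_opportunities("Dr.\u00A0Joseph Becker"))
print(break_opportunities("a \u00A0b"))
```

In the first string the only opportunity is after the SPACE that precedes "Becker"; in the second, the NO-BREAK SPACE follows a SPACE, so the break falls after the SPACE and the NO-BREAK SPACE starts the next line.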

219.0.1~~/5.1/5.2~~/12

{12.0.0: 155-A31}

NARROW NO-BREAK SPACE has exactly the same line breaking behavior as~~(NNBSP)~~ ~~is a narrow version of~~ NO-BREAK SPACE, ~~which has exactly the same line breaking behavior,~~ but with a narrow~~except for its~~ display width. The MONGOLIAN VOWEL SEPARATOR acts like~~It is regularly used~~ ~~behaves exactly the same~~ ~~in Mongolianits line breaking behavior. It is regularly used~~ ~~in~~ ~~Mongolian in~~ ~~certain grammatical contexts (before~~ a ~~particle), where it also influences the shaping of the glyphs for the particle. In Mongolian text, the~~ NARROW NO-BREAK SPACE~~NNBSP~~ in its line breaking behavior. Both of these characters are regularly used in Mongolian text, where they participate in special shaping behavior, as described in Section 13.5, Mongolian of [Unicode].~~is typically displayed with one third1/3~~ ~~the width of a normal space character.~~

219.0.2/5.1

When NARROW NO-BREAK SPACE occurs in French text, it should be interpreted as an “espace fine insécable”.

219.0.3~~/5.1/5.2~~/12

{12.0.0: 155-A31}

This paragraph was deleted. ~~The MONGOLIAN VOWEL SEPARATOR is equivalent to a NARROW NO-BREAK SPACE~~ ~~NNBSP~~ ~~in its line breaking behavior, but has different effects in controlling the shaping of its preceding and following characters. It constitutes a word-internal space and is typically displayed with half the width of a NARROW NO-BREAK SPACE.~~ ~~NNBSP.~~

219.0.4/15.1

1107F

BRAHMI NUMBER JOINER

219.0.5/15.1

13430..13436

EGYPTIAN HIEROGLYPH VERTICAL JOINER..EGYPTIAN HIEROGLYPH OVERLAY MIDDLE

219.0.6/15.1

13439..1343B

EGYPTIAN HIEROGLYPH INSERT AT MIDDLE..EGYPTIAN HIEROGLYPH INSERT AT BOTTOM

219.0.7/15.1

16FE4

KHITAN SMALL SCRIPT FILLER

219.0.8/15.1

These characters participate in shaping behavior. Together with the characters on either side, they form a ligature, quadrat, or cluster, within which there can be no line break. See Section 14.1, Brahmi, Section 11.4, Egyptian Hieroglyphs, and Section 18.12, Khitan Small Script, respectively, of [Unicode].

219.1/3.2

{3.2.0: 83-AI43, 84-M10, 85-M13; L2/00-156}

034F	COMBINING GRAPHEME JOINER

219.2~~/3.2~~/5.1

{3.2.0: 83-AI43, 84-M10, 85-M13; L2/00-156}

This character has no visible glyph and its presence indicates that adjoining characters are to be treated as a graphemic unit, therefore preventing line breaks between them. The use of grapheme joiner affects other processes, such as sorting; therefore, U+2060 WORD JOINER should be used if the intent is merely to prevent a line break.

220

2007	FIGURE SPACE

221

This is the preferred space to use in numbers. It has the same width as a digit and keeps the number together for the purpose of line breaking.

222/11

{11.0.0: 155-A26; PRI-376#ID20180414084252}

2011	NON-BREAKING HYPHEN ~~(NBHY)~~

223~~/5.1~~/5.2

This is the preferred character to use where words need to~~must~~ be hyphenated but may not be broken at the hyphen. Because of its use as a substitute for ordinary hyphen, the appearance of this ~~use as a substitute for ordinary hyphen, the appearance of this~~ character should match that of U+2010 HYPHEN.

223.1/4.1

0F08	TIBETAN MARK SBRUL SHAD

224

0F0C	TIBETAN MARK DELIMITER TSHEG BSTAR

224.1/4.1

0F12	TIBETAN MARK RGYA GRAM SHAD

225~~/4.1~~/5~~/5.1~~/5.2

The TSHEG BSTAR~~BstARThis~~ looks exactly like a Tibetan tsheg, but can be used to prevent a break~~. It inhibits breaking on either side,~~ like no-break space. It inhibits breaking on either side. For more information, see Section 5.6~~5~~, Tibetan Line Breaking.

225.1~~/4.0.1~~/5

035C~~035D~~..0362

COMBINING DOUBLE BREVE BELOW..COMBINING DOUBLE RIGHTWARDS ARROW BELOW

225.2~~/4.0.1~~/5

These diacritics span two characters, so~~thus~~ no word or line breaks are possible on either side.

225.3/16

{16.0.0: 179-C29, 179-A105; PRI-335#ID20170429224811}

FE20	COMBINING LIGATURE LEFT HALF

225.4/16

{16.0.0: 179-C29, 179-A105; PRI-335#ID20170429224811}

FE22	COMBINING DOUBLE TILDE LEFT HALF

225.5/16

{16.0.0: 179-C29, 179-A105; PRI-335#ID20170429224811}

FE24	COMBINING MACRON LEFT HALF

225.6/16

{16.0.0: 179-C29, 179-A105; PRI-335#ID20170429224811}

FE27	COMBINING LIGATURE LEFT HALF BELOW

225.7/16

{16.0.0: 179-C29, 179-A105; PRI-335#ID20170429224811}

FE29	COMBINING TILDE LEFT HALF BELOW

225.8/16

{16.0.0: 179-C29, 179-A105; PRI-335#ID20170429224811}

FE2B	COMBINING MACRON LEFT HALF BELOW

225.9/16

{16.0.0: 179-C29, 179-A105; PRI-335#ID20170429224811}

FE2E	COMBINING CYRILLIC TITLO LEFT HALF

225.10/16

{16.0.0: 179-C29, 179-A105; PRI-335#ID20170429224811}

FE26	COMBINING CONJOINING MACRON

225.11/16

{16.0.0: 179-C29, 179-A105; PRI-335#ID20170429224811}

FE2D	COMBINING CONJOINING MACRON BELOW

225.12/16

{16.0.0: 179-C29, 179-A105; PRI-335#ID20170429224811}

The left half diacritics are part of a legacy representation of the double diacritics; they occur between the two characters spanned by the double diacritic. Preventing breaks on either side therefore achieves the same line breaking behavior as when using the preferred representation U+035C..U+0362.

225.13/16

{16.0.0: 179-C29, 179-A105; PRI-335#ID20170429224811}

In addition, the conjoining macrons above and below, together with left and right half marks, form marks spanning more than two characters; likewise no line break occurs within such spans.

226~~/4.1~~/5

This paragraph was deleted. ~~Some dictionaries use a character that looks like a vertical series of four dots to indicate places where there is a syllable, but no allowable break. This can be represented bycharacter has not been encoded in Unicode yet, but is an example of~~ ~~a sequence of 205E VERTICAL FOUR DOTS followed by 2060 WORD JOINER.character that should be given the GL property.~~

226.1~~/4.1~~/5/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

H2: ~~—~~ Hangul LV Syllable (B/A)

226.1.1/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB26, LB27

226.2~~/4.1~~/5

This class includes all~~All~~ characters of Hangul Syllable Type LV.

226.3~~/4.1~~/5~~/5.1~~/5.2

Together with conjoining jamos, Hangul syllables form Korean Syllable Blocks, which are kept together; see Unicode Standard Annex #29, “Unicode Text Segmentation” [UAX29~~Boundaries~~]. Korean uses space-based line breaking in many styles of documents. To support these, Hangul syllables and conjoining jamos~~jamo~~ need to be tailored to use class AL. The~~, while the~~ default in this specification is class ID, which supports the~~In that~~ case of Korean documents not using space-based line breaking~~Hangul syllables and conjoining jamo are tailored to use class AL but the default is class ID~~. See Section 8.1, Types of Tailoring. See also JL, JT, JV, and H3.

226.4~~/4.1~~/5/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

H3: ~~—~~ Hangul LVT Syllable (B/A)

226.4.1/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB26, LB27

226.5~~/4.1~~/5

This class includes all~~All~~ characters of Hangul Syllable Type LVT. See also JL, JT, JV, and H2.
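For precomposed Hangul syllables, the split between H2 (Hangul Syllable Type LV) and H3 (type LVT) can be derived arithmetically from the code point. The sketch below assumes the standard arrangement of the Hangul Syllables block: it starts at U+AC00, ends at U+D7A3, and has 28 trailing-consonant slots per LV pair, the first of which means "no trailing consonant". The function name is hypothetical; a production implementation would read the class from LineBreak.txt instead.

```python
S_BASE = 0xAC00   # first precomposed Hangul syllable
S_LAST = 0xD7A3   # last precomposed Hangul syllable
T_COUNT = 28      # trailing-consonant slots per LV pair, including "none"


def hangul_syllable_class(ch):
    """Return 'H2' for an LV syllable, 'H3' for an LVT syllable, else None."""
    cp = ord(ch)
    if not (S_BASE <= cp <= S_LAST):
        return None  # not a precomposed Hangul syllable
    # Offset 0 within each group of 28 is the syllable without a trailing consonant.
    return "H2" if (cp - S_BASE) % T_COUNT == 0 else "H3"


print(hangul_syllable_class("\uAC00"))  # U+AC00, an LV syllable
print(hangul_syllable_class("\uAC01"))  # U+AC01, an LVT syllable
```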

227~~/4.0.1~~/5/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

HY: ~~—-~~ Hyphen (XA)

227.1/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB12a, LB20a, LB21, LB21a, LB25

228

002D	HYPHEN-MINUS

229~~/4.0.1~~/5~~/5.1~~/5.2

Some additional context analysis is required to distinguish usage of this character as a hyphen from its usage~~the use~~ as a minus sign (or indicator of numerical range). If used as hyphen, it acts like U+2010 HYPHEN~~hyphen~~, which has line break class BA.~~HYPHEN.~~

230~~/3.1/4.0.1/4.1~~/5

Note~~NOTE~~: Some typescript conventions use~~In some practice,~~ runs of HYPHEN-MINUS ~~are used~~ to stand in for longer dashes or horizontal rules. If actual character code conversion is not performed and it is desired to treat them like the characters or layout elements they stand for, ~~and actual character code conversion is not performed,~~ line breaking needs~~will need~~ to support these runs~~special cases~~ explicitly.
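A first step toward supporting such runs explicitly is to locate them. The following sketch finds every run of two or more HYPHEN-MINUS characters, so that a line breaker could treat each run as a single unit. What the run then stands for (an em dash, a horizontal rule) is left to the caller, and the function name is hypothetical.

```python
import re

# Two or more consecutive U+002D HYPHEN-MINUS characters.
HYPHEN_RUN = re.compile(r"-{2,}")


def hyphen_minus_runs(text):
    """Return (start, end) index pairs for each run of 2+ HYPHEN-MINUS."""
    return [(m.start(), m.end()) for m in HYPHEN_RUN.finditer(text)]


print(hyphen_minus_runs("a--b---c-d"))
```

The single HYPHEN-MINUS before "d" is not reported, since it is not part of a run.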

231~~/4.0.1~~/5/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

ID: ~~—-~~ Ideographic (B/A)

231.1~~/3.1/4.1~~/5~~/5.2~~/6

This paragraph was deleted. ~~Note: This class also includesNOTE: The actual set of~~ ~~characters~~ in~~name ideographic for~~ ~~this~~ ~~line breaking~~ ~~class includeswas chosen pars pro toto. The actual set of~~ ~~characters~~ ~~in this class includes characters~~ ~~other than Han ideographs.~~

231.2/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB23a

232/4~~/4.0.1/4.1~~/5~~/5.2~~/6

Characters with this property do~~Do~~ not require other characters to provide break opportunities;~~,~~ lines can ordinarily break before and after and between pairs of ideographic characters. Examples of characters with the~~The~~ ID line break class include most assigned characters in the ranges listed below. Note that this class also includes~~consists of~~ ~~the following~~ characters other than Han ideographs.~~::~~

233/3.1

This paragraph was deleted. ~~4E00..9FAF~~

~~CJK UNIFIED IDEOGRAPHS~~

234/3.1

This paragraph was deleted. ~~3400..4DBF~~

~~CJK UNIFIED IDEOGRAPHS EXTENSION A~~

235/3.1

This paragraph was deleted. ~~F900..FAFF~~

~~CJK COMPATIBILITY IDEOGRAPHS~~

235.1~~/3.1~~/4

This paragraph was deleted. ~~1100..115F~~

~~Initial Conjoining Jamos~~

235.2~~/3.1~~/8

2E80..2FFF

CJK, Kangxi Radicals, Ideographic Description Symbols~~KANGXI RADICALS, DESCRIPTION SYMBOLS~~

236/6

This paragraph was deleted. ~~3000~~

~~IDEOGRAPHIC SPACE~~

237/3.1

This paragraph was deleted. ~~AC00..D7AF~~

~~HANGUL SYLLABLES~~

238/3.1

This paragraph was deleted. ~~3130..318F~~

~~HANGUL COMPATIBILITY JAMO~~

239/3.1

This paragraph was deleted. ~~1100..115F~~

~~HANGUL JAM0 (ONLY THE INITIALS)~~

240~~/4.1~~/5

3040..309F

Hiragana~~HIRAGANA~~ (except small characters)

241~~/4.1~~/5/10

30A2..30FA~~30A0..30FF~~

Katakana~~KATAKANA~~ (except small characters)

241.1~~/3.1~~/4

This paragraph was deleted. ~~3130..318F~~

~~HANGUL COMPATIBILITY JAMO~~

241.2~~/3.1/5.1~~/8

3400..4DBF~~4DB54DBF~~

CJK Unified Ideographs Extension~~UNIFIED IDEOGRAPHS EXTENSION~~ A

241.3~~/3.1/5.1~~/6/8

4E00..9FFF~~9FBB9FAF~~

CJK Unified Ideographs~~UNIFIED IDEOGRAPHS~~

241.4~~/3.1/5.1~~/8

F900..FAFF~~FAD9FAFF~~

CJK Compatibility Ideographs~~COMPATIBILITY IDEOGRAPHS~~

241.5~~/3.1~~/4

This paragraph was deleted. ~~AC00..D7AF~~

~~HANGUL SYLLABLES~~

242~~/3.0.1~~/8/10

This paragraph was deleted. ~~A000..A48FA4C8~~

~~Yi SyllablesYI SYLLABLES~~

243~~/3.0.1~~/3.1

This paragraph was deleted. ~~A490-A4CFACFF~~

~~YI RADICALS~~

244/3.1

This paragraph was deleted. ~~2E80.. 2FFF~~

~~CJK, KANGXI RADICALS, DESCRIPTION SYMBOLS~~

244.1~~/3.1~~/8/10

This paragraph was deleted. ~~A490..A4CF~~

~~Yi RadicalsYI RADICALS~~

245/8/10

This paragraph was deleted. ~~FE62..FE66~~

~~SMALL PLUS SIGN..~~ to ~~SMALL EQUALS SIGN~~

246/3.1

This paragraph was deleted. ~~FF10-FF19~~

~~WIDE DIGITS~~

246.1~~/3.1~~/6

This paragraph was deleted. ~~FF10..FF19~~

~~WIDE DIGITS~~

246.2~~/3.1~~/6

This paragraph was deleted. ~~20000..2A6D6~~

~~CJK UNIFIED IDEOGRAPHS EXTENSION B~~

246.3~~/3.1~~/6

This paragraph was deleted. ~~2F800..2FA1D~~

~~CJK COMPATIBILITY IDEOGRAPHS SUPPLEMENT~~

247~~/4.1~~/5/6

This paragraph was deleted. ~~It also includesplus~~ ~~all of the FULLWIDTH LATIN letters and all of the blocks in the range 3000..~~-~~33FF~~ ~~blocks~~ ~~not covered elsewhere.~~

247.1/6/8/10

This paragraph was deleted. ~~FF01..FF5A~~

~~Fullwidth Latin letters and digitsFULL WIDTH LATIN LETTERS and DIGITS~~

247.2/6/16

See the data file LineBreak.txt [Data14] or the data file DerivedLineBreak.txt [Data14Derived] for the complete list of characters with the ID line break class.

248~~/3.1/3.2/4.1~~/5

{3.2.0: 81-M6, 85-M7; L2/00-258}

Note~~NOTE~~: Use~~use~~ U+2060 WORD JOINER~~ZWNBSP~~ as a manual override to prevent break opportunities around characters of class ID.~~ideographs.~~

248.0.1~~/4.0.1~~/6

This paragraph was deleted. ~~U+3000 IDEOGRAPHIC SPACE may be subject to expansion or compression during line justification.~~

248.0.2/5.2

Unassigned code points in blocks or regions of the Unicode codespace that have been reserved for CJK scripts are also assigned this line break class. These assignments anticipate that future characters assigned in these ranges will have the class ID. Once a character is assigned to one of these code points, the property value could change.

248.0.3~~/5.2/10~~/16

This paragraph was deleted. ~~The unassigned code points in the followingCJK~~ ~~blocks~~ ~~and regions in which unassigned characters~~ ~~default to~~ ~~line break class~~ ~~ID:~~ ~~are:~~

248.0.4~~/5.2~~/8/16

This paragraph was deleted. ~~3400..4DBF~~

~~CJK Unified Ideographs Extension A~~

248.0.5~~/5.2~~/8/16

This paragraph was deleted. ~~4E00..9FFF~~

~~CJK Unified Ideographs~~

248.0.6~~/5.2~~/8/16

This paragraph was deleted. ~~F900..FAFF~~

~~CJK Compatibility Ideographs~~

248.0.7~~/5.2~~/8/10

This paragraph was deleted. ~~20000..2A6DF~~

~~CJK Unified Ideographs Extension B~~

248.0.8~~/5.2~~/8/10

This paragraph was deleted. ~~2A700..2B73F~~

~~CJK Unified Ideographs Extension C~~

248.0.8.1/6/8/10

This paragraph was deleted. ~~2B740..2B81F~~

~~CJK Unified Ideographs Extension D~~

248.0.8.2/8/10

This paragraph was deleted. ~~2B820..2CEAF~~

~~CJK Unified Ideographs Extension E~~

248.0.9~~/5.2~~/8/10

This paragraph was deleted. ~~2F800..2FA1F~~

~~CJK Compatibility Ideographs Supplement~~

248.0.9.1~~/10~~/16

For example, all of the~~All~~ undesignated code points in Planes 2 (20000..2FFFD) and 3 (30000..3FFFD)~~, whether inside or outside of allocated blocks,~~ default to ID. See the data file DerivedLineBreak.txt for the complete list of code point ranges which default to the ID line break class.~~:~~
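The Plane 2 and Plane 3 default can be expressed as a simple range test. The sketch below covers only the two ranges named in the example above. It is meant as the fallback for code points with no explicit entry in the data file, and the function name is hypothetical.

```python
def defaults_to_id_in_planes_2_3(cp):
    """True if code point cp lies in 20000..2FFFD or 30000..3FFFD."""
    return 0x20000 <= cp <= 0x2FFFD or 0x30000 <= cp <= 0x3FFFD


print(defaults_to_id_in_planes_2_3(0x2FFFD))  # last code point of the Plane 2 range
print(defaults_to_id_in_planes_2_3(0x2FFFE))  # noncharacter, outside the range
```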

248.0.10~~/5.2~~/8~~/10~~/16

This paragraph was deleted. ~~20000..2FFFD~~

~~SIP (Plane 2) outside of blocks~~

248.0.11~~/5.2~~/8~~/10~~/16

This paragraph was deleted. ~~30000..3FFFD~~

~~TIP (Plane 3) outside of blocks~~

248.0.12~~/10~~/16

{10.0.0: 147-C25} {16.0.0: 177-A115, 177-C47, 162-A67; L2/23-234}

This paragraph was deleted. ~~All unassigned code points in the following Plane 1 range, whether inside or outside of allocated blocks, also default to ID:~~

248.0.13~~/10~~/16

{10.0.0: 147-C25} {16.0.0: 177-A115, 177-C47, 162-A67; L2/23-234}

This paragraph was deleted. ~~1F000..1FFFD~~

~~Plane 1 range~~

248.2/4~~/4.0.1~~/4.1

This paragraph was deleted. ~~Conjoining Jamos form Korean Syllable Blocks which are kept together, see [Boundaries]. Korean uses space-~~ ~~based line breaking in many styles of documents. In that case Hangul Syllables and Conjoining Jamo are tailored to use class AL but the default is class ID.~~

248.2.1~~/4.1~~/5/5.2

Korean is encoded with conjoining jamos, Hangul syllables, or both. See also JL, JT, JV, H2, and H3. The following set of compatibility jamo~~, Hangul syllables, or both. See also JL, JT, JV, H2, and H3. The following set of compatibility jamo~~ is~~are~~ treated as ID by default.

248.3/4/4.1

This paragraph was deleted. ~~1100..11FF~~

~~Conjoining Jamos~~

248.3.1/4.0.1

3130..318F

HANGUL COMPATIBILITY JAMO

248.4/4/4.1

This paragraph was deleted. ~~AC00..D7AF~~

~~HANGUL SYLLABLES~~

248.5/4/4.0.1

This paragraph was deleted. ~~3130..318F~~

~~HANGUL COMPATIBILITY JAMO~~

248.5.1/6.2

Symbols

248.5.2/6.2

Certain pictographic symbols of General Category So are also included in this line break class.

248.6~~/6.1~~/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

HL: Hebrew Letter (XB)

248.6.1/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB21a, LB21b, LB23, LB24, LB28, LB29, LB30

248.7/6.1

This class includes all Hebrew letters.

248.8~~/6.1~~/16

{16.0.0: 179-C25, 179-A98; PRI-335#ID20170429231648} {16.0.0: 137-C9}

When a Hebrew letter is separated from following non-Hebrew text~~followed~~ by a hyphen, there is no break on either side of the hyphen. In this context a hyphen is any character of class HY or class BA. There is also no break between a solidus and a Hebrew letter. In other respects, Hebrew letters behave the same as characters of class AL.

248.9/6.1

Included in this class are all characters of General Category Letter that have Script=Hebrew.

249~~/4.0.1~~/5/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

IN: ~~—-~~ Inseparable Characters~~characters~~ (XP)

249.1/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB22

251~~/4.0.1~~/5.2

These characters are intended to be used consecutively~~in consecutive sequence~~. There is never a~~They therefore prevent~~ line break between~~breaks absolutely in a series of~~ two characters~~character~~ of this class.

251.1/5.2

Examples include:

252

2024	ONE DOT LEADER

253

2025	TWO DOT LEADER

254

2026	HORIZONTAL ELLIPSIS

254.1/4.1

FE19	PRESENTATION FORM FOR VERTICAL HORIZONTAL ELLIPSIS

255~~/3.1~~/8

HORIZONTAL ELLIPSIS~~Horizontal ellipsis~~ can be used as a three-dot leader.

256~~/4.0.1~~/5~~/5.2~~/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

IS: ~~—-~~ Infix Numeric Separator ~~(Infix)~~ (XB)

257~~/3.1~~/4

This paragraph was deleted. ~~Characters that usually occur inside a numerical expression~~, ~~may not be separated from following numeric characters, unless space character intervenes. Since they are otherwise sentence ending punctuation, they prevent breaks before.~~

257.1/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB15b, LB15c, LB15d, LB25, LB29

258/4

This paragraph was deleted. ~~There is no break in “100.00” or “10,000”, nor in “12:59”~~

258.1/5

Characters that usually occur inside a numerical expression may not be separated from the numeric characters that follow, unless a space character intervenes. For example, there is no break in “100.00” or “10,000”, nor in “12:59”.

258.2/5.2

Examples include:

262.1/4.0.1

037E	GREEK QUESTION MARK (canonically equivalent to 003B)

263/3.1

0589

ARMENIAN FULL STOP

263.-1.1/5.1

060C	ARABIC COMMA

263.0.1/4.0.1

060D	ARABIC DATE SEPARATOR

263.0.2/5

07F8

NKO COMMA

263.1/4

2044	FRACTION SLASH
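The canonical equivalence noted for U+037E GREEK QUESTION MARK in the list above can be observed with Python's standard `unicodedata` module: normalizing to NFC or NFD replaces U+037E with U+003B SEMICOLON, so normalized text only ever contains the latter. This is a small illustration of that one entry, not a classification of IS characters.

```python
import unicodedata

greek_question_mark = "\u037E"
semicolon = "\u003B"

# U+037E has a singleton canonical decomposition to U+003B,
# so both normalization forms map it to SEMICOLON.
nfc = unicodedata.normalize("NFC", greek_question_mark)
nfd = unicodedata.normalize("NFD", greek_question_mark)

print(nfc == semicolon, nfd == semicolon)
```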

263.1.1~~/4.1~~/16

{16.0.0: 179-C30, 179-A107}

This paragraph was deleted. ~~FE10~~

~~PRESENTATION FORM FOR VERTICAL COMMA~~

263.1.2~~/4.1~~/16

{16.0.0: 179-C30, 179-A107}

This paragraph was deleted. ~~FE13~~

~~PRESENTATION FORM FOR VERTICAL COLON~~

263.1.3~~/4.1~~/16

{16.0.0: 179-C30, 179-A107}

This paragraph was deleted. ~~FE14~~

~~PRESENTATION FORM FOR VERTICAL SEMICOLON~~

263.2/4~~/4.0.1~~/5

This paragraph was deleted. ~~Characters that usually occur inside a numerical expression may not be separated from the numeric characters that following numeric characters, unless a space character intervenes. For example, there is no break in “100.00” or “10,000”, nor in “12:59”.Since they are otherwise sentence ending punctuation, they prevent breaks before.~~

263.3/4~~/4.0.1/4.1~~/5

When not used in a numeric context, infix~~Infix~~ separators are sentence-ending punctuation ~~when not used in a numeric context~~. Therefore~~Since~~ they always~~are otherwise sentence ending punctuation, they~~ prevent breaks before.~~There is no break in “100.00” or “10,000”, nor in “12:59”~~

263.3.1~~/5.1~~/5.2

Note: FIGURE SPACE~~Figure Space~~, not being a punctuation mark, has been given the line break class GL.

263.4~~/4.1~~/5/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

JL: ~~—~~ Hangul L Jamo (B)

263.4.1/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB26, LB27

263.5/4.1

The JL line break class consists of all characters of Hangul Syllable Type L.

263.6~~/4.1~~/5~~/5.1~~/5.2

Conjoining jamos~~Jamos~~ form Korean Syllable Blocks, which are kept together; see Unicode Standard Annex #29, “Unicode Text Segmentation” [UAX29~~Boundaries~~]. Korean uses space-based line breaking in many styles of documents. To support these, Hangul syllables and conjoining jamos~~jamo~~ need to be tailored to use class AL. The~~, while the~~ default in this specification is class ID, which supports the~~In that~~ case of Korean documents not using space-based line breaking~~Hangul Syllables and Conjoining Jamo are tailored to use class AL but the default is class ID~~. See Section 8.1, Types of Tailoring. See also JT, JV, H2, and H3.

263.7~~/4.1~~/5/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

JT: ~~—~~ Hangul T Jamo (A)

263.7.1/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB26, LB27

263.8~~/4.1~~/5

The JT line break class consists of all characters of Hangul Syllable Type T. See also JL, JV, H2, and H3.

263.9~~/4.1~~/5/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

JV: ~~—~~ Hangul V Jamo (XA/XB)

263.9.1/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB26, LB27

263.10~~/4.1~~/5

The JV line break class consists of all characters of Hangul Syllable Type V. See also JL, JT, H2, and H3.

264~~/4.0.1~~/5/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LF: ~~—-~~ Line Feed (A) ~~—-~~ (Non-tailorable~~normative~~)

264.1/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB5, LB6, LB9, LB15a, LB15b, LB20a

265

000A	LINE FEED (LF)

266/4

There is a mandatory break after any LF character, but see the discussion under BK.

266.1/4~~/4.0.1~~/5/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

NL: ~~—-~~ Next Line (A) ~~—-~~ (Non-tailorable~~normative~~)

266.1.1/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB5, LB6, LB9, LB15a, LB15b, LB20a

266.2/4

0085	NEXT LINE (NEL)

266.3/4/5

The NL class acts like BK in all respects (there~~There~~ is a mandatory break after any NEL character). It cannot be tailored, but implementations are not required to support the NEL character; see the discussion under BK.
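The mandatory breaks after LF and NEL can be located with a single pass over the text. The sketch below is deliberately narrow: it handles only U+000A and U+0085 and ignores the other hard line breaks covered in the discussion under BK (including the treatment of CR followed by LF). The function name is hypothetical.

```python
LF = "\u000A"
NEL = "\u0085"


def mandatory_break_positions(text):
    """Return indices i such that a mandatory break falls just before text[i].

    A position equal to len(text) means the break is at the end of the text.
    """
    return [i + 1 for i, ch in enumerate(text) if ch in (LF, NEL)]


print(mandatory_break_positions("ab\ncd\u0085e"))
```

An implementation that chooses not to support NEL, as the annex permits, would simply drop `NEL` from the test.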

267~~/4.0.1~~/5/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

NS: Nonstarters ~~—- Non-starters~~ (XB)

267.1/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB1, LB16, LB21

268~~/4.1~~/5~~/5.1/5.2~~/6.1

Nonstarter~~Non-starterSome~~ characters cannot start a line, but unlike CL they may allow a break in some contexts~~context~~ when they ~~are~~ follow~~ing~~ one or more space characters. Nonstarters~~Non-starters~~ include: ~~all Hiragana, Katakana, and Halfwidth Katakana “small” characters, plus many others, including:~~

269/4/4.0.1

This paragraph was deleted. ~~All characters with General Category Lm (Letter, Modifier) and East Asian Width type W or H (such as KATAKANA-HIRAGANA PROLONGED SOUND MARK or the katakana iteration marks)~~, ~~and all characters with General Category Sk (Symbol, Modifier) and East Asian width type W plus the following characters:~~

270/5.1

This paragraph was deleted. ~~0E5A..0E5B~~

~~THAI CHARACTER ANGKHANKHU..THAI CHARACTER KHOMUT~~

271/5.1

This paragraph was deleted. ~~17D4~~

~~KHMER SIGN KHAN~~

272/5

17D6~~..17DA~~

KHMER SIGN CAMNUC PII KUUH~~..KHMER SIGN KOOMUUT~~

273

203C	DOUBLE EXCLAMATION MARK

273.1/5

203D	INTERROBANG

273.2/5

2047	DOUBLE QUESTION MARK

273.3/5

2048	QUESTION EXCLAMATION MARK

273.4/5

2049	EXCLAMATION QUESTION MARK

274/4

This paragraph was deleted. ~~2044~~

~~FRACTION SLASH~~

274.1/3.1

3005	IDEOGRAPHIC ITERATION MARK

275.0.1/4

303C

MASU MARK

275.0.2/4.0.1

303B	VERTICAL IDEOGRAPHIC ITERATION MARK

275.1~~/3.1~~/5

309B..309E

KATAKANA-HIRAGANA VOICED SOUND MARK..~~ to ~~HIRAGANA VOICED ITERATION MARK

275.2/4

30A0	KATAKANA-HIRAGANA DOUBLE HYPHEN

276~~/4.0.1~~/6.1

30FB~~..30FE~~

KATAKANA MIDDLE DOT~~..KATAKANA VOICED ITERATION MARK~~

277/3.1

This paragraph was deleted. ~~3005~~

~~IDEOGRAPHIC ITERATION MARK~~

278/3.1

This paragraph was deleted. ~~309B.. 309E~~

~~KATAKANA-HIRAGANA VOICED SOUND MARK to HIRAGANA VOICED ITERATION MARK~~

279/4

This paragraph was deleted. ~~30FD~~

~~KATAKANA ITERATION MARK~~

279.1~~/4.1/5.1~~/5.2

This paragraph was deleted. ~~A015~~

~~YI SYLLABLE WU (misnomer for YI SYLLABLE ITERATION MARK)~~

279.2/6.1

30FD..30FE

KATAKANA ITERATION MARK..KATAKANA VOICED ITERATION MARK

279.3/16

{16.0.0: 179-C30, 179-A107}

FE10	PRESENTATION FORM FOR VERTICAL COMMA

279.4/16

{16.0.0: 179-C30, 179-A107}

FE13	PRESENTATION FORM FOR VERTICAL COLON

280

FE54..FE55

SMALL SEMICOLON..SMALL COLON

281/3.1

FF1A..FF1B

FULLWIDTH COLON..FULLWIDTH SEMICOLON

282

FF65	HALFWIDTH KATAKANA MIDDLE DOT

283/10

This paragraph was deleted. ~~FF70~~

~~HALFWIDTH KATAKANA-HIRAGANA PROLONGED SOUND MARK~~

283.1~~/3.1~~/4.0.1

FF9E..FF9F

HALFWIDTH KATAKANA VOICED SOUND MARK..~~ - ~~HALFWIDTH KATAKANA SEMI-VOICED SOUND MARK

284~~/4.1~~/5/5.2

This paragraph was deleted. ~~plusPlus~~ ~~all Hiragana, Katakana, and Halfwidth Katakana “small” characters.~~

285~~/3.1/4.0.1/4.1~~/5~~/5.1~~/6.1

Note~~NOTENote~~: Optionally, the NS restriction may be relaxed by tailoring, with~~and~~ some or all characters treated like ID, to achieve a more permissive style of line breaking, especially~~particular~~ in some East Asian document styles. Alternatively, line breaking can be tightened by moving characters that are ID into NS.~~contexts.~~

285.1/5.1

For additional information about U+30A0 KATAKANA-HIRAGANA DOUBLE HYPHEN, see Section 5.5, Use of Double Hyphen.

286~~/4.0.1~~/5/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

NU: ~~—-~~ Numeric (XP)

286.1/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB15c, LB23, LB25, LB30

287~~/4.1~~/5.1

These characters behave~~Behave~~ like ordinary characters (AL) in the context of most~~ordinary~~ characters~~,~~ but activate the prefix and postfix behavior of prefix and postfix characters.

288~~/4.0.1/4.1~~/5/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

Numeric characters consist of decimal digits~~DECIMAL DIGITS~~ (all~~All~~ characters of General_Category~~General Category~~ Nd), except: ~~those with East_Asian_Width F (FullwidthFULL WIDTH), plus these characters:~~

288.0.1/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

1. those with East_Asian_Width F (Fullwidth)

288.0.2/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

2. those from scripts that use the Brahmic style of context analysis

288.0.3/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

plus these characters:

288.1/4.0.1

066B	ARABIC DECIMAL SEPARATOR

288.2/4.0.1

066C	ARABIC THOUSANDS SEPARATOR

288.3~~/4.0.1/4.1~~/5/5.1

Unlike ~~with~~ IS characters, the Arabic numeric punctuation does not occur as sentence terminal punctuation outside numbers.
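The definition of NU given above can be approximated with Python's standard `unicodedata` module. The sketch applies General_Category Nd, excludes East_Asian_Width F, and adds the two Arabic separators. It does not implement exception 2 (digits from scripts that use the Brahmic style of context analysis), because `unicodedata` exposes no property for that. Results also depend on the Unicode version bundled with the Python interpreter. The function name is hypothetical.

```python
import unicodedata

# U+066B ARABIC DECIMAL SEPARATOR and U+066C ARABIC THOUSANDS SEPARATOR.
EXTRA_NU = {"\u066B", "\u066C"}


def is_numeric_class_approx(ch):
    """Approximate test for line break class NU (ignores the Brahmic exception)."""
    if ch in EXTRA_NU:
        return True
    return (
        unicodedata.category(ch) == "Nd"
        and unicodedata.east_asian_width(ch) != "F"
    )


print(is_numeric_class_approx("5"))        # ASCII digit
print(is_numeric_class_approx("\uFF15"))   # FULLWIDTH DIGIT FIVE, excluded
```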

289~~/4.0.1~~/5~~/5.2~~/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

OP: ~~—-~~ Opening Punctuation (XA)

289.1/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB14, LB15a, LB25, LB30

290~~/4.1~~/5~~/5.1~~/5.2

The opening character of any set of paired punctuation should~~must~~ be kept with the ~~following~~ character that follows. This is desirable, even if~~when~~ there are intervening space characters, ~~so~~ as it prevents~~to prevent~~ the appearance of a bare opening punctuation mark at the end of a line. The OP line break class consists of all characters of General_Category Ps in the Unicode Character Database, plus:

291~~/3.1/3.2/4.1~~/5

This paragraph was deleted. ~~The OP line break class consists of allAll~~ ~~charactersCharacters~~ ~~of General Categorygeneral category~~ ~~Ps in the Unicode Character Database.~~

291.1/5.1

00A1	INVERTED EXCLAMATION MARK

291.2/5.1

00BF	INVERTED QUESTION MARK

291.3/5.1

2E18	INVERTED INTERROBANG

291.4~~/5.1~~/5.2

Note: The first two of these characters used to be in the class~~classed~~ AI based on their East_Asian_Width assignment of A. Such characters are normally resolved to either ID or AL. However, the characters listed above are used as punctuation marks in Spanish, where they would behave more like a character of class OP.
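The composition of the OP class described above (General_Category Ps plus the three inverted marks) can be sketched with the standard `unicodedata` module. This is an approximation for illustration, with a hypothetical function name. The authoritative assignment is the one in LineBreak.txt, and results depend on the Unicode version bundled with the Python interpreter.

```python
import unicodedata

# U+00A1, U+00BF and U+2E18: the inverted marks listed above.
EXTRA_OP = {"\u00A1", "\u00BF", "\u2E18"}


def is_opening_punctuation_approx(ch):
    """Approximate test for line break class OP."""
    return unicodedata.category(ch) == "Ps" or ch in EXTRA_OP


print(is_opening_punctuation_approx("("))       # General_Category Ps
print(is_opening_punctuation_approx("\u00BF"))  # INVERTED QUESTION MARK
print(is_opening_punctuation_approx(")"))       # closing punctuation, not OP
```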

292~~/4.0.1~~/5~~/5.2~~/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

PO: ~~—-~~ Postfix (Numeric) (XB)

292.1/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB23a, LB24, LB25, LB27

293~~/4.1~~/5~~/5.1~~/16

{16.0.0: 179-A97; PRI-446#ID20220405071453}

Characters that usually follow a numerical expression may not be separated from preceding numeric characters or preceding closing characters~~, even if one or more space characters intervene~~. For example, there is no break opportunity in “(12.00) %”.

294/4.1

This paragraph was deleted. ~~For example, there is no break in “(12.00) %”~~

294.1/5

Some of these characters—in particular, degree sign and percent sign—can appear on both sides of a numeric expression. Therefore the line breaking algorithm by default does not break between PO and numbers or letters on either side.

295/5~~/5.2~~/6

Examples of Postfix~~The list of postfixpost-fix~~ characters include~~is~~:

296

0025	PERCENT SIGN

298

00B0	DEGREE SIGN

298.1/4.1

060B	AFGHANI SIGN

298.2/5.1

066A	ARABIC PERCENT SIGN

299~~/4.1~~/5.1

2030~~203002030~~	PER MILLE SIGN

300

2031	PER TEN THOUSAND SIGN

301/3.1

2032..2037~~2035~~	PRIME..REVERSED TRIPLE PRIME

302

20A7	PESETA SIGN

303

2103	DEGREE CELSIUS

304

2109	DEGREE FAHRENHEIT

305/4.0.1

FDFC~~2126~~	RIAL~~OHM~~ SIGN

306

FE6A	SMALL PERCENT SIGN

307

FF05	FULLWIDTH PERCENT SIGN

308

FFE0	FULLWIDTH CENT SIGN

308.1~~/4.0.1~~/5

Alphabetic characters are also widely used as unit designators in a postfix~~post-fix~~ position. For purposes of line breaking, their classification as alphabetic is sufficient to keep them together with the preceding number.

309~~/4.0.1~~/5~~/5.2~~/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

PR: ~~—-~~ Prefix (Numeric) (XA)

309.1/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB23a, LB24, LB25, LB27

310~~/3.1~~/4~~/4.1~~/5~~/5.1~~/16

{16.0.0: 179-A97; PRI-446#ID20220405071453}

Characters that usually precede a numerical expression may not be separated from following numeric characters or following opening characters~~, evenEVEN~~ ~~if a space character intervenes~~. For example, there is no break opportunity in “$ (100.00)”.

311/4.1

This paragraph was deleted. ~~There is no break in “$ (100.00)”~~

311.1/5

Many currency signs can appear on both sides, or even the middle, of a numeric expression. Therefore the line breaking algorithm, by default, does not break between PR and numbers or letters on either side.

312~~/4.1~~/5~~/5.2~~/16

{16.0.0: 133-C26}

All~~The PR line break class consists of allAll~~ currency symbols (General_Category~~General Category~~ Sc) except those~~as listed explicitly~~ in class PO have been assigned line breaking class PR. This class also contains all unassigned code points in the Currency Symbols block, and additional characters, including:~~as well asand~~ ~~the following:~~

313/3.1

002B	PLUS SIGN

314

005C	REVERSE SOLIDUS

315/11

{11.0.0: 155-A26; PRI-376#ID20180414084252}

00B1	PLUS-MINUS SIGN

315.1/3.1

2116	NUMERO SIGN

316

2212	MINUS SIGN

317/3.1

This paragraph was deleted. ~~2116~~

~~NUMERO SIGN~~

318/3.1

This paragraph was deleted. ~~2213~~

~~MINUS-PLUS~~

318.1~~/3.1~~/11

{11.0.0: 155-A26; PRI-376#ID20180414084252}

2213	MINUS-OR-PLUS SIGN

318.2/4~~/4.0.1/4.1~~/5

Note~~NOTENote~~: Many currency symbols may be used either as prefix or as postfix, depending on local convention. For details on the conventions~~When~~ used, see [CLDR]. ~~in that way, these currency symbols should be treated as if they havehad~~ ~~line breaking class PO.~~

319~~/4.0.1~~/5~~/5.2~~/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

QU: ~~—-~~ Ambiguous Quotation (XB/XA)

319.1/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB15a, LB15b, LB19, LB19a

320~~/5.1~~/16

{16.0.0: 175-C23, 175-A71; L2/23-063} {16.0.0: 179-C28, 179-A102}

Some quotation~~paired~~ characters can be ~~either~~ opening or closing, or even both, depending on usage. The default is to use the General_Category values Initial_Punctuation and Final_Punctuation as a hint, together with context, but to err on the side of treating them as both opening and closing, thus preventing breaks on either side. This will prevent some breaks that might have been legal for a particular language or usage, such as outside~~between~~ a Simplified Chinese quotation of Latin text, or before~~closing quote and~~ a German quotation of text starting with a full stop.~~following opening punctuation.~~

321~~/3.1~~/4~~/4.1~~/5~~/5.1/5.2~~/6

Note~~NOTENote~~: If language information is available, it can be used to determine which character is used as the opening quote and which as the closing quote. (See the information in Section 6.2, General Punctuation, in [Unicode~~Unicode5.20the~~ ~~Unicode~~]. In such a case, the quotation marks could be tailored to either OP or CL depending on their actual usage. ~~Section~~ ~~Standard, Version 3.0, Chapter~~ ~~6.2, General Punctuation.~~ ~~[U3.0]~~)

321.a/5.1

{5.1.0: }

Discussion: Class QU (for ambiguous quotation marks) is a Unicode innovation compared to the ancestor standard JIS X 4051, necessitated by the variety of quotation mark styles across languages; see The Unicode Standard, Chapter 6.

321.b/5.1

{5.1.0: }

Some rules pertaining to class QU in the algorithm may be expressed as heuristics for its resolution into OP and CL, as follows, where treating a quotation mark as both OP and CL means disallowing breaks according to both interpretations:

321.c/5.1

{5.1.0: }

Treat QU as OP in QU [^SP]. (LB19)

321.c.1/16

{16.0.0: 179-C28, 179-A102}

Except for \p{Pf} in East Asian context. (LB19a)

321.d/5.1

{5.1.0: }

Treat QU as CL in [^SP] QU. (LB19)

321.d.1/16

{16.0.0: 179-C28, 179-A102}

Except for \p{Pi} in East Asian context. (LB19a)

321.e/15.1

{15.1.0: 175-C23, 175-A71; L2/23-063}

Treat Initial Punctuation (gc=Pi) as OP at the beginning of a line, at the beginning of a parenthetical or quotation, or after spaces. (LB15a)

321.f/15.1

{15.1.0: 175-C23, 175-A71; L2/23-063}

Treat Final Punctuation (gc=Pf) as CL at the end of a line, before a prohibited break (including at the end of a parenthetical or quotation, as well as before trailing punctuation), or before spaces. (LB15b)

322~~/3.1/4.1~~/5/5.2

The QU line break class consists of characters~~Characters~~ of General_Category~~General Categorygeneral category~~ Pf or Pi in the Unicode Character Database and additional characters, including:~~as well as:~~

323

0022	QUOTATION MARK

324

0027	APOSTROPHE

324.1~~/3.2~~/5

{3.2.0: 83-M11; L2/00-119}

This paragraph was deleted. ~~23B6~~

~~BOTTOM SQUARE BRACKET OVER TOP SQUARE BRACKET~~

324.2/3.2

275B	HEAVY SINGLE TURNED COMMA QUOTATION MARK ORNAMENT

324.3/3.2

275C	HEAVY SINGLE COMMA QUOTATION MARK ORNAMENT

324.4/3.2

275D	HEAVY DOUBLE TURNED COMMA QUOTATION MARK ORNAMENT

324.5/3.2

275E	HEAVY DOUBLE COMMA QUOTATION MARK ORNAMENT

324.5.1/5.1

2E00..2E01	RIGHT ANGLE SUBSTITUTION MARKER..RIGHT ANGLE DOTTED SUBSTITUTION MARKER

324.5.2/5.1

2E06..2E08	RAISED INTERPOLATION MARKER..DOTTED TRANSPOSITION MARKER

324.5.3/5.1

2E0B	RAISED SQUARE

324.6~~/3.2~~/4/5/15

{3.2.0: 83-M11; L2/00-119} {15.0.0: 172-A98; L2/22-124; PRI-446#ID20220603102213}

This paragraph was deleted. ~~Note:~~ ~~U+23B6 BOTTOM SQUARE BRACKET OVER TOP SQUARE BRACKET is subtly different from the others in this class, in that it is both an opening and a closing punctuation character at the same time. However,~~ ~~since~~ ~~its use is limited to certain vertical text modes in terminal emulation. Instead of creating a one-of-a-kind class for this rarely used character, assigning it to the QU class approximates the intended behavior.~~

324.7~~/6.2~~/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

RI: Regional Indicator (B/A/XP)

324.7.1/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB30a

324.8~~/6.2~~/10

For line breaking, the~~The~~ Regional Indicator characters are all those with the Unicode character property of Regional_Indicator. This includes:

324.9/6.2

1F1E6..1F1FF	REGIONAL INDICATOR SYMBOL LETTER A..REGIONAL INDICATOR SYMBOL LETTER Z

324.10~~/6.2~~/9

Pairs of RI characters are used to represent a two-letter ISO 3166 region code. ~~No break opportunity occurs between adjacent RI characters, otherwise breaks can occur before and after.~~

324.11~~/6.2~~/9/12

{12.0.0: 173-A128; PRI-383#ID20190106230734}

Runs of adjacent RI characters are grouped into pairs, beginning~~beginnning~~ at the start of the run. No break opportunity occurs within~~To provide~~ a pair; breaks can occur~~break~~ between adjacent pairs. When RI characters are adjacent to characters of other classes, breaks can occur before and after, except where forbidden by other rules.~~insertion of a U+200B ZERO WIDTH SPACE is recommended.~~
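The pairing of RI characters described above can be sketched as follows. This is only an illustration: the function name `ri_break_allowed` and the representation of the input as a list of class-name strings are assumptions made for this example, and the function checks nothing but the RI pairing constraint, so every other rule of the algorithm would still have to be applied separately.

```python
def ri_break_allowed(classes, i):
    """Return False if the boundary before classes[i] falls inside an RI pair.

    classes: list of line break class names, one per code point (hypothetical
             input format chosen for this sketch).
    i:       index of the character that would start the next line.
    Only the RI pairing constraint is checked; other rules are ignored.
    """
    if classes[i - 1] != "RI" or classes[i] != "RI":
        return True
    # Count the RI characters in the run that ends at index i - 1.
    run = 0
    j = i - 1
    while j >= 0 and classes[j] == "RI":
        run += 1
        j -= 1
    # An odd count means classes[i - 1] opens a pair that classes[i] closes.
    return run % 2 == 0


classes = ["AL", "RI", "RI", "RI", "RI", "AL"]
allowed = [ri_break_allowed(classes, i) for i in range(1, len(classes))]
```

In the example, the four RI characters form two pairs, so the only boundary inside the run that remains available is the one between the second and third RI character.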

325~~/4.0.1~~/5/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

SA: ~~—-~~ Complex-Context~~context~~ Dependent Characters (South East Asian) (P)

325.1/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB1

326/5~~/5.1~~/5.2

Runs of these characters require morphological analysis to determine break opportunities. This is similar to, for example, ~~e.g.~~ a hyphenation algorithm. For the characters that have this property, no break opportunities~~line breaks~~ will be found otherwise. Therefore~~, therefore~~ complex context analysis, often involving dictionary lookup of some form, is required to determine non-emergency line breaks. If such analysis is not available, it is recommended to treat them as AL. ~~is mandatory.~~

326.1/5~~/5.1~~/5.2

This paragraph was deleted. ~~If such analysis is not available, it is recommended to treat themthey should be treated~~ ~~as AL.~~

327~~/3.1~~/4/5/5.2

Note~~NOTENote~~: These characters can be mapped into their equivalent line breaking classes by using~~as the result of~~ dictionary lookup, thus permitting a logical separation of this algorithm from the morphological analysis.

328/5

This paragraph was deleted. ~~If dictionary lookup is not available they should be treated as XX.~~

329~~/4.0.1~~/5/5.2

The class SA consists of all~~All~~ characters of General_Category~~General Category~~ Cf, Lo, Lm, Mn, or Mc~~Lm~~ in the following~~these~~ blocks that are not members of another line break class.~~ranges, except as noted elsewhere:~~

330/5

0E00..0E7F~~0EFF~~	Thai~~THAI / LAO~~

330.1/5

0E80..0EFF	Lao

331~~/3.0.1~~/5

1000..109F~~1100..11FF~~	Myanmar~~MYANMAR~~

332/5

1780..17FF	Khmer~~KHMER~~

332.1/5

1950..197F	Tai Le

332.2/5

1980..19DF	New Tai Lue

332.3/5.2

1A20..1AAF	Tai Tham

332.3.1/8

A9E0..A9FF	Myanmar Extended-B

332.4/5.2

AA60..AA7F	Myanmar Extended-A

332.5/5.2

AA80..AADF	Tai Viet

332.6/8

11700..1173F	Ahom

333~~/4.0.1~~/5~~/5.2~~/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

SG: ~~—-~~ Surrogate~~Surrogates~~ (XP) ~~—-~~ (Non-tailorable~~normative~~)

333.1/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB1

334/4~~/4.0.1/4.1~~/5/5.1

Line break class SG comprises all~~All~~ code points~~characters~~ with General_Category~~General Category~~ Cs. The line~~There is no~~ breaking behavior of isolated surrogates is undefined. In UTF-16, paired surrogates represent non-BMP code points. Such code points must be resolved before assigning line break properties. In UTF-8 and UTF-32 surrogate code points represent corrupted data and their line break behavior is undefined.~~between a high surrogate and a low surrogate.~~

335~~/3.1~~/4/5/5.1

Note~~NOTENote~~: The use of this line breaking class is deprecated. It was of limited usefulness for UTF-16 implementations that did~~do~~ ~~are~~ not support~~ing~~ characters beyond the BMP. The correct implementation is to resolve a pair of surrogates into a supplementary~~depends on the~~ character before line breaking. ~~A useful default is to treat characters in the range 0x00010000 to 0x0001FFFD as AL and characters in the range 0x00020000 to 0x0002FFFD as ID, until the implementation can be revised to take into account the actual line breaking properties for these characters, many of which have yet to be assigned.~~
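Resolving surrogate pairs before assigning line break classes can be sketched as below. The function name `decode_utf16_units` and the choice to pass isolated surrogates through unchanged are assumptions of this example; the arithmetic for combining a pair is the standard UTF-16 decoding formula.

```python
def decode_utf16_units(units):
    """Turn a sequence of UTF-16 code units into a list of code points.

    Well-formed surrogate pairs are combined into one supplementary code
    point. Isolated surrogates are passed through unchanged, which is a
    choice made for this sketch; their line breaking behavior is undefined.
    """
    out = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            low = units[i + 1]
            out.append(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            out.append(unit)
            i += 1
    return out


# U+1F1E6 REGIONAL INDICATOR SYMBOL LETTER A is encoded as D83C DDE6.
resolved = decode_utf16_units([0x0041, 0xD83C, 0xDDE6])
```

Line break classes would then be looked up for the resulting code points rather than for the individual code units.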

336~~/3.1/4.0.1~~/5/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

SP: ~~—-~~ Space (A) ~~—-~~ (Non-tailorable~~normative~~)

336.1/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB7, LB8, LB9, LB12a, LB14, LB15a, LB15b, LB15c, LB16, LB17, LB18, LB20a

337/3.1

This paragraph was deleted. ~~Breaking Spaces~~

337.1/5/5.1

The space characters are used as explicit break opportunities; they allow line breaks before most other characters. However~~however~~, spaces at the end of a line are ordinarily not measured for fit. If there is a sequence of space characters, and breaking after any of the space characters would result in the same visible line, then the line breaking position after the last space character in the sequence is the locally most optimal one. In other words, when~~because~~ the last character measured for fit is before the space character, any number of space characters are kept together invisibly on the previous line and the first non-space character starts the next line.
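The effect of not measuring trailing spaces for fit can be sketched as follows. The function name `fits` and the assumption that every character has the same advance width are simplifications for this example; a real layout system would measure glyph advances instead.

```python
def fits(line, max_width, char_width=1):
    """Return True if the line fits, ignoring SPACE characters at its end.

    Assumes a fixed advance width per character (a simplification for this
    sketch). Only U+0020 SPACE is stripped, matching class SP.
    """
    visible = line.rstrip(" ")
    return len(visible) * char_width <= max_width


keeps_trailing_spaces = fits("hello   ", 5)   # spaces hang past the margin
overflows = fits("hello w", 5)                # "w" must start the next line
```

Because the spaces are not measured, any number of them can stay on the previous line, and the first non-space character starts the next line.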

338

0020	SPACE (SP)

339/4~~/4.1~~/5

This paragraph was deleted. ~~The space characters are explicit break opportunities, however~~ ~~but~~ ~~spaces at the end of a line are not measured for fit. If there is a sequence of space characters, and breaking after any of the space characters would result in the same visible line, the line breaking position after the last space character in the sequence is the locally most optimal one. In other words, because~~ ~~since~~ ~~the last character measured for fit is before the space character, any number of space characters are kept together invisibly on the previous line and the first non-space character starts the next line.~~

340~~/3.1~~/5

Note~~NOTENote~~: By default, SPACE, but none of the other breaking spaces, is used in determining an indirect break. For other breaking space characters, see BA.

341~~/4.0.1~~/5/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

SY: ~~—-~~ Symbols Allowing Break After (A)

341.1/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB13, LB15b, LB21b, LB25

342~~/3.1/4.0.1/4.1~~/5/5.1

The SY line breaking property is intended to provide a break opportunity after, except in front of digits, so as to not break “1/2” or “06/07/99”. ~~URLs are now so common~~ ~~common enough now~~ ~~in regular plain text that they must be taken into account when assigning general-purpose line breaking properties. The SY line breaking property is intended to provide a break after, but not in front of, digits so as to not break “1/2” or “06/07/99”.~~

344~~/4.1~~/5/5.1

URLs are now so common in regular plain text that they need to be taken into account when assigning general-purpose line breaking properties. Slash (solidus~~SOLIDUS~~) is allowed as an additional, limited break opportunity to improve layout of Web~~web~~ addresses. As a side effect, some common abbreviations such as “w/o” or “A/S”, which normally would not be broken, acquire a line break opportunity. The recommendation in this case is for the layout system not to utilize a line break opportunity allowed by SY unless the distance between it and the next line break opportunity exceeds an implementation-defined minimal distance.
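One way a layout system might apply the minimal-distance recommendation is sketched below. Everything here is an assumption made for illustration: the function name `filter_sy_breaks`, the input format of `(offset, from_sy)` tuples, measuring distance in code points, treating the end of the text as the final opportunity, and measuring to the next opportunity in the unfiltered list.

```python
def filter_sy_breaks(opportunities, text_length, min_distance):
    """Drop SY-based break opportunities that are too close to the next one.

    opportunities: list of (offset, from_sy) tuples sorted by offset, where
                   from_sy is True if the opportunity exists only because of
                   class SY (hypothetical input format for this sketch).
    text_length:   used as the "next opportunity" after the last entry.
    min_distance:  the implementation-defined minimal distance.
    """
    kept = []
    for index, (offset, from_sy) in enumerate(opportunities):
        if from_sy:
            if index + 1 < len(opportunities):
                next_offset = opportunities[index + 1][0]
            else:
                next_offset = text_length
            # Use the SY opportunity only if the distance exceeds the minimum.
            if next_offset - offset <= min_distance:
                continue
        kept.append(offset)
    return kept


# "w/o the": SY opportunity after "/" (offset 2), space-based one at offset 4.
abbreviation = filter_sy_breaks([(2, True), (4, False)], 7, 3)
# A long path segment follows the slash, so the SY opportunity is kept.
web_address = filter_sy_breaks([(12, True), (26, False)], 30, 3)
```

With a minimal distance of 3, the opportunity inside “w/o” is suppressed, while the one after a slash followed by a long path segment remains available.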

345~~/3.1/4.1~~/5

Note~~NOTE~~: Normally, symbols are treated as AL. However, symbols can be added to this line breaking class or classes BA, BB, and B2 by tailoring. This can be used to allow additional line breaks~~If it is desired to allow other breaks, more symbols can be added to this line breaking class, or classes BA, BB, B2 by tailoring~~, for example, after “=”. Mathematics requires additional specifications for line breaking, which are outside the scope of this annex.~~document.~~

345.0.1~~/15.1~~/16

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535} {16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

VF: Virama Final (XB/A)

345.0.1.1/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB28a

345.0.2/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

The VF line break class is only used for scripts that use the Brahmic style of context analysis. It contains the viramas of Indic syllabic category Pure_Killer in scripts where the final consonant of a phonological syllable is expressed as a sequence of a consonant and such a virama, and the final consonant needs to be kept together with the preceding orthographic syllable. This includes:

345.0.3/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

1BF2..1BF3	BATAK PANGOLAT..BATAK PANONGONAN

345.0.4/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

Viramas of Indic syllabic category Pure_Killer that don’t meet the conditions for line break class VF use the line break class CM.

345.0.5~~/15.1~~/16

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535} {16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

VI: Virama (XB/XA)

345.0.5.1/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB28a

345.0.6/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

The VI line break class is only used for scripts that use the Brahmic style of context analysis. It contains the viramas of Indic syllabic categories Virama and Invisible_Stacker of such scripts.

345.0.7/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

1B44	BALINESE ADEG ADEG

345.0.8/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

A9C0	JAVANESE PANGKON

345.0.9/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

11046	BRAHMI VIRAMA

345.0.10/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

1134D	GRANTHA SIGN VIRAMA

345.0.11/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

11F42	KAWI CONJOINER

345.1/4~~/4.0.1~~/5/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

WJ: ~~—-~~ Word Joiner~~joiner~~ (XB/XA) ~~—-~~ (Non-tailorable~~normative~~)

345.1.1/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB11, LB15b

345.2/4~~/4.1~~/5

These~~The action of these~~ characters ~~is to~~ glue together ~~both~~ left and right neighbor characters~~character~~ such that they are kept on the same line. ~~If they follow a space character, they still allow a break.~~

345.3/4

2060	WORD JOINER (WJ)

345.4/4

FEFF	ZERO WIDTH NO-BREAK SPACE (ZWNBSP)

345.5/4/4.1

The word joiner character is the preferred choice for an invisible character to keep other characters together that would otherwise be split across the line at a direct break. The character FEFF has the same effect, but because~~since~~ it is also used in an unrelated way as a byte order mark, the use of the WJ as the preferred interword glue simplifies~~will simplify~~ the handling of FEFF. ~~By definition WJ and ZWNBSP take precedence over the action of SP and ZW.~~

345.6~~/4.1~~/5

By definition, WJ and ZWNBSP take precedence over the action of SP, but not ZW.

346~~/4.0.1~~/5/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

XX: ~~—-~~ Unknown (XP)

346.0.1/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB1

346.1/4~~/4.0.1/4.1~~/5/5.2

The XX line break class consists of all~~All~~ characters with General_Category~~General Category~~ Co as well as those unassigned~~and all~~ code points~~codepoints~~ that are not within a CJK block. Unassigned characters in blocks or ranges of the Unicode codespace that have been reserved for CJK scripts default to the class ID, and are listed in the description of that class.~~with General_CategoryGeneral Category~~ ~~Cn.~~

347~~/3.1/3.2~~/4~~/4.0.1~~/5~~/5.1~~/5.2

{3.2.0: 81-M6, 85-M7; L2/00-258}

Unassigned code positions, private-use characters, and characters for which reliable line breaking information is not available are assigned this line breaking property. The default behavior for this class is identical to class AL. Users can manually insert ZWSP or WORD JOINER~~ZWNBSP~~ around characters of class XX to allow~~force~~ or prevent breaks as needed. ~~Implementations can override or tailor this default behavior, e.g. by assigning private use characters the property ID if that is more likely to give the correct default behavior for their users.~~

347.1~~/3.1/4.1~~/5

In addition, implementations can override or tailor this default behavior, for example,~~e.g.~~ by assigning characters the property ID or another class. Doing so may give better default behavior for their users. There are other possible means of determining the desired behavior of private-use characters. For example, one implementation might treat any private-use character in ideographic context as ID, while another implementation might support a method for assigning specific properties to specific definitions of private-use characters. ~~, if that is more likely to give the correct default behavior for their users, or use other means to determine the correct behavior. For example one implementation might treat any private use character in ideographic context as ID, while another implementation might support a method for assigning specific properties to specific definitions of private use characters.~~ The details of such use of private-use characters are outside the scope of this standard.

347.2~~/3.1~~/4

This paragraph was deleted. ~~All characters with General Category Co and all codepoints with General Category Cn.~~

347.3/4/5

For supplementary characters, a useful default is to treat characters in the range 10000..1FFFD as AL and characters in the ranges 20000..2FFFD and 30000..3FFFD as ID, until the implementation can be revised~~0x10000~~ to take into account the actual line breaking properties for these~~0x1FFFD as AL and~~ characters. ~~in the range 0x20000 to 0x2FFFD, and 0x30000 to 0x3FFFD as ID, until the implementation can be revised to take into account the actual line breaking properties for these characters.~~
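The fallback for supplementary characters described above might look like the following sketch. The function name `default_supplementary_class` and the convention of returning `None` outside the listed ranges are assumptions of this example; an implementation with access to the actual property data would consult that data first.

```python
def default_supplementary_class(code_point):
    """Return a fallback line break class for a supplementary code point.

    Intended only for implementations that lack the actual line breaking
    properties for these characters. Returns None outside the listed ranges
    (a convention chosen for this sketch).
    """
    if 0x10000 <= code_point <= 0x1FFFD:
        return "AL"
    if 0x20000 <= code_point <= 0x2FFFD or 0x30000 <= code_point <= 0x3FFFD:
        return "ID"
    return None


plane_1 = default_supplementary_class(0x1D400)
plane_2 = default_supplementary_class(0x20000)
```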

347.4/4~~/4.0.1/4.1~~/5~~/5.2~~/6

For more information on handling default property values for unassigned characters, see the discussion on default property values in Section 5.3, Unknown and Missing Characters, of [Unicode~~Unicode5.20~~ ~~Unicode~~].

347.5/4/5/10

{10.0.0: 147-A79}

The line breaking rules in Section 6, Line Breaking Algorithm~~, and the pair table in Section 7, Pair Table-Basedbased~~ ~~Implementation,~~ assume that all unknown characters have been assigned one of the other line breaking classes, such as AL, as part of assigning line breaking classes to the input characters.

347.6/5.1

Implementations that do not support a given character should also treat it as unknown (XX).

348~~/4.0.1~~/5/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

ZW: ~~—-~~ Zero Width Space (A) ~~—-~~ (Non-tailorable~~normative~~)

348.1/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB7, LB8, LB9, LB15a, LB15b, LB20a

349

200B	ZERO WIDTH SPACE (ZWSP)

350/4.0.1

This character ~~does not have width. It~~ is used to enable additional (invisible) break opportunities wherever SPACE cannot be used. As its name implies, it normally has no width. However, its presence between two characters does not prevent increased letter spacing in justification.

350.1/9/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

ZWJ: Zero Width Joiner (XA/XB) (Non-tailorable)

350.1.1/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

LB8a, LB9, LB10

350.2/9

200D	ZERO WIDTH JOINER (ZWJ)

350.3/9/11

{11.0.0: 149-A53, 155-A27, 155-C14, 155-A112; L2/17-074}

A ZWJ prevents breaks between most pairs of characters that would otherwise break. It~~This character~~ has various uses, including as a connector in emoji zwj sequences and as a joiner in complex scripts.

350.4/9/10

{10.0.0: 150-A58, 150-C22; L2/16-315R}

Emoji zwj sequences are defined by ED-16, emoji zwj sequence, in [UTS51~~UTR51~~] and implemented for line breaking by rule LB8a. In other respects, the line breaking behavior of ZWJ is that of a combining character of class CM.

351/4.1

5.2 Additional Details on Dictionary Usage

352/4~~/4.0.1/4.1~~/5/16

{16.0.0: 173-A14; L2/22-244; L2/22-243#ID20220921024738}

Dictionaries follow specific conventions~~strict standards~~ that guide their use of special characters to indicate features of the ~~listed~~ terms they list. Marks used for some~~listed. Some~~ of these conventions may occur near~~mark places that can also serve as~~ line break~~ing~~ opportunities and therefore interact with line breaking. For example, in one dictionary a natural hyphen in a word becomes a tilde dash when the word is split. Section 6.2.8, Hyphenation Point and Dictionary Syllabification, of [Unicode] illustrates the use of marks whose line breaking classes have been assigned to accommodate various dictionary usages. ~~Some of these are described here. Where possible, the default line breaking properties for characters commonly used in dictionaries have been assigned so as to accommodate these conventions.~~

353~~/4.1~~/5~~/5.1/5.2~~/16

{16.0.0: 173-A14; L2/22-244; L2/22-243#ID20220921024738}

This paragraph was deleted. ~~This subsection briefly describes conventions used in several dictionaries. Where possible, the line breaking properties for characters commonly used in dictionaries have been assigned to accommodate these and similar conventions by default. However, implementing the full conventions in dictionaries requires tailoring of line break classes and rules or other types of special support. Looking up the noun “syllable” in eight dictionaries yields eight different conventions, in one dictionary a natural hyphen in a word becomes a tilde dash if the word is split.~~

353.1~~/4.1~~/16

{16.0.0: 173-A14; L2/22-244; L2/22-243#ID20220921024738}

This paragraph was deleted. ~~Looking up the noun “syllable” in eight dictionaries yields eight different conventions:~~

354~~/3.1/4.0.1~~/5~~/15.1~~/16

{15.1.0: 173-A13; L2/22-244; L2/22-243#ID20220921024738} {16.0.0: 173-A14; L2/22-244; L2/22-243#ID20220921024738}

This paragraph was deleted. ~~Dictionary of the English Language (~~, ~~Samuel Johnson, 1843) SYʹLLABLE withSY´LLABLE where ´ is~~ ~~an oversized~~a ~~U+02B9 which(and a large one at that)~~ ~~and~~ ~~follows the vowel of the main syllable (not the syllable itself).~~

355~~/3.1/4.0.1~~/5~~/15.1~~/16

{15.1.0: 173-A13; L2/22-244; L2/22-243#ID20220921024738} {16.0.0: 173-A14; L2/22-244; L2/22-243#ID20220921024738}

This paragraph was deleted. ~~Oxford English Dictionary (1st Edition) si·lă~~lâ~~'bl where · is a slightly raisedabove~~ ~~middle dot indicating the vowel of the stressed syllable (similar to Johnson’~~'~~s primeacute). The~~ ~~letter ă isâ is really~~ ~~U+0103. The~~ ~~' is an apostrophe.~~

356~~/3.1/4.0.1~~/5~~/15.1~~/16

{15.1.0: 173-A13; L2/22-244; L2/22-243#ID20220921024738} {16.0.0: 173-A14; L2/22-244; L2/22-243#ID20220921024738}

This paragraph was deleted. ~~Oxford English Dictionary (2nd Edition) has gone to IPA ˈsɪləb'sIləbsIleb~~(əe~~)l where ˈ~~' ~~is U+02C8, I is U+026A, and ə~~e ~~is U+0259 (both times). The~~ ~~' comes before the stressed syllable. The~~ ~~() indicates thatindicateindicates~~ ~~the schwa may be omitted.~~

357~~/3.1/4.0.1/4.1~~/5~~/15.1~~/16

{15.1.0: 173-A13; L2/22-244; L2/22-243#ID20220921024738} {16.0.0: 173-A14; L2/22-244; L2/22-243#ID20220921024738}

This paragraph was deleted. ~~Chambers English Dictionary (7th Edition) silʹəsil´ə~~e~~-bl where the stressed syllable is followed by ʹ~~´ ~~U+02B9, ə~~e ~~is U+0259,~~ ~~and - is a hyphen. When~~ ~~when~~ ~~splitting a word like abateʹabate´- ment, the stress mark ʹ~~´ ~~goes after stressed syllable followed by the hyphen. No special convention is used~~, ~~when~~if ~~splitting at hyphen.~~

358~~/4.0.1~~/5~~/15.1~~/16

{15.1.0: 173-A13; L2/22-244; L2/22-243#ID20220921024738} {16.0.0: 173-A14; L2/22-244; L2/22-243#ID20220921024738}

This paragraph was deleted. ~~BBC English Dictionary sɪ̲ləblsIləblsIlebl~~ ~~where ɪ̲~~I ~~is <U+026A, U+0332>~~ ~~and~~, ə~~, e~~ ~~is U+0259. The vowel of the stressed syllable is underlined.~~

359~~/4.0.1/4.1~~/5~~/15.1~~/16

{15.1.0: 173-A13; L2/22-244; L2/22-243#ID20220921024738} {16.0.0: 173-A14; L2/22-244; L2/22-243#ID20220921024738}

This paragraph was deleted. ~~Collins Cobuild English Language Dictionary sɪ̲ləbə⁰lsIləbəsIlebe~~°l ~~where ɪ̲~~I ~~is <U+026A, U+0332> and has, and means~~ ~~the same meaning as in the BBC English Dictionarydictionary. The ⁰ is~~əe ~~is U+0259 (both times). The ° is a~~ ~~U+2070 and indicates the schwa may be omitted.~~

360~~/4.0.1/4.1~~/5~~/15.1~~/16

{15.1.0: 173-A13; L2/22-244; L2/22-243#ID20220921024738} {16.0.0: 173-A14; L2/22-244; L2/22-243#ID20220921024738}

This paragraph was deleted. ~~Readers Digest Great Illustrated Dictionary~~. ~~syl‧~~·~~la‧~~·~~ble (sílləbsílleb'l) The spelling of the word has hyphenation points (‧ is· is a~~ ~~U+2027) followed by phonetic spelling. The vowel of the stressed syllable is given an accent,~~ (~~rather than being followed by an accent~~)~~. The~~ ~~letter e is a schwa in the actual example and~~ ~~' is an apostrophe.~~

361~~/4.0.1/4.1~~/5~~/15.1~~/16

{15.1.0: 173-A13; L2/22-244; L2/22-243#ID20220921024738} {16.0.0: 173-A14; L2/22-244; L2/22-243#ID20220921024738}

This paragraph was deleted. ~~Webster’~~'~~s 3rd New International Dictionary~~. ~~syl‧~~·~~la‧~~·~~ble /ˈsiləbəl'siləbəlsilebel/ The spelling of the word has hyphenation points (‧ is· is a~~ ~~U+2027) and is followed by phonetic spelling. The stressed syllable is preceded by ˈ~~' ~~U+02C8.~~ ~~The ə’~~e'~~s are schwas as usual.~~ ~~Webster’s splits words at the end of a line with a normal hyphen. A U+2E17 DOUBLE OBLIQUE HYPHEN indicates thatWhen~~ ~~a hyphenated word is split at the hyphen.~~ ~~this is indicated by a double hyphen which looks like a light version of the German Fraktur hyphen (short equals sign with a slight slope up to the right).~~

361.0.1/5/16

{16.0.0: 173-A14; L2/22-244; L2/22-243#ID20220921024738}

This paragraph was deleted. Some dictionaries use a character that looks like a vertical series of four dots to indicate places where there is a syllable, but no allowable break. This can be represented by a sequence of U+205E VERTICAL FOUR DOTS followed by U+2060 WORD JOINER.

361.0.2/5.1

5.3 Use of Hyphen

361.0.3/5.1

The rules for treating hyphens in line breaking vary by language. In many instances, these rules are not supported as such in the algorithm, but the correct appearance can be realized by using a non-breaking hyphen.

361.0.4/5.1

Some languages and some transliteration systems use a hyphen at the first position in a word. For example, the Finnish orthography uses a hyphen at the start of a word in certain types of compounds of the form xxx yyy -zzz (where xxx yyy is a two-word expression that acts as the first part of a compound noun, with zzz as the second part). Line break after the hyphen is not allowed here; therefore, instead of a regular hyphen, U+2011 NON-BREAKING HYPHEN should be used.

361.0.5/5.1

There are line breaking conventions that modify the appearance of a line break when the line break opportunity is based on an explicit hyphen. In standard Polish orthography, explicit hyphens are always promoted to the next line if a line break occurs at that location in the text. For example, if, given the sentence "Tam wisi czerwono-niebieska flaga" ("There hangs a red-blue flag"), the optimal line break occurs at the location of the explicit hyphen, an additional hyphen will be displayed at the beginning of the next line like this:

361.0.6/5.1

Tam wisi czerwono- -niebieska flaga.

361.0.7~~/5.1~~/5.2

The same convention is used in Portuguese, where the use of hyphens is common, because they are mandatory for verb~~verbs~~ forms that include a pronoun. Homographs or ambiguity may arise if hyphens are treated incorrectly: for example, "disparate" means "folly" while "dispara-te" means "fire yourself" (or "fires onto you"). Therefore the former needs to be line broken as

361.0.8/5.1

dispara- te

361.0.9/5.1

and the latter as

361.0.10/5.1

dispara- -te.

361.0.11~~/5.1~~/11

{11.0.0: 155-A26; PRI-376#ID20180414084252}

A recommended practice is to type <SHY, NON-BREAKING HYPHEN~~NBHY~~> instead of <HYPHEN> to achieve promotion of the hyphen to the next line. This practice is reportedly already common and supported by major text layout applications. See also Section 5.4, Use of Soft Hyphen.
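
The effect of this practice can be illustrated with a minimal Python sketch. The helper `break_line_at` is hypothetical and assumes a renderer that shows a U+2010 HYPHEN glyph at the end of a line broken at a SHY; real layout engines may choose a different glyph.

```python
SHY = "\u00AD"     # SOFT HYPHEN
NBHY = "\u2011"    # NON-BREAKING HYPHEN
HYPHEN = "\u2010"  # HYPHEN, assumed here as the glyph shown at a SHY break


def break_line_at(text, index):
    """Split text at index, which must be directly after a SHY.

    Hypothetical helper: the SHY becomes a visible hyphen at the end of the
    first line, and the remainder starts the next line unchanged.
    """
    if index == 0 or text[index - 1] != SHY:
        raise ValueError("index is not directly after a SOFT HYPHEN")
    return text[:index - 1] + HYPHEN, text[index:]


def without_break(text):
    """When no break is taken, the SHY stays invisible."""
    return text.replace(SHY, "")


# <SHY, NON-BREAKING HYPHEN> typed instead of <HYPHEN>
word = "czerwono" + SHY + NBHY + "niebieska"
first, second = break_line_at(word, word.index(SHY) + 1)
```

If the break is taken, the first line ends in a hyphen and the second line begins with the non-breaking hyphen; if it is not taken, only the non-breaking hyphen is displayed.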

361.1/4~~/4.1~~/5.1

5.4~~3~~ Additional Details on the Use of Soft Hyphen

361.2/4/5~~/5.1~~/5.2

Unlike U+2010 HYPHEN, which always has a visible rendition, the character U+00AD SOFT HYPHEN (SHY) is an invisible format character that merely indicates a preferred intraword line break position. If the line is broken at that point, then whatever mechanism is appropriate for intraword line breaks should be invoked, just as if the line break had been triggered by another hyphenation mechanism, such as a dictionary lookup. Depending on the language and the word, that may produce different visible results, for example:

361.3/4/5

• Simply~~simply~~ inserting a hyphen glyph

361.4/4/5

• Inserting~~inserting~~ a hyphen glyph and changing spelling in the divided word parts

361.5/4/5

• Not~~not~~ showing any visible change and simply breaking at that point

361.6/4/5

• Inserting~~inserting~~ a hyphen glyph at the beginning of the new line

361.7/4~~/4.0.1~~/5

The following~~Here~~ are a few examples~~some example~~ of spelling changes. Each example shows the line break as “ / ” and any inserted hyphens. There are many other cases.

361.8/4~~/4.1~~/5

• In~~in~~ pre-reform~~traditional~~ German orthography, a “c” before the hyphenation point can change into a “k”: “Drucker” hyphenates into “Druk- / ker”.

361.9/4~~/4.0.1/4.1~~/5

• In~~in~~ modern Dutch, an~~a~~ e-diaeresis~~diaresis~~ after the hyphenation point can change into a simple “e”: “geërfde~~angeërfde~~” hyphenates into “ge~~ange~~- / erfde”, and “geëerd” into “ge- / eerd”.

361.10/4~~/4.0.1~~/5/15

• In~~in~~ ~~German and~~ Swedish, a consonant is sometimes doubled: ~~Swedish~~ “tuggummi” hyphenates into “tugg- / gummi”.

361.11/4/5

• In~~in~~ Dutch, a letter can disappear: “opaatje” hyphenates into “opa- / tje”.

361.12/4~~/4.0.1/4.1~~/5

The inserted hyphen glyph can take a wide variety of shapes, as appropriate for the situation. Examples include shapes like U+2010 HYPHEN, U+058A ARMENIAN HYPHEN, U+180A MONGOLIAN NIRUGU, or U+1806 MONGOLIAN TODO SOFT HYPHEN.

361.13/4~~/4.0.1~~/4.1

When a SHY is used to represent a possible hyphenation location, the spelling is that of the word without hyphenation: “tug<SHY>gummi”. It is up to the line breaking implementation to make any necessary spelling changes when such a possible hyphenation is actually used.~~becomes actual.~~
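
As a rough illustration of this division of labor, the following Python sketch keeps the unhyphenated spelling in the text and applies a spelling change only when the break is actually taken. The table `SPELLING_CHANGES` and the function `break_at_shy` are hypothetical names; the single entry is the Swedish example from this section, and a real implementation would need language-specific data.

```python
SHY = "\u00AD"     # SOFT HYPHEN
HYPHEN = "\u2010"  # HYPHEN

# Hypothetical exception table:
# (text before SHY, text after SHY) -> (end of first line, start of next line)
SPELLING_CHANGES = {
    ("tug", "gummi"): ("tugg" + HYPHEN, "gummi"),  # consonant is doubled
}


def break_at_shy(word, changes=SPELLING_CHANGES):
    """Break word at its first SHY, applying a spelling change if one is listed."""
    before, sep, after = word.partition(SHY)
    if not sep:
        raise ValueError("word contains no SOFT HYPHEN")
    if (before, after) in changes:
        return changes[(before, after)]
    # Default behavior: simply insert a hyphen glyph at the end of the line.
    return before + HYPHEN, after


swedish = break_at_shy("tug" + SHY + "gummi")
plain = break_at_shy("syl" + SHY + "lable")
```

The stored text is never modified; the changed spelling exists only in the displayed lines.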

361.14/4~~/4.1~~/5~~/5.1/5.2~~/6

Sometimes it is desirable to encode text that includes line breaking decisions and will not be further broken into lines. If such text includes hyphenations, the spelling needs to~~must~~ reflect the changes due to hyphenation: “tugg<U+2010>/ gummi”, including the appropriate character for any inserted hyphen. For a list of dash-like characters~~character~~ in Unicode, see Section 6.2, General Punctuation, in [Unicode].

361.14.1/5~~/5.1/5.2~~/6/7/9

Hyphenation, and therefore the SHY, can be used with the Arabic script. If the rendering system breaks at that point, the display (including shaping) should be what is appropriate for the given language. For example, sometimes a hyphen-like mark is placed on the end of the line. This mark looks like a kashida, but is not connected to the letter preceding it. Instead, the appearance of the mark is as if it had been placed, and the line divided, after the contextual shapes for the line have been determined. For more information on shaping, see [UAX9~~Bidi~~] and Section 9.2, Arabic, of [Unicode].

361.15/4~~/4.1~~/5/5.1

There are three types of hyphens: explicit hyphens, conditional hyphens, and dictionary-inserted hyphens resulting from a hyphenation process. There is no character code for the third kind of hyphen. If a distinction is desired, the fact that a hyphen is dictionary-inserted and not user-supplied can only~~must~~ be represented out of band, or by using another control code instead of SHY.

361.16/4/5

The action of a hyphenation algorithm is equivalent to the insertion of a SHY. However, when a word contains an explicit SHY, it is customarily treated as overriding the action of the hyphenator for that word.

361.16.1~~/5.1~~/11

{11.0.0: 155-A26; PRI-376#ID20180414084252}

The sequence <SHY, NON-BREAKING HYPHEN~~NBHY~~> is given a particular interpretation, see Section 5.3, Use of Hyphen.

361.17~~/4.0.1/4.1~~/5.1

5.5~~4~~ Additional Details on the Use of Double Hyphen

361.18~~/4.0.1~~/5.2

In some fonts, notably~~noticeably~~ Fraktur fonts, it is customary to use a double-stroke form of the hyphen, usually oblique. Such use is ~~merely~~ a font-based glyph variation and does not affect line breaking in any way. In texts using such a font, automatic hyphenation or SHY would also result in the display of a double-stroke, oblique hyphen.

361.19~~/4.0.1/4.1~~/5~~/5.1~~/5.2

In some dictionaries, such as~~for example~~ Webster’s 3rd New International Dictionary ~~cited above~~, double-stroke, oblique hyphens are used to indicate an explicit hyphen at the end of the line; in other words, a hyphen that would be retained when the term shown is not line wrapped. It is not necessary to store a special character in the data to support this option; one merely needs to substitute the glyph of any ordinary hyphen that winds up at the end of a line. In this example, if the shape of the special hyphen matches an existing character, such as U+2E17 DOUBLE OBLIQUE HYPHEN, that character can be substituted temporarily for display purposes by the line formatter. With such a convention, automatic hyphenation or SHY would result in the display of an ordinary hyphen without further substitution. (See also Section 5.3, Use of Hyphen.)

361.20~~/4.0.1~~/4.1

Certain linguistic notations make use of a double-stroke, oblique hyphen to indicate specific features. The U+2E17 DOUBLE OBLIQUE HYPHEN~~In these cases, the~~ character used in this case is not a hyphen and does not represent a line break opportunity. Automatic hyphenation or SHY would result in the display of an ordinary hyphen.

361.20.1/5.1

U+30A0 KATAKANA-HIRAGANA DOUBLE HYPHEN is used in scientific notation, for example, to mark the presence of a space that would otherwise have been lost in transcribing text, such as the name of a chemical compound, into Katakana. In such notation, ordinary hyphens are retained.

361.21~~/4.1~~/5.1

5.6~~5~~ Tibetan Line Breaking

361.22~~/4.1~~/5/5.2

The Tibetan script uses spaces sparingly, relying instead on the tsheg~~thseg~~. There is no punctuation equivalent to a period in Tibetan; Tibetan shad characters indicate the end of a “phrase,” not a sentence. “Phrases” are often metrical, that is, written after every N syllables, and a new sentence can often start within the middle of a phrase. Sentence boundaries need to be determined grammatically rather than by punctuation.

361.23~~/4.1~~/5

Traditionally there is nothing akin to a paragraph in Tibetan text. It is typical to have many pages of text without a paragraph break, that is, without an explicit line break. The closest thing to a paragraph in Tibetan is a new section or topic starting with U+0F12 or U+0F08. However, these occur inline: one section ends and a new one starts on the same line, and the new section is marked only by the presence of one of these characters.

361.24~~/4.1~~/5~~/5.1~~/5.2

Some modern books, newspapers, and magazines format text more like English, with a break before each section or topic and (often) the title of the section on a separate line. Where this is done, authors do insert an explicit line break. Western punctuation (full stop, question mark, exclamation mark, comma, colon, semicolon~~semi colon~~, quotes) is starting to appear in Tibetan documents, particularly those published in India, Bhutan, and Nepal. Because there are no formal rules for their use in Tibetan, they get treated generically by default. In Tibetan documents published in China, CJK bracket and punctuation characters occur frequently; it is recommended to treat these ~~should be treated~~ as in horizontally written Chinese.~~written horizontally.~~

361.25~~/4.1~~/5/5.2

Note~~NOTE~~: The detailed rules for formatting Tibetan texts are complex, and the original assignment of line break classes was found to be ~~wholly~~ insufficient ~~for the purpose~~. In [Unicode4.1], the assignment of line break classes for Tibetan was~~has been~~ revised significantly in an attempt to better model Tibetan line breaking behavior. No new rules or line break classes were added. As yet there is limited practical experience with the revised assignment of line break classes. As more experience is gained, some modifications, possibly including new rules or additional line break classes, can be expected. Nevertheless the current set of line break classes should provide a good starting point.

361.25.1/5~~/5.1~~/5.2

The set of line break classes for Tibetan is~~are~~ expected to~~should~~ provide a good starting point, even though there is limited practical experience in their implementation. As more experience is gained, some modifications, possibly including new rules or additional line break classes, can be expected.

361.26~~/4.1~~/5.2

This paragraph was deleted. It is the stated intention of the Unicode Consortium to review these assignments in a future version and to furnish a more detailed and complete description of Tibetan line breaking and line formatting behavior.

361.27/5.1

5.7 Word Separator Characters

361.28~~/5.1~~/5.2

Visible word separator characters may behave in one of three ways at line breaks. As an example, consider the text “The:quick:brown:fox:jumped.”, where the colon (:) represents a visible word separator, with a break between “brown” and “fox”. The desired visual appearance could be one of the following:

361.29/5.1

1. suppress the visible word separator

361.30/5.1

The:quick:brown fox:jumped.

361.31/5.1

2. break before the visible word separator

361.32/5.1

The:quick:brown :fox:jumped.

361.33/5.1

3. break after the visible word separator

361.34/5.1

The:quick:brown: fox:jumped.

361.35/5.1

Both (2) and (3) can be expressed with the Unicode Line Breaking Algorithm by tailoring the Line Break property value for the word separator character to be Break Before or Break After, respectively.

361.36/5.1

For case (1), the line break opportunity is positioned after the word separator character, as in case (3), but the visual display of the character is suppressed. The means by which a line layout and display process inhibits the visible display of the separator character are outside of the scope of the Line Break algorithm. U+1680 OGHAM SPACE MARK is an example of a character which may exhibit this behavior.
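
The three behaviors can be compared side by side in a short Python sketch. The function `break_at_separator` and its `mode` names are hypothetical and model only the visual result, not the tailoring of the Line Break property itself.

```python
def break_at_separator(text, sep_index, mode):
    """Return (first_line, second_line) for a break at text[sep_index].

    mode is one of:
      "suppress" - case (1): break after the separator, but do not display it
      "before"   - case (2): the separator starts the second line
      "after"    - case (3): the separator ends the first line
    """
    head, sep, tail = text[:sep_index], text[sep_index], text[sep_index + 1:]
    if mode == "suppress":
        return head, tail
    if mode == "before":
        return head, sep + tail
    if mode == "after":
        return head + sep, tail
    raise ValueError("unknown mode: " + mode)


sample = "The:quick:brown:fox:jumped."
position = sample.index(":fox")  # the separator between "brown" and "fox"
results = {mode: break_at_separator(sample, position, mode)
           for mode in ("suppress", "before", "after")}
```

Each entry of `results` corresponds to one of the three displayed examples above.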

362

6 Line Breaking Algorithm

363/4/5/5.2

Unicode Standard Annex #29, “Unicode Text Segmentation~~Boundaries~~” [UAX29~~Boundaries~~], describes a particular method for boundary detection, based on a set of hierarchical rules and character classifications. That method is~~would be~~ well suited for implementation of some of the advanced heuristics for line breaking.

364/4~~/4.0.1~~/5/10

{10.0.0: 147-A79}

This paragraph was deleted. ~~A slightly simplified implementation of such an algorithmthat~~ ~~can be devised that uses a two-~~ ~~dimensional table to resolve break opportunities between pairs or characters. It is described in Section 7, Pair Table-BasedbasedTable Based~~ ~~Implementation.the following section.~~

365/4

This paragraph was deleted. ~~The line breaking algorithm presented in this section can be expressed in a series of rules which take line breaking classes as input.~~

366~~/4.1~~/5

This paragraph was deleted. ~~6.1 Line Breaking Rules~~

367/4~~/4.1~~/5~~/5.2~~/8

The line breaking algorithm presented in this section can be expressed in a series of rules that~~which~~ take line breaking classes defined in Section 5.1, Description of Line Breaking Properties, as input. The title of each rule contains a mnemonic summary of the main effect of the rule. The formal statement of each line breaking rule consists either of a remap rule or of one or more regular expressions containing one or more line breaking classes and one of three special symbols indicating the type of line break opportunity:

368

! Mandatory break at the indicated position

369

× No break allowed at the indicated position

370

÷ Break allowed at the indicated position

370.0.1~~/13~~/16

{16.0.0: 179-C28, 179-A102}

In the regular expressions, parentheses may be used for grouping, and square brackets, &, -, and \p{...} may be used to compose sets of characters, as in UAX #29, Unicode Text Segmentation [UAX29] and in UTS #18, Unicode Regular Expressions [UTS18]. Use of a line break class such as BK is short for the property expression \p{lb=BK}. The symbol $EastAsian stands for the set [\p{ea=F}\p{ea=W}\p{ea=H}] of characters with Fullwidth, Wide, or Halfwidth East Asian Width.
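
For readers who want to experiment, the set $EastAsian can be approximated with Python's standard `unicodedata` module, which exposes East_Asian_Width. This is only a sketch: the module reflects whatever Unicode version the Python build ships with, and it does not expose the Line_Break property, so expressions such as \p{lb=BK} need another data source.

```python
import unicodedata


def is_east_asian(ch):
    """True if ch is in [\\p{ea=F}\\p{ea=W}\\p{ea=H}] per the local Unicode data."""
    return unicodedata.east_asian_width(ch) in {"F", "W", "H"}


checks = {
    "ideograph": is_east_asian("\u4E00"),       # CJK ideograph, Wide
    "fullwidth A": is_east_asian("\uFF21"),     # Fullwidth
    "halfwidth KA": is_east_asian("\uFF76"),    # Halfwidth
    "latin A": is_east_asian("A"),              # Narrow, not in the set
}
```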

370.1~~/3.1/4.1~~/5

Split from §372. The rules are applied in order. That is, there is an implicit “otherwise” at the front of each rule following the first. It is possible to construct alternate sets of such rules that are fully equivalent. To be equivalent, an alternate set of rules must have the same effect.
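
The implicit “otherwise” can be pictured as a first-match-wins loop. The Python sketch below is an illustration only: it models rules as hypothetical functions over a single pair of classes, covers just three of the rules stated later in this section, and ends with a placeholder fallback that stands in for all the rules it omits. Rules that need extended context do not fit this simplified shape.

```python
# Each rule returns "!", "×", or "÷" if it decides the position, else None.

def lb4_break_after_bk(before, after):
    return "!" if before == "BK" else None


def lb6_no_break_before_hard_break(before, after):
    return "×" if after in ("BK", "CR", "LF", "NL") else None


def lb7_no_break_before_space(before, after):
    return "×" if after in ("SP", "ZW") else None


def placeholder_fallback(before, after):
    # Assumption for this sketch only: every omitted rule is replaced by "allow".
    return "÷"


RULES = [
    lb4_break_after_bk,
    lb6_no_break_before_hard_break,
    lb7_no_break_before_space,
    placeholder_fallback,
]


def resolve(before, after, rules=RULES):
    """Apply the rules in order; the first rule that decides wins."""
    for rule in rules:
        action = rule(before, after)
        if action is not None:
            return action
    raise ValueError("no rule matched")


# LB4 precedes LB7, so a break is mandatory after BK even before a space.
bk_then_space = resolve("BK", "SP")
letter_then_space = resolve("AL", "SP")
```

Reordering `RULES` changes the result for `("BK", "SP")`, which is why the order of the rules is part of their meaning.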

371~~/3.1/4.0.1/4.1~~/5

The distinction between a direct break and an indirect break as defined in Section 2, Definitions, is handled in rule LB18, which explicitly considers the effect of SP. Because rules are applied in order, allowing breaks following SP in rule LB18 implies that any prohibited break in rules LB19–LB30 is equivalent to an indirect break.

372~~/3.1/4.1~~/5

Part of this paragraph was moved to §370.1. ~~The rules are applied in order. That is, there is an implicit ”otherwise” at the front of each rule following the first.~~ The examples for each rule use representative characters, where ‘H’ stands for an ideograph, ‘h’ for small kana, and ‘9’ for digits. Except where a rule contains no expressions, the italicized text of the rule is intended merely as a handy summary.

372.1/5

The algorithm consists of a part for which tailoring is prohibited and a freely tailorable part.

372.2/5

6.1 Non-tailorable Line Breaking Rules

372.3/5~~/5.1~~/9

The rules in this subsection and the membership in the classes BK, CM, CR, GL, LF, NL, SP, WJ, ~~and~~ ZW and ZWJ define behavior that is required of all line break implementations~~are not tailorable~~; see Section 4, Conformance.

372.3.a/5

{5.0.0: 105-C37}

To be honest: Implementations are not required to support the vertical tabulation in class BK, nor to support the singleton class NL.

373/4

Resolve line breaking classes:

374~~/3.1~~/4~~/4.0.1/4.1~~/5/6.1

{6.1.0: 129-C2}

LB1~~LB 1~~ Assign a line breaking class ~~category~~ to each code point~~character~~ of the input. Resolve AI, CB, CJ, SA, SG, and XX~~, SG~~ into other line breaking classes depending on criteria outside the scope of this algorithm.

374.1/4/4.1

This paragraph was deleted. Alternatively, particularly for text consisting of or predominantly containing of characters with line breaking class SA, it may be useful defer the determination of line breaks to a different algorithm entirely.

374.2/5/6.1

{6.1.0: 129-C2}

In the absence of such criteria all characters with a specific combination of original class and General_Category property value are~~, it is recommended that classes AI, SA, SG, and XX be~~ resolved as follows:~~to AL, except that characters of class SA that have General_Category Mn or Mc be resolved to CM (see SA). Unresolved class CB is handled in rule LB20.~~

374.3/6.1

{6.1.0: 129-C2}

Resolved | Original | General_Category

374.4/6.1

{6.1.0: 129-C2}

AL | AI, SG, XX | Any

374.5/6.1

{6.1.0: 129-C2}

CM | SA | Only Mn or Mc

374.6/6.1

{6.1.0: 129-C2}

AL | SA | Any except Mn and Mc

374.7/6.1

{6.1.0: 129-C2}

NS | CJ | Any
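
The default resolution described for LB1 can be sketched in Python as follows. The function `resolve_class` is a hypothetical name, and the original line breaking class is passed in by the caller because the standard `unicodedata` module provides General_Category but not Line_Break. The CJ to NS mapping follows the table as published in UAX #14; class CB is left unresolved here.

```python
import unicodedata


def resolve_class(original, ch):
    """Resolve AI, SG, XX, SA, and CJ in the absence of other criteria.

    original: the Line_Break class of ch, supplied by the caller (assumption).
    """
    general_category = unicodedata.category(ch)
    if original in ("AI", "SG", "XX"):
        return "AL"
    if original == "SA":
        return "CM" if general_category in ("Mn", "Mc") else "AL"
    if original == "CJ":
        return "NS"
    return original  # all other classes are kept as they are


thai_letter = resolve_class("SA", "\u0E01")  # THAI CHARACTER KO KAI, gc=Lo
thai_mark = resolve_class("SA", "\u0E31")    # THAI CHARACTER MAI HAN-AKAT, gc=Mn
small_kana = resolve_class("CJ", "\u3041")   # HIRAGANA LETTER SMALL A
```

A tailoring that resolves CJ differently, or that hands SA text to a dictionary-based algorithm, would replace this function rather than extend it.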

375

Start and end of text:

375.1/5

There are two special logical positions: sot, which occurs before the first character in the text, and eot, which occurs after the last character in the text. Thus an empty string would consist of sot followed immediately by eot. With these two definitions, the line break rules for start and end of text can be specified as follows:

376~~/4.1~~/5

LB2~~LB 2a~~ Never break at the start of text.

377/5

sot × ~~sot~~

378~~/4.1~~/5

LB3~~LB 2b~~ Always break at the end of text.

380~~/3.1~~/4.1

These two rules are designed to deal with degenerate cases, so that there is at least one character on each line, and at least one line break for the whole text. Emergency line breaking behavior usually also allows line breaks anywhere on the line if a legal line break cannot be found. This has the effect of preventing text from running into the margins.

381

Mandatory breaks:

381.1/4~~/4.1~~/5~~/5.2~~/6

{4.0.0: 94-M2}

Moved from §388.1. ~~Note:~~ A hard line break can consist of BK or a Newline~~New Line~~ Function (NLF) as described in Section 5.8, Newline Guidelines, of [Unicode]. These three rules are designed to handle the line ending and line separating characters as described there.

382/5/5.1

LB4~~LB 3a~~ Always break after hard line breaks. ~~(but never between CR and LF).~~

382.1/3.1

Moved from §386. BK !

382.1.1/4~~/4.1~~/5

{4.0.0: 94-M2}

LB5~~LB 3b~~ Treat CR followed by LF, as well as CR, LF, and NL as hard line breaks.

382.1.2/4

{4.0.0: 94-M2}

Moved from §383. CR × LF

382.2/3.1

Moved from §385. CR !

382.3/3.1

Moved from §384. LF !

383/4

{4.0.0: 94-M2}

This paragraph was moved to §382.1.2. ~~CR × LF~~

384/3.1

This paragraph was moved to §382.3. ~~LF !~~

385/3.1

This paragraph was moved to §382.2. ~~CR !~~

386/3.1

This paragraph was moved to §382.1. ~~BK !~~

386.1/4

{4.0.0: 94-M2}

NL !

386.2/15.1

{15.1.0: 173-C29, 173-A128; L2/22-229R,L2/22-234R2}

Note: When displaying source code, failing to support all forms of the new line function can have security implications; for instance, executable code can appear commented out. It is therefore strongly recommended that source code editors support the VT character within the BK class, and support the NEL character within the NL class, even though that support is not required for conformance. See Unicode Technical Standard #55, Unicode Source Code Handling [UTS55].

387/4~~/4.1~~/5

{4.0.0: 94-M2}

LB6~~LB 3c~~ Do not~~Don’t~~ break before hard line breaks.

388/4

{4.0.0: 94-M2}

× ( BK | CR | LF | NL )
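
Rules LB4 through LB6 can be sketched as a single pass over a sequence of already resolved line breaking classes. The function `hard_break_actions` is a hypothetical name, and positions that these three rules leave undecided are reported as `None` so that later rules could fill them in.

```python
HARD_BREAK_CLASSES = ("BK", "CR", "LF", "NL")


def hard_break_actions(classes):
    """Return the action between classes[i] and classes[i + 1] for each i."""
    actions = []
    for before, after in zip(classes, classes[1:]):
        if before == "BK":
            actions.append("!")   # LB4: BK !
        elif before == "CR" and after == "LF":
            actions.append("×")   # LB5: CR × LF
        elif before in ("CR", "LF", "NL"):
            actions.append("!")   # LB5: CR !, LF !, NL !
        elif after in HARD_BREAK_CLASSES:
            actions.append("×")   # LB6: × ( BK | CR | LF | NL )
        else:
            actions.append(None)  # left to subsequent rules
    return actions


crlf = hard_break_actions(["AL", "CR", "LF", "AL"])
double_bk = hard_break_actions(["AL", "BK", "BK", "AL"])
```

In the first example the CR LF pair stays together and the only break falls after LF; in the second, each BK ends a line of its own.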

388.1/4~~/4.1~~/5

{4.0.0: 94-M2}

This paragraph was moved to §381.1. ~~Note: A hard line break can consist of BK or a New Line Function (NLF) as described in~~ in ~~Section 5.8 Newline Guidelines of [Unicode]. These three rules are designed to handle the line ending and line separating characters as described there.~~.

389

Explicit breaks and non-breaks:

390~~/4.1~~/5

LB7~~LB 4~~ Do not~~Don’t~~ break before spaces or zero width space.

392.a

{3.0.0: 175-A67}

Ramification: Lines do not start with spaces, except after a hard line break or at the start of text.

392.b

{3.0.0: 175-A67}

Ramification: A sequence of spaces is unbreakable; a prohibited break after X is expressed in subsequent rules by disallowing the break after any spaces following X (X SP* ×), and a prohibited break before X by disallowing the break before X (× X).

393/5/6

{6.0.0: 121-C5}

LB8~~LB 5~~ Break before any character following a~~after~~ zero width space, even if one or more spaces intervene.

394/6

{6.0.0: 121-C5}

ZW SP* ÷

394.a/6

{6.0.0: 121-C5}

Ramification: This rule requires extended context.

394.b/6

{6.0.0: 121-C5}

Reason: The zero width space is a hint to the line breaking algorithm, hinting a break. Its inverse is the word joiner, see LB11. When they contradict each other, the zero width space wins. However, this rule needs to be more complicated than LB11: if it were simply ZW ÷ before LB7, it would allow for spaces at the beginning of a line. Instead it acts through any sequence of spaces following it.
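
The interaction of LB7 and LB8 can be traced with a small Python sketch. The function `zw_actions` is a hypothetical name; it assumes the input contains no hard line breaks (LB4 through LB6 are omitted) and reports positions decided by neither rule as `None`.

```python
def zw_actions(classes):
    """Return the LB7/LB8 action between classes[i] and classes[i + 1]."""
    actions = []
    in_zw_run = False  # True while the preceding context matches ZW SP*
    for before, after in zip(classes, classes[1:]):
        if before == "ZW":
            in_zw_run = True
        elif before != "SP":
            in_zw_run = False
        if after in ("SP", "ZW"):
            actions.append("×")   # LB7: no break before SP or ZW
        elif in_zw_run:
            actions.append("÷")   # LB8: ZW SP* ÷
        else:
            actions.append(None)  # left to subsequent rules
    return actions


through_spaces = zw_actions(["AL", "ZW", "SP", "SP", "AL"])
no_zw = zw_actions(["AL", "SP", "AL"])
```

The flag `in_zw_run` is the extended context mentioned in the ramification: the decision after the last space depends on a ZW that may be several characters back.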

394.1/4/4.1

This paragraph was deleted. Conjoining Jamo:

394.1.1~~/5.2~~/6

{6.0.0: 121-C5}

This paragraph was deleted. Note: The break opportunities produced by LB8 differ in certain cases from those produced by the pair table included in Section 7, Pair Table-Based Implementation. The differences occur with sequences like ZW SP CL. The inconsistencies will be addressed in the next revision of this document.

394.1.2/9/11

{9.0.0: 146-A46, 147-C26} {11.0.0: 149-A53, 155-A27, 155-C14, 155-A112; L2/17-074}

LB8a Do not break after~~between~~ a zero width joiner. ~~and an ideograph, emoji base or emoji modifier.~~

394.1.3/9/11

{9.0.0: 146-A46, 147-C26} {11.0.0: 149-A53, 155-A27, 155-C14, 155-A112; L2/17-074}

ZWJ × ~~(ID | EB | EM)~~

394.1.4/9~~/10~~/11

{9.0.0: 146-A46, 147-C26} {10.0.0: 150-A58, 150-C22; L2/16-315R} {11.0.0: 149-A53, 155-A27, 155-C14, 155-A112; L2/17-074}

A ZWJ will prevent breaks between most pairs of characters. This behavior is used to prevent~~rule prevents~~ breaks within ~~most~~ emoji zwj sequences.~~, as defined by ED-16. emoji zwj sequence in [UTS51].~~

394.1.5/9~~/10~~/11

{9.0.0: 146-A46, 147-C26} {10.0.0: 150-A58, 150-C22; L2/16-315R} {11.0.0: 149-A53, 155-A27, 155-C14, 155-A112; L2/17-074}

This paragraph was deleted. ~~Further customization of this rule may be necessary for best behavior of emoji zwj sequences, using [data planned for inclusion in~~ ~~CLDR].~~ ~~Version 30.~~

394.2/4/4.1

This paragraph was split into §§459.5, 459.10, and 459.15. Conjoining Jamo form Korean Syllable Blocks. Such blocks are effectively treated as if they were Hangul Syllables; no breaks can occur in the middle of a syllable block. The effective line breaking class for the syllable block should match the line breaking class for Hangul Syllables, which is ID by default, but is often tailored to AL, see Section 8.

394.2.1~~/4.1~~/5

Combining marks~~Marks~~:

394.3/4~~/4.1~~/5

{4.0.0: 94-C6} {4.1.0: 100-C40}

Parts of this paragraph were split into §§459.6 and 459.11. This paragraph was deleted. ~~LB 6 [replaced by 18b and 18c].Don’t break a Korean Syllable Block, and treat it as a single unit of the same LB class as a Hangul Syllable in all the following rules~~

394.4/4/4.1

{4.0.0: 94-C6} {4.1.0: 100-C40}

This paragraph was deleted. ~~Treat a Korean Syllable block as if it were ID~~

394.5/4/4.1

{4.0.0: 94-C6} {4.1.0: 100-C40}

This paragraph was deleted. ~~See the Unicode Standard Annex #29 [Boundaries] for rules regarding Korean Syllable Blocks.~~

395/4.1

{4.1.0: 100-C40}

This paragraph was deleted. Combining Marks:

396~~/3.1~~/4

{4.0.0: 94-C6}

This paragraph was moved to §401.1. ~~At any possible break opportunity between CM and a following character, CM behaves as if it had the type of its base character.~~ ~~If there is no base, the CM behaves like AL.~~ ~~Virama are treated as CM so they work correctly. Nonand non-initial Jamo are treated as CM, so each syllable inherits the linebreak class of the~~ ~~and~~ ~~initial Jamo and no breaks can occur in the middle of a syllable.are merged with class ID so they work correctly.~~

396.1~~/3.1~~/4~~/4.1~~/5

{4.1.0: 100-M2}

Moved from §400. This paragraph was deleted. ~~LB 7a~~7 ~~[deprecated].~~In all of the following rules, if a space is the base character for a combining mark, the space is changed to type ID. In other words, break before SP CM* in the same cases as one would break before an ID.~~AL.~~

396.2/4/4.1

{4.1.0: 100-M2}

Moved from §401. This paragraph was deleted. ~~Treat SP CM* as if it were ID~~

396.3~~/3.1/3.2~~/4/4.1

{4.1.0: 100-M2}

Moved from §399.1. This paragraph was deleted. ~~As stated in [Unicode], Sectionsection~~ ~~7.7 Combining Marks9 of The Unicode Standard, Version 3.0 [U3.0]~~, combining characters are shown in isolation by applying them to either U+0020 SPACE (SP) or U+00A0 NO- BREAK SPACE (NBSP). The visual appearance is the same, but the line breaking result is different. Correspondingly, if there is no base, or if the base character is SP, CM* or SP CM* behave like ID.

396.4~~/4.1~~/5

{4.1.0: 100-M2}

See also Section 9.2~~8.3~~, Legacy Support for Space Character as Base for Combining Marks.

397/4~~/4.1~~/5/9

{4.0.0: 94-C6} {9.0.0: 146-A46, 147-C26}

LB9~~LB 7b~~ Do not~~Don’t~~ break a combining character sequence; treat it as if it has the line breaking class of the base character in all of the following rules. Treat ZWJ as if it were CM.~~marks, around virama or on sequences of conjoining Jamos.~~

398~~/4.1~~/9

{9.0.0: 146-A46, 147-C26}

Treat X (CM | ZWJ)* as if it were X.

399~~/3.1~~/4

This paragraph was deleted. ~~(See the Unicode Standard [U3.0] for other rules regarding graphemes.)~~

399.1~~/3.1/3.2~~/4

This paragraph was moved to §396.3. As stated in section 7.9 of The Unicode Standard, Version 3.0 [U3.0], combining characters are shown in isolation by applying them to either U+0020 SPACE (SP) or U+00A0 NO BREAK SPACE (NBSP). The visual appearance is the same, but the line breaking result is different. Correspondingly, if there is no base, or if the base character is SP, CM* or SP CM* behave like ID.

400~~/3.1~~/4

This paragraph was moved to §396.1. LB 7 In all of the following rules, if a space is the base character for a combining mark, the space is changed to type ID. In other words, break before SP CM* in the same cases as one would break before an ID.~~AL.~~

400.1~~/4.1~~/5

{4.1.0: 102-C23}

where~~Where~~ X is any line break class except BK, CR, LF, NL, SP, ~~BK, CR, LF, NL~~ or ZW.

400.1.a

{3.0.0: 175-A67}

Ramification: The “do not break” part of these rules does not require extended context, but the “treat as” means that most subsequent rules implicitly have extended context across combining marks.

401/4

This paragraph was moved to §396.2. ~~Treat SP CM* as if it were ID~~

401.1~~/3.1~~/4~~/4.1/5.2~~/16

{4.0.0: 94-C6} {4.1.0: 102-C23} {16.0.0: L2/24-009R; L2/24-008#ID20231107140948}

Moved from §396. In subsequent rules, any CM or ZWJ characters affected by this rule are ignored. Note that despite the summary title of this rule, it is not limited to standard combining character sequences. For the purposes of line breaking, sequences containing most of the control codes or layout control characters are treated like combining sequences.

401.2/4/5/9

{4.0.0: 92-A64, 93-A96} {9.0.0: 146-A46, 147-C26}

LB10~~LB 7c~~ Treat any remaining combining mark or ZWJ as AL.

401.3/4~~/4.1~~/9/16

{4.0.0: 92-A64, 93-A96} {9.0.0: 146-A46, 147-C26} {16.0.0: 180-C18, 180-A57; L2/24-162}

Treat any remaining CM or ZWJ as if it had the properties of U+0041 A LATIN CAPITAL LETTER A, that is, Line_Break=~~if were~~ AL, General_Category=Lu, East_Asian_Width=Na, Extended_Pictographic=N.

401.4/4~~/4.0.1/4.1~~/5

{4.0.0: 92-A64, 93-A96} {4.1.0: 102-C23}

This catches the case where a CM is the first character on the line, or follows SP, BK, CR, LF, NL, or ZW.~~. However, since combining marks are most commonly applied to characters of class AL, rule 7c alone generally produces acceptablecorrect~~ ~~results even in implementations that do not explicitly supportsituations where~~ ~~7a and 7b.~~ ~~cannot be supported.~~
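The interplay of LB9 and LB10 can be sketched as a preprocessing pass. This is only an illustrative sketch in Python, assuming the input is a list of line break class names with one entry per code point; the names `resolve_marks`, `NO_ATTACH` and `MARKS` are hypothetical and do not come from the annex.

```python
# Hypothetical preprocessing pass for LB9 and LB10 (illustrative sketch only).
# Input: a list of line break class names, one per code point.

# Classes that cannot act as the base X of a following CM or ZWJ.
NO_ATTACH = {"BK", "CR", "LF", "NL", "SP", "ZW"}
MARKS = {"CM", "ZWJ"}


def resolve_marks(classes):
    """Drop marks absorbed by a base (LB9); turn remaining marks into AL (LB10)."""
    resolved = []
    for cls in classes:
        if cls in MARKS:
            if resolved and resolved[-1] not in NO_ATTACH:
                continue  # LB9: the mark is ignored in subsequent rules
            cls = "AL"    # LB10: no usable base, so treat the mark as AL
        resolved.append(cls)
    return resolved
```

A real implementation would also need to remember which original positions were collapsed, so that break opportunities can be mapped back to the text; that bookkeeping is omitted here.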

401.4.a/11

{11.0.0: 149-A53, 155-A27, 155-C14, 155-A112; L2/17-074}

Discussion: Absent tailoring, this rule has no effect in the case of ZWJ. The breaks on both sides of ZWJ have already been resolved, except in SP ZWJ, which is resolved by LB18 at the latest, without either AL or ZWJ being involved as extended context. The rule is written like this for consistency with combining marks following LB9.

401.4.b/11

{11.0.0: 149-A53, 155-A27, 155-C14, 155-A112; L2/17-074}

Discussion: In contrast to SP CM which is either deprecated or anomalous, SP ZWJ can occur in practice, and SP ÷ ZWJ is desired; a leading ZWJ can be used to force a leading medial or final form, such as this final alif: ‍ا (contrast the isolated ا).

401.5/4~~/4.1~~/5

Moved from §414.3. Word joiner:Joiner:Non-breaking characters:

401.6~~/3.2~~/4~~/4.1~~/5

{3.2.0: 81-M6, 85-M7; L2/00-258} {4.0.0: 92-A64, 93-A96} {4.1.0: 94-M3}

Moved from §414.4. LB11~~LB 11b~~13 Do not~~Don’t~~ break before or after Word joiner~~NBSP,~~ or ~~WORD JOINER~~ and related characters.~~ZWNBSP~~

401.7~~/4.1~~/5

{4.1.0: 94-M3}

Moved from §414.7. × WJ

401.8~~/4.1~~/5

{4.1.0: 94-M3}

Moved from §414.8. WJ ×

401.8.a/6.1

{6.1.0: 125-A99; L2/11-141R}

Ramification: Since this is not a “treat as” rule, the WJ remains in the sequence for subsequent rules to see. In the presence of rules that require extended context, this means that introducing a WJ can paradoxically create break opportunities. For instance, LB21 and LB21a yield HL × HY × AL, but LB21a does not apply in HL × WJ × HY ÷ AL.
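The paradox described above can be reproduced with a deliberately tiny evaluator. The following Python sketch implements only LB11, LB21, a simplified LB21a (without the East Asian exclusion) and LB31; the function name `breaks` is hypothetical, and all other rules are omitted.

```python
def breaks(classes):
    """Return "×" or "÷" for each position between adjacent classes."""
    out = []
    for i in range(1, len(classes)):
        before, after = classes[i - 1], classes[i]
        if before == "WJ" or after == "WJ":
            out.append("×")  # LB11: × WJ and WJ ×
        elif after in ("BA", "HY", "NS") or before == "BB":
            out.append("×")  # LB21
        elif (i >= 2 and classes[i - 2] == "HL"
              and before in ("HY", "BA") and after != "HL"):
            out.append("×")  # LB21a (simplified): HL (HY | BA) × [^HL]
        else:
            out.append("÷")  # LB31: break everywhere else
    return out
```

Because the WJ stays in the sequence, the class two positions before AL is WJ rather than HL, so the LB21a pattern no longer matches and the position falls through to LB31.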

401.9~~/4.1~~/5

Moved from §417.2. Non-breaking characters:

401.10~~/3.2~~/4~~/4.1~~/5/5.1

{3.2.0: 81-M6, 85-M7; L2/00-258} {4.0.0: 92-A64, 93-A96} {4.1.0: 94-M3} {5.1.0: 110-C17}

Moved from §417.3. Part of this paragraph was moved to §401.17. LB12~~LB 13~~ Do not~~11b~~13 ~~Don’t~~ break ~~before or~~ after NBSP, or ~~WORD JOINER~~ and related characters.~~ZWNBSP~~

401.11/5/5.1

{4.0.0: 92-A64, 93-A96} {4.1.0: 94-M3} {5.1.0: 110-C17}

Moved from §417.4. This paragraph was moved to §401.18. ~~[^SP] × GL~~

401.12/5

{4.0.0: 92-A64, 93-A96} {4.1.0: 94-M3}

Moved from §417.5. GL ×

401.13/5/5.1

{5.1.0: 110-C17}

This paragraph was moved to §401.19. Unlike the case for WJ, inserting a SP overrides the non-breaking nature of a GL. The expression [^SP] designates any line break class other than SP. The symbol ^ is used, instead of !, to avoid confusion with the use of ! to indicate an explicit break.

401.14/5

6.2 Tailorable Line Breaking Rules

401.15/5~~/5.1~~/5.2

{5.1.0: 110-C17}

The following rules and the classes referenced in them provide a reasonable default set of line break opportunities. Implementations should~~SHOULD~~ implement them unless alternate approaches produce better results for some classes of text or applications. When using alternative rules or algorithms, implementations must ensure that the mandatory breaks, break opportunities and non-break positions determined~~can be tailored~~ by the algorithm and rules of~~a conformant implementations; see~~ Section 6.1, Non-tailorable Line Breaking Rules, are preserved. See Section 4, Conformance.

401.16/5.1

{5.1.0: 110-C17}

Non-breaking characters:

401.17~~/3.2~~/4~~/4.1~~/5~~/5.1~~/5.2

{3.2.0: 81-M6, 85-M7; L2/00-258} {4.0.0: 92-A64, 93-A96} {4.1.0: 94-M3} {5.1.0: 110-C17}

Split from §401.10. LB12a~~LB12LB 13~~ Do not~~11b~~13 ~~Don’t~~ break before ~~or after~~ NBSP, or ~~WORD JOINER~~ and related characters, except after spaces and hyphens.~~ZWNBSP~~

401.18/5/5.1

{4.0.0: 92-A64, 93-A96} {4.1.0: 94-M3} {5.1.0: 110-C17}

Moved from §401.11. [^SP BA HY] × GL

401.19/5/5.1

{5.1.0: 110-C17}

Moved from §401.13. ~~Unlike the case for WJ, inserting a SP overrides the non-breaking nature of a GL.~~ The expression [^SP, BA, HY] designates any line break class other than SP, BA or HY. The symbol ^ is used, instead of !, to avoid confusion with the use of ! to indicate an explicit break. Unlike the case for WJ, inserting a SP overrides the non-breaking nature of a GL. Allowing a break after BA or HY matches widespread implementation practice and supports a common way of handling special line breaking of explicit hyphens, such as in Polish and Portuguese. See Section 5.3, Use of Hyphen.
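As a rough illustration, LB12 and LB12a reduce to a check on a single pair of classes. The Python sketch below assumes class names as plain strings; the function name `lb12_lb12a` is hypothetical.

```python
def lb12_lb12a(before, after):
    """Return "×" if LB12 or LB12a forbids a break between the classes, else None."""
    if before == "GL":
        return "×"  # LB12: GL ×
    if after == "GL" and before not in {"SP", "BA", "HY"}:
        return "×"  # LB12a: [^SP BA HY] × GL
    return None     # left for later rules to decide
```

Returning `None` rather than a break leaves SP GL, BA GL and HY GL to be resolved by later rules such as LB18 and LB31.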

402

Opening and closing:

403~~/4.1~~/5/5.1

{5.1.0: 110-C17}

These have special behavior with respect to spaces, and therefore~~so~~ come before rule LB18.~~19.12.~~

404~~/3.1/4.1~~/5/16

{16.0.0: 179-C35, 179-A116}

LB13~~LB 8~~ Do not~~Don’t~~ break before ‘]’ or ‘!’ or ~~‘;’ or~~ ‘/’ ~~or ‘,’ or ‘]’~~, even after spaces.

405.1/5.2

{5.2.0: 114-A86, 120-M1}

× CP

406.1~~/3.1~~/16

{16.0.0: 179-C35, 179-A116}

Moved from §408. This paragraph was moved to §412.9. ~~× IS~~

408/3.1

This paragraph was moved to §406.1. ~~× IS~~

408.a

{3.0.0: 175-A67}

Reason: × EX and × IS accommodate French typographical conventions in cases where a normal space (rather than NBSP or NNBSP, class GL) is used before the exclamation or question marks, or the colon and semicolon. × CL likewise caters to French « quotation marks » if QU has been resolved for French. See [Suign98].

409~~/4.1~~/5

LB14~~LB 9~~ Do not~~Don’t~~ break after ‘[’, even after spaces.

410.a

{3.0.0: 175-A67}

Ramification: This rule requires extended context.

411~~/4.1~~/5/15.1

{15.1.0: 175-C23, 175-A71; L2/23-063}

This paragraph was deleted. ~~LB15LB 10~~ ~~Do notDon’t~~ ~~break within ‘”[’,~~ , ~~even with intervening spaces.~~

412/15.1

{15.1.0: 175-C23, 175-A71; L2/23-063}

This paragraph was deleted. ~~QU SP* × OP~~

412.a

{3.0.0: 175-A67}

Ramification: This rule requires extended context.

412.1~~/5.1~~/15.1

{5.1.0: } {15.1.0: 175-C23, 175-A71; L2/23-063}

This paragraph was deleted. ~~For more information on this rule, see the note in the description for the QU class.~~

412.2/15.1

{15.1.0: 175-C23, 175-A71; L2/23-063}

LB15a Do not break after an unresolved initial punctuation that lies at the start of the line, after a space, after opening punctuation, or after an unresolved quotation mark, even after spaces.

412.3/15.1

{15.1.0: 175-C23, 175-A71; L2/23-063}

(sot | BK | CR | LF | NL | OP | QU | GL | SP | ZW) [\p{Pi}&QU] SP* ×

412.3.a/15.1

{15.1.0: 175-C23, 175-A71; L2/23-063}

Ramification: This rule requires extended context.
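The extended context of LB15a can be sketched as a backward scan. This Python sketch assumes the text and a parallel list of line break classes are both available, uses the standard `unicodedata.category` for the \p{Pi} test, and ignores combining marks; `lb15a_forbids` and `LB15A_BEFORE` are hypothetical names.

```python
import unicodedata

# Classes allowed before the initial quotation mark (sot is handled separately).
LB15A_BEFORE = {"BK", "CR", "LF", "NL", "OP", "QU", "GL", "SP", "ZW"}


def lb15a_forbids(text, classes, i):
    """True if LB15a forbids a break before index i (0 < i < len(text))."""
    j = i - 1
    while j >= 0 and classes[j] == "SP":
        j -= 1  # skip the SP* run before the break position
    if j < 0 or classes[j] != "QU" or unicodedata.category(text[j]) != "Pi":
        return False  # no [\p{Pi}&QU] immediately before the spaces
    return j == 0 or classes[j - 1] in LB15A_BEFORE
```

With the text `« a` the position before `a` is protected, whereas in `a« b` the quotation mark directly follows a letter and the rule does not apply.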

412.4/15.1

{15.1.0: 175-C23, 175-A71; L2/23-063}

LB15b Do not break before an unresolved final punctuation that lies at the end of the line, before a space, before a prohibited break, or before an unresolved quotation mark, even after spaces.

412.5/15.1

{15.1.0: 175-C23, 175-A71; L2/23-063}

× [\p{Pf}&QU] ( SP | GL | WJ | CL | QU | CP | EX | IS | SY | BK | CR | LF | NL | ZW | eot)

412.5.a/15.1

{15.1.0: 175-C23, 175-A71; L2/23-063}

Ramification: This rule requires extended context after the break.

412.5.b/15.1

{15.1.0: 175-C23, 175-A71; L2/23-063}

Reason: In some typographic traditions, such as German or Swedish, initial punctuation can be closing, and final punctuation can be opening, „like this“, »like that«, or ”like that”. In others, such as French and Vietnamese, opening and closing quotation marks are separated from their contents by spaces, « like this ». These inner spaces must not be broken. Crucially, these two sets do not intersect (no-one does » this «), so an “isolated” quotation mark is likely one from the French tradition.

412.5.c/15.1

{15.1.0: 175-C23, 175-A71; L2/23-063}

Besides hard line breaks and spaces, the alternatives in LB15a encompass characters that may be expected to occur before a quotation, such as opening parentheses (« like this »), or other quotation marks (for „« nested » quotations”).

412.5.d/15.1

{15.1.0: 175-C23, 175-A71; L2/23-063}

ZW is also included, for two reasons. One is technical: some major state-machine based implementations are incapable of considering context across break opportunities, so that the position following ZW is indistinguishable from sot for them. The other is semantic: ZW is an overriding control character that creates a break opportunity; it is similar to a hard line break, and is a strong signal that a line break can be expected at this position, from which it follows that the preceding character is unlikely to be an opening quotation mark, and the following character is unlikely to be a closing quotation mark.

412.5.e/15.1

{15.1.0: 175-C23, 175-A71; L2/23-063}

The alternatives in LB15b comprise the closing counterparts, as well as classes encompassing terminal punctuation, for the case of quotations before « commas », or before « full stops ».

412.5.f/15.1

{15.1.0: 175-C23, 175-A71; L2/23-063}

Ramification: When text starting with a full stop is quoted within German text using »this style« of quotation mark or within Swedish text, if the quotation marks are not resolved, the algorithm fails to allow breaks that should be permitted, as before the » quotation mark in “12,7 × 99 mm NATO, auch ».50 BMG«”.

412.6/16

{16.0.0: 179-C35, 179-A116}

LB15c Break before a decimal mark that follows a space, for instance, in ‘subtract .5’.

412.7/16

{16.0.0: 179-C35, 179-A116}

SP ÷ IS NU

412.8/16

{16.0.0: 179-C35, 179-A116}

LB15d Otherwise, do not break before ‘;’, ‘,’, or ‘.’, even after spaces.

412.9/16

{16.0.0: 179-C35, 179-A116}

Moved from §406.1. × IS
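The ordering of LB15c before LB15d matters, as the following Python sketch tries to show. It assumes a list of class names and looks one class ahead; the function name `lb15c_lb15d` is hypothetical.

```python
def lb15c_lb15d(classes, i):
    """Resolve the position before classes[i] if it is IS; return None otherwise."""
    if classes[i] != "IS":
        return None
    followed_by_digit = i + 1 < len(classes) and classes[i + 1] == "NU"
    if i > 0 and classes[i - 1] == "SP" and followed_by_digit:
        return "÷"  # LB15c: SP ÷ IS NU, as in 'subtract .5'
    return "×"      # LB15d: × IS
```

For 'subtract .5' (AL SP IS NU) the sketch yields a break before the decimal mark; for a full stop after a space that is not followed by a digit, the break remains prohibited.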

413~~/4.1~~/5/5.1

LB16~~LB 11~~ Do not~~Don’t~~ break between closing punctuation and a nonstarter (lb=NS)~~within ‘]h’~~, even with intervening spaces.

414/5.2

{5.2.0: 114-A86, 120-M1}

(CL | CP) SP* × NS

414.a

{3.0.0: 175-A67}

Ramification: This rule requires extended context.

414.1~~/3.1/4.1~~/5

{3.1.0: }

LB17~~LB 11a~~ Do not~~Don’t~~ break within ‘——’, even with intervening spaces.

414.2~~/3.1~~/4.1

{3.1.0: }

B2 SP* × B2

414.2.a/4.1

{4.1.0: }

Ramification: This rule requires extended context.
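LB16 and LB17 share the shape `left SP* × right`, which can be sketched with one helper. In this Python sketch, `sp_star_forbids`, `lb16` and `lb17` are hypothetical names, and the input is assumed to be a list of class names.

```python
def sp_star_forbids(classes, i, left, right):
    """True if a rule of the form `left SP* × right` forbids a break before index i."""
    if classes[i] not in right:
        return False
    j = i - 1
    while j >= 0 and classes[j] == "SP":
        j -= 1  # look past any intervening spaces
    return j >= 0 and classes[j] in left


def lb16(classes, i):
    return sp_star_forbids(classes, i, {"CL", "CP"}, {"NS"})  # (CL | CP) SP* × NS


def lb17(classes, i):
    return sp_star_forbids(classes, i, {"B2"}, {"B2"})        # B2 SP* × B2
```

The backward scan over spaces is what makes these rules require extended context: the decision at position i depends on a class arbitrarily far to the left.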

414.3/4~~/4.1~~/5

This paragraph was moved to §401.5. Word Joiner:Non-breaking characters:

414.4~~/3.2~~/4~~/4.1~~/5

{3.2.0: 81-M6, 85-M7; L2/00-258} {4.0.0: 92-A64, 93-A96} {4.1.0: 94-M3}

Moved from §419. Part of this paragraph was moved to §417.3. This paragraph was moved to §401.6. ~~LB 11b~~13 ~~Do notDon’t~~ ~~break before or after~~ ~~NBSP,~~ or ~~WORD JOINER and related characters.ZWNBSP~~

414.5/4/4.1

{4.0.0: 92-A64, 93-A96} {4.1.0: 94-M3}

Moved from §420. This paragraph was moved to §417.4. ~~× GL~~

414.6/4/4.1

{4.0.0: 92-A64, 93-A96} {4.1.0: 94-M3}

Moved from §421. This paragraph was moved to §417.5. ~~GL ×~~

414.7~~/4.1~~/5

{4.1.0: 94-M3}

This paragraph was moved to §401.7. ~~× WJ~~

414.8~~/4.1~~/5

{4.1.0: 94-M3}

This paragraph was moved to §401.8. ~~WJ ×~~

416~~/4.1~~/5

LB18~~LB 12~~ Break after spaces.

417.1/4/4.1

This paragraph was deleted. ~~Many existing implementations reverse the order of precedence between rules LB11b and LB12.~~

417.2~~/4.1~~/5

This paragraph was moved to §401.9. Non-breaking characters:

417.3~~/3.2~~/4~~/4.1~~/5

{3.2.0: 81-M6, 85-M7; L2/00-258} {4.0.0: 92-A64, 93-A96} {4.1.0: 94-M3}

Split from §414.4. This paragraph was moved to §401.10. ~~LB 13 Do not11b~~13 ~~Don’t~~ ~~break before or after NBSP~~, or ~~WORD JOINER~~ ~~and related characters.ZWNBSP~~

417.4~~/4.1~~/5

{4.0.0: 92-A64, 93-A96} {4.1.0: 94-M3}

Moved from §414.5. This paragraph was moved to §401.11. ~~× GL~~

417.5~~/4.1~~/5

{4.0.0: 92-A64, 93-A96} {4.1.0: 94-M3}

Moved from §414.6. This paragraph was moved to §401.12. ~~GL ×~~

418

Special case rules:

419~~/3.2~~/4

{3.2.0: 81-M6, 85-M7; L2/00-258}

This paragraph was moved to §414.4. ~~LB 13 Don’t break before or after NBSP or WORD JOINERZWNBSP~~

420/4

This paragraph was moved to §414.5. ~~× GL~~

421/4

This paragraph was moved to §414.6. ~~GL ×~~

422~~/4.1~~/5/16

{16.0.0: 179-C28, 179-A102}

LB19~~LB 14~~ Do not~~Don’t~~ break before non-initial unresolved quotation marks, such as ‘ ” ’ or ‘ " ’, nor after non-final unresolved quotation marks, such as ‘ “ ’ or ‘ " ’.

423/16

{16.0.0: 179-C28, 179-A102}

× [ QU - \p{Pi} ]

424/16

{16.0.0: 179-C28, 179-A102}

[ QU - \p{Pf} ] ×

424.0.1/16

{16.0.0: 179-C28, 179-A102}

LB19a Unless surrounded by East Asian characters, do not break either side of any unresolved quotation marks.

424.0.2/16

{16.0.0: 179-C28, 179-A102}

[^$EastAsian] × QU

424.0.3/16

{16.0.0: 179-C28, 179-A102}

× QU ( [^$EastAsian] | eot )

424.0.4/16

{16.0.0: 179-C28, 179-A102}

QU × [^$EastAsian]

424.0.5/16

{16.0.0: 179-C28, 179-A102}

( sot | [^$EastAsian] ) QU ×

424.0.5.a/16

{16.0.0: 179-C28, 179-A102}

Reason: In some typographic traditions, such as Swedish and German, final punctuation can be opening and initial punctuation can be closing, ”like this” and „like that“. However, this is not the case in East Asian typographic traditions where (in particular in Simplified Chinese) the lb=QU “” and ‘’ are used. At the same time, these East Asian traditions benefit from classifying the quotation marks as opening and closing, as, contrary to the Western case, they do not separate the quotation marks from surrounding words by spaces, so that the breaks outside of quotation marks are not supplied by LB18. Thus, if the context is East Asian, we should treat initial punctuation as opening and final punctuation as closing. Otherwise, we need to be cautious and disallow breaks on either side.

424.0.5.b/16

{16.0.0: 179-C28, 179-A102}

Having East Asian characters on one side is not enough to establish an East Asian context, as, e.g., a Chinese word could be quoted inside of German text. For instance, the ‘ quotation mark must not be considered to be in East Asian context in the following, as this would incorrectly allow a break inside the closing quotation mark: Anmerkung: „White“ bzw. ‚白人‘ – in der Amtlichen Statistik

424.0.5.c/16

{16.0.0: 179-C28, 179-A102}

Ramification: When non-East Asian text is quoted within Simplified Chinese text and the quotation marks U+2018 and U+2019 are not resolved, the algorithm fails to allow breaks that should be permitted, as outside the “” quotation marks in 2000年获得了《IGN》的“Best Game Boy Strategy”奖。 In that example, breaks are correctly permitted outside 《》 because these are unambiguous, lb=OP and lb=CL.

424.1/4~~/4.1~~/5

{4.0.0: 92-A64, 93-A96}

LB20~~LB 14a~~ Break before and after unresolved CB.

424.2/4

{4.0.0: 92-A64, 93-A96}

÷ CB

424.3/4

{4.0.0: 92-A64, 93-A96}

CB ÷

424.4/4/4.0.1

{4.0.0: 92-A64, 93-A96}

Conditional breaks should be resolved external to the line breaking rules. However, the default action is to treat unresolved CB as breaking before and after.

424.5/16

{16.0.0: 179-C32, 179-A111}

LB20a Do not break after a word-initial hyphen.

424.6/16

{16.0.0: 179-C32, 179-A111}

( sot | BK | CR | LF | NL | SP | ZW | CB | GL ) ( HY | [\u2010] ) × AL

424.7/16

{16.0.0: 179-C32, 179-A111}

Note: In the above regular expression, the class [\u2010] contains the single character U+2010 HYPHEN.

424.7.a/16

{16.0.0: 179-C32, 179-A111}

Reason: Originally added as a Finnish tailoring in CLDR-3029, with the example “Mac Pro -tietokone”. It does not interact adversely with other languages, and indeed seems generally beneficial, so it was made into a root tailoring in ICU-8151 and then upstreamed to The Unicode Standard.
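A possible reading of LB20a in code, given as a Python sketch only. It assumes the text and a parallel list of classes, and tests U+2010 by character because that character is not of class HY; `lb20a_forbids` and `WORD_START_CONTEXT` are hypothetical names.

```python
HYPHEN = "\u2010"  # U+2010 HYPHEN, matched by character rather than by class
WORD_START_CONTEXT = {"BK", "CR", "LF", "NL", "SP", "ZW", "CB", "GL"}


def lb20a_forbids(text, classes, i):
    """True if LB20a forbids a break before index i (0 < i < len(text))."""
    if classes[i] != "AL":
        return False
    h = i - 1  # index of the candidate hyphen
    if classes[h] != "HY" and text[h] != HYPHEN:
        return False
    # The hyphen must be word-initial: at sot or after one of the listed classes.
    return h == 0 or classes[h - 1] in WORD_START_CONTEXT
```

In “Mac Pro -tietokone” the hyphen follows a space, so the break after it is forbidden; in `a-b` the hyphen follows a letter and the rule does not apply.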

425~~/3.1/4.1~~/5

LB21~~LB 15~~ Do not~~Don’t~~ break before ~~small kana and other non starters,~~ hyphen-minus, other hyphens, fixed-width spaces, small kana, and other non-starters, or after acute accents.~~:~~

426/3.1

This paragraph was moved to §428.1. ~~× NS~~

426.1/3.1

Moved from §428. × BA

428/3.1

This paragraph was moved to §426.1. ~~× BA~~

428.1/3.1

Moved from §426. × NS

430/4

This paragraph was moved to §459.1. ~~LB 15b Break after hyphen-minus, and before acute accents:~~

431/4

This paragraph was moved to §459.2. ~~HY ÷~~

432/4

This paragraph was moved to §459.3. ~~÷ BB~~

432.1~~/6.1~~/8/16

{6.1.0: 125-A99; L2/11-141R} {16.0.0: 179-C25, 179-A98; PRI-335#ID20170429231648}

LB21a Do not~~Don't~~ break after the hyphen in Hebrew + Hyphen + non-Hebrew.

432.2~~/6.1~~/16

{6.1.0: 125-A99; L2/11-141R} {16.0.0: 179-C25, 179-A98; PRI-335#ID20170429231648} {16.0.0: 180-C18, 180-A57; L2/24-162}

HL (HY | [ BA - $EastAsian ]) × [^HL]

432.2.a/6.1

{6.1.0: 125-A99; L2/11-141R}

Reason: “With <hebrew hyphen non-hebrew>, there is no break on either side of the hyphen.”

432.2.b/6.1

{6.1.0: 125-A99; L2/11-141R}

Discussion: The Hebrew ICU “and” list format with a non-Hebrew last element provides an example of such a sequence: ⁧John ו-Michael⁩; with a Hebrew last word, the letter ו is prefixed to the word: יוחנן ומיכאל. See ICU-21016.

432.2.c/6.1

{6.1.0: 125-A99; L2/11-141R}

Ramification: This rule requires extended context.

432.2.d/16

{16.0.0: 179-C25, 179-A98; PRI-335#ID20170429231648}

Ramification: A break is allowed after the hyphen in Hebrew + Hyphen + Hebrew.

432.3/8/16

{8.0.0: 137-C9}

LB21b Do not~~Don’t~~ break between Solidus and Hebrew letters.

432.4/8

{8.0.0: 137-C9}

SY × HL

432.4.a/8

{8.0.0: 137-C9}

Reason: From CLDR. “Hebrew makes extensive use of the / character to create gender-neutral verb forms, with the feminine suffix coming after the slash. […] It is quite rare in Hebrew to use a slash other than in this context.” See CLDR-6116.

433~~/4.1~~/5/8/13

{8.0.0: 142-C3} {13.0.0: 142-A23, 160-A56}

LB22~~LB 16~~ Do not~~Don’t~~ break before~~between two~~ ellipses.~~, or between letters, numbers or exclamationsnumbers~~ ~~and ellipsis.~~:

433.1~~/3.1/6.1~~/13

{6.1.0: 125-A99; L2/11-141R} {13.0.0: 142-A23, 160-A56}

Moved from §436. This paragraph was deleted. (~~AL | HL) × IN~~

433.1.1/8/13

{8.0.0: 142-C3} {13.0.0: 142-A23, 160-A56}

This paragraph was deleted. ~~EX × IN~~

433.2~~/3.1~~/9/13

{9.0.0: 146-A46, 147-C26} {13.0.0: 142-A23, 160-A56}

Moved from §437. This paragraph was deleted. (~~ID | EB | EM) × IN~~

434/13

{13.0.0: 142-A23, 160-A56}

IN × IN

435/13

{13.0.0: 142-A23, 160-A56}

This paragraph was deleted. ~~NU × IN~~

436/3.1

This paragraph was moved to §433.1. ~~AL × IN~~

437/3.1

This paragraph was moved to §433.2. ~~ID × IN~~

438/5

Examples: ‘9...’~~9...’~~, ‘a...’, ‘H...’

440/4.1

Do not~~Don't~~ break alphanumerics.

441~~/4.1~~/5/9

{9.0.0: 143-A4, 146-C19}

LB23~~LB 17~~ Do not~~Don’t~~ break between digits and letters.~~within ‘a9’, ‘3a’, or ‘H%’.~~

441.1~~/3.1~~/9

{9.0.0: 143-A4, 146-C19}

Moved from §444. This paragraph was moved to §443.0.3. ~~ID × PO~~

442/6.1

{6.1.0: 125-A99; L2/11-141R}

(AL | HL) × NU

443/6.1

{6.1.0: 125-A99; L2/11-141R}

NU × (AL | HL)

443.0.1/5/9

{5.0.0: 105-C37} {5.0.0: 105-C6} {9.0.0: 146-A46, 147-C26} {9.0.0: 143-A4, 146-C19}

Split from §443.1. LB23a~~LB24~~ Do not break between numeric prefixes and ideographs,~~prefix and letters~~ or between ideographs and numeric postfixes.

443.0.2~~/3.1~~/9

{5.0.0: 105-C37} {5.0.0: 105-C6} {9.0.0: 146-A46, 147-C26} {9.0.0: 143-A4, 146-C19}

Moved from §443.2. PR × (ID | EB | EM)

443.0.3/9

{9.0.0: 146-A46, 147-C26} {9.0.0: 143-A4, 146-C19}

Moved from §441.1. (ID | EB | EM) × PO

443.1/5/9

{5.0.0: 105-C37} {5.0.0: 105-C6} {9.0.0: 143-A4, 146-C19}

Part of this paragraph was moved to §443.0.1. LB24 Do not break between numeric prefix/postfix and letters, or between letters and prefix/postfix.~~ideographs.~~

443.2~~/3.1~~/5/9

{5.0.0: 105-C37} {5.0.0: 105-C6} {9.0.0: 143-A4, 146-C19}

Moved from §448.8. This paragraph was moved to §443.0.2. ~~PR × ID~~

443.3/5~~/6.1~~/9

{5.0.0: 105-C37} {5.0.0: 105-C6} {6.1.0: 125-A99; L2/11-141R} {9.0.0: 143-A4, 146-C19}

Moved from §448.6. This paragraph was deleted. ~~PR × (AL | HL)~~

443.4/5~~/6.1~~/9

{5.0.0: 105-C37} {5.0.0: 105-C6} {6.1.0: 125-A99; L2/11-141R} {9.0.0: 143-A4, 146-C19}

(PR | PO) × (AL | HL)

443.5/9

{9.0.0: 143-A4, 146-C19}

(AL | HL) × (PR | PO)

443.5.a/9

{9.0.0: 143-A4, 146-C19}

Reason: This rule forbids breaking within currency symbols such as CA$ or JP¥, as well as stylized artist names such as “Travi$ Scott”, “Ke$ha”, “Curren$y”, and “A$AP Rocky”.
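Since LB23, LB23a and LB24 only look at adjacent classes, they can be expressed as a set of forbidden pairs. The Python sketch below builds that set from the formulas above; `NO_BREAK_PAIRS` and `forbids` are hypothetical names.

```python
from itertools import product

LETTER = ("AL", "HL")
IDEO = ("ID", "EB", "EM")
AFFIX = ("PR", "PO")

NO_BREAK_PAIRS = set()
NO_BREAK_PAIRS |= set(product(LETTER, ("NU",)))  # LB23:  (AL | HL) × NU
NO_BREAK_PAIRS |= set(product(("NU",), LETTER))  # LB23:  NU × (AL | HL)
NO_BREAK_PAIRS |= set(product(("PR",), IDEO))    # LB23a: PR × (ID | EB | EM)
NO_BREAK_PAIRS |= set(product(IDEO, ("PO",)))    # LB23a: (ID | EB | EM) × PO
NO_BREAK_PAIRS |= set(product(AFFIX, LETTER))    # LB24:  (PR | PO) × (AL | HL)
NO_BREAK_PAIRS |= set(product(LETTER, AFFIX))    # LB24:  (AL | HL) × (PR | PO)


def forbids(before, after):
    """True if LB23, LB23a or LB24 forbids a break between the two classes."""
    return (before, after) in NO_BREAK_PAIRS
```

For “Ke$ha” (AL PR AL), both positions around the dollar sign are in the set. Note the asymmetry for ideographs: PR × ID is forbidden, but ID followed by PR is not covered by these rules.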

444/3.1

This paragraph was moved to §441.1. ~~ID × PO~~

444.1~~/4.1~~/5.1

In general, it is recommended to not break lines ~~should not be broken~~ inside numbers of the form described by the following regular expression:

445~~/3.1/4.1~~/5~~/5.2~~/16

{5.0.0: 105-C37} {5.0.0: 105-C6} {5.2.0: 114-A86, 120-M1} {16.0.0: 179-C35, 179-A116}

446/16

{16.0.0: 179-C35, 179-A116}

Examples: $(12.35) 2,1234 (12)¢ 12.54¢ .50 ₹1,00,000.00 -1/12

447/4~~/4.1~~/5~~/5.1~~/16

{5.0.0: 105-C37} {5.0.0: 105-C6} {16.0.0: 179-C35, 179-A116}

The default line breaking algorithm implements~~approximates~~ this~~This is approximated~~ with the following rule~~, together with PR × AL and PR × ID, which handle numeric prefix puncutation~~. Note that some~~rules. (Some~~ cases have~~are~~ already been handled, such as ~~above, like~~ ‘9,’, ‘[9’. ~~For a tailoring that supports the regular) Regular~~ ~~expression directly, as well as a key to the notation see Section 8.2, Examples of Customization.-based line breaking engines will get better results implementing the above regular expression for numeric expressions.~~

448~~/4.1~~/5/16

{5.0.0: 105-C37} {5.0.0: 105-C6} {16.0.0: 179-C35, 179-A116}

LB25~~LB 18~~ Do not~~Don’t~~ break ~~between the following pairs of classes relevant to~~ numbers~~:~~.

448.1~~/3.1~~/16

{16.0.0: 179-C35, 179-A116}

Moved from §458. NU ( SY | IS )* CL × PO

448.1.0.1~~/5.2~~/16

{5.2.0: 114-A86, 120-M1} {16.0.0: 179-C35, 179-A116}

NU ( SY | IS )* CP × PO

448.1.1/5/16

{5.0.0: 105-C37} {5.0.0: 105-C6} {16.0.0: 179-C35, 179-A116}

NU ( SY | IS )* CL × PR

448.1.1.1~~/5.2~~/16

{5.2.0: 114-A86, 120-M1} {16.0.0: 179-C35, 179-A116}

NU ( SY | IS )* CP × PR

448.1.2/5/16

{5.0.0: 105-C37} {5.0.0: 105-C6} {16.0.0: 179-C35, 179-A116}

Moved from §448.5. NU ( SY | IS )* × PO

448.1.3/5/16

{5.0.0: 105-C37} {5.0.0: 105-C6} {16.0.0: 179-C35, 179-A116}

NU ( SY | IS )* × PR

448.1.4/5/16

{5.0.0: 105-C37} {5.0.0: 105-C6} {16.0.0: 179-C35, 179-A116}

PO × OP NU

448.1.4.1/16

{16.0.0: 179-C35, 179-A116}

PO × OP IS NU

448.1.5/5

{5.0.0: 105-C37} {5.0.0: 105-C6}

PO × NU

448.1.6/5/16

{5.0.0: 105-C37} {5.0.0: 105-C6} {16.0.0: 179-C35, 179-A116}

Moved from §451. PR × OP NU

448.1.6.1/16

{16.0.0: 179-C35, 179-A116}

PR × OP IS NU

448.1.7/5

{5.0.0: 105-C37} {5.0.0: 105-C6}

Moved from §449. PR × NU

448.2/3.1

Moved from §453. HY × NU

448.3/3.1

Moved from §456. IS × NU

448.4~~/3.1~~/16

{16.0.0: 179-C35, 179-A116}

Moved from §455. NU ( SY | IS )* × NU
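The LB25 formulas listed in this section can be gathered into one predicate. This Python sketch assumes a list of class names, covers only the LB25 formulas shown here, and uses the hypothetical names `lb25_forbids` and `_number_ends_at`.

```python
NUM_PUNCT = {"SY", "IS"}


def _number_ends_at(classes, j):
    """True if the classes up to and including index j end with NU (SY | IS)*."""
    while j >= 0 and classes[j] in NUM_PUNCT:
        j -= 1
    return j >= 0 and classes[j] == "NU"


def lb25_forbids(classes, i):
    """True if one of the LB25 formulas forbids a break before index i (i >= 1)."""
    before, after = classes[i - 1], classes[i]
    if after in {"PO", "PR"}:
        # NU (SY | IS)* (CL | CP)? × (PO | PR)
        end = i - 2 if before in {"CL", "CP"} else i - 1
        return _number_ends_at(classes, end)
    if before in {"PO", "PR"}:
        # (PO | PR) × NU, (PO | PR) × OP NU, (PO | PR) × OP IS NU
        if after == "NU":
            return True
        ahead = list(classes[i + 1:i + 3])
        return after == "OP" and (ahead[:1] == ["NU"] or ahead == ["IS", "NU"])
    if after == "NU":
        # HY × NU, IS × NU, NU (SY | IS)* × NU
        return before in {"HY", "IS"} or _number_ends_at(classes, i - 1)
    return False
```

Positions such as NU followed by IS or by CP are not decided here; earlier rules (LB15d and LB13) already prohibit those breaks.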

448.5~~/3.1~~/5

{5.0.0: 105-C37} {5.0.0: 105-C6}

Moved from §457. This paragraph was moved to §448.1.2. ~~NU × PO~~

448.6~~/3.1~~/5

{5.0.0: 105-C37} {5.0.0: 105-C6}

Moved from §450. This paragraph was moved to §443.3. ~~PR × AL~~

448.7~~/3.1~~/5

{5.0.0: 105-C37} {5.0.0: 105-C6}

Moved from §452. This paragraph was deleted. ~~PR × HY~~

448.8~~/3.1~~/5

{5.0.0: 105-C37} {5.0.0: 105-C6}

This paragraph was moved to §443.2. ~~PR × ID~~

449/5

{5.0.0: 105-C37} {5.0.0: 105-C6}

This paragraph was moved to §448.1.7. ~~PR × NU~~

450/3.1

This paragraph was moved to §448.6. ~~PR × AL~~

451/5

{5.0.0: 105-C37} {5.0.0: 105-C6}

This paragraph was moved to §448.1.6. ~~PR × OP~~

452/3.1

This paragraph was moved to §448.7. ~~PR × HY~~

453/3.1

This paragraph was moved to §448.2. ~~HY × NU~~

454/16

{16.0.0: 179-C35, 179-A116}

This paragraph was deleted. ~~SY × NU~~

455/3.1

This paragraph was moved to §448.4. ~~NU × NU~~

456/3.1

This paragraph was moved to §448.3. ~~IS × NU~~

457/3.1

This paragraph was moved to §448.5. ~~NU × PO~~

458/3.1

This paragraph was moved to §448.1. ~~CL × PO~~

459/5/16

{5.0.0: 105-C37} {5.0.0: 105-C6} {16.0.0: 179-C35, 179-A116}

This paragraph was deleted. ~~Example pairs: ‘$9’, ‘$[’, ‘$-’~~‘~~, ‘-9’, ‘/9’, ‘99’, ‘,9’, ‘9%’ ‘]%’~~

459.1/4/4.1

{4.0.0: 92-A64, 93-A96, 94-M4} {4.1.0: 100-C40}

Moved from §430. This paragraph was deleted. ~~LB 18b15b~~ ~~Break after hyphen-minus, and before acute accents:~~

459.2/4/4.1

{4.0.0: 92-A64, 93-A96, 94-M4} {4.1.0: 100-C40}

Moved from §431. This paragraph was deleted. ~~HY ÷~~

459.3/4/4.1

{4.0.0: 92-A64, 93-A96, 94-M4} {4.1.0: 100-C40}

Moved from §432. This paragraph was deleted. ~~÷ BB~~

459.4/4.1

{4.1.0: 100-C40}

Korean syllable blocks

459.5/4~~/4.1~~/5/5.2

{4.1.0: 100-C40} {5.0.0: 105-C37} {5.0.0: 105-C6} {5.2.0: 114-A86, 120-M1}

Split from §394.2. Conjoining jamos~~jamo~~, Hangul syllables, or combinations of both~~Jamo~~ form Korean Syllable Blocks. Such blocks are effectively treated as if they were Hangul syllables; no breaks can occur in the middle of a syllable~~Syllable~~ ~~Blocks. Such blocks are effectively treated as if they were Hangul syllablesSyllables; no breaks can occur in the middle of a syllable~~ block. See ~~the~~ Unicode Standard Annex #29, “Unicode : Text Segmentation~~Boundaries~~” [UAX29~~Boundaries~~], for more information on Korean Syllable Blocks.~~The effective line breaking class for the syllable block should match the line breaking class for Hangul Syllables, which is ID by default, but is often tailored to AL, see Section 8.~~

459.6/4~~/4.1~~/5

{4.0.0: 94-C6} {4.1.0: 100-C40}

Split from §394.3. LB26~~LB 18b~~ Do not~~6 Don’t~~ break a Korean syllable.~~Syllable Block, and treat it as a single unit of the same LB class as a Hangul Syllable in all the following rules~~

459.7~~/4.1~~/5

{4.1.0: 100-C40}

JL × (JL | JV | H2 | H3)

459.8~~/4.1~~/5

{4.1.0: 100-C40}

(JV | H2) × (JV | JT)

459.9~~/4.1~~/5

{4.1.0: 100-C40}

(JT | H3) × JT

459.10/4~~/4.1~~/5.1

{4.1.0: 100-C40}

Split from §394.2. where the notation (JT | H3) means JT or H3. ~~Conjoining Jamo form Korean Syllable Blocks. Such blocks are effectively treated as if they were Hangul Syllables; no breaks can occur in the middle of a syllable block.~~ The effective line breaking class for the syllable block matches~~should match~~ the line breaking class for Hangul syllables~~Syllables~~, which is ID by default. This is achieved by the following rule:~~, but is often tailored to AL, see Section 8.~~

459.11/4~~/4.1~~/5

{4.0.0: 94-C6} {4.1.0: 100-C40}

Split from §394.3. LB27~~LB 18c~~ Treat~~6 Don’t break~~ a Korean Syllable Block the same~~, and treat it~~ as ID.~~a single unit of the same LB class as a Hangul Syllable in all the following rules~~

459.12~~/4.1~~/5/14

{4.1.0: 100-C40} {14.0.0: 163-A70}

This paragraph was deleted. (~~JL | JV | JT | H2 | H3) × IN~~

459.13~~/4.1~~/5

{4.1.0: 100-C40}

(JL | JV | JT | H2 | H3) × PO

459.14~~/4.1~~/5

{4.1.0: 100-C40}

PR × (JL | JV | JT | H2 | H3)
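The LB26 and LB27 formulas can be sketched as a lookup on adjacent classes. In the Python sketch below, the table `LB26`, the set `HANGUL` and the function `korean_forbids` are hypothetical names; only the formulas shown above are covered.

```python
# For each leading class, the following classes that continue the same syllable block.
LB26 = {
    "JL": {"JL", "JV", "H2", "H3"},
    "JV": {"JV", "JT"},
    "H2": {"JV", "JT"},
    "JT": {"JT"},
    "H3": {"JT"},
}
HANGUL = {"JL", "JV", "JT", "H2", "H3"}


def korean_forbids(before, after):
    """True if LB26 or LB27 forbids a break between the two classes."""
    if after in LB26.get(before, ()):
        return True                            # LB26: inside a syllable block
    if before in HANGUL and after == "PO":
        return True                            # LB27: (JL | JV | JT | H2 | H3) × PO
    return before == "PR" and after in HANGUL  # LB27: PR × (JL | JV | JT | H2 | H3)
```

A trailing consonant followed by a leading consonant (JT JL) is not in the table, so the boundary between two syllable blocks is left to later rules.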

459.15/4~~/4.1~~/5/5.1

{4.1.0: 100-C40}

Split from §394.2. When Korean uses SPACE for line breaking, the~~these~~ classes in rule LB26, as well as ~~and~~ characters of class ID, are~~Conjoining Jamo form Korean Syllable Blocks. Such blocks are effectively treated as if they were Hangul Syllables; no breaks can occur in the middle of a syllable block. The effective line breaking class for the syllable block should match the line breaking class for Hangul Syllables, which is ID by default, but is~~ often tailored to AL;~~:,~~ see Section 8, Customization.~~Tailoring.~~

460/5

Finally, join alphabetic letters into words and break everything else.

461~~/4.1~~/5

LB28~~LB 19~~ Do not~~Don’t~~ break between alphabetics (“at”).

462/6.1

{6.1.0: 125-A99; L2/11-141R}

(AL | HL) × (AL | HL)

462.0.1/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

LB28a Do not break inside the orthographic syllables of Brahmic scripts.

462.0.2~~/15.1~~/16

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535} {16.0.0: 178-A20}

AP × (AK | [◌] | AS)

462.0.3~~/15.1~~/16

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535} {16.0.0: 178-A20}

(AK | [◌] | AS) × (VF | VI)

462.0.4~~/15.1~~/16

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535} {16.0.0: 178-A20}

(AK | [◌] | AS) VI × (AK | [◌])

462.0.5~~/15.1~~/16

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535} {16.0.0: 178-A20}

(AK | [◌] | AS) × (AK | [◌] | AS) VF

462.0.5.a/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

Ramification: This rule requires extended context after the break.

462.0.6/16

{16.0.0: 178-A20}

Note: In the above regular expressions, the class [◌] contains the single character U+25CC DOTTED CIRCLE.
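The four LB28a formulas can be sketched as follows in Python. The sketch assumes the text and a parallel list of classes (the dotted circle is matched by character), ignores combining marks, and uses the hypothetical names `lb28a_forbids`, `_is`, `BASE` and `BASE_NO_AS`.

```python
DOTTED_CIRCLE = "\u25CC"
BASE = {"AK", "AS", DOTTED_CIRCLE}  # (AK | [◌] | AS)
BASE_NO_AS = {"AK", DOTTED_CIRCLE}  # (AK | [◌])


def _is(text, classes, k, allowed):
    """True if index k exists and matches a class name or the literal ◌ in `allowed`."""
    if not 0 <= k < len(classes):
        return False
    return classes[k] in allowed or (
        DOTTED_CIRCLE in allowed and text[k] == DOTTED_CIRCLE)


def lb28a_forbids(text, classes, i):
    """True if LB28a forbids a break before index i (0 < i < len(text))."""
    b, a = i - 1, i
    if classes[b] == "AP" and _is(text, classes, a, BASE):
        return True  # AP × (AK | [◌] | AS)
    if _is(text, classes, b, BASE) and classes[a] in {"VF", "VI"}:
        return True  # (AK | [◌] | AS) × (VF | VI)
    if (classes[b] == "VI" and _is(text, classes, b - 1, BASE)
            and _is(text, classes, a, BASE_NO_AS)):
        return True  # (AK | [◌] | AS) VI × (AK | [◌])
    # (AK | [◌] | AS) × (AK | [◌] | AS) VF, which needs context after the break
    return (_is(text, classes, b, BASE) and _is(text, classes, a, BASE)
            and _is(text, classes, a + 1, {"VF"}))
```

The last formula is the one that looks past the break position: two adjacent bases are kept together only when the second is followed by VF.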

462.1~~/4.0.1/4.1~~/5

{4.0.1: 97-C25}

LB29~~LB 19b~~ Do not~~Don’t~~ break between numeric punctuation and alphabetics (“e.g.”).

462.2~~/4.0.1~~/6.1

{4.0.1: 97-C25} {6.1.0: 125-A99; L2/11-141R}

IS × (AL | HL)

462.3/5~~/5.1~~/5.2

{5.0.0: 105-C37} {5.1.0: 114-C30} {5.2.0: 114-A86, 120-M1}

LB30 Do not break between letters, numbers, or ordinary symbols and opening or closing parentheses.~~Withdrawn. In Unicode 5.0, rule LB30 was intended to prevent breaks in cases where a part of a word appears between delimiters—for example, in “person(s)”. The rule was withdrawn because it prevented desirable breaks after certain Asian punctuation characters with class CL. See Example 9 of Section 8, Customization, for options for restoring the functionality.~~

462.4/5/5.1

{5.0.0: 105-C37} {5.1.0: 114-C30}

This paragraph was deleted. ~~(AL | NU) × OP~~

462.5/5/5.1

{5.0.0: 105-C37} {5.1.0: 114-C30}

This paragraph was deleted. ~~CL × (AL | NU)~~

462.6/5/5.1

{5.0.0: 105-C37} {5.1.0: 114-C30}

This paragraph was deleted. ~~The purpose of this rule is to prevent breaks in common cases where a part of a word appears between delimiters—for example, in “person(s)”.~~

462.7~~/5.2/6.1/13~~/16

{5.2.0: 114-A86, 120-M1} {6.1.0: 125-A99; L2/11-141R} {13.0.0: 160-A75, 161-A47, 162-A42; PRI-406#ID20191014184618} {16.0.0: 179-C28, 179-A102}

(AL | HL | NU) × [OP-$EastAsian]~~[\p{ea=F}\p{ea=W}\p{ea=H}]]~~

462.8~~/5.2/6.1/13~~/16

{5.2.0: 114-A86, 120-M1} {6.1.0: 125-A99; L2/11-141R} {13.0.0: 160-A75, 161-A47, 162-A42; PRI-406#ID20191014184618} {16.0.0: 179-C28, 179-A102}

[CP-$EastAsian~~[\p{ea=F}\p{ea=W}\p{ea=H}]~~] × (AL | HL | NU)

462.9/5.2

{5.2.0: 114-A86, 120-M1}

The purpose of this rule is to prevent breaks in common cases where a part of a word appears between delimiters—for example, in “person(s)”.

462.9.1~~/13~~/16

{13.0.0: 160-A75, 161-A47, 162-A42; PRI-406#ID20191014184618} {16.0.0: 179-C28, 179-A102}

The excluded set ($EastAsian~~[\p{ea=F}\p{ea=W}\p{ea=H}]~~) refines the behavior of this rule, to enable a break before an East Asian OP or after an East Asian CP. Those cases are identified by excluding East_Asian_Width values of Fullwidth, Wide, or Halfwidth. This is illustrated by the following example, which shows East Asian corner brackets immediately following a Latin letter in Japanese text. In such a case, the preferred line break is between the Latin letter and the opening corner bracket.

462.9.2/13

{13.0.0: 160-A75, 161-A47, 162-A42; PRI-406#ID20191014184618}

Preferred

Bad Break

462.9.3/13

{13.0.0: 160-A75, 161-A47, 162-A42; PRI-406#ID20191014184618}

日中韓統合漢字拡張G 「ユニコード」

日中韓統合漢字拡張 G「ユニコード」
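The two LB30 rules and their East Asian exclusion can be sketched as a small predicate. This is only an illustration under stated assumptions: the `lb_class` enumeration, the `lb_char` structure, and the function name are hypothetical, and the `east_asian` flag is assumed to have been computed by the caller from East_Asian_Width (true for Fullwidth, Wide, or Halfwidth). Rule LB9 (combining marks) and all other rules are ignored here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of line break classes, for illustration only. */
enum lb_class { LB_AL, LB_HL, LB_NU, LB_OP, LB_CP, LB_OTHER };

/* One character as seen by the rule. The east_asian flag is assumed to be
   true when East_Asian_Width is Fullwidth, Wide, or Halfwidth. */
struct lb_char {
    enum lb_class cls;
    bool east_asian;
};

static bool is_al_hl_nu(enum lb_class c)
{
    return c == LB_AL || c == LB_HL || c == LB_NU;
}

/* Returns true when LB30 prohibits a break before text[pos]. */
bool lb30_prohibits_break(const struct lb_char *text, size_t len, size_t pos)
{
    assert(pos > 0 && pos < len);
    struct lb_char before = text[pos - 1];
    struct lb_char after = text[pos];

    /* (AL | HL | NU) × [OP-$EastAsian] */
    if (is_al_hl_nu(before.cls) && after.cls == LB_OP && !after.east_asian)
        return true;

    /* [CP-$EastAsian] × (AL | HL | NU) */
    if (before.cls == LB_CP && !before.east_asian && is_al_hl_nu(after.cls))
        return true;

    return false;
}
```

With this sketch, a Latin letter followed by U+0028 is kept together, whereas a Latin letter followed by an East Asian opening bracket such as 「 is left to later rules, which is what permits the preferred break in the example above.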

462.10~~/6.2~~/9

{6.2.0: 131-C16, 132-C33} {9.0.0: 146-A46, 147-C26}

LB30a Break between two ~~Do not break between~~ regional indicator symbols if and only if there are an even number of regional indicators preceding the position of the break.

462.11~~/6.2~~/9

{6.2.0: 131-C16, 132-C33} {9.0.0: 146-A46, 147-C26}

sot (RI RI)* RI × RI

462.12/9

{9.0.0: 146-A46, 147-C26}

[^RI] (RI RI)* RI × RI

462.12.a/9

{9.0.0: 146-A46, 147-C26}

Ramification: This rule requires extended context.
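The extended context needed by LB30a amounts to counting the run of regional indicators that ends just before the candidate position. The following is a minimal sketch of that count, not a definitive implementation: the two-value `lb_class` enumeration and the function name are hypothetical, and the sketch assumes that combining marks have already been absorbed per rule LB9.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical two-value classification, for illustration only. */
enum lb_class { LB_RI, LB_OTHER };

/* Returns true when LB30a prohibits a break before cls[pos]. */
bool lb30a_prohibits_break(const enum lb_class *cls, size_t len, size_t pos)
{
    assert(pos > 0 && pos < len);

    /* The rule only concerns a position between two regional indicators. */
    if (cls[pos - 1] != LB_RI || cls[pos] != LB_RI)
        return false;

    /* Count the run of RI ending just before pos, stopping at sot or at
       the first non-RI character. */
    size_t run = 0;
    for (size_t i = pos; i > 0 && cls[i - 1] == LB_RI; i--)
        run++;

    /* An odd run matches (RI RI)* RI × RI: the RI before pos is the first
       half of a pair, so no break. An even run leaves the break allowed. */
    return (run % 2) == 1;
}
```

The backward scan is unbounded in principle, which is why the rule cannot be expressed as a lookup on a single pair of classes.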

462.13/9/14

{9.0.0: 146-A46, 147-C26} {14.0.0: 167-A94, 168-C7, 168-C8, 168-A98; L2/21-135R}

LB30b Do not break between an emoji base (or potential emoji) and an emoji modifier.

462.14/9

{9.0.0: 146-A46, 147-C26}

EB × EM

462.15/14

{14.0.0: 167-A94, 168-C7, 168-C8, 168-A98; L2/21-135R}

[\p{Extended_Pictographic}&\p{Cn}] × EM

462.15.a/14

{14.0.0: 167-A94, 168-C7, 168-C8, 168-A98; L2/21-135R}

Reason: The property-based rule provides some degree of future-proofing, by preventing implementations running earlier versions of Unicode from breaking emoji sequences encoded in later versions. Any future emoji will be encoded in the space preallocated as \p{Extended_Pictographic} in Unicode Version 13.0.0; see 168-C7.

462.15.b/14

{14.0.0: 167-A94, 168-C7, 168-C8, 168-A98; L2/21-135R}

Ramification: As emoji get encoded, new line break opportunities may appear between newly assigned characters that did not turn out to be emoji bases and subsequent (dangling) emoji modifiers.
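The two forms of LB30b can be combined in one predicate, sketched below. The structure, enumeration, and function name are hypothetical, and both property flags are assumed to be supplied by the caller from the Unicode Character Database. The sanity check rests on the assumption that an unassigned code point never carries class EB.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of line break classes, for illustration only. */
enum lb_class { LB_EB, LB_EM, LB_OTHER };

/* Per-character data assumed to be looked up by the caller. */
struct emoji_info {
    enum lb_class cls;
    bool extended_pictographic;  /* \p{Extended_Pictographic} */
    bool unassigned;             /* \p{Cn} */
};

/* Returns true when LB30b prohibits a break between before and after. */
bool lb30b_prohibits_break(struct emoji_info before, struct emoji_info after)
{
    /* Assumption: class EB is only given to assigned characters. */
    assert(!(before.unassigned && before.cls == LB_EB));

    if (after.cls != LB_EM)
        return false;

    /* EB × EM */
    if (before.cls == LB_EB)
        return true;

    /* [\p{Extended_Pictographic}&\p{Cn}] × EM: a potential future emoji */
    return before.extended_pictographic && before.unassigned;
}
```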

463~~/4.1~~/5

LB31~~LB 20~~ Break everywhere else.

466~~/3.1/4.0.1~~/5/10

{10.0.0: 147-A79}

7 DeletedPair Table-BasedbasedTabletable Based Implementation

467~~/3.1~~/4~~/4.0.1/4.1~~/5~~/5.1~~/10

This paragraph was deleted. ~~A two-~~ ~~dimensional table can be used to resolve break opportunities between pairs of characters. This section defines such a table. The rows of the table are labeled with~~by ~~the possible values of the line breaking property of the leading character in the pair. The~~;, ~~the~~ ~~columns are labeled with~~by ~~the line breaking classproperty~~ ~~for the following character of the pair. Each intersection is labeled with the resulting line breaking~~ ~~opportunity.~~

468~~/3.1~~/4/5~~/5.1~~/10

This paragraph was deleted. ~~The Japanese standard JIS X 4051-1995 [JIS] provides an example of~~ ~~such~~ ~~a similar table-based definition. However, it uses line breaking classes whose membership is not solely determined by the line breaking property (as in this annexAnnexreport), but in some cases by heuristic analysis or markup of the text.~~

469~~/3.1/4.1~~/5/10

This paragraph was deleted. ~~The implementation provided here directly uses the line breaking classesclassed~~ ~~defined previously.above.~~ ~~Rules LB 6, and LB 8 - LB11 require extended context for handling combining marks and spaces. This extended context is built into the code that interprets the pair table.~~

470/10

This paragraph was deleted. 7.1 Minimal Table

471~~/3.1/4.1~~/5~~/5.1~~/10

This paragraph was deleted. ~~If two rows of the table have identical values and the corresponding columns also have identical values, then the two line breaking classes can be coalesced. For example, theThe~~ ~~JIS standard uses 20 classes, of which only 14 appear to be unique. Any~~A minimal table representation is unique, except for trivial reordering of rows and columns. Minimal tables for which the rows and columns are sorted alphabetically can be mechanically compared for differences. This is in contrast to the rules, where identical results can be achieved by sets of rules that cannot be easily compared by looking at their textual representation. However, any set of rules that is equivalent to a minimal pair table can be used to automatically generate such a table, which can then be used for comparison. The rules in Section 6, Line Breaking Algorithm, can be expressed as minimal pair tables if the extended context is used as described below.

472/10

This paragraph was deleted. 7.2 Extended Context

473/3.1

This paragraph was deleted. By broadening the definition of pair from B A to B SP* A, where A and B are characters and SP* is an optional run of space characters, the same table can be used to distinguish between cases where SP can or cannot provide a line breaking opportunity (i.e. direct and indirect breaks). Rules equivalent to the ones above can be formulated, not using SP, but using % to express indirect breaks. These rules can then be simplified to involve only pairs of classes, e.g. only constructions of the form

473.1~~/3.1~~/4~~/4.1~~/5/10

This paragraph was deleted. Most of the rules in Section 6, Line Breaking Algorithm, involve only pairs of characters, or they apply to a single line break class preceded or followed by any character. These rules can be represented directly in a pair table. However, rules LB14–LB17~~LB9Rules LB 7a~~6, ~~and LB 7b, as well as LB 9~~8 ~~- LB11 similarly~~ ~~require extended context to handlefor handling combining marks and~~ ~~spaces.~~ ~~This extended context must be built into the code that interprets the pair table.~~

473.2~~/3.1~~/4~~/4.0.1/4.1~~/5/10

This paragraph was deleted. ~~By broadening the definition of a pair from B A, where B is the line breaking class before a break and A the one after, to B SP* A, where SP* is an optional run of space characters, the same table can be used to distinguish between cases where SP can or cannot provide a line breaking opportunity (that is, direct and indirect breaks). Rules equivalent to the ones given in Section 6, Line Breaking Algorithm, can be formulated without explicit use of SP by using % to express indirect breaks instead. These rules can then be simplified to involve only pairs of classes, that is, only constructions of the form:~~

474/10

This paragraph was deleted. ~~B ÷ A~~

475/10

This paragraph was deleted. ~~B % A~~

476/5/10

This paragraph was deleted. ~~B ×~~^ A

477~~/4.0.1/4.1~~/5/10

This paragraph was deleted. ~~where either A or B may be empty. These simplified rules can be automatically translated into a pair table, as in Table 2. Line breaking analysis then proceeds by pair table lookup as explained below. (For readability in table layout, the symbol ^ is used in the table instead of × and _ is used instead of ÷.)~~

477.1~~/4.1~~/5/10

This paragraph was deleted. ~~Rule LB9LB7b~~ requires extended context for handling combining marks. This extended context must also be built into the code that interprets the pair table. For convenience in detecting the condition where A = CM, the symbols # and @ are used in the pair table, instead of % and ^, respectively. See Section 7.5, Combining Marks.

477.2~~/6.1~~/9/10

This paragraph was deleted. ~~Rule LB21a~~ ~~also~~ requires extended context to handle Hebrew letters followed by hyphens. This rule cannot be represented directly by the example pair table and is not handled by the sample implementation code included here. In the absence of special case handling, rule LB21a is effectively ignored by this example pair table and implementation code.

477.3/9/10

This paragraph was deleted. Rule LB30a requires extended context to handle the grouping of pairs of Regional Indicators. This rule is not represented by the example pair table and is not handled by the sample implementation code included here. In the absence of special case handling, rule LB30a is treated as if it were RI × RI by the example pair table and implementation code.

478~~/4.0.1~~/10

This paragraph was deleted. 7.3 Example Pair Table

479~~/3.1~~/4~~/4.0.1/4.1~~/5~~/5.2/6.3~~/8/10

This paragraph was deleted. ~~Table 2The following example table~~ ~~implements an approximation of the line breaking behavior described in this annexAnnexTechnical Report, withwithin~~ the limitation that only context of the form B SP* A is considered. BK, CR, LF, NL, and SP classes are handled explicitly in the outer loop, as given in the code sample below. Pair context of the form B CM* can be processed~~handled~~ ~~by handling the special entries @ and # inapproximately in~~in ~~the table, or explicitly in~~ ~~the drivingouter~~ ~~loop, as explained in Sectionsection~~ ~~7.5,~~4 ~~Combining Marks. In Table 2~~, ~~the rowsConjoining jamos~~ ~~are labeled with the B class and the columnsconsidered separately in Section 7.6, Conjoining Jamos. In Table 2, the rows~~ ~~are labeled with the A~~B~~Using the example pair table for CM is equivalent to making the simplifying assumption that combining marks are only applied to base characters of line breaking~~ ~~class.~~ ~~and the columns are labeled with the A class.AL or SP. Conjoining Jamos are considered separately in Section 7.5 Conjoin. Using the table entries is equivalent to making the simplifying assumption that combining marks are only applied to AL. (Such an assumption does not hold when conjoining Jamos.~~ ~~are used).~~

479.1/4~~/4.1~~/5/10

This paragraph was deleted. ~~Table 2.~~: ~~Example Pair Table~~

480/4/4.0.1

This paragraph was deleted. '‘~~After'~~’ ~~class~~

481/4/5~~/5.2/6.1/6.2~~/9/10

This paragraph was deleted. OP

~~ZWJ~~

482/4~~/4.1~~/5~~/5.2/6.1/6.2~~/9/10

This paragraph was deleted. OP

483/4/5~~/5.1/5.2/6.1/6.2~~/9/10

This paragraph was deleted. CL

_%_

483.1~~/5.2/6.1/6.2~~/9/10

This paragraph was deleted. CP

484/4~~/4.1~~/5~~/5.2/6.1/6.2~~/9/10

This paragraph was deleted. QU

485/4~~/4.1~~/5~~/5.2/6.1/6.2~~/9/10

This paragraph was deleted. GL

486/4/5~~/5.2/6.1/6.2~~/9/10

This paragraph was deleted. NS

487/4/5~~/5.2/6.1/6.2~~/8/9/10

This paragraph was deleted. EX

488/4/5~~/5.2/6.1/6.2~~/8/9/10

This paragraph was deleted. SY

489/4~~/4.0.1~~/5~~/5.2/6.1/6.2~~/9/10

This paragraph was deleted. IS

490/4~~/4.1~~/5~~/5.2/6.1/6.2~~/9/10

This paragraph was deleted. PR

491/4/5~~/5.2/6.1/6.2~~/9/10

This paragraph was deleted. PO

492/4/5~~/5.1/5.2/6.1/6.2~~/9/10

This paragraph was deleted. NU

^_%_

493/4/5~~/5.1/5.2/6.1/6.2~~/9/10

This paragraph was deleted. AL

^_%_

493.1~~/6.1/6.2~~/9/10

This paragraph was deleted. HL

494/4/5~~/5.2/6.1/6.2~~/9/10

This paragraph was deleted. ID

495/4/5~~/5.2/6.1/6.2~~/9/10

This paragraph was deleted. IN

496/4/5~~/5.1/5.2/6.1/6.2~~/9/10

This paragraph was deleted. HY

497/4/5~~/5.1/5.2/6.1/6.2~~/9/10

This paragraph was deleted. BA

498/4~~/4.1~~/5~~/5.2/6.1/6.2~~/9/10

This paragraph was deleted. BB

499/4/5~~/5.2/6.1/6.2~~/9/10

This paragraph was deleted. B2

500/4~~/4.0.1/4.1~~/5~~/5.2/6.1/6.2~~/9/10

This paragraph was deleted. ZW

_#_

501/4/5~~/5.2/6.1/6.2~~/9/10

This paragraph was deleted. CM

501.1/4~~/4.1~~/5~~/5.2/6.1/6.2~~/9/10

This paragraph was deleted. WJ

501.1.1/5~~/5.2/6.1/6.2~~/9/10

This paragraph was deleted. H2

501.1.2/5~~/5.2/6.1/6.2~~/9/10

This paragraph was deleted. H3

501.1.3/5~~/5.2/6.1/6.2~~/9/10

This paragraph was deleted. JL

501.1.4/5~~/5.2/6.1/6.2~~/9/10

This paragraph was deleted. JV

501.1.5/5~~/5.2/6.1/6.2~~/9/10

This paragraph was deleted. JT

501.1.5.1~~/6.2~~/9/10

This paragraph was deleted. RI

501.1.5.2/9/10

This paragraph was deleted. EB

501.1.5.3/9/10

This paragraph was deleted. EM

501.1.5.4/9/10

This paragraph was deleted. ~~ZWJ~~

501.1.6~~/5.2~~/10

This paragraph was deleted. ~~Table 2 uses the following notation:~~

501.2/4~~/4.0.1/4.1~~/5~~/6.1~~/10

This paragraph was deleted. ~~Resolved outside the pair tableSuppressed: AI, BK, CB, CJ, CR, LF, NL, SA, SG, SP, XX~~ ~~SP BK SG CR LF CB SA AI NL~~

501.3~~/4.1~~/5.2

This paragraph was deleted. ~~Table 2 uses the following notation:~~

501.4/8/10

This paragraph was deleted. ~~Symbol~~

~~Denotes~~

~~Explanation~~

502/4~~/4.1~~/8/10

This paragraph was deleted. • ^

~~denotes a~~ ~~prohibited break~~

: ~~B ^ A is equivalent to B SP* × A; in other words, neverNever~~ ~~break before A and after Bhere, even if one or more spaces intervene.~~

503/4/4.1

This paragraph was deleted. ~~• As a reminder, B ^ A is equivalent to~~^ B ~~is equivalent to B~~ ~~SP* × A.~~

504~~/3.2~~/4/4.1

This paragraph was deleted. ~~• % denotes an indirect break opportunity: Don’t break before Ahere, unless one or~~of ~~more spaces follow B.intervene.~~

504.1~~/4.1~~/5/8/10

This paragraph was deleted. %

~~denotes an~~ ~~indirect break opportunity~~

:. ~~B % A is equivalent to B × A and B SP+ ÷ A; in other words, do not break before A, unless one or more spaces follow B.~~

504.2~~/4.1~~/5/8/10

This paragraph was deleted. @

~~denotes a~~ ~~prohibited break for combining marks~~

: ~~B @ A is equivalent to B SP* × A, where A is of class CM. For more details, see~~ >~~Section 7.5, Combining Marks.~~

505/4/4.1

This paragraph was deleted. ~~• As a reminder, B % A is equivalent to~~% ~~B × A andis equivalent to~~ B ~~× A and B~~ ~~SP+ ÷ A.~~

505.1/4~~/4.1~~/5/8/10

This paragraph was deleted. • #

~~denotes an~~ ~~indirect break opportunity for combining marks following a space~~

:. ~~B # A is equivalentIt is similar~~ ~~to (B ×an indirect break, but if a break is taken it is before the last space in front of~~ ~~A and B SP+ ÷ A), where A is of classcla5ss~~ ~~CM.~~.

505.2/4/4.1

This paragraph was deleted. ~~• In other words, B # A is equivalent to B × A and B SP* ÷ SP A.~~

506/4~~/4.0.1/4.1~~/8/10

This paragraph was deleted. • _

~~an empty cell~~ ~~denotes a~~ ~~direct break opportunity~~

(~~equivalent to ÷ as defined above~~). ~~These are blank to make them easier to distinguish in the table.~~

506.1/4/4.0.1

This paragraph was deleted. ~~• These are left blank in the table to make them easier to distinguish.~~
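The meaning of the three basic table symbols described above can be summarized as a single resolution step. The sketch below is illustrative only: it reuses the names DIRECT_BRK, INDIRECT_BRK, and PROHIBITED_BRK from the former sample enumeration, omits the combining mark entries (@ and #), and introduces a hypothetical function name. The `spaces_between` parameter is assumed to be true when one or more SP occur between B and A.

```c
#include <assert.h>
#include <stdbool.h>

/* Three of the table entry types; @ and # are left out of this sketch. */
enum break_action {
    DIRECT_BRK,      /* _ in table: B ÷ A */
    INDIRECT_BRK,    /* % in table: B × A and B SP+ ÷ A */
    PROHIBITED_BRK   /* ^ in table: B SP* × A */
};

/* Returns true when a break is allowed before A, given the table entry
   for the pair (B, A) and whether spaces intervene. */
bool break_allowed(enum break_action entry, bool spaces_between)
{
    switch (entry) {
    case DIRECT_BRK:
        return true;            /* break with or without spaces */
    case INDIRECT_BRK:
        return spaces_between;  /* break only if spaces follow B */
    case PROHIBITED_BRK:
        return false;           /* never break, even across spaces */
    }
    assert(!"unknown break_action");
    return false;
}
```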

506.2/4~~/4.0.1/4.1~~/5~~/6.3~~/10

This paragraph was deleted. ~~Note: Hovering over the cells in a browser with tool-tips enabled reveals the rule number that determines the breaking status for the pair in question. When a pair must be tested both with and without intervening spaces, multiple rules are given. Hovering over a line breaking class name gives a representative member of the class and additional information. Clicking on any line break class name anywhere in the document jumps to the definition.~~

507/10

This paragraph was deleted. 7.4 Sample Code

508/4~~/4.1~~/5~~/6.3~~/10

This paragraph was deleted. ~~The following two sectionsfunctions~~ ~~provide sample code [Code14Code] that demonstratesdemonstrate~~ ~~how the pair table is used. For a complete implementation of the line breaking algorithm, if statements to handle the~~ ~~following~~ ~~line breaking classes CR, LF, and NL need to be added: CR, LF, NL, CB,~~ ~~SG,~~ XX~~. They have been omitted here for brevity, but see Section 7.6~~7~~, Explicit Breaks.~~.

508.1~~/4.1~~/10

This paragraph was deleted. The sample code assumes that the line breaking classes AI, CB, SG, and XX have been resolved according to rule LB1 as part of initializing the pcls array. The code further assumes that the complex line break analysis for characters with line break class SA is handled in function findComplexBreak, for which the following placeholder is given:

509~~/3.1~~/10

This paragraph was deleted.     // placeholder function for complex break analysis

509.1~~/4.1~~/10

This paragraph was deleted.     // cls - resolved line break class, may differ from pcls[0]

509.2~~/4.1~~/10

This paragraph was deleted.     // pcls - pointer to array of line breaking classes (input)

509.3~~/4.1~~/10

This paragraph was deleted.     // pbrk - pointer to array of line breaking opportunities (output)

509.4~~/4.1~~/10

This paragraph was deleted.     // cch - remaining length of input

509.5/5/10

This paragraph was deleted.     int

510~~/3.1/4.1~~/5~~/5.1~~/10

This paragraph was deleted.     int findComplexBreak(enum break_class cls, enum break_classint *pcls, int *pbrk, int cch)

511/3.1

This paragraph was deleted. {

511.1~~/4.1~~/10

This paragraph was deleted.                              enum break_action *pbrk, int cch)

512/3.1

This paragraph was deleted.                   if (!cch)                   return 0;

512.1~~/3.1~~/10

This paragraph was deleted.     {

512.2~~/3.1/4.1~~/10

This paragraph was deleted.             if (!cch)

512.3~~/3.1/4.1~~/10

This paragraph was deleted.                 return 0;

513~~/3.1~~/4.1

This paragraph was deleted.                   int cls = pcls[0];

514~~/3.1/4.0.1/4.1~~/5/10

This paragraph was deleted.                      for (int ich = 10; ich < cch; ich++) {

515~~/4.1/5.1~~/10

This paragraph was deleted.

516~~/3.1/4.1~~/10

This paragraph was deleted.                                     // .. do complex break analysis here

516.1/4~~/4.1~~/10

This paragraph was deleted.                   // and report any break opportunities in pbrk ..

517~~/4.1~~/5

This paragraph was deleted.

517.1/5~~/5.1~~/10

This paragraph was deleted.

517.2/5/10

This paragraph was deleted.                 pbrk[ich-1] = PROHIBITED_BRK; // by default, no break

517.3/5~~/5.1~~/10

This paragraph was deleted.

518~~/3.1/4.1~~/10

This paragraph was deleted.                                     if (pcls[ich] != SA)

519~~/3.1/4.1~~/5/10

This paragraph was deleted.                                                   break;

520~~/4.1~~/5/10

This paragraph was deleted.                   }

521~~/3.1/4.1~~/10

This paragraph was deleted.                      return ich;

522/3.1

This paragraph was deleted. }

523/3.1

This paragraph was deleted.

523.1~~/3.1~~/10

This paragraph was deleted.     }

523.2~~/3.1/4.1/5.2~~/10

This paragraph was deleted. ~~The entries in the example pair table correspond to the following enumeration. For diagnostic purposes, the sample code returns these valuesvalue~~ ~~to indicate not only the location but also the type of rule that triggered a given break opportunity.~~

523.3~~/3.1/4.0.1/4.1~~/10

This paragraph was deleted.      enum break_action {

523.4~~/3.1/4.1~~/10

This paragraph was deleted.            DIRECT_BRKDBK = 0,             // _direct break     (blank in table)

523.5~~/3.1/4.1~~/10

This paragraph was deleted.            INDIRECT_BRK,          IBK,     // indirect break   (% in table)

523.5.1/4~~/4.1~~/10

This paragraph was deleted.            COMBINING_INDIRECT_BRKCBK,     // combining break  (# in table)

523.5.2~~/4.1~~/10

This paragraph was deleted.            COMBINING_PROHIBITED_BRK,   // @ in table

523.6~~/3.1/4.1~~/10

This paragraph was deleted.            PROHIBITED_BRK,          PBK };   // prohibited break (^ in table)

523.7~~/4.1/5.1~~/10

This paragraph was deleted.            EXPLICIT_BRKEXPLICTI_BRK };             // ! in rules

523.8~~/4.1~~/5~~/5.1~~/10

This paragraph was deleted. Because the contexts involved in indirect breaks of the form B SP* A are of indefinite length, they need to be handled explicitly in the driver code. The sample implementation of a findLineBrk function below remembers the line break class for the last characters seen, but skips any occurrence of SP without resetting this value. Once character A is encountered, a simple lookback is used to see if it is preceded by a SP. This lookback is necessary only ~~necessary~~ ~~if B % A. To handle the case of a SP following sot, it is necessary to set cls to a dummy value. Using WJ gives the correct result and, as required, is unaffected by any tailoring.~~

524~~/3.1~~/10

This paragraph was deleted.     // handle spaces separately, all others by table

525~~/3.1~~/4/10

This paragraph was deleted.     // pcls - pointer to array of line breaking classes (input)

526~~/3.1~~/10

This paragraph was deleted.     // pbrk - pointer to array of line break opportunities (output)

527~~/3.1~~/10

This paragraph was deleted.     // cch - number of elements in the arrays (“count of characters”) (input)

528~~/3.1/4.1~~/10

This paragraph was deleted.     // ich - current index into the arrays (variable) (returned value)

528.1~~/4.1~~/10

This paragraph was deleted.     // cls - current resolved line break class for 'before' character (variable)

528.2~~/4.1~~/5

This paragraph was deleted.     // fTailorSPCM - selects a tailoring to keep SP CM together (see section 8.3)

528.3/5/10

This paragraph was deleted.

528.4/5~~/5.1~~/10

This paragraph was deleted.     int

529~~/3.1~~/4~~/4.1~~/5/10

This paragraph was deleted.     int findLineBrkfindLineBrk1(enum break_classint *pcls, enum break_actionint *pbrk, int cch, bool fTailorSPCM)

530/3.1

This paragraph was deleted. {

531/3.1

This paragraph was deleted.                   if (!cch) return;

531.1~~/3.1~~/10

This paragraph was deleted.     {

531.2~~/3.1/4.1~~/5

This paragraph was deleted.          if (!cch)

531.3~~/3.1/4.1~~/5

This paragraph was deleted.               return 0;O;

531.4/5/10

This paragraph was deleted.         if (!cch) return 0;

532~~/4.1/5.1~~/10

This paragraph was deleted.

533~~/3.1/4.1~~/10

This paragraph was deleted.                  enum break_class int  cls = pcls[0];   // class of 'before' character

533.1~~/3.1/4.1~~/5.1

This paragraph was deleted.

533.1.1~~/5.1~~/10

This paragraph was deleted.

533.1.2~~/5.1~~/10

This paragraph was deleted.         // treat SP at start of input as if it followed a WJ

533.1.3~~/5.1~~/10

This paragraph was deleted.         if (cls == SP)

533.1.4~~/5.1~~/10

This paragraph was deleted.             cls = WJ;

533.1.5~~/5.1~~/10

This paragraph was deleted.

533.2~~/3.1~~/4~~/4.1~~/10

This paragraph was deleted.          // loop over all pairs in the string up to a hard break

534~~/3.1~~/4.1

This paragraph was deleted.                   for (int ich = 1; (ich < cch) && (cls != BK); ich++) {

535/4.1

This paragraph was deleted.

535.1~~/4.1~~/10

This paragraph was deleted.         for (int ich = 1; (ich < cch) && (cls != BK); ich++) {

535.2~~/4.1~~/5.1

This paragraph was deleted.

535.3~~/4.1~~/5.1

This paragraph was deleted.             // handle explicit breaks here (see Section 7.7)

535.3.1~~/5.1~~/10

This paragraph was deleted.

535.3.2~~/5.1~~/10

This paragraph was deleted.             // to handle explicit breaks, replace code from "for" loop condition

535.3.3~~/5.1/6.3~~/10

This paragraph was deleted.             // above to comment below by code given in Section 7.67

535.4/5/10

This paragraph was deleted.

536~~/3.1/4.1~~/10

This paragraph was deleted.                                     // handle spaces explicitly

537~~/3.1/4.1~~/10

This paragraph was deleted.                                     if (pcls[ich] == SP) {

538~~/3.1/4.1~~/5/10

This paragraph was deleted.                                                   pbrk[ich-1] = PROHIBITED_BRK;   // apply rule LB7LB4: × SPPBK;XX;

539~~/3.1/4.1~~/10

This paragraph was deleted.                                                   continue;                       // do not update cls

540~~/3.1/4.1~~/10

This paragraph was deleted.                                     }

541~~/4.1/5.1~~/10

This paragraph was deleted.

542~~/3.1/4.1~~/10

This paragraph was deleted.                                     // handle complex scripts in a separate function

543~~/3.1/4.1~~/10

This paragraph was deleted.                                     if (pcls[ich] == SA) {

544~~/3.1/4.1~~/5~~/5.1~~/10

This paragraph was deleted.                                                   ich += findComplexBreak(cls, &pcls[ich-1], &pbrk[ich-1], cch - (ich-1));

544.1/5/10

This paragraph was deleted.                            cch - (ich-1));

545~~/3.1/4.1~~/10

This paragraph was deleted.                                                   if (ich < cch)

546~~/3.1/4.1~~/10

This paragraph was deleted.                                                                     cls = pcls[ich];

547~~/3.1/4.1~~/10

This paragraph was deleted.                                                   continue;

548~~/3.1/4.1~~/10

This paragraph was deleted.                                     }

549~~/4.1/5.1~~/10

This paragraph was deleted.

550~~/3.1/4.1~~/10

This paragraph was deleted.                                     // lookup pair table information in brkPairs[before, after];

551~~/3.1/4.1~~/10

This paragraph was deleted.                                   enum break_action  int brk = brkPairs[cls][pcls[ich]];

552/4.1

This paragraph was deleted.

552.1~~/4.1/5.1~~/10

This paragraph was deleted.

552.2~~/4.1~~/10

This paragraph was deleted.             pbrk[ich-1] = brk;                     // save break action in output array

552.3~~/4.1/5.1~~/10

This paragraph was deleted.

553~~/3.1/4.1~~/10

This paragraph was deleted.                                     if (brk == INDIRECT_BRKIBKSS)       {             // resolve indirect break

554~~/3.1~~/4.1

This paragraph was deleted.                                                   pbrk[ich-1] = ((pcls[ich - 1] == SP) ? IBK : PBKSS : XX);

554.1/4/4.1

This paragraph was deleted.               } else if (brk == CBK) {

554.2/4/4.1

This paragraph was deleted.                     if (ich > 1 && (pcls[ich - 1] == SP))

554.3/4/4.1

This paragraph was deleted.                         pbrk[ich-2] = ((pcls[ich - 2] == SP) ? IBK : DBK);

554.4/4/4.1

This paragraph was deleted.                     pbrk[ich-1] = PBK;

554.5~~/4.1/5.1~~/10

This paragraph was deleted.                 if (pcls[ich - 1] == SP)           // if context is A SP +* B

554.6~~/4.1~~/10

This paragraph was deleted.                     pbrk[ich-1] = INDIRECT_BRK;    //       break opportunity

555~~/3.1/4.1~~/10

This paragraph was deleted.                                      } else                               // else{

556~~/3.1~~/4.1

This paragraph was deleted.                                                   pbrk[ich-1] = brk;

557~~/3.1~~/4.1

This paragraph was deleted.                                     }

558~~/3.1~~/4.1

This paragraph was deleted.                                     cls = pcls[ich];

559/3.1

This paragraph was deleted.                   }

560/3.1

This paragraph was deleted.                   pbrk[ich-1] = 0;

561/3.1

This paragraph was deleted.

562/3.1

This paragraph was deleted.                   return ich;

562.1~~/4.1~~/10

This paragraph was deleted.                     pbrk[ich-1] = PROHIBITED_BRK;  //       no break opportunity

562.2~~/4.1~~/10

This paragraph was deleted.             }

562.3~~/4.1~~/10

This paragraph was deleted.

562.4~~/4.1~~/10

This paragraph was deleted.             // handle breaks involving a combining mark (see Section 7.5)

562.5~~/4.1~~/10

This paragraph was deleted.

562.6~~/4.1~~/10

This paragraph was deleted.             // save cls of 'before' character (unless bypassed by 'continue')

562.7~~/4.1/5.1~~/10

This paragraph was deleted.             cls = pcls[ich];

563/3.1

This paragraph was deleted. }

563.1~~/3.1~~/10

This paragraph was deleted.         }

563.2~~/3.1~~/5

This paragraph was deleted.         // always break at the end

563.3~~/3.1/4.1~~/5

This paragraph was deleted.         pbrk[ich-1] = EXPLICIT_BRK;DBK;

563.3.1/5~~/5.1~~/10

This paragraph was deleted.         pbrk[ich-1] = EXPLICIT_BRK;                                      // always break at the end

563.4~~/3.1/4.1/5.1~~/10

This paragraph was deleted.

563.5~~/3.1~~/10

This paragraph was deleted.         return ich;

563.6~~/3.1/4.1~~/10

This paragraph was deleted.     }

564/4/5/10

This paragraph was deleted. ~~The function returns all of the break opportunities in the array pointed to by pbrk, using the values in the table. On return, pbrk[ich] is the type ofThe code assumes that the predefined value SS is used for~~ ~~break after the character at index ich.opportunities marked by an % entry in the table and the value XX for an entry marked by an ^ above.~~

564.1~~/4.1~~/5/10

This paragraph was deleted. A common optimization in implementation is to determine only the nearest line break opportunity prior to the position of the first character that would cause the line to become overfull. Such an optimization requires backward~~backwards~~ ~~traversal of the string instead of forward traversalforwards~~ ~~as shown in the sample code.~~
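The backward scan mentioned above might look as follows. This is a simplified sketch under stated assumptions: it walks an already computed array of break actions, whereas a real optimization would avoid computing opportunities for the whole string. The function name is hypothetical, the enumeration is reduced to three values, `overflow` is assumed to be the index of the first character that does not fit on the line, and `pbrk[i]` is taken to be the type of break after the character at index `i`.

```c
#include <assert.h>

/* Reduced set of resolved break actions, for illustration only. */
enum break_action { DIRECT_BRK, INDIRECT_BRK, PROHIBITED_BRK };

/* Returns the index of the last character that may end the line, or -1
   if no break opportunity exists before the overflow position. */
int last_break_before(const enum break_action *pbrk, int cch, int overflow)
{
    assert(overflow >= 0 && overflow <= cch);

    /* Walk backward from the last character that still fits. */
    for (int i = overflow - 1; i >= 0; i--) {
        if (pbrk[i] != PROHIBITED_BRK)
            return i;
    }
    return -1;  /* caller must fall back, e.g. to an emergency break */
}
```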

565/10

This paragraph was deleted. 7.5 Combining Marks

566~~/3.1/4.0.1~~/4.1

This paragraph was deleted. ~~If one makes the simplifying assumption that combining marks are only applied to AL~~, or SP, and that applying a combining mark to any other character turns the combination into AL, then CM can be handled in the table as shown, by introducing a specialized type of indirect break. The expression~~. (Such an assumption does not hold when conjoining Jamos are used).Otherwise a simple statement in the outer loop~~

567/3.1

This paragraph was deleted. if (pcls[i] == CM) {

568/3.1

This paragraph was deleted.                 pbrk[ich-1] = 0;

569/3.1

This paragraph was deleted.                 continue;

569.1~~/4.0.1~~/4.1

This paragraph was deleted. ~~B # A~~

569.2~~/4.0.1~~/4.1

This paragraph was deleted. denotes an indirect break opportunity for combining marks following a space. It is similar to an indirect break, but if a break is taken it is before the last space in front of A. In other words, B # A is equivalent to applying both B × A and B SP* ÷ SP A.

569.3~~/4.1~~/5/10

This paragraph was deleted. ~~The implementation of combining marks in the pair table presents an additional complication because rule LB9LB7b~~ defines a context X CM* that is of arbitrary length. There are some similarities to the way contexts of the form B SP* A that are involved in indirect breaks are evaluated. However, contexts of the form SP CM* or CM* SP also need to be handled, while rule LB10~~LB7c~~ ~~requires some CM* to be treated like AL.~~

569.4~~/4.1~~/5~~/5.2~~/10

This paragraph was deleted. ~~Implementing LB10. This ruleThe latter~~ ~~can be reflected directly in the example pair table in Table 2 by assigning the same values in the row marked CM as in the row marked AL. Incidentally, thisThis~~ ~~is equivalent to rewriting the rules LB11–LB31LB8—LB20~~ ~~by duplicating any expression that contains an AL on its left handlefthand~~ ~~side with another expression that contains a CM. For example, in LB22LB16~~

569.5~~/4.1~~/10

This paragraph was deleted. ~~AL × IN~~

569.6~~/4.1~~/5/10

This paragraph was deleted. ~~would becomebecomes~~

569.7~~/4.1~~/5/10

This paragraph was deleted. ~~AL × IN CM × IN~~.

569.8~~/4.1~~/5~~/5.2~~/10

This paragraph was deleted. ~~Rewriting these rules as indicated here (and then deleting LB10)This~~ ~~is fully equivalent to the original rules because rule LB9LB7c because rule LB7b~~ ~~already accounts for all CMs that are not supposed to be treated like AL. For a complete descriptionprescription~~ ~~see Example 9 in Section 8.2, Examples of Customization.~~

569.9~~/4.1~~/5/10

This paragraph was deleted. ~~Implementing LB9. Rule LB9LB7b~~ is implemented in the example pair table in Table 2 by assigning a special # entry in the column marked CM for all rows referring to a line break class that allows a direct or indirect break after itself. (Note that the intersection between the row for class ZW and the column for class CM must be assigned “'_”' ~~because of rule LB8LB5.) The # corresponds to a break_action value of COMBINING_INDIRECT_BREAK, which triggers the following code in the sample implementation:~~

569.10~~/4.1~~/10

This paragraph was deleted.     else if (brk == COMBINING_INDIRECT_BRK) {    // resolve combining mark break

569.11~~/4.1~~/5/10

This paragraph was deleted.         pbrk[ich-1] = PROHIBITED_BRK;             // do not~~don't~~ break before CM

569.12~~/4.1~~/10

This paragraph was deleted.         if (pcls[ich-1] == SP){

569.13~~/4.1~~/5/10

This paragraph was deleted.             #ifndef LEGACY_CM~~if (!fTailorSPCM)~~                    // new: space is not a base~~untailored:~~

569.14~~/4.1~~/10

This paragraph was deleted.                 pbrk[ich-1] = COMBINING_INDIRECT_BRK;    // apply rule SP ÷

569.15~~/4.1~~/5~~/5.1~~/10

This paragraph was deleted.             #else

569.16~~/4.1~~/5

This paragraph was deleted.             {

569.17~~/4.1~~/5/10

This paragraph was deleted.                 pbrk[ich-1] = PROHIBITED_BRK;      // legacy:~~optionally,~~ keep SP CM together

569.18~~/4.1~~/10

This paragraph was deleted.                 if (ich > 1)

569.19~~/4.1~~/5~~/5.1~~/10

This paragraph was deleted.                     pbrk[ich-2] = ((pcls[ich - 2] == SP) ? INDIRECT_BRK : DIRECT_BRK);

569.19.1/5/10

This paragraph was deleted.                                                   INDIRECT_BRK : DIRECT_BRK);

569.20~~/4.1~~/5/10

This paragraph was deleted.             #endif}

569.21~~/4.1~~/5/10

This paragraph was deleted.         } else                                   // apply rule LB9~~LB7b~~: X CM * -> X

569.22~~/4.1~~/5/10

This paragraph was deleted.             continue;                            // do not~~don't~~ update cls

570/3.1

This paragraph was deleted. }

570.1~~/3.1/4.0.1~~/4.1

This paragraph was deleted. However, this is only an approximation and it is still necessary to treat CM at the beginning of the text. Therefore it is preferable to handle CM outside of the pair table in the driver code. Adding a simple statement in the loop

570.2~~/3.1~~/4.1

This paragraph was deleted.     // handle combining marks

570.3~~/3.1/4.0.1~~/4.1

This paragraph was deleted.     if (pcls[ich] == CM){

570.4~~/3.1~~/4/4.1

This paragraph was deleted.        if (pcls[ich-1] == SP){

570.5~~/3.1~~/4.1

This paragraph was deleted.           cls = ID;

570.6~~/3.1~~/4.1

This paragraph was deleted.           if (ich > 1)

570.7~~/3.1~~/4.1

This paragraph was deleted. 	     pbrk[ich-2] = brkPairs[pcls[ich-2]][ID] == DBK ? DBK : PBK;

570.8~~/3.1~~/4.1

This paragraph was deleted.        }

570.9~~/3.1~~/4.1

This paragraph was deleted.        pbrk[ich-1] = PBK;

570.10~~/3.1~~/4.1

This paragraph was deleted.        continue;

570.11~~/3.1~~/10

This paragraph was deleted.     }

571~~/3.1/4.0.1~~/4.1

This paragraph was deleted. ~~would have the effect of letting the CM take on the class of the preceding non-CM characters. It also takes care of rule LB7aLB7, treating a combining mark applied to a SP as if it was ID. Covering the case of a missing base character at the beginning of the line (rule LB7c)This also~~ ~~requires a statement in the setup part before the loopspecial rule~~ ~~to :cover the case of a missing base character at the beginning of the line:in the setup part before the loop:~~

572/3.1

This paragraph was deleted. if (pcls[i] == CM)

573/3.1

This paragraph was deleted.                 cls = SP;

573.1~~/3.1~~/4.1

This paragraph was deleted.     // handle missing base character

573.2~~/3.1/4.0.1~~/4.1

This paragraph was deleted.     if (cls == CM)

573.3~~/3.1/4.0.1~~/4.1

This paragraph was deleted.             cls = AL;~~ID;~~

573.4/4/4.1

This paragraph was deleted. 7.5 Conjoining Jamos

573.5/4~~/4.0.1~~/4.1

This paragraph was deleted. In principle, line breaking analysis would follow grapheme cluster boundary detection. This would handle combining character sequences containing both non-spacing marks and conjoining Jamo sequences as units. However, in order to do the analysis in one pass, combining character sequences can be handled approximately as described above. For Korean Syllable Blocks, a simple pair table can be constructed based on the information in [Boundaries]. The input to such a pair table would be Hangul~~Korean~~ ~~Syllable Type [HangulST] values.~~

573.6~~/4.1~~/10

This paragraph was deleted.

573.7~~/4.1~~/5/10

This paragraph was deleted. ~~When handling a COMBINING_INDIRECT_BREAK, theThe~~ ~~last remembered line break class in variable cls is not updated, except for those cases covered by rule LB10LB7c. A tailoring of rule LB9LB7b~~ that keeps the last SPACE character preceding a combining mark, if any, and therefore breaks before that SPACE character can easily be implemented as shown in the sample code. (See Section 9.2, Legacy Support for Space Character as Base for Combining Marks.)

573.8~~/4.1~~/5/10

This paragraph was deleted. ~~Any rows in Table 2Rows~~ ~~for line break classes that prohibit breaks after must be handled explicitly. In the example pair table, these are assigned a special entry “~~'~~@”,~~' ~~which corresponds to a special break action of COMBINING_PROHIBITED_BREAK thatand~~ ~~triggers the following code:~~

573.9~~/4.1~~/5/10

This paragraph was deleted.     else if (brk == COMBINING_PROHIBITED_BRK) {  // this is the case OP SP* CM

573.10~~/4.1~~/10

This paragraph was deleted.         pbrk[ich-1] = COMBINING_PROHIBITED_BRK;  // no break allowed

573.11~~/4.1~~/10

This paragraph was deleted.         if (pcls[ich-1] != SP)

573.12~~/4.1~~/5/10

This paragraph was deleted.             continue;                            // apply rule LB9~~LB7b~~: X CM* -> X

573.13~~/4.1/5.1~~/10

This paragraph was deleted.     }

573.14~~/4.1~~/5/10

This paragraph was deleted. ~~The only line break class that unconditionally prevents breaks across a following SP is OP. The precedingThis~~ ~~code fragment ensures that OP CM is handled according to rule LB9LB7c~~ ~~and OP SP CM is handled as OP SP AL according to rule LB10.LB7c.~~

573.15~~/4.1~~/6.3

This paragraph was deleted. 7.6 Conjoining Jamos

573.16~~/4.1~~/5/6.3

This paragraph was deleted. ~~For Korean Syllable Blocks, the information in rule LB26 is represented bysyllable blocks,~~ ~~a simple pair table showncan be constructed based on the information~~ in ~~rule LB18b, and shown in~~ ~~Table 3.~~ ~~below.~~

573.17~~/4.1~~/5/6.3

This paragraph was deleted. ~~Table 3.~~: ~~Korean Syllable Block Pair Table~~

573.18~~/4.1~~/6.3

This paragraph was deleted. H2

573.19~~/4.1~~/6.3

This paragraph was deleted. H2

573.20~~/4.1~~/6.3

This paragraph was deleted. H3

573.21~~/4.1~~/6.3

This paragraph was deleted. JL

573.22~~/4.1~~/6.3

This paragraph was deleted. JV

573.23~~/4.1~~/6.3

This paragraph was deleted. JT

573.24~~/4.1~~/5/6.3

This paragraph was deleted. When constructing a pair table such as Table 2, this pair table for Korean syllable blocks in Table 3 is merged with the main pair table for all other line break classes by adding the cells from Table 3 beyond the lower-right corner of the main pair table. Next, according to rule LB27, any empty cells in the new rows are filled with the same values as in the existing row for class ID, and any empty cells for the new columns are filled with the same values as in the existing column for class ID. The resulting~~pair table for Korean syllable blocks in Table 3 can be~~ ~~merged table is shown in Tablewith the example pair table in Table~~ 2. by adding the cells from Table 3 beyond the lower right corner of Table 2. Next, according to rule LB18c, any empty cells in the new rows are filled with the same values as the existing row for class ID, and any empty cells for the new columns are filled with the same values as the existing column for class ID. Such a merged table can be handled with the same sample code as above.

573.25~~/4.1/6.3~~/10

This paragraph was deleted. 7.6~~7~~ Explicit Breaks

573.26~~/4.1~~/5~~/5.1~~/10

This paragraph was deleted. Handling explicit breaks is straightforward in the driver code, although it does clutter up the loop condition and body of the loop a bit. For completeness, the following sample shows how to change the loop condition and add if statements—both before and inside the loop— ~~to the loop~~ ~~that handle BK, NL, CR, and LF. Because NL and BK behave identically by default, this code can be simplified in implementations where the character classification is changed soassumes~~ ~~that BK will always behas been~~ ~~substituted for NL when assigning the line break class. Because this optimization does not change the result, it is not considered a tailoring and does not affect conformance.~~.

573.27~~/4.1~~/10

This paragraph was deleted.     // handle case where input starts with an LF

573.28~~/4.1~~/10

This paragraph was deleted.     if (cls == LF)

573.29~~/4.1~~/10

This paragraph was deleted.          cls = BK;

573.30~~/4.1~~/10

This paragraph was deleted.

573.31~~/4.1~~/5

This paragraph was deleted.     // loop over all pairs in the string up to a hard break

573.31.1/5~~/5.1~~/10

This paragraph was deleted.     // treat initial NL like BK

573.31.2/5/10

This paragraph was deleted.     if (cls == NL)

573.31.3/5/10

This paragraph was deleted.          cls = BK;

573.31.4/5/10

This paragraph was deleted.

573.31.5/5/10

This paragraph was deleted.     // loop over all pairs in the string up to a hard break or CRLF pair

573.32~~/4.1~~/10

This paragraph was deleted.     for (int ich = 1; (ich < cch) && (cls != BK) && (cls != CR || pcls[ich] == LF); ich++) {

573.33~~/4.1/5.1~~/10

This paragraph was deleted.

573.34~~/4.1~~/5

This paragraph was deleted.         // handle BK and LF explicitly

573.35~~/4.1~~/5

This paragraph was deleted.         if (pcls[ich] == BK || pcls[ich] == LF) {

573.35.1/5/10

This paragraph was deleted.         // handle BK, NL and LF explicitly

573.35.2/5~~/5.1~~/10

This paragraph was deleted.         if (pcls[ich] == BK ||pcls[ich] == NL ||  pcls[ich] == LF)

573.35.3/5/10

This paragraph was deleted.         {

573.36~~/4.1~~/10

This paragraph was deleted.             pbrk[ich-1] = PROHIBITED_BRK;

573.37~~/4.1~~/10

This paragraph was deleted.             cls = BK;

573.38~~/4.1~~/10

This paragraph was deleted.             continue;

573.39~~/4.1~~/10

This paragraph was deleted.         }

573.40~~/4.1/5.1~~/10

This paragraph was deleted.

573.41~~/4.1~~/10

This paragraph was deleted.         // handle CR explicitly

573.42~~/4.1~~/10

This paragraph was deleted.         if(pcls[ich] == CR)

573.43~~/4.1~~/10

This paragraph was deleted.         {

573.44~~/4.1~~/10

This paragraph was deleted.             pbrk[ich-1] = PROHIBITED_BRK;

573.45~~/4.1~~/10

This paragraph was deleted.             cls = CR;

573.46~~/4.1~~/10

This paragraph was deleted.             continue;

573.47~~/4.1~~/10

This paragraph was deleted.         }

573.48~~/4.1/5.1~~/10

This paragraph was deleted.

573.49~~/4.1~~/10

This paragraph was deleted.         // handle spaces explicitly...

573.50~~/4.1/5.1~~/10

This paragraph was deleted.

573.51~~/5.2~~/10

This paragraph was deleted.

573.52/10

{10.0.0: 147-A79}

Formerly was: Pair Table-Based Implementation.

574/4

8~~7.6~~ Customization

575~~/3.1~~/4/5/5.1

A real-world line breaking algorithm has to~~must~~ be tailorable to some degree to meet user or document requirements.~~. There are three principalprinciple~~ ~~ways of tailoring a pair-table based algorithm:~~

576/3.1

This paragraph was deleted. ~~1. Change the line breaking class assignment for some characters~~

577/3.1

This paragraph was deleted. ~~2. Change the table value assigned to a pair of character classes~~

578/3.1

This paragraph was deleted. ~~3. Change the interpretation of the line breaking actions~~

579/3.1

This paragraph was deleted. ~~4. Augment the algorithm.~~

579.1/4~~/4.1~~/5

In Korean, for example, two distinct line breaking modes ~~may~~ occur, which can be summarized as breaking after each character, or breaking after spaces (as in Latin text). The former tends to occur when text is set justified;~~,~~ the latter, when ragged margins are used. In that case, even ideographs~~Ideographs~~ are broken only ~~broken~~ at space characters. In Japanese, for example, tighter and looser specifications of prohibited line breaks may be used.

579.2/4/5

This paragraph was deleted. ~~In Japanese for example, tighter and looser specifications of prohibited line breaks may be used.~~

579.2.1~~/4.1~~/5

Specialized text or specialized text constructs may need specific line breaking behavior that differs from the default line breaking rules given in this annex. This may require additional tailorings beyond those considered in this section. For example, the rules given here are insufficient for mathematical equations, whether inline or in display format. Likewise, text that~~which~~ commonly contains lengthy URLs might benefit from special tailoring that suppresses SY × NU from rule LB25~~LB18~~ within the scope of a URL to allow breaks after a “/” separated segment in the URL regardless of whether the next segment starts with a digit. ~~or not.~~
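As a minimal sketch of the URL tailoring just described, the following C fragment models only the SY × NU part of rule LB25 and switches it off inside a URL. The enum, the function name, and the in_url flag are hypothetical names introduced for illustration; how the scope of a URL is detected is left to the implementation, and the rest of rule LB25 is not shown.

```c
#include <stdbool.h>

/* Hypothetical subset of line break classes, for illustration only. */
typedef enum { LBC_SY, LBC_NU, LBC_OTHER } LineBreakClass;

/*
 * Returns true when the "SY × NU" part of rule LB25 prohibits a break
 * between two adjacent characters. The in_url flag is an assumption of
 * this sketch: the caller sets it when the position lies within the
 * scope of a URL, where the tailoring suppresses this part of the rule.
 */
bool sy_nu_prohibits_break(LineBreakClass before, LineBreakClass after,
                           bool in_url)
{
    if (in_url)
        return false;            /* tailored: rule not applied in URLs */
    return before == LBC_SY && after == LBC_NU;
}
```

When this function returns false, later rules still decide whether a break is actually allowed at that position.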

579.2.1.1/12

Notes:

579.2.1.2/12

{12.0.0: 173-A128}

• Locale-sensitive line break specifications can be expressed in LDML [UTS35]. Tailorings are available in the Common Locale Data Repository [CLDR].

579.2.2/9~~/10/12~~/15

{10.0.0: 150-A58, 150-C22; L2/16-315R}

•~~Note:~~ Some changes to rules and data are needed~~Implementers should allow~~ for the best segmentation behavior of emoji zwj sequences [UTS51]. Implementations are strongly encouraged~~customizations~~ to use the ~~the~~ line break rules~~ing that are implemented~~ in the latest version of CLDR (Version 35~~31~~ or later) [CLDR] and the latest~~releases. Importantly, some changes to rules and data are needed for best line breaking behavior of additional~~ emoji properties (Version 12~~version 5zwj sequences, prior to the eventual publication of Unicode 10~~.0 or later) [UTS51].~~. Such changes are planned for inclusion in CLDR Version 30~~

579.3/4~~/4.1~~/12

{12.0.0: 173-A128; PRI-383#ID20190106230734}

The remainder of this section gives an overview of common types of tailorings. ~~and examples of how to customize the pair table implementation of the line breaking algorithm for these tailorings.can be used to customize the algorithm as needed.~~

579.4/4

8.1 Types of Tailoring

579.5/4~~/4.0.1/4.1~~/5/10

{10.0.0: 147-A79}

There are two~~three~~ principal ways of tailoring the ~~sample pair table implementation of the~~ line breaking algorithm:

580~~/3.1/4.1~~/5

1. Changing the line breaking class assignment for some characters.~~ This· The first~~ This is useful in~~for~~ cases where the line breaking properties of one class of characters are occasionally lumped together with the properties of another class to achieve a less restrictive line breaking behavior.

581~~/3.1~~/4~~/4.1~~/5/10

{10.0.0: 147-A79}

This paragraph was deleted. ~~2. Changing the table value assigned to a pair of character classes This is particularly useful if the behavior can be expressed by a change at a limited number of pair intersections. This· The second method~~ ~~is particularly useful if the behavior can be expressed by a change at a limited number of pair intersections. This~~ ~~form of customization is equivalent to permanently overriding some of the rules in Sectionsection~~ ~~6, Line Breaking Algorithm.~~.~~These intersections can be labeled with special values that cause different actions for different customizations.~~

582~~/3.1~~/4/5/10

{10.0.0: 147-A79}

This paragraph was deleted. ~~3. Changing the interpretation of the line breaking actions· The third method~~ ~~This is a dynamic~~is equivalent of the preceding. Instead of changing the values for the pair intersection directly in the table, they are labeled with special values that cause different actions for different customizations. This is most suitable when customizations need to ~~the precedingsecond, but instead of changing the values for the pair intersection directly in the table, they can~~ be ~~labeled with special values that cause different actions for different customizations, an additional indirection is performed. This is most suitable when customizations need to be~~ ~~enableddone~~ ~~at run time.~~

582.1/10

{10.0.0: 147-A79}

2. Changing the line breaking rules. Adding new rules, or altering or removing existing rules, provides more flexibility in changing the line breaking behavior. This can also include introducing new character classes for use by the new or altered rules.

583~~/3.1~~/4~~/4.0.1/4.1~~/5/15.1

{15.1.0: 173-A6; L2/22-244; L2/22-243#ID20220921075300}

For example, specialized rules could be added~~Beyond these three straightforward customization steps, it is always possible~~ to ~~augment the algorithm itself—, for example, by providing specialized rules to~~ recognize and break common constructs, such as URLs, numeric expressions, and so on~~etc~~. Such open-ended customizations place~~which places~~ no limits on~~to~~ possible changes, other than the requirement that non-tailorable~~characters with normative~~ line breaking rules~~properties~~ be~~to~~ correctly implemented. This means that whatever changes are made must be equivalent to changes to the line breaking assignments of tailorable line breaking rules, and to alteration, removal, or addition of rules applied after rule LB12.~~implement characters with normative line-breaking properties..· The fourth method is the most open ended...~~

583.1~~/4.1~~/5/10

{10.0.0: 147-A79}

This paragraph was deleted. ~~Note:~~ ~~Reference [Cedar97] reports on a real-~~ world implementation of a pair table-based implementation of a line breaking algorithm substantially similar to the one presented here, and including the types of customizations presented in this section. That implementation was able to simultaneously meet~~met~~ ~~the requirements of customers in many European and East Asian countries with a single implementation of the algorithm.~~

584/4

8.2~~7.7~~ Examples of Customization

584.1~~/3.1/4.1~~/5~~/5.2~~/10

{10.0.0: 147-A79}

Example 1. The exact method of resolving the line break class for characters with class~~wtih class~~ SA is not specified in the default algorithm. One method of implementing line breaks for complex scripts is to invoke context-based classification for all runs of characters with class SA. For example, a dictionary-based algorithm could return different classes for Thai letters depending on their context: letters at the start of Thai words would become BB and other Thai letters would become AL. Alternatively, for text consisting of, or predominantly containing characters with line breaking class SA, it may be useful to instead defer the determination of line breaks to a different algorithm ~~entirely~~. ~~Section 7.4, Sample Code,The sample code~~ ~~outlinessketches~~ ~~sucha different~~ ~~an approach in which the interface towhere~~ ~~the dictionary-based algorithm directly reports break opportunities.~~
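The context-based classification in Example 1 can be sketched as follows. This is only an illustration under stated assumptions: the word_start flags are presumed to come from a dictionary-based segmenter that is not shown, the enum and function name are hypothetical, and the default handling of SA characters that are combining marks is not covered.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of line break classes, for illustration only. */
typedef enum { LBC_SA, LBC_BB, LBC_AL, LBC_OTHER } LineBreakClass;

/*
 * Resolves every SA entry in cls[0..n-1]. word_start[i] is assumed to be
 * supplied by a dictionary-based segmenter (not shown) and is true when
 * character i begins a word. Word-initial SA characters become BB, all
 * other SA characters become AL; non-SA entries are left unchanged.
 */
void resolve_sa(LineBreakClass *cls, const bool *word_start, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (cls[i] == LBC_SA)
            cls[i] = word_start[i] ? LBC_BB : LBC_AL;
    }
}
```

After this pass, the resolved classes can be fed to the ordinary rules, since BB permits a break before each word-initial letter.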

584.2~~/3.1/4.1~~/5~~/5.1~~/6.1

Example 2. To implement terminal style line breaks, it would be necessary to allow breaks at fixed positions. These could occur inside a run of spaces or in the middle of words without regard to hyphenation. Such~~. This requires~~ a modification essentially disregards the output of the line breaking~~linebreaking~~ algorithm, and is~~change in the way the driver loop handles spaces and,~~ therefore not a conformant tailoring. For a system that supports both regular line breaking and terminal style line breaks, only some of its line break modes would~~, cannot~~ be conformant.~~simply done by customizing in the pair-table. However, the additional task of line wrapping runs of, but requires a change in the way the driver loop handles spaces could also be performed after the fact at the layout system level while leaving unchanged the actual line breaking algorithm..~~

585~~/3.1~~/4~~/4.0.1/4.1~~/5~~/5.2~~/10

{10.0.0: 147-A79}

Example 3. Depending on the nature of the document, Korean either uses ~~either~~ implicit breaking around characters (type 2 as defined ~~above~~ in Section~~section~~ 3, Introduction~~Description~~) or uses spaces (type 1). Space-based layout is common in magazines and other informal documents with ragged margins, while books, with both margins justified, use the other type,~~such~~ as ~~magazines, while books, with both margins justified, use the other type, as~~ it affords more line break opportunities and therefore leads to better justification~~. Korean uses either implicit breaking around Hangul and ideographs or uses spaces~~. ~~Reference [Suign981] shows how the necessary customizationsthis can be elegantly handled by selectively altering the interpretation of the pair entries.:the second or third method. Only the intersectionsintersection of ID/ID, AL/ID, and ID/AL are affected. For alphabetic style line breaking, breaks for these four cases require space;, for ideographic style line breaking, these four cases do notdon’t require spaces. Therefore, the implementationonehe defines a pseudo-action, which is then resolved into either direct or indirect break action based on user selection of the preferred behavior for a given (piece of) text.~~

586~~/3.1/4.1~~/5~~/5.2~~/12

{12.0.0: 173-A128; PRI-383#ID20190106230734}

Example 4~~2~~. In~~Sometimes in~~ a Far Eastern context it is sometimes necessary~~required~~ to allow~~ing~~ alphabetic characters and digit strings to break anywhere~~ is required in Far Eastern context~~. According to reference [Suign98~~1~~], this can again be done in the same way as Korean. This can be implemented by adjusting rules LB23, LB25 and LB28 to allow breaks between all permutations of the character classes AL and~~In, this casetimeby the second or third method, affecting the intersections of~~ NU.~~/NU, NU/AL, AL/AL, and AL/NU are affected..~~

587~~/3.1~~/4~~/4.0.1/4.1~~/5~~/5.2~~/6.1

Example 5. Some users prefer~~3. Sometimes it is desirable~~ to relax the requirement that~~force~~ Kana syllables ~~to~~ be kept together. For~~, for~~ example,~~i.e.~~ the syllable kyu, spelled with the two kanas KI and “small yu”, would no longer be kept together as if~~even though~~ KI and yu were~~are normally~~ atomic. This customization can be handled~~ via the first method,~~ by mapping class CJ~~changing the classification of the Kana small characters frombetween NS~~ to be handled as class ID in rule LB1.~~to and NS as needed.~~
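A minimal sketch of the class remapping in Example 5, assuming the resolution of class CJ happens in a single function applied as part of rule LB1. The enum, the function name, and the relaxed flag are hypothetical. In the absence of tailoring, rule LB1 resolves CJ to NS, which is what keeps a small kana attached to the preceding kana.

```c
#include <stdbool.h>

/* Hypothetical subset of line break classes, for illustration only. */
typedef enum { LBC_CJ, LBC_NS, LBC_ID, LBC_OTHER } LineBreakClass;

/*
 * Resolves class CJ as part of rule LB1. With relaxed == false the
 * default resolution CJ -> NS is used; with relaxed == true the
 * tailoring from Example 5 maps CJ to ID instead, so that a break is
 * allowed before a small kana. All other classes pass through.
 */
LineBreakClass resolve_cj(LineBreakClass cls, bool relaxed)
{
    if (cls != LBC_CJ)
        return cls;
    return relaxed ? LBC_ID : LBC_NS;
}
```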

587.1~~/3.1/4.0.1~~/4.1

This paragraph was deleted. ~~Reference [Cedar97] reports on a real world implementation of a pair table-table~~ ~~based implementation of a line breaking algorithm substantially similar to the one presented here, and including the types of customizations presented in this section. ThatThis~~ ~~implementation was able to simultaneously meet the requirements of customers in many European and East Asian countries with a single implementation of the algorithm.~~

587.2~~/4.1~~/5~~/5.2/6.3/15~~/15.1

{15.1.0: 173-A6; L2/22-244; L2/22-243#ID20220921075300}

Example 6. Tailor~~Some implementations may wish~~ to prevent~~tailor the~~ line breaks from falling within default~~breaking algorithm to resolve~~ grapheme clusters, as defined by~~ according to~~ Unicode Standard Annex~~UAX~~ #29, “Unicode Text Segmentation~~Boundaries~~” [UAX29~~Boundaries~~]. The tailoring can be accomplished by~~, as a~~ first segmenting the text into grapheme clusters according to the rules defined in UAX #29, and then finding~~stage. Generally, the~~ line breaks according to the default line break rules, as follows: After applying the mandatory line break rules, give~~giving~~ each grapheme cluster the~~ing algorithm does not create~~ line breaking ~~opportunities within default grapheme clusters;, therefore such a tailoring would be expected to produce results that for most practical cases are close to those defined by the default algorithm. However, if such a tailoring is chosen, characters thatwhat are members of line break~~ class of its first code point.~~CM but not part of the definition of default grapheme clusters must still be handleddefined by rules LB9 and LB10, orthe default algorithm. However, if such a tailoring is chosen, characters that are members of line break class CM but not part of the definition of default grapheme clusters must still be handled by rules LB9 and LB10LB7b and LB7c, or by some additional tailoring.~~

587.2.1~~/6.3~~/15.1

{15.1.0: 173-A6; L2/22-244; L2/22-243#ID20220921075300}

An example of a grapheme cluster that would be split by the default line break rules is U+0020 SPACE followed by a ~~Zero Width Space followed by a~~ combining mark.
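The class assignment step of Example 6 can be sketched as follows. The sketch assumes that grapheme cluster boundaries have already been computed according to UAX #29 by a segmenter that is not shown, and that mandatory breaks have been handled beforehand; the enum, the function name, and the cluster_start array are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of line break classes, for illustration only. */
typedef enum { LBC_AL, LBC_CM, LBC_SP, LBC_OTHER } LineBreakClass;

/*
 * Collapses per-code-point classes into per-cluster classes.
 * cluster_start[i] is assumed to come from a UAX #29 grapheme cluster
 * segmenter (not shown) and is true when code point i begins a cluster.
 * Index 0 is always treated as a cluster start. Each cluster receives
 * the class of its first code point. out must have room for n entries;
 * the number of clusters written is returned.
 */
size_t cluster_classes(const LineBreakClass *cls, const bool *cluster_start,
                       size_t n, LineBreakClass *out)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (i == 0 || cluster_start[i])
            out[count++] = cls[i];
    }
    return count;
}
```

For the sequence SPACE, combining mark, letter, combining mark, the result is the two classes SP and AL, so no break opportunity can arise between the SPACE and its combining mark.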

587.3~~/4.1~~/5/16

{16.0.0: 179-C35, 179-A116}

Example 7 (deleted). Versions 4.1.0 through 15.1.0 of The Unicode Standard defined~~. Regular expression-based line breaking engines might get better results using~~ a tailoring of the line breaking of~~that directly implements the following regular expression for~~ numeric expressions as Example 7. This tailoring was used in the test files provided with Unicode 5.1.0 and later. Since Unicode version 16.0, that behavior has been incorporated into the default; it no longer constitutes a tailoring.~~:~~

587.4~~/4.1~~/5~~/6.1~~/16

{16.0.0: 179-C35, 179-A116}

587.5~~/4.1~~/5/16

{16.0.0: 179-C35, 179-A116}

This paragraph was deleted. ~~This is equivalent to replacing thetogether with PR × AL and PR × ID from~~ ~~rule LB25 by the followingLB18. In that case, LB8 must be~~ ~~tailored rule:as follows~~

587.5.1/5/16

{16.0.0: 179-C35, 179-A116}

This paragraph was deleted. ~~Regex-Number: Do not break numbers.~~

587.5.2/5/16

{16.0.0: 179-C35, 179-A116}

This paragraph was deleted. ~~(PR | PO) × ( OP | HY )? NU~~

587.5.3/5/16

{16.0.0: 179-C35, 179-A116}

This paragraph was deleted. ~~( OP | HY ) × NU~~

587.5.4/5/16

{16.0.0: 179-C35, 179-A116}

This paragraph was deleted. ~~NU × (NU | SY | IS)~~

587.5.5/5~~/6.1~~/16

{16.0.0: 179-C35, 179-A116}

This paragraph was deleted. ~~NU (NU | SY | IS)* × (NU | SY | IS | CL | CP )~~

587.5.6/5~~/6.1~~/16

{16.0.0: 179-C35, 179-A116}

This paragraph was deleted. ~~NU (NU | SY | IS)* (CL | CP)? × (PO | PR)~~

587.5.7/5~~/5.1/10~~/16

{10.0.0: 147-A79} {16.0.0: 179-C35, 179-A116}

This paragraph was deleted. ~~This customized rule uses extended contexts that cannot be represented in a pair table.~~ In these tailored rules, (PR | PO) means PR or PO, the Symbol “?” means 0 or one occurrence and the symbol “*” means 0 or more occurrences. The last two rules can have a left side of any non-zero length.

587.5.8/5~~/5.1~~/16

{16.0.0: 179-C35, 179-A116}

This paragraph was deleted. ~~When the tailored rule is used, LB13 need tomust also~~ ~~be tailored as follows:~~

587.6~~/4.1~~/16

{16.0.0: 179-C35, 179-A116}

This paragraph was deleted. ~~[^NU] × CL~~

587.6.1~~/6.1~~/16

{16.0.0: 179-C35, 179-A116}

This paragraph was deleted. ~~[^NU] × CP~~

587.7~~/4.1~~/16

{16.0.0: 179-C35, 179-A116}

This paragraph was deleted. ~~× EX~~

587.8~~/4.1~~/16

{16.0.0: 179-C35, 179-A116}

This paragraph was deleted. ~~[^NU] × IS~~

587.9~~/4.1~~/16

{16.0.0: 179-C35, 179-A116}

This paragraph was deleted. ~~[^NU] × SY~~

587.10~~/4.1~~/5~~/5.1/5.2~~/16

{16.0.0: 179-C35, 179-A116}

This paragraph was deleted. ~~If this is not doneOtherwise~~,~~otherwise~~ ~~single digits mightmay~~ ~~be handled by rule LB13LB8~~ ~~before being handled in the regular expression. In these tailored rules~~, ~~[^NU] designates any line break class other than NU. The symbol ^ is used, instead of !, to avoid confusion with the use of ! to indicate an explicit break.~~

587.10.1/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

Example 8. Some scripts that traditionally follow the Brahmic style of context analysis are nowadays occasionally written with spaces, and word-based line breaking might be desired in that case. This can be accomplished by remapping the line break classes AK, AP, and AS to AL; and VI or VF to CM. In some cases other word-forming characters, such as U+A9CF JAVANESE PANGRANGKEP, also need to be remapped to AL. Digits, which may have line break class AS or ID in such scripts, need to be remapped to NU. Punctuation, which may have line break class ID in such scripts, needs to be remapped to AL or BA.
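The class-level part of the remapping in Example 8 might look like the following sketch. The enum and function name are hypothetical. The character-specific remappings mentioned above (digits to NU, punctuation to AL or BA, and individual characters such as U+A9CF) depend on the code point rather than the class alone and are not covered here.

```c
/* Hypothetical subset of line break classes, for illustration only. */
typedef enum {
    LBC_AK, LBC_AP, LBC_AS, LBC_VI, LBC_VF,
    LBC_AL, LBC_CM, LBC_OTHER
} LineBreakClass;

/*
 * Remaps the Brahmic-style classes for text written with spaces:
 * AK, AP, and AS are handled as AL; VI and VF are handled as CM.
 * Every other class is returned unchanged.
 */
LineBreakClass remap_for_spaced_text(LineBreakClass cls)
{
    switch (cls) {
    case LBC_AK:
    case LBC_AP:
    case LBC_AS:
        return LBC_AL;
    case LBC_VI:
    case LBC_VF:
        return LBC_CM;
    default:
        return cls;
    }
}
```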

587.11~~/4.1~~/5/15.1

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

This paragraph was deleted. ~~Example 8. For someSome~~ ~~implementations it may be difficultwish~~ to implement LB9 due to the added complexity of its indefinite length context. Because combining marks are most commonly applied to characters of class AL, rule LB10 by itself generally produces acceptable results for such implementations, but such an approximation is not a conformant tailoring. the algorithm to omit rule LB7b due to the added complexity of its indefinite length context. Because combining marks are most commonly applied to characters of class AL, rule LB7c alone generally produces acceptable results for such implementations.

587.11.0.1~~/5.1~~/5.2

This paragraph was deleted. ~~Example 9. Prevent breaks when part of a word appears within parentheses—for example, in “person(s)”.~~

587.11.0.2~~/5.1~~/5.2

This paragraph was deleted. ~~1. Reclassify U+0029, RIGHT PARENTHESIS, from line break class CL (Closing Punctuation) to line break class IS (Numeric Infix Separator).~~

587.11.0.3~~/5.1~~/5.2

This paragraph was deleted. ~~2. Add the following rule as LB 30:~~

587.11.0.4~~/5.1~~/5.2

This paragraph was deleted. ~~(AL | NU) × OP~~

587.11.0.5~~/5.1~~/5.2

This paragraph was deleted. This customization is one possible way of achieving the original purpose of LB30—preventing breaks in words like "person(s)"—without the undesired side effect of preventing breaks after Asian punctuation characters having line breaking class CL (Closing Punctuation).

587.11.1/5

9 Implementation Notes

587.11.2/5

This section provides additional notes on implementation issues.

587.11.3/5

9.1 Combining Marks in Regular Expression-Based Implementations

587.11.4/5/5.2

Implementations~~For implementations~~ that use regular expressions cannot~~possible to~~ directly express rules LB9 and LB10. However, it is possible to make these rules unnecessary by rewriting all the rules from LB11 on down so that the overall result of the algorithm is unchanged. This restatement of the rules is therefore not a tailoring, but rather an equivalent statement of the algorithm that can be directly expressed as regular expressions.

587.11.5/5

To replace rule LB9, terms of the form

587.11.7/5

B SP* # A

587.11.10/5

are replaced by terms of the form

B CM* # A

B CM* SP* # A

B CM* #

B CM* SP* #

where B and A are any line break class or set of alternate line break classes, such as (X | Y), and where # is any of the three operators !, ÷, or ×.

587.11.16/5

Note that because sot, BK, CR, LF, NL, and ZW are all handled by rules above LB9, these classes cannot occur in position B in any rule that is rewritten as shown here.
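For implementations that scan class arrays rather than evaluate regular expressions, the context B CM* SP* can be located with a backward scan. The following is only a sketch under stated assumptions: the enum and function names are hypothetical, ZWJ is not modeled, and a run of CM that has no eligible base is reported as AL, which is the effect of rule LB10.

```c
#include <stddef.h>

/* Hypothetical subset of line break classes; LBC_SOT marks start of text. */
typedef enum {
    LBC_SOT, LBC_BK, LBC_CR, LBC_LF, LBC_NL, LBC_ZW,
    LBC_SP, LBC_CM, LBC_AL, LBC_OP, LBC_OTHER
} LineBreakClass;

/* Classes after which a combining mark is not absorbed by rule LB9. */
static int blocks_absorption(LineBreakClass c)
{
    return c == LBC_BK || c == LBC_CR || c == LBC_LF ||
           c == LBC_NL || c == LBC_SP || c == LBC_ZW;
}

/*
 * Returns the class B for the context "B CM* SP*" that ends just before
 * index pos (the first character after the candidate break position).
 */
LineBreakClass left_context_class(const LineBreakClass *cls, size_t pos)
{
    size_t i = pos;
    while (i > 0 && cls[i - 1] == LBC_SP)   /* skip SP* */
        i--;
    size_t j = i;
    while (j > 0 && cls[j - 1] == LBC_CM)   /* skip CM* */
        j--;
    if (j == i)                             /* no combining marks seen */
        return i > 0 ? cls[i - 1] : LBC_SOT;
    if (j == 0 || blocks_absorption(cls[j - 1]))
        return LBC_AL;                      /* CM without base acts as AL */
    return cls[j - 1];                      /* base character class B */
}
```

For the sequence OP CM CM SP SP followed by AL, a call with pos pointing at the final AL yields OP, so a rule of the form OP SP* × can be evaluated as if the combining marks were absent.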

587.11.17/5

Replace LB10 by the following rule:

587.11.19/5

For each rule containing AL on its left side, add a rule that is identical except for the replacement of AL by CM, but taking care of correctly handling sets of alternate line break classes. For example, for rule

587.11.20/5

(AL | NU) × OP

587.11.21/5

add another rule

587.11.22/5

CM × OP.
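The LB10 replacement can be sketched in the same spirit. In this hypothetical Python fragment a rule is a tuple of (set of left-hand classes, operator, right-hand class); that representation and the function name `lb10_extra_rules` are assumptions made for illustration only. Treating the left side as a set is one way of handling sets of alternate line break classes correctly: the added rule contains CM alone, not the other alternatives.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical rule shape: (left-hand classes, operator, right-hand class).
Rule = Tuple[FrozenSet[str], str, str]


def lb10_extra_rules(rules: List[Rule]) -> List[Rule]:
    """For each rule with AL among its left-hand classes, build the extra
    rule in which AL is replaced by CM."""
    extra = []
    for left, op, right in rules:
        if "AL" in left:
            # Only CM appears on the left; other alternatives such as NU
            # are already covered by the original rule.
            extra.append((frozenset({"CM"}), op, right))
    return extra


rules = [
    (frozenset({"AL", "NU"}), "×", "OP"),  # the example rule from the text
    (frozenset({"CL"}), "×", "NS"),        # no AL on the left: nothing added
]
print(lb10_extra_rules(rules))
```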

587.11.23/5~~/5.2~~/16

{16.0.0: 179-C35, 179-A116}

These prescriptions for rewriting the rules are, in principle, valid even where the rules have been tailored as permitted in Section 4, Conformance. However, for extended context rules such as in LB25~~Example 7~~, additional considerations apply. These are described in Section 6.2, Replacing Ignore Rules, of Unicode Standard Annex #29, “Unicode Text Segmentation~~Boundaries~~” [UAX29~~Boundaries~~].

587.12~~/4.1~~/5

9.2~~8.3~~ Legacy Support for Space Character as Base for Combining Marks

587.13~~/4.1~~/5~~/5.1/5.2~~/6/8

As stated in Section 7.9, Combining Marks of [Unicode], combining characters are shown in isolation by applying them to U+00A0 NO-BREAK SPACE (NBSP). In earlier versions, this recommendation included the use of U+0020 SPACE. The~~This~~ use of SPACE for this purpose has been~~is now~~ deprecated because it leads~~has been found~~ to many complications in text processing. The visual appearance is the same with both NO-BREAK SPACE and SPACE, but the line breaking behavior is different. Under the current rules, SP CM* will allow a break between SP and CM*, which could result in a new line starting with a combining mark. Previously, whenever the base character was SP, the sequences CM* and SP CM* were defined to act like indivisible clusters, allowing breaks on either side like ID.

587.14~~/4.1~~/5

Where backward~~backwards~~ compatibility with documents created under the prior practice is desired, the following tailoring should be applied to those CM characters that have a General_Category value of Combining_Mark (M):~~in place of the deprecated rule LB7a.~~

587.15~~/4.1~~/5

Legacy-CM: In all of the rules following rule LB8~~rules~~, if a space is the base character for a combining mark, the space is changed to type ID. In other words, break before SP in the same cases as one would break before an ID.

587.16/4.1

Treat SP CM* as if it were ID.
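A minimal Python sketch of the Legacy-CM tailoring follows. It assumes the text has already been resolved to a list of line break class names, and that only characters with General_Category M have been labelled CM; the function name `apply_legacy_cm` is hypothetical. The sketch only reclassifies the space. The combining marks themselves are left for the usual LB9 handling, and the result is meant to be used only by the rules that follow LB8.

```python
from typing import List


def apply_legacy_cm(classes: List[str]) -> List[str]:
    """Return a copy of `classes` in which every SP that is immediately
    followed by a CM (that is, a space used as a base) becomes ID."""
    resolved = list(classes)
    for i in range(len(resolved) - 1):
        if resolved[i] == "SP" and resolved[i + 1] == "CM":
            resolved[i] = "ID"
    return resolved


# A space carrying two combining marks, between two alphabetic characters.
print(apply_legacy_cm(["AL", "SP", "CM", "CM", "AL"]))
```

Keeping the list the same length means break positions computed on the resolved classes still line up with the original characters.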

587.17~~/4.1~~/5/5.1

While~~The application of~~ this tailoring changes the location of the line break opportunities in the string, it~~rule~~ is~~should~~ ordinarily not expected to affect the display of the text. That is because spaces at the end of the line are normally invisible and the recommended display for isolated combining marks is the same as if they were applied~~be limited~~ to a preceding SPACE or NBSP.~~those CM characters with General Category M.~~

587.18/5.1

10 Testing

587.19~~/5.1/5.2~~/8/16

As with the other default specifications, implementations are free to override (tailor) the results to meet the requirements of different environments or particular languages as described in Section 4, Conformance. For those who do implement the default breaks as specified in this annex~~, plus the tailoring of numbers described in Example 7 of Section 8.2, Examples of Customization,~~ and wish to check that ~~that~~ their implementation matches that specification, a test file has been made available in [Tests14].

587.20/5.1

These tests cannot be exhaustive, because of the large number of possible combinations; but they do provide samples that test all pairs of property values, using a representative character for each value, plus certain other sequences.
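For readers who want to run such a test file against their own implementation, the following Python sketch parses a single test line. It assumes the line layout commonly used by the Unicode segmentation test files, which this annex does not itself describe: break marks (× for no break, ÷ for a break) alternating with hexadecimal code points, followed by an optional comment introduced by #. The function name `parse_test_line` is hypothetical.

```python
from typing import List, Optional, Tuple


def parse_test_line(line: str) -> Optional[Tuple[str, List[bool]]]:
    """Parse one test line into (text, breaks).

    breaks[i] is True when a break is expected before character i;
    the final entry describes the position at the end of the text.
    Returns None for blank or comment-only lines.
    """
    data = line.split("#", 1)[0].strip()
    if not data:
        return None
    chars = []
    breaks = []
    for index, token in enumerate(data.split()):
        if index % 2 == 0:
            # Even positions hold the break marks.
            if token not in ("×", "÷"):
                raise ValueError(f"expected a break mark, got {token!r}")
            breaks.append(token == "÷")
        else:
            # Odd positions hold code points in hexadecimal.
            chars.append(chr(int(token, 16)))
    return "".join(chars), breaks


print(parse_test_line("× 0023 × 0023 ÷\t# two number signs"))
```

An implementation under test would then compute its own break positions for the returned text and compare them with `breaks`.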

587.20.1~~/5.2~~/10

{10.0.0: 147-A79}

This paragraph was deleted. Note: The break opportunities produced by an implementation of the rules of Section 6, Line Breaking Algorithm differ in certain cases from those produced by the pair table included in Section 7, Pair Table-Based Implementation. The differences occur with sequences like ZW SP CL. The test data file matches the results expected of a rule based implementation. The inconsistencies between the two will be addressed in the next revision of this document.

587.21/5.1

A sample HTML file that shows various combinations in chart form is also available in [Charts14]. The header cells of the chart consist of a property value, followed by a representative code point number. The body cells in the chart show the break status: whether a break occurs between the row property value and the column property value. If the browser supports tool-tips, then hovering the mouse over the code point number will show the character name, General_Category and Script property values. Hovering over the break status will display the number of the rule responsible for that status.

587.22/5.1

Note: To determine a break it is generally not sufficient to just test the two adjacent characters.

587.23/5.1

The chart is followed by some test cases. These test cases consist of various strings with the break status between each pair of characters shown by blue lines for breaks and by whitespace for non-breaks. Hovering over each character (with tool-tips enabled) shows the character name and property value; hovering over the break status shows the number of the rule responsible for that status.

587.24/5.1

Due to the way they have been mechanically processed for generation, the test rules do not match the rules in this annex precisely. In particular:

587.25/5.1

1. The rules are cast into a more regex-like style.

587.26/5.1

2. The rules “sot”, “eot”, and “Any” are added mechanically and have artificial numbers.

587.27~~/5.1~~/5.2

3. The rules are given decimal numbers without prefixes~~prefix~~, so rules such as LB14 are given a number using tenths, such as 14.0.

587.28/5.1

4. Where a rule has multiple parts (lines), each one is numbered using hundredths, such as

587.29/5.1

• 13.01) [^NU] × CL

587.30/5.1

• 13.02) × EX

587.32/5.1

5. LB9 and LB10 are handled as described in Section 9.1, Combining Marks in Regular Expression-Based Implementations, resulting in a transformation of the rules not visible in the tests.

587.33/5.1

The mapping from the rule numbering in this annex to the numbering for the test rules is summarized in Table 4.

587.34~~/5.1~~/5.2

Table 4. Numbering of Test Rules

587.35/5.1

Rule in This Annex | Test Rule | Comment

587.36/5.1

LB2 | 0.2 | start of text

587.37/5.1

LB3 | 0.3 | end of text

587.38/5.1

LB12a | 12.0 | GL ×

587.39/5.1

LB12b | 12.1 | [^SP, BA, HY] × GL

587.40/5.1

LB31 | 999 | ÷ any

587.41~~/5.2~~/16

{16.0.0: 175-A67}

11 History~~Rule Numbering Across Versions~~

587.42~~/5.2~~/16

{16.0.0: 175-A67}

This paragraph was deleted. Table 5 documents changes in the numbering of line breaking rules. A duplicate number indicates that a rule was subsequently split. (In each version, the rules are applied in their numerical order, not in the order they appear in this table.) Versions prior to 3.0.1 are not documented here.

587.43~~/5.2~~/16

{16.0.0: 175-A67}

This paragraph was deleted. ~~Table 5. Rule Numbering Across Versions~~

587.44~~/5.2/6.1/6.2~~/8/9~~/15.1~~/16

{16.0.0: 175-A67}

This paragraph was deleted. ~~15.1.0~~

~~9.0.0~~

~~8.06.2~~.0

~~6.2.0~~

~~6.15.2~~.0

~~5.2~~1.0

~~5.1.0~~

~~5.0.0~~

~~4.1.0~~

~~4.0.1~~

~~4.0.0~~

~~3.2.0~~

~~3.1.0~~

~~3.0.1~~

587.45~~/5.2/6.1/6.2~~/8/9~~/15.1~~/16

{16.0.0: 175-A67}

This paragraph was deleted. ~~LB1~~

587.46~~/5.2/6.1/6.2~~/8/9~~/15.1~~/16

{16.0.0: 175-A67}

This paragraph was deleted. ~~LB2~~

587.47~~/5.2/6.1/6.2~~/8/9~~/15.1~~/16

{16.0.0: 175-A67}

This paragraph was deleted. ~~LB3~~

587.48~~/5.2/6.1/6.2~~/8/9~~/15.1~~/16

{16.0.0: 175-A67}

This paragraph was deleted. ~~LB4~~

587.49~~/5.2/6.1/6.2~~/8/9~~/15.1~~/16

{16.0.0: 175-A67}

This paragraph was deleted. ~~LB5~~

587.50~~/5.2/6.1/6.2~~/8/9~~/15.1~~/16

{16.0.0: 175-A67}

This paragraph was deleted. ~~LB6~~

587.51~~/5.2/6.1/6.2~~/8/9~~/15.1~~/16

{16.0.0: 175-A67}

This paragraph was deleted. ~~LB7~~

587.52~~/5.2/6.1/6.2~~/8/9~~/15.1~~/16

{16.0.0: 175-A67}

This paragraph was deleted. ~~LB8~~

587.52.1/9~~/15.1~~/16

{16.0.0: 175-A67}

This paragraph was deleted. ~~LB8a~~

587.53~~/5.2~~/16

{16.0.0: 175-A67}

This paragraph was deleted. ~~deprecated~~

587.54~~/5.2/6.1/6.2~~/8/9~~/15.1~~/16

{16.0.0: 175-A67}

This paragraph was deleted. ~~LB9~~

587.55~~/5.2/6.1/6.2~~/8/9~~/15.1~~/16

{16.0.0: 175-A67}

This paragraph was deleted. ~~LB10~~

587.56~~/5.2/6.1/6.2~~/8/9~~/15.1~~/16

{16.0.0: 175-A67}

This paragraph was deleted. ~~LB11~~

~~11b~~

587.57~~/5.2/6.1/6.2~~/8/9~~/15.1~~/16

{16.0.0: 175-A67}

This paragraph was deleted. ~~LB12~~

~~11b~~

587.58~~/5.2/6.1/6.2~~/8/9~~/15.1~~/16

{16.0.0: 175-A67}

This paragraph was deleted. ~~LB12a~~

~~12a~~

~~11b~~

587.59~~/5.2/6.1/6.2~~/8/9~~/15.1~~/16

{16.0.0: 175-A67}

This paragraph was deleted. ~~LB13~~

587.60~~/5.2/6.1/6.2~~/8/9~~/15.1~~/16

{16.0.0: 175-A67}

This paragraph was deleted. ~~LB14~~

587.61~~/5.2/6.1/6.2~~/8/9~~/15.1~~/16

{16.0.0: 175-A67}

This paragraph was deleted. ~~split~~

15~~LB15~~

587.61.1~~/15.1~~/16

{16.0.0: 175-A67}

This paragraph was deleted. ~~LB15a~~

587.61.2~~/15.1~~/16

{16.0.0: 175-A67}

This paragraph was deleted. ~~LB15b~~

587.62~~/5.2/6.1/6.2~~/8/9~~/15.1~~/16

{16.0.0: 175-A67}

This paragraph was deleted. ~~LB16~~

587.63~~/5.2/6.1/6.2~~/8/9~~/15.1~~/16

{16.0.0: 175-A67}

This paragraph was deleted. ~~LB17~~

~~11a~~

587.64~~/5.2/6.1/6.2~~/8/9~~/15.1~~/16

{16.0.0: 175-A67}

This paragraph was deleted. ~~LB18~~

587.65~~/5.2/6.1/6.2~~/8/9~~/15.1~~/16

{16.0.0: 175-A67}

This paragraph was deleted. ~~LB19~~

587.66~~/5.2/6.1/6.2~~/8/9~~/15.1~~/16

{16.0.0: 175-A67}

This paragraph was deleted. ~~LB20~~

~~14a~~

587.67~~/5.2/6.1/6.2~~/8/9~~/15.1~~/16

{16.0.0: 175-A67}

This paragraph was deleted. ~~LB21~~

587.67.1~~/6.1/6.2~~/8/9~~/15.1~~/16

{16.0.0: 175-A67}

This paragraph was deleted. ~~LB21a~~

~~21a~~

587.67.2/8/9~~/15.1~~/16

{16.0.0: 175-A67}

This paragraph was deleted. ~~LB21b~~

~~21b~~

587.68~~/5.2/6.1/6.2~~/8/9~~/15.1~~/16

{16.0.0: 175-A67}

This paragraph was deleted. ~~LB22~~

587.69~~/5.2/6.1/6.2~~/8/9~~/15.1~~/16

{16.0.0: 175-A67}

This paragraph was deleted. ~~LB23~~

587.69.1/9~~/15.1~~/16

{16.0.0: 175-A67}

This paragraph was deleted. ~~LB23a~~

~~23a~~

587.70~~/5.2/6.1/6.2~~/8/9~~/15.1~~/16

{16.0.0: 175-A67}

This paragraph was deleted. ~~LB24~~

587.71~~/5.2/6.1/6.2~~/8/9~~/15.1~~/16

{16.0.0: 175-A67}

This paragraph was deleted. ~~LB25~~

587.72~~/5.2~~/16

{16.0.0: 175-A67}

This paragraph was deleted. ~~removed~~

~~18b~~

~~15b~~

587.73~~/5.2/6.1/6.2~~/8/9~~/15.1~~/16

{16.0.0: 175-A67}

This paragraph was deleted. ~~LB26~~

~~18b~~

587.74~~/5.2/6.1/6.2~~/8/9~~/15.1~~/16

{16.0.0: 175-A67}

This paragraph was deleted. ~~LB27~~

~~18c~~

587.75~~/5.2/6.1/6.2~~/8/9~~/15.1~~/16

{16.0.0: 175-A67}

This paragraph was deleted. ~~LB28~~

587.75.1~~/15.1~~/16

{16.0.0: 175-A67}

This paragraph was deleted. ~~LB28a~~

587.76~~/5.2/6.1/6.2~~/8/9~~/15.1~~/16

{16.0.0: 175-A67}

This paragraph was deleted. ~~LB29~~

~~19b~~

587.77~~/5.2/6.1/6.2~~/8/9~~/15.1~~/16

{16.0.0: 175-A67}

This paragraph was deleted. ~~LB30~~

~~removed~~

587.77.1~~/6.2~~/8/9~~/15.1~~/16

{16.0.0: 175-A67}

This paragraph was deleted. ~~LB30a~~

~~30a~~

587.77.2/9~~/15.1~~/16

{16.0.0: 175-A67}

This paragraph was deleted. ~~LB30b~~

~~30b~~

587.78~~/5.2/6.1/6.2~~/8/9~~/15.1~~/16

{16.0.0: 175-A67}

This paragraph was deleted. ~~LB31~~

587.79/16

{16.0.0: 175-A67}

Since its publication in 1999 as part of Unicode Version 3.0.0, the line breaking algorithm has undergone many changes. It started as a set of 29 line breaking classes involved in 23 rules which were representable as a pair table with some special handling for combining marks and spaces. It now encompasses 48 line breaking classes involved in more than 40 rules, many of which rely on extended context which may be several characters removed from the position they govern.

587.80/16

{16.0.0: 175-A67}

As the algorithm grew, rules were split, reordered, added, and removed. In Unicode Version 5.0, the rules were renumbered to reduce the number of alphabetic suffixes on the rule numbers.

587.81/16

{16.0.0: 175-A67}

Please refer to Unicode Technical Note #54, “Annotated Line Breaking Algorithm” [UTN54], for a complete history of the changes to the text of this document since Unicode Version 3.0.0, and for additional background on these changes.

588/3.1

8 References

589/3.1

This paragraph was deleted. ~~[1]Michel Suignard, Worldwide Typography and How to Apply JIS X 4051-1995 to Unicode, Proceedings of the Twelfth International Unicode/ISO 10646 Conference, Tokyo, Japan, 1998~~

590/3.1

This paragraph was deleted. [2] Cy Cedar, David Veintimilla, Michel Suignard and Asmus Freytag, Report from the Trenches: Microsoft Publisher goes Unicode, Proceedings of the Eleventh International Unicode Conference, San Jose, CA 1997

591/3.1

This paragraph was deleted. ~~[3] The Unicode Standard, Version 3.0, (Reading, Massachusetts: Addison-Wesley Developers Press 2000)~~

592/3.1

This paragraph was deleted. ~~[4] Donald E. Knuth and Michael F. Plass, Breaking Lines into Paragraphs, republished in Digital Typography, CSLI 78, (Stanford, California: CLSI Publications1997)~~

592.1/4~~/4.1~~/5

This paragraph was deleted. ~~[Bidi]~~

~~Unicode Standard Annex #9~~27~~: Unicode BidirectionalBidirectinal~~ ~~Algorithm http://www.unicode.org/unicode/reports/tr9/~~

592.2/4~~/4.1~~/5

This paragraph was deleted. ~~[Boundaries]~~

~~Unicode Standard Annex #29, Text Boundaries. http://www.unicode.org/unicode/reports/tr29/ For information on grapheme cluster boundaries~~

593/3.1

This paragraph was deleted. ~~[5] Donald E. Knuth, TEX, the Program, Volume B of Computers & Typesetting, (Reading, Massachusetts: Addison-Wesley 1986)~~

593.1~~/3.1~~/5

This paragraph was deleted. ~~[Cedar97]~~

Cy Cedar, David Veintimilla, Michel Suignard and Asmus Freytag, Report from the Trenches: Microsoft Publisher goes Unicode, Proceedings of the Eleventh International Unicode Conference, San Jose, CA 1997

593.1.1~~/4.1~~/5

This paragraph was deleted. ~~[Code]~~

~~Sample code implementing the pair table http://www.unicode.org/Public/PROGRAMS/LineBreakSampleCpp/ Contains the code samples shown in this document together with driver code~~

593.2~~/3.1/3.2~~/4~~/4.0.1/4.1~~/5

This paragraph was deleted. ~~[Data]~~

~~Line Break property data file For the latestThe~~ ~~current~~ ~~version, see:~~ ~~of the line breaking property data file at the time of the publication of this document is~~ ~~http://www.unicode.org/Public/4.03.2~~1~~-Update/LineBreak-43.2.0.0.txt The latest version of the data file is http://www.unicode.org/Public/UNIDATA/LineBreak.txt For the current version, see: http://www.unicode.org/Public/4.1.0/ucd/LineBreak.txt For other versions, see: http://www.unicode.org/versions/~~

593.3~~/3.1/4.1~~/5

This paragraph was deleted. ~~[EAW]~~

~~Unicode Standard Annex #11, East Asian Width. http://www.unicode.org/unicode/reports/tr11/ For a definition of East Asian Width~~

593.4~~/3.1/4.1~~/5

This paragraph was deleted. ~~[FAQ]~~

~~Unicode Frequently Asked Questions http://www.unicode.org/unicode/faq/ For answers to common questions on technical issues.~~

593.4.1/4/5

This paragraph was deleted. ~~[Feedback]~~

~~http://www.unicode.org/reporting.html For reporting errors and requesting information online.~~

593.5~~/3.1~~/5

This paragraph was deleted. ~~[Glossary]~~

~~Unicode Glossary http://www.unicode.org/glossary/ For explanations of terminology used in this and other documents.~~

593.5.1~~/4.0.1~~/5

This paragraph was deleted. ~~[HangulST]~~

~~The latest version of the Hangul Syllable Types property data file is http://www.unicode.org/Public/UNIDATA/HangulSyllableType.txt~~

593.6~~/3.1/4.0.1~~/5

This paragraph was deleted. ~~[JIS]~~

~~JIS X 4051-1995. Line Composition Rules for Japanese Documents. (~~ ~~『日本語文晝の行組版方法』) Japanese Standards Association. 1995.~~

593.7~~/3.1/4.1~~/5

This paragraph was deleted. ~~[Knuth78]~~

~~Donald E. Knuth and Michael F. Plass, Breaking Lines into Paragraphs, republished in Digital Typography, CSLI 78, (Stanford, California: CLSI Publications 1997Publications1997~~)

593.8~~/3.1/4.1~~/5

This paragraph was deleted. ~~[Reports]~~

~~Unicode Technical Reports http://www.unicode.org/unicode/reports/ For information on the status and development process for technical reports, and for a list of technical reports.~~

593.9~~/3.1~~/5

This paragraph was deleted. ~~[Suign98]~~

~~Michel Suignard, Worldwide Typography and How to Apply JIS X 4051-1995 to Unicode, Proceedings of the Twelfth International Unicode/ISO 10646 Conference, Tokyo, Japan, 1998~~

593.10~~/3.1/4.0.1~~/5

This paragraph was deleted. ~~[TEXTeX~~]

~~Donald E. Knuth, TEX, the Program, Volume B of Computers & Typesetting, (Reading, Massachusetts: Addison-Wesley 1986)~~

593.11~~/3.1~~/4/5

This paragraph was deleted. ~~[UnicodeU3.0~~]

~~The Unicode Standard, Version 4~~3~~.0, (Reading, Massachusetts: Addison-Wesley Developers Press 2003, ISBN 0-321-18578-12000) or online as http://www.unicode.org/versionsunicode/Unicode4.0.0uni2book~~/~~u2.html~~

593.12~~/3.1~~/4

This paragraph was deleted. ~~[U3.1]~~

~~Unicode Standard Annex #27: Unicode 3.1 http://www.unicode.org/unicode/reports/tr27/~~

593.12.1~~/3.2~~/4

This paragraph was deleted. ~~[U3.2]~~

~~Unicode Standard Annex #28: Unicode 3.2 http://www.unicode.org/unicode/reports/tr28/~~

593.13~~/3.1/3.2~~/4/5

This paragraph was deleted. ~~[UCD]~~

~~Unicode Character Database~~. ~~http://www.unicode.org/ucd/ For an overview of the Unicode Character Database and a list of its associated files see http://www.unicode.org/Public/UNIDATA/UCDUnicodeCharacterDatabase.html~~ ~~For and overview of the Unicode Character Database and a list of its associated files~~

593.14~~/3.1/4.1~~/5

This paragraph was deleted. ~~[Versions]~~

~~Versions of the Unicode Standard http://www.unicode.org/unicode/standard/versions/ For details on the precise contents of each version of the Unicode Standard, and how to cite them.~~

593.15/5/5.2

For references for this annex, see Unicode Standard Annex #41, “Common References for Unicode Standard Annexes” [UAX41].

594/3.1

9 Acknowledgments

594.1/5/6.3

This paragraph was deleted. ~~Asmus Freytag is the author of the initial version and has added to and maintained the text of this annex.~~

594.2/6~~/13~~/15.1

Asmus Freytag created the initial version of this annex and maintained the text for many years. Andy Heninger maintained~~maintains~~ the text from 2008 through 2019. Christopher Chapman maintained the text from 2020 through 2022. Robin Leroy has maintained the text since September 2022.

595~~/3.1~~/4~~/4.1~~/5~~/5.1/15.1~~/16

{15.1.0: 173-A8; L2/22-244; PRI-446#ID20220410201211} {15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535} {16.0.0: 179-A97; PRI-446#ID20220405071453} {16.0.0: 179-C25, 179-A98; PRI-335#ID20170429231648} {16.0.0: 172-A100; PRI-446#ID20220603194905} {16.0.0: 179-C29, 179-A105; PRI-335#ID20170429224811}

The initial assignments of properties are based on input by Michel Suignard. Mark Davis provided algorithmic verification and formulation of the rules, and detailed suggestions on the algorithm and text. Ken Whistler, Rick McGowan, Deborah Anderson, Lorna Evans, and other members of the editorial committee provided valuable feedback. Tim Partridge enlarged the information on dictionary usage. Sun Gi Hong reviewed the information on Korean and provided copious printed samples. Eric Muller reanalyzed the behavior of the soft hyphen and collected the samples. Adam Twardoch provided the Polish example. António Martins-Tuválkin supplied information about Portuguese. Tomoyuki Sadahiro provided information on use of U+30A0. Christopher Fynn provided the background information on Tibetan line breaking. Andrew West, Kamal Mansour, Andrew Glass, Daniel Yacob, and Peter Kirk suggested improvements for Mongolian, Arabic, Kharoshthi, Ethiopic, and Hebrew punctuation characters, respectively. Kent Karlsson reviewed the line break properties for consistency. Jerry Hall~~Andy Heninger~~ reviewed the sample code. Elika J. Etemad (fantasai) reviewed the entire document in an effort to make it easier to reference from external standards. Norbert Lindenberg added the Brahmic style of line breaking and provided clarifications on the South East Asian style of line breaking. Charlotte Buff and David Corbett provided ample feedback on property assignments and ramifications of the rules. Many others provided additional review of the rules and ~~provided input on regular expression-based implementations. Many others provided additional review of the rules and~~ property assignments.

596/3.1

Modifications~~10 Changes from previous revisions~~

596.-4.1/5/6.1

This paragraph was deleted. Change History

596.-4.2/5/6.3

This paragraph was deleted. ~~For details of the change history, see the online copy of this annex at http://www.unicode.org/reports/tr14/.~~

596.-4.3/5/5.2

This paragraph was deleted. Rule Numbering Across Versions

596.-4.4/5~~/5.1~~/5.2

This paragraph was deleted. ~~Table 5~~4 documents changes in the numbering of line breaking rules. A duplicate number indicates that a rule was subsequently split. (In each version, the rules are applied in their numerical order, not in the order they appear in this table.) Versions prior to 3.0.1 are not documented here.

596.-4.5/5~~/5.1~~/5.2

This paragraph was deleted. ~~Table 5~~4~~. Rule Numbering Across Versions~~

596.-4.6/5~~/5.1~~/5.2

This paragraph was deleted. ~~5.1.0~~

~~5.0.0~~

~~4.1.0~~

~~4.0.1~~

~~4.0.0~~

~~3.2.0~~

~~3.1.0~~

~~3.0.1~~

596.-4.7/5~~/5.1~~/5.2

This paragraph was deleted. ~~LB1~~

596.-4.8/5~~/5.1~~/5.2

This paragraph was deleted. ~~LB2~~

596.-4.9/5~~/5.1~~/5.2

This paragraph was deleted. ~~LB3~~

596.-4.10/5~~/5.1~~/5.2

This paragraph was deleted. ~~LB4~~

596.-4.11/5~~/5.1~~/5.2

This paragraph was deleted. ~~LB5~~

596.-4.12/5~~/5.1~~/5.2

This paragraph was deleted. ~~LB6~~

596.-4.13/5~~/5.1~~/5.2

This paragraph was deleted. ~~LB7~~

596.-4.14/5~~/5.1~~/5.2

This paragraph was deleted. ~~LB8~~

596.-4.15/5/5.2

This paragraph was deleted. ~~deprecated~~

596.-4.16/5~~/5.1~~/5.2

This paragraph was deleted. ~~LB9~~

596.-4.17/5~~/5.1~~/5.2

This paragraph was deleted. ~~LB10~~

596.-4.18/5~~/5.1~~/5.2

This paragraph was deleted. ~~LB11~~

~~11b~~

596.-4.19/5~~/5.1~~/5.2

This paragraph was deleted. ~~LB12~~

~~11b~~

596.-4.19.1~~/5.1~~/5.2

This paragraph was deleted. ~~LB12a~~

~~11b~~

596.-4.20/5~~/5.1~~/5.2

This paragraph was deleted. ~~LB13~~

596.-4.21/5~~/5.1~~/5.2

This paragraph was deleted. ~~LB14~~

596.-4.22/5~~/5.1~~/5.2

This paragraph was deleted. ~~LB15~~

596.-4.23/5~~/5.1~~/5.2

This paragraph was deleted. ~~LB16~~

596.-4.24/5~~/5.1~~/5.2

This paragraph was deleted. ~~LB17~~

~~11a~~

596.-4.25/5~~/5.1~~/5.2

This paragraph was deleted. ~~LB18~~

596.-4.26/5~~/5.1~~/5.2

This paragraph was deleted. ~~LB19~~

596.-4.27/5~~/5.1~~/5.2

This paragraph was deleted. ~~LB20~~

~~14a~~

596.-4.28/5~~/5.1~~/5.2

This paragraph was deleted. ~~LB21~~

596.-4.29/5~~/5.1~~/5.2

This paragraph was deleted. ~~LB22~~

596.-4.30/5~~/5.1~~/5.2

This paragraph was deleted. ~~LB23~~

596.-4.31/5~~/5.1~~/5.2

This paragraph was deleted. ~~LB24~~

596.-4.32/5~~/5.1~~/5.2

This paragraph was deleted. ~~LB25~~

596.-4.33/5/5.2

This paragraph was deleted. ~~removed~~

~~18b~~

~~15b~~

596.-4.34/5~~/5.1~~/5.2

This paragraph was deleted. ~~LB26~~

~~18b~~

596.-4.35/5~~/5.1~~/5.2

This paragraph was deleted. ~~LB27~~

~~18c~~

596.-4.36/5~~/5.1~~/5.2

This paragraph was deleted. ~~LB28~~

596.-4.37/5~~/5.1~~/5.2

This paragraph was deleted. ~~LB29~~

~~19b~~

596.-4.38/5~~/5.1~~/5.2

This paragraph was deleted. ~~LB30~~

596.-4.39/5~~/5.1~~/5.2

This paragraph was deleted. ~~LB31~~

596.-4.40/5/6.1

This paragraph was deleted. Change History

596.-3.1~~/4.1~~/5/11

The following summarizes modifications from the previous~~documentsThis section indicates~~ ~~the changes introduced by each~~ revision of this annex.

596.-3.1.-9.1~~/10~~/11

This paragraph was deleted. Revision 39:

596.-3.1.-9.2~~/10~~/11

This paragraph was deleted. ~~• Reissued for Unicode 10.0.0.~~

596.-3.1.-9.3~~/10~~/11

This paragraph was deleted. ~~• Removed Section 7, Pair Table Based Implementation, and other references to it.~~

596.-3.1.-9.4~~/10~~/11

This paragraph was deleted. ~~• Changed definition of Regional Indicator to refer to the corresponding Unicode character property.~~

596.-3.1.-9.5~~/10~~/11

This paragraph was deleted. ~~• Changed text referring to customizations from CLDR to match the wording agreed to for UAX #29.~~

596.-3.1.-9.6~~/10~~/11

This paragraph was deleted. ~~• Updated the list of ranges and blocks that default to class ID.~~

596.-3.1.-9.7~~/10~~/11

This paragraph was deleted. ~~• Removed U+FF70 from the list of example characters for class NS.~~

596.-3.1.-9.8~~/10~~/11

This paragraph was deleted. ~~Revision 38 being a proposed update, only changes between revisions 37 and 39 are noted here.~~

596.-3.1.-8.1/9/11

This paragraph was deleted. Revision 37:

596.-3.1.-8.2/9/11

This paragraph was deleted. ~~• Reissued for Unicode 9.0.0.~~

596.-3.1.-8.3/9/11

This paragraph was deleted. ~~• Revised rules 23 & 24, and added 23a, to prevent breaks between (letters or numbers) and (numeric prefixes or postfixes), on either side.~~

596.-3.1.-8.4/9/11

This paragraph was deleted. ~~• Added Emoji Base (EB) and Emoji Modifier (EM) character classes, and rule 30b to prevent breaks within emoji modifier sequences.~~

596.-3.1.-8.5/9/11

This paragraph was deleted. ~~• Revised rules 8a, 22 and 23a to treat EB and EM similarly to ID in most contexts.~~

596.-3.1.-8.6/9/11

This paragraph was deleted. ~~• Regional Indicators: revised class description and rule LB30a to group into pairs, with break opportunities between pairs.~~

596.-3.1.-8.7/9/11

This paragraph was deleted. ~~• Added ZWJ class and rule 8a to prevent breaks within emoji ZWJ sequences.~~

596.-3.1.-8.8/9/11

This paragraph was deleted. ~~• Revised rules 9 and 10 to treat any ZWJ as a combining mark when not in the context of an emoji zwj sequence.~~

596.-3.1.-8.9/9/11

This paragraph was deleted. ~~Revision 36 being a proposed update, only changes between revisions 35 and 37 are noted here.~~

596.-3.1.-7.1/8/11

This paragraph was deleted. Revision 35:

596.-3.1.-7.2/8/11

This paragraph was deleted. ~~• Reissued for Unicode 8.0.0.~~

596.-3.1.-7.3/8/11

This paragraph was deleted. ~~• Updated table styles. Minor editing and HTML cleanup throughout. [KW]~~

596.-3.1.-7.4/8/11

This paragraph was deleted. ~~• Added EX × IN to rule 22.~~

596.-3.1.-7.5/8/11

This paragraph was deleted. ~~• Added rule 21b, don’t break between Solidus and Hebrew.~~

596.-3.1.-7.6/8/11

This paragraph was deleted. ~~Revision 34 being a proposed update, only changes between revisions 33 and 35 are noted here.~~

596.-3.1.-6.1/7/11

This paragraph was deleted. Revision 33:

596.-3.1.-6.2/7/11

This paragraph was deleted. ~~• Reissued for Unicode 7.0.0.~~

596.-3.1.-5.1~~/6.3~~/11

This paragraph was deleted. Revision 32:

596.-3.1.-5.2~~/6.3~~/11

This paragraph was deleted. ~~• Reissued for Unicode 6.3.0.~~

596.-3.1.-5.3~~/6.3~~/11

This paragraph was deleted. ~~• Update the description of class CM to mention U+3035 VERTICAL KANA REPEAT MARK LOWER HALF.~~

596.-3.1.-5.4~~/6.3~~/11

This paragraph was deleted. ~~• Update the description of class BA to reflect the change of U+3000 IDEOGRAPHIC SPACE to class BA.~~

596.-3.1.-5.5~~/6.3~~/8/11

This paragraph was deleted. ~~• Clarify descriptions in Sectionsection~~ ~~7.3, Example Pair Table.~~

596.-3.1.-5.6~~/6.3~~/8/11

This paragraph was deleted. ~~• Remove Sectionsection~~ ~~7.6, Conjoining Jamos pair table implementation, reflecting that Jamos are included directly in the main pair table.~~

596.-3.1.-5.7~~/6.3~~/8/11

This paragraph was deleted. ~~• Revised Example 6 of Sectionsection~~ ~~8.2, Examples of Customization.~~

596.-3.1.-5.8~~/6.3~~/11

This paragraph was deleted. ~~Revision 31 being a proposed update, only changes between revisions 32 and 30 are noted here.~~

596.-3.1.-4.1~~/6.2~~/11

This paragraph was deleted. Revision 30:

596.-3.1.-4.2~~/6.2~~/11

This paragraph was deleted. ~~• Reissued for Unicode 6.2.0.~~

596.-3.1.-4.3~~/6.2~~/11

This paragraph was deleted. ~~• Introduce character class RI (Regional Indicator).~~

596.-3.1.-4.4~~/6.2~~/11

This paragraph was deleted. ~~• Introduce rule 30a, do not break between Regional Indicators.~~

596.-3.1.-4.5~~/6.2~~/11

This paragraph was deleted. ~~Revision 29 being a proposed update, only changes between revisions 30 and 28 are noted here.~~

596.-3.1.-3.1~~/6.1~~/11

This paragraph was deleted. Revision 28:

596.-3.1.-3.2~~/6.1~~/11

This paragraph was deleted. ~~• Reissued for Unicode 6.1.0.~~

596.-3.1.-3.3~~/6.1~~/11

This paragraph was deleted. ~~• Add rule 21a, don't break after Hebrew + hyphen.~~

596.-3.1.-3.4~~/6.1~~/11

This paragraph was deleted. ~~• Introduce character class HL (Hebrew Letter).~~

596.-3.1.-3.5~~/6.1~~/11

This paragraph was deleted. ~~• Introduce character class CJ for small kana, and amend rule LB1 to provide default resolution for class CJ.~~

596.-3.1.-3.6~~/6.1~~/11

This paragraph was deleted. ~~• Clarify that the list of GL characters is not comprehensive.~~

596.-3.1.-3.7~~/6.1~~/8/11

This paragraph was deleted. ~~• Update Example 7 of Section 8.2, Examples of~~ (~~Customization,~~) ~~to reflect the introduction of character class CP.~~

596.-3.1.-3.8~~/6.1~~/11

This paragraph was deleted. ~~Revision 27 being a proposed update, only changes between revisions 24 and 26 are noted here.~~

596.-3.1.-2.1/6/11

This paragraph was deleted. Revision 26:

596.-3.1.-2.2/6/11

This paragraph was deleted. ~~• Reissued for Unicode 6.0.0.~~

596.-3.1.-2.3/6/11

This paragraph was deleted. ~~• In Section 5.1, revised the description of the SHY character.~~

596.-3.1.-2.4/6/11

This paragraph was deleted. ~~• Changed LB8 from "ZW ÷" to "ZW SP* ÷" in accordance with UAX 14 Corrigendum #7, http://www.unicode.org/versions/corrigendum7.html.~~

596.-3.1.-2.5/6/11

This paragraph was deleted. ~~• Changed references to Unicode 5.2 to generic references where appropriate.~~

596.-3.1.-2.6/6/11

This paragraph was deleted. ~~• In Section 3, removed Ideographic Space from the list of spaces that may be compressed or expanded and clarified that the justification of lines is out of scope.~~

596.-3.1.-2.7/6/11

This paragraph was deleted. ~~Revision 25 being a proposed update, only changes between revisions 24 and 26 are noted here.~~

596.-3.1.-1.1~~/5.2~~/11

This paragraph was deleted. Revision 24:

596.-3.1.-1.2~~/5.2~~/11

This paragraph was deleted. ~~• Reissued for Unicode 5.2.0.~~

596.-3.1.-1.3~~/5.2~~/11

This paragraph was deleted. ~~• Added class CP, reintroduced rule LB30, adjusted other rules for class CP.~~

596.-3.1.-1.4~~/5.2~~/6/11

This paragraph was deleted. ~~• In Sectionsection~~ ~~5.1, clarified that the lists of characters for each property contain representative characters, and are not necessarily complete.~~

596.-3.1.-1.5~~/5.2~~/11

This paragraph was deleted. ~~• Unassigned code points in CJK regions default to class ID.~~

596.-3.1.-1.6~~/5.2~~/11

This paragraph was deleted. ~~• Added Tai Tham, Myanmar Extended-A and Tai Viet blocks to class SA.~~

596.-3.1.-1.7~~/5.2~~/11

This paragraph was deleted. ~~• Brought descriptive names of the character classes into closer alignment with the Unicode Line Break character property values.~~

596.-3.1.-1.8~~/5.2~~/11

This paragraph was deleted. ~~• Small edits for improved clarity, document style.~~

596.-3.1.-1.9~~/5.2~~/11

This paragraph was deleted. ~~Revision 23 being a proposed update, only changes between revisions 22 and 24 are noted here.~~

596.-3.1.0.1~~/5.1~~/11

This paragraph was deleted. Revision 22:

596.-3.1.0.2~~/5.1~~/8/11

This paragraph was deleted. ~~• ReissuedUpdated~~ ~~for Version 5.1.0.~~

596.-3.1.0.3~~/5.1~~/11

This paragraph was deleted. ~~• Added 2E18, INVERTED INTERROBANG, to class OP.~~

596.-3.1.0.4~~/5.1~~/11

This paragraph was deleted. ~~• Added 2064, INVISIBLE PLUS, to class AL.~~

596.-3.1.0.5~~/5.1~~/11

This paragraph was deleted. ~~• Added 2E00..2E01, 2E06..2E08, 2E0B to class QU.~~

596.-3.1.0.6~~/5.1~~/11

This paragraph was deleted. ~~• Removed LB30, to correct regression for U+3002 IDEOGRAPHIC FULL STOP~~

596.-3.1.0.7~~/5.1~~/11

This paragraph was deleted. ~~• Add Example 9 to Section 8, Customization.~~

596.-3.1.0.8~~/5.1~~/11

This paragraph was deleted. ~~• Substantial revisions to Section 4, Conformance.~~

596.-3.1.0.9~~/5.1~~/11

This paragraph was deleted. ~~• Section 5.7, Word Separators, added.~~

596.-3.1.0.10~~/5.1~~/11

This paragraph was deleted. ~~• Section 10, Testing, added.~~

596.-3.1.0.11~~/5.1~~/8/11

This paragraph was deleted. ~~• Renumber rules for consistency: 12a →~~-> ~~12; 12b →~~-> ~~12a~~

596.-3.1.0.12~~/5.1~~/11

This paragraph was deleted. ~~• Added 02DF, MODIFIER LETTER CROSS ACCENT, to class BB.~~

596.-3.1.0.13~~/5.1~~/11

This paragraph was deleted. ~~• Added discussion for 202F NARROW NO-BREAK SPACE and 180E MONGOLIAN VOWEL SEPARATOR~~

596.-3.1.0.14~~/5.1~~/11

This paragraph was deleted. ~~• Corrected typos in LB13 and LB16~~

596.-3.1.0.15~~/5.1~~/11

This paragraph was deleted. ~~• Added characters introduced with Unicode 5.1 to the lists associated with the line break properties.~~

596.-3.1.0.16~~/5.1~~/11

This paragraph was deleted. ~~• Added Section 5.2 on the special handling of hyphens. Edited Section 3 for clarity.~~

596.-3.1.0.17~~/5.1~~/11

This paragraph was deleted. ~~• Improved delineation between normative and informative information.~~

596.-3.1.0.18~~/5.1~~/11

This paragraph was deleted. ~~• Changed from EX to IS~~

596.-3.1.0.19~~/5.1~~/11

This paragraph was deleted. ~~• 060C ( ، ) ARABIC COMMA~~

596.-3.1.0.20~~/5.1~~/11

This paragraph was deleted. ~~• Changed from EX to PO~~

596.-3.1.0.21~~/5.1~~/11

This paragraph was deleted. ~~• 066A ( ٪ ) ARABIC PERCENT SIGN~~

596.-3.1.0.22~~/5.1~~/11

This paragraph was deleted. ~~• Changed from AI to OP~~

596.-3.1.0.23~~/5.1~~/11

This paragraph was deleted. ~~• 00A1 ( ¡ ) INVERTED EXCLAMATION MARK~~

596.-3.1.0.24~~/5.1~~/11

This paragraph was deleted. ~~• 00BF ( ¿ ) INVERTED QUESTION MARK~~

596.-3.1.0.25~~/5.1~~/11

This paragraph was deleted. ~~• Changed from BA to EX~~

596.-3.1.0.26~~/5.1~~/11

This paragraph was deleted. ~~• 1802 ( ᠂ ) MONGOLIAN COMMA~~

596.-3.1.0.27~~/5.1~~/11

This paragraph was deleted. ~~• 1803 ( ᠃ ) MONGOLIAN FULL STOP~~

596.-3.1.0.28~~/5.1~~/11

This paragraph was deleted. ~~• 1808 ( ᠈ ) MONGOLIAN MANCHU COMMA~~

596.-3.1.0.29~~/5.1~~/11

This paragraph was deleted. ~~• 1809 ( ᠉ ) MONGOLIAN MANCHU FULL STOP~~

596.-3.1.0.30~~/5.1~~/11

This paragraph was deleted. ~~• 2CF9 ( ⳹ ) COPTIC OLD NUBIAN FULL STOP~~

596.-3.1.0.31~~/5.1~~/11

This paragraph was deleted. ~~• 2CFE ( ⳾ ) COPTIC FULL STOP~~

596.-3.1.0.32~~/5.1~~/11

This paragraph was deleted. ~~• Changed from BA to AL~~

596.-3.1.0.33~~/5.1~~/11

This paragraph was deleted. ~~• 1A1E ( ᨞ ) BUGINESE PALLAWA~~

596.-3.1.0.34~~/5.1~~/11

This paragraph was deleted. ~~• Changed from AL to BB~~

596.-3.1.0.35~~/5.1/6.1~~/11

This paragraph was deleted. ~~• 1FFD ( ´~~´ ~~) GREEK OXIA~~

596.-3.1.0.36~~/5.1~~/11

This paragraph was deleted. ~~• Added a note on lack of canonical equivalence for the definition of ambiguous characters.~~

596.-3.1.0.37~~/5.1~~/11

This paragraph was deleted. ~~• Corrected typos in the sample source code.~~

596.-3.1.0.38~~/5.1~~/11

This paragraph was deleted. ~~• Split rule LB12 to accommodate Polish and Portuguese hyphenation.~~

596.-3.1.0.39~~/5.1~~/11

This paragraph was deleted. ~~Revisions 20 and 21 being a proposed update, only changes between revisions 19 and 22 are noted here.~~

596.-3.1.1/5/11

This paragraph was deleted. Revision 19:

596.-3.1.2/5/11

This paragraph was deleted. ~~• Changed 000B from CM to BK, changed 035C from CM to GL.~~

596.-3.1.3/5/11

This paragraph was deleted. ~~• Changed 17D9 from NS to AL. 203D, 2047..2049 from AL to NS.~~

596.-3.1.4/5/11

This paragraph was deleted. ~~• Corrected listing of NS property to match the data file to remove 17D8 and 17DA.~~

596.-3.1.5/5/11

This paragraph was deleted. ~~• The data file has been corrected to match the listing of the BA property to include 1735 and 1736, also changed 05BE and 103D0 from AL to BA.~~

596.-3.1.6/5/11

This paragraph was deleted. ~~• Changed the brackets 23B4.23B6 to AL.~~

596.-3.1.7/5/11

This paragraph was deleted. ~~• Updated the SA property to make it more generic, includes changing many characters from CM to SA.~~

596.-3.1.8/5/11

This paragraph was deleted. ~~• Reflected new characters~~

596.-3.1.9/5/11

This paragraph was deleted. ~~• Made several text changes for clarifications, including reworded the intro to Section 6.~~

596.-3.1.10/5~~/10~~/11

This paragraph was deleted. ~~• Added Section 9, Implementation Notes.~~.

596.-3.1.11/5/11

This paragraph was deleted. ~~• Restated the conformance clauses and reorganized the algorithm into a tailorable and a non-tailorable part; this affects text in Sections 4, 5, and 6.~~

596.-3.1.12/5/11

This paragraph was deleted. ~~• Removed redundant term PR x HY from rule 18 and rule into new LB24 and LB26 to provide better granularity for tailoring,~~

596.-3.1.13/5/11

This paragraph was deleted. ~~• Moved rule 11b and 13 above rule 8 (new LB13), restating rule 13 (new LB12) to preserve its effect in the new location.~~

596.-3.1.14/5/11

This paragraph was deleted. ~~• Added new rule LB30 to handle words like “person(s)”.~~

596.-3.1.15/5/11

This paragraph was deleted. ~~• Renumbered the rules.~~

596.-3.1.16/5/11

This paragraph was deleted. ~~• Extensive copy-editing as part of Unicode 5.0 publication.~~

596.-3.1.17/5/11

This paragraph was deleted. ~~Revision 18 being a proposed update, only changes between revisions 17 and 19 are noted here.~~

596.-3.2~~/4.1~~/11

This paragraph was deleted. Revision 17:

596.-3.3~~/4.1~~/5

This paragraph was deleted. ~~• [2005-08-29 Erratum] The status section inadvertently proclaimed this to be a proposed update, this was corrected to correctly reflect the status of the document.~~

596.-3.4~~/4.1~~/11

This paragraph was deleted. ~~• Significantly revised the line break classes for Tibetan, as well as Mongolian and Arabic Punctuation.~~

596.-3.5~~/4.1/5.1~~/8/11

This paragraph was deleted. ~~• Added Sectionsection~~ ~~5.6~~5 ~~on Tibetan and Sectionsection~~ ~~7.7 on handling explicit breaks.~~

596.-3.6~~/4.1~~/11

This paragraph was deleted. ~~• Added line break class assignments for Unicode 4.1 characters.~~

596.-3.7~~/4.1~~/11

This paragraph was deleted. ~~• Significantly revised the line break class assignments for danda characters and made it consistent across scripts.~~

596.-3.8~~/4.1~~/11

This paragraph was deleted. ~~• LB6: Replaced by new rules 18b and 18c, using new classes JL, JV, JT, H2, and H3.~~

596.-3.9~~/4.1~~/11

This paragraph was deleted. ~~• LB7a: Deprecated rule 7a because SPACE as base character for standalone combining marks is deprecated.~~

596.-3.10~~/4.1~~/8/11

This paragraph was deleted. ~~• LB7b: Revised 7b and Sectionsection~~ ~~7.5 as well as Table 2 to match the deprecation of rule 7a.~~

596.-3.11~~/4.1~~/11

This paragraph was deleted. ~~• LB7b: Clarified that this rule does not apply to SP.~~

596.-3.12~~/4.1~~/11

This paragraph was deleted. ~~• LB11a: Added a missing SP * to make the formula match the rule.~~

596.-3.13~~/4.1~~/11

This paragraph was deleted. ~~• LB18b: Removed the existing rule 18b because it was redundant.~~

596.-3.14~~/4.1~~/11

This paragraph was deleted. ~~• Corrected an erratum on revision 14 by splitting GL from WJ in rule 11b and moving to a new rule 13.~~

596.-3.15~~/4.1~~/11

This paragraph was deleted. ~~• Updated the pair table and sample code to match the changes in the rules.~~

596.-3.16~~/4.1~~/11

This paragraph was deleted. ~~• Updated the regular expression for numbers.~~

596.-3.17~~/4.1~~/11

This paragraph was deleted. ~~• Added several notes on implementation techniques.~~

596.-3.18~~/4.1~~/8/11

This paragraph was deleted. ~~• Moved all suggested tailorings from the rules section to the examples in Sectionsection~~ ~~8.2.~~

596.-3.19~~/4.1~~/5/11

This paragraph was deleted. [~~Revision 16~~, ~~being a proposed update, only changesis superseded and no longer publicly available. Only modifications~~ ~~between revisions 17 and 1510 and 12~~ ~~are notedtracked~~ ~~here.~~]

596.-3.20~~/4.1~~/11

This paragraph was deleted. Revision 15:

596.-2.1~~/4.0.1~~/4.1

This paragraph was deleted. ~~Change from Revision 14:~~

596.-2.2~~/4.0.1/4.1~~/11

This paragraph was deleted. ~~• LB19b: Added new rule 19b.~~

596.-2.3~~/4.0.1~~/11

This paragraph was deleted. • Changed line breaking class: combining double diacritics from CM to GL, 037A and 2126 to match their canonical equivalents, 2140 corrected to AL, Arabic numerical separators from AL to NU, many alphabetic characters that are EAW=Ambiguous from AI to AL to better reflect current practice, remaining circled numbers and letters from AL to AI for consistency.

596.-2.4~~/4.0.1~~/11

This paragraph was deleted. ~~• Added a note on the behavior of U+200B and U+3000 when lines are justified.~~

596.-2.5~~/4.0.1~~/8/11

This paragraph was deleted. ~~• Reconciled the data file and description of line breaking classes in Sectionsection~~ 5.

596.-2.6~~/4.0.1~~/11

This paragraph was deleted. ~~• Reconciled the rules and pair table implementation of the algorithm.~~

596.-2.7~~/4.0.1~~/8/11

This paragraph was deleted. ~~• Updated the text of the conformance statement in Sectionsection~~ 4.

596.-2.8~~/4.0.1/5.1~~/8/11

This paragraph was deleted. ~~• Added Sectionsection~~ ~~5.5~~4 ~~on use of double hyphen.~~

596.-2.9~~/4.0.1~~/11

This paragraph was deleted. ~~• Updated styles and table formatting.~~

596.-2.10~~/4.0.1~~/11

This paragraph was deleted. ~~• Minor edits throughout.~~

596.-1.1/4/4.1

This paragraph was deleted. ~~Change from Revision 13:~~

596.-1.2/4/4.1

This paragraph was deleted. ~~[Revision 13, being a proposed update is superseded and no longer publicly available]~~

596.-1.3/4/4.1

This paragraph was deleted. ~~Change from Revision 12:~~

596.-1.3.1~~/4.1~~/11

This paragraph was deleted. Revision 14:

596.-1.4/4~~/4.0.1~~/11

This paragraph was deleted. ~~• Added new line breaking classes NL and WJ to better support NEL and Word Joiner.~~

596.-1.5/4/11

This paragraph was deleted. ~~• Deprecated the use of class SG.~~

596.-1.6/4~~/4.1~~/11

This paragraph was deleted. ~~• Several changes to the rules. Moved rule 15b to 18b, added 14b, moved 13 to 11b. Split rule 6 in to 6a and 7band7b~~ ~~and split rule 3a into 3a and 3b. Restated rule 7a and added rule 7c.~~

596.-1.7/4/11

This paragraph was deleted. ~~• Updated the pair table and sample code, adding a special token '#' to account for breaks before SP followed by CM.~~

596.-1.8/4~~/4.0.1/4.1~~/11

This paragraph was deleted. ~~• Clarified the behavior of SHY and MONGOLIAN TODO SOFT HYPHENmongolian todo soft hyphensyphen, as well as WJ and ZWNBSP.~~

596.-1.9/4~~/4.1/5.1~~/8/11

This paragraph was deleted. ~~• Added a new Sectionsubsection~~ ~~5.4~~3 ~~on SOFT HYPHENsoft hyphen~~ ~~and a new Sectionsubsection~~ ~~7.6 on conjoining jamos.~~

596.-1.10/4/11

This paragraph was deleted. ~~• Added to the discussion on how to treat combining marks.~~

596.-1.11/4/11

This paragraph was deleted. ~~• Clarified the conformance requirements in Section 4~~

596.-1.12/4/11

This paragraph was deleted. ~~• Added a definition of line breaking class as synonym for the unwieldy line breaking property value.~~

596.-1.13/4/11

This paragraph was deleted. ~~• Expanded the introduction in Section 3.~~

596.-1.14/4/11

This paragraph was deleted. ~~• Moved subsections on customization into a new Section 8 and expanded the text.~~

596.-1.15/4/11

This paragraph was deleted. ~~• Many edits throughout the text to update it for Unicode 4.0.0.~~

596.0.1~~/3.2/4.1~~/5/11

This paragraph was deleted. [~~Change from~~ ~~Revision 1113,~~ ~~being a proposed update, only changesis superseded and no longer publicly available. Only modifications~~ ~~between revisions 12 and 14 are notedtracked~~ ~~here.~~]~~11:~~

596.0.1.1/4~~/4.1~~/11

This paragraph was deleted. [Revision 12:11, being a proposed update, is superseded and no longer publicly available]

596.0.2~~/3.2~~/4/11

This paragraph was deleted. ~~• Change header for publication of Unicode. Fixed a few additional typos.~~ ~~[Revision 11, being a proposed update, is superseded and no longer publicly available]~~

596.0.3~~/3.2~~/4.1

This paragraph was deleted. ~~Change from Revision 10:~~

596.0.4~~/3.2~~/11

This paragraph was deleted. ~~• Updated for publication of Unicode, Version 3.2~~

596.0.5~~/3.2~~/5/11

{3.2.0: 81-M6, 85-M7; L2/00-258}

This paragraph was deleted. ~~• Added Word joinerWORD JOINER~~ ~~to GL and noted that it now is the preferred character instead of FEFF~~

596.0.6~~/3.2~~/5/11

{3.2.0: 83-AI43, 84-M10, 85-M13; L2/00-156} {3.2.0: 83-M11; L2/00-119}

This paragraph was deleted. ~~• Added LB class assignments for the new Unicode 3.2 characters to the data filedatafile. Only characters whose LB class differs from those of characters with related General_CategoryGeneral Category~~ ~~are noted explicitly in this text.~~

596.1~~/3.1~~/4.1

This paragraph was deleted. ~~Change from Revision 9:~~

596.1.1~~/4.1~~/5/11

This paragraph was deleted. [~~Revision 11, being a proposed update, only changesis superseded and no longer publicly available. Only modifications~~ ~~between revisions 10 and 12 are notedtracked~~ ~~here.]~~

596.1.2~~/4.1~~/11

This paragraph was deleted. Revision 10:

596.2~~/3.1~~/5/11

This paragraph was deleted. ~~• ChangedChange~~ ~~header for publication of Unicode 3.1. Fixed a few additional typos.~~

596.3~~/3.1/4.1~~/11

This paragraph was deleted. Change from Revision 9:8:

596.4~~/3.1~~/11

This paragraph was deleted. ~~• Fixed several typos, reformatted and sorted some lists by code points~~

596.5~~/3.1~~/11

This paragraph was deleted. ~~• Reconciled the data file and the description for BB (00B4), XX (PUA), AI (2015,25C8,PUA), ID (FE6B), BA (00B4)~~

596.6~~/3.1/4.1~~/11

This paragraph was deleted. ~~• Restored PUA to XX.~~

596.7~~/3.1~~/4~~/4.1~~/5/11

This paragraph was deleted. ~~• LB7: Restored the rule, and fixed the note so it matches the rule and Section~~LB 7.~~: Restored the rule, and fixed the note so it matches the rule and Sectionsection~~ 7.79 ~~of [Unicode4.0UnicodeU3.0~~].

596.8~~/3.1~~/5/11

{3.1.0: }

This paragraph was deleted. ~~• LB11a: added a rule to reconcile the rules against pair~~ -~~table entry B2 ^ B2~~

596.9~~/3.1~~/5/11

This paragraph was deleted. ~~• LB19: added an entry to reconcile the rules against pair~~ -~~table entry PR % ID~~

596.10~~/3.1~~/4/8/11

This paragraph was deleted. ~~• Reworked Sectionsection~~ ~~7.5.~~

596.11~~/3.1~~/8/11

This paragraph was deleted. ~~• Removed two unused definitions (overfull and underfull).~~

596.12~~/3.1/4.1~~/11

This paragraph was deleted. Change from Revision 8:7:

596.13~~/3.1~~/11

This paragraph was deleted. ~~• New status section, changed format of references. Fixed several typos.~~

596.14~~/3.1~~/5/11

This paragraph was deleted. ~~• Added headers to Tabletable~~ 1

596.15~~/3.1~~/11

This paragraph was deleted. ~~• Added a note on use of B and A~~

596.16~~/3.1/4.1~~/11

This paragraph was deleted. ~~• Added mention of PUA to AI and removed mention of PUA from XX becausesince~~ ~~the data file assigns AI to them.~~

596.17~~/3.1~~/11

This paragraph was deleted. ~~• Clarified the membership and implication of class CM and ID.~~

596.18~~/3.1~~/11

This paragraph was deleted. ~~• Updated class ID by the new ranges for 3.1.~~

596.19~~/3.1/3.2/4.1~~/11

This paragraph was deleted. ~~• LB6LB 6: Clarified the description of LB6 to clarify how it affects conjoining Jamo.~~ .

596.20~~/3.1/4.1~~/11

This paragraph was deleted. ~~• LB7LB 7: Fixed the note so it matches the rule.~~

596.21~~/3.1/4.1~~/11

This paragraph was deleted. ~~• LB17LB 17: Fixed the regular expression for numbers in the explanation for this rule.~~

596.22~~/3.1~~/4/11

This paragraph was deleted. ~~• Reworded Sectionssections~~ ~~7.6 and 7.7 to clarify the customization process.~~

596.23~~/3.1/4.1~~/11

This paragraph was deleted. Change from Revision 7:6:

596.24~~/3.1~~/11

This paragraph was deleted. ~~• Fixed several typos.~~

596.25~~/3.1~~/11

This paragraph was deleted. ~~• New header.~~

596.26~~/3.1/4.1~~/11

This paragraph was deleted. Change from Revision 6:5:

596.27~~/11~~/12

This paragraph was deleted. Revision 41:

596.28~~/11~~/12

This paragraph was deleted. ~~• Reissued for Unicode 11.0.~~

596.29~~/11~~/12

{11.0.0: 149-A53, 155-A27, 155-C14, 155-A112; L2/17-074}

This paragraph was deleted. ~~• Removed the right side of the rule for LB8a, and revised the description of ZWJ.~~

596.30~~/11~~/12

{11.0.0: 155-A26; PRI-376#ID20180414084252}

This paragraph was deleted. • Corrected the names or abbreviations of the following characters: RUNIC SINGLE PUNCTUATION, RUNIC MULTIPLE PUNCTUATION, FIVE DOT MARK, PHOENICIAN WORD SEPARATOR, DOUBLE OBLIQUE HYPHEN, NON-BREAKING HYPHEN, PLUS-MINUS SIGN, MINUS-OR-PLUS SIGN, LINE SEPARATOR, PARAGRAPH SEPARATOR.

596.31~~/11~~/12

{11.0.0: 154-A128; L2/18-009#ID20171110171601}

This paragraph was deleted. ~~• Section 5, updated description of the Data File.~~

596.32~~/11~~/12

This paragraph was deleted. ~~• Section 5.1, subsection for GL, second paragraph, update the section number reference to Mongolian.~~

596.33~~/11~~/12

This paragraph was deleted. ~~Revision 40 being a proposed update, only changes between revisions 39 and 41 are noted here.~~

596.34~~/12~~/13

This paragraph was deleted. Revision 43:

596.35~~/12~~/13

This paragraph was deleted. ~~• Reissued for Unicode 12.0.~~

596.36~~/12~~/13

{12.0.0: 155-A31}

This paragraph was deleted. ~~• Clarified behavior of NNBSP for Mongolian.~~

596.37~~/12~~/13

{12.0.0: 173-A128; PRI-383#ID20190106230734}

This paragraph was deleted. ~~• Corrected typographical and editing errors identified in public feedback.~~

596.38~~/12~~/13

{12.0.0: 173-A128}

This paragraph was deleted. ~~• Added references to CLDR and UTS35 as a source for tailorings.~~

596.39~~/12~~/13

This paragraph was deleted. ~~Revision 42 being a proposed update, only changes between revisions 41 and 43 are noted here.~~

596.40~~/13~~/14

This paragraph was deleted. Revision 45:

596.41~~/13~~/14

This paragraph was deleted. ~~• Reissued for Unicode 13.0.~~

596.42~~/13~~/14

This paragraph was deleted. ~~• Updated editor to Christopher Chapman.~~

596.43~~/13~~/14

This paragraph was deleted. ~~• Section 5, Line Breaking Properties~~

596.44~~/13~~/14

This paragraph was deleted. ~~• Added text clarifying the tailorability of line break classes.~~

596.45~~/13~~/14

This paragraph was deleted. ~~• Added a note that the East_Asian_Width property that rule LB30 depends on is also tailorable.~~

596.46~~/13~~/14

This paragraph was deleted. ~~• Section 6.2, Tailorable Line Breaking Rules~~

596.47~~/13~~/14

This paragraph was deleted. ~~• Changed LB30 to exclude full-width CP and OP, and added a note about syntax up in Section 6 Line Breaking Algorithm~~

596.48~~/13~~/14

This paragraph was deleted. ~~• Changed LB22 to simply disallow breaking before ellipsis, instead of checking characters before.~~

596.49~~/14~~/15

This paragraph was deleted. Revision 46:

596.50~~/14~~/15

This paragraph was deleted. ~~• Reissued for Unicode 14.0.~~

596.51~~/14~~/15

This paragraph was deleted. ~~• Section 6.2, Tailorable Line Breaking Rules~~

596.52~~/14~~/15

{14.0.0: 163-A70}

This paragraph was deleted. ~~• Removed the redundant rule (JL | JV | JT | H2 | H3) × IN from LB27.~~

596.53~~/14~~/15

{14.0.0: 167-A94, 168-C7, 168-C8, 168-A98; L2/21-135R}

This paragraph was deleted. ~~• Changed LB30b to include potential emoji as described in L2/21-135.~~

596.54~~/15~~/15.1

This paragraph was deleted. Revision 49:

596.55~~/15~~/15.1

This paragraph was deleted. ~~• Reissued for Unicode 15.0.~~

596.56~~/15~~/15.1

This paragraph was deleted. ~~• Corrected code point and name for U+26A0 WARNING SIGN in Section 5.1 (Combining Marks)~~

596.57~~/15~~/15.1

This paragraph was deleted. ~~• Corrected two instances of doubled words in Section 8~~

596.58~~/15~~/15.1

This paragraph was deleted. ~~• Removed reference to “German” in example for Section 5.4~~

596.59~~/15~~/15.1

596.60~~/15~~/15.1

{15.0.0: 172-A98; L2/22-124; PRI-446#ID20220603102213}

This paragraph was deleted. ~~• Removed note about special behavior of U+23B6 from Section 5.1 (Quotation)~~

596.61~~/15.1~~/16

This paragraph was deleted. Revision 51:

596.62~~/15.1~~/16

This paragraph was deleted. ~~• Reissued for Unicode 15.1.~~

596.63~~/15.1~~/16

{15.1.0: 162-A43, 175-C27, 175-A77, 175-A79; L2/23-072,L2/22-080R2,L2/22-086; PRI-335#ID20170504182906,PRI-406#ID20191105182535}

This paragraph was deleted. ~~• Added support for line breaking at orthographic syllable boundaries and LB28a.~~

596.64~~/15.1~~/16

{15.1.0: 175-C23, 175-A71; L2/23-063}

This paragraph was deleted. ~~• Replaced rule LB15 by LB15a and LB15b, improving the handling of « French style » quotation marks.~~

596.65~~/15.1~~/16

{15.1.0: 173-C29, 173-A128; L2/22-229R,L2/22-234R2}

This paragraph was deleted. ~~• Added a note under LB5 recommending that source code editors support even optional hard line breaks.~~

596.66~~/15.1~~/16

{15.1.0: 173-A8; L2/22-244; PRI-446#ID20220410201211}

This paragraph was deleted. ~~• Clarified the description of “third style” line breaking in Section 3.1.~~

596.67~~/15.1~~/16

{15.1.0: 173-A13; L2/22-244; L2/22-243#ID20220921024738}

This paragraph was deleted. ~~• Updated Section 5.2 to consistently use the Unicode characters mentioned, instead of CP-1252 fallbacks.~~

596.68~~/15.1~~/16

{15.1.0: 173-A6; L2/22-244; L2/22-243#ID20220921075300}

This paragraph was deleted. ~~• Added a clearer characterization of allowed tailorings to Section 8.1.~~

596.69~~/15.1~~/16

{15.1.0: 173-A6; L2/22-244; L2/22-243#ID20220921075300}

This paragraph was deleted. ~~• Corrected Example 6 in Section 8.2.~~

596.70/16

Revision 53:

596.71/16

• Reissued for Unicode 16.0.

596.72/16

{16.0.0: 177-C46, 177-A113; L2/23-234}

• Updated the description of line breaking class AS to mention that all digits of scripts that use the Brahmic style of line breaking are assigned this class.

596.73/16

{16.0.0: 177-A115, 177-C47, 162-A67; L2/23-234}

• Updated the documentation of plane 1 ranges defaulting to lb=ID.

596.74/16

{16.0.0: 173-A14; L2/22-244; L2/22-243#ID20220921024738}

• Moved most of Section 5.2 to the core specification.

596.75/16

{16.0.0: L2/24-009R; L2/24-008#ID20231107140948}

• Clarified the text of LB9 in response to PRI feedback.

596.76/16

{16.0.0: 178-A20}

• Clarified regular expressions in LB28a.

596.77/16

{16.0.0: 133-C26}

• Corrected the description of line breaking class PR to mention unassigned code points in the Currency Symbols block, which are lb=PR since Unicode Version 6.3.

596.78/16

{16.0.0: 179-C28, 179-A102}

• Modified LB19 and added LB19a to improve line breaking around quotation marks in Simplified Chinese.

596.79/16

{16.0.0: 179-C35, 179-A116}

• Modified LB13, added LB15c and LB15d, and modified LB25 to improve the handling of numeric expressions. This incorporates the tailoring formerly described in Example 7 into the default rules.

596.80/16

{16.0.0: 179-C32, 179-A111}

• Added LB20a to prevent line breaks after word-initial hyphens.

596.81/16

{16.0.0: 179-C25, 179-A98; PRI-335#ID20170429231648}

• Modified LB21a to restrict its effect to hyphens that separate Hebrew from non-Hebrew.

596.82/16

{16.0.0: 179-A110, 162-A45; L2/24-064; PRI-406#ID20191204151853}

• Section 5.1: Removed the parenthetical labels (A), (B), (P), (XA), (XB), (XP) from the descriptions of line breaking classes, as they are no longer sufficient to usefully summarize the behavior of these classes; added references to the rules that explicitly involve each line breaking class instead.

596.83/16

{16.0.0: 180-C17, 180-A55, 180-A56; L2/24-162}

• Updated the description of line breaking class AI to clarify that the statement about the relation with East_Asian_Width is historical.

596.84/16

• Updated the description of line breaking class CJ to no longer claim that normal is the CSS default.

596.85/16

{16.0.0: 175-C23, 175-A71; L2/23-063} {16.0.0: 179-C28, 179-A102}

• Updated the description of line breaking class QU to account for the rule changes made in this version and the previous one.

596.86/16

{16.0.0: 179-C25, 179-A98; PRI-335#ID20170429231648} {16.0.0: 137-C9}

• Updated the description of line breaking class HL to account for the rule changes made in this version and Unicode Version 8.0.

596.87/16

{16.0.0: 179-C29, 179-A105; PRI-335#ID20170429224811}

• Updated the description of line breaking class GL to mention half marks and continuous lining marks.

596.88/16

{16.0.0: 179-C30, 179-A107}

• Updated the description of line breaking classes IS, CL, and NS to account for the change in Line_Break assignment of the presentation forms for vertical comma, colon, and semicolon.

596.89/16

{16.0.0: 180-A59; L2/24-162}

• In the description of line breaking class ID, replaced the full list of code point ranges which have that default value with a reference to DerivedLineBreak.txt.

596.90/16

{16.0.0: 175-A67}

• Changed Section 11 to refer to UTN #54 instead of carrying a table of rule numberings.

596.91/16

{16.0.0: 179-A97; PRI-446#ID20220405071453}

• Corrected the description of the behavior of line breaking classes PO and PR: these may be separated from numbers if spaces intervene.

596.92/16

{16.0.0: 172-A100; PRI-446#ID20220603194905}

• Updated the description of line breaking class CP to account for changes in Line_Break assignment of phonetic brackets.

596.93/16

{16.0.0: 180-C18, 180-A57; L2/24-162}

• Updated LB10 to specify what happens to all properties.

596.94/16

{16.0.0: 180-C18, 180-A57; L2/24-162}

• Excluded [ BA & $EastAsian ] = [\N{IDEOGRAPHIC SPACE}] from participating in LB21a.

596.95/16

• Updated Section 5 to mention other properties used by this algorithm: General_Category and Extended_Pictographic.
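Discussion: The LB20a entry in the Revision 53 list above (no line break after a word-initial hyphen) can be illustrated with a small sketch. This is not the normative rule: it assumes a drastically simplified model in which only SPACE and start of text make a hyphen "word-initial", only U+002D HYPHEN-MINUS and U+2010 HYPHEN count as hyphens, and `str.isalpha()` stands in for the alphabetic line breaking class. The function names are hypothetical.

```python
# Simplified illustration of the effect of LB20a; not the normative rule set.
# Assumption: only these two characters are treated as hyphens here.
HYPHENS = {"\u002D", "\u2010"}  # HYPHEN-MINUS and HYPHEN


def is_word_initial_hyphen(text: str, i: int) -> bool:
    """True if text[i] is a hyphen at start of text or right after a space."""
    if text[i] not in HYPHENS:
        return False
    return i == 0 or text[i - 1] == " "


def break_allowed_after_hyphen(text: str, i: int) -> bool:
    """Report whether a break opportunity exists between text[i] and text[i + 1].

    Assumption: absent LB20a, a break is allowed after a hyphen that is
    followed by a letter; an LB20a-style check removes that opportunity
    when the hyphen is word-initial.
    """
    if text[i] not in HYPHENS or i + 1 >= len(text):
        return False
    followed_by_letter = text[i + 1].isalpha()
    if followed_by_letter and is_word_initial_hyphen(text, i):
        return False  # word-initial hyphen: keep it with the following letters
    return followed_by_letter
```

Under these assumptions, the hyphen in "well-known" still offers a break opportunity after it, while the hyphen in "the suffix -ing" does not, so "-" is never left alone at the end of a line.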

597~~/3.1~~/11

This paragraph was deleted. • ~~Rewrite and reorganization of the text as part of the publication of the Unicode Standard, Version 3.0.~~

597.1~~/3.0.1~~/3.1

{3.0.1: 83-C6; L2/00-118}

This paragraph was deleted. ~~Change from Version 6.0: Fixed several typos, new header.~~

597.2~~/4.1~~/5/11

Modifications[~~No change history is available~~ for previous versions are listed in those respective versions.~~earlier revisions.~~]

598~~/3.0.1/3.1/3.2~~/4~~/4.0.1/4.1~~/5~~/5.1/5.2~~/6~~/6.1/6.3~~/7/8/9~~/10/11/12/13/14/15/15.1~~/16

Part of this paragraph was moved to §598.1. ~~Copyright~~ © 1999–2024~~202320222021202020192018201720161998–20152014-2013201220102009200820062005200420032002200120001999~~ Unicode, Inc. This publication is protected by copyright, and permission must be obtained from Unicode, Inc. prior to any reproduction, modification, or other use not permitted by theAll Rights Reserved. The Unicode Consortium makes no expressed or implied warranty of any kind, and assumes no liability for errors or omissions. No liability is assumed for incidental and consequential damages in connection with or arising out of the use of the information or programs contained or accompanying this technical report. The Unicode Terms of Use. Specifically, you may make copies of this publication and may annotate and translate it solely for personal or internal business purposes and not for public distribution, provided that any such permitted copies and modifications fully reproduce all copyright and other legal notices contained in the original. You may not make copies of or modifications to this publication for public distribution, or incorporate it in whole or in part into any product or publication without the express written permission of Unicode. ~~apply.~~

598.1~~/3.0.1/3.1/3.2~~/4~~/4.0.1/4.1~~/5~~/5.1/5.2~~/6~~/6.1/6.3~~/7/8/9~~/10/11/12/13/14/15/15.1~~/16

Split from §598. Use of all Unicode Products, including this publication, is governed by the Unicode Terms of Use~~Copyright~~ ~~© 202320222021202020192018201720161998–20152014-2013201220102009200820062005200420032002200120001999~~ ~~Unicode, Inc. All Rights Reserved~~. The authors, contributors, and publishers have taken care in the preparation of this publication, but make~~Unicode Consortium makes~~ no express~~expressed~~ or implied representation or warranty of any kind, and assume~~assumes~~ no responsibility or liability for errors or omissions or~~. No liability is assumed~~ for consequential or incidental ~~and consequential~~ damages that may arise therefrom. This publication is provided “AS-IS” without charge as a convenience to users.~~in connection with or arising out of the use of the information or programs contained or accompanying this technical report. The Unicode Terms of Use apply.~~

Annotated Unicode® Standard Annex~~Technical Report~~ #14

Annotated Unicode Line Breaking Algorithm~~Properties~~

Summary

Status

Contents

1 Overview and Scope

2 Definitions

3 Introduction~~Description~~

3.1 Determining Line Break Opportunities~~line breaking opportunities~~

4 Conformance

This paragraph was deleted. 4.1 Line Breaking Algorithm

4.1 Conformance Requirements

This paragraph was deleted. 4.21 Line Breaking Properties

This paragraph was deleted. 4.2 Line Breaking Algorithm

This paragraph was deleted. 4.3 Higher-Level Protocols

5 Line Breaking Properties

Data File

Future Updates

5.1 Description of Line Breaking Properties

AI: —- Ambiguous (Alphabetic or Ideograph)

AK: Aksara (XB/XA)

AL: —- Ordinary Alphabetic and Symbol Characters (XP)

AP: Aksara Pre-Base (B/XA)

AS: Aksara Start (XB/XA)

BA: —- Break Opportunity After (A)

Breaking Spaces

Tabs

Conditional Hyphens

Breaking Hyphens

Visible Word Dividers

Historic Word Separators

Dandas

Tibetan

Other Terminating Punctuation

Letters Attached to Orthographic Syllables

BB: —- Break Opportunities Before~~opportunities before characters~~ (B)

Dictionary Use

Tibetan and Phags-Pa Head Letters

Mongolian

B2: —- Break Opportunity Before and After (B/A/XP)

BK: —- Mandatory Break (A) —- (Non-tailorable~~normative~~)

Newline Function~~“"NEW LINE FUNCTION (NLF)”"~~

CB: —- Contingent Break Opportunity (B/A) —- (normative)

CJ: Conditional Japanese Starter

CL: —- Close~~Closing~~ Punctuation (XB)

CM: —- Attached Characters and Combining Mark~~Marks~~ (XB) (Non-tailorable) —- (normative)

Combining Characters~~characters~~

This paragraph was deleted. Conjoining Jamos (non-initial)

Control and Formatting Characters~~formatting characters~~

CP: Closing Parenthesis (XB)

CR: —- Carriage Return (A) —- (Non-tailorable~~normative~~)

EB: Emoji Base (B/A)

EM: Emoji Modifier (A)

EX: —- Exclamation / Interrogation (XB)

GL: —- Non-breaking (“~~"~~Glue”~~"~~) (XB/XA) —- (Non-tailorable~~normative~~)

H2: — Hangul LV Syllable (B/A)

H3: — Hangul LVT Syllable (B/A)

HY: —- Hyphen (XA)

ID: —- Ideographic (B/A)

Korean

Symbols

HL: Hebrew Letter (XB)

IN: —- Inseparable Characters~~characters~~ (XP)

Leaders

IS: —- Infix Numeric Separator (Infix) (XB)

JL: — Hangul L Jamo (B)

JT: — Hangul T Jamo (A)

JV: — Hangul V Jamo (XA/XB)

LF: —- Line Feed (A) —- (Non-tailorable~~normative~~)

NL: —- Next Line (A) —- (Non-tailorable~~normative~~)

NS: Nonstarters —- Non-starters (XB)

NU: —- Numeric (XP)

OP: —- Opening Punctuation (XA)

PO: —- Postfix (Numeric) (XB)

PR: —- Prefix (Numeric) (XA)

QU: —- Ambiguous Quotation (XB/XA)

RI: Regional Indicator (B/A/XP)

SA: —- Complex-Context~~context~~ Dependent Characters (South East Asian) (P)

SG: —- Surrogate~~Surrogates~~ (XP) —- (Non-tailorable~~normative~~)

SP: —- Space (A) —- (Non-tailorable~~normative~~)