TOC-UNIT III


Context-free grammar (CFG): Definitions and Examples, Unions, Concatenations and Kleene Star of Context-free Languages, Regular Grammar for Regular Language, Derivations and Ambiguity, Unambiguous CFG and Algebraic Expressions, Backus-Naur Form (BNF), Normal Form – CNF.

3.1)Context-Free Grammar (CFG)

Definitions and Examples

CFG stands for context-free grammar. It is a formal grammar which is used to generate all possible strings of a given formal language. A context-free grammar G can be defined by four tuples as:

G = (V, T, P, S)  

Where,

G is the grammar, which consists of a set of production rules. It is used to generate the strings of a language.

T is the finite set of terminal symbols, usually denoted by lowercase letters.

V is the finite set of non-terminal symbols, usually denoted by capital letters.

P is the set of production rules, used for replacing a non-terminal symbol (on the left side of a production) in a string with other terminal or non-terminal symbols (on the right side of the production).

S is the start symbol, used to derive strings. We derive a string by repeatedly replacing a non-terminal by the right-hand side of a production until all non-terminals have been replaced by terminal symbols.

A CFG is defined by:

  • Nonterminal symbols (variables): Represent abstract categories or placeholders
    (e.g., E, S).
  • Terminal symbols (alphabet): The actual characters or tokens in the language
    (e.g., a, b, +, *, (, )).
  • Production rules: Specify how non-terminals can be replaced with other non-terminals or terminals
    (e.g., E → E + E).
  • Start symbol: A special non-terminal from which derivations begin.

Example 1:

Construct the CFG for the language having any number of a's over the set Σ = {a}.

Solution:

As we know the regular expression for the above language is

  1. r.e. = a*     

Production rule for the Regular expression is as follows:

  1. S → aS    rule 1  
  2. S → ε     rule 2  

Now if we want to derive the string "aaaaaa", we can start with the start symbol.

  1.  S  
  2. aS   
  3. aaS          rule 1  
  4. aaaS         rule 1  
  5. aaaaS        rule 1  
  6. aaaaaS       rule 1  
  7. aaaaaaS      rule 1  
  8. aaaaaaε      rule 2  
  9. aaaaaa  

The r.e. a* generates the set of strings {ε, a, aa, aaa, .....}. The set contains the null string because S is the start symbol and rule 2 gives S → ε.
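The derivation above can be replayed mechanically. A minimal sketch (the helper name is ours, not part of the text) that applies rule 1 n times and then rule 2:

```python
def derive_a_star(n):
    """Apply rule 1 (S -> aS) n times, then rule 2 (S -> epsilon)."""
    s = "S"
    for _ in range(n):
        s = s.replace("S", "aS")   # rule 1: S -> aS
    return s.replace("S", "")      # rule 2: S -> epsilon

# The grammar generates exactly the set {epsilon, a, aa, aaa, ...}
print([derive_a_star(n) for n in range(4)])  # ['', 'a', 'aa', 'aaa']
```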

Example 2:

Construct a CFG for the regular expression (0+1)*

Solution:

The CFG can be given by,

  1. Production rule (P):  
  2. S → 0S | 1S  
  3. S → ε  

The rules generate combinations of 0's and 1's from the start symbol, since (0+1)* denotes the set {ε, 0, 1, 01, 10, 00, 11, ....}. Because ε is a string in this set, we include the rule S → ε.

Example 3:

Construct a CFG for the language L = {wcwR | w ∈ (a, b)*}.

Solution:

The string that can be generated for a given language is {aacaa, bcb, abcba, bacab, abbcbba, ....}

The grammar could be:

  1. S → aSa     rule 1  
  2. S → bSb     rule 2  
  3. S → c       rule 3  

Now if we want to derive the string "abbcbba", we can start with the start symbol.

  1. S → aSa   
  2. S → abSba       from rule 2  
  3. S → abbSbba     from rule 2  
  4. S → abbcbba     from rule 3  

Thus any string of this kind can be derived from the given production rules.
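Membership in L = {wcwR} can also be checked directly: split the candidate string at its single c and compare the two halves. A small recognizer sketch (the function name is ours) that mirrors what the grammar generates:

```python
def in_wcwr(s):
    """True iff s = w c w^R for some w over {a, b}."""
    if s.count("c") != 1:          # exactly one centre marker c
        return False
    w, rest = s.split("c")
    if any(ch not in "ab" for ch in w + rest):
        return False
    return rest == w[::-1]         # suffix must be the reverse of the prefix

print(in_wcwr("abbcbba"), in_wcwr("abcab"))  # True False
```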

Example 4:

Construct a CFG for the language L = {a^n b^2n | n ≥ 1}.

Solution:

The string that can be generated for a given language is {abb, aabbbb, aaabbbbbb....}.

The grammar could be:

  1. S → aSbb | abb  

Now if we want to derive the string "aabbbb", we can start with the start symbol.

  1. S → aSbb  
  2. S → aabbbb    
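The language can likewise be tested directly: a string belongs to L = {a^n b^2n | n ≥ 1} exactly when it consists of n a's followed by 2n b's. A small sketch (names are ours):

```python
def in_anb2n(s):
    """Membership test for L = { a^n b^(2n) : n >= 1 }."""
    n = len(s) - len(s.lstrip("a"))          # number of leading a's
    return n >= 1 and s == "a" * n + "b" * (2 * n)

print([w for w in ["abb", "aabbbb", "ab", "abbb"] if in_anb2n(w)])  # ['abb', 'aabbbb']
```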

Closure Properties of CFLs

Closure properties describe whether the set of CFLs remains closed under certain operations.

👉 CFLs are closed under:

  • Union (∪)
  • Concatenation (·)
  • Kleene Star (*)
  • Reversal
  • Homomorphism

👉 CFLs are not closed under:

  • Intersection
  • Complement

Unions, Concatenations and Kleene Star of Context-Free Languages

Let L1 and L2 be context-free languages generated by grammars G1 and G2 with start symbols S1 and S2 (rename non-terminals if necessary so that the two grammars share none). Then:

  1. Union: L1 ∪ L2 is context free. Take all productions of G1 and G2 and add a new start symbol S with the production S → S1 | S2.
  2. Concatenation: L1L2 is context free. Take all productions of G1 and G2 and add the production S → S1S2.
  3. Kleene star: L1* is context free. Take all productions of G1 and add the productions S → S1S | ε.

Hence CFLs are closed under union, concatenation, and Kleene star, as stated above.
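These constructions can be sketched on grammars represented as dictionaries mapping each non-terminal to a list of right-hand sides (this representation, and the fresh start symbol "S", are our own choices; the non-terminal sets of the two grammars are assumed disjoint and not to contain "S"):

```python
def union(g1, s1, g2, s2):
    """L(G1) | L(G2): new start S -> S1 | S2."""
    g = {**g1, **g2}
    g["S"] = [[s1], [s2]]
    return g, "S"

def concat(g1, s1, g2, s2):
    """L(G1)L(G2): new start S -> S1 S2."""
    g = {**g1, **g2}
    g["S"] = [[s1, s2]]
    return g, "S"

def star(g1, s1):
    """L(G1)*: new start S -> S1 S | epsilon."""
    g = dict(g1)
    g["S"] = [[s1, "S"], []]       # [] stands for the epsilon alternative
    return g, "S"

g, s = union({"A": [["a"]]}, "A", {"B": [["b"]]}, "B")
print(g[s])  # [['A'], ['B']]
```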

 

3.2)Regular Grammar for Regular Language

1. Introduction

Regular Grammar and Regular Expressions are important tools used to define and describe Regular Languages.

  • Regular Grammar is a formal grammar used to generate regular languages.
  • Regular Expression is a symbolic method used to describe regular languages.

Both are equivalent in power and sit at the lowest level of the Chomsky hierarchy (Type 3).

2. Definition of Regular Grammar

A Regular Grammar (RG) is a type of formal grammar where every production rule is in a specific restricted form.

Formal definition:

A Regular Grammar is a 4-tuple:

G = (V, T, P, S)

Where:

  • V = Set of non-terminal symbols (variables)
  • T = Set of terminal symbols (alphabet of the language)
  • P = Set of production rules
  • S = Start symbol (S ∈ V)

Production rules are of the form:

  • A → aB (terminal followed by non-terminal)
  • A → a (only terminal)
  • A → ε (empty string)

These rules make the grammar right-linear or left-linear.

3. Types of Regular Grammar

 Right Linear Grammar

  • All productions are of the form: A → aB or A → a or A → ε
  • Example:

           S → aA 

           A → bB 

           B → c

Left Linear Grammar

  • All productions are of the form: A → Ba or A → a or A → ε
  • Example:

S → Ab 

A → Cc 

C → d

Note: Right-linear grammars are more commonly used because they are easier to convert to Finite Automata.

4. Regular Expression:

A Regular Expression (RE) is used to describe regular languages.

Basic operations:

  • Concatenation: ab means a followed by b
  • Union: a + b means either a or b
  • Kleene Star: a* means zero or more occurrences of a

Examples:

  • a* → ε, a, aa, aaa, …
  • (a + b)* → all combinations of a and b
  • a+b → either a or b

5. Relationship between Regular Grammar and Regular Expression

Feature        | Regular Grammar  | Regular Expression
---------------|------------------|--------------------------
Method         | Generative       | Descriptive
Output         | Set of strings   | Pattern to match strings
Equivalent To  | Finite Automata  | Finite Automata
Example Tool   | Compiler's lexer | grep, regex

 Conclusion: Both are equivalent in expressive power.

6. Conversion from Regular Expression to Regular Grammar

Example:

Regular Expression: a (b + c)*

This RE represents:

  • Starts with 'a'
  • Followed by any number of 'b' or 'c'

Step-by-step grammar:

Let:

  • S → aA
  • A → bA | cA | ε

This Regular Grammar generates strings like: a, ab, ac, abc, abcb, etc.
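We can cross-check the conversion: every string generated by S → aA, A → bA | cA | ε is an a followed by any mix of b's and c's, which is exactly what a(b+c)* matches. A small sketch (the enumeration helper is ours) comparing the language against the equivalent Python regex:

```python
import re
from itertools import product

def language_up_to(n):
    """All strings of the grammar with at most n symbols after the leading a."""
    return {"a" + "".join(t) for k in range(n + 1) for t in product("bc", repeat=k)}

pattern = re.compile(r"a[bc]*\Z")            # a(b+c)* as a Python regex
print(all(pattern.match(w) for w in language_up_to(4)))  # True
```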

7. Conversion from Regular Grammar to Regular Expression

Let’s take a grammar:

S → aS | bA 

A → bA | ε

Language generated:

  • Strings starting with any number of a, followed by one or more b

Regular Expression:

a*bb* (equivalently, a*b+)

Conversion steps:

  1. Convert grammar to Finite Automaton
  2. Use state elimination method to get the regular expression from FA

8. Regular Grammar and Finite Automata

There is a one-to-one correspondence between:

  • Regular Grammar ↔ Finite Automaton (FA)

Mapping:

  • Each non-terminal → state in FA
  • Production A → aB → transition from state A to B on input a
  • Production A → a → transition to final state
  • Production A → ε → ε-transition to final state
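The mapping above can be sketched directly: represent each production as a string, turn A → aB into a transition, A → a into a transition to a fresh accepting state F, and A → ε into making A itself accepting (the state name F and the dict representation are our own choices). Using the grammar S → aS | bA, A → bA | ε from section 7:

```python
def rg_to_nfa(prods, start):
    delta, accept = {}, {"F"}                    # F: fresh final state
    for lhs, rhss in prods.items():
        for rhs in rhss:
            if len(rhs) == 2:                    # A -> aB: transition A --a--> B
                delta.setdefault((lhs, rhs[0]), set()).add(rhs[1])
            elif len(rhs) == 1:                  # A -> a: transition A --a--> F
                delta.setdefault((lhs, rhs[0]), set()).add("F")
            else:                                # A -> epsilon: A is accepting
                accept.add(lhs)

    def accepts(w):
        states = {start}
        for ch in w:
            states = set().union(*[delta.get((q, ch), set()) for q in states])
        return bool(states & accept)
    return accepts

accepts = rg_to_nfa({"S": ["aS", "bA"], "A": ["bA", ""]}, "S")
print(accepts("aabbb"), accepts("a"))  # True False
```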

9. Properties of Regular Grammar

Property           | Description
-------------------|----------------------------------------------------
Closure Properties | Closed under union, concatenation, star, etc.
Decidability       | Membership, emptiness, and finiteness are decidable
Equivalence        | Equivalent to DFA, NFA, and Regular Expression
Simplicity         | Easy to implement in compilers and automata


10. Applications

  • Compiler Design: Lexical analysis using RE and Regular Grammar
  • Text Processing: Tools like grep, awk, sed
  • Programming Languages: Token definitions using regular expressions
  • Data Validation: Validating formats like email, password patterns
  • Search Engines: Pattern matching in texts

 

3.3)Derivations and Ambiguity

Derivation

Derivation is a sequence of production rules. It is used to get the input string through these production rules. During parsing, we have to take two decisions. These are as follows:

  • We have to decide the non-terminal which is to be replaced.
  • We have to decide the production rule by which the non-terminal will be replaced.

There are two systematic ways to decide which non-terminal to replace with a production rule.

1. Leftmost Derivation:

In the leftmost derivation, at every step the leftmost non-terminal of the sentential form is replaced using a production rule, so the input string is derived from left to right.

Example:

Production rules:

  1. E = E + E  
  2. E = E - E  
  3. E = a | b  

Input

  1. a - b + a  

The leftmost derivation is:

  1. E = E + E  
  2. E = E - E + E  
  3. E = a - E + E  
  4. E = a - b + E  
  5. E = a - b + a  

2. Rightmost Derivation:

In the rightmost derivation, at every step the rightmost non-terminal of the sentential form is replaced using a production rule, so the input string is derived from right to left.

Example

Production rules:

  1. E = E + E  
  2. E = E - E  
  3. E = a | b  

Input

  1. a - b + a  

The rightmost derivation is:

  1. E = E - E  
  2. E = E - E + E  
  3. E = E - E + a  
  4. E = E - b + a  
  5. E = a - b + a  

Whether we use the leftmost derivation or the rightmost derivation, we get the same string; the choice of derivation order does not affect which strings can be derived.
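The two derivations above can be replayed mechanically: keep a sentential form, and at each step rewrite the leftmost (or rightmost) occurrence of E with the chosen right-hand side (the helper functions are ours):

```python
def rewrite(s, i, rhs):
    """Replace the single non-terminal character at position i with rhs."""
    return s[:i] + rhs + s[i + 1:]

def positions(s):
    return [i for i, ch in enumerate(s) if ch == "E"]

# Leftmost derivation of a - b + a (rule choices as in the text)
s = "E"
for rhs in ["E+E", "E-E", "a", "b", "a"]:
    s = rewrite(s, positions(s)[0], rhs)      # always rewrite the leftmost E
print(s)  # a-b+a

# Rightmost derivation of the same string
t = "E"
for rhs in ["E-E", "E+E", "a", "b", "a"]:
    t = rewrite(t, positions(t)[-1], rhs)     # always rewrite the rightmost E
print(t)  # a-b+a
```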

Examples of Derivation:

Example 1:

Derive the string "abb" for leftmost derivation and rightmost derivation using a CFG given by,

  1. S → AB | ε  
  2. A → aB  
  3. B → Sb  

Solution:

Leftmost derivation:

  1. S  
  2. AB           S → AB  
  3. aBB          A → aB  
  4. aSbB         B → Sb  
  5. abB          S → ε  
  6. abSb         B → Sb  
  7. abb          S → ε  

Rightmost derivation:

  1. S  
  2. AB           S → AB  
  3. ASb          B → Sb  
  4. Ab           S → ε  
  5. aBb          A → aB  
  6. aSbb         B → Sb  
  7. abb          S → ε  

Example 2:

Derive the string "aabbabba" for leftmost derivation and rightmost derivation using a CFG given by,

  1. S → aB | bA  
  2. A → a | aS | bAA  
  3. B → b | bS | aBB  

Solution:

Leftmost derivation:

  1. S  
  2. aB            S → aB      
  3. aaBB          B → aBB  
  4. aabB          B → b  
  5. aabbS         B → bS  
  6. aabbaB        S → aB  
  7. aabbabS       B → bS  
  8. aabbabbA      S → bA  
  9. aabbabba      A → a  

Rightmost derivation:

  1. S  
  2. aB            S → aB      
  3. aaBB          B → aBB  
  4. aaBbS         B → bS  
  5. aaBbbA        S → bA  
  6. aaBbba        A → a  
  7. aabSbba       B → bS  
  8. aabbAbba      S → bA  
  9. aabbabba      A → a  

Example 3:

Derive the string "00101" for leftmost derivation and rightmost derivation using a CFG given by,

  1. S → A1B  
  2. A → 0A | ε  
  3. B → 0B | 1B | ε  

Solution:

Leftmost derivation:

  1. S  
  2. A1B  
  3. 0A1B  
  4. 00A1B  
  5. 001B  
  6. 0010B  
  7. 00101B  
  8. 00101  

Rightmost derivation:

  1. S  
  2. A1B  
  3. A10B  
  4. A101B  
  5. A101  
  6. 0A101  
  7. 00A101  
  8. 00101  

Derivation Tree

Derivation tree is a graphical representation for the derivation of the given production rules for a given CFG. It is the simple way to show how the derivation can be done to obtain some string from a given set of production rules. The derivation tree is also called a parse tree.

A parse tree follows the precedence of operators. The deepest sub-tree is traversed first, so the operator in the parent node has lower precedence than the operator in its sub-tree.

A parse tree contains the following properties:

  1. The root node is always a node indicating the start symbol.
  2. The derivation is read from left to right.
  3. The leaf nodes are always terminal nodes.
  4. The interior nodes are always non-terminal nodes.

Example 1:

Production rules:

  1. E = E + E  
  2. E = E * E  
  3. E = a | b | c  

Input

  1. a * b + c  

The tree is built step by step, following the derivation:

  1. E  
  2. E + E        E → E + E  
  3. E * E + E    E → E * E  
  4. a * E + E    E → a  
  5. a * b + E    E → b  
  6. a * b + c    E → c  

[figure: the parse tree built at each step]

Note: We can draw a derivation tree step by step or directly in one step.

Example 2:

Draw a derivation tree for the string "bbabb" from the CFG given by

  1. S → bSb | a | b  

Solution:

Now, the derivation tree for the string "bbabb" is as follows:

  1. S  
  2. bSb          S → bSb  
  3. bbSbb        S → bSb  
  4. bbabb        S → a  

[figure: derivation tree]

The above tree is a derivation tree drawn for deriving the string bbabb. By simply reading the leaf nodes, we can obtain the desired string.

Example 3:

Construct a derivation tree for the string aabbabba for the CFG given by,

  1. S → aB | bA  
  2. A → a | aS | bAA  
  3. B → b | bS | aBB  

Solution:

To draw a tree, we will first try to obtain derivation for the string aabbabba

  1. S  
  2. aB           S → aB  
  3. aaBB         B → aBB  
  4. aabB         B → b  
  5. aabbS        B → bS  
  6. aabbaB       S → aB  
  7. aabbabS      B → bS  
  8. aabbabbA     S → bA  
  9. aabbabba     A → a  

Now, the derivation tree is as follows:

[figure: derivation tree]

Example 4:

Show the derivation tree for string "aabbbb" with the following grammar.

  1. S → AB | ε  
  2. A → aB  
  3. B → Sb  

Solution:

To draw a tree we will first try to obtain derivation for the string aabbbb

  1. S  
  2. AB           S → AB  
  3. aBB          A → aB  
  4. aSbB         B → Sb  
  5. aABbB        S → AB  
  6. aaBBbB       A → aB  
  7. aaSbBbB      B → Sb  
  8. aabBbB       S → ε  
  9. aabSbbB      B → Sb  
  10. aabbbB      S → ε  
  11. aabbbSb     B → Sb  
  12. aabbbb      S → ε  

Now, the derivation tree for the string "aabbbb" is as follows:

[figure: derivation tree]

 

Ambiguity in Grammar

A grammar is said to be ambiguous if there exists more than one leftmost derivation or more than one rightmost derivation or more than one parse tree for the given input string. If the grammar is not ambiguous, then it is called unambiguous.

If a grammar is ambiguous, it is not good for compiler construction. No algorithm can automatically detect and remove ambiguity from every grammar (the problem is undecidable in general), but we can often remove ambiguity by rewriting the whole grammar without ambiguity.

Example 1:

Let us consider a grammar G with the production rule

  1. E → I  
  2. E → E + E  
  3. E → E * E  
  4. E → (E)  
  5. I → ε | 0 | 1 | 2 | ... | 9  

Solution:

For the string "3 * 2 + 5", the above grammar can generate two parse trees by leftmost derivation:

[figure: the two parse trees for "3 * 2 + 5"]

Since there are two parse trees for a single string "3 * 2 + 5", the grammar G is ambiguous.

Example 2:

Check whether the given grammar G is ambiguous or not.

  1. E → E + E  
  2. E → E - E  
  3. E → id  

Solution:

From the above grammar, the string "id + id - id" can be derived in 2 ways:

First Leftmost derivation

  1. E → E + E  
  2.    → id + E  
  3.    → id + E - E  
  4.    → id + id - E  
  5.    → id + id - id  

Second Leftmost derivation

  1. E → E - E  
  2.    → E + E - E  
  3.    → id + E - E  
  4.    → id + id - E  
  5.    → id + id - id  

Since there are two leftmost derivations for the single string "id + id - id", the grammar G is ambiguous.
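Ambiguity can also be confirmed by counting parses. A brute-force sketch (the helper is our own, not from the text) that counts the distinct parse trees of a token list under E → E + E | E - E | id:

```python
from functools import lru_cache

def count_parses(tokens):
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def c(i, j):
        # one parse if the span is a single 'id'
        total = 1 if tokens[i:j] == ("id",) else 0
        for k in range(i + 1, j - 1):            # try each top-level + or -
            if tokens[k] in ("+", "-"):
                total += c(i, k) * c(k + 1, j)   # parses of left * parses of right
        return total

    return c(0, len(tokens))

print(count_parses(["id", "+", "id", "-", "id"]))  # 2: the grammar is ambiguous
```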

Example 3:

Check whether the given grammar G is ambiguous or not.

  1. S → aSb | SS  
  2. S → ε  

Solution:

For the string "aabb" the above grammar can generate two parse trees

[figure: the two parse trees for "aabb"]

Since there are two parse trees for a single string "aabb", the grammar G is ambiguous.

Example 4:

Check whether the given grammar G is ambiguous or not.

  1. A → AA  
  2. A → (A)  
  3. A → a  

Solution:

For the string "a(a)aa" the above grammar can generate two parse trees:

[figure: the two parse trees for "a(a)aa"]

Since there are two parse trees for a single string "a(a)aa", the grammar G is ambiguous.

3.4)Unambiguous Grammar & Algebraic Expressions

3.4.1)Unambiguous Grammar

A grammar is unambiguous if it does not contain ambiguity, that is, if there is no input string with more than one leftmost derivation, more than one rightmost derivation, or more than one parse tree.

To convert ambiguous grammar to unambiguous grammar, we will apply the following rules:

1. If the left associative operators (+, -, *, /) are used in the production rule, then apply left recursion in the production rule. Left recursion means that the leftmost symbol on the right side is the same as the non-terminal on the left side. For example,

  1. X → Xa  

2. If the right-associative operator (^) is used in the production rule, then apply right recursion in the production rule. Right recursion means that the rightmost symbol on the right side is the same as the non-terminal on the left side. For example,

  1. X → aX  

Example 1:

Consider a grammar G is given as follows:

  1. S → AB | aaB  
  2. A → a | Aa  
  3. B → b  

Determine whether the grammar G is ambiguous or not. If G is ambiguous, construct an unambiguous grammar equivalent to G.

Solution:

Let us derive the string "aab"

[figure: the two parse trees for "aab"]

As there are two different parse trees for deriving the same string, the given grammar is ambiguous.

Unambiguous grammar will be:

  1. S → AB  
  2. A → Aa | a  
  3. B → b  

Example 2:

Show that the given grammar is ambiguous. Also, find an equivalent unambiguous grammar.

  1. S → ABA  
  2. A → aA | ε  
  3. B → bB | ε  

Solution:

The given grammar is ambiguous because we can derive two different parse trees for the string aa.

[figure: the two parse trees for "aa"]

The unambiguous grammar is:

  1. S → aXY | bYZ | ε  
  2. Z → aZ | a  
  3. X → aXY | a | ε  
  4. Y → bYZ | b | ε  

Example 3:

Show that the given grammar is ambiguous. Also, find an equivalent unambiguous grammar.

  1. E → E + E  
  2. E → E * E  
  3. E → id  

Solution:

Let us derive the string "id + id * id"

[figure: the two parse trees for "id + id * id"]

As there are two different parse trees for deriving the same string, the given grammar is ambiguous.

Unambiguous grammar will be:

  1. E → E + T  
  2. E → T  
  3. T → T * F  
  4. T → F  
  5. F → id  
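The layered grammar enforces the precedence: + is handled at the E level and * at the deeper T level, so * binds tighter by construction. A recursive-descent evaluator sketched from this grammar (using numbers in place of id, which is our own substitution):

```python
def evaluate(tokens):
    """Evaluate a token list with the precedence the grammar enforces."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def factor():                  # F -> id (here: a number)
        nonlocal pos
        value = tokens[pos]
        pos += 1
        return value

    def term():                    # T -> T * F | F
        nonlocal pos
        value = factor()
        while peek() == "*":
            pos += 1
            value *= factor()
        return value

    def expr():                    # E -> E + T | T
        nonlocal pos
        value = term()
        while peek() == "+":
            pos += 1
            value += term()
        return value

    return expr()

print(evaluate([2, "+", 3, "*", 4]))  # 14: * is applied before +
```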

Example 4:

Check that the given grammar is ambiguous or not. Also, find an equivalent unambiguous grammar.

  1.  S → S + S  
  2. S → S * S  
  3. S → S ^ S  
  4. S → a  

Solution:

The given grammar is ambiguous because the string "a + a * a" can be derived by two different parse trees:

[figure: two parse trees]

Unambiguous grammar will be:

  1. S → S + A | A  
  2. A → A * B | B  
  3. B → C ^ B | C  
  4. C → a  

 

Simplification of CFG

As we have seen, various languages can be represented efficiently by a context-free grammar. Not every grammar is optimized: a grammar may contain extra, useless symbols (non-terminals), which unnecessarily increase its length. Simplification of a grammar means reducing it by removing these useless symbols. The properties of a reduced grammar are given below:

  1. Each variable (i.e. non-terminal) and each terminal of G appears in the derivation of some word in L.
  2. There should not be any production of the form X → Y where X and Y are non-terminals.
  3. If ε is not in the language L, then there need not be the production X → ε.

Let us study the reduction process in detail.

Removal of Useless Symbols

A symbol is useless if it cannot take part in the derivation of any string of the language, for example because it is unreachable from the start symbol or because it cannot derive any terminal string. Similarly, a variable is useless if it does not take part in the derivation of any string.

For Example:

  1. T → aaB | abA | aaT  
  2. A → aA  
  3. B → ab | b  
  4. C → ad  

In the above example, the variable 'C' will never occur in the derivation of any string, so the production C → ad is useless. So we will eliminate it, and the other productions are written in such a way that variable C can never reach from the starting variable 'T'.

Production A → aA is also useless because there is no way to terminate it. If it never terminates, then it can never produce a string. Hence this production can never take part in any derivation.

To remove this useless production A → aA, we will first find all the variables which will never lead to a terminal string, such as variable 'A'. Then we will remove all the productions in which the variable 'A' occurs.

Elimination of ε Production

The productions of the type S → ε are called ε productions. These types of productions can only be removed from those grammars whose language does not contain ε.

Step 1: First find out all nullable non-terminal variable which derives ε.

Step 2: For each production A → α, construct all productions A → x, where x is obtained from α by removing one or more of the nullable non-terminals found in step 1.

Step 3: Now combine the result of step 2 with the original production and remove ε productions.

Example:

Remove the ε productions from the following CFG while preserving the language it generates.

  1. S → XYX  
  2. X → 0X | ε  
  3. Y → 1Y | ε  

Solution:

Now, while removing the ε productions, we delete the rules X → ε and Y → ε. To preserve the language of the CFG, we add a new production for every combination in which X or Y on a right-hand side could have derived ε.

Let us take

  1. S → XYX  

If the first X on the right-hand side is replaced by ε, then

  1. S → YX  

Similarly, if the last X on the right-hand side is replaced by ε, then

  1. S → XY  

If Y = ε then

  1. S → XX  

If Y and X are ε then,

  1. S → X  

If both X are replaced by ε

  1. S → Y  

Now,

  1. S → XYX | XY | YX | XX | X | Y  

Now let us consider

  1. X → 0X  

If we place ε at right-hand side for X then,

  1. X → 0  
  2. X → 0X | 0  

Similarly Y → 1Y | 1

Collectively we can rewrite the CFG with removed ε production as

  1. S → XYX | XY | YX | XX | X | Y  
  2. X → 0X | 0  
  3. Y → 1Y | 1  
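The three steps can be sketched in code for this grammar (the dict representation and helper names are ours). Note that the standard construction also keeps the original production S → XYX, since X and Y may derive non-empty strings as well:

```python
from itertools import combinations

grammar = {"S": ["XYX"], "X": ["0X", ""], "Y": ["1Y", ""]}

# Step 1: find all nullable non-terminals (those that can derive epsilon).
nullable = {a for a, rhss in grammar.items() if "" in rhss}
changed = True
while changed:
    changed = False
    for a, rhss in grammar.items():
        if a not in nullable and any(r and all(c in nullable for c in r) for r in rhss):
            nullable.add(a)
            changed = True

# Steps 2-3: for each production, add every variant obtained by dropping
# some subset of the nullable symbols, then discard the epsilon productions.
def remove_epsilon(grammar, nullable):
    new = {}
    for a, rhss in grammar.items():
        out = set()
        for r in rhss:
            spots = [i for i, c in enumerate(r) if c in nullable]
            for k in range(len(spots) + 1):
                for drop in combinations(spots, k):
                    cand = "".join(c for i, c in enumerate(r) if i not in drop)
                    if cand:                     # drop the epsilon production itself
                        out.add(cand)
        new[a] = sorted(out)
    return new

print(remove_epsilon(grammar, nullable))
```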

Removing Unit Productions

The unit productions are the productions in which one non-terminal gives another non-terminal. Use the following steps to remove unit production:

Step 1: To remove X → Y, add production X → α to the grammar rule whenever Y → α occurs in the grammar.

Step 2: Now delete X → Y from the grammar.

Step 3: Repeat step 1 and step 2 until all unit productions are removed.

For example:

  1. S → 0A | 1B | C  
  2. A → 0S | 00  
  3. B → 1 | A  
  4. C → 01  

Solution:

S → C is a unit production. But while removing S → C we have to consider what C gives. So, we can add a rule to S.

  1. S → 0A | 1B | 01  

Similarly, B → A is also a unit production so we can modify it as

  1. B → 1 | 0S | 00  

Thus finally we can write CFG without unit production as

  1. S → 0A | 1B | 01  
  2. A → 0S | 00  
  3. B → 1 | 0S | 00  
  4. C → 01  
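The repeat-until-stable loop above can be sketched directly on the example grammar (the dict representation is our own; a right-hand side of length one that names another non-terminal is a unit production):

```python
grammar = {"S": ["0A", "1B", "C"],
           "A": ["0S", "00"],
           "B": ["1", "A"],
           "C": ["01"]}

changed = True
while changed:
    changed = False
    for lhs, rhss in grammar.items():
        for r in list(rhss):
            if len(r) == 1 and r in grammar:      # unit production lhs -> r
                rhss.remove(r)                    # step 2: delete it
                for r2 in grammar[r]:             # step 1: inherit r's rules
                    if r2 not in rhss:
                        rhss.append(r2)
                changed = True

print(grammar["S"], grammar["B"])  # ['0A', '1B', '01'] ['1', '0S', '00']
```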

3.4.2)Algebraic Expressions

In Theory of Computation (TOC), an algebraic expression is a symbolic representation of computations, languages, or derivations.

  • Just as arithmetic expressions describe numbers and operations,
  • Algebraic expressions in TOC describe strings, formal languages, grammars, and automata behavior.

They are especially useful for:

  • Expressing grammar rules (CFGs).
  • Representing derivations.
  • Describing language operations like union, concatenation, and closure.

An algebraic expression in TOC is built using:

  1. Terminals (symbols of the language).
  2. Non-terminals (variables in grammar).
  3. Operators (like +, ·, *) for language operations.
  4. Parentheses (for grouping).

These algebraic operations allow us to build expressions of languages.

6. Applications of Algebraic Expressions in TOC

  1. Formal Language Theory – Representing and manipulating languages.
  2. Regular Expressions – Defining patterns in text processing, compilers, and search.
  3. Context-Free Grammars (CFGs) – Defining syntax of programming languages.
  4. Automata Theory – Algebraic expressions help in DFA/NFA construction.
  5. Compiler Design – Expression parsing, syntax-directed translation.
  6. Mathematical Modeling – Useful in proving closure properties of languages.

 

3.5)BNF(Backus-Naur Form)

  • BNF stands for Backus-Naur Form. It is used to write a formal representation of a context-free grammar. It is also used to describe the syntax of a programming language.

  • BNF notation is basically just a variant of a context-free grammar.

In BNF, productions have the form:

  1. Left side → definition  

  • Where left side ∈ (Vn ∪ Vt)+ and definition ∈ (Vn ∪ Vt)*. In BNF, the left side contains exactly one non-terminal.

  • We can define several productions with the same left side. The alternatives are separated by the vertical bar symbol "|".

For example, consider the following productions of a grammar:

  1. S → aSa  
  2. S → bSb  
  3. S → c  

In BNF, we can represent the above grammar as follows:

  1. S → aSa| bSb| c  

Rules for making BNF

At its core, a BNF grammar rule looks like this:

rule → name ::= expansion

  • Here, the name is the label for a part of the language, usually written between angle brackets like <identifier>.

  • The expansion describes how that part can be formed from a combination of smaller parts or symbols.

The way all these expansions work can be described like this:

  • One expansion can be followed by another (this is concatenation).
  • Alternative expansions are separated by the pipe symbol |.
  • An expansion can be a name (a non-terminal) or a terminal (an actual symbol or word in the language).

Terminals are usually specific pieces such as keywords (if, while) or symbols (+, -).

Sometimes they are literal strings inside quotes, like "+" or "switch". Other times they refer to categories like integers or identifiers, which are defined elsewhere, often with the help of regular expressions.

For example:

  • A * means "repeat zero or more times."
  • A + means "repeat one or more times."
  • A? means "optional" (zero or one time)

Python's grammar also uses a BNF-like notation, but it does not always use angle brackets, and it combines regular expressions with brackets [ ] for optional parts. Here is a simplified look at Python's floating-point definition:

  • A float can be a number with a decimal point or with an exponent.
  • The decimal part might be optional or required, depending on the format.
  • The exponent part may include an e or E followed by an optional sign and the digits.

Consider also the classic BNF grammar for arithmetic expressions:

  1. <expression> ::= <expression> + <term> | <expression> - <term> | <term>  
  2. <term> ::= <term> * <factor> | <term> / <factor> | <factor>  
  3. <factor> ::= ( <expression> ) | <digit>  

This grammar says:

  • An expression can be an expression plus/minus a term, or just a term.
  • A term can be a term multiplied/divided by a factor, or just a factor.
  • A factor can be an expression in parentheses or a digit.

4. Advantages of BNF

  • Provides a formal, unambiguous way to describe languages.
  • Easier for compiler design, parsing, and syntax analysis.
  • Compact representation compared to large informal descriptions.

5. Extended BNF (EBNF)

Sometimes, an extended version is used with:

  • [] for optional parts,
  • {} for repetition (zero or more times),
  • () for grouping.

Example (EBNF):

<identifier> ::= <letter> { <letter> | <digit> }

This means: an identifier starts with a letter, followed by zero or more letters or digits.
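This EBNF rule translates directly to a regular expression (our translation, assuming <letter> means an ASCII letter and <digit> a decimal digit):

```python
import re

# <identifier> ::= <letter> { <letter> | <digit> }
identifier = re.compile(r"[A-Za-z][A-Za-z0-9]*\Z")

for s in ["x", "count1", "1x", ""]:
    print(s, bool(identifier.match(s)))
```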

3.6)Normal Forms

In Theory of Computation (TOC), particularly in Context-Free Grammars (CFGs), a normal form is a standardized or restricted way of writing grammar production rules without changing the language generated by the grammar.

  • A grammar is said to be in normal form if all its production rules follow a specific, predefined structure.
  • Converting grammars into normal form makes them simpler, more systematic, and easier to analyze.

Two Important Normal Forms for CFGs

1. Chomsky Normal Form (CNF)

A CFG is in Chomsky Normal Form if every production rule is of the form:

  1. A → BC
    (where A, B, C are non-terminals, and B, C are not the start symbol)
  2. A → a
    (where a is a terminal)
  3. S → ε
    (only allowed if ε is in the language, with S as the start symbol)

No unit productions (A → B) and no useless/ε-productions (except start symbol).

Example:

Grammar:

S → AB | a

A → a

B → b

Already in CNF:

  • S → AB (non-terminal to two non-terminals)
  • S → a, A → a, B → b (non-terminal to terminal)

Why CNF is useful?

  • Used in CYK parsing algorithm (Cocke–Younger–Kasami).
  • Helps in proving whether a CFG is ambiguous.
  • Provides a structured form for theoretical proofs.
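As a sketch of how CNF enables CYK parsing, here is the algorithm run on the CNF grammar above (S → AB | a, A → a, B → b); the representation of the grammar and the table are our own choices. Cell table[i][j] holds every non-terminal that derives the substring from position i to j:

```python
grammar = {"S": [("A", "B"), ("a",)], "A": [("a",)], "B": [("b",)]}

def cyk(w, grammar, start="S"):
    n = len(w)
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(w):                      # length-1 substrings: A -> a rules
        table[i][i] = {a for a, rhss in grammar.items() if (ch,) in rhss}
    for length in range(2, n + 1):                  # longer substrings
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):                   # split point: A -> BC rules
                for a, rhss in grammar.items():
                    for rhs in rhss:
                        if (len(rhs) == 2 and rhs[0] in table[i][k]
                                and rhs[1] in table[k + 1][j]):
                            table[i][j].add(a)
    return start in table[0][n - 1]

print(cyk("ab", grammar), cyk("ba", grammar))  # True False
```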

2. Greibach Normal Form (GNF)

Definition:

A CFG is in Greibach Normal Form if every production rule is of the form:

A→aα

Where:

  • a is a terminal,
  • α is a string of non-terminals (possibly empty).

Example:

S → aAB | b

A → aS | b

B → a

All productions start with a terminal followed by zero or more non-terminals.

Why GNF is useful?

  • Ensures that derivation of any string begins with a terminal → helps in top-down parsing.
  • Used in constructing LL(1) parsers.

3. Other Normal Forms

  • Kuroda Normal Form: For context-sensitive grammars, rules are of form
    AB → CD, A → BC, or A → a.
  • Simplified Normal Form: Involves removing ε-productions, unit productions, and useless symbols before converting to CNF/GNF.

Steps to Convert a CFG into Normal Form

  1. Remove ε-productions (if any).
  2. Remove unit productions (A → B).
  3. Remove useless symbols.
  4. Convert to CNF or GNF depending on need.

3.7)Chomsky's Normal Form (CNF)

CNF stands for Chomsky normal form. A CFG(context free grammar) is in CNF(Chomsky normal form) if all production rules satisfy one of the following conditions:

  • The start symbol generating ε. For example, S → ε.
  • A non-terminal generating two non-terminals. For example, S → AB.
  • A non-terminal generating a terminal. For example, S → a.

For example:

  1. G1 = {S → AB, S → c, A → a, B → b}  
  2. G2 = {S → aA, A → a, B → c}  

The production rules of grammar G1 satisfy the conditions specified for CNF, so grammar G1 is in CNF. However, the production rules of grammar G2 do not satisfy the conditions specified for CNF, as S → aA contains a terminal followed by a non-terminal. So grammar G2 is not in CNF.

Steps for converting CFG into CNF

Step 1: Eliminate the start symbol from the RHS. If the start symbol S appears on the right-hand side of any production, create a new production as:

  1. S1 → S  

Where S1 is the new start symbol.

Step 2: In the grammar, remove the null, unit and useless productions. You can refer to the Simplification of CFG.

Step 3: Eliminate terminals from the RHS of the production if they exist with other non-terminals or terminals. For example, production S → aA can be decomposed as:

  1. S → RA  
  2. R → a  

Step 4: Eliminate RHS with more than two non-terminals. For example, S → ASB can be decomposed as:

  1. S → RS  
  2. R → AS  

Example:

Convert the given CFG to CNF. Consider the given grammar G1:

  1. S → a | aA | B  
  2. A → aBB | ε  
  3. B → Aa | b  

Solution:

Step 1: We will create a new production S1 → S, as the start symbol S appears on the RHS. The grammar will be:

  1.  S1 → S  
  2. S → a | aA | B  
  3. A → aBB | ε  
  4. B → Aa | b  

Step 2: As grammar G1 contains A → ε null production, its removal from the grammar yields:

  1. S1 → S  
  2. S → a | aA | B  
  3. A → aBB  
  4. B → Aa | b | a  

Now, as grammar G1 contains the unit production S → B, its removal yields:

  1. S1 → S  
  2. S → a | aA | Aa | b  
  3. A → aBB  
  4. B → Aa | b | a  

Also remove the unit production S1 → S; its removal from the grammar yields:

  1. S1 → a | aA | Aa | b  
  2. S → a | aA | Aa | b  
  3. A → aBB  
  4. B → Aa | b | a  

Step 3: In the production rules S1 → aA | Aa, S → aA | Aa, A → aBB and B → Aa, terminal a exists on the RHS with non-terminals. So we will replace terminal a with X:

  1. S1 → a | XA | AX | b  
  2. S → a | XA | AX | b  
  3. A → XBB  
  4. B → AX | b | a  
  5. X → a  

Step 4: In the production rule A → XBB, the RHS has more than two symbols, so we decompose it, which yields:

  1. S1 → a | XA | AX | b  
  2. S → a | XA | AX | b  
  3. A → RB  
  4. B → AX | b | a  
  5. X → a  
  6. R → XB  

Hence, for the given grammar, this is the required CNF.

 
