TOC-UNIT III
TOC
Unit-III
Context
free grammar (CFG) Definitions and Examples, Unions Concatenations And Kleene’s
of Context free language, Regular Grammar for Regular Language, Derivations and
Ambiguity , Unambiguous CFG and Algebraic Expressions, Backus Naur Form (BNF),
Normal Form– CNF.
3.1)Context-Free Grammar (CFG)
Definitions
and Examples
CFG
stands for context-free grammar. It is is a formal grammar which is used to
generate all possible patterns of strings in a given formal language.
Context-free grammar G can be defined by four tuples as:
G = (V, T, P, S)
Where,
G is the grammar, which consists of a set of the
production rule. It is used to generate the string of a language.
T is the final set of a terminal symbol. It is
denoted by lower case letters.
V is the final set of a non-terminal symbol. It is
denoted by capital letters.
P is a set of production rules, which is used for
replacing non-terminals symbols(on the left side of the production) in a string
with other terminal or non-terminal symbols(on the right side of the
production).
S is the start symbol which is used to derive the
string. We can derive the string by repeatedly replacing a non-terminal by the
right-hand side of the production until all non-terminal have been replaced by
terminal symbols.
A
CFG is defined by:
- Nonterminal
symbols (variables): Represent
abstract categories or placeholders
(e.g., E,SE,S).
- Terminal
symbols (alphabet): The
actual characters or tokens in the language
(e.g., a, b,+,∗,(,)a, b, +, *, (, )a, b,+,∗,(,)).
- Production rules: Specify how non
terminals can be replaced with other non terminals or terminals
(e.g., E→E+EE → E + EE→E+E).
- Start symbol: A special nonterminal from
which derivations begin.
Example
1:
Construct
the CFG for the language having any number of a's over the set ∑= {a}.
Solution:
As
we know the regular expression for the above language is
- r.e. = a*
Production
rule for the Regular expression is as follows:
- S → aS rule 1
- S → ε rule 2
Now
if we want to derive a string "aaaaaa", we can start with start
symbols.
- S
- aS
- aaS rule 1
- aaaS rule 1
- aaaaS rule 1
- aaaaaS rule 1
- aaaaaaS rule 1
- aaaaaaε rule 2
- aaaaaa
The
r.e. = a* can generate a set of string {ε, a, aa, aaa,.....}. We can have a
null string because S is a start symbol and rule 2 gives S → ε.
Example
2:
Construct
a CFG for the regular expression (0+1)*
Solution:
The
CFG can be given by,
- Production rule (P):
- S → 0S | 1S
- S → ε
The
rules are in the combination of 0's and 1's with the start symbol. Since (0+1)*
indicates {ε, 0, 1, 01, 10, 00, 11, ....}. In this set, ε is a string, so in
the rule, we can set the rule S → ε.
Example
3:
Construct
a CFG for a language L = {wcwR | where w € (a, b)*}.
Solution:
The
string that can be generated for a given language is {aacaa, bcb, abcba, bacab,
abbcbba, ....}
The
grammar could be:
- S → aSa rule 1
- S → bSb rule 2
- S → c rule 3
Now
if we want to derive a string "abbcbba", we can start with start
symbols.
- S → aSa
- S → abSba from rule 2
- S → abbSbba from rule 2
- S → abbcbba from rule 3
Thus
any of this kind of string can be derived from the given production rules.
Example
4:
Construct
a CFG for the language L = anb2n where n>=1.
Solution:
The
string that can be generated for a given language is {abb, aabbbb,
aaabbbbbb....}.
The
grammar could be:
- S → aSbb | abb
Now
if we want to derive a string "aabbbb", we can start with start
symbols.
- S → aSbb
- S → aabbbb
Closure
Properties of CFLs
Closure
properties describe whether the set of CFLs remains closed under certain
operations.
👉
CFLs are closed under:
- Union (∪\cup∪)
- Concatenation
(⋅\cdot⋅)
- Kleene Star (∗^*∗)
- Reversal
- Homomorphism
👉
CFLs are not closed under:
- Intersection
- Complement
Unions
Concatenations And Kleene’s of Context free language
3.
Union of CFLs
3.2)Regular
Grammar For Regular Expression
1. Introduction
Regular Grammar and Regular Expressions are important tools used to
define and describe Regular Languages.
- Regular
Grammar is a formal grammar used to generate
regular languages.
- Regular
Expression is a symbolic method used to describe
regular languages.
Both are equivalent in power and are at the lowest level of the Chomsky
Hierarchy(Type-3)
2. Definition of Regular Grammar
A Regular Grammar (RG) is a type of formal grammar where every
production rule is in a specific restricted form.
Formal definition:
A Regular Grammar is a 4-tuple:
G = (V, T, P, S)
Where:
- V =
Set of non-terminal symbols (variables)
- T =
Set of terminal symbols (alphabet of the language)
- P =
Set of production rules
- S =
Start symbol (S ∈ V)
Production rules are of the form:
- A →
aB (terminal followed by non-terminal)
- A →
a (only terminal)
- A →
ε (empty string)
These rules make the grammar right-linear or left-linear.
3. Types of Regular Grammar
Right Linear Grammar
- All
productions are of the form: A → aB or A → a or A → ε
- Example:
S → aA
A → bB
B → c
Left Linear Grammar
- All
productions are of the form: A → Ba or A → a or A → ε
- Example:
S →
Ab
A →
Cc
C → d
Note: Right-linear grammars are more commonly used
because they are easier to convert to Finite Automata.
4. Regular Expression:
A Regular Expression (RE) is used to describe regular
languages.
Basic operations:
- Concatenation: ab
means a followed by b
- Union: a
+ b means either a or b
- Kleene
Star: a* means zero or more occurrences of a
Examples:
- a* →
ε, a, aa, aaa, …
- (a +
b)* → all combinations of a and b
- a+b
→ either a or b
5. Relationship between Regular
Grammar and Regular Expression
|
|
|
|
|
Method |
Generative |
Descriptive |
|
Output |
Set of
strings |
Pattern
to match strings |
|
Equivalent
To |
Finite
Automata |
Finite
Automata |
|
|
Compiler's
lexer |
grep,
regex |
Conclusion: Both are equivalent
in expressive power.
6. Conversion from Regular
Expression to Regular Grammar
Example:
Regular Expression: a (b + c)*
This RE represents:
- Starts
with 'a'
- Followed
by any number of 'b' or 'c'
Step-by-step grammar:
Let:
- S →
aA
- A →
bA | cA | ε
This Regular Grammar generates strings like: a, ab, ac, abc, abcb, etc.
7. Conversion from Regular
Grammar to Regular Expression
Let’s take a grammar:
S → aS |
bA
A → bA |
ε
Language generated:
- Strings
starting with any number of a, followed by one or more b
Regular Expression:
a*b*
Conversion steps:
- Convert
grammar to Finite Automaton
- Use state
elimination method to get the regular expression from FA
8. Regular Grammar and Finite
Automata
There is a one-to-one correspondence between:
- Regular
Grammar ↔ Finite Automaton (FA)
Mapping:
- Each
non-terminal → state in FA
- Production
A → aB → transition from state A to B on input a
- Production
A → a → transition to final state
- Production
A → ε → ε-transition to final state
![]()
![]()
![]()
9.
Properties of Regular Grammar
|
|
Description |
|
|
Closed
under union, concatenation, star, etc. |
|
|
Membership,
emptiness, and finiteness are decidable |
|
|
Equivalent
to DFA, NFA, and Regular Expression |
|
|
Easy to
implement in compilers and automata |
10. Applications
- Compiler
Design: Lexical analysis using RE and Regular
Grammar
- Text
Processing: Tools like grep, awk, sed
- Programming
Languages: Token definitions using regular expressions
- Data
Validation: Validating formats like email, password
patterns
- Search
Engines: Pattern matching in texts
3.3)Derivations and Ambiguity
Derivation
Derivation
is a sequence of production rules. It is used to get the input string through
these production rules. During parsing, we have to take two decisions. These
are as follows:
- We have to
decide the non-terminal which is to be replaced.
- We have to
decide the production rule by which the non-terminal will be replaced.
We
have two options to decide which non-terminal to be placed with production
rule.
1.
Leftmost Derivation:
In
the leftmost derivation, the input is scanned and replaced with the production
rule from left to right. So in leftmost derivation, we read the input string
from left to right.
Example:
Production
rules:
- E = E + E
- E = E - E
- E = a | b
Input
- a - b + a
The
leftmost derivation is:
- E = E + E
- E = E - E + E
- E = a - E + E
- E = a - b + E
- E = a - b + a
2.
Rightmost Derivation:
In
rightmost derivation, the input is scanned and replaced with the production
rule from right to left. So in rightmost derivation, we read the input string
from right to left.
Example
Production
rules:
- E = E + E
- E = E - E
- E = a | b
Input
- a - b + a
The
rightmost derivation is:
- E = E - E
- E = E - E + E
- E = E - E + a
- E = E - b + a
- E = a - b + a
When
we use the leftmost derivation or rightmost derivation, we may get the same
string. This type of derivation does not affect on getting of a string.
Examples
of Derivation:
Example
1:
Derive
the string "abb" for leftmost derivation and rightmost derivation
using a CFG given by,
- S → AB | ε
- A → aB
- B → Sb
Solution:
Leftmost
derivation:
Rightmost
derivation:
Example
2:
Derive
the string "aabbabba" for leftmost derivation and rightmost
derivation using a CFG given by,
- S → aB | bA
- S → a | aS | bAA
- S → b | aS | aBB
Solution:
Leftmost
derivation:
- S
- aB S → aB
- aaBB B → aBB
- aabB B → b
- aabbS B → bS
- aabbaB S → aB
- aabbabS B → bS
- aabbabbA S → bA
- aabbabba A → a
Rightmost
derivation:
- S
- aB S → aB
- aaBB B → aBB
- aaBbS B → bS
- aaBbbA S → bA
- aaBbba A → a
- aabSbba B → bS
- aabbAbba S → bA
- aabbabba A → a
Example
3:
Derive
the string "00101" for leftmost derivation and rightmost derivation
using a CFG given by,
- S → A1B
- A → 0A | ε
- B → 0B | 1B | ε
Solution:
Leftmost
derivation:
- S
- A1B
- 0A1B
- 00A1B
- 001B
- 0010B
- 00101B
- 00101
Rightmost
derivation:
- S
- A1B
- A10B
- A101B
- A101
- 0A101
- 00A101
- 00101
Derivation
Tree
Derivation
tree is a graphical representation for the derivation of the given production
rules for a given CFG. It is the simple way to show how the derivation can be
done to obtain some string from a given set of production rules. The derivation
tree is also called a parse tree.
Parse
tree follows the precedence of operators. The deepest sub-tree traversed first.
So, the operator in the parent node has less precedence over the operator in
the sub-tree.
A
parse tree contains the following properties:
- The root node
is always a node indicating start symbols.
- The
derivation is read from left to right.
- The leaf node
is always terminal nodes.
- The interior
nodes are always the non-terminal nodes.
Example
1:
Production
rules:
- E = E + E
- E = E * E
- E = a | b | c
Input
- a * b + c
Step
1:
Step
2:
Step
2:
Step
4:
Step
5:
Note:
We can draw a derivation tree step by step or directly in one step.
Example
2:
Draw
a derivation tree for the string "bab" from the CFG given by
- S → bSb | a | b
Solution:
Now,
the derivation tree for the string "bbabb" is as follows:
The
above tree is a derivation tree drawn for deriving a string bbabb. By simply
reading the leaf nodes, we can obtain the desired string. The same tree can
also be denoted by,
Example
3:
Construct
a derivation tree for the string aabbabba for the CFG given by,
- S → aB | bA
- A → a | aS | bAA
- B → b | bS | aBB
Solution:
To
draw a tree, we will first try to obtain derivation for the string aabbabba
Now,
the derivation tree is as follows:
Example
4:
Show
the derivation tree for string "aabbbb" with the following grammar.
- S → AB | ε
- A → aB
- B → Sb
Solution:
To
draw a tree we will first try to obtain derivation for the string aabbbb
Now,
the derivation tree for the string "aabbbb" is as follows:
Ambiguity
in Grammar
A
grammar is said to be ambiguous if there exists more than one leftmost
derivation or more than one rightmost derivation or more than one parse tree
for the given input string. If the grammar is not ambiguous, then it is called
unambiguous.
If
the grammar has ambiguity, then it is not good for compiler construction. No
method can automatically detect and remove the ambiguity, but we can remove
ambiguity by re-writing the whole grammar without ambiguity.
Example
1:
Let
us consider a grammar G with the production rule
- E → I
- E → E + E
- E → E * E
- E → (E)
- I → ε | 0 | 1 | 2 | ... | 9
Solution:
For
the string "3 * 2 + 5", the above grammar can generate two parse
trees by leftmost derivation:
Since
there are two parse trees for a single string "3 * 2 + 5", the
grammar G is ambiguous.
Example
2:
Check
whether the given grammar G is ambiguous or not.
- E → E + E
- E → E - E
- E → id
Solution:
From
the above grammar String "id + id - id" can be derived in 2 ways:
First
Leftmost derivation
- E → E + E
- → id + E
- → id + E - E
- → id + id - E
- → id + id- id
Second
Leftmost derivation
- E → E - E
- → E + E - E
- → id + E - E
- → id + id - E
- → id + id - id
Since
there are two leftmost derivation for a single string "id + id - id",
the grammar G is ambiguous.
Example
3:
Check
whether the given grammar G is ambiguous or not.
- S → aSb | SS
- S → ε
Solution:
For
the string "aabb" the above grammar can generate two parse trees
Since
there are two parse trees for a single string "aabb", the grammar G
is ambiguous.
Example
4:
Check
whether the given grammar G is ambiguous or not.
- A → AA
- A → (A)
- A → a
Solution:
For
the string "a(a)aa" the above grammar can generate two parse trees:
Since
there are two parse trees for a single string "a(a)aa", the grammar G
is ambiguous.
3.4)Unambiguous Grammar&Algebraic
Expression
3.4.1)Umambiguous
Grammar
A
grammar can be unambiguous if the grammar does not contain ambiguity that means
if it does not contain more than one leftmost derivation or more than one
rightmost derivation or more than one parse tree for the given input string.
To
convert ambiguous grammar to unambiguous grammar, we will apply the following
rules:
1.
If the left associative operators (+, -, *, /) are used in the production rule,
then apply left recursion in the production rule. Left recursion means that the
leftmost symbol on the right side is the same as the non-terminal on the left
side. For example,
- X → Xa
2.
If the right associative operates(^) is used in the production rule then apply
right recursion in the production rule. Right recursion means that the
rightmost symbol on the left side is the same as the non-terminal on the right
side. For example,
- X → aX
Example
1:
Consider
a grammar G is given as follows:
- S → AB | aaB
- A → a | Aa
- B → b
Determine
whether the grammar G is ambiguous or not. If G is ambiguous, construct an
unambiguous grammar equivalent to G.
Solution:
Let
us derive the string "aab"
As
there are two different parse tree for deriving the same string, the given
grammar is ambiguous.
Unambiguous
grammar will be:
- S → AB
- A → Aa | a
- B → b
Example
2:
Show
that the given grammar is ambiguous. Also, find an equivalent unambiguous
grammar.
- S → ABA
- A → aA | ε
- B → bB | ε
Solution:
The
given grammar is ambiguous because we can derive two different parse tree for
string aa.
The
unambiguous grammar is:
- S → aXY | bYZ | ε
- Z → aZ | a
- X → aXY | a | ε
- Y → bYZ | b | ε
Example
3:
Show
that the given grammar is ambiguous. Also, find an equivalent unambiguous
grammar.
- E → E + E
- E → E * E
- E → id
Solution:
Let
us derive the string "id + id * id"
As
there are two different parse tree for deriving the same string, the given
grammar is ambiguous.
Unambiguous
grammar will be:
- E → E + T
- E → T
- T → T * F
- T → F
- F → id
Example
4:
Check
that the given grammar is ambiguous or not. Also, find an equivalent
unambiguous grammar.
- S → S + S
- S → S * S
- S → S ^ S
- S → a
Solution:
The
given grammar is ambiguous because the derivation of string aab can be
represented by the following string:
Unambiguous
grammar will be:
- S → S + A |
- A → A * B | B
- B → C ^ B | C
- C → a
Simplification
of CFG
As
we have seen, various languages can efficiently be represented by a
context-free grammar. All the grammar are not always optimized that means the
grammar may consist of some extra symbols(non-terminal). Having extra symbols,
unnecessary increase the length of grammar. Simplification of grammar means
reduction of grammar by removing useless symbols. The properties of reduced
grammar are given below:
- Each variable
(i.e. non-terminal) and each terminal of G appears in the derivation of
some word in L.
- There should
not be any production as X → Y where X and Y are non-terminal.
- If ε is not
in the language L then there need not to be the production X → ε.
Let
us study the reduction process in detail./p>
Removal
of Useless Symbols
A
symbol can be useless if it does not appear on the right-hand side of the
production rule and does not take part in the derivation of any string. That
symbol is known as a useless symbol. Similarly, a variable can be useless if it
does not take part in the derivation of any string. That variable is known as a
useless variable.
For
Example:
- T → aaB | abA | aaT
- A → aA
- B → ab | b
- C → ad
In
the above example, the variable 'C' will never occur in the derivation of any
string, so the production C → ad is useless. So we will eliminate it, and the
other productions are written in such a way that variable C can never reach
from the starting variable 'T'.
Production
A → aA is also useless because there is no way to terminate it. If it never
terminates, then it can never produce a string. Hence this production can never
take part in any derivation.
To
remove this useless production A → aA, we will first find all the variables
which will never lead to a terminal string such as variable 'A'. Then we will
remove all the productions in which the variable 'B' occurs.
Elimination
of ε Production
The
productions of type S → ε are called ε productions. These type of productions
can only be removed from those grammars that do not generate ε.
Step
1: First find out all
nullable non-terminal variable which derives ε.
Step
2: For each production
A → a, construct all production A → x, where x is obtained from a by removing
one or more non-terminal from step 1.
Step
3: Now combine the
result of step 2 with the original production and remove ε productions.
Example:
Remove
the production from the following CFG by preserving the meaning of it.
- S → XYX
- X → 0X | ε
- Y → 1Y | ε
Solution:
Now,
while removing ε production, we are deleting the rule X → ε and Y → ε. To
preserve the meaning of CFG we are actually placing ε at the right-hand side
whenever X and Y have appeared.
Let
us take
- S → XYX
If
the first X at right-hand side is ε. Then
- S → YX
Similarly
if the last X in R.H.S. = ε. Then
- S → XY
If
Y = ε then
- S → XX
If
Y and X are ε then,
- S → X
If
both X are replaced by ε
- S → Y
Now,
- S → XY | YX | XX | X | Y
Now
let us consider
- X → 0X
If
we place ε at right-hand side for X then,
- X → 0
- X → 0X | 0
Similarly
Y → 1Y | 1
Collectively
we can rewrite the CFG with removed ε production as
- S → XY | YX | XX | X | Y
- X → 0X | 0
- Y → 1Y | 1
Removing
Unit Productions
The
unit productions are the productions in which one non-terminal gives another
non-terminal. Use the following steps to remove unit production:
Step
1: To remove X → Y,
add production X → a to the grammar rule whenever Y → a occurs in the grammar.
Step
2: Now delete X → Y
from the grammar.
Step
3: Repeat step 1 and
step 2 until all unit productions are removed.
For
example:
- S → 0A | 1B | C
- A → 0S | 00
- B → 1 | A
- C → 01
Solution:
S
→ C is a unit production. But while removing S → C we have to consider what C
gives. So, we can add a rule to S.
- S → 0A | 1B | 01
Similarly,
B → A is also a unit production so we can modify it as
- B → 1 | 0S | 00
Thus
finally we can write CFG without unit production as
- S → 0A | 1B | 01
- A → 0S | 00
- B → 1 | 0S | 00
- C → 01
3.4.2)Algebraic Expressions
In
Theory of Computation (TOC), an algebraic expression is a symbolic
representation of computations, languages, or derivations.
- Just as
arithmetic expressions describe numbers and operations,
- Algebraic
expressions in TOC describe strings, formal languages, grammars, and
automata behavior.
They
are especially useful for:
- Expressing grammar
rules (CFGs).
- Representing derivations.
- Describing language
operations like union, concatenation, and closure.
An
algebraic expression in TOC is built using:
- Terminals (symbols of the language).
- Non-terminals (variables in grammar).
- Operators (like +, ·, *) for language
operations.
- Parentheses (for grouping).
These
algebraic operations allow us to build expressions of languages.
6.
Applications of Algebraic Expressions in TOC
- Formal
Language Theory –
Representing and manipulating languages.
- Regular
Expressions – Defining
patterns in text processing, compilers, and search.
- Context-Free
Grammars (CFGs) – Defining
syntax of programming languages.
- Automata
Theory – Algebraic
expressions help in DFA/NFA construction.
- Compiler
Design – Expression
parsing, syntax-directed translation.
- Mathematical
Modeling – Useful in proving
closure properties of languages.
3.5)BNF(Backus-Naur Form)
·
BNF stands
for Backus-Naur Form. It is used to write a formal representation
of a context-free grammar. It is also used to describe the syntax of a
programming language.
·
BNF notation is
basically just a variant of a context-free grammar.
In
BNF, productions have the form:
- Left side → definition
·
Where leftside ∈ (Vn∪
Vt)+ and definition ∈ (Vn∪ Vt)*. In BNF, the leftside contains one
non-terminal.
·
We can define the
several productions with the same leftside. All the productions are separated
by a vertical bar symbol "|".
There
is the production for any grammar as follows:
- S → aSa
- S → bSb
- S → c
In
BNF, we can represent above grammar as follows:
- S → aSa| bSb| c
Rules
for making BNF
Naturally,
we will be defining grammar for the rules in the BNF and at its core, a BNF
grammar rule would look like this:
rule
→ name ::= expansion
·
Here, the name is
the label for a part of the language, usually written between angle brackets
like <identifier>.
·
The expansion depicts
how that part can be formed by just making use of a combination of the smaller
parts or symbols.
The
way all these expansions work can be described like this:
- One expansion
can be easily followed by the other (which is known as concatenation).
- However, the
expansions can also be alternatives and could be separated by the help of
the pipe symbol |.
- An expansion
can be a name (non-terminal) or a terminal (an actual symbol or word in
the language).
More
often, terminals are usually specific pieces such as keywords (if, while) or
symbols (+, -).
Sometimes,
they are literal strings inside quotes, like "+" or
"switch". Other times, they may also refer to categories
like integers or identifiers, which are defined
elsewhere, often with the help of regular expressions.
For
example:
- A * means
"repeat zero or more times."
- A + means
"repeat one or more times."
- A? means
"optional" (zero or one time)
Python's
grammar also makes use of such as BNF, but it does not always make use of the
angle brackets, and it combines regular expressions with the brackets [ ] for
the optional parts. Here is a simplified look at the floating-point definition:
- A float can
be a number with a decimal point or with an exponent.
- The decimal
part might be optional or required, depending on the format.
- The exponent
part may include an e or E followed by an optional sign as well as the
digits in an efficient manner.
This
grammar says:
- An expression
can be an expression plus/minus a term, or just a term.
- A term
can be a term multiplied/divided by a factor, or just a factor.
- A factor
can be an expression in parentheses or a digit.
4.
Advantages of BNF
- Provides a formal,
unambiguous way to describe languages.
- Easier for compiler
design, parsing, and syntax analysis.
- Compact
representation compared to large informal descriptions.
5.
Extended BNF (EBNF)
Sometimes,
an extended version is used with:
- [] for
optional parts,
- {} for
repetition (zero or more times),
- () for
grouping.
Example
(EBNF):
<identifier>
::= <letter> { <letter> | <digit> }
This
means: an identifier starts with a letter, followed by zero or more letters or
digits.
3.6)Normal Forms
In
Theory of Computation (TOC), particularly in Context-Free Grammars (CFGs), a normal
form is a standardized or restricted way of writing grammar production rules
without changing the language generated by the grammar.
- A grammar is
said to be in normal form if all its production rules follow a
specific, predefined structure.
- Converting
grammars into normal form makes them simpler, more systematic, and
easier to analyze.
Two
Important Normal Forms for CFGs
1.
Chomsky Normal Form (CNF)
A
CFG is in Chomsky Normal Form if every production rule is of the form:
- A → BC
(where A, B, C are non-terminals, and B, C are not the start symbol) - A → a
(where a is a terminal) - S → ε
(only allowed if ε is in the language, with S as the start symbol)
✔
No unit productions (A → B) and no useless/ε-productions (except start symbol).
Example:
Grammar:
S
→ AB | a
A
→ a
B
→ b
✅
Already in CNF:
- S → AB
(non-terminal to two non-terminals)
- S → a, A → a,
B → b (non-terminal to terminal)
Why
CNF is useful?
- Used in CYK
parsing algorithm (Cocke–Younger–Kasami).
- Helps in
proving whether a CFG is ambiguous.
- Provides a
structured form for theoretical proofs.
2.
Greibach Normal Form (GNF)
Definition:
A
CFG is in Greibach Normal Form if every production rule is of the form:
A→aα
Where:
- a is a terminal,
- α is a string
of non-terminals (possibly empty).
Example:
S
→ aAB | b
A
→ aS | b
B
→ a
✔
All productions start with a terminal followed by zero or more non-terminals.
Why
GNF is useful?
- Ensures that derivation
of any string begins with a terminal → helps in top-down parsing.
- Used in
constructing LL(1) parsers.
3.
Other Normal Forms
- Kuroda Normal
Form: For
context-sensitive grammars, rules are of form
AB → CD, A → BC, or A → a. - Simplified
Normal Form: Involves
removing ε-productions, unit productions, and useless
symbols before converting to CNF/GNF.
Steps
to Convert a CFG into Normal Form
- Remove ε-productions
(if any).
- Remove unit
productions (A → B).
- Remove useless
symbols.
- Convert to CNF
or GNF depending on need.
3.7)Chomsky's Normal Form (CNF)
CNF
stands for Chomsky normal form. A CFG(context free grammar) is in CNF(Chomsky
normal form) if all production rules satisfy one of the following conditions:
- Start symbol
generating ε. For example, A → ε.
- A
non-terminal generating two non-terminals. For example, S → AB.
- A
non-terminal generating a terminal. For example, S → a.
For
example:
- G1 = {S → AB, S → c, A → a, B → b}
- G2 = {S → aA, A → a, B → c}
The
production rules of Grammar G1 satisfy the rules specified for CNF, so the
grammar G1 is in CNF. However, the production rule of Grammar G2 does not
satisfy the rules specified for CNF as S → aZ contains terminal followed by
non-terminal. So the grammar G2 is not in CNF.
Steps
for converting CFG into CNF
Step
1: Eliminate start
symbol from the RHS. If the start symbol T is at the right-hand side of any
production, create a new production as:
- S1 → S
Where
S1 is the new start symbol.
Step
2: In the grammar,
remove the null, unit and useless productions. You can refer to the Simplification
of CFG.
Step
3: Eliminate terminals
from the RHS of the production if they exist with other non-terminals or
terminals. For example, production S → aA can be decomposed as:
- S → RA
- R → a
Step
4: Eliminate RHS with
more than two non-terminals. For example, S → ASB can be decomposed as:
- S → RS
- R → AS
Example:
Convert
the given CFG to CNF. Consider the given grammar G1:
- S → a | aA | B
- A → aBB | ε
- B → Aa | b
Solution:
Step
1: We will create a
new production S1 → S, as the start symbol S appears on the RHS. The grammar
will be:
- S1 → S
- S → a | aA | B
- A → aBB | ε
- B → Aa | b
Step
2: As grammar G1
contains A → ε null production, its removal from the grammar yields:
- S1 → S
- S → a | aA | B
- A → aBB
- B → Aa | b | a
Now,
as grammar G1 contains Unit production S → B, its removal yield:
- S1 → S
- S → a | aA | Aa | b
- A → aBB
- B → Aa | b | a
Also
remove the unit production S1 → S, its removal from the grammar yields:
- S0 → a | aA | Aa | b
- S → a | aA | Aa | b
- A → aBB
- B → Aa | b | a
Step
3: In the production
rule S0 → aA | Aa, S → aA | Aa, A → aBB and B → Aa, terminal a exists on RHS
with non-terminals. So we will replace terminal a with X:
- S0 → a | XA | AX | b
- S → a | XA | AX | b
- A → XBB
- B → AX | b | a
- X → a
Step
4: In the production
rule A → XBB, RHS has more than two symbols, removing it from grammar yield:
- S0 → a | XA | AX | b
- S → a | XA | AX | b
- A → RB
- B → AX | b | a
- X → a
- R → XB
Hence,
for the given grammar, this is the required CNF.
Comments
Post a Comment