Character Set and C Tokens

The document provides an overview of character sets and tokens in the C programming language, detailing the types of characters, their functions, and the rules for creating identifiers. It explains the Source Character Set (SCS) and Execution Character Set (ECS), along with utility functions for character classification. Additionally, it covers the definition and types of tokens, including keywords, identifiers, constants, and the importance of adhering to naming conventions in C.
Module 01

Character Set and C Tokens


Bhargavi Dalal
What is Character Set?
• Character Set is a collection of permissible characters that can be used in
a variety of contexts by the program.

• Just like we use a set of various words, numbers, statements, etc., the C
programming language also consists of a set of various different types of
characters. These are known as the characters in C.

• They include digits, alphabets, special symbols, etc.


• With an 8-bit char, a C program can represent up to 256 distinct character values.
Types of Character Set.
• Generally, there are two types of character sets in C.

• Source Character Set (SCS):


• Execution Character Set (ECS):
Source Character Set (SCS):
• Before preprocessing, the SCS is used to parse the source code into an internal
representation.
• It includes the Basic Character Set and the white-space characters.
• It is the collection of symbols that can be used to create source code.

• The initial stage of the C PreProcessor (CPP) translates the encoding of the
source code into the Source Character Set (SCS); this is done before the
preprocessing phase.
Execution Character Set (ECS)
• Character and string constants are stored in the ECS.

• This set includes Control Characters, Escape Sequences, and the Basic Character Set.

• CPP converts character and string constant encoding into the Execution Character Set
(ECS) following the preprocessing phase.
• The utility functions that C declares in <ctype.h> can be used to classify characters
into the various categories described below.
• Both the Source Character Set and the Execution Character Set use UTF-8
encoding by default in CPP.
• The following compiler flags allow the user to alter them.
• -finput-charset is used to set SCS.
• Usage: gcc main.c -finput-charset=UTF-8

• -fexec-charset is used to set ECS.
• Usage: gcc main.c -fexec-charset=UTF-8
Basic Character Set
• The Source and Execution Character Sets share a small set of common
characters, called the Basic Character Set. It consists of:

1. Alphabet
2. Digit
3. Symbols
4. Space
Alphabets
• The C programming language provides support for all the alphabets that we
use in the English language.
• Thus, in simpler words, a C program supports a total of 52 alphabetic
characters: 26 uppercase and 26 lowercase.
• Lowercase ASCII characters fall within the range [97, 122], and uppercase
ASCII characters fall within the range [65, 90]. Example: A, B, a, b, etc.
Utility Functions:

• isalpha, islower, and isupper determine whether a character is
alphabetic, lowercase, or uppercase, respectively.

• tolower and toupper convert a letter to the corresponding lowercase or
uppercase letter.
Digits
• The C programming language provides the support for all the digits that help in constructing/
supporting the numeric values or expressions in a program.
• These range from 0 to 9, and also help in defining an identifier.
• Thus, the C language supports a total of 10 digits for constructing the numeric values or
expressions in any program.
• The range of the ASCII digits is [48, 57]. Example: 0, 1, 2, etc.

• Utility functions:
• The function isdigit determines if the supplied character is a digit. The function isalnum
determines if a character is an alphanumeric character.
Special Characters
• We use some special characters in the C language for some special purposes, such as
  • logical operations,
  • mathematical operations,
  • checking of conditions,
  • backspaces,
  • white spaces, etc.
• We can also use these characters for defining the identifiers in a much better way.
• For instance, we use underscores for constructing a longer name for a variable, etc.
• The C programming language provides support for the following types of special characters:
Punctuation/Special Characters:
• The following characters are classified as punctuation by the default C locale.
• Utility functions:
• The function ispunct determines if a character is a punctuation character.
• The ASCII code and usage examples for each punctuation character are included in the table below.
White Spaces
• White space is also an important part of the character set of the C language. The
language uses several kinds of space, such as blank space, horizontal tab, and
vertical tab, and each has its own ASCII value.
• We can write the actual space character or use its ASCII value; both work the same way in a program.
• White-space characters:
• The Source Character Set includes these characters. They affect the layout of the
displayed text but are themselves invisible.
• Utility Functions:
• The function isspace determines whether a character is a white-space character.
• The white spaces in the C programming language include the following:
Blank Spaces
Carriage Return
Tab
New Line
Summary of Special Characters in C
• Here is a table that represents all the types of character sets that we can use in
the C language:
Control Character Set
• The ASCII codes for these characters run from 0 to 31 (inclusive) and the
127th character.

• Although they are not visible, they still impact the program in several ways.

• For example, the \a (BEL) character may produce a beep or a screen flash when
printed, and the \b (BS) character moves the cursor one step back. This differs
from the Backspace key on the keyboard, which deletes the previous character.

• Utility Functions:
• The function iscntrl determines if a character is a control character.
Escape Sequences

• The Execution Character Set includes these characters.


• A backslash (\) introduces these characters.
• Although an escape sequence consists of two or more characters in the
source code, the compiler counts it as one character.

• Example: \a, \b, \t, etc.
Purpose of Character Set in C
• The character sets help in defining the valid characters that we can use in the
source program or can interpret during the running of the program.
• For the source text, we have the source character set, while we have the
execution character set that we use during the execution of any program.
• But we have various types of character sets.
• For instance, one of the character sets follows the basis of the ASCII
character definitions, while the other set consists of various kanji characters
(Japanese).
• The type of character set we use will have no impact on the compiler- but we
must know that every character has different, unique values.
• The C language treats every character with different integer values.
• Let us know a bit more about the ASCII characters.
ASCII Values
• On most systems, the characters used in the C language have equivalent ASCII values.
• ASCII stands for American Standard Code for Information Interchange.
• It defines 128 characters, which fit in 7 bits; extended 8-bit character sets provide up to 256.
• A special type is used for accommodating and representing larger sets of
characters.
• This is called the wide-character type, wchar_t.
• However, a majority of the ANSI-compatible compilers in C accept these ASCII
characters for both the character sets- the source and the execution.
• Every ASCII character will correspond to a specific numeric value.
• Here is a list of all the ASCII characters, along with their assigned numeric values.
C TOKENS
Token in C
• In C, tokens are the smallest units that make up a program and are used to
build and execute the program.
• The different types of tokens include..
• Keywords: Reserved words in C that have fixed meanings and must be
written in lowercase
• Identifiers: User-defined names that identify program elements, such as
variables, functions, and arrays
• Constants: Fixed values that don't change during program execution
• Operators: Symbols that operate on one or more operands to produce an
output
• Strings: Sequences of characters enclosed within double quotation
marks
• Special characters: Include parentheses, braces, brackets, and semicolons
• When a C program is compiled, the compiler breaks it down into
tokens to understand the program's structure and functionality.

• This process is called tokenization and is a crucial step in the
compilation process.
Keywords in C
• We can define the keywords as the reserved or pre-defined words that hold
their own importance.
• It means that every keyword has a functionality of its own.
• Since keywords are predefined words that the compiler uses, we cannot
use them as the names of variables.
• If we used a keyword as a variable name, we would be assigning a different
meaning to it, which isn't allowed.

• The C language (C89/C90) provides 32 keywords, as mentioned below:


Here are some characteristics of C keywords
1. Reserved: The C language reserves keywords, so they cannot be used as
identifiers in programs. Using a keyword as a variable name or other identifier will cause
a compilation error.

2. Predefined Meaning: Each keyword has a specific meaning that is assigned by the C
language. These meanings are built into the C language's grammar and syntax and the
compiler interprets them accordingly.

3. Specific Use: Keywords are designed for specific purposes and contexts within the C
language. They define control structures, data types, flow control, and other language
constructs. Attempting to use a keyword outside of its intended purpose will result in a
compilation error.

4. Standardized: C language keywords are standardized across different compilers and
implementations. This ensures the consistency and portability of C programs across different
platforms and environments.
The C language provides support for 32
keywords, as mentioned below:

• break: It is used to terminate the execution of a loop or switch statement.
• Syntax:
• It has the following syntax:
break;

#include <stdio.h>

int main() {
    for (int i = 0; i < 10; i++) {
        if (i == 5) {
            break;
        }
        printf("%d ", i);
    }
    return 0;
}
Identifiers
• Identifiers (or symbols) are the names we give to variables, types, functions,
arrays, structures, unions, and labels in our program.

• An identifier can be composed of uppercase and lowercase letters, underscores,
and digits, but the first character must be either a letter or an underscore.
• In other words, an identifier is a sequence of alphanumeric characters and
underscores that begins with a letter or an underscore and is used to represent
program elements such as variables, functions, arrays, structures, unions,
labels, etc.

• There are 52 alphabetical characters (uppercase and lowercase), the underscore
character, and ten numerical digits (0-9) that can appear in identifiers.

• That is a total of 63 characters that can be used to form identifiers.


Rules for constructing C identifiers
• The first character of an identifier must be either a letter or an
underscore; it can then be followed by any letter, digit, or
underscore.
• It should not begin with any numerical digit.
• In identifiers, both uppercase and lowercase letters are distinct.
• Therefore, we can say that identifiers are case sensitive.
• Commas or blank spaces cannot be specified within an identifier.
• Keywords cannot be represented as an identifier.
• The length of an identifier should not be more than 31 characters; C89 guarantees that only the first 31 characters of an internal identifier are significant.
• Identifiers should be written in such a way that it is meaningful, short, and
easy to read.
Types Of Identifiers In C
• Identifiers in C can primarily be divided into two types.

• Internal identifier
• External identifier
Internal Identifiers In C
• The term internal identifier refers to an identifier that is used internally within a program
and cannot be utilized in an external connection or external linkage.
• For example, internal identifiers are variable names or function names that are used in a
C program, used to refer to these program entities.

• In this sense, these identifiers are local to a specific scope/module of the program,
meaning they can only be accessed within a specific block or function.
• Local variables may also serve as internal identifiers.
• Also, these identifiers are not visible or accessible from outside that specific scope or
module.
• Example
{
data_type identifier;
}
• We declare an integer variable x and initialize it to the value 20 inside the main() function. Then,
we display the value of x using a printf() function.

• Next, we initialize another variable, y, inside another code block with the value of 20 and print it
to the console using the printf() function.

• Now, as mentioned in the code comments, if we try to access y outside of the given code block,
for example, to print its value, the compiler will throw an error.

• This is because x is declared at the top of main(), outside the inner block, so it can be
accessed anywhere in main() after its declaration.

• But y is declared inside a block using curly brackets, which means that it can only be accessed
within that block.

• We cannot access y from outside of that block, so when the program tries to display the value of
y once again outside of the block, a compilation error happens.
External Identifier

• The term external identifier refers to an identifier that is utilized in an external
linkage.

• That is, these are those identifiers in C that, when declared in one source file,
can be accessed by another source file in the same program.

• Function names and global variables can both be used as external identifiers.

• Here, the elements are the same as in the case of internal identifiers; the only
difference is there is no specific code block where the element is being
declared/ defined.
• We first define and initialize an integer variable num to the value 10
outside of the main() function.

• Then, inside main(), we use the printf() function to display the value
of the variable on the console.

• Here, num is an external identifier defined at the top of the file,
outside of any functions.
Invalid Identifiers In C
• Invalid identifiers are those that go against the rules of the C programming
language. This includes names that:

• Begin with a digit or contain special characters.
• Are reserved words or keywords in C.
• Are too long.
• The way to make code comprehensible and clean is to use proper, meaningful names for
identifiers. It is also vital to stick to the rules of the language to avoid syntax errors
and other problems.
• Example -
• Here are some invalid identifiers:
Valid Identifiers In C
• The term valid identifiers refers to those identifiers that the C language
recognizes as names for variables, functions, and other program entities.
• After the initial letter or underscore character in an identifier, any
combination of characters, numbers, or underscores may follow.
• The identifiers will be used in the program and understood by the C
compiler if these rules are followed.
• It is essential to choose names for identifiers that are useful and illustrative
in order to make the code easier to understand and read.
• Here are some valid identifiers:
Using Keywords As Identifiers In C
• As per the guidelines for naming identifiers, one cannot use keywords for
naming entities.
• If someone attempts to use a keyword as an identifier, the compiler reports an
error.
• It is, hence, best to avoid using keywords, the reserved words of the
language, as variable names or other identifiers in a program.
Differences Between Keywords And Identifiers In C
Constant
• In programming languages like C, constants are essential because they give you a
mechanism to store unchanging values that hold true throughout the course of the
program.
• These values may be used for several things, such as creating mathematical constants
or giving variables fixed values.
• A constant in C is a value that doesn't change as the program runs.
• Integers, floating-point numbers, characters, and strings are just a few of the
types of constants that may be employed.
• Unlike a variable, a constant cannot be changed once it has a value.
• Constants may be utilized in various operations and computations and serve as the
program's representation of fixed values.
Rules For Defining A Constant In C
• Constants are declared using the const keyword, followed by an appropriate
data type, a name, and a constant value.
• However, there is a set of rules that one must follow when using constants.
• The rules for defining constants in C language are as follows:

1. At the time of declaration of a constant, you must use the const
keyword followed by the data type of the respective value
and the variable name. That is, constants are declared using type
specifications such as const int, const float, etc.
const data_type var_name = value;
const int number = 10;
#define PI 3.1415
• For example, to define a constant integer with the name
number and initial value 10, the line of code looks like this:
const int number = 10;

• An alternate way of defining constants in C programs is by using the #define
pre-processor directive. For example, if you want to use the constant value of
Pi across a program, you can define it using the directive as follows:
#define PI 3.1415

• Note that we can define constants globally to be accessible from any part of the
program. We can also define them locally, in which case we can access them
only from within the block they are defined in.

• Note that you cannot make any modifications to values declared as constant.
Attempts to update the constant value will result in a compilation error.
2 ways to define constant in C

• There are two ways to define a constant in C programming.

1. const keyword
2. #define preprocessor

• The #define preprocessor directive is also used to define constants.
We will learn about the #define preprocessor directive.
• We then declare and initialize two constant variables, MAX_VALUE (of integer
type) and PI (of float type), with the values 100 and 3.14, respectively.

• Then, inside the main() function, we declare another integer type constant,
MIN_VALUE, with the initial value 0.

• Next, we use a set of printf() statements to access and display all the values to the
console.

• As shown in the output, constant variables defined outside of the main() part are still
accessible globally (global access).

• Also, the numeric constant defined inside the main() part can be accessed from within
it (local access).
Advantages of C Constants
• There are several advantages of C Constants. Some main advantages of C
Constants are as follows:
1.Programmers may use constants to provide names that have meaning to fixed
numbers, which makes the code simpler to comprehend and update.
2.Constants assist in avoiding the usage of magic numbers, which are hard-coded
values, in the code. Instead, constants offer named representations of such values,
enhancing the code's readability.
3.Constants are reusable throughout the program, allowing for constant values in
various locations and lowering the possibility of errors brought on by typos or
inconsistent values.
4.Calculations or processes inside the program can be optimized by using certain
constants, such as mathematical or physical constants.
• A constant is a value or variable that can't be changed in the program.
• For example: 10, 20, 'a', 3.4, "c programming", etc.
There are different types of constants in C
programming.
Decimal Constant
• A whole number represented in base 10 is known as a decimal constant.
• It has digits that range from 0 to 9.
• Declaring a decimal constant has a simple syntax that just requires the
value to be written.
Real or Floating-Point Constant
• A real or floating-point constant represents a number with a fractional
part or an exponent.

• It can be written in decimal notation with a decimal point, or in
exponential notation with the letter "e" or "E".
Octal Constant
• A base 8 value is represented by an octal constant.
• It is prefixed with a '0' (zero) to show that it is an octal constant and has
digits ranging from 0 to 7.
Hexadecimal Constant
• A base-16 value is represented by a hexadecimal constant. It uses the digits
0 to 9 and the letters A to F (or a to f), where the letters represent the values 10 to 15.
• It is prefixed with '0x' or '0X' to identify it as a hexadecimal constant.
Character Constant
• A character constant represents a single character that is enclosed in single
quotes.
String Constant

• A series of characters wrapped in double quotes is represented by a
string constant.

• It is a character array that ends with the null character \0.


Operators in C
• The operators in C are the special symbols that we use for performing various
functions.
• Operands are those data items on which we apply the operators.
• We apply the operators in between various operands. On the basis of the total
number of operands, here is how we classify the operators:

• Unary Operator
• Binary Operator
• Ternary Operator
Conclusion

• Because they represent fixed values that don't change during
the course of the program, constants are crucial in C programming.
Thank You
Bhargavi Dalal
