What is Character Set in C?
In C, a character set is the group of recognized characters that can be used in source code. These characters are the building blocks of variables, functions, operators, and constants. In short, the character set is the collection of letters, digits, and symbols that the C compiler recognizes.
Types of Character Sets in C Language
Character sets in the C language are split into two main categories:
- Source Character Set (SCS)
- Execution Character Set (ECS)
1. Source Character Set (SCS)
The C language's source character set (SCS) includes all of the characters that can be used in source code when building C programs. These characters form the building blocks of the code and help create the syntax of the program. The SCS consists of:
- Alphabets: Uppercase (A–Z) and lowercase (a–z) letters are used to name variables, functions, and other identifiers.
- Digits: The numbers 0 through 9 represent numeric values in the code.
- Special Characters: Symbols like !, #, %, ^, &, *, +, -, =, {}, [], (), <, >, ;, and , are used for operations, syntax, and formatting. Characters such as @ and $ can appear in string literals, character literals, and comments, but standard C syntax gives them no meaning.
- Whitespace Characters: These characters, like spaces (' '), tabs ('\t'), and newlines ('\n'), separate tokens and format the code for readability. Outside string and character literals, the compiler otherwise ignores them.
- Escape Sequences: These are written with ordinary source characters, such as newline (\n), tab (\t), and backslash (\\), and are used to include non-printable or otherwise hard-to-type characters in strings and character constants.
- Extended Characters: These include characters beyond the ASCII range (values >127), often used for international characters or special symbols, like é, ç, or ¥.
2. Execution Character Set (ECS)
The Execution Character Set (ECS) in C includes the characters that the system processes when the program runs. While the SCS is about writing code, the ECS is about how the characters in that code are represented and handled during execution. The ECS includes:
- All characters from the Source Character Set (SCS): This means the letters, digits, symbols, and whitespace characters you used while writing the code.
- Control Characters: Non-printable characters such as newline ('\n'), carriage return ('\r'), tab ('\t'), backspace ('\b'), alert ('\a'), and the null character ('\0') affect how output is laid out or how strings are terminated at run time.
Explanation of Source Character Set Types in C
In C programming, characters play a significant role in defining variables and functions that create meaningful code. C supports many types of characters, and each type has its purpose. The key character types of the character set in C are listed below.
1. Alphabets
Both lowercase and uppercase letters can be used in C. These letters are commonly used when naming variables, functions, constants, and other identifiers in your code.
Uppercase Letters (A–Z)
These are the capital letters A through Z. In the ASCII (American Standard Code for Information Interchange) table, uppercase letters have values ranging from 65 ('A') to 90 ('Z'). For example: A, B, C, ..., Z.
Lowercase Letters (a–z)
These are the small letters from a to z. In the ASCII table, lowercase letters have values ranging from 97 ('a') to 122 ('z'). For example: a, b, c, ..., z.
Code Example
#include <stdio.h>
int main() {
    printf("Uppercase Letters:\n");
    for ( char ch = 'A'; ch <= 'Z'; ch++ )
    {
        printf("%c ", ch); // Prints the uppercase letters 'A' through 'Z'
    }
    printf("\nLowercase Letters:\n");
    for ( char ch = 'a'; ch <= 'z'; ch++ )
    {
        printf("%c ", ch); // Prints the lowercase letters 'a' through 'z'
    }
    return 0;
}
Explanation
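In this C program, we use two for loops to display all uppercase and lowercase letters. The first loop starts with the character 'A' and continues until 'Z', printing each letter using printf("%c ", ch). This works because a char holds a small integer code, and in ASCII-compatible encodings the codes for 'A' through 'Z' are consecutive (65 to 90). The C standard does not guarantee this for letters; on EBCDIC systems, for example, the letters are not contiguous.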
The second loop follows the same logic but starts from 'a' and runs until 'z' to display lowercase letters. Each loop increases the character variable by one in every iteration, moving through the ASCII sequence for uppercase and lowercase letters.
Output
Uppercase Letters:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Lowercase Letters:
a b c d e f g h i j k l m n o p q r s t u v w x y z
2. Digits
In C programming, the digit characters '0' through '9' are used to write numeric values. Each digit character has its own character code: in ASCII (American Standard Code for Information Interchange), '0' has the value 48 and '9' has the value 57. The character '5' is therefore stored as the code 53, not as the number 5. This is part of the character set in C, which defines how digits, letters, and symbols are represented using ASCII or another encoding scheme.
Code Example
#include <stdio.h>
int main() {
    printf("Digits:\n");
    for (char ch = '0'; ch <= '9'; ch++) { // Loop through characters '0' to '9'
        printf("%c ", ch); // Print each digit
    }
    return 0;
}
Explanation
This program displays the digits 0 through 9 using a for loop. It starts with the character '0' and increments the character value until it reaches '9'. This works on every conforming implementation because the C standard requires the codes for '0' through '9' to be consecutive; in ASCII they run from 48 ('0') to 57 ('9').
Inside the loop, printf("%c ", ch) prints each digit, where %c is the format specifier for a character. The space after %c in the format string produces the gaps between the digits in the output.
Output
Digits:
0 1 2 3 4 5 6 7 8 9
3. Special Characters
Special characters in C programming make it possible to do crucial tasks including flow control, value assignment, comparisons, and calculations. Without them, basic tasks wouldn't be possible. Common types include:
- Arithmetic Operators: + (addition), - (subtraction), * (multiplication), / (division)
- Logical Operators: && (logical AND), || (logical OR)
- Assignment Operator: = (assigns a value to a variable)
- Comparison Operators: < (less than), > (greater than), <= (less than or equal to), >= (greater than or equal to), == (equal to), != (not equal to)
- Other Operators: Additional operators include & (bitwise AND), | (bitwise OR), % (modulus, returns the remainder), and more.
Code Example
#include <stdio.h>
int main() {
    int a = 5, b = 10;
    printf("Addition: %d\n", a + b); // Using '+'
    printf("Subtraction: %d\n", b - a); // Using '-'
    return 0;
}
Explanation
In this code, we first include the <stdio.h> header to use input/output functions. Two integer variables, a and b, are defined and initialized to 5 and 10. Then, we use the printf function to display the result of adding a and b, and of subtracting a from b. The addition and subtraction operations are performed using the + and - operators.
The printf function prints the results of these arithmetic operations to the console. The program outputs the addition result first (15), then the subtraction result (5), and finally returns 0 to indicate the successful execution of the program.
Output
Addition: 15
Subtraction: 5
4. Whitespace Characters
Whitespace characters are essential in programming for organizing code and controlling how the program displays output. Newlines (\n), tabs (\t), and spaces are among the characters in the Character Set in C. While they don’t appear as visible symbols on the screen, they are required to make the code readable and format the program’s output properly.
Code Example
#include <stdio.h>
int main() {
    printf("Hello,\tWorld!\n"); // Tab space
    printf("Hello,\nWorld!"); // New line
    return 0;
}
Explanation
The first printf statement prints the string "Hello," followed by a tab (\t), and then the word "World!". The tab produces the gap between "Hello," and "World!".
The second printf statement prints "Hello," again, but this time followed by a newline character (\n), which moves the cursor to the next line before printing "World!". The newline separates the two words into different lines.
Output
Hello, World!
Hello,
World!
5. Control Characters
Control characters are characters in the ECS that do not display as symbols on the screen but control how text is formatted or processed during program execution, for example by producing line breaks and tabs. The common ones are listed below.
Control Character | Description | ASCII Value | Use Case
Newline (\n) | Moves the cursor to the next line. | 10 | Printing text on a new line.
Carriage Return (\r) | Moves the cursor to the beginning of the line. | 13 | Overwriting text on the same line.
Tab (\t) | Inserts a horizontal tab. | 9 | Formatting text.
Form Feed (\f) | Moves the cursor to the top of the next page. | 12 | Page formatting.
Backspace (\b) | Moves the cursor back one position. | 8 | Overwriting the previous character on a terminal.
Escape (\e, non-standard) | Starts a terminal escape sequence. \e is a compiler extension (GCC, Clang), not standard C; \x1B is the portable spelling. | 27 | Terminal control sequences.
Null Character (\0) | Marks the end of a string. | 0 | Terminating strings in C.
Code Example
#include <stdio.h>
int main() {
    // Print column headers
    printf("Name\tAge\tLocation\n");
    // Print data rows
    printf("Alice\t25\tNew York\n");
    printf("Bob\t30\tLos Angeles\n");
    printf("Charlie\t35\tChicago\n");
    return 0;
}
Explanation
The data and table headers are printed in this C program using the printf() function. The \t (tab) control character is used to add horizontal space between columns, while the \n (newline) control character moves the cursor to the next line after each row of data.
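Each call to printf() prints one line of the table, and the program prints three rows of data: one for "Alice," one for "Bob," and one for "Charlie." The \t characters line the columns up here because every field is shorter than the terminal's tab stop spacing (commonly 8 columns); a longer field would push the following columns out of alignment.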
Output
Name Age Location
Alice 25 New York
Bob 30 Los Angeles
Charlie 35 Chicago
6. Escape Sequences
In C, escape sequences are character combinations that allow you to include special characters in a string that are otherwise hard or impossible to type directly. They begin with a backslash (\) and are followed by one or more characters. Escape sequences are especially useful when handling characters like quotes or control characters in strings. Here's a list of common escape sequences in C:
Escape Sequence | Description | ASCII Value | Use Case
\n | Newline | 10 | Moves the cursor to the next line
\t | Horizontal Tab | 9 | Moves the cursor to the next tab stop
\\ | Backslash | 92 | Used to represent a backslash character
\' | Single Quote | 39 | Used to represent a single quote character
\" | Double Quote | 34 | Used to represent a double quote character
\0 | Null Character | 0 | Marks the end of a string
\b | Backspace | 8 | Moves the cursor back one position
\r | Carriage Return | 13 | Moves the cursor to the beginning of the line
\f | Form Feed | 12 | Moves the cursor to the next page
Code Example
#include <stdio.h>
int main() {
    // Using escape sequences to format output
    printf("Hello, World!\n"); // Newline
    printf("Tab\tIndented\n"); // Horizontal tab
    printf("She said, \"Hello!\"\n"); // Double quotes in string
    printf("Backslash: \\\n"); // Display a backslash
    return 0;
}
Explanation
This C program uses printf to demonstrate how escape sequences work. "Hello, World!" is printed first, and then \n is used to move to a new line. Then, it prints "Tab" followed by a tab space and the word "Indented" using \t. After that, it prints a sentence with double quotes around the word "Hello!" using \". Finally, it demonstrates how to use \\ to print a backslash. These escape sequences help format the output to make it look neat and clear.
Output
Hello, World!
Tab Indented
She said, "Hello!"
Backslash: \
7. Extended Characters
Extended characters in C are characters that go beyond the basic English letters, digits, and symbols of ASCII. These include symbols and letters from other languages, like ©, ®, ñ, Ω, and é.
They are used when you want to support different languages or need special symbols. Their codes lie outside the 7-bit ASCII range: single-byte encodings such as ISO 8859-1 (Latin-1) assign some of them values from 128 to 255, while UTF-8 represents each one as a sequence of two or more bytes. How they display depends on the encoding that your compiler and terminal use. You can write them in your program with escape codes or directly as characters, depending on how your system and compiler handle them.
Example
#include <stdio.h>
int main() {
    // Printing extended characters by byte value (assumes an ISO 8859-1 / Latin-1 terminal)
    printf("Copyright Symbol: \xA9\n"); // © (code 169 in Latin-1)
    printf("Registered Symbol: \xAE\n"); // ® (code 174 in Latin-1)
    printf("Tilde Character: \x7E\n"); // ~ (ASCII 126, not an extended character)
    printf("Spanish Letter: \xF1\n"); // ñ (code 241 in Latin-1)
    return 0;
}
Explanation
This C program shows how to print characters by their byte values using \x followed by a hexadecimal number. It prints the copyright symbol (©), the registered symbol (®), the tilde (~), and the Spanish letter ñ. The tilde is a plain ASCII character; the other three have codes above 127 and are shown correctly only when the terminal interprets the bytes as ISO 8859-1 (Latin-1) or a compatible code page. On a UTF-8 terminal, these single bytes are invalid sequences and usually appear as replacement characters.
Output
Copyright Symbol: ©
Registered Symbol: ®
Tilde Character: ~
Spanish Letter: ñ
Time and Space Complexity for Source Character Set Types in C
Character Set Type | Time Complexity | Space Complexity
Alphabets | O(26 + 26) = O(52) = O(1): constant time, as the number of iterations is fixed. | O(1): constant space, because only a few variables are used.
Digits | O(1) for each iteration, which results in O(10) overall; this simplifies to O(1) since the number of digits (0-9) is constant. | O(1): the loop stores its variables and output in constant memory.
Special Characters | O(1): the operations are constant-time since they involve simple arithmetic. | O(1): the program uses a fixed amount of space to store two integers (a and b).
Whitespace Characters | O(1): constant time, since the size of the input has no effect on the operation. | O(1): it only uses a fixed amount of memory to store the string literals and doesn't require extra space that scales with input.
Control Characters | O(1): the program prints a fixed number of lines and performs a constant amount of work. | O(1): the program only uses a fixed amount of memory for storing strings and performing the output.
Escape Sequences | O(1): each escape sequence is a constant-time operation for formatting output. | O(1): no additional memory is required for the escape sequences themselves.
Extended Characters | O(1): each character is represented by a constant value. | O(1): the program uses a fixed amount of memory to store and print the extended characters.
What is the ASCII Character Set?
ASCII, short for American Standard Code for Information Interchange, is a standardized system that assigns a unique numeric value to each character, symbol, or control code used in computers and programming. Originally developed in the 1960s, ASCII represents 128 characters using 7 bits, ranging from 0 to 127. These include:
- Uppercase and lowercase English letters
- Digits from 0 to 9
- Punctuation marks and symbols
- Control characters like newline (\n), tab (\t), etc.
ASCII gave early computer systems a consistent text representation across platforms and programming languages, and it remains the foundation for modern encodings like UTF-8, whose first 128 code points are identical to ASCII.
ASCII Values for Different Types of Character Sets
Here’s a categorized table of ASCII values for different character types:
Character Type | Character(s) | ASCII Range / Value
Uppercase Letters | A to Z | 65 to 90
Lowercase Letters | a to z | 97 to 122
Digits | 0 to 9 | 48 to 57
Space | Space | 32
Special Characters | ! " # $ % & ' ( ) * + , - . / | 33 to 47
Special Characters | : ; < = > ? @ | 58 to 64
Special Characters | [ \ ] ^ _ ` | 91 to 96
Special Characters | { | } ~ | 123 to 126
Control Characters | NUL, BEL, BS, TAB, LF, CR, and others | 0 to 31
Delete Character | DEL | 127
Each of these characters plays a role in programming, whether it’s forming variable names, structuring syntax, or controlling program output.
Uses of Character Set in C
The character set in C programming is the collection of characters recognized by the compiler. Each character is represented internally by a numeric code, which is its ASCII value on most systems. These characters are used in various ways in the C language to perform operations involving text, symbols, and control commands.
Key Uses of Character Sets in C:
- String Handling: Characters build strings for text manipulation (e.g., concatenation, search).
- Input/Output: Used in reading/writing data via functions like getchar(), printf(), etc.
- Control Flow: Useful in conditions, loops, and switch cases for input checks.
- Character Encoding: Maps characters to ASCII values for data encoding/decoding.
- File Operations: Handles character-by-character file reading/writing (fgetc(), fputc()).
- Arithmetic/Logic: Includes symbols like +, -, and && used in calculations and logic.
- Memory Use: Efficient character storage in arrays/strings, saving memory space.
Important Use Cases Of Character Set in C
Here are some of the important use cases where the character set in C plays a vital role:
1. String Comparison
C uses character sets to compare strings lexicographically. Functions like strcmp() and strncmp() compare characters one by one according to their ASCII values.
if (strcmp(str1, str2) == 0) {
printf("The strings are equal.\n");
}
2. Character Classification
Functions like isalpha(), isdigit(), islower(), and isupper() check whether a character belongs to a certain category. This is commonly used in input validation.
if (isalpha(c)) {
printf("The character is a letter.\n");
}
3. Character Encoding & Decoding
Characters are encoded using their ASCII values in communication protocols, file handling, and cryptography. For example, a program can encode characters into their ASCII equivalents for transmission and decode them back into readable text on the other side.
4. Text Search and Pattern Matching
The character set is used for searching substrings in strings. Functions like strchr() and strstr() depend on the character set to locate characters or patterns in strings.
5. Parsing and Tokenization
In parsers and tokenizers, characters are used to divide a string into smaller parts (tokens) based on delimiters like spaces, commas, or semicolons.
6. Character-based Operations in Cryptography
Classical ciphers often manipulate characters by shifting their ASCII values or re-encoding them to obscure the original text. The Caesar cipher, for instance, replaces each letter with the letter a fixed number of positions further along the alphabet.
7. Character-based Buffers
When processing input, especially from files or network streams, buffers of characters are used to store and manipulate the data before processing or displaying it.
8. Control Characters for Formatting
The C language utilizes special control characters like \n (newline), \t (tab), and \r (carriage return) for formatting output, controlling cursor movement, or controlling printing behaviors on consoles.
Program to Count the Frequency of Each Character in a Given String
#include <stdio.h>
#include <string.h>
int main() {
    char str[100];
    int freq[256] = {0}; // freq[c] counts occurrences of the character with code c
    printf("Enter a string: ");
    if (fgets(str, sizeof(str), stdin) == NULL) { // Read input string
        return 1;
    }
    str[strcspn(str, "\n")] = '\0'; // Drop the trailing newline so it is not counted
    for (int i = 0; str[i] != '\0'; i++) {
        freq[(unsigned char)str[i]]++;
    }
    // Display the frequency of each character, in character-code order
    printf("Character frequencies:\n");
    for (int i = 0; i < 256; i++) {
        if (freq[i] != 0) {
            printf("'%c' = %d\n", i, freq[i]);
        }
    }
    return 0;
}
Explanation
The program calculates how often each character appears in the string. After the user enters a string, the program removes the newline that fgets stores at the end and records the frequency of each character in an array indexed by character code. It visits each character in the string and increments the matching array element. Lastly, it prints the frequency of every character that occurs in the string. Because the final loop walks the array from code 0 to 255, the results appear in character-code order, not in order of first appearance. For example, if the input is "Hello", the program will show how many times 'H', 'e', 'l', and 'o' appear.
Output
Enter a string: Hello World
Character frequencies:
' ' = 1
'H' = 1
'W' = 1
'd' = 1
'e' = 1
'l' = 3
'o' = 2
'r' = 1
Program to Check Palindrome Sequence
#include <stdio.h>
#include <string.h>
int isPalindrome(const char str[]) {
    int start = 0;
    int end = (int)strlen(str) - 1; // Cast first so an empty string gives -1, not a huge unsigned value
    while (start < end) {
        if (str[start] != str[end]) {
            return 0; // Not a palindrome
        }
        start++;
        end--;
    }
    return 1; // It is a palindrome
}
int main() {
    char str[100];
    printf("Enter a string: ");
    fgets(str, sizeof(str), stdin);
    // Remove the newline character if present
    str[strcspn(str, "\n")] = '\0';
    if (isPalindrome(str)) {
        printf("The string is a palindrome.\n");
    } else {
        printf("The string is not a palindrome.\n");
    }
    return 0;
}
Explanation
- The isPalindrome function determines whether a string reads the same in both forward and reverse directions.
- It advances toward the middle after comparing characters from the string's two ends.
- It indicates that the string is not a palindrome by returning 0 (false) if any mismatch is discovered.
- The loop returns 1 (true) if it finds no mismatch, indicating that the string is a palindrome.
Output
Enter a string: sos
The string is a palindrome.
C Program to Print Every Character in the C Character Set
#include <stdio.h>
int main() {
    printf("Characters in the C character set:\n");
    // Printing alphabets (uppercase and lowercase)
    printf("Alphabets (Uppercase and Lowercase):\n");
    for (char ch = 'A'; ch <= 'Z'; ch++) {
        printf("%c ", ch);
    }
    for (char ch = 'a'; ch <= 'z'; ch++) {
        printf("%c ", ch);
    }
    // Printing digits
    printf("\nDigits:\n");
    for (char ch = '0'; ch <= '9'; ch++) {
        printf("%c ", ch);
    }
    // Printing special characters
    printf("\nSpecial Characters:\n");
    for (char ch = 33; ch <= 47; ch++) { // ASCII values for special characters like !, ", #, $
        printf("%c ", ch);
    }
    for (char ch = 58; ch <= 64; ch++) { // ASCII values for special characters like :, ;, <, =
        printf("%c ", ch);
    }
    for (char ch = 91; ch <= 96; ch++) { // ASCII values for special characters like [, \, ], ^
        printf("%c ", ch);
    }
    for (char ch = 123; ch <= 126; ch++) { // ASCII values for special characters like {, |, }, ~
        printf("%c ", ch);
    }
    // Printing extended characters (codes 128-255). These are outside ASCII,
    // so what appears depends on the terminal's encoding; int is used for the
    // loop variable because a char could overflow before reaching 255.
    printf("\nExtended Characters (ASCII 128-255):\n");
    for (int i = 128; i <= 255; i++) {
        printf("%c ", i);
    }
    printf("\n");
    return 0;
}
Explanation
- The program loops through ranges of character codes to print uppercase and lowercase alphabets, digits, and special characters (all ASCII), followed by the codes 128 to 255, which lie outside ASCII.
- It uses the printf function to display each character.
- Special characters are printed by looping through their corresponding ASCII values.
Output
Characters in the C character set:
Alphabets (Uppercase and Lowercase):
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e f g h i j k l m n o p q r s t u v w x y z
Digits:
0 1 2 3 4 5 6 7 8 9
Special Characters:
! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~
Extended Characters (ASCII 128-255):
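Ç ü é â ä à å ç ê ë è ï î ì Ä Å É æ Æ ô ö ò û ù ... (and so on; these glyphs are from code page 437, and other terminal encodings show different characters)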
Points to Note while dealing with Character Set in C
- Character vs String Confusion:
A single character uses single quotes ('A'), while strings use double quotes ("A"). Mixing them up typically produces compiler warnings or errors, or a program that misbehaves at run time.
- Incorrect Data Type Usage:
Use char when storing characters in arrays and strings; int takes more memory per element. The exception is the return value of functions such as getchar() and fgetc(), which must be kept in an int so that EOF can be told apart from a valid character.
- Buffer Overflow in Strings:
Forgetting the null terminator \0 in character arrays can lead to undefined behavior or memory issues.
- Signed vs Unsigned Char:
Depending on the compiler, char may be signed or unsigned. This affects how values >127 are interpreted.
- Escape Sequence Misuse:
Mistyping escape sequences (e.g., using \c, which is invalid) will result in compiler warnings or errors.
Conclusion
Character sets in C are essential for writing clear and efficient programs. The Source Character Set (SCS) is used when writing the code, while the Execution Character Set (ECS) determines how characters are handled during the program's execution. Understanding the different types of characters, such as letters, digits, special symbols, and whitespace, helps programmers effectively manage data, perform tasks, and organize their code.
Frequently Asked Questions
1. Which are the two main types of character sets in C?
The two main types are the Source Character Set (SCS), which includes characters used in writing code, and the Execution Character Set (ECS), which contains characters that are processed during execution.
2. What is the ASCII character set?
The ASCII (American Standard Code for Information Interchange) character set defines 128 characters, including letters, numbers, punctuation, and control characters.
3. What are the control characters in C?
Control characters are non-printing characters like newline (\n), tab (\t), and backspace (\b) that control text formatting and manage how text is displayed.
4. How do escape sequences work in C?
Escape sequences in C are combinations of characters starting with a backslash (e.g., \n for newline) that represent special characters not easily typed or visible.
5. Can C handle extended characters beyond ASCII?
Yes, C can handle extended characters beyond ASCII through multibyte encodings like UTF-8, allowing characters from different languages and symbols.
6. What functions are available for character manipulation in C?
The <ctype.h> header provides functions like isalpha(), isdigit(), toupper(), and tolower() to classify and convert characters.
7. How does the character set affect string handling in C?
The character set is required for string handling in C. Strings are arrays of characters, and understanding the character set helps create, manipulate, and process strings.