Basic C Syntax
Identifiers
An identifier is a name given to a variable, a function (or action), a type, or a structure. A well-chosen identifier carries meaning: you understand the role of something just by reading its name, which makes programs easier to read. Identifiers are constructed as follows:
- they consist of a sequence of letters, digits, and the special character _
- they do not start with a digit
- letter case (uppercase/lowercase) matters
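The rules above can be illustrated with a minimal sketch. The names rectangle_area, case_sensitive_sum, width, height, total and Total are hypothetical, chosen only for this example.

```c
// Descriptive identifiers: the role of each name is clear from reading it.
int rectangle_area(int width, int height)
{
    return width * height;
}

// Case matters: total and Total are two distinct variables.
int case_sensitive_sum(void)
{
    int total = 1;
    int Total = 2;
    // int 2nd_value = 3;  would be rejected: an identifier cannot start with a digit
    return total + Total; // 1 + 2
}
```

Relying on case alone to distinguish two names is shown here only to demonstrate the rule; in practice it makes code harder to read.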
Basic Types
All data represented and stored in a computer is in binary (base 2 representation). Even though, for simplicity, you can enter data in decimal, octal, hexadecimal form… it is stored internally in binary.
Variables
The concept of a variable stems from the desire to store data in memory — data that can be retrieved easily via its identifier. Since this data can be modified by the program, it is not constant over time, hence the term variable. Of course, data can be of different natures: numbers, sequences of numbers, text, images… This is why types are defined to categorize different variables. A variable of a specific type can only contain data of the same type. Similarly, some operations are valid on a given type but not on others. Ultimately, a variable is an element that:
- occupies space in memory (and the space occupied is defined by the type),
- has an identifier, which allows the programmer to not worry about where in memory the variable is located
- has a type that defines which operations are valid (i.e., the valid operations are not the same whether you are manipulating text, numbers, images…).
Before using a variable, it must be declared, meaning its type and identifier must be specified.
Variable identifiers are in lowercase with the _ character as a word separator.
Integer Type
The integer type is used to represent a positive or negative whole number. In C, the type int encodes signed integers; on most current platforms it occupies 32 bits (4 bytes), although the C standard only guarantees at least 16 bits. With 32 bits, the integers that can be encoded with the int type belong to the interval [-2^31, 2^31-1].
int x; // declaration of a variable x of integer type
int a = 1024; // same, with a decimal initial value
int b = 012; // same, with an octal initial value (10 in decimal)
int c = 0x4f; // same, with a hexadecimal initial value (79 in decimal)
Other types exist for representing integers (the sizes below are those of a typical 64-bit Linux system; the C standard only fixes minimum sizes):
- unsigned int on 4 bytes representing non-negative integers [0, 2^32-1],
- short on 2 bytes representing signed integers [-2^15, 2^15-1],
- unsigned short on 2 bytes representing non-negative integers [0, 2^16-1],
- long on 8 bytes representing signed integers [-2^63, 2^63-1],
- unsigned long on 8 bytes representing non-negative integers [0, 2^64-1].
This implies for example that:
short x = 40000; // ERROR! (the maximum value of a short is 32767, so 40000 cannot be stored as is)
unsigned short y = 40000; // OK!
int z = 40000; // OK!
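Since the exact sizes depend on the platform, it can be useful to ask the compiler for the real limits. The sketch below uses the standard header limits.h; the function names int_size_in_bits and fits_in_short are hypothetical helpers introduced for illustration.

```c
#include <limits.h>

// Number of bits actually occupied by an int on the current platform.
int int_size_in_bits(void)
{
    return (int)(sizeof(int) * CHAR_BIT);
}

// Returns 1 if value can be stored in a short without loss, 0 otherwise.
int fits_in_short(long value)
{
    return value >= SHRT_MIN && value <= SHRT_MAX;
}
```

With a 2-byte short, fits_in_short(40000) gives 0, which matches the error shown in the example above.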
A number of operations exist on these integer types:
int a = 5;
int b = 3;
int c = 10;
int d = a + b; // d = 8
int e = a - b; // e = 2
int f = a * b; // f = 15
int g = c / b; // g = 3 (quotient of the integer division)
int h = c % b; // h = 1 (remainder of the integer division)
int i = a << 2; // i = 20 (left shift by two bits)
int j = a >> 2; // j = 1 (right shift by two bits)
int k = a & b; // k = 1 (bitwise AND)
int l = a | b; // l = 7 (bitwise OR)
And of course it is possible to compare two integers using the comparison operators == (equality), != (inequality), <, >, <=, >=
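A few of these operators are combined in the short sketch below. The function names are hypothetical; the properties they illustrate (the link between / and %, the result of a comparison, the effect of a left shift) are standard C behavior for non-negative values that do not overflow.

```c
// Integer division and remainder are linked: c == (c / b) * b + (c % b).
int rebuild_from_quotient_and_remainder(int c, int b)
{
    int quotient = c / b; // b must not be 0
    int remainder = c % b;
    return quotient * b + remainder;
}

// A number is even when the remainder of its division by 2 is 0.
int is_even(int n)
{
    return n % 2 == 0; // a comparison yields 1 (true) or 0 (false)
}

// Shifting a non-negative value left by 2 bits multiplies it by 4.
int times_four(int n)
{
    return n << 2;
}
```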
Boolean Type
The bool type is used to represent a boolean (two states, TRUE or FALSE). In C there is no type that takes a single bit of memory. A boolean type has existed since the C99 standard (1999), but it is in fact an integer, typically encoded on 1 byte, which equals 0 for false and 1 for true. When another value is converted to bool, any value different from 0 is considered true and is stored as 1.
#include <stdbool.h>
// bool is not a built-in type name before C23: its declaration must be
// included before variables of type bool can be used
bool a;
bool b = false;
bool c = 1; // equivalent to true
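The conversion rule described above (0 is false, anything else is true) can be checked with a small sketch. The function names to_bool and bool_size_in_bytes are hypothetical.

```c
#include <stdbool.h>

// Converting an integer to bool: 0 gives false, any other value gives true.
bool to_bool(int value)
{
    bool result = value; // stored as 0 or 1, never as 42
    return result;
}

// A bool occupies at least one byte, not a single bit.
int bool_size_in_bytes(void)
{
    return (int)sizeof(bool);
}
```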
For booleans, the logical operators are available: && for logical-AND, || for logical-OR, and ! for negation (logical NOT).
#include <stdbool.h>
bool a = true;
bool b = false;
bool c = a && b; // c = false
bool d = a || b; // d = true
bool e = !a; // e = false
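As an illustration of how the three logical operators combine, here is a sketch of the usual Gregorian leap-year test. The function name is_leap_year and its intermediate variables are hypothetical choices for this example.

```c
#include <stdbool.h>

// A year is a leap year if it is divisible by 4 but not by 100,
// or if it is divisible by 400.
bool is_leap_year(int year)
{
    bool divisible_by_4 = (year % 4 == 0);
    bool divisible_by_100 = (year % 100 == 0);
    bool divisible_by_400 = (year % 400 == 0);
    return (divisible_by_4 && !divisible_by_100) || divisible_by_400;
}
```

Naming each sub-condition keeps the final boolean expression readable.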
Floating-Point Type
The floating-point type is used to represent real numbers approximately: only a finite set of values can be encoded, so most reals are rounded to the nearest representable value. In C the internal representation is in scientific notation (i.e., like 0.5e-4, meaning 0.5 × 10^-4) and, on virtually all current platforms, follows the IEEE 754 standard, which we will not detail here. Two types allow representing real numbers: single-precision float encoded on 4 bytes and double-precision double encoded on 8 bytes.
float a = 10.495;
double b = 756436.2357;
float c = 1.0f;
:::info Note
The f suffix is sometimes added because by default numbers are interpreted as type double. The f is added when you want to make sure the number is interpreted as a float (for optimization reasons). You will certainly encounter this notation during this year but its use will remain marginal.
:::
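Because floating-point values are rounded, testing two of them with == is fragile. A common workaround, sketched below, is to compare them up to a tolerance. The names absolute_difference and nearly_equal are hypothetical, and the tolerance to use depends on the application.

```c
// Absolute value of a - b, written without <math.h>.
double absolute_difference(double a, double b)
{
    double diff = a - b;
    if (diff < 0)
    {
        diff = -diff;
    }
    return diff;
}

// Returns 1 if a and b differ by at most tolerance, 0 otherwise.
int nearly_equal(double a, double b, double tolerance)
{
    return absolute_difference(a, b) <= tolerance;
}
```

For instance, on an IEEE 754 platform, nearly_equal(0.1f, 0.1, 1e-6) holds but nearly_equal(0.1f, 0.1, 1e-12) does not, because a float keeps fewer significant digits than a double.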
The same arithmetic and comparison operators as for integers exist, except %, <<, >>, & and |, which are not defined on floating-point numbers.
:::info Note
Of course the / operator performs a real division when you manipulate floating-point variables.
:::
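The difference between integer division and real division can be seen in this short sketch (the function names integer_half and real_half are hypothetical):

```c
// Both operands are int: the division is an integer division.
int integer_half(int n)
{
    return n / 2; // 7 / 2 gives 3
}

// One operand is a double: the division is a real division.
double real_half(int n)
{
    return n / 2.0; // 7 / 2.0 gives 3.5
}
```

Writing 2.0 instead of 2 is enough to make the compiler convert n to double before dividing.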
Character Type
The character type is used to represent any displayable character (letters, digits, punctuation marks…). In C, a character is encoded on one byte. Each character is in fact represented by a small integer (whether char is signed or unsigned depends on the platform) based on the extended ASCII table. To represent the letter a, use the following syntax: 'a'. Throughout the year you will use a number of special characters: '\t' (tab), '\n' (newline)…
char c;
char d = 'a';
The use of the ASCII table implies that it is not the symbol of the character that is stored in memory but its ASCII code which is an integer. As a result, the following operations are possible:
int i = 'a'; // an ASCII code fits on 1 byte and can therefore
// also be stored in an int (4 bytes)
char c = 65; // 65 is the ASCII code of 'A'
char d = c + 2; // d equals 67, which is the ASCII code of 'C'
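Arithmetic on character codes makes some conversions very short. The sketch below assumes an ASCII platform; to_uppercase and digit_value are hypothetical helper names.

```c
// Converts a lowercase letter to uppercase; other characters are unchanged.
// Assumes ASCII, where 'a'..'z' and 'A'..'Z' are contiguous ranges.
char to_uppercase(char c)
{
    if (c >= 'a' && c <= 'z')
    {
        return c - ('a' - 'A'); // 'a' - 'A' is 32 in ASCII
    }
    return c;
}

// Converts a digit character such as '7' to the integer 7.
int digit_value(char c)
{
    return c - '0';
}
```

The standard library offers toupper in ctype.h for the first conversion; the hand-written version is shown only to expose the underlying integer arithmetic.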
Other Types
All other types derive from these basic types; they will be covered later (for arrays) and in semester 6 (for Cartesian structures).
Expressions
After storing data (previous section), processing it is done via expressions such as computing the sum of variables or testing whether an integer variable is less than another. We thus define numeric expressions and boolean expressions.
For example in C:
1 + x * y - 3
(x < 7 && y > 3) || b
Thus, we observe that an expression consists of operators, sub-expressions, and elementary sub-expressions (variable or constant).
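To see how such expressions are evaluated, the two examples can be wrapped in functions, as in the sketch below. The function names numeric_expression and boolean_expression are hypothetical.

```c
#include <stdbool.h>

// * is applied before + and -, so this computes 1 + (x * y) - 3.
int numeric_expression(int x, int y)
{
    return 1 + x * y - 3;
}

// The parenthesized sub-expression is combined with b by a logical OR.
bool boolean_expression(int x, int y, bool b)
{
    return (x < 7 && y > 3) || b;
}
```

With x = 2 and y = 5, the numeric expression gives 1 + 10 - 3, that is 8.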
In C, an expression can be (among others):
- an identifier
- a constant
- a string literal ("hello!")
- a numeric expression
- a boolean expression
- an assignment expression. This is the expression used to store values in variables (assignment).
In C, an assignment is written as follows:
i = 5;
c = 'a';
An assignment is always structured as follows:
- on the left: a variable or expression that can be assigned to (called an l-value)
- on the right: a variable, an expression, or a constant (called an r-value)
Thus the syntax 13 = i is not possible in C because 13 is not an l-value (it is a constant). In terms of how it works, the right-hand value is computed (evaluated) and assigned to the left-hand variable. In C, the value of the assignment expression is the value computed on the right. For example x = (y = 8) + 1 sets x to 9 and the value of the entire expression is also 9.
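The fact that an assignment has a value can be verified with a small sketch. The function name assignment_value_demo is hypothetical, and the two results are packed into one integer only to make them easy to check.

```c
int assignment_value_demo(void)
{
    int x;
    int y;
    x = (y = 8) + 1; // y receives 8; the value of (y = 8) is 8, so x receives 9
    return x * 10 + y; // 9 * 10 + 8
}
```

Although legal, nesting assignments this way makes code harder to read; two separate assignments are usually clearer.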
There are also very frequently used operators that combine an assignment with an operation: the increment and decrement operators (++ and --) and the compound assignments (+=, -=, …):
j = i++; // j = i; i = i + 1 ;
j = ++i; // i = i + 1; j = i;
j = i--; // j = i; i = i - 1 ;
j = --i; // i = i - 1; j = i;
i += 4; // i = i + 4; same with -= /= *= ...
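The difference between the prefix and postfix forms is easiest to see by running them. In the sketch below, the function names are hypothetical and j and i are packed into a single result (j * 10 + i) only for checking.

```c
// Post-increment: j receives the old value of i, then i is incremented.
int post_increment_demo(void)
{
    int i = 5;
    int j = i++; // j = 5, i = 6
    return j * 10 + i;
}

// Pre-increment: i is incremented first, then j receives the new value.
int pre_increment_demo(void)
{
    int i = 5;
    int j = ++i; // i = 6, j = 6
    return j * 10 + i;
}

// Compound assignments applied one after the other.
int compound_assignment_demo(void)
{
    int i = 5;
    i += 4; // 9
    i *= 2; // 18
    i -= 3; // 15
    i /= 5; // 3
    return i;
}
```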
Constants
A constant is, as its name suggests, a value that does not change during the execution of a program. This is a useful feature since it allows writing code that can evolve and be parameterized: changing the value in one place updates the whole program.
In C there are several ways to define a constant; we will see two of them.
define
The first syntax for defining a constant is the following:
// Declaration of a constant
#define CONSTANT value
// Syntax to use when the value is an expression
// rather than a single token
#define OTHER_CONSTANT (other_value)
This syntax creates an alias (i.e., a shortcut): before compilation, the preprocessor replaces every occurrence of CONSTANT with value throughout the rest of the program. It is important to note that there is no final ;. This syntax has two drawbacks:
- value has no type, which can cause problems in certain cases where operators differ depending on the type being manipulated.
- it is not a variable assignment: there is no evaluation but only a literal substitution, which can lead to errors.
Here is an example showing the limits of #define:
#define X 2
#define Y 4
#define N X + Y
int z = 3 * N; // z = 10, ERROR!
Here, by performing the substitution, the preprocessor generates the expression int z = 3 * X + Y;, that is 3 * 2 + 4, which equals 10 instead of the expected 18. This error can be corrected by writing #define N ( (X) + (Y) ).
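The two variants can be compared side by side in the sketch below. The macro names N_UNSAFE and N_SAFE and the two function names are hypothetical, introduced only to keep both versions in the same file.

```c
#define X 2
#define Y 4
#define N_UNSAFE X + Y     // expands to 2 + 4, without parentheses
#define N_SAFE ((X) + (Y)) // parentheses keep the sum together

int with_unsafe_macro(void)
{
    return 3 * N_UNSAFE; // becomes 3 * 2 + 4
}

int with_safe_macro(void)
{
    return 3 * N_SAFE; // becomes 3 * ((2) + (4))
}
```

The first function returns 10 and the second 18, which is why macro values that are expressions should always be parenthesized.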
const
This syntax avoids the drawbacks mentioned previously: the absence of type and the absence of evaluation. The syntax is close to the declaration of a variable:
const int X = 2;
const int Y = 4;
const int N = X + Y;
int z = 3 * N; // z = 18, OK!
The main drawback of this approach is that the constant is in memory and can therefore be accidentally modified (through memory overflows for example).
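A minimal sketch of the const approach is given below, with the declarations placed inside a function, where initializing one const variable from others is always allowed. The function name scaled_sum is hypothetical.

```c
int scaled_sum(void)
{
    const int X = 2;
    const int Y = 4;
    const int N = X + Y; // evaluated once: N holds 6
    // N = 7;  would be rejected by the compiler: N is read-only
    return 3 * N; // 3 * 6
}
```

Here the result is 18, because N is a typed value computed once rather than a piece of text pasted into the expression.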
In conclusion
Both approaches are valid and usable. You just need to be aware of their limitations, which can create bugs in your programs.
Instructions
An instruction is an elementary line of a C program. Instructions can perform a simple calculation, assign variables… The instructions covered in this course will be:
- simple instructions
- compound instructions
- conditional instructions
- iteration instructions
Simple Instruction
In C, a simple instruction is an expression or a declaration followed by ; (semicolon). For example:
int x = 2;
int y = x + 4;
printf ("Hello!");
int z = pow (x, y);
Compound Instruction
The compound instruction or block is defined in C by a group of instructions enclosed in braces {}. A block has several benefits:
- grouping several instructions into the syntactic form of a single instruction (see the Fundamental Computer Science course in semester 7).
- declaring variables accessible only within the block. Once outside the block, the variables are no longer visible.
int z;
z = 10;
{
int x;
x = 4; // OK: x was declared in the block
z = 3;
z = z + x;
}
x = 0; // ERROR! x is not accessible
Conditional Instruction
The conditional instruction is a compound instruction where certain instructions are only executed under certain conditions. The syntax in C is as follows:
if ( condition )
{
// instructions1
}
else
{
// instructions2
}
:::info Note
condition is a boolean expression. If its evaluation gives true then the first set of instructions is executed, otherwise it is the set instructions2 that is executed. Note that the else{} block is optional.
:::
Here are some examples in C:
if ( a > b )
{
max = a;
}
else
{
max = b;
}
if ( 2 + a <= b )
{
printf("Less than or equal!");
}
if ( a > b )
{
if (c < d)
{
u = v;
}
else
{
i = j;
}
}
// a compact version can be used (see coding conventions)
if ( a ) printf("a is different from 0");
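As a complete example, here is a sketch that chains an if with else and an if without else to find the largest of three values. The function name max_of_three is hypothetical.

```c
int max_of_three(int a, int b, int c)
{
    int max;
    if ( a > b )
    {
        max = a;
    }
    else
    {
        max = b;
    }
    if ( c > max ) // no else: nothing to do when c is not larger
    {
        max = c;
    }
    return max;
}
```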
:::warning Warning
Recall that the comparison operator is == and that the assignment operator is =. Since both operators produce expressions that can be evaluated, both can be used in conditional instructions. However, for readability and maintenance reasons, assignments inside conditions should be avoided as much as possible.
:::
Consider the two program fragments below. What are the values of t and x?
// First fragment
int x = 3, t;
if ( x == 4 )
{
t = 3;
}
else
{
t = 2;
}
// Second fragment
int x = 3, t;
if ( x = 4 )
{
t = 3;
}
else
{
t = 2;
}
This second instruction is strongly discouraged (very little benefit and a source of many bugs). To avoid errors related to confusion between the two operators, it is possible to use Yoda conditions, namely (value == variable). If by mistake you write if (5 = count), the compiler generates an error since 5 is not an l-value and therefore cannot receive an assignment.
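The behavior of the two fragments, and of a Yoda condition, can be checked with the sketch below. The function names are hypothetical, and each function packs x and t into one result (x * 10 + t) only to make both values visible. The doubled parentheses in the second function are there because many compilers otherwise warn about an assignment used as a condition.

```c
// Comparison: x keeps its value 3, the condition is false, t receives 2.
int with_comparison(void)
{
    int x = 3, t;
    if ( x == 4 )
    {
        t = 3;
    }
    else
    {
        t = 2;
    }
    return x * 10 + t;
}

// Assignment: x receives 4, the condition is worth 4 (true), t receives 3.
int with_assignment(void)
{
    int x = 3, t;
    if ( (x = 4) )
    {
        t = 3;
    }
    else
    {
        t = 2;
    }
    return x * 10 + t;
}

// Yoda condition: same result as the comparison, and writing 4 = x
// by mistake would not compile.
int with_yoda_condition(void)
{
    int x = 3, t;
    if ( 4 == x )
    {
        t = 3;
    }
    else
    {
        t = 2;
    }
    return x * 10 + t;
}
```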
Iteration Instruction
The objective of this instruction is simple: to automate the writing of certain repetitive instructions rather than writing them manually. We often speak of a loop since it is a part of the program that is executed multiple times.
for Loop
The classic syntax is:
int cpt;
for ( cpt = inf; cpt <= sup; cpt = cpt + inc )
{
// instructions
}
// If your counter only needs to be accessible
// inside the loop, you can write:
for ( int cpt = inf; cpt <= sup; cpt = cpt + inc )
{
// instructions
}
The execution of a for loop proceeds as follows:
- cpt is initialized with the value inf
- the expression cpt <= sup is evaluated. If it equals false, we exit the loop (i.e., the block between braces)
- Otherwise (i.e., if it is true) the instructions in the block are executed.
- The expression cpt = cpt + inc is then executed…
- …then the expression cpt <= sup is evaluated again, and so on.
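The steps above can be followed on a concrete loop. The sketch below reuses the names cpt, inf, sup and inc from the syntax; the function name sum_range and the variable sum are hypothetical.

```c
// Adds up inf, inf + inc, inf + 2 * inc, ... as long as the value is <= sup.
// inc must be strictly positive, otherwise the loop never ends.
int sum_range(int inf, int sup, int inc)
{
    int sum = 0;
    for ( int cpt = inf; cpt <= sup; cpt = cpt + inc )
    {
        sum = sum + cpt;
    }
    return sum;
}
```

For example, sum_range(1, 10, 1) adds the integers from 1 to 10 and gives 55, while sum_range(5, 1, 1) gives 0 because the condition is false from the start and the block is never executed.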
:::info Note
The for loop should be the first choice when the number of iterations is known.
:::
while Loop
The classic syntax in C is:
while (expression)
{
// instructions
}
The execution of a while loop proceeds as follows: as long as expression evaluates to true, the block of instructions is executed. This means, among other things, that if the condition is initially false, the block of instructions is never executed.
Here are some examples in C:
while ( x > 0 )
{
x = x - 1;
z = z + x;
}
while ( x > 0 ) x--;
// a single instruction, braces are optional
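A while loop is a natural fit when the number of iterations is not known in advance. As an illustration chosen for this purpose, the sketch below computes a greatest common divisor with Euclid's algorithm; the function name gcd is hypothetical and both arguments are assumed to be non-negative.

```c
// Euclid's algorithm: replace (a, b) with (b, a % b) until b is 0.
int gcd(int a, int b)
{
    while ( b != 0 )
    {
        int remainder = a % b;
        a = b;
        b = remainder;
    }
    return a;
}
```

If b is already 0 on entry, the block is never executed and a is returned unchanged.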
do-while Loop
The classic syntax is:
do
{
// instructions
} while ( expression );
The execution of a do-while loop proceeds as follows: the block of instructions is executed once, then the expression is evaluated. If it is true, we loop back, otherwise we exit the loop. Unlike the while loop, even if expression is false from the start, the instructions in the block are executed at least once.
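Counting the decimal digits of a number is a case where the block must run at least once, since even 0 has one digit. The sketch below uses a hypothetical function name, count_digits.

```c
// Counts the decimal digits of n; the block runs once even when n is 0.
int count_digits(unsigned int n)
{
    int count = 0;
    do
    {
        count = count + 1;
        n = n / 10;
    } while ( n > 0 );
    return count;
}
```

With a plain while ( n > 0 ) loop, the value 0 would give a count of 0 digits, which is why do-while is the better fit here.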