A Beginner's Guide to Mastering Data Types in Rust

In this lesson, we'll explore the fundamentals of data types in Rust, including how to work with primitive types, compound types, and more. By the end, we'll have a solid understanding of how Rust handles data and how to effectively use data types in our own programs.

In the previous lessons, we glimpsed a few basic data types in Rust, such as i32, u32, f64, &str, and String. In this lesson, we will dive deeper into more data types supported in Rust. These can be categorized into scalar and compound data types.

Scalar Data Types

A scalar data type represents a single value, such as an integer, a floating-point number, a boolean, or a character. Since they represent the most basic and low-level data from which more complex data structures can be created, they are also called “primitive data types.”

Number

There are several data types to represent numbers in Rust, and we have already seen a few in previous lessons.

Signed Integers

A signed integer can represent both negative (< 0) and non-negative (>= 0) integers. Rust provides several signed integer data types based on the size of the integer we want to store, which lets us trade range for memory. For example, if you only need to store small integers, you can use i8, which stores integers in 8 bits, compared to i128, which takes 128 bits (16 bytes) to store an integer.

Here is a table of all supported signed integers, along with their bit sizes and the values they can store.

Data Type | Bit Size | Min Value | Max Value
i8 | 8 | -128 | 127
i16 | 16 | -32,768 | 32,767
i32 | 32 | -2,147,483,648 | 2,147,483,647
i64 | 64 | -9,223,372,036,854,775,808 | 9,223,372,036,854,775,807
i128 | 128 | -170,141,183,460,469,231,731,687,303,715,884,105,728 | 170,141,183,460,469,231,731,687,303,715,884,105,727
isize | Platform-dependent (32 or 64 bits) | Depends on architecture | Depends on architecture

The isize data type is slightly different from the others: it is a distinct type whose width matches the pointer size of the system architecture for which the program is compiled (32 bits on 32-bit targets, 64 bits on 64-bit targets). It is primarily used where a signed integer must match the system’s architecture, such as pointer offsets, differences between indices, or some low-level system tasks. Note that indexing an array requires its unsigned counterpart, usize, so an isize value must be cast before it can be used as an index.

fn main() {
    let a: i8 = 127;
    let b: i16 = 128;
    let c: i32 = -36000;
    let d: i64 = 400300200100;
    let e: i128 = 900800700600500400300200100;

    // Arrays are indexed with usize, so an isize must be cast first.
    let i: isize = 1;
    let arr: [i32; 3] = [1, 2, 3];
    let v: i32 = arr[i as usize]; // v = 2
}

As Rust supports type inference, the default type assigned to a variable when it is initialized with an integer value is i32.

Unsigned Integers

Unlike signed integers, an unsigned integer can represent only non-negative integers (>= 0). Unsigned types are the counterparts of the signed integer types; instead of i, their names begin with u for unsigned.

Data Type | Bit Size | Min Value | Max Value
u8 | 8 | 0 | 255
u16 | 16 | 0 | 65,535
u32 | 32 | 0 | 4,294,967,295
u64 | 64 | 0 | 18,446,744,073,709,551,615
u128 | 128 | 0 | 340,282,366,920,938,463,463,374,607,431,768,211,455
usize | Platform-dependent (32 or 64 bits) | 0 | Depends on architecture

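Similar to isize, the usize type is platform-dependent: its width matches the pointer size of the target architecture. It is the type Rust uses for array and vector indices, lengths, and other memory sizes.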

fn main() {
    let a: u8 = 255;
    let b: u16 = 256;
    let c: u32 = 68_000;
    let d: u64 = 400_300_200_100;
    let e: u128 = 900_800_700_600_500_400_300_200_100;

    let i: usize = 1;
    let arr: [u32; 3] = [1, 2, 3];
    let v: u32 = arr[i]; // v = 2
}

Floating-Point Numbers

Rust supports storing floating-point numbers in a single-precision floating-point type f32, where the value is stored in 32 bits of memory, or in a double-precision floating-point type f64, which takes 64 bits to store the value.

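Data Type | Bit Size | Min Value | Smallest Positive Normal Value | Max Value
f32 | 32 | -3.40282347e+38 | 1.17549435e-38 | 3.40282347e+38
f64 | 64 | -1.7976931348623157e+308 | 2.2250738585072014e-308 | 1.7976931348623157e+308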

As Rust supports type inference, the default type assigned to a variable when it is initialized with a floating-point number is f64.

fn main() {
    let a: f32 = 3.14;
    let b: f64 = 1.414;
}

To get the smallest or largest value a numeric type supports, use the ::MIN and ::MAX constants the type exposes. For example, i8::MIN is -128 while i8::MAX is 127. This works for all of the numeric types above. For the floating-point types, MIN is the most negative finite value, and the smallest positive normal value is available as MIN_POSITIVE.
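
As a small sketch of these constants in use (the helper name i8_bounds is only for illustration), the snippet below prints the bounds of a few types and shows how MIN differs from MIN_POSITIVE for floats.

```rust
// Hypothetical helper: returns (minimum, maximum) for the i8 type.
fn i8_bounds() -> (i8, i8) {
    (i8::MIN, i8::MAX)
}

fn main() {
    let (lo, hi) = i8_bounds();
    println!("i8 ranges from {} to {}", lo, hi);
    println!("u64::MAX is {}", u64::MAX);

    // For floats, MIN is the most negative finite value;
    // the smallest positive normal value is MIN_POSITIVE.
    println!("f32::MIN is {:e}", f32::MIN);
    println!("f32::MIN_POSITIVE is {:e}", f32::MIN_POSITIVE);
}
```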

NaN

In certain situations, the result of a mathematical operation cannot be interpreted as a meaningful number. For example, the result of 0.0 / 0.0 or the square root of a negative number doesn’t yield a meaningful number. To represent such results, Rust uses the NaN (Not a Number) value. This can be accessed through f32::NAN and f64::NAN.

fn main() {
    let x: f64 = 0.0 / 0.0;

    println!("x: {}", x);
}

// x: NaN

NaN only exists in the context of floating-point numbers (see the IEEE 754 standard). When dividing 0.0 / 0.0 in floating-point arithmetic, the result will be NaN. Integer division by zero, however, is an error: the program panics at runtime with the message attempt to divide by zero, and if the compiler can already see the zero divisor (as in the literal expression 0 / 0), it rejects the program at compile time.

fn main() {
    let x: f64 = 1.0 / 0.0;
    let y: f64 = -1.0 / 0.0;

    println!("x: {}", x);
    println!("y: {}", y);
}

// x: inf
// y: -inf

Any non-zero number (other than NaN) divided by 0.0 in floating-point arithmetic returns either f64::INFINITY or f64::NEG_INFINITY (which equals -f64::INFINITY). The NAN, INFINITY, and NEG_INFINITY constants are defined on the f32 and f64 types, since floating-point numbers are represented using these types.

fn main() {
    println!("100.0 > NaN: {}", 100.0 > f64::NAN);
    println!("100.0 < NaN: {}", 100.0 < f64::NAN);
    println!("inf < NaN: {}", f64::INFINITY < f64::NAN);
    println!("inf > NaN: {}", f64::INFINITY > f64::NAN);
    println!("NaN == NaN: {}", f64::NAN == f64::NAN);
    println!("NaN != NaN: {}", f64::NAN != f64::NAN);
}
$ cargo run
100.0 > NaN: false
100.0 < NaN: false
inf < NaN: false
inf > NaN: false
NaN == NaN: false
NaN != NaN: true

An interesting fact about NaN is that it’s not comparable with any number, not even itself. NaN does not represent a real value, so every ordering or equality comparison that involves it evaluates to false, as shown above. The only exception is the != operator, which evaluates to true, so NaN != NaN is true.

Due to this behavior, NaN is unordered, which means floating-point numbers (f32 and f64) aren’t fully ordered. A fully ordered type would provide consistent comparison results for any value. However, due to the presence of the NAN value, this can’t be guaranteed for f32 and f64. This has some consequences for compound data types like HashMap and HashSet, which are explained in the next sections.

fn main() {
    let x = 0.0 / 0.0;

    if x == f64::NAN {
        println!("[1] x is NaN");
    }

    if x.is_nan() {
        println!("[2] x is NaN");
    }
}
$ cargo run
[2] x is NaN

Since NaN == NaN is always false, we can’t use the == operator to check if a value is NaN. Instead, we can use the .is_nan() method on the f32 and f64 types to check if a value is NaN.

If the section below doesn’t make a lot of sense due to the unfamiliar types being used and concepts like traits still yet to be discussed, please revisit this section once you are more familiar with them.

fn main() {
    let mut iv: Vec<i32> = vec![0, 1, 2, -1];
    iv.sort();

    let mut fv: Vec<f64> = vec![0.0, 1.0, 2.0, -1.0];

    // Error: the trait bound `f64: Ord` is not satisfied
    // fv.sort(); // <-- uncomment
}

In the example above, sorting a Vector of floating-point numbers using the .sort() method is not supported because the .sort() method requires the type (of Vector) to implement the Ord trait, which in turn requires the Eq trait to be implemented. f64 and f32 do not implement these traits, which is why we encounter an error.
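
If a sorted vector of floats is still needed, one option in the standard library is to pass an explicit comparison to .sort_by(). The sketch below uses f64::total_cmp, which orders every f64 value (including NaN) according to the IEEE 754 total order. The helper name sort_floats is only for illustration.

```rust
// Hypothetical helper: sorts floats using the IEEE 754 total order.
fn sort_floats(mut values: Vec<f64>) -> Vec<f64> {
    // total_cmp defines an ordering for every f64, including NaN,
    // so it can be used where the Ord trait is unavailable.
    values.sort_by(|a, b| a.total_cmp(b));
    values
}

fn main() {
    let sorted = sort_floats(vec![0.0, 1.0, 2.0, -1.0]);
    println!("sorted: {:?}", sorted);
}
```

This keeps the plain f64 element type; where a NaN ends up in the result depends on its sign bit, so it is still worth deciding how NaN values should be handled before sorting.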

The PartialEq trait requires a type to implement the eq method (the ne method has a default implementation based on eq) so that values of that type can be checked for equality. For example, 0.1 == 0.2 is allowed since f64 (and f32 as well) implements the PartialEq trait. This trait requires the symmetric rule: a == b implies b == a, and the transitive rule: a == b and b == c implies a == c. However, f64 and f32 do not implement the Eq trait, which is a marker trait that additionally guarantees the reflexivity rule: a == a. Since NaN != NaN, this breaks reflexivity, which is why f64 and f32 do not implement the Eq trait.

A marker trait in Rust is a trait that does not have any methods or associated functions. Its purpose is to provide additional information or metadata about a type, usually for compile-time checks or optimizations, rather than to define behavior.

use std::collections::HashMap;

fn main() {
    let mut hm: HashMap<f64, i32> = HashMap::new();

    hm.insert(1.0, 2);
}
$ cargo run
the method `insert` exists for struct `HashMap<f64, i32>`, but its trait bounds were not satisfied
 --> src/main.rs:6:8
  |
6 |     hm.insert(1.0, 2);
  |        ^^^^^^
  |
  = note: the following trait bounds were not satisfied:
          `f64: Eq`
          `f64: Hash`

Similarly, Rust won’t allow us to insert values into a HashMap or HashSet where the keys are f64 or f32. This happens because keys of these data types are hashed for efficient storage and retrieval of values. At times, different keys may produce the same hash (called a hash collision), which is why Rust also compares the actual key values in the collection to check if a key exists. Since NaN can’t be compared to itself, when HashMap or HashSet checks if a NaN value already exists in the collection, it will always return false because NaN == NaN is always false.

We can use a wrapper around f64 or f32 that guarantees total ordering and equality, such as using ordered_float. This crate provides a wrapper type for floats that implements Eq, Ord, and Hash, allowing us to use them in collections like HashSet and HashMap.
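
Since ordered_float is an external crate, here is a standard-library-only sketch of a different workaround: keying the map by the float's raw bit pattern via f64::to_bits, because u64 implements both Eq and Hash. This is not how ordered_float works internally; it is a simpler substitute, and the helper name count_by_bits is only for illustration.

```rust
use std::collections::HashMap;

// Hypothetical helper: counts occurrences of each float, keyed by the
// float's raw bit pattern (u64 implements Eq and Hash).
fn count_by_bits(values: &[f64]) -> HashMap<u64, usize> {
    let mut counts = HashMap::new();
    for v in values {
        *counts.entry(v.to_bits()).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let counts = count_by_bits(&[1.0, 2.5, 1.0]);
    for (bits, n) in &counts {
        // f64::from_bits turns the key back into a float for display.
        println!("{} appears {} time(s)", f64::from_bits(*bits), n);
    }
}
```

One caveat of this approach is that keys are compared bit by bit, so 0.0 and -0.0 become two different keys even though 0.0 == -0.0 is true.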

Non-decimal Number Formats

In addition to the base-10 (decimal) number system, Rust also supports expressing numbers in binary, octal, and hexadecimal formats. A binary number is expressed with the prefix 0b, an octal number is expressed with the prefix 0o, and a hexadecimal number is expressed with the prefix 0x. The prefix only affects how the literal is written in source code: the value is stored the same way as any other integer and is printed in decimal (base-10) by default.

fn main() {
    let binary_val: i8 = 0b11;
    println!("binary_val: {}", binary_val);

    let octal_val: i32 = 0o11;
    println!("octal_val: {}", octal_val);

    let hex_val: i32 = 0xFF;
    println!("hex_val: {}", hex_val);
}

// $ cargo run
// binary_val: 3
// octal_val: 9
// hex_val: 255
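
Going the other way, a value can be printed in a non-decimal base with the {:b}, {:o}, and {:x} format specifiers. The sketch below (the helper name render is only for illustration) shows that the three literal notations produce the same stored value and then prints it in each base.

```rust
// Hypothetical helper: renders one value in decimal, binary, octal, and hex.
fn render(n: u32) -> String {
    format!("{} {:b} {:o} {:x}", n, n, n, n)
}

fn main() {
    // The same value written in three different notations.
    assert_eq!(0b1111_1111, 255);
    assert_eq!(0o377, 255);
    assert_eq!(0xFF, 255);

    println!("{}", render(255)); // 255 11111111 377 ff
}
```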

Digit Separator

For visual convenience, you can use _ (underscore) to separate digits in a number.

fn main() {
    let a = 12_233_000;
    let b: i64 = -9_223_372_036_854_775_808;
    let c = 3.40_282_347;
    let d: i8 = -0b_01_11_11_11;
    let e = 0xFF_00;
}

Type Suffix

Besides annotating the variable with a type, we can let Rust infer a variable's type from its initial value. For example, a variable initialized with an integer literal receives the i32 type by default. To choose a different type at the point of initialization, append the data type to the literal as a suffix.

fn main() {
    let a = 12_233_000u64;
    let c = 3.14_f32; // optional `_`
    let d = -0b_01_11_11_11_i8;
    let e = 0xFF_00_u32;
}

Overflows

If the result of an operation is larger than the maximum value a type can hold, an overflow occurs. The same happens if the result is below the type's minimum (informally called an underflow). For example, the maximum value that i8 can store is 127, and the minimum value it can store is -128. If a computation produces 128, an overflow occurs, and what happens next depends on the build profile: the program either panics or the value wraps around.

fn get_one() -> i8 {
    return 1;
}

fn main() {
    let a: i8 = 127 + get_one(); // 128
    let b: i8 = -128 - get_one(); // -129

    println!("a: {}", a);
    println!("b: {}", b);
}

In the example above, the values stored in a and b are greater than or less than what they can accommodate. In debug mode (during development with the debug profile), the program panics, as shown below.

$ cargo run
   Compiling hello_world v0.1.0 (/Users/thatisuday/rust/hello_world)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.20s
     Running `target/debug/hello_world`
thread 'main' panicked at src/main.rs:6:17:
attempt to add with overflow
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

In release mode (when we use the --release flag), the actual value stored will depend on the extent of the overflow. For example, 128 is 1 more than what i8 can accommodate, which is 127. So, wrapping will occur, and the value will be -128. If the overflow is 2, the final value would be -127. The same goes for -129. Since the minimum value that can be stored in i8 is -128, wrapping will occur, and the final value would be 127.

$ cargo run --release
a: -128
b: 127

Rust also provides methods that make the overflow behavior explicit:

  • You can use methods like wrapping_add, wrapping_sub, wrapping_mul, etc., (such as a.wrapping_add(1)) to always wrap around on overflow without causing a panic, whether you’re in debug or release mode.
  • Methods like checked_add, checked_sub, etc., return None if an overflow occurs, otherwise they return Some(result), allowing you to safely handle potential overflow.
  • Methods like saturating_add, saturating_sub, etc., clamp the result at the minimum or maximum value for the type when an overflow occurs, preventing wrapping.
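
The three families of methods can be compared side by side. In this sketch, the helper add_three_ways is a hypothetical name that returns the wrapping, checked, and saturating results for the same pair of i8 operands.

```rust
// Hypothetical helper: adds two i8 values using each overflow strategy.
// Returns (wrapping result, checked result, saturating result).
fn add_three_ways(a: i8, b: i8) -> (i8, Option<i8>, i8) {
    (a.wrapping_add(b), a.checked_add(b), a.saturating_add(b))
}

fn main() {
    // 127 + 1 overflows i8: wraps to -128, checked gives None, saturates at 127.
    println!("{:?}", add_three_ways(127, 1));

    // 100 + 27 fits in i8, so all three strategies agree on 127.
    println!("{:?}", add_three_ways(100, 27));

    // -128 + -1 overflows the other way: wraps to 127, saturates at -128.
    println!("{:?}", add_three_ways(-128, -1));
}
```

These methods behave the same in debug and release builds, which makes them a better fit than relying on the profile-dependent behavior of the + operator.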

Type casting

Type casting is a way to convert the value of one data type into another data type.

fn main() {
    let a: i32 = -10;
    let b: u32 = 5;

    let sum = a + b;
    println!("Sum: {}", sum);
}
$ cargo run
   Compiling hello_world v0.1.0 (/Users/thatisuday/rust/hello_world)
error[E0308]: mismatched types
 --> src/main.rs:5:19
  |
5 |     let sum = a + b;
  |                   ^ expected `i32`, found `u32`

In the example above, we have a variable a of type i32 and a variable b of type u32. Since these are two different data types, it is not legal to perform an arithmetic operation such as addition (+) in Rust. To do this, we need to ensure that both variables have the same data type. What we can do is cast b, which has an original type of u32, to i32 so that its type matches that of a.

fn main() {
    let a: i32 = -10;
    let b: u32 = 5;

    let sum = a + (b as i32);
    println!("Sum: {}", sum);
}
$ cargo run
Sum: -5

In the example above, we explicitly cast b to i32 so that an addition operation can be performed with another value of type i32. To cast a value from one type to another, we use the as keyword followed by the type we want the value to be cast into.

When it comes to casting, there are certain limitations:

  • Casting must be done explicitly. For example, you can’t write let c: u32 = a when a is an i32, because Rust never performs the cast automatically. Instead, you would write let c: u32 = a as u32.
  • When casting from a larger type (u32) to a smaller type (u8), the value is truncated to its low-order bits, so 300_u32 as u8 yields 44.
  • When casting from a signed integer (i32) to an unsigned integer (u32), the bit pattern is reinterpreted, so -1_i32 as u32 yields 4,294,967,295.
  • When casting from floating-point numbers (f64) to integers, the fractional part is truncated. Values outside the target type's range saturate at its minimum or maximum, and NaN becomes 0.
  • When casting from f64 to f32, precision loss may occur.
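
The snippet below is a small sketch of these cases. It also shows u8::try_from as a checked alternative to as; that alternative is not covered in this lesson, and the helper names to_u8_lossy and to_u8_checked are only for illustration.

```rust
use std::convert::TryFrom; // already in the prelude on the 2021 edition

// `as` silently keeps only the low 8 bits.
fn to_u8_lossy(n: u32) -> u8 {
    n as u8
}

// TryFrom reports values that don't fit instead of truncating them.
fn to_u8_checked(n: u32) -> Option<u8> {
    u8::try_from(n).ok()
}

fn main() {
    println!("300 as u8 (lossy):   {}", to_u8_lossy(300));
    println!("300 as u8 (checked): {:?}", to_u8_checked(300));

    // Signed to unsigned: the bits are reinterpreted.
    println!("-1_i32 as u32: {}", -1_i32 as u32);

    // Float to integer: truncates toward zero, saturates, NaN becomes 0.
    println!("-3.99 as i32: {}", -3.99_f64 as i32);
    println!("1e10 as i32:  {}", 1e10_f64 as i32);
    println!("NaN as i32:   {}", f64::NAN as i32);
}
```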

Parsing from string

You can parse a string into a number using the .parse() method on the string. However, this operation may lead to an error, which is why the .parse() method returns a Result enum that contains either an Ok or an Err value. We can use the .unwrap() method to extract the Ok value while ignoring the error value (although this isn’t an appropriate way to handle errors, and we will discuss it in the error handling lesson). We also need to specify the type for the variable in which we are storing the value since Rust can’t automatically determine the data type of the number stored in the string.

fn main() {
    let int_text = "-300";
    let fpn_text = "3.14";

    let int_val: i32 = int_text.parse().unwrap();
    let fpn_val: f32 = fpn_text.parse().unwrap();

    println!("int_val: {}", int_val);
    println!("fpn_val: {}", fpn_val);
}
$ cargo run
int_val: -300
fpn_val: 3.14
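
For inputs that might not be valid numbers, the Result can be inspected instead of unwrapped. The following is a minimal sketch using a match expression; the helper name parse_int is hypothetical, and the turbofish form parse::<i32>() is simply another way to name the target type.

```rust
use std::num::ParseIntError;

// Hypothetical helper: parses text as an i32 and reports failure as Err.
fn parse_int(text: &str) -> Result<i32, ParseIntError> {
    // The turbofish `::<i32>` names the target type on the call itself.
    text.parse::<i32>()
}

fn main() {
    for text in ["-300", "3.14", "abc"] {
        match parse_int(text) {
            Ok(n) => println!("{:?} parsed as {}", text, n),
            Err(e) => println!("{:?} failed to parse: {}", text, e),
        }
    }
}
```

Only the first input is a valid i32; the other two take the Err branch instead of panicking the way .unwrap() would.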

Boolean

Rust has the bool data type to handle boolean values; its two possible values are true and false.

fn main() {
    let a: bool = true;
    let b = false; // b gets type `bool`

    println!("a: {} and b: {}", a, b);
}
$ cargo run
a: true and b: false

We can also parse a bool value from a string.

fn main() {
    let text_true = "true";
    let text_false = "false";

    let bool_true: bool = text_true.parse().unwrap();
    let bool_false: bool = text_false.parse().unwrap();

    println!("bool_true: {}", bool_true);
    println!("bool_false: {}", bool_false);
}
bool_true: true
bool_false: false

Character

A text character, such as A, a, or 0, can be stored as a char. A char represents a single Unicode scalar value and always takes 4 bytes, which is enough to hold any Unicode character. (Inside a UTF-8 encoded string, the same character takes between 1 and 4 bytes.) Just as a string literal is declared within "" (double quotes), a character is declared within '' (single quotes).

fn main() {
    let c: char = 'A';

    println!("Character c: {}", c);
}
$ cargo run
Character c: A

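If you have the Unicode code point (integer value) of a character, you can cast it to a char. Only u8 values can be cast this way with as; for larger code points, use char::from_u32, which returns an Option<char> because not every integer is a valid character.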

fn main() {
    let c: char = 65 as char;

    println!("Character c: {}", c);
}
$ cargo run
Character c: A
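
For code points that don't fit in a u8, the standard library offers char::from_u32. The sketch below wraps it in a hypothetical helper, code_point_to_char, to show both the valid and the invalid case.

```rust
// Hypothetical helper: converts a code point to a char if it is valid.
fn code_point_to_char(code_point: u32) -> Option<char> {
    char::from_u32(code_point)
}

fn main() {
    println!("{:?}", code_point_to_char(65));
    println!("{:?}", code_point_to_char(0x1F600));

    // 0xD800 is a surrogate, which is not a valid char, so this is None.
    println!("{:?}", code_point_to_char(0xD800));

    // A u8 always maps to a valid char, so this conversion can't fail.
    println!("{}", char::from(66_u8));
}
```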

Similarly, you can convert a character into its Unicode code point.

fn main() {
    let c = 'A' as u32;

    println!("Unicode value: {}", c);
}
$ cargo run
Unicode value: 65

Since, under the hood, a character is a number, they can be compared with each other.

fn main() {
    let a = 'A'; // 65
    let b = 'B'; // 66

    if a < b {
        println!("Character b is greater than a");
    }
}
$ cargo run
Character b is greater than a

The char type also provides some convenient methods, such as the following.

fn main() {
    let c = 'A';

    println!("Is Alphabetic?: {}", c.is_alphabetic());
    println!("Is Numeric?: {}", c.is_numeric());
    println!("Is Alphanumeric?: {}", c.is_alphanumeric());
    println!("Is Uppercase?: {}", c.is_uppercase());
}
Is Alphabetic?: true
Is Numeric?: false
Is Alphanumeric?: true
Is Uppercase?: true

Since a string is essentially a sequence of characters, we can iterate over its characters using the .chars() method. This returns an iterator that we can loop over using a for loop.

fn main() {
    let str = "AB😀CD";

    for c in str.chars() {
        println!("{}", c);
    }
}
$ cargo run
A
B
😀
C
D

Beware that the .len() method on a string returns the size of the string in bytes, not the number of characters. Since a string in Rust is UTF-8 encoded, a character may take more than 1 byte. Therefore, a string with N characters may have more than N bytes. If you would like to count the number of characters in a string, use .chars().count().

fn main() {
    let str = "AB😀CD";

    println!("Number of bytes: {}", str.len());
    println!("Number of characters: {}", str.chars().count());
}
$ cargo run
Number of bytes: 8
Number of characters: 5
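
To see where the 8 bytes come from, each character's encoded size can be inspected with char::len_utf8. This is a small sketch; the helper name utf8_sizes is only for illustration.

```rust
// Hypothetical helper: pairs each character with its UTF-8 byte length.
fn utf8_sizes(text: &str) -> Vec<(char, usize)> {
    text.chars().map(|c| (c, c.len_utf8())).collect()
}

fn main() {
    for (c, n) in utf8_sizes("AB😀CD") {
        println!("{} takes {} byte(s)", c, n);
    }
}
```

The four ASCII letters take 1 byte each and the emoji takes 4, which adds up to the 8 bytes reported by .len().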

Compound Data Types

A compound data type is composed of one or more scalar data types. If you are familiar with other programming languages, you may be acquainted with the concept of a class (or an object). A class is a collection of one or more fields of the same or different data types, so a class is a compound data type. They are also called complex or non-primitive data types.

Although String in Rust looks like a primitive data type, under the hood it’s a struct that holds a pointer to a region in memory on the heap where the string data is stored, along with length and capacity fields. It also behaves differently from &str, which is why it deserves its own lesson, and we will discuss it in greater depth in the next lesson.

Array

An array is a fixed-length collection of values of the same data type. Since it is a fixed-length data structure, its length must be known at compile time, and all elements in the array must be initialized. Arrays declared as local variables are stored on the stack, which avoids heap allocation and makes them fast to create and access.

fn main() {
    let a: [u32; 5] = [1, 2, 3, 4, 5];

    println!("a: {:?}", a);
}
$ cargo run
a: [1, 2, 3, 4, 5]

In the example above, we defined an array a of length 5, which will contain elements of type u32. The type notation for array declaration is [type; size]. We need to use the {:?} debug formatter when printing an array to a string since it doesn’t implement the Display trait, which is used by the {} formatting string. The {:?} formatter is used to print developer-facing output (for types that implement the Debug trait), while {} is meant for user-facing output.

Do not worry about what a trait is; we will discuss it in a later lesson.

We can use the standard for loop to iterate through an array.

fn main() {
    let a = [1, 2, 3, 4, 5];

    for i in a {
        println!("Element: {}", i);
    }
}
Element: 1
Element: 2
Element: 3
Element: 4
Element: 5

To initialize an array with every element set to the same value, use the [value; length] syntax. A type suffix on the value tells Rust which element type to use:

fn main() {
    let a = [0u32; 5];

    println!("a: {:?}", a);
    println!("Length of a: {}", a.len());
}
$ cargo run
a: [0, 0, 0, 0, 0]
Length of a: 5

In the program above, the expression [0u32; 5] (which can also be written as [0_u32; 5]) returns an array of size 5 with all elements being 0 of type u32. If you want to get the length of an array (the number of elements in the array), you can use the .len() method, which returns a usize.

fn main() {
    let mut a = [0_u32; 5];

    println!("[before] a: {:?}", a);
    println!("[before] a[1]: {}", a[1]);

    a[1] = 1;
    println!("[after] a: {:?}", a);
    println!("[after] a[1]: {}", a[1]);
}
$ cargo run
[before] a: [0, 0, 0, 0, 0]
[before] a[1]: 0
[after] a: [0, 1, 0, 0, 0]
[after] a[1]: 1

We can access an element in an array using the array[index] expression, where index is the index of the element in the array we want to access. The first element in the array has index 0; hence, a[1] returns the second element. If we want to update the array element at index index, we can do so with an assignment statement like array[index] = new_value. Just make sure that your array is mutable. Hence, we have used the mut keyword above.

However, the array[index] expression can cause the program to panic at runtime. If index is greater than the largest valid index (array.len() - 1), the access would read memory outside the array. Rust checks every index against the array's length and panics rather than allowing such an out-of-bounds read. When the index is a constant the compiler can see, such as a[5] on a 5-element array, the program is rejected at compile time instead, which is why the example below obtains the index from a function.

fn get_index() -> usize {
    5
}

fn main() {
    let a = [0_u32; 5];
    let i = get_index(); // index known only at runtime

    println!("a[5]: {}", a[i]); // a[5] doesn't exist
}
$ cargo run
thread 'main' panicked at src/main.rs:9:26:
index out of bounds: the len is 5 but the index is 5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Trying to read elements out of bounds leads to a program panic. However, instead of crashing the program, there is a safe way to read a value at an index.

fn main() {
    let a: [u32; 5] = [0_u32; 5];

    let a_at_5: Option<&u32> = a.get(5);

    match a_at_5 {
        Some(value) => println!("a[5]: {}", value),
        None => println!("a[5] doesn't exist."),
    }
}

In the program above, instead of using the a[5] expression, we used the .get() method, which accepts an index and returns an Option enum: Some containing a reference to the element if it exists, or None otherwise. Using a match statement, we can check whether the value is Some or None and take action without crashing the program at runtime.

We will discuss the Option enum, enums in general, and the match statement in upcoming lessons.

$ cargo run
a[5] doesn't exist.

Arrays are always fixed in size and can’t grow or shrink. If we want a growable array, we need to use Vec (Vector), which we will discuss shortly.

Here are some useful methods on array:

  • len: Returns the number of elements in the array.
  • is_empty: Returns true if the array contains no elements (only possible for an array of length 0).
  • first: Returns a reference to the first element of the array, or None if empty.
  • last: Returns a reference to the last element of the array, or None if empty.
  • get: Returns a reference to an element at a given index, or None if out of bounds.
  • get_mut: Returns a mutable reference to an element at a given index, or None if out of bounds.
  • fill: Fills the array with a given value.
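
A few of these methods are sketched below. The helper set_at is a hypothetical name; it uses get_mut to update an element only when the index exists, so it never panics.

```rust
// Hypothetical helper: writes value at index if that index exists.
// Returns false instead of panicking when the index is out of bounds.
fn set_at(data: &mut [i32; 5], index: usize, value: i32) -> bool {
    match data.get_mut(index) {
        Some(slot) => {
            *slot = value;
            true
        }
        None => false,
    }
}

fn main() {
    let mut a = [1, 2, 3, 4, 5];
    println!("first: {:?}, last: {:?}", a.first(), a.last());

    println!("set a[1]: {}", set_at(&mut a, 1, 20));
    println!("set a[9]: {}", set_at(&mut a, 9, 90));
    println!("a: {:?}", a);

    a.fill(0);
    println!("after fill: {:?}", a);

    // Only a zero-length array is empty.
    let empty: [i32; 0] = [];
    println!("empty.is_empty(): {}", empty.is_empty());
}
```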

Tuple

A tuple is like an array in that its size is fixed, but it can contain elements of different data types. Tuples declared as local variables are also stored on the stack.

fn main() {
    let mut t: (i32, f64, char, bool) = (1, 3.14, 'A', true);

    println!("t.0: {}", t.0);
    println!("t.1: {}", t.1);
    println!("t.2: {}", t.2);
    println!("t.3: {}", t.3);

    t.3 = false;
    println!("[after] t.3: {}", t.3);
}

We define a tuple using parentheses by laying out elements separated by commas. Similarly, we can also specify the types of the tuple within parentheses, separated by commas. However, in most cases, it’s best to rely on Rust’s type inference. Like arrays, elements in a tuple are accessed by their index. However, they are accessed using the .<index> notation, where the index must be a literal number. Because the compiler knows the index and the tuple's size, accessing an element out of bounds is a compile-time error rather than a runtime panic.

Since tuples can contain different data types, they can also contain other tuples (also called nested tuples).

fn main() {
    let t: (i32, f64, (char, bool)) = (1, 3.14, ('A', true));

    println!("t.2.1: {}", t.2.1);
}

In the program above, the third element (index 2) in the tuple t is another tuple, whose second element (index 1) is a boolean.

$ cargo run
t.2.1: true

In Rust, a tuple without any elements (an empty tuple) is valid, and its type is called the unit type. It is somewhat special because a function that doesn’t return anything returns (). You can also return it explicitly.

fn my_function() {
    // empty
}

fn main() {
    let result: () = my_function();
}

In the program above, the function my_function returns nothing, but Rust implicitly returns (), which is why it can be stored in a variable result of type ().

fn main() {
    let t: (i32,) = (1,);

    println!("t.0: {}", t.0);
}
$ cargo run
t.0: 1

In Rust, if you need to have a tuple that contains only one value, you must put a comma , after the type in the tuple type signature, as well as after the value in parentheses (), as shown above. Without the comma, Rust will treat (i32) the same as i32 and (1) the same as 1.

Vector

As we learned, an array is a fixed-size collection of homogeneous elements. Since it is fixed in size and its size is known at compile time, it is stored on the stack, allowing for faster and more efficient execution of the program. However, if we want a growable collection, we need to use a Vector. A Vector is stored on the heap, which makes it a bit more expensive in terms of performance, but it does not have the size limitations of an array.

fn main() {
    let v: Vec<i32> = Vec::new();

    println!("v: {:?}", v);
}

// v: []

Well, the program above looks a little complicated to understand, but let’s break it down. Vec is a type in Rust that represents a vector, much like i32 or String. It has an associated function Vec::new(), which, when called, returns a new vector.

A Vector is internally represented by the following 3 things:

  • A pointer to the region of memory where the data in the vector is stored on the heap. This is a pointer to the first element in the vector collection.
  • A length, which indicates how many elements are currently in the collection.
  • A capacity, which indicates the amount of space reserved (in number of elements) for the data in the vector. Rust generally allocates more memory for the vector than the number of elements currently in the vector, so when we want to add more items, it doesn’t need to allocate memory every time. This makes these operations efficient. When the length would exceed the capacity, the vector reallocates and its capacity grows.

The Vec::new() expression in our program initializes a vector with a length and capacity of 0, since it doesn’t have any elements yet. The Vec type is generic (we will discuss Generics in upcoming lessons) and needs to be provided with a type that represents the type of elements we want to store in the vector.

fn main() {
    let mut v: Vec<i32> = Vec::new();

    v.push(1);
    v.push(2);
    v.push(3);
    v.push(4);
    v.push(5);

    println!("v: {:?}", v);
}

// v: [1, 2, 3, 4, 5]

We use the .push() method on a vector to add elements to it. However, in order to do that, we need to make the vector v mutable since we are modifying its value. Additionally, there is a way to initialize a vector with initial elements using the vec! macro provided by Rust’s standard library (std) in the prelude. Like arrays, vectors can also be iterated over using the for loop.

fn main() {
    let v = vec![1, 2, 3, 4, 5];

    for i in v {
        println!("Element: {}", i);
    }
}
$ cargo run
Element: 1
Element: 2
Element: 3
Element: 4
Element: 5

In the example above, we provided the initial comma-separated elements of the vector v inside [] using the vec! macro. This way, if we don’t want to modify v in the future, we don’t have to make it mutable. We can also avoid giving the vector an explicit type, as its type can be inferred by Rust from the initial elements.
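
The length and capacity fields described earlier can be observed with the .len() and .capacity() methods. The sketch below records both after every push; the helper name growth_log is only for illustration, and the exact capacities printed are an implementation detail that may differ between Rust versions.

```rust
// Hypothetical helper: pushes 0..n and records (len, capacity) after each push.
fn growth_log(n: i32) -> Vec<(usize, usize)> {
    let mut v: Vec<i32> = Vec::new();

    // Vec::new() does not allocate, so both values start at 0.
    let mut log = vec![(v.len(), v.capacity())];

    for i in 0..n {
        v.push(i);
        log.push((v.len(), v.capacity()));
    }
    log
}

fn main() {
    for (len, cap) in growth_log(5) {
        println!("len: {}, capacity: {}", len, cap);
    }
}
```

If the final size is known up front, Vec::with_capacity(n) reserves space for at least n elements so that pushes up to that point don't reallocate.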

Here are some useful methods on vectors:

  • push: Adds an element to the end of the vector.
  • pop: Removes and returns the last element of the vector, or None if it’s empty.
  • insert: Inserts an element at a specific index, shifting subsequent elements to the right.
  • remove: Removes and returns the element at the specified index, shifting subsequent elements to the left.
  • len: Returns the number of elements in the vector.
  • is_empty: Returns true if the vector contains no elements.
  • capacity: Returns the number of elements the vector can hold before reallocating.
  • clear: Removes all elements from the vector.
  • sort: Sorts the elements of the vector in place using the default comparison.
  • reverse: Reverses the order of elements in the vector in place.
  • contains: Returns true if the vector contains the specified value.
  • append: Moves all elements from another vector into the current one, leaving the other vector empty.
  • extend: Extends the vector by appending elements from an iterable.
  • clone: Returns a copy of the vector and its elements.
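
Here is a short walk-through that chains several of these methods, with the contents of the vector noted after each step. The function name run_steps is hypothetical.

```rust
// Hypothetical walk-through: returns the vector after all steps.
fn run_steps() -> Vec<i32> {
    let mut v = vec![3, 1, 2];

    v.push(5); // [3, 1, 2, 5]
    v.insert(1, 4); // [3, 4, 1, 2, 5]
    assert_eq!(v.remove(0), 3); // [4, 1, 2, 5]
    assert_eq!(v.pop(), Some(5)); // [4, 1, 2]
    assert!(v.contains(&4));

    v.sort(); // [1, 2, 4]
    v.reverse(); // [4, 2, 1]

    let mut other = vec![9, 8];
    v.append(&mut other); // v: [4, 2, 1, 9, 8]
    assert!(other.is_empty()); // append moved the elements out of `other`

    v.extend([7, 6]); // [4, 2, 1, 9, 8, 7, 6]
    v
}

fn main() {
    let v = run_steps();
    println!("v: {:?} (len {})", v, v.len());
}
```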

Slices

A slice is a portion of a contiguous sequence of elements within a collection. A collection could be an array, a vector, or even a string. In fact, a string literal we saw before (&str type) is a reference to a slice str of string data stored somewhere in memory (we will learn about this in the next lesson). Instead of owning the data, a variable can simply reference the data owned by another variable, allowing for safe and efficient memory management.

fn main() {
    let mut a: [i32; 5] = [1, 2, 3, 4, 5];
    let s: [i32; 3] = [a[0], a[1], a[2]];

    a[0] = 100;

    println!("a: {:?}", a);
    println!("s: {:?}", s);
}
$ cargo run
a: [100, 2, 3, 4, 5]
s: [1, 2, 3]

In the example above, the array a is a collection of i32 values with 5 elements. Our use case states that we need only the first 3 elements of the array a as a collection. We could simply create a new array s and copy the first 3 values of a. However, this approach has two downsides: first, it duplicates the data, and second, if a changes, s won’t reflect those changes.

This is where a slice comes into the picture. Since a slice is just a portion of an existing collection, it doesn’t own the data, and any changes made to the original collection will be reflected in the slice. So, how do we define a slice?

A slice has the type [T], where T is the type of element in the collection. However, a value of the bare slice type [T] can’t be stored directly in a variable because it’s a Dynamically Sized Type (DST), meaning its size is not known at compile time. This differs from an array like [i32; 5], whose size is known (5 elements of 4 bytes each), or a u32 (4 bytes). Instead, we use a reference to a slice, whose size is known at compile time. A reference is like a pointer, but Rust enforces strict ownership, borrowing, and safety rules on it. A reference to a slice contains a pointer to the first element in the slice (the size of a usize) and the length of the slice (also stored as a usize). Therefore, the size of a reference to a slice is 2 × the size of a usize.

In Rust, when we talk about a slice, it is generally assumed that we are referring to &[T], the reference to a slice, rather than [T], the bare slice type. This is because slices ([T]) are Dynamically Sized Types (DSTs), and you cannot use them directly without some form of indirection, such as a reference (&[T]) or a smart pointer like Box<[T]>.

To create a slice, we place & before the variable (in the example above, a) and specify the range of indices we want the slice to reference within [] (square brackets). Here, the & symbol is called the borrow operator, and it is used to create a reference to a value without taking ownership.

fn main() {
    let a: [i32; 5] = [1, 2, 3, 4, 5];
    let s: &[i32] = &a[0..3];

    println!("s: {:?}", s);
}

// s: [1, 2, 3]

In the example above, the variable s is a slice with the type &[i32], which means it contains a reference to a collection of i32 elements. Its initial value is a reference to the array a, from element index 0 to element index 2. The .. is called the range operator in Rust, and it creates a sequence from 0 up to, but not including, 3. If you want 3 to be included, you can use the 0..=3 expression.

A slice, under the hood, is just a pointer to an element in the collection and a length specifying how many elements to reference from this starting point. So in our example above, the starting pointer for the slice s is the element at index 0 in the array a, and its length is 3. You can think of a slice as a tuple, with the first element being a pointer and the second element being the length. Therefore, it is also called a fat pointer.
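
To make the (pointer, length) picture a little more concrete, here is a small sketch that pulls both parts out of a slice using the standard as_ptr() and len() methods. The function name parts is an assumption made for this example.

```rust
/// Splits a slice reference into its two parts: the address of its first
/// element and the number of elements it covers.
/// Hypothetical helper, used only for this illustration.
fn parts(s: &[i32]) -> (*const i32, usize) {
    (s.as_ptr(), s.len())
}

fn main() {
    let a: [i32; 5] = [1, 2, 3, 4, 5];
    let s: &[i32] = &a[1..4];

    let (ptr, len) = parts(s);

    // The slice starts at the address of a[1] and spans 3 elements.
    println!("starts at a[1]? {}", ptr == &a[1] as *const i32);
    println!("length: {}", len);
}
```

No data is copied here: the slice only records where it starts inside a and how far it extends.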

slice (&[T])  ->  (pointer, length)

We are allowed to mutate (update) the original source data via a slice as long as the source data is mutable, hence let mut a instead of just let a, and the reference to this data is also mutable, which is why we use &mut a instead of just &a. If we don’t have a mutable reference, then even though a is mutable, we can’t update it through the slice s.

fn main() {
    let mut a: [i32; 5] = [1, 2, 3, 4, 5];
    let s: &mut [i32] = &mut a[1..3];

    s[0] = 100;

    println!("s: {:?}", s);
    println!("a: {:?}", a);
}

In the example above, the slice s references a from index 1 up to, but not including, index 3. Therefore, its length is only 2. The first element of the slice s is the second element of the array a. We made the array a mutable so that its values can be updated. If we want to update a through the slice s, then we also need to specify &mut when referencing the slice portion. Here, we don’t need s itself to be mutable since we are not interested in changing the slice s itself, just the values it references.

$ cargo run
s: [100, 3]
a: [1, 100, 3, 4, 5]

fn main() {
    let a: [i32; 5] = [1, 2, 3, 4, 5];
    let s: &[i32] = &a[1..=3]; // a[1,2,3]
    let ss: &[i32] = &s[1..];  // a[2,3]

    println!("a: {:?}", a);
    println!("s: {:?}", s);
    println!("ss: {:?}", ss);
}

A slice can reference another slice. In the example above, the slice ss references only a portion of the slice s, which in turn references the array a. This is perfectly valid.

$ cargo run
a: [1, 2, 3, 4, 5]
s: [2, 3, 4]
ss: [3, 4]

The range operator has several convenient variations:

  • a[0..3]: References the collection a from index 0 up to, but excluding, index 3. This is called a half-open range.
  • a[0..=3]: References the collection a from index 0 up to and including index 3. This is an inclusive range.
  • a[1..]: References the collection a starting from index 1 until the end of the collection.
  • a[..3]: References the collection a from the beginning (index 0) up to, but excluding, index 3.
  • a[..=3]: References the collection a from the beginning up to and including index 3.
  • a[..]: References the entire collection a. This is equivalent to referencing all elements in a with no explicit start or end index.
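
The range forms listed above can be compared side by side with a short sketch. The helper range_forms is a made-up name for this example, and it assumes the input has at least 4 elements, since indexing past the end of a slice panics.

```rust
/// Applies every range form from the list to the same slice.
/// Hypothetical helper; panics if `a` has fewer than 4 elements.
fn range_forms(a: &[i32]) -> [&[i32]; 6] {
    [&a[0..3], &a[0..=3], &a[1..], &a[..3], &a[..=3], &a[..]]
}

fn main() {
    let a: [i32; 5] = [1, 2, 3, 4, 5];
    let labels = ["a[0..3]", "a[0..=3]", "a[1..]", "a[..3]", "a[..=3]", "a[..]"];

    // Pair each label with the slice produced by that range form.
    for (label, slice) in labels.iter().zip(range_forms(&a)) {
        println!("{:<9} -> {:?}", label, slice);
    }
}
```
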
fn main() {
    let a: [i32; 5] = [1, 2, 3, 4, 5];
    let s: &[i32] = &a[..];

    for i in s {
        println!("Element: {}", i);
    }
}
$ cargo run
Element: 1
Element: 2
Element: 3
Element: 4
Element: 5

Like an array or a vector, a slice can also be iterated over using the for loop as shown above. Each element in the iteration (i) is &i32, a reference to the value stored in a, the source of the slice.

fn main() {
    let a: Vec<i32> = vec![1, 2, 3, 4, 5];
    let s: &[i32] = &a[..];

    for i in s {
        println!("Element: {}", i);
    }
}
$ cargo run
Element: 1
Element: 2
Element: 3
Element: 4
Element: 5

Like an array, a slice can also be created from a vector as shown above. However, unlike a slice of a local array, which typically refers to data stored on the stack, a slice of a vector references the data stored on the heap.
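
One practical consequence is that a function written against &[i32] can accept data from either source. The sketch below illustrates this with a hypothetical sum function; the name and the sample values are assumptions for this example.

```rust
/// Adds up the elements of any slice of i32 values.
/// Hypothetical helper, used only for this illustration.
fn sum(values: &[i32]) -> i32 {
    let mut total = 0;
    for v in values {
        // v is a &i32, so we dereference it to read the number.
        total += *v;
    }
    total
}

fn main() {
    let array: [i32; 5] = [1, 2, 3, 4, 5];
    let vector: Vec<i32> = vec![10, 20, 30];

    // The same function works for a slice of an array and a slice of a vector.
    println!("array total:  {}", sum(&array[..]));
    println!("vector total: {}", sum(&vector[..]));
    println!("partial:      {}", sum(&vector[1..]));
}
```
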

HashMap

A HashMap is a collection of key-value pairs. Rust internally computes a hash of the key and uses it to efficiently store the key-value pair in memory. Like a Vector, a HashMap is also stored on the heap. Unlike in some programming languages, where there are restrictions on the type of key, in Rust, keys can be of any type, as long as they are hashable.

Rust computes the hash of a key to efficiently store and retrieve values. Based on the hash value, Rust will store the new value in a specific bucket, and while retrieving, it will only access that bucket instead of scanning the whole collection. Hence, the key type must implement the Hash trait. However, hashes are not guaranteed to be unique, meaning two different keys may produce the same hash (called a hash collision). To handle this scenario, Rust compares the actual key values as a second step, which is why the key type must also implement the Eq trait.
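
As a small illustration of the key requirements, the sketch below uses a tuple of integers as the key, since tuples of hashable and comparable values are themselves hashable and comparable. The function label_at and the grid data are invented for this example.

```rust
use std::collections::HashMap;

/// Looks up the label stored for a grid coordinate, if any.
/// Hypothetical helper, used only for this illustration.
fn label_at(grid: &HashMap<(i32, i32), String>, x: i32, y: i32) -> Option<&String> {
    grid.get(&(x, y))
}

fn main() {
    // (i32, i32) implements both Hash and Eq, so it can be used as a key.
    let mut grid: HashMap<(i32, i32), String> = HashMap::new();
    grid.insert((0, 0), "origin".to_string());
    grid.insert((1, 0), "east".to_string());

    println!("{:?}", label_at(&grid, 0, 0));
    println!("{:?}", label_at(&grid, 5, 5));

    // By contrast, f64 implements neither Hash nor Eq, so calling insert
    // on a HashMap<f64, String> would not compile.
}
```
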

Rust provides the HashMap type to construct a HashMap. However, it’s not part of the prelude, so we need to import it from Rust’s standard library. Like a Vector, a HashMap has the HashMap::new() associated function to construct a new HashMap. It’s also a generic type, but this time we need to provide two types: one for the key and one for the value.

use std::collections::HashMap;

fn main() {
    let h: HashMap<i32, bool> = HashMap::new();

    println!("h: {:?}", h);
}

// h: {}

In the example above, we have initialized a HashMap h which contains keys of type i32 and values of type bool. However, this HashMap is currently empty, meaning there are no elements present in it. To add a key-value pair, we use the .insert() method, which inserts a new element into the HashMap. Since this modifies the HashMap, we also need to declare it as mutable.

use std::collections::HashMap;

fn main() {
    let mut h: HashMap<i32, bool> = HashMap::new();

    h.insert(0, true);
    h.insert(1, false);
    h.insert(2, true);

    println!("h: {:?}", h);
}

// h: {1: false, 2: true, 0: true}

If you would like to initialize a HashMap with some initial entries, you can use the HashMap::from() associated function, which takes an array of tuples where the first element of the tuple is the key and the second element is the value.

use std::collections::HashMap;

fn main() {
    let h: HashMap<i32, bool> = HashMap::from([
        (0, true),
        (1, false),
        (2, true)
    ]);

    println!("h: {:?}", h);
}

// h: {1: false, 2: true, 0: true}

use std::collections::HashMap;

fn main() {
    let h: HashMap<i32, bool> = HashMap::from([
        (0, true),
        (1, false),
        (2, true),
    ]);

    for i in h.iter() {
        println!("Element: {:?}", i);
    }
}

To iterate over a HashMap using the for loop, we can call the .iter() method on it, which returns an iterator. Each element in the iteration is a tuple containing references to the key and the value. So, in the example above, you can access the key and value with i.0 and i.1, respectively.

Element: (2, true)
Element: (0, true)
Element: (1, false)

As you may have noticed, the order of retrieval of items in the for loop is not the same as the insertion order in the HashMap. This is because a HashMap is an unordered collection, and the order of items depends on the internal state of the hash table. If you would like an ordered HashMap, where the retrieval of items is guaranteed to be in the order of their insertion, you can use the indexmap crate.
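
Instead of reading i.0 and i.1, the tuple can also be destructured directly in the for loop. The sketch below does that and sorts the collected keys, which is one simple way to get a stable result from an unordered HashMap. The function name keys_set_to_true is made up for this example.

```rust
use std::collections::HashMap;

/// Collects the keys whose value is `true`, sorted for a stable result.
/// Hypothetical helper, used only for this illustration.
fn keys_set_to_true(h: &HashMap<i32, bool>) -> Vec<i32> {
    let mut keys = Vec::new();

    // Each item is a (&i32, &bool) tuple, destructured here into two names.
    for (key, value) in h.iter() {
        if *value {
            keys.push(*key);
        }
    }

    // Iteration order is arbitrary, so sort before returning.
    keys.sort();
    keys
}

fn main() {
    let h: HashMap<i32, bool> = HashMap::from([(0, true), (1, false), (2, true)]);

    println!("keys set to true: {:?}", keys_set_to_true(&h));
}
```
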

To access a specific element by its key, we can use the .get(&key) method, which accepts a key (actually its reference) and returns an Option enum. If the key exists in the HashMap, it will return the Some variant containing a reference to the value in the HashMap, or None if the key doesn’t exist.

use std::collections::HashMap;

fn main() {
    let h: HashMap<i32, bool> = HashMap::from([
        (0, true),
        (1, false),
        (2, true),
    ]);

    let val_at_2 = h.get(&2); // exists
    let val_at_3 = h.get(&3); // doesn't exist

    match val_at_2 {
        Some(value) => println!("Value at 2: {}", value),
        None => println!("Key 2 doesn't exist."),
    }

    match val_at_3 {
        Some(value) => println!("Value at 3: {}", value),
        None => println!("Key 3 doesn't exist."),
    }
}
$ cargo run
Value at 2: true
Key 3 doesn't exist.

In the example above, h.get(&2) looks for the value associated with the key 2 in the HashMap h. The &2 provides a reference to the value 2, which is what the .get() method requires. Using the match statement, we can check if the Option returned by this method is either Some or None. If it’s Some, it will contain a reference to the value associated with this key. Therefore, value in Some(value) has the type &bool, instead of bool. For the key 2, it returns Some, but for 3, it returns None since no such key exists in the HashMap h.
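
When a fallback value is acceptable, the match can be shortened with methods that Option provides. The following is a minimal sketch, assuming that false is a sensible default for a missing key; the function name flag_or_false is hypothetical.

```rust
use std::collections::HashMap;

/// Returns the flag stored for `key`, or `false` when the key is missing.
/// Hypothetical helper, used only for this illustration.
fn flag_or_false(h: &HashMap<i32, bool>, key: i32) -> bool {
    // get() yields Option<&bool>; copied() turns that into Option<bool>,
    // and unwrap_or() supplies the fallback for the None case.
    h.get(&key).copied().unwrap_or(false)
}

fn main() {
    let h: HashMap<i32, bool> = HashMap::from([(0, true), (1, false), (2, true)]);

    println!("Value at 2: {}", flag_or_false(&h, 2));
    println!("Value at 3: {}", flag_or_false(&h, 3));
}
```

Note that this version can no longer tell a missing key apart from a key whose stored value is false, so the match form remains the better choice when that difference matters.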

Do not worry too much about what a reference is. We will cover this topic in detail later since it’s one of the most important topics in Rust.

When we want to update a value associated with a key, there are a few ways to do it. The easiest way is to use the .insert() method. Since we need to provide both a key and a value with this method, if the key exists, its value will be updated; otherwise, a new entry will be added. If we only want to add a new entry if the key doesn’t exist, we can use the .contains_key() method.

use std::collections::HashMap;

fn main() {
    let mut h: HashMap<i32, bool> = HashMap::from([
        (0, true),
        (1, false),
        (2, true),
    ]);

    // updates
    h.insert(0, false);

    // inserts
    h.insert(3, true);

    // inserts only if it doesn't exist
    if !h.contains_key(&2) {
        h.insert(2, false);
    }

    println!("h: {:?}", h);
}

// h: {0: false, 1: false, 2: true, 3: true}

In the example above, h.insert(0, false) updated the value of the key 0 since it already exists in the HashMap. However, h.insert(3, true) inserts a new item since the key 3 doesn’t exist. Later, using the if condition, we checked whether the key 2 exists and would insert a new item only if it doesn’t. Since the key 2 already exists, no insertion happens and its value stays true.
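
The .insert() method also reports which of the two cases occurred: it returns an Option holding the previous value, which is None when the key was new. The sketch below wraps this in a hypothetical upsert function to show the idea.

```rust
use std::collections::HashMap;

/// Inserts or updates an entry and reports which of the two happened.
/// Hypothetical helper, used only for this illustration.
fn upsert(h: &mut HashMap<i32, bool>, key: i32, value: bool) -> &'static str {
    // insert() returns Some(old_value) if the key was already present.
    match h.insert(key, value) {
        Some(_) => "updated",
        None => "inserted",
    }
}

fn main() {
    let mut h: HashMap<i32, bool> = HashMap::from([(0, true), (1, false)]);

    println!("key 0: {}", upsert(&mut h, 0, false));
    println!("key 3: {}", upsert(&mut h, 3, true));
}
```
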

use std::collections::HashMap;

fn main() {
    let mut h: HashMap<i32, bool> = HashMap::from([(0, true), (1, false), (2, true)]);

    let mut_val_at_2 = h.get_mut(&2);

    match mut_val_at_2 {
        Some(val_ref) => {
            *val_ref = false;
            println!("Value updated!");
        }
        None => println!("This key doesn't exist."),
    }

    println!("h: {:?}", h);
}
$ cargo run
Value updated!
h: {0: true, 1: false, 2: false}

We could also use the .get_mut(&key) method, which returns an Option with a mutable reference to the value associated with the key. By dereferencing the reference, we can update the value. In the example above, mut_val_at_2 will contain Some(&mut bool) since the key 2 exists, and using the match statement branch, we can update the value. We use the * (called the Dereference Operator) to convert the reference to the actual value and also to update it with a new value.

One of the most convenient ways to insert or update values is to use the entry API. The HashMap has an .entry(key) method, which takes the key by value (not by reference) and returns an Entry object that provides several useful methods.

use std::collections::HashMap;

fn main() {
    let mut h: HashMap<i32, bool> = HashMap::from([(0, true), (1, false), (2, true)]);

    h.entry(1).and_modify(|v| *v = true);
    h.entry(3).or_insert(false);
    h.entry(4).or_default();

    println!("h: {:?}", h);
}

// h: {0: true, 1: true, 2: true, 3: false, 4: false}

In the example above, we have used several Entry API methods as follows:

  • The and_modify method is used to modify a value in the HashMap if the key exists. It takes a closure (which is similar to a function, and we will discuss it in a later lesson). This closure receives a mutable reference to the value associated with the key (if it exists), which we can use to modify the value.
  • The or_insert method inserts a value into the HashMap if the key doesn’t exist.
  • Similar to or_insert, the or_default method inserts the default value for the type (in this case, the type is bool, and the default value is false).
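
A common use of these methods is counting occurrences. The sketch below relies on the fact that or_insert returns a mutable reference to the value, so the counter can be incremented in the same statement. The function name word_counts and the sample sentence are assumptions for this example.

```rust
use std::collections::HashMap;

/// Counts how often each word occurs in `text`.
/// Hypothetical helper, used only for this illustration.
fn word_counts(text: &str) -> HashMap<&str, u32> {
    let mut counts = HashMap::new();

    for word in text.split_whitespace() {
        // or_insert returns a mutable reference to the value, whether it
        // was already present or has just been inserted as 0.
        *counts.entry(word).or_insert(0) += 1;
    }

    counts
}

fn main() {
    let counts = word_counts("the cat saw the other cat");

    println!("the: {:?}", counts.get("the"));
    println!("saw: {:?}", counts.get("saw"));
    println!("dog: {:?}", counts.get("dog"));
}
```
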

There are other interesting methods for HashMap that you can explore.

HashSet

Like an array or vector, a HashSet stores elements of the same type, but no duplicate elements are allowed. Under the hood, it uses a HashMap to store these values efficiently, which is why the values we want to store must be hashable. Like HashMap, HashSet is not part of the prelude and must be imported from Rust’s standard library.

Since HashSet uses a HashMap behind the scenes, the HashSet values are stored as keys in the HashMap, while their associated values are ().
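
To get a feel for this idea, the sketch below imitates set insertion on top of a HashMap whose values are (). This is only an illustration of the concept, not the standard library's actual code, and the function name set_insert is made up.

```rust
use std::collections::HashMap;

/// Mimics a set insertion on top of a HashMap whose values are ().
/// Returns true if the value was newly added.
/// Hypothetical helper, used only for this illustration.
fn set_insert(set: &mut HashMap<i32, ()>, value: i32) -> bool {
    // HashMap::insert returns None when the key was not present before.
    set.insert(value, ()).is_none()
}

fn main() {
    let mut set: HashMap<i32, ()> = HashMap::new();

    println!("Inserted 1? {}", set_insert(&mut set, 1));
    println!("Inserted 2? {}", set_insert(&mut set, 2));
    println!("Inserted 2 again? {}", set_insert(&mut set, 2));
    println!("size: {}", set.len());
}
```
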

use std::collections::HashSet;

fn main() {
    let mut h: HashSet<i32> = HashSet::new();

    println!("Inserted 1? {}", h.insert(1));
    println!("Inserted 2? {}", h.insert(2));
    println!("Inserted 2 again? {}", h.insert(2));

    println!("h: {:?}", h);
}
$ cargo run
Inserted 1? true
Inserted 2? true
Inserted 2 again? false
h: {1, 2}

In the example above, the HashSet h stores a collection of i32 values. The .insert() method takes the value to be stored and returns a bool indicating whether the value was successfully added to the HashSet. If a value already exists in the HashSet, it will return false, as HashSet can only store unique values.

use std::collections::HashSet;

fn main() {
    let h = HashSet::from([1, 2, 3, 4, 5]);

    for i in h {
        println!("Element: {}", i);
    }
}
$ cargo run
Element: 1
Element: 4
Element: 2
Element: 5
Element: 3

We can use the HashSet::from() associated function to initialize a HashSet with initial entries. Like other collections, we can use the for loop to iterate over its elements. However, since HashSet is based on a HashMap, it does not preserve the insertion order. If you need to preserve insertion order in a set, you can use the indexmap crate, which provides the IndexSet type.

Here are some useful HashSet methods:

  • contains: Returns true if the HashSet contains the specified value.
  • remove: Removes a value from the HashSet, returning true if the value was present.
  • len: Returns the number of elements in the HashSet.
  • is_empty: Returns true if the HashSet contains no elements.
  • clear: Clears the HashSet, removing all elements.
  • union: Returns an iterator over all values in this HashSet or another.
  • difference: Returns an iterator over the values in this HashSet that are not in another.
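
The methods above can be tried out together in a short sketch. The helper union_and_difference is a hypothetical name; it collects the iterators returned by union and difference into sorted vectors so that the printed order is stable.

```rust
use std::collections::HashSet;

/// Returns the union of a and b, and the values of a that are not in b,
/// both as sorted vectors.
/// Hypothetical helper, used only for this illustration.
fn union_and_difference(a: &HashSet<i32>, b: &HashSet<i32>) -> (Vec<i32>, Vec<i32>) {
    // Both methods return iterators over references, hence copied().
    let mut all: Vec<i32> = a.union(b).copied().collect();
    let mut only_a: Vec<i32> = a.difference(b).copied().collect();
    all.sort();
    only_a.sort();
    (all, only_a)
}

fn main() {
    let mut a = HashSet::from([1, 2, 3]);
    let b = HashSet::from([3, 4]);

    println!("contains 2? {}", a.contains(&2));
    println!("len: {}, is_empty: {}", a.len(), a.is_empty());

    let (all, only_a) = union_and_difference(&a, &b);
    println!("union: {:?}", all);
    println!("difference: {:?}", only_a);

    println!("removed 1? {}", a.remove(&1));
    a.clear();
    println!("is_empty after clear: {}", a.is_empty());
}
```
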

! Never Type

The !, or Never Type, in Rust is a special type because it’s uninhabited, meaning no value of this type can ever exist, and it has a size of 0. A function that never returns to its caller is called a diverging function, and its return type is !. Anything that returns or evaluates to the ! (never type) will never return control back to the caller. It will either loop infinitely, panic, or exit the process.

fn infinite_loop() -> ! {
    loop {}
}

fn always_crash() -> ! {
    panic!("Sorry, I panicked!");
}

fn exit_process() -> ! {
    std::process::exit(1)
}

The function infinite_loop, once called, never returns a value because of the infinite loop. When loop is used without a break statement, it evaluates to ! since it will never return control. Similarly, the function always_crash does not return a value because it panics and crashes. The panic! macro also evaluates to ! because it causes the program to crash. The std::process::exit() function is used to terminate the process, which is why it also returns !. Therefore, we explicitly provide a return type of ! to signal to the developer and the compiler that this function will never return, which improves code readability.

One special feature of the ! type is that it can coerce into any other type.

fn infinite_loop() -> ! {
    loop {}
}

fn main() {
    let value: u32 = infinite_loop();
}

In the example above, we can assign the return value of the infinite_loop function to a variable value with a type of u32. This is possible because the ! type, when it occurs, never returns control to the caller. As a result, the return value of the infinite_loop function will never actually be assigned to value, and the Rust compiler accepts this. This behavior offers several benefits.

fn get_value(condition: bool) -> u32 {
    let x: u32 = if condition {
        128
    } else {
        panic!("Something went wrong!")
    };

    return x;
}

In the get_value function, we return a value of type u32. We’ve used an if/else statement as an expression (which we will discuss in detail in the Control Flow lesson), where the if block returns 128 (with an inferred type of u32) and the else block triggers a panic!(), which returns !. Although Rust typically enforces that both the if and else blocks return the same type when used as an expression, it makes an exception for the ! type. This is because when Rust encounters the else block, it knows that the block will never assign a value to x and therefore cannot return a value from the function. This simplifies handling functions that may panic, as we don’t need to worry about defining complex return types for them.

fn process_option(opt: Option<i32>) -> i32 {
    return match opt {
        Some(val) => val, // val is `i32`
        None => panic!("No value found!"),
    };
}

This coercion superpower also helps in pattern matching with the match statement/expression. When match is used as an expression, Rust enforces that all arms return the same type (we will also discuss this in the Control Flow lesson). However, if an arm (such as None in the example above) evaluates to !, the Rust compiler accepts this, as ! can coerce into any type.
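
The same rule applies to other expressions of type !, not only panic!. For instance, continue inside a loop also has the type !, so it can sit in a match arm next to an arm that produces a value. The sketch below assumes a hypothetical sum_valid function that skips inputs it cannot parse.

```rust
/// Sums the entries that parse as u32 and skips the rest.
/// Hypothetical helper, used only for this illustration.
fn sum_valid(inputs: &[&str]) -> u32 {
    let mut total = 0;

    for input in inputs {
        let n: u32 = match input.parse() {
            Ok(value) => value,
            // `continue` has type `!`, so this arm fits next to a u32 arm.
            Err(_) => continue,
        };
        total += n;
    }

    total
}

fn main() {
    let inputs = ["10", "abc", "32", "-5"];

    println!("total: {}", sum_valid(&inputs));
}
```

Here "abc" and "-5" fail to parse as u32, so only 10 and 32 contribute to the total.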

Type alias

A type alias is a different name for an existing type. We use the type <new type name> = <existing type> statement to declare a new type alias. A type alias is not a new type; it’s merely a new name to refer to an existing type. Therefore, while Rust does not allow operations on data of different types, it is perfectly fine to work with type aliases since they refer to the same underlying type.

type Age = u32;

fn main() {
    let age_1: Age = 21;
    let age_2: u32 = 18;

    let sum_age = age_1 + age_2;
    println!("sum_age: {}", sum_age);
}
$ cargo run
sum_age: 39

In the example above, even though age_1 has the type Age and age_2 has the type u32, we can still perform addition on them because Age is just a type alias for u32. Therefore, a type alias is not a distinct or strong type.

Type aliases are mostly used to provide shorter names for long and complex types. This improves readability and makes code refactoring easier, as the underlying type is abstracted by the type alias. Type aliases are inlined during compilation, meaning they do not exist in the compiled binary and introduce no runtime overhead.

use std::collections::HashMap;

// name, age, is_male
type Student = (String, u32, bool);

// key = roll_number, value = Student
type StudentsCollection = HashMap<u32, Student>;

fn main() {
    let coll = StudentsCollection::from([
        (1, ("John Doe".to_string(), 21, true)),
        (2, ("Jenny Smith".to_string(), 18, false)),
    ]);

    println!("coll: {:?}", coll);
}

// coll: {1: ("John Doe", 21, true), 2: ("Jenny Smith", 18, false)}

In the example above, we declared two type aliases. The Student type is a tuple that stores information about a student, while StudentsCollection is an alias for a HashMap that uses u32 values as keys and Student types as values.
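
Aliases are also handy in function signatures. In the sketch below, ScoreBook and best_score are invented names for illustration; the point is that a value of the underlying HashMap type and a value declared with the alias are interchangeable.

```rust
use std::collections::HashMap;

// Hypothetical alias for illustration: student name -> list of scores.
type ScoreBook = HashMap<String, Vec<u32>>;

/// Returns the highest score recorded for `name`, if there is one.
fn best_score(book: &ScoreBook, name: &str) -> Option<u32> {
    match book.get(name) {
        Some(scores) => scores.iter().copied().max(),
        None => None,
    }
}

fn main() {
    // Associated functions of HashMap are available through the alias.
    let mut book: ScoreBook = ScoreBook::new();
    book.insert("John".to_string(), vec![71, 88, 64]);

    println!("John:  {:?}", best_score(&book, "John"));
    println!("Jenny: {:?}", best_score(&book, "Jenny"));

    // A plain HashMap of the same shape is accepted as well, because
    // ScoreBook is only another name for that type.
    let plain: HashMap<String, Vec<u32>> = HashMap::new();
    println!("empty: {:?}", best_score(&plain, "John"));
}
```
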
