📌 Systems Cheatsheet
Essential Powers of Two
Powers of two are fundamental in computing because they frequently represent limits, capacities, and common units of measurement. The following table consolidates the powers of two that are most useful to recall quickly.
📝 $2^{n}$
- $2^{8} \rightarrow$ u8
- $2^{16} \rightarrow$ u16
- $2^{32} \rightarrow$ u32
- $2^{64} \rightarrow$ u64
- $2^{10}$ Bytes $\rightarrow$ 1 KiB
- $2^{20}$ Bytes $\rightarrow$ 1 MiB
- $2^{30}$ Bytes $\rightarrow$ 1 GiB
$2^n$ | Value | Common Use Case / Significance |
---|---|---|
$2^8$ | $256$ | Number of distinct values of an 8-bit unsigned integer (u8), i.e. the byte values 0-255; the maximum value is $2^8 - 1 = 255$. ASCII itself defines only 128 characters ($2^7$). |
$2^{10}$ | $1,024$ | Exactly 1 kibibyte (KiB), often loosely called a kilobyte (KB). A common unit for stack sizes and memory allocation. |
$2^{16}$ | $65,536$ | Number of distinct values of a 16-bit unsigned integer (u16); the maximum value is 65,535, which is also the top of the network port range (0-65535). |
$2^{20}$ | $1,048,576$ | Exactly 1 mebibyte (MiB), often loosely called a megabyte (MB). Roughly a million in binary contexts. |
$2^{24}$ | $16.7 \times 10^6$ | Number of colors in 24-bit RGB color depth (True Color). Approximated as 16.7 million. |
$2^{32}$ | $4.29 \times 10^9$ | Number of distinct values of a 32-bit unsigned integer (u32), whose maximum value is $2^{32}-1$; total address space of IPv4; common for file IDs and large counters. Approximated as 4.29 billion. |
$2^{64}$ | $1.84 \times 10^{19}$ | Number of distinct values of a 64-bit unsigned integer (u64), whose maximum value is $2^{64}-1$; used for extremely large IDs, precise timestamps (e.g., nanoseconds since epoch), and representing vast amounts of memory. Approximated as 18.4 quintillion. |
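As a quick check of the table, the following sketch separates two quantities that are easy to conflate: $2^n$ is the number of distinct values an n-bit field can hold, while the largest unsigned value is $2^n - 1$. The helper names `distinct_values` and `max_unsigned` are illustrative only, not part of any library.

```python
# Bit widths taken from the table above.
BIT_WIDTHS = (8, 10, 16, 20, 24, 32, 64)


def distinct_values(bits: int) -> int:
    """Number of distinct bit patterns in a field that is `bits` wide."""
    return 1 << bits


def max_unsigned(bits: int) -> int:
    """Largest unsigned value such a field can hold (one less than the count)."""
    return (1 << bits) - 1


for bits in BIT_WIDTHS:
    print(f"2^{bits:<2} = {distinct_values(bits):>26,}   max unsigned = {max_unsigned(bits):,}")
```

Python integers have arbitrary precision, so even the 64-bit row is computed exactly rather than approximated.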
Core Computer Storage Units
Understanding the fundamental units of digital information and their conversions is crucial in computer science and engineering. Unlike the decimal system (base-10) used in everyday life, computers primarily operate in binary (base-2). This leads to differences in how “kilo,” “mega,” and “giga” are interpreted in computing contexts compared to their standard metric definitions.
📝 Note
- 1 bit $\to \hspace{0.1em}$ $0$ or $1$
- $8$ bits $\to \hspace{0.1em}$ $1$ Byte
💻 IEC
- $2^{10}$ Bytes $\to \hspace{0.1em}$ 1 KiB
- $2^{20}$ Bytes $\to \hspace{0.1em}$ 1 MiB
- $2^{30}$ Bytes $\to \hspace{0.1em}$ 1 GiB
Used during programming for memory estimations.
📏 SI
- 1000 bytes $\to \hspace{0.1em}$ 1 KB
- 1000 KB $\to \hspace{0.1em}$ 1 MB
- 1000 MB $\to \hspace{0.1em}$ 1 GB
Used by hardware manufacturers for reporting hard drive capacities.
A consolidated reference for common bit and byte units is shown below:
Unit | Value / Description | Notes |
---|---|---|
1 Bit | The smallest unit of data | Represents a binary digit, either 0 or 1. |
1 Byte (B) | 8 bits | The fundamental addressable unit of memory. Commonly stores a single character (e.g., ASCII). |
1 Kilobyte (KB/KiB) | 1 KiB = $2^{10}$ = 1,024 Bytes | 1 KB = 1,000 bytes (SI); 1 KiB (kibibyte) = 1,024 bytes (IEC). |
1 Megabyte (MB/MiB) | 1 MiB = $2^{20}$ = 1,048,576 Bytes | 1 MB = 1 million bytes (1,000 KB); 1 MiB (mebibyte) = 1,024 KiB. |
1 Gigabyte (GB/GiB) | 1 GiB = $2^{30}$ = 1,073,741,824 Bytes | 1 GB = 1 billion bytes (1,000 MB); 1 GiB (gibibyte) = 1,024 MiB. |
1 Terabyte (TB/TiB) | 1 TiB = $2^{40}$ = 1,099,511,627,776 Bytes | 1 TB = 1 trillion bytes (1,000 GB); 1 TiB (tebibyte) = 1,024 GiB. |
Cache Line (e.g., x86-64) | Typically 64 Bytes | The smallest unit of data that can be transferred between main memory and CPU cache. Optimizing for cache lines is critical for performance. |
Page Size (e.g., x86-64) | Typically 4 KiB (4,096 Bytes) | The smallest unit of memory that the operating system manages in virtual memory. Memory is allocated and protected in pages. |
Prefix Comparison
It’s worth noting the distinction between the traditional binary prefixes (powers of 2) and the decimal prefixes (powers of 10) used in the International System of Units (SI).
- Binary Prefixes (IEC Standard): KiB (kibibyte), MiB (mebibyte), GiB (gibibyte), TiB (tebibyte) use powers of $2^{10}$ $\to \hspace{0.1em}$ precise for computer memory and storage during programming.
- Decimal Prefixes (SI Standard): KB (kilobyte), MB (megabyte), GB (gigabyte), TB (terabyte) typically use powers of $10^3$ $\to \hspace{0.1em}$ used by hard drive manufacturers to express capacity, leading to slight discrepancies with the actual binary capacity reported by operating systems.
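For example, a 1 TB hard drive holds $10^{12}$ bytes, which is about 9% less than 1 TiB ($2^{40}$ bytes). An operating system that reports binary units therefore shows it as roughly 0.91 TiB (about 931 GiB).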
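The gap between the two prefix systems can be worked out directly. The sketch below defines both unit families as byte counts and converts an advertised SI capacity into IEC units; the constant names and the `si_to_iec` helper are illustrative, not from any standard library.

```python
# IEC binary units (powers of 2) and SI decimal units (powers of 10), in bytes.
KIB, MIB, GIB, TIB = 2**10, 2**20, 2**30, 2**40
KB, MB, GB, TB = 10**3, 10**6, 10**9, 10**12


def si_to_iec(amount: float, si_unit: int, iec_unit: int) -> float:
    """Convert `amount` expressed in an SI unit into the given IEC unit."""
    return amount * si_unit / iec_unit


print(f"1 TB   = {si_to_iec(1, TB, TIB):.3f} TiB")
print(f"1 TB   = {si_to_iec(1, TB, GIB):.1f} GiB")
print(f"500 GB = {si_to_iec(500, GB, GIB):.2f} GiB")
```

The discrepancy grows with each prefix step: about 2.4% at the kilo level, 4.9% at mega, 7.4% at giga, and 9.1% at tera.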
Primitive Data Types: (64-bit Systems)
Understanding the characteristics of primitive data types—their size in memory and the range of values they can hold—is fundamental for efficient programming, especially in systems-level languages like C++ and Rust. This knowledge is crucial for memory optimization, avoiding overflow errors, and ensuring data integrity. While Python’s dynamic typing abstracts many of these details, it’s still beneficial to grasp the underlying concepts.
The tables below consolidate information on common primitive types, their typical sizes on a 64-bit architecture, and their value ranges, along with comparisons across C++, Rust, and Python.
Signed Data Types
These types can represent both positive and negative values. They use a bit (typically the most significant one) to indicate the sign.
Type Category | C++ Type(s) | Rust Type(s) | Size (Bytes) | Signed Range (Approximate) | Notes |
---|---|---|---|---|---|
Integer (8-bit) | int8_t , char (often) | i8 | 1 | $-128$ to $127$ | Used for small integer values; the sign of C++ char can be platform-dependent. |
Integer (16-bit) | short , int16_t | i16 | 2 | $-32,768$ to $32,767$ | Common for smaller counters and compact fields. Network port numbers (0-65,535) need the unsigned 16-bit type instead. |
Integer (32-bit) | int , int32_t (also long on 64-bit Windows) | i32 | 4 | $-2.147 \text{ Billion}$ to $2.147 \text{ Billion}$ | A standard integer size on many systems. On 64-bit Linux and macOS, long is 8 bytes rather than 4. |
Integer (64-bit) | long long , int64_t | i64 | 8 | $\pm9.22 \text{ Quintillion}$ | Used for large numbers, timestamps, and unique IDs. |
Pointer-sized Integer | ptrdiff_t | isize | 8 | $-2^{63}$ to $2^{63}-1$ (on 64-bit) | Represents the difference between two pointers, allowing for negative offsets. |
Floating Point (Single) | float | f32 | 4 | $\sim\pm3.4 \times 10^{38}$ | IEEE 754 type with ~7 decimal digits of precision. |
Floating Point (Double) | double | f64 | 8 | $\sim\pm1.8 \times 10^{308}$ | IEEE 754 type with ~15 decimal digits of precision. |
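The integer ranges in this table follow from two's complement: an n-bit signed type spans $-2^{n-1}$ to $2^{n-1}-1$. The sketch below derives the ranges and uses Python's `struct` module, whose standard-size codes `b`, `h`, `i`, and `q` are 1, 2, 4, and 8 bytes, to confirm that one past the maximum no longer fits. The helpers `signed_range` and `fits` are illustrative names.

```python
import struct


def signed_range(bits: int) -> tuple[int, int]:
    """Two's-complement range of a signed integer that is `bits` wide."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def fits(code: str, value: int) -> bool:
    """True if `value` can be packed with the given standard-size struct code."""
    try:
        struct.pack("<" + code, value)
        return True
    except (struct.error, OverflowError):
        return False


# Standard-size struct codes: b = 8-bit, h = 16-bit, i = 32-bit, q = 64-bit signed.
for code, bits in (("b", 8), ("h", 16), ("i", 32), ("q", 64)):
    lo, hi = signed_range(bits)
    print(f"{bits:>2}-bit: {lo:,} to {hi:,} | max + 1 fits: {fits(code, hi + 1)}")
```

Note the asymmetry: the negative side always has one more value than the positive side, because zero takes one of the non-negative bit patterns.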
Unsigned Data Types
These types can only represent non-negative values (zero and positive numbers), allowing them to store larger positive values than their signed counterparts of the same size.
Type Category | C++ Type(s) | Rust Type(s) | Size (Bytes) | Unsigned Range (Approximate) | Notes |
---|---|---|---|---|---|
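Character | unsigned char , char | char , u8 | 1 (C++) 4 (Rust) | C++: 0 to 255; Rust: 0 to $0x10FFFF$ | Rust’s char is a 4-byte Unicode scalar value (the surrogate range U+D800 to U+DFFF is excluded). C++ char is always 1 byte, but whether plain char is signed is implementation-defined; unsigned char guarantees 0 to 255. u8 is used for byte manipulation. |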
Integer (8-bit) | uint8_t | u8 | 1 | $0$ to $255$ | Ideal for representing a single byte of data. |
Integer (16-bit) | uint16_t | u16 | 2 | $0$ to $65,535$ | Useful for data that won’t exceed this limit, like image dimensions. |
Integer (32-bit) | uint32_t | u32 | 4 | $0$ to $4.29 \text{ Billion}$ | Commonly used for IPv4 addresses and file IDs. |
Integer (64-bit) | uint64_t | u64 | 8 | $0$ to $18.4 \text{ Quintillion}$ | Essential for large counts or bit manipulation on 64-bit values. |
Pointer-sized Integer | size_t | usize | 8 | $0$ to $2^{64}-1$ (on 64-bit) | The standard type for memory sizes and collection indices. |
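To make the "larger positive range" claim concrete, the sketch below computes the unsigned maximum for each width, compares it with the signed maximum, and emulates the modulo-$2^n$ wraparound that fixed-width unsigned arithmetic performs. Python integers never wrap on their own, so the masking in `wrap_unsigned` (an illustrative helper) stands in for what the hardware does.

```python
def unsigned_max(bits: int) -> int:
    """Largest value of an unsigned integer that is `bits` wide."""
    return (1 << bits) - 1


def signed_max(bits: int) -> int:
    """Largest value of a two's-complement signed integer of the same width."""
    return (1 << (bits - 1)) - 1


def wrap_unsigned(value: int, bits: int) -> int:
    """Reduce `value` modulo 2**bits, as fixed-width unsigned arithmetic does."""
    return value & unsigned_max(bits)


for bits in (8, 16, 32, 64):
    print(f"u{bits}: 0 to {unsigned_max(bits):,} (signed max: {signed_max(bits):,})")

print(wrap_unsigned(255 + 1, 8))  # one past the u8 maximum wraps around
print(wrap_unsigned(0 - 1, 8))    # one below zero wraps to the maximum
```

For every width, the unsigned maximum is exactly twice the signed maximum plus one. In C++, unsigned arithmetic wraps like this by definition; in Rust, the same result requires an explicit method such as `wrapping_add`, since plain overflow panics in debug builds.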
Special Types
These types don’t fit into the signed/unsigned numeric categories but are fundamental building blocks. Python’s int and str types are included here as they are dynamically sized and don’t have fixed-width signed/unsigned counterparts.
Type Category | C++ Type(s) | Rust Type(s) | Python Type(s) | Size (Bytes) | Notes |
---|---|---|---|---|---|
Boolean | bool | bool | bool | 1 (C++/Rust, typical) | Stores a simple true or false value. In Python, True and False are full objects and occupy more than 1 byte. |
Raw Pointer | T* (e.g., int* ) | *const T , *mut T | N/A | 8 (on 64-bit) | Stores a memory address. Its “value” is an address, not a signed/unsigned number. |
Character | char | char | str (len 1) | 1 B (C++), 4 B (Rust) | Python uses strings for single characters, so a one-character str carries object overhead beyond the character data. C++ char is an 8-bit integer on mainstream platforms, typically holding an ASCII character. Rust char is a 32-bit Unicode Scalar Value. |
Dynamic String | std::string | String | str | Varies | A heap-allocated data structure for text. Python str is immutable. |
String View/Slice | std::string_view | &str | (Slicing str ) | Varies | An immutable, non-owning view into a part of a string. |
Best Practices
Integer Sizing:
- C++ & Rust: Offer fixed-width integers (e.g., int32_t, i32) which guarantee the same size on any platform. For portability and unambiguous code in C++, prefer these over generic types like int or long.
- Python: Its int type has arbitrary precision, meaning it automatically uses more memory as needed to store larger numbers and has no practical size limit.
- Mixing Signed & Unsigned Types: This is a classic C++ bug. In a condition like i < n where i is a signed int and n is an unsigned value of the same or greater width (e.g., size_t), the signed operand is implicitly converted to unsigned, so a negative i becomes a huge positive value and the comparison gives the wrong answer. Rust’s strict type system prevents this by disallowing operations between different integer types without an explicit cast.
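The signed/unsigned comparison pitfall can be reproduced numerically. The sketch below emulates the conversion C++ applies when a signed 32-bit value meets an unsigned 32-bit value in a comparison; `to_unsigned` is an illustrative helper, and the 32-bit width is an assumption chosen for the example.

```python
def to_unsigned(value: int, bits: int = 32) -> int:
    """Reinterpret a signed value as unsigned by reducing it modulo 2**bits."""
    return value % (1 << bits)


i = -1  # a signed index that has gone negative
n = 10  # an unsigned length

print(i < n)               # the mathematically correct comparison
print(to_unsigned(i) < n)  # what a mixed signed/unsigned comparison effectively evaluates
print(to_unsigned(i))      # the value -1 turns into after conversion

# Python's own int never overflows: it simply grows past 64 bits.
big = 2**64 + 1
print(big.bit_length())
```

The second comparison is false because -1 becomes 4,294,967,295, which is the mechanism behind loops that never run or never terminate when the types are mixed.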
Characters:
- The char type is fundamentally different between the languages. In C++, char is a 1-byte type typically holding an ASCII value (or a single UTF-8 code unit). In Rust, char is always 4 bytes and represents any Unicode Scalar Value, making it suitable for text in all languages.
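The size difference between the two char models shows up as soon as text leaves ASCII. The sketch below encodes four sample characters (chosen for illustration) as UTF-8, where each byte corresponds to one C++ char, and as UTF-32, where the fixed 4 bytes mirror the size of a Rust char.

```python
# One sample character per UTF-8 length: A, e-acute, euro sign, grinning face.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    utf8_len = len(ch.encode("utf-8"))       # variable width: 1 to 4 bytes
    utf32_len = len(ch.encode("utf-32-le"))  # fixed width: always 4 bytes
    print(f"U+{ord(ch):04X}: UTF-8 = {utf8_len} byte(s), UTF-32 = {utf32_len} bytes")

# The highest valid code point matches the upper bound of Rust's char.
print(hex(ord(chr(0x10FFFF))))
```

This is why a C++ `std::string` of UTF-8 text can have more bytes than user-visible characters, while a sequence of Rust char values always costs 4 bytes per element.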
Pointer-Sized Integers:
- usize (Rust) and size_t (C++) are crucial for indexing and memory-related calculations. Their size matches the system’s address space (8 bytes on a 64-bit system), ensuring they can always hold the size of any object in memory.
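The pointer-sized types on the current machine can be inspected through Python's `ctypes` module, which exposes the platform's C types. This is a minimal sketch; it assumes a mainstream platform where `size_t` has the same width as a pointer.

```python
import ctypes
import sys

pointer_bytes = ctypes.sizeof(ctypes.c_void_p)   # counterpart of sizeof(void*)
size_t_bytes = ctypes.sizeof(ctypes.c_size_t)    # counterpart of sizeof(size_t)
ssize_t_bytes = ctypes.sizeof(ctypes.c_ssize_t)  # signed, pointer-sized counterpart

print(f"pointer: {pointer_bytes} bytes, size_t: {size_t_bytes} bytes")
print(f"largest size_t value: {2 ** (8 * size_t_bytes) - 1:,}")

# sys.maxsize is the largest signed pointer-sized value, i.e. the isize-style maximum.
print(f"sys.maxsize: {sys.maxsize:,}")
```

On a 64-bit system the first line reports 8 bytes for both, and `sys.maxsize` equals $2^{63}-1$.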
Strings:
- All three languages provide dynamic, heap-allocated strings (std::string, String, str) for text. std::string and String are mutable, while Python’s str is immutable.
- Rust and modern C++ also offer lightweight, immutable string “views” or “slices” (&str and std::string_view) for efficient, non-owning access to string data.
Think in… | To Achieve… |
---|---|
Bytes, Bits, & Pages | A clear understanding of memory layout, alignment, and fragmentation. |
Powers of Two | Quick and accurate estimations for storage, memory, and latency. |
Signed vs. Unsigned | Prevention of subtle overflow and comparison bugs in loops and conditions. |
Platform-Sized Pointers | Optimized indexing and correct interaction with low-level system APIs. |
Stack vs. Heap | Proper memory management: use the stack for fast, scoped, fixed-size data and the heap for large or dynamically-sized data. |
Endianness | Correct data serialization and network communication by understanding byte order (Big vs. Little-Endian). |
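Of the rows above, endianness is the easiest to demonstrate in a few lines. The sketch below serializes one 32-bit value in both byte orders and shows what happens when the bytes are read back with the wrong order; the value 0xDEADBEEF is just a recognizable example.

```python
import struct
import sys

value = 0xDEADBEEF

big = value.to_bytes(4, "big")        # most significant byte first (network order)
little = value.to_bytes(4, "little")  # least significant byte first (x86-64, most ARM)

print("big-endian:   ", big.hex())
print("little-endian:", little.hex())
print("this machine: ", sys.byteorder)

# struct uses ">" for big-endian and "<" for little-endian layouts.
assert struct.pack(">I", value) == big
assert struct.pack("<I", value) == little

# Reading big-endian bytes as little-endian silently yields a different number.
misread = int.from_bytes(big, "little")
print(hex(misread))
```

No error is raised in the misread case, which is why byte order needs to be fixed explicitly in any file format or wire protocol.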
Quick Memory Estimation Guide
This table provides quick, practical estimates for the memory footprint of common data types and system-level structures, essential for performance-aware software design.
Item / Data Structure | Approximate Size | Notes |
---|---|---|
Large Data Collections | ||
1 million 32-bit integers | ~4 MB | (1,000,000 items × 4 bytes/item) |
100 million 32-bit floats | ~400 MB | (100,000,000 items × 4 bytes/item) |
A 10-character string | ~10 to 40 bytes | 1 byte/char for ASCII; 1-4 bytes/char for UTF-8. |
A 100-character string | ~100 to 400 bytes | Does not include metadata overhead from string objects. |
UNIX Timestamp | 8 bytes | Typically stored as a 64-bit integer (i64). |
Identifiers & Hashes | ||
IPv4 Address | 4 bytes | A 32-bit numerical address (uint32_t). |
64-bit Pointer | 8 bytes | The size of a memory address on a 64-bit system. |
UUID (v4) | 16 bytes | A 128-bit universally unique identifier. |
SHA-256 Hash | 32 bytes | A 256-bit secure hash algorithm output. |
Hex String Representation | 1 byte per 2 hex digits | Example: The string "0xDEADBEEF" represents 4 bytes of data. |
ASCII Character | 1 byte | |
UTF-8 Character | 1 to 4 bytes | |
System-Level Constants | ||
Memory Page | 4 KiB | (4,096 bytes) The basic unit of memory managed by the OS. |
Typical L1/L2 Cache Line | 64 bytes | The smallest unit of data transferred between memory and a CPU cache. |
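Several rows of this table can be checked with the Python standard library. The sketch below multiplies counts by element widths for the collection estimates, then measures the identifier sizes directly; `estimate_bytes` is an illustrative helper, and the IPv4 address is a documentation-range example.

```python
import hashlib
import ipaddress
import uuid

MB, MIB = 10**6, 2**20


def estimate_bytes(count: int, bytes_per_item: int) -> int:
    """Payload size of `count` fixed-width items, ignoring container overhead."""
    return count * bytes_per_item


for label, count, width in (
    ("1M 32-bit integers", 1_000_000, 4),
    ("100M 32-bit floats", 100_000_000, 4),
):
    total = estimate_bytes(count, width)
    print(f"{label}: {total / MB:.0f} MB = {total / MIB:.2f} MiB")

sizes = {
    "IPv4 address": len(ipaddress.IPv4Address("192.0.2.1").packed),
    "UUID": len(uuid.uuid4().bytes),
    "SHA-256 digest": hashlib.sha256(b"").digest_size,
    "0xDEADBEEF": len(bytes.fromhex("DEADBEEF")),
}
for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```

The estimates cover raw payload only; real containers add a small fixed header, and Python's own list of int objects would use far more than 4 bytes per element.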
Memory Allocation
Indexing & Memory Allocation
- Use the Right Type for Sizes: For indexing or representing object sizes, use size_t in C++ and usize in Rust. usize is defined to be pointer-width on the target platform, and size_t matches that width on mainstream platforms (e.g., 64 bits on a 64-bit system).
- Stack vs. Heap Allocation: Know when to use the heap. Prefer the heap (Vec, Box in Rust) when dealing with large amounts of data or when the data’s size is not known at compile time. A typical thread’s stack is small (often 1-8 MB by default).
Stack vs Heap
Feature | Stack | Heap |
---|---|---|
Allocation Time | Fast (LIFO) | Slower (dynamic alloc/free) |
Lifetime | Auto (scope-bound) | Explicit: freed manually (free/delete) or released when an owning object such as Box or Vec is dropped |
Size | Limited (MBs) | Large (GBs) |
Location | Per-thread | Shared |
Use Case | Local vars, small arrays | Dynamic structures, Box , Vec |
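The "Limited (MBs)" row can be turned into an element count. The sketch below computes an upper bound on how many fixed-size items fit in a stack, assuming stack sizes of 1 MiB and 8 MiB to match the range mentioned above; `max_items_on_stack` is an illustrative helper, and real limits are lower because call frames share the same stack.

```python
MIB = 2**20


def max_items_on_stack(item_bytes: int, stack_bytes: int = 8 * MIB) -> int:
    """Upper bound on items that fit if the entire stack were available."""
    return stack_bytes // item_bytes


print(max_items_on_stack(8))           # 64-bit values in an assumed 8 MiB stack
print(max_items_on_stack(8, 1 * MIB))  # the same items in an assumed 1 MiB stack
print(max_items_on_stack(4))           # 32-bit values in an assumed 8 MiB stack
```

A local array of one million 64-bit values would consume an entire 8 MiB stack on its own, which is why data of that size belongs on the heap.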
Cold Estimations
- ✅ Estimate RAM usage of a struct with multiple fields
- ✅ Say how much memory vector<int> of size 1M uses
- ✅ Recall signed/unsigned integer ranges instantly
- ✅ Explain overflow behavior in Rust vs C++ vs Python
- ✅ Index an array in Rust with usize and know why
- ✅ Know what sizeof(int*) is on your target system
- ✅ Use std::numeric_limits<T>::max() in C++ or T::MAX in Rust
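For the first item on the checklist, summing field sizes underestimates a struct's footprint because the compiler inserts padding to keep each field aligned. The sketch below uses `ctypes.Structure`, which follows the platform's C layout rules, to compare two orderings of the same three fields; the struct and field names are hypothetical.

```python
import ctypes


class Loose(ctypes.Structure):
    # Ordered small, large, medium: padding is inserted before `id` and at the end.
    _fields_ = [
        ("flag", ctypes.c_uint8),
        ("id", ctypes.c_uint64),
        ("count", ctypes.c_uint32),
    ]


class Tight(ctypes.Structure):
    # Ordered largest first: only a few trailing padding bytes remain.
    _fields_ = [
        ("id", ctypes.c_uint64),
        ("count", ctypes.c_uint32),
        ("flag", ctypes.c_uint8),
    ]


payload = 1 + 8 + 4  # raw field bytes, identical for both layouts
print("sum of fields:", payload, "bytes")
print("Loose:", ctypes.sizeof(Loose), "bytes")
print("Tight:", ctypes.sizeof(Tight), "bytes")
print("1M Tight structs:", 1_000_000 * ctypes.sizeof(Tight) / 10**6, "MB")
```

On typical 64-bit platforms the loose ordering takes 24 bytes against 16 for the tight one, so a quick estimate should round each struct up to a multiple of its largest field's alignment rather than use the 13-byte field sum.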