FatArena: Simplifying Memory Management in C

By Floréal Risso

Github: FlorealRISSO

Email: floreal.risso@proton.me

Code : FatArena

Published: April 14, 2024

Introduction

C's manual memory management, while powerful, can be a source of bugs and complexity. The traditional "malloc" and "free" paradigm demands significant cognitive overhead from developers and can lead to performance bottlenecks. But what if we could simplify this process? Enter FatArena, a novel approach to memory management that challenges conventional wisdom and offers a streamlined alternative.

The Problem with Traditional Memory Management

The standard malloc allocator is a complex beast. It juggles multiple responsibilities:

This complexity comes at a cost: increased CPU usage, potential for bugs, and cognitive load on developers. But what if we could simplify these requirements?

Introducing FatArena

FatArena is a different approach to memory management with two primary goals:

  1. Allocate new chunks of memory efficiently
  2. Free all allocated chunks at once

By narrowing our focus, we can create a simpler, more efficient system. But this approach comes with a caveat: you can't free individual chunks. This limitation, however, can lead to better code structure by encouraging developers to group data with similar lifetimes.

How FatArena Works

FatArena leverages a clever trick in modern operating systems: when you request a new page of memory, the OS doesn't immediately allocate it in RAM. Instead, it waits until the memory is actually accessed (read or written to). This allows us to create large, virtual buffers for specific lifetimes, flushing them entirely when no longer needed.

Here's the basic structure of our FatArena (code here):


typedef struct FatArena {
    uint8_t *addr;  // The page address
    size_t remain;  // The remaining storage
    size_t size;    // The total storage
} FatArena;
            

FatArena Memory Layout

To better understand how FatArena uses memory, let's look at its layout:

graph LR subgraph "FatArena Memory Layout" A[Virtual Memory] --> B[Page 1] A --> C[Page 2] A --> D[Page 3] A --> E[...] B --> F[Used Memory] B --> G[Remaining Memory] C --> H[Unused] D --> I[Unused] E --> J[Unused] end

FatArena allocates virtual memory pages, but only uses what it needs. The OS only commits real memory when it's accessed.

Implementing FatArena

Let's look at the core function for creating a new FatArena:


Bool ftnew(FatArena *a, size_t sz)
{
    pgsz = sysconf(_SC_PAGESIZE);
    size_t fatsz = pgsz;
    while (fatsz < sz) {
        fatsz += pgsz;
    }

    uint8_t *pg = mmap(NULL, fatsz, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0);
    if (pg == MAP_FAILED) {
        return 0;
    }

    a->addr = pg;
    a->remain = fatsz;
    a->size = fatsz;
    return 1;
}
            

This function calculates the number of pages needed, rounds up to the nearest integer, and uses mmap to allocate the virtual memory.

Advantages of FatArena

Real-World Application: Streamlining Compiler Lexical Analysis

Let's dive into a practical example where FatArena shines: the lexical analysis phase of a compiler. This stage transforms source code into a sequence of tokens, and it's a perfect showcase for FatArena's efficiency. We'll use the Chi Compiler as our reference implementation.

The Traditional Approach vs. FatArena

In a traditional lexer implementation, you might create a complex list of unions or structures to represent different token types. Each token could have various fields to store additional information, like line numbers, literal values, or identifiers. This approach, while flexible, can lead to increased memory usage and more complex token handling code.

With FatArena, we can dramatically simplify this process. Instead of a complex data structure, our lexer produces a simple list of integers. These integers represent either token types (as enum values) or indexes into our FatArena. This approach offers several advantages:

Efficient Token Processing

Processing this token stream becomes straightforward and efficient. Here's a simplified example of how we might handle tokens in the next phase of compilation:


void process_tokens(int *tokens, int token_count, FatArena *arena) {
    for (int i = 0; i < token_count; i++) {
        switch (tokens[i]) {
            case TOK_IF:
                // Handle 'if' statement
                break;
            case TOK_IDENTIFIER:
                // The next token is the index in the FatArena
                char *identifier = ftptr(arena, tokens[++i]);
                // Process identifier
                break;
            case TOK_LITERAL:
                // The next token is the index in the FatArena
                char *literal = ftptr(arena, tokens[++i]);
                // Process literal
                break;
            // ... handle other token types ...
        }
    }
}
            

This approach offers several benefits:

Setting Up FatArena for Lexical Analysis

Here's how we might set up our FatArenas for the lexer:


// Initialize our FatArenas
FatArena tokens, literals;
ftnew(&tokens, TOKENS_ESTIMATE * sizeof(int));
ftnew(&literals, LITERALS_ESTIMATE);

// Perform lexical analysis
lex_source_code(&tokens, &literals);

// Now 'tokens' contains our integer token stream,
// and 'literals' contains all string data (identifiers, literal values, etc.)
            

This setup allows us to efficiently store and process tokens without the overhead of complex data structures. The lexer can quickly append tokens and literal data, and subsequent compiler stages can easily access this information.

The Power of Simplicity

By leveraging FatArena in our lexer, we've transformed a potentially complex and memory-intensive process into a streamlined, efficient operation. This approach not only simplifies our code but also paves the way for better performance in later stages of compilation.

The beauty of this system lies in its simplicity. By reducing our token representation to its essence - a series of integers - we've created a flexible, fast, and memory-efficient lexical analysis stage. This is just one example of how FatArena's unique approach to memory management can lead to elegant solutions in real-world applications.

Conclusion

FatArena offers a fresh perspective on memory management in C. By simplifying our requirements and leveraging OS-level optimizations, we can create more efficient and less error-prone code. While it may require a shift in thinking about memory allocation, the benefits in terms of performance and code clarity can be significant.

As with any tool, FatArena isn't a one-size-fits-all solution. It's particularly well-suited for scenarios where you can group memory allocations by lifetime, such as in compilers, game engines, or any application with distinct processing phases. By understanding its strengths and limitations, you can make informed decisions about when and how to use FatArena in your projects.