Mohammed Manssour

A New Debugging Tool in Go: The Trace Flight Recorder

Go

Introduced in Go 1.25, the trace flight recorder is a powerful tool that helps developers find and fix problems in their code by capturing a snapshot of what the program was doing right before an issue occurred.

What is Tracing?

Tracing is like creating a detailed log of your program as it runs. It collects information about what your program is doing, such as which functions are being called, how long they take, and how memory is being used. This information is crucial for finding performance bottlenecks and debugging complex issues.

What is Flight Recording?

Flight recording is a special kind of tracing that uses a circular buffer. Think of it like a security camera that only saves the last few minutes of video. It continuously records program activity but only keeps the most recent data. This is efficient because it saves memory and reduces the size of the final trace file, capturing only the essential information leading up to an event.

Working with the trace Package

Before we build a full example, let's look at the key functions for controlling the flight recorder.

1. Configuring and Starting the Recorder

First, you need to configure and start the flight recorder. You define how much data you want to keep. In this case, we're keeping the last 5 seconds of activity, up to a maximum of 3MB.

func configureAndStartFlightRecorder() {
	// Configure the flight recorder.
	traceCfg := trace.FlightRecorderConfig{
		MinAge:   5 * time.Second,
		MaxBytes: 3 * 1024 * 1024, // 3 MB
	}

	// Create and start a new flight recorder instance.
	fr = trace.NewFlightRecorder(traceCfg)
	if err := fr.Start(); err != nil {
		log.Fatalf("Failed to start flight recorder: %v", err)
	}
	log.Println("Flight recorder started.")
}

2. Writing the Trace Data

When you want to save a snapshot of the collected data, you write it to a file. This function creates a trace.out file and writes the buffer's contents to it.

func writeTrace() {
	// Create a file to store the trace output.
	f, err := os.Create("trace.out")
	if err != nil {
		log.Printf("Failed to create trace file: %v", err)
		return
	}
	defer f.Close()

	// Write the collected trace data to the file.
	if err := fr.WriteTo(f); err != nil {
		log.Printf("Failed to write trace data: %v", err)
	}
	log.Println("Trace data written to trace.out")
}

3. Triggering the Trace

You don't want to save a trace on every request. Instead, you can set up a trigger. Here, we check if a request took longer than 300 milliseconds. If it did, we call writeTrace(). We use sync.Once to ensure the trace is only written the first time a slow request is detected.

if duration > 300*time.Millisecond {
    once.Do(func() {
        log.Println("Slow request detected, capturing trace...")
        writeTrace()
    })
}

Example: A Web Server with a Slow Request

Now, let's put all the pieces together in a simple web server. This server will simulate a heavy workload and use our trigger to capture a trace when needed.

Here is the complete code:

package main

import (
	"fmt"
	"log"
	"net/http"
	"os"
	"runtime/trace"
	"sync"
	"time"
)

var (
	fr   *trace.FlightRecorder
	once sync.Once
)

// configureAndStartFlightRecorder sets up and starts the flight recorder.
func configureAndStartFlightRecorder() {
	traceCfg := trace.FlightRecorderConfig{
		MinAge:   5 * time.Second,
		MaxBytes: 3 * 1024 * 1024, // 3 MB
	}
	fr = trace.NewFlightRecorder(traceCfg)
	if err := fr.Start(); err != nil {
		log.Fatalf("Failed to start flight recorder: %v", err)
	}
	log.Println("Flight recorder started.")
}

// writeTrace captures a snapshot of the trace data and writes it to a file.
func writeTrace() {
	f, err := os.Create("trace.out")
	if err != nil {
		log.Printf("Failed to create trace file: %v", err)
		return
	}
	defer f.Close()
	if _, err := fr.WriteTo(f); err != nil {
		log.Printf("Failed to write trace data: %v", err)
	}
	log.Println("Trace data written to trace.out")
}

// heavyLoad simulates a CPU-intensive task.
func heavyLoad() {
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1e8; j++ {
				_ = fmt.Sprintf("hello")
			}
			time.Sleep(100 * time.Millisecond)
		}()
	}
	wg.Wait()
}

// handler is the HTTP handler for our web server.
func handler(w http.ResponseWriter, r *http.Request) {
	start := time.Now()
	heavyLoad()
	duration := time.Since(start)
	fmt.Fprintf(w, "Work done in %v\n", duration)
	log.Printf("Request handled in %v", duration)

	// Trigger the trace if the request is slow.
	if duration > 300*time.Millisecond {
		once.Do(func() {
			log.Println("Slow request detected, capturing trace...")
			writeTrace()
		})
	}
}

func main() {
	// Configure and start the flight recorder when the application starts.
	configureAndStartFlightRecorder()
	// Ensure the flight recorder is stopped gracefully on exit.
	defer fr.Stop()

	http.HandleFunc("/", handler)
	log.Println("Server starting on port 8080...")
	if err := http.ListenAndServe(":8080", nil); err != nil {
		log.Fatalf("Server failed: %v", err)
	}
}

Explaining the Example

  1. main() function: When the program starts, it immediately calls configureAndStartFlightRecorder() to get the recorder running. We use defer fr.Stop() to ensure the recorder is cleanly shut down when the application exits.
  2. handler() function: This function handles incoming web requests. It measures how long the heavyLoad() function takes. If the duration exceeds our 300ms threshold, it triggers writeTrace() to save the debug information.

How to Analyze the Data

After running the server and making a request that triggers the trace, a file named trace.out will be created. You can analyze this file using Go's built-in toolchain.

  1. Run the trace tool. Open your terminal and run the following command:
go tool trace trace.out
  1. View the trace in your browser. This command will start a local web server and open a new tab in your browser with the trace visualization.

The web UI provides several views to help you understand what your program was doing:

  • Goroutine analysis: Shows you all the goroutines that were running, how long they ran, and where they were created.
  • Processor trace view: Gives you a timeline of what was happening on each CPU core. You can see when goroutines were running, when garbage collection (GC) occurred, and how busy the processors were.
  • Heap view: Shows how memory was allocated and freed over time.

When to Use the Flight Recorder

The trace flight recorder is not for everyday use. It's a specialized tool for specific situations where traditional debugging and logging are not enough.

  • Production Debugging: When you have an error that only happens in your live production environment and is hard to reproduce locally. The flight recorder can capture the state of the application right when the error occurs with minimal overhead.
  • Intermittent Performance Issues: For tracking down performance problems that don't happen consistently, like a request that is suddenly slow for no obvious reason.
  • Memory-Constrained Environments: In situations where you can't afford the overhead of continuous, full-scale tracing, the flight recorder provides a lightweight alternative to maintain observability.

By using the trace flight recorder, you can gain deep insights into your Go applications and solve some of the most challenging bugs.

It's important to note that the example code can be optimized in many ways; its primary purpose is to demonstrate how the trace feature works in a clear and understandable manner.

Crafted with love by Mohammed Manssour