Mastering Instruments (Part 4): Flame Graphs, Swift Concurrency Under the Microscope, and Processor Trace in Action
Learn to read Flame Graphs, audit async tasks with Swift Tasks, and push Processor Trace to its limits with a real CLI project that uses Swift Concurrency intensively.
In Part 1 we learned to use Instruments as technicians. In Part 2 we became doctors. In Part 2.5 we saw memory in action. In Part 3 we mastered the scientific method of profiling, the Call Tree surgery operations, and the three levels of analysis — from Time Profiler to Processor Trace.
Today we switch both the scalpel and the patient. We leave the SuperStuff app behind and open an entirely different project: a command-line tool that uses Swift Concurrency intensively. And we’ll discover visualizations that transform the way we read performance data.
It’s not about having more data. It’s about seeing it in a way that reveals what was previously invisible.
New Patient: Swift Evolution Metadata Extractor
The Swift Evolution Metadata Extractor is an official tool from the Swift team that generates JSON data for the Swift Evolution Dashboard. It’s a real, production project maintained by Apple.
Why is it perfect for our lab?
- Massive concurrency: It uses withTaskGroup to process over 400 proposals in parallel. Hundreds of tasks being created, suspended, and resumed.
- Intensive parsing: Each proposal is a Markdown file parsed with swift-markdown (Apple’s official library). The parser generates a complete AST of the document.
- Networking: It makes HTTP requests to the GitHub API to fetch each proposal’s content, using URLSession with async/await.
- No UI: Being a CLI, there are no run loops, no views, no main thread competing for attention. All performance is measured in pure processing.
// The extractor's heart: TaskGroup processing proposals in parallel
await withTaskGroup(of: SortableProposalWrapper.self) { taskGroup in
    for spec in proposalSpecs {
        taskGroup.addTask {
            await readAndExtractProposalMetadata(for: spec, ...)
        }
    }
    for await result in taskGroup {
        proposals.append(result)
    }
}
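The fragment above elides its arguments and depends on the extractor's own types, so it cannot be run in isolation. As a minimal sketch of the same fan-out and collect pattern, the following uses hypothetical stand-ins (`ProposalSpec`, `ProposalSummary`, `extractSummary`) that are assumptions for illustration, not names from the real project:

```swift
// Hypothetical stand-ins; the real extractor's types are not shown in this article.
struct ProposalSpec: Sendable {
    let id: Int
}

struct ProposalSummary: Sendable {
    let id: Int
    let title: String
}

// Placeholder for the per-proposal work (fetching and parsing in the real tool).
func extractSummary(for spec: ProposalSpec) async -> ProposalSummary {
    await Task.yield() // a suspension point, standing in for network I/O
    return ProposalSummary(id: spec.id, title: "SE-\(spec.id)")
}

// Fan out one child task per spec, then collect results as they finish.
func extractAll(_ specs: [ProposalSpec]) async -> [ProposalSummary] {
    await withTaskGroup(of: ProposalSummary.self,
                        returning: [ProposalSummary].self) { group in
        for spec in specs {
            group.addTask { await extractSummary(for: spec) }
        }
        var summaries: [ProposalSummary] = []
        for await summary in group {
            summaries.append(summary)
        }
        // Child tasks complete in no guaranteed order, so sort before returning.
        return summaries.sorted { $0.id < $1.id }
    }
}
```

Each `addTask` call in a sketch like this is what the Swift Tasks instrument would record as a separate child task under the group.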
Flame Graphs — A New Way to See Data
The Origin: From Netflix to Xcode
Flame Graphs were born in 2011, created by Brendan Gregg while he was working at Joyent; he later continued developing and popularizing them at Netflix. His problem was simple: linear stack traces were impossible to read when you have thousands of samples. The solution was to stack functions visually, where each box’s width represents the percentage of samples, not chronological time.
This idea was so transformative that it was formally published in Communications of the ACM in 2016 and adopted universally. Apple integrated Flame Graphs into Instruments 16.3 as part of the Processor Trace instrument, though the visualization is also available in the standard Time Profiler.
How to Read a Flame Graph
Unlike the chronological timeline we already know, in a Flame Graph:
- The X axis is not time. It represents 100% of captured samples. A box’s width indicates what percentage of total execution time the CPU spent in that function.
- The Y axis is stack depth. Deeper functions are lower (in Instruments they appear as “icicles” — top to bottom).
- Order is by weight. Instruments places the widest boxes (highest sample percentage) on the left.
- Colors indicate category: blue for your code, purple for libraries, gray for system code, magenta for the Swift runtime.
In a Call Tree, the bottleneck hides among numbers. In a Flame Graph, it’s the widest plateau that jumps out at you.
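To make the reading rules concrete, here is a small sketch of how sampled stacks can be folded into a flame graph. It is a simplified model, not how Instruments is implemented; the `Stack` alias, the `FlameNode` type, and the sample data are all assumptions made for illustration.

```swift
// One sampled call stack, outermost frame first (hypothetical data model).
typealias Stack = [String]

// A flame-graph node: a frame plus the number of samples that passed through it.
final class FlameNode {
    let name: String
    var samples = 0
    var children: [String: FlameNode] = [:]
    init(name: String) { self.name = name }
}

// Merge every sample into a tree. Stacks with the same prefix share nodes,
// so a node's sample count is what determines its box width.
func buildFlameGraph(from stacks: [Stack]) -> FlameNode {
    let root = FlameNode(name: "all")
    for stack in stacks {
        root.samples += 1
        var node = root
        for frame in stack {
            let child = node.children[frame] ?? FlameNode(name: frame)
            node.children[frame] = child
            child.samples += 1
            node = child
        }
    }
    return root
}

// Render as indented text: indentation is stack depth, the percentage is width.
// Siblings are ordered widest first, with ties broken by name.
func render(_ node: FlameNode, total: Int, depth: Int = 0) -> [String] {
    let percent = node.samples * 100 / total
    var lines = [String(repeating: "  ", count: depth) + "\(node.name) \(percent)%"]
    let ordered = node.children.values.sorted {
        $0.samples != $1.samples ? $0.samples > $1.samples : $0.name < $1.name
    }
    for child in ordered {
        lines += render(child, total: total, depth: depth + 1)
    }
    return lines
}

// Four hypothetical samples; frame names are invented for the example.
let samples: [Stack] = [
    ["main", "parse", "visitHeading"],
    ["main", "parse", "visitHeading"],
    ["main", "parse", "visitList"],
    ["main", "fetch"],
]
let graph = buildFlameGraph(from: samples)
let rendered = render(graph, total: graph.samples)
```

Note that nothing in the model stores timestamps: the order in which the samples arrived is discarded, which is why the X axis cannot be read as time.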
How to Access the Flame Graph in Instruments
- Profile your app with Time Profiler or Processor Trace.
- In the detail view below, look for the Graph button in the top-right corner of the Call Tree view.
- Click it — the view instantly switches to the Flame Graph.

Visual Cleanup: Flatten to Boundary Frames
When the Flame Graph is dominated by an external library (like swift-markdown), you can clean up the noise without losing information:
- Control-click on any function from the library.
- Select “Flatten ‘swift-markdown’ to Boundary Frames”.
- All internal functions of that library collapse into a single bar, showing only the entry and exit points.
This is conceptually different from Flatten in the Call Tree (which we covered in Part 3). Here we’re not removing an individual function — we’re collapsing an entire library to its boundaries, so your code stands out above the noise.
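As a rough sketch of what collapsing to boundary frames means for a single stack, consider the function below. It is a simplified model of the idea rather than Instruments' actual algorithm, and the `Frame` type, the library labels, and the symbol names are hypothetical.

```swift
// A frame tagged with the library it belongs to (hypothetical model).
struct Frame: Equatable {
    let library: String
    let symbol: String
}

// Collapse each consecutive run of frames from `library` down to its
// boundary frames: the first one (entry into the library) and the last
// one (where it calls back out, or where the stack ends).
func flattenToBoundaryFrames(_ stack: [Frame], library: String) -> [Frame] {
    var result: [Frame] = []
    var index = 0
    while index < stack.count {
        guard stack[index].library == library else {
            result.append(stack[index])
            index += 1
            continue
        }
        // Find the end of this run of library frames.
        var end = index
        while end + 1 < stack.count, stack[end + 1].library == library {
            end += 1
        }
        result.append(stack[index])
        if end > index { result.append(stack[end]) }
        index = end + 1
    }
    return result
}

// Outermost frame first; all names are invented for the example.
let stack = [
    Frame(library: "extractor", symbol: "main"),
    Frame(library: "Markdown", symbol: "parseDocument"),
    Frame(library: "Markdown", symbol: "parseBlock"),
    Frame(library: "Markdown", symbol: "parseInline"),
    Frame(library: "Markdown", symbol: "walkVisitor"),
    Frame(library: "extractor", symbol: "visitLink"),
]
let flattened = flattenToBoundaryFrames(stack, library: "Markdown")
```

In this example the two interior parser frames disappear, while the entry point, the exit point, and every frame of your own code remain.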

- Bar width = percentage of samples (not chronological time)
- Depth = position in the call stack (deeper = lower)
- Unexpectedly wide bar = potential bottleneck
- Colors: Blue = your code | Purple = libraries | Gray = system | Magenta = runtime
- Flatten to Boundary Frames = collapse library to its boundaries (clean up noise)
- Access: “Graph” button in the top-right corner of the Call Tree
Swift Concurrency Under the Microscope
The Call Tree and Flame Graphs show us where the CPU is spent. But when your app uses Swift Concurrency, there’s an equally important question: what are your tasks doing? How many are alive? How many are actually running? How many are suspended waiting for something?
The Swift Concurrency Template
Instruments includes a dedicated template: Swift Concurrency. When you select it, you get two main instruments:
- Swift Tasks — Tracks the lifecycle of every async task.
- Swift Actors — Monitors exclusive access to actors and their wait queues.
For the extractor, Swift Tasks is our star. When profiling the metadata extraction, the instrument captures every taskGroup.addTask as a new task with a unique identifier.
The Three Key Counters
At the top of the Swift Tasks track, Instruments shows three histograms:
- Running Tasks: How many tasks are executing simultaneously at any given moment. In our extractor, you’ll see spikes when the TaskGroup launches extraction tasks.
- Alive Tasks: How many tasks exist (created but not finalized). The difference between Alive and Running reveals how many tasks are suspended or queued.
- Total Tasks: Cumulative count of tasks created up to that point. Useful for detecting if more tasks are being created than necessary.
Running tells you how many tasks are working. Alive tells you how many exist. The gap between them is the number of tasks that are waiting at that moment, and that waiting is what you should investigate.
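The relationship between the three counters can be sketched by replaying a stream of lifecycle events. This is only a model of the arithmetic, under the assumption that each task emits created, started, suspended, and finished events; the `TaskEvent` and `TaskCounters` types are hypothetical and do not correspond to any Instruments API.

```swift
// Lifecycle events for tasks, in timestamp order (hypothetical model).
enum TaskEvent {
    case created(id: Int)
    case startedRunning(id: Int)
    case suspended(id: Int)
    case finished(id: Int)
}

struct TaskCounters {
    var running = 0  // executing right now
    var alive = 0    // created but not yet finished
    var total = 0    // cumulative number of tasks created
    var waiting: Int { alive - running } // suspended or queued
}

// Replay the events and return the counters as they stood after each one.
func replay(_ events: [TaskEvent]) -> [TaskCounters] {
    var counters = TaskCounters()
    var runningIDs = Set<Int>()
    var history: [TaskCounters] = []
    for event in events {
        switch event {
        case .created:
            counters.alive += 1
            counters.total += 1
        case .startedRunning(let id):
            runningIDs.insert(id)
        case .suspended(let id):
            runningIDs.remove(id)
        case .finished(let id):
            runningIDs.remove(id)
            counters.alive -= 1
        }
        counters.running = runningIDs.count
        history.append(counters)
    }
    return history
}

let history = replay([
    .created(id: 1), .created(id: 2), .created(id: 3),
    .startedRunning(id: 1), .startedRunning(id: 2),
    .suspended(id: 1),
    .startedRunning(id: 3),
    .finished(id: 2),
])
let final = history[history.count - 1]
```

At the end of this sequence one task is running, two are alive, and three were created in total, so exactly one task is waiting.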


Task Summary and Task Forest
Below the histograms, the detail panel offers two key views:
- Task Summary: A table showing how much time each task spent in each state: running, suspended, enqueued. A task with a lot of “Enqueued” time is ready to run but still waiting for its executor, often an actor that is busy with other work.
- Task Forest: A graphical representation of parent-child relationships between tasks. In our extractor, you’ll see the main task (ExtractionJob.run) as the root, with hundreds of child tasks (one per proposal) organized under the TaskGroup.
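The kind of aggregation a Task Summary table performs can be sketched as follows. This is an illustrative model only: the `StateInterval` type, the state names, and the durations are assumptions, and the real instrument derives its data from runtime instrumentation rather than from a list like this.

```swift
enum TaskState { case running, suspended, enqueued }

// One contiguous stretch a task spent in a single state (hypothetical model).
struct StateInterval {
    let taskID: Int
    let state: TaskState
    let milliseconds: Int
}

// Sum the time each task spent in each state: one row per task.
func summarize(_ intervals: [StateInterval]) -> [Int: [TaskState: Int]] {
    var summary: [Int: [TaskState: Int]] = [:]
    for interval in intervals {
        summary[interval.taskID, default: [:]][interval.state, default: 0] += interval.milliseconds
    }
    return summary
}

// Tasks that spent longer queued than actually running are worth a closer look.
func mostlyEnqueued(_ summary: [Int: [TaskState: Int]]) -> [Int] {
    summary
        .filter { $0.value[.enqueued, default: 0] > $0.value[.running, default: 0] }
        .keys
        .sorted()
}

// Invented measurements for two tasks.
let summary = summarize([
    StateInterval(taskID: 1, state: .running, milliseconds: 40),
    StateInterval(taskID: 1, state: .suspended, milliseconds: 10),
    StateInterval(taskID: 2, state: .running, milliseconds: 5),
    StateInterval(taskID: 2, state: .enqueued, milliseconds: 60),
    StateInterval(taskID: 2, state: .running, milliseconds: 5),
])
let suspects = mostlyEnqueued(summary)
```

Here task 2 ran for 10 ms but sat enqueued for 60 ms, which is the pattern to look for when scanning the real table.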
Narrative View: A Task’s Biography
Select any task in the Task Summary and right-click → Pin Track. Instruments adds a dedicated track for that task in the timeline, and the Narrative View appears in the bottom panel.
The Narrative View is like reading a task’s biography:
- Which thread it started running on.
- Why it was suspended (waiting for a continuation, waiting for actor access, etc.).
- How much time it spent in each state.
- If it was waiting for another task, which task that was.
For our extractor, this reveals fascinating patterns: each extraction task starts running briefly to parse the Markdown, suspends waiting for I/O if it needs network data, and resumes to write the result.

- Template: Swift Concurrency (includes Swift Tasks + Swift Actors)
- Running Tasks = tasks executing right now (limited by cores)
- Alive Tasks = tasks created but not finalized
- Total Tasks = historical cumulative
- Task Summary = time per state (running, suspended, enqueued)
- Task Forest = parent-child relationships (structured concurrency)
- Narrative View = complete biography of an individual task
- Pin Track = right-click in Task Summary to pin a task to the timeline
Processor Trace — Going Deeper

In Part 3 we learned about the three profiling levels: Time Profiler (statistical, ~1kHz), CPU Profiler (hardware counters), and Processor Trace (every instruction). We know what Processor Trace is. Now let’s use it.
Hardware Requirements
Processor Trace requires cutting-edge chips:
- Mac with M4 or later
- iPad Pro with M4 or later
- iPhone 16 / iPhone 16 Pro or later
If you don’t have this hardware, don’t worry — you can analyze traces saved by someone on your team who does, on any Mac with Instruments 16.3+.
The Overhead Surprise
Perhaps the most counter-intuitive thing about Processor Trace is its overhead. When you’re recording every instruction executed by every core, you’d expect a brutal performance impact. But Apple reports overhead of only ~1%. The trick is that the hardware stores the information in a dedicated buffer and flushes it to disk asynchronously — without interfering with normal execution.
The real cost isn’t CPU overhead, but data volume: a few seconds of recording in a multi-threaded app can generate gigabytes of information. That’s why Apple recommends keeping recordings short and targeted.
Processor Trace in Action with the Extractor
- Open Instruments and select the Processor Trace template.
- Record 3-5 seconds during metadata extraction.
- Zoom far into the timeline (Option-drag) until individual function calls become visible.
What you’ll see is revealing: where Time Profiler showed thick bars, Processor Trace reveals a mosaic of tiny functions. You can literally see every call to swift_retain and swift_release — the reference counting operations that ARC executes behind the scenes (the ones we studied in Part 2.5).
Deterministic Flame Graph
With Processor Trace active, switch to the Flame Graph view (the Graph button in Call Tree). Now each bar reflects the exact count of instructions and cycles — not a statistical estimate. The difference is subtle but fundamental:
- In Time Profiler’s Flame Graph, a fast function that always executes between two samples might never appear.
- In Processor Trace’s Flame Graph, everything shows up. Nothing escapes.
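The difference between the two views can be illustrated with a small simulation. The numbers are invented for the example: a long function and a 10-microsecond call alternate on a timeline, and a sampler looks at it once per millisecond. The `Invocation` type and both profile functions are hypothetical, and the deterministic view here sums durations, whereas the real instrument works from recorded instructions and cycles.

```swift
// A function invocation on a timeline, in microseconds (hypothetical data).
struct Invocation {
    let name: String
    let start: Int
    let duration: Int
    var end: Int { start + duration }
}

// Statistical view: inspect the timeline every `interval` microseconds and
// count which function is running at that instant.
func sampledProfile(_ timeline: [Invocation], interval: Int, length: Int) -> [String: Int] {
    var hits: [String: Int] = [:]
    for instant in stride(from: 0, to: length, by: interval) {
        if let active = timeline.first(where: { $0.start <= instant && instant < $0.end }) {
            hits[active.name, default: 0] += 1
        }
    }
    return hits
}

// Deterministic view: every invocation contributes its full duration,
// however short it is.
func tracedProfile(_ timeline: [Invocation]) -> [String: Int] {
    var microseconds: [String: Int] = [:]
    for invocation in timeline {
        microseconds[invocation.name, default: 0] += invocation.duration
    }
    return microseconds
}

// Five milliseconds of work: 990 µs of parsing, then a 10 µs release, repeated.
var timeline: [Invocation] = []
for millisecond in 0..<5 {
    let base = millisecond * 1_000
    timeline.append(Invocation(name: "parseBlock", start: base, duration: 990))
    timeline.append(Invocation(name: "swift_release", start: base + 990, duration: 10))
}

let sampled = sampledProfile(timeline, interval: 1_000, length: 5_000)
let traced = tracedProfile(timeline)
```

In this contrived timeline every sample lands inside `parseBlock`, so the sampled profile never sees `swift_release`, while the traced profile attributes 50 µs to it. A real sampler drifts relative to the workload and would catch the short call occasionally, but the example shows why a function that runs between samples can be missing from a statistical view.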
- Hardware: M4 / A18 or later
- Overhead: ~1% (the real cost is data volume, not performance)
- Recommendation: Short recordings (3-5 seconds), targeted at the moment of interest
- Flame Graph: Deterministic — each bar reflects actual instructions executed
- Unique capability: See nanosecond functions (retain/release, destructors, thunks)
- Remote analysis: You can open saved traces on any Mac with Instruments 16.3+
Connecting the Dots
We started this series with buttons and templates. Today we analyzed a real CLI tool with hundreds of concurrent tasks, read its Flame Graphs, audited the lifecycle of its async tasks, and saw nanosecond-level operations with Processor Trace.
The arc has been deliberate: from the interface to the mental model, from the mental model to anatomy, from anatomy to the scientific method, and from the scientific method to the most advanced visualization tools. Each part builds on the one before it.
Tools change, templates get updated, instruments evolve. But the ability to observe, hypothesize, measure, and interpret is permanent. That’s what this series aims to cultivate.
References
- Analyzing CPU usage with the Processor Trace instrument — Apple Documentation — Apple’s official documentation on Processor Trace, including Flame Graphs and Charge/Prune/Flatten operations.
- Visualize and optimize Swift concurrency — WWDC22 — The session where Apple introduces the Swift Tasks instrument and the Narrative View.
- Optimize CPU performance with Instruments — WWDC25 — The most recent session on Processor Trace, Flame Graphs, and CPU Counters.
- Analyze hangs with Instruments — WWDC23 — How to use Instruments to diagnose hangs across all Apple platforms.
- Flame Graphs — Brendan Gregg — The official page of the Flame Graph creator, with philosophy, variants, and tools.
- The Flame Graph — Communications of the ACM — Brendan Gregg’s formal article in ACM about the visualization.
- How to find and fix slow code using Instruments — Paul Hudson (Hacking with Swift) — Paul Hudson’s practical guide to optimizing with Instruments.
- Using Instruments to profile a SwiftUI app — Donny Wals — Donny Wals’ tutorial on profiling with Instruments.
- Xcode Instruments Time Profiler — Antoine van der Lee (AvanderLee) — Antoine van der Lee’s tutorial on effective Time Profiler usage.