Semgrep Architecture: Comprehensive Reference

Semgrep (Semantic Grep) is a multi-language static analysis tool that matches code by its structure, not just its text, using a unified Abstract Syntax Tree and a rich pattern language. This document covers the full internal architecture, from the CLI entry point to taint sink detection.

Table of Contents

  1. Introduction
  2. High-Level Architecture
  3. Component Breakdown
  4. The Full Analysis Pipeline
  5. Target Discovery & Filtering
  6. Rule Parsing & Optimization
  7. Parsing & the Universal AST
  8. The Matching Engine
  9. The Intermediate Language (IL) & CFG
  10. Taint Analysis (Dataflow)
  11. Output & Reporting Pipeline
  12. OSemgrep / RPC Architecture
  13. Key Data Structures

-1. Why?

I have been working as a DevSecOps engineer for nearly four years now. When I started, I had little exposure to tooling like SAST, SCA, or DAST. My mindset was firmly rooted in offensive security. Penetration testing was the goal, the dream.
