Semgrep (Semantic Grep) is a multi-language static analysis tool that matches code by its structure, not just its text, using a unified Abstract Syntax Tree and a rich pattern language. This document covers the full internal architecture, from CLI entry point to taint sink detection.
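To make "structure, not just text" concrete, here is a minimal sketch of the difference. It is not Semgrep's implementation: it uses Python's standard `ast` module as a stand-in for a syntax tree, and the names `SOURCE`, `text_matches`, and `structural_matches` are hypothetical, chosen only for this illustration.

```python
import ast

# Hypothetical sample input: one real call with unusual spacing,
# and one string that merely mentions the call.
SOURCE = '''
result = eval (user_input)  # a real call, unusual spacing
note = "do not call eval(user_input)"  # just text inside a string
'''


def text_matches(source: str) -> list[int]:
    """Line numbers where the literal text 'eval(' appears (grep-style)."""
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if "eval(" in line
    ]


def structural_matches(source: str) -> list[int]:
    """Line numbers of call nodes whose callee is the bare name 'eval'."""
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "eval"
    )


if __name__ == "__main__":
    print("text:", text_matches(SOURCE))
    print("structure:", structural_matches(SOURCE))
```

The text search misses the real call because of the extra space and flags the string literal instead, while the tree walk reports only the actual call. A pattern language over a syntax tree generalizes this idea beyond a single hard-coded function name.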
Table of Contents
- Introduction
- High-Level Architecture
- Component Breakdown
- The Full Analysis Pipeline
- Target Discovery & Filtering
- Rule Parsing & Optimization
- Parsing & the Universal AST
- The Matching Engine
- The Intermediate Language (IL) & CFG
- Taint Analysis (Dataflow)
- Output & Reporting Pipeline
- OSemgrep / RPC Architecture
- Key Data Structures
-1. Why?
I have been working as a DevSecOps engineer for nearly four years now. When I started, I had little exposure to tooling like SAST, SCA, or DAST. My mindset was firmly rooted in offensive security. Penetration testing was the goal, the dream.