Research on RORO's blog

Semgrep Architecture: Comprehensive Reference

contact@rodolpheg.xyz (0xRo) — Wed, 08 Apr 2026 00:00:00 +0000

Semgrep (Semantic Grep) is a multi-language static analysis tool that matches code by its structure – not just its text – using a unified Abstract Syntax Tree and a rich pattern language. This document covers the full internal architecture, from the CLI entry point to taint sink detection.
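To make "structure, not just text" concrete, here is a minimal sketch that contrasts a grep-style substring search with a walk over a syntax tree. It uses Python's standard `ast` module as a stand-in; it is not how Semgrep is implemented, and the helper names `text_matches` and `structural_matches` are invented for this illustration.

```python
import ast

# Sample code under analysis: one real call to eval, plus two decoys.
SOURCE = '''
# eval(user_input) is dangerous   <- comment, not a call
result = eval (
    user_input
)
label = "eval(x)"
'''


def text_matches(source: str, needle: str) -> list[int]:
    """Grep-style search: line numbers whose raw text contains `needle`."""
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if needle in line
    ]


def structural_matches(source: str, func_name: str) -> list[int]:
    """Structural search: line numbers of actual calls to `func_name`."""
    tree = ast.parse(source)
    return [
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == func_name
    ]


TEXT_HITS = text_matches(SOURCE, "eval(")      # hits the comment and the string
AST_HITS = structural_matches(SOURCE, "eval")  # hits only the real call
```

The text search flags the comment and the string literal, and misses the real call because of the space before the parenthesis. The tree walk reports only the call itself, regardless of spacing or line breaks.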

Introduction
- What is SAST?
- History of Semgrep
High-Level Architecture
Component Breakdown
The Full Analysis Pipeline
Target Discovery & Filtering
Rule Parsing & Optimization
Parsing & the Universal AST
The Matching Engine
The Intermediate Language (IL) & CFG
Taint Analysis (Dataflow)
Output & Reporting Pipeline
OSemgrep / RPC Architecture
Key Data Structures

-1. Why?

I have been working as a DevSecOps engineer for nearly four years now. When I started, I had little exposure to tooling like SAST, SCA, or DAST. My mindset was firmly rooted in offensive security. Penetration testing was the goal, the dream.

Understanding Code Property Graphs

contact@rodolpheg.xyz (0xRo) — Tue, 05 Aug 2025 00:00:00 +0000

When I first started developing tools for source code auditing, my primary need was to track tainted data flows through complex codebases during manual code reviews. Initially, I turned to Tree-sitter, which proved excellent for single-file analysis thanks to its fast, incremental parsing. However, as I scaled to larger codebases with complex cross-file dependencies and data flows, Tree-sitter’s AST-only approach became limiting. The challenge wasn’t just parsing individual files. It was understanding how data flows between functions, across modules, and through various execution paths during thorough manual security assessments.

Code auditing 101

contact@rodolpheg.xyz (0xRo) — Sat, 02 Aug 2025 00:00:00 +0000

Topics covered

This post explores the evolution from manual code review to automated security testing, covering:

- The reality of manual code review and its limitations
- Understanding vulnerabilities vs weaknesses
- How SAST tools work under the hood
- Taint analysis and data flow tracking
- Sink-to-source vs source-to-sink methodologies
- Mitigation strategies: whitelisting vs blacklisting
- Dealing with false positives in practice
- Choosing and implementing SAST tools at scale
- The complementary relationship between manual and automated testing
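As a rough illustration of the taint analysis topic, the sketch below tracks data in the source-to-sink direction over straight-line Python code. It is a toy built on assumptions: `run_query` and `escape` are hypothetical names standing in for a sink and a sanitizer, only top-level statements are followed, and real SAST tools work on control-flow graphs and across functions rather than on a flat statement list.

```python
import ast

SOURCES = {"input"}         # assumed: calls returning attacker-controlled data
SINKS = {"run_query"}       # hypothetical sink that must not see tainted data
SANITIZERS = {"escape"}     # hypothetical call whose result is treated as clean


def call_name(node):
    """Return the callee name for simple calls like f(...), else None."""
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        return node.func.id
    return None


def is_tainted(expr, tainted):
    """An expression is tainted if it reads a source or a tainted variable."""
    name = call_name(expr)
    if name in SANITIZERS:
        return False
    if name in SOURCES:
        return True
    if isinstance(expr, ast.Name):
        return expr.id in tainted
    return any(is_tainted(child, tainted) for child in ast.iter_child_nodes(expr))


def find_tainted_sinks(source):
    """Walk top-level statements in order and report sinks fed tainted data."""
    tainted = set()
    findings = []
    for stmt in ast.parse(source).body:
        # Check sinks first, using the taint state before this statement.
        for node in ast.walk(stmt):
            if call_name(node) in SINKS and any(
                is_tainted(arg, tainted) for arg in node.args
            ):
                findings.append((node.lineno, call_name(node)))
        # Then propagate taint through simple assignments.
        if isinstance(stmt, ast.Assign):
            dirty = is_tainted(stmt.value, tainted)
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    if dirty:
                        tainted.add(target.id)
                    else:
                        tainted.discard(target.id)
    return findings


EXAMPLE = '''
raw = input()
query = "SELECT * FROM users WHERE name = " + raw
run_query(query)
clean = escape(raw)
run_query(clean)
'''

FINDINGS = find_tainted_sinks(EXAMPLE)
```

Here only the first `run_query` call is reported, because the second one receives a value that passed through the sanitizer. A sink-to-source approach would start from each `run_query` call and walk the assignments backwards instead.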