PyPi Project

The PyPi Project is an automated threat-intelligence dashboard that inspects newly released Python packages on the Python Package Index (PyPI). By combining static heuristic analysis with multi-stage LLM code review, the scanner acts as an early warning system against supply-chain attacks such as credential theft and malicious code insertion.

Scanner Architecture

The scanner pipeline is built on top of serverless Cloudflare infrastructure, operating in three distinct stages:

  • RSS Feed Monitor: A scheduled cron trigger executes every 15 minutes to crawl the official PyPI RSS feed (https://pypi.org/rss/packages.xml) for newly published packages and versions.
  • Task Backlog Queue: Newly discovered packages are added to a scanning queue table. Packages are downloaded, decompressed, and processed concurrently by workers.
  • D1 Database Storage: The analysis report, containing risk scores, matched heuristics, and LLM text summaries, is stored in a Cloudflare D1 relational database, which serves this frontend dashboard.
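
The first two stages can be illustrated with a minimal Python sketch. It is not the scanner's actual code: the sample feed, the function names extract_package_names and enqueue_new, and the in-memory set standing in for the queue table are all assumptions made for illustration, and the real feed may carry different fields.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like an RSS 2.0 feed; real feed contents will differ.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Newest packages</title>
    <item>
      <title>example-pkg added to PyPI</title>
      <link>https://pypi.org/project/example-pkg/</link>
    </item>
    <item>
      <title>another-pkg added to PyPI</title>
      <link>https://pypi.org/project/another-pkg/</link>
    </item>
  </channel>
</rss>"""


def extract_package_names(feed_xml: str) -> list[str]:
    """Return project names taken from the <link> of each RSS <item>."""
    root = ET.fromstring(feed_xml)
    names = []
    for item in root.iter("item"):
        link = item.findtext("link", default="")
        # Project URLs are assumed to look like https://pypi.org/project/<name>/
        parts = [p for p in link.split("/") if p]
        if "project" in parts and parts.index("project") + 1 < len(parts):
            names.append(parts[parts.index("project") + 1])
    return names


def enqueue_new(names: list[str], already_seen: set) -> list[str]:
    """Keep only names not seen before, mimicking inserts into a queue table."""
    fresh = [n for n in names if n not in already_seen]
    already_seen.update(fresh)
    return fresh
```

Tracking already-seen names matters because a feed polled every 15 minutes will usually repeat entries from the previous poll.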

Heuristic Checks

The static analysis engine inspects package files and metadata for 10 core threat indicators, including:

🛰️ Outbound Network Activity
Flags raw socket connections or HTTP client calls (requests, urllib, httpx, aiohttp) made from package setup scripts.
🧩 Obfuscation Signatures
Flags base64-encoded source blobs, ROT13-encoded strings, dynamic exec() or eval() calls, and compressed bytecode loaded via marshal.
💻 Subprocess & Shell Execution
Flags system command execution via subprocess.Popen or direct shell calls (os.system, os.popen).
🎯 Typosquatting Detection
Compares the package name against the top 5,000 most-downloaded packages on PyPI using Levenshtein distance, to catch names that imitate popular packages.
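
As a rough illustration of how checks like these could work, the sketch below walks a file's syntax tree for flagged calls and network imports, then applies an edit-distance test to the package name. The rule sets, the three-entry POPULAR list (a stand-in for the top-5,000 list), the max_distance cutoff, and all function names are assumptions for illustration; the scanner's real rules are not documented here.

```python
import ast

# Illustrative lists only; not the scanner's actual rule set.
SUSPICIOUS_CALLS = {"eval", "exec", "os.system", "os.popen", "subprocess.Popen"}
NETWORK_MODULES = {"socket", "requests", "urllib", "httpx", "aiohttp"}
POPULAR = ["requests", "numpy", "pandas"]  # stand-in for the top-5,000 list


def dotted_name(node):
    """Render a call target such as os.system as a dotted string, or None."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = dotted_name(node.value)
        return f"{base}.{node.attr}" if base else None
    return None


def scan_source(source: str) -> set[str]:
    """Return flags for suspicious calls and network imports in source."""
    flags = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in SUSPICIOUS_CALLS:
                flags.add(f"call:{name}")
        elif isinstance(node, ast.Import):
            for alias in node.names:
                top = alias.name.split(".")[0]
                if top in NETWORK_MODULES:
                    flags.add(f"import:{top}")
        elif isinstance(node, ast.ImportFrom) and node.module:
            top = node.module.split(".")[0]
            if top in NETWORK_MODULES:
                flags.add(f"import:{top}")
    return flags


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def typosquat_target(name: str, max_distance: int = 1):
    """Return the popular package that name nearly matches, if any."""
    for popular in POPULAR:
        if name != popular and levenshtein(name, popular) <= max_distance:
            return popular
    return None
```

Note that plain Levenshtein distance counts a swap of two adjacent letters as two edits, so a cutoff of 1 would miss transposition typos; a real deployment would need to pick its cutoff with that in mind.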

AI-Assisted Analysis

When heuristic flags exceed the risk threshold, the source code and configuration snippets are routed to a multi-stage AI reasoning agent:

  1. Network & Syscall Audit: The model reviews network endpoints and command-execution targets to judge whether they are consistent with legitimate package functionality.
  2. Payload Deobfuscation: The model decodes base64 sequences and explains what the hidden setup code does.
  3. Security Synthesis: The final agent stage aggregates inputs from the previous phases to provide a cohesive safety verdict.
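
The staged flow above might be wired together roughly as follows. This is a sketch under stated assumptions, not the scanner's implementation: RISK_THRESHOLD is a made-up value, review_package is a hypothetical name, the prompts are placeholders, and ask_model stands for whatever LLM client is actually used (it is passed in as a plain callable so no specific API is implied).

```python
from typing import Callable, Optional

RISK_THRESHOLD = 50  # hypothetical value; the real threshold is not stated


def review_package(risk_score: int, snippets: list[str],
                   ask_model: Callable[[str], str]) -> Optional[dict]:
    """Run the three review stages, or return None if below the threshold."""
    if risk_score <= RISK_THRESHOLD:
        return None  # heuristics alone did not justify an LLM review

    code = "\n\n".join(snippets)

    # Stage 1: network and syscall audit on the flagged snippets.
    audit = ask_model("Audit network and command-execution use:\n" + code)

    # Stage 2: payload deobfuscation on the same snippets.
    decoded = ask_model("Decode and explain any encoded payloads:\n" + code)

    # Stage 3: synthesis sees only the outputs of the earlier stages.
    verdict = ask_model("Give a safety verdict based on:\n"
                        + audit + "\n" + decoded)

    return {"network_audit": audit, "deobfuscation": decoded,
            "verdict": verdict}
```

Injecting ask_model as a parameter keeps the sketch testable with a stub function and leaves the choice of model provider open.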

Understanding Verdicts

Each scanned package receives one of the following safety status verdicts:

  • Safe: No critical heuristics associated with malicious behavior were triggered. Considered safe for developers to install.
  • Suspicious: Contains unusual code constructs (e.g. calling subprocesses in setup) that are not definitively malicious but merit user caution.
  • Malicious: Code exhibits confirmed malicious behavior, such as uploading credentials, stealing tokens, or running shell-command backdoors.
  • Quarantined: The package has been removed from the official PyPI repository by the index security team.