What is HKUDS/DeepCode?
Definition and scope
HKUDS/DeepCode aims to take you from idea to working software.
- Definition: HKUDS/DeepCode is an open-source project that translates natural language and research content into executable code.
- Scope: It integrates three capabilities:
  - Code generation from papers (Paper2Code)
  - Natural language to web interfaces (Text2Web)
  - Backend/service boilerplate (Text2Backend)
Open-source and trending
DeepCode is a fast-growing open-source project aimed at developers and researchers.
- Hosted on GitHub, it has gained traction quickly thanks to its ambitious scope and practical potential.
- Active contributions, clear documentation, and ready-made pipelines help newcomers start experiments quickly.
Why Open Agentic Coding matters
Open Agentic Coding refers to open, agent-driven workflows in which software agents collaborate with humans to design, implement, and test code, so that ideas become working software faster.
- Reduces boilerplate and speeds up early-stage prototyping from papers and ideas to runnable code
  - Agentic assistants propose starter code, templates, and scaffolds for common tasks (data loading, model training, evaluation), eliminating repetitive setup.
  - From concept to prototype: wire together components described in a paper (models, optimizers, datasets) and see quick results.
  - Prototype iteratively: swap in hypotheses, adjust configurations, and test ideas without boilerplate overhead.
- Promotes reproducibility with end-to-end pipelines from theory to implementation
  - Open pipelines capture data, experiments, models, metrics, and results in a single, shareable workflow.
  - From theory to practice: a single end-to-end workflow covers dataset curation, model code, training scripts, evaluation, and logs.
  - Built-in versioning, environment specifications, and configuration management bolster reproducibility and auditing.
In short, Open Agentic Coding cuts boilerplate and delivers transparent, end-to-end workflows that turn ideas into reliable, reproducible software faster.
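The reproducibility point above (versioning, environment specifications, configuration management) can be made concrete with a small sketch. This is not DeepCode's own mechanism. The function names `config_fingerprint` and `build_run_manifest` and the sample settings are hypothetical, chosen only to show how a run's configuration and environment might be recorded together.

```python
import hashlib
import json
import platform
import sys


def config_fingerprint(config: dict) -> str:
    """Return a stable SHA-256 hash of a configuration dictionary."""
    # sort_keys makes the hash independent of key insertion order.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_run_manifest(config: dict) -> dict:
    """Bundle the config, its fingerprint, and basic environment details."""
    return {
        "config": config,
        "config_sha256": config_fingerprint(config),
        "python_version": platform.python_version(),
        "platform": sys.platform,
    }


if __name__ == "__main__":
    # Hypothetical experiment settings, for illustration only.
    manifest = build_run_manifest({"lr": 0.001, "epochs": 10, "seed": 42})
    print(json.dumps(manifest, indent=2))
```

Storing such a manifest next to the results lets a reader check whether two runs used the same settings by comparing a single hash.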
Core components: Paper2Code, Text2Web, and Text2Backend
Paper2Code
Paper2Code turns published research into executable code skeletons, closing the gap between scholarly writing and practical software and speeding up experimentation and prototyping.
- Converts research papers, preprints, and algorithm descriptions into executable code skeletons.
  - It analyzes the narrative, pseudocode, formulas, and method steps in papers and translates them into runnable starting points you can extend and customize.
- Supports multiple languages and well-documented templates to accelerate experimentation.
  - Templates are available in several programming languages, with clear documentation to help you adapt the skeleton to your environment and workflow.
Paper2Code lowers the barrier to turning ideas from papers into working code, enabling rapid iteration and validation across domains such as machine learning, algorithms, and systems research.
Text2Web
Text2Web turns natural-language prompts into ready-to-use web interfaces. It’s a bridge from a description of what you want to a live UI you can open in any browser.
- Transforms natural-language prompts into interactive frontends and dashboards.
- Speeds up UI creation for ML demos, visualizations, and data exploration.
The reliability of such systems hinges on clear mappings from prompts to UI components, sensible defaults, and thoughtful handling of data inputs and privacy.
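To illustrate what a clear mapping from prompts to UI components with a sensible default could look like, here is a deliberately simple sketch. It is not how Text2Web works internally. The keyword table, the component names, and the `prompt_to_ui_spec` function are all hypothetical, and a real system would likely rely on a language model rather than keyword lookup.

```python
from dataclasses import dataclass, field

# Hypothetical keyword-to-component table, for illustration only.
KEYWORD_TO_COMPONENT = {
    "upload": "file_upload",
    "chart": "line_chart",
    "plot": "line_chart",
    "table": "data_table",
    "slider": "range_slider",
}

# Sensible default when the prompt names no known component.
DEFAULT_COMPONENT = "text_output"


@dataclass
class UISpec:
    title: str
    components: list = field(default_factory=list)


def prompt_to_ui_spec(prompt: str, title: str = "Demo") -> UISpec:
    """Map recognized keywords in a prompt to an ordered component list."""
    words = prompt.lower().replace(",", " ").replace(".", " ").split()
    components = []
    for word in words:
        component = KEYWORD_TO_COMPONENT.get(word)
        # Keep first-seen order and avoid duplicates.
        if component and component not in components:
            components.append(component)
    if not components:
        components.append(DEFAULT_COMPONENT)
    return UISpec(title=title, components=components)


if __name__ == "__main__":
    spec = prompt_to_ui_spec("Upload a CSV, then show a chart and a table.")
    print(spec.components)
```

Keeping the mapping in an explicit table makes it easy to audit which prompt terms produce which components.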
Text2Backend
Text2Backend turns natural-language software needs into ready-to-run backend scaffolding. Describe your requirements in plain language, and it generates a runnable server with APIs and data access layers.
- Generates backend services, APIs, and the wiring for endpoints described in plain language.
- Includes authentication boilerplate, data access code, and deployment-ready configurations.
What you typically get and how it helps:
- Automatically generates API routes, controllers, data models, and the connections between layers (API -> business logic -> data access).
- Authentication boilerplate for signup, login, tokens, and protected endpoints.
- Data access boilerplate with ORM/repository patterns, queries, and basic data validation.
- Deployment-ready configurations, including Dockerfiles, containerization notes, and sample CI/CD pipelines.
| Generated Output | Boilerplate & Deployment |
| --- | --- |
| Backend services, APIs, and wiring for endpoints described in plain language | Authentication boilerplate, data access boilerplate, deployment-ready configs |
| Data models and data access wiring | CI/CD manifests, environment configs, containerization |
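The layering mentioned above (API -> business logic -> data access) can be sketched with the standard library alone. This is an assumption about the general shape of such scaffolding, not actual Text2Backend output. The class and function names (`UserRepository`, `UserService`, `handle_signup`) and the `users` table are hypothetical.

```python
import sqlite3


class UserRepository:
    """Data access layer: the only place that talks to the database."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS users ("
            "id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL)"
        )

    def add(self, email: str) -> int:
        cur = self.conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        self.conn.commit()
        return cur.lastrowid

    def find_by_email(self, email: str):
        return self.conn.execute(
            "SELECT id, email FROM users WHERE email = ?", (email,)
        ).fetchone()


class UserService:
    """Business logic layer: validation and rules, no SQL."""

    def __init__(self, repo: UserRepository):
        self.repo = repo

    def register(self, email: str) -> int:
        email = email.strip().lower()
        if "@" not in email:
            raise ValueError("invalid email address")
        if self.repo.find_by_email(email) is not None:
            raise ValueError("email already registered")
        return self.repo.add(email)


def handle_signup(service: UserService, payload: dict) -> tuple:
    """API layer: translate a request payload into (status_code, body)."""
    try:
        user_id = service.register(payload.get("email", ""))
    except ValueError as exc:
        return 400, {"error": str(exc)}
    return 201, {"id": user_id}


if __name__ == "__main__":
    service = UserService(UserRepository(sqlite3.connect(":memory:")))
    print(handle_signup(service, {"email": "ada@example.com"}))
    print(handle_signup(service, {"email": "ada@example.com"}))
    print(handle_signup(service, {"email": "not-an-email"}))
```

Because each layer depends only on the one below it, the repository can be swapped for another data store without touching the handler.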
Workflow and architecture
How data flows through the system
Data moves from input to a runnable product—here’s how the flow unfolds.
- Input: paper URLs or text prompts
  - Sources include paper URLs (with metadata such as title and authors) or natural-language prompts describing the desired features and constraints.
  - Validation and normalization ensure URLs are reachable and prompts stay within scope.
  - Context extraction can fetch relevant abstracts or seed processing with prompt context.
- Processing: parsing inputs, mapping concepts to code templates, and generating scaffolds
  - Parsing: extract requirements, relationships, and constraints from the input.
  - Mapping concepts to code templates: translate requirements into reusable templates such as CRUD scaffolds, authentication flows, and data models.
  - Generating scaffolds: assemble folder structures, boilerplate files, configurations, and initial tests.
- Output: runnable code, UI components, and API endpoints, with tests and documentation
  - Runnable code: a functional codebase that can be built and run locally or in CI.
  - UI components: reusable frontend pieces wired to the generated API and data models.
  - API endpoints with tests and docs: REST or GraphQL endpoints accompanied by test suites and developer-facing documentation.
  - Documentation and tests: auto-generated docs and a suite of unit and integration tests ensuring reliability.
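The processing stage described above (parsing, mapping concepts to templates, generating scaffolds) can be sketched as three small functions. This is a toy illustration, not DeepCode's pipeline. The keyword lists, template contents, file paths, and function names are all hypothetical, and substring matching stands in for real requirement extraction.

```python
import tempfile
from pathlib import Path

# Hypothetical keywords that signal each concept in a prompt.
KEYWORDS = {
    "crud": ("create", "update", "delete", "list"),
    "auth": ("login", "signup", "authentication"),
    "model": ("model", "schema", "table"),
}

# Hypothetical templates: each concept contributes one placeholder file.
TEMPLATES = {
    "crud": {"app/routes.py": "# CRUD route handlers go here\n"},
    "auth": {"app/auth.py": "# signup/login handlers go here\n"},
    "model": {"app/models.py": "# data models go here\n"},
}


def parse_requirements(prompt: str) -> list:
    """Parsing: find which concepts the prompt mentions (substring match)."""
    text = prompt.lower()
    return [c for c, kws in KEYWORDS.items() if any(k in text for k in kws)]


def map_to_files(concepts: list) -> dict:
    """Mapping: turn concepts into a {relative_path: content} plan."""
    files = {
        "README.md": "# Generated scaffold\n",
        "tests/test_smoke.py": "def test_smoke():\n    assert True\n",
    }
    for concept in concepts:
        files.update(TEMPLATES[concept])
    return files


def write_scaffold(root: Path, files: dict) -> list:
    """Generating: create folders and files, returning the paths written."""
    written = []
    for rel_path, content in sorted(files.items()):
        target = root / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")
        written.append(rel_path)
    return written


if __name__ == "__main__":
    prompt = "Users can signup and login, then create and delete notes"
    with tempfile.TemporaryDirectory() as tmp:
        print(write_scaffold(Path(tmp), map_to_files(parse_requirements(prompt))))
```

Separating the plan (`map_to_files`) from the write step makes the scaffold easy to inspect or test before anything touches the disk.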
Typical pipelines
A solid pipeline turns research ideas into reliable software and compelling demos. Here are two practical patterns with clear, actionable steps.
- Paper-to-code pipeline: turn research papers into modular software with unit tests and representative datasets
  - Convert the paper into modular software by identifying core algorithms, data formats, and expected outputs.
  - Develop small, well-scoped units that can be tested in isolation (unit tests).
  - Provide representative datasets that cover common and edge cases to simplify result reproduction.
  - Run tests locally and in continuous integration to catch regressions as the codebase grows.
  - Document reproducible results and include an easy-to-install environment (for example, a requirements file or container setup).
- End-to-end demos that show UI and backend working together
  - Create a minimal backend API that processes data and returns results to the client.
  - Build a lightweight UI that talks to the backend, presents results, and handles errors gracefully.
  - Use a reproducible demo dataset or synthetic data to illustrate the full workflow from input to output.
  - Show the complete flow: UI interactions trigger processing, then results render, letting stakeholders see the system in action.
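For the paper-to-code pattern above, the pairing of a small, well-scoped unit with a representative dataset might look like the following sketch. Min-max normalization stands in for whatever core algorithm a paper describes. The function name, the test cases, and the choice to map a constant input to zeros are assumptions made for illustration.

```python
def min_max_normalize(values):
    """Scale values linearly into [0, 1]; a constant input maps to all zeros."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        # Avoid division by zero when every value is identical.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Representative dataset: common cases first, then edge cases.
CASES = [
    ("typical", [2, 4, 6], [0.0, 0.5, 1.0]),
    ("negative", [-1, 0, 1], [0.0, 0.5, 1.0]),
    ("constant", [3, 3, 3], [0.0, 0.0, 0.0]),
    ("single", [7], [0.0]),
    ("empty", [], []),
]


def test_representative_cases():
    """Check the unit in isolation against every named case."""
    for name, data, expected in CASES:
        assert min_max_normalize(data) == expected, f"case failed: {name}"


if __name__ == "__main__":
    test_representative_cases()
    print("all cases passed")
```

A test function written this way can be collected by a runner such as pytest locally and in continuous integration without changes.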
Getting Started
Installation
Below is a practical Python script that automates installing HKUDS/DeepCode: it clones the repository, creates a virtual environment, installs dependencies, and installs the package in editable mode. Run it from anywhere; it creates a local DeepCode_install folder and prints progress.
```python
import os
import subprocess
import sys
from pathlib import Path


def run(cmd, shell=False):
    """Echo a command, then run it and raise if it fails."""
    print(f"$ {' '.join(cmd) if not isinstance(cmd, str) else cmd}")
    subprocess.run(cmd, shell=shell, check=True)


def main():
    # Work inside ./DeepCode_install relative to the current directory.
    base = Path.cwd() / "DeepCode_install"
    base.mkdir(parents=True, exist_ok=True)
    os.chdir(base)

    # Clone the repository only if it is not already present.
    repo_url = "https://github.com/HKUDS/DeepCode.git"
    project = base / "DeepCode"
    if not project.exists():
        run(["git", "clone", repo_url, "DeepCode"])
    os.chdir(project)

    # Create an isolated virtual environment inside the project.
    venv_dir = project / ".venv"
    if not venv_dir.exists():
        run([sys.executable, "-m", "venv", str(venv_dir)])

    if sys.platform == "win32":
        python_bin = venv_dir / "Scripts" / "python.exe"
    else:
        python_bin = venv_dir / "bin" / "python"
    if not python_bin.exists():
        raise SystemExit("Virtual environment creation failed.")

    # Invoke pip as "python -m pip" so pip can upgrade itself on Windows.
    pip = [str(python_bin), "-m", "pip"]
    run(pip + ["install", "--upgrade", "pip"])

    req = project / "requirements.txt"
    if req.exists():
        run(pip + ["install", "-r", str(req)])

    # Install the package only if the repository ships packaging metadata.
    if (project / "setup.py").exists() or (project / "pyproject.toml").exists():
        run(pip + ["install", "-e", "."])
    else:
        print("No setup.py or pyproject.toml found; skipping package install.")

    print("Installation complete. Activate the virtual environment to use DeepCode.")


if __name__ == "__main__":
    main()
```
Example: Generate code from a paper
This minimal script calls the DeepCode CLI to read a paper (PDF) and emit a Python file as a starting implementation. It assumes a deepcode executable on your PATH that accepts the generate subcommand and the --paper, --lang, and --out flags shown here; confirm these against the CLI's help output for your installed version, and make sure the paper path is correct.
```python
import subprocess
import sys


def generate_code_from_paper(paper_path, output_path="generated_code.py", language="python"):
    """Run the DeepCode CLI on a paper and return the output file path."""
    # Assumes the DeepCode CLI is installed and available as 'deepcode'
    cli = "deepcode"
    args = [
        "generate",
        "--paper", paper_path,
        "--lang", language,
        "--out", output_path,
    ]
    try:
        result = subprocess.run([cli] + args, capture_output=True, text=True)
    except FileNotFoundError:
        # Raised when the executable cannot be found on PATH.
        raise SystemExit(f"'{cli}' was not found on PATH.")
    if result.returncode != 0:
        print("DeepCode failed:\n", result.stderr, file=sys.stderr)
        raise SystemExit(1)
    return output_path


if __name__ == "__main__":
    paper = "papers/algorithm_paper.pdf"
    out = "generated_algorithm.py"
    path = generate_code_from_paper(paper, out, language="python")
    print(f"Generated code saved to {path}")
```
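Since the example depends on the CLI being installed and the paper path being correct, a small preflight check can report both problems before any generation is attempted. This is a sketch under assumptions: the `preflight` helper is hypothetical, and the executable name `deepcode` is taken from the example above rather than verified against the project.

```python
import shutil
from pathlib import Path


def preflight(paper_path: str, cli_name: str = "deepcode") -> list:
    """Return a list of human-readable problems; empty means ready to run."""
    problems = []
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which(cli_name) is None:
        problems.append(f"'{cli_name}' was not found on PATH")
    paper = Path(paper_path)
    if not paper.is_file():
        problems.append(f"paper not found: {paper}")
    elif paper.suffix.lower() != ".pdf":
        problems.append(f"expected a .pdf file, got: {paper.name}")
    return problems


if __name__ == "__main__":
    for problem in preflight("papers/algorithm_paper.pdf"):
        print("Problem:", problem)
```

Collecting every problem in a list, rather than stopping at the first, lets the user fix the environment in one pass.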
