An open-source tool that reads through the noise of scanned lecture notes and rewrites them as clean, modern LaTeX documents. Made by a student, for students.
§ 01 · WHY THIS EXISTS
University courses are full of brilliant lecturers whose course materials haven't aged well: scanned photocopies of handwritten notes, pages photographed with CamScanner, image-only PDFs with no text layer, no copy-paste, no search. Dense physics, statics, fluid mechanics, tensor notation — locked inside blurry pixels.
Generic OCR doesn't work. Tesseract chokes on ∂²u/∂x². Google Docs mangles integrals. There's nothing built for STEM content that doesn't cost a fortune or require a Mathpix subscription.
Palimpsest is the workaround. Drop a scanned PDF, get a clean .tex and a compiled .pdf back: readable, searchable, printable, hand-in-able. That's the whole promise.
§ 02 · BEFORE & AFTER
A typical page from a 1980s mechanics course, before and after Palimpsest:
Equations are typeset properly. Sections are numbered. The Table of Contents is generated. Every figure reference points somewhere real. Open it in Overleaf, edit, hand in.
§ 03 · HOW IT WORKS
Each page goes through a chain of small, replaceable stages. If one fails, the rest keep going; if you stop midway, you can resume from the last cached page.
Every page of the PDF is rasterised at 400 DPI so even the smallest indices stay legible.
Adaptive binarisation kills the photocopy grain, Hough deskew straightens the page, denoising smooths the rest.
A vision LLM reads the page image directly — formulas, Greek letters, integrals, indices — and emits a first LaTeX pass.
A small YAML ledger tracks variables, conventions and section structure so notation on page 3 still makes sense on page 50.
A post-pass scrubs the patterns that break Overleaf: stray code fences, banned macros, orphan TikZ, unbalanced math.
All pages are assembled into one document with a proper preamble, cover page and TOC, then compiled with xelatex.
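To make the post-pass stage a little more concrete, here is a minimal Python sketch of that kind of scrubbing. It is an illustration, not Palimpsest's actual code: the function names, the BANNED_MACROS list and the exact checks are assumptions invented for this example, and the math check only counts unescaped dollar signs.

```python
import re
from typing import List, Tuple

FENCE = "`" * 3  # the Markdown code-fence marker LLMs often wrap their output in

# Hypothetical ban list for illustration; the real list is not documented here.
BANNED_MACROS = ("\\write18", "\\input")

FENCE_LINE = re.compile(r"^\s*" + re.escape(FENCE) + r"[\w-]*\s*$")
UNESCAPED_DOLLAR = re.compile(r"(?<!\\)\$")


def strip_code_fences(tex: str) -> str:
    """Drop lines that consist only of a code fence, with or without a language tag."""
    kept = [line for line in tex.splitlines() if not FENCE_LINE.match(line)]
    return "\n".join(kept)


def sanitize(tex: str) -> Tuple[str, List[str]]:
    """Return the cleaned LaTeX plus a list of problems that still need attention."""
    cleaned = strip_code_fences(tex)
    problems: List[str] = []

    # An odd number of unescaped '$' means some inline or display math never closes.
    if len(UNESCAPED_DOLLAR.findall(cleaned)) % 2 != 0:
        problems.append("unbalanced inline math")

    # Match the macro name exactly, so \input does not trigger on \inputencoding.
    for macro in BANNED_MACROS:
        if re.search(re.escape(macro) + r"(?![A-Za-z])", cleaned):
            problems.append("banned macro: " + macro)

    # A tikzpicture that opens without closing (or the reverse) is an orphan.
    opened = cleaned.count("\\begin{tikzpicture}")
    closed = cleaned.count("\\end{tikzpicture}")
    if opened != closed:
        problems.append("orphan tikzpicture")

    return cleaned, problems
```

In this sketch the function only reports problems rather than repairing them, so a later step (or a human) decides whether to retry the page or patch it by hand.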
§ 04 · WHAT YOU GET
Proper \section{}, \begin{equation}, \begin{tikzpicture}. Edit it like any other LaTeX document.
Every output ships with a typeset titlepage (title, subject, author, credit) so you can hand it in as-is.
Interrupted runs resume exactly where they stopped. You don't pay for the same page twice.
A notation defined on page 4 is still understood on page 47 — variables and conventions persist.
Every document is logged. Come back days later and the archive is still there with download links.
The output compiles with both xelatex and pdflatex via an iftex conditional.
Eight models across OpenAI and Anthropic. Use o4-mini for cheap-and-good, Claude Opus for hard pages.
Vision-direct mode lets the LLM OCR straight from images — no extra subscription, no extra API key.
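The resume behaviour described above boils down to a per-page cache. The sketch below shows one way it could work, assuming a cache keyed by a hash of the page image and the model name. The names page_key, transcribe_pages and the transcribe callback are hypothetical and not taken from the Palimpsest codebase.

```python
import hashlib
from pathlib import Path
from typing import Callable, List


def page_key(page_image: bytes, model: str) -> str:
    """Build a cache key from the model name and the raw page bytes."""
    digest = hashlib.sha256()
    digest.update(model.encode("utf-8"))
    digest.update(b"\0")  # separator so model name and image bytes cannot blur together
    digest.update(page_image)
    return digest.hexdigest()


def transcribe_pages(
    pages: List[bytes],
    model: str,
    cache_dir: Path,
    transcribe: Callable[[bytes, str], str],
) -> List[str]:
    """Transcribe each page once; pages already in the cache cost nothing."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    results: List[str] = []
    for image in pages:
        entry = cache_dir / (page_key(image, model) + ".tex")
        if entry.exists():
            # Finished in an earlier (possibly interrupted) run: reuse it.
            results.append(entry.read_text(encoding="utf-8"))
            continue
        tex = transcribe(image, model)  # the paid LLM call would happen here
        # Write to a temporary file first so a crash never leaves a half-written entry.
        partial = entry.with_suffix(".tmp")
        partial.write_text(tex, encoding="utf-8")
        partial.replace(entry)
        results.append(tex)
    return results
```

Because the model name is part of the key, switching models for a rough page would re-run only that page under the new model instead of reusing the old transcription.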
§ 05 · FAQ
With o4-mini (the default), roughly $0.04 per page. A 50-page lecture costs around $2. A 300-page textbook chapter, around $12. Costs vary with page complexity and image size: heavy figures and dense equations cost more than plain text.
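The arithmetic behind those figures is a flat per-page rate. Here is a small Python sketch, assuming the rough $0.04 per page quoted for o4-mini; the function name is made up for the example, and real costs will drift with page complexity.

```python
COST_PER_PAGE_USD = 0.04  # rough o4-mini figure from the text, not a guaranteed price


def estimate_cost(pages: int, cost_per_page: float = COST_PER_PAGE_USD) -> float:
    """Return a ballpark cost in USD for a document with the given page count."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return round(pages * cost_per_page, 2)


for pages in (50, 300):
    print(f"{pages} pages: about ${estimate_cost(pages):.2f}")
# 50 pages: about $2.00
# 300 pages: about $12.00
```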
Your file is uploaded to the server and sent to the chosen LLM provider (OpenAI or Anthropic); the output is stored locally on the server. Uploads are auto-purged after PALIMPSEST_UPLOAD_RETENTION_DAYS (default: 7 days). The hosted instance is for personal and educational use. If you're working on something sensitive, self-host it. The whole codebase is on GitHub.
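For anyone self-hosting, the retention rule could look something like the Python sketch below. Only the PALIMPSEST_UPLOAD_RETENTION_DAYS variable and its 7-day default come from the text above. The function names, the flat upload directory and the use of file modification time as the age of an upload are assumptions for illustration.

```python
import os
import time
from pathlib import Path
from typing import List, Optional

DEFAULT_RETENTION_DAYS = 7  # default stated in the text
SECONDS_PER_DAY = 86400


def retention_days() -> int:
    """Read the retention period from the environment, falling back to the default."""
    raw = os.environ.get("PALIMPSEST_UPLOAD_RETENTION_DAYS")
    return int(raw) if raw is not None else DEFAULT_RETENTION_DAYS


def purge_old_uploads(upload_dir: Path, now: Optional[float] = None) -> List[Path]:
    """Delete files older than the retention period and return the removed paths."""
    current = time.time() if now is None else now
    cutoff = current - retention_days() * SECONDS_PER_DAY
    removed: List[Path] = []
    for path in sorted(upload_dir.iterdir()):
        # Assumption: the modification time stands in for the upload time.
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

A job like this would typically run on a schedule (a daily cron entry, for instance), which is again a guess about deployment rather than a description of the hosted instance.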
STEM content is full of equations, Greek letters, indices, integrals, and matrix notation. LaTeX is the only output format that renders all of that correctly without compromise. As a bonus, your hand-in will look like it was typeset by a publisher.
The pipeline degrades gracefully: even badly damaged scans usually produce a readable first pass, sometimes with a few transcription errors. Use the slower but more accurate claude-opus or gpt-4.1 for rough scans. The page-by-page cache means you can re-process individual pages without restarting the whole run.
Yes, you can self-host. Clone the repo, add your API key to config.yaml, and run python server.py. There's also a Dockerfile and a docker-compose.yml for a one-command deploy. See the README on GitHub for the full instructions.
The pipeline is currently optimised for French STEM content (because that's what I needed it for as a student), but it works with any Latin-script language the underlying LLM understands — English, Spanish, Italian, German, etc. Multi-language UX is on the roadmap.
Palimpsest was built by Abdullah Camur, a student who got fed up with photocopied lecture notes. No company, no startup, no monetisation. Just a tool that exists because it had to. Source on GitHub, MIT licensed.
Drop a PDF on the workshop page. The pipeline takes a few minutes per document: you'll get a live progress feed while it runs, and a clean .tex plus .pdf at the end.