Building an OCR / Document AI pipeline used to be hard work:
- Training OCR models
- Building multi-step systems (e.g. line segmentation, layout detection, table detection)
- Ensuring that these steps work nicely with each other
- Adding heuristics and special cases
- Doing a lot of testing to ensure your baseline is good enough
Now, you can just give a PDF to Gemini Flash and get a 'good enough' output in one API call. What used to take weeks / months can now be done in hours.
With AI models becoming more capable, you can often get very far on your problem statement if you just try. But if you are stuck in an old mindset and think of some problems as difficult or time-consuming, you may not even attempt harder problems!
I don't think this realisation has sunk in yet. We still follow old patterns of behaviour, and we still mostly try to build the same things as in the past.
Here's another recent example: I needed to parse something reliably. The 'right way' to do this is to write a tokeniser / parser, but this is time-consuming. In the past, I would have just relied on some basic regexes to get it working (and fixed edge cases over time as they came up).
This time though, I decided to do it the 'right way'. I worked with an AI coding agent and got a tokeniser and recursive descent parser working in around an hour. The AI wasn't perfect: I definitely needed to know the theory and know when to give the right inputs. Even though this wasn't automatic, I ended up with at least a 10x productivity boost.
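For readers who haven't seen the pattern, here is a minimal sketch of a tokeniser plus recursive descent parser. The input format in the story above isn't specified, so this assumes a toy arithmetic-expression grammar purely for illustration; the names `tokenise`, `Parser`, and `evaluate` are hypothetical and not taken from the actual project.

```python
import re

# Toy grammar (an assumption for illustration):
#   expr   := term (("+" | "-") term)*
#   term   := factor (("*" | "/") factor)*
#   factor := NUM | "-" factor | "(" expr ")"
TOKEN_RE = re.compile(r"(?P<NUM>\d+(?:\.\d+)?)|(?P<OP>[-+*/()])|(?P<WS>\s+)")


def tokenise(source):
    """Turn the input string into a list of (kind, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "WS":  # whitespace is skipped, not emitted
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    tokens.append(("EOF", ""))  # sentinel so the parser never runs off the end
    return tokens


class Parser:
    """One method per grammar rule; each consumes tokens and returns a value."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def advance(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def parse(self):
        value = self.expr()
        if self.peek()[0] != "EOF":
            raise SyntaxError(f"unexpected token {self.peek()[1]!r}")
        return value

    def expr(self):
        value = self.term()
        while self.peek()[1] in ("+", "-"):
            op = self.advance()[1]
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek()[1] in ("*", "/"):
            op = self.advance()[1]
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        kind, text = self.advance()
        if kind == "NUM":
            return float(text)
        if text == "-":  # unary minus
            return -self.factor()
        if text == "(":
            value = self.expr()
            if self.advance()[1] != ")":
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError(f"unexpected token {text!r}")


def evaluate(source):
    return Parser(tokenise(source)).parse()
```

Calling `evaluate("2 * (3 + 4) - 5")` walks the grammar top-down and respects operator precedence, which is the kind of thing a pile of regexes tends to get wrong. A real parser would usually build a syntax tree instead of evaluating directly, but the structure is the same.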
The hard lesson for me personally was:
- I needed to let go of my preconceived notion that a task has to take days / weeks.
- I had to decide to build something ambitious.
I need to keep reminding myself that hard things are now easy, and that the impossible may actually be possible.
I think this is a problem where younger folks have an edge over more experienced people: with experience comes caution, but now is the time to let go of caution and just build.
To everyone reading:
- I suggest partnering with one of the current frontier models
- Use Cursor, or Claude Code, or Windsurf, or v0, or Lovable, or anything else you want
- Try to build something ambitious today!