I'm a solo founder bootstrapping from Cameroon. No VC. No team. No AWS bill I can expense.
Just me, a 6GB ARM box on Oracle Cloud Free Tier, and a product safety database that grew
to 1.7 million rows before I had any real infrastructure to handle it.
The platform is called LemonKnows — it tracks government recalls, appliance defects,
and ingredient safety data across 41,000+ global recall records. Think "can I trust this
product before I buy it?" but actually fast and actually free for users.
Here are the three architectural decisions that kept the whole thing from falling over.
Early on, every category page triggered this:
SELECT COUNT(*) FROM products WHERE category = 'automotive' AND status = 'active';
On 1.7M rows, PostgreSQL's MVCC means full sequential scans. Under any real load, this
stalled the server for up to 45 seconds.
The fix was brutal in its simplicity — I stopped asking the database to count things at
query time. I built a stats_cache table: a flat key-value store of pre-aggregated
counters. Every time a background ingestion job runs or a product status changes, it
increments or decrements the relevant counter.
Front-end requests now do a 2ms primary key lookup instead of a 45-second scan:
SELECT total_count FROM stats_cache WHERE key = 'products_automotive_active' LIMIT 1;
The tradeoff: counts are eventually consistent, not real-time. For a consumer safety
product, that's completely fine.
I was indexing category, status, and score separately. Completely useless when your
actual query filters by all three simultaneously and then orders by score.
Dropped the single-column noise. Deployed one targeted composite index:
CREATE INDEX idx_products_cat_status_score
ON products (category, status, score DESC);
Column order matters. It mirrors the exact execution sequence of the WHERE and
ORDER BY clauses, which lets the query planner skip sequential scans entirely and go
straight to index-only lookups.
LemonKnows serves 7 languages. Dynamically calling an LLM API on every international
request would have destroyed both latency and budget immediately.
Instead, I built a translation proxy that treats localization as a content-addressed cache.
The flow:
ugc_translations where hash = SHA256 and lang = 'de'Every string gets translated exactly once. After that, it's free to serve to any
international user or search engine crawler forever.
1.7M products. 41,000 recalls. 7 languages. One free VM. 66ms load times.
No magic. Just aggressive caching, the right indexes, and refusing to pay for things that
don't need to cost money.
We are officially launching on Product Hunt this Tuesday — but more than the launch,
I'm genuinely curious how other people are solving these problems on constrained
infrastructure.
Brutal architectural roasts welcome. I've made plenty of mistakes and there's still a
lot I'd love to pressure-test.
The $0 budget isn't a limitation — it's a forcing function. Most founders over-provision because AWS credits or VC money make it easy. The real skill is knowing what your system actually needs vs what's comfortable.
The COUNT(*) -> stats_cache pattern is the right call. I'd add: same logic applies to the translation cache. Most teams would build this with a Redis cluster and a translation service. You did it with a hash lookup in Postgres. Same result, zero complexity.
Curious how you handle cache invalidation for the stats_cache when a bulk ingestion runs. Do you batch update or recalculate on a schedule?
That's seriously impressive.
I'm curious—what ended up being the hardest scaling problem that nobody talks about?
Honestly? Content fragmentation at scale.
The database was fast. The data was a mess.
Ingesting from wildly inconsistent government sources — CPSC, EU RAPEX, Health Canada — each with completely different schemas, different naming conventions for the same product, different recall severity classifications. Normalizing that into something a user can actually trust is the unglamorous work nobody writes architecture posts about.
Infrastructure scaling is a solved problem. Data quality scaling at 1.7M rows is mostly just you, alone, making judgment calls at 2am.
Impressive
Thanks
What stood out wasn't the architecture.
It was the tradeoffs you were willing to make.
Eventually consistent counts, cached translations, precomputed data... all of them sacrifice something in exchange for scale.
I'd be curious which compromise ended up feeling uncomfortable at first but later turned out to be the right decision. Those are usually the choices that shape the product more than the infrastructure itself.
Honestly, the COUNT(*) one was the hardest to make peace with.
There's something psychologically uncomfortable about showing a user a number you know is slightly stale. It felt like lying, even when the delta was maybe 3-4
products off on a 1.7M row dataset. Statistically irrelevant. Emotionally? It bugged
me for weeks.
What flipped it for me was reframing the question. I stopped asking "is this number
accurate?" and started asking "does this number need to be accurate for the user to
make a safe decision?"
On a consumer safety platform, someone checking whether a car seat has recalls doesn't need a live count of how many automotive products exist in the database. They need the recall data to be accurate. That's where real-time precision actually matters — and that's exactly where I didn't cut corners.
So the compromise ended up clarifying the product's actual contract with its users.
Not "everything here is live and precise" but "the safety data is precise, the
surrounding metadata is fast." That distinction shaped how I think about every
subsequent tradeoff.
You're right that it's rarely the infrastructure choices that define a product. It's
what you decide is allowed to be imperfect.
That's actually close to what caught my attention.
I don't think it's really an infrastructure question anymore.
Happy to share the fuller thought if useful. What's the best email to reach you on?
Thanks, I really appreciate it. You can reach me at [email protected].
Sent over the fuller context by email.
I am really looking forward to your feedback on how I can make this system even better.