soft-shell crabdifferent species of crabvietnamese mud crab
3
10 Comments

How I Run a 1.7M Product Search Engine at 66ms on a $0 Hosting Budget

I'm a solo founder bootstrapping from Cameroon. No VC. No team. No AWS bill I can expense.

Just me, a 6GB ARM box on Oracle Cloud Free Tier, and a product safety database that grew
to 1.7 million rows before I had any real infrastructure to handle it.

The platform is called LemonKnows — it tracks government recalls, appliance defects,
and ingredient safety data across 41,000+ global recall records. Think "can I trust this
product before I buy it?"
but actually fast and actually free for users.

Here are the three architectural decisions that kept the whole thing from falling over.

1. I Killed COUNT(*) Entirely

Early on, every category page triggered this:

SELECT COUNT(*) FROM products WHERE category = 'automotive' AND status = 'active';

On 1.7M rows, PostgreSQL's MVCC means full sequential scans. Under any real load, this
stalled the server for up to 45 seconds.

The fix was brutal in its simplicity — I stopped asking the database to count things at
query time. I built a stats_cache table: a flat key-value store of pre-aggregated
counters. Every time a background ingestion job runs or a product status changes, it
increments or decrements the relevant counter.

Front-end requests now do a 2ms primary key lookup instead of a 45-second scan:

SELECT total_count FROM stats_cache WHERE key = 'products_automotive_active' LIMIT 1;

The tradeoff: counts are eventually consistent, not real-time. For a consumer safety
product, that's completely fine.

2. Composite Indexes — Not Single-Column Ones

I was indexing category, status, and score separately. Completely useless when your
actual query filters by all three simultaneously and then orders by score.

Dropped the single-column noise. Deployed one targeted composite index:

CREATE INDEX idx_products_cat_status_score
ON products (category, status, score DESC);

Column order matters. It mirrors the exact execution sequence of the WHERE and
ORDER BY clauses, which lets the query planner skip sequential scans entirely and go
straight to index-only lookups.

3. A Zero-Cost Multi-Language AI Translation Layer

LemonKnows serves 7 languages. Dynamically calling an LLM API on every international
request would have destroyed both latency and budget immediately.

Instead, I built a translation proxy that treats localization as a content-addressed cache.
The flow:

  1. Incoming request for German text → generate SHA-256 hash of the English source string
  2. Look up ugc_translations where hash = SHA256 and lang = 'de'
  3. Cache hit → return local text instantly. API cost: $0
  4. Cache miss → background worker calls Gemini 2.5 Flash Lite, writes result to DB,
    returns to user

Every string gets translated exactly once. After that, it's free to serve to any
international user or search engine crawler forever.

The Result

1.7M products. 41,000 recalls. 7 languages. One free VM. 66ms load times.

No magic. Just aggressive caching, the right indexes, and refusing to pay for things that
don't need to cost money.

We are officially launching on Product Hunt this Tuesday — but more than the launch,
I'm genuinely curious how other people are solving these problems on constrained
infrastructure.

My Actual Questions for IH

  • How are you handling heavy aggregation queries without letting your DB become the
    bottleneck? Async cache invalidation like this, or something smarter?
  • What's your translation strategy for multi-language products? LLM-backed, static files,
    or a paid service?
  • Anyone else running production workloads on Oracle Free Tier? What's been your breaking
    point?

Brutal architectural roasts welcome. I've made plenty of mistakes and there's still a
lot I'd love to pressure-test.

lemonknows.com

on June 24, 2026
  1. 1

    That's seriously impressive.

    I'm curious—what ended up being the hardest scaling problem that nobody talks about?

    1. 1

      Honestly? Content fragmentation at scale.
      The database was fast. The data was a mess.
      Ingesting from wildly inconsistent government sources — CPSC, EU RAPEX, Health Canada — each with completely different schemas, different naming conventions for the same product, different recall severity classifications. Normalizing that into something a user can actually trust is the unglamorous work nobody writes architecture posts about.
      Infrastructure scaling is a solved problem. Data quality scaling at 1.7M rows is mostly just you, alone, making judgment calls at 2am.

  2. 1

    What stood out wasn't the architecture.

    It was the tradeoffs you were willing to make.

    Eventually consistent counts, cached translations, precomputed data... all of them sacrifice something in exchange for scale.

    I'd be curious which compromise ended up feeling uncomfortable at first but later turned out to be the right decision. Those are usually the choices that shape the product more than the infrastructure itself.

    1. 1

      Honestly, the COUNT(*) one was the hardest to make peace with.

      There's something psychologically uncomfortable about showing a user a number you know is slightly stale. It felt like lying, even when the delta was maybe 3-4
      products off on a 1.7M row dataset. Statistically irrelevant. Emotionally? It bugged
      me for weeks.

      What flipped it for me was reframing the question. I stopped asking "is this number
      accurate?" and started asking "does this number need to be accurate for the user to
      make a safe decision?"

      On a consumer safety platform, someone checking whether a car seat has recalls doesn't need a live count of how many automotive products exist in the database. They need the recall data to be accurate. That's where real-time precision actually matters — and that's exactly where I didn't cut corners.

      So the compromise ended up clarifying the product's actual contract with its users.
      Not "everything here is live and precise" but "the safety data is precise, the
      surrounding metadata is fast." That distinction shaped how I think about every
      subsequent tradeoff.

      You're right that it's rarely the infrastructure choices that define a product. It's
      what you decide is allowed to be imperfect.

      1. 1

        That's actually close to what caught my attention.

        I don't think it's really an infrastructure question anymore.

        Happy to share the fuller thought if useful. What's the best email to reach you on?

        1. 1

          Thanks, I really appreciate it. You can reach me at [email protected].

          1. 1

            Sent over the fuller context by email.

  3. 1

    I am really looking forward to your feedback on how I can make this system even better.

Trending on Indie Hackers
Priorities for launching a SaaS solo, with no budget User Avatar 228 comments I built a tool directory that doesn't pretend every founder has the same needs User Avatar 55 comments AI helped me ship faster. Then I forgot what my product actually does. User Avatar 33 comments I thought picking a voice for my app would take a day. It rebuilt everything. User Avatar 14 comments Most early-stage SaaS companies miss churn signals — here’s how to catch them early User Avatar 8 comments