When Google took over the search scene years ago, they did something that the likes of Dogpile, Metacrawler, AltaVista and Yahoo didn't. They didn't have a better-looking website that users preferred, better customer service, or even a better social media presence with cat memes. They had quicker lookup times and more relevant results.
Clearly the technology was the edge. But how did they do it? Well, they built it. They built a non-SQL, non-relational kind of database - an inverted index.
Inverted indexes work by storing data based on its content. Documents get tokenized and those tokens get indexed. When you want all documents matching, say, "car", the index already has that term mapped for quick lookup. This differs from relational databases, which have to scan through the rows to find matches (SQL databases do have indexes, but a typical column index doesn't compete with a proper inverted index for this kind of search).
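To make that concrete, here's a rough sketch of the idea in Java (illustrative only, nothing to do with any real engine's internals): each token maps to the set of document IDs containing it, so a lookup for "car" is a single map access rather than a scan.

```java
import java.util.*;

// Minimal inverted index sketch: token -> set of document IDs.
public class TinyInvertedIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Tokenize a document and record which doc ID each token appears in.
    public void addDocument(int docId, String text) {
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) continue;
            postings.computeIfAbsent(token, t -> new HashSet<>()).add(docId);
        }
    }

    // Lookup is a single map access - no scan over the documents themselves.
    public Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptySet());
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.addDocument(1, "The red car drove past");
        index.addDocument(2, "A car and a bike");
        index.addDocument(3, "Nothing relevant here");
        System.out.println(index.search("car")); // doc IDs 1 and 2
    }
}
```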
Several years later an open source product called Lucene was released and it was brilliant. You could look up terms in sub-second time, and honestly, why would you use SQL for this any more? I used Lucene a lot (and still do) and it was great for most situations, but when the number of documents got past a few million and real-time indexing was constant, it would slow down.
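For anyone who hasn't used it, this is roughly what indexing and searching with Lucene looks like (a sketch against the classic API; exact class names such as ByteBuffersDirectory vary between Lucene versions):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class LuceneSketch {
    public static void main(String[] args) throws Exception {
        Directory dir = new ByteBuffersDirectory(); // in-memory; use FSDirectory for disk
        StandardAnalyzer analyzer = new StandardAnalyzer();

        // Index a document.
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
            Document doc = new Document();
            doc.add(new TextField("body", "the red car drove past", Field.Store.YES));
            writer.addDocument(doc);
        }

        // Search it back.
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            Query query = new QueryParser("body", analyzer).parse("car");
            TopDocs hits = searcher.search(query, 10);
            System.out.println("hits: " + hits.totalHits);
        }
    }
}
```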
So I thought about making my own. You could split/shard Lucene, but I figured optimising on a single machine would be very beneficial and cost efficient. I dug through the Lucene code and found the relevancy scorer - basically a way to rank how good a match a search phrase is for each particular document. I then set off on a journey deep into file systems, IO buffering, compression, serialization, code testing and so on.
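Lucene's classic scorer is TF-IDF based, and the core idea fits in a few lines. This is a stripped-down illustration of that kind of relevancy score (my own sketch, not Lucene's actual scorer or BFR's): frequent occurrences in a document push the score up, while terms that appear everywhere are worth less.

```java
import java.util.*;

// Stripped-down TF-IDF style relevancy scorer.
public class TfIdfSketch {

    // tf: how often the term occurs in the document, dampened with a square root.
    static double termFrequency(String term, List<String> docTokens) {
        long count = docTokens.stream().filter(term::equals).count();
        return Math.sqrt(count);
    }

    // idf: terms that appear in few documents are worth more.
    static double inverseDocFrequency(String term, List<List<String>> allDocs) {
        long docsWithTerm = allDocs.stream().filter(d -> d.contains(term)).count();
        return 1.0 + Math.log((double) allDocs.size() / (docsWithTerm + 1));
    }

    // Score a document against a multi-term query by summing tf * idf per term.
    static double score(List<String> query, List<String> doc, List<List<String>> allDocs) {
        double total = 0.0;
        for (String term : query) {
            total += termFrequency(term, doc) * inverseDocFrequency(term, allDocs);
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
            List.of("red", "car", "fast", "car"),
            List.of("slow", "bike"),
            List.of("car", "review"));
        List<String> query = List.of("fast", "car");
        for (List<String> doc : docs) {
            System.out.println(doc + " -> " + score(query, doc, docs));
        }
    }
}
```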
Java has the APIs for building just about anything, and I spent a long time researching and testing various IO classes for the best performance. Overall I wanted something that could handle many reads/writes a second. I looked into disk performance (solid state was king) and file system types (ReiserFS was king, although it wasn't well supported). For those who aren't aware, a file system is itself a database, and modern databases are abstractions built on top of filesystems. I eventually had a fair idea of the design of the index and started to code.
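The kind of testing involved is easy to reproduce. Here's a rough micro-benchmark along the lines of what I mean, comparing an unbuffered stream, a buffered stream and an NIO FileChannel (illustrative only - a serious measurement needs warm-up runs, repeats and fsync control):

```java
import java.io.*;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.*;

// Rough comparison of three common Java write paths.
public class IoComparison {
    static final int RECORDS = 100_000;
    static final byte[] RECORD = "token:docId:position\n".getBytes();

    interface Writer { void run(Path path) throws IOException; }

    public static void main(String[] args) throws IOException {
        time("unbuffered stream", IoComparison::unbufferedStream);
        time("buffered stream", IoComparison::bufferedStream);
        time("file channel", IoComparison::fileChannel);
    }

    static void time(String label, Writer writer) throws IOException {
        Path path = Files.createTempFile("io-test", ".dat");
        long start = System.nanoTime();
        writer.run(path);
        System.out.printf("%s: %d ms%n", label, (System.nanoTime() - start) / 1_000_000);
        Files.delete(path);
    }

    static void unbufferedStream(Path path) throws IOException {
        try (OutputStream out = new FileOutputStream(path.toFile())) {
            for (int i = 0; i < RECORDS; i++) out.write(RECORD);
        }
    }

    static void bufferedStream(Path path) throws IOException {
        try (OutputStream out = new BufferedOutputStream(new FileOutputStream(path.toFile()), 1 << 16)) {
            for (int i = 0; i < RECORDS; i++) out.write(RECORD);
        }
    }

    static void fileChannel(Path path) throws IOException {
        ByteBuffer buf = ByteBuffer.allocateDirect(1 << 16);
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.WRITE)) {
            for (int i = 0; i < RECORDS; i++) {
                if (buf.remaining() < RECORD.length) { buf.flip(); ch.write(buf); buf.clear(); }
                buf.put(RECORD);
            }
            buf.flip();
            ch.write(buf);
        }
    }
}
```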
Theoretical maximum IO throughput was one of the main considerations. Imagine a document with 1,000 tokens and 1,000 of these documents being written every second - that's a million token writes a second. Even an SSD would struggle. Add to that, writes need to stay in sync with reads, so buffers had to be used efficiently. In fact, the sync between the read and write buffers was a light bulb moment: maybe the write and read threads could communicate and share? So that's what I did. Data didn't even need to be flushed to disk; the read thread already knew about it and would search it as soon as it arrived. These buffers could be variably sized based on available hardware resources, so performance could be leveraged. CI/CD TDD indeed. And very "agile".
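I'm not going to reproduce BFR's internals here, but the shape of the idea is simple enough to sketch (again, an illustration, not the real code): the indexing thread writes postings into an in-memory segment that the search thread reads directly, so a document is searchable the moment it arrives, long before anything is flushed to disk.

```java
import java.util.*;
import java.util.concurrent.*;

// Sketch of a shared read/write buffer: the writer appends postings into an
// in-memory segment that the reader searches immediately, no flush required.
public class SharedSegment {
    // token -> doc IDs, safe for one writer and many readers.
    private final ConcurrentMap<String, Set<Integer>> inMemory = new ConcurrentHashMap<>();

    public void index(int docId, String text) {
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) continue;
            inMemory.computeIfAbsent(token, t -> ConcurrentHashMap.newKeySet()).add(docId);
        }
    }

    public Set<Integer> search(String term) {
        // A real engine would also consult the on-disk segments here.
        return inMemory.getOrDefault(term.toLowerCase(), Set.of());
    }

    public static void main(String[] args) throws Exception {
        SharedSegment segment = new SharedSegment();
        ExecutorService pool = Executors.newFixedThreadPool(2);

        // Writer thread: documents stream in.
        pool.submit(() -> {
            for (int id = 0; id < 1_000; id++) {
                segment.index(id, "breaking news story number " + id);
            }
        });

        // Reader thread: searches see new documents without waiting for a flush.
        pool.submit(() -> {
            for (int i = 0; i < 5; i++) {
                System.out.println("hits so far: " + segment.search("news").size());
            }
        });

        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```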
The performance was really good and the only theoretical limit was disk space. Data would come in and be instantly searchable. I set up a multi-threaded news web scraper that sent the documents to BFR (over a serialized socket), and on the HTML frontend there were over 50 docs displayed as "0 seconds ago". In fact, the first several pages were constantly 0 seconds ago. I knew it wouldn't hit any unforeseen errors because I knew the code and the infrastructure, and even if it did hit runtime errors I could fix them more easily than you could with off-the-shelf software.
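The wire format was nothing exotic: serialized document objects pushed over a plain socket. Something along these lines, where the document class and port number are hypothetical stand-ins rather than the actual BFR protocol:

```java
import java.io.*;
import java.net.*;
import java.time.Instant;

// Sketch of a scraper pushing a document to the indexer over a serialized socket.
// ScrapedDoc and the port are hypothetical, not the real protocol.
public class ScraperClient {

    // A scraped article: title, body and the moment it was scraped.
    record ScrapedDoc(String title, String body, Instant scrapedAt) implements Serializable {}

    public static void main(String[] args) throws IOException {
        ScrapedDoc doc = new ScrapedDoc(
            "Example headline",
            "Full article text goes here...",
            Instant.now());

        // Hand the document to the indexer; its read thread can make the document
        // searchable as soon as it is deserialized on the other end.
        try (Socket socket = new Socket("localhost", 9090);
             ObjectOutputStream out = new ObjectOutputStream(socket.getOutputStream())) {
            out.writeObject(doc);
            out.flush();
        }
    }
}
```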
This ran for months and resulted in a searchable index of several hundred gigabytes. All searches remained sub-second (often less than 10ms - and this is on a bog-standard server, by the way). Eventually I had to shut the news scraper down for bandwidth reasons, and because Google News owns that space, but I still use BFR today, both as a learning tool and in production.