Our client creates computer graphics and special effects for large filmmaking companies. The process involves adding effects to every frame of a video clip and saving the result as a separate file. Over the years, our client has accumulated more than 200 million of these files, all held on physical tapes in keeping with established filmmaking processes.
All of these tapes were labeled and categorized so that, when the time came, the right film could be shipped to the client. In practice this process broke down, because our client worked with multiple teams that did not follow the same conventions when naming the tapes. As a result, finding the right file took hours, since it meant searching through unstructured data.
A simple search by name or by common symbols in a file’s name is a daunting and time-consuming task because the basic algorithm does not understand context. It also misses similar files that could match the user’s intent, simply because their names do not contain an exact keyword or phrase match.
This is why we chose to implement a full-text search based on Apache Lucene.
A typical search looks for an appropriate result across the entire database. It may need to run through the database several times to find the best matches, which, depending on the size of the database, can take anything from several minutes to several years. We chose to categorize all of the files so that the search does not have to browse the entire database but instead looks up matching results in an index.
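To make the difference between scanning and index lookup concrete, here is a minimal sketch of an inverted index in plain Python. It is not Lucene and not the code delivered in the project. The file names, the tokenization rule, and the function names are all hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict


def tokenize(name):
    """Split a file name into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", name.lower())


def build_index(file_names):
    """Map each token to the set of file ids whose name contains it."""
    index = defaultdict(set)
    for file_id, name in enumerate(file_names):
        for token in tokenize(name):
            index[token].add(file_id)
    return index


def search(index, query):
    """Return ids of files that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    postings = [index.get(token, set()) for token in tokens]
    return set.intersection(*postings)


# Hypothetical file names with inconsistent naming, for illustration only.
FILES = [
    "ProjectA_shot012_frame0001.exr",
    "projecta-SHOT012-frame0002.exr",
    "ProjectB_shot003_frame0001.exr",
]
INDEX = build_index(FILES)

print(sorted(search(INDEX, "ProjectA shot012")))
```

The index is built once. After that, each query only touches the entries for its own tokens, so the cost of a lookup no longer grows with a full pass over every file name.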
Ordinary search is incapable of understanding context or of finding words and phrases that are close to the query without matching it exactly. A full-text search that first analyzes and then indexes the entire database, on the other hand, is much more capable of delivering satisfying results. The process works in a similar way to Google’s fuzzy search algorithms.
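One common way to tolerate misspellings and naming variations is to compare terms by edit distance, and Lucene's own fuzzy queries are likewise based on edit distance. The sketch below is a plain Python illustration of that idea, not the project's code. The vocabulary, the `max_edits` threshold, and the function names are assumptions made for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum inserts, deletes and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


def fuzzy_lookup(term, vocabulary, max_edits=2):
    """Return indexed terms within max_edits of the query term."""
    term = term.lower()
    return sorted(t for t in vocabulary if edit_distance(term, t) <= max_edits)


# Hypothetical indexed terms, for illustration only.
VOCABULARY = {"explosion", "exposure", "smoke", "shot012"}

print(fuzzy_lookup("explosoin", VOCABULARY))
```

In a real system the candidate terms would come from the index rather than a hard-coded set, and the threshold would be tuned to balance recall against false matches.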
After careful consideration, we concluded that Apache Lucene is the most suitable technology on which to build a full-text search. It is used in products like Jira for reasons similar to our client’s use case: it provides a base for building advanced categorization and indexing algorithms, and it has plenty of adapters for multiple programming languages. This choice greatly improved time to market and saved our client nearly 20% of the estimated project scope.
We willingly chose to take the open-source route as this approach offers a series of significant benefits to our client:
After Catware’s engineers were done with the client’s pool of data, they were left with a simple, lightweight solution that saved them hours on a daily basis while offering a series of unexpected benefits:
We have delivered a smart, fast, and robust big data search system that streamlines the client’s workflows and shaves hours off their employees’ daily routines.
Investing in new high-end software solutions may seem like a risky step, especially for a business that is not focused on IT. The right partner to walk you and your team through the experience takes away the risks and empowers you to make quick yet informed, data-driven decisions, greatly improving the chances of successful software adoption. Future-proof your business today!