If you are reading this, you have decided to start on the exciting journey of exploring neural search with Jina.
Will the journey be easy? No guarantees, but we will make it as smooth as possible.
Will it be rewarding? For sure! By the end of this section, you'll have the skills to build simple neural search solutions.
The steps we recommend are an ideal progression designed to make learning easier, not a strict order you must always follow. Feel free to skip any part you are already familiar with!
Before we explain what neural search is, it is worth looking at why search matters in the first place and how it has traditionally been done.
Retrieving the right answer from any kind of data is crucial for every information system. Traditionally, this was done with Symbolic Search: a keyword-based search that works on a fixed set of predefined rules. It works well for structured data, which has a specific format and can be stored in a relational database. Think of movies, songs, or videos stored along with their IDs: every item is paired with its ID and can easily be retrieved when required using a simple set of rules.
But today we have more and more data of many different types, including unstructured data. This data doesn't follow one specific format and can't be stored in a two-dimensional table. Imagine a YouTube video containing a quote you would like to revisit. You can't recall the words, but you do remember that a cat was in the background at that exact moment. Neural search would let you search through YouTube videos by describing the scene you remember, which makes it more flexible and convenient than traditional Symbolic Search.
In short, neural search is a new approach to retrieving relevant information from unstructured data.
This is done by using ML models to transform the data into vector embeddings, which can then be searched semantically. To help you understand those concepts better, we have created a video that explains key concepts of neural search using cute fuzzy animals.
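To make the idea concrete, here is a minimal, framework-free sketch of semantic search over embeddings in plain Python. It assumes the items have already been turned into vectors by some ML model; the three-dimensional vectors below are hypothetical toy values chosen by hand, and `cosine_similarity` and `search` are illustrative helpers, not part of Jina or DocArray.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_embedding, index, top_k=2):
    """Rank indexed items by cosine similarity to the query, highest first."""
    scored = [
        (cosine_similarity(query_embedding, emb), text)
        for text, emb in index.items()
    ]
    scored.sort(reverse=True)
    return [text for _, text in scored[:top_k]]


# Hypothetical toy embeddings; a real system would get these from an ML model.
index = {
    "a cat sleeping on a sofa": [0.9, 0.1, 0.0],
    "a kitten playing with yarn": [0.8, 0.3, 0.1],
    "stock market report": [0.0, 0.2, 0.9],
}

# Pretend this is the model's embedding of the query "fluffy pet".
query = [0.85, 0.2, 0.05]
print(search(query, index))
```

Neither cat item contains the words "fluffy" or "pet", yet both rank above the stock report because their vectors point in a similar direction. A keyword-based search would find no match at all.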
Jina AI equips developers with the tools to build end-to-end sophisticated search applications that can be easily scaled and deployed in the cloud.
The Jina AI ecosystem consists of:
DocArray - A standard data structure for all kinds of data
Jina - An easy-to-use neural search framework with a plug-and-play search pipeline
Jina Hub - A marketplace to share and reuse the components of search pipelines
Finetuner - A one-stop solution for finetuning any deep learning model
DocArray is a first-of-its-kind data structure for unstructured data. It can hold all kinds of data, including text, images, audio, and video. DocArray is designed to be intuitive to use from Python, so you can get started right away without any prerequisites.
If you are a data professional working with text, images, audio, or video, you may have had to pick up a new tool or library to process and manipulate each data type. DocArray is a one-stop solution for working with any data type. It can greatly speed up representing, embedding, matching, visualizing, and sharing data, while giving you the flexibility to work with your favourite deep learning framework, such as TensorFlow, Keras, or PyTorch.
DocArray also provides a safe and secure way to collaborate with others. Its push/pull feature lets you share a DocumentArray over the internet: you push a preprocessed DocumentArray under a unique ID, and a colleague anywhere in the world can simply pull it and use it. To learn more about this feature, check out the blog and documentation.
To get started with DocArray, you can install it with the following command:
pip install docarray
To install DocArray with all the external dependencies, use the following command:
pip install "docarray[full]"
In the section above, we learned that Documents are the primary data type and can contain any kind of data, such as text, images, audio, video, tables, and 3D meshes. We have designed a few tutorials to help you learn how to manipulate different data types in Jina:
Developing a neural search application for text is an excellent first step. Text is widely available and easy to access, which makes it suitable for a wide variety of applications, from simple fuzzy string matching to intelligent question answering. In this tutorial, you'll learn the different ways of handling textual content in Jina.
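At the simple end of that range, fuzzy string matching needs no neural model at all. The sketch below uses Python's standard-library `difflib` (not a Jina component) to find the closest match for a misspelled query; the candidate strings are made-up examples.

```python
from difflib import SequenceMatcher, get_close_matches

candidates = ["neural search", "image retrieval", "audio tagging"]
query = "neural serch"  # deliberately misspelled

# Best match whose similarity ratio is at least the default cutoff of 0.6
best = get_close_matches(query, candidates, n=1)
print(best)

# The underlying similarity score, between 0.0 and 1.0
score = SequenceMatcher(None, query, "neural search").ratio()
print(round(score, 2))
```

Character-level matching like this catches typos but not meaning: it cannot relate "fluffy pet" to "cat", which is where embedding-based neural search comes in.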
Images carry more detail than plain text and are often easier for people to understand without much context. Image data requires preprocessing before it can be used to build applications. Neural image search can range from traditional content-based retrieval to cross-modal retrieval. Follow this tutorial to learn the different ways to manipulate image data with Jina.
Audio is everywhere in the form of soundbites, music, ringtones, or even background noise. It is unstructured by nature and requires some level of preprocessing before it can be used in an application. This tutorial will walk you through the different techniques for manipulating audio with Jina.
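A typical preprocessing step before feeding an image to a vision model is to scale pixel values to [0, 1], normalize each color channel, and reorder the data from height-width-channel (HWC) to channel-first (CHW) layout, which many PyTorch models expect. The plain-Python sketch below applies that step to a tiny hypothetical 2x2 RGB image; the `preprocess` helper and the mean/std values are illustrative assumptions, and a real pipeline would typically use an array library instead of nested lists.

```python
def preprocess(image, mean, std):
    """Scale uint8 RGB pixels to [0, 1], normalize per channel, convert HWC -> CHW.

    `image` is a nested list indexed as image[row][col][channel].
    """
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    return [
        [
            [(image[r][c][ch] / 255.0 - mean[ch]) / std[ch] for c in range(width)]
            for r in range(height)
        ]
        for ch in range(channels)
    ]


# Hypothetical 2x2 RGB image: red, green on the top row; blue, white below
image = [
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
]

# Illustrative per-channel statistics; real values depend on the model
mean = [0.5, 0.5, 0.5]
std = [0.5, 0.5, 0.5]

tensor = preprocess(image, mean, std)
print(len(tensor), len(tensor[0]), len(tensor[0][0]))  # channels, height, width
print(tensor[0])  # red channel
```

With mean and std both set to 0.5, pixel values end up in the range [-1, 1].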
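One common preprocessing step is to load a waveform and split it into fixed-length chunks so that each chunk can be embedded separately. The sketch below uses only Python's standard library (`wave`, `struct`, `math`, `io`). It assumes 16-bit mono PCM WAV input and synthesizes a one-second tone in memory instead of reading a real file; `load_wav_mono16` and `chunk_waveform` are hypothetical helper names, not Jina APIs.

```python
import io
import math
import struct
import wave


def load_wav_mono16(file_obj):
    """Read a 16-bit mono PCM WAV; return (samples scaled to [-1, 1), sample_rate)."""
    with wave.open(file_obj, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM audio")
        sample_rate = wav.getframerate()
        n_frames = wav.getnframes()
        raw = wav.readframes(n_frames)
    ints = struct.unpack("<%dh" % n_frames, raw)  # WAV sample data is little-endian
    return [s / 32768.0 for s in ints], sample_rate


def chunk_waveform(samples, sample_rate, chunk_seconds):
    """Split samples into consecutive chunks of chunk_seconds; the last may be shorter."""
    size = int(sample_rate * chunk_seconds)
    return [samples[i:i + size] for i in range(0, len(samples), size)]


# Synthesize 1 second of a 440 Hz tone at 16 kHz and store it as an in-memory WAV
rate = 16000
tone = [int(0.5 * 32767 * math.sin(2 * math.pi * 440 * t / rate)) for t in range(rate)]
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(rate)
    wav.writeframes(struct.pack("<%dh" % len(tone), *tone))
buffer.seek(0)

samples, sample_rate = load_wav_mono16(buffer)
chunks = chunk_waveform(samples, sample_rate, chunk_seconds=0.25)
print(len(chunks), len(chunks[0]))
```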
Many people watch video every day, whether on TV, on YouTube, or as animated GIFs. If you break a video down, it's just a sequence of frames arranged to tell a story. To work with videos in your search application, follow this tutorial to learn how to preprocess raw videos so they are fit for search applications.
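Because a video is a sequence of frames, a common first step is to keep only some of them, for example one frame per second, rather than embedding every frame. The hypothetical helper below only computes which frame indices to keep from a clip's frame rate and length; actually decoding the frames would require a video library and is outside the scope of this sketch.

```python
def sample_frame_indices(num_frames, source_fps, target_fps):
    """Return indices of frames to keep when downsampling from source_fps to target_fps."""
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    # Source frames per kept frame; never step by less than one frame
    step = max(1.0, source_fps / target_fps)
    indices = []
    position = 0.0
    while round(position) < num_frames:
        indices.append(int(round(position)))
        position += step
    return indices


# A hypothetical 2-second clip at 30 fps has 60 frames; keep 1 frame per second
print(sample_frame_indices(num_frames=60, source_fps=30, target_fps=1))
```

Clamping the step to at least one frame means asking for a higher rate than the source simply returns every frame instead of duplicates.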
Tabular data accounts for a large part of the structured data available digitally. Much of data science and machine learning is based on tabular data and uses it to find insights and patterns. Follow this tutorial to learn the different ways of working with tables in Jina.
A 3D mesh is the structure of a 3D model, built from polygons. Most meshes are created with software such as Unity or Blender. A mesh is defined by three key elements: vertices, edges, and faces. Follow this tutorial to learn how to work with 3D meshes in Jina.
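A simple way to make a table searchable is to treat each row as its own item. The sketch below uses Python's standard `csv` module to read each row of a small, made-up CSV into a dictionary and flattens it into a text string that could later be embedded. The column names, the data, and the `row_to_text` helper are hypothetical.

```python
import csv
import io

# Hypothetical table; in practice this would come from a file opened with open(...)
CSV_DATA = """title,genre,year
The Matrix,sci-fi,1999
Spirited Away,animation,2001
"""


def row_to_text(row):
    """Flatten one row into 'column: value' pairs so it can be embedded as text."""
    return "; ".join(f"{key}: {value}" for key, value in row.items())


rows = list(csv.DictReader(io.StringIO(CSV_DATA)))
texts = [row_to_text(row) for row in rows]
for text in texts:
    print(text)
```

Keeping the column names in the text preserves some of the table's structure, so a query such as "animation from 2001" has something to match against.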
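The relationship between vertices, edges, and faces can be shown with a tiny triangle mesh. The sketch below stores a tetrahedron as a vertex list plus triangular faces (each face lists three vertex indices), derives the unique edges from the faces, and checks Euler's formula V - E + F = 2, which holds for closed meshes that are topologically equivalent to a sphere. The layout is a common convention for illustration, not a specific Jina format.

```python
from itertools import combinations

# A tetrahedron: 4 corner points in 3D space
vertices = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (0.0, 0.0, 1.0),
]

# Each triangular face lists the indices of its three vertices
faces = [
    (0, 1, 2),
    (0, 1, 3),
    (0, 2, 3),
    (1, 2, 3),
]


def edges_from_faces(faces):
    """Collect the unique undirected edges bordering the faces."""
    edges = set()
    for face in faces:
        for a, b in combinations(face, 2):
            edges.add((min(a, b), max(a, b)))
    return edges


edges = edges_from_faces(faces)
V, E, F = len(vertices), len(edges), len(faces)
print(V, E, F, V - E + F)
```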
You have now learned the basic concepts of Jina and been introduced to the exciting world of neural search.
Take this quiz to continue your journey towards building future-proof search solutions and earn an exclusive beginner-level certificate.
We’d appreciate any feedback about your experience with the learning bootcamp. Please take a moment to share it with us.