Blog of science and life


Django full text search (Postgres)



Let's say we have a large database with a few million rows, models in a one-to-many relationship with a text field, and we want to search that text for some keywords.

The traditional way is a real pain and slow, I know. So let's do something smart and enjoy lightning-fast execution with Postgres full text search.


Suppose we have an app named main and its models look like this:

from django.db import models

class Parent(models.Model):
   title = models.CharField(max_length=255)

class Child(models.Model):
   parent = models.ForeignKey(Parent, on_delete=models.CASCADE, related_name="children")
   content = models.TextField(null=True, blank=True)

For example, we want to search for a keyword in each child's content, and we want it quick! You can use a plain substring filter (content__icontains, for example), but that is not quick enough for me. Maybe because I have too damn much data. But there is another way. First we need to add this line to the installed apps in settings.py

INSTALLED_APPS = [
    # ...
    "django.contrib.postgres",
    # ...
]

Then we add SearchVectorField to the model

search_vector = SearchVectorField(null=True, blank=True)

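Finally we can create a GIN index on that field, so the whole model looks like this:

from django.db import models
from django.contrib.postgres.search import SearchVectorField
from django.contrib.postgres.indexes import GinIndex

class Child(models.Model):
   parent = models.ForeignKey(Parent, on_delete=models.CASCADE, related_name="children")
   content = models.TextField(null=True, blank=True)
   search_vector = SearchVectorField(null=True, blank=True)

   class Meta:
      indexes = [GinIndex(fields=['search_vector'])]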
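
To get a feel for why the index matters, here is a toy sketch in plain Python of the idea behind it: an inverted index that maps each word to the rows containing it. It is only an illustration. The sample rows and the names build_inverted_index and slow_search are made up for this post, and the real Postgres implementation is far more involved.

```python
from collections import defaultdict

# Hypothetical sample data: row id -> content, standing in for the Child table.
ROWS = {
    1: "Django makes full text search easy",
    2: "Postgres ships with full text search",
    3: "Signals keep the index fresh",
}

def build_inverted_index(rows):
    """Map each lowercased word to the set of row ids that contain it."""
    index = defaultdict(set)
    for row_id, content in rows.items():
        for word in content.lower().split():
            index[word].add(row_id)
    return index

def slow_search(rows, word):
    """Scan every row, the way an unindexed query has to."""
    return {
        row_id
        for row_id, content in rows.items()
        if word in content.lower().split()
    }

index = build_inverted_index(ROWS)

# One dictionary lookup instead of a pass over every row.
fast_hits = index.get("search", set())
print(sorted(fast_hits))
```

Both functions return the same rows. The difference is that the index is built once and each search after that is a single lookup, while the scan gets slower with every row you add.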

Run makemigrations and migrate to apply the change. Rows that already exist still have an empty search vector, so we need to fill it in once. You can run this as a script or in python manage.py shell:

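from django.contrib.postgres.search import SearchVector

from main.models import Child

# One UPDATE statement that computes the vector for every existing row
Child.objects.update(search_vector=SearchVector('content'))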

That will do the trick. Now you can search like a pro: the query matches whole words through the index instead of scanning every row.

Child.objects.filter(search_vector=keywords)

Child.objects.filter(search_vector=keywords).values_list("parent", flat=True).distinct()  # ids of all parents with a matching child

Parent.objects.filter(children__search_vector=keywords).distinct()  # the Parent objects themselves
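
When you pass a plain string like this, Django wraps it in a SearchQuery with the default plain search type, so every word in keywords has to appear in the row. The sketch below imitates that behaviour in pure Python. It is a rough model with no stemming, and the tiny stop-word list and the names toy_vector and toy_match are invented for illustration.

```python
import string

# Tiny, made-up stop-word list; the real Postgres dictionaries are much larger.
STOP_WORDS = {"a", "an", "the", "is", "in", "of"}

def toy_vector(text):
    """Return {word: [positions]}, roughly what a search vector stores."""
    vector = {}
    for position, raw in enumerate(text.split(), start=1):
        word = raw.strip(string.punctuation).lower()
        if word and word not in STOP_WORDS:
            vector.setdefault(word, []).append(position)
    return vector

def toy_match(vector, keywords):
    """True when every non-stop word of the query is present in the vector."""
    terms = toy_vector(keywords)
    return bool(terms) and all(term in vector for term in terms)

vector = toy_vector("The quick brown fox is in the garden.")
print(vector)
print(toy_match(vector, "quick garden"))  # both words are present
print(toy_match(vector, "quick turtle"))  # "turtle" is missing
```

Stop words are dropped but still count towards the positions, and a query only matches when all of its remaining words are found.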

You may think, "Oh, simple and elegant, the way nature should be". But that's not it. We are not done yet. We need to update the search vector for newly created objects too, and for that we'll use the power of Django signals.

Let's create a file signals.py inside the main directory, and fill it with

from django.db.models.signals import post_save
from django.dispatch import receiver
from django.contrib.postgres.search import SearchVector

from main.models import Child

@receiver(post_save, sender=Child)
def update_search_vector(sender, instance, **kwargs):
   # update() writes straight to the database and does not fire post_save again
   Child.objects.filter(pk=instance.pk).update(search_vector=SearchVector('content'))

You may ask, "Why not just assign instance.search_vector = SearchVector('content') and then call instance.save()?" Good question, but we cannot do that. If you try, you will get this error when a new object is created:

F() expressions can only be used to update, not to insert.

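The reason is that SearchVector('content') is an expression that refers to a column of an existing row, so it can be evaluated in an UPDATE but not in an INSERT. One last thing: we need to register the signal with our app. Add a ready() method to the MainConfig class in main/apps.py: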

def ready(self):
   import main.signals

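Then, on Django versions before 3.2 only, point to that config in main/__init__.py (newer versions pick up the single AppConfig in apps.py automatically, and default_app_config was removed in Django 4.1):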

default_app_config = "main.apps.MainConfig"

And we are officially done. Good luck.