Blog of science and life
Django full text search (Postgres)
Let's say we have a large database (a few million rows) with a one-to-many relationship between models that have a text field, and we want to search for some keywords.
The traditional way is a real pain and slow, I know. So let's do something smart and enjoy lightning-fast queries with full text search.
Suppose we have an app named main whose models look like this:
from django.db import models


class Parent(models.Model):
    title = models.CharField(max_length=255)


class Child(models.Model):
    parent = models.ForeignKey(Parent, on_delete=models.CASCADE, related_name="children")
    content = models.TextField(null=True, blank=True)
For example, say we want to search for a keyword in the children's content, and we want it to be fast. You could query the content column directly, but that's not quick enough for me, maybe because I have so damn much data. There is another way. First, add django.contrib.postgres to INSTALLED_APPS in settings.py:
INSTALLED_APPS = [
    # ...
    "django.contrib.postgres",
    # ...
]
Then add a SearchVectorField to the Child model:
    search_vector = SearchVectorField(null=True, blank=True)
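Finally, add a GIN index on that field. The complete Child model now looks like this:
from django.contrib.postgres.indexes import GinIndex
from django.contrib.postgres.search import SearchVectorField
from django.db import models


class Child(models.Model):
    parent = models.ForeignKey(Parent, on_delete=models.CASCADE, related_name="children")
    content = models.TextField(null=True, blank=True)
    search_vector = SearchVectorField(null=True, blank=True)

    class Meta:
        indexes = [GinIndex(fields=["search_vector"])]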
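Why does the GIN index help? Below is a simplified pure-Python model of an inverted index, which is the kind of structure a GIN (Generalized Inverted Index) keeps. Each lexeme maps to the set of rows that contain it, so a lookup does not have to touch every row. This is only an illustration, not how Postgres stores it. The "lexemes" here are just lowercased words, whereas Postgres stems words and drops stop words according to the text search configuration. The helper names to_lexemes, build_inverted_index and lookup are hypothetical.

```python
import re
from collections import defaultdict


def to_lexemes(text):
    """Rough stand-in for to_tsvector: the set of lowercased word tokens.

    Assumption: no stemming or stop-word removal, unlike real Postgres.
    """
    return set(re.findall(r"\w+", (text or "").lower()))


def build_inverted_index(rows):
    """rows maps row id -> content; returns lexeme -> set of row ids."""
    index = defaultdict(set)
    for row_id, content in rows.items():
        for lexeme in to_lexemes(content):
            index[lexeme].add(row_id)
    return index


def lookup(index, word):
    """A single dictionary lookup instead of scanning every row's content."""
    return index.get(word.lower(), set())


rows = {
    1: "Postgres full text search",
    2: "Django signals and search",
    3: None,  # content is nullable in the Child model
}
index = build_inverted_index(rows)
print(sorted(lookup(index, "search")))  # [1, 2]
print(sorted(lookup(index, "django")))  # [2]
```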
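Create and apply the migration (python manage.py makemigrations, then python manage.py migrate). Existing rows still have a NULL search_vector, so backfill them once, either from a script or in python manage.py shell:
from django.contrib.postgres.search import SearchVector
from main.models import Child
# Compute the search vector for every existing row in a single UPDATE query
Child.objects.update(search_vector=SearchVector("content"))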
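With millions of rows, that single update() is one big UPDATE statement, which runs as one transaction. If that is too heavy for your database, one option is to backfill in primary-key ranges. Below is a minimal sketch built around a hypothetical pk_batches() helper. The Django part is left as comments because it needs a live Postgres database. It also assumes Child uses the default integer id primary key.

```python
def pk_batches(min_pk, max_pk, batch_size):
    """Yield half-open (start, end) ranges covering min_pk..max_pk inclusive.

    Hypothetical helper for chunked backfills; not part of Django.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    start = min_pk
    while start <= max_pk:
        end = start + batch_size
        yield start, end
        start = end


print(list(pk_batches(1, 25, 10)))  # [(1, 11), (11, 21), (21, 31)]

# In `python manage.py shell` (assumes the Child model above; not run here):
# from django.contrib.postgres.search import SearchVector
# from django.db.models import Max, Min
# from main.models import Child
# bounds = Child.objects.aggregate(lo=Min("id"), hi=Max("id"))
# if bounds["lo"] is not None:  # skip an empty table
#     for start, end in pk_batches(bounds["lo"], bounds["hi"], 10_000):
#         Child.objects.filter(id__gte=start, id__lt=end).update(
#             search_vector=SearchVector("content")
#         )
```

Under Django's default autocommit, each chunked update() commits on its own. Each statement is therefore shorter, at the cost of the backfill no longer being all-or-nothing.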
That does the trick. Now you can search like a pro: queries on search_vector use the GIN index instead of scanning the content of every row.
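results = Child.objects.filter(search_vector=keywords)  # keywords is a plain string
# IDs of the parents that have at least one matching child, for various purposes
parent_ids = Child.objects.filter(search_vector=keywords).values_list("parent", flat=True).distinct()
parents = Parent.objects.filter(id__in=parent_ids)  # the Parent objects themselves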
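When you pass a plain string, Django wraps it in a SearchQuery with the default "plain" search type, which maps to Postgres plainto_tsquery, so every word has to match. The toy model below mimics that AND behaviour, plus the distinct parent IDs from the second query, in plain Python. search_parent_ids() is a hypothetical name. Unlike Postgres, it does no stemming or stop-word removal.

```python
import re


def words(text):
    """Lowercased word tokens; a crude stand-in for a tsvector."""
    return set(re.findall(r"\w+", (text or "").lower()))


def search_parent_ids(children, keywords):
    """children: list of (child_id, parent_id, content) tuples."""
    wanted = words(keywords)
    matched = [
        parent_id
        for _child_id, parent_id, content in children
        if wanted and wanted <= words(content)  # AND: every keyword present
    ]
    # Drop duplicate parent ids, like .distinct() (SQL does not promise order)
    return list(dict.fromkeys(matched))


children = [
    (1, 10, "Fast full text search"),
    (2, 10, "Search with Postgres"),
    (3, 20, "full text search in Django"),
    (4, 30, None),
]
print(search_parent_ids(children, "text search"))  # [10, 20]
print(search_parent_ids(children, "search"))  # [10, 20], duplicate 10 removed
```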
You may think, "Oh, simple and elegant, the way nature intended." But we're not done yet. New and edited objects need their search vector updated too, and for that we'll use Django signals.
Create a file signals.py inside the main directory and fill it with:
from django.contrib.postgres.search import SearchVector
from django.db.models.signals import post_save
from django.dispatch import receiver

from main.models import Child


@receiver(post_save, sender=Child)
def update_search_vector(sender, instance, **kwargs):
    # Runs after every save (create or edit). QuerySet.update() goes straight
    # to SQL and does not send post_save, so this receiver does not recurse.
    Child.objects.filter(id=instance.id).update(search_vector=SearchVector("content"))
You may ask, "Why not just assign instance.search_vector = SearchVector('content') and then call instance.save()?" Good question, but that doesn't work. If you try it, you get this error:
F() expressions can only be used to update, not to insert.
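SearchVector('content') is an expression that refers to the content column. The database can evaluate it in an UPDATE on a row that already exists, but not as a value in the INSERT that creates a new object. Calling instance.save() inside a post_save receiver would also fire post_save again. That's why the receiver runs a queryset update() after the row has been saved. One last thing: register the signal in our app by adding a ready() method to the config class in main/apps.py: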
class MainConfig(AppConfig):
    # ...
    def ready(self):
        import main.signals  # importing the module connects the receiver
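On Django versions before 3.2, also point main/__init__.py at that config. Django 3.2 and later detect a single AppConfig in apps.py automatically, and Django 4.1 removed default_app_config: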
default_app_config = "main.apps.MainConfig"
And we are officially done. Good luck.