Blog of science and life


Pywright - Render JavaScript websites


What is Playwright?

Playwright is a browser automation library from Microsoft. It is mainly used to automate browser tasks for testing purposes.

But we can also use it to scrape data from websites that render their content with JavaScript.

Think about Facebook, Twitter, Instagram, etc. They render their content with JavaScript, so tools that only fetch the raw HTML, like requests or Scrapy, can't scrape them on their own.

Selenium is a popular tool for scraping JavaScript websites. It's powerful and has a lot of features, but it's slow and hard to use.

Playwright is a new player in this field. It's fast and has an excellent, simple API.
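To give a feel for that API, here is a minimal sketch using Playwright's Python sync API. It assumes Playwright is installed (`pip install playwright`) and a Chromium build has been downloaded (`playwright install chromium`). The function name `render_html` and the timeout default are my own choices for illustration, not part of Playwright or Pywright.

```python
def render_html(url: str, timeout_ms: int = 30_000) -> str:
    """Return the page's HTML after the browser has run its JavaScript.

    `render_html` is a hypothetical helper name used only for this sketch.
    """
    # Imported here so this module still loads where Playwright is missing.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # Headless Chromium: no visible window, suitable for servers.
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            # "networkidle" waits until network activity has settled,
            # which gives client-side rendering a chance to finish.
            page.goto(url, wait_until="networkidle", timeout=timeout_ms)
            return page.content()
        finally:
            # Always release the browser, even if navigation fails.
            browser.close()
```

Calling `render_html("https://example.com")` should return the rendered markup as a string, which can then be parsed with any HTML parser. Waiting for `networkidle` is a simple default; pages that keep polling in the background may need a more specific wait condition.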

What is Pywright?

Pywright is an API service written in Python that uses Playwright to render JavaScript websites. You can easily deploy it to any cloud provider and then use it as a service.

Install Pywright

I've created a Docker image for Playwright. It contains …


Trick for Python dev

If you are a Python dev, here is a trick that helps avoid future Python-related problems.

After installing a fresh OS, install the build dependencies below before anything else.

For example, after installing a new Ubuntu on my PC, I run this command:

sudo apt-get update; sudo apt-get install make build-essential libssl-dev zlib1g-dev \
libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm \
libncursesw5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev libffi-dev liblzma-dev

Then I install pyenv, which I think is the best Python version management system.

curl -L | bash

And I have never had a problem with Python since.
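The reason this works is that several standard-library modules are compiled only if their development headers are present when Python is built. As a rough check, the sketch below tries to import the modules that correspond to the packages in the apt command. The function name `missing_modules` is made up for this example, and the list covers only the modules I am sure map to those packages.

```python
import importlib

# Standard-library modules that need system headers at build time,
# with the Ubuntu package from the command above that provides them.
OPTIONAL_MODULES = {
    "ssl": "libssl-dev",
    "zlib": "zlib1g-dev",
    "bz2": "libbz2-dev",
    "readline": "libreadline-dev",
    "sqlite3": "libsqlite3-dev",
    "curses": "libncursesw5-dev",
    "tkinter": "tk-dev",
    "ctypes": "libffi-dev",
    "lzma": "liblzma-dev",
}


def missing_modules():
    """Return the names of optional modules this interpreter cannot import."""
    missing = []
    for name in OPTIONAL_MODULES:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


for name in missing_modules():
    print(f"{name} is missing (install {OPTIONAL_MODULES[name]} and rebuild Python)")
```

If the script prints nothing, the interpreter was built with all of these modules. If something is listed, installing the package alone is not enough: the Python version has to be rebuilt afterwards.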


What is masked attention?

Visualizing A Neural Machine Translation Model (Mechanics of Seq2seq Models With Attention)

Under the hood, the model is composed of an encoder and a decoder.

The encoder processes each item in the input sequence and compiles the information it captures into a vector (called the context). After processing the entire input sequence, the encoder sends the context over to the decoder, which begins producing the output sequence item by item.
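The data flow described above can be sketched in a few lines of plain Python. This is only a toy: the update rules are fixed arithmetic that I chose for illustration, not learned weights, so the output is meaningless as a translation. What it shows is the shape of the computation: the whole input is folded into one context vector, and the decoder then emits items one at a time.

```python
import math


def encode(tokens, dim=4):
    """Fold the whole input sequence into one fixed-size context vector."""
    context = [0.0] * dim
    for position, token in enumerate(tokens):
        # Toy numeric code for the token; a real model would use embeddings.
        code = sum(map(ord, token)) + position
        for i in range(dim):
            # Mix the previous state with the current item (RNN-like update).
            context[i] = math.tanh(0.5 * context[i] + math.sin(code * (i + 1)))
    return context


def decode(context, vocab, max_len=5):
    """Produce output items one by one, starting from the context."""
    state = list(context)
    output = []
    for _ in range(max_len):
        # Toy choice of the next item; a real model would use a softmax.
        index = int(abs(sum(state)) * 1000) % len(vocab)
        output.append(vocab[index])
        # The emitted item feeds back into the decoder state.
        state = [math.tanh(s + 0.1 * (index + 1)) for s in state]
    return output


context = encode(["how", "are", "you"])
print(len(context), decode(context, ["je", "suis", "bien"], max_len=3))
```

Note that the decoder never sees the input tokens directly, only the context. That bottleneck is what attention mechanisms were introduced to relax.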

A friendly introduction to Recurrent Neural Networks

