A bug's life

How to log the wrong way in Flask

Our past decisions can stab us in the back in quite surprising and unexpected ways. This is no different during the natural evolution of our code. Only time can tell whether the choices we make today will turn out to be right or wrong in the future.
Of course, there are "habits", "methodologies", "patterns" and other things that can be followed, which will produce good results in most cases, but sometimes we still manage to be creative enough to shoot ourselves in the foot.

Today we'll try to achieve that using Python and Flask by extending a very complex "Hello World" application.

A bug's birth

So the project starts, everyone is excited, we finally have a blank page to fill in, and the development will not be slowed down by the weight of our previous (wrong) decisions. Soon we finish the first version.

class HelloView(MethodView):
    def get(self):
        return 'Hello World\n'

app = Flask(__name__)
app.add_url_rule('/', view_func=HelloView.as_view('hello'))

if __name__ == '__main__':
    app.run()

Soon after, we realise that such a complex application cannot exist without logging, so we add it as an afterthought.

class HelloView(MethodView):
    logger = logging.getLogger('view.hello')

    def get(self):
        self.logger.info('hello from view')
        return 'Hello World\n'

So far so good, calling the app works as expected:

$ curl localhost:8080
Hello World

And our message also appears in the logs:

{"name": "view.hello", "message": "hello from view"}

But as time goes on and requirements change, we need to be able to add extra data to the logs. Such data could be, say, a unique identifier generated for the requests.

class ContextLogger:
    def __init__(self, logger):
        self.__logger = logger
        self.__context = {}

    def add_context(self, key, value):
        self.__context[key] = value

    def info(self, message):
        self.__logger.info(message, extra=self.__context)

def add_request_id(logger):
    def decorator(f):
        @wraps(f)
        def decorated_function(self):
            request_id = uuid.uuid4()
            logger.add_context('request_id', request_id)
            logger.info(f'hello from middleware ({request_id})')
            return f(self, request_id)
        return decorated_function
    return decorator

class HelloView(MethodView):
    logger = ContextLogger(logging.getLogger('view.hello'))

    @add_request_id(logger)
    def get(self, request_id):
        time.sleep(0.01)
        self.logger.info(f'hello from view ({request_id})')
        return 'Hello World\n'

In reality, ContextLogger would be a bit more complex, implementing the other logging functions (warn, error, etc.) and handling the external extra parameters.

The sleep call is intended to simulate other work done by the view class. Like it should also call some backend service to collect the data needed to display Hello World or something like that. Anyway, the application still works fine.

{
  "name": "view.hello",
  "message": "hello from middleware (5f428aa7-cf68-4ae4-a003-2b95085f7f7d)",
  "request_id": "5f428aa7-cf68-4ae4-a003-2b95085f7f7d"
}
{
  "name": "view.hello",
  "message": "hello from view (5f428aa7-cf68-4ae4-a003-2b95085f7f7d)",
  "request_id": "5f428aa7-cf68-4ae4-a003-2b95085f7f7d"
}

Or does it? What happens if we try to send multiple requests at the same time?

$ curl localhost:8080 & curl localhost:8080 &
{
  "name": "view.hello",
  "message": "hello from middleware (36d1627c-2159-4865-92ed-c63969efde47)",
  "request_id": "36d1627c-2159-4865-92ed-c63969efde47"
}
{
  "name": "view.hello",
  "message": "hello from middleware (78062d9e-bb3a-4a74-a24c-e0ba22be6660)",
  "request_id": "78062d9e-bb3a-4a74-a24c-e0ba22be6660"
}
{
  "name": "view.hello",
  "message": "hello from view (36d1627c-2159-4865-92ed-c63969efde47)",
  "request_id": "78062d9e-bb3a-4a74-a24c-e0ba22be6660"
}
{
  "name": "view.hello",
  "message": "hello from view (78062d9e-bb3a-4a74-a24c-e0ba22be6660)",
  "request_id": "78062d9e-bb3a-4a74-a24c-e0ba22be6660"
}

The middleware logs look fine, but the view logs have the same request_id for both requests. Yet the message has the right value. What the f... udge.

The quick fix

The error is caused by the fact that our logger variable in the HelloView class is static, so it is only initialized once when the application is started and our module containing the class is loaded. We expected to have a different logger class per request. Or rather, we didn't even think about it. Logging worked before, it worked after, so there should be no problem here.

A possible fix might be to initialize the logger in the constructor, but then we can't simply pass it to the middleware (which might be necessary so the view and the middleware running before it create logs with the same name).

def add_request_id():
    def decorator(f):
        @wraps(f)
        def decorated_function(self):
            request_id = uuid.uuid4()
            self.logger.add_context('request_id', request_id)
            self.logger.info(f'hello from middleware ({request_id})')
            return f(self, request_id)
        return decorated_function
    return decorator

class HelloView(MethodView):
    logger = None

    def __init__(self):
        self.logger = ContextLogger(logging.getLogger('view.hello'))

    @add_request_id()
    def get(self, request_id):
        time.sleep(0.01)
        self.logger.info(f'hello from view ({request_id})')
        return 'Hello World\n'

The decorator can access all the parameters of the function it decorates, including self, which can still be used to access the logger. Not very elegant, but it works. That's what quick fixes are all about, right?

A more permanent solution

In Flask there is a thing called Request Context, the request variable will always refer to the request currently being processed, so if we could add our own data to it, that would solve the problem. By overriding some Flask classes we can solve this.

class MyRequest(Request):
    def __init__(self, environ, populate_request=True, shallow=False):
        super().__init__(environ, populate_request, shallow)
        self.__context = {}

    @property
    def context(self):
        return self.__context

    def add_context(self, key, value):
        self.__context[key] = value

class MyFlask(Flask):
    request_class = MyRequest

class RequestContextLogger:
    def __init__(self, logger):
        self.__logger = logger

    def info(self, message):
        self.__logger.info(message, extra=request.context)

def add_request_id(logger):
    def decorator(f):
        @wraps(f)
        def decorated_function(self):
            request_id = uuid.uuid4()
            request.add_context('request_id', request_id)

            logger.info(f'hello from middleware ({request_id})')
            return f(self, request_id)
        return decorated_function
    return decorator

class HelloView(MethodView):
    logger = RequestContextLogger(logging.getLogger('view.hello'))

    @add_request_id(logger)
    def get(self, request_id):
        time.sleep(0.01)
        self.logger.info(f'hello from view ({request_id})')
        return 'Hello World\n'

app = MyFlask(__name__)
app.add_url_rule('/', view_func=HelloView.as_view('hello'))

if __name__ == '__main__':
    app.run()

During my attempts to create this solution, I managed to shoot myself in the foot once again, just as I did with the logger. In the first version, the __context was not set in __init__, so every request used the same dict, which made it seem like it would not work. I didn't learn from my mistakes fast enough.

After a bit of digging, you may find that we're not the only ones who thought that adding extra data to request might be a desirable thing to do. We can replace our homebrew solution with Flask's g variable, which will help to get rid of a good chunk of code.

class GContextLogger:
    def __init__(self, logger):
        self.__logger = logger

    def info(self, message):
        self.__logger.info(message, extra=g.log_context)

def add_request_id(logger):
    def decorator(f):
        @wraps(f)
        def decorated_function(self):
            request_id = uuid.uuid4()
            g.log_context = {'request_id': request_id}

            logger.info(f'hello from middleware ({request_id})')
            return f(self, request_id)
        return decorated_function
    return decorator

class HelloView(MethodView):
    logger = GContextLogger(logging.getLogger('view.hello'))

    @add_request_id(logger)
    def get(self, request_id):
        time.sleep(0.01)
        self.logger.info(f'hello from view ({request_id})')
        return 'Hello World\n'

A bit better, but still, we could improve this. What happens, for example, if the view throws an exception and it is handled by a Flask error handler, which also happens to create a log? Will the extra data be on it? And on the logs of an external module we use?

The "final" solution

As far as anything can be final. Perhaps "a solution that meets the current requirements to the best of our knowledge" would be a better term. First of all, let's get rid of the GContextLogger class so that we don't have to wrap all our log instances in something. To do this, we can use Python's log filter, which is designed for filtering logs but is also often used for this purpose.

class LogContextFilter(Filter):
    def filter(self, record):
        if has_request_context() and hasattr(g, 'log_context'):
            for k, v in g.log_context.items():
                setattr(record, k, v)

        return True

Here we are a bit more careful about accessing g.log_context. We should only try to get data from it if we are processing a Flask request and it already has a log_context.

You can also get rid of the logger creation and use the Flask-configured app.logger instead. The only drawback is that you have to manually add the view name to the logs (if you really need it) because the name field will have the same value as the app.name variable.

def add_request_id():
    def decorator(f):
        @wraps(f)
        def decorated_function(self):
            request_id = uuid.uuid4()
            g.log_context = {
                'request_id': request_id,
                'view': self.__class__.__name__,
            }

            app.logger.info(f'hello from middleware ({request_id})')
            return f(self, request_id)
        return decorated_function
    return decorator

class HelloView(MethodView):
    @add_request_id()
    def get(self, request_id):
        time.sleep(0.01)
        app.logger.info(f'hello from view ({request_id})')
        return 'Hello World\n'
{
  "name": "test",
  "message": "hello from middleware (7575ef53-8764-4d17-967b-af2902691ac4)",
  "request_id": "7575ef53-8764-4d17-967b-af2902691ac4",
  "view": "HelloView"
}
{
  "name": "test",
  "message": "hello from view (7575ef53-8764-4d17-967b-af2902691ac4)",
  "request_id": "7575ef53-8764-4d17-967b-af2902691ac4",
  "view": "HelloView"
}

Since we probably want to do this for every request, we can also replace the add_request_id middleware with a global solution.

class HelloView(MethodView):
    def get(self):
        g.log_context['view'] = self.__class__.__name__

        app.logger.info(f'hello from view')
        return 'Hello World\n'

app = Flask(__name__)
app.add_url_rule('/', view_func=HelloView.as_view('hello'))

@app.before_request
def init_logging_context():
    g.log_context = {
        'request_id': uuid.uuid4(),
        'ip': request.remote_addr,
    }

One drawback of this solution is that we don't know which view will be called in before_request, so we can't put its name in the log_context.

This brings us to our current final solution. You can see the individual phases in their entirety at the related GitHub repository. Have a nice log.

Ez a bejegyzés magyar nyelven is elérhető: Egy hiba élete

Have a comment?

Send an email to the blog at deadlime dot hu address, or visit the tweet related to this post.

Want to subscribe?

We have a good old fashioned RSS feed if you're into that, and you can also follow the blog on Twitter.