Complex content parsing

Combined parsers

When the content type is not known before parsing, you can create a combined parser, which lets you use several different parsers transparently. For example, a server usually returns JSON, but on server errors it returns an HTML page with some text. Then:

from pyanyapi.parsers import CombinedParser, HTMLParser, JSONParser


class Parser(CombinedParser):
    # Each inner parser maps an attribute name to a query in its own format.
    parsers = [
        JSONParser({'test': 'test'}),
        HTMLParser({'error': 'string(//span)'})
    ]

>>> parser = Parser()
>>> parser.parse('{"test": "Text"}').test
Text
>>> parser.parse('<body><span>123</span></body>').error
123
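
To make the fallback idea concrete, here is a minimal standard-library sketch of "try each parser in order". It is not how pyanyapi implements CombinedParser; the names parse_json, parse_html, parse_combined and _SpanText are hypothetical and exist only for this illustration.

```python
import json
from html.parser import HTMLParser as StdHTMLParser


class _SpanText(StdHTMLParser):
    """Hypothetical helper: collects the text found inside <span> elements."""

    def __init__(self):
        super().__init__()
        self._depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "span" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        # Keep text only while we are inside at least one <span>.
        if self._depth:
            self.chunks.append(data)


def parse_json(content):
    data = json.loads(content)  # raises ValueError on non-JSON input
    return {"test": data["test"]}


def parse_html(content):
    collector = _SpanText()
    collector.feed(content)
    collector.close()
    return {"error": "".join(collector.chunks)}


def parse_combined(content, parsers=(parse_json, parse_html)):
    """Return the result of the first parser that accepts the content."""
    for parse in parsers:
        try:
            return parse(content)
        except (ValueError, KeyError, TypeError):
            continue  # this format did not match, try the next one
    raise ValueError("no parser could handle the content")
```

Calling parse_combined with the JSON string from the session above yields a dictionary with the key test, and calling it with the HTML string yields a dictionary with the key error. The order of the parsers matters: the first one that succeeds wins.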

Another example

Sometimes different content types are combined inside a single string. This often happens with AJAX responses, where a JSON payload carries an HTML fragment:

{"content": "<span>Text</span>"}

You can work with such data in the following way:

from pyanyapi.decorators import interface_property
from pyanyapi.parsers import HTMLParser, JSONParser


inner_parser = HTMLParser({'text': 'string(.//span/text())'})


class AJAXParser(JSONParser):
    settings = {'content': 'content'}

    @interface_property
    def text(self):
        # self.content is the HTML string extracted from the JSON payload.
        return inner_parser.parse(self.content).text


>>> api = AJAXParser().parse('{"content": "<span>Text</span>"}')
>>> api.text
Text
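
The same two-step idea (decode the outer JSON, then parse the inner markup) can be sketched with the standard library alone. This is an illustration under assumptions, not pyanyapi code: extract_span_text is a hypothetical name, and the sketch assumes the embedded fragment is well-formed XML, which real HTML often is not.

```python
import json
import xml.etree.ElementTree as ET


def extract_span_text(raw):
    """Hypothetical helper: return the text of the first <span> in payload["content"]."""
    payload = json.loads(raw)                    # outer layer: JSON
    fragment = ET.fromstring(payload["content"])  # inner layer: markup (must be well-formed)
    # Element.iter() also yields the root element itself, so a bare <span> is found too.
    span = next(fragment.iter("span"), None)
    if span is None:
        return ""
    return "".join(span.itertext())
```

For the payload shown above, extract_span_text returns the string Text. For HTML that is not well-formed, a tolerant parser (such as the HTMLParser used in the pyanyapi example) is the better choice.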

pyanyapi now bundles its own AJAXParser, which works differently from the one above. This example is still useful as a pattern for building custom parsers.