Usage

The library lets you build an API over various kinds of content. It currently bundles tools for working with HTML, XML, CSV, JSON and YAML. It was initially created to work with the requests library.

Basic setup

Basic parsers can be declared in the following way:

from pyanyapi.parsers import HTMLParser


class SimpleParser(HTMLParser):
    settings = {'header': 'string(.//h1/text())'}


>>> api = SimpleParser().parse('<html><body><h1>Value</h1></body></html>')
>>> api.header
Value

Or it can be configured at runtime:

from pyanyapi.parsers import HTMLParser


>>> api = HTMLParser({
    'header': 'string(.//h1/text())'
}).parse('<html><body><h1>Value</h1></body></html>')
>>> api.header
Value

To get all parsing results as a dict, use the parse_all method. It returns all properties, including those defined with the @interface_property decorator.

from pyanyapi.parsers import JSONParser

>>> JSONParser({
    'first': 'container > 0',
    'second': 'container > 1',
    'third': 'container > 2',
}).parse('{"container":["first", "second", "third"]}').parse_all()
{
    'first': 'first',
    'second': 'second',
    'third': 'third',
}
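
The 'container > 0' strings above read as a path into the decoded JSON: a key followed by a list index. The sketch below shows one way such a path could be resolved with the standard library alone. It is an illustration under assumptions, not pyanyapi's implementation; the resolve_path helper and its handling of the ' > ' separator are hypothetical.

```python
import json


def resolve_path(document, path, separator=" > "):
    """Walk a decoded JSON document along a 'key > index > key' path.

    Hypothetical helper for illustration only; not part of pyanyapi.
    """
    current = document
    for step in path.split(separator):
        if isinstance(current, list):
            # Lists are addressed by integer position.
            current = current[int(step)]
        else:
            # Objects (dicts) are addressed by key.
            current = current[step]
    return current


data = json.loads('{"container": ["first", "second", "third"]}')
settings = {
    "first": "container > 0",
    "second": "container > 1",
    "third": "container > 2",
}
result = {name: resolve_path(data, path) for name, path in settings.items()}
print(result)
```

Building the whole dict in one pass mirrors what parse_all returns in the example: one entry per configured name.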

Complex setup

In some cases you may want to apply extra transformations to the result list. The “base-children” setup style covers this: 'base' selects a set of elements, and 'children' is evaluated against each of them.

from pyanyapi.parsers import HTMLParser


class SimpleParser(HTMLParser):
    settings = {
        'test': {
            'base': '//test',
            'children': 'text()|*//text()'
        }
    }


>>> api = SimpleParser().parse('<xml><test>123 </test><test><inside> 234</inside></test></xml>')
>>> api.test
['123 ', ' 234']
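
The two-step evaluation can be pictured with the standard library's xml.etree.ElementTree. This is only an analogy, assuming that 'base' selects elements and 'children' collects text nodes beneath each one. It does not use pyanyapi, and ElementTree does not support the text() XPath function, so itertext() stands in for the 'children' expression.

```python
import xml.etree.ElementTree as ET

markup = "<xml><test>123 </test><test><inside> 234</inside></test></xml>"
root = ET.fromstring(markup)

# Step 1 ("base"): select every <test> element.
base_elements = root.findall(".//test")

# Step 2 ("children"): collect the text nodes under each base element
# and flatten them into a single list.
texts = [text for element in base_elements for text in element.itertext()]
print(texts)  # ['123 ', ' 234']
```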

Sub parsers are another way to interact with sub-elements:

from pyanyapi.parsers import HTMLParser


class SubParser(HTMLParser):
    settings = {
        'href': 'string(//@href)',
        'text': 'string(//text())'
    }


class Parser(HTMLParser):
    settings = {
        'elem': {
            'base': './/a',
            'parser': SubParser
        }
    }

>>> api = Parser().parse("<html><body><a href='#test'>test</a></body></html>")
>>> api.elem[0].href
#test
>>> api.elem[0].text
test
>>> api.parse_all()
{'elem': [{'href': '#test', 'text': 'test'}]}

Sub parsers can be passed either as classes or as instances.

Settings inheritance

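The settings attribute is merged from all ancestors of the current parser. Values defined in a subclass override inherited ones, and settings passed at runtime override both.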

from pyanyapi.parsers import HTMLParser


class ParentParser(HTMLParser):
    settings = {'parent': '//p'}


class FirstChildParser(ParentParser):
    settings = {'parent': '//override'}


class SecondChildParser(ParentParser):
    settings = {'child': '//h1'}


>>> FirstChildParser().settings['parent']
//override

>>> SecondChildParser().settings['parent']
//p

>>> SecondChildParser().settings['child']
//h1

>>> SecondChildParser({'child': '//more'}).settings['child']
//more
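
The precedence shown above (ancestors first, then the subclass, then runtime settings) can be reproduced by walking the method resolution order. The sketch below is a minimal stand-alone model of that behaviour, not pyanyapi's code; BaseParser here is a hypothetical stand-in class.

```python
class BaseParser:
    """Hypothetical stand-in for a parser base class; not pyanyapi code."""

    settings = {}

    def __init__(self, settings=None):
        merged = {}
        # Walk the MRO from the most distant ancestor to the current class,
        # so that subclasses override their parents.
        for klass in reversed(type(self).__mro__):
            merged.update(vars(klass).get("settings", {}))
        # Settings passed at runtime take precedence over class-level ones.
        merged.update(settings or {})
        self.settings = merged


class ParentParser(BaseParser):
    settings = {"parent": "//p"}


class FirstChildParser(ParentParser):
    settings = {"parent": "//override"}


class SecondChildParser(ParentParser):
    settings = {"child": "//h1"}


print(FirstChildParser().settings["parent"])
print(SecondChildParser().settings)
print(SecondChildParser({"child": "//more"}).settings["child"])
```

Merging into a fresh dict per instance keeps the class-level settings untouched, so a runtime override on one parser does not leak into others.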

Results stripping

Parsers can automatically strip leading and trailing whitespace from results with the strip=True option.

from pyanyapi.parsers import XMLParser


>>> settings = {'p': 'string(//p)'}
>>> XMLParser(settings).parse('<p> Pcontent </p>').p
 Pcontent
>>> XMLParser(settings, strip=True).parse('<p> Pcontent </p>').p
Pcontent
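
In the output above the leading space disappears as well, which matches Python's str.strip(). The sketch below shows how such an option could be applied uniformly to string and list results. It rests on assumptions: strip_result is a hypothetical helper, and the source does not state how pyanyapi treats list results.

```python
def strip_result(value):
    """Strip whitespace from a string, or from each string in a list.

    Hypothetical helper for illustration; not part of pyanyapi.
    """
    if isinstance(value, str):
        return value.strip()
    if isinstance(value, list):
        return [strip_result(item) for item in value]
    # Leave other types (numbers, None, nested objects) untouched.
    return value


print(repr(strip_result(" Pcontent ")))
print(strip_result(["123 ", " 234"]))
```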