Wrangling with OpenStreetMap Data


OpenStreetMap is an open project, which means it's free and everyone can use and edit it as they like. OpenStreetMap is a direct competitor of Google Maps. How can OpenStreetMap compete with such a giant, you ask? It depends entirely on crowdsourcing: lots of people around the world willingly update the map, most of them fixing the map of their own country.

OpenStreetMap is powerful, but it relies heavily on human input, and that strength is also its downfall. Wherever there is human input, there will be human error, so the data is very error prone.

Take street names, for example. People like to abbreviate the street type: 'Street' becomes 'St.' or 'st.'. In Indonesia, 'Jalan' (English: street) is likewise abbreviated as 'Jln', 'jln', 'jl', or 'Jl'. Most of us barely notice, but a data scientist or web developer expects street names to follow a consistent format.

'Jalan Sudirman' -> Jalan <name> -> name = Sudirman
'Jln Sudirman' -> Jalan <name> -> ERROR!
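
Here is a minimal sketch of the problem, not part of the project code: a naive parser that expects the 'Jalan <name>' format chokes on the abbreviated forms (the street names are illustrative).

In [ ]:
import re

# A naive parser that assumes every street is written as 'Jalan <name>'
jalan_re = re.compile(r'^Jalan\s+(.+)$')

for street in ['Jalan Sudirman', 'Jln Sudirman', 'jl. Sudirman']:
    m = jalan_re.match(street)
    if m:
        print street, '-> name =', m.group(1)
    else:
        print street, '-> ERROR: unrecognised street type'

Only the first form parses cleanly; the abbreviated forms are exactly what the audit below will normalise.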

This project aims to fix that: it expands abbreviated names so they follow a more general format. Not only does this benefit professionals, it also gives everyone more consistently structured street names.

In this project, I want to show you how to fix one type of error: the street address. I chose the whole area of Jakarta, the capital of Indonesia. The dataset is huge, with over 250,000 elements. Jakarta is my hometown, and I want to give something back to the community. I will also show you how to load the audited data into a MongoDB instance, and we will use MongoDB's Aggregation Framework to get an overview and analysis of the data.

The changeset is here: http://osmhv.openstreetmap.de/changeset.jsp?id=26730562

If you want to try this yourself, you can always download the source code and play around with it :)

In [2]:
OSM_FILE = 'jakarta.osm'
In [11]:
%load mapparser.py

To audit the OSM file, we first need an overview of the data. To get that overview, we count the tags it contains.

In [3]:
# %%writefile mapparser.py
#!/usr/bin/env python

import xml.etree.ElementTree as ET
import pprint

def count_tags(filename):
    """Count the occurrences of each tag in filename.

    Initialise the count to 1 if the tag is not yet in the dict,
    increment it otherwise."""
    tags = {}
    for ev, elem in ET.iterparse(filename):
        tag = elem.tag
        if tag not in tags:
            tags[tag] = 1
        else:
            tags[tag] += 1
    return tags

def test():

    tags = count_tags(OSM_FILE)
    pprint.pprint(tags)
#     assert tags == {'bounds': 1,
#                      'member': 3,
#                      'nd': 4,
#                      'node': 20,
#                      'osm': 1,
#                      'relation': 1,
#                      'tag': 7,
#                      'way': 1}

    

if __name__ == "__main__":
    test()
{'bounds': 1,
 'member': 1024,
 'meta': 1,
 'nd': 457105,
 'node': 342236,
 'note': 1,
 'osm': 1,
 'relation': 108,
 'tag': 191366,
 'way': 62473}
In [13]:
%load tags.py
In [69]:
%%writefile tags.py
#!/usr/bin/env python
import xml.etree.ElementTree as ET
import pprint
import re


# Keys consisting only of lowercase letters and underscores
lower = re.compile(r'^([a-z]|_)*$')
# Lowercase keys with a single colon, e.g. addr:street
lower_colon = re.compile(r'^([a-z]|_)*:([a-z]|_)*$')
# Keys containing problematic characters such as spaces or punctuation
problemchars = re.compile(r'[=\+/&<>;\'"\?%#$@\,\. \t\r\n]')


def key_type(element, keys):
    """ 
    Count the criteria in dictionary for the content of the tag.
    """
    if element.tag == "tag":
        if lower.search(element.attrib['k']):
            keys['lower'] +=1
        elif lower_colon.search(element.attrib['k']):
            keys['lower_colon']+=1
        elif problemchars.search(element.attrib['k']):
            keys['problemchars']+=1
        else:
            keys['other']+=1
        
    return keys



def process_map(filename):
    keys = {"lower": 0, "lower_colon": 0, "problemchars": 0, "other": 0}
    for _, element in ET.iterparse(filename):
        keys = key_type(element, keys)

    return keys



def test():
    # You can use another testfile 'map.osm' to look at your solution
    # Note that the assertions will be incorrect then.
    keys = process_map(OSM_FILE)
    pprint.pprint(keys)
#     assert keys == {'lower': 5, 'lower_colon': 0, 'other': 2, 'problemchars': 0}


if __name__ == "__main__":
    test()
Overwriting tags.py
In [14]:
%load users.py
In [67]:
%%writefile users.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import xml.etree.ElementTree as ET
import pprint
import re

def get_user(element):
    return


def process_map(filename):
    """
    Count the user id in the filename.
    """
    users = set()
    for _, element in ET.iterparse(filename):
        try:
            users.add(element.attrib['uid'])
        except KeyError:
            continue

    return users


def test():

    users = process_map(OSM_FILE)
    pprint.pprint(users)
#     assert len(users) == 6



if __name__ == "__main__":
    test()
Overwriting users.py
In [2]:
import xml.etree.cElementTree as ET
In [1]:
%load audit.py
In [46]:
# %%writefile audit.py

import xml.etree.cElementTree as ET
from collections import defaultdict
import re
import pprint
from optparse import OptionParser

# OSMFILE = "sample.osm"
# OSMFILE = "example_audit.osm"
#In Indonesia the street type comes first, then the name, so the regex matches the first word instead of the last.
#street_type_re = re.compile(r'\b\S+\.?$', re.IGNORECASE)
street_type_re = re.compile(r'^\b\S+\.?', re.IGNORECASE)


# expected = ["Street", "Avenue", "Boulevard", "Drive", "Court", "Place", "Square", "Lane", "Road", 
#             "Trail", "Parkway", "Commons"]
expected = ['Jalan', 'Gang','Street', 'Road']
# UPDATE THIS VARIABLE
#The mapping keys are sorted by length, descending, inside update_name,
#so longer abbreviations are matched before shorter ones.
#English-Indonesian: {Street: Jalan}, e.g. 'Sudirman Street' corresponds to 'Jalan Sudirman'.
mapping = {

            'jl.':'Jalan',
            'JL.':'Jalan',
            'Jl.':'Jalan',
            'GG':'Gang',
            'gg': 'Gang',
            'jl' :'Jalan',
            'JL':'Jalan',
            'Jl':'Jalan',
        
        }
# mapping = { 
#             "Ave":"Avenue",
#             "St.": "Street",
#             "Rd." : "Road",
#             "N.":"North",
#             "St" : "Street",
#             }


def audit_street_type(street_types, street_name):
    m = street_type_re.search(street_name)
    if m:
        street_type = m.group()
        if street_type not in expected:
            street_types[street_type].add(street_name)
            #return True if the name needs to be updated
            return True
    return False


def is_street_name(elem):
    """
    Perhaps the addr:full should also included to be fixed  
    """
    return (elem.attrib['k'] == "addr:street") or (elem.attrib['k'] == "addr:full")

def is_name_is_street(elem):
    """Some people fill the name of the street in k=name.
    
    Should change this"""
    s = street_type_re.search(elem.attrib['v'])
    #print s
    return (elem.attrib['k'] == "name") and s and s.group() in mapping.keys()

def audit(osmfile):
    osm_file = open(osmfile, "r")
    street_types = defaultdict(set)
#     tree = ET.parse(osm_file, events=("start",))
    tree = ET.parse(osm_file)
    
    listtree = list(tree.iter())
    for elem in listtree:
        if elem.tag == "node" or elem.tag == "way":
            n_add = None
            
            for tag in elem.iter("tag"):
                if is_street_name(tag):
                    if audit_street_type(street_types, tag.attrib['v']):
                        #Update the tag attribute with the normalised name
                        tag.attrib['v'] = update_name(tag.attrib['v'],mapping)
                elif is_name_is_street(tag):
                    tag.attrib['v'] = update_name(tag.attrib['v'],mapping)
                    n_add = tag.attrib['v']
                   
            if n_add:
                elem.append(ET.Element('tag',{'k':'addr:street', 'v':n_add}))

            
                
    #write the audited tree out to a new file
    tree.write(osmfile[:osmfile.find('.osm')]+'_audit.osm')
    return street_types


def update_name(name, mapping):
    """
    Fixed abreviate name so the name can be uniform.
    
    The reason why mapping in such particular order, is to prevent the shorter keys get first.
    """
    dict_map = sorted(mapping.keys(), key=len, reverse=True)
    for key in dict_map:
        
        if name.find(key) != -1:          
            name = name.replace(key,mapping[key])
            return name

    #In Indonesia virtually every type of street is referred to as 'Jalan',
    #so if the name has no recognised prefix, prepend 'Jalan'.
    return 'Jalan ' + name


def test():
    st_types = audit(OSMFILE)
    pprint.pprint(dict(st_types))
    #assert len(st_types) == 3
    

    for st_type, ways in st_types.iteritems():
        for name in ways:
            better_name = update_name(name, mapping)
            print name, "=>", better_name


if __name__ == '__main__':
#     test()
    parser  = OptionParser()
    parser.add_option('-d', '--data', dest='audited_data', help='OSM file to audit')
    (opts,args) = parser.parse_args()
    audit(opts.audited_data)
    

This will save the audited Jakarta OSM data to jakarta_audit.osm. Before moving on, the cell below is a quick sanity check of update_name.
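
Assuming the audit cell above has been run, so that update_name and mapping are defined in the namespace, we can try it on a few illustrative street names (expected output in the comments):

In [ ]:
# Quick sanity check of update_name against the mapping defined above.
for raw in ['Jl. Sudirman', 'jl Thamrin', 'GG Haji Saabun', 'Medan Merdeka Barat']:
    print raw, '=>', update_name(raw, mapping)

# Expected:
# Jl. Sudirman => Jalan Sudirman
# jl Thamrin => Jalan Thamrin
# GG Haji Saabun => Gang Haji Saabun
# Medan Merdeka Barat => Jalan Medan Merdeka Barat

Now let's prepare the audited file to be loaded into a MongoDB instance.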

In [16]:
%load data.py
In [ ]:
# %%writefile data.py
#!/usr/bin/env python
import xml.etree.ElementTree as ET
import pprint
import re
import codecs
import json


lower = re.compile(r'^([a-z]|_)*$')
lower_colon = re.compile(r'^([a-z]|_)*:([a-z]|_)*$')
problemchars = re.compile(r'[=\+/&<>;\'"\?%#$@\,\. \t\r\n]')
addresschars = re.compile(r'addr:(\w+)')
CREATED = [ "version", "changeset", "timestamp", "user", "uid"]
OSM_FILE = 'jakarta_audit.osm'

def shape_element(element):
    #node = defaultdict(set)
    node = {}
    if element.tag == "node" or element.tag == "way" :
        #create the dictionary based exactly on the values in the element's attributes.
        node = {'created':{}, 'type':element.tag}
        for k in element.attrib:
            try:
                v = element.attrib[k]
            except KeyError:
                continue
            if k == 'lat' or k == 'lon':
                continue
            if k in CREATED:
                node['created'][k] = v
            else:
                node[k] = v
        try:
            node['pos']=[float(element.attrib['lat']),float(element.attrib['lon'])]
        except KeyError:
            pass
        
        if 'address' not in node.keys():
            node['address'] = {}
        #Iterate over the element's child 'tag' elements
        for stag in element.iter('tag'):

            k = stag.attrib['k']
            v = stag.attrib['v']
            #Check that the key is prefixed with 'addr:' and contains no further ':'
            if k.startswith('addr:'):
                if len(k.split(':')) == 2:
                    content = addresschars.search(k)
                    if content:
                        node['address'][content.group(1)] = v
            else:
                node[k]=v
        if not node['address']:
            node.pop('address',None)
        #Special case when the tag == way,  scrap all the nd key
        if element.tag == "way":
            node['node_refs'] = []
            for nd in element.iter('nd'):
                node['node_refs'].append(nd.attrib['ref'])
#         if  'address' in node.keys():
#             pprint.pprint(node['address'])
        return node
    else:
        return None


def process_map(file_in, pretty = False):
    """
    Process the osm file to json file to be prepared for input file to monggo
    """
    file_out = "{0}.json".format(file_in)
    data = []
    with codecs.open(file_out, "w") as fo:
        for _, element in ET.iterparse(file_in):
            el = shape_element(element)
            if el:
                data.append(el)
                if pretty:
                    fo.write(json.dumps(el, indent=2)+"\n")
                else:
                    fo.write(json.dumps(el) + "\n")
    return data

def test():

    data = process_map(OSM_FILE)
    pprint.pprint(data[500])


if __name__ == "__main__":
    test()

The processed map has been saved to jakarta_audit.osm.json. Now that we have processed the audited map file into an array of JSON documents, let's put it into a MongoDB instance. First we import the script that shapes the map.

In [3]:
from data import *
import pprint
In [39]:
data = process_map('jakarta_audit.osm')

Okay, let's check that the data looks like what we expect.

In [33]:
pprint.pprint(data[0:6])
[{'created': {'changeset': '20029239',
              'timestamp': '2014-01-16T08:18:23Z',
              'uid': '646006',
              'user': 'Irfan Muhammad',
              'version': '13'},
  'id': '29938967',
  'pos': [-6.1803929, 106.8226699],
  'type': 'node'},
 {'created': {'changeset': '20029239',
              'timestamp': '2014-01-16T08:18:23Z',
              'uid': '646006',
              'user': 'Irfan Muhammad',
              'version': '28'},
  'id': '29938968',
  'pos': [-6.1803972, 106.8231199],
  'type': 'node'},
 {'created': {'changeset': '20029239',
              'timestamp': '2014-01-16T08:18:23Z',
              'uid': '646006',
              'user': 'Irfan Muhammad',
              'version': '9'},
  'id': '29938969',
  'pos': [-6.1809102, 106.8230928],
  'type': 'node'},
 {'created': {'changeset': '20029239',
              'timestamp': '2014-01-16T08:18:23Z',
              'uid': '646006',
              'user': 'Irfan Muhammad',
              'version': '15'},
  'id': '29938970',
  'pos': [-6.1808689, 106.8226461],
  'type': 'node'},
 {'created': {'changeset': '20029239',
              'timestamp': '2014-01-16T08:18:23Z',
              'uid': '646006',
              'user': 'Irfan Muhammad',
              'version': '10'},
  'id': '29938971',
  'pos': [-6.1805893, 106.8225613],
  'type': 'node'},
 {'created': {'changeset': '20029239',
              'timestamp': '2014-01-16T08:18:23Z',
              'uid': '646006',
              'user': 'Irfan Muhammad',
              'version': '11'},
  'id': '29938972',
  'pos': [-6.1805659, 106.8232191],
  'type': 'node'}]

The data looks about right. Now that we have verified it is ready, let's insert it into MongoDB.

In [4]:
from pymongo import MongoClient
In [5]:
client  = MongoClient('mongodb://localhost:27017')
db = client.examples
In [ ]:
[db.jktosm.insert(e) for e in data]
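
A side note on the API: this notebook uses the PyMongo 2.x Collection.insert() call. On PyMongo 3.x and later, insert() is deprecated; a single insert_many() call, sketched below with the same client and data, does the same job.

In [ ]:
# Alternative for PyMongo 3.x+, where Collection.insert() is deprecated:
# insert_many() sends the documents to the server in batches
# instead of issuing one insert per document.
db.jktosm.insert_many(data)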

Okay, it seems that we have successfully inserted all of our data into the MongoDB instance. Let's test it.

In [6]:
pipeline = [
    {'$limit' : 6}
]
pprint.pprint(db.jktosm.aggregate(pipeline)['result'])
[{u'_id': ObjectId('546d9d818cbd2f060eb432f2'),
  u'created': {u'changeset': u'11134443',
               u'timestamp': u'2012-03-29T07:25:28Z',
               u'uid': u'642271',
               u'user': u'ragunan',
               u'version': u'1'},
  u'id': u'1695812051',
  u'pos': [-6.2949894, 106.8198961],
  u'type': u'node'},
 {u'_id': ObjectId('546d9d818cbd2f060eb432f3'),
  u'created': {u'changeset': u'11134443',
               u'timestamp': u'2012-03-29T07:25:28Z',
               u'uid': u'642271',
               u'user': u'ragunan',
               u'version': u'1'},
  u'id': u'1695812052',
  u'pos': [-6.2950642, 106.8199212],
  u'type': u'node'},
 {u'_id': ObjectId('546d9d818cbd2f060eb432f4'),
  u'created': {u'changeset': u'11134444',
               u'timestamp': u'2012-03-29T07:25:27Z',
               u'uid': u'642195',
               u'user': u'tebet_timur',
               u'version': u'1'},
  u'id': u'1695812053',
  u'pos': [-6.2300963, 106.855384],
  u'type': u'node'},
 {u'_id': ObjectId('546d9d818cbd2f060eb432f5'),
  u'created': {u'changeset': u'11134443',
               u'timestamp': u'2012-03-29T07:25:28Z',
               u'uid': u'642271',
               u'user': u'ragunan',
               u'version': u'1'},
  u'id': u'1695812054',
  u'pos': [-6.2950931, 106.8189926],
  u'type': u'node'},
 {u'_id': ObjectId('546d9d818cbd2f060eb432f6'),
  u'created': {u'changeset': u'11134444',
               u'timestamp': u'2012-03-29T07:25:28Z',
               u'uid': u'642195',
               u'user': u'tebet_timur',
               u'version': u'1'},
  u'id': u'1695812055',
  u'pos': [-6.2301173, 106.8553364],
  u'type': u'node'},
 {u'_id': ObjectId('546d9d818cbd2f060eb432f7'),
  u'created': {u'changeset': u'11134443',
               u'timestamp': u'2012-03-29T07:25:28Z',
               u'uid': u'642271',
               u'user': u'ragunan',
               u'version': u'1'},
  u'id': u'1695812056',
  u'pos': [-6.2950931, 106.8211174],
  u'type': u'node'}]
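
As with insert(), the aggregate() calls here rely on the PyMongo 2.x behaviour of returning a dict with a 'result' key. On PyMongo 3.x and later, aggregate() returns a cursor instead, so the equivalent (a sketch, reusing the same pipeline) would be:

In [ ]:
# On PyMongo 3.x+, aggregate() returns a CommandCursor rather than a dict,
# so materialise it with list() before pretty-printing.
pprint.pprint(list(db.jktosm.aggregate(pipeline)))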

Show 5 documents that have a street address

In [7]:
pipeline = [
            {'$match': {'address.street':{'$exists':1}}},
            {'$limit' : 5}
]
result  = db.jktosm.aggregate(pipeline)['result']
pprint.pprint(result)
[{u'_id': ObjectId('546d9d758cbd2f060eb3916d'),
  u'address': {u'housename': u'Pasar Festival',
               u'street': u'Jalan HR Rasuna Said'},
  u'building': u'yes',
  u'created': {u'changeset': u'16848088',
               u'timestamp': u'2013-07-06T12:21:11Z',
               u'uid': u'76518',
               u'user': u'Firman Hadi',
               u'version': u'2'},
  u'id': u'1394516071',
  u'leisure': u'sports_centre',
  u'name': u'Soemantri Brojonegoro',
  u'pos': [-6.2213611, 106.8329498],
  u'sport': u'basketball',
  u'type': u'node'},
 {u'_id': ObjectId('546d9d768cbd2f060eb39c64'),
  u'address': {u'city': u'Jakarta',
               u'country': u'ID',
               u'housename': u'Meruvian Camp - Cempaka Baru',
               u'housenumber': u'39',
               u'street': u'Jalan Swadaya 2 No. 39'},
  u'created': {u'changeset': u'9758314',
               u'timestamp': u'2011-11-06T18:44:31Z',
               u'uid': u'70696',
               u'user': u'xybot',
               u'version': u'2'},
  u'id': u'1493006911',
  u'pos': [-6.1700951, 106.8655072],
  u'type': u'node'},
 {u'_id': ObjectId('546d9d6d8cbd2f060eb32173'),
  u'address': {u'housename': u'Gandaria City',
               u'postcode': u'12240',
               u'street': u'Jalan Sultan Iskandar Muda Kebayoran Lama'},
  u'created': {u'changeset': u'7760855',
               u'timestamp': u'2011-04-04T04:16:03Z',
               u'uid': u'431638',
               u'user': u'esoedjasa',
               u'version': u'1'},
  u'id': u'1231819753',
  u'name': u'Gandaria City',
  u'pos': [-6.2446998, 106.7832904],
  u'shop': u'supermarket',
  u'type': u'node'},
 {u'_id': ObjectId('546d9d6d8cbd2f060eb323e9'),
  u'address': {u'street': u'Jalan Sahari'},
  u'created': {u'changeset': u'11638099',
               u'timestamp': u'2012-05-18T22:26:16Z',
               u'uid': u'445671',
               u'user': u'flierfy',
               u'version': u'2'},
  u'highway': u'bus_stop',
  u'id': u'1278972435',
  u'name': u'Halte Sahari',
  u'pos': [-6.1277779, 106.8464371],
  u'type': u'node'},
 {u'_id': ObjectId('546d9d758cbd2f060eb39153'),
  u'address': {u'housename': u'Pasar Festival',
               u'housenumber': u'Kav C.22 Unit GF 05-06',
               u'postcode': u'12960',
               u'street': u'Jalan HR Rasuna Said'},
  u'amenity': u'restaurant',
  u'created': {u'changeset': u'10024298',
               u'timestamp': u'2011-12-03T18:57:09Z',
               u'uid': u'92274',
               u'user': u'adjuva',
               u'version': u'5'},
  u'cuisine': u'Indonesian',
  u'id': u'1394496957',
  u'name': u'Warung Tekko',
  u'phone': u'+62 21 5263137',
  u'phone2': u'+62 21 5263278',
  u'pos': [-6.2216971, 106.8328855],
  u'type': u'node',
  u'website': u'www.facebook.com/warungtekko'}]

Show the top 5 contributing users

In [45]:
pipeline = [
            {'$match': {'created.user':{'$exists':1}}},
            {'$group': {'_id':'$created.user',
                        'count':{'$sum':1}}},
            {'$sort': {'count':-1}},
            {'$limit' : 5}
]
result  = db.jktosm.aggregate(pipeline)['result']
pprint.pprint(result)
[{u'_id': u'Firman Hadi', u'count': 113770},
 {u'_id': u'dimdim02', u'count': 38860},
 {u'_id': u'riangga_miko', u'count': 36695},
 {u'_id': u'raniedwianugrah', u'count': 30388},
 {u'_id': u'Alex Rollin', u'count': 26496}]

Show each restaurant's name, the cuisine it serves, and its contact number

In [9]:
pipeline = [
            {'$match': {'amenity':'restaurant',
                        'name':{'$exists':1}}},
            {'$project':{'_id':'$name',
                         'cuisine':'$cuisine',
                         'contact':'$phone'}}
]
result  = db.jktosm.aggregate(pipeline)['result']
pprint.pprint(result)
[{u'_id': u'Taman Hek'},
 {u'_id': u'3 House'},
 {u'_id': u'Jimbaran'},
 {u'_id': u'Death by Chocolate'},
 {u'_id': u"McDonald's"},
 {u'_id': u"Chef's Kitchen"},
 {u'_id': u'Planet Hollywood Jakarta', u'cuisine': u'american'},
 {u'_id': u'Soto kudus'},
 {u'_id': u'KFC Cikini', u'cuisine': u'chicken'},
 {u'_id': u'Mc Donald Cikini'},
 {u'_id': u'Pempek Cuko'},
 {u'_id': u'Warung Tekko',
  u'contact': u'+62 21 5263137',
  u'cuisine': u'Indonesian'},
 {u'_id': u'Kafe Betawi', u'cuisine': u'asian'},
 {u'_id': u'QBox Cafe', u'cuisine': u'asian'},
 {u'_id': u'Comics Cafe', u'cuisine': u'american'},
 {u'_id': u'Pizza Hut', u'cuisine': u'pizza'},
 {u'_id': u'Otel Lobby', u'cuisine': u'international'},
 {u'_id': u'Loewy', u'cuisine': u'french'},
 {u'_id': u'Food Court Passer Kuningan', u'cuisine': u'asian'},
 {u'_id': u'Pastis', u'cuisine': u'italian'},
 {u'_id': u'Pizza Hut', u'cuisine': u'pizza'},
 {u'_id': u'Dunkin donuts'},
 {u'_id': u'Warung Pasta'},
 {u'_id': u'Ayam Balphuss'},
 {u'_id': u'Riung Tenda'},
 {u'_id': u'Ayam Bakar Gilimanuk'},
 {u'_id': u'Ikan Bakar Banyuwangi'},
 {u'_id': u'Dim Sum Inc'},
 {u'_id': u'Heartz Chicken Buffet'},
 {u'_id': u'Ko he Noor'},
 {u'_id': u'de Resto'},
 {u'_id': u'Bakmi GM'},
 {u'_id': u'Caho Mung Qui Khach'},
 {u'_id': u'Dapur Melayu', u'cuisine': u'asian'},
 {u'_id': u'E Corner'},
 {u'_id': u'Ho Lung Sechan Cuisine', u'cuisine': u'asian'},
 {u'_id': u'Madam Kwok'},
 {u'_id': u'Mangotree Bistro'},
 {u'_id': u'Talaga'},
 {u'_id': u'Tgrill'},
 {u'_id': u'Usselsspring'},
 {u'_id': u'Eastern Promise'},
 {u'_id': u'Bubur Angke', u'cuisine': u'chinese'},
 {u'_id': u'Kembang Goela', u'cuisine': u'indonesia'},
 {u'_id': u'Kantin Mega Rasa', u'cuisine': u'indonesian'},
 {u'_id': u'Mbah Jingkrak Setiabudi', u'cuisine': u'indonesian'},
 {u'_id': u'Makan Babi'},
 {u'_id': u'3 house'},
 {u'_id': u'YaUdah bistro',
  u'contact': u'+62213140343',
  u'cuisine': u'german'},
 {u'_id': u'Mamink Daeng Tata', u'cuisine': u'regional'},
 {u'_id': u'Restoran Putri Duyung'},
 {u'_id': u'Le Bridge Restaurant'},
 {u'_id': u'Lanna Thai', u'cuisine': u'thai'},
 {u'_id': u'The Goods Diner'},
 {u'_id': u'Taco Local', u'cuisine': u'mexican'},
 {u'_id': u"Chili's", u'cuisine': u'american'},
 {u'_id': u'Hacienda ', u'cuisine': u'mexican'},
 {u'_id': u'Sederhana', u'cuisine': u'regional'},
 {u'_id': u'Dim Sum Restaurant', u'cuisine': u'international'},
 {u'_id': u'Warung Desa', u'cuisine': u'asian'},
 {u'_id': u'PEPeNERO'},
 {u'_id': u'Sakura Japanese Restaurant'},
 {u'_id': u'Pelangi Seafood ', u'cuisine': u'indonesian'},
 {u'_id': u'Restoran Kurnia Jaya'},
 {u'_id': u'Rumah Makan Padang Sederhana'},
 {u'_id': u'Wabito Ramen',
  u'contact': u'62 21 3923810',
  u'cuisine': u'japanese'},
 {u'_id': u'Rava House'},
 {u'_id': u'Musketeers'},
 {u'_id': u'Kebab Baba Rafi'},
 {u'_id': u'Bakul TUkul'},
 {u'_id': u'Nasi Bebek'},
 {u'_id': u'Ayam Panggang Rawamangun', u'cuisine': u'chicken'},
 {u'_id': u'Goma ramen', u'contact': u'081807217074', u'cuisine': u'japanese'},
 {u'_id': u'Takigawa', u'cuisine': u'Japanese'},
 {u'_id': u'Warung Pasta', u'cuisine': u'italian'},
 {u'_id': u'Rumah Solo'},
 {u'_id': u'Amigos', u'cuisine': u'mexican'},
 {u'_id': u'Amigos', u'cuisine': u'mexican'},
 {u'_id': u'Amigos', u'cuisine': u'mexican'},
 {u'_id': u'Koi'},
 {u'_id': u'sop janda', u'cuisine': u'regional'},
 {u'_id': u'Waroeng Kito', u'cuisine': u'chicken,_juice'},
 {u'_id': u'Sate Senayan'},
 {u'_id': u'Holy Cow'},
 {u'_id': u'Holy Cow'},
 {u'_id': u'MM Juice'},
 {u'_id': u'Abuba Steak'},
 {u'_id': u'Bubur Mangga Besar', u'cuisine': u'congee'},
 {u'_id': u'Pia Jakarta', u'cuisine': u'bakpia,hopia,pia'},
 {u'_id': u'Awen Seafood', u'cuisine': u'seafood'},
 {u'_id': u'Bluegrass',
  u'contact': u'+62 21 29941660',
  u'cuisine': u'american'},
 {u'_id': u'Warung Bang Hoody'},
 {u'_id': u'Bakmi Toko Tiga', u'cuisine': u'chinese'},
 {u'_id': u'Ayam Goreng Berkah Rachmat'},
 {u'_id': u'Ayam Goreng Suharti'},
 {u'_id': u'Bushido Restaurant'},
 {u'_id': u'Restoran Caping Gunung'},
 {u'_id': u'Bakso Lapangan tembak', u'cuisine': u'regional'},
 {u'_id': u'Baruna'},
 {u'_id': u'Pizza Hut Matraman', u'cuisine': u'pizza'},
 {u'_id': u'RM. Handayani'},
 {u'_id': u'kintamani'},
 {u'_id': u'sentral'},
 {u'_id': u'Kantin Umum', u'cuisine': u'variety_of_cuisines'},
 {u'_id': u'RM Raja Rasa', u'cuisine': u'regional'},
 {u'_id': u'RM Sederhana', u'cuisine': u'regional'},
 {u'_id': u'Sate Tomang', u'cuisine': u'regional'},
 {u'_id': u'warkop asep'},
 {u'_id': u'Iga Bakar Mas Giri', u'cuisine': u'regional'},
 {u'_id': u'Ayam Presto', u'cuisine': u'regional'},
 {u'_id': u'Masakan Rumah Ibu Endang', u'cuisine': u'regional'},
 {u'_id': u'Oenpao', u'cuisine': u'chinese'},
 {u'_id': u'rumah makan ibu ida'},
 {u'_id': u'fix me', u'cuisine': u'chinese'},
 {u'_id': u'Sederhana', u'cuisine': u'padang'},
 {u'_id': u'Saung Elbuston'},
 {u'_id': u'Rumah Makan Soto Betawi'},
 {u'_id': u'Warung Kopi'},
 {u'_id': u'Kampung Kandang'},
 {u'_id': u'La Codefin'},
 {u'_id': u'Kantin Prima Salemba'},
 {u'_id': u'DeJons Burger'},
 {u'_id': u'Bebek Kaleyo', u'cuisine': u'regional'},
 {u'_id': u'Q Smokehouse'},
 {u'_id': u'Kemang Food Fest'},
 {u'_id': u'Rumah Makan Padang', u'cuisine': u'international'},
 {u'_id': u'Bakmi Fajar', u'cuisine': u'regional'},
 {u'_id': u'Foof Court Pinang Ranti'},
 {u'_id': u'RM Sederhana'},
 {u'_id': u'warung soto'},
 {u'_id': u'Pizza Hut'},
 {u'_id': u'AYAM GORENG SUHARTI'},
 {u'_id': u'Ayam Goreng Ny. Suharti'}]