Basic analysis pandas
Using pyodk
and pandas
for basic analysis¶
pyodk
simplifies connecting to ODK Central using Python. 🐍
If your preferred data science language is R, check out ruODK
and repvisforODK
.
To get started with pyodk
, you'll need to configure a Central server you'd like to connect to. Create a .pyodk_config.toml
file in your home directory. Copy the following configuration and replace the placeholders:
[central]
base_url = "https://www.example.com"
username = "my_user"
password = "my_password"
default_project_id = 123
pyodk
will use these credentials and defaults so that they never have to appear in your Python code.
To use these examples as-is, you will need to put this form definition on your server and make a few submissions.
You will need to create a Client
to establish a connection to your configured Central server so let's import the class. We'll also be using the pandas
library.
from pyodk.client import Client
import pandas as pd
with Client() as client:
submissions = client.submissions.get_table(form_id='fav_color')
df = pd.json_normalize(data=submissions['value'], sep='/')
df.head(3)
__id | first_name | age | favorite_color | favorite_color_other | location/type | location/coordinates | location/properties/accuracy | meta/audit | meta/instanceID | ... | __system/submitterId | __system/submitterName | __system/attachmentsPresent | __system/attachmentsExpected | __system/status | __system/reviewState | __system/deviceId | __system/edits | __system/formVersion | location | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | uuid:7993ed66-5f17-432d-896c-fd9cbc76bf07 | Tochakwa | 37 | o | Blue | Point | [7.436423, 10.528851, 646.1000366210938] | 16.204 | None | uuid:7993ed66-5f17-432d-896c-fd9cbc76bf07 | ... | 548 | WHO | 0 | 0 | None | rejected | None | 0 | 2022062100 | NaN |
1 | uuid:f1b96b04-4cf8-4bdf-8eeb-b1f2893e2022 | John Doe | 50 | g | None | Point | [17.071151, -22.555347, 1711.5999755859375] | 17.416 | None | uuid:f1b96b04-4cf8-4bdf-8eeb-b1f2893e2022 | ... | 548 | WHO | 0 | 0 | None | None | None | 0 | 2022062100 | NaN |
2 | uuid:3a470336-2de5-46eb-9da5-3a03815f9fde | Zeenia | 19 | y | None | Point | [-117.1115681395832, 32.773376588223655, 68.08... | 0.000 | audit.csv | uuid:d1c762c8-51ee-4966-a14b-5d50427c1534 | ... | 56 | Yaw | 1 | 1 | None | rejected | collect:cAOhxkeJksuCQfjE | 1 | 2022062100 | NaN |
3 rows × 23 columns
You can build graphs from form data:
colors = {'g':'Green', 'o':'Orange', 'r':'Red', 'y':'Yellow'}
df['favorite_color_labels'] = df['favorite_color'].map(colors)
df['favorite_color_labels'].value_counts().plot(kind='bar', title='Count of favorite colors', xlabel="Color", ylabel="Count", rot=0)
<AxesSubplot:title={'center':'Count of favorite colors'}, xlabel='Color', ylabel='Count'>
You can also analyze form metadata such as review state:
df['__system/reviewState'].value_counts().plot(kind='pie', title='Submission review state', ylabel='', rot=0)
<AxesSubplot:title={'center':'Submission review state'}>
With geopandas
, you can quickly plot points or create choropleth maps:
import geopandas
from geopandas import GeoDataFrame
world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
base = world.plot(color='lightgrey', figsize=(15, 10))
base.set_axis_off()
locations = pd.DataFrame(df['location/coordinates'].dropna().to_list(), columns=['x', 'y', 'alt'])
geodf = GeoDataFrame(locations, geometry=geopandas.points_from_xy(locations['x'], locations['y']))
geodf.plot(ax=base, marker='o', color='blue', markersize=15)
<AxesSubplot:>