With the datavana databoard service, you can feed content into an LLM for automated analyses:
- 📄 SummaryWorkflow: Used for qualitative content analysis approaches. Summaries can, for example, be single words (topics, open categories) or contain multiple sentences (abstracts). If you provide categories, one summary or keyword is returned for each category.
- 🏷️ CodingWorkflow: Assign predefined categories to a text, i.e. coding in the sense of quantitative content analysis. You provide a codebook with category name, description, and optionally examples. Depending on your settings, the workflow will return one or multiple matching categories.
- 🖍️ AnnoWorkflow: Add annotations, for example, to extract person or place names. According to your rule set, the matching text segments are enclosed in XML tags.
- 🧩 TripleWorkflow: Extract propositions in a text. The workflow returns statements for each sentence in the input text. Each statement consists of a subject-predicate-object triple. Such a structure is, for example, used for knowledge graphs (RDF).
The databoard service generates prompts from templates, inserting your data and rules into the prompts. Prompts are posted one by one to the University of Münster UniGPT service (currently Llama-3.3-70B). Results are parsed and returned as structured JSON data. To work with raw LLM prompts and results, use the summary workflow with your own prompt templates, which bypasses pre- and postprocessing.
Contact the Digital Media & Computational Methods research unit to get a user account for the service.
How to prepare your data?
We recommend preparing your input data as a CSV file. The file should contain your input in the `text` column, for example:
| case | text |
|---|---|
| 1 | The Beatles, Hey Jude, 1968 |
| 2 | Michael Jackson, Thriller, 1982 |
| 3 | Queen, Bohemian Rhapsody, 1975 |
| 4 | Adele, Rolling in the Deep, 2010 |
| 5 | Bob Dylan, Like a Rolling Stone, 1965 |
| 6 | Nirvana, Smells Like Teen Spirit, 1991 |
| 7 | Madonna, Like a Virgin, 1984 |
| 8 | Elton John, Your Song, 1970 |
| 9 | Taylor Swift, Shake It Off, 2014 |
| 10 | Billie Eilish, Bad Guy, 2019 |
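If you work in Python, such a file can be read with the standard library; a minimal sketch using an inline sample in place of the CSV file (column names as in the table above):

```python
import csv
import io

# Minimal sketch of the expected input format: one case per row,
# with the content to be processed in the "text" column.
# io.StringIO stands in for an open CSV file here.
sample = io.StringIO(
    "case,text\n"
    '1,"The Beatles, Hey Jude, 1968"\n'
    '2,"Michael Jackson, Thriller, 1982"\n'
)

rows = list(csv.DictReader(sample))
texts = [row["text"] for row in rows]  # each entry becomes one input case
print(texts)
```

Quoting the `text` values keeps the commas inside them from being read as column separators.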
Rule-based workflows such as coding need a rule book. Prepare it as a JSON list in which each rule contains the fields:
- category (the output; should only contain lowercase letters, numbers, and underscores),
- description (the definition of the category or rule),
- example (one or more examples matching the rule).
```json
[
  {
    "category": "pop",
    "description": "Popular music with catchy melodies and broad appeal",
    "example": "Bruno Mars - Uptown Funk"
  },
  {
    "category": "rock",
    "description": "Music characterized by strong beats and electric guitars",
    "example": "Led Zeppelin - Stairway to Heaven"
  },
  {
    "category": "hip_hop",
    "description": "Rhythmic music often featuring rapping and DJing",
    "example": "Eminem - Lose Yourself"
  },
  {
    "category": "jazz",
    "description": "Music with swing and blue notes, improvisation, and complex harmonies",
    "example": "John Coltrane - Giant Steps"
  },
  {
    "category": "classical",
    "description": "Western art music from the medieval to the modern period",
    "example": "Johann Sebastian Bach - Brandenburg Concerto No.3"
  },
  {
    "category": "country",
    "description": "Music with roots in American folk and western styles",
    "example": "Dolly Parton - Jolene"
  },
  {
    "category": "electronic",
    "description": "Music produced primarily with electronic instruments and synthesizers",
    "example": "Calvin Harris - Summer"
  },
  {
    "category": "rb",
    "description": "Rhythm and Blues featuring soulful vocals and groove-based instrumentation",
    "example": "Mary J. Blige - Family Affair"
  },
  {
    "category": "reggae",
    "description": "Music originating from Jamaica characterized by offbeat rhythms",
    "example": "Peter Tosh - Legalize It"
  },
  {
    "category": "metal",
    "description": "Loud, aggressive rock music with distorted guitars and emphatic rhythms",
    "example": "Iron Maiden - The Trooper"
  }
]
```
Instead of working with JSON directly, you can prepare a CSV or Excel file to be converted to JSON.
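One way to do such a conversion yourself is with Python's standard library; a sketch using an inline sample in place of the CSV file (column names must match the fields described above):

```python
import csv
import io
import json

# Sketch: convert a rules table (CSV with category, description,
# example columns) into the JSON rule book format.
# io.StringIO stands in for an open CSV file here.
rules_csv = io.StringIO(
    "category,description,example\n"
    "pop,Popular music with catchy melodies and broad appeal,Bruno Mars - Uptown Funk\n"
    "rock,Music characterized by strong beats and electric guitars,Led Zeppelin - Stairway to Heaven\n"
)

rules = list(csv.DictReader(rules_csv))
print(json.dumps(rules, indent=2))
```

Each CSV row becomes one JSON object, so the output matches the rule book structure shown above.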
Let the LLM do the work!
Processing data with the databoard service is always a two-step procedure: 1. Submit tasks and get task IDs, 2. Retrieve task results with your task IDs. This approach allows you to submit large samples (each case as a single task), let the service do its work, and come back later to retrieve results once they are ready. Currently, all users share a queue and tasks are processed one after another. Running jobs for hours or overnight is absolutely fine! If in doubt, reach out to us to coordinate resources.
The databoard service provides a simple REST API to push data and get results. If you don't want to code yourself, use Facepager to submit data and fetch results. Python scripts are well suited for interacting with the API (see the examples below). You can try out the API on the API documentation page with some examples. Read the documentation for further options not explained below.
Getting Started with Facepager
- Download the latest Facepager version.
- Click the Presets button and open the Databoard category. Read the preset explanations.
- Add your data as nodes. Rule-based workflows such as coding tasks need a rule book. Either adjust the rules in the payload or put your JSON rules file into the upload folder and include it with a placeholder. See the preset explanation for how to do it.
- Click the login button and enter your Databoard credentials.
- Fetch data with a preset to submit tasks.
- Fetch data with a preset to get task results. Keep fetching until all results are ready.
- Adjust the column setup and export your data.
For larger samples, we recommend first submitting all cases one by one with a wait timeout of 0 (fast task submission) and then polling at reasonable intervals until the tasks are finished. For smaller samples, set the wait timeout to 10 seconds: the result is returned immediately if the input case is processed within 10 seconds. Otherwise, start polling at reasonable intervals.
Getting Started with Python
The following snippets contain an example of how to submit one case for coding. This should get you started with developing a script for submitting larger jobs. Don't forget to implement error handling and a progress bar :)
See the API documentation for further options.
1. Get token
```python
import requests

token_resp = requests.post(
    "https://databoard.uni-muenster.de/token",
    data={
        "username": "YOUR DATABOARD USERNAME",
        "password": "YOUR DATABOARD PASSWORD",
    }
)

access_token = ""
if token_resp.status_code == 200:
    access_token = token_resp.json()["access_token"]
    print("✅ Logged in")
```
Tokens are usually valid for one day.
2. Submit task
```python
import requests

task_resp = requests.post(
    "https://databoard.uni-muenster.de/tasks/run",
    json={
        "task": "coding",
        "input": ["Adele, Rolling in the Deep, 2010"],
        "options": {
            "rules": [
                {
                    "category": "pop",
                    "description": "Popular music with catchy melodies and broad appeal"
                },
                {
                    "category": "rock",
                    "description": "Music characterized by strong beats and electric guitars"
                },
                {
                    "category": "classical",
                    "description": "Western art music from the medieval to the modern period"
                },
                {
                    "category": "country",
                    "description": "Music with roots in American folk and western styles"
                },
                {
                    "category": "rb",
                    "description": "Rhythm and Blues featuring soulful vocals and groove-based instrumentation"
                }
            ],
            "mode": "multi"
        }
    },
    headers={"Authorization": f"Bearer {access_token}"}
)

task_state = "PENDING"
task_id = ""
if task_resp.ok:
    task_result = task_resp.json()
    task_id = task_result['task_id']
    task_state = task_result['state']
    print(f"✅ Submitted task {task_id}")
```
Try out how the result looks if you change the mode to "single".
3. Retrieve result
```python
import time
import json
import requests

while task_state == 'PENDING':
    task_resp = requests.get(
        f"https://databoard.uni-muenster.de/tasks/run/{task_id}",
        headers={"Authorization": f"Bearer {access_token}"}
    )
    task_result = task_resp.json()
    task_id = task_result['task_id']
    task_state = task_result['state']
    if task_state == 'PENDING':
        print("⌛ Waiting 10 more seconds")
        time.sleep(10)

print(f"Task finished with state {task_state}")
print(json.dumps(task_result, indent=2))
```
You submit your cases as a list. In the results, you will find one answer for each input case of the task. While you can submit multiple cases in one task, we recommend splitting cases for larger samples: submit only one item in the input list and you get exactly one item in the answers array.
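For larger samples, the one-case-per-task pattern can be wrapped in a small helper. A sketch under the assumptions of the earlier snippets (same endpoint and payload fields; `build_coding_task` and `submit_cases` are illustrative names, and error handling is kept minimal):

```python
import requests

def build_coding_task(text, rules, mode="multi"):
    """Build the payload for one coding task (one case per task)."""
    return {
        "task": "coding",
        "input": [text],
        "options": {"rules": rules, "mode": mode},
    }

def submit_cases(texts, rules, access_token):
    """Submit each case as its own task and collect the task IDs."""
    task_ids = []
    for text in texts:
        resp = requests.post(
            "https://databoard.uni-muenster.de/tasks/run",
            json=build_coding_task(text, rules),
            headers={"Authorization": f"Bearer {access_token}"},
        )
        resp.raise_for_status()  # fail loudly on submission errors
        task_ids.append(resp.json()["task_id"])
    return task_ids
```

You would then poll each collected task ID as shown in step 3, ideally with a wait timeout of 0 during submission.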
The prompt templates
There are two prompt templates for each workflow: a system prompt and a user prompt.
The templates can contain the placeholders {{text}} (replaced by your input text)
and {{rules}} (replaced by your rule book, automatically formatted as markdown by the workflows).
Default system prompt for the coding workflow in mode=multi:
```
You are an expert in content analysis. Be precise and concise. Use the codebook provided to decide for each category, whether the text falls into the category. Please carefully read the text and determine for each category whether it applies. Return one line for each category. Each returned line must start with the category name, followed by a colon ':' and then followed by one of the codes '2' if the category strongly applies, '1' if the category applies, '0' if not, '?' if you can't decide. Return only the lines for each category - nothing else, no introduction, no explanation, no disclaimer!
```
Each workflow has its own post-processing procedures.
For example, the coding workflow expects a list of category names followed by a colon and an output value.
The LLM output is then parsed and returned as JSON by the API.
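An illustrative sketch of this kind of post-processing: parsing "category: code" lines from the raw LLM output into a dict. This mirrors the output format described above, not the service's actual implementation.

```python
# Example raw LLM output in the format requested by the system prompt.
raw_output = "pop: 2\nrock: 0\nrb: 1\nclassical: ?"

codes = {}
for line in raw_output.splitlines():
    # Split on the first colon: left side is the category, right the code.
    category, _, value = line.partition(":")
    codes[category.strip()] = value.strip()

print(codes)  # {'pop': '2', 'rock': '0', 'rb': '1', 'classical': '?'}
```

This also shows why the output codes must follow the requested format strictly: lines without a colon would produce empty values.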
Default user prompt for the coding workflow in mode=multi:
```
The following text must be classified in all the categories defined in the codebook below.
Return one line per category. Nothing else, no introduction, no explanation, no disclaimer!
# Code Book:
{{rules}}
# Text to classify:
{{text}}
```
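The substitution step itself can be sketched in a few lines. This is purely illustrative: the actual markdown formatting of {{rules}} happens inside the workflows, and the template below is a shortened stand-in.

```python
# Illustrative: how {{rules}} and {{text}} are conceptually filled in.
user_template = "# Code Book:\n{{rules}}\n# Text to classify:\n{{text}}"

rules_md = "- pop: Popular music with catchy melodies and broad appeal"
text = "Adele, Rolling in the Deep, 2010"

prompt = user_template.replace("{{rules}}", rules_md).replace("{{text}}", text)
print(prompt)
```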
Changing the prompts is easy: submit your own templates along with the task options. Use the `summarize` task for full input and output control. Since the plain summary task does not involve any rules, pre- and postprocessing is skipped. Example:
```python
task_resp = requests.post(
    "https://databoard.uni-muenster.de/tasks/run",
    json={
        "task": "summarize",
        "input": ["The moon landing in 1969 marked a monumental achievement in human history, showcasing humanity's ability to explore beyond Earth. Neil Armstrong's first steps on the lunar surface symbolized a giant leap forward for science and technology."],
        "options": {
            "prompts": {
                "system": "Output a list of person names contained in the input, comma separated. Output NA if no persons are found. Nothing else, no explanation!",
                "user": "{{text}}"
            }
        }
    },
    headers={"Authorization": f"Bearer {access_token}"}
)
```
Be aware: JSON does not allow raw line breaks inside values. Each value is a single line; escape line breaks with `\n`.
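If you build the payload in Python, `json.dumps` takes care of this escaping automatically:

```python
import json

# A prompt containing a real newline: json.dumps escapes it as \n,
# so the resulting JSON value stays on a single line.
system_prompt = "Line one.\nLine two."
payload = json.dumps({"system": system_prompt})
print(payload)  # {"system": "Line one.\nLine two."}
```

The same applies when you pass a dict to the `json=` parameter of `requests.post`: the serialization is handled for you.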