nlppln

A flexible solution to build text mining workflows that allows you to quickly combine Natural Language Processing tools from different sources.

4 mentions | 1 contributor | 343 commits | Last update: November 14, 2018

What nlppln can do for you

  • Quickly build text mining and/or NLP workflows in Python
  • Combine tools written in different programming languages

Digital Humanities research often involves Natural Language Processing (NLP), in which a body of natural language text, or corpus, is analyzed using software. Although many software packages are available, constructing new research analyses by combining (parts of) existing packages remains challenging. Individual packages are designed to do one task and to do it well; they are not primarily designed to interact with other, complementary packages. Another problem is that many tools are available for English, but far fewer for other languages.

nlppln (pronounced 'NLP pipeline') is an open-source Python package that helps to address these problems by making it easy to package existing tools in a uniform way, as defined in the Common Workflow Language (CWL) standard for describing data analysis workflows. nlppln includes components for common NLP tasks, such as tokenization (multiple languages), lemmatization (for Dutch), and named entity recognition (for Dutch). These components are based on existing tools. Users can construct new analysis workflows by combining these ready-made components with tools of their own creation.

Besides improving interoperability, nlppln also keeps a formal record of all steps taken in a workflow. This makes the research more transparent and easier to reproduce.
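
To make the idea of combining uniformly packaged tools concrete, here is a minimal sketch of what a CWL-style workflow description looks like when written out as plain data. It uses only the Python standard library and does not use nlppln's own API. The step names, the tool files `tokenize.cwl` and `lemmatize.cwl`, and the parameter names `txt_dir`, `in_dir`, and `out_dir` are all hypothetical and chosen for illustration only.

```python
import json


def make_step(tool, inputs, outputs):
    """Describe one workflow step: the tool to run, its inputs, and its outputs."""
    return {"run": tool, "in": inputs, "out": outputs}


def build_workflow():
    """Chain two hypothetical command-line tools into a CWL-style workflow."""
    return {
        "cwlVersion": "v1.0",
        "class": "Workflow",
        # The workflow takes a single directory of plain-text files.
        "inputs": {"txt_dir": "Directory"},
        # The final result is whatever the last step produces.
        "outputs": {
            "lemmas": {"type": "Directory", "outputSource": "lemmatize/out_dir"}
        },
        "steps": {
            # Hypothetical tool descriptions; any tool packaged the same
            # way could be substituted here.
            "tokenize": make_step(
                "tokenize.cwl", {"in_dir": "txt_dir"}, ["out_dir"]
            ),
            # The output of one step is wired to the input of the next.
            "lemmatize": make_step(
                "lemmatize.cwl", {"in_dir": "tokenize/out_dir"}, ["out_dir"]
            ),
        },
    }


workflow = build_workflow()
# The serialized description doubles as a record of every step taken.
workflow_text = json.dumps(workflow, indent=2)
print(workflow_text)
```

Because every step is described in the same way, swapping in a different tokenizer or adding a further step only means changing one entry under `steps`. The serialized description can be stored alongside the results as a record of how they were produced.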

Tags
  • Text analysis & natural language processing
Programming Language
  • Python
License
  • Apache-2.0

Mentions

4 Presentations

  • A Tool for Flexible and Transparent Text Processing Pipelines
  • A Standard for NLP Pipelines
  • Creating Flexible and Transparent Data Processing Pipelines using Common Workflow Language
  • Flexible NLP Pipelines for Digital Humanities Research

Contributors

  • Janneke van der Zwaan
    Netherlands eScience Center
Contact person
Janneke van der Zwaan
Netherlands eScience Center