Corpus Generator & NLP Analysis


How to Use?

Get your API key. Create your corpus. Do your NLP research. Done.

Step 1. Get the API key.


Visit the Springer Nature API Portal to get a free API key. Using an educational email address is recommended.

Step 2. Create corpus.

Use our Python 2 tool to build your own corpus of the titles and abstracts of publications that match your selected keyword and year.


➊ Download KnoGlo. You can also download or clone our repository.

Unzip it, and you will see a folder called source-code.

➋ Prepare the workspace for running Python. For example, on Mac, use the cd command in Terminal to change the current directory to the "source-code/Corpus-Generator-and-NLP" folder:

cd mypath/source-code/Corpus-Generator-and-NLP

➌ Run the corpus generator KnoGlo-Corpus-Generator-keyword-and-year.py by using this command:

python2 KnoGlo-Corpus-Generator-keyword-and-year.py

➍ The KnoGlo corpus generator will first ask you to name a corpus folder. Enter your desired name. If a folder with that name already exists in the Corpus-Generator-and-NLP folder, the program will use it and generate its output files there. Otherwise, it will create a new corpus folder for you.

At the same time, another folder will be created in order to save the raw JSON output in TXT format.

➎ Enter a keyword for the topic of your choice. It can be a single word or a phrase.

➏ Next, enter a year for the publications that you'd like to see.

➐ Enter your Springer Nature API key. Press enter or return to start generating files.
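The folder handling described in step ➍ can be sketched in a few lines. This is only an illustration written in Python 3, not the code of the KnoGlo tool itself: the function name prepare_corpus_folders and the "-raw-json" suffix for the second folder are assumptions made for the example, since the actual name of the raw output folder is chosen by the tool.

```python
import os


def prepare_corpus_folders(base_dir, corpus_name, raw_suffix="-raw-json"):
    """Return (corpus_dir, raw_dir), creating each folder only if it is missing."""
    corpus_dir = os.path.join(base_dir, corpus_name)
    # raw_suffix is a hypothetical naming scheme for the raw JSON folder
    raw_dir = os.path.join(base_dir, corpus_name + raw_suffix)
    for folder in (corpus_dir, raw_dir):
        # exist_ok=True reuses an existing folder instead of raising an error
        os.makedirs(folder, exist_ok=True)
    return corpus_dir, raw_dir
```

Calling the function twice with the same name returns the same two paths, which mirrors the behavior of reusing a corpus folder that already exists.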

Each time you run this tool, it generates a TXT file inside your corpus folder that includes all the titles and abstracts.
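To give a rough idea of what one run involves, here is a minimal Python 3 sketch of querying the API and keeping only titles and abstracts. It is not the KnoGlo tool (which is written in Python 2), and several details are assumptions to check against the API portal documentation: the endpoint URL, the keyword: and year: query syntax, and a response shape with a "records" list whose entries carry plain-string "title" and "abstract" fields. All function names are made up for the example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Springer Nature Metadata API; verify on the API portal.
BASE_URL = "https://api.springernature.com/metadata/json"


def build_query_url(keyword, year, api_key):
    """Build a request URL for one keyword and one publication year."""
    # Assumed query syntax: a quoted keyword constraint plus a year constraint
    query = 'keyword:"{}" year:{}'.format(keyword, year)
    return BASE_URL + "?" + urlencode({"q": query, "api_key": api_key})


def extract_titles_and_abstracts(response):
    """Turn a parsed JSON response into plain text, one record per block."""
    blocks = []
    for record in response.get("records", []):
        # Missing fields fall back to empty strings rather than failing
        title = record.get("title", "")
        abstract = record.get("abstract", "")
        blocks.append(title + "\n" + abstract)
    return "\n\n".join(blocks)


def fetch_corpus_text(keyword, year, api_key):
    """Download one page of results and return its titles and abstracts."""
    with urlopen(build_query_url(keyword, year, api_key)) as reply:
        return extract_titles_and_abstracts(json.load(reply))
```

Only fetch_corpus_text touches the network. The other two functions can be tried offline with a hand-written dictionary.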

A raw JSON output file will also be saved in another folder in TXT format. This file includes your API key, so please do not share it publicly.
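If you ever need to pass a raw output file to someone else, one option is to make a copy with the key removed first. The helper below is a hypothetical Python 3 sketch, not part of KnoGlo: it simply replaces every literal occurrence of your key in the file's text, so it does not depend on knowing where in the JSON the key appears.

```python
def redact_api_key(raw_text, api_key, placeholder="REDACTED"):
    """Return raw_text with every occurrence of api_key replaced."""
    if not api_key:
        # An empty key would make str.replace insert the placeholder everywhere
        return raw_text
    return raw_text.replace(api_key, placeholder)
```

Read the raw file into a string, pass it through redact_api_key together with your key, and write the result to a new file, leaving the original untouched.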

Repeat this process by re-running the program until your corpus contains a satisfactory number of text files.
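A quick way to see how far along the corpus is between runs is to count its text files. The snippet below is a small Python 3 sketch. The function name is made up for the example, and it assumes the generated files use the .txt extension and sit directly inside the corpus folder.

```python
from pathlib import Path


def count_corpus_files(corpus_dir):
    """Count the .txt files directly inside corpus_dir (no subfolders)."""
    return sum(1 for path in Path(corpus_dir).glob("*.txt") if path.is_file())
```

For example, count_corpus_files("my-corpus") returns the number of text files collected so far in a folder named my-corpus.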

Step 3. NLP Analytics.


Once you have finished creating your corpus, you are ready to run NLP analyses for your research.

KnoGlo offers example code for topic modeling in Python 3.

➊ Install Jupyter Notebook by following this tutorial.

➋ Run Jupyter Notebook. On Mac, run this command in Terminal:

jupyter notebook

Jupyter Notebook will open in your default browser.

➌ Find the file KnoGlo NLP Topic Modeling.ipynb in the same folder, source-code/Corpus-Generator-and-NLP. Open it.

Now you are ready to run topic modeling on your corpus.
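Before running the notebook, it can help to confirm that the corpus loads and to glance at its most frequent terms. The Python 3 sketch below does only that: it is a plain word-frequency count using the standard library, not topic modeling and not the notebook's code. The function names, the tiny stopword list, and the assumption that the corpus files are UTF-8 encoded .txt files are all illustrative.

```python
import re
from collections import Counter
from pathlib import Path

# A deliberately tiny stopword list, for illustration only
STOPWORDS = {"the", "and", "of", "in", "to", "a", "is", "for", "on", "with"}


def load_corpus(corpus_dir):
    """Read every .txt file in corpus_dir into a list of strings."""
    paths = sorted(Path(corpus_dir).glob("*.txt"))
    # Assumes UTF-8 files; adjust the encoding if your files differ
    return [p.read_text(encoding="utf-8") for p in paths]


def top_terms(documents, n=10):
    """Return the n most frequent lowercase words, ignoring STOPWORDS."""
    counts = Counter()
    for text in documents:
        # Keep runs of ASCII letters only; digits and punctuation are dropped
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)
```

For example, top_terms(load_corpus("my-corpus")) lists the ten most common words in a folder named my-corpus. If the list is dominated by words unrelated to your keyword, the corpus may need more files or a different keyword before topic modeling.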