My Journey to Getting a Ph.D. in NLP in 75 Steps

Dataset: Make or Take

Update 9/75

Jan Spörer
Oct 17, 2022

This week, I tried to decide whether to use an existing dataset for question answering on financial documents or to build a new one.

I gathered some pros and cons and identified potential datasets. Building a new dataset from scratch requires effort. But it can also be the basis for a publication. Also, there does not seem to be a dataset that fits my research question.

My supervisor and I will discuss the results and decide in the upcoming week.

You can follow these updates: Substack, Blog, Telegram, WhatsApp, LinkedIn, Medium, Twitter, Calendly.

This large screen makes me more productive.

What Happened Since Last Week?

I finished the other half of the book “How to Write and Publish a Scientific Paper” by Robert Day. Good book.

My supervisor suggested a dataset that I might use for benchmarking my algorithms. If this dataset is suitable for my research, it could save me a lot of time because I would not have to create one myself. I have reflected on it and am not sure whether it is suitable. I will discuss this with my supervisor on Tuesday.

My colleague Thomas Huber gave a presentation at the Chair of Data Science and NLP about the introspection of transformer-based language models [LM-Debugger - An Interactive Tool for Inspection and Intervention in Transformer-Based Language Models (Geva et al., 2022)]. Among other research contributions, the paper presents a method to accurately change the behavior of language models for specific prompts.

What Were the Biggest Obstacles?

No major obstacles. I left the phone at home again today, and it was great.

Which Goals Did I Meet?

  1. Write one section for the first paper. The first paper will be a literature review/state-of-the-art.

  2. Identify a conference for the state-of-the-art.

Which Goals Did I Miss?

  1. Align the outlet (conference) with my supervisor (i.e., ask him if he likes the conference and thinks that it fits my research question).

Was It a Good Week?

Yes. Everything starts falling into place, and I have a clearer view of the literature and what I want to write about.

Short-Term Tasks for the Coming Week

  1. Align the outlet (conference) with my supervisor (i.e., ask him if he likes the conference and thinks that it fits my research question).

  2. Decide on whether to prepare a dataset ourselves or take an existing dataset.


About “75-Step Journey Toward a Ph.D. in Natural Language Processing”

You will, from now on, witness my grind. Feel my blood, sweat, and tears.

With this series of articles, you become a real-life weekly witness of my dissertation progress, all in 75 steps. This has multiple purposes:

1) Forcing myself to keep moving through the power of public shame!

2) Helping other (prospective) Ph.D. students stay motivated and showing that hard times are normal when going through this process.

3) Getting support from the community when I go through hard times.

Share this with your Ph.D. student friends: Substack, Blog, Telegram, WhatsApp, LinkedIn, Medium, Twitter, Calendly.

Read More From the 75 Steps Toward a Ph.D. in NLP Series

2022-08-20: Update 1/75 - Kicking Off the Journey Toward a Ph.D. in NLP

2022-08-28: Update 2/75 - Literature Review

2022-09-04: Update 3/75 - Back on Track and Back to Vallendar

2022-09-10: Update 4/75 - Long Test Runtime; Retriever Works

2022-09-18: Update 5/75 - Jour Fixe Joy

2022-09-26: Update 6/75 - Reading Group

2022-10-02: Update 7/75 - Leaving the Phone at Home

2022-10-09: Update 8/75 - Finding a Conference

2022-10-16: Update 9/75 - Dataset - Make or Take
