Can LLMs Identify Korean Adverbs? (Part 2)


When we left off in round one, I was struggling to get both ChatGPT 4 and Google Gemini 2.0 Flash to just process the list correctly. I gave up on ChatGPT 4 because I didn’t need that kind of stress in my life. I already had decent results from Claude 3.7 Sonnet.

I say “decent” because I noticed that there were words in the non-adverb table that should have been in the adverb table. I decided to have Claude reprocess each list.

In the first round, Claude said that 94 words were non-adverbs. I gave it the non-adverb list again, and this time it considered 68 of those words adverbs and 25 of them non-adverbs.
Claude labeled 250 words as adverbs. I gave it the adverb list again and only 205 were considered adverbs. After processing about 103 entries, the prompt stopped due to message constraints. I had to prompt it with “continue” to get it to finish.

I still noticed words that should be considered adverbs in the non-adverb table.

“Why are 유효하게, 저조하게, 부자연스럽게, and 천진난만하게, not considered adverbs when they are single words that end in "게"?”

“You're absolutely right to question those classifications. My categorization wasn't fully consistent. Let me clarify:

In Korean linguistics, words with the ending -게 (하게, 스럽게, etc.) formed from adjectives are actually functioning as adverbs in sentences. These are called "derived adverbs" and should be considered adverbs in usage.

The words you pointed out:
  • 유효하게 (effectively)
  • 저조하게 (poorly)
  • 부자연스럽게 (unnaturally)
  • 천진난만하게 (innocently)
All of these are single words ending in -게 derived from adjectives (유효하다, 저조하다, 부자연스럽다, 천진난만하다), and they function adverbially in sentences. They should have been included in the first table of adverbs, not the second table.

My error was being inconsistent in applying this rule. I correctly classified many similar words (like 부주의하게, 경솔하게, 쾌활하게, etc.) as adverbs in the first table, but incorrectly moved these particular ones to the second table.

Thank you for pointing out this inconsistency. These four words should indeed be considered adverbs just like the other -게 ending words in the first table.”

I found two more entries that should have been classified as adverbs, but I only had one more message I could send before running out of free messages.

“what about "계획적으로" and "마음속으로"? They are in your non-adverb table too. they look like adverbs.”

It responded:

“You've identified another inconsistency in my categorization. Let me explain:

You're right that "계획적으로" and "마음속으로" function as adverbs in Korean sentences and should be considered adverbs.

These words use the suffix -(으)로, which forms adverbs from nouns or other parts of speech:
  1. 계획적으로 (systematically, in a planned manner)
    • From 계획적 (systematic, planned) + -(으)로
    • Functions as an adverb in sentences
  2. 마음속으로 (in one's heart, mentally)
    • From 마음속 (inside the heart/mind) + -(으)로
    • Functions as an adverb in sentences
The error in my analysis was treating these -(으)로 endings differently from -게 endings. In Korean, both endings create forms that function adverbially in sentences.

I incorrectly identified them as "Noun + (으)로" and "Adjective + (으)로" structures rather than recognizing them as proper adverbs. In practical usage, these are considered adverbs just like the -게 ending words.

For consistency, these words should have been included in the first table along with other adverbs. Thank you for pointing out this oversight.”
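The two endings Claude conceded above (-게 and -(으)로) are easy to screen for mechanically. Here is a minimal sketch, assuming the non-adverb table has been pasted into a plain Python list. The function name and the sample list are made up for illustration, and this is only a spelling check, not a grammatical analysis: it will also flag ordinary nouns that happen to end in 게, such as 가게 (store), so the output is a list to review by hand, not a verdict.

```python
# Endings that Claude's replies above describe as adverb-forming.
# Matching on spelling alone is an assumption made for this sketch.
ADVERB_LIKE_SUFFIXES = ("게", "으로")


def flag_possible_adverbs(non_adverbs):
    """Return single-word entries whose ending suggests they may be adverbs."""
    flagged = []
    for entry in non_adverbs:
        word = entry.strip()
        if " " in word:
            # Multi-word phrases belong in the non-adverb table by design.
            continue
        if word.endswith(ADVERB_LIKE_SUFFIXES):
            flagged.append(word)
    return flagged


# Hypothetical slice of a non-adverb table: the six words questioned above,
# plus a plain noun and a multi-word phrase as controls.
sample = [
    "유효하게", "저조하게", "부자연스럽게", "천진난만하게",
    "계획적으로", "마음속으로", "학교", "아주 빠르게",
]

print(flag_possible_adverbs(sample))
```

Run on the sample, this flags the six questioned words and skips both controls.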

I’m not happy that I had to reprocess the lists again, but that was better than not getting a list at all.

I’m pretty pleased with my results from Claude, despite having to delete message windows and start a new conversation to have enough context to finish the task. The tables seem decent.

There were 391 candidates in total. Claude stated that 54 words were non-adverbs and 273 words were adverbs. However, there are 8 words that are miscategorized as non-adverbs and there are 64 entries missing.

I’m too drained to go through the list looking for the missing 64 words. I am assuming it removed duplicates because in round 3, that is what it said it did. I was just happy to have a list I could finally put in Quizlet and Anki.

But before we move on to round 3: while I was running this experiment, Google released Gemini 2.5 Pro (experimental). Gemini 2.0 Flash was still insisting that it was working on processing the list, and I wanted to keep that chat thread open for a couple of days to see if it was really telling the truth (side eye), so I decided to give the task to 2.5 Pro.

Wow!! Gemini 2.5 Pro is soooo much better!!

It had an indicator letting me know it was still working. It displayed the list while it was still processing my request, so I waited. When it was done, it provided both tables. I was able to export them directly to Google Sheets.

It also has a button at the top of the response called “Show thinking”. You can press it to see its thought process. It shows the steps it used to solve the request.

Gemini 2.5 Pro (experimental) Thought Process:

The user wants me to categorize a list of Korean words. I need to create two tables:
  1. Adverb Table: For words that are Korean adverbs.
    • Column 1: Korean (the word itself)
    • Column 2: English (translation)
    • Column 3: Hint (Korean sentence using the word exactly as provided, with English translation)
  2. Non-Adverb Table: For items that are not adverbs or are multi-word phrases.
    • Column 1: Korean (the item)
    • Column 2: Part of Speech
    • Column 3: Reason (why it's not a single adverb)
    • Column 4: Translation
Let's go through the list word by word and determine its category and details. I need to be careful about identifying adverbs vs. other forms and multi-word phrases. The suffix -하게 often marks adverbs derived from adjectives, but not always. I also need to ensure the example sentences use the word exactly as provided.

List Analysis:

[list of words and the table each one should go into]

Now, I need to create the example sentences for Table 1, using the exact adverb form. I also need to finalize the reasons and translations for Table 2. I should handle duplicates by just listing them once in the final table.

Refining Table 1 (Adverbs):
  • Remove duplicates.
  • Create example sentences.
Refining Table 2 (Non-Adverbs):
  • Finalize Part of Speech, Reason, and Translation.
Sentence Generation Strategy:
  • Use simple sentence structures like "Subject + Adverb + Verb" or "Subject + Object + Adverb + Verb".
  • Ensure the adverb is used exactly as listed.
  • Provide English translation for the sentence.
Final Check:
  • Check if all words from the original list are accounted for in one of the two tables.
  • Ensure the formatting is correct (two tables with specified columns).
  • Adhere to user guidelines (no romanization, respond in English).
Let's compile the lists, removing duplicates for Table 1.

Table 1 - Adverbs (Unique List):
[words]

Table 2 - Non-Adverbs (Unique List):
[words]

Now generate the tables with sentences and reasons.

I can see why people on Reddit were saying that Claude subscribers were going to switch to Gemini 2.5 Pro.

Looking through the non-adverb list, I had nothing to question because they were clearly not adverbs, at least to me. There were no false negatives. I was so shocked that it was done, and done as instructed, that I just sat and stared at the screen for a while.

Google Gemini 2.5 Pro (experimental) stated that 41 words were non-adverbs and 273 words were adverbs. The non-adverbs list does not contain adverbs, which is great. It is missing 77 words. It states that it removed 76 duplicates from the adverbs list and 1 from the non-adverb list.
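The missing-word figures for both models follow directly from the reported counts. A minimal sketch of that arithmetic, using only numbers stated in this post (the variable and function names are made up for illustration):

```python
TOTAL_CANDIDATES = 391

# Table sizes as reported by each model in this round.
reported = {
    "Claude 3.7 Sonnet": {"adverbs": 273, "non_adverbs": 54},
    "Gemini 2.5 Pro (experimental)": {"adverbs": 273, "non_adverbs": 41},
}


def unaccounted(total, adverbs, non_adverbs):
    """Entries from the original list that appear in neither table."""
    return total - (adverbs + non_adverbs)


for model, counts in reported.items():
    print(model, unaccounted(TOTAL_CANDIDATES, **counts))

# Gemini says it removed 76 duplicates from the adverb list and 1 from the
# non-adverb list, which would account for its whole gap.
gemini_claimed_duplicates = 76 + 1
print(gemini_claimed_duplicates == unaccounted(TOTAL_CANDIDATES, 273, 41))
```

This prints 64 for Claude and 77 for Gemini, and confirms that Gemini's claimed duplicate removals add up to its 77 missing entries. It only shows that the numbers are internally consistent, not that every removed entry really was a duplicate.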

I don’t know who to believe. Claude is missing 64 words and Gemini is missing 77. I know that some of them were duplicates. For scientific purposes, I should sort through my original list to figure out the number of duplicates and which words are missing, but this was originally just a language acquisition task. In the future, I will do that, because I understand the limitations of these models better. No matter how much smarter they get, they still fail at tasks they should be able to complete.
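For next time, the duplicate count and the missing words can be pulled out mechanically instead of by eye. This is a minimal sketch, assuming the original list and both output tables are available as plain lists of strings. The function name and the tiny sample data are invented for illustration and are not the real lists.

```python
from collections import Counter


def audit(original, adverbs, non_adverbs):
    """Compare the original candidate list against the two returned tables.

    Returns (duplicates, extra_copies, missing):
      duplicates   - words that appear more than once in the original list
      extra_copies - how many entries deduplication alone would remove
      missing      - unique original words found in neither table
    """
    counts = Counter(word.strip() for word in original if word.strip())
    duplicates = {word: n for word, n in counts.items() if n > 1}
    extra_copies = sum(n - 1 for n in duplicates.values())
    returned = {w.strip() for w in adverbs} | {w.strip() for w in non_adverbs}
    missing = sorted(set(counts) - returned)
    return duplicates, extra_copies, missing


# Made-up miniature example, not the real data.
original = ["유효하게", "저조하게", "유효하게", "계획적으로", "마음속으로"]
adverbs = ["유효하게", "계획적으로"]
non_adverbs = ["마음속으로"]

duplicates, extra_copies, missing = audit(original, adverbs, non_adverbs)
print(duplicates)    # words listed more than once
print(extra_copies)  # entries explained by deduplication
print(missing)       # words the model silently dropped
```

If `extra_copies` matches the number of entries a model is short, the gap is fully explained by duplicates. Anything left in `missing` was dropped for some other reason.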

If you want to see the output from round 2, the links are below:

[link to view round 2 spreadsheet online] [link to download round 2 spreadsheet]

In round 3, I check in with Gemini 2.0 Flash to see if it really is still working on the list, bring ChatGPT 4 back into the ring, and provide a new (smaller) list and a reformatted prompt.

