Culture
文藝版塊
Johnson
約翰遜專欄
Speaking in many tongues
講多國語言
ChatGPT may make things up, but it does so fluently in more than 50 languages.
ChatGPT可能會編假話,但它能用50多種語言流利地編假話。
The hype that followed ChatGPT's public launch last year was, even by the standards of tech innovations, extreme.
ChatGPT自去年公開發布後所引發的炒作,即使以科技創新的標準來看也是極端的。
OpenAI's natural-language system creates recipes, writes computer code and parodies literary styles.
OpenAI的這一自然語言系統能創造食譜,編寫計算機代碼,模仿各種文學風格。
Its latest iteration can even describe photographs.
其最新版本甚至可以描述照片。
It has been hailed as a technological breakthrough on a par with the printing press.
ChatGPT被譽為與印刷機相媲美的技術突破。
But it has not taken long for huge flaws to emerge, too.
但沒過多久,巨大的缺陷也顯現出來。
It sometimes "hallucinates" non-facts that it pronounces with perfect confidence, insisting on those falsehoods when queried.
它有時會"幻想"出並非事實的東西,並自信滿滿地把這些東西講出來,就算被質疑也堅持這些謊言。
It also fails basic logic tests.
它也未能通過基本的邏輯測試。
In other words, ChatGPT is not a general artificial intelligence, an independent thinking machine.
換句話說,ChatGPT不是通用人工智能,不是一臺能獨立思考的機器。
It is, in the jargon, a large language model.
用行話來說,它是一個大型語言模型。
That means that, after being trained on a huge body of text (its developer, OpenAI, does not say exactly where from), it is very good at spotting patterns and predicting which words tend to follow which others.
這意味著,在用大量文本進行訓練後,它非常擅長預測哪些單詞之後往往接著哪些其他單詞並找出其中的規律,其開發者OpenAI沒有具體說明這些文本的來源。
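To make "predicting which words tend to follow which others" concrete, here is a deliberately tiny sketch. It is not how ChatGPT works internally: ChatGPT is a transformer neural network that operates on sub-word tokens, not a table of word-pair counts. The bigram model below only illustrates the core idea of learning next-word patterns from training text. The corpus and the function names (`train_bigrams`, `predict_next`, `generate`) are invented for illustration.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def train_bigrams(text: str) -> dict[str, Counter]:
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    follows: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows: dict[str, Counter], word: str) -> str | None:
    """Return the word seen most often after `word`, or None if it never appeared."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    # most_common breaks ties by order of first appearance in the training text
    return counts.most_common(1)[0][0]


def generate(follows: dict[str, Counter], start: str, length: int) -> str:
    """Greedily extend `start` by up to `length` predicted words."""
    out = [start.lower()]
    for _ in range(length):
        nxt = predict_next(follows, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return " ".join(out)


corpus = "the cat sat on the mat and the cat slept on the sofa"
model = train_bigrams(corpus)
print(predict_next(model, "the"))  # cat (follows "the" twice; mat and sofa once each)
print(generate(model, "the", 4))   # the cat sat on the
```

Real language models typically sample from a probability distribution instead of always taking the top choice, and they condition on far more than one preceding word; that long context is part of what lets them stay coherent for paragraphs rather than a few words.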
Amid the hype, it is easy to forget a minor miracle.
在炒作中,很容易忘記一個小小的奇跡。
ChatGPT has aced a problem that long served as a far-off dream for engineers: generating human-like language.
ChatGPT成功解決了一個長期以來一直被工程師們視為遙遠夢想的問題:生成類似人類的語言。
Unlike earlier versions of the system, it can go on doing so for paragraphs on end without descending into incoherence.
與早期版本不同,ChatGPT可以長篇大段地一直說下去,而不會出現語句不通的情況。
And this achievement's dimensions are even greater than they seem at first glance.
這一成就的影響範圍甚至比它在初看之時所表現的更大。
ChatGPT is not only able to generate remarkably realistic English.
ChatGPT不僅能生成非常逼真的英語。
It is also able to instantly blurt out text in more than 50 languages -- the precise number is apparently unknown to the system itself.
還能立即脫口而出50多種語言 -- 系統自己顯然也不知道確切數字是多少。
Asked (in Spanish) how many languages it can speak, ChatGPT replies, vaguely, "more than 50", explaining that its ability to produce text will depend on how much training data is available for any given language.
當被問及(用西班牙語)它會說幾種語言時,ChatGPT含糊地回答說"超過50種",並解釋說,它以某種語言生成文本的能力取決於這一語言的訓練數據有多少。
Then, asked a question in an unannounced switch to Portuguese, it offers up a sketch of your columnist's biography in that language.
然后,在沒有通知的情況下轉而用葡萄牙語提問時,它又用葡萄牙語提供了您的專欄作家的生平簡介。
Most of it was correct, but it had him studying the wrong subject at the wrong university.
大部分內容是正確的,但他就讀的大學和專業搞錯了。
The language itself was impeccable.
而語言本身無可挑剔。
Portuguese is one of the world's biggest languages.
葡萄牙語是世界上最大的語種之一。
Trying out a smaller language, your columnist probed ChatGPT in Danish, spoken by only about 5.5m people.
為了試試一個更小的語種,您的專欄作家又用只有約550萬人使用的丹麥語向ChatGPT提問。
Danes do much of their online writing in English, so the training data for Danish must be orders of magnitude scarcer than what is available for English, Spanish or Portuguese.
丹麥人在網上寫東西大部分都是用英語,所以丹麥語的訓練數據肯定比英語、西班牙語或葡萄牙語能提供的訓練數據要少幾個數量級。
ChatGPT's answers were factually askew but expressed in almost perfect Danish.
ChatGPT的回答歪曲了事實,但其丹麥語幾近完美。
(A tiny gender-agreement error was the only mistake caught in any of the languages tested.)
(在所有測試的語言中,只發現了一個微小的性別一致性錯誤。)
Indeed, ChatGPT is too modest about its own abilities.
的確,ChatGPT對自己的能力過於謙虛。
On request, it furnishes a list of 51 languages it can work in, including Esperanto, Kannada and Zulu.
它應要求提供了它可以使用的51種語言的清單,其中包括世界語、卡納達語和祖魯語。
It declines to say that it can "speak" these languages, but rather "generates text" in them.
它拒絕說自己會"說"這些語言,而是說能用這些語言"生成文本"。
This is too humble an answer.
這個回答真是過謙了。
Addressed in Catalan -- a language not on the list -- it replies in that language with a cheerful "Yes, I do speak Catalan -- what can I help you with?"
在用加泰羅尼亞語(這種語言不在清單上)和它說話時,它用這種語言愉快地回答道:"是的,我會說加泰羅尼亞語,有什麼可以幫你的嗎?"
A few follow-up questions do not trip it up in the slightest, including a query about whether it is merely translating answers first generated in another language into Catalan.
一些後續的提問也絲毫沒能讓它出差錯,包括詢問它是否只是先用另一種語言生成答案,然後再翻譯成加泰羅尼亞語。
This, ChatGPT denies: "I don't translate from any other language; I look in my database for the best words and phrases to answer your questions."
ChatGPT否認了這一點:"我不是從任何其他語言翻譯過來的,我在我的數據庫中尋找最佳詞句來回答您的問題。"
Who knows if this is true?
誰知道這是不是真的?
ChatGPT not only makes things up, but incorrectly answers questions about the very conversation it is having.
ChatGPT不僅編造故事,而且錯誤地回答了有關正在進行的對話的問題。
(It has no "memory", but rather feeds the last few thousand words of each conversation back into itself as a new prompt.
(它沒有"記憶",而是將每次對話的最後幾千個單詞反饋給自己,作為新的提示。
If you have been speaking English for a while it will "forget" that you asked a question in Danish earlier and say that the question was asked in English.)
如果你說了一段時間的英語,它就會"忘記"你之前用丹麥語問了一個問題,并說那個問題是用英語問的。)
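The "forgetting" described above follows from a fixed-size context window. Below is a minimal sketch of the simple drop-the-oldest policy the column describes, assuming the budget is counted in words (real systems count tokens, and OpenAI's exact policy is not public). The function name `build_prompt`, the 15-word budget and the sample conversation are hypothetical, chosen so that the early Danish question falls out of the window.

```python
from __future__ import annotations


def build_prompt(turns: list[str], max_words: int) -> list[str]:
    """Keep only the most recent turns whose combined length fits in max_words.

    Older turns are dropped entirely, so the model never sees them again.
    """
    kept: list[str] = []
    budget = max_words
    for turn in reversed(turns):  # walk backwards from the newest turn
        n = len(turn.split())     # word count stands in for a token count
        if n > budget:
            break
        kept.append(turn)
        budget -= n
    kept.reverse()  # restore chronological order
    return kept


conversation = [
    "Hvor mange sprog kan du tale?",      # early question in Danish (6 words)
    "How many languages can you speak?",  # 6 words
    "Which of them did I use first?",     # 7 words
]
print(build_prompt(conversation, max_words=15))
# ['How many languages can you speak?', 'Which of them did I use first?']
```

With the Danish turn gone from the prompt, a model asked which language came first can only see English, which matches the wrong answer the columnist received.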
ChatGPT is untrustworthy not just about the world, but even about itself.
ChatGPT在關于世界,甚至關于它自己的方面是不可信賴的。
This should not overshadow the achievement of a model that can effortlessly mimic so many languages, including those with limited training data.
但這不應該掩蓋這一模型的成就,它可以毫不費力地模仿如此多的語言,包括那些訓練數據有限的語言。
Speakers of smaller languages have worried for years about language technologies passing them by.
多年來,較小語種的使用者一直擔心語言技術會與他們擦肩而過。
Their justifiable concern had two causes: the lesser incentive for companies to develop products in Icelandic or Maltese, and the relative lack of data to train them.
他們這一合理擔憂有兩個原因:公司開發冰島語或馬耳他語產品的動力較小,以及訓練數據相對缺乏。
Somehow the developers of ChatGPT seem to have overcome such problems.
ChatGPT的開發者似乎不知如何已經克服了這些問題。
It is too early to say what good the technology will do, but this alone gives one reason to be optimistic.
現在說這項技術會有什么好處還為時過早,但只是這一點就給了我們一個保持樂觀的理由。
As machine-learning techniques improve, they may not require the vast resources, in programming time or data, traditionally thought necessary to make sure smaller languages are not overlooked online.
隨著機器學習技術的進步,要確保較小的語種在網上不被忽視,可能不再需要以往認為必需的大量編程時間或數據資源。