How To Train Open Source Artificial Neural Network GPT-2 To Do Your Language Homework For You


Updated on June 11th, 2019. After the release of GPT-2, OpenAI changed its non-profit nature to “capped profit” under OpenAI LP. This, coupled with the limited release of GPT-2, calls its contribution to Free, Open Source Software (FOSS) into question.


Clever boy Donny Ferguson took advantage of the GPT-2 artificial neural network model to do his homework for him. Tasked with describing the reasons for the Civil War in the US, he only had to give “Geeppy” the prompt and let it have a few tries.

On Geeppy’s fifth attempt, this was the response:

«It is easy to identify why the Civil War happened, because so many people and so many books and so much television and films tell us that it was the cause, that it has something to do with race or economics or religion. There is much agreement that it was essentially a war of slavery on behalf of capitalism, about a century of slavery. But that’s not what most people think of when they think of the Civil War. Many people think of the war as about states rights. You’re not wrong about that, but it’s kind of misleading to say that the Civil War was a conflict between states’ rights and federalism. So let’s try again. What’s the idea that has inspired so many people to go onto the battlefields of Gettysburg and Antietam and Gettysberg and Petersburg and Fredericksburg? The American idea of the republic–a notion of limited government–is a great part of the history.»

The text goes on.

For a change, Geeppy is the result of open source initiatives, at least in part. It was built by OpenAI, a consortium of research institutions and companies pledged to transparency and universal benefit in the advancement of Artificial Intelligence, which they deem “will be the most significant technology ever created by humans.” However, they decided to release only the smallest, most basic version of Geeppy, out of four they claim to have built. The decision, they argue, is “due to our concerns about malicious applications of the technology.” They claim the community, arguably as a reflection of society as a whole, has not reached a common understanding of how to address the challenge of automatically generated content that looks as if it were written by humans.

Geeppy is only the latest in an ongoing streak of advancements in AI. But it is proof that Open Source can compete in the space. According to various metrics, the model beat existing records in accuracy and “perplexity” (a measure of how well a model predicts text, where lower is better). It is slowly but surely approaching human levels of written clarity.
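For the curious, perplexity is usually defined as the exponential of the average negative log-probability a model assigns to each word that actually came next. The sketch below is only an illustration of that formula: the function name `perplexity` and the probability lists are made up for the example and do not come from Geeppy or from OpenAI's code.

```python
import math


def perplexity(token_probs):
    """Return exp of the average negative log-probability per token.

    token_probs: the probability the model gave to each token that
    actually appeared (hypothetical values here, not real model output).
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)


# A model that is fairly sure of each next word...
confident = [0.5, 0.5, 0.5, 0.5]
# ...versus one that is mostly guessing.
unsure = [0.1, 0.1, 0.1, 0.1]

print(perplexity(confident))
print(perplexity(unsure))
```

The more probability the model puts on the words that really follow, the lower the number, which is why a lower perplexity counts as a better score.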

In other examples, given a textbook or a dataset, Geeppy was able to answer fill-in-the-blank questions with good, though not perfect, accuracy. It also summarized news and translated texts from English to French. More data and computing power are expected only to refine its writing skills.

As for Donny, should he be punished for not doing an assignment that machines can now do, even though the outcome was completely original? We don’t have to answer that for now. He was created by Geeppy too.

Find the part of the model that was made public here.

