What is the future of programming with Large Language Models?

Insights from a classroom experiment show that LLMs are benefiting both skilled programmers and beginners. At the same time, the boundaries between coding and natural language are blurring.

Jose A. Rodríguez-Serrano

A few weeks after the release of ChatGPT, renowned computer scientist Andrej Karpathy, a member of OpenAI's founding team and former AI director at Tesla, famously tweeted: "The hottest new programming language is English." This statement alludes to the capability of ChatGPT and other large language models (LLMs) to autocomplete source code and generate code from human instructions. It is yet another example of AI's penetration into fields that demand cognitive and specialized skills.

Whether all coding can be replaced by natural language remains an open debate in the community. A question like "Will programming disappear?" is probably an oversimplification. A thorough discussion of the impact of LLMs on programming is complex, multi-faceted and nuanced. Unfortunately, this debate is also cluttered with a significant amount of background noise.

Undeniably, assistants like GitHub Copilot are improving the productivity of competent programmers, who often perceive themselves as "augmented" rather than replaced. It is also accepted that LLMs can create computer applications for tasks that are not excessively open-ended. A different question is whether users across all levels of programming experience might one day create complex, domain-specific apps easily, without any direct interaction with code.

A thorough discussion of the impact of LLMs on programming is complex, multi-faceted and nuanced

Students of Esade's MSc in Business Analytics will find themselves navigating this world. Consequently, they should be aware of this ongoing debate and seek evidence to form their own positions over time. To foster this reflection, we could easily have organized a class debate. Instead, in the elective course 'Data-driven prototypes', we took a different approach: during the last lecture, we conducted a class experiment that let students discover first-hand evidence through a hands-on challenge, rather than relying solely on theoretical discussions.

The experiment

During this experiential session, students were assigned three short programming challenges: 

  1. Build a website with a specific functionality, using a programming language they were not familiar with.  
     
  2. Build a web application with the same functionality, using a programming language known to them. 
     
  3. Build a simple game using a known language, but one in which the operation flow was not straightforward.

The key feature of the challenge was that students were required to use ChatGPT prompts as the sole means of generating code. 

Students were assumed to have experience with both ChatGPT and code assistants such as GitHub Copilot, and to have previously employed them for code development, debugging and explanations. However, this exercise places students in an unfamiliar situation under competitive pressure, across different scenarios (known vs. unknown language, easy vs. difficult logic). The learning objective is to gain perspectives and perceptions that might be useful in the debate.

Exercise outcomes 

As an example, these are some of the observations recorded during the exercise, highlighting essential points of the debate.

Example: results from one of the participants in the first challenge, a prototype of a website for querying nutritional information about products.

  • LLMs act as language-driven "code interpreters" that process natural language input and generate code snippets. Users describe their programming needs, LLMs respond with code, and users iterate until they achieve the desired result (a minimal sketch of this loop is shown after this list). This well-known workflow is the basis of innovations such as OpenAI's Data Analyst or Anthropic's Artifacts. Student teams also exploited other advantages of LLMs, such as asking them to debug erroneous code or to explain it.
  • Code automation should be understood through an 80/20 rule. While it is relatively quick to generate basic designs and functionality using LLM prompts, some students observed that it was sometimes hard to get the LLM to produce exactly the finer-grained functionality they had in mind. Rather than viewing LLMs as omnipotent, experienced users see them as a tool that can accomplish 80% of the work in just 20% of the time. The usefulness of this approach depends on the importance and cost of the remaining 20%. In this sense, LLMs can be highly effective for creating quick prototypes or jumpstarting code development.
  • Knowing how to code is still useful. Users who understand technical concepts and can provide specific prompts (e.g. "use the Bootstrap HTML library") get better results from LLMs and advance faster. As in other domains, the combination of human and AI capabilities produces the greatest efficiency. Indeed, this is confirmed by our data: while most of the prompts used specified functional features of the prototype (48%), a non-negligible portion (19%) included technical concepts, instructions to update code, or code itself.
  • Combining LLMs with APIs is an effective pattern. As the programming landscape evolves rapidly with increasing reliance on APIs, LLMs can help generate the "glue code" that connects various API calls (a hedged sketch of such glue code appears after this list). Connecting with APIs or external tools is the key idea behind the GPTs or the Computational Visual Programming approach (best paper at the prestigious CVPR conference last year), and, in general, remains an ongoing field of research.
  • Humans learn how to adapt to LLMs. For example, student teams quickly found that asking the LLM to translate code from one language to another worked better than re-issuing the original instructions in the new language. As algorithms that generate text word by word, LLMs are more reliable on relatively deterministic tasks (such as translating one program into another) and, conversely, produce more variability on open-ended tasks.
  • Giving the model time to "think" improves its performance. Prompts such as "Make a plan first" (used by one student team) are not only informative to users but also lead to better code generation; the sketch below includes such a planning step. Due to the model's autoregressive nature, where each new piece of text is generated based on the previous text, it can be challenging for the model to maintain coherence without an intermediate plan to guide its responses.
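
As an illustration of the first and last observations above, here is a minimal, hedged sketch of the describe-generate-iterate loop, written with the OpenAI Python SDK. The model name, the prompts and the nutrition-website task are illustrative assumptions, not the exact prompts used in class.

```python
from openai import OpenAI

client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set

# Illustrative conversation: describe the need and ask the model to plan first.
messages = [
    {"role": "system",
     "content": "You are a coding assistant. Make a plan first, then write the code."},
    {"role": "user",
     "content": "Build a small Flask website that shows nutritional information for a product name."},
]

reply = client.chat.completions.create(model="gpt-4o", messages=messages)
draft = reply.choices[0].message.content
print(draft)

# Iterate: keep the model's answer in context and ask for a refinement,
# including a technical hint, as some student teams did.
messages.append({"role": "assistant", "content": draft})
messages.append({"role": "user",
                 "content": "Use the Bootstrap HTML library and handle unknown products gracefully."})

reply = client.chat.completions.create(model="gpt-4o", messages=messages)
print(reply.choices[0].message.content)
```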

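The glue-code pattern from the fourth observation can be sketched in the same hedged spirit: a small function, which a student or the LLM itself might write, connecting the prototype to an external nutrition API. The endpoint, parameters and response fields below are hypothetical placeholders, not a specific real service.

```python
import requests


def get_nutrition(product_name: str) -> dict:
    """Glue code connecting the prototype to an external nutrition API."""
    # Hypothetical endpoint and fields, shown for illustration only.
    response = requests.get(
        "https://api.example.com/v1/nutrition",
        params={"query": product_name},
        timeout=10,
    )
    response.raise_for_status()
    item = response.json()["items"][0]
    # Keep only the fields the website needs to display.
    return {
        "name": item["name"],
        "calories": item["calories"],
        "protein_g": item["protein_g"],
    }


if __name__ == "__main__":
    print(get_nutrition("greek yogurt"))
```
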
Concluding remarks

These results highlighted points of the debate that students could have read about online but instead experienced "by doing". It should be acknowledged that many angles of the debate are not covered by this exercise. For instance, the programming challenges were kept short to fit the constraints of a class activity in a prototyping course; the development of larger applications would, of course, involve greater complexity (for instance, refer to this personal experiment documented by Xavier Amatriain). Also, the debate is anchored in the shape of current tools, and future innovations might introduce new interfaces or paradigms for code generation and assistance.

Collaboration between humans and AI appears to yield the most effective results

What we can say at this point is that LLMs are benefiting both skilled programmers and beginners, while simultaneously blurring the lines between coding and natural language for both groups. As with other fields, collaboration between humans and AI appears to yield the most effective results, underscoring the continued value of programming skills. 

Ultimately, we are navigating a dynamic landscape where LLM paradigms and tools are still evolving, making it challenging to predict the future direction. As Ed Catmull, co-founder of Pixar and author of Creativity, Inc., wisely noted, sometimes the best way to explore a path is to walk it. I encourage both students and instructors to embrace this uncertainty by actively experimenting and exploring new approaches, as hands-on experience is crucial for uncovering innovative solutions and gaining deeper insights.

All written content is licensed under a Creative Commons Attribution 4.0 International license.