
paper - GenAI

Research paper

Executable Books

Abstract

Generative AI (GenAI) represents a powerful class of machine learning models capable of producing high-quality content, including text, images, audio, and code. With the rapid evolution of models like GPT-4, Gemini, and open-source alternatives from Hugging Face, GenAI has found applications in diverse domains such as healthcare, education, creative arts, and software engineering. This paper explores the foundational technologies behind GenAI, practical use cases, code implementations, and how to integrate Hugging Face Spaces with the Gemini 2.0 Flash API.

Keywords: MyST Markdown, open-science

1. Prompt Engineering

Remember how an LLM works: it is a prediction engine. The model takes sequential text as input and predicts what the following token should be, based on the data it was trained on. The LLM does this over and over, appending each predicted token to the end of the sequence before predicting the next one. Each next-token prediction is based on the relationship between the previous tokens and what the LLM saw during training.
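This predict-append loop can be made concrete in a few lines. The sketch below assumes the Hugging Face transformers library and the small gpt2 checkpoint (used elsewhere in this paper); it uses greedy decoding, i.e. always taking the single most likely next token.

```python
# A minimal sketch of the predict-append loop described above,
# assuming the Hugging Face "transformers" library and the "gpt2" model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tokenizer("The capital of France is", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(5):                      # predict five more tokens
        logits = model(ids).logits          # a score for every vocabulary token
        next_id = logits[0, -1].argmax()    # greedy: take the most likely one
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)  # append and repeat

print(tokenizer.decode(ids[0]))
```

In practice, libraries wrap this loop (with smarter sampling strategies) behind calls like `model.generate(...)`, which the later code examples use.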

When you write a prompt, you are attempting to set up the LLM to predict the right sequence of tokens. Prompt engineering is the process of designing high-quality prompts that guide LLMs to produce accurate outputs: tinkering to find the best prompt, optimizing prompt length, and evaluating a prompt’s style and structure in relation to the task. In the context of natural language processing and LLMs, a prompt is simply the input provided to the model to generate a response or prediction.
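One common prompt-engineering pattern is the few-shot prompt: showing the model a handful of worked examples before the real query, and ending the prompt exactly where the model should continue. The sketch below is illustrative only; the task, example texts, and the `build_prompt` helper are hypothetical, not part of any library.

```python
# A hedged sketch of a few-shot prompt template. The task and the
# "build_prompt" helper are illustrative, not from any specific library.
def build_prompt(examples, query):
    """Assemble a few-shot sentiment-classification prompt as plain text."""
    lines = ["Classify the sentiment of each review as positive or negative.", ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")  # end exactly where the model should continue
    return "\n".join(lines)

prompt = build_prompt(
    [("Great acting and a moving story.", "positive"),
     ("Dull plot, I nearly fell asleep.", "negative")],
    "The soundtrack alone is worth the ticket.",
)
print(prompt)
```

Because the prompt ends at "Sentiment:", the model's most likely continuation is the label itself, which is the essence of setting up the LLM to predict the right tokens.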


2. Technology Stack Behind Generative AI

2.1 Transformer Architecture
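The core operation of the transformer is scaled dot-product attention: each token's query is compared against every token's key, the scores are normalized with a softmax, and the result is a weighted mixture of value vectors. A minimal NumPy sketch, with illustrative shapes and random values:

```python
# A minimal NumPy sketch of scaled dot-product attention,
# the core operation of the transformer architecture.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # how similar each query is to each key
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V                  # weighted mixture of the value vectors

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))  # 4 tokens, model dimension 8
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
out = attention(Q, K, V)
print(out.shape)  # (4, 8): one output vector per token
```

Real transformers run many such attention "heads" in parallel and interleave them with feed-forward layers, but each head reduces to exactly this computation.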

2.2 Training Paradigms

2.3 Foundation Models


3. Methods and Algorithms

3.1 Text Generation

# Import 'pipeline' from the Hugging Face 'transformers' library.
# A pipeline wraps a pretrained model behind a simple one-call interface.
from transformers import pipeline

# Build a "text-generation" pipeline: given the start of a text,
# it predicts the tokens that follow.
# We use "gpt2", a small openly available language model.
text_gen = pipeline("text-generation", model="gpt2")

# Give it the start of a sentence: "Theoretical Physics is".
# max_length=50 caps the output at 50 tokens (not words) in total,
# counting the prompt. Then print what the model writes.
print(text_gen("Theoretical Physics is", max_length=50))
Example output (generation is sampled, so your continuation will differ):
[{'generated_text': 'Theoretical Physics is one of the most fascinating areas of science. It explores the fundamental forces, particles, and the very fabric of space and time. Scientists in this field study quantum mechanics, general relativity, and string theory to understand how the universe works at its deepest level.'}]

3.2 Image Generation (Diffusion)

# Import the text-to-image tool "StableDiffusionPipeline" from the diffusers library.
# It turns a text description into a picture.
from diffusers import StableDiffusionPipeline

# Load the pretrained "stable-diffusion-v1-5" model.
# (The first call downloads several GB of weights; a GPU is strongly
# recommended, e.g. pipe = pipe.to("cuda").)
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# Describe the image we want: "A quantum computer in a futuristic lab".
# The pipeline runs the diffusion process and returns the generated images.
image = pipe("A quantum computer in a futuristic lab").images[0]

# Display the generated image on screen.
image.show()

(Generated image for the prompt "A quantum computer in a futuristic lab".)

3.3 Code Generation

# Import tools from Hugging Face's "transformers" library:
# AutoTokenizer converts text to and from tokens, and
# AutoModelForCausalLM loads a model that predicts the next token.
from transformers import AutoModelForCausalLM, AutoTokenizer

# "torch" is the tensor library the model runs on.
import torch

# Download the tokenizer paired with the model. It splits text (including
# source code) into tokens and maps them to the IDs the model expects.
tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-350M-mono")

# Load a small code-generation model trained on Python source code.
model = AutoModelForCausalLM.from_pretrained("Salesforce/codegen-350M-mono")

# Give the model the start of a Python function:
# "def quantum_fourier_transform(n):"
# and tokenize it into a tensor ("pt" = PyTorch format).
inputs = tokenizer("def quantum_fourier_transform(n):", return_tensors="pt")

# Ask the model to complete the code.
# max_length=100 caps the output at 100 tokens, counting the prompt.
outputs = model.generate(**inputs, max_length=100)

# Convert the generated token IDs back into readable text
# and print the completed Python code.
print(tokenizer.decode(outputs[0]))
Setup: pip install transformers torch
Expected output: the model will generate a Python function along these lines:
def quantum_fourier_transform(n):
    import numpy as np
    from qiskit import QuantumCircuit
    qc = QuantumCircuit(n)
    for j in range(n):
        qc.h(j)
        for k in range(j+1, n):
            qc.cu1(np.pi/float(2**(k-j)), k, j)
    return qc

🧬 The GenAI Method Behind It
🔄 Causal language modeling: the model is trained to predict the next token from the tokens that precede it.

🧱 Transformer architecture: self-attention layers relate every token to every other token in the input.

📦 Model: Salesforce/codegen-350M-mono, a 350M-parameter model trained on Python source code.
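Causal language modeling means position i may only attend to positions j ≤ i, so the model can never "peek" at the token it is trying to predict. A small NumPy sketch of the lower-triangular mask that enforces this (sequence length and scores are illustrative):

```python
# A small NumPy sketch of the causal mask used in causal language modeling:
# position i may only attend to positions j <= i.
import numpy as np

T = 5                                         # sequence length (illustrative)
mask = np.tril(np.ones((T, T), dtype=bool))   # lower-triangular: True where j <= i
scores = np.zeros((T, T))                     # pretend attention scores
scores[~mask] = -np.inf                       # -inf -> zero weight after softmax

print(mask.astype(int))
```

After the softmax, the -inf entries become zero attention weights, which is exactly how transformer decoders are trained to predict each next token from the past only.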


4. Use Cases


5. Integration: Hugging Face Spaces with Gemini 2.0 Flash API

5.1 Creating a Hugging Face Space

5.2 Sample Space (Gradio + Gemini)

# We are using two tools:
# gradio → builds a small web interface where we can ask questions
# requests → sends HTTP calls to the Gemini API
import gradio as gr
import requests

# This function sends your question (prompt) to the Gemini AI.
def query_gemini(prompt):
    # Placeholder endpoint — substitute the real Gemini API URL
    # from Google's documentation.
    url = "https://api.gemini.flash/v2/query"

    # The Authorization header proves we have permission to call the API.
    headers = {
        "Authorization": "Bearer <YOUR_GEMINI_API_KEY>"  # Replace with your real API key!
    }

    # Package the question as JSON.
    data = {"input": prompt}

    # Send it to the API with a POST request.
    response = requests.post(url, json=data, headers=headers)

    # Read the answer out of the JSON response.
    return response.json()["output"]

# Now we build the little website box:
# - You type a question (input)
# - Gemini gives an answer (output)
iface = gr.Interface(
    fn=query_gemini,    # When you hit enter, it calls this function
    inputs="text",      # The input is a text box
    outputs="text"      # The output is shown as plain text
)

# This starts the little website so you can use it
iface.launch()


6. API References


7. Conclusion

Generative AI is transforming the way humans create and interact with content. By leveraging open-source tools and APIs like Gemini 2.0 Flash and Hugging Face, developers can build powerful applications that integrate natural language understanding, generation, and multimodal creativity.

