Science & Technology · Biotechnology · UPSC GS-III
Genome Sequencing — Decoding the Blueprint of Life 🧬
Complete UPSC Notes — What is a genome, how sequencing works (made easy!), Human Genome Project, Genome India Project (completed Jan 2025!), IndiGen, Earth BioGenome Project. Applications in medicine, forensics, agriculture. Limitations & ethical concerns. Updated April 2026.
🧬 DNA = A, T, C, G — 4 Letters of Life
✅ Genome India Completed (Jan 2025)
🇮🇳 10,000 Indian Genomes Sequenced
180 Million Variants Identified
Published in Nature Genetics (Apr 2025)
📚 Legacy IAS — Civil Services Coaching, Bangalore · Updated: April 2026
Section 01 — Start Here
🔥 The Basics — Made Simple
💡 Think of it Like a Book
Imagine your body is a library. Each cell contains a complete copy of the instruction manual — that's your genome. This manual is written using only 4 letters: A (adenine), T (thymine), C (cytosine), G (guanine). These letters pair up (A–T, C–G) to form the rungs of the famous double helix ladder — that's DNA. A gene is like one chapter of the book — it contains instructions for making one specific protein. And genome sequencing is like reading every single letter of the entire book — all 3 billion of them — in order.
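The pairing rule above (A with T, C with G) means that one side of the ladder fully determines the other. A minimal sketch of that idea follows; the function name `complement` and the example strand are made up for illustration.

```python
# Base-pairing rules from the notes: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand: str) -> str:
    """Return the base that sits opposite each letter of `strand`."""
    return "".join(PAIR[base] for base in strand)

strand = "ATGCGT"              # illustrative strand, not real data
partner = complement(strand)
print(strand)                  # one side of the ladder
print(partner)                 # the opposite side, rung by rung
```

Applying `complement` twice gives back the original strand, which is why a cell can rebuild either side of the DNA ladder from the other.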
3 Billion: base pairs (letters) in the human genome
20,000–25,000: estimated genes in the human genome
23 Pairs: chromosomes in the human cell nucleus
99.9%: share of DNA that is identical between any two people; the remaining 0.1% makes each of us unique
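To get a feel for what 0.1% means, here is a quick back-of-the-envelope calculation using only the two figures quoted above. It is plain arithmetic on rounded numbers, not a measured count.

```python
GENOME_LENGTH_BP = 3_000_000_000   # ~3 billion base pairs (figure from these notes)
DIFFERENCE_PER_THOUSAND = 1        # 0.1% = 1 letter in every 1,000

# Integer arithmetic keeps the result exact.
differing_letters = GENOME_LENGTH_BP * DIFFERENCE_PER_THOUSAND // 1000
print(f"{differing_letters:,}")    # 3,000,000
```

So even a "tiny" 0.1% difference works out to about 3 million letters between any two people.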
📌 Key Terms: DNA = the molecule carrying genetic instructions (double helix). Gene = a unit of DNA with instructions for making one protein. Genome = the complete set of all DNA in a cell. Sequencing = determining the exact order of A, T, C, G bases. Chromosome = a tightly packed strand of DNA (humans have 23 pairs = 46 total).
Section 02
⚙️ How Does Genome Sequencing Work?
Whole genome sequencing (WGS) reads the entire DNA of an organism in one process. Here's how it works — step by step:
1. Extract: DNA is extracted from a sample (usually blood).
2. Shear: DNA is cut into small, readable pieces using molecular scissors.
3. Barcode: Small DNA tags are added to identify which piece belongs to which sample.
4. Sequence: A DNA sequencer reads the A, C, T, G of each piece.
5. Assemble: Bioinformatics software stitches the pieces back into a complete genome.
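The barcode step above can be pictured as sorting a mixed pile of reads back into per-sample piles. The sketch below assumes a simplified setup in which each read simply starts with a 4-letter tag; the barcodes, sample names and reads are all hypothetical.

```python
from collections import defaultdict

# Hypothetical barcodes: a short tag at the start of every fragment from a sample.
BARCODES = {"AAGT": "sample_1", "CCTA": "sample_2"}
BARCODE_LEN = 4

def demultiplex(reads):
    """Group reads by their leading barcode and strip the tag off."""
    by_sample = defaultdict(list)
    for read in reads:
        tag, fragment = read[:BARCODE_LEN], read[BARCODE_LEN:]
        sample = BARCODES.get(tag, "unassigned")  # unknown tags are set aside
        by_sample[sample].append(fragment)
    return dict(by_sample)

# Reads from two samples pooled together, plus one with an unknown tag.
pooled_reads = ["AAGTGATTACA", "CCTATTGACC", "AAGTCCGTA", "GGGGACGT"]
result = demultiplex(pooled_reads)
for sample, fragments in result.items():
    print(sample, fragments)
```

Tagging lets many samples share a single sequencing run, because each piece can still be traced back to the sample it came from.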
💡 The Jigsaw Puzzle Analogy
Imagine tearing a 3-billion-letter book into millions of small snippets, reading each snippet, and then using a computer to work out the correct order by matching the overlapping words at the edges of the snippets. That is essentially what genome sequencing does!
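The jigsaw idea above can be sketched as a toy "greedy overlap" procedure: keep joining the two snippets whose edges match the most. This is a teaching sketch only, not how production software works; real assembly tools are far more sophisticated, and the function names and the short sequence here are illustrative assumptions.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0

def greedy_assemble(snippets, min_len: int = 3):
    """Repeatedly merge the pair of snippets with the largest overlap."""
    pieces = list(snippets)
    while len(pieces) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(pieces):
            for j, b in enumerate(pieces):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing overlaps any more; leave the rest unjoined
        merged = pieces[best_i] + pieces[best_j][best_size:]
        pieces = [p for k, p in enumerate(pieces) if k not in (best_i, best_j)]
        pieces.append(merged)
    return pieces

# Three overlapping snippets of a made-up 14-letter "genome", in shuffled order.
snippets = ["GTTAGC", "ATGCGTAC", "GTACGTTA"]
contigs = greedy_assemble(snippets)
print(contigs)
```

With these three snippets the procedure rebuilds the single original sequence. If the same stretch of letters appeared in several places, more than one join would look equally good, and the puzzle could be put together wrongly.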
📌 Cost Revolution: The first human genome (Human Genome Project, completed 2003) cost about $3 billion and took 13 years. Today, sequencing one genome costs about $200–$600 and takes ~5 days. That is a cost reduction of roughly 5 to 15 million-fold, and it is what has made large-scale projects like Genome India possible.
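Taking the two cost figures above at face value, the size of the drop can be checked with a simple division. This is only arithmetic on the numbers quoted in these notes; the helper name `fold_reduction` is made up for the example.

```python
FIRST_GENOME_COST_USD = 3_000_000_000   # Human Genome Project figure from these notes
TODAY_COST_RANGE_USD = (200, 600)       # per-genome range from these notes

def fold_reduction(old_cost: int, new_cost: int) -> float:
    """How many times cheaper `new_cost` is than `old_cost`."""
    return old_cost / new_cost

for cost_today in TODAY_COST_RANGE_USD:
    fold = fold_reduction(FIRST_GENOME_COST_USD, cost_today)
    print(f"At ${cost_today}: about {fold:,.0f} times cheaper")
```

Both ends of the price range give a drop measured in millions of times.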
Section 03
🌍 Major Global Genome Projects
| Project | Year | Led By | Key Facts | Status |
| Human Genome Project | 1990–2003 | USA + International | First complete human genome sequence. 13 years, $3 billion. Identified 20,000–25,000 genes. Launched modern genomics. | Complete ✓ |
| ENCODE Project | 2003–ongoing | US NHGRI | Identify all functional elements in human genome: protein-coding regions, regulatory elements (promoters, enhancers, silencers). | Ongoing |
| Earth BioGenome Project | 2018–ongoing | International | "Biology moonshot" — sequence genomes of all eukaryotic life on Earth in 10 years. Digital library of all known DNA. | Ongoing |
Section 04 — Very Important 🇮🇳
🇮🇳 Genome India Project — Completed Jan 2025
🎉 Landmark Achievement: PM Modi announced the completion of the Genome India Project at the Genome India Data Conclave (January 2025). President Murmu called it "a significant chapter in the history of Indian Science." Findings published in Nature Genetics (April 2025).
10,074: individual genomes sequenced from across India
85: distinct population groups covered (32 tribal + 53 non-tribal)
180M: genetic variants identified (130M autosomal + 50M sex chromosome)
20+: leading Indian institutions collaborated
🏛️ Led By: IISc Bengaluru (CBR)
💰 Funded By: Dept of Biotechnology (DBT)
💾 Data Stored At: IBDC, Faridabad
🧬 Biobank: 20,000 blood samples at CBR
📝 Published: Nature Genetics, Apr 2025
Why is this important? India has over 4,600 distinct population groups — one of the most genetically diverse populations in the world. Yet Indian genomes were severely underrepresented in global databases. This project creates India's own "reference genome" — a foundational template for all future Indian genetic research, personalised medicine, and disease prevention.
📌 Key Outcomes: (1) 38 critical genetic variants identified that affect drug metabolism in Indians. (2) Millions of rare variants linked to diseases like thalassemia, sickle cell anaemia, neurological disorders. (3) Data is a "digital public good" — accessible to researchers worldwide via IBDC. (4) FeED Protocol launched for ethical, transparent data sharing under Biotech-PRIDE Guidelines.
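A "variant" in the counts above is, at its simplest, a position where a person's DNA differs from a reference sequence. The toy comparison below shows the idea for single-letter differences. It assumes two already-aligned sequences of equal length, which real pipelines cannot take for granted, and the sequences themselves are invented.

```python
def find_snvs(reference: str, sample: str):
    """List (position, reference_base, sample_base) wherever the two differ."""
    if len(reference) != len(sample):
        raise ValueError("this toy comparison needs equal-length sequences")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != alt_base
    ]

reference = "ATGCGTACGT"   # made-up reference stretch
sample = "ATGAGTACTT"      # made-up individual, differing at two positions
snvs = find_snvs(reference, sample)
for pos, ref_base, alt_base in snvs:
    print(f"position {pos}: {ref_base} -> {alt_base}")
```

Repeating this kind of comparison across thousands of genomes is what builds up a catalogue of variants for a population.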
Other Indian Genome Initiatives
| Initiative | Year | Key Details |
| IndiGen Programme | 2019 | Backed by CSIR. Sequenced 1,029 Indian genomes. Identified 55.9 million single nucleotide variants. Pilot for Genome India. |
| Indian Initiative on Earth Bio-Genome Sequencing (IIEBS) | 2020 | Part of global Earth BioGenome Project. Phase 1: sequence 1,000 plant & animal species in 5 years. Led by JNTBGRI. Intended to help prevent biopiracy. |
| One Day One Genome | 2024 | DBT initiative. Sequences and publicly releases one bacterial genome daily to showcase India's microbial diversity. |
Section 05
💊 Applications of Genome Sequencing
🔬 Biological Research
Understanding gene function, protein production, gene regulation. Foundation for all modern biology.
🔍 Forensics
DNA sequences can distinguish organisms down to the species level and, within a species, down to the individual. Used for criminal identification and paternity testing.
🏥 Diagnostics
Prenatal screening for genetic disorders. Genetic assessment of rare diseases and cancer predisposition. Pharmacogenomics: predicting drug efficacy and side effects.
💉 Vaccines
Sequencing viruses (e.g. SARS-CoV-2, which causes COVID-19, and Ebola virus) speeds up vaccine development by revealing circulating variants/strains and tracing hidden transmission pathways.
📊 Population Studies
AI applied to genomic profiles across populations helps uncover disease causation. Critical for rare genetic diseases, which need large datasets.
🌾 Agriculture
Identify genes for disease resistance, yield, nutrition. Improve breeding. Detect pathogens. Revolutionise food security.
📌 Pharmacogenomics: The relationship between drugs and the genome. Genome sequencing reveals why the same drug works differently in different people — based on their genetic makeup. This is the foundation of personalised medicine. Genome India identified 38 variants affecting drug metabolism in Indians.
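The core idea above, that the gene versions a person carries shape how they handle a drug, can be sketched as a small lookup. Everything in this example is hypothetical: the gene is imaginary, and the activity scores and cut-offs are invented for illustration and are not clinical guidance.

```python
# Hypothetical activity scores for versions (alleles) of an imaginary
# drug-metabolising gene. Illustrative numbers only, not clinical values.
ALLELE_ACTIVITY = {"normal": 1.0, "reduced": 0.5, "none": 0.0}

def metaboliser_type(allele_1: str, allele_2: str) -> str:
    """Classify a person from the two gene copies they inherited."""
    score = ALLELE_ACTIVITY[allele_1] + ALLELE_ACTIVITY[allele_2]
    if score == 0:
        return "poor"          # neither copy works: drug is cleared slowly
    if score < 2:
        return "intermediate"  # at least one copy is impaired
    return "normal"            # both copies fully working

for pair in [("normal", "normal"), ("normal", "none"), ("none", "none")]:
    print(pair, "->", metaboliser_type(*pair))
```

Two people given the same dose can therefore land in different categories, which is the reasoning behind tailoring prescriptions to genetic makeup.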
Section 06
⚠️ Limitations & Ethical Concerns
📊 Data Overload
Each genome = 80 GB of data. Genome India generated 8 petabytes. Analysis, storage, and interpretation remain enormous challenges.
🔧 Structural Variants
Current sequencing is accurate for single bases but struggles with large structural variants — duplications, deletions, inversions affecting big DNA segments.
❓ Incomplete Knowledge
Many genes still have unknown functions. Large numbers of variants are unclassified — we don't know if they're benign or harmful.
🔒 Privacy & Ethics
Genetic data is highly sensitive. Risks: genetic discrimination by insurance companies/employers, data breaches. India lacks a comprehensive genetic data protection law.
🧬 Repetitive DNA
Larger genomes have repetitive DNA sequences that are hard to assemble correctly — like having many identical jigsaw pieces.
💰 Access & Equity
While costs have dropped dramatically, large-scale projects remain expensive. Risk of genomic inequality — benefits going only to wealthy populations.
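For a sense of scale on the Data Overload point above, the per-genome figure can be multiplied out. The sketch uses the 80 GB figure and the 10,074 genome count from these notes, and assumes decimal units (1 PB = 1,000,000 GB).

```python
GB_PER_GENOME = 80          # per-genome figure from these notes
GENOMES_SEQUENCED = 10_074  # Genome India count from these notes
GB_PER_PB = 1_000_000       # assumption: decimal units, 1 PB = 1,000,000 GB

total_gb = GB_PER_GENOME * GENOMES_SEQUENCED
total_pb = total_gb / GB_PER_PB
print(f"{total_gb:,} GB = {total_pb:.2f} PB")
```

The product comes to under 1 PB, so the 8 PB total evidently counts more than a single 80 GB file per genome. What else it includes is not stated in these notes.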
⚠️ Social Risk: Genetic studies could potentially reinforce stereotypes and fuel divisive politics around racial purity and heredity. In India, debates over "indigenous" populations could take a genetic turn. Historical controversies around eugenics highlight the sensitivity of this subject.
Section 07 — Practice
📝 UPSC-Style MCQs
Q1. The Genome India Project, completed in January 2025:
1. Sequenced the genomes of 10,000 individuals from diverse Indian populations.
2. Was funded by the Council of Scientific and Industrial Research (CSIR).
3. Data is stored at the Indian Biological Data Centre (IBDC) in Faridabad.
Which of the statements is/are correct?
a) 1 and 3 only
b) 1 and 2 only
c) 2 and 3 only
d) 1, 2 and 3
Statements 1 (10,000 individuals ✓) and 3 (IBDC Faridabad ✓) are correct. The Genome India Project was funded by the Department of Biotechnology (DBT), not CSIR. (CSIR funded the separate IndiGen programme.) Answer: (a).
Q2. The term "pharmacogenomics" refers to:
a) The study of how pharmaceutical companies market drugs
b) The study of how an individual's genetic makeup affects their response to drugs
c) The process of manufacturing drugs using genetically modified organisms
d) The sequencing of drug molecules for quality control
Pharmacogenomics studies how a person's genetic profile affects their response to drugs — enabling doctors to predict drug efficacy and adverse effects based on DNA, and prescribe personalised treatments. Answer: (b).
Q3. Consider the following:
1. Human Genome Project was completed in 2003.
2. The first human genome cost approximately $3 billion.
3. Today, sequencing a human genome costs approximately $100,000.
Which is/are correct?
a) 1 and 2 only
b) 2 and 3 only
c) 1, 2 and 3
d) 1 only
Statements 1 (completed 2003 ✓) and 2 ($3 billion ✓) are correct. Today, sequencing costs approximately $200–$600, not $100,000 (statement 3 is wrong — the cost has dropped dramatically). Answer: (a).
Q4. The IndiGen Programme is associated with:
a) Department of Biotechnology (DBT)
b) Council of Scientific and Industrial Research (CSIR)
c) Indian Council of Medical Research (ICMR)
d) Defence Research and Development Organisation (DRDO)
The IndiGen Programme (2019) was endorsed and funded by CSIR. It sequenced 1,029 Indian genomes as a pilot. Note: The larger Genome India Project (10,000 genomes) was funded by DBT — a common confusion point in exams! Answer: (b).
Section 08
🧠 Memory Aid
🔑 Lock These In for Prelims Day
GENOME
Complete set of DNA in a cell. Humans: ~3 billion base pairs, 23 chromosome pairs, 20,000–25,000 genes. 99.9% identical between humans.
A-T, C-G
Four bases of DNA. A pairs with T, C pairs with G. Sequencing = reading the order of these bases.
HGP
Human Genome Project. 1990–2003. $3 billion. 13 years. First complete human genome. International (US-led).
GENOME INDIA
DBT funded. IISc Bengaluru led. 10,074 individuals from 85 populations. 180M variants. Completed Jan 2025. Published Nature Genetics Apr 2025. Data at IBDC Faridabad.
IndiGen
CSIR funded (NOT DBT). 2019. 1,029 genomes. Pilot project. Different from Genome India!
IBDC
Indian Biological Data Centre, Faridabad. India's first national life science data repository. Stores Genome India data.
COST DROP
First genome: $3 billion (2003). Now: ~$200–600 (~5 days). A cost drop of several million-fold!
PHARMA-GEN
Pharmacogenomics = how genes affect drug response. Why same drug works differently in different people. Foundation of personalised medicine.
EBP
Earth BioGenome Project (2018). Sequence ALL eukaryotic life on Earth. India's IIEBS (2020) is part of this — led by JNTBGRI.
Section 09
❓ FAQs
What is the difference between Genome India Project and IndiGen Programme?
IndiGen (2019) was a pilot programme funded by CSIR. It sequenced 1,029 Indian genomes to test the approach. Genome India Project (2020–2025) was the full-scale initiative funded by DBT. It sequenced 10,074 individuals from 85 population groups — 10× larger. Both aim for personalised medicine, but Genome India is the comprehensive national database. A common UPSC trap: confusing which agency funded which project (CSIR for IndiGen, DBT for Genome India).
Why does India need its own genome database?
India has over 4,600 distinct population groups — one of the most genetically diverse populations on Earth. Many carry unique genetic markers that affect disease susceptibility and drug response. But Indian genomes were severely underrepresented in global databases (which are dominated by European-ancestry populations). Without India-specific data, diagnoses and treatments developed elsewhere may not work optimally for Indians. Genome India fills this gap by creating a reference genome tailored to India's diversity — enabling personalised medicine, rare disease diagnosis, and targeted drug development for Indian populations.
What are the ethical concerns with genome data?
Major concerns include: (1) Privacy — genetic data can reveal predispositions to diseases, ancestry, and family relationships. If leaked, it could be used for genetic discrimination by insurers or employers. (2) Consent — participants must understand what their data will be used for. (3) Social misuse — genetic studies could fuel debates about racial purity or caste-based genetics. (4) Biopiracy — Indian genetic material could be exported and commercialised without benefit to the source communities. India's Biotech-PRIDE Guidelines (2021) and the FeED Protocol address some of these concerns, but India still lacks a comprehensive genetic data privacy law.
Section 10 — Mains
📜 Probable Mains Questions
Probable Question 1
"Discuss the significance of the Genome India Project for personalised medicine and disease prevention in India. What challenges does the project face?"
Probable Question 2
"What is genome sequencing? Discuss its applications in healthcare, agriculture, and forensics. What ethical concerns does it raise?"
Probable Question 3
"Explain the concept of pharmacogenomics. How can genome sequencing data contribute to the development of personalised medicine in India?"
Section 11
🏁 Conclusion
🧬 Reading the Book of Life
By 2003, reading the first human genome had taken 13 years and $3 billion. In January 2025, India announced that it had read 10,074 genomes, drawn from 85 of its diverse population groups, creating the country's first comprehensive genetic reference database. The findings, published in Nature Genetics, identified 180 million genetic variants, including rare mutations linked to thalassemia, sickle cell anaemia, and neurological disorders. Thirty-eight critical variants were found that affect how Indians metabolise drugs: the foundation for a future where your medicine is prescribed based on your DNA, not your symptoms alone.
This is the promise of genome sequencing: a world where diseases are predicted before they appear, where drugs are tailored to the individual, where agricultural crops are bred with precision, and where forensic science identifies with certainty. But it is also a world of profound ethical complexity — where genetic privacy, discrimination, consent, and equity must be safeguarded as carefully as the data itself.
For UPSC, remember the core chain: DNA → Gene → Genome → Sequencing → Applications (medicine, agriculture, forensics) → Limitations (privacy, data, ethics). And know the India-specific projects: IndiGen (CSIR, 1,029 genomes, 2019) vs Genome India (DBT, 10,074 genomes, completed 2025, data at IBDC Faridabad).