India is the world's pharmacy. It supplies over 60% of global vaccines and a third of all generic medicines. Yet structured, analytics-ready data about what is actually sold in the Indian market has been hard to access at scale. That is starting to change.
The A-Z Medicines Dataset of India, available on Kuinbee's marketplace, is a structured reference covering 253,973 pharmaceutical products marketed in India. It spans drug names, brand identities, generic compositions, dosage forms, manufacturer information, and pricing, all cleaned, standardized, and ready for analytics.
⚡ Key Takeaways
- India manufactures for the world, but its pharma data has been fragmented across regulators, circulars, and filings.
- The A-Z Medicines Dataset provides a single clean table of 253,973 products with eight core attributes.
- The dataset is free under CC BY 4.0 with a KDTS quality score of 89.4.
- Pricing and composition fields enable market analysis, ML pricing models, and supply chain risk studies.
- The largest limitation is freshness: pricing snapshots need periodic refreshes because Drugs (Prices Control) Order (DPCO) revisions change maximum retail prices (MRPs).
Why India's Pharma Data Has Been a Blind Spot
India's pharmaceutical industry generated roughly $50B in revenue in 2025 and is projected to reach $130B by 2030. It exports to 200+ countries and supports more than 10,500 registered manufacturers. The scale is extraordinary.
Yet public drug databases are fragmented across CDSCO, state authorities, and formularies. Pricing data sits in government circulars, and composition details are scattered across labeling filings. Analysts have had to stitch inconsistent sources together or rely on expensive proprietary databases.
India manufactures for the world, but the data about what it manufactures has lived in silos. A consolidated, analytics-ready dataset removes most of that stitching work and lowers the cost of starting a pharmaceutical market analysis.
What's Inside: 253,973 Records, 8 Attributes
The dataset is a single flat table covering the core attributes needed for pharmaceutical analytics. This schema makes it practical for SQL, Python, and BI workflows without complex joins.
A-Z Medicines Dataset schema (core fields).
| Field | Type | Required | Description |
|---|---|---|---|
| Medicine_Name | string | Required | Full product name as marketed in India |
| Brand_Name | string | Optional | Brand identity; may differ from medicine name for generics |
| Composition | string | Optional | Generic active ingredient(s) with strength ratios |
| Dosage_Form | categorical | Optional | Tablet, capsule, syrup, injection, cream, and more |
| Strength | string | Optional | Concentration per unit, such as 500mg or 10mg/5ml |
| Manufacturer | string | Optional | Normalized manufacturer name |
| Price | float | Optional | Maximum retail price (MRP) in INR; range-validated |
| Pack_Size | string | Optional | Units per pack, such as 10 tablets or 100ml |
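As a minimal sketch of what working with this flat schema might look like, the snippet below reads a CSV with the eight columns from the table, checks the header, and converts types. The sample rows, the product names in them, and the assumption that the dataset ships as a CSV with exactly these header names are all illustrative, not confirmed details of the Kuinbee download.

```python
import csv
import io

# Column names follow the schema table above.
EXPECTED_FIELDS = [
    "Medicine_Name", "Brand_Name", "Composition", "Dosage_Form",
    "Strength", "Manufacturer", "Price", "Pack_Size",
]

# Invented sample rows; a real run would open the downloaded file instead.
SAMPLE_CSV = """Medicine_Name,Brand_Name,Composition,Dosage_Form,Strength,Manufacturer,Price,Pack_Size
Examplol 500 Tablet,Examplol,Paracetamol (500mg),Tablet,500mg,Example Pharma Ltd,20.5,10 tablets
Samplex Syrup,,,Syrup,10mg/5ml,Sample Labs,85,100ml
"""

def load_medicines(handle):
    """Read rows, verify the header, and convert types; blank cells become None."""
    reader = csv.DictReader(handle)
    missing = set(EXPECTED_FIELDS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = []
    for raw in reader:
        row = {k: ((raw.get(k) or "").strip() or None) for k in EXPECTED_FIELDS}
        if row["Medicine_Name"] is None:
            raise ValueError("Medicine_Name is required")
        # Price is the only numeric field in the schema.
        row["Price"] = float(row["Price"]) if row["Price"] is not None else None
        rows.append(row)
    return rows

rows = load_medicines(io.StringIO(SAMPLE_CSV))
print(len(rows), rows[0]["Price"], rows[1]["Brand_Name"])
```

Treating blank optional cells as `None` from the start keeps later aggregations from silently counting empty strings as values.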
Six High-Value Use Cases
Pharmaceutical pricing analysis
Benchmark prices by manufacturer, dosage form, and pack size. Identify under- and over-priced segments by molecule class.
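One way such a benchmark could be computed, sketched with the standard library only: group rows by the chosen columns and take the median price per group. The rows are invented, and `median_price_by` is a hypothetical helper name, not part of any Kuinbee tooling.

```python
from collections import defaultdict
from statistics import median

# Invented rows using the schema's column names.
rows = [
    {"Manufacturer": "Example Pharma", "Dosage_Form": "Tablet", "Price": 20.0},
    {"Manufacturer": "Example Pharma", "Dosage_Form": "Tablet", "Price": 30.0},
    {"Manufacturer": "Example Pharma", "Dosage_Form": "Syrup", "Price": 90.0},
    {"Manufacturer": "Sample Labs", "Dosage_Form": "Tablet", "Price": 45.0},
    {"Manufacturer": "Sample Labs", "Dosage_Form": "Tablet", "Price": None},
]

def median_price_by(rows, keys):
    """Median Price per combination of `keys`, skipping rows with missing values."""
    groups = defaultdict(list)
    for row in rows:
        if row.get("Price") is None or any(row.get(k) is None for k in keys):
            continue
        groups[tuple(row[k] for k in keys)].append(row["Price"])
    return {group: median(prices) for group, prices in groups.items()}

benchmarks = median_price_by(rows, ["Manufacturer", "Dosage_Form"])
for group, price in sorted(benchmarks.items()):
    print(group, price)
```

The median is used here rather than the mean because the article notes that raw pricing can include outliers; that is a design choice of this sketch.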
Competitive benchmarking
Analyze manufacturer presence across therapeutic categories and map brand versus generic positioning.
ML price prediction
Train regression models to predict pricing from composition, manufacturer, and dosage form features.
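Before training a regression model, it can help to have a baseline to beat. The sketch below is not a regression model; it is a simple group-mean baseline that predicts the average price of a product's dosage form and falls back to the global mean for unseen categories. All rows and function names are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

def fit_group_mean(train_rows, key):
    """Learn the mean Price per value of `key`, plus a global fallback."""
    by_group = defaultdict(list)
    for row in train_rows:
        by_group[row[key]].append(row["Price"])
    group_means = {group: mean(prices) for group, prices in by_group.items()}
    global_mean = mean(row["Price"] for row in train_rows)
    return group_means, global_mean

def predict(model, row, key):
    group_means, global_mean = model
    return group_means.get(row[key], global_mean)

def mean_absolute_error(model, rows, key):
    return mean(abs(predict(model, r, key) - r["Price"]) for r in rows)

train = [
    {"Dosage_Form": "Tablet", "Price": 20.0},
    {"Dosage_Form": "Tablet", "Price": 40.0},
    {"Dosage_Form": "Injection", "Price": 200.0},
]
held_out = [
    {"Dosage_Form": "Tablet", "Price": 35.0},
    {"Dosage_Form": "Cream", "Price": 80.0},  # unseen category: uses the global mean
]

model = fit_group_mean(train, "Dosage_Form")
mae = mean_absolute_error(model, held_out, "Dosage_Form")
print(round(mae, 2))
```

A model built on composition, manufacturer, and dosage form features is only worth its complexity if its held-out error is clearly lower than this baseline's.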
Drug category classification
Cluster medicines by composition similarity and build molecule-to-brand mapping tools at scale.
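A rough sketch of the building blocks for this use case follows: parse each Composition string into a set of ingredient names, compare sets with Jaccard similarity, and index brands by molecule. It assumes ingredients are joined by `+` with strengths in parentheses (for example `Paracetamol (500mg) + Caffeine (30mg)`); the actual format in the dataset may differ, and the brand names are invented.

```python
import re
from collections import defaultdict

def ingredients(composition):
    """Split a composition string into a set of lowercase ingredient names.

    Assumes '+' separates ingredients and strengths sit in parentheses.
    """
    if not composition:
        return frozenset()
    names = (re.sub(r"\([^)]*\)", "", part).strip().lower()
             for part in composition.split("+"))
    return frozenset(name for name in names if name)

def jaccard(a, b):
    """Shared ingredients divided by all ingredients across both products."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def molecule_to_brands(rows):
    """Map each ingredient name to the set of brands that contain it."""
    index = defaultdict(set)
    for row in rows:
        if row.get("Brand_Name"):
            for name in ingredients(row.get("Composition")):
                index[name].add(row["Brand_Name"])
    return index

rows = [
    {"Brand_Name": "Examplol", "Composition": "Paracetamol (500mg)"},
    {"Brand_Name": "Examplol Plus", "Composition": "Paracetamol (500mg) + Caffeine (30mg)"},
    {"Brand_Name": None, "Composition": "Caffeine (30mg)"},
]

similarity = jaccard(ingredients(rows[0]["Composition"]),
                     ingredients(rows[1]["Composition"]))
index = molecule_to_brands(rows)
print(similarity, sorted(index["paracetamol"]))
```

Pairwise similarities like this can feed any standard clustering method; at 253,973 rows, comparing every pair directly would be expensive, so grouping by exact ingredient set first is a sensible starting point.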
Supply chain modeling
Quantify manufacturer concentration by category and simulate procurement risks.
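One common way to quantify concentration is the Herfindahl-Hirschman Index (HHI), the sum of squared percentage shares. The article does not name a specific metric, so HHI is a choice made for this sketch. Because the dataset has no sales volumes or therapeutic category field, shares here are computed from product listing counts and Dosage_Form stands in as the category; the rows are invented.

```python
from collections import Counter

def hhi(rows, category=None, category_key="Dosage_Form"):
    """HHI on a 0 to 10,000 scale from each manufacturer's share of listings.

    Shares are based on product counts, not sales, so this measures
    portfolio concentration rather than market share.
    """
    counts = Counter(
        row["Manufacturer"] for row in rows
        if row.get("Manufacturer")
        and (category is None or row.get(category_key) == category)
    )
    total = sum(counts.values())
    if total == 0:
        return None
    return sum((100.0 * n / total) ** 2 for n in counts.values())

rows = [
    {"Manufacturer": "Maker A", "Dosage_Form": "Tablet"},
    {"Manufacturer": "Maker A", "Dosage_Form": "Tablet"},
    {"Manufacturer": "Maker A", "Dosage_Form": "Tablet"},
    {"Manufacturer": "Maker B", "Dosage_Form": "Tablet"},
    {"Manufacturer": "Maker C", "Dosage_Form": "Syrup"},
]

tablet_hhi = hhi(rows, category="Tablet")
syrup_hhi = hhi(rows, category="Syrup")
print(tablet_hhi, syrup_hhi)
```

A value of 10,000 means a single manufacturer accounts for every listing in the category, which is the kind of single-source exposure a procurement risk simulation would flag.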
Healthcare cost research
Study affordability, branded vs generic price gaps, and essential medicine baskets.
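Price gaps are easier to compare per unit than per pack. The sketch below derives a unit price from Price and Pack_Size, then reports the ratio of the highest to the lowest unit price among products sharing a composition and dosage form. It assumes Pack_Size starts with a number, as in the schema examples `10 tablets` and `100ml`; the rows and helper names are invented.

```python
import re
from collections import defaultdict

def pack_units(pack_size):
    """Return the leading number in a Pack_Size string, or None if absent."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)", pack_size or "")
    return float(match.group(1)) if match else None

def unit_price(row):
    """Price divided by pack units; None when either value is unusable."""
    price = row.get("Price")
    units = pack_units(row.get("Pack_Size"))
    if price is None or price <= 0 or not units:
        return None
    return price / units

def unit_price_spread(rows):
    """Max/min unit-price ratio per (Composition, Dosage_Form) with 2+ priced products."""
    groups = defaultdict(list)
    for row in rows:
        price = unit_price(row)
        if price is not None and row.get("Composition"):
            groups[(row["Composition"], row.get("Dosage_Form"))].append(price)
    return {key: max(v) / min(v) for key, v in groups.items() if len(v) > 1}

rows = [
    {"Composition": "Paracetamol (500mg)", "Dosage_Form": "Tablet",
     "Price": 20.0, "Pack_Size": "10 tablets"},
    {"Composition": "Paracetamol (500mg)", "Dosage_Form": "Tablet",
     "Price": 45.0, "Pack_Size": "15 tablets"},
    {"Composition": "Paracetamol (500mg)", "Dosage_Form": "Tablet",
     "Price": 30.0, "Pack_Size": None},
]

spread = unit_price_spread(rows)
print(spread)
```

Grouping by dosage form as well as composition avoids comparing a per-tablet price with a per-millilitre price.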
[Figure: Indicative Distribution of Medicines by Dosage Form]
Data Quality: What Was Done to Get It Here
Raw pharmaceutical data is messy. Manufacturer names appear in multiple spellings, compositions are inconsistently recorded, and pricing can include outliers. The cleaning pipeline applied schema normalization, duplicate removal, missing value assessment, categorical standardization, and range validation on price.
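To make three of those steps concrete, here is an illustrative sketch of manufacturer name normalization, duplicate removal, and price range validation. It is not Kuinbee's actual pipeline: the suffix list, the duplicate key, and the price bounds are all assumptions chosen for the example.

```python
import re

# Hypothetical list of legal suffixes to strip before comparing names.
SUFFIXES = re.compile(r"\b(pvt|private|ltd|limited)\b\.?", re.IGNORECASE)

def normalize_manufacturer(name):
    """Collapse spelling variants such as 'X Pvt. Ltd.' and 'X LIMITED' to 'X'."""
    if not name:
        return None
    name = SUFFIXES.sub("", name)
    name = re.sub(r"[^\w\s&]", " ", name)  # drop stray punctuation
    return re.sub(r"\s+", " ", name).strip().title() or None

def clean(rows, min_price=0.0, max_price=100_000.0):
    """Normalize names, null out-of-range prices, and drop duplicate products."""
    seen, kept = set(), []
    for row in rows:
        row = dict(row, Manufacturer=normalize_manufacturer(row.get("Manufacturer")))
        price = row.get("Price")
        if price is not None and not (min_price < price <= max_price):
            row["Price"] = None  # treat as missing rather than dropping the row
        key = (row["Medicine_Name"].strip().lower(),
               row["Manufacturer"], row.get("Pack_Size"))
        if key in seen:
            continue
        seen.add(key)
        kept.append(row)
    return kept

raw_rows = [
    {"Medicine_Name": "Examplol 500 Tablet", "Manufacturer": "Example Pharma Pvt. Ltd.",
     "Price": 20.5, "Pack_Size": "10 tablets"},
    {"Medicine_Name": "examplol 500 tablet ", "Manufacturer": "EXAMPLE PHARMA LIMITED",
     "Price": 20.5, "Pack_Size": "10 tablets"},
    {"Medicine_Name": "Samplex Syrup", "Manufacturer": "Sample Labs",
     "Price": -5.0, "Pack_Size": "100ml"},
]

cleaned = clean(raw_rows)
print(len(cleaned), cleaned[0]["Manufacturer"], cleaned[1]["Price"])
```

Normalizing names before deduplicating matters: the first two rows only register as the same product once their manufacturer spellings agree.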
KDTS score: 89.4, with component scores of Completeness 95, Legitimacy 92, Precision 84, Usefulness 94, and Freshness 70. The lower freshness score reflects the fact that pricing snapshots need periodic refreshes as DPCO revisions change MRPs.
— Kuinbee data quality pipeline assessment, May 23, 2026
Honest Limitations Worth Knowing
- Community-sourced; not an official CDSCO registry.
- Pricing is a snapshot at time of compilation and not real-time.
- No clinical trial, prescription, or approval timeline fields.
- Intended for analytics and research, not prescribing decisions.
- Optional fields such as Brand_Name and Composition may be null in some records.
The Bigger Opportunity This Dataset Unlocks
India's domestic formulations market is growing at roughly 10 to 12% annually. Generic exports are accelerating, and the government's PLI scheme is reshaping manufacturing geography. Digital health infrastructure is expanding rapidly, creating demand for pharma intelligence tools that did not exist five years ago.
[Figure: Indicative Manufacturer Portfolio Concentration]
Explore the A-Z Medicines Dataset
Access structured pharma datasets covering pricing, composition, dosage, and manufacturer intelligence for India's drug market.