Topic 18 – SQL for Big Data and Cloud

image 2

Play Store Application link – SQL in 18 steps – App on Google Play

SQL is not just for traditional databases anymore. With the rise of Big Data and Cloud computing, SQL has evolved to work with massive datasets and distributed systems. Let’s explore how SQL is used in Big Data platforms like Hadoop and cloud services like AWS, Azure, and Google Cloud.


SQL in Hadoop (HiveQL)

Big Data systems like Apache Hadoop store vast amounts of data in distributed clusters. Since traditional SQL databases struggle with such large-scale data, Hadoop introduced Hive, which allows SQL-like querying using HiveQL.

Example: Querying Data in HiveQL

HiveQL is similar to standard SQL but optimized for distributed processing.

CREATE TABLE sales (
    id INT,
    product STRING,
    amount DOUBLE,
    date STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';

LOAD DATA INPATH '/data/sales.csv' INTO TABLE sales;

SELECT product, SUM(amount) AS total_sales
FROM sales
GROUP BY product;

  • Hive stores data in HDFS (Hadoop Distributed File System).
  • It does not support transactions like traditional RDBMS.
  • Best for batch processing, not real-time queries.

Cloud Databases (AWS RDS, Azure SQL, Google Cloud SQL)

Cloud platforms offer managed SQL database services, eliminating the need for manual setup and maintenance.

AWS RDS (Relational Database Service)

  • Supports MySQL, PostgreSQL, SQL Server, MariaDB, and Oracle.
  • Automatically handles backups, scaling, and maintenance.
  • Example of creating an RDS MySQL instance using CLI:
aws rds create-db-instance \
  --db-instance-identifier mydb \
  --db-instance-class db.t2.micro \
  --engine mysql \
  --allocated-storage 20 \
  --master-username admin \
  --master-user-password password123

Azure SQL Database

  • Microsoft’s fully managed SQL Server in the cloud.
  • Supports serverless compute and automatic tuning.
  • Example of querying an Azure SQL database:
SELECT TOP 10 * FROM customers ORDER BY created_at DESC;

Google Cloud SQL

  • Managed MySQL, PostgreSQL, and SQL Server instances.
  • Supports automatic backups and high availability.
  • Example of connecting via command line:
gcloud sql connect my-instance --user=root


Differences in SQL Usage on Cloud Platforms

FeatureAWS RDSAzure SQLGoogle Cloud SQL
Database EnginesMySQL, PostgreSQL, SQL Server, Oracle, MariaDBSQL ServerMySQL, PostgreSQL, SQL Server
ScalingVertical & horizontalAuto-scaleAuto-scale
Pricing ModelPay-as-you-goPay-as-you-goPay-as-you-go
BackupsAutomatedAutomatedAutomated
SecurityIAM Roles, EncryptionActive DirectoryIAM Roles, Encryption

Conclusion

SQL has expanded beyond traditional databases to power Big Data and Cloud applications. Whether using HiveQL for large-scale data processing or AWS/Azure/Google Cloud SQL for managed databases, SQL continues to evolve and adapt to modern needs. Understanding these different SQL environments helps in designing scalable and efficient data solutions.

One comment

Leave a Reply

Your email address will not be published. Required fields are marked *