
Shashi Shankar
Mar 17, 2023
Up and Running in a flash with Snowflake Data Lake
What is Snowflake?
Intro
Software as a Service
Hardware and software managed by the Snowflake team.
Data Warehouse hosted on public clouds – AWS, Azure and Google Cloud.
No hardware or software to buy or maintain – near-zero setup and maintenance.
As a Snowflake customer, you simply sign up, load your data and start querying.
Cannot be hosted on a private cloud or on-premises.
Separate layers for storage, compute and services.
Data stored in columnar, compressed format in micro-partitions.
Low cost of ownership.
Supports a variety of file formats – CSV, Parquet, ORC, JSON, XML, Avro.
Supported programming languages – SQL, Python and Java.
Tools – Web UI, SnowSQL command-line interface (CLI).
Interfaces – ODBC, JDBC, Python, Spark, Node.js, PHP, .NET and several third-party partner tools.
External tables for reading data stored outside Snowflake-managed storage.
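To make "columnar, compressed format in micro-partitions" concrete, here is a toy Python model, not Snowflake's actual storage format: rows are split into small partitions, each column is stored separately, and each column is run-length encoded. The fixed ROWS_PER_PARTITION value and all function names are hypothetical; real micro-partitions are sized by data volume and use Snowflake's own internal compression.

```python
from typing import Any

# Hypothetical fixed partition size for illustration only; real micro-partitions
# are sized by data volume, not by a row count.
ROWS_PER_PARTITION = 4

Column = list[tuple[Any, int]]  # run-length encoded column: (value, run_length) pairs


def run_length_encode(values: list[Any]) -> Column:
    """Collapse consecutive repeated values into (value, count) pairs."""
    encoded: Column = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1] = (v, encoded[-1][1] + 1)
        else:
            encoded.append((v, 1))
    return encoded


def decode(encoded: Column) -> list[Any]:
    """Expand (value, count) pairs back into the original column values."""
    return [v for v, n in encoded for _ in range(n)]


def to_micro_partitions(
    rows: list[dict[str, Any]], rows_per_partition: int = ROWS_PER_PARTITION
) -> list[dict[str, Column]]:
    """Split rows into partitions and store each column separately, compressed.

    Assumes every row in a chunk has the same column names as its first row.
    """
    partitions = []
    for start in range(0, len(rows), rows_per_partition):
        chunk = rows[start:start + rows_per_partition]
        # Pivot the row-oriented chunk into one list of values per column.
        columns = {name: [row[name] for row in chunk] for name in chunk[0]}
        partitions.append({name: run_length_encode(vals) for name, vals in columns.items()})
    return partitions


rows = [
    {"region": "EU", "amount": 10},
    {"region": "EU", "amount": 20},
    {"region": "EU", "amount": 20},
    {"region": "US", "amount": 5},
    {"region": "US", "amount": 7},
]
for i, part in enumerate(to_micro_partitions(rows)):
    print(i, part)
```

Storing a column's values together is what makes repeated values adjacent, which is why simple schemes like run-length encoding compress columnar data so well.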
Ease of Use
SQL-based data warehouse.
Full transactional consistency (ACID).
Supports structured and semi-structured data formats – CSV, JSON, Parquet, ORC, XML.
Provides data loading and unloading tools, including data pipelines.
Snowflake handles all aspects of authentication, configuration, resource management, data protection, availability, optimization, etc.
All workloads operate on the same single copy of the data.
Each workload can get its own compute environment for data processing.
Zero-copy cloning.
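Zero-copy cloning works because table data lives in immutable micro-partitions and a table is essentially metadata pointing at them. The sketch below is a simplified in-memory model under that assumption (the Table and MicroPartition classes are hypothetical, not a Snowflake API): cloning copies only the list of partition references, and later writes to either table add new partitions instead of changing shared ones.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MicroPartition:
    """Immutable chunk of table data; it is never modified after being written."""
    rows: tuple


@dataclass
class Table:
    """A table is modelled as metadata only: an ordered list of partition references."""
    name: str
    partitions: list[MicroPartition] = field(default_factory=list)

    def insert(self, rows: list) -> None:
        # New data always lands in a new partition; existing partitions are untouched.
        self.partitions.append(MicroPartition(tuple(rows)))

    def clone(self, new_name: str) -> "Table":
        # Zero-copy: duplicate the list of references, not the partitions themselves.
        return Table(new_name, list(self.partitions))

    def scan(self) -> list:
        return [row for p in self.partitions for row in p.rows]


prod = Table("orders")
prod.insert([1, 2, 3])
dev = prod.clone("orders_dev")
dev.insert([4])      # only the clone sees the new partition
print(prod.scan())   # [1, 2, 3]
print(dev.scan())    # [1, 2, 3, 4]
```

In Snowflake itself the equivalent is a single CREATE TABLE <name> CLONE <source> statement; storage is only consumed for partitions written after the clone is created.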
Scalability
No hardware or software to buy or configure.
Scaling takes only a few clicks.
Enables faster time to market.
Supports a variety of workloads (data engineering, data science, analytics, BI, metadata management).
Unlimited storage scalability.
Elastic virtual warehouses provide easy vertical (resizing) and horizontal (multi-cluster) scaling, along with compute isolation between workloads.
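The cost side of vertical and horizontal scaling can be worked out with simple arithmetic. This sketch assumes the standard-warehouse rate table in which an X-Small warehouse consumes 1 credit per hour and each size step doubles that, and it assumes per-second billing with a 60-second minimum; check current Snowflake pricing before relying on these numbers. The function name credits_used is hypothetical.

```python
# Assumed credits per hour for standard warehouses: each size step doubles the rate.
CREDITS_PER_HOUR = {
    "XS": 1, "S": 2, "M": 4, "L": 8,
    "XL": 16, "2XL": 32, "3XL": 64, "4XL": 128,
}

# Assumed billing rule: per-second billing with a 60-second minimum per run.
MINIMUM_BILLED_SECONDS = 60


def credits_used(size: str, clusters: int, seconds: int) -> float:
    """Credits consumed by `clusters` running clusters of one size for `seconds`."""
    billed_seconds = max(seconds, MINIMUM_BILLED_SECONDS)
    return CREDITS_PER_HOUR[size] * clusters * billed_seconds / 3600


# Vertical scaling: a job that takes 1 hour on M and (ideally) 30 minutes on L.
print(credits_used("M", 1, 3600))   # 4.0
print(credits_used("L", 1, 1800))   # 4.0
# Horizontal scaling: three M clusters serving concurrent users for 30 minutes.
print(credits_used("M", 3, 1800))   # 6.0
```

One consequence: if a job's work parallelizes well, moving up one size roughly halves its runtime while the credits consumed stay about the same, because the hourly rate doubles. Adding clusters instead multiplies cost by the number of running clusters and targets concurrency rather than single-query speed.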
Availability and Recoverability
Time Travel – query or restore data as it existed at an earlier point in time.
99.9% availability and failover capability.
Automatic backup and replication, including cross-zone and cross-cloud replication.
DR solutions – database replication and failover.
Multi-region replication between Snowflake accounts.
Replication possible across AWS, Azure and Google Cloud.
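Time Travel can be pictured as a table that keeps every committed snapshot for a retention period and lets a reader pick the snapshot that was current at a given timestamp. The Python class below is a hypothetical in-memory model of that idea, not how Snowflake stores history; in SQL the corresponding read uses an AT(TIMESTAMP => ...) clause on the table.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class TimeTravelTable:
    """Keeps every committed snapshot so reads can target an earlier timestamp."""
    retention_seconds: float
    _commit_times: list[float] = field(default_factory=list)
    _snapshots: list[tuple] = field(default_factory=list)

    def commit(self, ts: float, rows) -> None:
        # Each commit adds a new snapshot; older snapshots are kept for time travel.
        if self._commit_times and ts <= self._commit_times[-1]:
            raise ValueError("commit timestamps must increase")
        self._commit_times.append(ts)
        self._snapshots.append(tuple(rows))

    def at(self, ts: float, now: float) -> tuple:
        """Return the snapshot that was current at `ts`."""
        if now - ts > self.retention_seconds:
            raise LookupError("timestamp is outside the retention period")
        # Index of the last commit made at or before `ts`.
        idx = bisect.bisect_right(self._commit_times, ts) - 1
        if idx < 0:
            raise LookupError("no data existed at that timestamp")
        return self._snapshots[idx]


t = TimeTravelTable(retention_seconds=86_400)   # one day of history
t.commit(100.0, ["a"])
t.commit(200.0, ["a", "b"])
t.commit(300.0, [])                              # accidental delete
# Roll back by committing the snapshot from before the delete as a new version.
t.commit(400.0, t.at(250.0, now=400.0))
print(t.at(400.0, now=400.0))                   # ('a', 'b')
```

Modelling the rollback as a new commit means the bad version stays in history rather than being erased, so the rollback itself can be undone within the retention period.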
Security
End-to-end data encryption.
Customer-managed encryption keys.
Column-level security through data masking and tokenization.
Authentication options – MFA, federated authentication, SSO and OAuth.
Role based access control.
Supports compliance with HIPAA, PCI DSS and other regulatory requirements.
Secure views provide additional security by hiding the view definition from unauthorized users.
Reader accounts provide secure, read-only data sharing with consumers who do not have their own Snowflake account.
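Column-level masking combined with role-based access control can be modelled as a function that receives the column value and the caller's role and decides what to return. The sketch below is a hypothetical Python model: the role names, the ROLE_GRANTS hierarchy and email_mask are illustrative assumptions. In Snowflake a masking policy is written in SQL, typically as a CASE expression on CURRENT_ROLE(), and attached to a column; this model additionally honours the role hierarchy, so any role that inherits PII_READER sees clear text.

```python
# Hypothetical role hierarchy: each role inherits the privileges of the roles it lists.
ROLE_GRANTS: dict[str, set[str]] = {
    "ANALYST": set(),
    "PII_READER": {"ANALYST"},
    "COMPLIANCE_OFFICER": {"PII_READER"},
}


def effective_roles(role: str) -> set[str]:
    """Collect a role plus every role it inherits, directly or indirectly."""
    seen: set[str] = set()
    stack = [role]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(ROLE_GRANTS.get(current, set()))
    return seen


def email_mask(value: str, current_role: str) -> str:
    """Masking policy: clear text for PII_READER or any role above it, else masked."""
    if "PII_READER" in effective_roles(current_role):
        return value
    _, _, domain = value.partition("@")
    return "*****@" + domain


rows = [{"id": 1, "email": "jane@example.com"}]
for role in ("ANALYST", "COMPLIANCE_OFFICER"):
    # The policy is applied at read time; the stored value never changes.
    print(role, [{**r, "email": email_mask(r["email"], role)} for r in rows])
```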
Data Sharing
Secure data sharing both within and outside the organization.
Data Marketplace.
Performance
Smart use of data and metadata caching.
Multi-cluster elastic warehouses for high-concurrency workloads.
Snowflake-managed materialized views improve performance for complex queries.
Snowflake-managed micro-partitioning and columnar compressed storage provide faster query performance.
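Part of the speed-up from Snowflake-managed micro-partitioning comes from pruning: the service keeps min/max metadata per micro-partition and can skip partitions whose value range cannot match a filter. The function below is a minimal sketch of that overlap test, assuming a single integer-valued column (dates written as YYYYMMDD integers); PartitionStats and prune are hypothetical names, not Snowflake APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionStats:
    """Metadata kept per micro-partition: min and max of one column."""
    partition_id: int
    min_value: int
    max_value: int


def prune(stats: list[PartitionStats], low: int, high: int) -> list[int]:
    """Return ids of partitions whose [min, max] range overlaps the filter [low, high].

    Every other partition can be skipped without reading its data.
    """
    return [s.partition_id for s in stats if s.max_value >= low and s.min_value <= high]


# order_date stored as YYYYMMDD integers, one partition per month.
stats = [
    PartitionStats(0, 20230101, 20230131),
    PartitionStats(1, 20230201, 20230228),
    PartitionStats(2, 20230301, 20230331),
]
# Equivalent filter: WHERE order_date BETWEEN 20230215 AND 20230305
print(prune(stats, 20230215, 20230305))  # [1, 2]
```

Pruning is most effective when data is naturally ordered or clustered on the filter column, so that each partition covers a narrow, non-overlapping range of values.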