AWS Data Wrangler

Pandas on AWS

AWS Data Wrangler

An AWS Professional Service open source initiative |

Release Python Version Code style: black License

Checked with mypy Coverage Static Checking Documentation Status

Source Downloads Installation Command
PyPi PyPI Downloads pip install awswrangler
Conda Conda Downloads conda install -c conda-forge awswrangler

Powered By

Table of contents

Quick Start

Installation command: pip install awswrangler

import awswrangler as wr
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "value": ["foo", "boo"]})

# Storing data on Data Lake

# Retrieving the data directly from Amazon S3
df = wr.s3.read_parquet("s3://bucket/dataset/", dataset=True)

# Retrieving the data from Amazon Athena
df = wr.athena.read_sql_query("SELECT * FROM my_table", database="my_db")

# Get Redshift connection (SQLAlchemy) from Glue and retrieving data from Redshift Spectrum
engine = wr.catalog.get_engine("my-redshift-connection")
df = wr.db.read_sql_query("SELECT * FROM external_schema.my_table", con=engine)

# Get MySQL connection (SQLAlchemy) from Glue Catalog and LOAD the data into MySQL
engine = wr.catalog.get_engine("my-mysql-connection")
wr.db.to_sql(df, engine, schema="test", name="my_table")

# Get PostgreSQL connection (SQLAlchemy) from Glue Catalog and LOAD the data into PostgreSQL
engine = wr.catalog.get_engine("my-postgresql-connection")
wr.db.to_sql(df, engine, schema="test", name="my_table")

Read The Docs

Community Resources

Please send a Pull Request with your resource reference and @githubhandle.

Who uses AWS Data Wrangler?

Knowing which companies are using this library is important to help prioritize the project internally.

Please send a Pull Request with your company name and @githubhandle if you may.

Aws Data Wrangler

Pandas on AWS

Aws Data Wrangler Info

⭐ Stars 1098
πŸ”— Source Code
πŸ•’ Last Update a year ago
πŸ•’ Created 3 years ago
🐞 Open Issues 23
βž— Star-Issue Ratio 48
😎 Author awslabs