


May 2022: This post was reviewed and updated to include resources for orchestrating data and machine learning pipelines.

AWS Glue has a transform called Relationalize that simplifies the extract, transform, load (ETL) process by converting nested JSON into columns that you can easily import into relational databases. Relationalize transforms the nested JSON into key-value pairs at the outermost level of the JSON document. The transformed data maintains a list of the original keys from the nested JSON separated by periods.

Let’s look at how Relationalize can help you with a sample use case.

Suppose that the developers of a video game want to use a data warehouse like Amazon Redshift to run reports on player behavior based on data that is stored in JSON. Sample 1 shows example user data from the game. The player named “user1” has characteristics such as race, class, and location in nested JSON data. Further down, the player’s arsenal information includes additional nested JSON data. If the developers want to ETL this data into their data warehouse, they might have to resort to nested loops or recursive functions in their code.

You can then write the data to a database or to a data warehouse. You can also write it to delimited text files, such as in comma-separated value (CSV) format, or columnar file formats such as Optimized Row Columnar (ORC) format. You can use either of these format types for long-term storage in Amazon S3. Storing the transformed files in S3 provides the additional benefit of being able to query this data using Amazon Athena or Amazon Redshift Spectrum. You can further extend the usefulness of the data by performing joins between data stored in S3 and the data stored in an Amazon Redshift data warehouse.

In my example, I took two preparatory steps that save some time in your ETL code development:

I stored my data in an Amazon S3 bucket and used an AWS Glue crawler to make my data available in the AWS Glue data catalog. You can find instructions on how to do that in Cataloging Tables with a Crawler in the AWS Glue documentation. The AWS Glue database name I used was “blog,” and the table name was “players.” You can see these values in use in the sample code that follows.

I deployed a Zeppelin notebook using the automated deployment available within AWS Glue. If you already used an AWS Glue development endpoint to deploy a Zeppelin notebook, you can skip the deployment instructions. Otherwise, let’s quickly review how to deploy Zeppelin.

Deploying a Zeppelin notebook with AWS Glue
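
As a rough illustration of the flattening idea behind Relationalize (nested keys pulled up to the outermost level, with the original key path joined by periods), here is a minimal pure-Python sketch. It is not the AWS Glue implementation: the `flatten` helper and the `player` record are hypothetical, invented for this example, and the sketch only handles nested objects. The actual transform runs on AWS Glue DynamicFrames and also deals with arrays, which this sketch leaves untouched.

```python
from typing import Any, Dict


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into one level, joining key paths with periods."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the dotted path along.
            flat.update(flatten(value, path))
        else:
            # Scalars (and lists, which this sketch does not expand) are kept as-is.
            flat[path] = value
    return flat


# Hypothetical record, loosely modeled on the player data described in the post.
player = {
    "player": {
        "username": "user1",
        "characteristics": {
            "race": "Human",
            "class": "Warlock",
            "location": {"level": 11},
        },
    }
}

flat_player = flatten(player)
for column, value in flat_player.items():
    print(column, "=", value)
```

Each nested field ends up as a single top-level column, such as `player.characteristics.race`, which is the shape a relational table or a CSV file can accept directly.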
