#How to set up Presto, and use it with Superset + QueryPie

With QueryPie's recent implementation of data warehouse sources like Snowflake, BigQuery and Redshift, feedback from developers as well as data analysts dealing with large volumes of data has been coming in frequently. We are also preparing to add new data sources such as Presto and Qubole to meet the needs of our users.

Today we are going to discuss how to connect to Presto on QueryPie, how to set server settings, how to connect using Supersets, and how much more convenient it is to run queries on QueryPie compared to other tools.

1. What is Presto?

Presto is a big data analysis tool for open sources developed by Facebook. It is easy to mistake Presto as an existing database, but instead of replacing the existing concept of MySQL and PostgreSQL, it's more like a SQL query engine effectively handling large amounts of data using distributed queries. Presto's server consists of a coordinator and several workers, where the role of the interface is played by the coordinator.

source
Reference: https://labs.gree.jp/blog/2014/12/12838/

Some advantages of Presto are support for a variety of repositories and easy plug-in expansion. It is also faster than Hive, which was developed on Facebook for the same initial purpose. Given that Presto supports standard SQL, it has been extended to Presto-based big data processing resources (platforms) such as Athena and Qubole, and the number of users is increasing.

2. How to set up a Presto Server

To set up Presto, prepare a server for installation and then download the necessary files from prestosql.io.

wget https://repo1.maven.org/maven2/io/prestosql/presto-server/332/presto-server-332.tar.gz

tar -xvzf presto-server-332.tar.gz

cd presto-server-332
Unzip after downloading tar files

You also need to create an etc directory inside the folder.

And create properties in the etc directory.

## etc/node.properties
node.environment=production
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
node.data-dir=/var/presto/data

## etc/jvm.properties
-server
-Xmx16G
-XX:-UseBiasedLocking
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+ExplicitGCInvokesConcurrent
-XX:+ExitOnOutOfMemoryError
-XX:+UseGCOverheadLimit
-XX:+HeapDumpOnOutOfMemoryError
-XX:ReservedCodeCacheSize=512M
-Djdk.attach.allowAttachSelf=true
-Djdk.nio.maxCachedBufferSize=2000000

## etc/config.properties
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
discovery-server.enabled=true
discovery.uri=http://example.net:8080

## etc/log.properties
io.prestosql=INFO

## etc/catalog/jmx.propterties
connetor.name = jmx
The following is a minimal

After creating all the config files, type bin/launcher start and launch.

Once launched, go to localhost:8080, and you'll see the default UI running automatically. You can also see that it is connected to Presto by creating the properties of the desired database and re-executing under the etc/catalog.

3. Use case (1): BI Tool

📌Presto on Superset

Presto is not only similar to RDBMSs such as MYSQL, SQLServer, PostgreSQL, but also like various other data sources such as Hive, Cassandra, Kafka, and Elastic Search, so it is often used by organizations that process large amounts of data in real time. When analyzing using and analyzing SQL, we usually share visualized data in the form of a dashboard.

There are many amazing BI tools that support Presto. Check out this helpful list of BI Tools to find the right one for you.

Today we're going to use superset, an Open Source tool, to connect to Presto.

Superset - https://superset.incubator.apache.org/

Using Superset, you can connect to your desired database by going to Sources > Databases. Click the plus button next to 'Filter List' to add your database. On the page that pops up, you can connect to Presto and enter the corresponding values in Database, SQLAlchemy URI.

SQL Alchemy URI - presto://user@host:port/catalog

Now you can use the linked data sources to generate charts and dashboards. You can also use the SQL Lab, which acts as en editor when you need to run a query and check a value. Within the SQL Lab, there is an editor, search functions, and an option to save queries so you can conveniently run SQL inside Superset. However, there are limited options for formatting, viewing tables and no way to see simultaneous execution of queries. For that, you need to work within a SQL Tool.

SQL Editor on Superset

4. Use case (2): SQL Tool

📌Presto on QueryPie

Presto supports basic jdbc, so you can create queries by connecting to an SQL IDE like DataGrip that can import custom jdbc. However, it's a bit bothersome to import after downloading the jar file. There is also the inconvenience of having to write a query every time you want to specify a catalog.

QueryPie supports Presto without the need for jar file import and regular query statements without the need to specify a catalog.

It also provides basic SQL IDE tools such as simultaneous execution of SQL and easy table browsing. Compared to other tools, QueryPie has the advantage of being able to collaborate with your teams easily through functions such as sharing SQL through shared directories and comments.

If you were previously analyzing your data with Presto, try out new Presto SQL IDE!

📌 Start QueryPie for free
📌 See QueryPie Tutorial