Skip to content
On this page

Query Tips ​

Spark SQL Programming Guide

SQL Tutorial from w3schools.com

Cheat Sheets from LearnSQL.com🔥🔥🔥

SparkSQL VS. PostgresSQL ​

DescriptionSparkSQL(current version used by geniidata)PostgresSQL
Usage of Single Quote, Double Quote, and BackticksQuotes (Single and Double) are used around strings. Backticks are used around table and column identifiers and alias.
Example:
SELECT "SparkSQL" as `type`, block_time, `from`, `to` FROM `ethereum`.`transactions` LIMIT 1
PostgresSQL support only single quote (') character for strings and use the double quote (") around table and column identifiers.
Example:
SELECT 'PostgresSQL' as "type", block_time, "from", "to" FROM "ethereum"."transactions" LIMIT 1
Usage of INTERVALExample:
INTERVAL '1day'
Example:
INTERVAL 1 day
Usage of SUBSTRING/SUBSTR functionExample:
substring(str, 1, 2)
Example:
substring(str FROM 1 FOR 2)
Case Sensitivity in String ComparisonsSpark is case-sensitive. We can use case conversion like (lower, upper, etc.) and compare the objects.
Example:
WHERE name = 'GeniiData'
might give different results than
WHERE name = 'geniidata'
PostgreSQL is a case-sensitive database by default but could be case-insensitive on some columns by applying some settings.
expr1 ^ expr2expr1 ^ expr2 - Returns the result of bitwise exclusive OR of expr1 and expr2. Example:
>SELECT 3 ^ 5;
6
To raise expr1 to the power of expr2 we can use the power function. And numbers can be written in scientific notation.
Example:
> SELECT pow(3, 5)
243
> SELECT 1e8
100000000
> SELECT 1E-1
0.1
expr1 ^ expr2 - Raises expr1 to the power of expr2. Example:
> SELECT 3 ^ 5;
243

Date and Timestamp Functions ​

FunctionDescriptionExample
current_date()Returns the current date at the start of query evaluation. All calls of current_date within the same query return the same value.> SELECT current_date();
2022-10-15
current_timestamp()Returns the current timestamp at the start of query evaluation. All calls of current_timestamp within the same query return the same value. Equivalent to function now().> SELECT current_timestamp();
2022-10-15 20:51:00
intervalInterval is a data type, not a function. Using interval literals, it is possible to perform subtraction or addition of an arbitrary amount of time from a date or timestamp value. For example, we can easily express queries like "Find all transactions that have happened during the past hour".> SELECT current_timestamp(), current_timestamp() - interval 1 hour as col;
2022-11-09 10:50:44.444000
2022-11-09 09:50:44.444000
> SELECT current_date(), current_date() - interval 1 day as col;
2022-11-09 2022-11-08
date_trunc(fmt, ts)Returns timestamp ts truncated to the unit specified by the format model fmt. fmt should be one of ["YEAR", "YYYY", "YY", "MON", "MONTH", "MM", "DAY", "DD", "HOUR", "MINUTE", "SECOND", "WEEK", "QUARTER"]> SELECT date_trunc('YEAR', '2015-03-05T09:32:05.359');
2015-01-01 00:00:00
> SELECT date_trunc('MM', '2015-03-05T09:32:05.359');
2015-03-01 00:00:00
> SELECT date_trunc('DD', '2015-03-05T09:32:05.359');
2015-03-05 00:00:00
> SELECT date_trunc('HOUR', '2015-03-05T09:32:05.359');
2015-03-05 09:00:00
datediff(endDate, startDate)Returns the number of days from startDate to endDate.> SELECT datediff('2009-07-31', '2009-07-30');
1
> SELECT datediff('2009-07-30', '2009-07-31');
-1
from_utc_timestamp(timestamp, timezone)Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in UTC, and render that time as a timestamp in the given time zone. For example, 'GMT+1' would yield '2017-07-14 03:40:00.0'.> SELECT from_utc_timestamp('2016-08-31', 'Asia/Seoul');
016-08-31 09:00:00
to_utc_timestamp(timestamp, timezone)Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time zone, and render that time as a timestamp in UTC. For example, 'GMT+1' would yield '2017-07-14 01:40:00.0'.> SELECT to_utc_timestamp('2016-08-31', 'Asia/Seoul');
2016-08-30 15:00:00
last_day(date)Returns the last day of the month to which the date belongs to.> SELECT last_day('2009-01-12');
2009-01-31
trunc(date, fmt)Returns date with the time portion of the day truncated to the unit specified by the format model fmt. fmt should be one of ["year", "yyyy", "yy", "mon", "month", "mm"].> SELECT trunc('2009-02-12', 'MM');
2009-02-01
> SELECT trunc('2015-10-27', 'YEAR');
2015-01-01

Conversion Functions ​

FunctionDescriptionExample
bigint(expr)Casts the value expr to the target data type bigint.
binary(expr)Casts the value expr to the target data type binary.
boolean(expr)Casts the value expr to the target data type boolean.
cast(expr AS type)Casts the value expr to the target data type type.> SELECT cast('10' as int);
10
date(expr)Casts the value expr to the target data type date.
decimal(expr)Casts the value expr to the target data type decimal.
double(expr)Casts the value expr to the target data type double.
float(expr)Casts the value expr to the target data type float.
int(expr)Casts the value expr to the target data type int.
smallint(expr)Casts the value expr to the target data type smallint.
string(expr)Casts the value expr to the target data type string.
timestamp(expr)Casts the value expr to the target data type timestamp.
tinyint(expr)Casts the value expr to the target data type tinyint.
conv(num, from_base, to_base)Convert num from from_base to to_base. Not works for numbers higher than 64 bits.> SELECT conv('100', 2, 10);
4
> SELECT conv(-10, 16, -10);
-16
bytea2numeric(hex)Convert a hexadecimal number to decimal and returned as string type.Equivalent to conv(num, 16, 10), but works for numbers higher than 64 bits.> SELECT bytea2numeric('0x1234');
4660
> SELECT bytea2numeric('1234');
4660