Data Sources
Data source: JSON
val jsonPath = "" // set this to the path of your JSON file
spark.read.json(jsonPath) // option 1
spark.read.format("json").load(jsonPath) // option 2
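The two read styles above should produce the same DataFrame. A minimal sketch that checks this on a small temporary file follows; the object name `JsonReadSketch`, the sample records and the local master setting are assumptions made for illustration only.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object JsonReadSketch {
  // Returns the row counts obtained through the two read styles.
  def run(): (Long, Long) = {
    val spark = SparkSession.builder()
      .appName("JsonReadSketch")
      .master("local[2]")
      .getOrCreate()

    // Hypothetical sample data: one JSON object per line (JSON Lines).
    val lines = Seq("""{"name":"a","age":1}""", """{"name":"b","age":2}""")
    val file = Files.createTempFile("people", ".json")
    Files.write(file, lines.mkString("\n").getBytes(StandardCharsets.UTF_8))
    val jsonPath = file.toUri.toString

    val viaShortcut = spark.read.json(jsonPath)               // option 1
    val viaFormat = spark.read.format("json").load(jsonPath)  // option 2

    // Both readers infer the schema from the data, so the schemas should match.
    require(viaShortcut.schema == viaFormat.schema)
    val counts = (viaShortcut.count(), viaFormat.count())
    spark.stop()
    counts
  }

  def main(args: Array[String]): Unit = println(run())
}
```

By default Spark expects one JSON object per line, which is why the sample file is written line by line.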
Data source: Parquet
val parqPath = "" // set this to the path of your Parquet file or directory
spark.read.parquet(parqPath) // option 1
spark.read.format("parquet").load(parqPath) // option 2
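Parquet files carry their own schema, so a round trip makes a quick sanity check. The sketch below writes a small DataFrame to a temporary directory and reads it back both ways; the object name `ParquetRoundTripSketch` and the sample rows are hypothetical.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object ParquetRoundTripSketch {
  // Returns the column names read back from Parquet and the row count.
  def run(): (Seq[String], Long) = {
    val spark = SparkSession.builder()
      .appName("ParquetRoundTripSketch")
      .master("local[2]")
      .getOrCreate()
    import spark.implicits._

    // Write to a sub-directory that does not exist yet, because the default
    // save mode fails when the target path already exists.
    val parqPath = Files.createTempDirectory("parquet-sketch")
      .resolve("people").toUri.toString
    Seq(("a", 1), ("b", 2)).toDF("name", "age").write.parquet(parqPath)

    val viaShortcut = spark.read.parquet(parqPath)               // option 1
    val viaFormat = spark.read.format("parquet").load(parqPath)  // option 2

    // Column names come from the schema stored in the Parquet files.
    require(viaShortcut.schema == viaFormat.schema)
    val result = (viaShortcut.schema.fieldNames.toSeq, viaFormat.count())
    spark.stop()
    result
  }

  def main(args: Array[String]): Unit = println(run())
}
```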
Data source: MySQL
Preparation:
pom.xml
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>5.1.38</version>
</dependency>
Test code: TestMysql.scala
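import java.util.Properties
import org.apache.spark.sql.SparkSession
object TestMysql extends App {
  val spark = SparkSession.builder().appName("TestMysql").master("local[2]").getOrCreate()
  // The MySQL driver dependency must already be declared in pom.xml
  // Set the url, table and connection properties
  val url = "jdbc:mysql://localhost:3306/interview"
  val table = "items"
  val properties = new Properties()
  properties.setProperty("user", "root")
  properties.setProperty("password", "root")
  // Use the jdbc method of spark.read directly
  val df = spark.read.jdbc(url, table, properties)
  df.createOrReplaceTempView("items")
  spark.sql("select * from items").show()
}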
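The same table can also be read in the `format(...).load()` style used for the other sources, passing the connection settings as options. This is a sketch under assumptions: the helper names `jdbcOptions` and `readTable` are hypothetical, and the driver class shown is the one shipped with the 5.1.x connector declared in pom.xml.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object JdbcOptionsSketch {
  // Hypothetical helper: collects the JDBC reader options in one place.
  def jdbcOptions(url: String, table: String,
                  user: String, password: String): Map[String, String] =
    Map(
      "url" -> url,
      "dbtable" -> table,
      "user" -> user,
      "password" -> password,
      "driver" -> "com.mysql.jdbc.Driver" // driver class of mysql-connector-java 5.1.x
    )

  // Reads the table through the generic "jdbc" format.
  // Calling this requires a reachable MySQL instance.
  def readTable(spark: SparkSession, options: Map[String, String]): DataFrame =
    spark.read.format("jdbc").options(options).load()
}
```

A call could look like `JdbcOptionsSketch.readTable(spark, JdbcOptionsSketch.jdbcOptions("jdbc:mysql://localhost:3306/interview", "items", "root", "root"))`, which is meant to load the same table as the `spark.read.jdbc` call.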
Data source: Hive
Preparation
(1) pom.xml
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>5.1.38</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.11</artifactId>
<version>${spark.version}</version>
</dependency>
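(2) In a development environment, add hive-site.xml to the project's resources folder; in a cluster environment, copy Hive's configuration file to the $SPARK_HOME/conf directory.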
hive-site.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://Master:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>root</value>
<description>password to use against metastore database</description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/hive/warehouse</value>
<description>Hive default warehouse directory; change it if necessary</description>
</property>
</configuration>
Test code: TestHive.scala
package com.bky.TestHive
import org.apache.spark.sql.SparkSession
object TestHive extends App {
  val spark = SparkSession
    .builder()
    .appName(this.getClass.getSimpleName)
    .master("local[2]")
    .enableHiveSupport() // needed so that Spark uses the Hive metastore from hive-site.xml
    .getOrCreate()
  spark.sql("select * from student_ext").show()
}
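Before querying a Hive table, it can help to confirm that the session is really backed by the Hive catalog. The sketch below is one way to check, assuming the spark-hive dependency is on the classpath; the object name `HiveSupportSketch` and its helper are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object HiveSupportSketch {
  // Reports which catalog implementation backs the session.
  def catalogImplementation(spark: SparkSession): String =
    spark.conf.get("spark.sql.catalogImplementation")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveSupportSketch")
      .master("local[2]")
      .enableHiveSupport() // fails fast if the Hive classes are missing
      .getOrCreate()

    // Print the catalog implementation, then list the tables Spark can see.
    println(catalogImplementation(spark))
    spark.sql("show tables").show()
    spark.stop()
  }
}
```

With Hive support enabled, the printed value is expected to be `hive`; a session built without it normally reports `in-memory` and will not find tables registered in the Hive metastore.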