Data source: JSON

  val jsonPath = ""
  spark.read.json(jsonPath) // option 1
  spark.read.format("json").load(jsonPath) // option 2

Data source: Parquet

  val parqPath = ""
  spark.read.parquet(parqPath) // option 1
  spark.read.format("parquet").load(parqPath) // option 2
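
The two sources compose naturally: a DataFrame read in one format can be written back out in another. A minimal round-trip sketch, assuming a local Spark session and hypothetical paths under /tmp:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object JsonToParquet extends App {

  val spark = SparkSession
    .builder()
    .appName(this.getClass.getSimpleName)
    .master("local[2]")
    .getOrCreate()

  // Read JSON, then persist the same data as Parquet
  // (SaveMode.Overwrite replaces any output from a previous run)
  val df = spark.read.json("/tmp/people.json")
  df.write.mode(SaveMode.Overwrite).parquet("/tmp/people.parquet")

  // Reading it back yields the same schema and rows
  spark.read.parquet("/tmp/people.parquet").show()

  spark.stop()
}
```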

Data source: MySQL

Preparation:

pom.xml

    <dependency>
      <groupId>mysql</groupId>
      <artifactId>mysql-connector-java</artifactId>
      <version>5.1.38</version>
    </dependency>

Test code: TestMysql.scala

  // Import the MySQL Driver dependency in pom.xml beforehand
  import java.util.Properties

  import org.apache.spark.sql.SparkSession

  object TestMysql extends App {

    val spark = SparkSession
      .builder()
      .appName(this.getClass.getSimpleName)
      .master("local[2]")
      .getOrCreate()

    // Set the url, table, and connection properties
    val url = "jdbc:mysql://localhost:3306/interview"
    val table = "items"
    val properties = new Properties()
    properties.setProperty("user", "root")
    properties.setProperty("password", "root")

    // Use the jdbc method on SparkSession's DataFrameReader directly
    val df = spark.read.jdbc(url, table, properties)
    df.createOrReplaceTempView("items")
    spark.sql("select * from items").show()
  }
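
Besides passing a Properties object, the same read can be expressed through the generic `format("jdbc")` / `option` API, which also exposes knobs such as partitioned reads. A hedged sketch, reusing the database, table, and credentials from the example above; the numeric partition column `id` and its bounds are assumptions about the `items` table:

```scala
import org.apache.spark.sql.SparkSession

object TestMysqlOptions extends App {

  val spark = SparkSession
    .builder()
    .appName(this.getClass.getSimpleName)
    .master("local[2]")
    .getOrCreate()

  // Equivalent JDBC read via the option-based API; partitionColumn plus
  // lowerBound/upperBound/numPartitions split the table scan on the
  // (assumed numeric) id column into 4 parallel queries
  val df = spark.read
    .format("jdbc")
    .option("url", "jdbc:mysql://localhost:3306/interview")
    .option("dbtable", "items")
    .option("user", "root")
    .option("password", "root")
    .option("partitionColumn", "id")
    .option("lowerBound", "1")
    .option("upperBound", "100000")
    .option("numPartitions", "4")
    .load()

  df.show()
  spark.stop()
}
```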

Data source: Hive

Preparation

(1)pom.xml

    <dependency>
      <groupId>mysql</groupId>
      <artifactId>mysql-connector-java</artifactId>
      <version>5.1.38</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-hive_2.11</artifactId>
      <version>${spark.version}</version>
    </dependency>

(2) In a development environment, add a hive-site.xml file under the resources folder; in a cluster environment, copy Hive's configuration file into the $SPARK_HOME/conf directory.


hive-site.xml

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://Master:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
        <description>JDBC connect string for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
        <description>username to use against metastore database</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>root</value>
        <description>password to use against metastore database</description>
    </property>

    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/hive/warehouse</value>
        <description>hive default warehouse; change it if necessary</description>
    </property>
</configuration>

Test code: TestHive.scala

package com.bky.TestHive

import org.apache.spark.sql.SparkSession

object TestHive extends App {

  val spark = SparkSession
    .builder()
    .appName(this.getClass.getSimpleName)
    .master("local[2]")
    .enableHiveSupport() // required so Spark SQL can access the Hive metastore
    .getOrCreate()

  spark.sql("select * from student_ext").show()
}
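
Writing back works as well: with Hive support enabled, a DataFrame can be persisted as a managed Hive table via saveAsTable. A minimal sketch, assuming the student_ext table from above exists; the target table name student_backup is hypothetical:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object SaveToHive extends App {

  val spark = SparkSession
    .builder()
    .appName(this.getClass.getSimpleName)
    .master("local[2]")
    .enableHiveSupport() // needed for Hive metastore access
    .getOrCreate()

  // Query an existing Hive table and persist the result as a
  // new managed table in the warehouse directory
  val df = spark.sql("select * from student_ext")
  df.write.mode(SaveMode.Overwrite).saveAsTable("student_backup")

  spark.stop()
}
```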
