Sqoop is an open source tool (originally developed at Cloudera) designed to transfer data between Hadoop and relational databases (RDBMSs).
Note
It is possible to read data directly from an RDBMS in a Spark application
But this can overwhelm the database with load (effectively a DDoS)
In practice: don't do it!
Import the data into HDFS beforehand
Uses the JDBC interface, so it works with any JDBC-compatible database
Imports data into HDFS as delimited text files or SequenceFiles
The default is comma-delimited text files
Can be used for incremental data imports
The first import retrieves all rows in a table
Subsequent imports retrieve only the rows created since the last import
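As a sketch, an incremental append-mode import might look like the following (the JDBC URL, username, table, and column names here are hypothetical):

$ sqoop import \
    --connect jdbc:mysql://dbhost/shop \
    --username sqoop_user -P \
    --table orders \
    --target-dir /data/orders \
    --incremental append \
    --check-column id \
    --last-value 1000

Here --check-column names a monotonically increasing column and --last-value is the highest value seen in the previous run, so only rows with id greater than 1000 are fetched. Sqoop prints the new last value at the end of the run; pass it to the next import, or define a saved Sqoop job so this bookkeeping is tracked automatically.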
Syntax
Use sqoop help to list the available commands; each command has its own help as well, e.g. sqoop help import
$ sqoop help
usage: sqoop COMMAND [ARGS]
Available commands:
codegen Generate code to interact with database records
create-hive-table Import a table definition into Hive
eval Evaluate a SQL statement and display the results
export Export an HDFS directory to a database table
help List available commands
import Import a table from a database to HDFS
import-all-tables Import tables from a database to HDFS
list-databases List available databases on a server
list-tables List available tables in a database
version Display version information
See 'sqoop help COMMAND' for information on a specific command.
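For example, a basic full-table import over JDBC might look like this (the MySQL host, database, and table names are made up for illustration):

$ sqoop import \
    --connect jdbc:mysql://dbhost/shop \
    --username sqoop_user -P \
    --table customers \
    --target-dir /data/customers \
    --fields-terminated-by ','

The -P flag prompts for the password instead of putting it on the command line, and --fields-terminated-by ',' simply makes the default comma delimiter explicit.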