RT-CV scraper client
A client that can be spawned by a scraper to ease communication with RT-CV
Communication flow
custom scraper <> rtcv_scraper_client (this project) <> RT-CV
- The custom scraper spawns rtcv_scraper_client as a child process
- The custom scraper sends its credentials to the child process via stdin
- rtcv_scraper_client handles the authentication and reports whether it was successful
- The custom scraper starts scraping and sends every scraped result to its child process (rtcv_scraper_client)
- rtcv_scraper_client sends the scraped data to RT-CV and reports whether it was successful
- ...
Methods
Authentication
There are 3 authentication methods:
- set_credentials: if you have one RT-CV server
- set_multiple_credentials: if you have multiple RT-CV servers
- set_mock: if you want to debug a scraper, this mocks the RT-CV server
set_credentials
Set the credentials and location of an RT-CV server
Example input
{"type":"set_credentials","content":{"server_location":"http://localhost:4000","api_key_id":"111111111111111111111111","api_key":"ddd"}}
Ok Response
{"type":"ok"}
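In a Go scraper the set_credentials message could be built from a struct instead of a hand-written string. A minimal sketch; the type and function names (`credentials`, `request`, `setCredentials`) are hypothetical, only the JSON field names come from the example input above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// credentials mirrors the "content" object of the example input.
type credentials struct {
	ServerLocation string `json:"server_location"`
	APIKeyID       string `json:"api_key_id"`
	APIKey         string `json:"api_key"`
}

// request is the {"type": ..., "content": ...} envelope.
type request struct {
	Type    string `json:"type"`
	Content any    `json:"content,omitempty"`
}

// setCredentials builds the JSON for a set_credentials message.
func setCredentials(c credentials) ([]byte, error) {
	return json.Marshal(request{Type: "set_credentials", Content: c})
}

func main() {
	msg, err := setCredentials(credentials{
		ServerLocation: "http://localhost:4000",
		APIKeyID:       "111111111111111111111111",
		APIKey:         "ddd",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(msg))
}
```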
set_multiple_credentials
Set credentials and location of multiple RT-CV servers
Note that you must mark one server as the primary. The primary server is used to fetch secrets and recently scraped reference numbers.
Example input
{"type":"set_multiple_credentials","content":[{"primary":true,"server_location":"http://localhost:4000","api_key_id":"111111111111111111111111","api_key":"ddd"},{"server_location":"http://localhost:4000","api_key_id":"111111111111111111111111","api_key":"ddd"}]}
Ok Response
{"type":"ok"}
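Because a primary server has to be chosen, a scraper may want to check its server list before sending it. The sketch below builds only the "content" array. It assumes exactly one entry should be marked primary, which is a reading of the note above and not something the client is confirmed to enforce. All names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// serverCredentials mirrors one entry of the "content" array.
type serverCredentials struct {
	Primary        bool   `json:"primary,omitempty"`
	ServerLocation string `json:"server_location"`
	APIKeyID       string `json:"api_key_id"`
	APIKey         string `json:"api_key"`
}

// checkPrimary returns an error unless exactly one server is marked primary.
// Assumption: "exactly one" is our interpretation of the primary requirement.
func checkPrimary(servers []serverCredentials) error {
	n := 0
	for _, s := range servers {
		if s.Primary {
			n++
		}
	}
	if n != 1 {
		return fmt.Errorf("expected exactly 1 primary server, got %d", n)
	}
	return nil
}

// multipleCredentialsContent validates the list and encodes it as JSON.
func multipleCredentialsContent(servers []serverCredentials) ([]byte, error) {
	if err := checkPrimary(servers); err != nil {
		return nil, err
	}
	return json.Marshal(servers)
}

func main() {
	content, err := multipleCredentialsContent([]serverCredentials{
		{Primary: true, ServerLocation: "http://localhost:4000", APIKeyID: "111111111111111111111111", APIKey: "ddd"},
		{ServerLocation: "http://localhost:4000", APIKeyID: "111111111111111111111111", APIKey: "ddd"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(content))
}
```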
set_mock
Enable mock mode
Example input:
{"type":"set_mock","content":{}}
You can also provide mock secrets for the *_secret methods.
The secrets object must map keys (string) to values (any):
{"type":"set_mock","content":{"secrets":{"users": [{"username":"foo","password":"bar"}],"user":{"username":"foo","password":"bar"}}}}
Ok Response
{"type":"ok"}
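The key (string) -> value (any) convention maps naturally onto a Go `map[string]any`. A minimal sketch for building both variants of the set_mock message; the names `setMock`, `mockContent` and `user` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// user mirrors the user objects in the example secrets.
type user struct {
	Username string `json:"username"`
	Password string `json:"password"`
}

// mockContent is the "content" of set_mock; secrets are optional.
type mockContent struct {
	Secrets map[string]any `json:"secrets,omitempty"`
}

type request struct {
	Type    string `json:"type"`
	Content any    `json:"content"`
}

// setMock builds a set_mock message. Pass nil to enable mock mode without secrets.
func setMock(secrets map[string]any) ([]byte, error) {
	return json.Marshal(request{Type: "set_mock", Content: mockContent{Secrets: secrets}})
}

func main() {
	plain, err := setMock(nil)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(plain))

	withSecrets, err := setMock(map[string]any{
		"users": []user{{Username: "foo", Password: "bar"}},
		"user":  user{Username: "foo", Password: "bar"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(withSecrets))
}
```

Note that encoding/json writes map keys in sorted order, so "user" appears before "users" in the output. The order of keys in a JSON object carries no meaning.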
send_cv
Send a scraped CV to RT-CV. If there were matches on this CV, this also triggers set_cached_reference for the CV's reference number.
Example input
{"type":"send_cv","content":{"reference_number":"abcd","..":".."}}
Ok Response
The content indicates whether a match was made
{"type":"ok","content":true}
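On the scraper side the reply can be decoded into a boolean. A minimal sketch: only the ok reply is documented here, so any other reply type is treated as a failure and returned as raw text. The function name `parseSendCVReply` is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// response is the {"type": ..., "content": ...} envelope of a reply.
type response struct {
	Type    string          `json:"type"`
	Content json.RawMessage `json:"content"`
}

// parseSendCVReply reports whether RT-CV matched the CV.
func parseSendCVReply(line []byte) (bool, error) {
	var r response
	if err := json.Unmarshal(line, &r); err != nil {
		return false, err
	}
	if r.Type != "ok" {
		// The shape of non-ok replies is not described here, so pass the raw line on.
		return false, fmt.Errorf("unexpected reply: %s", line)
	}
	var matched bool
	if err := json.Unmarshal(r.Content, &matched); err != nil {
		return false, err
	}
	return matched, nil
}

func main() {
	matched, err := parseSendCVReply([]byte(`{"type":"ok","content":true}`))
	if err != nil {
		panic(err)
	}
	fmt.Println("matched:", matched)
}
```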
get_secret
Get a user defined secret from the server
Example input
{"type":"get_secret","content":{"encryption_key":"my-very-secret-encryption-key", "key":"key-of-value"}}
Ok Response
{"type":"ok","content":{/*Based on the content stored in the secret value*/}}
get_users_secret
Get a secret from the server whose content is a strictly defined list of users
Example input
{"type":"get_users_secret","content":{"encryption_key":"my-very-secret-encryption-key", "key":"users"}}
Ok Response
{"type":"ok","content":[{"username":"foo","password":"foo"},{"username":"bar","password":"bar"}]}
get_user_secret
Get a secret from the server whose content is a strictly defined user
Example input
{"type":"get_user_secret","content":{"encryption_key":"my-very-secret-encryption-key", "key":"user"}}
Ok Response
{"type":"ok","content":{"username":"foo","password":"bar"}}
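Since get_user_secret and get_users_secret return strictly defined shapes, their replies can be decoded into typed values. A minimal sketch using one generic helper (Go 1.18+); `decodeOK` and `user` are hypothetical names, and any reply type other than ok is treated as an error because only the ok reply is documented here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// user mirrors the strictly defined user shape.
type user struct {
	Username string `json:"username"`
	Password string `json:"password"`
}

type response struct {
	Type    string          `json:"type"`
	Content json.RawMessage `json:"content"`
}

// decodeOK decodes the content of an ok reply into T.
func decodeOK[T any](line []byte) (T, error) {
	var out T
	var r response
	if err := json.Unmarshal(line, &r); err != nil {
		return out, err
	}
	if r.Type != "ok" {
		return out, fmt.Errorf("unexpected reply: %s", line)
	}
	err := json.Unmarshal(r.Content, &out)
	return out, err
}

func main() {
	// Reply to get_users_secret: a list of users.
	users, err := decodeOK[[]user]([]byte(`{"type":"ok","content":[{"username":"foo","password":"foo"},{"username":"bar","password":"bar"}]}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(len(users), users[0].Username)

	// Reply to get_user_secret: a single user.
	u, err := decodeOK[user]([]byte(`{"type":"ok","content":{"username":"foo","password":"bar"}}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(u.Username)
}
```

For get_secret, whose content is user defined, the same helper could be called with `map[string]any` or a custom struct.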
set_cached_reference
Save a reference number to the cache with a timeout of 3 days. This cache is used to avoid sending the same CV twice or scraping data that has already been scraped.
Note that "send_cv" also executes this function automatically
Example input
{"type":"set_cached_reference","content":"abcd"}
Ok Response
{"type":"ok"}
set_short_cached_reference
Same as set_cached_reference, except that the entry times out after 12 hours
has_cached_reference
Is there a cache entry for a specific reference number?
Example input
{"type":"has_cached_reference","content":"abcd"}
Ok Response
{"type":"ok","content":true}
ping
Send a ping message to the server and the server will respond with a pong
Example input
{"type":"ping"}
Ok Response
{"type":"pong"}
How to develop / Debug
# After a change update the binary in your path using:
go install
# Then in your scraper project
cd ../some-scraper
go run .
Debug data sent by the scraper to this program
# Logs the IO to scraper_client_input.log
export LOG_SCRAPER_CLIENT_INPUT=true
# Logs the debug messages to scraper_client.log
export ENABLE_SCRAPER_CLIENT_LOG=true
# Now run your scraper
go run .
# Now all messages written to this program are also written to scraper_client_input.log
# You can follow the input by tailing the file in a new terminal:
tail -f scraper_client_input.log
# All debug messages are now visible in scraper_client.log
# You can follow the debug messages by tailing the file in a new terminal:
tail -f scraper_client.log
# After collecting the input data you can also use rtcv_scraper_client to replay sending the data
# This is very handy for debugging
rtcv_scraper_client -replay scraper_client_input.log
# There are also additional options for the -replay command:
rtcv_scraper_client \
-replay ./scraper_client_input.log \
-replaySkipCommands has_cached_reference,set_cached_reference,ping,get_users_secret
How to ship?
Currently we don't have prebuilt binaries, so you'll need to compile the binary yourself, for example in a Dockerfile:
FROM golang:alpine AS obtain-rtcv-client
RUN go install github.com/script-development/rtcv_scraper_client@latest
FROM denoland/deno:alpine AS runtime
COPY --from=obtain-rtcv-client /go/bin/rtcv_scraper_client /bin/rtcv_scraper_client