Transcriptor

https://github.com/aphananthe42/transcriptor/assets/68156481/d5f3ea30-b681-4f13-8219-b4e0e365709e

**The recorded content of the audio data**
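「本日はご来場いただきまして、誠にありがとうございます。  開演に先立ちまして、お客様にお願い申し上げます。携帯電話など、音の出るものの電源はお切りください。また許可のない録音・撮影はご遠慮ください。皆様のご協力をよろしくお願いいたします。」
(English: "Thank you very much for coming today. Before the performance begins, we have a few requests for our guests. Please turn off mobile phones and any other devices that make sound. Also, please refrain from recording or taking photographs without permission. We appreciate your cooperation.")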

Overview

Transcriptor is a CLI tool that transcribes and summarizes audio data. It can also distinguish speakers and output the transcription separately for each speaker, which is useful for meeting minutes.

Requirements

Deno >= 1.41.0
Amazon S3 Bucket
AWS IAM Access key pair
OpenAI API key

Technologies

Deno
TypeScript
Amazon S3
Amazon Transcribe

System

Transcriptor uploads the audio data to S3.
Amazon Transcribe reads the audio data from S3.
Amazon Transcribe writes the transcription result to S3 (the same bucket that stores the audio data).
Transcriptor fetches the transcription result from S3.
Transcriptor summarizes the transcription result via the OpenAI API.
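
The five steps above can be sketched as a small orchestration function. This is only an illustration of the flow, not Transcriptor's actual source: the `ObjectStore`, `Transcriber`, and `Summarizer` interfaces and the `runPipeline` function are hypothetical names, and the real tool talks to Amazon S3, Amazon Transcribe, and the OpenAI API instead of these stand-ins.

```typescript
// Hypothetical interfaces for illustration; not taken from Transcriptor's code.
interface ObjectStore {
  put(key: string, data: Uint8Array): Promise<void>;
  get(key: string): Promise<string>;
}

interface Transcriber {
  // Reads the audio stored under `inputKey` and resolves with the key
  // of the transcription result written to the same store.
  transcribe(inputKey: string): Promise<string>;
}

interface Summarizer {
  summarize(text: string): Promise<string>;
}

async function runPipeline(
  audio: Uint8Array,
  key: string,
  store: ObjectStore,
  transcriber: Transcriber,
  summarizer: Summarizer,
): Promise<string> {
  // Step 1: upload the audio data.
  await store.put(key, audio);
  // Steps 2-3: the transcriber reads the audio and writes its result.
  const resultKey = await transcriber.transcribe(key);
  // Step 4: fetch the transcription result.
  const transcript = await store.get(resultKey);
  // Step 5: summarize the transcription.
  return summarizer.summarize(transcript);
}
```

Passing the three services in as parameters keeps the flow easy to exercise with in-memory fakes, without AWS or OpenAI credentials.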

Usage

0. Install Deno and set the PATH

$ curl -fsSL https://deno.land/x/install/install.sh | sh

Add the location of the deno executable to the PATH variable in your shell profile (e.g., ~/.bashrc, ~/.bash_profile, or ~/.zshrc).

export DENO_INSTALL="$HOME/.deno"
export PATH="$DENO_INSTALL/bin:$PATH"

1. Install transcriptor from deno.land/x

$ deno install --allow-env --allow-sys --allow-read --allow-net https://deno.land/x/transcriptor@v1.1.4/src/transcriptor.ts

2. Create a .env file and fill in the environment variables as in the following example.

$ touch .env
# or add the following if .env already exists.

AWS_ACCESS_KEY_ID="YOUR_AWS_IAM_ACCESS_KEY_ID"
AWS_SECRET_ACCESS_KEY="YOUR_AWS_IAM_SECRET_ACCESS_KEY"
AWS_REGION="your-aws-region"
TRANSCRIPTOR_S3_BUCKET_NAME='your-s3-bucket-name'

OPENAI_API_KEY="YOUR_OPENAI_API_KEY"
OPENAI_GPT_MODEL="your-preferred-gpt-model (e.g. gpt-3.5-turbo)"
TRANSCRIPTOR_SYSTEM_PROMPT="system prompt for summarizing with GPT"
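
A missing variable is easier to diagnose before any AWS or OpenAI call is made. The following is a minimal sketch of such a check, not part of Transcriptor itself: `missingEnvVars` and `REQUIRED_VARS` are hypothetical names, and treating exactly these five variables as required is an assumption based on the example above.

```typescript
// Assumption: these are the variables that must be set. The tool itself
// may treat some of them differently.
const REQUIRED_VARS = [
  "AWS_ACCESS_KEY_ID",
  "AWS_SECRET_ACCESS_KEY",
  "AWS_REGION",
  "TRANSCRIPTOR_S3_BUCKET_NAME",
  "OPENAI_API_KEY",
] as const;

// Returns the names of required variables that are unset or blank,
// in the order they are listed in REQUIRED_VARS.
function missingEnvVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

Under Deno, the map could be obtained with `Deno.env.toObject()`, which needs the `--allow-env` permission.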

3. Run the script as follows.

$ transcriptor --file='path/to/your/audio/data/to/summarize'

Argument options

--lang='ja-JP'
// The language spoken in the audio file.
// default: 'ja-JP'

--model='gpt-3.5-turbo'
// The name of the GPT model used for summarizing.
// default: 'gpt-3.5-turbo'

--speakerCount='4'
// The number of speakers in the audio data.
// default: 1
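
To make the defaults concrete, here is a minimal sketch of how `--name=value` arguments could be merged with the documented defaults. It is not Transcriptor's actual parser: `Options` and `parseOptions` are hypothetical names, and rejecting a non-positive `--speakerCount` is an assumption.

```typescript
interface Options {
  file?: string;
  lang: string;
  model: string;
  speakerCount: number;
}

// Parses arguments of the form --name=value. The shell has already
// removed any surrounding quotes by the time the values arrive here.
function parseOptions(argv: string[]): Options {
  // Defaults as documented above.
  const options: Options = { lang: "ja-JP", model: "gpt-3.5-turbo", speakerCount: 1 };
  for (const arg of argv) {
    const match = /^--([A-Za-z]+)=(.*)$/.exec(arg);
    if (match === null) continue; // ignore anything that is not --name=value
    const [, name, value] = match;
    switch (name) {
      case "file":
        options.file = value;
        break;
      case "lang":
        options.lang = value;
        break;
      case "model":
        options.model = value;
        break;
      case "speakerCount": {
        const count = Number(value);
        // Assumption: the speaker count must be a positive integer.
        if (!Number.isInteger(count) || count < 1) {
          throw new Error(`invalid --speakerCount: ${value}`);
        }
        options.speakerCount = count;
        break;
      }
    }
  }
  return options;
}
```

For example, `parseOptions(["--file=meeting.mp3", "--speakerCount=4"])` keeps the default language and model and sets the speaker count to 4.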

License

MIT License