\documentclass[10pt,a4paper]{article}
\usepackage{hyperref}
\begin{document}
\setlength{\parindent}{0cm}
\setlength{\parskip}{10pt plus2mm minus2mm}
\part*{Project Proposal}
\section{Summary}
I propose to create a system that transcribes an audio conversation and outputs sections of text tagged by speaker, in the style of a script. Existing speech recognition systems take an audio file and return a transcript as continuous prose; I intend to combine transcription with speaker identification to indicate where the speaker changes. Initially I would create a transcription system that detects when the speaker changes, before extending it to cluster the resulting sections of text by speaker, first for a known and then for an unknown number of speakers.

The system would have several practical uses. It could save time when transcribing a recorded interview, or take minutes automatically during meetings. It could make recordings searchable by speaker, helping to find work by particular people on the internet, or be used to create subtitles for television programmes and films that are coloured according to the speaker.

\section{Proposed Work}

Speech-to-text transcription has been implemented many times, so this project will use an existing speech recognizer for the transcription and focus on deciding when the speaker changes. The project falls into four stages, each of which serves as a milestone:
\begin{itemize}
\item Transcribing an audio file into text
\item Detecting when the speaker changes
\item Tagging each speech block with properties of the speaker's voice (tone, pitch, frequencies, etc.)
\item Grouping blocks with similar speech properties to work out who said what
\end{itemize}
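
The last milestone, grouping blocks when the number of speakers is known, is a clustering problem. As a minimal sketch only, the Java class below applies k-means (Lloyd's algorithm) to a hypothetical feature vector per speech block, such as mean pitch and mean energy. The use of k-means, the choice of features and the name \texttt{SpeakerClustering} are assumptions for illustration, not decisions made in this proposal.

\begin{verbatim}
// Hypothetical sketch: assign each speech block to one of k speakers
// with k-means (Lloyd's algorithm) on per-block feature vectors,
// e.g. {mean pitch, mean energy}. Not part of the original proposal.
class SpeakerClustering {

    // Returns a speaker index in [0, k) for each block.
    static int[] cluster(double[][] features, int k, int iterations) {
        int n = features.length;
        if (k < 1 || n < k) {
            throw new IllegalArgumentException("need 1 <= k <= blocks");
        }
        int dim = features[0].length;
        // Simple deterministic seeding: the first k blocks are the
        // initial centroids (random or k-means++ seeding is more usual).
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = features[c].clone();
        }
        int[] labels = new int[n];
        for (int it = 0; it < iterations; it++) {
            // Assignment step: nearest centroid (squared Euclidean).
            for (int i = 0; i < n; i++) {
                int best = 0;
                double bestDist = Double.MAX_VALUE;
                for (int c = 0; c < k; c++) {
                    double d = 0.0;
                    for (int j = 0; j < dim; j++) {
                        double diff = features[i][j] - centroids[c][j];
                        d += diff * diff;
                    }
                    if (d < bestDist) {
                        bestDist = d;
                        best = c;
                    }
                }
                labels[i] = best;
            }
            // Update step: move each centroid to the mean of its blocks.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[labels[i]]++;
                for (int j = 0; j < dim; j++) {
                    sums[labels[i]][j] += features[i][j];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] > 0) { // an empty cluster keeps its centroid
                    for (int j = 0; j < dim; j++) {
                        centroids[c][j] = sums[c][j] / counts[c];
                    }
                }
            }
        }
        return labels;
    }
}
\end{verbatim}

Any clustering method that takes the number of speakers as a parameter could be substituted; k-means is shown only because it is short. Handling an unknown number of speakers would need an additional step to choose $k$.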

\section{Resources}
\begin{itemize}
\item A Java IDE: Eclipse (\url{http://www.eclipse.org})
\item An open-source speech recognizer: Sphinx4 (\url{http://cmusphinx.sourceforge.net/sphinx4})
\item A Google Code Mercurial repository for version control and backup (\url{http://code.google.com})
\item Sample conversations: news interviews, or recordings I make myself
\end{itemize}

\section{Structure of the Project}
The application will be simple: a console program that takes an audio file as input and produces a text file as output. It should give the user feedback on the progress of the transcription. The output file will consist of a series of speech sections, each starting with the speaker's label (``Speaker $n$'') followed by the transcribed text.
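
To make the intended output format concrete, the Java sketch below renders a sequence of speaker-labelled segments as a script. The \texttt{Segment} record, the colon after the label and the merging of consecutive segments from the same speaker into one section are assumptions made for illustration; the proposal only fixes that each section begins with ``Speaker $n$''. Records require Java 16 or later.

\begin{verbatim}
import java.util.List;

// Hypothetical sketch of the script-style output file contents.
class ScriptWriter {

    // One transcribed block attributed to a numbered speaker.
    record Segment(int speaker, String text) { }

    // Writes one section per speaker turn; consecutive segments from
    // the same speaker are merged into a single section.
    static String format(List<Segment> segments) {
        StringBuilder out = new StringBuilder();
        boolean open = false; // has any section been written yet?
        int previous = 0;     // speaker of the last section written
        for (Segment s : segments) {
            if (open && s.speaker() == previous) {
                out.setLength(out.length() - 1); // drop trailing newline
                out.append(' ').append(s.text()).append('\n');
            } else {
                out.append("Speaker ").append(s.speaker()).append(": ")
                   .append(s.text()).append('\n');
                previous = s.speaker();
                open = true;
            }
        }
        return out.toString();
    }
}
\end{verbatim}

For example, the segments (1, ``Good morning.''), (2, ``Hello.'') and (2, ``Thanks for coming.'') would produce two sections: ``Speaker 1: Good morning.'' followed by ``Speaker 2: Hello. Thanks for coming.''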

The project stages will be loosely based on the four milestones above. The minimal system would find the position in an audio file where the speaker first changes. This is probably the most challenging part of the project; once it is achieved, the remaining stages should follow in a straightforward manner.
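
A standard technique for this step in the speaker segmentation literature, which this proposal does not commit to, is a test based on the Bayesian Information Criterion (BIC). Suppose a window of $N$ acoustic feature vectors of dimension $d$ (for example, mel-frequency cepstral coefficients) is modelled either by a single full-covariance Gaussian or by two, one for the $N_1 = i$ frames before a candidate change point $i$ and one for the $N_2 = N - i$ frames after it. The test statistic is
\[
\Delta\mathrm{BIC}(i) = \frac{N}{2}\log|\Sigma| - \frac{N_1}{2}\log|\Sigma_1| - \frac{N_2}{2}\log|\Sigma_2| - \frac{\lambda}{2}\left(d + \frac{d(d+1)}{2}\right)\log N
\]
where $\Sigma$, $\Sigma_1$ and $\Sigma_2$ are the sample covariance matrices of the whole window and of its two parts, and $\lambda$ is a tunable penalty weight (nominally 1). A speaker change would be placed at the $i$ that maximises $\Delta\mathrm{BIC}(i)$, provided that maximum is positive. This is one candidate approach, not a method the project has settled on.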

To measure the effectiveness of the system, I will run tests on several recordings of the same conversations, which will test the robustness of the speaker identification algorithm. The accuracy of the transcribed text is of little concern: as transcription is not the project's main objective, I do not intend to modify or improve the speech recognizer.

As it stands, the software identifies speakers by number rather than by their real identity. It could be extended by collecting and storing speech data from speakers and linking it to their names. This has two advantages. First, the more data we have about a speaker, the more accurately they can be identified. Second, an internet search for a speaker could return the content of recorded conversations in which they took part.

\section{Success Criteria}
The following should be achieved:
\begin{itemize}
\item A console application that transcribes an audio conversation
\item Grouping of sentences by speaker for a known number of speakers
\end{itemize}
I intend to compare my algorithms for identifying speakers with na\"{\i}ve versions, which I will also implement. For the project to be a success, the advanced solution must locate the boundaries between different speakers more accurately than the na\"{\i}ve versions.
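
One way to make boundary accuracy measurable is to score each algorithm's detected change points against hand-marked reference boundaries. The Java sketch below counts a detected boundary as correct if it lies within a tolerance of a not-yet-matched reference boundary, and reports precision and recall. The tolerance, the greedy matching rule and the name \texttt{BoundaryScore} are assumptions, not part of the proposal; times are in seconds.

\begin{verbatim}
// Hypothetical sketch: score detected speaker-change times against
// reference times (both in seconds) using a matching tolerance.
class BoundaryScore {

    // Returns {precision, recall}.
    static double[] score(double[] detected, double[] reference,
                          double tolerance) {
        boolean[] used = new boolean[reference.length];
        int hits = 0;
        for (double t : detected) {
            // Match to the closest unused reference within tolerance.
            int best = -1;
            double bestGap = tolerance;
            for (int r = 0; r < reference.length; r++) {
                double gap = Math.abs(t - reference[r]);
                if (!used[r] && gap <= bestGap) {
                    best = r;
                    bestGap = gap;
                }
            }
            if (best >= 0) {
                used[best] = true; // each reference matches at most once
                hits++;
            }
        }
        double precision = detected.length == 0
                ? 0.0 : (double) hits / detected.length;
        double recall = reference.length == 0
                ? 0.0 : (double) hits / reference.length;
        return new double[] {precision, recall};
    }
}
\end{verbatim}

Running both the na\"{\i}ve and the advanced detector over the same recordings and comparing these two figures would give a direct test of the success criterion above.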

\end{document}