superhypertext specs
a plain text/speech union
b more visual definition
c fast align tag timing translate toggle

evil patent could be good:
1 cross-license w evil-doers
2 tokenize, be user-owned
3 fund better text

specs below are old
> aim now is to start a conversation
> list ways to better text
> build tools easier

prototypes we can clean up
1 taptime/ie
2 twext
3 pix8/ggif
4 caption 10x sync in youtube demos

people involved
a duke crawford, artist/inventor
b wafaa ibrahim, javascript prototypes
c jason tudisco, production softwares
d no more 10x devs w 10x probs
e ward cunningham, advisor

clean up plan
1 sbir/sttr grant via creativestartups
2 avax token maybe via wyohackathon
3 see if debuild/gpt3 can help
4 wafaa help duke clean up code
5 jason build production caption sync



0 1 2 3 4 5 6 7 8 9 12
0. twext
federate me.
PCT/US14/29959 drawings, comments: 

th.ai: this human-augmenting intelligence
mixing twext, vocals, depictions, faces
in any language you like

twext? twin text translation alignment;
error and trial failed to make twext work. 
Too hard to edit. Now it's easy.

We can twext a wiki between the lines; federated,
you can compare your twexts with collective twexts;
on your client, your data will be super fast..
   1) TWEXTAREA
   2) SEGMENT SYNC
   3) TAP VOCAL TEXT
   4) SEGMENT CORRESPONDENCE 
   5) ALIGN CHUNKS
   6) SYNC CONTEXTS
   7) TAG. WHAT? 
   8) SORT TIERED DEPICTIONS
   9) TOGGLE FORMS/VERSIONS
  12) FACE MIX
  
0 1 2 3 4 5 6 7 8 9 12
1. multimedia textarea
align contexts, twext
start with textarea in a modern computer

1a represents a modern computing environment, with 
camera/mic, media player and textarea; a full tactile 
keyboard is ideal; haptic feedback may improve the 
mobile experience; we'll also try to detect taps 
via audio recording, because we're gonna find and 
cancel tap noise out of the audio anyway.
make text better: more futuristic; more informative; more fun
1) fact: is there prior art? where?
2) opinion: are these inventive steps?
twext in twextarea, align by the chunk

1b represents twext control in textarea. HTML5 
span tag may be the first known way to effectively
align editable twext in textarea. All WYSIWYG.
Any font, proportional or monospace. Any language.

If you can write it in Unicode, you can edit twext.


note: the span tag trick isn't limiting; maybe it's better to draw to canvas like Bespin, Carota, etc. I don't wanna own some idea, I just wanna make text better.
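
below, a minimal javascript sketch of the twextarea idea: a contenteditable div where each big text line is followed by a small context line built from span chunks (the spans are what let us nudge chunks into alignment later). the tag names, class names and layout here are illustrative assumptions, not the patent's markup.

  // build a contenteditable "twextarea": big text lines with small
  // context lines made of span chunks (class names are illustrative)
  function buildTwextArea(pairs) {
    const area = document.createElement('div');
    area.contentEditable = 'true';
    area.className = 'twextarea';
    for (const { text, context } of pairs) {
      const textLine = document.createElement('div');
      textLine.className = 'text-line';
      textLine.textContent = text;
      const contextLine = document.createElement('div');
      contextLine.className = 'context-line';      // styled smaller via CSS
      for (const chunk of context) {
        const span = document.createElement('span');
        span.className = 'chunk';                  // spans let us nudge chunks later
        span.textContent = chunk + ' ';
        contextLine.appendChild(span);
      }
      area.appendChild(textLine);
      area.appendChild(contextLine);
    }
    return area;
  }

  // usage:
  // document.body.appendChild(buildTwextArea([
  //   { text: 'this is twext', context: ['ceci', 'est', 'twext'] }
  // ]));
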
align translations by the chunk, toggle versions
cut saccades in half; acquire translation info 2x faster
imagine google translate twext
0 1 2 3 4 5 6 7 8 9 12
2. align timings
for each segment, a timing

2a shows text segments handled in twextarea,
with a timing aligned for each segment.

2b represents media controlled from twextarea;
media player above sorting tiers to left of twextarea;
media playback controlled from keyboard combos, so you
navigate while editing text, fixing errors fast.

Timings shown aligned with syllabified text,
used to synchronize "vocal text" playback,
are made via Tap Tap process shown below.
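
here's a minimal sketch of the timing-per-segment data this section describes; field names and numbers are illustrative assumptions, not the spec's format.

  // one timing per text segment; '=' is the pause marker from FIG 3
  const vocalText = {
    media: 'recorded-vocalization.mp3',            // hypothetical media link
    segments: [
      { text: 'can',  start: 12.40, end: 12.61 },
      { text: 'you',  start: 12.61, end: 12.78 },
      { text: 'hear', start: 12.78, end: 13.02 },
      { text: '=',    start: 13.02, end: 13.30 },  // pause between syllables
      { text: 'me',   start: 13.30, end: 13.55 },
    ],
  };
  // editing a segment's text never orphans its timing: it's one record
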
disrupt the barrier between voice and text
sync voice right in textarea
0 1 2 3 4 5 6 7 8 9 12
3. vocal text
3 3a 3bc 3def 3g 3hij 3klm 3n 3op 3qr 3st 3uvw 3xy 4
human/bot sync vocal text

3a represents a "Human-Computation" feedback loop, 
where software guesses timings of a given audio segment,
then humans correct timings, so machines learn.

The slash over the textarea represents a fast switch 
from normal textarea content to twextarea, with timing
context shown aligned by the segment.
can we get paid to learn languages?
"Luis Von Ahn" + "Human Computation"
3 3a 3bc 3def 3g 3hij 3klm 3n 3op 3qr 3st 3uvw 3xy 4

CAPsync plain-text


3b shows a nifty trick: see the word "TIMings"?
CAPitalize syllables in plaintext precisely while 
vocalization is reproduced. Call it "Vocal Text".

"Vocal" means voice saying words in text:
or pre-recorded voice like youtube, 
or generated like voice Text-To-Speech,
or live voice like yours with any text now, 
or voice recognition robot writing what you say
and synchronizing your vocals with text.

3c represents "CAPsync" or uppercase element 
moving through plaintext, precisely timed with 
start timings and end timings for each syllable 
or segment. Note: the equals "=" sign is used as 
a pause marker. 
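
a minimal CAPsync sketch, assuming the segment shape from the timing example above: uppercase the segment whose timing window contains the current playback time, leave the rest lowercase; wire it to a player's timeupdate event to animate the CAPcase.

  // uppercase the segment whose window contains the playback time
  function renderCapSync(segments, nowSeconds) {
    return segments
      .map(seg => {
        if (seg.text === '=') return '=';                    // pause marker stays as-is
        const live = nowSeconds >= seg.start && nowSeconds < seg.end;
        return live ? seg.text.toUpperCase() : seg.text.toLowerCase();
      })
      .join(' ');
  }

  // renderCapSync(vocalText.segments, 12.9)  ->  'can you HEAR = me'
  // audioEl.addEventListener('timeupdate', () =>
  //   out.textContent = renderCapSync(vocalText.segments, audioEl.currentTime));
  // (audioEl / out are whatever player and output node you use)
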

sync syllabic plaintext with vocals
auto syllabify any text, any language
3 3a 3bc 3def 3g 3hij 3klm 3n 3op 3qr 3st 3uvw 3xy 4

syl-lab-ifi-ca-tion pre-view

3d-f show a preview of syllables in text, so 
you see what you'll tap before you tap. Many
ways to do it, and not that big a deal..

sync any vocalization in plaintext
tap sync vocal text
3 3a 3bc 3def 3g 3hij 3klm 3n 3op 3qr 3st 3uvw 3xy 4

TAP sync vocal text

3g represents the system on a mobile tablet, with 
touch input areas defined; literally zillions of ways 
to configure this, but the basic idea is:
a) you TAP with multiple fingers
b) your taps drive CAPcase through the text
c) if you tap accurately, you SYNC vocals with text
d) separate keys or input areas add timing controls
* PAUSE, insert pause between syllables in realtime
* JOIN, delete next pause or join this and next syllable
* UNTAP if you tap too soon, untap
* BACK, go back and retap this line more accurately
* SLOW play vocals at 1/2 speed, or slower/faster
* STOP/GO 

all these controls work in the old prototype;
you can test some GAME controls where 
you play against perfect timings (ok, but needs a fix).
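
for illustration, a minimal sketch of a configurable tap keyboard mapping keys to the controls listed above; the specific key choices are assumptions, not part of any claim.

  // map keys to the tap controls above (key choices are just an example)
  const tapKeys = {
    ' ': 'TAP',        // any comfortable cluster of keys can be TAP
    'p': 'PAUSE',      // insert a '=' pause between syllables in real time
    'j': 'JOIN',       // delete the next pause / join this and the next syllable
    'u': 'UNTAP',      // tapped too soon? take it back
    'b': 'BACK',       // go back and retap this line more accurately
    's': 'SLOW',       // play vocals at 1/2 speed, or slower/faster
    'Enter': 'STOP_GO',
  };

  document.addEventListener('keydown', (e) => {
    const action = tapKeys[e.key];
    if (!action) return;
    e.preventDefault();
    console.log(action, performance.now() / 1000);   // timestamped action drives the sync
  });
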

make your fingers see sound in text
tap sync pause join back more perfect timings
3 3a 3bc 3def 3g 3hij 3klm 3n 3op 3qr 3st 3uvw 3xy 4

real-time pause insert/delete

3hij describe JOIN and PAUSE controls: 
from 3h to 3i, the "=" pause marker and 
corresponding timing are deleted; from 3i to 3j, 
a new pause marker is added; these controls are 
built and tested and work great.

Pauses in speech can ring with meaning; we 
insert pauses with extreme precision while 
tap-timing plaintext CAPcase in real time.
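
a minimal PAUSE/JOIN sketch over the timed-segment array used in earlier examples; the '=' marker follows the figures, the helper names and splice logic are illustrative assumptions.

  // PAUSE: add a '=' marker (and a timing for it) after segment `index`,
  // filling the gap up to the next segment
  function insertPause(segments, index) {
    const start = segments[index].end;
    const end = segments[index + 1] ? segments[index + 1].start : start;
    segments.splice(index + 1, 0, { text: '=', start, end });
    return segments;
  }

  // JOIN: if the next segment is a pause, delete it and its timing;
  // otherwise merge this syllable with the next one
  function joinNext(segments, index) {
    const next = segments[index + 1];
    if (!next) return segments;
    if (next.text === '=') {
      segments.splice(index + 1, 1);
    } else {
      segments[index] = { text: segments[index].text + next.text,
                          start: segments[index].start, end: next.end };
      segments.splice(index + 1, 1);
    }
    return segments;
  }
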
put vocal+text out of mouth
vocal text lip sync
3 3a 3bc 3def 3g 3hij 3klm 3n 3op 3qr 3st 3uvw 3xy 4

put vocal text where you want

3klm show positioning of Vocal Text over visuals, 
so we see text as close as possible to lips, or 
anywhere we want.

note: those words will pour out of mouths
sooner or later; or sooner with vocal+text
more & more perfect timing control
more perfect timings, control live in textarea
3 3a 3bc 3def 3g 3hij 3klm 3n 3op 3qr 3st 3uvw 3xy 4

fine tune perfect timings

3n represents a way to 
a) select some text 
b) adjust timings of selected text 
c) so you can "fine tune" timings 
 
So we get More Perfect Timings.
can you tap sync 10 syllables per second?
multiplayer real-time game for more perfect timings
3 3a 3bc 3def 3g 3hij 3klm 3n 3op 3qr 3st 3uvw 3xy 4

tally more perfect timings


3o Tally Timings will find More Perfect Timings fast,
either from many re-timings by one player 
or from many timings by many players, 
so we see if we
a) have more fun and
b) get more perfect timings faster

make game teach tap sync skills

3p shows a GAME with live feedback,
you play it for any of many reasons:
* beat people
* be champ
* make perfect timings
* learn new words
the problem is, the more perfect the timings get,
the harder it is to win; addictive, but useful

it's a skill; who's the champ?
extend tap sync skills
3 3a 3bc 3def 3g 3hij 3klm 3n 3op 3qr 3st 3uvw 3xy 4

tap tap playback


3q represents TAP input causing segment *playback*,
where each tap *plays* a perfectly timed audio segment;
so if you tap along precisely, then you patch together 
accurately timed playback; so you get real-time audio 
feedback while playing More Perfect timings game, or 
you tap slow to clearly hear new sounds in vocals.
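
a minimal tap-playback sketch using a plain audio element: each tap plays the next timed segment, so accurate tapping patches together accurate playback; element and variable names are assumptions, and timeupdate is coarse, so a real build would likely use Web Audio for tighter cuts.

  // each TAP plays the next timed segment of the recording
  function makeTapPlayer(audio, segments) {
    let i = 0;
    return function tap() {
      const seg = segments[i % segments.length];
      i += 1;
      audio.currentTime = seg.start;
      audio.play();
      const onTick = () => {
        if (audio.currentTime >= seg.end) {            // stop at the segment boundary
          audio.pause();
          audio.removeEventListener('timeupdate', onTick);
        }
      };
      audio.addEventListener('timeupdate', onTick);    // coarse; Web Audio cuts tighter
    };
  }

  // const tap = makeTapPlayer(document.querySelector('audio'), vocalText.segments);
  // document.addEventListener('keydown', e => { if (e.key === ' ') tap(); });
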

tap TTS

3r represents TAP control of TTS production,
playback speed, pause insertion and so on,
so you hear text to speech at speeds you want, or
control expression in text-to-speech performance.



switch to IPA for non upPERcase writing systems
see sound in text via phonetic transliteration
3 3a 3bc 3def 3g 3hij 3klm 3n 3op 3qr 3st 3uvw 3xy 4

IPA switch

3s shows an "IPA switch" from native writing system 
to the same vocalization encoded in the International 
Phonetic Alphabet; it'll be easy to switch back and forth, 
even while tapping or playing.
 
vocal text sync in any caption format
 
3t shows conversion from the twext format to 
standard caption formats like SUB, SRT etc; it means
you can sync syllables and make playback in 
CLOSED CAPTIONS like you see on TV,
or even captions in http://youtube.com/vocaltext
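
a minimal sketch of dumping timed segments into SRT cues, one cue per segment with the live segment CAPcased; the SRT timestamp format is real, but the one-cue-per-segment grouping is just an illustration, not how the youtube captions above were made.

  // SRT timestamp: HH:MM:SS,mmm
  function toSrtTime(s) {
    const ms = Math.round(s * 1000);
    const h = String(Math.floor(ms / 3600000)).padStart(2, '0');
    const m = String(Math.floor(ms / 60000) % 60).padStart(2, '0');
    const sec = String(Math.floor(ms / 1000) % 60).padStart(2, '0');
    return h + ':' + m + ':' + sec + ',' + String(ms % 1000).padStart(3, '0');
  }

  // one cue per segment, with the live segment CAPcased in the cue text
  function segmentsToSrt(segments) {
    return segments.map((seg, i) => {
      const line = segments
        .map(s => (s === seg ? s.text.toUpperCase() : s.text.toLowerCase()))
        .join(' ');
      return (i + 1) + '\n' + toSrtTime(seg.start) + ' --> ' + toSrtTime(seg.end) + '\n' + line + '\n';
    }).join('\n');
  }
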
ever seen plaintext captions sync syllables? where?
CC captions ON, get vocal+text youtube live now
for extra credit, choose "monospace" option
3 3a 3bc 3def 3g 3hij 3klm 3n 3op 3qr 3st 3uvw 3xy 4

keyboard control

3u shows an arabic keyboard version and a customizable keyboard; 
i like a touch-type fast keyboard layout, but you might like 
mapping "t" to TAP and "b" to BACK; ok, customize the keyboard? 
nothing new here, no claim other than being part of a system using all 
thumbs and fingers on the keyboard to sync text faster/better 
and funner than before, with joins, pauses, and such.
 
same timings, variably expressed

we tried averaging errors in each timing, but that 
fails to score properly; next we'll view timings in 
more ways, like as frequency and modulation, so we 
can detect and apply patterns for a more perfect SCORE!!!
 
timing patterns

so if 10 taps are each 1 second too soon, then 
score is -1s off perfect, not compounded -10s,
so we get more perfect SCORE!!!
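
a minimal scoring sketch for that pattern idea: subtract the median offset before scoring, so taps that are each 1 second early count as a 1 second pattern, not a compounded error; the median choice is an assumption, just one way to detect the pattern.

  // subtract the median offset (the "pattern") before scoring the residue
  function scoreTaps(tapTimes, perfectTimes) {
    const offsets = tapTimes.map((t, i) => t - perfectTimes[i]);
    const sorted = [...offsets].sort((a, b) => a - b);
    const pattern = sorted[Math.floor(sorted.length / 2)];      // detected constant offset
    const residual = offsets.map(o => Math.abs(o - pattern));   // error left after the pattern
    return {
      patternOffset: pattern,
      residualError: residual.reduce((a, b) => a + b, 0) / residual.length,
    };
  }

  // scoreTaps([11.4, 11.8, 12.0], [12.4, 12.8, 13.0])
  //   -> { patternOffset: -1, residualError: 0 }   (a -1s pattern, not -3s of error)
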
pattern detect more perfect SCORE!!!
tap the keys YOU want. WHEN you want.
3 3a 3bc 3def 3g 3hij 3klm 3n 3op 3qr 3st 3uvw 3xy 4

text, talk, tap, sync, live
 
3x is where you tap, talk and sync text in realtime; 
an easy way to sync your voice w/ any textarea text, 
then compare vocal expressions. 

Try it. (chrome/desktop)

3y shows live vocal text from touch input;
maybe tap noise in audio, before being cancelled,
can be used to track touch input, so we don't
need a keyboard or any sensor other than the mic. We'll see.

Vocal sync from textarea is easy, did you try it?


touch vocals heard in text, sync vocal+text
let's make ALL text vocal.
0 1 2 3 4 5 6 7 8 9 12
4. align segments
4 4abcd 4efgh 4ijkl 4mnop 5

timing per segment alignments in twextarea

4a shows unformatted plaintext on two lines; for 
every segment (syllable or pause) there is a timing.

4b shows twext format plaintext: small-sized context 
(here, timings) shown with large-sized text; each timing 
aligns with its segment.

4c compresses the 4b contents; context is aligned by the word. 

4d hides the pause and syllable markers; context is 
aligned by the line.

Because we always control one timing context segment 
for each text segment, we can control multiple views 
of the data in the twextarea.
sync vocals in plaintext textarea
align timings in twext
optionally edit like plain-text
4 4abcd 4efgh 4ijkl 4mnop 5

edit text without breaking timings

If we edit text, corresponding timings are added or 
deleted; so we can edit the text transcription of 
a vocalization without screwing up the timings.
well?
4 4abcd 4efgh 4ijkl 4mnop 5

transcribe and sync simultaneously in twextarea

CORRESPONDENCE makes sure each segment stays timed.

a) you edit transcription
b) switch to timer mode
c) tap time text you're editing
d) basically time text while transcribing
e) no switch between separate ui
f) faster
g) good

just enable TAPtimer keyboard and play the game
of MORE PERFECT TIMINGS, all in textarea.
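
a minimal correspondence sketch: when the transcription is edited, re-segment the new text and carry timings over by position, so every segment still has a timing; a real build would diff more carefully, this is just the shape of the idea.

  // carry timings over by position; extra new segments reuse the
  // last timing until they're retapped
  function correspond(oldSegments, newTexts) {
    const out = [];
    for (let i = 0; i < newTexts.length; i++) {
      const old = oldSegments[Math.min(i, oldSegments.length - 1)];
      out.push({ text: newTexts[i], start: old.start, end: old.end });
    }
    return out;   // every segment still has a timing
  }

  // correspond([{ text: 'hear', start: 12.78, end: 13.02 }], ['here'])
  //   -> [{ text: 'here', start: 12.78, end: 13.02 }]   (typo fixed, timing kept)
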

if edit transcription, timings adapt
call it "correspondence"
4 4abcd 4efgh 4ijkl 4mnop 5

IPA segment per segment alignment

4m shows a text with segmentation info hidden

4n shows the text with alignment points marked,
(marked separately for words and syllables within words)

4o shows IPA transliteration aligned by the segment

4p shows segment markers hidden in both text and twext

FIG 2-4 show TIME context aligning with text in textarea; 
edit timings directly or switch to timer mode and TAP times;
CAPcase in plaintext allows vocal text sync anywhere, 
like in textarea, twextarea, closed captions, etc.
 
Above, time context is controlled with text in textarea;
alignment is 'n=n' or timing-for-segment key value parity;
multi-finger TAP TAP syncs plaintext CAPcase element;
text edit, syllabification, pauses, retimings, all easy.

Below, translation alignment is separately controlled,
where CHUNKs of context align with chunks of text,
where you say what's a chunk and which chunks align.
sync ipa transcription with all vocals
control IPA transcription aligned with text
0 1 2 3 4 5 6 7 8 9 12
5. align chunks
5 5abc 5def 5g 5h 5i 5jk 5lmno 5pqrs 5tuvw 5xyz 6

ABC Aligning Bitext Chunks 
5a shows example source text, including text, context 
and alignment guides.

5b shows the example aligned by an old method, 2 spaces 
between all corresponding chunks, (now outdated).

5c shows the example aligned by the new method, 2 spaces 
between one of two corresponding chunks, as described below:
this is twext
space align paired chunks in textarea
5 5abc 5def 5g 5h 5i 5jk 5lmno 5pqrs 5tuvw 5xyz 6

control chunk pairs in text and twext

5d represents chunk pairs controlled, where 
at least two spaces precede one chunk in a pair.

5e shows BUMP alignment in unformatted plaintext.
 
align twext in any font, any character set


5f shows proportional-font-formatted textarea text 
with directly editable aligned context; both texts 
directly edited, no separate source text required.
 
(note the alignment guides "1:1 3:3 5:6", where 
third word in context aligns with third word 
in text "3:3: and fifth word in context 
aligns with sixth word in text "5:6" .)

align meanings between the lines
any font, any language
5 5abc 5def 5g 5h 5i 5jk 5lmno 5pqrs 5tuvw 5xyz 6

same-sized fixed-width alignment, plus bump control

note: 5g-5i figure numbering isn't final.. 
i ran out of numbers 

5g shows a simple algorithm to find spaces required 
to align text/context chunks in single-sized monospace fonts; 
nothing new here, other than 
* BUMP to minimize extra spaces and
* control INCIDENTAL ALIGNMENT error
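
a minimal sketch of that single-size monospace aligner, with a BUMP option that shifts a context chunk one extra space so an incidental alignment doesn't read as a pair; padding widths and the bump representation are illustrative assumptions.

  // pad each chunk pair to a shared column; BUMP shifts a context chunk
  // one extra space so an accidental line-up doesn't read as a pair
  function alignMonospace(textChunks, contextChunks, bump = []) {
    let textLine = '';
    let contextLine = '';
    textChunks.forEach((t, i) => {
      const c = contextChunks[i] || '';
      const width = Math.max(t.length, c.length) + 2;   // two spaces between chunks
      const extra = bump.includes(i) ? 1 : 0;
      textLine += t.padEnd(width);
      contextLine += ' '.repeat(extra) + c.padEnd(width - extra);
    });
    return textLine + '\n' + contextLine;
  }

  // alignMonospace(['this', 'is', 'twext'], ['ceci', 'est', 'twext'])
  //   ->  'this  is   twext'
  //       'ceci  est  twext'
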
align by the chunk
basic aligner, w/ bump control
5 5abc 5def 5g 5h 5i 5jk 5lmno 5pqrs 5tuvw 5xyz 6

ratio fixed-width alignment, bump controlled

5h shows the known 5g method with a ratio applied, 
so we align variably-sized monospace text/context; 
in practice, the method delivers uneven results at 
different font sizes on different systems, zoom levels, etc,
so it fails..

ratio method doesn't quite "just work"
5 5abc 5def 5g 5h 5i 5jk 5lmno 5pqrs 5tuvw 5xyz 6

span align any contenteditable font/character 

5i applies SPAN within contenteditable in html5; 
the context chunk is moved one space at a time until
aligned with the text chunk; if there's INCIDENTAL ALIGNMENT, we
need to know which words are in which CHUNKS, so we
BUMP the context one space.

Now, in any font, at any sizes, in any language, we can 
align chunks of context with chunks of text, and 
within directly editable textarea or "twextarea".

Below you see how to SCOOCH those chunks around,
so you can edit exactly what context aligns with
which chunk, simply add/remove a space between chunks.
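
a minimal sketch of the span-align move, assuming the DOM shape from the twextarea sketch earlier: measure where the text chunk starts, then add non-breaking spaces before the context span until its left edge reaches that point, plus one extra space when BUMP is needed.

  // pad with non-breaking spaces until the context chunk's left edge
  // reaches the text chunk's left edge; BUMP adds one more space past
  // an incidental alignment
  function spanAlign(textChunkEl, contextChunkEl, bump = false) {
    const target = textChunkEl.getBoundingClientRect().left;
    const pad = document.createTextNode('');
    contextChunkEl.before(pad);                    // padding sits just before the chunk span
    let guard = 200;                               // safety stop
    while (contextChunkEl.getBoundingClientRect().left < target && guard--) {
      pad.textContent += '\u00a0';                 // one non-breaking space at a time
    }
    if (bump) pad.textContent += '\u00a0';
  }
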


solution working in demo
5 5abc 5def 5g 5h 5i 5jk 5lmno 5pqrs 5tuvw 5xyz 6

scooch chunks back and forth

SCOOCH is where twext starts to breathe;
you easily edit a context chunk and align
it with the exact chunk in text you want

pull back

5j shows PULL controls, which interpret
backspace or delete input between words while
handling exceptions and maintaining the 'n:N'
alignment guides

push forth


5k shows PUSH controls, interpreting
space input within or between words; PUSH and PULL
are tested and work well

Add/remove space between words in either text/context;
easily align chunks in context with chunks in text

Detailed examples below
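
before the drawing-by-drawing examples, a minimal PUSH/PULL sketch over alignment-guide pairs like '1:1 3:3 5:6' (context word : text word); the BUSY handling here, dropping the pair already sitting on the target word, is a simplification of what the figures show.

  // pairs look like [{ context: 1, text: 1 }, { context: 3, text: 3 }, ...]
  function push(pairs, contextWord) { return scooch(pairs, contextWord, +1); }
  function pull(pairs, contextWord) { return scooch(pairs, contextWord, -1); }

  function scooch(pairs, contextWord, step) {
    const i = pairs.findIndex(p => p.context === contextWord);
    if (i < 0) return pairs;
    const target = pairs[i].text + step;
    if (target < 1) return pairs;
    return pairs
      .filter((p, j) => j === i || p.text !== target)   // BUSY: drop the pair already there
      .map(p => (p.context === contextWord ? { ...p, text: target } : p));
  }

  // push([{ context: 1, text: 1 }, { context: 2, text: 2 }], 2)
  //   -> [{ context: 1, text: 1 }, { context: 2, text: 3 }]
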
scooch chunks where you want
5 5abc 5def 5g 5h 5i 5jk 5lmno 5pqrs 5tuvw 5xyz 6

5l takes example we saw in 5a; 
alignment guides show first text word 
aligned with first context word.

5m one space is added before 
2nd context word "context"; PUSH control 
aligns it with 2nd text word "in".

5n another space added before 
2nd context word "context"; PUSH 
aligns it with 3rd text word "text".

5o a space is removed before 
2nd context word "context"; PULL control 
aligns it with 2nd text word "in".
align context with any chunk you want
5 5abc 5def 5g 5h 5i 5jk 5lmno 5pqrs 5tuvw 5xyz 6

5p space removed before 2nd context word "context"; 
chunk is BUSY so PAIR deleted from guide

5q a space is added before 2nd word "in" in big text; 
scooch the chunk one word to the right

5r optionally interpret 5q input to align via BUMP 
(one extra space to delineate context chunks)

5s push context to right (note next chunk is BUSY 
so next PAIR is deleted from alignment guide)

edit twexts direct
5 5abc 5def 5g 5h 5i 5jk 5lmno 5pqrs 5tuvw 5xyz 6

5t push text chunk, align with next word in context 
(push/pull controlled in both text/context)

5u push context, find next chunk BUSY 
so delete next PAIR

5v edit context (adding extra word); PAIR guides 
are modified from 2:3 to 3:3

5w edit text (remove second word "in"), PAIR guides 
are modified from 3:3 to 3:2 (error in drawing)

if you wanna align text, just add/remove spaces
5 5abc 5def 5g 5h 5i 5jk 5lmno 5pqrs 5tuvw 5xyz 6

5x shows empty context line; if cursor put 
into the empty line, cursor moves to nearest 
alignment point; if backspace, move cursor to 
previous alignment point; if space added, move 
cursor to next alignment point; no context content, 
so PAIR guide = 0:0

5y shows blank chunk in context controlled via 
pair 0:1; first word in context aligns with third 
word in text (1:3); push/pull controls apply while 
blank chunks are controlled

5z represents all the above controls working with 
unicode text, with proportional fonts of any width; 
non-space-delineated writing systems are handled too.

The alignment controls defined above apply to text 
in segments and/or chunks; two context forms, 
timings and restatement, are seen. In the FIG 6 series, 
we sync both timings and restatements.

0 1 2 3 4 5 6 7 8 9 12
6. sync contexts
6 6abcd 6efgh 6ijk 6lmn 6opqr 7

linguistic alignment in time

The FIG 6 SERIES combines timing context with restatement context; 
for a given recorded vocalization, text segments are perfectly timed; 
perfect timings are applied to restatement context via a map. The 
result is synchronous playback in both text and translation, or 
linguistic alignment in the time dimension.

6a shows required data in single-size plaintext
* medialink containing recorded vocalization
* text, segmented into syllables and optional pauses 
* timings, one for each text segment
* context, optionally segmented into word parts 
* map linking context parts with text segments
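
a minimal sketch of that data in one object; the field names are assumptions, but the five pieces follow the list above, including a context split into word parts and a map tying those parts to text segments.

  const twextData = {
    media: 'recorded-vocalization.mp3',              // hypothetical media link
    segments: ['can', 'you', 'hear', '=', 'me'],     // syllables plus optional pauses
    timings:  [12.40, 12.61, 12.78, 13.02, 13.30],   // one timing per segment
    context:  ['¿Puedes', 'oír', 'me?'],             // restatement, split into word parts
    map: [                                           // context part <-> text segment links
      { context: 0, segments: [0, 1] },              // '¿Puedes' <-> 'can you'
      { context: 1, segments: [2] },                 // 'oír'     <-> 'hear'
      { context: 2, segments: [4] },                 // 'me?'     <-> 'me'
    ],
  };
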

6b shows alignment of the 6a data in variable-size 
directly editable twext. (Everywhere possible within 
this system, direct edit control is facilitated, but 
not intended as the exclusive means to input data; there 
are many ways to both map and control the data; what 
has not been seen is synchronous playback of text and 
translation, optionally while applying CAPcase.)

6c-6k show playback of a single text CHUNK 
synchronized with translation parts; playback is 
optionally normal speed or slow; tap playback provides 
segment by segment playback control.

6c shows a first part of a text word 
in synchronous CAPcase playback while a corresponding 
word in translation context is highlighted.

6d shows a second part of a text word synchronized 
with two parts of translation context 1) a word part, 
2) a complete word.

tap back/forth, experience ties
tie full words and segments within words
6 6abcd 6efgh 6ijk 6lmn 6opqr 7

align whole words or parts within words

6e shows part of a context word sync'd with a full text word

6f shows part of a context word sync'd with first part of text word

6g shows same part of context word sync'd with second part of text word

6h shows complete context word sync'd with complete text word
tie text segment w/ context chunk
6 6abcd 6efgh 6ijk 6lmn 6opqr 7




6i-6k show separate text syllables independently sync'd with distinct words in context
tie words, segments within words, and multi-word chunks
6 6abcd 6efgh 6ijk 6lmn 6opqr 7

align whole words, segments and/or chunks

6l shows a concurrent second map applied, with 
multi-word chunk control

6m shows same playback seen in 6c

6n shows three word context chunk sync'd with 
verbal form in second part of text word

Any number of maps can be added; the result is highly 
specific linguistic alignment experienced in time.

explore meaning while hearing vocal text
6 6abcd 6efgh 6ijk 6lmn 6opqr 7

experience vocal text parts you want

6o-6p show pointer beneath text hovering over 
hidden context; system shows translation parts while 
synchronously playing (CAPcase text *and* audio) 
vocalization of text part.

6q-6r show pointer hovering over text parts; 
audible vocalization of hovered parts is peformed, 
while linguistically aligned translation chunk is 
temporarily visible.

So you experience the sight and sound of synchronous 
vocal text while only temporarily seeing the 
translation information.

FIG 1-6 show controls of context alignment in two forms: 
timings and restatement/translation. Fig 7-9 show other 
context forms applied and controlled, so we see if we can
make text more informative.
 
0 1 2 3 4 5 6 7 8 9 12
7. tag, what?
align tags, structure
7 7abc 7defg 7hijk 7l 7mnop 8

tag twext

7a shows editable little timings aligned with segmented text

7b shows word alignment points while hiding timing context
 
tag structure by questions

7c shows context question words or "ASQ" tags 
aligned with text word.

ASQ system simply parses text into segments/chunks 
labeled by question; structure is defined more by meaning, 
in comparsion to formal grammatically structures; grammar 
tags are easily aligned but now shown in this example.  
Forms of aligned context are easily switched, 
as shown in FIG.9 series
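
a minimal ASQ sketch: words carry question tags, and asking a question CAPcases the tagged words (the same hook could trigger their vocal text playback); the tag names follow the ASQ idea, the data is made up.

  const tagged = [
    { text: 'major',  tag: 'who'   },
    { text: 'tom',    tag: 'who'   },
    { text: 'floats', tag: 'do'    },
    { text: 'in',     tag: 'where' },
    { text: 'space',  tag: 'where' },
  ];

  // CAPcase the words whose tag answers the question
  function ask(words, question) {
    return words
      .map(w => (w.tag === question ? w.text.toUpperCase() : w.text.toLowerCase()))
      .join(' ');
  }

  // ask(tagged, 'who')   -> 'MAJOR TOM floats in space'
  // ask(tagged, 'where') -> 'major tom floats IN SPACE'
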
tag structure, questions
7 7abc 7defg 7hijk 7l 7mnop 8

kinda like grammar, but different

7d shows 7c text with tags hidden and ASQ links listed above.

7e shows question word "WHO" input (via mic or click); 
text words tagged with "who" tag appear in CAPcase, and are 
also optionally reproduced in audible vocalization.

7f"WHAT" is input; words tagged with "what" are 
distinguished and played in audio.

7g "DO" is input; words tagged with "do" are 
distinguished and played in audio.

question structured text?
or how tag text meaningfully?
7 7abc 7defg 7hijk 7l 7mnop 8

an experiment with tags in twext

7h "HOW" is input; words tagged with "how" are stressed, made audible.

7i "WHERE" is input; words tagged "where" stressed, played audibly.

7j "WHY" is input; words tagged "why" stressed, made audible.

7k represents color coding of question tagged text.

align tags by segment, full word or chunk
7 7abc 7defg 7hijk 7l 7mnop 8

align chunks, words and segments.. all twext

7l shows tags aligned simultaneously by the chunk and the segment; 
two arrangements of source text record the directly editable 
output in the middle; above, a simplified notation is used; 
below, the explicitly declared chunk pair system is adapted 
to handle segments within words.

7l shows source text which informs the vocal text experience 
illustrated in 7m-7p below.

structure vocal+text playback via questions
où est la structure?
7 7abc 7defg 7hijk 7l 7mnop 8

en français

7m-7p show "foreign" language example of ASQ system in use, 
and also shows the result of tags aligned with all words, 
chunks and segments in words, as shown controlled in 7l.
 
7m shows "QUI" input via voice/hover/click; 
a full word and a segment within another word are 
made audible and more visible.

7n shows "FAIRE" input; a full word is made 
audible and more visible in text.

7o shows "OU" input; a chunk of three words 
is made audible and more visible in text.

7p shows "QUAND" input; both a chunk of four words 
and a segment within a fifth word is made audible and 
more visible in text.

The idea with Vocal Text is to get text perfectly 
synchronized with a vocalization, then apply the 
synchronous performance in environments ranging 
from simple playback, to tap-controlled playback, to 
playback synchronized with translation, to 
playback parsed by tags such as questions; 
in all cases, writing system permitting, CAPcase is used 
to transform and change the form of the text parts being heard. 

Vocal Text is text you hear. As described below, we 
control multiple vocalizations of constant text, 
(so you hear many ways/voices to say the same thing), 
and we control multiple depictions of any 
(preferably short) text string.

0 1 2 3 4 5 6 7 8 9 12
8. depict vocals
8 8a 8b 8cde 8fgh 8ijk 9

apply tiered carousels

8a represents TIERED CAROUSELS used to ORDER LISTS 
and sort images relevant to a specific text string; 
lower right, a list of links to pictures (including 
motion pictures) is controlled; to the left, 
thumbnails of the pictures are presented in tiered 
carousels; thumbnails are moved up or down, toward 
or away from association with the text string; 
pictures may be viewed in more detail in the media player 
above the carousels; a text string search query is entered 
to fetch associated pictures, from either local 
or internet search.

So a picture dictionary isn't stuck with one picture 
per word or phrase; phrases are easily associated 
with multiple pictures.

8 8a 8b 8cde 8fgh 8ijk 9


thumbnails up, thumbnails down

8b shows tiered carousels GUI used in sorting action; 
order of media links seen in list is changed 
as picture #16 is moved up to #4 position.

Vocal Text is optionally performed while preferred depictions 
are sorted up.

Tiered Carousels can also be applied to sort variable 
vocalizations of constant text strings, where each 
separate vocalization is represented by a separate picture. 
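
a minimal sketch of the list reorder behind that sort: moving a thumbnail up or down just moves its link within the ordered list, e.g. picture #16 dragged up to the #4 position; the function name is an assumption.

  // move one link within the ordered list of depictions
  function moveLink(links, from, to) {
    const copy = links.slice();
    const [link] = copy.splice(from, 1);   // take it out of the old position
    copy.splice(to, 0, link);              // drop it into the new position
    return copy;                           // new order, ready to report back to the search
  }

  // moveLink(['pic01', 'pic02', 'pic03', 'pic16'], 3, 0)
  //   -> ['pic16', 'pic01', 'pic02', 'pic03']
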

represent/sort vocalizations
8 8a 8b 8cde 8fgh 8ijk 9

sort vocalizations

8c represents one vocalization of the example text string 
"Can you hear me", which is represented in synchronous 
Vocal Text performance; picture sorter is minimized below.

8d shows the picture sorter activated: tiered carousels 
contain thumbnails of multiple vocalizations of the 
constant text string; the pointer is seen hovering 
over list item #28.

8e the thumbnail previously ordered as #28 in 8d 
has been dragged onto the Vocal Text playback area, 
and is now numbered #1 and played in the maximized media player; 
a separate vocalization of the same text string is 
experienced in synchronous Vocal Text. 

The internet has vast troves of recorded vocalization 
in audio and video; vocal text synchronizes vocalization 
precisely with text; thus, vocal text of a specific 
string can be searched, and various vocalizations of 
the constant text string can be listed, then sorted 
in tiered carousels. Sorting decisions are optionally 
returned to the image search engine, so better 
depictions are more easily seen.

If you're learning a new language or enjoying language 
you already know and love, you can experience an 
assortment of vocalized expressions of those words, 
all performed in synchronous vocal text.

sort depictions
8 8a 8b 8cde 8fgh 8ijk 9

sort depictions

8f widens to present more text surrounding the 
example string "can you hear me"; the example selected 
in 8e is fetched from one video performance 
of the song "Space Oddity".

8g shows the 8f text aligned with context; as shown 
in the FIG 9 series, within the same textarea or "twextarea", 
multiple context forms are controlled in independent 
alignment with any text; in the 8g example, depiction 
context is written.

8h depiction context from 8g is applied, optionally 
in conjunction with text, in IMAGE SEARCH; thumbnails of 
images found are loaded into tiered carousels and sorted.

align twext, mix media
remix vocals, depictions
8 8a 8b 8cde 8fgh 8ijk 9

edit timed depictions and vocalizations
 
8i shows a select depiction of text string "Can 
you hear me, Major Tom?" while VocalText is performed.

8j shows vocalization context in the form of 
specific media controlled in alignment with parts in text; 
multiple vocalizations are assembled to perform a 
single remix vocalization of the text.

8k shows a switch between picture sorter 
*depictions* of a text string, and large view 
*vocalization* of the same string; multiple depictions 
and vocalizations of the string are easily accessed.

0 1 2 3 4 5 6 7 8 9 12
9. toggle twexts
vary FORMS of independently aligned context
9 9abcde 9fghi 9jklmn 9opqr 9stuv 9wx 12

toggle various forms of twext

FIG-9 SERIES represents TOGGLE CONTEXTS, meaning 
toggle through multiple contexts, in distinct FORMS then 
VERSIONS or variations; each context is controlled in 
INDEPENDENT ALIGNMENT with constant text.

9a shows IPA context, with corresponding 
segmentations, aligned by the word

9b shows VOCALIZATION context to remix a new vocal 
performance from existing sources, aligned by the chunk.

9c shows perfect TIMING context used to play 
synchronous VocalText, aligned by the segment.

9d shows QUESTION context, structuring the text into
question categories, aligned by the word

9e represents TOGGLE access to various context FORMS, 
each in independent alignment with the text.
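
a minimal sketch of the toggle data: one constant text, several context FORMS, each form holding one or more VERSIONS, each version carrying its own alignment; the shapes and sample values are illustrative assumptions.

  const contexts = {
    ipa:         { versions: [{ align: 'word',    data: ['kæn', 'ju', 'hɪr', 'mi'] }] },
    timing:      { versions: [{ align: 'segment', data: [12.40, 12.61, 12.78, 13.30] }] },
    restatement: { versions: [
      { align: 'chunk', data: ['are you able', 'to hear me'] },
      { align: 'chunk', data: ['can you', 'receive me'] },
    ] },
  };

  let form = 'restatement';
  let version = 0;
  function toggleForm() {                      // cycle FORMS: ipa -> timing -> restatement
    const keys = Object.keys(contexts);
    form = keys[(keys.indexOf(form) + 1) % keys.length];
    version = 0;
  }
  function toggleVersion() {                   // cycle VERSIONS within the current form
    version = (version + 1) % contexts[form].versions.length;
  }
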

vary VERSIONS within a form
perfect translation/restatement? it doesn't exist.
9 9abcde 9fghi 9jklmn 9opqr 9stuv 9wx 12

toggle versions within a form

9f shows one VERSION of same-language RESTATEMENTS 
aligned by the chunk.

9g shows another separate VERSION of same-language restatement, 
INDEPENDENTLY ALIGNED with distinct chunks of text.

9h shows three VERSIONS of restatement context, each 
independently aligned with distinct chunks in the text.

9i represents TOGGLE fast access to variable VERSIONS 
controlled within the form of same-language restatement.

SORT versions: good up, ok down, hide bad
perfect translation? doesn't exist.
9 9abcde 9fghi 9jklmn 9opqr 9stuv 9wx 12

sort preferred versions, exclude unwanted versions


9j shows one VERSION of Spanish translation context 
aligned by the chunk.

9k shows eight versions of Spanish translation context, 
each version INDEPENDENTLY ALIGNED by the chunk.

9l represents SORT control applied to order PREFERRED 
VERSIONS of Spanish translation context; unwanted versions 
are placed beneath the empty line.

9m represents TOGGLE fast access to multiple versions 
of Spanish translation context.

9n represents SORT control applied where most recent 
version is moved to most preferred version.

compare mixes
toggle personalized depictions
9 9abcde 9fghi 9jklmn 9opqr 9stuv 9wx 12

toggle personalized depictions
 
9o shows DEPICTION context tags aligned by the chunk

9p shows various VERSIONS of depiction context, each 
independently aligned by the chunk; note that depiction 
context is recorded in any written language.

9q shows tiered carousel picture sorter 
loaded with thumbnail depictions suggesting the text string 
"Can you hear me, Major Tom?"; listed along the bottom row are 
usernames which link to personalized assortments of depictions of 
said text string.

9r represents an average of most preferred depictions of 
said text string.

customer-owned ai
9 9abcde 9fghi 9jklmn 9opqr 9stuv 9wx 12

toggle via headtracking

9s-9v show a mobile device with the camera applied to 
detect the viewer's head position; the variable head positions 
detected are applied to TOGGLE both DEPICTION and 
TRANSLATION context simultaneously.

9s shows device seen from above; a 1st depiction 
is seen in the media player; Chinese language translation 
context is independently aligned by the word.

9t shows device seen from left; 2nd depiction 
in media player; Thai language translation context is 
seen independently aligned by the chunk.

9u shows device seen from the right; 3rd depiction 
in media player; Russian language translation context, 
independently aligned by the chunk.

9v device seen from below; 4th depiction in media 
player; Italian language translation context, independently 
aligned by the chunk.

So you can see all the translation and depiction 
context you want, with minimal effort; this example assumes 
you're learning English, Mandarin, Thai, Russian and Italian; 
concurrently you experience sorted and preferred depictions 
of the string.

oh no
9 9abcde 9fghi 9jklmn 9opqr 9stuv 9wx 12

toggle twext during vocal text

9w-9x affirm that all TOGGLE controls, 
(applied to vary aligned context forms and versions), 
can be made while PLAYBACK of perfectly timed 
synchronous VocalText occurs.

9w shows toggle applied to switch context 
from Same-Language Restatement to Spanish translation.

9x shows toggle applied to switch context from one 
VERSION of Spanish translation to another version of same.

It's understood that each form and version of aligned 
context is controlled in independent alignment with 
segments and chunks in text.

who cares about literacy?
1 2 3 4 5 6 7 8 9 10 11 12
10. twext power

too much twext

The FIG.10 series was supposed to define methods 
to WRAP aligned context, and to control contexts in an 
experimental plaintext data format, but I ran out of time.

10a shows the problem of too much content in aligned 
context causing unnatural space to appear in text.

10b shows context contents truncated.

10c shows size of aligned context controlled to 
fit complete contents.

take 'em up
1 2 3 4 5 6 7 8 9 10 11 12
11. twext style

accessible but non-intrusive twext


11a represents aligned context printed on a computer display, 
which, when viewed from a low angle, is invisible.

11b shows the context as slightly visible when viewed head-on

11c shows the context easily seen when viewed from above

This is a continuation of claims made a while ago; aligned context 
can be the same height relative to text if typeface is narrow.

ci siamo capiti?


11d shows light background with white context aligned by chunk

11e shows dark background with black context aligned by chunk

Again, Chunk Aligned contexts can be made detectable but non-intrusive.

repeat, remember
11f-11h represent aligned context printed with ink on paper, 
made virtually invisible in medium to low-light environments.

a mal tiempo, buena cara
0 1 2 3 4 5 6 7 8 9 12
12. twext face
12 12a 12bcde 12fghi 12jklm 12nopq 0


12a diagram represents a conversation between two persons; 
the digital representation of faces is constructed from both people, 
the eyes of one put above the mouth of the other, and vice versa; 
optionally, the faces are constructed from other personas. 

The purpose is to reduce shame or have more fun 
while speaking new language. Or have more empathy.

Facial detection and image stabilization tools are used to 
fit each face within a framework; the faces are paired and 
then combined. CAPcase shown represents VocalText performance 
in real time; speech recognition and SYNC is applied; 
error correction of sync'd text handled by TAP controls.

se te ve en la cara
12 12a 12bcde 12fghi 12jklm 12nopq 0

real-time face mix

12b represents video capture from two computers; it's understood 
that face positions vary due to body movements and computer position; 
synchronous vocal text is seen in performance below.

12c represents a simple framework upon which detected face parts, 
eyes and mouth, are fit after image stabilization is performed.

12d shows one face fit into the framework

12e shows the other face fit into the framework


any face you like
12 12a 12bcde 12fghi 12jklm 12nopq 0

12f shows the faces combined across the vertical axis; 
the black arrow in the upper left represents optional axis motion

12g shows faces combined across the horizontal axis; 
the first face's mouth below, the second face's eyes above; the axis 
optionally continues clockwise motion.

12h shows faces combined side by side; axis 
rotation optionally continues at user selected speed.

12i shows faces combined first face eyes above 
second face mouth below; axis rotation continues.

In 12f-12i, the axis rotation can be run fast to create the 
illusion of a single face, or stopped entirely 
at a desired position.

know what they say
12 12a 12bcde 12fghi 12jklm 12nopq 0

facemix while twexting

12j-12m show the same face combination controls described 
and shown in 12f-12i, with vocaltext performance 
and aligned contexts superimposed; the superimposed positions 
of the Vocal Text are optionally controlled.
any language you want
12 12a 12bcde 12fghi 12jklm 12nopq 0

12n shows an alternative framework to guide stabilized face 
feature repositioning from computer camera

12o-12p show faces combined with enlarged eyes and mouth, 
like some big baby.

12q shows picture sorter applied to sort 
close-up of lips pronouncing text string "vedi quello che dico".
 
NOTE: this face combining thing is not at all well developed; 
processing power available by the year 2020 will be *16x* more than 
today's typical smartphone, and I was trying to throw every method 
I possibly could into this application, filed 15 March 2013, 
before the USPTO adopted the international-standard First to File rules.


SUMMARY: THIS SPEC WANTS FOUR THINGS:
1) VOCALTEXT: perfectly timed plain-text in sync with vocals, 
2) TWEXT: align contexts by seg/chunk, edit direct in textarea,
3) PIX8: sort and compare depictions for any string
4) unnamed: safe place for faces to play, mix and empathize

giusto?