What is it, what does it aim to be? I'll answer generally first, and then in detail, after reviewing objections.
Printed paper is self-contained. The Delhi print shop charged me $0.03/page, and once it is in hand, a recipient doesn't need a smartphone or computer, electricity, chargers, blackboards, government-paid teachers, or anything but time, interest, and light.
Therefore I say: print copies, put them in the hands of the unlettered, and let them go at it.
The need, then, is for black and white line drawings which can survive the photocopier. With a QR code to lead smartphone users, such as those who own the copiers, to the original clean source on the internet, the drawings don't have to survive many generations of copying. But the instrument itself will come out of a copier, in the form in which users receive it, and that must be good enough. So: clear black and white line drawings are the target format.
What shall they communicate? The unguessable, arbitrary, and conventional correspondence between sound and squiggle that a script represents. This is the key to the kingdom. Which sounds, which squiggles?
Among sounds, we want to uniquely and unambiguously distinguish all or at least most of the linguistically-distinctive phonetic features. These include laryngeal states, vocalic features, consonantal place and manner, and for tone languages, tone. Among all the scripts in the world, any choice might work, but the International Phonetic Alphabet was designed for just this purpose, to symbolize all the linguistically distinctive sounds in human language. So the IPA will do.
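To make "uniquely and unambiguously distinguish" concrete, here is a minimal sketch that treats each sound as a bundle of features and checks that no two symbols share a bundle. The names `Segment`, `CHART`, `is_unambiguous`, and `differing_features` are my own illustrations, and the seven consonants are a hand-picked sample, not the IPA chart itself.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """A sound described by a few features (illustrative, not exhaustive)."""
    voicing: str  # laryngeal state
    place: str    # consonantal place of articulation
    manner: str   # consonantal manner of articulation


# A tiny hand-picked sample of consonants; a real chart has many more rows.
CHART = {
    "p": Segment("voiceless", "bilabial", "plosive"),
    "b": Segment("voiced", "bilabial", "plosive"),
    "t": Segment("voiceless", "alveolar", "plosive"),
    "d": Segment("voiced", "alveolar", "plosive"),
    "ɖ": Segment("voiced", "retroflex", "plosive"),
    "s": Segment("voiceless", "alveolar", "fricative"),
    "m": Segment("voiced", "bilabial", "nasal"),
}


def is_unambiguous(chart):
    """True if no two symbols share the same feature bundle."""
    return len(set(chart.values())) == len(chart)


def differing_features(a, b, chart=CHART):
    """List the feature names on which symbols a and b differ."""
    return [f for f in Segment._fields
            if getattr(chart[a], f) != getattr(chart[b], f)]


print(is_unambiguous(CHART))
print(differing_features("p", "b"))
```

A drawing has the same job as a row in this table: it must show enough features that each symbol ends up with a bundle no other symbol shares.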
Notice: Orthographies are not the target. In each detail where they differ from phonetic scripts, they cannot be taught by phonetic charts. Hence this is not about English spelling, or Cambodian. You can learn those using IPA, but orthography is a separate level and if you seek to achieve the possible then you won't start with orthography.
Nor can scripts with phonetic clusters, such as Devanagari "ढ" = [ ɖ h ə ], be the target. A phonetic script can easily write a cluster as a sequence of simple sounds, each of which can be specified using a phonetic chart. But the other way around would be impenetrable. So, in this case, three IPA symbols in sequence represent the cluster symbol very nicely, and each is teachable with a segmental phonetic chart; whereas the chart for even a minimal CV sequence would be many-dimensional, thus practically impossible to learn from. So:
Learn IPA first! (And then use IPA, as in writing a cluster as a sequence of sounds. With IPA you can learn anything, with definiteness and clarity, because the IPA is defined in terms of simple actual sounds. IPA First.)
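The cluster argument can be sketched in a few lines, under stated assumptions. The mapping holds only the one Devanagari example given above, and the inventory sizes (30 consonants, 10 vowels) are made-up round numbers for illustration, not counts from any real language. The names `CLUSTERS`, `to_ipa`, and `chart_entries` are hypothetical.

```python
# The single cluster example from the text; a real table would be far larger.
CLUSTERS = {"ढ": ["ɖ", "h", "ə"]}


def to_ipa(text, clusters=CLUSTERS):
    """Expand each cluster symbol into its sequence of simple IPA segments.

    Characters without an entry are passed through unchanged.
    """
    segments = []
    for ch in text:
        segments.extend(clusters.get(ch, [ch]))
    return segments


def chart_entries(n_consonants, n_vowels):
    """Compare how many entries a learner faces in each kind of chart.

    A segmental chart needs one entry per sound (a sum); a chart of
    consonant-vowel units needs one entry per combination (a product).
    """
    segmental = n_consonants + n_vowels
    syllabic = n_consonants * n_vowels
    return segmental, syllabic


print(to_ipa("ढ"))
# Illustrative inventory sizes only (assumed, not measured).
print(chart_entries(30, 10))
```

With those assumed sizes, the segmental chart has 40 entries to learn and the CV chart has 300, and the gap widens with every sound added to the inventory.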
Some have objected, saying, Hey, No, instead just start small, just do one language, customize this for Hindi or something, make it work for one before you aim for all. I reply, No, you miss the point. Solving the problem for one language means you specifically prevent yourself from solving the problem universally. Let's keep the focus for a time on a universal instrument, which could give something to everyone. I don't want to leave 99% of everyone behind if it can be avoided, and it seems it can. After that, we can put more effort into individual languages, with a Hindi-specific version and versions for other specific local languages. But let's light the world-wide fire first.
Oh, that's just the first objection of many.
Some have objected: this should be an app on the internet. These students just need a teacher. This requires audio-visual support! We should be providing accessory teaching resources for actual teachers, not this impossible and demoralizing context-less task for unsupported learners, which will simply destroy their motivation without giving them a functional improvement in their lives. I agree that more is better, and additional accessory elements for learners and teachers are on my list too, though I expect others have provided them already.
Such objectors are not getting the point. More is better, yes, wherever there is more, but what about those without? I think there are a billion have-nots who are living without. The primary, the target recipients for this instrument are illiterate, low-resource people in impoverished and remote corners of the globe. If they do have tech, phones, public schoolteachers, and audiovisual support stuff, then that is great, for them, but then they don't need basic phonetic literacy on a piece of paper. They've got literacy as a whole covered already. They are not our target recipients.
You could say of our target recipients that until now the world has basically not been able to reach them. But barely, potentially, yes, we can finally begin to reach them, if only by putting something self-contained into their hands: printed pieces of paper. Something that communicates without teachers, without smartphones or computers or tech. It just needs the information to be in there, clearly and unambiguously, so that if they are interested, they can figure it out themselves.
And among the innumerable steps of the literacy ladder, everyone already knows their own language in its full glory. They know its words. They know its grammar. They know its sounds. We don't have to teach language to humans: They have language! What they don't have is the arbitrary, conventional, and generally quite impenetrable sound-symbol correspondence. Even for the most phonetic of scripts, Korean hangul, the phonetic meaning of this or that squiggle MUST be explained by a teacher in direct audio-visual contact with the student. Without that, it is impossible; people are helpless to cross this fundamental obstacle, and because of this impossibility, they are functionally outside modern society, and perhaps even functionally enslaved by the literates around them.
So the crucial problem is: can a line drawing communicate sufficient phonetic detail, enough to uniquely identify the distinctive sounds of language?
I say Yes. And here to prove it are the details. Of course you can just look at the drawings and satisfy yourself, but some guided commentary might not be inappropriate.
Hence the drawing for [schwa] is shown as the equal-cross-sectional-area vocal tract configuration, and then all the non-schwa vowels can be understood as deviations in one direction or another from it.