I have improved the quality of the images that I am using for training the AI, but so far, I cannot produce an acceptable embedding. I will continue to work on this.
What I have been able to do, though, is use a combination of ControlNet and CharTurner to effectively “pregenerate” the images that I need, without necessarily training the AI.
What I am doing is creating an image containing all of the poses that I will need for a given character, and then using that with ControlNet’s OpenPose model and CharTurner (and a specific, highly detailed prompt) to generate the images that I will need for each character.
I can then upscale these and perform post-processing on them as needed.
This isn’t ideal, and of course it only works for a small number of specific images, but for a game where only a small number of different poses is required, this could be enough.
I have used several techniques to try to train an AI to generate consistent characters, and I have achieved some success.
The most common technique seems to be to generate a number of images, using either a third-party program or Stable Diffusion itself, use those images as a training set to train an embedding, and then use that embedding in text prompts.
This worked to a degree, but it wasn’t quite enough to do what I needed it to do. I suspect that I need better training images.
I can generate better training images using tools like CharTurner and ControlNet, and this does seem to produce decent results, but there are still issues.
I intend to try a few more techniques, as the current results are still not quite good enough for my purposes.
My current goal is to produce consistent images of the same character, in different poses (possibly in different clothes, etc), easily and reliably.
I think that the correct way of doing this is to use an embedding, or “Textual Inversion”.
The problem is generating the images for the training set. I want to do this entirely in Stable Diffusion (rather than, for example, generating a 3D model first and rendering out images from it), which makes producing sufficient images for a training set difficult. However, if I can figure this out, I should have all that I need to use Stable Diffusion for visual novel-style games.
I have been experimenting with Stable Diffusion recently, and I have been using it to generate photo-realistic images of various characters, etc.
The software is incredibly powerful, and can easily create very highly detailed images.
It cannot be used for 3D models or texture sets, but for a visual novel-style game it could be invaluable. Not only could it generate backgrounds and scenes, it could also generate characters.
The only problem is that it is quite difficult to generate the same character consistently across multiple images; each image is unique.
This is a problem that I am working on solving now. If I can figure it out, I could create some simple concept games using this software.
I have always had a great interest in multi-dimensional shapes. The idea that there could be whole “dimensions” of space beyond those that we can see is fascinating.
Human beings are used to seeing in three dimensions: length, breadth, and depth. In computer graphics these are generally labelled X, Y, and Z.
However, there is no limit to the number of possible dimensions. Since humans only see the world in three dimensions, it is not possible to “create” a true 4-dimensional or higher object, but it is possible to create a representation of one of these shapes.
The secret is something called “projection”. This is a mathematical means of mapping a higher-dimensional structure onto a lower-dimensional one.
As a computer games programmer, I am very familiar with projection. Computer games, for now at least, are played on screens, which are two-dimensional, while most games themselves are three-dimensional.
This means that it is necessary to perform a “window to viewport transformation”. This essentially “projects” the Z axis, or the depth information, onto the screen, while preserving the X and Y axes, so that a 3D object can be drawn accurately on a 2D screen.
The same technique works at higher dimensions. By fixing two axes, X and Y, in the same way as in the 3D-to-2D projection, and then projecting away the remaining axes (however many there are), it is possible to create a 2D representation of the higher-dimensional shape, as the sketch below shows.
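To make this concrete, here is a minimal C++ sketch of that projection chain. The vector types, function names, and camera distances are my own illustrative choices, not code from any particular engine:

```cpp
#include <cstdio>

// Hypothetical vector types for illustration; a real engine would use
// its own maths library.
struct Vec4 { float x, y, z, w; };
struct Vec3 { float x, y, z; };
struct Vec2 { float x, y; };

// 4D -> 3D: collapse the W axis with a perspective divide, using a
// virtual "camera" sitting at distance camW along the W axis.
Vec3 projectW(const Vec4& p, float camW = 3.0f)
{
    float s = 1.0f / (camW - p.w);
    return { p.x * s, p.y * s, p.z * s };
}

// 3D -> 2D: the familiar perspective projection onto the screen plane,
// collapsing the Z (depth) axis in exactly the same way.
Vec2 projectZ(const Vec3& p, float camZ = 3.0f)
{
    float s = 1.0f / (camZ - p.z);
    return { p.x * s, p.y * s };
}

int main()
{
    // One vertex of a unit tesseract; all 16 vertices are (+-1, +-1, +-1, +-1).
    Vec4 vertex { 1.0f, -1.0f, 1.0f, 1.0f };
    Vec2 onScreen = projectZ(projectW(vertex));
    std::printf("screen position: (%f, %f)\n", onScreen.x, onScreen.y);
    // Projecting all 16 vertices and drawing the 32 edges between them
    // produces the interlocking-cube images shown below.
}
```

Rotating the shape in 4D before the first divide (for example in the XW or ZW plane) and re-projecting every frame is what produces the strange animations discussed below.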
These shapes look bizarre, especially when animated. This is because the way they are constructed and the way they move is completely unlike anything we are used to dealing with. We understand 3D objects, since we live in a 3D world. We also understand 2D objects, because 2D is a subset of 3D, but 4D objects seem to be “impossible”, even though they follow the same rules as other objects.
To demonstrate what a 4D object is, and how it is constructed, I built a “tesseract” by following THIS tutorial. A tesseract is basically a cube in 4D.
First, here is a 1-dimensional object, a simple line, containing only length information:
Then, a 2-dimensional object, a square, containing both length and width information:
Now there is a problem, because the screen has only two dimensions. So how is it possible to represent a 3-dimensional shape, such as a cube, which contains depth information as well? The answer is simple: draw a representation of that cube:
This is not a true cube, but it can be essentially thought of as a “snapshot” of a cube at a given position and rotation. This idea of the “snapshot” will become important shortly. First, take a look at the tesseract, the 4-Dimensional cube:
This obviously looks very complex, but look closely and you will see that it is essentially a series of interlocking cubes. Now, take a look at this image, which I obtained (HERE) from the Wikipedia page on the tesseract:
This is also a tesseract, but it looks completely different. Why? Simply because the viewing angle and rotation are different, producing a vastly different projected shape. Wikipedia has two examples of rotating tesseracts. The first (left) shows rotation about one plane, and the second (right) shows rotation about two planes:
These animations look impossible, and seem to violate the rules of physics. Yet what is happening here is exactly the same as what would happen if a cube were rotated about one or two of its axes; the only difference is that another dimension is involved, and our brains can’t truly comprehend it.
I have toyed with the idea of writing either a program or a library (probably for Torque) that can render higher-dimensional shapes. It would be very interesting to see what they look like on the Oculus Rift, especially since the Rift is a true 3D device, and would therefore only need a 4D-to-3D projection, not a 4D-to-2D one.
There is a game by the name of Miegakure (and another game or concept project which I played some time ago, I believe that it was called Daedalus) which created 4D game worlds. This lends itself well to complex reasoning and puzzle solving. It seems that with enough practice, human beings can in fact learn to “process” the fourth dimension, and make some sense of it.
I don’t know if a 4D first-person shooter has ever been created (I doubt it), but it would definitely be a fascinating concept to develop.
Also, I have recently read about the “Teslasuit”. This is a full-body haptic suit, created by a UK company, that allows users to “feel” what is going on in a virtual world. The suit uses Electro Muscular Stimulation, or EMS, to allow users to feel everything from “a cool breeze to the impact of a bullet”, according to a roadtovr.com article.
The suit’s Kickstarter campaign will begin on January 1st.
I have recently been thinking about purchasing a device that I can use to remain connected while in remote areas.
There are many options available today to solve this particular problem, from smartphones to tablets, and from 3G dongles to WiFi antennas. However, I was surprised at how difficult it is to make such a system truly portable for long periods of time while still remaining connected to the internet.
Firstly, I could not find a single phone, tablet, or laptop with a built-in jack for an external antenna. When using a device in a remote area, this is essential. The only option is to replace the built-in 3G/4G or WiFi card with a dongle, and then connect an external antenna to the dongle. This is inefficient and adds needless bulk to a system designed for portability. It can also be difficult to find dongles with antenna jacks; many don’t have them.
The second issue is power. There are several types of solar panels and external batteries available for mobile devices, however, for a larger tablet, or small netbook, it is difficult to know if these would be sufficient. A small tablet may not be powerful enough to perform adequately over a longer period of time, especially for an individual who demands a lot from their computer.
Another problem that needs to be solved for a device like this is which operating system to choose. Most small tablets use Android or iOS, and these mobile operating systems work fine for general tasks, but individuals who require more from their device, or who want to integrate more easily with other computers, programs, or file formats, may need a desktop operating system (such as Windows 8.1). However, it is rare to find a Windows 8.1 device under the 10″ screen-size mark, which makes them big and power-hungry.
The ideal system solution, for my purposes at least, would have:
Screen size between 7.x” and 9.x”. This should be big enough to be usable, while still being small and power-efficient.
Feature a desktop operating system. (Windows 8.1 is usually the only option, but I have seen some flavours of Linux installed on these machines too).
Offer extensive power-saving features and long battery life.
Have support for an external 3G antenna. (This is likely not possible without an external dongle)
I am currently researching various options in this area, torn mainly between buying a small tablet and learning to adopt Android as my primary operating system, or buying the smallest Windows 8 tablet that I can find and hoping it’s not too big or power-hungry.
After some more research, I have learned that the primary concept of this project is actually more or less impossible.
The fatal flaw was in trying to find a single seed which, when fed to a random number generator, would produce a specific string of integer values. If this were possible, it would revolutionise the world of data compression, since, in theory, you could compress an entire website or database down to a single integer.
In reality, the chance of finding a single int that produces the desired output, even for very short strings, is so low as to be essentially impossible. This is a simple counting argument: a 32-bit seed can only ever produce 2^32 distinct output sequences, so it cannot possibly encode an arbitrary string containing more than 32 bits of information.
I had originally intended to limit myself to a text-only solution, but I have reluctantly turned my attention to hiding text in images.
Hiding data in raw images, such as .BMP files, is easy: you just change the least significant bit of each pixel’s colour values to represent the data that you are trying to hide. This produces very little visual distortion in the image, and can encode relatively large messages, or even other files.
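As a rough sketch of how this works (assuming the pixel data has already been read out of the .BMP into a byte buffer, and ignoring details such as storing the message length alongside it):

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Write one bit of the message into the least significant bit of each
// byte of pixel data; each message byte therefore needs 8 pixel bytes.
void hideMessage(std::vector<uint8_t>& pixels, const std::string& msg)
{
    size_t bit = 0;
    for (unsigned char c : msg)
        for (int i = 7; i >= 0; --i, ++bit)
            pixels[bit] = (pixels[bit] & 0xFE) | ((c >> i) & 1);
}

// Reading the message back is the same walk in reverse.
std::string recoverMessage(const std::vector<uint8_t>& pixels, size_t length)
{
    std::string msg(length, '\0');
    size_t bit = 0;
    for (char& c : msg)
        for (int i = 7; i >= 0; --i, ++bit)
            c = char((c << 1) | (pixels[bit] & 1));
    return msg;
}
```

Since only the lowest bit of each byte changes, each colour channel shifts by at most 1 out of 255, which is why the distortion is practically invisible.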
However, for image files which have been compressed (as most images on the internet are), such as .JPG files, this approach cannot be used. This is because the least significant bits are where the compression algorithm does its work (to reduce the file size of the image), meaning that most of the data stored there would be lost. There is another way, however, of hiding data in a JPG image, and that is by using Discrete Cosine Transforms, or DCT.
THIS site provides a great deal of information on this, as well as several links to free programs that provide this functionality.
The program that I am using is the simple command line tool StegHide.
The only disadvantage I can see with this program, and the others that I looked at, is that they seem to include some form of encryption. I wanted a simple program to hide, rather than encrypt, a text message in a .jpg image. These programs seem to require passphrases to encode/decode the data, although these can simply be left blank.
I may do some more research on this topic in the future. It’s something that I find very interesting.
I wasn’t sure what to name this project for a while; I eventually settled on “Steganographic Obfuscator”, even though it is slightly redundant. This is a very quick concept test of an idea I had recently.
I wanted to create a program capable of hiding a message within a message, purely using text. I didn’t want to use manipulation of digital images, or any kind of bitwise operations; I wanted the text generated by the program to be usable without a computer, i.e., it should be possible to write down the original message, send it in a letter, and still be able to recover the hidden message from it.
This developed from an earlier idea of mine: to use some kind of encryption algorithm with two separate keys, each one producing a different, but still readable, text output.
The way P159 works is very simple, and as such, relies on “security through obscurity”. It would not stand up to a concerted attempt to recover the original message if the attacker knew that a message was present.
P159 requires two inputs: a plaintext message, to be used as a “cover” for the hidden message, and the hidden message itself. As in most similar algorithms, the plaintext message must be longer than the hidden message.
P159 iterates through each character in the hidden message and the plaintext message at the same time, searching for a “match”: a place where the same letter appears in both strings. The matches can be in no particular order, and in fact, later versions of the program will begin the search at random locations to increase obfuscation. The program then stores the position in the plaintext string where the match occurred, and moves on to the next character in the hidden message.
Eventually, every character in the hidden message will have been “merged” with the original message, and an integer value will be stored for each one, corresponding to its location in the plaintext string. For example:
Plain text (letters numbered, spaces ignored):
T h i s i s a t e s t
1 2 3 4 5 6 7 8 9 10 11
Hidden text:
at
The letters “a” and “t” occur at positions 7 and 1 in the plain text string, so the integer string is “7 1”.
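A minimal sketch of this matching step might look like the following. It is simplified to always start the search at the beginning of the cover text and to count letters only, matching the example above; the encodePositions function is my own illustration, not the actual P159 code:

```cpp
#include <cctype>
#include <string>
#include <vector>

// For each character of the hidden message, find a letter in the cover
// text that matches it (case-insensitively) and record its 1-based
// position, counting letters only, as in the worked example.
std::vector<int> encodePositions(const std::string& cover,
                                 const std::string& hidden)
{
    // Strip the cover text down to its letters.
    std::string letters;
    for (unsigned char c : cover)
        if (std::isalpha(c))
            letters += char(std::tolower(c));

    std::vector<int> positions;
    for (unsigned char h : hidden)
        for (size_t i = 0; i < letters.size(); ++i)
            if (letters[i] == std::tolower(h))
            {
                positions.push_back(int(i) + 1);
                break;
            }
    return positions;
}

// encodePositions("This is a test", "at") yields {7, 1}, matching the
// worked example above.
```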
So now I have an integer string corresponding to the hidden message, but if I simply put that at the end of the plain text message, people will very easily figure out what the hidden message is.
What happens next is a little unusual.
In computers, random number generators are not truly random; they are “pseudo-random”. This means that the string of numbers they output can be reproduced if the initial “seed” value is known. In P159, I work backwards: instead of using a seed to generate random numbers, I use the string of integers corresponding to the hidden message as input, and I get back a seed which will produce that same string of ints every time it is fed to the random function.
This requires “cracking” the random number generator. Thankfully, this appears to be very easy to do (many random number generators, including, I believe, the one used in C++, are what are known as “Linear Congruential Generators”, and these are very easy to crack).
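To illustrate why an LCG is so weak, here is a minimal sketch of one being run forwards and then inverted. The multiplier and increment are the well-known “Numerical Recipes” constants, chosen purely for illustration; they are not necessarily what any given C++ runtime uses:

```cpp
#include <cstdint>
#include <cstdio>

// LCG step: next = A * state + C (mod 2^32, via unsigned wrap-around).
const uint32_t A = 1664525u;
const uint32_t C = 1013904223u;

uint32_t lcgNext(uint32_t state) { return A * state + C; }

// Because A is odd, it has a multiplicative inverse mod 2^32, computed
// here by Newton iteration (each pass doubles the number of correct bits).
uint32_t modInverse(uint32_t a)
{
    uint32_t inv = 1;
    for (int i = 0; i < 5; ++i)
        inv *= 2u - a * inv;
    return inv;
}

// Step the generator backwards: given an output, recover the seed.
uint32_t lcgPrev(uint32_t state) { return modInverse(A) * (state - C); }

int main()
{
    uint32_t seed = 159;
    uint32_t out  = lcgNext(seed);
    std::printf("output: %u, recovered seed: %u\n", out, lcgPrev(out));
}
```

Because the multiplier has a modular inverse, knowing one raw output reveals the internal state, and hence the seed, immediately; real implementations usually discard some of the state bits, which makes recovery harder but still practical.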
This leaves me with the whole message essentially encoded in one little int, which I can conceal in the plain text message relatively easily, such as in a date, or a time, or some other subtle way.
This project is not yet completed, but initial concept tests look very promising.
I have been considering the pros and cons of creating my own games engine for some time now. Obviously, this is a massive undertaking. Creating a games engine from scratch would involve writing custom modules to handle rendering, physics, object and mesh importing, animation, sound, networking, lighting, shading, texturing, and a whole host of other features.
Creating an engine and creating an MMO game are probably the two areas where developers fail most often when embarking on a new project. This is the reason I have never tried to create a complete 3D engine before now.
However, in recent years my work has become more complex and more specialised, and I have moved more into simulations and virtual worlds, and away from traditional games.
This has meant that traditional engines, such as my much-loved T3D engine, have not been entirely suited to the tasks I have been asking them to perform.
One major problem that I have come across on several occasions, and which I mentioned in a recent post, is the fact that T3D, along with the vast majority of conventional games engines, uses 32-bit floats to store position information. This gives a limit of about 10,000 units from the origin, after which “jittering” starts to occur: imperfections in the maths used to control an object’s position cause objects to appear to shake as they move.
Assuming 1 unit is 1 meter, that means the player can only travel 10,000 meters in any direction before encountering precision issues.
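A quick way to see this in code: std::nextafter reports the gap between a coordinate and the next representable float, i.e. the finest movement an engine storing positions as 32-bit floats could make there (the sample positions below are my own picks):

```cpp
#include <cmath>
#include <cstdio>

int main()
{
    // The gap between adjacent 32-bit floats grows with magnitude, so
    // positions become coarser the further an object is from the origin.
    const float positions[] = { 1.0f, 10000.0f, 1000000.0f };
    for (float pos : positions)
    {
        float step = std::nextafter(pos, 2.0f * pos) - pos;
        std::printf("at %9.0f units, smallest move = %g units\n", pos, step);
    }
    // Prints roughly 1.2e-07, 0.001 and 0.0625: at 10,000 units the
    // finest representable step is already about a millimetre
    // (if 1 unit = 1 metre), and it only gets worse from there.
}
```

Doubles push the same millimetre-scale threshold out to roughly 10^12 units, which is why 64-bit floats fare so much better, as discussed below.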
For the vast majority of games, this is more than enough room; however, for driving games, flight sims, open-world games, and especially space games, this is not even close to enough. 64-bit floats would allow an area approximately the size of the solar system to be traversed without issue, which is much more reasonable for games with large worlds.
However, even 64-bit floats would prevent a true virtual universe. For that, you would need even bigger numbers. In C++, a “big number” library would be needed, since 64 bits is the largest size that can be represented as standard. This would be a major decision, since these transforms would be used everywhere: every time an object is placed or moved, and every time a shot is fired. The larger the numbers, the slower the game will run (hence the reason 32-bit floats are normally used).
There are quite a few other “pros” with regards to writing my own engine. I would get full control over the code, I would be able to market it, sell it, and more importantly, I would be able to include only the features and libraries that I wanted. I could optimise and streamline the engine to do exactly what I need it to do.
I would also understand exactly how it works; I wouldn’t need to spend hours scrolling through pages of code trying to figure out how a particular feature was implemented or where to add a modification.
There are many cons too, namely the time it would take to create something of this complexity, and the problem of ensuring the engine is reliable enough and tested well enough. Most engines have many users beta testing them for a lengthy period before release; I would likely have to go down that route too.
I have decided to begin a preliminary feasibility study of a 3D engine designed for very large open worlds. It will be optimised for persistent and massively multiplayer worlds, and should support procedural content and seamless surface-to-space transitions (a player should be able to explore the surface of a planet, take off, fly to a distant world, and fly back again, without load zones or trickery of any kind).
I am tentatively calling it the “Phoenix” Engine.
I will not be committing to the use of this engine yet, this is, for now, just an experiment.