What you're explaining there is AIs being good at faking it.
If you don't know how these models generate their output, don't correct me with BS.
I know a lot about how these AIs work.
See the beach hut and the other uncrop demos. You can see in this link to Palette by Google how good its image completion ability is; it worked on all sorts of complex images.
These AIs are trained to predict the next word or blob of pixels (the next "token"... or the noise...), as stated on their websites and in their research papers. The ANN recognizes lots of patterns in the context and combines their predictions into probability %s for each vocabulary token, then usually outputs one of the more probable ones. It can predict text that sounds as if it ran a clinical study or reasoned deeply over your question, or it can fail at predicting, e.g. due to a lack of data, but it also predicts the next token very well (and predicting the next token is what humans do too).
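Here's a rough toy sketch in Python of that last step: turning the model's scores into probability %s and picking a likely token. The vocabulary and scores here are made up for illustration; a real model scores tens of thousands of tokens using the whole context:

```python
# Minimal sketch of the next-token sampling step described above.
# The vocabulary and logits are invented for illustration only.
import numpy as np

vocab = ["cat", "dog", "sat", "mat", "ran"]       # toy vocabulary
logits = np.array([2.1, 1.8, 0.3, 0.2, -1.0])     # model's raw scores for the next token

# Softmax turns raw scores into the probability %s mentioned above.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# The model usually emits one of the higher-probability tokens,
# e.g. by sampling from the distribution (temperature = 1 here).
rng = np.random.default_rng(0)
next_token = rng.choice(vocab, p=probs)
print(dict(zip(vocab, probs.round(3))), "->", next_token)
```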
That's not faking good predictions! As illustrated above, there is nothing fake or crap about a good prediction. The wrong outputs aren't "faking" either; they come from a lack of recognition, data, and goals: it outputs something probably wrong because it hasn't seen what usually comes next after anything similar to this input. If it says it reasoned over your post in some complex way, or ran a clinical trial, that comes from lacking what ChatGPT has, not from its training data. Have you tried ChatGPT? It has been told to say who made it if asked, and to say it has no opinion because it is only an AI; it has been guided on what to say and what not to say. Watching someone work as a doctor doesn't make you say, or see, that you did what he did in that moment. You recognize yourself first, and then you say it because it is linked to the reward.
If you take an image and stretch it, scribble on it, change its colors, thicken the lines, remove parts of the subject, rotate it, etc., it can still recognize it with some accuracy. An input image will match it and other similar images, even when distorted, and all of that experience is combined to predict. The same goes for text: it recognizes phrases and words this way. It also recognizes references, e.g. "it was thirsty"... it what? The "cat" I mentioned paragraphs back! It also remembers to place the closing quotation mark after ".....", water reflections, etc. If you say related words a lot, e.g. pig, horse, cat, dog, it primes it to likely say a related or exact word next, like zebra. It has many complex recognition abilities. Recognition is akin to prediction: it activates memories and triggers the unfinished part of those memories to be output. This is also why ChatGPT knows where a context should end and doesn't cut its answer off mid-sentence.
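To show what I mean by distorted input still matching stored experience, here's a toy Python sketch. The "pattern" is just a random vector standing in for a learned feature, not anything from a real network; real models learn far richer invariances, but the matching idea is the same:

```python
# Toy illustration of distortion-tolerant recognition: a stored pattern and a
# stretched/noised copy of it still score as a close match under cosine
# similarity, while an unrelated input does not.
import numpy as np

rng = np.random.default_rng(1)
pattern = rng.normal(size=256)   # stands in for a learned feature vector

distorted = 1.7 * pattern + rng.normal(scale=0.4, size=256)  # "stretch" + "scribble"
unrelated = rng.normal(size=256)

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print("distorted copy: ", round(cos(pattern, distorted), 3))  # ~0.97: still recognized
print("unrelated input:", round(cos(pattern, unrelated), 3))  # ~0.0: no match
```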
Ray Kurzweil said in a very recent interview (by Alan, I think) that he thinks these AIs are a good chunk of the way there and don't need too much more to reach AGI. That's my opinion too. People like to think AGI is far off because they don't want to be lower than a machine, or because they don't understand how we work.
Did you see Imagen Video? It's maybe the best video predictor out there. Imagine if it predicted humans talking on Zoom calls: you could input your side, and it would talk back by predicting the next context, basically just like ChatGPT does, but with sound (if it had sound), facial expressions, and body movements. People would feel like it's a real person because of the expected and useful outputs. If AIs like Imagen Video were perfected further and given sound, a lot more compute, more data, a bigger model, and a few more algorithm improvements, they might be close to human level.
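Here's a rough Python sketch of that loop. To be clear, the model class and `predict_next` are made-up placeholders, not Imagen Video's actual interface; it just shows the predict-and-feed-back-in idea:

```python
# Hedged sketch of the "video chat by prediction" idea above. Nothing here is
# a real API; the stub stands in for "generate the next chunk of frames/audio
# given everything so far".
class ToyVideoModel:
    def predict_next(self, context):
        # A real model would return the next predicted chunk of video;
        # this stub just labels each step so the loop is runnable.
        return f"predicted chunk {len(context)}"

def respond(model, your_frames, n_chunks=3):
    context = list(your_frames)              # condition on your side of the call
    reply = []
    for _ in range(n_chunks):
        chunk = model.predict_next(context)  # next "token" = next chunk of video
        reply.append(chunk)
        context.append(chunk)                # feed the prediction back in, like ChatGPT
    return reply

print(respond(ToyVideoModel(), ["your frame 0", "your frame 1"]))
```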